
**Open Online Calculus**

Jim Fowler and Steve Gubkin

EXPERIMENTAL DRAFT

This document was typeset on April 27, 2014.

1 An n-dimensional space

We package lists of numbers as “vectors.”

In this course we will be studying calculus of many variables. That means that instead of just seeing how one quantity depends on another, we will see how two quantities could affect a third, or how five inputs might cause changes in three outputs. The very first step of this journey is to give a convenient mathematical framework for talking about lists of numbers. To that end we define:

Definition 1 $R^n$ is the set of all ordered lists containing n real numbers. That is,

$$R^n = \{(x_1, x_2, \ldots, x_n) : x_1, x_2, \ldots, x_n \in R\}.$$

The number n is called the dimension of $R^n$, and $R^n$ is called n-dimensional space. When speaking aloud, it is also acceptable to say "are en." We call the elements of $R^n$ points or n-tuples.

Example 2 $R^1$ is just the set of all real numbers, which is often visualized by the number line, which is 1-dimensional.

[Figure: a number line marked −1, 0, 1, 2, 3]

Example 3 $R^2$ is the set of all pairs of real numbers, like (2, 5) or (1.54, π). This can be visualized by the coordinate plane, which is 2-dimensional.

[Figure: the coordinate plane with the point (1, 2) marked]

Example 4 $R^3$ is the set of all triples of real numbers. It can be visualized as 3-dimensional space, with three coordinate axes.

Question 5 If $(3, 2, e, 1.4) \in R^n$, what is n?

Solution

Hint: n-dimensional space consists of ordered n-tuples of numbers. How many coordinates does (3, 2, e, 1.4) have?

Warning 6 Be careful to distinguish between commas and periods.

Hint: n = 4

n = 4

Question 7 Which point is farthest away from the point (0, 0)?

Solution

Hint: Plot the points (0, 1), (1, 1), (−2, 3), and (1, 4), and compare their distances from the origin.

(a) (0, 1)
(b) (1, 1)
(c) (−2, 3)
(d) (1, 4)

It becomes quite difficult to visualize high-dimensional spaces. You can sometimes visualize a higher dimensional object by having 3 spatial dimensions and one color dimension, or 3 spatial dimensions and one time dimension to get a movie. Sometimes you can project a higher dimensional object into a lower dimensional space. If you have the time, you should watch the excellent film Dimensions (http://www.dimensions-math.org/), which will get you to visualize some higher dimensional objects.

Although we may often be working with high-dimensional objects, we will generally not try to visualize objects in dimensions above 3 in this course. Nevertheless, we hope the video is enlightening!

2 Vector spaces

Vector spaces are where vectors live.

It will be convenient for us to equip $R^n$ with two algebraic operations: "vector addition" and "scalar multiplication" (to be defined soon). This additional structure will transform $R^n$ from a mere set into a "vector space." To distinguish between $R^n$ as a set and $R^n$ as a vector space, we think of elements of $R^n$ as a set as being ordered lists, such as

$$\mathbf{p} = (x_1, x_2, x_3, \ldots, x_n),$$

but elements of $R^n$ the vector space will be written typographically as vertically oriented lists flanked with square brackets, like this:

$$\vec{v} = \begin{bmatrix} x_1 \\ x_2 \\ x_3 \\ \vdots \\ x_n \end{bmatrix}$$

We will try to stick to the convention that bold letters like $\mathbf{p}$ represent points, while letters with little arrows above them (like $\vec{v}$) represent vectors.

Unfortunately (like practically everybody else in the world), we use the same symbol $R^n$ to refer to both the vector space $R^n$ and the underlying set of points $R^n$.

Vector addition is defined as follows:

$$\begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix} + \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{bmatrix} = \begin{bmatrix} x_1 + y_1 \\ x_2 + y_2 \\ \vdots \\ x_n + y_n \end{bmatrix}$$

Warning 1 You cannot add vectors in $R^n$ and $R^m$ unless n = m.

An element of R is a number, but it is also called a "scalar" in this context, and vectors can be multiplied by scalars as follows:

$$c \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix} = \begin{bmatrix} cx_1 \\ cx_2 \\ \vdots \\ cx_n \end{bmatrix}$$
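Since Python is used later in these activities, the two operations can be sketched there as well. This is our own illustration (not from the text), modeling vectors as Python lists:

```python
# A small sketch of the two vector-space operations on R^n,
# with a vector in R^n modeled as a Python list of n numbers.

def vector_add(v, w):
    # Warning 1: vectors can only be added when the dimensions agree.
    if len(v) != len(w):
        raise ValueError("can only add vectors of the same dimension")
    return [vi + wi for vi, wi in zip(v, w)]

def scalar_multiply(c, v):
    return [c * vi for vi in v]

print(vector_add([1, 2, 3], [3, -2, 4]))   # [4, 0, 7]
print(scalar_multiply(3, [3, -2, 4]))      # [9, -6, 12]
```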

Warning 2 We have not yet defined a notion of multiplication for vectors. You might think it is reasonable to define

$$\begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix} \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{bmatrix} = \begin{bmatrix} x_1 y_1 \\ x_2 y_2 \\ \vdots \\ x_n y_n \end{bmatrix},$$

but actually this operation is not especially useful, and will never be utilized in this course. We will have a notion of "vector multiplication" called the dot product, but that is not the (faulty) definition above.

Question 3 What is $\begin{bmatrix} 1 \\ 2 \\ 3 \end{bmatrix} + \begin{bmatrix} 3 \\ -2 \\ 4 \end{bmatrix}$?

Solution

Hint:

$$\begin{bmatrix} 1 \\ 2 \\ 3 \end{bmatrix} + \begin{bmatrix} 3 \\ -2 \\ 4 \end{bmatrix} = \begin{bmatrix} 1 + 3 \\ 2 + (-2) \\ 3 + 4 \end{bmatrix} = \begin{bmatrix} 4 \\ 0 \\ 7 \end{bmatrix}$$

Question 4 What is $3 \begin{bmatrix} 3 \\ -2 \\ 4 \end{bmatrix}$?

Solution

Hint:

$$3 \begin{bmatrix} 3 \\ -2 \\ 4 \end{bmatrix} = \begin{bmatrix} 3(3) \\ 3(-2) \\ 3(4) \end{bmatrix} = \begin{bmatrix} 9 \\ -6 \\ 12 \end{bmatrix}$$

Question 5 If $\vec{v}_1 = \begin{bmatrix} 3 \\ -2 \end{bmatrix}$, $\vec{v}_2 = \begin{bmatrix} 1 \\ 5 \end{bmatrix}$, and $\vec{v}_3 = \begin{bmatrix} 1 \\ 1 \end{bmatrix}$, can you find $a, b \in R$ so that $a\vec{v}_1 + b\vec{v}_2 = \vec{v}_3$?

Solution

Hint:

$$a\vec{v}_1 + b\vec{v}_2 = \vec{v}_3$$
$$a \begin{bmatrix} 3 \\ -2 \end{bmatrix} + b \begin{bmatrix} 1 \\ 5 \end{bmatrix} = \begin{bmatrix} 1 \\ 1 \end{bmatrix}$$
$$\begin{bmatrix} 3a \\ -2a \end{bmatrix} + \begin{bmatrix} b \\ 5b \end{bmatrix} = \begin{bmatrix} 1 \\ 1 \end{bmatrix}$$
$$\begin{bmatrix} 3a + b \\ -2a + 5b \end{bmatrix} = \begin{bmatrix} 1 \\ 1 \end{bmatrix}$$

Can you turn this into a system of two equations?

Hint:

$$\begin{cases} 3a + b = 1 \\ -2a + 5b = 1 \end{cases}
\;\Rightarrow\;
\begin{cases} 15a + 5b = 5 \\ -2a + 5b = 1 \end{cases}
\;\Rightarrow\;
\begin{cases} 17a = 4 \\ -2a + 5b = 1 \end{cases}
\;\Rightarrow\;
\begin{cases} a = \frac{4}{17} \\ -2\left(\frac{4}{17}\right) + 5b = 1 \end{cases}
\;\Rightarrow\;
\begin{cases} a = \frac{4}{17} \\ b = \frac{5}{17} \end{cases}$$

a = 4/17

b = 5/17
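A system like the one in this hint can also be solved mechanically. This is our own sketch (not part of the text), using Cramer's rule with exact rational arithmetic so that answers like 4/17 come out as fractions rather than rounded decimals:

```python
from fractions import Fraction

# Solve a 2x2 linear system
#   m11*a + m12*b = r1
#   m21*a + m22*b = r2
# by Cramer's rule, using exact rational arithmetic.

def solve_2x2(m11, m12, m21, m22, r1, r2):
    det = Fraction(m11 * m22 - m12 * m21)
    if det == 0:
        raise ValueError("this sketch assumes the system has a unique solution")
    a = Fraction(r1 * m22 - m12 * r2) / det
    b = Fraction(m11 * r2 - r1 * m21) / det
    return a, b

# The system from Question 5: 3a + b = 1, -2a + 5b = 1
a, b = solve_2x2(3, 1, -2, 5, 1, 1)
print(a, b)  # 4/17 5/17
```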

3 Geometry

Vectors can be viewed geometrically.

Graphically, we depict a vector $\begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix}$ in $R^n$ as an arrow whose base is at the origin and whose head is at the point $(x_1, x_2, \ldots, x_n)$. For example, in $R^2$ we would depict the vector $\vec{v} = \begin{bmatrix} 3 \\ 4 \end{bmatrix}$ as follows:

[Figure: the vector v drawn as an arrow from the origin to (3, 4)]

Question 1 What is the vector $\vec{w}$ pictured below?

[Figure: the vector w drawn in the plane]

Solution

Hint: Consider whether the x and y coordinates are positive or negative.

(a) $\begin{bmatrix} -4 \\ 2 \end{bmatrix}$
(b) $\begin{bmatrix} 3 \\ -3 \end{bmatrix}$
(c) $\begin{bmatrix} -4 \\ -2 \end{bmatrix}$
(d) $\begin{bmatrix} 4 \\ 2 \end{bmatrix}$

Question 2 On a sheet of paper, draw the vector $\vec{v} = \begin{bmatrix} 3 \\ 1 \end{bmatrix}$. Click the hint to see if you got it right.

Hint: [Figure: the vector v drawn from the origin to (3, 1)]

Question 3 $\vec{v}_1$ and $\vec{v}_2$ are drawn below. Redraw them on a sheet of paper, and also draw their sum $\vec{v}_1 + \vec{v}_2$. Click the hint to see if you got it right.

[Figure: the vectors v_1 and v_2]

Hint: [Figure: v_1, v_2, and v_1 + v_2 drawn together]

Question 4 $\vec{v}$ is drawn below. Redraw it on a sheet of paper, and also draw $3\vec{v}$. Click the hint to see if you got it right.

[Figure: the vector v]

Hint: [Figure: v and 3v drawn together]

You may have noticed that you can sum vectors graphically by forming a parallelogram.

You also may have noticed that multiplying a vector by a scalar leaves the vector pointing in the same direction (or the opposite direction, if the scalar is negative) but "scales" its length. That is the reason we call real numbers "scalars" when they are coefficients of vectors: it is to remind us that they act geometrically by scaling the vector.

4 Span

Vectors can be combined; all those combinations form the “span.”

Definition 1 We say that a vector $\vec{w}$ is a linear combination of the vectors $\vec{v}_1, \vec{v}_2, \vec{v}_3, \ldots, \vec{v}_k$ if there are scalars $a_1, a_2, \ldots, a_k$ so that $\vec{w} = a_1\vec{v}_1 + a_2\vec{v}_2 + \cdots + a_k\vec{v}_k$.

Definition 2 The span of a set of vectors $\vec{v}_1, \vec{v}_2, \ldots, \vec{v}_k \in R^n$ is the set of all linear combinations of the vectors. Symbolically,

$$\operatorname{span}(\vec{v}_1, \vec{v}_2, \ldots, \vec{v}_k) = \{a_1\vec{v}_1 + a_2\vec{v}_2 + \cdots + a_k\vec{v}_k : a_1, a_2, \ldots, a_k \in R\}.$$

Example 3 The span of $\begin{bmatrix} 1 \\ 0 \\ 0 \end{bmatrix}$ and $\begin{bmatrix} 0 \\ 1 \\ 0 \end{bmatrix}$ is all vectors of the form $\begin{bmatrix} x \\ y \\ 0 \end{bmatrix}$ for some $x, y \in R$.

Example 4 $\begin{bmatrix} 8 \\ 13 \end{bmatrix}$ is in the span of $\begin{bmatrix} 2 \\ 3 \end{bmatrix}$ and $\begin{bmatrix} 4 \\ 7 \end{bmatrix}$ because $2\begin{bmatrix} 2 \\ 3 \end{bmatrix} + \begin{bmatrix} 4 \\ 7 \end{bmatrix} = \begin{bmatrix} 8 \\ 13 \end{bmatrix}$.

Question 5 Is $\begin{bmatrix} 3 \\ 4 \\ 2 \end{bmatrix}$ in the span of $\begin{bmatrix} 1 \\ 2 \\ 0 \end{bmatrix}$ and $\begin{bmatrix} 3 \\ -3 \\ 0 \end{bmatrix}$?

Solution

Hint: The linear combinations of $\begin{bmatrix} 1 \\ 2 \\ 0 \end{bmatrix}$ and $\begin{bmatrix} 3 \\ -3 \\ 0 \end{bmatrix}$ are all the vectors of the form $a\begin{bmatrix} 1 \\ 2 \\ 0 \end{bmatrix} + b\begin{bmatrix} 3 \\ -3 \\ 0 \end{bmatrix}$ for scalars $a, b \in R$. Could $\begin{bmatrix} 3 \\ 4 \\ 2 \end{bmatrix}$ be written in such a form?

Hint: No, because the last coordinate of all of these vectors is 0. In fact, graphically, the span of these two vectors is just the entire xy-plane, and $\begin{bmatrix} 3 \\ 4 \\ 2 \end{bmatrix}$ lives off of that plane.

(a) Yes, it is in the span of those two vectors.
(b) No, it is not in the span of those two vectors.

Graphically, we should think of the span of one vector as the line which contains the vector (unless the vector is the zero vector, in which case its span is just the zero vector).

The span of two vectors which are not in the same line is the plane containing the two vectors.

The span of three vectors which are not in the same plane is the "3D-space" which contains those 3 vectors.
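The reasoning in Question 5 can be automated for this special situation. The sketch below is ours (not from the text), and it assumes the two spanning vectors have independent first two coordinates, as in the question: solve for a and b from the first two coordinates, then check whether the remaining coordinate also matches.

```python
from fractions import Fraction

# Test whether `target` in R^3 lies in the span of v1 and v2: solve
# a*v1 + b*v2 = target using the first two coordinates (Cramer's rule),
# then verify the candidate (a, b) against every coordinate.

def in_span_of_two(v1, v2, target):
    det = v1[0] * v2[1] - v2[0] * v1[1]
    if det == 0:
        raise ValueError("this sketch assumes the first two coordinates are independent")
    a = Fraction(target[0] * v2[1] - v2[0] * target[1], det)
    b = Fraction(v1[0] * target[1] - target[0] * v1[1], det)
    return all(a * x + b * y == t for x, y, t in zip(v1, v2, target))

# Question 5's vectors: [3, 4, 2] is not in the span...
print(in_span_of_two([1, 2, 0], [3, -3, 0], [3, 4, 2]))   # False
# ...but any vector with last coordinate 0 can be, e.g. 2*v1 + 1*v2:
print(in_span_of_two([1, 2, 0], [3, -3, 0], [5, 1, 0]))   # True
```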

5 Functions

A function relates inputs and outputs.

Definition 1 A function f from a set A to a set B is an assignment of exactly one element of B to each element of A. If a is an element of A, we write f(a) for the element of B which is assigned to a by f.

We call A the domain of f, and B the codomain of f. We will also commonly write f : A → B, which we read out loud as "f from A to B" or "f maps A to B."

Example 2 Let W = {yes, no} and A = {Dog, Cat, Walrus}. Let f : A → W be the function which assigns to each animal in A the answer to the question "Is this animal commonly a pet?" Then f(Dog) = yes, f(Cat) = yes, and f(Walrus) = no. In this case, A is the domain, and W is the codomain.

In these activities, we mostly study functions from $R^n$ to $R^m$.

Question 3 Let $g : R^1 \to R^2$ be defined by $g(\theta) = (\cos(\theta), \sin(\theta))$. What is $g(\frac{\pi}{6})$? Give your answer as a vertical column of numbers.

Solution

Warning 4 In everything that follows, cos and sin are in terms of radians.

Hint: $g(\frac{\pi}{6}) = (\cos(\frac{\pi}{6}), \sin(\frac{\pi}{6}))$

Hint: If you remember your trig facts, this is $(\frac{\sqrt{3}}{2}, \frac{1}{2})$. Format this as $\begin{bmatrix} \sqrt{3}/2 \\ 1/2 \end{bmatrix}$ for this question.

Can you imagine what would happen to the point $g(\theta)$ as $\theta$ moved from 0 to $2\pi$?

Question 5 Let $h : R^2 \to R^2$ be defined by $h(x, y) = (x, -y)$. What is $h(2, 1)$? Format your answer as a vertical column of numbers.

Solution

Hint: Consider $h(2, 1) = (2, -1)$.

Hint: [Figure: the points (2, 1) and h(2, 1) plotted in the plane]

Hint: h takes any point (x, y) to its reflection in the x-axis.

Hint: Format your answer as $\begin{bmatrix} 2 \\ -1 \end{bmatrix}$.

Try to understand this function graphically. How does it transform the plane? The hint reveals the answer to this question.

Question 6 Let $f : R^4 \to R^2$ be defined by $f((x_1, x_2, x_3, x_4)) = (x_1 x_2 + x_3, \; x_4^2 + x_1)$. What is $f(3, 4, 1, 9)$? Format your answer as a vertical column of numbers.

Solution

Hint: $f(3, 4, 1, 9) = (3 \cdot 4 + 1, \; 9^2 + 3) = (13, 84)$.

Hint: Format this as $\begin{bmatrix} 13 \\ 84 \end{bmatrix}$.

Note that this function has too many inputs and outputs to visualize easily. That certainly does not stop it from being a useful and meaningful function; this is a "massively multivariable" course.
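Functions like these translate directly into Python, which this course returns to in a later activity. As a preview (our own sketch, not from the text), tuples can play the role of points of $R^n$:

```python
import math

# g : R^1 -> R^2 from Question 3
def g(theta):
    return (math.cos(theta), math.sin(theta))

# f : R^4 -> R^2 from Question 6
def f(x1, x2, x3, x4):
    return (x1 * x2 + x3, x4**2 + x1)

print(f(3, 4, 1, 9))  # (13, 84)
```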

6 Composition

One way to build new functions is via “composition.”

Practically the most important thing you can do with functions is to compose them.

Definition 1 Let f : A → B and g : B → C. Then there is another function (g ∘ f) : A → C defined by (g ∘ f)(a) = g(f(a)) for each a ∈ A. It is called the composition of g with f.

Warning 2 The composition is only defined if the codomain of f is the domain of g.

Question 3 Let A = {cat, dog}, B = {(2, 3), (5, 6), (7, 8)}, C = R. Let f be defined by f(cat) = (2, 3) and f(dog) = (7, 8). Let g be defined by the rule g((x, y)) = x + y. What is (g ∘ f)(cat)?

Solution

Hint: First, (g ∘ f)(cat) = g(f(cat)).

Hint: Then note that f(cat) = (2, 3).

Hint: So this is g((2, 3)) = 2 + 3 = 5.

(g ∘ f)(cat) = 5

Question 4 Let $h : R^2 \to R^3$ be defined by $h(x, y) = (x^2, xy, y)$, and let $\omega : R^3 \to R^2$ be defined by $\omega(x, y, z) = (xyz, z)$. What is $(\omega \circ h)(x, y)$? Format your answer as a vertical column of formulas.

Solution

Hint:

$$(\omega \circ h)(x, y) = \omega(h(x, y)) = \omega(x^2, xy, y) = ((x^2)(xy)(y), \; y) = (x^3 y^2, \; y)$$
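Composition itself is a higher-order function, and can be written once and reused. The sketch below is ours (not from the text); it assumes the inner function returns a tuple whose entries are the arguments of the outer function, as in Question 4:

```python
# g ∘ f as a Python function: (g ∘ f)(a) = g(f(a)).
# This sketch assumes f returns a tuple of g's arguments.

def compose(g, f):
    def g_after_f(*args):
        return g(*f(*args))
    return g_after_f

def h(x, y):
    return (x**2, x * y, y)

def omega(x, y, z):
    return (x * y * z, z)

omega_after_h = compose(omega, h)
print(omega_after_h(2, 3))  # (72, 3), matching (x^3 y^2, y) at x = 2, y = 3
```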

7 Higher-order functions

Sometimes functions act on functions.

Functions from $R^n$ to $R^m$ are not the only useful kind of function. While such functions are our primary object of study in this multivariable calculus class, it will often be helpful to think about "functions of functions." The next examples might seem a bit peculiar, but later on in the course these kinds of mappings will become very important.

Question 1 Let $C_{[0,1]}$ be the set of all continuous functions from [0, 1] to R. Define $I : C_{[0,1]} \to R$ by

$$I(f) = \int_0^1 f(x)\,dx.$$

If $g(x) = x^2$, then what is $I(g)$?

Solution

Hint:

$$I(g) = \int_0^1 g(x)\,dx = \int_0^1 x^2\,dx = \left.\frac{1}{3}x^3\right|_0^1 = \frac{1}{3}(1 - 0) = \frac{1}{3}.$$

If $g(x) = x^2$, then $I(g) = 1/3$.

Question 2 Let $C^\infty(R)$ be the set of all infinitely differentiable ("smooth") functions on R. Define $Q : C^\infty(R) \to C^\infty(R)$ by

$$Q(f)(x) = f(0) + f'(0)x + \frac{f''(0)}{2}x^2.$$

If $f(x) = \cos(x)$, then what is $Q(f)(x)$?

Solution

Hint: $f(0) = \cos(0) = 1$, so $f(0) = 1$.

Hint: $f'(x) = -\sin(x)$, so $f'(0) = -\sin(0) = 0$.

Hint: $f''(x) = -\cos(x)$, so $f''(0) = -\cos(0) = -1$.

Hint: So $Q(f)(x) = 1 - \frac{x^2}{2}$.

If $f(x) = \cos(x)$, then $Q(f)(x) = 1 - x^2/2$.

This is an example of a function which eats a function and spits out another function. In particular, this takes a function and returns the second order MacLaurin polynomial of that function.

Question 6 Define $\operatorname{dot}_n : R^n \times R^n \to R$ by

$$\operatorname{dot}_n((x_1, x_2, \ldots, x_n), (y_1, y_2, \ldots, y_n)) = x_1 y_1 + x_2 y_2 + x_3 y_3 + \cdots + x_n y_n.$$

What is $\operatorname{dot}_3((2, 4, 5), (0, 1, 4))$?

Solution

Hint: $\operatorname{dot}_3((2, 4, 5), (0, 1, 4)) = 2(0) + 4(1) + 5(4) = 24$.

$\operatorname{dot}_3((2, 4, 5), (0, 1, 4)) = 24$
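Since the definition is the same for every n, a single Python function covers the whole family $\operatorname{dot}_n$. This sketch is ours (not from the text):

```python
# dot_n for any n: the vectors are tuples/lists of the same length.

def dot(xs, ys):
    if len(xs) != len(ys):
        raise ValueError("dot is only defined for vectors of the same dimension")
    return sum(x * y for x, y in zip(xs, ys))

print(dot((2, 4, 5), (0, 1, 4)))  # 24
```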

8 Currying

Higher-order functions provide a different perspective on functions that take many inputs.

Definition 1 Let A and B be two sets. The product A × B of the two sets is the set of all ordered pairs: A × B = {(a, b) : a ∈ A and b ∈ B}.

Example 2 If A = {1, 2, Wolf} and B = {4, 5}, then A × B = {(1, 4), (1, 5), (2, 4), (2, 5), (Wolf, 4), (Wolf, 5)}.

Example 3 We write $R^2$ for pairs of real numbers, but we could have written R × R instead.

Question 4 Let Func(R, R) be the set of all functions from R to R. Define Eval : R × Func(R, R) → R by Eval(x, f) = f(x). If g(x) = |x|, then what is Eval(−3, g)?

Solution

Hint: Eval(−3, g) = g(−3) = |−3| = 3.

If g(x) = |x|, then Eval(−3, g) = 3.

Question 5 Let Func(A, B) be the set of all functions from A to B for any two sets A and B. Let Curry : Func($R^2$, R) → Func(R, Func(R, R)) be defined by Curry(f)(x)(y) = f(x, y). Let $h : R^2 \to R$ be defined by $h(x, y) = x^2 + xy$. Let G = Curry(h)(2). What is G(3)?

Solution

Hint:

$$G(3) = \operatorname{Curry}(h)(2)(3) = h(2, 3) = 2^2 + 2(3) = 10$$

Let G = Curry(h)(2). Then G(3) = 10.

This wacky way of thinking is helpful when thinking about the λ-calculus (http://en.wikipedia.org/wiki/Lambda_calculus). It also helps a lot if you ever want to learn to program in Haskell—which is one of the languages that Ximera was written in.
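Currying can be sketched directly with Python closures. This example is ours (not from the text): a function of two arguments is traded for a function that returns a function.

```python
# Curry : Func(R^2, R) -> Func(R, Func(R, R)),
# defined so that curry(f)(x)(y) = f(x, y).

def curry(f):
    def curried(x):
        def inner(y):
            return f(x, y)
        return inner
    return curried

def h(x, y):
    return x**2 + x * y

G = curry(h)(2)     # G is a function of one variable: G(y) = h(2, y)
print(G(3))         # 10, since h(2, 3) = 4 + 6 = 10
```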

9 Python

Python provides a playground for multivariable functions.

We can use Python to experiment a bit with multivariable functions.

Question 1 Model the function f(x) = x² as a Python function.

Solution

Warning 2 Python does not use ^ for exponentiation; it denotes this by **.

Hint: Try using return x**2

Python

    def f(x):
        return # your code here

    def validator():
        return (f(4) == 16) and (f(-5) == 25)

Solution Model the function

$$g(x) = \begin{cases} -1 & \text{if } x \le 0 \\ 1 & \text{if } x > 0 \end{cases}$$

as a Python function.

Hint: Try using an if

Python

    def g(x):
        # your code here
        return # the value of g(x)

    def validator():
        return (g(0) == -1) and (g(-17) == -1) and (g(25) == 1)

Solution Model the function

$$h(x, y) = \begin{cases} x/(1 + y) & \text{if } y \ne -1 \\ 0 & \text{if } y = -1 \end{cases}$$

as a Python function.

Python

    def h(x, y):
        # your code here
        return # the value of h(x, y)

    def validator():
        return (h(6,2) == 2) and (h(17,-1) == 0) and (h(-24,5) == -4)
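For readers working offline, here is one possible set of solutions (ours, following the hints above) that makes each validator return True:

```python
# Possible solutions to the three exercises above, following the hints.

def f(x):
    return x**2

def g(x):
    # -1 for x <= 0, and 1 for x > 0
    return -1 if x <= 0 else 1

def h(x, y):
    # x/(1 + y) when y != -1, and 0 when y == -1
    return 0 if y == -1 else x / (1 + y)

# The validators from the text all pass on these definitions:
print((f(4) == 16) and (f(-5) == 25))                              # True
print((g(0) == -1) and (g(-17) == -1) and (g(25) == 1))            # True
print((h(6, 2) == 2) and (h(17, -1) == 0) and (h(-24, 5) == -4))   # True
```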

10 Higher-order python

One nice feature of Python is that we can play with functions which act on functions.

Question 1 Here is an example of a higher order function horizontal_shift. It takes a function f of one variable, and a horizontal shift H, and returns the function whose graph is the same as f, only shifted horizontally by H units.

Python

    def horizontal_shift(f, H):
        # first we define a new function shifted_f which is the appropriate shift of f
        def shifted_f(x):
            return f(x - H)
        # then we return that function
        return shifted_f

Solution Find a function f so that horizontal_shift(f, 2) is the squaring function.

Python

    def f(x):
        return # a function so that horizontal_shift(f, 2) is the squaring function

    def validator():
        return (f(1) == 9) and (f(0) == 4) and (f(-3) == 1)

Solution Write a function forward_difference which takes a function f : R → R and returns another real-valued function defined by forward_difference(f)(x) = f(x + 1) − f(x).

Python

    def forward_difference(f):
        # Your code here

    def validator():
        def f(x):
            return x**2
        def g(x):
            return x**3
        return (forward_difference(f)(3) == 7) and (forward_difference(g)(4) == 61)
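Again for offline readers, one possible set of solutions (ours) to the two exercises above:

```python
# Possible solutions to the two exercises above.

def horizontal_shift(f, H):
    def shifted_f(x):
        return f(x - H)
    return shifted_f

def f(x):
    # Shifting the graph of f right by 2 should give the squaring function,
    # so f itself must be the squaring function shifted left by 2.
    return (x + 2)**2

def forward_difference(f):
    def df(x):
        return f(x + 1) - f(x)
    return df

squaring = horizontal_shift(f, 2)
print(squaring(5))                              # 25
print(forward_difference(lambda x: x**2)(3))    # 7
print(forward_difference(lambda x: x**3)(4))    # 61
```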

11 Calculus

We can do some calculus with Python, too.

Let’s try doing some single-variable calculus with a bit of Python.

Let epsilon be a small, but positive number. Suppose f : R → R has been coded as a Python function f which takes a real number and returns a real number. Seeing as

$$f'(x) = \lim_{h \to 0} \frac{f(x + h) - f(x)}{h},$$

can you find a Python function which approximates f′(x)?

Given a Python function f which takes a real number and returns a real number, we can approximate f′(x) by using epsilon. Write a Python function derivative which takes a function f and returns an approximation to its derivative.

Solution

Hint: To approximate this, use (f(x+epsilon) - f(x))/epsilon.

Python

    epsilon = 0.0001

    def derivative(f):
        def df(x): return (f(blah blah) - f(blah blah)) / blah blah
        return df

    def validator():
        df = derivative(lambda x: 1+x**2+x**3)
        if abs(df(2) - 16) > 0.01:
            return False
        df = derivative(lambda x: (1+x)**4)
        if abs(df(-2.642) - -17.708405152) > 0.01:
            return False
        return True

This is great! In the future, we'll review this activity, and then extend it to a multivariable setting.
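One way (ours, following the hint above) to fill in the blanks and pass the validator:

```python
# A completed version of the derivative exercise, following the hint:
# approximate f'(x) by the difference quotient with a small fixed epsilon.

epsilon = 0.0001

def derivative(f):
    def df(x):
        return (f(x + epsilon) - f(x)) / epsilon
    return df

df = derivative(lambda x: 1 + x**2 + x**3)
# The true derivative is 2x + 3x^2, which equals 16 at x = 2.
print(abs(df(2) - 16) < 0.01)  # True
```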

12 Linear maps

Linear maps respect addition and scalar multiplication.

We begin by defining linear maps.

Definition 1 A function $L : R^n \to R^m$ is called a linear map if it "respects addition and scalar multiplication." Symbolically, for a map to be linear, we must have that $L(\vec{v} + \vec{w}) = L(\vec{v}) + L(\vec{w})$ for all $\vec{v}, \vec{w} \in R^n$ and also $L(a\vec{v}) = aL(\vec{v})$ for all $a \in R$ and $\vec{v} \in R^n$.

Definition 2 Linear Algebra is the branch of mathematics concerning vector spaces and linear mappings between such spaces.

Question 3 Which of the following functions are linear?

(a) $f : R^2 \to R^1$ defined by $f\left(\begin{bmatrix} x \\ y \end{bmatrix}\right) = x + 2y$

(b) $h : R^2 \to R^2$ defined by $h\left(\begin{bmatrix} x \\ y \end{bmatrix}\right) = \begin{bmatrix} 17 \\ x \end{bmatrix}$

Solution

Hint: For a function to be linear, it must respect scalar multiplication. Let's see how $f\left(5\begin{bmatrix} 1 \\ 1 \end{bmatrix}\right)$ compares to $5f\left(\begin{bmatrix} 1 \\ 1 \end{bmatrix}\right)$, and also how $h\left(5\begin{bmatrix} 1 \\ 1 \end{bmatrix}\right)$ compares to $5h\left(\begin{bmatrix} 1 \\ 1 \end{bmatrix}\right)$.

Hint: Remember f is defined by $f\left(\begin{bmatrix} x \\ y \end{bmatrix}\right) = x + 2y$, so $f\left(5\begin{bmatrix} 1 \\ 1 \end{bmatrix}\right) = f\left(\begin{bmatrix} 5 \\ 5 \end{bmatrix}\right) = 5 + 2(5) = 15$.

What is $f\left(5\begin{bmatrix} 1 \\ 1 \end{bmatrix}\right)$? 15

Hint: Remember f is defined by $f\left(\begin{bmatrix} x \\ y \end{bmatrix}\right) = x + 2y$, so $f\left(\begin{bmatrix} 1 \\ 1 \end{bmatrix}\right) = 1 + 2(1) = 3$.

What is $f\left(\begin{bmatrix} 1 \\ 1 \end{bmatrix}\right)$? 3

Is $f\left(5\begin{bmatrix} 1 \\ 1 \end{bmatrix}\right) = 5f\left(\begin{bmatrix} 1 \\ 1 \end{bmatrix}\right)$?

(a) Yes
(b) No

Great! So f has a chance of being linear, since it is respecting scalar multiplication in this case. What about h?

Hint: Remember h is defined by $h\left(\begin{bmatrix} x \\ y \end{bmatrix}\right) = \begin{bmatrix} 17 \\ x \end{bmatrix}$, so $h\left(5\begin{bmatrix} 1 \\ 1 \end{bmatrix}\right) = h\left(\begin{bmatrix} 5 \\ 5 \end{bmatrix}\right) = \begin{bmatrix} 17 \\ 5 \end{bmatrix}$.

What is $h\left(5\begin{bmatrix} 1 \\ 1 \end{bmatrix}\right)$?

Hint: Remember h is defined by $h\left(\begin{bmatrix} x \\ y \end{bmatrix}\right) = \begin{bmatrix} 17 \\ x \end{bmatrix}$, so $h\left(\begin{bmatrix} 1 \\ 1 \end{bmatrix}\right) = \begin{bmatrix} 17 \\ 1 \end{bmatrix}$.

What is $h\left(\begin{bmatrix} 1 \\ 1 \end{bmatrix}\right)$?

Is $h\left(5\begin{bmatrix} 1 \\ 1 \end{bmatrix}\right) = 5h\left(\begin{bmatrix} 1 \\ 1 \end{bmatrix}\right)$?

(a) Yes
(b) No

Great! So h is not linear: by looking at this particular example, we can see that h does not always respect scalar multiplication. So h is not linear.

Since we know one of the two functions is linear, we can already answer the question: the answer is f. To be thorough, let's check that f really is linear.

First we check that f really does respect scalar multiplication. Let $a \in R$ be an arbitrary scalar and $\begin{bmatrix} x \\ y \end{bmatrix} \in R^2$ be an arbitrary vector. Then

$$f\left(a\begin{bmatrix} x \\ y \end{bmatrix}\right) = f\left(\begin{bmatrix} ax \\ ay \end{bmatrix}\right) = ax + 2ay = a(x + 2y) = af\left(\begin{bmatrix} x \\ y \end{bmatrix}\right)$$

Now we check that f really does respect vector addition. Let $\begin{bmatrix} x_1 \\ y_1 \end{bmatrix}$ and $\begin{bmatrix} x_2 \\ y_2 \end{bmatrix}$ be arbitrary vectors in $R^2$. Then

$$f\left(\begin{bmatrix} x_1 \\ y_1 \end{bmatrix} + \begin{bmatrix} x_2 \\ y_2 \end{bmatrix}\right) = f\left(\begin{bmatrix} x_1 + x_2 \\ y_1 + y_2 \end{bmatrix}\right) = (x_1 + x_2) + 2(y_1 + y_2) = (x_1 + 2y_1) + (x_2 + 2y_2) = f\left(\begin{bmatrix} x_1 \\ y_1 \end{bmatrix}\right) + f\left(\begin{bmatrix} x_2 \\ y_2 \end{bmatrix}\right)$$

This proves that f is linear!

What about these two functions? Which of them is a linear map?

(a) $g : R^3 \to R^2$ defined by $g\left(\begin{bmatrix} x \\ y \\ z \end{bmatrix}\right) = \begin{bmatrix} x \\ xy \end{bmatrix}$

(b) $h : R \to R^4$ defined by $h(x) = \begin{bmatrix} x \\ x \\ x \\ 4x \end{bmatrix}$

Solution

Hint: For a function to be linear, it must respect vector addition. Let's see how $h(5 + 2)$ compares to $h(5) + h(2)$, and also how $g\left(\begin{bmatrix} 2 \\ 3 \\ 1 \end{bmatrix} + \begin{bmatrix} 1 \\ 4 \\ 5 \end{bmatrix}\right)$ compares to $g\left(\begin{bmatrix} 2 \\ 3 \\ 1 \end{bmatrix}\right) + g\left(\begin{bmatrix} 1 \\ 4 \\ 5 \end{bmatrix}\right)$.

Hint: Remember h is defined by $h(x) = \begin{bmatrix} x \\ x \\ x \\ 4x \end{bmatrix}$, so $h(5 + 2) = h(7) = \begin{bmatrix} 7 \\ 7 \\ 7 \\ 28 \end{bmatrix}$.

What is h(5 + 2)?

Hint: Remember h is defined by $h(x) = \begin{bmatrix} x \\ x \\ x \\ 4x \end{bmatrix}$, so $h(5) + h(2) = \begin{bmatrix} 5 \\ 5 \\ 5 \\ 20 \end{bmatrix} + \begin{bmatrix} 2 \\ 2 \\ 2 \\ 8 \end{bmatrix} = \begin{bmatrix} 7 \\ 7 \\ 7 \\ 28 \end{bmatrix}$.

What is h(5) + h(2)?

Is h(5 + 2) = h(5) + h(2)?

(a) Yes
(b) No

Great! So h has a chance of being linear, since it is respecting vector addition in this case. What about g?

Hint: Remember g is defined by $g\left(\begin{bmatrix} x \\ y \\ z \end{bmatrix}\right) = \begin{bmatrix} x \\ xy \end{bmatrix}$, so $g\left(\begin{bmatrix} 2 \\ 3 \\ 1 \end{bmatrix} + \begin{bmatrix} 1 \\ 4 \\ 5 \end{bmatrix}\right) = g\left(\begin{bmatrix} 3 \\ 7 \\ 6 \end{bmatrix}\right) = \begin{bmatrix} 3 \\ 3(7) \end{bmatrix} = \begin{bmatrix} 3 \\ 21 \end{bmatrix}$.

What is $g\left(\begin{bmatrix} 2 \\ 3 \\ 1 \end{bmatrix} + \begin{bmatrix} 1 \\ 4 \\ 5 \end{bmatrix}\right)$?

Hint: Remember g is defined by $g\left(\begin{bmatrix} x \\ y \\ z \end{bmatrix}\right) = \begin{bmatrix} x \\ xy \end{bmatrix}$, so

$$g\left(\begin{bmatrix} 2 \\ 3 \\ 1 \end{bmatrix}\right) + g\left(\begin{bmatrix} 1 \\ 4 \\ 5 \end{bmatrix}\right) = \begin{bmatrix} 2 \\ 2(3) \end{bmatrix} + \begin{bmatrix} 1 \\ 1(4) \end{bmatrix} = \begin{bmatrix} 2 \\ 6 \end{bmatrix} + \begin{bmatrix} 1 \\ 4 \end{bmatrix} = \begin{bmatrix} 3 \\ 10 \end{bmatrix}$$

What is $g\left(\begin{bmatrix} 2 \\ 3 \\ 1 \end{bmatrix}\right) + g\left(\begin{bmatrix} 1 \\ 4 \\ 5 \end{bmatrix}\right)$?

Is $g\left(\begin{bmatrix} 2 \\ 3 \\ 1 \end{bmatrix} + \begin{bmatrix} 1 \\ 4 \\ 5 \end{bmatrix}\right) = g\left(\begin{bmatrix} 2 \\ 3 \\ 1 \end{bmatrix}\right) + g\left(\begin{bmatrix} 1 \\ 4 \\ 5 \end{bmatrix}\right)$?

(a) Yes
(b) No

Great! So g is not linear: by looking at this particular example, we can see that g does not always respect vector addition. So g is not linear.

Since we know one of the two functions is linear, we can already answer the question: the answer is h. To be thorough, let's check that h really is linear.

First we check that h really does respect scalar multiplication. Let $a \in R$ be an arbitrary scalar and $x \in R$ be an arbitrary vector. Then

$$h(ax) = \begin{bmatrix} ax \\ ax \\ ax \\ 4ax \end{bmatrix} = a\begin{bmatrix} x \\ x \\ x \\ 4x \end{bmatrix} = ah(x)$$

Now we check that h really does respect vector addition. Let x and y be arbitrary vectors in $R^1$. Then

$$h(x + y) = \begin{bmatrix} x + y \\ x + y \\ x + y \\ 4(x + y) \end{bmatrix} = \begin{bmatrix} x + y \\ x + y \\ x + y \\ 4x + 4y \end{bmatrix} = \begin{bmatrix} x \\ x \\ x \\ 4x \end{bmatrix} + \begin{bmatrix} y \\ y \\ y \\ 4y \end{bmatrix} = h(x) + h(y)$$

This proves that h is linear!

And finally, which of the following functions are linear?

(a) $G : R^4 \to R^3$ defined by $G\left(\begin{bmatrix} x \\ y \\ z \\ t \end{bmatrix}\right) = \begin{bmatrix} e^{x+y} \\ x + z \\ \sin(x + t) \end{bmatrix}$

(b) $A : R^2 \to R^2$ defined by $A\left(\begin{bmatrix} x \\ y \end{bmatrix}\right) = \begin{bmatrix} 0 \\ 0 \end{bmatrix}$

Solution

Hint: For a function to be linear, it must respect scalar multiplication. Let's see how $A\left(2\begin{bmatrix} 2 \\ 3 \end{bmatrix}\right)$ compares to $2A\left(\begin{bmatrix} 2 \\ 3 \end{bmatrix}\right)$, and also how $G\left(2\begin{bmatrix} 1 \\ 2 \\ 3 \\ 4 \end{bmatrix}\right)$ compares to $2G\left(\begin{bmatrix} 1 \\ 2 \\ 3 \\ 4 \end{bmatrix}\right)$.

Hint: Remember A is defined by $A\left(\begin{bmatrix} x \\ y \end{bmatrix}\right) = \begin{bmatrix} 0 \\ 0 \end{bmatrix}$, so $A\left(2\begin{bmatrix} 2 \\ 3 \end{bmatrix}\right) = A\left(\begin{bmatrix} 4 \\ 6 \end{bmatrix}\right) = \begin{bmatrix} 0 \\ 0 \end{bmatrix}$.

What is $A\left(2\begin{bmatrix} 2 \\ 3 \end{bmatrix}\right)$?

Hint: Remember A is defined by $A\left(\begin{bmatrix} x \\ y \end{bmatrix}\right) = \begin{bmatrix} 0 \\ 0 \end{bmatrix}$, so $2A\left(\begin{bmatrix} 2 \\ 3 \end{bmatrix}\right) = 2\begin{bmatrix} 0 \\ 0 \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \end{bmatrix}$.

What is $2A\left(\begin{bmatrix} 2 \\ 3 \end{bmatrix}\right)$?

Is $A\left(2\begin{bmatrix} 2 \\ 3 \end{bmatrix}\right) = 2A\left(\begin{bmatrix} 2 \\ 3 \end{bmatrix}\right)$?

(a) Yes
(b) No

Great! So A has a chance of being linear, since it is respecting scalar multiplication in this case. What about G?

Hint: Remember G is defined by $G\left(\begin{bmatrix} x \\ y \\ z \\ t \end{bmatrix}\right) = \begin{bmatrix} e^{x+y} \\ x + z \\ \sin(x + t) \end{bmatrix}$, so

$$G\left(2\begin{bmatrix} 1 \\ 2 \\ 3 \\ 4 \end{bmatrix}\right) = G\left(\begin{bmatrix} 2 \\ 4 \\ 6 \\ 8 \end{bmatrix}\right) = \begin{bmatrix} e^{2+4} \\ 2 + 6 \\ \sin(2 + 8) \end{bmatrix} = \begin{bmatrix} e^6 \\ 8 \\ \sin(10) \end{bmatrix}$$

What is $G\left(2\begin{bmatrix} 1 \\ 2 \\ 3 \\ 4 \end{bmatrix}\right)$?

Hint: Remember G is defined by $G\left(\begin{bmatrix} x \\ y \\ z \\ t \end{bmatrix}\right) = \begin{bmatrix} e^{x+y} \\ x + z \\ \sin(x + t) \end{bmatrix}$, so

$$2G\left(\begin{bmatrix} 1 \\ 2 \\ 3 \\ 4 \end{bmatrix}\right) = 2\begin{bmatrix} e^{1+2} \\ 1 + 3 \\ \sin(1 + 4) \end{bmatrix} = 2\begin{bmatrix} e^3 \\ 4 \\ \sin(5) \end{bmatrix} = \begin{bmatrix} 2e^3 \\ 8 \\ 2\sin(5) \end{bmatrix}$$

What is $2G\left(\begin{bmatrix} 1 \\ 2 \\ 3 \\ 4 \end{bmatrix}\right)$?

Is $G\left(2\begin{bmatrix} 1 \\ 2 \\ 3 \\ 4 \end{bmatrix}\right) = 2G\left(\begin{bmatrix} 1 \\ 2 \\ 3 \\ 4 \end{bmatrix}\right)$?

(a) Yes
(b) No

Great! So G is not linear: by looking at this particular example, we can see that G does not always respect scalar multiplication. So G is not linear.

Since we know one of the two functions is linear, we can already answer the question: the answer is A. To be thorough, let's check that A really is linear.

First we check that A really does respect scalar multiplication. Let $a \in R$ be an arbitrary scalar and $\begin{bmatrix} x \\ y \end{bmatrix} \in R^2$ be an arbitrary vector. Then

$$A\left(a\begin{bmatrix} x \\ y \end{bmatrix}\right) = A\left(\begin{bmatrix} ax \\ ay \end{bmatrix}\right) = \begin{bmatrix} 0 \\ 0 \end{bmatrix} = a\begin{bmatrix} 0 \\ 0 \end{bmatrix} = aA\left(\begin{bmatrix} x \\ y \end{bmatrix}\right)$$

Now we check that A really does respect vector addition. Let $\begin{bmatrix} x_1 \\ y_1 \end{bmatrix}$ and $\begin{bmatrix} x_2 \\ y_2 \end{bmatrix}$ be arbitrary vectors in $R^2$. Then

$$A\left(\begin{bmatrix} x_1 \\ y_1 \end{bmatrix} + \begin{bmatrix} x_2 \\ y_2 \end{bmatrix}\right) = A\left(\begin{bmatrix} x_1 + x_2 \\ y_1 + y_2 \end{bmatrix}\right) = \begin{bmatrix} 0 \\ 0 \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \end{bmatrix} + \begin{bmatrix} 0 \\ 0 \end{bmatrix} = A\left(\begin{bmatrix} x_1 \\ y_1 \end{bmatrix}\right) + A\left(\begin{bmatrix} x_2 \\ y_2 \end{bmatrix}\right)$$

This proves that A is linear!

Warning 7 Note that the function which sends every vector to the zero vector

is linear.

Question 8 Let $L : R^3 \to R^2$ be a linear function. Suppose

$$L\left(\begin{bmatrix} 1 \\ 0 \\ 0 \end{bmatrix}\right) = \begin{bmatrix} 3 \\ 4 \end{bmatrix}, \quad L\left(\begin{bmatrix} 0 \\ 1 \\ 0 \end{bmatrix}\right) = \begin{bmatrix} -2 \\ 0 \end{bmatrix}, \quad \text{and} \quad L\left(\begin{bmatrix} 0 \\ 0 \\ 1 \end{bmatrix}\right) = \begin{bmatrix} 1 \\ -1 \end{bmatrix}.$$

Let $\vec{v} = L\left(\begin{bmatrix} 4 \\ -1 \\ 2 \end{bmatrix}\right)$. What is $\vec{v}$?

Solution

Hint: The only thing we know about linear maps is that they respect scalar multiplication and vector addition. So we need to somehow rewrite the vector $\begin{bmatrix} 4 \\ -1 \\ 2 \end{bmatrix}$ in terms of the vectors $\begin{bmatrix} 1 \\ 0 \\ 0 \end{bmatrix}$, $\begin{bmatrix} 0 \\ 1 \\ 0 \end{bmatrix}$ and $\begin{bmatrix} 0 \\ 0 \\ 1 \end{bmatrix}$, scalar multiplication, and vector addition, to exploit what we know about L.

Question 9 Can you rewrite $\begin{bmatrix} 4 \\ -1 \\ 2 \end{bmatrix}$ in the form $a\begin{bmatrix} 1 \\ 0 \\ 0 \end{bmatrix} + b\begin{bmatrix} 0 \\ 1 \\ 0 \end{bmatrix} + c\begin{bmatrix} 0 \\ 0 \\ 1 \end{bmatrix}$?

Solution

Hint: Observe that $\begin{bmatrix} 4 \\ -1 \\ 2 \end{bmatrix} = 4\begin{bmatrix} 1 \\ 0 \\ 0 \end{bmatrix} + (-1)\begin{bmatrix} 0 \\ 1 \\ 0 \end{bmatrix} + 2\begin{bmatrix} 0 \\ 0 \\ 1 \end{bmatrix}$.

Hint: Consider the coefficient on $\begin{bmatrix} 1 \\ 0 \\ 0 \end{bmatrix}$: in this case, a = 4. Moreover, b = −1. Finally, c = 2.

a = 4, b = −1, c = 2

Now using the linearity of L, we can see that

$$L\left(\begin{bmatrix} 4 \\ -1 \\ 2 \end{bmatrix}\right) = L\left(4\begin{bmatrix} 1 \\ 0 \\ 0 \end{bmatrix} + (-1)\begin{bmatrix} 0 \\ 1 \\ 0 \end{bmatrix} + 2\begin{bmatrix} 0 \\ 0 \\ 1 \end{bmatrix}\right) = 4L\left(\begin{bmatrix} 1 \\ 0 \\ 0 \end{bmatrix}\right) + (-1)L\left(\begin{bmatrix} 0 \\ 1 \\ 0 \end{bmatrix}\right) + 2L\left(\begin{bmatrix} 0 \\ 0 \\ 1 \end{bmatrix}\right)$$

Can you finish off the computation?

Hint:

$$L\left(\begin{bmatrix} 4 \\ -1 \\ 2 \end{bmatrix}\right) = 4\begin{bmatrix} 3 \\ 4 \end{bmatrix} + (-1)\begin{bmatrix} -2 \\ 0 \end{bmatrix} + 2\begin{bmatrix} 1 \\ -1 \end{bmatrix} = \begin{bmatrix} 12 \\ 16 \end{bmatrix} + \begin{bmatrix} 2 \\ 0 \end{bmatrix} + \begin{bmatrix} 2 \\ -2 \end{bmatrix} = \begin{bmatrix} 16 \\ 14 \end{bmatrix}$$

Let $\vec{v} = L\left(\begin{bmatrix} 4 \\ -1 \\ 2 \end{bmatrix}\right)$. What is $\vec{v}$?

Can you generalize this?

Solution

Hint: The only thing we know about linear maps is that they respect scalar multiplication and vector addition. So we need to somehow rewrite the vector $\begin{bmatrix} x \\ y \\ z \end{bmatrix}$ in terms of the vectors $\begin{bmatrix} 1 \\ 0 \\ 0 \end{bmatrix}$, $\begin{bmatrix} 0 \\ 1 \\ 0 \end{bmatrix}$ and $\begin{bmatrix} 0 \\ 0 \\ 1 \end{bmatrix}$, scalar multiplication, and vector addition, to exploit what we know about L.

Question 10 Can you rewrite $\begin{bmatrix} x \\ y \\ z \end{bmatrix}$ in the form $a\begin{bmatrix} 1 \\ 0 \\ 0 \end{bmatrix} + b\begin{bmatrix} 0 \\ 1 \\ 0 \end{bmatrix} + c\begin{bmatrix} 0 \\ 0 \\ 1 \end{bmatrix}$?

Solution

Hint: $\begin{bmatrix} x \\ y \\ z \end{bmatrix} = x\begin{bmatrix} 1 \\ 0 \\ 0 \end{bmatrix} + y\begin{bmatrix} 0 \\ 1 \\ 0 \end{bmatrix} + z\begin{bmatrix} 0 \\ 0 \\ 1 \end{bmatrix}$

a = x, b = y, c = z

Hint: Now using the linearity of L, we can see that

$$L\left(\begin{bmatrix} x \\ y \\ z \end{bmatrix}\right) = L\left(x\begin{bmatrix} 1 \\ 0 \\ 0 \end{bmatrix} + y\begin{bmatrix} 0 \\ 1 \\ 0 \end{bmatrix} + z\begin{bmatrix} 0 \\ 0 \\ 1 \end{bmatrix}\right) = xL\left(\begin{bmatrix} 1 \\ 0 \\ 0 \end{bmatrix}\right) + yL\left(\begin{bmatrix} 0 \\ 1 \\ 0 \end{bmatrix}\right) + zL\left(\begin{bmatrix} 0 \\ 0 \\ 1 \end{bmatrix}\right)$$

Can you finish off the computation?

Hint:

$$L\left(\begin{bmatrix} x \\ y \\ z \end{bmatrix}\right) = x\begin{bmatrix} 3 \\ 4 \end{bmatrix} + y\begin{bmatrix} -2 \\ 0 \end{bmatrix} + z\begin{bmatrix} 1 \\ -1 \end{bmatrix} = \begin{bmatrix} 3x \\ 4x \end{bmatrix} + \begin{bmatrix} -2y \\ 0 \end{bmatrix} + \begin{bmatrix} z \\ -z \end{bmatrix} = \begin{bmatrix} 3x - 2y + z \\ 4x - z \end{bmatrix}$$

Let $\vec{v} = L\left(\begin{bmatrix} x \\ y \\ z \end{bmatrix}\right)$. What is $\vec{v}$?

As you have already discovered, a linear map $L : R^n \to R^m$ is fully determined by its action on the "standard basis vectors"

$$\vec{e}_1 = \begin{bmatrix} 1 \\ 0 \\ 0 \\ \vdots \\ 0 \end{bmatrix}, \quad \vec{e}_2 = \begin{bmatrix} 0 \\ 1 \\ 0 \\ \vdots \\ 0 \end{bmatrix}, \quad \text{and so on, until we reach} \quad \vec{e}_n = \begin{bmatrix} 0 \\ 0 \\ \vdots \\ 0 \\ 1 \end{bmatrix}.$$

Argue convincingly that if $L : R^n \to R^m$ is a linear map and you know $L(\vec{e}_i)$ for $i = 1, 2, 3, \ldots, n$, then you could figure out $L(\vec{v})$ for any $\vec{v} \in R^n$.

I want to determine what L does to any vector $\vec{v} = \begin{bmatrix} x_1 \\ x_2 \\ x_3 \\ \vdots \\ x_n \end{bmatrix} \in R^n$. I can rewrite $\vec{v}$ as $x_1\vec{e}_1 + x_2\vec{e}_2 + x_3\vec{e}_3 + \cdots + x_n\vec{e}_n$. By the linearity of L,

$$L(\vec{v}) = x_1 L(\vec{e}_1) + x_2 L(\vec{e}_2) + x_3 L(\vec{e}_3) + \cdots + x_n L(\vec{e}_n).$$

Since I already know the value of $L(\vec{e}_i)$ for all $i = 1, 2, 3, \ldots, n$, this allows me to compute $L(\vec{v})$. So L is completely determined once I know what it does to each of the standard basis vectors. (YouTube link: http://www.youtube.com/watch?v=8BFsz1FCdxM)
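The argument above is constructive, so it can be sketched as code. This is our own illustration (not from the text): given the list of vectors $L(\vec{e}_1), \ldots, L(\vec{e}_n)$, it computes $L(\vec{v})$ for any $\vec{v}$ by forming $x_1 L(\vec{e}_1) + \cdots + x_n L(\vec{e}_n)$.

```python
# Reconstruct a linear map from its values on the standard basis vectors:
# columns[j] is the vector L(e_{j+1}) in R^m, and
# L(v) = x_1 * L(e_1) + x_2 * L(e_2) + ... + x_n * L(e_n).

def linear_map_from_columns(columns):
    def L(v):
        m = len(columns[0])
        result = [0] * m
        for x_j, col in zip(v, columns):
            for i in range(m):
                result[i] += x_j * col[i]
        return result
    return L

# Question 8's map: L(e1) = [3, 4], L(e2) = [-2, 0], L(e3) = [1, -1]
L = linear_map_from_columns([[3, 4], [-2, 0], [1, -1]])
print(L([4, -1, 2]))  # [16, 14], matching the computation in the hint
```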

13 Matrices

Matrices are a way to represent linear maps.

To make writing a linear map a little less cumbersome, we will develop a compact notation for linear maps using our previous observation that a linear map is determined by its action on the standard basis vectors.

Definition 1 An $m \times n$ matrix is an array of numbers which has $m$ rows and $n$ columns. The numbers in a matrix are called entries.

When $A$ is a matrix, we write $A = (a_{i,j})$, meaning that $a_{i,j}$ is the entry in the $i$th row and $j$th column of the matrix. Note: we start counting with 1, not 0, so the upper lefthand entry of the matrix is $a_{1,1}$.

Question 2 The matrix $A = \begin{pmatrix} 1 & -1 \\ 2 & 4 \\ 3 & -5 \end{pmatrix}$ is an $n \times m$ matrix.

Solution
Hint: Note that this is $n \times m$, whereas the definition above used $m \times n$.
Hint: $n$ is the number of rows, and $m$ is the number of columns.
Hint: $n = 3$ and $m = 2$.

In this case, $n$ is 3.

Solution And $m$ is 2.

Remember, we write $a_{i,j}$ for the entry in the $i$th row and $j$th column of the matrix.

Solution
Hint: $a_{3,2}$ is the entry in the 3rd row and the 2nd column.
Hint: $a_{3,2} = -5$

Therefore $a_{3,2}$ is $-5$.

Next, suppose the $3 \times 4$ matrix $B$ has $b_{i,j} = i + j$.

Solution
Hint: $b_{1,2} = 1 + 2 = 3$

According to this rule, $b_{1,2}$ is 3, so the entry in the first row and second column of this matrix should be 3.

Hint: $B = \begin{pmatrix} 2 & 3 & 4 & 5 \\ 3 & 4 & 5 & 6 \\ 4 & 5 & 6 & 7 \end{pmatrix}$

What is $B$?
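A rule like $b_{i,j} = i + j$ translates directly into a nested list comprehension. A small sketch, using the row-as-inner-list convention the Python chapter adopts later (note the off-by-one: the math indices start at 1, Python's at 0):

```python
# Build the 3-by-4 matrix B with b_{i,j} = i + j, storing each row as a list.
# Math indices start at 1 while Python's start at 0, hence the +1 shifts.
B = [[(i + 1) + (j + 1) for j in range(4)] for i in range(3)]
print(B)  # → [[2, 3, 4, 5], [3, 4, 5, 6], [4, 5, 6, 7]]
```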

Definition 4 To each linear map $L : \mathbb{R}^n \to \mathbb{R}^m$ we associate an $m \times n$ matrix $A_L$, called the matrix of the linear map with respect to the standard coordinates. It is defined by setting $a_{i,j}$ to be the $i$th component of $L(e_j)$. In other words, the $j$th column of the matrix $A_L$ is the vector $L(e_j)$.

Going the other way, we likewise associate to each $m \times n$ matrix $M$ a linear map $L_M : \mathbb{R}^n \to \mathbb{R}^m$ by requiring that $L_M(e_j)$ be the $j$th column of the matrix $M$.

Question 5 The linear map $L : \mathbb{R}^2 \to \mathbb{R}^3$ satisfies $L\begin{pmatrix}1\\0\end{pmatrix} = \begin{pmatrix}3\\-5\\2\end{pmatrix}$ and $L\begin{pmatrix}0\\1\end{pmatrix} = \begin{pmatrix}-1\\1\\1\end{pmatrix}$. What is the matrix of $L$?

Solution
Hint: Remember that, by definition, the first column of this matrix should be $L\begin{pmatrix}1\\0\end{pmatrix}$ and the second column should be $L\begin{pmatrix}0\\1\end{pmatrix}$.
Hint: The matrix of $L$ is $\begin{pmatrix}3 & -1\\-5 & 1\\2 & 1\end{pmatrix}$

Let's do another example.

Question 6 Suppose $L$ is a linear map represented by the matrix $A = \begin{pmatrix}1 & -1\\2 & 4\\3 & -5\end{pmatrix}$.

Solution
Hint: $A$ should have one column for each basis vector of the domain.
Hint: $A$ has 2 columns, so the dimension of the domain is 2.

The dimension of the domain of $L$ is 2.

Solution
Hint: Each column of $A$ is the image of a basis vector under the action of $L$.
Hint: Since the columns are of length 3, that means $L$ is spitting out vectors of length 3.
Hint: The codomain of $L$ is $\mathbb{R}^3$, which is 3-dimensional.

The dimension of the codomain of $L$ is 3.

Suppose $v = L\begin{pmatrix}0\\1\end{pmatrix}$. What is $v$?

Solution
Hint: Remember that, by definition, the $i$th column of $A$ is $L(e_i)$.
Hint: So, by definition, $L\begin{pmatrix}0\\1\end{pmatrix}$ is the second column of the matrix $A$.
Hint: So $L\begin{pmatrix}0\\1\end{pmatrix} = \begin{pmatrix}-1\\4\\-5\end{pmatrix}$

Suppose $w = L\begin{pmatrix}4\\5\end{pmatrix}$. What is $w$?

Solution
Hint: By definition of the matrix associated to a linear map, we know that $L\begin{pmatrix}1\\0\end{pmatrix} = \begin{pmatrix}1\\2\\3\end{pmatrix}$ and $L\begin{pmatrix}0\\1\end{pmatrix} = \begin{pmatrix}-1\\4\\-5\end{pmatrix}$.
Hint: Can you rewrite $\begin{pmatrix}4\\5\end{pmatrix}$ in terms of $\begin{pmatrix}1\\0\end{pmatrix}$ and $\begin{pmatrix}0\\1\end{pmatrix}$ so that you can use the linearity of $L$ to compute $L\begin{pmatrix}4\\5\end{pmatrix}$?
Hint: $L\begin{pmatrix}4\\5\end{pmatrix} = L\left(4\begin{pmatrix}1\\0\end{pmatrix} + 5\begin{pmatrix}0\\1\end{pmatrix}\right)$
Hint:
$$L\begin{pmatrix}4\\5\end{pmatrix} = L\left(4\begin{pmatrix}1\\0\end{pmatrix} + 5\begin{pmatrix}0\\1\end{pmatrix}\right) = 4L\begin{pmatrix}1\\0\end{pmatrix} + 5L\begin{pmatrix}0\\1\end{pmatrix} = 4\begin{pmatrix}1\\2\\3\end{pmatrix} + 5\begin{pmatrix}-1\\4\\-5\end{pmatrix} = \begin{pmatrix}4\\8\\12\end{pmatrix} + \begin{pmatrix}-5\\20\\-25\end{pmatrix} = \begin{pmatrix}-1\\28\\-13\end{pmatrix}$$


What is $L\begin{pmatrix}x\\y\end{pmatrix}$?

Solution
Hint: By definition of the matrix associated to a linear map, we know that $L\begin{pmatrix}1\\0\end{pmatrix} = \begin{pmatrix}1\\2\\3\end{pmatrix}$ and $L\begin{pmatrix}0\\1\end{pmatrix} = \begin{pmatrix}-1\\4\\-5\end{pmatrix}$.
Hint: Can you rewrite $\begin{pmatrix}x\\y\end{pmatrix}$ in terms of $\begin{pmatrix}1\\0\end{pmatrix}$ and $\begin{pmatrix}0\\1\end{pmatrix}$ so that you can use the linearity of $L$ to compute $L\begin{pmatrix}x\\y\end{pmatrix}$?
Hint: $L\begin{pmatrix}x\\y\end{pmatrix} = L\left(x\begin{pmatrix}1\\0\end{pmatrix} + y\begin{pmatrix}0\\1\end{pmatrix}\right)$
Hint:
$$L\begin{pmatrix}x\\y\end{pmatrix} = L\left(x\begin{pmatrix}1\\0\end{pmatrix} + y\begin{pmatrix}0\\1\end{pmatrix}\right) = xL\begin{pmatrix}1\\0\end{pmatrix} + yL\begin{pmatrix}0\\1\end{pmatrix} = x\begin{pmatrix}1\\2\\3\end{pmatrix} + y\begin{pmatrix}-1\\4\\-5\end{pmatrix} = \begin{pmatrix}x\\2x\\3x\end{pmatrix} + \begin{pmatrix}-y\\4y\\-5y\end{pmatrix} = \begin{pmatrix}x-y\\2x+4y\\3x-5y\end{pmatrix}$$

As an antidote to the abstraction, let's take a look at a simplistic "real world" example.

Question 7 In the local barter economy, there is an exchange where you can

• trade 1 spoon for 2 apples and 1 orange,
• trade 1 knife for 2 oranges, and
• trade 1 fork for 3 apples and 4 oranges.

Model this as a linear map $L : \mathbb{R}^3 \to \mathbb{R}^2$, where the coordinates on $\mathbb{R}^3$ are $\begin{pmatrix}\text{spoons}\\\text{knives}\\\text{forks}\end{pmatrix}$ and the coordinates on $\mathbb{R}^2$ are $\begin{pmatrix}\text{apples}\\\text{oranges}\end{pmatrix}$.


Solution
Hint: Remember the matrix of a linear map is defined by the fact that the $k$th column of the matrix is the image of the $k$th standard basis vector.
Hint: $\begin{pmatrix}1\\0\\0\end{pmatrix}$ represents one spoon in the domain. Its image under this linear map is 2 apples and 1 orange, which is represented by the vector $\begin{pmatrix}2\\1\end{pmatrix}$ in the codomain. So the first column of the matrix should be $\begin{pmatrix}2\\1\end{pmatrix}$.
Hint: The full matrix is $\begin{pmatrix}2 & 0 & 3\\1 & 2 & 4\end{pmatrix}$

What is the matrix of the linear map $L$?

Solution
Hint:
$$\begin{aligned}
L\begin{pmatrix}3\\0\\4\end{pmatrix} &= L\left(3\begin{pmatrix}1\\0\\0\end{pmatrix} + 4\begin{pmatrix}0\\0\\1\end{pmatrix}\right) &(1)\\
&= 3L\begin{pmatrix}1\\0\\0\end{pmatrix} + 4L\begin{pmatrix}0\\0\\1\end{pmatrix} &(2)\\
&= 3\begin{pmatrix}2\\1\end{pmatrix} + 4\begin{pmatrix}3\\4\end{pmatrix} &(3)\\
&= \begin{pmatrix}6\\3\end{pmatrix} + \begin{pmatrix}12\\16\end{pmatrix} &(4)\\
&= \begin{pmatrix}18\\19\end{pmatrix} &(5)
\end{aligned}$$
So you would be able to get 18 apples and 19 oranges.
Hint: Now the "5 year old" solution: if you have 3 spoons, 0 knives, and 4 forks, and you traded them all in for fruit, how many apples would you have?
Hint: 3 spoons would get you 6 apples, and 4 forks get you 12 apples, so you would have a total of 18 apples.

The first ("apples") entry of $L\begin{pmatrix}3\\0\\4\end{pmatrix}$ is 18.

Try to answer this question both by applying the matrix to the vector, and also as a 5 year old would solve it.
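The barter computation above is easy to spot-check numerically. A minimal sketch, applying the matrix $\begin{pmatrix}2 & 0 & 3\\1 & 2 & 4\end{pmatrix}$ to the goods vector $(3, 0, 4)$:

```python
# Columns of the matrix are the images of spoon, knife, fork (the standard
# basis vectors); each row tallies one kind of fruit.
matrix = [[2, 0, 3],   # apples per spoon, knife, fork
          [1, 2, 4]]   # oranges per spoon, knife, fork
goods = [3, 0, 4]      # spoons, knives, forks

fruit = [sum(row[k] * goods[k] for k in range(3)) for row in matrix]
print(fruit)  # → [18, 19]: 18 apples and 19 oranges
```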


Prove the following statement: if $S : \mathbb{R}^n \to \mathbb{R}^m$ and $T : \mathbb{R}^n \to \mathbb{R}^m$ are both linear maps, then the map $(S+T) : \mathbb{R}^n \to \mathbb{R}^m$ defined by $(S+T)(v) = S(v) + T(v)$ is also linear.

We need to check that $(S+T)$ respects both scalar multiplication and vector addition.

Scalar multiplication: Choose an arbitrary scalar $c \in \mathbb{R}$ and an arbitrary vector $v \in \mathbb{R}^n$. Then
$$\begin{aligned}
(S+T)(cv) &= S(cv) + T(cv) &&\text{by definition of } S+T\\
&= cS(v) + cT(v) &&\text{by the linearity of } S \text{ and } T\\
&= c\,(S(v) + T(v)) &&\text{by the distributivity of scalar multiplication over addition in } \mathbb{R}^m\\
&= c(S+T)(v) &&\text{by definition of } S+T
\end{aligned}$$

Vector addition: Choose two arbitrary vectors $v$ and $w$ in $\mathbb{R}^n$. Then
$$\begin{aligned}
(S+T)(v+w) &= S(v+w) + T(v+w) &&\text{by definition of } S+T\\
&= S(v) + S(w) + T(v) + T(w) &&\text{by the linearity of } S \text{ and } T\\
&= S(v) + T(v) + S(w) + T(w) &&\text{by the commutativity of vector addition in } \mathbb{R}^m\\
&= (S+T)(v) + (S+T)(w) &&\text{by the definition of } S+T.
\end{aligned}$$

Prove that if $T : \mathbb{R}^n \to \mathbb{R}^m$ is a linear map and $c \in \mathbb{R}$ is a scalar, then the map $cT : \mathbb{R}^n \to \mathbb{R}^m$, defined by $(cT)(v) = cT(v)$, is also a linear map.

We need to check that $cT$ respects both scalar multiplication and vector addition.

Scalar multiplication: Choose an arbitrary scalar $a \in \mathbb{R}$ and an arbitrary vector $v \in \mathbb{R}^n$. Then
$$(cT)(av) = cT(av) = acT(v) = a(cT)(v)$$

Vector addition: Choose two arbitrary vectors $v$ and $w$ in $\mathbb{R}^n$. Then
$$(cT)(v+w) = cT(v+w) = c\,(T(v) + T(w)) = cT(v) + cT(w) = (cT)(v) + (cT)(w)$$

Observation 8 The last two exercises show that we have a nice way to both add linear maps and multiply linear maps by scalars. So linear maps themselves "feel" a bit like vectors. You do not have to worry about this now, but we will see that the linear maps from $\mathbb{R}^n$ to $\mathbb{R}^m$ form an "abstract vector space." Much of the power of linear algebra is that we can apply linear algebra to spaces of linear maps!


14 Composition

The composition of linear maps can be computed with matrices.

Prove that if $S : \mathbb{R}^n \to \mathbb{R}^m$ is a linear map, and $T : \mathbb{R}^m \to \mathbb{R}^k$ is a linear map, then the composite function $T \circ S : \mathbb{R}^n \to \mathbb{R}^k$ is also linear.

We need to show that $T \circ S$ respects scalar multiplication and vector addition:

Scalar multiplication: For every scalar $a \in \mathbb{R}$ and every vector $v \in \mathbb{R}^n$, we have:
$$\begin{aligned}
(T \circ S)(av) &= T(S(av))\\
&= T(aS(v)) &&\text{because } S \text{ respects scalar multiplication}\\
&= aT(S(v)) &&\text{because } T \text{ respects scalar multiplication}\\
&= a(T \circ S)(v)
\end{aligned}$$

Vector addition: For every two vectors $v, w \in \mathbb{R}^n$, we have:
$$\begin{aligned}
(T \circ S)(v+w) &= T(S(v+w))\\
&= T(S(v) + S(w)) &&\text{because } S \text{ respects vector addition}\\
&= T(S(v)) + T(S(w)) &&\text{because } T \text{ respects vector addition}\\
&= (T \circ S)(v) + (T \circ S)(w)
\end{aligned}$$

Question 1 Suppose the matrix of $S$ is $M_S = \begin{pmatrix}2 & 0 & -1\\-1 & 1 & 1\end{pmatrix}$ and the matrix of $T$ is $M_T = \begin{pmatrix}-1 & -1\\0 & 2\\-1 & 1\end{pmatrix}$.

Solution
Hint: Remember that the matrix for $S \circ T$ will have columns given by $(S \circ T)\begin{pmatrix}1\\0\end{pmatrix}$ and $(S \circ T)\begin{pmatrix}0\\1\end{pmatrix}$.
Hint:
$$\begin{aligned}
(S \circ T)\begin{pmatrix}1\\0\end{pmatrix} &= S\left(T\begin{pmatrix}1\\0\end{pmatrix}\right)\\
&= S\begin{pmatrix}-1\\0\\-1\end{pmatrix} &&\text{because by definition, } T\begin{pmatrix}1\\0\end{pmatrix} \text{ is the first column of the matrix of } T\\
&= -1\,S\begin{pmatrix}1\\0\\0\end{pmatrix} + -1\,S\begin{pmatrix}0\\0\\1\end{pmatrix} &&\text{by the linearity of } S\\
&= -1\begin{pmatrix}2\\-1\end{pmatrix} + -1\begin{pmatrix}-1\\1\end{pmatrix} &&\text{because ???}\\
&= \begin{pmatrix}-1\\0\end{pmatrix}
\end{aligned}$$

What is $(S \circ T)\begin{pmatrix}1\\0\end{pmatrix}$?

Question 3 Solution
Hint:
$$\begin{aligned}
(S \circ T)\begin{pmatrix}0\\1\end{pmatrix} &= S\left(T\begin{pmatrix}0\\1\end{pmatrix}\right)\\
&= S\begin{pmatrix}-1\\2\\1\end{pmatrix} &&\text{because by definition, } T\begin{pmatrix}0\\1\end{pmatrix} \text{ is the second column of the matrix of } T\\
&= -1\,S\begin{pmatrix}1\\0\\0\end{pmatrix} + 2\,S\begin{pmatrix}0\\1\\0\end{pmatrix} + S\begin{pmatrix}0\\0\\1\end{pmatrix} &&\text{by the linearity of } S\\
&= -1\begin{pmatrix}2\\-1\end{pmatrix} + 2\begin{pmatrix}0\\1\end{pmatrix} + \begin{pmatrix}-1\\1\end{pmatrix} &&\text{because ???}\\
&= \begin{pmatrix}-3\\4\end{pmatrix}
\end{aligned}$$

What is $(S \circ T)\begin{pmatrix}0\\1\end{pmatrix}$?

Hint: The matrix of $S \circ T$ is $\begin{pmatrix}-1 & -3\\0 & 4\end{pmatrix}$

What is the matrix of $S \circ T$?

Solution
Hint: Remember that the matrix for $T \circ S$ will have columns given by $(T \circ S)\begin{pmatrix}1\\0\\0\end{pmatrix}$, $(T \circ S)\begin{pmatrix}0\\1\\0\end{pmatrix}$, and $(T \circ S)\begin{pmatrix}0\\0\\1\end{pmatrix}$.
Hint:
$$\begin{aligned}
(T \circ S)\begin{pmatrix}1\\0\\0\end{pmatrix} &= T\left(S\begin{pmatrix}1\\0\\0\end{pmatrix}\right)\\
&= T\begin{pmatrix}2\\-1\end{pmatrix} &&\text{because by definition, } S\begin{pmatrix}1\\0\\0\end{pmatrix} \text{ is the first column of the matrix of } S\\
&= 2\,T\begin{pmatrix}1\\0\end{pmatrix} + -1\,T\begin{pmatrix}0\\1\end{pmatrix} &&\text{by the linearity of } T\\
&= 2\begin{pmatrix}-1\\0\\-1\end{pmatrix} + -1\begin{pmatrix}-1\\2\\1\end{pmatrix} &&\text{because ???}\\
&= \begin{pmatrix}-1\\-2\\-3\end{pmatrix}
\end{aligned}$$

What is $(T \circ S)\begin{pmatrix}1\\0\\0\end{pmatrix}$?

Question 5 Solution
Hint:
$$\begin{aligned}
(T \circ S)\begin{pmatrix}0\\1\\0\end{pmatrix} &= T\left(S\begin{pmatrix}0\\1\\0\end{pmatrix}\right)\\
&= T\begin{pmatrix}0\\1\end{pmatrix} &&\text{because by definition, } S\begin{pmatrix}0\\1\\0\end{pmatrix} \text{ is the second column of the matrix of } S\\
&= \begin{pmatrix}-1\\2\\1\end{pmatrix} &&\text{we got lucky: by definition } T\begin{pmatrix}0\\1\end{pmatrix} \text{ is the second column of the matrix of } T
\end{aligned}$$

What is $(T \circ S)\begin{pmatrix}0\\1\\0\end{pmatrix}$?

Question 6 Solution
Hint:
$$\begin{aligned}
(T \circ S)\begin{pmatrix}0\\0\\1\end{pmatrix} &= T\left(S\begin{pmatrix}0\\0\\1\end{pmatrix}\right)\\
&= T\begin{pmatrix}-1\\1\end{pmatrix} &&\text{because by definition, } S\begin{pmatrix}0\\0\\1\end{pmatrix} \text{ is the third column of the matrix of } S\\
&= -1\,T\begin{pmatrix}1\\0\end{pmatrix} + T\begin{pmatrix}0\\1\end{pmatrix} &&\text{by the linearity of } T\\
&= -1\begin{pmatrix}-1\\0\\-1\end{pmatrix} + \begin{pmatrix}-1\\2\\1\end{pmatrix} &&\text{because ???}\\
&= \begin{pmatrix}0\\2\\2\end{pmatrix}
\end{aligned}$$

What is $(T \circ S)\begin{pmatrix}0\\0\\1\end{pmatrix}$?

Hint: The matrix of $T \circ S$ is $\begin{pmatrix}-1 & -1 & 0\\-2 & 2 & 2\\-3 & 1 & 2\end{pmatrix}$

What is the matrix of $T \circ S$?


Definition 7 If $M$ is an $m \times n$ matrix and $N$ is a $k \times m$ matrix, then the product $NM$ of the matrices is defined as the matrix of the composition of the linear maps defined by $M$ and $N$. In other words, $NM$ is the matrix of $L_N \circ L_M$.

Warning 8 You may have seen another definition for matrix multiplication in the past. That definition could be seen as a shortcut for how to compute the product, but it is usually presented devoid of mathematical meaning. Hopefully our definition seems properly motivated: matrix multiplication is just what you do to compose linear maps. We suggest working out the problems here using our definition: you will develop your own efficient shortcuts in time.

You have already multiplied two matrices, even though you didn't know it, above. Take some time now to get a whole lot of practice. You do not need us to prompt you: invent your own matrices and try to multiply them, on paper. What condition is needed on the rows and columns of the two matrices for matrix multiplication to even make sense? You can check your work using a computer algebra system, like Sage¹, or you can use a free web-hosted app like Reshish². Use our definition, and think through it each time. Try to get faster and more efficient. Eventually you should be able to do this quite rapidly.

Question 9 Suppose $B = \begin{pmatrix}1 & 2\\3 & 4\end{pmatrix}$. Find a $2 \times 2$ matrix $A$ so that $AB \neq BA$. Play around! Can you find more than one?

Solution
Hint: There is no systematic way to answer this question: you just have to play around, and see what you discover!
Hint: $\begin{pmatrix}1 & 2\\3 & 4\end{pmatrix}\begin{pmatrix}1 & 0\\0 & 0\end{pmatrix} = \begin{pmatrix}1 & 0\\3 & 0\end{pmatrix}$

What is $\begin{pmatrix}1 & 2\\3 & 4\end{pmatrix}\begin{pmatrix}1 & 0\\0 & 0\end{pmatrix}$?

Hint: $\begin{pmatrix}1 & 0\\0 & 0\end{pmatrix}\begin{pmatrix}1 & 2\\3 & 4\end{pmatrix} = \begin{pmatrix}1 & 2\\0 & 0\end{pmatrix}$

What is $\begin{pmatrix}1 & 0\\0 & 0\end{pmatrix}\begin{pmatrix}1 & 2\\3 & 4\end{pmatrix}$?

A matrix that doesn't commute with $B$ is $A = \begin{pmatrix}1 & 0\\0 & 0\end{pmatrix}$.

1. http://www.sagemath.org/
2. http://matrix.reshish.com/


Question 12 Solution
Hint: Try some simple matrices. Maybe limit yourself to $2 \times 2$ matrices?
Hint: One simple linear map which would work is $L\begin{pmatrix}x\\y\end{pmatrix} = \begin{pmatrix}y\\0\end{pmatrix}$. Applying this twice to any vector would give you the zero vector. This linear map is great for cooking up counterexamples to all sorts of naive things you might think about matrices! See this MathOverflow answer³ (you will understand more and more of these terms as the course progresses).

Question 13 Hint: The matrix of $L$ is $\begin{pmatrix}0 & 1\\0 & 0\end{pmatrix}$

What is the matrix of the example linear map $L$?

Find $A \neq 0$ with $AA = 0$. (Note: such a matrix is called "nilpotent".)

Question 14 If $A = \begin{pmatrix}2 & 8\\3 & 12\end{pmatrix}$, find $v \neq 0$ with $Av = 0$.

Solution
Hint: Let $v = \begin{pmatrix}x\\y\end{pmatrix}$, and solve a system of equations.
Hint:
$$A(v) = 0 \qquad \begin{pmatrix}2 & 8\\3 & 12\end{pmatrix}\begin{pmatrix}x\\y\end{pmatrix} = \begin{pmatrix}0\\0\end{pmatrix} \qquad \begin{pmatrix}2x+8y\\3x+12y\end{pmatrix} = \begin{pmatrix}0\\0\end{pmatrix}$$
Hint: Both of these conditions ($2x + 8y = 0$ and $3x + 12y = 0$) are saying the same thing: $x = -4y$.
Hint: So $\begin{pmatrix}-4\\1\end{pmatrix}$ works, for example.
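A one-line numeric check of the claimed null vector, assuming the row-of-lists matrix representation used in the Python chapter:

```python
# With A = [[2, 8], [3, 12]] and v = (-4, 1), both entries of Av should vanish.
A = [[2, 8], [3, 12]]
v = [-4, 1]
Av = [sum(A[i][j] * v[j] for j in range(2)) for i in range(2)]
print(Av)  # → [0, 0]
```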

Question 15 If $A = \begin{pmatrix}1 & 3\\2 & 4\end{pmatrix}$, find $v$ with $Av = \begin{pmatrix}0\\8\end{pmatrix}$.

Solution

3. http://mathoverflow.net/questions/16829/what-are-your-favorite-instructional-counterexamples/16841#16841


Hint: Let $v = \begin{pmatrix}x\\y\end{pmatrix}$ and solve a system of equations.
Hint:
$$Av = \begin{pmatrix}0\\8\end{pmatrix} \qquad \begin{pmatrix}1 & 3\\2 & 4\end{pmatrix}\begin{pmatrix}x\\y\end{pmatrix} = \begin{pmatrix}0\\8\end{pmatrix} \qquad \begin{pmatrix}x+3y\\2x+4y\end{pmatrix} = \begin{pmatrix}0\\8\end{pmatrix}$$
Hint:
$$\begin{cases}x + 3y = 0\\2x + 4y = 8\end{cases} \quad\Rightarrow\quad \begin{cases}x + 3y = 0\\x + 2y = 4\end{cases} \quad\Rightarrow\quad \begin{cases}x + 3y = 0\\y = -4\end{cases} \quad\Rightarrow\quad \begin{cases}x = 12\\y = -4\end{cases}$$

In the last two exercises, you found that solving matrix equations is equivalent to solving systems of linear equations.

Question 16 Rewrite
$$\begin{cases}4x + 7y + z = 3\\-x + 8y - z = 2\end{cases}$$
as $A\begin{pmatrix}x\\y\\z\end{pmatrix} = \begin{pmatrix}3\\2\end{pmatrix}$.

Solution
Hint: $A = \begin{pmatrix}4 & 7 & 1\\-1 & 8 & -1\end{pmatrix}$


15 Python

Build up some linear algebra in python.

Exercise 1 We will store a vector as a list. So the vector $\begin{pmatrix}1\\2\\3\end{pmatrix}$ will be stored as [1,2,3]. Let's try to write some Python code for working with lists as if they were vectors.

Solution
Hint: This was discussed on StackOverflow: http://stackoverflow.com/questions/14050824/add-sum-of-values-of-two-lists-into-new-list

Write a “vector add” function. Your function may assume that the two vectors have

the same number of entries.

Python
# write a function vector_sum(v,w) which takes two vectors v and w,
# and returns the sum v + w.
#
# For example, vector_sum([1,2], [4,1]) equals [5,3]

def vector_sum(v,w):
    # your code here
    return # the sum v+w

def validator():
    # It would be better to try more cases
    if vector_sum([-5,23],[10,2])[0] != 5:
        return False
    if vector_sum([1,5,6],[2,3,6])[1] != 8:
        return False
    return True

Solution

Hint: Try a Python “list comprehension”

Hint: For example, return [alpha * x for x in v]

Next, write a scalar multiplication function.

Python
# write a function scale_vector(alpha, v) which takes a number alpha and a vector v
# and returns alpha * v
#
# For example, scale_vector(5,[1,2,3]) equals [5,10,15]

def scale_vector(alpha, v):
    # your code here
    return # the scaled vector alpha * v

def validator():
    # It would be better to try more cases
    if scale_vector(-3,[2,3,10])[1] != -9:
        return False
    if scale_vector(10,[4,3,2,1])[2] != 20:
        return False
    return True

Let’s write a dot product function.

Solution

Python
# Write a function dot_product(v,w) which takes two vectors v and w,
# and returns the dot product of v and w.
#
# For example, dot_product([1,2],[0,3]) is 6.

def dot_product(v,w):
    # your code here
    return # the dot product "v dot w"

def validator():
    if dot_product([1,2],[-3,5]) != 7:
        return False
    if dot_product([0,4,2],[2,3,-7]) != -2:
        return False
    return True

And we will store a matrix as a list of lists. For example, the list [[1,3,5],[2,4,6]] will represent the matrix $\begin{pmatrix}1 & 3 & 5\\2 & 4 & 6\end{pmatrix}$.

Note that there are two different conventions that we could have chosen: the innermost lists could be the rows, or the columns. There are good reasons to have chosen the opposite convention: after all, when thinking of a matrix as a linear map, we should be paying attention to the columns, since the $i$th column tells us what the corresponding linear map does when applied to $e_i$.

Nevertheless, the innermost lists are rows in our chosen representation. This way, to talk about the entry $m_{i,j}$, we write m[i][j]. Had we made the other choice, the $m_{i,j}$ entry would have been accessed by writing j and i in the other order. This is also the same convention used by the computer algebra system, Sage.

Exercise 2 Write a “matrix multiplication” function.

Solution


Python
# write a function multiply(A,B) which takes two matrices A and B stored in the above format,
# and returns the matrix of their product

def multiply(A,B):
    # your code here
    return # the product AB

def validator():
    # It would be better to try more cases
    a = [[-2, 0], [-2, -3], [-1, 3]]
    b = [[-3, 2, -1, -2], [3, 2, 1, 3]]
    result = multiply(a,b)
    if (len(result) != 3):
        return False
    if (len(result[0]) != 4):
        return False
    if (result[2][1] != 4):
        return False
    return True

Fantastic!

Next, let’s think more about how matrices and linear maps are related.

Solution

Hint:

Warning 3 This is a function whose output is a function.

Hint: Try using lambda.

Write a function matrix_to_function which takes a matrix $M_L$ representing the linear map $L$, and returns a Python function. The returned Python function should take a vector v and send it to $L(v)$.

Python
# For example, if M = [[1,2],[3,4]], then matrix_to_function(M)([0,1]) should be [2,4]

def matrix_to_function(M):
    # your code here
    return # the function which sends v to M(v)

def validator():
    if matrix_to_function([[-3,2,4],[5,-7,2]])([5,3,2])[0] != -1:
        return False
    if matrix_to_function([[4,3],[2,-1],[-5,3]])([2,-4])[2] != -22:
        return False
    return True

Now you can go back and check—for some examples of A, B, and v—that the

following is true: matrix_to_function(A)(matrix_to_function(B)(v)) is the

same as matrix_to_function(multiply(A,B))(v).
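Here is one concrete spot-check of that identity. It uses sample implementations of multiply and matrix_to_function (so it is a spoiler for the exercises above — skip it if you want to write your own first):

```python
# One possible (spoiler!) implementation pair, used to check that composing
# the functions agrees with multiplying the matrices first.
def multiply(A, B):
    # entry (i, j) of AB is the dot product of row i of A with column j of B
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def matrix_to_function(M):
    return lambda v: [sum(row[j] * v[j] for j in range(len(v))) for row in M]

A = [[1, 2], [3, 4]]
B = [[0, 1], [1, 1]]
v = [5, -2]
lhs = matrix_to_function(A)(matrix_to_function(B)(v))
rhs = matrix_to_function(multiply(A, B))(v)
print(lhs, rhs, lhs == rhs)  # → [4, 6] [4, 6] True
```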


Solution Now let's go the other way. Write a function function_to_matrix which takes a Python function f—assumed to be a linear map from $\mathbb{R}^2$ to $\mathbb{R}^2$—and returns the $2 \times 2$ matrix representing that linear map.

Python
# For example if you had defined
#
# def L(v):
#     return [2*v[0]+3*v[1], -4*v[0]]
#
# Then function_to_matrix(L) is [[2,3],[-4,0]]

# You may assume that L takes [x,y] to another list with two entries
# and you may assume that L is linear

def function_to_matrix(L):
    # your code here
    return # the matrix

def validator():
    M = function_to_matrix( lambda v: [3*v[0]+5*v[1], -2*v[0] + 4*v[1]] )
    if (M[0][0] != 3):
        return False
    M = function_to_matrix( lambda v: [2*v[0]-3*v[1], -7*v[0] - 5*v[1]] )
    if (M[1][0] != -7):
        return False
    M = function_to_matrix( lambda v: [v[0]+7*v[1], 3*v[0] - 2*v[1]] )
    if (M[1][1] != -2):
        return False
    return True

Great work! If you like, you can try to compute function_to_matrix(matrix_to_function(M)).

You should get back M.


16 An inner product space

The dot product provides a way to compute lengths and angles.

In order to do geometry in $\mathbb{R}^n$, we will want to be able to compute the length of a vector, and the angle between two vectors. Miraculously, a single operation will allow us to compute both quantities.


17 Covectors

A covector eats vectors and provides numbers.

Definition 1 A covector on $\mathbb{R}^n$ is a linear map from $\mathbb{R}^n$ to $\mathbb{R}$. As a matrix, it is a single row of length $n$.

Example 2 $\begin{pmatrix}2 & -1 & 3\end{pmatrix}$ is the matrix of a covector on $\mathbb{R}^3$.

Question 3 Solution
Hint: $\begin{pmatrix}2 & -1 & 3\end{pmatrix}\begin{pmatrix}3\\5\\7\end{pmatrix} = 2(3) + -1(5) + 3(7) = 22$

$\begin{pmatrix}2 & -1 & 3\end{pmatrix}\begin{pmatrix}3\\5\\7\end{pmatrix} = 22$

Now we can do this a bit more abstractly.

Hint: $\begin{pmatrix}x & y & z\end{pmatrix}\begin{pmatrix}a\\b\\c\end{pmatrix} = ax + by + cz$

$\begin{pmatrix}x & y & z\end{pmatrix}\begin{pmatrix}a\\b\\c\end{pmatrix} = ax + by + cz$

There is a natural way to turn a vector into a covector, or a covector into a vector: just turn the matrix 90° one direction or the other!

Definition 4 We define the transpose of a vector $v = \begin{pmatrix}x_1\\x_2\\\vdots\\x_n\end{pmatrix}$ to be the covector $v^\top$ with matrix $\begin{pmatrix}x_1 & x_2 & \cdots & x_n\end{pmatrix}$.

Similarly, we define the transpose of a covector $\omega = \begin{pmatrix}x_1 & x_2 & \cdots & x_n\end{pmatrix}$ to be the vector $\omega^\top$ with matrix $\begin{pmatrix}x_1\\x_2\\\vdots\\x_n\end{pmatrix}$.

Question 5 Suppose $v = \begin{pmatrix}1\\4\\3\end{pmatrix}$. What is $(v^\top)^\top$?

Solution

(a) $(v^\top)^\top = \begin{pmatrix}1\\4\\3\end{pmatrix}$

(b) $(v^\top)^\top = \begin{pmatrix}1 & 4 & 3\end{pmatrix}$

Indeed, $(v^\top)^\top = v$ and $(\omega^\top)^\top = \omega$ for any vector $v$ and covector $\omega$.

Let $v = \begin{pmatrix}5\\3\\1\end{pmatrix}$ and $w = \begin{pmatrix}2\\-2\\7\end{pmatrix}$.

Solution
Hint: $v^\top(w) = \begin{pmatrix}5 & 3 & 1\end{pmatrix}\begin{pmatrix}2\\-2\\7\end{pmatrix} = 5(2) + 3(-2) + 1(7) = 11$

$v^\top(w) = 11$

Solution
Hint:
$$wv^\top = \begin{pmatrix}2\\-2\\7\end{pmatrix}\begin{pmatrix}5 & 3 & 1\end{pmatrix} = \begin{pmatrix}10 & 6 & 2\\-10 & -6 & -2\\35 & 21 & 7\end{pmatrix}$$

What is $wv^\top$?


18 Dot product

The standard inner product is the dot product.

Definition 1 Given two vectors $v, w \in \mathbb{R}^n$, we define their standard inner product $\langle v, w\rangle$ by $\langle v, w\rangle = v^\top(w) \in \mathbb{R}$. We sometimes use the notation $v \cdot w$ for $\langle v, w\rangle$, and call the operation the dot product.

Warning 2 Note that $v^\top(w) \neq w(v^\top)$: one is a number, while the other is an $n \times n$ matrix.

Question 3 Make sure for yourself, by using the definition, that
$$\begin{pmatrix}x_1\\x_2\\\vdots\\x_n\end{pmatrix} \cdot \begin{pmatrix}y_1\\y_2\\\vdots\\y_n\end{pmatrix} = x_1 y_1 + x_2 y_2 + x_3 y_3 + \cdots + x_n y_n.$$

Prove the following facts about the dot product. Here $u, v, w \in \mathbb{R}^n$ and $a \in \mathbb{R}$.

(a) $v \cdot w = w \cdot v$ (The dot product is commutative)
(b) $(u + v) \cdot w = u \cdot w + v \cdot w$ and $(av) \cdot w = a(v \cdot w)$ (The dot product is linear in the first argument)
(c) $u \cdot (v + w) = u \cdot v + u \cdot w$ and $v \cdot (aw) = a(v \cdot w)$ (The dot product is linear in the second argument)
(d) $v \cdot v \geq 0$ (We say that the dot product is "positive definite")
(e) if $v \cdot z = 0$ for all $z \in \mathbb{R}^n$, then $v = 0$ (The dot product is nondegenerate)

1. $v \cdot w = v_1 w_1 + v_2 w_2 + \cdots + v_n w_n = w_1 v_1 + w_2 v_2 + \cdots + w_n v_n = w \cdot v$, so the dot product is commutative.

(skipping item 2 for now)

3.
$$\begin{aligned}
u \cdot (v + w) &= u^\top(v + w) &&\text{by definition}\\
&= u^\top(v) + u^\top(w) &&\text{since } u^\top : \mathbb{R}^n \to \mathbb{R} \text{ is linear}\\
&= u \cdot v + u \cdot w &&\text{by definition}
\end{aligned}$$
and
$$\begin{aligned}
u \cdot (aw) &= u^\top(aw) &&\text{by definition}\\
&= a\,u^\top(w) &&\text{since } u^\top : \mathbb{R}^n \to \mathbb{R} \text{ is linear}\\
&= a\,(u \cdot w) &&\text{by definition}
\end{aligned}$$

2. follows from 3 and 1.

4. $v \cdot v = v_1^2 + v_2^2 + v_3^2 + \cdots + v_n^2$, and the square of a real number is nonnegative, so the sum of these squares is also nonnegative.

5. is perhaps the trickiest fact to prove. Observe that if $v \cdot z = 0$ for every $z \in \mathbb{R}^n$, then this formula is true in particular for $z = e_j$. But $v \cdot e_j = v_j$. Thus, by dotting with all of the standard basis vectors, we see that every coordinate of $v$ must be 0. Thus $v$ is the zero vector.

The fact that the dot product is linear in two separate vector variables means that it is an example of a "bilinear form". We will make a careful study of bilinear forms later in this course: it will turn out that the second derivative of a multivariable function gives a bilinear form at each point.

So far, the inner product feels like it belongs to the realm of pure algebra. In the next few exercises, we will start to see some hints of its geometric meaning.

Question 4 Let $v = \begin{pmatrix}5\\1\end{pmatrix}$.

Solution
Hint: $\langle v, v\rangle = 5^2 + 1^2 = 26$

$\langle v, v\rangle = 26$

Let's think about this a bit more abstractly. Set $v = \begin{pmatrix}x\\y\end{pmatrix}$.

Solution
Hint: $\langle v, v\rangle = x^2 + y^2$

$\langle v, v\rangle = x^2 + y^2$

Notice that the length of the line segment from $(0, 0)$ to $(x, y)$ is $\sqrt{x^2 + y^2}$ by the Pythagorean theorem.


19 Length

The inner product provides a way to measure the length of a vector.

You should have discovered that $v \cdot v$ is the square of the length of the vector $v$ when viewed as an arrow based at the origin. So far, you have only shown this in the 2-dimensional case. See if you can do it in three dimensions.

Show that the length of the line segment from $(0, 0, 0)$ to $(x, y, z)$ is $\sqrt{v \cdot v}$, where $v = \begin{pmatrix}x\\y\\z\end{pmatrix}$.

Until now, you may not have seen a treatment of length in higher dimensions. Generalizing the results above, we define:

Definition 1 The length of a vector $v \in \mathbb{R}^n$ is defined by $|v| = \sqrt{v \cdot v}$.
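The definition translates into one line of Python in the list representation of vectors (the helper name here is ours):

```python
# |v| = sqrt(v . v), for a vector stored as a list of numbers.
from math import sqrt

def length(v):
    return sqrt(sum(x * x for x in v))

print(length([3, 4]))  # → 5.0 (the 3-4-5 right triangle)
```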

Question 2 Solution The length of the vector $\begin{pmatrix}6\\2\\3\\1\end{pmatrix}$ is $\sqrt{6^2 + 2^2 + 3^2 + 1^2}$.

Question 3 Solution
Hint: By the Pythagorean theorem, we can see that the distance is $\sqrt{(5-2)^2 + (9-3)^2}$.
Hint: We could also view this as the length of the vector $\begin{pmatrix}3\\6\end{pmatrix}$ which "points" from $(2, 3)$ to $(5, 9)$.

The distance between the points $(2, 3)$ and $(5, 9)$ is $\sqrt{3^2 + 6^2}$.

Definition 4 The distance between two points $p$ and $q$ in $\mathbb{R}^n$ is defined to be the length of the "displacement" vector $p - q$.

Question 5 Solution
Hint: The displacement vector between these points is $\begin{pmatrix}5-2\\6-7\\9-3\\8-1\end{pmatrix} = \begin{pmatrix}3\\-1\\6\\7\end{pmatrix}$
Hint: The length of the displacement vector is $\sqrt{3^2 + (-1)^2 + 6^2 + 7^2}$

The distance between the points $(2, 7, 3, 1)$ and $(5, 6, 9, 8)$ is $\sqrt{3^2 + (-1)^2 + 6^2 + 7^2}$.

Question 6 Write an equation for the sphere centered at $(0, 0, 0, 0)$ in $\mathbb{R}^4$ of radius $r$, using the coordinates $x, y, z, w$ on $\mathbb{R}^4$.

Solution


Hint: For a point $p = (x, y, z, w)$ to be on the sphere of radius $r$ centered at $(0, 0, 0, 0)$, the distance from $p$ to the origin must be $r$.
Hint: $r = \sqrt{x^2 + y^2 + z^2 + w^2}$
Hint: $x^2 + y^2 + z^2 + w^2 = r^2$

$x^2 + y^2 + z^2 + w^2 = r^2$

Question 7 Write an inequality stating that the point $(x, y, z, w)$ is more than 4 units away from the point $(2, 3, 1, 9)$.

Solution
Hint: The distance between the point $(x, y, z, w)$ and $(2, 3, 1, 9)$ is $\sqrt{(x-2)^2 + (y-3)^2 + (z-1)^2 + (w-9)^2}$.
Hint: So we need $\sqrt{(x-2)^2 + (y-3)^2 + (z-1)^2 + (w-9)^2} > 4$

$\sqrt{(x-2)^2 + (y-3)^2 + (z-1)^2 + (w-9)^2} > 4$

Prove that $|av| = |a|\,|v|$ for every $a \in \mathbb{R}$.

Warning 8 These two uses of $|\cdot|$ are distinct: $|a|$ means the absolute value of $a$, and $|v|$ is the length of $v$.

$$\begin{aligned}
|av| &= \sqrt{\langle av, av\rangle} &&\text{by definition}\\
&= \sqrt{a^2 \langle v, v\rangle} &&\text{by the linearity of the inner product in each slot}\\
&= \sqrt{a^2}\sqrt{\langle v, v\rangle}\\
&= |a|\,|v|
\end{aligned}$$


20 Angles

Dot products can be used to compute angles.

Question 1 Give a vector of length 1 which points in the same direction as $v = \begin{pmatrix}1\\2\end{pmatrix}$ (i.e. is a positive multiple of $v$).

Solution
Hint: Remember that you just argued that $|av| = |a|\,|v|$ for any $a \in \mathbb{R}$. What positive $a$ could you choose to make $|a|\,|v| = 1$?
Hint: We need to take $a = \frac{1}{|v|}$.
Hint: The length of $v$ is $\sqrt{1^2 + 2^2} = \sqrt{5}$.
Hint: The vector $\begin{pmatrix}1/\sqrt{5}\\2/\sqrt{5}\end{pmatrix}$ points in the same direction as $v$, but has length 1.
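The same scaling argument, written as a small Python helper (the function name is ours): divide every coordinate by $|v|$ to get the unit vector in the direction of $v$.

```python
from math import sqrt

def normalize(v):
    """Scale v by 1/|v| to get the unit vector in the same direction."""
    norm = sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

u = normalize([1, 2])
print(u)  # ≈ [0.447, 0.894], i.e. (1/sqrt(5), 2/sqrt(5))
```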

Now that we understand the relationship between the inner product and the length of vectors, we will attempt to establish a connection between the inner product and the angle between two vectors.

Do you remember the law of cosines? It states the following:

Theorem 2 If a triangle has side lengths $a$, $b$, and $c$, then $c^2 = a^2 + b^2 - 2ab\cos(\theta)$, where $\theta$ is the angle opposite the side with length $c$.

Prove the law of cosines. You may want to read the lovely proof at mathproofs¹.

We can rephrase this in terms of vectors, since geometrically if $v$ and $w$ are vectors, the third side of the triangle is the vector $w - v$.

Theorem 3 For any two vectors $v, w \in \mathbb{R}^n$, $|w - v|^2 = |w|^2 + |v|^2 - 2|v||w|\cos(\theta)$, where $\theta$ is the angle between $v$ and $w$.

(For you sticklers, this is really being taken as the definition of the angle between two vectors in arbitrary dimension.)

Rewrite the theorem above by using our definition of length in terms of the dot product. Performing some algebra, you should obtain a nice expression for $v \cdot w$ in terms of $|v|$, $|w|$, and $\cos(\theta)$.

1. http://mathproofs.blogspot.com/2006/06/law-of-cosines.html

$$\begin{aligned}
|w - v|^2 &= |v|^2 + |w|^2 - 2|v||w|\cos(\theta)\\
\langle w - v, w - v\rangle &= |v|^2 + |w|^2 - 2|v||w|\cos(\theta)\\
\langle w, w - v\rangle - \langle v, w - v\rangle &= |v|^2 + |w|^2 - 2|v||w|\cos(\theta) &&\text{by the linearity of the inner product in the first slot}\\
\langle w, w\rangle - \langle w, v\rangle - \langle v, w\rangle + \langle v, v\rangle &= |v|^2 + |w|^2 - 2|v||w|\cos(\theta) &&\text{by the linearity of the inner product in the second slot}\\
|w|^2 - 2\langle v, w\rangle + |v|^2 &= |v|^2 + |w|^2 - 2|v||w|\cos(\theta)\\
\langle v, w\rangle &= |v||w|\cos(\theta)
\end{aligned}$$

You should have discovered the following theorem:

Theorem 4 For any two vectors $v, w \in \mathbb{R}^n$, $v \cdot w = |v||w|\cos(\theta)$. In words, the dot product of two vectors is the product of the lengths of the two vectors, times the cosine of the angle between them.

This gives an almost totally geometric picture of the dot product: given two vectors $v$ and $w$, $|v|\cos(\theta)$ can be viewed as the (signed) length of the projection of $v$ onto the line containing $w$. So $|v||w|\cos(\theta)$ is the "length of the projection of $v$ in the direction of $w$, times the length of $w$".
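Theorem 4 can be read backwards to compute angles: $\cos(\theta) = \frac{v \cdot w}{|v||w|}$, so $\theta$ is recovered with an inverse cosine. A quick sketch (the function name is ours):

```python
# cos(theta) = (v . w) / (|v| |w|), so acos recovers the angle between v and w.
from math import sqrt, acos, degrees

def angle_between(v, w):
    dot = sum(x * y for x, y in zip(v, w))
    norm = lambda u: sqrt(sum(x * x for x in u))
    return acos(dot / (norm(v) * norm(w)))

print(degrees(angle_between([1, 0], [0, 1])))  # → 90.0
print(degrees(angle_between([1, 0], [1, 1])))  # ≈ 45.0 (up to float rounding)
```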

As mentioned above, this theorem is really being used to define the angle between two vectors. This is not quite rigorous: how do we even know that $\frac{v \cdot w}{|v||w|}$ is even between $-1$ and $1$, so that it could be the cosine of an angle? This is clear from the "Euclidean Geometry" perspective, but not as clear from the "Cartesian Geometry" perspective. To make sure that everything is okay, we prove the "Cauchy-Schwarz" theorem, which reconciles these two worlds.


21 Cauchy-Schwarz

The Cauchy-Schwarz inequality relates the inner product and the norm of the two

vectors.

This is the Cauchy-Schwarz inequality.

Theorem 1 $|v \cdot w| \leq |v||w|$ for any two vectors $v, w \in \mathbb{R}^n$.

Proof If $v$ or $w$ is the zero vector, the result is trivial. So assume $v \neq 0$ and $w \neq 0$. Start by noting that $\langle v - w, v - w\rangle \geq 0$. Expanding this out, we have:
$$\langle v, v\rangle - 2\langle v, w\rangle + \langle w, w\rangle \geq 0$$
$$2\langle v, w\rangle \leq \langle v, v\rangle + \langle w, w\rangle$$
Now, if $v$ and $w$ are unit vectors, this says that
$$2\langle v, w\rangle \leq 2$$
$$\langle v, w\rangle \leq 1$$
Now to prove the result for any pair of nonzero vectors, simply scale them to make them unit vectors:
$$\left\langle \frac{1}{|v|}v, \frac{1}{|w|}w\right\rangle \leq 1$$
$$\langle v, w\rangle \leq |v||w|$$

We are not quite done with the proof, because we have not proven that $v \cdot w \geq -|v||w|$. Following the same basic outline, try to prove the other half of this inequality below. Start by noting that $\langle v + w, v + w\rangle \geq 0$. Expanding this out, we have:
$$\langle v, v\rangle + 2\langle v, w\rangle + \langle w, w\rangle \geq 0$$
$$2\langle v, w\rangle \geq -\langle v, v\rangle - \langle w, w\rangle$$
Now, if $v$ and $w$ are unit vectors, this says that
$$2\langle v, w\rangle \geq -2$$
$$\langle v, w\rangle \geq -1$$
Now to prove the result for any pair of nonzero vectors, simply scale them to make them unit vectors:
$$\left\langle \frac{1}{|v|}v, \frac{1}{|w|}w\right\rangle \geq -1$$
$$\langle v, w\rangle \geq -|v||w|$$

In the next question, we ask you to ﬁll in the details of an alternative proof

which, while a little harder than the one above, is at least as beautiful.
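Before working through that proof, it can be reassuring to see the inequality hold numerically. A minimal Python sketch (the random test vectors are our own choice):

```python
import math
import random

def dot(v, w):
    return sum(a * b for a, b in zip(v, w))

def norm(v):
    return math.sqrt(dot(v, v))

random.seed(0)
for _ in range(1000):
    v = [random.uniform(-10, 10) for _ in range(5)]
    w = [random.uniform(-10, 10) for _ in range(5)]
    # Cauchy-Schwarz: |v . w| <= |v||w| (tiny slack for rounding)
    assert abs(dot(v, w)) <= norm(v) * norm(w) + 1e-9
```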


Question 2  Start by noting that ⟨v − w, v − w⟩ ≥ 0. Expanding this out, we have:

⟨v, v⟩ − 2⟨v, w⟩ + ⟨w, w⟩ ≥ 0

2⟨v, w⟩ ≤ ⟨v, v⟩ + ⟨w, w⟩

Now notice that the left hand side is unaffected by scaling v by a scalar λ and w by 1/λ, but the right hand side is! This allows us to breathe new life into the inequality: we know that for every scalar λ ∈ (0, ∞),

2⟨v, w⟩ ≤ λ²|v|² + (1/λ²)|w|²

This is somewhat miraculous: we have a stronger inequality than the one we started with "for free."

This new inequality is strongest when the right hand side (RHS) is minimized. As it stands, the RHS is just a function of the one real variable λ.

Solution

Hint: We can minimize the right hand side using single variable calculus.

Hint: Let f(λ) = λ²|v|² + (1/λ²)|w|². Then

f′(λ) = 2λ|v|² − 2|w|²/λ³

The minimum must occur where f′ vanishes.

Hint:

f′(λ) = 0

2λ|v|² − 2|w|²/λ³ = 0

λ⁴|v|² = |w|²

λ = √(|w|/|v|)

Hint: You can type |w| by writing abs(w).

The value of λ which minimizes the right hand side is sqrt(abs(w)/abs(v))

Conclude that the Cauchy-Schwarz theorem is true!

Credit for this beautiful line of reasoning goes to Terry Tao at this blog post¹.
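A quick numerical check of the minimization step (Python; the norms 3 and 5 are arbitrary sample values for |v| and |w|): at λ = √(|w|/|v|) the right hand side equals 2|v||w|, which is exactly what turns the amplified inequality back into Cauchy-Schwarz.

```python
import math

norm_v, norm_w = 3.0, 5.0  # sample values for |v| and |w|

def g(lam):
    # the right hand side of the amplified inequality
    return lam**2 * norm_v**2 + norm_w**2 / lam**2

best = math.sqrt(norm_w / norm_v)

# the minimum value is 2 |v| |w| ...
assert abs(g(best) - 2 * norm_v * norm_w) < 1e-12
# ... and nearby lambdas do no better
for lam in [0.5 * best, 0.9 * best, 1.1 * best, 2.0 * best]:
    assert g(lam) >= g(best) - 1e-12
```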

Question 3  Solution

Hint: We know that v · w = |v||w| cos θ.

¹ https://terrytao.wordpress.com/2007/09/05/amplification-arbitrage-and-the-tensor-power-trick/

Hint:

(2, 3, 1) · (1, 1, 1) = 2(1) + 3(1) + 1(1) = 6

Hint: |v| = √(v · v) = √14

Hint: |w| = √(w · w) = √3

Hint: Thus, 6 = √14 √3 cos(θ)

Hint: Therefore, θ = arccos(6/√42)

The angle between the vectors v = (2, 3, 1)ᵀ and w = (1, 1, 1)ᵀ is arccos(6/(sqrt(14)*sqrt(3)))

This problem probably would have stumped you before you started this activity!
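The computation in the hints can be replayed in a few lines of Python:

```python
import math

v = [2, 3, 1]
w = [1, 1, 1]

dot_vw = sum(a * b for a, b in zip(v, w))      # 2(1) + 3(1) + 1(1) = 6
norm_v = math.sqrt(sum(a * a for a in v))      # sqrt(14)
norm_w = math.sqrt(sum(b * b for b in w))      # sqrt(3)

theta = math.acos(dot_vw / (norm_v * norm_w))  # arccos(6 / sqrt(42))

assert dot_vw == 6
assert abs(theta - math.acos(6 / math.sqrt(42))) < 1e-12
```

The angle comes out to roughly 0.39 radians (about 22 degrees).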

Question 4  Find a vector which is perpendicular to w = (2, 3, 1)ᵀ.

Solution

Hint: For v to be perpendicular to (2, 3, 1)ᵀ, we would need the angle between v and w to be π/2 (or −π/2). In either case v · w = |v||w| cos(±π/2) = 0. So we need to find a vector for which v · w = 0.

Hint: Let v = (x, y, z)ᵀ. Then

v · w = 0

(x, y, z) · (2, 3, 1) = 0

2x + 3y + z = 0

Hint: There are a whole lot of choices for x, y, and z that fit these criteria (in fact, there is an entire plane of vectors perpendicular to w).

Hint: (0, 1, −3)ᵀ works, for instance.


Question 5  Find a vector u which is perpendicular to both v = (2, 3, 1)ᵀ and w = (5, 9, 2)ᵀ.

Solution

Hint: We need both u · v = 0 and u · w = 0.

Hint: Letting u = (x, y, z)ᵀ, we have the conditions

2x + 3y + z = 0
5x + 9y + 2z = 0

Hint: Doubling the first equation,

4x + 6y + 2z = 0
5x + 9y + 2z = 0

and subtracting it from the second leaves

x + 3y = 0
5x + 9y + 2z = 0

Hint: Picking whatever you like for x, you should be able to find the other values now. Try x = 3.

Hint: (3, −1, −3)ᵀ works.
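The elimination can be replayed mechanically. A small Python sketch that follows the hints (take x = 3, then solve for y and z) and checks both perpendicularity conditions:

```python
# Follow the hints: from x + 3y = 0 and 2x + 3y + z = 0, with x = 3.
x = 3
y = -x // 3            # x + 3y = 0   =>  y = -1
z = -(2 * x + 3 * y)   # 2x + 3y + z = 0  =>  z = -3

u = (x, y, z)
v = (2, 3, 1)
w = (5, 9, 2)

assert sum(a * b for a, b in zip(u, v)) == 0  # u is perpendicular to v
assert sum(a * b for a, b in zip(u, w)) == 0  # u is perpendicular to w
```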

Prove the "Triangle inequality": for any two vectors v, w ∈ Rⁿ, |v + w| ≤ |v| + |w|. Draw a picture. Why is this called the triangle inequality?

The inequality is equivalent to |v + w|² ≤ (|v| + |w|)², which is easier to handle because it does not involve square roots.

|v + w|² = ⟨v + w, v + w⟩
         = |v|² + 2⟨v, w⟩ + |w|²
         ≤ |v|² + 2|v||w| + |w|²    by the Cauchy-Schwarz inequality
         = (|v| + |w|)²
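A numerical spot check of the triangle inequality (Python; the random vectors are our own choice):

```python
import math
import random

def norm(v):
    return math.sqrt(sum(a * a for a in v))

random.seed(1)
for _ in range(1000):
    v = [random.uniform(-5, 5) for _ in range(4)]
    w = [random.uniform(-5, 5) for _ in range(4)]
    vw = [a + b for a, b in zip(v, w)]
    # |v + w| <= |v| + |w| (tiny slack for rounding)
    assert norm(vw) <= norm(v) + norm(w) + 1e-9
```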

22 Multiplying matrices using dot products

There is a quick way to multiply matrices using dot products.

Question 1  Let

M = [ 2 3 ]
    [ 4 5 ]
    [ 1 2 ]

and e₂ = (0, 1, 0). Compute e₂M.

Solution

Hint:

e₂M = (0 1 0) [ 2 3 ]
              [ 4 5 ]
              [ 1 2 ]  = (4 5)

e₂M = (4 5)

Did you notice how multiplying M on the left by e₂ selected the 2nd row of M?
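Here is the same computation in Python, with the row-times-matrix product written out as dot products against each column (a sketch using plain lists, no linear algebra library):

```python
M = [[2, 3],
     [4, 5],
     [1, 2]]
e2 = [0, 1, 0]

def vec_mat(row, mat):
    # entry k of (row . mat) is the dot product of row with column k
    return [sum(r * mat[i][k] for i, r in enumerate(row))
            for k in range(len(mat[0]))]

assert vec_mat(e2, M) == [4, 5]  # picks out the 2nd row of M
```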

Prove that if M is an m × n matrix and eⱼ ∈ Rᵐ is the jth standard basis vector of Rᵐ, then eⱼM is the jth row of M.  We know that w = eⱼM is a covector (row) just by looking at dimensions. What is the ith entry of this row? Well, we can only figure that out by applying the map to the basis vectors. eⱼ M eᵢ is the dot product of eⱼ with the ith column of M. But that just selects the jth element of that column. So the ith element of w is the jth element of the ith column of M. This just says that w is the jth row of M. (Whew.)

Now we can use this observation to great effect. If M is an m × n matrix, eⱼ is the jth standard basis vector of Rᵐ, and bₖ is the kth standard basis vector of Rⁿ, then we can select Mⱼ,ₖ by performing the operation eⱼ M bₖ. This is so important we will label it as a theorem:

Theorem 2  If M is an m × n matrix, eⱼ is the jth standard basis vector of Rᵐ, and bₖ is the kth standard basis vector of Rⁿ, then Mⱼ,ₖ = eⱼ M bₖ.

Proof  The proof is simply that Mbₖ is by definition the kth column of the matrix, and by our observation above eⱼMbₖ must be the jth row of that column vector, which consists of the single number Mⱼ,ₖ.

Question 3  Let

M = [ 4 1 −2 ]
    [ 3 1  0 ]

Solution

Hint: By the above theorem, it will be the entry in the 2nd row and the 1st column of M.

Hint: (0 1) M (1, 0, 0)ᵀ = 3

(0 1) M (1, 0, 0)ᵀ = 3

The philosophical import of this theorem is that we can probe the inner structure

of any matrix with simple row and column vectors to ﬁnd out every component of

the matrix. What happens when we apply this insight to a product of matrices?

Question 4  Let

A = [ −1 1 ]
    [  2 2 ]
    [  3 0 ]

and B = [ ]. Let C = AB.

Solution

Hint: By the theorem above, C₂,₃ = (0 1 0) C (0, 0, 1, 0)ᵀ

Hint: So C₂,₃ = (0 1 0) A B (0, 0, 1, 0)ᵀ

Hint: But (0 1 0) A is the 2nd row of A, and B(0, 0, 1, 0)ᵀ is the 3rd column of B.

Hint: So (0 1 0) A = (2 2), and B(0, 0, 1, 0)ᵀ = (1, 9)ᵀ.

Hint: Thus C₂,₃ = (2 2) · (1, 9) = 2(1) + 2(9) = 20

Without computing the whole matrix C, can you find C₂,₃?

C₂,₃ = 20

Wow! So it looks like we can ﬁnd the entries of a product of two matrices just

by looking at the dot product of rows of the ﬁrst matrix with columns of the second

matrix!
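This observation is easy to test in code. The sketch below (Python; A is the matrix from the question, but B's entries were not given in the text, so the B here is our own sample whose 3rd column is (1, 9)ᵀ, matching the hints) compares the row-dot-column shortcut against a full multiplication:

```python
A = [[-1, 1],
     [ 2, 2],
     [ 3, 0]]
# Sample B (entries are our own choice; the text omits them),
# chosen so that the 3rd column is (1, 9) as in the hints.
B = [[1, 0, 1, 2],
     [3, 1, 9, 0]]

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))]
            for i in range(len(A))]

row2_of_A = A[1]                              # (2, 2)
col3_of_B = [B[k][2] for k in range(len(B))]  # (1, 9)
entry = sum(a * b for a, b in zip(row2_of_A, col3_of_B))

assert entry == 20
assert entry == matmul(A, B)[1][2]  # C[2][3] in the text's 1-based indexing
```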


Theorem 5  Let A and B be composable matrices. Let C = AB. Then Cᵢ,ⱼ is the product of the ith row of A with the jth column of B.

Prove this theorem.  We can prove this by combining the other two theorems in this section. Cᵢ,ⱼ = eᵢ C eⱼ by the second theorem. But C = AB, so we have Cᵢ,ⱼ = eᵢ A B eⱼ. By the first theorem, eᵢA is the ith row of A, and by our definition of matrix multiplication, Beⱼ is the jth column of B. So Cᵢ,ⱼ is the product of the ith row of A with the jth column of B.

Now try multiplying some matrices of your choosing using this method. This is likely the definition of matrix multiplication you learned in high school (or the same thing defined by some messy formula with lots of subscripts). Do you prefer this method? Or do you prefer whatever method you came up with on your own earlier? Maybe they are the same!

Another note: it is interesting that we are feeding two vectors eᵢ and eⱼ into the matrix and somehow getting out a number. In week 4 we will learn that we are treading in deep water here: this is the very tip of the iceberg of bilinear forms, which are a kind of 2-tensor.


23 Limits

Limits are the difference between analysis and algebra.

Limits are the backbone of calculus. Multivariable calculus is no different. In this section we will deal with limits on an intuitive level. We will postpone the rigorous ε-δ analysis to the next section.

Definition 1  Let f : Rⁿ → Rᵐ and let p ∈ Rⁿ. We say that

lim_{x→p} f(x) = L

for some L ∈ Rᵐ if, as x "gets arbitrarily close to" p, the points f(x) "get arbitrarily close to L."

Definition 2  A function f : Rⁿ → Rᵐ is said to be continuous at a point p ∈ Rⁿ if lim_{x→p} f(x) = f(p).

Most functions defined by formulas are continuous where they are defined. For example, the function f(x, y) = (cos(xy + y²), e^(sin(x)+y) + y²) is continuous because each component function is a string of composites of continuous functions. f(x, y) = (xy, cos(x)/(x + y)) is continuous everywhere it is defined (it is not defined on the line y = −x, because the denominator of the second component function vanishes there). This is basically because all of the functions we have names for, like cos(x), sin(x), eˣ, polynomials, and rational functions, are continuous; so if you can write down a function as a "single formula," it is probably continuous. The problematic points are basically just zeros of denominators, like in our example above. Piecewise defined functions can also be problematic:

Argue intuitively that the function f : R² → R defined by

f(x, y) = { 0 if x < y
          { 1 if x ≥ y

is continuous at every point off the line y = x, and is discontinuous at every point on the line y = x.

For any point p which is not on the line y = x, there is a little neighborhood of p on which f is constant (0 or 1), and constant functions are known to be continuous. So f is continuous at p. For any point p on the line y = x, we get a different limit if we approach p along the line y = x (we get 1), versus approaching through points with x < y (we get 0).

Question 3  Solution

Hint: Since x cos(π(x + y)) + sin(πy/4) is continuous, we can just evaluate the function at (1, 2).

Hint: So lim_{(x,y)→(1,2)} x cos(π(x + y)) + sin(πy/4) = 1 · cos(π(1 + 2)) + sin(2π/4) = −1 + 1 = 0

lim_{(x,y)→(1,2)} x cos(π(x + y)) + sin(πy/4) = 0


If we are confronted with a limit like lim_{(x,y)→(0,0)} (x² + xy)/(x + y), this is actually a little bit interesting. The function is not continuous at (0, 0), because it is not even defined at (0, 0). What is more, the numerator and denominator are both approaching 0, which each "pull" the limit in opposite directions. (Dividing by smaller and smaller numbers would tend to make the value larger and larger, while multiplying by smaller and smaller numbers has the opposite effect.) There are essentially two ways to work with this:

• show that it does not have a limit by finding two different ways of approaching (0, 0) which give different limiting values, or

• show that it does have a limit by rewriting the expression algebraically as a continuous function, and just plugging in to get the value of the limit.

Question 4  Consider lim_{(x,y)→(0,0)} (x² + xy)/(x + y).

Solution

Hint: This limit does exist, because it can be rewritten as a continuous function.

Do you think the limit exists?

(a) Yes
(b) No

Solution

Hint: lim_{(x,y)→(0,0)} (x² + xy)/(x + y) = lim_{(x,y)→(0,0)} x(x + y)/(x + y)

Hint: lim_{(x,y)→(0,0)} x(x + y)/(x + y) = lim_{(x,y)→(0,0)} x = 0

lim_{(x,y)→(0,0)} (x² + xy)/(x + y) = 0
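You can also probe this limit numerically: evaluating (x² + xy)/(x + y) at points sliding toward (0, 0) along several paths gives values shrinking to 0, consistent with the algebraic simplification to x. A Python sketch (the paths are our own choice):

```python
def f(x, y):
    return (x**2 + x * y) / (x + y)

for t in [0.1, 0.01, 0.001]:
    # points on the lines y = 2x and y = -3x, and on the parabola y = x^2
    samples = [f(t, 2 * t), f(-t, 3 * t), f(t, t**2)]
    # after simplification f(x, y) = x, so each sample has size about t
    assert all(abs(s) <= 2 * t for s in samples)
```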

Question 5  Consider lim_{(x,y)→(3,3)} (x² − 9)/(xy − 3y).

Solution

Hint: This limit does exist, because it can be rewritten as a continuous function.

Do you think the limit exists?

(a) Yes
(b) No

Solution

Hint: lim_{(x,y)→(3,3)} (x² − 9)/(xy − 3y) = lim_{(x,y)→(3,3)} ((x − 3)(x + 3))/(y(x − 3))

Hint: lim_{(x,y)→(3,3)} ((x − 3)(x + 3))/(y(x − 3)) = lim_{(x,y)→(3,3)} (x + 3)/y = (3 + 3)/3 = 2

lim_{(x,y)→(3,3)} (x² − 9)/(xy − 3y) = 2

Question 6  Let f : R² → R² be defined by f(x, y) = ((x²y − 4y)/(x − 2), xy).

Solution

Hint: We can consider the limit component by component.

Hint:

lim_{(x,y)→(2,2)} (x²y − 4y)/(x − 2) = lim_{(x,y)→(2,2)} ((x − 2)(x + 2)y)/(x − 2)
 = lim_{(x,y)→(2,2)} y(x + 2)
 = 2(2 + 2)
 = 8

Hint: lim_{(x,y)→(2,2)} xy = 2(2) = 4, since xy is continuous.

Hint: Format your answer as (8, 4)ᵀ.

Writing your answer as a vertical vector, what is lim_{(x,y)→(2,2)} f(x, y)?

Question 7  Consider lim_{(x,y)→(0,0)} x/y.

Solution

Hint: Think about approaching (0, 0) along the line x = 0 first, and then along the line x = y.

Hint: If we look at lim_{(0,y)→(0,0)} 0/y, this is just the limit of the constant 0 function. So the function approaches the limit 0 along the line x = 0.

Hint: If we look at lim_{(t,t)→(0,0)} t/t, this is just the limit of the constant 1 function. So the function approaches the limit 1 along the line y = x.

Hint: So the limit does not exist.

Do you think the limit exists?

(a) Yes
(b) No

The last example showcased how you could show that a limit does not exist by

ﬁnding two diﬀerent paths along which you approach diﬀerent limiting values.

Let’s try another example of that form

Question 8  Solution

Hint: On the line y = kx, we have f(x, y) = f(x, kx) = (x + kx + x²)/(x − kx) = (1 + k + x)/(1 − k).

Hint: So we have lim_{x→0} (1 + k + x)/(1 − k) = (1 + k)/(1 − k).

The limit of f : R² → R defined by f(x, y) = (x + y + x²)/(x − y) as (x, y) → (0, 0) along the line y = kx is (1 + k)/(1 − k)

The last two questions may have given you the idea that if a limit does not exist, it must be because you get a different value by approaching along two different lines. This is not always the case. Consider the function

f(x, y) = { 1 if y = x²
          { 0 if y ≠ x²

Through any line containing the origin, f approaches 0 as points get closer and closer to (0, 0), but as points approach (0, 0) along the parabola y = x², f approaches 1. So the limit lim_{(x,y)→(0,0)} f(x, y) does not exist, even though the limit along each line does.

Here is a more "natural" example of such a phenomenon (defined by a single formula, not a piecewise defined function):

f(x, y) = x²y/(x⁴ + y²)

Along each line y = kx, we have f(x, y) = kx³/(x⁴ + k²x²), so

lim_{x→0} kx³/(x⁴ + k²x²) = lim_{x→0} kx/(x² + k²) = 0.

On the other hand, along the parabola y = x², we have f(x, y) = x⁴/(2x⁴) = 1/2, where the limit is 1/2. So even though the limit along all lines through the origin is 0, the limit does not exist.
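A numerical sketch of this phenomenon in Python: along lines through the origin the values die off, but along the parabola y = x² they sit at 1/2.

```python
def f(x, y):
    return (x**2 * y) / (x**4 + y**2)

# Along lines y = kx the values shrink to 0 ...
for k in [1, -2, 5]:
    x = 1e-4
    assert abs(f(x, k * x)) < 1e-3

# ... but along the parabola y = x^2 the value is identically 1/2.
for x in [0.1, 0.01, 0.001]:
    assert abs(f(x, x**2) - 0.5) < 1e-12
```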

24 The formal definition of the limit

Limits are defined by formalizing the notion of closeness.

This optional section explores limits from a formal and rigorous point of view. The level of mathematical maturity required to get through this section is much higher than in the others. If you get through it and understand everything, you can consider yourself "hardcore."

Definition 1  Let U ⊂ Rⁿ. The closure of U, written Ū, is defined to be the set of all p ∈ Rⁿ such that every solid ball centered at p contains at least one point of U.

Symbolically,

Ū = {p ∈ Rⁿ : for all r > 0 there exists x ∈ U so that |x − p| < r}

Prove that U ⊂ Ū for any subset U of Rⁿ.

Let p ∈ U. Then for every r > 0, p is an element of U whose distance to p is less than r. In other words, since every solid ball centered at p must contain p, and p is in U, p must be in the closure of U. So p ∈ Ū.

Prove that the closure of the open unit ball is the closed unit ball. That is, show that if U = {x : |x| < 1}, then Ū = {x : |x| ≤ 1}.

Let B = {x : |x| ≤ 1}. We need to see that Ū = B.

It is easy to see that B ⊂ Ū, since for each point p ∈ B, either p ∈ U (in which case it is in the closure), or |p| = 1. In this case, for every r > 0, the point q = p − (r/2)p is in U and satisfies |p − q| < r. (For r ≥ 2 this particular q may leave U, but then any point of U works, the origin for instance.)

On the other hand, if |p| > 1, then a solid ball of radius (|p| − 1)/2 centered at p will not intersect U. So we are done.

Definition 2  Let f : U → V with U ⊂ Rⁿ, V ⊂ Rᵐ, and p ∈ Ū. We say that lim_{x→p} f(x) = L if for every ε > 0 we can find a δ > 0 so that if 0 < |x − p| < δ and x ∈ U, then |f(x) − L| < ε.

Definition 3  Let f : U → V with U ⊂ Rⁿ and V ⊂ Rᵐ. We say that f is continuous at p ∈ U if lim_{x→p} f(x) = f(p).

Prove, using the ε-δ definition of the limit, that f : R² → R defined by f(x, y) = xy is continuous everywhere.

Let p = (a, b). Let ε > 0 be given. Without loss of generality, assume a, b ≥ 0. We work "backwards":

|xy − ab| < ε
⇐ |[(x − a) + a][(y − b) + b] − ab| < ε
⇐ |(x − a)(y − b) + a(y − b) + b(x − a)| < ε
⇐ |x − a||y − b| + a|y − b| + b|x − a| < ε    (by the triangle inequality)
⇐ |x − a||y − b| < ε/3  and  a|y − b| < ε/3  and  b|x − a| < ε/3

Now it is easy to arrange that a|y − b| and b|x − a| are less than ε/3. If a = 0 or b = 0, you do not have to do anything to get the corresponding condition satisfied, but otherwise |x − a| ≤ ε/(3b) is implied by |(x, y) − (a, b)| ≤ ε/(3b√2), and |y − b| ≤ ε/(3a) is implied by |(x, y) − (a, b)| ≤ ε/(3a√2).

Finally, |x − a||y − b| < ε/3 is implied by |(x, y) − (a, b)| ≤ √(ε/3). So if we let δ = min(ε/(3b√2), ε/(3a√2), √(ε/3)), we are done.

Of course, this fact (that (x, y) ↦ xy is continuous) is something you probably believe intuitively. "Wiggling two numbers by a little bit doesn't affect their product by very much." Making that intuition precise obviously took some work in this activity.
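The δ produced by the proof can be exercised numerically. The sketch below (Python; the point (a, b) = (1, 2) and the ε are our own sample choices) checks that every sampled point within δ of (a, b) really satisfies |xy − ab| < ε:

```python
import math
import random

a, b = 1.0, 2.0
eps = 0.01
# the delta from the proof (a, b > 0 here, so all three terms are finite)
delta = min(eps / (3 * b * math.sqrt(2)),
            eps / (3 * a * math.sqrt(2)),
            math.sqrt(eps / 3))

random.seed(2)
for _ in range(1000):
    dx = random.uniform(-delta, delta)
    dy = random.uniform(-delta, delta)
    if math.hypot(dx, dy) >= delta:
        continue  # only test points strictly inside the delta-disc
    x, y = a + dx, b + dy
    assert abs(x * y - a * b) < eps
```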


25 Single variable derivative, redux

The derivative is the slope of the best linear approximation.

Our goal is to define the derivative of a multivariable function, but first we will recast the derivative of a single variable function in a manner which is ripe for generalization.

The derivative of a function f : R → R at a point x = a is the "instantaneous rate of change" of f(x) with respect to x. In other words,

f(a + ∆x) ≈ f(a) + f′(a)∆x.

This is really the essential thing to understand about the derivative.

Question 1  Let f be a function with f(3) = 2 and f′(3) = 5.

Solution

Hint: f(3.01) ≈ f(3) + f′(3)(0.01)

Hint: ≈ 2 + 5(0.01)

Hint: ≈ 2.05

Then f(3.01) ≈ 2.05

Question 2  Let f be a function with f(4) = 2 and f(4.2) = 2.6.

Solution

Hint: f(4.2) ≈ f(4) + f′(4)(0.2)

Hint: 2.6 ≈ 2 + f′(4)(0.2)

Hint: f′(4) ≈ (2.6 − 2)/0.2

Hint: f′(4) ≈ 3

Then f′(4) ≈ 3
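The approximation in these questions is easy to experiment with on a function whose derivative we know. For f(x) = x² at a = 3 (an example of our own, not from the text), the error of the linear approximation is exactly (∆x)², so it vanishes faster than ∆x:

```python
def f(x):
    return x**2

a, fprime_a = 3.0, 6.0  # f(3) = 9 and f'(3) = 6

for dx in [0.1, 0.01, 0.001]:
    approx = f(a) + fprime_a * dx   # linear approximation
    error = f(a + dx) - approx      # equals dx^2 for this particular f
    assert abs(error - dx**2) < 1e-9
```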

We have not made precise what we mean by the approximate sign. After all, if

∆x is small enough and f is continuous, f(a +∆x) will be close to f(a), but we do

not want to say that the derivative is always zero. We will make the ≈ sign precise

by asking that the diﬀerence between the actual value and the estimated value goes

to zero faster than ∆x goes to zero.


Definition 3  Let f : R → R be a function, and let a ∈ R. f is said to be differentiable at x = a if there is a number m such that

f(a + ∆x) = f(a) + m∆x + Errorₐ(∆x)

with

lim_{∆x→0} |Errorₐ(∆x)|/|∆x| = 0.

If f is differentiable at a, there is only one such number m, which we call the derivative of f at a.

Verbally, m is the number which makes the error between the function value f(a + ∆x) and the linear approximation f(a) + m∆x go to zero "faster than ∆x" does.

This deﬁnition looks more complicated than the usual deﬁnition (and it is!), but

it has the advantage that it will generalize directly to the derivative of a multivari-

able function.

Confirm that for f(x) = x², we have f′(2) = 4 using our definition of the derivative.

f(2 + ∆x) = (2 + ∆x)² = 2² + 2(2)∆x + (∆x)²

So we have f(2 + ∆x) = f(2) + 4∆x + Error(∆x), where Error(∆x) = (∆x)².

lim_{∆x→0} Error(∆x)/∆x = lim_{∆x→0} (∆x)²/∆x = lim_{∆x→0} ∆x = 0

Thus f′(2) = 2(2) = 4, according to our new definition!

Show the equivalence of our definition of the derivative with the "usual" definition. That is, show that the number m in our definition satisfies

m = lim_{∆x→0} (f(a + ∆x) − f(a))/∆x.

This also shows the uniqueness of m.

Let f be differentiable (in the sense above) at x = a, with derivative m. Then

lim_{∆x→0} |Errorₐ(∆x)|/|∆x| = 0

where Errorₐ(∆x) is defined by f(a + ∆x) = f(a) + m∆x + Errorₐ(∆x), i.e.

Errorₐ(∆x) = f(a + ∆x) − f(a) − m∆x.

So

lim_{∆x→0} |f(a + ∆x) − f(a) − m∆x|/|∆x| = 0

lim_{∆x→0} |(f(a + ∆x) − f(a))/∆x − m| = 0

But this implies that

m = lim_{∆x→0} (f(a + ∆x) − f(a))/∆x

So our definition of the derivative agrees with the "usual" definition.

26 Multivariable derivatives

We introduce the derivative.

The derivative in multiple variables requires a bit more machinery.¹

¹ YouTube link: http://www.youtube.com/watch?v=LuDlwFeAv-I


27 Intuitively

The derivative is the linear map which best approximates changes in a function near a point.

The single variable derivative allows us to find the best linear approximation to a function at a point. In several variables we will define the derivative to be a linear map which approximates the change in the values of a function. In this section we will explore what the multivariable derivative is from an intuitive point of view, without making anything too formal.

We give the following wishy-washy "definition":

Definition 1  Let f : Rⁿ → Rᵐ be a function. Then the derivative of f at a point p ∈ Rⁿ is the linear map D(f)|p : Rⁿ → Rᵐ which has the following approximation property:

f(p + h) ≈ f(p) + D(f)|p(h)

We will make the sense in which this approximation holds precise in the next section.

Note: we also call the matrix of the derivative the Jacobian matrix, in honor of the mathematician Carl Gustav Jacob Jacobi¹.

Question 2  Let f : R² → R³ be a function, and suppose f(2, 3) = (4, 8, 9). Suppose that the matrix of D(f)|(2,3) is

[ −1  3 ]
[  4  5 ]
[  2 −3 ]

Solution

Hint: By the defining property of derivatives,

f(2.01, 3.04) ≈ f(2, 3) + D(f)|(2,3)((0.01, 0.04)ᵀ)

Hint:

= (4, 8, 9)ᵀ + [ −1 3 ]
              [  4 5 ] (0.01, 0.04)ᵀ
              [  2 −3 ]

Hint:

= (4, 8, 9)ᵀ + (0.01(−1) + 0.04(3), 0.01(4) + 0.04(5), 0.01(2) + 0.04(−3))ᵀ

Hint:

= (4, 8, 9)ᵀ + (0.11, 0.24, −0.1)ᵀ

Hint:

= (4.11, 8.24, 8.9)ᵀ

Approximate f(2.01, 3.04), giving your answer as a column matrix.

¹ http://en.wikipedia.org/wiki/Carl_Gustav_Jacob_Jacobi
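The matrix-vector arithmetic in the hints can be replayed in Python (plain lists, no linear algebra library):

```python
f_at_p = [4.0, 8.0, 9.0]      # f(2, 3)
J = [[-1.0,  3.0],            # the given Jacobian matrix at (2, 3)
     [ 4.0,  5.0],
     [ 2.0, -3.0]]
h = [0.01, 0.04]              # displacement from (2, 3) to (2.01, 3.04)

Jh = [sum(J[i][j] * h[j] for j in range(2)) for i in range(3)]
approx = [f_at_p[i] + Jh[i] for i in range(3)]

# matches the hand computation (4.11, 8.24, 8.9)
assert all(abs(x - y) < 1e-12 for x, y in zip(approx, [4.11, 8.24, 8.90]))
```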

Question 3  Let f : R² → R be a function with f(1, 2) = 3, f(1.01, 2) = 3.04, and f(1, 2.002) = 3.002.

Solution

Hint: Since f : R² → R, D(f)|(1,2) : R² → R, so the matrix of the derivative is a row of length 2.

Hint: To find the matrix, we need to see how D(f)|(1,2) acts on (1, 0)ᵀ and (0, 1)ᵀ.

Hint: f(1.01, 2) ≈ f(1, 2) + D(f)|(1,2)((0.01, 0)ᵀ) by the fundamental property of the derivative.

Hint:

3.04 ≈ 3 + D(f)|(1,2)((0.01, 0)ᵀ)

0.04 ≈ 0.01 D(f)|(1,2)((1, 0)ᵀ)    by the linearity of the derivative

D(f)|(1,2)((1, 0)ᵀ) ≈ 4

Hint:

f(1, 2.002) ≈ f(1, 2) + D(f)|(1,2)((0, 0.002)ᵀ)

3.002 ≈ 3 + D(f)|(1,2)((0, 0.002)ᵀ)

0.002 ≈ 0.002 D(f)|(1,2)((0, 1)ᵀ)    by the linearity of the derivative

D(f)|(1,2)((0, 1)ᵀ) ≈ 1

Hint: Thus the matrix of D(f)|(1,2) is (4 1).

What is the Jacobian matrix of f at (1, 2)?

Solution

Hint: f(0.9, 2.03) ≈ f(1, 2) + D(f)|(1,2)((−0.1, 0.03)ᵀ)

Hint: = 3 + (4 1)(−0.1, 0.03)ᵀ

Hint: = 3 + 4(−0.1) + 1(0.03) = 2.63

Using your approximation of the Jacobian matrix, f(0.9, 2.03) ≈ 2.63

This problem shows that if a function has a derivative, then only knowing how it changes in the coordinate directions lets you determine how it changes in any direction. This is so important it is worth driving it home: we only started with information about how f(1.01, 2) and f(1, 2.002) compared to f(1, 2), but because this function had a derivative, we could obtain the approximate value of the function at any nearby point by exploiting linearity. This is powerful.

Prepare yourself: the following two paragraphs are going to be very difficult to digest.

So far we have only talked about the derivative of a function at a point. The derivative of a function is actually a function which assigns a linear map to each point in the domain of the original function. So the derivative is a function which takes (functions from Rⁿ → Rᵐ) and returns a function which takes (points in Rⁿ) and returns (linear maps from Rⁿ → Rᵐ). This level of abstraction is why we wanted you to get comfortable with "higher-order functions" earlier. We are not as crazy as we seem.

As an example, if f : R² → R² is the function defined by f(x, y) = (x²y, y + x), then it will turn out that at any point (a, b), the derivative Df|(a,b) will be the linear map from R² to R² given by the matrix

[ 2ab  a² ]
[  1    1 ]

(we do not know why yet, but this is true). So Df is really a function which takes a point (a, b) and spits out the linear map with matrix [2ab a²; 1 1]. So what about just plain old D? D takes a function (f) and returns the function Df, which takes a point (a, b) and returns the linear map whose matrix is [2ab a²; 1 1]. Letting L(A, B) stand for all the linear functions from A → B and Func(A, B) be the set of all functions from A → B, we could write D : Func(Rⁿ, Rᵐ) → Func(Rⁿ, L(Rⁿ, Rᵐ)).

Please do not give up on the course after the last two paragraphs! Everything

is going to be okay. Hopefully you will be able to slowly digest these statements

throughout the course. Not understanding them now will not hold you back.

Question 4  Let f be a function which satisfies

Df|(x,y) = [ 3x²y²     2x³y    ]
           [ 2x        2y      ]
           [ ye^(xy)   xe^(xy) ]

Solution

Hint:

D(f)|(1,2) = [ 3(1)²(2)²     2(1)³(2)     ]
             [ 2(1)          2(2)         ]
             [ 2e^((1)(2))   1e^((1)(2))  ]

Hint:

f((1, 2) + (0.01, −0.02)ᵀ) ≈ f(1, 2) + [ 12   4  ]
                                       [ 2    4  ] (0.01, −0.02)ᵀ
                                       [ 2e²  e² ]

Hint: f(1.01, 1.98) ≈ (2, 3, 1) + (12(0.01) + 4(−0.02), 2(0.01) + 4(−0.02), 2e²(0.01) + e²(−0.02))ᵀ

Hint: f(1.01, 1.98) ≈ (2.04, 2.94, 1)

Hint: Format this as (2.04, 2.94, 1)ᵀ

Given that f(1, 2) = (2, 3, 1), approximate f(1.01, 1.98).


28 Rigorously

The derivative approximates the changes in a function to first order accuracy.

We are now ready to define the derivative rigorously. Mimicking our development of the single variable derivative, we define:

Definition 1  Let f : Rⁿ → Rᵐ be a function, and let p ∈ Rⁿ. f is said to be differentiable at p if there is a linear map M : Rⁿ → Rᵐ such that

f(p + h) = f(p) + M(h) + Error_p(h)

with

lim_{h→0} |Error_p(h)|/|h| = 0.

If f is differentiable at p, there is only one such linear map M, which we call the (total) derivative of f at p.

Verbally, M is the linear function which makes the error between the function value f(p + h) and the affine approximation f(p) + M(h) go to zero "faster than h" does.

This definition is great, but it doesn't tell us how to actually compute the derivative of a differentiable function! Let's dig a little deeper:

Example 2  Let f : R² → R² be defined by f((x, y)) = (f₁(x, y), f₂(x, y)). Assuming f is differentiable at the point (1, 2), let's try to compute the derivative there. Let M be the derivative of f at (1, 2). Then

lim_{h→0} |f((1, 2) + h(1, 0)) − f((1, 2)) − M(h(1, 0))| / |h(1, 0)| = 0

lim_{h→0} |f(1 + h, 2) − f(1, 2) − hM((1, 0))| / |h| = 0

lim_{h→0} |(f(1 + h, 2) − f(1, 2))/h − M((1, 0))| = 0

so

M((1, 0)) = lim_{h→0} (f(1 + h, 2) − f(1, 2))/h

M((1, 0)) = lim_{h→0} ( (f₁(1 + h, 2) − f₁(1, 2))/h , (f₂(1 + h, 2) − f₂(1, 2))/h )

But each of the remaining quantities is the derivative of a one variable function! In particular, we have that

M((1, 0)) = ( d/dx(f₁(x, 2))|x=1 , d/dx(f₂(x, 2))|x=1 )

We call these kinds of quantities partial derivatives because they are part of the derivative. We will learn more about partial derivatives in the next section.

Without copying the work in the example above (if you can), try to find M((0, 1)).

lim_{h→0} |f((1, 2) + h(0, 1)) − f(1, 2) − M(h(0, 1))| / |h(0, 1)| = 0

lim_{h→0} |f(1, 2 + h) − f(1, 2) − hM((0, 1))| / |h| = 0

lim_{h→0} |(f(1, 2 + h) − f(1, 2))/h − M((0, 1))| = 0

so

M((0, 1)) = lim_{h→0} (f(1, 2 + h) − f(1, 2))/h

M((0, 1)) = lim_{h→0} ( (f₁(1, 2 + h) − f₁(1, 2))/h , (f₂(1, 2 + h) − f₂(1, 2))/h )

M((0, 1)) = ( d/dy(f₁(1, y))|y=2 , d/dy(f₂(1, y))|y=2 )

This question and the previous example show that the matrix of the derivative of f at (1, 2) is

[ d/dx(f₁(x, 2))|x=1    d/dy(f₁(1, y))|y=2 ]
[ d/dx(f₂(x, 2))|x=1    d/dy(f₂(1, y))|y=2 ]

Question 3  Use the results of this question and the previous example to find the matrix of the derivative of f((x, y)) = (x² + y², xy) at the point (1, 2).

Solution

Hint: In this case f₁(x, y) = x² + y² and f₂(x, y) = xy.

Hint: By the result of the last two exercises, the matrix of the derivative is

[ d/dx(f₁(x, 2))|x=1    d/dy(f₁(1, y))|y=2 ]
[ d/dx(f₂(x, 2))|x=1    d/dy(f₂(1, y))|y=2 ]

Hint:

f₁(x, 2) = x² + 2²
f₁(1, y) = 1² + y²
f₂(x, 2) = 2x
f₂(1, y) = y

Hint:

d/dx(f₁(x, 2))|x=1 = d/dx(x² + 2²)|x=1 = 2x|x=1 = 2
d/dx(f₂(x, 2))|x=1 = d/dx(2x)|x=1 = 2
d/dy(f₁(1, y))|y=2 = d/dy(1² + y²)|y=2 = 2y|y=2 = 4
d/dy(f₂(1, y))|y=2 = d/dy(y)|y=2 = 1

Hint: Thus the matrix of the derivative is

[ 2 4 ]
[ 2 1 ]
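The answer can be double-checked with central finite differences (a Python sketch; the step h = 10⁻⁶ is an arbitrary small choice):

```python
def f(x, y):
    return (x**2 + y**2, x * y)

def numeric_jacobian(f, x, y, h=1e-6):
    # central difference quotients in each input variable
    fx = [(a - b) / (2 * h) for a, b in zip(f(x + h, y), f(x - h, y))]
    fy = [(a - b) / (2 * h) for a, b in zip(f(x, y + h), f(x, y - h))]
    # rows are component functions, columns are input variables
    return [[fx[0], fy[0]],
            [fx[1], fy[1]]]

J = numeric_jacobian(f, 1.0, 2.0)
expected = [[2.0, 4.0],
            [2.0, 1.0]]
assert all(abs(J[i][j] - expected[i][j]) < 1e-6
           for i in range(2) for j in range(2))
```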

29 Partial Derivatives

The entries in the Jacobian matrix are partial derivatives.¹

There is a familiar looking formula for the derivative of a differentiable function:

Theorem 1  Let f : Rⁿ → Rᵐ be a differentiable function. Then

D(f)|p(v) = lim_{h→0} (f(p + hv) − f(p))/h

Prove this theorem.

By the definition of the derivative, we have that

lim_{h→0} |f(p + hv) − f(p) − Df(p)(hv)| / |hv| = 0

(1/|v|) lim_{h→0} |(f(p + hv) − f(p) − hDf(p)(v))/h| = 0    since 1/|v| is a constant

lim_{h→0} |(f(p + hv) − f(p))/h − Df(p)(v)| = 0    since 1/|v| is a nonzero constant

We conclude that

D(f)|p(v) = lim_{h→0} (f(p + hv) − f(p))/h

Question 2  Let f : R² → R² be defined by f(x, y) = (x² − y², 2xy).

Solution

Hint:

Df|(3,4)((−1, 2)) = lim_{h→0} ( f((3, 4) + h(−1, 2)) − f(3, 4) )/h
 = lim_{h→0} ( f(3 − h, 4 + 2h) − f(3, 4) )/h
 = lim_{h→0} (1/h) [ ((3 − h)² − (4 + 2h)², 2(3 − h)(4 + 2h)) − (−7, 24) ]
 = lim_{h→0} (1/h) ( (3 − h)² − (4 + 2h)² + 7 , 2(3 − h)(4 + 2h) − 24 )

Hint:

 = lim_{h→0} ( (−22h − 3h²)/h , (4h − 4h²)/h )
 = lim_{h→0} ( −22 − 3h , 4 − 4h )
 = (−22, 4)

Using the theorem above, compute Df|(3,4)((−1, 2)).

¹ YouTube link: http://www.youtube.com/watch?v=HCAb4uUZzjU
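The limit in the hint can be approximated with a small h instead of computed symbolically. A Python sketch of the same directional derivative:

```python
def f(x, y):
    return (x**2 - y**2, 2 * x * y)

def directional(f, p, v, h=1e-6):
    # forward difference quotient (f(p + hv) - f(p)) / h
    plus = f(p[0] + h * v[0], p[1] + h * v[1])
    base = f(p[0], p[1])
    return [(a - b) / h for a, b in zip(plus, base)]

d = directional(f, (3.0, 4.0), (-1.0, 2.0))

# agrees with the symbolic answer (-22, 4) up to O(h) error
assert abs(d[0] - (-22.0)) < 1e-4
assert abs(d[1] - 4.0) < 1e-4
```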

Since the unit directions are especially important, we define:

Definition 3  Let f : Rⁿ → R be a (not necessarily differentiable) function. We define its partial derivative with respect to xᵢ by

∂f/∂xᵢ(p) = f_{xᵢ}(p) := lim_{h→0} (f(p + heᵢ) − f(p))/h

In other words, ∂f/∂xᵢ(p) is the instantaneous rate of change in f obtained by moving only in the eᵢ direction.

Example 4  There is really only a good visualization of the partial derivatives of a map f : R² → R, because this is really the only type of higher dimensional function we can effectively graph.

Computing partial derivatives is no harder than computing derivatives of single variable functions. You take a partial derivative of a function with respect to xᵢ just by treating all the other variables as constants, and taking the derivative with respect to xᵢ.

Question 5 Let f : R^2 → R be defined by f(x, y) = x sin(y).

Solution

Hint: We are trying to compute ∂/∂x (x sin(y)) |_(a,b)

Hint: We just differentiate as if y were a constant, so

∂/∂x (x sin(y)) |_(a,b) = sin(y) |_(a,b)

Hint: fx(a, b) = sin(b)

fx(a, b) = sin(b)

Solution

Hint: We are trying to compute ∂/∂y (x sin(y)) |_(a,b)


Hint: We just differentiate as if x were a constant, so

∂/∂y (x sin(y)) |_(a,b) = x cos(y) |_(a,b)

Hint: fy(a, b) = a cos(b)

fy(a, b) = a cos(b)
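As a sanity check, the partials just computed can be compared against difference quotients; a quick sketch at the sample point (a, b) = (2, 1), a point of our own choosing:

```python
import math

def f(x, y):
    return x * math.sin(y)

h = 1e-6
a, b = 2.0, 1.0
fx = (f(a + h, b) - f(a, b)) / h    # should be close to sin(b)
fy = (f(a, b + h) - f(a, b)) / h    # should be close to a*cos(b)
```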

We have already proven the following theorem in the special case n = m = 2

in the previous activity. Proving it in the general case requires no new ideas: only

better notational bookkeeping.

Theorem 6 Let f : R^n → R^m be a function with component functions f_i : R^n → R,
for i = 1, 2, 3, ..., m. In other words, f(p) = ( f_1(p), f_2(p), f_3(p), ..., f_m(p) ).
If f is differentiable at p, then its Jacobian matrix at p is

[ ∂f_1/∂x_1 (p)   ∂f_1/∂x_2 (p)   ...   ∂f_1/∂x_n (p) ]
[ ∂f_2/∂x_1 (p)   ∂f_2/∂x_2 (p)   ...   ∂f_2/∂x_n (p) ]
[      ...             ...                   ...       ]
[ ∂f_m/∂x_1 (p)   ∂f_m/∂x_2 (p)   ...   ∂f_m/∂x_n (p) ]

More compactly, we might write [ ∂f_i/∂x_j (p) ].

Try to prove this theorem. Using the more compact notation will be helpful.
Follow along with the proof we developed together in the last section! Writing M
for the matrix of Df(p), the definition of the derivative gives

lim_{h→0} | f(p + h e_i) − f(p) − M(h e_i) | / |h e_i| = 0

lim_{h→0} | f(p + h e_i) − f(p) − h M(e_i) | / |h| = 0

lim_{h→0} | ( f(p + h e_i) − f(p) − h M(e_i) ) / h | = 0

lim_{h→0} | ( f(p + h e_i) − f(p) ) / h − M(e_i) | = 0

So lim_{h→0} ( f(p + h e_i) − f(p) ) / h = M(e_i). But for this to be true, the j-th
entry of each side must be equal, so

lim_{h→0} ( f_j(p + h e_i) − f_j(p) ) / h = M_{ji}

But the quantity on the left-hand side is ∂f_j/∂x_i |_p.

Question 7 Let f : R^3 → R^2 be defined by f(x, y, z) = (x^2 + y + z^3, xy + yz^2).

Solution

Hint: The Jacobian matrix is

[ ∂f1/∂x   ∂f1/∂y   ∂f1/∂z ]
[ ∂f2/∂x   ∂f2/∂y   ∂f2/∂z ]

Hint: As an example, ∂f2/∂z = ∂/∂z (xy + yz^2) = 2yz. Remember that we just
differentiate with respect to z, treating x and y as constants.

Hint:

∂f1/∂x = 2x      ∂f1/∂y = 1          ∂f1/∂z = 3z^2
∂f2/∂x = y       ∂f2/∂y = x + z^2    ∂f2/∂z = 2yz

Hint: The Jacobian matrix is

[ 2x   1         3z^2 ]
[ y    x + z^2   2yz  ]

What is the Jacobian matrix of f? This should be a matrix-valued function of x, y, z.
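The answer can be double-checked with finite differences. A sketch (the throwaway `jacobian` helper here is our own, not yet the one built in the later Python activities):

```python
def f(p):
    x, y, z = p
    return [x * x + y + z**3, x * y + y * z * z]

def jacobian(f, p, h=1e-6):
    # forward-difference approximation to the matrix of partials
    fp = f(p)
    J = []
    for i in range(len(fp)):
        row = []
        for j in range(len(p)):
            q = list(p)
            q[j] += h
            row.append((f(q)[i] - fp[i]) / h)
        J.append(row)
    return J

# at (1, 2, 3) the matrix [[2x, 1, 3z^2], [y, x + z^2, 2yz]] is [[2, 1, 27], [2, 10, 12]]
J = jacobian(f, [1.0, 2.0, 3.0])
```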

The formula for the derivative

Df(p)(v) = lim_{h→0} ( f(p + hv) − f(p) ) / h

looks a lot more familiar than our definition. You might be asking why we didn't
take this formula as our definition of the derivative. After all, we usually take
something that looks like this as our definition in single-variable calculus.

In the following two optional exercises you will find out why.

Find a function f : R^2 → R such that at (0, 0), the limit

M(v) = lim_{h→0} ( f((0, 0) + hv) − f(0, 0) ) / h

exists for every vector v ∈ R^2, but M is not a linear map.

Hint: Try showing that the function

f(x, y) = { x^3 / (x^2 + y^2)   if (x, y) ≠ (0, 0)
          { 0                   if (x, y) = (0, 0)

has the desired properties.

Let

f(x, y) = { x^3 / (x^2 + y^2)   if (x, y) ≠ (0, 0)
          { 0                   if (x, y) = (0, 0)

Then for any vector v = (a, b), we have

M(v) = lim_{h→0} ( f((0, 0) + hv) − f(0, 0) ) / h
     = lim_{h→0} f(ha, hb) / h
     = lim_{h→0} h^3 a^3 / ( h (h^2 a^2 + h^2 b^2) )
     = lim_{h→0} a^3 / (a^2 + b^2)
     = a^3 / (a^2 + b^2)

So M( (a, b) ) = a^3 / (a^2 + b^2). This is certainly not a linear function from
R^2 to R.

So this formula cannot serve as a good definition of the derivative, because it
does not have to produce linear functions. What if we require that the function
is linear as well? Even then, it is no good:

Find a function f : R^2 → R such that at (0, 0),

M(v) = lim_{h→0} ( f((0, 0) + hv) − f(0, 0) ) / h

exists for every vector v ∈ R^2 and the function M defined this way is linear, but
nevertheless, f is not differentiable at (0, 0).

Hint: Try the function

f(x, y) = { 1   if y = x^2 and (x, y) ≠ (0, 0)
          { 0   otherwise

Let

f(x, y) = { 1   if y = x^2 and (x, y) ≠ (0, 0)
          { 0   otherwise

Let v = (a, b). Then

M(v) = lim_{h→0} ( f((0, 0) + hv) − f(0, 0) ) / h = lim_{h→0} f(ha, hb) / h

Now, the intersection of the line t ↦ (ta, tb) with the parabola y = x^2 happens
when tb = t^2 a^2, i.e. at t = 0 and (when a and b are both nonzero) at t = b/a^2.
So as long as we choose |h| smaller than |b|/a^2, we know that (ha, hb) is not on
the parabola y = x^2 (and when a or b is zero, the line misses the parabola away
from the origin anyway). Hence f(ha, hb) = 0 for small enough h.

Thus M(v) = 0 for all v ∈ R^2.

This definitely is a linear function, but f is not differentiable at (0, 0) using our
definition, since (writing h for a vector in R^2)

lim_{h→0} | f((0, 0) + h) − f(0, 0) − M(h) | / |h| = lim_{h→0} |f(h)| / |h|

does not exist, since taking h = (t, t^2) on the parabola y = x^2 yields the limit

lim_{t→0} | f(t, t^2) / t | = lim_{t→0} 1/|t|,

which diverges to ∞.
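A numeric illustration of this failure (a sketch; the sample direction (1, 3) is an arbitrary choice of ours):

```python
import math

def f(x, y):
    # 1 on the parabola y = x^2 (away from the origin), 0 elsewhere
    if y == x * x and (x, y) != (0.0, 0.0):
        return 1.0
    return 0.0

# Along the fixed direction v = (1, 3), f(hv)/h is 0 for small h, so M(v) = 0:
quotients = [f(h, 3 * h) / h for h in [0.1, 0.01, 0.001]]

# But along h_vec = (t, t^2), |f(h_vec)| / |h_vec| = 1/sqrt(t^2 + t^4) blows up:
ratios = [f(t, t * t) / math.hypot(t, t * t) for t in [0.1, 0.01, 0.001]]
```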

30 The gradient

The gradient is a vector version of the derivative.

In this section, we will focus on gaining a more "geometric" understanding of
derivatives of functions f : R^n → R.

If f is such a function, the derivative Df(p) : R^n → R is a covector. So, by the
definition of the dot product, we can reinterpret that derivative as the dot product
with the fixed vector Df(p)^T, the transpose of its row matrix.

Definition 1 The gradient of a differentiable function f : R^n → R is defined
by ∇f(p) = Df(p)^T. Equivalently, ∇f(p) is the (unique) vector which makes the
following equation true for all v ∈ R^n:

∇f(p) · v = Df(p)(v)

Question 2 Solution

Hint: ∇f(x, y, z) = [ ∂/∂x sin(xyz^2) ]
                    [ ∂/∂y sin(xyz^2) ]
                    [ ∂/∂z sin(xyz^2) ]

Hint: ∇f(x, y, z) = [ yz^2 cos(xyz^2)  ]
                    [ xz^2 cos(xyz^2)  ]
                    [ 2xyz cos(xyz^2)  ]

If f : R^3 → R is defined by f(x, y, z) = sin(xyz^2), what is ∇f(x, y, z)?

We can now use what we know about the geometry of the dot product to
understand some interesting things about the derivative.

In a sentence, how does the vector v/|v| relate to the vector v?

v/|v| is the unit vector which points in the same direction as v.

Theorem 3 Let f : R^n → R, and p ∈ R^n. Let η = ∇f(p) / |∇f(p)|.
If |v| = 1, then Df(p)(v) ≤ Df(p)(η).

More geometrically, this theorem says that ∇f(p) points in the direction of
"greatest increase" for the function f. More poetically, ∇f always points "straight
up the mountain".

Prove this theorem.

Df(p)(v) = ∇f(p) · v
         ≤ |v| |∇f(p)|    by Cauchy-Schwarz
         = |∇f(p)|        since |v| = 1

On the other hand,

Df|_p (η) = ∇f(p) · η
          = ∇f(p) · ∇f(p) / |∇f(p)|
          = |∇f(p)|^2 / |∇f(p)|
          = |∇f(p)|

The inequality follows.
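The theorem can also be seen numerically: among many random unit vectors, none beats the normalized gradient direction. A sketch using difference quotients (the sample function x^2 + xyz and the point are our own choices):

```python
import math
import random

def f(p):
    x, y, z = p
    return x * x + x * y * z

def grad(f, p, h=1e-6):
    # central-difference approximation to the gradient
    g = []
    for i in range(len(p)):
        hi = list(p); hi[i] += h
        lo = list(p); lo[i] -= h
        g.append((f(hi) - f(lo)) / (2 * h))
    return g

def directional(f, p, v, h=1e-6):
    # difference quotient of f at p in the direction v
    q = [pi + h * vi for pi, vi in zip(p, v)]
    return (f(q) - f(p)) / h

p = [1.0, 2.0, 3.0]
g = grad(f, p)                               # close to (2x + yz, xz, xy) = (8, 3, 2)
glen = math.sqrt(sum(gi * gi for gi in g))   # |grad f(p)|
eta = [gi / glen for gi in g]

best = directional(f, p, eta)                # close to |grad f(p)|
random.seed(0)
ok = True
for _ in range(200):
    v = [random.gauss(0, 1) for _ in range(3)]
    vlen = math.sqrt(sum(x * x for x in v))
    v = [x / vlen for x in v]
    if directional(f, p, v) > best + 1e-3:
        ok = False
```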

One of the ways that we learned to visualize functions was via contour plots.
We will see that there is a very nice relationship between contours and gradient
vectors.

Let's start with the two-dimensional case. Let f : R^2 → R, and consider the
contour C = {(x, y) ∈ R^2 : f(x, y) = c} for some c ∈ R.

Question 4 Let v be a tangent vector to C at a point p ∈ C.

Solution

Hint: Since v is pointing in the direction of a level curve of f, f should not be changing
as you move in the direction v.

Hint: So Df(p)(v) = 0.

Df(p)(v) = 0

Note: You should be able to answer this from an intuitive point of view, but we will
not develop the formal tool to prove this (the implicit function theorem[1]) in this
course.

In general, if f : R^n → R, and C = {p ∈ R^n : f(p) = c} is a contour for some
c ∈ R, then for every tangent vector v to C at a point p, we will have Df(p)(v) = 0.
Intuitively this is true because moving a small amount in the direction of v will not
change the value of the function much, since you are staying as close as possible to
the contour, where the function is constant. Accepting this, we have the following:

Theorem 5 If f : R^n → R, and C = {p ∈ R^n : f(p) = c} is a contour for some
c ∈ R, then for every tangent vector v to C at a point p, we will have ∇f(p) · v = 0.
In other words, ∇f(p) is perpendicular to the contour.

Question 6 Write an equation for the tangent plane to the surface
x^2 + xy + 4z^2 = 1 at the point (1, 0, 0).

Solution

Hint: Our general strategy will be to find a vector which is perpendicular to the plane.
Writing down what that means in terms of dot products should yield the equation.

[1] http://en.wikipedia.org/wiki/Implicit_function_theorem

Hint: This surface is a level surface of f(x, y, z) = x^2 + xy + 4z^2, namely the
surface f(x, y, z) = 1.

Hint: ∇f = [ ∂f/∂x ]
           [ ∂f/∂y ]
           [ ∂f/∂z ]

Hint: ∇f = [ 2x + y ]
           [ x      ]
           [ 8z     ]

Hint: ∇f(1, 0, 0) = [ 2 ]
                    [ 1 ]
                    [ 0 ]

Hint: For a point (x, y, z) to be in the tangent plane, we would need that
(x, y, z) − (1, 0, 0) is perpendicular to ∇f(1, 0, 0).

Hint: So we need

[ x − 1 ]   [ 2 ]
[ y     ] · [ 1 ] = 0
[ z     ]   [ 0 ]

Hint: This says that the equation of the plane is 2x − 2 + y = 0.

2x + y − 2 = 0
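A numeric check of the normal vector computed above (a sketch; `grad` here is a throwaway central-difference helper of our own):

```python
def f(p):
    x, y, z = p
    return x * x + x * y + 4 * z * z

def grad(f, p, h=1e-6):
    g = []
    for i in range(len(p)):
        hi = list(p); hi[i] += h
        lo = list(p); lo[i] -= h
        g.append((f(hi) - f(lo)) / (2 * h))
    return g

n = grad(f, [1.0, 0.0, 0.0])    # close to (2, 1, 0)

# Moving from (1, 0, 0) along directions perpendicular to n, such as (1, -2, 0)
# and (0, 0, 1), changes f only to second order:
t = 1e-4
df1 = f([1.0 + t, -2.0 * t, 0.0]) - f([1.0, 0.0, 0.0])
df2 = f([1.0, 0.0, t]) - f([1.0, 0.0, 0.0])
```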

31 One forms

One forms are covector fields.

In this section we just want to introduce you to some new notation and terminol-
ogy which will be helpful to keep in mind for the next course, which will cover
multivariable integration theory.

As we observed in the last section, the derivative of a function f : R^n → R
assigns a covector to each point in R^n. In particular, Df|_p : R^n → R is the
covector whose matrix is the row

[ ∂f/∂x_1 |_p   ∂f/∂x_2 |_p   ...   ∂f/∂x_n |_p ]

Definition 1 A covector field, also known as a differential 1-form, is a function
which takes points in R^n and returns a covector on R^n. In other words, it is a
covector-valued function. We can always write any covector field ω as

ω(x) = [ f_1(x)   f_2(x)   ...   f_n(x) ]

for n functions f_i : R^n → R.

The derivative of a function f : R^n → R is the quintessential example of a
1-form on R^n.

Question 2 Let f : R^3 → R be the function f(x, y, z) = y.

Solution

Hint: The Jacobian of f is [ 0  1  0 ] everywhere.

What is the matrix for Df at the point (a, b, c)?

Generalizing the result of the previous question, we see that if π_i : R^n → R is
defined by π_i(x_1, x_2, ..., x_n) = x_i, then D(π_i) will be the row
[ 0  0  ...  0  1  0  ...  0 ], where the 1 appears in the i-th slot.

We introduce the notation dx_i for the covector field D(π_i). So we can rewrite
any covector field ω(x) = [ f_1(x)  f_2(x)  ...  f_n(x) ] in the form

ω(x) = f_1(x) dx_1 + f_2(x) dx_2 + ... + f_n(x) dx_n.

It turns out that 1-forms, not functions, are the appropriate objects to integrate
along curves in R^n. The sequel to this course will focus on the integration of
differential forms: we will not touch on it in this course.
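As a concrete sketch, here is how a 1-form might look in code, using the convention the next section adopts (a point as a list, a covector as a 1 × n row matrix); the particular ω = 2xy dx + x^2 dy, which is Df for f(x, y) = x^2 y, is our own example:

```python
# omega(p) returns the row matrix [[f1(p), f2(p)]] of the covector at p.
def omega(p):
    x, y = p
    return [[2 * x * y, x * x]]   # 2xy dx + x^2 dy

row = omega([3.0, 2.0])           # [[12.0, 9.0]]
```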

32 Numerical integration

Integrate a covector field.

Exercise 1 Suppose we have a one-form expressed as a Python function, e.g.,
omega (which we will often write as ω), which takes a point (expressed as a list)
and returns a 1 × n matrix. For example, perhaps we have that omega([7,2,5])
is [[5,3,2]].

Let's only consider the case n = 2, and suppose that ω is the derivative of
some mystery function f : R^2 → R. If we have access to omega and we know that
f(3, 2) = 5, can we approximate f(4, 3)? How might we go about this?

We can take a path from (3, 2) to (4, 3) and break it up into small pieces; on
each piece, we can use the derivative to approximate how a small change to the
input will affect the output. And repeat.

Do this in Python.

Solution

Python

# suppose the derivative of f is omega, and f(3,2) = 5.
# so omega([3,2]) is (perhaps) [[-4,3]].
#
# integrate(omega) is an approximation to the value of f at (4,3).
#
def integrate(omega):
    return # the value

def validator():
    return abs(integrate( lambda p: [[2*p[0] - p[1], -p[0] + 1]] ) - 7.0) < 0.05

How did you move from (3, 2) to (4, 3)? Did it matter which path you walked
along?
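One possible way to fill in the stub above (a sketch, not the official solution): walk the straight line from (3, 2) to (4, 3) in N small steps, using ω at each step to estimate the change in f.

```python
def integrate(omega, start=(3.0, 2.0), end=(4.0, 3.0), f_start=5.0, N=1000):
    x, y = start
    dx = (end[0] - start[0]) / N
    dy = (end[1] - start[1]) / N
    value = f_start
    for _ in range(N):
        row = omega([x, y])[0]   # the row [df/dx, df/dy] at the current point
        value += row[0] * dx + row[1] * dy
        x += dx
        y += dy
    return value

# the validator's omega is the derivative of f(x, y) = x^2 - x*y + y, and
# f(4, 3) = 16 - 12 + 3 = 7
approx = integrate(lambda p: [[2 * p[0] - p[1], -p[0] + 1]])
```

Since ω is the derivative of a function, the answer should not depend (up to approximation error) on which path we walk; trying a different path is a good experiment.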

33 Python

We approximate derivatives in Python.

There are two different perspectives on the derivative available to us.

Suppose f : R^n → R^m is a differentiable function, and we have a point p ∈ R^n.

The first perspective on the derivative is the total derivative, which is the
linear map Df(p) which sends the vector v to Df(p)(v), recording how much an
infinitesimal change in the v direction in R^n will affect the output of f in R^m.

The second perspective on the derivative is the Jacobian matrix, which is the
matrix of partials given by

[ ∂f_1/∂x_1 (p)   ∂f_1/∂x_2 (p)   ...   ∂f_1/∂x_n (p) ]
[ ∂f_2/∂x_1 (p)   ∂f_2/∂x_2 (p)   ...   ∂f_2/∂x_n (p) ]
[      ...             ...                   ...       ]
[ ∂f_m/∂x_1 (p)   ∂f_m/∂x_2 (p)   ...   ∂f_m/∂x_n (p) ]

Observation 1 The Jacobian matrix is the matrix representing the linear map
Df(p).

This observation can be "seen" with some Python code.

34 Derivative

Code the total derivative.

Exercise 1 Let epsilon be a small, but positive, number. Suppose f : R → R
has been coded as a Python function f which takes a real number and returns a
real number. Seeing as

f'(x) = lim_{h→0} ( f(x + h) − f(x) ) / h,

can you find a Python function which approximates f'(x)?

Given a Python function f which takes a real number and returns a real number,
we can approximate f'(x) by using epsilon. Write a Python function derivative
which takes a function f and returns an approximation to its derivative.

Solution

Hint: To approximate this, use (f(x+epsilon) - f(x))/epsilon.

Python

epsilon = 0.0001
def derivative(f):
    def df(x): return (f(blah blah) - f(blah blah)) / blah blah
    return df

def validator():
    df = derivative(lambda x: 1+x**2+x**3)
    if abs(df(2) - 16) > 0.01:
        return False
    df = derivative(lambda x: (1+x)**4)
    if abs(df(-2.642) - -17.708405152) > 0.01:
        return False
    return True
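One way to fill in the blah blahs, following the hint (a sketch):

```python
epsilon = 0.0001

def derivative(f):
    # forward-difference approximation to f'
    def df(x):
        return (f(x + epsilon) - f(x)) / epsilon
    return df

df = derivative(lambda x: 1 + x**2 + x**3)
# df(2) is close to f'(2) = 2*2 + 3*2**2 = 16
```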

Great work! Now let's do this in a multivariable setting.

A function f : R^n → R^m should be stored as a Python function which takes a
list with n entries and returns a list with m entries.

Solution Implement f(x, y, z) = (xy, x + z) as a Python function.

Hint: You can get away with

def f(v):
    return [v[0]*v[1], v[0] + v[2]]

Python

def f(v):
    x = v[0]
    y = v[1]
    z = v[2]
    return # such and such

def validator():
    if f([3,2,7])[0] != 6:
        return False
    if f([3,2,7])[1] != 10:
        return False
    return True

Now we provide you with a function add_vector which takes two vectors v and
w and returns v + w, and a function scale_vector which takes a scalar c and a
vector v and returns the vector cv. Finally, vector_length(v) computes the length
of the vector v.

Given all of this preamble, write a function D which takes a Python function
f : R^n → R^m and returns the function Df : R^n → Lin(R^n, R^m) which takes a
point p ∈ R^n and returns (an approximation to) the linear map Df(p) : R^n → R^m.

Solution

Hint:

def D(ff):
    def Df(p):
        f = ff # band-aid over a Python interpreter bug
        def L(v):
            return scale_vector( 1/(epsilon),
                add_vector( f(add_vector(p, scale_vector(epsilon, v))),
                            scale_vector(-1, f(p)) ) )
        return L
    return Df

Python

epsilon = 0.0001
n = 3
m = 2
def add_vector(v,w):
    return [sum(pair) for pair in zip(v,w)]
def scale_vector(c,v):
    return [c*x for x in v]
def vector_length(v):
    return sum([x**2 for x in v])**0.5
def D(ff):
    def Df(p):
        f = ff # band-aid over a Python interpreter bug
        def L(v):
            # Try "f(p + blah blah) - f(p)" and so on...
            return # the vector L(v), where L approximates Df(p)
        return L
    return Df

def validator():
    # f(x,y,z) = (3*x^2 + 2*x*y*z, x*y^3*z^2)
    Df = D(lambda v: [3*v[0]*v[0] + 2*v[0]*v[1]*v[2], v[0]*(v[1]**3)*(v[2]**2)])
    Dfp = Df([2,3,4])
    Dfpv = Dfp([3,2,1])
    if abs(Dfpv[0] - 152) > 1.0:
        return False
    if abs(Dfpv[1] - 3456) > 10.0:
        return False
    return True

Note that Df(p) is a linear map, so we can represent that linear map as a matrix.
We do so in the next activity.
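Assembled into a self-contained, runnable form, the hint's D looks like this (a sketch; the "band-aid" line is unnecessary outside the course's sandbox, so it is dropped here):

```python
epsilon = 0.0001

def add_vector(v, w):
    return [a + b for a, b in zip(v, w)]

def scale_vector(c, v):
    return [c * x for x in v]

def D(f):
    def Df(p):
        def L(v):
            # (f(p + epsilon*v) - f(p)) / epsilon
            return scale_vector(1 / epsilon,
                                add_vector(f(add_vector(p, scale_vector(epsilon, v))),
                                           scale_vector(-1, f(p))))
        return L
    return Df

# f(x, y, z) = (3x^2 + 2xyz, x y^3 z^2), as in the validator above
Df = D(lambda v: [3 * v[0] * v[0] + 2 * v[0] * v[1] * v[2],
                  v[0] * (v[1]**3) * (v[2]**2)])
Dfpv = Df([2.0, 3.0, 4.0])([3.0, 2.0, 1.0])   # close to [152, 3456]
```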

35 Jacobian matrix

Code the Jacobian matrix.

In the previous activity, we wrote some code to compute D. Armed with this, we
can take a function f : R^n → R^m and a point p ∈ R^n and compute Df(p), the
linear map which describes, infinitesimally, how wiggling the input to f will affect
its output.

Assuming f is differentiable, we have that Df(p) is a linear map, so we can
write down a matrix for it. Let's do so now.

Exercise 1 To get started, we begin by computing partial derivatives in Python.
To make things easy, let's differentiate functions like

def fi(v):
    return v[0] * (v[1]**2)

In other words, our functions will send an n-tuple to a single real number. In this
case, where f_i(x, y) = xy^2, we should have that partial(fi,1)([2,3]) is close to
12, since

∂/∂y ( xy^2 ) = 2xy,

and so at the point (2, 3), the derivative is 2 · 2 · 3 = 12.

Solution

Hint:

def partial(fi,j):
    def derivative(p):
        p_shifted = p[:]
        p_shifted[j] += epsilon
        return (fi(p_shifted) - fi(p))/epsilon
    return derivative

Python

epsilon = 0.0001
n = 2
#
# fi is a function from R^n to R
def partial(fi,j):
    def derivative(p):
        return # the partial derivative of fi in the j-th coordinate at p
    return derivative
#
# this should be close to 12
print partial(lambda v: v[0] * v[1]**2, 1)([2,3])

def validator():
    return abs(partial(lambda v: v[0]**2 * v[1]**3, 0)([7,2]) - 112) < 0.01

If we have a function f : R^n → R^m, we'll encode it as a Python function which
takes a list with n entries and returns a list with m entries. Let's write a Python
helper for pulling out just the i-th component of the output.

Solution

Python

# if f is a function from R^n to R^m,
# then component(f,i) is the function R^n to R,
# which just looks at the i-th entry of the output
#
def component(f,i):
    return lambda p: # the i-th component of the output

def validator():
    return component(lambda v: [v[0],v[1]],1)([1,17]) == 17

Now we put it all together. For a function f : R^n → R^m, the Jacobian matrix
is given by

[ ∂f_1/∂x_1 (p)   ∂f_1/∂x_2 (p)   ...   ∂f_1/∂x_n (p) ]
[ ∂f_2/∂x_1 (p)   ∂f_2/∂x_2 (p)   ...   ∂f_2/∂x_n (p) ]
[      ...             ...                   ...       ]
[ ∂f_m/∂x_1 (p)   ∂f_m/∂x_2 (p)   ...   ∂f_m/∂x_n (p) ]

Solution Implement the function jacobian which takes a function f : R^n → R^m
and a point p ∈ R^n, and returns the Jacobian matrix of f at the point p.

Hint: You can write this matrix as

[[partial(component(f,i),j)(p) for j in range(n)] for i in range(m)]

Python

epsilon = 0.0001
n = 3 # the dimension of the domain
m = 2 # the dimension of the codomain
def component(f,i):
    return lambda p: f(p)[i]
def partial(fi,j):
    def derivative(p):
        p_shifted = p[:]
        p_shifted[j] += epsilon
        return (fi(p_shifted) - fi(p))/epsilon
    return derivative
#
# f is a function from R^n to R^m
# jacobian(f,p) is its Jacobian matrix at the point p
def jacobian(f,p):
    return # the Jacobian matrix

def validator():
    m = jacobian(lambda v: [v[0]**2, (v[1]**3)*(v[0]**2)], [3,7])
    return abs(m[1][0] - 2058) < 0.1

36 Relationship

Relate the Jacobian matrix and the total derivative.

In the previous activities, we wrote some code to compute Df(p) and the Jacobian
matrix of f at the point p. In this activity, we observe that these are related.

Exercise 1 Try running the code below.

Solution

Python

epsilon = 0.0001
n = 3 # the dimension of the domain
m = 2 # the dimension of the codomain
def component(f,i):
    return lambda p: f(p)[i]
def partial(fi,j):
    def derivative(p):
        p_shifted = p[:]
        p_shifted[j] += epsilon
        return (fi(p_shifted) - fi(p))/epsilon
    return derivative
def jacobian(f,p):
    return [[partial(component(f,i),j)(p) for j in range(n)] for i in range(m)]
def add_vector(v,w):
    return [sum(pair) for pair in zip(v,w)]
def scale_vector(c,v):
    return [c*x for x in v]
def vector_length(v):
    return sum([x**2 for x in v])**0.5
def D(ff):
    def Df(p):
        f = ff
        def L(v):
            return scale_vector( 1/(epsilon),
                add_vector( f(add_vector(p, scale_vector(epsilon, v))),
                            scale_vector(-1, f(p)) ) )
        return L
    return Df
#
def dot_product(v,w):
    return sum([x[0] * x[1] for x in zip(v,w)])
def apply_matrix(m,v):
    return [dot_product(row,v) for row in m]
#
f = lambda p: [p[0] + p[0]*p[1], p[1] * p[2]**2]
p = [1,2,0]
v = [2,2,1]
print apply_matrix(jacobian(f,p),v)
print D(f)(p)(v)

def validator():
    # I just want them to try running this
    return True

In this case, we set f(x, y, z) = (x + xy, yz^2), and we computed Df(p)(v) in two
different ways. The two different methods were close, but not exactly the same.

Why are they not exactly the same? The D function is computing Df(p) by
comparing f(p + εv) to f(p).

In contrast, the jacobian function computes Df(p) by computing partial deriva-
tives, so we are not actually computing f(p + εv) in that case, but rather f(p + εe_i)
for various i's.

That the way f changes when wiggling each component separately has anything
to do with what happens to f when we wiggle the inputs together boils down to
the assumption that f be differentiable. This relationship is true infinitesimally,
but here we're working with ε = 0.0001, so it is not surprising that it is not true
on the nose.

37 The Chain Rule

Differentiating a composition of functions is the same as composing the derivatives
of the functions.

The chain rule of single-variable calculus tells you how the derivative of a compo-
sition of functions relates to the derivatives of each of the original functions. The
chain rule of multivariable calculus works analogously.

Question 1 Let f : R^2 → R^3 and g : R^3 → R. The only things you know about f
are that f(2, 3) = (3, 0, 0) and Df(2, 3) has matrix

[ 4  −1 ]
[ 2   3 ]
[ 0   1 ]

The only thing you know about g is that g(3, 0, 0) = 4 and Dg(3, 0, 0) has matrix
[ 4  5  6 ].

Solution

Hint: g(f(2 + a, 3 + b)) ≈ g( f(2, 3) + Df(2, 3)( (a, b) ) ), using the linear approxima-
tion to f at (2, 3).

Hint:

g( f(2, 3) + Df(2, 3)( (a, b) ) ) = g( (3, 0, 0) + (4a − b, 2a + 3b, b) )
  ≈ g(3, 0, 0) + Dg(3, 0, 0)( (4a − b, 2a + 3b, b) )
        (using the linear approximation to g at (3, 0, 0))
  = 4 + [ 4  5  6 ] (4a − b, 2a + 3b, b)
  = 4 + (16a − 4b) + (10a + 15b) + (6b)
  = 4 + 26a + 17b

Assuming a, b ∈ R are small, (g ◦ f)(2 + a, 3 + b) ≈ 4 + 26a + 17b.

Solution So the matrix of D(g ◦ f)(2, 3) is

Hint: Dg|_{f(2,3)} ◦ Df|_{(2,3)} has matrix

            [ 4  −1 ]
[ 4  5  6 ] [ 2   3 ]  =  [ 4(4) + 5(2) + 6(0)   4(−1) + 5(3) + 6(1) ]  =  [ 26  17 ]
            [ 0   1 ]

Notice that D(g ◦ f)(2, 3) is the same as Dg|_{f(2,3)} ◦ Df|_{(2,3)}! You should really
check this to make sure. Look at the hint to see how to do the composition if you
need help.

The heuristic approximations in the last question lead us to expect the following
theorem:

Theorem 2 Let f : R^n → R^m and g : R^m → R^k be differentiable functions, and
p ∈ R^n. Then

D(g ◦ f)(p) = Dg(f(p)) ◦ Df(p)

In other words, the derivative of a composition of functions is the composition
of the derivatives of the functions.

The trickiest part of the above theorem is remembering that you need to apply
Dg at the point f(p).

The proof of this theorem is a little bit beyond what we want to require you to
think about in this course, but the essential idea of the proof is just that

(g ◦ f)(p + h) ≈ g( f(p) + Df(p)(h) ) ≈ g(f(p)) + Dg(f(p))( Df(p)(h) )

You should understand this essential idea, even if you do not understand the
full proof. We cover the full proof in an optional section after this one.

Question 3 Let f : R^2 → R^2 be defined by f(r, t) = (r cos(t), r sin(t)). Let
g : R^2 → R be defined by g(x, y) = x^2 + y^2.

Don't let the choice of variable names scare you.

Solution

Hint:

[ ∂/∂r (r cos(t))   ∂/∂t (r cos(t)) ]
[ ∂/∂r (r sin(t))   ∂/∂t (r sin(t)) ]

Hint:

[ cos(t)   −r sin(t) ]
[ sin(t)    r cos(t) ]

What is the Jacobian of f at (r, t)?

Solution

Hint:

[ ∂/∂x (x^2 + y^2)   ∂/∂y (x^2 + y^2) ]

Hint:

[ 2x   2y ]

What is the Jacobian of g at (x, y)?

Solution

Hint: Just plug (r cos(t), r sin(t)) into the formula for the Jacobian of g you obtained
above.

Hint: The answer is [ 2r cos(t)   2r sin(t) ].

What is the matrix of Dg(f(r, t))?

Solution

Hint:

[ 2r cos(t)   2r sin(t) ] [ cos(t)   −r sin(t) ]
                          [ sin(t)    r cos(t) ]
  = [ 2r cos^2(t) + 2r sin^2(t)   −2r^2 cos(t) sin(t) + 2r^2 sin(t) cos(t) ]
  = [ 2r   0 ]

What is the matrix of Dg(f(r, t)) ◦ Df(r, t)?

Solution

Hint:

(g ◦ f)(r, t) = g(r cos(t), r sin(t))
             = (r cos(t))^2 + (r sin(t))^2
             = r^2 (cos^2(t) + sin^2(t))
             = r^2

Compute the composite directly: (g ◦ f)(r, t) = r^2

Solution Compute D(g ◦ f)(r, t) directly from the formula for (g ◦ f).

This example has demonstrated the chain rule in action! Computing the deriva-
tive of the composite was the same as composing the derivatives.

The product rule from single-variable calculus can be proved by invoking the
chain rule for multivariable functions. We supply a basic outline of the proof below:
can you complete this outline to give a full proof?

Let f, g : R → R be two differentiable functions. Let p : R → R be defined by
p(t) = f(t)g(t). Let Both : R → R^2 be defined by Both(t) = (f(t), g(t)). Finally let
Multiply : R^2 → R be defined by Multiply(x, y) = xy. Then we can differentiate
Multiply(Both(t)) = p(t) using the multivariable chain rule. This should result in
the product rule.

D(Both) has the matrix

[ f'(t) ]
[ g'(t) ]

at the point t ∈ R.

D(Multiply) has the matrix [ y  x ] at the point (x, y) ∈ R^2.

So p'(t) = D(Multiply)|_{Both(t)} ◦ D(Both)|_t by the multivariable chain rule.

So

p'(t) = D(Multiply)|_{Both(t)} ◦ D(Both)|_t
      = D(Multiply)|_{(f(t),g(t))} ◦ D(Both)|_t
      = [ g(t)  f(t) ] [ f'(t) ]
                       [ g'(t) ]
      = g(t)f'(t) + f(t)g'(t)

This is the product rule from single-variable differential calculus.
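The outline can be spot-checked numerically; a sketch with f = sin and g = exp (our own choices), comparing p'(t) against g(t)f'(t) + f(t)g'(t):

```python
import math

h = 1e-6

def derivative(fn):
    # central-difference approximation to fn'
    return lambda x: (fn(x + h) - fn(x - h)) / (2 * h)

f, g = math.sin, math.exp
p = lambda t: f(t) * g(t)

t = 0.7
lhs = derivative(p)(t)
rhs = g(t) * derivative(f)(t) + f(t) * derivative(g)(t)
# lhs and rhs agree to several decimal places
```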

38 Proof of the chain rule

This section is optional.

Before we begin the proof of the chain rule, we need to introduce a new piece
of machinery:

Let L : R^n → R^m be a linear map. Let S^{n−1} = {v ∈ R^n : |v| = 1} be the
unit sphere in R^n. Then there is a function F : S^{n−1} → R defined on this sphere
which takes a vector and returns the length of its image under L, F(v) = |L(v)|.

Definition 1 The maximum value of the function F is called the operator
norm of L, and is written |L|_op.

The fact that the operator norm of a linear transformation exists is a kind of
deep (for this course) piece of analysis. It follows from the fact that the sphere is
compact[1], and continuous functions on compact spaces must achieve a maximum
value. For example, every continuous function on the closed interval [0, 1] ⊂ R
has a maximum, although not every continuous function on the open interval (0, 1)
does. The essential difference between these two intervals is that the first one is
compact, while the second one is not.

The essential property of the operator norm is that |L(v)| ≤ |L|_op |v| for each
vector v ∈ R^n. This is true just because |L(v/|v|)| ≤ |L|_op, since v/|v| is a unit
vector, by the definition of the operator norm. This fact will be essential to us
as we prove the chain rule.

Let f : R^n → R^m and g : R^m → R^k be differentiable functions. We want to
show that D(g ◦ f)(p) = Dg(f(p)) ◦ Df(p).

All we know at the beginning of the day is that

f(p + h) = f(p) + Df(p)(h) + fError(h)

with

lim_{h→0} |fError(h)| / |h| = 0

and

g(f(p) + u) = g(f(p)) + Dg(f(p))(u) + gError(u)

with

lim_{u→0} |gError(u)| / |u| = 0

[1] http://en.wikipedia.org/wiki/Compact_space

106

38 Proof of the chain rule

We will simply compose these two formulas for f and g, and try to get some

control on the complicated error term which results.

g

_

f(p +

h)

_

= g

_

f(p) + Df(p)(

h) + fError

p

(

h)

_

= g(f(p)) + Dg(f (p))

_

Df(p)(

h) + fError

p

(

h)

_

+ gError

_

Df(p)(

h) + fError

p

(

h)

_

= (g ◦ f)(p) + Dg(f (p)) ◦ Df(p)(

h)

+ Dg(f (p))

_

fError

p

(

h)

_

+ gError

_

Df(p)(

h) + fError

p

(

h)

_

This looks pretty horrible, but at least we can see the error term in red that we

have to get some control over. In particular we will have proven the chain rule if

we can show that

lim

h→0

¸

¸

¸Dg(f (p))

_

fError

p

(

h)

_

+ gError

_

Df(p)(

h) + fError

p

(

h)

_¸

¸

¸

[

h[

= 0

Since

0 ≤

¸

¸

¸Dg(f (p))

_

fError

p

(

h)

_

+ gError

_

Df(p)(

h) + fError

p

(

h)

_¸

¸

¸ ≤

¸

¸

¸Dg(f (p))

_

fError

p

(

h)

_¸

¸

¸+

¸

¸

¸gError

_

Df(p)(

h) + fError

p

(

h)

_¸

¸

¸

by the triangle inequality, we will have the result if we can prove separately that

lim

h→0

¸

¸

¸Dg(f (p))

_

fError

p

(

h)

_¸

¸

¸

¸

¸

¸

h

¸

¸

¸

= 0

and

lim

h→0

¸

¸

¸gError

_

Df(p)(

h) + fError

p

(

h)

_¸

¸

¸

h

= 0

Lets prove the ﬁrst limit. This is where operator norms enter the picture.

¸

¸

¸Dg(f (p))

_

fError

p

(

h)

_¸

¸

¸

¸

¸

¸

h

¸

¸

¸

≤

[Dg[

op

[fError(

h)[

[

h[

Since

[fError[

¸

¸

¸

h

¸

¸

¸

→ 0 as

h → 0, then we see that

¸

¸

¸Dg(f (p))

_

fError

p

(

h)

_¸

¸

¸

¸

¸

¸

h

¸

¸

¸

must

as well, since it is bounded above by a constant multiple of something which goes

107

38 Proof of the chain rule

to 0 (and bounded below by 0).

For the other part of the error,

¸

¸

¸gError

_

Df(p)(

h) + fError(

h)

_¸

¸

¸

h

=

¸

¸

¸gError

_

Df(p)(

h) + fError(

h)

_¸

¸

¸

¸

¸

¸Df(p)(

h) + fError

p

(

h)

¸

¸

¸

¸

¸

¸Df(p)(

h) + fError(

h)

¸

¸

¸)

[

h[

The ﬁrst factor in this expression goes to 0 as

h →0 because g is diﬀerentiable.

So all we need to do is make sure that the second factor is bounded.

‖Df(p)(h) + fError_p(h)‖ / ‖h‖ ≤ ‖Df(p)(h)‖/‖h‖ + ‖fError_p(h)‖/‖h‖   by the triangle inequality
                               ≤ ‖Df(p)‖_op ‖h‖/‖h‖ + ‖fError_p(h)‖/‖h‖
                               = ‖Df(p)‖_op + ‖fError_p(h)‖/‖h‖

Now the second term in this expression goes to 0 as h → 0 since f is differentiable.
So the whole expression is bounded by, say, ‖Df(p)‖_op + 1/2 if h is small enough.

Now we are done! We have successfully shown that the nasty error term Error satisfies

lim_{h→0} ‖Error(h)‖ / ‖h‖ = 0

Thus g ◦ f is differentiable at p, and its derivative is given by Dg(f(p)) ◦ Df(p).
QED
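The proof can be watched in action numerically. Below is a sketch with maps of our own choosing (f(x, y) = (xy, x + y) and g(u, v) = (u², sin v) are not from the text): the ratio ‖Error(h)‖/‖h‖ shrinks with h, just as the proof promises.

```python
import math

# For differentiable f and g (our own illustrative choices, not the text's),
# Error(h) = g(f(p+h)) - g(f(p)) - Dg(f(p))Df(p)h satisfies |Error(h)|/|h| -> 0.
def f(x, y): return (x*y, x + y)
def g(u, v): return (u**2, math.sin(v))
def Df(x, y): return [[y, x], [1.0, 1.0]]              # Jacobian of f, by hand
def Dg(u, v): return [[2*u, 0.0], [0.0, math.cos(v)]]  # Jacobian of g, by hand

def matvec(A, v):
    return [sum(a*x for a, x in zip(row, v)) for row in A]

p = (1.0, 2.0)
ratios = []
for t in (1e-1, 1e-2, 1e-3):
    h = (t, -t)
    value = g(*f(p[0] + h[0], p[1] + h[1]))
    linear = matvec(Dg(*f(*p)), matvec(Df(*p), h))
    error = [value[i] - g(*f(*p))[i] - linear[i] for i in range(2)]
    ratios.append(math.hypot(*error) / math.hypot(*h))

print(ratios)  # each ratio is roughly ten times smaller than the previous one
```
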

39 End of Week Practice

Practice doing computations.

This section just contains practice problems on the material we have learned this

week. These problems do not have detailed hints: clicking on the hint will

immediately reveal the answer. Use them to test your knowledge.

If you are confused about how to do any of these problems, ask your peers in

one of the forums.

YouTube link: http://www.youtube.com/watch?v=4_Yuldynomc


40 Jacobian practice

Practice computing the Jacobian.

Question 1 Compute the Jacobian of f : R² → R³ defined by f(x, y) = (sin(xy), x²y³, x³).

Solution

Hint: J =
[ y cos(xy)   x cos(xy) ]
[ 2xy³        3x²y²     ]
[ 3x²         0         ]

Question 2 Compute the Jacobian of f : R⁴ − {y = 0} → R¹ defined by
f(x, y, z, t) = x²yz³t⁴ + x/y.

Solution

Hint: J =
[ 2xyz³t⁴ + 1/y   x²z³t⁴ − x/y²   3x²yz²t⁴   4x²yz³t³ ]

Question 3 Compute the Jacobian of f : R² − {(0, 0)} → R² defined by
f(x, y) = (x/(x² + y²), y/(x² + y²)).

Solution

Hint: J =
[ (y² − x²)/(x² + y²)²   −2xy/(x² + y²)²      ]
[ −2xy/(x² + y²)²        (x² − y²)/(x² + y²)² ]

Question 4 Compute the Jacobian of f : R → R⁴ defined by f(t) = (cos(t), sin(t), t, t²).

Solution

Hint: J =
[ −sin(t) ]
[ cos(t)  ]
[ 1       ]
[ 2t      ]
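Answers like these can be spot-checked numerically. The sketch below (the helper names are ours) compares the Question 1 hint against central finite differences at an arbitrary sample point.

```python
import math

def f(x, y):
    # the map from Question 1
    return (math.sin(x*y), x**2 * y**3, x**3)

def jacobian_hint(x, y):
    # the matrix computed by hand in the hint
    return [[y*math.cos(x*y), x*math.cos(x*y)],
            [2*x*y**3,        3*x**2 * y**2],
            [3*x**2,          0.0]]

def jacobian_numeric(x, y, h=1e-6):
    # central differences: one column per input variable
    cols = []
    for dx, dy in ((h, 0.0), (0.0, h)):
        plus, minus = f(x + dx, y + dy), f(x - dx, y - dy)
        cols.append([(p - m) / (2*h) for p, m in zip(plus, minus)])
    # transpose: rows indexed by output, columns by input
    return [[cols[j][i] for j in range(2)] for i in range(3)]

exact = jacobian_hint(0.7, 1.3)
approx = jacobian_numeric(0.7, 1.3)
assert max(abs(e - a) for re, ra in zip(exact, approx) for e, a in zip(re, ra)) < 1e-4
```
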


41 Gradient practice

Practice computing gradient vectors.

Question 1 Compute ∇f where f : R³ → R is defined by f(x, y, z) = x² + xyz.

Solution

Hint: ∇f =
[ 2x + yz ]
[ xz      ]
[ xy      ]

Question 2 Compute ∇f where f : R⁴ → R is defined by f(x, y, z, t) = cos(xy) sin(zt).

Solution

Hint: ∇f =
[ −y sin(xy) sin(zt) ]
[ −x sin(xy) sin(zt) ]
[ t cos(xy) cos(zt)  ]
[ z cos(xy) cos(zt)  ]

Question 3 Compute ∇f where f : R² − {y = 0} → R is defined by f(x, y) = x/y.

Solution

Hint: ∇f =
[ 1/y   ]
[ −x/y² ]
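As in the previous section, the hand-computed answers can be sanity-checked with finite differences. This sketch (our own helper names) checks Question 1.

```python
# Spot-check of Question 1's gradient, f(x, y, z) = x**2 + x*y*z, by central differences.
def f(x, y, z):
    return x**2 + x*y*z

def grad_hint(x, y, z):
    # the gradient from the hint
    return (2*x + y*z, x*z, x*y)

def grad_numeric(x, y, z, h=1e-6):
    point = [x, y, z]
    out = []
    for i in range(3):
        plus, minus = point[:], point[:]
        plus[i] += h
        minus[i] -= h
        out.append((f(*plus) - f(*minus)) / (2*h))
    return tuple(out)

e, n = grad_hint(1.2, -0.5, 2.0), grad_numeric(1.2, -0.5, 2.0)
assert max(abs(a - b) for a, b in zip(e, n)) < 1e-5
```
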


42 Linear approximation practice

Practice doing some linear approximations.

Question 1 Let f : R² → R² be given by f(x, y) = (x² − y², 2xy). Use the linear
approximation to f at (3, 2) to approximate f(3.1, 2.3). Give your answer as a
column vector.

Solution

Hint:
[ 4.4  ]
[ 14.2 ]

Question 2 Let f : R¹ → R³ be given by f(t) = (t, t², t³). Use the linear
approximation to f at 1 to approximate f(1.1). Give your answer as a column vector.

Solution

Hint:
[ 1.1 ]
[ 1.2 ]
[ 1.3 ]

Question 3 Let f : R³ → R² be given by f(x, y, z) = (xy, yz). Use the linear
approximation to f at (1, 2, 3) to approximate f(1.1, 1.9, 3.2). Give your
answer as a column vector.

Solution

Hint:
[ 2.1 ]
[ 6.1 ]
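The computation behind Question 1 can be carried out mechanically: evaluate f and its Jacobian at (3, 2), then add the Jacobian applied to the displacement. A sketch with our own helper names:

```python
# Linear approximation from Question 1: f(x, y) = (x**2 - y**2, 2*x*y) at (3, 2),
# evaluated at the nearby point (3.1, 2.3).
def f(x, y):
    return (x**2 - y**2, 2*x*y)

def Df(x, y):
    # Jacobian computed by hand
    return [[2*x, -2*y],
            [2*y,  2*x]]

p, q = (3.0, 2.0), (3.1, 2.3)
dx, dy = q[0] - p[0], q[1] - p[1]
J = Df(*p)
approx = [f(*p)[i] + J[i][0]*dx + J[i][1]*dy for i in range(2)]
print(approx)  # close to the hint's column vector (4.4, 14.2)
```
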


43 Increasing and decreasing

Consider whether a function is increasing or decreasing.

Question 1 Let f : R² → R be given by f(x, y) = (x + 2y)² + y³. At the point
(−1, 5), is f increasing or decreasing in the direction (1, −1)?

(a) Increasing

(b) Decreasing

Solution

Hint: By computing the directional derivative in that direction, we see that f is
decreasing.

Question 2 Let f : R³ → R be given by f(x, y, z) = (x + y)/z. At the point
(3, 2, 6), is f increasing or decreasing in the direction (1, −1, 2)?

(a) Increasing

(b) Decreasing

Solution

Hint: By computing the directional derivative in that direction, we see that f is
decreasing. You could also see that f is decreasing by noting that the numerator is left
unchanged when x and y are increased equally in opposite directions, but the denominator
is increasing.

Question 3 Let f : R⁴ → R be given by f(x, y, z, t) = xy²t + z². At the point
(0, 1, 5, −1), is f increasing or decreasing in the direction (1, 1, −2, 3)?

(a) Increasing

(b) Decreasing

Solution

Hint: By computing the directional derivative in that direction, we see that f is
decreasing.
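The directional-derivative computation that the hints refer to can be made explicit for Question 1; the gradient formula below is computed by hand from f, and the helper names are ours.

```python
# Directional-derivative check for Question 1: f(x, y) = (x + 2*y)**2 + y**3
# at (-1, 5) in the direction (1, -1).
def grad_f(x, y):
    # hand-computed gradient of f
    return (2*(x + 2*y), 4*(x + 2*y) + 3*y**2)

gx, gy = grad_f(-1.0, 5.0)
d = gx*1.0 + gy*(-1.0)  # dot with the direction; normalizing would not change the sign
print(d)  # negative, so f is decreasing in that direction
```
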


44 Stationary points

Find stationary points.

Question 1 Let f : R² → R² be defined by f(x, y) = (x² − x − y², 2xy − y). There
is one stationary point of f (one place where the derivative vanishes). What is this
stationary point? Give your answer as a column vector.

Question 2 Let f : R³ → R be defined by f(x, y, z) = x² + xy + xz + z². There
is one stationary point of f (one place where the derivative vanishes). What is this
stationary point? Give your answer as a column vector.

Question 3 Let f : R² → R³ be given by f(x, y) = (x², y − x, √(x² − 4x + 4 + y²)).

Question 4 The function f from Question 3 is differentiable everywhere except for
one point. What is that point? Give your answer as a column vector.
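For Question 1, setting the entries of the Jacobian to zero gives 2x − 1 = 0 and y = 0. A quick numerical check of that claim (the helper names are ours):

```python
# The Jacobian of Question 1's map f(x, y) = (x**2 - x - y**2, 2*x*y - y)
# is [[2x - 1, -2y], [2y, 2x - 1]]; it vanishes exactly at (1/2, 0).
def Df(x, y):
    return [[2*x - 1, -2*y],
            [2*y, 2*x - 1]]

J = Df(0.5, 0.0)
assert all(entry == 0 for row in J for entry in row)
```
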


45 Hypersurfaces

Find tangent planes and lines.

Question 1 Let f : R² → R be defined by f(x, y) = x² − y².

Question 2 The equation of the tangent line to the curve f(x, y) = 5 at the point
(3, 2) is 0 = 6(x − 3) − 4(y − 2).

Hint: 6(x − 3) − 4(y − 2) = 0

Question 3 Let f : R³ → R be defined by f(x, y, z) = (x − z)² − (y + z)².

Question 4 The equation of the tangent plane to the surface f(x, y, z) = 3 at the
point (3, 0, 1) is 0 = 4(x − 3) − 2y − 6(z − 1).

Hint: 4(x − 3) − 2y − 6(z − 1) = 0


46 Use the chain rule

Compute using the chain rule.

Question 1 Let f : R² − {y = 0} → R² be defined by f(x, y) = (x²y + x, x/y). Let
g : R² → R³ be defined by g(x, y) = (x + y, x − y, xy). Use the chain rule to find
the Jacobian of (g ◦ f) at the point (a, b).

Solution

Hint:
[ 2ab + 1 + 1/b   a² − a/b² ]
[ 2ab + 1 − 1/b   a² + a/b² ]
[ 3a² + 2a/b      −a²/b²    ]

Question 2 Let f : R² → R be defined by f(x, y) = x³ + y³. Let g : R → R² be
defined by g(t) = (t, sin(t)). Use the chain rule to find the Jacobian of (g ◦ f) at
the point (a, b).

Solution

Hint:
[ 3a²                3b²              ]
[ 3a² cos(a³ + b³)   3b² cos(a³ + b³) ]
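The chain-rule product in Question 1 can be verified numerically by comparing it with finite differences of the composite; the sketch below (our own helper names) checks it at (a, b) = (2, 3).

```python
# Spot-check of Question 1's Jacobian of g ∘ f at (a, b) = (2, 3),
# with f(x, y) = (x**2*y + x, x/y) and g(u, v) = (u + v, u - v, u*v).
def f(x, y):  return (x**2 * y + x, x / y)
def g(u, v):  return (u + v, u - v, u * v)

def hint_jacobian(a, b):
    # the matrix from the hint
    return [[2*a*b + 1 + 1/b,  a**2 - a/b**2],
            [2*a*b + 1 - 1/b,  a**2 + a/b**2],
            [3*a**2 + 2*a/b,   -a**2 / b**2]]

def numeric_jacobian(a, b, h=1e-6):
    # central differences of the composite, one column per input variable
    cols = []
    for da, db in ((h, 0.0), (0.0, h)):
        plus  = g(*f(a + da, b + db))
        minus = g(*f(a - da, b - db))
        cols.append([(p - m) / (2*h) for p, m in zip(plus, minus)])
    return [[cols[j][i] for j in range(2)] for i in range(3)]

H = hint_jacobian(2.0, 3.0)
N = numeric_jacobian(2.0, 3.0)
assert max(abs(x - y) for hr, nr in zip(H, N) for x, y in zip(hr, nr)) < 1e-4
```
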


47 Abstraction

Not every vector space consists of lists of numbers.

So far, we've been thinking of all of our vector spaces as being Rⁿ. We now relax
this condition by providing a definition of a vector space in general.


48 Vector spaces

Vector spaces are sets with a notion of addition and scaling.

Until now, we have only dealt with the spaces Rⁿ. We now begin the journey of
understanding more general spaces.

The crucial structures we need to talk about linear maps on Rⁿ are addition and
scalar multiplication. Addition is a function which takes a pair of vectors v, w ∈ Rⁿ
and returns a new vector v + w ∈ Rⁿ. Scalar multiplication is a function which takes
a vector v ∈ Rⁿ and a scalar c ∈ R, and returns a new vector cv.

Definition 1 A vector space is a set V equipped with a notion of addition, which
is a function that takes a pair of vectors v, w ∈ V and returns a new vector v + w,
and a notion of scalar multiplication, which is a function that takes a scalar c ∈ R
and a vector v ∈ V and returns a new vector cv ∈ V .

These operations are subject to the following requirements:

Commutativity For each v, w ∈ V , v + w = w + v

Associativity For each v, w, u ∈ V , v + (w + u) = (v + w) + u

Additive identity There is a vector called 0 ∈ V with v + 0 = v for each v ∈ V .

Additive inverse For each v ∈ V there is a vector w ∈ V with v + w = 0

Multiplicative identity For each v ∈ V , 1v = v (here 1 is really the real
number 1 ∈ R, and 1v is the scalar product of 1 with v)

Distributivity of scalar multiplication over vector addition For each v, w ∈
V and c ∈ R, c(v + w) = cv + cw

Distributivity of vector multiplication under scalar addition For each a, b ∈
R and v ∈ V , (a + b)v = av + bv

Let's list off some nice examples of vector spaces.

Example 2 Our old friend Rⁿ is a vector space with the notions of vector addition
and scalar multiplication we introduced in Week 1.

Example 3 Let Poly₂ be the set of all polynomials of degree at most 2 in one
variable. For example 1 + 2x ∈ Poly₂ and 3 + x² ∈ Poly₂. Then the usual way of
adding polynomials and multiplying them by constants turns Poly₂ into a vector
space. For example 2(1 + 2x) + (3 + x²) = 5 + 4x + x².

The next example will probably be the most important example for us. Thinking
of matrices as points in a vector space will be useful for us when we start thinking
about the second derivative.

Example 4 Let Mat_{n,m} be the collection of all n × m matrices. Then Mat_{n,m} is
a vector space with the usual notions of matrix addition and scalar multiplication.
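Example 4 can be made concrete in Python (in the spirit of the Python section later this week); the helper names below are ours, not the text's.

```python
# A minimal sketch of Mat_{n,m} as a vector space, with matrices stored as lists of rows.
def mat_add(A, B):
    return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

def mat_scale(c, A):
    return [[c * a for a in row] for row in A]

A = [[1, 2], [3, 4], [5, 6]]   # a 3 x 2 matrix
B = [[0, 1], [1, 0], [2, 2]]

# spot-check one axiom: c(A + B) = cA + cB
c = 3
assert mat_scale(c, mat_add(A, B)) == mat_add(mat_scale(c, A), mat_scale(c, B))
```
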

The following is an important, but very small, first step into the world of functional
analysis.

Example 5 Let C([0, 1]) be the set of all continuous functions from [0, 1] to R.
Then C([0, 1]) is a vector space with addition and scalar multiplication defined
pointwise (f + g is the function whose value at x is f(x) + g(x), and cf is the
function whose value at x is c f(x)).

Realizing that solution sets of certain differential equations form vector spaces
is important.

Let V be the set of all smooth functions f : R → R which satisfy the differential
equation

d²f/dx² + 3 df/dx + 4f(x) = 0.

Addition is the usual addition of functions, meaning that f + g denotes the function
which sends x to f(x) + g(x). Scalar multiplication cf means the function that
sends x to c f(x). With this, we can show that V is a vector space.

Why is V a vector space? We already know that function addition and scalar
multiplication of functions satisfy all of the axioms of a vector space: what we do
not know is whether function addition and scalar multiplication are well defined
for solutions to this differential equation.

We need to check that if f and g are solutions, and c ∈ R, then f + g and cf
are as well.

d²/dx²(f + g) + 3 d/dx(f + g) + 4(f + g)(x)
  = d²f/dx² + d²g/dx² + 3 df/dx + 3 dg/dx + 4f(x) + 4g(x)
  = (d²f/dx² + 3 df/dx + 4f(x)) + (d²g/dx² + 3 dg/dx + 4g(x))
  = 0

So f + g is a solution to the DE if f and g are.

d²/dx²(cf(x)) + 3 d/dx(cf(x)) + 4cf(x) = c(d²f/dx² + 3 df/dx + 4f(x)) = 0

So cf is a solution to the DE if f is, and c ∈ R.

So addition and scalar multiplication are well defined on V , giving V the structure
of a vector space.

We are dealing with a level of abstraction here that you may not have met
before. It is worthwhile taking some time to prove certain "obvious" (though they
are not so obvious) statements formally from the axioms:

Prove that in any vector space, there is a unique additive identity. In other
words, if V is a vector space, and there are two elements 0 ∈ V and 0′ ∈ V so that
for each v ∈ V , v + 0 = v and v + 0′ = v, then 0 = 0′. Every line of your proof
should be justified with a vector space axiom!

We will start with 0, and use our vector space axioms to construct a string of
equalities ending in 0′:

0 = 0 + 0′   because 0′ is an additive identity
  = 0′ + 0   by the commutativity of vector addition
  = 0′       because 0 is an additive identity

So 0 = 0′.

Prove that each element of a vector space has a unique (only one) additive
inverse. Let v ∈ V . Assume that w₁ and w₂ are both additive inverses of v.
We will show that w₁ = w₂:

w₁ = w₁ + 0          by the definition of the additive identity
   = w₁ + (v + w₂)   because v and w₂ are additive inverses
   = (w₁ + v) + w₂   by associativity of vector addition
   = 0 + w₂          because v and w₁ are additive inverses
   = w₂              by the definition of the additive identity

Let V be a vector space. Prove that 0v = 0 for every v ∈ V . (Note: 0 means
different things on the different sides of the equation! On the left hand side, it is
the scalar 0 ∈ R, whereas on the right hand side it is the zero vector 0 ∈ V .)

0v = (0 + 0)v   nothing funny here: 0 + 0 = 0
   = 0v + 0v    by the distributivity of vector multiplication under scalar addition

So 0v = 0v + 0v. Now let w be the additive inverse of 0v, and add it to both sides
of the equation:

0v + w = (0v + 0v) + w
0v + w = 0v + (0v + w)   by the associativity of vector addition
0 = 0v + 0               by the definition of additive inverses
0 = 0v                   by the definition of the additive identity

QED

Let V be a vector space. Prove that a0 = 0 for every a ∈ R.

a0 = a(0 + 0)   by definition of the additive identity
   = a0 + a0    by the distributivity of scalar multiplication over vector addition

So a0 = a0 + a0. Let w be the additive inverse of a0. Adding w to both sides we
have

a0 + w = (a0 + a0) + w
a0 + w = a0 + (a0 + w)   by associativity of vector addition
0 = a0 + 0               by definition of additive inverses
0 = a0                   by definition of the additive identity

QED

Let V be a vector space. Prove that (−1)v is the additive inverse of v for every
v ∈ V .

The proof of this uses the "rat poison principle": if you want to show that
something is rat poison, try feeding it to a rat! In this case we want to see if (−1)v
is the additive inverse of v, so we should try adding it to v:

(−1)v + v = (−1)v + 1v   by the multiplicative identity property
          = (−1 + 1)v    by the distributivity of vector multiplication under scalar addition
          = 0v = 0       by one of the theorems above

So, indeed, (−1)v is an additive inverse of v. We already proved uniqueness
of additive inverses above, so we are done. We will often simply write −v for the
additive inverse of v in the future.


49 Linear maps, redux

Linear maps respect scalar multiplication and vector addition.

Definition 1 Let V and W be two vector spaces. A function L : V → W is a
linear map if

Respects vector addition For all v₁, v₂ ∈ V , L(v₁ + v₂) = L(v₁) + L(v₂)

Respects scalar multiplication For all c ∈ R and v ∈ V , L(cv) = cL(v)

If the domain and codomain of a linear map are both V , then we may call it a
linear operator to emphasize this fact. For instance, you might hear someone say
"L : V → V is a linear operator."

Let V be the space of all polynomials in one variable x, and W = R. For each
real number a ∈ R, define the function Eval_a : V → W by Eval_a(p) = p(a).
Show that Eval_a is a linear map.

Let p₁, p₂ ∈ V . Then Eval_a(p₁ + p₂) = (p₁ + p₂)(a) = p₁(a) + p₂(a) = Eval_a(p₁) + Eval_a(p₂).
Also if c ∈ R, then Eval_a(cp) = cp(a) = c Eval_a(p). So Eval_a is a linear map.
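In the spirit of the upcoming Python section, here is a small numeric illustration of this proof, with polynomials stored as coefficient lists (our own convention, lowest degree first).

```python
# Eval_a respects addition: evaluating a sum of polynomials equals the sum of evaluations.
def eval_at(a, p):
    # Horner evaluation of p(a), where p is a coefficient list (lowest degree first)
    out = 0.0
    for c in reversed(p):
        out = out * a + c
    return out

def poly_add(p, q):
    n = max(len(p), len(q))
    p = p + [0.0] * (n - len(p))
    q = q + [0.0] * (n - len(q))
    return [x + y for x, y in zip(p, q)]

p1 = [1.0, 2.0]        # 1 + 2x
p2 = [3.0, 0.0, 1.0]   # 3 + x**2
a = 2.0
assert eval_at(a, poly_add(p1, p2)) == eval_at(a, p1) + eval_at(a, p2)
```
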

To make sure linear maps work the way we expect them to in this new context,
and to flex our brains a little bit, let's prove some facts about linear functions:

Let L : V → W be a linear map. Show that L(0) = 0. (Note: 0 means different
things on either side of the equation. On the LHS it means the additive identity of
V , while on the RHS it means the additive identity of W.)

L(0) = L(0·0)   we proved in the last section that 0v = 0 for any v
     = 0L(0)    because L respects scalar multiplication
     = 0        by the same reasoning quoted above

Another way to do this would be by starting with L(0) = L(0 + 0) and using
the fact that L respects vector addition. Try this proof out too!

Let V and W be vector spaces, and define a function Zero : V → W by
Zero(v) = 0 for all v ∈ V . Show that Zero is a linear function.

Let v₁, v₂ ∈ V . Then

Zero(v₁ + v₂) = 0 = 0 + 0 = Zero(v₁) + Zero(v₂)

So Zero respects vector addition. Now let v ∈ V and c ∈ R. Then

Zero(cv) = 0 = c0 = cZero(v)

So Zero respects scalar multiplication.


50 Python

Certain Python functions form a vector space?

Let T be the collection of all Python functions f with the properties that

• f accepts a single numeric parameter,

• f returns a single numeric value, and

• no matter what number x is, the function call f(x) successfully returns a

number.

We’ll say that two Python functions are “equal” if they produce the same outputs

for the same inputs.

Now the collection T (arguably) forms a vector space. I say “arguably” because

“numbers” in Python aren’t real numbers, but let’s just play along and pretend

that they are.

Question 1 What function plays the role of 0 in T?

Solution

Python
def zero(x):
    # the zero vector of T: the function sending every input to 0
    return 0

def validator():
    return (zero(17) == 0)

Suppose we have two functions f and g. What is their sum?

Solution

Python
def vector_sum(f, g):
    # return a new Python function which is the pointwise sum of f and g
    return lambda x: f(x) + g(x)

def validator():
    return (vector_sum(lambda x: x**2, lambda x: x**3)(3) == 36)

Now suppose we have a function f and a scalar c. What is c times f?

Solution

Python
def scalar_multiple(c, f):
    # return a new Python function which is c*f
    return lambda x: c * f(x)

def validator():
    return (scalar_multiple(17, lambda x: x**2)(2) == 68)

Now suppose we have a function f and a point a. The evaluation map sends
f ∈ T to the value f(a).

Solution


Python
def scalar_multiple(c, f):
    return (lambda x: c * f(x))

def vector_sum(f, g):
    return (lambda x: f(x) + g(x))

def evaluation_map(a, f):
    # return the value of f at the point a
    return f(a)

# Now note that evaluation_map(a, vector_sum(f, g)) == evaluation_map(a, f) + evaluation_map(a, g)
f = lambda x: x**2
g = lambda x: x**3
a = 3
print(evaluation_map(a, vector_sum(f, g)))
print(evaluation_map(a, f) + evaluation_map(a, g))

def validator():
    return (evaluation_map(17, (lambda x: x**2)) == 289)

This is an example of the fact that “evaluation at a” is a linear map from T to

the underlying number system.

Finally, some food for thought: in a little while, we'll be thinking about "dimension."
Keep in mind the following question: what is the dimension of the vector
space T? You may want to think about this in the ideal world where "Python
functions" take honest real numbers as their inputs and outputs.


51 Bases

Basis vectors span the space without redundancy.

In our study of the vector spaces Rⁿ, we have relied quite heavily on the "standard
basis vectors"

e₁ = (1, 0, 0, . . . , 0), e₂ = (0, 1, 0, . . . , 0), e₃ = (0, 0, 1, 0, . . . , 0), . . . , eₙ = (0, . . . , 0, 1).

We'll write E_n for the collection of vectors (e₁, e₂, . . . , eₙ).

A great feature of these vectors is that they span all of Rⁿ: every vector v ∈ Rⁿ
can be written in the form

v = x₁e₁ + x₂e₂ + · · · + xₙeₙ.

What is even better is that this representation is unique: if I also have that

v = y₁e₁ + y₂e₂ + · · · + yₙeₙ,

then x₁ = y₁, x₂ = y₂, . . . , xₙ = yₙ.

Our goal in this section will be to find similarly nice sets of vectors in an abstract
vector space.

Question 1 A linear combination of two vectors v and w is an expression of
the form

αv + βw

for some numbers α, β ∈ R. Which of the following vectors is a linear combination
of v = (3, 2, 1) and w = (1, 5, 1)?

Solution

(a) (1, −8, −1)

(b) (−1, 8, 1)

So not every vector in R³ is a linear combination of v and w. Vectors which
are a linear combination of v and w are said to be in the "span" of v and w. Let's
make this more general for more than just two vectors.

Definition 2 The span of an ordered list of vectors (v₁, v₂, . . . , vₙ) is the set of
all linear combinations of the vᵢ:

Span(v₁, v₂, . . . , vₙ) = {a₁v₁ + a₂v₂ + · · · + aₙvₙ : aᵢ ∈ R}


Question 3 Do (1, 1) and (1, 0) together span the vector space R²?

Solution

(a) Yes.

(b) No.

Indeed, every vector in R² can be written as a linear combination of (1, 1) and
(1, 0). Prove it!

Since we already know that (1, 0) and (0, 1) span R², it is enough to show that
(1, 1) and (1, 0) span these two vectors, i.e. we need only show that (0, 1) is in the
span of these two vectors. But (0, 1) = (1, 1) − (1, 0), so we are done.

To be a bit more explicit, we can write any vector

(x, y) = x(1, 0) + y(0, 1) = x(1, 0) + y[(1, 1) − (1, 0)] = (x − y)(1, 0) + y(1, 1).

So we have expressed any vector as a linear combination of (1, 1) and (1, 0).

Question 4 Can every polynomial of degree at most 2 be written in the form

α(1) + β(x − 1) + γ(x − 1)²?

Solution

(a) Yes.

(b) No.

In other words: the polynomials 1, x − 1, and (x − 1)² span the vector space of
polynomials of degree at most 2. Prove it!

Let p(x) = a₀ + a₁x + a₂x². Then

p(x) = a₀ + a₁[(x − 1) + 1] + a₂[(x − 1) + 1]²
     = a₀ + a₁(x − 1) + a₁ + a₂[(x − 1)² + 2(x − 1) + 1]
     = (a₀ + a₁ + a₂)1 + (a₁ + 2a₂)(x − 1) + a₂(x − 1)²

so we have expressed every polynomial of degree at most 2 as a linear combination
of 1, (x − 1) and (x − 1)².

You could also solve this problem by appealing to Taylor's theorem in one
variable calculus. Can you see how?
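The identity derived above can be sanity-checked numerically at random inputs (a quick sketch, not a proof):

```python
import random

# Check a0 + a1*x + a2*x**2 == (a0+a1+a2) + (a1+2*a2)*(x-1) + a2*(x-1)**2
# at many random coefficients and points.
random.seed(0)
for _ in range(100):
    a0, a1, a2, x = (random.uniform(-5, 5) for _ in range(4))
    lhs = a0 + a1*x + a2*x**2
    rhs = (a0 + a1 + a2) + (a1 + 2*a2)*(x - 1) + a2*(x - 1)**2
    assert abs(lhs - rhs) < 1e-9
```
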


52 Dimension

Basis vectors span the space without redundancy.

Definition 1 A vector space is called finite dimensional if it has a finite list
of spanning vectors. A space which is not finite dimensional is called infinite
dimensional.

Question 2 The space P of all polynomials in one variable x is:

Solution

(a) Infinite dimensional.

(b) Finite dimensional.

Can you prove it? Suppose that P were finite dimensional, and then deduce a
contradiction to show that it is impossible.

Let p₁, p₂, . . . , pₙ be a finite list of vectors. Since this list of polynomials is finite,
they must be bounded in degree, i.e. the degree of pᵢ must be less than some k
for each i. But a linear combination of polynomials of degree at most k is also of
degree at most k. So the polynomial x^{k+1} ∉ Span(p₁, p₂, . . . , pₙ). Thus no finite
list of polynomials spans all of P. So P is infinite dimensional.

Definition 3 Let V be a vector space. An ordered list of vectors (v₁, v₂, . . . , vₙ)
where all the vᵢ ∈ V is linearly independent if

a₁v₁ + a₂v₂ + · · · + aₙvₙ = b₁v₁ + b₂v₂ + · · · + bₙvₙ

implies that a₁ = b₁, a₂ = b₂, . . . , aₙ = bₙ. In other words, every vector in the
span of (v₁, v₂, . . . , vₙ) can be expressed as a linear combination of the vᵢ in only
one way.

If the list of vectors is not linearly independent, it is linearly dependent.

Show that the following alternative definition for linear independence is equivalent
to our definition:

Definition 4 Let V be a vector space. An ordered list of vectors (v₁, v₂, . . . , vₙ)
where all the vᵢ ∈ V is called linearly independent if

a₁v₁ + a₂v₂ + · · · + aₙvₙ = 0

implies that aᵢ = 0 for all i = 1, 2, 3, . . . , n.

Let us say that our original definition is of being linearly independent in the first
sense, while this second definition is being linearly independent in the second sense.

If a list of vectors (v₁, v₂, . . . , vₙ) is linearly independent in the first sense, then if
a₁v₁ + a₂v₂ + · · · + aₙvₙ = 0 we have a₁v₁ + a₂v₂ + · · · + aₙvₙ = 0v₁ + 0v₂ + · · · + 0vₙ,
so by the definition of linear independence in the first sense, we have a₁ = a₂ =
· · · = aₙ = 0.

On the other hand, if (v₁, v₂, . . . , vₙ) are linearly independent in the second
sense, then if a₁v₁ + a₂v₂ + · · · + aₙvₙ = b₁v₁ + b₂v₂ + · · · + bₙvₙ we have
(a₁ − b₁)v₁ + (a₂ − b₂)v₂ + · · · + (aₙ − bₙ)vₙ = 0, so aᵢ − bᵢ = 0 for each i.
Thus aᵢ = bᵢ for each i, proving that the list was linearly independent in the first sense.

Often this definition is easier to check, although it does not capture the "meaning"
of linear independence as well as the first definition.
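Definition 4 also suggests a mechanical test for lists of vectors in Rⁿ: row-reduce the matrix whose columns are the vectors and look for a pivot in every column. A sketch in pure Python (our own helper, using naive Gaussian elimination):

```python
# A list of vectors in R^n is linearly independent exactly when the only solution
# of a1*v1 + ... + an*vn = 0 is a1 = ... = an = 0, i.e. when elimination finds a
# pivot in every column of the matrix whose columns are the vectors.
def is_independent(vectors, tol=1e-12):
    m, n = len(vectors[0]), len(vectors)
    A = [[vectors[j][i] for j in range(n)] for i in range(m)]
    rank = 0
    for col in range(n):
        # find a pivot row at or below `rank`
        pivot = max(range(rank, m), key=lambda r: abs(A[r][col]), default=None)
        if pivot is None or abs(A[pivot][col]) < tol:
            return False  # no pivot in this column: a dependency exists
        A[rank], A[pivot] = A[pivot], A[rank]
        for r in range(m):
            if r != rank:
                factor = A[r][col] / A[rank][col]
                A[r] = [x - factor * y for x, y in zip(A[r], A[rank])]
        rank += 1
    return True

assert is_independent([(1, 1), (1, 0)])
assert not is_independent([(1, 2, 3), (2, 4, 6)])  # one is a scalar multiple of the other
```
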


Prove that any ordered list of vectors containing the zero vector is linearly
dependent. We can see immediately from the second definition that since 1·0 = 0,
but the coefficient 1 ≠ 0, the list cannot be linearly independent.

Prove that an ordered list of length 2 (i.e. (v₁, v₂)) is linearly dependent if and
only if one vector is a scalar multiple of the other. For v₁ and v₂ to be linearly
dependent there must be two scalars a, b ∈ R with av₁ + bv₂ = 0 and at least one
of a or b nonzero. Let us assume (without loss of generality) that a ≠ 0. Then
av₁ = −bv₂, so v₁ = (−b/a)v₂. Thus one vector is a scalar multiple of the other.
Conversely, if v₁ = cv₂, then 1v₁ − cv₂ = 0 is a dependence with a nonzero
coefficient, so the list is linearly dependent.

Theorem 5 If (v₁, v₂, v₃, . . . , vₙ) is linearly dependent in V and v₁ ≠ 0, then one
of the vectors vⱼ is in the span of v₁, v₂, . . . , v_{j−1}.

Prove this theorem.

Since (v₁, v₂, v₃, . . . , vₙ) is linearly dependent, by definition there are scalars
aᵢ ∈ R with a₁v₁ + a₂v₂ + · · · + aₙvₙ = 0, and not all of the aⱼ = 0. Let j
be the largest element of 2, 3, . . . , n so that aⱼ is not equal to 0. Then we have

vⱼ = −(a₁/aⱼ)v₁ − (a₂/aⱼ)v₂ − (a₃/aⱼ)v₃ − · · · − (a_{j−1}/aⱼ)v_{j−1}.

So vⱼ is in the span of v₁, v₂, . . . , v_{j−1}.

If 2 vectors v₁, v₂ span V , is it possible that the three vectors w₁, w₂, w₃ are
linearly independent?

Warning 6 This is harder to prove than you might think!

No!

Assume to the contrary that w₁, w₂, w₃ are linearly independent.

Since the list (v₁, v₂) spans V , the list (w₁, v₁, v₂) is linearly dependent. Thus
by the previous theorem, either v₁ is in the span of w₁, or v₂ is in the span of
(w₁, v₁). In either case we get that (w₁, v) spans V , where v is either v₁ or v₂.

Now apply the same trick: since (w₁, v) spans V , the list (w₂, w₁, v) is linearly
dependent. So by the previous theorem, either w₁ is in the span of w₂, or v is in
the span of (w₂, w₁). w₁ cannot be in the span of w₂ because the w's are linearly
independent. So v is in the span of (w₂, w₁). So (w₂, w₁) spans V . But then w₃
is in the span of (w₂, w₁), contradicting the fact that it is linearly independent
from those two vectors. We have arrived at our contradiction.

Therefore, w₁, w₂, w₃ cannot be linearly independent.

This problem generalizes:

Theorem 7 The length of a linearly independent list of vectors is less than or
equal to the length of any spanning list of vectors.

Prove this theorem. We will follow the same procedure that we did above. Assume
(v₁, v₂, . . . , vₙ) is a list of vectors which spans V , and (w₁, w₂, . . . , wₘ) is a
linearly independent list of vectors. We must show that m ≤ n.

(w₁, v₁, v₂, . . . , vₙ) is linearly dependent since w₁ is in the span of the vᵢ. By the
theorem above, we can remove one of the vᵢ and still have a spanning list of length
n.

Repeating this, we can always add one w vector to the beginning of the list,
while deleting a v vector from the end of the list. This maintains a list of length n
which spans all of V . We know that it must be a v which gets deleted, because the
w's are all linearly independent. If m > n, then at the nth stage of this process we
obtain that (w₁, w₂, . . . , wₙ) spans all of V , which contradicts the fact that w_{n+1}
is supposed to be linearly independent from the rest of the w's.

Definition 8 An ordered list of vectors B = (v₁, v₂, v₃, . . . , vₙ) is called a basis
of the vector space V if B both spans V and is linearly independent.

Let V be a finite dimensional vector space. Show that V has a basis. Let
(v₁, v₂, . . . , vₙ) be a spanning list of vectors (which exists and is finite since V is
finite dimensional). If this list is linearly dependent we can go through the following
process: for each i, if vᵢ ∈ Span(v₁, v₂, . . . , v_{i−1}), delete vᵢ from the list. Note that
this also covers the first case: if v₁ = 0, delete it from the list.

At the end of this process, we have a list of vectors which spans V , and also
no vector is in the span of the previous vectors. By the theorem above, the list is
linearly independent. So this new list is a basis for V .

Note: Let V be a finite dimensional vector space. Then every basis of V has
the same length. In other words, if v₁, v₂, . . . , vₙ is a basis and w₁, w₂, . . . , wₘ is a
basis, then n = m. This follows because we have already proven that n ≤ m and
m ≤ n.

Definition 9 We say that a finite dimensional vector space has dimension n if
it has a basis of length n.

Let p₁, p₂, . . . , pₙ, p_{n+1} be polynomials in the space Pₙ of all polynomials of
degree at most n. Assume pᵢ(3) = 0 for i = 1, 2, . . . , n + 1. Is it possible that
p₁, p₂, . . . , pₙ, p_{n+1} are all linearly independent? Why or why not? No. If
p₁, p₂, . . . , pₙ, p_{n+1} were all linearly independent then they would form a basis of
Pₙ, since Pₙ has dimension n + 1. But every polynomial in the span of the pᵢ must
evaluate to 0 at x = 3, while some polynomials in Pₙ do not evaluate to 0 at x = 3
(for example, the polynomial x).


53 Matrix of a linear map

Matrices record where basis vectors go.

In our first brush with linear algebra, we only dealt with linear maps between the
spaces Rⁿ for varying n. For those maps and those spaces, the convenient
standard basis allowed us to record linear maps using the finite data of a matrix.
In this section we will see that a similar story plays out for maps between finite
dimensional vector spaces: they too can be described by a matrix, but only after
making a choice of "basis" on the domain and codomain.

Question 1 Let V be a vector space with basis (v₁, v₂, v₃) and W be a vector
space with basis (w₁, w₂).

Suppose there is a linear map L : V → W for which

L(v₁) = 3w₁ + 2w₂,
L(v₂) = 3w₁ − 2w₂, and
L(v₃) = w₁ + w₂.

In light of all this, compute L(2v₁). But how will we write down our answer?
Where does L(2v₁) live?

Solution

(a) In W.

(b) In V .

And since we know that L(2v₁) ∈ W, we'll write our answer as αw₁ + βw₂ for
some numbers α and β. So say L(2v₁) = αw₁ + βw₂.

Solution In this case, α = 6.

Solution And β = 4.

Next compute L(2v₁ + 3v₂ − 4v₃) = αw₁ + βw₂.

Solution

Hint: Use the fact that L(2v₁ + 3v₂ − 4v₃) = L(2v₁) + L(3v₂) − L(4v₃).

Hint: Further use the fact that L(2v₁) + L(3v₂) − L(4v₃) = 2L(v₁) + 3L(v₂) − 4L(v₃).

In this case, α = 11 but β = −6.

What we are seeing is an instance of the following observation.

Observation 2 If L : V → W is a linear map, and you know the value of L(vᵢ)
for each vector in the basis (v₁, v₂, . . . , vₙ) of V , then you can compute L(v) for
any v ∈ V .

And by "compute," I mean you can write down L(v) in terms of a basis of W.


Definition 3 Let L : V → W be a linear map between finite dimensional vector
spaces, let B_V = (v₁, v₂, . . . , vₙ) be a basis for V , and let B_W = (w₁, w₂, . . . , wₘ)
be a basis for W. Since B_W spans W, we can write

L(vᵢ) = a_{i,1}w₁ + a_{i,2}w₂ + · · · + a_{i,m}wₘ.

Then the matrix of L with respect to the bases B_V and B_W is the matrix M whose
entry in the ith column and jth row is a_{i,j}.

i,j

Question 4 Let B_1 = ( [1, 1]_{E2}, [1, 0]_{E2} ) and B_2 = ( [1, 0, 0]_{E3}, [0, 0, 1]_{E3}, [0, 1, 0]_{E3} ) be bases for R^2 and R^3, respectively. What is the matrix for the linear map

L([x, y]_{E2}) = [x + y, x, 0]_{E3}

with respect to the bases B_1 and B_2?

Solution

Hint: The first column of the matrix will be L([1, 1]_{E2}), but written with respect to the basis B_2. Remember the order of vectors in the basis matters.

Hint: L([1, 1]_{E2}) = [2, 1, 0]_{E3} = 2 [1, 0, 0]_{E3} + 0 [0, 0, 1]_{E3} + 1 [0, 1, 0]_{E3}.

Hint: So the first column of the matrix is [2, 0, 1]_{B2}.

Hint: Similarly, the second column is [1, 0, 1]_{B2}.

Hint: So the matrix of this linear map is

[ 2 1 ]
[ 0 0 ]
[ 1 1 ]
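As a numerical sanity check on the hints above, each column of the matrix can be computed as the B_2-coordinates of L applied to a basis vector of B_1; finding those coordinates amounts to solving a linear system. This is only a sketch, assuming the numpy library is available; the names L, B1, B2, and C are our own.

```python
import numpy as np

# The map L(x, y) = (x + y, x, 0) from the question.
def L(v):
    x, y = v
    return np.array([x + y, x, 0.0])

# Bases from Question 4: B1 for R^2 and B2 for R^3 (note the order of B2!).
B1 = [np.array([1.0, 1.0]), np.array([1.0, 0.0])]
B2 = [np.array([1.0, 0.0, 0.0]),
      np.array([0.0, 0.0, 1.0]),
      np.array([0.0, 1.0, 0.0])]

# Each column of the matrix holds the B2-coordinates of L applied to a
# basis vector of B1; solve C c = L(b), where C has the B2 vectors as columns.
C = np.column_stack(B2)
M = np.column_stack([np.linalg.solve(C, L(b)) for b in B1])
print(M)
```

The printed matrix should agree with the one found in the hints.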


Question 5 Let P_2 be the space of polynomials of degree at most 2. Let B_0 = (1, x, x^2) and B_1 = (1, (x − 1), (x − 1)^2). Consider the map L : P_2 → R given by L(p) = p(1).

What is the matrix of this linear map with respect to the basis B_0?

Solution

Hint:
L(1) = 1
L(x) = 1
L(x^2) = 1^2 = 1

Hint: The matrix of L with respect to B_0 is [ 1 1 1 ].

What is the matrix of this linear map with respect to the basis B_1?

Solution

Hint:
L(1) = 1
L(x − 1) = 1 − 1 = 0
L((x − 1)^2) = (1 − 1)^2 = 0

Hint: The matrix of L with respect to B_1 is [ 1 0 0 ].

Question 6 Let P_3 be the space of polynomials of degree at most 3. Let B = (1, x, x^2, x^3). Consider the map L : P_3 → P_3 given by L(p(x)) = (d/dx) p(x). This map is linear (why?). What is the matrix for L with respect to the basis B?

Solution

Hint: L is linear because the derivative of a sum of two functions is the sum of the derivatives of the two functions, and the derivative of a constant times a function is the constant times the derivative of the function.

Hint:
L(1) = 0
L(x) = 1
L(x^2) = 2x
L(x^3) = 3x^2

Hint: The matrix of L is

[ 0 1 0 0 ]
[ 0 0 2 0 ]
[ 0 0 0 3 ]
[ 0 0 0 0 ]
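This matrix can be sanity-checked with a short computation: multiplying it by the coefficient list of a polynomial should produce the coefficient list of its derivative. A minimal plain-Python sketch; the coefficient-list encoding and the helper name apply_matrix are our own choices.

```python
# The matrix of d/dx on P_3 with respect to B = (1, x, x^2, x^3), from the
# hint above.  A polynomial is stored as coefficients (a_0, a_1, a_2, a_3)
# meaning a_0 + a_1 x + a_2 x^2 + a_3 x^3.
D = [[0, 1, 0, 0],
     [0, 0, 2, 0],
     [0, 0, 0, 3],
     [0, 0, 0, 0]]

def apply_matrix(M, coeffs):
    # ordinary matrix-vector multiplication
    return [sum(M[i][j] * coeffs[j] for j in range(4)) for i in range(4)]

# p(x) = 5 + 2x - x^2 + 4x^3, so p'(x) = 2 - 2x + 12x^2.
p = [5, 2, -1, 4]
print(apply_matrix(D, p))  # [2, -2, 12, 0]
```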


54 Subspaces

A subspace is a subset of a vector space which is also a vector space.

Deﬁnition 1 A subset U of a vector space V is a subspace of V if U is a vector

space with respect to the scalar multiplication and the vector addition inherited

from V .

Question 2 Which of the following is a subspace of R^2?

Solution

Hint: The vectors [1, 2] and [0, 1] are both on the line ℓ, but the sum [1, 2] + [0, 1] is not on the line ℓ.

Hint: So ℓ is not a subspace.

Hint: The set P consists of a single vector.

Hint: But the vector in P is not the origin [0, 0].

Hint: So [1, 2] ∈ P but 10 [1, 2] ∉ P.

Hint: So P is not a subspace.

Hint: By process of elimination, the x-axis must be a subspace. Is it really?

Hint: Yes, if I multiply any vector [x, 0] by a scalar, it is still on the x-axis.

Hint: And if I add together two vectors of the form [x, 0], the result is still on the x-axis.

(a) The x-axis.
(b) The set P = { [1, 2] }.
(c) The line ℓ = { [x, y] ∈ R^2 : y = x + 1 }.

Let's look at some more examples! Which of the following is a subspace of R^2?

Solution

Hint: The set A is not a subspace because [1, 0] ∈ A, but −1 [1, 0] is not in A, so a scalar multiple of something in A need not be in A.

Hint: The set C is not a subspace because even though it is closed under scalar multiplication (check this!) it is not closed under vector addition, since [1, −2] and [1, 2] are both in C, but their sum [2, 0] is not (draw a picture of this example!).

Hint: As the only choice left, B must be a subspace. The reason is that it is just the span of the vector [2, 1], and as such, is closed under scalar multiplication and vector addition.

(a) The set A = { [x, y] : x ≥ 0 and y ≥ 0 }
(b) The set B = { [x, y] : x = 2y }
(c) The set C = { [x, y] : |y| ≥ |x| }

Solution

Hint: Question 3 What about the line y = 3? Does it form a subspace of R^2?

Solution

(a) Yes.
(b) No.

That's right; the tip of the vector [0, 3] is on that line, but a scalar multiple of that vector, like 2 [0, 3] = [0, 6], is not on the line.

So when do the points on a line in R^2 form a subspace?

(a) When the line passes through the point (0, 0).
(b) When the line is parallel to the x-axis.

This is an important observation.

Observation 4 Suppose U is a subspace of a vector space V . Then the “zero

vector” is in U.


55 Kernel

A kernel is everything sent to zero.

There are some special subspaces that we will want to pay attention to.

Theorem 1 If L : V → W is a linear transformation, then the kernel of L, defined by

ker(L) = { v ∈ V : L(v) = 0 }

is a subspace of V.

You may also hear this referred to as the null space of L.

Prove this theorem, that ker L is a subspace.

We only need to show that ker(L) is closed under scalar multiplication and

vector addition.

For any v ∈ ker(L) and c ∈ R,

L(cv) = cL(v) = c0 = 0,

so cv ∈ ker(L).

If v, w ∈ ker(L), then

L(v + w) = L(v) + L(w) = 0 + 0 = 0,

so v + w ∈ ker(L). Thus ker(L) is a subspace!

Question 2 Let L : R^3 → R^2 be the linear map whose matrix is

[ 2 3 1 ]
[ 1 0 −1 ]

Which of the following vectors is in the kernel of L?

Solution

Hint: Just by evaluating all three, we see the only one which gets sent to [0, 0] by L is [1, −1, 1].

(a) [1, −1, 1]
(b) [3, 2, 0]
(c) [0, 0, 2]
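One way to check the hint from Question 2 is to apply the matrix to each candidate vector and see which one lands on the zero vector. A small plain-Python sketch; the helper name apply is our own.

```python
# The matrix of L from Question 2.
M = [[2, 3, 1],
     [1, 0, -1]]

def apply(M, v):
    # matrix-vector multiplication
    return [sum(row[j] * v[j] for j in range(len(v))) for row in M]

# A vector is in ker(L) exactly when L sends it to the zero vector.
for v in [[1, -1, 1], [3, 2, 0], [0, 0, 2]]:
    print(v, "->", apply(M, v))
```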

Theorem 3 A linear map L : V → W is injective if and only if ker(L) = { 0 }.

Definition 4 The word "injective" is an adjective meaning the same thing as "one to one." In other words, a function f : A → B is injective if f(a_1) = f(a_2) implies a_1 = a_2.

Prove this theorem.

Let L be injective. Then L(v) = 0 implies L(v) = L(0). Since L is injective, this implies v = 0. Thus the only element of the kernel is 0.

On the other hand, if ker(L) = { 0 }, then if L(v_1) = L(v_2), then L(v_1 − v_2) = 0, so v_1 − v_2 is in the null space, and hence must be equal to 0. But then we can conclude that v_1 = v_2.

Deﬁnition 5 The dimension of the kernel of L is the nullity of L.

Be careful to observe that ker L is a subspace, while dim ker L is a number, so

the nullity of L is just a number.


56 Image

The “image” is every actual output.

Definition 1 If L : V → W is a linear transformation, then the image of L is

Imag(L) = { w ∈ W : ∃v ∈ V, L(v) = w }.

Remember to read ∃ as “there exists.”

Warning 2 Some people may call this the “range.” Some other people use the

word “range” for what we’ve been calling the codomain. The result is that, in my

opinion, the word “range” is now overused, so we give up and never use the word.

Question 3 Suppose L : R^2 → R^3, and suppose that

L([1, 0]) = [3, 2, 1], and L([0, 1]) = [1, 1, 1].

What is a vector v ∈ R^2 so that L(v) = [2, 1, 0]?

Solution

Hint: Use the fact that [2, 1, 0] = [3, 2, 1] − [1, 1, 1].

Hint: In other words, [2, 1, 0] = L(e_1) − L(e_2).

Hint: By linearity of L, we have L(e_1) − L(e_2) = L(e_1 − e_2).

Hint: And so a vector in the domain which is sent to [2, 1, 0] is the vector [1, −1].
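The reasoning in the hints can be checked mechanically: since L is determined by its values on the standard basis (this is Observation 2 again), we can code L by linearity and evaluate it at (1, −1). A plain-Python sketch; the names are our own.

```python
# L is determined by its values on the standard basis:
L_e1 = [3, 2, 1]
L_e2 = [1, 1, 1]

def L(v):
    # L(a e1 + b e2) = a L(e1) + b L(e2), by linearity
    a, b = v
    return [a * x + b * y for x, y in zip(L_e1, L_e2)]

print(L([1, -1]))  # [2, 1, 0]
```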

This is a special case of a general fact: if we have two vectors in the image, then

their sum is in the image, too.

Theorem 4 The image of a linear map is a subspace of the codomain.


Prove this.

If w ∈ Imag(L), then there is a v ∈ V with L(v) = w. L(cv) = cL(v) = cw, so cw ∈ Imag(L) for any c ∈ R. Thus Imag(L) is closed under scalar multiplication.

If w_1, w_2 ∈ Imag(L), then there are v_1, v_2 ∈ V with L(v_1) = w_1 and L(v_2) = w_2. L(v_1 + v_2) = L(v_1) + L(v_2) = w_1 + w_2, so w_1 + w_2 ∈ Imag(L). Thus Imag(L) is closed under vector addition.

We ﬁnish with some terminology.

Deﬁnition 5 The dimension of the image of L is the rank of L.

Be careful to observe that the image of L is a subspace, while the dimension of

the image of L is a number, so the rank of L is just a number.

Question 6 Consider the linear map L : R^2 → R^3 given by the matrix

[ 2 1 ]
[ 4 2 ]
[ 6 3 ]

Solution

Hint: Not every vector in R^3 is in the image of L.

Hint: Let's think about which vectors are in the image of L.

Question 7 Is [2, 4, 6] in the image of L?

Solution

(a) Yes.
(b) No.

In fact, [2, 4, 6] = L([1, 0]).

Is [1, 1, 1] in the image of L?

Solution

(a) Yes.
(b) No.

But how can we tell? The only things in the image of L are vectors of the form [x, 2x, 3x] for some x ∈ R. This is the span of { [1, 2, 3] }.

So what is the dimension of the vector space spanned by this single vector?

Solution

(a) 0
(b) 1
(c) 2

And so the rank of L is 1.
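If the numpy library is available, the rank can also be computed directly from the matrix, matching the count above. A sketch; matrix_rank counts the dimension of the column space, which is the image of L.

```python
import numpy as np

M = np.array([[2, 1],
              [4, 2],
              [6, 3]])

# The rank of L is the dimension of its image, i.e. the rank of its matrix.
print(np.linalg.matrix_rank(M))  # 1
```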


57 Rank nullity theorem

Rank plus nullity is the dimension of the domain.

Theorem 1 (Rank-Nullity) If L : V →W is a linear transformation, then the

sum of the dimension of the kernel of L and the dimension of the image of L is the

dimension of V .

The dimension of the kernel is sometimes called the “nullity” of L, and the

dimension of the image is sometimes called the “rank” of L.

Hence the name “rank-nullity” theorem.

Prove this theorem.

Warning 2 This is hard!

Let v_1, v_2, ..., v_n be a basis of ker(L). We can extend this to a basis of V, say v_1, v_2, ..., v_n, u_1, u_2, ..., u_k. We will be done if we can show that L(u_1), L(u_2), ..., L(u_k) form a basis of Im(L).

Let w ∈ Im(L). Then w = L(v) for some v ∈ V. Since v_1, v_2, ..., v_n, u_1, u_2, ..., u_k is a basis of V, we can write

w = L(a_1 v_1 + a_2 v_2 + ... + a_n v_n + b_1 u_1 + ... + b_k u_k)
  = a_1 L(v_1) + ... + a_n L(v_n) + b_1 L(u_1) + ... + b_k L(u_k)
  = b_1 L(u_1) + ... + b_k L(u_k)

So Im(L) is spanned by the L(u_i). Now we need to see that the L(u_i) are linearly independent.

Assume b_1 L(u_1) + b_2 L(u_2) + ... + b_k L(u_k) = 0. Then L(b_1 u_1 + ... + b_k u_k) = 0. Then b_1 u_1 + ... + b_k u_k would be in the null space of L. But the u_i were chosen specifically to be linearly independent of all of the vectors in the null space. So b_1 = b_2 = ... = b_k = 0. Thus the L(u_i) are linearly independent and we are done.
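As a numeric illustration of the theorem, we can take the map L : R^3 → R^2 from the kernel section and check that rank plus nullity equals 3, the dimension of the domain. A sketch assuming numpy is available; we count the nonzero singular values of the matrix to get the rank.

```python
import numpy as np

# The map from the kernel section: L : R^3 -> R^2.
M = np.array([[2.0, 3.0, 1.0],
              [1.0, 0.0, -1.0]])

s = np.linalg.svd(M, compute_uv=False)   # singular values
rank = int(np.sum(s > 1e-10))            # dimension of the image
nullity = M.shape[1] - rank              # dimension of the kernel
print(rank, nullity, rank + nullity)     # rank + nullity = dim of domain
```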


58 Eigenvectors

Eigenvectors are mapped to multiples of themselves.

Deﬁnition 1 Let L : V → V be a linear map. A vector v ∈ V is called an

eigenvector of L if L(v) = λv for some λ ∈ R.

A constant λ ∈ R is called an eigenvalue of L if there is a nonzero eigenvector

v with L(v) = λv.

Geometrically, eigenvectors of L are those vectors whose direction is not changed

(or at worst, negated!) when they are transformed by L.

Let’s try some examples.

Question 2 Suppose L : R^2 → R^2 is the linear map represented by the matrix

[ 3 2 ]
[ 4 1 ]

Which of these vectors is an eigenvector of L?

Solution

Hint: Question 3 What is L([1, −1])?

Solution

Hint: We want to compute [3 2; 4 1] [1, −1].

Hint: In this case, [3 2; 4 1] [1, −1] = [1, 3].

Is [1, 3] a multiple of [1, −1]?

Solution

(a) No.
(b) Yes.

Consequently, [1, −1] is not an eigenvector. The eigenvector must be [1, 1].

(a) [1, 1]
(b) [1, −1]

That's right! Note that L([1, 1]) is [5, 5] = 5 [1, 1], and so [1, 1] is an eigenvector.


Solution

Hint: Try computing L([1, −2]).

Hint: In this case, L([1, −2]) = [−1, 2].

Hint: Question 4 Find λ ∈ R so that [−1, 2] = λ [1, −2].

Solution

Hint: The signs are opposite on the two sides of the equation.

Hint: So try λ = −1.

λ = −1

And so −1 is an eigenvalue, with eigenvector [1, −2].

Which of the following is another eigenvector?

(a) [1, −2]
(b) [2, −1]

Rock on! We check that L([1, −2]) = [−1, 2] = −1 [1, −2], and so [1, −2] is an eigenvector with eigenvalue −1.
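The check carried out in the hints, "is L(v) a scalar multiple of v?", is easy to automate for 2-dimensional vectors. A plain-Python sketch; the cross-multiplication trick avoids dividing by a possibly zero entry, and the helper names are our own.

```python
# The matrix from Question 2.
M = [[3, 2],
     [4, 1]]

def apply(M, v):
    return [sum(row[j] * v[j] for j in range(len(v))) for row in M]

def is_eigenvector(M, v):
    w = apply(M, v)
    # w is a scalar multiple of v exactly when w[0] v[1] = w[1] v[0]
    return w[0] * v[1] == w[1] * v[0]

print(is_eigenvector(M, [1, 1]))   # True  (eigenvalue 5)
print(is_eigenvector(M, [1, -2]))  # True  (eigenvalue -1)
print(is_eigenvector(M, [2, -1]))  # False
```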


59 Eigenvalues

Eigenvalues measure how much eigenvectors are scaled.

Deﬁnition 1 Let L : V → V be a linear operator (NB: linear maps with the

same domain and codomain are called linear operators). The set of all eigenvalues

of L is the spectrum of L.

Let’s try ﬁnding the spectrum.

Question 2 Let L : R^2 → R^2 be the linear map whose matrix is [1 2; 2 1] with respect to the standard basis. L has two different eigenvalues. What are they? Give your answer in the form of a matrix [ λ_1 λ_2 ], where λ_1 ≤ λ_2.

Solution

Hint: For λ to be an eigenvalue we need [1 2; 2 1] [x, y] = λ [x, y].

Hint: This is the same as

x + 2y = λx
2x + y = λy

or

(1 − λ)x + 2y = 0
2x + (1 − λ)y = 0

Hint: These are two lines passing through the origin. To have more than just the origin as a solution, we need that the slope of the two lines is the same. So

(1 − λ)/2 = 2/(1 − λ)

Hint:

(1 − λ)/2 = 2/(1 − λ)
(1 − λ)^2 = 4
1 − λ = ±2
λ = −1 or 3

Hint: Let us now check that these really are eigenvalues:

If we let λ = −1, we have the equation 2x + 2y = 0. Check that [1, −1] is an eigenvector with eigenvalue −1.

If we let λ = 3, we have the equation 2x − 2y = 0. Check that [1, 1] is an eigenvector with eigenvalue 3.
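If the numpy library is available, we can confirm the spectrum found above directly; eigvalsh is appropriate here since the matrix is symmetric, and it returns the eigenvalues in ascending order. A sketch, not part of the original activity.

```python
import numpy as np

M = np.array([[1.0, 2.0],
              [2.0, 1.0]])

# The spectrum of L is the set of eigenvalues of its matrix.
spectrum = np.linalg.eigvalsh(M)  # ascending order, for symmetric matrices
print(spectrum)
```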


Question 3 Let's try another example. Suppose F : R^2 → R^2 is the linear map represented by the matrix [0 −1; 1 0]. Which of these numbers is an eigenvalue of F?

Solution

Hint: Let's suppose that [x, y] is an eigenvector.

Hint: Then there is some λ ∈ R so that [0 −1; 1 0] [x, y] = λ [x, y].

Hint: But [0 −1; 1 0] [x, y] = [−y, x].

Hint: And so [−y, x] = λ [x, y].

Hint: This means that −y = λx and x = λy.

Hint: Putting this together, −y = λ^2 y and x = −λ^2 x.

Hint: Since we are looking for a nonzero eigenvector (in order to have an eigenvalue), we must have that either x ≠ 0 or y ≠ 0.

Hint: Consequently, λ^2 = −1.

Hint: But there is no real number λ ∈ R so that λ^2 = −1, since the square of any real number is nonnegative.

Hint: Therefore, there is no real eigenvalue.

(a) There is no real eigenvalue.
(b) −1
(c) √2
(d) 1

Perhaps surprisingly, not every linear operator from R^n to R^n has any real eigenvalues.

Geometrically, what is this linear map F doing?

Solution

(a) Rotation by 90° counterclockwise.
(b) Rotation by 90° clockwise.
(c) Rotation by 180°.

This geometric fact also explains why there is no eigenvalue: what would be the corresponding eigenvector whose direction is unchanged by applying F? Every vector is moved by a rotation!

The additional fact that there are imaginary solutions to λ^2 = −1 is hinting that i should have something to do with rotation, too.


60 Eigenspace

An eigenspace collects together all the eigenvectors for a given eigenvalue.

Theorem 1 Let λ be an eigenvalue of a linear operator L : V → V. Then the set E_λ(L) = { v ∈ V : L(v) = λv } of all (including zero) eigenvectors with eigenvalue λ forms a subspace of V.

This subspace is the eigenspace associated to the eigenvalue λ.

Prove this theorem. We need to check that E_λ(L) is closed under scalar multiplication and vector addition.

If v ∈ E_λ(L), and c ∈ R, then L(cv) = cL(v) = cλv = λ(cv), so cv is also an eigenvector of L.

If v_1, v_2 ∈ E_λ(L), then L(v_1 + v_2) = L(v_1) + L(v_2) = λv_1 + λv_2 = λ(v_1 + v_2), so v_1 + v_2 is also an eigenvector of L.

The kernel of L is the eigenspace of the eigenvalue 0.


61 Eigenbasis

An eigenbasis is a basis of eigenvectors.

Observation 1 If (v_1, v_2, ..., v_n) is a basis of eigenvectors of a linear operator L, then the matrix of L with respect to that basis is diagonal, with the eigenvalues of L appearing along the diagonal.

Theorem 2 Let L : V → V be a linear map. If v_1, v_2, ..., v_n are nonzero eigenvectors of L with distinct eigenvalues λ_1, λ_2, ..., λ_n, then (v_1, v_2, ..., v_n) are linearly independent.

Prove this theorem.

Assume to the contrary that the list is linearly dependent. Let v

k

be the ﬁrst

vector in the list which is in the span of the preceding vectors, so that the vectors

(v

1

, v

2

, ..., v

k−1

) are linearly independent. Let a

1

v

1

+ a

2

v

2

+ ... + a

k−1

v

k−1

= v

k

.

Then applying L to both sides of this equation we have a

1

λ

1

v

1

+ a

2

λ

2

v

2

+ ... +

a

k−1

λ

k−1

v

k−1

= λ

k

v

k

. If we multiply the ﬁrst equation by λ

k

we also have a

1

λ

1

v

1

+

a

2

λ

1

v

2

+ ... + a

k−1

λ

1

v

k−1

= λ

1

v

k

. Subtracting these two equations we have

a

1

(λ

k

−λ

1

)v

1

+ a

2

(λ

k

−λ

2

)v

2

+ ... + a

3

(λ

k

−λ

k−1

)v

k

= 0.

Since the vectors (v

1

, v

2

, ..., v

k−1

) are linearly independent, we must have that

a

i

(λ

k

− λ

i

) = 0. But λ

k

,= λ

i

, so a

i

= 0 for each i. Looking back at where the a

i

came from, we see that this implies that v

k

= 0. This contradicts the assumption

that the v

j

were all nonzero.

So our assumption that the list was linearly dependent was absurd, hence the

list is linearly independent.

A corollary of this theorem is that if V is n dimensional and L : V → V has n

distinct eigenvalues, then the eigenvectors of L form a basis of V . The matrix of

the operator with respect to this basis is diagonal.
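We can illustrate this corollary and Observation 1 numerically: changing to the eigenbasis of the operator from the eigenvector section should produce a diagonal matrix with the eigenvalues on the diagonal. A sketch assuming numpy is available; the eigenvectors used are the ones found earlier, (1, 1) and (1, −2).

```python
import numpy as np

# The operator from the eigenvector section, with eigenvectors (1, 1)
# and (1, -2) and eigenvalues 5 and -1.
M = np.array([[3.0, 2.0],
              [4.0, 1.0]])

# Columns of P are the eigenbasis; P^{-1} M P is the matrix of the
# operator with respect to that basis, and should come out diagonal.
P = np.column_stack([[1.0, 1.0], [1.0, -2.0]])
D = np.linalg.inv(P) @ M @ P
print(np.round(D, 10))
```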


62 Python

We can ﬁnd eigenvectors in Python.

Let's suppose I have an n × n matrix M, expressed in Python as a list of lists. For example, suppose

M =
[ 6 4 3 ]
[ 4 5 2 ]
[ 3 2 7 ]
= [[6,4,3],[4,5,2],[3,2,7]].

Further suppose that the matrix M = (m_ij) is symmetric, meaning that m_ij = m_ji. I'd like to compute an eigenvector of M quickly.

Question 1 Here's a procedure that I'd like you to code in Python:

(a) Start with some vector v.
(b) Replace v with the result of applying the linear map L_M to the vector v.
(c) Normalize v so that it has unit length.
(d) Repeat many times.

You can try print eigenvector([[6,4,3],[4,5,2],[3,2,7]]) to see what happens in the case of the matrix above.

Solution

Python

def eigenvector(M):
    # start with a vector v of all ones
    v = [1.0] * len(M[0])
    # for many, many times
    for _ in range(1000):
        # replace v with Mv
        v = [sum(row[j] * v[j] for j in range(len(v))) for row in M]
        # normalize v so that it has unit length
        length = sum(x * x for x in v) ** 0.5
        v = [x / length for x in v]
    return v

def validator():
    v = eigenvector([[6, 5, 5], [5, 2, 3], [5, 3, 8]])
    if abs((v[1] / v[0]) - 0.6514182851) > 0.01:
        return False
    if abs((v[2] / v[0]) - 1.0603152077) > 0.01:
        return False
    return True

Can you use your program to ﬁnd, numerically, an eigenvector of the matrix

M?


63 Cayley-Hamilton theorem

Sometimes eigen-information reveals quite a bit about linear operators.

We will not be proving—or even stating!—the Cayley-Hamilton theorem^1, but there is one very special case which provides a nice activity. This activity will force us to think about bases and about eigenvectors and eigenvalues.

Here's the setup: suppose L : R^2 → R^2 is a linear map, and it has an eigenvector u (with eigenvalue 2) and an eigenvector w (with eigenvalue 3).

Question 1 Now suppose v ∈ R^2 is some arbitrary vector. How does L(L(v)) compare to −6v + 5L(v)?

Solution

(a) L(L(v)) = −6v + 5L(v)
(b) L(L(v)) ≠ −6v + 5L(v)
(c) It cannot be determined from the information given.

Why is this the case?

Solution

Hint: The vectors u and w together form a basis for R^2.

Can we write v as αu + βw?

(a) Yes.
(b) No.

What is L(v) in terms of α, β, u, and w?

Solution

(a) αL(u) + βL(w)
(b) αL(w) + βL(u)

But what is L(u)?

Solution

Hint: Remember that u is an eigenvector with eigenvalue 2. Consequently L(u) = 2u.

(a) 2u
(b) 3u
(c) 2w
(d) 3w

And what is L(w)?

Solution

Hint: Remember that w is an eigenvector with eigenvalue 3. Consequently L(w) = 3w.

(a) 2u
(b) 3u
(c) 2w
(d) 3w

Using these facts, what is L(v) in terms of α, β, u, and w?

Solution

(a) 2αu + 3βw
(b) 3αu + 2βw
(c) 2αw + 3βu
(d) 3αw + 2βu

Solution

Hint: Using linearity of L, what is L(L(v))?

(a) 2αL(u) + 3βL(w)
(b) 3αL(u) + 2βL(w)
(c) 2αL(w) + 3βL(u)
(d) 3αL(w) + 2βL(u)

Hint: But what is L(u)?

(a) L(u) = 2u
(b) L(u) = 3u
(c) L(u) = 2w
(d) L(u) = 3w

Hint: And what is L(w)?

(a) L(w) = 3w
(b) L(w) = 2w
(c) L(w) = 2u
(d) L(w) = 3u

Hint: Try substituting the facts that L(u) = 2u and L(w) = 3w into 2αL(u) + 3βL(w).

What is L(L(v))?

(a) 4αu + 9βw
(b) 4αw + 9βu
(c) 9αu + 4βw
(d) 9αw + 4βu

What is −6v + 5L(v) in terms of α, β, u, and w?

Solution

Hint: Earlier we wrote v = αu + βw.

Hint: Since L is a linear map, we have L(v) = αL(u) + βL(w).

(a) −6(αu + βw) + 5αL(u) + 5βL(w)
(b) −6(αu + βw) + 5βL(u) + 5αL(w)
(c) −6(αu + βw) + 3αL(u) + 3βL(w)
(d) −6(αu + βw) + 3βL(u) + 3αL(w)

Solution

Hint: But what is L(u)?

(a) L(u) = 2u
(b) L(u) = 3u
(c) L(u) = 2w
(d) L(u) = 3w

Hint: And what is L(w)?

(a) L(w) = 3w
(b) L(w) = 2w
(c) L(w) = 2u
(d) L(w) = 3u

Hint: Try substituting the facts that L(u) = 2u and L(w) = 3w into −6(αu + βw) + 5αL(u) + 5βL(w).

Hint: Then we get −6αu − 6βw + (5 · 2)αu + (5 · 3)βw.

Hint: But −6 + 10 = 4 and −6 + 15 = 9.

Hint: Consequently, this simplifies to 4αu + 9βw.

Now write −6v + 5L(v) but without referring to L.

(a) 4αu + 9βw
(b) 4αw + 9βu
(c) 9αu + 4βw
(d) 9αw + 4βu

And so, after all this, we see that L(L(v)) = −6v + 5L(v).

What happens if you try this in higher dimensions? Suppose you have a map L : R^3 → R^3 and it has three eigenvectors with three different eigenvalues. Can you rewrite L(L(L(v))) in terms of v and L(v) and L(L(v)) in that case?

^1 http://en.wikipedia.org/wiki/CayleyHamilton_theorem
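We can test the conclusion on a concrete, hypothetically chosen matrix with eigenvalues 2 and 3. A sketch assuming numpy is available; the particular matrix and vector below are our own choices, not part of the activity.

```python
import numpy as np

# A concrete choice of L with eigenvalues 2 and 3: eigenvector (1, 0)
# with eigenvalue 2 and eigenvector (1, 1) with eigenvalue 3.
M = np.array([[2.0, 1.0],
              [0.0, 3.0]])

v = np.array([7.0, -4.0])   # an arbitrary vector

# The claim from Question 1: L(L(v)) = -6 v + 5 L(v).
lhs = M @ (M @ v)
rhs = -6 * v + 5 * (M @ v)
print(lhs, rhs)
```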


64 Bilinear maps

Bilinear maps are linear in two vector variables separately.

Definition 1 Let V, W and U be vector spaces. A bilinear map B : V × W → U is a function of two vector variables which is linear in each variable separately. That is:

Additivity in the first slot: For all v_1, v_2 ∈ V and all w ∈ W, we have B(v_1 + v_2, w) = B(v_1, w) + B(v_2, w).

Additivity in the second slot: For all v ∈ V and all w_1, w_2 ∈ W, we have B(v, w_1 + w_2) = B(v, w_1) + B(v, w_2).

Scaling in each slot: For all c ∈ R and all v ∈ V and all w ∈ W, we have B(cv, w) = B(v, cw) = cB(v, w).

A bilinear map from V × V → R is called a bilinear form on V. We will mostly be focusing on bilinear forms on R^n, but we will sometimes need to work with more general bilinear maps.

Example 2 The map B : R^n × R^n → R given by B(v, w) = v · w is a bilinear form, since we confirmed that the dot product has these properties immediately after defining the dot product.

Question 3 R^n × R^m can be identified with R^{n+m}. Is a bilinear map R^n × R^m → R^k linear when viewed as a map from R^{n+m} → R^k?

Solution

(a) No.
(b) Yes.

You are correct: a bilinear map R^n × R^m → R^k is not necessarily a linear map when we identify R^n × R^m with R^{n+m}. Why? What is an example? For example, the dot product dot : R^2 × R^2 → R defined by B([x, y], [z, t]) = xz + yt is bilinear, but it is certainly not a linear map from R^4 → R.

Question 4 Let B : R^2 × R^3 → R be a bilinear mapping, and you know the following values of B:

• B([1, 0], [1, 0, 0]) = 2
• B([1, 0], [0, 1, 0]) = 1
• B([1, 0], [0, 0, 1]) = −3
• B([0, 1], [1, 0, 0]) = 2
• B([0, 1], [0, 1, 0]) = 5
• B([0, 1], [0, 0, 1]) = 4

What is B([3, 2], [4, 2, 1])?

Solution

Hint: We need to use the linearity in each slot to break this down into a computation involving only the basis vectors.

Hint:

B([3, 2], [4, 2, 1]) = B([3, 0] + [0, 2], [4, 2, 1])
                     = B([3, 0], [4, 2, 1]) + B([0, 2], [4, 2, 1])

Hint:

= 3B([1, 0], [4, 2, 1]) + 2B([0, 1], [4, 2, 1])
= 3 (4B([1, 0], [1, 0, 0]) + 2B([1, 0], [0, 1, 0]) + B([1, 0], [0, 0, 1]))
  + 2 (4B([0, 1], [1, 0, 0]) + 2B([0, 1], [0, 1, 0]) + B([0, 1], [0, 0, 1]))

Hint:

= 3 (4(2) + 2(1) + 1(−3)) + 2 (4(2) + 2(5) + 1(4))
= 21 + 44
= 65

B([3, 2], [4, 2, 1]) = 65
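The computation in the hints amounts to the matrix expression x^T A y, where A records the values of B on pairs of standard basis vectors. A sketch assuming numpy is available; the name A is our own.

```python
import numpy as np

# The values of B on pairs of standard basis vectors, from the bullet
# list above: A[i][j] = B(e_i, e_j).
A = np.array([[2.0, 1.0, -3.0],
              [2.0, 5.0, 4.0]])

# Bilinearity means B(x, y) = sum_{i,j} x_i y_j B(e_i, e_j) = x^T A y.
def B(x, y):
    return np.array(x) @ A @ np.array(y)

print(B([3, 2], [4, 2, 1]))  # 65.0
```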

Question 5 Hint: If we set L(x) = B(x, 3), then L should be a linear map R → R.

Hint: But a linear map R → R is just multiplication, so B(x, 3) = αx for some number α.

Hint: But a bilinear map is linear in both variables, so B(17, y) = βy for some number β.

Hint: So one way to get a bilinear map would be to set B(x, y) = 10xy. You can enter this as 10 * x * y.

Hint: Can you think of other examples?

Hint: Sure! Another way to get a bilinear map would be to set B(x, y) = 13xy. You can enter this as 13 * x * y.

Hint: In general, if B : R × R → R is bilinear, then it must be B(x, y) = λxy for some λ ∈ R.

Write a nonzero bilinear map B : R × R → R.

Solution B(x, y) =


65 Tensor products

Bilinear forms comprise a vector space of tensors.

The set of all bilinear maps from V × W → R has the structure of a vector space: we can add such maps, and multiply them by scalars.

Definition 1 We define V* ⊗ W* to be the vector space of all bilinear maps from V × W → R. This is the tensor product of the dual spaces V* and W*.

Hopefully the reason for the duality involved in the definition above will become clear shortly.

Given covectors S : V → R and T : W → R, their tensor product is the map S ⊗ T : V × W → R given by the rule (S ⊗ T)(v, w) = S(v)T(w).

Warning 2 This formula involves the product of S(v) and T( w) as real numbers.

Question 3 S ⊗ T is a function from V × W to R. Is it bilinear?

Solution

(a) Yes.
(b) No.

Let's prove it!

First let's check additivity in the first slot:

(S ⊗ T)(v_1 + v_2, w) = S(v_1 + v_2)T(w)
                      = (S(v_1) + S(v_2))T(w)
                      = S(v_1)T(w) + S(v_2)T(w)
                      = (S ⊗ T)(v_1, w) + (S ⊗ T)(v_2, w)

Proving additivity in the second slot is similar.

Let's check scaling in the first slot:

(S ⊗ T)(cv, w) = S(cv)T(w)
               = cS(v)T(w)
               = c(S ⊗ T)(v, w)

Proving scaling in the other slot is similar.

So S ⊗ T really is bilinear!
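The rule (S ⊗ T)(v, w) = S(v)T(w) translates directly into code: a covector is just a function from vectors to numbers, and the tensor product multiplies the outputs. A plain-Python sketch; the example covectors below are our own choices.

```python
# The tensor product of two covectors multiplies their outputs.
def tensor(S, T):
    return lambda v, w: S(v) * T(w)

# Example covectors on R^3: pick out the first and second entries.
first = lambda v: v[0]
second = lambda v: v[1]

form = tensor(first, second)
print(form([2, 5, 3], [7, 9, 4]))  # 2 * 9 = 18
```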


66 Some nice covectors

Bilinear forms can be written in terms of particularly nice covectors.

Let us recall some notation from the section on derivatives.

Let e_i be the standard basis for R^n. The covector e^i : R^n → R is

e^i(v) = ⟨e_i, v⟩.

Question 1 We can build more complicated examples, too. Suppose L : R^5 → R is the covector given by

L = 4e^3 − 2e^4 + e^5.

Solution

Hint: Set v = [1, 4, 2, 3, 5]. We are considering L(v).

Hint: Then L(v) = (4e^3 − 2e^4 + e^5)(v).

Hint: So L(v) = 4e^3(v) − 2e^4(v) + e^5(v).

Hint: Replacing e^i(v) by ⟨v, e_i⟩ yields L(v) = 4⟨v, e_3⟩ − 2⟨v, e_4⟩ + ⟨v, e_5⟩.

Hint: In this case, ⟨v, e_3⟩ = 2.

Hint: And ⟨v, e_4⟩ = 3.

Hint: And ⟨v, e_5⟩ = 5.

Hint: We conclude L(v) = 4 · 2 − 2 · 3 + 5 = 8 − 6 + 5 = 7.

Then L([1, 4, 2, 3, 5]) = 7.

This is a special case of something quite general.

Theorem 2 Any covector R^n → R can be written as a linear combination of the covectors e^i.

Why is this? Think of a covector as

Solution


(a) a row vector [ a_1 a_2 ⋯ a_n ].
(b) a column vector [a_1, a_2, ..., a_n].

Then we can write that row vector as

a_1 [ 1 0 ⋯ 0 ] + a_2 [ 0 1 0 ⋯ 0 ] + ⋯ + a_n [ 0 ⋯ 0 1 ].

But those row vectors are just the duals to the standard basis, so we can write the covector as

a_1 e^1 + a_2 e^2 + ⋯ + a_n e^n.

How is this related to derivatives?

Define the coordinate functions to be π_i : R^n → R given by π_i(x_1, x_2, x_3, ..., x_n) = x_i.

What is the derivative of π_i at any point p in R^n?

Solution

Hint: This is a special case of a general theorem.

Theorem 3 The derivative of a linear map (at any point) is the same linear map.

Hint: In this case, π_i is a linear map.

Hint: So Dπ_i(p) = π_i.

Hint: But another way to write π_i is e^i.

(a) Dπ_i(p) = e^i.
(b) Dπ_i(p) = e_i.
(c) Dπ_i(p) = 0.

As a result of this, we will often write dx_i as a more suggestive notation for the covector with the rather more cumbersome name e^i. Let's do some calculations with this new notation.

Solution

Hint: dx_2 will select the second entry of any vector.

Hint: dx_2([3, 6, 4]) = 6

dx_2([3, 6, 4]) = 6

We can also consider the tensor product of these covectors.

Solution


Hint: dx_1 ⊗ dx_2([2, 5, 3], [7, 9, 4]) = dx_1([2, 5, 3]) dx_2([7, 9, 4])

Hint:

= 2(9)
= 18

dx_1 ⊗ dx_2([2, 5, 3], [7, 9, 4]) = 18


67 A basis for forms

A basis for the space of bilinear forms consists of tensors of coordinate functions.

Prove that the set of bilinear forms {dx_i ⊗ dy_j : 1 ≤ i ≤ n and 1 ≤ j ≤ m} forms a basis for the space (R^n)^* ⊗ (R^m)^*.

Warning 1 One of your greatest challenges here will be dealing with all of the indices.

Let B : R^n × R^m → R be a bilinear map. Let x = (x_1, x_2, . . . , x_n) and y = (y_1, y_2, . . . , y_m). Then we can write

B(x, y) = B((x_1, x_2, . . . , x_n), (y_1, y_2, . . . , y_m))
= Σ_{i=1}^{n} x_i B(e_i, (y_1, y_2, . . . , y_m))
= Σ_{i=1}^{n} Σ_{j=1}^{m} x_i y_j B(e_i, e_j)
= Σ_{i=1}^{n} Σ_{j=1}^{m} B(e_i, e_j) dx_i ⊗ dy_j (x, y)

So B = Σ_{i=1}^{n} Σ_{j=1}^{m} B(e_i, e_j) dx_i ⊗ dy_j. This shows that the dx_i ⊗ dy_j span all of (R^n)^* ⊗ (R^m)^*.

To see that the dx_i ⊗ dy_j are linearly independent, simply observe that if Σ_{i=1}^{n} Σ_{j=1}^{m} a_{i,j} dx_i ⊗ dy_j = 0, then in particular evaluating at (e_k, e_l) gives Σ_{i=1}^{n} Σ_{j=1}^{m} a_{i,j} dx_i ⊗ dy_j (e_k, e_l) = a_{k,l} = 0 for all k, l.
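A quick computational sketch of the spanning argument (the example form and helper names here are ours, not from the text): the coefficient of dx_i ⊗ dy_j is just B(e_i, e_j), and those coefficients reconstruct B.

```python
n, m = 2, 3

def e(i, dim):
    # the i-th standard basis vector of R^dim (0-indexed)
    return [0]*i + [1] + [0]*(dim - i - 1)

# an example bilinear form B : R^2 x R^3 -> R
B = lambda x, y: x[0]*y[0] + 5*x[1]*y[2]

# the coefficients B(e_i, e_j) in the dx_i ⊗ dy_j basis
coeffs = [[B(e(i, n), e(j, m)) for j in range(m)] for i in range(n)]

# sanity check: the coefficients reconstruct B on a sample input
x, y = [2, -1], [3, 0, 4]
reconstructed = sum(coeffs[i][j] * x[i] * y[j] for i in range(n) for j in range(m))
print(reconstructed == B(x, y))  # → True
```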

Example 2 The dot product on R^2 is given by the expression dx_1 ⊗ dy_1 + dx_2 ⊗ dy_2.

Question 3 Can you write dx_1 ⊗ dy_1 + dx_2 ⊗ dy_2 as α ⊗ β for some covectors α, β : R^2 → R?

Solution

(a) No.

(b) Yes.


Why not? Suppose this were possible. By the rank-nullity theorem, there must be some nonzero vector v which is in the kernel of α. That is, there is some nonzero vector v ∈ R^2 so that α(v) = 0.

But ⟨v, v⟩ ≠ 0, so if α ⊗ β were the dot product we would have (α ⊗ β)(v, v) ≠ 0.

On the other hand, (α ⊗ β)(v, v) = α(v) β(v) = 0, which is a contradiction.

Definition 4 Bilinear forms which can be written as α ⊗ β are called pure tensors.

So what we have shown here is that not all bilinear forms are pure tensors.

Example 5 Let R^4 have coordinates (t, x, y, z). The bilinear form η = −dt ⊗ dt + dx ⊗ dx + dy ⊗ dy + dz ⊗ dz on R^4 is the Minkowski inner product.

The Minkowski inner product^1 is one of the basic structures underlying the local geometry of our universe.

^1 http://en.wikipedia.org/wiki/Minkowski_space


68 Python

Build some bilinear maps in Python.

Question 1 Suppose v and w are both vectors in R^4, represented in Python as two lists of four real numbers called v and w. Build a Python function B which represents some bilinear form B : R^4 × R^4 → R.

Solution

Hint: For example, you could try returning 17 * v[0] * w[3].

Python

n = 4
def B(v, w):
    return 17 * v[0] * w[3]  # the real number B(v,w); this choice follows the hint

def validator():
    if B([4,2,3,4],[6,5,4,3]) + B([6,2,3,4],[6,5,4,3]) != B([10,4,6,8],[6,5,4,3]):
        return False
    if B([1,2,3,4],[6,5,4,3]) + B([1,2,3,4],[6,3,4,3]) != B([1,2,3,4],[12,8,8,6]):
        return False
    if 2*B([1,2,3,4],[6,5,4,3]) != B([2,4,6,8],[6,5,4,3]):
        return False
    if 2*B([1,2,3,4],[6,5,4,3]) != B([1,2,3,4],[12,10,8,6]):
        return False
    return True

Now let's write a Python function tensor which takes two covectors α and β, and returns their tensor product α ⊗ β.

Solution

Hint: The returned function should take two parameters (say v and w) and output α(v) β(w).

Hint: Specifically, you could try return lambda v,w: alpha(v) * beta(w)

Python

def tensor(alpha, beta):
    return lambda v, w: alpha(v) * beta(w)  # the bilinear form alpha tensor beta, per the hint

def validator():
    return tensor(lambda x: 4*x[0] + 5*x[1], lambda y: 2*y[0] - 3*y[1])([1,3],[4,5]) == -133


69 Linear maps and bilinear forms

Associated to a bilinear form is a linear map.

It turns out that we will be able to use the inner product on R^n to rewrite any bilinear form on R^n in a special form.

Given a bilinear map B : V × W → R, we obtain a new map B(·, w) : V → R for each vector w ∈ W. B(·, w) is linear, since by definition of bilinearity it is linear in the first slot for a fixed vector w in the second slot. Thus we have a map Curry(B) : W → V^* defined by Curry(B)(w) = B(·, w).

If V and W are Euclidean spaces, then we have that every bilinear map R^n × R^m → R gives rise to a map R^m → (R^n)^*. But every element ω ∈ (R^n)^* is just a row vector, and so can be represented as the dot product against the column vector ω^T. Thus we obtain a map L_B : R^m → R^n defined by L_B(w) = B(·, w)^T. This is called the linear map associated to the bilinear form. We also call the matrix of L_B the matrix of B.

Computing some examples will make these definitions more concrete in our minds.

Question 1 Let B : R^2 × R^3 → R be a bilinear mapping, and suppose we have the following values of B.

• B((1, 0), (1, 0, 0)) = 2

• B((1, 0), (0, 1, 0)) = 1

• B((1, 0), (0, 0, 1)) = −3

• B((0, 1), (1, 0, 0)) = 3

• B((0, 1), (0, 1, 0)) = 5

• B((0, 1), (0, 0, 1)) = 4

Solution

Hint: L_B : R^3 → R^2.

Hint: L_B(e_1) = B(·, e_1)^T.

Hint: To find the matrix of B(·, e_1) : R^2 → R, we need to see its effect on basis vectors.

B(e_1, e_1) = 2
B(e_2, e_1) = 3

so the matrix of B(·, e_1) is [ 2 3 ].

Hint: Thus L_B(e_1) = B(·, e_1)^T = (2, 3)^T.

Hint: Similarly, L_B(e_2) = (1, 5)^T and L_B(e_3) = (−3, 4)^T.

Hint: Thus the matrix of L_B is

[ 2 1 −3 ]
[ 3 5  4 ]

What is the matrix of L_B?

Question 2 If B : R^3 × R^3 → R is a bilinear map, and the matrix of B is

[  2 3 1 ]
[ −2 1 5 ]
[  3 2 1 ]

what is B((1, 2, 0), (0, 0, 1))?

Solution

Hint: By definition, B((1, 2, 0), (0, 0, 1)) = (1, 2, 0) · L_B((0, 0, 1)).

Hint: Thus B((1, 2, 0), (0, 0, 1)) = (1, 2, 0) · (1, 5, 1) = 11.

Then B((1, 2, 0), (0, 0, 1)) = 11.

Show that the matrix of the bilinear form Σ a_{i,j} dx_i ⊗ dx_j is the matrix (a_{i,j}).

Let M be the matrix of L_B. Following the same line of reasoning as in a previous activity^1, we know that M_{i,j} = e_i · L_B(e_j). But by definition, this is B(e_i, e_j), which plainly evaluates to a_{i,j}. The claim is proven.

To every linear map L : R^m → R^n we also obtain a bilinear map B_L : R^n × R^m → R defined by B_L(v, w) = v^T L(w).

^1 http://ximera.osu.edu/course/kisonecat/m2o2c2/course/activity/week1/inner-product/multiply-dot/


To summarize, we have a really nice story about bilinear maps B : R^n × R^m → R: every single one of them can be written as B(v, w) = v^T L(w) for some unique linear map L : R^m → R^n. Also, every linear map R^m → R^n gives rise to a bilinear form by defining B(v, w) = v^T L(w). On the level of matrices, we just have that B(v, w) = v^T Mw, where M is the matrix of the linear map L_B. We will sometimes talk about "using a matrix as a bilinear form": this is what we mean by that. This will be very important to us when we start talking about the second derivative.

In this activity we have shown that for bilinear maps R^n × R^m → R, there is a useful notion of an associated linear map R^m → R^n. If the codomain of the original bilinear map had been anything other than R we would not have such luck: our work depended crucially on the ability to turn covectors into vectors using the inner product on R^n.
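The phrase "using a matrix as a bilinear form" can be sketched in the plain-list Python style of this course (the helper name is ours). As a sanity check, it reproduces the value 11 computed in Question 2 above:

```python
def bilinear_from_matrix(M):
    # B(v, w) = v^T M w, with M given as a list of rows
    return lambda v, w: sum(v[i] * M[i][j] * w[j]
                            for i in range(len(M))
                            for j in range(len(M[0])))

M = [[2, 3, 1], [-2, 1, 5], [3, 2, 1]]
B = bilinear_from_matrix(M)
print(B([1, 2, 0], [0, 0, 1]))  # → 11, matching the worked example
```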


70 Python

Use Python to find the linear maps associated to bilinear forms.

Question 1 Suppose alpha is a covector α : R^n → R. Write a Python function for converting such a covector in (R^n)^* into a vector v ∈ R^n. More specifically, covector_to_vector should take as input a Python function alpha, and return a list of n real numbers. This list of n real numbers, when regarded as a vector v ∈ R^n, should have the property that α(w) = ⟨v, w⟩.

Solution

Hint: You can determine what v_i must be by considering α(e_i).

Hint: Define e(i) by [0] * i + [1] + [0] * (n-i-1).

Hint: In other words, e = lambda i: [0] * i + [1] + [0] * (n-i-1).

Hint: Then α(e_i) is alpha(e(i)).

Hint: So to form a vector, we need only use [alpha(e(i)) for i in range(0,n)].

Python

n = 4
def covector_to_vector(alpha):
    # per the hints: the i-th entry of v is alpha(e_i)
    e = lambda i: [0] * i + [1] + [0] * (n-i-1)
    return [alpha(e(i)) for i in range(0, n)]

def validator():
    if covector_to_vector(lambda x: 3*x[0] + 2*x[1] - 17*x[2] + 30*x[3])[1] != 2:
        return False
    if covector_to_vector(lambda x: 3*x[0] + 2*x[1] - 17*x[2] + 30*x[3])[2] != -17:
        return False
    if covector_to_vector(lambda x: 3*x[0] + 2*x[1] - 17*x[2] + 30*x[3])[3] != 30:
        return False
    return True

Suppose B is a bilinear form B : R^n × R^m → R. Write a Python function for taking such a bilinear form, and producing the corresponding linear map L_B.

Recall that we encode a linear map R^m → R^n in Python as a regular Python function which takes as input a list of m real numbers, and outputs a list of n real numbers.

Solution

Hint: You may want to make use of covector_to_vector; copy the code from above and paste it into the box below.

Hint: We defined L_B(w) = B(·, w)^T.

Hint: In other words, L_B sends a vector w to the vector corresponding to the covector x ↦ B(x, w).

Hint: So we should return a linear map sending w to covector_to_vector applied to x ↦ B(x, w).

Hint: So we should return lambda w: covector_to_vector(lambda x: B(x,w)).

Python

n = 4
m = 3
def bilinear_to_linear(B):
    # per the hints: L_B(w) is the vector of the covector x -> B(x, w)
    return lambda w: covector_to_vector(lambda x: B(x, w))

def validator():
    if bilinear_to_linear(lambda x,y: 7 * x[0] * y[1] + 3*x[1]*y[2])([3,5,7])[1] != 21:
        return False
    if bilinear_to_linear(lambda x,y: 7 * x[0] * y[1] + 3*x[1]*y[2])([3,5,7])[0] != 35:
        return False
    return True

These are again examples of “higher-order functions.” Keeping track of dual

spaces and thinking about operations which transform bilinear maps into linear

maps are two examples of where such “higher-order thinking” comes in handy.


71 Adjoints

Adjoints formalize the transpose.

Let L : R^m → R^n be a linear map. Then there is an associated bilinear form B_L : R^n × R^m → R given by B_L(v, w) = ⟨v, L(w)⟩.

One thing we can do to a bilinear map is swap the two inputs; namely, we can build B^*_L : R^m × R^n → R by the rule B^*_L(w, v) = B_L(v, w).

And with this "swapped" bilinear map, we can go back and recover an associated linear map L_{B^*_L} : R^n → R^m.

Question 1 The domain of L_{B^*_L} is the same as

Solution

(a) the codomain of L.

(b) the domain of L.

Right! The minor surprise is that L went from R^m to R^n, but L_{B^*_L} went "the other way" from R^n to R^m.

Definition 2 If L : R^m → R^n is a linear map, the adjoint of L is the map L_{B^*_L} : R^n → R^m. We usually write L^* for the adjoint of L.

Let L : R^m → R^n be a linear map. Show that ⟨v, L(w)⟩ = ⟨L^*(v), w⟩ for every v ∈ R^n and w ∈ R^m.

⟨v, L(w)⟩ = B_L(v, w)
= B^*_L(w, v)
= ⟨w, L^*(v)⟩
= ⟨L^*(v), w⟩

Question 3 Let's work through an example. Let L : R^3 → R^2 be the linear map represented by the matrix

[  3 2 1 ]
[ −4 2 9 ]

with respect to the standard basis. What is ⟨e_2, L(e_1)⟩?

Solution

Hint: L(e_1) = (3, −4).

Hint: ⟨e_2, L(e_1)⟩ = ⟨e_2, (3, −4)⟩

Hint: ⟨e_2, (3, −4)⟩ = −4.

⟨e_2, L(e_1)⟩ = −4


Recall that ⟨v, L(w)⟩ = ⟨L^*(v), w⟩. Consequently, setting v = e_2 and w = e_1, we have that ⟨L^*(e_2), e_1⟩ is also −4.

Let's write (ℓ_{ij}) for the entries of the matrix for L, and (ℓ^*_{ij}) for the entries of the matrix for L^*. The fact that ⟨e_2, L(e_1)⟩ = −4 amounts to saying ℓ_{2,1} = −4, and then since ⟨L^*(e_2), e_1⟩ = −4, we have that ℓ^*_{1,2} = −4.

Solution

Hint: The matrix of the adjoint of a linear map is the transpose of the matrix of that linear map.

Hint: The matrix of L^* is

[ 3 −4 ]
[ 2  2 ]
[ 1  9 ]

What is the matrix of L^*?

What do you notice about these entries?

Solution

(a) ℓ_{ij} = ℓ^*_{ji}

(b) ℓ_{ij} = ℓ^*_{ij}

Let's summarize this fact as a theorem.

Theorem 4 Let L : R^n → R^m be a linear map. If L has matrix M with respect to the standard basis, then L^* has matrix M^T.

Recall that M^T is the transpose of M, meaning that (M^T)_{ij} = M_{ji}.

It is your turn to prove this theorem. Let's use the fundamental insight from this activity^1.

Let the matrix of L^* be called M^* for now. To find the entry in the i-th row and j-th column of M^*, we just compute

M^*_{i,j} = e_i · M^*(e_j)
= e_i · L^*(e_j)
= ⟨e_i, L^*(e_j)⟩
= ⟨L(e_i), e_j⟩
= e_j · L(e_i)
= e_j · M(e_i)
= M_{j,i}

So the entry in the i-th row and j-th column of M^* is the entry in the j-th row and i-th column of M. Thus M^* = M^T.

Here, finally, is a question for you to ponder: why are we bothering about adjoints of linear maps if we have transposes of matrices?

^1 http://ximera.osu.edu/course/kisonecat/m2o2c2/course/activity/week1/inner-product/multiply-dot/
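The adjoint identity ⟨v, L(w)⟩ = ⟨L^*(v), w⟩ can be checked numerically in the plain-list style of this course, using the matrix from Question 3 (the helper names are ours):

```python
M = [[3, 2, 1], [-4, 2, 9]]  # the matrix of L : R^3 -> R^2 from Question 3

dot = lambda a, b: sum(x * y for x, y in zip(a, b))
matvec = lambda A, x: [dot(row, x) for row in A]
transpose = lambda A: [[A[i][j] for i in range(len(A))] for j in range(len(A[0]))]

v = [1, 2]        # a vector in R^2
w = [3, -1, 4]    # a vector in R^3

lhs = dot(v, matvec(M, w))               # <v, L(w)>
rhs = dot(matvec(transpose(M), v), w)    # <L*(v), w>
print(lhs == rhs)  # → True (both equal 55)
```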


72 Spectrum of the adjoint

Taking adjoints doesn't affect the spectrum.

The set of eigenvalues of a linear operator—what we call the spectrum of the linear operator—is of fundamental importance. Taking adjoints is one way to build a new linear operator from an old linear operator. Fusing these two ideas together results in a question: how does the spectrum of L relate to the spectrum of its adjoint, L^*?

Surprisingly, the spectrum of L is the same as the spectrum of L^*.

Let's get started: show that if v is a nonzero eigenvector of L : R^n → R^n with eigenvalue λ, then there is a nonzero eigenvector u of L^* with eigenvalue λ.

Warning 1 This is a very hard problem.

Hint: Consider the map S : R^n → R^n given by S(v) = L^*(v) − λv, or in other words S = L^* − λI. Showing that this map has a nontrivial kernel is the same as showing that L^* has λ as an eigenvalue.

Hint: Notice that L − λI is adjoint to S = L^* − λI.

Hint: For all w ∈ R^n, we have ⟨S(w), v⟩ = ⟨w, L(v) − λv⟩ = 0.

Hint: Thus S(w) is in the subspace of vectors perpendicular to the eigenvector v, which we denote v^⊥.

Hint: Thus we have that Im(S) ⊂ v^⊥. This implies that dim(Im(S)) ≤ n − 1.

Hint: By the rank-nullity theorem, we have that dim(ker(S)) ≥ 1.

Hint: So S has a nontrivial kernel, so L^* has a nonzero eigenvector u with eigenvalue λ.

Consider the map S : R^n → R^n given by S(v) = L^*(v) − λv. Showing that this map has a nontrivial kernel is the same as showing that L^* has λ as an eigenvalue.

Notice that L − λI is adjoint to S = L^* − λI.

For all w ∈ R^n, we have ⟨S(w), v⟩ = ⟨w, L(v) − λv⟩ = 0.

Thus S(w) is in the subspace of vectors perpendicular to the eigenvector v, which we denote v^⊥.

Thus we have that Im(S) ⊂ v^⊥. This implies that dim(Im(S)) ≤ n − 1.

By the rank-nullity theorem, we have that dim(ker(S)) ≥ 1.

So S has a nontrivial kernel, so L^* has a nonzero eigenvector u with eigenvalue λ.
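For a 2 × 2 matrix the spectrum can be computed directly from the trace and determinant, which makes the claim easy to spot-check. This sketch is our own and assumes real eigenvalues:

```python
import math

def eigenvalues_2x2(M):
    # roots of the characteristic polynomial t^2 - tr(M) t + det(M)
    (a, b), (c, d) = M
    tr, det = a + d, a * d - b * c
    disc = math.sqrt(tr * tr - 4 * det)  # assumes real eigenvalues
    return sorted([(tr - disc) / 2, (tr + disc) / 2])

M = [[2, 7], [1, 3]]
Mt = [[2, 1], [7, 3]]  # the transpose, i.e. the matrix of the adjoint
print(eigenvalues_2x2(M) == eigenvalues_2x2(Mt))  # → True
```

Since a matrix and its transpose have the same trace and determinant, the two spectra coincide, just as the argument above predicts.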


73 Self-adjoint maps

Linear maps which equal their own adjoint are important.

Definition 1 A linear operator L : R^n → R^n is self-adjoint if L = L^*.

These ideas also pop up when considering bilinear forms.

Definition 2 A bilinear form on R^n is symmetric if for all v, w ∈ R^n we have B(v, w) = B(w, v).

Question 3 Consider the bilinear form B : R^n × R^n → R given by B(v, w) = ⟨v, w⟩. In other words, B is just the inner product. Is B symmetric?

Solution

(a) Yes.

(b) No.

That's right!

Example 4 The dot product on R^n is a symmetric bilinear form, since we have already shown that v · w = w · v.

Which of the following bilinear forms on R^2 are symmetric?

Solution

Hint: B((x_1, x_2), (y_1, y_2)) = x_1y_2 + 3x_2y_1 is not symmetric since, for example, B((1, 0), (0, 1)) = 1, but B((0, 1), (1, 0)) = 3.

Hint: B((x_1, x_2), (y_1, y_2)) = x_1y_1 + 5x_2y_2 + x_1y_2 is not symmetric since, for example, B((1, 0), (0, 1)) = 1 but B((0, 1), (1, 0)) = 0.

Hint: B((x_1, x_2), (y_1, y_2)) = 2x_1y_1 + 4x_2y_2 is symmetric, since B((x_1, x_2), (y_1, y_2)) = 2x_1y_1 + 4x_2y_2 = B((y_1, y_2), (x_1, x_2)).

(a) B((x_1, x_2), (y_1, y_2)) = x_1y_2 + 3x_2y_1

(b) B((x_1, x_2), (y_1, y_2)) = 2x_1y_1 + 4x_2y_2

(c) B((x_1, x_2), (y_1, y_2)) = x_1y_1 + 5x_2y_2 + x_1y_2


74 Symmetric matrices

The matrix of a self-adjoint linear map is symmetric.

The matrix of a self-adjoint operator equals its own transpose. In other words, it is symmetric about the main diagonal.

Definition 1 A matrix which equals its transpose is a symmetric matrix.

Question 2 Let B be the symmetric bilinear form on R^2 defined by B((x_1, x_2), (y_1, y_2)) = 2x_1y_1 + 4x_2y_2 + x_1y_2 + x_2y_1. What is the matrix of B? What do you notice about this matrix?

Solution

Hint: Remember that the entry M_{i,j} = B(e_i, e_j).

Hint:

M_{1,1} = B((1, 0), (1, 0)) = 2
M_{1,2} = B((1, 0), (0, 1)) = 1
M_{2,1} = B((0, 1), (1, 0)) = 1
M_{2,2} = B((0, 1), (0, 1)) = 4

Hint: The matrix of B is

[ 2 1 ]
[ 1 4 ]

Notice that this matrix is a symmetric matrix!

Show that L is self-adjoint if and only if the bilinear form associated to it is symmetric.

If L is self-adjoint, then

B_L(v, w) = v^T L(w) = ⟨v, L(w)⟩
= ⟨L(v), w⟩
= w^T L(v)
= B_L(w, v)

So the bilinear form associated to L is symmetric.

On the other hand, if B is symmetric, then

⟨L_B(v), w⟩ = ⟨B(v, ·)^T, w⟩
= B(v, ·)(w)
= B(v, w)
= B(w, v)
= B(w, ·)(v)
= ⟨B(w, ·)^T, v⟩ = ⟨L_B(w), v⟩

So the linear map associated with B is self-adjoint.


75 Python

Build some self-adjoint operators in Python.

Question 1 We will represent a linear operator in Python as a function which takes as input a list of n real numbers, and outputs a list of n real numbers. Write down a linear operator L which is self-adjoint.

Solution

Hint: In this problem, n = 4.

Hint: So the input v will be a list of four numbers, namely v[0], v[1], v[2], and v[3].

Hint: The output should also be a list of four numbers.

Hint: We must make sure that the resulting operator is self-adjoint, which we can achieve if the corresponding matrix is symmetric.

Hint: Since we just need to write down one example, we could even get away with return v, namely, the identity operator. But let's try to be fancier!

Hint: Let's make L into the linear operator represented by the matrix

[ 2 3 0 0 ]
[ 3 4 0 0 ]
[ 0 0 1 0 ]
[ 0 0 0 1 ]

Hint: We can achieve this with return [2*v[0] + 3*v[1], 3*v[0] + 4*v[1], v[2], v[3]].

Python

n = 4
def L(v):
    # the vector L(v); this symmetric-matrix example follows the hint
    return [2*v[0] + 3*v[1], 3*v[0] + 4*v[1], v[2], v[3]]

def validator():
    e = lambda i: [0] * i + [1] + [0] * (n-i-1)
    for i in range(0,4):
        for j in range(i,4):
            if L(e(i))[j] != L(e(j))[i]:
                return False
    return True

Fantastic!


76 Deﬁniteness and the spectral theorem

"Definiteness" describes what we can say about the sign of the output of a bilinear form.

Definition 1 A bilinear form B : R^n × R^n → R is

positive definite if B(v, v) > 0 for all v ≠ 0,

positive semidefinite if B(v, v) ≥ 0 for all v,

negative definite if B(v, v) < 0 for all v ≠ 0,

negative semidefinite if B(v, v) ≤ 0 for all v, and

indefinite if there are v and w with B(v, v) > 0 and B(w, w) < 0.

Let M be a diagonal matrix. In a sentence, can you relate the sign of the entries M_{i,i} to the definiteness of the associated bilinear form?

Given a diagonal n × n matrix M with M_{i,i} = λ_i, we see that

B(x, x) = (x_1 x_2 . . . x_n) M (x_1 x_2 . . . x_n)^T = λ_1 x_1^2 + λ_2 x_2^2 + · · · + λ_n x_n^2

If the λ_i are all positive, this expression is positive whenever the x_i are not all 0. So the bilinear form is positive definite.

If the λ_i are all nonnegative, this expression is always nonnegative. So the bilinear form is positive semidefinite.

If the λ_i are all negative, this expression is negative whenever the x_i are not all 0. So the bilinear form is negative definite.

If the λ_i are all nonpositive, this expression is always nonpositive. So the bilinear form is negative semidefinite.

If λ_i > 0 and λ_j < 0 for some i, j ≤ n, then B(e_i, e_i) = λ_i > 0 and B(e_j, e_j) = λ_j < 0, so the bilinear form is indefinite.

Our goal will now be to reduce the study of general symmetric bilinear forms to those whose associated matrix is diagonal.

Let L : R^n → R^n be a self-adjoint linear operator. Prove that if v_1 and v_2 are eigenvectors with distinct eigenvalues λ_1 and λ_2, then v_1 ⊥ v_2.

⟨L(v_1), v_2⟩ = ⟨v_1, L(v_2)⟩
⟨λ_1 v_1, v_2⟩ = ⟨v_1, λ_2 v_2⟩
(λ_1 − λ_2)⟨v_1, v_2⟩ = 0

Since λ_1 ≠ λ_2, we conclude that ⟨v_1, v_2⟩ = 0.

Let L : R^n → R^n be a self-adjoint linear operator. Let v be an eigenvector of L. Prove that L restricts to a self-adjoint linear operator on the space of vectors perpendicular to v, v^⊥.

All we need to show is that w ⊥ v implies L(w) ⊥ v.

⟨L(w), v⟩ = ⟨w, L(v)⟩
= ⟨w, λv⟩
= λ⟨w, v⟩
= 0

so we are done!

Theorem 2 If L : R^n → R^n is a self-adjoint linear operator, then L has an eigenvector.

Proof First assume that L is not the identically 0 map. If it is, we are done, because then every nonzero vector is an eigenvector with eigenvalue 0.

Since the unit sphere in R^n is compact^1, the function v ↦ |L(v)| achieves its maximum M. So there is a unit vector u so that |L(u)| = M, and |L(v)| ≤ M for all other unit vectors v. M > 0 because L ≠ 0.

Now let w = L(u)/M. This is another unit vector.

Note that ⟨w, L(u)⟩ = M, so we also have ⟨L(w), u⟩ = M, since L is self-adjoint.

⟨L(w), u⟩ ≤ |L(w)||u| with equality if and only if L(w) ∈ span(u), by Cauchy–Schwarz.

But |L(w)||u| = |L(w)| because u is a unit vector!

Since M is the maximum value of |L(v)| over all unit vectors v, equality must hold, and so L(w) ∈ span(u).

We can conclude that L(w) = Mu.

Now either u + w ≠ 0 or u − w ≠ 0. In either case, L(u ± w) = L(u) ± L(w) = Mw ± Mu = ±M(u ± w), so whichever of u + w and u − w is nonzero is an eigenvector of L.

Credit for this beautiful line of reasoning goes to Marcos Cossarini^2. Most proofs of this theorem use either Lagrange multipliers (which we will learn about soon), or complex analysis. Here we use only linear algebra along with the one analytic fact that a continuous function on a compact set achieves its maximum value.

We can combine the previous theorem and question to prove the "Spectral Theorem for Real Self-adjoint Operators":

Theorem 3 A self-adjoint operator L : R^n → R^n has an orthonormal basis of eigenvectors.

Proof L has an eigenvector v_1. The restriction L|_{v_1^⊥} : v_1^⊥ → v_1^⊥ is another self-adjoint linear operator and so it has an eigenvector v_2. Continue in this way until you have constructed all n eigenvectors. Because of how they are constructed, we have that each one is perpendicular to all of the eigenvectors which came before it; normalizing each one then yields an orthonormal basis.

This, in some sense, completely answers the question of how to characterize the definiteness of a symmetric bilinear form. Look at its associated linear operator, which must be self-adjoint. By the Spectral Theorem, it has an orthonormal basis of eigenvectors. Then

^1 http://en.wikipedia.org/wiki/Compact_space

^2 http://mathoverflow.net/a/118759/1106

• B positive definite ⟺ L_B has all positive eigenvalues

• B positive semidefinite ⟺ L_B has all nonnegative eigenvalues

• B negative definite ⟺ L_B has all negative eigenvalues

• B negative semidefinite ⟺ L_B has all nonpositive eigenvalues

• B indefinite ⟺ L_B has both positive and negative eigenvalues

This will be crucially important when we get to the second derivative test:

it will turn out that the second derivative is a symmetric bilinear form, and the

deﬁniteness of this bilinear form is analogous to the concavity of a function of one

variable. Identifying local maxima and minima with the “second derivative test”

will require analysis of the eigenvalues of the associated linear map.
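For the diagonal case worked out above, the classification can be sketched directly in Python (the function name is ours). The diagonal entries of a diagonal matrix are its eigenvalues, so this mirrors the bullet list:

```python
def definiteness_of_diagonal(diag):
    # classify the bilinear form of a diagonal matrix by the signs of its entries
    if all(l > 0 for l in diag):
        return "positive definite"
    if all(l < 0 for l in diag):
        return "negative definite"
    if any(l > 0 for l in diag) and any(l < 0 for l in diag):
        return "indefinite"
    if all(l >= 0 for l in diag):
        return "positive semidefinite"
    return "negative semidefinite"

print(definiteness_of_diagonal([2, 4]))   # → positive definite
print(definiteness_of_diagonal([1, -1]))  # → indefinite
print(definiteness_of_diagonal([0, 3]))   # → positive semidefinite
```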


77 Second order partial derivatives

Second order partial derivatives are partial derivatives of partial derivatives.

Definition 1 If f : R^n → R is a function, then ∂f/∂x_i : R^n → R is another function, so we can take its partial derivative with respect to x_j. We define the second order partial derivative ∂^2f/∂x_j∂x_i = ∂/∂x_j(∂f/∂x_i). In the special case that i = j we will write ∂^2f/∂x_i^2 (even though this notation is horrible, it is standard, so we will follow it).

Question 2 Let f(x, y) = x^2 y^3. Compute all of the second order partial derivatives of f.

Solution

Hint: ∂^2f/∂x^2 = ∂/∂x(∂/∂x(x^2 y^3)) = ∂/∂x(2xy^3) = 2y^3

∂^2f/∂x^2 = 2y^3

Solution

Hint: ∂^2f/∂x∂y = ∂/∂x(∂/∂y(x^2 y^3)) = ∂/∂x(3x^2 y^2) = 6xy^2

∂^2f/∂x∂y = 6xy^2

Solution

Hint: ∂^2f/∂y∂x = ∂/∂y(∂/∂x(x^2 y^3)) = ∂/∂y(2xy^3) = 6xy^2

∂^2f/∂y∂x = 6xy^2

Solution

Hint: ∂^2f/∂y^2 = ∂/∂y(∂/∂y(x^2 y^3)) = ∂/∂y(3x^2 y^2) = 6x^2 y

∂^2f/∂y^2 = 6x^2 y

Solution Did you notice how ∂^2f/∂x∂y = ∂^2f/∂y∂x? Doesn't that fill you with a sense of wonder and mystery?

(a) Yes!

(b) No :(

Question 3 Let f(x, y) = sin(xy^2). Compute all of the second order partial derivatives of f.

Solution

Hint: ∂^2f/∂x^2 = ∂/∂x(∂/∂x sin(xy^2)) = ∂/∂x(y^2 cos(xy^2)) = −y^4 sin(xy^2)

∂^2f/∂x^2 = −y^4 sin(xy^2)

Solution

Hint: ∂^2f/∂x∂y = ∂/∂x(∂/∂y sin(xy^2)) = ∂/∂x(2xy cos(xy^2)) = 2y cos(xy^2) − 2xy(y^2) sin(xy^2) = 2y cos(xy^2) − 2xy^3 sin(xy^2)

∂^2f/∂x∂y = 2y cos(xy^2) − 2xy^3 sin(xy^2)

Solution

Hint: ∂^2f/∂y∂x = ∂/∂y(∂/∂x sin(xy^2)) = ∂/∂y(y^2 cos(xy^2)) = 2y cos(xy^2) − y^2(2xy) sin(xy^2) = 2y cos(xy^2) − 2xy^3 sin(xy^2)

∂^2f/∂y∂x = 2y cos(xy^2) − 2xy^3 sin(xy^2)

Solution

Hint: ∂^2f/∂y^2 = ∂/∂y(∂/∂y sin(xy^2)) = ∂/∂y(2xy cos(xy^2)) = 2x cos(xy^2) − 2xy(2xy) sin(xy^2) = 2x cos(xy^2) − 4x^2y^2 sin(xy^2)

∂^2f/∂y^2 = 2x cos(xy^2) − 4x^2y^2 sin(xy^2)

This lends even more evidence to the startling claim that ∂^2f/∂x∂y = ∂^2f/∂y∂x.

Question 4 Let f : R^3 → R be the function f(x, y, z) = x^2 y z^3. Compute ∂^2f/∂x∂z and ∂^2f/∂z∂x.

Solution

Hint: ∂^2f/∂x∂z = ∂/∂x(∂/∂z(x^2 y z^3)) = ∂/∂x(3x^2 y z^2) = 6xyz^2

∂^2f/∂x∂z = 6xyz^2

Solution

Hint: ∂^2f/∂z∂x = ∂/∂z(∂/∂x(x^2 y z^3)) = ∂/∂z(2xy z^3) = 6xyz^2

∂^2f/∂z∂x = 6xyz^2

Okay, it really really looks like ∂^2f/∂x_i∂x_j = ∂^2f/∂x_j∂x_i.
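These observations can also be checked numerically. The quantity used in the next section, (f(a+h, b+k) − f(a+h, b) − f(a, b+k) + f(a, b))/(hk), approximates the mixed partial; here is a small sketch of ours using f(x, y) = x^2 y^3 at the point (2, 3), where ∂^2f/∂x∂y = 6xy^2 = 108:

```python
def hodq(f, a, b, h, k):
    # "higher order difference quotient": approximates the mixed partial at (a, b)
    return (f(a + h, b + k) - f(a + h, b) - f(a, b + k) + f(a, b)) / (h * k)

f = lambda x, y: x**2 * y**3
approx = hodq(f, 2.0, 3.0, 1e-4, 1e-4)
exact = 6 * 2.0 * 3.0**2  # 108
print(abs(approx - exact) < 0.05)  # → True
```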


78 Mixed partials commute

Order of partial differentiation doesn't matter.

In the last section on partial derivatives we made the interesting observation that ∂^2f/∂x_i∂x_j = ∂^2f/∂x_j∂x_i for all of the functions we considered. We will now prove this, modulo some technical assumptions.

Theorem 1 Let f : R^n → R be a differentiable function. Assume that the partial derivatives f_{x_i} : R^n → R are all differentiable, and the second partial derivatives f_{x_i,x_j} are continuous. Then f_{x_i,x_j} = f_{x_j,x_i}.

First, let's develop some intuition about why this result is true. This informal discussion will also suggest how we should proceed with the formal proof.

Let's restrict our attention, for the moment, to functions g : R^2 → R. Observe that g_x(a, b) ≈ (g(a + h, b) − g(a, b))/h for small values of h. Analogously, g_y(a, b) ≈ (g(a, b + k) − g(a, b))/k.

Now applying this idea twice, we have

f_xy(a, b) ≈ (1/h)(f_y(a + h, b) − f_y(a, b))
≈ (1/h)[(f(a + h, b + k) − f(a + h, b))/k − (f(a, b + k) − f(a, b))/k]
= (f(a + h, b + k) − f(a + h, b) − f(a, b + k) + f(a, b))/(hk)

Going through the same process with f_yx leads to exactly the same approximation!

So our strategy of proof will be to show that we can express both of these partial derivatives as the two variable limit:

f_xy(a, b) = lim_{h,k→0} (f(a + h, b + k) − f(a + h, b) − f(a, b + k) + f(a, b))/(hk) = f_yx(a, b)

Proof Let HODQ(h, k) = f(a + h, b + k) − f(a + h, b) − f(a, b + k) + f(a, b). (Here HODQ stands for "higher order difference quotient".)

Let Q(s) = f(s, b + k) − f(s, b).

Then HODQ(h, k) = Q(a + h) − Q(a).

By the mean value theorem for derivatives, there is a ξ_1 with 0 < ξ_1 < h such that Q(a + h) − Q(a) = hQ′(a + ξ_1).

So HODQ(h, k) = h(f_x(a + ξ_1, b + k) − f_x(a + ξ_1, b)).

By the mean value theorem again, we have HODQ(h, k) = hk f_yx(a + ξ_1, b + ξ_2) for some 0 < ξ_2 < k.

Now we run exactly the same reasoning in the other order. Let R(s) = f(a + h, s) − f(a, s).

Then HODQ(h, k) = R(b + k) − R(b).

By the mean value theorem for derivatives, there is an η_1 with 0 < η_1 < k such that R(b + k) − R(b) = kR′(b + η_1).

So HODQ(h, k) = k(f_y(a + h, b + η_1) − f_y(a, b + η_1)).

By the mean value theorem again, we have HODQ(h, k) = hk f_xy(a + η_2, b + η_1) for some 0 < η_2 < h.

So we have

lim_{h,k→0} (f(a + h, b + k) − f(a + h, b) − f(a, b + k) + f(a, b))/(hk) = lim_{h,k→0} HODQ(h, k)/(hk) = lim_{h,k→0} f_yx(a + ξ_1, b + ξ_2)

But since 0 < ξ_1 < h and 0 < ξ_2 < k, then as h, k → 0, a + ξ_1 → a and b + ξ_2 → b. By the continuity of f_yx, we have that the limit equals f_yx(a, b).

Applying the same reasoning to the other factorization,

lim_{h,k→0} (f(a + h, b + k) − f(a + h, b) − f(a, b + k) + f(a, b))/(hk) = lim_{h,k→0} HODQ(h, k)/(hk) = lim_{h,k→0} f_xy(a + η_2, b + η_1)

But since 0 < η_1 < k and 0 < η_2 < h, then as h, k → 0, a + η_2 → a and b + η_1 → b. By the continuity of f_xy, we have that the limit equals f_xy(a, b).

So we can conclude that f_xy(a, b) = f_yx(a, b), because they are the common value of the limit lim_{h,k→0} (f(a + h, b + k) − f(a + h, b) − f(a, b + k) + f(a, b))/(hk).

We close with a cautionary example. This result is not always true if the second partial derivatives are not continuous. Remember that we define

g_x(a, b) = lim_{h→0} (g(a + h, b) − g(a, b))/h, and similarly g_y(a, b) = lim_{k→0} (g(a, b + k) − g(a, b))/k.

Question 2 Define f(x, y) = xy(x^2 − y^2)/(x^2 + y^2) if (x, y) ≠ (0, 0), and f(0, 0) = 0. Compute f_xy(0, 0) and f_yx(0, 0).

Solution

Hint:

f_y(x, 0) = lim_{k→0} (f(x, k) − f(x, 0))/k = lim_{k→0} (xk(x^2 − k^2)/(x^2 + k^2) − 0)/k = lim_{k→0} x(x^2 − k^2)/(x^2 + k^2) = x

f_y(x, 0) = x

Solution

Hint:

f_x(0, y) = lim_{h→0} (f(h, y) − f(0, y))/h = lim_{h→0} (hy(h^2 − y^2)/(h^2 + y^2) − 0)/h = lim_{h→0} y(h^2 − y^2)/(h^2 + y^2) = −y

f_x(0, y) = −y

Hint:

f_xy(0, 0) = lim_{h→0} (f_y(h, 0) − f_y(0, 0))/h = lim_{h→0} (h − 0)/h = 1

Hint:

f_yx(0, 0) = lim_{k→0} (f_x(0, k) − f_x(0, 0))/k = lim_{k→0} (−k − 0)/k = −1

f_xy(0, 0) = 1

Solution f_yx(0, 0) = −1
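This failure is easy to see numerically. In the sketch below (our own illustration), the inner difference quotient uses a step k much smaller than the outer step h, so that the two limits are effectively taken in the stated order; the two answers really do disagree:

```python
def f(x, y):
    if x == 0.0 and y == 0.0:
        return 0.0
    return x * y * (x*x - y*y) / (x*x + y*y)

h, k = 1e-4, 1e-12  # k << h: the inner limit is taken "first"

f_y = lambda x: (f(x, k) - f(x, 0.0)) / k   # ≈ f_y(x, 0)
f_x = lambda y: (f(k, y) - f(0.0, y)) / k   # ≈ f_x(0, y)

fxy = (f_y(h) - f_y(0.0)) / h   # ≈ f_xy(0, 0)
fyx = (f_x(h) - f_x(0.0)) / h   # ≈ f_yx(0, 0)

print(round(fxy), round(fyx))  # → 1 -1
```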


79 Second derivative

The second derivative records how small changes aﬀect the derivative.

The derivative allowed us to ﬁnd the best linear approximation to a function at a

point. But how do these linear approximations change as we move from point to

nearby point? That is exactly what the second derivative is aiming for.


80 Intuitively

The second derivative is a bilinear form.

From our perspective, the second derivative of a function $f : \mathbb{R}^n \to \mathbb{R}$ at a point will be a bilinear form on $\mathbb{R}^n$. Let us take some time to understand, intuitively, why that should be the case.

Let $f : \mathbb{R}^2 \to \mathbb{R}$ be defined by $f(x, y) = x^2 y$. Then $Df\big|_{(x,y)}$ is the linear map given by the matrix $\begin{bmatrix} 2xy & x^2 \end{bmatrix}$. That is to say,
$$Df\big|_{(x,y)}\left(\begin{bmatrix} \Delta x \\ \Delta y \end{bmatrix}\right) = 2xy\,\Delta x + x^2\,\Delta y \approx f(x + \Delta x, y + \Delta y) - f(x, y).$$

The second derivative should now tell you how much the derivative changes from point to point. If we increment $(x, y)$ by a little bit to $(x + \Delta x, y)$, then we should expect the derivative to change by about
$$\begin{bmatrix} \Delta x\,\frac{\partial}{\partial x}(2xy) & \Delta x\,\frac{\partial}{\partial x}(x^2) \end{bmatrix} = \begin{bmatrix} 2y\,\Delta x & 2x\,\Delta x \end{bmatrix}.$$
Similarly, when we increase $y$ by $\Delta y$, the derivative should change by about
$$\begin{bmatrix} \Delta y\,\frac{\partial}{\partial y}(2xy) & \Delta y\,\frac{\partial}{\partial y}(x^2) \end{bmatrix} = \begin{bmatrix} 2x\,\Delta y & 0\,\Delta y \end{bmatrix}.$$

By linearity, if we change from $(x, y)$ to $(x+\Delta x, y+\Delta y)$, we expect the derivative to change by
$$\begin{bmatrix} 2y\,\Delta x + 2x\,\Delta y & 2x\,\Delta x + 0 \end{bmatrix} = \begin{bmatrix} \Delta x & \Delta y \end{bmatrix}\begin{bmatrix} 2y & 2x \\ 2x & 0 \end{bmatrix}$$

This gives a matrix which is the approximate change in the derivative. You can then apply this to another vector if you so wish.

Summing it up: if you wanted to see approximately how much the derivative changes from $p = (x, y)$ to $(x+\Delta x_2, y+\Delta y_2) = p + \vec{h}_2$ (where $\vec{h}_2 = \begin{bmatrix} \Delta x_2 \\ \Delta y_2 \end{bmatrix}$) when both derivatives are evaluated in the same direction $\vec{h}_1 = \begin{bmatrix} \Delta x_1 \\ \Delta y_1 \end{bmatrix}$, you would perform the computation:
$$Df\big|_{p+\vec{h}_2}(\vec{h}_1) - Df\big|_p(\vec{h}_1) \approx \begin{bmatrix} \Delta x_2 & \Delta y_2 \end{bmatrix}\begin{bmatrix} 2y & 2x \\ 2x & 0 \end{bmatrix}\begin{bmatrix} \Delta x_1 \\ \Delta y_1 \end{bmatrix}$$

This is exactly using the matrix $\begin{bmatrix} 2y & 2x \\ 2x & 0 \end{bmatrix}$ as a bilinear form applied to the two vectors $\vec{h}_1 = \begin{bmatrix} \Delta x_1 \\ \Delta y_1 \end{bmatrix}$ and $\vec{h}_2 = \begin{bmatrix} \Delta x_2 \\ \Delta y_2 \end{bmatrix}$.

With all of this as motivation, we make the following wishy-washy "definition":

Definition 1 The second derivative of a function $f : \mathbb{R}^n \to \mathbb{R}$ at a point $p \in \mathbb{R}^n$ is a bilinear form $D^2 f\big|_p : \mathbb{R}^n \times \mathbb{R}^n \to \mathbb{R}$ enjoying the following approximation property:
$$Df\big|_{p+\vec{h}_1}(\vec{h}_2) \approx Df\big|_p(\vec{h}_2) + D^2 f\big|_p(\vec{h}_1, \vec{h}_2)$$

We will make the sense in which this approximation holds precise in another section, but for now this is good enough.
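The approximation property can be checked by hand for $f(x, y) = x^2 y$. The following sketch (variable names are our own) compares the actual change in the derivative, applied to a fixed direction, against the bilinear form built from the Hessian $\begin{bmatrix} 2y & 2x \\ 2x & 0 \end{bmatrix}$:

```python
def df(x, y, dx, dy):
    # Df|_(x,y) applied to (dx, dy), for f(x, y) = x^2 y
    return 2 * x * y * dx + x**2 * dy

x, y = 1.0, 2.0
h1 = (0.01, 0.02)   # small step in the base point
h2 = (1.0, -1.0)    # fixed direction both derivatives are evaluated on

actual = df(x + h1[0], y + h1[1], *h2) - df(x, y, *h2)
# Hessian of x^2 y at (x, y) is [[2y, 2x], [2x, 0]]; apply it as h1^T H h2
predicted = (h1[0] * (2*y) + h1[1] * (2*x)) * h2[0] + (h1[0] * (2*x)) * h2[1]
print(actual, predicted)  # the two values agree to about 1e-3
```

The discrepancy shrinks quadratically as $\vec{h}_1$ shrinks, which is exactly what the rigorous definition in a later section demands.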

Question 2 If $f : \mathbb{R}^2 \to \mathbb{R}$ is a function with $Df\big|_{(1,2)} = \begin{bmatrix} -1 & 1 \end{bmatrix}$ and $Hf\big|_{(1,2)} = \begin{bmatrix} 0 & 4 \\ 4 & 1 \end{bmatrix}$, approximate $Df\big|_{(1.2,2.1)}\left(\begin{bmatrix} -0.2 \\ 0.3 \end{bmatrix}\right)$.

Solution

Hint: By the fundamental approximating property, we have
$$Df\big|_{(1,2)+\binom{0.2}{0.1}}\left(\begin{bmatrix} -0.2 \\ 0.3 \end{bmatrix}\right) \approx Df\big|_{(1,2)}\begin{bmatrix} -0.2 \\ 0.3 \end{bmatrix} + D^2 f\big|_{(1,2)}\left(\begin{bmatrix} 0.2 \\ 0.1 \end{bmatrix}, \begin{bmatrix} -0.2 \\ 0.3 \end{bmatrix}\right)$$

Hint: Thus
$$Df\big|_{(1.2,2.1)}\left(\begin{bmatrix} -0.2 \\ 0.3 \end{bmatrix}\right) \approx \begin{bmatrix} -1 & 1 \end{bmatrix}\begin{bmatrix} -0.2 \\ 0.3 \end{bmatrix} + \begin{bmatrix} 0.2 & 0.1 \end{bmatrix}\begin{bmatrix} 0 & 4 \\ 4 & 1 \end{bmatrix}\begin{bmatrix} -0.2 \\ 0.3 \end{bmatrix}$$

Hint:
$$\begin{aligned}
Df\big|_{(1.2,2.1)}\left(\begin{bmatrix} -0.2 \\ 0.3 \end{bmatrix}\right) &\approx (-1)(-0.2) + (1)(0.3) + (0.2)(0)(-0.2) + (0.2)(4)(0.3) + (0.1)(4)(-0.2) + (0.1)(1)(0.3)\\
&= 0.2 + 0.3 + 0 + 0.24 - 0.08 + 0.03\\
&= 0.69
\end{aligned}$$

$Df\big|_{(1.2,2.1)}\left(\begin{bmatrix} -0.2 \\ 0.3 \end{bmatrix}\right) \approx 0.69$

Note that the computation really splits into a first order change ($Df\big|_p(\vec{h})$) and a second order change ($D^2 f(\vec{h}_1, \vec{h}_2)$). In this case the first order change was $0.5$, and the second order change was $0.19$. This should be a better approximation to the real value than if we had used the first derivative alone.

Question 3 Let $f : \mathbb{R}^2 \to \mathbb{R}$ be a function with $Df\big|_p = \begin{bmatrix} 3 & 4 \end{bmatrix}$ and $Hf\big|_p = \begin{bmatrix} 1 & 3 \\ 3 & -2 \end{bmatrix}$. Approximate $Df\big|_{p+\binom{0.01}{0.02}}$.

Solution

Hint: By the fundamental approximation property, $Df\big|_{p+\binom{0.01}{0.02}}(\vec{v}) \approx Df\big|_p(\vec{v}) + \begin{bmatrix} 0.01 & 0.02 \end{bmatrix} Hf\big|_p\, \vec{v}$. So $Df\big|_{p+\binom{0.01}{0.02}} \approx Df\big|_p + \begin{bmatrix} 0.01 & 0.02 \end{bmatrix} Hf\big|_p$ as linear maps from $\mathbb{R}^2$ to $\mathbb{R}$.

Hint:
$$\begin{bmatrix} 0.01 & 0.02 \end{bmatrix} Hf\big|_p = \begin{bmatrix} 0.01 & 0.02 \end{bmatrix}\begin{bmatrix} 1 & 3 \\ 3 & -2 \end{bmatrix} = \begin{bmatrix} 0.01(1)+0.02(3) & 0.01(3)+0.02(-2) \end{bmatrix} = \begin{bmatrix} 0.07 & -0.01 \end{bmatrix}$$

Hint: So
$$Df\big|_{p+\binom{0.01}{0.02}} \approx \begin{bmatrix} 3 & 4 \end{bmatrix} + \begin{bmatrix} 0.07 & -0.01 \end{bmatrix} = \begin{bmatrix} 3.07 & 3.99 \end{bmatrix}$$
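The same update can be carried out mechanically; here is a minimal sketch using only the data given in Question 3 (plain Python lists stand in for row vectors and matrices):

```python
# Df|_{p+h} ≈ Df|_p + h^T Hf|_p, with Df|_p = [3, 4] and Hf|_p = [[1, 3], [3, -2]].
Df = [3.0, 4.0]
H = [[1.0, 3.0], [3.0, -2.0]]
h = [0.01, 0.02]

# row vector h^T times the Hessian
correction = [h[0]*H[0][0] + h[1]*H[1][0],
              h[0]*H[0][1] + h[1]*H[1][1]]
Df_new = [Df[0] + correction[0], Df[1] + correction[1]]
print(Df_new)  # ≈ [3.07, 3.99]
```

Note the Hessian here is symmetric, so it does not matter whether $\vec{h}$ multiplies on the left or the right.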

Following the development at the beginning of this activity, we can anticipate how to compute the second derivative as a bilinear form.

Warning 4 This is an intuitive development, not a rigorous proof.

Let $f : \mathbb{R}^n \to \mathbb{R}$. Since
$$Df\big|_p = \begin{bmatrix} f_{x_1}(p) & f_{x_2}(p) & \dots & f_{x_n}(p) \end{bmatrix}$$
it is reasonable to think that
$$Df\big|_{p+\vec{h}_1} \approx Df\big|_p + \begin{bmatrix} Df_{x_1}\big|_p(\vec{h}_1) & Df_{x_2}\big|_p(\vec{h}_1) & \dots & Df_{x_n}\big|_p(\vec{h}_1) \end{bmatrix}$$
but
$$Df_{x_i}\big|_p(\vec{h}_1) = \begin{bmatrix} f_{x_1 x_i}(p) & f_{x_2 x_i}(p) & \dots & f_{x_n x_i}(p) \end{bmatrix}\vec{h}_1$$
We can rewrite this as
$$\vec{h}_1^{\,T}\begin{bmatrix} f_{x_1 x_i}(p) \\ f_{x_2 x_i}(p) \\ \vdots \\ f_{x_n x_i}(p) \end{bmatrix},$$
so we obtain the rather pleasing formula
$$Df\big|_{p+\vec{h}_1} \approx Df\big|_p + \vec{h}_1^{\,T}\begin{bmatrix} f_{x_1x_1}(p) & f_{x_1x_2}(p) & \dots & f_{x_1x_n}(p) \\ f_{x_2x_1}(p) & f_{x_2x_2}(p) & \dots & f_{x_2x_n}(p) \\ \vdots & & & \vdots \\ f_{x_nx_1}(p) & f_{x_nx_2}(p) & \dots & f_{x_nx_n}(p) \end{bmatrix}$$

So
$$Df\big|_{p+\vec{h}_1}(\vec{h}_2) \approx Df\big|_p(\vec{h}_2) + \vec{h}_1^{\,T}\begin{bmatrix} f_{x_1x_1}(p) & f_{x_1x_2}(p) & \dots & f_{x_1x_n}(p) \\ f_{x_2x_1}(p) & f_{x_2x_2}(p) & \dots & f_{x_2x_n}(p) \\ \vdots & & & \vdots \\ f_{x_nx_1}(p) & f_{x_nx_2}(p) & \dots & f_{x_nx_n}(p) \end{bmatrix}\vec{h}_2$$

So it looks like we have:

Theorem 5 If $f : \mathbb{R}^n \to \mathbb{R}$, the matrix of the bilinear form $D^2 f\big|_p : \mathbb{R}^n \times \mathbb{R}^n \to \mathbb{R}$ is
$$\begin{bmatrix} f_{x_1x_1}(p) & f_{x_1x_2}(p) & \dots & f_{x_1x_n}(p) \\ f_{x_2x_1}(p) & f_{x_2x_2}(p) & \dots & f_{x_2x_n}(p) \\ \vdots & & & \vdots \\ f_{x_nx_1}(p) & f_{x_nx_2}(p) & \dots & f_{x_nx_n}(p) \end{bmatrix}$$
This matrix is also called the Hessian matrix of $f$.

We could also express this in the following convenient notation:
$$D^2 f\big|_p = \sum_{i,j=1}^{n} \frac{\partial^2 f}{\partial x_i \partial x_j}\, dx_i \otimes dx_j$$

By the equality of mixed partial derivatives, this bilinear form is actually symmetric! So all of the theory we developed about self-adjoint linear operators and symmetric bilinear forms can (and will) be brought to bear on the study of the second derivative.

Question 6 Let $f : \mathbb{R}^2 \to \mathbb{R}$ be defined by $f(x, y) = \dfrac{x}{y}$. What is the Hessian matrix of $f$ at the point $(x, y)$?

Solution

Hint:
$$f_{xx} = \frac{\partial}{\partial x}\frac{\partial}{\partial x}\frac{x}{y} = \frac{\partial}{\partial x}\frac{1}{y} = 0$$
So $f_{xx} = 0$.

Hint:
$$f_{xy} = \frac{\partial}{\partial x}\frac{\partial}{\partial y}\frac{x}{y} = \frac{\partial}{\partial x}\frac{-x}{y^2} = \frac{-1}{y^2}$$
So $f_{xy} = -1/y^2$.

Hint: $f_{yx} = f_{xy} = \dfrac{-1}{y^2}$ by equality of mixed partials.

Hint:
$$f_{yy} = \frac{\partial}{\partial y}\frac{\partial}{\partial y}\frac{x}{y} = \frac{\partial}{\partial y}\frac{-x}{y^2} = \frac{2x}{y^3}$$
So $f_{yy} = 2x/y^3$.

Hint:
$$H = \begin{bmatrix} 0 & \dfrac{-1}{y^2} \\[6pt] \dfrac{-1}{y^2} & \dfrac{2x}{y^3} \end{bmatrix}$$
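You can corroborate these formulas without any symbolic algebra: the sketch below (a central-difference scheme of our own, not part of the course platform) approximates each Hessian entry of $x/y$ at a sample point and matches $0$, $-1/y^2$, and $2x/y^3$:

```python
def f(x, y):
    return x / y

def hessian_entry(f, x, y, i, j, h=1e-4):
    # Central second difference approximating the (i, j) Hessian entry,
    # where index 0 means the x-direction and 1 means the y-direction.
    def step(x, y, k, s):
        return (x + s*h, y) if k == 0 else (x, y + s*h)
    a = f(*step(*step(x, y, i, 1), j, 1))
    b = f(*step(*step(x, y, i, 1), j, -1))
    c = f(*step(*step(x, y, i, -1), j, 1))
    d = f(*step(*step(x, y, i, -1), j, -1))
    return (a - b - c + d) / (4 * h * h)

x, y = 2.0, 3.0
H = [[hessian_entry(f, x, y, i, j) for j in range(2)] for i in range(2)]
print(H)  # ≈ [[0, -1/9], [-1/9, 4/27]] at the point (2, 3)
```

Note the numerical matrix is automatically symmetric up to rounding, reflecting the equality of mixed partials.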

Question 8 Let $f : \mathbb{R}^3 \to \mathbb{R}$ be defined by $f(x, y, z) = xy + yz$. What is the Hessian matrix of $f$ at the point $(x, y, z)$?

Solution

Hint: The only second partials which are not zero are $f_{xy} = f_{yx}$ and $f_{yz} = f_{zy}$.

Hint: $f_{xy} = f_{yx} = 1$ and $f_{yz} = f_{zy} = 1$.

Hint:
$$H = \begin{bmatrix} 0 & 1 & 0 \\ 1 & 0 & 1 \\ 0 & 1 & 0 \end{bmatrix}$$

Notice that the second derivative of this function is the same at every point because $f$ was a quadratic function. Any other polynomial of degree 2 in $n$ variables would also have a constant second derivative. For example, $f(x, y, z, t) = 1 + 2x + 3z + 4z^2 + zx + xt + t^2 + yx$ would also have a constant second derivative.

81 Rigorously

The second derivative allows approximations to the derivative.

Definition 1 Let $f : \mathbb{R}^n \to \mathbb{R}$ be a differentiable function, and $p \in \mathbb{R}^n$. We say that $f$ is twice differentiable at $p$ if there is a bilinear form $B : \mathbb{R}^n \times \mathbb{R}^n \to \mathbb{R}$ with
$$Df(p + \vec{h}_1)(\vec{h}_2) = Df(p)(\vec{h}_2) + B(\vec{h}_1, \vec{h}_2) + \mathrm{Error}(\vec{h}_1, \vec{h}_2)$$
with
$$\lim_{\vec{h}_1, \vec{h}_2 \to 0} \frac{\left|\mathrm{Error}(\vec{h}_1, \vec{h}_2)\right|}{|\vec{h}_1||\vec{h}_2|} = 0$$
In this case we call $B$ the second derivative of $f$ at $p$ and write $B = D^2 f(p)$.

Theorem 2 Let $f : \mathbb{R}^n \to \mathbb{R}$ be a function which is twice differentiable everywhere. Then the second derivative of $f$ at $p$ has the matrix
$$H(p) = \begin{bmatrix} f_{x_1x_1}(p) & f_{x_1x_2}(p) & \dots & f_{x_1x_n}(p) \\ f_{x_2x_1}(p) & f_{x_2x_2}(p) & \dots & f_{x_2x_n}(p) \\ \vdots & & & \vdots \\ f_{x_nx_1}(p) & f_{x_nx_2}(p) & \dots & f_{x_nx_n}(p) \end{bmatrix}$$

Prove this theorem!

Hint: Apply the definition to $B(h\vec{e}_i, k\vec{e}_j)$.

We want to show that $D^2 f(p)(\vec{e}_i, \vec{e}_j) = f_{x_i x_j}(p)$. By definition, we have that
$$\lim_{h,k\to 0} \frac{\left|Df(p + h\vec{e}_i)(k\vec{e}_j) - Df(p)(k\vec{e}_j) - D^2 f(h\vec{e}_i, k\vec{e}_j)\right|}{|h\vec{e}_i||k\vec{e}_j|} = 0$$
So by the linearity of the derivative, and the bilinearity of the second derivative,
$$\lim_{h,k\to 0} \frac{\left|k\,Df(p + h\vec{e}_i)(\vec{e}_j) - k\,Df(p)(\vec{e}_j) - hk\,D^2 f(\vec{e}_i, \vec{e}_j)\right|}{|hk|} = 0.$$
So we have
$$\lim_{h,k\to 0} \left|\frac{Df(p + h\vec{e}_i)(\vec{e}_j) - Df(p)(\vec{e}_j)}{h} - D^2 f(\vec{e}_i, \vec{e}_j)\right| = 0$$
which implies
$$D^2 f(p)(\vec{e}_i, \vec{e}_j) = \lim_{h\to 0} \frac{Df(p + h\vec{e}_i)(\vec{e}_j) - Df(p)(\vec{e}_j)}{h}$$
But $Df(x)(\vec{e}_j) = f_{x_j}(x)$ for any $x \in \mathbb{R}^n$, so this is
$$D^2 f(p)(\vec{e}_i, \vec{e}_j) = \lim_{h\to 0} \frac{f_{x_j}(p + h\vec{e}_i) - f_{x_j}(p)}{h}$$
But by definition of the directional derivative, this implies that
$$D^2 f(p)(\vec{e}_i, \vec{e}_j) = f_{x_i x_j}(p)$$

82 Taylor series

The second derivative allows us to approximate functions better than just the ﬁrst

derivative

As it stands, the second derivative lets us get approximations of the ﬁrst derivative.

The ﬁrst derivative allows us to get approximations of the original function. In the

following extended question, we will see how we can use the second derivative to get

more information about the ﬁrst derivative, which then lets us get more information

about the original function. This will lead to approximations with second order

accuracy, rather than just ﬁrst order accuracy. This is the essence of the second

order Taylor’s theorem.


83 An example

A speciﬁc example sheds light on Taylor series.

Let’s work through an example.

Question 1 Let $f : \mathbb{R}^2 \to \mathbb{R}$ be a function. All we know about $f$ at the point $(1, 2)$ is the following:

• $f(1, 2) = 6$
• $Df(1, 2) = \begin{bmatrix} 4 & 5 \end{bmatrix}$
• $D^2 f(1, 2) = \begin{bmatrix} 1 & -2 \\ -2 & 3 \end{bmatrix}$

Suppose that we want to approximate $f(1.1, 1.9)$ as accurately as we can given this information. We can simply use the linear approximation to $f$ at $(1, 2)$:

Solution

Hint:
$$f(1.1, 1.9) \approx 6 + \begin{bmatrix} 4 & 5 \end{bmatrix}\begin{bmatrix} 0.1 \\ -0.1 \end{bmatrix} = 6 + 0.4 - 0.5 = 5.9$$

Using the linear approximation to f at (1, 2), we ﬁnd f(1.1, 1.9) ≈ 5.9.

This approximation ignores the second order data provided by the second derivative: we have essentially assumed that the first derivative is constant along the line from $(1, 2)$ to $(1.1, 1.9)$. Since we know the second derivative at the point $(1, 2)$, we can estimate how the derivative changes along this line (assuming the second derivative is constant) and get a better approximation.

For example, we could use the following three step process:

• Use the linear approximation to f at (1, 2) to approximate f(1.05, 1.95)

• Use the second derivative to approximate Df(1.05, 1.95)

• Use the linear approximation to f at (1.05, 1.95) to approximate f(1.1, 1.9)

Solution

Hint:
$$f(1.05, 1.95) \approx 6 + \begin{bmatrix} 4 & 5 \end{bmatrix}\begin{bmatrix} 0.05 \\ -0.05 \end{bmatrix} = 6 + 0.2 - 0.25 = 5.95$$
Let's try that here: $f(1.05, 1.95) \approx 5.95$

Solution

Hint:
$$Df(1.05, 1.95) \approx Df(1, 2) + \begin{bmatrix} 0.05 & -0.05 \end{bmatrix}\begin{bmatrix} 1 & -2 \\ -2 & 3 \end{bmatrix} = \begin{bmatrix} 4 & 5 \end{bmatrix} + \begin{bmatrix} 0.05(1) + (-0.05)(-2) & 0.05(-2) + (-0.05)(3) \end{bmatrix} = \begin{bmatrix} 4 & 5 \end{bmatrix} + \begin{bmatrix} 0.15 & -0.25 \end{bmatrix} = \begin{bmatrix} 4.15 & 4.75 \end{bmatrix}$$
Using the second derivative, $Df(1.05, 1.95)$ is approximately $\begin{bmatrix} 4.15 & 4.75 \end{bmatrix}$.

Solution

Hint:
$$f(1.1, 1.9) \approx 5.95 + \begin{bmatrix} 4.15 & 4.75 \end{bmatrix}\begin{bmatrix} 0.05 \\ -0.05 \end{bmatrix} = 5.95 + 4.15(0.05) + 4.75(-0.05) = 5.92$$
Using the linear approximation to $f$ at $(1.05, 1.95)$, $f(1.1, 1.9) \approx 5.92$.
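The three step process above is easy to mechanize. Here is a sketch using only the data given at $(1, 2)$ (plain lists stand in for vectors and matrices):

```python
# Two-step approximation with the given data at (1, 2):
# f = 6, Df = [4, 5], D2f = [[1, -2], [-2, 3]], target point (1.1, 1.9).
Df = [4.0, 5.0]
H = [[1.0, -2.0], [-2.0, 3.0]]
half = [0.05, -0.05]  # half of the full displacement (0.1, -0.1)

# Step 1: linear approximation from (1, 2) to (1.05, 1.95)
f_mid = 6.0 + Df[0]*half[0] + Df[1]*half[1]

# Step 2: update the derivative using the second derivative
Df_mid = [Df[0] + half[0]*H[0][0] + half[1]*H[1][0],
          Df[1] + half[0]*H[0][1] + half[1]*H[1][1]]

# Step 3: linear approximation from (1.05, 1.95) to (1.1, 1.9)
f_end = f_mid + Df_mid[0]*half[0] + Df_mid[1]*half[1]
print(f_mid, Df_mid, f_end)  # ≈ 5.95, [4.15, 4.75], 5.92
```

Each step is an ordinary linear approximation; the second derivative only enters when updating the derivative between steps.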

So this method allowed us to get a slightly better approximation of $f(1.1, 1.9)$ using the fact that $Df\big|_p\left(\begin{bmatrix} 1 \\ -1 \end{bmatrix}\right)$ is increasing as $p$ moves from $(1, 2)$ in the direction $\begin{bmatrix} 1 \\ -1 \end{bmatrix}$. We really got a slightly higher estimate of $f(1.1, 1.9)$ from this two step approximation, compared to using the linear approximation, because
$$D^2 f(1, 2)\left(\begin{bmatrix} 1 \\ -1 \end{bmatrix}, \begin{bmatrix} 1 \\ -1 \end{bmatrix}\right) = 8$$
is positive.

We do not have to limit ourselves to only using a two step approximation: we

could get better and better approximations of f(1.1, 1.9) by using more and more

partitions of the line segment from (1, 2) to (1.1, 1.9). For example, we could use

ten partitions:

• Use the linear approximation to f at (1, 2) to approximate f(1.01, 1.99)

• Use the second derivative to approximate Df(1.01, 1.99)

• Use the linear approximation to f at (1.01, 1.99) to approximate f(1.02, 1.98)

• Use the second derivative to approximate Df(1.02, 1.98)

• Use the linear approximation to f at (1.02, 1.98) to approximate f(1.03, 1.97)

•

.

.

.

• Use the linear approximation to f at (1.09, 1.91) to approximate f(1.1, 1.9)

This kind of process, where we are summing more and more of smaller and

smaller values to approximate something, furiously demands to be phrased as an

integral.

Solution

Hint: Notice that
$$D^2 f\left(\begin{bmatrix} 0.1\frac{1}{n} \\ -0.1\frac{1}{n} \end{bmatrix}, \begin{bmatrix} 0.1\frac{1}{n} \\ -0.1\frac{1}{n} \end{bmatrix}\right) = \begin{bmatrix} 0.1\frac{1}{n} & -0.1\frac{1}{n} \end{bmatrix}\begin{bmatrix} 1 & -2 \\ -2 & 3 \end{bmatrix}\begin{bmatrix} 0.1\frac{1}{n} \\ -0.1\frac{1}{n} \end{bmatrix} = \big(0.1(0.1)(1) + 0.1(-0.1)(-2) + (0.1)(-0.1)(-2) + (-0.1)(-0.1)(3)\big)\frac{1}{n^2} = 0.08\frac{1}{n^2}$$

Hint: By partitioning $[0, 1]$ into $n$ little pieces of equal width, the contribution to the sum

• over $[0, \frac{1}{n}]$ is $Df(1, 2)\begin{bmatrix} 0.1\frac{1}{n} \\ -0.1\frac{1}{n} \end{bmatrix} = \begin{bmatrix} 4 & 5 \end{bmatrix}\begin{bmatrix} 0.1\frac{1}{n} \\ -0.1\frac{1}{n} \end{bmatrix} = -0.1\frac{1}{n}$

• over $[\frac{1}{n}, \frac{2}{n}]$ is
$$Df\left(1 + 0.1\tfrac{1}{n},\, 2 + (-0.1)\tfrac{1}{n}\right)\begin{bmatrix} 0.1\frac{1}{n} \\ -0.1\frac{1}{n} \end{bmatrix} \approx Df(1, 2)\begin{bmatrix} 0.1\frac{1}{n} \\ -0.1\frac{1}{n} \end{bmatrix} + D^2 f\left(\begin{bmatrix} 0.1\frac{1}{n} \\ -0.1\frac{1}{n} \end{bmatrix}, \begin{bmatrix} 0.1\frac{1}{n} \\ -0.1\frac{1}{n} \end{bmatrix}\right) = -0.1\frac{1}{n} + 0.08\frac{1}{n^2}$$

• over $[\frac{2}{n}, \frac{3}{n}]$ is
$$Df\left(1 + 2\left(0.1\tfrac{1}{n}\right),\, 2 + 2\left((-0.1)\tfrac{1}{n}\right)\right)\begin{bmatrix} 0.1\frac{1}{n} \\ -0.1\frac{1}{n} \end{bmatrix} \approx -0.1\frac{1}{n} + 0.08\frac{1}{n^2} + 0.08\frac{1}{n^2} = -0.1\frac{1}{n} + 0.08\frac{2}{n}\frac{1}{n}$$

• $\vdots$

• over $[\frac{k}{n}, \frac{k+1}{n}]$ is
$$Df\left(1 + k\left(0.1\tfrac{1}{n}\right),\, 2 + k\left((-0.1)\tfrac{1}{n}\right)\right)\begin{bmatrix} 0.1\frac{1}{n} \\ -0.1\frac{1}{n} \end{bmatrix} \approx -0.1\frac{1}{n} + 0.08\frac{k}{n}\frac{1}{n}$$

Hint: So $f(1.1, 1.9) \approx 6 + \displaystyle\sum_{k=0}^{n-1}\left(-0.1\frac{1}{n} + 0.08\frac{k}{n}\frac{1}{n}\right)$

Hint: By definition of the integral we have
$$\lim_{n\to\infty}\sum_{k=0}^{n-1}\left(-0.1 + 0.08\frac{k}{n}\right)\frac{1}{n} = \int_0^1\left(-0.1 + 0.08t\right)dt$$

In this case, we get that $f(1.1, 1.9) \approx 6 + \int_0^1 g(t)\,dt$, where $g(t) = -0.1 + 0.08t$.

Solution

Hint:
$$f(1.1, 1.9) \approx 6 + \int_0^1(-0.1 + 0.08t)\,dt = 6 + \left[-0.1t + \tfrac{1}{2}(0.08)t^2\right]_0^1 = 6 - 0.1 + 0.04 = 5.94$$
Evaluating this integral we have $f(1.1, 1.9) \approx 5.94$. This is the best approximation we can really expect to get given only this information about $f$ at $(1, 2)$.
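This kind of refinement is concrete enough to run. The following sketch repeats the linear-step-then-update-the-derivative scheme $n$ times with the given data; as $n$ grows, the result tends to the integral value $6 - 0.1 + 0.04 = 5.94$:

```python
# n-step refinement: repeatedly take a small linear step and update
# the derivative with the (assumed constant) second derivative.
def taylor_walk(n):
    Df = [4.0, 5.0]
    H = [[1.0, -2.0], [-2.0, 3.0]]
    step = [0.1 / n, -0.1 / n]
    value = 6.0
    for _ in range(n):
        value += Df[0]*step[0] + Df[1]*step[1]          # linear step
        Df = [Df[0] + step[0]*H[0][0] + step[1]*H[1][0],  # derivative update
              Df[1] + step[0]*H[0][1] + step[1]*H[1][1]]
    return value

for n in (1, 2, 10, 1000):
    print(n, taylor_walk(n))  # tends to 5.94 as n grows
```

With $n = 1$ this reproduces the plain linear approximation $5.9$, and with $n = 2$ the two-step answer $5.92$, so the integral really is the limit of the earlier schemes.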

Notice that this approximation of $f$ is really just
$$f(1, 2) + Df(1, 2)\left(\begin{bmatrix} 0.1 \\ -0.1 \end{bmatrix}\right) + \frac{1}{2} D^2 f(1, 2)\left(\begin{bmatrix} 0.1 \\ -0.1 \end{bmatrix}, \begin{bmatrix} 0.1 \\ -0.1 \end{bmatrix}\right).$$
The first two terms are just the regular linear approximation to $f$ at $(1, 2)$, but the next term arose from integrating the function $D^2 f\left(\begin{bmatrix} 0.1 \\ -0.1 \end{bmatrix}, \begin{bmatrix} 0.1 \\ -0.1 \end{bmatrix}\right) t$ from $t = 0$ to $t = 1$. This is exactly $\frac{1}{2} D^2 f\left(\begin{bmatrix} 0.1 \\ -0.1 \end{bmatrix}, \begin{bmatrix} 0.1 \\ -0.1 \end{bmatrix}\right)$.

Generalizing, we might expect in general that

Theorem 2
$$f(p + \vec{h}) \approx f(p) + Df(p)(\vec{h}) + \frac{1}{2} D^2 f(p)\left(\vec{h}, \vec{h}\right)$$

This is the second order Taylor approximation of $f$ at $p$. Notice how similar it looks to the second order Taylor approximation of a single variable function! If we had not taken the time to develop an understanding of $D^2 f(p)$ as a bilinear map, it would be quite messy to even state this theorem, and it would only get worse for the higher order Taylor's theorem we will be learning about next week.

Hopefully this (admittedly long) discussion has helped you to understand where this approximation comes from! We will give a rigorous statement and proof of the theorem in the next section.

84 Rigorously

The second derivative enables quadratic approximation

You should know the statement of the following theorem for this course:

Theorem 1 (Second order Taylor's theorem) If $f : \mathbb{R}^n \to \mathbb{R}$ is a twice differentiable function and $p \in \mathbb{R}^n$, then we have
$$f(p + \vec{h}) = f(p) + Df(p)\vec{h} + \frac{1}{2} D^2 f(p + \xi\vec{h})(\vec{h}, \vec{h})$$
for some $\xi \in [0, 1]$.

It follows (after a lot of work!) that
$$f(p + \vec{h}) = f(p) + Df(p)\vec{h} + \frac{1}{2} D^2 f(p)(\vec{h}, \vec{h}) + \mathrm{Error}(\vec{h})$$
with
$$\lim_{\vec{h}\to 0} \frac{|\mathrm{Error}(\vec{h})|}{|\vec{h}|^2} = 0$$

This approximation is also sometimes phrased as $f(x) \approx f(p) + Df(p)(x - p) + \frac{1}{2} D^2 f(p)(x - p, x - p)$.

In the future, we will prove the above theorem. To prepare yourself, you should at a minimum make sure you have already worked through the other two optional sections on the formal definition of a limit¹ and also the proof of the chain rule² where operator norms are introduced.

¹ http://ximera.osu.edu/course/kisonecat/m2o2c2/course/activity/week2/limits/formal-limit/
² http://ximera.osu.edu/course/kisonecat/m2o2c2/course/activity/week2/chain-rule/proof/

85 Taylor’s theorem examples

Let’s see how Taylor’s theorem gives us better approximations

Warning 1 Just due to the sheer number of calculations, these questions are quite long.

Question 2 Consider $f : \mathbb{R}^2 \to \mathbb{R}$ defined by $f(x, y) = x\cos(y) + xy$.

Solution

Hint:
$$\frac{\partial}{\partial x}\big(x\cos(y) + xy\big) = \cos(y) + y$$
So $f_x(0, 0) = 1$.

Hint:
$$\frac{\partial}{\partial y}\big(x\cos(y) + xy\big) = -x\sin(y) + x$$
So $f_y(0, 0) = 0$.

Hint:
$$f(x, y) \approx f(0, 0) + Df(0, 0)\left(\begin{bmatrix} x \\ y \end{bmatrix}\right) = 0\cos(0) + 0(0) + \begin{bmatrix} f_x(0, 0) & f_y(0, 0) \end{bmatrix}\begin{bmatrix} x \\ y \end{bmatrix} = \begin{bmatrix} 1 & 0 \end{bmatrix}\begin{bmatrix} x \\ y \end{bmatrix} = x$$
The linear approximation to $f$ at $(0, 0)$ is $f(x, y) \approx x$.

Solution

Hint: $f_{xx} = 0$, so $f_{xx}(0, 0) = 0$.

Hint: $f_{xy} = -\sin(y) + 1$, so $f_{xy}(0, 0) = 1$.

Hint: $f_{yx} = -\sin(y) + 1$, so $f_{yx}(0, 0) = 1$.

Hint: $f_{yy} = -x\cos(y)$, so $f_{yy}(0, 0) = 0$.

Hint: So $H(0, 0) = \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix}$

Hint: By Taylor's theorem, $f(x, y) \approx f(0, 0) + Df(0, 0)\left(\begin{bmatrix} x \\ y \end{bmatrix}\right) + \frac{1}{2} D^2 f\left(\begin{bmatrix} x \\ y \end{bmatrix}, \begin{bmatrix} x \\ y \end{bmatrix}\right)$

Hint: So
$$f(x, y) \approx 0 + x + \frac{1}{2}\begin{bmatrix} x & y \end{bmatrix}\begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix}\begin{bmatrix} x \\ y \end{bmatrix} = x + \frac{1}{2}(2xy) = x + xy$$
The second order approximation to $f$ at $(0, 0)$ is $f(x, y) \approx x + xy$.

It is kind of cool that we could also read this off from the following magic:
$$x\cos(y) + xy = x\left(1 - \frac{y^2}{2!} + \frac{y^4}{4!} - \dots\right) + xy = x + xy - \frac{xy^2}{2!} + \frac{xy^4}{4!} - \dots$$
So it looks like the second order approximation is $x + xy$.

Solution Using the first order approximation, $f(0.1, 0.2) \approx 0.1$.

Solution Using the second order approximation, $f(0.1, 0.2) \approx 0.12$.

A calculator tells me $f(0.1, 0.2) \approx 0.11800665778$. So clearly, the second order approximation is better. Notice that the second order approximation is slightly high, and this is apparent from our magical calculation, since the next term should be $-\frac{0.1(0.2)^2}{2} = -0.002$, which gets us even closer to the exact answer. We will make the magic more precise when we deal with the full multivariable Taylor's theorem later.
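The comparison between the two approximations takes only a few lines to reproduce:

```python
import math

def f(x, y):
    return x * math.cos(y) + x * y

x, y = 0.1, 0.2
first_order = x            # linear approximation at (0, 0)
second_order = x + x * y   # second order Taylor approximation at (0, 0)
exact = f(x, y)
print(first_order, second_order, exact)  # ≈ 0.1, 0.12, 0.118006...
```

The second order error here is about $0.002$, an order of magnitude smaller than the first order error of about $0.018$, matching the discussion above.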

Question 5 Consider $f : \mathbb{R}^3 \to \mathbb{R}$ defined by $f(x, y, z) = xe^{z+y} + z^2$. Find the second order Taylor expansion of $f$ about the point $(0, 0, 1)$.

Solution

Hint:
$$f(x, y, z) \approx f(0, 0, 1) + Df(0, 0, 1)\left(\begin{bmatrix} x \\ y \\ z - 1 \end{bmatrix}\right) + \frac{1}{2} D^2 f\left(\begin{bmatrix} x \\ y \\ z - 1 \end{bmatrix}, \begin{bmatrix} x \\ y \\ z - 1 \end{bmatrix}\right)$$

Hint:
$$Df(0, 0, 1) = \begin{bmatrix} \frac{\partial f}{\partial x} & \frac{\partial f}{\partial y} & \frac{\partial f}{\partial z} \end{bmatrix}\Big|_{(0,0,1)} = \begin{bmatrix} e^{z+y} & xe^{z+y} & xe^{z+y} + 2z \end{bmatrix}\Big|_{(0,0,1)} = \begin{bmatrix} e & 0 & 2 \end{bmatrix}$$
The matrix of $Df(0, 0, 1)$ is $\begin{bmatrix} e & 0 & 2 \end{bmatrix}$.

Hint:
$$H(0, 0, 1) = \begin{bmatrix} \frac{\partial^2 f}{\partial x\partial x} & \frac{\partial^2 f}{\partial x\partial y} & \frac{\partial^2 f}{\partial x\partial z} \\ \frac{\partial^2 f}{\partial y\partial x} & \frac{\partial^2 f}{\partial y\partial y} & \frac{\partial^2 f}{\partial y\partial z} \\ \frac{\partial^2 f}{\partial z\partial x} & \frac{\partial^2 f}{\partial z\partial y} & \frac{\partial^2 f}{\partial z\partial z} \end{bmatrix}\Bigg|_{(0,0,1)} = \begin{bmatrix} 0 & e^{z+y} & e^{z+y} \\ e^{z+y} & xe^{z+y} & xe^{z+y} \\ e^{z+y} & xe^{z+y} & xe^{z+y} + 2 \end{bmatrix}\Bigg|_{(0,0,1)} = \begin{bmatrix} 0 & e & e \\ e & 0 & 0 \\ e & 0 & 2 \end{bmatrix}$$
This is the Hessian matrix of $f$ at $(0, 0, 1)$.

Hint:
$$f(x, y, z) \approx 1 + \begin{bmatrix} e & 0 & 2 \end{bmatrix}\begin{bmatrix} x \\ y \\ z - 1 \end{bmatrix} + \frac{1}{2}\begin{bmatrix} x & y & z - 1 \end{bmatrix}\begin{bmatrix} 0 & e & e \\ e & 0 & 0 \\ e & 0 & 2 \end{bmatrix}\begin{bmatrix} x \\ y \\ z - 1 \end{bmatrix} = 1 + ex + 2(z - 1) + exy + ex(z - 1) + (z - 1)^2$$
The second order Taylor expansion of $f$ about the point $(0, 0, 1)$ is $f(x, y, z) \approx 1 + ex + 2(z - 1) + exy + ex(z - 1) + (z - 1)^2$.
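As a sanity check on the expansion of Question 5, the sketch below evaluates both $f$ and the quadratic approximation at a sample point near $(0, 0, 1)$ (the point is our own choice):

```python
import math

def f(x, y, z):
    return x * math.exp(z + y) + z**2

def taylor2(x, y, z):
    # Second order expansion of f about (0, 0, 1) derived above
    e = math.e
    dz = z - 1
    return 1 + e*x + 2*dz + e*x*y + e*x*dz + dz**2

pt = (0.05, -0.03, 1.02)
print(f(*pt), taylor2(*pt))  # the two values closely agree
```

The agreement improves cubically as the sample point approaches $(0, 0, 1)$, since the first neglected term is third order.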

Question 7 Let $f : \mathbb{R}^4 \to \mathbb{R}$ be a function with

• $f(0, 0, 0, 0) = 2$
• $Df(0, 0, 0, 0) = \begin{bmatrix} 1 & -1 & 0 & 0 \end{bmatrix}$
• $D^2 f(0, 0, 0, 0) = \begin{bmatrix} 0 & 0 & 0 & 0 \\ 0 & 2 & 0 & 3 \\ 0 & 0 & 0 & 0 \\ 0 & 3 & 0 & 0 \end{bmatrix}$

What is the second order approximation to $f$ at $(0, 0, 0, 0)$?

Solution

Hint:
$$f(x, y, z, t) \approx f(0, 0, 0, 0) + Df(0, 0, 0, 0)\begin{bmatrix} x \\ y \\ z \\ t \end{bmatrix} + \frac{1}{2} D^2 f\left(\begin{bmatrix} x \\ y \\ z \\ t \end{bmatrix}, \begin{bmatrix} x \\ y \\ z \\ t \end{bmatrix}\right)$$

Hint:
$$f(x, y, z, t) \approx 2 + \begin{bmatrix} 1 & -1 & 0 & 0 \end{bmatrix}\begin{bmatrix} x \\ y \\ z \\ t \end{bmatrix} + \frac{1}{2}\begin{bmatrix} x & y & z & t \end{bmatrix}\begin{bmatrix} 0 & 0 & 0 & 0 \\ 0 & 2 & 0 & 3 \\ 0 & 0 & 0 & 0 \\ 0 & 3 & 0 & 0 \end{bmatrix}\begin{bmatrix} x \\ y \\ z \\ t \end{bmatrix} = 2 + x - y + y^2 + 3yt$$
The second order approximation to $f$ at $(0, 0, 0, 0)$ is $f(x, y, z, t) \approx 2 + x - y + y^2 + 3yt$.

86 Optimization

Optimization means ﬁnding a biggest (or smallest) value.

Suppose $A$ is a subset of $\mathbb{R}^n$, meaning that each element of $A$ is a vector in $\mathbb{R}^n$. Maybe $A$ contains all vectors in $\mathbb{R}^n$, maybe not. Further suppose $f : A \to \mathbb{R}$ is a function.

Question 1 Is there a vector $v \in A$ so that $f(v)$ is at least as large as any other output of $f$?

Solution

(a) This is always the case.
(b) This is not necessarily the case.

It really does depend on $A$ and on $f$.

For example, suppose $f(v) = \langle v, v \rangle$, meaning $f$ sends $v$ to the square of the length of $v$. Further suppose that $A = \{v \in \mathbb{R}^n : |v| < 1\}$.

Then for all $v \in A$, it is the case that $f(v) \leq 1$. And yet, there is not a single vector $v \in A$ so that $f(v)$ is at least as large as all outputs of $f$.

If you claim that you have found a nonzero vector $v$ so that $f(v)$ is as large as any output of $f$, then you should consider the input
$$\vec{w} = \frac{1 + |v|}{2}\,\frac{v}{|v|},$$
and note that $f(\vec{w}) > f(v)$.

Question 2 Let’s consider an example. Let g : R

2

→R be the function given by

g

__

x

y

__

= 10 −(x + 1)

2

−(y −2)

2

.

Solution

Hint: No matter what x is, (x + 1)

2

≥ 0.

Hint: No matter what y is, (y −2)

2

≥ 0.

Hint: No matter what x and y are, (x + 1)

2

+ (y −2)

2

≥ 0.

Hint: No matter what x and y are, −(x + 1)

2

−(y −2)

2

≤ 0.

Hint: No matter what x and y are, 10 −(x + 1)

2

−(y −2)

2

≤ 10.

Hint: If (x, y) = (−1, 2), then 10 −(x + 1)

2

−(y −2)

2

= 10.

202

86 Optimization

Hint: Consequently, the largest possible output of g is 10.

The largest possible output of g is 10.

Solution This largest possible output occurs when x is −1.

Solution This largest possible output occurs when y is 2.
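A brute-force check makes the same point without any algebra; this sketch samples $g$ on a grid and confirms that the largest sampled value is $10$, attained at $(-1, 2)$:

```python
# Sample g on a grid over [-5, 5] x [-5, 5] in steps of 0.1 and find the max.
def g(x, y):
    return 10 - (x + 1)**2 - (y - 2)**2

best = max(
    (g(x/10, y/10), x/10, y/10) for x in range(-50, 51) for y in range(-50, 51)
)
print(best)  # (10.0, -1.0, 2.0)
```

Of course, grid search only samples finitely many points; the algebraic argument above is what guarantees $10$ is the maximum over all of $\mathbb{R}^2$.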

In this case, we were able to think through the situation by considering some algebra, namely the fact that when we square a real number, the result is nonnegative.

Here is the key idea that motivates everything we are about to do: using the second derivative, we can approximate complicated functions by "quadratic" functions, and quadratic functions we can analyze just as we analyzed this example.

87 Deﬁnitions

“Local” means “after restricting to a small neighborhood.”

Definition 1 Let $X \subset \mathbb{R}^n$ and $f : X \to \mathbb{R}$. To say that the maximum value of $f$ occurs at the point $p \in X$ is to say that, for all $q \in X$, we have $f(p) \geq f(q)$. Conversely, to say that the minimum value of $f$ occurs at the point $p \in X$ is to say that, for all $q \in X$, we have $f(p) \leq f(q)$.

Sometimes people use the term "extreme value" to speak of both maximum values and minimum values. Sometimes people say "maxima" instead of maximums and "minima" instead of minimums.

A function need not achieve a maximum or a minimum value.

Our goal will be to use calculus to search for maximums and minimums, but that raises a problem. The derivative at a point is only describing what is happening around that point, so if we use calculus to search for extreme values, then we will only see "local" extremes.

Definition 2 Let $X \subset \mathbb{R}^n$ and $f : X \to \mathbb{R}$. To say that a local maximum of $f$ occurs at the point $p \in X$ is to say that there is an $\epsilon > 0$ so that for all $q \in X$ within $\epsilon$ of $p$, we have $f(p) \geq f(q)$. Conversely, to say that a local minimum of $f$ occurs at the point $p \in X$ is to say that there is an $\epsilon > 0$ so that for all $q \in X$ within $\epsilon$ of $p$, we have $f(p) \leq f(q)$.

Here’s an example of how this works out in practice. Let g : R

2

→ R be the

function given by

g(x, y) = x

2

+ y

2

+ y

3

Question 3 Does this function g achieve a minimum value?

Solution

(a) No.

(b) Yes.

That’s correct: there is no “global” minimum. No matter how negative you

want the output to g to be, you can achieve it by looking at g(0, y) where y is a

very negative number.

On the other hand, if we restrict our attention near the point (0, 0), then g is

nonnegative there.

Solution Whenever (x, y) is within of (0, 0), then g(x, y) ≥ g(0, 0) = 0.

As a result, g achieves a local minimum at (0, 0), in spite of the fact that there

is no global maximum.

204
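Both halves of this example can be spot-checked numerically (the sample points and grid here are our own choices, purely illustrative):

```python
# g(x, y) = x^2 + y^2 + y^3 is unbounded below, yet nonnegative near the origin.
def g(x, y):
    return x**2 + y**2 + y**3

print(g(0, -10), g(0, -100))  # -900.0 and -990000.0: no global minimum
# On the small square [-0.5, 0.5]^2, the smallest sampled value is 0, at (0, 0).
near = min(g(x/100, y/100) for x in range(-50, 51) for y in range(-50, 51))
print(near)  # 0.0
```

Near the origin the cubic term $y^3$ is dominated by $y^2$ (indeed $y^2 + y^3 = y^2(1+y) \geq 0$ for $y \geq -1$), which is why the local minimum survives.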

88 Critical points and extrema

Extremes happen where the derivative vanishes

Definition 1 Let $f : \mathbb{R}^n \to \mathbb{R}$ be a function. A point $p \in \mathbb{R}^n$ is called a critical point of $f$ if $f$ is not differentiable at $p$, or if $Df(p)$ is the zero map.

Question 2 Consider $f : \mathbb{R}^2 \to \mathbb{R}$ defined by $f(x, y) = e^{x^2+y^2}$. The function $f$ has only one critical point.

Solution

Hint:
$$Df(x, y) = \begin{bmatrix} \frac{\partial f}{\partial x} & \frac{\partial f}{\partial y} \end{bmatrix} = \begin{bmatrix} 2xe^{x^2+y^2} & 2ye^{x^2+y^2} \end{bmatrix}$$
What is $Df(x, y)$?

Hint: So we need $\begin{bmatrix} 2xe^{x^2+y^2} & 2ye^{x^2+y^2} \end{bmatrix} = \begin{bmatrix} 0 & 0 \end{bmatrix}$.

Hint: This only occurs when $x = 0$ and $y = 0$.

Hint: Enter this as $\begin{bmatrix} 0 \\ 0 \end{bmatrix}$.

What is this critical point? Give your answer as a vertical vector.

Question 4 Consider $f : \mathbb{R}^2 \to \mathbb{R}$ defined by $f(x, y) = x^3 + y^3 - 3xy$.

Solution

Hint:
$$Df(x, y) = \begin{bmatrix} \frac{\partial f}{\partial x} & \frac{\partial f}{\partial y} \end{bmatrix} = \begin{bmatrix} 3x^2 - 3y & 3y^2 - 3x \end{bmatrix}$$
What is $Df(x, y)$?

Hint: So we need $\begin{bmatrix} 3x^2 - 3y & 3y^2 - 3x \end{bmatrix} = \begin{bmatrix} 0 & 0 \end{bmatrix}$.

Hint:
$$\begin{cases} 3x^2 - 3y = 0 \\ 3y^2 - 3x = 0 \end{cases} \iff \begin{cases} y = x^2 \\ x = y^2 \end{cases} \iff \begin{cases} y = y^4 \\ x = y^2 \end{cases} \iff \begin{cases} y(y - 1)(y^2 + y + 1) = 0 \\ x = y^2 \end{cases}$$

Hint: The only two points that work are $(0, 0)$ and $(1, 1)$.

f has two critical points. One of them is $(0, 0)$. What is the other?
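The algebra above can be double-checked by evaluating the gradient at both candidate points:

```python
# Gradient of f(x, y) = x^3 + y^3 - 3xy; both critical points found
# above should make it vanish exactly.
def grad(x, y):
    return (3*x**2 - 3*y, 3*y**2 - 3*x)

for p in [(0.0, 0.0), (1.0, 1.0)]:
    print(p, grad(*p))  # both gradients are (0.0, 0.0)
```

The cubic factor $y^2 + y + 1$ has no real roots (its discriminant is $-3$), which is why no other real critical points exist.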

You already had some practice with this concept in week 2¹.

Definition 6 A function $f : \mathbb{R}^n \to \mathbb{R}$ has a local maximum at the point $p \in \mathbb{R}^n$ if there is an $\epsilon > 0$ so that for all $x \in \mathbb{R}^n$ with $|x - p| \leq \epsilon$, $f(x) \leq f(p)$.

Warning 7 The fact that the inequalities in this definition are not strict means that, for example, for the function $f(x, y) = 1$ every point is both a local maximum and a local minimum.

Write a good definition for the local minimum of a function. A function $f : \mathbb{R}^n \to \mathbb{R}$ has a local minimum at the point $p \in \mathbb{R}^n$ if there is an $\epsilon > 0$ so that for all $x \in \mathbb{R}^n$ with $|x - p| \leq \epsilon$, $f(x) \geq f(p)$.

We call points which are either local maxima or local minima local extrema.

Theorem 8 If $f : \mathbb{R}^n \to \mathbb{R}$ is a differentiable function and $p$ is a local extremum, then $p$ is a critical point of $f$.

Prove this theorem. Let $p$ be a local maximum. We want to show that $Df(p)(v) = 0$ for all $v \in \mathbb{R}^n$. Recall that one formula for the derivative is
$$Df(p)(v) = \lim_{t\to 0}\frac{f(p + tv) - f(p)}{t}$$
Since $f$ is differentiable, this limit must exist. As $t \to 0^+$, we have $\frac{f(p + tv) - f(p)}{t} \leq 0$, since the numerator is less than or equal to zero by definition of a local maximum, and the denominator is greater than 0. So the limit must be less than or equal to 0.

On the other hand, as $t \to 0^-$, the numerator is still less than or equal to 0, but the denominator is now negative, so the limit must be greater than or equal to 0. Therefore
$$Df(p)(v) = \lim_{t\to 0}\frac{f(p + tv) - f(p)}{t} = 0$$

Since we did this with an arbitrary vector $v \in \mathbb{R}^n$, we see that $Df(p)$ is the zero map. We leave the nearly identical case of a local minimum to you.

¹ http://ximera.osu.edu/course/kisonecat/m2o2c2/course/activity/week2/practice/stationary-points/

This theorem tells us that if we want to identify local extrema, a good place to start is by looking for all the critical points. It is worthwhile to note that just because a point is a critical point does not mean it is a local extremum:

Example 9 Let $f : \mathbb{R}^2 \to \mathbb{R}$ be defined by $f(x, y) = x^2 - y^2$. Then $(0, 0)$ is a critical point of $f$ (check this!), but $(0, 0)$ is not a local extremum. In fact we can see that along the line $y = 0$, $(0, 0)$ is a local minimum, while along the line $x = 0$ it is a local maximum. The graph of $f$ looks like a saddle.

Definition 10 A critical point which is not a local extremum is a saddle point.

Warning 11 A saddle point does not need to be a local minimum in some directions and a local maximum in others. For example, according to our definition, $(0, 0)$ is a saddle point of $f(x, y) = x^3$.

In the next section we will learn how to determine when a critical point is a local maximum, minimum, or saddle by using the second derivative.

89 Second derivative test

Deﬁniteness of the second derivative determines extreme behavior at critical points.

In this section, we apply the second derivative to extreme value problems.

Theorem 1 (Second derivative test) Let $f : \mathbb{R}^n \to \mathbb{R}$ be a $C^2$ function. Let $p \in \mathbb{R}^n$ be a critical point of $f$. Then

• If $D^2 f(p)$ is positive definite, $p$ is a local minimum
• If $D^2 f(p)$ is negative definite, $p$ is a local maximum
• If $D^2 f(p)$ is indefinite, then $p$ is a saddle point
• If $D^2 f(p)$ is only positive semidefinite or negative semidefinite, we get no information

Proof By the second order Taylor's theorem, we have
$$f(p + \vec{h}) = f(p) + Df(p)(\vec{h}) + \frac{1}{2} D^2 f(p + \xi\vec{h})(\vec{h}, \vec{h}) \text{ for some } \xi \in [0, 1]$$
Since $p$ is a critical point,
$$f(p + \vec{h}) = f(p) + \frac{1}{2} D^2 f(p + \xi\vec{h})(\vec{h}, \vec{h}) \text{ for some } \xi \in [0, 1]$$

If $D^2 f(p)$ is positive definite, then because $f$ is $C^2$, $D^2 f(p + \xi\vec{h})$ is also positive definite for small enough $\vec{h}$ (this just uses continuity of the second derivative). Thus $D^2 f(p + \xi\vec{h})(\vec{h}, \vec{h}) > 0$ for all small enough nonzero values of $\vec{h}$, say $0 < |\vec{h}| \leq \epsilon$. But this just says that $f(p + \vec{h}) > f(p)$, so $p$ is a local minimum.

If $D^2 f(p)$ is negative definite, a completely analogous proof works.

If $D^2 f(p)$ is indefinite, then there are directions $\vec{h}_1$ where $D^2 f(p)(\vec{h}_1, \vec{h}_1) > 0$, and $\vec{h}_2$ where $D^2 f(p)(\vec{h}_2, \vec{h}_2) < 0$. By continuity again, for $|\vec{h}_i| < \epsilon$ we have $D^2 f(p + \xi\vec{h}_1)(\vec{h}_1, \vec{h}_1) > 0$ and $D^2 f(p + \xi\vec{h}_2)(\vec{h}_2, \vec{h}_2) < 0$. So we have $f(p + \vec{h}_1) > f(p)$ and $f(p + \vec{h}_2) < f(p)$. Thus $p$ is neither a local maximum nor a local minimum, and so is a saddle point.

The method of proof shows why the semidefinite cases might break down. Without strict positivity or negativity, continuity does not guarantee that $D^2 f$ is still positive definite or negative definite for nearby points: you might slip into an indefinite case, for instance.

For a concrete counterexample in the semidefinite case, consider $f(x, y) = x^2 + y^3$. $(0, 0)$ is a critical point, and the Hessian $\begin{bmatrix} 2 & 0 \\ 0 & 0 \end{bmatrix}$ is positive semidefinite, but $(0, 0)$ is not a local minimum, since $f(0, -\epsilon) = -\epsilon^3$ is always less than $f(0, 0)$. The trouble here is really that the Hessian at nearby points $(x, y)$ is $\begin{bmatrix} 2 & 0 \\ 0 & 6y \end{bmatrix}$, which is indefinite for $y < 0$.

We have already proven in the section on bilinear forms that the definiteness of a bilinear form is completely determined by the signs of the eigenvalues of the associated linear operator.

For the following exercises, use whatever means necessary to compute the eigenvalues of the Hessian, use that information to determine the definiteness of the second derivative, and use this to draw extremum information about $f$. I recommend using a computer algebra system like Sage¹ to compute the eigenvalues, but you can also use a free online app like this one².

Question 2 Let f(x, y) = x³ + e^(3y) − 3xe^y.

Solution

Hint:

Df(x, y) = [3x² − 3e^y   3e^(3y) − 3xe^y] = [0  0]

{ 3x² − 3e^y = 0
  3e^(3y) − 3xe^y = 0 }

{ x² = e^y
  e^(3y) = xe^y }

{ x² = e^y
  x = e^(2y) }

{ x⁴ = x
  x = e^(2y) }

To satisfy the first equation, either x = 0 or x = 1. If x = 0, then x = e^(2y) has no solutions, so we must have x = 1. Thus (1, 0) is the only critical point.

What is the critical point of f? Give your answer as a vertical vector.

Solution

Hint: H(x, y) =

[ 6x      −3e^y ]
[ −3e^y   9e^(3y) − 3xe^y ]

Hint: H(1, 0) =

[ 6   −3 ]
[ −3   6 ]

What is the Hessian matrix of f at (1, 0)?

Solution

Hint: By using a computer algebra system, we see that the eigenvalues of the Hessian are 3 and 9.

Hint: So D²f(1, 0) is positive definite.

Hint: Thus (1, 0) is a local minimum.

¹ http://www.sagemath.org/
² http://www.bluebit.gr/matrix-calculator/


(a) (1, 0) is a local maximum
(b) (1, 0) is a local minimum
(c) (1, 0) is a saddle point
(d) The second derivative gives no information in this case

Observe that even though (1, 0) is the only local extremum, and this is a local minimum, (1, 0) is not a global minimum, because f(1, 0) = −1 but f(−10, 0) = −1000 + 1 − 3(−10) = −969. Contemplate this carefully.
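If you prefer not to reach for a computer algebra system, the eigenvalues of a 2×2 symmetric Hessian have a closed form, so the classification can be done in a few lines of plain Python. This is our own sketch, not part of the course code; the helper name is ours:

```python
import math

# Eigenvalues of a symmetric 2x2 matrix [[a, b], [b, c]], in closed form:
# (a + c)/2 +/- sqrt(((a - c)/2)^2 + b^2).
def eigenvalues_2x2(a, b, c):
    mean = (a + c) / 2
    r = math.sqrt(((a - c) / 2) ** 2 + b ** 2)
    return mean - r, mean + r

# Hessian of f(x, y) = x^3 + e^(3y) - 3x e^y at the critical point (1, 0)
lo, hi = eigenvalues_2x2(6, -3, 6)
print(lo, hi)   # 3.0 9.0
if lo > 0:
    print("positive definite: local minimum")
elif hi < 0:
    print("negative definite: local maximum")
elif lo < 0 < hi:
    print("indefinite: saddle point")
else:
    print("semidefinite: the test gives no information")
```

For larger Hessians there is no such simple formula, which is why the exercises suggest software for the 3×3 cases.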

Question 3 Let f : R³ → R be defined by f(x, y, z) = e^(x+y+z) − x − y − z + 4z² + xy. f has a critical point at (0, 0, 0).

Solution

Hint: The Hessian matrix of f at (0, 0, 0) is

[ 1  2  1 ]
[ 2  1  1 ]
[ 1  1  9 ]

Hint: According to computer algebra software, the eigenvalues of this matrix are approximately 9.31662, −1, and 2.68338, so the second derivative is indefinite.

Hint: Thus f has a saddle point at (0, 0, 0).

(a) (0, 0, 0) is a local maximum
(b) (0, 0, 0) is a local minimum
(c) (0, 0, 0) is a saddle point
(d) The second derivative gives no information in this case

Question 5 Let f(x, y) = cos(x + y²) − x². (0, 0) is a critical point of f.

Solution

Hint: The Hessian matrix of f at (0, 0) is

[ −1   0 ]
[  0  −2 ]

which plainly has eigenvalues −1 and −2.

Hint: Thus D²f(0, 0) is negative definite.

Hint: Thus f has a local maximum at (0, 0).

Hint: You can actually see that this is a global maximum, since the largest cos could ever be is 1, and the largest −x² could ever be is 0, so 1 is the largest value that f ever attains, and this is attained at (0, 0). In fact this value is attained at infinitely many points on the line x = 0.

(a) (0, 0) is a local maximum
(b) (0, 0) is a local minimum
(c) (0, 0) is a saddle point
(d) The second derivative gives no information in this case


90 Lagrange multipliers

Lagrange multipliers enable constrained optimization

In the previous section we considered unconstrained optimization problems. Sometimes we want to find extreme values of a function f : Rⁿ → R subject to some constraints.

Example 1 Say we want to maximize f(x, y) = x²y subject to the constraint that x² + y² = 1. In words, we want to know, among all points on the unit circle, which of them has the greatest product of the square of the first coordinate with the second coordinate. One way we can do this is to reduce it to a single variable calculus problem: we can parameterize the unit circle by γ(t) = (cos(t), sin(t)), and try to maximize f ∘ γ : [0, 2π] → R. In other words, we are maximizing cos²(t) sin(t) on the interval [0, 2π].
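The parameterized version is easy to sanity-check by brute force. The sketch below (our own snippet; the helper name h is ours) simply samples many values of t in [0, 2π]:

```python
import math

# Sample cos(t)^2 * sin(t) at many points of [0, 2*pi] and report the max.
def h(t):
    return math.cos(t) ** 2 * math.sin(t)

n = 100000
best_t = max((2 * math.pi * k / n for k in range(n)), key=h)
print(h(best_t))   # about 0.3849, which is 2/(3*sqrt(3))
```

Of course this only works because we had a parameterization of the constraint set in hand.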

Definition 2 Let f : Rⁿ → R be a function, and g : Rⁿ → Rᵐ be another function. A point p is called a local maximum of f with constraint g(x) = 0 if there is an ε > 0 such that if |x − p| < ε and g(x) = 0, then f(x) ≤ f(p).

Give a definition of a constrained local minimum:

Let f : Rⁿ → R be a function, and g : Rⁿ → Rᵐ be another function. A point p is called a local minimum of f with constraint g(x) = 0 if there is an ε > 0 such that if |x − p| < ε and g(x) = 0, then f(x) ≥ f(p).

The method we outlined above is a great way to find constrained local extrema: if you can parameterize g⁻¹({0}) by some function M : U ⊂ Rᵏ → Rⁿ, then you can just try to find the unconstrained extrema of f ∘ M. The problem is that, although finding a parameterization of the circle was easy, more general sets might be harder to parameterize. Some of them might not be parameterizable by just one open set: you might have to use several patches. We will not pursue this method further. Instead we will develop an alternative method, the method of Lagrange multipliers.

Theorem 3 Let f : Rⁿ → R and g : Rⁿ → Rᵐ. Assume g has the m component functions g₁, g₂, ..., g_m : Rⁿ → R. If p is a constrained extremum of f with the constraint g(x) = 0, and Rank(Dg(p)) = m, then there exist λ₁, λ₂, ..., λ_m with

Df(p) = λ₁Dg₁(p) + λ₂Dg₂(p) + ... + λ_m Dg_m(p).

The scalars λᵢ are called Lagrange multipliers.

Proof The full proof of this theorem would require the implicit function theorem¹. We will make one intuitive assumption to get around this.

A vector v should be tangent to the set g⁻¹({0}) at the point p if and only if Dg(p)(v) = 0. This is just because moving "infinitesimally" in the direction of a tangent vector to g⁻¹({0}) should not change the value of g to first order.

Since p is a constrained maximum, moving in one of these tangent directions should not affect the value of f to first order either.

We can summarize these intuitive statements as Null(Dg(p)) ⊂ Null(Df(p)). This is the assumption whose formal proof would require the implicit function theorem.

¹ http://en.wikipedia.org/wiki/Implicit_function_theorem


Given this assumption, the result follows essentially formally from our work with linear algebra:

Null(Dg(p)) ⊂ Null(Df(p))
Null(Df(p))⊥ ⊂ Null(Dg(p))⊥
Image(Df(p)ᵀ) ⊂ Image(Dg(p)ᵀ)

This last line is exactly what we are trying to prove!

Now let's get some practice actually using this theorem as a practical tool. First, some questions with only one constraint equation: g : Rⁿ → R.

Question 4 What is the maximum value of f(x, y) = x²y subject to the constraint x² + y² = 1?

Solution

Hint: At a constrained maximum point (x, y) we would need Df(x, y) = λDg(x, y) for some λ ∈ R.

Hint: What is Df(x, y)? What is Dg(x, y)?

Hint: So we must have

[2xy   x²] = λ[2x   2y]

Hint:

{ 2xy = 2λx
  x² = 2λy }

{ y = λ   (x ≠ 0, for if it were, then y = 0 by the second equation)
  x² = 2λy }

{ y = λ
  x² = 2λ² }

But x² + y² = 1, so we have 3λ² = 1, or λ = ±1/√3.

Hint: This results in only 4 possible extrema, at (±√(2/3), ±1/√3).

Hint: The values of f at these points are ±2/(3√3). So the maximum value of f on the unit circle is 2/(3√3) and the minimum value is −2/(3√3).

The maximum value of f(x, y) = x²y subject to the constraint x² + y² = 1 is 2/(3√3).

Question 6 Let f : R³ → R be defined by f(x, y, z) = x² + y² + z², subject to the constraint that g(x, y, z) = xyz − 1 = 0.

Solution

Hint: At a constrained extremum (x, y, z) we would need Df(x, y, z) = λDg(x, y, z) for some λ ∈ R.

Hint: What is Df(x, y, z)? What is Dg(x, y, z)?

Hint: So we must have

[2x   2y   2z] = λ[yz   xz   xy]

Hint:

{ 2x = λyz
  2y = λxz
  2z = λxy }

Hint: Multiplying all of these equations together, we have 8xyz = λ³(xyz)². Since xyz = 1, we have λ³ = 8, so λ = 2.

Hint:

{ x = yz
  y = xz
  z = xy }

{ x = xz²
  y = yx²
  z = zy² }

Hint: So x = ±1, y = ±1, z = ±1. So the only possible locations of constrained extrema are (1, 1, 1), (1, −1, −1), (−1, 1, −1), (−1, −1, 1). At each of these points f(x, y, z) = 3. These are all local minima. f has no local or global maxima.

The minimum value of f subject to this constraint is 3.
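As a quick numerical sanity check (our own snippet, not part of the course text), we can eliminate the constraint by solving xyz = 1 for z and scanning a grid of positive x and y:

```python
# On the surface xyz = 1 we can write z = 1/(x*y) and scan
# f = x^2 + y^2 + z^2 over a grid; the other sign branches give the
# same minimum by symmetry.
def f_on_surface(x, y):
    z = 1.0 / (x * y)
    return x * x + y * y + z * z

best = min(f_on_surface(0.01 * i, 0.01 * j)
           for i in range(1, 300) for j in range(1, 300))
print(best)   # about 3.0, attained near x = y = z = 1
```

The scan also makes it believable that f has no constrained maximum: pushing x toward 0 or toward large values sends f to infinity along the surface.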

Here is a question with two constraint equations: g : Rⁿ → R².

Question 8 What is the maximum value of f(x, y, z) = x − y subject to the two constraints x² + y² + z² = 1 and x + y + z = 1?

Hint: Here g(x, y, z) = (x² + y² + z² − 1, x + y + z − 1).

Hint: We need

[1   −1   0] = λ₁[2x   2y   2z] + λ₂[1   1   1]

Hint:

{ 2λ₁x + λ₂ = 1
  2λ₁y + λ₂ = −1
  2λ₁z + λ₂ = 0 }

Adding these all together and using that x + y + z = 1, we have 2λ₁ + 3λ₂ = 0. So the last equation becomes −3λ₂z + λ₂ = 0, or λ₂(1 − 3z) = 0. λ₂ ≠ 0, for otherwise λ₁ = 0, which leads to a contradiction.

So we know that z = 1/3.

Hint: There are only two points satisfying both constraints and with z = 1/3.

Hint: Solving the system of two equations

{ x² + y² + (1/3)² = 1
  x + y + 1/3 = 1 }

we obtain that the only two points which work are ((1 − √3)/3, (1 + √3)/3, 1/3) and ((1 + √3)/3, (1 − √3)/3, 1/3).

Hint: So the maximum value of x − y is 2√3/3.

The maximum value of f(x, y, z) = x − y subject to the two constraints that x² + y² + z² = 1 and x + y + z = 1 is 2√3/3.
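A numerical sanity check is easy here, because the constraint set is the circle in which the plane x + y + z = 1 cuts the unit sphere: that circle has center (1/3, 1/3, 1/3) and radius √(2/3). The sketch below is our own code; u and w are an orthonormal basis for the directions lying in the plane:

```python
import math

# The constraint circle: center (1/3, 1/3, 1/3), radius sqrt(2/3),
# spanned by two orthonormal directions inside the plane x + y + z = 1.
c = (1/3, 1/3, 1/3)
u = (1/math.sqrt(2), -1/math.sqrt(2), 0.0)
w = (1/math.sqrt(6), 1/math.sqrt(6), -2/math.sqrt(6))
r = math.sqrt(2/3)

def point(t):
    return tuple(c[i] + r * (math.cos(t) * u[i] + math.sin(t) * w[i])
                 for i in range(3))

n = 100000
best = max(p[0] - p[1] for p in (point(2 * math.pi * k / n) for k in range(n)))
print(best)   # about 1.1547, which is 2*sqrt(3)/3
```

This is exactly the parameterization strategy from the start of the section; it happens to be available again because the constraint set is a circle.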


91 Hill climbing in Python

Hill climbing is a computational technique for finding a local maximum.

Let's use hill climbing to attempt (at least numerically) to find a local maximum. The idea is the following: the gradient points uphill, so if I want to find a local maximum, I should start somewhere and follow the gradient up. Hopefully I'll find a point where the gradient vanishes (i.e., a critical point).

Question 1 Here's the procedure that I'd like you to code in Python:

(a) Start with some point p.
(b) Replace p with p plus a small multiple of ∇f(p).
(c) If ∇f(p) is very small, stop!
(d) Otherwise, repeat.

Solution

Hint: You might want to use some code to add and scale vectors, like

def add_vector(v, w):
    return [a + b for a, b in zip(v, w)]

def scale_vector(c, v):
    return [c * x for x in v]

def vector_length(v):
    return sum([x**2 for x in v])**0.5

Hint: You may also want some code to compute the gradient numerically.

epsilon = 0.01

def gradient(f, p):
    n = len(p)
    ei = lambda i: [0]*i + [epsilon] + [0]*(n-i-1)
    return [ (f(add_vector(p, ei(i))) - f(p)) / epsilon for i in range(n) ]

Hint: To do the hill climbing, we can put together these pieces.

def climb_hill(f, starting_point):
    p = starting_point
    nabla = gradient(f, p)
    while vector_length(nabla) > epsilon:
        p = add_vector(p, scale_vector(epsilon, nabla))
        nabla = gradient(f, p)
    return p

Hint: Incidentally, be careful with your choice of epsilon in this problem; if it is too small, the Python code might take too long to run!


Python

def climb_hill(f, starting_point):
    p = starting_point
    # while gradient of f is pretty big at p
    #     p = p + multiple of gradient f
    # return p

# here's an example to try
p = [3,6,2]
f = lambda x: 10 - (x[0] + x[1])**2 - (x[0] - 3)**2 - (x[2] - 4)**2
print(climb_hill(f, p))

def validator():
    f = lambda x: 10 - (x[0] - 2)**4 - (x[1] - 3)**2 - (x[2] - 4)**2
    p = climb_hill(f, [3,6,2])
    return abs(p[0] - 2) < 0.5 and abs(p[1] - 3) < 0.5 and abs(p[2] - 4) < 0.5

So you can use your program to find the maximum value of the function f : R³ → R given by

f(x, y, z) = 10 − (x + y)² − (x − 3)² − (z − 4)².

Solution In this case, x is 3.

Solution And y is −3.

Solution And z is 4.

Fantastic!


92 Multilinear forms

Multilinear forms are separately linear in multiple vector variables.

Definition 1 Let X be a set. We introduce the notation Xᵏ to stand for the set of all ordered k-tuples of elements of X. In other words, Xᵏ = X × X × ⋯ × X, with k "factors" of X.

For example, if X = {cat, dog}, then X³ is a set with eight elements, consisting of 3-tuples of either cat or dog. For example, (cat, cat, dog) ∈ X³.

Definition 2 A k-linear form on a vector space V is a function T : Vᵏ → R which is linear in each vector variable. In other words, given (k − 1) vectors v₁, v₂, ..., v_(i−1), v_(i+1), ..., v_k, the map Tᵢ : V → R defined by

Tᵢ(v) = T(v₁, v₂, ..., v_(i−1), v, v_(i+1), ..., v_k)

is linear.

The k-linear forms on V form a vector space.

Question 3 Let T : R² × R² × R² → R be a trilinear form on R². Suppose we know that

• T((1, 0), (1, 0), (1, 0)) = 1
• T((1, 0), (1, 0), (0, 1)) = 2
• T((1, 0), (0, 1), (1, 0)) = 3
• T((1, 0), (0, 1), (0, 1)) = 4
• T((0, 1), (1, 0), (1, 0)) = 5
• T((0, 1), (1, 0), (0, 1)) = 6
• T((0, 1), (0, 1), (1, 0)) = 7
• T((0, 1), (0, 1), (0, 1)) = 8

Solution

Hint:

T((1, 1), (1, 2), (1, 0))
  = T((1, 0), (1, 2), (1, 0)) + T((0, 1), (1, 2), (1, 0))
  = T((1, 0), (1, 0), (1, 0)) + 2T((1, 0), (0, 1), (1, 0)) + T((0, 1), (1, 0), (1, 0)) + 2T((0, 1), (0, 1), (1, 0))
  = 1 + 2(3) + 5 + 2(7)
  = 26


T((1, 1), (1, 2), (1, 0)) = 26
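This bookkeeping is easy to mechanize: store the value of T on each tuple of basis vectors and expand by linearity in every slot. A small sketch of our own, with index 0 standing for e₁ = (1, 0) and index 1 for e₂ = (0, 1):

```python
from itertools import product

# Values of T on all 2^3 tuples of basis vectors, in the order listed above.
values = {
    (0, 0, 0): 1, (0, 0, 1): 2, (0, 1, 0): 3, (0, 1, 1): 4,
    (1, 0, 0): 5, (1, 0, 1): 6, (1, 1, 0): 7, (1, 1, 1): 8,
}

def T(u, v, w):
    # Expand T(u, v, w) by linearity: the coefficient of the basis tuple
    # (e_i, e_j, e_k) is u[i] * v[j] * w[k].
    return sum(u[i] * v[j] * w[k] * values[(i, j, k)]
               for i, j, k in product((0, 1), repeat=3))

print(T((1, 1), (1, 2), (1, 0)))   # 26
```

The sum over `product((0, 1), repeat=3)` is exactly the expansion over all 2³ tuples of basis vectors that the hint carries out by hand.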

From the last example, and by analogy with the bilinear case, it is clear that if you know the value of a k-linear form on all k-tuples of basis vectors of V (there are (dim V)ᵏ of them), then you can find the value of T on any k-tuple of vectors.

Definition 4 Let T : V^(k₁) → R and S : V^(k₂) → R be multilinear forms. Then we define their tensor product T ⊗ S : V^(k₁+k₂) → R by multiplication:

(T ⊗ S)(v₁, v₂, ..., v_(k₁+k₂)) = T(v₁, v₂, ..., v_(k₁)) S(v_(k₁+1), v_(k₁+2), ..., v_(k₁+k₂)).

Theorem 5 The k-linear forms dx_(i₁) ⊗ dx_(i₂) ⊗ ⋯ ⊗ dx_(i_k), where 1 ≤ i_j ≤ n, form a basis for the space of all multilinear k-forms on Rⁿ. In fact,

T = Σ T(e_(i₁), e_(i₂), ..., e_(i_k)) dx_(i₁) ⊗ dx_(i₂) ⊗ ⋯ ⊗ dx_(i_k),

where the sum ranges over all nᵏ k-tuples of basis vectors.

The proof is as straightforward as the corresponding proof for bilinear forms, but the notation is something awful.

Question 6 Evaluate (dx₁ ⊗ dx₂ ⊗ dx₂)((1, 2), (−2, 4), (5, 6)).

Solution

Hint: dx₁((1, 2)) = 1.

Hint: dx₂((−2, 4)) = 4.

Hint: dx₂((5, 6)) = 6.

Hint: So putting this all together, we have

(dx₁ ⊗ dx₂ ⊗ dx₂)((1, 2), (−2, 4), (5, 6)) = 1 · 4 · 6 = 24

(dx₁ ⊗ dx₂ ⊗ dx₂)((1, 2), (−2, 4), (5, 6)) = 24.

Question 7 Let T = dx₁ ⊗ dx₁ ⊗ dx₁ + 4dx₂ ⊗ dx₂ ⊗ dx₁ be a trilinear form on R². Let v = (x, y).

Solution

Hint:

T(v, v, v) = (dx₁ ⊗ dx₁ ⊗ dx₁)((x, y), (x, y), (x, y)) + 4(dx₂ ⊗ dx₂ ⊗ dx₁)((x, y), (x, y), (x, y))
  = x · x · x + 4 · y · y · x
  = x³ + 4y²x

As a function of x and y, T(v, v, v) = x³ + 4y²x.

As this example shows, applying a trilinear form to the same vector three times gives a polynomial.

Solution

Hint: The monomial x³ has degree three.

Hint: The monomial 4y²x also has degree three.

Hint: So the total degree of each monomial is three.

The total degree of each monomial is 3.

What we are seeing is a special case of the following result.

Theorem 8 Applying a k-linear form to the same vector k times gives a homogeneous polynomial of degree k.


93 Symmetry

Various sorts of symmetry are possible for multilinear forms.

Definition 1 A k-linear form F is symmetric if

F(v₁, v₂, ..., v_k) = F(v_(i₁), v_(i₂), ..., v_(i_k)),

whenever (i₁, i₂, ..., i_k) is a rearrangement of (1, 2, ..., k).

Question 2 Let B : R² × R² → R be the bilinear form

B = dx₁ ⊗ dx₂ + dx₂ ⊗ dx₁.

Solution

Hint: dx₁((1, 2)) = 1.

Hint: dx₂((1, 2)) = 2.

Hint: dx₁((3, 4)) = 3.

Hint: dx₂((3, 4)) = 4.

Hint: (dx₁ ⊗ dx₂)((1, 2), (3, 4)) = 1 · 4.

Hint: (dx₂ ⊗ dx₁)((1, 2), (3, 4)) = 2 · 3.

Hint: B((1, 2), (3, 4)) = 1 · 4 + 2 · 3 = 4 + 6 = 10.

B((1, 2), (3, 4)) = 10.

Solution

Hint: B((3, 4), (1, 2)) = 3 · 2 + 4 · 1 = 6 + 4 = 10.

B((3, 4), (1, 2)) = 10.

Is the bilinear form B symmetric?

Solution

(a) Yes.
(b) No.

Now let's consider trilinear forms.

Let T : R² × R² × R² → R be the trilinear form

T = dx₁ ⊗ dx₁ ⊗ dx₂ + dx₁ ⊗ dx₂ ⊗ dx₁

Is the trilinear form T symmetric?

Solution

(a) Yes.
(b) No.

For example, compare

T((1, 0), (0, 1), (1, 0))

to the value of

T((0, 1), (1, 0), (1, 0)).

Can you cook up some examples of symmetric trilinear forms? Sure! Here is an example:

T = dx₁ ⊗ dx₁ ⊗ dx₂ + dx₁ ⊗ dx₂ ⊗ dx₁ + dx₂ ⊗ dx₁ ⊗ dx₁.
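Since a k-linear form is determined by its values on tuples of basis vectors, symmetry only needs to be checked there. A sketch of our own in Python, encoding forms on R² as dictionaries that map basis-index tuples to the coefficient of the corresponding tensor product of dx's:

```python
from itertools import permutations, product

def is_symmetric(T, k):
    # T is symmetric iff its value on every k-tuple of basis vectors is
    # unchanged by rearranging the slots.
    for idx in product((0, 1), repeat=k):
        for perm in permutations(idx):
            if T.get(idx, 0) != T.get(perm, 0):
                return False
    return True

# B = dx1 (x) dx2 + dx2 (x) dx1, the symmetric bilinear form above, and
# T = dx1 (x) dx1 (x) dx2 + dx1 (x) dx2 (x) dx1, which is missing a term.
B = {(0, 1): 1, (1, 0): 1}
T = {(0, 0, 1): 1, (0, 1, 0): 1}
print(is_symmetric(B, 2))   # True
print(is_symmetric(T, 3))   # False
```

For T, the check fails on the basis tuple (e₁, e₁, e₂) versus its rearrangement (e₂, e₁, e₁), which is exactly the comparison suggested above.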


94 Higher order derivatives

Higher derivatives of a function are multilinear maps.

The (k + 1)st order derivative of a function f : Rⁿ → R at a point p is a (k + 1)-linear form D^(k+1)f(p), which allows us to approximate changes in the kth order derivative. This approximation works as follows.

Definition 1

D^k f(p + v_(k+1))(v₁, v₂, ..., v_k) = D^k f(p)(v₁, v₂, ..., v_k) + D^(k+1) f(p)(v₁, v₂, ..., v_k, v_(k+1)) + Error(p)(v₁, v₂, ..., v_k, v_(k+1)),

where

lim_(v₁, v₂, ..., v_(k+1) → 0) Error(p)(v₁, v₂, ..., v_k, v_(k+1)) / (|v₁| |v₂| ⋯ |v_(k+1)|) = 0.

Theorem 2

D^k f(p) = Σ ∂^k f/(∂x_(i₁) ∂x_(i₂) ⋯ ∂x_(i_k)) dx_(i₁) ⊗ dx_(i₂) ⊗ ⋯ ⊗ dx_(i_k),

where the sum ranges over all k-tuples of basis covectors.

Question 3 f : R² → R is defined by f(x, y) = x²y.

Solution

Hint: The only terms which are not zero are the terms involving 2 partial derivatives with respect to x and 1 partial derivative with respect to y.

Hint: So D³f = ∂³f/(∂x∂x∂y) dx ⊗ dx ⊗ dy + ∂³f/(∂x∂y∂x) dx ⊗ dy ⊗ dx + ∂³f/(∂y∂x∂x) dy ⊗ dx ⊗ dx

Hint: So D³f(0, 0) = 2dx ⊗ dx ⊗ dy + 2dx ⊗ dy ⊗ dx + 2dy ⊗ dx ⊗ dx

Hint: So D³f(0, 0)((1, 2), (3, 4), (0, 1)) = 2(1 · 3 · 1) + 2(1 · 4 · 0) + 2(2 · 3 · 0) = 6

D³f(0, 0)((1, 2), (3, 4), (0, 1)) = 6
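A value like this can be sanity-checked numerically: D³f(p)(u, v, w) is the mixed partial ∂³/∂t₁∂t₂∂t₃ of f(p + t₁u + t₂v + t₃w) at 0, which a mixed forward difference approximates (and, since f is a cubic polynomial here, computes exactly up to rounding). The helper below is our own sketch:

```python
def f(x, y):
    return x * x * y

def third_derivative(f, p, u, v, w, h=1e-2):
    # Mixed forward difference of g(t1, t2, t3) = f(p + t1*u + t2*v + t3*w),
    # divided by h^3, approximates the mixed partial d^3 g / dt1 dt2 dt3 at 0.
    def at(a, b, c):
        x = p[0] + (a * u[0] + b * v[0] + c * w[0]) * h
        y = p[1] + (a * u[1] + b * v[1] + c * w[1]) * h
        return f(x, y)
    return (at(1, 1, 1) - at(1, 1, 0) - at(1, 0, 1) - at(0, 1, 1)
            + at(1, 0, 0) + at(0, 1, 0) + at(0, 0, 1) - at(0, 0, 0)) / h ** 3

print(third_derivative(f, (0, 0), (1, 2), (3, 4), (0, 1)))   # about 6.0
```

The eight-term alternating sum is the three-variable analogue of the familiar one-variable difference quotient f(x + h) − f(x).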

Question 4 Assume D²f(p) = 3dx₁ ⊗ dx₂ + 3dx₂ ⊗ dx₁. In other words, the matrix of D²f(p) is

[ 0  3 ]
[ 3  0 ]

Assume D³f(p) = dx₁ ⊗ dx₁ ⊗ dx₁.

Solution

Hint: By the definition of higher order derivatives, we have

D²f(p + (0.3, 0.2))(v₁, v₂) ≈ D²f(p)(v₁, v₂) + D³f(p)((0.3, 0.2), v₁, v₂)

Hint: So

D²f(p + (0.3, 0.2))(v₁, v₂) ≈ 3dx₁ ⊗ dx₂(v₁, v₂) + 3dx₂ ⊗ dx₁(v₁, v₂) + dx₁ ⊗ dx₁ ⊗ dx₁((0.3, 0.2), v₁, v₂)
  = 3dx₁ ⊗ dx₂(v₁, v₂) + 3dx₂ ⊗ dx₁(v₁, v₂) + 0.3 dx₁ ⊗ dx₁(v₁, v₂)

Hint: The matrix of this bilinear form is

[ 0.3  3 ]
[ 3    0 ]

The matrix of the bilinear form D²f(p + (0.3, 0.2)) is approximately

[ 0.3  3 ]
[ 3    0 ]


95 Symmetry

In many nice situations, higher-order derivatives are symmetric.

Recall that we once saw the following theorem.

Theorem 1 Let f : Rⁿ → R be a differentiable function. Assume that the partial derivatives f_(xᵢ) : Rⁿ → R are all differentiable, and the second partial derivatives f_(xᵢ,xⱼ) are continuous. Then f_(xᵢ,xⱼ) = f_(xⱼ,xᵢ).

After interpreting the "second derivative" as a bilinear form, we were then able to say something nicer (though the hypothesis is stronger, so this is a weaker theorem).

Theorem 2 Let f : Rⁿ → R be a continuously twice differentiable function; then the bilinear form representing the second derivative is symmetric.

And finally, we are in a position to formulate the higher-order version of this theorem.

Theorem 3 Let f : Rⁿ → R be a continuously k-times differentiable function; then the k-linear form representing the k-th order derivative is a symmetric form.


96 Taylor’s theorem

Higher order derivatives give rise to higher order polynomial approximations.

Here is the statement of Taylor's theorem for many variables.

Theorem 1 Let f : Rⁿ → R be a (k + 1)-times differentiable function. Then

f(p + h) = f(p) + Df(p)(h) + (1/2!) D²f(p)(h, h) + (1/3!) D³f(p)(h, h, h) + ⋯ + (1/k!) D^k f(p)(h^k) + (1/(k + 1)!) D^(k+1) f(p + ξh)(h^(k+1))

for some ξ ∈ [0, 1], where we have abbreviated the ordered tuple of i copies of h as h^i.

Let's apply this to a specific function.

Question 2 Let f : R² → R be defined by f(x, y) = e^(x+y).

Solution

Hint: The second order Taylor approximation is 1 + (x + y) + (x + y)²/2.

Hint: Every partial derivative of this function is e^(x+y), so all of the third partial derivatives at (0, 0) are 1.

Hint: So the third derivative is the sum of all of the following terms:

• dx ⊗ dx ⊗ dx
• dx ⊗ dx ⊗ dy
• dx ⊗ dy ⊗ dx
• dx ⊗ dy ⊗ dy
• dy ⊗ dx ⊗ dx
• dy ⊗ dx ⊗ dy
• dy ⊗ dy ⊗ dx
• dy ⊗ dy ⊗ dy

Hint: Applying this tensor to ((x, y), (x, y), (x, y)) we get xxx + xxy + xyx + xyy + yxx + yxy + yyx + yyy = (x + y)³.

Hint: So the third order Taylor expansion is 1 + (x + y) + (x + y)²/2 + (x + y)³/6.

The third order Taylor series of f about the point (0, 0) is 1 + (x + y) + (x + y)²/2 + (x + y)³/6.
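A quick numerical check of this expansion (our own snippet): near (0, 0) the degree-three polynomial should track e^(x+y) closely, with the discrepancy coming only from fourth and higher order terms.

```python
import math

# Compare f(x, y) = e^(x+y) with its third order Taylor polynomial at (0, 0).
def taylor3(x, y):
    s = x + y
    return 1 + s + s ** 2 / 2 + s ** 3 / 6

x, y = 0.1, 0.2
print(math.exp(x + y))   # 1.3498...
print(taylor3(x, y))     # 1.3495, off only in the fourth order terms
```

At h = (0.1, 0.2) the error is on the order of (x + y)⁴/4! ≈ 0.0003, as the remainder term in the theorem predicts.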


97 Python

There are numerical examples of higher-order Taylor series.

In this exercise, given a function f, we compute a higher-order Taylor series for f numerically.

Question 1 Suppose f is a Python function with two real inputs, perhaps

def f(x,y):
    return 2.71828182845904**(x+y)

We have a couple of "numerical differentiation" functions

epsilon = 0.001

def partial_x(f):
    return lambda x,y: (f(x+epsilon,y) - f(x,y))/epsilon

def partial_y(f):
    return lambda x,y: (f(x,y+epsilon) - f(x,y))/epsilon

We can build a linear approximation function.

def linear_approximation(f):
    return lambda x,y: f(0,0) + x*partial_x(f)(0,0) + y*partial_y(f)(0,0)

It is now your task to build a "quadratic approximation" function.

Solution

Hint: We need only write down the second order Taylor series.

def quadratic_approximation(f):
    return lambda x, y: (f(0,0)
        + x*partial_x(f)(0,0) + y*partial_y(f)(0,0)
        + x*x*partial_x(partial_x(f))(0,0)/2
        + y*y*partial_y(partial_y(f))(0,0)/2
        + x*y*partial_x(partial_y(f))(0,0))

Python

epsilon = 0.001
def partial_x(f):
    return lambda x,y: (f(x+epsilon,y) - f(x,y))/epsilon
def partial_y(f):
    return lambda x,y: (f(x,y+epsilon) - f(x,y))/epsilon
def quadratic_approximation(f):
    return lambda x,y: # the second order Taylor series approximation
#
# here's an example to try
f = lambda x,y: 2.71828182845904**(x+y)
print(quadratic_approximation(f)(0.1,0.2)) # should be about exp(0.3), which is about 1.35

def validator():
    f = lambda x,y: x*y
    if abs(quadratic_approximation(f)(2,3) - 6) > 0.1:
        return False
    f = lambda x,y: x*x
    if abs(quadratic_approximation(f)(2,3) - 4) > 0.1:
        return False
    f = lambda x,y: y*y
    if abs(quadratic_approximation(f)(2,3) - 9) > 0.1:
        return False
    f = lambda x,y: x
    if abs(quadratic_approximation(f)(2,3) - 2) > 0.1:
        return False
    f = lambda x,y: y
    if abs(quadratic_approximation(f)(2,3) - 3) > 0.1:
        return False
    return True

If you like this, you could build a version of this that produces a third order approximation.
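For instance, a third order version might look like the following sketch of ours, which repeats the exercise's partial_x and partial_y so that it is self-contained, and relies on the symmetry of mixed partials so that only four distinct third partials are needed. The coefficients 3 come from the multinomial expansion; treat this as an illustration rather than a reference solution.

```python
epsilon = 0.001

def partial_x(f):
    return lambda x, y: (f(x + epsilon, y) - f(x, y)) / epsilon

def partial_y(f):
    return lambda x, y: (f(x, y + epsilon) - f(x, y)) / epsilon

def cubic_approximation(f):
    # First and second numerical partials, then the four distinct third
    # partials (by symmetry, e.g. f_xyx = f_xxy).
    fx, fy = partial_x(f), partial_y(f)
    fxx, fxy, fyy = partial_x(fx), partial_x(fy), partial_y(fy)
    fxxx, fxxy = partial_x(fxx), partial_x(fxy)
    fxyy, fyyy = partial_x(fyy), partial_y(fyy)
    return lambda x, y: (f(0, 0) + x * fx(0, 0) + y * fy(0, 0)
        + (x * x * fxx(0, 0) + 2 * x * y * fxy(0, 0) + y * y * fyy(0, 0)) / 2
        + (x ** 3 * fxxx(0, 0) + 3 * x * x * y * fxxy(0, 0)
           + 3 * x * y * y * fxyy(0, 0) + y ** 3 * fyyy(0, 0)) / 6)

f = lambda x, y: 2.71828182845904 ** (x + y)
print(cubic_approximation(f)(0.1, 0.2))   # close to exp(0.3), about 1.35
```

Be warned that nesting forward differences three deep amplifies rounding error, so epsilon cannot be made too small here, just as in the hill climbing exercise.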


98 Denouement

Farewell!

That's it! You have reached the end of this course.

If you want a high level overview of essentially everything we did in this course, we recommend that you read this set of lecture notes¹.

It has been a joy working with all of you and talking to you on the forums. I am grateful for the many people who submitted pull requests to fix errors in these notes. Keep an eye on this space: we will hopefully be improving this course by adding videos, questions, a peer review system for free response answers, and some interactive 3-D graphics! We also are planning a follow-up course on multivariable integral calculus using differential forms².

¹ http://math.caltech.edu/~ma108a/notesderivatice.pdf
² http://en.wikipedia.org/wiki/Differential_form
