
Chapter 6

Differential Calculus

In this chapter, it is assumed that all linear spaces and flat spaces under
consideration are finite-dimensional.
61 Differentiation of Processes

Let ℰ be a flat space with translation space 𝒱. A mapping p : I → ℰ from
some interval I ⊂ ℝ to ℰ will be called a process. It is useful to think
of the value p(t) ∈ ℰ as describing the state of some physical system at time
t. In special cases, the mapping p describes the motion of a particle and
p(t) is the place of the particle at time t. The concept of differentiability for
real-valued functions (see Sect.08) extends without difficulty to processes as
follows:
Definition 1: The process p : I → ℰ is said to be differentiable at
t ∈ I if the limit

    ∂_t p := lim_{s→0} (1/s)(p(t + s) − p(t))                          (61.1)

exists. Its value ∂_t p ∈ 𝒱 is then called the derivative of p at t. We say
that p is differentiable if it is differentiable at all t ∈ I. In that case, the
mapping ∂p : I → 𝒱 defined by (∂p)(t) := ∂_t p for all t ∈ I is called the
derivative of p.

Given n ∈ ℕ^×, we say that p is n times differentiable if ∂^n p : I → 𝒱
can be defined by the recursion

    ∂^1 p := ∂p,   ∂^{k+1} p := ∂(∂^k p)  for all k ∈ (n − 1)^].       (61.2)

We say that p is of class C^n if it is n times differentiable and ∂^n p is
continuous. We say that p is of class C^∞ if it is of class C^n for all n ∈ ℕ^×.
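The limit (61.1) can be explored numerically. The sketch below is not part of the text: it approximates ∂_t p for a concrete process with values in ℝ² (identified with its translation space) and compares the difference quotient with the known derivative; the particular process and the step sizes are my own choices for illustration only.

```python
import math

# Illustrative process p : I -> E with E = R^2: uniform circular motion.
def p(t):
    return (math.cos(t), math.sin(t))

def difference_quotient(p, t, s):
    """(1/s)(p(t+s) - p(t)), the quantity whose limit defines d_t p in (61.1)."""
    a, b = p(t + s), p(t)
    return ((a[0] - b[0]) / s, (a[1] - b[1]) / s)

t = 0.7
exact = (-math.sin(t), math.cos(t))   # the derivative of this particular process
for s in (1e-1, 1e-2, 1e-3, 1e-4):
    dq = difference_quotient(p, t, s)
    err = max(abs(dq[0] - exact[0]), abs(dq[1] - exact[1]))
    print(f"s = {s:.0e}   quotient = ({dq[0]:+.6f}, {dq[1]:+.6f})   error = {err:.2e}")
# The error shrinks roughly in proportion to s, as expected for a smooth process.
```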
As for a real-valued function, it is easily seen that a process p is continuous
at t ∈ Dom p if it is differentiable at t. Hence p is continuous if it is
differentiable, but it may also be continuous without being differentiable.

In analogy to (08.34) and (08.35), we also use the notation

    p^{(k)} := ∂^k p  for all k ∈ n^]                                   (61.3)

when p is an n-times differentiable process, and we use

    p˙ := p^{(1)} = ∂p,   p¨ := p^{(2)} = ∂²p,   p⃛ := p^{(3)} = ∂³p,   (61.4)

if meaningful.

We use the term process also for a mapping p : I → 𝒟 from some
interval I into a subset 𝒟 of the flat space ℰ. In that case, we use poetic
license and ascribe to p any of the properties defined above if p|^ℰ has that
property. Also, we write ∂p instead of ∂(p|^ℰ) if p is differentiable, etc. (If 𝒟
is included in some flat ℱ, then one can take the direction space of ℱ rather
than all of 𝒱 as the codomain of ∂p. This ambiguity will usually not cause
any difficulty.)
The following facts are immediate consequences of Def.1, of Prop.5 of
Sect.56, and of Prop.6 of Sect.57.

Proposition 1: The process p : I → ℰ is differentiable at t ∈ I if and
only if, for each λ in some basis of 𝒱*, the function λ(p − p(t)) : I → ℝ is
differentiable at t.

The process p is differentiable if and only if, for every flat function
a ∈ Flf ℰ, the function a ∘ p is differentiable.
Proposition 2: Let ℰ, ℰ′ be flat spaces and α : ℰ → ℰ′ a flat mapping.
If p : I → ℰ is a process that is differentiable at t ∈ I, then α ∘ p : I → ℰ′
is also differentiable at t ∈ I and

    ∂_t(α ∘ p) = (∇α)(∂_t p).                                           (61.5)

If p is differentiable then (61.5) holds for all t ∈ I and we get

    ∂(α ∘ p) = (∇α)∂p.                                                  (61.6)
Let p : I → ℰ and q : I → ℰ′ be processes having the same domain I.
Then (p, q) : I → ℰ × ℰ′, defined by value-wise pair formation (see (04.13)),
is another process. It is easily seen that p and q are both differentiable at
t ∈ I if and only if (p, q) is differentiable at t. If this is the case we have

    ∂_t(p, q) = (∂_t p, ∂_t q).                                         (61.7)
Both p and q are differentiable if and only if (p, q) is, and, in that case,

    ∂(p, q) = (∂p, ∂q).                                                 (61.8)

Let p and q be processes having the same domain I and the same
codomain ℰ. Since the point-difference (x, y) ↦ x − y is a flat mapping
from ℰ × ℰ into 𝒱 whose gradient is the vector-difference (u, v) ↦ u − v
from 𝒱 × 𝒱 into 𝒱, we can apply Prop.2 to obtain

Proposition 3: If p : I → ℰ and q : I → ℰ are both differentiable at
t ∈ I, so is the value-wise difference p − q : I → 𝒱, and

    ∂_t(p − q) = (∂_t p) − (∂_t q).                                     (61.9)

If p and q are both differentiable, then (61.9) holds for all t ∈ I and we
get

    ∂(p − q) = ∂p − ∂q.                                                 (61.10)
The following result generalizes the Difference-Quotient Theorem stated
in Sect.08.

Difference-Quotient Theorem: Let p : I → ℰ be a process and let
t₁, t₂ ∈ I with t₁ < t₂. If p|_{[t₁,t₂]} is continuous and if p is differentiable at
each t ∈ ]t₁, t₂[, then

    (p(t₂) − p(t₁))/(t₂ − t₁) ∈ Clo Cxh {∂_t p | t ∈ ]t₁, t₂[}.          (61.11)
Proof: Let a ∈ Flf ℰ be given. Then (a ∘ p)|_{[t₁,t₂]} is continuous and,
by Prop.1, a ∘ p is differentiable at each t ∈ ]t₁, t₂[. By the elementary
Difference-Quotient Theorem (see Sect.08) we have

    ((a ∘ p)(t₂) − (a ∘ p)(t₁))/(t₂ − t₁) ∈ Clo Cxh {∂_t(a ∘ p) | t ∈ ]t₁, t₂[}.

Using (61.5) and (33.4), we obtain

    ∇a((p(t₂) − p(t₁))/(t₂ − t₁)) ∈ Clo Cxh (∇a)_>(𝒮),                   (61.12)

where

    𝒮 := {∂_t p | t ∈ ]t₁, t₂[}.

Since (61.12) holds for all a ∈ Flf ℰ, we can conclude that
b((p(t₂) − p(t₁))/(t₂ − t₁)) ≥ 0 holds for all those b ∈ Flf 𝒱 that satisfy
b_>(𝒮) ⊂ ℙ. Using the Half-Space Intersection Theorem of Sect.54, we obtain
the desired result (61.11).
Notes 61

(1) See Note (8) to Sect.08 concerning notations such as ∂_t p, ∂p, ∂^n p, p˙, p^{(n)}, etc.
62 Small and Confined Mappings

Let 𝒱 and 𝒱′ be linear spaces of strictly positive dimension. Consider a
mapping n from a neighborhood of zero in 𝒱 to a neighborhood of zero in
𝒱′. If n(0) = 0 and if n is continuous at 0, then we can say, intuitively, that
n(v) approaches 0 ∈ 𝒱′ as v approaches 0 ∈ 𝒱. We wish to make precise
the idea that n is small near 0 ∈ 𝒱 in the sense that n(v) approaches 0 ∈ 𝒱′
faster than v approaches 0 ∈ 𝒱.
Definition 1: We say that a mapping n from a neighborhood of 0 in 𝒱
to a neighborhood of 0 in 𝒱′ is small near 0 if n(0) = 0 and, for all norms
ν and ν′ on 𝒱 and 𝒱′, respectively, we have

    lim_{u→0} ν′(n(u))/ν(u) = 0.                                         (62.1)

The set of all such small mappings will be denoted by Small(𝒱, 𝒱′).
Proposition 1: Let n be a mapping from a neighborhood of 0 in 𝒱 to a
neighborhood of 0 in 𝒱′. Then the following conditions are equivalent:

(i) n ∈ Small(𝒱, 𝒱′).

(ii) n(0) = 0 and the limit-relation (62.1) holds for some norm ν on 𝒱
and some norm ν′ on 𝒱′.

(iii) For every bounded subset 𝒮 of 𝒱 and every 𝒩′ ∈ Nhd₀(𝒱′) there is a
δ ∈ ℙ^× such that

    n(sv) ∈ s𝒩′  for all s ∈ ]−δ, δ[                                     (62.2)

and all v ∈ 𝒮 such that sv ∈ Dom n.
Proof: (i) ⇒ (ii): This implication is trivial.

(ii) ⇒ (iii): Assume that (ii) is valid. Let 𝒩′ ∈ Nhd₀(𝒱′) and a bounded
subset 𝒮 of 𝒱 be given. By Cor.1 to the Cell-Inclusion Theorem of Sect.52,
we can choose b ∈ ℙ^× such that

    ν(v) ≤ b  for all v ∈ 𝒮.                                             (62.3)

By Prop.3 of Sect.53 we can choose σ ∈ ℙ^× such that

    σb Ce(ν′) ⊂ 𝒩′.                                                      (62.4)

Applying Prop.4 of Sect.57 to the assumption (ii) we obtain δ ∈ ℙ^× such
that, for all u ∈ Dom n,

    ν′(n(u)) < σν(u)  if 0 < ν(u) < δb.                                  (62.5)

Now let v ∈ 𝒮 be given. Then ν(sv) = |s|ν(v) ≤ |s|b < δb for all s ∈ ]−δ, δ[
such that sv ∈ Dom n. Therefore, by (62.5), we have

    ν′(n(sv)) < σν(sv) = σ|s|ν(v) ≤ σ|s|b

if sv ≠ 0, and hence

    n(sv) ∈ sσb Ce(ν′)

for all s ∈ ]−δ, δ[ such that sv ∈ Dom n. The desired conclusion (62.2) now
follows from (62.4).

(iii) ⇒ (i): Assume that (iii) is valid. Let a norm ν on 𝒱, a norm ν′
on 𝒱′, and ε ∈ ℙ^× be given. We apply (iii) to the choices 𝒮 := Ce(ν),
𝒩′ := εCe(ν′) and determine δ ∈ ℙ^× such that (62.2) holds. If we put
s := 0 in (62.2) we obtain n(0) = 0. Now let u ∈ Dom n be given such that
0 < ν(u) < δ. If we apply (62.2) with the choices s := ν(u) and v := (1/s)u,
we see that n(u) ∈ ν(u)εCe(ν′), which yields

    ν′(n(u))/ν(u) < ε.

The assertion follows by applying Prop.4 of Sect.57.
The condition (iii) of Prop.1 states that

    lim_{s→0} (1/s) n(sv) = 0                                            (62.6)

for all v ∈ 𝒱 and, roughly, that the limit is approached uniformly as v varies
in an arbitrary bounded set.
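The limit-relation (62.1) can be illustrated numerically. The following sketch is my own illustration, not part of the text; the particular mapping n, the linear comparison mapping, and the choice of max-norms are assumptions made only for the example.

```python
def nu(u):          # max-norm on R^2, used for both spaces
    return max(abs(u[0]), abs(u[1]))

def n_small(u):     # quadratic mapping: small near 0
    return (u[0] * u[1], u[0]**2 - u[1]**2)

def n_linear(u):    # nonzero linear mapping: confined near 0, but not small
    return (2.0 * u[0] + u[1], u[1])

v = (1.0, 0.7)      # fixed direction; u := s v approaches 0 as s -> 0
for s in (1e-1, 1e-2, 1e-3):
    u = (s * v[0], s * v[1])
    print(f"s = {s:.0e}   small: {nu(n_small(u))/nu(u):.2e}   linear: {nu(n_linear(u))/nu(u):.2e}")
# The first ratio tends to 0; the second stays near a fixed positive value.
```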
We also wish to make precise the intuitive idea that a mapping h from a
neighborhood of 0 in 𝒱 to a neighborhood of 0 in 𝒱′ is confined near zero in
the sense that h(v) approaches 0 ∈ 𝒱′ not more slowly than v approaches
0 ∈ 𝒱.

Definition 2: A mapping h from a neighborhood of 0 in 𝒱 to a neighborhood
of 0 in 𝒱′ is said to be confined near 0 if for every norm ν on 𝒱
and every norm ν′ on 𝒱′ there is 𝒩 ∈ Nhd₀(𝒱) and κ ∈ ℙ^× such that

    ν′(h(u)) ≤ κν(u)  for all u ∈ 𝒩 ∩ Dom h.                             (62.7)

The set of all such confined mappings will be denoted by Conf(𝒱, 𝒱′).
Proposition 2: Let h be a mapping from a neighborhood of 0 in 𝒱 to a
neighborhood of 0 in 𝒱′. Then the following are equivalent:

(i) h ∈ Conf(𝒱, 𝒱′).

(ii) There exist a norm ν on 𝒱, a norm ν′ on 𝒱′, a neighborhood 𝒩 of 0
in 𝒱, and κ ∈ ℙ^× such that (62.7) holds.

(iii) For every bounded subset 𝒮 of 𝒱 there is δ ∈ ℙ^× and a bounded subset
𝒮′ of 𝒱′ such that

    h(sv) ∈ s𝒮′  for all s ∈ ]−δ, δ[                                     (62.8)

and all v ∈ 𝒮 such that sv ∈ Dom h.
Proof: (i) ⇒ (ii): This is trivial.

(ii) ⇒ (iii): Assume that (ii) holds. Let a bounded subset 𝒮 of 𝒱 be
given. By Cor.1 to the Cell-Inclusion Theorem of Sect.52, we can choose
b ∈ ℙ^× such that

    ν(v) ≤ b  for all v ∈ 𝒮.

By Prop.3 of Sect.53, we can determine δ ∈ ℙ^× such that δb Ce(ν) ⊂ 𝒩.
Hence, by (62.7), we have for all u ∈ Dom h

    ν′(h(u)) ≤ κν(u)  if ν(u) < δb.                                      (62.9)

Now let v ∈ 𝒮 be given. Then ν(sv) = |s|ν(v) ≤ |s|b < δb for all s ∈ ]−δ, δ[
such that sv ∈ Dom h. Therefore, by (62.9) we have

    ν′(h(sv)) ≤ κν(sv) = κ|s|ν(v) ≤ κ|s|b

and hence

    h(sv) ∈ sκb Ce(ν′)

for all s ∈ ]−δ, δ[ such that sv ∈ Dom h. If we put 𝒮′ := κb Ce(ν′), we obtain
the desired conclusion (62.8).

(iii) ⇒ (i): Assume that (iii) is valid. Let a norm ν on 𝒱 and a norm ν′
on 𝒱′ be given. We apply (iii) to the choice 𝒮 := Ce(ν) and determine 𝒮′ and
δ ∈ ℙ^× such that (62.8) holds. Since 𝒮′ is bounded, we can apply the Cell-Inclusion
Theorem of Sect.52 and determine κ ∈ ℙ^× such that 𝒮′ ⊂ κCe(ν′).
We put 𝒩 := δCe(ν) ∩ Dom h, which belongs to Nhd₀(𝒱). If we put s := 0
in (62.8) we obtain h(0) = 0, which shows that (62.7) holds for u := 0.
Now let u ∈ 𝒩^× be given, so that 0 < ν(u) < δ. If we apply (62.8) with the
choices s := ν(u) and v := (1/s)u, we see that

    h(u) ∈ ν(u)𝒮′ ⊂ ν(u)κCe(ν′),

which yields the assertion (62.7).
The following results are immediate consequences of the definitions and
of the properties of linear mappings discussed in Sect.52:

(I) Value-wise sums and value-wise scalar multiples of mappings that are
small [confined] near zero are again small [confined] near zero.

(II) Every mapping that is small near zero is also confined near zero, i.e.

    Small(𝒱, 𝒱′) ⊂ Conf(𝒱, 𝒱′).

(III) If h ∈ Conf(𝒱, 𝒱′), then h(0) = 0 and h is continuous at 0.

(IV) Every linear mapping is confined near zero, i.e.

    Lin(𝒱, 𝒱′) ⊂ Conf(𝒱, 𝒱′).

(V) The only linear mapping that is small near zero is the zero-mapping,
i.e.

    Lin(𝒱, 𝒱′) ∩ Small(𝒱, 𝒱′) = {0}.
Proposition 3: Let 𝒱, 𝒱′, 𝒱″ be linear spaces and let h ∈ Conf(𝒱, 𝒱′)
and k ∈ Conf(𝒱′, 𝒱″) be such that Dom k = Cod h. Then
k ∘ h ∈ Conf(𝒱, 𝒱″). Moreover, if one of k or h is small near zero, so is
k ∘ h.
Proof: Let norms ν, ν′, ν″ on 𝒱, 𝒱′, 𝒱″, respectively, be given. Since h
and k are confined we can find κ, κ′ ∈ ℙ^× and 𝒩 ∈ Nhd₀(𝒱), 𝒩′ ∈ Nhd₀(𝒱′)
such that

    ν″((k ∘ h)(u)) ≤ κ′ν′(h(u)) ≤ κ′κν(u)                                (62.10)

for all u ∈ 𝒩 ∩ Dom h such that h(u) ∈ 𝒩′ ∩ Dom k, i.e. for all
u ∈ 𝒩 ∩ h^<(𝒩′ ∩ Dom k). Since h is continuous at 0 ∈ 𝒱, we have
h^<(𝒩′ ∩ Dom k) ∈ Nhd₀(𝒱) and hence 𝒩 ∩ h^<(𝒩′ ∩ Dom k) ∈ Nhd₀(𝒱).
Thus, (62.7) remains satisfied when we replace h, κ and 𝒩 by k ∘ h, κ′κ, and
𝒩 ∩ h^<(𝒩′ ∩ Dom k), respectively, which shows that k ∘ h ∈ Conf(𝒱, 𝒱″).

Assume now that one of k and h, say h, is small. Let ε ∈ ℙ^× be given.
Then we can choose 𝒩 ∈ Nhd₀(𝒱) such that ν′(h(u)) ≤ σν(u) holds for all
u ∈ 𝒩 ∩ Dom h with σ := ε/κ′. Therefore, (62.10) gives ν″((k ∘ h)(u)) ≤ εν(u)
for all u ∈ 𝒩 ∩ h^<(𝒩′ ∩ Dom k). Since ε ∈ ℙ^× was arbitrary this proves
that

    lim_{u→0} ν″((k ∘ h)(u))/ν(u) = 0,

i.e. that k ∘ h is small near zero.
Now let ℰ and ℰ′ be flat spaces with translation spaces 𝒱 and 𝒱′, respectively.

Definition 3: Let x ∈ ℰ be given. We say that a mapping σ from a
neighborhood of x ∈ ℰ to a neighborhood of 0 ∈ 𝒱′ is small near x if the
mapping v ↦ σ(x + v) from (Dom σ) − x to Cod σ is small near 0. The
set of all such small mappings will be denoted by Small_x(ℰ, 𝒱′).

We say that a mapping φ from a neighborhood of x ∈ ℰ to a neighborhood
of φ(x) ∈ ℰ′ is confined near x if the mapping v ↦ (φ(x + v) − φ(x)) from
(Dom φ) − x to (Cod φ) − φ(x) is confined near zero.
The following characterization is immediate.

Proposition 4: The mapping φ is confined near x ∈ ℰ if and only if
for every norm ν on 𝒱 and every norm ν′ on 𝒱′ there is 𝒩 ∈ Nhd_x(ℰ) and
κ ∈ ℙ^× such that

    ν′(φ(y) − φ(x)) ≤ κν(y − x)  for all y ∈ 𝒩.                          (62.11)
We now state a few facts that are direct consequences of the definitions,
the results (I)–(V) stated above, and Prop.3:

(VI) Value-wise sums and differences of mappings that are small [confined]
near x are again small [confined] near x. Here, "sum" can mean either
the sum of two vectors or the sum of a point and a vector, while "difference"
can mean either the difference of two vectors or the difference
of two points.

(VII) Every σ ∈ Small_x(ℰ, 𝒱′) is confined near x.

(VIII) If a mapping is confined near x it is continuous at x.

(IX) A flat mapping α : ℰ → ℰ′ is confined near every x ∈ ℰ.

(X) The only flat mapping α : ℰ → 𝒱′ that is small near some x ∈ ℰ is
the constant with value 0 ∈ 𝒱′.

(XI) If φ is confined near x ∈ ℰ and if ψ is a mapping with Dom ψ = Cod φ
that is confined near φ(x), then ψ ∘ φ is confined near x.

(XII) If σ ∈ Small_x(ℰ, 𝒱′) and h ∈ Conf(𝒱′, 𝒱″) with Cod σ = Dom h, then
h ∘ σ ∈ Small_x(ℰ, 𝒱″).

(XIII) If φ is confined near x ∈ ℰ and if σ is a mapping with Dom σ = Cod φ
that is small near φ(x), then σ ∘ φ ∈ Small_x(ℰ, 𝒱″), where 𝒱″ is the
linear space for which Cod σ ∈ Nhd₀(𝒱″).

(XIV) An adjustment of a mapping that is small [confined] near x is again
small [confined] near x, provided only that the concept "small [confined]
near x" remains meaningful after the adjustment.
Notes 62

(1) In the conventional treatments, the norms ν and ν′ in Defs.1 and 2 are assumed
to be prescribed and fixed. The notation n = o(ν), and the phrase "n is small oh
of ν", are often used to express the assertion that n ∈ Small(𝒱, 𝒱′). The notation
h = O(ν), and the phrase "h is big oh of ν", are often used to express the assertion
that h ∈ Conf(𝒱, 𝒱′). I am introducing the terms "small" and "confined" here
for the first time because I believe that the conventional terminology is intolerably
awkward and involves a misuse of the = sign.
63 Gradients, Chain Rule

Let I be an open interval in ℝ. One learns in elementary calculus that if a
function f : I → ℝ is differentiable at a point t ∈ I, then the graph of f has
a tangent at (t, f(t)). This tangent is the graph of a flat function a ∈ Flf(ℝ).
Using poetic license, we refer to this function itself as the tangent to f at
t ∈ I. In this sense, the tangent a is given by a(r) := f(t) + (∂_t f)(r − t) for
all r ∈ ℝ.

If we put σ := f − a|_I, then σ(r) = f(r) − f(t) − (∂_t f)(r − t) for all r ∈ I.
We have lim_{s→0} σ(t + s)/s = 0, from which it follows that σ ∈ Small_t(ℝ, ℝ).
One can use the existence of a tangent to define differentiability at t. Such
a definition generalizes directly to mappings involving flat spaces.
Let ℰ, ℰ′ be flat spaces with translation spaces 𝒱, 𝒱′, respectively. We
consider a mapping φ : 𝒟 → 𝒟′ from an open subset 𝒟 of ℰ into an open
subset 𝒟′ of ℰ′.

Proposition 1: Given x ∈ 𝒟, there can be at most one flat mapping
α : ℰ → ℰ′ such that the value-wise difference φ − α|_𝒟 : 𝒟 → 𝒱′ is small
near x.
Proof: If the flat mappings α₁, α₂ both have this property, then the
value-wise difference (α₂ − α₁)|_𝒟 = (φ − α₁|_𝒟) − (φ − α₂|_𝒟) is small near
x ∈ ℰ. Since α₂ − α₁ is flat, it follows from (X) and (XIV) of Sect.62 that
α₂ − α₁ is the zero mapping and hence that α₁ = α₂.
Definition 1: The mapping φ : 𝒟 → 𝒟′ is said to be differentiable
at x ∈ 𝒟 if there is a flat mapping α : ℰ → ℰ′ such that

    φ − α|_𝒟 ∈ Small_x(ℰ, 𝒱′).                                           (63.1)

This (unique) flat mapping α is then called the tangent to φ at x. The
gradient of φ at x is defined to be the gradient of α and is denoted by

    ∇_x φ := ∇α.                                                         (63.2)

We say that φ is differentiable if it is differentiable at all x ∈ 𝒟. If this
is the case, the mapping

    ∇φ : 𝒟 → Lin(𝒱, 𝒱′)                                                  (63.3)

defined by

    (∇φ)(x) := ∇_x φ  for all x ∈ 𝒟                                       (63.4)

is called the gradient of φ. We say that φ is of class C¹ if it is differentiable
and if its gradient ∇φ is continuous. We say that φ is twice differentiable
if it is differentiable and if its gradient ∇φ is also differentiable. The gradient
of ∇φ is then called the second gradient of φ and is denoted by

    ∇^{(2)}φ := ∇(∇φ) : 𝒟 → Lin(𝒱, Lin(𝒱, 𝒱′)) ≅ Lin₂(𝒱², 𝒱′).            (63.5)

We say that φ is of class C² if it is twice differentiable and if ∇^{(2)}φ is
continuous.
If the subsets 𝒟 and 𝒟′ are arbitrary, not necessarily open, and if x ∈
Int 𝒟, we say that φ : 𝒟 → 𝒟′ is differentiable at x if φ|^{ℰ′}_{Int 𝒟} is
differentiable at x, and we write ∇_x φ for ∇_x(φ|^{ℰ′}_{Int 𝒟}).

The differentiability properties of a mapping φ remain unchanged if the
codomain of φ is changed to any open subset of ℰ′ that includes Rng φ. The
gradient of φ remains unaltered. If Rng φ is included in some flat ℱ′ in ℰ′,
one may change the codomain to a subset that is open in ℱ′. In that case,
the gradient of φ at a point x ∈ 𝒟 must be replaced by the adjustment
∇_x φ|^{𝒰′} of ∇_x φ, where 𝒰′ is the direction space of ℱ′.
The differentiability and the gradient of a mapping at a point depend
only on the values of the mapping near that point. To be more precise, let φ₁
and φ₂ be two mappings whose domains are neighborhoods of a given x ∈ ℰ
and whose codomains are open subsets of ℰ′. Assume that φ₁ and φ₂ agree
on some neighborhood of x, i.e. that φ₁|_𝒩 = φ₂|_𝒩 for some 𝒩 ∈ Nhd_x(ℰ).
Then φ₁ is differentiable at x if and only if φ₂ is differentiable at x. If this
is the case, we have ∇_x φ₁ = ∇_x φ₂.
Every flat mapping α : ℰ → ℰ′ is differentiable. The tangent of α at
every point x ∈ ℰ is α itself. The gradient of α as defined in this section
is the constant mapping from ℰ to Lin(𝒱, 𝒱′) whose value is the gradient of α
in the sense of Sect.33. The gradient of a linear mapping is the constant whose
value is this linear mapping itself.
In the case when ℰ := 𝒱 := ℝ and when 𝒟 := I is an open interval,
differentiability at t ∈ I of a process p : I → 𝒟′ in the sense of Def.1 above
reduces to differentiability of p at t in the sense of Def.1 of Sect.61. The
gradient ∇_t p ∈ Lin(ℝ, 𝒱′) becomes associated with the derivative ∂_t p ∈ 𝒱′
by the natural isomorphism from 𝒱′ to Lin(ℝ, 𝒱′), so that ∇_t p is the linear
mapping s ↦ s ∂_t p and ∂_t p = (∇_t p)1 (see Sect.25). If I is an interval that is
not open and if t is an endpoint of it, then the derivative of p : I → 𝒟′ at t,
if it exists, cannot be associated with a gradient.
If φ is differentiable at x then it is confined near x and hence continuous
at x. This follows from the fact that its tangent α, being flat, is confined near x
and that the difference φ − α|_𝒟, being small near x, is confined near x. The
converse is not true. For example, it is easily seen that the absolute-value
function (t ↦ |t|) : ℝ → ℝ is confined near 0 ∈ ℝ but not differentiable at 0.
The following criterion is immediate from the definition:

Characterization of Gradients: The mapping φ : 𝒟 → 𝒟′ is differentiable
at x ∈ 𝒟 if and only if there is an L ∈ Lin(𝒱, 𝒱′) such that
n : (𝒟 − x) → 𝒱′, defined by

    n(v) := (φ(x + v) − φ(x)) − Lv  for all v ∈ 𝒟 − x,                    (63.6)

is small near 0 ∈ 𝒱. If this is the case, then ∇_x φ = L.
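The criterion (63.6) suggests a simple numerical test: a candidate L is the gradient at x exactly when the remainder n is small near 0. The sketch below is my own illustration; the mapping φ, the point x, and the candidate matrix (the usual Jacobian) are assumptions chosen only for the example.

```python
import math

def phi(x1, x2):
    return (x1 * x1 * x2, math.sin(x1) + x2)

def L(v1, v2, x1, x2):
    # candidate gradient at (x1, x2): the Jacobian matrix applied to (v1, v2)
    return (2 * x1 * x2 * v1 + x1 * x1 * v2, math.cos(x1) * v1 + v2)

x1, x2 = 0.5, -1.2
for s in (1e-1, 1e-2, 1e-3):
    v1, v2 = s * 0.3, s * 0.8                      # v = s * (0.3, 0.8)
    p = phi(x1 + v1, x2 + v2)
    q = phi(x1, x2)
    l = L(v1, v2, x1, x2)
    n = (p[0] - q[0] - l[0], p[1] - q[1] - l[1])   # remainder n(v) of (63.6)
    ratio = max(abs(n[0]), abs(n[1])) / max(abs(v1), abs(v2))
    print(f"|v| ~ {s:.0e}   nu'(n(v))/nu(v) = {ratio:.2e}")
# The ratio tends to 0, confirming that L is the gradient of phi at x.
```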
Let 𝒟, 𝒟₁, 𝒟₂ be open subsets of flat spaces ℰ, ℰ₁, ℰ₂ with translation
spaces 𝒱, 𝒱₁, 𝒱₂, respectively. The following result follows immediately from
the definitions if we use the term-wise evaluations (04.13) and (14.12).

Proposition 2: The mapping (φ₁, φ₂) : 𝒟 → 𝒟₁ × 𝒟₂ is differentiable
at x ∈ 𝒟 if and only if both φ₁ and φ₂ are differentiable at x. If this is the
case, then

    ∇_x(φ₁, φ₂) = (∇_x φ₁, ∇_x φ₂) ∈ Lin(𝒱, 𝒱₁) × Lin(𝒱, 𝒱₂) ≅ Lin(𝒱, 𝒱₁ × 𝒱₂).   (63.7)
General Chain Rule: Let 𝒟, 𝒟′, 𝒟″ be open subsets of flat spaces
ℰ, ℰ′, ℰ″ with translation spaces 𝒱, 𝒱′, 𝒱″, respectively. If φ : 𝒟 → 𝒟′ is
differentiable at x ∈ 𝒟 and if ψ : 𝒟′ → 𝒟″ is differentiable at φ(x), then the
composite ψ ∘ φ : 𝒟 → 𝒟″ is differentiable at x. The tangent to the composite
ψ ∘ φ at x is the composite of the tangent to φ at x and the tangent to ψ
at φ(x), and we have

    ∇_x(ψ ∘ φ) = (∇_{φ(x)} ψ)(∇_x φ).                                     (63.8)

If φ and ψ are both differentiable, so is ψ ∘ φ, and we have

    ∇(ψ ∘ φ) = ((∇ψ) ∘ φ)(∇φ),                                            (63.9)

where the product on the right is understood as value-wise composition.
Proof: Let α be the tangent to φ at x and β the tangent to ψ at φ(x).
Then

    σ := φ − α|_𝒟 ∈ Small_x(ℰ, 𝒱′),
    σ′ := ψ − β|_{𝒟′} ∈ Small_{φ(x)}(ℰ′, 𝒱″).

We have

    ψ ∘ φ = (β + σ′) ∘ (α + σ)
          = β ∘ (α + σ) + σ′ ∘ (α + σ)
          = β ∘ α + (∇β)σ + σ′ ∘ (α + σ),

where domain restriction symbols have been omitted to avoid clutter. It
follows from (VI), (IX), (XII), and (XIII) of Sect.62 that
(∇β)σ + σ′ ∘ (α + σ) ∈ Small_x(ℰ, 𝒱″), which means that

    ψ ∘ φ − (β ∘ α)|_𝒟 ∈ Small_x(ℰ, 𝒱″).

It follows that ψ ∘ φ is differentiable at x with tangent β ∘ α. The assertion
(63.8) follows from the Chain Rule for Flat Mappings of Sect.33.
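A finite-difference check of (63.8), offered only as an illustration: the mappings φ, ψ, the evaluation point, and the choice ℰ = ℰ′ = ℰ″ = ℝ² are my own assumptions. The matrix of ∇_x(ψ ∘ φ) should agree with the matrix product of ∇_{φ(x)}ψ and ∇_xφ.

```python
import math

def phi(x):   # phi : R^2 -> R^2
    return [x[0] * x[1], x[0] + math.exp(x[1])]

def psi(y):   # psi : R^2 -> R^2
    return [math.sin(y[0]) + y[1], y[0] * y[0]]

def num_grad(f, x, h=1e-6):
    """Matrix of the gradient of f at x by central differences."""
    m = len(f(x))
    J = [[0.0] * len(x) for _ in range(m)]
    for j in range(len(x)):
        xp, xm = list(x), list(x)
        xp[j] += h
        xm[j] -= h
        fp, fm = f(xp), f(xm)
        for i in range(m):
            J[i][j] = (fp[i] - fm[i]) / (2 * h)
    return J

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]

x = [0.4, -0.3]
lhs = num_grad(lambda z: psi(phi(z)), x)              # gradient of the composite
rhs = matmul(num_grad(psi, phi(x)), num_grad(phi, x)) # (grad psi at phi(x))(grad phi at x)
print(lhs)
print(rhs)   # the two matrices agree up to the finite-difference error
```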
Let φ : 𝒟 → ℰ′ and ψ : 𝒟 → ℰ′ both be differentiable at x ∈ 𝒟. Then
the value-wise difference φ − ψ : 𝒟 → 𝒱′ is differentiable at x and

    ∇_x(φ − ψ) = ∇_x φ − ∇_x ψ.                                            (63.10)

This follows from Prop.2, the fact that the point-difference (x′, y′) ↦ x′ − y′
is a flat mapping from ℰ′ × ℰ′ into 𝒱′, and the General Chain Rule.

When the General Chain Rule is applied to the composite of a vector-valued
mapping with a linear mapping it yields
Proposition 3: If h : 𝒟 → 𝒱′ is differentiable at x ∈ 𝒟 and if L ∈
Lin(𝒱′, 𝒲), where 𝒲 is some linear space, then Lh : 𝒟 → 𝒲 is differentiable
at x ∈ 𝒟 and

    ∇_x(Lh) = L(∇_x h).                                                    (63.11)

Using the fact that vector-addition, transposition of linear mappings,
and the trace operation are all linear operations, we obtain the following
special cases of Prop.3:
(I) Let h : 𝒟 → 𝒱′ and k : 𝒟 → 𝒱′ both be differentiable at x ∈ 𝒟.
Then the value-wise sum h + k : 𝒟 → 𝒱′ is differentiable at x and

    ∇_x(h + k) = ∇_x h + ∇_x k.                                            (63.12)

(II) Let 𝒲 and 𝒵 be linear spaces and let F : 𝒟 → Lin(𝒲, 𝒵) be differentiable
at x. If F^⊤ : 𝒟 → Lin(𝒵*, 𝒲*) is defined by value-wise
transposition, then F^⊤ is differentiable at x ∈ 𝒟 and

    (∇_x(F^⊤))v = ((∇_x F)v)^⊤  for all v ∈ 𝒱.                             (63.13)

In particular, if I is an open interval and if F : I → Lin(𝒲, 𝒵) is
differentiable, so is F^⊤ : I → Lin(𝒵*, 𝒲*), and

    (F^⊤)˙ = (F˙)^⊤.                                                       (63.14)

(III) Let 𝒲 be a linear space and let F : 𝒟 → Lin𝒲 be differentiable at x.
If tr F : 𝒟 → ℝ is the value-wise trace of F, then tr F is differentiable
at x and

    (∇_x(tr F))v = tr((∇_x F)v)  for all v ∈ 𝒱.                            (63.15)

In particular, if I is an open interval and if F : I → Lin𝒲 is differentiable,
so is tr F : I → ℝ, and

    (tr F)˙ = tr(F˙).                                                      (63.16)
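The rules (63.14) and (63.16) can be checked on a concrete matrix-valued process; the sketch below is my own illustration, with Lin𝒲 realized as 2×2 matrices and the process F chosen arbitrarily.

```python
import math

def F(t):  # a process with values in Lin(R^2), written as a 2x2 matrix
    return [[math.cos(t), t * t], [math.exp(t), math.sin(t)]]

def trace(A):
    return A[0][0] + A[1][1]

def Fdot(t, h=1e-6):  # value-wise central-difference approximation of the derivative
    Fp, Fm = F(t + h), F(t - h)
    return [[(Fp[i][j] - Fm[i][j]) / (2 * h) for j in range(2)] for i in range(2)]

t, h = 0.8, 1e-6
lhs = (trace(F(t + h)) - trace(F(t - h))) / (2 * h)   # (tr F)'(t)
rhs = trace(Fdot(t))                                  # tr(F'(t))
print(lhs, rhs)   # both are close to -sin(t) + cos(t), as (63.16) predicts
```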
We note three special cases of the General Chain Rule: Let I be an
open interval and let 𝒟 and 𝒟′ be open subsets of ℰ and ℰ′, respectively. If
p : I → 𝒟 and φ : 𝒟 → 𝒟′ are differentiable, so is φ ∘ p : I → 𝒟′, and

    (φ ∘ p)˙ = ((∇φ) ∘ p)p˙.                                               (63.17)

If φ : 𝒟 → 𝒟′ and f : 𝒟′ → ℝ are differentiable, so is f ∘ φ : 𝒟 → ℝ, and

    ∇(f ∘ φ) = (∇φ)^⊤((∇f) ∘ φ)                                            (63.18)

(see (21.3)). If f : 𝒟 → I and p : I → 𝒟′ are differentiable, so is p ∘ f, and

    ∇(p ∘ f) = (p˙ ∘ f) ⊗ ∇f.                                              (63.19)
Notes 63

(1) Other common terms for the concept of "gradient" of Def.1 are "differential",
"Fréchet differential", "derivative", and "Fréchet derivative". Some authors make
an artificial distinction between "gradient" and "differential". We cannot use
"derivative" because, for processes, gradient and derivative are distinct though
related concepts.

(2) The conventional definitions of gradient depend, at first view, on the prescription
of a norm. Many texts never even mention the fact that the gradient is a norm-invariant
concept. In some contexts, as when one deals with genuine Euclidean
spaces, this norm-invariance is perhaps not very important. However, when one
deals with mathematical models for space-time in the theory of relativity, the
norm-invariance is crucial because it shows that the concepts of differential calculus
have a Lorentz-invariant meaning.

(3) I am introducing the notation ∇_x φ for the gradient of φ at x because the more
conventional notation ∇φ(x) suggests, incorrectly, that ∇φ(x) is necessarily the
value at x of a gradient-mapping ∇φ. In fact, one cannot define the gradient-mapping
∇φ without first having a notation for the gradient at a point (see (63.4)).

(4) Other notations for ∇_x φ in the literature are dφ(x), Dφ(x), and φ′(x).

(5) I conjecture that the "Chain" of "Chain Rule" comes from an old terminology that
used "chaining" (in the sense of concatenation) for composition. The term
"Composition Rule" would need less explanation, but I retained "Chain Rule"
because it is more traditional and almost as good.
64 Constricted Mappings

In this section, 𝒟 and 𝒟′ denote arbitrary subsets of flat spaces ℰ and ℰ′
with translation spaces 𝒱 and 𝒱′, respectively.

Definition 1: We say that the mapping φ : 𝒟 → 𝒟′ is constricted if
for every norm ν on 𝒱 and every norm ν′ on 𝒱′ there is κ ∈ ℙ^× such that

    ν′(φ(y) − φ(x)) ≤ κν(y − x)                                            (64.1)

holds for all x, y ∈ 𝒟. The infimum of the set of all κ ∈ ℙ^× for which (64.1)
holds for all x, y ∈ 𝒟 is called the striction of φ relative to ν and ν′; it is
denoted by str(φ; ν, ν′).
It is clear that

    str(φ; ν, ν′) = sup { ν′(φ(y) − φ(x)) / ν(y − x)  |  x, y ∈ 𝒟, x ≠ y }.   (64.2)
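The supremum (64.2) can be estimated from below by sampling pairs of points. The sketch below is my own illustration; the function, its domain, and the random sample are assumptions chosen only for the example.

```python
import random

def f(t):
    return t * t

random.seed(0)
pairs = [(random.uniform(0.0, 1.0), random.uniform(0.0, 1.0)) for _ in range(10000)]
ratios = [abs(f(x) - f(y)) / abs(x - y) for x, y in pairs if x != y]
print(max(ratios))   # a lower estimate of str(f; |.|, |.|) on [0, 1]; it approaches 2,
                     # consistent with the Striction Estimate below, since |f'| <= 2 there
```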
Proposition 1: For every mapping φ : 𝒟 → 𝒟′, the following are
equivalent:

(i) φ is constricted.

(ii) There exist norms ν and ν′ on 𝒱 and 𝒱′, respectively, and κ ∈ ℙ^× such
that (64.1) holds for all x, y ∈ 𝒟.

(iii) For every bounded subset ℬ of 𝒱 and every 𝒩′ ∈ Nhd₀(𝒱′) there is
σ ∈ ℙ^× such that

    x − y ∈ sℬ  ⟹  φ(x) − φ(y) ∈ sσ𝒩′                                     (64.3)

for all x, y ∈ 𝒟 and s ∈ ℙ^×.
Proof: (i) ⇒ (ii): This implication is trivial.

(ii) ⇒ (iii): Assume that (ii) holds, and let a bounded subset ℬ of 𝒱
and 𝒩′ ∈ Nhd₀(𝒱′) be given. By Cor.1 of the Cell-Inclusion Theorem of
Sect.52, we can choose b ∈ ℙ^× such that

    ν(u) ≤ b  for all u ∈ ℬ.                                               (64.4)

By Prop.3 of Sect.53, we can choose ε ∈ ℙ^× such that εCe(ν′) ⊂ 𝒩′. Now
let x, y ∈ 𝒟 and s ∈ ℙ^× be given and assume that x − y ∈ sℬ. Then
(1/s)(x − y) ∈ ℬ and hence, by (64.4), we have ν((1/s)(x − y)) ≤ b, which gives
ν(x − y) ≤ sb. Using (64.1) we obtain ν′(φ(y) − φ(x)) ≤ κsb. If we put
σ := κb/ε, this means that

    φ(y) − φ(x) ∈ sσεCe(ν′) ⊂ sσ𝒩′,

i.e. that (64.3) holds.

(iii) ⇒ (i): Assume that (iii) holds and let a norm ν on 𝒱 and a norm
ν′ on 𝒱′ be given. We apply (iii) with the choices ℬ := Bdy Ce(ν) and
𝒩′ := Ce(ν′) and obtain σ ∈ ℙ^×. Let x, y ∈ 𝒟 with x ≠ y be given. If we put
s := ν(x − y) then x − y ∈ s Bdy Ce(ν) and hence, by (64.3),
φ(x) − φ(y) ∈ sσCe(ν′), which implies

    ν′(φ(x) − φ(y)) < sσ = σν(x − y).

Thus, if we put κ := σ, then (64.1) holds for all x, y ∈ 𝒟.
If 𝒟 and 𝒟′ are open sets and if φ : 𝒟 → 𝒟′ is constricted, it is confined
near every x ∈ 𝒟, as is evident from Prop.4 of Sect.62. The converse is not
true. For example, one can show that the function f : ]−1, 1[ → ℝ defined
by

    f(t) :=  { t sin(1/t)   if t ∈ ]0, 1[
             { 0            if t ∈ ]−1, 0]                                  (64.5)

is not constricted, but is confined near every t ∈ ]−1, 1[.

Every flat mapping α : ℰ → ℰ′ is constricted and

    str(α; ν, ν′) = ||∇α||_{ν,ν′},

where || · ||_{ν,ν′} is the operator norm on Lin(𝒱, 𝒱′) corresponding to ν and ν′
(see Sect.52).

Constrictedness and strictions remain unaffected by a change of codomain.
If the domain of a constricted mapping is restricted, then it remains
constricted and the striction of the restriction is less than or equal to the
striction of the original mapping. (Pardon the puns.)
Proposition 2: If φ : 𝒟 → 𝒟′ is constricted then it is uniformly
continuous.

Proof: We use condition (iii) of Prop.1. Let 𝒩′ ∈ Nhd₀(𝒱′) be given.
We choose a bounded neighborhood ℬ of 0 in 𝒱 and determine σ ∈ ℙ^×
according to (iii). Putting 𝒩 := (1/σ)ℬ ∈ Nhd₀(𝒱) and s := 1/σ, we see that
(64.3) gives

    x − y ∈ 𝒩  ⟹  φ(x) − φ(y) ∈ 𝒩′

for all x, y ∈ 𝒟.

Pitfall: The converse of Prop.2 is not valid. A counterexample is the
square-root function √ : ℙ → ℙ, which is uniformly continuous but not
constricted. Another counterexample is the function defined by (64.5).
The following result is the most useful criterion for showing that a given
mapping is constricted.

Striction Estimate for Differentiable Mappings: Assume that 𝒟
is an open convex subset of ℰ, that φ : 𝒟 → 𝒟′ is differentiable, and that the
gradient ∇φ : 𝒟 → Lin(𝒱, 𝒱′) has a bounded range. Then φ is constricted
and

    str(φ; ν, ν′) ≤ sup { ||∇_z φ||_{ν,ν′} | z ∈ 𝒟 }                        (64.6)

for all norms ν, ν′ on 𝒱, 𝒱′, respectively.
Proof: Let x, y ∈ 𝒟. Since 𝒟 is convex, we have tx + (1 − t)y ∈ 𝒟 for
all t ∈ [0, 1] and hence we can define a process

    p : [0, 1] → 𝒟′  by  p(t) := φ(tx + (1 − t)y).

By the Chain Rule, p is differentiable at t when t ∈ ]0, 1[ and we have

    ∂_t p = (∇_z φ)(x − y)  with  z := tx + (1 − t)y ∈ 𝒟.

Applying the Difference-Quotient Theorem of Sect.61 to p, we obtain

    φ(x) − φ(y) = p(1) − p(0) ∈ Clo Cxh {(∇_z φ)(x − y) | z ∈ 𝒟}.           (64.7)

If ν, ν′ are norms on 𝒱, 𝒱′ then, by (52.7),

    ν′((∇_z φ)(x − y)) ≤ ||∇_z φ||_{ν,ν′} ν(x − y)                           (64.8)

for all z ∈ 𝒟. To say that ∇φ has a bounded range is equivalent, by Cor.1
to the Cell-Inclusion Theorem of Sect.52, to

    κ := sup { ||∇_z φ||_{ν,ν′} | z ∈ 𝒟 } < ∞.                               (64.9)

It follows from (64.8) that ν′((∇_z φ)(x − y)) ≤ κν(x − y) for all z ∈ 𝒟, which
can be expressed in the form

    (∇_z φ)(x − y) ∈ κν(x − y)Ce(ν′)  for all z ∈ 𝒟.

Since the set on the right is closed and convex we get

    Clo Cxh {(∇_z φ)(x − y) | z ∈ 𝒟} ⊂ κν(x − y)Ce(ν′)

and hence, by (64.7), φ(x) − φ(y) ∈ κν(x − y)Ce(ν′). This may be expressed
in the form

    ν′(φ(x) − φ(y)) ≤ κν(x − y).

Since x, y ∈ 𝒟 were arbitrary it follows that φ is constricted. The definition
(64.9) shows that (64.6) holds.

Remark: It is not hard to prove that the inequality in (64.6) is actually
an equality.
Proposition 3: If 𝒟 is a non-empty open convex set and if φ : 𝒟 → 𝒟′
is differentiable with gradient zero, then φ is constant.

Proof: Choose norms ν, ν′ on 𝒱, 𝒱′, respectively. The assumption ∇φ =
0 gives ||∇_z φ||_{ν,ν′} = 0 for all z ∈ 𝒟. Hence, by (64.6), we have
str(φ; ν, ν′) = 0. Using (64.2), we conclude that ν′(φ(y) − φ(x)) = 0 and hence
φ(x) = φ(y) for all x, y ∈ 𝒟.

Remark: In Prop.3, the condition that 𝒟 be convex can be replaced by
the weaker one that 𝒟 be connected. This means, intuitively, that every
two points in 𝒟 can be connected by a continuous curve entirely within 𝒟.
Proposition 4: Assume that 𝒟 and 𝒟′ are open subsets and that
φ : 𝒟 → 𝒟′ is of class C¹. Let 𝒦 be a compact subset of 𝒟. For every
norm ν on 𝒱, every norm ν′ on 𝒱′, and every ε ∈ ℙ^× there exists δ ∈ ℙ^×
such that 𝒦 + δCe(ν) ⊂ 𝒟 and such that, for every x ∈ 𝒦, the function
n_x : δCe(ν) → 𝒱′ defined by

    n_x(v) := φ(x + v) − φ(x) − (∇_x φ)v                                     (64.10)

is constricted with

    str(n_x; ν, ν′) ≤ ε.                                                     (64.11)
Proof: Let a norm ν on 𝒱 be given. By Prop.6 of Sect.58, we can
obtain δ₁ ∈ ℙ^× such that 𝒦 + δ₁Ce(ν) ⊂ 𝒟. For each x ∈ 𝒦, we define
m_x : δ₁Ce(ν) → 𝒱′ by

    m_x(v) := φ(x + v) − φ(x) − (∇_x φ)v.                                    (64.12)

Differentiation gives

    ∇_v m_x = ∇_{x+v} φ − ∇_x φ                                              (64.13)

for all x ∈ 𝒦 and all v ∈ δ₁Ce(ν). Since 𝒦 + δ₁Ce(ν) is compact by Prop.6
of Sect.58 and since ∇φ is continuous, it follows by the Uniform Continuity
Theorem of Sect.58 that ∇φ|_{𝒦+δ₁Ce(ν)} is uniformly continuous. Now let a
norm ν′ on 𝒱′ and ε ∈ ℙ^× be given. By Prop.4 of Sect.56, we can then
determine δ₂ ∈ ℙ^× such that

    ν(y − x) < δ₂  ⟹  ||∇_y φ − ∇_x φ||_{ν,ν′} < ε

for all x, y ∈ 𝒦 + δ₁Ce(ν). In view of (64.13), it follows that

    v ∈ δ₂Ce(ν)  ⟹  ||∇_v m_x||_{ν,ν′} < ε

for all x ∈ 𝒦 and all v ∈ δ₁Ce(ν). If we put δ := min{δ₁, δ₂} and if we define
n_x := m_x|_{δCe(ν)} for every x ∈ 𝒦, we see that {||∇_v n_x||_{ν,ν′} | v ∈ δCe(ν)}
is bounded by ε for all x ∈ 𝒦. By (64.12) and the Striction Estimate for
Differentiable Mappings, the desired result follows.
Definition 2: Let φ : 𝒟 → 𝒟 be a constricted mapping from a set 𝒟
into itself. Then

    str(φ) := inf { str(φ; ν, ν) | ν a norm on 𝒱 }                           (64.14)

is called the absolute striction of φ. If str(φ) < 1 we say that φ is a
contraction.

Contraction Fixed Point Theorem: Every contraction has at most
one fixed point and, if its domain is closed and not empty, it has exactly one
fixed point.
Proof: Let φ : 𝒟 → 𝒟 be a contraction. We choose a norm ν on 𝒱 such
that κ := str(φ; ν, ν) < 1 and hence, by (64.1),

    ν(φ(x) − φ(y)) ≤ κν(x − y)  for all x, y ∈ 𝒟.                            (64.15)

If x and y are fixed points of φ, so that φ(x) = x, φ(y) = y, then
(64.15) gives ν(x − y)(1 − κ) ≤ 0. Since 1 − κ > 0 this is possible only when
ν(x − y) = 0 and hence x = y. Therefore, φ can have at most one fixed
point.

We now assume 𝒟 ≠ ∅, choose s₀ ∈ 𝒟 arbitrarily, and define
s_n := φ^n(s₀) for all n ∈ ℕ (see Sect.03). It follows from (64.15) that

    ν(s_{m+1} − s_m) = ν(φ(s_m) − φ(s_{m−1})) ≤ κν(s_m − s_{m−1})

for all m ∈ ℕ^×. Using induction, one concludes that

    ν(s_{m+1} − s_m) ≤ κ^m ν(s₁ − s₀)  for all m ∈ ℕ.

Now, if n ∈ ℕ and r ∈ ℕ, then

    s_{n+r} − s_n = Σ_{k∈r^[} (s_{n+k+1} − s_{n+k})

and hence

    ν(s_{n+r} − s_n) ≤ Σ_{k∈r^[} κ^{n+k} ν(s₁ − s₀) ≤ (κ^n/(1 − κ)) ν(s₁ − s₀).

Since κ < 1, we have lim_{n→∞} κ^n = 0, and it follows that for every ε ∈ ℙ^×
there is an m ∈ ℕ such that ν(s_{n+r} − s_n) < ε whenever n ∈ m + ℕ, r ∈ ℕ. By
the Basic Convergence Criterion of Sect.55 it follows that s := (s_n | n ∈ ℕ)
converges. Since φ(s_n) = s_{n+1} for all n ∈ ℕ, we have φ ∘ s = (s_{n+1} | n ∈ ℕ),
which converges to the same limit as s. We put

    x := lim s = lim(φ ∘ s).

Now assume that 𝒟 is closed. By Prop.6 of Sect.55 it follows that x ∈ 𝒟.
Since φ is continuous, we can apply Prop.2 of Sect.56 to conclude that
lim(φ ∘ s) = φ(lim s), i.e. that φ(x) = x.
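The iteration used in this proof is also the standard algorithm for computing the fixed point. The sketch below is my own illustration; the contraction (cos restricted to [−1, 1], which it maps into itself and where, by the Striction Estimate, its striction relative to the absolute value is at most sin 1 < 1) and the starting point are assumptions chosen for the example.

```python
import math

def phi(t):          # phi := cos restricted to the closed set D := [-1, 1]
    return math.cos(t)

s = 1.0              # s_0, chosen arbitrarily in D
for n in range(60):
    s = phi(s)       # s_{n+1} := phi(s_n)
print(s, math.cos(s) - s)   # the unique fixed point ~0.739085..., residual ~0
```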
Notes 64

(1) The traditional terms for "constricted" and "striction" are "Lipschitzian" and
"Lipschitz number" (or "constant"), respectively. I am introducing the terms
"constricted" and "striction" here because they are much more descriptive.

(2) It turns out that the absolute striction of a lineon coincides with what is often
called its "spectral radius".

(3) The Contraction Fixed Point Theorem is often called the "Contraction Mapping
Theorem" or the "Banach Fixed Point Theorem".
65 Partial Gradients, Directional Derivatives

Let ℰ₁, ℰ₂ be flat spaces with translation spaces 𝒱₁, 𝒱₂. As we have seen in
Sect.33, the set-product ℰ := ℰ₁ × ℰ₂ is then a flat space with translation
space 𝒱 := 𝒱₁ × 𝒱₂. We consider a mapping φ : 𝒟 → 𝒟′ from an open
subset 𝒟 of ℰ into an open subset 𝒟′ of a flat space ℰ′ with translation
space 𝒱′. Given any x₂ ∈ ℰ₂, we define (·, x₂) : ℰ₁ → ℰ according to (04.21),
put

    𝒟_{(·,x₂)} := (·, x₂)^<(𝒟) = {z ∈ ℰ₁ | (z, x₂) ∈ 𝒟},                    (65.1)

which is an open subset of ℰ₁ because (·, x₂) is flat and hence continuous,
and define φ(·, x₂) : 𝒟_{(·,x₂)} → 𝒟′ according to (04.22). If φ(·, x₂) is
differentiable at x₁ for all (x₁, x₂) ∈ 𝒟 we define the partial 1-gradient
φ_{(1)} : 𝒟 → Lin(𝒱₁, 𝒱′) of φ by

    φ_{(1)}(x₁, x₂) := ∇_{x₁} φ(·, x₂)  for all (x₁, x₂) ∈ 𝒟.                (65.2)

In the special case when ℰ₁ := ℝ, we have 𝒱₁ = ℝ, and the partial
1-derivative φ_{,1} : 𝒟 → 𝒱′, defined by

    φ_{,1}(t, x₂) := (∂φ(·, x₂))(t)  for all (t, x₂) ∈ 𝒟,                    (65.3)

is related to φ_{(1)} by φ_{,1} = (φ_{(1)})1 (value-wise), φ_{(1)}(t, x₂) being the
linear mapping s ↦ s φ_{,1}(t, x₂).

Similar definitions are employed to define 𝒟_{(x₁,·)}, the mapping
φ(x₁, ·) : 𝒟_{(x₁,·)} → 𝒟′, the partial 2-gradient φ_{(2)} : 𝒟 → Lin(𝒱₂, 𝒱′), and
the partial 2-derivative φ_{,2} : 𝒟 → 𝒱′.
The following result follows immediately from the definitions.

Proposition 1: If φ : 𝒟 → 𝒟′ is differentiable at x := (x₁, x₂) ∈ 𝒟,
then the gradients ∇_{x₁} φ(·, x₂) and ∇_{x₂} φ(x₁, ·) exist and

    (∇_x φ)v = (∇_{x₁} φ(·, x₂))v₁ + (∇_{x₂} φ(x₁, ·))v₂                      (65.4)

for all v := (v₁, v₂) ∈ 𝒱.

If φ is differentiable, then the partial gradients φ_{(1)} and φ_{(2)} exist
and ∇φ is obtained from φ_{(1)} and φ_{(2)} by

    ∇φ = (φ_{(1)}, φ_{(2)}),                                                  (65.5)

where the pairing on the right is understood as value-wise application
of (14.13).
Pitfall: The converse of Prop.1 is not true: A mapping can have
partial gradients without being differentiable. For example, the mapping
φ : ℝ × ℝ → ℝ, defined by

    φ(s, t) :=  { st/(s² + t²)   if (s, t) ≠ (0, 0)
               { 0              if (s, t) = (0, 0),

has partial derivatives at (0, 0) since both φ(·, 0) and φ(0, ·) are equal to the
constant 0. Since φ(t, t) = 1/2 for t ∈ ℝ^× but φ(0, 0) = 0, it is clear that φ is
not even continuous at (0, 0), let alone differentiable.
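A short numerical companion to this Pitfall (my own sketch, not part of the text): both partial difference quotients of φ at (0, 0) vanish, yet φ takes the value 1/2 arbitrarily close to (0, 0) along the diagonal.

```python
def phi(s, t):
    return 0.0 if (s, t) == (0.0, 0.0) else s * t / (s * s + t * t)

for h in (1e-1, 1e-3, 1e-6):
    # partial difference quotients at (0, 0): both are exactly 0
    d1 = (phi(h, 0.0) - phi(0.0, 0.0)) / h
    d2 = (phi(0.0, h) - phi(0.0, 0.0)) / h
    print(f"h = {h:.0e}   partials = ({d1}, {d2})   phi(h, h) = {phi(h, h)}")
# phi(h, h) stays at 0.5 however small h is, so phi is not continuous at (0, 0).
```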
The second assertion of Prop.1 shows that if φ is of class C¹, then φ_{(1)}
and φ_{(2)} exist and are continuous. The converse of this statement is true,
but the proof is highly nontrivial:
Proposition 2: Let ℰ₁, ℰ₂, ℰ′ be flat spaces. Let 𝒟 be an open subset of
ℰ₁ × ℰ₂ and 𝒟′ an open subset of ℰ′. A mapping φ : 𝒟 → 𝒟′ is of class C¹ if
(and only if) the partial gradients φ_{(1)} and φ_{(2)} exist and are continuous.
Proof: Let (x₁, x₂) ∈ 𝒟. Since 𝒟 − (x₁, x₂) is a neighborhood of (0, 0)
in 𝒱₁ × 𝒱₂, we may choose 𝒜₁ ∈ Nhd₀(𝒱₁) and 𝒜₂ ∈ Nhd₀(𝒱₂) such that
𝒜 := 𝒜₁ × 𝒜₂ ⊂ (𝒟 − (x₁, x₂)).

We define m : 𝒜 → 𝒱′ by

    m(v₁, v₂) := φ((x₁, x₂) + (v₁, v₂)) − φ(x₁, x₂)
                 − (φ_{(1)}(x₁, x₂)v₁ + φ_{(2)}(x₁, x₂)v₂).

It suffices to show that m ∈ Small(𝒱₁ × 𝒱₂, 𝒱′), for if this is the case, then
the Characterization of Gradients of Sect.63 tells us that φ is differentiable
at (x₁, x₂) and that its gradient at (x₁, x₂) is given by (65.4). In addition,
since (x₁, x₂) ∈ 𝒟 is arbitrary, we can then conclude that (65.5) holds and,
since both φ_{(1)} and φ_{(2)} are continuous, that ∇φ is continuous.

We note that m = h + k, where h : 𝒜 → 𝒱′ and k : 𝒜 → 𝒱′ are defined
by

    h(v₁, v₂) := φ(x₁, x₂ + v₂) − φ(x₁, x₂) − φ_{(2)}(x₁, x₂)v₂,
    k(v₁, v₂) := φ(x₁ + v₁, x₂ + v₂) − φ(x₁, x₂ + v₂) − φ_{(1)}(x₁, x₂)v₁.    (65.6)
The differentiability of φ(x₁, ·) at x₂ insures that h belongs to
Small(𝒱₁ × 𝒱₂, 𝒱′) and it only remains to be shown that k belongs to
Small(𝒱₁ × 𝒱₂, 𝒱′).

We choose norms ν₁, ν₂, and ν′ on 𝒱₁, 𝒱₂ and 𝒱′, respectively. Let
ε ∈ ℙ^× be given. Since φ_{(1)} is continuous at (x₁, x₂), there are open convex
neighborhoods 𝒩₁ and 𝒩₂ of 0 in 𝒱₁ and 𝒱₂, respectively, such that 𝒩₁ ⊂ 𝒜₁,
𝒩₂ ⊂ 𝒜₂, and

    || φ_{(1)}((x₁, x₂) + (v₁, v₂)) − φ_{(1)}(x₁, x₂) ||_{ν₁,ν′} < ε

for all (v₁, v₂) ∈ 𝒩₁ × 𝒩₂. Noting (65.6), we see that this is equivalent to

    || ∇_{v₁} k(·, v₂) ||_{ν₁,ν′} < ε  for all v₁ ∈ 𝒩₁, v₂ ∈ 𝒩₂.

Applying the Striction Estimate of Sect.64 to k(·, v₂)|_{𝒩₁}, we infer that
str(k(·, v₂)|_{𝒩₁}; ν₁, ν′) ≤ ε. It is clear from (65.6) that k(0, v₂) = 0 for
all v₂ ∈ 𝒩₂. Hence we can conclude, in view of (64.1), that

    ν′(k(v₁, v₂)) ≤ εν₁(v₁) ≤ ε max{ν₁(v₁), ν₂(v₂)}  for all v₁ ∈ 𝒩₁, v₂ ∈ 𝒩₂.

Since ε ∈ ℙ^× was arbitrary, it follows that

    lim_{(v₁,v₂)→(0,0)} ν′(k(v₁, v₂)) / max{ν₁(v₁), ν₂(v₂)} = 0,

which, in view of (62.1) and (51.23), shows that k ∈ Small(𝒱₁ × 𝒱₂, 𝒱′) as
required.

Remark: In examining the proof above one observes that one needs
merely the existence of the partial gradients and the continuity of one of
them in order to conclude that the mapping is differentiable.
We now generalize Prop.2 to the case when ℰ := ⨉(ℰ_i | i ∈ I) is the
set-product of a finite family (ℰ_i | i ∈ I) of flat spaces. This product ℰ is a
flat space whose translation space is the set-product 𝒱 := ⨉(𝒱_i | i ∈ I) of
the translation spaces 𝒱_i of the ℰ_i, i ∈ I. Given any x ∈ ℰ and j ∈ I, we
define the mapping (x.j) : ℰ_j → ℰ according to (04.24).

We consider a mapping φ : 𝒟 → 𝒟′ from an open subset 𝒟 of ℰ into an
open subset 𝒟′ of ℰ′. Given any x ∈ 𝒟 and j ∈ I, we put

    𝒟_{(x.j)} := (x.j)^<(𝒟) ⊂ ℰ_j                                            (65.7)

and define φ(x.j) : 𝒟_{(x.j)} → 𝒟′ according to (04.25). If φ(x.j) is
differentiable at x_j for all x ∈ 𝒟, we define the partial j-gradient
φ_{(j)} : 𝒟 → Lin(𝒱_j, 𝒱′) of φ by

    φ_{(j)}(x) := ∇_{x_j} φ(x.j)  for all x ∈ 𝒟.

In the special case when ℰ_j := ℝ for some j ∈ I, the partial j-derivative
φ_{,j} : 𝒟 → 𝒱′, defined by

    φ_{,j}(x) := (∂φ(x.j))(x_j)  for all x ∈ 𝒟,                               (65.8)

is related to φ_{(j)} by φ_{,j} = (φ_{(j)})1 (value-wise).

In the case when I := {1, 2}, these notations and concepts reduce to the
ones explained in the beginning (see (04.21)).
The following result generalizes Prop.1.

Proposition 3: If φ : 𝒟 → 𝒟′ is differentiable at x = (x_i | i ∈ I) ∈ 𝒟,
then the gradients ∇_{x_j} φ(x.j) exist for all j ∈ I and

    (∇_x φ)v = Σ_{j∈I} (∇_{x_j} φ(x.j))v_j                                    (65.9)

for all v = (v_i | i ∈ I) ∈ 𝒱.

If φ is differentiable, then the partial gradients φ_{(j)} exist for all j ∈ I
and we have

    ∇φ = Σ_{j∈I} φ_{(j)},                                                     (65.10)

where the operation Σ on the right is understood as value-wise application
of (14.18).
In the case when ℰ := ℝ^I, we can put ℰ_i := ℝ for all i ∈ I and (65.10)
reduces to

    ∇φ = lnc_{(φ_{,i} | i∈I)},                                                (65.11)

where the value at x ∈ 𝒟 of the right side is understood to be the linear-combination
mapping of the family (φ_{,i}(x) | i ∈ I) in 𝒱′. If, moreover, ℰ′
is also a space of the form ℰ′ := ℝ^K with some finite index set K, then
lnc_{(φ_{,i} | i∈I)} can be identified with the function which assigns to each x ∈ 𝒟
the matrix

    (φ_{k,i}(x) | (k, i) ∈ K × I) ∈ ℝ^{K×I} ≅ Lin(ℝ^I, ℝ^K).

Hence ∇φ can be identified with the matrix

    ∇φ = (φ_{k,i} | (k, i) ∈ K × I)                                            (65.12)

of the partial derivatives φ_{k,i} : 𝒟 → ℝ of the component functions. As we
have seen, the mere existence of these partial derivatives is not sufficient
to insure the existence of ∇φ. Only if ∇φ is known a priori to exist is it
possible to use the identification (65.12).
Using the natural isomorphism between ⨉(ℰ_i | i ∈ I) and
ℰ_j × ⨉(ℰ_i | i ∈ I \ {j}) and Prop.2, one easily proves, by induction, the
following generalization.

Partial Gradient Theorem: Let I be a finite index set and let ℰ_i,
i ∈ I, and ℰ′ be flat spaces. Let 𝒟 be an open subset of ℰ := ⨉(ℰ_i | i ∈ I) and
𝒟′ an open subset of ℰ′. A mapping φ : 𝒟 → 𝒟′ is of class C¹ if and only
if the partial gradients φ_{(i)} exist and are continuous for all i ∈ I.
The following result is an immediate corollary:

Proposition 4: Let I and K be finite index sets and let 𝒟 and 𝒟′ be
open subsets of ℝ^I and ℝ^K, respectively. A mapping φ : 𝒟 → 𝒟′ is of
class C¹ if and only if the partial derivatives φ_{k,i} : 𝒟 → ℝ exist and are
continuous for all k ∈ K and all i ∈ I.
Let ℰ, ℰ′ be flat spaces with translation spaces 𝒱 and 𝒱′, respectively.
Let 𝒟, 𝒟′ be open subsets of ℰ and ℰ′, respectively, and consider a mapping
φ : 𝒟 → 𝒟′. Given any x ∈ 𝒟 and v ∈ 𝒱^×, the range of the mapping
(s ↦ x + sv) : ℝ → ℰ is a line through x whose direction space is ℝv.
Let S_{x,v} := {s ∈ ℝ | x + sv ∈ 𝒟} be the pre-image of 𝒟 under this
mapping and x + 1_{S_{x,v}}v : S_{x,v} → 𝒟 a corresponding adjustment of the
mapping. Since 𝒟 is open, S_{x,v} is an open neighborhood of zero in ℝ and
φ ∘ (x + 1_{S_{x,v}}v) : S_{x,v} → 𝒟′ is a process. The derivative at 0 ∈ ℝ of this
process, if it exists, is called the directional derivative of φ at x and is
denoted by

    (dd_v φ)(x) := ∂₀(φ ∘ (x + 1_{S_{x,v}}v)) = lim_{s→0} (φ(x + sv) − φ(x))/s.   (65.13)
If this directional derivative exists for all x ∈ 𝒟, it defines a function

    dd_v φ : 𝒟 → 𝒱′,

which is called the directional v-derivative of φ. The following result is
immediate from the General Chain Rule:

Proposition 5: If φ : 𝒟 → 𝒟′ is differentiable at x ∈ 𝒟 then the
directional v-derivative of φ at x exists for all v ∈ 𝒱 and is given by

    (dd_v φ)(x) = (∇_x φ)v.                                                   (65.14)
Pitfall: The converse of this Proposition is false. In fact, a mapping
can have directional derivatives in all directions at all points in its domain
without being differentiable. For example, the mapping φ : ℝ² → ℝ defined
by

    φ(s, t) :=  { s²t/(s⁴ + t²)   if (s, t) ≠ (0, 0)
               { 0               if (s, t) = (0, 0)

has the directional derivatives

    (dd_{(λ,μ)} φ)(0, 0) =  { λ²/μ   if μ ≠ 0
                           { 0      if μ = 0

at (0, 0). Using Prop.4 one easily shows that φ|_{ℝ²\{(0,0)}} is of class C¹ and
hence, by Prop.5, φ has directional derivatives in all directions at all (s, t) ∈
ℝ². Nevertheless, since φ(s, s²) = 1/2 for all s ∈ ℝ^×, φ is not even continuous
at (0, 0), let alone differentiable.
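A numerical companion to this Pitfall (my own sketch; the direction (λ, μ) is an arbitrary choice): the difference quotients defining (dd_{(λ,μ)}φ)(0, 0) settle down to λ²/μ, while along the parabola t = s² the function keeps the value 1/2.

```python
def phi(s, t):
    return 0.0 if (s, t) == (0.0, 0.0) else s * s * t / (s ** 4 + t * t)

lam, mu = 1.0, 1.0
for h in (1e-1, 1e-3, 1e-5):
    dq = (phi(h * lam, h * mu) - phi(0.0, 0.0)) / h   # difference quotient of (65.13)
    print(f"h = {h:.0e}   quotient = {dq:.6f}   phi(h, h^2) = {phi(h, h * h)}")
# The quotients approach lam**2 / mu = 1.0, yet phi(h, h^2) = 0.5 for every h != 0,
# so phi is not continuous at (0, 0) despite having all directional derivatives there.
```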
Proposition 6: Let 𝔟 be a set basis of 𝒱. Then φ : 𝒟 → 𝒟′ is of class
C¹ if and only if the directional b-derivatives dd_b φ : 𝒟 → 𝒱′ exist and are
continuous for all b ∈ 𝔟.

Proof: We choose q ∈ 𝒟 and define α : ℝ^𝔟 → ℰ by
α(λ) := q + Σ_{b∈𝔟} λ_b b for all λ ∈ ℝ^𝔟. Since 𝔟 is a basis, α is a flat
isomorphism. If we define 𝒟̄ := α^<(𝒟) ⊂ ℝ^𝔟 and φ̄ := φ ∘ α|_{𝒟̄} : 𝒟̄ → 𝒟′, we
see that the directional b-derivatives of φ correspond to the partial derivatives
of φ̄. The assertion follows from the Partial Gradient Theorem, applied
to the case when I := 𝔟 and ℰ_i := ℝ for all i ∈ I.
Combining Prop.5 and Prop.6, we obtain

Proposition 7: Assume that 𝒮 ∈ Sub𝒱 spans 𝒱. The mapping
φ : 𝒟 → 𝒟′ is of class C¹ if and only if the directional derivatives dd_v φ :
𝒟 → 𝒱′ exist and are continuous for all v ∈ 𝒮.
Notes 65

(1) The notation φ_{(i)} for the partial i-gradient is introduced here for the first time. I
am not aware of any other notation in the literature.

(2) The notation φ_{,i} for the partial i-derivative of φ is the only one that occurs
frequently in the literature and is not objectionable. Frequently seen notations such
as ∂φ/∂x_i or ∂_{x_i}φ for φ_{,i} are poison to me because they contain dangling dummies
(see Part D of the Introduction).

(3) Some people use the notation ∇_v φ for the directional derivative dd_v φ. I am
introducing dd_v φ because (by Def.1 of Sect.63) ∇_v φ means something else, namely the
gradient of φ at v.
66 The General Product Rule

The following result shows that bilinear mappings (see Sect.24) are of class
C¹ and hence continuous. (They are not uniformly continuous except when
zero.)

Proposition 1: Let 𝒱₁, 𝒱₂ and 𝒲 be linear spaces. Every bilinear
mapping B : 𝒱₁ × 𝒱₂ → 𝒲 is of class C¹ and its gradient
∇B : 𝒱₁ × 𝒱₂ → Lin(𝒱₁ × 𝒱₂, 𝒲) is the linear mapping whose value at
(v₁, v₂) is given by

    (∇B(v₁, v₂))(u₁, u₂) = (B^∼v₂)u₁ + (Bv₁)u₂                               (66.1)

for all v₁, u₁ ∈ 𝒱₁ and v₂, u₂ ∈ 𝒱₂, where Bv₁ := B(v₁, ·) and B^∼v₂ := B(·, v₂).
Proof: Since B(v₁, ·) = Bv₁ : 𝒱₂ → 𝒲 (see (24.2)) is linear for each
v₁ ∈ 𝒱₁, the partial 2-gradient of B exists and is given by B_{(2)}(v₁, v₂) =
Bv₁ for all (v₁, v₂) ∈ 𝒱₁ × 𝒱₂. It is evident that B_{(2)} : 𝒱₁ × 𝒱₂ →
Lin(𝒱₂, 𝒲) is linear and hence continuous. A similar argument shows that
B_{(1)} is given by B_{(1)}(v₁, v₂) = B^∼v₂ for all (v₁, v₂) ∈ 𝒱₁ × 𝒱₂ and hence
is also linear and continuous. By the Partial Gradient Theorem of Sect.65, it
follows that B is of class C¹. The formula (66.1) is a consequence of (65.5).
Let 𝒟 be an open set in a flat space ℰ with translation space 𝒱. If
h₁ : 𝒟 → 𝒱₁, h₂ : 𝒟 → 𝒱₂ and B ∈ Lin₂(𝒱₁ × 𝒱₂, 𝒲) we write

    B(h₁, h₂) := B ∘ (h₁, h₂) : 𝒟 → 𝒲,

so that

    B(h₁, h₂)(x) = B(h₁(x), h₂(x))  for all x ∈ 𝒟.
The following theorem, which follows directly from Prop.1 and the General
Chain Rule of Sect.63, is called "Product Rule" because the term "product"
is often used for the bilinear forms to which the theorem is applied.

General Product Rule: Let 𝒱₁, 𝒱₂ and 𝒲 be linear spaces and let
B ∈ Lin₂(𝒱₁ × 𝒱₂, 𝒲) be given. Let 𝒟 be an open subset of a flat space and
let mappings h₁ : 𝒟 → 𝒱₁ and h₂ : 𝒟 → 𝒱₂ be given. If h₁ and h₂ are both
differentiable at x ∈ 𝒟, so is B(h₁, h₂), and we have

    ∇_x B(h₁, h₂) = (Bh₁(x))∇_x h₂ + (B^∼h₂(x))∇_x h₁.                        (66.2)

If h₁ and h₂ are both differentiable [of class C¹], so is B(h₁, h₂), and we
have

    ∇B(h₁, h₂) = (Bh₁)∇h₂ + (B^∼h₂)∇h₁,                                       (66.3)

where the products on the right are understood as value-wise compositions.
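A finite-difference check of (66.2), offered as an illustration only; the bilinear mapping B (the dot product on ℝ²), the vector fields h₁, h₂, and the point x are my own choices. With this B, both terms of (66.2) are row-vector-times-matrix products.

```python
import math

def B(a, b):                       # bilinear mapping: the dot product on R^2
    return a[0] * b[0] + a[1] * b[1]

def h1(x):                         # h1 : R^2 -> R^2
    return [x[0] * x[1], math.sin(x[1])]

def h2(x):                         # h2 : R^2 -> R^2
    return [math.exp(x[0]), x[0] - x[1]]

def grad_scalar(f, x, h=1e-6):     # gradient (as a row) of a scalar field
    return [(f([x[0] + h, x[1]]) - f([x[0] - h, x[1]])) / (2 * h),
            (f([x[0], x[1] + h]) - f([x[0], x[1] - h])) / (2 * h)]

def grad_vector(g, x, h=1e-6):     # 2x2 Jacobian of a vector field
    cols = []
    for j in range(2):
        xp, xm = list(x), list(x)
        xp[j] += h
        xm[j] -= h
        gp, gm = g(xp), g(xm)
        cols.append([(gp[i] - gm[i]) / (2 * h) for i in range(2)])
    return [[cols[j][i] for j in range(2)] for i in range(2)]

x = [0.3, -0.7]
lhs = grad_scalar(lambda z: B(h1(z), h2(z)), x)
J1, J2 = grad_vector(h1, x), grad_vector(h2, x)
rhs = [sum(h1(x)[i] * J2[i][j] for i in range(2)) +
       sum(h2(x)[i] * J1[i][j] for i in range(2)) for j in range(2)]
print(lhs)
print(rhs)   # the two rows agree up to finite-difference error
```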
We now apply the General Product Rule to special bilinear mappings,
namely to the ordinary product in ℝ and to the four examples given in
Sect.24. In the following list of results 𝒟 denotes an open subset of a flat
space having 𝒱 as its translation space; 𝒲, 𝒲′ and 𝒲″ denote linear spaces.

(I) If f : 𝒟 → ℝ and g : 𝒟 → ℝ are differentiable [of class C¹], so is the
value-wise product fg : 𝒟 → ℝ and we have

    ∇(fg) = f∇g + g∇f.                                                        (66.4)

(II) If f : 𝒟 → ℝ and h : 𝒟 → 𝒲 are differentiable [of class C¹], so is the
value-wise scalar multiple fh : 𝒟 → 𝒲, and we have

    ∇(fh) = h ⊗ ∇f + f∇h.                                                     (66.5)

(III) If h : 𝒟 → 𝒲 and ω : 𝒟 → 𝒲* are differentiable [of class C¹], so is the
function ωh : 𝒟 → ℝ defined by (ωh)(x) := ω(x)h(x) for all x ∈ 𝒟,
and we have

    ∇(ωh) = (∇h)^⊤ω + (∇ω)^⊤h.                                                 (66.6)

(IV) If F : 𝒟 → Lin(𝒲, 𝒲′) and h : 𝒟 → 𝒲 are differentiable at x ∈ 𝒟, so
is Fh (defined by (Fh)(y) := F(y)h(y) for all y ∈ 𝒟) and we have

    ∇_x(Fh)v = ((∇_x F)v)h(x) + (F(x)∇_x h)v                                    (66.7)

for all v ∈ 𝒱. If F and h are differentiable [of class C¹], so is Fh and
(66.7) holds for all x ∈ 𝒟, v ∈ 𝒱.

(V) If F : 𝒟 → Lin(𝒲, 𝒲′) and G : 𝒟 → Lin(𝒲′, 𝒲″) are differentiable
at x ∈ 𝒟, so is GF, defined by value-wise composition, and we have

    ∇_x(GF)v = ((∇_x G)v)F(x) + G(x)((∇_x F)v)                                  (66.8)

for all v ∈ 𝒱. If F and G are differentiable [of class C¹], so is GF, and
(66.8) holds for all x ∈ 𝒟, v ∈ 𝒱.

If 𝒲 is an inner-product space, then 𝒲* can be identified with 𝒲 (see
Sect.41) and (III) reduces to the following result.

(VI) If h, k : 𝒟 → 𝒲 are differentiable [of class C¹], so is their value-wise
inner product h · k and

    ∇(h · k) = (∇h)^⊤k + (∇k)^⊤h.                                                (66.9)
In the case when 𝒟 reduces to an interval in ℝ, (66.4) becomes
the Product Rule (08.32)₂ of elementary calculus. The following formulas
apply to differentiable processes f, h, ω, F, and G with values in
ℝ, 𝒲, 𝒲*, Lin(𝒲, 𝒲′), and Lin(𝒲′, 𝒲″), respectively:

    (fh)˙ = f˙h + fh˙,                                                           (66.10)
    (ωh)˙ = ω˙h + ωh˙,                                                           (66.11)
    (Fh)˙ = F˙h + Fh˙,                                                           (66.12)
    (GF)˙ = G˙F + GF˙.                                                           (66.13)

If h and k are differentiable processes with values in an inner product space,
then

    (h · k)˙ = h˙ · k + h · k˙.                                                  (66.14)
Proposition 2: Let 𝒲 be an inner-product space and let R : 𝒟 → Lin𝒲
be given. If R is differentiable at x ∈ 𝒟 and if Rng R ⊂ Orth𝒲 then

    R(x)^⊤((∇_x R)v) ∈ Skew𝒲  for all v ∈ 𝒱.                                     (66.15)

Conversely, if R is differentiable, if 𝒟 is convex, if (66.15) holds for all
x ∈ 𝒟, and if R(q) ∈ Orth𝒲 for some q ∈ 𝒟, then Rng R ⊂ Orth𝒲.

Proof: Assume that R is differentiable at x ∈ 𝒟. Using (66.8) with the
choices F := R and G := R^⊤ we find that

    ∇_x(R^⊤R)v = ((∇_x R^⊤)v)R(x) + R^⊤(x)((∇_x R)v)

holds for all v ∈ 𝒱. By (63.13) we obtain

    ∇_x(R^⊤R)v = W^⊤ + W  with  W := R^⊤(x)((∇_x R)v)                             (66.16)

for all v ∈ 𝒱. Now, if Rng R ⊂ Orth𝒲 then R^⊤R = 1_𝒲 is constant (see
Prop.2 of Sect.43). Hence the left side of (66.16) is zero and W must be
skew for all v ∈ 𝒱, i.e. (66.15) holds.

Conversely, if (66.15) holds for all x ∈ 𝒟, then, by (66.16), ∇_x(R^⊤R) = 0
for all x ∈ 𝒟. Since 𝒟 is convex, we can apply Prop.3 of Sect.64 to conclude
that R^⊤R must be constant. Hence, if R(q) ∈ Orth𝒲, i.e. if (R^⊤R)(q) =
1_𝒲, then R^⊤R is the constant 1_𝒲, i.e. Rng R ⊂ Orth𝒲.

Remark: In the second part of Prop.2, the condition that 𝒟 be convex
can be replaced by the one that 𝒟 be connected, as explained in the
Remark after Prop.3 of Sect.64.

Corollary: Let I be an open interval and let R : I → Lin𝒲 be a
differentiable process. Then Rng R ⊂ Orth𝒲 if and only if R(t) ∈ Orth𝒲
for some t ∈ I and Rng (R^⊤R˙) ⊂ Skew𝒲.
The following result shows that quadratic forms (see Sect.27) are of class
C¹ and hence continuous.

Proposition 3: Let 𝒱 be a linear space. Every quadratic form Q : 𝒱 →
ℝ is of class C¹ and its gradient ∇Q : 𝒱 → 𝒱* is the linear mapping

    ∇Q = 2 Q_○,                                                                   (66.17)

where Q_○ ∈ Sym₂(𝒱², ℝ) ≅ Sym(𝒱, 𝒱*) denotes the symmetric bilinear form
associated with Q (see (27.14)).

Proof: By (27.14), we have

    Q(u) = Q_○(u, u) = (Q_○ ∘ (1_𝒱, 1_𝒱))(u)

for all u ∈ 𝒱. Hence, if we apply the General Product Rule with the choices
B := Q_○ and h₁ := h₂ := 1_𝒱, we obtain ∇Q = Q_○ + Q_○^∼. Since Q_○ is
symmetric, this reduces to (66.17).
Let 𝒱 be a linear space. The lineonic nth power pow_n : Lin𝒱 → Lin𝒱
on the algebra Lin𝒱 of lineons (see Sect.18) is defined by

    pow_n(L) := L^n  for all n ∈ ℕ.                                               (66.18)

The following result is a generalization of the familiar differentiation rule
(ι^n)˙ = nι^{n−1}.
Proposition 4: For every n ∈ ℕ the lineonic nth power pow_n on
Lin𝒱 defined by (66.18) is of class C¹ and, for each L ∈ Lin𝒱, its gradient
∇_L pow_n ∈ Lin(Lin𝒱) at L ∈ Lin𝒱 is given by

    (∇_L pow_n)M = Σ_{k∈n^]} L^{k−1}ML^{n−k}  for all M ∈ Lin𝒱.                  (66.19)

Proof: For n = 0 the assertion is trivial. For n = 1, we have pow₁ =
1_{Lin𝒱} and hence ∇_L pow₁ = 1_{Lin𝒱} for all L ∈ Lin𝒱, which is consistent with
(66.19). Also, since ∇pow₁ is constant, pow₁ is of class C¹. Assume, then,
that the assertion is valid for a given n ∈ ℕ. Since pow_{n+1} = pow₁ pow_n
holds in terms of value-wise composition, we can apply the result (V) above
to the case when F := pow_n, G := pow₁ in order to conclude that pow_{n+1}
is of class C¹ and that

    (∇_L pow_{n+1})M = ((∇_L pow₁)M)pow_n(L) + pow₁(L)((∇_L pow_n)M)

for all M ∈ Lin𝒱. Using (66.18) and (66.19) we get

    (∇_L pow_{n+1})M = ML^n + L(Σ_{k∈n^]} L^{k−1}ML^{n−k}),

which shows that (66.19) remains valid when n is replaced by n + 1. The
desired result follows by induction.

If M commutes with L, then (66.19) reduces to

    (∇_L pow_n)M = (nL^{n−1})M.
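Formula (66.19) can be tested numerically. The sketch below is my own illustration, with Lin𝒱 realized as 2×2 matrices and n = 3; it compares a central difference of pow_n along a direction M with the sum Σ_k L^{k−1}ML^{n−k}.

```python
def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)] for i in range(2)]

def add(A, B, c=1.0):   # A + c*B
    return [[A[i][j] + c * B[i][j] for j in range(2)] for i in range(2)]

def power(A, n):
    P = [[1.0, 0.0], [0.0, 1.0]]
    for _ in range(n):
        P = matmul(P, A)
    return P

L = [[0.5, -1.0], [2.0, 0.3]]
M = [[1.0, 0.2], [-0.4, 0.7]]
n, h = 3, 1e-6

# central difference of pow_n at L in the direction M
diff = add(power(add(L, M, h), n), power(add(L, M, -h), n), -1.0)
lhs = [[diff[i][j] / (2 * h) for j in range(2)] for i in range(2)]

# right side of (66.19): sum over k in {1, ..., n} of L^{k-1} M L^{n-k}
rhs = [[0.0, 0.0], [0.0, 0.0]]
for k in range(1, n + 1):
    rhs = add(rhs, matmul(matmul(power(L, k - 1), M), power(L, n - k)))

print(lhs)
print(rhs)   # the matrices agree up to the finite-difference error
```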
Using Prop.4 and the form (63.17) of the Chain Rule, we obtain

Proposition 5: Let I be an interval and let F : I → Lin𝒱 be a process. If
F is differentiable [of class C¹], so is its value-wise nth power F^n : I → Lin𝒱,
and we have

    (F^n)˙ = Σ_{k∈n^]} F^{k−1}F˙F^{n−k}  for all n ∈ ℕ.                           (66.20)
67 Divergence, Laplacian

In this section, 𝒟 denotes an open subset of a flat space ℰ with translation
space 𝒱, and 𝒲 denotes a linear space.

If h : 𝒟 → 𝒱 is differentiable at x ∈ 𝒟, we can form the trace (see
Sect.26) of the gradient ∇_x h ∈ Lin𝒱. The result

    div_x h := tr(∇_x h)                                                          (67.1)

is called the divergence of h at x. If h is differentiable we can define the
divergence div h : 𝒟 → ℝ of h by

    (div h)(x) := div_x h  for all x ∈ 𝒟.                                         (67.2)

Using the product rule (66.5) and (26.3) we obtain

Proposition 1: Let h : 𝒟 → 𝒱 and f : 𝒟 → ℝ be differentiable. Then
the divergence of fh : 𝒟 → 𝒱 is given by

    div (fh) = (∇f)h + f div h.                                                   (67.3)
Consider now a mapping H : 𝒟 → Lin(𝒱*, 𝒲) that is differentiable
at x ∈ 𝒟. For every ω ∈ 𝒲* we can form the value-wise composite
ωH : 𝒟 → Lin(𝒱*, ℝ) = 𝒱** ≅ 𝒱 (see Sect.22). Since ωH is differentiable
at x for every ω ∈ 𝒲* (see Prop.3 of Sect.63) we may consider the
mapping

    (ω ↦ tr(∇_x(ωH))) : 𝒲* → ℝ.

It is clear that this mapping is linear and hence an element of 𝒲** ≅ 𝒲.

Definition 1: Let H : 𝒟 → Lin(𝒱*, 𝒲) be differentiable at x ∈ 𝒟. Then
the divergence of H at x is defined to be the (unique) element div_x H of
𝒲 which satisfies

    ω(div_x H) = tr(∇_x(ωH))  for all ω ∈ 𝒲*.                                     (67.4)

If H is differentiable, then its divergence div H : 𝒟 → 𝒲 is defined by

    (div H)(x) := div_x H  for all x ∈ 𝒟.                                         (67.5)

In the case when 𝒲 := ℝ, we also have 𝒲* ≅ ℝ and Lin(𝒱*, 𝒲) =
𝒱** ≅ 𝒱. Thus, using (67.4) with ω := 1 ∈ ℝ, we see that the definition
just given is consistent with (67.1) and (67.2).
If we replace h and L in Prop.3 of Sect.63 by H and
(K ↦ ωK) ∈ Lin(Lin(𝒱*, 𝒲), 𝒱), respectively, we obtain

    ∇_x(ωH) = ω∇_x H  for all ω ∈ 𝒲*,                                             (67.6)

where the right side must be interpreted as the composite of
∇_x H ∈ Lin₂(𝒱 × 𝒱*, 𝒲) with ω ∈ Lin(𝒲, ℝ). This composite, being an
element of Lin₂(𝒱 × 𝒱*, ℝ), must be reinterpreted as an element of Lin𝒱 via the
identifications Lin₂(𝒱 × 𝒱*, ℝ) ≅ Lin(𝒱, Lin(𝒱*, ℝ)) = Lin(𝒱, 𝒱**) ≅ Lin𝒱
to make (67.6) meaningful. Using (67.6), (67.4), and (67.1) we see that
div H satisfies

    ω(div_x H) = div_x(ωH) = tr(ω∇_x H)                                            (67.7)

for all ω ∈ 𝒲*.
The following results generalize Prop.1.

Proposition 2: Let H : 𝒟 → Lin(𝒱*, 𝒲) and ω : 𝒟 → 𝒲*
be differentiable. Then the divergence of the value-wise composite
ωH : 𝒟 → Lin(𝒱*, ℝ) ≅ 𝒱 is given by

    div (ωH) = ω div H + tr(H^⊤∇ω),                                                (67.8)

where value-wise evaluation and composition are understood.

Proof: Let x ∈ 𝒟 and v ∈ 𝒱 be given. Using (66.8) with the choices
G := ω and F := H we obtain

    ∇_x(ωH)v = ((∇_x ω)v)H(x) + ω(x)((∇_x H)v).                                    (67.9)

Since (∇_x ω)v ∈ 𝒲*, (21.3) gives

    ((∇_x ω)v)H(x) = (H(x)^⊤∇_x ω)v.

On the other hand, we have

    ω(x)((∇_x H)v) = (ω(x)∇_x H)v

if we interpret ∇_x H on the right as an element of Lin₂(𝒱 × 𝒱*, 𝒲). Hence,
since v ∈ 𝒱 was arbitrary, (67.9) gives

    ∇_x(ωH) = ω(x)∇_x H + H(x)^⊤∇_x ω.

Taking the trace, using (67.1), and using (67.7) with ω := ω(x), we get

    div_x(ωH) = ω(x) div_x H + tr(H(x)^⊤∇_x ω).

Since x ∈ 𝒟 was arbitrary, the desired result (67.8) follows.
Proposition 3: Let h : 𝒟 → 𝒱 and k : 𝒟 → 𝒲 be differentiable. The
divergence of k ⊗ h : 𝒟 → Lin(𝒱*, 𝒲), defined by taking the value-wise
tensor product (see Sect.25), is then given by

    div (k ⊗ h) = (∇k)h + (div h)k.                                                 (67.10)

Proof: By (67.7) we have

    ω(div_x(k ⊗ h)) = div_x(ω(k ⊗ h)) = div_x((ωk)h)

for all x ∈ 𝒟 and all ω ∈ 𝒲*. Hence, by Prop.1,

    div (ω(k ⊗ h)) = (∇(ωk))h + (ωk) div h = ω((∇k)h + k div h)

holds for all ω ∈ 𝒲*, and (67.10) follows.
Proposition 4: Let H : 𝒟 → Lin(𝒱*, 𝒲) and f : 𝒟 → ℝ be differentiable.
The divergence of fH : 𝒟 → Lin(𝒱*, 𝒲) is then given by

    div (fH) = f div H + H(∇f).                                                     (67.11)

Proof: Let ω ∈ 𝒲* be given. If we apply Prop.2 to the case when
ω is replaced by fω we obtain

    div ((fω)H) = (fω) div H + tr(H^⊤∇(fω)).                                        (67.12)

Using ∇(fω) = ω ⊗ ∇f and using (25.9), (26.3), and (22.3), we obtain
tr(H^⊤(ω ⊗ ∇f)) = tr((H^⊤ω) ⊗ ∇f) = (∇f)(H^⊤ω) = ω(H∇f). Hence (67.12)
and (67.7) yield div (ω(fH)) = ω(f div H + H∇f). Since ω ∈ 𝒲* was
arbitrary, (67.11) follows.
From now on we assume that ℰ has the structure of a Euclidean space,
so that 𝒱 becomes an inner-product space and we can use the identification
𝒱* ≅ 𝒱 (see Sect.41). Thus, if k : 𝒟 → 𝒲 is twice differentiable, we can
consider the divergence of ∇k : 𝒟 → Lin(𝒱, 𝒲) ≅ Lin(𝒱*, 𝒲).

Definition 2: Let k : 𝒟 → 𝒲 be twice differentiable. Then the Laplacian
of k is defined to be

    Δk := div (∇k).                                                                 (67.13)

If Δk = 0, then k is called a harmonic function.

Remark: In the case when ℰ is not a genuine Euclidean space, but one
whose translation space has index 1 (see Sect.47), the term "D'Alembertian"
or "Wave-Operator" and the symbol □ are often used instead of "Laplacian"
and Δ.
The following result is a direct consequence of (66.5) and Props.3 and 4.

Proposition 5: Let k : 𝒟 → 𝒲 and f : 𝒟 → ℝ be twice differentiable.
The Laplacian of fk : 𝒟 → 𝒲 is then given by

    Δ(fk) = (Δf)k + fΔk + 2(∇k)(∇f).                                                (67.14)

The following result follows directly from the form (63.19) of the Chain
Rule and Prop.3.

Proposition 6: Let I be an open interval and let f : 𝒟 → I and
g : I → 𝒲 both be twice differentiable. The Laplacian of g ∘ f : 𝒟 → 𝒲 is
then given by

    Δ(g ∘ f) = (∇f)²(g¨ ∘ f) + (Δf)(g˙ ∘ f).                                        (67.15)
We now discuss an important application of Prop.6. We consider the
case when f : 𝒟 → I is given by

    f(x) := (x − q)²  for all x ∈ 𝒟,                                                (67.16)

where q ∈ ℰ is given. We wish to solve the following problem: How can 𝒟, I
and g : I → 𝒲 be chosen such that g ∘ f is harmonic?

It follows directly from (67.16) and Prop.3 of Sect.66, applied to Q := sq,
that ∇_x f = 2(x − q) for all x ∈ 𝒟 and that hence (∇f)² = 4f. Also, ∇(∇f)
is the constant with value 2·1_𝒱 and hence, by (26.9),

    Δf = div (∇f) = 2 tr 1_𝒱 = 2 dim 𝒱 = 2 dim ℰ.

Substitution of these results into (67.15) gives

    Δ(g ∘ f) = 4f(g¨ ∘ f) + 2(dim ℰ)(g˙ ∘ f).                                        (67.17)

Hence, g ∘ f is harmonic if and only if

    (2ιg¨ + ng˙) ∘ f = 0  with  n := dim ℰ,                                          (67.18)

where ι is the identity mapping of ℝ, suitably adjusted (see Sect.08). Now,
if we choose I := Rng f, then (67.18) is satisfied if and only if g satisfies the
ordinary differential equation 2ιg¨ + ng˙ = 0. It follows that g must be of
the form

    g =  { aι^{−(n/2 − 1)} + b   if n ≠ 2
         { a log + b            if n = 2,                                             (67.19)

where a, b ∈ 𝒲, provided that I is an interval.

If ℰ is a genuine Euclidean space, we may take 𝒟 := ℰ \ {q} and I :=
Rng f = ℙ^×. We then obtain the harmonic function

    h =  { ar^{−(n−2)} + b      if n ≠ 2
         { a(log ∘ r) + b       if n = 2,                                             (67.20)

where a, b ∈ 𝒲 and where r : ℰ \ {q} → ℙ^× is defined by r(x) := |x − q|.


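The harmonic functions (67.20) can be checked directly with a computer algebra system. The brief sketch below is our own illustration (with q at the origin and a = 1, b = 0), not part of the text; it treats the case n = 3 and the two-dimensional logarithmic case.

```python
# Check that r^(2-n) (n = 3) and log r (n = 2) are harmonic away from q.
import sympy as sp

x, y, z = sp.symbols('x y z', real=True)
r3 = sp.sqrt(x**2 + y**2 + z**2)
lap3 = sum(sp.diff(1 / r3, v, 2) for v in (x, y, z))
print(sp.simplify(lap3))                    # prints 0

u, v = sp.symbols('u v', real=True)
h2 = sp.log(sp.sqrt(u**2 + v**2))
print(sp.simplify(sp.diff(h2, u, 2) + sp.diff(h2, v, 2)))   # prints 0
```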
If the Euclidean space E is not genuine and V is double-signed we have two possibilities. For all n ∈ N× we may take D := {x ∈ E | (x − q)² > 0} and I := P×. If n is even and n ≠ 2, we may instead take D := {x ∈ E | (x − q)² < 0} and I := −P×.
Notes 67

(1) In some of the literature on Vector Analysis, the notation ∇·h instead of div h is used for the divergence of a vector field h, and the notation ∇²f instead of Δf for the Laplacian of a function f. These notations come from a formalistic and erroneous understanding of the meaning of the symbol ∇ and should be avoided.
68 Local Inversion, Implicit Mappings
In this section, D and D′ denote open subsets of flat spaces E and E′ with translation spaces V and V′, respectively.

To say that a mapping φ : D → D′ is differentiable at x ∈ D means, roughly, that φ may be approximated, near x, by a flat mapping, the tangent of φ at x. One might expect that if the tangent is invertible, then φ itself is, in some sense, "locally invertible" near x. To decide whether this expectation is justified, we must first give a precise meaning to "locally invertible near x".
Definition: Let a mapping φ : D → D′ be given. We say that ρ is a local inverse of φ if ρ = (φ|_N^{N′})← for suitable open subsets N = Cod ρ = Rng ρ and N′ = Dom ρ of D and D′, respectively. We say that ρ is a local inverse of φ near x ∈ D if x ∈ Rng ρ. We say that φ is locally invertible near x ∈ D if it has some local inverse near x.

To say that ρ is a local inverse of φ means that

    ρ ∘ (φ|_N^{N′}) = 1_N   and   (φ|_N^{N′}) ∘ ρ = 1_{N′}        (68.1)

for suitable open subsets N = Cod ρ and N′ = Dom ρ of D and D′, respectively.
If ρ₁ and ρ₂ are local inverses of φ and M := (Rng ρ₁) ∩ (Rng ρ₂), then ρ₁ and ρ₂ must agree on φ>(M) = ρ₁<(M) = ρ₂<(M), i.e.

    ρ₁|_{φ>(M)}^{M} = ρ₂|_{φ>(M)}^{M}   if M := (Rng ρ₁) ∩ (Rng ρ₂).        (68.2)
Pitfall: A mapping φ : D → D′ may be locally invertible near every point in D without being invertible, even if it is surjective. In fact, if ρ is a local inverse of φ, φ<(Dom ρ) need not be included in Rng ρ.
For example, the function f : ]−1,1[× → ]0,1[ defined by f(s) := |s| for all s ∈ ]−1,1[× is easily seen to be locally invertible, surjective, and even continuous. The identity function 1_{]0,1[} of ]0,1[ is a local inverse of f, and we have

    f<(Dom 1_{]0,1[}) = f<( ]0,1[ ) = ]−1,1[× ⊄ ]0,1[ = Rng 1_{]0,1[}.

The function h : ]0,1[ → ]−1,0[ defined by h(s) = −s for all s ∈ ]0,1[ is another local inverse of f.
If dim E ≥ 2, one can even give counter-examples of mappings φ : D → D′ of the type described above for which D is convex (see Sect.74).
The following two results are recorded for later use.
Proposition 1: Assume that φ : D → D′ is differentiable at x ∈ D and locally invertible near x. Then, if some local inverse near x is differentiable at φ(x), so are all others and all have the same gradient, namely (∇ₓφ)⁻¹.
Proof: We choose a local inverse ρ₁ of φ near x such that ρ₁ is differentiable at φ(x). Applying the Chain Rule to (68.1) gives

    (∇_{φ(x)}ρ₁)(∇ₓφ) = 1_V   and   (∇ₓφ)(∇_{φ(x)}ρ₁) = 1_{V′},

which shows that ∇ₓφ is invertible and ∇_{φ(x)}ρ₁ = (∇ₓφ)⁻¹.

Let now a local inverse ρ₂ of φ near x be given. Let M be defined as in (68.2). Since M is open and since ρ₁, being differentiable at φ(x), is continuous at φ(x), it follows that φ>(M) = ρ₁<(M) is a neighborhood of φ(x) in E′. By (68.2), ρ₂ agrees with ρ₁ on the neighborhood φ>(M) of φ(x). Hence ρ₂ must also be differentiable at φ(x), and ∇_{φ(x)}ρ₂ = ∇_{φ(x)}ρ₁ = (∇ₓφ)⁻¹.
Proposition 2: Assume that φ : D → D′ is continuous and that ρ is a local inverse of φ near x.

(i) If M′ is an open neighborhood of φ(x) and M′ ⊂ Dom ρ, then ρ>(M′) = φ<(M′) ∩ Rng ρ is an open neighborhood of x; hence the adjustment ρ|_{M′}^{ρ>(M′)} is again a local inverse of φ near x.

(ii) Let G be an open subset of E with x ∈ G. If ρ is continuous at φ(x) then there is an open neighborhood M of x with M ⊂ Rng ρ ∩ G such that ρ<(M) = φ>(M) is open; hence the adjustment ρ|_{ρ<(M)}^{M} is again a local inverse of φ near x.
Proof: Part (i) is an immediate consequence of Prop.3 of Sect.56. To
prove (ii), we observe rst that
<
(Rng () must be a neighborhood of
(x) =

(x). We can choose an open subset /

of
<
(Rng () with
(x) /

(see Sect.53). Application of (i) gives the desired result with


/:=
>
(/

).
Remark: It is in fact true that if φ : D → D′ is continuous, then every local inverse of φ is also continuous, but the proof is highly non-trivial and goes beyond the scope of this presentation. Thus, in Part (ii) of Prop.2, the requirement that ρ be continuous at φ(x) is, in fact, redundant.
Pitfall: The expectation mentioned in the beginning is not justified. A continuous mapping φ : D → D′ can have an invertible tangent at x ∈ D without being locally invertible near x. An example is the function f : R → R defined by

    f(t) := 2t² sin(1/t) + t   if t ∈ R×,        f(t) := 0   if t = 0.        (68.3)

It is differentiable and hence continuous. Since f′(0) = 1, the tangent to f at 0 is invertible. However, f is not monotone, let alone locally invertible, near 0, because one can find numbers s arbitrarily close to 0 such that f′(s) < 0.
Let I be an open interval and let f : I → R be a function of class C¹. If the tangent to f at a given t ∈ I is invertible, i.e. if f′(t) ≠ 0, one can easily prove, not only that f is locally invertible near t, but also that it has a local inverse near t that is of class C¹. This result generalizes to mappings φ : D → D′, but the proof is far from easy.
Local Inversion Theorem: Let D and D′ be open subsets of flat spaces, let φ : D → D′ be of class C¹ and let x ∈ D be such that the tangent to φ at x is invertible. Then φ is locally invertible near x and every local inverse of φ near x is differentiable at φ(x).

Moreover, there exists a local inverse ρ of φ near x that is of class C¹ and satisfies

    ∇_y ρ = (∇_{ρ(y)}φ)⁻¹   for all y ∈ Dom ρ.        (68.4)
Before proceeding with the proof, we state two important results that
are closely related to the Local Inversion Theorem.
Implicit Mapping Theorem: Let E, E′, E″ be flat spaces, let A be an open subset of E × E′ and let ω : A → E″ be a mapping of class C¹. Let (x₀, y₀) ∈ A, z₀ := ω(x₀, y₀), and assume that ∇_{y₀}ω(x₀, ·) is invertible. Then there exist an open neighborhood D of x₀ and a mapping φ : D → E′, differentiable at x₀, such that φ(x₀) = y₀ and Gr(φ) ⊂ A, and

    ω(x, φ(x)) = z₀   for all x ∈ D.        (68.5)

Moreover, D and φ can be chosen such that φ is of class C¹ and

    ∇φ(x) = −(∇₍₂₎ω(x, φ(x)))⁻¹ ∇₍₁₎ω(x, φ(x))   for all x ∈ D.        (68.6)
Remark: The mapping φ is determined "implicitly" by an equation in this sense: for any given x ∈ D, φ(x) is a solution of the equation

    ? y ∈ E′,   ω(x, y) = z₀.        (68.7)

In fact, one can find a neighborhood M of (x₀, y₀) in E × E′ such that φ(x) is the only solution of

    ? y ∈ M_{(x,·)},   ω(x, y) = z₀,        (68.8)

where M_{(x,·)} := {y ∈ E′ | (x, y) ∈ M}.
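As a hedged numerical illustration (the concrete ω, the point (x₀, y₀) = (0.6, 0.8), z₀ = 1 and the Newton solver are our own choices, not the text's): with ω(x, y) = x² + y², the partial 2-gradient 2y₀ is invertible, and for x near x₀ the equation ω(x, y) = z₀ is solved by φ(x) = √(1 − x²), which a simple Newton iteration in y recovers.

```python
# Solve omega(x, y) = 1 for y near (x0, y0) = (0.6, 0.8) by Newton's
# method in y, and compare with the explicit solution sqrt(1 - x^2).
import math

def omega(x, y):    return x * x + y * y
def d2omega(x, y):  return 2.0 * y            # partial gradient in y

def phi(x, y_guess=0.8, tol=1e-14):
    y = y_guess
    for _ in range(50):
        step = (omega(x, y) - 1.0) / d2omega(x, y)
        y -= step
        if abs(step) < tol:
            break
    return y

for x in (0.55, 0.60, 0.65):
    print(x, phi(x), math.sqrt(1.0 - x * x))  # the two columns agree
```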
Differentiation Theorem for Inversion Mappings: Let V and V′ be linear spaces of equal dimension. Then the set Lis(V, V′) of all linear isomorphisms from V onto V′ is a (non-empty) open subset of Lin(V, V′), the inversion mapping inv : Lis(V, V′) → Lin(V′, V) defined by inv(L) := L⁻¹ is of class C¹, and its gradient is given by

    (∇_L inv)M = −L⁻¹ M L⁻¹   for all M ∈ Lin(V, V′).        (68.9)
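Formula (68.9) is easy to test numerically; in the sketch below the matrices L and M are arbitrary test data of our own choosing and the gradient is approximated by a one-sided difference quotient.

```python
# Finite-difference check of (68.9):  (grad_L inv) M = -L^{-1} M L^{-1}.
import numpy as np

rng = np.random.default_rng(0)
L = rng.standard_normal((4, 4)) + 4.0 * np.eye(4)   # safely invertible
M = rng.standard_normal((4, 4))
t = 1e-6

numeric = (np.linalg.inv(L + t * M) - np.linalg.inv(L)) / t
exact   = -np.linalg.inv(L) @ M @ np.linalg.inv(L)
print(np.max(np.abs(numeric - exact)))              # ~1e-6 or smaller
```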
The proof of these three theorems will be given in stages, which will be
designated as lemmas. After a basic preliminary lemma, we will prove a
weak version of the rst theorem and then derive from it weak versions of
the other two. The weak version of the last theorem will then be used, like
a bootstrap, to prove the nal version of the rst theorem; the nal versions
of the other two theorems will follow.
Lemma 1: Let V be a linear space, let ν be a norm on V, and put B := Ce(ν). Let f : B → V be a mapping with f(0) = 0 such that f − 1_{B⊂V} is constricted with

    κ := str(f − 1_{B⊂V}; ν, ν) < 1.        (68.10)

Then the adjustment f|^{(1−κ)B} (see Sect.03) of f has an inverse and this inverse is confined near 0.
Proof: We will show that for every w (1 )B, the equation
? z B, f (z) = w (68.11)
has a unique solution.
To prove uniqueness, suppose that z
1
, z
2
B are solutions of (68.11) for
a given w 1. We then have
(f 1
BV
)(z
1
) (f 1
BV
)(z
2
) = z
2
z
1
and hence, by (68.10), (z
2
z
1
) (z
2
z
1
), which amounts to
(1 )(z
2
z
1
) 0. This is compatible with < 1 only if (z
1
z
2
) = 0
and hence z
1
= z
2
.
To prove existence, we assume that w (1 )B is given and we put
:=
(w)
1
, so that [0, 1[. For every v B, we have (v) and
hence, by (68.10)
((f 1
BV
)(v)) (v) ,
which implies that
(w(f (v) v)) (w) + ((f 1
BV
)(v))
(1 ) + = .
Therefore, it makes sense to dene h
w
: B B by
h
w
(v) := w(f (v) v) for all v B. (68.12)
Since h
w
is the dierence of a constant with value w and a restriction of
f 1
BV
, it is constricted and has an absolute striction no greater than
. Therefore, it is a contraction, and since its domain B is closed, the
Contraction Fixed Point Theorem states that h
w
has a xed point z B.
It is evident from (68.12) that a xed point of h
w
is a solution of (68.11).
We proved that (68.11) has a unique solution z B and that this solution
satises
(z) :=
(w)
1
. (68.13)
If we dene g : (1 )B f
<
((1 )B) in such a way that g(w) is this
solution of (68.11), then g is the inverse we are seeking. It follows from
(68.13) that (g(w))
1
1
(w) for all w (1 )B. In view of part (ii)
of Prop.2 of Sect.62, this proves that g is conned.
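For readers who want to see the contraction argument in action, here is a minimal numerical sketch under our own assumptions (a concrete f on R² that is a small Lipschitz perturbation of the identity, and the Euclidean norm); it illustrates the iteration used above and is not part of the text.

```python
# For a given w, the map h_w(v) = w - (f(v) - v) is a contraction and
# its fixed point z solves f(z) = w, as in the proof of Lemma 1.
import numpy as np

def f(v):
    # identity plus a small Lipschitz perturbation (striction < 1), f(0) = 0
    return v + 0.2 * np.array([np.sin(v[0] + v[1]), np.cos(v[0]) - 1.0])

w = np.array([0.3, -0.2])
z = np.zeros(2)
for _ in range(100):
    z = w - (f(z) - z)          # fixed-point iteration h_w

print(z, f(z) - w)              # f(z) - w is ~0: z solves f(z) = w
```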
Lemma 2: The assertion of the Local Inversion Theorem, except possibly the statement starting with "Moreover", is valid.
Proof: Let : c c

be the (invertible) tangent to at x. Since T


is open, we may choose a norming cell B such that x + B T. We dene
f : B 1 by
f := (

[
D
[
x+B
(x +1
V
)[
x+B
B
) x. (68.14)
Since is of class C
1
, so is f . Clearly, we have
f (0) = 0,
0
f = 1
V
. (68.15)
Put := no
B
, so that B = Ce(). Choose ]0, 1[ ( :=
1
2
would do). If we
apply Prop.4 of Sect.64 to the case when there is replaced by f and k by
0, we see that there is ]0, 1] such that
:= str(f [
B
1
BV
; , ) . (68.16)
Now, it is clear from (64.2) that a striction relative to and

remains
unchanged if and

are multiplied by the same strictly positive factor.


Hence, (68.16) remains valid if there is replaced by
1

. Since Ce(
1

) =
Ce() = B (see Prop.6 of Sect.51), it follows from (68.16) that Lemma
1 can be applied when , B, and f there are replaced by
1

, B and f [
B
,
respectively. We infer that if we put
/
o
:= (1 )B, ^
o
:= f
<
(/
o
),
then f [
M
o
N
o
has an inverse g : /
o
^
o
that is conned near zero. Note
that /
o
and ^
o
are both open neighborhoods of zero, the latter because it
is the pre-image of an open set under a continuous mapping. We evidently
have
g[
V
1
M
o
V
= (1
N
o
V
f [
N
o
) g.
In view of (68.15), 1
N
o
V
f [
N
o
is small near 0 1. Since g is conned near
0, it follows by Prop.3 of Sect.62 that g[
V
1
M
o
V
is small near zero. By the
Characterization of Gradients of Sect.63, we conclude that g is dierentiable
at 0 with
0
g = 1
V
.
We now dene
^ := x +^
o
, ^

:= (x) + ()
>
(/
o
).
These are open neighborhoods of x and (x), respectively. A simple calcu-
lation, based on (68.14) and g = (f [
M
o
N
o
)

, shows that
:= (x + 1
V
)[
N
N
o
g (

x)[
M
o
N

(68.17)
is the inverse of [
N

N
. Using the Chain Rule and the fact that g is dif-
ferentiable at 0, we conclude from (68.17) that is dierentiable at (x).
Lemma 3: The assertion of the Implicit Mapping Theorem, except possibly the statement starting with "Moreover", is valid.
Proof: We dene : / c c

by
(x, y) := (x, (x, y)) for all (x, y) /. (68.18)
Of course, is of class C
1
. Using Prop.2 of Sect.63 and (65.4), we see that
(
(x
o
,y
o
)
)(u, v) = (u, (
x
o
(, y
o
))u + (
y
o
(x
o
, ))v)
for all (u, v) 1 1

. Since
y
o
(x
o
, ) is invertible, so is
(x
o
,y
o
)
. In-
deed,we have
(
(x
o
,y
o
)
)
1
(u, w) = (u, (
y
o
(x
o
, ))
1
(w(
x
o
(, y
o
))u))
for all (u, w) 11

. Lemma 2 applies and we conclude that has a local


inverse with (x
o
, y
o
) Rng that is dierentiable at (x
o
, y
o
) = (x
o
, z
o
).
In view of Prop.2, (i), we may assume that the domain of is of the form
TT

, where T and T

are open neighborhoods of x


o
and z
o
, respectively.
Let : T T

c and

: T T

be the value-wise terms of , so


that
(x, z) = ((x, z),

(x, z)) for all x T, z T

.
Then, by (68.18),
((x, y)) = ((x, z), ((x, z),

(x, z))) = (x, z)


and hence
(x, z) = x, (x,

(x, z)) = z for all x T, z T

. (68.19)
Since (x
o
, z
o
) = ((x
o
, z
o
),

(x
o
, z
o
)) = (x
o
, y
o
), we have

(x
o
, z
o
) = y
o
.
Therefore, if we dene : T c

by (x) :=

(x, z
o
), we see that (x
o
) =
y
o
and (68.5) are satised. The dierentiability of at x
o
follows from the
dierentiablity of at (x
o
, z
o
).
If we dene / := Rng , it is immediate that (x) is the only solution
of (68.8).
Lemma 4: Let V and V′ be linear spaces of equal dimension. Then Lis(V, V′) is an open subset of Lin(V, V′) and the inversion mapping inv : Lis(V, V′) → Lin(V′, V) defined by inv(L) := L⁻¹ is differentiable.
Proof: We apply Lemma 3 with c replaced by Lin(1, 1

), c

by
Lin(1

, 1), and c

by Lin1. For of Lemma 3 we take the mapping


(L, M) ML from Lin(1, 1

) Lin(1

, 1) to Lin1. Being bilinear,


this mapping is of class C
1
(see Prop.1 of Sect.66). Its partial 2-gradient
at (L, M) does not depend on M. It is the right-multiplication Ri
L

Lin(Lin(1

, 1), Lin1) dened by Ri


L
(K) := KL for all K Lin(1, 1

).
It is clear that Ri
L
is invertible if and only if L is invertible. (We have
(Ri
L
)
1
= Ri
L
1 if this is the case.) Thus, given L
o
Lis(1, 1

), we can
apply Lemma 3 with L
o
, L
1
o
, and 1
V
playing the roles of x
o
, y
o
, and z
o
,
respectively. We conclude that there is an open neighborhood T of L
o
in
Lin(1, 1

) such that the equation


? M Lin(1

, 1), LM = 1
V
has a solution for every L T. Since this solution can only be L
1
= inv(L),
it follows that the role of the mapping of Lemma 3 is played by inv[
D
.
Therefore, we must have T Lis(1, 1

) and inv must be dierentiable at


L
o
. Since L
o
Lis(1, 1

) was arbitrary, it follows that Lis(1, 1

) is open
and that inv is dierentiable.
Completion of Proofs: We assume that hypotheses of the Local Inver-
sion Theorem are satised. The assumption that has an invertible tangent
at x T is equivalent to
x
Lis(1, 1

). Since is continuous and


since Lis(1, 1

) is an open subset of Lin(1, 1

) by Lemma 4, it follows that


( := ()
<
(Lis(1, 1

)) is an open subset of c. By Lemma 2, we can choose


a local inverse of near x which is dierentiable and hence continuous at
(x). By Prop.2, (ii), we can adjust this local inverse such that its range
is included in (. Let be the local inverse so adjusted. Then has an
invertible tangent at every z Rng . Let z Rng be given. Applying
Lemma 2 to z, we see that must have a local inverse near z that is dier-
entiable at (z). By Prop.1, it follows that must also be dierentiable at
(z). Since z Rng was arbitrary, it follows that is dierentiable. The
formula (68.4) follows from Prop.1. Since inv : Lis(1, 1

) Lin(1

, 1) is
dierentiable and hence continuous by Lemma 4, since is continuous by
assumption, and since is continuous because it is dierentiable, it follows
that the composite inv [
Lis(V,V

)
Rng
is continuous. By (68.4), this com-
posite is none other than , and hence the proof of the Local Inversion
Theorem is complete.
Assume, now, that the hypotheses of the Implicit Mapping Theorem
are satised. By the Local Inversion Theorem, we can then choose a local
inverse of , as dened by (68.18), that is of class C
1
. It then follows that
the function : T c

dened in the proof of Lemma 3 is also of class


C
1
. The formula (68.6) is obtained easily by dierentiating the constant
x (x, (x)), using Prop.1 of Sect.65 and the Chain Rule.
To prove the Dierentiation Theorem for Inversion Mappings, one applies
the Implicit Mapping Theorem in the same way as Lemma 3 was applied to
obtain Lemma 4.
Pitfall: Let I and I′ be intervals and let f : I → I′ be of class C¹ and surjective. If f has an invertible tangent at every t ∈ I, then f is not only locally invertible near every t ∈ I but (globally) invertible, as is easily seen. This conclusion does not generalize to the case when I and I′ are replaced by connected open subsets of flat spaces of higher dimension. The curvilinear coordinate systems discussed in Sect.74 give rise to counterexamples. One can even give counterexamples in which I and I′ are replaced by convex open subsets of flat spaces.
Notes 68

(1) The Local Inversion Theorem is often called the "Inverse Function Theorem". Our term is more descriptive.

(2) The Implicit Mapping Theorem is most often called the "Implicit Function Theorem".

(3) If the sets D and D′ of the Local Inversion Theorem are both subsets of Rⁿ for some n ∈ N, then ∇ₓφ can be identified with an n-by-n matrix (see (65.12)). This matrix is often called the Jacobian matrix and its determinant the Jacobian of φ at x. Some textbook authors replace the condition that ∇ₓφ be invertible by the condition that the Jacobian be non-zero. I believe it is a red herring to drag in determinants here. The Local Inversion Theorem has an extension to infinite-dimensional spaces, where it makes no sense to talk about determinants.
69 Extreme Values, Constraints
In this section E and F denote flat spaces with translation spaces V and W, respectively, and D denotes a subset of E.

The following definition is a "local" variant of the definition of an extremum given in Sect.08.

Definition 1: We say that a function f : D → R attains a local maximum [local minimum] at x ∈ D if there is N ∈ Nhdₓ(D) (see (56.1)) such that f|_N attains a maximum [minimum] at x. We say that f attains a local extremum at x if it attains a local maximum or local minimum at x.
The following result is a direct generalization of the Extremum Theorem of elementary calculus (see Sect.08).

Extremum Theorem: If f : D → R attains a local extremum at x ∈ Int D and if f is differentiable at x, then ∇ₓf = 0.
Proof: Let v ∈ V be given. It is clear that S_{x,v} := {s ∈ R | x + sv ∈ Int D} is an open subset of R and the function (s ↦ f(x + sv)) : S_{x,v} → R attains a local extremum at 0 ∈ R and is differentiable at 0 ∈ R. Hence, by the Extremum Theorem of elementary calculus, and by (65.13) and (65.14), we have

    0 = ∂₀(s ↦ f(x + sv)) = (dd_v f)(x) = (∇ₓf)v.

Since v ∈ V was arbitrary, the desired result ∇ₓf = 0 follows.
From now on we assume that D is an open subset of E.

The next result deals with the case when a restriction of f : D → R to a suitable subset of D, but not necessarily f itself, has an extremum at a given point x ∈ D. This subset of D is assumed to be the set on which a given mapping σ : D → F has a constant value, i.e. the set σ<({σ(x)}). The assertion that f|_{σ<({σ(x)})} has an extremum at x is often expressed by saying that f attains an extremum at x subject to the constraint that σ be constant.
Constrained-Extremum Theorem: Assume that f : D → R and σ : D → F are both of class C¹ and that ∇ₓσ ∈ Lin(V, W) is surjective for a given x ∈ D. If f|_{σ<({σ(x)})} attains a local extremum at x, then

    ∇ₓf ∈ Rng (∇ₓσ)ᵀ.        (69.1)
Proof: We put | := Null
x
and choose a supplement Z of | in 1.
Every neighborhood of 0 in 1 includes a set of the form ^ +/, where ^
is an open neighborhood of 0 in | and / is an open neighborhood of 0 in
Z. Since T is open, we may select ^ and / such that x + ^ + / T.
We now dene : ^ / T by
(u, z) := (x +u +z) for all u ^, z /. (69.2)
It is clear that is of class C
1
and we have

0
(0, ) =
x
[
Z
. (69.3)
Since
x
is surjective, it follows from Prop.5 of Sect.13 that
x
[
Z
is
invertible. Hence we can apply the Implicit Mapping Theorem of Sect.68 to
the case when / there is replaced by ^ /, the points x
o
, y
o
by 0, and z
o
by (x). We obtain an open neighborhood ( of 0 in | with ( ^ and a
mapping h : ( Z, of class C
1
, such that h(0) = 0 and
(u, h(u)) = (x) for all u (. (69.4)
Since
(1)
(0, 0) =
0
(, 0) =
x
[
U
= 0 by the denition
| := Null
x
, the formula (68.6) yields in our case that
0
h = 0, i.e.
we have
h(0) = 0,
0
h = 0. (69.5)
It follows from (69.2) and (69.4) that x + u + h(u)
<
((x)) for all
u ( and hence that
(u f(x +u +h(u))) : ( R
has a local extremum at 0. By the Extremum Theorem and the Chain Rule,
it follows that
0 =
0
(u f(x +u +h(u))) =
x
f(1
UV
+
0
h[
V
),
and therefore, by (69.5), that
x
f[
U
= (
x
f)1
UV
= 0, which means that

x
f |

= (Null
x
)

. The desired result (69.1) is obtained by applying


(22.9).
In the case when F := R, the Constrained-Extremum Theorem reduces to

Corollary 1: Assume that f, g : D → R are both of class C¹ and that ∇ₓg ≠ 0 for a given x ∈ D. If f|_{g<({g(x)})} attains a local extremum at x, then there is λ ∈ R such that

    ∇ₓf = λ∇ₓg.        (69.6)
In the case when F := R^I, we can use (23.19) and obtain

Corollary 2: Assume that f : D → R and all terms gᵢ : D → R in a finite family g := (gᵢ | i ∈ I) : D → R^I are of class C¹ and that (∇ₓgᵢ | i ∈ I) is linearly independent for a given x ∈ D. If f|_{g<({g(x)})} attains a local extremum at x, then there is λ ∈ R^I such that

    ∇ₓf = Σ_{i∈I} λᵢ ∇ₓgᵢ.        (69.7)
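As a concrete illustration of Cor.1 (our own example, not the text's): to extremize f(x, y) = x + y on the circle x² + y² = 1 one solves ∇f = λ∇g together with the constraint, which is exactly the system that (69.6) prescribes.

```python
# Solve grad f = lam * grad g together with the constraint g = 0,
# for f(x, y) = x + y and g(x, y) = x^2 + y^2 - 1.
import sympy as sp

x, y, lam = sp.symbols('x y lam', real=True)
f = x + y
g = x**2 + y**2 - 1

eqs = [sp.diff(f, x) - lam * sp.diff(g, x),
       sp.diff(f, y) - lam * sp.diff(g, y),
       g]
print(sp.solve(eqs, [x, y, lam]))
# the two points (+-sqrt(2)/2, +-sqrt(2)/2): the maximizer and minimizer
```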
Remark 1: If E is two-dimensional then Cor.1 can be given a geometrical interpretation as follows: The sets L_g := g<({g(x)}) and L_f := f<({f(x)}) are the level-lines through x of g and f, respectively (see Figure).

If these lines cross at x, then the value f(y) of f at y ∈ L_g strictly increases or decreases as y moves along L_g from one side of the line L_f to the other and hence f|_{L_g} cannot have an extremum at x. Hence, if f|_{L_g} has an extremum at x, the level-lines must be tangent. The assertion (69.6) expresses this tangency. The condition ∇ₓg ≠ 0 insures that the level-line L_g does not degenerate to a point.
Notes 69

(1) The number λ of (69.6) or the terms λᵢ occurring in (69.7) are often called Lagrange multipliers.
610 Integral Representations
Let I ∈ Sub R be some genuine interval and let h : I → V be a continuous process with values in a given linear space V. For every λ ∈ V* the composite λ∘h : I → R is then continuous. Given a, b ∈ I one can therefore form the integral ∫_a^b λ∘h in the sense of elementary integral calculus (see Sect.08). It is clear that the mapping

    (λ ↦ ∫_a^b λ∘h) : V* → R

is linear and hence an element of V** ≅ V.
Definition 1: Let h : I → V be a continuous process with values in the linear space V. Given any a, b ∈ I the integral of h from a to b is defined to be the unique element ∫_a^b h of V which satisfies

    λ ∫_a^b h = ∫_a^b λ∘h   for all λ ∈ V*.        (610.1)
Proposition 1: Let h : I → V be a continuous process and let L be a linear mapping from V to a given linear space W. Then Lh : I → W is a continuous process and

    L ∫_a^b h = ∫_a^b Lh   for all a, b ∈ I.        (610.2)
Proof: Let

and a, b I be given. Using Def.1 twice, we obtain


(L
_
b
a
h) = (L)
_
b
a
h =
_
b
a
(L)h =
_
b
a
(Lh) =
_
b
a
Lh.
Since

was arbitrary, (610.2) follows.


Proposition 2: Let ν be a norm on the linear space V and let h : I → V be a continuous process. Then ν∘h : I → R is continuous and, for every a, b ∈ I, we have

    ν(∫_a^b h) ≤ | ∫_a^b ν∘h |.        (610.3)
Proof: The continuity of h follows from the continuity of (Prop.7
of Sect.56) and the Composition Theorem for Continuity of Sect.56.
It follows from (52.14) that
[h[(t) = [h(t)[

()(h(t)) =

()( h)(t)
holds for all 1

and all t I, and hence that


[h[ h for all Ce(

).
Using this result and (610.1), (08.40), and (08.42), we obtain

_
b
a
h

_
b
a
h

_
a
b
[h[

_
b
a
h

for all Ce(

) and all a, b I. The desired result now follows from the


Norm-Duality Theorem, i.e. (52.15).
Most of the rules of elementary integral calculus extend directly to integrals of processes with values in a linear space. For example, if h : I → V is continuous and if a, b, c ∈ I, then

    ∫_a^b h = ∫_a^c h + ∫_c^b h.        (610.4)

The following result is another example.

Fundamental Theorem of Calculus: Let E be a flat space with translation space V. If h : I → V is a continuous process and if x ∈ E and a ∈ I are given, then the process p : I → E defined by

    p(t) := x + ∫_a^t h   for all t ∈ I        (610.5)

is of class C¹ and p′ = h.
Proof: Let 1

be given. By (610.5) and (610.1) we have


(p(t) x) =
_
t
a
h =
_
t
a
h for all t I.
By Prop.1 of Sect.61 p is dierentiable and we have ((p x))

= p

.
Hence, the elementary Fundamental Theorem of Calculus (see Sect.08) gives
p

= h. Since 1

was arbitrary, we conclude that p

= h, which is
continuous by assumption.
In the remainder of this section, we assume that the following items are given: (i) a flat space E with translation space V, (ii) an open subset D of E, (iii) an open subset T of R × E such that, for each x ∈ D,

    T_{(·,x)} := {s ∈ R | (s, x) ∈ T}

is a non-empty open interval, (iv) a linear space W.

Consider now a mapping h : T → W such that h(·, x) : T_{(·,x)} → W is continuous for all x ∈ D. Given a, b ∈ R such that a, b ∈ T_{(·,x)} for all x ∈ D, we can then define k : D → W by

    k(x) := ∫_a^b h(·, x)   for all x ∈ D.        (610.6)

We say that k is defined by an integral representation.
Proposition 3: If h : T → W is continuous, so is the mapping k : D → W defined by the integral representation (610.6).
Proof: Let x T be given. Since T is open, we can choose a norm
on 1 such that x + Ce() T. We may assume, without loss of generality,
that a < b. Then [a, b] is a compact interval and hence, in view of Prop.4
of Sect.58, [a, b] (x + Ce()) is a compact subset of T. By the Uniform
Continuity Theorem of Sect. 58, it follows that the restriction of h to [a, b]
(x +Ce()) is uniformly continuous. Let be a norm on and let P

be given. By Prop.4 of Sect.56, we can determine ]0, 1 ] such that


(h(s, y) h(s, x)) <

b a
for all s [a, b] and all y x + Ce(). Hence, by (610.6) and Prop.2, we
have
(k(y) k(x)) =
__
b
a
(h(, y) h(, x))
_

_
b
a
(h(, y) h(, x))
(b a)

b a
=
whenever (y x) < . Since P

was arbitrary, the continuity of k


at x follows by Prop.1 of Sect.56. Since x T was arbitrary, the assertion
follows.
The following is a stronger version of Prop.3.

Proposition 4: Let T̂ ⊂ R × R × E be defined by

    T̂ := {(a, b, x) | x ∈ D, (a, x) ∈ T, (b, x) ∈ T}.        (610.7)

Then, if h : T → W is continuous, so is the mapping

    ((a, b, x) ↦ ∫_a^b h(·, x)) : T̂ → W.
Proof: Choose a norm on . Let (a, b, x) T be given. Since
h is continuous at (a, x) and at (b, x) and since T is open, we can nd
^ Nhd
x
(T) and P

such that
^ := ((a+], [ ) (b+], [ )) ^
is a subset of T and such that the restriction of h to ^ is bounded by
some P

. By Prop.2, it follows that


(
_
a
s
h(, y) +
_
t
b
h(, y)) ( [s a[ +[t b[ ) (610.8)
for all y ^, all s a + ], [, and all t b + ], ].
Now let P

be given. If we put := min,



4
, it follows from
(610.8) that

__
a
s
h(, y) +
_
t
b
h(, y)
_
<

2
(610.9)
for all y ^, all s a + ], ], and all t b + ], ]. On the other hand,
by Prop.3, we can determine / Nhd
x
(T) such that

__
b
a
h(, y)
_
b
a
h(, x)
_
<

2
(610.10)
for all y /. By (610.4) we have
_
t
s
h(, y)
_
b
a
h(, x) =
_
b
a
h(, y)
_
b
a
h(, x)
+
_
a
s
h(, y) +
_
t
b
h(, y).
Hence it follows from (610.9) and (610.10) that

__
t
s
h(, y)
_
b
a
h(, x)
_
<
for all y /^ Nhd
x
(T), all s a+ ], ], and all t b+ ], [. Since
P

was arbitrary, it follows from Prop.1 of Sect.56 that the mapping


under consideration is continuous at (a, b, x).
Differentiation Theorem for Integral Representations: Assume that h : T → W satisfies the following conditions:

(i) h(·, x) is continuous for every x ∈ D.

(ii) h(s, ·) is differentiable at x for all (s, x) ∈ T and the partial 2-gradient ∇₍₂₎h : T → Lin(V, W) is continuous.

Then the mapping k : D → W defined by the integral representation (610.6) is of class C¹ and its gradient is given by the integral representation

    ∇ₓk = ∫_a^b ∇₍₂₎h(·, x)   for all x ∈ D.        (610.11)

Roughly, this theorem states that if h satisfies the conditions (i) and (ii), one can differentiate (610.6) with respect to x by differentiating under the integral sign.
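As a concrete illustration (our own, with the sample integrand h(s, x) = sin(sx) on [a, b] = [0, 1] and a simple trapezoidal quadrature), the right-hand side of (610.11) can be compared with a finite-difference approximation of the derivative of k:

```python
# k(x) = int_0^1 sin(s*x) ds;  its derivative should be int_0^1 s*cos(s*x) ds.
import numpy as np

def quad(func, a=0.0, b=1.0, n=20000):
    s = np.linspace(a, b, n + 1)
    return np.trapz(func(s), s)

x = 1.3
k  = lambda xx: quad(lambda s: np.sin(s * xx))
dk = quad(lambda s: s * np.cos(s * x))            # right side of (610.11)
fd = (k(x + 1e-6) - k(x - 1e-6)) / 2e-6           # central difference
print(dk, fd)                                     # the two values agree
```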
Proof: Let x T be given. As in the proof of Prop.3, we can choose
a norm on 1 such that x + Ce() T, and we may assume that a < b.
Then [a, b]
_
x + Ce()
_
is a compact subset of T. By the Uniform Conti-
nuity Theorem of Sect.58, the restriction of
(2)
h to [a, b]
_
x + Ce()
_
is
uniformly continuous. Hence, if a norm on and P

are given, we
can determine ]0, 1 ] such that
|
(2)
h(s, x +u)
(2)
h(s, x)|
,
<

b a
(610.12)
holds for all s [a, b] and u Ce().
We now dene n :
_
T (0, x)
_
by
n(s, u) := h(s, x +u) h(s, x)
_

(2)
h(s, x)
_
u. (610.13)
It is clear that
(2)
n exists and is given by

(2)
n(s, u) =
(2)
h(s, x +u)
(2)
h(s, x).
By (610.12), we hence have
|
(2)
n(s, u)|
,
<

b a
for all s [a, b] and u Ce(). By the Striction Estimate for Dierentiable
Mapping of Sect.64, it follows that n(s, )[
Ce()
is constricted for all s [a, b]
and that
str(n(s, )[
Ce()
; , )

b a
for all s [a, b].
Since n(s, 0) = 0 for all s [a, b], the denition (64.1) shows that
(n(s, u))

b a
(u) for all s [a, b]
and all u Ce(). Using Prop.2, we conclude that

__
b
a
n(, u)
_

_
b
a
(n(, u)) (u)
whenever u Ce(). Since P

was arbitrary, it follows that the map-


ping
_
u
_
b
a
n(, u)
_
is small near 0 1 (see Sect.62). Now, integrating
(610.13) with respect to s [a, b] and observing the representation (610.6)
of k, we obtain
_
b
a
n(, u) = k(x +u) k(u)
__
b
a

(2)
h(, x)
_
u
for all u Tx. Therefore, by the Characterization of Gradients of Sect.63,
k is dierentiable at x and its gradient is given by (610.11). The continuity
of k follows from Prop.3.
The following corollary deals with generalizations of integral representations of the type (610.6).

Corollary: Assume that h : T → W satisfies the conditions (i) and (ii) of the Theorem. Assume, further, that f : D → R and g : D → R are differentiable and satisfy

    f(x), g(x) ∈ T_{(·,x)}   for all x ∈ D.

Then k : D → W, defined by

    k(x) := ∫_{f(x)}^{g(x)} h(·, x)   for all x ∈ D,        (610.14)

is of class C¹ and its gradient is given by

    ∇ₓk = h(g(x), x) ⊗ ∇ₓg − h(f(x), x) ⊗ ∇ₓf + ∫_{f(x)}^{g(x)} ∇₍₂₎h(·, x)   for all x ∈ D.        (610.15)
Proof: Consider the set T R R c dened by (610.7), so that
a, b T
(,x)
for all (a, b, x) T. Since T
(,x)
is assumed to be an interval
for each x T, we can dene m : T by
m(a, b, x) :=
_
b
a
h(, x). (610.16)
By the Theorem,
(3)
m : T Lin(1, ) exists and is given by

(3)
m(a, b, x) =
_
b
a

(2)
h(, x). (610.17)
It follows from Prop.4 that
(3)
m is continuous. By the Fundamental The-
orem of Calculus and (610.16), the partial derivatives m,
1
and m,
2
exist,
are continuous, and are given by
m,
1
(a, b, x) = h(a, x), m,
2
(a, b, x) = h(b, x). (610.18)
By the Partial Gradient Theorem of Sect.65, it follows that m is of class C
1
.
Since
(1)
m = m,
1
and
(2)
m = m,
2
, it follows from (65.9), (610.17),
and (610.18) that
(
(a,b,x)
m)(, , u) = h(b, x) h(a, x) (610.19)
+
__
b
a

(2)
h(, x)
_
u
for all (a, b, x) T and all (, , u) R R 1.
Now, since f and g are dierentiable, so is (f, g, 1
DE
) : T RRc
and we have

x
(f, g, 1
DE
) = (
x
f,
x
g, 1
V
) (610.20)
for all x T. (See Prop.2 of Sect.63). By (610.14) and (610.16) we have
k = m (f, g, 1
DE
)[
D
. Therefore, by the General Chain Rule of Sect.63, k
is of class C
1
and we have

x
k = (
(f(x),g(x),x)
m)
x
(f, g, 1
DE
)
for all x T. Using (610.19) and (610.20), we obtain (610.15).
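Formula (610.15) can likewise be checked numerically. The data below (h(s, x) = sx + sin s, f(x) = x, g(x) = x², trapezoidal quadrature) are our own choices for the sketch, not the text's:

```python
# Compare (610.15) with a central difference of k(x) = int_{f(x)}^{g(x)} h(s,x) ds.
import numpy as np

def quad(func, a, b, n=20000):
    s = np.linspace(a, b, n + 1)
    return np.trapz(func(s), s)

h   = lambda s, x: s * x + np.sin(s)
d2h = lambda s, x: s                      # partial derivative of h in x
f, df = (lambda x: x),      (lambda x: 1.0)
g, dg = (lambda x: x * x),  (lambda x: 2.0 * x)

def k(x):
    return quad(lambda s: h(s, x), f(x), g(x))

x = 1.7
formula = (h(g(x), x) * dg(x) - h(f(x), x) * df(x)
           + quad(lambda s: d2h(s, x), f(x), g(x)))
finite  = (k(x + 1e-6) - k(x - 1e-6)) / 2e-6
print(formula, finite)                    # agree to roughly 1e-6
```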
611 Curl, Symmetry of Second Gradients
In this section, we assume that flat spaces E and E′, with translation spaces V and V′, and an open subset D of E are given.

If H : D → Lin(V, V′) is differentiable, we can apply the identification (24.1) to the codomain of the gradient

    ∇H : D → Lin(V, Lin(V, V′)) ≅ Lin₂(V², V′),

and hence we can apply the switching defined by (24.4) to the values of ∇H.

Definition 1: The curl of a differentiable mapping H : D → Lin(V, V′) is the mapping Curl H : D → Lin₂(V², V′) defined by

    (Curl H)(x) := ∇ₓH − (∇ₓH)~   for all x ∈ D.        (611.1)

The values of Curl H are skew, i.e. Rng Curl H ⊂ Skew₂(V², V′).
The following result deals with conditions that are sufficient or necessary for the curl to be zero.

Curl-Gradient Theorem: Assume that H : D → Lin(V, V′) is of class C¹. If H = ∇φ for some φ : D → E′ then Curl H = 0. Conversely, if D is convex and Curl H = 0 then H = ∇φ for some φ : D → E′.
The proof will be based on the following

Lemma: Assume that D is convex. Let q ∈ D and q′ ∈ E′ be given and let φ : D → E′ be defined by

    φ(q + v) := q′ + (∫_0^1 H(q + sv) ds) v   for all v ∈ D − q.        (611.2)

Then φ is of class C¹ and we have

    ∇φ(q + v) = H(q + v) − (∫_0^1 s (Curl H)(q + sv) ds) v        (611.3)

for all v ∈ D − q.
Proof: We dene T R 1 by
T := (s, u) [ u T q, q + su T.
Since T is open and convex, it follows that, for each u T q, T
(,u)
:=
s R [ q + su T is an open interval that includes [0, 1]. Since H is of
class C
1
, so is
((s, u) H(q + su)) : T Lin(1, 1

);
hence we can apply the Dierentiation Theorem for Integral Representations
of Sect.610 to conclude that u
_
1
0
H(q +su)ds is of class C
1
and that its
gradient is given by

v
(u
_
1
0
H(q + su)ds) =
_
1
0
sH(q + sv)ds
for all v T q. By the Product Rule (66.7), it follows that the mapping
dened by (611.2) is of class C
1
and that its gradient is given by
(
q+v
)u =
__
1
0
sH(q + sv)uds
_
v +
__
1
0
H(q + sv)ds
_
u
for all v Tq and all u 1. Applying the denitions (24.2), (24.4), and
(611.1), and using the linearity of the switching and Prop.1 of Sect.610, we
obtain
(
q+v
)u =
__
1
0
s(Curl H)(q + sv)ds
_
(u, v) (611.4)
+
__
1
0
(s(H(q + sv))v +H(q + sv))ds
_
u
for all v Tq and all u 1. By the Product Rule (66.10) and the Chain
Rule we have, for all s [0, 1],

s
(t tH(q + tv)) = H(q + sv) + s(H(q + sv))v.
Hence, by the Fundamental Theorem of Calculus, the second integral on the
right side of (611.4) reduces to H(q +v). Therefore, since Curl H has skew
values and since u 1 was arbitrary, (611.4) reduces to (611.3).
Proof of the Theorem: Assume, rst, that T is convex and that
Curl H = 0. Choose q T, q

and dene : T c

by (611.2). Then
H = by (611.3).
Conversely, assume that H = for some : T c

. Let q T
be given. Since T is open, we can choose a convex open neighborhood ^
of q with ^ T. Let v ^ q by given. Since ^ is convex, we have
q + sv ^ for all s [0, 1], and hence we can apply the Fundamental
Theorem of Calculus to the derivative of s (q + sv), with the result
(q +v) = (q) +
__
1
0
H(q + sv)ds
_
v.
Since v ^ q was arbitrary, we see that the hypotheses of the Lemma are
satised for ^ instead of T, q

:= (q), [
N
instead of , and H[
N
= [
N
instead of H. Hence, by (611.3), we have
__
1
0
s(Curl H)(q + sv)ds
_
v = 0 (611.5)
for all v ^ q. Now let u 1 be given. Since ^ q is a neighborhood
of 0 1, there is a P

such that tu ^ q for all t ]0, ]. Hence,


substituting tu for v in (611.5) and dividing by t gives
__
1
0
s(Curl H)(q + stu)ds
_
u = 0 for all t ]0, ].
since Curl H is continuous, we can apply Prop.3 of Sect.610 and, in the limit
t 0, obtain ((Curl H)(q))u = 0. Since u 1 and q T were arbitrary,
we conclude that Curl H = 0.
Remark 1: In the second part of the Theorem, the condition that D be convex can be replaced by the weaker one that D be "simply connected". This means, intuitively, that every closed curve in D can be continuously shrunk entirely within D to a point.

Since Curl ∇φ = ∇⁽²⁾φ − (∇⁽²⁾φ)~ by (611.1), we can restate the first part of the Curl-Gradient Theorem as follows:
Theorem on Symmetry of Second Gradients: If φ : D → E′ is of class C², then its second gradient ∇⁽²⁾φ : D → Lin₂(V², V′) has symmetric values, i.e. Rng ∇⁽²⁾φ ⊂ Sym₂(V², V′).
Remark 2: The assertion that ∇ₓ(∇φ) is symmetric for a given x ∈ D remains valid if one merely assumes that φ is differentiable and that ∇φ is differentiable at x. A direct proof of this fact, based on the results of Sect.64, is straightforward although somewhat tedious.
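For a real-valued function of two real variables the theorem reduces to the equality of mixed second partial derivatives, which a one-line symbolic computation confirms for a sample function of our own choosing:

```python
# Mixed second partial derivatives of a smooth sample function coincide.
import sympy as sp

x, y = sp.symbols('x y')
phi = sp.exp(x * y) * sp.sin(x) + y**3
print(sp.simplify(sp.diff(phi, x, y) - sp.diff(phi, y, x)))   # prints 0
```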
We assume now that E := E₁ × E₂ is the set-product of flat spaces E₁, E₂ with translation spaces V₁, V₂, respectively. Assume that φ : D → E′ is twice differentiable. Using Prop.1 of Sect.65 repeatedly, it is easily seen that

    ∇⁽²⁾φ(x) = (∇₍₁₎∇₍₁₎φ)(x)(ev₁, ev₁) + (∇₍₁₎∇₍₂₎φ)(x)(ev₁, ev₂)
             + (∇₍₂₎∇₍₁₎φ)(x)(ev₂, ev₁) + (∇₍₂₎∇₍₂₎φ)(x)(ev₂, ev₂)        (611.6)

for all x ∈ D, where ev₁ and ev₂ are the evaluation mappings from V := V₁ × V₂ to V₁ and V₂, respectively (see Sect.04). Evaluation of (611.6) at (u, v) = ((u₁, u₂), (v₁, v₂)) ∈ V² gives

    ∇⁽²⁾φ(x)(u, v) = (∇₍₁₎∇₍₁₎φ)(x)(u₁, v₁) + (∇₍₁₎∇₍₂₎φ)(x)(u₁, v₂)
                   + (∇₍₂₎∇₍₁₎φ)(x)(u₂, v₁) + (∇₍₂₎∇₍₂₎φ)(x)(u₂, v₂).        (611.7)
The following is a corollary to the preceding theorem.

Theorem on the Interchange of Partial Gradients: Assume that D is an open subset of a product space E := E₁ × E₂ and that φ : D → E′ is of class C². Then

    ∇₍₁₎∇₍₂₎φ = (∇₍₂₎∇₍₁₎φ)~,        (611.8)

where the operation ~ is to be understood as value-wise switching.
Proof: Let x T and w 1
1
1
2
be given. If we apply (611.7) to the
case when u := (w
1
, 0), v := (0, w
2
) and then with u and v interchanged
we obtain

(2)
(x)(u, v) = (
(1)

(2)
)(x)(w
1
, w
2
),

(2)
(x)(v, u) = (
(2)

(1)
)(x)(w
2
, w
1
).
Since w 1
1
1
2
was arbitrary, the symmetry of
(2)
(x) gives (611.8).
Corollary: Let I be a finite index set and let D be an open subset of R^I. If φ : D → E′ is of class C², then the second partial derivatives φ,ᵢ,ₖ : D → V′ satisfy

    φ,ᵢ,ₖ = φ,ₖ,ᵢ   for all i, k ∈ I.        (611.9)
Remark 3: Assume D is an open subset of a Euclidean space E with translation space V. A mapping h : D → V ≅ V* is then called a vector-field (see Sect.71). If h is differentiable, then the range of Curl h is included in Skew₂(V², R) ≅ Skew V. If V is 3-dimensional, there is a natural doubleton of orthogonal isomorphisms from Skew V to V, as will be explained in Vol.II. If one of these two isomorphisms, say V ∈ Orth(Skew V, V), is singled out, we may consider the vector-curl curl h := V(Curl h)|^{Skew V} of the vector field h (note the lower-case "c"). This vector-curl rather than the curl of Def.1 is used in much of the literature.
Notes 611

(1) In some of the literature on Vector Analysis, the notation ∇ × h instead of curl h is used for the vector-curl of h, which is explained in the Remark above. This notation should be avoided for the reason mentioned in Note (1) to Sect.67.
612 Lineonic Exponentials
We note the following generalization of a familiar result of elementary calculus.

Proposition 1: Let E be a flat space with translation space V and let I be an interval. Let g be a sequence of continuous processes in Map(I, V) that converges locally uniformly to h ∈ Map(I, V). Let a ∈ I and q ∈ E be given and define the sequence z in Map(I, E) by

    zₙ(t) := q + ∫_a^t gₙ   for all n ∈ N× and t ∈ I.        (612.1)

Then h is continuous and z converges locally uniformly to the process p : I → E defined by

    p(t) := q + ∫_a^t h   for all t ∈ I.        (612.2)
Proof: The continuity of h follows from the Theorem on Continuity of
Uniform Limits of Sect.56. Hence (612.2) is meaningful. Let be a norm
on 1. By Prop.2 of Sect.610 and (612.1) and (612.2) we have
(z
n
(t) p(t)) =
__
t
a
(g
n
h)
_

_
t
a
(g
n
h)

(612.3)
for all n N

and all t I. Now let s I be given. We can choose


a compact interval [b, c] I such that a [b, c] and such that [b, c] is a
neightborhood of s relative to I (see Sect.56). By Prop.7 of Sect.58, g[
[b,c]
converges uniformly to h[
[b,c]
. Now let P

be given. By Prop.7 of Sect.55


we can determine m N

such that
(g
n
h)[
[b,c]
<

c b
for all n m +N.
Hence, by (612.3), we have
(z
n
(t) p(t)) < [t a[

c b
< for all n m +N
and all t [b, c]. Since P

was arbitrary, we can use Prop.7 of Sect.55


again to conclude that z[
[b,c]
converges uniformly to p[
[b,c]
. Since s I was
arbitrary and since [b, c] Nhd
s
(I), the conclusion follows.
From now on we assume that a linear space V is given and we consider the algebra Lin V of lineons on V (see Sect.18).
Proposition 2: The sequence S in Map(Lin V, Lin V) defined by

    Sₙ(L) := Σ_{k∈n]} (1/k!) Lᵏ        (612.4)

for all L ∈ Lin V and all n ∈ N× converges locally uniformly.

Its limit is called the (lineonic) exponential for V and is denoted by exp_V : Lin V → Lin V. The exponential exp_V is continuous.
Proof: We choose a norm on 1. Using Prop.6 of Sect.52 and induction,
we see that |L
k
|

|L|

k
for all k N and all L Lin1 (see (52.9)).
Hence, given P

, we have
|
1
k!
L
k
|



k
k!
for all k N and all L Ce(||

). Since the sum-sequence of (

k
k!
[ k N)
converges (to e

), we can use Prop.9 of Sect.55 to conclude that the restric-


tion of the sum-sequence S of (
1
k!
L
k
[ k N) to Ce(| |

) converges
uniformly. Now, given L Lin1, Ce(| |

) is a neighborhood of L if
> |L|

. Therefore S converges locally uniformly. The continuity of its


limit exp
V
follows from the Continuity Theorem for Uniform Limits.
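The partial sums (612.4) can be evaluated numerically; the sketch below, with the two-by-two matrices and the test case chosen by us, compares them with the known closed form of the exponential of a rotation generator.

```python
# Partial sums of the exponential series for L = t*J, J the rotation
# generator; the limit is the rotation by the angle t.
import numpy as np

def S(n, L):
    term, total = np.eye(L.shape[0]), np.eye(L.shape[0])
    for k in range(1, n + 1):
        term = term @ L / k          # accumulates L^k / k!
        total = total + term
    return total

t = 1.0
J = np.array([[0.0, -1.0], [1.0, 0.0]])
exact = np.array([[np.cos(t), -np.sin(t)], [np.sin(t), np.cos(t)]])
for n in (2, 5, 10, 20):
    print(n, np.max(np.abs(S(n, t * J) - exact)))   # errors tend to 0
```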
For later use we need the following:

Proposition 3: Let L ∈ Lin V be given. Then the constant zero is the only differentiable process D : R → Lin V that satisfies

    D′ = LD   and   D(0) = 0.        (612.5)
Proof: By the Fundamental Theorem of Calculus, (612.5) is equivalent
to
D(t) =
_
t
0
(LD) for all t R. (612.6)
Choose a norm on 1. Using Prop.2 of Sect.610 and Prop.6 of Sect.52, we
conclude from (612.6) that
|D(t)|


_
t
0
|LD(s)|

ds |L|

_
t
0
|D(s)|

ds,
i.e., with the abbreviations
(t) := |D(t)|

for all t P, := |L|

, (612.7)
that
0 (t)
_
t
0
for all t P. (612.8)
We dene : P R by
(t) := e
t
_
t
0
for all t P. (612.9)
Using elementary calculus, we infer from (612.9) and (612.8) that

(t) = e
t
_
t
0
+ e
t
(t) 0
for all t P and hence that is antitone. On the other hand, it is evident
from (612.9) that (0) = 0 and 0. This can happen only if = 0,
which, in turn, can happen only if (t) = 0 for all t P and hence, by
(612.7), if D(t) = 0 for all t P. To show that D(t) = 0 for all t P, one
need only replace D by D () and L by L in (612.5) and then use the
result just proved.
Proposition 4: Let L ∈ Lin V be given. Then the only differentiable process E : R → Lin V that satisfies

    E′ = LE   and   E(0) = 1_V        (612.10)

is the one given by

    E := exp_V ∘ (ιL).        (612.11)
Proof: We consider the sequence G in Map(R, Lin1) dened by
G
n
(t) :=

kn
[
t
k
k!
L
k+1
(612.12)
for all n N

and all t R. Since, by (612.4), G


n
(t) = LS
n
(tL) for
all n N

and all t R, it follows from Prop.2 that G converges locally


uniformly to Lexp(L) : R Lin1. On the other hand, it follows from
(612.12) that
1
V
+
_
t
0
G
n
= 1
V
+

kn
[
t
k+1
(k + 1)!
L
k+1
= S
n+1
(tL)
for all n N

and all t R. Applying Prop.1 to the case when c and 1 are


replaced by Lin1, I by R, g by G, a by 0, and q by 1
V
, we conclude that
exp
V
(tL) = 1
V
+
_
t
0
(Lexp
V
(L)) for all t R, (612.13)
which, by the Fundamental Theorem of Calculus, is equivalent to the asser-
tion that (612.10) holds when E is dened by (612.11).
The uniqueness of E is an immediate consequence of Prop.3.
Proposition 5: Let L ∈ Lin V and a continuous process F : R → Lin V be given. Then the only differentiable process D : R → Lin V that satisfies

    D′ = LD + F   and   D(0) = 0        (612.14)

is the one given by

    D(t) = ∫_0^t E(t − s)F(s) ds   for all t ∈ R,        (612.15)

where E : R → Lin V is given by (612.11).
Proof: Let D be dened by (612.15). Using the Corollary to the Dif-
ferentiation Theorem for Integral Representations of Sect.610, we see that
D is of class C
1
and that D

is given by
D

(t) = E(0)F(t) +
_
t
0
E

(t s)F(s)ds for all t R.


Using (612.10) and Prop.1 of Sect.610, and then (612.15), we nd that
(612.14) is satised.
The uniqueness of D is again an immediate consequence of Prop.3.
Differentiation Theorem for Lineonic Exponentials: Let V be a linear space. The exponential exp_V : Lin V → Lin V is of class C¹ and its gradient is given by

    (∇_L exp_V)M = ∫_0^1 exp_V(sL) M exp_V((1 − s)L) ds        (612.16)

for all L, M ∈ Lin V.
Proof: Let L, M Lin1 be given. Also, let r R

be given and put


K := L + rM, E
1
:= exp
V
(L), E
2
:= exp
V
(K). (612.17)
By Prop.4 we have E
1

= LE
1
, E
2

= KE
2
, and E
1
(0) = E
2
(0) = 1
V
.
Taking the dierence and observing (612.17)
1
we see that D := E
2
E
1
satises
D

= KE
2
LE
1
= LD+ rME
2
, D(0) = 0.
Hence, if we apply Prop.5 with the choice F := rME
2
we obtain
(E
2
E
1
)(t) = r
_
t
0
E
1
(t s)ME
2
(s)ds for all t R.
For t := 1 we obtain, in view of (612.17),
1
r
(exp
V
(L + rM) exp
V
(L)) =
_
1
0
(exp
V
((1 s)L))Mexp
V
(s(L + rM))ds.
Since exp
V
is continuous by Prop.2, and since bilinear mappings are contin-
uous (see Sect.66), we see that the integrand on the right depends contin-
uously on (s, r) R
2
. Hence, by Prop.3 of Sect.610 and by (65.13), in the
limit r 0 we nd
(dd
M
exp
V
)(L) =
_
1
0
(exp
V
((1 s)L))Mexp
V
(sL)ds. (612.18)
Again, we see that the integrand on the right depends continuously on
(s, L) RLin1. Hence, using Prop.3 of Sect.610 again, we conclude that
the mapping dd
M
exp
V
: Lin1 Lin1 is continuous. Since M Lin1
was arbitrary, it follows from Prop.7 of Sect.65 that exp
V
is of class C
1
. The
formula (612.16) is the result of combining (65.14) with (612.18).
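The gradient formula (612.16) can be checked numerically: a finite difference of the exponential in the direction M should match the integral on the right. In the sketch below (our own test data and approximations, not part of the text) the exponential is evaluated by a truncated series and the integral by a midpoint Riemann sum.

```python
# Finite-difference check of (612.16).
import numpy as np

def expm(A, n=60):                       # truncated exponential series
    term, total = np.eye(A.shape[0]), np.eye(A.shape[0])
    for k in range(1, n + 1):
        term = term @ A / k
        total = total + term
    return total

rng = np.random.default_rng(1)
L = 0.7 * rng.standard_normal((3, 3))
M = rng.standard_normal((3, 3))

t = 1e-6
numeric = (expm(L + t * M) - expm(L)) / t

m = 400                                  # midpoint rule on [0, 1]
s_mid = (np.arange(m) + 0.5) / m
integral = sum(expm(s * L) @ M @ expm((1.0 - s) * L) for s in s_mid) / m

print(np.max(np.abs(numeric - integral)))   # small (both errors ~1e-5)
```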
Proposition 6: Assume that L, M ∈ Lin V commute, i.e. that LM = ML. Then:

(i) M and exp_V(L) commute.

(ii) We have

    exp_V(L + M) = (exp_V(L))(exp_V(M)).        (612.19)

(iii) We have

    (∇_L exp_V)M = (exp_V(L))M.        (612.20)
Proof: Put E := exp
v
(L) and D := EM ME. By Prop.4, we nd
D

= E

M ME

= LEM MLE. Hence, since LM = ML, we obtain


D

= LD. Since D(0) = 1


V
M M1
V
= 0, it follows from Prop.3 that
D = 0 and hence D(1) = 0, which proves (i).
Now put F := exp
v
(M). By the Product Rule (66.13) and by Prop.4,
we nd
(EF)

= E

F +EF

= LEF +EMF.
Since EM = ME by part (i), we obtain (EF)

= (L + M)EF. Since
(EF)(0) = E(0)F(0) = 1
V
, by Prop.4, EF = exp
v
((L +M)). Evaluation
at 1 gives (612.19), which proves (ii).
Part (iii) is an immediate consequence of (612.16) and of (i) and (ii).
Pitfalls: Of course, when V = R, then Lin V = Lin R ≅ R and the lineonic exponential reduces to the ordinary exponential exp. Prop.6 shows how the rule exp(s + t) = exp(s) exp(t) for all s, t ∈ R and the rule exp′ = exp generalize to the case when V ≠ R. The assumption, in Prop.6, that L and M commute cannot be omitted. The formulas (612.19) and (612.20) need not be valid when LM ≠ ML.

If the codomain of the ordinary exponential is restricted to P× it becomes invertible and its inverse log : P× → R is of class C¹. If dim V > 1, then exp_V does not have differentiable local inverses near certain values of L ∈ Lin V because for these values of L, the gradient ∇_L exp_V fails to be injective (see Problem 10). In fact, one can prove that exp_V is not locally invertible near the values of L in question. Therefore, there is no general lineonic analogue of the logarithm. See, however, Sect.85.
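The warning about non-commuting L and M is easy to see numerically; the pair of nilpotent matrices below is our own choice of counterexample.

```python
# exp(L + M) differs from exp(L) exp(M) when L and M do not commute.
import numpy as np

def expm(A, n=60):                       # truncated exponential series
    term, total = np.eye(A.shape[0]), np.eye(A.shape[0])
    for k in range(1, n + 1):
        term = term @ A / k
        total = total + term
    return total

L = np.array([[0.0, 1.0], [0.0, 0.0]])
M = np.array([[0.0, 0.0], [1.0, 0.0]])
print(np.max(np.abs(L @ M - M @ L)))                    # 1: no commuting
print(np.max(np.abs(expm(L + M) - expm(L) @ expm(M))))  # clearly non-zero
```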
613 Problems for Chapter 6
(1) Let I be a genuine interval, let c be a at space with translation space
1, and let p : I c be a dierentiable process. Dene h : I
2
1 by
h(s, t) :=
_
p(s)p(t)
st
if s ,= t
p

(s) if s = t
_
. (P6.1)
(a) Prove: If p

is continuous at t I, then h is continuous at


(t, t) I
2
.
(b) Find a counterexample which shows that h need not be continu-
ous if p is merely dierentiable and not of class C
1
.
(2) Let f : T R be a function whose domain T is an open convex subset
of a at space c with translation space 1.
(a) Show: If f is dierentiable and f is constant, then f = a[
D
for
some at function a Flf(c) (see Sect.36).
(b) Show: If f is twice dierentiable and
(2)
f is constant, then f
has the form
f = a[
D
+Q (1
DE
q
DE
), (P6.2)
where a Flf(c), q c, and Q Qu(1) (see Sect. 27).
(3) Let T be an open subset of a at space c and let be a genuine inner-
product space. Let k : T

be a mapping that is dierentiable


at a given x T.
(a) Show that the value-wise magnitude [k[ : T P

, dened by
[k[(y) := [k(y)[ for all y T, is dierentiable at x and that

x
[k[ =
1
|k(x)|
(
x
k)

k(x). (P6.3)
(b) Show that k/[k[ : T

is dierentiable at x and that

x
_
k
|k|
_
=
1
|k(x)|
3
_
[k(x)[
2

x
k k(x) (
x
k)

k(x)
_
. (P6.4)
(4) Let c be a genuine inner-product space with translation space 1, let
q c be given, and dene r : c q 1

by
r(x) := x q for all x c q (P6.5)
and put r := [r[ (see Part (a) of Problem 3).
(a) Show that r/r : c q 1

is of class C
1
and that

_
r
r
_
=
1
r
3
(r
2
1
V
r r). (P6.6)
(Hint: Use Part (b) of Problem 3.)
(b) Let the function h : P

R be twice dierentiable. Show that


h r : c q R is twice dierentiable and that

(2)
(h r) =
1
r
_
1
r
2
(r(h

r) (h

r))r r + (h

r)1
V
_
.
(P6.7)
(c) Evaluate the Laplacian (h r) and reconcile your result with
(67.17).
(5) A Euclidean space c of dimension n with n 2, a point q c, and a
linear space are assumed given.
(a) Let I be an open interval, let T be an open subset of c, let
f : T I be given by (67.16), let a be a at function on c, let
g : I be twice dierentiable, and put h := a[
D
(g f) : T
. Show that
h = 2a[
D
((2g

+ (n + 2)g

) f) 4a(q)(g

f). (P6.8)
(b) Assuming that the Euclidean space c is genuine, show that the
function h : c q given by
h(x) := ((e (x q))[x q[
n
)a +b (P6.9)
is harmonic for all e 1 := c c and all a, b . (Hint: Use
Part (a) with a determined by a = e, a(q) = 0.)
(6) Let c be a at space, let q c, let Q be a non-degenerate quadratic
form (see Sect.27) on 1 := c c, and dene f : c R by
f(y) := Q(y q) for all y c. (P6.10)
Let a be a non-constant at function on c (see Sect.36).
(a) Prove: If the restriction of f to the hyperplane T := a
<
(0)
attains an extremum at x T, then x must be given by
x = q
1
Q (a), where :=
a(q)
1
Q (a,a)
. (P6.11)
(Hint: Use the Constrained-Extremum Theorem.)
(b) Under what condition does f[
F
actually attain a maximum or a
minimum at the point x given by (P6.11)?
(7) Let c be a 2-dimensional Euclidean space with translation-space 1
and let J Orth1 Skew1 be given (see Problem 2 of Chapt.4). Let
T be an open subset of c and let h : T 1 be a vector-eld of class
C
1
.
(a) Prove that
Curl h = (div(Jh))J. (P6.12)
(b) Assuming that T is convex and that div h = 0, prove that h =
Jf for some function f : T R of class C
2
.
Note: If h is interpreted as the velocity eld of a volume-preserving ow, then
div h = 0 is valid and a function f as described in Part (b) is called a stream-
function of the ow.
(8) Let a linear space 1, a lineon L Lin1, and u 1 be given. Prove:
The only dierentiable process h : R 1 that satises
h

= Lh and h(0) = u (P6.13)


is the one given by
h := (exp
V
(L))u. (P6.14)
(Hint: To prove existence, use Prop. 4 of Sect. 612. To prove unique-
ness, use Prop.3 of Sect.612 with the choice D := h (value-wise),
where 1

is arbitrary.)
(9) Let a linear space 1 and a lineon J on 1 that satises J
2
= 1
V
be
given. (There are such J if dim1 is even; see Sect.89.)
(a) Show that there are functions c : R R and d : R R, of class
C
1
, such that
exp
V
(J) = c1
V
+ dJ. (P6.15)
(Hint: Apply the result of Problem 8 to the case when 1 is re-
placed by c := Lsp(1
V
, J) Lin1 and when L is replaced by
Le
J
Linc, dened by Le
J
U = JU for all U c.)
(b) Show that the functions c and d of Part (a) satisfy
d

= c, c

= d, c(0) = 1, d(0) = 0, (P6.16)


and
c(t + s) = c(t)c(s) d(t)d(s)
d(t + s) = c(t)d(s) + d(t)c(s)
_
for all s, t R. (P6.17)
(c) Show that c = cos and d = sin.
Remark: One could, in fact, use Part (a) to dene sin and cos.
(10) Let a linear space 1 and J Lin1 satisfying J
2
= 1
V
be given and
put
/ := L Lin1 [ LJ = JL. (P6.18)
(a) Show that / = Null (
J
exp
V
) and conclude that
J
exp
V
fails
to be invertible when dim1 > 0. (Hint: Use the Dierentiation
Theorem for Lineonic Exponentials, Part (a) of Problem 9, and
Part (d) of Problem 12 of Chap. 1).
(b) Prove that 1
V
Bdy Rng exp
V
and hence that exp
V
fails to be
locally invertible near J.
(11) Let T be an open subset of a at space c with translation space 1 and
let 1

be a linear space.
(a) Let H : T Lin(1, 1

) be of class C
2
. Let x T and v 1 be
given. Note that

x
Curl H Lin(1, Lin(1, Lin(1, 1

)))

= Lin
2
(1
2
, Lin(1, 1

))
and hence
(
x
Curl H)

Lin
2
(1
2
, Lin(1, 1

))

= Lin(1, Lin(1, Lin(1, 1

))).
Therefore we have
G := (
x
Curl H)

v Lin(1, Lin(1, 1

))

= Lin
2
(1
2
, 1

).
Show that
GG

= (
x
Curl H)v. (P6.19)
(Hint: Use the Symmetry Theorem for Second Gradients.)
(b) Let : T 1

be of class C
2
and put
W := Curl : T Lin(1, 1

)

= Lin
2
(1
2
, R).
Prove: If h : T 1 is of class C
1
, then
Curl (Wh) = (W)h +Wh + (h)

W, (P6.20)
where value-wise evaluation and composition are understood.
(12) Let c and c

be at spaces with dimc = dimc

2, let : c c

be
a mapping of class C
1
and put
c := x c [
x
is not invertible.
(a) Show that
>
(c) Rng Bdy (Rng ).
(b) Prove: If the pre-image under of every bounded subset of c

is
bounded and if Acc
>
(c) = (see Def.1 of Sect.57), then is
surjective. (Hint: Use Problem 13 of Chap.5.)
(13) By a complex polynomial p we mean an element of C
(N)
, i.e. a
sequence in C indexed on N and with nite support (see (07.10)). If
p ,= 0, we dene the degree of p by
deg p := max Supt p = maxk N [ p
k
,= 0.
By the derivative p

of p we mean the complex polynomial p

C
(N)
dened by
(p

)
k
:= (k + 1)p
k+1
for all k N. (P6.21)
The polynomial function p : C C of p is dened by
p(z) :=

kN
p
k
z
k
for all z C. (P6.22)
Let p (C
(N)
)

be given.
(a) Show that p
<
(0) is nite and p
<
(0) deg p.
(b) Regarding C as a two-dimensional linear space over R, show that
p is of class C
2
and that
(
z
p)w = p

(z)w for all z, w C. (P6.23)


(c) Prove that p is surjective if deg p > 0. (Hint: Use Part (b) of
Problem 12.)
(d) Show: If deg p > 0, then the equation ? z C, p(z) = 0 has at
least one and no more than deg p solutions.
Note: The assertion of Part (d) is usually called the Fundamental Theorem of
Algebra. It is really a theorem of analysis, not algebra, and it is not all that
fundamental.
