
Calculus and Differential Geometry:

An Introduction to Curvature

Donna Dietz
Howard Iseri
Department of Mathematics and Computer Information Science,
Mansfield University, Mansfield, PA 16933
E-mail address:

Chapter 1. Angles and Curvature 1

1. Rotation 1
2. Angles 3
3. Rotation 4
4. Definition of Curvature 6
5. Impulse Curvature 8

Chapter 2. Solid Angles and Gauss Curvature 11

1. Total curvature for cone points 11
2. Total curvature for smooth surfaces 13
3. Gauss curvature and impulse curvature 14
4. Gauss-Bonnet Theorem (Exact excerpt from Creative Visualization
handout) 15
5. Defining Gauss curvature 16
6. Intrinsic aspects of the Gauss curvature 19

Chapter 3. Intrinsic Curvature 21

1. Parallel vectors 21

Chapter 4. Functions 25
1. Introduction 25
2. Piecewise-Linear Approximations for Functions of One Variable 25
3. Uniform Continuity 27
4. Differentiation in One Variable 29
5. Derivatives and PL Approximations 33
6. Parametrizations of Curves 35
7. Functions of Two Variables 37
8. Differentiability for Functions of Two Variables 37

Chapter 5. The Riemannian Curvature Tensor in Two Dimensions 47

1. Parametrizations 48

Chapter 6. Riemannian Curvature Tensor 53

1. The Riemannian Metric for a Plane 53
2. The Riemannian Metric for Curved Surfaces 56
3. Curvature 60
4. The Inverse of the Metric 62

Chapter 7. Riemannian Curvature Tensor 63

1. Intrinsic Interpretations 63

Chapter 8. Curvature of 3-Dimensional Spaces 69

1. What we know 69
2. What is the geometry like around a vertex of a cubed 3-manifold? 69
3. A positive curvature example 69

Angles and Curvature

0.1. Overview. As you walk around a closed path (along a simple closed
curve on the floor), the direction you are facing will make a net rotation of 2π
radians or 360◦.

1. Rotation
Imagine a circle drawn on the floor (the radius might be ten feet). You are
to walk around the circle once in a counter-clockwise direction. If you are initially
facing north, you will soon be facing north-west and then west. We can naturally
say that the direction in which you are facing has changed by 90◦ or π/2 radians.
After that, you will face south, then east, and finally north again. The direction
in which you are facing has experienced a rotation of 360◦. We will want to think
of this rotation as describing how the direction you are facing has changed as
opposed to your change in location as you make an orbit around the circle.
For a curve in the plane, we can talk about the rotation of a tangent vector in
the same way that we have talked about the rotation of our body as we walk along
a curve drawn on the floor. Intuitively at least, we would like to identify these two
concepts. That is, what we discover about one should apply equally to the other.
Throughout this book, we will use the convention that counter-clockwise rota-
tions are positive. For example, if you were to turn 45◦ to the left and then 90◦ to
the right, the net rotation would be −45◦ .

Figure 1. Walk along this path marked on the floor. (Exercise 1)


1.1. Exercises.
1. Suppose you are walking around the curve shown in Figure 1 in a counter-
clockwise direction. Assume that the curve is smooth (the direction varies smoothly)
and that the direction you are facing is the same as that of a tangent vector. How
does the direction you face change as you move from the starting point A to the
point B? From B to C? From A to C? What is your total (net) rotation for the
entire circuit?

Figure 2. Walk along this path marked on the floor. (Exercise 2)

2. What would your total rotation be as you walked in the direction indicated
around the path shown in Figure 2?

Figure 3. Walk along this path marked on the floor. (Exercise 3)

3. What would your total rotation be as you walked in the direction indicated
around the path shown in Figure 3?
4. Make a conjecture about the net rotation of a tangent vector moving around
a simple closed curve in the plane in a counter-clockwise direction.
5. Make a conjecture about the net rotation of a normal vector moving around
a simple closed curve in the plane in a counter-clockwise direction. Does it make
a difference whether the normal vector is pointing outward or inwards? Are there
other directions that a normal vector can point?
1.2. Overview. Angles are abrupt changes in direction. Total curvature is
the net change in direction over some section of a curve or polygonal path.

2. Angles
One of the most important theorems in Euclidean geometry states that the
sum of the angles of a triangle is 180◦. Virtually all of the theorems that involve
angle measure or parallelism can be proved with this fact. Among these would be
that the angle sum of a quadrilateral is 360◦ , the angle sum of a pentagon is 540◦,
the angle sum of a hexagon is 720◦ , and in general,
Theorem 1. The angle sum of a (convex) n-gon is (n − 2) · 180◦





Figure 4. The turning angles for a quadrilateral.

This is all very nice, but the sequence of theorems just mentioned can be
restated more simply and intuitively in terms of the turning angle or angle
defect. The reason for using the term turning angle should become clear, and
angle defect refers to the idea that the turning angle measures how far the angle is
from being a straight angle. In Figure 4, a quadrilateral is shown with the turning
angles marked. You should imagine yourself walking around the quadrilateral in
a counter-clockwise direction. The turning angles then measure the amount you
must turn to your left as you start the next edge. In this case, the sum of the
turning angles is 360◦ . If you imagine yourself walking around any closed path,
taking left turns, and coming back to your original position, you must have rotated
a full 360◦ . This should agree completely with your answers to the exercises in the
previous section. It seems reasonable, therefore, that the sum of the turning angles
is 360◦ for any polygon. This is in fact true, and Theorem 1 can be restated as
Theorem 2. The turning angle sum of a (convex) n-gon is 360◦.
It is not necessarily true that Theorem 2 is a better theorem than Theorem 1,
but it is certainly simpler and more intuitive. The angle sum theorem is probably
more convenient for analyzing geometric figures, but we want to understand
curvature, and the turning angle sum theorem sets us off in the right direction.
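
The two theorems can be compared numerically. The following is a minimal sketch, assuming a convex quadrilateral whose vertices we have chosen ourselves (it is not the quadrilateral of Figure 4); the function name `turning_angles` is our own. It measures the signed turn from each edge to the next and sums the results.

```python
import math

def turning_angles(vertices):
    """Signed turning angle (radians) at each vertex of a closed polygon.

    Vertices are (x, y) pairs listed in order; a left turn is positive.
    """
    n = len(vertices)
    angles = []
    for i in range(n):
        x0, y0 = vertices[i - 1]
        x1, y1 = vertices[i]
        x2, y2 = vertices[(i + 1) % n]
        # Edge arriving at the vertex and edge leaving it.
        ax, ay = x1 - x0, y1 - y0
        bx, by = x2 - x1, y2 - y1
        # atan2(cross, dot) is the signed angle from the first edge to the second.
        angles.append(math.atan2(ax * by - ay * bx, ax * bx + ay * by))
    return angles

# A hypothetical convex quadrilateral, listed counter-clockwise.
quad = [(0, 0), (4, 0), (5, 3), (1, 2)]
turns = turning_angles(quad)
interior = [math.pi - t for t in turns]   # each interior angle is pi minus the turn

print("turning angle sum (degrees):", math.degrees(sum(turns)))
print("angle sum (degrees):", math.degrees(sum(interior)))
```

Since each interior angle is a straight angle minus the turning angle, the angle sum of an n-gon is n · 180◦ minus the turning angle sum, which is how Theorem 1 and Theorem 2 translate into one another.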
2.1. Exercises.
6. Theorem 1 states that the angle sum of an n-gon is (n − 2)180◦ or n − 2
times the angle sum of a triangle. Draw a figure illustrating that a convex pentagon
has the angle sum of three triangles. Do the same for a hexagon.

7. Suppose the quadrilateral of Figure 4 is drawn on the floor with up in the

picture corresponding to north, and you are to walk around it in the counter-
clockwise direction. Draw a picture of the face of a compass, and for one of the
sides, draw the position of the needle corresponding to the direction you are facing
as you walk along it. On the same picture, draw the needle positions corresponding
to the other three sides. At each vertex, you would need to pivot as you finish
walking along one side of the quadrilateral and start on the next. In your picture,
for each vertex, indicate which directions you sweep through as you turn to the left.

3. Rotation
Our goal is to formulate definitions in differential geometry. Before we do that
for curves in the plane, let us summarize what we have so far.
Given an object moving in a counter-clockwise direction around a simple closed
curve, a vector tangent to the curve and associated with the object must make a
“full” rotation of 2π radians or 360◦ . In other words, if we were to think of this
tangent vector (or, if you wish, a copy of it) as having its tail fixed at the origin,
then as the object moves around the curve, the tangent vector will sweep through
all possible directions. This rotation of the tangent vector will be predominantly
in the counter-clockwise direction, but it may, for example, sweep clockwise for a
bit, come back counter-clockwise an equal amount, and then continue on. These
clockwise rotations are always countered by an extra counter-clockwise rotation,
and the total net result is always 360◦ of counter-clockwise rotation.
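
This bookkeeping can be tried on a curve that is not convex. The sketch below assumes a particular simple closed curve of our own choosing, the polar curve r = 1 + 0.6 cos(3t), which has three inward dents; the helper names are ours. It follows the direction of the tangent vector in small steps, adds up the signed changes, and separately totals the clockwise parts.

```python
import math

def tangent_angle(t):
    """Direction (radians) of the tangent to the polar curve r = 1 + 0.6 cos(3t)."""
    r = 1 + 0.6 * math.cos(3 * t)
    dr = -1.8 * math.sin(3 * t)
    # Derivative of (r cos t, r sin t) with respect to t.
    dx = dr * math.cos(t) - r * math.sin(t)
    dy = dr * math.sin(t) + r * math.cos(t)
    return math.atan2(dy, dx)

def wrap(angle):
    """Map an angle difference into the interval [-pi, pi)."""
    return (angle + math.pi) % (2 * math.pi) - math.pi

N = 2000
steps = []
for i in range(N):
    a0 = tangent_angle(2 * math.pi * i / N)
    a1 = tangent_angle(2 * math.pi * (i + 1) / N)
    steps.append(wrap(a1 - a0))

net_rotation = sum(steps)                     # signed total
clockwise = sum(s for s in steps if s < 0)    # only the clockwise sweeps

print("net rotation (degrees):", math.degrees(net_rotation))
print("clockwise part (degrees):", math.degrees(clockwise))
```

The clockwise part is nonzero for this curve, yet the net rotation still comes out to one full counter-clockwise turn.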
If the curve is smooth (whatever that means), we can easily describe a tangent
vector in terms of a derivative. There are some difficulties at non-smooth parts of
a curve. At the corners of a quadrilateral, for example, a derivative will not specify
a unique tangent direction. In this case at least, we will be able to find a tangent
direction entering the vertex and one leaving. We can and will account for the
directions swept through as we pivot from one direction to the other, and we will
avoid curves that are “less smooth” than this.
In order to motivate the definitions describing rotations in terms of derivatives,
we will consider the following. Looking at the unit tangent vector as we move
around a vertex of a polygonal path, we see that the direction of the tangent vector
stays the same, pivots through some angle θ at the vertex, and then again remains
the same until another vertex is encountered. An example of this is illustrated in
Figure 5, and in this picture the angle θ will be positive. Later, we will be interested
in understanding curvature in higher dimensions, and it will be more convenient
to speak in terms of a unit normal vector rather than a unit tangent. For a curve
in the plane (we will assume that polygonal paths are curves) a unit normal to a
curve will experience the same changes in direction that a unit tangent will. The
unit normal to the same curve shown in Figure 5 will also sweep through the same
angle θ, as shown in Figure 6. As described earlier, the rotation is a measure of
how the direction of the unit tangent or unit normal vectors changes. If we take
the unit normal at each point of the curve, and put its tail at the origin, the head
of the vector will stay on the unit circle and serve as a “direction-o-meter,” as
shown in Figure 7. As we move along the curve, the head of the vector will stay fixed until we reach
the vertex, and then it will swing over to the left as we pass through the vertex.
Formally, it is common to associate each point on the curve with a point on the
unit circle determined by the unit normal in this way. It is called the Gauss map,

Figure 5. Following the unit tangent vectors around a vertex with

turning angle θ.

Figure 6. Following the unit normal vectors around a vertex with

turning angle θ.

and this will be something that we will be able to differentiate in a meaningful way.
In order to perform this differentiation, we need to consider a situation where the

Figure 7. The unit normal vectors moved to the origin.

direction of the normal vector changes over some interval, and not all at once. If
we take the polygonal curve we have been using and “smooth” it out, the change
in direction is spread out over the curve, as shown in Figure 8. In particular, note
that the total change in direction is the same, the positive angle θ; it is only that
the direction-o-meter swings to the left more gradually.
The total rotation, which we will call the total curvature, is a quantity that
applies to both polygonal curves and smooth ones. With the smooth curves, however,
we can also talk about the rate of rotation (it does make some sense to say
that the rate of rotation at the vertex of the polygonal curve is infinite). There are
two quantities that are natural candidates to which we will compare the rotation,

Figure 8. Following the unit normal vectors around a vertex with

turning angle θ.

time and distance. It makes sense to say that if a given amount of rotation takes
place over a very short distance, then the curve must be very sharply curved, and
conversely, the curve is not so sharply curved, if the rotation takes place over a
longer distance. Therefore, if we simply divide,
(1) $\text{average rate of rotation} = \dfrac{\text{total rotation}}{\text{distance}},$
we get a reasonable measure of how much a curve curves. We will call this average
rate of rotation, average curvature. The next logical step is to take a limit as the
distance approaches zero, and this suggests a definition for curvature. If ρ is a
quantity measuring rotation, and s is an arclength parameter, then the curvature
κ should be defined as

(2) $\kappa = \dfrac{d\rho}{ds}.$
In the next section, we will express this definition more formally as a formula.
In particular, we need to find a function corresponding to ρ. This formula will
correspond exactly to the one given in calculus classes. Before we do that, however,
we can check a simple case.
The circumference of a circle of radius r is 2πr. The total rotation is 2π radians.
Radians are more natural in this context than are degrees, but degrees would work
OK. If we assume that the curvature of a circle is constant, then the curvature
should be the same as the average curvature. The curvature must be, therefore,
(3) $\kappa = \dfrac{2\pi}{2\pi r} = \dfrac{1}{r},$
and this agrees with the calculus definition of curvature. The point of this book is
to show that the definitions for the curvature of surfaces and of three-dimensional
spaces can be motivated in an analogous way.
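
The circle computation can also be approached from the polygonal side. As a rough sketch (the function name and the choice of regular inscribed polygons are our own), the total rotation of a regular n-gon is 2π, and dividing by its perimeter gives an average curvature that settles toward 1/r as n grows.

```python
import math

def average_curvature_of_ngon(n, r):
    """Total turning divided by perimeter for a regular n-gon inscribed in a circle of radius r."""
    total_rotation = n * (2 * math.pi / n)           # n equal turning angles
    perimeter = n * 2 * r * math.sin(math.pi / n)    # n chords of the circle
    return total_rotation / perimeter

r = 10.0
for n in (4, 16, 64, 256):
    print(n, average_curvature_of_ngon(n, r))
print("1/r =", 1 / r)
```

The perimeter of the inscribed polygon is always a little shorter than the circumference, so the average curvature approaches 1/r from above.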

4. Definition of Curvature
We are in search of a function that measures the angle of rotation for the unit
normal vector, or equivalently, the unit tangent vector. In terms of the Gauss
map, the head of the unit normal vector always lies on the unit circle. Therefore,
the derivative of the unit normal vector must always be tangent to the unit circle.
This is a manifestation of the fact that the derivative of a vector function that has
constant magnitude is always perpendicular to the original vector function. Two
notions point the way. First, over small distances, the arc of a circle near a point

on the circle and the tangent line through that point are very similar. Second, the
length of an arc of the unit circle is equal to the corresponding angle measured in
radians. Therefore, a derivative of the unit normal vector measures change along
a tangent to the unit circle (as in the Gauss map). This change is essentially the
same as the change along the unit circle, which is equal to a change in the direction
of the normal vector measured in radians. In other words, the conclusions of the
last section suggest that the curvature can be measured by the derivative of the unit
normal vector with respect to arclength, or equally by the derivative of
the unit tangent vector with respect to arclength. In magnitude,

$|\kappa(s)| = \left\| \dfrac{d\mathbf{n}}{ds} \right\| = \left\| \dfrac{d\mathbf{T}}{ds} \right\|.$
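
For a curve given by a parametrization that is not by arclength, the chain rule turns the rate of change of the unit tangent with respect to the parameter into a rate with respect to arclength: divide by the speed ds/dt. The following is a minimal numerical sketch under that assumption, using finite differences and an ellipse of our own choosing as the test curve; the function names are illustrative only.

```python
import math

def unit_tangent(curve, t, h=1e-5):
    """Unit tangent and speed of a plane curve at parameter t, by a central difference."""
    (x0, y0), (x1, y1) = curve(t - h), curve(t + h)
    dx, dy = (x1 - x0) / (2 * h), (y1 - y0) / (2 * h)
    speed = math.hypot(dx, dy)
    return (dx / speed, dy / speed), speed

def curvature(curve, t, h=1e-4):
    """Estimate the size of dT/ds as (size of dT/dt) / (ds/dt) at parameter t."""
    (tx0, ty0), _ = unit_tangent(curve, t - h)
    (tx1, ty1), _ = unit_tangent(curve, t + h)
    _, speed = unit_tangent(curve, t)
    dT_dt = math.hypot(tx1 - tx0, ty1 - ty0) / (2 * h)
    return dT_dt / speed

def ellipse(t):
    """A hypothetical test curve: the ellipse with semi-axes 2 and 1."""
    return (2 * math.cos(t), math.sin(t))

def circle(t):
    """A circle of radius 3, for comparison with kappa = 1/r."""
    return (3 * math.cos(t), 3 * math.sin(t))

print("ellipse, t = 0:   ", curvature(ellipse, 0.0))
print("ellipse, t = pi/2:", curvature(ellipse, math.pi / 2))
print("circle of radius 3:", curvature(circle, 1.0))
```

The ellipse is most sharply curved at the ends of its long axis and flattest at the ends of its short axis, and the circle returns a value close to 1/3 wherever it is sampled.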
4.1. Exercises.
8. Show that the derivative of the unit normal vector is perpendicular to the unit
normal vector. Use the fact that the unit normal vector has constant magnitude,
i.e., ‖n‖ = 1, and that the “dot product rule” looks like the product rule from
calculus, i.e., $\frac{d}{ds}\langle x, y \rangle = \left\langle x, \frac{dy}{ds} \right\rangle + \left\langle \frac{dx}{ds}, y \right\rangle$.

9. The derivative of the unit normal vector expresses a rate of change along
a tangent to the Gauss map circle. I claimed that this rate of change could be
interpreted as a rate of change of direction in terms of an angle measured in radians.
Show that this interpretation is a reasonable one by showing that for the quantities
shown in Figure 9, $\frac{d\tau}{d\theta}(0) = \frac{d\tau}{d\rho}(0) = \frac{d\rho}{d\theta}(0) = 1$. Hint: Don’t work too hard. Just
use trigonometry to express each as a function of one of the others.


Figure 9. Comparing the quantities measured along the tangent

line, along the unit circle, and the angle at the center of the circle.

10. Suppose an object is moving along a curve, and at a point P on the curve,
the derivative of the unit normal vector with respect to time is dn/dt = [ 3, 2 ] (this
is a velocity for the head of the vector n in terms of the Gauss map in feet per
second perhaps). We could say that the unit normal vector is rotating at a certain
rate measured in radians per second. What is this rate? Suppose that the velocity
of the object as it passes through P is v = [ 6, 4 ]. What is its speed (a.k.a. ds/dt)?
What is the curvature of the curve at the point P ? Hint: You can use the chain
rule, if you want.
11. The vector function x(t) = [ t, t² ] is a parametrization of a parabola. Find
the curvature of the curve at the points corresponding to t = 0 and t = 2.

5. Impulse Curvature
We can define curvature for smooth curves, but this definition will not work for
curves with sharp corners in them. The notion of total curvature applies to both
cases, however. We can develop a notion of curvature that works for corners that we
will call impulse curvature. Let us look closely first at a total curvature function, and
how total curvature and (instantaneous) curvature are related through derivatives
and integrals. Consider the smooth curve of Figure 8. As we move along the curve
from right to left, the unit normal vector initially makes an angle with the “positive x-axis”
of π/6 radians (in this particular example). In Figure 10, the initial point of the graph
corresponds to this value of ρ. The curve in Figure 8 is straight initially, so the
direction of the unit normal is constant, and this manifests itself in the horizontal
section of the graph in Figure 10. As we move into the curved section of the curve,
the unit normal vector begins to rotate counter-clockwise, so the angle ρ increases,
and this is reflected in the graph. The angle reaches a maximum as we move into
the final straight portion of the curve, and ρ is again constant, now at a value of
about 5π/6 radians. We can think of ρ(s) as being a total curvature function, and
the difference between the starting and finishing values, θ = 5π/6 − π/6 = 4π/6, represents
the total curvature of this section of the curve.

Figure 10. Graph of the direction of the unit normal vector for
the smooth curve of Figure 8 with respect to arclength.

If we graph the total curvature function ρ for the polygonal curve of Figure 8,
the initial and final values for ρ are the same, but the increase in the value of ρ
occurs at a single point on the curve. The graph for ρ is a step function in this case,
but the total curvature is still the difference between the initial and final values of
ρ. This total curvature function ρ keeps track of the direction of the unit normal
vector, and total curvature is the net change in this function. As concluded before,
curvature is the derivative of this function, dρ/ds. Therefore, we can see curvature in
the graphs of Figures 10 and 11 as slopes. The slopes are zero on either end of
the graph of Figure 10, so the curvature of the corresponding parts of the smooth
curve of Figure 8 must also be zero. This agrees with the fact that the ends of
this curve are straight. The slopes are positive in the middle of the graph, and this
corresponds with the fact that the middle section of the smooth curve has positive
curvature (positive, since the unit normal vector is rotating counter-clockwise). On
the other hand, the polygonal curve of Figure 8 is straight everywhere except at the
vertex. Therefore, the slopes in the graph of Figure 11 are zero everywhere except

Figure 11. Graph of the direction of the unit normal vector for
the polygonal curve of Figure 8 with respect to arclength.

at the point in the middle. Here the slope and the curvature at the vertex are both
infinite. The curvature graphs are shown in Figures 12 and 13.


Figure 12. Graph of the derivative of ρ with respect to s for the
smooth curve of Figure 8.


Figure 13. Graph of the derivative of ρ with respect to s for the
polygonal curve of Figure 8.

Of particular interest is the fact that it is possible to assign a finite value to the
infinite curvature at the vertex of the polygonal curve. More specifically, we will
think of the curvature at this point as being infinite, but if we were to integrate
across this infinite function value, we would obtain a definite finite value, namely
the total curvature. The total curvature at the vertex of the polygonal curve of
Figure 8 is 4π/6 radians, so we will say that the curvature at this point is ∞ ∗ 4π/6. We
will call this impulse curvature, and the notation will simply remind us that when
we integrate across such a value, the result will be 4π/6. For example, for any interval
[a, b] containing the undefined point for the function in Figure 13, we would have
(5) $\int_a^b \dfrac{d\rho}{ds}\, ds = \dfrac{4\pi}{6}.$
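
One way to make the notation ∞ ∗ 4π/6 feel less mysterious is to watch a smooth corner sharpen. The sketch below assumes a hypothetical family of smoothed corners whose curvature is a cosine-shaped bump of width w (this particular shape is our choice, not the curve of Figure 8). As w shrinks, the peak curvature grows without bound while the integral of the curvature stays at 4π/6.

```python
import math

THETA = 4 * math.pi / 6   # total curvature of the corner, as in the text

def kappa(s, w):
    """Curvature of a hypothetical smoothed corner whose turning is spread over [-w/2, w/2]."""
    if abs(s) >= w / 2:
        return 0.0        # straight on either side of the bend
    # A cosine bump whose integral over [-w/2, w/2] equals THETA.
    return (THETA / w) * (1 + math.cos(2 * math.pi * s / w))

def integrate(f, a, b, n=20000):
    """Midpoint-rule approximation of the integral of f over [a, b]."""
    h = (b - a) / n
    return h * sum(f(a + (i + 0.5) * h) for i in range(n))

def total_turning(w):
    """Integral of the curvature across an interval containing the whole bend."""
    return integrate(lambda s: kappa(s, w), -1.0, 1.0)

for w in (1.0, 0.1, 0.01):
    print("width", w, "peak curvature", kappa(0.0, w), "integral", total_turning(w))
print("4*pi/6 =", THETA)
```

In the limit of zero width, the only information left at the vertex is the value of the integral, and that is what the impulse curvature records.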

5.1. Exercises.
12. Consider a square with sides of length 1. Choose the midpoint of one of
the sides as a starting point and consider an object moving around the square in
a counter-clockwise direction. Let κ(s) be the curvature function with respect to
arclength from this starting point. Compute each of the following: κ(0), κ(1/2),
$\int_0^1 \kappa(s)\, ds$, and $\int_0^4 \kappa(s)\, ds$.

13. Consider the graph y = | sin(x)|. What is the curvature at the point (0, 0)?

Solid Angles and Gauss Curvature

1. Total curvature for cone points

The goal here is to generalize our notions of curvature to surfaces. This can
be done in a number of ways, but our intention will be to eventually end up with
an intuitive understanding of the Gauss curvature. In the previous chapter, the
curvature of a curve was obtained by extending the notion of the turning angle
for the vertex of a polygonal curve. This suggests, perhaps, that we first consider
the vertex of a polyhedral surface. If we imagine the vertex of a pyramid, the
3-dimensional region interior to that vertex can be compared to the 2-dimensional
region inside an angle. This solid angle parallels the notion of a (plane) angle in a
number of ways.
It was the turning angle, however, that became the total curvature. One way
of measuring this was to consider the length of the arc on the unit circle that all
of the possible unit normal vectors swept out at the vertex in terms of the Gauss
map. These normals were perpendicular to the tangent lines through the vertex
that were outside of the angle. For the vertex of the pyramid, we could consider
all tangent planes through the vertex outside of the pyramid and the unit vectors
normal to these planes. Under the Gauss map (now to the unit sphere instead of
the unit circle), the heads of these normals would sweep out a region on the unit
sphere. The area of this region would be a natural candidate for the total curvature
(and also the impulse curvature) at this vertex. This works amazingly well, but it
is a bit simpler to look at a cone.
We can make a cone out of a piece of paper by removing a wedge and taping the
edges together. Let us suppose that we remove a wedge with angle θ as in Figure
1. A circle of radius R is shown in Figure 1, and θ radians have been removed from
the circle as well (an arc of length Rθ). After joining the edges, we get something
like the cone in Figure 2. The circle, as a circle on the cone, still has radius R, and
the radius is measured to the vertex of the cone. As a curve in space, it also has a
smaller radius r. The angle between the radius of length r and the surface of the
cone is marked φ, and the angle between the central axis and the normals to the
surface of the cone is also φ. The angle between the central axis and the surface of
the cone is marked ψ.
We are interested in computing the area of the region on the unit sphere cor-
responding to the normal vectors at the vertex. The normal vectors to the surface
of the cone determine a circle on the unit sphere under the Gauss map as shown in
Figure 2. This circle separates the sphere into two pieces, and we are interested in
the area of the upper one.
To find φ, consider the circle of radius R (shown in Figure 1). After removing
the wedge and joining the edges, this circle becomes a circle on the cone. It has



Figure 1. Remove a θ-wedge to construct a cone.



Figure 2. A cone with total curvature θ.

radius R along the surface, and it has radius r in ℝ³. We can, therefore, compute its
circumference two ways. We have C = 2πr, as a circle in space, and C = (2π − θ)R,
as a circle on the cone having been formed by removing a θ radian wedge. The
right triangle shown in Figure 2 with hypotenuse R and base r has an angle φ, so

(6) $\cos\phi = \dfrac{r}{R} = \dfrac{2\pi - \theta}{2\pi}.$

Equation (6) determines φ in terms of the angle θ. We can determine the area swept
out by the normals at the vertex under the Gauss map in terms of φ easily using
φ̄, θ̄, and ρ̄ as spherical coordinates for the sphere. Here, 0 ≤ φ̄ ≤ φ, 0 ≤ θ̄ ≤ 2π,

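and ρ̄ = 1. The desired area is then

$\int_0^{2\pi}\!\int_0^{\phi} \bar\rho^{2} \sin\bar\phi \, d\bar\phi\, d\bar\theta = \int_0^{2\pi}\!\int_0^{\phi} \sin\bar\phi \, d\bar\phi\, d\bar\theta = \int_0^{2\pi} (1 - \cos\phi)\, d\bar\theta = 2\pi(1 - \cos\phi).$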
The total curvature at the vertex is therefore 2π(1 − cos φ). Quite remarkable is
the fact that this total curvature is precisely θ, the measure of the wedge removed
to form the cone. This can be seen by solving equation (6) for θ.
Definition 1. The impulse curvature, or total curvature, at the vertex of a
cone is the area swept out by the unit normal vectors at the vertex under the Gauss
map.
Theorem 3. The total curvature at the vertex of a cone is equal to the angle
of the wedge removed to construct it.
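
Theorem 3 can be checked numerically without solving equation (6) by hand. The sketch below, with function names of our own, takes a wedge angle θ, finds φ from cos φ = (2π − θ)/2π, and then measures the area of the corresponding cap on the unit sphere by integrating sin φ̄ directly.

```python
import math

def cap_area(phi, n=10000):
    """Area of the region of the unit sphere with polar angle at most phi (midpoint rule)."""
    h = phi / n
    return 2 * math.pi * h * sum(math.sin((i + 0.5) * h) for i in range(n))

def cone_total_curvature(theta):
    """Area swept by the unit normals at the vertex of a cone made by removing a theta-wedge."""
    cos_phi = (2 * math.pi - theta) / (2 * math.pi)   # equation (6)
    return cap_area(math.acos(cos_phi))

for theta in (math.pi / 6, math.pi / 2, math.pi, 3 * math.pi / 2):
    print("wedge angle", theta, "area under the Gauss map", cone_total_curvature(theta))
```

In each case the area agrees with the wedge angle up to the small error of the numerical integration.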
1.1. Exercises.
14. Show that the results are the same if we used a pyramid instead of a
cone. Note that there are only four different unit normals obtained from the lateral
surfaces of the pyramid. The rest of the boundary of the region on the unit sphere
comes from those tangent planes that contain an edge leading into the vertex. The
normals would be those unit vectors perpendicular to the edge between the normals
for the faces. The normal vectors at the vertex are normal to planes through the
vertex that lie outside of the pyramid.
15. What is the total curvature of any region of the cone not containing the
vertex? (Note: a curve on the surface of the unit sphere has no area.)

2. Total curvature for smooth surfaces

If v is the vertex of a cone, then all of the area on the unit sphere under the
Gauss map comes from unit normal vectors at the vertex. If we were to smooth the
vertex, as in Figure 3, then these unit normals will be spread out over the smooth
surface, and there will only be one unit normal at each point of the surface, but
the area under the Gauss map would be the same, since we have precisely the same
collection of unit normals. Therefore, smoothing the vertex should not change the
total curvature, and the geometry of the surface near the circle shown is exactly the
same as the geometry on the cone. With this notion of total curvature for surfaces
described intuitively, we can define an instantaneous curvature for smooth surfaces,
generally known as the Gauss curvature, as we did with smooth curves.
2.1. Exercises.
16. What is the total curvature of a sphere? A cube? A tetrahedron?
17. It is possible to find a triangle on the unit sphere that has one vertex at the
north pole, two vertices on the equator, and three right angles. What is the area
of this triangle? What is the total curvature of the region inside of this triangle?
Find a triangle with two right angles and one angle measuring π/4 radians. What
is the total curvature of the region inside of it? Do you see a relationship between
the total curvature within and the angle sums of these two triangles?

Figure 3. We have the same total curvature as the cone in Figure

2, θ, if we smooth the vertex of the cone.

3. Gauss curvature and impulse curvature

The total curvature of a curve was defined as the length of an arc of the unit
circle under the Gauss map. This was an extension of the idea of a turning angle
to curves. The measure of the turning angle, as interpreted through the Gauss
map, can be applied to the vertex of a cone by considering the area of a region of the
unit sphere under the (spherical) Gauss map. This idea extends to smooth surfaces
in the same way as the turning angle extends to smooth curves. We obtained an
instantaneous curvature for curves by taking a limit comparing the length along
the unit circle with the corresponding length along the curve. We can do the same
thing here by comparing the area on the unit sphere with the corresponding area
on the surface. This is the notion of curvature of surfaces used by Gauss, and it is
called the Gauss curvature.
Definition 2. At a point p on a surface S, the Gauss curvature at p is the limit
(8) $K = \lim_{\Delta A \to 0} \dfrac{\Delta\Theta}{\Delta A},$

where ∆A is the area of some region on the surface containing p and ∆Θ is the
total curvature of that region.
If we think of the measure of an angle in terms of the possible directions of the
unit normal vector at a vertex (the turning angle), and then extend this into the
curvature of the curve, then this is the most natural notion of curvature for surfaces,
since it is a direct translation of the relevant notions in terms of curves to surfaces.
For computational purposes, this is not the most convenient formula, but this
is probably one of the more intuitive ways to think about what Gauss curvature is.
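
To see the limit in Definition 2 at work, here is a small sketch on a surface of our own choosing, the paraboloid z = (x² + y²)/2, near its lowest point. For the part of the surface lying over a disc of radius ε, both the surface area and the area covered by the unit normals on the unit sphere have closed forms, so the ratio ∆Θ/∆A can be evaluated directly. The function name is hypothetical, and the closed forms in the comments are standard calculus results rather than anything taken from the text.

```python
import math

def area_ratio(eps):
    """Ratio (area under the Gauss map) / (surface area) for z = (x^2 + y^2)/2
    over the disc x^2 + y^2 <= eps^2."""
    # Surface area: the integral of sqrt(1 + r^2) * r dr dtheta over the disc.
    surface_area = (2 * math.pi / 3) * ((1 + eps**2) ** 1.5 - 1)
    # The upward unit normals (-x, -y, 1) / sqrt(1 + r^2) fill a cap on the unit
    # sphere whose polar angle alpha satisfies tan(alpha) = eps.
    gauss_map_area = 2 * math.pi * (1 - 1 / math.sqrt(1 + eps**2))
    return gauss_map_area / surface_area

for eps in (1.0, 0.1, 0.01):
    print("eps =", eps, "ratio =", area_ratio(eps))
```

As the region shrinks to the point, the ratio approaches 1, which is the Gauss curvature of this paraboloid at its lowest point.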

3.1. Exercises.

18. Find the total curvature of a sphere with radius r. What is the Gauss

4. Gauss-Bonnet Theorem (Exact excerpt from Creative Visualization handout)

I do not address the Gauss-Bonnet theorem in any of the labs, but after the
students have completed the last lab, I would look at the cone point version of the
Gauss-Bonnet theorem. From here, the definition for Gauss curvature on a smooth
surface should make sense intuitively.


Figure 4. The angle defect corresponds to total curvature.

The basic idea can be seen using circles and spheres. Consider a circle of radius
r centered at the cone point of a cone with angle defect θ, as in Figure 4. In the
plane, this circle will have curvature κ = 1/r. Since the local geometry on the cone
is Euclidean away from the cone point, the geodesic curvature for this circle as a
curve on the cone must be the same. That is, κg = 1/r. What is different about this
circle and a circle in the plane with the same radius, is that the circle on the cone
has a smaller circumference. In fact, the difference must be θr.
We can now compute the total geodesic curvature.
(9) $\oint_C \kappa_g\, ds = \oint_C \dfrac{1}{r}\, ds = \dfrac{1}{r}(2\pi r - \theta r) = 2\pi - \theta.$
Since curvature measures the rate of rotation of the tangent vector, it should make
sense to students that the total rotation for a simple closed curve in the plane must
always be 2π. Since any small deformation of the circle essentially takes place in
the plane, it should also make sense that the total rotation for a simple closed curve
around the cone point will always be 2π minus the angle defect. In any case, the
formulation of the Gauss-Bonnet theorem should seem natural.
Comparing Equation (9) to the Gauss-Bonnet theorem,
(10) $\oint_C \kappa_g\, ds = 2\pi - \iint_D K\, dA,$
it’s obvious that the angle defect corresponds with the total curvature $\iint_D K\, dA$. In
fact, I think it makes perfect sense to motivate the definition of the Gauss curvature
K in terms of this formula. I might start out by doing the following.
Consider a sphere tangent to a cone, as shown in Figure 5. The geodesic
curvature for the circle of tangency will be the same on both surfaces. Therefore,
the total curvature for the regions contained by the circle on both surfaces should
be the same. We can then require that the Gauss curvature be an infinitesimal


Figure 5. The circle of tangency will have the same geodesic cur-
vature on both surfaces.

version of the total curvature and that it be constant on the sphere. That is,
(11) $\theta = \iint_D K\, dA = K \iint_D dA = K R^2 \theta,$
(12) $K = \dfrac{1}{R^2}.$
I think the actual computation is a bit tricky, but there may be a simpler way. In
any case, the area integral is
(13) $\iint_D dA = \int_0^{2\pi}\!\int_0^{\phi} R^2 \sin p \, dp\, dt = R^2 (1 - \cos\phi)\, 2\pi,$
where the parameters p and t are the phi and theta from spherical coordinates. To
express this expression in terms of θ, note that the circumference of the circle C is
2πr−θr on the cone. If the radius of this circle in space is ρ, then this circumference
is also 2πρ. Since R sin φ = ρ, we have that
(14) $2\pi r - \theta r = 2\pi R \sin\phi,$
(15) $\theta = 2\pi\left(1 - \dfrac{R}{r}\sin\phi\right).$
Now, tan φ = r/R, so
(16) $\theta = 2\pi\left(1 - \dfrac{\cos\phi}{\sin\phi}\sin\phi\right) = 2\pi(1 - \cos\phi).$
Equations (13) and (16) establish equation (11).
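
The chain of equations can be sanity-checked with numbers. The sketch below assumes the reading of Figure 5 in which the slant distance r from the cone point to the circle of tangency and the sphere radius R are the legs of a right triangle, so that tan φ = r/R; the function name is our own. It computes the angle defect from equation (14), the cap area from equation (13), and divides.

```python
import math

def gauss_curvature_from_tangent_cone(R, r):
    """Angle defect of the tangent cone divided by the area of the sphere cap inside
    the circle of tangency. Assumes tan(phi) = r / R, as read from Figure 5."""
    phi = math.atan2(r, R)
    theta = 2 * math.pi * (1 - (R / r) * math.sin(phi))    # solve equation (14) for theta
    cap_area = 2 * math.pi * R**2 * (1 - math.cos(phi))    # equation (13)
    return theta / cap_area

for R, r in ((1.0, 1.0), (2.0, 1.5), (5.0, 0.3)):
    print("R =", R, "r =", r, "K =", gauss_curvature_from_tangent_cone(R, r), "1/R^2 =", 1 / R**2)
```

The quotient depends only on R, not on where the cone touches the sphere, which is what requiring K to be constant on the sphere amounts to.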

5. Defining Gauss curvature

The Gauss curvature at a point on a surface is generally defined to be the
product of the two principal curvatures. Very roughly, this can be described as
follows. At a point on a surface in space, we can choose one of two possible unit
vectors normal to the surface (one normal is as good as the other). For every plane
that contains the point and the normal vector, the intersection of the plane and
the surface is a curve that has a curvature within that plane. If the curve bends
towards the normal vector, we will associate a positive sign with this curvature,
and if the curve bends away from the normal vector, we will associate a negative
sign. In other words, if the normal vector chosen points upwards and the curve
is concave up, then the curvature will be positive. These signed curvatures are
called normal curvatures. The maximum normal curvature (most positive) and
the minimum normal curvature (most negative) are the principal curvatures. The
choice of normal vector, and hence of which curvatures are positive, is quite arbitrary,
but of significance is that the Gauss curvature of a bowl-shaped surface will always
be positive, and the Gauss curvature of a saddle-shaped surface will always be
negative, regardless of how the choices were made.
That this is as simple a definition for the curvature of a surface as could be
expected is one thing, and that it works incredibly well is made very clear in any
book on the subject. What is not so clear is why anyone would consider the
definition in the first place and what it really represents. What we will do here is
to show that this definition is a rigorous implementation of the definition we have
already described and how the previous definition leads to this one.
A lot of insight into what Gauss curvature is can be obtained by examining
the connection between the intuitive definition given earlier and the one involving
the principal curvatures. We will start with the intuitive definition of the Gauss
curvature at a point. This was expressed in Equation (8). The biggest problem
with this formula is that it does not say how ∆A goes to zero. Different values for
K can be obtained if there are no restrictions. We will want to choose the most
boring limit possible. For our purposes, it suffices to take a small sphere in
space centered at the point P. Each point on the surface contained in the sphere
(this region has area ∆A) has a normal vector, and thus an image under the Gauss
map. These Gauss map images will determine a region on the unit sphere having a
well-defined area (if the surface is sufficiently smooth, which we will always assume
is the case), and this area is ∆Θ. We can then take the limit as the radius of
the sphere about P goes to zero. If the surface is sufficiently smooth, this limit
should exist, and we will assume that all surfaces under consideration are sufficiently
smooth, unless otherwise noted.
As it stands, this definition is non-trivial to apply directly, so we will formulate
an alternative in terms of derivatives. For one of the small regions on the surface
about P contained in the small sphere, the region should be roughly disk shaped,
and we can imagine it as consisting of a bouquet of radial arcs. The normal vector
at P will determine one point on the unit sphere under the Gauss map. The normal
vectors from the points on the radial arcs will determine arcs on the unit sphere
also under the Gauss map. Of relevance is the fact that the length of the arc on
the unit sphere under the Gauss map divided by the length of the radial arc on the
surface will limit on the curvature of the radial arc at P . Also of relevance is the
observation that the area ∆Θ is determined by the extent of these arcs. It would
seem reasonable to assume, therefore, that the limit of Equation (8) will depend
only on the curvatures of arcs through P . The one important assumption that we
will make in the formulation of the alternative definition of the Gauss curvature is
that it depends only on information provided by first and second derivatives.
Suppose we have a point P on a surface in space, and we will define the Gauss
curvature of the surface at P . The curvature is independent of the surface’s position
and orientation in space, so we will assume that the point P is at the origin and the
surface is tangent to the xy-plane. In a region about P , we will assume that the
surface can be described as the graph of a function f (x, y), and since the curvature
depends only on first and second derivatives, we will only consider surfaces that
ensure that f has continuous first and second derivatives (i.e., f is $C^2$). Since we
will only use information from the first and second derivatives at P, we can also
assume that f is quadratic, $f(0, 0) = 0$, $f_x(0, 0) = 0$, and $f_y(0, 0) = 0$. Therefore,
f must take the form
(17) $f(x, y) = ax^2 + bxy + cy^2.$
It will be convenient to use vector notation and terminology, so we will work with
the parametrization
(18) $\mathbf{x}(x, y) = [\, x,\ y,\ ax^2 + bxy + cy^2 \,].$

The first (partial) derivatives, $\frac{d\mathbf{x}}{dx} = [\, 1, 0, 2ax + by \,]$ and $\frac{d\mathbf{x}}{dy} = [\, 0, 1, bx + 2cy \,]$, are
vectors tangent to the surface, and at each point, these two vectors span a plane
tangent to the surface at that point. All vectors tangent to the surface at this point
will lie in this plane. That $\frac{d\mathbf{x}}{dx}(0, 0) = [\, 1, 0, 0 \,]$ and $\frac{d\mathbf{x}}{dy}(0, 0) = [\, 0, 1, 0 \,]$ reiterates the
fact that the surface is tangent to the xy-plane.
The unit normal vector at each point of the surface must, essentially by def-
inition, be perpendicular to the tangent plane. It must be perpendicular to both
tangent vectors, and so can be obtained from the cross product.
(19) $\mathbf{n} = \frac{[\, -2ax - by,\ -bx - 2cy,\ 1 \,]}{\sqrt{b^2x^2 + 4bcxy + 4c^2y^2 + 4a^2x^2 + 4abxy + b^2y^2 + 1}}$

We are interested in how much the unit normal vector varies over a small piece of
the surface about the origin, and then how it shrinks to zero. The unit vector n
ranges over a region on the unit sphere, and what we want is essentially a derivative
of n over two dimensions. The appropriate object is a linear function associated
with a tangent plane, in particular the plane determined by the partial derivatives
of n. These partial derivatives are a bit messy, but we only need to know them at
(0, 0). The partial with respect to x is
(20) $\frac{d\mathbf{n}}{dx} = \frac{\sqrt{b^2x^2 + 4bcxy + 4c^2y^2 + 4a^2x^2 + 4abxy + b^2y^2 + 1}\ [\, -2a,\ -b,\ 0 \,]}{\left( \sqrt{b^2x^2 + 4bcxy + 4c^2y^2 + 4a^2x^2 + 4abxy + b^2y^2 + 1} \right)^2} - \frac{[\, -2ax - by,\ -bx - 2cy,\ 1 \,]\ (\tfrac{1}{2})(2b^2x + 4bcy + 8a^2x + 4aby)}{\left( \sqrt{b^2x^2 + 4bcxy + 4c^2y^2 + 4a^2x^2 + 4abxy + b^2y^2 + 1} \right)^3},$
which at (0, 0) is
(21) $\frac{d\mathbf{n}}{dx}(0, 0) = [\, -2a,\ -b,\ 0 \,].$
(22) $\frac{d\mathbf{n}}{dy}(0, 0) = [\, -b,\ -2c,\ 0 \,].$
The partial derivatives $\frac{d\mathbf{n}}{dx}$ and $\frac{d\mathbf{n}}{dy}$ describe the linear approximation to how the unit
vector n varies near the origin. For a short distance ε in the x-direction, therefore,
the unit normal vector moves approximately by ε[ −2a, −b, 0 ], and for
the same distance in the y-direction, it moves approximately by ε[ −b, −2c, 0 ]. This
completely determines the linear approximation, so an ε-square on the xy-plane
corresponds to a “parallelogram” on the unit sphere under the Gauss map spanned
by these vectors. The area of this parallelogram, relative to the area of the ε-square,
is given by the cross product
$\begin{vmatrix} \mathbf{i} & \mathbf{j} & \mathbf{k} \\ -2a & -b & 0 \\ -b & -2c & 0 \end{vmatrix} = \begin{vmatrix} -2a & -b \\ -b & -2c \end{vmatrix} \mathbf{k}.$
This determinant describes how areas under the Gauss map compare to areas in
the domain near (0, 0), and so this should define the Gauss curvature.
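
The computation above can be checked numerically. The sketch below, under the assumption of some illustrative coefficients a, b, c, estimates the two partial derivatives of the unit normal (19) at the origin by central differences and takes their cross product. The function names are hypothetical.

```python
import math

def unit_normal(a, b, c, x, y):
    """Unit normal (19) to z = a x^2 + b x y + c y^2 at (x, y)."""
    nx, ny, nz = -2*a*x - b*y, -b*x - 2*c*y, 1.0
    length = math.sqrt(nx*nx + ny*ny + nz*nz)
    return (nx / length, ny / length, nz / length)

def cross(u, v):
    """Cross product of two 3-vectors."""
    return (u[1]*v[2] - u[2]*v[1],
            u[2]*v[0] - u[0]*v[2],
            u[0]*v[1] - u[1]*v[0])

def gauss_map_cross_product(a, b, c, eps=1e-5):
    """Central-difference estimate of dn/dx x dn/dy at the origin."""
    n_xp = unit_normal(a, b, c, eps, 0.0)
    n_xm = unit_normal(a, b, c, -eps, 0.0)
    n_yp = unit_normal(a, b, c, 0.0, eps)
    n_ym = unit_normal(a, b, c, 0.0, -eps)
    dn_dx = tuple((p - m) / (2 * eps) for p, m in zip(n_xp, n_xm))
    dn_dy = tuple((p - m) / (2 * eps) for p, m in zip(n_yp, n_ym))
    return cross(dn_dx, dn_dy)

a, b, c = 1.0, 0.5, -2.0            # illustrative coefficients only
ratio = gauss_map_cross_product(a, b, c)
print(ratio[2], 4*a*c - b*b)        # k-component vs. the 2x2 determinant
```

The k-component of the cross product should match the determinant 4ac − b², and the other two components should be essentially zero.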
Note that the matrix
$\begin{bmatrix} -2a & -b \\ -b & -2c \end{bmatrix}$
completely describes the linear approximation to the normal vector at (0, 0). As a
point passes through the origin, this matrix describes how the corresponding normal
vector is changing at the origin. For example, if we move in the direction of [ 1, 0 ]
from the origin, then the direction of the unit normal to the surface is changing at
the following (vector) rate.
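(25) $\begin{bmatrix} -2a & -b \\ -b & -2c \end{bmatrix} \begin{bmatrix} 1 \\ 0 \end{bmatrix} = \begin{bmatrix} -2a \\ -b \end{bmatrix}$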
This is almost, but not quite, a curvature. Specifically, if we considered the curve
on the surface above the x-axis, we would have a parabola, $z = ax^2$, and this
curve corresponds to the direction determined by the vector [ 1, 0 ]. The curvature
for this curve is 2a at the origin, but this comes from the rotation of the normal to
the curve in the xz-plane. The normal to the surface may rotate in the y-direction
as well, as indicated by the component −b.
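
To connect this matrix with the principal curvatures, here is a small sketch that samples the normal curvature of z = ax² + bxy + cy² at the origin over all directions, taking [ 0, 0, 1 ] as the chosen unit normal. The coefficients and function names are illustrative assumptions. In direction (cos t, sin t) the section is the parabola z = s²(a cos²t + b cos t sin t + c sin²t), so its signed curvature at s = 0 is twice the coefficient of s².

```python
import math

def normal_curvature(a, b, c, t):
    """Signed curvature at the origin of the section of
    z = a x^2 + b x y + c y^2 in direction (cos t, sin t)."""
    ct, st = math.cos(t), math.sin(t)
    return 2.0 * (a*ct*ct + b*ct*st + c*st*st)

def principal_curvatures(a, b, c, steps=20000):
    """Largest and smallest normal curvature over sampled directions."""
    values = [normal_curvature(a, b, c, math.pi * k / steps)
              for k in range(steps)]
    return max(values), min(values)

a, b, c = 1.0, 0.5, -2.0                  # illustrative coefficients only
k_max, k_min = principal_curvatures(a, b, c)
print(normal_curvature(a, b, c, 0.0))     # direction [ 1, 0 ]: the value 2a
print(k_max * k_min, 4*a*c - b*b)         # product vs. the determinant
```

For these coefficients, the product of the sampled extreme curvatures agrees with the determinant of the matrix above up to sampling error.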
5.1. Exercises.
19. Determine the magnitude of the tangent vector $\frac{d\mathbf{x}}{dx}(x, 0)$, and then differentiate
with respect to x to verify the claim made above.
20. Verify the claim above that the curve above or below the x-axis has
curvature 2a at the origin by computing $\frac{d\mathbf{T}}{ds}$, where
$\mathbf{T} = \frac{\frac{d\mathbf{x}}{dx}(x, 0)}{\left\| \frac{d\mathbf{x}}{dx}(x, 0) \right\|}$.

6. Intrinsic aspects of the Gauss curvature

In the discussion leading to the definition of the Gauss curvature, we stumbled
across a surprising relationship. If we remove a θ-wedge to form a cone, then the
total curvature of the vertex of that cone is also θ. Imagine that you are a 2-
dimensional person living on the surface of the cone, who is completely unaware of
a third dimension. Without a concept of a third dimension, the sharpness of the
vertex would be completely outside of your experiences, just as concepts requiring a
fourth dimension lie outside of our 3-dimensional minds. You would perhaps notice
that circles around the vertex have smaller circumferences than circles that did not
contain the vertex. From this you might be able to see that the vertex of the cone
only had radian measure 2π − θ around it, while there are 2π radians around every
other point. We will say that the sharpness of the cone is extrinsic (seen from the
outside), and the fact that a θ-wedge is missing is intrinsic (seen from the inside).
This illustrates an interesting difference between curves and surfaces. The total
curvature of a curve is purely extrinsic, since a 1-dimensional person living in a curve

would be totally unaware of it. The total curvature of a surface is both extrinsic
and intrinsic. It is equally measurable from outside the surface and from within it.
This idea, which originates with Descartes and Gauss, was exploited by Riemann and
others, in particular Einstein, to show that while curvature is basically an extrinsic
concept, it is possible to talk about the curvature of our space without there being
more dimensions.
We can illustrate some aspects of this by looking at the geometry of geodesics
on a cone.

Intrinsic Curvature

1. Parallel vectors
Understanding what it means for two vectors in the plane to be parallel is
hardly an issue. It is even difficult to explain the concept, since the concept of
parallel vectors seems so obvious. Imagine taking a vector in the plane based at the
origin. If you were to move it to some other point without altering its direction,
then few would argue with the claim that the result is a vector parallel to the
original. At issue, however, is what it means for the direction to remain the same
and how you would know. For vectors tangent to a sphere, on the other hand,
it is impossible in most cases to move the vector in a way that keeps the vector
tangent to the sphere and not change its direction. Here the concept of direction is
taken from the direction of a vector in Euclidean 3-space, which most of us would
think is intuitively clear. If we were to restrict our attention to the surface of the
sphere, and make no reference to an ambient space, the issue is much less clear. If
we were two-dimensional creatures living on the sphere with no awareness of an
ambient space, we probably would have some notion of moving an object without
rotating it. This notion cannot be consistent with the notion of direction in 3-space
mentioned above. One possible basis for such a notion is the concept of parallel
transport.
Consider the three vectors shown in Figure 1. The angle between each vector
and the straight line is the same angle θ. This is consistent with our intuitive
notion that all three vectors are parallel. We can phrase this as a trivial axiom: If
we move a vector along a straight line and keep the angle between the vector and
line constant, then the resulting vector is parallel to the original.

Figure 1. Moving a vector without changing its direction.


If you were a 2-dimensional creature on the sphere, then a great circle would
be the object for you corresponding to a straight line. This would be a curve that
turns neither to the right nor left. In other words, it does not change direction
(as far as you are concerned). If you were to move a vector along a great circle
at a constant angle, then you must conclude that the vector did not rotate, and
the resultant vector is therefore parallel to the original. If this sphere sat in a
3-dimensional Euclidean space (and there is no real reason to assume that it did),
then a 3-dimensional Euclidean creature would see this differently. One of the
fundamental notions of the study of manifolds is that the 3-dimensional Euclidean
view is not necessarily the correct one. It is simply one of many.
One very important aspect of this notion of parallel vectors on the sphere is
that it is dependent on path. We can see this in the following example. Figure
2 shows parts of four great circles. One is the equator, two meet the equator at
right angles, and a fourth intersects one of the vertical great circles at a right angle.
This forms a quadrilateral with three right angles. We know that the angle sum
of this quadrilateral must be greater than 2π, so the fourth angle must be larger
than a right angle. In fact, if this sphere has radius 1, then the difference between
this angle and a right angle must be equal to the area of the quadrilateral. The
quadrilateral is shown in a flattened version to give a different view in Figure 3.
We will perform a parallel transport from vertex A to vertex C two ways. First
from A to B to C, and then from A to D to C. Suppose the vector under question
is tangent to side AD at A and points towards D. It is perpendicular to side AB,
so as we parallel transport it to B, it maintains this right angle. This results in a
vector tangent to BC at B. Parallel transport along BC entails maintaining a zero
angle, and so the resultant vector at vertex C will still be tangent to BC. On the
other hand, if we parallel transport to vertex D first, we get a vector tangent to AD
at vertex D. This is perpendicular to side DC, so this right angle is maintained as
we slide it upwards to vertex C. The result is a vector at C that is perpendicular
to DC. We have, therefore, two vectors at C that have equal claim to being
parallel to the original vector at A. With a Euclidean bias, we might conclude that
this contradiction proves that parallel transport is a flawed concept. From a more
enlightened manifold view, however, we would just say that parallel transport is
independent of path in Euclidean spaces.

Figure 2. A quadrilateral on the sphere.




Figure 3. The quadrilateral flattened.

Even from a fundamentalist Euclidean point of view, the notion of parallel
transport has some value. A few simple calculations show that the angle marked θ
in Figure 3 is equal to the difference between the angle sum of the quadrilateral and
2π. This is equal to the total curvature contained within the quadrilateral. This
provides a way of computing total curvature, and is basically equivalent to using
the angle sum, the turning angles, or the total rotation of a tangent vector.
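
The claim that the excess of the fourth angle over a right angle equals the area can be tested numerically on the unit sphere. The sketch below assumes a particular placement of the quadrilateral (A on the equator at longitude 0, B above A at latitude beta, D on the equator at longitude lam) and uses hypothetical function names. The side BC is the great circle through B that meets the meridian AB at a right angle, on which tan(latitude) = tan(beta) cos(longitude).

```python
import math

def fourth_angle(beta, lam):
    """Interior angle at C of the quadrilateral ABCD on the unit sphere."""
    lat_c = math.atan(math.tan(beta) * math.cos(lam))
    C = (math.cos(lat_c) * math.cos(lam),
         math.cos(lat_c) * math.sin(lam),
         math.sin(lat_c))
    B = (math.cos(beta), 0.0, math.sin(beta))
    # Unit tangent at C pointing down the meridian toward D.
    to_d = (math.sin(lat_c) * math.cos(lam),
            math.sin(lat_c) * math.sin(lam),
            -math.cos(lat_c))
    # Tangent at C pointing along the great circle toward B.
    dot_bc = sum(p * q for p, q in zip(B, C))
    to_b = [p - dot_bc * q for p, q in zip(B, C)]
    norm = math.sqrt(sum(v * v for v in to_b))
    cos_angle = sum(p * q for p, q in zip(to_d, to_b)) / norm
    return math.acos(cos_angle)

def quad_area(beta, lam, steps=20000):
    """Midpoint-rule area of the region between the equator and side BC."""
    dl = lam / steps
    total = 0.0
    for k in range(steps):
        lon = (k + 0.5) * dl
        lat = math.atan(math.tan(beta) * math.cos(lon))
        total += math.sin(lat) * dl    # integral of cos(latitude) d(latitude)
    return total

beta, lam = math.pi / 4, math.pi / 3   # illustrative sizes only
excess = fourth_angle(beta, lam) - math.pi / 2
print(excess, quad_area(beta, lam))
```

The two printed values should agree, which is the angle θ between the two parallel transported vectors at C.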
We are headed towards a way of computing (actually defining) curvature using
derivatives both extrinsically and intrinsically. We will be imposing coordinate
systems on surfaces, which, very roughly, means imposing a grid system (à la graph
paper) on the surface. In other words, we will be breaking the surface into tiny
quadrilaterals, and the two parallel transported vectors just mentioned will have
natural interpretations corresponding to second and third derivatives.


1. Introduction
The point of this book is to gain some understanding of the geometry of space,
in particular, a space that we could live in. With that in mind, we will want
to assume that the functions we use to describe these spaces are basically well-
behaved. This chapter will explain what we mean by that. In order to gain an
understanding of the curvature of three-dimensional space, we will first explore the
curvature of lower-dimensional spaces. This approach should seem reasonable in
that the lower-dimensional spaces are simpler, but that they also require concepts
that generalize to the three-dimensional case. A less obvious class of relevant spaces
will also provide us with significant insight into the geometry of all the spaces just
mentioned. These are spaces with isolated singularities. These singularities will
include sharp corners in graphs and cone points on surfaces. The functions we
consider, therefore, will have these kinds of characteristics.
One basic principle that we will try to exploit is that linear functions are
easier to understand than functions in general, and that straight lines are easier to
understand than general curves. Furthermore, finite and discrete objects are easier
to understand than are infinite and continuous ones. The general aim of this book
is to use what we know about the easier-to-understand objects to gain some insight
into the harder-to-understand ones. This chapter takes this approach to the study of
functions.
The graph of a function of two variables is shown in Figure 1. This graph was
produced by a software package called Maple, and we can simplistically describe the
process that Maple used as follows. A 25 × 25 grid is imposed on the [ −0.1, 0.1 ] ×
[ −0.1, 0.1 ] portion of the domain, and function values are computed at each of
the lattice points. This provides Maple with the coordinates for 676 points on the
surface (A 25 × 25 array of squares makes a 26 × 26 array of lattice points). For
each line segment in the grid, Maple draws a line segment between the appropriate
points on the surface, essentially projecting the grid lines onto the surface.
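
The process just described can be imitated in a few lines. This is only a sketch of the idea, not Maple's actual implementation, and the function f below is a hypothetical stand-in, since the function plotted in Figure 1 is not specified here.

```python
def lattice_points(f, lo=-0.1, hi=0.1, squares=25):
    """Points (x, y, f(x, y)) over a squares-by-squares grid on [lo, hi]^2."""
    step = (hi - lo) / squares
    coords = [lo + i * step for i in range(squares + 1)]
    return [[(x, y, f(x, y)) for x in coords] for y in coords]

def grid_segments(points):
    """Pairs of neighbouring lattice points joined by a straight segment."""
    segments = []
    rows, cols = len(points), len(points[0])
    for j in range(rows):
        for i in range(cols):
            if i + 1 < cols:
                segments.append((points[j][i], points[j][i + 1]))
            if j + 1 < rows:
                segments.append((points[j][i], points[j + 1][i]))
    return segments

def f(x, y):
    """Hypothetical stand-in for the plotted function."""
    return x * x - y * y

points = lattice_points(f)
segments = grid_segments(points)
print(sum(len(row) for row in points))   # 676 lattice points
print(len(segments))                     # 1300 grid segments
```

The 26 × 26 array of lattice points gives 676 points, and the 26 rows and 26 columns of 25 segments each give 1300 line segments to draw.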
Looking at the surface in Figure 1, it appears that the Maple graph is a good
representation of the surface. The fact that we are looking at a set of straight line
segments is not overly obtrusive, and it is easy to accept that this is a representation
of a nicely curving surface. It is not inconceivable, therefore, that the line segments
themselves contain useful information about the underlying surface.

2. Piecewise-Linear Approximations for Functions of One Variable

We are not interested in studying functions in general, but in how we can use
functions to describe and understand geometry. We will therefore take a very
constrained view of piecewise-linear approximations. While we will use functions
that are defined on the entire real line, at any one time, we will generally only be
interested in that function over some closed interval. If we were going to graph a
function, for example, we would typically only graph part of it. So we will define a
piecewise-linear approximation for real-valued functions over closed intervals.

Figure 1. The graph of a function of two variables z = f (x, y).

Definition 3. Let $f : [a, b] \to \mathbb{R}$. For some positive integer n, we divide
[ a, b ] into n equal subintervals, $[x_1, x_2], [x_2, x_3], \ldots, [x_n, x_{n+1}]$. This collection
of subintervals will be called a partition, the interval $[x_i, x_{i+1}]$ will be called the i-th
subinterval, and the common length of the subintervals $\Delta x = x_{i+1} - x_i = \frac{b - a}{n}$
will be called the mesh. For each subinterval, the line segment from $(x_i, f(x_i))$
to $(x_{i+1}, f(x_{i+1}))$ will be called the i-th segment. The piecewise-linear (PL)
approximation of f is the function $\bar{f} : [a, b] \to \mathbb{R}$ whose graph coincides with
the collection of all n segments.
Example 1. Consider the function $f(x) = x^2$. The PL approximation of f
with n = 4 has lattice { −2, −1, 0, 1, 2 } and mesh ∆x = 1. The four segments and
the graph of f are shown in Figure 2. We generally will not make specific use of a
formula for the PL approximation $\bar{f}$, but it is possible to come up with one. In this
case, for example, we have
(26) $\bar{f}(x) = \begin{cases} -3x - 2 & \text{for } x \in [-2, -1], \\ -x & \text{for } x \in [-1, 0], \\ x & \text{for } x \in [0, 1], \\ 3x - 2 & \text{for } x \in [1, 2]. \end{cases}$

Figure 2. The PL approximation of $f(x) = x^2$ with a mesh of 1.
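
The construction in Definition 3 is easy to sketch in code. The names pl_approximation, f_bar, and formula_26 below are hypothetical, chosen for illustration. The check compares the constructed approximation with the four linear pieces of Equation (26).

```python
def pl_approximation(f, a, b, n):
    """Return the PL approximation of f on [a, b] with n equal subintervals."""
    dx = (b - a) / n
    xs = [a + i * dx for i in range(n + 1)]     # lattice points
    ys = [f(x) for x in xs]

    def f_bar(x):
        i = min(int((x - a) / dx), n - 1)       # index of the segment containing x
        slope = (ys[i + 1] - ys[i]) / dx
        return ys[i] + slope * (x - xs[i])

    return f_bar

def formula_26(x):
    """The four linear pieces listed in Equation (26)."""
    if x <= -1:
        return -3 * x - 2
    if x <= 0:
        return -x
    if x <= 1:
        return x
    return 3 * x - 2

f_bar = pl_approximation(lambda x: x * x, -2.0, 2.0, 4)
samples = [-2 + 0.125 * k for k in range(33)]
agrees = all(abs(f_bar(x) - formula_26(x)) < 1e-12 for x in samples)
print(agrees)
```

The returned function is only meant to be evaluated for x in [ a, b ].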

The PL approximation of f with n = 10 gives a more accurate looking graph,
as shown in Figure 3.

Figure 3. The PL approximation of $f(x) = x^2$ with a mesh of 0.4.

As can be seen in Example 1, increasing n (or equivalently decreasing ∆x)
makes the PL approximation $\bar{f}$ more closely resemble f. We will want to make the
assumption that the difference between the two functions can be made arbitrarily small. This is not
necessarily the case, as can be seen in the next example, but we will be able to make
assumptions of this type by restricting our attention to sufficiently nice functions, as
will be explored throughout this chapter.
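
For the function of Example 1 the improvement can be measured directly. The sketch below samples the gap between f(x) = x² and its PL approximation on [ −2, 2 ] for several values of n. The helper name max_pl_error and the sampling density are assumptions made for illustration. For this particular f, the gap on each segment is largest at the midpoint, where it equals (∆x)²/4.

```python
def max_pl_error(f, a, b, n, per_segment=50):
    """Largest sampled gap between f and its n-segment PL approximation."""
    dx = (b - a) / n
    worst = 0.0
    for i in range(n):
        x0, x1 = a + i * dx, a + (i + 1) * dx
        y0, y1 = f(x0), f(x1)
        for k in range(per_segment + 1):
            t = k / per_segment
            x = x0 + t * (x1 - x0)
            chord = y0 + t * (y1 - y0)      # height of the segment at x
            worst = max(worst, abs(f(x) - chord))
    return worst

errors = {n: max_pl_error(lambda x: x * x, -2.0, 2.0, n) for n in (4, 10, 100)}
print(errors)
```

The sampled gaps come out near 0.25, 0.04, and 0.0004, shrinking with the square of the mesh.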
Example 2. As can be seen in Figure 4, the PL approximation has
some difficulty in following the graph of $f(x) = x \sin\frac{1}{x}$ near x = 0, even with
n = 100. There are an infinite number of oscillations in the graph in any interval
about x = 0, so there is no way that a finite number of segments can portray this
to any great degree of satisfaction. As mentioned, we will seek to avoid functions
with characteristics such as these.

3. Uniform Continuity
As we go through this chapter, we will be laying out the conditions we expect
our functions to satisfy. Our underlying goal is to build an understanding of smooth
geometry, so at the very least, we might expect our functions to be continuous.
That we will be making use of PL approximations also speaks to the need for the
assumption of continuity. We will make important use of non-continuous functions,
but the discontinuities will be isolated and simple in nature.

Figure 4. The graph of $x \sin\frac{1}{x}$.

We will be making occasional reference to the definition of continuity, so let us
state it here.
Definition 4. For a function $f : [a, b] \to \mathbb{R}^n$ and an x ∈ [ a, b ], f is continuous
at x if for every ε > 0, there is a δ > 0 such that whenever |∆x| < δ (and
x + ∆x ∈ [ a, b ]),
(27) $| f(x + \Delta x) - f(x) | < \varepsilon.$
For n > 1, we will take $| f(x + \Delta x) - f(x) |$ to mean the magnitude of this difference
as vectors. If this definition is satisfied at each x ∈ [ a, b ], we will say that f is
continuous on [ a, b ].
Note that if f is continuous on an interval, this definition allows a different δ
for each x. This will be more than a little inconvenient for us, so we would like a
slightly stronger notion of continuity. This will be the following.
Definition 5. For a function $f : [a, b] \to \mathbb{R}^n$, f is said to be uniformly
continuous on [ a, b ] if for every ε > 0, there is a δ > 0 such that for every
x ∈ [ a, b ], whenever |∆x| < δ (and x + ∆x ∈ [ a, b ]),
(28) $| f(x + \Delta x) - f(x) | < \varepsilon.$
The main point of this definition is that the same δ works for any x. From our
experience in calculus, we are familiar with a wide range of continuous functions
and a few non-continuous ones. We may not, however, be as comfortable with
determining which functions are uniformly continuous. It turns out that this will
not be a concern, since we will almost exclusively be interested in functions over
some closed interval. This is a result of the following theorem, about which more
details can be found in a book on topology or real analysis.
Theorem 4. Let $f : A \subset \mathbb{R}^m \to \mathbb{R}^n$ be a continuous function. If A is closed
and bounded, then f is uniformly continuous.
Uniform continuity is actually less than we desire. The function of Example 2
is continuous everywhere, and therefore by Theorem 4, is uniformly continuous over
any closed interval about x = 0. So while no graph can capture the oscillations
present near x = 0, since the magnitudes of the oscillations become very small,
the graph can stay close to the sampling points. Requiring uniform continuity,
therefore, will not exclude all of the functions we would want to exclude. It should
be emphasized that we want uniform continuity for use in proofs, not necessarily
to exclude bad functions.
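
This behavior can be observed numerically. The sketch below assumes the interval [ −1, 1 ] and the value f(0) = 0 (neither is fixed by Example 2), and samples the gap between x sin(1/x) and its PL approximation for increasing n. The helper name max_pl_error is hypothetical.

```python
import math

def f(x):
    """x*sin(1/x), extended by f(0) = 0 so that it is continuous at 0."""
    return x * math.sin(1.0 / x) if x != 0.0 else 0.0

def max_pl_error(f, a, b, n, per_segment=10):
    """Largest sampled gap between f and its n-segment PL approximation."""
    dx = (b - a) / n
    worst = 0.0
    for i in range(n):
        x0, x1 = a + i * dx, a + (i + 1) * dx
        y0, y1 = f(x0), f(x1)
        for k in range(per_segment + 1):
            t = k / per_segment
            x = x0 + t * (x1 - x0)
            chord = y0 + t * (y1 - y0)      # height of the segment at x
            worst = max(worst, abs(f(x) - chord))
    return worst

errors = [max_pl_error(f, -1.0, 1.0, n) for n in (100, 1000, 10000)]
print(errors)
```

The sampled gap shrinks as n grows, even though no finite number of segments follows every oscillation near x = 0. This is only a sampled estimate, so it can slightly understate the true largest gap.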

4. Differentiation in One Variable

All of our work with differentiation will extend the basic notion of the derivative
studied in calculus. We will begin with the definition.
Definition 6. For the function $f : [a, b] \to \mathbb{R}$, and for any x ∈ ( a, b ), let
$f'(x)$ be defined by
(29) $f'(x) = \lim_{\Delta x \to 0} \frac{f(x + \Delta x) - f(x)}{\Delta x},$
if the limit exists. For the endpoints a and b, the derivative is defined by
(30) $f'(a) = \lim_{\Delta x \to 0^+} \frac{f(a + \Delta x) - f(a)}{\Delta x},$
(31) and $f'(b) = \lim_{\Delta x \to 0^-} \frac{f(b + \Delta x) - f(b)}{\Delta x},$
where the first involves a limit from the right and the second a limit from the
left. These will sometimes be specifically referred to as right- and left-sided
derivatives. At each x for which the derivative exists (including a and b), we will
say that f is differentiable at x. If f is differentiable at every point of [ a, b ], we
will say that f is differentiable on [ a, b ].
At a particular value of x, the number $f'(x)$ is typically associated with the
slope of a tangent line. Let us look a bit at what that means. If the limit in (29)
exists, then for any ε > 0, there is a δ > 0 such that as long as 0 < | ∆x | < δ,
(32) $\left| f'(x) - \frac{f(x + \Delta x) - f(x)}{\Delta x} \right| < \varepsilon.$
In other words, for any ε, there is an interval (−δ, δ) such that
(33) $| f'(x)\Delta x + f(x) - f(x + \Delta x) | < \varepsilon \, |\Delta x|$
for any nonzero ∆x in this interval. This tells us that the linear function
$t(\Delta x) = f'(x)\Delta x + f(x)$ is a reasonable approximation to the function $F(\Delta x) = f(x + \Delta x)$ for a fixed
value of x. Since ε is arbitrary, no other linear function will fit as well. If f
is differentiable at x, therefore, then there is a unique tangent line that fits the
curve better than any other line. Increasing or decreasing the slope, as in Figure
5, results in a line that does not fit as well, so intuitively, we see a certain amount
of symmetry. If the limit in the derivative definition does not exist, then a line
cannot be singled out as fitting better than the rest. In Figure 6, we see a point of
non-differentiability where a single line cannot fit the curve as we are accustomed
to seeing in a tangent line on both sides of the point, and the most “symmetric” line
does not fit the curve very well at all. We do see lines that look tangent on one side
of the non-differentiable point or the other, so the function may be differentiable
from the right or left at this point.

Figure 5. Lines with slopes different from the derivative do not fit as well.

Figure 6. At a non-differentiable point, there is no single best linear approximation, and no single line fits very well.

The tangent line, if it exists, is closely associated with a linear function. Since
it is the slope of this function that is most important to us, we will often talk about
this linear function in terms of a coordinate system whose origin is at the point
(x, f (x)).
Definition 7. If f is differentiable at the point x, then the differential of f
at x is the linear function
(34) $df(dx) = f'(x) \, dx.$
Note that the variable names in this coordinate system are dx and dy and that the
expression $f'(x)$ is a constant.
Note that for the differential, the origin for the dxdy-coordinate need not be
thought of as being at the point (x, f (x)) as shown in Figure 7. Compare this
with the notion of putting the base of a tangent vector at a relevant point on the
graph. In fact, the differential (as well as the tangent line) can be identified with
the collection of all possible tangent vectors. As a result, where a vector has both
a direction and a magnitude, the differential has only direction. We will use the
differential, therefore, as a way of generalizing slope to higher dimensions.

Figure 7. We can think of the origin of the dxdy-coordinate system as being based at the relevant point on the curve, but we don’t have to.

Example 3. Consider the function $f(x) = x^2 + 1$. Its derivative is $f'(x) = 2x$,
and $f'(0) = 0$. The slope of the tangent line at the point (0, 1) is therefore 0, and
the equation of the tangent line is t(x) = 0x + 1 (note that the differential at this
point is df = 0dx). For ε = .1 in (33), the graph of f must lie between the lines
y = ±.1x + 1 over some interval about x = 0. This is shown in Figure 8. No matter
how small we make ε, there will be some interval about x = 0 in which the parabola
lies between the lines y = ±εx + 1. This is a geometric description of our concept of
a tangent line.

Figure 8. For the function $f(x) = x^2 + 1$, the line y = 1 is tangent
to the curve at x = 0. Here the graph of f lies between the lines
y = ±.1x + 1 over some interval close to x = 0.
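
For this example the interval can be found explicitly: the parabola lies between the lines y = ±εx + 1 exactly when x² ≤ ε|x|, that is, when |x| ≤ ε. The sketch below checks this at sample points. The function names are hypothetical, and the interval is taken slightly inside [ −ε, ε ] to stay clear of floating-point rounding at the endpoints.

```python
def f(x):
    return x * x + 1.0

def between_lines(x, eps):
    """True if (x, f(x)) lies between y = -eps*x + 1 and y = eps*x + 1."""
    return abs(f(x) - 1.0) <= eps * abs(x)

def fits_on_interval(eps, delta, samples=1000):
    """Check the condition at evenly spaced points of [-delta, delta]."""
    xs = [-delta + 2.0 * delta * k / samples for k in range(samples + 1)]
    return all(between_lines(x, eps) for x in xs)

results = {eps: fits_on_interval(eps, delta=0.99 * eps)
           for eps in (0.1, 0.01, 0.001)}
print(results)
print(between_lines(0.2, 0.1))     # a point outside the interval for eps = .1
```

Each ε passes on its own interval, while the point x = 0.2 falls outside the wedge for ε = .1.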

Example 4. Differentiable functions allow graph behavior that lies beyond what
we would like to consider. For example, note that the function $g(x) = -x^2 + 1$ has
the same tangent line at x = 0 as the function f just mentioned. It follows that any
function that lies between f and g must also have the same tangent line. Consider
the function h defined below and graphed in Figure 9 with f, g, and t.
(35) $h(x) = \begin{cases} x^2 \sin\frac{1}{x^2}, & x \neq 0, \\ 0, & x = 0. \end{cases}$
Clearly h lies between any pair of lines y = ±εx + 1 over some interval, since both
f and g do. It appears in the graph depicted in Figure 9, however, that the
tangents to h near x = 0 can have slopes of large magnitude. This is confirmed by
a computation of the derivative. The derivative of h away from zero can be found
using the basic techniques of calculus, so the derivative of h must be
(36) $h'(x) = \begin{cases} 2x \sin\frac{1}{x^2} - \frac{2}{x} \cos\frac{1}{x^2}, & x \neq 0, \\ 0, & x = 0, \end{cases}$
and it is clear that $h'(x)$ takes large values arbitrarily close to x = 0. The slopes of
the tangent lines to h vary so wildly that $h'$ is not even continuous at x = 0. The
point of this book is to study the relationship between curvature and the geometry
of curves and surfaces and to understand what it might mean for the universe in
which we live to have curvature. As a result, we are most interested in objects that
curve very gently. As this last example illustrates, differentiability alone does not
guarantee the gentle curving we desire. The oscillations that are seen in the graph
of Figure 9 are not really the problem. The problem is that the oscillations become
more wild as we approach x = 0, and this is sufficient to make the derivative $h'$
not continuous. This particular example can be eliminated, of course, if we only
considered functions with continuous derivatives.

Figure 9. The graph of h lies between the graphs of f and g, so it has the same tangent at x = 0.
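
The size of these slopes can be seen by evaluating the derivative formula (36) at points where cos(1/x²) = 1. The sketch below uses the points x_k = 1/√(2πk), a choice made here for illustration, at which the formula reduces to h′(x_k) = −2√(2πk).

```python
import math

def h_prime(x):
    """Derivative (36) of h for x != 0, with h'(0) = 0."""
    if x == 0.0:
        return 0.0
    u = 1.0 / (x * x)
    return 2.0 * x * math.sin(u) - (2.0 / x) * math.cos(u)

ks = (1, 100, 10000, 1000000)
points = [1.0 / math.sqrt(2.0 * math.pi * k) for k in ks]
slopes = [h_prime(x) for x in points]
for x, s in zip(points, slopes):
    print(x, s)
```

As the points approach 0, the slopes grow without bound in magnitude, so they cannot approach h′(0) = 0.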
As we have just seen in Example 4, a function can be differentiable with a
non-continuous derivative. We do not want to consider functions that are this wild,
so we will require that our functions have continuous derivatives unless specifically
noted otherwise. Such functions are called continuously differentiable or $C^1$.

Definition 8. For a function $f : [a, b] \to \mathbb{R}$, if f is differentiable on [ a, b ]
and $f'$ is continuous on [ a, b ], then f is said to be continuously differentiable on
[ a, b ]. We will also use the notation $C^1$ for continuously differentiable functions.
If the second derivative is also continuous (more specifically, if $f'$ is continuously
differentiable), then f is $C^2$. Similarly, we may speak of functions that are $C^n$ for
any positive integer n, or even $C^\infty$ (f and all of its derivatives are continuously
differentiable). The notation $C^0$ is sometimes used to describe continuous functions.
It is dangerous to place too much weight on what a differentiable or a continuously
differentiable function might look like, but in general, we can think of a $C^1$
function as looking smoother than a function that was differentiable, but not $C^1$.
A $C^2$ function would look smoother still, but the differences become much more
subtle as we consider higher levels of continuous differentiability.
For a function f to be differentiable at x, we consider the slopes of secant lines.
We can imagine ourselves at the point (x, f (x)) seeing an object approaching us
along the graph. The expression
$\frac{f(x + dx) - f(x)}{dx}$
describes the observed direction we look in to see the object. For f to be differentiable,
we would expect this direction to have a limit, and this limit would agree
with the direction for an object approaching from the opposite direction. The limit
would correspond to the directions determined by the tangent to the curve. If the
object were a car with its headlights on, the direction the headlights pointed in
would correspond to the tangent line at the point of the graph occupied by the car.
These directions correspond to the values $f'(x + dx)$. For the function h described
above, the direction of the headlights would swing wildly back and forth between
directions perpendicular to the tangent. If f is continuously differentiable, then
these values must limit on $f'(x)$. In other words, the direction the headlights point
must approach the direction of the tangent line, and they would always be pointing
in your general direction.

5. Derivatives and PL Approximations

Given a PL approximation of a function f , the segments are each a portion
of a secant line. At each individual lattice point, the slope of the secants through
(xi , f (xi )) and (xi + ∆x, f (xi + ∆x)) limits on the derivative as ∆x → 0. We would
expect, therefore, that for very small values of the mesh, the slopes of the segments
will very closely approximate the derivatives at the lattice points. We can establish
this easily with reference to the Mean Value Theorem, which we state here.
Mean Value Theorem. If f is continuous on [ a, b ] and differentiable on
( a, b ), then there is a point c ∈ ( a, b ) such that
(38)    f'(c) = \frac{f(b) - f(a)}{b - a}.
Let f be a continuously differentiable function on [ a, b ], and consider the
PL approximation of f with mesh ∆x and lattice points { x_1, x_2, ..., x_{n+1} }. On
any particular segment, the Mean Value Theorem states that there must be a
c_i ∈ ( x_i, x_{i+1} ) such that
(39)    f'(c_i) = \frac{f(x_{i+1}) - f(x_i)}{\Delta x}.

Since the function f' is continuous, it is uniformly continuous, so given any ε > 0,
there is a δ > 0 such that if ∆x < δ, then | f'(c_i) − f'(x_i) | < ε for all i. It follows
that, for all i,
(40)    \left| f'(x_i) - \frac{f(x_{i+1}) - f(x_i)}{\Delta x} \right| < \varepsilon.

We can conclude, therefore, that the slopes of the segments of the PL approximation
are good approximations of the derivatives of f at the lattice points, and that the
error can be made arbitrarily small by reducing the mesh.
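The estimate above can be checked numerically. The following is a minimal sketch, assuming f = sin on [ 0, π ]; the helper name max_slope_error and the choice of test function are illustrative assumptions, not part of the text.

```python
import math

def max_slope_error(f, fprime, a, b, n):
    """Largest |f'(x_i) - (f(x_{i+1}) - f(x_i)) / dx| over the n segments of the mesh."""
    dx = (b - a) / n
    worst = 0.0
    for i in range(n):
        xi = a + i * dx
        segment_slope = (f(xi + dx) - f(xi)) / dx   # slope of the i-th PL segment
        worst = max(worst, abs(fprime(xi) - segment_slope))
    return worst

# Assumed example: f = sin, f' = cos on [0, pi], with successively finer meshes.
errors = [max_slope_error(math.sin, math.cos, 0.0, math.pi, n) for n in (10, 100, 1000)]
for n, err in zip((10, 100, 1000), errors):
    print(n, err)
```

For this smooth example the worst-case error shrinks roughly in proportion to the mesh, which is the behavior the uniform continuity argument predicts.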
Definition 9. We will define the PL differential of f at xi (and, if it is
convenient to have done so, at any point in [ xi , xi+1 )) to be
(41) Df (xi ) = f (xi+1 ) − f (xi ).

Figure 10. The vector Df.

Note that
(42)    \lim_{\Delta x \to 0} \frac{Df(x_i)}{\Delta x} = f'(x_i),
where ∆x = (b − a)/n for positive integers n, and n → ∞. If the function f were
constant, its graph would be a horizontal straight line, and the PL differential
would be 0. As shown in Figure 10, the PL differential measures the increase in f
(or the decrease) as we move from one lattice point to the next.
We can talk about the function f'' in a similar way. If f is C^2, then for each
i, there is a c'_i ∈ ( x_i, x_{i+1} ) such that
(43)    f''(c'_i) = \frac{f'(x_{i+1}) - f'(x_i)}{\Delta x}.
Since f'' is continuous, there is a δ' > 0 smaller than the δ mentioned above such
that if ∆x < δ',
(44)    | f''(c'_i) - f''(x_i) | < \varepsilon.
Since
(45)    \left| f'(x_i) - \frac{f(x_{i+1}) - f(x_i)}{\Delta x} \right| < \varepsilon
and
(46)    \left| f'(x_{i+1}) - \frac{f(x_{i+2}) - f(x_{i+1})}{\Delta x} \right| < \varepsilon,
we can conclude that
(47)    \left| f''(x_i) - \frac{\frac{f(x_{i+2}) - f(x_{i+1})}{\Delta x} - \frac{f(x_{i+1}) - f(x_i)}{\Delta x}}{\Delta x} \right|
            = \left| f''(x_i) - \frac{Df(x_{i+1}) - Df(x_i)}{\Delta x^2} \right| < 3\varepsilon.

With this in mind, we will make the following definition.
Definition 10. The second PL differential of f is defined as
(48)    D^2 f(x_i) = Df(x_{i+1}) - Df(x_i).
Note that
(49)    \lim_{\Delta x \to 0} \frac{D^2 f(x_i)}{\Delta x^2} = f''(x_i).
It should be noted that D^2 f(x_i) is probably a better approximation of f''(x_{i+1})
than it is of f''(x_i), but this formulation will be more convenient for us. In Figure
13, if the graph of f were a straight line, then we would expect Df to be constant,
and Df(x_i) and Df(x_{i+1}) would be the same. In this case, the graph of f would
continue to the point marked A. Instead, the graph of f proceeds to the point
marked B. The difference between A and B is the quantity D^2 f(x_i). Therefore,
D^2 f and f'' measure how much the graph is not a straight line, and so they are
measures of curvature in some way. They measure the deviation from straightness
in the vertical direction, however. These values will change if the graph is rotated,
for example, so they are not convenient quantities to use to describe a curve's shape.
They are easy to compute, however, and they contain the information necessary to
describe a curve's curvature, so they will be of use to us.

Figure 11. The distance D^2 f(x_i) is equal to the sum of the distances
Df(x_{i+1}) and −Df(x_i).
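As a rough numerical illustration of Definitions 9 and 10, the sketch below computes Df and D^2 f for an assumed example, f = exp at x = 0.5, and compares D^2 f / ∆x^2 with f'' at x and at x + ∆x. The function names are hypothetical.

```python
import math

def pl_differential(f, x, dx):
    """Df(x) = f(x + dx) - f(x), as in Definition 9."""
    return f(x + dx) - f(x)

def second_pl_differential(f, x, dx):
    """D^2 f(x) = Df(x + dx) - Df(x), as in Definition 10."""
    return pl_differential(f, x + dx, dx) - pl_differential(f, x, dx)

x = 0.5
results = []
for dx in (0.1, 0.01, 0.001):
    ratio = second_pl_differential(math.exp, x, dx) / dx**2
    error_at_x = abs(ratio - math.exp(x))            # exp'' = exp
    error_at_next = abs(ratio - math.exp(x + dx))
    results.append((dx, ratio, error_at_x, error_at_next))
    print(dx, ratio, error_at_x, error_at_next)
```

In this example the ratio is closer to f'' at the next lattice point than at x itself, in line with the remark that D^2 f(x_i) is probably a better approximation of f''(x_{i+1}).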

6. Parametrizations of Curves

Figure 12. The vector Df.

Figure 13. The vector D^2 f.

7. Functions of Two Variables

Earlier in the chapter, we discussed briefly the graph of a function of two
variables (see Figure 1). The computer representation of the graph consists of
a collection of line segments. These segments will play a role in our study of
functions of two variables as the segments of a PL approximation did with functions
of one variable. The segments form a grid on the surface breaking the surface
into quadrilaterals that we will call grid parallelograms. In general, these grid
parallelograms are not true parallelograms, which, among other things, always lie in a
plane. In fact, while it would seem natural to use the segments directly to define a
PL approximation to a surface, the four vertices of a grid parallelogram generally
will not lie in a plane. Since any set of three points is always coplanar, we can, in
some sense, fold each grid parallelogram along a diagonal to fit the segments. We
can, therefore, approximate the graph of a function of two variables with a collection
of flat triangular disks. From this we can naturally find a piecewise linear function.
Definition 11. For the function f : [ a, b ] × [ c, d ] → R, we can define the
piecewise-linear (PL) approximation as follows. Given positive integers m and
n, we can break [ a, b ] into m equal subintervals with lattice points { x_1, x_2, ..., x_{m+1} },
and we can break [ c, d ] into n equal subintervals with lattice points { y_1, y_2, ..., y_{n+1} }.
From these, we can divide the rectangle [ a, b ] × [ c, d ] into mn equal rectangles
[ x_i, x_{i+1} ] × [ y_j, y_{j+1} ] with width ∆x and height ∆y. We will say that the mesh is
∆x × ∆y. The set of rectangles is called the partition, and the points (x_i, y_j)
are the lattice points. The (i, j)-th rectangle has vertices (x_i, y_j), (x_{i+1}, y_j),
(x_{i+1}, y_{j+1}), and (x_i, y_{j+1}). There is a flat (planar) triangular disk with vertices
(x_i, y_j, f(x_i, y_j)), (x_{i+1}, y_j, f(x_{i+1}, y_j)), and (x_i, y_{j+1}, f(x_i, y_{j+1})), and there is
another flat triangular disk with vertices (x_{i+1}, y_j, f(x_{i+1}, y_j)),
(x_{i+1}, y_{j+1}, f(x_{i+1}, y_{j+1})), and (x_i, y_{j+1}, f(x_i, y_{j+1})). Together these form the
(i, j)-th grid parallelogram. The PL approximation of f is the function on
[ a, b ] × [ c, d ] whose graph consists of all of the grid parallelograms.
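Definition 11 can be turned into a small evaluation routine. The sketch below is one possible reading, assuming each rectangle is split along the diagonal from (x_{i+1}, y_j) to (x_i, y_{j+1}) exactly as the two triangular disks in the definition describe; the function name pl_approx and the local coordinates s and t are illustrative assumptions.

```python
def pl_approx(f, a, b, c, d, m, n):
    """Return the PL approximation of f on [a, b] x [c, d] with an m-by-n partition."""
    dx = (b - a) / m
    dy = (d - c) / n

    def g(x, y):
        # Locate the rectangle containing (x, y); clamp so x = b and y = d are allowed.
        i = min(int((x - a) / dx), m - 1)
        j = min(int((y - c) / dy), n - 1)
        x0, y0 = a + i * dx, c + j * dy
        s = (x - x0) / dx          # local coordinates in the rectangle, each in [0, 1]
        t = (y - y0) / dy
        f00, f10 = f(x0, y0), f(x0 + dx, y0)
        f01, f11 = f(x0, y0 + dy), f(x0 + dx, y0 + dy)
        if s + t <= 1:
            # Triangle with vertices (x_i, y_j), (x_{i+1}, y_j), (x_i, y_{j+1}).
            return f00 + s * (f10 - f00) + t * (f01 - f00)
        # Triangle with vertices (x_{i+1}, y_j), (x_{i+1}, y_{j+1}), (x_i, y_{j+1}).
        return f11 + (1 - s) * (f01 - f11) + (1 - t) * (f10 - f11)

    return g

# Assumed example: the PL approximation of a linear function reproduces it.
linear = lambda x, y: 2 * x - 3 * y + 1
g_linear = pl_approx(linear, 0.0, 1.0, 0.0, 2.0, 4, 5)
print(g_linear(0.37, 1.21), linear(0.37, 1.21))
```

The two branches agree along the shared diagonal s + t = 1, so the resulting function is continuous across the fold.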
In practice, we will use only the grid segments from the PL approximation,
and the pair of triangular disks that make up each grid parallelogram along with
the diagonal between them will be of only secondary importance. Using the no-
tation | (x1 , y1 ) − (x2 , y2 ) | for the distance between the two points, we can define
continuity for a function of two variables as follows.
Definition 12. For f : [ a, b ] × [ c, d ] → R, f is continuous at (x, y) if for
every ε > 0, there is a δ > 0 such that whenever | (x, y) − (x + ∆x, y + ∆y) | < δ,
we have
(50)    | f(x + \Delta x, y + \Delta y) - f(x, y) | < \varepsilon.
If δ can be chosen independently of the point (x, y), then f is said to be uniformly
continuous. By Theorem 4 we see that f is uniformly continuous if it is continuous
on [ a, b ] × [ c, d ].

8. Differentiability for Functions of Two Variables

Our notion of differentiability for functions of more than one variable will be
based on the concept of a partial derivative. Given a function f in several variables,
f : R^3 → R for example, we can take the derivative of f with respect to one of the
variables by holding the others constant.

Consider the function f(x, y) = 3xy + x^4. Taking y to be constant, we can
differentiate with respect to x to obtain the expression 3y + 4x^3. We will use the
notation
(51)    \frac{df}{dx} = f_x = f_x(x, y) = 3y + 4x^3
for the partial derivative with respect to x. It is common to use curly ∂’s in the
notation for partial derivatives, but since we will be using partial derivatives almost
exclusively, there is no significant advantage to making a distinction between partial
derivatives and regular ones. Of course the partial with respect to y would be written as
(52)    \frac{df}{dy} = f_y = f_y(x, y) = 3x.
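The two partial derivatives can be sanity-checked with difference quotients. This is a minimal sketch using central differences; the helper names partial_x and partial_y, the step size, and the sample point are assumptions chosen for illustration.

```python
def f(x, y):
    return 3 * x * y + x ** 4

def partial_x(f, x, y, h=1e-6):
    """Central difference in x with y held constant."""
    return (f(x + h, y) - f(x - h, y)) / (2 * h)

def partial_y(f, x, y, h=1e-6):
    """Central difference in y with x held constant."""
    return (f(x, y + h) - f(x, y - h)) / (2 * h)

x0, y0 = 1.5, -2.0
fx_numeric = partial_x(f, x0, y0)
fy_numeric = partial_y(f, x0, y0)
print(fx_numeric, 3 * y0 + 4 * x0 ** 3)   # compare with f_x = 3y + 4x^3
print(fy_numeric, 3 * x0)                 # compare with f_y = 3x
```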
A small portion of the graph of this function is shown in Figure 14. As we have
discussed, this depiction of the graph imposes gridlines on the surface. What we
see is a collection of straight line segments, each an edge shared by two grid
parallelograms. Half of the gridlines correspond to fixed values of y and the other half
to fixed values of x. For example, if we were to fix y to a value of zero, this would
single out those points lying on a curve corresponding to the function values f(x, 0).
The points (x, 0, f(x, 0)) all lie in the xz-plane, and if y = 0 is one of the lattice
coordinates, the corresponding segments would form a PL approximation of f(x, 0).
The partial derivatives f_x(x, 0) can be interpreted as slopes in the xz-plane, and
these would be approximated by the slopes of the segments. Fixing y = 1 singles
out the gridline on the closest face of the cube in Figure 14. The values of f_x(x, 1)
can be interpreted as slopes in this plane.

Figure 14. Graph of f(x, y) = 3xy + x^4.
For a function f of one variable, being differentiable implies the continuity of
f. This does not apply to the partial derivatives of a function of more than one
variable. A standard counterexample is as follows.
(53)    f(x, y) = \begin{cases} \frac{xy}{x^2 + y^2}, & (x, y) \neq (0, 0), \\ 0, & (x, y) = (0, 0). \end{cases}
The partial derivatives of this function are
(54)    f_x(x, y) = \begin{cases} \frac{y^3 - x^2 y}{(x^2 + y^2)^2}, & (x, y) \neq (0, 0), \\ 0, & (x, y) = (0, 0), \end{cases}
(55)    f_y(x, y) = \begin{cases} \frac{x^3 - x y^2}{(x^2 + y^2)^2}, & (x, y) \neq (0, 0), \\ 0, & (x, y) = (0, 0). \end{cases}
The partial derivatives exist at all points, and in particular at (0, 0). The function
f is not continuous at (0, 0), however: since f(t, t) = 1/2 for all t ≠ 0, there are
function values equal to 1/2 arbitrarily close to (0, 0). A portion of the surface is
shown in Figure 15. The graph is not accurate around the discontinuity, but some
sense of the surface can be obtained from the picture. In fact, discontinuities can
often be seen in a graph such as this with badly distorted grid parallelograms
near the discontinuity. Note that the x- and y-axes lie on the surface and that
the horizontal lines with points (t, t, 1/2) and with points (t, −t, −1/2) also lie on the
surface everywhere except for when t = 0. In particular, for any δ > 0 and any value
between −1/2 and 1/2, there is a point (x, y) within δ of (0, 0) at which f takes
that value.

Figure 15. Graph of f(x, y) = xy/(x^2 + y^2).

If we look at a few of the gridlines, we see that these are nicely smooth
individually. For example in Figure 16, graphs in the xz-plane corresponding to
fixed values of y = 1, .5, .1, .02 are shown. Each is the graph of a differentiable
function. In fact, they are continuously differentiable as functions of one variable.
For y = 0, f(x, 0) = 0, so this gridline is also nicely smooth. It is the transition
to the gridline at y = 0 that is not continuous. Considering the gridlines where
x is held constant shows a similar situation. What we see, therefore, is that the
partial derivatives only address the differentiability of the individual gridlines, so
the continuity of the function of two variables is not necessarily guaranteed.

Figure 16. Gridlines with y = 1, .5, .1, .02.
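The behavior of this counterexample near the origin is easy to observe numerically. The following sketch, with step sizes chosen arbitrarily for illustration, evaluates the difference quotients that define the partial derivatives at (0, 0) and the values of f along the diagonal y = x.

```python
def f(x, y):
    """The counterexample (53): xy / (x^2 + y^2) away from the origin, 0 at the origin."""
    if x == 0.0 and y == 0.0:
        return 0.0
    return x * y / (x * x + y * y)

steps = [10.0 ** (-k) for k in range(1, 9)]

# Difference quotients for the partial derivatives at the origin.
fx_quotients = [(f(h, 0.0) - f(0.0, 0.0)) / h for h in steps]
fy_quotients = [(f(0.0, h) - f(0.0, 0.0)) / h for h in steps]

# Values of f along the diagonal y = x, approaching the origin.
diagonal_values = [f(t, t) for t in steps]

print(fx_quotients)
print(fy_quotients)
print(diagonal_values)
```

The quotients vanish because f is identically zero along both axes, while the diagonal values stay at 1/2 no matter how close the point is to the origin.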

Our interests lie in the geometry of nice surfaces, and we would like to avoid
discontinuities such as the one exhibited by this last example. It turns out that
if the partial derivatives are continuous (as functions of two variables), then the
original function is also continuous. With this in mind, we make the following
definition.
Definition 13. If the partial derivatives of a function f are continuous, we will
call f continuously differentiable or C^1. If all of the second partial derivatives
are also continuous, f is C^2. As before, we can also speak of C^n functions in
general and C^∞ functions.
We can roughly say that if n > m, then C^n functions are smoother than
C^m functions. For the most part, we will assume that whenever we speak of the
derivative of a function, that derivative will be continuous. We will purposely
encounter instances of non-differentiability and non-continuity, and these will be
very important and very specific. Otherwise, if we speak of a third partial derivative
of a function f, for example, we will implicitly assume that f is at least C^3.
For the non-continuous function f in the example above, it is difficult to imagine
a plane tangent to the surface at (0, 0) (except, perhaps, for a vertical one). This
function was not C^1, however, and one important consequence of a function of two
variables being C^1 is that it is always possible to determine a tangent plane in a
reasonable way. Let us examine this in some detail, since this will be central to
much of what we study.
Consider a function f : R^2 → R. If the first partial derivatives exist, then at
a point (a, b), f_x(a, b) and f_y(a, b) can be interpreted as being the slopes of lines
tangent to the gridlines at (a, b, f(a, b)). It should seem reasonable that if a plane
were to be tangent to the surface at this point, then the two tangent lines would
lie on this tangent plane. In the example above, this would mean that the xy-plane
would be the only possible candidate for a tangent plane at the origin, and we would
not want to consider this to be the case. No reasonable tangent plane exists in this
situation. Again, however, the function in the example was not C^1. In any case,
we can assume that the plane determined by the two partial derivatives should be
the only possible candidate for a tangent plane. We wish to show that if f is C^1,
then we can reasonably call this plane a tangent plane.
Let us now suppose that f is indeed C^1 (as always, we assume that f is differentiable
in some region around the point under consideration). For convenience,
suppose that f_x(a, b) = m and f_y(a, b) = n. We can reasonably call these the x- and
y-slopes at (a, b). The plane determined by these two slopes is the graph of a linear
function t(x, y) = mx + ny + c where c is the constant that makes t(a, b) = f(a, b).
Consider a nearby point (a + dx, b + dy); we will attempt to estimate the difference
between t(a + dx, b + dy) and f(a + dx, b + dy). Our strategy will be to
consider f(a + dx, b) first and then f(a + dx, b + dy) (we could just as easily start
with f(a, b + dy)). We can use the partial derivative f_x(a, b) to estimate f(a + dx, b).
The fact that f_y is continuous allows us to estimate f_y(a + dx, b), and this in turn
can be used to estimate f(a + dx, b + dy).
Let ε > 0. There is a δ_1 > 0 such that if |dx| < δ_1, then
(56)    | f(a, b) + f_x(a, b)\,dx - f(a + dx, b) | = | t(a, b) + m\,dx - f(a + dx, b) | \le \varepsilon |dx|.
We can make a similar approximation using f_y(a + dx, b), but this would necessarily
depend on dx, complicating matters significantly. We can, however, consider
f(a + dx, b + dy) − f(a + dx, b). Since f is differentiable with respect to y, the Mean Value
Theorem tells us that there is an η between b and b + dy (dy could be negative) such
that
(57)    f(a + dx, b + dy) - f(a + dx, b) = f_y(a + dx, \eta)\,dy.
The continuity of f_y guarantees the existence of a δ_2 > 0 such that if (a + dx, η) is
within δ_2 of (a, b), then
(58)    | f_y(a + dx, \eta) - f_y(a, b) | = | f_y(a + dx, \eta) - n | < \varepsilon.
There is a δ > 0, therefore, such that whenever (a + dx, b + dy) is within δ of (a, b),
|dx| < δ_1 and (a + dx, η) will be within δ_2 of (a, b). For a point (a + dx, b + dy)
satisfying these conditions, we have
(59)    | f(a + dx, b + dy) - t(a + dx, b + dy) |
(60)      = | f(a + dx, b + dy) - f(a + dx, b) + f(a + dx, b) - t(a + dx, b + dy) |
(61)      = | f(a + dx, b + dy) - f(a + dx, b) + f(a + dx, b) - f(a, b) - m\,dx - n\,dy |
(62)      \le | f(a + dx, b + dy) - f(a + dx, b) - n\,dy | + | f(a + dx, b) - f(a, b) - m\,dx |
(63)      \le \varepsilon |dy| + \varepsilon |dx| = \varepsilon (|dy| + |dx|) \le 2\varepsilon \sqrt{|dx|^2 + |dy|^2}.

This can be interpreted as showing that the directional derivative in the direction
of the vector [ dx, dy ] exists. In other words, if we were to consider the curve on the
surface containing the points (a + tdx, b + tdy, f (a + tdx, b + tdy)), then the tangent
line to this curve would lie on the plane determined by the two partial derivatives.
A third way of saying this is that if f is continuously differentiable, then the set of
tangent lines to the surface at (a, b) will form a plane, and this plane is the same
as the one determined by fx (a, b) and fy (a, b).
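The estimate above says that the gap between f and the candidate tangent plane t is small compared with the distance from (a, b). A minimal numerical sketch follows, assuming the earlier example f(x, y) = 3xy + x^4, the base point (a, b) = (1, 2), and an arbitrarily chosen direction of approach; none of these choices come from the text.

```python
import math

def f(x, y):
    return 3 * x * y + x ** 4

a, b = 1.0, 2.0
m = 3 * b + 4 * a ** 3      # f_x(a, b), the x-slope
n = 3 * a                   # f_y(a, b), the y-slope

def t(x, y):
    """Candidate tangent plane through (a, b, f(a, b)) with x-slope m and y-slope n."""
    return f(a, b) + m * (x - a) + n * (y - b)

ratios = []
for h in (1e-1, 1e-2, 1e-3, 1e-4):
    dx, dy = h, -0.5 * h                      # assumed direction of approach
    error = abs(f(a + dx, b + dy) - t(a + dx, b + dy))
    ratios.append(error / math.hypot(dx, dy))
    print(h, ratios[-1])
```

The ratio of the error to the distance from (a, b) shrinks with h, which is what it means for the plane to be tangent in the sense of the inequality above.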
We now can state that for a function of two variables defined in some region
about (a, b) and continuously differentiable in that region, it is perfectly reasonable
to speak of a tangent plane to the graph of the function. Continuous differentiability
is not a necessary condition in this regard (see a book in real analysis), but since we
will never consider a non-continuous derivative outside of several important special
cases, it is reasonable to proceed in this way.
Continuously differentiable functions have one other important property that
we will exploit heavily. This is in regard to the second partial derivatives. The
partial derivative of f_x with respect to y will be denoted by f_xy. It is important
to note that in f_xy we differentiated with respect to x first and then y. This same
function is denoted
(64)    f_{xy} = \frac{d^2 f}{dy\,dx}.
With the d/dx notation, we indicate differentiation by placing symbols in front of
the function name, so the latest derivative should be furthest left. Even though
it is fundamentally important that C^2 functions are such that f_xy = f_yx, as we
will now investigate, the order of the differentiation is critical to understanding the
relationship between geometry and differentiation.
Suppose we have a C^2 function f in two variables. At a point (a, b), f_x(a, b)
is the slope of the tangent line to the gridline with y = b, and fy (a, b) is the slope
of the tangent to the gridline with x = a. The second partial derivative fxy (a, b)
describes the rate of rotation of the tangent in the x-direction as we move it in the
y-direction. Similarly, fyx (a, b) describes the rate of rotation of the tangent in the
y-direction as we move it in the x-direction. Expressed this way, there appears to
be no reason to expect that fxy (a, b) = fyx (a, b). Let us try to understand why this
might be the case.

A standard example of a function with unequal cross-partials is as follows.
(65)    f(x, y) = \begin{cases} \frac{xy(x^2 - y^2)}{x^2 + y^2}, & (x, y) \neq (0, 0), \\ 0, & (x, y) = (0, 0). \end{cases}
The graph of this function looks unremarkable (see Figure 17), but anomalies of
the second derivative will not be obvious in graphs such as this.

Figure 17. For this function f_xy(0, 0) ≠ f_yx(0, 0).

The first partial derivatives for this function are
(66)    f_x(x, y) = \begin{cases} \frac{x^4 y - y^5 + 4x^2 y^3}{(x^2 + y^2)^2}, & (x, y) \neq (0, 0), \\ 0, & (x, y) = (0, 0), \end{cases}
(67)    f_y(x, y) = \begin{cases} \frac{x^5 - x y^4 - 4x^3 y^2}{(x^2 + y^2)^2}, & (x, y) \neq (0, 0), \\ 0, & (x, y) = (0, 0). \end{cases}
Away from (0, 0), the functions f_xy and f_yx are equal.
(68)    f_{xy}(x, y) = f_{yx}(x, y) = \frac{x^6 + 9x^4 y^2 - 9x^2 y^4 - y^6}{(x^2 + y^2)^3}, \quad \text{for } (x, y) \neq (0, 0).
This should not be surprising, since all of the second partial derivatives are continuous
away from (0, 0). The common graph for the cross-partials is shown in Figure
18, and it is apparent that neither function can be continuous at (0, 0). Both functions
have values at (0, 0), however. Since f_x(0, y) = −y, the partial derivative of
this function at (0, 0) must be f_xy(0, 0) = −1. Similarly, since f_y(x, 0) = x, we have
that f_yx(0, 0) = 1.
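The two values at the origin can be approximated with nested difference quotients. The sketch below is an illustration only: the inner step h is taken much smaller than the outer step k so that the inner quotient is a good stand-in for f_x(0, y) or f_y(x, 0), and both step sizes are assumptions.

```python
def f(x, y):
    """The example (65), with the value 0 assigned at the origin."""
    if x == 0.0 and y == 0.0:
        return 0.0
    return x * y * (x * x - y * y) / (x * x + y * y)

def fx(x, y, h=1e-6):
    """Central-difference estimate of the partial derivative in x."""
    return (f(x + h, y) - f(x - h, y)) / (2 * h)

def fy(x, y, h=1e-6):
    """Central-difference estimate of the partial derivative in y."""
    return (f(x, y + h) - f(x, y - h)) / (2 * h)

k = 1e-2   # outer step, deliberately much larger than the inner step h
fxy_at_origin = (fx(0.0, k) - fx(0.0, -k)) / (2 * k)   # differentiate f_x in y
fyx_at_origin = (fy(k, 0.0) - fy(-k, 0.0)) / (2 * k)   # differentiate f_y in x
print(fxy_at_origin, fyx_at_origin)
```

The two estimates land near −1 and 1 respectively, matching the values computed from f_x(0, y) = −y and f_y(x, 0) = x.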
Figure 18. This is the graph of f_xy or f_yx. They agree everywhere
except at (0, 0).

The nature of this example indicates that for nice functions, the cross-partials
should be expected to be equal. Let us look at why we might believe this to be
the case. The grid lines on the graph break the surface into grid parallelograms
corresponding to a PL approximation of f . In general, these are not actual par-
allelograms, since the four corners are assumed to lie on a curved surface, and
so pairs of opposite sides cannot be expected to be parallel or the same length.
For a grid with mesh ∆x × ∆y, we can consider the grid parallelogram at a point
(a, b). In particular, the four corners are (a, b, f (a, b)), (a + ∆x, b, f (a + ∆x, b)),
(a + ∆x, b + ∆y, f (a + ∆x, b + ∆y)), and (a, b + ∆y, f (a, b + ∆y)). The grid par-
allelogram is depicted in Figure 19 along with the true parallelogram determined
by the grid segments adjacent to (a, b, f (a, b)). Let Dx f(a, b) be the vector from
(a, b, f (a, b)) to (a + ∆x, b, f (a + ∆x, b)), as shown in Figure 19. The coordinates
of Dx f (a, b) are
(69) Dx f = [ ∆x, 0, f (a + ∆x, b) − f (a, b) ] .
Compare this vector to the slope of this side of the grid parallelogram
(70)    m = \frac{f(a + \Delta x, b) - f(a, b)}{\Delta x}.
We are looking at the slope of a secant line whose limit is fx (a, b), and the vector
Dx f(a, b) gives us information that is equivalent to this slope. The change in the
values of f , the third coordinate in Dx f , will be called the PL partial differential
of f with respect to x, and will be denoted Dx f. The vector Dy f(a, b) similarly
contains information about a secant line whose slope approximates fy (a, b), and we
define Dy f in the obvious way.
Figure 19. A grid parallelogram determined by four points on
the surface and the actual parallelogram determined by D_x f and
D_y f.

Figure 20 shows D_y f(a, b) and D_y f(a + ∆x, b). These reflect a change in f_y
as it moves in the x-direction. In other words, the difference between these two
vectors is an approximation of f_yx(a, b), and as long as the limits are sufficiently
well-behaved,
(71)    \lim_{\Delta y \to 0} \lim_{\Delta x \to 0} \frac{D_y f(a + \Delta x, b) - D_y f(a, b)}{\Delta x \, \Delta y} = [ 0, 0, f_{yx}(a, b) ].
If there were no difference between D_y f(a, b) and D_y f(a + ∆x, b), then
D_y f(a + ∆x, b) would occupy the opposite side of the true parallelogram shown in Figure
20. The difference between the two vectors must be a vector between the corners of the
true parallelogram and the grid parallelogram corresponding to (a + ∆x, b + ∆y),
as shown in Figure 21.

Figure 20. The vectors D_y f(a, b) and D_y f(a + ∆x, b).

The vector Dxy f (a, b) must have precisely the same geometric interpretation,
so as long as the limits are well-behaved, it should be that fxy (a, b) = fyx (a, b).
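At the PL level the symmetry is exact, and a short computation makes this visible. The sketch below works only with the third coordinates (the PL partial differentials), using the earlier example f(x, y) = 3xy + x^4; the function names Dx, Dy, Dyx, and Dxy are hypothetical labels for the quantities in the text, and the sample point and mesh are assumptions.

```python
def f(x, y):
    return 3 * x * y + x ** 4

def Dx(f, a, b, dx):
    """PL partial differential of f with respect to x at (a, b)."""
    return f(a + dx, b) - f(a, b)

def Dy(f, a, b, dy):
    """PL partial differential of f with respect to y at (a, b)."""
    return f(a, b + dy) - f(a, b)

def Dyx(f, a, b, dx, dy):
    """Change in D_y f as the base point moves one step in the x-direction."""
    return Dy(f, a + dx, b, dy) - Dy(f, a, b, dy)

def Dxy(f, a, b, dx, dy):
    """Change in D_x f as the base point moves one step in the y-direction."""
    return Dx(f, a, b + dy, dx) - Dx(f, a, b, dx)

a, b, dx, dy = 1.0, 2.0, 1e-3, 1e-3
dyx = Dyx(f, a, b, dx, dy)
dxy = Dxy(f, a, b, dx, dy)
# Height of the grid corner above the corner of the true parallelogram.
corner_gap = f(a + dx, b + dy) - (f(a, b) + Dx(f, a, b, dx) + Dy(f, a, b, dy))
print(dyx, dxy, corner_gap, dyx / (dx * dy))
```

Both second differentials reduce to f(a + ∆x, b + ∆y) − f(a + ∆x, b) − f(a, b + ∆y) + f(a, b), which is the vertical gap between the two corners; dividing by ∆x ∆y gives a value near f_xy = f_yx = 3 for this example.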

Figure 21. The vector corresponding to f_yx(a, b).


The Riemannian Curvature Tensor in Two Dimensions

For a surface parametrized by x(u^1, u^2) = ( x^1(u^1, u^2), x^2(u^1, u^2), x^3(u^1, u^2) ),
the Riemannian curvature tensor is defined to be
(72)    R^l_{ijk} = \frac{d\Gamma^l_{ik}}{du^j} - \frac{d\Gamma^l_{ij}}{du^k} + \Gamma^p_{ik}\Gamma^l_{pj} - \Gamma^p_{ij}\Gamma^l_{pk},
where i, j, k, l, p ∈ { 1, 2 }, and the Einstein summation convention is used (i.e.,
since p occurs as both an upper and lower index, it is summed over). Using the
notation x_i = dx/du^i and x_ij = d^2 x/(du^j du^i), we define the Christoffel symbols Γ^k_ij
along with the coefficients of the Second Fundamental Form by
(73)    x_{ij} = L_{ij} n + \Gamma^k_{ij} x_k,
again using the Einstein summation convention. Note that the Γ^k_ij describe the
tangential change in the tangent vectors x_i in terms of the x_i, and they can be
obtained by making measurements along the surface (i.e., they are intrinsic). The
L_ij are extrinsic, and the principal curvatures, κ_1 and κ_2, can be obtained from
them, and so the Gauss curvature, K = κ_1 κ_2, also depends on the L_ij. The Gauss
curvature can also be obtained from R^l_ijk, in particular
(74)    K = \frac{R^l_{121} g_{l2}}{g},
where g_ij = ⟨ x_i, x_j ⟩ and g = | g_ij |. There are several other choices for the
indices that will result in ±K. All quantities used here are intrinsic, so a proof of
the relationships stated here also proves Gauss’ Theorema Egregium, that Gauss
curvature is intrinsic.
The R^l_ijk contain more curvature information than K, and the Riemannian
curvature tensor generalizes to higher dimensions, where the Gauss curvature does
not. Motivation for the Riemannian curvature tensor comes from the following
observations. What I describe is partly nonsense, but it lays out the basic idea.
Suppose a simple closed curve C on the surface bounds a region S. The total
curvature of S is θ = ∫_S K dA. If we were to parallel transport a vector around
C, then the resultant vector would differ from the original by an angle θ. If we
had points A and B on the curve, then there are two ways that we could parallel
transport a vector along C from A to B. The angle between the two resultant
vectors would also be θ.
The Riemannian curvature tensor captures this idea using differentials and
derivatives. For example, if we were to follow the tangent vector x_1 as it moved
a small distance du^1 in the u^1-direction, and then a small distance du^2 in the
u^2-direction, we would get some vector x_1(12). We could go the other way, that is,
we could move first in the u^2-direction, and then the u^1-direction. We’ll call this
x_1(21). These two vectors will be the same, but if we were to keep track of the
rotation of the vectors relative to the surface, we would get a discrepancy of θ,
where θ is the total curvature inside the little parallelogram with sides du^1 and
du^2. The average curvature would be
(75)    K \approx \frac{\theta}{\| du^1 \times du^2 \|}.
Note also that
(76)    \theta \approx \frac{\| x_1(12) - x_1(21) \|}{\| x_1 \|}.
Very roughly then, the Riemannian curvature tensor describes the following. Ignoring
the normal component of the change in x_i, move it in the u^j-direction, and
then in the u^k-direction to obtain x_i(jk). Obtain x_i(kj) similarly and subtract.
Expressed in terms of the tangent vectors x_1 and x_2, we have
(77)    x_i(jk) - x_i(kj) = R^1_{ijk} x_1 + R^2_{ijk} x_2.

1. Parametrizations
Let x(u^1, u^2) = ( x^1(u^1, u^2), x^2(u^1, u^2), x^3(u^1, u^2) ) be a vector function x :
R^2 → R^3, and consider a piecewise linear approximation of x with mesh ∆u^1 × ∆u^2.
At a lattice point (u^1, u^2) = (a, b), define the partial PL-differentials with respect
to u^i to be
(78)    x_1 = D_1 x(a, b) = x(a + \Delta u^1, b) - x(a, b),
(79)    and x_2 = D_2 x(a, b) = x(a, b + \Delta u^2) - x(a, b),
and the second partial PL-differentials with respect to u^i and u^j to be
(80)    x_{11} = D_{11} x(a, b) = D_1 x(a + \Delta u^1, b) - D_1 x(a, b),
(81)    x_{12} = D_{12} x(a, b) = D_1 x(a, b + \Delta u^2) - D_1 x(a, b),
(82)    x_{21} = D_{21} x(a, b) = D_2 x(a + \Delta u^1, b) - D_2 x(a, b),
(83)    and x_{22} = D_{22} x(a, b) = D_2 x(a, b + \Delta u^2) - D_2 x(a, b).
Also at each lattice point, we can define the unit PL-normal vector at (a, b) to be
(84)    n(a, b) = \frac{x_1 \times x_2}{\| x_1 \times x_2 \|}.
Note that n is normal to the plane determined by x_1 and x_2.
One important goal is to understand the curvature of space, so it is important
to understand curvature intrinsically. It is possible to decompose the x_ij into
tangential and normal components.
(85)    x_{ij} = L_{ij} n + \Gamma^k_{ij} x_k.
The L_ij are the coefficients of the PL-second form. The Γ^k_ij are the PL-Christoffel
symbols, and they describe the tangential (or geodesic) change in the
tangent vectors x_i. From an intrinsic point of view, we can define quantities that
correspond roughly to the PL-Christoffel symbols. We will do this by working from
the fact that any two adjoining grid parallelograms can be laid flat (i.e., can be
embedded in a plane).
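The PL-differentials, the PL-normal, and the decomposition (85) can all be computed directly. The sketch below assumes an example parametrization x(u^1, u^2) = (u^1, u^2, (u^1)^2 + u^1 u^2), a lattice point, and a mesh, none of which come from the text; the helper names are hypothetical. The tangential coefficients are found by solving the 2 × 2 system built from the dot products of x_1 and x_2.

```python
import math

def sub(u, v): return tuple(p - q for p, q in zip(u, v))
def dot(u, v): return sum(p * q for p, q in zip(u, v))
def cross(u, v):
    return (u[1]*v[2] - u[2]*v[1], u[2]*v[0] - u[0]*v[2], u[0]*v[1] - u[1]*v[0])

def x(u1, u2):
    """Assumed example surface, chosen only for illustration."""
    return (u1, u2, u1 * u1 + u1 * u2)

def D(i, a, b, du):
    """Partial PL-differential D_i x(a, b), with i in {1, 2}."""
    s1, s2 = (du[0], 0.0) if i == 1 else (0.0, du[1])
    return sub(x(a + s1, b + s2), x(a, b))

def D2(i, j, a, b, du):
    """Second partial PL-differential D_ij x(a, b): D_i x shifted in u^j minus D_i x."""
    s1, s2 = (du[0], 0.0) if j == 1 else (0.0, du[1])
    return sub(D(i, a + s1, b + s2, du), D(i, a, b, du))

def decompose(xij, x1, x2):
    """Return (L, G1, G2, n) with xij = L n + G1 x1 + G2 x2."""
    c = cross(x1, x2)
    norm = math.sqrt(dot(c, c))
    n = tuple(p / norm for p in c)                 # unit PL-normal
    L = dot(xij, n)                                # normal component
    w = sub(xij, tuple(L * p for p in n))          # tangential part
    g11, g12, g22 = dot(x1, x1), dot(x1, x2), dot(x2, x2)
    det = g11 * g22 - g12 * g12
    r1, r2 = dot(w, x1), dot(w, x2)
    return L, (g22 * r1 - g12 * r2) / det, (g11 * r2 - g12 * r1) / det, n

a, b, du = 0.3, -0.2, (0.01, 0.01)
x1, x2 = D(1, a, b, du), D(2, a, b, du)
x11, x12, x21 = D2(1, 1, a, b, du), D2(1, 2, a, b, du), D2(2, 1, a, b, du)
L, G1, G2, n = decompose(x11, x1, x2)
rebuilt = tuple(L * n[c] + G1 * x1[c] + G2 * x2[c] for c in range(3))
print(x11, rebuilt)
```

Note that x12 and x21 coincide here, since both reduce to the same combination of the four corner points of the grid parallelogram.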

Let us first consider the intrinsic change corresponding to x_11. At a lattice
point (a, b), we are looking at
(86)    x_{11} = x_1(a + \Delta u^1, b) - x_1(a, b),
so we are interested in the grid parallelograms corresponding to (a, b) and
(a + ∆u^1, b). Both grid parallelograms can be embedded in the vector space spanned by
x_1(a, b) and x_2(a, b), and in particular, x_1(a + ∆u^1, b) lies in this plane. With the
following subtraction taking place in this vector space, we can therefore define the
intrinsic PL-differential, δ_1 x_1, and the intrinsic PL-Christoffel symbols, γ^k_11, by
(87)    x_1(a + \Delta u^1, b) - x_1(a, b) = \delta_1 x_1(a, b) = \gamma^1_{11} x_1(a, b) + \gamma^2_{11} x_2(a, b).
In general, we define δ_i x_j and γ^k_ij the same way.

Figure 1. Pushing x_1 around the grid parallelogram.

If we consider the grid parallelograms at (a, b), (a + ∆u^1, b), (a, b + ∆u^2), and
(a + ∆u^1, b + ∆u^2), it will not be possible to lay these flat if there is non-zero impulse
curvature at (a + ∆u^1, b + ∆u^2). If we cut along the vector x_1(a + ∆u^1, b + ∆u^2),
however, we can pull all four grid parallelograms into the plane spanned by the
vectors x_1(a, b) and x_2(a, b), as shown in Figure 1. The angle θ between the two
copies of x_1(a + ∆u^1, b + ∆u^2) is precisely the impulse curvature at (a + ∆u^1, b + ∆u^2).
The Riemannian curvature tensor exploits this observation, and we will start to
build up to R^l_ijk here.
The intrinsic PL-differential δ_1 x_1(a, b) measures the difference between the vectors
marked x_1 and x_1(1) in Figure 1. Similarly, δ_2 x_1(a + ∆u^1, b) is the difference
between the vectors marked x_1(1) and x_1(12). Both of these differences are computed
in the plane spanned by x_1(a, b) and x_2(a, b). Also in this plane, the difference
between x_1 and x_1(21) is measured by δ_2 x_1(a, b) and δ_1 x_1(a, b + ∆u^2). Finding the
angle between the vectors x_1(12) and x_1(21) is a bit awkward, but we can get a
good approximation for small θ’s with
(88)    \theta \approx \frac{\| x_1(12) - x_1(21) \|}{\| x_1(12) \|},
since the vectors x_1(21) and x_1(12) are the same length, and so the difference
between the vectors is essentially an arc of a circle with radius ‖x_1(12)‖ (or equivalently,
‖x_1(21)‖). With this in mind, we define the PL-Riemannian curvature
tensor by
(89)    x_i(jk) - x_i(kj) = R^1_{ijk} x_1 + R^2_{ijk} x_2.
In words, we take the vector x_i and move it first in the x_j-direction, and then the
x_k-direction. From this we subtract the result of moving this vector first in the
x_k-direction, and then the x_j-direction. This description makes it apparent that
we have the following relationships.
(90)    R^l_{ijk} = -R^l_{ikj},
(91)    and R^l_{ijj} = 0.
There are eight pairs of numbers R^1_ijk and R^2_ijk, and each of the four pairs for which
j ≠ k, along with x_1 and x_2, give us approximations of the impulse curvature at
(a + ∆u^1, b + ∆u^2).
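The small-angle approximation (88) replaces an arc by a chord. The sketch below illustrates how good that is for two equal-length planar vectors that differ by a known rotation; the vector u and the test angles are assumptions chosen for illustration, and the helper names are hypothetical.

```python
import math

def rotate(v, theta):
    """Rotate the planar vector v counter-clockwise by theta radians."""
    c, s = math.cos(theta), math.sin(theta)
    return (c * v[0] - s * v[1], s * v[0] + c * v[1])

def approx_angle(u, v):
    """Estimate the angle between equal-length vectors as chord length over radius, as in (88)."""
    return math.hypot(u[0] - v[0], u[1] - v[1]) / math.hypot(u[0], u[1])

u = (0.3, 0.4)
comparison = [(theta, approx_angle(u, rotate(u, theta))) for theta in (0.5, 0.1, 0.01)]
for theta, estimate in comparison:
    print(theta, estimate, abs(theta - estimate))
```

The estimate equals 2 sin(θ/2), so the discrepancy from θ is of order θ^3 and becomes negligible for the small angles that arise on a fine mesh.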
We can express the quantities in (89) in terms of differentials at (a, b). Specifically,
we need the following
(92)    \delta_1 \gamma^k_{ij}(a, b) = \gamma^k_{ij}(a + \Delta u^1, b) - \gamma^k_{ij}(a, b),
(93)    and \delta_2 \gamma^k_{ij}(a, b) = \gamma^k_{ij}(a, b + \Delta u^2) - \gamma^k_{ij}(a, b).
For example, we have that
(94)    \gamma^k_{ij}(a + \Delta u^1, b) = \gamma^k_{ij}(a, b) + \delta_1 \gamma^k_{ij}(a, b).
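We can find x_1(12) as follows, using numbers in parentheses to designate the lattice
point. Since
(95)    x_1(1) = x_1 + \gamma^1_{11} x_1 + \gamma^2_{11} x_2,
(96)    x_2(1) = x_2 + \gamma^1_{21} x_1 + \gamma^2_{21} x_2,
(97)    and \gamma^i_{12}(1) = \gamma^i_{12} + \delta_1 \gamma^i_{12},
we have that
(98)    x_1(12) = x_1(1) + \gamma^1_{12}(1) x_1(1) + \gamma^2_{12}(1) x_2(1)
(99)            = x_1 + \gamma^1_{11} x_1 + \gamma^2_{11} x_2
(100)             + (\gamma^1_{12} + \delta_1 \gamma^1_{12})(x_1 + \gamma^1_{11} x_1 + \gamma^2_{11} x_2)
(101)             + (\gamma^2_{12} + \delta_1 \gamma^2_{12})(x_2 + \gamma^1_{21} x_1 + \gamma^2_{21} x_2).
Similarly,
(102)   x_1(21) = x_1(2) + \gamma^1_{11}(2) x_1(2) + \gamma^2_{11}(2) x_2(2)
(103)           = x_1 + \gamma^1_{12} x_1 + \gamma^2_{12} x_2
(104)             + (\gamma^1_{11} + \delta_2 \gamma^1_{11})(x_1 + \gamma^1_{12} x_1 + \gamma^2_{12} x_2)
(105)             + (\gamma^2_{11} + \delta_2 \gamma^2_{11})(x_2 + \gamma^1_{22} x_1 + \gamma^2_{22} x_2).
It follows that
(106)   x_1(12) - x_1(21) = (\gamma^1_{12}\gamma^1_{11} + \delta_1\gamma^1_{12} + \delta_1\gamma^1_{12}\gamma^1_{11} + \gamma^2_{12}\gamma^1_{21} + \delta_1\gamma^2_{12}\gamma^1_{21}) x_1
(107)             + (\gamma^1_{12}\gamma^2_{11} + \delta_1\gamma^1_{12}\gamma^2_{11} + \gamma^2_{12}\gamma^2_{21} + \delta_1\gamma^2_{12} + \delta_1\gamma^2_{12}\gamma^2_{21}) x_2
(108)             - (\gamma^1_{11}\gamma^1_{12} + \delta_2\gamma^1_{11} + \delta_2\gamma^1_{11}\gamma^1_{12} + \gamma^2_{11}\gamma^1_{22} + \delta_2\gamma^2_{11}\gamma^1_{22}) x_1
(109)             - (\gamma^1_{11}\gamma^2_{12} + \delta_2\gamma^1_{11}\gamma^2_{12} + \gamma^2_{11}\gamma^2_{22} + \delta_2\gamma^2_{11} + \delta_2\gamma^2_{11}\gamma^2_{22}) x_2.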
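The $x_1$-component can be written in Einstein notation as
(110) $\delta_1\gamma^1_{12} - \delta_2\gamma^1_{11} + \gamma^p_{12}\gamma^1_{p1} - \gamma^p_{11}\gamma^1_{p2} + \delta_1\gamma^p_{12}\,\gamma^1_{p1} - \delta_2\gamma^p_{11}\,\gamma^1_{p2},$
and the $x_2$-component can be written as
(111) $\delta_1\gamma^2_{12} - \delta_2\gamma^2_{11} + \gamma^p_{12}\gamma^2_{p1} - \gamma^p_{11}\gamma^2_{p2} + \delta_1\gamma^p_{12}\,\gamma^2_{p1} - \delta_2\gamma^p_{11}\,\gamma^2_{p2}.$
In general, the $x_i$-component would be
(112) $\delta_1\gamma^i_{12} - \delta_2\gamma^i_{11} + \gamma^p_{12}\gamma^i_{p1} - \gamma^p_{11}\gamma^i_{p2} + \delta_1\gamma^p_{12}\,\gamma^i_{p1} - \delta_2\gamma^p_{11}\,\gamma^i_{p2}.$
All cases are covered by
(113) $R^l_{ijk} = \delta_j\gamma^l_{ik} - \delta_k\gamma^l_{ij} + \gamma^p_{ik}\gamma^l_{pj} - \gamma^p_{ij}\gamma^l_{pk} + \delta_j\gamma^p_{ik}\,\gamma^l_{pj} - \delta_k\gamma^p_{ij}\,\gamma^l_{pk}.$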
Compare this to the definition of the (non-PL) Riemannian curvature tensor.
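Equation (113) can be evaluated mechanically once the $\gamma^k_{ij}$ and their differences $\delta_m\gamma^k_{ij}$ are known. The following is a minimal Python sketch under stated assumptions: the function name `pl_curvature`, the nested-list layout, and the sample numbers are all hypothetical illustrations, and the indices run over 0 and 1 in place of 1 and 2.

```python
def pl_curvature(gamma, dgamma):
    """Return R[l][i][j][k], following the pattern of equation (113).

    gamma[k][i][j]     stands for gamma^k_{ij} at the base point (a, b).
    dgamma[m][k][i][j] stands for delta_m gamma^k_{ij}, the change in
    gamma^k_{ij} after one lattice step in the u^m direction.
    """
    n = 2
    R = [[[[0.0] * n for _ in range(n)] for _ in range(n)] for _ in range(n)]
    for l in range(n):
        for i in range(n):
            for j in range(n):
                for k in range(n):
                    # The two "difference" terms of (113).
                    total = dgamma[j][l][i][k] - dgamma[k][l][i][j]
                    # The four terms summed over the repeated index p.
                    for p in range(n):
                        total += gamma[p][i][k] * gamma[l][p][j]
                        total -= gamma[p][i][j] * gamma[l][p][k]
                        total += dgamma[j][p][i][k] * gamma[l][p][j]
                        total -= dgamma[k][p][i][j] * gamma[l][p][k]
                    R[l][i][j][k] = total
    return R


# Made-up sample data, used only to exercise the function.
gamma = [[[0.1, 0.2], [0.2, -0.1]], [[0.05, 0.0], [0.0, 0.3]]]
dgamma = [
    [[[0.01, 0.02], [0.02, 0.0]], [[0.0, 0.01], [0.01, 0.03]]],
    [[[0.0, -0.01], [-0.01, 0.02]], [[0.02, 0.0], [0.0, -0.02]]],
]
R = pl_curvature(gamma, dgamma)
```

For any input, the result should satisfy the relationships (90) and (91), which gives a quick sanity check on the index bookkeeping.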

Riemannian Curvature Tensor

1. The Riemannian Metric for a Plane

One thing that should always be kept in mind is that the derivative can always
be interpreted as being a linear approximation. In other words, it is often helpful
to understand a situation concerning a differentiable object by studying the corre-
sponding linear algebra situation. We will be exploring something that differential
geometers call a metric or a Riemannian metric. This is not to be confused with
the metric a topologist would impose on a metric space, although a metric-space
metric can always be constructed from a Riemannian metric.
Differential geometers study objects that don’t have natural (or at least not
convenient) coordinate systems. We can impose coordinate systems on these spaces
by identifying a piece of the space with a piece of a Euclidean space. Right now, we
will associate a piece of a surface with a piece of the Euclidean plane, $\mathbb{R}^2$. We can
talk about points of the surface using a coordinate system for $\mathbb{R}^2$. The geometry,
however, for the two spaces will be different, and we will impose a funny geometry
on $\mathbb{R}^2$ and pretend that $\mathbb{R}^2$ actually is the surface. We will start to describe how
this is done in the case when the surface is another plane, and at the same time,
develop some of the notation and terminology that is used in differential geometry.
We will think of the derivatives at a point of a surface as being generalizations
of the notion of a linear transformation, so let’s look at a linear transformation. Due
to the repetitive nature of linear algebra and differential geometry, it is convenient
to talk about things such as the $u^1u^2$-plane rather than the $uv$-plane. For the most
part, superscripts are used in the same way that subscripts are. This interferes with
the use of exponents, but the handiness of the notation more than makes up for
this. Suppose we have a linear transformation $A$ from the $u^1u^2$-plane to the
$x^1x^2$-plane. Linear transformations can be defined in terms of matrix multiplication, so
we have
(114) $A(u^1, u^2) = \begin{bmatrix} a^1_1 & a^1_2 \\ a^2_1 & a^2_2 \end{bmatrix} \begin{bmatrix} u^1 \\ u^2 \end{bmatrix}.$
Note that $a^i_j$ is the entry in the $i$-th row and $j$-th column. Part of the reason for
using superscripts in this way will be apparent in a minute. More will be made clear
later. Expanding the multiplication, we see that
(115) $\begin{bmatrix} x^1 \\ x^2 \end{bmatrix} = A(u^1, u^2) = \begin{bmatrix} a^1_1u^1 + a^1_2u^2 \\ a^2_1u^1 + a^2_2u^2 \end{bmatrix} = \begin{bmatrix} \sum_{j=1}^{2} a^1_ju^j \\ \sum_{j=1}^{2} a^2_ju^j \end{bmatrix}.$
The use of the summation notation indicates some of the repetitiveness. More
can be seen in the fact that both entries look the same. Another common way of
expressing this relationship is
(116) $x^1 = \sum_{j=1}^{2} a^1_ju^j, \qquad x^2 = \sum_{j=1}^{2} a^2_ju^j.$

Both equations look the same, except for the superscripts, which are both 1 in the
first equation, and both 2 in the second. Note also that the summation index $j$ in
both summations appears as both a subscript and a superscript. Albert Einstein is
often credited with noticing that there is a nice scheme for deciding which indices
should be subscripts and which should be superscripts, and if this is followed,
summations will always range over one subscript and one superscript. The terms
covariant and contravariant tensors are used in this scheme, and we will discuss
this later. The important thing here is that we will always sum over an index that
appears both as a superscript and a subscript. This makes the summation sign
redundant (the context will make it clear what the range of the index is supposed
to be), and so using the Einstein summation convention, we can write the equations
in (116) as one equation and without the summation sign,
(117) $x^i = a^i_j u^j.$
This may look odd at first, but it turns out to be a powerful use of notation that
takes care of itself.
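The summation convention in (117) corresponds directly to a loop over the repeated index. The following is a small Python sketch, where the function name `apply_linear` and the list-of-rows layout are assumptions made only for illustration; the sample matrix is the one used in the example that follows.

```python
def apply_linear(a, u):
    """Compute x^i = a^i_j u^j, summing over the repeated index j.

    a[i][j] stands for a^i_j (row i, column j); u[j] stands for u^j.
    """
    return [sum(a[i][j] * u[j] for j in range(len(u))) for i in range(len(a))]


a = [[1, 3], [2, 2]]
print(apply_linear(a, [1, 0]))  # [1, 2]
print(apply_linear(a, [0, 1]))  # [3, 2]
```

The free index $i$ becomes the position in the output list, while the summed index $j$ never appears in the result.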
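Let’s look at an example. Consider the linear transformation defined by the
matrix $A = \begin{bmatrix} 1 & 3 \\ 2 & 2 \end{bmatrix}$. The vectors (the points) [ 1, 0 ] and [ 0, 1 ] in the $u^1u^2$-plane
map to the following vectors in the $x^1x^2$-plane. We will use $A$ for the name of the
function, as well.
(118) $A(1, 0) = \begin{bmatrix} 1 & 3 \\ 2 & 2 \end{bmatrix} \begin{bmatrix} 1 \\ 0 \end{bmatrix} = \begin{bmatrix} 1 \\ 2 \end{bmatrix}, \qquad A(0, 1) = \begin{bmatrix} 1 & 3 \\ 2 & 2 \end{bmatrix} \begin{bmatrix} 0 \\ 1 \end{bmatrix} = \begin{bmatrix} 3 \\ 2 \end{bmatrix}.$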
Remember that we are building up intuition and notation for more complicated
situations. What we want to do here is to talk about points on the $x^1x^2$-plane
using coordinates from the $u^1u^2$-plane. The game is that we will use the label
[ 1, 0 ] to talk about the vector [ 1, 2 ] in the $x^1x^2$-plane. We can compute the
magnitude of the vector [ 1, 2 ] using the dot product. That is,
(119) $\begin{bmatrix} 1 & 2 \end{bmatrix} \begin{bmatrix} 1 \\ 2 \end{bmatrix} = 1 + 4 = 5,$
so the magnitude of the vector must be $\sqrt{5}$. Under the new rules for the game and
the new geometry for the $u^1u^2$-plane, the magnitude of [ 1, 0 ] is $\sqrt{5}$. Similarly, the
magnitude of the vector [ 0, 1 ] is the same as the magnitude of the vector [ 3, 2 ],
which is $\sqrt{9 + 4} = \sqrt{13}$. Under this new geometry, as we move from the point (0, 0)
to the point (1, 0), we have traveled a distance of $\sqrt{5}$ units, and as we move from
(0, 0) to (0, 1), we have moved a distance of $\sqrt{13}$. Our notion of distance has changed
dramatically, and the way we measure angles can be different, as well. In the old
geometry of the $u^1u^2$-plane, the two vectors [ 1, 0 ] and [ 0, 1 ] are perpendicular,
but in the new geometry, the angle is the same as the angle between [ 1, 2 ] and
[ 3, 2 ] in the $x^1x^2$-plane. We can compute this angle using the dot product, as we
did before. That is,
(120) $\begin{bmatrix} 1 & 2 \end{bmatrix} \begin{bmatrix} 3 \\ 2 \end{bmatrix} = 3 + 4 = 7,$
and so the angle between them $\theta$ must satisfy
(121) $7 = \sqrt{5}\,\sqrt{13}\,\cos\theta.$
In other words,
(122) $\theta = \cos^{-1}\left(\dfrac{7}{\sqrt{5}\,\sqrt{13}}\right).$
In the new geometry for the $u^1u^2$-plane, therefore, this must be the angle between
[ 1, 0 ] and [ 0, 1 ]. All that we have done here depends on taking dot products in
the $x^1x^2$-plane. The rules of the game, therefore, can be condensed into a funny
dot product on the $u^1u^2$-plane. These generalizations of the dot product are called
inner products, and inner products can always be expressed in terms of a matrix
product as follows. Given an inner product, there is a matrix $[ g_{ij} ]$ such that the
inner product of vectors $[ a^1, a^2 ]$ and $[ b^1, b^2 ]$ is
(123) $\langle [ a^1, a^2 ], [ b^1, b^2 ] \rangle = \begin{bmatrix} a^1 & a^2 \end{bmatrix} \begin{bmatrix} g_{11} & g_{12} \\ g_{21} & g_{22} \end{bmatrix} \begin{bmatrix} b^1 \\ b^2 \end{bmatrix}$
$\qquad = a^1b^1g_{11} + a^2b^1g_{21} + a^1b^2g_{12} + a^2b^2g_{22}$
$\qquad = a^ib^jg_{ij}$ (in Einstein notation)
Assuming the existence of such a matrix, it is easy to find the matrix $[ g_{ij} ]$ for this
example. The diagonal entries are the squared magnitudes 5 and 13 found above, and the
off-diagonal entries are the dot product 7 from (120).
(124) $[ g_{ij} ] = \begin{bmatrix} 5 & 7 \\ 7 & 13 \end{bmatrix}$
This inner product, and equivalently the matrix $[ g_{ij} ]$, determines the new geometry
completely. Letting $e_1 = [ 1, 0 ]$ and $e_2 = [ 0, 1 ]$, the $g_{ij}$ are determined by what
the inner products between these vectors should be. That is,
(125) $g_{ij} = \langle e_i, e_j \rangle$
We will define magnitudes and angles in the new geometry using the inner product
in place of the dot product.
(126) $\|x\| = \sqrt{\langle x, x \rangle}$
(127) $\langle x, y \rangle = \|x\|\,\|y\| \cos\theta$
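The construction of the metric for the example $A = \begin{bmatrix} 1 & 3 \\ 2 & 2 \end{bmatrix}$ can be checked numerically. The following Python sketch is one way to do it, with all function names (`metric_from_matrix`, `inner`, `norm`, `angle`) chosen here for illustration. It builds $g_{ij}$ as the dot products of the columns of $A$, so the off-diagonal entry is the dot product of [ 1, 2 ] and [ 3, 2 ], which is $1 \cdot 3 + 2 \cdot 2 = 7$.

```python
import math


def metric_from_matrix(A):
    """g_ij = <A e_i, A e_j>: dot products of the columns of A."""
    rows, cols = len(A), len(A[0])
    return [[sum(A[r][i] * A[r][j] for r in range(rows)) for j in range(cols)]
            for i in range(cols)]


def inner(g, a, b):
    """<a, b> = a^i b^j g_ij, summing over both repeated indices."""
    return sum(a[i] * b[j] * g[i][j]
               for i in range(len(a)) for j in range(len(b)))


def norm(g, a):
    """Magnitude in the new geometry, as in (126)."""
    return math.sqrt(inner(g, a, a))


def angle(g, a, b):
    """Angle in the new geometry, solving (127) for theta."""
    return math.acos(inner(g, a, b) / (norm(g, a) * norm(g, b)))


A = [[1, 3], [2, 2]]
g = metric_from_matrix(A)            # [[5, 7], [7, 13]]
length_e1 = norm(g, [1, 0])          # sqrt(5)
length_e2 = norm(g, [0, 1])          # sqrt(13)
theta = angle(g, [1, 0], [0, 1])     # arccos(7 / (sqrt(5) * sqrt(13)))
```

Once `g` is known, every measurement is made with vectors in the $u^1u^2$-plane alone; the matrix $A$ is no longer needed.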

1.1. Exercises.

21. Let $A$ be the linear transformation from the $u^1u^2$-plane to the $x^1x^2$-plane
determined by the matrix $\begin{bmatrix} 2 & -3 \\ 1 & 4 \end{bmatrix}$. Find the inner product matrix $[ g_{ij} ]$ for the
new geometry. Under the new geometry, determine the distance traveled along the
path from (0, 0) to (1, 0). Do the same along the path from (0, 0) to (1, 1) to (0, 3). Find the
angle between the vectors [ 2, 3 ] and [ −3, 1 ].
22. Write the product of the square matrices $[ a^i_j ]$ and $[ b^j_k ]$ in Einstein
notation (use $c^i_k$ for the product).

2. The Riemannian Metric for Curved Surfaces

For a function of one variable, the first derivative can be interpreted as describ-
ing an approximating tangent line, and the second derivative an approximating
parabola. We can make similar comparisons with surfaces.
Suppose we have a surface parametrized by the following vector function.
(128) $x(u^1, u^2) = \left( x^1(u^1, u^2),\ x^2(u^1, u^2),\ x^3(u^1, u^2) \right)$

Suppose also that we are interested in investigating the curvature of the surface
at any particular point on the surface. We will interpret the derivatives of $x$ as
describing linear approximations to the surface. In other words, the first partial
derivatives define a tangent plane, and they also determine a linear transformation
from the $u^1u^2$-plane to that tangent plane. Let us take a look at these first partials
and discuss their meaning.
(129) $\dfrac{dx}{du^1} = \left( \dfrac{dx^1}{du^1},\ \dfrac{dx^2}{du^1},\ \dfrac{dx^3}{du^1} \right)$
(130) $\dfrac{dx}{du^2} = \left( \dfrac{dx^1}{du^2},\ \dfrac{dx^2}{du^2},\ \dfrac{dx^3}{du^2} \right)$

These two vectors are tangent to the surface at their respective points. They
can be used directly to describe a plane tangent to the surface, and the chain
rule provides justification for doing this. Suppose we have a line in the $u^1u^2$-plane
parametrized by $\gamma(t) = ( a^1t, a^2t )$ (in other words, $u = a^1t$ and $v = a^2t$).
The corresponding curve on the surface can be parametrized by
$x(\gamma(t)) = \left( x^1(a^1t, a^2t),\ x^2(a^1t, a^2t),\ x^3(a^1t, a^2t) \right)$. We can differentiate $x$ with respect to the
new parameter $t$, and the chain rule illustrates the linear character of the derivative.
(131) $\dfrac{dx}{dt} = \left( \dfrac{dx^1}{du^1}\dfrac{du^1}{dt} + \dfrac{dx^1}{du^2}\dfrac{du^2}{dt},\ \dfrac{dx^2}{du^1}\dfrac{du^1}{dt} + \dfrac{dx^2}{du^2}\dfrac{du^2}{dt},\ \dfrac{dx^3}{du^1}\dfrac{du^1}{dt} + \dfrac{dx^3}{du^2}\dfrac{du^2}{dt} \right)$
(132) $\quad = \left( \dfrac{dx^1}{du^1}a^1 + \dfrac{dx^1}{du^2}a^2,\ \dfrac{dx^2}{du^1}a^1 + \dfrac{dx^2}{du^2}a^2,\ \dfrac{dx^3}{du^1}a^1 + \dfrac{dx^3}{du^2}a^2 \right)$
(133) $\quad = a^1 \left( \dfrac{dx^1}{du^1},\ \dfrac{dx^2}{du^1},\ \dfrac{dx^3}{du^1} \right) + a^2 \left( \dfrac{dx^1}{du^2},\ \dfrac{dx^2}{du^2},\ \dfrac{dx^3}{du^2} \right)$
(134) $\quad = a^1 \dfrac{dx}{du^1} + a^2 \dfrac{dx}{du^2}$
This could be interpreted as follows. If a point is moving along a line according
to the parametrization $\gamma$, it would pass through any particular point with velocity
$[ a^1, a^2 ]$. The image of this point on the surface will have velocity $a^1 \frac{dx}{du^1} + a^2 \frac{dx}{du^2}$.
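The chain-rule statement can be checked numerically on a concrete surface. The sketch below assumes a hypothetical sample surface, the paraboloid $x(u^1, u^2) = (u^1, u^2, (u^1)^2 + (u^2)^2)$, which does not come from the text; the function names are likewise illustrative. It compares the velocity $a^1 \frac{dx}{du^1} + a^2 \frac{dx}{du^2}$ with a central-difference estimate of $\frac{d}{dt}x(\gamma(t))$.

```python
def x(u1, u2):
    """Hypothetical sample surface: x^3 = (u^1)^2 + (u^2)^2."""
    return (u1, u2, u1 * u1 + u2 * u2)


def x_u1(u1, u2):
    """First partial of x with respect to u^1."""
    return (1.0, 0.0, 2.0 * u1)


def x_u2(u1, u2):
    """First partial of x with respect to u^2."""
    return (0.0, 1.0, 2.0 * u2)


def velocity_by_chain_rule(a1, a2, t):
    """a^1 dx/du^1 + a^2 dx/du^2, evaluated at gamma(t) = (a1*t, a2*t)."""
    p, q = x_u1(a1 * t, a2 * t), x_u2(a1 * t, a2 * t)
    return tuple(a1 * pi + a2 * qi for pi, qi in zip(p, q))


def velocity_by_difference(a1, a2, t, h=1e-6):
    """Central-difference estimate of d/dt x(gamma(t))."""
    fwd = x(a1 * (t + h), a2 * (t + h))
    back = x(a1 * (t - h), a2 * (t - h))
    return tuple((f - b) / (2.0 * h) for f, b in zip(fwd, back))


v_exact = velocity_by_chain_rule(2.0, -1.0, 0.5)    # (2.0, -1.0, 5.0)
v_approx = velocity_by_difference(2.0, -1.0, 0.5)
```

The two results should agree up to rounding, which illustrates that the velocity on the surface depends linearly on $[ a^1, a^2 ]$.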

Velocity vectors in the $u^1u^2$-plane at a particular point and velocity vectors at
the corresponding image point on the surface are related
by a linear transformation. This function is known as the differential and can be
defined by
(135) $dx = \dfrac{dx}{du^1}\,du^1 + \dfrac{dx}{du^2}\,du^2,$
where a velocity vector $[ du^1, du^2 ]$ at a point in the $u^1u^2$-plane corresponds to a
velocity vector $dx$ at the corresponding image point on the surface. This function
is also described by a matrix called the Jacobian,
(136) $J(x) = \begin{bmatrix} \dfrac{dx^1}{du^1} & \dfrac{dx^1}{du^2} \\ \dfrac{dx^2}{du^1} & \dfrac{dx^2}{du^2} \\ \dfrac{dx^3}{du^1} & \dfrac{dx^3}{du^2} \end{bmatrix}.$
The differential then becomes
(137) $dx(du^1, du^2) = J(x) \begin{bmatrix} du^1 \\ du^2 \end{bmatrix}$
(138) $\quad = \begin{bmatrix} \dfrac{dx^1}{du^1} & \dfrac{dx^1}{du^2} \\ \dfrac{dx^2}{du^1} & \dfrac{dx^2}{du^2} \\ \dfrac{dx^3}{du^1} & \dfrac{dx^3}{du^2} \end{bmatrix} \begin{bmatrix} du^1 \\ du^2 \end{bmatrix}$

Once we have this linear function, it is easy to compare areas in the $u^1u^2$-plane with
areas on the tangent plane. The unit square determined by the vectors (1, 0) and
(0, 1) gets mapped to a parallelogram determined by the vectors $\frac{dx}{du^1}$ and $\frac{dx}{du^2}$. The
area of this parallelogram, if you remember from calculus, is equal to the magnitude
of the cross product of these two vectors. We’ll come back to this later.
We have already talked a bit about a point in the u1 u2 -plane and the cor-
responding point on the surface. The parametrization ties points together, and
we can identify the pairs of points and speak of them almost as if they were one.
Expressed another way, we are again playing the game of using labels from the
u1 u2 -plane to describe points on the surface. This is an important concept when

dealing with manifolds, since there may be no way, or at least no convenient way,
of describing individual points on a manifold. What we will do is to talk about
points in the u1 u2 -plane and endow them with properties from the manifold, or in
this case, the surface. For example, let us say that we move along the segment from
(0, 0) to (1, 0) in the u1 u2 -plane. We have traveled a distance of 1. The image of
this segment on the surface might be a curve with length 5. If we change the way
that we measure distances in the $u^1u^2$-plane, as we did in the case when the surface
was simply another plane, then in some new geometry, the length of the segment
just mentioned would be 5, and then doing geometry in the $u^1u^2$-plane becomes
more like doing geometry on the surface. The goal here is to come up with a funny
way of measuring things like distances and angles so that doing geometry with these
new measurement schemes is equivalent to doing geometry on the surface. This is
a rough description of what differential geometry is about.
The differential tells us how velocity vectors in the $u^1u^2$-plane correspond to
velocity vectors on the surface via a linear transformation that changes from point
to point. These transformations contain the necessary information to find a relationship between
distances and angles. Note that both of these quantities for vectors are obtainable
from the dot product, so if we know how the dot products compare, then we should
be able to get what we need for distances and angles. Suppose the Jacobian at
(0, 0) is
(139) $J(x)(0, 0) = \begin{bmatrix} c^1_1 & c^1_2 \\ c^2_1 & c^2_2 \\ c^3_1 & c^3_2 \end{bmatrix}$
Consider two velocity vectors in the $u^1u^2$-plane, $[ a^1, a^2 ]$ and $[ b^1, b^2 ]$. Their
images are
(140) $\begin{bmatrix} c^1_1 & c^1_2 \\ c^2_1 & c^2_2 \\ c^3_1 & c^3_2 \end{bmatrix} \begin{bmatrix} a^1 \\ a^2 \end{bmatrix} = \begin{bmatrix} c^1_1a^1 + c^1_2a^2 \\ c^2_1a^1 + c^2_2a^2 \\ c^3_1a^1 + c^3_2a^2 \end{bmatrix},$
and similarly
(141) $\begin{bmatrix} c^1_1 & c^1_2 \\ c^2_1 & c^2_2 \\ c^3_1 & c^3_2 \end{bmatrix} \begin{bmatrix} b^1 \\ b^2 \end{bmatrix} = \begin{bmatrix} c^1_1b^1 + c^1_2b^2 \\ c^2_1b^1 + c^2_2b^2 \\ c^3_1b^1 + c^3_2b^2 \end{bmatrix}.$
The dot product of these two vectors is
(142) $\begin{bmatrix} c^1_1a^1 + c^1_2a^2 & c^2_1a^1 + c^2_2a^2 & c^3_1a^1 + c^3_2a^2 \end{bmatrix} \begin{bmatrix} c^1_1b^1 + c^1_2b^2 \\ c^2_1b^1 + c^2_2b^2 \\ c^3_1b^1 + c^3_2b^2 \end{bmatrix}$
$\quad = (c^1_1a^1 + c^1_2a^2)(c^1_1b^1 + c^1_2b^2) + (c^2_1a^1 + c^2_2a^2)(c^2_1b^1 + c^2_2b^2)$
$\qquad + (c^3_1a^1 + c^3_2a^2)(c^3_1b^1 + c^3_2b^2)$

Of particular interest is the way the distributive property translates to a property
called bilinearity for the dot product. Note how the $a^i$ factor out, and then the $b^i$.
The $g_{ij}$ represent the quantities in parentheses.
(143) $\quad = a^1 [ c^1_1(c^1_1b^1 + c^1_2b^2) + c^2_1(c^2_1b^1 + c^2_2b^2) + c^3_1(c^3_1b^1 + c^3_2b^2) ]$
$\qquad + a^2 [ c^1_2(c^1_1b^1 + c^1_2b^2) + c^2_2(c^2_1b^1 + c^2_2b^2) + c^3_2(c^3_1b^1 + c^3_2b^2) ]$
$\quad = a^1b^1 (c^1_1c^1_1 + c^2_1c^2_1 + c^3_1c^3_1) + a^1b^2 (c^1_1c^1_2 + c^2_1c^2_2 + c^3_1c^3_2)$
$\qquad + a^2b^1 (c^1_2c^1_1 + c^2_2c^2_1 + c^3_2c^3_1) + a^2b^2 (c^1_2c^1_2 + c^2_2c^2_2 + c^3_2c^3_2)$
$\quad = a^1b^1g_{11} + a^1b^2g_{12} + a^2b^1g_{21} + a^2b^2g_{22}.$

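We could have established this with less clutter by assuming the dot product
follows a distributive law, which it does. The vectors $[ a^1, a^2 ]$ and $[ b^1, b^2 ]$ map
to the vectors on the surface $a^1 \frac{dx}{du^1} + a^2 \frac{dx}{du^2}$ and $b^1 \frac{dx}{du^1} + b^2 \frac{dx}{du^2}$. Therefore,
(144) $\left( a^1 \dfrac{dx}{du^1} + a^2 \dfrac{dx}{du^2} \right) \cdot \left( b^1 \dfrac{dx}{du^1} + b^2 \dfrac{dx}{du^2} \right)$
$\quad = a^1b^1 \dfrac{dx}{du^1} \cdot \dfrac{dx}{du^1} + a^1b^2 \dfrac{dx}{du^1} \cdot \dfrac{dx}{du^2} + a^2b^1 \dfrac{dx}{du^2} \cdot \dfrac{dx}{du^1} + a^2b^2 \dfrac{dx}{du^2} \cdot \dfrac{dx}{du^2}$
$\quad = a^1b^1g_{11} + a^1b^2g_{12} + a^2b^1g_{21} + a^2b^2g_{22}.$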

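Note that the $g_{ij}$ represent the same quantities in both derivations. The dot product
is a special case of a vector space product called an inner product. Inner products all share the
basic property of bilinearity. Bilinearity is described as
(145) $\langle ax + by, z \rangle = a \langle x, z \rangle + b \langle y, z \rangle$
(146) $\langle x, by + cz \rangle = b \langle x, y \rangle + c \langle x, z \rangle.$
This new inner product on the vectors $[ a^1, a^2 ]$ and $[ b^1, b^2 ]$ is defined by
(147) $\langle [ a^1, a^2 ], [ b^1, b^2 ] \rangle = \begin{bmatrix} a^1 & a^2 \end{bmatrix} \begin{bmatrix} g_{11} & g_{12} \\ g_{21} & g_{22} \end{bmatrix} \begin{bmatrix} b^1 \\ b^2 \end{bmatrix}$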

The matrix [ gij ], or the bilinear function (i.e., the inner product) defined by it, is
called the first fundamental form. The entries of the matrix, the gij , generally vary
from point to point, and we usually want to consider surfaces and parametrizations
where these vary smoothly.
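As an illustration of how the entries of the first fundamental form vary from point to point, the following Python sketch computes $g_{ij} = \frac{dx}{du^i} \cdot \frac{dx}{du^j}$ for a hypothetical sample surface, the paraboloid $x(u^1, u^2) = (u^1, u^2, (u^1)^2 + (u^2)^2)$. The surface and all function names are assumptions made for the example. It also compares the area of the parallelogram spanned by the two tangent vectors, computed with the cross product, against $\sqrt{\det [ g_{ij} ]}$.

```python
import math


def partials(u1, u2):
    """First partials of the sample surface x = (u1, u2, u1^2 + u2^2)."""
    return (1.0, 0.0, 2.0 * u1), (0.0, 1.0, 2.0 * u2)


def dot(v, w):
    return sum(vi * wi for vi, wi in zip(v, w))


def cross(v, w):
    return (v[1] * w[2] - v[2] * w[1],
            v[2] * w[0] - v[0] * w[2],
            v[0] * w[1] - v[1] * w[0])


def first_fundamental_form(u1, u2):
    """g_ij = (dx/du^i) . (dx/du^j) at the point (u1, u2)."""
    t = partials(u1, u2)
    return [[dot(t[i], t[j]) for j in range(2)] for i in range(2)]


g = first_fundamental_form(1.0, 1.0)          # [[5.0, 4.0], [4.0, 5.0]]
x1, x2 = partials(1.0, 1.0)
n = cross(x1, x2)
area = math.sqrt(dot(n, n))                   # parallelogram area: 3.0
det_g = g[0][0] * g[1][1] - g[0][1] * g[1][0]  # 9.0
```

At the origin the same function returns the identity matrix, so this parametrization looks Euclidean there to first order, while at (1, 1) both lengths and angles are distorted.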
If we were to consider a vector at a certain point, say [ 1, 0 ], we can compute
its inner product with itself using the first fundamental form.
(148) $\langle [ 1, 0 ], [ 1, 0 ] \rangle = \begin{bmatrix} 1 & 0 \end{bmatrix} \begin{bmatrix} g_{11} & g_{12} \\ g_{21} & g_{22} \end{bmatrix} \begin{bmatrix} 1 \\ 0 \end{bmatrix} = g_{11}$
The quantity $g_{11}$ came from the inner product of the vector on the surface corresponding
to [ 1, 0 ], so this should not be surprising. Perhaps more importantly,
note that once we have the matrix $[ g_{ij} ]$, we can work exclusively with vectors in
the $u^1u^2$-plane, and while this vector is a unit vector under the dot product, with
respect to this new inner product, it has magnitude $\sqrt{\langle [ 1, 0 ], [ 1, 0 ] \rangle} = \sqrt{g_{11}}$.
The matrix $[ g_{ij} ]$ is also called the Riemannian metric or simply the metric.

3. Curvature
The curvature of the surface at a point depends on how the unit normal vector
is rotating as it moves past the point. This depends, of course, on how the normal
is moving past the point, but this relationship is based on the derivative, and so,
it is linear. We need, therefore, to find the derivative of the unit normal in two
directions, and this is most convenient in the directions corresponding to $u^1$ and
$u^2$. In other words, the curvature of the surface is completely described by $\frac{dn}{du^1}$ and
$\frac{dn}{du^2}$. In practice, these can be complicated derivatives to compute. For that reason,
and also to understand them better, we will look at their relationship with other
quantities.
The first derivatives of $x$ determine $n$, so the second derivatives of $x$ should
also determine the derivatives of $n$. These relationships are linear, so all should
be expressible in terms of matrix multiplications. Note that the product rule and
chain rule generalize to inner products and cross products in a natural way, and we
will use these as we need them.
The unit normal vector can be written in terms of the first derivatives, $\frac{dx}{du^1}$ and
$\frac{dx}{du^2}$, as follows, since the cross product is perpendicular to the two factors.
(149) $n = \dfrac{\dfrac{dx}{du^1} \times \dfrac{dx}{du^2}}{\left\| \dfrac{dx}{du^1} \times \dfrac{dx}{du^2} \right\|}$
Differentiating this expression directly is not immediately illuminating, so we will
approach this from another direction. Each of the second partial derivatives $\frac{d^2x}{du^j\,du^i}$
measures the change in the first derivative $\frac{dx}{du^i}$ at each point of the surface. Some of
this change occurs in the form of a change in magnitude, and some of this change is
a result of the vector rotating. Furthermore, some change occurs along the tangent
plane, and the rest in the direction of the normal vector. In any case, the two
tangent vectors and the unit normal at each point form a basis for $\mathbb{R}^3$, so each
second derivative can be expressed as a linear combination of these three vectors.
(150) $\dfrac{d^2x}{du^j\,du^i} = L_{ij}\,n + \Gamma^1_{ij} \dfrac{dx}{du^1} + \Gamma^2_{ij} \dfrac{dx}{du^2}$
$\qquad = L_{ij}\,n + \Gamma^k_{ij} \dfrac{dx}{du^k}$ (in Einstein notation)
The four numbers $L_{ij}$ measure how quickly the first derivatives turn away from
the surface. These together, therefore, can conceivably contain all of the surface’s
curvature information. We are assuming that this curvature information comes from
the linear approximation of the rotation of the unit normal vector at each point.
We should look for the relationship between the $L_{ij}$ and the derivatives of the unit
normal vector. Now, the unit normal has constant magnitude, so its derivatives
are perpendicular to $n$. In other words, the derivatives of $n$ must be parallel to
the tangent plane. They can be expressed, therefore, as a linear combination of the
first derivatives.
(151) $\dfrac{dn}{du^i} = -L^1_i \dfrac{dx}{du^1} - L^2_i \dfrac{dx}{du^2} = -L^j_i \dfrac{dx}{du^j}$
The four numbers $-L^j_i$ are different from the $L_{ij}$, and in general, the position of the
indices should be understood to indicate distinct variable names. The two L’s are
closely related, and differences in index placement will usually imply a particular
relationship. Furthermore, the negative signs on the $-L^j_i$ are customary and are
used to simplify the relationship between the L’s. We have actually seen the $L^j_i$
before. The determinant of the matrix $[ L^j_i ]$ is the Gauss curvature.
To establish a formula tying the L’s together, we will differentiate the inner
product $\left\langle \frac{dx}{du^i}, n \right\rangle$, which is zero, since the two vectors are perpendicular.
(152) $0 = \dfrac{d}{du^k} \left\langle \dfrac{dx}{du^i}, n \right\rangle = \left\langle \dfrac{dx}{du^i}, \dfrac{dn}{du^k} \right\rangle + \left\langle \dfrac{d^2x}{du^k\,du^i}, n \right\rangle$
$\quad = \left\langle \dfrac{dx}{du^i}, -L^j_k \dfrac{dx}{du^j} \right\rangle + L_{ik}$
$\quad = -L^j_k \left\langle \dfrac{dx}{du^i}, \dfrac{dx}{du^j} \right\rangle + L_{ik}$
$\quad = -L^j_k\,g_{ij} + L_{ik}$
We have, therefore, that
(153) $L_{ik} = L^j_k\,g_{ij}$ (in Einstein notation).
This is a typical arrangement. We may speak of lowering an index, which means
multiplying by the matrix associated with the metric. Note that the ordering of the
indices is not critical, since the matrices we will be dealing with are for the most
part symmetric. The matrix $[ g_{ij} ]$ has a matrix inverse, which we will denote by
(154) $[ g_{ij} ]^{-1} = [ g^{jk} ].$
Again, the $g$ with two superscripts is distinct from the $g$ with two subscripts. By
definition, one is the metric, the other is the metric’s inverse. In particular, let
(155) $\delta^i_j = \begin{cases} 1 & \text{if } i = j, \\ 0 & \text{if } i \neq j. \end{cases}$
Essentially, $[ \delta^i_j ]$ is the identity matrix. We can express the fact that $[ g_{ij} ]$ and
$[ g^{jk} ]$ are inverse matrices in Einstein notation by
(156) $g_{ij}\,g^{jk} = \delta^k_i.$
Equation (153) can be reversed using Einstein notation as
(157) $L_{ik} = L^j_k\,g_{ij}$
(158) $L_{ik}\,g^{il} = L^j_k\,g_{ij}\,g^{il}$
(159) $L_{ik}\,g^{il} = L^j_k\,\delta^l_j = L^l_k.$
The matrix $[ L^i_j ]$ defines a linear transformation relating the rate at which the unit
normal vector rotates with a corresponding velocity vector on the surface at a
particular point on the surface. This linear transformation is called the Weingarten
map. At each point of the surface the Weingarten map is a linear approximation
to the Gauss map. It is, therefore, a derivative of the Gauss map, at least in some
sense. The ratio of areas under the Gauss map with areas on the surface is the
Gauss curvature, so the Weingarten map is intimately related to the curvature of
the surface. Since the relationship is linear, the particular region we choose to use
to compute the two areas is not important. The easiest choice is the square
determined by the unit vectors [ 1, 0 ] and [ 0, 1 ] in the $u^1u^2$-plane. The image of
this square under $dx$ is the parallelogram determined by the two tangent vectors
$\frac{dx}{du^1}$ and $\frac{dx}{du^2}$. The area of this parallelogram can be found using the cross product
(160) Area on surface $= \left\| \dfrac{dx}{du^1} \times \dfrac{dx}{du^2} \right\|.$
The corresponding vectors under the Weingarten map are $\frac{dn}{du^1} = -L^1_1 \frac{dx}{du^1} - L^2_1 \frac{dx}{du^2}$
and $\frac{dn}{du^2} = -L^1_2 \frac{dx}{du^1} - L^2_2 \frac{dx}{du^2}$. The area of the parallelogram determined by these
two vectors can also be found using the cross product. Using the fact that the cross
product is bilinear and anti-symmetric, we see that
$\dfrac{dn}{du^1} \times \dfrac{dn}{du^2}$
$\quad = \left( -L^1_1 \dfrac{dx}{du^1} - L^2_1 \dfrac{dx}{du^2} \right) \times \left( -L^1_2 \dfrac{dx}{du^1} - L^2_2 \dfrac{dx}{du^2} \right)$
$\quad = L^1_1L^1_2 \dfrac{dx}{du^1} \times \dfrac{dx}{du^1} + L^1_1L^2_2 \dfrac{dx}{du^1} \times \dfrac{dx}{du^2} + L^2_1L^1_2 \dfrac{dx}{du^2} \times \dfrac{dx}{du^1} + L^2_1L^2_2 \dfrac{dx}{du^2} \times \dfrac{dx}{du^2}$
$\quad = 0 + L^1_1L^2_2 \dfrac{dx}{du^1} \times \dfrac{dx}{du^2} - L^2_1L^1_2 \dfrac{dx}{du^1} \times \dfrac{dx}{du^2} + 0$
$\quad = ( L^1_1L^2_2 - L^2_1L^1_2 ) \dfrac{dx}{du^1} \times \dfrac{dx}{du^2}$
The ratio of the areas under the Gauss map and areas on the surface, therefore, is
given by the determinant of the matrix $[ L^i_j ]$.
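The index-raising step $L^l_k = L_{ik}\,g^{il}$ and the determinant can be carried out numerically. The Python sketch below is only an illustration: the function names are hypothetical, and the sample values of $g_{ij}$ and $L_{ij}$ are the standard ones for a sphere of radius $r$ in colatitude and longitude coordinates with the outward normal, which is an assumption brought in from outside the text. For that sphere the determinant should come out to $1/r^2$.

```python
import math


def inverse_2x2(m):
    """Inverse of a 2x2 matrix, used for [g^{jk}] = [g_ij]^{-1}."""
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    return [[m[1][1] / det, -m[0][1] / det],
            [-m[1][0] / det, m[0][0] / det]]


def weingarten(g, L):
    """Raise an index: W[l][k] = L^l_k = L_ik g^{il}, summed over i."""
    g_inv = inverse_2x2(g)
    return [[sum(L[i][k] * g_inv[i][l] for i in range(2)) for k in range(2)]
            for l in range(2)]


def gauss_curvature(g, L):
    """Determinant of the matrix [L^l_k]."""
    W = weingarten(g, L)
    return W[0][0] * W[1][1] - W[0][1] * W[1][0]


# Assumed sample data: sphere of radius r at colatitude phi.
r, phi = 2.0, 0.7
s2 = math.sin(phi) ** 2
g = [[r * r, 0.0], [0.0, r * r * s2]]
L = [[-r, 0.0], [0.0, -r * s2]]
K = gauss_curvature(g, L)   # approximately 1 / r**2 = 0.25
```

Reversing the orientation of the normal changes the sign of every $L_{ij}$ but leaves the determinant, and hence the curvature, unchanged.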

4. The Inverse of the Metric

It is more difficult to compute distances along a surface, since the metric
changes from point to point. At a particular point, however, the vector [ 1, 0 ]
say, has an interpretation as a velocity vector with magnitude, or speed, $\sqrt{g_{11}}$. A
small movement in this direction from this point, say $[ \Delta u^1, 0 ]$, corresponds to a
distance $\sqrt{g_{11}}\,\Delta u^1$. This would be a good approximation for a distance along the
surface in this direction for small values of $\Delta u^1$. From this point, it should be
conceivable that we could compute distances using integration, but that is not the
concern here. We will consider differentiation first.
Remember that we obtained the metric from the first partial derivatives of
the parametrization, and for unit vectors $u$, $\sqrt{\langle u, u \rangle}$ is the magnitude of the
corresponding tangent vector on the surface. It is, in some sense, a directional

Riemannian Curvature Tensor

1. Intrinsic Interpretations
We have discussed the second derivatives of the vector function $x$ in terms of
the following.
(162) $\dfrac{d^2x}{du^j\,du^i} = L_{ij}\,n + \Gamma^k_{ij} \dfrac{dx}{du^k}$
These are called Gauss’ formulas. The $L_{ij}$ are called the coefficients of the second
fundamental form, and the $\Gamma^k_{ij}$ are called the Christoffel symbols (of the second
kind). We have talked about the $L_{ij}$ a bit, and right now, we will focus on the $\Gamma^k_{ij}$.
We saw earlier that if we followed a tangent vector around a closed path in a
plane, then the net rotation of the tangent vector would be 2π (radians). Any deviation
from 2π gives us direct information about total curvature contained within the
closed curve. Let us try to follow a tangent vector around the path corresponding
to the unit square in the $u^1u^2$-plane ($[0, 1] \times [0, 1]$) using information only available
at the point (via derivatives). That is, we will use the $\Gamma^k_{ij}$ and perhaps derivatives
of these at the point. Imposing the normal looking graph paper of the $u^1u^2$-plane
on the surface, we will be running around one of the “squares,” which we’ll call
s-squares, to have a name.
Running from (0, 0) to (1, 0), the tangent vector $\frac{dx}{du^1}$ gives us a velocity on the
surface. We will assume that this vector is tangent to the first side of the s-square.
The vector $\frac{dx}{du^1}$ will move a distance (approximately) to the next vertex. It
will turn towards $\frac{dx}{du^2}$ according to $\Gamma^2_{11}$. It changes magnitude according to $\Gamma^1_{11}$. In
particular, traversing the first side affects the tangent vector in the following way
(163) $\dfrac{dx}{du^1} \to ( 1 + \Gamma^1_{11} ) \dfrac{dx}{du^1} + \Gamma^2_{11} \dfrac{dx}{du^2}.$

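Now we want to push this vector along the path corresponding to the segment
from (1, 0) to (1, 1). We’re pushing in the direction of $u^2$, so we are interested in
$\Gamma^1_{12}$ and $\Gamma^2_{12}$, but these have changed slightly, since we’re starting at a different
point. This change can be approximated using $\frac{d\Gamma^1_{12}}{du^1}$ and $\frac{d\Gamma^2_{12}}{du^1}$. We have moved a
distance corresponding to a change in $u^1$ of one unit, so these are the differences
we need. We have the effect of traversing the next side as
(164) $\left[ 1 + \Gamma^1_{11} + ( 1 + \Gamma^1_{11} ) \left( \Gamma^1_{12} + \dfrac{d\Gamma^1_{12}}{du^1} \right) \right] \dfrac{dx}{du^1} + \left[ \Gamma^2_{11} + ( 1 + \Gamma^1_{11} ) \left( \Gamma^2_{12} + \dfrac{d\Gamma^2_{12}}{du^1} \right) \right] \dfrac{dx}{du^2}$
$\quad = \left[ 1 + \Gamma^1_{11} + \Gamma^1_{12} + \dfrac{d\Gamma^1_{12}}{du^1} + \Gamma^1_{11}\Gamma^1_{12} + \Gamma^1_{11} \dfrac{d\Gamma^1_{12}}{du^1} \right] \dfrac{dx}{du^1}$
$\qquad + \left[ \Gamma^2_{11} + \Gamma^2_{12} + \dfrac{d\Gamma^2_{12}}{du^1} + \Gamma^1_{11}\Gamma^2_{12} + \Gamma^1_{11} \dfrac{d\Gamma^2_{12}}{du^1} \right] \dfrac{dx}{du^2}$
Pushing $\frac{dx}{du^1}$ the other way around the square, from (0, 0) to (0, 1) to (1, 1), yields
(165) $\left[ 1 + \Gamma^1_{12} + ( 1 + \Gamma^1_{12} ) \left( \Gamma^1_{11} + \dfrac{d\Gamma^1_{11}}{du^2} \right) \right] \dfrac{dx}{du^1} + \left[ \Gamma^2_{12} + ( 1 + \Gamma^1_{12} ) \left( \Gamma^2_{11} + \dfrac{d\Gamma^2_{11}}{du^2} \right) \right] \dfrac{dx}{du^2}$
$\quad = \left[ 1 + \Gamma^1_{12} + \Gamma^1_{11} + \dfrac{d\Gamma^1_{11}}{du^2} + \Gamma^1_{12}\Gamma^1_{11} + \Gamma^1_{12} \dfrac{d\Gamma^1_{11}}{du^2} \right] \dfrac{dx}{du^1}$
$\qquad + \left[ \Gamma^2_{12} + \Gamma^2_{11} + \dfrac{d\Gamma^2_{11}}{du^2} + \Gamma^1_{12}\Gamma^2_{11} + \Gamma^1_{12} \dfrac{d\Gamma^2_{11}}{du^2} \right] \dfrac{dx}{du^2}$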

This is all wrong. Start over again.

We want to push the vector $x_1$ around the square two ways: in the $u^1$ direction
and then the $u^2$ direction, and also in the $u^2$ direction and then the $u^1$ direction.
Let’s call these $x_1(12)$ and $x_1(21)$. We’ll also say $x_1(1)$ is the intermediate vector
after pushing $x_1$ in only the $u^1$ direction.
Based on the information available at the original point, we can make the
following “best guess.” We start with
(166) $x_1(1) = x_1 + \Gamma^1_{11}x_1 + \Gamma^2_{11}x_2$
(167) $x_2(1) = x_2 + \Gamma^1_{21}x_1 + \Gamma^2_{21}x_2$
As we push $x_1(1)$ to $x_1(12)$, we can use the best information we have about the
current states of the various quantities. We have estimates of $x_1(1)$ and $x_2(1)$, and
we can also estimate the new $\Gamma^k_{12}$ with
(168) $\Gamma^1_{12}(1) = \Gamma^1_{12} + \dfrac{d}{du^1}\Gamma^1_{12}$
(169) $\Gamma^2_{12}(1) = \Gamma^2_{12} + \dfrac{d}{du^1}\Gamma^2_{12}.$
Therefore, we can make the estimate
(170) $x_1(12) = x_1(1) + \Gamma^1_{12}(1)\,x_1(1) + \Gamma^2_{12}(1)\,x_2(1),$


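and substitution yields
(171) $x_1(12) = x_1 + \Gamma^1_{11}x_1 + \Gamma^2_{11}x_2$
$\qquad + \left( \Gamma^1_{12} + \dfrac{d}{du^1}\Gamma^1_{12} \right) \left( x_1 + \Gamma^1_{11}x_1 + \Gamma^2_{11}x_2 \right)$
$\qquad + \left( \Gamma^2_{12} + \dfrac{d}{du^1}\Gamma^2_{12} \right) \left( x_2 + \Gamma^1_{21}x_1 + \Gamma^2_{21}x_2 \right)$
$\quad = \left( 1 + \Gamma^1_{11} + \Gamma^1_{12} + \dfrac{d\Gamma^1_{12}}{du^1} + \Gamma^1_{12}\Gamma^1_{11} + \dfrac{d\Gamma^1_{12}}{du^1}\Gamma^1_{11} + \Gamma^2_{12}\Gamma^1_{21} + \dfrac{d\Gamma^2_{12}}{du^1}\Gamma^1_{21} \right) x_1$
$\qquad + \left( \Gamma^2_{11} + \Gamma^1_{12}\Gamma^2_{11} + \dfrac{d\Gamma^1_{12}}{du^1}\Gamma^2_{11} + \Gamma^2_{12} + \dfrac{d\Gamma^2_{12}}{du^1} + \Gamma^2_{12}\Gamma^2_{21} + \dfrac{d\Gamma^2_{12}}{du^1}\Gamma^2_{21} \right) x_2$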

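For $x_1(21)$, we can make similar approximations.
(172) $x_1(21) = x_1(2) + \Gamma^1_{11}(2)\,x_1(2) + \Gamma^2_{11}(2)\,x_2(2).$
The intermediate values are
(173) $x_1(2) = x_1 + \Gamma^1_{12}x_1 + \Gamma^2_{12}x_2$
(174) $x_2(2) = x_2 + \Gamma^1_{22}x_1 + \Gamma^2_{22}x_2$
(175) $\Gamma^1_{11}(2) = \Gamma^1_{11} + \dfrac{d}{du^2}\Gamma^1_{11}$
(176) $\Gamma^2_{11}(2) = \Gamma^2_{11} + \dfrac{d}{du^2}\Gamma^2_{11}.$
Substitution yields
(177) $x_1(21) = x_1 + \Gamma^1_{12}x_1 + \Gamma^2_{12}x_2$
$\qquad + \left( \Gamma^1_{11} + \dfrac{d}{du^2}\Gamma^1_{11} \right) \left( x_1 + \Gamma^1_{12}x_1 + \Gamma^2_{12}x_2 \right)$
$\qquad + \left( \Gamma^2_{11} + \dfrac{d}{du^2}\Gamma^2_{11} \right) \left( x_2 + \Gamma^1_{22}x_1 + \Gamma^2_{22}x_2 \right)$
$\quad = \left( 1 + \Gamma^1_{12} + \Gamma^1_{11} + \dfrac{d\Gamma^1_{11}}{du^2} + \Gamma^1_{11}\Gamma^1_{12} + \dfrac{d\Gamma^1_{11}}{du^2}\Gamma^1_{12} + \Gamma^2_{11}\Gamma^1_{22} + \dfrac{d\Gamma^2_{11}}{du^2}\Gamma^1_{22} \right) x_1$
$\qquad + \left( \Gamma^2_{12} + \Gamma^1_{11}\Gamma^2_{12} + \dfrac{d\Gamma^1_{11}}{du^2}\Gamma^2_{12} + \Gamma^2_{11} + \dfrac{d\Gamma^2_{11}}{du^2} + \Gamma^2_{11}\Gamma^2_{22} + \dfrac{d\Gamma^2_{11}}{du^2}\Gamma^2_{22} \right) x_2.$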

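Some measure of the curvature is given by how different $x_1(12)$ is from $x_1(21)$. If
we subtract, we see that
(178) $x_1(12) - x_1(21)$
$\quad = \Big( \dfrac{d\Gamma^1_{12}}{du^1} - \dfrac{d\Gamma^1_{11}}{du^2} + \Gamma^1_{12}\Gamma^1_{11} - \Gamma^1_{11}\Gamma^1_{12} + \dfrac{d\Gamma^1_{12}}{du^1}\Gamma^1_{11} - \dfrac{d\Gamma^1_{11}}{du^2}\Gamma^1_{12}$
$\qquad + \Gamma^2_{12}\Gamma^1_{21} - \Gamma^2_{11}\Gamma^1_{22} + \dfrac{d\Gamma^2_{12}}{du^1}\Gamma^1_{21} - \dfrac{d\Gamma^2_{11}}{du^2}\Gamma^1_{22} \Big)\, x_1$
$\qquad + \Big( \Gamma^1_{12}\Gamma^2_{11} - \Gamma^1_{11}\Gamma^2_{12} + \dfrac{d\Gamma^1_{12}}{du^1}\Gamma^2_{11} - \dfrac{d\Gamma^1_{11}}{du^2}\Gamma^2_{12} + \dfrac{d\Gamma^2_{12}}{du^1} - \dfrac{d\Gamma^2_{11}}{du^2}$
$\qquad + \Gamma^2_{12}\Gamma^2_{21} - \Gamma^2_{11}\Gamma^2_{22} + \dfrac{d\Gamma^2_{12}}{du^1}\Gamma^2_{21} - \dfrac{d\Gamma^2_{11}}{du^2}\Gamma^2_{22} \Big)\, x_2$
$\quad = \Big( \dfrac{d\Gamma^1_{12}}{du^1} - \dfrac{d\Gamma^1_{11}}{du^2} + \Gamma^p_{12}\Gamma^1_{p1} - \Gamma^p_{11}\Gamma^1_{p2} + \dfrac{d\Gamma^p_{12}}{du^1}\Gamma^1_{p1} - \dfrac{d\Gamma^p_{11}}{du^2}\Gamma^1_{p2} \Big)\, x_1$
$\qquad + \Big( \Gamma^p_{12}\Gamma^2_{p1} - \Gamma^p_{11}\Gamma^2_{p2} + \dfrac{d\Gamma^p_{12}}{du^1}\Gamma^2_{p1} - \dfrac{d\Gamma^p_{11}}{du^2}\Gamma^2_{p2} + \dfrac{d\Gamma^2_{12}}{du^1} - \dfrac{d\Gamma^2_{11}}{du^2} \Big)\, x_2.$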
If we were pushing $x_1$ around an $\epsilon$-square, then all terms will have a factor
of $\epsilon$, except for the terms like $\frac{d\Gamma^1_{12}}{du^1}\Gamma^1_{11}$, which would have a factor of $\epsilon^2$. It is
conceivable that in a comparison with the actual function, the $\epsilon^2$-terms would
become irrelevant. This brings agreement with the Riemannian curvature tensor.
That is, $x_1(12) - x_1(21) = R^k_{112}\,x_k$. In Millman and Parker, it is shown that
(179) $K = \dfrac{R^l_{121}\,g_{l2}}{g}.$
It seems that $R^l_{112}$ or $R^l_{212}$ could also be used with an appropriate $g_{ij}$. In our case,
this appears to be
(180) $K = \dfrac{R^l_{112}\,g_{l2}}{g}.$
This appears to be
(181) $\dfrac{\langle x_1(12) - x_1(21), x_2 \rangle}{g}.$
To see that the difference seen in the vector $x_1$ as it is pushed around the square
two different ways is relevant, we get an initial confirmation from the following.
In the Smarandache Manifolds book, it is shown that the relative angle between
two geodesics increases or decreases depending on the total curvature between the
geodesics. If the relative angle decreases by $\theta$ radians, then there must be $\theta$ total
curvature between. The total curvature, therefore, must be ± the angle between
$x_1(12)$ and $x_1(21)$. The formula given by Millman and Parker computes this angle.
We can verify this as follows.
(182) $\langle x_1(12), x_2 \rangle = \|x_1(12)\|\,\|x_2\| \cos\theta_{12} \approx \|x_1\|\,\|x_2\| \cos\theta_{12}$
(183) $\langle x_1(21), x_2 \rangle = \|x_1(21)\|\,\|x_2\| \cos\theta_{21} \approx \|x_1\|\,\|x_2\| \cos\theta_{21},$
and we want $\theta_{12} - \theta_{21}$. We can use the trig identity
(184) $\cos\alpha - \cos\beta = -2 \sin\dfrac{\alpha + \beta}{2} \sin\dfrac{\alpha - \beta}{2}$
We also have the fact that
(185) $\sqrt{g} = \|x_1 \times x_2\|.$
(186) $\dfrac{R^l_{121}\,g_{l2}}{g} = \dfrac{\|x_1\|\,\|x_2\|\,(-2) \sin\frac{\theta_{12} + \theta_{21}}{2} \sin\frac{\theta_{12} - \theta_{21}}{2}}{\|x_1\|^2\,\|x_2\|^2 \sin^2\theta} \approx -\dfrac{\theta_{12} - \theta_{21}}{\|x_1\|\,\|x_2\| \sin\theta}$

Curvature of 3-Dimensional Spaces

1. What we know
There has been some work done with polyhedral metrics by Gromov, and later
by Aitchison, and Rubinstein. The latter two worked with cubings of 3-manifolds,
which consist of flat 3-cubes, and the curvature is concentrated in the 1-skeleton.
The dihedral angle around each edge must be at least 2π, so the geometry is Euclid-
ean or hyperbolic around the edges. At each vertex, it is required that lk(v) has
the property that every 1-cycle has at least three edges and that every 1-cycle with
exactly three edges bounds a triangle contained in exactly one cube. I’m not sure
what lk(v) is, but my first guess is that it is a small ball about the vertex, and
I think this means that it is a simplicial complex. My second guess is that it is
the surface of this ball. Each triangle corresponds to a cube. We’re looking at the
triangulation of a 2-sphere. OK, I think this is it, and I believe lk(v) stands for the
link of v.

2. What is the geometry like around a vertex of a cubed 3-manifold?

The simplest case might be eight cubes arranged like the octants of $\mathbb{R}^3$ about
the origin. Adding two more in the most obvious way yields the geometry of stacked
cones with impulse curvature $-\frac{\pi}{2}$.
Question 1. What is the nature of the curvature at a point beyond this type
of dihedral curvature?

3. A positive curvature example

This simple configuration consists of the corners of four cubes meeting at one
vertex. A polyhedral ball about this vertex would form a tetrahedron. Since only
one vertex is being considered, we can build the space out of four of the eight
octants of $\mathbb{R}^3$. We will use the $x^+y^+z^+$-octant and the three octants adjacent to it,
the $x^-y^+z^+$-octant, the $x^+y^-z^+$-octant, and the $x^+y^+z^-$-octant. From this
configuration, we will identify the $x^-z^+$-quarter plane of the $x^-y^+z^+$-octant with the
$y^-z^+$-quarter plane of the $x^+y^-z^+$-octant; the $x^-y^+$-quarter plane of the
$x^-y^+z^+$-octant with the $y^+z^-$-quarter plane of the $x^+y^+z^-$-octant; and the $x^+z^-$-quarter
plane of the $x^+y^+z^-$-octant with the $x^+y^-$-quarter plane of the $x^+y^-z^+$-octant.
A view of this using cubes is shown in Figure 1. Note that we are left with four
semi-axes. The $x^+$-, $y^+$-, and $z^+$-axes remain, and the three negative axes have
been identified. We will refer to this last axis as the negative axis.
The geometry in the interior of each of the octants is Euclidean, and there is
$\frac{\pi}{2}$ radians of dihedral curvature along each of the axes. This can be seen in the
2-gon shown in Figure 2. It might be that it makes sense to say that there is 2π
steradians of curvature at the central vertex. One possible effect is some torsion in
the curvature of a curve near it.

Figure 1. A depiction of the identifications: $x^-z^+ \equiv y^-z^+$,
$x^-y^+ \equiv y^+z^-$, and $x^+z^- \equiv x^+y^-$.

Figure 2. A 2-gon with total curvature $\frac{\pi}{2}$.
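The dihedral curvature along an edge can be tallied by counting the cube corners that meet there. The sketch below assumes that this curvature is the angle deficit, $2\pi$ minus the sum of the dihedral angles, with each flat cube contributing $\frac{\pi}{2}$. That reading is an interpretation of the text rather than something it states, and the function name is hypothetical.

```python
import math


def dihedral_curvature(cubes_around_edge):
    """Angle deficit 2*pi - n*(pi/2) for n cube corners meeting along an edge."""
    return 2.0 * math.pi - cubes_around_edge * (math.pi / 2.0)


flat = dihedral_curvature(4)       # four cubes: ordinary Euclidean space
positive = dihedral_curvature(3)   # three octants around each semi-axis
negative = dihedral_curvature(5)   # five cubes around an edge
```

Under this assumption, three octants around an axis give $\frac{\pi}{2}$, matching the positive curvature example, and five cubes around an edge give $-\frac{\pi}{2}$.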