Gr notes 15-16

Mathematics for General Relativity (University College London)

Downloaded by Roy Vesey (rdv0044@googlemail.com)
Maths for GR

Michael Singer
E-mail address: michael.singer@ucl.ac.uk

Department of Mathematics, University College, London WC1E 6BT

Contents

Chapter 1. Introductory mathematical material: some geometry
1.1. Vector spaces and affine spaces
1.2. Quadratic forms and bilinear forms
1.3. Curves, tangent vectors and so on
1.4. Calculus of variations
Chapter 2. Minkowski Spacetime and Special Relativity
2.1. Introduction
2.2. What are the inertial coordinate systems?
2.3. Worldlines
2.4. Why do clocks carried by inertial observers all go at uniform rates?
2.5. Spacetime diagrams
2.6. Time dilatation: ‘moving clocks run slowly’
2.7. Simultaneity and distance
2.8. Length contraction
2.9. Lorentz transformations
2.10. Orientation and time-orientation
2.11. Interstellar travel
2.12. Summary of key notation and definitions
Chapter 3. Further topics in Special Relativity
3.1. Non-inertial observers: acceleration
3.2. Momentum and energy: $E = mc^2$
3.3. Momentum of photons
Chapter 4. Multivariable calculus
4.1. Smooth functions and changes of coordinates
4.2. Two types of vector
4.3. The Einstein summation convention
4.4. Differentiation along a curve
4.5. Tensor fields of rank 2
4.6. General tensors
4.7. Manifolds
Chapter 5. Space-times and geodesics
5.1. Curved space-time
5.2. Events and worldlines
5.3. Geodesics
5.4. A first look at the covariant derivative
5.5. Local inertial coordinates
5.6. A sneak preview of curvature
Chapter 6. Covariant differentiation and curvature
6.1. Introduction
6.2. The covariant derivative
6.3. Extension to all tensors
6.4. Properties
6.5. Curvature
6.6. Curvature at a point
6.7. Ricci and scalar curvature
6.8. Relative acceleration and geodesic deviation
6.9. Comparison with the newtonian theory
6.10. Weak field limit
6.11. Physical differential equations
Chapter 7. The Schwarzschild metric and black holes
7.1. Spherically symmetric, static metrics
7.2. Schwarzschild
7.3. Physical consequences
7.4. Extensions of Schwarzschild: introduction to black holes
7.5. Gravitational collapse
7.6. The life and death of stars
7.7. Some figures, or a tale of 3 black holes

CHAPTER 1

Introductory mathematical material: some geometry

1.1. Vector spaces and affine spaces


It is a commonplace that the world around us—at least on everyday scales—can be very well
described as some kind of a 3-dimensional space, in that we can use some system of coordinates
(x, y, z) to describe where things are.
One of the lessons that Einstein taught us, though, is that coordinate systems by themselves
have no physical significance, and to understand things properly we have to distinguish clearly
between physical phenomena and the way in which they may be described using particular
coordinates. The mathematics underlying this is linear algebra, so we’ll start with a quick
reminder about abstract vector spaces and linear maps.
We shall then add further structures piece by piece, giving abstract definitions and showing
what they mean concretely.
The most important of these additional structures is one that allows us to measure lengths.
(In an abstract vector space there is no natural notion of what the length of a vector should
be.)

1.1.1. Vector spaces. In this course, we shall only be interested in real vector spaces.
Recall that a vector space V is a set of elements (called ‘vectors’) which can be added to each
other and multiplied by real numbers (often called ‘scalars’). We will not reproduce the axioms
here.
If V and W are vector spaces, then we are interested in maps between them that preserve the
given structure (the addition and scalar multiplication). A linear map (or linear transformation)
from V to W is a mapping T from V to W (written in compact form T : V → W) with the
properties
$$T(v_1 + v_2) = T(v_1) + T(v_2) \quad\text{and}\quad T(\lambda v) = \lambda T(v) \tag{1.1.1}$$
for any vectors $v, v_1, v_2$ of V and scalars (i.e. real numbers) λ.
Remark 1.1.1. It is standard to write T v for T (v) when T is a linear transformation.
Further: T is an isomorphism between V and W if it is linear and also 1:1 and onto as a
mapping, just regarding V and W as sets. If V and W are isomorphic by a linear map T , we
often also say that T is an identification of V and W , or that V and W are identified (by T ).
Example 1.1.2. As in the introduction, the space of all triples (x, y, z) of real numbers is a vector
space, denoted $\mathbb{R}^3$, where
$$(x, y, z) + (x', y', z') = (x + x', y + y', z + z'), \qquad \lambda(x, y, z) = (\lambda x, \lambda y, \lambda z).$$
Example 1.1.3. More generally, the space of lists of n real numbers $(x_1, \dots, x_n)$ is a real
vector space denoted by $\mathbb{R}^n$.
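As a concrete check of the two properties (1.1.1), here is a minimal sketch in Python. The matrix M and the test vectors are hypothetical values chosen only for illustration; any matrix would do, since matrix multiplication always gives a linear map.

```python
# Illustrative sketch: a 2x3 matrix defines a linear map T : R^3 -> R^2.
# M, v1, v2 and lam are made-up example values, not taken from the notes.

def apply(matrix, v):
    """Return the matrix-vector product, i.e. T(v)."""
    return [sum(m * x for m, x in zip(row, v)) for row in matrix]

def add(u, v):
    """Componentwise sum of two vectors."""
    return [a + b for a, b in zip(u, v)]

def scale(lam, v):
    """Multiply the vector v by the scalar lam."""
    return [lam * a for a in v]

M = [[1, 2, 0],
     [0, -1, 3]]
v1, v2, lam = [1, 0, 2], [-3, 4, 1], 5

# The two halves of (1.1.1): additivity and compatibility with scalars.
additive = apply(M, add(v1, v2)) == add(apply(M, v1), apply(M, v2))
homogeneous = apply(M, scale(lam, v1)) == scale(lam, apply(M, v1))
print(additive, homogeneous)
```

Integer entries are used so that the equalities can be tested exactly. By contrast, a translation such as (x, y, z) ↦ (x + 1, y, z) fails both tests.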
1.1.2. Bases and matrices. We recall that a (finite) basis for a vector space V is a set
of elements $(v_1, \dots, v_n)$ such that every element v in V has a unique representation
$$v = \sum_{j=1}^{n} \lambda_j v_j, \qquad (\lambda_j \in \mathbb{R}). \tag{1.1.2}$$

In the expansion (1.1.2), the numbers $\lambda_j$ are called the coefficients (or sometimes coordinates)
of v with respect to the basis $(v_1, \dots, v_n)$. If V does have a finite basis consisting of n elements,
then any other basis of V will also consist of n elements. This number n is defined to be the
dimension of V .
If V does not have a finite basis, then it is said to be infinite-dimensional.
Example 1.1.4. The set $\mathbb{R}^n$ of n-tuples $(x_1, \dots, x_n)$ (see above) has dimension n. This has
a standard basis
$$e_1 = (1, 0, \dots, 0),\quad e_2 = (0, 1, 0, \dots, 0),\quad \dots,\quad e_n = (0, \dots, 0, 1).$$
Example 1.1.5. The set of differentiable functions on $\mathbb{R}$ is an infinite-dimensional vector
space.
Theorem 1.1.6. If V is a real vector space of dimension n, then V is isomorphic to $\mathbb{R}^n$.
We won’t need the proof. But it’s important to understand that an isomorphism of V with
$\mathbb{R}^n$ is exactly the same thing as a choice of basis of V. For if $T : \mathbb{R}^n \to V$ is an isomorphism,
we define
$$v_j = T(e_j),$$
where the $e_j$ form the standard basis of $\mathbb{R}^n$, and then you check that the $v_j$ form a basis of V.
Conversely, given a basis $(v_j)$ of V, for every v ∈ V we have its n-tuple of coefficients $\lambda_j$ as in
(1.1.2), and the map from v to its coefficients is an isomorphism from V to $\mathbb{R}^n$.
In particular, without more structure there is no natural or unique isomorphism between our
n-dimensional vector space and $\mathbb{R}^n$. Recall our original motivation: our world appears to be
very well described by a space with coordinates (x, y, z), but we don’t want to specify particular
coordinates. These ideas are quite well captured by saying
that our world is well described by a 3-dimensional real vector space.
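To see that the same vector gets different coordinates in different bases, here is a small sketch for a 2-dimensional space. The function name `coefficients` and the example basis are assumptions made for illustration; the coefficients in (1.1.2) are found by Cramer's rule, using exact rational arithmetic.

```python
from fractions import Fraction

def coefficients(v1, v2, v):
    """Return (l1, l2) with v = l1*v1 + l2*v2, computed by Cramer's rule.

    v1, v2 must be linearly independent vectors of R^2 with integer entries.
    """
    det = v1[0] * v2[1] - v1[1] * v2[0]
    if det == 0:
        raise ValueError("v1 and v2 are linearly dependent, so not a basis")
    l1 = Fraction(v[0] * v2[1] - v[1] * v2[0], det)
    l2 = Fraction(v1[0] * v[1] - v1[1] * v[0], det)
    return l1, l2

v = (3, 1)
# Coordinates of v in the standard basis e1, e2 ...
standard = coefficients((1, 0), (0, 1), v)
# ... and in a hypothetical second basis (1, 1), (1, -1).
other = coefficients((1, 1), (1, -1), v)
print(standard, other)
```

The vector v itself has not changed; only the isomorphism with $\mathbb{R}^2$, that is, the choice of basis, is different.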
Remark 1.1.7. When we are using vectors to describe ‘physical space’ we often call their
components in a basis coordinates instead.
1.1.3. Symmetries of a vector space. Another way of thinking about choices of basis
in a vector space is in terms of the symmetry of the space.
Definition 1.1.8. Let V be a finite-dimensional real vector space. The set of all invertible
linear maps T : V → V (i.e. linear isomorphisms of V with itself) is a group, denoted GL(V).
If $V = \mathbb{R}^n$, we also write $GL_n(\mathbb{R})$ or $GL(n, \mathbb{R})$ for GL(V).
GL here stands for ‘general linear’. Recall that ‘group’ just means that there is an associative
multiplication with an identity and inverses. In this case the multiplication is composition of
linear maps. The group $GL_n(\mathbb{R})$
is the same as the group of n × n invertible matrices M. Such an M defines a map from $\mathbb{R}^n$ to
itself by matrix multiplication, where we write the typical element x of $\mathbb{R}^n$ as a column vector
with coefficients $(x_1, \dots, x_n)$, so
$$x \mapsto Mx.$$

1.1.4. Affine spaces. A linear map T between vector spaces automatically takes the zero
vector to the zero vector. In particular, thinking about $\mathbb{R}^3$, the translation
$$(x, y, z) \mapsto (x + a, y + b, z + c)$$
for some fixed non-zero vector (a, b, c) is not a linear map. However, from the physical point of view, we
would certainly want to be able to consider such transformations as part of our story.
There is a formal abstract definition of affine space, but we shall not give it. Instead, we
work with our vector space V and just enlarge the symmetries.
Definition 1.1.9. Let V be a finite-dimensional real vector space. The affine group Aff(V)
is the group of all transformations of the form
$$x \mapsto Tx + b$$
where T ∈ GL(V) (i.e. is an invertible linear transformation) and b ∈ V is a vector.

Note in particular, by taking T = the identity I, we get the translations as part of Aff(V ).
And on the other hand it contains GL(V ) as the subgroup of elements with b = 0.
We shall somewhat loosely talk about the affine space V to mean the space V , but where
we are allowing the whole of Aff(V ) to act as symmetries.
The following example may entertain the mathematicians.
Example 1.1.10. Let T be a linear mapping from a vector space V to another vector space
W. Let w ∈ W be any given vector. Let P be the solution set of the equation Tx = w, i.e.
$$P = \{x \in V : Tx = w\}.$$
Suppose that P is not empty. Then P is a natural example of an affine space if w ≠ 0 and is a
vector space if w = 0.
To picture what is going on here, suppose that $V = \mathbb{R}^3$ and $W = \mathbb{R}$. Then we picture P
as a two-dimensional plane inside $\mathbb{R}^3$ (usually) which goes through the origin if w = 0 and
does not (necessarily) otherwise. When w = 0, P is a linear (or vector) subspace of $\mathbb{R}^3$, hence a
vector space K in its own right. If w ≠ 0, P is identifiable with K; geometrically, P is a plane
parallel to K, but not going through 0. We can identify K with P by picking any element p of
P and mapping k ∈ K to k + p ∈ P. There is no preferred choice of p and hence no given origin
in P.
A further interesting fact is that the group of linear transformations of $\mathbb{R}^3$ which map P
into itself is identifiable with Aff(P).
If we think of V as an affine space, then given any two points p and q, we have the displacement
vector $\overrightarrow{pq}$ of q relative to p, given in terms of the linear structure of V by
$$\overrightarrow{pq} = q - p. \tag{1.1.3}$$
Note that if we subject p and q to an affine transformation
$$x \mapsto Tx + b$$
then the displacement vector is subjected only to the ‘linear part’ of the transformation,
$$\overrightarrow{pq} \mapsto T(\overrightarrow{pq});$$
the vector b cancels out when we form q − p.
To put it another way, if we fix any point p of V, then we get a map from V (viewed as an
affine space) to V viewed as a vector space, in fact the space of all position vectors relative to
p:
$$q \mapsto \overrightarrow{pq}.$$
Indeed, the set of position vectors does have a given origin, namely the zero vector which is the
position vector of p relative to itself. (A bit confusing—sorry about that, but it is the price to
be paid for not burdening you with the ‘proper’ definition of affine space.)
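The statement that a displacement vector only sees the linear part of an affine transformation can be checked numerically. The following is a minimal sketch; the matrix T, the vector b and the points p, q are hypothetical values chosen for illustration.

```python
def mat_vec(T, x):
    """Matrix-vector product T x."""
    return [sum(t * xi for t, xi in zip(row, x)) for row in T]

def affine(T, b, x):
    """Apply the affine transformation x -> T x + b."""
    return [y + bi for y, bi in zip(mat_vec(T, x), b)]

def displacement(p, q):
    """Displacement vector of q relative to p, i.e. q - p as in (1.1.3)."""
    return [qi - pi for pi, qi in zip(p, q)]

T = [[2, 1],
     [0, 1]]          # an invertible linear part (determinant 2)
b = [5, -7]           # the translation part
p, q = [1, 2], [4, 6]

# Displacement between the transformed points ...
lhs = displacement(affine(T, b, p), affine(T, b, q))
# ... equals the linear part applied to the original displacement.
rhs = mat_vec(T, displacement(p, q))
print(lhs, rhs)
```

Changing b moves both image points by the same amount, so `lhs` is unaffected.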
Let’s make this more concrete. Consider the solar system (at a particular instant, perhaps).
We know that we can describe the position of any given object in the solar system by 3
coordinates (x, y, z). But depending upon the problem, we may want to put the origin of these
coordinates at different places. For describing the orbit of the Earth around the Sun, we might
very well put the origin at the centre of the Sun; for describing the Earth-Moon system, we
might put it at the centre of the Earth, or at the centre of mass of the Earth-Moon system. The
point is that the right way to think of things is that coordinates (x, y, z) give the position of
one point relative to another (the Earth relative to the Sun, or the Moon relative to the Earth).

1.2. Quadratic forms and bilinear forms


Informally, a quadratic form in n variables is a homogeneous quadratic polynomial in n
variables.

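Example 1.2.1. In one variable, the only possibility is $ax^2$, where a is real. In two variables
(x, y), we have
$$ax^2 + 2gxy + by^2 = \begin{pmatrix} x & y \end{pmatrix} \begin{pmatrix} a & g \\ g & b \end{pmatrix} \begin{pmatrix} x \\ y \end{pmatrix}.$$
In three variables,
$$ax^2 + by^2 + cz^2 + 2eyz + 2fzx + 2gxy = \begin{pmatrix} x & y & z \end{pmatrix} \begin{pmatrix} a & g & f \\ g & b & e \\ f & e & c \end{pmatrix} \begin{pmatrix} x \\ y \\ z \end{pmatrix}.$$
Example 1.2.2. All terms have to be of degree precisely two: in two variables
$$ax^2 + bxy + cy$$
is not an example unless c = 0.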
Definition 1.2.3. Let V be a vector space and let d be a non-negative integer. A real-valued
function f on V is said to be homogeneous of degree d if
$$f(\lambda v) = \lambda^d f(v)$$
for all real λ and v ∈ V.
Remark 1.2.4. This explains what ‘homogeneous of degree 2’ means above. Homogeneity
is also important later on in this chapter, when we look at Lagrangians and the calculus of
variations (cf. Proposition 1.4.4).
There are two reasons for the introduction of quadratic forms and bilinear forms. First of
all, quadratic forms on vector spaces provide the additional structure needed to define distance.
The other reason for introducing bilinear forms is that they, and their multilinear cousins, are
essential in the development of multivariable calculus (see Chapter 4).
The basic example is $x^2 + y^2 + z^2$ in $\mathbb{R}^3$. By Pythagoras’s theorem, this is the square of the
distance of the point (x, y, z) from (0, 0, 0) if we are thinking of a standard system of mutually
perpendicular axes.
In special relativity (see Chapter 2) the physics is captured by a quadratic form in 4 variables,
$c^2t^2 - x^2 - y^2 - z^2$. The significance of this will be that
$$c^2t^2 - x^2 - y^2 - z^2 = 0$$
if and only if a photon (particle of light) emitted at the origin at t = 0 (and travelling in a straight
line) can pass through the point (x, y, z) at time t.
Thus one should think generally of quadratic forms as defining ‘squares of distances’, or
‘squares of lengths’ of vectors on a vector space. This needs to be taken with a pinch of salt,
though, since in the above 4-dimensional example, the square of the ‘distance’ can be 0 or even
negative! We shall explore this in detail in the chapter on special relativity.
We define these homogeneous quadratic polynomials in terms of bilinear forms.
Definition 1.2.5. A bilinear form B on a vector space V is a map
$$B : V \times V \to \mathbb{R}$$
with the property that for each fixed v, the maps
$$w \mapsto B(v, w) \quad\text{and}\quad w \mapsto B(w, v)$$
are linear in w.
To spell this out,
$$B(v, \lambda u + \mu w) = \lambda B(v, u) + \mu B(v, w)$$
and similarly
$$B(\lambda u + \mu w, v) = \lambda B(u, v) + \mu B(w, v).$$
Definition 1.2.6. A bilinear form B is said to be symmetric if B(v, w) = B(w, v) for all
v, w. A bilinear form is said to be skew-symmetric (or just skew) if B(v, w) = −B(w, v) for all v, w.

Definition 1.2.7. If B is a symmetric bilinear form, then the associated quadratic form is
$$Q(v) = B(v, v).$$
A bilinear form is called non-degenerate if for any v ≠ 0, there exists w ∈ V such that
$$B(v, w) \neq 0.$$
This is equivalent to the corresponding condition with the roles reversed: that is, B is also
non-degenerate if for any v ≠ 0, there exists w such that
$$B(w, v) \neq 0.$$
We shall see later that any bilinear form (on a finite-dimensional vector space) can be
represented by a square matrix $\hat{B}$. Then
• B is symmetric if and only if the matrix $\hat{B}$ is symmetric ($\hat{B} = \hat{B}^t$);
• B is skew-symmetric if and only if the matrix $\hat{B}$ is skew ($\hat{B} = -\hat{B}^t$);
• B is non-degenerate if and only if the matrix $\hat{B}$ is invertible.
A linear transformation of V is an isometry (or a Q-isometry, if we need to keep track of
which quadratic form is under consideration) if it preserves ‘lengths’ as defined by Q:
Definition 1.2.8. Let V be a vector space with a non-degenerate quadratic form Q. A
linear transformation T of (V, Q) is called a Q-isometry if
Q(T v) = Q(v) (1.2.1)
for all vectors v in V . The set of all Q-isometries forms a group denoted O(V, Q).
Remark 1.2.9. Here the O stands for ‘orthogonal’. The condition
Example 1.2.10. If $V = \mathbb{R}^2$ and $Q(x, y) = x^2 + y^2$, then Q gives the ordinary euclidean
length-squared of the vector from (0, 0) to (x, y). You can verify that the linear transformation
$$(x, y) \mapsto (cx + sy, -sx + cy)$$
where c = cos θ, s = sin θ is a Q-isometry of $\mathbb{R}^2$. Geometrically, this linear transformation just
represents a rotation of the plane, and rotations preserve lengths. This gives a guide to how
you should think about isometries more generally.
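The verification suggested in Example 1.2.10 can also be done numerically. This is only a sketch: the angle and the test vector are arbitrary illustrative values, and a numerical check at one point is of course not a proof.

```python
import math

def Q(v):
    """The euclidean quadratic form Q(x, y) = x^2 + y^2."""
    x, y = v
    return x * x + y * y

def rotate(theta, v):
    """The linear map (x, y) -> (c x + s y, -s x + c y) of Example 1.2.10."""
    c, s = math.cos(theta), math.sin(theta)
    x, y = v
    return (c * x + s * y, -s * x + c * y)

v = (3.0, 4.0)
w = rotate(0.7, v)
print(Q(v), Q(w))   # the two values agree up to rounding error
```

Algebraically, the agreement comes from $c^2 + s^2 = 1$: the cross terms $\pm 2csxy$ cancel when $Q(Tv)$ is expanded.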
1.2.1. Matrix representation. If $(e_1, \dots, e_n)$ is any basis of V, and B is any bilinear
form, then we may form the numbers $\hat{B}_{ij} = B(e_i, e_j)$, the entries of a square matrix that we’ll denote $\hat{B}$. If
we identify vectors x and y of V with their coefficients $(x_1, \dots, x_n)$, $(y_1, \dots, y_n)$ with respect to
this same basis $(e_1, \dots, e_n)$, then it follows from the conditions of bilinearity that
$$B(x, y) = \sum_{i,j=1}^{n} \hat{B}_{ij}\, x_i y_j.$$
Thinking of x and y as column vectors, we can also write this
$$B(x, y) = x^t \hat{B} y$$
where $x^t$ is the transpose of x, i.e. the row vector with coefficients $(x_1, \dots, x_n)$.
The isometry condition in terms of Q is equivalent to its ‘polarized version’
$$B(Tv, Tv') = B(v, v') \quad\text{for all } v, v' \in V. \tag{1.2.2}$$
Cf. Problem 1.2.
And in terms of matrices, this is equivalent to
$$\hat{T}^t \hat{B} \hat{T} = \hat{B} \tag{1.2.3}$$
if $\hat{T}$ is the matrix representation of T with respect to the basis $(e_1, \dots, e_n)$.
Note, by taking determinants of (1.2.3) we obtain (since det(AB) = det(A) det(B) and $\det(A^t) = \det(A)$)
$$\det(T)^2 \det(\hat{B}) = \det(\hat{B}), \tag{1.2.4}$$
and so det(T) = ±1, since B is non-degenerate and so $\det \hat{B} \neq 0$.
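The matrix condition (1.2.3) is easy to test by machine. The sketch below uses an assumed example that is not taken from the text at this point: the form $Q(t, x) = t^2 - x^2$ on $\mathbb{R}^2$, with matrix diag(1, −1), and the ‘hyperbolic rotation’ with entries cosh χ and sinh χ. The helper names are hypothetical.

```python
import math

def transpose(A):
    return [list(row) for row in zip(*A)]

def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]

def det2(A):
    """Determinant of a 2x2 matrix."""
    return A[0][0] * A[1][1] - A[0][1] * A[1][0]

def is_isometry(T, B, tol=1e-9):
    """Check the matrix condition T^t B T = B of (1.2.3), up to rounding."""
    lhs = matmul(matmul(transpose(T), B), T)
    return all(abs(lhs[i][j] - B[i][j]) <= tol
               for i in range(len(B)) for j in range(len(B)))

B = [[1.0, 0.0],
     [0.0, -1.0]]                      # matrix of Q(t, x) = t^2 - x^2

def hyperbolic_rotation(chi):
    c, s = math.cosh(chi), math.sinh(chi)
    return [[c, s], [s, c]]

boost = hyperbolic_rotation(0.8)
reflection = [[1.0, 0.0], [0.0, -1.0]]   # x -> -x, determinant -1
shear = [[1.0, 1.0], [0.0, 1.0]]         # determinant 1 but not an isometry

print(is_isometry(boost, B), det2(boost))
print(is_isometry(reflection, B), det2(reflection))
print(is_isometry(shear, B), det2(shear))
```

The shear shows that det(T) = ±1 is necessary for an isometry but not sufficient.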

Definition 1.2.11. The set of Q-orthogonal T ’s with det(T ) = 1 forms a subgroup denoted
SO(V, Q), read the ‘special orthogonal’ group.
Example 1.2.12. Suppose that B is a symmetric bilinear form on $\mathbb{R}^2$ such that
$$B(e_1, e_1) = 0, \qquad B(e_2, e_2) = 0.$$
Is this enough to determine B uniquely?
Solution 1.2.13. The answer is no: we need to know $B(e_1, e_2)$ as well to determine B. If
$B(e_1, e_2) = \lambda$, however, then by symmetry also $B(e_2, e_1) = \lambda$. So the matrix representation of
this symmetric bilinear form will be
$$\hat{B} = \begin{pmatrix} 0 & \lambda \\ \lambda & 0 \end{pmatrix}. \tag{1.2.5}$$
1.2.2. Diagonal quadratic forms. If $V = \mathbb{R}^n$, the standard quadratic form of signature
(r, s) has the matrix representation
$$Q = \mathrm{diag}(1, \dots, 1, -1, \dots, -1) \tag{1.2.6}$$
with respect to the standard basis, where there are r entries equal to +1 and s entries equal to −1 (so r + s = n).
Remark 1.2.14. In the pure mathematical literature, the difference r − s of the number of
+1’s and the number of −1’s is often called the signature. On the other hand it is also common
to refer to a quadratic form as having signature +, +, +, + or +, −, −, − rather than 4 or −2.
There is no real risk of confusion with these slight variations.
The corresponding orthogonal group is denoted by $O_{r,s}$ or O(r, s) and the corresponding
subgroup of elements with determinant equal to +1 is denoted $SO_{r,s}$ or SO(r, s).
One can make the case that the most important cases are s = 0, when the groups are also
just denoted O(n) and SO(n) (for ordinary classical n-dimensional euclidean geometry), and
r = 1, s = 3, for the study of special relativity. In the latter case O(1, 3) is called the Lorentz group.
The following is a basic fact about non-degenerate quadratic forms:
Theorem 1.2.15. Let V be a finite-dimensional vector space and let Q be a non-degenerate
quadratic form on V . Then there exists a basis of V so that the matrix Q̂ of Q in this basis
takes the standard form (1.2.6) for some particular r and s (which depend on Q).
Proof. (Sketch; useful to be aware of, but not examinable.)
Pick any v with Q(v) ≠ 0. [If Q(v) = 0 for all v, then by polarization (see
problem set) B(v, w) = 0 for all v and w, contradicting the non-degeneracy of Q. So such a v
does exist.]
Replacing v by $e_1 = v/\sqrt{|Q(v)|}$, we get
$$Q(e_1) = \frac{Q(v)}{|Q(v)|} = \pm 1.$$
Let V′ be the orthogonal complement of $e_1$ with respect to Q, that is
$$V' = \{w \in V : B(e_1, w) = 0\},$$
where B is the symmetric bilinear form associated with Q.
Because $Q(e_1) = \pm 1$ and Q is non-degenerate, this is an (n − 1)-dimensional subspace and the
restriction Q′ of Q to V′ is a non-degenerate quadratic form. If we choose any basis $e_2, \dots, e_n$
of V′, then the matrix representation of Q in the basis $(e_1, e_2, \dots, e_n)$ will have the form
$$\hat{Q} = \begin{pmatrix} \pm 1 & 0 & \cdots & 0 \\ 0 & b'_{22} & \cdots & b'_{2n} \\ \vdots & \vdots & & \vdots \\ 0 & b'_{n2} & \cdots & b'_{nn} \end{pmatrix} \tag{1.2.7}$$
The top left-hand element is $Q(e_1)$. The zeros in the first row and first column come from the
orthogonality condition $B(e_1, w) = 0$ for all w ∈ V′.
We suppose by induction that the theorem has already been proved for non-degenerate
quadratic forms on vector spaces of dimension < n. Then we can choose $e_2, \dots, e_n$ so that the
matrix $(b'_{ij})$ is diagonal with entries ±1. □
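The inductive proof is effectively an algorithm, and it can be sketched in code. The following is an illustrative floating-point implementation under stated assumptions, not a robust numerical method: the names `bilinear`, `pick_nonnull` and `diagonalize` are hypothetical, the input is the symmetric matrix of a non-degenerate form, and a vector with Q(v) ≠ 0 is found by trying basis vectors and then sums of pairs (which suffices by polarization).

```python
import math

def bilinear(B, u, v):
    """B(u, v) = sum_ij B_ij u_i v_j for the symmetric matrix B."""
    n = len(B)
    return sum(B[i][j] * u[i] * v[j] for i in range(n) for j in range(n))

def pick_nonnull(B, basis, tol):
    """Index of a basis vector with Q != 0, replacing b_i by b_i + b_j if needed."""
    for i, b in enumerate(basis):
        if abs(bilinear(B, b, b)) > tol:
            return i
    for i in range(len(basis)):
        for j in range(i + 1, len(basis)):
            v = [x + y for x, y in zip(basis[i], basis[j])]
            if abs(bilinear(B, v, v)) > tol:
                basis[i] = v        # still a basis of the same subspace
                return i
    raise ValueError("Q vanishes on a subspace: the form is degenerate")

def diagonalize(B, tol=1e-12):
    """Return (basis, signs) with B(e_i, e_j) = signs[i] if i == j, else 0."""
    n = len(B)
    basis = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    new_basis, signs = [], []
    while basis:
        v = basis.pop(pick_nonnull(B, basis, tol))
        q = bilinear(B, v, v)
        e = [x / math.sqrt(abs(q)) for x in v]      # now Q(e) = +1 or -1
        sign = 1 if q > 0 else -1
        new_basis.append(e)
        signs.append(sign)
        # Project the remaining vectors into the orthogonal complement of e:
        # w -> w - (B(e, w) / Q(e)) e, and 1 / Q(e) = sign.
        basis = [[wk - sign * bilinear(B, e, w) * ek for wk, ek in zip(w, e)]
                 for w in basis]
    return new_basis, signs

# The form of Solution 1.2.13 with lambda = 1: Q(x, y) = 2xy.
basis, signs = diagonalize([[0.0, 1.0], [1.0, 0.0]])
print(signs)
```

For this form the code finds one +1 and one −1, so the signature is (1, 1), even though both diagonal entries of the original matrix vanish.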

Recall that if all the signs in (1.2.6) are + then we call Q (or B) positive-definite; if they
are all −, negative-definite. If the signature is (r, s), then r is the largest possible dimension of
subspaces of V on which Q is positive-definite; and similarly s is the largest possible dimension
of subspaces of V on which Q is negative-definite. Note, however, that such subspaces are not
unique.

1.2.3. Affine version. Suppose we want to incorporate translations. We may define the
group of affine isometries of (V, Q) to be all maps of the form
$$x \mapsto Tx + b, \quad\text{where } T \in O(V, Q) \text{ and } b \in V.$$
In the case of interest in relativity, where Q has signature (+, −, −, −), this is called the
Poincaré group.
In this affine case, we should think of Q as giving the length-squared of position vectors of
one point relative to another, that is
$$\text{Q-length-squared of } \overrightarrow{AB} = Q(\overrightarrow{AB}).$$
In a euclidean space (i.e. Q is positive-definite) we can also measure angles between
displacement vectors: if X, Y and Z are three points, then the angle θ that $u = \overrightarrow{XY}$ makes with
$v = \overrightarrow{XZ}$ satisfies
$$\cos\theta = \frac{B(u, v)}{\sqrt{Q(u)}\,\sqrt{Q(v)}}.$$
Thus we have lengths and angles as in ‘ordinary’ two- or three-dimensional euclidean geometry.
Notation 1.2.16. In a euclidean vector space with fixed positive-definite quadratic form it
is common to refer to the associated bilinear form as an inner product and denote it $B(u, v) = \langle u, v \rangle$.
Similarly the length of a vector in this context is often simply denoted $|u| = \sqrt{Q(u)} = \sqrt{B(u, u)}$.
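The formula $\cos\theta = B(u, v)/(\sqrt{Q(u)}\sqrt{Q(v)})$ for the angle between displacement vectors translates directly into code. This sketch assumes the standard inner product on $\mathbb{R}^n$; the function names and sample points are illustrative.

```python
import math

def inner(u, v):
    """Standard euclidean inner product <u, v>."""
    return sum(a * b for a, b in zip(u, v))

def angle(X, Y, Z):
    """Angle at X between the displacement vectors XY and XZ, in radians."""
    u = [y - x for x, y in zip(X, Y)]
    v = [z - x for x, z in zip(X, Z)]
    cos_theta = inner(u, v) / (math.sqrt(inner(u, u)) * math.sqrt(inner(v, v)))
    # Clamp to [-1, 1] to guard against rounding error before acos.
    return math.acos(max(-1.0, min(1.0, cos_theta)))

right_angle = angle((0, 0), (1, 0), (0, 1))
# Moving the base point X only changes the displacements, not the formula.
diagonal = angle((1, 1), (2, 1), (2, 2))
print(right_angle, diagonal)
```

Note that the function takes three points and forms displacement vectors first, in keeping with the affine point of view: translating all three points by the same vector leaves the angle unchanged.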

Remark 1.2.17. Why have I gone through all this? First of all, why should it be that vector
spaces or affine spaces are the right thing for describing the world?
We agree that triples of real numbers (x, y, z) are very good for describing where things are
in the world (or indeed the solar system...). But the key thing about vector and affine spaces
(which we’ll discuss next) is that there is a distinguished set of curves (or paths or trajectories),
namely the straight lines. These are distinguished in the sense that if you take a straight line
and apply an affine transformation (translation and linear transformation), you get another
straight line.
Fast forward to Newton’s laws of motion: the first of these says that a particle remains at
rest or continues to move with a constant velocity unless acted upon by an external force.
Thus straight lines have a physical significance, as the trajectories of free particles. In this
section, we have learned or recalled the underlying geometry of 3-dimensional spaces of points
(x, y, z), as vector or affine spaces. We’ve discussed the symmetries, both linear and affine, of
such spaces. We’ve introduced bilinear forms so that we have a notion of length and distance
in our vector space. And this flat geometry seems a good basis for Newtonian physics, because
it has straight lines in some sense built in, and these are the trajectories of particles not acted
upon by any force.

1.3. Curves, tangent vectors and so on


For our purposes, the best definition of a curve in a vector space (or an affine space) is as a
smooth map
γ:I→V

where I is some interval which may be open, closed, (semi-)infinite, whatever. If I is closed
(and bounded), I = [u, v], say, then the endpoints of the curve are γ(u) and γ(v).
We generally assume the parameterisation is regular, i.e. $\gamma'(t) \neq 0$ for every t ∈ I. Then
$\gamma'(t)$ is called the velocity vector and it is tangent to the curve. If we choose a basis of V, then
γ(t) is represented in terms of its components $(\gamma_1(t), \dots, \gamma_n(t))$, where each of the $\gamma_j$ is just an
ordinary smooth function of the variable t. Then $\gamma'(t)$ has components $(\gamma_1'(t), \dots, \gamma_n'(t))$.
If $\gamma'(t_0) = 0$ for a point $t_0$ of I, then the curve can be singular (have a sharp corner) at that
point. We don’t want to consider such singularities.
The image of γ is sometimes called the trace of the curve.
The vector $\gamma''(t)$ with components $(\gamma_1''(t), \dots, \gamma_n''(t))$ along the curve is called the acceleration
vector.
Example 1.3.1. If
$$a = (a_1, \dots, a_n) \quad\text{and}\quad b = (b_1, \dots, b_n)$$
are two (constant) vectors in $\mathbb{R}^n$, with b ≠ 0, then we may construct the parameterized straight
line
$$\gamma(t) = a + bt \tag{1.3.1}$$
through a in the direction b. In fact the tangent vector is $\gamma'(t) = b$ (i.e. b is the velocity vector in
this case) and the acceleration $\gamma''(t)$ is zero.
Example 1.3.2. In $\mathbb{R}^2$, consider the parameterized circle
$$\gamma(t) = (a\cos t, a\sin t), \tag{1.3.2}$$
where a > 0 is a constant. Then
$$\gamma'(t) = (-a\sin t, a\cos t) \quad\text{and}\quad \gamma''(t) = (-a\cos t, -a\sin t).$$
1.3.1. Curves in a euclidean space. Let V be a euclidean space, i.e. a real vector space
equipped with a positive-definite inner product $\langle \cdot, \cdot \rangle$ and length-squared denoted by $|\cdot|^2$
(cf. Notation 1.2.16).
Definition 1.3.3. A curve γ : I → V is parameterized by arc-length if $|\gamma'(t)|^2 = 1$ for all t.
Proposition 1.3.4. If γ : I → V is any regular curve in a euclidean vector or affine space,
then there is a reparameterization of the curve so that it is parameterized by arc-length.
In other words, there is a smooth invertible map τ : I → J, with $\tau'(t) > 0$ for all t, whose
inverse we write as t(τ), such that
$$\tilde{\gamma} : J \to V, \qquad \tilde{\gamma}(\tau) = \gamma(t(\tau))$$
satisfies
$$|\tilde{\gamma}'(\tau)|^2 = 1 \quad\text{for all } \tau \in J. \tag{1.3.3}$$
Proof. Let γ(t) be the given curve. Let τ be any increasing smooth invertible function of
t and define $\tilde{\gamma}(\tau)$ as above, or equivalently,
$$\tilde{\gamma}(\tau(t)) = \gamma(t).$$
Differentiating (chain rule),
$$\tilde{\gamma}'(\tau)\,\tau'(t) = \gamma'(t).$$
Take the length-squared, to get
$$(\tau'(t))^2\, |\tilde{\gamma}'(\tau)|^2 = |\gamma'(t)|^2.$$
Imposing that τ is arc-length, i.e. $|\tilde{\gamma}'(\tau)|^2 = 1$, we get the equation
$$\tau'(t) = |\gamma'(t)|,$$
where we choose the positive square root, consistently with τ being an increasing function of t. Since
we assume that $|\gamma'(t)| > 0$ (regularity of the curve), we get an equation which determines τ as
a function of t, uniquely up to the addition of a constant. □
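The equation $\tau'(t) = |\gamma'(t)|$ says that τ is obtained by integrating the speed. The sketch below does this numerically for the circle of Example 1.3.2, where the exact answer is τ(t) = at if we fix τ(0) = 0. The finite-difference derivative and midpoint-rule integration are choices made here for illustration; the function names are hypothetical.

```python
import math

def speed(gamma, t, h=1e-6):
    """Approximate |gamma'(t)| using a central finite difference."""
    p, q = gamma(t - h), gamma(t + h)
    return math.sqrt(sum(((b - a) / (2 * h)) ** 2 for a, b in zip(p, q)))

def arc_length(gamma, t0, t1, steps=1000):
    """Approximate tau(t1) - tau(t0), the integral of the speed (midpoint rule)."""
    dt = (t1 - t0) / steps
    return sum(speed(gamma, t0 + (k + 0.5) * dt) for k in range(steps)) * dt

a = 2.0   # radius of the circle (illustrative value)

def circle(t):
    return (a * math.cos(t), a * math.sin(t))

quarter = arc_length(circle, 0.0, math.pi / 2)
print(quarter, a * math.pi / 2)   # numerical and exact length of a quarter circle
```

Since the speed of this circle is the constant a, the arc-length parameterization is simply $\tilde{\gamma}(\tau) = (a\cos(\tau/a), a\sin(\tau/a))$.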

1.4. Calculus of variations


The calculus of variations is the art of finding functions that minimize certain functionals.
It may be thought of as an infinite-dimensional version of the ordinary applications of calculus
to find maxima and minima of functions. It is infinite-dimensional in the sense that the
‘independent variables’ are now functions and these live in an infinite-dimensional space.
A typical example would be the following: amongst all regular curves $\gamma : [0, 1] \to \mathbb{R}^n$ joining
p to q, find the one which minimizes the energy
$$E[\gamma] = \frac{1}{2} \int_0^1 |\dot{\gamma}(t)|^2 \, dt. \tag{1.4.1}$$
(Here, for consistency with later sections, we use γ̇ for the derivative of the curve γ.) This is
the main example we shall consider, though it will be substantially generalized when we come
to consider geodesics in general manifolds or space-times.
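To get a feel for the energy (1.4.1) before doing any theory, one can approximate it for a few explicit curves with the same endpoints. The sketch below replaces each curve by a polygon sampled at equally spaced times, which is an approximation chosen here for illustration; the endpoints and the competing curves are hypothetical examples.

```python
import math

def energy(points):
    """Discrete approximation of (1.4.1) for samples at equal time steps on [0, 1]."""
    n = len(points) - 1
    dt = 1.0 / n
    return 0.5 * sum(sum((b - a) ** 2 for a, b in zip(P, Q)) / dt
                     for P, Q in zip(points, points[1:]))

def sample(curve, n=200):
    return [curve(k / n) for k in range(n + 1)]

p, q = (0.0, 0.0), (3.0, 4.0)

def straight(t):
    """Straight line from p to q traversed at constant velocity."""
    return tuple(pi + (qi - pi) * t for pi, qi in zip(p, q))

def wiggly(t):
    """Same endpoints, but with a sideways bump."""
    return (3.0 * t, 4.0 * t + 0.5 * math.sin(math.pi * t))

def uneven(t):
    """Same straight trace as `straight`, but traversed at non-constant speed."""
    return straight(t * t)

E_straight = energy(sample(straight))
E_wiggly = energy(sample(wiggly))
E_uneven = energy(sample(uneven))
print(E_straight, E_wiggly, E_uneven)
```

The constant-velocity straight line has energy $\frac{1}{2}|q - p|^2 = 12.5$, and both competitors come out larger. The third curve shows that the energy depends on the parameterization and not only on the trace of the curve.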
Calculus of variations was treated in a previous course, and we shall remind you of some of
the notation and other facts here.
First of all, the thing being integrated is called the Lagrangian. The simplest
(time-independent) Lagrangians are functions of 2n variables, L = L(x, y), and then the functional to
be minimized/maximized will be
$$F[\gamma] = \int_0^1 L(\gamma(t), \dot{\gamma}(t)) \, dt \quad\text{where } \gamma(0) = p \text{ and } \gamma(1) = q. \tag{1.4.2}$$
Notation 1.4.1. In this section, for consistency with later applications we are going to
denote the components of x and y with upstairs indices. To be clear about this, in the previous
paragraph (taking n = 3 for notational simplicity) x and y both stand for triples of numbers, or
points in a 3D vector space. If we choose a basis to expand these vectors, we would get triples
of real numbers which we are going to denote
$$x = (x^1, x^2, x^3) \quad\text{and}\quad y = (y^1, y^2, y^3).$$
This notation takes some getting used to: here $x^2$ is the second component of x, not the square
of x! With luck you’ll get used to this pretty quickly.
We shall also write $x = (x^a)$, $y = (y^a)$, with the understanding that the index a runs from
1 to n.
Theorem 1.4.2. If γ is a sufficiently smooth curve which is an extremal for (1.4.2), and
the Lagrangian is sufficiently nice, then we have
$$\frac{d}{dt}\frac{\partial L}{\partial y^a} - \frac{\partial L}{\partial x^a} = 0, \qquad x^a(0) = p^a, \quad x^a(1) = q^a. \tag{1.4.3}$$
‘Extremal’ means that γ is a local maximum or minimum of (1.4.2) amongst all curves with
the same endpoints.
It is very important to understand the meaning of the d/dt here. It means that we calculate
the partials of L with respect to the $y^a$, and then we substitute the values $x^a(t)$ for $x^a$ and $\dot{x}^a(t)$ for $y^a$,
before differentiating with respect to t.
These equations (1.4.3) are called the Euler–Lagrange equations. The Lagrangian is often
written as a function $L(x^a, \dot{x}^a)$ and the Euler–Lagrange equations as
$$\frac{d}{dt}\frac{\partial L}{\partial \dot{x}^a} - \frac{\partial L}{\partial x^a} = 0. \tag{1.4.4}$$
With experience, there is no problem with this, except that one has to remember to regard
the $x^a$ and $\dot{x}^a$ as independent variables when computing the partials, and then to regard
$\partial L/\partial \dot{x}^a$ as a function of t before differentiating with respect to t.
Example 1.4.3. Consider the energy functional, whose Lagrangian is
$$L(x, y) = \frac{1}{2}|y|^2 = \frac{1}{2}\left( (y^1)^2 + \cdots + (y^n)^2 \right). \tag{1.4.5}$$
We have
$$\frac{\partial L}{\partial x^a} = 0, \qquad \frac{\partial L}{\partial y^a} = y^a. \tag{1.4.6}$$
Inserting $y^a = \dot{x}^a(t)$ and substituting into (1.4.3), we get the EL equations:
$$\ddot{x}^a = 0. \tag{1.4.7}$$
We learn that $x^a(t) = p^a + v^a t$ where $p^a$ and $v^a$ are constants. Of course, this is just
the parametric form of the straight line through p in the direction v. Note that with this
parameterization, the straight line is traversed at constant velocity v as well. In fact, given
$x^a(1) = q^a$, we have $v^a = q^a - p^a$.
1.4.1. Coordinate invariance. If we have a change of coordinates, z a as an invertible
function of the xa , then the result of minimizing a particular functional should not change. Of
course the explicit formulae giving the z a as a function of t will be different from the formulae
giving the xa as a function of t. The relation will be
z a (t) = z a (x(t)). (1.4.8)
As a telling example, consider the same energy functional in spherical polars (r, θ, ϕ). We
have Z
1  2 
E[γ] = ṙ + r2 (θ̇2 + sin2 θϕ̇2 dt (1.4.9)
2
We calculate
∂L ∂L ∂L
= ṙ, = r2 θ̇, = r2 sin2 θϕ̇. (1.4.10)
∂ ṙ ∂ θ̇ ∂ ϕ̇
and
∂L ∂L ∂L
= r(θ̇2 + sin2 θϕ̇2 ), = r2 sin θ cos θϕ̇2 , = 0. (1.4.11)
∂r ∂θ ∂ϕ
Hence the Euler–Lagrange equations are as follows
r̈ = r(θ̇2 + sin2 θϕ̇2 ), (1.4.12)
d  
r2 θ̇ = r2 sin θ cos θϕ̇2 , (1.4.13)
dt
d 2 2 
r sin θϕ̇ = 0. (1.4.14)
dt
It hardly needs saying that these equations are much more daunting than the same system in
euclidean coordinates. There are, however, some tricks to lead us to a solution, which we shall
discuss next.
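As a sanity check on (1.4.12)–(1.4.14), one can verify numerically that a straight line from Example 1.4.3, rewritten in spherical polars, satisfies them. The following is a minimal sketch in Python. The function names, the step size `h` and the sample line are illustrative choices that do not come from the notes, and the derivatives are approximated by central differences rather than computed exactly.

```python
import math

def spherical(t, p, v):
    """Spherical polars (r, theta, phi) of the point p + v*t on a straight line."""
    x, y, z = (p[i] + v[i] * t for i in range(3))
    r = math.sqrt(x * x + y * y + z * z)
    return r, math.acos(z / r), math.atan2(y, x)

def ddt(f, t, h):
    """Central-difference approximation to df/dt."""
    return (f(t + h) - f(t - h)) / (2 * h)

def el_residuals(t, p, v, h=1e-4):
    """Left-hand side minus right-hand side of (1.4.12), (1.4.13), (1.4.14)."""
    r = lambda s: spherical(s, p, v)[0]
    th = lambda s: spherical(s, p, v)[1]
    ph = lambda s: spherical(s, p, v)[2]
    rdot = lambda s: ddt(r, s, h)
    thdot = lambda s: ddt(th, s, h)
    phdot = lambda s: ddt(ph, s, h)
    # The quantities dL/d(thetadot) and dL/d(phidot) as functions of time.
    p_theta = lambda s: r(s) ** 2 * thdot(s)
    p_phi = lambda s: r(s) ** 2 * math.sin(th(s)) ** 2 * phdot(s)
    sin_t, cos_t = math.sin(th(t)), math.cos(th(t))
    res_r = ddt(rdot, t, h) - r(t) * (thdot(t) ** 2 + sin_t ** 2 * phdot(t) ** 2)
    res_theta = ddt(p_theta, t, h) - r(t) ** 2 * sin_t * cos_t * phdot(t) ** 2
    res_phi = ddt(p_phi, t, h)
    return res_r, res_theta, res_phi

# A sample line that stays away from the origin and the polar axis,
# where the spherical coordinate system is singular.
residuals = el_residuals(0.7, p=(1.0, 2.0, 0.5), v=(0.3, -0.2, 0.4))
print(residuals)  # each entry should be small, limited by finite-difference error
```

The residuals are not exactly zero because of the finite-difference approximation, so they should be read as evidence of consistency rather than as a proof.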

1.4.2. Symmetries imply conservation laws. A Lagrangian is said to have a ‘symmetry’
if it is independent of a coordinate. For example the energy Lagrangian expressed in
euclidean coordinates is independent of all the $x^a$ (it depends only upon the $\dot{x}^a$), while in polar
coordinates it is independent of $\varphi$. By the Euler–Lagrange equations, if $\partial L/\partial x^a = 0$ for a
particular $a$, then $\partial L/\partial \dot{x}^a$ is constant along a solution curve. We also sometimes say that it ‘is a
constant of the motion’.
This principle is very important in many cases, because by knowing that certain quantities
are constant we can render the Euler–Lagrange equations easier to solve.
Another important symmetry principle is the following. Suppose that we have a Lagrangian
of the form
\[ L(x, y) = T(x, y) - U(x, y). \tag{1.4.15} \]
Proposition 1.4.4. Suppose that $L$ is as in (1.4.15) and moreover $T(x, y)$ is homogeneous
of degree 2 in $y$ for each fixed $x$. Then
\[ E(x, y) = T(x, y) + U(x, y) \]
is constant along solutions of the Euler–Lagrange equations (1.4.3).


The physical interpretation is that T (x, ẋ) is the kinetic energy and U (x, ẋ) is the potential
energy. In many physical situations, the kinetic energy is quadratic in the velocities, so satisfies
the assumptions of the Proposition. Then E is the total energy, and the result is that the total
energy (kinetic + potential) is conserved.
Proof. See Problem 1.11. □
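For orientation, here is a sketch of how such an argument can go. It is written under an extra assumption that is not part of the statement of Proposition 1.4.4, namely that $U$ depends on $x$ alone, and it uses Euler's identity for homogeneous functions. It is offered as an illustration, not as the intended solution of Problem 1.11.

```latex
% Assumptions for this sketch: U = U(x) only; T(x, y) homogeneous of degree 2 in y.
% Euler's identity for T, and its consequence for L = T - U:
\sum_a y^a \frac{\partial T}{\partial y^a} = 2T
\quad\Longrightarrow\quad
\sum_a y^a \frac{\partial L}{\partial y^a} - L = 2T - (T - U) = T + U = E.

% Differentiate along a solution x(t), with y = \dot{x}:
\frac{dE}{dt}
  = \sum_a \left( \ddot{x}^a \frac{\partial L}{\partial \dot{x}^a}
      + \dot{x}^a \frac{d}{dt}\frac{\partial L}{\partial \dot{x}^a} \right)
  - \sum_a \left( \frac{\partial L}{\partial x^a}\,\dot{x}^a
      + \frac{\partial L}{\partial \dot{x}^a}\,\ddot{x}^a \right)
  = \sum_a \dot{x}^a \left( \frac{d}{dt}\frac{\partial L}{\partial \dot{x}^a}
      - \frac{\partial L}{\partial x^a} \right) = 0.
```

The last bracket vanishes by the Euler–Lagrange equations, and the step computing $dL/dt$ uses the fact that $L$ has no explicit dependence on $t$.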
Example 1.4.5. Let us see how to use these ideas to solve the system (1.4.12)–(1.4.14), or at least
how to obtain some of the solutions. We use the principles we’ve just discovered: along a
solution, the energy itself is constant,
\[ \dot{r}^2 + r^2(\dot{\theta}^2 + \sin^2\theta\,\dot{\varphi}^2) = 2E \tag{1.4.16} \]
and because $L$ is independent of $\varphi$ the angular momentum
\[ J = r^2\sin^2\theta\,\dot{\varphi} \tag{1.4.17} \]
is also constant. The full analysis is complicated, but we see that if $\theta = \pi/2$, then the $\theta$-equation
is satisfied, so we can start by considering those solutions with $\theta = \pi/2$ identically.¹ (Note that
$\theta = 0$ or $\pi$ are not such good choices, as these values of $\theta$ are singularities of the coordinate
system.)
So, let us substitute $\theta = \pi/2$ and see what we’re left with:
\[ \dot{r}^2 + r^2\dot{\varphi}^2 = 2E, \qquad J = r^2\dot{\varphi} \tag{1.4.18} \]
are both constants.
We now distinguish two cases: $J = 0$, ‘radial’, and $J \neq 0$, ‘non-radial’. If $J = 0$, then either
$r = 0$, which is not terribly interesting, or $\dot{\varphi} = 0$ and then $\dot{r} = \pm\sqrt{2E}$. Taking the positive
root, what we have found is the path $\gamma(t)$ given by
\[ r = \sqrt{2E}\,t, \qquad \theta = \pi/2, \qquad \varphi = \varphi_0. \]
This is a straight line emanating from the origin at $t = 0$ in the equatorial plane going through
the line of longitude $\varphi = \varphi_0$.
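In the non-radial case (perhaps with the benefit of hindsight) it turns out to be better to
try to find $u = 1/r$ as a function of $\varphi$. We use
\[ \frac{du}{d\varphi} = \frac{\dot{u}}{\dot{\varphi}}. \]
So divide (1.4.18) by $\dot{\varphi}^2$, getting
\[ \left( \frac{dr}{d\varphi} \right)^2 + r^2 = \frac{2E}{\dot{\varphi}^2}. \tag{1.4.19} \]
Thus if $u = 1/r$,
\[ \frac{1}{u^4}\left( \frac{du}{d\varphi} \right)^2 + r^2 = \frac{2E}{\dot{\varphi}^2} \tag{1.4.20} \]
so multiplying up by $u^4$,
\[ \left( \frac{du}{d\varphi} \right)^2 + u^2 = \frac{2E}{r^4\dot{\varphi}^2} = \frac{2E}{J^2}. \tag{1.4.21} \]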
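This can be integrated to obtain the equation of a non-radial, equatorial geodesic.
The quickest way is to differentiate with respect to $\varphi$, which gives
$2\,\frac{du}{d\varphi}\left( \frac{d^2u}{d\varphi^2} + u \right) = 0$, so that wherever $du/d\varphi \neq 0$,
\[ \frac{d^2u}{d\varphi^2} + u = 0. \]
Hence $u = u_0\cos(\varphi - \varphi_0)$ for constants $u_0$ and $\varphi_0$; substituting back into (1.4.21) shows
that $u_0^2 = 2E/J^2$. This is the equation of a straight line in polar coordinates, consistent with
Example 1.4.3.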
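The claim that $u = u_0\cos(\varphi - \varphi_0)$ describes a straight line can be checked numerically. The sketch below is an illustration only: the function name and the sample values of $E$, $J$ and $\varphi_0$ are assumptions made for the example. It evaluates the residual of (1.4.21) and the projection of the point $(x, y) = (r\cos\varphi, r\sin\varphi)$ onto the direction $\varphi_0$, which should be the same for every point of a straight line whose normal points in that direction.

```python
import math

def line_check(E, J, phi0, phi):
    """For u = u0*cos(phi - phi0), return the residual of (1.4.21) and the
    projection of the corresponding point onto the direction phi0."""
    u0 = math.sqrt(2 * E) / abs(J)       # amplitude fixed by (1.4.21)
    u = u0 * math.cos(phi - phi0)
    du_dphi = -u0 * math.sin(phi - phi0)
    residual = du_dphi ** 2 + u ** 2 - 2 * E / J ** 2
    r = 1 / u                            # requires cos(phi - phi0) > 0
    x, y = r * math.cos(phi), r * math.sin(phi)
    projection = x * math.cos(phi0) + y * math.sin(phi0)
    return residual, projection

for phi in (0.0, 0.5, 1.0):
    print(line_check(E=0.5, J=2.0, phi0=0.3, phi=phi))
```

With these sample values $u_0 = 1/2$, so every point should project to $1/u_0 = 2$ (up to rounding): the curve is the line at distance $1/u_0$ from the origin.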
In Problem 1.10, you are encouraged to work through the slightly more involved problem of
finding the orbit of a particle around a heavy star, using the same ideas.
¹ Actually, if we have any solution curve, then by using the spherical symmetry of the problem and re-orienting
our coordinate axes, we can reduce to this case.


Remark 1.4.6. In the calculus of variations, you can consider more general time-dependent
Lagrangians, i.e. functions L(x, y, t). While the EL equations continue to give extrema in this
case also, Proposition 1.4.4 is false in general.


CHAPTER 2

Minkowski Spacetime and Special Relativity

2.1. Introduction
In this chapter we try to explain how Newton’s basic laws had to be changed to take into
account the famous ‘null result’ of the Michelson–Morley experiment which appeared to show
that the speed of light was the same independent of the motion of the light-source.
In newtonian physics, there are such things as particles, masses and so on. The basic tenet
of newtonian physics, in modern language, is that there are inertial frames of reference, and an
observer at rest in such a frame observes free particles to remain at rest or travel at constant
speeds. S/he also observes the famous law F = ma to hold for masses acted upon by forces.
These inertial frames are such that if R is such a frame and R′ is moving with constant velocity
(i.e. constant speed and direction) with respect to R, then R′ is also an inertial frame. (See
Woodhouse, Special Relativity, Ch. 1 for more details. Those with an interest in history may
be interested in the end of Sect. 1.4, which indicates that Newton was aware of the idea of
relativity, but preferred to present things in a different way.)
Suppose that Alice and Bob are observers at rest respectively in R and R′ , and that R′ is
moving at constant speed relative to R. With suitable orientation of the axes, we’d expect
\[ x' = x - vt, \qquad y' = y, \qquad z' = z \tag{2.1.1} \]
where $v$ is the relative speed of the frames. (If Bob is sitting at $(x', y', z') = (0, 0, 0)$, then Alice
gives Bob’s coordinates as $x = vt$, $y = z = 0$.)
This transformation (often known as the Galilean transformation between inertial frames)
is clearly incompatible with the idea that the speed of light is the same in all inertial frames.
The idea that this should be the case—and more generally that all physics should appear the
same to all (inertial) observers—is the first of Einstein’s famous postulates:
1. The laws by which the states of physical systems undergo change are not affected,
whether these changes of state be referred to the one or the other of two systems of
coordinates in uniform translatory motion.
2. As measured in any inertial frame of reference, light is always propagated in empty
space with a definite velocity c that is independent of the state of motion of the emitting
body.
In slogan form:
1. The laws of physics are the same in all inertial frames of reference.
2. The speed of light in free space has the same value c in all inertial frames of reference.
If the laws of physics are the same for any inertial frame, then it follows that no experiment
can be performed that will single out one frame as preferred above all others. In particular,
there can be no ‘absolute standard of rest’ and only relative motion is physically meaningful.
Note that the constancy of the speed of light means that something has to go wrong with
the ‘obvious’ change of coordinates (2.1.1). More explicitly, this change of coordinates is not
compatible with Maxwell’s equations of electrodynamics.
It is convenient to make certain subsidiary assumptions explicit as well:
P1 Free particles and photons (light particles) appear to inertial observers to travel in
straight lines at constant speeds.
P2 Photons appear to travel at the same speed c to all inertial observers.
P3 The standard clock of one inertial observer appears to any other observer to run at a
constant rate.
P4 Free particles cannot travel faster than c.


Note that in P3, although we assume that all clocks are observed by inertial observers to
run at constant rates, we do not assume that these rates are all the same: if Alice and Bob
are inertial observers moving relative to each other and are carrying identical clocks, Alice will
observe the ‘ticks’ of Bob’s clock to be at equal intervals as measured by her clock. However,
the intervals between these two sets of ‘ticks’ are not assumed to be the same.
Consideration of Maxwell’s equations, and of the wave equation satisfied by EM radiation,
suggests that it is sensible to consider a 4-dimensional vector space as the natural arena for
physics. We need to give this space a structure which allows us to capture the constancy of the
speed of light.
Definition 2.1.1. Minkowski Spacetime $\mathbb{M}$ (or sometimes just Minkowski Space) is a
four-dimensional affine space equipped with a Lorentzian symmetric bilinear form $\eta$.
‘Lorentzian’ means non-degenerate, signature $(+, -, -, -)$.
The points of $\mathbb{M}$ are called events (as they are localized in both space and time). If we
pick a particular event $E$ in $\mathbb{M}$, we denote the set of position vectors $\overrightarrow{EP}$, $P \in \mathbb{M}$, by $M$ (or
$M_E$). For the most part we can blur the distinction between $\mathbb{M}$ and $M$ (it is, after all, just
the choice of an event) but for a complete understanding, it’s worth making the effort to keep
them separate.
Having chosen $E$ to be an origin, we can now choose a basis $(e_0, e_1, e_2, e_3)$ of $M$ and expand
the position vector of the general point $P$ as a combination
\[ \overrightarrow{EP} = ct\,e_0 + x e_1 + y e_2 + z e_3 \tag{2.1.2} \]
for real numbers $(t, x, y, z)$. We may suppose that $\eta$ is diagonal with respect to this basis, with
one $1$ and three $-1$s on the diagonal, so that
\[ \eta(\overrightarrow{EP}, \overrightarrow{EP}) = c^2t^2 - x^2 - y^2 - z^2. \]
(Here c is going to be the speed of light and is included for dimensional consistency to convert
the time coordinate t into a quantity with the units of length.) We interpret t and (x, y, z)
as respectively the time and space (or temporal and spatial, if you want to use the correct
adjectives) coordinates. Then
\[ \eta(\overrightarrow{EP}, \overrightarrow{EP}) = 0 \iff c^2t^2 = x^2 + y^2 + z^2 \]
and this is the case if and only if something travelling at speed c, starting from (0, 0, 0) at time
0 reaches (x, y, z) at time t.
If we take $c$ to be the speed of light, then $M$ with
its bilinear form $\eta$ has the geometry of ‘light-rays’ built into it:
two events $E$ and $F$ can be connected by a light-ray if and only if
\[ \eta(\overrightarrow{EF}, \overrightarrow{EF}) = 0. \]
In what follows, we answer the various questions about how P1–P4 are captured by the
geometry of M and η.
From now on we usually take the speed of light c to be equal to 1.
This can be regarded as a choice of units. (For example, if distance is measured in light-
years, and time in years, then c = 1.) In any given formula, you can always see where the
factors of c should go by dimensional analysis.

2.2. What are the inertial coordinate systems?


Hypothesis 2.2.1. The inertial coordinate systems are those obtained as above by fixing
a particular event $E$ as an origin and introducing coordinates corresponding to a basis
$(e_0, e_1, e_2, e_3)$ of $M$ with respect to which the components of $\eta$ are
\[ \eta(e_0, e_0) = 1, \quad \eta(e_1, e_1) = \eta(e_2, e_2) = \eta(e_3, e_3) = -1, \quad \eta(e_a, e_b) = 0 \text{ for } a \neq b. \tag{2.2.1} \]


Note that there are many choices of inertial coordinate systems. From the mathematical
point of view, this is because there are many different choices of basis of M with respect to
which η takes standard diagonal form. From the physical point of view this is because there are
many different inertial observers, all on an equal footing.
Remark 2.2.2. Note that if $X$ and $Y$ are two vectors in $M$, and if $(e_0, e_1, e_2, e_3)$ is a basis
as in (2.2.1) then if
\[ X = X^0 e_0 + X^1 e_1 + X^2 e_2 + X^3 e_3, \qquad Y = Y^0 e_0 + Y^1 e_1 + Y^2 e_2 + Y^3 e_3 \]
we have
\[ \eta(X, Y) = X^0Y^0 - X^1Y^1 - X^2Y^2 - X^3Y^3. \]
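The component formula of Remark 2.2.2 translates directly into code. The following minimal sketch uses illustrative names (`eta`, `classify`) that are not from the notes and represents a vector by its four components in an inertial basis. The label "spacelike" for the case $\eta(X, X) < 0$ is standard terminology but has not been introduced in the text so far, so treat it as an assumption of this example.

```python
def eta(X, Y):
    """Minkowski bilinear form with signature (+, -, -, -), applied to the
    components of X and Y in an inertial basis (units with c = 1)."""
    return X[0] * Y[0] - X[1] * Y[1] - X[2] * Y[2] - X[3] * Y[3]

def classify(X):
    """Classify a vector by the sign of eta(X, X)."""
    s = eta(X, X)
    if s > 0:
        return "timelike"
    if s < 0:
        return "spacelike"   # standard term, assumed here
    return "null"

print(eta((1, 2, 3, 4), (5, 6, 7, 8)))
print(classify((1, 1, 0, 0)), classify((2, 1, 0, 0)))
```

Integer components are used in the examples so that the test for a null vector is exact; with floating-point components a tolerance would be needed.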

2.3. Worldlines
Recall that anything—particle, observer, photon—which exists for an extended period of
time, is described in Minkowski spacetime by a worldline. This is a curve in M consisting of all
the events through which our particle, observer, photon passes.
For example, suppose Alice is at rest at the spatial origin of an inertial coordinate system
$(t, x, y, z)$. The events on Alice’s worldline have coordinates of the form $(t, 0, 0, 0)$, $t$ being the
time on the clock that Alice has beside her.
More generally, Alice observes a particle by noting its (x, y, z) coordinates for different times
t. In other words she observes the particle’s world-line in the form of a curve
Γ(t) = (t, x(t), y(t), z(t))
in the given coordinates.
It is often useful to ‘decouple’ the parameter which parameterises the curve from an ob-
server’s time coordinate, replacing the above by the more general form
Γ(τ ) = (t(τ ), x(τ ), y(τ ), z(τ ))
so that all 4 coordinates depend on the parameter τ .
Example 2.3.1. If Alice is an inertial observer who sets up an inertial coordinate system
as above with herself at the (spatial) origin x = y = z = 0, then her worldline will be
t(τ ) = τ, x(τ ) = y(τ ) = z(τ ) = 0.
Definition 2.3.2. For the worldline Γ(τ ) of a particle, observer or photon in M, dΓ/dτ is
called the velocity 4-vector.
The use of the term 4-vector is traditional. It helps to distinguish this vector from ‘ordinary’
velocity vectors: e.g. the velocity vector of a particle as measured by an observer.
Note that in terms of the original parameterization, $\Gamma(t) = (t, x(t), y(t), z(t))$,
\[ \frac{d\Gamma}{dt} = \left( 1, \frac{dx}{dt}, \frac{dy}{dt}, \frac{dz}{dt} \right) \]
and the spatial part of this is the 3-vector
\[ \left( \frac{dx}{dt}, \frac{dy}{dt}, \frac{dz}{dt} \right) \]
which is the instantaneous velocity of the particle as calculated by Alice when her clock says time
$t$.
For now, we shall mainly be concerned with straight, constant-speed worldlines: i.e. where
Γ has the form
Γ(τ ) = X + V τ (2.3.1)
where $X$ and $V$ are constant vectors in $M$. (Here again we are regarding the affine space $\mathbb{M}$ and
the vector space $M$ as the same, by choosing an event $E$ of $\mathbb{M}$ to correspond to the zero vector in $M$.)
We now have to see how P2 and P4 are to be interpreted: photons travel at speed c = 1 as
measured by any inertial observer, and no particle is ever observed to travel faster than light.


2.3.1. What are the photon worldlines? Suppose that a photon is emitted by a laser
at the event with coordinates $(0, 0, 0, 0)$ and passes through the event $(t, x, y, z)$, relative to the
above inertial coordinate system. In other words, at the later time $t$, its spatial coordinates are
$(x, y, z)$. Then the distance covered is $\sqrt{x^2 + y^2 + z^2}$, but this must be equal to $t$ as the speed
is 1. In particular, if $E$ and $P$ are two events on the worldline of a (free) photon, then $\overrightarrow{EP}$ is
null in the sense that $\eta(\overrightarrow{EP}, \overrightarrow{EP}) = 0$. [NB, a null vector need not be the zero vector!]
The following definition is useful:
Definition 2.3.3. Two events $P$ and $Q$ are null-separated if the displacement vector
$X = \overrightarrow{PQ}$ is null, i.e. $\eta(X, X) = 0$.
Remark 2.3.4. This definition depends only upon the events P and Q, and the form η; it
does not depend upon any choice of inertial basis or coordinate system.
To flesh this remark out: we saw by calculation in a particular inertial frame that if $P$ and
$Q$ are two events on a photon worldline, then $\overrightarrow{PQ}$ is a null vector. But the latter is a statement
purely about the geometry of $\mathbb{M}$: it uses only the basic facts that given any two events we have
a displacement vector, and that we can feed vectors to $\eta$. In particular all inertial observers
agree about when a pair of events are null-separated, and hence the speed of light is the same
for all such observers.
This leads us to the following
Hypothesis 2.3.5. The worldline of a photon has the form
Γ(τ ) = X + N τ, (2.3.2)
where X and N are constant vectors and N is null, i.e. η(N, N ) = 0.
This hypothesis is justified, to some extent, by the following:
Proposition 2.3.6. If $P_1$ and $P_2$ are any events on the worldline (2.3.2), then $P_1$ and $P_2$
are null-separated.
Proof. If $P_1$ and $P_2$ correspond to parameter values $\tau_1$ and $\tau_2$, then
\[ \overrightarrow{P_1P_2} = (X + N\tau_2) - (X + N\tau_1) = (\tau_2 - \tau_1)N. \tag{2.3.3} \]
Now, by the bilinearity of $\eta$,
\[ \eta((\tau_2 - \tau_1)N, (\tau_2 - \tau_1)N) = (\tau_2 - \tau_1)^2\,\eta(N, N) = 0. \tag{2.3.4} \]
□

In summary, we have seen that if inertial coordinate systems are defined as in Hypothe-
sis 2.2.1 and free photon worldlines are as in Hypothesis 2.3.5, then all inertial observers agree
on the speed at which photons travel.

2.3.2. What are the free particle worldlines? Return to the two events $E$ and $P$ at
the beginning of the previous section, and suppose now that they are on the worldline of a
particle travelling at uniform speed $v$ with $0 \le v < 1$. Then we must have
\[ \sqrt{x^2 + y^2 + z^2} = |vt| < |t| \tag{2.3.5} \]
and so
\[ t^2 - x^2 - y^2 - z^2 > 0. \tag{2.3.6} \]
Definition 2.3.7. A vector $X \in M$ is timelike if $\eta(X, X) > 0$. Two events $P$ and $Q$ are
timelike separated if the displacement vector $\overrightarrow{PQ}$ is timelike.
We now make the free-particle hypothesis:


Hypothesis 2.3.8. The worldline of a free particle has the form

Γ(τ ) = X + V τ (2.3.7)

where X and V are constant vectors of M and V is timelike. The parameter τ is called proper
time if η(V, V ) = 1. In this case, if P1 and P2 are two events on the worldline with parameter
values τ1 and τ2 respectively, then τ2 − τ1 is interpreted as the elapsed time between these two
events as measured by a clock carried by an observer on this worldline.

Remark 2.3.9. As for null-vectors, the notion of a vector being time-like is independent of
any choice of observer or coordinate system. In particular if one inertial observer thinks that a
particle is travelling at speed less than that of light, all observers will agree on this.

Remark 2.3.10. Note the analogy between a curve being parameterized by proper time
here and the idea of unit-speed curves being parameterized by arc-length for ‘ordinary’ curves
in euclidean space.

As in the case of photon worldlines, we started in Alice’s coordinate system $(t, x, y, z)$, and
calculated that the events $E$ and $P$ are on the worldline of a particle moving at speed $< 1$ if
(and only if) the displacement vector $\overrightarrow{EP}$ is timelike. This, however, is a statement which is
independent of any particular choice of inertial coordinate system. Thus it must be the case
that Bob, with an inertial coordinate system $(t', x', y', z')$, will also calculate that $E$ and $P$ are
events on the worldline of a particle moving at speed less than 1.

2.4. Why do clocks carried by inertial observers all go at uniform rates?


Let us remember that the postulate P3 states that if Alice and Bob are inertial observers,
possibly in relative motion, then if Alice looks at Bob’s clock, she will see that it is ticking at
a uniform rate, but that this rate may be different from the rate at which her own (identical)
clock is ticking.
Suppose that Bob has worldline
Γ(τ ) = V τ (2.4.1)
(so that he passes through the event E at parameter value τ = 0). Recall the hypothesis that
τ is proper time (i.e. the time as measured on the clock he’s carrying with him) if η(V, V ) = 1.
In Alice’s coordinate system $(t, x, y, z)$, this has the form
\[ t(\tau) = V^0\tau, \quad x(\tau) = V^1\tau, \quad y(\tau) = V^2\tau, \quad z(\tau) = V^3\tau \tag{2.4.2} \]
where
\[ (V^0)^2 - (V^1)^2 - (V^2)^2 - (V^3)^2 = 1. \tag{2.4.3} \]
In particular $V^0 \neq 0$ and we find that the time $\tau$ measured on Bob’s clock is related to Alice’s time
coordinate $t$ by the fixed multiple $V^0$, that is, $t = V^0\tau$. So with all our hypotheses made about what
inertial frames are, we see that each of P1–P4 is now satisfied.

2.5. Spacetime diagrams


Minkowski spacetime can be pictured by suppressing one or two of the spatial dimensions
and drawing a picture with time going up the page and x or x and y going across.
Suppressing y and z, leaving just a space variable x and a time variable t in play, a typical
spacetime diagram is shown below. I’ve drawn in the worldlines of two free massive particles,
two photon worldlines and the axes of two inertial coordinate systems.
Note that the photon worldlines are at 45° and that the free particles have worldlines inclined
at less than 45° to the vertical.


[Figure: spacetime diagram showing the t-axis and x-axis of one inertial coordinate system, the t′-axis and x′-axis of another, two photon worldlines and two free-particle worldlines.]

2.6. Time dilatation—‘moving clocks run slowly’


The calculation of Section 2.4 allows us to understand and quantify the sense in
which moving clocks run slowly. We imagine our two inertial observers, Alice and Bob, with
their identical clocks, one moving relative to the other. The slogan ‘moving clocks run slowly’
actually means that Alice will observe Bob’s clock running more slowly than her clock (and
symmetrically, Bob will observe Alice’s clock running slowly, by the same factor).
Based on our hypotheses, we can work out exactly what is going on here. In Section 2.4,
Alice set up inertial coordinates $(t, x, y, z)$, with her worldline being
$x = y = z = 0$. The time coordinate $t$ is time as measured on her clock.
We saw that if Bob’s worldline is $\Gamma(\tau) = V\tau$, with $\eta(V, V) = 1$, then by (2.4.2) the components
$(v^1, v^2, v^3)$ of Bob’s velocity with respect to Alice’s coordinates are related to Bob’s 4-velocity
vector by
\[ v^i = V^i/V^0, \qquad (i = 1, 2, 3). \]
At this point it is convenient to use ordinary euclidean 3-dimensional vector notation $\mathbf{v}$ for the
three-dimensional vector with components $(v^1, v^2, v^3)$. We can write
\[ (V^0, V^1, V^2, V^3) = V^0(1, \mathbf{v}) \]
from which
\[ \eta(V, V) = (V^0)^2(1 - |\mathbf{v}|^2) = 1. \]
Thus
\[ V^0 = \frac{1}{\sqrt{1 - |\mathbf{v}|^2}}. \]
The RHS here is usually denoted by $\gamma(v)$ (where as usual $v = |\mathbf{v}|$),
\[ \gamma(v) = \frac{1}{\sqrt{1 - |\mathbf{v}|^2}}. \tag{2.6.1} \]
This has to be replaced by
\[ \gamma(v) = \frac{1}{\sqrt{1 - |\mathbf{v}|^2/c^2}} \tag{2.6.2} \]
in units in which $c \neq 1$. Notice that
\[ \gamma(v) \ge 1 \tag{2.6.3} \]
with equality if and only if $v = 0$. Moreover $\gamma(v) \to \infty$ as $v \to c$.
Now the relation
\[ t = V^0\tau = \gamma(v)\tau \tag{2.6.4} \]
encodes the time dilatation or ‘moving clocks run slowly’: indeed, if $P_1$ and $P_2$ are two events
on Bob’s worldline which occur at parameter values $\tau_1$ and $\tau_2$, then he reckons that the time
difference is $\tau_2 - \tau_1$. Alice, however, reckons that the time difference between these two events
is $\gamma(v)(\tau_2 - \tau_1)$, so the time elapsed is greater, according to Alice, by a factor of $\gamma(v)$.
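To get a feel for the size of the effect, formulas (2.6.2) and (2.6.4) can be evaluated directly. The sketch below is illustrative: the function names and the sample speed are assumptions made for this example, and `c` defaults to 1 in line with the convention adopted in the notes.

```python
import math

def gamma(v, c=1.0):
    """Lorentz factor (2.6.2) for a speed v with 0 <= v < c."""
    if not 0 <= v < c:
        raise ValueError("speed must satisfy 0 <= v < c")
    return 1.0 / math.sqrt(1.0 - (v / c) ** 2)

def alice_elapsed(tau_bob, v, c=1.0):
    """Time Alice assigns, via (2.6.4), to an interval tau_bob on Bob's clock."""
    return gamma(v, c) * tau_bob

print(gamma(0.6))               # Lorentz factor at 60% of the speed of light
print(alice_elapsed(4.0, 0.6))  # Alice's reckoning of 4 units of Bob's proper time
```

At $v = 0.6$ the factor is $1/\sqrt{1 - 0.36} = 1/0.8 = 1.25$, so 4 units of Bob's proper time correspond to 5 units of Alice's coordinate time.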

2.7. Simultaneity and distance


We have hypothesized that the coordinates obtained by diagonalizing $\eta$ are the inertial
coordinates introduced by inertial observers, and that unit-speed straight lines are worldlines
parameterized by proper time.
A natural question, though, is how an observer would actually set up such
coordinates without appealing to an absolute standard of rest. So suppose Alice wants to set
up such coordinates.
Suppose that $F$ is any event in $\mathbb{M}$. Alice is travelling on her straight worldline, which doesn’t
pass through $F$. She sends out light signals and lets them scatter off $F$. She finds that the signal
emitted at time $\tau_1$ on her clock scatters off $F$ and is picked up by her at time $\tau_2$. She infers
two things: that the distance to $F$ is $c(\tau_2 - \tau_1)/2$, and that $F$ should have time coordinate
$\frac{1}{2}(\tau_1 + \tau_2)$. This is the radar method of assigning times and measuring distances.
So using only allowable methods, she assigns a position and time coordinate to $F$.
Let’s see how all this looks in terms of trajectories and worldlines.
If Alice’s trajectory is $\Gamma(\tau) = U\tau$, and $F$ has position vector $Y$ relative to the chosen origin,
the photon trajectory is
\[ \sigma \mapsto U\tau_1 + N\sigma \]
on the outward leg and
\[ \sigma' \mapsto U\tau_2 + N'\sigma' \]
on the return leg. Here $N$ and $N'$ are null vectors, and we may suppose that the parameters $\sigma$
and $\sigma'$ are chosen so that these trajectories hit $Y$ at $\sigma = 1$, $\sigma' = 1$:
\[ U\tau_1 + N = Y = U\tau_2 + N'. \tag{2.7.1} \]
Let $E$ be the event on Alice’s worldline at parameter value $\frac{1}{2}(\tau_1 + \tau_2)$. The displacement
vector from $E$ to $F$ is
\[ X := \overrightarrow{EF} = Y - \tfrac{1}{2}(\tau_1 + \tau_2)U = \tfrac{1}{2}(\tau_2 - \tau_1)U + N' = \tfrac{1}{2}(\tau_1 - \tau_2)U + N. \tag{2.7.2} \]
We also assume $\eta(U, U) = 1$ so that Alice’s worldline is parameterised by her proper time $\tau$.
Proposition 2.7.1. $\eta(U, \overrightarrow{EF}) = 0$.

Proof. From (2.7.1),
\[ U(\tau_1 - \tau_2) = N' - N. \tag{2.7.3} \]
Take the $\eta$-inner product of this with $N$ and with $N'$ to get
\[ (\tau_1 - \tau_2)\,\eta(U, N) = \eta(N, N') \tag{2.7.4} \]
\[ (\tau_1 - \tau_2)\,\eta(U, N') = -\eta(N, N') \tag{2.7.5} \]
where we’ve used
\[ \eta(N, N) = \eta(N', N') = 0. \]
We can get our hands on $\eta(N, N')$ by squaring (2.7.3) and using $\eta(U, U) = 1$:
\[ \eta(N' - N, N' - N) = (\tau_1 - \tau_2)^2\,\eta(U, U) \quad\text{so}\quad -2\eta(N, N') = (\tau_1 - \tau_2)^2. \tag{2.7.6} \]
Combining with (2.7.4),
\[ \eta(U, N) = -\tfrac{1}{2}(\tau_1 - \tau_2). \tag{2.7.7} \]


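Now we calculate
\[ \eta(U, \overrightarrow{EF}) = \eta\left( U, N + \tfrac{1}{2}(\tau_1 - \tau_2)U \right) = \eta(U, N) + \tfrac{1}{2}(\tau_1 - \tau_2) = -\tfrac{1}{2}(\tau_1 - \tau_2) + \tfrac{1}{2}(\tau_1 - \tau_2) = 0 \]
as required. □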
If you didn’t like that proof, here is another. There are often several different ways to
accomplish the same thing.
Proof. The idea of this proof is to write everything in terms of the null vectors $N$ and $N'$.
It is perhaps a more symmetrical proof than the previous one. From (2.7.3), we obtain
\[ U = \frac{N' - N}{\tau_1 - \tau_2}. \tag{2.7.8} \]
We also have the two formulae for $\overrightarrow{EF}$ in (2.7.2). Adding these, we get
\[ 2\overrightarrow{EF} = N + N'. \tag{2.7.9} \]
Now we calculate
\[ \eta(U, \overrightarrow{EF}) = \frac{1}{2(\tau_1 - \tau_2)}\,\eta(N + N', N' - N) = 0 \tag{2.7.10} \]
by using again the bilinearity of $\eta$ to expand the RHS, together with $\eta(N, N) = \eta(N', N') = 0$. □
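[Figure: the radar method. Alice’s worldline Γ(τ) = Uτ with the events Uτ1, E and Uτ2 marked on it, the event F off the worldline, and the null vectors N and N′.]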
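The radar method can be carried out numerically. Requiring $Y - U\tau$ to be null gives a quadratic in $\tau$, namely $\tau^2 - 2\tau\,\eta(U, Y) + \eta(Y, Y) = 0$ when $\eta(U, U) = 1$, whose two roots are the emission and reception times. The sketch below follows that route. The function names and sample vectors are assumptions made for this example, and units with $c = 1$ are used.

```python
import math

def eta(X, Y):
    """Minkowski bilinear form, signature (+, -, -, -), in inertial components."""
    return X[0] * Y[0] - X[1] * Y[1] - X[2] * Y[2] - X[3] * Y[3]

def radar_times(U, Y):
    """Emission and reception times (tau1, tau2) on the worldline tau -> U*tau
    for light signals scattering off the event with position vector Y.
    Assumes eta(U, U) = 1."""
    b = eta(U, Y)
    root = math.sqrt(max(b * b - eta(Y, Y), 0.0))
    return b - root, b + root

def radar_coordinates(U, Y):
    """Radar time, radar distance and the displacement vector EF of (2.7.2)."""
    tau1, tau2 = radar_times(U, Y)
    t = 0.5 * (tau1 + tau2)
    distance = 0.5 * (tau2 - tau1)
    EF = tuple(Y[i] - t * U[i] for i in range(4))
    return t, distance, EF

U = (1.25, 0.75, 0.0, 0.0)     # unit timelike: gamma(0.6) * (1, 0.6, 0, 0)
Y = (3.0, 1.0, 2.0, 0.0)
t, distance, EF = radar_coordinates(U, Y)
print(t, distance, eta(U, EF), -eta(EF, EF))
```

For these sample vectors $\eta(U, \overrightarrow{EF})$ vanishes up to rounding, as Proposition 2.7.1 asserts, and $-\eta(\overrightarrow{EF}, \overrightarrow{EF})$ agrees with the square of the radar distance $(\tau_2 - \tau_1)/2$.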

Definition 2.7.2. If Alice is moving uniformly with 4-velocity vector $U$, then she reckons
two events $E$ and $F$ to be simultaneous if
\[ \eta(U, \overrightarrow{EF}) = 0. \]
These events then have a well-defined spatial separation $d$, where
\[ d^2 = -\eta(\overrightarrow{EF}, \overrightarrow{EF}). \]
However we look at it, the key point is that if we have two observers, Alice and Bob, moving
relative to each other, then they will generally disagree about which pairs of distant events are
simultaneous.
From the mathematical or geometric point of view, they have different 4-velocity vectors $U$
and $V$. If $F$ and $G$ are distant events, Alice thinks they are simultaneous if $\eta(U, \overrightarrow{FG}) = 0$, while
Bob thinks they are simultaneous if $\eta(V, \overrightarrow{FG}) = 0$. These are different conditions, and if one is
satisfied, then there is no guarantee that the other one will also be.
Definition 2.7.3. Let $P$ and $Q$ be two particles. If Alice is an inertial observer with
4-velocity $U$, she measures the distance between these two particles at time $\tau$ on her clock by
• Finding events $F$ on the worldline of $P$ and $G$ on the worldline of $Q$ which are simultaneous
with the event $E$ at time $\tau$ on her worldline; in other words, finding $F$ and $G$
such that $\eta(U, \overrightarrow{EF}) = 0$, $\eta(U, \overrightarrow{EG}) = 0$.
• Calculating the distance as
\[ d = \sqrt{-\eta(\overrightarrow{FG}, \overrightarrow{FG})}. \]

2.8. Length contraction


Let’s push this operational definition of distance or length to see how an inertial observer will
measure the length of a moving rod. Suppose that we have a rod of length $d$, whose endpoints
have worldlines
\[ \alpha(\tau) = V\tau, \qquad \beta(\tau') = D + V\tau' \]
where $\eta(V, V) = 1$ and $\eta(D, V) = 0$. The length of the rod should be defined as the length of the rod
as measured by an observer at rest with respect to the rod. For such an observer, with 4-velocity $V$, we ask:
which pairs of events $\alpha(\tau)$ and $\beta(\tau')$ are simultaneous? Plugging in the definition we need
\[ \eta(V, D + V(\tau' - \tau)) = \eta(V, D) + \tau' - \tau = 0. \]
By assumption $\eta(V, D) = 0$, so we get 0 if and only if $\tau' = \tau$. For these simultaneous events
$E = \alpha(\tau)$ and $F = \beta(\tau)$, the relative position vector $\overrightarrow{EF}$ is equal to $D$, independently of $\tau$.
So the length of the rod is $\sqrt{-\eta(D, D)}$.
To be more concrete, in the rest-frame of the rod, if it is lying along the x-axis, we’d have
α(τ ) = (τ, 0, 0, 0), β(τ ) = (τ, d, 0, 0).
If Alice’s worldline is $\Gamma(\tau) = U\tau$, to measure the length, she has to find events $E$ and $F$ on
the two worldlines that she considers to be simultaneous.
If these are $V\tau$ and $D + V\tau'$, then the displacement vector is
\[ X = D + V(\tau' - \tau). \tag{2.8.1} \]
To satisfy the simultaneity condition (Definition 2.7.2) we need to solve
\[ \eta(U, D + V(\tau' - \tau)) = 0 \tag{2.8.2} \]
which gives $\tau' - \tau = -\eta(U, D)/\eta(U, V)$.
Hence the relative position vector $X$ between these simultaneous events will be
\[ X = D - \frac{\eta(U, D)}{\eta(U, V)}\,V. \tag{2.8.3} \]
Thus we compute
\[ \eta(X, X) = \eta(D, D) - 2\,\frac{\eta(U, D)\,\eta(V, D)}{\eta(U, V)} + \frac{\eta(U, D)^2}{\eta(U, V)^2} = \eta(D, D) + \frac{\eta(U, D)^2}{\eta(U, V)^2}, \tag{2.8.4} \]
making heavy use of the bilinearity of $\eta$, and the facts $\eta(V, V) = 1$ and $\eta(V, D) = 0$.
Hence Alice calculates the length of the rod as
\[ d' = \sqrt{-\eta(D, D) - \frac{\eta(U, D)^2}{\eta(U, V)^2}}. \tag{2.8.5} \]
Recall that $-\eta(D, D) = d^2$ is the square of the length of the rod as measured in its rest-frame,
so $d' \le d$ in general.
To understand this calculation, we may work explicitly in the frame in which the rod is at
rest. Then
\[ V = (1, 0, 0, 0), \qquad D = (0, d, 0, 0). \tag{2.8.6} \]
Alice’s 4-velocity vector $U$ has the form
\[ U = \gamma(u)(1, \mathbf{u}) = \gamma(u)(1, u_1, u_2, u_3) \tag{2.8.7} \]
where $\mathbf{u}$ is the velocity of Alice relative to Bob (an observer at rest with respect to the rod),
$(u_1, u_2, u_3)$ are its components in Bob’s coordinates and
\[ \gamma(u) = \frac{1}{\sqrt{1 - |\mathbf{u}|^2}} \tag{2.8.8} \]
as in (2.6.1). Hence
\[ \eta(U, D) = -\gamma(u)\,d\,u_1, \qquad \eta(U, V) = \gamma(u) \tag{2.8.9} \]
and so from (2.8.5), we find
\[ d' = d\sqrt{1 - u_1^2}. \]
This is the famous Lorentz–Fitzgerald length contraction: if the component $u_1$ of the relative
velocity of the observer in the direction of the rod is non-zero, then the observer judges the length
of the rod to be less than $d$ by the factor $\sqrt{1 - u_1^2}$. Note that there is no length contraction if
the observer is moving at right-angles to the rod.
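The coordinate-free formula (2.8.5) and the explicit contraction factor can be compared numerically. The following sketch evaluates (2.8.5) in the rest frame of the rod; the function names and the sample values ($d = 2$, speed $0.6$) are assumptions made for the example, and units with $c = 1$ are used.

```python
import math

def eta(X, Y):
    """Minkowski bilinear form, signature (+, -, -, -), in inertial components."""
    return X[0] * Y[0] - X[1] * Y[1] - X[2] * Y[2] - X[3] * Y[3]

def four_velocity(u):
    """4-velocity gamma(u) * (1, u1, u2, u3) of an observer with 3-velocity u."""
    g = 1.0 / math.sqrt(1.0 - sum(c * c for c in u))
    return (g, g * u[0], g * u[1], g * u[2])

def measured_length(U, V, D):
    """Length of the rod according to an observer with 4-velocity U, from (2.8.5).
    V is the rod's 4-velocity and D the rest-frame separation, eta(V, D) = 0."""
    return math.sqrt(-eta(D, D) - eta(U, D) ** 2 / eta(U, V) ** 2)

V = (1.0, 0.0, 0.0, 0.0)       # rod at rest, as in (2.8.6)
D = (0.0, 2.0, 0.0, 0.0)       # rod of rest length d = 2 along the x-axis

along = measured_length(four_velocity((0.6, 0.0, 0.0)), V, D)
across = measured_length(four_velocity((0.0, 0.6, 0.0)), V, D)
print(along, across)
```

Motion along the rod at speed $0.6$ should give $d' = 2\sqrt{1 - 0.36} = 1.6$, while motion at right-angles to the rod should leave the length at $2$.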

2.9. Lorentz transformations


The set of linear transformations of $M$ which preserve $\eta$ is called the Lorentz group. This
means, concretely, the set of linear maps $L : M \to M$ such that
\[ \eta(LX, LY) = \eta(X, Y) \]
for all $X, Y \in M$. Even more concretely, if we choose a diagonalizing basis $(e_0, e_1, e_2, e_3)$ of $M$,
then $M$ is identified with $\mathbb{R}^4$, $L$ becomes a real $4 \times 4$ matrix and the condition is
\[ L^t \eta L = \eta, \qquad \eta = \mathrm{diag}(1, -1, -1, -1). \]
The group of all Lorentz transformations, or the Lorentz group, is also denoted by O(1, 3).
We shall use Lorentz transformations mainly to compare different inertial frames with the same
event in M as origin. More precisely, suppose that Alice introduces an inertial basis (e0 , e1 , e2 , e3 )
and Bob introduces an inertial basis $(\tilde{e}_0, \tilde{e}_1, \tilde{e}_2, \tilde{e}_3)$. Their coordinates are respectively $(t, x, y, z)$
and $(\tilde{t}, \tilde{x}, \tilde{y}, \tilde{z})$.
Because $(e_a)$ and $(\tilde{e}_a)$ are both diagonalizing bases, there is a Lorentz transformation $L$
with the property
\[ (\tilde{e}_0, \tilde{e}_1, \tilde{e}_2, \tilde{e}_3) = (e_0, e_1, e_2, e_3)L. \tag{2.9.1} \]
This is shorthand for expressing the tilded basis vectors as linear combinations of the untilded
ones.
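Note that the matrix product
\[ \begin{pmatrix} \tilde{e}_0^{\,t} \\ \tilde{e}_1^{\,t} \\ \tilde{e}_2^{\,t} \\ \tilde{e}_3^{\,t} \end{pmatrix} \eta\, (\tilde{e}_0, \tilde{e}_1, \tilde{e}_2, \tilde{e}_3) \tag{2.9.2} \]
has as its $ab$ component the scalar $\eta(\tilde{e}_a, \tilde{e}_b)$. Hence, as $(\tilde{e}_a)$ are diagonalizing, this matrix is
diagonal, with entries
\[ \eta(\tilde{e}_0, \tilde{e}_0) = 1, \qquad \eta(\tilde{e}_1, \tilde{e}_1) = \eta(\tilde{e}_2, \tilde{e}_2) = \eta(\tilde{e}_3, \tilde{e}_3) = -1. \]
But substituting in terms of $e = (e_0, e_1, e_2, e_3)$ and $L$,
\[ L^t e^t \eta\, e L = \eta, \quad\text{so}\quad L^t \eta L = \eta, \]
confirming the relevance of the Lorentz group for changing frame between observers.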
Suppose that our frames are related by (2.9.1). Multiplying on the right by the column
vector $(\tilde{t}, \tilde{x}, \tilde{y}, \tilde{z})^t$, we get
\[ \tilde{t}\,\tilde{e}_0 + \tilde{x}\,\tilde{e}_1 + \tilde{y}\,\tilde{e}_2 + \tilde{z}\,\tilde{e}_3 = (e_0, e_1, e_2, e_3)\,L \begin{pmatrix} \tilde{t} \\ \tilde{x} \\ \tilde{y} \\ \tilde{z} \end{pmatrix} \tag{2.9.3} \]
so that by definition of $(t, x, y, z)$,
\[ \begin{pmatrix} t \\ x \\ y \\ z \end{pmatrix} = L \begin{pmatrix} \tilde{t} \\ \tilde{x} \\ \tilde{y} \\ \tilde{z} \end{pmatrix}. \tag{2.9.4} \]
Note the usual confusing point that the $L$ appears multiplying the untilded $e$’s in (2.9.1) but
the tilded coordinates in (2.9.4).
2.9.0.1. Examples. A particular example (with c = 1) is
 
γ(v) γ(v)v 0 0
γ(v)v γ(v) 0 0
L=
 0
 (2.9.5)
0 1 0
0 0 0 1
This is often referred to as the standard 2D Lorentz transformation. It is, of course, four-dimensional,
but $y = \tilde y$ and $z = \tilde z$, so all the action is going on in the way the $(t, x)$ and $(\tilde t, \tilde x)$
variables are related to each other. To save writing, we’ll ignore the $y, z$ and $\tilde y, \tilde z$ variables in
the rest of the discussion of this example.
Here, as before, $\gamma(v) = 1/\sqrt{1 - v^2}$. Suppose Bob is sitting at the origin of the tilded coordinate
system. Then his world line is $\tilde x = 0$. Inserting $(\tilde t, 0)$ into the coordinate transformation, we see
that
$$t = \gamma(v)\tilde t, \quad x = \gamma(v)v\,\tilde t.$$
This gives Bob’s worldline, as a parameterized curve in Alice’s coordinate system (the parameter
being $\tilde t$). Since $x/t = v$ for this curve, we see that Bob is moving at speed $v$ in the direction of
Alice’s positive x-axis. The conclusion is that this Lorentz transformation corresponds precisely
to two inertial observers, one moving at speed v relative to the other.
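As a numerical sanity check of the two claims above (that the matrix (2.9.5) satisfies $L^t \eta L = \eta$, and that Bob's worldline has $x/t = v$ in Alice's coordinates), here is a minimal Python sketch. The function names (`boost_x`, `is_lorentz`, `to_alice`) and the sample speed v = 0.6 are my own choices for illustration, not part of the notes.

```python
import math

# Minkowski metric eta = diag(1, -1, -1, -1), in units with c = 1.
ETA = [[1, 0, 0, 0], [0, -1, 0, 0], [0, 0, -1, 0], [0, 0, 0, -1]]


def gamma(v):
    """The factor gamma(v) = 1 / sqrt(1 - v^2), for a speed |v| < 1."""
    return 1.0 / math.sqrt(1.0 - v * v)


def boost_x(v):
    """The matrix L of (2.9.5): a boost with speed v along the x-axis."""
    g = gamma(v)
    return [[g, g * v, 0, 0], [g * v, g, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]]


def matmul(A, B):
    """Product of two 4x4 matrices stored as nested lists."""
    return [[sum(A[i][k] * B[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]


def transpose(A):
    return [[A[j][i] for j in range(4)] for i in range(4)]


def is_lorentz(L, tol=1e-12):
    """Check the defining condition L^t eta L = eta entry by entry."""
    M = matmul(transpose(L), matmul(ETA, L))
    return all(abs(M[i][j] - ETA[i][j]) < tol
               for i in range(4) for j in range(4))


def to_alice(L, bob_coords):
    """Equation (2.9.4): Alice's coordinates from Bob's (tilded) ones."""
    return [sum(L[i][j] * bob_coords[j] for j in range(4)) for i in range(4)]


v = 0.6
L = boost_x(v)
# An event on Bob's worldline: tilded coordinates (t~, 0, 0, 0) with t~ = 2.
t, x, y, z = to_alice(L, [2.0, 0.0, 0.0, 0.0])
print(is_lorentz(L), x / t)
```

The ratio `x / t` printed at the end should agree with `v` up to rounding, whatever value of $\tilde t$ is used.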
It is of interest to derive this from the postulates P1—P4 and the relativity principle R. I
shall omit this here: you can find it in Woodhouse, SR (new edition, §§4.4–4.6.)
Consideration of this transformation gives a different way to derive the standard counterin-
tuitive properties of SR: time dilatation, length contraction, and so on.
2.9.1. The Lorentz and Poincaré Groups. The Poincaré group is the Lorentz group
with (4-dimensional) translations included. Thinking in terms of coordinate transformations,
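the typical element of the Poincaré group has the form
$$\begin{pmatrix} t \\ x \\ y \\ z \end{pmatrix} = L \begin{pmatrix} \tilde t \\ \tilde x \\ \tilde y \\ \tilde z \end{pmatrix} + \begin{pmatrix} p \\ q \\ r \\ s \end{pmatrix} \tag{2.9.6}$$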
where (p, q, r, s) are constants and L is a Lorentz transformation.
From a more sophisticated point of view, it is the natural symmetry group of the affine
space M, preserving the bilinear form η on the set ME of all position vectors relative to E.
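Remark 2.9.1. The 3-dimensional euclidean group is contained in the Poincaré group, via
$$\begin{pmatrix} t \\ x \\ y \\ z \end{pmatrix} = \begin{pmatrix} 1 & 0 \\ 0 & R \end{pmatrix} \begin{pmatrix} \tilde t \\ \tilde x \\ \tilde y \\ \tilde z \end{pmatrix} + \begin{pmatrix} 0 \\ q \\ r \\ s \end{pmatrix} \tag{2.9.7}$$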
where R is a 3-dimensional orthogonal transformation.

2.10. Orientation and time-orientation


Just as the reflections are isometries of euclidean space that do not seem to be realized
physically, so there are some Lorentz transformations that are not so relevant as others. The
physically most relevant Lorentz transformations are those that preserve the spatial orientation
(rotations rather than reflections) and a time orientation or time’s ‘arrow’.


Definition 2.10.1. A time orientation of Minkowski space-time M, or of its vector space M of
displacement vectors, is a choice of one half or ‘nappe’¹ of the
cone of timelike vectors in M. Accordingly every timelike vector is either future-pointing (if it
is in the chosen nappe) or past-pointing (if it is in the other nappe).
An inertial basis (e0 , e1 , e2 , e3 ) is time-oriented if the timelike vector e0 is future-pointing.
Lemma 2.10.2. Let X and Y be timelike vectors. Then η(X, Y) ≠ 0. Moreover, η(X, Y) > 0
if X and Y are in the same nappe of the cone and η(X, Y) < 0 otherwise.
Proof. If X is timelike, then we know that η(X, X) > 0. On the orthogonal space Σ =
{S : η(X, S) = 0}, η is negative definite, because its signature is one plus and three minuses.
Thus any non-zero vector S which is η-orthogonal to X must satisfy η(S, S) < 0, so it is spacelike.
A timelike vector Y therefore cannot lie in Σ, that is, η(X, Y) ≠ 0 if X
and Y are timelike.
The last part is perhaps most easily seen by choosing a frame in which X = λe0, for some
λ > 0. If the components of Y are $(Y^0, Y^1, Y^2, Y^3)$, then $\eta(X, Y) = \lambda Y^0$. Clearly X and Y are
in the same nappe if and only if $Y^0 > 0$. □
One can show that if a Lorentz transformation maps one future-pointing timelike vector to
a future-pointing timelike vector, then it maps all future-pointing timelike vectors to future-
pointing timelike vectors.
Definition 2.10.3. A Lorentz transformation is orthochronous if it maps the future-pointing
nappe to the future-pointing nappe, in other words, if it preserves the time-orientation. The
subgroup of such transformations is denoted O+ (1, 3).
Definition 2.10.4. A Lorentz transformation is called proper if its determinant is 1. The
group of proper, orthochronous Lorentz transformations is denoted SO+ (1, 3).
We note that spatial reflections are excluded from SO+ (1, 3), and so is any transformation
that reverses the arrow of time. Thus this seems to be the most physically appropriate subgroup
of O(1, 3).
Alongside these restricted groups, we should also restrict the allowable inertial bases. We
say that a basis is oriented and time-oriented if e0 is future-pointing and (e1 , e2 , e3 ) is right-
handed. Then SO+ (1, 3) maps any oriented and time-oriented basis to another such basis, and
conversely any two such bases are related by an element of SO+ (1, 3).
Remark 2.10.5. It is worth mentioning that X is future-pointing (timelike or null) if and
only if −X is past-pointing (timelike or null).
2.10.1. Causality in Special Relativity. If M is given a time-orientation, then the non-
zero null vectors also fall into two distinct sets, the future-pointing and the past-pointing. A null
vector N is future-pointing if η(X, N ) > 0 for any given future-pointing timelike X. Similarly
a null vector is past-pointing if η(X, N ) < 0 for future-pointing timelike X. (Of course, if N is
null future-pointing, then η(Y, N ) < 0 if Y is timelike past-pointing.)
A future-pointing null vector is in the boundary of the future-pointing nappe of the cone—it
is a limiting case of future-pointing timelike vectors. For example, if we consider
$$X_t = e_0 + t e_1,$$
where $e_0$ is future-pointing timelike with $\eta(e_0, e_0) = 1$ and $e_1$ is spacelike (i.e. $\eta(e_1, e_1) = -1$) and η-orthogonal to $e_0$, then
$$\eta(X_t, X_t) = 1 - t^2,$$
so $X_t$ is timelike future-pointing if $|t| < 1$ and null future-pointing if $t = \pm 1$.
Let E and F be two events in M. The event E (for example an explosion or a light signal)
can only have a causal effect on F if the displacement vector $\overrightarrow{EF}$ is timelike or null future-pointing.
This condition guarantees that E and F can be connected by a particle travelling in
a straight line at speed ≤ c = 1 (the speed of light) and that F is in the future of E.
The set of all events F which can be causally affected by a given event E is the set
$$\mathrm{Fut}(E) = \{F \in M \text{ such that } \overrightarrow{EF} \text{ is timelike or null future-pointing}\} \tag{2.10.1}$$
¹ Dictionary definition: in geometry, a nappe is half of a double cone.


The set Fut(E) can be pictured as the solid half-cone whose boundary is the set of future-
pointing null vectors emanating from E.
Similarly, the set of all events G which can affect or influence E is
$$\begin{aligned} \mathrm{Past}(E) &= \{G \in M \text{ such that } \overrightarrow{GE} \text{ is timelike or null future-pointing}\} \\ &= \{G \in M \text{ such that } \overrightarrow{EG} \text{ is timelike or null past-pointing}\} \end{aligned} \tag{2.10.2}$$
This can be pictured as the solid half-cone whose boundary is the set of past-pointing null
vectors emanating from E.
So although space and time are mixed up in the geometry of special relativity, there is still
a well defined notion of causality.
We have defined timelike and null vectors. If a vector is not timelike or null, it is called
spacelike:
Definition 2.10.6. If X ∈ M, then X is called spacelike if η(X, X) < 0. Two events E and
F in M are said to be spacelike separated if $\eta(\overrightarrow{EF}, \overrightarrow{EF}) < 0$.
We end by noting the following:
Proposition 2.10.7. Suppose that E and F are events such that $\overrightarrow{EF}$ is future-pointing
timelike. Then there exist inertial coordinates such that E has coordinates (0, 0, 0, 0) and F has
coordinates (t, 0, 0, 0) with t > 0.
Suppose that E and F are spacelike separated events. Then there exists an inertial frame
with respect to which E and F are simultaneous (e.g. E has coordinates (0, 0, 0, 0) and F has
coordinates (0, d, 0, 0)). Moreover, there exist other coordinate systems in which E occurs before
F, and others in which E occurs after F.
Proof. The first follows from the basic fact that if X is any future-pointing timelike
vector, then there is an oriented and time-oriented basis $(e_0, e_1, e_2, e_3)$ with respect to which
η is diagonal, and such that X is a positive multiple of $e_0$. In such a basis, $\overrightarrow{EF} = (t, 0, 0, 0)$,
where t > 0, and if we choose the origin so that the coordinates of E are (0, 0, 0, 0), then the
coordinates of F will be (t, 0, 0, 0).
Similarly if $\eta(\overrightarrow{EF}, \overrightarrow{EF}) < 0$, we can pick a multiple $e_1$ of $\overrightarrow{EF}$ such that $\eta(e_1, e_1) = -1$.
We extend this to a diagonalizing (oriented and time-oriented) basis of η, and then $\overrightarrow{EF}$ has
the desired form. In particular, it is η-orthogonal to $e_0$ and so these events will be judged
simultaneous by an observer with 4-velocity $e_0$.
For the last part, let $V = e_0 + \lambda e_1$ with |λ| < 1, so that V is timelike and future-pointing. Then
$$\eta(V, e_1) = -\lambda.$$
So if V is (a positive multiple of) the 4-velocity vector of an observer, Bob, and $\overrightarrow{EF}$ is a positive
multiple of $e_1$, he will reckon that F happens after E if
λ < 0 and that F happens before E if λ > 0. □
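The last step of the proof can be tried out numerically. The following Python sketch, under the assumption that $\overrightarrow{EF} = d\,e_1$ with d > 0 in the basis of the proof, computes the sign of η(V, EF) for V = e0 + λe1; the names `time_offset` and `verdict` are hypothetical helpers introduced only for this illustration.

```python
def eta(X, Y):
    """Minkowski scalar product with signature (+, -, -, -)."""
    return X[0] * Y[0] - X[1] * Y[1] - X[2] * Y[2] - X[3] * Y[3]


def time_offset(lam, d):
    """eta(V, EF) for V = e0 + lam*e1 and EF = d*e1 (components in the basis of the proof).

    Up to the positive factor needed to normalise V, this is the time that
    the observer with 4-velocity along V assigns to F, relative to E.
    """
    V = (1.0, lam, 0.0, 0.0)
    EF = (0.0, d, 0.0, 0.0)
    return eta(V, EF)


def verdict(lam, d=1.0):
    """How the observer orders the spacelike separated events E and F."""
    s = time_offset(lam, d)
    if s > 0:
        return "F after E"
    if s < 0:
        return "F before E"
    return "simultaneous"


for lam in (-0.5, 0.0, 0.5):
    print(lam, verdict(lam))
```

The loop runs through an observer moving away from F (λ < 0), the observer with 4-velocity e0, and an observer moving towards F (λ > 0).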
Remark 2.10.8. The above makes complete sense from the point of view of the radar
method. See the picture below. Consider two inertial observers, Alice and Bob, and suppose
that E is an event on both of their worldlines. Suppose that Alice judges E and F to be
simultaneous. This means that if Alice bounces a light signal off F, sending it out at τ1
and receiving it at τ2, then she assigns F the time (τ1 + τ2)/2, which is the time of E on her
worldline. In the diagram, Alice sends her light signal
out at event A1, and receives it at A2, and E is the midpoint of the segment A1A2.
Now if Bob is heading towards F, it is clear that the light signal he needs to send to bounce
off F has to be transmitted at event B1 and received at event B2. The event on his worldline
that he judges to be simultaneous with F is therefore the midpoint of B1B2, shown as E′. It
is clear from the geometry that the segment EB1 is longer than EB2, so E′ will be, as shown,
before E on his worldline.
Similarly, if Bob is heading away from F, the event he judges simultaneous with F will be
later, on his worldline, than the event E.


[Figure: spacetime diagram showing A’s world-line through A1, E and A2, B’s world-line through B1, E′ and B2, and the event F.]

2.10.2. Spatial and temporal components. We have seen that inertial observers,
free particles and photons have straight worldlines, and the basic feature of a worldline is the
4-velocity vector. It is often annoying to choose a full inertial basis to solve particular problems,
but it is important to split Minkowski vectors into their spatial and temporal components with
respect to a particular timelike vector.
Suppose that V is a timelike future-pointing 4-vector. Then we can write any vector X in
terms of its components parallel to and η-orthogonal to V . That is
X = λV + Y, where η(V, Y ) = 0. (2.10.3)
Taking the scalar product with V,
$$\eta(V, X) = \lambda\,\eta(V, V) \quad \text{so} \quad \lambda = \frac{\eta(V, X)}{\eta(V, V)}. \tag{2.10.4}$$
Then
$$Y = X - \frac{\eta(V, X)}{\eta(V, V)}\,V. \tag{2.10.5}$$
More concretely, relative to an inertial basis in which V is a positive multiple of $e_0$,
$$V = (V^0, \mathbf{0}), \quad X = (X^0, \boldsymbol{\xi}) \tag{2.10.6}$$
where
$$\eta(V, X) = V^0 X^0, \quad \eta(V, V) = (V^0)^2 \tag{2.10.7}$$
and
$$Y = (0, \boldsymbol{\xi}). \tag{2.10.8}$$
Here $\mathbf{0}$ is the ordinary 3-dimensional zero-vector and $\boldsymbol{\xi}$ is also a euclidean 3-vector.
It is worth spelling out that if X and Z are any two Minkowski vectors with components
$$X = (X^0, \boldsymbol{\xi}), \quad Z = (Z^0, \boldsymbol{\zeta})$$
then
$$\eta(X, Z) = X^0 Z^0 - \boldsymbol{\xi} \cdot \boldsymbol{\zeta}.$$
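The splitting (2.10.3)–(2.10.5) is easy to carry out numerically without choosing an adapted basis. Here is a minimal Python sketch; the function name `split` and the sample vectors are assumptions made purely for illustration.

```python
def eta(X, Y):
    """Minkowski scalar product eta(X, Y) = X^0 Y^0 - (spatial dot product)."""
    return X[0] * Y[0] - X[1] * Y[1] - X[2] * Y[2] - X[3] * Y[3]


def split(V, X):
    """Write X = lam*V + Y with eta(V, Y) = 0, following (2.10.4) and (2.10.5).

    V must be timelike, so that eta(V, V) is non-zero.
    """
    lam = eta(V, X) / eta(V, V)
    Y = tuple(x - lam * v for x, v in zip(X, V))
    return lam, Y


# A sample timelike future-pointing V (here gamma(0.6) * (1, 0.6, 0, 0))
# and an arbitrary vector X, both chosen only for this example.
V = (1.25, 0.75, 0.0, 0.0)
X = (2.0, 1.0, 3.0, -1.0)
lam, Y = split(V, X)
print(lam, Y, eta(V, Y))
```

The last printed number is η(V, Y), which should vanish up to rounding error; Y is then the spatial part of X as seen by an observer with 4-velocity along V.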
A photon’s velocity 4-vector will take the form
ω(1, e) (2.10.9)
where ω > 0 (the photon is travelling forward in time) and e is a unit vector. For physical
reasons, ω is identified with the frequency of the photon as measured by an observer with
4-velocity V .


Example 2.10.9. Relative velocity. Suppose that Alice and Bob are inertial observers with
4-velocity vectors U and V , with η(U, U ) = η(V, V ) = 1.
In Alice’s rest-frame,
$$U = (1, \mathbf{0}), \quad V = \gamma(1, \mathbf{v}) \tag{2.10.10}$$
for some constant γ. This is, of course, the γ factor again, because the condition η(V, V) = 1
says
$$\gamma^2(1 - |\mathbf{v}|^2) = 1.$$
Thus v is the velocity vector of Bob as measured by Alice.

Remark 2.10.10. What we’ve just seen is a very useful way of calculating γ-factors: if
Alice and Bob are inertial observers with 4-velocities U and V , then the γ-factor of their relative
speed is equal to η(U, V ).

We shall use this in the following:

Example 2.10.11. Alice, Bob and Chris are inertial observers with 4-velocity vectors U ,
V , and W , respectively, so η(U, U ) = η(V, V ) = η(W, W ) = 1. Suppose that Bob reckons that
Chris’s (relative) speed is w and Alice reckons Bob’s (relative) speed is u. What does Alice
reckon that Chris’s speed (relative to her) is?
Call the unknown speed ζ. Then from the above, the γ-factor of ζ is η(U, W ),

η(U, W ) = γ(ζ). (2.10.11)

What we know is that


η(U, V ) = γ(u), η(V, W ) = γ(w). (2.10.12)
To get a handle on this it turns out to be simplest to work in B’s rest-frame. Then

U = γ(u)(1, u), V = (1, 0), W = γ(w)(1, w) (2.10.13)

where u is the velocity vector of Alice relative to Bob and w is the velocity vector of Chris
relative to Bob.
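Then
$$\gamma(\zeta) = \eta(U, W) = \gamma(u)\gamma(w)(1 - \mathbf{u} \cdot \mathbf{w}) \tag{2.10.14}$$
This is an answer, but it is instructive to rearrange it a bit. Writing $u = |\mathbf{u}|$ and $w = |\mathbf{w}|$, squaring and taking the
reciprocal gives
$$1 - \zeta^2 = \frac{(1 - u^2)(1 - w^2)}{(1 - \mathbf{u} \cdot \mathbf{w})^2} \tag{2.10.15}$$
Hence
$$\zeta^2 = \frac{(1 - \mathbf{u} \cdot \mathbf{w})^2 - (1 - u^2)(1 - w^2)}{(1 - \mathbf{u} \cdot \mathbf{w})^2} \tag{2.10.16}$$
$$= \frac{1 - 2\mathbf{u} \cdot \mathbf{w} + (\mathbf{u} \cdot \mathbf{w})^2 - 1 + u^2 + w^2 - u^2 w^2}{(1 - \mathbf{u} \cdot \mathbf{w})^2} \tag{2.10.17}$$
$$= \frac{(u^2 - 2\mathbf{u} \cdot \mathbf{w} + w^2) + (\mathbf{u} \cdot \mathbf{w})^2 - u^2 w^2}{(1 - \mathbf{u} \cdot \mathbf{w})^2} \tag{2.10.18}$$
$$= \frac{|\mathbf{u} - \mathbf{w}|^2 + (\mathbf{u} \cdot \mathbf{w})^2 - u^2 w^2}{(1 - \mathbf{u} \cdot \mathbf{w})^2} \tag{2.10.19}$$

This remarkably complicated formula nonetheless reproduces the classical answer $\zeta^2 = |\mathbf{u} - \mathbf{w}|^2$ to a
first approximation if u and w are much less than the light-speed c = 1.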
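To see the two routes to ζ agree, one can compute the relative speed directly from γ(ζ) = η(U, W) and compare it with the right-hand side of (2.10.19). The Python sketch below does this; the helper names and the sample velocities are my own, chosen only for illustration.

```python
import math


def eta(X, Y):
    """Minkowski scalar product with signature (+, -, -, -)."""
    return X[0] * Y[0] - X[1] * Y[1] - X[2] * Y[2] - X[3] * Y[3]


def four_velocity(vel):
    """The 4-velocity gamma * (1, v) of a 3-velocity v with |v| < 1."""
    g = 1.0 / math.sqrt(1.0 - sum(c * c for c in vel))
    return (g,) + tuple(g * c for c in vel)


def relative_speed(U, W):
    """Solve gamma(zeta) = eta(U, W) for the relative speed zeta."""
    g = eta(U, W)
    return math.sqrt(max(0.0, 1.0 - 1.0 / (g * g)))


def zeta_squared(u, w):
    """The right-hand side of (2.10.19), in terms of the 3-velocities u and w."""
    dot = sum(a * b for a, b in zip(u, w))
    u2 = sum(a * a for a in u)
    w2 = sum(b * b for b in w)
    diff2 = sum((a - b) ** 2 for a, b in zip(u, w))
    return (diff2 + dot * dot - u2 * w2) / (1.0 - dot) ** 2


# Velocities of Alice and Chris in Bob's rest-frame (sample values).
u = (0.5, 0.0, 0.0)
w = (-0.5, 0.0, 0.0)
zeta = relative_speed(four_velocity(u), four_velocity(w))
print(zeta, math.sqrt(zeta_squared(u, w)))
```

For these head-on velocities the classical answer would be |u − w| = 1, whereas both computations above give a value below the speed of light.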


2.11. Interstellar travel


Consider the following simplified model of a voyage made by a space-ship from earth. We
will make it more complicated when we discuss uniform acceleration in the next Chapter. A
spaceship sets off from the earth at constant speed v to travel to a star at distance D from the
earth, and then returns at the same speed. In the earth’s frame of reference, the travel time for the round trip is clearly
$$T = 2D/v. \tag{2.11.1}$$
Let E be the event ‘spaceship leaves earth’ let A be the event ‘spaceship reaches destination’
and let R be the event ‘spaceship arrives back at earth’. We assume that the space ship travels
at constant speed v on outward and return trip. Continuing the idealization, let’s suppose that
the earth is inertial, with 4-velocity U. Taking E to be the origin of M, the world-line of the
earth is then $\tau \mapsto \tau U$. On the outward leg, the spaceship’s trajectory is $\sigma \mapsto \sigma V_1$ and on the
inward leg it is $\mu \mapsto S V_1 + \mu V_2$, where $S V_1$ is the displacement vector from E to the ‘arrival’
event A. (Here S > 0 is the value of σ corresponding to the arrival of the spaceship at the
distant star.) Given that the speeds relative to the Earth on outward and return trips are the
same, in terms of these abstract 4-vectors, we have
$$T U = S V_1 + S V_2; \tag{2.11.2}$$
see also the diagram below.
We may choose the frame so $U = (1, 0, 0, 0)$, $V_1 = \gamma(v)(1, v, 0, 0)$, $V_2 = \gamma(v)(1, -v, 0, 0)$. By
geometry (see the picture below, in which the y and z directions have been suppressed)
$$T(1, 0) = S\gamma(v)(1, v) + S\gamma(v)(1, -v). \tag{2.11.3}$$
Hence $T = 2S\gamma(v)$ and so
$$2S = T\gamma(v)^{-1} = \frac{2D}{v\gamma(v)}. \tag{2.11.4}$$
This quantity is the elapsed time from the point of view of the astronauts, as measured by their
on-board clocks. As v approaches the light speed 1, D/v approaches D and 1/γ(v) → 0.
Thus S → 0 as v → 1.
The conclusion is that with a fast enough space ship, astronauts could complete a trip to
stars hundreds of light-years away within their life-spans. However, none of their friends would
be alive when they got back to Earth...
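To get a feeling for the numbers in (2.11.1) and (2.11.4), here is a small Python sketch. It assumes units of light-years and years (so c = 1); the function name `round_trip_times` and the sample values of D and v are hypothetical choices for this illustration.

```python
import math


def gamma(v):
    """The factor gamma(v) = 1 / sqrt(1 - v^2), for a speed 0 < v < 1."""
    return 1.0 / math.sqrt(1.0 - v * v)


def round_trip_times(D, v):
    """Return (T, 2S): earth time (2.11.1) and on-board time (2.11.4).

    D is the distance to the star in the earth's frame and v is the constant
    speed on both legs, in units with c = 1.
    """
    T = 2.0 * D / v
    return T, T / gamma(v)


D = 100.0  # light-years, a sample value
for v in (0.5, 0.8, 0.99, 0.9999):
    T, ship_time = round_trip_times(D, v)
    print(f"v = {v}: earth time {T:.1f} years, ship time {ship_time:.2f} years")
```

As v increases towards 1, the earth time approaches 2D while the on-board time shrinks towards zero, in line with the discussion above.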
[Figure: spacetime diagram of the round trip, with the events E = (0, 0), A = Sγ(v)(1, v) and R = (T, 0).]


2.12. Summary of key notation and definitions


Definition 2.12.1. If X ∈ M is a vector, we say that X is
— timelike if η(X, X) > 0;
— null if η(X, X) = 0;
— spacelike if η(X, X) < 0.
If X is the displacement vector $\overrightarrow{EF}$ from an event E to an event F in M, we have the
following corresponding definitions.
Definition 2.12.2. Let E and F be events in M and let $X = \overrightarrow{EF}$ be the displacement
vector. Then
— E and F are timelike separated if X is timelike;
— E and F are null separated if X is null;
— E and F are spacelike separated if X is spacelike.
We usually assume that M and hence M are given a time-orientation. Then for timelike
and null vectors, we can say whether they are future or past-pointing. 4-velocity vectors of
genuine particles are then taken to be future-pointing. If E and F are timelike or null separated
and $\overrightarrow{EF}$ is future-pointing, then we assume that something happening at event E can have a
causal influence on what happens at F, and only in this case. In particular, space-like separated
events cannot causally affect each other (as faster-than-light travel would be needed). Likewise,
E cannot affect F, even if they are timelike or null separated, if $\overrightarrow{EF}$ is past-pointing.
A velocity 4-vector U is a timelike (future-pointing) vector with η(U, U) = 1. If Alice has
velocity 4-vector U, then in her rest-frame, $U = (1, \mathbf{0})$ and the velocity 4-vector V of a particle
is written $V = \gamma(v)(1, \mathbf{v})$ where $\mathbf{v}$ is the relative velocity of the particle as measured by
Alice. In particular η(U, V) = γ(v) gives the γ-factor of the relative velocity $\mathbf{v}$.
Remark 2.12.3. If you compare the way I have set out SR with what is in Woodhouse,
you will notice that throughout his book, 4-vectors are denoted $X^a$, $Y^a$, $V^a$, etc., and that
the Minkowski metric is denoted by $g$ or $g_{ab}$. The point is that $X^a$ is the set of components
$(X^0, X^1, X^2, X^3)$ in a basis $(e_0, e_1, e_2, e_3)$ which is not necessarily explicitly mentioned. Thus
my X corresponds to Woodhouse’s $X^a$ via
$$X = X^0 e_0 + X^1 e_1 + X^2 e_2 + X^3 e_3.$$
He also uses the summation convention over repeated upstairs and downstairs indices (which
we’ll come back to) and would write the RHS as $X^a e_a$. I have avoided the use of the summation
convention so far, though there will be no escape when we come to GR.
I have chosen η rather than g as notation for the Minkowski metric in order to reserve g as
notation for the curved metrics that we’ll use in GR.


CHAPTER 3

Further topics in Special Relativity

3.1. Non-inertial observers: acceleration


Particles acted upon by forces will not have straight worldlines in M. We want to capture
two key notions in the straight-line case: that of not travelling faster than the speed of light
(which we continue to take to be 1 most of the time) and that of proper time parameter, namely
the time as measured by a clock travelling along the worldline.
Hypothesis 3.1.1. If Γ(τ) is the worldline of any massive particle, then τ is a proper time
parameter along Γ if the velocity vector
$$V(\tau) = \frac{d\Gamma}{d\tau}$$
is timelike, future-pointing, and satisfies
$$\eta(V(\tau), V(\tau)) = 1$$
for all τ.
We also make an assumption about how such an accelerating observer will judge simultane-
ity:
Hypothesis 3.1.2. If Alice is an observer with worldline $\tau \mapsto \Gamma(\tau)$ and E is an event on her
worldline with parameter value $\tau_1$, say, then an event F is judged by Alice to be simultaneous
with E if $\eta(V(\tau_1), \overrightarrow{EF}) = 0$.
Another way of thinking about this is as follows. If Bob is an inertial observer with 4-velocity
$V(\tau_1)$ then he will judge E and F to be simultaneous if $\eta(V(\tau_1), \overrightarrow{EF}) = 0$. So we are saying
that Alice’s notion of simultaneity at the event E = Γ(τ1 ) should be the same as the notion of
simultaneity of an inertial observer (Bob) who has the same 4-velocity as she does, at the event
E.
From now on we shall use dot to denote differentiation with respect to τ . Thus
V (τ ) = Γ̇(τ ). (3.1.1)
The acceleration vector A = Γ̈ = V̇ is the second derivative of the parameterized curve.
Note the following calculation:
$$\frac{d}{d\tau}\eta(V, V) = 2\eta(V, \dot V) = 0 \tag{3.1.2}$$
The zero on the RHS is because η(V(τ), V(τ)) is constantly equal to 1!
Thus A is η-orthogonal to V and in particular is space-like (or zero). In the frame in
which V has components (1, 0, 0, 0), A will have components $(0, \mathbf{a})$ and this acceleration is what
the particle actually feels. So the magnitude of the acceleration felt is
$$a = \sqrt{-\eta(\ddot\Gamma(\tau), \ddot\Gamma(\tau))}$$
Example 3.1.3. Constant acceleration in a plane. Consider an accelerating particle in
a plane, which we may as well take to be the (t, x) plane. Then Γ(τ) has the form
$$(t(\tau), x(\tau)). \tag{3.1.3}$$
The proper time condition is
$$\dot t^2 - \dot x^2 = 1 \tag{3.1.4}$$

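Constant acceleration is the condition
$$\ddot t^2 - \ddot x^2 = -a^2 \tag{3.1.5}$$
where a is a constant. A trick to solve this is to suppose
$$\dot t = \cosh u(\tau), \quad \dot x = \sinh u(\tau), \tag{3.1.6}$$
which automatically satisfies (3.1.4).
Then
$$\ddot t = \dot u \sinh u, \quad \ddot x = \dot u \cosh u. \tag{3.1.7}$$
Substituting into (3.1.5),
$$\dot u^2 = a^2 \tag{3.1.8}$$
So $u = a(\tau - \tau_0)$ (taking a > 0; the curve is future-pointing because $\dot t = \cosh u > 0$). Taking $\tau_0 = 0$ and integrating the equations
$$\dot t = \cosh a\tau, \quad \dot x = \sinh a\tau, \tag{3.1.9}$$
yields
$$t(\tau) = \frac{1}{a}\sinh a\tau, \quad x(\tau) = \frac{1}{a}(\cosh a\tau - 1) \tag{3.1.10}$$
choosing the constants of integration so that the particle is at (t, x) = (0, 0) when τ = 0.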
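It is reassuring to check numerically that the curve (3.1.10) really does satisfy both the proper time condition (3.1.4) and the constant acceleration condition (3.1.5). The Python sketch below does this with finite differences rather than the exact derivatives, so that the check does not simply repeat the algebra. The step size h, the helper names and the sample values of a and τ are assumptions made for this illustration.

```python
import math


def worldline(tau, a):
    """The curve (3.1.10): (t(tau), x(tau)) for uniform acceleration a."""
    return (math.sinh(a * tau) / a, (math.cosh(a * tau) - 1.0) / a)


def derivatives(tau, a, h=1e-4):
    """Central finite-difference estimates of the first and second tau-derivatives."""
    plus, here, minus = worldline(tau + h, a), worldline(tau, a), worldline(tau - h, a)
    first = tuple((plus[i] - minus[i]) / (2.0 * h) for i in range(2))
    second = tuple((plus[i] - 2.0 * here[i] + minus[i]) / (h * h) for i in range(2))
    return first, second


a, tau = 0.7, 1.3  # sample values
(t_dot, x_dot), (t_ddot, x_ddot) = derivatives(tau, a)
proper_time_check = t_dot ** 2 - x_dot ** 2   # should be close to 1, as in (3.1.4)
accel_check = t_ddot ** 2 - x_ddot ** 2       # should be close to -a^2, as in (3.1.5)
print(proper_time_check, accel_check, -a * a)
```

The agreement is only up to the finite-difference error, which is small for the step size used here.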
3.1.1. Interstellar travel revisited. Suppose that a spaceship starts from rest and its
engines deliver uniform acceleration a. What happens?
One thing is that the velocity remains below that of light, as it must. For large τ,
$$t(\tau) \sim \frac{1}{2a}e^{a\tau}, \quad x(\tau) \sim \frac{1}{2a}e^{a\tau} \tag{3.1.11}$$
which is a parameterization (not by proper time) of the null ray t = x.
[The trajectory is a hyperbola.]
The relation $t = \frac{1}{a}\sinh a\tau$ relates the time τ which elapses on board the ship to
the time t measured by clocks left behind on earth.
To reach distance D, you have to solve $D = \frac{1}{a}(\cosh a\tau - 1)$. If aD is reasonably large,
$\cosh a\tau \simeq \exp(a\tau)/2$ so
$$\tau \simeq \frac{1}{a}\log(2aD). \tag{3.1.12}$$
This logarithmic relationship means that in principle, with modest accelerations from rest, a
uniformly accelerating spaceship can cover interstellar distances in reasonable times (as measured
by the astronauts). For example, suppose that we measure distance in light-years and
time in years. There are about $10^{16}$ metres in a light-year and $3 \times 10^7$ seconds in a year.
So the acceleration due to gravity, 10 m s$^{-2}$, is equal to $10 \times 10^{-16} \times (3 \times 10^7)^2$ light-years
per year$^2$. This is (miraculously) approximately 1. So according to the above, a spaceship
accelerating so that the astronauts would feel earth’s gravity on board would cover distance
D light-years in τ years, where τ ∼ log(2D), if D is reasonably large. If D = 100, then
τ = log(200) ≃ 5.3 years.
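The estimate above can be compared with the exact solution of $D = \frac{1}{a}(\cosh a\tau - 1)$, namely $\tau = \frac{1}{a}\cosh^{-1}(1 + aD)$. Here is a minimal Python sketch, using the rough conversion factors quoted in the text; the function names are hypothetical and chosen only for this illustration.

```python
import math


def proper_time_exact(D, a):
    """Solve D = (cosh(a*tau) - 1) / a exactly for the on-board time tau."""
    return math.acosh(1.0 + a * D) / a


def proper_time_approx(D, a):
    """The large-distance approximation tau ~ log(2*a*D) / a."""
    return math.log(2.0 * a * D) / a


def earth_time(tau, a):
    """The time t = sinh(a*tau) / a elapsed on earth clocks."""
    return math.sinh(a * tau) / a


# Acceleration of 10 m/s^2 converted to light-years per year^2, using the
# rough figures from the text (10^16 m per light-year, 3*10^7 s per year).
g_in_ly_per_year2 = 10.0 * 1e-16 * (3e7) ** 2

a, D = 1.0, 100.0  # a is taken as 1 light-year per year^2, D in light-years
tau = proper_time_exact(D, a)
print(g_in_ly_per_year2, tau, proper_time_approx(D, a), earth_time(tau, a))
```

For these values the exact and approximate on-board times differ by about a hundredth of a year, while the time elapsed on earth is a little over D years.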
3.1.2. Relativistic motion in a circle. (Cf. Woodhouse, SR, p. 111).
Suppose that a particle moves on a circle
x(t) = R cos ωt, y(t) = R sin ωt, z(t) = 0.
In other words,
Γ(t) = (t, R cos ωt, R sin ωt, 0)
We do not claim t is proper time, and the first thing to work out is the relation between t and
τ. We have
$$\frac{d\Gamma}{dt} = (1, -R\omega \sin \omega t, R\omega \cos \omega t, 0)$$
which has Minkowski length-squared equal to $1 - R^2\omega^2$. Thus
$$\frac{d\tau}{dt} = \sqrt{\eta(d\Gamma/dt, d\Gamma/dt)} = \sqrt{1 - R^2\omega^2}.$$


Hence the 4-velocity vector of the particle is
$$\frac{d\Gamma}{d\tau} = \frac{d\Gamma}{dt}\frac{dt}{d\tau} = \frac{1}{\sqrt{1 - R^2\omega^2}}(1, -\omega R \sin \omega t, \omega R \cos \omega t, 0).$$
Then the acceleration is
$$\frac{d^2\Gamma}{d\tau^2} = \frac{1}{1 - R^2\omega^2}(0, -\omega^2 R \cos \omega t, -\omega^2 R \sin \omega t, 0).$$
The Minkowski length-squared of this is
$$-\frac{\omega^4 R^2}{(1 - R^2\omega^2)^2}$$
and so the acceleration felt by the particle is
$$a = \frac{\omega^2 R}{1 - R^2\omega^2}$$
This is larger than the non-relativistic value by the factor $1/(1 - R^2\omega^2)$.
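As a check on the formula for the acceleration felt, the following Python sketch differentiates the 4-velocity numerically (with respect to proper time, using dt/dτ from above) and compares $\sqrt{-\eta(A, A)}$ with $\omega^2 R/(1 - R^2\omega^2)$. The helper names, the step size and the sample values of R and ω are assumptions for this illustration.

```python
import math


def eta(X, Y):
    """Minkowski scalar product with signature (+, -, -, -)."""
    return X[0] * Y[0] - X[1] * Y[1] - X[2] * Y[2] - X[3] * Y[3]


def four_velocity(t, R, omega):
    """dGamma/dtau for the circular worldline, as a function of coordinate time t."""
    g = 1.0 / math.sqrt(1.0 - (R * omega) ** 2)
    return (g,
            -g * R * omega * math.sin(omega * t),
            g * R * omega * math.cos(omega * t),
            0.0)


def felt_acceleration(t, R, omega, h=1e-6):
    """sqrt(-eta(A, A)) with A = dV/dtau estimated by a central difference in t."""
    g = 1.0 / math.sqrt(1.0 - (R * omega) ** 2)  # this is dt/dtau
    Vp = four_velocity(t + h, R, omega)
    Vm = four_velocity(t - h, R, omega)
    A = tuple(g * (p - m) / (2.0 * h) for p, m in zip(Vp, Vm))
    return math.sqrt(-eta(A, A))


R, omega = 1.0, 0.5  # sample values with R*omega < 1
numeric = felt_acceleration(0.3, R, omega)
formula = omega ** 2 * R / (1.0 - (R * omega) ** 2)
print(numeric, formula, omega ** 2 * R)
```

The last number printed is the non-relativistic value ω²R, for comparison.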

3.2. Momentum and energy: E = mc2


In this section we shall consider collisions between particles (including photons) in special
relativity.
We have to make some physical assumptions. We assume that massive particles have a
well-defined mass (the rest-mass). If a particle has 4-velocity V , with η(V, V ) = 1, then the
4-momentum of the particle is defined to be
P = mV. (3.2.1)
Suppose we have k particles with 4-momenta P1 , . . . , Pk which interact (for example in the
LHC) and after the interaction there are m outgoing particles with momenta Q1 , . . . , Qm . The
basic assumption is the conservation of total 4-momentum
Q 1 + · · · + Q m = P1 + · · · + Pk . (3.2.2)
If Alice is an inertial observer with 4-velocity U, and we have a particle of rest-mass m and
with 4-momentum P = mV, then Alice can look at the spatial and temporal parts of P. If she
measures the velocity of the particle in her rest frame as $\mathbf{v}$, then
$$P = m\gamma(v)(1, \mathbf{v}) \tag{3.2.3}$$
where as usual
$$\gamma(v) = \frac{1}{\sqrt{1 - v^2}}, \quad v = |\mathbf{v}|. \tag{3.2.4}$$
Expanding this using the binomial expansion for small v,
$$\gamma(v) = 1 + \frac{v^2}{2} + O(v^4)$$
we see that
$$P \simeq m(1 + v^2/2)(1, \mathbf{v}) = (m + mv^2/2, m\mathbf{v}) + O(v^3).$$
It is good to restore c here, in which case we’d get
$$P \simeq (mc^2 + mv^2/2, m\mathbf{v}) + O(v^3/c).$$
Now the term $mv^2/2$ appearing here is the classical kinetic energy of a particle of mass m
moving at speed v. The spatial component $m\mathbf{v}$ is just the classical momentum.
Einstein’s conclusion from these considerations was the ‘equivalence’ of mass and energy:
that a particle of rest-mass m should have total energy mc2 : an observer with 4-velocity U
assigns total energy η(U, P ) to a particle with 4-momentum P . The very surprising conclusion
is that a free particle of mass m has to be assigned energy mc2 by an inertial observer for whom
the particle is at rest.


Example 3.2.1. Elastic collisions. A particle of rest mass m, velocity $\mathbf{u}$ relative to an
inertial frame, collides with a particle of rest mass m, which is at rest. After the collision, the
velocities of the particles are $\mathbf{v}$ and $\mathbf{w}$. If θ is the angle between $\mathbf{v}$ and $\mathbf{w}$, show that
$$\cos \theta = \frac{1}{vw}(1 - 1/\gamma(v))(1 - 1/\gamma(w)). \tag{3.2.5}$$
Let the incoming momenta be P1 = mU1 , P2 = mU2 , and the outgoing momenta be Q1 =
mV1 and Q2 = mV2 . In the rest frame of the at-rest particle, we have
P1 = mγ(u)(1, u), P2 = (m, 0), Q1 = mγ(v)(1, v), Q2 = mγ(w)(1, w). (3.2.6)
Conservation of momentum says
P1 + P2 = Q 1 + Q 2 . (3.2.7)
We can get some information by taking the η-square of each side,
η(P1 + P2 , P1 + P2 ) = η(Q1 + Q2 , Q1 + Q2 ) (3.2.8)
which yields
$$2m^2 + 2\eta(P_1, P_2) = 2m^2 + 2\eta(Q_1, Q_2) \tag{3.2.9}$$
since $\eta(P_1, P_1) = \eta(P_2, P_2) = \eta(Q_1, Q_1) = \eta(Q_2, Q_2) = m^2$. Using (3.2.6) to compute the
cross-terms in (3.2.9), we get
$$m^2\gamma(u) = m^2\gamma(v)\gamma(w)(1 - \mathbf{v} \cdot \mathbf{w}). \tag{3.2.10}$$
This is useful because v · w = vw cos θ, and cos θ is what we are looking for. On the other
hand, we need to eliminate γ(u). This can be done by taking the scalar product with P2 , or, in
down-to-earth terms, just by inspecting the temporal component of the conservation equation
(3.2.7), which gives
γ(u) + 1 = γ(v) + γ(w) (3.2.11)
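Combining this with (3.2.10) gives
$$1 - vw \cos \theta = \frac{\gamma(v) + \gamma(w) - 1}{\gamma(v)\gamma(w)} \tag{3.2.12}$$
so
$$\begin{aligned} vw \cos \theta &= 1 - \frac{\gamma(v) + \gamma(w) - 1}{\gamma(v)\gamma(w)} \\ &= \frac{\gamma(v)\gamma(w) - \gamma(v) - \gamma(w) + 1}{\gamma(v)\gamma(w)} \\ &= \frac{(\gamma(v) - 1)(\gamma(w) - 1)}{\gamma(v)\gamma(w)} \\ &= (1 - 1/\gamma(v))(1 - 1/\gamma(w)). \end{aligned} \tag{3.2.13}$$
This is the result. Notice that $v\gamma(v) = v/\sqrt{1 - v^2} = \sqrt{\gamma^2(v) - 1}$, so a nice way of writing this
is
$$\cos \theta = \frac{(\gamma(v) - 1)(\gamma(w) - 1)}{\sqrt{\gamma(v)^2 - 1}\,\sqrt{\gamma(w)^2 - 1}} = \sqrt{\frac{\gamma(v) - 1}{\gamma(v) + 1}}\,\sqrt{\frac{\gamma(w) - 1}{\gamma(w) + 1}} \tag{3.2.14}$$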
If v and w are small,
$$\gamma(v) - 1 \simeq v^2/2, \quad \gamma(v) + 1 \simeq 2$$
and so cos θ ≃ 0. The fact that the outgoing trajectories are at 90° is a standard consequence
of conservation of energy in newtonian mechanics.
Relativistically, however, γ(v) > 1 and γ(w) > 1, so cos θ > 0 and θ is strictly less than 90°. Such
trajectories are observed in high energy particle interactions (e.g. in the LHC) and provide
confirmation of special relativity (or more precisely of the conservation of 4-momentum).
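One way to test (3.2.5) numerically is to build an explicit elastic collision and measure the angle. The Python sketch below is my own construction, not taken from the notes: it starts in the centre-of-momentum frame, where the two equal-mass particles approach with velocities ±β along the x-axis and leave with the same speed β at an angle φ, and then applies the boost (2.9.5) with speed β so that the second particle is initially at rest. The names `boost_to_lab`, `lab_velocities` and `angle_check` are hypothetical.

```python
import math


def boost_to_lab(beta, p):
    """Apply the boost (2.9.5) with speed beta to p = (E, px, py)."""
    g = 1.0 / math.sqrt(1.0 - beta * beta)
    E, px, py = p
    return (g * (E + beta * px), g * (px + beta * E), py)


def lab_velocities(beta, phi, m=1.0):
    """Outgoing 3-velocities v and w in the frame where particle 2 started at rest.

    In the centre-of-momentum frame the outgoing particles have speed beta and
    directions +/- (cos(phi), sin(phi)).
    """
    g = 1.0 / math.sqrt(1.0 - beta * beta)
    q1 = (m * g, m * g * beta * math.cos(phi), m * g * beta * math.sin(phi))
    q2 = (m * g, -m * g * beta * math.cos(phi), -m * g * beta * math.sin(phi))
    velocities = []
    for q in (q1, q2):
        E, px, py = boost_to_lab(beta, q)
        velocities.append((px / E, py / E))
    return velocities


def angle_check(beta, phi):
    """Return (cos(theta) measured directly, cos(theta) from formula (3.2.5))."""
    (vx, vy), (wx, wy) = lab_velocities(beta, phi)
    v, w = math.hypot(vx, vy), math.hypot(wx, wy)
    direct = (vx * wx + vy * wy) / (v * w)
    formula = (1.0 - math.sqrt(1.0 - v * v)) * (1.0 - math.sqrt(1.0 - w * w)) / (v * w)
    return direct, formula


direct, formula = angle_check(0.6, math.pi / 2)
print(direct, formula, math.degrees(math.acos(direct)))
```

The angle φ = π must be avoided in this sketch, since then the second particle stays at rest and the angle θ is undefined.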


3.3. Momentum of photons


Photons are massless, but they do have momentum. To any photon is associated a null
vector, K, say. If Alice is an inertial observer with 4-velocity U, then in Alice’s rest frame, we
have
$$K = \omega(1, \mathbf{e}), \tag{3.3.1}$$
where $\mathbf{e}$ is a unit vector. A natural assumption (by considering solutions of the wave equation) is
that ω is the (angular) frequency of the photon, as measured by Alice. We make this assumption
without further justification. We then assume that the momentum of the photon is
$$\hbar K \tag{3.3.2}$$
where $\hbar \simeq 1.05 \times 10^{-34}$ J s is (the reduced) Planck’s constant.
By way of partial justification (at least if you know a tiny bit of quantum mechanics):
recall that the temporal component (relative to Alice’s frame) of the 4-momentum of a massive
particle is the energy of the particle as measured by Alice. Thus a reasonable requirement
of the 4-momentum of a photon is that it should be a multiple of its 4-velocity, such that its
temporal component is the energy that Alice would measure. With K as in (3.3.1) the temporal
component of (3.3.2) is $\hbar\omega$. That the energy of a photon is given by $E = \hbar\omega$ is a basic principle
of quantum mechanics.
It is now natural to assume that 4-momentum is also conserved in collisions involving pho-
tons.
Example 3.3.1. A photon with frequency ω collides with an electron at rest in an inertial
frame. After the collision, the frequency of the photon is ω′. Obtain a relation between the
scattering angle of the photon, the frequencies and the rest-mass of the electron.
The initial momenta (in the electron’s rest frame) are
$$P_1 = \hbar\omega(1, \mathbf{e}), \quad P_2 = (m, \mathbf{0}) \tag{3.3.3}$$
The final momenta are
$$Q_1 = \hbar\omega'(1, \mathbf{e}'), \quad Q_2 = m\gamma(v)(1, \mathbf{v}). \tag{3.3.4}$$
The momentum conservation equation is
$$P_1 + P_2 = Q_1 + Q_2. \tag{3.3.5}$$
Since we want to know $\cos \theta = \mathbf{e} \cdot \mathbf{e}'$, we want to square (3.3.5) in such a way as to get $\mathbf{e} \cdot \mathbf{e}'$ as a cross
term. For this purpose we rearrange it so that both photon momenta are on the same side of
the equation:
$$P_1 - Q_1 = Q_2 - P_2, \tag{3.3.6}$$
which gives
$$\eta(P_1 - Q_1, P_1 - Q_1) = -2\eta(Q_1, P_1) = \eta(Q_2 - P_2, Q_2 - P_2) = 2m^2 - 2m^2\gamma(v). \tag{3.3.7}$$
Simplifying,
$$\hbar^2\omega\omega'(1 - \cos \theta) = m^2(\gamma(v) - 1). \tag{3.3.8}$$
Looking at the temporal component of (3.3.5), we find
$$\hbar\omega + m = m\gamma(v) + \hbar\omega' \;\Rightarrow\; m(\gamma(v) - 1) = \hbar(\omega - \omega') \tag{3.3.9}$$
so finally
$$\hbar\omega\omega'(1 - \cos \theta) = m(\omega - \omega') \tag{3.3.10}$$
Remark 3.3.2. This process is known as Compton scattering.
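Solving (3.3.10) for ω′ gives $\omega' = m\omega/(m + \hbar\omega(1 - \cos\theta))$. The Python sketch below uses this to compute the scattered frequency, and then checks that the electron 4-momentum forced by conservation, $Q_2 = P_1 + P_2 - Q_1$, really has $\eta(Q_2, Q_2) = m^2$. Units with ħ = c = 1, the choice of e along the x-axis, and the function names are all assumptions made for this illustration.

```python
import math


def eta(X, Y):
    """Minkowski scalar product with signature (+, -, -, -)."""
    return X[0] * Y[0] - X[1] * Y[1] - X[2] * Y[2] - X[3] * Y[3]


def scattered_frequency(omega, theta, m, hbar=1.0):
    """Solve (3.3.10) for the outgoing photon frequency omega'."""
    return m * omega / (m + hbar * omega * (1.0 - math.cos(theta)))


def electron_momentum(omega, theta, m, hbar=1.0):
    """Q2 = P1 + P2 - Q1, with the incoming photon moving along the x-axis."""
    omega_out = scattered_frequency(omega, theta, m, hbar)
    P1 = (hbar * omega, hbar * omega, 0.0, 0.0)
    P2 = (m, 0.0, 0.0, 0.0)
    Q1 = (hbar * omega_out,
          hbar * omega_out * math.cos(theta),
          hbar * omega_out * math.sin(theta),
          0.0)
    return tuple(p1 + p2 - q1 for p1, p2, q1 in zip(P1, P2, Q1))


omega, theta, m = 2.0, 1.0, 1.0  # sample values in units with hbar = c = 1
Q2 = electron_momentum(omega, theta, m)
print(scattered_frequency(omega, theta, m), eta(Q2, Q2))
```

The second printed number should equal m² up to rounding, confirming that the recoiling electron is on its mass shell.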


CHAPTER 4

Multivariable calculus

4.1. Smooth functions and changes of coordinates


4.1.1. Smooth functions. Let Ω be an open set of $\mathbb{R}^n$. ‘Open’ means that Ω is a possibly
infinite union of open balls. The open balls of $\mathbb{R}^n$ are all of the form
$$B(a, r) = \{x \in \mathbb{R}^n \text{ such that } |x - a| < r\} \tag{4.1.1}$$
where a is any point of $\mathbb{R}^n$ and r > 0. It is the strict inequality in (4.1.1) that makes B(a, r)
an open ball.
Let f : Ω → R be a real-valued function in Ω. I assume you know what partial derivatives
are.
Definition 4.1.1. A function f is smooth, also written $C^\infty$, if all partial derivatives, of any
order, exist. That is, for any non-negative integers $\alpha_1, \ldots, \alpha_n$,
$$\left(\frac{\partial}{\partial x^1}\right)^{\alpha_1} \cdots \left(\frac{\partial}{\partial x^n}\right)^{\alpha_n} f(p) \tag{4.1.2}$$
exists for every point p of Ω. The set of all functions which are smooth in Ω is denoted by
$C^\infty(\Omega)$. Then $C^\infty(\Omega)$ is an infinite-dimensional vector space.
Remark 4.1.2. The order of the partial derivative in (4.1.2) is $\alpha = \alpha_1 + \cdots + \alpha_n$.
Recall that for smooth functions, partial differentiation with respect to different variables
‘commutes’ in the sense that
$$\frac{\partial}{\partial x^i}\frac{\partial}{\partial x^j} f(x) = \frac{\partial}{\partial x^j}\frac{\partial}{\partial x^i} f(x) \tag{4.1.3}$$
for all $1 \le i, j \le n$.
Remark 4.1.3. Smooth functions model ‘scalar’ physical quantities such as density, pres-
sure, charge-density, temperature,...
4.1.2. Changes of coordinates. We have tacitly taken the $(x^1, \ldots, x^n)$ to be standard
linear coordinates in $\mathbb{R}^n$ (i.e. associated with the standard basis of $\mathbb{R}^n$). It is fairly clear that the
idea of ‘smoothness’ of functions should be independent of the choice of such linear coordinates,
but here we are going to take things further by considering more general coordinate systems.
As far as GR goes, this is a requirement of trying to make a theory that is ‘generally covariant’
(i.e. transforms predictably under general changes of coordinates).
So, what are changes of coordinates?
Definition 4.1.4. A change of coordinates, written in compact form $x = x(y)$, is a collection
of smooth functions
$$\begin{aligned} x^1 &= x^1(y^1, \ldots, y^n) \\ x^2 &= x^2(y^1, \ldots, y^n) \\ &\;\;\vdots \\ x^n &= x^n(y^1, \ldots, y^n) \end{aligned}$$
with the further properties:
• $x = x(y)$ gives a 1:1 correspondence between points $x \in \Omega$ and points $y$ in some other
open set $\Omega'$;
• The corresponding inverse map $y = y(x)$ is also smooth, i.e.
$$y^j = y^j(x^1, \ldots, x^n)$$
is smooth for each $j = 1, \ldots, n$.
Example 4.1.5. Any affine linear transformation,
$$y^j = \sum_k L^j_k x^k + A^j \tag{4.1.4}$$
is invertible if the matrix $L$ is invertible. (In this case, the Jacobian of the transformation is $L$.)
Example 4.1.6. Plane polar coordinates.
$$x = r\cos\theta, \qquad y = r\sin\theta.$$
(To make this look like the above, write it as
$$x^1 = y^1 \cos y^2, \qquad x^2 = y^1 \sin y^2,$$
i.e. $(x, y) = (x^1, x^2)$, $(r, \theta) = (y^1, y^2)$.) The Jacobian is
$$\begin{pmatrix} x_r & x_\theta \\ y_r & y_\theta \end{pmatrix} = \begin{pmatrix} \cos\theta & -r\sin\theta \\ \sin\theta & r\cos\theta \end{pmatrix}$$
This gives a change of coordinates between
$$\Omega = \{(x^1, x^2) \text{ such that } (x^1, x^2) \neq (t, 0) \text{ with } t \geq 0\}$$
and
$$\Omega' = \{(y^1, y^2) \text{ such that } y^1 > 0 \text{ and } 0 < y^2 < 2\pi\}$$
Remark 4.1.7. Make sure you understand why we have to restrict the values of $(y^1, y^2)$ to
get a change of coordinates.
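One way to explore the point of Remark 4.1.7 is to experiment numerically. The following is a minimal sketch, not part of the original notes: the function names `to_cartesian` and `to_polar` are hypothetical, and the inverse map uses `math.atan2` with the angle reduced to the range $[0, 2\pi)$. It illustrates that the polar map is invertible on the restricted domain, but fails to be 1:1 once $\theta$ is unrestricted or $r = 0$ is allowed.

```python
import math

def to_cartesian(r, theta):
    """x = x(y): polar (r, theta) -> Cartesian (x, y)."""
    return (r * math.cos(theta), r * math.sin(theta))

def to_polar(x, y):
    """y = y(x): Cartesian -> polar, with theta reduced to [0, 2*pi)."""
    r = math.hypot(x, y)
    theta = math.atan2(y, x) % (2 * math.pi)
    return (r, theta)

# Inside the restricted domain (r > 0, 0 < theta < 2*pi) the maps undo each other.
r, theta = 2.0, 4.0
x, y = to_cartesian(r, theta)
r_back, theta_back = to_polar(x, y)

# Without the restriction the map is not 1:1:
# theta and theta + 2*pi give the same point ...
p1 = to_cartesian(1.5, 0.3)
p2 = to_cartesian(1.5, 0.3 + 2 * math.pi)
# ... and every pair (0, theta) lands on the origin.
origin_a = to_cartesian(0.0, 1.0)
origin_b = to_cartesian(0.0, 2.0)
```

Here `(r_back, theta_back)` recovers `(r, theta)` up to rounding, while `p1` and `p2` coincide and both `origin_a` and `origin_b` are the origin.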
We can think of a change of coordinates more actively as follows. Given a function $f$ in
$C^\infty(\Omega)$, we get a new function $\tilde f \in C^\infty(\Omega')$ by the formula
$$\tilde f(y) = f(x(y)) \tag{4.1.5}$$
Similarly, given $\tilde g \in C^\infty(\Omega')$ we get a new function $g \in C^\infty(\Omega)$ by the formula
$$g(x) = \tilde g(y(x)). \tag{4.1.6}$$
Remark 4.1.8. The fancy terminology for this is ‘pull-back’: $\tilde f$ is obtained from $f$ by pulling
back by the change-of-coordinates map $\Omega' \to \Omega$. If you find this helpful (because you’ve seen it
elsewhere) fine. If you haven’t, don’t worry.
The fact that $\tilde f$ is smooth if $f$ is smooth follows from the chain rule:
$$\frac{\partial}{\partial y^j} \tilde f(y) = \sum_{i=1}^n \frac{\partial x^i}{\partial y^j} \frac{\partial}{\partial x^i} f(x) \quad \text{where } x = x(y), \tag{4.1.7}$$
which I assume you’ve seen before and are happy with. The matrix whose components are
$(\partial x^j/\partial y^i)$ entering in (4.1.7) is called the Jacobian of the coordinate transformation $x = x(y)$.
The Jacobian matrix $(\partial y^j/\partial x^i)$ of the inverse transformation $y = y(x)$ is the inverse of
the Jacobian matrix $(\partial x^j/\partial y^i)$. As an exercise, you can verify this by using the chain rule to
differentiate the equation
$$x^j(y(x)) = x^j \quad \text{for } j = 1, \ldots, n \tag{4.1.8}$$
which is true by definition of the transformations being inverse to each other. In particular for
a change of coordinates, the Jacobian matrix must be invertible everywhere.
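The explicit forms of these inverse relationships are
$$\sum_{j=1}^n \frac{\partial x^k}{\partial y^j} \frac{\partial y^j}{\partial x^i} = \delta^k_i, \qquad \sum_{j=1}^n \frac{\partial y^k}{\partial x^j} \frac{\partial x^j}{\partial y^i} = \delta^k_i. \tag{4.1.9}$$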
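The inverse relations (4.1.9) can be checked numerically for plane polar coordinates. This is only an illustrative sketch: the helper names `jac_x_wrt_y`, `jac_y_wrt_x` and `matmul` are hypothetical, and the entries of the second matrix come from differentiating $r = \sqrt{x^2 + y^2}$ and $\theta = \arctan(y/x)$ by hand.

```python
import math

def jac_x_wrt_y(r, theta):
    """Matrix (dx^i/dy^j) for x^1 = r cos(theta), x^2 = r sin(theta)."""
    return [[math.cos(theta), -r * math.sin(theta)],
            [math.sin(theta),  r * math.cos(theta)]]

def jac_y_wrt_x(x, y):
    """Matrix (dy^i/dx^j) for r = sqrt(x^2 + y^2), theta = angle of (x, y)."""
    r2 = x * x + y * y
    r = math.sqrt(r2)
    return [[x / r, y / r],        # dr/dx,     dr/dy
            [-y / r2, x / r2]]     # dtheta/dx, dtheta/dy

def matmul(a, b):
    """Plain matrix product: the sum over the repeated index in (4.1.9)."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

r, theta = 2.0, 0.7
x, y = r * math.cos(theta), r * math.sin(theta)
product = matmul(jac_x_wrt_y(r, theta), jac_y_wrt_x(x, y))
```

Up to rounding error, `product` is the $2 \times 2$ identity matrix, which is the first relation in (4.1.9) at the chosen point.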


Remark 4.1.9. The inverse function theorem says that if $x = x(y)$ is just smooth for $y \in \Omega'$
and $x \in \Omega$ and if the Jacobian matrix is invertible at a point $q$, say, in $\Omega'$, then in fact $x = x(y)$
is invertible, at least if you restrict the transformation to a small ball $B'$ containing $q$ inside $\Omega'$
and its image $W \subset \Omega$. That is, after restricting in this way, there is a smooth $y = y(x)$, for $x \in W$,
such that $y(x) \in B'$, inverting $x = x(y)$.
Thus the inverse function theorem is a (partial) converse to the fact that Jacobians of coordinate
transformations must be invertible.
Example 4.1.10. In the case of polar coordinates, the determinant of the Jacobian is just
$r$. The Jacobian is therefore invertible if and only if $r \neq 0$. This ties in with the fact that polar
coordinates go wrong at $r = 0$.

4.2. Two types of vector


Many physical quantities are ‘vectorial’, as we know. We now consider vectorial quantities
in the context of general coordinate transformations. A major subtlety is that there are two
different kinds of vectorial quantities and we need to be clear on the difference between them.
4.2.1. Vector fields. A vector field in $\Omega$ is a smooth first-order differential operator of the
form
$$V = \sum_{j=1}^n V^j(x) \frac{\partial}{\partial x^j} \tag{4.2.1}$$
where the $V^j(x)$ are smooth functions in $\Omega$. If $f \in C^\infty(\Omega)$ we obtain a new function
$Vf \in C^\infty(\Omega)$ called the derivative of $f$ along $V$,
$$Vf(x) = \sum_{j=1}^n V^j(x) \frac{\partial f}{\partial x^j}(x) \tag{4.2.2}$$
A vector field is also known as a tensor (field) of type (1, 0).


4.2.2. Covector fields. A covector field in $\Omega$ is a quantity
$$\omega = \sum_{j=1}^n \omega_j(x) \, dx^j \tag{4.2.3}$$
where the $\omega_j(x)$ are smooth functions in $\Omega$.
If $f \in C^\infty(\Omega)$ we obtain a covector field
$$df = \sum_{j=1}^n \frac{\partial f}{\partial x^j} \, dx^j \tag{4.2.4}$$
also known as the exterior derivative or differential of $f$. To match up the notation between
(4.2.3) and (4.2.4), $df$ is the covector whose components are $\omega_j = \partial f/\partial x^j$.
Remark 4.2.1. If $\omega$ is a covector field in $\Omega$ it is not generally true that $\omega = df$ for some
function $f$. (Indeed the condition
$$\frac{\partial \omega_i}{\partial x^j} = \frac{\partial \omega_j}{\partial x^i}$$
is necessary for $\omega = df$, by (4.1.3).)
A covector field is also known as a tensor (field) of type (0, 1).
4.2.3. Dual pairing (or contraction). Given a vector field $V$ and a covector field $\omega$, the
contraction $\langle V, \omega \rangle$ is a scalar function defined in terms of components by
$$\langle V, \omega \rangle = \sum_{j=1}^n V^j(x) \omega_j(x). \tag{4.2.5}$$
The directional derivative is an example of this contraction: from the above formulae,
$$\langle V, df \rangle = Vf \tag{4.2.6}$$
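A small numerical illustration of (4.2.5) and (4.2.6) may help. The function $f = x^2 y$ and the vector field $V = -y\,\partial/\partial x + x\,\partial/\partial y$ below are arbitrary choices made for this sketch (they do not come from the notes), and the helper name `pairing` is hypothetical.

```python
def f(x, y):
    return x * x * y

def df(x, y):
    """Components (omega_1, omega_2) of df: the partial derivatives of f."""
    return (2 * x * y, x * x)

def V(x, y):
    """Components (V^1, V^2) of the vector field V = -y d/dx + x d/dy."""
    return (-y, x)

def pairing(v, omega):
    """Contraction <V, omega> = sum_j V^j omega_j, as in (4.2.5)."""
    return sum(vj * wj for vj, wj in zip(v, omega))

p = (1.5, -0.5)
contraction = pairing(V(*p), df(*p))

# Independent check: central difference of f in the direction V(p).
h = 1e-6
v1, v2 = V(*p)
numeric = (f(p[0] + h * v1, p[1] + h * v2)
           - f(p[0] - h * v1, p[1] - h * v2)) / (2 * h)
```

The two numbers `contraction` and `numeric` agree to within the accuracy of the finite difference, as (4.2.6) predicts.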


4.2.4. Transformation laws for vector fields and covector fields. Let $V$ be a vector
field in $\Omega$ and let $x = x(y)$ (for $y \in \Omega'$) be a change of coordinates with inverse $y = y(x)$. We
get a vector field $\tilde V$ in $\Omega'$ as follows: $\tilde V$ is supposed to differentiate functions in $\Omega'$. We know
how to differentiate functions in $\Omega$. But we can transfer a function in $\Omega'$ to $\Omega$ by change of
variables. So we define
$$(\tilde V \tilde g)(y) = (V g)(x(y)) \tag{4.2.7}$$
where $g$ and $\tilde g$ are related as in (4.1.6).
Proposition 4.2.2. If $V = \sum_j V^j (\partial/\partial x^j)$ in terms of the $x^j$ in $\Omega$ and $\tilde V = \sum_j \tilde V^j (\partial/\partial y^j)$
in terms of the $y^j$ in $\Omega'$, then
$$\tilde V^i = \sum_j V^j \frac{\partial y^i}{\partial x^j} \tag{4.2.8}$$
Proof. The chain rule tells us everything: from $g(x) = \tilde g(y(x))$, we get
$$\frac{\partial g}{\partial x^j} = \sum_i \frac{\partial y^i}{\partial x^j} \frac{\partial \tilde g}{\partial y^i}.$$
Multiplying by $V^j$ and summing,
$$\sum_j V^j \frac{\partial g}{\partial x^j} = \sum_{i,j} V^j \frac{\partial y^i}{\partial x^j} \frac{\partial \tilde g}{\partial y^i}.$$
But by (4.2.7) the RHS is supposed to be $\sum_i \tilde V^i (\partial \tilde g/\partial y^i)$, so the result follows by equating
coefficients. □
Remark 4.2.3. This is an example of covariance (as opposed to invariance). The coefficients
of a vector depend on a choice of coordinates, but they transform in a predictable and linear
way. In particular if the coefficients are all zero at a given point in one coordinate system then
they are also zero in any other coordinate system. This is as it should be: if there is no wind
at a particular point (and time) in the atmosphere, then all observers should agree on this fact,
regardless of how they choose their coordinates!
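The transformation law (4.2.8) can be tried out on plane polar coordinates. In the sketch below (an illustration only; the function name `transform_vector` and the two sample vector fields are choices made here, not taken from the notes), the rotation field $-y\,\partial/\partial x + x\,\partial/\partial y$ should come out as $\partial/\partial\theta$, the radial field $x\,\partial/\partial x + y\,\partial/\partial y$ as $r\,\partial/\partial r$, and the zero vector should stay zero.

```python
import math

def transform_vector(V, x, y):
    """Apply (4.2.8): Vtilde^i = V^j dy^i/dx^j, with (y^1, y^2) = (r, theta)."""
    r2 = x * x + y * y
    r = math.sqrt(r2)
    dy_dx = [[x / r, y / r],        # dr/dx,     dr/dy
             [-y / r2, x / r2]]     # dtheta/dx, dtheta/dy
    return [sum(dy_dx[i][j] * V[j] for j in range(2)) for i in range(2)]

x, y = 0.6, 0.8                     # a point with r = 1
rotation = [-y, x]                  # V = -y d/dx + x d/dy
radial = [x, y]                     # V = x d/dx + y d/dy

rotation_polar = transform_vector(rotation, x, y)   # components (0, 1)
radial_polar = transform_vector(radial, x, y)       # components (r, 0)
zero_polar = transform_vector([0.0, 0.0], x, y)     # stays (0, 0)
```

The last line illustrates the remark above: because the law is linear in the components, a vector that vanishes in one coordinate system vanishes in every other.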
The classical formula
$$dx^j = \sum_i \frac{\partial x^j}{\partial y^i} \, dy^i \tag{4.2.9}$$
suggests that if $\sum_j \tilde\omega_j \, dy^j$ is to agree with $\sum_j \omega_j \, dx^j$, then we should have
$$\sum_j \omega_j \, dx^j = \sum_{i,j} \omega_j \frac{\partial x^j}{\partial y^i} \, dy^i$$
and equating coefficients
$$\tilde\omega_i = \sum_j \omega_j \frac{\partial x^j}{\partial y^i}. \tag{4.2.10}$$
Note that even allowing for the difference between upstairs and downstairs indices, (4.2.8) and
(4.2.10) are different transformation laws.
The rule (4.2.10) gives a way of transferring a covector field $\omega$ in $\Omega$ to a new covector field $\tilde\omega$
in $\Omega'$. We already have a rule for transferring vector fields from $\Omega$ to $\Omega'$. These are compatible
in the sense that the contraction is invariant:
Proposition 4.2.4. Let $\omega$ and $V$ be a covector field and a vector field on $\Omega$ and let $\tilde\omega$ and
$\tilde V$ be the corresponding covector field and vector field on $\Omega'$. Then we have
$$\langle \tilde V, \tilde\omega \rangle = \langle V, \omega \rangle \tag{4.2.11}$$
where the LHS is calculated at $y$ in $\Omega'$ and the RHS at $x = x(y)$ in $\Omega$.


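Proof. This follows at once from (4.2.8) and (4.2.10), for
$$\langle \tilde V, \tilde\omega \rangle = \sum_i \tilde V^i \tilde\omega_i = \sum_{i,j} V^j \frac{\partial y^i}{\partial x^j} \tilde\omega_i. \tag{4.2.12}$$
Using the fact that the Jacobians $(\partial y^i/\partial x^j)$ and $(\partial x^i/\partial y^j)$ are inverse to each other, (4.2.10) is
seen to be equivalent to
$$\omega_j = \sum_i \tilde\omega_i \frac{\partial y^i}{\partial x^j}. \tag{4.2.13}$$
Hence the RHS of (4.2.12) can be written
$$\sum_{i,j} V^j \frac{\partial y^i}{\partial x^j} \tilde\omega_i = \sum_j V^j \omega_j = \langle V, \omega \rangle.$$
This completes the proof. □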
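The invariance (4.2.11) can also be confirmed numerically. The following sketch again uses plane polar coordinates; the component values of `V` and `omega` are arbitrary numbers chosen for illustration, and the helper name `jacobians` is hypothetical.

```python
import math

def jacobians(r, theta):
    """Return (dx/dy, dy/dx) for polar coordinates at the point (r, theta)."""
    c, s = math.cos(theta), math.sin(theta)
    dx_dy = [[c, -r * s], [s, r * c]]     # dx_dy[j][i] = dx^j/dy^i
    dy_dx = [[c, s], [-s / r, c / r]]     # dy_dx[i][j] = dy^i/dx^j
    return dx_dy, dy_dx

r, theta = 2.0, 1.1
dx_dy, dy_dx = jacobians(r, theta)

V = [0.3, -1.2]       # components V^j in the x coordinates
omega = [2.0, 1.5]    # components omega_j in the x coordinates

# (4.2.8): Vtilde^i = V^j dy^i/dx^j
V_t = [sum(dy_dx[i][j] * V[j] for j in range(2)) for i in range(2)]
# (4.2.10): omegatilde_i = omega_j dx^j/dy^i
omega_t = [sum(dx_dy[j][i] * omega[j] for j in range(2)) for i in range(2)]

lhs = sum(V_t[i] * omega_t[i] for i in range(2))   # <Vtilde, omegatilde>
rhs = sum(V[j] * omega[j] for j in range(2))       # <V, omega>
```

The individual components change under the two different laws, but `lhs` and `rhs` agree up to rounding.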


Remark 4.2.5. It is possible to change the logic around: given the relation between $V$ and
$\tilde V$ which is natural in terms of the way vector fields are supposed to differentiate functions, we
could have defined $\tilde\omega$ in terms of $\omega$ so that (4.2.11) holds. This would then have implied the
transformation law (4.2.10) for the coefficients of the covector field $\omega$.
4.2.5. Tangent space and cotangent space. We shall not make great use of the following,
but they are really important in a more systematic development of these ideas.
If $p$ is a point of $\Omega$ then $T_p\Omega$, the tangent space to $\Omega$ at $p$, is defined to be the space of all
directional derivatives acting at $p$. A typical element of $T_p\Omega$ is thus written
$$V = \sum_{j=1}^n V^j \left.\frac{\partial}{\partial x^j}\right|_p \tag{4.2.14}$$
where the $V^j$ are just numbers and by definition
$$\left.\frac{\partial}{\partial x^j}\right|_p f = \frac{\partial f}{\partial x^j}(p) \tag{4.2.15}$$
(i.e. differentiate the function then evaluate it at $p$).
Then $T_p\Omega$ is a vector space of dimension $n$ (the dimension of the vector space in which $\Omega$
is sitting as an open set) and is independent of coordinates. One should think of $T_p\Omega$ as the
set of arrows emanating from $p$, and pointing in every possible direction in $\Omega$. The coordinate
independence follows from the transformation law (4.2.8).
Similarly the cotangent space $T_p^*\Omega$ is the dual vector space to $T_p\Omega$; by definition this is the
space of linear maps $T_p\Omega \to \mathbb{R}$ and a typical element has the form
$$\omega = \sum_{j=1}^n \omega_j \, dx^j|_p. \tag{4.2.16}$$
This is also an $n$-dimensional vector space, independent of choice of coordinates. The duality
between $T_p\Omega$ and $T_p^*\Omega$ is given by
$$\langle V, \omega \rangle_p = \sum_j V^j \omega_j \tag{4.2.17}$$
as above but now producing a number rather than a function on the RHS.

4.3. The Einstein summation convention


In the above (and in the previous chapters), there are many expressions involving a summation
over repeated indices, one upstairs and one downstairs. The Einstein summation convention
is to omit the $\Sigma$ symbol, so that whenever a repeated index appears in an expression it is to be
understood that you sum over the range of that index (in this case from 1 to $n$).


In order for this to work, of course, it is essential that if an index is repeated then it must
not occur anywhere else in the expression, so for example
$$A_i B^i C_i \quad \text{is not OK.}$$
Multiple sums (unfortunately) very often occur. For instance, if $L$ and $\tilde L$ are matrices with
components $L^i_j$ and $\tilde L^i_j$ so that
$$LX \text{ has components } \sum_j L^i_j X^j = L^i_j X^j \quad \text{(summation convention)}$$
and
$$\tilde L X \text{ has components } \sum_j \tilde L^i_j X^j = \tilde L^i_j X^j$$
then
$$L\tilde L X \text{ has components } L^i_p [\tilde L X]^p = L^i_p [\tilde L^p_q X^q] = L^i_p \tilde L^p_q X^q. \tag{4.3.1}$$
When the summation convention is in operation, repeated indices are dummy indices in the
sense that
$$A_i B^i = A_p B^p = A_s B^s$$
as each of these is equal to
$$A_1 B^1 + A_2 B^2 + \cdots + A_n B^n.$$
The expression for the components of $L\tilde L X$ in (4.3.1) is unpacked as
$$\sum_{p=1}^n \sum_{q=1}^n L^i_p \tilde L^p_q X^q$$
and the summation over $p$ corresponds to matrix multiplication of $L$ and $\tilde L$ while the summation
over $q$ corresponds to the multiplication of $\tilde L$ by the column vector with components $X^i$.
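The unpacking of (4.3.1) into explicit sums can be mirrored directly in code. This is a minimal sketch with made-up integer entries; the nested lists `L`, `Lt` and `X` are stand-ins for $L^i_p$, $\tilde L^p_q$ and $X^q$.

```python
n = 2
L = [[1, 2], [3, 4]]       # L[i][p] stands for L^i_p
Lt = [[0, 1], [1, 1]]      # Lt[p][q] stands for Ltilde^p_q
X = [5, 7]                 # X[q] stands for X^q

# Summation convention unpacked: sum over both dummy indices p and q.
double_sum = [sum(L[i][p] * Lt[p][q] * X[q]
                  for p in range(n) for q in range(n))
              for i in range(n)]

# The same thing in two steps: first Ltilde X, then L applied to the result.
LtX = [sum(Lt[p][q] * X[q] for q in range(n)) for p in range(n)]
two_step = [sum(L[i][p] * LtX[p] for p in range(n)) for i in range(n)]
```

Both `double_sum` and `two_step` hold the components of $L\tilde L X$; the free index $i$ is the only one that survives, while $p$ and $q$ are summed away.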
Here is an example where you have to be careful to change the dummy indices to get an
unambiguous expression. Suppose $\alpha$ and $\beta$ are covector fields and $X$ and $Y$ are vector fields.
Consider
$$P = \langle X, \alpha \rangle \langle Y, \beta \rangle \tag{4.3.2}$$
We can write
$$\langle X, \alpha \rangle = X^i \alpha_i, \qquad \langle Y, \beta \rangle = Y^i \beta_i.$$
Substitution of these into (4.3.2) gives
$$P = X^i \alpha_i Y^i \beta_i$$
However this is an ambiguous expression because the index $i$ has been overworked, appearing
4 times. So before putting them together we should change one of the dummy indices, writing
(say)
$$Y^i \beta_i = Y^j \beta_j.$$
Thus
$$P = X^i \alpha_i Y^j \beta_j$$
is an unambiguous way to write $P$ using the summation convention.
Example 4.3.1. Write the expression
$$X^i \alpha_j Y^j \beta_i$$
without indices, in terms of the pairing operation $\langle \cdot, \cdot \rangle$.
Definition 4.3.2. The Kronecker $\delta$ has components $\delta^j_k$, equal to 1 if $j = k$ and 0 otherwise.
This is the representation, in terms of indices, of the identity matrix.
Remark 4.3.3. If, later on in this chapter or the course, you find the expressions with
multiple repeated indices confusing, it can help to put the $\Sigma$ signs back in. The summation
convention does take some getting used to.


4.4. Differentiation along a curve


Let $\Gamma(\tau)$ be a curve in $\Omega$. That is, $\Gamma$ is a smooth map from an interval $I = (\tau_1, \tau_2)$, say, into
$\Omega$.
If $f \in C^\infty(\Omega)$, then we get a function of $\tau$
$$F(\tau) = f(\Gamma(\tau)). \tag{4.4.1}$$
This is ‘the function $f$ along the curve’. If $\Gamma$ is the world-line of an observer and $f$ is some
physical quantity (like pressure) then $F(\tau)$ would be the pressure measured by the observer at
different times along her worldline.
If the components of $\Gamma$ are $(x^1(\tau), \ldots, x^n(\tau))$ as before, then we compute
$$\frac{dF}{d\tau} = \dot x^j(\tau) \frac{\partial f}{\partial x^j} \tag{4.4.2}$$
so that $\Gamma$ defines the vector field
$$\frac{d\Gamma}{d\tau} = \dot x^j \frac{\partial}{\partial x^j} \tag{4.4.3}$$
along $\Gamma$. Here the LHS will be used as short-hand for the RHS!
In contrast to the vector fields we’ve considered before, this one is only defined along the
curve $\Gamma$ and not in an open set $\Omega$. Notice that the definition (4.4.2) is independent of any choice
of coordinates and gives a suitably invariant definition of tangent vector to the curve $\Gamma$.
In so far as $\Gamma$ is a mapping from a subset of $\mathbb{R}$ into a subset of $V$, its derivative $\dot\Gamma(\tau)$ is a
mapping from $I$ into $V$. Again, it is better to regard $\dot\Gamma(\tau)$ as being in the tangent space at $\Gamma(\tau)$,
$$\dot\Gamma(\tau) \in T_{\Gamma(\tau)}\Omega.$$
Then $\dot\Gamma(\tau)$ is called the tangent vector to $\Gamma$ at the point $\Gamma(\tau)$.
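Formula (4.4.2) is easy to test on a concrete curve. In this sketch the curve (the unit circle) and the function $f = x^2 + 3y$ are illustrative choices, not taken from the notes; the chain-rule expression is compared with a finite-difference derivative of $F(\tau) = f(\Gamma(\tau))$.

```python
import math

def f(x, y):
    return x * x + 3.0 * y

def grad_f(x, y):
    """Partial derivatives (df/dx, df/dy)."""
    return (2.0 * x, 3.0)

def curve(tau):
    """Gamma(tau): the unit circle traversed at unit speed."""
    return (math.cos(tau), math.sin(tau))

def velocity(tau):
    """Components xdot^j(tau) of dGamma/dtau."""
    return (-math.sin(tau), math.cos(tau))

tau = 0.4
# Right-hand side of (4.4.2): xdot^j * df/dx^j, evaluated on the curve.
chain_rule = sum(v * g for v, g in zip(velocity(tau), grad_f(*curve(tau))))

# Left-hand side: differentiate F(tau) = f(Gamma(tau)) numerically.
h = 1e-6
F = lambda t: f(*curve(t))
numeric = (F(tau + h) - F(tau - h)) / (2 * h)
```

The values `chain_rule` and `numeric` agree to within the finite-difference error.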

4.5. Tensor fields of rank 2


Higher order (or higher rank) tensors are, from the naive point of view, objects with more
indices, upstairs or downstairs, or both.
We have already seen examples of such objects with two indices at least in the context of
vector spaces. First of all, a bilinear form on $\mathbb{R}^n$ is an object with two lower indices. We have
seen that if we choose a basis of $\mathbb{R}^n$, such that its elements are identified with column vectors
with components $X^j$, then a bilinear form $B$ has components $B_{ij}$, such that
$$B(X, Y) = B_{ij} X^i Y^j \quad \text{(summation convention).}$$
4.5.1. Tensor fields of type (0, 2). A tensor field of type (0, 2) is an object of the form
$$B = B_{ij} \, dx^i dx^j \quad \text{(summation convention).} \tag{4.5.1}$$
(In the mathematical literature, you will often see the notation $dx^i \otimes dx^j$ on the RHS. I think
this can be fairly safely ignored in this course. $\otimes$ is pronounced ‘tensor’, by the way.)
The transformation law for the components $B_{ij}$ is deduced as for covector fields: from
(4.2.9),
$$B = B_{ij} \, dx^i dx^j = B_{ij} \left(\frac{\partial x^i}{\partial y^p} \, dy^p\right) \left(\frac{\partial x^j}{\partial y^q} \, dy^q\right) \tag{4.5.2}$$
so if $\tilde B_{pq}$ are the components of $B$ in the $y$ coordinates,
$$\tilde B_{pq} = B_{ij} \frac{\partial x^i}{\partial y^p} \frac{\partial x^j}{\partial y^q} \tag{4.5.3}$$
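As an illustration of (4.5.3), one can transform the bilinear form $dx^1 dx^1 + dx^2 dx^2$ (components $B_{ij} = \delta_{ij}$ in the standard coordinates of the plane) into polar coordinates. This example is chosen here for illustration and the function name `transform_02` is hypothetical; the expected outcome is the familiar $dr^2 + r^2 d\theta^2$.

```python
import math

def transform_02(B, dx_dy):
    """Apply (4.5.3): Btilde_pq = B_ij (dx^i/dy^p) (dx^j/dy^q)."""
    n = len(B)
    return [[sum(B[i][j] * dx_dy[i][p] * dx_dy[j][q]
                 for i in range(n) for j in range(n))
             for q in range(n)] for p in range(n)]

r, theta = 2.0, 0.9
c, s = math.cos(theta), math.sin(theta)
dx_dy = [[c, -r * s],     # dx^1/dr, dx^1/dtheta
         [s,  r * c]]     # dx^2/dr, dx^2/dtheta

euclid = [[1.0, 0.0], [0.0, 1.0]]     # B_ij in the x coordinates
polar = transform_02(euclid, dx_dy)   # B in (r, theta): diag(1, r^2)
```

At the chosen point the result is the diagonal matrix with entries $1$ and $r^2 = 4$, up to rounding.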
Note that if $X$ and $Y$ are vector fields and $B$ is a tensor field of type (0, 2) we can form
$$\omega = \omega_i \, dx^i, \qquad \omega_i = B_{ij} Y^j. \tag{4.5.4}$$
As a differential geometer, I might write this as $\omega = B(\cdot, Y)$ if I wanted to avoid using indices
and components.
Proposition 4.5.1. $\omega$ defined from $B$ and $Y$ as above is a covector field.


The proof is left as an exercise: you have to write down the transformation laws and check
that the components of $\omega$ transform correctly.
We can further form the scalar
$$B(X, Y) = B_{ij} X^i Y^j. \tag{4.5.5}$$
As the notation suggests, this is a well-defined scalar function; its value at a point does not
depend on the coordinates used to write out the components of $B$, $X$ and $Y$.
Remark 4.5.2. The formula (4.5.5) gives another way to think about tensors of type (0, 2).
Namely, you can reverse the logic and define such a $B$ to be a smoothly varying bilinear form
$B_p$ on the tangent space $T_p\Omega$, for each $p \in \Omega$. Smoothness can be defined by saying that the
coefficients $B_{ij}$ are smooth functions in $\Omega$ for any choice of coordinates $x$. If we do this for the
tangent space, and require $B(X, Y)$ to be invariant (i.e. independent of choice of coordinates),
then we say that $B$ is a tensor of type (0, 2).
Tensors of type (0, 2) are important because the metric tensor, which is the fundamental
object in GR, is an example.
Remark 4.5.3. It is very important not to switch the order of the $dx^i$ symbols in computations
of this kind. In other words, $dx^i dx^j \neq dx^j dx^i$ (for $i \neq j$). Indeed, evaluated on pairs of
coordinate basis vectors, the first one is the bilinear form $B$ such that $B(X, Y) = 0$ unless
$X = \partial_i$, $Y = \partial_j$, whereas the second represents the bilinear form
$C$ such that $C(X, Y) = 0$ unless $X = \partial_j$, $Y = \partial_i$.
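4.5.2. Tensor fields of type (1, 1). Suppose that for each $p$ in $\Omega$, we have a linear map
$A(p)$ from $T_p\Omega$ to $T_p\Omega$ which varies smoothly with $p$. Such a thing is called a smooth tensor
field of type (1, 1). In coordinates, $A$ has an expression of the form
$$A = A^k_j \, dx^j \frac{\partial}{\partial x^k} \tag{4.5.6}$$
where the $A^k_j$ are a collection of $n^2$ functions of $x$. $A$ can be pictured as an $n \times n$ matrix whose
entries are smooth functions of $x$.
We obtain the transformation law under a change of coordinates by substituting
$$\frac{\partial}{\partial x^k} = \frac{\partial y^q}{\partial x^k} \frac{\partial}{\partial y^q}, \qquad dx^j = \frac{\partial x^j}{\partial y^p} \, dy^p \tag{4.5.7}$$
into (4.5.6), getting
$$A = A^k_j \left(\frac{\partial x^j}{\partial y^p} \, dy^p\right) \left(\frac{\partial y^q}{\partial x^k} \frac{\partial}{\partial y^q}\right) = A^k_j \frac{\partial x^j}{\partial y^p} \frac{\partial y^q}{\partial x^k} \, dy^p \frac{\partial}{\partial y^q}. \tag{4.5.8}$$
Hence the transformation law is
$$\tilde A^q_p = A^k_j \frac{\partial x^j}{\partial y^p} \frac{\partial y^q}{\partial x^k} \tag{4.5.9}$$
Example 4.5.4. The identity matrix is an example of a (1, 1) tensor. Its components in
any coordinate system are the Kronecker $\delta$, $\delta^j_k$.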
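The claim in Example 4.5.4 can be checked in one line. As a sketch using only the transformation law (4.5.9) and the inverse relations (4.1.9), substituting $A^k_j = \delta^k_j$ gives

```latex
\tilde{\delta}^q_p
  = \delta^k_j \,\frac{\partial x^j}{\partial y^p}\,\frac{\partial y^q}{\partial x^k}
  = \frac{\partial x^k}{\partial y^p}\,\frac{\partial y^q}{\partial x^k}
  = \delta^q_p ,
```

so the components in the $y$ coordinates are again those of the Kronecker $\delta$.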
The transformation law (4.5.9) means that if $X$ is a vector field on $\Omega$ then $Y = AX$, with
components $A^j_k X^k$, is again a vector field on $\Omega$. This means that under coordinate transformations
we have $\tilde Y = \tilde A \tilde X$ where the relation between $\tilde A$ and $A$ is given by (4.5.9) and the
transformation law (4.2.8) is used for the components of the vector fields $X$ and $Y$.
4.5.3. Tensor fields of type (2, 0). We’ve seen tensors with two downstairs indices and
one up and one down. The zoo of two-index tensors is completed by the ones with two upstairs
indices.
We give the transformation law first:
Definition 4.5.5. A tensor field $H$ of type (2, 0) is an object whose components $H^{ij}$ after
a choice of coordinates transform according to the rule
$$\tilde H^{pq} = H^{ij} \frac{\partial y^p}{\partial x^i} \frac{\partial y^q}{\partial x^j}. \tag{4.5.10}$$


4.5.4. Algebraic operations on tensors. We mention various interrelations between
these types of tensors.
Example 4.5.6. Outer product. If $X$ and $Y$ are two vector fields, then there is a tensor
$X \otimes Y$ whose components are $X^i Y^j$. This is an example of a tensor field of type (2, 0). The
verification of this is straightforward.
Example 4.5.7. Similarly if $\alpha$ is a covector field then $\alpha \otimes X$ is a tensor whose components
are $\alpha_i X^j$ and this is a tensor field of type (1, 1).
We can also decrease the number of indices.
Example 4.5.8. If $\alpha$ is a covector field and $H$ is of type (2, 0), then the contraction of $H$ with $\alpha$
is a vector field (a tensor of type (1, 0)) with components
$$H^{ij} \alpha_j.$$
In fact there are two such contractions: this one, and the one with components
$$H^{ij} \alpha_i.$$
Example 4.5.9. If $H$ is a (2, 0) tensor and $B$ is a (0, 2) tensor, then there is a tensor of
type (1, 1) with components
$$H^{ik} B_{kj}.$$
In fact there are four generally different tensors of this kind:
$$H^{ik} B_{jk}, \qquad H^{ki} B_{kj}, \qquad H^{ki} B_{jk}$$
along with the one above.
The verification that the components of these objects transform correctly is straightforward,
as long as we remember that
$$\frac{\partial \tilde x^i}{\partial x^j} \frac{\partial x^j}{\partial \tilde x^k} = \delta^i_k.$$
Example 4.5.10. The trace of a (1, 1) tensor with components $A^i_j$ is the scalar $A^i_i$. We can
write this as the contraction
$$A^i_i = \delta^i_j A^j_i$$
with the Kronecker $\delta$.
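That the trace really is a scalar can be seen from a short computation. As a sketch, setting $q = p$ in the transformation law (4.5.9) and summing, then using the inverse relations (4.1.9), gives

```latex
\tilde{A}^p_p
  = A^k_j \,\frac{\partial x^j}{\partial y^p}\,\frac{\partial y^p}{\partial x^k}
  = A^k_j \,\delta^j_k
  = A^k_k ,
```

so the trace takes the same value in the $x$ and the $y$ coordinates.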

4.6. General tensors


Definition 4.6.1. A tensor field of type $(r, s)$ is an object whose components in any basis
are of the form
$$T^{j_1 \ldots j_r}_{i_1 \ldots i_s} \tag{4.6.1}$$
and which transform under change of coordinates according to the rule
$$\tilde T^{q_1 \ldots q_r}_{p_1 \ldots p_s} = T^{j_1 \ldots j_r}_{i_1 \ldots i_s} \frac{\partial x^{i_1}}{\partial y^{p_1}} \cdots \frac{\partial x^{i_s}}{\partial y^{p_s}} \frac{\partial y^{q_1}}{\partial x^{j_1}} \cdots \frac{\partial y^{q_r}}{\partial x^{j_r}} \tag{4.6.2}$$
It is possible to give the components the interpretation of a more geometric object, as we
have done for vectors, covectors, and tensors of type (0, 2) and (1, 1). We shall not do this now.
By way of motivation, note that when we differentiate a function we get a covector. In other
words, differentiation seems to increase the number of indices. Thus if we differentiate a function
twice, we might expect to get a tensor of type (0, 2) and if we differentiate a vector field twice
we might expect to get a tensor of type (1, 2).
This is true if we are working in a subset of a vector space and restrict ourselves only to linear
or affine transformations. It is not, however, true for more general types of transformations.
We can see this in the simplest case: starting from
$$\frac{\partial f}{\partial x^j} = \frac{\partial f}{\partial y^q} \frac{\partial y^q}{\partial x^j} \tag{4.6.3}$$


we differentiate again, getting
$$\frac{\partial^2 f}{\partial x^i \partial x^j} = \frac{\partial^2 f}{\partial y^p \partial y^q} \frac{\partial y^p}{\partial x^i} \frac{\partial y^q}{\partial x^j} + \frac{\partial f}{\partial y^q} \frac{\partial^2 y^q}{\partial x^i \partial x^j} \tag{4.6.4}$$
This is not the correct transformation law for a (0, 2) tensor (compare with (4.5.3)) unless the
Jacobian $\partial y^q/\partial x^j$ is constant. And the Jacobian is only constant if the original change of
coordinates is affine-linear.
Although this seems like a serious problem, we shall see in the next chapters a work-around:
there is a way to change the way we differentiate vectors (and covectors) by a term which also
transforms in such a way as to compensate for the ‘bad term’ on the RHS of (4.6.4).
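A concrete instance of the failure may be helpful. The following worked example is an illustration chosen here (it is not in the original notes): take plane polar coordinates and the function $f = x^1$. In the standard coordinates every second partial derivative vanishes, but in the coordinates $(r, \theta)$ they do not:

```latex
\begin{aligned}
f &= x^1 = r\cos\theta, \qquad
\frac{\partial^2 f}{\partial x^i\,\partial x^j} = 0 \quad\text{for all } i, j, \\
\frac{\partial^2 f}{\partial r^2} &= 0, \qquad
\frac{\partial^2 f}{\partial r\,\partial\theta} = -\sin\theta, \qquad
\frac{\partial^2 f}{\partial\theta^2} = -r\cos\theta .
\end{aligned}
```

If the second partials were the components of a (0, 2) tensor, then by (4.5.3) their vanishing in one coordinate system would force them to vanish in every coordinate system, which is not what happens here.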
4.6.1. Algebra of tensors.
• For each $(r, s)$, the set of all tensors (or tensor fields) of type $(r, s)$ forms a vector space. That is,
you can add any two tensors of the same type and multiply tensors by scalars.
• If $T$ is of type $(r, s)$ and $S$ is of type $(p, q)$, then the tensor product $T \otimes S$ is a tensor
of type $(r + p, s + q)$. The components are just
$$T^{j_1 \ldots j_r}_{i_1 \ldots i_s} S^{m_1 \ldots m_p}_{k_1 \ldots k_q}$$
• If $T$ is of type $(r, s)$, then picking a pair of indices, one up and one down, we have a
contraction of $T$, a tensor of type $(r - 1, s - 1)$. For example if we pick the first indices
upstairs and downstairs, we get
$$T^{i j_2 \ldots j_r}_{i i_2 \ldots i_s}.$$
Recall that by the summation convention, this is actually a sum over the index $i$.
Contraction of a different pair of indices will generally give a different tensor.
We note that the tensor product is distributive over addition.
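The two operations in the list above, tensor product and contraction, can be sketched at the level of components at a single point. The numbers below are arbitrary illustrative values, and the variable names are hypothetical.

```python
n = 3
X = [1, 2, 3]          # vector components X^i (type (1, 0))
alpha = [4, 0, -1]     # covector components alpha_j (type (0, 1))

# Tensor product: a type (1, 1) tensor with components T^i_j = X^i alpha_j.
T = [[X[i] * alpha[j] for j in range(n)] for i in range(n)]

# Contraction of the upper index with the lower one: T^i_i (sum over i).
trace = sum(T[i][i] for i in range(n))

# For this particular T the contraction is just the pairing <X, alpha>.
pairing = sum(X[i] * alpha[i] for i in range(n))
```

The tensor product raises the type from $(1, 0)$ and $(0, 1)$ to $(1, 1)$, and the contraction then lowers it to $(0, 0)$, i.e. a scalar equal to $\langle X, \alpha \rangle$.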

4.7. Manifolds
A manifold is, roughly speaking, a topological space $M$, with the additional structure necessary
to be able to speak of smooth functions from $M$ to $\mathbb{R}$. This additional structure is called
a smooth atlas and consists of systems of local coordinates satisfying certain compatibility
conditions. A function $M \to \mathbb{R}$ is then called smooth if it is smooth when written in terms of any
of these local coordinate systems.
We are not going to get into the details of what a topological space is: it is a set of points
with enough structure (open sets) to be able to define continuous functions.
As in the definition of curvilinear coordinates on an open subset of $\mathbb{R}^n$, suppose we have a
set of $n$ continuous functions $x(p) = (x^1(p), \ldots, x^n(p))$ from $U \subset M$ with image some open set
$\Omega$ of $\mathbb{R}^n$.
Definition 4.7.1. The functions $(x^1, \ldots, x^n)$ from $U$ to $\Omega$ form a local coordinate system
on $M$ if the map $U \to \Omega$ is one-one and onto, and if the inverse is continuous.
Thus every point $p$ of $U$ gets labelled by an ordered set of $n$ real numbers which we’re
calling the coordinates of $p$, and conversely if this set of labels is taken from $\Omega$, then it is the
label of one and only one point of $U$.
Definition 4.7.2. If $p_0 \in U$ is a given point, we say the coordinate system is centred at $p_0$
if $x^j(p_0) = 0$ for all $j$.
Definition 4.7.3. An atlas on $M$ is a collection of local coordinate systems $x_\nu : U_\nu \to \Omega_\nu$,
where the open sets $U_\nu$ cover $M$.
In this definition the subscript $\nu$ does not refer to the different components of the coordinate
system, but rather to the different local coordinate systems needed to cover $M$.
Remark 4.7.4. The individual local coordinate systems $x_\nu : U_\nu \to \Omega_\nu$ are often called
charts: thus an atlas is a set of a lot of charts (which made a lot more sense before everyone
was using GPS to find their way around).


An atlas, without further conditions, is insufficient to define a consistent notion of when a
function $M \to \mathbb{R}$ should be smooth. To explain what the additional condition is, let’s see how
far we can get with our atlas. If $F$ is a function from $M$ to $\mathbb{R}$, then for each $\nu$ we get a function
$F_\nu : \Omega_\nu \to \mathbb{R}$, defined by
$$F_\nu(x_\nu(p)) = F(p)$$
which makes sense for $p$ in $U_\nu$ and $x_\nu(p)$ in $\Omega_\nu$.
Now since $\Omega_\nu$ is an open set of $\mathbb{R}^n$, we know what it means for $F_\nu$ to be smooth: it is the
classical condition that all partials of $F_\nu$ should exist. So we want to say that $F : M \to \mathbb{R}$ is
smooth if and only if all the $F_\nu$ are smooth. For this to be consistent, we need it not to matter
which coordinate chart we use at $p$ for those points which belong to two or more coordinate
charts (which will definitely happen).
So suppose that $U_\nu \cap U_\mu \neq \emptyset$ and consider the functions $F_\nu$ and $F_\mu$. We have
$$F_\nu(x_\nu(p)) = F_\mu(x_\mu(p)) \quad \text{for } p \in U_\nu \cap U_\mu.$$
Let $\Omega'_\nu$ be the subset of $\Omega_\nu$ consisting of $x_\nu(p)$ with $p \in U_\nu \cap U_\mu$, and let $\Omega'_\mu$ be the corresponding
subset of $\Omega_\mu$. Then there is a 1:1 correspondence
$$\Omega'_\nu \longleftrightarrow \Omega'_\mu, \qquad x_\nu(p) \longleftrightarrow x_\mu(p). \tag{4.7.1}$$
For clarity, write this as $x = x(y)$, where $x \in \Omega'_\nu$ and $y \in \Omega'_\mu$. Then
$$F_\nu(x) = F_\mu(y(x))$$
and we want the LHS of this to be smooth whenever the RHS is smooth. This entails that the
functions $y = y(x)$ with $x \in \Omega'_\nu$ and $y \in \Omega'_\mu$ should be a smooth change of coordinates in the
sense of §4.1.2. Thus we build this into our definition of smooth atlas:
Definition 4.7.5. An atlas as in Definition 4.7.3 is called smooth or differentiable if all the
change-of-coordinates maps (4.7.1), for all possible pairs $\mu$ and $\nu$, are smooth where they are
defined¹. A topological space, equipped with a smooth atlas, is called a smooth manifold of
dimension $n$ if the image sets $\Omega_\nu$ are open subsets of $\mathbb{R}^n$.
Remark 4.7.6. This is a daunting definition. In practice, some basic theorems are proved
(using the definition) which give us our supply of manifolds.
Example 4.7.7. The circle $x^2 + y^2 = 1$ is a smooth manifold, of dimension 1.
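To make the circle example concrete, here is one possible smooth atlas, sketched numerically. The notes do not say which charts to use; the choice below (stereographic projection from the two points $(0, 1)$ and $(0, -1)$) is an assumption made for illustration, and the function names `chart_north`, `chart_south` and `transition` are hypothetical. On the overlap of the two charts the change of coordinates is $u \mapsto 1/u$, which is smooth wherever it is defined.

```python
import math

def chart_north(x, y):
    """Stereographic coordinate from (0, 1); defined on the circle minus (0, 1)."""
    return x / (1.0 - y)

def chart_south(x, y):
    """Stereographic coordinate from (0, -1); defined on the circle minus (0, -1)."""
    return x / (1.0 + y)

def transition(u):
    """Change of coordinates north -> south on the overlap (where u != 0)."""
    return 1.0 / u

# Sample points of the circle lying in the overlap (both poles avoided).
angles = [0.3, 1.0, 2.5, 4.0, 5.5]
errors = []
for t in angles:
    x, y = math.cos(t), math.sin(t)
    u = chart_north(x, y)
    v = chart_south(x, y)
    errors.append(abs(transition(u) - v))
```

Every entry of `errors` is zero up to rounding, reflecting the identity $uv = x^2/(1 - y^2) = 1$ on the circle.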
Example 4.7.8. $\mathbb{R}^n$ is a smooth manifold, of dimension $n$. So is any open subset of $\mathbb{R}^n$.
Example 4.7.9. The closed half-space $x \geq 0$ inside $\mathbb{R}^2$ is not a manifold. (Though it is a
manifold with boundary.)
Example 4.7.10. The closed quadrant $\{(x, y) : x \geq 0 \text{ and } y \geq 0\}$ is not a manifold.
(Though it is a manifold with corners.)
Example 4.7.11. The null cone $\{X \in M : \eta(X, X) = 0\}$ is not a manifold for the more
serious reason that you can’t introduce coordinates at the vertex $X = 0$.
Example 4.7.12. Generalizing the example of the circle, if $f(x, y, z)$ is a smooth function
of 3 variables then the level-set
$$\{(x, y, z) : f(x, y, z) = c\}$$
(any constant $c$) is a smooth manifold of dimension 2 if at least one of the partial derivatives
$$\frac{\partial f}{\partial x}, \quad \frac{\partial f}{\partial y}, \quad \frac{\partial f}{\partial z}$$
is non-zero at every point $(x, y, z)$ on the level set.
¹ A lot of notation is involved in making this precise, and I’m sparing you the details, which you can find in any
basic book on the subject.


The previous example generalises to functions of any number of variables, but the condition
on the non-vanishing of at least one of the partials at every point on the level-set $f = c$ continues
to be essential.
It is interesting to note that the null cone, defined by
$$t^2 - x^2 - y^2 - z^2 = 0$$
exactly fails to satisfy this condition on the partials at the origin: all partials vanish there as
well.
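The condition on the partials is simple enough to check by hand, but a few lines of code make the contrast explicit. This is an illustrative sketch only: the function names are hypothetical, and the unit sphere $x^2 + y^2 + z^2 = 1$ is used here as a comparison case that does satisfy the condition.

```python
def grad_cone(t, x, y, z):
    """Partial derivatives of f = t^2 - x^2 - y^2 - z^2."""
    return (2 * t, -2 * x, -2 * y, -2 * z)

def grad_sphere(x, y, z):
    """Partial derivatives of f = x^2 + y^2 + z^2."""
    return (2 * x, 2 * y, 2 * z)

# The origin lies on the null cone f = 0, and every partial vanishes there.
at_vertex = grad_cone(0, 0, 0, 0)

# Any other point of the cone, e.g. (t, x, y, z) = (1, 1, 0, 0), has a non-zero partial.
away_from_vertex = grad_cone(1, 1, 0, 0)

# On the unit sphere f = 1 the partials cannot all vanish,
# since that would force x = y = z = 0, which is not on the sphere.
on_sphere = grad_sphere(0, 0, 1)
```

So the level-set criterion fails for the null cone only at its vertex, which is exactly the point where coordinates cannot be introduced.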
4.7.1. The tangent space. Let $M$ be a smooth manifold. For each point $p$ in $M$, we
can use the local coordinates defined near $p$ to define the tangent space. We can either say
that it is the abstract vector space spanned by the partials corresponding to any choice of
local coordinates from the atlas or we can say that it is the space of directional derivatives at
$p$ (and then show that this space is an $n$-dimensional vector space). Either way $T_pM$ is an
$n$-dimensional vector space naturally associated with the point $p$.
We can now define a vector field $X$ on $M$ as a function which assigns to each $p$ in $M$ a
vector $X_p$ in $T_pM$, which is required to vary smoothly with $p$. As in the case of an open subset of
$\mathbb{R}^n$, ‘varying smoothly with $p$’ means: when expanded as a linear combination of the $\partial/\partial x^j$, the
coefficients are smooth (in the domain of the coordinate system).
4.7.2. The cotangent space. This is the dual to the tangent space. If $f$ is a smooth
function on $M$, then $df$ is a smooth covector field on $M$: at each point it is in the dual space
$T_p^*M$ and varies smoothly with $p$.
If $X$ is a vector field and $f$ is a function, then $Xf$ is the directional derivative of $f$ with
respect to $X$. It is again a smooth function on $M$. It can also be written $\langle X, df \rangle$, where $\langle \cdot, \cdot \rangle$
is the pairing between $TM$ and $T^*M$.
4.7.3. General tensors. Taking it further, we can extend the idea of tensor field of any
type $(r, s)$ to a manifold $M$, using the low-tech definition above: in any chart a tensor is given
by a collection of components, and these are required to transform according to (4.6.2) where
$x = x(y)$ is any of the change-of-coordinates maps arising from a smooth atlas on $M$.
4.7.4. Other smooth gadgets. With the aid of a smooth atlas we can define more than
just smooth functions on $M$. For example, a smooth curve $\Gamma$ on $M$ is defined as a continuous
map $\Gamma : I \to M$ ($I$ is an interval) with the property that the corresponding maps $\Gamma_\nu : I \to \Omega_\nu$
are all smooth, where
$$\Gamma_\nu(\tau) = x_\nu(\Gamma(\tau)) \quad \text{if } \Gamma(\tau) \in U_\nu.$$
Similarly (I’ll omit the details) if $M$ and $M'$ are two manifolds, and $F : M \to M'$ is a mapping,
we can define what it means for $F$ to be a smooth map between manifolds. The idea is that we
can look at $F$ using the charts on $M$ and $M'$ and define $F$ to be smooth if and only if all these
functions are smooth. The interested reader is referred to any standard introductory book on
differential geometry.
Remark 4.7.13. The formalism of general relativity works most naturally on the assumption
that space-time is a smooth 4-dimensional manifold. This is particularly important when trying
to understand black holes and the large-scale structure of the universe. For the purposes of this
course we shall mostly work with space-times that are subsets of $\mathbb{R}^4$: but we shall need to work
as if it were a manifold, in other words, without assigning any privileged role to the standard
flat coordinates on $\mathbb{R}^4$.


CHAPTER 5

Space-times and geodesics

The ‘happiest thought of Einstein’s life’ was a brilliant insight which led to the ‘geometriza-
tion’ of gravity. The physics you observe in a spaceship, with its engines switched off, far from
any source of gravity, is the same as the physics you observe if you are freely falling in a lift
under the influence of the earth’s gravity. This suggests that particles which are acted upon by
no force other than gravity should be regarded as freely falling, or essentially inertial. To incor-
porate the idea mathematically, we need an extension of the formalism of SR which is ‘generally
covariant’. Where the formalism of SR (4-vectors) was invariant under the Lorentz group (and
even the Poincaré group), these transformations are essentially linear. In GR we need a set-up
which allows for the ‘physics’ to be equivalent under arbitrary coordinate transformations.
In this chapter we shall introduce curved 4-dimensional space-times and study geodesics.
The hypothesis is that freely falling particles follow such geodesics. We shall also show that at
any event in space-time, ‘local inertial coordinates can be introduced’. These are coordinates
in which the coefficients are the same as the Minkowski metric up to second order.
In the next chapter we shall introduce curvature. This is the quantity (it is, unfortunately,
a 4-index tensor) which allows us to tell whether space-time is actually curved or not. One
remark about this here. The equivalence principle described above (that the physics
of the freely falling lift and the inertial spaceship should be the same) only applies on small
scales of space and time, or ‘locally’ as we say in the trade. There is a difference in behaviour
of freely falling particles in deep space as opposed to near the earth. According to Newton, the
gravitational field near the earth is not uniform: in fact it is given by the famous inverse-square
law. This means that if Alice and Bob are freely falling near the earth, and if Alice is nearer to
the earth’s surface than Bob, then her acceleration will be greater. In other words the relative
acceleration of Alice, as measured by Bob, will be non-zero. In deep space, far from any stars,
planets, or other gravitating objects, the relative acceleration of two observers will be zero. This
effect is subtle and is tied in with the notion of curvature. More on that story later.

5.1. Curved space-time


Definition 5.1.1. A curved space-time (or simply space-time) $(M, g)$ is a 4-dimensional
manifold $M$, with a given lorentzian metric $g$.

Remark 5.1.2. For those of you who haven’t yet mastered §4.7, you can think that $M$ is
just fancy notation for $\mathbb{R}^4$, or perhaps an open subset of it. But we are not allowed to use the
vector-space structure of $\mathbb{R}^4$ in developing the formalism: everything we do from now on has to
use the metric $g$ and must work in any choice of local coordinates, or be ‘generally covariant’,
to use somewhat old-fashioned terminology.

Recall that relative to a choice of (local) coordinates $x^a = (x^0, x^1, x^2, x^3)$, the metric has
the form
$$ ds^2 = g_{ab}\, dx^a\, dx^b. \tag{5.1.1} $$
Here the summation convention is in force and the components $g_{ab}$ of $g$ with respect to the
coordinates $x^a$ form a $4 \times 4$ symmetric matrix whose entries are smooth functions. To say
the metric is lorentzian is to say that at any point $x$, the matrix $(g_{ab}(x))$ is invertible and has
signature $(+, -, -, -)$.

Remark 5.1.3. A closely related notion is that of a riemannian manifold of dimension $n$.
This is a pair $(M, g)$, where $M$ is a manifold of dimension $n$ and $g$ is a riemannian metric, i.e.
$$ ds^2 = g_{ij}\, dx^i\, dx^j, $$
where the components $g_{ij}$ form a symmetric $n \times n$ positive-definite matrix. We shall use the
case $n = 2$ to illustrate the theory.
Remark 5.1.4. The ‘right’ way to think of a lorentzian or riemannian metric is as an
assignment $p \mapsto g(p)$, where $g(p)$ is a symmetric bilinear form on the tangent space $T_pM$. For
each $p$, $g(p)$ is positive-definite in the riemannian case and has signature $(+, -, \ldots, -)$ in the
lorentzian case. Of course the assignment is supposed to be smooth as $p$ varies in $M$. To make
this smoothness precise, we look at the components $g_{ab}(x)$ of the metric in some smooth local
coordinate system, and insist that these be smooth.
We shall generally use indices $a, b, c, \ldots$ for space-time indices running from 0 to 3; and
indices $i, j, k, \ldots$ for indices running from 1 to $n$ (where $n$ is usually 2 or 4 in examples).
Our metric is a symmetric $(0, 2)$-tensor. It has an inverse, with components denoted $g^{ab}$,
which form a $(2, 0)$-tensor. The statement that $g^{ab}$ is inverse to $g_{ab}$ is written
$$ g_{ab}\, g^{bc} = \delta_a^c. \tag{5.1.2} $$
Example 5.1.5. The standard euclidean metric on $\mathbb{R}^2$ is
$$ ds^2 = dx^2 + dy^2. $$
The same metric in polar coordinates can be computed as follows: from
$$ x = r\cos\theta, \quad y = r\sin\theta, $$
we get
$$ dx = \cos\theta\, dr - r\sin\theta\, d\theta, \quad dy = \sin\theta\, dr + r\cos\theta\, d\theta. $$
Squaring and adding,
$$ dx^2 + dy^2 = (\cos\theta\, dr - r\sin\theta\, d\theta)^2 + (\sin\theta\, dr + r\cos\theta\, d\theta)^2 = dr^2 + r^2\, d\theta^2. $$
Note that the coefficient of the cross-term $dr\, d\theta + d\theta\, dr$ is zero.
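The computation in Example 5.1.5 can also be checked numerically. The following is a minimal sketch, not part of the original notes: it assumes that the components in the new coordinates are obtained as $J^t J$, where $J$ is the Jacobian matrix of $(x, y)$ with respect to $(r, \theta)$; the function name `polar_metric` is purely illustrative.

```python
import math

def polar_metric(r, theta):
    """Components of dx^2 + dy^2 in the coordinates (r, theta), computed as J^T J."""
    # Jacobian of (x, y) = (r cos(theta), r sin(theta)) with respect to (r, theta).
    J = [[math.cos(theta), -r * math.sin(theta)],
         [math.sin(theta),  r * math.cos(theta)]]
    # g_ij = sum_k J[k][i] * J[k][j], since the euclidean metric is the identity matrix.
    return [[sum(J[k][i] * J[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

# Sample point r = 2, theta = 0.7: we expect g_rr = 1, g_r_theta = 0, g_theta_theta = r^2 = 4.
g = polar_metric(2.0, 0.7)
```

Up to rounding error, the off-diagonal entries vanish at every sample point, in agreement with the remark about the cross-term.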
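Example 5.1.6. If
$$ ds^2 = dr^2 + r^2\, d\theta^2, $$
then writing $x^1 = r$, $x^2 = \theta$,
$$ g_{11} = 1, \quad g_{12} = g_{21} = 0, \quad g_{22} = r^2. $$
Hence
$$ g^{11} = 1, \quad g^{12} = g^{21} = 0, \quad g^{22} = \frac{1}{r^2}. $$
This is because
$$ g_{ij} = \begin{pmatrix} 1 & 0 \\ 0 & r^2 \end{pmatrix} \quad \text{and} \quad g^{ij} = \begin{pmatrix} 1 & 0 \\ 0 & r^{-2} \end{pmatrix} $$
are inverse to each other.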
Example 5.1.7. The 2D metric
$$ ds^2 = du\, dv + dv\, du $$
has components
$$ g_{00} = g_{11} = 0, \quad g_{01} = g_{10} = 1 $$
if $u = x^0$ and $v = x^1$. In this case the components $g^{ij}$ of the inverse are the same:
$$ g^{00} = g^{11} = 0, \quad g^{01} = g^{10} = 1. $$
Remark 5.1.8. The change of variables $t = u + v$, $x = u - v$ brings the previous metric into
inertial form (diagonal, with one $+$ and one $-$ on the diagonal), up to an overall constant
factor: $ds^2 = \tfrac{1}{2}(dt^2 - dx^2)$.
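Here is a short sketch of the change of variables in Remark 5.1.8, using the matrix form $J^t g J$ of the transformation law for $(0, 2)$-tensors. The helper name `pullback` is an assumption made for illustration. Since $u = (t + x)/2$ and $v = (t - x)/2$, the Jacobian is constant.

```python
def pullback(g, J):
    """Return the matrix J^T g J, i.e. g~_ab = g_pq J^p_a J^q_b, for nested lists."""
    n = len(g)
    return [[sum(J[p][a] * g[p][q] * J[q][b] for p in range(n) for q in range(n))
             for b in range(n)] for a in range(n)]

# Components of ds^2 = du dv + dv du in the coordinates (u, v).
g_uv = [[0.0, 1.0], [1.0, 0.0]]
# J[p][a] = partial derivative of (u, v)^p with respect to (t, x)^a.
J = [[0.5, 0.5], [0.5, -0.5]]
g_tx = pullback(g_uv, J)
```

The result is diagonal with entries $1/2$ and $-1/2$, so the signature is $(+, -)$; rescaling $t$ and $x$ by a constant would remove the factor $1/2$.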


Example 5.1.9. (Spherical polars) 3-dimensional spherical polar coordinates are given by
$$ x = r\sin\theta\cos\varphi, \quad y = r\sin\theta\sin\varphi, \quad z = r\cos\theta. \tag{5.1.3} $$
Show that the euclidean metric
$$ ds^2 = dx^2 + dy^2 + dz^2 $$
takes the form
$$ ds^2 = dr^2 + r^2(d\theta^2 + \sin^2\theta\, d\varphi^2). \tag{5.1.4} $$
5.1.1. Lorentzian metric at a point.
Proposition 5.1.10. Suppose that
$$ ds^2 = g_{ab}\, dx^a\, dx^b $$
is a 4-dimensional lorentzian metric defined near the point $p$ with coordinates $x^a(p) = 0$ (a
coordinate system centred at $p$). Then there are coordinates $y^a$, also centred at $p$, such that
$$ \tilde g_{ab}(p) = \eta_{ab}, $$
where $\tilde g_{ab}$ denotes the components of the metric with respect to the $y^a$.
Thus, very near any given point $p$ of $M$, the geometry of $M$ is approximately the same as
that of Minkowski space.
Remark 5.1.11. We shall do better than this: in §5.5, we shall see that we can choose
coordinates $\tilde x^a$ so that
$$ \tilde g_{ab}(\tilde x) = \eta_{ab} + O(|\tilde x|^2) $$
for small $\tilde x$. Such a choice of coordinates will be called local inertial coordinates. They give the
best approximating Minkowski space at the point with coordinates $\tilde x = 0$.
Proof. It is sufficient to make a change of coordinates which is linear:
$$ x^a = J^a_b\, y^b. $$
Then by the transformation law for tensors of type $(0, 2)$,
$$ \tilde g_{ab} = g_{pq}\, J^p_a\, J^q_b. $$
In matrix form this is
$$ \tilde g = J^t g J. $$
By the basic theorem about diagonalization of symmetric bilinear forms, $J$ can be chosen to
make $\tilde g(0)$ diagonal, with diagonal entries $\pm 1$. The signs are determined by the signature of $g$.
If the latter is lorentzian, this yields the Minkowski metric $\eta = \mathrm{diag}(1, -1, -1, -1)$. □

5.1.2. Timelike/spacelike/null.
Definition 5.1.12. A tangent vector $X = X^a \partial_a$ is called timelike at $p$ in $M$ if
$$ g(X, X)(p) = g_{ab} X^a X^b |_p > 0, $$
null if
$$ g(X, X)(p) = g_{ab} X^a X^b |_p = 0, $$
and spacelike if
$$ g(X, X)(p) = g_{ab} X^a X^b |_p < 0. $$
The null vectors at a point $p$ form a cone (in $T_pM$); they are supposed to be the tangent
vectors to photon worldlines through $p$.
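As an illustration of Definition 5.1.12, the sketch below classifies a tangent vector from the sign of $g_{ab} X^a X^b$, using the Minkowski metric with the signature $(+, -, -, -)$ adopted in these notes. The function names and the numerical tolerance `tol` are assumptions added for the example.

```python
# Minkowski metric with signature (+, -, -, -).
ETA = [[1, 0, 0, 0], [0, -1, 0, 0], [0, 0, -1, 0], [0, 0, 0, -1]]

def quadratic_form(g, X):
    """Return g_ab X^a X^b for a metric given as a nested list."""
    n = len(X)
    return sum(g[a][b] * X[a] * X[b] for a in range(n) for b in range(n))

def causal_type(g, X, tol=1e-12):
    """Classify X as 'timelike', 'null' or 'spacelike' according to the sign of g(X, X)."""
    q = quadratic_form(g, X)
    if q > tol:
        return "timelike"
    if q < -tol:
        return "spacelike"
    return "null"

kinds = [causal_type(ETA, X) for X in ([1, 0, 0, 0], [1, 1, 0, 0], [0, 1, 0, 0])]
```

For a curved metric the same function applies at each point $p$, with `g` replaced by the matrix $(g_{ab}(p))$.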


5.2. Events and worldlines


As in the case of SR, the points of our 4-dimensional space M are called events, localized in
time and space. Particles, observers and photons are described by worldlines, i.e. parameterized
curves in M . The following definition captures the idea that massive particles cannot travel
faster than light.
Definition 5.2.1. Let $(M, g)$ be a 4-dimensional space-time. A parameterized curve $\gamma(t)$
is called timelike if its tangent vector is timelike for every value of the parameter $t$,
$$ g(\dot\gamma, \dot\gamma) > 0. \tag{5.2.1} $$
Similarly, a parameterized curve $\gamma(t)$ is called null if its tangent vector is null for every value of
the parameter $t$,
$$ g(\dot\gamma, \dot\gamma) = 0. \tag{5.2.2} $$
As in SR, where $(M, g)$ reduces to $(\mathbb{M}, \eta)$, massive particles follow timelike curves in $M$: this
corresponds to the speed of the particle being everywhere less than the speed of light. Similarly
photons follow null curves. Also as in Minkowski space, these curves are called worldlines.
Hypothesis 5.2.2. A timelike curve $\gamma$ is parameterized by proper time $\tau$ if
$$ g\!\left(\frac{d\gamma}{d\tau}, \frac{d\gamma}{d\tau}\right) = 1. \tag{5.2.3} $$
Then $\tau$ is the time that would be shown on a clock with worldline $\gamma(\tau)$. More precisely, if Alice’s
worldline is $\gamma(\tau)$ and $p = \gamma(\tau_1)$ and $q = \gamma(\tau_2)$ are two events on her worldline, then her clock
will show an elapsed time $\tau_2 - \tau_1$ between these two events if (5.2.3) holds.
Given any timelike curve $\tilde\gamma(u)$, there is always a reparameterization
$$ \gamma(\tau) = \tilde\gamma(u(\tau)) $$
of the curve by proper time, cf. Proposition 1.3.4.
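The reparameterization by proper time can be illustrated numerically. The sketch below assumes the standard relation $d\tau/du = \sqrt{g(\tilde\gamma', \tilde\gamma')}$ and integrates it by the midpoint rule, approximating the tangent vector by central differences. The function `proper_time` and its parameters are illustrative, and the sample curve is an inertial worldline with speed $0.6$ in Minkowski space.

```python
import math

ETA = [[1, 0, 0, 0], [0, -1, 0, 0], [0, 0, -1, 0], [0, 0, 0, -1]]

def proper_time(curve, u0, u1, metric=lambda x: ETA, steps=1000):
    """Approximate the integral of sqrt(g(gamma', gamma')) du by the midpoint rule.

    The curve must be timelike, otherwise math.sqrt raises a ValueError.
    """
    h = (u1 - u0) / steps
    eps = 1e-6
    total = 0.0
    for k in range(steps):
        u = u0 + (k + 0.5) * h
        # Central-difference approximation to the tangent vector at parameter u.
        ahead, behind = curve(u + eps), curve(u - eps)
        tangent = [(a - b) / (2 * eps) for a, b in zip(ahead, behind)]
        g = metric(curve(u))
        speed2 = sum(g[a][b] * tangent[a] * tangent[b]
                     for a in range(4) for b in range(4))
        total += math.sqrt(speed2) * h
    return total

# Worldline (t, x, y, z) = (u, 0.6 u, 0, 0) for 0 <= u <= 10.
tau = proper_time(lambda u: (u, 0.6 * u, 0.0, 0.0), 0.0, 10.0)
```

For this worldline the exact value is $10\sqrt{1 - 0.6^2} = 8$, the familiar time-dilatation factor from §2.6.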

5.3. Geodesics
In special relativity, inertial observers were taken to travel at constant speed along straight
lines in Minkowski space $\mathbb{M}$. One of the definitions of a straight line is a curve which minimises
the energy (cf. §1.4), amongst all those with fixed endpoints.
Using the space-time metric $g$ on $M$, we can do the same thing.
Definition 5.3.1. The energy of a curve $\gamma : [t_0, t_1] \to M$ is defined to be
$$ E[\gamma] = \frac{1}{2} \int_{t_0}^{t_1} g(\dot\gamma(t), \dot\gamma(t))\, dt. \tag{5.3.1} $$
A geodesic with end-points $p$ and $q$ is a curve which extremizes the energy among all curves
with $\gamma(t_0) = p$, $\gamma(t_1) = q$.
Hypothesis 5.3.2. In GR, freely falling particles (and free photons) follow geodesics, time-
like for particles and null for photons. Here ‘freely falling’ means ‘acted upon by no force except
gravity’.
Definition 5.3.3. Let $x^a = (x^0, x^1, x^2, x^3)$ be a given coordinate system, such that the
metric coefficients are $g_{ab}$,
$$ ds^2 = g_{ab}\, dx^a\, dx^b. $$
The Christoffel symbols of $g_{ab}$ are defined by the formula
$$ \Gamma^c_{ab} = \frac{1}{2}\, g^{cs} \left( \partial_a g_{bs} + \partial_b g_{as} - \partial_s g_{ab} \right). \tag{5.3.2} $$
Remark 5.3.4. Note the symmetry of $\Gamma$ in its two lower indices,
$$ \Gamma^c_{ab} = \Gamma^c_{ba}. \tag{5.3.3} $$
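Formula (5.3.2) can be evaluated numerically at a point. The following is a rough sketch under stated assumptions: the partial derivatives of the metric are approximated by central differences with step `h`, the inverse metric is computed by Gauss-Jordan elimination, and indices are numbered from 0 as is usual in Python. The names `invert` and `christoffel` are illustrative.

```python
def invert(m):
    """Invert a small square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(m)
    aug = [list(map(float, row)) + [float(i == j) for j in range(n)]
           for i, row in enumerate(m)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [a - f * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

def christoffel(metric, x, h=1e-5):
    """Return Gamma[c][a][b] at the point x, from (5.3.2) with central differences."""
    n = len(x)
    g_inv = invert(metric(x))
    dg = []  # dg[s][a][b] approximates the partial derivative d_s g_ab at x
    for s in range(n):
        xp, xm = list(x), list(x)
        xp[s] += h
        xm[s] -= h
        gp, gm = metric(xp), metric(xm)
        dg.append([[(gp[a][b] - gm[a][b]) / (2 * h) for b in range(n)]
                   for a in range(n)])
    return [[[0.5 * sum(g_inv[c][s] * (dg[a][b][s] + dg[b][a][s] - dg[s][a][b])
                        for s in range(n))
              for b in range(n)] for a in range(n)] for c in range(n)]

# The metric dr^2 + r^2 dtheta^2 of Example 5.1.6, at the point (r, theta) = (2, 0.3).
polar = lambda x: [[1.0, 0.0], [0.0, x[0] ** 2]]
Gamma = christoffel(polar, [2.0, 0.3])
```

With index 0 for $r$ and index 1 for $\theta$, one finds `Gamma[0][1][1]` close to $-r = -2$ and `Gamma[1][0][1]` close to $1/r = 0.5$, and the array is symmetric in its last two indices as in (5.3.3).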


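Theorem 5.3.5. Let $\gamma(t)$ be a curve in $M$ and suppose that in some coordinate system, it is
given by $t \mapsto x^c(t)$. Then the Euler–Lagrange equations for $E[\gamma]$ are equivalent to the equations
$$ \ddot x^c + \Gamma^c_{ab}\, \dot x^a \dot x^b = 0, \tag{5.3.4} $$
where $\Gamma^c_{ab}$ are as in the previous definition. If $x^c(t)$ is a geodesic, then $g_{cd}\, \dot x^c \dot x^d$ is constant.
Remark 5.3.6. The equations (5.3.4) are called the geodesic equations. They are
frequently a more convenient way of getting at the $\Gamma$s, as we shall see in examples.
Proof. This is a calculus of variations problem with Lagrangian
$L(x, \dot x) = \frac{1}{2} g(x)[\dot x, \dot x] = \frac{1}{2} g_{ab}(x)\, \dot x^a \dot x^b$. We have
$$ \frac{\partial L}{\partial \dot x^s} = g_{sb}\, \dot x^b, \qquad \frac{\partial L}{\partial x^s} = \frac{1}{2}\, \partial_s g_{ab}\, \dot x^a \dot x^b, \tag{5.3.5} $$
so
$$ \frac{d}{dt} \frac{\partial L}{\partial \dot x^s} = g_{sb}\, \ddot x^b + \frac{\partial g_{as}}{\partial x^b}\, \dot x^a \dot x^b. \tag{5.3.6} $$
Thus the Euler–Lagrange equations are
$$ g_{sb}\, \ddot x^b + \frac{\partial g_{as}}{\partial x^b}\, \dot x^a \dot x^b - \frac{1}{2}\, \partial_s g_{ab}\, \dot x^a \dot x^b = 0. \tag{5.3.7} $$
Now multiply through by $g^{cs}$ (and sum over $s$):
$$ \ddot x^c + g^{cs} \left( \frac{\partial g_{as}}{\partial x^b} - \frac{1}{2}\, \partial_s g_{ab} \right) \dot x^a \dot x^b = 0. \tag{5.3.8} $$
Now, since $\dot x^a \dot x^b$ is symmetric in $a$ and $b$,
$$ \frac{\partial g_{as}}{\partial x^b}\, \dot x^a \dot x^b = \frac{1}{2} \left( \frac{\partial g_{as}}{\partial x^b} + \frac{\partial g_{bs}}{\partial x^a} \right) \dot x^a \dot x^b, \tag{5.3.9} $$
and substituting this into (5.3.8), taking into account the definition of the $\Gamma$s, yields (5.3.4).
For the last part, the Lagrangian is homogeneous of degree 2 in the velocities, and so is
conserved along a solution curve (Proposition 1.4.4: the potential energy part is zero in the case
at hand). □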

Computing the Γs is the first step in computing the curvature of the metric, and computing
the geodesic equations is needed to understand particle (and photon) motion in GR. We therefore
give some worked examples. In each case, the Γs are read off the geodesic equations (5.3.4) rather
than from the formula (5.3.2).
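Example 5.3.7. Minkowski space. This is
$$ ds^2 = \eta_{ab}\, dx^a\, dx^b. $$
The Lagrangian is
$$ L = \frac{1}{2}\, \eta_{ab}\, \dot x^a \dot x^b. $$
Then
$$ \frac{\partial L}{\partial \dot x^a} = \eta_{ab}\, \dot x^b, \qquad \frac{\partial L}{\partial x^a} = 0. \tag{5.3.10} $$
Then the Euler–Lagrange equations are
$$ \frac{d}{d\tau} \frac{\partial L}{\partial \dot x^a} = \eta_{ab}\, \ddot x^b = 0. \tag{5.3.11} $$
Thus the geodesic equations are $\ddot x^b = 0$, from which we read off that all $\Gamma$s are zero. The geodesics
have the form
$$ x^a(\tau) = Y^a + U^a \tau. $$
Of course, we already knew this.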


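Example 5.3.8. (The 2D hyperbolic metric) This is
$$ ds^2 = \frac{dx^2 + dy^2}{x^2}. \tag{5.3.12} $$
The Lagrangian is $L = (\dot x^2 + \dot y^2)/(2x^2)$. The Euler–Lagrange equations are
$$ \frac{d}{d\tau} \left( \frac{\dot x}{x^2} \right) + \frac{1}{x^3} (\dot x^2 + \dot y^2) = 0; \qquad \frac{d}{d\tau} \left( \frac{\dot y}{x^2} \right) = 0. $$
Rearranging, these become
$$ \ddot x - \frac{\dot x^2}{x} + \frac{\dot y^2}{x} = 0, \qquad \ddot y - 2\, \frac{\dot x \dot y}{x} = 0. \tag{5.3.13} $$
With $x^1 = x$, $x^2 = y$, these are supposed to be identical to the geodesic equations
$$ \ddot x^1 + \Gamma^1_{ij}\, \dot x^i \dot x^j = 0, \qquad \ddot x^2 + \Gamma^2_{ij}\, \dot x^i \dot x^j = 0. \tag{5.3.14} $$
Hence
$$ \Gamma^1_{11} = -\frac{1}{x}, \quad \Gamma^1_{22} = \frac{1}{x}, \quad \Gamma^2_{12} = \Gamma^2_{21} = -\frac{1}{x}, \quad \text{others} = 0. $$
NB the factor of 2 in $\Gamma^2_{12}$ coming from the symmetry $\Gamma^i_{12} = \Gamma^i_{21}$: the coefficient $-2/x$ of
$\dot x \dot y$ in (5.3.13) is the sum $\Gamma^2_{12} + \Gamma^2_{21}$.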

Example 5.3.9. (The geodesics in 2D hyperbolic space) Rather than tackle the second-order
equations (5.3.13) directly, it is better to work with conserved quantities. The first is the
length of the velocity vector. If we assume $\tau$ is arc-length (i.e. $2L = 1$), then
$$ \dot x^2 + \dot y^2 = x^2. \tag{5.3.15} $$
The Lagrangian is also independent of $y$, and so $\partial L/\partial \dot y$ is constant along geodesics. So in
addition to (5.3.15), we also have
$$ \frac{\dot y}{x^2} = C \quad \text{(constant)}. \tag{5.3.16} $$
If $C = 0$, then $\dot y = 0$, so $y = y_0$ (constant). Substituting this into (5.3.15) we get $dx/x = \pm d\tau$.
Picking the $+$ sign, and integrating, we get $x = x_0 e^{\tau}$. Thus one family of geodesics consists of
half-lines parallel to the $x$-axis, given by
$$ (x(\tau), y(\tau)) = (x_0 e^{\tau}, y_0). $$
Note the non-standard parameterization of these straight lines. In particular, the ‘boundary
point’ $(0, y_0)$ is infinitely far away in $\tau$: we require $\tau \to -\infty$ for $(x, y) \to (0, y_0)$. This is not
outrageous because the metric (5.3.12) blows up at the points with $x = 0$.
Now we have to deal with the case $C \neq 0$. For this, divide (5.3.15) through by $\dot y^2$,
$$ \left( \frac{dx}{dy} \right)^2 + 1 = \frac{x^2}{\dot y^2} = \frac{1}{C^2 x^2}, \tag{5.3.17} $$
where we’ve used (5.3.16) and
$$ \frac{\dot x}{\dot y} = \frac{dx}{dy}. \tag{5.3.18} $$
Rearranging (5.3.17), we obtain
$$ \frac{C x\, dx}{\sqrt{1 - C^2 x^2}} = \pm dy $$
and this can be integrated to get
$$ \pm \sqrt{1 - C^2 x^2} = C (y - y_0) $$
for some constant of integration $y_0$. This can be rearranged to give
$$ x^2 + (y - y_0)^2 = C^{-2}, \quad x > 0, \tag{5.3.19} $$
i.e. a semicircle with diameter along the $y$-axis. It is pleasing that as $C \to 0$, the radius $C^{-1}$
tends to infinity and these semicircles approach the straight half-lines that we saw previously.


You can verify that
$$ (x(\tau), y(\tau)) = (C^{-1} \operatorname{sech} \tau,\; y_0 + C^{-1} \tanh \tau) \tag{5.3.20} $$
is a parameterization by arc-length. (To obtain this, you eliminate $x$ from (5.3.16) using (5.3.19),
to get
$$ \frac{dy}{d\tau} = C \left( C^{-2} - (y - y_0)^2 \right) $$
and integrate this up to get $y$ as a function of $\tau$.)
Note that in this example it is relatively easy to obtain the geodesics as an implicit relation
between $x$ and $y$ (5.3.19), whereas finding $x$ and $y$ as functions of $\tau$ is rather more involved.
This is quite typical of the simple examples we shall see in this course.
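As a cross-check on Examples 5.3.8 and 5.3.9, one can integrate the geodesic equations (5.3.13) numerically and compare with the closed form (5.3.20). The sketch below uses the classical fourth-order Runge–Kutta method; the choice of integrator, the step count and the function names are assumptions made for illustration. The initial data correspond to $C = 1$, $y_0 = 0$ at $\tau = 0$.

```python
import math

def rhs(state):
    """First-order form of (5.3.13): state = (x, y, xdot, ydot)."""
    x, y, vx, vy = state
    return (vx, vy, (vx * vx - vy * vy) / x, 2.0 * vx * vy / x)

def rk4_step(state, h):
    """One classical Runge-Kutta step of size h."""
    def shifted(k, f):
        return tuple(s + f * dk for s, dk in zip(state, k))
    k1 = rhs(state)
    k2 = rhs(shifted(k1, h / 2))
    k3 = rhs(shifted(k2, h / 2))
    k4 = rhs(shifted(k3, h))
    return tuple(s + h / 6 * (a + 2 * b + 2 * c + d)
                 for s, a, b, c, d in zip(state, k1, k2, k3, k4))

def integrate(state, tau_end, steps=1000):
    """Integrate from tau = 0 to tau = tau_end."""
    h = tau_end / steps
    for _ in range(steps):
        state = rk4_step(state, h)
    return state

# At tau = 0, (5.3.20) with C = 1, y0 = 0 gives x = 1, y = 0, xdot = 0, ydot = 1.
x, y, vx, vy = integrate((1.0, 0.0, 0.0, 1.0), 1.0)
```

At $\tau = 1$ the numerical solution should agree with $(\operatorname{sech} 1, \tanh 1)$, lie on the unit semicircle (5.3.19), and satisfy the two conservation laws (5.3.15) and (5.3.16).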
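Example 5.3.10. Minkowski space in polar coordinates. In spherical polars, the Minkowski
metric takes the form
$$ ds^2 = dt^2 - dr^2 - r^2 (d\theta^2 + \sin^2\theta\, d\varphi^2). \tag{5.3.21} $$
The Lagrangian is
$$ L = \frac{1}{2} \left( \dot t^2 - \dot r^2 - r^2 \dot\theta^2 - r^2 \sin^2\theta\, \dot\varphi^2 \right). \tag{5.3.22} $$
Then
$$ \frac{\partial L}{\partial \dot t} = \dot t, \quad \frac{\partial L}{\partial \dot r} = -\dot r, \quad \frac{\partial L}{\partial \dot\theta} = -r^2 \dot\theta, \quad \frac{\partial L}{\partial \dot\varphi} = -r^2 \sin^2\theta\, \dot\varphi, \tag{5.3.23} $$
$$ \frac{\partial L}{\partial t} = 0, \quad \frac{\partial L}{\partial r} = -r (\dot\theta^2 + \sin^2\theta\, \dot\varphi^2), \quad \frac{\partial L}{\partial \theta} = -r^2 \sin\theta \cos\theta\, \dot\varphi^2, \quad \frac{\partial L}{\partial \varphi} = 0. \tag{5.3.24} $$
Thus the geodesic equations are
$$ \ddot t = 0, \tag{5.3.25} $$
$$ \ddot r - r (\dot\theta^2 + \sin^2\theta\, \dot\varphi^2) = 0, \tag{5.3.26} $$
$$ \ddot\theta + \frac{2}{r}\, \dot r \dot\theta - \sin\theta \cos\theta\, \dot\varphi^2 = 0, \tag{5.3.27} $$
$$ \ddot\varphi + \frac{2}{r}\, \dot r \dot\varphi + 2 \cot\theta\, \dot\theta \dot\varphi = 0. \tag{5.3.28} $$
With $x^0 = t$, $x^1 = r$, $x^2 = \theta$, $x^3 = \varphi$, we read off that
$$ \Gamma^1_{22} = -r, \quad \Gamma^1_{33} = -r \sin^2\theta, \tag{5.3.29} $$
$$ \Gamma^2_{12} = \Gamma^2_{21} = \frac{1}{r}, \quad \Gamma^2_{33} = -\sin\theta \cos\theta, \tag{5.3.30} $$
$$ \Gamma^3_{13} = \Gamma^3_{31} = \frac{1}{r}, \quad \Gamma^3_{23} = \Gamma^3_{32} = \cot\theta, \tag{5.3.31} $$
while all others are zero. Note again that there are factors of 2 between the $\Gamma$s and the coefficients
in the geodesic equations for the $\Gamma^a_{bc}$ with $b \neq c$.
We will not go into finding the geodesics here as the moves you have to make were already
described in Example 1.4.5, which you are urged to review now. Of course this is
a bit different because of having the variable $t$ as an additional coordinate in the problem. But
since $\dot t$ is a constant, $\lambda$, say, the constancy of $L$ means that
$$ \dot r^2 + r^2 (\dot\theta^2 + \sin^2\theta\, \dot\varphi^2) = \lambda^2 - 2L $$
is a constant. Also $J = r^2 \sin^2\theta\, \dot\varphi$ is a constant and we can restrict to equatorial curves where
$\theta = \pi/2$ identically. Then we find that
$$ \dot r^2 + r^2 \dot\varphi^2 = \lambda^2 - 2L \quad \text{and} \quad J = r^2 \dot\varphi \tag{5.3.32} $$
are both constants just as in (1.4.18); the solution follows as there.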
Remark 5.3.11. One important remark from this is that the Christoffels can be non-zero
even if the metric is the Minkowski metric. Written in funny coordinates (here spherical polars)
many of the Γs are non-zero, although the intrinsic geometry of the metric is unchanged.
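To see Remark 5.3.11 at work, one can take a straight line in cartesian coordinates, rewrite it in polar coordinates, and check that it satisfies the equatorial ($\theta = \pi/2$) versions of (5.3.26) and (5.3.28), namely $\ddot r - r \dot\varphi^2 = 0$ and $\ddot\varphi + (2/r)\, \dot r \dot\varphi = 0$. The sketch below is only an illustration: the particular line, the finite-difference step `h` and the function names are arbitrary choices.

```python
import math

def polar_line(tau):
    """The straight line (x, y) = (1 + 0.3 tau, -0.5 + 0.8 tau) in polar form (r, phi)."""
    x, y = 1.0 + 0.3 * tau, -0.5 + 0.8 * tau
    return math.hypot(x, y), math.atan2(y, x)

def residuals(tau, h=1e-4):
    """Left-hand sides of the equatorial geodesic equations, by central differences."""
    (r_m, p_m), (r_0, p_0), (r_p, p_p) = (polar_line(tau - h), polar_line(tau),
                                          polar_line(tau + h))
    r_dot, p_dot = (r_p - r_m) / (2 * h), (p_p - p_m) / (2 * h)
    r_ddot = (r_p - 2 * r_0 + r_m) / h ** 2
    p_ddot = (p_p - 2 * p_0 + p_m) / h ** 2
    return (r_ddot - r_0 * p_dot ** 2,
            p_ddot + 2.0 / r_0 * r_dot * p_dot)

res_r, res_phi = residuals(1.0)
```

Both residuals vanish up to discretization error, even though $\ddot r$ and $\ddot\varphi$ are individually non-zero: the non-zero $\Gamma$s compensate exactly for the curvilinear coordinates.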


5.4. A first look at the covariant derivative


If $X$ and $Y$ are two vector fields on $M$, then we denote by $\nabla_X Y$ the vector field with
components
$$ (\nabla_X Y)^c = X^a \partial_a Y^c + \Gamma^c_{ab}\, X^a Y^b. \tag{5.4.1} $$
This is called the covariant derivative of $Y$ with respect to $X$. In the previous sentence I slipped
in the idea that $\nabla_X Y$ is a vector field. This is not obvious: neither of the two terms on the RHS
in (5.4.1) transforms as a vector field under a change of coordinates. However, the combination
does transform as a vector field:
Theorem 5.4.1. For any vector fields $X$ and $Y$, $\nabla_X Y$ is again a vector field on $M$.
A proof will be given in the next chapter. You can also look in §5.1 of Woodhouse, GR, for
a more computational proof.
5.4.1. Covariant derivative of a vector along a curve. Let $\gamma(\tau)$ be a curve in $M$ with
tangent vector
$$ \dot\gamma = X. \tag{5.4.2} $$
If $Y$ is another vector field defined along (or near) $\gamma$, then
$$ \nabla_X Y = \nabla_{\dot\gamma}(Y) \tag{5.4.3} $$
is the covariant derivative of $Y$ along $\gamma$.
On a curved space-time this is the best available replacement for the derivative of a vector
along a curve in Minkowski space. In particular, if we take $Y = X = \dot\gamma$ we define the acceleration
of the curve to be the vector field along $\gamma$
$$ A = \nabla_{\dot\gamma} \dot\gamma. \tag{5.4.4} $$
Then the LHS of the geodesic equation is
$$ \ddot x^c + \Gamma^c_{ab}\, \dot x^a \dot x^b = \dot X^c + \Gamma^c_{ab}\, X^a X^b = X^a \partial_a X^c + \Gamma^c_{ab}\, X^a X^b = (\nabla_X X)^c. \tag{5.4.5} $$
Hence
Proposition 5.4.2. The curve $\tau \mapsto \gamma(\tau)$ is a geodesic if and only if its acceleration is zero.
This is pleasing because it is a natural generalization of what happens in Minkowski space.
In that case, the geodesics are of the form
$$ \tau \mapsto X + U \tau $$
where $X$ and $U$ are constant vectors. Then the tangent vector is $U$ and the acceleration is zero.
And conversely any curve with zero acceleration in Minkowski space is of the above form and
extremizes the energy.
Definition 5.4.3. If $\gamma(\tau)$ is a curve in $M$ with tangent vector $\dot\gamma = X$ then a vector field $Y$
along $\gamma$ is said to be parallel, parallel-transported, or parallel-propagated along $\gamma$ if
$$ \nabla_X Y = 0 \quad \text{along } \gamma. \tag{5.4.6} $$
Explicitly, in local coordinates, the parallel propagation equation (5.4.6) has the form
$$ \dot Y^c + \Gamma^c_{ab}\, \dot x^a Y^b = 0. \tag{5.4.7} $$
Note that to be completely explicit, the $\Gamma$s here are evaluated at the point $x^a(\tau)$. With the
curve fixed, $x^a(\tau)$ and $\dot x^a(\tau)$ are known, and so (5.4.7) is a first-order system of equations for the
unknown components $Y^b(\tau)$ as functions of $\tau$. Thus given any point $\tau_0$ and a given tangent
vector $Z$ at $\gamma(\tau_0)$, there is a unique solution of (5.4.7) with initial condition $Y(\tau_0) = Z$. In this
situation, we say that $Y(\tau_1)$ is obtained from $Z$ by parallel transport along $\gamma$.
In Minkowski space we know when two vectors at different points are parallel (or point in
the same direction). In a general curved space $M$ there is no such global notion of parallelism,
and the above is the best one can do.
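The parallel propagation equation (5.4.7) is an ordinary differential equation and can be integrated numerically. The sketch below is an illustration under stated assumptions: it works in the flat plane with metric $dr^2 + r^2 d\theta^2$, whose non-zero Christoffel symbols are $\Gamma^r_{\theta\theta} = -r$ and $\Gamma^\theta_{r\theta} = \Gamma^\theta_{\theta r} = 1/r$ (compare the $r$, $\theta$ entries in (5.3.29) and (5.3.30)), and transports a vector a quarter of the way round the unit circle using a Runge–Kutta scheme. All function names are hypothetical.

```python
import math

def gamma(r):
    """Christoffel symbols G[c][a][b] of dr^2 + r^2 dtheta^2; index 0 = r, 1 = theta."""
    G = [[[0.0, 0.0], [0.0, 0.0]], [[0.0, 0.0], [0.0, 0.0]]]
    G[0][1][1] = -r
    G[1][0][1] = G[1][1][0] = 1.0 / r
    return G

def y_dot(tau, Y):
    """Right-hand side of (5.4.7) along the circle r = 1, theta = tau."""
    r, velocity = 1.0, (0.0, 1.0)  # (rdot, thetadot) along the curve
    G = gamma(r)
    return [-sum(G[c][a][b] * velocity[a] * Y[b] for a in range(2) for b in range(2))
            for c in range(2)]

def transport(Y, tau_end, steps=1000):
    """Integrate (5.4.7) from tau = 0 to tau_end by the classical Runge-Kutta method."""
    h = tau_end / steps
    tau = 0.0
    for _ in range(steps):
        k1 = y_dot(tau, Y)
        k2 = y_dot(tau + h / 2, [y + h / 2 * k for y, k in zip(Y, k1)])
        k3 = y_dot(tau + h / 2, [y + h / 2 * k for y, k in zip(Y, k2)])
        k4 = y_dot(tau + h, [y + h * k for y, k in zip(Y, k3)])
        Y = [y + h / 6 * (a + 2 * b + 2 * c + d)
             for y, a, b, c, d in zip(Y, k1, k2, k3, k4)]
        tau += h
    return Y

# Start with the radial unit vector at (r, theta) = (1, 0), i.e. the cartesian vector (1, 0).
Y_r, Y_theta = transport([1.0, 0.0], math.pi / 2)
# Convert the transported vector back to cartesian components at (r, theta) = (1, pi/2).
r, theta = 1.0, math.pi / 2
Y_x = math.cos(theta) * Y_r - r * math.sin(theta) * Y_theta
Y_y = math.sin(theta) * Y_r + r * math.cos(theta) * Y_theta
```

The polar components change along the curve (they end up near $(0, -1)$), but the cartesian components are still $(1, 0)$: in flat space parallel transport reproduces the usual notion of keeping a vector pointing in the same direction.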


5.4.2. Photons. A photon worldline in GR is a null geodesic $\tau \mapsto \gamma(\tau)$,
$$ g(\dot\gamma, \dot\gamma) = 0. $$
The frequency 4-vector of a photon, $K$, is assumed to be a constant multiple of its velocity
vector $\dot\gamma$. If an observer has velocity 4-vector $U$ (i.e. $g(U, U) = 1$), and $p$ is an event on
both the photon worldline and the observer’s worldline, then the frequency measured by the observer
is $g(p)[U(p), K(p)]$. Compare with the analogous situation in SR, §3.3.
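A small numerical illustration of the frequency formula $g(U, K)$, in the special case of Minkowski space, is sketched below. The frequency value, the observer’s speed and the function names are arbitrary assumptions; the photon moves in the positive $x$-direction and the second observer moves in the same direction with speed $0.6$.

```python
import math

ETA = [[1, 0, 0, 0], [0, -1, 0, 0], [0, 0, -1, 0], [0, 0, 0, -1]]

def inner(X, Y, g=ETA):
    """Return g(X, Y) = g_ab X^a Y^b."""
    return sum(g[a][b] * X[a] * Y[b] for a in range(4) for b in range(4))

def observer_velocity(v):
    """Velocity 4-vector of an observer moving with speed v along the x-axis."""
    gamma = 1.0 / math.sqrt(1.0 - v * v)
    return [gamma, gamma * v, 0.0, 0.0]

omega = 2.0
K = [omega, omega, 0.0, 0.0]          # null vector: g(K, K) = 0
at_rest = inner(observer_velocity(0.0), K)
receding = inner(observer_velocity(0.6), K)
```

The observer at rest measures `omega`, while the observer moving along with the photon measures the smaller value $\omega \sqrt{(1 - v)/(1 + v)}$, the usual Doppler red-shift of SR.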

5.5. Local inertial coordinates


Let $g$ be a lorentzian metric with components $g_{ab}$ in local coordinates defined near $x^a = 0$.
We have seen that we can suppose that $g_{ab}(0) = \eta_{ab}$, the standard Minkowski metric, by a linear
change of coordinates. Then
$$ g_{ab}(x) = \eta_{ab} + H_{abc}\, x^c + O(|x|^2) \tag{5.5.1} $$
for some set of numbers $H_{abc}$ with
$$ H_{abc} = H_{bac}. \tag{5.5.2} $$
Proposition 5.5.1. The Christoffel symbols of (5.5.1) at $x = 0$ are
$$ \Gamma^c_{ab}(0) = \frac{1}{2}\, \eta^{cs} \left( H_{sab} + H_{sba} - H_{abs} \right). \tag{5.5.3} $$
Proof. This follows by substituting
$$ \partial_s g_{ab}(0) = H_{abs} \tag{5.5.4} $$
into the formula for the Christoffels (5.3.2), and using the symmetry (5.5.2). □
In particular, geodesics do not have the simple form $\ddot x^a = 0$, even at $x = 0$, if $H \neq 0$. So
such a coordinate system is not a particularly good approximation to an inertial coordinate
system in Minkowski space.
However, a further change of coordinates can always be made to kill the coefficients $H_{abc}$ at
$x = 0$.
Theorem 5.5.2. Let $(M, g)$ be a space-time and let $p$ be any point of $M$. There exists a
choice of coordinates $x^a$ such that $x^a(p) = 0$ and
$$ g_{ab} = \eta_{ab} + O(|x|^2) \quad \text{for small } x. \tag{5.5.5} $$
Such coordinates are called local inertial coordinates at $p$. Note that no statement is made
about the form of the metric at other points near, but not equal to, $p$.
The terminology is motivated as follows: from the Proposition, if the metric has the form (5.5.5)
then $\Gamma^c_{ab}(0) = 0$ with respect to these coordinates. Thus the geodesic equations
$$ \ddot x^c + \Gamma^c_{ab}\, \dot x^a \dot x^b = 0 $$
reduce to $\ddot x^c = 0$ at $x^a = 0$, so at this point, at least, the equation is the same as for inertial worldlines
in Minkowski space. In particular freely falling particles and free photons will have worldlines
that appear straight in a very small neighbourhood of $x = 0$ in these special coordinates. This
is the precise sense in which GR reduces to SR over small length- and time-scales, but it only
works well relative to one of these inertial coordinate systems.

5.5.1. Proof of Theorem 5.5.2. We may suppose that


$$ g_{ab} = \eta_{ab} + H_{abc}\, x^c $$
as the next term in the Taylor expansion is already of quadratic order. Consider the coordinate
transformation
$$ x^c = y^c - \frac{1}{2}\, G^c_{ab}\, y^a y^b \tag{5.5.6} $$


where $G^c_{ab}$ is an array of numbers to be determined, symmetric in the indices $ab$. The
Jacobian of the transformation is
$$ \frac{\partial x^c}{\partial y^p} = J^c_p = \delta^c_p - j^c_p, \quad \text{where } j^c_p = G^c_{ap}\, y^a. \tag{5.5.7} $$
This is invertible when $y = 0$ so by the inverse function theorem, (5.5.6) has a smooth inverse
as a mapping between a neighbourhood of $y = 0$ and a neighbourhood of $x = 0$.
Now we calculate, keeping only the leading terms (because we don’t really care what’s
happening at $O(|y|^2)$ and beyond):
$$ g_{pq}\, dx^p dx^q = g_{pq}(x) (dy^p - G^p_{ac}\, y^a dy^c)(dy^q - G^q_{bd}\, y^b dy^d) $$
$$ = g_{pq}\, dy^p dy^q - G_{acq}\, y^a dy^c dy^q - G_{bdp}\, y^b dy^p dy^d + O(|y|^2), \tag{5.5.8} $$
where we have set
$$ G_{acp} = g_{pq}\, G^q_{ac}, \quad \text{so } G_{acp} = G_{cap}. \tag{5.5.9} $$
Hence
$$ \tilde g_{pq}(y) = g_{pq}(x(y)) - G_{apq}\, y^a - G_{aqp}\, y^a + O(|y|^2). \tag{5.5.10} $$
Now on the RHS we still have $g(x)$ and we need to write this in terms of $y$. The inverse to our
transformation has the form
$$ y^a = x^a + \frac{1}{2}\, G^a_{bc}\, x^b x^c + O(|x|^3) \tag{5.5.11} $$
as you can see by inserting (5.5.6) on the RHS of this equation. It follows that
$$ g_{pq}(x(y)) = g_{pq}(y) + O(|y|^2) = \eta_{pq} + H_{pqr}\, y^r + O(|y|^2), $$
and so
$$ \tilde g_{pq}(y) = \eta_{pq} + H_{pqa}\, y^a - G_{apq}\, y^a - G_{aqp}\, y^a + O(|y|^2). \tag{5.5.12} $$
Thus we will get rid of the first order terms in $y$ if we can choose the numbers $G$ so that
$$ G_{apq} + G_{aqp} = H_{pqa}. \tag{5.5.13} $$
(Here we have used the symmetry of the indices of $G$ to neaten up the equation.) This is an
equation for the array of numbers $G$ in terms of the array of numbers $H$. In Lemma 5.5.4 below
it is shown that this can always be solved, the solution being
$$ G_{abc} = \frac{1}{2} (H_{acb} + H_{bca} - H_{abc}). \tag{5.5.14} $$
Note that this formula depends in an important way on the symmetry (5.5.9) of $G$. Inverting
(5.5.9) we define
$$ G^d_{ab} = \eta^{cd}\, G_{abc} = \frac{1}{2}\, \eta^{cd} (H_{acb} + H_{bca} - H_{abc}). $$
Thus $G$ is uniquely determined by $H$, and by the above calculations, the change of
coordinates (5.5.6) gives metric components $\tilde g_{ab}$ which satisfy the conditions of the Theorem. The
proof is complete.
Remark 5.5.3. The array of numbers $G^c_{ab}$ is nothing but $\Gamma^c_{ab}(0)$, the Christoffel symbols of
the metric components $g_{ab}$, evaluated at $x = 0$ (cf. Proposition 5.5.1).
It remains to prove:
Lemma 5.5.4. The equation (5.5.13) is solved by (5.5.14), which by the symmetry (5.5.2) can
also be written
$$ G_{abc} = \frac{1}{2} (H_{cab} + H_{cba} - H_{abc}). $$
Proof. One proof of this is simply to substitute (5.5.14) into (5.5.13) and see that it works.
Namely, $G$ is symmetric in its first two indices: simply switch them, and use the symmetry of
$H$ in its first two indices. And then
$$ G_{abc} + G_{acb} = \frac{1}{2} (H_{cab} + H_{cba} - H_{abc} + H_{bac} + H_{bca} - H_{acb}). \tag{5.5.15} $$


Now use the symmetry of $H$ in its first two indices to arrange the indices as far as possible in
alphabetical order:
$$ G_{abc} + G_{acb} = \frac{1}{2} (H_{acb} + H_{bca} - H_{abc} + H_{abc} + H_{bca} - H_{acb}) = H_{bca}, \tag{5.5.16} $$
as required.
There is also a derivation of this formula in the Problem set, Problem 4.9. □
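The algebra of Lemma 5.5.4 is easy to get wrong by an index, so a brute-force check is reassuring. The sketch below draws a random array $H_{abc}$ with the symmetry (5.5.2), builds $G_{abc}$ from (5.5.14), and measures how far (5.5.13) and the symmetry of $G$ in its first two indices fail to hold. The seed and the dimension $n = 4$ are arbitrary choices made for illustration.

```python
import itertools
import random

n = 4
random.seed(0)

# Random H[a][b][c], symmetric in its first two indices as in (5.5.2).
H = [[[0.0] * n for _ in range(n)] for _ in range(n)]
for a, b, c in itertools.product(range(n), repeat=3):
    if a <= b:
        H[a][b][c] = H[b][a][c] = random.uniform(-1.0, 1.0)

def G(a, b, c):
    """The candidate solution (5.5.14)."""
    return 0.5 * (H[a][c][b] + H[b][c][a] - H[a][b][c])

triples = list(itertools.product(range(n), repeat=3))
# Largest violation of (5.5.13): G_apq + G_aqp = H_pqa.
max_residual = max(abs(G(a, p, q) + G(a, q, p) - H[p][q][a]) for a, p, q in triples)
# Largest violation of the symmetry G_abc = G_bac.
max_asymmetry = max(abs(G(a, b, c) - G(b, a, c)) for a, b, c in triples)
```

Both quantities vanish up to rounding error, as the Lemma asserts.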

5.6. A sneak preview of curvature


By Theorem 5.5.2, in inertial coordinates,
$$ g_{ab} = \eta_{ab} + \frac{1}{2}\, P_{abcd}\, x^c x^d + O(|x|^3), \tag{5.6.1} $$
where $P$ is an array of numbers with the symmetry properties
$$ P_{abcd} = P_{bacd} = P_{abdc}. \tag{5.6.2} $$
We ask: is there a change of coordinates which can get rid of the $P$ term here? The answer is
no in general, but it is interesting to try.
It would be natural to try
$$ \tilde x^a = x^a + \frac{1}{6}\, W^a_{bcd}\, x^b x^c x^d \tag{5.6.3} $$
to change $P$. Note that here the array of numbers $W$ satisfies
$$ W^a_{bcd} \text{ is totally symmetric in } bcd \tag{5.6.4} $$
in the sense that
$$ W^a_{bcd} = W^a_{cbd} = W^a_{bdc} = W^a_{dcb} \quad \text{etc.} $$
In Problem 4.8, you are invited to show that if $g_{ab}$ is as in (5.6.1) and $\tilde x$ and $x$ are related
by (5.6.3), then
$$ \tilde g_{ab} = \eta_{ab} + \frac{1}{2}\, \tilde P_{abcd}\, \tilde x^c \tilde x^d + O(|\tilde x|^3), \tag{5.6.5} $$
where
$$ \tilde P_{abcd} = P_{abcd} - W_{abcd} - W_{abdc}. \tag{5.6.6} $$
Now I claim that the equation $\tilde P = 0$ cannot be solved in general, because $W$ just does not have
enough parameters! For this, we need to do some counting.

5.6.1. Counting tensor components.


Definition 5.6.1. A tensor $T$ of type $(0, m)$ in $n$ dimensions is said to be totally symmetric
if for the corresponding $m$-linear form we have
$$ T(v_1, \ldots, v_i, \ldots, v_j, \ldots, v_m) = T(v_1, \ldots, v_j, \ldots, v_i, \ldots, v_m) $$
for any $i$ and $j$. In components this is the same as saying
$$ T_{p_1 \ldots p_i \ldots p_j \ldots p_m} = T_{p_1 \ldots p_j \ldots p_i \ldots p_m}. $$
Proposition 5.6.2. The dimension of the vector space of all totally symmetric tensors of
type $(0, m)$ in $n$ dimensions is
$$ \binom{n + m - 1}{m}. \tag{5.6.7} $$
Definition 5.6.3. A tensor of type $(0, m)$ in $n$ dimensions is totally skew (or totally skew
symmetric) if the corresponding $m$-linear form has the property
$$ T(v_1, \ldots, v_i, \ldots, v_j, \ldots, v_m) = -T(v_1, \ldots, v_j, \ldots, v_i, \ldots, v_m) $$
for any distinct $i$ and $j$.


Proposition 5.6.4. The dimension of the vector space of totally skew tensors of type $(0, m)$
in $n$ dimensions is
$$ \binom{n}{m}. \tag{5.6.8} $$
(In particular the dimension is 1 if $n = m$ and 0 if $m > n$.)
Proof. We shall prove both of these together. By way of a warm-up let’s make sure that
we understand that
$$ \text{the dimension of the space of all tensors of type } (0, m) \text{ in } n \text{ dimensions is } n^m. \tag{5.6.9} $$
To see this, note that such a tensor has $m$ indices. We can choose each one of these in $n$ ways
(since they vary from 1 to $n$). All choices are independent because there are no symmetry
conditions. So the total number of possibilities is $n^m$. The number of these choices is equal to the
dimension of the space of these tensors.
Let us move on to the proof of Proposition 5.6.4. Again we have a tensor with $m$ indices.
For ease of exposition, suppose that $m = 3$. Any component with two indices the same must
be zero, because (for example)
$$ T_{115} = -T_{115} $$
by switching the first two indices. So the only non-zero components of $T$ have distinct indices.
If we have a set of 3 distinct indices, say, 523, then we can use the skew symmetry to relate
$T_{523}$ to $T_{235}$, where the indices are now in increasing order:
$$ T_{523} = -T_{253} = T_{235} $$
(switching first the first two indices and then the last two). Thus the number of independent
components of a totally skew tensor of type $(0, 3)$ is equal to the number of unordered subsets of
3 elements of the set $\{1, \ldots, n\}$. This is the binomial coefficient
$$ \binom{n}{3}. $$
The general case, with 3 replaced by $m$, works in the same way.
Finally let us prove Proposition 5.6.2. The big difference from the case of skew symmetry is
that now indices can take the same value without that component being zero. For a small
number of indices (e.g. 3) it’s possible to count by hand. There are $n$ components where the
indices are all the same:
$$ T_{111}, T_{222}, \ldots, T_{nnn}. $$
There are $n(n - 1)$ with precisely two indices the same:
$$ T_{112}, T_{113}, \ldots, T_{11n};\; T_{221}, T_{223}, \ldots. $$
And there are $n(n - 1)(n - 2)/6$ where the indices are all distinct. Thus the total number of
independent components of a totally symmetric tensor of type $(0, 3)$ in $n$ dimensions is:
$$ n + n(n - 1) + \frac{1}{6}\, n(n - 1)(n - 2) = \frac{n(n + 1)(n + 2)}{6}, \tag{5.6.10} $$
which checks with (5.6.7) if $m = 3$.
This approach can be generalized to tensors of any rank, but it’s pretty messy. The following
is the cunning way of doing it.
Consider a collection of indices on our totally symmetric tensor with $m$ indices. It will
consist of $m_1$ 1’s, $m_2$ 2’s, and so on, up to $m_n$ $n$’s. Here the $m_j$ are allowed to be 0, but the
constraint that the tensor is of type $(0, m)$ is
$$ m_1 + m_2 + \cdots + m_n = m. \tag{5.6.11} $$
This combinatorial problem can be visualised in the following way. Consider an arrangement of
$m$ coins and $n - 1$ pencils in a line, as in the example below:
[Figure not reproduced: a line of 8 coins (o) and 5 pencils (|), in the order ooo|oo|o|oo||.]


Given such an arrangement, we count the coins to the left of the first pencil, and call that $m_1$.
Then we count the coins between the first and second pencils, and call that $m_2$. Proceeding
in this way, we get a collection of $n$ integers $m_j \geq 0$, satisfying the constraint (5.6.11). (Note
that $m_j = 0$ if two of the pencils are right next to each other, or if there is a pencil at the very
beginning or the very end of the line.)
In the pictured configuration there are 5 pencils and 8 coins, and
$$ m_1 = 3, \quad m_2 = 2, \quad m_3 = 1, \quad m_4 = 2, \quad m_5 = 0, \quad m_6 = 0. $$
This would correspond to the component
$$ T_{11122344} $$
of a totally symmetric tensor of type $(0, 8)$ in 6 dimensions. The number of arrangements of $m$
coins and $n - 1$ pencils is the same as the number of ways of choosing $m$ objects (the ones to
be called coins) from a total of $m + n - 1$. This is the binomial coefficient (5.6.7). □
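Propositions 5.6.2 and 5.6.4 can be checked by brute-force enumeration for small $n$ and $m$. In the sketch below, an independent component of a totally symmetric tensor is identified with a non-decreasing list of indices, and an independent component of a totally skew tensor with a strictly increasing one; the function names are illustrative.

```python
from itertools import combinations, combinations_with_replacement
from math import comb

def count_symmetric(n, m):
    """Number of non-decreasing index lists of length m with entries in 1..n."""
    return sum(1 for _ in combinations_with_replacement(range(1, n + 1), m))

def count_skew(n, m):
    """Number of strictly increasing index lists of length m with entries in 1..n."""
    return sum(1 for _ in combinations(range(1, n + 1), m))

# Compare with the binomial coefficients (5.6.7) and (5.6.8) for a few cases.
symmetric_ok = all(count_symmetric(n, m) == comb(n + m - 1, m)
                   for n in range(1, 6) for m in range(1, 5))
skew_ok = all(count_skew(n, m) == comb(n, m)
              for n in range(1, 6) for m in range(1, 7))
```

For example, `count_symmetric(4, 3)` is 20, the value $n(n + 1)(n + 2)/6$ of (5.6.10) with $n = 4$.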
5.6.2. Sneak preview of curvature, continued. We’ve seen that the coordinate
transformation (5.6.3) allows us to change the array $P_{abcd}$, where
$$ P_{abcd} = P_{bacd} = P_{abdc}, \tag{5.6.12} $$
to
$$ \tilde P_{abcd} = P_{abcd} - W_{abcd} - W_{abdc}, \tag{5.6.13} $$
where $W$ is symmetric in its last 3 indices.
What is the dimension of the space of $P$’s? $P$ is symmetric in its first two indices and in its
third and fourth indices, but there is no other symmetry. So it is like a 2-index object, $P_{IJ}$,
where $I$ and $J$ run over a basis of the space of symmetric 2-index tensors. If that dimension is
$N$, the dimension of the space of $P$’s will be $N^2$. But we’ve seen that $N = n(n + 1)/2$ (if the
dimension is $n$; we work generally for the moment). Hence:

In an $n$-dimensional manifold, the dimension of the space of $P$’s is
$$ \frac{1}{4}\, n^2 (n + 1)^2. $$

On the other hand, the dimension of the space of $W$’s is $n$ (for the extra index) times
$n(n + 1)(n + 2)/6$ (the dimension of totally symmetric tensors of type $(0, 3)$).

In an $n$-dimensional manifold, the dimension of the space of $W$’s is
$$ \frac{1}{6}\, n^2 (n + 1)(n + 2). $$

Thus the dimension of the space of $P$’s minus the dimension of the space of $W$’s is
$$ \frac{n^2 (n + 1)^2}{4} - \frac{n^2 (n + 1)(n + 2)}{6} = \frac{n^2 (n + 1)}{12} \bigl( 3(n + 1) - 2(n + 2) \bigr) = \frac{n^2 (n + 1)}{12} (n - 1) = \frac{1}{12}\, n^2 (n^2 - 1). \tag{5.6.14} $$
Since this number is positive for $n \geq 2$, it will be impossible to solve (5.6.13) to make $\tilde P = 0$ in
general. (Our calculation shows that $\tilde P = 0$ is a system of linear equations with more equations
than unknowns.)


Thus, while some components of $P$ can be killed by coordinate transformations, there are
others, in fact $n^2 (n^2 - 1)/12$ of them, which cannot. These ‘unkillable’ components of $P$ form
a tensor called the curvature of $g$ at the point $x = 0$.
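The counting argument is simple enough to tabulate. The sketch below evaluates the two dimensions and their difference for a given $n$, using exact integer arithmetic; the function names are chosen for illustration only.

```python
def dim_P(n):
    """Dimension of the space of arrays P_abcd symmetric in ab and in cd."""
    pairs = n * (n + 1) // 2          # symmetric 2-index tensors
    return pairs ** 2

def dim_W(n):
    """Dimension of the space of arrays W^a_bcd totally symmetric in bcd."""
    return n * (n * (n + 1) * (n + 2) // 6)

def unkillable(n):
    """Number of components of P that no choice of W can remove."""
    return dim_P(n) - dim_W(n)

table = {n: unkillable(n) for n in range(1, 5)}
```

The table reads 0, 1, 6, 20 for $n = 1, 2, 3, 4$, in agreement with $n^2 (n^2 - 1)/12$. In particular, in four dimensions there are 20 components that cannot be removed.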
In fact, the Riemann curvature tensor at $x = 0$ is built out of $P$ in the following way:
$$ R_{abcd} = \frac{1}{2} (P_{acbd} + P_{bdac} - P_{adbc} - P_{bcad}). \tag{5.6.15} $$
It can be checked that if $P$ is changed to $\tilde P$ as in (5.6.13), then the components of $R$ do not
change! Thus if this particular combination of components of $P$ is non-zero then it cannot be
killed by coordinate transformation.
These matters will be discussed much more extensively in the next chapter, where we shall
see a different, but equivalent, definition of curvature.


CHAPTER 6

Covariant differentiation and curvature

6.1. Introduction
In the previous chapter we introduced geodesics in a curved space-time. The geodesic
equation motivated the introduction of the combination
X^a ∂_a Y^c + Γ^c_{ab} X^a Y^b (6.1.1)
which we claimed forms the components of a vector field ∇_X Y, the covariant derivative of Y with
respect to X. In this chapter we shall sketch a proof of this important fact and shall define the
curvature tensor. One definition of this is as follows:
R(X, Y)Z = (∇_X ∇_Y − ∇_Y ∇_X − ∇_{[X,Y]})Z (6.1.2)
for vector fields X, Y, Z. In index form, the LHS is written
R_{abc}{}^d X^a Y^b Z^c (6.1.3)
where R_{abc}{}^d are the components of a tensor of type (1, 3).
The curvature is zero for Minkowski space, and conversely (though we won’t prove this)
if the curvature is zero near a point, then we can find coordinates in which the metric is the
Minkowski metric near that point. R is thus a good coordinate-independent measure of whether
a metric is curved or not.
Curvature makes its presence felt in the behaviour of families of geodesics. We shall derive
the geodesic deviation equation and interpret this in terms of relative acceleration of nearby
geodesics. By comparing this with what happens in newtonian gravity we shall relate the
newtonian potential to the metric and derive the analogue of Laplace’s equation ∇^2 ϕ = 0 for
(time-independent) gravitational fields in empty space. Thus we are led to Einstein’s equations
r_{ab} = 0, where r is the Ricci tensor of the metric.
Finally we shall work out the weak-field limit as another cross-check between the formalism
of GR and newtonian gravity.

6.2. The covariant derivative


We shall start by showing that if X and Y are vector fields, then so is ∇X Y , defined in
components by (6.1.1). This is not trivial because the two terms separately certainly do not
transform as the components of a vector field. The term in the Γs is needed to correct for
the bad transformation laws of the components of the partial derivatives of a vector field with
respect to a given system of coordinates.
Lemma 6.2.1. If X is any vector field, then so is ∇_X X.
Proof. We obtain this from the variational formulation.
Consider a 1-parameter family of curves
H(τ, σ), α ≤ τ ≤ β, −δ < σ < δ. (6.2.1)
For each fixed σ, τ ↦ H(τ, σ) is a curve in M. We get two vector fields
X = \frac{∂H}{∂τ} = \frac{∂H^a}{∂τ} \frac{∂}{∂x^a} (6.2.2)
and
Y = \frac{∂H}{∂σ} = \frac{∂H^a}{∂σ} \frac{∂}{∂x^a}. (6.2.3)

(Here x^a = H^a(τ, σ) is the description of the family of curves with respect to the local coordinate
system x^a.) Define
E(σ) = \frac{1}{2} \int_α^β g(X(τ, σ), X(τ, σ)) dτ, (6.2.4)
the energy of the curve labelled by σ in our family. (This is a small abuse of notation.) We
claim
\frac{dE}{dσ}(0) = − \int_α^β g(∇_X X, Y) dτ + [g_q(X_q, Y_q) − g_p(X_p, Y_p)] (6.2.5)
where the integrand is evaluated at σ = 0, q = H(β, 0), p = H(α, 0) and the subscripts on the
terms in square brackets denote evaluation of the quantities at the indicated points.
If you grant me (6.2.5) for the moment, then the lemma follows rapidly. The issue is the
coordinate independence, and the term in square brackets is certainly invariant under coordinate
changes, as is the LHS of (6.2.5). It follows that for any fixed curve and vector fields along the
curve, the integral in (6.2.5) is also coordinate independent.
Now letting α and β vary, we conclude that the integrand g(∇X X, Y ) must be a scalar
quantity, and since Y is a vector field, ∇X X must also be a vector field.
So it remains to prove (6.2.5). This involves going through the calculus of variations with
the boundary term.
E(σ) = \frac{1}{2} \int_α^β g(∂H(τ, σ)/∂τ, ∂H(τ, σ)/∂τ) dτ. (6.2.6)
Differentiate with respect to σ to get
\frac{dE}{dσ} = \int_α^β g(\frac{∂^2 H}{∂σ∂τ}, \frac{∂H}{∂τ}) dτ. (6.2.7)
The usual calculus-of-variations trick is to integrate by parts here. For this, note
\frac{∂}{∂τ} g(\frac{∂H}{∂σ}, \frac{∂H}{∂τ}) = (X^c ∂_c g_{ab}) Y^a X^b + g(\frac{∂^2 H}{∂σ∂τ}, \frac{∂H}{∂τ}) + g(\frac{∂H}{∂σ}, \frac{∂^2 H}{∂τ^2}) (6.2.8)
= g(\frac{∂^2 H}{∂σ∂τ}, \frac{∂H}{∂τ}) + g(∇_X X, Y), (6.2.9)
using the definition of the Γs. Integration from α to β yields (6.2.5).
Lemma 6.2.2. The quantity
∇_X Y − ∇_Y X (6.2.10)
is a vector field, for any vector fields X and Y.
Proof. We have
(∇_X Y)^c = X^a ∂_a Y^c + Γ^c_{ab} X^a Y^b. (6.2.11)
Hence the components of (6.2.10) are
X^a ∂_a Y^c − Y^a ∂_a X^c, (6.2.12)
using the symmetry Γ^c_{ab} = Γ^c_{ba}.
So it is sufficient to check that this combination of components transforms
as a vector field. Under a change of coordinates from x^a to \tilde x^a,
\tilde X^a = \frac{∂\tilde x^a}{∂x^b} X^b, \tilde Y^a = \frac{∂\tilde x^a}{∂x^b} Y^b. (6.2.13)
Hence, since \tilde X^a \tilde ∂_a = X^a ∂_a as operators on functions,
\tilde X^a \tilde ∂_a \tilde Y^b = X^a ∂_a (Y^c \frac{∂\tilde x^b}{∂x^c}) = \frac{∂\tilde x^b}{∂x^c} X^a ∂_a Y^c + X^a Y^c \frac{∂^2 \tilde x^b}{∂x^c ∂x^a}.
If we switch X and Y and subtract, the second term on the RHS drops out because of the
symmetry of the mixed partials of \tilde x^b with respect to the x variables. Thus we are left with the
transformation law
\tilde X^a \tilde ∂_a \tilde Y^b − \tilde Y^a \tilde ∂_a \tilde X^b = \frac{∂\tilde x^b}{∂x^c} (X^a ∂_a Y^c − Y^a ∂_a X^c) (6.2.14)
as required for a vector field.


Now we can complete our proof. We know that ∇_X X is a vector field for every vector field
X. Replacing X by X + Y, we conclude that
∇_X X + ∇_X Y + ∇_Y X + ∇_Y Y
is a vector field for every pair of vector fields X, Y. Hence ∇_X Y + ∇_Y X is a vector field for
every X and Y. But so is the difference [X, Y] = ∇_X Y − ∇_Y X, by the above lemma. Now
∇_X Y = \frac{1}{2} (∇_X Y + ∇_Y X + [X, Y]) (6.2.15)
and we have written ∇_X Y as a sum of two things that we know are vector fields. So it must
itself be a vector field.
Remark 6.2.3. There is a direct, computational proof, in Woodhouse, GR, §4.5.
6.2.1. The Lie bracket.
Definition 6.2.4. The quantity ∇X Y − ∇Y X is also denoted [X, Y ] and is called the Lie
bracket of the vector fields X and Y .
We proved in Lemma 6.2.2 that [X, Y ] is a vector field by a local coordinate calculation. In
this section we give a more conceptual proof of the same fact. The formula (6.2.12) shows that
the Lie bracket depends only on the vector fields X and Y and not on the metric g. That is,
we can define an operator on functions f :
[X, Y ]f = X(Y f ) − Y (Xf ) (6.2.16)
We aim to show directly that [X, Y ], so defined, is a vector field.
We begin with a lemma:
Lemma 6.2.5. Suppose that T : C^∞(U) → C^∞(U) is a real-linear map which satisfies the
Leibniz rule
T[ϕψ](p) = ϕ(p) T[ψ](p) + ψ(p) T[ϕ](p)
for all smooth functions ϕ and ψ. Then T is equal to a vector field on U.
Proof. Fix a point p of U and choose local coordinates x^j such that x^j = 0 corresponds to p. We aim
to show that there is a set of numbers T^j such that
T[f](p) = T^j \frac{∂f}{∂x^j}(0).
Define
T^j = T[x^j](p).
The map T annihilates constants, for
T[1] = T[1^2] = 2T[1] by the Leibniz rule.
Hence T[1] = 0 and so, by linearity, T[c] = 0 for any constant c.
Now let f be any smooth function. We can write
f(x) = f(0) + x^j ∂_j f(0) + O(|x|^2).
Applying T, we see
T[f](p) = T^j ∂_j f(0)
because, by the Leibniz rule, T applied to a smooth function vanishing to order 2 or more at p must
vanish at p.

Now we have the following slick proof that [X, Y ] is a vector field if X and Y are vector
fields.
Proposition 6.2.6. If X and Y are vector fields, then so is [X, Y ].


Proof. By the preceding Lemma, it is enough to prove that


[X, Y ](ϕψ) = ϕ[X, Y ]ψ + ψ[X, Y ]ϕ
for any two smooth functions ϕ and ψ. We calculate
XY (ϕψ) = X[ϕY ψ + ψY ϕ] = (Xϕ)(Y ψ) + ϕX(Y ψ) + (Xψ)(Y ϕ) + ψX(Y ϕ).
Similarly,
Y X(ϕψ) = (Y ψ)(Xϕ) + ϕY Xψ + (Y ϕ)(Xψ) + ψY (Xϕ).
Subtracting, we obtain
[X, Y ](ϕψ) = ϕ[X, Y ]ψ + ψ[X, Y ]ϕ.

Example 6.2.7. If we pick coordinates x^a then we have the corresponding (locally defined)
vector fields ∂_0, ∂_1, ∂_2, ∂_3. We have
[∂_a, ∂_b] = 0. (6.2.17)
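To get a feel for the component formula (6.2.12), here is a small numerical sketch in Python. It is not from the original notes: the helper bracket and the sample fields rotation, d_x and d_y on R^2 are hypothetical names chosen for illustration, and the partial derivatives are approximated by central differences, so the answers are only accurate up to rounding error.

```python
def bracket(X, Y, p, h=1e-5):
    """Components of [X, Y] at the point p, from formula (6.2.12):
    [X, Y]^c = X^a d_a Y^c - Y^a d_a X^c, with d_a taken by central differences."""
    n = len(p)

    def partial(F, a):
        # Approximate d F^c / d x^a at p, for every component c.
        plus, minus = list(p), list(p)
        plus[a] += h
        minus[a] -= h
        Fp, Fm = F(plus), F(minus)
        return [(Fp[c] - Fm[c]) / (2 * h) for c in range(n)]

    Xp, Yp = X(p), Y(p)
    dX = [partial(X, a) for a in range(n)]  # dX[a][c] = d_a X^c
    dY = [partial(Y, a) for a in range(n)]  # dY[a][c] = d_a Y^c
    return [sum(Xp[a] * dY[a][c] - Yp[a] * dX[a][c] for a in range(n))
            for c in range(n)]


def rotation(p):
    """The field -y d/dx + x d/dy generating rotations about the origin."""
    return [-p[1], p[0]]


def d_x(p):
    """The coordinate field d/dx."""
    return [1.0, 0.0]


def d_y(p):
    """The coordinate field d/dy."""
    return [0.0, 1.0]


point = [0.3, 0.7]
print(bracket(d_x, d_y, point))       # coordinate fields commute: close to [0, 0]
print(bracket(rotation, d_x, point))  # close to [0, -1], i.e. -d/dy
```

The second bracket can be checked by hand from (6.2.12): the components of d/dx are constant, so only the term −Y^a ∂_a X^c = −∂_x(−y, x) = (0, −1) survives.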
6.2.2. The covariant differential. For functions we had df. This was a covector field.
For vector fields Z we can consider ∇Z. This is a (1, 1) tensor. We denote it in indices by
∇_a Z^c. The covariant directional derivative with respect to X is then the contraction X^a ∇_a Z^c.
In index form,
∇_a Z^c = ∂_a Z^c + Γ^c_{ab} Z^b.

6.3. Extension to all tensors


We already know how to differentiate functions with respect to vector fields. From now on
we also denote Xf by ∇X f . We now have the covariant derivative of vector fields as well, ∇X Y .
There is now a natural way also to differentiate covector fields, which respects the natural dual
pairing between vectors and covectors.
Definition 6.3.1. If α is a covector (tensor of type (0, 1)), then we define ∇α so that
⟨∇_X α, Y⟩ + ⟨α, ∇_X Y⟩ = X⟨α, Y⟩
for all vector fields Y.
In indices, this reads
(X^a ∇_a α_b) Y^b + α_b (X^a ∇_a Y^b) = X^a ∇_a (α_b Y^b).
This determines ∇_a α_b uniquely. Indeed, using
∇_a Y^b = ∂_a Y^b + Γ^b_{ac} Y^c
we get
(∇_a α_b) Y^b + α_b (∂_a Y^b + Γ^b_{ac} Y^c) = (∂_a α_b) Y^b + α_b ∂_a Y^b.
Hence we require, for all Y^b,
(∇_a α_b) Y^b = (∂_a α_b − Γ^c_{ab} α_c) Y^b.
As this is true for all Y, we have
Proposition 6.3.2. The covariant derivative operator acting on covector fields is given in
coordinates by
∇_a α_b = ∂_a α_b − Γ^c_{ab} α_c (6.3.1)
Remark 6.3.3. We say that we have extended ∇ from vector fields to covector fields by
using the Leibniz rule.
Similarly, we use the Leibniz rule to extend from tensors of type (1, 0) and (0, 1) to any
tensor of type (r, s).
Theorem 6.3.4. There is a unique extension of ∇ to act on all tensors so that:
(1) If T is a tensor of type (r, s) then ∇X T is again a tensor of type (r, s); equivalently
∇T is a tensor of type (r, s + 1).


(2) ∇_X is real-linear:
∇_X (λT + µS) = λ∇_X T + µ∇_X S
for any two tensors T and S of type (r, s) and real numbers λ and µ.
(3) ∇ satisfies the Leibniz rule:
∇_X (T ⊗ S) = (∇_X T) ⊗ S + T ⊗ (∇_X S)
for any tensors T and S, not necessarily of the same type;
(4) If U = c(T ⊗ S) is a contraction of tensors T and S, then
∇_X U = c((∇_X T) ⊗ S) + c(T ⊗ ∇_X S)
where c denotes the contraction.
Remark 6.3.5. I omit the proof, but refer you to the problem set for related exercises.
It is a pain to write out the general case, but here are some examples:
∇_a T_{bc} = ∂_a T_{bc} − Γ^s_{ab} T_{sc} − Γ^s_{ac} T_{bs}; (6.3.2)
∇_a A^c{}_b = ∂_a A^c{}_b − Γ^s_{ab} A^c{}_s + Γ^c_{as} A^s{}_b; (6.3.3)
∇_a P^{bc} = ∂_a P^{bc} + Γ^b_{as} P^{sc} + Γ^c_{as} P^{bs}. (6.3.4)

6.4. Properties
The covariant derivative operator we have been discussing has the following further proper-
ties:

Symmetry or Torsion-free:
∇_a ∇_b f = ∇_b ∇_a f (6.4.1)
for all functions f.

Metric preservation:
∇_a g_{bc} = 0. (6.4.2)

There are more general differentiation operators which satisfy the Leibniz-rule type prop-
erties of the previous section. These are called connections. Given a metric, there is a unique
such connection which satisfies the two boxed properties here. This is also called the metric
connection or the Levi-Civita connection after Tullio Levi-Civita (1873–1941).
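Proof. (Of boxed properties) We have
∇_a ∇_b f = ∂_a ∂_b f − Γ^c_{ab} ∂_c f. (6.4.3)
But the mixed partials of f are symmetric and Γ^c_{ab} is symmetric in its lower indices. Hence the
RHS is symmetric and the result follows.
Similarly, we have
∇_a g_{bc} = ∂_a g_{bc} − Γ^s_{ab} g_{sc} − Γ^s_{ac} g_{bs}. (6.4.4)
Now from the definition of the Γs,
Γ_{abc} = Γ^s_{ab} g_{sc} = \frac{1}{2} (∂_a g_{bc} + ∂_b g_{ac} − ∂_c g_{ab}).
Adding to this the same expression with b and c interchanged gives Γ^s_{ab} g_{sc} + Γ^s_{ac} g_{bs} = ∂_a g_{bc},
and so the result follows.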
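Formula (6.4.4) can be checked in a concrete case. The sketch below (an illustration added here, not part of the original notes) uses the 2-dimensional hyperbolic metric ds^2 = (dx^2 + dy^2)/x^2 and the Christoffel symbols quoted for it in (6.5.12). The function name check_metric_preserved and the array layout are assumptions of the sketch. Exact rational arithmetic is used, so every component should come out exactly zero.

```python
from fractions import Fraction


def check_metric_preserved(x):
    """Return the array nabla_a g_bc of (6.4.4) at a point with first coordinate x,
    for ds^2 = (dx^2 + dy^2)/x^2. List indices 0, 1 stand for x^1 = x, x^2 = y."""
    x = Fraction(x)
    g = [[1 / x**2, 0], [0, 1 / x**2]]
    # dg[a][b][c] = partial_a g_bc; the metric depends on x only.
    dg = [[[-2 / x**3, 0], [0, -2 / x**3]],
          [[0, 0], [0, 0]]]
    # Gamma[s][a][b] = Gamma^s_{ab}, with the values quoted in (6.5.12).
    Gamma = [[[-1 / x, 0], [0, 1 / x]],
             [[0, -1 / x], [-1 / x, 0]]]
    return [[[dg[a][b][c]
              - sum(Gamma[s][a][b] * g[s][c] for s in range(2))
              - sum(Gamma[s][a][c] * g[b][s] for s in range(2))
              for c in range(2)]
             for b in range(2)]
            for a in range(2)]


result = check_metric_preserved(3)
print(all(entry == 0 for block in result for row in block for entry in row))
```

Changing the sign of any one Christoffel symbol in the sketch makes some component non-zero, which is a reminder that (6.4.2) pins the connection down.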
The metric preservation property has important consequences. Recall that the metric g can
be used to lower indices and that its inverse can be used to raise indices. Because ∇g = 0 it
follows that ∇g −1 = 0, and raising and lowering indices commutes with differentiation.
For example,
∇_a (g^{bs} α_s) = ∇_a α^b = g^{bs} ∇_a α_s. (6.4.5)
The middle thing is the covariant derivative of the index-raised version of α; the right-hand
thing is the derivative of α, with its index raised afterwards.


Example 6.4.1. If Y is parallel propagated along a curve γ, then its length is constant.
For if γ̇ = X, we have ∇_X Y = 0. Then
∇_X (g(Y, Y)) = 2g(Y, ∇_X Y) = 0.
More explicitly,
∇_X (Y^a Y_a) = (∇_X Y^a) Y_a + Y^a ∇_X Y_a
and ∇_X Y^a = 0 implies also that ∇_X Y_a = 0. So both terms are 0.

6.5. Curvature
From the mathematical point of view, this section is possibly the most important in these
notes.

6.5.1. Component definition.


Theorem 6.5.1. Let X be a vector field. Then
(∇_a ∇_b − ∇_b ∇_a) X^d = R_{abc}{}^d X^c (6.5.1)
where R_{abc}{}^d are the components of a tensor of type (1, 3).
Definition 6.5.2. The tensor R with components R_{abc}{}^d is called the Riemann curvature
tensor (or just curvature tensor for short).
Proof. We prove first the following formula:
R_{abc}{}^d = ∂_a Γ^d_{bc} − ∂_b Γ^d_{ac} + Γ^d_{ap} Γ^p_{bc} − Γ^d_{bp} Γ^p_{ac} (6.5.2)
Although it is important, and you need to know that it exists, I DO NOT RECOMMEND
THAT YOU COMMIT THIS TO MEMORY.
Recall that if T_b{}^d is a (1, 1) tensor, then
∇_a T_b{}^d = ∂_a T_b{}^d + Γ^d_{as} T_b{}^s − Γ^s_{ab} T_s{}^d. (6.5.3)
We are going to apply this with T_b{}^d = ∇_b X^d and then subtract the corresponding equation
with a and b switched. In fact if we do the subtraction first, we get
∇_a T_b{}^d − ∇_b T_a{}^d = ∂_a T_b{}^d + Γ^d_{as} T_b{}^s − ∂_b T_a{}^d − Γ^d_{bs} T_a{}^s (6.5.4)
because Γ^s_{ab} = Γ^s_{ba}.
Now put T_b{}^d = ∇_b X^d and expand ∇X = ∂X + ΓX. The first two terms on the RHS of
(6.5.4) are
∂_a (∂_b X^d + Γ^d_{bs} X^s) + Γ^d_{as} (∂_b X^s + Γ^s_{bp} X^p) = (∂_a Γ^d_{bs}) X^s + Γ^d_{as} Γ^s_{bp} X^p (6.5.5)
+ ∂_a ∂_b X^d + Γ^d_{bs} ∂_a X^s + Γ^d_{as} ∂_b X^s. (6.5.6)
We have rearranged the terms in this order because those on line (6.5.6) are symmetric in ab.
Hence these disappear when we subtract the corresponding expression with a and b switched.
Thus we get
R_{abc}{}^d X^c = (∇_a ∇_b − ∇_b ∇_a) X^d = [∂_a Γ^d_{bc} − ∂_b Γ^d_{ac} + Γ^d_{ap} Γ^p_{bc} − Γ^d_{bp} Γ^p_{ac}] X^c (6.5.7)
which is equivalent to (6.5.2) as X is an arbitrary vector field.
It remains to show that the components R_{abc}{}^d transform correctly under a change of coordinates.
The very bad way to do this is to try to change coordinates in (6.5.2). We do know,
however, that ∇_a ∇_b X^d is a tensor of type (1, 2) for any given vector field X. It follows that the
LHS of (6.5.1) is a tensor of type (1, 2) as well. If we write
J^a_b = \frac{∂x^a}{∂\tilde x^b}, (6.5.8)
it follows that
\tilde R_{pqr}{}^s \tilde X^r = J^a_p J^b_q (J^{-1})^s_d R_{abc}{}^d X^c, (6.5.9)
where J is the Jacobian of the transformation. But we are assuming that X is a vector field as
well, so
X^c = J^c_r \tilde X^r.
Inserting this into (6.5.9) we obtain
\tilde R_{pqr}{}^s \tilde X^r = J^a_p J^b_q J^c_r (J^{-1})^s_d R_{abc}{}^d \tilde X^r, (6.5.10)
for any components \tilde X^r. Hence
\tilde R_{pqr}{}^s = J^a_p J^b_q J^c_r (J^{-1})^s_d R_{abc}{}^d (6.5.11)
which is the transformation law for a tensor field of type (1, 3).

Remark 6.5.3. Cf. §5.6 of Woodhouse, GR.

Example 6.5.4. Curvature of hyperbolic space.
In the previous chapter, we considered the 2-dimensional hyperbolic metric
ds^2 = \frac{dx^2 + dy^2}{x^2}.
With x = x^1 and y = x^2 we computed the non-zero Christoffel symbols:
Γ^1_{11} = −\frac{1}{x}, Γ^1_{22} = \frac{1}{x}, Γ^2_{12} = Γ^2_{21} = −\frac{1}{x}, others = 0. (6.5.12)
Using the formula (6.5.2) for curvature, we have
R_{121}{}^2 = ∂_1 Γ^2_{21} − ∂_2 Γ^2_{11} + Γ^2_{1p} Γ^p_{21} − Γ^2_{2p} Γ^p_{11}. (6.5.13)
There is quite a bit of simplification because many of the Γs are zero in this case:
R_{121}{}^2 = ∂_1 Γ^2_{21} + Γ^2_{12} Γ^2_{21} − Γ^2_{21} Γ^1_{11} (6.5.14)
is all that survives. Substituting from (6.5.12), we obtain
R_{121}{}^2 = ∂_x (−x^{-1}) + x^{-2} − x^{-2} = x^{-2}. (6.5.15)
It turns out that all other components of the curvature are either ± this one, or are zero. In
particular if either 1 or 2 is repeated three or more times, then the corresponding curvature
component is zero. If 1 and 2 appear precisely twice each, then the corresponding component
is ±x^{-2}. From (6.5.15) we also have
R_{1212} = x^{-4}
because g_{22} = x^{-2}.

Example 6.5.5. Curvature of flat space in polar coordinates. In 2D polar coordinates, the
Γs are:
Γ^1_{22} = −r, Γ^2_{12} = Γ^2_{21} = \frac{1}{r}, (6.5.16)
all others zero (r = x^1, θ = x^2). This time, (6.5.13) reduces to
R_{121}{}^2 = ∂_1 Γ^2_{12} + Γ^2_{12} Γ^2_{21} = −\frac{1}{r^2} + \frac{1}{r^2} = 0.
This illustrates the fact that curvature is a tensor: we knew this result ahead of time, because
it is clear that the curvature of a flat metric is zero, and we’ve just changed coordinates.
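The two examples can be reproduced mechanically from formula (6.5.2). The Python sketch below is an added illustration, not part of the original notes; the names riemann, hyperbolic and polar and the array layout are choices made for the sketch. The Christoffel symbols and their derivatives are entered by hand, taking the values in (6.5.12) for the hyperbolic metric and the standard values Γ^1_{22} = −r, Γ^2_{12} = Γ^2_{21} = 1/r for dr^2 + r^2 dθ^2, and everything is evaluated in exact rational arithmetic at a sample point.

```python
from fractions import Fraction


def zeros():
    """A 2 x 2 x 2 array of exact zeros."""
    return [[[Fraction(0)] * 2 for _ in range(2)] for _ in range(2)]


def riemann(Gamma, dGamma, a, b, c, d, n=2):
    """R_abc^d from (6.5.2).
    Gamma[d][b][c] = Gamma^d_{bc}; dGamma[a][d][b][c] = partial_a Gamma^d_{bc}."""
    quadratic = sum(Gamma[d][a][p] * Gamma[p][b][c] - Gamma[d][b][p] * Gamma[p][a][c]
                    for p in range(n))
    return dGamma[a][d][b][c] - dGamma[b][d][a][c] + quadratic


def hyperbolic(x):
    """Christoffel data for ds^2 = (dx^2 + dy^2)/x^2, indices 0 = x, 1 = y."""
    x = Fraction(x)
    G, dG1 = zeros(), zeros()
    G[0][0][0] = -1 / x
    dG1[0][0][0] = 1 / x**2
    G[0][1][1] = 1 / x
    dG1[0][1][1] = -1 / x**2
    G[1][0][1] = G[1][1][0] = -1 / x
    dG1[1][0][1] = dG1[1][1][0] = 1 / x**2
    return G, [dG1, zeros()]  # nothing depends on y


def polar(r):
    """Christoffel data for ds^2 = dr^2 + r^2 dtheta^2, indices 0 = r, 1 = theta."""
    r = Fraction(r)
    G, dG1 = zeros(), zeros()
    G[0][1][1] = -r
    dG1[0][1][1] = Fraction(-1)
    G[1][0][1] = G[1][1][0] = 1 / r
    dG1[1][0][1] = dG1[1][1][0] = -1 / r**2
    return G, [dG1, zeros()]  # nothing depends on theta


# R_121^2 corresponds to (a, b, c, d) = (0, 1, 0, 1) with 0-based indices.
print(riemann(*hyperbolic(3), 0, 1, 0, 1))  # 1/9, i.e. x^(-2) at x = 3
print(riemann(*polar(2), 0, 1, 0, 1))       # 0
```

Looping over all sixteen index combinations for the polar data gives zero every time, as it must for a flat metric.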


6.5.2. Directional derivative definition. In this section, we pursue an alternative, more
invariant, way of defining the curvature tensor.

Definition 6.5.6. The curvature R of the covariant derivative ∇ is defined, for any three
vector fields X, Y , Z, as follows:

R(X, Y )Z = (∇X ∇Y − ∇Y ∇X − ∇[X,Y ] )Z (6.5.17)

Remark 6.5.7. If X = ∂_a and Y = ∂_b, then the commutator [X, Y] is zero and we get
R(∂_a, ∂_b)Z = (∇_a ∇_b − ∇_b ∇_a)Z (6.5.18)
and comparing with the previous section, the RHS is R_{abc}{}^d Z^c in components. Thus the component
version of the curvature tensor arises from this definition by taking X = ∂_a and Y = ∂_b.

We shall show that the operation
(X, Y, Z) ↦ (∇_X ∇_Y − ∇_Y ∇_X − ∇_{[X,Y]}) Z (6.5.19)
is C^∞-linear in each variable, i.e.
(∇_X ∇_Y − ∇_Y ∇_X − ∇_{[X,Y]})(f Z) = f (∇_X ∇_Y − ∇_Y ∇_X − ∇_{[X,Y]}) Z (6.5.20)
and
(∇_{fX} ∇_Y − ∇_Y ∇_{fX} − ∇_{[fX,Y]}) Z = f (∇_X ∇_Y − ∇_Y ∇_X − ∇_{[X,Y]}) Z (6.5.21)
for any smooth function f. (Linearity in Y then follows, because the expression changes sign
when X and Y are interchanged.) We then prove a technical lemma which explains why being
C^∞-linear implies that the operation (6.5.19) is given by a tensor
(X, Y, Z) ↦ R(X, Y)Z or R_{abc}{}^d X^a Y^b Z^c in components. (6.5.22)

6.5.3. Proof of (6.5.20). We simply compute:

∇_X ∇_Y (f Z) = ∇_X (f ∇_Y Z + (Y f) Z).
Continuing,
∇_X ∇_Y (f Z) = f ∇_X ∇_Y Z + (X f) ∇_Y Z + (X Y f) Z + (Y f) ∇_X Z.
Interchanging X and Y and subtracting, we get
[∇_X ∇_Y − ∇_Y ∇_X](f Z) = f [∇_X ∇_Y − ∇_Y ∇_X] Z + (X Y f − Y X f) Z.
But
∇_{[X,Y]} (f Z) = ([X, Y] f) Z + f ∇_{[X,Y]} Z
and [X, Y] f = X Y f − Y X f, so subtracting this from each side gives the C^∞-linearity of
Z ↦ R(X, Y)Z for fixed X and Y.

Remark 6.5.8. Note that all we have used in this proof is that directional derivatives satisfy
the Leibniz rule.

6.5.4. Proof of (6.5.21). This is left as an exercise for the reader. It is somewhat shorter
than the proof of (6.5.20). You will need the formula
[X, f Y] u = (X f)(Y u) + f [X, Y] u (6.5.23)
which is also a good exercise.


6.5.5. Completion of proof. The operation (6.5.19)
(X, Y, Z) ↦ (∇_X ∇_Y − ∇_Y ∇_X − ∇_{[X,Y]}) Z
is certainly a differential operator of order 2 in Z and order 1 in X and Y. The next lemma
looks complicated but it just says that a differential operator of order 2 which is also C^∞-linear
must actually be algebraic, i.e. given by multiplication by a tensor field.

Lemma 6.5.9. Let P be a second-order differential operator which maps vector fields to vector
fields. Suppose further that P is C^∞-linear, that is
P(f Z) = f P(Z) (6.5.24)
for any smooth function f. Then in fact P is given by a (1, 1) tensor in the sense that
P(Z)^c = P^c_a Z^a. (6.5.25)
Proof. Since P is a second-order operator, in local coordinates we can write it as
P(Z)^d = A_a^{bcd} ∂_b ∂_c Z^a + B_a^{cd} ∂_c Z^a + C_a^d Z^a,
where we may take A_a^{bcd} to be symmetric in b and c.
We compute P(f Z). We have
∂_b ∂_c (f Z^a) = (∂_b ∂_c f) Z^a + (∂_b f ∂_c Z^a + ∂_c f ∂_b Z^a) + f ∂_b ∂_c Z^a.
Similarly
∂_c (f Z^a) = (∂_c f) Z^a + f ∂_c Z^a.
Hence,
P(f Z)^d = f P(Z)^d + A_a^{bcd} (∂_b ∂_c f) Z^a (6.5.26)
+ A_a^{bcd} (∂_b f ∂_c Z^a + ∂_c f ∂_b Z^a) + B_a^{cd} (∂_c f) Z^a. (6.5.27)
Because P(f Z) = f P(Z), we obtain
A_a^{bcd} ((∂_b ∂_c f) Z^a + ∂_b f ∂_c Z^a + ∂_c f ∂_b Z^a) + B_a^{cd} (∂_c f) Z^a = 0 (6.5.28)
for any f and Z. If we pick f to be a product x^p x^q of two coordinate functions, substitute into
this and then set x = 0, the only surviving term is
A_a^{bcd} (∂_b ∂_c (x^p x^q)) Z^a = 2 A_a^{pqd} Z^a
and this is supposed to vanish. This being true for all vectors Z, we conclude that A_a^{pqd} vanishes
at x = 0. This point was arbitrary, so A_a^{pqd} vanishes everywhere, and (6.5.28) now reads
B_a^{cd} (∂_c f) Z^a = 0 (6.5.29)
for all f and Z. We apply the same argument with f = x^p and this gives B_a^{pd} = 0 at x = 0.
Again the point was arbitrary, so it follows that B vanishes everywhere. Hence P(f Z) = f P(Z)
implies that P(Z)^d = C_a^d Z^a, as required.

Applying this Lemma to our map (6.5.19), first as an operator on Z, with (X, Y) fixed, then
in X, with (Y, Z) fixed, we see that R(X, Y)Z is indeed given by R_{abc}{}^d X^a Y^b Z^c where R_{abc}{}^d are
the components of a tensor of type (1, 3).

Remark 6.5.10. This lemma is a bit fiddly to prove, but the idea that a differential operator
which is also C ∞ -linear has to be given algebraically (by multiplication by a tensor) is a powerful
one in differential geometry which saves a lot of computation in local coordinates.


6.6. Curvature at a point


Cf. Woodhouse, GR, §5.7. Recall from Theorem 5.5.2 that at any point p, inertial coordinates
can be chosen so that
g_{ab}(x) = η_{ab} + O(|x|^2), (x → 0). (6.6.1)
The formula (6.5.2) for curvature simplifies at such a point:
Proposition 6.6.1. In inertial coordinates x^a which satisfy (6.6.1),
R_{abcd}(0) = \frac{1}{2} (∂_a ∂_c g_{bd}(0) + ∂_b ∂_d g_{ac}(0) − ∂_a ∂_d g_{bc}(0) − ∂_b ∂_c g_{ad}(0)). (6.6.2)
Proof. This follows because in such coordinates all first derivatives of g_{bc} and g^{bc} vanish
at x = 0, so the Γs vanish at x = 0. Hence the quadratic terms in (6.5.2) vanish at p, giving
R_{abc}{}^d = ∂_a Γ^d_{bc} − ∂_b Γ^d_{ac} at x = 0 (6.6.3)
and
∂_a Γ^d_{bc} = \frac{1}{2} g^{ds} (∂_a ∂_b g_{cs} + ∂_a ∂_c g_{bs} − ∂_a ∂_s g_{bc}) at x = 0, (6.6.4)
so
g_{ds} ∂_a Γ^s_{bc} = \frac{1}{2} (∂_a ∂_b g_{cd} + ∂_a ∂_c g_{bd} − ∂_a ∂_d g_{bc}) at x = 0. (6.6.5)
Formula (6.6.2) follows from this and (6.6.3).
Example 6.6.2. Consider the two-dimensional lorentzian metric
ds^2 = dt^2 − (1 + t^2) dx^2. (6.6.6)
If t = x^0 and x = x^1, then the first derivatives of the metric vanish at t = 0, so by the Proposition,
R_{0101} = \frac{1}{2} (∂_0 ∂_0 g_{11} + ∂_1 ∂_1 g_{00} − ∂_0 ∂_1 g_{01} − ∂_1 ∂_0 g_{10}) = \frac{1}{2} (−2) = −1 (6.6.7)
at t = 0. So the curvature is non-zero at the point x = 0, t = 0.
Theorem 6.6.3. If R_{abc}{}^d ≠ 0 at a point, then there is no choice of coordinates centred at
that point with respect to which g_{ab} = η_{ab} near the point.
Proof. Suppose that there is such a choice of coordinates. Then the Γs all vanish, ∇_a = ∂_a
and curvature = 0. Contradiction.
Remark 6.6.4. There is a converse statement which we shall not prove: if R = 0 in a small
neighbourhood of a point p, then there are local coordinates centred at p with respect to which
gab = ηab .
6.6.1. Commutators on tensors of higher rank. Recall that the covariant derivative
on vectors has been extended to act on all tensors in a way compatible with the basic algebraic
operations on tensors. This implies that when the commutator ∇a ∇b − ∇b ∇a is applied to any
tensor, the result can also be expressed in terms of the algebraic operation of the curvature on
the tensor.
There are good ways to derive and remember these results and bad ways to do it. The worst
way is to work directly with the Γs.
Proposition 6.6.5. If α is a covector, then
(∇_a ∇_b − ∇_b ∇_a) α_c = −R_{abc}{}^d α_d. (6.6.8)
Proof. One proof is to use the fact that ∇_a preserves the metric, together with a symmetry of
the curvature tensor proved below. If α^b = g^{ab} α_a, then we have
(∇_a ∇_b − ∇_b ∇_a) α^d = R_{abc}{}^d α^c. (6.6.9)
If we lower the index, we get
(∇_a ∇_b − ∇_b ∇_a) α_d = R_{abcd} α^c. (6.6.10)
Now, as proved below, R_{abcd} = −R_{abdc}. So we can rearrange the indices to get
(∇_a ∇_b − ∇_b ∇_a) α_d = −R_{abdc} α^c = −R_{abd}{}^c α_c. (6.6.11)
A second proof runs as follows. Let X be any vector field. Then α_d X^d is a function and we know
(∇_a ∇_b − ∇_b ∇_a)(α_d X^d) = (∂_a ∂_b − ∂_b ∂_a)(α_d X^d) = 0. (6.6.12)
On the other hand, by the Leibniz rule
∇_a ∇_b (α_d X^d) = (∇_a ∇_b α_d) X^d + (∇_a α_d ∇_b X^d + ∇_a X^d ∇_b α_d) + α_d ∇_a ∇_b X^d. (6.6.13)
The terms in the first derivatives of α and X are symmetric in ab, so subtracting the corresponding
expression with a and b switched, we obtain
0 = (∇_a ∇_b − ∇_b ∇_a)(α_d X^d) = X^d (∇_a ∇_b − ∇_b ∇_a) α_d + α_d (∇_a ∇_b − ∇_b ∇_a) X^d. (6.6.14)
Hence
X^d (∇_a ∇_b − ∇_b ∇_a) α_d = −α_d R_{abc}{}^d X^c = −X^d R_{abd}{}^c α_c, (6.6.15)
where in the last equality we have switched the two dummy indices c and d so as to have X^d
on each side. We now make the usual argument that as X is arbitrary, this equation implies the result.

Similarly, we obtain results such as
(∇_a ∇_b − ∇_b ∇_a) T^{cd} = R_{abs}{}^c T^{sd} + R_{abs}{}^d T^{cs}, (6.6.16)
(∇_a ∇_b − ∇_b ∇_a) A^d{}_c = −R_{abc}{}^s A^d{}_s + R_{abs}{}^d A^s{}_c, (6.6.17)
(∇_a ∇_b − ∇_b ∇_a) S_{cd} = −R_{abc}{}^s S_{sd} − R_{abd}{}^s S_{cs}. (6.6.18)
For a tensor T of type (r, s) the formula on the RHS will be a sum of r + s terms,
all of the form RT; there will be r terms with + signs, corresponding to the upstairs indices of
T, and s terms with − signs, corresponding to the lower indices of T.

6.6.2. Symmetries of the curvature tensor. The formula for the curvature at a point
in Proposition 6.6.1 allows us to write down the general symmetry properties of Rabcd .
Theorem 6.6.6. For any metric, the curvature tensor has the following symmetries:
R_{abcd} = −R_{bacd}, R_{abcd} = −R_{abdc}; (6.6.19)
R_{abcd} + R_{bcad} + R_{cabd} = 0; (6.6.20)
and the interchange symmetry
R_{abcd} = R_{cdab}. (6.6.21)
Proof. All follow by inspection of the formula in Proposition 6.6.1 and the fact that
∂_a ∂_b f = ∂_b ∂_a f for any smooth function f.
Theorem 6.6.7. In n dimensions, the number of algebraically independent components of
R_{abcd} is
\frac{1}{12} n^2 (n^2 − 1). (6.6.22)
Proof. The number of independent components of a skew two-index tensor in n dimensions
is N = n(n − 1)/2. As at the end of the previous chapter, taking into account (6.6.19) and (6.6.21),
a tensor with just those symmetries is like a symmetric two-index object on an N-dimensional
space, so its number of independent components is N(N + 1)/2, that is
\frac{1}{8} n(n − 1)(n^2 − n + 2). (6.6.23)


It turns out that (6.6.20) is only independent of the other two if all four indices are distinct.
So this imposes another n-choose-4 conditions on the components. Hence the number of independent
components is
\frac{1}{8} n(n − 1)(n^2 − n + 2) − \frac{n(n − 1)(n − 2)(n − 3)}{24}
= \frac{1}{24} n(n − 1) (3n^2 − 3n + 6 − (n − 2)(n − 3))
= \frac{1}{12} n^2 (n^2 − 1). (6.6.24)
If n = 2 this is equal to 1, and the curvature of a 2D metric is completely determined by
R_{1212}.
If n = 3 we have 6 components and if n = 4 we have 20 components.
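The count can also be confirmed by linear algebra rather than by the combinatorial argument. The following Python sketch is an added illustration, not from the original notes, and the function name independent_components is hypothetical. It treats the n^4 numbers R_{abcd} as unknowns, imposes the skew symmetries (6.6.19) and the cyclic identity (6.6.20) as linear equations, and computes the rank by exact sparse Gaussian elimination. The interchange symmetry (6.6.21) is not imposed separately, since it is a consequence of the other two.

```python
from fractions import Fraction
from itertools import product


def independent_components(n):
    """Dimension of the space of arrays R_abcd obeying (6.6.19) and (6.6.20)."""
    def var(a, b, c, d):
        # Position of R_abcd in a flat list of n**4 unknowns.
        return ((a * n + b) * n + c) * n + d

    constraints = []
    for a, b, c, d in product(range(n), repeat=4):
        constraints.append([var(a, b, c, d), var(b, a, c, d)])  # R_abcd + R_bacd = 0
        constraints.append([var(a, b, c, d), var(a, b, d, c)])  # R_abcd + R_abdc = 0
        constraints.append([var(a, b, c, d), var(b, c, a, d), var(c, a, b, d)])  # (6.6.20)

    pivots = {}  # leading column -> sparse row with leading coefficient 1
    for terms in constraints:
        row = {}
        for v in terms:
            row[v] = row.get(v, Fraction(0)) + 1
        while row:
            lead = min(row)
            if lead not in pivots:
                # New independent equation: normalise and store it.
                scale = row[lead]
                pivots[lead] = {k: v / scale for k, v in row.items()}
                break
            # Eliminate the leading unknown using the stored pivot row.
            factor = row[lead]
            for k, v in pivots[lead].items():
                new = row.get(k, Fraction(0)) - factor * v
                if new == 0:
                    row.pop(k, None)
                else:
                    row[k] = new
    return n**4 - len(pivots)  # unknowns minus rank


for n in (2, 3, 4):
    print(n, independent_components(n), n**2 * (n**2 - 1) // 12)
```

The two printed columns agree, giving 1, 6 and 20 for n = 2, 3 and 4.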
The curvature is precisely the ‘coordinate independent part’ of the second-order part of the
metric in inertial coordinates.
Another important general property of the curvature is the Bianchi identity.
Theorem 6.6.8. For any metric, the curvature tensor satisfies
∇_a R_{bcde} + ∇_b R_{cade} + ∇_c R_{abde} = 0. (6.6.25)
Proof. A very bad way to try to do this would be from the explicit formula for R in terms
of the Γs. Instead we use the definition
(∇_b ∇_c − ∇_c ∇_b) α_d = −R_{bcd}{}^s α_s. (6.6.26)
Apply ∇_a to this to get
(∇_a ∇_b ∇_c − ∇_a ∇_c ∇_b) α_d = −(∇_a R_{bcd}{}^s) α_s − R_{bcd}{}^s ∇_a α_s. (6.6.27)
Now skew symmetrise on abc, in other words, add this expression to what you get by cyclically
permuting the indices a, b and c. On the LHS we have
∇_a (∇_b ∇_c − ∇_c ∇_b) α_d + cyclic perms = (∇_a ∇_b − ∇_b ∇_a) ∇_c α_d + cyclic perms. (6.6.28)
Using this rebracketing and (6.6.18), the RHS of (6.6.28) is
−R_{abc}{}^s ∇_s α_d − R_{abd}{}^s ∇_c α_s + cyclic perms = −R_{abd}{}^s ∇_c α_s + cyclic perms (6.6.29)
by the symmetry (6.6.20) of R. The RHS of (6.6.27), skew symmetrised, is
−(∇_a R_{bcd}{}^s) α_s − R_{bcd}{}^s ∇_a α_s + cyclic perms. (6.6.30)
The ∇α terms from this cancel exactly with those on the RHS of (6.6.29), leaving us with
0 = (∇_a R_{bcd}{}^s + ∇_b R_{cad}{}^s + ∇_c R_{abd}{}^s) α_s. (6.6.31)
As α_s was arbitrary, we obtain (6.6.25).

6.6.3. Alternative proof of Bianchi identity. (Cf. Woodhouse, GR, §5.7). If we choose
coordinates such that Γ = 0 at x = 0, we have, from (6.5.2),
∇_a R_{bcd}{}^e = ∂_a ∂_b Γ^e_{cd} − ∂_a ∂_c Γ^e_{bd} + terms of the form Γ∂Γ, ΓΓΓ (6.6.32)
so
∇_a R_{bcd}{}^e = ∂_a ∂_b Γ^e_{cd} − ∂_a ∂_c Γ^e_{bd} at x = 0. (6.6.33)
Summing over the cyclic permutations of a, b, c, the terms on the RHS cancel out, showing that
the Bianchi identity holds at x = 0. But the Bianchi identity is a tensor equation and the point is
arbitrary, so we have obtained the Bianchi identity in general.


6.7. Ricci and scalar curvature


Definition 6.7.1. The Ricci¹ tensor, which I shall denote by r or r_{ab}, is defined as a contraction
of the Riemann tensor:
r_{ac} = R_{abc}{}^b. (6.7.1)
It is thus a tensor of type (0, 2).
The symmetries of R imply that
R_{abc}{}^c = 0 and R_{dbc}{}^d = −r_{bc}.
Also
r_{ca} = R_{cba}{}^b = (−R_{acb}{}^b − R_{bac}{}^b) = R_{abc}{}^b = r_{ac}
so the Ricci tensor is a symmetric 2-tensor.
Definition 6.7.2. The scalar curvature, which I shall denote by s, is defined by contracting
the Ricci tensor,
s = g^{ab} r_{ab}. (6.7.2)
The scalar² curvature is, as the name implies, a scalar quantity.
Remark 6.7.3. Many texts denote the Ricci tensor by R_{ab} and the scalar curvature by R
(and call it the Ricci scalar). You have been warned.
Theorem 6.7.4. The Riemann curvature, Ricci curvature and scalar curvature are related
by the following identities
∇_a r_{bd} − ∇_b r_{ad} = −∇^e R_{abde} (6.7.3)
and
∇^b r_{ab} = \frac{1}{2} ∇_a s. (6.7.4)
Proof. Starting from the Bianchi identity (6.6.25), we multiply by g^{ce} (and sum over c and
e). Taking into account the symmetries of R, we obtain (6.7.3). Similarly, multiply this equation
by g^{bd} (and sum). This yields (6.7.4).
Remark 6.7.5. The two-dimensional hyperbolic metric has the property that its Ricci curvature
is proportional to the metric. Indeed,
R_{121}{}^2 = 1/x^2 so r_{11} = 1/x^2
because R_{111}{}^1 = 0. Similarly
R_{212}{}^1 = 1/x^2 and so r_{22} = 1/x^2.
Also
r_{12} = R_{112}{}^1 + R_{122}{}^2 = 0.
Thus r_{ij} = g_{ij} for this particular metric.
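The statement r_{ij} = g_{ij} can be checked by contracting formula (6.5.2) directly. The sketch below is an added illustration in Python, not part of the original notes; the function names christoffels, riemann and ricci are made up for the purpose. It uses the Christoffel symbols of (6.5.12), each of which is ±1/x, and exact rational arithmetic.

```python
from fractions import Fraction

N = 2  # dimension; list indices 0, 1 stand for x^1 = x, x^2 = y


def christoffels(x):
    """Gamma[d][b][c] = Gamma^d_{bc} and dGamma[a][d][b][c] = partial_a Gamma^d_{bc}
    for ds^2 = (dx^2 + dy^2)/x^2, using the values in (6.5.12)."""
    x = Fraction(x)

    def zero():
        return [[[Fraction(0)] * N for _ in range(N)] for _ in range(N)]

    G, dG_dx = zero(), zero()
    signs = {(0, 0, 0): -1, (0, 1, 1): 1, (1, 0, 1): -1, (1, 1, 0): -1}
    for (d, b, c), sign in signs.items():
        G[d][b][c] = sign / x           # each non-zero symbol is +-1/x
        dG_dx[d][b][c] = -sign / x**2   # its derivative with respect to x
    return G, [dG_dx, zero()]           # no dependence on y


def riemann(G, dG, a, b, c, d):
    """R_abc^d from formula (6.5.2)."""
    return (dG[a][d][b][c] - dG[b][d][a][c]
            + sum(G[d][a][p] * G[p][b][c] - G[d][b][p] * G[p][a][c]
                  for p in range(N)))


def ricci(x):
    """r_ac = R_abc^b, the contraction (6.7.1)."""
    G, dG = christoffels(x)
    return [[sum(riemann(G, dG, a, b, c, b) for b in range(N)) for c in range(N)]
            for a in range(N)]


x0 = Fraction(3)
r = ricci(x0)
g = [[1 / x0**2, 0], [0, 1 / x0**2]]
print(r == g)                        # the Ricci tensor equals the metric
print(x0**2 * (r[0][0] + r[1][1]))   # scalar curvature g^{ab} r_ab, which is 2
```

In the terminology introduced next, this metric is Einstein with λ = 1, and in 2 dimensions the scalar curvature is 2λ = 2.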
Definition 6.7.6. A metric such that r_{ab} = λ g_{ab} is called an Einstein metric.
Proposition 6.7.7. If g_{ab} is an Einstein metric, then the proportionality factor λ, which a
priori is an arbitrary function, must be a constant. The scalar curvature of an Einstein metric,
being 4λ (in 4 dimensions), is also automatically constant.
Proof. From the Einstein equation,
s = g^{ab} r_{ab} = λ g^{ab} g_{ab} = 4λ.
Hence r_{ab} = \frac{1}{4} s g_{ab}. Now apply (6.7.4):
\frac{1}{2} ∇_a s = ∇^b r_{ab} = \frac{1}{4} ∇^b (s g_{ab}) = \frac{1}{4} ∇_a s (6.7.5)
because ∇ commutes with g. Hence ∇_a s = 0 and s is constant, and therefore so is λ = s/4.
¹ Named after Gregorio Ricci-Curbastro, 1853–1925
² Not named after a mathematician called scalar


6.8. Relative acceleration and geodesic deviation


[Cf. Woodhouse, GR, §5.8.]
What is the meaning of curvature? We have already said something about it: non-zero
curvature at a point will manifest itself in unkillable second-order terms in the Taylor expansion
of the metric at that point, no matter what coordinates are used. And we have seen that
curvature is a tensor, so if it is zero in one coordinate system it is zero in any coordinate system
(and conversely if it is non-zero) and that the curvatures of the euclidean metric and of the Minkowski
metric are both zero.
In this section we shall relate curvature to the subtle concept of relative acceleration of
nearby freely falling particles. We take the time to set this up carefully.
What is a good way to think of a family of curves (which will later be geodesics)? It is
a map H from the rectangle (τ_0, τ_1) × (−δ, δ) into M, the idea being that for fixed σ in the
interval (−δ, δ), τ ↦ H(τ, σ) is a curve in M, and σ labels the different curves. Such a map H
is called a 1-parameter family of curves in M.
Example 6.8.1. We can view plane polar coordinates as giving us a 1-parameter family of
curves in R^2. In fact it gives us two such families. We have
H(τ, σ) = (τ cos σ, τ sin σ).
For each fixed σ, this is the straight line through the origin, inclined at angle σ to the x-axis.
So different values of σ give us different curves in our family. Switching the roles of τ and σ,
we get
\tilde H(τ, σ) = (σ cos τ, σ sin τ).
This time, fixing σ, the curve τ ↦ \tilde H(τ, σ) is a circle of radius σ. Varying σ gives curves of
different radii, all centred at the origin.
We now assume that $H$ gives a 1-parameter family of geodesics, that is
\[ \tau \mapsto H(\tau, \sigma) \text{ is a geodesic} \tag{6.8.1} \]
for each fixed $\sigma$. To be definite, imagine that the curve $H(\tau, 0)$ is Alice's worldline, and we may as well suppose that it is parameterized by proper time. Then her velocity 4-vector is
\[ X = \frac{\partial H}{\partial \tau}(\tau, 0). \]
We suppose that Bob's worldline is the nearby geodesic $\tau \mapsto H(\tau, \eta)$, where $\eta$ is very small. To first order, then, Bob's worldline is
\[ \tau \mapsto H(\tau, 0) + \frac{\partial H}{\partial \sigma}(\tau, 0)\, \eta. \]
The vector $Y = \partial H/\partial\sigma$ is called the connecting vector as it connects events on Alice's worldline to events on Bob's worldline.
Definition 6.8.2. The relative acceleration of the family of geodesics, as measured by Alice, is the vector field $\nabla_X^2 Y = \nabla_X \nabla_X Y$ along her worldline $\tau \mapsto H(\tau, 0)$, where $X = \partial H/\partial\tau$ and $Y = \partial H/\partial\sigma$ are as above.
Theorem 6.8.3. We have
\[ \nabla_X^2 Y = R(X, Y)X. \tag{6.8.2} \]
Remark 6.8.4. Equation (6.8.2) is called the geodesic deviation equation.
Proof. We note first that with $X$ and $Y$ as above,
\[ \nabla_X Y = \nabla_Y X. \tag{6.8.3} \]
From the proof of Lemma 6.2.2, the difference $\nabla_Y X - \nabla_X Y$ can be computed in local coordinates as
\[ Y^a \partial_a X^b - X^a \partial_a Y^b. \tag{6.8.4} \]
Now if we make $H$ correspond to $x^a = H^a(\tau, \sigma)$, then
\[ Y^a \partial_a X^b = \frac{\partial^2 H^b}{\partial\sigma\, \partial\tau} \tag{6.8.5} \]
and
\[ X^a \partial_a Y^b = \frac{\partial^2 H^b}{\partial\tau\, \partial\sigma}. \tag{6.8.6} \]
Hence (6.8.3) follows from the symmetry of the mixed partials of $H$.
Let $X = \partial H/\partial\tau$ be the tangent vector field of the geodesic for all fixed values of $\sigma$. Then
\[ \nabla_X X = 0 \tag{6.8.7} \]
and so
\[ \nabla_Y \nabla_X X = 0. \tag{6.8.8} \]
(This would not be true if the neighbouring curves were not geodesics.) On the other hand, by Definition 6.5.6,
\[ (\nabla_X \nabla_Y - \nabla_Y \nabla_X) Z = R(X, Y) Z \tag{6.8.9} \]
for any vector $Z$, since $[X, Y] = 0$. Putting $Z = X$ and using (6.8.8),
\[ \nabla_X \nabla_Y X = R(X, Y) X. \tag{6.8.10} \]
Using (6.8.3) once more, the left-hand side is $\nabla_X \nabla_X Y$, and we get the result. ∎

6.9. Comparison with the newtonian theory


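In Newton's theory, there is a potential function $\varphi$ such that
\[ \nabla^2 \varphi = 4\pi\rho \quad \text{(Poisson's equation)} \tag{6.9.1} \]
in units where the gravitational constant $G$ is 1, where $\rho$ is the mass density. The equation of motion of a particle in the gravitational field due to the mass density $\rho$ is
\[ \ddot{\mathbf{x}} = -\nabla\varphi. \tag{6.9.2} \]
This is an absolute statement. For comparison with relativity we need the corresponding statement about relative acceleration. Thus we consider a set-up similar to that of the previous section, where we have a 1-parameter family of trajectories $\mathbf{x}(t, s)$, each of which satisfies (6.9.2) for fixed $s$:
\[ \frac{\partial^2}{\partial t^2} \mathbf{x}(t, s) = -\nabla\varphi. \tag{6.9.3} \]
The relative acceleration is obtained by differentiating with respect to $s$:
\[ \frac{\partial^2}{\partial t^2} \frac{\partial}{\partial s} \mathbf{x}(t, s) = -\frac{\partial}{\partial s} \nabla\varphi. \tag{6.9.4} \]
If $\partial/\partial s = y^j \partial_j$, that is, if $\mathbf{y} = \partial\mathbf{x}/\partial s$ is the connecting vector, then we get
\[ \frac{\partial^2 y^j}{\partial t^2} = -y^i \partial_i \partial^j \varphi. \tag{6.9.5} \]
This is the formula for relative acceleration in Newton's theory.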
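Formula (6.9.5) can be checked numerically. The following is a minimal sketch, assuming the point-mass potential $\varphi = -m/|\mathbf{x}|$ with $G = 1$; the function names, the sample point and the step size `eps` are illustrative choices, not part of the notes. It compares a difference quotient of the accelerations of two nearby particles with $-y^i \partial_i \partial_j \varphi$.

```python
import math

def grad_phi(x, m=1.0):
    """Gradient of the newtonian potential phi = -m/|x| (units with G = 1)."""
    r = math.sqrt(sum(c * c for c in x))
    return [m * c / r**3 for c in x]

def hessian_phi(x, m=1.0):
    """Second derivatives d_i d_j phi of phi = -m/|x|."""
    r = math.sqrt(sum(c * c for c in x))
    return [[m * ((1.0 if i == j else 0.0) / r**3 - 3.0 * x[i] * x[j] / r**5)
             for j in range(3)] for i in range(3)]

def relative_acceleration_difference(x, y, eps=1e-6, m=1.0):
    """Difference quotient of the accelerations -grad(phi) at x + eps*y and at x."""
    shifted = [xi + eps * yi for xi, yi in zip(x, y)]
    a_near = [-g for g in grad_phi(shifted, m)]
    a_here = [-g for g in grad_phi(x, m)]
    return [(p - q) / eps for p, q in zip(a_near, a_here)]

def relative_acceleration_formula(x, y, m=1.0):
    """Right-hand side of (6.9.5): -y^i d_i d_j phi."""
    hess = hessian_phi(x, m)
    return [-sum(y[i] * hess[i][j] for i in range(3)) for j in range(3)]

# A radial separation is stretched, a transverse one is squeezed.
position = [10.0, 0.0, 0.0]
radial = relative_acceleration_formula(position, [1.0, 0.0, 0.0])
transverse = relative_acceleration_formula(position, [0.0, 1.0, 0.0])
```

With these sample values the radial component is positive and the transverse component is negative, which is the stretching and squeezing pattern usually described as tidal.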
In order to match it up with the GR version, suppose that $x^a$ are coordinates which are inertial at a point $x^a = 0$ and that, as in the previous section, we have a 1-parameter family $H(\tau, \sigma)$ of timelike geodesics. We assume that $H(0, 0) = 0$ so that at proper time $\tau = 0$, Alice is at the event $x = 0$. We may assume further that her 4-velocity vector at that event is standard,
\[ \frac{\partial H^a}{\partial\tau}(0, 0) = (1, \mathbf{0}). \]
With this choice, $x^0$ should be equated (approximately) with the newtonian time-variable $t$. If we assume that $Y(0, 0)$ is orthogonal to $X$, then
\[ Y(0, 0) = (0, \mathbf{y}). \tag{6.9.6} \]
This choice corresponds physically to Alice choosing to connect the event $\tau = 0$ on her worldline with the event on Bob's worldline that she judges also to happen at $\tau = 0$. Then the geodesic deviation equation (6.8.2)
\[ \nabla_X^2 Y = R(X, Y)X \quad \text{or} \quad \nabla_X^2 Y^d = R_{abc}{}^d X^a Y^b X^c \tag{6.9.7} \]
translates as follows: the LHS should be $(0, \ddot{\mathbf{y}})$ at $\tau = \sigma = 0$, while on the RHS the coefficient of $Y^b$ is
\[ R_{abc}{}^d X^a X^c = R_{0b0}{}^d. \tag{6.9.8} \]
Now by the symmetries of $R$, if we put $b = 0$ or $d = 0$ in this we get 0, in other words we can write the RHS, which is a $4 \times 4$ matrix, in block form
\[ R_{0b0}{}^d = \begin{pmatrix} 0 & 0 \\ 0 & R_{0i0}{}^j \end{pmatrix}, \]
where the $3 \times 3$ matrix $R_{0i0}{}^j$ is symmetric in $i$ and $j$ (after lowering an index). Hence (6.9.7) reduces to the 3-dimensional equation
\[ \ddot{y}^j = R_{0i0}{}^j y^i. \tag{6.9.9} \]
Comparison with (6.9.5) suggests that

The components R0i0 j of the Riemann curvature tensor should be equated, to leading
order, with −∂i ∂ j ϕ, where ϕ is the newtonian potential, as observed by an observer with
4-velocity vector with components (1, 0) in inertial coordinates.

Thus we are led to:


Hypothesis 6.9.1. If an observer in GR has velocity 4-vector $V$, and the gravitational field is weak and slowly varying, then she reckons that
\[ R_{abc}{}^d V^a V^c \]
should be identified with $-\partial_i \partial^j \varphi$, where $\varphi$ is the gravitational potential she observes.
This leads to a suggestion for the GR analogue of Poisson's equation. Actually I'll discuss this only in the case that there is no matter, i.e. $\nabla^2 \varphi = 0$. This translates into
\[ R_{0i0}{}^i = 0 \quad \text{(summation over $i$ from 1 to 3)}. \]
Since $R_{000}{}^0 = 0$ by symmetry, this is the same as
\[ r_{00} = 0 \tag{6.9.10} \]
by definition of the Ricci tensor. Hence we are led to Einstein's vacuum equations:
Hypothesis 6.9.2. In empty space, the space-time metric $g$ is such that
\[ r_{ab} = 0. \tag{6.9.11} \]
Remark 6.9.3. In going from $r_{ac} X^a X^c = 0$ for all timelike $X$ to $r_{ac} = 0$ there is something to prove. We can argue that the timelike vectors form an open set, so if $r_{ac} X^a X^c = 0$ for all timelike $X$, then differentiating twice with respect to $X$ yields $r_{ac} = 0$.
Remark 6.9.4 (Tidal forces). The relative acceleration (in either GR or newtonian gravity) often goes under the heading of 'tidal forces': for example, if you have the misfortune to be freely falling feet-first towards a black hole, then the attractive force on your feet will be stronger than on your head and this will translate into an eventually unbearable stretching effect. The bulges of water on either side of the earth due to the non-uniform gravitational field of the moon are the more classical example of tidal forces.

6.10. Weak field limit


We can get another angle on the interplay between full GR and newtonian gravity by considering the so-called weak field limit. This is the study of a lorentzian metric $g = \eta + h$ on $\mathbb{R}^4$, where $\eta$ is the Minkowski metric and $h$ is a small, slowly varying perturbation. We neglect terms quadratic and higher in $h$ and linear in $\partial_0 h$. In this section we shall compute the curvature and the geodesics in this approximation and match them up with $\ddot{\mathbf{x}} = -\nabla\varphi$. All this is a very good exercise in understanding the material presented in this chapter.


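Lemma 6.10.1. If
\[ g_{ab} = \eta_{ab} + h_{ab} \]
then
\[ g^{ab} = \eta^{ab} - h^{ab} + O(h^2) \quad \text{and} \quad \Gamma^c_{ab} = \tfrac12 \eta^{cs} (\partial_a h_{bs} + \partial_b h_{as} - \partial_s h_{ab}). \]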
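The first statement of Lemma 6.10.1 can be tested numerically. This is a small sketch under stated assumptions: indices on $h$ are raised with $\eta$ (so $h^{ab} = \eta^{ac}\eta^{bd}h_{cd}$), the signature is $(+,-,-,-)$ as elsewhere in these notes, and `sample_perturbation` is a made-up symmetric $h_{ab}$ used only for illustration. It checks that $(\eta + h)(\eta - h)$, with the second factor carrying upper indices, differs from the identity only at order $h^2$.

```python
def matmul(p, q):
    """Product of two 4x4 matrices stored as nested lists."""
    return [[sum(p[i][k] * q[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]

# Minkowski metric diag(1, -1, -1, -1); it is its own inverse.
ETA = [[1.0 if i == j == 0 else (-1.0 if i == j else 0.0) for j in range(4)]
       for i in range(4)]

def inverse_metric_first_order(h):
    """eta^{ab} - h^{ab}, where h^{ab} = eta^{ac} eta^{bd} h_{cd}."""
    h_up = matmul(matmul(ETA, h), ETA)
    return [[ETA[i][j] - h_up[i][j] for j in range(4)] for i in range(4)]

def residual(h):
    """Largest entry of g * (approximate inverse) - identity, with g = eta + h."""
    g = [[ETA[i][j] + h[i][j] for j in range(4)] for i in range(4)]
    prod = matmul(g, inverse_metric_first_order(h))
    return max(abs(prod[i][j] - (1.0 if i == j else 0.0))
               for i in range(4) for j in range(4))

def sample_perturbation(eps):
    """Hypothetical symmetric h_{ab} with entries of size at most eps."""
    return [[eps * (1.0 if i == j else 0.5) for j in range(4)] for i in range(4)]
```

Shrinking `eps` by a factor of 10 should shrink `residual(sample_perturbation(eps))` by a factor of about 100, which is the $O(h^2)$ behaviour claimed in the lemma.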
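Knowing the Γs we can compute the slow geodesics:
Lemma 6.10.2. If $x^a(\tau)$ is a slow geodesic, we may assume $\tau = x^0$ so that the velocity vector is
\[ \dot{x}^a = (1, \dot{\mathbf{x}}) \tag{6.10.1} \]
Then the geodesic equations are
\[ \ddot{x}^c + \tfrac12 \eta^{cs} (2\partial_0 h_{0s} - \partial_s h_{00}) = 0, \quad \text{i.e.} \quad \ddot{x}^c - \tfrac12 \eta^{cs} \partial_s h_{00} = 0, \tag{6.10.2} \]
neglecting $\partial_0$-derivatives of $h$. Thus within our approximation
\[ \ddot{x}^0 = 0, \qquad \ddot{x}^j = \tfrac12 \eta^{js} \partial_s h_{00}. \tag{6.10.3} \]
This already suggests, since $\ddot{\mathbf{x}} = -\nabla\varphi$ and $\eta^{js} = -\delta^{js}$, that we should identify
\[ \varphi \simeq \tfrac12 h_{00} \quad \text{up to an additive constant.} \tag{6.10.4} \]
NB this is in a particular inertial frame of reference.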
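Lemma 6.10.3. In our approximation
\[ R_{abc}{}^d = \tfrac12 \left( \partial_a \partial_c h^d{}_b + \partial_b \partial^d h_{ac} - \partial_b \partial_c h^d{}_a - \partial_a \partial^d h_{bc} \right) \tag{6.10.5} \]
In particular,
\[ r_{ac} = \tfrac12 \left( \partial_a \partial_c h^b{}_b + \partial_b \partial^b h_{ac} - \partial_b \partial_c h^b{}_a - \partial_a \partial^b h_{bc} \right). \tag{6.10.6} \]
If we consider the 00 component of this and neglect the $\partial_0 h_{ab}$ terms, then
\[ r_{00} = \tfrac12 \partial_b \partial^b h_{00} = -\tfrac12 \nabla^2 h_{00}. \tag{6.10.7} \]
This leads, once again, to the proposal that the newtonian empty space postulate ($\nabla^2 \varphi = 0$) should translate into $r_{ab} = 0$, since we've already seen that $h_{00}/2$ should be identified (up to a constant) with the newtonian potential.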

6.11. Physical differential equations


In SR, the electromagnetic field is described by a skew 2-tensor $F^{ab}$. In an inertial frame, the components are
\[ F^{ab} = \begin{pmatrix} 0 & -E_1 & -E_2 & -E_3 \\ E_1 & 0 & -B_3 & B_2 \\ E_2 & B_3 & 0 & -B_1 \\ E_3 & -B_2 & B_1 & 0 \end{pmatrix} \tag{6.11.1} \]
It can be verified that Maxwell's equations in vacuum are equivalent to the system
\[ \partial_a F^{ab} = 0, \qquad \partial_a F_{bc} + \partial_b F_{ca} + \partial_c F_{ab} = 0. \tag{6.11.2} \]
There is a natural generalization of this to a general space-time $(M, g)$. We assume that the electromagnetic field is given by a skew tensor of type (2, 0), $F$, with components $F^{ab}$, on $M$, satisfying the equations
\[ \nabla_a F^{ab} = 0, \qquad \nabla_a F_{bc} + \nabla_b F_{ca} + \nabla_c F_{ab} = 0. \tag{6.11.3} \]
Similarly, a natural replacement for the wave operator $\partial^a \partial_a$ in Minkowski space is the operator $\Box = \nabla^a \nabla_a$ on $(M, g)$. A function $u$ is said to satisfy the wave equation on $M$ if $\Box u = 0$. In local coordinates,
\[ \Box u = \nabla^a \nabla_a u = \nabla^a \partial_a u = g^{ab} \nabla_a \partial_b u = g^{ab} \partial_a \partial_b u - g^{ab} \Gamma^c_{ab} \partial_c u \tag{6.11.4} \]


In Minkowski space, recall that Maxwell's equations imply that each of the components $E_i$, $B_i$ of the electric and magnetic fields satisfies the flat-space wave equation, $\Box E_i = 0 = \Box B_i$. In general $M$, we have:
Proposition 6.11.1. If $F_{ab} = -F_{ba}$ satisfies Maxwell's equations (6.11.3) in $(M, g)$, then $F_{ab}$ satisfies a modified wave equation:
\[ \nabla^a \nabla_a F_{bc} = R_{bcas} F^{sa} - r_b{}^a F_{ca} - r_c{}^a F_{ab} \tag{6.11.5} \]
Proof. See the Problem set. ∎


CHAPTER 7

The Schwarzschild metric and black holes

In this Chapter we shall give a first example of a non-trivial solution of Einstein’s vacuum
equations, rab = 0. This is the Schwarzschild metric. It is spherically symmetric and is the GR
version of the newtonian potential −m/r due to a mass m at the origin.
We shall study the timelike and null geodesics in the Schwarzschild metric. To a first
approximation, there are timelike geodesics which correspond to closed elliptical orbits, as in
Newton’s theory. When GR corrections are taken into account, it will be seen that the orbits
do not close up: this is the famous precession of the perihelion1 which is measurable in the case
of the orbit of Mercury. This was one of the first observational verifications of GR.
Null geodesics represent the paths taken by light-rays, and we shall see that there is a
bending effect. This effect was observed by Eddington during the solar eclipse of 1919 (and is
responsible for gravitational lensing). This effect provides another observational verification for
GR.
The Schwarzschild metric is defined initially in the set $r > 2m$, in units with $G = c = 1$. However, it turns out that the apparent breakdown at $r = 2m$ is a defect of the coordinates rather than of the metric itself. After a change of coordinates the metric continues through this surface, which is now interpreted as the event horizon of a black hole.

7.1. Spherically symmetric, static metrics


A function on R3 is spherically symmetric if it is a function only of r, the distance from
the origin. It is natural to study such things using spherical polars. We have seen that the flat
metric in R3 in spherical polars is

ds2 = dr2 + r2 (dθ2 + sin2 θdϕ2 ), (7.1.1)

where θ is the colatitude (i.e. latitude, but measured from the north pole rather than the
equator) and ϕ is longitude.
The Minkowski metric in these coordinates is

ds2η = dt2 − dr2 − r2 (dθ2 + sin2 θdϕ2 ). (7.1.2)

Let us write
dω 2 = dθ2 + sin2 θ dϕ2 (7.1.3)
which is the round metric on the unit sphere x2 + y 2 + z 2 = 1 in R3 . This will save writing later.
A spherically symmetric, static2 metric is obtained from this by introducing functions of r
as coefficients
ds2 = A(r)dt2 − B(r)dr2 − C(r)r2 dω 2 . (7.1.4)
where we require A > 0, B > 0 and C > 0 in the region of interest. The dependence of these
functions only on r encodes the spherical symmetry of the metric and also its time-invariance.
One can, of course, consider more general metric forms, but that is beyond the scope of this
course.

1closest point to the sun


2i.e. with coefficients independent of t


7.2. Schwarzschild
The Schwarzschild metric3 is a solution of Einstein's vacuum equations which describes, in idealized form, the field due to a point mass, or the field outside a star. It is the general-relativistic analogue of the newtonian potential $-m/r$, which describes the gravitational field due to a point-mass $m$ in Newton's theory of gravitation.
We shall not derive this in full. We start with the form (7.1.4). This should approach the
Minkowski metric when r is large, so we assume
A(r) ∼ 1, B(r) ∼ 1, C(r) ∼ 1 as r → ∞. (7.2.1)
Proposition 7.2.1. By a change of $r$ variable, $\rho = f(r)$, (7.1.4) can be made to take the form
\[ ds^2 = \alpha(\rho) dt^2 - \beta(\rho) d\rho^2 - \rho^2 d\omega^2. \tag{7.2.2} \]
Proof. If we define $\rho = \sqrt{C(r)}\, r$, then we shall get the coefficient of $d\omega^2$ correct. Since $C > 0$ this is certainly invertible for large enough $r$, so we define
\[ \alpha(\rho) = A(r(\rho)). \]
For the $dr$ term,
\[ B(r) dr^2 = B(r) \left( \frac{dr}{d\rho} \right)^2 d\rho^2 \]
so
\[ \beta(\rho) = B(r(\rho)) \left( \frac{dr}{d\rho} \right)^2. \]
This completes the proof. ∎

We use this proposition, then rechristen ρ as r. So we may as well look at metrics in the
slightly simpler form
A(r)dt2 − B(r)dr2 − r2 dω 2 (7.2.3)
We saw in the previous chapter that in the weak field limit, the component $g_{00}$ should be matched with twice the newtonian potential computed by an observer with 4-velocity $(1, \mathbf{0})$, up to an additive constant. Since the newtonian potential of a point mass is $-m/r$, the simplest possible guess for $A(r)$ is $1 - 2m/r$: the value 1 comes from the required asymptotic form of the metric.
It turns out that there is a choice of B which then gives a metric which satisfies Einstein’s
equations where the metric is defined:
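Theorem 7.2.2. The Schwarzschild metric
\[ ds^2 = (1 - 2m/r) dt^2 - (1 - 2m/r)^{-1} dr^2 - r^2 d\omega^2, \quad (r > 2m) \tag{7.2.4} \]
satisfies $r_{ab} = 0$.
We shall not prove this in full. You can see most of the details in Woodhouse.
We shall, however, record the geodesic equations and the Γs for this metric.
Proposition 7.2.3. The geodesic equations for the metric (7.2.3) are
\[ \begin{aligned}
\ddot{t} + \frac{A'}{A} \dot{t}\dot{r} &= 0 \quad \text{or} \quad \frac{d}{d\tau}\left( A\dot{t} \right) = 0, \\
\ddot{r} + \frac{A'}{2B} \dot{t}^2 + \frac{B'}{2B} \dot{r}^2 - \frac{r}{B} \dot\theta^2 - \frac{r}{B} \sin^2\theta\, \dot\varphi^2 &= 0, \\
\ddot\theta + \frac{2}{r} \dot\theta\dot{r} - \sin\theta\cos\theta\, \dot\varphi^2 &= 0, \\
\ddot\varphi + \frac{2}{r} \dot{r}\dot\varphi + 2\cot\theta\, \dot\theta\dot\varphi &= 0 \quad \text{or} \quad \frac{d}{d\tau}\left( r^2 \sin^2\theta\, \dot\varphi \right) = 0.
\end{aligned} \]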
3Named in honour of Karl Schwarzschild, 1873–1916


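Proof. We know the drill by now...
The Lagrangian for the geodesics is
\[ L = \tfrac12 \left( A\dot{t}^2 - B\dot{r}^2 - r^2 (\dot\theta^2 + \sin^2\theta\, \dot\varphi^2) \right). \tag{7.2.5} \]
Hence
\[ \begin{aligned}
\frac{\partial L}{\partial \dot{t}} &= A\dot{t}, & \frac{\partial L}{\partial t} &= 0, \\
\frac{\partial L}{\partial \dot{r}} &= -B\dot{r}, & \frac{\partial L}{\partial r} &= \tfrac12 \left( A'\dot{t}^2 - B'\dot{r}^2 - 2r(\dot\theta^2 + \sin^2\theta\, \dot\varphi^2) \right), \\
\frac{\partial L}{\partial \dot\theta} &= -r^2 \dot\theta, & \frac{\partial L}{\partial \theta} &= -r^2 \sin\theta\cos\theta\, \dot\varphi^2, \\
\frac{\partial L}{\partial \dot\varphi} &= -r^2 \sin^2\theta\, \dot\varphi, & \frac{\partial L}{\partial \varphi} &= 0.
\end{aligned} \]
Here dot is differentiation with respect to $\tau$, prime is differentiation with respect to $r$. Next,
\[ \begin{aligned}
\frac{d}{d\tau} (A\dot{t}) &= A \left( \ddot{t} + (A'/A) \dot{r}\dot{t} \right), \\
\frac{d}{d\tau} (-B\dot{r}) &= -B \left( \ddot{r} + (B'/B) \dot{r}^2 \right), \\
\frac{d}{d\tau} (-r^2 \dot\theta) &= -r^2 \left( \ddot\theta + (2/r) \dot{r}\dot\theta \right), \\
\frac{d}{d\tau} (-r^2 \sin^2\theta\, \dot\varphi) &= -r^2 \sin^2\theta \left( \ddot\varphi + (2/r) \dot{r}\dot\varphi + 2\cot\theta\, \dot\theta\dot\varphi \right),
\end{aligned} \]
and combining these with the previous calculations we get the equations of the Proposition. ∎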
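Proposition 7.2.4. The non-zero Christoffel symbols for the metric (7.2.3) are as follows:
\[ \begin{aligned}
\Gamma^0_{01} &= \Gamma^0_{10} = \frac{A'}{2A}; \\
\Gamma^1_{00} &= \frac{A'}{2B}, \quad \Gamma^1_{11} = \frac{B'}{2B}, \quad \Gamma^1_{22} = -\frac{r}{B}, \quad \Gamma^1_{33} = -\frac{r \sin^2\theta}{B}; \\
\Gamma^2_{12} &= \Gamma^2_{21} = \frac{1}{r}, \quad \Gamma^2_{33} = -\sin\theta\cos\theta; \\
\Gamma^3_{13} &= \Gamma^3_{31} = \frac{1}{r}, \quad \Gamma^3_{23} = \Gamma^3_{32} = \cot\theta.
\end{aligned} \]
Proof. These are read off from the geodesic equations in the usual way. ∎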
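7.2.1. Curvature computations. Recall:
\[ R_{abc}{}^d = \partial_a \Gamma^d_{bc} - \partial_b \Gamma^d_{ac} + \Gamma^d_{ap} \Gamma^p_{bc} - \Gamma^d_{bp} \Gamma^p_{ac} \]
and
\[ r_{ac} = R_{abc}{}^b = R_{a0c}{}^0 + R_{a1c}{}^1 + R_{a2c}{}^2 + R_{a3c}{}^3. \tag{7.2.6} \]
To give a flavour of these calculations, let us compute $R_{a0c}{}^0$. We shall show:
\[ R_{101}{}^0 = \frac{A''}{2A} - \frac{(A')^2}{4A^2} - \frac{A'B'}{4AB} \tag{7.2.7} \]
and
\[ R_{a0c}{}^0 = 0 \text{ for all other } ac. \tag{7.2.8} \]
We shall also compute $R_{121}{}^2$ and $R_{131}{}^3$ and obtain
\[ r_{11} = \frac{A''}{2A} - \frac{(A')^2}{4A^2} - \frac{A'B'}{4AB} - \frac{B'}{rB} \tag{7.2.9} \]
by putting $a = c = 1$ in (7.2.6).
We have
\[ R_{a0c}{}^0 = \partial_a \Gamma^0_{0c} - \partial_0 \Gamma^0_{ac} + \Gamma^0_{as} \Gamma^s_{0c} - \Gamma^0_{0s} \Gamma^s_{ac}. \]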

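We may as well assume $a \le c$ because this is symmetric in $ac$.
Now $\partial_0 = \partial_t$ and none of the Γ depends upon $t$. Also,
\[ \Gamma^0_{01} = \Gamma^0_{10} = A'/2A, \qquad \Gamma^0_{ab} = 0 \text{ for all other } ab. \]
So
\[ R_{a0c}{}^0 = \partial_a \Gamma^0_{0c} + \Gamma^0_{as} \Gamma^s_{0c} - \frac{A'}{2A} \Gamma^1_{ac}. \]
We know $R_{00c}{}^d = 0$, so start with $a = 1$:
\[ R_{10c}{}^0 = \partial_1 \Gamma^0_{0c} + \Gamma^0_{1s} \Gamma^s_{0c} - \frac{A'}{2A} \Gamma^1_{1c}. \]
If $c = 2, 3$ every term is zero, because the only non-vanishing $\Gamma^1_{1c}$ is
\[ \Gamma^1_{11} = \frac{B'}{2B}. \]
So
\[ \begin{aligned}
R_{101}{}^0 &= \partial_1 \Gamma^0_{01} + \Gamma^0_{1s} \Gamma^s_{01} - \frac{A'}{2A} \Gamma^1_{11} \\
&= \left( \frac{A'}{2A} \right)' + (\Gamma^0_{10})^2 - \frac{A'B'}{4AB} \\
&= \frac{A''}{2A} - \frac{(A')^2}{4A^2} - \frac{A'B'}{4AB}
\end{aligned} \]
and
\[ R_{a0c}{}^0 = 0 \text{ for all other } ac. \]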
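We have
\[ R_{121}{}^2 = \partial_1 \Gamma^2_{21} - \partial_2 \Gamma^2_{11} + \Gamma^2_{1s} \Gamma^s_{21} - \Gamma^2_{2s} \Gamma^s_{11}. \]
Looking at the non-vanishing Γs, this comes down to
\[ \begin{aligned}
R_{121}{}^2 &= -\frac{1}{r^2} + \Gamma^2_{12} \Gamma^2_{21} - \Gamma^2_{21} \Gamma^1_{11} \\
&= -\frac{1}{r^2} + \frac{1}{r^2} - \frac{B'}{2Br} \\
&= -\frac{B'}{2Br}.
\end{aligned} \]
Similarly
\[ \begin{aligned}
R_{131}{}^3 &= \partial_1 \Gamma^3_{31} - \partial_3 \Gamma^3_{11} + \Gamma^3_{1s} \Gamma^s_{31} - \Gamma^3_{3s} \Gamma^s_{11} \\
&= -\frac{1}{r^2} + \Gamma^3_{13} \Gamma^3_{13} - \Gamma^3_{31} \Gamma^1_{11} \\
&= -\frac{B'}{2rB}
\end{aligned} \]
because $\Gamma^3_{13} = 1/r$ and $\Gamma^1_{11} = B'/2B$.
Hence
\[ \begin{aligned}
r_{11} &= R_{101}{}^0 + R_{111}{}^1 + R_{121}{}^2 + R_{131}{}^3 \\
&= R_{101}{}^0 + R_{121}{}^2 + R_{131}{}^3 \\
&= \frac{A''}{2A} - \frac{(A')^2}{4A^2} - \frac{A'B'}{4AB} - \frac{B'}{rB}.
\end{aligned} \]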
Similar computations yield


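Proposition 7.2.5. The non-vanishing components of the Ricci tensor of the spherically symmetric metric (7.2.3) are
\[ \begin{aligned}
r_{00} &= -\frac{A''}{2B} + \frac{A'B'}{4B^2} + \frac{(A')^2}{4AB} - \frac{A'}{rB}, \\
r_{11} &= \frac{A''}{2A} - \frac{(A')^2}{4A^2} - \frac{A'B'}{4AB} - \frac{B'}{rB}, \\
r_{22} &= \frac{rA'}{2AB} - \frac{rB'}{2B^2} + \frac{1}{B} - 1, \\
r_{33} &= \sin^2\theta\, r_{22}.
\end{aligned} \]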
Now we can verify that the Schwarzschild metric of Theorem 7.2.2 does indeed satisfy the Einstein vacuum equations $r_{ab} = 0$.
Eliminating the $A''$ terms between $r_{00}$ and $r_{11}$, that is, forming $B r_{00} + A r_{11} = -(AB' + BA')/(rB)$, gives $AB' + BA' = 0$, so $AB$ is constant. And this should be 1 by the boundary condition.
Inserting $B = 1/A$, $B' = -A'/A^2$,
\[ \begin{aligned}
r_{00} &= -\tfrac12 AA'' - \tfrac{1}{r} AA', \\
r_{11} &= -\frac{1}{A^2} r_{00}, \\
r_{22} &= rA' + A - 1, \\
r_{33} &= \sin^2\theta\, (rA' + A - 1).
\end{aligned} \]
Solving $r_{22} = 0$, i.e. $(rA)' = 1$, gives $A = 1 - 2m/r$, where $m$ is a constant, and one checks that this also solves the $r_{00} = 0$ equation. Hence we arrive at the Schwarzschild metric
\[ ds^2 = \left( 1 - \frac{2m}{r} \right) dt^2 - \left( 1 - \frac{2m}{r} \right)^{-1} dr^2 - r^2 (d\theta^2 + \sin^2\theta\, d\varphi^2), \]
where $m > 0$ and, for the moment anyway, $r > 2m$.
[Cf. Woodhouse, GR, §7.1–7.2.]
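The vanishing of the Ricci components can also be spot-checked numerically. The sketch below simply evaluates the three formulas of Proposition 7.2.5 with $A = 1 - 2m/r$ and $B = 1/A$, using derivatives computed by hand; the function names and sample radii are illustrative assumptions, and the "flat $B$" comparison is only there to show that the check is not vacuous.

```python
def ricci_components(r, A, dA, d2A, B, dB):
    """r_00, r_11, r_22 of Proposition 7.2.5 from A, A', A'', B, B' at radius r."""
    r00 = -d2A / (2 * B) + dA * dB / (4 * B**2) + dA**2 / (4 * A * B) - dA / (r * B)
    r11 = d2A / (2 * A) - dA**2 / (4 * A**2) - dA * dB / (4 * A * B) - dB / (r * B)
    r22 = r * dA / (2 * A * B) - r * dB / (2 * B**2) + 1 / B - 1
    return r00, r11, r22

def schwarzschild_coefficients(r, m):
    """A = 1 - 2m/r and B = 1/A together with the derivatives needed above."""
    A = 1 - 2 * m / r
    dA = 2 * m / r**2
    d2A = -4 * m / r**3
    B = 1 / A
    dB = -dA / A**2
    return A, dA, d2A, B, dB

def ricci_schwarzschild(r, m=1.0):
    """Ricci components of the Schwarzschild metric at radius r > 2m."""
    return ricci_components(r, *schwarzschild_coefficients(r, m))

def ricci_with_flat_B(r, m=1.0):
    """Same A but B = 1: a non-solution, used only for comparison."""
    A, dA, d2A, _, _ = schwarzschild_coefficients(r, m)
    return ricci_components(r, A, dA, d2A, 1.0, 0.0)
```

Calling `ricci_schwarzschild(r)` for any `r > 2m` should return three values that vanish up to rounding, whereas `ricci_with_flat_B(r)` does not.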

7.3. Physical consequences


7.3.1. Gravitational time dilatation: ‘heavy clocks run slowly’. Suppose that Alice
and Bob have positions r = rA and r = rB in the Schwarzschild space-time (angular positions
also fixed) with both rA and rB > 2m. How do they compare the rates at which their ideal
clocks run?
Imagine two ‘ticks’ of Alice’s ideal clock, separated by a small proper time interval δτA . To
compare, we assume that Alice’s clock emits photons at each of the two ticks. Bob receives
these photons and in particular can record the elapsed time between receiving the first and
second photon. This gives him a time interval δτB , and the ratio δτA /δτB is the amount by
which Alice’s clock appears to run slowly as compared with Bob’s.
Let’s do it. Suppose first that δtA is the difference in t-coordinate between the two ticks of
Alice’s clock and suppose that δtB is the difference in t-coordinates of when the two photons are
received by Bob. (NB the Schwarzschild time-coordinate is NOT proper time for either Alice or
Bob!) Then it is pretty clear that δtA = δtB because the metric coefficients are all independent
of t. We shall make this computation explicitly below, just to be sure. Thus we need to see how
δtA is related to δτA and similarly for tB and τB .
Now Alice's world line has the simple form $\tau_A \mapsto (U\tau_A, r_A, \theta_A, \varphi_A)$, where $r_A$, $\theta_A$ and $\varphi_A$ are constants, and $\tau_A$ is a proper time parameter if the associated velocity 4-vector
\[ U \frac{\partial}{\partial t} \tag{7.3.1} \]
is of unit length, i.e. $g(U\partial_t, U\partial_t) = 1$. This entails
\[ (1 - 2m/r_A) U^2 = 1, \quad \text{so} \quad U = \frac{dt}{d\tau_A} = (1 - 2m/r_A)^{-1/2}. \tag{7.3.2} \]


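The corresponding equation holds with A replaced by B, so, using $\delta t_A = \delta t_B$,
\[ \delta\tau_B = \frac{d\tau_B}{dt} \delta t_B = \frac{d\tau_B}{dt} \left( \frac{d\tau_A}{dt} \right)^{-1} \delta\tau_A \tag{7.3.3} \]
Hence
\[ \frac{\delta\tau_B}{\delta\tau_A} = \sqrt{\frac{1 - 2m/r_B}{1 - 2m/r_A}} \tag{7.3.4} \]
Thus if Alice is nearer to $r = 2m$ than Bob, then this factor is greater than one, and so Bob will record a longer elapsed time between two ticks of Alice's clock, this becoming (in principle) infinite as $r_A$ approaches $2m$ (see footnote 4).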
Remark 7.3.1. Note that observers with constant (r, θ, ϕ) coordinates in Schwarzschild are
not freely falling.
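To get a feel for the size of the factor in (7.3.4), here is a minimal sketch that just evaluates it. The function name and the sample radii are illustrative assumptions; units are $G = c = 1$ as in the text.

```python
import math

def tick_ratio(r_alice, r_bob, m=1.0):
    """delta tau_B / delta tau_A from (7.3.4) for static observers outside r = 2m."""
    if min(r_alice, r_bob) <= 2.0 * m:
        raise ValueError("both observers must sit at r > 2m")
    return math.sqrt((1.0 - 2.0 * m / r_bob) / (1.0 - 2.0 * m / r_alice))

# Alice close to the horizon, Bob far away: the ratio grows as r_alice -> 2m.
ratios = [tick_ratio(r_alice, 100.0) for r_alice in (10.0, 3.0, 2.1, 2.001)]
```

The list `ratios` is increasing, which illustrates the statement that the elapsed time recorded by Bob grows without bound as Alice's position approaches $r = 2m$.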
It is interesting to compute the trajectory of a photon sent by Alice at (tA , rA , θ0 , ϕ0 ) to
Bob at (tB , rB , θ0 , ϕ0 ). This is a radial5 null geodesic for the Schwarzschild metric.

Figure 1. Diagram showing trajectories of photons between Alice and Bob in the Schwarzschild metric. The curved blue trajectories are radial null geodesics, the three solid vertical lines are the event horizon $r = 2m$ and the worldlines $r = r_A$ and $r = r_B$ of Alice and Bob.

Such a null geodesic has a parameterization
\[ \tau \mapsto (t(\tau), r(\tau), \theta_0, \varphi_0) \]
where $\theta_0$ and $\varphi_0$ are constants. Being null means
\[ \left( 1 - \frac{2m}{r} \right) \left( \frac{dt}{d\tau} \right)^2 - \left( 1 - \frac{2m}{r} \right)^{-1} \left( \frac{dr}{d\tau} \right)^2 = 0 \tag{7.3.5} \]
so
\[ \frac{dr}{dt} = \pm (1 - 2m/r) \tag{7.3.6} \]
(plus sign if the photon is travelling outwards as $t$ increases).
Hence, for an outgoing photon,
\[ dt = \frac{r\, dr}{r - 2m} \tag{7.3.7} \]
4We shall later identify the surface r = 2m with the event horizon of a black hole
5i.e. θ and ϕ are constant along the geodesic


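and
\[ t_B - t_A = r_B - r_A + 2m \log \frac{r_B - 2m}{r_A - 2m} \tag{7.3.8} \]
for a photon travelling from
\[ (t_A, r_A, \theta_0, \varphi_0) \text{ to } (t_B, r_B, \theta_0, \varphi_0). \tag{7.3.9} \]
Since the right-hand side of (7.3.8) depends only on $r_A$ and $r_B$, two photons emitted a coordinate time $\delta t_A$ apart arrive a coordinate time $\delta t_B = \delta t_A$ apart, as previously argued.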
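The closed form (7.3.8) can be compared with a direct numerical integration of (7.3.7). The following sketch uses composite Simpson's rule; the function names, the sample radii and the number of subintervals are illustrative choices rather than anything from the notes.

```python
import math

def travel_time_closed_form(r_a, r_b, m=1.0):
    """Coordinate time t_B - t_A of an outgoing radial photon, formula (7.3.8)."""
    return r_b - r_a + 2.0 * m * math.log((r_b - 2.0 * m) / (r_a - 2.0 * m))

def travel_time_quadrature(r_a, r_b, m=1.0, n=2000):
    """Composite Simpson's rule for the integral of r/(r - 2m) dr from r_a to r_b."""
    if n % 2:
        n += 1  # Simpson's rule needs an even number of subintervals
    h = (r_b - r_a) / n
    integrand = lambda r: r / (r - 2.0 * m)
    total = integrand(r_a) + integrand(r_b)
    for k in range(1, n):
        total += (4.0 if k % 2 else 2.0) * integrand(r_a + k * h)
    return total * h / 3.0
```

Both functions depend only on the two radii, not on the emission time, which is the numerical counterpart of the statement $\delta t_A = \delta t_B$.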
7.3.2. Geodesics in Schwarzschild. The Lagrangian $L$ for geodesics in Schwarzschild is
\[ L = \tfrac12 \left( (1 - 2m/r) \dot{t}^2 - (1 - 2m/r)^{-1} \dot{r}^2 - r^2 (\dot\theta^2 + \sin^2\theta\, \dot\varphi^2) \right). \]
We take the parameter to be proper time $\tau$ and the dot to mean differentiation with respect to $\tau$.
We have conserved quantities
\[ E = (1 - 2m/r) \dot{t}, \qquad J = r^2 \sin^2\theta\, \dot\varphi \tag{7.3.10} \]
and
\[ L = 1/2 \text{ for timelike}, \quad L = 0 \text{ for null geodesics}. \tag{7.3.11} \]
Remark 7.3.2. The conserved quantity $E$ is the total energy of our particle, assumed to have unit rest-mass. (Not the gravitating one, the one that's orbiting.) If the rest-mass of the orbiting particle is $\mu$, we claim that
\[ E = \mu (1 - 2m/r) \dot{t} \tag{7.3.12} \]
is the total energy. Remember that total energy is a relative concept in relativity, so this statement needs careful interpretation. Suppose Alice is an observer sitting at constant $(r, \theta, \varphi)$ in the Schwarzschild space-time. She measures the energy of the orbiting particle as it passes her, i.e. when its spatial coordinates are $(r, \theta, \varphi)$. Let Alice's 4-velocity vector be $U$, and that of the orbiting particle $V$. Then
\[ U = (U^a) = (1 - 2m/r)^{-1/2} (1, 0, 0, 0) \quad \text{and} \quad V = (V^a) = (\dot{t}, \dot{r}, \dot\theta, \dot\varphi) \]
where the dot denotes differentiation with respect to the particle's proper time parameter. The instantaneous speed $v$ of the particle as measured by Alice as it passes satisfies
\[ \gamma(v) = g(U, V) \]
just as in special relativity. In this case,
\[ g(U, V) = (1 - 2m/r)^{1/2}\, \dot{t}. \]
Hence
\[ E = \mu (1 - 2m/r) \dot{t} = \mu (1 - 2m/r)^{1/2} g(U, V) = \mu (1 - 2m/r)^{1/2} (1 - v^2)^{-1/2} \simeq \mu + \tfrac12 \mu v^2 - \mu m/r \tag{7.3.13} \]
if $m/r$ is small and so is $v$. Recall that $G = 1$, $c = 1$ here; if we restore units, then this becomes
\[ E \simeq \mu c^2 + \tfrac12 \mu v^2 - \frac{G m \mu}{r} \tag{7.3.14} \]
The terms here are the rest-energy of the particle, its kinetic energy and its gravitational potential energy. So this approximation is in perfect agreement with newtonian gravity and special relativity.
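The quality of the approximation in (7.3.13) is easy to probe numerically. This is a small sketch in units $G = c = 1$; the function names and the sample values of $m/r$ and $v$ are assumptions chosen for illustration.

```python
import math

def energy_exact(mu, m, r, v):
    """mu (1 - 2m/r)^(1/2) (1 - v^2)^(-1/2), the exact expression in (7.3.13)."""
    return mu * math.sqrt(1.0 - 2.0 * m / r) / math.sqrt(1.0 - v * v)

def energy_newtonian(mu, m, r, v):
    """Rest energy + kinetic energy + potential energy: mu + mu v^2/2 - mu m/r."""
    return mu + 0.5 * mu * v * v - mu * m / r

# Weak field and slow motion: m/r = 1e-4 and v = 1e-2 (in units of c).
gap = abs(energy_exact(1.0, 1.0, 1.0e4, 1.0e-2)
          - energy_newtonian(1.0, 1.0, 1.0e4, 1.0e-2))
```

For these values `gap` is of second order in the small quantities $m/r$ and $v^2$, far smaller than either the kinetic or the potential term on its own.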
Proposition 7.3.3. Equatorial timelike geodesics in Schwarzschild (i.e. those with $\theta = \pi/2$) are given by the equations
\[ \left( \frac{dr}{d\tau} \right)^2 = E^2 - (1 - 2m/r) \quad \text{(radial geodesics)} \tag{7.3.15} \]
and by
\[ \frac{d^2 u}{d\varphi^2} + u - 3mu^2 = \frac{m}{J^2} \quad \text{(non-radial geodesics)}, \tag{7.3.16} \]
where $u = 1/r$ and the angular momentum $J = r^2 \dot\varphi \neq 0$ is a constant. The equation (7.3.16) has the first integral
\[ \left( \frac{du}{d\varphi} \right)^2 + u^2 - 2mu^3 = \frac{E^2 - 1}{J^2} + \frac{2m}{J^2} u. \tag{7.3.17} \]

Similarly,
Proposition 7.3.4. Radial null geodesics in Schwarzschild are given by (7.3.5–7.3.9). Non-radial, equatorial null geodesics in Schwarzschild satisfy
\[ \frac{d^2 u}{d\varphi^2} + u - 3mu^2 = 0 \tag{7.3.18} \]
which has the first integral
\[ \left( \frac{du}{d\varphi} \right)^2 + u^2 - 2mu^3 = \frac{E^2}{J^2}. \tag{7.3.19} \]

Proof. We have seen that for Schwarzschild geodesics with $\theta = \pi/2$ we have the conserved quantities
\[ E = (1 - 2m/r) \dot{t}, \qquad J = r^2 \dot\varphi \tag{7.3.20} \]
and the further conservation equation
\[ (1 - 2m/r) \dot{t}^2 - (1 - 2m/r)^{-1} \dot{r}^2 - r^2 \dot\varphi^2 = 2L, \tag{7.3.21} \]
where $L = 1/2$ for timelike and $L = 0$ for null geodesics. In the radial case, $\dot\varphi = 0$ and for timelike geodesics, (7.3.21) gives
\[ E^2 - \left( \frac{dr}{d\tau} \right)^2 = 1 - \frac{2m}{r}, \tag{7.3.22} \]
from which (7.3.15) follows at once.
In the non-radial case, we follow the same moves that led to newtonian orbits (see problem set 1). We set $u = 1/r$. Then
\[ (1 - 2mu)^{-1} E^2 - (1 - 2mu)^{-1} \dot{r}^2 - r^2 \dot\varphi^2 = 2L \tag{7.3.23} \]
Divide through by $J^2 = r^4 \dot\varphi^2$:
\[ (1 - 2mu)^{-1} \frac{E^2}{J^2} - (1 - 2mu)^{-1} u^4 \left( \frac{dr}{d\varphi} \right)^2 - u^2 = \frac{2L}{J^2} \tag{7.3.24} \]
Now
\[ \frac{dr}{d\varphi} = -\frac{1}{u^2} \frac{du}{d\varphi} \tag{7.3.25} \]
and substituting this in to (7.3.24) gives
\[ \frac{1}{1 - 2mu} \frac{E^2}{J^2} - \frac{1}{1 - 2mu} \left( \frac{du}{d\varphi} \right)^2 - u^2 = \frac{2L}{J^2} \tag{7.3.26} \]
Rearranging this yields
\[ \left( \frac{du}{d\varphi} \right)^2 + u^2 - 2mu^3 = \frac{E^2 - 2L}{J^2} + \frac{4mL}{J^2} u \tag{7.3.27} \]
The results in the Propositions follow from this by setting $L = 1/2$ for timelike and $L = 0$ for null. The second-order equation follows by differentiation with respect to $\varphi$, and cancelling $du/d\varphi$. ∎
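The relation between the second-order equations (7.3.16), (7.3.18) and their first integrals (7.3.17), (7.3.19) can be illustrated by integrating the orbit equation numerically and watching the first integral stay constant. The sketch below uses the classical fourth-order Runge-Kutta method; the function names, step counts and initial data ($m = 1$, $J = 4$, $u = 0.05$, $du/d\varphi = 0$) are illustrative assumptions.

```python
def acceleration(u, m, J, timelike=True):
    """d^2u/dphi^2 from (7.3.16) for timelike or (7.3.18) for null geodesics."""
    source = m / J**2 if timelike else 0.0
    return source - u + 3.0 * m * u * u

def integrate_orbit(u, du, m, J, phi_max, steps, timelike=True):
    """Fourth-order Runge-Kutta for the pair (u, du/dphi) from phi = 0 to phi_max."""
    h = phi_max / steps
    for _ in range(steps):
        k1u, k1v = du, acceleration(u, m, J, timelike)
        k2u, k2v = du + 0.5 * h * k1v, acceleration(u + 0.5 * h * k1u, m, J, timelike)
        k3u, k3v = du + 0.5 * h * k2v, acceleration(u + 0.5 * h * k2u, m, J, timelike)
        k4u, k4v = du + h * k3v, acceleration(u + h * k3u, m, J, timelike)
        u += h * (k1u + 2.0 * k2u + 2.0 * k3u + k4u) / 6.0
        du += h * (k1v + 2.0 * k2v + 2.0 * k3v + k4v) / 6.0
    return u, du

def first_integral(u, du, m, J, timelike=True):
    """(du/dphi)^2 + u^2 - 2 m u^3 - (2m/J^2) u, the conserved combination.

    For timelike geodesics this equals (E^2 - 1)/J^2; in the null case the
    linear term is absent and the value is E^2/J^2.
    """
    linear = 2.0 * m / J**2 if timelike else 0.0
    return du * du + u * u - 2.0 * m * u**3 - linear * u

start = first_integral(0.05, 0.0, 1.0, 4.0)
u_end, du_end = integrate_orbit(0.05, 0.0, 1.0, 4.0, phi_max=20.0, steps=20000)
drift = abs(first_integral(u_end, du_end, 1.0, 4.0) - start)
```

The quantity `drift` measures how far the first integral has moved after about three revolutions; it should be at the level of the integration error only.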


Remark 7.3.5. For newtonian gravity, the equation for orbits (see homework problem 1.10) is
\[ \frac{d^2 u}{d\varphi^2} + u = \frac{m}{J^2} \tag{7.3.28} \]
with first integral
\[ \left( \frac{du}{d\varphi} \right)^2 + u^2 = \frac{A}{J^2} + \frac{2m}{J^2} u \tag{7.3.29} \]
where $A$ is a constant. Thus the GR correction to this equation is the $-2mu^3$ term on the LHS of (7.3.27).
Remember that $u = 1/r$, so large $r$ corresponds to small $u$ and the effect of the cubic correction term is stronger for small radii. Remember also that the Schwarzschild metric appears only to be OK for $r > 2m$, which corresponds to $0 < u < 1/2m$.
7.3.3. Circular timelike orbits and the precession of perihelion. You can find an
extensive analysis of the timelike geodesics in Schwarzschild in Woodhouse’s book, Chapter 8.
We shall just look at circular orbits and small perturbations of them. This already leads to the
precession of perihelion, which I may have mentioned was one of the first verifications of GR.
Consider a circular timelike orbit in Schwarzschild. For such an orbit, evidently $u_{\varphi\varphi} = 0$, $u_\varphi = 0$. Setting $u_{\varphi\varphi} = 0$ in (7.3.16) gives the equation
\[ 3mu^2 - u + m/J^2 = 0 \tag{7.3.30} \]
so solving the quadratic,
\[ u = \frac{1 \pm \sqrt{1 - 12m^2/J^2}}{6m} \tag{7.3.31} \]
Thus we have circular orbits if $J^2 > 12m^2$. For $m$ small, the larger value of $u$ is approximately $1/3m$ and is just less than this value. The smaller value of $u$ is
\[ u = u_0 = \frac{1 - \sqrt{1 - 12m^2/J^2}}{6m} \simeq \frac{m}{J^2} \tag{7.3.32} \]
if $m^2/J^2$ is small, and this is the newtonian value of the reciprocal radius of a circular orbit for given $m$ and $J$ (cf. (7.3.28)).
We consider a small perturbation of this orbit, $u(\varphi) = u_0 + v(\varphi)$ where $v$ is supposed to be very small. Inserting into the equation of motion (7.3.16) gives
\[ \frac{d^2 v}{d\varphi^2} + v - 6mu_0 v = O(v^2) \]
Neglecting the quadratic term, $v$ satisfies simple harmonic motion as a function of $\varphi$, but with period $2\pi/\sqrt{1 - 6mu_0}$. Thus the solution has the shape
\[ u(\varphi) = u_0 + \varepsilon \cos\left( \sqrt{1 - 6mu_0}\, \varphi + \varphi_0 \right) \tag{7.3.33} \]
The perihelion is the point on an orbit closest to the sun. This is the largest value of $u$ (remembering the reciprocal relationship $u = 1/r$). To maximize $u$, and taking $\varphi_0 = 0$ for simplicity, we need the cosine to equal 1 and so the perihelia occur at
\[ \varphi = 0, \ \frac{2\pi}{\sqrt{1 - 6mu_0}}, \ \frac{4\pi}{\sqrt{1 - 6mu_0}}, \ \ldots \]
The gap between these angles is $> 2\pi$, which means that the perihelion advances in successive orbits: see Figure 2.
If we restore the units, within this approximation we get a perihelion advance per orbit of
\[ \frac{2\pi}{\sqrt{1 - 6mu_0}} - 2\pi \simeq 6\pi m u_0 = \frac{6\pi G m}{r_0 c^2} \]
which for Mercury comes out to be approximately 40″ per century. In fact, Mercury's perihelion advances even without GR, the perturbation being due to the gravitational interaction with the other planets. However, the observed value was different from all calculations based on newtonian gravity. The correction due to GR accounts precisely for this anomaly.
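The figure quoted for Mercury can be reproduced from the formula $6\pi G m/(r_0 c^2)$. The numerical inputs below are not in the notes: they are standard approximate values (the sun's gravitational parameter $Gm$, the speed of light, Mercury's mean orbital radius and period) supplied here as assumptions, and the orbit is treated as circular in line with the approximation above.

```python
import math

# Assumed physical inputs (approximate standard values, not taken from the notes).
GM_SUN = 1.32712440018e20      # G * (solar mass), in m^3 s^-2
SPEED_OF_LIGHT = 299792458.0   # in m s^-1
R0_MERCURY = 5.791e10          # mean orbital radius of Mercury, in m
PERIOD_DAYS = 87.969           # orbital period of Mercury, in days

def advance_per_orbit(gm, r0, c=SPEED_OF_LIGHT):
    """Perihelion advance per orbit in radians: 6 pi G m / (r0 c^2)."""
    return 6.0 * math.pi * gm / (r0 * c * c)

def advance_arcsec_per_century(gm, r0, period_days):
    """Accumulated advance over a Julian century (36525 days), in arcseconds."""
    orbits = 36525.0 / period_days
    radians = advance_per_orbit(gm, r0) * orbits
    return math.degrees(radians) * 3600.0

mercury = advance_arcsec_per_century(GM_SUN, R0_MERCURY, PERIOD_DAYS)
```

With these inputs `mercury` comes out at roughly forty arcseconds per century, consistent with the value quoted in the text.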


Figure 2. Plot of $r = 1/(0.1 + 0.03 \cos(0.95\varphi))$, showing the precession of the perihelion. The successive perihelia (labelled first to fourth in the plot) occur at $\varphi = 0, 2\pi/0.95, 4\pi/0.95, 6\pi/0.95, \ldots$. The red arc is the part of the orbit from $\varphi = 0$ to $\varphi = 2\pi/0.95$; the blue arc is the part of the orbit from $\varphi = 2\pi/0.95$ to $4\pi/0.95$; the grey arc is the part of the orbit from $\varphi = 4\pi/0.95$ to $6\pi/0.95$.

7.3.4. Photon trajectories: gravitational bending of light. From Proposition 7.3.4, non-radial photon trajectories satisfy:
\[ \frac{d^2 u}{d\varphi^2} + u - 3mu^2 = 0. \tag{7.3.34} \]
We note first that circular orbits exist if $u - 3mu^2 = 0$, i.e. $u = 1/3m$. The existence of circular photon orbits shows clearly that light is affected by gravity in Einstein's theory.
These orbits are unstable. Indeed, try $u = \frac{1}{3m} + v$. Then
\[ \frac{d^2 v}{d\varphi^2} = v + O(v^2) \]
and $v \sim e^{\pm\varphi}$, so these perturbed solutions tend to grow exponentially.
Let us consider instead the trajectory of a photon which comes in from infinity (in a straight line with respect to the asymptotic coordinate system) and passes near to our gravitating object.
Recall that in polar coordinates, straight lines not through the origin are given by equations of the form
\[ r \cos(\varphi - \varphi_0) = C. \tag{7.3.35} \]
Indeed, remembering $x = r\cos\varphi$, $y = r\sin\varphi$, (7.3.35) is equivalent to
\[ r\cos\varphi\cos\varphi_0 + r\sin\varphi\sin\varphi_0 = C \quad \text{i.e.} \quad x\cos\varphi_0 + y\sin\varphi_0 = C. \tag{7.3.36} \]
Thus this straight line is inclined at an angle $\varphi_0$ to the y-axis. In terms of the reciprocal coordinate $u = 1/r$, (7.3.35) takes the form
\[ u = \alpha \cos(\varphi - \varphi_0), \quad \alpha = 1/C. \tag{7.3.37} \]
Let us seek an approximate solution of (7.3.34) which corresponds to a photon approaching the sun from infinity along a line parallel to the y-axis. Thus we seek a solution
\[ u(\varphi) = \alpha\cos\varphi + v(\varphi) \tag{7.3.38} \]

where $v(\varphi)$ is small and
\[ v(-\pi/2) = 0 \tag{7.3.39} \]
so that as $\varphi \to -\pi/2$, $u(\varphi) \to 0$ and $r \to \infty$. (This corresponds to the trajectory being asymptotically parallel to the y-axis, with $y \sim -\infty$.)

Figure 3. Bending of light by a star: plots of $u = \alpha\cos\varphi + \alpha^2 (1 + \sin\varphi)^2$ for $\alpha = 0.08, 0.06, 0.05$.
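This is now an exercise in differential equations. It is better to work with the first integral (7.3.19)
\[ \left( \frac{du}{d\varphi} \right)^2 + u^2 - 2mu^3 = \frac{E^2}{J^2}. \]
We have to follow our noses and compute:
\[ u' = -\alpha\sin\varphi + v'. \tag{7.3.40} \]
Substituting this into (7.3.19), assuming that $v$ is of order $\alpha^2$, so that
\[ (\alpha\cos\varphi + v(\varphi))^3 = \alpha^3 \cos^3\varphi + O(\alpha^4) \tag{7.3.41} \]
we obtain, after some algebra,
\[ \alpha^2 - 2\alpha\sin\varphi\, v' + 2\alpha\cos\varphi\, v - 2m\alpha^3 \cos^3\varphi = E^2/J^2 + O(\alpha^4). \tag{7.3.42} \]
Hence
\[ \alpha^2 = E^2/J^2 \tag{7.3.43} \]
and
\[ v' \sin\varphi - v\cos\varphi = -m\alpha^2 \cos^3\varphi \tag{7.3.44} \]
We use the integrating factor method to solve this. The integrating factor is $1/\sin^2\varphi$, so
\[ \frac{d}{d\varphi} \left( \frac{v}{\sin\varphi} \right) = -m\alpha^2 \frac{\cos^3\varphi}{\sin^2\varphi} = -m\alpha^2 \left( \frac{\cos\varphi}{\sin^2\varphi} - \cos\varphi \right) \tag{7.3.45} \]
Hence
\[ v(\varphi) = C\sin\varphi + m\alpha^2 (1 + \sin^2\varphi). \]
Applying the boundary condition $v(-\pi/2) = 0$, we get $C = 2m\alpha^2$ and
\[ v(\varphi) = m\alpha^2 (1 + \sin\varphi)^2. \]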

We conclude that
u(ϕ) = α cos ϕ + mα2 (1 + sin ϕ)2 . (7.3.46)
is an approximate null geodesic in Schwarzschild for α small.
We calculate the angle of deflection by looking for the values of ϕ for which u = 0. We
already know one of them: ϕ = −π/2. We expect the other one to be approximately ϕ = π/2
(since this would correspond to zero deflection). So try to solve u(π/2 − λ) = 0, assuming λ is
small. Substituting this into (7.3.46),
u(π/2 − λ) = 0 ⇔ α sin λ + mα2 (1 + cos λ)2 = 0 (7.3.47)
and putting sin λ ≃ λ cos λ ≃ 1,
λ ≃ −4αm. (7.3.48)
So the asymptotic direction of the light-ray is approximately π/2 + 4mα, showing that the
photon has been deflected by an angle 4mα due to the gravitational pull of the star.
Again, putting in the units, the deflection is approximately $4mG/(Dc^2)$, where D is the ‘impact parameter’ (the smallest value of r on the trajectory, so that α ≃ 1/D to leading order).
This was observed by Eddington during the 1919 total eclipse of the sun.
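To put a number on this, here is a minimal sketch that evaluates 4mG/(Dc²) for a light ray grazing the sun. The rounded values of G, c, the solar mass and the solar radius are assumptions (standard reference values, not taken from these notes), as is the choice of the solar radius as impact parameter.

```python
import math

G = 6.674e-11        # gravitational constant, m^3 kg^-1 s^-2 (rounded, assumed)
C = 2.998e8          # speed of light, m/s (rounded, assumed)
M_SUN = 1.989e30     # solar mass, kg (rounded, assumed)
R_SUN = 6.957e8      # solar radius, m (rounded, assumed)

def deflection_angle(mass_kg, impact_parameter_m):
    """Leading-order light deflection 4 G m / (D c^2), in radians."""
    return 4.0 * G * mass_kg / (impact_parameter_m * C**2)

def to_arcseconds(angle_rad):
    """Convert radians to seconds of arc."""
    return math.degrees(angle_rad) * 3600.0

if __name__ == "__main__":
    print(to_arcseconds(deflection_angle(M_SUN, R_SUN)))
```

With these inputs the result is about 1.75 seconds of arc.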

7.4. Extensions of Schwarzschild: introduction to black holes


We are going to consider the significance of the set r = 2m in Schwarzschild. This is a 3-
dimensional surface inside the 4-dimensional space-time. If we fix t we have a two-dimensional
sphere of radius 2m, so the surface as a whole looks like a cylinder of some kind.
We do have to worry about r = 2m, because a particle in free fall along a radial timelike
geodesic will reach this set in finite proper time. Indeed, recall that for such a geodesic,
\[ \frac{dr}{d\tau} = -\sqrt{E^2 - 1 + \frac{2m}{r}} \tag{7.4.1} \]
for some constant E > 1. If τ = 0 when r = R, this gives
\[ \tau(r) = \int_r^R \frac{ds}{\sqrt{E^2 - 1 + 2m/s}}. \tag{7.4.2} \]

So the particle reaches r = 2m at proper time τ (2m) < ∞. This means that the problem of
understanding the metric in the vicinity of r = 2m is of real physical relevance.
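The finiteness of τ(2m) can be illustrated numerically. The sketch below evaluates the integral (7.4.2) from s = 2m to s = R with composite Simpson's rule; the function name, the number of subintervals and the sample values E = 1.2, m = 1, R = 10 are assumptions.

```python
import math

def proper_time_to_horizon(E, m, R, n=2000):
    """Approximate tau(2m): the integral of ds / sqrt(E^2 - 1 + 2m/s) over [2m, R]."""
    if n % 2:
        raise ValueError("n must be even for Simpson's rule")

    def integrand(s):
        return 1.0 / math.sqrt(E * E - 1.0 + 2.0 * m / s)

    a, b = 2.0 * m, R
    h = (b - a) / n
    total = integrand(a) + integrand(b)
    for k in range(1, n):
        total += (4.0 if k % 2 else 2.0) * integrand(a + k * h)
    return total * h / 3.0

if __name__ == "__main__":
    print(proper_time_to_horizon(E=1.2, m=1.0, R=10.0))
```

The integrand is bounded on [2m, R] (it equals 1/E at s = 2m), so the integral lies between (R − 2m)/E and (R − 2m)/√(E² − 1 + 2m/R).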
7.4.1. Toy examples.
Example 7.4.1. The metric
\[ ds_1^2 = \frac{dt^2}{2t} + 2t\,d\theta^2, \quad (0 < t < \infty) \tag{7.4.3} \]
is singular at t = 0. However, if we define
\[ dr = \frac{dt}{\sqrt{2t}}, \quad\text{so}\quad r = \sqrt{2t}, \]
the metric becomes
\[ ds_1^2 = dr^2 + r^2 d\theta^2 = dx^2 + dy^2 \]
(x = r cos θ, y = r sin θ). The origin (x, y) = (0, 0) corresponds to the singularity t = 0 of (7.4.3): despite appearances, the metric is perfectly good there.
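One way to see that the two descriptions agree is to measure the length of the same curve in both coordinate systems. The following sketch does this numerically for a curve s ↦ (t(s), θ(s)), 0 ≤ s ≤ 1; the function names, the sample curve and the number of steps are assumptions made for illustration.

```python
import math

def length_t_theta(t_of, theta_of, n=20000):
    """Length of s -> (t(s), theta(s)), 0 <= s <= 1, in the metric dt^2/(2t) + 2t dtheta^2."""
    total = 0.0
    for k in range(n):
        s0, s1 = k / n, (k + 1) / n
        t_mid = t_of(0.5 * (s0 + s1))          # metric coefficients at the midpoint
        dt = t_of(s1) - t_of(s0)
        dth = theta_of(s1) - theta_of(s0)
        total += math.sqrt(dt * dt / (2.0 * t_mid) + 2.0 * t_mid * dth * dth)
    return total

def length_xy(t_of, theta_of, n=20000):
    """Euclidean length of the same curve after x = sqrt(2t) cos(theta), y = sqrt(2t) sin(theta)."""
    def point(s):
        r = math.sqrt(2.0 * t_of(s))
        return r * math.cos(theta_of(s)), r * math.sin(theta_of(s))

    total = 0.0
    x0, y0 = point(0.0)
    for k in range(1, n + 1):
        x1, y1 = point(k / n)
        total += math.hypot(x1 - x0, y1 - y0)
        x0, y0 = x1, y1
    return total

if __name__ == "__main__":
    t_of = lambda s: 0.5 + s        # sample curve (assumed)
    theta_of = lambda s: 2.0 * s
    print(length_t_theta(t_of, theta_of), length_xy(t_of, theta_of))
```

The two numbers agree up to discretisation error, as they must if (7.4.3) is the flat metric in disguise.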
Example 7.4.2. Consider the metric
\[ ds_2^2 = e^{2u}du^2 + e^{2v}dv^2, \quad -\infty < u, v < \infty. \tag{7.4.4} \]
There might appear to be nothing to say about this, but if we put
\[ x = e^u, \quad y = e^v, \]
mapping the uv-plane to the positive quadrant of the xy-plane, then
\[ ds_2^2 = dx^2 + dy^2, \quad 0 < x, y < \infty, \]


and the metric extends (as the flat metric) to the whole xy plane.
[Figure: the map (u, v) ↦ (x, y) from the uv-plane to the positive quadrant of the xy-plane.]

Definition 7.4.3. In cases where a metric is ill-defined at a point, but after a change of
coordinates it becomes well defined, we say that we have a coordinate singularity.
A coordinate singularity is not a true geometric singularity: it just corresponds to looking
at the metric in a poor choice of coordinates.
We shall now see that the surface r = 2m is a coordinate singularity rather than a metric
singularity of Schwarzschild by exhibiting new coordinates in which the singularity disappears!

7.4.2. Eddington–Finkelstein coordinates. Define


\[ r^* = r + 2m\log|r - 2m| \tag{7.4.5} \]
so
\[ \frac{dr^*}{dr} = 1 + \frac{2m}{r - 2m} = \frac{r}{r - 2m} = \left(1 - \frac{2m}{r}\right)^{-1}. \tag{7.4.6} \]
The graph of r∗ as a function of r is shown in the diagram:
[Figure: graph of r∗ against r, with the line r = 2m marked.]

Define new variables


v = t + r∗ , w = t − r∗ (7.4.7)
Proposition 7.4.4. Changing variables from (t, r) to (v, r), Schwarzschild becomes
\[ ds^2 = (1 - 2m/r)\,dv^2 - dr\,dv - dv\,dr - r^2 d\omega^2. \]
Note that this is a non-degenerate Lorentzian metric (signature + − −−) even when r = 2m.
Proof. Since v = t + r∗,
\[ dv = dt + dr^* = dt + (1 - 2m/r)^{-1}dr, \tag{7.4.8} \]
we have
\[ (1 - 2m/r)\,dt^2 = (1 - 2m/r)\left(dv - (1 - 2m/r)^{-1}dr\right)^2 \tag{7.4.9} \]
\[ = (1 - 2m/r)\,dv^2 - dr\,dv - dv\,dr + (1 - 2m/r)^{-1}dr^2 \tag{7.4.10} \]
so
\[ (1 - 2m/r)\,dt^2 - (1 - 2m/r)^{-1}dr^2 = (1 - 2m/r)\,dv^2 - dr\,dv - dv\,dr. \tag{7.4.11} \]
The metric
\[ (1 - 2m/r)\,dv^2 - dr\,dv - dv\,dr \]


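has matrix form
\[ \begin{pmatrix} dv & dr \end{pmatrix} \begin{pmatrix} 1 - \frac{2m}{r} & -1 \\ -1 & 0 \end{pmatrix} \begin{pmatrix} dv \\ dr \end{pmatrix} \]
showing clearly that it is non-singular near r = 2m: the determinant of the matrix is −1 for every r > 0. □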
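The non-degeneracy claim is easy to check by hand, but a short sketch makes the contrast with the original coordinates explicit. The helper names below are hypothetical; the code simply builds the 2×2 (v, r) block of the metric and evaluates its determinant on both sides of, and exactly at, r = 2m.

```python
def ef_block(r, m):
    """2x2 matrix of (1 - 2m/r) dv^2 - dr dv - dv dr in the basis (dv, dr)."""
    f = 1.0 - 2.0 * m / r
    return [[f, -1.0], [-1.0, 0.0]]

def det2(a):
    """Determinant of a 2x2 matrix given as nested lists."""
    return a[0][0] * a[1][1] - a[0][1] * a[1][0]

if __name__ == "__main__":
    m = 1.0
    for r in (0.5, 2.0, 3.0):
        # The determinant is -1 regardless of r, including at r = 2m.
        print(r, det2(ef_block(r, m)))
```

A negative determinant means the 2×2 block has one positive and one negative eigenvalue, which together with the −r² dω² part is consistent with signature (+ − − −).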

Let us discuss what’s happened here. To avoid later confusion, we shall rechristen the r
coordinate here ρ. Thus (suppressing θ and ϕ) we have two coordinate systems: the original
(t, r) and the new (v, ρ) with
v = t + r + 2m log |r − 2m|, ρ = r. (7.4.12)
By the laws for change of vector fields,
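\[ \frac{\partial}{\partial t} = \frac{\partial v}{\partial t}\frac{\partial}{\partial v} + \frac{\partial \rho}{\partial t}\frac{\partial}{\partial \rho} = \frac{\partial}{\partial v} \tag{7.4.13} \]
and⁷
\[ \frac{\partial}{\partial r} = \frac{\partial v}{\partial r}\frac{\partial}{\partial v} + \frac{\partial \rho}{\partial r}\frac{\partial}{\partial \rho} = (1 - 2m/r)^{-1}\frac{\partial}{\partial v} + \frac{\partial}{\partial \rho}. \tag{7.4.14} \]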
The following picture may help to visualize what’s going on:

Figure 4. The change to Eddington–Finkelstein coordinates. The hypersurface ρ = 2m is shown, as are four pairs of future-pointing null vectors. One of the null vectors is always pointing inwards (toward r = ρ = 0). The other is pointing outward for r > 2m, is tangent to r = 2m for a point on this hypersurface and then points inward for r < 2m.

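Future-pointing radial null vectors in the original coordinates are (positive multiples of)
\[ \frac{\partial}{\partial t} \pm (1 - 2m/r)\frac{\partial}{\partial r}. \tag{7.4.15} \]
Using (7.4.13) and (7.4.14), in the new coordinates these become
\[ \frac{\partial}{\partial v} \pm \left(\frac{\partial}{\partial v} + (1 - 2m/r)\frac{\partial}{\partial \rho}\right). \tag{7.4.16} \]
Hence a pair of radial future-pointing null vectors in the new coordinates is:
\[ 2\partial_v + (1 - 2m/r)\,\partial_\rho, \qquad -\partial_\rho. \tag{7.4.17} \]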
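The following sketch checks that the two vectors in (7.4.17) are indeed null for the metric of Proposition 7.4.4, and shows how the ρ-component of the first one changes sign across r = 2m. The function names are hypothetical and the angular coordinates are suppressed, as in the text.

```python
def ef_inner(X, Y, r, m):
    """Inner product of X = (X^v, X^rho) and Y in the metric (1 - 2m/r) dv^2 - dr dv - dv dr."""
    f = 1.0 - 2.0 * m / r
    return f * X[0] * Y[0] - X[0] * Y[1] - X[1] * Y[0]

def radial_null_pair(r, m):
    """The vectors 2 d/dv + (1 - 2m/r) d/drho and -d/drho as (v, rho) components."""
    f = 1.0 - 2.0 * m / r
    return (2.0, f), (0.0, -1.0)

if __name__ == "__main__":
    m = 1.0
    for r in (3.0, 2.0, 1.0):
        outgoing, ingoing = radial_null_pair(r, m)
        print(r,
              ef_inner(outgoing, outgoing, r, m),   # vanishes: the vector is null
              ef_inner(ingoing, ingoing, r, m),     # vanishes: the vector is null
              outgoing[1])                          # rho-component: > 0, = 0, < 0
```

The sign of the last column matches the description in the caption of Figure 4: outward for r > 2m, tangent to the horizon at r = 2m, inward for r < 2m.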
To summarize, a change of variables has enabled us to extend the Schwarzschild metric through the hypersurface r = 2m. When so extended, the metric is also defined for 0 < r < 2m, but light emitted from any point in this region can never escape (by (7.4.17), both future-pointing radial null directions point towards decreasing r there). For this reason r = 2m is called the event horizon.
⁷ This equation would be confusing if we had not renamed r as ρ!


Remark 7.4.5. The radius r = 2m is called the Schwarzschild radius. For our sun this is
about 3 kilometres, well inside the sun itself. In particular the Schwarzschild metric is not valid
there, because matter is present in this region. The above discussion is only of significance if
matter is so highly concentrated that the region r = 2m is contained in a region of empty space.
The region r < 2m, to which we have now extended the Schwarzschild metric, is then called the
black hole region of the space-time.
7.4.3. What happens near the event horizon? Suppose Alice and Bob are near a
black hole described by the Schwarzschild metric. Alice is sitting at r = R and the unfortunate
Bob⁸ falls through r = 2m. How can we analyze this?
If Bob is freely falling, radially, so θ̇ = ϕ̇ = 0, the Lagrangian in Eddington–Finkelstein coordinates is
\[ L = \tfrac{1}{2}\left((1 - 2m/r)\,\dot v^2 - 2\dot v\dot r\right). \tag{7.4.18} \]
Here L is independent of v and so
\[ \frac{\partial L}{\partial \dot v} = (1 - 2m/r)\,\dot v - \dot r \tag{7.4.19} \]
is a constant, F, say. Also L = 1/2 along a timelike geodesic parameterized by proper time. Suppose that τ = 0 when r = 2m. For small τ, these equations reduce to
\[ -\dot r = F, \qquad -\dot v\dot r = 1/2. \tag{7.4.20} \]
So near the event horizon r = 2m, Bob’s world line is
\[ v \simeq \tau/(2F), \qquad r \simeq 2m - F\tau, \qquad \text{for small } \tau. \tag{7.4.21} \]
Thus Bob doesn’t notice anything particularly strange in crossing the event horizon, though in reality, for a typical-sized black hole, the tidal forces (the difference between the forces felt on your head and your feet) become strong well before you encounter the event horizon.
What does Alice see of Bob’s descent? To answer this question, suppose she is sitting at r = R > 2m, and receives a photon from Bob at every tick of his clock. Then Alice’s world-line is $\tau_A \mapsto (V\tau_A, R)$ in (v, r) coordinates, her velocity 4-vector is (V, 0), and so $\tau_A$ is proper time if this vector has length² equal to 1 with respect to our metric. This gives $V = (1 - 2m/R)^{-1/2}$ (cf. §7.3.1). So
\[ v(\tau_A) = (1 - 2m/R)^{-1/2}\,\tau_A, \qquad r(\tau_A) = R \qquad \text{along Alice's worldline.} \tag{7.4.22} \]
A photon emitted by Bob’s clock at proper time $\tau_B < 0$ and heading out to Alice will satisfy
\[ (1 - 2m/r)\,\dot v^2 - 2\dot r\dot v = 0 \tag{7.4.23} \]
with initial conditions
\[ v(0) = \frac{\tau_B}{2F}, \qquad r(0) = 2m - F\tau_B. \tag{7.4.24} \]
Dividing by $\dot r\dot v$, (7.4.23) gives
\[ \frac{dv}{dr} = \frac{2r}{r - 2m} \tag{7.4.25} \]
and so, integrating,
\[ v(r) = 2\left(r + 2m\log(r - 2m)\right) + c \tag{7.4.26} \]
for some integration constant c. Inserting the initial conditions, we get
\[ v(r) = 2\left[r - 2m + 2m\log\frac{r - 2m}{F|\tau_B|} + \left(F + \frac{1}{4F}\right)\tau_B\right]. \tag{7.4.27} \]
The photon emitted by Bob at his proper time $\tau_B$ is received by Alice when r = R, that is, at
\[ v(R) = 2\left[R - 2m + 2m\log\frac{R - 2m}{F|\tau_B|} + \left(F + \frac{1}{4F}\right)\tau_B\right], \tag{7.4.28} \]
and by (7.4.22) the corresponding value of Alice’s proper time is
\[ \tau_A = 2\sqrt{1 - \frac{2m}{R}}\left[R - 2m + 2m\log\frac{R - 2m}{F|\tau_B|} + \left(F + \frac{1}{4F}\right)\tau_B\right]. \tag{7.4.29} \]
⁸ Perhaps Bob’s a robot.


In particular, $\tau_A \to +\infty$ as $\tau_B \to 0$ (from below):
\[ \tau_A \sim 4m\sqrt{1 - \frac{2m}{R}}\,\log\frac{1}{|\tau_B|} \qquad \text{as } \tau_B \to 0. \]
Alice sees Bob ‘frozen’ at the point at which he crosses r = 2m: she sees his clock run more
and more slowly, and never sees it reach τB = 0.
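The logarithmic divergence can be tabulated directly from (7.4.29). In the sketch below the function names and the sample values m = 1, R = 10, F = 1 are assumptions; the second function is just the leading term 4m√(1 − 2m/R) log(1/|τ_B|).

```python
import math

def alice_reception_time(tau_B, m, R, F):
    """Alice's proper time tau_A at which she receives the photon Bob emits at tau_B < 0, from (7.4.29)."""
    if not tau_B < 0.0:
        raise ValueError("tau_B must be negative (Bob has not yet reached r = 2m)")
    bracket = (R - 2.0 * m
               + 2.0 * m * math.log((R - 2.0 * m) / (F * abs(tau_B)))
               + (F + 1.0 / (4.0 * F)) * tau_B)
    return 2.0 * math.sqrt(1.0 - 2.0 * m / R) * bracket

def leading_term(tau_B, m, R):
    """Asymptotic form 4m sqrt(1 - 2m/R) log(1/|tau_B|) as tau_B -> 0."""
    return 4.0 * m * math.sqrt(1.0 - 2.0 * m / R) * math.log(1.0 / abs(tau_B))

if __name__ == "__main__":
    m, R, F = 1.0, 10.0, 1.0     # sample values (assumed)
    for exponent in (1, 3, 10, 100, 300):
        tau_B = -10.0 ** (-exponent)
        print(tau_B, alice_reception_time(tau_B, m, R, F), leading_term(tau_B, m, R))
```

Equal ticks of Bob's clock just before τ_B = 0 are spread over ever longer intervals of Alice's time, and the ratio of the two columns tends (slowly) to 1.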

7.4.4. The full extension of Schwarzschild: Kruskal coordinates. To understand the structure of the extended Schwarzschild space-time more fully, pass from (t, r) to (v, w),
where
v = t + r + 2m log(r − 2m), w = t − r − 2m log(r − 2m). (7.4.30)
(This is more symmetrical than changing to (v, r) coordinates as we did in the previous section.)
A simple calculation shows
\[ \tfrac{1}{2}(dv\,dw + dw\,dv) = dt^2 - (dr^*)^2 = dt^2 - (1 - 2m/r)^{-2}dr^2. \tag{7.4.31} \]
So the Schwarzschild metric, in these coordinates, has the form
\[ ds^2 = (1 - 2m/r)\,\tfrac{1}{2}(dv\,dw + dw\,dv) - r^2 d\omega^2, \qquad -\infty < v, w < \infty, \tag{7.4.32} \]
where r is defined implicitly by v and w through
\[ \tfrac{1}{2}(v - w) = r + 2m\log(r - 2m), \qquad r > 2m. \tag{7.4.33} \]
Note that as r goes from 2m to ∞, the RHS of (7.4.33) increases monotonically from −∞ to ∞ (its derivative is r/(r − 2m) > 0, by (7.4.6)), and so given any value of (v − w)/2, there’s a unique value r > 2m which solves (7.4.33). This metric degenerates at r = 2m, which corresponds to v − w → −∞. There is a remarkable trick, analogous to what happened in Example 7.4.2 above, which allows us to extend this metric through r = 2m.
We set
\[ v = 4m\log v', \qquad w = -4m\log(-w') \]
or
\[ v' = \exp(v/4m), \qquad w' = -\exp(-w/4m). \]
This maps the whole (v, w) plane to the quadrant {v′ > 0, w′ < 0} in the (v′, w′) plane (compare Example 7.4.2). Note that for r just a bit bigger than 2m, with v at some finite value, w → +∞, so v′ has some positive value and w′ is just less than zero.
Then
\[ dv\,dw = -16m^2\,\frac{dv'\,dw'}{v'w'}, \tag{7.4.34} \]
\[ v'w' = -\exp((v - w)/4m) \tag{7.4.35} \]
\[ = -\exp(r/2m + \log(r - 2m)) = -e^{r/2m}(r - 2m). \tag{7.4.36} \]
(Here we write dv dw for (dv dw + dw dv)/2 for brevity.)
Hence, since $(1 - 2m/r)\cdot(-16m^2)/(v'w') = (16m^2/r)\,e^{-r/2m}$ by (7.4.36):
Proposition 7.4.6. The exterior region r > 2m of the Schwarzschild metric corresponds to the region v′ > 0, w′ < 0 with the metric
\[ g^* = \frac{16m^2}{r}\,e^{-r/2m}\,dv'\,dw' - r^2 d\omega^2 \]
where r is defined implicitly as a function of (v′, w′) by
\[ e^{r/2m}(r - 2m) = -v'w'. \tag{7.4.37} \]


Remark 7.4.7. This extension was obtained in 1960 by Kruskal. The crucial point is that g∗ is a well-defined Lorentzian metric wherever r > 0. From (7.4.37)
\[ r > 0 \iff e^{r/2m}(r - 2m) > -2m \iff -v'w' > -2m. \]
Thus the metric is defined in the set v′w′ < 2m, but the coordinates (v′, w′) can have either sign.

Remark 7.4.8. If you prefer something looking more obviously Lorentzian, define
\[ t' = v' + w', \qquad x' = v' - w', \qquad\text{so}\qquad (dt')^2 - (dx')^2 = 4\,dv'\,dw' \]
so
\[ g^* = \frac{4m^2}{r}\,e^{-r/2m}\left[(dt')^2 - (dx')^2\right] - r^2 d\omega^2. \]
The following picture, the Kruskal diagram, should help.
[Figure: Kruskal diagram. Axes: t′, x′, v′, w′. Labelled: Region I, Region I′, Region II, Region II′; the two lines r = 2m; a curve r = const > 2m; the singularity (r = 0) at the top and at the bottom.]

Note:
• Radial null geodesics are given by
v′ = τ, with w′, θ, ϕ constant,
and
w′ = τ, with v′, θ, ϕ constant
(i.e. they are always at 45° in the Kruskal diagram).
• The event horizon r = 2m is given by v′w′ = 0: the two dashed lines.
• The singularity at r = 0 is given by v′w′ = 2m: the two thick hyperbolae at top and bottom of the picture.
• Region I is the domain of the original Schwarzschild metric, r > 2m.
• Region II is the region into which the Eddington–Finkelstein coordinates extend the metric: it is the region interior to the event horizon r = 2m, i.e. the black hole region. The boundary between regions I and II is the positive v′-axis.
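The implicit relation (7.4.37) can be inverted numerically, since e^{r/2m}(r − 2m) increases monotonically from −2m at r = 0. The sketch below does this by bisection and also labels the four regions by the signs of v′ and w′. The function names are hypothetical, and the assignment of the labels I′ and II′ to sign patterns is an assumption read off from the diagram (the text only fixes Regions I and II).

```python
import math

def kruskal_r(v_prime, w_prime, m):
    """Solve exp(r/2m) (r - 2m) = -v'w' for r > 0 by bisection."""
    target = -v_prime * w_prime
    if target <= -2.0 * m:
        raise ValueError("the metric is only defined for v'w' < 2m")

    def h(r):
        return math.exp(r / (2.0 * m)) * (r - 2.0 * m)

    lo, hi = 0.0, 4.0 * m
    while h(hi) < target:          # enlarge the bracket until it contains the root
        hi *= 2.0
    for _ in range(200):           # fixed number of halvings
        mid = 0.5 * (lo + hi)
        if h(mid) < target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def kruskal_region(v_prime, w_prime):
    """Label a point by the signs of (v', w'); the primed labels are assumed from the diagram."""
    if v_prime > 0 and w_prime < 0:
        return "I"
    if v_prime > 0 and w_prime > 0:
        return "II"
    if v_prime < 0 and w_prime > 0:
        return "I'"
    if v_prime < 0 and w_prime < 0:
        return "II'"
    return "horizon"

if __name__ == "__main__":
    m = 1.0
    for point in ((2.0, -1.0), (1.0, 0.0), (1.0, 1.5)):
        print(point, kruskal_region(*point), kruskal_r(point[0], point[1], m))
```

Points on the axes v′w′ = 0 return r = 2m, and points with 0 < v′w′ < 2m return values 0 < r < 2m, in line with the bullet points above.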
We emphasise that the (v′, w′) axes are at 45° in this diagram, and that the hyperbola at the top of the picture is the true singularity of the metric and represents the black hole itself. The ultimate fate of every particle or photon in region II (inside the event horizon) is to end up at this singularity.
The singularity is the set v′w′ = 2m and is shielded from view by the event horizon. GR and all known laws of physics break down at the singularity.


7.5. Gravitational collapse


There are two problems with the Schwarzschild solution as a realistic model of a gravitating object such as a star. First, most stars are rotating. Second, the Schwarzschild solution is a vacuum solution, so one needs a separate description of the metric along the world-line of the star itself.
The first problem has been solved in the sense that the Kerr metric is a solution of Einstein’s vacuum equations $r_{ab} = 0$ which contains an additional parameter corresponding to angular momentum. This metric is beyond the scope of this course.
For the second problem, consider the Schwarzschild radius $r_s = 2m$. Putting back in the units,
\[ r_s = \frac{2Gm}{c^2}. \]
For typical objects, this radius is inside the object: e.g., for the Earth it is about 1 cm and
for the sun, about 3 km. So (ignoring angular momentum) for these objects, it is a question
of ‘grafting’ part of the exterior Schwarzschild metric to another metric which describes the
matter content.
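The figures quoted for the Earth and the sun follow directly from the formula for the Schwarzschild radius in physical units. A minimal sketch, in which the rounded values of G, c and the two masses are assumptions (standard reference values, not taken from these notes):

```python
G = 6.674e-11        # gravitational constant, m^3 kg^-1 s^-2 (rounded, assumed)
C = 2.998e8          # speed of light, m/s (rounded, assumed)
M_SUN = 1.989e30     # solar mass, kg (rounded, assumed)
M_EARTH = 5.972e24   # Earth mass, kg (rounded, assumed)

def schwarzschild_radius(mass_kg):
    """r_s = 2 G m / c^2, in metres."""
    return 2.0 * G * mass_kg / C**2

if __name__ == "__main__":
    print("sun  :", schwarzschild_radius(M_SUN) / 1000.0, "km")
    print("earth:", schwarzschild_radius(M_EARTH) * 100.0, "cm")
```

Both values are far smaller than the actual radii of the bodies, which is the point made in the text.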

7.6. The life and death of stars


Stars like our sun manage to maintain their size against the pull of gravity because of the pressure generated by the nuclear fusion reactions going on in their cores. When the fuel runs out (to simplify a complicated story), a star of the mass of our sun is expected to settle down as a white dwarf: not very bright, with density approximately $10^9$ kg/m³.
For stars whose mass is between about 1.5 times and 3.2 times the mass of the sun, it is expected that the final state will be a neutron star. Such an object consists mainly of neutrons crushed together at almost unimaginable densities, around $10^{17}$ kg/m³ in the core.
It seems that for stars of mass more than about 8 times that of the sun, no known physical process can overcome the pull of gravity once the fuel runs out. The star collapses inside its Schwarzschild radius, an event horizon is formed, and inside that a singularity, which has no satisfactory physical description.

7.7. Some figures, or a tale of 3 black holes


(with apologies to Kip Thorne)
In his popular book ‘Black holes and time warps’, Kip Thorne starts by describing a space mission to explore black holes of different sizes.
• Hades: 10× solar mass, $r_s = 30$ km. Tidal forces are already perceptible in an orbit of circumference $10^5$ km; at 3000 km the tidal force is more than 15g!
• Sagittario (centre of Milky Way): $10^6$ solar masses, $r_s = 3 \times 10^6$ km (about 8× the moon’s orbit around the earth). To hover at $1.0001\,r_s$ would require a force of $1.5 \times 10^8 g$.
• ‘Gargantua’: $15 \times 10^{12}$ solar masses, $r_s = 4.5 \times 10^{13}$ km = 5 light years. For such a hole, there are no perceptible tidal forces even on an orbit whose circumference is 1.0001× the horizon circumference. To hold in this position, 10g acceleration is required. All physical experiments are entirely consistent with being in a 10g gravitational field. You could slip through the horizon without any ill effects (apart from never being able to get out again).
From a larger distance, such a large hole would have a major light-bending effect on stars in the line of sight. See a recent movie for a simulation of this.
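The hovering accelerations quoted above can be reproduced approximately. The sketch below assumes the standard formula for the proper acceleration of a static observer at radius r in Schwarzschild, a = Gm / (r² √(1 − r_s/r)), which is not derived in these notes; the rounded physical constants and the function names are likewise assumptions.

```python
import math

G = 6.674e-11        # gravitational constant, m^3 kg^-1 s^-2 (rounded, assumed)
C = 2.998e8          # speed of light, m/s (rounded, assumed)
M_SUN = 1.989e30     # solar mass, kg (rounded, assumed)
G_EARTH = 9.80665    # standard gravity, m/s^2

def schwarzschild_radius(mass_kg):
    """r_s = 2 G m / c^2, in metres."""
    return 2.0 * G * mass_kg / C**2

def hover_acceleration_in_g(mass_kg, radius_in_rs):
    """Proper acceleration needed to hover at r = radius_in_rs * r_s, in units of g (assumed formula)."""
    r_s = schwarzschild_radius(mass_kg)
    r = radius_in_rs * r_s
    a = G * mass_kg / (r * r * math.sqrt(1.0 - r_s / r))
    return a / G_EARTH

if __name__ == "__main__":
    for name, solar_masses in (("Hades", 10.0), ("Sagittario", 1e6), ("Gargantua", 15e12)):
        mass = solar_masses * M_SUN
        print(name,
              schwarzschild_radius(mass) / 1000.0, "km,",
              hover_acceleration_in_g(mass, 1.0001), "g at 1.0001 r_s")
```

The required acceleration at a fixed multiple of r_s scales like 1/m, which is why hovering just outside a very large hole is comparatively gentle.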
