Maths for GR
Michael Singer
E-mail address: michael.singer@ucl.ac.uk
Contents
6.5. Curvature
6.6. Curvature at a point
6.7. Ricci and scalar curvature
6.8. Relative acceleration and geodesic deviation
6.9. Comparison with the newtonian theory
6.10. Weak field limit
6.11. Physical differential equations
Chapter 7. The Schwarzschild metric and black holes
7.1. Spherically symmetric, static metrics
7.2. Schwarzschild
7.3. Physical consequences
7.4. Extensions of Schwarzschild: introduction to black holes
7.5. Gravitational collapse
7.6. The life and death of stars
7.7. Some figures, or a tale of 3 black holes
CHAPTER 1
1.1.1. Vector spaces. In this course, we shall only be interested in real vector spaces.
Recall that a vector space V is a set of elements (called ‘vectors’) which can be added to each
other and multiplied by real numbers (often called ‘scalars’). We will not reproduce the axioms
here.
If V and W are vector spaces, then we are interested in maps between them that preserve the
given structure (the addition and scalar multiplication). A linear map (or linear transformation)
from V to W is a mapping T from V to W (written in compact form T : V → W ) with the
properties
T (v1 + v2 ) = T (v1 ) + T (v2 ) and T (λv) = λT (v) (1.1.1)
for any vectors v, v1 , v2 of V and scalars (i.e. real numbers) λ.
Remark 1.1.1. It is standard to write T v for T (v) when T is a linear transformation.
Further: T is an isomorphism between V and W if it is linear and also 1:1 and onto as a
mapping, just regarding V and W as sets. If V and W are isomorphic by a linear map T , we
often also say that T is an identification of V and W , or that V and W are identified (by T ).
Example 1.1.2. As in the intro, the space of all triples (x, y, z) of real numbers is a vector
space, denoted R3 , where
(x, y, z) + (x′ , y ′ , z ′ ) = (x + x′ , y + y ′ , z + z ′ ), λ(x, y, z) = (λx, λy, λz)
Example 1.1.3. More generally, the space of lists of n real numbers (x1 , . . . , xn ) is a real
vector space denoted by Rn .
1.1.2. Bases and matrices. We recall that a (finite) basis for a vector space V is a set
of elements (v1 , . . . , vn ) such that every element v in V has a unique representation
v = ∑_{j=1}^{n} λj vj , (λj ∈ R). (1.1.2)
In the expansion (1.1.2), the numbers λj are called the coefficients (or sometimes coordinates)
of v with respect to the basis (v1 , . . . , vn ). If V does have a finite basis consisting of n elements,
then any other basis of V will also consist of n elements. This number n is defined to be the
dimension of V .
If V does not have a finite basis, then it is said to be infinite-dimensional.
Example 1.1.4. The set Rn of n-tuples (x1 , . . . , xn ) (see above) has dimension n. This has
a standard basis
e1 = (1, 0, . . . , 0), e2 = (0, 1, 0 . . . , 0), . . . , en = (0, . . . , 0, 1).
Example 1.1.5. The set of differentiable functions on R is an infinite-dimensional vector
space.
Theorem 1.1.6. If V is a real vector space of dimension n, then V is isomorphic to Rn .
We won’t need the proof. But it’s important to understand that an isomorphism of V with
Rn is exactly the same thing as a choice of basis of V . For if T : Rn → V is an isomorphism,
we define
vj = T (ej ),
where the ej form the standard basis of Rn , and then you check that the vj form a basis of V .
Conversely, given a basis vj of V , for every v ∈ V we have its n-tuple of coefficients λj as in
(1.1.2), and the map from v to its coefficients is an isomorphism from V to Rn .
In particular, without more structure there is no natural or unique isomorphism between our
n-dimensional vector space and Rn . Returning to our original motivation (our world appears
to be very well described by a space with coordinates (x, y, z), but we do not want to single out
particular coordinates), we see that these ideas are quite well captured by saying that our world
is well described by a 3-dimensional real vector space.
Remark 1.1.7. When we are using vectors to describe ‘physical space’ we often call their
components in a basis coordinates instead.
1.1.3. Symmetries of a vector space. Another way of thinking about choices of basis
in a vector space is in terms of the symmetry of the space.
Definition 1.1.8. Let V be a finite-dimensional real vector space. The set of all invertible
linear maps T : V → V (i.e. linear isomorphisms of V with itself) is a group, denoted GL(V ).
If V = Rn , we also write GLn (R) or GL(n, R) for GL(V ).
GL here stands for ‘general linear’. Recall that a group just means a set with an associative
multiplication and inverses; in this case the multiplication is composition of linear maps. The
group GLn (R) is the same as the group of invertible n × n matrices M . Such an M defines a
map from Rn to itself by matrix multiplication, where we write the typical element x of Rn as
a column vector with coefficients (x1 , . . . , xn ), so
x ↦ M x
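Although the notes stay on paper, this identification of GL(n, R) with invertible matrices can be illustrated on a computer. The following is a minimal sketch of mine (Python with numpy, neither of which appears in the notes): composition of the linear maps is the matrix product, and the inverse map is the matrix inverse.

```python
import numpy as np

# Two invertible 2x2 matrices, i.e. elements of GL(2, R); my own choice of entries.
M = np.array([[2.0, 1.0],
              [1.0, 1.0]])
N = np.array([[0.0, 1.0],
              [-1.0, 0.0]])

# Invertibility = nonzero determinant, so M really lies in GL(2, R).
assert abs(np.linalg.det(M)) > 1e-12

x = np.array([3.0, -1.0])
# Composition of the maps x -> Nx and x -> Mx is the matrix product M N ...
assert np.allclose(M @ (N @ x), (M @ N) @ x)
# ... and the group inverse is the matrix inverse.
assert np.allclose(np.linalg.inv(M) @ (M @ x), x)
```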
1.1.4. Affine spaces. A linear map T between vector spaces automatically takes the zero
vector to the zero vector. In particular, thinking about R3 , the translation
(x, y, z) ↦ (x + a, y + b, z + c)
for some fixed vector (a, b, c) is not a linear map. However, from the physical point of view, we
would certainly want to be able to consider such transformations as part of our story.
There is a formal abstract definition of affine space, but we shall not give it. Instead, we
work with our vector space V and just enlarge the symmetries.
Definition 1.1.9. Let V be a finite-dimensional real vector space. The affine group Aff(V )
is the group of all transformations of the form
x ↦ T x + b
where T ∈ GL(V ) (i.e. is an invertible linear transformation) and b ∈ V is a vector.
Note in particular that, by taking T to be the identity I, we get the translations as part of
Aff(V ); on the other hand, Aff(V ) contains GL(V ) as the subgroup of elements with b = 0.
We shall somewhat loosely talk about the affine space V to mean the space V , but where
we are allowing the whole of Aff(V ) to act as symmetries.
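The group law in Aff(V ) can be made concrete: composing x ↦ T1 x + b1 with x ↦ T2 x + b2 gives x ↦ (T2 T1)x + (T2 b1 + b2), which is again of the required form. A small numerical sketch of mine (Python with numpy, not from the notes) checks this closure property:

```python
import numpy as np

# An affine map x -> T x + b with T invertible, as in Definition 1.1.9.
def affine(T, b):
    return lambda x: T @ x + b

T1, b1 = np.array([[1.0, 2.0], [0.0, 1.0]]), np.array([1.0, 0.0])
T2, b2 = np.array([[0.0, -1.0], [1.0, 0.0]]), np.array([0.0, 3.0])

x = np.array([2.0, 5.0])
# Composing (T1, b1) then (T2, b2) equals the single affine map (T2 T1, T2 b1 + b2),
# so Aff(V) is closed under composition.
lhs = affine(T2, b2)(affine(T1, b1)(x))
rhs = affine(T2 @ T1, T2 @ b1 + b2)(x)
assert np.allclose(lhs, rhs)
```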
The following example may entertain the mathematicians.
Example 1.1.10. Let T be a linear mapping from a vector space V to another vector space
W . Let w ∈ W be any given vector. Let P be the solution set of the equation T x = w, i.e.
P = {x ∈ V : T x = w}
Suppose that P is not empty. Then P is a natural example of an affine space if w ≠ 0 and is a
vector space if w = 0.
To picture what is going on here, suppose that V = R3 and W = R. Then we picture P
as a two-dimensional plane inside of R3 (usually) which goes through the origin if w = 0 and
does not (necessarily) otherwise. When w = 0, P is a linear (or vector) subspace of R3 , hence a
vector space K in its own right. If w ≠ 0, P is identifiable with K; geometrically, P is a plane
parallel to K, but not going through 0. We can identify K with P by picking any element p of
P and mapping k ∈ K to k + p ∈ P . There is no preferred choice of p and hence no given origin
in P .
A further interesting fact is that the group of linear transformations of R3 which map P
into itself is identifiable with Aff(P ).
If we think of V as an affine space, then given any two points p and q, we have the displacement
vector \vec{pq} of q relative to p, given in terms of the linear structure of V by
\vec{pq} = q − p. (1.1.3)
Note that if we subject p and q to an affine transformation
x ↦ T x + b
then the displacement vector is subjected only to the ‘linear part’ of the transformation,
\vec{pq} ↦ T (\vec{pq});
the vector b cancels out when we form q − p.
To put it another way, if we fix any point p of V , then we get a map from V (viewed as an
affine space) to V viewed as a vector space, in fact the space of all position vectors relative to
p:
q ↦ \vec{pq}.
Indeed, the set of position vectors does have a given origin, namely the zero vector which is the
position vector of p relative to itself. (A bit confusing—sorry about that, but it is the price to
be paid for not burdening you with the ‘proper’ definition of affine space.)
Let’s make this more concrete. Consider the solar system (at a particular instant, perhaps).
We know that we can describe the position of any given object in the solar system by 3 co-
ordinates (x, y, z). But depending upon the problem, we may want to put the origin of these
coordinates at different places. For describing the orbit of the Earth around the Sun, we might
very well put the origin at the centre of the Sun; for describing the Earth-Moon system, we
might put it at the centre of the Earth, or at the centre of mass of the Earth-Moon system. The
point is that the right way to think of things is that coordinates (x, y, z) give the position of
one point relative to another (the Earth relative to the Sun, or the Moon relative to the Earth).
Example 1.2.1. In one variable, the only possibility is ax², where a is real. In two variables
(x, y), we have
ax² + 2gxy + by² = \begin{pmatrix} x & y \end{pmatrix} \begin{pmatrix} a & g \\ g & b \end{pmatrix} \begin{pmatrix} x \\ y \end{pmatrix}.
In three variables,
ax² + by² + cz² + 2eyz + 2f zx + 2gxy = \begin{pmatrix} x & y & z \end{pmatrix} \begin{pmatrix} a & g & f \\ g & b & e \\ f & e & c \end{pmatrix} \begin{pmatrix} x \\ y \\ z \end{pmatrix}.
Example 1.2.2. All terms have to be of degree precisely two: in two variables,
ax² + bxy + cy
is not an example unless c = 0.
Definition 1.2.3. Let V be a vector space and let d be a non-negative integer. A real-valued
function f on V is said to be homogeneous of degree d if
f (λv) = λ^d f (v)
for all real λ and v ∈ V .
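One can spot-check homogeneity numerically. The following sketch (Python with numpy, and coefficient values of my own choosing, none of it from the notes) verifies f (λv) = λ² f (v) for a degree-2 form of the shape in Example 1.2.1:

```python
import numpy as np

# A quadratic form in two variables, as in Example 1.2.1;
# the coefficients a, g, b are arbitrary choices of mine.
def f(v):
    x, y = v
    a, g, b = 1.0, 2.0, -3.0
    return a*x**2 + 2*g*x*y + b*y**2

v = np.array([0.7, -1.2])
for lam in (-2.0, 0.5, 3.0):
    assert np.isclose(f(lam * v), lam**2 * f(v))   # f(λv) = λ² f(v)
```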
Remark 1.2.4. This explains what ‘homogeneous of degree 2’ means above. Homogeneity
is also important later on in this chapter, when we look at Lagrangians and the calculus of
variations (cf. Proposition 1.4.4).
There are two reasons for the introduction of quadratic forms and bilinear forms. First of
all, quadratic forms on vector spaces provide the additional structure needed to define distance.
The other reason for bilinear forms is that they, and their multilinear cousins, are essential in the
development of multivariable calculus (see Chapter 4).
The basic example is x2 + y 2 + z 2 in R3 . By Pythagoras’s theorem, this is the square of the
distance of the point (x, y, z) from (0, 0, 0) if we are thinking of a standard system of mutually
perpendicular axes.
In special relativity (see Chapter 2) the physics is captured by a quadratic form in 4 variables,
c²t² − x² − y² − z². The significance of this will be that
c²t² − x² − y² − z² = 0
if and only if a photon (particle of light) emitted at the origin at t = 0 (and travelling in a straight
line) can pass through the point (x, y, z) at time t.
Thus one should think generally of quadratic forms as defining ‘squares of distances’, or
‘squares of lengths’ of vectors on a vector space. This needs to be taken with a pinch of salt,
though, since in the above 4-dimensional example, the square of the ‘distance’ can be 0 or even
negative! We shall explore this in detail in the chapter on special relativity.
We define these homogeneous quadratic polynomials in terms of bilinear forms.
Definition 1.2.5. A bilinear form B on a vector space V is a map
B :V ×V →R
with the property that for each fixed v, the maps
w ↦ B(v, w) and w ↦ B(w, v) are linear in w.
To spell this out,
B(v, λu + µw) = λB(v, u) + µB(v, w).
and similarly
B(λu + µw, v) = λB(u, v) + µB(w, v).
Definition 1.2.6. A bilinear form B is said to be symmetric if B(v, w) = B(w, v) for all
v, w. A bilinear form is said to be skew-symmetric (or just skew) if B(v, w) = −B(w, v).
Definition 1.2.7. If B is a symmetric bilinear form, then the associated quadratic form is
Q(v) = B(v, v).
A bilinear form is called non-degenerate if for any v ≠ 0, there exists w ∈ V such that
B(v, w) ≠ 0.
This is equivalent to the corresponding condition with the roles reversed: that is, B is also non-
degenerate if for any v ≠ 0, there exists w such that
B(w, v) ≠ 0.
We shall see later that any bilinear form (on a finite-dimensional vector space) can be
represented by a square matrix B̂. Then
• B is symmetric if and only if the matrix B̂ is symmetric (B̂ = B̂ t );
• B is skew-symmetric if and only if the matrix B̂ is skew (B̂ = −B̂ t );
• B is non-degenerate if and only if the matrix B̂ is invertible.
A linear transformation of V is an isometry (or Q-isometry, if we need to keep track of
which quadratic form is under consideration) if it preserves ‘lengths’ as defined by Q:
Definition 1.2.8. Let V be a vector space with a non-degenerate quadratic form Q. A
linear transformation T of (V, Q) is called a Q-isometry if
Q(T v) = Q(v) (1.2.1)
for all vectors v in V . The set of all Q-isometries forms a group denoted O(V, Q).
Remark 1.2.9. Here the O stands for ‘orthogonal’.
Example 1.2.10. If V = R2 and Q(x, y) = x2 + y 2 , then Q gives the ordinary euclidean
length-squared of the vector from (0, 0) to (x, y). You can verify that the linear transformation
(x, y) 7→ (cx + sy, −sx + cy)
where c = cos θ, s = sin θ is a Q-isometry of R2 . Geometrically, this linear transformation just
represents a rotation of the plane, and rotations preserve lengths. This gives a guide to how
you should think about isometries more generally.
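The verification suggested in Example 1.2.10 can also be done numerically. Here is a minimal sketch of mine (Python with numpy, not part of the notes) checking Q(T v) = Q(v) for the rotation on a few random vectors:

```python
import numpy as np

theta = 0.7
c, s = np.cos(theta), np.sin(theta)
T = np.array([[c, s],
              [-s, c]])          # the map (x, y) -> (cx + sy, -sx + cy)

def Q(v):                        # Q(x, y) = x^2 + y^2
    return v[0]**2 + v[1]**2

rng = np.random.default_rng(0)
for _ in range(5):
    v = rng.standard_normal(2)
    assert np.isclose(Q(T @ v), Q(v))   # T preserves Q, so T lies in O(V, Q)
```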
1.2.1. Matrix representation. If (e1 , . . . , en ) is any basis of V , and B is any bilinear
form, then we may form the elements B̂ij = B(ei , ej ), a square matrix that we’ll denote B̂. If
we identify vectors x and y of V with their coefficients (x1 , . . . , xn ), (y1 , . . . , yn ) with respect to
this same basis (e1 , . . . , en ), then it follows from the conditions of bilinearity that
B(x, y) = ∑_{i,j=1}^{n} B̂ij xi yj .
Thinking of x and y as column vectors, we can also write this
B(x, y) = xt B̂y
where xt is the transpose of x, i.e. the row vector with coefficients (x1 , . . . , xn ).
The isometry condition in terms of Q is equivalent to its ‘polarized version’
B(T v, T v ′ ) = B(v, v ′ ) for all v, v ′ ∈ V. (1.2.2)
Cf. Problem 1.2.
And in terms of matrices, this is equivalent to
T̂ t B̂ T̂ = B̂ (1.2.3)
if T̂ is the matrix representation of T with respect to the basis (e1 , . . . , en ).
Note, by taking determinants of (1.2.3) we obtain (since det(AB) = det(A) det(B))
det(T )² det(B̂) = det(B̂), (1.2.4)
and so det(T ) = ±1 since B̂ is non-degenerate and so det B̂ ≠ 0.
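The matrix condition (1.2.3) and the determinant consequence (1.2.4) are easy to verify numerically. As a sketch of mine (Python with numpy, and a hyperbolic ‘boost’ example of my own choosing, not from the notes), take B̂ = diag(1, −1), the signature (1, 1) form:

```python
import numpy as np

# B-hat for the form t^2 - x^2, and a hyperbolic 'boost' T-hat (my example).
B = np.diag([1.0, -1.0])
phi = 0.4
T = np.array([[np.cosh(phi), np.sinh(phi)],
              [np.sinh(phi), np.cosh(phi)]])

# The isometry condition (1.2.3): T^t B T = B (using cosh^2 - sinh^2 = 1).
assert np.allclose(T.T @ B @ T, B)
# The determinant consequence (1.2.4): det T = ±1; here it is +1.
assert np.isclose(np.linalg.det(T), 1.0)
```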
Definition 1.2.11. The set of Q-orthogonal T ’s with det(T ) = 1 forms a subgroup denoted
SO(V, Q), read the ‘special orthogonal’ group.
Example 1.2.12. Suppose that B is a symmetric bilinear form on R² such that
B(e1 , e1 ) = 0, B(e2 , e2 ) = 0.
Is this enough to determine B uniquely?
Solution 1.2.13. The answer is no: we need to know B(e1 , e2 ) as well to determine B. If
B(e1 , e2 ) = λ, however, then by symmetry also B(e2 , e1 ) = λ. So the matrix representation of
this symmetric bilinear form will be
B̂ = \begin{pmatrix} 0 & λ \\ λ & 0 \end{pmatrix}. (1.2.5)
1.2.2. Diagonal quadratic forms. If V = Rn , the standard quadratic form of signature
(r, s) has the matrix representation
Q = diag(1, . . . , 1, −1, . . . , −1) (1.2.6)
with respect to the standard basis, where there are r +1’s and s −1’s.
Remark 1.2.14. In the pure mathematical literature, the difference r − s of the number of
+1’s and the number of −1’s is often called the signature. On the other hand it is also common
to refer to a quadratic form as having signature +, +, +, + or +, −, −, − rather than 4 or −2.
There is no real risk of confusion with these slight variations.
The corresponding orthogonal group is denoted by Or,s or O(r, s) and the corresponding
subgroup of elements with determinant equal to +1 is denoted SOr,s or SO(r, s).
One can make the case that the most important cases are s = 0, when the groups are also
just denoted O(n) and SO(n) (for ordinary classical n-dimensional euclidean geometry) and
s = 1, r = 3, for the study of special relativity. In this case O(1, 3) is called the Lorentz group.
The following is a basic fact about non-degenerate quadratic forms:
Theorem 1.2.15. Let V be a finite-dimensional vector space and let Q be a non-degenerate
quadratic form on V . Then there exists a basis of V so that the matrix Q̂ of Q in this basis
takes the standard form (1.2.6) for some particular r and s (which depend on Q).
Proof. (Sketch, useful to be aware of it, but not examinable.)
Pick any v at random with Q(v) ≠ 0. [If Q(v) = 0 for all v, then by polarization (see
problem set) B(v, w) = 0 for all v and w, contradicting the non-degeneracy of Q. So such v
does exist.]
Replacing v by e1 = v/√|Q(v)|, we get
Q(e1 ) = Q(v)/|Q(v)| = ±1.
Let V ′ be the orthogonal complement of e1 with respect to B, that is
V ′ = {w ∈ V : B(e1 , w) = 0}.
Because Q(e1 ) = ±1 and Q is non-degenerate, this is an n − 1-dimensional subspace and the
restriction Q′ of Q to V ′ is a non-degenerate quadratic form. If we choose any basis e2 , . . . , en
spanning V ′ , then the matrix representation of Q in this basis will have the form
Q̂ = \begin{pmatrix} \pm 1 & 0 & \cdots & 0 \\ 0 & b'_{22} & \cdots & b'_{2n} \\ \vdots & \vdots & & \vdots \\ 0 & b'_{n2} & \cdots & b'_{nn} \end{pmatrix} (1.2.7)
The top left-hand element is Q(e1 ). The zeros in the first row and first column come from the
orthogonality condition B(e1 , w) = 0 for all w ∈ V ′ .
We suppose by induction that the theorem has already been proved for non-degenerate
quadratic forms on vector spaces of dimension < n. Then we can choose e2 , . . . , en so that the
matrix of b′ij is diagonal with entries ±1.
Recall that if all the signs in (1.2.6) are + then we call Q (or B) positive-definite; if they
are all −, negative-definite. If the signature is (r, s), then r is the largest possible dimension of
subspaces of V on which Q is positive-definite; and similarly s is the largest possible dimension
of subspaces of V on which Q is negative-definite. Note, however, that such subspaces are not
unique.
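By Theorem 1.2.15, together with Sylvester's law of inertia (not proved in these notes), the pair (r, s) can be read off from the signs of the eigenvalues of any symmetric matrix representing Q. A small sketch of mine (Python with numpy, my own helper name `signature`, not from the notes):

```python
import numpy as np

# Counts of positive and negative eigenvalues of a non-degenerate symmetric
# matrix give the numbers r and s of +1's and -1's in the standard form (1.2.6).
def signature(Bhat):
    eig = np.linalg.eigvalsh(Bhat)
    return int(np.sum(eig > 0)), int(np.sum(eig < 0))

eta = np.diag([1.0, -1.0, -1.0, -1.0])   # the form of special relativity
assert signature(eta) == (1, 3)
# Example 1.2.12 with λ = 2: off-diagonal form, eigenvalues ±2, signature (1, 1).
assert signature(np.array([[0.0, 2.0], [2.0, 0.0]])) == (1, 1)
```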
1.2.3. Affine version. Suppose we want to incorporate translations. We may define the
group of affine isometries of (V, Q) to be all maps of the form
x ↦ T x + b, where T ∈ O(V, Q) and b ∈ V.
In the case of interest in relativity, where Q has signature (+, −, −, −), this is called the
Poincaré group.
In this affine case, we should think of Q as giving the length-squared of position vectors of
one point relative to another, that is
Q-length-squared of \vec{AB} = Q(\vec{AB}).
In a euclidean space (i.e. Q is positive-definite) we can also measure angles between
displacement vectors: if X, Y and Z are three points, then the angle θ that u = \vec{XY} makes
with v = \vec{XZ} satisfies
cos θ = B(u, v) / (√Q(u) √Q(v)).
Thus we have lengths and angles as in ‘ordinary’ two- or three-dimensional euclidean geometry.
Notation 1.2.16. In a euclidean vector space with a fixed positive-definite quadratic form it
is common to refer to the associated bilinear form as an inner product and denote it B(u, v) = ⟨u, v⟩.
Similarly, the length of a vector in this context is often simply denoted |u| = √Q(u) = √B(u, u).
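The angle formula above can be checked on a concrete triangle. A minimal sketch of mine (Python with numpy, the points X, Y, Z my own choice, not from the notes) with the standard euclidean inner product on R²:

```python
import numpy as np

# Lengths and angles from a positive-definite form: here the standard
# euclidean inner product on R^2, so <u, v> = u.v and |u| = sqrt(<u, u>).
def inner(u, v):
    return float(np.dot(u, v))

def length(u):
    return np.sqrt(inner(u, u))

X, Y, Z = np.array([0.0, 0.0]), np.array([1.0, 0.0]), np.array([1.0, 1.0])
u, v = Y - X, Z - X                # displacement vectors XY and XZ
cos_theta = inner(u, v) / (length(u) * length(v))
assert np.isclose(cos_theta, np.cos(np.pi / 4))   # the angle at X is π/4
```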
Remark 1.2.17. Why have I gone through all this? First of all, why should it be that vector
spaces or affine spaces are the right thing for describing the world?
We agree that triples of real numbers (x, y, z) are very good for describing where things are
in the world (or indeed the solar system...). But the key thing about vector and affine spaces
(which we’ll discuss next) is that there is a distinguished set of curves (or paths or trajectories),
namely the straight lines. These are distinguished in the sense that if you take a straight line
and apply an affine transformation (translation and linear transformation), you get another
straight line.
Fast forward to Newton’s laws of motion: the first of these says that a particle remains at
rest or continues to move with a constant velocity unless acted upon by an external force.
Thus straight lines have a physical significance, as the trajectories of free particles. In this
section, we have learned or recalled the underlying geometry of 3-dimensional spaces of points
(x, y, z), as vector or affine spaces. We’ve discussed the symmetries, both linear and affine, of
such spaces. We’ve introduced bilinear forms so that we have a notion of length and distance
in our vector space. And this flat geometry seems a good basis for Newtonian physics, because
it has straight lines in some sense built in, and these are the trajectories of particles not acted
upon by any force.
where I is some interval which may be open, closed, (semi-)infinite, whatever. If I is closed
(and bounded), I = [u, v], say, then the endpoints of the curve are γ(u) and γ(v).
We generally assume the parameterisation is regular, i.e. γ ′ (t) ≠ 0 for every t ∈ I. Then
γ ′ (t) is called the velocity vector and it is tangent to the curve. If we choose a basis of V , then
γ(t) is represented in terms of its components (γ1 (t), . . . , γn (t)), where each of the γj is just an
ordinary smooth function of the variable t. Then γ ′ (t) has components (γ1′ (t), . . . , γn′ (t)).
If γ ′ (t0 ) = 0 for a point t0 of I, then the curve can be singular (have a sharp corner) at that
point. We don’t want to consider such singularities.
The image of γ is sometimes called the trace of the curve.
The vector γ ′′ (t) with components (γ1′′ (t), . . . , γn′′ (t)) along the curve is called the acceleration
vector.
Example 1.3.1. If
a = (a1 , . . . , an ) and b = (b1 , . . . , bn )
are two (constant) vectors in Rn , with b ≠ 0, then we may construct the parameterized straight
line
γ(t) = a + bt (1.3.1)
through a in the direction b. In fact the tangent vector γ ′ (t) = b (i.e. b is the velocity vector in
this case) and the acceleration γ ′′ (t) is zero.
Example 1.3.2. In R2 , consider the parameterized circle
γ(t) = (a cos t, a sin t), (1.3.2)
where a > 0 is a constant. Then
γ ′ (t) = (−a sin t, a cos t) and γ ′′ (t) = (−a cos t, −a sin t).
1.3.1. Curves in a euclidean space. Let V be a euclidean space, i.e. a real vector space
equipped with a positive-definite inner product ⟨·, ·⟩ and length-squared denoted by | · |² (cf.
Notation 1.2.16).
Definition 1.3.3. A curve γ : I → V is parameterized by arc-length if |γ ′ (t)|2 = 1 for all t.
Proposition 1.3.4. If γ : I → V is any regular curve in a euclidean vector or affine space,
then there is a reparameterization of the curve so that it is parameterized by arclength.
In other words, there is a smooth invertible map τ : I → J, with τ ′ (t) > 0 for all t, and such
that
γ̃ : J → V, γ̃(τ ) = γ(t(τ ))
satisfies
|γ̃ ′ (τ )|2 = 1 for all τ ∈ J. (1.3.3)
Proof. Let γ(t) be the given curve. Let τ be any increasing smooth invertible function of
t and define γ̃(τ ) as above, or equivalently,
γ̃(τ (t)) = γ(t).
Differentiating (chain rule),
γ̃ ′ (τ )τ ′ (t) = γ ′ (t).
Take the length-squared, to get
(τ ′ (t))2 |γ̃ ′ (τ )|2 = |γ ′ (t)|2 .
Imposing that γ̃ is parameterized by arc-length, we get the equation
τ ′ (t) = |γ ′ (t)|,
where we choose the positive square root consistently with τ being an increasing function of t.
Since we assume that |γ ′ (t)| > 0 (regularity of the curve), we get an equation which determines
τ as a function of t, uniquely up to the addition of a constant.
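The equation τ′(t) = |γ′(t)| can be solved numerically by quadrature. Here is a sketch of mine (Python with numpy and trapezoidal integration, my own choices, not from the notes) for the circle of Example 1.3.2, where |γ′| = a and hence τ(t) = a t:

```python
import numpy as np

# Numerical arc-length parameter τ(t) = ∫ |γ'(s)| ds for the circle
# γ(t) = (a cos t, a sin t), by trapezoidal quadrature of the speed.
a = 2.0
t = np.linspace(0.0, 2 * np.pi, 2001)
speed = np.hypot(-a * np.sin(t), a * np.cos(t))   # |γ'(t)|, here constantly a
tau = np.concatenate([[0.0],
                      np.cumsum((speed[1:] + speed[:-1]) / 2 * np.diff(t))])

assert np.allclose(tau, a * t, atol=1e-9)         # τ(t) = a t for the circle
assert np.isclose(tau[-1], 2 * np.pi * a)         # total arc length 2πa
```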
We have
∂L/∂x^a = 0, ∂L/∂y^a = y^a. (1.4.6)
Inserting y a = ẋa (t) and subbing in, we get the EL equations:
ẍa = 0. (1.4.7)
We learn that x^a (t) = p^a + v^a t where p^a and v^a are constants. Of course, this is just
the parametric form of the straight line through p^a in the direction v. Note that with this
parameterization, the straight line is traversed at constant velocity v as well. In fact, given
x^a (0) = p^a and x^a (1) = q^a , we have v^a = q^a − p^a .
1.4.1. Coordinate invariance. If we have a change of coordinates, z a as an invertible
function of the xa , then the result of minimizing a particular functional should not change. Of
course the explicit formulae giving the z a as a function of t will be different from the formulae
giving the xa as a function of t. The relation will be
z a (t) = z a (x(t)). (1.4.8)
As a telling example, consider the same energy functional in spherical polars (r, θ, ϕ). We
have
E[γ] = ∫ ½ (ṙ² + r²(θ̇² + sin²θ ϕ̇²)) dt (1.4.9)
We calculate
∂L/∂ṙ = ṙ, ∂L/∂θ̇ = r²θ̇, ∂L/∂ϕ̇ = r² sin²θ ϕ̇, (1.4.10)
and
∂L/∂r = r(θ̇² + sin²θ ϕ̇²), ∂L/∂θ = r² sinθ cosθ ϕ̇², ∂L/∂ϕ = 0. (1.4.11)
Hence the Euler–Lagrange equations are as follows
r̈ = r(θ̇² + sin²θ ϕ̇²), (1.4.12)
(d/dt)(r²θ̇) = r² sinθ cosθ ϕ̇², (1.4.13)
(d/dt)(r² sin²θ ϕ̇) = 0. (1.4.14)
It hardly needs saying that these equations are much more daunting than the same system in
euclidean coordinates. There are, however, some tricks to lead us to a solution, which we shall
discuss next.
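One way to gain confidence that (1.4.12)–(1.4.14) really are the straight lines in disguise is to integrate them numerically. The following sketch is mine (Python with numpy and a hand-rolled RK4 step, all my own choices, not from the notes): starting from arbitrary initial data, the solution, converted back to Cartesian coordinates, has constant velocity.

```python
import numpy as np

# First-order form of (1.4.12)-(1.4.14) in the state (r, θ, φ, ṙ, θ̇, φ̇);
# the θ and φ second derivatives come from expanding the d/dt terms.
def rhs(s):
    r, th, ph, rd, thd, phd = s
    return np.array([
        rd, thd, phd,
        r * (thd**2 + np.sin(th)**2 * phd**2),                      # (1.4.12)
        np.sin(th) * np.cos(th) * phd**2 - 2 * rd * thd / r,        # from (1.4.13)
        -2 * rd * phd / r
        - 2 * (np.cos(th) / np.sin(th)) * thd * phd,                # from (1.4.14)
    ])

s = np.array([2.0, 1.0, 0.3, 0.5, 0.1, 0.2])   # arbitrary initial conditions
h, pts = 1e-3, []
for _ in range(2000):                          # classical RK4 steps
    pts.append(s[:3].copy())
    k1 = rhs(s); k2 = rhs(s + h/2*k1); k3 = rhs(s + h/2*k2); k4 = rhs(s + h*k3)
    s = s + (h/6) * (k1 + 2*k2 + 2*k3 + k4)

# Back to Cartesian coordinates: the trajectory should be a straight line
# traversed at constant velocity, i.e. all finite-difference velocities agree.
r, th, ph = np.array(pts).T
xyz = np.stack([r*np.sin(th)*np.cos(ph), r*np.sin(th)*np.sin(ph), r*np.cos(th)],
               axis=1)
v = np.diff(xyz, axis=0) / h
assert np.allclose(v, v[0], atol=1e-3)
```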
The physical interpretation is that T (x, ẋ) is the kinetic energy and U (x, ẋ) is the potential
energy. In many physical situations, the kinetic energy is quadratic in the velocities, so satisfies
the assumptions of the Proposition. Then E is the total energy, and the result is that the total
energy (kinetic + potential) is conserved.
Proof. See Problem 1.11.
Example 1.4.5. Let us see how to use these ideas to solve the system (1.4.12), or at least
how to obtain some of the solutions. We use the principles we’ve just discovered: along a
solution, the energy itself is constant,
ṙ² + r²(θ̇² + sin²θ ϕ̇²) = 2E (1.4.16)
and because L is independent of ϕ the angular momentum
J = r² sin²θ ϕ̇ (1.4.17)
is also constant. The full analysis is complicated, but we see that if θ = π/2, then the θ-equation
is satisfied, so we can start by considering those solutions with θ = π/2 identically1. (Note that
θ = 0 or π are not such good choices, as these values of θ are singularities of the coordinate
system.)
So, let us substitute θ = π/2 and see what we’re left with:
ṙ² + r²ϕ̇² = 2E, J = r²ϕ̇ (1.4.18)
are both constants.
We now distinguish two cases: J = 0, ‘radial’, and J ≠ 0, ‘non-radial’. If J = 0, then either
r = 0, which is not terribly interesting, or ϕ̇ = 0 and then ṙ = √(2E). So what we have found is
the path
γ(t) given by r = √(2E) t, θ = π/2, ϕ = ϕ0 .
This is a straight line emanating from the origin at t = 0 in the equatorial plane going through
the line of longitude ϕ = ϕ0 .
In the non-radial case (perhaps with the benefit of hindsight) it turns out to be better to
try to find u = 1/r as a function of ϕ. We use
du/dϕ = u̇/ϕ̇.
So divide (1.4.18) by ϕ̇², getting
(dr/dϕ)² + r² = 2E/ϕ̇². (1.4.19)
Thus if u = 1/r,
(1/u⁴)(du/dϕ)² + r² = 2E/ϕ̇², (1.4.20)
so multiplying up by u⁴,
(du/dϕ)² + u² = 2E/(r⁴ϕ̇²) = 2E/J². (1.4.21)
We can now integrate this to obtain an equation of a non-radial, equatorial geodesic.
We can solve this by differentiating with respect to ϕ:
d²u/dϕ² + u = 0.
Hence u = u0 cos(ϕ − ϕ0 ) for constants u0 and ϕ0 . This is the equation of a straight
line in polar coordinates, consistently with Example 1.4.3.
In Problem 1.10, you are encouraged to work through the slightly more involved problem of
finding the orbit of a particle around a heavy star, using the same ideas.
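That u = u0 cos(ϕ − ϕ0) really describes a straight line in polar coordinates can be checked directly: it rearranges to x cos ϕ0 + y sin ϕ0 = 1/u0. A small numerical sketch of mine (Python with numpy, parameter values my own, not from the notes):

```python
import numpy as np

# Points of the curve u = u0 cos(φ - φ0), i.e. r cos(φ - φ0) = 1/u0,
# sampled over a range of φ where cos(φ - φ0) > 0.
u0, phi0 = 0.5, 0.3
phi = np.linspace(phi0 - 1.2, phi0 + 1.2, 50)
r = 1.0 / (u0 * np.cos(phi - phi0))
x, y = r * np.cos(phi), r * np.sin(phi)

# In Cartesians these points satisfy the line equation x cos φ0 + y sin φ0 = 1/u0.
assert np.allclose(x * np.cos(phi0) + y * np.sin(phi0), 1.0 / u0)
```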
¹Actually, if we have any solution curve, then by using the spherical symmetry of the problem and re-orienting
our coordinate axes, we can reduce to this case.
Remark 1.4.6. In the calculus of variations, you can consider more general time-dependent
Lagrangians, i.e. functions L(x, y, t). While the EL equations continue to give extrema in this
case also, Proposition 1.4.4 is false in general.
CHAPTER 2
2.1. Introduction
In this chapter we try to explain how Newton’s basic laws had to be changed to take into
account the famous ‘null result’ of the Michelson–Morley experiment which appeared to show
that the speed of light was the same independent of the motion of the light-source.
In newtonian physics, there are such things as particles, masses and so on. The basic tenet
of newtonian physics, in modern language, is that there are inertial frames of reference, and an
observer at rest in such a frame observes free particles to remain at rest or travel at constant
speeds. S/he also observes the famous law F = ma to hold for masses acted upon by forces.
These inertial frames are such that if R is such a frame and R′ is moving with constant velocity
(i.e. constant speed and direction) with respect to R, then R′ is also an inertial frame. (See
Woodhouse, Special Relativity, Ch. 1 for more details. Those with an interest in history may
be interested in the end of Sect. 1.4, which indicates that Newton was aware of the idea of
relativity, but preferred to present things in a different way.)
Suppose that Alice and Bob are observers at rest respectively in R and R′ , and that R′ is
moving at constant speed relative to R. With suitable orientation of the axes, we’d expect
x′ = x − vt, y ′ = y, z ′ = z (2.1.1)
where v is the relative speed of the frames. (If Bob is sitting at (x′ , y ′ , z ′ ) = (0, 0, 0), then
Alice gives Bob’s coordinates as x = vt, y = z = 0.)
This transformation (often known as the Galilean transformation between inertial frames)
is clearly incompatible with the idea that the speed of light is the same in all inertial frames.
The idea that this should be the case—and more generally that all physics should appear the
same to all (inertial) observers—is the first of Einstein’s famous postulates:
1. The laws by which the states of physical systems undergo change are not affected,
whether these changes of state be referred to the one or the other of two systems of
coordinates in uniform translatory motion.
2. As measured in any inertial frame of reference, light is always propagated in empty
space with a definite velocity c that is independent of the state of motion of the emitting
body.
In slogan form:
1. The laws of physics are the same in all inertial frames of reference.
2. The speed of light in free space has the same value c in all inertial frames of reference.
If the laws of physics are the same for any inertial frame, then it follows that no experiment
can be performed that will single out one frame as preferred above all others. In particular,
there can be no ‘absolute standard of rest’ and only relative motion is physically meaningful.
Note that the constancy of the speed of light means that something has to go wrong with
the ‘obvious’ change of coordinates (2.1.1). More explicitly, this change of coordinates is not
compatible with Maxwell’s equations of electrodynamics.
It is convenient to make certain subsidiary assumptions explicit as well:
P1 Free particles and photons (light particles) appear to inertial observers to travel in
straight lines at constant speeds.
P2 Photons appear to travel at the same speed c to all inertial observers.
P3 The standard clock of one inertial observer appears to any other observer to run at a
constant rate.
Note that there are many choices of inertial coordinate systems. From the mathematical
point of view, this is because there are many different choices of basis of M with respect to
which η takes standard diagonal form. From the physical point of view this is because there are
many different inertial observers, all on an equal footing.
Remark 2.2.2. Note that if X and Y are two vectors in M , and if (e0 , e1 , e2 , e3 ) is a basis
as in (2.2.1) then if
X = X⁰e₀ + X¹e₁ + X²e₂ + X³e₃ , Y = Y⁰e₀ + Y¹e₁ + Y²e₂ + Y³e₃
we have
η(X, Y ) = X⁰Y⁰ − X¹Y¹ − X²Y² − X³Y³ .
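The component formula for η is easy to play with numerically. A minimal sketch of mine (Python with numpy, example vectors of my own choosing, not from the notes):

```python
import numpy as np

# The Minkowski form η of Remark 2.2.2, acting on component 4-vectors.
def eta(X, Y):
    return X[0]*Y[0] - X[1]*Y[1] - X[2]*Y[2] - X[3]*Y[3]

X = np.array([2.0, 1.0, 0.0, 1.0])
Y = np.array([1.0, 1.0, 1.0, 1.0])
assert eta(X, Y) == 0.0            # 2·1 − 1·1 − 0·1 − 1·1 = 0

# A nonzero null vector: (X⁰)² equals the sum of the spatial squares,
# yet the vector itself is not the zero vector.
P = np.array([np.sqrt(3.0), 1.0, 1.0, 1.0])
assert np.isclose(eta(P, P), 0.0)
```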
2.3. Worldlines
Recall that anything—particle, observer, photon—which exists for an extended period of
time, is described in Minkowski spacetime by a worldline. This is a curve in M consisting of all
the events through which our particle, observer, photon passes.
For example, suppose Alice is at rest at the spatial origin of an inertial coordinate system
(t, x, y, z). The events on Alice’s world line have coordinates of the form (t, 0, 0, 0), t being the
time on the clock that Alice has beside her.
More generally, Alice observes a particle by noting its (x, y, z) coordinates for different times
t. In other words she observes the particle’s world-line in the form of a curve
Γ(t) = (t, x(t), y(t), z(t))
in the given coordinates.
It is often useful to ‘decouple’ the parameter which parameterises the curve from an ob-
server’s time coordinate, replacing the above by the more general form
Γ(τ ) = (t(τ ), x(τ ), y(τ ), z(τ ))
so that all 4 coordinates depend on the parameter τ .
Example 2.3.1. If Alice is an inertial observer who sets up an inertial coordinate system
as above with herself at the (spatial) origin x = y = z = 0, then her worldline will be
t(τ ) = τ, x(τ ) = y(τ ) = z(τ ) = 0.
Definition 2.3.2. For the worldline Γ(τ ) of a particle, observer or photon in M, dΓ/dτ is
called the velocity 4-vector.
The use of the term 4-vector is traditional. It helps to distinguish this vector from ‘ordinary’
velocity vectors: e.g. the velocity vector of a particle as measured by an observer.
Note that in terms of the original parameterization, Γ(t) = (t, x(t), y(t), z(t)),
dΓ/dt = (1, dx/dt, dy/dt, dz/dt),
and the spatial part of this is the 3-vector
(dx/dt, dy/dt, dz/dt),
which is the instantaneous velocity of the particle as calculated by Alice when her clock says time t.
For now, we shall mainly be concerned with straight, constant-speed worldlines: i.e. where
Γ has the form
Γ(τ ) = X + V τ (2.3.1)
where X and V are constant vectors in M. (Here again we are regarding the spacetime and the vector space M as the same, by choice of an event E corresponding to the zero-vector of M.)
We now have to see how P2 and P4 are to be interpreted: photons travel at speed c = 1 as
measured by any inertial observer, and no particle is ever observed to travel faster than light.
2.3.1. What are the photon worldlines? Suppose that a photon is emitted by a laser at the event with coordinates (0, 0, 0, 0) and passes through the event (t, x, y, z), relative to the above inertial coordinate system. In other words, at the later time t, its spatial coordinates are (x, y, z). Then the distance covered is √(x² + y² + z²), but this must be equal to t as the speed is 1. In particular, if E and P are two events on the worldline of a (free) photon, then the displacement vector EP is null in the sense that η(EP, EP) = 0. [NB, a null vector need not be the zero vector!]
The following definition is useful:
Definition 2.3.3. Two events P and Q are null-separated if the displacement vector X = PQ is null, i.e. η(X, X) = 0.
Remark 2.3.4. This definition depends only upon the events P and Q, and the form η; it
does not depend upon any choice of inertial basis or coordinate system.
To flesh this remark out: we saw by calculation in a particular inertial frame that if P and Q are two events on a photon worldline, then the displacement vector PQ is a null vector. But the latter is a statement
purely about the geometry of M: it uses only the basic facts that given any two events we have
a displacement vector, and that we can feed vectors to η. In particular all inertial observers
agree about when a pair of events are null separated, and hence the speed of light is the same
for all such observers.
This leads us to the following
Hypothesis 2.3.5. The worldline of a photon has the form
Γ(τ ) = X + N τ, (2.3.2)
where X and N are constant vectors and N is null, i.e. η(N, N ) = 0.
This hypothesis is justified, to some extent, by the following:
Proposition 2.3.6. If P1 and P2 are any events on the worldline (2.3.2), then P1 and P2 are null-separated.
Proof. If P1 and P2 correspond to parameter values τ1 and τ2, then
P1P2 = (X + Nτ2) − (X + Nτ1) = (τ2 − τ1)N. (2.3.3)
Now, by the bilinearity of η,
η((τ2 − τ1)N, (τ2 − τ1)N) = (τ2 − τ1)² η(N, N) = 0. (2.3.4)
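Proposition 2.3.6 can also be checked numerically. The Python sketch below (illustrative only; the names `worldline` and `eta` are ours) builds a photon worldline from a null vector N and verifies that two events on it are null-separated:

```python
# Numerical check of Proposition 2.3.6: any two events on a photon
# worldline Gamma(tau) = X + N*tau, with N null, are null-separated.

def eta(X, Y):
    return X[0] * Y[0] - sum(x * y for x, y in zip(X[1:], Y[1:]))

def worldline(X, N, tau):
    return tuple(x + n * tau for x, n in zip(X, N))

X = (0.0, 1.0, 2.0, 3.0)   # arbitrary base event
N = (5.0, 3.0, 4.0, 0.0)   # null, since 5^2 = 3^2 + 4^2
assert eta(N, N) == 0.0

P1 = worldline(X, N, 0.7)
P2 = worldline(X, N, 2.3)
disp = tuple(b - a for a, b in zip(P1, P2))   # displacement vector P1P2
print(eta(disp, disp))                        # 0.0 up to rounding
```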
In summary, we have seen that if inertial coordinate systems are defined as in Hypothe-
sis 2.2.1 and free photon worldlines are as in Hypothesis 2.3.5, then all inertial observers agree
on the speed at which photons travel.
2.3.2. What are the free particle worldlines? Return to the two events E and P at the beginning of the previous section, and suppose now that they are on the worldline of a particle travelling at uniform speed v with 0 ≤ v < 1. Then we must have
√(x² + y² + z²) = |vt| < |t| (2.3.5)
and so
t² − x² − y² − z² > 0. (2.3.6)
Definition 2.3.7. A vector X ∈ M is timelike if η(X, X) > 0. Two events P and Q are timelike separated if the displacement vector PQ is timelike.
We now make the free-particle hypothesis: the worldline of a free particle has the form
Γ(τ) = X + Vτ (2.3.7)
where X and V are constant vectors of M and V is timelike. The parameter τ is called proper
time if η(V, V ) = 1. In this case, if P1 and P2 are two events on the worldline with parameter
values τ1 and τ2 respectively, then τ2 − τ1 is interpreted as the elapsed time between these two
events as measured by a clock carried by an observer on this worldline.
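The normalisation η(V, V) = 1 and the interpretation of τ as clock time can be illustrated numerically. In the Python sketch below (illustrative only, not from the notes), the 4-velocity of a particle moving at speed v = 0.6 is taken to be γ(1, v, 0, 0), as in the boost formulas later in the chapter:

```python
# Proper time along a straight timelike worldline: if eta(V, V) = 1,
# then tau2 - tau1 is the clock time between the corresponding events.
import math

def eta(X, Y):
    return X[0] * Y[0] - sum(x * y for x, y in zip(X[1:], Y[1:]))

v = 0.6                                # speed, in units with c = 1
gamma = 1.0 / math.sqrt(1.0 - v * v)   # gamma = 1.25 here
V = (gamma, gamma * v, 0.0, 0.0)       # 4-velocity, normalised so that...
assert abs(eta(V, V) - 1.0) < 1e-12    # ...tau is proper time

# Between tau = 0 and tau = 10 the moving clock records 10 units,
# while the elapsed coordinate time is gamma * 10.
tau_elapsed = 10.0
coordinate_time = V[0] * tau_elapsed
print(coordinate_time)   # ≈ 12.5
```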
Remark 2.3.9. As for null vectors, the notion of a vector being timelike is independent of
any choice of observer or coordinate system. In particular if one inertial observer thinks that a
particle is travelling at speed less than that of light, all observers will agree on this.
Remark 2.3.10. Note the analogy between a curve being parameterized by proper time
here and the idea of unit-speed curves being parameterized by arc-length for ‘ordinary’ curves
in euclidean space.
As in the case of photon worldlines, we started in Alice's coordinate system (t, x, y, z), and calculated that the events E and P are on the worldline of a particle moving at speed < 1 if (and only if) the displacement vector EP is timelike. This, however, is a statement which is
independent of any particular choice of inertial coordinate system. Thus it must be the case
that Bob, with an inertial coordinate system (t′ , x′ , y ′ , z ′ ) will also calculate that E and P are
events on the worldline of a particle moving at speed less than 1.
where
(V⁰)² − (V¹)² − (V²)² − (V³)² = 1. (2.4.3)
In particular V⁰ ≠ 0, and we find the time τ measured on Bob's clock is related to Alice's time coordinate t by the fixed multiple V⁰. So with all our hypotheses made about what inertial frames are, we see that each of P1–P4 is now satisfied.
[Figure: spacetime diagram showing Alice's (t, x) axes and Bob's (t′, x′) axes, together with photon worldlines and a free particle worldline.]
difference is τ₂ − τ₁. Alice, however, reckons that the time difference between these two events is γ(v)(τ₂ − τ₁), so the time elapsed is greater, according to Alice, by a factor of γ(v).
Uτ₁ + N = Y = Uτ₂ + N′ (2.7.1)
Now we calculate
η(U, EF) = η(U, N + ½(τ₁ − τ₂)U)
         = η(U, N) + ½(τ₁ − τ₂)
         = −½(τ₁ − τ₂) + ½(τ₁ − τ₂)
         = 0
as required.
If you didn’t like that proof, here is another. There are often several different ways to
accomplish the same thing.
Proof. The idea of this proof is to write everything in terms of the null vectors N and N′. It is perhaps a more symmetrical proof than the previous one. From (2.7.3), we obtain
U = (N′ − N)/(τ₁ − τ₂). (2.7.8)
We also have the two formulae for EF in (2.7.2). Adding these, we get
2 EF = N + N′. (2.7.9)
Now we calculate
η(U, EF) = (1/(2(τ₁ − τ₂))) η(N + N′, N′ − N) = 0, (2.7.10)
by using again the bilinearity of η to expand the RHS.
[Figure: Alice's worldline Γ(τ) = Uτ, the events E and F, and the null vectors N and N′ joining the events Uτ₁ and Uτ₂ on her worldline to F.]
Definition 2.7.2. If Alice is moving uniformly with 4-velocity vector U, then she reckons two events E and F to be simultaneous if
η(U, EF) = 0.
These events then have a well-defined spatial separation d, where
d² = −η(EF, EF).
However we look at it, the key point is that if we have two observers, Alice and Bob, moving
relative to each other, then they will generally disagree about which pairs of distant events are
simultaneous.
From the mathematical or geometric point of view, they have different 4-velocity vectors U and V. If F and G are distant events, Alice thinks they are simultaneous if η(U, FG) = 0, while Bob thinks they are simultaneous if η(V, FG) = 0. These are different conditions, and if one is satisfied, then there is no guarantee that the other one will also be.
Definition 2.7.3. Let P and Q be two particles. If Alice is an inertial observer with 4-velocity U, she measures the distance between these two particles at time τ on her clock by
• finding events F on the worldline of P and G on the worldline of Q which are simultaneous with the event E at time τ on her worldline; in other words, finding F and G such that η(U, EF) = 0, η(U, EG) = 0;
• calculating the distance as
d = √(−η(FG, FG)).
where u is the velocity of Alice relative to Bob, (u₁, u₂, u₃) are its components in Bob's coordinates, and
γ(u) = 1/√(1 − |u|²) (2.8.8)
as in (2.6.1). Hence
η(U, D) = −γ(u)du₁, η(U, V) = γ(u) (2.8.9)
and so from (2.8.5), we find
d′ = d√(1 − u₁²).
This is the famous Lorentz–Fitzgerald length contraction: if the component u₁ of the relative velocity of the observer in the direction of the rod is non-zero, then the observer judges the length of the rod to be less than d by the factor √(1 − u₁²). Note that there is no length contraction if the observer is moving at right angles to the rod.
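A quick numerical illustration of the contraction factor (illustrative Python sketch, not part of the notes; `contracted_length` is our name):

```python
# Length contraction: d' = d * sqrt(1 - u1**2), where u1 is the
# component of the relative velocity along the rod (units with c = 1).
import math

def contracted_length(d, u1):
    return d * math.sqrt(1.0 - u1 ** 2)

print(contracted_length(1.0, 0.8))   # ≈ 0.6: a unit rod at u1 = 0.8
print(contracted_length(1.0, 0.0))   # 1.0: no contraction transverse to the motion
```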
The set Fut(E) can be pictured as the solid half-cone whose boundary is the set of future-
pointing null vectors emanating from E.
Similarly, the set of all events G which can affect or influence E is
Past(E) = {G ∈ M such that GE is timelike or null future-pointing}
        = {G ∈ M such that EG is timelike or null past-pointing}. (2.10.2)
This can be pictured as the solid half-cone whose boundary is the set of past-pointing null
vectors emanating from E.
So although space and time are mixed up in the geometry of special relativity, there is still
a well defined notion of causality.
We have defined timelike and null vectors. If a vector is not timelike or null, it is called
spacelike:
Definition 2.10.6. If X ∈ M, then X is called spacelike if η(X, X) < 0. Two events E and F in M are said to be spacelike separated if η(EF, EF) < 0.
We end by noting the following:
Proposition 2.10.7. Suppose that E and F are events such that EF is future-pointing timelike. Then there exist inertial coordinates such that E has coordinates (0, 0, 0, 0) and F has coordinates (t, 0, 0, 0) with t > 0.
Suppose that E and F are spacelike separated events. Then there exists an inertial frame with respect to which E and F are simultaneous (e.g. E has coordinates (0, 0, 0, 0) and F has coordinates (0, d, 0, 0)). Moreover, there exist other coordinate systems in which E occurs before F.
Proof. The first follows from the basic fact that given any future-pointing timelike vector X, there is an oriented and time-oriented basis (e0, e1, e2, e3) with respect to which η is diagonal, and such that X is a positive multiple of e0. In such a basis, EF = (t, 0, 0, 0), where t > 0, and if we choose the origin so that the coordinates of E are (0, 0, 0, 0), then the coordinates of F will be (t, 0, 0, 0).
Similarly, if η(EF, EF) < 0, we can pick a multiple e1 of EF such that η(e1, e1) = −1. We extend this to a diagonalizing (oriented and time-oriented) basis of η, and then EF has the desired form. In particular, it is η-orthogonal to e0 and so these events will be judged simultaneous by an observer with 4-velocity e0.
For the last part, let V = e0 + λe1 . Then
η(V, e1 ) = −λ.
So if V is the 4-velocity vector of an observer, Bob, he will reckon that F happens after E if
λ < 0 and that F happens before E if λ > 0.
Remark 2.10.8. The above makes complete sense from the point of view of the radar method. See the picture below. Consider two inertial observers, Alice and Bob, and suppose that E is an event on both of their worldlines. To say that Alice judges E and F to be simultaneous means the following: if Alice bounces a light signal off F, sending it out at time τ₁ and receiving it back at time τ₂, then she assigns F the time (τ₁ + τ₂)/2, which is the time of E. In the diagram, Alice sends her light signal out at event A₁ and receives it at A₂, and E is the midpoint of the segment A₁A₂.
Now if Bob is heading towards F, it is clear that the light signal he needs to send to bounce off F has to be transmitted at event B₁ and received at event B₂. The event on his worldline that he judges to be simultaneous with F is therefore the midpoint of B₁B₂, shown as E′. It is clear from the geometry that the segment EB₁ is longer than EB₂, so E′ will be, as shown, before E on his worldline.
Similarly, if Bob is heading away from F, the event he judges simultaneous with F will be later, on his worldline, than the event E.
[Figure: the worldlines of Alice and Bob through E, the event F, the radar signals A₁ → F → A₂ and B₁ → F → B₂, and the event E′ on Bob's worldline that he judges simultaneous with F.]
2.10.2. Spatial and temporal components. We have seen that inertial observers, free particles, and photons have straight worldlines, and the basic feature of a worldline is the 4-velocity vector. It is often inconvenient to choose a full inertial basis to solve particular problems; instead, it is often enough to split Minkowski vectors into their spatial and temporal components with respect to a particular timelike vector.
Suppose that V is a timelike future-pointing 4-vector. Then we can write any vector X in terms of its components parallel to and η-orthogonal to V. That is,
X = λV + Y, where η(V, Y) = 0. (2.10.3)
Taking the scalar product with V,
η(V, X) = λη(V, V), so λ = η(V, X)/η(V, V). (2.10.4)
Then
Y = X − (η(V, X)/η(V, V)) V. (2.10.5)
More concretely, relative to an inertial basis in which V is a positive multiple of e0,
V = (V⁰, 0), X = (X⁰, ξ) (2.10.6)
where
η(V, X) = V⁰X⁰, η(V, V) = (V⁰)² (2.10.7)
and
Y = (0, ξ). (2.10.8)
Here 0 is the ordinary 3-dimensional zero-vector and ξ is also a euclidean 3-vector.
It is worth spelling out that if X and Z are any two Minkowski vectors with components
X = (X⁰, ξ), Z = (Z⁰, ζ)
then
η(X, Z) = X⁰Z⁰ − ξ · ζ.
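The split X = λV + Y can be sketched in code. The Python below is illustrative only; `split` is our name for the decomposition (2.10.3)–(2.10.5):

```python
# Decompose X as lambda*V + Y with eta(V, Y) = 0, for timelike V.

def eta(X, Y):
    return X[0] * Y[0] - sum(x * y for x, y in zip(X[1:], Y[1:]))

def split(V, X):
    lam = eta(V, X) / eta(V, V)                      # (2.10.4)
    Y = tuple(x - lam * v for x, v in zip(X, V))     # (2.10.5)
    return lam, Y

V = (2.0, 0.0, 0.0, 0.0)     # timelike (a multiple of e0)
X = (3.0, 1.0, 2.0, 0.0)
lam, Y = split(V, X)
print(lam)         # 1.5
print(eta(V, Y))   # 0.0: Y is eta-orthogonal to V
print(Y)           # (0.0, 1.0, 2.0, 0.0): the purely spatial part
```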
A photon’s velocity 4-vector will take the form
ω(1, e) (2.10.9)
where ω > 0 (the photon is travelling forward in time) and e is a unit vector. For physical
reasons, ω is identified with the frequency of the photon as measured by an observer with
4-velocity V .
Example 2.10.9. Relative velocity. Suppose that Alice and Bob are inertial observers with
4-velocity vectors U and V , with η(U, U ) = η(V, V ) = 1.
In Alice’s rest-frame,
U = (1, 0), V = γ(1, v) (2.10.10)
for some constant γ. This is, of course, the γ factor again, because the condition η(V, V ) = 1
says
γ 2 (1 − |v|2 ) = 1.
Thus v is the velocity vector of Bob as measured by Alice.
Remark 2.10.10. What we’ve just seen is a very useful way of calculating γ-factors: if
Alice and Bob are inertial observers with 4-velocities U and V , then the γ-factor of their relative
speed is equal to η(U, V ).
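This recipe is easy to check numerically. The Python sketch below (illustrative, not part of the notes) builds Bob's 4-velocity in Alice's rest frame and confirms that η(U, V) reproduces the γ-factor:

```python
# Remark 2.10.10: the gamma-factor of the relative speed of two
# inertial observers equals eta(U, V).
import math

def eta(X, Y):
    return X[0] * Y[0] - sum(x * y for x, y in zip(X[1:], Y[1:]))

v = (0.3, 0.4, 0.0)                          # Bob's velocity in Alice's frame, |v| = 0.5
gamma = 1.0 / math.sqrt(1.0 - sum(c * c for c in v))

U = (1.0, 0.0, 0.0, 0.0)                     # Alice's 4-velocity in her rest frame
V = (gamma,) + tuple(gamma * c for c in v)   # Bob's 4-velocity, eta(V, V) = 1

print(eta(U, V))   # ≈ 1.1547, which is gamma = 1/sqrt(1 - 0.25)
```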
Example 2.10.11. Alice, Bob and Chris are inertial observers with 4-velocity vectors U ,
V , and W , respectively, so η(U, U ) = η(V, V ) = η(W, W ) = 1. Suppose that Bob reckons that
Chris’s (relative) speed is w and Alice reckons Bob’s (relative) speed is u. What does Alice
reckon that Chris’s speed (relative to her) is?
Call the unknown speed ζ. Then from the above, the γ-factor of ζ is η(U, W ),
where u is the velocity vector of Alice relative to Bob and w is the velocity vector of Chris
relative to Bob.
Then
γ(ζ) = η(U, W ) = γ(u)γ(w)(1 − u · w) (2.10.14)
This is an answer, but it is instructive to rearrange it a bit. By squaring and taking the reciprocal,
1 − ζ² = (1 − u²)(1 − w²)/(1 − u · w)². (2.10.15)
Hence
ζ² = [(1 − u · w)² − (1 − u²)(1 − w²)]/(1 − u · w)² (2.10.16)
   = [1 − 2u · w + (u · w)² − 1 + u² + w² − u²w²]/(1 − u · w)² (2.10.17)
   = [(u² − 2u · w + w²) + (u · w)² − u²w²]/(1 − u · w)² (2.10.18)
   = [|u − w|² + (u · w)² − u²w²]/(1 − u · w)². (2.10.19)
This remarkably complicated formula nonetheless reproduces the classical answer |u − w|2 to a
first approximation if u and w are much less than the light-speed c = 1.
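As a sanity check, the composition formula can be verified numerically. In the Python sketch below (illustrative only), the vectors `u` and `w` are arbitrary choices of sub-light velocities relative to Bob:

```python
# Check of (2.10.14)-(2.10.16): the gamma-factor of Chris's speed
# relative to Alice is gamma(u)*gamma(w)*(1 - u.w), where u, w are
# the velocities of Alice and Chris relative to Bob (c = 1).
import math

def gamma_of(speed2):
    return 1.0 / math.sqrt(1.0 - speed2)

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

u = (0.5, 0.0, 0.0)   # Alice's velocity relative to Bob
w = (0.0, 0.5, 0.0)   # Chris's velocity relative to Bob

g_zeta = gamma_of(dot(u, u)) * gamma_of(dot(w, w)) * (1.0 - dot(u, w))
zeta2 = 1.0 - 1.0 / g_zeta ** 2          # the rearrangement (2.10.15)

# Compare with the closed form (2.10.16):
num = (1.0 - dot(u, w)) ** 2 - (1.0 - dot(u, u)) * (1.0 - dot(w, w))
zeta2_formula = num / (1.0 - dot(u, w)) ** 2

print(abs(zeta2 - zeta2_formula) < 1e-12)   # True
print(zeta2 < 1.0)                          # True: still slower than light
```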
CHAPTER 4
Multivariable calculus
Remark 4.1.9. The inverse function theorem says that if x = x(y) is merely smooth for y ∈ Ω′ and x ∈ Ω, and if the Jacobian matrix is invertible at a point q, say, in Ω′, then in fact x = x(y) is invertible, at least if you restrict the transformation to a small ball B′ containing q inside Ω′ and its image W ⊂ Ω. That is, after restricting in this way, there is a smooth y = y(x), for x ∈ W, such that y(x) ∈ B′, inverting x = x(y).
Thus the inverse function theorem is a (partial) converse to the fact that Jacobians of coordinate transformations must be invertible.
Example 4.1.10. In the case of polar coordinates, the determinant of the Jacobian is just r. This is invertible if and only if r ≠ 0. This ties in with the fact that polar coordinates go wrong at r = 0.
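A small Python sketch of this computation (illustrative, not part of the notes; `jacobian_det` is our name):

```python
# Jacobian of (r, theta) -> (x, y) = (r cos(theta), r sin(theta)):
# its determinant is r, so the map is locally invertible iff r != 0.
import math

def jacobian_det(r, theta):
    # [[dx/dr, dx/dtheta], [dy/dr, dy/dtheta]]
    dxdr, dxdt = math.cos(theta), -r * math.sin(theta)
    dydr, dydt = math.sin(theta), r * math.cos(theta)
    return dxdr * dydt - dxdt * dydr

print(jacobian_det(2.0, 0.7))   # ≈ 2.0: equals r, for any theta
print(jacobian_det(0.0, 1.3))   # 0.0: polar coordinates break down at the origin
```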
4.2.4. Transformation laws for vector fields and covector fields. Let V be a vector field in Ω and let x = x(y) (for y ∈ Ω′) be a change of coordinates with inverse y = y(x). We get a vector field Ṽ in Ω′ as follows: Ṽ is supposed to differentiate functions in Ω′. We know how to differentiate functions in Ω. But we can transfer a function in Ω′ to Ω by change of variables. So we define
(Ṽ g̃)(y) = (V g)(x(y)) (4.2.7)
where g and g̃ are related as in (4.1.6).
Proposition 4.2.2. If V = Σ_j V^j (∂/∂x^j) in terms of the x^j in Ω and Ṽ = Σ_j Ṽ^j (∂/∂y^j) in terms of the y^j in Ω′, then
Ṽ^i = Σ_j V^j (∂y^i/∂x^j). (4.2.8)
Proof. The chain rule tells us everything: from g(x) = g̃(y(x)), we get
∂g/∂x^j = Σ_i (∂y^i/∂x^j)(∂g̃/∂y^i).
Multiplying by V^j and summing,
Σ_j V^j ∂g/∂x^j = Σ_{i,j} V^j (∂y^i/∂x^j)(∂g̃/∂y^i).
But the RHS is supposed to be Σ_i Ṽ^i (∂g̃/∂y^i), so the result follows by equating coefficients.
Remark 4.2.3. This is an example of covariance (as opposed to invariance). The coefficients
of a vector depend on a choice of coordinates, but they transform in a predictable and linear
way. In particular if the coefficients are all zero at a given point in one coordinate system then
they are also zero in any other coordinate system. This is as it should be: if there is no wind
at a particular point (and time) in the atmosphere, then all observers should agree on this fact,
regardless of how they choose their coordinates!
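The covariance in Proposition 4.2.2 can be tested numerically for the Cartesian-to-polar change of coordinates. In the Python sketch below (illustrative only), the test function `g` and the base point are arbitrary choices, and the derivatives are approximated by central finite differences:

```python
# Check that the transformed components V~^i = V^j dy^i/dx^j
# differentiate the transferred function g~ the same way V
# differentiates g, i.e. (V~ g~)(y) = (V g)(x(y)).
import math

def g(x, y):                 # an arbitrary smooth test function
    return x ** 2 * y + math.sin(y)

def g_tilde(r, t):           # the same function in polar coordinates
    return g(r * math.cos(t), r * math.sin(t))

x0, y0 = 1.2, 0.8                              # base point
r0, t0 = math.hypot(x0, y0), math.atan2(y0, x0)
Vx, Vy = 0.7, -0.3                             # components of V at the point

# Transformation law: V~^r = V^x dr/dx + V^y dr/dy, similarly for theta.
Vr = Vx * (x0 / r0) + Vy * (y0 / r0)
Vt = Vx * (-y0 / r0 ** 2) + Vy * (x0 / r0 ** 2)

h = 1e-6                                       # central finite differences
Vg = Vx * (g(x0 + h, y0) - g(x0 - h, y0)) / (2 * h) \
   + Vy * (g(x0, y0 + h) - g(x0, y0 - h)) / (2 * h)
Vg_tilde = Vr * (g_tilde(r0 + h, t0) - g_tilde(r0 - h, t0)) / (2 * h) \
         + Vt * (g_tilde(r0, t0 + h) - g_tilde(r0, t0 - h)) / (2 * h)

print(abs(Vg - Vg_tilde) < 1e-6)   # True: the two computations agree
```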
The classical formula
dx^j = Σ_i (∂x^j/∂y^i) dy^i (4.2.9)
suggests that if Σ_j ω̃_j dy^j is to agree with Σ_j ω_j dx^j, then we should have
Σ_j ω_j dx^j = Σ_{i,j} ω_j (∂x^j/∂y^i) dy^i,
and hence
ω̃_i = Σ_j ω_j (∂x^j/∂y^i). (4.2.10)
Note that even allowing for the difference between upstairs and downstairs indices, (4.2.8) and (4.2.10) are different transformation laws.
The rule (4.2.10) gives a way of transferring a covector field ω in Ω to a new covector field ω̃ in Ω′. We already have a rule for transferring vector fields from Ω to Ω′. These are compatible in the sense that the contraction is invariant:
Proposition 4.2.4. Let ω and V be a covector field and a vector field on Ω and let ω̃ and Ṽ be the corresponding covector field and vector field on Ω′. Then we have
⟨Ṽ, ω̃⟩ = ⟨V, ω⟩ (4.2.11)
where the LHS is calculated at y in Ω′ and the RHS at x = x(y) in Ω.
Using the fact that the Jacobians (∂y^i/∂x^j) and (∂x^i/∂y^j) are inverse to each other, (4.2.10) is seen to be equivalent to
ω_j = Σ_i ω̃_i (∂y^i/∂x^j). (4.2.13)
Hence the RHS of (4.2.12) can be written
Σ_{i,j} V^j (∂y^i/∂x^j) ω̃_i = Σ_j V^j ω_j = ⟨V, ω⟩.
This is also an n-dimensional vector space, independent of choice of coordinates. The duality between T_pΩ and T*_pΩ is given by
⟨V, ω⟩_p = Σ_j V^j ω_j (4.2.17)
as above, but now producing a number rather than a function on the RHS.
In order for this to work, of course, it is essential that if an index is repeated then it must not occur anywhere else in the expression, so for example
A_i B^i C_i is not OK.
Multiple sums (unfortunately) very often occur. For instance, if L and L̃ are matrices with components L^i_j and L̃^i_j, so that
LX has components Σ_j L^i_j X^j = L^i_j X^j (summation convention)
and
L̃X has components Σ_j L̃^i_j X^j = L̃^i_j X^j,
then
LL̃X has components L^i_p [L̃X]^p = L^i_p [L̃^p_q X^q] = L^i_p L̃^p_q X^q. (4.3.1)
When the summation convention is in operation, repeated indices are dummy indices in the sense that
A_i B^i = A_p B^p = A_s B^s
as each of these is equal to
A_1 B^1 + A_2 B^2 + · · · + A_n B^n.
The expression for the components of LL̃X in (4.3.1) is unpacked as
Σ_{p=1}^n Σ_{q=1}^n L^i_p L̃^p_q X^q
and the summation over p corresponds to matrix multiplication of L and L̃, while the summation over q corresponds to the multiplication of L̃ by the column vector with components X^i.
Here is an example where you have to be careful to change the dummy indices to get an unambiguous expression. Suppose α and β are covector fields and X and Y are vector fields. Consider
P = ⟨X, α⟩⟨Y, β⟩. (4.3.2)
We can write
⟨X, α⟩ = X^i α_i, ⟨Y, β⟩ = Y^i β_i.
Substitution of these into (4.3.2) gives
P = X^i α_i Y^i β_i.
However, this is an ambiguous expression because the index i has been overworked, appearing 4 times. So before putting them together we should change one of the dummy indices, writing (say)
Y^i β_i = Y^j β_j.
Thus
P = X^i α_i Y^j β_j
is an unambiguous way to write P using the summation convention.
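The bookkeeping here can be made concrete by writing the sums as explicit loops. The Python sketch below (illustrative only; the component values are arbitrary) also shows that the ambiguous single-sum reading gives a different number:

```python
# The summation convention unpacked as explicit loops, with n = 3.
n = 3
X = [1.0, 2.0, 3.0]
alpha = [0.5, -1.0, 2.0]
Y = [2.0, 0.0, 1.0]
beta = [1.0, 1.0, -1.0]

pair_X_alpha = sum(X[i] * alpha[i] for i in range(n))   # <X, alpha> = X^i alpha_i
pair_Y_beta = sum(Y[j] * beta[j] for j in range(n))     # <Y, beta> = Y^j beta_j
P = pair_X_alpha * pair_Y_beta                          # X^i alpha_i Y^j beta_j

# The ambiguous expression X^i alpha_i Y^i beta_i, read as a single sum:
bad = sum(X[i] * alpha[i] * Y[i] * beta[i] for i in range(n))

print(P, bad)   # 4.5 -5.0: generally different numbers
```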
Example 4.3.1. Write the expression
X^i α_j Y^j β_i
without indices, in terms of the pairing operation ⟨·, ·⟩.
Definition 4.3.2. The Kronecker δ has components δ^j_k, equal to 1 if j = k and 0 otherwise.
This is the representation, in terms of indices, of the identity matrix.
Remark 4.3.3. If, later on in this chapter or the course, you find the expressions with
multiple repeated indices confusing, it can help to put the Σ signs back in. The summation
convention does take some getting used to.
The proof is left as an exercise: you have to write down the transformation laws and check
that the components of ω transform correctly.
We can further form the scalar
B(X, Y) = B_ij X^i Y^j. (4.5.5)
As the notation suggests, this is a well-defined scalar function; its value at a point does not
depend on the coordinates used to write out the components of B, X and Y .
Remark 4.5.2. The formula (4.5.5) gives another way to think about tensors of type (0, 2).
Namely, you can reverse the logic and define such a B to be a smoothly varying bilinear form
Bp on the tangent space Tp Ω, for each p ∈ Ω. Smoothness can be defined by saying that the
coefficients Bij are smooth functions in Ω for any choice of coordinates x. If we do this for the
tangent space, and require B(X, Y ) to be invariant (i.e. independent of choice of coordinates),
then we say that B is a tensor of type (0, 2).
Tensors of type (0, 2) are important because the metric tensor, which is the fundamental
object in GR, is an example.
Remark 4.5.3. It is very important not to switch the order of the dx^i symbols in computations of this kind. In other words, dx^i dx^j ≠ dx^j dx^i. Indeed the first one is the bilinear form B such that B(X, Y) = 0 unless X = ∂_i, Y = ∂_j, whereas the second represents the bilinear form C such that C(X, Y) = 0 unless X = ∂_j, Y = ∂_i.
4.5.2. Tensor fields of type (1, 1). Suppose that for each p in Ω, we have a linear map A(p) from T_pΩ to T_pΩ which varies smoothly with p. Such a thing is called a smooth tensor field of type (1, 1). In coordinates, A has an expression of the form
A = A^k_j dx^j (∂/∂x^k) (4.5.6)
where the A^k_j are a collection of n² functions of x. A can be pictured as an n × n matrix whose entries are smooth functions of x.
We obtain the transformation law under a change of coordinates by substituting
∂/∂x^k = (∂y^q/∂x^k)(∂/∂y^q), dx^j = (∂x^j/∂y^p) dy^p (4.5.7)
into (4.5.6), getting
A = A^k_j (∂x^j/∂y^p)(∂y^q/∂x^k) dy^p (∂/∂y^q). (4.5.8)
Hence the transformation law is
Ã^q_p = A^k_j (∂x^j/∂y^p)(∂y^q/∂x^k). (4.5.9)
Example 4.5.4. The identity matrix is an example of a (1, 1) tensor. Its components in any coordinate system are the Kronecker δ, δ^j_k.
The transformation law (4.5.9) means that if X is a vector field on Ω then Y = AX, with components A^j_k X^k, is again a vector field on Ω. This means that under coordinate transformations we have Ỹ = ÃX̃, where the relation between Ã and A is given by (4.5.9) and the transformation law (4.2.8) is used for the components of the vector fields X and Y.
4.5.3. Tensor fields of type (2, 0). We’ve seen tensors with two downstairs indices and
one up and one down. The zoo of two-index tensors is completed by the ones with two upstairs
indices.
We give the transformation law first:
Definition 4.5.5. A tensor field H of type (2, 0) is an object whose components H^ij after a choice of coordinates transform according to the rule
H̃^pq = H^ij (∂y^p/∂x^i)(∂y^q/∂x^j). (4.5.10)
4.7. Manifolds
A manifold is, roughly speaking, a topological space M, with the additional structure necessary to be able to speak of smooth functions from M to R. This additional structure is called a smooth atlas and consists of systems of local coordinates satisfying certain compatibility conditions. A function M → R is then called smooth if it is smooth when written in terms of any of these local coordinate systems.
We are not going to get into the details of what a topological space is: it is a set of points
with enough structure (open sets) to be able to define continuous functions.
As in the definition of curvilinear coordinates on an open subset of Rn , suppose we have a
set of n continuous functions x(p) = (x1 (p), . . . , xn (p)) from U ⊂ M with image some open set
Ω of Rn .
Definition 4.7.1. The functions (x1 , . . . , xn ) from U to Ω form a local coordinate system
on M if the map U → Ω is one-one and onto, and if the inverse is continuous.
Thus every point p of U gets labelled by an ordered set of n real numbers which we’re
calling the coordinates of p, and conversely if this set of labels is taken from Ω, then it is the
label of one and only one point of U .
Definition 4.7.2. If p0 ∈ U is a given point, we say the coordinate system is centred at p0
if xj (p0 ) = 0 for all j.
Definition 4.7.3. An atlas on M is a collection of local coordinate systems x_ν : U_ν → Ω_ν, where the open sets U_ν cover M.
In this definition the subscript ν does not refer to the different components of the coordinate
system, but rather to the different local coordinate systems needed to cover M .
Remark 4.7.4. The individual local coordinate systems xν : Uν → Ων are often called
charts: thus an atlas is a set of a lot of charts (which made a lot more sense before everyone
was using GPS to find their way around).
The previous example generalises to functions of any number of variables, but the condition
on the non-vanishing of at least one of the partials at every point on the level-set f = c continues
to be essential.
It is interesting to note that the null cone, defined by
t2 − x 2 − y 2 − z 2 = 0
exactly fails to satisfy this condition on the partials at the origin: all partials vanish there as
well.
4.7.1. The tangent space. Let M be a smooth manifold. For each point p in M , we
can use the local coordinates defined near p to define the tangent space. We can either say
that it is the abstract vector space spanned by the partials corresponding to any choice of
local coordinates from the atlas or we can say that it is the space of directional derivatives at
p (and then show that this space is an n-dimensional vector space). Either way Tp M is an
n-dimensional vector space naturally associated with the point p.
We can now define a vector field X on M as a function which assigns to each p in M , a
vector X_p in T_pM, which is required to vary smoothly with p. As in the case of an open subset of R^n, 'varying smoothly with p' means: when expanded as a linear combination of the ∂/∂x^j, the coefficients are smooth (in the domain of the coordinate system).
4.7.2. The cotangent space. This is the dual to the tangent space. If f is a smooth
function on M , then df is a smooth covector field on M : at each point it is in the dual space
Tp∗ M and varies smoothly with p.
If X is a vector field and f is a function, then Xf is the directional derivative of f with
respect to X. It is again a smooth function on M . It can also be written hX, df i, where h·, ·i
is the pairing between T M and T ∗ M .
4.7.3. General tensors. Taking it further, we can extend the idea of a tensor field of any type (r, s) to a manifold M, using the low-tech definition above: in any chart a tensor is given by a collection of components, and these are required to transform according to (4.6.2), where x = x(y) is any of the change-of-coordinates maps arising from a smooth atlas on M.
4.7.4. Other smooth gadgets. With the aid of a smooth atlas we can define more than just smooth functions on M. For example, a smooth curve Γ on M is defined as a continuous map Γ : I → M (I is an interval) with the property that the corresponding maps Γ_ν : I → Ω_ν are all smooth, where
Γ_ν(τ) = x_ν(Γ(τ)) if Γ(τ) ∈ U_ν.
Similarly (I’ll omit the details) if M and M ′ are two manifolds, and F : M → M ′ is a mapping,
we can define what it means for F to be a smooth map between manifolds. The idea is that we
can look at F using the charts on M and M ′ and define F to be smooth if and only if all these
functions are smooth. The interested reader is referred to any standard introductory book on
differential geometry.
Remark 4.7.13. The formalism of general relativity works most naturally on the assumption
that space-time is a smooth 4-dimensional manifold. This is particularly important when trying
to understand black holes and the large-scale structure of the universe. For the purposes of this
course we shall mostly work with space-times that are subsets of R4 : but we shall need to work
as if it were a manifold, in other words, without assigning any privileged role to the standard
flat coordinates on R4 .
CHAPTER 5
The ‘happiest thought of Einstein’s life’ was a brilliant insight which led to the ‘geometriza-
tion’ of gravity. The physics you observe in a spaceship, with its engines switched off, far from
any source of gravity, is the same as the physics you observe if you are freely falling in a lift
under the influence of the earth’s gravity. This suggests that particles which are acted upon by
no force other than gravity should be regarded as freely falling, or essentially inertial. To incor-
porate the idea mathematically, we need an extension of the formalism of SR which is 'generally covariant'. The formalism of SR (4-vectors) was invariant under the Lorentz group (and even the Poincaré group), but these transformations are essentially linear. In GR we need a set-up which allows for the 'physics' to be equivalent under arbitrary coordinate transformations.
In this chapter we shall introduce curved 4-dimensional space-times and study geodesics.
The hypothesis is that freely falling particles follow such geodesics. We shall also show that at any event in space-time, 'local inertial coordinates' can be introduced. These are coordinates in which the coefficients of the metric agree with those of the Minkowski metric up to corrections of second order.
In the next chapter we shall introduce curvature. This is the quantity (it is, unfortunately,
a 4-index tensor) which allows us to tell whether space-time is actually curved or not. One
remark about this here. The general equivalence principle described above: that the physics
of the freely falling lift and the inertial spaceship should be the same—only applies on small
scales of space and time, or ‘locally’ as we say in the trade. There is a difference in behaviour
of freely falling particles in deep space as opposed to near the earth. According to Newton, the
gravitational field near the earth is not uniform: in fact it is given by the famous inverse-square
law. This means that if Alice and Bob are freely falling near the earth, and if Alice is nearer to
the Earth’s surface than Bob, then her acceleration will be greater. In other words the relative
acceleration of Alice, as measured by Bob, will be non-zero. In deep space, far from any stars,
planets, or other gravitating objects, the relative acceleration of two observers will be zero. This
is an extremely subtle point, tied in with the notion of curvature. More on that story later.
Remark 5.1.2. For those of you who haven’t yet mastered §4.7, you can think that M is
just fancy notation for R4 , or perhaps an open subset of it. But we are not allowed to use the
vector-space structure of R4 in developing the formalism: everything we do from now on has to
use the metric g and must work in any choice of local coordinates, or be ‘generally covariant’
to use somewhat old-fashioned terminology.
Recall that relative to a choice of (local) coordinates xa = (x0 , x1 , x2 , x3 ), the metric has
the form
ds2 = gab dxa dxb . (5.1.1)
Here the summation convention is in force and the components gab of g with respect to the
coordinates xa form a 4 × 4 symmetric matrix whose entries are smooth functions. To say
the metric is lorentzian is to say that at any point x, the matrix (gab (x)) is invertible and has
signature (+, −, −, −).
Example 5.1.9. Spherical polars 3-dimensional spherical polar coordinates are given by
x = r sin θ cos ϕ, y = r sin θ sin ϕ, z = r cos θ. (5.1.3)
Show that the euclidean metric
ds2 = dx2 + dy 2 + dz 2
takes the form
ds2 = dr2 + r2 (dθ2 + sin2 θdϕ2 ). (5.1.4)
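This change of variables can be checked mechanically. The following sympy sketch (my own illustration, not part of the notes) computes the components of the euclidean metric in spherical polars as JᵀJ, where J is the Jacobian of the map (5.1.3):

```python
import sympy as sp

# Sketch (my own check, not from the notes): pull the euclidean metric back
# through spherical polar coordinates.  If X(r, theta, phi) is the map
# (5.1.3), the metric components in the new coordinates are J^T J.
r, th, ph = sp.symbols('r theta phi', positive=True)
X = sp.Matrix([r*sp.sin(th)*sp.cos(ph),   # x
               r*sp.sin(th)*sp.sin(ph),   # y
               r*sp.cos(th)])             # z
J = X.jacobian([r, th, ph])
g = sp.simplify(J.T * J)   # expect diag(1, r^2, r^2 sin^2 theta), i.e. (5.1.4)
```

The off-diagonal entries simplify to zero, so the metric really is diagonal in these coordinates, as (5.1.4) asserts.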
5.1.1. Lorentzian metric at a point.
Proposition 5.1.10. Suppose that
ds2 = gab dxa dxb
is a 4-dimensional lorentzian metric defined near the point p with coordinates xa (p) = 0 (a
coordinate system centred at p). Then there are coordinates y a , also centred at p, such that
g̃_ab(p) = η_ab.
Thus, very near any given point p of M , the geometry of M is approximately the same as
Minkowski space.
Remark 5.1.11. We shall do better than this: in §5.5, we shall see that we can choose
coordinates so that
g̃_ab(x̃) = η_ab + O(|x̃|²)
for small x̃. Such a choice of coordinates will be called local inertial coordinates. They give the
best approximating Minkowski space at the point with coordinates x̃ = 0.
Proof. It is sufficient to make a change of coordinates which is linear:
x^a = J^a_b y^b.
Then by the transformation law for tensors of type (0, 2),
g̃_ab = g_pq J^p_a J^q_b.
In matrix form this is
g̃ = J^t gJ.
By the basic theorem about diagonalization of symmetric bilinear forms, J can be chosen to
make g̃(0) diagonal, with diagonal entries ±1. The signs are determined by the signature of g.
Since the latter is lorentzian, this yields the Minkowski metric η = diag(1, −1, −1, −1).
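The diagonalization step can be illustrated numerically. The sketch below (my own illustration, not from the notes) builds a linear change of coordinates J with Jᵀ g J = diag(1, −1, −1, −1) from the eigendecomposition of a sample lorentzian matrix g(p):

```python
import numpy as np

# Numerical sketch (mine, not from the notes): given the symmetric matrix
# g_ab(p) of a lorentzian metric at a point, produce a linear change of
# coordinates J with J^T g J = diag(1, -1, -1, -1).
def minkowski_frame(g):
    vals, Q = np.linalg.eigh(g)        # g = Q @ diag(vals) @ Q.T
    J = Q / np.sqrt(np.abs(vals))      # rescale each eigenvector column
    order = np.argsort(-vals)          # put the + eigenvalue first
    return J[:, order]

# a sample symmetric matrix of signature (+, -, -, -)
g = np.array([[ 2.0,  0.3,  0.1,  0.0],
              [ 0.3, -1.5,  0.2,  0.0],
              [ 0.1,  0.2, -1.0,  0.4],
              [ 0.0,  0.0,  0.4, -2.0]])
J = minkowski_frame(g)
eta = J.T @ g @ J                      # should be diag(1, -1, -1, -1)
```

Rescaling each eigenvector by 1/√|λ| turns the eigenvalues into their signs, which is exactly the ±1 diagonal of the proposition.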
5.1.2. Timelike/spacelike/null.
Definition 5.1.12. A tangent vector X = X a ∂a is called timelike at p in M if
g(X, X)(p) = gab X a X b |p > 0,
null if
g(X, X)(p) = gab X a X b |p = 0
and spacelike if
g(X, X)(p) = gab X a X b |p < 0.
The set of null vectors at a point p forms a cone (in T_p M); null vectors are supposed to be
tangent to photon worldlines through p.
5.3. Geodesics
In special relativity, inertial observers were taken to travel at constant speed along straight
lines in Minkowski space M. One of the definitions of straight line is a curve which minimises
the energy (cf. §1.4), amongst all those with fixed endpoints.
Using the space-time metric g on M , we can do the same thing.
Definition 5.3.1. The energy of a curve γ : [t0, t1] −→ M is defined to be
E[γ] = ½ ∫_{t0}^{t1} g(γ̇(t), γ̇(t)) dt. (5.3.1)
A geodesic with end-points p and q is a curve which extremizes the energy among all curves
with γ(t0) = p, γ(t1) = q.
Hypothesis 5.3.2. In GR, freely falling particles (and free photons) follow geodesics, time-
like for particles and null for photons. Here ‘freely falling’ means ‘acted upon by no force except
gravity’.
Definition 5.3.3. Let xa = (x0 , x1 , x2 , x3 ) be a given coordinate system, such that the
metric coefficients are gab ,
ds2 = gab dxa dxb .
The Christoffel symbols of gab are defined by the formula
Γ^c_ab = ½ g^cs (∂_a g_bs + ∂_b g_as − ∂_s g_ab). (5.3.2)
Remark 5.3.4. Note the symmetry of Γ in its two lower indices,
Γcab = Γcba . (5.3.3)
Theorem 5.3.5. Let γ(t) be a curve in M and suppose that in some coordinate system, it is
given by t 7→ xc (t). Then the Euler–Lagrange equations for E (γ) are equivalent to the equations
ẍc + Γcab ẋa ẋb = 0. (5.3.4)
where Γcab are as in the previous definition. If xc (t) is a geodesic, then gcd ẋc ẋd is constant.
Remark 5.3.6. The system of equations (5.3.4) are called the geodesic equations. They are
frequently a more convenient way of getting at the Γs, as we shall see in examples.
Proof. This is a calculus of variations problem with Lagrangian L(x, ẋ) = ½ g(x)[ẋ, ẋ] =
½ g_ab(x) ẋ^a ẋ^b. We have
∂L/∂ẋ^s = g_sb ẋ^b,   ∂L/∂x^s = ½ ∂_s g_ab ẋ^a ẋ^b (5.3.5)
so
(d/dt)(∂L/∂ẋ^s) = g_sb ẍ^b + (∂g_as/∂x^b) ẋ^a ẋ^b. (5.3.6)
Thus the Euler–Lagrange equations are
g_sb ẍ^b + (∂g_as/∂x^b) ẋ^a ẋ^b − ½ ∂_s g_ab ẋ^a ẋ^b = 0. (5.3.7)
Now multiply through by g^cs (and sum over s):
ẍ^c + g^cs ((∂g_as/∂x^b) − ½ ∂_s g_ab) ẋ^a ẋ^b = 0. (5.3.8)
Now
(∂g_as/∂x^b) ẋ^a ẋ^b = ½ ((∂g_as/∂x^b) + (∂g_bs/∂x^a)) ẋ^a ẋ^b (5.3.9)
and substituting this into (5.3.8), taking into account the definition of the Γs, yields (5.3.4).
For the last part, the Lagrangian is homogeneous of degree 2 in the velocities, and so is
conserved along a solution curve (Proposition 1.4.4: the potential energy part is zero in the case
at hand).
Computing the Γs is the first step in computing the curvature of the metric, and computing
the geodesic equations is needed to understand particle (and photon) motion in GR. We therefore
give some worked examples. In each case, the Γs are read off the geodesic equations (5.3.4) rather
than from the formula (5.3.2).
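As a cross-check on either route, formula (5.3.2) can also be evaluated mechanically. Here is a sympy sketch (my own, not part of the notes), applied to the 2D hyperbolic metric ds² = (dx² + dy²)/x² of Example 5.3.9; it reproduces the symbols quoted later in (6.5.12):

```python
import sympy as sp

# Sketch (mine, not from the notes): formula (5.3.2) evaluated mechanically.
def christoffel(g, coords):
    """Gamma[c][a][b] = (1/2) g^{cs}(d_a g_bs + d_b g_as - d_s g_ab)."""
    n, ginv = len(coords), g.inv()
    return [[[sp.simplify(sp.Rational(1, 2)*sum(
                ginv[c, s]*(sp.diff(g[b, s], coords[a])
                            + sp.diff(g[a, s], coords[b])
                            - sp.diff(g[a, b], coords[s]))
                for s in range(n)))
              for b in range(n)] for a in range(n)] for c in range(n)]

# the 2D hyperbolic metric ds^2 = (dx^2 + dy^2)/x^2 of Example 5.3.9
x, y = sp.symbols('x y', positive=True)
G = christoffel(sp.diag(1/x**2, 1/x**2), [x, y])
# expect Γ^1_11 = -1/x, Γ^1_22 = 1/x, Γ^2_12 = Γ^2_21 = -1/x, rest zero
```

The same function applied to any other metric in these notes gives its Γs directly from the definition.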
Example 5.3.7. Minkowski space. This is
ds² = η_ab dx^a dx^b.
The Lagrangian is
L = ½ η_ab ẋ^a ẋ^b.
Then
∂L/∂ẋ^a = η_ab ẋ^b,   ∂L/∂x^a = 0. (5.3.10)
Then the geodesic equations are
(d/dτ)(∂L/∂ẋ^a) = η_ab ẍ^b = 0. (5.3.11)
Thus the geodesic equations are ẍ^b = 0, from which we read that all Γs are zero. The geodesics
have the form
x^a(τ) = Y^a + U^a τ.
Of course, we already knew this.
Example 5.3.9. The geodesics in 2D hyperbolic space. Rather than tackle the second-
order equations (5.3.13) directly, it is better to work with conserved quantities. The first is the
length of the velocity vector. If we assume τ is arc-length (i.e. g(γ̇, γ̇) = 1), then
ẋ2 + ẏ 2 = x2 . (5.3.15)
The Lagrangian is also independent of y, and so ∂L/∂ẏ is constant along geodesics. So in
addition to (5.3.15), we also have
ẏ/x² = C (constant). (5.3.16)
If C = 0, then ẏ = 0, so y = y0 (constant). Substituting this into (5.3.15) we get dx/x = ±dτ.
Picking the + sign, and integrating, we get x = x0 eτ . Thus one family of geodesics are half-lines
parallel to the x-axis, given by
(x(τ ), y(τ )) = (x0 eτ , y0 ).
Note the non-standard parameterization of these straight lines. In particular, the ‘boundary
point’ (0, y0 ) is infinitely far away in τ : we require τ → −∞ for (x, y) → (0, y0 ). This is not
outrageous because the metric (5.3.12) blows up at these points with x = 0.
Now we have to deal with the case C ≠ 0. For this, divide (5.3.15) through by ẏ²,
(dx/dy)² + 1 = x²/ẏ² = 1/(C²x²), (5.3.17)
where we’ve used (5.3.16) and
ẋ/ẏ = dx/dy. (5.3.18)
Rearranging (5.3.17), we obtain
Cx dx/√(1 − C²x²) = ±dy
and this can be integrated to get
±√(1 − C²x²) = C(y − y0)
for some constant of integration y0. This can be rearranged to give
x² + (y − y0)² = C⁻², x > 0, (5.3.19)
i.e. a semicircle with diameter along the y axis. It is pleasing that as C → 0, the radius C⁻¹
tends to infinity and these semicircles approach the straight half-lines that we saw previously.
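The semicircle description can be tested numerically. The sketch below (mine, not from the notes) integrates the second-order geodesic equations ẍ = (ẋ² − ẏ²)/x, ÿ = 2ẋẏ/x (read off from the Γs in (6.5.12)) with a hand-rolled RK4 step, and checks that the orbit stays on the circle (5.3.19) and that (5.3.16) is conserved:

```python
import numpy as np

# Numerical sketch (mine, not from the notes): integrate the hyperbolic
# geodesic equations and check the conserved quantities of Example 5.3.9.
def rhs(s):
    x, y, xd, yd = s
    return np.array([xd, yd, (xd**2 - yd**2)/x, 2*xd*yd/x])

def rk4_step(s, h):
    k1 = rhs(s); k2 = rhs(s + h/2*k1)
    k3 = rhs(s + h/2*k2); k4 = rhs(s + h*k3)
    return s + h/6*(k1 + 2*k2 + 2*k3 + k4)

s = np.array([1.0, 0.0, 0.0, 1.0])    # x=1, y=0, unit speed: xd^2+yd^2 = x^2
for _ in range(2000):                  # integrate to arc-length tau = 2
    s = rk4_step(s, 1e-3)
x, y, xd, yd = s
radius2 = x**2 + y**2                  # should stay at 1/C^2 = 1
C = yd/x**2                            # should stay at 1
```

For this starting point the exact solution is (x, y) = (sech τ, tanh τ), a parameterization of the unit semicircle, so the final x can also be compared against sech 2.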
where G^c_ab is an array of numbers to be determined, symmetric in the indices ab. The
Jacobian of the transformation is
∂x̃^c/∂x^p = J^c_p = δ^c_p + j^c_p, where j^c_p = G^c_ap y^a. (5.5.7)
This is invertible when y = 0, so by the inverse function theorem, (5.5.6) has a smooth inverse
as a mapping between a neighbourhood of y = 0 and a neighbourhood of x = 0.
Now we calculate, keeping only the leading terms (because we don’t really care what’s
happening at O(|y|²) and beyond):
g_pq dx^p dx^q = g_pq(x)(dy^p − G^p_ac y^a dy^c)(dy^q − G^q_bd y^b dy^d)
= g_pq dy^p dy^q − G_acq y^a dy^c dy^q − G_bdp y^b dy^p dy^d + O(|y|²) (5.5.8)
where we have set
G_acp = g_pq G^q_ac, so G_acp = G_cap. (5.5.9)
Hence
g̃_pq(y) = g_pq(x(y)) − G_apq y^a − G_aqp y^a + O(|y|²). (5.5.10)
Now on the RHS we still have g(x) and we need to write this in terms of y. The inverse to our
transformation has the form
y^a = x^a + ½ G^a_bc x^b x^c + O(|x|³) (5.5.11)
as you can see by inserting (5.5.6) on the RHS of this equation. It follows that
g_pq(x(y)) = g_pq(y) + O(|y|²) = η_pq + H_pqr y^r + O(|y|²)
and so
g̃_pq(y) = η_pq + H_pqa y^a − G_apq y^a − G_aqp y^a + O(|y|²). (5.5.12)
Thus we will get rid of the first order terms in y if we can choose the numbers G so that
G_apq + G_aqp = H_pqa. (5.5.13)
(Here we have used the symmetry of the indices of G to neaten up the equation.) This is an
equation for the array of numbers G in terms of the array of numbers H. In Lemma 5.5.4 below
it is shown that this can always be solved, the solution being
G_abc = ½ (H_acb + H_bca − H_abc). (5.5.14)
Note that this formula depends in an important way on the symmetry (5.5.9) of G. Inverting
(5.5.9) we define
G^d_ab = η^cd G_abc = ½ η^cd (H_acb + H_bca − H_abc).
Thus G is uniquely determined by H, and by the above calculations, the change of coor-
dinates (5.5.6) gives metric components g̃_ab which satisfy the conditions of the Theorem. The
proof is complete.
Remark 5.5.3. The array of numbers Gcab is nothing but Γcab (0), the Christoffel symbols of
the metric components gab , evaluated at x = 0 (cf. Proposition 5.5.1).
It remains to prove:
Lemma 5.5.4. The equation (5.5.13) is solved by (5.5.14),
G_abc = ½ (H_cab + H_cba − H_abc).
Proof. One proof of this is simply to substitute (5.5.14) into (5.5.13) and see that it works.
Namely, G is symmetric in its first two indices: simply switch them, and use the symmetry of
H in its first two indices. And then
G_abc + G_acb = ½ (H_cab + H_cba − H_abc + H_bac + H_bca − H_acb). (5.5.15)
Now use the symmetry of H in its first two indices to arrange the indices as far as possible in
alphabetical order:
G_abc + G_acb = ½ (H_acb + H_bca − H_abc + H_abc + H_bca − H_acb) = H_bca (5.5.16)
as required.
There is also a derivation of this formula in the Problem set, Problem 4.9.
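The claim of the lemma can also be checked numerically. The sketch below (mine, not from the notes) takes a random array H with H_pqa = H_qpa, sets G by (5.5.14), and verifies both (5.5.13) and the symmetry of G in its first two indices:

```python
import random

# Numeric sketch (mine, not from the notes): check that (5.5.14) solves
# (5.5.13).  Build a random H symmetric in its first two indices, define
# G_abc = (H_acb + H_bca - H_abc)/2, then test G_apq + G_aqp = H_pqa and
# G_abc = G_bac.
n = 4
random.seed(0)
H = [[[0.0]*n for _ in range(n)] for _ in range(n)]
for p in range(n):
    for q in range(p, n):
        for a in range(n):
            H[p][q][a] = H[q][p][a] = random.random()

G = [[[(H[a][c][b] + H[b][c][a] - H[a][b][c])/2
       for c in range(n)] for b in range(n)] for a in range(n)]
```

Running this for n = 4 (the space-time case) confirms the lemma componentwise.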
Proposition 5.6.4. The dimension of the vector space of totally skew tensors of type (0, m)
in n dimensions is the binomial coefficient
C(n, m) = n!/(m!(n − m)!). (5.6.8)
(In particular the dimension is 1 if n = m and 0 if m > n.)
Proof. We shall prove both of these together. By way of a warm-up let’s make sure that
we understand that
the dimension of the space of all tensors of type (0, m) in n dimensions is n^m. (5.6.9)
To see this, note that such a tensor has m indices. We can choose each one of these in n ways
(since each index runs from 1 to n). All choices are independent because there are no symmetry
conditions. So the total number of possibilities is n^m. The number of these choices is equal to
the dimension of the space of these tensors.
Let us move on to the proof of Proposition 5.6.4. Again we have a tensor with m indices.
For ease of exposition, suppose that m = 3. Any component with two indices the same must
be zero, because (for example)
T115 = −T115
by switching the first two indices. So the only non-zero components of T have distinct indices.
If we have a set of 3 distinct indices, say, 523, then we can use the skew symmetry to relate the
T523 to T235 , where the indices are now in increasing order:
T523 = −T253 = T235
(switching first the first two indices and then the last two). Thus the number of independent
components of a totally skew tensor of type (0, 3) is equal to the number of unordered subsets of
3 elements of the set {1, . . . , n}. This is the binomial coefficient C(n, 3).
The general case, with 3 replaced by m, works in the same way.
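The count in the skew case can be checked by brute force. This small sketch (mine, not from the notes) enumerates the strictly increasing index triples for n = 6 and compares with C(6, 3):

```python
from itertools import combinations, product
from math import comb

# Quick count (mine, not from the notes) for m = 3: independent components
# of a totally skew (0,3) tensor are indexed by strictly increasing triples
# of indices from {1,...,n}, of which there are C(n,3).
n = 6
triples = [t for t in product(range(1, n + 1), repeat=3)
           if t[0] < t[1] < t[2]]
dim_skew = len(triples)     # should equal C(6, 3)
```

Any component whose indices repeat vanishes, and any other component is ± a component with increasing indices, exactly as in the argument above.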
Finally let us prove Proposition 5.6.2. The big difference from the case of skew symmetry is
that now that indices can take the same value without that component being zero. For a small
number of indices (e.g. 3) it’s possible to count by hand. There are n components where the
indices are the same:
T111 , T222 , . . . , Tnnn .
There are n(n − 1) with precisely two indices the same:
T112 , T113 , . . . , T11n ; T221 , T223 , . . . .
And there are n(n − 1)(n − 2)/6 where the indices are all distinct. Thus the total number of
independent components of a totally symmetric tensor of type (0, 3) in n dimensions is:
n + n(n − 1) + (1/6) n(n − 1)(n − 2) = n(n + 1)(n + 2)/6 (5.6.10)
which checks with (5.6.7) if m = 3.
This approach can be generalized to tensors of any rank, but it’s pretty messy. The following
is the cunning way of doing it.
Consider a collection of indices on our totally symmetric tensor with m indices. It will
consist of m1 1’s, m2 2’s, and so on, up to mn n’s. Here the mj are allowed to be 0, but the
constraint that the tensor is of type (0, m) is
m1 + m2 + · · · + mn = m (5.6.11)
This combinatorial problem can be visualised in the following way. Consider an arrangement of
m coins (o) and n − 1 pencils (|) in a line, as in the example below:
o o o | o o | o | o o | |
Given such an arrangement, we count the coins to the left of the first pencil, and call that m1 .
Then we count the coins between the first and second pencils, and call that m2 . Proceeding
in this way, we get a collection of n integers mj ≥ 0, satisfying the constraint (5.6.11). (Note
that mj = 0 if two of the pencils are right next to each other, or if there is a pencil at the very
beginning or the very end of the line.)
In the pictured configuration there are 5 pencils and 8 coins, and
m1 = 3, m2 = 2, m3 = 1, m4 = 2, m5 = 0, m6 = 0.
This would correspond to the component
T11122344
of a totally symmetric tensor of type (0, 8) in 6 dimensions. The number of arrangements of m
coins and n − 1 pencils is the same as the number of ways of choosing m objects (the ones to
be called coins) from a total of m + n − 1. This is the binomial coefficient (5.6.7).
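The coins-and-pencils count can be verified directly in code. The sketch below (mine, not from the notes) enumerates the multisets of m indices from {1, . . . , n}, which is exactly what the arrangements parameterize:

```python
from itertools import combinations_with_replacement
from math import comb

# Counting check (mine, not from the notes): the independent components of
# a totally symmetric (0,m) tensor in n dimensions are indexed by multisets
# of m indices from {1,...,n}; the coins-and-pencils argument says there
# are C(m+n-1, m) of them.
def sym_dim(n, m):
    return len(list(combinations_with_replacement(range(1, n + 1), m)))
# e.g. sym_dim(6, 8) should equal C(13, 8), and sym_dim(n, 3) should
# reproduce n(n+1)(n+2)/6 from (5.6.10)
```

This also re-derives the by-hand m = 3 count above for every n at once.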
5.6.2. Sneak preview of curvature, continued. We’ve seen that the coordinate trans-
formation (5.6.3) allows us to change the array Pabcd , where
Pabcd = Pbacd = Pabdc (5.6.12)
to
P̃_abcd = P_abcd − W_abcd − W_abdc (5.6.13)
where W is symmetric in its last 3 indices.
What is the dimension of the space of P’s? P is symmetric in its first two indices and its
third and fourth indices, but there is no other symmetry. So it is like a 2-index object, P_IJ,
where I and J run over a basis of the space of symmetric 2-index tensors. If that dimension is
N, the dimension of the space of P s will be N². But we’ve seen that N = n(n + 1)/2 (if the
dimension is n; we work generally for the moment). Hence the dimension of the space of P’s is
n²(n + 1)²/4.
On the other hand, the dimension of the space of W’s is n (for the extra index) times
n(n + 1)(n + 2)/6 (the dimension of the space of totally symmetric tensors of type (0, 3)).
Thus the dimension of the space of P’s minus the dimension of the space of W’s is
n²(n + 1)²/4 − n²(n + 1)(n + 2)/6 = (n²(n + 1)/12)(3(n + 1) − 2(n + 2))
= (n²(n + 1)/12)(n − 1)
= n²(n² − 1)/12. (5.6.14)
Since this number is positive for n > 2, it will be impossible to solve (5.6.13) to make P̃ = 0 in
general. (Our calculation shows that P̃ = 0 is a system of linear equations with more equations
than unknowns.)
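The dimension count (5.6.14) is easy to confirm with exact arithmetic. This sketch (mine, not from the notes) checks it for a range of n; for n = 4 it gives the familiar 20 independent curvature components:

```python
from fractions import Fraction

# Arithmetic check (mine, not from the notes) of (5.6.14): with
# dim P = (n(n+1)/2)^2 and dim W = n * n(n+1)(n+2)/6, the difference
# should be n^2(n^2 - 1)/12.
def excess(n):
    dim_P = Fraction(n*(n + 1), 2)**2
    dim_W = n*Fraction(n*(n + 1)*(n + 2), 6)
    return dim_P - dim_W
```

Using Fraction keeps the check exact rather than floating-point.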
Thus, while some components of P can be killed by coordinate transformations, there are
others, in fact n2 (n2 − 1)/12 of them, which cannot. These ‘unkillable’ components of P form
a tensor called the curvature of g at the point x = 0.
In fact, the Riemann curvature tensor at x = 0 is built out of P in the following way:
R_abcd = ½ (P_acbd + P_bdac − P_adbc − P_bcad). (5.6.15)
It can be checked that if P is changed to P̃ as in (5.6.13), then the components of R do not
change! Thus if this particular combination of components of P is non-zero then it cannot be
killed by coordinate transformation.
These matters will be discussed much more extensively in the next chapter, where we shall
see a different, but equivalent, definition of curvature.
CHAPTER 6
6.1. Introduction
In the previous chapter we introduced geodesics in a curved space-time. The geodesic
equation motivated the introduction of the combination
X^a ∂_a Y^c + Γ^c_ab X^a Y^b (6.1.1)
which we claimed form the components of a vector field ∇_X Y, the covariant derivative of Y with
respect to X. In this chapter we shall sketch a proof of this important fact and shall define the
curvature tensor. One definition of this is as follows
R(X, Y )Z = (∇X ∇Y − ∇Y ∇X − ∇[X,Y ] )Z (6.1.2)
for vector fields X, Y , Z. In index form, the LHS is written
Rabc d X a Y b Z c (6.1.3)
where Rabc d are the components of a tensor of type (1, 3).
The curvature is zero for Minkowski space, and conversely (though we won’t prove this)
if the curvature is zero near a point, then we can find coordinates in which the metric is the
Minkowski metric near that point. R is thus a good coordinate-independent measure of whether
a metric is curved or not.
Curvature makes its presence felt in the behaviour of families of geodesics. We shall derive
the geodesic deviation equation and interpret this in terms of relative acceleration of nearby
geodesics. By comparing this with what happens in newtonian gravity we shall relate the
newtonian potential to the metric and derive the analogue of Laplace’s equation ∇2 ϕ = 0 for
(time-independent) gravitational fields in empty space. Thus we are led to Einstein’s equations
rab = 0, where r is the Ricci tensor of the metric.
Finally we shall work out the weak-field limit as another cross-check between the formalism
of GR and newtonian gravity.
(Here x^a = H^a(σ, τ) is the description of the family of curves with respect to the local coordinate
system x^a.) Define
E(σ) = ½ ∫_α^β g(X(τ, σ), X(τ, σ)) dτ, (6.2.4)
the energy of the curve labelled by σ in our family. (This is a small abuse of notation.) We
claim
(dE/dσ)(0) = − ∫_α^β g(∇_X X, Y) dτ + [g_q(X_q, Y_q) − g_p(X_p, Y_p)] (6.2.5)
where the integrand is evaluated at σ = 0, q = H(β, 0), p = H(α, 0) and the subscripts on the
terms in square brackets denote evaluation of the quantities at the indicated points.
If you grant me (6.2.5) for the moment, then the lemma follows rapidly. The issue is the
coordinate independence, and the term in square brackets is certainly invariant under coordinate
changes, as is the LHS of (6.2.5). It follows that for any fixed curve and vector fields along the
curve, the integral in (6.2.5) is also coordinate independent.
Now letting α and β vary, we conclude that the integrand g(∇X X, Y ) must be a scalar
quantity, and since Y is a vector field, ∇X X must also be a vector field.
So it remains to prove (6.2.5). This involves going through the calculus of variations with
the boundary term.
E(σ) = ½ ∫_α^β g(∂H(τ, σ)/∂τ, ∂H(τ, σ)/∂τ) dτ. (6.2.6)
Differentiate with respect to σ to get
dE/dσ = ∫_α^β g(∂²H/∂σ∂τ, ∂H/∂τ) dτ. (6.2.7)
The usual calculus-of-variations trick is to integrate by parts here. For this, note
(∂/∂τ) g(∂H/∂σ, ∂H/∂τ) = (X^c ∂_c g_ab) Y^a X^b + g(∂²H/∂σ∂τ, ∂H/∂τ) + g(∂H/∂σ, ∂²H/∂τ²) (6.2.8)
= g(∂²H/∂σ∂τ, ∂H/∂τ) + g(∇_X X, Y), (6.2.9)
using the definition of the Γs. Integration from α to β yields (6.2.5).
Lemma 6.2.2. The quantity
∇X Y − ∇ Y X (6.2.10)
is a vector field, for any vector fields X and Y.
Proof. We have
(∇X Y )c = X a ∂a Y c + Γcab X a Y b . (6.2.11)
Hence the components of (6.2.10) are
X a ∂a Y c − Y a ∂a X c , (6.2.12)
again using Γcab = Γcba .
So it is sufficient to check that this combination of components transforms
as a vector field. Under a change of coordinates,
X̃^a = (∂x̃^a/∂x^b) X^b,  Ỹ^a = (∂x̃^a/∂x^b) Y^b. (6.2.13)
Hence
X̃^a ∂̃_a Ỹ^b = X^a ∂_a (Y^c ∂x̃^b/∂x^c) = (∂x̃^b/∂x^c) X^a ∂_a Y^c + X^a Y^c ∂²x̃^b/∂x^c∂x^a.
If we switch X and Y and subtract, the second term on the RHS drops out because of the
symmetry of the mixed partials of x̃^b with respect to the x variables. Thus we are left with the
transformation law
X̃^a ∂̃_a Ỹ^b − Ỹ^a ∂̃_a X̃^b = (∂x̃^b/∂x^c)(X^a ∂_a Y^c − Y^a ∂_a X^c) (6.2.14)
∂xc
as required for a vector field.
Now we can complete our proof. We know that ∇X X is a vector field for every vector field
X. Replacing X by X + Y , we conclude that
∇X X + ∇ X Y + ∇Y X + ∇ Y Y
is a vector field for every pair of vector fields X, Y . Hence ∇X Y + ∇Y X is a vector field for
every X and Y . But so is the difference, by the above lemma. Now
∇_X Y = ½ (∇_X Y + ∇_Y X + [X, Y]) (6.2.15)
and we have written ∇X Y as a sum of two things that we know are vector fields. So it must
itself be a vector field.
Remark 6.2.3. There is a direct, computational proof, in Woodhouse, GR, §4.5.
6.2.1. The Lie bracket.
Definition 6.2.4. The quantity ∇X Y − ∇Y X is also denoted [X, Y ] and is called the Lie
bracket of the vector fields X and Y .
We proved in Lemma 6.2.2 that [X, Y ] is a vector field by a local coordinate calculation. In
this section we give a more conceptual proof of the same fact. The formula (6.2.12) shows that
the Lie bracket depends only on the vector fields X and Y and not on the metric g. That is,
we can define an operator on functions f :
[X, Y ]f = X(Y f ) − Y (Xf ) (6.2.16)
We aim to show directly that [X, Y ], so defined, is a vector field.
We begin with a lemma:
Lemma 6.2.5. Suppose that T : C∞(U) → C∞(U) is a real-linear map which satisfies the
Leibniz rule
T[ϕψ](p) = ϕ(p) T[ψ](p) + ψ(p) T[ϕ](p)
for all smooth functions ϕ and ψ. Then T is equal to a vector field on U.
Proof. Choose local coordinates x^j such that x^j = 0 corresponds to the point p. We aim
to show that there is a set of numbers T^j such that
T[f]_p = T^j (∂f/∂x^j)(0).
Define
X^j = T x^j.
By linearity T annihilates constants, for
T[1] = T[1 · 1] = 2T[1] by the Leibniz rule.
Hence T[1] = 0 and so T[c] = 0 for any constant c.
Now let f be any smooth function. We can write
f (x) = f (0) + xj ∂j f (0) + O(|x|2 ).
Applying T , we see
T f (0) = X j (0)∂j f (0)
because, by the Leibniz rule, T applied to a smooth function vanishing to order 2 or more must
vanish.
Now we have the following slick proof that [X, Y ] is a vector field if X and Y are vector
fields.
Proposition 6.2.6. If X and Y are vector fields, then so is [X, Y ].
(2) ∇_X is real-linear:
∇_X(λT + µS) = λ∇_X T + µ∇_X S
for any two tensors of type (r, s) and real numbers λ and µ.
(3) ∇ satisfies the Leibniz rule:
∇X (T ⊗ S) = (∇X T ) ⊗ S + T ⊗ (∇X S)
for any tensors T and S, not necessarily of the same type;
(4) If U = c(T ⊗ S) is a contraction of the tensor product of tensors T and S, then
∇_X U = c((∇_X T) ⊗ S) + c(T ⊗ ∇_X S),
where c denotes the contraction.
Remark 6.3.5. I omit the proof, but refer you to the problem set for related exercises.
It is a pain to write out the general case, but here are some examples:
∇a Tbc = ∂a Tbc − Γsab Tsc − Γsac Tbs ; (6.3.2)
∇a Acb = ∂a Acb − Γsab Acs + Γcas Asb ; (6.3.3)
∇a P bc = ∂a P bc + Γbas P sc + Γcas P bs (6.3.4)
6.4. Properties
The covariant derivative operator we have been discussing has the following further proper-
ties:
Symmetry or Torsion-free:
∇a ∇b f = ∇b ∇a f (6.4.1)
for all functions f
Metric preservation
∇a gbc = 0. (6.4.2)
There are more general differentiation operators which satisfy the Leibniz-rule type prop-
erties of the previous section. These are called connections. Given a metric, there is a unique
such connection which satisfies the two boxed properties here. This is also called the metric
connection or the Levi-Civita connection after Tullio Levi-Civita (1873–1941).
Proof. (Of boxed properties) We have
∇_a ∇_b f = ∂_a ∂_b f − Γ^c_ab ∂_c f. (6.4.3)
But the mixed partials of f are symmetric and Γ^c_ab is symmetric in its lower indices. Hence the
RHS is symmetric and the result follows.
Similarly, we have
∇_a g_bc = ∂_a g_bc − Γ^s_ab g_sc − Γ^s_ac g_bs. (6.4.4)
Now from the definition of the Γs, the lowered symbols
Γ_abc = ½ (∂_a g_bc + ∂_b g_ac − ∂_c g_ab)
satisfy Γ^s_ab g_sc + Γ^s_ac g_bs = Γ_abc + Γ_acb = ∂_a g_bc, and so the result follows.
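The cancellation in (6.4.4) can be verified symbolically for a concrete metric. The sketch below (mine, not from the notes) checks ∇_a g_bc = 0 for the flat metric ds² = dr² + r²dθ² in 2D polar coordinates:

```python
import sympy as sp

# Symbolic sketch (mine, not from the notes): verify (6.4.4) vanishes,
# i.e. nabla_a g_bc = d_a g_bc - Γ^s_ab g_sc - Γ^s_ac g_bs = 0, for the
# metric ds^2 = dr^2 + r^2 dθ^2.
r, th = sp.symbols('r theta', positive=True)
coords, n = [r, th], 2
g = sp.diag(1, r**2)
ginv = g.inv()
Gamma = [[[sp.Rational(1, 2)*sum(
            ginv[c, s]*(sp.diff(g[b, s], coords[a])
                        + sp.diff(g[a, s], coords[b])
                        - sp.diff(g[a, b], coords[s])) for s in range(n))
           for b in range(n)] for a in range(n)] for c in range(n)]
nabla_g = [[[sp.simplify(sp.diff(g[b, c], coords[a])
             - sum(Gamma[s][a][b]*g[s, c] + Gamma[s][a][c]*g[b, s]
                   for s in range(n)))
             for c in range(n)] for b in range(n)] for a in range(n)]
# every entry of nabla_g should simplify to 0
```

The same computation works for any metric: the Christoffel symbols built from g by (5.3.2) always annihilate g, which is exactly the metric-preservation property.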
The metric preservation property has important consequences. Recall that the metric g can
be used to lower indices and that its inverse can be used to raise indices. Because ∇g = 0 it
follows that ∇g −1 = 0, and raising and lowering indices commutes with differentiation.
For example,
∇_a(g^bs α_s) = ∇_a α^b = g^bs ∇_a α_s. (6.4.5)
The middle expression is the covariant derivative of the index-raised version of α; the right-hand
expression is the derivative of α, with its index raised afterwards.
Example 6.4.1. If Y is parallel propagated along a curve γ, then its length is constant.
For if γ̇ = X, we have ∇X Y = 0. Then
∇X (g(Y, Y )) = 2g(Y, ∇X Y ) = 0
More explicitly,
∇X (Y a Ya ) = (∇X Y a )Ya + Y a ∇X Ya
And ∇X Y a = 0 implies also that ∇X Ya = 0. So both terms are 0.
6.5. Curvature
From the mathematical point of view, this section is possibly the most important in these
notes.
where J is the Jacobian of the transformation. But we are assuming that X is a vector field as
well, so
X^c = J^c_r X̃^r.
Inserting this into (6.5.9) we obtain
R̃_pqr^s X̃^r = J^a_p J^b_q J^c_r (J⁻¹)^s_d R_abc^d X̃^r, (6.5.10)
for any components X̃^r. Hence
R̃_pqr^s = J^a_p J^b_q J^c_r (J⁻¹)^s_d R_abc^d (6.5.11)
which is the transformation law for a tensor field of type (1, 3).
ds² = (dx² + dy²)/x².
With x = x¹ and y = x² we computed the non-zero Christoffel symbols:
Γ¹₁₁ = −1/x, Γ¹₂₂ = 1/x, Γ²₁₂ = Γ²₂₁ = −1/x, others = 0. (6.5.12)
Using the formula (6.5.2) for curvature (there is quite a bit of simplification because many of
the Γs are zero in this case), one finds R₁₂₁² = x⁻². It turns out that all other components of
the curvature are either ± this one, or are zero. In particular if either 1 or 2 is repeated three
or more times, then the corresponding curvature component is zero. If 1 and 2 appear precisely
twice each, then the corresponding component is ±x⁻². From (6.5.15) we also have
R₁₂₁₂ = x⁻⁴
because g₂₂ = x⁻².
Example 6.5.5. Curvature of flat space in polar coordinates. In 2D polar coordinates, the
Γs are:
Γ¹₂₂ = −r, Γ²₁₂ = Γ²₂₁ = 1/r, (6.5.16)
all others zero (r = x¹, θ = x²). This time, (6.5.13) reduces to
R₁₂₁² = ∂₁Γ²₁₂ + Γ²₁₂ Γ²₂₁ = −1/r² + 1/r² = 0.
This illustrates the fact that curvature is a tensor: we knew this result ahead of time, because
it is clear that the curvature of a flat metric is zero, and we’ve just changed coordinates.
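The full tensor, not just this one component, can be checked by machine. The sketch below (mine, not from the notes; the index conventions are one common choice and may differ from (6.5.2), but flatness is convention-independent) computes every Riemann component of the polar-coordinate metric and confirms they all vanish:

```python
import sympy as sp

# Symbolic sketch (mine, not from the notes): all Riemann components of the
# flat metric ds^2 = dr^2 + r^2 dθ^2 vanish.
r, th = sp.symbols('r theta', positive=True)
coords, n = [r, th], 2
g = sp.diag(1, r**2)
ginv = g.inv()
Gamma = [[[sp.Rational(1, 2)*sum(
            ginv[d, s]*(sp.diff(g[b, s], coords[a])
                        + sp.diff(g[a, s], coords[b])
                        - sp.diff(g[a, b], coords[s])) for s in range(n))
           for b in range(n)] for a in range(n)] for d in range(n)]

def riemann(a, b, c, d):   # R_abc^d in one standard convention
    return sp.simplify(sp.diff(Gamma[d][b][c], coords[a])
                       - sp.diff(Gamma[d][a][c], coords[b])
                       + sum(Gamma[d][a][s]*Gamma[s][b][c]
                             - Gamma[d][b][s]*Gamma[s][a][c]
                             for s in range(n)))
```

Swapping in g = diag(1/x², 1/x²) reproduces the non-zero hyperbolic components instead.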
Definition 6.5.6. The curvature R of the covariant derivative ∇ is defined, for any three
vector fields X, Y, Z, by
R(X, Y)Z = (∇_X ∇_Y − ∇_Y ∇_X − ∇_[X,Y])Z
(cf. (6.1.2)); comparing with the previous section, the RHS has components R_abc^d Z^c when
X = ∂_a and Y = ∂_b. Thus the component version of the curvature tensor arises from this
definition by taking X = ∂_a and Y = ∂_b.
and
(∇_fX ∇_Y − ∇_Y ∇_fX − ∇_[fX,Y])Z = f (∇_X ∇_Y − ∇_Y ∇_X − ∇_[X,Y])Z (6.5.21)
for any smooth function f. We then prove a technical lemma which explains why being C∞-
linear implies that the operation (6.5.19) is given by a tensor. To verify C∞-linearity in Z,
compute
∇_X ∇_Y(fZ) = ∇_X(f ∇_Y Z + (Yf)Z).
Continuing,
∇_X ∇_Y(fZ) = f ∇_X ∇_Y Z + (Xf) ∇_Y Z + (XYf) Z + (Yf) ∇_X Z.
Interchanging X and Y and subtracting, we get
(∇_X ∇_Y − ∇_Y ∇_X)(fZ) = f (∇_X ∇_Y − ∇_Y ∇_X)Z + ([X, Y]f) Z.
But
∇_[X,Y](fZ) = ([X, Y]f) Z + f ∇_[X,Y] Z
and so subtracting this from each side gives the C∞-linearity of Z ↦ R(X, Y)Z for fixed X and
Y.
Remark 6.5.8. Note that all we have used in this proof is that directional derivatives satisfy
the Leibniz rule.
6.5.4. Proof of (6.5.21). This is left as an exercise for the reader. It is somewhat shorter
than the proof of (6.5.20). You will need the formula
[fX, Y] = f [X, Y] − (Yf) X.
is certainly a differential operator of order 2 in Z and order 1 in X and Y. The next lemma
looks complicated but it just says that a differential operator of order 2 which is also C∞-linear
must actually be algebraic, i.e. given by multiplication by a tensor field.
Lemma 6.5.9. Let P be a second-order differential operator which maps vector fields to vector
fields, say
(PZ)^d = A_a^bcd ∂_b ∂_c Z^a + B_a^cd ∂_c Z^a + C_a^d Z^a.
Suppose further that P is C∞-linear, that is
P(fZ) = f P(Z) (6.5.24)
for any smooth function f. Then in fact P is given by a (1, 1) tensor, (PZ)^d = C^d_a Z^a.
Proof. By the Leibniz rule,
∂_b ∂_c (fZ^a) = (∂_b ∂_c f) Z^a + (∂_b f ∂_c Z^a + ∂_c f ∂_b Z^a) + f ∂_b ∂_c Z^a.
Similarly
∂_c (fZ^a) = (∂_c f) Z^a + f ∂_c Z^a.
Hence,
P(fZ) = f P(Z) + A_a^bcd (∂_b ∂_c f) Z^a (6.5.26)
+ A_a^bcd (∂_b f ∂_c Z^a + ∂_c f ∂_b Z^a) + B_a^cd (∂_c f) Z^a. (6.5.27)
Because P(fZ) = f PZ, we obtain
A_a^bcd ((∂_b ∂_c f) Z^a + ∂_b f ∂_c Z^a + ∂_c f ∂_b Z^a) + B_a^cd (∂_c f) Z^a = 0 (6.5.28)
for any f and Z. If we pick f to be a product x^p x^q of two coordinate functions, substitute into
this and then set x = 0, the only surviving term is
A_a^bcd (∂_b ∂_c (x^p x^q)) Z^a = 2 A_a^pqd Z^a,
and this is supposed to vanish. This being true for all vectors Z, we conclude that A_a^pqd vanishes
at x = 0. This point was arbitrary, so A_a^pqd vanishes everywhere, and (6.5.28) now reads
B_a^cd (∂_c f) Z^a = 0 (6.5.29)
for all f and Z. We apply the same argument with f = x^p and this gives B_a^pd = 0 at x = 0.
Again the point was arbitrary, so it follows that B vanishes everywhere. Hence P(fZ) = f PZ
implies that (PZ)^d = C^d_a Z^a, as required.
Applying this Lemma to our map (6.5.19), first as an operator on Z, with (X, Y ) fixed, then
in X, with (Y, Z) fixed, we see that R(X, Y )Z is indeed given by Rabc d X a Y b Z c where Rabc d are
the components of a tensor of type (1, 3).
Remark 6.5.10. This lemma is a bit fiddly to prove, but the idea that a differential operator
which is also C ∞ -linear has to be given algebraically (by multiplication by a tensor) is a powerful
one in differential geometry which saves a lot of computation in local coordinates.
Now, as proved below, R_abcd = −R_abdc. So we can rearrange the indices to get
(∇_a ∇_b − ∇_b ∇_a) α_d = −R_abdc α^c = −R_abd^c α_c. (6.6.11)
A second proof runs as follows. Let X be any vector. Then αa X a is a function and we know
where in the last equation we have switched the two dummy indices c and d so as to have X d
on each side. Hence
X d (∇a ∇b − ∇b ∇a )αd = −αd Rabc d X c = −X d Rabd c αc . (6.6.15)
We now make the usual argument that as X is arbitrary, this equation implies the result.
6.6.2. Symmetries of the curvature tensor. The formula for the curvature at a point
in Proposition 6.6.1 allows us to write down the general symmetry properties of Rabcd .
Theorem 6.6.6. For any metric, the curvature tensor has the following symmetries:
Rabcd = −Rbacd , Rabcd = −Rabdc ; (6.6.19)
It turns out that (6.6.20) is only independent of the other two if all four indices are distinct.
So this imposes another n-choose-4 conditions on the components. Hence the number of inde-
pendent components is
(1/8) n(n − 1)(n² − n + 2) − n(n − 1)(n − 2)(n − 3)/24
= (1/24) n(n − 1)(3n² − 3n + 6 − (n − 2)(n − 3))
= (1/12) n²(n² − 1). (6.6.24)
6.6.3. Alternative proof of Bianchi identity. (Cf. Woodhouse, GR, §5.7). If we choose
coordinates such that Γ = 0 at x = 0, we have, from (6.5.2),
∇a Rbcd e = ∂a ∂b Γecd − ∂a ∂c Γebd + terms of the form Γ∂Γ, ΓΓΓ (6.6.32)
so
∇a Rbcd e = ∂a ∂b Γecd − ∂a ∂c Γebd at x = 0. (6.6.33)
Summing over the cyclic permutations of a, b, c, the terms on the RHS cancel out, showing that
the Bianchi identity holds at x = 0. But the LHS is a tensor equation and the point is arbitrary,
so we have obtained the Bianchi identity.
and
X^a ∂_a Y^b = ∂²H^b/∂τ∂σ. (6.8.6)
Hence (6.8.3) follows from the symmetry of the mixed partials of H.
Let X = ∂H/∂τ be the tangent vector field of the geodesic for all fixed values of σ. Then
∇X X = 0 (6.8.7)
and so
∇Y ∇X X = 0. (6.8.8)
(This would not be true if the neighbouring curves were not geodesics). On the other hand, by
Definition 6.5.6,
(∇X ∇Y − ∇Y ∇X )Z = R(X, Y )Z (6.8.9)
for any vector Z, since [X, Y ] = 0. Putting Z = X,
∇X ∇Y X = R(X, Y )X. (6.8.10)
Using (6.8.3) once more, we get the result.
The components R_0i0^j of the Riemann curvature tensor should be equated, to leading
order, with −∂_i ∂^j ϕ, where ϕ is the newtonian potential, as observed by an observer with
4-velocity vector with components (1, 0) in inertial coordinates.
Lemma 6.10.1. If
gab = ηab + hab
then
g^ab = η^ab − h^ab + O(h²) and Γ^c_ab = ½ η^cs (∂_a h_bs + ∂_b h_as − ∂_s h_ab).
Knowing the Γs we can compute the slow geodesics:
Lemma 6.10.2. If x^a(τ) is a slow geodesic, we may assume τ = x⁰ so that the velocity vector
is
ẋ^a = (1, ẋ). (6.10.1)
Then the geodesic equations are
ẍ^c + η^cs (∂₀ h₀s − ½ ∂_s h₀₀) = 0, i.e. ẍ^c − ½ η^cs ∂_s h₀₀ = 0, (6.10.2)
neglecting ∂₀-derivatives of h. Thus within our approximation
ẍ⁰ = 0, ẍ^j = ½ η^js ∂_s h₀₀. (6.10.3)
This already suggests, since ẍ = −∇ϕ, that we should identify
ϕ ≃ ½ h₀₀ up to an additive constant. (6.10.4)
NB this is in a particular inertial frame of reference.
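The identification can be checked symbolically. The sketch below (mine, not from the notes) takes h₀₀ = 2ϕ(x, y, z) with all other h_ab zero, plugs into the linearized Christoffel formula of Lemma 6.10.1, and recovers Newton's law ẍ^j = −∂_j ϕ from (6.10.3):

```python
import sympy as sp

# Symbolic sketch (mine, not from the notes): weak-field limit with
# h_00 = 2*phi(x, y, z) and all other h_ab = 0.  The linearized formula
# Γ^c_ab = (1/2) η^{cs}(d_a h_bs + d_b h_as - d_s h_ab) should give
# Γ^j_00 = d_j phi, i.e. xdd^j = -d_j phi.
t, x, y, z = sp.symbols('t x y z')
coords = [t, x, y, z]
phi = sp.Function('phi')(x, y, z)      # time-independent potential
eta = sp.diag(1, -1, -1, -1)           # note eta^{ab} = eta_{ab} as a matrix
h = sp.zeros(4, 4)
h[0, 0] = 2*phi

Gamma = [[[sp.Rational(1, 2)*sum(
            eta[c, s]*(sp.diff(h[b, s], coords[a])
                       + sp.diff(h[a, s], coords[b])
                       - sp.diff(h[a, b], coords[s])) for s in range(4))
           for b in range(4)] for a in range(4)] for c in range(4)]

accel = [sp.simplify(-Gamma[j][0][0]) for j in (1, 2, 3)]  # xdd^j = -Γ^j_00
```

Since ϕ is time-independent, Γ⁰₀₀ vanishes as well, matching ẍ⁰ = 0 in (6.10.3).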
Lemma 6.10.3. In our approximation
1
Rabc d = ∂a ∂c hdb + ∂b ∂ d hac − ∂b ∂c hda − ∂a ∂ d hbc (6.10.5)
2
In particular,
1
rac = ∂a ∂c hbb + ∂b ∂ b hac − ∂b ∂c hba − ∂a ∂ b hbc . (6.10.6)
2
If we consider the 00 component of this and neglect the ∂_0 h_ab terms, then
r_00 = ½ ∂_b ∂^b h_00 = −½ ∇² h_00. (6.10.7)
This leads, once again, to the proposal that the newtonian empty-space equation ∇²ϕ = 0 should translate into r_ab = 0, since we have already seen that h_00/2 should be identified (up to a constant) with the newtonian potential.
In Minkowski space, recall that Maxwell's equations imply that each of the components E_i, B_i of the electric and magnetic fields satisfies the flat-space wave equation, □E_i = 0 = □B_i. On a general space-time M, we have:
Proposition 6.11.1. If F_ab = −F_ba satisfies Maxwell's equations (6.11.3) in (M, g), then F_ab satisfies a modified wave equation:
∇^a ∇_a F_bc = R_bc^as F_sa − r_b^a F_ca − r_c^a F_ab. (6.11.5)
Proof. See the Problem set.
CHAPTER 7

The Schwarzschild metric and black holes
In this Chapter we shall give a first example of a non-trivial solution of Einstein’s vacuum
equations, rab = 0. This is the Schwarzschild metric. It is spherically symmetric and is the GR
version of the newtonian potential −m/r due to a mass m at the origin.
We shall study the timelike and null geodesics in the Schwarzschild metric. To a first
approximation, there are timelike geodesics which correspond to closed elliptical orbits, as in
Newton’s theory. When GR corrections are taken into account, it will be seen that the orbits
do not close up: this is the famous precession of the perihelion1 which is measurable in the case
of the orbit of Mercury. This was one of the first observational verifications of GR.
Null geodesics represent the paths taken by light-rays, and we shall see that there is a
bending effect. This effect was observed by Eddington during the solar eclipse of 1919 (and is
responsible for gravitational lensing). This effect provides another observational verification for
GR.
The Schwarzschild metric is defined initially in the set r > 2m (in units with G = c = 1). However, it turns out that this is a defect of the coordinates rather than of the metric itself. After a change of coordinates the metric continues through the surface r = 2m. This surface is then interpreted as the event horizon of a black hole.
where θ is the colatitude (i.e. latitude, but measured from the north pole rather than the
equator) and ϕ is longitude.
The Minkowski metric in these coordinates is
ds² = dt² − dr² − r²(dθ² + sin²θ dϕ²).
Let us write
dω 2 = dθ2 + sin2 θ dϕ2 (7.1.3)
which is the round metric on the unit sphere x2 + y 2 + z 2 = 1 in R3 . This will save writing later.
A spherically symmetric, static2 metric is obtained from this by introducing functions of r
as coefficients
ds2 = A(r)dt2 − B(r)dr2 − C(r)r2 dω 2 . (7.1.4)
where we require A > 0, B > 0 and C > 0 in the region of interest. The dependence of these
functions only on r encodes the spherical symmetry of the metric and also its time-invariance.
One can, of course, consider more general metric forms, but that is beyond the scope of this
course.
7.2. Schwarzschild
The Schwarzschild metric3 is a solution of Einstein's vacuum equations which describes the field due to a point mass, or the field outside a spherically symmetric star. It is the general-relativistic analogue of
the newtonian potential −m/r which is the potential describing the gravitational field due to a
point-mass m in Newton’s theory of gravitation.
We shall not derive this in full. We start with the form (7.1.4). This should approach the
Minkowski metric when r is large, so we assume
A(r) ∼ 1, B(r) ∼ 1, C(r) ∼ 1 as r → ∞. (7.2.1)
Proposition 7.2.1. By a change of r variables, ρ = f (r), (7.1.4) can be made to take the
form
ds² = α(ρ)dt² − β(ρ)dρ² − ρ² dω². (7.2.2)
Proof. If we define ρ = √(C(r)) r, then we shall get the coefficient of dω² correct. Since C > 0 this is certainly invertible for large enough r, so we define
α(ρ) = A(r(ρ)).
For the dr term,
B(r)dr² = B(r) (dr/dρ)² dρ²
so
β(ρ) = B(r(ρ)) (dr/dρ)².
This completes the proof.
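Proposition 7.2.1 can be sanity-checked numerically. The sketch below uses my own illustrative coefficients C(r) = 1 + 1/r and B(r) = 1 + 2/r (any positive functions tending to 1 would do), computes dρ by finite differences, and confirms that both the dr² term and the dω² coefficient agree in the two coordinate systems:

```python
# Illustrative check (my own example functions, not from the notes) of
# Proposition 7.2.1: with rho = sqrt(C(r)) r, the dω^2 coefficient
# C(r) r^2 becomes rho^2, and beta(rho) = B(r) (dr/drho)^2 reproduces
# the dr^2 term of the metric.
import math

def C(r): return 1.0 + 1.0 / r      # hypothetical coefficient, C -> 1 at infinity
def B(r): return 1.0 + 2.0 / r      # hypothetical coefficient, B -> 1 at infinity

def rho(r): return math.sqrt(C(r)) * r

r0, dr = 10.0, 1e-6
drho = rho(r0 + dr) - rho(r0)        # finite-difference approximation to d(rho)
beta = B(r0) * (dr / drho) ** 2      # beta(rho) = B(r(rho)) (dr/drho)^2

err_dr = abs(B(r0) * dr**2 - beta * drho**2)   # dr^2 terms agree
err_sphere = abs(C(r0) * r0**2 - rho(r0)**2)   # dω^2 coefficients agree
print(err_dr, err_sphere)
```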
We use this proposition, then rechristen ρ as r. So we may as well look at metrics in the
slightly simpler form
A(r)dt2 − B(r)dr2 − r2 dω 2 (7.2.3)
We saw in the previous chapter that in the weak field limit, the component g00 should be
matched with twice the newtonian potential computed by an observer with 4-vector (1, 0), up
to an additive constant. So the simplest possible guess for A(r) is 1 − 2m/r: the value 1 comes
from the required asymptotic form of the metric.
It turns out that there is a choice of B which then gives a metric which satisfies Einstein’s
equations where the metric is defined:
Theorem 7.2.2. The Schwarzschild metric
ds² = (1 − 2m/r)dt² − (1 − 2m/r)⁻¹dr² − r² dω², (r > 2m) (7.2.4)
satisfies rab = 0.
We shall not prove this in full. You can see most of the details in Woodhouse.
We shall, however, record the geodesic equations and the Γs for this metric.
Proposition 7.2.3. The geodesic equations for the metric (7.1.4) are
ẗ + (A′/A) ṫṙ = 0, or d/dτ (A ṫ) = 0,
r̈ + (A′/2B) ṫ² + (B′/2B) ṙ² − (r/B) θ̇² − (r/B) sin²θ ϕ̇² = 0,
θ̈ + (2/r) θ̇ṙ − sin θ cos θ ϕ̇² = 0,
ϕ̈ + (2/r) ṙϕ̇ + 2 cot θ θ̇ϕ̇ = 0, or d/dτ (r² sin²θ ϕ̇) = 0.
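The bracketed "or" forms of the first and last equations are conservation laws, and they can be checked numerically. The sketch below (my own illustrative initial data, with m = 1 and B = 1/A, i.e. Schwarzschild) integrates the equatorial (θ = π/2) equations with a hand-rolled RK4 step and verifies that Aṫ and r²ϕ̇ stay constant along the orbit:

```python
# Numerical sketch: integrate the equatorial geodesic equations for
# A = 1 - 2m/r, B = 1/A and check that A*tdot and r^2*phidot are conserved.
# Initial data are illustrative choices, not values from the notes.
m = 1.0

def derivs(y):
    t, r, phi, td, rd, phid = y
    A, Ap = 1 - 2*m/r, 2*m/r**2
    return (td, rd, phid,
            -(Ap/A)*td*rd,                                     # t-equation
            -(A*Ap/2)*td**2 + (Ap/(2*A))*rd**2 + r*A*phid**2,  # r-equation (theta = pi/2)
            -(2/r)*rd*phid)                                    # phi-equation

def rk4_step(y, h):
    def shift(y0, k, s):
        return tuple(a + s*b for a, b in zip(y0, k))
    k1 = derivs(y)
    k2 = derivs(shift(y, k1, h/2))
    k3 = derivs(shift(y, k2, h/2))
    k4 = derivs(shift(y, k3, h))
    return tuple(a + h*(b + 2*c + 2*d + e)/6
                 for a, b, c, d, e in zip(y, k1, k2, k3, k4))

r0, phid0 = 10.0, 0.04
td0 = ((1 + r0**2*phid0**2)/(1 - 2*m/r0))**0.5   # normalize: timelike, proper time
y = (0.0, r0, 0.0, td0, 0.0, phid0)              # (t, r, phi, tdot, rdot, phidot)
E0, J0 = (1 - 2*m/y[1])*y[3], y[1]**2*y[5]
for _ in range(5000):
    y = rk4_step(y, 0.01)
E1, J1 = (1 - 2*m/y[1])*y[3], y[1]**2*y[5]
print(abs(E1 - E0), abs(J1 - J0))   # both drifts are tiny
```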
3Named in honour of Karl Schwarzschild, 1873–1916
Proposition 7.2.5. The non-vanishing components of the Ricci tensor of the spherically symmetric metric (7.1.4) are
r00 = −A″/2B + A′B′/4B² + (A′)²/4AB − A′/rB,
r11 = A″/2A − (A′)²/4A² − A′B′/4AB − B′/rB,
r22 = rA′/2AB − rB′/2B² + 1/B − 1,
r33 = sin²θ r22.
Now we can verify that the Schwarzschild metric of Theorem 7.2.2 does indeed satisfy the
Einstein vacuum equations rab = 0.
Eliminating the A″ terms between r00 and r11 gives AB′ + BA′ = 0, so AB is constant. This constant must be 1 by the boundary condition (7.2.1).
Inserting B = 1/A, B′ = −A′/A²,
r00 = −½ AA″ − (1/r) AA′,
r11 = (1/A²) r00,
r22 = rA′ + A − 1,
r33 = sin²θ (rA′ + A − 1).
Solving r22 = 0 gives A = 1 − 2m/r, where m is a constant and one checks that this also solves
the r00 = 0 equation. Hence we arrive at the Schwarzschild metric
ds² = (1 − 2m/r) dt² − (1 − 2m/r)⁻¹ dr² − r²(dθ² + sin²θ dϕ²),
where m > 0 and, for the moment anyway, r > 2m.
Woodhouse, GR, Sect. 7.1–7.2
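The vanishing of the reduced Ricci components for A = 1 − 2m/r can be spot-checked numerically. The following sketch (with m = 1) evaluates r00 = −AA″/2 − AA′/r and r22 = rA′ + A − 1 from the formulas above at a few radii:

```python
# Numeric spot-check of Theorem 7.2.2, using the reduced Ricci components
# obtained with B = 1/A:  r00 = -A A''/2 - A A'/r,  r22 = r A' + A - 1.
m = 1.0
def A(r):   return 1 - 2*m/r
def Ap(r):  return 2*m/r**2      # A'
def App(r): return -4*m/r**3     # A''

max_err = 0.0
for r in (3.0, 5.0, 10.0, 100.0):
    r00 = -0.5*A(r)*App(r) - A(r)*Ap(r)/r
    r22 = r*Ap(r) + A(r) - 1
    max_err = max(max_err, abs(r00), abs(r22))
print(max_err)   # vanishes (up to rounding) for A = 1 - 2m/r
```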
[Figure: radial light rays between r = rA and r = rB, outside the horizon r = 2m]
and
tB − tA = rB − rA + 2m log[(rB − 2m)/(rA − 2m)] (7.3.8)
for a photon travelling from
(tA , rA , θ0 , ϕ0 ) to (tB , rB , θ0 , ϕ0 ). (7.3.9)
Hence δtA = δtB as previously argued.
7.3.2. Geodesics in Schwarzschild. The Lagrangian L for geodesics in Schwarzschild is
L = ½ [(1 − 2m/r)ṫ² − (1 − 2m/r)⁻¹ṙ² − r²(θ̇² + sin²θ ϕ̇²)].
We take the parameter to be proper time τ and ˙ to mean differentiation with respect to τ.
We have conserved quantities
E = (1 − 2m/r)ṫ, J = r2 sin θϕ̇ (7.3.10)
and
L = 1/2 for timelike, L = 0 for null geodesics. (7.3.11)
Remark 7.3.2. The conserved quantity E is the total energy of our particle, assumed to
have unit rest-mass. (Not the gravitating one, the one that’s orbiting.) If the rest-mass of the
orbiting particle is µ, we claim that
E = µ(1 − 2m/r)ṫ (7.3.12)
is the total energy. Remember that total energy is a relative concept in relativity, so this
statement needs careful interpretation. Suppose Alice is an observer sitting at constant (r, θ, ϕ)
in Minkowski space. She measures the energy of the orbiting particle as it passes her, i.e. when
its spatial coordinates are (r, θ, ϕ). Let Alice’s 4-velocity vector be U , and that of the orbiting
particle V . Then
U = (U a ) = (1 − 2m/r)−1/2 (1, 0, 0, 0) and V = (V a ) = (ṫ, ṙ, θ̇, ϕ̇)
where the dot denotes differentiation with respect to the particle’s proper time parameter. The
instantaneous speed v of the particle as measured by Alice as it passes satisfies
γ(v) = g(U, V )
just as in special relativity. In this case,
g(U, V ) = (1 − 2m/r)1/2 ṫ.
Hence
E = µ(1 − 2m/r)ṫ = µ(1 − 2m/r)1/2 g(U, V ) = µ(1 − 2m/r)1/2 (1 − v 2 )−1/2 ≃ µ + µv 2 /2 − µm/r
(7.3.13)
if m/r is small and so is v. Recall that G = 1, c = 1 here; if we restore units, then this becomes
E ≃ µc² + ½ µv² − Gmµ/r (7.3.14)
The terms here are the rest-energy of the particle, its kinetic energy and its gravitational
potential energy. So this approximation is in perfect agreement with newtonian gravity and
special relativity.
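This agreement can be illustrated numerically. The sketch below (values chosen for illustration, in units G = c = 1) compares the exact energy per unit rest-mass, (1 − 2m/r)^(1/2)(1 − v²)^(−1/2), with the newtonian expansion 1 + v²/2 − m/r of (7.3.13):

```python
# Sketch comparing the exact energy per unit rest-mass,
# E/mu = (1 - 2m/r)^(1/2) (1 - v^2)^(-1/2), with the expansion
# 1 + v^2/2 - m/r of (7.3.13).  Units G = c = 1; values illustrative.
m_over_r = 1e-6    # weak field
v = 1e-3           # slow motion
exact  = (1 - 2*m_over_r)**0.5 * (1 - v**2)**-0.5
approx = 1 + v**2/2 - m_over_r
print(exact, approx, abs(exact - approx))   # agreement to higher order
```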
Proposition 7.3.3. Equatorial6 timelike geodesics in Schwarzschild are given by the equations
(dr/dτ)² = E² − (1 − 2m/r) (radial geodesics) (7.3.15)
and by
d²u/dϕ² + u − 3mu² = m/J² (non-radial geodesics), (7.3.16)
6i.e. with θ = π/2
where u = 1/r and the angular momentum J = r²ϕ̇ ≠ 0 is a constant. The equation (7.3.16)
has the first integral
(du/dϕ)² + u² − 2mu³ = (E² − 1)/J² + (2m/J²) u. (7.3.17)
Similarly,
Proposition 7.3.4. Radial null geodesics in Schwarzschild are given by (7.3.5–7.3.9). Non-
radial, equatorial null geodesics in Schwarzschild satisfy
d²u/dϕ² + u − 3mu² = 0 (7.3.18)
which has the first integral
(du/dϕ)² + u² − 2mu³ = E²/J². (7.3.19)
Proof. We have seen that for Schwarzschild geodesics, with θ = π/2 we have the conserved
quantities
E = (1 − 2m/r)ṫ, J = r2 ϕ̇ (7.3.20)
and the further conservation equation
(1 − 2m/r)ṫ² − (1 − 2m/r)⁻¹ṙ² − r²ϕ̇² = 2L, (7.3.21)
where L = 1/2 for timelike and L = 0 for null geodesics. In the radial case, ϕ̇ = 0 and for
timelike geodesics, (7.3.21) gives
E² − (dr/dτ)² = 1 − 2m/r, (7.3.22)
from which (7.3.15) follows at once.
In the non-radial case, we follow the same moves that led to newtonian orbits (see problem
set 1). We set u = 1/r. Then
(1 − 2mu)−1 E 2 − (1 − 2mu)−1 ṙ2 − r2 ϕ̇2 = 2L (7.3.23)
Divide through by J 2 = r4 ϕ̇2 :
(1 − 2mu)⁻¹ E²/J² − (1 − 2mu)⁻¹ u⁴ (dr/dϕ)² − u² = 2L/J² (7.3.24)
Now
dr/dϕ = −(1/u²) du/dϕ (7.3.25)
and substituting this in to (7.3.24) gives
(1 − 2mu)⁻¹ E²/J² − (1 − 2mu)⁻¹ (du/dϕ)² − u² = 2L/J² (7.3.26)
Rearranging this yields
(du/dϕ)² + u² − 2mu³ = (E² − 2L)/J² + (4mL/J²) u (7.3.27)
The results in the Propositions follow from this by setting L = 1/2 for timelike and L = 0 for null geodesics. The second-order equation follows by differentiating with respect to ϕ and cancelling the common factor du/dϕ.
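Conversely, the first integral (7.3.17) should be constant along any solution of the second-order equation (7.3.16). The sketch below checks this numerically, integrating (7.3.16) with RK4 from illustrative initial data (m, J and the starting point are my choices, not values from the notes):

```python
# Consistency sketch: integrate the orbit equation (7.3.16),
# u'' + u - 3 m u^2 = m/J^2, and check that the first integral (7.3.17),
# (u')^2 + u^2 - 2 m u^3 - (2m/J^2) u, stays constant along the solution.
m, J = 1.0, 4.0                  # illustrative values

def acc(u):                      # u'' as a function of u
    return m/J**2 - u + 3*m*u**2

def first_integral(u, up):
    return up**2 + u**2 - 2*m*u**3 - (2*m/J**2)*u

u, up, h = 0.08, 0.0, 1e-3       # start at a turning point, du/dphi = 0
I0 = first_integral(u, up)
for _ in range(20000):           # RK4 over 20 radians of phi
    k1u, k1v = up, acc(u)
    k2u, k2v = up + h*k1v/2, acc(u + h*k1u/2)
    k3u, k3v = up + h*k2v/2, acc(u + h*k2u/2)
    k4u, k4v = up + h*k3v, acc(u + h*k3u)
    u  += h*(k1u + 2*k2u + 2*k3u + k4u)/6
    up += h*(k1v + 2*k2v + 2*k3v + k4v)/6
print(abs(first_integral(u, up) - I0))   # ~ 0
```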
Remark 7.3.5. For newtonian gravity, the equation for orbits (see homework problem 1.10) is
d²u/dϕ² + u = m/J² (7.3.28)
with first integral
(du/dϕ)² + u² = A/J² + (2m/J²) u. (7.3.29)
Thus the GR correction to this equation is the −2mu3 term on the LHS of (7.3.27).
Remember that u = 1/r so large r corresponds to small u and the effect of the cubic
correction term is stronger for small radii. Remember also that the Schwarzschild metric appears
only to be OK for r > 2m which corresponds to 0 < u < 1/2m.
7.3.3. Circular timelike orbits and the precession of perihelion. You can find an
extensive analysis of the timelike geodesics in Schwarzschild in Woodhouse’s book, Chapter 8.
We shall just look at circular orbits and small perturbations of them. This already leads to the
precession of perihelion, which I may have mentioned was one of the first verifications of GR.
Consider a circular timelike orbit in Schwarzschild. For such an orbit, evidently uϕϕ = 0,
uϕ = 0. Setting uϕϕ = 0 in (7.3.16) gives the equation
3mu2 − u + m/J 2 = 0 (7.3.30)
so, solving the quadratic,
u = (1 ± √(1 − 12m²/J²))/6m. (7.3.31)
Thus we have circular orbits if J² > 12m². For m²/J² small, the larger root is approximately 1/3m, just less than this value. The smaller value of u is
u = u0 = (1 − √(1 − 12m²/J²))/6m ≃ m/J² (7.3.32)
if m/J² is small, and this is the newtonian value of u = 1/r for a circular orbit with given m and J (cf. (7.3.28)).
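The two roots of (7.3.30) can be computed explicitly. The sketch below (illustrative values m = 1, J = 20, so that m/J² is small) shows that the smaller root is close to the newtonian value m/J² and the larger root is close to 1/3m:

```python
# Sketch of (7.3.31): circular-orbit values of u from 3 m u^2 - u + m/J^2 = 0,
# compared with the newtonian value u = m/J^2.  Values illustrative.
import math
m, J = 1.0, 20.0                     # J^2 = 400 >> 12 m^2
disc = math.sqrt(1 - 12*m**2/J**2)
u_minus = (1 - disc) / (6*m)         # newtonian-like root, ~ m/J^2
u_plus  = (1 + disc) / (6*m)         # relativistic root, just below 1/(3m)
print(u_minus, m/J**2, u_plus)
```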
[Figure: a precessing orbit, with the First, Second, Third and Fourth perihelia marked at successively rotated angles]
We conclude that
u(ϕ) = α cos ϕ + mα²(1 + sin ϕ)² (7.3.46)
is an approximate null geodesic in Schwarzschild for α small.
We calculate the angle of deflection by looking for the values of ϕ for which u = 0. We
already know one of them: ϕ = −π/2. We expect the other one to be approximately ϕ = π/2
(since this would correspond to zero deflection). So try to solve u(π/2 − λ) = 0, assuming λ is
small. Substituting this into (7.3.46),
u(π/2 − λ) = 0 ⇔ α sin λ + mα2 (1 + cos λ)2 = 0 (7.3.47)
and putting sin λ ≃ λ, cos λ ≃ 1,
λ ≃ −4αm. (7.3.48)
So the asymptotic direction of the light-ray is approximately π/2 + 4mα, showing that the
photon has been deflected by an angle 4mα due to the gravitational pull of the star.
Again, putting in the units, the deflection is approximately 4mG/Dc2 , where D is the
‘impact parameter’ (the smallest value of r on the trajectory).
This was observed by Eddington during the 1919 total eclipse of the sun.
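Restoring units, the predicted deflection for a ray grazing the sun (impact parameter D equal to the solar radius) can be evaluated directly; this sketch uses standard SI values for the constants:

```python
# Order-of-magnitude sketch: deflection angle 4Gm/(D c^2) for a light ray
# grazing the sun (D = solar radius), with units restored as in the text.
G = 6.674e-11       # m^3 kg^-1 s^-2
c = 2.998e8         # m/s
M_sun = 1.989e30    # kg
R_sun = 6.957e8     # m  (impact parameter D)
deflection_rad = 4 * G * M_sun / (R_sun * c**2)
deflection_arcsec = deflection_rad * 180 / 3.141592653589793 * 3600
print(deflection_arcsec)   # about 1.75 arcseconds
```

This is the famous 1.75 arcsecond deflection measured in the 1919 eclipse expedition.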
So the particle reaches r = 2m at proper time τ (2m) < ∞. This means that the problem of
understanding the metric in the vicinity of r = 2m is of real physical relevance.
7.4.1. Toy examples.
Example 7.4.1. The metric
ds₁² = dt²/(2t) + 2t dθ², (0 < t < ∞) (7.4.3)
is singular at t = 0. However, if we define
dr = dt/√(2t), so r = √(2t),
the metric becomes
ds21 = dr2 + r2 dθ2 = dx2 + dy 2
(x = r cos θ, y = r sin θ). The origin (x, y) = (0, 0) corresponds to the singularity t = 0 of
(7.4.3): despite appearances, the metric is perfectly good there.
Example 7.4.2. Consider the metric
ds22 = e2u du2 + e2v dv 2 , −∞ < u, v < ∞. (7.4.4)
There might appear to be nothing to say about this, but if we put
x = e^u, y = e^v,
mapping the uv plane to the quadrant x > 0, y > 0 of the xy plane, then
ds₂² = dx² + dy², 0 < x, y < ∞,
and the metric extends (as the flat metric) to the whole xy plane.
[Figure: the (u, v) plane mapped by (x, y) = (e^u, e^v) onto the positive quadrant of the (x, y) plane]
Definition 7.4.3. In cases where a metric is ill-defined at a point, but after a change of
coordinates it becomes well defined, we say that we have a coordinate singularity.
A coordinate singularity is not a true geometric singularity: it just corresponds to looking
at the metric in a poor choice of coordinates.
We shall now see that the surface r = 2m is a coordinate singularity rather than a metric
singularity of Schwarzschild by exhibiting new coordinates in which the singularity disappears!
Let us discuss what’s happened here. To avoid later confusion, we shall rechristen the r
coordinate here ρ. Thus (suppressing θ and ϕ) we have two coordinate systems: the original
(t, r) and the new (v, ρ) with
v = t + r + 2m log |r − 2m|, ρ = r. (7.4.12)
By the laws for change of vector fields,
∂/∂t = (∂v/∂t) ∂/∂v + (∂ρ/∂t) ∂/∂ρ = ∂/∂v (7.4.13)
and7
∂/∂r = (∂v/∂r) ∂/∂v + (∂ρ/∂r) ∂/∂ρ = (1 − 2m/r)⁻¹ ∂/∂v + ∂/∂ρ. (7.4.14)
The following picture may help to visualize what’s going on:
[Figure: the (v, ρ) half-plane ρ > 0, with the lines ρ = 0 and ρ = 2m marked]
Future-pointing radial null vectors in the original coordinates are (positive multiples of)
∂/∂t ± (1 − 2m/r) ∂/∂r. (7.4.15)
Using (7.4.13), in the new coordinates these become
∂/∂v ± (∂/∂v + (1 − 2m/r) ∂/∂ρ). (7.4.16)
Hence a pair of radial future-pointing null vectors in the new coordinates is:
2∂v + (1 − 2m/r)∂ρ , −∂ρ . (7.4.17)
To summarize, a change of variables has enabled us to extend the Schwarzschild metric
through the hypersurface r = 2m. When so extended, the metric is defined for 0 < r < 2m, but
light emitted from any point in this region can never escape. For this reason r = 2m is called
the event horizon.
7this equation would be confusing if we had not renamed r as ρ!
Remark 7.4.5. The radius r = 2m is called the Schwarzschild radius. For our sun this is
about 3 kilometres, well inside the sun itself. In particular the Schwarzschild metric is not valid
there, because matter is present in this region. The above discussion is only of significance if
matter is so highly concentrated that the region r = 2m is contained in a region of empty space.
The region r < 2m, to which we have now extended the Schwarzschild metric, is then called the
black hole region of the space-time.
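The "3 kilometres" in Remark 7.4.5 is a one-line calculation once units are restored: r = 2m corresponds to r = 2GM/c². The sketch below evaluates this for the solar mass (standard SI values):

```python
# Sketch of Remark 7.4.5: the Schwarzschild radius r = 2m = 2GM/c^2
# for the sun, with units restored.
G = 6.674e-11       # m^3 kg^-1 s^-2
c = 2.998e8         # m/s
M_sun = 1.989e30    # kg
r_s = 2 * G * M_sun / c**2
print(r_s / 1000)   # about 3 kilometres
```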
7.4.3. What happens near the event horizon? Suppose Alice and Bob are near a
black hole described by the Schwarzschild metric. Alice is sitting at r = R and the unfortunate
Bob8 falls through r = 2m. How can we analyze this?
If Bob is freely falling, radially, so θ̇ = ϕ̇ = 0, the Lagrangian in Eddington–Finkelstein
coordinates is
L = ½ ((1 − 2m/r) v̇² − 2 v̇ ṙ). (7.4.18)
Here L is independent of v and so
∂L/∂v̇ = (1 − 2m/r) v̇ − ṙ (7.4.19)
is a constant, F , say. Also L = 1/2 along a timelike geodesic parameterized by proper time.
Suppose that τ = 0 when r = 2m. For small τ , these equations reduce to
− ṙ = F, −v̇ ṙ = 1/2 (7.4.20)
So near the event horizon r = 2m, Bob’s world line is
v ≃ τ/(2F), r ≃ 2m − Fτ, for small τ. (7.4.21)
Thus Bob doesn’t notice anything particularly strange in crossing the event horizon: though in
reality, for a typical sized black hole, the tidal forces (the difference between the force felt on
your head and your feet) get a bit strong well before you encounter the event horizon.
What does Alice see of Bob’s descent? To answer this question, suppose she is sitting at
r = R > 2m, and receives a photon from Bob at every tick of his clock. Then A's world-line is τA ↦ (V τA, R); her velocity 4-vector is (V, 0), and τA is proper time if this vector has squared length 1 with respect to our metric. This gives V = (1 − 2m/R)⁻¹ᐟ² (cf. §7.3.1). So
v(τA ) = (1 − 2m/R)−1/2 τA , r(τA ) = R along A’s worldline. (7.4.22)
A photon emitted by Bob’s clock at proper time τB < 0 and heading out to Alice will satisfy
(1 − 2m/r)v̇ 2 − 2ṙv̇ = 0 (7.4.23)
with initial conditions
v(0) = τB/(2F), r(0) = 2m − F τB. (7.4.24)
Dividing by ṙv̇, (7.4.23) gives
dv/dr = 2r/(r − 2m) (7.4.25)
and so, integrating,
v(r) = 2(r + 2m log(r − 2m)) + c (7.4.26)
for some integration constant. Inserting the initial conditions, we get
v(r) = 2[r − 2m + 2m log((r − 2m)/(F|τB|)) + (F + 1/(4F)) τB]. (7.4.27)
So the photon emitted by Bob at his proper time τB is received by Alice when r = R, so
v(R) = 2[R − 2m + 2m log((R − 2m)/(F|τB|)) + (F + 1/(4F)) τB], (7.4.28)
and by (7.4.22) the corresponding value of A’s proper time is
τA = 2√(1 − 2m/R) [R − 2m + 2m log((R − 2m)/(F|τB|)) + (F + 1/(4F)) τB]. (7.4.29)
8Perhaps Bob’s a robot
Remark 7.4.7. This extension was obtained in 1960 by Kruskal. The crucial point is that
g∗ is a well-defined lorentzian metric wherever r > 0. From (7.4.37), the metric is defined in the set v′w′ < 2m, and the coordinates (v′, w′) can have either sign.
Remark 7.4.8. If you prefer something looking more obviously Lorentzian, define t′ and x′ by a suitable linear change from (v′, w′), so that
g∗ = (4m²/r) e^(−r/2m) [(dt′)² − (dx′)²] − r² dω².
r
The following picture, the Kruskal diagram, should help.
[Kruskal diagram: the (v′, w′) axes at 45°, with t′ vertical and x′ horizontal; the singularity r = 0 appears as two hyperbolae at the top and bottom, the horizon r = 2m as the dashed lines, with Regions I, I′, II, II′ and curves r = const > 2m]
Note:
• Radial null geodesics are given by
v ′ = τ, w′ , θ, ϕ constant,
and
w′ = τ, v ′ , θ, ϕ constant.
(i.e. they are always at 45◦ in the Kruskal diagram).
• The event horizon r = 2m is given by v ′ w′ = 0: the two dashed lines;
• The singularity at r = 0 is given by v ′ w′ = 2m: the two thick hyperbolae at top and
bottom of the picture.
• Region I is the domain of the original Schwarzschild metric, r > 2m.
• Region II is the region in which the Eddington–Finkelstein coordinates describe the
metric: Region II is the region interior to the event horizon r = 2m, i.e. the black hole
region. The boundary between regions I and II is the positive v ′ -axis.
We emphasise that the (v ′ , w′ ) axes are at 45◦ in this diagram. Also that the hyperbola at
the top of the picture is the true singularity of the metric and represents the black hole itself.
The ultimate fate of every particle or photon in region II (inside the event horizon) is to end
up terminated by this singularity.
The singularity is the set v ′ w′ = 2m and is shielded from view by the event horizon. GR
and all known laws of physics break down at the singularity.