
AN INTRODUCTION TO

THE THEORY OF RELATIVITY


Course Notes for AM 475
J. Wainwright
Department of Applied Mathematics
University of Waterloo
August 21, 2006

© J. Wainwright, May 2003

Contents

Preface
Acknowledgements

1 Differential Geometry and Tensors
  1.1 Tensors in $E^n$
    1.1.1 Metric Tensor in $E^n$
    1.1.2 Tensor Transformation Laws
    1.1.3 Operations on Tensors
  1.2 Differentiation of Tensors
    1.2.1 The Covariant Derivative
    1.2.2 Christoffel Symbols
  1.3 Differential Geometry in a Riemannian Space
    1.3.1 A Surface as a 2-D Riemannian Space
    1.3.2 n-Dimensional Riemannian Space $V^n$
    1.3.3 The Riemann Curvature Tensor
    1.3.4 Curvature of a 2-D Riemannian Space
  1.4 Geodesics
    1.4.1 The Geodesic Equations
    1.4.2 Geodesics and Curvature
  1.5 Looking Ahead: Why Tensors?

2 Special Relativity Theory & Lorentzian Metrics
  2.1 The Relativity Principle & Light Propagation
    2.1.1 Inertial frames
    2.1.2 Light Propagation
    2.1.3 Events and Spacetime
  2.2 The Lorentz Transformation
    2.2.1 Light Propagation and the k-factor
    2.2.2 Relative Velocity and the k-factor
    2.2.3 Derivation of the special Lorentz transformation
  2.3 Minkowski Spacetime & Lorentzian Metrics
    2.3.1 The Spacetime Separation between Events
    2.3.2 The Lorentzian metric tensor
    2.3.3 Four-dimensional Minkowski space-time
    2.3.4 The light cone
  2.4 Worldlines and geodesics
  2.5 A Brief Introduction to the Energy-Momentum Tensor

3 The General Theory of Relativity
  3.1 Newtonian Theory of Gravitation
  3.2 General Relativity and the Geodesic Hypothesis
  3.3 The Einstein Vacuum Field Equations
  3.4 The Einstein Field Equations with Sources
  3.5 The Weak Field Limit of General Relativity

4 The Schwarzschild Metric and its Applications
  4.1 The Schwarzschild Solution and the Solar Gravitational Field
    4.1.1 Newtonian theory
    4.1.2 GRT approach
  4.2 Planetary Orbits
    4.2.1 Newtonian theory
    4.2.2 GRT approach
    4.2.3 Predictions concerning planetary orbits
    4.2.4 Observations concerning planetary orbits
  4.3 Deflection of Light and Radio Signals
    4.3.1 Theory
    4.3.2 Observations
  4.4 Gravitational Frequency Shift
  4.5 GRT and Observation - Summary
  4.6 The Schwarzschild Metric and Black Holes
    4.6.1 Behaviour of the Schwarzschild line-element near r = 2m
    4.6.2 The Eddington-Finkelstein line-element
    4.6.3 Model of a collapsing star
    4.6.4 Black holes and observations

5 An Introduction to Relativistic Cosmology
  5.1 Introduction
    5.1.1 Aims of Cosmology
    5.1.2 Unique Difficulties
    5.1.3 The Starting Point
    5.1.4 An Unverifiable Hypothesis
    5.1.5 The Choice of a Theory of Gravity
  5.2 The Friedmann-Robertson-Walker Line-Element
    5.2.1 The Fundamental Timelike Congruence
    5.2.2 The Cosmological Principle
  5.3 Derivation of the FRW Line-element
  5.4 The Spatial Geometry of FRW Universes
  5.5 The Cosmological Red-Shift and the Expansion of the Universe
  5.6 The Distance Red-Shift Relation
  5.7 The FRW Cosmological Models
    5.7.1 The Einstein Field Equations applied to the FRW metric
    5.7.2 The Big-Bang, and Future Evolution of the Universe
    5.7.3 Uniqueness of the Universe
    5.7.4 Restrictions on the Observability of the Universe: Particle Horizons
    5.7.5 FRW Cosmological Models with Zero Pressure
    5.7.6 The Einstein-de-Sitter Model
    5.7.7 Summary of Observational Data

Appendix A: Calculation of the Christoffel Symbols using the Euler-Lagrange Equations
Appendix B: General Frequency Shift Formula

Bibliography

Problem Sets
  Problems for Chapter 1
  Problems for Chapter 3
  Problems for Chapter 4
  Problems for Chapter 5
Preface
These notes give an introduction to Einstein's theory of general relativity, and to three
important applications, the solar gravitational field, the concept of a black hole and the
formulation of cosmological models. In order to make the subject mathematically accessible
we have not made use of modern differential geometry (i.e. differentiable manifolds and the
index-free approach to tensors). Instead, in the first chapter we give a brief introduction to
the essentials of classical differential geometry and tensor analysis.
As regards the mathematical prerequisites, a strong background in calculus and elementary ordinary differential equations is essential (AMATH 231 and AMATH 250). As regards
physics, an introductory course in Newtonian mechanics and special relativity is essential
(AMATH 261). The course has a four hundred level number to reflect the required level of
intellectual maturity.
Warning:
The information concerning observations of black holes in chapter 4, and concerning
cosmological observations in chapter 5, is completely out-of-date, since these notes were
written in 1980 and there have been very significant developments since then. We intend
to update these parts of the notes in the near future. Please note that the mathematical
description of black holes and cosmological models, based on Einstein's theory of general
relativity, is still completely valid.

December, 2004


Acknowledgements
Thanks are expressed to
Matt Calder: for taking the lead in writing and typesetting Chapter 1, and for producing
the figures.
Conrad Hewitt: for helpful discussions concerning the material in the course.
Ann Puncher: for excellent typesetting.


Chapter 1
Differential Geometry and Tensors
Geometrically, surfaces in $E^3$, such as a sphere or a paraboloid, are fundamentally different
from a plane. In all three spaces, there is defined a notion of distance, but the first two
have non-zero curvature (they are intrinsically curved), while a plane has zero curvature (it
is flat).
One of Einstein's insights was that gravity can be described by the curvature of spacetime,
which is a four-dimensional set whose points are labelled by coordinates $(t, x^1, x^2, x^3)$, where $t$
is a time coordinate and $x^1, x^2, x^3$ are spatial coordinates. Hence the need to study curvature
in this course.
Differential geometry provides the mathematical language for describing curvature of
spaces of two or more dimensions. One of the principal tools is the notion of a tensor, which
we will now introduce.

1.1 Tensors in $E^n$

1.1.1 Metric Tensor in $E^n$

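Consider n-dimensional Euclidean space $E^n$, which is $\mathbb{R}^n$ with the Euclidean inner product,
i.e. if $\mathbf{u} = (u^1, \dots, u^n)$ and $\mathbf{v} = (v^1, \dots, v^n)$ are elements of $E^n$, then
\[ \mathbf{u} \cdot \mathbf{v} = \sum_{i=1}^{n} u^i v^i. \]
It is helpful to write $\mathbf{u} \cdot \mathbf{v}$ in terms of the Kronecker delta symbol,
\[ \delta_{ij} = \begin{cases} 1, & \text{if } i = j \\ 0, & \text{if } i \neq j \end{cases}, \]
as
\[ \mathbf{u} \cdot \mathbf{v} = \sum_{i=1}^{n} \sum_{j=1}^{n} \delta_{ij} u^i v^j = \delta_{ij} u^i v^j. \tag{1.1} \]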

We have used here the Einstein summation convention, which is used throughout differential
geometry and Einstein's Theory of Relativity.
Notation: [Einstein Summation Convention] When an index appears repeated in an expression, summation is implied to occur over all values assumed by that index. As is customary
when using the $\Sigma$-notation, the summation index is called a dummy index because it can
be relabelled at will. For example, we can relabel the dummy indices in equation (1.1) and
write
\[ \mathbf{u} \cdot \mathbf{v} = \delta_{\ell m} u^\ell v^m. \]
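As a small illustration of the summation convention, the following Python sketch spells out the double sum hidden in $\delta_{ij} u^i v^j$. The function names `delta` and `dot_via_delta` and the sample vectors are illustrative choices, not part of the notes.

```python
def delta(i, j):
    """Kronecker delta: 1 if the indices agree, 0 otherwise."""
    return 1 if i == j else 0


def dot_via_delta(u, v):
    """Evaluate delta_ij u^i v^j by summing over both repeated indices."""
    n = len(u)
    return sum(delta(i, j) * u[i] * v[j] for i in range(n) for j in range(n))


u = [1.0, 2.0, 3.0]
v = [4.0, -1.0, 0.5]
print(dot_via_delta(u, v))                 # double sum, as in (1.1)
print(sum(a * b for a, b in zip(u, v)))    # ordinary inner product
```

Both lines print the same number, since only the terms with $i = j$ survive in the double sum.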
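Let $\{X^i\}_{i=1}^{n}$ be Cartesian coordinates labelling points in $E^n$. Consider a curve $C$ in $E^n$
given by a vector equation
\[ \mathbf{X} = \mathbf{X}(\lambda), \qquad \lambda_1 \le \lambda \le \lambda_2. \]
The tangent vector to $C$ is given by
\[ \frac{d\mathbf{X}}{d\lambda} \]
and the arclength of the curve is given by
\[ s = \int_{\lambda_1}^{\lambda_2} \sqrt{\frac{d\mathbf{X}}{d\lambda} \cdot \frac{d\mathbf{X}}{d\lambda}} \, d\lambda. \tag{1.2} \]
In differential form, (1.2) reads
\[ \frac{ds}{d\lambda} = \sqrt{\frac{d\mathbf{X}}{d\lambda} \cdot \frac{d\mathbf{X}}{d\lambda}} = \sqrt{\delta_{ij} \frac{dX^i}{d\lambda} \frac{dX^j}{d\lambda}}, \tag{1.3} \]
where we have made use of the notation in (1.1).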


Now, introduce arbitrary coordinates $\{x^i\}_{i=1}^{n}$ that are related to the Cartesian coordinates
by
\[ \mathbf{X} = \mathbf{F}(\mathbf{x}), \]
where $\mathbf{F} : \mathbb{R}^n \to \mathbb{R}^n$. In component form, we write
\[ X^i = F^i(x^1, x^2, \dots, x^n), \qquad i = 1, \dots, n. \tag{1.4} \]
In these new coordinates, write the curve $C$ as
\[ x^i = x^i(\lambda). \tag{1.5} \]

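Then, by applying the chain rule to (1.4) and (1.5), we obtain the tangent vector to $C$,
\[ \frac{dX^i}{d\lambda} = \frac{\partial F^i}{\partial x^1} \frac{dx^1}{d\lambda} + \dots + \frac{\partial F^i}{\partial x^n} \frac{dx^n}{d\lambda} = \sum_{j=1}^{n} \frac{\partial F^i}{\partial x^j} \frac{dx^j}{d\lambda}. \]
Using the summation convention,
\[ \frac{dX^i}{d\lambda} = \frac{\partial F^i}{\partial x^j} \frac{dx^j}{d\lambda}. \]
In vector form, this equation reads
\[ \frac{d\mathbf{X}}{d\lambda} = \frac{\partial \mathbf{F}}{\partial x^j} \frac{dx^j}{d\lambda}. \tag{1.6} \]
It follows from (1.3) and (1.6) that
\[ \left( \frac{ds}{d\lambda} \right)^2 = \frac{\partial \mathbf{F}}{\partial x^i} \cdot \frac{\partial \mathbf{F}}{\partial x^j} \, \frac{dx^i}{d\lambda} \frac{dx^j}{d\lambda}. \tag{1.7} \]
We introduce the notation
\[ g_{ij} = \frac{\partial \mathbf{F}}{\partial x^i} \cdot \frac{\partial \mathbf{F}}{\partial x^j} \tag{1.8} \]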

so that (1.7) assumes the form
\[ \left( \frac{ds}{d\lambda} \right)^2 = g_{ij} \frac{dx^i}{d\lambda} \frac{dx^j}{d\lambda}. \tag{1.9} \]
The functions $g_{ij}$ are the components of what is known as the metric tensor relative to the
coordinate system $\{x^i\}$. We write (1.9) in the shorthand form
\[ ds^2 = g_{ij} \, dx^i dx^j, \tag{1.10} \]
which can be interpreted heuristically as the square of the distance between two infinitesimally
close points in $E^n$. An expression for $ds^2$ is referred to as the line-element.
Remark 1.1: It is important to note that the components $g_{ij}$ of the metric tensor form a
symmetric array, i.e.
\[ g_{ij} = g_{ji}. \]
This symmetry is a consequence of (1.8) and the fact that the scalar product is symmetric
(i.e. $\mathbf{u} \cdot \mathbf{v} = \mathbf{v} \cdot \mathbf{u}$).
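Example 1.1: Find the components of the metric tensor in $E^2$ relative to polar coordinates.
Solution: We have the change of coordinates $\mathbf{X} = \mathbf{F}(\mathbf{x})$ given by
\[ \begin{pmatrix} X^1 \\ X^2 \end{pmatrix} = \begin{pmatrix} x^1 \cos x^2 \\ x^1 \sin x^2 \end{pmatrix}, \]
where $(x^1, x^2) = (r, \theta)$ represent the radius and angle of polar coordinates.
Calculate the coordinate tangent vectors, namely the partial derivatives of $\mathbf{F}$:
\[ \frac{\partial \mathbf{F}}{\partial x^1} = \begin{pmatrix} \cos x^2 \\ \sin x^2 \end{pmatrix} \quad \text{and} \quad \frac{\partial \mathbf{F}}{\partial x^2} = \begin{pmatrix} -x^1 \sin x^2 \\ x^1 \cos x^2 \end{pmatrix}. \]
From (1.8), we can calculate the components of the metric tensor in matrix form:
\[ (g_{ij}) = \left( \frac{\partial \mathbf{F}}{\partial x^i} \cdot \frac{\partial \mathbf{F}}{\partial x^j} \right) = \begin{pmatrix} 1 & 0 \\ 0 & (x^1)^2 \end{pmatrix}. \tag{1.11} \]
From (1.10) and (1.11), reverting back to $r$ and $\theta$, we now have an expression for $ds^2$ in polar
coordinates,
\[ ds^2 = dr^2 + r^2 d\theta^2. \]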
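The construction (1.8) can also be checked numerically. The sketch below approximates the coordinate tangent vectors by central differences and forms their inner products; the helper names `polar_to_cartesian` and `metric_from_map`, the step size `h`, and the sample point are assumptions made for illustration only.

```python
import math


def polar_to_cartesian(x):
    """The map X = F(x) of Example 1.1, with x = (r, theta)."""
    r, theta = x
    return [r * math.cos(theta), r * math.sin(theta)]


def metric_from_map(F, x, h=1e-6):
    """Approximate g_ij = (dF/dx^i) . (dF/dx^j) at the point x."""
    n = len(x)
    tangents = []
    for i in range(n):
        xp, xm = list(x), list(x)
        xp[i] += h
        xm[i] -= h
        # Central-difference estimate of the coordinate tangent vector dF/dx^i.
        tangents.append([(a - b) / (2 * h) for a, b in zip(F(xp), F(xm))])
    return [[sum(a * b for a, b in zip(tangents[i], tangents[j]))
             for j in range(n)] for i in range(n)]


g = metric_from_map(polar_to_cartesian, [2.0, 0.7])
print(g)  # close to [[1, 0], [0, r^2]] with r = 2
```

Up to finite-difference error, the result agrees with (1.11) evaluated at $r = 2$.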

Exercise: Show that the components $g_{ij}$ of the metric tensor in $E^3$ relative to spherical
coordinates $(r, \theta, \phi)$ are given implicitly by the line-element
\[ ds^2 = dr^2 + r^2 (d\theta^2 + \sin^2\theta \, d\phi^2). \tag{1.12} \]
Recall that spherical coordinates $(x^1, x^2, x^3) = (r, \theta, \phi)$ are related to Cartesian coordinates
according to
\[ \begin{pmatrix} X^1 \\ X^2 \\ X^3 \end{pmatrix} = \begin{pmatrix} x^1 \sin x^2 \cos x^3 \\ x^1 \sin x^2 \sin x^3 \\ x^1 \cos x^2 \end{pmatrix}. \]

The metric tensor also enables one to define the magnitude of a vector and the notion of
orthogonality of two vectors in $E^n$. Firstly, the magnitude of the vector $\mathbf{V}$ with components
$V^i$ relative to arbitrary coordinates $x^i$ is given by
\[ \|\mathbf{V}\|^2 = g_{ij} V^i V^j. \tag{1.13} \]
Relative to Cartesian coordinates, (1.13) simplifies to the usual expression:
\[ \|\mathbf{V}\|^2 = \delta_{ij} V^i V^j = (V^1)^2 + \dots + (V^n)^2. \]
Secondly, two vectors with components $U^i$ and $V^i$ relative to arbitrary coordinates $x^i$ are
orthogonal if and only if
\[ g_{ij} U^i V^j = 0. \tag{1.14} \]
Relative to Cartesian coordinates, (1.14) simplifies to the usual expression
\[ U^1 V^1 + \dots + U^n V^n = 0. \]
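To make (1.13) and (1.14) concrete, here is a minimal Python sketch using the polar-coordinate metric (1.11) at the point $r = 2$. The vectors `e_r` and `e_theta` (with components $(1, 0)$ and $(0, 1)$ in polar coordinates) and the function name `inner` are illustrative assumptions.

```python
import math


def inner(g, U, V):
    """Evaluate g_ij U^i V^j for component lists U, V and metric matrix g."""
    n = len(U)
    return sum(g[i][j] * U[i] * V[j] for i in range(n) for j in range(n))


r = 2.0
g_polar = [[1.0, 0.0], [0.0, r * r]]   # metric (1.11) at r = 2

e_r = [1.0, 0.0]       # components of the coordinate vector along r
e_theta = [0.0, 1.0]   # components of the coordinate vector along theta

norm_e_theta = math.sqrt(inner(g_polar, e_theta, e_theta))
print(norm_e_theta)                  # magnitude equals r, not 1
print(inner(g_polar, e_r, e_theta))  # zero: the two vectors are orthogonal
```

The example shows that a vector with components $(0, 1)$ is not a unit vector in polar coordinates: its magnitude depends on the point through $g_{22} = r^2$.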

1.1.2 Tensor Transformation Laws

Let $x^i$ and $\bar{x}^i$ be two coordinate systems in $E^n$ related by
\[ \bar{x}^i = F^i(x^1, x^2, \dots, x^n), \qquad i = 1, 2, \dots, n, \]
which we write more concisely as
\[ \bar{x}^i = F^i(x^j). \tag{1.15} \]
We wish to develop the transformation laws for various geometric quantities under the change
of coordinates (1.15). When dealing with transformation laws, it is convenient to label the
functions $F^i$ as $\bar{x}^i$ and write (1.15) as
\[ \bar{x}^i = \bar{x}^i(x^j). \tag{1.16} \]

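Tangent Vector of a Curve:
A curve $x^i = x^i(\lambda)$ is given in $\bar{x}^i$ coordinates by
\[ \bar{x}^i = \bar{x}^i(x^j(\lambda)), \]
on using (1.16). By the chain rule, we have
\[ \frac{d\bar{x}^i}{d\lambda} = \frac{\partial \bar{x}^i}{\partial x^j} \frac{dx^j}{d\lambda}. \tag{1.17} \]
Introducing the notation
\[ A^i{}_j = \frac{\partial \bar{x}^i}{\partial x^j} \tag{1.18} \]
we can write (1.17) as
\[ \frac{d\bar{x}^i}{d\lambda} = A^i{}_j \frac{dx^j}{d\lambda}. \tag{1.19} \]
Here, $A = (A^i{}_j)$ is the derivative matrix of the coordinate transformation (1.15). Equation
(1.19) is the transformation law for the components of the tangent vector $dx^i/d\lambda$.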
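Gradient of a Scalar Field:
The value of a scalar field $\phi(x^i)$ at a given point is the same in all coordinate systems, but
relative to a coordinate system $\{\bar{x}^j\}$, the scalar field is described by a different function of
the coordinates, $\bar{\phi}(\bar{x}^j)$, i.e. we have
\[ \bar{\phi}(\bar{x}^j) = \phi(x^i) \]
where $x^i = x^i(\bar{x}^j)$. Substituting for $x^i$ gives
\[ \bar{\phi}(\bar{x}^j) = \phi(x^i(\bar{x}^j)). \]
By the chain rule,
\[ \frac{\partial \bar{\phi}}{\partial \bar{x}^j} = \frac{\partial \phi}{\partial x^i} \frac{\partial x^i}{\partial \bar{x}^j}. \tag{1.20} \]
Introducing the notation
\[ B^i{}_j = \frac{\partial x^i}{\partial \bar{x}^j}, \tag{1.21} \]
equation (1.20) reads
\[ \frac{\partial \bar{\phi}}{\partial \bar{x}^j} = B^i{}_j \frac{\partial \phi}{\partial x^i}. \tag{1.22} \]
Equation (1.22) is the transformation law for the components of the gradient of $\phi$. The
gradient is an example of a covariant vector field.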
Exercise: Verify the key properties that
\[ A^i{}_k B^k{}_j = \delta^i{}_j \quad \text{and} \quad B^i{}_k A^k{}_j = \delta^i{}_j. \tag{1.23} \]
What does this result mean in terms of matrices? Assume that in each of the arrays $A^i{}_j$ and
$B^i{}_j$, the upper index labels rows and the lower index labels columns.
Remark 1.2: Throughout the rest of the chapter, the matrices $A^i{}_j$ and $B^i{}_j$ will be the
change of coordinate derivative matrices as defined by equations (1.18) and (1.21). We will
use the inverse properties (1.23) extensively.
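The inverse properties (1.23) can be checked numerically for a specific change of coordinates. In the sketch below, the unbarred coordinates are taken to be polar, $(x^1, x^2) = (r, \theta)$, and the barred coordinates Cartesian; this labelling, the function names, and the sample point are assumptions for illustration.

```python
import math


def jacobians(r, theta):
    """Return (A, B) for x = (r, theta) and xbar = (r cos(theta), r sin(theta))."""
    c, s = math.cos(theta), math.sin(theta)
    A = [[c, -r * s], [s, r * c]]    # A^i_j = d(xbar^i)/d(x^j), see (1.18)
    B = [[c, s], [-s / r, c / r]]    # B^i_j = d(x^i)/d(xbar^j), see (1.21)
    return A, B


def matmul(P, Q):
    """Matrix product; the upper index labels rows, the lower index columns."""
    n = len(P)
    return [[sum(P[i][k] * Q[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


A, B = jacobians(2.0, 0.7)
print(matmul(A, B))  # approximately the identity matrix
print(matmul(B, A))  # approximately the identity matrix
```

In matrix terms, (1.23) says that $A$ and $B$ are inverses of each other.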
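The Metric Tensor:
Since the length of a vector is a scalar, it has the same value in any coordinate system, i.e.
\[ \bar{g}_{ij} \frac{d\bar{x}^i}{d\lambda} \frac{d\bar{x}^j}{d\lambda} = g_{ab} \frac{dx^a}{d\lambda} \frac{dx^b}{d\lambda}. \tag{1.24} \]
Multiplying (1.19) by $B^a{}_i$ and using (1.23) gives
\[ \frac{dx^a}{d\lambda} = B^a{}_i \frac{d\bar{x}^i}{d\lambda}. \tag{1.25} \]
Substitution of (1.25) in (1.24) leads to
\[ \left( \bar{g}_{ij} - B^a{}_i B^b{}_j g_{ab} \right) \frac{d\bar{x}^i}{d\lambda} \frac{d\bar{x}^j}{d\lambda} = 0. \]
Since the tangent vector is arbitrary,$^1$ we obtain the transformation law
\[ \bar{g}_{ij} = B^a{}_i B^b{}_j g_{ab} \tag{1.26} \]
for the components of the metric tensor.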
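As a check on (1.26), the following sketch starts from the polar-coordinate metric (1.11) and transforms it to Cartesian coordinates, where it should reduce to $\delta_{ij}$. The labelling (unbarred polar, barred Cartesian), the function name `transform_metric`, and the sample point are assumptions for illustration.

```python
import math


def transform_metric(B, g):
    """Apply gbar_ij = B^a_i B^b_j g_ab; B[a][i] holds B^a_i."""
    n = len(g)
    return [[sum(B[a][i] * B[b][j] * g[a][b]
                 for a in range(n) for b in range(n))
             for j in range(n)] for i in range(n)]


r, theta = 2.0, 0.7
c, s = math.cos(theta), math.sin(theta)

g_polar = [[1.0, 0.0], [0.0, r * r]]   # g_ab in polar coordinates, from (1.11)
B = [[c, s], [-s / r, c / r]]          # B^a_i = d(x^a)/d(xbar^i), xbar Cartesian

g_cartesian = transform_metric(B, g_polar)
print(g_cartesian)  # approximately [[1, 0], [0, 1]]
```

The factor $r^2$ in $g_{22}$ is exactly cancelled by the two factors of $1/r$ carried by $B^2{}_i$.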


A tangent vector, a covariant vector, and the metric tensor are all examples of tensors,
and equations (1.19), (1.22), and (1.26) are the corresponding tensor transformation laws.
In general, a tensor of type $(r, s)$ is an object which has components labelled
\[ T^{i_1 \cdots i_r}{}_{j_1 \cdots j_s} \tag{1.27} \]
relative to a given coordinate system $\{x^k\}$ in $E^n$. The components relative to another
coordinate system $\{\bar{x}^k\}$ are denoted by
\[ \bar{T}^{i_1 \cdots i_r}{}_{j_1 \cdots j_s} \]
and are given by a transformation law that generalizes (1.19), (1.22), and (1.26). We refer
to $i_1 \cdots i_r$ in (1.27) as contravariant indices and to $j_1 \cdots j_s$ as covariant indices. A scalar
can be thought of as a tensor of type (0,0).
We illustrate with some examples.
Footnote 1: Here and elsewhere in this chapter we use a cancellation property from linear algebra. In its simplest
form it states that if $\mathbf{u} \cdot \mathbf{v} = 0$ for all $\mathbf{v} \in E^n$, then $\mathbf{u} = \mathbf{0}$.
In index notation, if $\delta_{ij} u^i v^j = 0$ for all $v^j$, then $u^i = 0$. More generally, if $A_{ij} = A_{ji}$ and
$A_{ij} v^i v^j = 0$ for all $v^i$, then $A_{ij} = 0$.

Vector $V^i$ (a (1,0)-tensor):
The components $V^i$ transform according to the law
\[ \bar{V}^i = A^i{}_j V^j. \tag{1.28} \]
(Compare with equation (1.19).)

Covariant vector $W_j$ (a (0,1)-tensor):
The components $W_j$ transform according to the law
\[ \bar{W}_j = B^i{}_j W_i. \]
(Compare with equation (1.22).)

A (2,0)-tensor:
The components $F^{ij}$ transform according to the law
\[ \bar{F}^{ij} = A^i{}_a A^j{}_b F^{ab}. \]

A (1,2)-tensor:
The components $T^i{}_{jk}$ transform according to the law
\[ \bar{T}^i{}_{jk} = A^i{}_a B^b{}_j B^c{}_k T^a{}_{bc}. \tag{1.29} \]

Remark 1.3: The general rule as regards the tensor transformation law is that the matrix
$A^i{}_j$ acts on each contravariant index, and the matrix $B^i{}_j$ acts on each covariant index.

1.1.3 Operations on Tensors

In this section, we introduce three ways of creating new tensors from old.
The Operation of Multiplication:
The operation of multiplication combines an $(r_1, s_1)$-tensor and an $(r_2, s_2)$-tensor to form an
$(r_1 + r_2, s_1 + s_2)$-tensor. Here is an example.
Example 1.2: Let $U^i$ be the components of a vector in some coordinate system and let
$V_{jk}$ be the components of a (0,2)-tensor in the same coordinate system. Verify that
\[ T^i{}_{jk} = U^i V_{jk} \tag{1.30} \]
are the components of a (1,2)-tensor.
Solution: By (1.30),
\begin{align*}
\bar{T}^i{}_{jk} &= \bar{U}^i \bar{V}_{jk} \\
&= A^i{}_a U^a \, B^b{}_j B^c{}_k V_{bc}, && \text{by (1.26) and (1.28)} \\
&= A^i{}_a B^b{}_j B^c{}_k T^a{}_{bc}.
\end{align*}
The components $T^i{}_{jk}$ thus transform according to the law (1.29) for the components of a
(1,2)-tensor.

The Operation of Contraction:
The operation of contraction converts an $(r, s)$-tensor into an $(r - 1, s - 1)$-tensor. Here is
an example.
Example 1.3: Let $K^i{}_{jk\ell}$ be the components of a (1,3)-tensor. Define
\[ L_{j\ell} = K^i{}_{ji\ell}, \tag{1.31} \]
where the repeated index pair indicates summation over $i$ from 1 to $n$, in accordance with
the Einstein summation convention. Verify that $L_{j\ell}$ are the components of a (0,2)-tensor.
Solution: By definition of a tensor, the components $K^i{}_{jk\ell}$ transform according to
\[ \bar{K}^i{}_{jk\ell} = A^i{}_a B^b{}_j B^c{}_k B^d{}_\ell K^a{}_{bcd}. \tag{1.32} \]
Thus,
\begin{align*}
\bar{L}_{j\ell} &= \bar{K}^i{}_{ji\ell}, && \text{by (1.31)} \\
&= A^i{}_a B^b{}_j B^c{}_i B^d{}_\ell K^a{}_{bcd}, && \text{by (1.32)} \\
&= \delta^c{}_a B^b{}_j B^d{}_\ell K^a{}_{bcd}, && \text{by (1.23)} \\
&= B^b{}_j B^d{}_\ell K^a{}_{bad} \\
&= B^b{}_j B^d{}_\ell L_{bd},
\end{align*}
which is the transformation law for the components of a (0,2)-tensor.

Exercise: Let $T^i{}_j$ be the components of a (1,1)-tensor. Show that $T^i{}_i$ is a scalar, i.e. that
\[ \bar{T}^i{}_i = T^i{}_i. \]
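The invariance of the contraction $T^i{}_i$ can be illustrated numerically. The sketch below transforms an arbitrary (1,1)-tensor with the matrices $A$ and $B$ of the polar-to-Cartesian change of coordinates and compares the traces; the sample components of `T` and the function name `transform_11` are illustrative assumptions.

```python
import math


def transform_11(A, B, T):
    """Apply Tbar^i_j = A^i_a B^b_j T^a_b; the first list index is the upper index."""
    n = len(T)
    return [[sum(A[i][a] * B[b][j] * T[a][b]
                 for a in range(n) for b in range(n))
             for j in range(n)] for i in range(n)]


r, theta = 2.0, 0.7
c, s = math.cos(theta), math.sin(theta)
A = [[c, -r * s], [s, r * c]]   # A^i_j for x = (r, theta), xbar Cartesian
B = [[c, s], [-s / r, c / r]]   # B^i_j, the inverse matrix

T = [[1.0, 2.0], [3.0, 4.0]]    # arbitrary sample components T^a_b
T_bar = transform_11(A, B, T)

trace = sum(T[i][i] for i in range(2))
trace_bar = sum(T_bar[i][i] for i in range(2))
print(trace, trace_bar)  # the two contractions agree up to rounding
```

The individual components of `T_bar` differ from those of `T`, but the contraction is unchanged, as the exercise asserts.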

The Operations of Raising and Lowering Indices:
The metric tensor $g_{ij}$ can be used to obtain a new form of a tensor by a process known as
raising or lowering an index.
Example 1.4: Consider the (1,2)-tensor with components $T^j{}_{k\ell}$. Verify that
\[ S_{ik\ell} = g_{ij} T^j{}_{k\ell} \tag{1.33} \]
are the components of a (0,3)-tensor.
Solution: Define
\[ L^m{}_{ijk\ell} = g_{ij} T^m{}_{k\ell}. \]
Firstly, by the operation of multiplication, $L^m{}_{ijk\ell}$ form the components of a (1,4)-tensor.
Then, the components $S_{ik\ell}$ are given by the operation of contraction,
\[ S_{ik\ell} = L^j{}_{ijk\ell}, \]
and hence form the components of a (0,3)-tensor.

Remark 1.4: When the metric tensor is used to lower an index as in (1.33), it is customary
to use the same letter to denote the new tensor, i.e. we write (1.33) as
\[ T_{ik\ell} = g_{ij} T^j{}_{k\ell}. \tag{1.34} \]

We now introduce the contravariant metric tensor in order to raise an index. The components of the contravariant metric tensor are denoted $g^{ij}$, and are defined by the equation
\[ g^{ik} g_{kj} = \delta^i{}_j \tag{1.35} \]
in a given coordinate system $\{x^i\}$. If we regard $(g_{ij})$ as an $n \times n$ matrix ($i$ being the row, $j$ the
column), then the corresponding matrix $(g^{ij})$ is the matrix inverse of $(g_{ij})$, since $(\delta^i{}_j)$ is the
identity matrix. The components of the contravariant metric tensor in any other coordinate
system $\{\bar{x}^k\}$ are defined using the (2,0)-tensor transformation law, i.e.,
\[ \bar{g}^{ij} = A^i{}_a A^j{}_b g^{ab}. \]
It then follows that
\[ \bar{g}^{ik} \bar{g}_{kj} = \delta^i{}_j \tag{1.36} \]
which is consistent with (1.35). We leave the derivation of (1.36) as an exercise.

As an example of raising an index, the equations
\[ F_i{}^j = g^{jk} F_{ik} \tag{1.37} \]
convert the (0,2)-tensor with components $F_{ik}$ into the (1,1)-tensor with components $F_i{}^j$.
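The following minimal sketch lowers and then raises the index of a vector using the polar-coordinate metric (1.11) at $r = 2$, to illustrate that the two operations undo each other. The helper names `lower` and `raise_index` and the sample components are assumptions for illustration.

```python
def lower(g, V):
    """V_i = g_ij V^j."""
    n = len(V)
    return [sum(g[i][j] * V[j] for j in range(n)) for i in range(n)]


def raise_index(g_inv, W):
    """W^i = g^ij W_j."""
    n = len(W)
    return [sum(g_inv[i][j] * W[j] for j in range(n)) for i in range(n)]


r = 2.0
g = [[1.0, 0.0], [0.0, r * r]]             # (g_ij) from (1.11) at r = 2
g_inv = [[1.0, 0.0], [0.0, 1.0 / (r * r)]]  # (g^ij), the matrix inverse

V_up = [3.0, 0.5]
V_down = lower(g, V_up)
print(V_down)                      # [3.0, 2.0]
print(raise_index(g_inv, V_down))  # [3.0, 0.5], the original components
```

Because $(g^{ij})$ is the matrix inverse of $(g_{ij})$, as in (1.35), raising an index reverses lowering it.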

Terminology: We have seen that a tensor is a mathematical object defined by giving an
array of components relative to a specific coordinate system. The components of the tensor
relative to another coordinate system are defined by the appropriate tensor transformation
law. When specifying a tensor we should write, for example: "Let $T^{ij}$ be the components
of a tensor of type (2,0) relative to the coordinate system $\{x^k\}$." In practice, we will often
write more briefly: "Let $T^{ij}$ be a tensor of type (2,0)."

1.2 Differentiation of Tensors

1.2.1 The Covariant Derivative

Central to calculus is the idea of a derivative, and thus we need to be able to differentiate
tensors.
Question: Do the partial derivatives of the components of a tensor form a tensor?
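To show that this is not the case, consider the array of partial derivatives $\partial V^i / \partial x^j$ of a vector
$V^i$. Under a change of coordinates (1.16),
\begin{align*}
\frac{\partial \bar{V}^i}{\partial \bar{x}^j} &= \frac{\partial}{\partial \bar{x}^j} \left( A^i{}_a V^a \right), && \text{by (1.28)} \\
&= \frac{\partial}{\partial x^b} \left( A^i{}_a V^a \right) \frac{\partial x^b}{\partial \bar{x}^j}, && \text{by the chain rule} \\
&= A^i{}_a B^b{}_j \frac{\partial V^a}{\partial x^b} + A^i{}_{ab} B^b{}_j V^a, && \text{by the product rule, (1.18), and (1.21),} \tag{1.38}
\end{align*}
where
\[ A^i{}_{ab} = \frac{\partial^2 \bar{x}^i}{\partial x^a \partial x^b}. \]
Since the second term is not zero in general, the partial derivatives $\partial V^i / \partial x^j$ do not form the
components of a tensor.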
In order to remedy this deficiency, we are going to construct a new type of derivative
operator called a covariant derivative, denoted by $\nabla_j$. We require that $\nabla_j$ has the following
properties.
1) $\nabla_j$ is a linear operator,
2) $\nabla_j$ satisfies the Leibniz (product) rule for derivatives,
3) if $T$ is a tensor of type $(r, s)$, then $\nabla_j T$ is a tensor of type $(r, s + 1)$, and
4) the covariant derivative of a scalar $f$ is the gradient of $f$:
\[ \nabla_j f = \frac{\partial f}{\partial x^j}. \]

Remark 1.5: Properties 1) and 2) are essential for any operation of differentiation. Property
3) is motivated by our goal to define a tensorial derivative, while property 4) is reasonable
in view of property 3) and the fact that the gradient of a scalar field is a covariant vector
(see (1.22)).
Covariant Derivative of a Vector:
We begin with the case of a vector $V^i$. Since the covariant derivative is intended to be a
modified partial derivative and is required to be linear, we consider an expression of the form
\[ \nabla_j V^i = \frac{\partial V^i}{\partial x^j} + \Gamma^i{}_{jk} V^k. \tag{1.39} \]
The functions $\Gamma^i{}_{jk}$ are called the connection coefficients. First we will use the requirement
3) to determine how the $\Gamma^i{}_{jk}$ transform under a change of coordinates.
Proposition 1.1: If $\nabla_j V^i$ is a tensor of type (1,1), then the connection coefficients $\Gamma^i{}_{jk}$
transform according to
\[ \bar{\Gamma}^i{}_{jk} = A^i{}_a B^b{}_j B^c{}_k \Gamma^a{}_{bc} + A^i{}_a B^a{}_{jk} \tag{1.40} \]
where
\[ B^a{}_{jk} = \frac{\partial^2 x^a}{\partial \bar{x}^j \partial \bar{x}^k}. \]

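Proof: We begin by interchanging barred and unbarred coordinates in (1.28) and (1.38),
which gives
\[ V^c = B^c{}_k \bar{V}^k \tag{1.41} \]
and
\[ \frac{\partial V^a}{\partial x^b} = A^d{}_b B^a{}_c \frac{\partial \bar{V}^c}{\partial \bar{x}^d} + A^c{}_b B^a{}_{ck} \bar{V}^k. \tag{1.42} \]
The transformation law for a (1,1)-tensor applied to $\nabla_j V^i$ in equation (1.39) gives
\begin{align*}
\frac{\partial \bar{V}^i}{\partial \bar{x}^j} + \bar{\Gamma}^i{}_{jk} \bar{V}^k
&= A^i{}_a B^b{}_j \left( \frac{\partial V^a}{\partial x^b} + \Gamma^a{}_{bc} V^c \right) \\
&= A^i{}_a B^b{}_j \left( A^d{}_b B^a{}_c \frac{\partial \bar{V}^c}{\partial \bar{x}^d} + A^c{}_b B^a{}_{ck} \bar{V}^k + \Gamma^a{}_{bc} B^c{}_k \bar{V}^k \right), && \text{by (1.41) and (1.42)} \\
&= \delta^i{}_c \delta^d{}_j \frac{\partial \bar{V}^c}{\partial \bar{x}^d} + \delta^c{}_j A^i{}_a B^a{}_{ck} \bar{V}^k + A^i{}_a B^b{}_j B^c{}_k \Gamma^a{}_{bc} \bar{V}^k, && \text{by (1.23)} \\
&= \frac{\partial \bar{V}^i}{\partial \bar{x}^j} + A^i{}_a B^b{}_j B^c{}_k \Gamma^a{}_{bc} \bar{V}^k + A^i{}_a B^a{}_{jk} \bar{V}^k.
\end{align*}
Thus,
\[ \left( \bar{\Gamma}^i{}_{jk} - A^i{}_a B^b{}_j B^c{}_k \Gamma^a{}_{bc} - A^i{}_a B^a{}_{jk} \right) \bar{V}^k = 0. \]
Since this is true for any $\bar{V}^k$, we obtain the desired transformation law.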

We will now use properties 2) and 4) to derive expressions for the covariant derivative of
other types of tensors. We illustrate the method for two special cases, which will establish
the general pattern.

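Covariant Derivative of a Covariant Vector:
\[ \nabla_j W_i = \frac{\partial W_i}{\partial x^j} - \Gamma^k{}_{ji} W_k \tag{1.43} \]
Derivation: By property 4),
\[ \nabla_j (V^i W_i) = \frac{\partial (V^i W_i)}{\partial x^j}. \tag{1.44} \]
We expand both sides of (1.44), using property 2) and the usual product rule, to obtain
\[ (\nabla_j V^i) W_i + V^i (\nabla_j W_i) = \frac{\partial V^i}{\partial x^j} W_i + V^i \frac{\partial W_i}{\partial x^j}. \]
Substitute for $\nabla_j V^i$ from (1.39) and simplify:
\[ V^i \left( \nabla_j W_i - \frac{\partial W_i}{\partial x^j} \right) + \Gamma^i{}_{jk} V^k W_i = 0. \tag{1.45} \]
Relabel dummy indices:
\[ \Gamma^i{}_{jk} V^k W_i = \Gamma^k{}_{ji} V^i W_k. \]
Equation (1.45) becomes
\[ V^i \left( \nabla_j W_i - \frac{\partial W_i}{\partial x^j} + \Gamma^k{}_{ji} W_k \right) = 0. \]
Since $V^i$ is arbitrary, we obtain (1.43).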

Covariant Derivative of a (0,2)-Tensor:
\[ \nabla_k T_{ij} = \frac{\partial T_{ij}}{\partial x^k} - \Gamma^\ell{}_{ik} T_{\ell j} - \Gamma^\ell{}_{jk} T_{i\ell} \tag{1.46} \]
Derivation: [Outline] Consider the scalar
\[ T_{ij} V^i W^j \]
where $V^i$ and $W^j$ are arbitrary vectors. By applying properties 2) and 4) and using equation
(1.39) for $\nabla_k V^i$ and $\nabla_k W^j$, one obtains (1.46). The details are left as an exercise.

Notation: In differential geometry, a comma and a semi-colon are often used as shorthand
notations for a partial derivative and a covariant derivative, respectively. For example,
\[ \nabla_j V^i = V^i{}_{;j} \quad \text{and} \quad \frac{\partial V^i}{\partial x^j} = V^i{}_{,j}, \]
so that (1.39) can be written as
\[ V^i{}_{;j} = V^i{}_{,j} + \Gamma^i{}_{jk} V^k. \tag{1.47} \]

1.2.2 Christoffel Symbols

At this stage, the connection coefficients $\Gamma^i{}_{jk}$ are arbitrary quantities, being restricted only
by the transformation law (1.40). One way to uniquely specify the connection coefficients is
to impose two additional requirements.
1) Symmetry property:
\[ \Gamma^i{}_{jk} = \Gamma^i{}_{kj} \tag{1.48} \]
2) Metric-preserving property:
\[ \nabla_k g_{ij} = 0, \]
or, equivalently,
\[ g_{ij;k} = 0, \tag{1.49} \]
in terms of the notation (1.47).

The second property implies that covariant differentiation commutes with the operations of
raising and lowering indices. For example, if
\[ V_i = g_{ij} V^j, \]
then
\begin{align*}
\nabla_k V_i &= \nabla_k \left( g_{ij} V^j \right) \\
&= g_{ij} \nabla_k V^j + (\nabla_k g_{ij}) V^j \\
&= g_{ij} \nabla_k V^j.
\end{align*}
Proposition 1.2: If the connection coefficients $\Gamma^i{}_{jk}$ satisfy the symmetry property (1.48)
and the metric preserving property (1.49), then
\[ \Gamma^i{}_{jk} = \tfrac{1}{2} g^{i\ell} \left( g_{\ell j,k} + g_{\ell k,j} - g_{kj,\ell} \right). \tag{1.50} \]
Proof: Write out the three equations
\[ g_{\ell j;k} = 0, \qquad g_{\ell k;j} = 0, \qquad \text{and} \qquad -g_{kj;\ell} = 0, \tag{1.51} \]
in full, using the definition of covariant differentiation, form their sum, multiply by $g^{i\ell}$, and
use the symmetry property. The desired result (1.50) follows.

Exercise: Verify the derivation of (1.50), starting with (1.51).
The connection coefficients defined by (1.50) are called the Christoffel symbols of the
second kind.


Example 1.5: Show that for the Euclidean metric tensor in $E^n$, $\Gamma^i{}_{jk} = 0$ relative to
Cartesian coordinates.
Solution: The line-element is
\[ ds^2 = \delta_{ij} \, dx^i dx^j. \]
That is, $g_{ij} = \delta_{ij}$. Thus, $g_{ij,k} = 0$ and from (1.50) we have $\Gamma^i{}_{jk} = 0$.

Example 1.6: Show that for the Euclidean metric tensor in $E^2$, the non-zero Christoffel
symbols relative to polar coordinates $(x^1, x^2) = (r, \theta)$ are
\[ \Gamma^1{}_{22} = -x^1 \quad \text{and} \quad \Gamma^2{}_{12} = \Gamma^2{}_{21} = \frac{1}{x^1}. \tag{1.52} \]
Solution: From equation (1.11), we know that
\[ (g_{ij}) = \begin{pmatrix} 1 & 0 \\ 0 & (x^1)^2 \end{pmatrix}. \]
It follows from (1.35) that
\[ (g^{ij}) = \begin{pmatrix} 1 & 0 \\ 0 & (x^1)^{-2} \end{pmatrix}. \]
The only non-zero partial derivative of the components of the metric tensor is
\[ g_{22,1} = 2x^1. \]
The non-zero Christoffel symbols follow from the formula (1.50).
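The formula (1.50) lends itself to a direct numerical check. The sketch below estimates the partial derivatives $g_{ij,k}$ by central differences and assembles the Christoffel symbols for the polar metric; the function names, the step size `h`, the sample point, and the restriction of the matrix inverse to the $2 \times 2$ case are all assumptions made to keep the example short.

```python
def polar_metric(x):
    """Components g_ij of (1.11) at x = (r, theta)."""
    r, _theta = x
    return [[1.0, 0.0], [0.0, r * r]]


def inverse_2x2(m):
    """Inverse of a 2x2 matrix (enough for this two-dimensional example)."""
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    return [[m[1][1] / det, -m[0][1] / det],
            [-m[1][0] / det, m[0][0] / det]]


def christoffel(metric, x, h=1e-5):
    """Gamma[i][j][k] = 1/2 g^{il} (g_{lj,k} + g_{lk,j} - g_{kj,l}), as in (1.50)."""
    n = len(x)
    dg = []  # dg[k][i][j] approximates the partial derivative g_{ij,k}
    for k in range(n):
        xp, xm = list(x), list(x)
        xp[k] += h
        xm[k] -= h
        gp, gm = metric(xp), metric(xm)
        dg.append([[(gp[i][j] - gm[i][j]) / (2 * h) for j in range(n)]
                   for i in range(n)])
    g_inv = inverse_2x2(metric(x))
    return [[[0.5 * sum(g_inv[i][l] * (dg[k][l][j] + dg[j][l][k] - dg[l][k][j])
                        for l in range(n))
              for k in range(n)] for j in range(n)] for i in range(n)]


Gamma = christoffel(polar_metric, [2.0, 0.7])
print(Gamma[0][1][1])  # Gamma^1_22, close to -r = -2
print(Gamma[1][0][1])  # Gamma^2_12, close to 1/r = 0.5
```

List indices start at 0, so `Gamma[0][1][1]` corresponds to $\Gamma^1{}_{22}$ in (1.52).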

Example 1.7: Show that
\[ (V^i) = \begin{pmatrix} 1 \\ 0 \end{pmatrix} \tag{1.53} \]
relative to Cartesian coordinates is a vector field with zero covariant derivative,
\[ V^i{}_{;j} = 0. \tag{1.54} \]
Solution: The desired result follows from (1.39) since
\[ \Gamma^i{}_{jk} = 0 \quad \text{and} \quad V^i{}_{,j} = 0. \]

Exercise: Show that the vector field (1.53) has components
\[ (V^i) = \begin{pmatrix} \cos x^2 \\ -\dfrac{\sin x^2}{x^1} \end{pmatrix} \tag{1.55} \]
relative to polar coordinates $(x^1, x^2) = (r, \theta)$. Using the Christoffel symbols from equation (1.52), show that
\[ V^i{}_{;j} = 0 \tag{1.56} \]
relative to polar coordinates.
Remark 1.6: The result (1.56) follows directly from (1.54) and the tensor transformation
law, but it is not obvious from simply inspecting (1.55).
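A numerical spot check of (1.56) may help make Remark 1.6 tangible. The sketch below evaluates $V^i{}_{;j} = V^i{}_{,j} + \Gamma^i{}_{jk} V^k$ for the components (1.55), using the Christoffel symbols (1.52) and central differences for the partial derivatives; the function names, step size, and sample point are assumptions for illustration.

```python
import math


def V(x):
    """Components (1.55) of the constant Cartesian field (1, 0) in polar coordinates."""
    r, theta = x
    return [math.cos(theta), -math.sin(theta) / r]


def gamma(x):
    """Christoffel symbols (1.52); G[i][j][k] stands for Gamma^i_jk (0-based)."""
    r, _theta = x
    G = [[[0.0, 0.0], [0.0, 0.0]], [[0.0, 0.0], [0.0, 0.0]]]
    G[0][1][1] = -r
    G[1][0][1] = G[1][1][0] = 1.0 / r
    return G


def covariant_derivative(V, gamma, x, h=1e-6):
    """Return D with D[i][j] approximating V^i_{;j} at the point x, as in (1.47)."""
    n = len(x)
    G, Vx = gamma(x), V(x)
    D = [[0.0] * n for _ in range(n)]
    for j in range(n):
        xp, xm = list(x), list(x)
        xp[j] += h
        xm[j] -= h
        Vp, Vm = V(xp), V(xm)
        for i in range(n):
            partial = (Vp[i] - Vm[i]) / (2 * h)  # estimate of V^i_{,j}
            D[i][j] = partial + sum(G[i][j][k] * Vx[k] for k in range(n))
    return D


D = covariant_derivative(V, gamma, [2.0, 0.7])
print(D)  # every entry is zero up to finite-difference error
```

The partial derivatives $V^i{}_{,j}$ are not zero here; it is only after adding the connection terms that each component vanishes.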

1.3 Differential Geometry in a Riemannian Space

In this section we introduce the notion of an n-dimensional Riemannian space $V^n$, i.e. a set
with the following properties:
1) the points of $V^n$ are labelled by n coordinates $\{x^i\}_{i=1}^{n}$,
2) two different systems of coordinates $\{x^i\}_{i=1}^{n}$ and $\{\bar{x}^i\}_{i=1}^{n}$ are related by equations of the
form
\[ \bar{x}^i = \bar{x}^i(x^j), \]
and
3) there is a concept of distance along a curve joining two points, defined by a metric
tensor.
The simplest example of a Riemannian space is a surface embedded in $E^3$, for example the
surface of a sphere, which we will now discuss.

1.3.1 A Surface as a 2-D Riemannian Space

Recall (AMATH 231) that a surface in $E^3$ can be described by a vector equation
\[ \mathbf{X} = \mathbf{G}(x^1, x^2), \tag{1.57} \]
where $\mathbf{X} = (X^1, X^2, X^3)$ are Cartesian coordinates, $x^i$ ($i = 1, 2$) are parameters that label
points on the surface, and $\mathbf{G}$ is a function that maps $E^2 \to E^3$.
The Euclidean metric tensor in $E^3$ gives rise to a metric tensor $g_{ij}$ on the surface $V^2$ as
follows. Consider a curve $C$ on the surface, given by
\[ x^i = x^i(\lambda), \qquad \lambda \in [a, b]. \tag{1.58} \]
This curve can equally well be thought of as a curve in $E^3$ described by substituting (1.58)
in (1.57):
\[ \mathbf{X} = \mathbf{G}(x^i(\lambda)). \]
15

x2

X3
D

F
x2

11
00
00
11
00
11
00
11
00
11
0
1
0
1
F
0
1
x1
0
1
00
11
0
1
1111
0000
0
1
00
11

1
0

V2

x1

X2

X1
Figure 1.1: A surface as a function of two parameters.

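By the chain rule, the tangent vector of this curve is
\[ \frac{d\mathbf{X}}{d\lambda} = \frac{dx^i}{d\lambda} \frac{\partial \mathbf{G}}{\partial x^i} \]
where summation occurs over $i = 1, 2$. In terms of the metric in $E^3$, the square of the
magnitude of $d\mathbf{X}/d\lambda$ is
\[ \frac{d\mathbf{X}}{d\lambda} \cdot \frac{d\mathbf{X}}{d\lambda} = \frac{\partial \mathbf{G}}{\partial x^i} \cdot \frac{\partial \mathbf{G}}{\partial x^j} \, \frac{dx^i}{d\lambda} \frac{dx^j}{d\lambda}. \tag{1.59} \]
We now introduce the notation
\[ g_{ij} = \frac{\partial \mathbf{G}}{\partial x^i} \cdot \frac{\partial \mathbf{G}}{\partial x^j} \qquad (i, j = 1, 2), \tag{1.60} \]
so that (1.59) assumes the form
\[ \frac{d\mathbf{X}}{d\lambda} \cdot \frac{d\mathbf{X}}{d\lambda} = g_{ij} \frac{dx^i}{d\lambda} \frac{dx^j}{d\lambda}. \]
The functions $g_{ij}$, as defined by (1.60), are the components of the metric tensor of the surface
$V^2$ relative to the coordinate system $\{x^i\}$.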

Remark 1.7: Equation (1.60), giving the components of the metric tensor on a surface $V^2$
relative to arbitrary coordinates $x^i$ on $V^2$, should be compared with equation (1.8) giving
the components of the Euclidean metric tensor on $E^2$ relative to arbitrary coordinates. Note
that the functions $\mathbf{F}$ and $\mathbf{G}$ have different ranges, i.e. $\mathbf{F} : E^2 \to E^2$, while $\mathbf{G} : E^2 \to E^3$. The
common feature is that $(g_{ij})$ is a symmetric, positive definite $2 \times 2$ matrix.$^2$

Footnote 2: A matrix $C$ is positive definite if $\mathbf{u} \neq \mathbf{0} \implies \mathbf{u}^T C \mathbf{u} > 0$.

As in E2 , we can specify the metric tensor on V 2 by giving the line-element


ds2 = gij dxi dxj ,

i, j = 1, 2.

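Example 1.8: Derive the metric tensor and line-element for the 2-sphere of radius $b$, denoted by $S^2$, relative to spherical coordinates $(\theta, \phi)$ on $S^2$.

Solution: For $S^2$, $X = G(x)$ is given by

$$\begin{pmatrix} X^1 \\ X^2 \\ X^3 \end{pmatrix} = \begin{pmatrix} b\sin\theta\cos\phi \\ b\sin\theta\sin\phi \\ b\cos\theta \end{pmatrix},$$

where $(X^1, X^2, X^3)$ are Cartesian coordinates and $(x^1, x^2) = (\theta, \phi)$ are spherical coordinates on the sphere. The coordinate tangent vectors are

$$\frac{\partial G}{\partial x^1} = \begin{pmatrix} b\cos\theta\cos\phi \\ b\cos\theta\sin\phi \\ -b\sin\theta \end{pmatrix} \qquad\text{and}\qquad \frac{\partial G}{\partial x^2} = \begin{pmatrix} -b\sin\theta\sin\phi \\ b\sin\theta\cos\phi \\ 0 \end{pmatrix}.$$

Using (1.60), the components of the metric tensor are given by

$$(g_{ij}) = \begin{pmatrix} b^2 & 0 \\ 0 & b^2\sin^2\theta \end{pmatrix}.$$

The line-element is thus

$$ds^2 = b^2(d\theta^2 + \sin^2\theta\,d\phi^2). \tag{1.61}$$

Remark 1.8: Note that this expression can be derived directly from the line element (1.12) in $E^3$ by setting $r = b$, which implies that $dr = 0$.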
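The construction (1.60) can also be checked numerically. The following is a minimal sketch, not part of the original notes: it approximates the coordinate tangent vectors by central differences and forms their dot products. The function names `sphere` and `induced_metric`, the step size `h`, and the sample point are illustrative assumptions.

```python
import math

def sphere(theta, phi, b=2.0):
    """Embedding X = G(x) of the 2-sphere of radius b in E^3 (Example 1.8)."""
    return (b * math.sin(theta) * math.cos(phi),
            b * math.sin(theta) * math.sin(phi),
            b * math.cos(theta))

def induced_metric(G, x, h=1e-5):
    """Approximate g_ij = dG/dx^i . dG/dx^j, equation (1.60), by central differences."""
    n = len(x)
    tangents = []
    for i in range(n):
        x_plus = list(x)
        x_minus = list(x)
        x_plus[i] += h
        x_minus[i] -= h
        G_plus, G_minus = G(*x_plus), G(*x_minus)
        # i-th coordinate tangent vector dG/dx^i
        tangents.append([(p - m) / (2 * h) for p, m in zip(G_plus, G_minus)])
    return [[sum(a * c for a, c in zip(tangents[i], tangents[j]))
             for j in range(n)] for i in range(n)]

theta, phi, b = 0.7, 1.2, 2.0          # arbitrary sample point, radius b = 2
g = induced_metric(sphere, (theta, phi))
expected = [[b**2, 0.0], [0.0, b**2 * math.sin(theta)**2]]
max_error = max(abs(g[i][j] - expected[i][j]) for i in range(2) for j in range(2))
print(g, max_error)
```

The numerical matrix agrees with the closed form in Example 1.8 up to the finite-difference error; the same function works for any embedding given as a Python callable.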


1.3.2 n-Dimensional Riemannian Space $V^n$

A Riemannian space $V^n$ is a space whose points are described by coordinates $x^i$ ($i = 1, \dots, n$) on which there is defined a notion of distance given by a metric tensor with components $g_{ij}$ relative to this coordinate system. As in $E^n$, the metric tensor components can be specified by giving the line element

$$ds^2 = g_{ij}\,dx^i dx^j,$$

where summation occurs over $i, j = 1, \dots, n$.

The metric tensor defines a notion of distance along a curve in $V^n$. If $C$ is a curve, defined by

$$x^i = x^i(\lambda), \qquad \lambda_1 \le \lambda \le \lambda_2,$$

the distance $s$ along $C$ is

$$s = \int_{\lambda_1}^{\lambda_2} \sqrt{g_{ij}\,\frac{dx^i}{d\lambda}\frac{dx^j}{d\lambda}}\;d\lambda.$$

Equivalently, the rate of change of distance $s$ with respect to the parameter $\lambda$ is

$$\frac{ds}{d\lambda} = \sqrt{g_{ij}\,\frac{dx^i}{d\lambda}\frac{dx^j}{d\lambda}}. \tag{1.62}$$

If we use arclength as the parameter, i.e. $\lambda = s$, then (1.62) yields

$$g_{ij}\,\frac{dx^i}{d\lambda}\frac{dx^j}{d\lambda} = 1,$$

i.e. the tangent vector is a unit vector.


The whole machinery of tensor algebra and covariant differentiation of tensors in $E^n$, as developed in Sections 1.1 and 1.2, can be used in the more general setting of a Riemannian space $V^n$. In particular, under a coordinate transformation

$$\bar{x}^j = \bar{x}^j(x^i),$$

the components of the metric tensor transform according to

$$\bar{g}_{ij} = B^a{}_i\,B^b{}_j\,g_{ab}. \tag{1.63}$$

As in $E^n$, there is a contravariant metric tensor with components $g^{ij}$ that satisfy (1.35). We can use $g^{ij}$ and $g_{ij}$ to raise and lower indices on tensors as in equations (1.34) and (1.37).

Remarks 1.9:

(i) It is assumed that the components $g_{ij}$ of the metric tensor relative to the coordinate system $\{x^i\}$ form a symmetric, positive definite $n \times n$ matrix. It follows from (1.63) that the components $\bar{g}_{ij}$ have the same property.

(ii) One important example of an n-dimensional Riemannian space is a hypersurface $V^n$ embedded in $E^{n+1}$, defined by an equation that generalizes (1.57):

$$X = G(x^1, \dots, x^n).$$

The metric tensor in $E^{n+1}$ induces a metric tensor on the hypersurface $V^n$, given by

$$g_{ij} = \frac{\partial G}{\partial x^i}\cdot\frac{\partial G}{\partial x^j} \qquad (i, j = 1, \dots, n), \tag{1.64}$$

which generalizes (1.60).

We now give an example of a metric tensor on a 3-D Riemannian space embedded in $E^4$ that is of interest in cosmology.

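Example 1.9: A 3-sphere $S^3$ of radius $b$ can be viewed as a hypersurface embedded in $E^4$, given in Cartesian coordinates $(X^1, X^2, X^3, X^4) = (W, X, Y, Z)$ by

$$W^2 + X^2 + Y^2 + Z^2 = b^2.$$

The 3-sphere can be defined by the equations

$$\begin{pmatrix} W \\ X \\ Y \\ Z \end{pmatrix} = \begin{pmatrix} b\sin\chi\sin\theta\sin\phi \\ b\sin\chi\sin\theta\cos\phi \\ b\sin\chi\cos\theta \\ b\cos\chi \end{pmatrix}, \tag{1.65}$$

where $0 \le \theta \le \pi$, $0 \le \phi \le 2\pi$, and $0 \le \chi \le \pi$. Find the components of the metric tensor on $S^3$ induced by the Euclidean metric in $E^4$ relative to the coordinates $(x^1, x^2, x^3) = (\chi, \theta, \phi)$.

Solution: The function $X = G(x)$ is given by (1.65). The coordinate tangent vectors are

$$\frac{\partial G}{\partial x^1} = \begin{pmatrix} b\cos\chi\sin\theta\sin\phi \\ b\cos\chi\sin\theta\cos\phi \\ b\cos\chi\cos\theta \\ -b\sin\chi \end{pmatrix}, \qquad \frac{\partial G}{\partial x^2} = \begin{pmatrix} b\sin\chi\cos\theta\sin\phi \\ b\sin\chi\cos\theta\cos\phi \\ -b\sin\chi\sin\theta \\ 0 \end{pmatrix},$$

and

$$\frac{\partial G}{\partial x^3} = \begin{pmatrix} b\sin\chi\sin\theta\cos\phi \\ -b\sin\chi\sin\theta\sin\phi \\ 0 \\ 0 \end{pmatrix}.$$

By equation (1.64), the components of the metric tensor are given by

$$(g_{ij}) = \left(\frac{\partial G}{\partial x^i}\cdot\frac{\partial G}{\partial x^j}\right) = \begin{pmatrix} b^2 & 0 & 0 \\ 0 & b^2\sin^2\chi & 0 \\ 0 & 0 & b^2\sin^2\chi\sin^2\theta \end{pmatrix}.$$

Thus the line-element has the form

$$ds^2 = b^2\left[d\chi^2 + \sin^2\chi\,(d\theta^2 + \sin^2\theta\,d\phi^2)\right]. \tag{1.66}$$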
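As a check on the matrix above, here is a worked evaluation of three representative components from (1.64), using the tangent vectors of the solution (with the first coordinate written as $\chi$, the symbol assumed in this reconstruction). Each step uses only $\sin^2 + \cos^2 = 1$.

```latex
\begin{aligned}
g_{11} &= b^2\left[\cos^2\chi\,\sin^2\theta\,(\sin^2\phi + \cos^2\phi) + \cos^2\chi\,\cos^2\theta + \sin^2\chi\right]
        = b^2\left[\cos^2\chi + \sin^2\chi\right] = b^2, \\
g_{22} &= b^2\sin^2\chi\left[\cos^2\theta\,(\sin^2\phi + \cos^2\phi) + \sin^2\theta\right] = b^2\sin^2\chi, \\
g_{12} &= b^2\sin\chi\cos\chi\left[\sin\theta\cos\theta\,(\sin^2\phi + \cos^2\phi) - \sin\theta\cos\theta\right] = 0.
\end{aligned}
```

The remaining components follow in the same way.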


1.3.3 The Riemann Curvature Tensor

We know that in $E^n$, one can introduce coordinates $X^i$ (i.e. Cartesian coordinates) such that the metric tensor has components

$$g_{ij} = \delta_{ij}.$$

Equivalently the line-element has the form

$$ds^2 = \delta_{ij}\,dX^i dX^j = (dX^1)^2 + \dots + (dX^n)^2.$$

In contrast, in a Riemannian space $V^n$ it is not possible in general to introduce coordinates with this property. For example, we have seen that the metric tensor in $E^2$ can be written

$$ds^2 = dr^2 + r^2 d\theta^2 = dX^2 + dY^2.$$

On the other hand, for the metric tensor on the 2-sphere $S^2$, as given by (1.61),

$$ds^2 = b^2(d\theta^2 + \sin^2\theta\,d\phi^2),$$

it is not possible to find coordinates $X, Y$ such that

$$ds^2 = dX^2 + dY^2.$$

(Any attempt to do so leads to a contradiction.)

It is natural to ask: given a metric tensor

$$ds^2 = g_{ij}\,dx^i dx^j, \tag{1.67}$$

is there a tensor restriction which, when satisfied, ensures that the given metric tensor can be written in the form

$$ds^2 = \delta_{ij}\,dX^i dX^j\,? \tag{1.68}$$

We reach this goal by associating with a metric tensor a (1,3)-tensor $R^i{}_{jk\ell}$, called the Riemann curvature tensor, with the property that if $R^i{}_{jk\ell} = 0$, then a metric (1.67) can be transformed into the Euclidean form (1.68). We will see that this tensor also describes the curvature of the Riemannian space $V^n$ (hence its name). For example, we will show that for a 2-sphere $S^2$, $R^i{}_{jk\ell} \neq 0$, corresponding to the fact that a 2-sphere is curved, whereas for a plane $E^2$, $R^i{}_{jk\ell} = 0$, corresponding to the fact that it is flat, i.e. has zero curvature.$^3$

$^3$ A sphere is intrinsically different from a plane in that a piece of a sphere cannot be flattened onto a plane without stretching.

We arrive at the definition of the Riemann curvature tensor by considering second covariant derivatives. We know that partial derivatives commute, for example

$$T_{\ell,mn} - T_{\ell,nm} = 0,$$

using the comma notation for partial derivatives, i.e.

$$T_{\ell,mn} = \frac{\partial^2 T_\ell}{\partial x^n\,\partial x^m}.$$

We now show that covariant derivatives do not commute in general, by calculating

$$T_{\ell;mn} - T_{\ell;nm}, \tag{1.69}$$

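where we are using the semi-colon notation

$$T_{\ell;mn} = \nabla_n \nabla_m T_\ell.$$

From (1.46),

$$\begin{aligned}
T_{\ell;mn} &= (T_{\ell;m})_{,n} - \Gamma^p{}_{\ell n}\,T_{p;m} - \Gamma^p{}_{mn}\,T_{\ell;p} \\
&= (T_{\ell,m} - \Gamma^q{}_{\ell m}T_q)_{,n} - \Gamma^p{}_{\ell n}(T_{p,m} - \Gamma^q{}_{pm}T_q) - \Gamma^p{}_{mn}\,T_{\ell;p} \qquad \text{by (1.43)} \\
&= T_{\ell,mn} - \Gamma^p{}_{mn}\,T_{\ell;p} - (\Gamma^p{}_{\ell m}T_{p,n} + \Gamma^p{}_{\ell n}T_{p,m}) - \Gamma^q{}_{\ell m,n}T_q + \Gamma^p{}_{\ell n}\Gamma^q{}_{pm}T_q,
\end{aligned} \tag{1.70}$$

after using the product rule for partial derivatives and rearranging the terms. Observe that the first two terms and the bracketed term in (1.70) are symmetric in $m$ and $n$. Form the difference (1.69), and after cancelling the symmetric terms we obtain

$$T_{\ell;mn} - T_{\ell;nm} = R^q{}_{\ell mn}\,T_q, \tag{1.71}$$

where

$$R^q{}_{\ell mn} = \Gamma^q{}_{\ell n,m} - \Gamma^q{}_{\ell m,n} + \Gamma^p{}_{\ell n}\Gamma^q{}_{pm} - \Gamma^p{}_{\ell m}\Gamma^q{}_{pn} \tag{1.72}$$

is called the Riemann curvature tensor.

Remark 1.10: It is important to note that the components of the Riemann curvature tensor depend on the metric tensor $g_{ij}$ and its first and second partial derivatives $g_{ij,k}$ and $g_{ij,k\ell}$, as follows from the definition (1.50) of the Christoffel symbols.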
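Definition (1.72) can be evaluated numerically for a given 2-D metric. The sketch below is illustrative only and is not taken from the notes: it assumes that (1.50) is the standard formula $\Gamma^i{}_{jk} = \tfrac12 g^{i\ell}(g_{\ell j,k} + g_{\ell k,j} - g_{jk,\ell})$, replaces all partial derivatives by central differences, and uses hypothetical helper names (`partial`, `christoffel`, `riemann`). Indices run over 0, 1 rather than 1, 2.

```python
import math

def diff(a, b, scale):
    """Elementwise (a - b) / scale for nested lists of floats."""
    if isinstance(a, list):
        return [diff(p, q, scale) for p, q in zip(a, b)]
    return (a - b) / scale

def partial(f, x, k, h):
    """Central-difference partial derivative of f (nested-list valued) w.r.t. x^k."""
    x_plus, x_minus = list(x), list(x)
    x_plus[k] += h
    x_minus[k] -= h
    return diff(f(x_plus), f(x_minus), 2 * h)

def inverse_2x2(g):
    det = g[0][0] * g[1][1] - g[0][1] * g[1][0]
    return [[g[1][1] / det, -g[0][1] / det], [-g[1][0] / det, g[0][0] / det]]

def christoffel(metric, x, h=1e-4):
    """gam[i][j][k] = Gamma^i_{jk}, assuming the standard formula for (1.50)."""
    g_inv = inverse_2x2(metric(x))
    dg = [partial(metric, x, k, h) for k in range(2)]   # dg[k][i][j] = g_{ij,k}
    return [[[0.5 * sum(g_inv[i][l] * (dg[k][l][j] + dg[j][l][k] - dg[l][j][k])
                        for l in range(2))
              for k in range(2)] for j in range(2)] for i in range(2)]

def riemann(metric, x, h=1e-4):
    """R[q][l][m][n] = R^q_{lmn} from definition (1.72)."""
    gam = christoffel(metric, x, h)
    # dgam[m][q][l][n] = Gamma^q_{ln,m}
    dgam = [partial(lambda y: christoffel(metric, y, h), x, m, h) for m in range(2)]
    return [[[[dgam[m][q][l][n] - dgam[n][q][l][m]
               + sum(gam[p][l][n] * gam[q][p][m] - gam[p][l][m] * gam[q][p][n]
                     for p in range(2))
               for n in range(2)] for m in range(2)] for l in range(2)] for q in range(2)]

def sphere_metric(x, b=1.0):
    """Metric (1.61) on the 2-sphere, coordinates x = [theta, phi]."""
    return [[b**2, 0.0], [0.0, b**2 * math.sin(x[0])**2]]

def polar_metric(x):
    """Euclidean metric of E^2 in polar coordinates x = [r, theta]."""
    return [[1.0, 0.0], [0.0, x[0]**2]]

R_sphere = riemann(sphere_metric, [0.9, 0.4])
R_plane = riemann(polar_metric, [1.5, 0.4])
print(R_sphere[0][1][0][1], math.sin(0.9)**2)   # R^1_{212} on the sphere vs sin^2(theta)
print(max(abs(R_plane[q][l][m][n]) for q in range(2) for l in range(2)
          for m in range(2) for n in range(2)))  # close to zero for the flat plane
```

For the sphere the component $R^1{}_{212}$ comes out close to $\sin^2\theta$, while for the plane in polar coordinates every component vanishes to within the finite-difference error, even though the Christoffel symbols themselves are non-zero.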
We can now answer the question posed in equations (1.67) and (1.68).

Proposition 1.3: Let $V^n$ be a Riemannian space with metric tensor $g_{ij}$ relative to coordinates $x^i$. If $R^i{}_{jk\ell} \neq 0$, then there do not exist coordinates $\bar{x}^i$ such that

$$g_{ij}\,dx^i dx^j = \delta_{ij}\,d\bar{x}^i d\bar{x}^j.$$

Proof: We give a proof by contradiction. Suppose coordinates $\bar{x}^i$ exist such that

$$\bar{g}_{ij} = \delta_{ij}.$$

Then by (1.50), $\bar{\Gamma}^i{}_{jk} = 0$ and hence by the definition (1.72), $\bar{R}^i{}_{jk\ell} = 0$. It follows from the transformation law that $R^i{}_{jk\ell} = 0$, a contradiction. $\square$

Proposition 1.4: Let $V^n$ be a Riemannian space with metric tensor $g_{ij}$. If $R^i{}_{jk\ell} = 0$, then there exist coordinates $\bar{x}^i$ such that

$$\bar{g}_{ij} = \delta_{ij}.$$

Proof: The proof is beyond the scope of this introduction. $\square$

Remark 1.11: If $R^i{}_{jk\ell} = 0$, then we can regard the Riemannian space $V^n$ as being locally equivalent to Euclidean space $E^n$, in the sense that its intrinsic geometry is the same as $E^n$. To illustrate the need for the word "locally", we mention that a cylindrical surface in $E^3$ is locally flat (i.e. $R^i{}_{jk\ell} = 0$). This fact can be verified geometrically by observing that a piece of a cylinder can be rolled flat onto a plane without stretching the surface, i.e. without changing distances. One can also verify this analytically (see the following exercise).

Exercise: Verify that the metric of a cylinder of radius $b$,

$$x^2 + y^2 = b^2,$$

in $E^3$ is given by

$$ds^2 = b^2 d\phi^2 + dz^2,$$

relative to cylindrical coordinates $(\phi, z)$ on the cylinder. Hence verify that the Riemann curvature tensor is zero.
Symmetry Properties:

The Riemann curvature tensor has a number of symmetry properties that significantly reduce its number of independent components. Firstly, $R^i{}_{jk\ell}$ is antisymmetric in its last two indices:

$$R^i{}_{jk\ell} = -R^i{}_{j\ell k}. \tag{1.73}$$

Secondly, $R^i{}_{jk\ell}$ has a cyclic symmetry in its covariant indices:

$$R^i{}_{jk\ell} + R^i{}_{k\ell j} + R^i{}_{\ell jk} = 0. \tag{1.74}$$

These identities follow directly from the definition (1.72) (exercise). The fully covariant Riemann tensor has an additional symmetry,

$$R_{ijk\ell} = R_{k\ell ij}. \tag{1.75}$$

The derivation of this identity involves lengthy manipulations with the Christoffel symbols and is omitted (ref: Synge and Schild 1978, p. 85). Finally, lowering the index in (1.73) and using (1.75) leads to

$$R_{ijk\ell} = -R_{jik\ell}. \tag{1.76}$$

It can be shown using (1.73), (1.74), (1.75), and (1.76), in conjunction with combinatorial arguments,$^4$ that the number of independent components of $R^i{}_{jk\ell}$ is

$$N(n) = \frac{n^2(n^2 - 1)}{12}.$$

Note in particular that $N(2) = 1$, which we will make use of in Section 1.3.4.

$^4$ Note that equations (1.73) and (1.74) immediately imply the corresponding symmetry properties for $R_{ijk\ell}$, i.e. $R_{ijk\ell} = -R_{ij\ell k}$ and $R_{ijk\ell} + R_{ik\ell j} + R_{i\ell jk} = 0$.
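For orientation, evaluating the formula for $N(n)$ in the lowest dimensions gives the following values (a simple arithmetic check, added here for illustration):

```latex
N(2) = \frac{4 \cdot 3}{12} = 1, \qquad
N(3) = \frac{9 \cdot 8}{12} = 6, \qquad
N(4) = \frac{16 \cdot 15}{12} = 20.
```

The case $n = 4$ is the one relevant to four-dimensional spacetime.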

Ricci and Einstein Tensors:

The Ricci tensor is defined by

$$R_{j\ell} = R^i{}_{ji\ell}. \tag{1.77}$$

Performing a second contraction gives the Ricci scalar,

$$R = g^{ij}R_{ij}. \tag{1.78}$$

We can now define the Einstein tensor, which plays a fundamental role in general relativity:

$$G_{ij} = R_{ij} - \tfrac{1}{2}R\,g_{ij}.$$

It follows from (1.75) that $R_{ij}$ is a symmetric tensor,

$$R_{ij} = R_{ji},$$

and hence so is the Einstein tensor.

Bianchi Identities:

The Riemann tensor also satisfies an identity involving differentiation, called the Bianchi identity:

$$R_{ijk\ell;m} + R_{ij\ell m;k} + R_{ijmk;\ell} = 0. \tag{1.79}$$

The derivation of this identity involves lengthy manipulations with the Christoffel symbols and is omitted (ref: Synge and Schild 1978, p. 87).

The Bianchi identity leads to an identity involving the Einstein tensor that is of importance in general relativity.

Proposition 1.5: The Einstein tensor $G^i{}_j$ satisfies

$$G^i{}_{j;i} = 0. \tag{1.80}$$

Proof: Multiply (1.79) by $g^{i\ell}g^{jk}$ and use the definitions (1.77) and (1.78) and the symmetry property (1.73). One obtains $R_{;m} - 2R^i{}_{m;i} = 0$, which is equivalent to (1.80). $\square$


1.3.4 Curvature of a 2-D Riemannian Space

In a 2-D Riemannian space the curvature tensor has only one independent component, $R_{1212}$, since

$$R_{2112} = -R_{1212}, \qquad R_{2121} = R_{1212}, \qquad\text{and}\qquad R_{1221} = -R_{1212},$$

and the remaining components are identically zero. We can thus describe $R_{ijk\ell}$ by a single function $K$, which we define by writing

$$R_{1212} = K(g_{11}g_{22} - g_{12}g_{12}). \tag{1.81}$$

We now claim that all the components of $R_{ijk\ell}$ are given by

$$R_{ijk\ell} = K(g_{ik}g_{j\ell} - g_{i\ell}g_{jk}). \tag{1.82}$$

To verify (1.82) consider the tensor

$$G_{ijk\ell} = g_{ik}g_{j\ell} - g_{i\ell}g_{jk}.$$

It follows (exercise) that $G_{ijk\ell}$ has all the symmetry properties (1.75) and (1.76) of $R_{ijk\ell}$. Thus since (1.82) is valid for the one independent, non-zero component $R_{1212}$ by (1.81), it will hold for all non-zero components. Likewise, $G_{ijk\ell}$ has the same zero components as $R_{ijk\ell}$. Thus (1.82) is valid for all components. The scalar $K$ in (1.82) is called the Gaussian curvature of $V^2$.

Note that (1.81) can be written in the form

$$R_{1212} = Kg, \qquad g = \det(g_{ij}).$$

If the metric components satisfy $g_{12} = 0$ relative to some coordinate system (called an orthogonal coordinate system), then there is a simple formula for $K$:

$$K = -\frac{1}{2\sqrt{g}}\left[\frac{\partial}{\partial x^1}\left(\frac{1}{\sqrt{g}}\frac{\partial g_{22}}{\partial x^1}\right) + \frac{\partial}{\partial x^2}\left(\frac{1}{\sqrt{g}}\frac{\partial g_{11}}{\partial x^2}\right)\right]. \tag{1.83}$$

This formula can be derived by specializing the definition (1.72) of $R^i{}_{jk\ell}$ to the case $n = 2$ and writing the Christoffel symbols in terms of the metric components and their partial derivatives. It is clear that the Gaussian curvature depends on the metric tensor components and on their first and second partial derivatives.
Example 1.10: Show that the Gaussian curvature of a 2-sphere $S^2$ of radius $b$ is

$$K = \frac{1}{b^2}.$$

Solution: Relative to spherical coordinates $(\theta, \phi)$, the metric on $S^2$ is given by (1.61):

$$ds^2 = b^2(d\theta^2 + \sin^2\theta\,d\phi^2).$$

Labelling $(x^1, x^2) = (\theta, \phi)$ we have

$$g_{11} = b^2, \qquad g_{22} = b^2\sin^2\theta, \qquad\text{and}\qquad g = b^4\sin^2\theta.$$

The desired result follows quickly from (1.83).
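For completeness, the substitution into (1.83) can be written out as follows (a worked sketch, assuming $0 < \theta < \pi$ so that $\sqrt{g} = b^2\sin\theta$). Since $g_{11}$ is constant, the second term of (1.83) vanishes, and

```latex
\begin{aligned}
\frac{1}{\sqrt{g}}\frac{\partial g_{22}}{\partial x^1}
  &= \frac{2b^2\sin\theta\cos\theta}{b^2\sin\theta} = 2\cos\theta, \\
K &= -\frac{1}{2b^2\sin\theta}\,\frac{\partial}{\partial\theta}\bigl(2\cos\theta\bigr)
   = -\frac{1}{2b^2\sin\theta}\,(-2\sin\theta) = \frac{1}{b^2}.
\end{aligned}
```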


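Example 1.11: Calculate the Gaussian curvature $K$ of a torus. Determine at what points $K$ is a maximum and a minimum.

Solution: To generate a torus, rotate a circle of radius $a$, lying in the $xz$-plane, with centre $x = b$, $z = 0$ ($b > a$), about the $z$-axis. The angle $\phi$, $0 \le \phi \le 2\pi$, that describes the rotation is measured from the $x$-axis.

Figure 1.2: Coordinates for a torus. (The generating circle $(x - b)^2 + z^2 = a^2$ has centre $(b, 0)$, and a point on it is $(b + a\cos\theta,\ a\sin\theta)$.)

The torus is then given by $X = G(x)$, where

$$\begin{pmatrix} X^1 \\ X^2 \\ X^3 \end{pmatrix} = \begin{pmatrix} (b + a\cos\theta)\cos\phi \\ (b + a\cos\theta)\sin\phi \\ a\sin\theta \end{pmatrix},$$

where $(x^1, x^2) = (\theta, \phi)$ and $b > a$. The coordinate tangent vectors are

$$\frac{\partial G}{\partial x^1} = \begin{pmatrix} -a\sin\theta\cos\phi \\ -a\sin\theta\sin\phi \\ a\cos\theta \end{pmatrix} \qquad\text{and}\qquad \frac{\partial G}{\partial x^2} = \begin{pmatrix} -(b + a\cos\theta)\sin\phi \\ (b + a\cos\theta)\cos\phi \\ 0 \end{pmatrix}.$$

The components of the metric tensor are

$$(g_{ij}) = \left(\frac{\partial G}{\partial x^i}\cdot\frac{\partial G}{\partial x^j}\right) = \begin{pmatrix} a^2 & 0 \\ 0 & (b + a\cos\theta)^2 \end{pmatrix}.$$

The line-element is thus

$$ds^2 = a^2 d\theta^2 + (b + a\cos\theta)^2 d\phi^2.$$

It follows from (1.83) that

$$K = \frac{\cos\theta}{a(b + a\cos\theta)}. \tag{1.84}$$

By inspection, the maximum $K$ occurs when $\cos\theta = 1$ and the minimum occurs when $\cos\theta = -1$, i.e.

$$K_{\max} = \frac{1}{a(b + a)} \qquad\text{and}\qquad K_{\min} = -\frac{1}{a(b - a)}.$$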
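Formula (1.83) lends itself to a quick numerical check of (1.84). The sketch below is illustrative and not part of the original notes: the function names, the central-difference helper `d`, and the sample values $a = 1$, $b = 3$ are assumptions.

```python
import math

def d(f, x, k, h=1e-4):
    """Central-difference partial derivative of a scalar f(x1, x2) w.r.t. x^k (k = 0 or 1)."""
    x_plus, x_minus = list(x), list(x)
    x_plus[k] += h
    x_minus[k] -= h
    return (f(*x_plus) - f(*x_minus)) / (2 * h)

def gaussian_curvature(g11, g22, x):
    """Evaluate formula (1.83) at the point x for an orthogonal metric (g12 = 0)."""
    sqrt_g = lambda x1, x2: math.sqrt(g11(x1, x2) * g22(x1, x2))
    term1 = lambda x1, x2: d(g22, (x1, x2), 0) / sqrt_g(x1, x2)
    term2 = lambda x1, x2: d(g11, (x1, x2), 1) / sqrt_g(x1, x2)
    return -(d(term1, x, 0) + d(term2, x, 1)) / (2 * sqrt_g(*x))

a, b = 1.0, 3.0                                   # sample torus radii with b > a
g11 = lambda theta, phi: a**2
g22 = lambda theta, phi: (b + a * math.cos(theta))**2

def K_exact(theta):
    """Closed form (1.84)."""
    return math.cos(theta) / (a * (b + a * math.cos(theta)))

K_outer = gaussian_curvature(g11, g22, (0.0, 0.5))       # theta = 0, outer circumference
K_inner = gaussian_curvature(g11, g22, (math.pi, 0.5))   # theta = pi, inner circumference
K_mid = gaussian_curvature(g11, g22, (1.1, 0.5))
print(K_outer, K_inner, K_mid, K_exact(1.1))
```

With these sample radii the extreme values are $K_{\max} = 1/(a(b+a)) = 0.25$ and $K_{\min} = -1/(a(b-a)) = -0.5$, and the numerical values agree to within the finite-difference error.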

Figure 1.3: Gaussian curvature on a torus.

Remark 1.12: One can gain an intuitive understanding of curvature from this example. At a point on the outer circumference of the torus, we have $\theta = 0$ and hence by (1.84) the curvature $K$ is positive. This result means that in a neighbourhood of the point the surface has the shape of a sphere (surface of constant positive curvature). On the other hand, at a point on the inner circumference, we have $\theta = \pi$ and hence by (1.84) the curvature $K$ is negative. This means that in a neighbourhood of the point the surface has the shape of a surface of constant negative curvature (like a saddle).

1.4 Geodesics

In this section we introduce the notion of geodesics in a Riemannian space. A geodesic in a Riemannian space $V^n$ is the analogue of a straight line in Euclidean space $E^n$, i.e. a curve of shortest distance between two points. Geodesics play a fundamental role in general relativity.

1.4.1 The Geodesic Equations

In order to introduce the notion of a geodesic, we need to present another type of derivative known as the absolute derivative, which can be thought of as a generalization of the directional derivative along a curve in $E^n$.

Absolute Derivative of a Vector:

Definition 1.1: Let $V^i$ be a vector field in a Riemannian space $V^n$ and let $C : x^i = x^i(\lambda)$ be a curve. The absolute derivative of the vector field along the curve is defined by

$$\frac{DV^i}{D\lambda} = V^i{}_{;j}\,\frac{dx^j}{d\lambda} = \left(V^i{}_{,j} + \Gamma^i{}_{jk}V^k\right)\frac{dx^j}{d\lambda}. \tag{1.85}$$

Remarks 1.13:

(i) Since $V^i{}_{;j}$ is a tensor and $\frac{dx^j}{d\lambda}$ is a vector, it follows from the definition that $\frac{DV^i}{D\lambda}$ is a vector (contraction of tensors).

(ii) The absolute derivative of an arbitrary tensor field is defined in an analogous fashion:

$$\frac{D(\cdots)}{D\lambda} = (\cdots)_{;j}\,\frac{dx^j}{d\lambda}. \tag{1.86}$$

For a scalar field $f$,

$$\frac{Df}{D\lambda} = f_{,j}\,\frac{dx^j}{d\lambda},$$

which in $E^3$ is the familiar directional derivative.

(iii) If we regard the vector field $V^i(x^j)$ as defining a vector at each point of the curve $x^i = x^i(\lambda)$, i.e.

$$V^i(\lambda) = V^i(x^j(\lambda)),$$

then

$$\frac{dV^i}{d\lambda} = V^i{}_{,j}\,\frac{dx^j}{d\lambda}. \tag{1.87}$$

It follows from (1.85) and (1.87) and using the formula (1.39) for the covariant derivative that

$$\frac{DV^i}{D\lambda} = \frac{dV^i}{d\lambda} + \Gamma^i{}_{jk}V^j\,\frac{dx^k}{d\lambda}. \tag{1.88}$$
The Geodesic Equations:

In Euclidean space $E^n$, a curve joining two points $P$ and $Q$ has the shortest length if and only if it is a straight line. In differential geometry, a curve of shortest length joining two points is called a geodesic. To extend the notion of a geodesic to a Riemannian space $V^n$, we need to characterize a geodesic in $E^n$ in a way that can be generalized to $V^n$.

In $E^n$, a straight line (a geodesic) has constant unit tangent vector. Thus, relative to Cartesian coordinates we have

$$\frac{d}{d\lambda}\left(\frac{dX^i}{d\lambda}\right) = 0.$$

Relative to arbitrary coordinates, the fact that the tangent vector is constant is expressed using the absolute derivative:

$$\frac{D}{D\lambda}\left(\frac{dx^i}{d\lambda}\right) = 0.$$

We can now define a geodesic in a Riemannian space $V^n$.

Definition 1.2: A geodesic in a Riemannian space $V^n$ is a curve $C : x^i = x^i(\lambda)$ that satisfies

$$\frac{D}{D\lambda}\left(\frac{dx^i}{d\lambda}\right) = 0, \tag{1.89}$$

i.e. its tangent vector is constant along the curve.

Remarks 1.14:

(i) It follows from (1.89) that

$$g_{ij}\,\frac{dx^i}{d\lambda}\frac{dx^j}{d\lambda} = C,$$

where $C$ is a constant (exercise, using (1.86) applied to $g_{ij}$). If we use arclength as the parameter then

$$g_{ij}\,\frac{dx^i}{d\lambda}\frac{dx^j}{d\lambda} = 1. \tag{1.90}$$

(See Section 1.3.2.)

(ii) If we use the detailed expression (1.88) for the absolute derivative, equation (1.89) reads

$$\frac{d^2x^i}{d\lambda^2} + \Gamma^i{}_{jk}\,\frac{dx^j}{d\lambda}\frac{dx^k}{d\lambda} = 0, \tag{1.91}$$

which we shall refer to as the geodesic equation. The geodesic equation is a system of second order, nonlinear DEs for the curve $x^i(\lambda)$.

The following result is of importance in general relativity because, as we shall see, geodesics describe the motion of particles under the influence of gravity.

Proposition 1.6: (Uniqueness property of geodesics) Given a point in $V^n$ with coordinates $x^i_0$ and a unit vector $T^i$ at this point, there is exactly one geodesic $x^i(\lambda)$ that passes through the point in the direction of the unit vector, i.e. that satisfies

$$x^i(0) = x^i_0 \qquad\text{and}\qquad \frac{dx^i}{d\lambda}(0) = T^i. \tag{1.92}$$

Proof: Equation (1.92) provides initial conditions for the geodesic equations (1.91). Hence the result follows from the existence-uniqueness theorem for second order ODEs. $\square$

Example 1.12: A great circle on a sphere $S^2$ is a circle on the sphere whose centre coincides with the centre of the sphere. Show that great circles are geodesics on the sphere.

Solution: Given a great circle, we can choose the usual angular coordinates $\theta$ and $\phi$ so that the great circle passes through the north and south poles. The great circle is then given by

$$\phi = \text{const}.$$

We now show that $\phi = \text{const}$ satisfies the geodesic equations, and that the equations determine how $\theta$ depends on $\lambda$.

The line-element for the 2-sphere of radius $b$ is given by equation (1.61):

$$ds^2 = b^2(d\theta^2 + \sin^2\theta\,d\phi^2).$$

The contravariant metric tensor components are given by

$$(g^{ij}) = \frac{1}{b^2}\begin{pmatrix} 1 & 0 \\ 0 & \csc^2\theta \end{pmatrix}.$$

Labelling the coordinates $(x^1, x^2) = (\theta, \phi)$, the only non-zero partial derivative of the components of the metric tensor is

$$g_{22,1} = 2b^2\sin\theta\cos\theta.$$

The non-zero Christoffel symbols follow from equation (1.50):

$$\Gamma^1{}_{22} = -\sin\theta\cos\theta \qquad\text{and}\qquad \Gamma^2{}_{12} = \Gamma^2{}_{21} = \cot\theta.$$

The geodesic equation (1.91) now yields

$$\ddot{\theta} - \sin\theta\cos\theta\,\dot{\phi}^2 = 0, \tag{1.93}$$

$$\ddot{\phi} + 2\cot\theta\,\dot{\theta}\dot{\phi} = 0, \tag{1.94}$$

where a dot denotes differentiation with respect to $\lambda$, and the normalization condition (1.90) is

$$\dot{\theta}^2 + \sin^2\theta\,\dot{\phi}^2 = \frac{1}{b^2}. \tag{1.95}$$

For the great circle $\phi = \text{const}$, we have $\dot{\phi} = 0$ and $\ddot{\phi} = 0$, so that (1.94) is satisfied. Equation (1.93) gives $\ddot{\theta} = 0$, i.e.

$$\theta(\lambda) = C_1\lambda + C_2.$$

The condition (1.95) gives $C_1 = \frac{1}{b}$, and imposing the initial condition $\theta(0) = 0$ gives $C_2 = 0$.

In summary, the great circle $\phi = \text{const}$ and $\theta = \frac{\lambda}{b}$ satisfies the geodesic equation.

Exercise: Given a great circle on the 2-sphere one can choose coordinates $\theta, \phi$ such that the great circle is the equator. Verify that with this choice the geodesic equations (1.93) and (1.94) are satisfied.
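The geodesic equations (1.93) and (1.94) can also be integrated numerically from initial data of the form (1.92). The sketch below is an illustration under stated assumptions, not the method of the notes: it uses a hand-written classical Runge-Kutta step, an arbitrary starting point and direction, and checks two things along the solution: that the normalization (1.95) is preserved, and that the curve stays in a plane through the centre of the sphere, i.e. on a great circle. All function names are hypothetical.

```python
import math

def geodesic_rhs(state):
    """First-order form of (1.93)-(1.94); state = (theta, phi, dtheta, dphi)."""
    theta, phi, dtheta, dphi = state
    ddtheta = math.sin(theta) * math.cos(theta) * dphi**2
    ddphi = -2.0 * dtheta * dphi / math.tan(theta)
    return (dtheta, dphi, ddtheta, ddphi)

def rk4_step(f, y, h):
    """One classical fourth-order Runge-Kutta step for y' = f(y)."""
    k1 = f(y)
    k2 = f(tuple(a + 0.5 * h * k for a, k in zip(y, k1)))
    k3 = f(tuple(a + 0.5 * h * k for a, k in zip(y, k2)))
    k4 = f(tuple(a + h * k for a, k in zip(y, k3)))
    return tuple(a + h * (p + 2 * q + 2 * r + s) / 6
                 for a, p, q, r, s in zip(y, k1, k2, k3, k4))

def embed(theta, phi, b):
    """Point of the sphere of radius b in E^3."""
    return (b * math.sin(theta) * math.cos(phi),
            b * math.sin(theta) * math.sin(phi),
            b * math.cos(theta))

def velocity(state, b):
    """Tangent vector dX/dlambda in E^3 (chain rule applied to embed)."""
    theta, phi, dth, dph = state
    return (b * (math.cos(theta) * math.cos(phi) * dth - math.sin(theta) * math.sin(phi) * dph),
            b * (math.cos(theta) * math.sin(phi) * dth + math.sin(theta) * math.cos(phi) * dph),
            -b * math.sin(theta) * dth)

def cross(u, v):
    return (u[1] * v[2] - u[2] * v[1], u[2] * v[0] - u[0] * v[2], u[0] * v[1] - u[1] * v[0])

def dot(u, v):
    return sum(a * c for a, c in zip(u, v))

b = 2.0
theta0, phi0 = 1.0, 0.3                      # arbitrary starting point away from the poles
dtheta0, dphi0 = 0.4, 0.7                    # arbitrary direction, normalized below
speed = b * math.sqrt(dtheta0**2 + math.sin(theta0)**2 * dphi0**2)
state = (theta0, phi0, dtheta0 / speed, dphi0 / speed)   # now (1.95) holds at lambda = 0

# Normal to the plane through the centre spanned by the initial position and velocity.
normal = cross(embed(theta0, phi0, b), velocity(state, b))

max_off_plane = 0.0
max_norm_error = 0.0
for _ in range(3000):                        # integrate to lambda = 3 with step 0.001
    state = rk4_step(geodesic_rhs, state, 1e-3)
    theta, phi, dth, dph = state
    max_off_plane = max(max_off_plane, abs(dot(normal, embed(theta, phi, b))))
    max_norm_error = max(max_norm_error,
                         abs(dth**2 + math.sin(theta)**2 * dph**2 - 1.0 / b**2))
print(max_off_plane, max_norm_error)
```

Both diagnostics remain at the level of the integration error, consistent with Remark 1.14(i) and with the statement that geodesics on the sphere are great circles. The coordinate form of the equations breaks down at the poles, so the starting data were chosen to keep the curve away from $\theta = 0$ and $\theta = \pi$.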

1.4.2 Geodesics and Curvature

Geodesics in Euclidean space En are straight lines. This implies that the geodesics that
emanate from one point diverge linearly and never intersect again. The geodesics on the
2-sphere S 2 behave quite differently, due to the fact that S 2 has non-zero curvature. In
particular the geodesics (the great circles) that emanate from one point on the sphere, say
the North Pole, initially diverge, but are then refocussed and intersect again at the South
Pole.
The influence of the curvature on the separation of neighbouring geodesics is described
by the so-called equation of geodesic deviation, which we now derive.
The Equation of Geodesic Deviation:
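Figure 1.4: A 1-parameter family of geodesics $x^j = x^j(\lambda, \mu)$. (The curves $\mu = \text{const}$ are the geodesics, with tangent vector $\frac{\partial x^i}{\partial\lambda}$; the curves $\lambda = \text{const}$ have tangent vector $\frac{\partial x^i}{\partial\mu}$.)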

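We wish to describe how neighbouring geodesics deviate from one another. Consider a 1-parameter family of geodesics defined by

$$x^j = x^j(\lambda, \mu),$$

where $\lambda$ is the arclength along each geodesic and $\mu$ labels geodesics. Assume that the parameters are chosen so that $\frac{\partial x^i}{\partial\lambda}$ and $\frac{\partial x^j}{\partial\mu}$ are orthogonal, i.e.

$$g_{ij}\,\frac{\partial x^i}{\partial\lambda}\frac{\partial x^j}{\partial\mu} = 0. \tag{1.96}$$

In this situation, the geodesic equation (1.91) gives

$$\frac{D}{D\lambda}\left(\frac{\partial x^i}{\partial\lambda}\right) = 0, \tag{1.97}$$

for a fixed value of $\mu$. We also have

$$\frac{D}{D\mu}\left(\frac{\partial x^i}{\partial\lambda}\right) = \frac{D}{D\lambda}\left(\frac{\partial x^i}{\partial\mu}\right), \tag{1.98}$$

which follows from the definition (1.85) of absolute derivative and the symmetry of $\Gamma^i{}_{jk}$ (verify). Furthermore, one can use the commutation relation (1.71) to show that

$$\frac{D^2V^i}{D\lambda\,D\mu} - \frac{D^2V^i}{D\mu\,D\lambda} = R^i{}_{jk\ell}\,V^j\,\frac{\partial x^k}{\partial\lambda}\frac{\partial x^\ell}{\partial\mu} \tag{1.99}$$

(see the exercise to follow), where $\frac{D^2V^i}{D\lambda\,D\mu}$ denotes $\frac{D}{D\lambda}\left(\frac{DV^i}{D\mu}\right)$. Then

$$\begin{aligned}
\frac{D^2}{D\lambda^2}\left(\frac{\partial x^i}{\partial\mu}\right)
&= \frac{D}{D\lambda}\left[\frac{D}{D\lambda}\left(\frac{\partial x^i}{\partial\mu}\right)\right] \\
&= \frac{D}{D\lambda}\left[\frac{D}{D\mu}\left(\frac{\partial x^i}{\partial\lambda}\right)\right] \qquad \text{by (1.98)} \\
&= \frac{D}{D\mu}\left[\frac{D}{D\lambda}\left(\frac{\partial x^i}{\partial\lambda}\right)\right] + R^i{}_{jk\ell}\,\frac{\partial x^j}{\partial\lambda}\frac{\partial x^k}{\partial\lambda}\frac{\partial x^\ell}{\partial\mu},
\end{aligned}$$

the last step following from (1.99) with $V^i = \frac{\partial x^i}{\partial\lambda}$. Finally, using (1.97), we arrive at the equation of geodesic deviation

$$\frac{D^2}{D\lambda^2}\left(\frac{\partial x^i}{\partial\mu}\right) - R^i{}_{jk\ell}\,\frac{\partial x^j}{\partial\lambda}\frac{\partial x^k}{\partial\lambda}\frac{\partial x^\ell}{\partial\mu} = 0. \tag{1.100}$$

The vector $\frac{\partial x^i}{\partial\mu}$ describes how the separation between neighbouring geodesics changes along the geodesics, showing how the curvature of $V^n$, described by $R^i{}_{jk\ell}$, influences the geodesics in $V^n$.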
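Exercise: Derive equation (1.99) by following the steps below.

(i) Raise the index in (1.71) and rearrange using (1.76) to obtain

$$V^i{}_{;\ell k} - V^i{}_{;k\ell} = R^i{}_{jk\ell}\,V^j.$$

(ii) Use the definition (1.85) to show that

$$\frac{D^2V^i}{D\lambda\,D\mu} = \frac{D}{D\lambda}\left(V^i{}_{;\ell}\,\frac{\partial x^\ell}{\partial\mu}\right) = V^i{}_{;\ell k}\,\frac{\partial x^\ell}{\partial\mu}\frac{\partial x^k}{\partial\lambda} + V^i{}_{;\ell}\,\frac{D}{D\lambda}\left(\frac{\partial x^\ell}{\partial\mu}\right). \tag{1.101}$$

(iii) Use (i) in conjunction with (1.101) and (1.98) to calculate the difference

$$\frac{D^2V^i}{D\lambda\,D\mu} - \frac{D^2V^i}{D\mu\,D\lambda},$$

leading directly to (1.99).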

We now discuss the geometrical significance of the equation of geodesic deviation. Consider a 1-parameter family of geodesics

$$x^j = x^j(\lambda, \mu)$$

and let

$$\eta^i = \frac{\partial x^i}{\partial\mu}\,\delta\mu \tag{1.102}$$

for two neighbouring geodesics. The vector $\eta^i$ is called the connecting vector.

Figure 1.5: The connecting vector. (Two neighbouring geodesics $\mu = \mu_0$ and $\mu = \mu_0 + \delta\mu$, joined by the connecting vector $\eta^i = \frac{\partial x^i}{\partial\mu}\,\delta\mu$.)

The equation of geodesic deviation gives

$$\frac{D^2\eta^i}{D\lambda^2} - R^i{}_{jk\ell}\,\frac{\partial x^j}{\partial\lambda}\frac{\partial x^k}{\partial\lambda}\,\eta^\ell = 0. \tag{1.103}$$

In a two-dimensional Riemannian space $V^2$, it follows from (1.82), (1.96), and (1.102) that (1.103) simplifies to

$$\frac{D^2\eta^i}{D\lambda^2} + K\eta^i = 0, \tag{1.104}$$

where $K$ is the Gaussian curvature.

In order to get information from this DE, we need to convert it to a scalar DE. Let $V^i$ be a vector field that satisfies

$$\frac{DV^i}{D\lambda} = 0. \tag{1.105}$$

The scalar field defined by

$$\eta = g_{ij}V^i\eta^j$$

then is a measure of the separation between neighbouring geodesics. It follows from (1.104) and (1.105) that $\eta$ satisfies the scalar DE

$$\frac{d^2\eta}{d\lambda^2} + K\eta = 0.$$

If $K$ is constant, i.e. the space has constant curvature, this DE can be solved explicitly in the three cases $K > 0$, $K = 0$, and $K < 0$, subject to the initial condition $\eta(0) = 0$:

$$\eta(\lambda) = \begin{cases} A\sin\left(\sqrt{K}\,\lambda\right), & \text{if } K > 0, \\ A\lambda, & \text{if } K = 0, \\ A\sinh\left(\sqrt{-K}\,\lambda\right), & \text{if } K < 0. \end{cases} \tag{1.106}$$

Figure 1.6: Separation of geodesics passing through a point $P$ in a 2-D Riemannian space of constant curvature $K$. (Three panels: $K > 0$, $K = 0$, $K < 0$.)

In the case of $E^2$ (i.e. $K = 0$), this result implies that the separation between neighbouring geodesics through a point increases linearly with respect to arclength along the geodesics. For the case of a sphere $S^2$ ($K > 0$), this result confirms the conclusion that we reached earlier, namely that two geodesics through a point $P$ initially separate but are then refocussed together and intersect again at the diametrically opposite point. Note that for a sphere of radius $b$, the Gaussian curvature is $K = \frac{1}{b^2}$, implying that if two geodesics intersect at $\lambda = 0$, they intersect again at $\lambda = \pi b$, i.e. after a distance of one-half of the circumference. For a surface of constant negative curvature, equation (1.106) implies that the separation between neighbouring geodesics grows more rapidly than linearly, and that refocussing does not occur.
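The three cases of (1.106) are easy to explore numerically. The following sketch is illustrative only: the function name `separation`, the amplitude `A`, and the sample values of $K$ and $b$ are assumptions. It checks that the closed form satisfies the scalar DE (via a second-order central difference) and that the separation on a sphere of radius $b$ returns to zero at $\lambda = \pi b$.

```python
import math

def separation(K, lam, A=1.0):
    """Closed-form solution (1.106) of eta'' + K*eta = 0 with eta(0) = 0."""
    if K > 0:
        return A * math.sin(math.sqrt(K) * lam)
    if K < 0:
        return A * math.sinh(math.sqrt(-K) * lam)
    return A * lam

def ode_residual(K, lam, h=1e-3):
    """Approximate eta'' + K*eta at lam using a central second difference."""
    second = (separation(K, lam + h) - 2 * separation(K, lam) + separation(K, lam - h)) / h**2
    return second + K * separation(K, lam)

b = 2.0                                   # sample sphere radius, so K = 1/b^2
refocus = separation(1.0 / b**2, math.pi * b)
residuals = [ode_residual(K, 0.8) for K in (0.25, 0.0, -0.25)]
print(refocus, residuals)
print(separation(-1.0, 2.0), separation(0.0, 2.0))   # sinh growth vs linear growth
```

The first printed value is zero up to rounding, matching the refocussing of great circles at the antipodal point, and the last line shows the faster-than-linear growth in the negative curvature case.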
The equation of geodesic deviation is of fundamental importance in general relativity. In
general relativity, the Einstein field equations relate matter and energy to the curvature of
four-dimensional spacetime. The world-lines of free particles are postulated to be geodesics
of spacetime, i.e. the geodesic equations are the equations of motion. If the distance between
two nearby geodesics decreases, it appears that there is an attractive force acting between
the two particles whose world-lines in spacetime are described by those geodesics. In this
way, general relativity gives a mathematical description of gravity.


1.5 Looking Ahead: Why Tensors?

Tensor calculus has been in use for more than a century, having been developed by the mathematicians Riemann, Ricci, and Levi-Civita, among others. It is one of the tools for studying the geometry of surfaces in $E^3$, or more generally, of n-dimensional Riemannian spaces. Einstein learned about tensor calculus from a colleague, the Swiss mathematician Marcel Grossmann, and in time realized that it would provide the mathematical framework for extending his theory of special relativity to incorporate gravity. Thus was born the theory of general relativity in 1916.

The tensor transformation law ensures that any tensor has the following property: if all the components of the tensor are zero in one coordinate system then they are zero in any coordinate system. This property enables us to use tensors to describe physical and geometrical quantities in a way that is independent of the choice of coordinates. For example, we have seen that the Riemann tensor, $R^i{}_{jk\ell}$, describes the curvature of a Riemannian space $V^n$. If $V^n = E^n$, then the components $R^i{}_{jk\ell}$ are zero in any coordinate system, as well they should be, since curvature is not introduced by changing coordinates. As a second example, in continuum mechanics, the stresses in an elastic medium are represented by a symmetric tensor $\sigma_{ij}$ in $E^3$. Again, the existence of stresses does not depend on the choice of coordinates, leading to the need for a tensor representation. As a third example, in Einstein's theory of special relativity an electromagnetic field is described by an antisymmetric tensor $F_{ij}$ in four-dimensional spacetime.

It should be kept in mind that the values of the components of a tensor do depend on the coordinate system. For this reason, one is interested in scalars formed from tensors by the operations of multiplication, raising and lowering indices, and contraction, since the value of a scalar at a point is independent of the coordinates used. Thus, for example, one forms scalars such as

$$R_{ijk\ell}R^{ijk\ell} \qquad\text{or}\qquad F_{ij}F^{ij}.$$

In addition to representing physical and geometrical quantities, tensors are also essential for formulating the laws of physics, since these laws should not depend on which coordinates one decides to use. As a first example, we shall see that the Einstein field equations in general relativity have the tensorial form

$$G_{ij} = T_{ij},$$

where $G_{ij}$ is the Einstein tensor in four-dimensional spacetime and $T_{ij}$ is the so-called energy-momentum tensor, which describes the distribution of energy and momentum. As a second example, in spacetime Maxwell's equations have the tensorial form

$$F^{ab}{}_{;b} = 0, \qquad F_{ab;c} + F_{bc;a} + F_{ca;b} = 0,$$

in a region free of charge and current, where $F_{ab}$ is the electromagnetic field tensor and $F^{ab}$ is its contravariant counterpart.

Spacetime and Lorentzian Metrics:

One additional mathematical idea is needed before we can formulate general relativity, namely the idea of a Lorentzian metric, in contrast to a Riemannian metric. A Riemannian metric is defined on a space in which all coordinates are viewed as spatial. A Lorentzian metric is defined on spacetime, a four-dimensional set in which three coordinates, labelled $x, y, z$, are spatial and one, labelled $t$, represents time. A Riemannian metric $g_{ij}$ is required to be positive definite (as a matrix, its eigenvalues are positive). A Lorentzian metric $g_{ij}$ is not positive definite.$^5$ Instead, it has three positive eigenvalues associated with the three spatial coordinates and one negative eigenvalue associated with the time coordinate. For example, the line-element for the Euclidean metric in $E^3$ (a Riemannian metric) has the form

$$ds^2 = dx^2 + dy^2 + dz^2,$$

while the line-element for the Lorentzian metric in flat spacetime, called Minkowski spacetime $M^4$, is

$$ds^2 = -c^2 dt^2 + dx^2 + dy^2 + dz^2,$$

where $c$ is the speed of light in vacuo. One of the main goals of the next chapter is to show that the special theory of relativity is based on a Lorentzian metric.

$^5$ It is important to note, however, that the results of tensor analysis and differential geometry that we have developed for Riemannian metrics also hold for Lorentzian metrics.

Chapter 2
Special Relativity Theory & Lorentzian Metrics

In this chapter we introduce the basic postulates of Special Relativity Theory, and show how they lead to the well-known Lorentz transformation. We then formulate the theory within the framework of space-time, as first done by H. Minkowski in 19??, and show how the notion of a Lorentzian metric tensor arises. In this way, we lay the foundations for Einstein's General Theory of Relativity.

2.1 The Relativity Principle & Light Propagation

2.1.1 Inertial frames

A frame of reference is a standard of rest relative to which measurements can be made and experiments described. A frame of reference is said to be an inertial frame if a free particle which is initially at rest remains at rest, and one which is initially in motion continues its motion without change in velocity (i.e. Newton's first law holds relative to the frame). Note that a free particle is one which is not acted on by any forces except gravity. Here are some examples of inertial frames.

i) A space-ship outside the solar system moving with constant velocity relative to the fixed stars.

ii) A freely-falling space-ship in the earth's gravitational field. This frame of reference should strictly speaking be called a local inertial frame, since the spatial extent of the space-ship cannot be too large and the duration in time cannot be too large.

iii) An earthbound laboratory is not an inertial frame, but for experiments of very short duration, it can be regarded as one.$^1$

$^1$ See Taylor and Wheeler (1972), p. 73, #31.

Newtonian mechanics is based on the following principle.

Newton's Relativity Principle:
Two inertial frames moving with constant relative velocity cannot be distinguished by any mechanical experiment.

2.1.2 Light Propagation

The propagation of light plays an essential role in relativity (e.g. distances can be measured using light signals). From everyday experience, we know that light travels with an extremely high speed, which has been determined to be approximately $c = 3 \times 10^8$ m sec$^{-1}$.

But to what frame of reference does this speed $c$ refer? Maxwell's theory says that light is a wave phenomenon. And like other types of waves, its speed of propagation (relative to some frame of reference) is experimentally found to be independent of the speed of the source (e.g. observations of binary star systems, see Resnick (1968) p. 27). All other types of waves propagate in some material medium (e.g. sound waves in air). So people postulated the existence of a medium, called the "ether", in which light propagated, and $c$ was the speed of light relative to the ether. This means light would have a different speed relative to different inertial frames.

In 1887, Michelson & Morley attempted to detect the presence of the ether by measuring the speed of the earth relative to the ether (along parts of its orbit, the earth would have to be moving relative to the ether). They used an optical interferometer$^2$ to obtain the necessary accuracy, but no effect was measured, then or in subsequent experiments.

$^2$ The idea of the experiment was analogous to using measurements of the speed of sound emitted by a source to detect the motion of the source relative to the air.

This null result essentially forces one to abandon the ether concept. It suggests that experiments with light will not distinguish inertial frames in uniform relative motion (in a vacuum), and leads to the following postulates.

Einstein's Relativity Principle:
Two inertial frames moving with constant relative velocity cannot be distinguished by any physical experiment.

The second postulate that one is led to make (and which is certainly compatible with Einstein's Relativity Principle) is

Constancy of the Speed of Light:
The speed of light in vacuum has the same value $c$ relative to all inertial frames.

2.1.3

Events and Spacetime

The most basic concept in relativity is that of an event i.e. any occurrence idealized to take
place at a single point in space at a single instant in time.
e.g.

decay of an elementary particle


impact of a meteorite on earth
start of a supernova explosion

[laboratory scale]
[solar system scale]
[galactic scale]

Spacetime is defined to be the set of all events (relevant to the physical situation under
discussion).
Experience on earth and in the solar system tells us that four numbers (coordinates)
are needed to specify an event (three space coordinates and one time coordinate). Hence
spacetime is assumed to be a four-dimensional differentiable manifold [3]. The existence of a
particle (or an observer) in spacetime is described by an infinite sequence of events, which we
assume to form a smooth curve (unless a collision occurs), called the worldline of the particle.
These then are the basic concepts in relativity theory. In addition, we will find that the
propagation of light signals leads one to define a Lorentzian metric tensor in spacetime.
As regards units, we will use the second as the unit of time, and we will take advantage
of the constancy of the speed of light c to choose the unit of length so that
\[ c = 1. \]
Distances (at least in all the theory) will then be measured in seconds. For example, the
distance from the earth to the moon is 1.3 secs., meaning that a light signal will take 1.3 secs.
to travel from the earth to the moon. In addition, velocities and speeds are dimensionless.
For example, the speed of an astronaut circling the earth (≈ 18,000 mph) is ≈ .000026 (as a
fraction of the speed of light).
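The conversion to units with c = 1 can be illustrated with a short Python sketch. The function names and the earth-moon distance of 3.844 × 10^8 m are assumptions introduced here for illustration; they do not appear in the notes.

```python
# Converting SI quantities to the c = 1 units used in the text.
C_M_PER_S = 299_792_458.0   # speed of light in m/s (exact SI definition)
MPH_TO_M_PER_S = 0.44704    # one mile per hour in m/s (exact)
EARTH_MOON_M = 3.844e8      # assumed mean earth-moon distance in metres


def distance_in_seconds(distance_m):
    """Light travel time, in seconds, for a distance given in metres."""
    return distance_m / C_M_PER_S


def speed_as_fraction_of_c(speed_m_per_s):
    """Dimensionless speed v/c for a speed given in m/s."""
    return speed_m_per_s / C_M_PER_S


moon_seconds = distance_in_seconds(EARTH_MOON_M)
astronaut_speed = speed_as_fraction_of_c(18_000 * MPH_TO_M_PER_S)
print(f"earth-moon distance: {moon_seconds:.2f} s")
print(f"astronaut speed: {astronaut_speed:.1e}")
```

With these inputs the distance comes out at about 1.3 secs. and the speed at about 2.7 × 10^-5, which agrees with the figures quoted above to within the precision of the 18,000 mph estimate.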
[3] A four-dimensional differentiable manifold is essentially a set that can be coordinated using four real
numbers.

Let us now consider how a single observer O, at rest in some inertial frame, assigns
coordinates to events using light signals (radar) and a standard clock. For simplicity, we
initially consider events in a single spatial direction, and so only two coordinates x, t will be
needed.
Firstly, O assigns x = 0 to the events making up his worldline. Let P be any other event
in the specified spatial direction. We can think of P as being an event on the worldline of
some other observer. O sends out a radar pulse (light signal) so that it reaches the event
P, and is there reflected back to him. Let $t_1$, $t_2$ be the emission and reception times of the
signal as measured by O's clock. Since the outward and return speeds of the signal are the
same (constancy of the speed of light), O assigns a distance of $\tfrac12(t_2 - t_1)$ secs. and a time
of $\tfrac12(t_2 + t_1)$ secs. to the event P. Hence, O will assign coordinates $(x_P, t_P)$ to the event P,
where
\[ x_P = \pm\tfrac12 (t_2 - t_1), \qquad t_P = \tfrac12 (t_2 + t_1). \tag{2.1} \]
The ± sign in the equation for $x_P$ corresponds to the fact that we are permitting O to aim
his radar in two diametrically opposite directions: $x_P$ is positive for events in one direction,
and negative for events in the other direction.
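The radar prescription (2.1) can be written as a small function. This is a minimal sketch: the name `radar_coordinates` and the `direction` argument (+1 or -1, selecting the sign in (2.1)) are illustrative choices, not notation from the notes.

```python
def radar_coordinates(t1, t2, direction=+1):
    """Return (x_P, t_P) from eq. (2.1), in units with c = 1.

    t1, t2    : emission and reception times on O's clock (t2 >= t1)
    direction : +1 or -1, the direction in which O aims the radar
    """
    if t2 < t1:
        raise ValueError("the reflected signal cannot return before it is sent")
    if direction not in (+1, -1):
        raise ValueError("direction must be +1 or -1")
    x_p = direction * 0.5 * (t2 - t1)   # half the round-trip light travel time
    t_p = 0.5 * (t2 + t1)               # midpoint of emission and reception
    return x_p, t_p


# A pulse sent at t1 = 2 s and received back at t2 = 6 s locates the
# reflection event 2 light-seconds away at time 4 s.
print(radar_coordinates(2.0, 6.0))
```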
Convention:
We agree to represent x, t as Cartesian coordinates in the plane, and can then draw the
worldlines of various particles and observers, as in Figure 2.1.

[Figure 2.1: A two-dimensional spacetime diagram (t vertical, x horizontal) showing the worldlines (WL) of observers O, O1, O2 and O3. Labels: WL of an observer having zero velocity relative to O; WL of an observer having constant velocity relative to O; WL of an observer accelerating relative to O; WL of a light signal emitted by O (or by O2), a straight line of slope 1; the events "O2 passes O", "O2 passes O1", "O1 receives light signal from O" and "O3 is instantaneously at rest relative to O".]

2.2 The Lorentz Transformation

In this section we derive the relation between the coordinates assigned by two observers $O$
and $\bar{O}$ at rest in two inertial frames moving with constant relative velocity v. This relation
is called the special [4] Lorentz transformation.

2.2.1 Light Propagation and the k-factor

In order to proceed further we need the following postulate:

Maximality of the Speed of Light:
In vacuum, the speed of any material particle (and hence of any observer) relative to any
inertial frame is always less than the speed of light [5].
Consider two inertial observers O1, O2, with constant relative velocity, who send and
receive light signals as shown in Figure 2.2.

[Figure 2.2: Emission and reception of light signals by observers in relative motion. Spacetime diagram (t vertical, x horizontal) showing the worldlines of O1 and O2: a signal emitted by O1 at $t_1$ is received by O2 at $t_2 = k(v)t_1$, and a signal emitted by O2 at $t_3$ is received by O1 at $t_4 = \bar{k}(v)t_3$. Note: $t_1$, $t_4$ are measured on O1's clock; $t_2$, $t_3$ are measured on O2's clock.]

[4] "Special" refers to the fact that we are at present restricting our considerations to one space dimension.
[5] This postulate is well-supported by experiments with particle accelerators.


The Relativity Principle implies $\bar{k}(v) = k(v)$, otherwise the inertial frames associated with O1
and O2 would be experimentally distinguished. Thus with each pair of inertial observers with
constant relative velocity v, there is associated a k-factor k(v), which relates time intervals
under propagation by light signals, according to
\[ (\Delta t)_{\text{received}} = k(v)\,(\Delta t)_{\text{emitted}}. \tag{2.2} \]
Convention: The relative velocity v is positive if O1, O2 are receding from each other, and
is negative if O1, O2 are approaching each other.

2.2.2 Relative Velocity and the k-factor

Let O1 reflect a radar pulse (light signal) off O2 as shown in Figure 2.3.

[Figure 2.3: A reflected light signal. O1 and O2 set their clocks to read 0 at the event A. O1 emits the signal when O1's clock reads $t_0$; it is reflected by O2 when O2's clock reads $kt_0$, and it returns to O1 when O1's clock reads $k^2 t_0$.]


By (2.1), O1 concludes that O2 has travelled a distance $\tfrac12(k^2 - 1)t_0$ secs. in a time of $\tfrac12(k^2 + 1)t_0$
secs. Thus the velocity v of O2 relative to O1 is
\[ v = \frac{k^2 - 1}{k^2 + 1}, \tag{2.3} \]
where k is the k-factor for O1 and O2.
This equation implies $-1 < v < 1$, as expected. Inverting to find k, we obtain
\[ k(v) = \sqrt{\frac{1+v}{1-v}}. \tag{2.4} \]
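As a numerical illustration of (2.3) and (2.4), the following Python sketch evaluates the k-factor and checks that the two formulas are inverse to each other. The function names are hypothetical and chosen only for this example.

```python
import math


def k_factor(v):
    """k-factor of eq. (2.4) for relative velocity v, with -1 < v < 1 (c = 1)."""
    if not -1.0 < v < 1.0:
        raise ValueError("relative velocity must satisfy -1 < v < 1")
    return math.sqrt((1.0 + v) / (1.0 - v))


def velocity_from_k(k):
    """Relative velocity of eq. (2.3) for a k-factor k > 0."""
    return (k * k - 1.0) / (k * k + 1.0)


v = 0.6
k = k_factor(v)
print(k)                      # k-factor for observers receding at v = 0.6
print(velocity_from_k(k))     # recovers v, so (2.3) and (2.4) are inverses
print(k_factor(-v) * k)       # product is 1, since (2.4) gives k(-v) = 1/k(v)
```

For receding observers (v > 0) the k-factor exceeds 1, so received intervals are stretched; for approaching observers it is less than 1.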

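This implies that
\[ k(-v) = \frac{1}{k(v)}. \]

2.2.3 Derivation of the special Lorentz transformation

[Figure 2.4: Coordinates assigned by observers in two different inertial frames. The diagram (t vertical, x horizontal) shows the event P, with coordinates $(x, t)$ and $(\bar{x}, \bar{t})$, the radar times $T_1$, $T_2$ on the worldline of $O$, the radar times $\bar{T}_1$, $\bar{T}_2$ on the worldline of $\bar{O}$, and the light ray $t = x$.]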

Consider two inertial observers $O$ and $\bar{O}$, with constant relative velocity v > 0. These
observers assign coordinates $(x, t)$ and $(\bar{x}, \bar{t})$ to an event P, according to (2.1):
\[ x = \tfrac12 (T_2 - T_1), \qquad t = \tfrac12 (T_2 + T_1), \tag{2.5} \]
\[ \bar{x} = \tfrac12 (\bar{T}_2 - \bar{T}_1), \qquad \bar{t} = \tfrac12 (\bar{T}_2 + \bar{T}_1). \tag{2.6} \]
The k-factor relates $\bar{T}_1$ to $T_1$ according to
\[ \bar{T}_1 = k(v)\,T_1, \tag{2.7} \]
and relates $T_2$ to $\bar{T}_2$ according to
\[ T_2 = k(v)\,\bar{T}_2 \tag{2.8} \]
(see Figure 2.4 and equation (2.2)). We now substitute $\bar{T}_1$ and $\bar{T}_2$ from (2.7) and (2.8) into
(2.6), and then express $T_2$ and $T_1$ in terms of x and t using (2.5), i.e. $T_1 = t - x$ and $T_2 = t + x$. On using the expression
(2.4) for k(v) and simplifying we obtain the following transformation formulae:
\[ \bar{x} = \frac{x - vt}{\sqrt{1 - v^2}}, \qquad \bar{t} = \frac{t - vx}{\sqrt{1 - v^2}}. \tag{2.9} \]
These equations describe the special Lorentz transformation. Note that the $\bar{t}$-axis is given
by $\bar{x} = 0$ or $x = vt$, while the $\bar{x}$-axis is given by $\bar{t} = 0$ or $t = vx$. The $\bar{t}$- and $\bar{x}$-axes are thus
symmetrically placed with respect to the worldline of the light signal from the origin, which
is given by $x = t$, or $\bar{x} = \bar{t}$.
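The derivation can be checked numerically by carrying out the radar steps (2.5) to (2.8) explicitly and comparing with the closed form (2.9). The sketch below does this in Python; the function names are illustrative, and the event is assumed to lie on the positive-x side of both observers so that the + sign in (2.1) applies.

```python
import math


def k_factor(v):
    """k-factor of eq. (2.4), c = 1."""
    return math.sqrt((1.0 + v) / (1.0 - v))


def lorentz(x, t, v):
    """Special Lorentz transformation, eq. (2.9); returns (x_bar, t_bar)."""
    gamma = 1.0 / math.sqrt(1.0 - v * v)
    return gamma * (x - v * t), gamma * (t - v * x)


def barred_via_radar(x, t, v):
    """Barred coordinates obtained step by step from eqs. (2.5)-(2.8)."""
    k = k_factor(v)
    T1, T2 = t - x, t + x          # invert (2.5)
    T1_bar = k * T1                # (2.7): outgoing pulse passes the moving observer
    T2_bar = T2 / k                # (2.8): returning pulse passes the moving observer
    return 0.5 * (T2_bar - T1_bar), 0.5 * (T2_bar + T1_bar)   # (2.6)


# Event at x = 3, t = 5 seen from a frame moving with v = 0.6.
print(barred_via_radar(3.0, 5.0, 0.6))
print(lorentz(3.0, 5.0, 0.6))
```

Both calls give the same pair (the event lies on the moving observer's worldline x = vt, so its barred x-coordinate vanishes).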

2.3 Minkowski Spacetime & Lorentzian Metrics

In this section we introduce the notion of spacetime separation between events, and show
how this leads to a so-called Lorentzian metric.

2.3.1 The Spacetime Separation between Events

Let P and Q be two events in two-dimensional spacetime. We have seen that two inertial
observers with constant relative velocity v will assign different time differences $t_Q - t_P$, $\bar{t}_Q - \bar{t}_P$
and different spatial distances $x_Q - x_P$, $\bar{x}_Q - \bar{x}_P$ to the pair of events, where the barred and
unbarred coordinates are related by the Lorentz transformation (2.9).
An analogous situation exists when one uses different Cartesian coordinate systems
$(x, y)$, $(\bar{x}, \bar{y})$, having the same origin, to specify points in the plane. Two points A, B will
have different x and y separations, namely $x_B - x_A$, $\bar{x}_B - \bar{x}_A$, and $y_B - y_A$, $\bar{y}_B - \bar{y}_A$, relative
to the two coordinate systems. However, there is a quantity, namely the Euclidean distance
between A and B, which is an invariant, i.e. has the same value in both coordinate systems:
\[ (\bar{x}_B - \bar{x}_A)^2 + (\bar{y}_B - \bar{y}_A)^2 = (x_B - x_A)^2 + (y_B - y_A)^2. \]
The square of the Euclidean distance between two points A and B is given by
\[ (\Delta\ell)^2 = (\Delta x)^2 + (\Delta y)^2, \tag{2.10} \]
where $\Delta x = x_A - x_B$ and $\Delta y = y_A - y_B$.
Question: What is the corresponding expression that is invariant under the Lorentz
transformation?
Answer: We write the Lorentz transformation (2.9) in the form
\[ \bar{x} = -t \sinh\psi + x \cosh\psi, \qquad \bar{t} = t \cosh\psi - x \sinh\psi, \tag{2.11} \]
where $\psi$ is the so-called velocity parameter, defined by
\[ v = \tanh\psi. \]
It is then straightforward to show that
\[ -(\bar{t}_P - \bar{t}_Q)^2 + (\bar{x}_P - \bar{x}_Q)^2 = -(t_P - t_Q)^2 + (x_P - x_Q)^2. \]
We thus define the square of the spacetime separation between two events P, Q by
\[ (\Delta s)^2 = -(\Delta t)^2 + (\Delta x)^2, \tag{2.12} \]
where
\[ \Delta x = x_P - x_Q, \qquad \Delta t = t_P - t_Q, \]
relative to any inertial frame.


Physical Interpretation of Space-time separation
i) If $(\Delta s)^2 < 0$, we say the events P and Q have timelike separation. This means that
P and Q lie on the worldline of some inertial observer, i.e. relative to that observer's frame P and Q have the same
spatial coordinate, $x_P = x_Q$, and that all observers agree on their order in time. In
this case, we write
\[ \Delta\tau = |\Delta s| = \sqrt{(\Delta t)^2 - (\Delta x)^2}. \]
In the frame in which P, Q are experienced by the same observer (i.e. $\Delta x = 0$),
\[ \Delta\tau = |\Delta t|. \]
Thus, for two events P, Q with timelike separation,
$\Delta\tau = |\Delta s|$ is the time elapsed between P, Q as measured by the
inertial observer who experiences both events.
Note that $\Delta\tau < |\Delta t|$ for all other inertial frames, as follows from (2.12).
ii) If $(\Delta s)^2 > 0$, we say that the events P, Q have spacelike separation. In this case there
exists an inertial frame in which $\Delta t = t_P - t_Q = 0$, i.e. P, Q are simultaneous. We
write
\[ \Delta\sigma = |\Delta s| = \sqrt{(\Delta x)^2 - (\Delta t)^2}. \]
In the frame in which P, Q are simultaneous (i.e. $\Delta t = 0$),
\[ \Delta\sigma = |\Delta x|. \]
Hence, for two events P, Q with spacelike separation,
$\Delta\sigma = |\Delta s|$ is the distance between P, Q as measured in the inertial
frame in which P, Q are simultaneous.
iii) If $\Delta s = 0$, we say that the events P and Q have null separation. In this case P and Q
lie on the worldline of a light signal.
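The invariance of (2.12) and the three-way classification can be illustrated with a short Python sketch. The helper names and the tolerance used to decide when a separation counts as null are assumptions made for this example.

```python
import math


def boost(x, t, v):
    """Special Lorentz transformation (2.9), c = 1; returns (x_bar, t_bar)."""
    gamma = 1.0 / math.sqrt(1.0 - v * v)
    return gamma * (x - v * t), gamma * (t - v * x)


def separation_squared(p, q):
    """(Delta s)^2 of eq. (2.12) for events p, q given as (x, t) pairs."""
    dx, dt = p[0] - q[0], p[1] - q[1]
    return -dt * dt + dx * dx


def classify(ds2, tol=1e-12):
    """Label a squared separation as timelike, null or spacelike."""
    if ds2 < -tol:
        return "timelike"
    if ds2 > tol:
        return "spacelike"
    return "null"


P, Q = (0.0, 0.0), (1.0, 2.0)       # events as (x, t)
v = 0.5
ds2 = separation_squared(P, Q)
ds2_bar = separation_squared(boost(*P, v), boost(*Q, v))
print(ds2, ds2_bar, classify(ds2))  # same value in both frames, up to rounding
```

In the boosted frame with v = 0.5 the two events have the same spatial coordinate, so that frame is the one in which the elapsed time equals $\Delta\tau$.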

2.3.2 The Lorentzian metric tensor

In $\mathbb{R}^2$, the quadratic form
\[ (\Delta\ell)^2 = (\Delta x)^2 + (\Delta y)^2 \tag{2.13} \]
defines the Euclidean metric. Using index notation, we write
\[ (\Delta\ell)^2 = \delta_{ij}\,\Delta x^i \Delta x^j, \tag{2.14} \]
where $(x^1, x^2) = (x, y)$ are Cartesian coordinates, and $\delta_{ij}$ are the components of the Euclidean metric tensor. The vector space $\mathbb{R}^2$ with this Euclidean metric tensor is the familiar
Euclidean space $E^2$. The quadratic form (2.13) is invariant under a rotation, i.e. an orthogonal transformation of the form
\[ \bar{x} = x \cos\theta - y \sin\theta, \qquad \bar{y} = x \sin\theta + y \cos\theta, \tag{2.15} \]
which in index notation reads
\[ \bar{x}^i = O^i{}_j\, x^j, \tag{2.16} \]
where the 2 × 2 matrix $(O^i{}_j)$ is given by
\[ (O^i{}_j) = \begin{pmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{pmatrix}. \tag{2.17} \]
The fact that (2.14) is invariant under the coordinate transformation (2.16) reads
\[ \delta_{ij}\,\Delta\bar{x}^i \Delta\bar{x}^j = \delta_{ij}\,\Delta x^i \Delta x^j. \tag{2.18} \]
Since $(O^i{}_j)$ is a constant matrix it follows from (2.16) that
\[ \Delta\bar{x}^i = O^i{}_j\, \Delta x^j. \tag{2.19} \]
We substitute (2.19) in (2.18), relabel dummy indices and rearrange to obtain
\[ (\delta_{ij} - \delta_{k\ell}\, O^k{}_i\, O^\ell{}_j)\,\Delta x^i \Delta x^j = 0. \]
Since the $\Delta x^i$ are arbitrary it follows that
\[ \delta_{ij} = \delta_{k\ell}\, O^k{}_i\, O^\ell{}_j. \tag{2.20} \]
We say that the Euclidean metric tensor is invariant under an orthogonal transformation.
On the other hand, the quadratic form (2.12), which gives the spacetime separation,
\[ (\Delta s)^2 = -(\Delta t)^2 + (\Delta x)^2, \tag{2.21} \]
is said to define a Lorentzian metric in $\mathbb{R}^2$. Using index notation we write
\[ (\Delta s)^2 = \eta_{ij}\,\Delta x^i \Delta x^j, \tag{2.22} \]
where $(x^0, x^1) = (t, x)$ and $\eta_{ij}$, defined by
\[ (\eta_{ij}) = \begin{pmatrix} -1 & 0 \\ 0 & 1 \end{pmatrix}, \tag{2.23} \]
are the components of the Lorentzian metric tensor. The vector space $\mathbb{R}^2$ with this Lorentzian
metric tensor is referred to as two-dimensional Minkowski space-time.
The quadratic form (2.21) is invariant under a Lorentz transformation (2.11), i.e.
\[ \bar{t} = t \cosh\psi - x \sinh\psi, \qquad \bar{x} = -t \sinh\psi + x \cosh\psi, \]
where $\tanh\psi = v$, or in index notation
\[ \bar{x}^i = L^i{}_j\, x^j, \tag{2.24} \]
where
\[ (L^i{}_j) = \begin{pmatrix} \cosh\psi & -\sinh\psi \\ -\sinh\psi & \cosh\psi \end{pmatrix}. \]
The fact that (2.22) is invariant under the coordinate transformation (2.24) implies [6] that
\[ \eta_{ij} = \eta_{k\ell}\, L^k{}_i\, L^\ell{}_j, \tag{2.25} \]
i.e. the components of the Lorentzian metric tensor are invariant under the Lorentz transformation (2.24).
In two-dimensional Euclidean geometry the set of all points having a fixed distance from
a specified point P is a circle centred on P. In two-dimensional Lorentzian geometry, the
set of all events having a fixed spacetime separation from a specified event P is one of three
types, depending on whether $(\Delta s)^2 < 0$, $(\Delta s)^2 = 0$, or $(\Delta s)^2 > 0$. The three possibilities
are
\[ (x - x_P)^2 - (t - t_P)^2 = \begin{cases} \text{constant} < 0 \\ 0 \\ \text{constant} > 0. \end{cases} \]
We thus obtain two families of hyperbolae, together with their asymptotic lines, as shown
in Figure 2.5.

[6] The derivation parallels the derivation of (2.20).

[Figure 2.5: The sets $(\Delta s)^2 = \pm 1, 0$ in two-dimensional Minkowski spacetime. Labels: the hyperbolae $\Delta s^2 = -1$ and $\Delta s^2 = +1$; the lines $\Delta s^2 = 0$ (light cone at P), made up of the light signal emitted at P and the light signal received at P; the WL of some observer O who experiences P; the set of events which O regards as simultaneous with P.]
For a fixed event P, the set of all events such that $(\Delta s)^2 = 0$ is said to form the light
cone [7] (or null cone) at P. The half of the light cone generated by light signals emitted at
P is called the future light cone at P, and the half that is generated by light signals which
reach P is called the past light cone at P. The future of the event P is the set of all events
on or within the future light cone, and the past of the event P is defined to be all events
lying on or inside the past light cone at P.

Future light cone at P: $(\Delta s)^2 = 0$, $\Delta t \geq 0$
Future of P: $(\Delta s)^2 \leq 0$, $\Delta t \geq 0$
Past light cone at P: $(\Delta s)^2 = 0$, $\Delta t \leq 0$
Past of P: $(\Delta s)^2 \leq 0$, $\Delta t \leq 0$.

Note that P can influence any event in its future set and that P can be influenced by any
event in its past set.

[7] The reason for this name will become clear when we extend the discussion to three spatial dimensions.

2.3.3 Four-dimensional Minkowski space-time

The extension from two-dimensional Minkowski space-time $M^2$ to four-dimensional Minkowski
space-time $M^4$ is straightforward. Since the three spatial coordinates should be placed on
an equal footing, the square of the four-dimensional space-time separation is given by
\[ (\Delta s)^2 = -(\Delta t)^2 + (\Delta x)^2 + (\Delta y)^2 + (\Delta z)^2. \]
Using index notation we write
\[ (\Delta s)^2 = \eta_{ij}\,\Delta x^i \Delta x^j, \tag{2.26} \]
where
\[ (x^0, x^1, x^2, x^3) = (t, x, y, z) \]
and the components of the Lorentzian metric tensor $\eta_{ij}$ are given by
\[ (\eta_{ij}) = \operatorname{diag}(-1, 1, 1, 1). \tag{2.27} \]
In four-dimensional Minkowski space-time, Lorentz transformations are linear coordinate
transformations
\[ \bar{x}^i = L^i{}_j\, x^j \tag{2.28} \]
that leave invariant the quadratic form (2.26), and hence satisfy
\[ \eta_{ij} = L^r{}_i\, L^s{}_j\, \eta_{rs}. \tag{2.29} \]
This condition has precisely the same form as equation (2.25), that defines the two-dimensional
Lorentz transformations, with the difference that the indices now take on the values 0, 1, 2, 3.
The set of linear transformations (2.28) that satisfy (2.29), with $\eta_{ij}$ given by (2.27), is
called the four-dimensional Lorentz group $L_4$. Since $(L^r{}_i)$ is a 4 × 4 matrix (16 entries)
and (2.29) is a set of 10 restrictions (since $\eta_{ij}$ is symmetric), the Lorentz transformations
depend on 6 arbitrary parameters, which can be described geometrically as follows. The
two-dimensional Lorentz transformation (2.11) can be extended to give an element of $L_4$ by
keeping y and z fixed:
\[ \bar{t} = t \cosh\psi - x \sinh\psi, \quad \bar{x} = -t \sinh\psi + x \cosh\psi, \quad \bar{y} = y, \quad \bar{z} = z, \]
where $\tanh\psi = v$. This transformation is called a boost in the x-direction with velocity v.
One can similarly define boosts in the y- and z-directions, giving three arbitrary parameters
(the three velocities). Secondly, one can perform a rotation through an angle $\theta$ in each
spatial coordinate plane, keeping t fixed, e.g.
\[ \bar{t} = t, \quad \bar{x} = x \cos\theta - y \sin\theta, \quad \bar{y} = x \sin\theta + y \cos\theta, \quad \bar{z} = z, \]
giving three more arbitrary parameters, for a total of 6.
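Condition (2.29) can be verified numerically for a boost, a rotation and their product. The sketch below uses plain Python lists as 4 × 4 matrices, with `L[r][i]` standing for $L^r{}_i$, so that (2.29) reads $L^{T}\eta L = \eta$ in matrix form; the helper names and the tolerance are assumptions made for illustration.

```python
import math

ETA = [[-1.0, 0, 0, 0], [0, 1.0, 0, 0], [0, 0, 1.0, 0], [0, 0, 0, 1.0]]  # (2.27)


def boost_x(v):
    """Boost in the x-direction with velocity v (cosh = gamma, sinh = gamma*v)."""
    g = 1.0 / math.sqrt(1.0 - v * v)
    return [[g, -g * v, 0, 0], [-g * v, g, 0, 0], [0, 0, 1.0, 0], [0, 0, 0, 1.0]]


def rotation_xy(theta):
    """Rotation through theta in the x-y plane, keeping t and z fixed."""
    c, s = math.cos(theta), math.sin(theta)
    return [[1.0, 0, 0, 0], [0, c, -s, 0], [0, s, c, 0], [0, 0, 0, 1.0]]


def matmul(A, B):
    """Product of two 4 x 4 matrices stored as nested lists."""
    return [[sum(A[i][k] * B[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]


def transpose(A):
    return [[A[j][i] for j in range(4)] for i in range(4)]


def satisfies_2_29(L, tol=1e-12):
    """True if L^T eta L equals eta entry by entry, i.e. if (2.29) holds."""
    M = matmul(transpose(L), matmul(ETA, L))
    return all(abs(M[i][j] - ETA[i][j]) <= tol
               for i in range(4) for j in range(4))


B, R = boost_x(0.6), rotation_xy(0.3)
print(satisfies_2_29(B), satisfies_2_29(R), satisfies_2_29(matmul(B, R)))
```

The product of a boost and a rotation again satisfies (2.29), which reflects the group property of $L_4$.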


At this stage we can make use of the concepts introduced in Chapter 1 in connection
with Riemannian metrics. Firstly, since the components of the Lorentzian metric tensor are
constant relative to inertial coordinates, the Riemann-Christoffel curvature tensor is zero,
\[ R^i{}_{jk\ell} = 0, \]
i.e. the Lorentzian metric tensor in Minkowski space-time is flat. Secondly, the geodesic
equations (1.91) reduce to
\[ \frac{d^2 x^i}{d\lambda^2} = 0, \tag{2.30} \]
which describes straight lines, i.e. the geodesics in Minkowski space-time are straight lines.

2.3.4 The light cone

As in two-dimensional Minkowski space-time $M^2$, two events in $M^4$ that have zero spacetime
separation, i.e. $(\Delta s)^2 = 0$, lie on the worldline of a light signal.
For a fixed event P in $M^4$, the set of all events such that
\[ (\Delta s)^2 = 0, \]
i.e.
\[ -(t - t_P)^2 + (x - x_P)^2 + (y - y_P)^2 + (z - z_P)^2 = 0, \]
is called the light cone at P. The light cone at P has two branches, the future cone, defined
by $t > t_P$, which represents the worldlines of all possible light signals emitted at P, and
the past cone, defined by $t < t_P$, which represents the worldlines of all possible light signals
received by P.
If one suppresses one space dimension, the light cone at P, given by
\[ -(t - t_P)^2 + (x - x_P)^2 + (y - y_P)^2 = 0, \]
can be drawn as a circular cone in 3-space, treating t, x and y as Cartesian coordinates, as
in Figure 2.6.
The subset of $M^4$ defined by $(\Delta s)^2 \leq 0$, $\Delta t \geq 0$ is called the future of the event P, and
corresponds to the future light cone together with its interior. Similarly, the subset of $M^4$ defined by
$(\Delta s)^2 \leq 0$, $\Delta t \leq 0$ is called the past of the event P, and corresponds to the
past light cone together with its interior.

[Figure 2.6: The light cone in Minkowski spacetime, with one space dimension suppressed. Labels: future light cone at P; past light cone at P; WL of a light signal emitted at P; WL of a light signal received at P.]

2.4 Worldlines and geodesics

Let C be the worldline of some particle, which could be accelerating relative to inertial
observers:
\[ x^i = x^i(\lambda), \qquad i = 0, \ldots, 3, \]
where $\lambda$ is the parameter for C. The tangent line to C at some event P represents the
worldline of an inertial observer relative to whom the particle is instantaneously at rest, as
shown in Figure 2.7. Hence if Q is any event on the tangent line to the future of P, the
events Q and P must have timelike separation, i.e.
\[ (\Delta s)^2 = \eta_{ij}\,\Delta x^i \Delta x^j < 0, \]
where $\Delta x^i = x^i_Q - x^i_P$.
Since the tangent vector at P satisfies
\[ \frac{dx^i}{d\lambda} = \mu\,\Delta x^i, \]
where $\mu$ is a constant, it follows that
\[ \eta_{ij}\,\frac{dx^i}{d\lambda}\frac{dx^j}{d\lambda} < 0. \tag{2.31} \]
Equation (2.31) is a restriction that must be satisfied by the tangent vector of any particle
worldline. This result motivates the following classification of vectors in spacetime, as shown
in Figure 2.8.

[Figure 2.7: The tangent vector to the worldline of a particle. Labels: the worldline C, the event Q and the tangent vector $dx^i/d\lambda$.]

A vector $A^i$ which satisfies:
(i) $\eta_{ij} A^i A^j < 0$ is said to be timelike,
(ii) $\eta_{ij} A^i A^j = 0$ is said to be null,
(iii) $\eta_{ij} A^i A^j > 0$ is said to be spacelike.
Likewise, a curve whose tangent vector is always timelike is said to be a timelike curve.
Using this terminology we can state:
the worldline of any material particle is a timelike curve in spacetime.
A free particle is at rest in some inertial frame. In terms of the coordinates associated
with that frame its worldline is given by
\[ x^0 = t, \qquad x^\alpha = C^\alpha, \qquad \alpha = 1, 2, 3, \tag{2.32} \]
where the $C^\alpha$ are constants, and we are using the t coordinate as the parameter for the
curve. The tangent vector is
\[ \frac{dx^i}{dt} = \delta^i_0, \tag{2.33} \]
which is a timelike vector:
\[ \eta_{ij}\,\frac{dx^i}{dt}\frac{dx^j}{dt} = \eta_{00} = -1. \]

[Figure 2.8: Classification of vectors in spacetime. Labels: the null cone at P, a timelike (TL) vector, a null (N) vector and a spacelike (SL) vector.]

It follows immediately from (2.33) that
\[ \frac{d^2 x^i}{dt^2} = 0, \]
i.e. the curve (2.32) satisfies the geodesic equations (see (2.30)). In other words, the worldline
of a free particle is a timelike geodesic.
We have seen that the worldline of a light signal (a photon) is a straight line such that
the spacetime separation between events on the line is zero, i.e.
\[ (\Delta s)^2 = \eta_{ij}\,\Delta x^i \Delta x^j = 0, \]
where $\Delta x^i = x^i_P - x^i_Q$, P and Q being any two events on the worldline. Thus the line lies
on the light cone at each point. This fact implies that the tangent vector is a null vector.
In other words, the worldline of a light signal is a null geodesic. We can represent a null
geodesic explicitly in the form
\[ x^0 = \lambda, \qquad x^\alpha = C^\alpha \lambda, \]
where the $C^\alpha$ are constants that satisfy
\[ \delta_{\alpha\beta}\, C^\alpha C^\beta = 1, \]
i.e. $C^\alpha$ is a unit vector in $E^3$. It follows that
\[ \frac{dx^0}{d\lambda} = 1, \qquad \frac{dx^\alpha}{d\lambda} = C^\alpha, \]
and hence
\[ \eta_{ij}\,\frac{dx^i}{d\lambda}\frac{dx^j}{d\lambda} = \eta_{00} + \delta_{\alpha\beta}\, C^\alpha C^\beta = 0, \]
since $\eta_{00} = -1$ and $\eta_{\alpha\beta} = \delta_{\alpha\beta}$. The 3-vector $C^\alpha$ gives the direction of propagation of the
photon relative to the inertial frame with coordinates $x^i$ (see Figure 2.9).

[Figure 2.9: The future null cone at an event P, and the worldline of a photon emitted at P in the direction $C^\alpha$. Labels: the tangent vector $dx^i/d\lambda$ and a surface t = const.]
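We have seen that for two events P and Q on the worldline of an inertial observer (i.e.
on a timelike geodesic) the spacetime separation
\[ |\Delta s| = \sqrt{-\eta_{ij}\,\Delta x^i \Delta x^j} \]
is the time elapsed between P and Q as measured by the inertial observer who experiences
both events. More generally, consider a timelike curve C given by
\[ x^i = x^i(\lambda), \qquad \text{with} \qquad \eta_{ij}\,\frac{dx^i}{d\lambda}\frac{dx^j}{d\lambda} < 0. \]
The spacetime separation (i.e. arclength) along C between two events P, Q given by $\lambda = \lambda_P$, $\lambda_Q$, respectively, is given by
\[ \Delta\tau(P, Q) = \int_{\lambda_P}^{\lambda_Q} \sqrt{-\eta_{ij}\,\frac{dx^i}{d\lambda}\frac{dx^j}{d\lambda}}\; d\lambda. \tag{2.34} \]
In differential form we have
\[ \frac{d\tau}{d\lambda} = \sqrt{-\eta_{ij}\,\frac{dx^i}{d\lambda}\frac{dx^j}{d\lambda}}. \tag{2.35} \]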
At this stage we make the following hypothesis concerning clocks undergoing accelerated
(non-geodesic) motion.
The clock hypothesis:
The spacetime separation $\Delta\tau(P, Q)$ along any smooth timelike curve C in Minkowski
spacetime equals the time elapsed between the events P and Q as measured by
a clock carried by an observer having C as his worldline.
Remark 2.1: It is understood that the accelerations involved (i.e. deviations from geodesic
motion) must not be of such a magnitude as to destroy the time-keeping mechanism (see
Misner, Thorne and Wheeler (1972), p. 396). We refer to Rindler (1969), pages 71-2, for
experimental support for the clock hypothesis.
Euclidean distance versus spacetime separation:
In Euclidean space $E^3$, the unique geodesic (i.e. straight line) joining two points A and
B is the curve of minimum length joining A and B. In Minkowski spacetime $M^4$, the unique
timelike geodesic joining two events P, Q with timelike separation is the timelike curve of
maximum time elapsed joining P and Q. This claim, which is a direct consequence of the
difference between a Lorentzian and a Euclidean metric tensor, is heuristically justified in
Figure 2.10.

[Figure 2.10: The extremal property of geodesics in Euclidean space and Minkowski spacetime. Left panel (axes x, y): a non-geodesic curve from A has greater length, since $\Delta\ell = \sqrt{(\Delta x)^2 + (\Delta y)^2} > \Delta y$. Right panel (axes x, t): a non-geodesic TL curve from P to Q has lesser spacetime separation, since $\Delta\tau = \sqrt{(\Delta t)^2 - (\Delta x)^2} < \Delta t$.]
When describing a timelike curve (a potential worldline of a material particle or observer),
it is natural to use the clock time $\tau$, as defined by (2.34), as the curve parameter and write
\[ x^i = x^i(\tau). \tag{2.36} \]
Choosing $\lambda = \tau$ in (2.35) gives
\[ \eta_{ij}\,\frac{dx^i}{d\tau}\frac{dx^j}{d\tau} = -1, \tag{2.37} \]
i.e. the tangent vector $dx^i/d\tau$ is a unit timelike vector, called the 4-velocity of the particle or
observer. We write
\[ u^i = \frac{dx^i}{d\tau}. \tag{2.38} \]
On the other hand, the 3-velocity of the particle with worldline (2.36), i.e. the velocity
relative to the inertial frame defined by the coordinates $(x^i) = (t, x^\alpha)$, is given, as usual, by
\[ v^\alpha = \frac{dx^\alpha}{dt} \tag{2.39} \]
(i.e. rate of change of position $(x^\alpha)$ with respect to time t). It follows that the 4-velocity is
related to the 3-velocity according to
\[ (u^i) = \frac{1}{\sqrt{1 - v^2}}\,(1, v^\alpha), \tag{2.40} \]
where i = 0, 1, 2, 3 and $\alpha$ = 1, 2, 3. The speed v is given by
\[ v^2 = \delta_{\alpha\beta}\, v^\alpha v^\beta. \]
The 4-acceleration $a^i$ of the particle is defined by
\[ a^i = \frac{du^i}{d\tau}. \tag{2.41} \]
It follows from (2.38) that
\[ a^i = \frac{d^2 x^i}{d\tau^2}. \]
Thus, the worldline is a geodesic if and only if the 4-acceleration is zero.

Comment: One can introduce arbitrary (i.e. non-inertial) coordinates $\{x^i\}$ in Minkowski
spacetime. Relative to arbitrary coordinates the components of the Lorentzian metric tensor will be denoted by $g_{ij}$. The 4-velocity $u^i$ will still be given by (2.38), but the fact that it is a timelike
unit vector will read
\[ g_{ij}\, u^i u^j = -1. \tag{2.42} \]
The 4-acceleration $a^i$ will now be expressed using the covariant derivative:
\[ a^i = \frac{Du^i}{D\tau} \]
(see (1.88)). The geodesic equations will still be equivalent to $a^i = 0$.
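Relation (2.40) and the normalization (2.37) can be checked with a few lines of Python in inertial coordinates. The function names are hypothetical, introduced only for this sketch.

```python
import math


def four_velocity(v3):
    """4-velocity (u^0, u^1, u^2, u^3) from a 3-velocity, eq. (2.40), c = 1."""
    speed_sq = sum(c * c for c in v3)          # v^2 = delta_ab v^a v^b
    if speed_sq >= 1.0:
        raise ValueError("a material particle must have speed v < 1")
    gamma = 1.0 / math.sqrt(1.0 - speed_sq)
    return (gamma,) + tuple(gamma * c for c in v3)


def minkowski_square(a):
    """eta_ij a^i a^j with eta = diag(-1, 1, 1, 1)."""
    return -a[0] ** 2 + a[1] ** 2 + a[2] ** 2 + a[3] ** 2


u = four_velocity((0.3, 0.4, 0.0))             # speed v = 0.5
print(u)
print(minkowski_square(u))                     # -1 up to rounding, as in (2.37)
```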


Example 2.1: Consider the curve C in Minkowski spacetime defined by
\[ (t, x, y, z) = (\lambda,\; k \cos b\lambda,\; k \sin b\lambda,\; 0), \tag{2.43} \]
where $\lambda$ is the curve parameter, and k and b are positive constants.
(i) Show that if 0 < kb < 1, the curve can be interpreted as the worldline of a particle
orbiting around the observer x = y = z = 0, with speed v = kb.
(ii) Calculate the period as measured by the observer x = y = z = 0, and as measured by
a clock carried by the orbiting particle.
Solution:
(i) The tangent vector of the curve C is
\[ \left( \frac{dx^i}{d\lambda} \right) = (1,\; -bk \sin b\lambda,\; bk \cos b\lambda,\; 0) \]
and its magnitude is
\[ \eta_{ij}\,\frac{dx^i}{d\lambda}\frac{dx^j}{d\lambda} = -(1 - k^2 b^2). \tag{2.44} \]
Thus $dx^i/d\lambda$ is timelike iff $0 \leq kb < 1$. If in addition kb > 0, the particle will periodically
leave and return to the same spatial position. Thus if 0 < kb < 1, the curve C can be
interpreted as the worldline of a particle orbiting around the observer x = y = z = 0
who is at rest in the inertial frame F defined by the coordinates (t, x, y, z).
The 3-velocity of the particle relative to F is
\[ v^\alpha = \frac{dx^\alpha}{dt} = (-bk \sin b\lambda,\; bk \cos b\lambda,\; 0) \]
and the speed is thus
\[ v = \sqrt{\delta_{\alpha\beta}\, v^\alpha v^\beta} = kb, \tag{2.45} \]
giving a speed less than the speed of light (i.e. v < c = 1).
(ii) For one revolution to occur,
\[ \Delta\lambda = \frac{2\pi}{b}. \tag{2.46} \]
The period as measured in the frame F is the change $\Delta t$ in t when $\lambda$ changes by $2\pi/b$.
It follows from equation (2.43) that
\[ \Delta t = \Delta\lambda = \frac{2\pi}{b} \ \text{sec}. \]
Let $\tau$ denote the time elapsed along the worldline C. By equations (2.35) and (2.44),
\[ \frac{d\tau}{d\lambda} = \sqrt{1 - k^2 b^2}, \]
a constant. It follows that
\[ \Delta\tau = \sqrt{1 - k^2 b^2}\;\Delta\lambda. \]
Thus by equations (2.45) and (2.46) the period as measured by a clock carried along C is
\[ \Delta\tau = \sqrt{1 - v^2}\;\frac{2\pi}{b} \ \text{sec}. \]

Remark 2.2: Note that $\Delta t$ is the time elapsed along the geodesic joining the events P and
Q, and that $\Delta t > \Delta\tau$, the time elapsed along the non-geodesic curve C. This is in agreement
with the fact that the spacetime separation between two events P, Q with timelike separation
is maximized along the timelike geodesic joining P, Q (see Figures 2.10 and 2.11).
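The result of Example 2.1 can be reproduced by integrating (2.34) numerically along the curve (2.43). The sketch below uses a simple midpoint rule; the function names, the step count and the sample values k = 0.5, b = 1 are assumptions chosen for illustration.

```python
import math


def tangent(lam, k, b):
    """Tangent vector dx^i/dlambda of the curve (2.43)."""
    return (1.0, -b * k * math.sin(b * lam), b * k * math.cos(b * lam), 0.0)


def proper_time_one_orbit(k, b, steps=1000):
    """Midpoint-rule approximation of (2.34) over one revolution, 0 <= lambda <= 2*pi/b."""
    h = (2.0 * math.pi / b) / steps
    total = 0.0
    for n in range(steps):
        dt, dx, dy, dz = tangent((n + 0.5) * h, k, b)
        # integrand sqrt(-eta_ij dx^i dx^j) with eta = diag(-1, 1, 1, 1)
        total += math.sqrt(dt * dt - dx * dx - dy * dy - dz * dz) * h
    return total


k, b = 0.5, 1.0                                  # orbital speed v = kb = 0.5
coordinate_period = 2.0 * math.pi / b            # Delta t for the central observer
orbit_period = proper_time_one_orbit(k, b)       # Delta tau on the orbiting clock
print(coordinate_period, orbit_period)
print(math.sqrt(1.0 - (k * b) ** 2) * coordinate_period)   # closed-form Delta tau
```

Because the integrand is constant along this particular curve, the numerical value agrees with the closed form to rounding error, and it is smaller than the coordinate period, as Remark 2.2 states.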
[Figure 2.11: The timelike curve C joining the events P and Q and the corresponding timelike
geodesic, the worldline of an observer at rest in the inertial frame F. Axes: t, x, y.]

2.5 A Brief Introduction to the Energy-Momentum Tensor

In relativity theory, the physics of a distribution of matter (i.e. its energy content, motion,
etc.) is described by a symmetric rank 2 tensor $T^{ij}$, called the energy-momentum tensor of
the matter distribution.
In flat spacetime, relative to standard coordinates $(x^i) = (t, x, y, z)$ associated with an
inertial frame F, the physical interpretation of the components $T^{ij}$ is as follows:
\[ (T^{ij}) = \begin{pmatrix} T^{00} & T^{0\beta} \\ T^{\alpha 0} & T^{\alpha\beta} \end{pmatrix}, \]
$T^{00}$: energy density of the matter relative to F
$T^{0\alpha}$: momentum density of the matter relative to F
$T^{\alpha\beta}$: spatial stress tensor relative to F,
e.g. $T^{12} = T^{21}$ is the x-component of the force exerted by the matter across a unit 2-surface
with normal in the y-direction.
A distribution of matter in spacetime is described by a timelike vector field $u^i$ of unit
length, i.e.
\[ g_{ij}\, u^i u^j = -1, \]
called the 4-velocity of the matter, which is tangent to the worldlines of the matter particles.
The energy content of the matter is described by a scalar $\rho$, the matter-energy density
relative to the instantaneous rest frame. We consider the simplest case, that of a perfect fluid, in which the internal
forces are described by an isotropic scalar pressure p.
The role of p is as follows. At any event, relative to the instantaneous rest frame of the
perfect fluid, the force exerted on a unit 2-surface will be perpendicular to the surface, and
have magnitude p, independently of the orientation of the 2-surface (hence the term isotropic
pressure). Thus, relative to the instantaneous rest frame, $T^{\alpha\beta}$ will be diagonal, with diagonal
entries equal to p, i.e.
\[ T^{\alpha\beta} = p\,\delta^{\alpha\beta}, \qquad T^{0\alpha} = 0 \ \ (\text{matter at rest} \Rightarrow \text{zero momentum}), \qquad T^{00} = \rho, \]
and hence
\[ (T^{ij}) = \operatorname{diag}(\rho, p, p, p). \]
We want a tensorial expression for $T^{ij}$, and after a little contemplation, we find
\[ T^{ij} = (\rho + p)\,u^i u^j + p\,g^{ij}, \tag{2.47} \]
where, relative to the instantaneous rest frame (an inertial frame), $(g_{ij}) = \operatorname{diag}(-1, 1, 1, 1)$ and $(u^i) = (1, 0, 0, 0)$.
This is a tensor equation and hence will hold relative to any coordinate system.
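To see that (2.47) reproduces diag(ρ, p, p, p) in the rest frame, one can evaluate it directly. The Python sketch below works in inertial coordinates, where $g^{ij}$ = diag(−1, 1, 1, 1); the function name and the sample values of ρ, p and v are assumptions made for illustration. It also evaluates (2.47) for a fluid moving with speed v in the x-direction, where a non-zero momentum density $T^{01}$ appears.

```python
import math

G_INV = [[-1.0, 0, 0, 0], [0, 1.0, 0, 0], [0, 0, 1.0, 0], [0, 0, 0, 1.0]]  # g^{ij}


def perfect_fluid_tensor(rho, p, u):
    """T^{ij} = (rho + p) u^i u^j + p g^{ij}, eq. (2.47), in inertial coordinates."""
    return [[(rho + p) * u[i] * u[j] + p * G_INV[i][j] for j in range(4)]
            for i in range(4)]


rho, p = 1.0, 0.25

# Rest frame: u = (1, 0, 0, 0) gives diag(rho, p, p, p).
T_rest = perfect_fluid_tensor(rho, p, (1.0, 0.0, 0.0, 0.0))
print(T_rest)

# Fluid moving with speed v along x: u = gamma * (1, v, 0, 0).
v = 0.6
gamma = 1.0 / math.sqrt(1.0 - v * v)
T_moving = perfect_fluid_tensor(rho, p, (gamma, gamma * v, 0.0, 0.0))
print(T_moving[0][0], T_moving[0][1])   # energy density and x-momentum density
```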
Equations of motion for the matter distribution, or equivalently, the equations describing
conservation of energy and momentum, are obtained by assuming that the energy-momentum
tensor has zero divergence, i.e.
\[ T^{ij}{}_{,j} = 0 \]
in flat spacetime relative to inertial coordinates, or more generally
\[ T^{ij}{}_{;j} = 0 \tag{2.48} \]
relative to arbitrary coordinates, or in curved spacetime.


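Let's first review these equations in Newtonian continuum mechanics. The matter distribution is described by a velocity field $v^\alpha(x^\beta, t)$, which gives the velocity of the fluid at the
spatial position $(x^\beta)$ at time t [Fung (1977), p. 247]. The matter density is denoted by $\rho$.
The equation of continuity, which describes conservation of mass, is
\[ \frac{d\rho}{dt} + \rho\,\nabla \cdot \mathbf{v} = 0, \]
or equivalently
\[ \frac{d\rho}{dt} + \rho\,\frac{\partial v^\alpha}{\partial x^\alpha} = 0. \tag{2.49} \]
Here $d/dt$ denotes total differentiation with respect to t, i.e.
\[ \rho = \rho(x^\alpha, t) = \rho(x^\alpha(t), t), \]
where $x^\alpha = x^\alpha(t)$ describes a particular flow line, and
\[ \frac{d\rho}{dt} = \frac{\partial\rho}{\partial t} + \frac{\partial\rho}{\partial x^\alpha}\frac{dx^\alpha}{dt} = \frac{\partial\rho}{\partial t} + v^\alpha \frac{\partial\rho}{\partial x^\alpha} \tag{2.50} \]
[Fung (1977), p. 250].
Newton's second law of motion when applied to the matter distribution leads to
\[ \rho\,\frac{d\mathbf{v}}{dt} = -\nabla p, \]
or equivalently
\[ \rho\,\frac{dv^\alpha}{dt} = -\frac{\partial p}{\partial x^\alpha}, \tag{2.51} \]
where the acceleration $dv^\alpha/dt$ is given by
\[ \frac{dv^\alpha}{dt} = \frac{\partial v^\alpha}{\partial t} + v^\beta \frac{\partial v^\alpha}{\partial x^\beta}, \]
as in (2.50) [Fung (1977), p. 247, p. 250-2, p. 265-7, with the viscosity terms and external
force set to zero]. This is a special case of the Navier-Stokes equation (Fung, p. 265) and
describes conservation of momentum.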
We now derive the relativistic analogues of (2.49) and (2.51) by expanding (2.48) using
the perfect fluid energy-momentum tensor (2.47):
\[ \begin{aligned} 0 = T_i{}^j{}_{;j} &= [(\rho + p)\,u_i u^j + p\,\delta_i{}^j]_{;j} \\ &= (\rho_{,j} + p_{,j})\,u^j u_i + (\rho + p)\,u_{i;j} u^j + (\rho + p)\,u^j{}_{;j}\, u_i + p_{,i}. \end{aligned} \tag{2.52} \]
Contract with $u^i$, noting that $u^i u_i = -1$ and hence $u^i u_{i;j} = 0$, to obtain
\[ 0 = \rho_{,j} u^j + (\rho + p)\,u^j{}_{;j}, \]
which can be written
\[ \dot{\rho} + (\rho + p)\,u^j{}_{;j} = 0, \tag{2.53} \]
where the dot denotes differentiation along the fluid flow lines, i.e. $\dot{\rho} = \rho_{,j} u^j$. Next, eliminate $\rho_{,j} u^j$ in (2.52) using
(2.53). This yields
\[ (\rho + p)\,u_{i;j} u^j + (p_{,i} + p_{,j} u^j u_i) = 0. \tag{2.54} \]
We interpret $(p_{,i} + \dot{p}\,u_i)$ as the spatial pressure gradient, since it is orthogonal to the 4-velocity
$u^i$ (verify this). Recall that the flow lines will be geodesic iff the 4-acceleration
\[ a_i = u_{i;j} u^j = 0. \]
Thus the physical content of (2.54) is that the spatial pressure gradient exerts a (mechanical) force on
the matter, so that the individual particles no longer have geodesic worldlines. The analogy
with (2.51) is clear. On the other hand, (2.53) is the analogue of (2.49).


Chapter 3
The General Theory of Relativity
In this chapter we present the foundations of the general theory of relativity, the theory of
gravity introduced by Einstein in 1916. In order to gain perspective and make comparisons
we begin with a brief review of the Newtonian theory of gravitation.

3.1 Newtonian Theory of Gravitation

In the Newtonian theory of gravity, the gravitational field due to a distribution of mass is
described by a potential function $\Phi$. The gravitational field exerts a force on a test particle
of mass m given by
\[ \mathbf{F} = -m\nabla\Phi, \tag{3.1} \]
where $\nabla$ is the gradient operator. The potential is required to satisfy the gravitational field
equations:
\[ \nabla^2\Phi = 0 \quad \text{outside matter}, \tag{3.2} \]
\[ \nabla^2\Phi = 4\pi G\rho \quad \text{inside matter}, \tag{3.3} \]
where $\nabla^2$ is the Laplacian, $\rho$ is the matter density and G is a constant called the gravitational
constant.
For a test particle of mass m moving under the influence of a force field $\mathbf{F}$, Newton's
second law of motion reads
\[ m\,\frac{d^2\mathbf{r}}{dt^2} = \mathbf{F}. \tag{3.4} \]
For a gravitational force field (3.1), equation (3.4) simplifies to
\[ \frac{d^2\mathbf{r}}{dt^2} = -\nabla\Phi, \tag{3.5} \]
i.e. the mass of the test particle cancels. Equation (3.5) represents the equations of motion
for a test particle in a given gravitational field, in Newton's theory of gravitation.

Remarks 3.1:
1) The essential point about the equations of motion (3.5) is that they imply that the
motion of a test particle in a given gravitational field is independent of the mass and
composition of the particle. This result is supported experimentally to an accuracy of
1 in $10^{11}$ [Dicke (1964), page 4] and hence also has to be incorporated into GRT.
2) The gravitational field equations (3.2) and (3.3), which are linear PDEs, can be derived
from the inverse square law of gravitational attraction and the principle of superposition. One can quickly see the relation with the inverse square law by noting that for a
spherically symmetric source, $\Phi = \Phi(r)$, and (3.2) can be solved to give
\[ \Phi = -\frac{GM}{r}, \]
where M is the mass of the source. It follows from (3.1) that
\[ \mathbf{F} = -m\nabla\Phi = -\frac{GmM}{r^3}\,\mathbf{r}, \]
since $\nabla r = \frac{1}{r}\mathbf{r}$. Hence
\[ |\mathbf{F}| = \frac{GmM}{r^2}, \]
the inverse square law of attraction.
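The step from the potential $\Phi = -GM/r$ to the inverse square law can be checked numerically by differentiating the potential with central differences, as in the Python sketch below. Units with GM = 1 and m = 1, the step size h and the function names are assumptions made for this illustration.

```python
import math


def potential(pos, GM=1.0):
    """Newtonian potential Phi = -GM/r of a spherically symmetric source at the origin."""
    r = math.sqrt(sum(c * c for c in pos))
    return -GM / r


def force(pos, m=1.0, GM=1.0, h=1e-6):
    """F = -m grad(Phi), eq. (3.1), with the gradient taken by central differences."""
    components = []
    for axis in range(3):
        plus, minus = list(pos), list(pos)
        plus[axis] += h
        minus[axis] -= h
        dphi = (potential(plus, GM) - potential(minus, GM)) / (2.0 * h)
        components.append(-m * dphi)
    return components


F = force([2.0, 0.0, 0.0])
magnitude = math.sqrt(sum(c * c for c in F))
print(F)           # points towards the source, i.e. along -x
print(magnitude)   # close to GmM/r^2 = 1/4 for r = 2
```

Doubling the distance reduces the magnitude by a factor of about four, as the inverse square law requires.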


In assessing a physical theory, one asks the following questions:
1) Does it agree with experimental/observational evidence.
2) Does it (or did) it predict new observable phenomena?
The assessment of Newtonian gravity is as follows:
1) It agreed with observations of planetary orbits initially. More accurate observations in
the nineteenth century led to a famous discrepancy concerning the orbit of the planet
Mercury (see Sec. 4.2.3). There are also discrepancies as regards light propagation. In
addition the theory cannot describe the behaviour of binary pulsars and neutron stars,
which generate much stronger gravitational fields than in the solar system.
2) In a classic example of prediction, Adams (1845) and Leverrier (1846) independently
predicted the existence of the planet Neptune: the idea was that the orbit of Uranus
did not agree with theory unless there was another planet relatively nearby, which was
perturbing Uranus. The theory also predicts the orbits of satellites and spacecraft with
very high accuracy.
Conclusion: Newtonian theory is still indispensable in many applications in the solar system,
but has been superseded by GRT as the correct theory of gravity.

3.2 General Relativity and the Geodesic Hypothesis

We begin by considering the following question:


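[Figure 3.1: Four physicists in laboratories. I: at rest on the earth (non-geodesic WL). II: acceleration g (non-geodesic WL). III: free fall near the earth (geodesic WL). IV: free fall (geodesic WL). Gravity is present in I and III (tidal accelerations, $R^i{}_{jk\ell} \neq 0$); there is no gravity in II and IV (no tidal accelerations, $R^i{}_{jk\ell} = 0$).]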

Q: Can the four physicists shown in Figure 3.1 perform experiments to distinguish their physical situations?
A: In I and II the physicists' feet press on the floor, and ball-bearings fall with
acceleration g relative to the laboratories, while in III and IV they do not. The
distinction between I and II and between III and IV is more subtle. They can
be distinguished by the fact that in I and III, two or more ball-bearings, when
released from rest, will undergo small relative accelerations due to (small) variations
in the gravitational force acting inside the laboratories. These relative
accelerations are called gravitational tidal accelerations.

In Newtonian theory, the gravitational force is described by the first partial derivatives
$\partial\Phi/\partial x^\alpha$ of the gravitational potential (components of $\nabla\Phi$), while the tidal accelerations,
which arise from changes in the gravitational force, are described by the second partial
derivatives $\partial^2\Phi/\partial x^\alpha\partial x^\beta$.
Remark 3.2:
(1) Experiment II shows that a gravitational force can be simulated locally by an accelerated frame of reference, but not the gravitational tidal accelerations.
(2) Experiment III shows that one can transform away the gravitational force, but not
the tidal accelerations, by observing from a freely falling frame of reference. The tidal
accelerations thus indicate the presence of an intrinsic gravitational field, i.e. one that
cannot be ascribed to a wrong choice of reference frame.
The geometrical background for special relativity is 4-d space-time, on which is defined
a Lorentzian metric with components $(\eta_{ij}) = \mathrm{diag}(-1, 1, 1, 1)$ relative to the preferred
coordinates $(t, x, y, z)$. The worldline of a free particle is a TL geodesic of this flat Lorentzian
metric.^1
Q: Can we expect special relativity to apply exactly, when gravity is taken into
account?
A: No, for the following reason.
Consider a gravitational field (e.g. the earth's, or the sun's). By Remark 3.2 above,
we can consider any freely-falling laboratory of sufficiently small size, to be a local inertial
frame (LIF). Two such LIFs that pass each other will have zero relative acceleration, and
hence constant relative velocity for some sufficiently short length of time. Hence we expect
special relativity to apply in sufficiently small regions of space and time. However, spatially
separated LIFs will have non-zero relative acceleration and hence non-constant relative
velocity. Hence we do not expect special relativity to hold in the large. In other words we
do not expect the region of space-time corresponding to the solar system to be a subset of flat
spacetime.
So, following Einstein, we assume that in the presence of gravitating bodies the Lorentzian
metric on space-time is non-flat (i.e. $R^i{}_{jk\ell} \neq 0$).
Q: What are the worldlines of free test particles in the gravitational field described by the non-flat metric?
A: Since gravity affects all particles, we now modify our notion of free particle to be a
particle whose motion is influenced only by gravity. Relative to
each LIF it passes, a free test particle will appear to follow a straight line (for
some sufficiently short length of time), i.e. a TL geodesic, since special relativity
holds locally for each LIF. This suggests the crucial geodesic hypothesis:
the worldline of a free test particle is a TL geodesic of the non-flat
Lorentzian metric of space-time,
and by analogy
the worldline of a light signal (photon) is a null geodesic of the non-flat
Lorentzian metric of spacetime.

[Figure 3.2: Locally inertial frames in the earth's gravitational field. SRT holds locally: the path of a projectile appears straight to each LIF as it passes by.]

^1 An electron moving in a magnetic field is not a free particle.
Remark 3.3: The consistency of the geodesic hypotheses is guaranteed by the uniqueness
property of geodesics (see Proposition 1.6 in Section 1.4.1), as follows. We mentioned in
Section 3.1 that the motion of a free test particle in a gravitational field is independent
of its mass and composition. Thus an initial point and an initial velocity will determine a
unique free particle trajectory. In other words, an initial event P in spacetime, and an initial
direction at P must determine a unique free particle worldline. This is guaranteed by the
uniqueness result stated in Proposition 1.6, and is illustrated in Figure 3.3.
Remark 3.4: The effect of Einstein's preceding assumptions is to geometrize the gravitational
field. The gravitational field is not regarded as exerting a force on test particles, as in
Newtonian theory. Instead the gravitational field gives rise to a non-flat Lorentzian metric,
and the TL geodesics of this metric are the worldlines of free test particles:

matter → gravitational field → non-flat Lorentzian metric → TL geodesics → worldlines of free test particles

This difference is illustrated in Figure 3.4.

[Figure 3.3: A physicist firing projectiles from a helicopter in the earth's gravitational field. Projectile (1) is fired downward, (2) is released from rest, and (3) is fired upward; each worldline is a TL geodesic (the WL is independent of the mass and composition of the projectile). The WL of a particle on the earth's surface and the WL of the physicist are non-geodesic. The world tube of the earth is also shown.]

3.3 The Einstein Vacuum Field Equations

The next step in developing the theory of general relativity is to introduce the field equations.
We use certain analogies with Newtonian theory as a guide.
We have seen that the Newtonian field equation for gravity is
$$\nabla^2\Phi = 0, \tag{3.6}$$
exterior to the material source. Relative to Cartesian coordinates $x, y, z$ this equation reads
$$\frac{\partial^2\Phi}{\partial x^2} + \frac{\partial^2\Phi}{\partial y^2} + \frac{\partial^2\Phi}{\partial z^2} = 0.$$
Relative to arbitrary curvilinear coordinates $x^\alpha$ in $E^3$, equation (3.6) can be written
$$g^{\alpha\beta}\,\Phi_{;\alpha\beta} = 0, \qquad \alpha, \beta = 1, 2, 3, \tag{3.7}$$
where ; denotes covariant differentiation with respect to the Euclidean metric in $E^3$ and
the $g^{\alpha\beta}$ are the contravariant components of the metric.
[Figure 3.4: Newtonian gravity versus GRT: illustration of the orbit of a satellite around the earth. Newtonian picture: the satellite orbit, together with the natural path for the satellite when at P; gravity forces it to follow an ellipse. Spacetime (GRT) picture (suppress one space dimension, add in time): the WL of the satellite is a TL geodesic, shown with the world tube of the earth.]
The Newtonian field equation is a second order partial differential equation for the Newtonian
potential $\Phi$, is linear in the second derivatives of $\Phi$ and, in the form (3.7), is tensorial
under arbitrary coordinate transformations in $E^3$.
We mentioned in Section 3.2 that in Newtonian theory, the tidal accelerations
are described by the second derivatives $\partial^2\Phi/\partial x^\alpha\partial x^\beta$. On the other hand, in GRT, the tidal
accelerations are described by the Riemann-Christoffel tensor $R^i{}_{jk\ell}$ through the equation of
geodesic deviation (see equation (1.100)). Further, by comparing the Newtonian equations of
motion (3.5) and the equations of motion in general relativity, namely the geodesic equations
$$\frac{d^2x^i}{d\tau^2} + \Gamma^i{}_{jk}\frac{dx^j}{d\tau}\frac{dx^k}{d\tau} = 0,$$
we see that the Christoffel symbols $\Gamma^i{}_{jk}$ formally play the role of the gradient of $\Phi$. Thus we
have the following correspondences:

$\partial^2\Phi/\partial x^\alpha\partial x^\beta \;\leftrightarrow\; R^i{}_{jk\ell}$ (second derivatives of the metric)
$\partial\Phi/\partial x^\alpha \;\leftrightarrow\; \Gamma^i{}_{jk}$ (first derivatives of the metric)
$\Phi \;\leftrightarrow\; g_{ij}$

In other words, the components $g_{ij}$ of the Lorentzian metric play the role of gravitational
potentials.
The preceding discussion leads to the following requirements on the field equations in
GRT:
1) The field equations should form a system of 10 second order partial differential equations
for the $g_{ij}$, as functions of the spacetime coordinates $x^i$ (note that there are 10
independent metric components $g_{ij}$).
2) These partial differential equations should be linear in the second derivatives $g_{ij,k\ell}$.
3) These field equations should have tensorial form under arbitrary transformations of
the spacetime coordinates.
Thus we look for a tensor $A^{ij}$ of rank 2 that is symmetric (such a tensor has 10 independent
components) and of the form
$$A^{ij}(g_{ab},\, g_{ab,c},\, g_{ab,cd}),$$
and which depends on the second derivatives linearly. Such tensors are quite rare.
Theorem 3.1: In a $V_n$, with metric tensor g, the only tensor fields of type (2, 0) of the
form
$$A^{ij}(g_{ab},\, g_{ab,c},\, g_{ab,cd}),$$
where the $g_{ab,cd}$ occur linearly, are linear combinations, with constant coefficients, of
$$R^{ij}, \qquad R\,g^{ij}, \qquad \text{and} \qquad g^{ij}.$$
Proof: See H. Weyl (1922), Appendix 2.

Thus the most general field equations compatible with our requirements are
$$\alpha R^{ij} + \beta R\,g^{ij} + \gamma g^{ij} = 0, \tag{3.8}$$
where $\alpha, \beta, \gamma$ are constants.

Q: What can we say about $\alpha, \beta, \gamma$?
A: Contract (3.8) with $g_{ij}$, using $g_{ij}R^{ij} = R$ and $g_{ij}g^{ij} = 4$, to obtain
$$(\alpha + 4\beta)R + 4\gamma = 0. \tag{3.9}$$
Suppose $\gamma \neq 0$. Then (3.9) implies $R = \text{constant} \neq 0$. On physical grounds
we expect spacetime to become flat (i.e. $R^i{}_{jk\ell} = 0$) at large distances from the
source. This is incompatible with $R = \text{constant} \neq 0$. Hence $\gamma = 0$.
Eq. (3.9) now reads $(\alpha + 4\beta)R = 0$. Suppose $\alpha = -4\beta \neq 0$. Then (3.8) becomes
$$R^{ij} - \tfrac{1}{4}g^{ij}R = 0.$$
This is a set of nine independent equations, since the left hand side has identically zero trace,
which violates one of our requirements. Hence $R = 0$, and the field equations (3.8) reduce
to
$$R^{ij} = 0, \tag{3.10}$$
which are referred to as the Einstein vacuum field equations.
Remark 3.5:
1) For further motivation of these field equations, see Rindler (1969), pp. 162-164.
2) The Newtonian field equation (3.6) [or (3.7)] is a linear PDE for $\Phi$. The Einstein
vacuum field equations do not form a system of linear PDEs for the $g_{ij}$, although they
are linear in the second derivatives $g_{ij,k\ell}$. This leads to important differences between
the two theories.
Summary of Assumptions for GRT
I: Lorentzian geometry
Space-time is a four-dimensional differentiable manifold, with a Lorentzian metric $g_{ij}$
described by the line-element
$$ds^2 = g_{ij}\,dx^i dx^j.$$
II. Clock Hypothesis
The worldline of a material particle is a TL curve, and time elapsed along a TL curve
C, as measured by an ideal clock having C as its worldline, equals the spacetime
separation (i.e. arclength) along C.

III. Geodesic Hypothesis (Equations of Motion)


The worldline of a freely-falling test particle is a TL geodesic and the worldline of a
photon (i.e. light signal, in vacuo) is a null geodesic, determined by the Lorentzian
metric g.
IV. Vacuum Field Equations
In any region of spacetime not containing sources of the gravitational field, the Ricci
tensor of the Lorentzian metric is zero:
$$R_{ij} = 0.$$
Remark 3.6:
1) The definitions of TL, SL and null vectors and curves in flat spacetime (see Section
2.4) carry over into curved spacetime, with $g_{ij}$ replacing $\eta_{ij}$. The null cone at any
event P in spacetime is now regarded as a cone in the tangent space at the point P.
2) We have seen that at any event P in spacetime we can introduce coordinates so that
$$g_{ij} = \eta_{ij} \qquad \text{at } P.$$
This is the mathematical realization of the fact that special relativity holds in a local
neighbourhood of any event in curved spacetime. One can do more. Let C be a TL
geodesic (worldline of a freely falling observer). Then one can introduce coordinates
so that
$$g_{ij} = \eta_{ij}, \qquad \text{and} \qquad \partial g_{ij}/\partial x^k = 0 \qquad \text{along } C.$$
However, $R^i{}_{jk\ell} \neq 0$ along C. These coordinates are called Fermi normal coordinates
based on C. Since the Christoffel symbols $\Gamma^i{}_{jk}$ correspond to the Newtonian gravitational
force, the introduction of these coordinates is the mathematical realization
of the fact that one can transform away the gravitational force by observing from
a freely falling frame of reference [see also Misner, Thorne and Wheeler (1972), pages
327-332].

3.4 The Einstein Field Equations with Sources

The basic physical idea is that all forms of matter and energy give rise to a gravitational
field, which in GRT is described by the curvature of spacetime. So the basic form of the
non-vacuum field equations must be

(Quantity describing the geometry of spacetime) = const. × (Quantity describing the distribution of matter & energy)

The requirements listed prior to Theorem 3.1 imply that this equation should be a tensor
equation involving symmetric rank 2 tensors. We assume that the tensor on the right hand
side is the energy-momentum tensor of the distribution of matter and energy. So the field
equations are of the form
$$\alpha R^{ij} + \beta R\,g^{ij} + \gamma g^{ij} = C\,T^{ij}, \qquad C = \text{const.}, \tag{3.11}$$
where $\alpha, \beta, \gamma$ are constants with $\alpha \neq 0$.

To determine $\alpha, \beta$ we use the fact that $T^{ij}$ is assumed to have zero divergence:
$$T^{ij}{}_{;j} = 0.$$
On taking the covariant divergence of (3.11), this condition leads to
$$\alpha R^{ij}{}_{;j} + \beta R_{,j}\,g^{ij} = 0. \tag{3.12}$$

But we know that the Einstein tensor
$$G^{ij} = R^{ij} - \tfrac{1}{2}R\,g^{ij}$$
has zero divergence, i.e.
$$R^{ij}{}_{;j} - \tfrac{1}{2}R_{,j}\,g^{ij} = 0$$
(see Proposition 1.5 in Section 1.3.3). Combine this equation with (3.12) to obtain
$$\left(\beta + \tfrac{1}{2}\alpha\right)R_{,j} = 0.$$
But we cannot have $R_{,j} = 0$ in general, since this would be an extra field equation. Thus
$\beta = -\tfrac{1}{2}\alpha$, and (3.11) becomes
$$G^{ij} + \Lambda g^{ij} = K\,T^{ij}, \qquad \Lambda, K = \text{const.}, \tag{3.13}$$
on dividing through by $\alpha$ and relabelling constants. If we also demand that when $T^{ij} = 0$
(vacuum spacetime), these field equations reduce to the vacuum field equations $G^{ij} = 0$ or
equivalently $R^{ij} = 0$, then we must have
$$\Lambda = 0.$$
The constant K in (3.13) represents the coupling constant for gravity, i.e. it governs the
strength of the gravitational field due to a given distribution of matter and energy. This
constant is determined by requiring that the field equations (3.13), with $\Lambda = 0$, reduce to
the Newtonian equation
$$\nabla^2\Phi = 4\pi G\rho$$
in the weak field limit. This leads to
$$K = 8\pi G.$$
(See, for example, Adler, Bazin & Schiffer (1965), p. 347.)

Thus the non-vacuum Einstein field equations are
$$G^{ij} = 8\pi G\,T^{ij}, \tag{3.14}$$
where $G^{ij}$ is the Einstein tensor of spacetime, $T^{ij}$ is the energy-momentum tensor which
describes the matter-energy content of spacetime, and G is the Newtonian gravitational
constant. It is customary to choose the units of mass so that
$$G = 1. \tag{3.15}$$
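To get a feeling for the choice of units (3.15), the following minimal sketch converts ordinary masses into the corresponding lengths $GM/c^2$, which is what a mass becomes in units with $G = c = 1$. The SI values of $G$, $c$ and the two masses are assumed round figures and are not taken from these notes.

```python
G = 6.674e-11   # gravitational constant in m^3 kg^-1 s^-2 (assumed value)
C = 2.998e8     # speed of light in m/s (assumed value)


def mass_in_metres(mass_kg):
    """Mass expressed as a length, G M / c^2, i.e. in units with G = c = 1."""
    return G * mass_kg / C ** 2


M_SUN = 1.989e30    # kg (assumed value)
M_EARTH = 5.972e24  # kg (assumed value)

m_sun = mass_in_metres(M_SUN)      # roughly 1.5 km
m_earth = mass_in_metres(M_EARTH)  # roughly 4.4 mm
print(m_sun, m_earth)
```

In these units the mass of the sun is about one and a half kilometres, which is why relativistic corrections at planetary distances are so small.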

These non-vacuum field equations are applied in two areas:


1) construction of cosmological models (see Chapter 5),
2) construction of models of neutron stars, accretion discs around black holes, etc.

3.5 The Weak Field Limit of General Relativity

Newtonian gravity agrees well with experiment for weak time-independent gravitational
fields, and with slow-moving test particles. Hence for GRT to be a reasonable theory it
should reduce to Newtonian theory approximately, in the above mentioned situations.
Consider a weak field (i.e. approximately flat) time-independent metric of the form^2
$$g_{00} = -1 + \epsilon h_{00}(x^\gamma) + O(\epsilon^2), \qquad g_{0\alpha} = 0, \qquad g_{\alpha\beta} = \delta_{\alpha\beta} + \epsilon h_{\alpha\beta}(x^\gamma) + O(\epsilon^2), \tag{3.16}$$
where $\alpha, \beta, \gamma = 1, 2, 3$, and $\epsilon$ is a small parameter. We want to determine the relation
between the Newtonian potential $\Phi$ and the metric components $g_{ij}$. We first require that
the GRT equations of motion, i.e. the geodesic equations,
$$\frac{d^2x^i}{d\tau^2} + \Gamma^i{}_{jk}\frac{dx^j}{d\tau}\frac{dx^k}{d\tau} = 0, \tag{3.17}$$
reduce to the Newtonian equations of motion
$$\frac{d^2x^\alpha}{dt^2} = -\frac{\partial\Phi}{\partial x^\alpha}, \tag{3.18}$$
for the metric (3.16) and for slow-moving particles, i.e.
$$\frac{dx^\alpha}{d\tau} = O(\epsilon).$$

^2 This form is restricted to describing a non-rotating body.

It follows from (3.16) and the definition (1.50) of the Christoffel symbols, that $\Gamma^i{}_{jk} = O(\epsilon)$.
Hence to first order in $\epsilon$, (3.17) becomes
$$\frac{d^2x^i}{d\tau^2} + \Gamma^i{}_{00}\frac{dx^0}{d\tau}\frac{dx^0}{d\tau} = 0. \tag{3.19}$$
For $i = 0$ this yields $d^2x^0/d\tau^2 = 0$. Hence for $i = \alpha$, on using the chain rule, (3.19) can be written
in the form
$$\frac{d^2x^\alpha}{dt^2} = \tfrac{1}{2}\epsilon\,\frac{\partial h_{00}}{\partial x^\alpha}.$$
Comparison with (3.18) leads to the identification
$$\Phi = -\tfrac{1}{2}\epsilon h_{00},$$
so that
$$g_{00} = -1 - 2\Phi + O(\epsilon^2). \tag{3.20}$$
This is the required relation between the metric and the Newtonian potential in the weak
field limit.
As regards the field equations, it follows from (3.16), (1.72) and (1.79) that
$$R_{00} = -\tfrac{1}{2}\epsilon\,\frac{\partial^2h_{00}}{\partial x^\alpha\partial x^\alpha} + O(\epsilon^2),$$
with summation over $\alpha$ from 1 to 3 (exercise). Hence, on account of (3.20) we conclude that
$$R_{00} = 0 \quad \text{implies} \quad \nabla^2\Phi = 0,$$
which is the Newtonian field equation. The remaining Einstein field equations restrict the
spatial part of the metric (3.16).
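The identification of $g_{00}$ with the Newtonian potential can be illustrated numerically. The sketch below is an illustration only: it assumes the standard expression $\Gamma^\alpha{}_{00} = -\tfrac{1}{2}\,\partial g_{00}/\partial x^\alpha$ for a static metric whose spatial part is the Euclidean metric to lowest order, takes $g_{00} = -1 - 2\Phi$ with $\Phi = -m/r$, and compares the geodesic acceleration $-\Gamma^\alpha{}_{00}$ with the Newtonian acceleration $-\partial\Phi/\partial x^\alpha$. The mass parameter and the field point are hypothetical values in units with $G = c = 1$.

```python
import math

M_GEOM = 0.01   # hypothetical source mass in units with G = c = 1


def phi(x, y, z):
    """Newtonian potential Phi = -m / r."""
    return -M_GEOM / math.sqrt(x * x + y * y + z * z)


def g00(x, y, z):
    """Weak field metric component g_00 = -1 - 2 Phi."""
    return -1.0 - 2.0 * phi(x, y, z)


def partial(f, point, axis, h=1e-3):
    """Central-difference partial derivative of f along one coordinate axis."""
    lo, hi = list(point), list(point)
    lo[axis] -= h
    hi[axis] += h
    return (f(*hi) - f(*lo)) / (2 * h)


def geodesic_acceleration(point):
    """-Gamma^alpha_00 = (1/2) d g_00 / d x^alpha (static weak field, assumed form)."""
    return [0.5 * partial(g00, point, axis) for axis in range(3)]


def newtonian_acceleration(point):
    """-d Phi / d x^alpha, the right hand side of the Newtonian equations of motion."""
    return [-partial(phi, point, axis) for axis in range(3)]


point = (6.0, 8.0, 0.0)
geodesic = geodesic_acceleration(point)
newtonian = newtonian_acceleration(point)
print(geodesic, newtonian)
```

Both lists should agree component by component, up to the small error of the finite differences.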
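Comment: In the preceding derivations, we have assumed that the order symbols $O(\epsilon^2)$ in
(3.16) can be differentiated without decreasing the order in $\epsilon$. This is not always the case,
e.g.
$$f(x) = \epsilon^2\sin\left(\frac{x}{\epsilon}\right) = O(\epsilon^2) \qquad \text{but} \qquad f'(x) = \epsilon\cos\left(\frac{x}{\epsilon}\right) \neq O(\epsilon^2).$$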

Chapter 4
The Schwarzschild Metric and its Applications

4.1 The Schwarzschild Solution and the Solar Gravitational Field

The aim is to describe the gravitational field exterior to the sun, idealized to be spherically
symmetric. In Newtonian theory this means:
find a suitable potential $\Phi$ that satisfies the vacuum field equations $\nabla^2\Phi = 0$.
In GRT this means:
find a suitable Lorentzian metric $g_{ij}$ that satisfies the vacuum field equations $R_{ij} = 0$.

4.1.1 Newtonian theory

We use spherical coordinates $r, \theta, \phi$ in $E^3$, with origin taken to be the centre of the sun (called
heliocentric coordinates). The density of the sun is assumed to depend only on the radial
distance r, and the surface is assumed spherical. It is then plausible (and can be proved)
that
$$\Phi = \Phi(r).$$
The vacuum field equations $\nabla^2\Phi = 0$ simplify to^1
$$\frac{d}{dr}\left(r^2\frac{d\Phi}{dr}\right) = 0.$$
This second order differential equation has general solution
$$\Phi(r) = -\frac{A}{r} + k,$$
where A, k are arbitrary constants. The boundary condition
$$\lim_{r\to\infty}\Phi(r) = 0$$
implies $k = 0$, and the requirement of a gravitational law of attraction implies $A > 0$. It is
customary to write $A = GM$, where M is the (gravitational) mass of the star, and G is the
gravitational constant. Thus
$$\Phi(r) = -\frac{GM}{r}. \tag{4.1}$$

^1 The Laplacian in spherical coordinates is
$$\nabla^2 = \frac{1}{r^2}\frac{\partial}{\partial r}\left(r^2\frac{\partial}{\partial r}\right) + \frac{1}{r^2\sin\theta}\frac{\partial}{\partial\theta}\left(\sin\theta\frac{\partial}{\partial\theta}\right) + \frac{1}{r^2\sin^2\theta}\frac{\partial^2}{\partial\phi^2}.$$

4.1.2 GRT approach

Start with flat spacetime, with line-element
$$ds^2 = -dt^2 + dx^2 + dy^2 + dz^2.$$
Replace Cartesian $x, y, z$ by spherical coordinates $r, \theta, \phi$, defined by
$$x = r\sin\theta\cos\phi, \qquad y = r\sin\theta\sin\phi, \qquad z = r\cos\theta.$$
This changes only the spatial part of the line-element and thus by (1.12) the line-element
becomes
$$ds^2 = -dt^2 + dr^2 + r^2(d\theta^2 + \sin^2\theta\,d\phi^2). \tag{4.2}$$
Consider a cylinder in flat spacetime whose boundary is
$$r = r_B \quad \text{(constant)}.$$
This cylinder intersects each hypersurface t = constant in a sphere of radius $r_B$ (see Figure
4.1).
Let E denote the region of flat spacetime exterior to the cylinder. Suppose now that
the cylinder is the boundary of the sun in spacetime, i.e. each 2-sphere $r = r_B$, t =
constant represents the surface of the star at some instant of time. The sun will generate a
gravitational field in E, and hence according to GRT, this region of spacetime will no longer
be flat, i.e. the metric form (4.2) will not be valid since it satisfies $R^i{}_{jk\ell} = 0$.
We assume, on the grounds of simplicity, that the line-element in the region E has the
form
$$ds^2 = -A(r)^2dt^2 + B(r)^2dr^2 + r^2(d\theta^2 + \sin^2\theta\,d\phi^2), \tag{4.3}$$
where $A(r) > 0$, $B(r) > 0$ are arbitrary functions of r. [For a more detailed justification of
(4.3) see Moller (1972), pp. 436-437.]
[Figure 4.1: Spacetime representation of the sun. The region E: $r > r_B$ lies outside the cylinder $r = r_B$. The intersection of the cylinder with t = const is a circle in the picture but a 2-sphere in spacetime ($\theta = \pi/2$ in this diagram).]

Remark 4.1:
1) The non-flat metric (4.3) has one property in common with the flat metric (4.2),
namely, the 2-spaces r = constant, t = constant are 2-spheres, with induced metric
$$ds^2 = r^2(d\theta^2 + \sin^2\theta\,d\phi^2). \tag{4.4}$$
Since the metric components (4.3) do not depend on $\phi$, and depend on $\theta$ only through
the 2-sphere metric (4.4), the metric (4.3) is said to be spherically symmetric. [A
mathematical definition of this concept depends on the idea of an isometry, i.e. a
metric preserving map.]
2) We are also assuming that we are dealing with a time-independent situation. Hence
the metric components are assumed to be independent of the timelike coordinate t.
3) Outside a normal star such as the sun, the deviations of the spacetime from flatness
are small, and hence in this case
$$A(r) \approx 1, \qquad B(r) \approx 1.$$
This is not valid close to a neutron star. We list below the approximate maximum
values of the deviations $|1 - A(r)^2|$ and $|1 - B(r)^2|$ for certain external fields:

Earth          $1.5 \times 10^{-9}$
Sun            $4 \times 10^{-6}$
Neutron Star   $0.4$

[A neutron star has density $\approx 10^{14}$ gm/cm$^3$.]
4) One expects the gravitational field to tend to zero as $r \to \infty$, i.e. one expects spacetime
to become flat as $r \to \infty$. Thus we impose the boundary conditions
$$\lim_{r\to\infty}A(r) = 1, \qquad \lim_{r\to\infty}B(r) = 1. \tag{4.5}$$

The problem is thus:

solve the field equations $R_{ij} = 0$ for the spherically symmetric metric (4.3)
subject to the boundary conditions (4.5).
It is convenient to write the line-element (4.3) in the equivalent form
$$ds^2 = -e^{2\nu(r)}dt^2 + e^{2\lambda(r)}dr^2 + r^2(d\theta^2 + \sin^2\theta\,d\phi^2). \tag{4.6}$$
A straightforward but fairly lengthy calculation yields
$$R_{00} = e^{2\nu - 2\lambda}\left[\nu'' + \nu'^2 - \nu'\lambda' + 2r^{-1}\nu'\right], \tag{4.7a}$$
$$R_{11} = -\nu'' - \nu'^2 + \nu'\lambda' + 2r^{-1}\lambda', \tag{4.7b}$$
$$R_{22} = 1 - e^{-2\lambda}\left[1 + r(\nu' - \lambda')\right], \tag{4.7c}$$
$$R_{33} = \sin^2\theta\,R_{22}, \tag{4.7d}$$
where a prime denotes differentiation with respect to r. The other components of the Ricci tensor vanish identically.

The field equations $R_{ij} = 0$ can be solved easily due to the fact that
$$R_{11} + e^{2\lambda - 2\nu}R_{00} = 2r^{-1}(\nu' + \lambda').$$
Thus $\nu' + \lambda' = 0$, which implies $\nu(r) + \lambda(r) = k$.
The boundary conditions (4.5) imply that $k = 0$, so that
$$\nu(r) + \lambda(r) = 0. \tag{4.8}$$
Equation (4.7c) now yields
$$e^{-2\lambda}(1 - 2r\lambda') = \left(re^{-2\lambda}\right)' = 1.$$
The general solution is
$$e^{-2\lambda} = 1 - \frac{2m}{r}, \tag{4.9}$$
with appropriate labelling of the constant of integration. It is easily verified that $\nu$, $\lambda$ as
given by (4.8) and (4.9) satisfy the remaining field equation $R_{00} = 0$ (or $R_{11} = 0$).

We have thus shown that the line-element
$$ds^2 = -\left(1 - \frac{2m}{r}\right)dt^2 + \frac{dr^2}{1 - \frac{2m}{r}} + r^2(d\theta^2 + \sin^2\theta\,d\phi^2)$$
is a solution of the vacuum field equations $R_{ij} = 0$. This solution was discovered by
Schwarzschild in 1916 and is known as the Schwarzschild solution. It plays a central role in
GRT.
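The claim that the Schwarzschild line-element satisfies the vacuum field equations can be checked numerically. The following minimal sketch assumes the standard form of the Ricci components for a static spherically symmetric metric $ds^2 = -e^{2\nu}dt^2 + e^{2\lambda}dr^2 + r^2(d\theta^2 + \sin^2\theta\,d\phi^2)$, takes $e^{2\nu} = e^{-2\lambda} = 1 - 2m/r$, and evaluates $R_{00}$, $R_{11}$, $R_{22}$ using finite differences. The names `nu`, `lam` and the value of `M` are illustrative choices.

```python
import math

M = 1.0   # Schwarzschild mass parameter m in units with G = c = 1 (illustrative)


def nu(r):
    """nu(r), defined by e^{2 nu} = 1 - 2m/r."""
    return 0.5 * math.log(1.0 - 2.0 * M / r)


def lam(r):
    """lambda(r) = -nu(r), so that e^{-2 lambda} = 1 - 2m/r."""
    return -nu(r)


def d1(f, r, h=1e-4):
    """First derivative by central differences."""
    return (f(r + h) - f(r - h)) / (2 * h)


def d2(f, r, h=1e-4):
    """Second derivative by central differences."""
    return (f(r + h) - 2.0 * f(r) + f(r - h)) / (h * h)


def ricci_components(r):
    """R_00, R_11, R_22 of the static spherically symmetric metric (assumed standard form)."""
    nu1, nu2, lam1 = d1(nu, r), d2(nu, r), d1(lam, r)
    r00 = math.exp(2 * nu(r) - 2 * lam(r)) * (nu2 + nu1 ** 2 - nu1 * lam1 + 2 * nu1 / r)
    r11 = -nu2 - nu1 ** 2 + nu1 * lam1 + 2 * lam1 / r
    r22 = 1.0 - math.exp(-2 * lam(r)) * (1.0 + r * (nu1 - lam1))
    return r00, r11, r22


for radius in (5.0, 10.0, 50.0):
    print(radius, ricci_components(radius))
```

All three components should vanish up to the small error of the finite differences, at any radius $r > 2m$.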
Remark 4.2: The preceding derivation of the Schwarzschild metric assumed that the metric
was spherically symmetric and time-independent. The assumption of time-independence is
in fact unnecessary: the Schwarzschild solution is the only spherically symmetric
solution of the vacuum field equations (Birkhoff's theorem). See #6 in Problem Set 4.
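With the Schwarzschild solution in hand, the orders of magnitude of the deviations from flatness quoted in Remark 4.1 can be estimated from $|1 - A(r)^2| = 2m/r$ evaluated at the surface of the body, with $m = GM/c^2$ in conventional units. The sketch below is a rough estimate only; the masses and radii are assumed round figures (in particular a neutron star of 1.4 solar masses and radius 10 km) and are not taken from these notes.

```python
G = 6.674e-11   # gravitational constant in m^3 kg^-1 s^-2 (assumed value)
C = 2.998e8     # speed of light in m/s (assumed value)

# (mass in kg, radius in m); assumed round figures for illustration
BODIES = {
    "Earth": (5.972e24, 6.371e6),
    "Sun": (1.989e30, 6.957e8),
    "Neutron star": (1.4 * 1.989e30, 1.0e4),
}


def surface_deviation(mass_kg, radius_m):
    """|1 - A^2| = 2 G M / (c^2 R) at the surface of a spherical body."""
    return 2.0 * G * mass_kg / (C ** 2 * radius_m)


deviations = {name: surface_deviation(*body) for name, body in BODIES.items()}
for name, value in deviations.items():
    print(f"{name:13s} {value:.2e}")
```

The results are of the same order as the values listed in Remark 4.1: about $10^{-9}$ for the earth, a few times $10^{-6}$ for the sun, and several tenths for a neutron star.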

4.2 Planetary Orbits

4.2.1 Newtonian theory

We solve the equations of motion
$$\frac{d^2\mathbf{r}}{dt^2} = -\operatorname{grad}\Phi, \tag{4.10}$$
for the gravitational potential due to a spherical sun,
$$\Phi = -GM/r, \qquad r^2 = x^2 + y^2 + z^2.$$
Here we are idealizing the planet in question (e.g. Mercury) to be a test particle moving in
the sun's gravitational field, and are neglecting the influences of the other planets.
Firstly, it can be shown [Goldstein (1965), pages 59-60] that the orbit will lie in a plane,
passing through the origin (the centre of the sun). Choose coordinates such that this plane is
the xy-plane (i.e. $\theta = \pi/2$ in spherical coordinates). The simplest approach is to use the fact
that equations (4.10) can be derived from a Lagrangian. With the above simplifications, the
resulting equations, in terms of spherical coordinates $r, \theta, \phi$, are
$$\ddot{r} = r\dot\phi^2 - \frac{GM}{r^2}, \tag{4.11}$$
$$r^2\dot\phi = H, \tag{4.12}$$
where $\dot{} \equiv d/dt$ and H is the magnitude of the constant angular momentum per unit mass of
the particle [see Goldstein (1965), pages 60-1, eqs. (3.8) and (3.11)].
Regard r as a function of $\phi$. Define
$$u(\phi) = \frac{1}{r(\phi)}.$$

Equations (4.11) and (4.12) imply
$$\frac{d^2u}{d\phi^2} + u = \frac{GM}{H^2} \tag{4.13}$$
[Goldstein, pg. 76, eq. (3.42)]. This is a second order linear DE with constant coefficients,
whose general solution can be written in the form
$$u = \frac{GM}{H^2}\left[1 + e\cos(\phi - \phi_0)\right],$$
e and $\phi_0$ being the constants of integration. Finally choose the origin of the angle $\phi$ so that
$\phi_0 = 0$. Thus the orbits are given by
$$r = \frac{H^2/GM}{1 + e\cos\phi}. \tag{4.14}$$
For a planetary orbit, r must be bounded for all $\phi$, which implies $|e| < 1$. Then equation
(4.14) describes an ellipse, with eccentricity e and major axis of length 2a, where
$$H^2 = GMa(1 - e^2). \tag{4.15}$$
In this way, one arrives at the Newtonian description of planetary orbits: ellipses with the
sun in one focus.

[Figure 4.2: The orbit of a planet, showing the planet, the sun and the perihelion of the orbit.]

4.2.2 GRT approach

In physical terms, the problem is: describe the planetary orbits. We make the following
idealizations:
1) the geometry of spacetime exterior to the sun is described by the Schwarzschild metric
(i.e. the sun is idealized to be spherically symmetric and time-independent),
2) a planet is regarded as a test particle in the sun's gravitational field, and hence its
worldline will be a TL geodesic.
Thus in mathematical terms the problem is this: solve the equations for the TL geodesics
of the Schwarzschild metric
$$ds^2 = -\left(1 - \frac{2m}{r}\right)dt^2 + \left(1 - \frac{2m}{r}\right)^{-1}dr^2 + r^2(d\theta^2 + \sin^2\theta\,d\phi^2). \tag{4.16}$$
The geodesic equations can best be obtained as the Euler-Lagrange equations
$$\frac{d}{d\tau}\left(\frac{\partial L}{\partial\dot{x}^i}\right) - \frac{\partial L}{\partial x^i} = 0, \qquad i = 0, 1, 2, 3, \tag{4.17}$$
for the Lagrangian
$$L = g_{ij}\dot{x}^i\dot{x}^j$$
(see Appendix A), which for the metric (4.16) has the form
$$L = -\left(1 - \frac{2m}{r}\right)\dot{t}^2 + \left(1 - \frac{2m}{r}\right)^{-1}\dot{r}^2 + r^2(\dot\theta^2 + \sin^2\theta\,\dot\phi^2). \tag{4.18}$$
Here the coordinates are labelled
$$(t, r, \theta, \phi) \equiv (x^0, x^1, x^2, x^3)$$
and $\dot{} \equiv d/d\tau$. Since $\tau$ is chosen to be proper time (i.e. arclength) along the geodesic we also have
$$L = -1. \tag{4.19}$$

Consider a particular TL geodesic
$$x^j = x^j(\tau),$$
with $\tau = \tau_0$ initially. Then the initial position is $(t(\tau_0), r(\tau_0), \theta(\tau_0), \phi(\tau_0))$ and the initial
4-velocity is $(\dot{t}(\tau_0), \dot{r}(\tau_0), \dot\theta(\tau_0), \dot\phi(\tau_0))$. Now $\theta, \phi$ are angles describing position on a 2-sphere.
By using the freedom in choice of $\theta$ and $\phi$ one can ensure that
$$\theta(\tau_0) = \tfrac{\pi}{2}, \qquad \dot\theta(\tau_0) = 0. \tag{4.20}$$
This leads to considerable simplification of the geodesic equations, as follows. Evaluate
equation (4.17) with $i = 2$, using the Lagrangian (4.18), to obtain
$$\frac{d}{d\tau}\left(r^2\dot\theta\right) - r^2\sin\theta\cos\theta\,\dot\phi^2 = 0.$$
This is a second order DE for $\theta(\tau)$. It admits $\theta(\tau) = \tfrac{\pi}{2}$ as a solution corresponding to the
initial data (4.20). Hence by the uniqueness theorem for second order DEs, the unique
solution satisfying (4.20) is
$$\theta(\tau) = \tfrac{\pi}{2}. \tag{4.21}$$

The Lagrangian L now simplifies to
$$L = -\left(1 - \frac{2m}{r}\right)\dot{t}^2 + \left(1 - \frac{2m}{r}\right)^{-1}\dot{r}^2 + r^2\dot\phi^2.$$
Since $\partial L/\partial t = 0$ and $\partial L/\partial\phi = 0$, the Euler-Lagrange equations (4.17), with $i = 0, 3$, yield
$\partial L/\partial\dot{t}$ = constant and $\partial L/\partial\dot\phi$ = constant, i.e.
$$r^2\dot\phi = h, \tag{4.22}$$
$$\left(1 - \frac{2m}{r}\right)\dot{t} = \kappa, \tag{4.23}$$
where $h, \kappa$ are constants. Finally, instead of using the complicated $i = 1$ Euler-Lagrange
equation, we use (4.19), simplified using (4.21)-(4.23):
$$\left(1 - \frac{2m}{r}\right)^{-1}\dot{r}^2 + \frac{h^2}{r^2} - \left(1 - \frac{2m}{r}\right)^{-1}\kappa^2 = -1. \tag{4.24}$$
Equations (4.21)-(4.23) and (4.24), together with the initial conditions, completely describe
the TL geodesic under consideration.
By analogy with the Newtonian approach to planetary orbits, we regard the radial coordinate
r as a function of the angle $\phi$, and then introduce a new variable u by
$$u(\phi) = 1/r(\phi). \tag{4.25}$$
It then follows from (4.22) that $\dot{r} = -h\,\frac{du}{d\phi}$, so that equation (4.24), after rearranging, assumes
the form
$$\left(\frac{du}{d\phi}\right)^2 = \frac{\kappa^2 - 1}{h^2} + \frac{2m}{h^2}u - u^2 + 2mu^3.$$

In order to obtain a second order DE which can be compared with the Newtonian DE
(4.13), we differentiate the preceding equation with respect to u and rearrange to obtain
$$\frac{d^2u}{d\phi^2} + u = \frac{m}{h^2} + 3mu^2, \tag{4.26}$$
provided $du/d\phi \neq 0$.^2 Equation (4.26) is the basic DE governing planetary orbits in GRT.
This DE cannot be solved exactly, however, and so a perturbation method must be used.
For the purpose of comparison with Newtonian theory and with observations, we identify
the radial coordinate r of the Schwarzschild solution with the Newtonian radial coordinate
(i.e. radial distance), and the angles $\theta, \phi$ with the corresponding Newtonian angles [see the
quote from McVittie (1956), on page 141, in this regard]. Secondly, on the basis of the
Newtonian equation (4.12) and equation (4.22), we identify the constants h and H:
$$h = H. \tag{4.27}$$
Next, consider the two terms on the right hand side of equation (4.26). The ratio of the
second to the first is $3h^2u^2$, which, for planetary orbits, is very small (e.g. for Mercury, it
is approximately $7.7 \times 10^{-8}$). Thus equation (4.26) approximates the Newtonian DE (4.13)
provided we identify the Schwarzschild constant m according to
$$m = GM, \tag{4.28}$$
where G is the gravitational constant and M is the mass of the sun.

^2 $du/d\phi = 0$ implies r = constant, corresponding to Newtonian circular orbits.


For convenience in analyzing the DE (4.26), we write
$$A = \frac{m}{h^2}, \qquad \epsilon = 3mA, \tag{4.29}$$
so that (4.26) reads
$$\frac{d^2u}{d\phi^2} + u = A + \epsilon\frac{u^2}{A}, \tag{4.30}$$
where $\epsilon$ is a small dimensionless parameter (of order $10^{-8}$ for planetary orbits). Consider a
perturbation expansion of the form
$$u(\phi) = u_0(\phi) + \epsilon u_1(\phi) + O(\epsilon^2). \tag{4.31}$$
Substitute (4.31) into (4.30), and compare powers of $\epsilon$. This yields
$$\frac{d^2u_0}{d\phi^2} + u_0 = A, \qquad \frac{d^2u_1}{d\phi^2} + u_1 = \frac{u_0^2}{A}. \tag{4.32}$$
Thus the zeroth order term $u_0$ satisfies the Newtonian DE and hence is of the form
$$u_0(\phi) = A + B\cos\phi, \tag{4.33}$$
where B is a constant and we have used the freedom $\phi \to \phi + \phi_0$ to set the other integration
constant to zero. Substitute (4.33) into (4.32) and rearrange to obtain:
$$\frac{d^2u_1}{d\phi^2} + u_1 = A + \frac{B^2}{2A} + 2B\cos\phi + \frac{B^2}{2A}\cos 2\phi. \tag{4.34}$$
A particular^3 solution of this DE is
$$u_1 = A + \frac{B^2}{2A} + B\phi\sin\phi - \frac{B^2}{6A}\cos 2\phi.$$
The solution (4.31) can thus be written in the form
$$u(\phi) = A + B\cos\phi + \epsilon B\phi\sin\phi + \epsilon\,(\text{periodic terms}).$$

^3 We need only use a particular solution of equation (4.34), since the zeroth order solution (4.33) contains
a term $B\cos\phi$, which is essentially the general solution of the homogeneous DE associated with (4.34).
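The particular solution of the first order equation can be verified without any algebra. The sketch below assumes the forms $u_0 = A + B\cos\phi$ and $u_1 = A + B^2/(2A) + B\phi\sin\phi - (B^2/6A)\cos 2\phi$, and checks by finite differences that $u_1'' + u_1 - u_0^2/A$ vanishes. The values of `A` and `B` are arbitrary illustrative numbers.

```python
import math

A, B = 1.0, 0.2   # illustrative values of the constants A and B


def u0(phi):
    """Zeroth order (Newtonian) solution u_0 = A + B cos(phi)."""
    return A + B * math.cos(phi)


def u1(phi):
    """Candidate particular solution of u_1'' + u_1 = u_0^2 / A."""
    return (A + B * B / (2 * A) + B * phi * math.sin(phi)
            - B * B / (6 * A) * math.cos(2 * phi))


def residual(phi, h=1e-4):
    """u_1'' + u_1 - u_0^2 / A, with u_1'' taken by central differences."""
    second = (u1(phi + h) - 2.0 * u1(phi) + u1(phi - h)) / (h * h)
    return second + u1(phi) - u0(phi) ** 2 / A


max_residual = max(abs(residual(0.1 * k)) for k in range(100))
print(max_residual)
```

The residual is at the level of the finite-difference error, which confirms that the non-periodic term $B\phi\sin\phi$ is required to balance the resonant term $2B\cos\phi$ on the right hand side.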

We have singled out the term $\epsilon B\phi\sin\phi$ among the terms of first order in $\epsilon$, since it is the
only one which is non-periodic, and hence the only one which can contribute to a precession
of the perihelion (which is a departure from periodicity).
In order to obtain an approximate expression for the perihelion precession, we write (to
first order in $\epsilon$):
$$B\cos\phi + \epsilon B\phi\sin\phi \approx B\cos[\phi(1 - \epsilon)].$$
Thus, on neglecting the small periodic variations,
$$u(\phi) \approx A + B\cos[\phi(1 - \epsilon)].$$
This expression has a maximum (corresponding to a minimum of r) when
$$\phi(1 - \epsilon) = 2\pi n, \qquad n \text{ an integer}.$$
To first order in $\epsilon$ this can be written
$$\phi = 2\pi n(1 + \epsilon).$$
Thus successive perihelia will occur at intervals of
$$2\pi(1 + \epsilon)$$
instead of at intervals of $2\pi$, as in exactly periodic motion. This means that the perihelion
precession per revolution (i.e. change in angular position of the perihelion per revolution) is
$$\delta = 2\pi\epsilon > 0,$$
corresponding to an advance of the perihelion. On using equation (4.29) together with
equations (4.27), (4.28) and (4.15) this can be written
$$\delta = \frac{6\pi GM}{a(1 - e^2)}, \tag{4.35}$$
where e is the eccentricity and 2a the length of the major axis of the zeroth order elliptic
orbit. Values of $\delta$ calculated from this formula are compared with observed values in Table
4.1 in Section 4.2.4. [For another derivation of (4.35) see Rindler (1969), pages 174-178. For a
derivation using the first order DE on page 133 see Moller (1972), pages 495-497.]
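As a worked example of the perihelion formula, the sketch below evaluates $6\pi GM/[c^2a(1 - e^2)]$ per revolution and converts it to seconds of arc per century. The factor $c^2$ restores conventional units, since the formula in the text is written in units with $c = 1$. The solar constant `GM_SUN`, the astronomical unit and the orbital elements are assumed values; the function names are illustrative.

```python
import math

GM_SUN = 1.32712e20   # G times the solar mass in m^3/s^2 (assumed value)
C = 2.99792458e8      # speed of light in m/s
AU = 1.495979e11      # astronomical unit in m (assumed value)


def advance_per_orbit(a_m, e):
    """Perihelion advance per revolution in radians: 6 pi G M / (c^2 a (1 - e^2))."""
    return 6.0 * math.pi * GM_SUN / (C ** 2 * a_m * (1.0 - e ** 2))


def arcsec_per_century(a_au, e, period_years):
    """Accumulated perihelion advance over 100 years, in seconds of arc."""
    orbits = 100.0 / period_years
    return math.degrees(advance_per_orbit(a_au * AU, e) * orbits) * 3600.0


mercury = arcsec_per_century(0.387, 0.206, 0.241)
earth = arcsec_per_century(1.000, 0.017, 1.000)
print(mercury, earth)
```

With these inputs the result for Mercury is close to 43 seconds of arc per century, and the result for the earth is a little under 4.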

4.2.3 Predictions concerning planetary orbits

Newtonian Theory:
A bound orbit of a test particle in a spherically symmetric gravitational field is exactly an
ellipse:
$$r = \frac{a(1 - e^2)}{1 + e\cos\phi}, \qquad 2a = \text{length of major axis}, \quad e = \text{eccentricity}.$$

General Relativity Theory:
Any bound orbit of a test particle in a spherically symmetric gravitational field is
approximately^4 an ellipse. However, the perihelion precesses by an amount
$$\delta = \frac{6\pi GM}{a(1 - e^2)} \qquad \text{per revolution},$$
where G is the gravitational constant, M is the gravitational mass of the source, 2a is
the length of the major axis, and e is the eccentricity of the approximate ellipse (in units
such that c = 1).

[Figure 4.3: Schematic illustration of the precession of the orbit of a planet.]

^4 The derivation depended on the gravitational field being weak, i.e. comparable to the field exterior to
our sun. The derivation may fail for orbits close to very dense objects, e.g. neutron stars.

4.2.4 Observations concerning planetary orbits

1) By observations from earth, deduce the orbit of Mercury relative to heliocentric coordinates.
2) Using Newtonian perturbation theory, eliminate the small influences of the other planets,
and the fact that Mercury is not exactly a test particle.
3) One finds that the resulting orbit is not exactly an ellipse, but is an approximate ellipse,
with a perihelion advance of
(42.56 ± 0.94) secs. of arc per century.
Hence Newtonian gravitational theory fails.
4) For a test particle orbit having the dimensions of Mercury's orbit, GRT predicts a
perihelion advance of
43.03 secs. of arc per century.
Hence GRT provides a more accurate description of orbits in a spherically symmetric
gravitational field.
Remark 4.3: The justification for using Newtonian theory to eliminate the effects of the other
planets is as follows: in the suns gravitational field, the GRT effects are small. However,
the perturbing fields due to the other planets are much weaker since the total mass of the
1
planets is about 1000
the mass of the sun. Hence GRT effects due to these perturbing fields
will be unobservable with present day technology.5
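The GRT figure quoted for Mercury can be reproduced from the per-revolution formula of Section 4.2.3 once $c$ is restored. The following is a minimal sketch rather than part of the original notes: the values of `GM_SUN`, `C` and `AU` are assumed standard SI constants, the function names are illustrative, and the orbital elements are the rounded values listed for Mercury in Table 4.1, so agreement with 43.03 is only expected to within rounding.

```python
import math

# Assumed physical constants in SI units (not given in the notes).
GM_SUN = 1.32712440018e20   # gravitational constant times solar mass, m^3 s^-2
C = 299_792_458.0           # speed of light, m/s
AU = 1.495978707e11         # astronomical unit, m
ARCSEC_PER_RAD = 180.0 * 3600.0 / math.pi


def perihelion_advance_per_rev(a_au, e):
    """6*pi*G*M / (c^2 * a * (1 - e^2)), in radians per revolution."""
    a = a_au * AU
    return 6.0 * math.pi * GM_SUN / (C**2 * a * (1.0 - e**2))


def perihelion_advance_per_century(a_au, e, period_years):
    """Advance accumulated over 100 years, in seconds of arc."""
    revolutions = 100.0 / period_years
    return perihelion_advance_per_rev(a_au, e) * revolutions * ARCSEC_PER_RAD


# Rounded orbital elements of Mercury as listed in Table 4.1.
mercury = perihelion_advance_per_century(a_au=0.387, e=0.206, period_years=0.241)
print(f"Mercury: {mercury:.2f} seconds of arc per century")
```

With these rounded inputs the result comes out close to 43 seconds of arc per century; the small difference from the quoted 43.03 reflects the three-figure orbital elements used here.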
The results are summarized in the following table.

Table 4.1: Observational data for the perihelion advance for planetary orbits. The last two columns give the perihelion advance in seconds of arc per century.

Planet    Sidereal period   Semimajor axis   Eccentricity   Calculated   Residual observed
          (years)           a (AU)           e              from GRT     value
Mercury   0.241             0.387            0.206          43.03        42.56 ± 0.94
Venus     0.615             0.723            0.007          8.6          8.4 ± 4.8
Earth     1.000             1.000            0.017          3.8          4.6 ± 2.7
Mars      1.881             1.524            0.093          1.35         1.5 ± 0.04
Icarus    1.12              1.007            0.827          10.3         9.8 ± 0.8

4.3 Deflection of Light and Radio Signals

4.3.1 Theory
Electromagnetic signals (e.g. light and radio signals) are assumed to propagate along null
geodesics in spacetime. Since the null geodesics are determined by the metric tensor, which
describes the gravitational field, GRT predicts that the propagation of electromagnetic signals is affected by any gravitational fields they pass through.
We consider the situation in which light or radio signals from a distant object (star or
quasar) pass close to the sun's surface before being received on earth. The deflection of the
signals due to the sun's gravitational field is large enough to be measured.
The relevant worldlines are null geodesics of the Schwarzschild metric. For an arbitrary
null geodesic we can choose coordinates so that $\theta = \pi/2$ along the geodesic, as in the TL case
(see Figure 4.4). Again, regard the radial coordinate $r$ as a function of the angle $\phi$ and write
\[ u(\phi) = 1/r(\phi). \]

⁵ I.I. Shapiro et al. (1972): Mercury's Perihelion Advance: Determination by Radar, Phys. Rev. Lett. 28,
1594-1597.
Figure 4.4: Spacetime diagram with $\theta = \pi/2$, showing the deflection of light from a star by
the sun. (Diagram labels: WL of star; WL of observer; null geodesic; light ray grazes surface of sun; world tube of the sun.)
One finds (exercise) that the differential equation (4.26) is replaced by
\[ \frac{d^2u}{d\phi^2} + u = 3mu^2. \tag{4.36} \]
The term $3mu^2$ is small compared to $u$ (a ratio of $\sim 7\times10^{-6}$ on the sun's surface) and so we can use
a perturbation approach.⁶ We seek a solution of the form
\[ u = u_0 + \epsilon u_1 + O(\epsilon^2), \tag{4.37} \]
with
\[ \epsilon = 3m/R_\odot, \tag{4.38} \]
where $R_\odot$ is the solar radius. Note that $\epsilon$ is a small ($\sim 10^{-5}$) dimensionless quantity for the
sun.

⁶ Again an exact solution is not possible.

On substituting (4.37) into (4.36), we obtain
\[ \frac{d^2u_0}{d\phi^2} + u_0 = 0, \tag{4.39} \]
\[ \frac{d^2u_1}{d\phi^2} + u_1 = R_\odot u_0^2. \tag{4.40} \]

The general solution of (4.39) can be written as
\[ u_0(\phi) = \frac{1}{R}\cos\phi, \tag{4.41} \]
where $R$ is an arbitrary constant, and we have used the freedom in choice of $\phi$ to eliminate
the other constant of integration. On using (4.41) a particular solution of (4.40) is found to
be
\[ u_1 = \frac{R_\odot}{3R^2}(\cos^2\phi + 2\sin^2\phi). \]
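The particular solution quoted above can be checked numerically by substituting it into (4.40). The sketch below is illustrative only: it sets $R = R_\odot = 1$ in arbitrary units (an assumption made purely for convenience), approximates the second derivative by a central difference, and the helper names `u0`, `u1` and `residual` are not from the notes.

```python
import math

R = 1.0        # constant R in (4.41), arbitrary units (assumed value)
R_SUN = 1.0    # solar radius in the same units (assumed value)


def u0(phi):
    """Zeroth-order solution (4.41)."""
    return math.cos(phi) / R


def u1(phi):
    """Proposed particular solution of (4.40)."""
    return R_SUN / (3.0 * R**2) * (math.cos(phi)**2 + 2.0 * math.sin(phi)**2)


def residual(phi, h=1e-4):
    """u1'' + u1 - R_sun * u0^2, with u1'' from a central difference."""
    second = (u1(phi + h) - 2.0 * u1(phi) + u1(phi - h)) / h**2
    return second + u1(phi) - R_SUN * u0(phi)**2


# Sample the residual on a grid of angles between -3 and 3 radians.
worst = max(abs(residual(k * 0.1)) for k in range(-30, 31))
print(f"largest residual on the sample grid: {worst:.2e}")
```

The residual stays at the level of the finite-difference error, which is what one expects if the quoted $u_1$ does satisfy (4.40).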
Thus the first order approximation (4.37) assumes the form
\[ u \approx \frac{1}{R}\cos\phi + \frac{m}{R^2}(\cos^2\phi + 2\sin^2\phi). \tag{4.42} \]
This equation describes the spatial path of a light ray in a 2-space $\theta = \pi/2$.
In order to relate this equation to observations we identify the Schwarzschild coordinates
$r, \theta, \phi$ with the heliocentric coordinates used by astronomers (as in the case of planetary
orbits) and analyze the equation as an equation in $E^3$ (see Figure 4.5). It helps to introduce
Cartesian coordinates $x, y$ in the 2-spaces $\theta = \pi/2$, $t$ = constant:
\[ x = r\cos\phi, \qquad y = r\sin\phi. \]
Since $u = 1/r$, we obtain from (4.42)
\[ x = R - \frac{m}{R}\,\frac{x^2 + 2y^2}{\sqrt{x^2+y^2}}. \tag{4.43} \]

The zeroth order approximation (when $m = 0$) is the straight line $x = R$, which grazes the
sun's surface if we choose
\[ R = R_\odot. \]
For large $|y|$, (4.43) assumes the form
\[ x \approx \begin{cases} R - \dfrac{2my}{R}, & \text{if } y > 0, \\[1ex] R + \dfrac{2my}{R}, & \text{if } y < 0 \end{cases} \]
(see Figure 4.6). The angle $\delta$ (in radians) between the asymptotic directions of the light
ray is $4m/R$, which becomes, since $m = GM$,
\[ \delta = \frac{4GM}{R} \text{ radians}. \]
This formula gives the deflection of a light signal grazing the surface of a spherical star
of mass $M$ and radius $R$. With $M = M_\odot$ and $R = R_\odot$, $\delta \approx 1.75''$.
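The numerical value of the deflection for the sun follows directly from the formula $4GM/R$ once the factor $c^2$ is restored. The sketch below is illustrative: `GM_SUN`, `C` and `R_SUN` are assumed standard SI values that do not appear in the notes, and the function names are hypothetical. It also evaluates the small parameter $3m/R_\odot$ of (4.38) for comparison with the order of magnitude quoted in the text.

```python
import math

# Assumed physical constants in SI units (not given in the notes).
GM_SUN = 1.32712440018e20   # m^3 s^-2
C = 299_792_458.0           # m/s
R_SUN = 6.957e8             # nominal solar radius, m
ARCSEC_PER_RAD = 180.0 * 3600.0 / math.pi


def deflection_arcsec(gm, radius):
    """Deflection 4GM/(c^2 R) of a grazing light ray, in seconds of arc."""
    return 4.0 * gm / (C**2 * radius) * ARCSEC_PER_RAD


def expansion_parameter(gm, radius):
    """Small parameter 3m/R of (4.38), with m = GM/c^2."""
    return 3.0 * gm / (C**2 * radius)


delta_sun = deflection_arcsec(GM_SUN, R_SUN)
eps_sun = expansion_parameter(GM_SUN, R_SUN)
print(f"deflection at the solar limb: {delta_sun:.2f} seconds of arc")
print(f"expansion parameter 3m/R for the sun: {eps_sun:.1e}")
```

Because the deflection scales as $1/R$, a ray passing at twice the solar radius is deflected by half as much.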

4.3.2 Observations

Two types of observations have been performed to detect this effect, which was the first
actual prediction of GRT.
1) During a solar eclipse, the deflection of light from stars, when the light passes close to
the sun's surface, can be measured. Prior to 1968, this was the only type of experiment
possible to verify this prediction.
Figure 4.5: Deflection of a light signal by the sun. (Diagram labels: Cartesian axes x, y, z in $E^3$; sun.)

2) The development of radio astronomy in the 1960s led to another test in which radio
signals are deflected by the sun's gravitational field. Each October 8, the sun, as seen
from earth, passes in front of the quasar 3C279. By monitoring the angular separation
between 3C279 and the nearby quasar 3C273 before and after occultation of 3C279, radio
astronomers can measure the deflection of the radio signals from 3C279.
The results of both observations are consistent with GRT, but the accuracy is as yet
insufficient to exclude certain other metric theories of gravity, e.g. Brans-Dicke theory.
Ref: J.M. Hill (1971): Monthly Notices Royal Astronom. Soc. 153, 7p-11p, and Misner,
Thorne and Wheeler (1972) pages 1101-1105.

4.4 Gravitational Frequency Shift

Consider two worldlines
\[ C_1:\ r = r_1, \quad \theta = \theta_0, \quad \phi = \phi_0, \]
\[ C_2:\ r = r_2, \quad \theta = \theta_0, \quad \phi = \phi_0, \]
in the Schwarzschild spacetime (regarded as describing either the sun's or the earth's gravitational field). Suppose that $n$ pulses of electromagnetic radiation are emitted and subsequently
received, as shown in Figure 4.7.
Let $\nu_e$ be the frequency of emission (as measured by $O_1$'s clock) and $\nu_r$ be the frequency
of reception (as measured by $O_2$'s clock).
Figure 4.6: Asymptotic form of the light ray grazing the sun. (Diagram labels: light ray; asymptotes $x = R - 2my/R$ and $x = R + 2my/R$; surface of sun.)
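Then
\[ \nu_e = \frac{n}{\Delta\tau_1}, \qquad \nu_r = \frac{n}{\Delta\tau_2}. \]
Hence
\[ \frac{\nu_r}{\nu_e} = \frac{\Delta\tau_1}{\Delta\tau_2}. \tag{4.44} \]
If $\tau$ is the time elapsed along a TL curve, then
\[ g_{ij}\frac{dx^i}{d\tau}\frac{dx^j}{d\tau} = -1. \]
Evaluate this along $C_1$ or $C_2$:
\[ -1 = g_{00}\left(\frac{dx^0}{d\tau}\right)^2, \quad \text{i.e.} \quad 1 = \left(1-\frac{2m}{r}\right)\left(\frac{dt}{d\tau}\right)^2. \]
Hence
\[ \Delta\tau = \sqrt{1-\frac{2m}{r}}\;\Delta t \]
along $C_1$ or $C_2$. [Note that $\Delta\tau$, $\Delta t$ represent finite changes in $\tau$, $t$.] Equation (4.44) becomes
\[ \frac{\nu_r}{\nu_e} = \sqrt{\frac{1-2m/r_1}{1-2m/r_2}}\;\frac{\Delta t_1}{\Delta t_2}, \tag{4.45} \]
where $\Delta t_1 = t_Q - t_P$, and $\Delta t_2 = t_{Q'} - t_{P'}$. The null geodesics in the sketch satisfy
\[ \theta,\ \phi = \text{constant}, \]
and hence are given by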


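\[ L = 0 = -\left(1-\frac{2m}{r}\right)\dot t^2 + \left(1-\frac{2m}{r}\right)^{-1}\dot r^2. \]
Thus
\[ \frac{dt}{dr} = \left(1-\frac{2m}{r}\right)^{-1} \]
along any such null geodesic, i.e.
\[ \Delta t = \int_{r_1}^{r_2}\left(1-\frac{2m}{r}\right)^{-1} dr. \]
Hence
\[ t_{Q'} - t_Q = t_{P'} - t_P \quad \text{(see Figure 4.7)}, \]
which implies
\[ \Delta t_1 = \Delta t_2. \]
Thus (4.45) simplifies to
\[ \frac{\nu_r}{\nu_e} = \sqrt{\frac{1-2m/r_1}{1-2m/r_2}}. \]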

Since $2m/r \ll 1$ within the solar system this can be written
\[ \frac{\nu_r - \nu_e}{\nu_e} \approx GM\left(\frac{1}{r_2} - \frac{1}{r_1}\right). \]
Thus if the receiver ($r = r_2$) is further away from the gravitating object (sun or earth) than
is the emitter ($r = r_1$), then $\nu_r < \nu_e$, and a frequency shift to the red is predicted.
Experiments:
1) Pound and Rebka (1960) and Pound and Snider (1965) measured the redshift of 14.4
keV gamma rays from Fe$^{57}$ crystals in the earth's gravitational field over a height of
74 ft. The predicted frequency shift $\Delta\nu/\nu$ is $4.9\times10^{-15}$, which is consistent with
experimental results [see Adler, Bazin and Schiffer (1975) for more details].
2) Brault (1962) measured the red shift of the Sodium D line for light emitted by an atom
on the sun's surface, and received on earth. There is a red shift in this case due to the
sun's gravitational field.
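The weak-field shift formula of this section can be evaluated for the two experiments listed above. This is a rough sketch under stated assumptions: the constants `GM_EARTH`, `GM_SUN`, `R_EARTH`, `R_SUN`, `AU` and `C` are standard SI values that are not given in the notes, the function name is illustrative, and the factor $c^2$ has been restored. The one-way shift computed for a 74 ft height is about half of the $4.9\times10^{-15}$ quoted in item 1; one plausible reading (an assumption, not stated in the notes) is that the quoted figure refers to the difference between upward and downward runs.

```python
# Assumed physical constants in SI units (not given in the notes).
C = 299_792_458.0            # speed of light, m/s
GM_EARTH = 3.986004418e14    # m^3 s^-2
GM_SUN = 1.32712440018e20    # m^3 s^-2
R_EARTH = 6.371e6            # mean radius of the earth, m
R_SUN = 6.957e8              # nominal solar radius, m
AU = 1.495978707e11          # astronomical unit, m
FOOT = 0.3048                # metres per foot


def fractional_shift(gm, r_emit, r_recv):
    """(nu_r - nu_e)/nu_e ~ GM (1/r_recv - 1/r_emit) / c^2 in the weak-field limit."""
    return gm * (1.0 / r_recv - 1.0 / r_emit) / C**2


# Photon climbing 74 ft from the earth's surface.
tower = fractional_shift(GM_EARTH, R_EARTH, R_EARTH + 74 * FOOT)
# Light emitted at the sun's surface and received at the earth's orbit
# (the earth's own field is neglected in this sketch).
solar = fractional_shift(GM_SUN, R_SUN, AU)

print(f"74 ft tower, one way: {tower:.2e}")
print(f"sun's surface to 1 AU: {solar:.2e}")
```

Both values are negative, i.e. red shifts, as expected when the receiver sits further out in the field than the emitter.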
Comment: See Appendix B for a more general approach to the gravitational frequency shift.

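Figure 4.7: Light signals emitted by one observer and received by another. (Diagram labels: emitter, observer $O_1$, worldline $C_1$ at $r = r_1$, $n$ pulses emitted in $\Delta\tau_1$ seconds between the events $P$ and $Q$; receiver, observer $O_2$, worldline $C_2$ at $r = r_2$, $n$ pulses received in $\Delta\tau_2$ seconds between the events $P'$ and $Q'$; null geodesics.)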

4.5 GRT and Observation - Summary

We have seen that the solar system forms a laboratory for testing theories of gravitation, in
particular Newtonian theory and GRT. Its advantage is that it is relatively accessible; its
disadvantage is that it is a relatively weak field system, so that the differences in prediction
between the various theories are extremely small.
The recently discovered (1974) binary pulsar 1913 + 16 (i.e. a binary system, one of
whose members is a pulsar, a pulsar being a rotating neutron star with dipole magnetic
field) is the first possibility for a strong field laboratory. The disadvantage here is that it
is relatively inaccessible, being distant (about 6200 pc) and only susceptible to observation by
radio telescope. By monitoring the pulses of radiation emitted by the pulsar using a radio
telescope, it is possible to plot a velocity curve for this binary system, and hence infer
information concerning the orbit. For example, the total mass of the system is about $2.8M_\odot$, the
period of revolution is about 8 hours, and the average distance apart is of the order of $1R_\odot$ (this is
not known exactly, since the inclination of the binary orbit is not known). Note that typical
pulsar radii are about 10 km. Thus in this system one has a large amount of matter in a small
region of space, and hence one obtains a strong gravitational field (compared to the solar
system). The perihelion advance has been observed to be a massive $4^\circ$ per year, but as yet
this cannot be used to test GRT since other orbital parameters are not known with sufficient
accuracy. At any rate, this is the first perihelion advance to be observed outside the solar
system.

4.6 The Schwarzschild Metric and Black Holes

Roughly speaking, a black hole arises when the gravitational field of a star becomes so strong
that photons emitted from the surface of the star cannot escape and be received by external
observers (i.e. observers at a considerable distance away). In other words, a black hole is
a star which cannot be seen optically, but whose gravitational field can still be felt, i.e. it
attracts, and can even trap, other matter.
The possibility of black holes arising in nature was first suggested by P.S. Laplace in 1798
[see Hawking & Ellis (1973), The Large Scale Structure of Space-Time, Appendix A]. That
GRT essentially predicts the existence of black holes was first suggested by Oppenheimer
and Snyder in 1939. However, the intensive theoretical study of these objects was only begun
in the early 1960s, while attempts to detect them experimentally were begun in the 1970s.
The name "black hole" was coined by J. Wheeler and R. Penrose in the late 1960s.

4.6.1 Behaviour of the Schwarzschild line-element near r = 2m

So far we have considered the Schwarzschild line-element
\[ ds^2 = \left(1-\frac{2m}{r}\right)^{-1}dr^2 + r^2(d\theta^2 + \sin^2\theta\,d\phi^2) - \left(1-\frac{2m}{r}\right)dt^2, \tag{4.46} \]
for situations in which
\[ \frac{2m}{r} \ll 1, \]
corresponding to weak and hence Newtonian-like gravitational fields. However, the line-element (4.46) is well-behaved for all $r$ such that $r > 2m$, and also $0 < r < 2m$. But this
line-element is certainly singular [since $g_{rr}$ is unbounded] when $r = 2m$. It is natural to ask
the following questions:
What is the nature of this singularity? Does the surface $r = 2m$ act as a
physically impenetrable barrier which prevents particles from passing from the
region $r > 2m$ to the region $r < 2m$?
In order to gain insight into this, we consider a test particle which is released from rest by
some preferred observer (i.e. an observer whose worldline is $r, \theta, \phi$ = constant) in the region
$r > 2m$, and then falls freely (see Figure 4.8). Its world line will be a timelike geodesic, and
for simplicity we assume that the motion is purely in the radial direction, i.e.
\[ \theta = \text{constant}, \qquad \phi = \text{constant} \tag{4.47} \]
along the geodesic. The Lagrangian is
\[ L = \left(1-\frac{2m}{r}\right)^{-1}\dot r^2 + r^2(\dot\theta^2 + \sin^2\theta\,\dot\phi^2) - \left(1-\frac{2m}{r}\right)\dot t^2. \tag{4.48} \]
The requirement that the geodesic be timelike is
\[ \dot x^a \dot x^b g_{ab} = -1, \]
or equivalently
\[ L = -1. \]
On account of the assumption (4.47) this yields
\[ \left(1-\frac{2m}{r}\right)^{-1}\dot r^2 - \left(1-\frac{2m}{r}\right)\dot t^2 = -1. \tag{4.49} \]

Figure 4.8: Schwarzschild spacetime with $\theta, \phi$ = constant, and $r$, $t$ drawn as Cartesian coordinates, showing the worldline of an infalling particle. (Diagram labels: WL of infalling particle; light signal; WL of preferred observer at large distance ($r$ = constant $\gg 2m$); $t = t_1$; $t = t_2$; $r = 2m$.)

Here a dot denotes differentiation with respect to time $\tau$ along the geodesic. A second equation
is obtained from the Euler-Lagrange equation
\[ \frac{d}{d\tau}\left(\frac{\partial L}{\partial \dot t}\right) - \frac{\partial L}{\partial t} = 0. \]
Since $\partial L/\partial t = 0$, this yields
\[ \left(1-\frac{2m}{r}\right)\dot t = k, \tag{4.50} \]
where $k$ is a constant of integration. Equations (4.49) and (4.50) are sufficient to determine
the geodesic, since it lies in a 2-space $\theta, \phi$ = constant. Since we wish to consider a particle
which is falling inwards, and since $t$ increases in the future direction we require
\[ \dot r < 0, \qquad \dot t > 0. \tag{4.51} \]
Suppose that the particle is released from rest at $r = r_0$, $t = t_0$, i.e.
\[ \dot r = 0 \quad \text{when} \quad r = r_0, \quad t = t_0. \]
This implies that the constant $k$ in (4.50) has the value
\[ k = \sqrt{1 - \frac{2m}{r_0}}. \tag{4.52} \]

It turns out that the integration of (4.49) and (4.50) is simplest if we let $r_0 \to \infty$, i.e. suppose
that the particle is released from rest at infinity. Then from (4.52),
\[ k = 1, \]
and by combining (4.49) and (4.50) we obtain
\[ \dot r = -\sqrt{\frac{2m}{r}}. \]
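The algebra behind this step is short and can be written out as follows. This is a reconstruction of the omitted intermediate lines, not text from the notes. Solving (4.50) for $\dot t$, substituting into (4.49) and multiplying through by $(1 - 2m/r)$ gives, for general $k$:

```latex
\begin{align*}
\dot t &= k\left(1 - \frac{2m}{r}\right)^{-1}, \\
\dot r^2 - k^2 &= -\left(1 - \frac{2m}{r}\right), \\
\dot r^2 &= k^2 - 1 + \frac{2m}{r}.
\end{align*}
```

With $k = 1$ this reduces to $\dot r^2 = 2m/r$, and the infall condition $\dot r < 0$ in (4.51) selects the negative square root.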
This can be integrated to yield
\[ \tau = -\frac{4m}{3}\left(\frac{r}{2m}\right)^{3/2} + \text{constant}. \tag{4.53} \]
The constant depends on the choice of origin for $\tau$. Using (4.53), one further finds that
\[ t = -\frac{4m}{3}\left(\frac{r}{2m}\right)^{3/2} - 4m\left(\frac{r}{2m}\right)^{1/2} + 2m\log\left\{\frac{(r/2m)^{1/2}+1}{(r/2m)^{1/2}-1}\right\} + \text{constant}. \tag{4.54} \]
Equations (4.53) and (4.54) describe the motion of the freely-falling test particle.
The essential question is this: does this test particle reach $r = 2m$?
From (4.54), it follows that
\[ \lim_{r\to 2m^+} t = +\infty, \]
i.e. the timelike coordinate $t$ becomes infinite as one approaches $r = 2m$ along this geodesic.
Recall that for an observer with world line $r, \theta, \phi$ = constant, $r \gg 2m$, the coordinate $t$
corresponds very closely to time as measured by these observers. We thus conclude that to
such observers the particle never actually appears to reach $r = 2m$ (and hence presumably
cannot pass through $r = 2m$).
However, whether or not the particle actually reaches and crosses $r = 2m$ depends not
on the values of $t$, but on time as measured by a clock falling with the particle. It follows
from (4.53) that
\[ \lim_{r\to 2m} \tau = -\frac{4m}{3} + \text{constant}, \]
i.e. $r$ approaches $2m$ arbitrarily closely along the geodesic in a finite time (and hence
presumably passes through $r = 2m$).
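The contrast between the two notions of time can be made concrete by evaluating (4.53) and (4.54) numerically as the particle approaches $r = 2m$. The sketch below is illustrative: it works in units with $m = 1$, sets both additive constants to zero, starts the clock at $r = 10m$ (an arbitrary choice), and the function names are not from the notes.

```python
import math


def proper_time(r, m=1.0):
    """Equation (4.53) with the additive constant set to zero."""
    return -(4.0 * m / 3.0) * (r / (2.0 * m)) ** 1.5


def coordinate_time(r, m=1.0):
    """Equation (4.54) with the additive constant set to zero (valid for r > 2m)."""
    x = math.sqrt(r / (2.0 * m))
    return (-(4.0 * m / 3.0) * x**3 - 4.0 * m * x
            + 2.0 * m * math.log((x + 1.0) / (x - 1.0)))


m = 1.0
r_start = 10.0 * m   # arbitrary starting radius for the comparison
for delta in (1e-2, 1e-4, 1e-6, 1e-8):
    r = 2.0 * m * (1.0 + delta)
    d_tau = proper_time(r, m) - proper_time(r_start, m)
    d_t = coordinate_time(r, m) - coordinate_time(r_start, m)
    print(f"r/2m - 1 = {delta:.0e}:  elapsed tau = {d_tau:8.4f} m,  elapsed t = {d_t:8.4f} m")
```

The elapsed proper time settles to a finite limit, while the elapsed coordinate time keeps growing, roughly logarithmically in the distance from $r = 2m$.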
It could happen, however, that particles which approach $r = 2m$ are destroyed by infinite
tidal accelerations, created by infinite spacetime curvature. To shed light on this, one must
consider the behaviour of scalars⁷ built from the curvature tensor, e.g. $R_{ijk\ell}R^{ijk\ell}$. One finds
that these scalars are finite at $r = 2m$, e.g.
\[ R_{ijk\ell}R^{ijk\ell} = \frac{48m^2}{r^6}. \tag{4.55} \]
[Some calculation is needed to verify this.]
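As a one-line illustration (not part of the original notes), evaluating (4.55) on the surface $r = 2m$ gives a finite value, whereas the same expression grows without bound as $r \to 0$:

```latex
\[
\left. R_{ijk\ell}R^{ijk\ell} \right|_{r=2m} = \frac{48m^2}{(2m)^6} = \frac{3}{4m^4},
\qquad
\lim_{r\to 0^+} \frac{48m^2}{r^6} = \infty .
\]
```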


It thus appears that the singularity in the line-element (4.46) at $r = 2m$ is not a physical
singularity, but is due to the breakdown of the $(r, \theta, \phi, t)$ coordinate system at $r = 2m$. We
thus look for a new coordinate system in which the line-element will be well-behaved at
$r = 2m$.

4.6.2 The Eddington-Finkelstein line-element

In order to eliminate the singularity at $r = 2m$ we seek a change of coordinates which
eliminates the $dr^2$ term in the Schwarzschild metric. In the flat case ($m = 0$) this metric is
\[ ds^2 = -dt^2 + dr^2 + r^2(d\theta^2 + \sin^2\theta\,d\phi^2). \]
To remove the $dr^2$ term simply define
\[ v = t + r. \]
Then
\[ ds^2 = -dv^2 + 2\,dv\,dr + r^2(d\theta^2 + \sin^2\theta\,d\phi^2). \]
For the non-flat case, we try a coordinate transformation of the form
\[ v = t + f(r). \]
The requirement that the $dr^2$ term vanish leads to a simple differential equation which has
the general solution
\[ f(r) = r + 2m\log\left(\frac{r}{2m} - 1\right) + \text{constant}. \]
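The "simple differential equation" can be made explicit as follows; this is a reconstruction of the omitted step, with a prime denoting $d/dr$. Writing $dt = dv - f'\,dr$ in the $r$, $t$ part of the Schwarzschild line-element gives

```latex
\begin{align*}
-\left(1-\frac{2m}{r}\right)(dv - f'\,dr)^2 + \left(1-\frac{2m}{r}\right)^{-1}dr^2
 &= -\left(1-\frac{2m}{r}\right)dv^2 + 2\left(1-\frac{2m}{r}\right)f'\,dv\,dr \\
 &\quad + \left[\left(1-\frac{2m}{r}\right)^{-1} - \left(1-\frac{2m}{r}\right)f'^2\right]dr^2 .
\end{align*}
```

The $dr^2$ coefficient vanishes when $f' = \pm(1 - 2m/r)^{-1}$. Choosing the positive sign, so that $v$ reduces to $t + r$ when $m = 0$, gives

```latex
\[
f' = \frac{r}{r-2m} = 1 + \frac{2m}{r-2m}
\quad\Longrightarrow\quad
f(r) = r + 2m\log(r - 2m) + \text{constant},
\]
```

which agrees with the stated solution after absorbing $2m\log 2m$ into the constant. With this choice the cross term becomes $2\,dv\,dr$.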

Thus we replace the $t$ coordinate by a coordinate $v$, defined by
\[ v = t + r + 2m\log\left(\frac{r}{2m}-1\right), \qquad r > 2m. \tag{4.56} \]
It follows that the line-element (4.46) assumes the form
\[ ds^2 = -\left(1-\frac{2m}{r}\right)dv^2 + 2\,dv\,dr + r^2(d\theta^2+\sin^2\theta\,d\phi^2). \tag{4.57} \]

⁷ An infinity in a particular component of $R_{ijk\ell}$ could merely reflect a breakdown of the coordinate system.

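In the new coordinate system $(v, r, \theta, \phi) \equiv (x^0, x^1, x^2, x^3)$ the metric components $g_{ab}$ are
given by
\[ (g_{ab}) = \begin{pmatrix} -\left(1-\frac{2m}{r}\right) & 1 & 0 & 0 \\ 1 & 0 & 0 & 0 \\ 0 & 0 & r^2 & 0 \\ 0 & 0 & 0 & r^2\sin^2\theta \end{pmatrix}, \]
and the determinant of the metric is
\[ \det(g_{ab}) = -r^4\sin^2\theta. \tag{4.58} \]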

Equation (4.55) for the curvature scalar $R_{ijk\ell}R^{ijk\ell}$ remains valid.
The line-element (4.57) is referred to as the Eddington-Finkelstein line-element. For
$r > 2m$, it is equivalent to the Schwarzschild line-element (4.46), since it was obtained from
(4.46) by a coordinate transformation. The metric components (4.58) thus satisfy $R_{ab} = 0$
for $r > 2m$ (tensor equation). However, and this is the important difference, the line-element
(4.57) is well-behaved [i.e. the components are at least $C^2$ functions, and $\det(g_{ab}) \neq 0$] not
only for $r > 2m$, but for all values of $r$ satisfying $r > 0$. In addition, it satisfies the vacuum
field equations $R_{ab} = 0$ for all $r > 0$.
We will refer to the spacetime determined by the line-element (4.57), with $r > 0$, as
the Eddington-Finkelstein spacetime. It contains the Schwarzschild spacetime as a proper
subset, namely those events that satisfy $r > 2m$.
The Eddington-Finkelstein spacetime is best studied by considering its radial null geodesics,
i.e. null geodesics which satisfy
\[ \theta = \text{constant}, \qquad \phi = \text{constant}. \tag{4.59} \]
With (4.59), the line-element (4.57) reduces to
\[ ds^2 = -\left(1-\frac{2m}{r}\right)dv^2 + 2\,dv\,dr. \]

The null geodesics of this line-element fall into two classes (exercise):
Class 1:
\[ v = C \tag{4.60} \]
Class 2:
(a) $v = C + 2[r + 2m\log(r - 2m)]$, $r > 2m$,
(b) $v = C + 2[r + 2m\log(2m - r)]$, $0 < r < 2m$,
(c) $r = 2m$, (4.61)
where $C$ is an arbitrary constant.

Figure 4.9: Null coordinates in two-dimensional flat spacetime. (Diagram label: $v$ = constant (a null geodesic).)
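The two branches (a) and (b) of Class 2 can be checked numerically against the reduced line-element: along a curve $v(r)$ the null condition reads $-(1 - 2m/r)(dv/dr)^2 + 2\,dv/dr = 0$. The sketch below is illustrative only, with $m = 1$, $C = 0$, a central-difference derivative and hypothetical function names; Class 1 ($v$ = constant) and case (c) ($r = 2m$) satisfy $ds^2 = 0$ by inspection.

```python
import math


def class2_outer(r, m=1.0, c=0.0):
    """Class 2(a): v = C + 2[r + 2m log(r - 2m)], valid for r > 2m."""
    return c + 2.0 * (r + 2.0 * m * math.log(r - 2.0 * m))


def class2_inner(r, m=1.0, c=0.0):
    """Class 2(b): v = C + 2[r + 2m log(2m - r)], valid for 0 < r < 2m."""
    return c + 2.0 * (r + 2.0 * m * math.log(2.0 * m - r))


def null_residual(v_of_r, r, m=1.0, h=1e-6):
    """-(1 - 2m/r) (dv/dr)^2 + 2 dv/dr along v(r); zero for a null curve."""
    dv_dr = (v_of_r(r + h, m) - v_of_r(r - h, m)) / (2.0 * h)
    return -(1.0 - 2.0 * m / r) * dv_dr**2 + 2.0 * dv_dr


outer = max(abs(null_residual(class2_outer, r)) for r in (2.5, 3.0, 5.0, 20.0))
inner = max(abs(null_residual(class2_inner, r)) for r in (0.5, 1.0, 1.5))
print(f"largest residual for r > 2m:     {outer:.1e}")
print(f"largest residual for 0 < r < 2m: {inner:.1e}")
```

Both residuals vanish to within the accuracy of the finite difference. Note also that $dv/dr = 2r/(r - 2m)$ is positive for $r > 2m$ and negative for $0 < r < 2m$, which is the analytic counterpart of the statement that photons emitted inside $r = 2m$ cannot move outwards.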


It is essential to sketch the null geodesics in order to understand the geometry. In order to
make a sketch one has to choose a convention for drawing the coordinate lines $r$ = constant,
$v$ = constant. A natural choice can be motivated by considering the flat space limit, with $\theta$, $\phi$
held constant. The original $t$ and $r$ ($\equiv$ radial distance) are usually represented as Cartesian
coordinates. Since $v = t + r$, the lines $v$ = constant have slope $-1$, and hence in the flat
space limit, $v$ and $r$ are oblique Cartesian coordinates, as drawn in Figure 4.9.
For the curved spacetime diagram, we thus choose to represent $r$, $v$ as oblique Cartesian
coordinates. This leads to the spacetime diagram shown in Figure 4.10.
Remark 4.4:
(1) The arrows indicate the future direction along all curves drawn. The null cones drawn
are the future half cones. We see that photons emitted in the region $0 < r < 2m$
cannot escape into the region $r > 2m$. This means that an observer in the region
$r > 2m$ cannot observe anything that occurs in the region $0 < r < 2m$.
(2) At $r = 0$, we have what is called a curvature singularity, since the curvature of spacetime becomes unbounded there. This is indicated by the fact that the polynomial
curvature scalar $R_{abcd}R^{abcd}$ becomes unbounded there (see equation (4.55)).

4.6.3 Model of a collapsing star

For a sufficiently large spherically symmetric star which has completed nuclear burning, the
inward pull of gravity dominates outward pressures due to the structure of the matter to the
extent that the star cannot reach a final equilibrium state, and thus collapses. This process
is called continued gravitational collapse. For a spherically symmetric star, the spacetime
exterior to the star will be the Schwarzschild, or equivalently the Eddington-Finkelstein
spacetime. Eventually, the surface of the collapsing star will pass through the hypersurface
$r = 2m$ in the Eddington-Finkelstein spacetime, after which the star will be cut off from
external observers. In order to represent this situation in a diagram like Figure 4.10, we note
that the effect of fixing the $\theta$, $\phi$ coordinates is to restrict consideration to a single particle
on the star's surface. The worldline of this particle will be a timelike curve in Figure 4.11,
which intersects $r = 2m$.
Remarks 4.5:
1) The events $r = 2m$, which form a line in the above diagram, form a hypersurface in
the 4-d Eddington-Finkelstein spacetime, called the Schwarzschild event horizon. The
vacuum region of spacetime inside the event horizon is called the Schwarzschild black
hole.
2) The name "black hole" arose from a consideration of the optical appearance, as seen
by a distant observer, of a spherically symmetric star which collapses through the
event horizon. An external observer, i.e. one who remains in the region $r > 2m$, will
never see the surface of the star pass through the event horizon. In fact, a preferred
observer (i.e. one whose world line is $r$ = constant $> 2m$, $\theta, \phi$ = constant) will continue
to receive light signals from the surface of the star indefinitely, but this will be light
emitted by the star before passing through $r = 2m$. The light received by the preferred
observer will, however, have a red shift which increases without bound as $r \to 2m^+$.
Consequently the observed luminosity of the star decreases to zero very rapidly. Thus
although the surface of the star in principle never disappears from sight, in practice
it rapidly becomes so faint as to be invisible, i.e. it becomes "black" to an external
observer [see Misner, Thorne and Wheeler (1972) pages 862-875]. One is thus left
with an object which is optically invisible, but which still produces a Schwarzschild
gravitational field, i.e. spacetime still has Schwarzschild geometry outside the event
horizon.
3) Any observer who falls through the event horizon can never escape: no matter how
strongly he accelerates he will be pulled into the curvature singularity at $r = 0$ and
destroyed, within a finite time as measured by his clock.
In Figure 4.12 we now give a 3-d version of Figure 4.11, obtained by rotating the diagram in Figure 4.11 through $2\pi$ radians about $r = 0$, thereby representing the angular
coordinate $\phi$.
4) For all spherical configurations of matter in equilibrium, the value of the Schwarzschild
radius $r = 2m$ is (much) less than the physical radius of the object, and hence an event
horizon does not form (the Schwarzschild solution is only valid exterior to the body;
inside the body the non-vacuum field equations must be satisfied by the metric
components). The following table gives some values:
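Object                 Mass (g)                 Radius (cm)          Schwarzschild radius (cm)
proton                 $1.67\times10^{-24}$     $10^{-13}$           $2.4\times10^{-52}$
earth                  $6\times10^{27}$         $6.4\times10^{8}$    $0.88$
sun                    $2\times10^{33}$         $7\times10^{10}$     $2.9\times10^{5}$
typical neutron star   $2\times10^{33}$         $10^{6}$             $2.9\times10^{5}$
typical galaxy         $10^{45}$                $10^{23}$            $10^{17}$

4.6.4 Black holes and observations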

The observational search for black holes is based on two theoretical results:
1) Matter, when attracted into a black hole, will be sufficiently heated, due to the large
amount of kinetic energy imparted to it, as to emit X-rays.
2) Compact ($R \lesssim 1000$ km) stars in equilibrium are either white dwarfs, with mass
\[ M \lesssim 1.2\,M_\odot \quad \text{[Chandrasekhar mass limit]} \]
or neutron stars, with mass
\[ M \lesssim 3\,M_\odot. \]
Observational procedure:
1. Find a compact X-ray source in the sky, e.g.
Cygnus X-1.
2. Hopefully identify it with an optical binary, e.g.
HDE226868, in the case of Cygnus X-1.
3. Estimate the mass of the optically unseen compact companion, which is the X-ray
source, e.g.
\[ M \gtrsim 9\,M_\odot \]
in the present example.
4. If the mass exceeds the white dwarf and neutron star mass limits, then by elimination,
the unseen companion must be a black hole.
These observations are still in their infancy (post 1972) and hence one cannot be too
dogmatic about any results to date.
Ref. e.g. C.T. Bolton (1975): Orbital Elements and an Analysis of Models for HDE226868
= Cygnus X-1, Astrophysical Journal, 200, 269-277.

Figure 4.10: Eddington-Finkelstein representation of the worldline of an infalling particle,
and the two families of radial null geodesics. (Diagram labels: WL of infalling particle (TL geodesic); curvature singularity, $r = 0$; $v$ = constant; $r$ = const $< 2m$; $r = 2m$; $r$ = const $> 2m$.)

Figure 4.11: Eddington-Finkelstein representation of spherical gravitational collapse, with
$\theta$ and $\phi$ fixed. (Diagram labels: black hole; event horizon ($r = 2m$); curvature singularity ($r = 0$); WL of centre of star; non-vacuum spacetime; light from surface of the star received by an external observer; WL of preferred observer ($r = r_0 > 2m$); WL of a particle in star's surface ($\theta, \phi$ = constant).)

Figure 4.12: Eddington-Finkelstein representation of spherical gravitational collapse showing
three spacetime dimensions ($\theta = \pi/2$). (Diagram labels: black hole; event horizon; light signals which escape; surface of star as it passes through the event horizon; WLs of particles in the surface of the star; interior of star.)

Chapter 5
An Introduction to Relativistic Cosmology

5.1 Introduction

Cosmology . . . is it pure speculation unfettered by observation? OR is it physics?
Fowler (1969): "Cosmology is mostly a dream of zealots, who would oversimplify at the
expense of understanding."
Bondi (1952): "The aim of this book is to present cosmology as a branch of physics in its
own right."

5.1.1 Aims of Cosmology

The aim of cosmology is firstly to determine the large-scale structure and evolution in time
of the universe, by constructing models of the universe whose parameters can be determined
by observation. Such a model may also lead to predictions which can be used to test the
underlying theory. Having determined the structure of the universe, a second aim of cosmology is to explain why the universe should have this structure. It should be stressed that
cosmology is not primarily concerned with the detailed physical processes that occur in the
universe, such as galaxy formation, nucleosynthesis, origin of life etc. (although some of
these processes do have cosmological implications). Instead cosmology attempts to provide
the underlying framework in which these physical processes take place.

5.1.2 Unique Difficulties

Cosmology is made more difficult than conventional physics by the fact that we observe the
universe only from one position in space and time (from the large scale cosmological point
of view) i.e. we are unable to choose the time or position from which we view the universe.
From the space-time point of view, we receive signals only along one past light cone. [Extragalactic observations have been made for approximately 60 years, whereas the age of the
universe is of the order of $10^{10}$ years. Thus on a cosmological time scale, our observations are
made at a single instant in time.] So the aims of cosmology are really quite preposterous. . .
by catching a few photons, we propose to determine the large scale structure of the whole
universe! It is not surprising that strong simplifying assumptions have to be made.

Figure 5.1: The spacetime representation of cosmological observations. (Diagram labels: $t = t_0$, the current epoch; $O$; past null cone at $O$; WL of a distant galaxy; WL of our galaxy.)

An additional difficulty is that there is only one universe to observe, so we cannot infer
its probable nature by comparing it to similar objects. Thus one has far less freedom in
obtaining information about the object to be studied than in conventional physics.

5.1.3 The Starting Point

The starting point for cosmology is


1) a knowledge of local (experimentally tested) physical laws, and
2) a knowledge of the local astronomy of the universe, i.e. the organization of matter into
stars, star clusters, galaxies and clusters of galaxies.
In other words, the aim of cosmology is to use knowledge of a local (i.e. small scale)
nature to deduce information about the global (i.e. large scale) nature of the universe.

5.1.4 An Unverifiable Hypothesis

The known laws of physics have inevitably been tested only in our immediate vicinity. In any
program of cosmology we are compelled to make an hypothesis about the extent of validity
of these laws. In particular we must assume
1) that these laws of physics are valid locally in any other region of the universe where
it makes sense to apply them (e.g. identical stars in different regions will evolve in an
identical manner) [see Ellis (1975)],
2) that the local laws of physics, when applied on the largest scales, will correctly predict
the large scale structure of the universe.
It appears that these two assumptions form an hypothesis which is essentially unverifiable.
As regards assumption (2), since gravity is the longest range force, it is assumed that
the large-scale structure of the universe is governed by gravity (except possibly at very early
times).

5.1.5 The Choice of a Theory of Gravity

We must clearly choose a theory of gravity which is compatible with local observations
and experiments. It seems natural to choose the simplest of such theories, namely general
relativity theory (GRT). So our framework for studying the universe will be spacetime, a
four-dimensional manifold on which is defined a Lorentzian metric tensor g. The metric
tensor in turn determines the propagation of photons (i.e. electromagnetic signals) which is
fundamental for observations in cosmology.
We should mention that this framework covers most (all?) viable gravitational theories.
The differences between the theories lie in the field equations or in extra fields that are
introduced. It is thus useful, when developing the cosmological program, to keep track of
exactly where the field equations are used, since this may lead to a test of GRT, modulo the
unverifiable hypotheses of Section 5.1.4, and any other hypotheses that may be made.
So in Sections 5.2-5.6, we will examine the consequences of the fundamental simplifying
assumption of cosmology, namely, the Cosmological Principle, independently of the Einstein
field equations. This leads to the Friedmann-Robertson-Walker line-element, which enables
one to interpret the basic cosmological observations. Then in 3 we impose the Einstein field
equations and consider to what extent this leads to a unique model of the universe.

5.2 The Friedmann-Robertson-Walker Line-Element

5.2.1 The Fundamental Timelike Congruence

From a large-scale viewpoint, one can treat the galaxies in the universe as particles of
a gas or fluid, which fills the universe. We will ignore the internal structure of these
particles (e.g. stars, globular clusters). The particles cluster on a small scale (clusters of
galaxies $\sim 3\times10^{7}$ light years) but we shall ignore this clustering. To simplify matters further,
we ignore the particulate nature of the gas, by treating it in the perfect fluid approximation.
We imagine the universe as filled with fundamental observers, defined to be observers
who see the galaxies in their neighbourhood as having zero mean motion. The set of all
worldlines of the fundamental observers, one through each event, is said to form a congruence
of timelike curves. Let $u$ denote the 4-velocity of the fundamental observers, i.e. $u$ is the unit
tangent vector to the congruence of timelike curves. In practice, we think of particular curves
of this congruence as being the worldlines of particular galaxies, although strictly speaking,
the congruence only describes the average smoothed out motion of the galaxies.

Figure 5.2: Worldlines of fundamental observers and their future light cones. (Diagram label: 4-velocity $u$.)

We will also assume that there is a family of hypersurfaces which are orthogonal to
the 4-velocity u. These hypersurfaces will subsequently be interpreted as hypersurfaces of
simultaneity. We shall label these hypersurfaces t = const., and will use t as the timelike
coordinate.
In order to assign spatial coordinates we first assign spatial coordinates to events on one
hypersurface $t = t_0$. Any event $P$ with $t \neq t_0$ corresponds to a unique event $P_0$ on $t = t_0$,
which is determined by the fundamental worldline through $P$. The event $P$ is then given the
same spatial coordinates as the event $P_0$. Thus the fundamental worldlines are given by $x^\alpha =$
constant, and (by choice) are orthogonal to the hypersurfaces $t =$ const. It follows that
relative to these coordinates (called comoving coordinates), the metric tensor components
satisfy
$$g_{0\alpha} = 0, \qquad \alpha = 1, 2, 3.$$

Hence the line-element and fundamental 4-velocity have the form
$$ds^2 = -V^2\,dt^2 + g_{\alpha\beta}\,dx^\alpha dx^\beta, \tag{5.1}$$
$$u^a = \frac{1}{V}\,\delta^a_0, \tag{5.2}$$
where $V$ and the $g_{\alpha\beta}$ are functions of $t$ and the $x^\alpha$.

Figure 5.3: The hypersurfaces of constant time.


In order to proceed further we need to introduce the Cosmological Principle.

5.2.2 The Cosmological Principle

The Cosmological Principle consists of the following assumptions.


I: The universe is isotropic about our galaxy. More precisely, at any event on the fundamental worldline corresponding to our galaxy, all directions orthogonal to the world line
are equivalent, e.g. number counts of distant galaxies and radio sources are independent
of direction.
II: The universe is spatially homogeneous. More precisely, all events on any fundamental
hypersurface $t =$ const. are equivalent as regards the values of physical and geometric
quantities.
Remarks 5.1:
1) Of course both of these assumptions are intended to apply on a sufficiently large scale;
the night sky is certainly not isotropic to the human eye.
2) Assumption I is supported to a fair degree of accuracy by direct observations (more
later). On the other hand, II is very difficult to test directly.

Figure 5.4: Each fundamental observer sees the same pattern of anisotropy.

3) Assumptions I and II are together equivalent to the assumption that the universe is
isotropic about any fundamental world line.
4) One can conceive of a universe which is spatially homogeneous but not isotropic about
any fundamental world line. Spatial homogeneity would imply that each fundamental
observer would see the same anisotropies. For example the arrows in Figure 5.4 could
represent velocities of recession (and hence redshifts) of other galaxies. Each fundamental observer would see the same anisotropy in the (relative) velocities of recession.
5) One can also conceive of a universe which is isotropic about one fundamental world line,
but is not spatially homogeneous, e.g. in Figure 5.5 we have isotropy about $O_1$, but not
about $O_2$, and hence the events $P$ and $Q$ are not equivalent, i.e. the universe is not
spatially homogeneous.
6) The model in (5) places our galaxy in a preferred position in the universe. This violates
the so-called Copernican Principle.

Copernican Principle:
II′: Our galaxy is not in a preferred position in the universe (on a large scale).
Note that I plus II′ implies isotropy about each fundamental world line, which in turn
implies spatial homogeneity. So the Copernican Principle is incorporated in the Cosmological Principle.
Here are some quotes pertaining to the status of the Cosmological Principle.

Figure 5.5: Observer $O_1$ sees an isotropic universe, but any other observer sees anisotropy.

Weinberg (1972): Antianthropocentrism has been incorporated into the scientific
mentality, and no one now would seriously suggest that the earth, or the solar
system, or our galaxy, or our local group of galaxies, occupies any specially
favoured position in the cosmos (p. 407).
BUT. . .
Carter (1974) Copernicus taught us the very sound lesson that we must not
assume gratuitously that we occupy a privileged central position in the Universe.
Unfortunately there has been a strong (not always subconscious) tendency to
extend this to a most questionable dogma to the effect that our situation cannot
be privileged in any sense.
The Copernican principle should be contemplated in conjunction with the Anthropic
Principle.
The Anthropic Principle:
The universe can only be observed from places where it is possible for life to evolve.
So we are faced with the question:
The Cosmological Principle. . .
is it fundamental to all cosmological theories
OR
is it to be accepted until we find out more?

Weinberg (1972) The real reason, though, for our adherence here to the Cosmological Principle is not that it is surely correct, but rather that it allows us to
make use of the extremely limited data provided to cosmology by observations.
(p. 408)
Hawking and Ellis (1973) We are not able to make cosmological models without
some admixture of ideology.
So, with some reservations, we use the Cosmological Principle to determine the large
scale geometry of the Universe.

5.3 Derivation of the Friedmann-Robertson-Walker (FRW) Line-element

By assumption I, spacetime is spherically symmetric about our galaxy. If we introduce
spherical spatial coordinates $r, \theta, \phi$ with the world line of our galaxy given by $r = 0$, the
line-element (5.1) will simplify to
$$ds^2 = -V^2\,dt^2 + e^{2\lambda}\,dr^2 + e^{2\nu}(d\theta^2 + \sin^2\theta\, d\phi^2), \tag{5.3}$$
with $V, \lambda, \nu$ being functions of $r$ and $t$ only.


By assumptions I and II the universe is isotropic about each fundamental world line.
This implies that the fundamental world lines must be geodesics (i.e. zero 4-acceleration),
since otherwise the 4-acceleration
$$a^a = \frac{Du^a}{D\tau}$$
would define a preferred direction orthogonal to $u^a$, in violation of isotropy. We can write
$$\frac{Du^a}{D\tau} = u^a{}_{;b}\,\frac{dx^b}{d\tau} = u^a{}_{;b}\,u^b.$$
The geodesic condition
$$u^a{}_{;b}\,u^b = 0$$
in conjunction with (5.3) and (5.2) is easily shown to imply
$$V = V(t)$$
(exercise). The $t$-coordinate can now be redefined according to $\tilde t = f(t)$ to obtain $V = 1$.
We make this transformation, and then drop the tilde on $t$.
The $t$-dependence in the metric functions $\lambda$ and $\nu$ leads to a change in the spatial distance
between fundamental observers as time evolves. The Cosmological Principle strongly restricts
this $t$-dependence as follows:

Figure 5.6: The distance between three neighbouring fundamental observers.

Consider 3 neighbouring fundamental observers with worldlines as indicated in Figure 5.6.
Let $\ell(P_0, R_0)$ etc. denote the relevant spatial distances. Isotropy of the motion about the
world line $P_0P$ requires that
$$\frac{\ell(P_0, R_0)}{\ell(P, R)} = \frac{\ell(P_0, Q_0)}{\ell(P, Q)}.$$
On using the line-element (5.3), in the limit as $\delta r, \delta\phi \to 0$, this condition yields
$$\frac{e^{\lambda(r,t_0)}}{e^{\lambda(r,t)}} = \frac{e^{\nu(r,t_0)}}{e^{\nu(r,t)}}, \qquad \text{for all } r, t,$$
which implies
$$e^{\nu(r,t)} = f(r)\,e^{\lambda(r,t)}.$$
Uniformity of the motion (i.e. equivalence of the events $P_0$ and $R_0$) in Figure 5.7 implies
$$\frac{\ell(R_0, S_0)}{\ell(R, S)} = \frac{\ell(P_0, Q_0)}{\ell(P, Q)}.$$
Use of (5.3) yields
$$\frac{e^{\lambda(r_0,t_0)}}{e^{\lambda(r_0,t)}} = \frac{e^{\lambda(r,t_0)}}{e^{\lambda(r,t)}},$$
for all $r, t$ and hence
$$e^{\lambda(r,t)} = A(r)\,S(t).$$


Finally, by redefining the $r$-coordinate [according to $\tilde r = h(r)$ and then dropping the
tilde], we can set $A(r) = 1$. Thus we have
$$e^{\lambda(r,t)} = S(t), \qquad e^{\nu(r,t)} = f(r)\,S(t),$$
and the line-element (5.3) becomes
$$ds^2 = -dt^2 + S(t)^2\left[dr^2 + f(r)^2(d\theta^2 + \sin^2\theta\, d\phi^2)\right]. \tag{5.4}$$

Figure 5.7: The distances between two pairs of fundamental observers.

The final consequence of assumptions I and II is that the fundamental hypersurfaces orthogonal to the fundamental worldlines must be spaces of constant curvature, since otherwise
isotropy and homogeneity would be violated. The metric induced on these hypersurfaces $t =$
const. is
$$ds^2_{(3)} = S(t)^2\left[dr^2 + f(r)^2(d\theta^2 + \sin^2\theta\, d\phi^2)\right]. \tag{5.5}$$
In order that this metric be regular at $r = 0$, we must have
$$f(0) = 0 \tag{5.6}$$
(otherwise as $r \to 0^+$, the 2-spheres $r =$ const. will not shrink to a point). A necessary and
sufficient condition for constant curvature is that the Riemann-Christoffel tensor be of the
form
$$R_{\alpha\beta\gamma\delta} = K(g_{\alpha\gamma}g_{\beta\delta} - g_{\alpha\delta}g_{\beta\gamma}), \qquad \alpha, \beta, \gamma, \delta = 1, 2, 3,$$
where $K$ is a constant as regards spatial dependence, but in this situation will depend on $t$,
i.e. on the particular fundamental hypersurface. [See Synge and Schild 1964, pages 111-113.]
This condition implies
$$R_{\alpha\beta} = 2Kg_{\alpha\beta}. \tag{5.7}$$
A straightforward calculation using the line-element (5.5) yields
$$R_{11} = -\frac{2f''}{f}, \qquad R_{22} = -ff'' - f'^2 + 1, \qquad R_{33} = \sin^2\theta\, R_{22}.$$
Thus, equation (5.7) yields the conditions
$$-\frac{2f''}{f} = 2KS^2, \qquad -ff'' - f'^2 + 1 = 2Kf^2S^2,$$
which must hold for all values of $t$. Thus $KS^2$ must be a constant, and by rescaling $S(t)$, $r$
and $f(r)$ [see equation (5.5)], we can set
$$KS^2 = k = \pm 1, \text{ or } 0, \tag{5.8}$$
depending on whether $K$ is positive, negative or zero. The differential equations for $f(r)$
become
$$f'' + kf = 0, \qquad f'^2 + kf^2 = 1, \tag{5.9}$$
after eliminating $f''$ from the second equation. The first equation has the general solution
$$f(r) = \begin{cases} \alpha r + \beta, & k = 0 \\ \alpha\sin r + \beta\cos r, & k = +1 \\ \alpha\sinh r + \beta\cosh r, & k = -1. \end{cases}$$
The boundary condition (5.6) implies $\beta = 0$ in each case, and the remaining equation (5.9)
implies $\alpha^2 = 1$, i.e. $\alpha = +1$ without loss of generality.
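
As a quick numerical sanity check (not part of the derivation), the following sketch evaluates the two equations (5.9) by finite differences for the three solutions $f(r) = r$, $\sin r$, $\sinh r$. The function names `f` and `residuals`, the step size and the sample point $r = 0.7$ are illustrative choices only.

```python
import math

def f(r, k):
    """Radial function f(r) of the FRW metric for curvature parameter k."""
    if k == 0:
        return r
    if k == 1:
        return math.sin(r)
    if k == -1:
        return math.sinh(r)
    raise ValueError("k must be -1, 0 or +1")

def residuals(r, k, h=1e-4):
    """Finite-difference residuals of f'' + k f = 0 and f'^2 + k f^2 = 1."""
    f0 = f(r, k)
    fp = (f(r + h, k) - f(r - h, k)) / (2 * h)          # central difference for f'
    fpp = (f(r + h, k) - 2 * f0 + f(r - h, k)) / h**2   # central difference for f''
    return fpp + k * f0, fp**2 + k * f0**2 - 1

for k in (-1, 0, 1):
    # f(0) = 0 is the regularity condition (5.6); both residuals should be near zero
    print(k, f(0.0, k), residuals(0.7, k))
```

Both residuals vanish up to discretization and rounding error, and $f(0) = 0$ holds in each case.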
To summarize, we have shown that the Cosmological Principle implies that the spacetime
metric and fundamental 4-velocity $u$ have the form
$$ds^2 = -dt^2 + S(t)^2\left[dr^2 + f(r)^2(d\theta^2 + \sin^2\theta\, d\phi^2)\right], \qquad u^a = \delta^a_0, \tag{5.10}$$
where
$$f(r) = \begin{cases} r, & k = 0 \\ \sin r, & k = +1 \\ \sinh r, & k = -1. \end{cases}$$
The parameter $k$ determines the sign of the curvature of the fundamental hypersurfaces
$t =$ const.: zero, positive or negative depending on whether $k = 0$, $+1$ or $-1$ respectively.
The curvature $K$ is given by (5.8), i.e.
$$K = \frac{k}{S^2}.$$
The line-element (5.10) is referred to as the Friedmann-Robertson-Walker (FRW) line-element. The arbitrariness in the FRW line-element is the function $S(t)$ and the curvature
parameter $k = \pm 1, 0$.
The function $S(t)$ determines the distance between fundamental observers, at any specified time $t$. Let us evaluate the distance $\ell(t)$ between our galaxy ($r = 0$) and any other
galaxy $r = r_1$, $\theta = \theta_1$, $\phi = \phi_1$, as shown in Figure 5.8. Join the events $O$ and $P$ by the curve
$\theta = \theta_1$, $\phi = \phi_1$, lying in the hypersurface $t =$ const. This curve is in fact a geodesic of the
spatial metric (exercise). The distance $\ell(t)$ between the two galaxies at time $t$ is defined to
be the arc length along $\theta = \theta_1$, $\phi = \phi_1$. We thus obtain
$$\ell(t) = \int_O^P \sqrt{g_{\alpha\beta}\,\frac{dx^\alpha}{d\sigma}\frac{dx^\beta}{d\sigma}}\; d\sigma.$$

Figure 5.8: Distance between two fundamental observers, using spherical coordinates.

If we use $r$ as parameter, this equation simplifies to
$$\ell(t) = \int_{r=0}^{r_1} S(t)\,dr,$$
and hence
$$\ell(t) = S(t)\,r_1. \tag{5.11}$$
Thus if $\dot S(t) > 0$ we will have an expanding universe, i.e. a universe in which distances
between fundamental observers increase with time.
Because $S(t)$ determines spatial distances in this way, it is often referred to as the expansion factor of the universe, or the cosmic scale factor.
Comment: It is important to note that when $k = 0$, the expansion factor $S(t)$ is not uniquely
determined. If $k = 0$, two spacetimes with expansion factors $S(t)$, $\tilde S(t)$ related by
$\tilde S(t) = \lambda S(t)$, $\lambda =$ const., are equivalent, since $\lambda$ can be absorbed into the $r$-coordinate. This is not
possible when $k = \pm 1$.

5.4 The Spatial Geometry of FRW Universes

The FRW line-element (5.10) induces a Riemannian metric on each fundamental hypersurface
$t =$ const. This metric, which describes the spatial geometry of the model universe, is given
by
$$ds^2_{(3)} = S(t)^2\left[dr^2 + f(r)^2(d\theta^2 + \sin^2\theta\, d\phi^2)\right], \tag{5.12}$$
where
$$f(r) = \begin{cases} r, & k = 0 \\ \sin r, & k = +1 \\ \sinh r, & k = -1. \end{cases}$$
Here $\theta$ and $\phi$ have the usual ranges for spherical coordinates, $0 \le \theta \le \pi$, $0 \le \phi \le 2\pi$.
The possible values of $r$ ($r \ge 0$ necessarily) in fact depend on $k$. In studying the spatial
geometry, it is helpful to consider 2-surfaces defined by $r =$ const., which are 2-spheres of
surface area $4\pi S(t)^2 f(r)^2$. For $r = 0$, this area is zero for all $k$. For $k = 0, -1$, this area
increases monotonically as $r$ increases to $\infty$, while for $k = +1$, the area reaches a maximum
when $r = \pi/2$, and decreases to zero as $r \to \pi$.
Case 1: $k = 0$
The geometry is clearly Euclidean and $r$ takes on all positive values, i.e. $0 \le r < +\infty$.
Case 2: $k = +1$
In this case the spatial metric at time $t = t_0$ is
$$ds^2_{(3)} = b^2\left[dr^2 + \sin^2 r\,(d\theta^2 + \sin^2\theta\, d\phi^2)\right], \tag{5.13}$$
where $b = S(t_0)$. On comparison with (1.66), we see that this metric is the metric of a
3-sphere of radius $b$.
It is straightforward to show using (5.13) that the volume of the 3-sphere is
$$V = \iiint \sqrt{g}\; dr\, d\theta\, d\phi = 2\pi^2 b^3. \tag{5.14}$$
Note: Volume of a 3-sphere is analogous to surface area of a 2-sphere, which is analogous
to circumference of a circle; each quantity depends only on the intrinsic geometry and is
calculated as $\int \sqrt{g}$.
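
The volume integral (5.14) can be checked numerically. The sketch below assumes the coordinate ranges $0 \le r \le \pi$, $0 \le \theta \le \pi$, $0 \le \phi \le 2\pi$ and uses $\sqrt{g} = b^3 \sin^2 r\, \sin\theta$, read off from (5.13); the function name `three_sphere_volume` and the midpoint rule are illustrative choices.

```python
import math

def three_sphere_volume(b, n=2000):
    """Midpoint-rule estimate of the integral of sqrt(g) over r, theta, phi for metric (5.13)."""
    # sqrt(g) = b**3 * sin(r)**2 * sin(theta); the integrand does not depend on phi,
    # so the phi integral contributes a factor 2*pi.
    h = math.pi / n
    radial = sum(math.sin((i + 0.5) * h) ** 2 for i in range(n)) * h  # integral of sin^2 r over [0, pi]
    polar = sum(math.sin((j + 0.5) * h) for j in range(n)) * h       # integral of sin(theta) over [0, pi]
    return b**3 * radial * polar * 2.0 * math.pi

print(three_sphere_volume(1.0), 2.0 * math.pi**2)
```

The two printed numbers should agree to several decimal places, and the estimate scales as $b^3$, as (5.14) requires.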
Case 3: $k = -1$
Here the $r$ coordinate takes on all positive values. In this case the spatial geometry has
constant negative curvature, and in fact cannot be embedded in $E^4$, making it difficult to
visualize. It can however be embedded in flat Minkowski spacetime [ref. MTW (1972), p.
??].

Comment: When $k = 0, -1$, the fundamental hypersurfaces (i.e. 3-d space) are infinite in
extent. On the other hand, when $k = +1$, the fundamental hypersurfaces are finite in extent,
and their volume is given, according to (5.14) and (5.12) by
$$V(t) = 2\pi^2 S(t)^3. \tag{5.15}$$
Thus in this case, the function $S(t)$ can justifiably be called the radius of the universe.
But in all cases, the earlier terminology for $S(t)$, i.e. expansion factor or cosmic scale
factor, is appropriate.

5.5 The Cosmological Red-Shift and the Expansion of the Universe

Figure 5.9: Reception of light emitted by a distant galaxy.

In this section we relate the frequency shift of light from distant galaxies to the expansion
factor S(t). This leads to the first major result of relativistic cosmology.
We calculate the frequency shift by means of the general formula of Appendix B:
$$\frac{\nu_0}{\nu_e} = \frac{k_{(o)a}\,u^a_{(o)}}{k_{(e)a}\,u^a_{(e)}}, \tag{5.16}$$
where $\nu_e$ is the frequency of emission and $\nu_0$ is the observed frequency (on earth). We perform
the analysis with the observer being in our galaxy, but because of the spatial homogeneity
the result will apply to any fundamental observer. Because of spherical symmetry about our
world line, the null geodesic which describes the light signal will be radial, i.e. $\theta, \phi =$ const.
Hence by (5.10), the null geodesic satisfies
$$L = 0 = -\dot t^2 + S(t)^2\,\dot r^2,$$
and hence
$$\frac{dt}{dr} = -S(t),$$
since $r$ decreases as $t$ increases. Thus a tangent to the null geodesic will be
$$k^a = F\left(\delta^a_0 - S(t)^{-1}\,\delta^a_1\right), \tag{5.17}$$
where $(t, r, \theta, \phi) = (x^0, x^1, x^2, x^3)$. Here $F$ is a scale factor which must be chosen so that the
geodesic is affinely parameterized, i.e.
$$\frac{Dk^a}{D\lambda} = 0, \quad \text{or equivalently} \quad k^a{}_{;b}\,k^b = 0.$$
By the lemma of Appendix B, this will be satisfied if $k_a$ is a gradient. We have
$$k_a = -F\left(\delta^0_a + S(t)\,\delta^1_a\right).$$
By inspection, we see that $F = S(t)^{-1}$ is a suitable choice since then
$$k_a = -S(t)^{-1}\,\delta^0_a - \delta^1_a = -\frac{\partial}{\partial x^a}\left(\int \frac{dt}{S(t)} + r\right).$$
Again, by (5.10)
$$u^a = \delta^a_0$$
at all events. Thus $g_{ab}\,k^a_{(o)}u^b_{(o)} = -S(t_0)^{-1}$, and $g_{ab}\,k^a_{(e)}u^b_{(e)} = -S(t_e)^{-1}$, and the formula (5.16)
yields
$$\frac{\nu_0}{\nu_e} = \frac{S(t_e)}{S(t_0)}.$$
It follows that
$$\nu_0 < \nu_e \quad \text{if and only if} \quad S(t_0) > S(t_e),$$
i.e. the universe is expanding if and only if the light from distant galaxies is redshifted.
The usual observational frequency shift parameter $z$ is defined in terms of wavelength by
$$z = \frac{\lambda_0 - \lambda_e}{\lambda_e},$$
which implies
$$1 + z = \frac{S(t_0)}{S(t_e)}, \tag{5.18}$$
i.e. $z > 0$ corresponds to redshifts.
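
To illustrate (5.18), the following sketch computes $z$ for a given expansion factor. The power law $S(t) = t^{2/3}$ and the times $t_e = 1$, $t_0 = 8$ are hypothetical choices made only for the example; nothing in the derivation above singles them out.

```python
def redshift(S, t_e, t_0):
    """Redshift z from (5.18): 1 + z = S(t_0) / S(t_e)."""
    return S(t_0) / S(t_e) - 1.0

def example_S(t):
    """Hypothetical expansion factor, used only for illustration."""
    return t ** (2.0 / 3.0)

z = redshift(example_S, t_e=1.0, t_0=8.0)
frequency_ratio = 1.0 / (1.0 + z)   # nu_0 / nu_e = S(t_e) / S(t_0)
print(z, frequency_ratio)
```

Since this $S$ is increasing, the result is a redshift ($z > 0$), and the observed frequency is lower than the emitted one.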

Discussion:
We have established the following result. If we assume (i) the Cosmological Principle (CP),
(ii) that light propagates along null geodesics, and (iii) that clock time along a timelike (TL) curve
equals spacetime separation along the curve, then the observed$^1$ redshift of light from distant galaxies implies that the universe is expanding in the sense that distances between
fundamental observers are increasing with time, at the present epoch. In brief:
CP plus redshifts implies the universe is expanding.
It is of interest that if one only assumes spherical symmetry about our galaxy (i.e. not
the full CP), then one cannot conclude that the universe is expanding. Indeed, under these
circumstances the observed redshifts are compatible with a static universe, i.e. no expansion.
The redshifts are then interpreted as gravitational in origin, rather than as Doppler shifts
due to the velocity of recession of the galaxies [see Ellis, Maartens and Nel (1980)].

5.6 The Distance Red-Shift Relation

Our aim is to derive a relationship between redshift $z$ and the distance of the emitter. From
equation (5.18),
$$z = \frac{S(t_0) - S(t_e)}{S(t_e)}. \tag{5.19}$$
We expand $S(t)$ about $t_0$, using Taylor's formula:
$$S(t) = S(t_0) + \dot S(t_0)(t - t_0) + \tfrac{1}{2}\ddot S(t_0)(t - t_0)^2 + \cdots. \tag{5.20}$$
We introduce the constants
$$H_0 = \frac{\dot S(t_0)}{S(t_0)}, \qquad q_0 = -\frac{\ddot S(t_0)}{H_0^2\,S(t_0)}, \tag{5.21}$$
called Hubble's constant, and the deceleration parameter, respectively. $H_0$ has units of
(time)$^{-1}$, while $q_0$ is dimensionless. Clearly, the universe is expanding at the present epoch
if and only if $H_0 > 0$, and the rate of expansion is decreasing if and only if $q_0 > 0$.
In terms of these constants, (5.20) becomes
$$S(t) = S(t_0)\left[1 + H_0(t - t_0) - \tfrac{1}{2}q_0H_0^2(t - t_0)^2 + \cdots\right]. \tag{5.22}$$

Equation (5.19) for $z$ thus can be written
$$z = \frac{-H_0(t_e - t_0) + \tfrac{1}{2}q_0H_0^2(t_e - t_0)^2 + \cdots}{1 + H_0(t_e - t_0) + \cdots}.$$

$^1$ The largest observed redshift for a galaxy is $z = 0.46$, for the radio galaxy 3C295 in Boötes, e.g. for
the O II line, $\lambda_e = 3727$ Å, $\lambda_0 = 5447$ Å [Weinberg (1972), p. 447, MTW (1972), p. 775]. Recent progress
has yielded $z \sim 1$. See Kron (1982) for a survey. Quasars have much larger redshifts, the maximum being
$z = 2.88$ for 4C05.34 [MTW (1972), p. 761]. There is still some controversy over whether quasar redshifts
are cosmological in origin.

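On neglecting higher powers of $t_0 - t_e$, we obtain
$$z = H_0(t_0 - t_e)\left[1 + \tfrac{1}{2}q_0H_0(t_0 - t_e) + \cdots\right]\left[1 + H_0(t_0 - t_e) + \cdots\right],$$
which yields
$$z = H_0(t_0 - t_e)\left[1 + \left(\tfrac{1}{2}q_0 + 1\right)H_0(t_0 - t_e) + \cdots\right]. \tag{5.23}$$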

The final step is to express (t0 te ) in terms of the distance between our galaxy and the
emitting galaxy.
There are a variety of ways of measuring this distance, all of which agree to first order.
We shall use the one which is simplest mathematically, namely, the spatial distance along
the hypersurface t = t0 , as calculated using the spacetime metric, as shown in Figure 5.10.
Astronomers use a luminosity distance, or angular diameter (or area) distance, since these
are more directly related to observation. See, for example Weinberg (1972), pages 418-424,
Rindler (1969) p. 243, for more details.
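The spatial distance $\ell$ at $t = t_0$ is given by equation (5.11), i.e.
$$\ell = S(t_0)\,r_e.$$
We can relate $r_e$ to $t_0 - t_e$ by using the fact that the light signal propagates along a null
geodesic. From (5.17) we have
$$\frac{dr}{dt} = -\frac{1}{S(t)} \qquad \text{along the null geodesic},$$
and hence
$$\int_{t_e}^{t_0} \frac{dr}{dt}\,dt = -\int_{t_e}^{t_0} \frac{1}{S(t)}\,dt,$$
which implies (since $r = r_e$ at $t = t_e$ and $r = 0$ at $t = t_0$)
$$r_e = \int_{t_e}^{t_0} \frac{1}{S(t)}\,dt.$$
This integral can be evaluated approximately by using (5.22) and neglecting $(t_0 - t)^2$ and
higher powers:
$$\ell = S(t_0)\,r_e = \int_{t_e}^{t_0} \left[1 - H_0(t - t_0) + \cdots\right] dt = (t_0 - t_e) + \tfrac{1}{2}H_0(t_0 - t_e)^2 + \cdots.$$
This can be inverted to yield
$$t_0 - t_e = \ell - \tfrac{1}{2}H_0\ell^2 + \cdots.$$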
The expression (5.23) for $z$ thus becomes
$$z = H_0\ell\left[1 - \tfrac{1}{2}H_0\ell + \cdots\right]\left[1 + \left(\tfrac{1}{2}q_0 + 1\right)H_0\ell + \cdots\right],$$
i.e.
$$z = H_0\ell + \tfrac{1}{2}(q_0 + 1)(H_0\ell)^2 + O\!\left((H_0\ell)^3\right), \tag{5.24}$$
(ref. MTW, p. 781).

Figure 5.10: Spatial distance between the emitting and observing galaxies.


This is the distance-redshift relation, which holds in any FRW model. We stress that it
holds independently of the gravitational field equations. To first order, (5.24) reads
$$z \approx H_0\ell,$$
i.e. a linear distance-redshift relation, first discovered observationally by Hubble in 1929
[MTW p. 759, Weinberg (1972), p. 445]. Hubble's measurements only considered galaxies
out to $6 \times 10^6$ light years, whereas galaxies cluster on scales up to $3 \times 10^7$ light years, so in retrospect
his conclusion is not very convincing [Weinberg (1972), p. 445].
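
The second-order relation (5.24) can be compared with an exact calculation in a simple test case. The sketch below assumes the hypothetical expansion factor $S(t) = t^{2/3}$, for which (5.21) gives $H_0 = 2/(3t_0)$ and $q_0 = \tfrac{1}{2}$, and for which the integral for $r_e$ can be done in closed form. The function names are illustrative.

```python
def exact_z_and_distance(t_e, t_0=1.0):
    """Exact z and H0*l for the hypothetical model S(t) = t**(2/3)."""
    S = lambda t: t ** (2.0 / 3.0)
    H0 = 2.0 / (3.0 * t_0)                                  # S'(t0) / S(t0)
    r_e = 3.0 * (t_0 ** (1.0 / 3.0) - t_e ** (1.0 / 3.0))   # integral of dt / S(t) from t_e to t_0
    ell = S(t_0) * r_e                                      # (5.11) evaluated at t = t_0
    z = S(t_0) / S(t_e) - 1.0                               # (5.18)
    return z, H0 * ell

def series_z(h, q0):
    """Distance-redshift relation (5.24) truncated at second order, with h = H0*l."""
    return h + 0.5 * (q0 + 1.0) * h ** 2

z_exact, h = exact_z_and_distance(t_e=0.995 ** 3)   # chosen so that H0*l is close to 0.01
print(z_exact, h, series_z(h, q0=0.5))
```

For this small value of $H_0\ell$ the truncated series agrees with the exact redshift to third order in $H_0\ell$, while the linear term alone is noticeably less accurate.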
To summarize, the following assumptions:
1) spacetime with a Lorentzian metric, plus the clock hypothesis,
2) light signals propagate along null geodesics, and
3) the Cosmological Principle,
in conjunction with observations of the redshifts and distances of other galaxies enable us in
principle to determine $H_0$, Hubble's constant, and $q_0$, the deceleration parameter, using (5.24).
This gives us the first two terms in the Taylor expansion for the expansion factor $S(t)$.
But without additional constraints, these observations cannot fully determine $S(t)$ and the
curvature parameter $k$, i.e. these observations do not determine a unique FRW model.
The additional constraints are supplied by the Einstein field equations (EFEs), i.e. the
EFEs are needed to determine the dynamics of the universe. This forms the basis of the
next section. But first a few words on $H_0$.
The Hubble constant $H_0$:
$H_0$ has units of (time)$^{-1}$, and is conventionally expressed in (years)$^{-1}$. Alternatively, one
regards the redshift as a purely classical Doppler shift due to a velocity of recession $v$, i.e.
$$v = 1 - \frac{\lambda_e}{\lambda_0}.$$
Since $z = \frac{\lambda_0 - \lambda_e}{\lambda_e}$, we obtain, to first order in $v$, $z = v$, and so $v \approx H_0\ell$, i.e. $H_0$ is the
proportionality constant giving the dependence of velocity of recession on distance. Then
$H_0$ is specified in km/sec/Mpc, i.e. rate of change of velocity of recession with distance. Mpc
stands for megaparsec $\approx 3.26 \times 10^6$ light years, the standard distance unit in cosmology [parsec
has its origins in classical astronomy].
Hubble's initial value was $H_0 \approx 500$ km/sec/Mpc. Subsequent revisions of the distance
scale have revised this downward, so that current bounds are
$$30 < H_0 < 130 \text{ km/sec/Mpc},$$
with the most popular value being
$$H_0 \approx 55 \text{ km/sec/Mpc},$$
i.e. the velocity of recession increases by 55 km/sec for each megaparsec of distance from us.
This value of $H_0$ is equivalent to
$$H_0^{-1} \approx 1.8 \times 10^{10} \text{ years}. \tag{5.25}$$
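
The unit conversion behind (5.25) is easy to reproduce. The sketch below uses approximate conversion factors (kilometres per megaparsec and seconds per year) that are standard values but are not quoted in these notes; the function name is illustrative.

```python
KM_PER_MPC = 3.0857e19        # kilometres in one megaparsec (approximate)
SECONDS_PER_YEAR = 3.156e7    # seconds in one year (approximate)

def hubble_time_years(H0_km_s_Mpc):
    """Return 1/H0 in years for H0 given in km/sec/Mpc."""
    H0_per_second = H0_km_s_Mpc / KM_PER_MPC   # H0 in units of 1/sec
    return 1.0 / H0_per_second / SECONDS_PER_YEAR

print(f"{hubble_time_years(55.0):.2e} years")
```

For $H_0 = 55$ km/sec/Mpc this gives a value close to $1.8 \times 10^{10}$ years, in agreement with (5.25).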

At present $q_0$ is not known reliably; there is not even complete agreement that it is positive!
[See for example Liang & Sachs (1980) p. 337.]

5.7 The FRW Cosmological Models

5.7.1 The Einstein Field Equations applied to the FRW metric

In Section 5.2.1, we indicated that the matter content of the universe, considered on the
scale of galaxies, would be treated in the perfect fluid approximation. The mean motion
of this fluid is described by the 4-velocity $u^a$, and its gravitational effects are described by
the energy density $\mu$ and a scalar pressure $p$ (see Section 2.5). These quantities define the
energy-momentum tensor of the fluid according to (2.47):
$$T_{ab} = (\mu + p)u_au_b + pg_{ab}. \tag{5.26}$$
We now have to simplify the non-vacuum Einstein field equations (3.14)-(3.15):
$$G_{ab} = 8\pi T_{ab}, \tag{5.27}$$
for the FRW line-element
$$ds^2 = -dt^2 + S(t)^2\left[dr^2 + f(r)^2(d\theta^2 + \sin^2\theta\, d\phi^2)\right], \tag{5.28}$$
with
$$f(r) = \begin{cases} r, & k = 0 \\ \sin r, & k = +1 \\ \sinh r, & k = -1. \end{cases}$$
The 4-velocity is $u^a = \delta^a_0$ [see equation (5.10)]. A straightforward calculation yields
$$G^0{}_0 = -3\frac{\dot S^2}{S^2} - \frac{3k}{S^2}, \qquad G^1{}_1 = G^2{}_2 = G^3{}_3 = -2\frac{\ddot S}{S} - \frac{\dot S^2}{S^2} - \frac{k}{S^2}.$$
On the other hand equations (5.26) and (5.28) yield
$$T^0{}_0 = -\mu, \qquad T^1{}_1 = T^2{}_2 = T^3{}_3 = p.$$

Thus the Einstein field equations (5.27) are equivalent to
$$\frac{\dot S^2}{S^2} + \frac{k}{S^2} = \frac{8\pi}{3}\mu, \tag{5.29}$$
and
$$-2\frac{\ddot S}{S} - \frac{\dot S^2}{S^2} - \frac{k}{S^2} = 8\pi p. \tag{5.30}$$
The state of a perfect fluid is usually specified by giving the dependence of the pressure
$p$ on the energy density $\mu$, i.e. $p = p(\mu)$, an equation of state. For the current epoch of
the universe it is reasonable to simply assume $p = 0$. Given an equation of state, equations
(5.29) and (5.30) determine a DE for $S(t)$, once $k$ is specified. It is in fact convenient to
replace (5.30) as follows. Differentiate (5.29), and eliminate $\ddot S$ using (5.30), to obtain
$$\dot\mu = -3(\mu + p)\frac{\dot S}{S}. \tag{5.31}$$
This equation describes how the energy density varies along the fluid world lines, and can be
interpreted in terms of conservation of energy. Equation (5.31) can also be derived directly
from the equation
$$T^{ab}{}_{;b} = 0,$$
which is satisfied by $T^{ab}$, as a consequence of the EFEs (5.27) [see equation (2.48)].
Once the equation of state is specified, equation (5.31) determines $\mu$ as a function of $S$,
which when substituted in (5.29) gives a first order DE for $S(t)$.
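In summary, the dynamics of the FRW models are described by the pair of equations
$$\frac{\dot S^2}{S^2} + \frac{k}{S^2} = \frac{8\pi}{3}\mu, \qquad \text{Friedmann equation}, \tag{5.32}$$
$$\dot\mu = -3(\mu + p)\frac{\dot S}{S}, \qquad \text{conservation of energy}. \tag{5.33}$$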
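
The pair (5.32)-(5.33) can be integrated numerically once an equation of state and $k$ are chosen. The sketch below is a minimal illustration under stated assumptions: it takes the expanding branch $\dot S > 0$, zero pressure, $k = 0$, and initial data $S = 1$, $\mu = 1/(6\pi)$ at $t = 1$, chosen so that the exact solution is $S = t^{2/3}$. The classical Runge-Kutta scheme and all function names are choices made for the example, not part of the notes.

```python
import math

def rhs(S, mu, k, pressure):
    """Right-hand sides of (5.32)-(5.33) on the expanding branch (dS/dt > 0)."""
    S_dot = math.sqrt(8.0 * math.pi * mu * S**2 / 3.0 - k)   # Friedmann equation solved for dS/dt
    mu_dot = -3.0 * (mu + pressure(mu)) * S_dot / S          # conservation of energy
    return S_dot, mu_dot

def evolve(S, mu, t_span, k=0, pressure=lambda mu: 0.0, steps=10000):
    """Classical RK4 integration of (S, mu) over a time interval of length t_span."""
    h = t_span / steps
    for _ in range(steps):
        a1, b1 = rhs(S, mu, k, pressure)
        a2, b2 = rhs(S + 0.5 * h * a1, mu + 0.5 * h * b1, k, pressure)
        a3, b3 = rhs(S + 0.5 * h * a2, mu + 0.5 * h * b2, k, pressure)
        a4, b4 = rhs(S + h * a3, mu + h * b3, k, pressure)
        S += h * (a1 + 2.0 * a2 + 2.0 * a3 + a4) / 6.0
        mu += h * (b1 + 2.0 * b2 + 2.0 * b3 + b4) / 6.0
    return S, mu

mu_start = 1.0 / (6.0 * math.pi)                  # energy density at t = 1 for the assumed solution
S_end, mu_end = evolve(1.0, mu_start, t_span=7.0)  # integrate from t = 1 to t = 8
print(S_end, mu_end * S_end**3)
```

With these assumptions the integration should return $S \approx 8^{2/3} = 4$ at $t = 8$, and the product $\mu S^3$ should stay constant, as (5.33) with $p = 0$ requires.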

Two other equations which follow from (5.29)-(5.31) are useful in studying the dynamics
of the FRW models. Firstly sum (5.29) and (5.30), and multiply by $-\tfrac{3}{2}$:
$$3\frac{\ddot S}{S} = -4\pi(\mu + 3p). \tag{5.34}$$
Secondly, it follows from (5.31) that
$$(\mu S^2)^{\cdot} = -(\mu + 3p)S\dot S. \tag{5.35}$$
Equation (5.34) is known as Raychaudhuri's equation.
Comment: The EFEs (5.29) and (5.30) imply that $\mu = \mu(t)$ and $p = p(t)$. Thus spatial
homogeneity of $\mu$ and $p$ does not have to be assumed.

5.7.2 The Big-Bang, and Future Evolution of the Universe

On physical grounds, one expects the energy density $\mu$ and pressure $p$ to satisfy
$$\mu > 0, \qquad p \ge 0.$$
Most of the results of this section only require the weaker restriction
$$\mu + 3p > 0. \tag{5.36}$$
The following theorems show that, subject to the Cosmological Principle and the Einstein
field equations (EFEs), the universe must be evolving in time and that the energy density was
infinite at a finite time in the past (big-bang initial singularity). In addition, the reciprocal
of the Hubble constant provides an upper bound for the current age of the universe. Finally,
it is shown that the present energy density $\mu_0$ is related to the spatial curvature parameter
$k$, which in turn determines whether the universe expands forever, or recollapses to a final
singularity.
Theorem 5.1: If the CP holds, and the energy density and pressure satisfy
$$\mu + 3p > 0,$$
then the EFEs imply that the universe is non-static (i.e. $\dot S(t) \neq 0$).
Proof: Immediate consequence of (5.34).

Theorem 5.2: Suppose that the CP holds, and the energy density and pressure satisfy
$$\mu + 3p > 0, \qquad \mu(t_0) > 0.$$
If the universe is expanding at the present time $t_0$, i.e. $H_0 > 0$, then the EFEs imply that
at some finite time $t_B$ in the past, i.e. $t_B < t_0$, the expansion factor $S(t)$ was zero, and the
energy density was infinite, i.e.
$$S(t_B) = 0, \qquad \lim_{t \to t_B^+} \mu = +\infty.$$
Proof: Equation (5.34) implies $\ddot S(t) < 0$ for all $t$. Since $H_0 > 0$, it follows from (5.21) that
$\dot S(t_0) > 0$. Thus $S(t)$ is concave down and has positive slope for $t < t_0$ (at least). Hence
there exists $t_B < t_0$ such that $S(t_B) = 0$. Equation (5.35) implies that
$$(\mu S^2)^{\cdot} < 0 \quad \text{for } t_B < t \le t_0,$$
i.e. $\mu S^2$ is decreasing (as $t$ increases), and since $\mu(t_0) > 0$, $\mu S^2$ is positive for $t_B < t \le t_0$. It
follows that
$$\lim_{t \to t_B^+} \mu S^2 \neq 0.$$
Since $S(t_B) = 0$, we obtain
$$\lim_{t \to t_B^+} \mu = +\infty.$$

Comment: Without loss of generality, we can define the origin of $t$ so that $t_B = 0$, i.e. $S(t)$
satisfies
$$S(0) = 0. \tag{5.37}$$
Then $t_0$, the value of $t$ which defines the current epoch, equals the age of the universe, i.e.
the time since the initial singularity.
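Theorem 5.3: If the conditions of theorem 5.2 hold, and the origin of $t$ is chosen so that
$S(0) = 0$, then the age of the universe $t_0$ satisfies
$$t_0 < \frac{1}{H_0}. \tag{5.38}$$
Proof: The tangent line $y - S(t_0) = \dot S(t_0)(t - t_0)$ to the curve $y = S(t)$ at $t = t_0$ meets the $t$-axis at
$t = t_0 - \frac{1}{H_0}$. Since $S(t)$ is concave down, the curve lies below this tangent line (see Figure 5.11), so
$t_0 - \frac{1}{H_0} < t_B$. From the fact that $t_B = 0$ by choice, it follows that $t_0 - \frac{1}{H_0} < 0$.

Figure 5.11: Graph of the expansion factor $S(t)$ as a function of time.

Comments: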

1) Theorems 5.2 and 5.3 are quite remarkable in that they are independent of the behaviour of the pressure [apart from the restriction (5.36)]. More specifically, no matter how large the pressure becomes as one goes into the past, a singularity cannot be
avoided, or postponed beyond a certain time. Indeed a large positive pressure becomes
counterproductive as regards prevention of a singularity, as can be seen from equation
(5.34): $\ddot S$ becomes more negative and so the singularity is reached sooner into the
past.
2) Equation (5.38) does lead to a confrontation between theory and observation. Indeed
in the 1940s, this led to the so-called time-scale problem. By studying the evolution
of stars, astrophysicists concluded that
$$t_0 \approx 10 \times 10^9 \text{ years}.$$
On the other hand, at this time astronomers had determined
$$H_0^{-1} \approx 3 \times 10^9 \text{ years},$$
so that (5.38) was violated! Subsequently errors were discovered in the distance scale
whereby extragalactic distances were calculated, and this led to a revised value of $H_0$,
with
$$H_0^{-1} \approx 18 \times 10^9 \text{ years},$$
which is compatible with (5.38).
3) In order to describe the future evolution of the universe, it is convenient to introduce
a dimensionless density parameter
$$\Omega_0 = \frac{8\pi\mu_0}{3H_0^2}, \tag{5.39}$$
where $\mu_0 = \mu(t_0)$ is the present energy density. The Friedmann equation (5.29) then
implies that
$$\frac{k}{S_0^2} = H_0^2(\Omega_0 - 1), \tag{5.40}$$
where $S_0 = S(t_0)$.
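
Equations (5.39)-(5.40) tie the sign of $k$ to whether $\Omega_0$ is greater than, equal to, or less than 1. A minimal sketch of this classification, in the geometrized units used in these notes, is given below; the function names and the tolerance `tol` are illustrative assumptions.

```python
import math

def density_parameter(mu0, H0):
    """Omega_0 = 8*pi*mu0 / (3*H0**2), equation (5.39)."""
    return 8.0 * math.pi * mu0 / (3.0 * H0**2)

def curvature_sign(mu0, H0, tol=1e-12):
    """Sign of k implied by (5.40): k / S0**2 = H0**2 * (Omega_0 - 1)."""
    excess = density_parameter(mu0, H0) - 1.0
    if abs(excess) <= tol:       # treat Omega_0 = 1 up to rounding as the flat case
        return 0
    return 1 if excess > 0 else -1

H0 = 1.0
mu_critical = 3.0 * H0**2 / (8.0 * math.pi)   # density for which Omega_0 = 1
print(curvature_sign(2.0 * mu_critical, H0),
      curvature_sign(mu_critical, H0),
      curvature_sign(0.5 * mu_critical, H0))
```

The three calls correspond to the closed ($k = +1$), flat ($k = 0$) and open ($k = -1$) cases respectively.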
Theorem 5.4: If the conditions of theorem 5.2 hold, and $\mu > 0$, then the EFEs imply
that the universe recollapses to a singularity of infinite density at a finite time in the future,
if and only if $k = +1$ (3-sphere spatial geometry), or equivalently if and only if the density
parameter satisfies
$$\Omega_0 > 1. \tag{5.41}$$
Proof: If $k = -1$ or $0$, the Friedmann equation (5.29) implies that $\dot S \neq 0$ for all $t\,(> 0)$
and so $\dot S(t_0) > 0 \Rightarrow \dot S > 0$ for all $t\,(> 0)$. This means that the universe expands forever.
Suppose that $k = +1$. Since $\mu S^2$ is a positive and decreasing function when $\dot S > 0$ [see
(5.35)], it follows that there exists $t_1 > t_0$ such that
$$\frac{8\pi}{3}\mu(t_1)S^2(t_1) = k.$$
The Friedmann equation (5.29) then implies $\dot S(t_1) = 0$. Since $\ddot S < 0$ for all $t$, we have
$\dot S(t) < 0$ for $t > t_1$, and it follows that there exists $t_F$ such that $S(t_F) = 0$. [This is
essentially a repeat of the argument leading to the existence of the initial singularity.]

It follows that the expansion factor $S(t)$ has the form shown in Figure 5.12 in the 3 cases
$k = +1, 0, -1$:
Comment: When Einstein first attempted to construct cosmological models in GR, he assumed on philosophical grounds that the universe should be static, i.e. $\dot S(t) = 0$, and also
that the spatial geometry should be that of a 3-sphere. Imagine his disappointment when he
discovered that his field equations had no solutions (Theorem 5.1)! This led him to modify
his field equations by the introduction of the cosmological constant $\Lambda$,
$$G_{ij} + \Lambda g_{ij} = 8\pi G\,T_{ij}.$$
When he later realized that his insistence on a static universe was wrong, he suggested that
$\Lambda$ be dropped. Nevertheless many authors still include $\Lambda$. Its presence will obviously permit
a greater variety of models and will weaken the predictions of the EFEs [Refs. MTW p.
707, 758, Rindler (1969)].

Figure 5.12: The expansion factor $S(t)$ for the three cases $k = -1, 0, +1$.

5.7.3 Uniqueness of the Universe

Q: Do the Einstein field equations determine a unique FRW model?


A: No! There is essentially a 1-parameter family of FRW-models, depending on the density
of matter at a particular epoch. The situation is similar to solving the vacuum field
equations for a spherically symmetric metric. One obtains a 1-parameter family of
solutions, depending on the mass of the source.
Theorem 5.5: For a given equation of state $p = p(\mu)$, with $\mu + 3p > 0$, if $k = \pm 1$, the
Einstein field equations determine a 1-parameter family of FRW cosmological models, while
if $k = 0$, the Einstein field equations determine a unique FRW cosmological model.
Proof: For a given equation of state, the energy conservation equation can be formally
integrated to give
$$\int \frac{d\mu}{\mu + p(\mu)} = -3\int \frac{dS}{S}$$
and hence
$$F(\mu) = -3\ln S + \ln M, \tag{5.42}$$
where $M$ is the constant of integration, and $F(\mu)$ is the antiderivative of $\frac{1}{\mu + p(\mu)}$. This equation
determines $\mu$ as a function of $\frac{M}{S^3}$:
$$\mu = \phi\!\left(\frac{M}{S^3}\right).$$
Then the Friedmann equation (5.29) becomes
$$\dot S^2 = \frac{8\pi}{3}\,\phi\!\left(\frac{M}{S^3}\right)S^2 - k,$$
which can be solved in the form
$$\int_0^S \frac{dv}{\sqrt{\frac{8\pi}{3}\,\phi\!\left(\frac{M}{v^3}\right)v^2 - k}} = t, \tag{5.43}$$
where we have used the initial condition $S(0) = 0$. We thus have a 1-parameter family of
solutions, with parameter $M$. In the case $k = 0$, however, we can absorb $M$ by rescaling $S$
and we have a unique solution.

Comment: This theorem shows that when $k = \pm 1$, the Einstein field equations (plus equation
of state) determine a 1-parameter family of FRW cosmological models, while if $k = 0$, one
obtains a unique model. In order to compare one of these models with observations, one has
also to determine the current age $t_0$ of the universe, i.e. the time at which the observations
are made. There are thus two parameters that have to be determined from the observations
(that is, when $k = \pm 1$; there is only one when $k = 0$). These are usually taken to be $H_0$ and
$q_0$ (or the density parameter $\Omega_0$, see (5.39)). This will be spelled out in more detail when
we consider the zero pressure FRW models.

5.7.4 Restrictions on the Observability of the Universe: Particle Horizons

The existence of the big-bang singularity at a finite time in the past leads to possible restrictions on the observability of the universe. In simple terms, it is possible that at the current epoch the universe is not sufficiently old for light from distant regions to have had sufficient time to reach us. In other words, it is possible that parts of the universe are unobservable to us at the present time! We can derive a simple criterion for this to occur.
All that we have to do is to find the equations of the past null geodesics through the event $P$, which is our current position in the spacetime. If these null geodesics intersect $t = 0$, the big-bang, at a finite value of $r$, say $r = r_h$, then any galaxy with $r = r_e > r_h$ cannot be observed by us.
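The past null geodesics through $P$ satisfy $\dot\theta = \dot\phi = 0$, due to the spherical symmetry, and hence
$$ 0 = L = -\dot t^2 + S(t)^2 \dot r^2 \quad\Longrightarrow\quad \frac{dr}{dt} = -\frac{1}{S(t)}. $$
Integrate along such a null geodesic from $r = r_1$ (at $t = t_1$) to $r = 0$ (at $t = t_0$):
$$ r_1 = -\int_{r_1}^{0} dr = \int_{t_1}^{t_0} \frac{dt}{S(t)}. $$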
[Figure: spacetime diagram showing the worldline (WL) of our galaxy at $r = 0$, the big-bang curvature singularity at $t = 0$, the present epoch $t = t_0$, and the null geodesics which generate the past null cone at $P$. The cross-section of our particle horizon is marked at $r = r_h$ ($\phi = 0$) and $r = r_h$ ($\phi = \pi$), together with the WL of a galaxy observed by us and the WL of a galaxy that cannot be observed by us.]
Figure 5.13: The particle horizon of a fundamental observer.

Thus if $\displaystyle \lim_{t_1 \to 0^+} \int_{t_1}^{t_0} \frac{dt}{S(t)}$ is finite, there will be a restriction on the observability of the universe. In this case, let
$$ r_h = \int_0^{t_0} \frac{dt}{S(t)} \quad \text{(a convergent improper integral)}. \tag{5.44} $$

Then any galaxy with r = re > rh cannot be observed by us. The 2-sphere r = rh in
the hypersurface t = t0 is called a particle horizon. It is essentially the boundary of the
observable part of the universe at t = t0 .
We will show in the next subsection that in a zero pressure FRW model
$$ S(t) \sim t^{2/3} \quad \text{as } t \to 0^+, $$
so that the improper integral does indeed converge, and a particle horizon exists.
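
The convergence criterion can be explored numerically. The following is a minimal sketch, not part of the notes: it estimates the integral of $dt/S(t)$ from $t_1$ to $t_0$ for shrinking $t_1$, once for $S(t) = t^{2/3}$ and once for $S(t) = t$. The second scale factor is a hypothetical contrast case chosen here only because its integral diverges; the function names are likewise illustrative.

```python
import math

def comoving_radius(S, t1, t0, n=4000):
    """Midpoint-rule estimate of the integral of dt / S(t) from t1 to t0.

    The substitution t = exp(y) spreads the sample points evenly in log t,
    which copes with the steep behaviour of 1/S(t) near t = 0.
    """
    y1, y0 = math.log(t1), math.log(t0)
    h = (y0 - y1) / n
    total = 0.0
    for i in range(n):
        t = math.exp(y1 + (i + 0.5) * h)
        total += (t / S(t)) * h          # dt = t dy
    return total

def eds(t):
    """S(t) = t^(2/3): the zero-pressure behaviour near t = 0."""
    return t ** (2.0 / 3.0)

def linear(t):
    """S(t) = t: hypothetical contrast case (not a model from the notes)."""
    return t

t0 = 1.0
for t1 in (1e-2, 1e-4, 1e-8):
    # First column settles near 3 * t0**(1/3); second grows like log(t0 / t1).
    print(t1, comoving_radius(eds, t1, t0), comoving_radius(linear, t1, t0))
```

For $S = t^{2/3}$ the estimates approach a finite limit as $t_1 \to 0^+$, so a particle horizon exists, whereas for the contrast case the estimates keep growing and no finite $r_h$ would result.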
Finally, we note that the spatial (metric) distance to the particle horizon from our position is
$$ \ell_h = S(t_0)\, r_h = S(t_0) \int_0^{t_0} \frac{dt}{S(t)}. \tag{5.45} $$
Refs.:
See Weinberg (1972), p. 489, for the case of p = 0 FRW models; Rindler (1969), p. 241.


5.7.5 FRW Cosmological Models with Zero Pressure

At the present epoch, it appears reasonable to assume that the pressure of the matter in the universe is negligible compared to the energy density:
$$ p_0 \ll \mu_0. $$
In this section, we idealize by assuming that
$$ p = 0 $$
throughout the entire evolution of the universe.
The Friedmann equation (5.32) and conservation of energy equation (5.33) assume the form
$$ \frac{\dot S^2}{S^2} + \frac{k}{S^2} = \frac{8\pi}{3}\mu, $$
$$ \dot\mu + 3\frac{\dot S}{S}\mu = 0. $$
The latter can be immediately integrated to give $\mu \propto S^{-3}$. We write this as
$$ \frac{4\pi}{3}\mu = \frac{M}{S^3}, \qquad M = \text{const.}, \tag{5.46} $$
for future convenience. The Friedmann equation now assumes the form
$$ \dot S^2 = -k + \frac{2M}{S}. \tag{5.47} $$

The cases $k = 0$, $k = \pm 1$ have to be treated separately. When $k = 0$, we have freedom to rescale $S$ (see the comment at the end of Section 5.3). It is convenient to use this freedom to set $M = \frac{2}{9}$. Then
$$ \dot S^2 = \frac{4}{9S}, $$
which integrates (assuming $\dot S > 0$) to give
$$ S = t^{2/3}, $$
on redefining the origin of $t$ so as to absorb the constant of integration. The resulting model is
$$ ds^2 = -dt^2 + t^{4/3}\left[ dr^2 + r^2 (d\theta^2 + \sin^2\theta\, d\phi^2) \right], \tag{5.48} $$
with
$$ \mu = \frac{1}{6\pi t^2}, \qquad u^a = \delta^a_0. $$
This model, which is called the Einstein-de-Sitter model, is the unique FRW model with $p = 0$ (zero pressure) and $k = 0$ (Euclidean spatial geometry).

When $k = \pm 1$, we can solve (5.47) in implicit form to obtain
$$ S = \begin{cases} M(1 - \cos\eta) \\ M(\cosh\eta - 1) \end{cases}, \qquad t = \begin{cases} M(\eta - \sin\eta), & k = +1 \\ M(\sinh\eta - \eta), & k = -1 \end{cases} \tag{5.49} $$
in terms of a parameter $\eta$. The graph of $S$ as a function of $t$ is of the general form given in Figure 5.12. When $k = +1$, the graph is in fact one branch of a cycloid.
To summarize, the line-element (5.28), with $S(t)$ given by (5.49) and $\mu$ given by (5.46), gives the general FRW model with $p = 0$ and $k = \pm 1$. There is a one parameter family of models, labelled by $M$, in each case $k = \pm 1$.
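
The implicit solution can be checked against the Friedmann equation directly. The sketch below is illustrative rather than part of the notes: it takes the parametric forms $S = M(1 - \cos\eta)$, $t = M(\eta - \sin\eta)$ for $k = +1$ and $S = M(\cosh\eta - 1)$, $t = M(\sinh\eta - \eta)$ for $k = -1$, computes $dS/dt$ as $(dS/d\eta)/(dt/d\eta)$, and evaluates the residual of $\dot S^2 = -k + 2M/S$. The function names and the value of $M$ are arbitrary choices made here.

```python
import math

def closed_model(M, eta):
    """k = +1 branch: returns S, t and dS/dt at parameter value eta."""
    S = M * (1.0 - math.cos(eta))
    t = M * (eta - math.sin(eta))
    dS_dt = math.sin(eta) / (1.0 - math.cos(eta))    # (dS/d eta) / (dt/d eta)
    return S, t, dS_dt

def open_model(M, eta):
    """k = -1 branch: returns S, t and dS/dt at parameter value eta."""
    S = M * (math.cosh(eta) - 1.0)
    t = M * (math.sinh(eta) - eta)
    dS_dt = math.sinh(eta) / (math.cosh(eta) - 1.0)
    return S, t, dS_dt

def friedmann_residual(k, M, S, dS_dt):
    """Left side minus right side of dS/dt**2 = -k + 2M/S."""
    return dS_dt ** 2 - (-k + 2.0 * M / S)

M = 2.0   # arbitrary illustrative value of the constant M
for eta in (0.5, 1.0, 2.0, 3.0):
    S, t, v = closed_model(M, eta)
    print("k=+1", eta, t, S, friedmann_residual(+1, M, S, v))
    S, t, v = open_model(M, eta)
    print("k=-1", eta, t, S, friedmann_residual(-1, M, S, v))
```

The residuals vanish to rounding error for every sampled $\eta$, which is the numerical counterpart of substituting the parametric solution back into the differential equation.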
In order to compare an FRW cosmological model with the real universe, one has to specify the present time $t_0$ (i.e. the time elapsed since the big bang, which is the present age of the universe). Thus when $k = \pm 1$, we have a two-parameter family of models, labelled by $M$ and $t_0$, and when $k = 0$, a one parameter family of models, labelled by $t_0$.
There are, however, a variety of parameters which describe various aspects of the universe,
but which are not all independent. It is convenient to group them as relativity parameters
and observational parameters.
Relativity parameters:

parameter     description                     physical dimensions
$H_0$         Hubble's constant               $T^{-1}$
$k/S_0^2$     spatial curvature at present    $T^{-2}$
$\mu_0$       energy density at present       $T^{-2}$
$t_0$         age of the universe             $T$

where
$$ H_0 = \frac{\dot S(t_0)}{S(t_0)}, \qquad S_0 = S(t_0), \qquad \mu_0 = \mu(t_0). $$
Here we are using units such that $c = 1$, and $G = 1$.


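Observational parameters:

parameter     description               physical dimensions
$H_0$         Hubble's constant         $T^{-1}$
$q_0$         deceleration parameter    dimensionless
$\Omega_0$    density parameter         dimensionless
$\tau_0$      time parameter            dimensionless

where
$$ q_0 = -\frac{\ddot S(t_0)}{S(t_0) H_0^2}, \tag{5.50} $$
$$ \Omega_0 = \frac{8\pi\mu_0}{3H_0^2}, \tag{5.51} $$
and
$$ \tau_0 = t_0 H_0. \tag{5.52} $$
On account of Theorem 5.3, we have
$$ 0 < \tau_0 < 1. $$
The spatial curvature is related to the observational parameters by (5.29), i.e.
$$ \frac{k}{S_0^2} = H_0^2 (\Omega_0 - 1). \tag{5.53} $$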

Equations (5.51)-(5.53) relate the relativity and the observational parameters.


From our earlier discussion, we know that when the EFEs hold, only two of the observational parameters can be independent (only one if k = 0). The actual relationships are
summarized below.
Theorem 5.6: In any FRW cosmological model the Einstein field equations plus equation of state imply that there are two independent observational parameters if $k = \pm 1$, and one independent observational parameter if $k = 0$. In a pressure-free model, the observational parameters are related as follows:
$$ \Omega_0 = 2q_0, \tag{5.54} $$
$$ \tau_0 = \int_0^1 \frac{du}{\sqrt{1 - \Omega_0 + \frac{\Omega_0}{u}}}, \tag{5.55} $$
with $\Omega_0 = 1$ and $\tau_0 = \frac{2}{3}$ if $k = 0$.

Proof: Equation (5.54) follows from (5.34) with $p = 0$, and (5.50) and (5.51). Equation (5.55) is a consequence of (5.47), as follows.
Evaluate (5.46) at $t = t_0$, to obtain
$$ M = \frac{4\pi\mu_0 S_0^3}{3}, $$
and substitute into (5.47). Also eliminate $k$ from (5.47) using (5.53). Let $u = \frac{S}{S_0}$. This yields
$$ \dot u^2 = H_0^2 \left( 1 - \Omega_0 + \frac{\Omega_0}{u} \right). \tag{5.56} $$
With $\dot S > 0$, this can be integrated from an arbitrary time $t$ to the present time $t_0$:
$$ H_0 \int_t^{t_0} dt = \int_{S/S_0}^{1} \frac{du}{\sqrt{1 - \Omega_0 + \frac{\Omega_0}{u}}}. $$
Since the origin of $t$ is fixed by $S(0) = 0$ [see (5.37)], we let $t \to 0^+$ and obtain (5.55).
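
The integral relating the time parameter $t_0 H_0$ to the density parameter $\Omega_0$ is easy to evaluate numerically. The sketch below is an illustration, not part of the notes: it applies Simpson's rule to $\int_0^1 du / \sqrt{1 - \Omega_0 + \Omega_0/u}$ after the substitution $u = x^2$, which removes the square-root behaviour of the integrand at $u = 0$. The function name and the sample values of $\Omega_0$ are choices made here, and the routine assumes $\Omega_0 > 0$.

```python
import math

def age_parameter(omega0, n=1000):
    """Simpson's-rule estimate of t0*H0 for a pressure-free model.

    Uses u = x**2, so the integrand becomes 2 x^2 / sqrt((1 - omega0) x^2 + omega0).
    Requires omega0 > 0 and an even number of subintervals n.
    """
    def integrand(x):
        return 2.0 * x * x / math.sqrt((1.0 - omega0) * x * x + omega0)

    h = 1.0 / n
    total = integrand(0.0) + integrand(1.0)
    for i in range(1, n):
        total += (4.0 if i % 2 else 2.0) * integrand(i * h)
    return total * h / 3.0

for omega0 in (0.1, 0.5, 1.0, 2.0):
    print(omega0, age_parameter(omega0))
```

The value for $\Omega_0 = 1$ reproduces $2/3$, while larger $\Omega_0$ gives a smaller age in units of $H_0^{-1}$ and smaller $\Omega_0$ a larger one, always below 1.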

Discussion:
(1) It follows from (5.55) and (5.53) that
$$ \begin{aligned} k = +1 &\iff \Omega_0 > 1 \iff \tau_0 < 2/3, \\ k = 0 &\iff \Omega_0 = 1 \iff \tau_0 = 2/3, \\ k = -1 &\iff \Omega_0 < 1 \iff 2/3 < \tau_0 < 1. \end{aligned} \tag{5.57} $$
We see that $\Omega_0 = 1$ is a critical value of $\Omega_0$. The corresponding value of $\mu_0$ is
$$ \mu_0^{\text{crit}} = \frac{3H_0^2}{8\pi}, $$
as follows from (5.51). If $\mu_0 > \mu_0^{\text{crit}}$ (i.e. $\Omega_0 > 1$), there is sufficient matter to close the universe spatially, and sufficient matter to make the universe eventually recollapse. If $\mu_0 \le \mu_0^{\text{crit}}$ there is insufficient matter to close the universe, and insufficient matter to make the universe recollapse.
(2) The relationships (5.54) and (5.55) between the observational parameters represent predictions of the Einstein field equations and the Cosmological Principle. These predictions can be tested using observations, since in principle the four parameters can be determined (or at least estimated) independently from observations:
$q_0$, $H_0$: red-shift observations.
$\Omega_0$: measure the luminosity density in a suitably large region in our neighbourhood, and measure a representative mass-to-light ratio.
$\tau_0$: one can obtain a lower bound for $t_0$ and hence $\tau_0$ from various age measurements, e.g. age of the earth and age of the stars.
[See, for example, Gunn (1977); MTW, pages 780-797.]

At present the parameters have not been determined with sufficient accuracy to provide a decisive confrontation between theory and observation. But if in the future (5.54) or (5.55) is found to be violated, it would imply that either the Cosmological Principle or Einstein's field equations have to be modified!
In Section 5.6, we derived a distance-red-shift relation in the form of a series expansion,
which was valid in any FRW model. Once the expansion factor S(t) has been determined by
the field equations, one can derive a distance-red-shift relation in closed form. We do this
now for the pressure-free FRW models of this section.
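From (5.18) and Section 5.6, we have that
$$ 1 + z = \frac{S(t_0)}{S(t_e)}, \qquad r_e = \int_{t_e}^{t_0} \frac{dt}{S(t)}, \qquad \ell = S(t_0)\, r_e. $$
By performing a change of variable in the integral, we obtain
$$ r_e = \int_{S(t_e)}^{S(t_0)} \frac{dS}{S \dot S} = \frac{1}{S(t_0)} \int_{S(t_e)/S(t_0)}^{1} \frac{du}{u \dot u}, $$
where $u = S/S_0$. On making use of (5.56), and the fact that $\ell = S(t_0) r_e$, we obtain
$$ \ell = H_0^{-1} \int_{(1+z)^{-1}}^{1} \frac{du}{\sqrt{u^2 (1 - \Omega_0) + u\, \Omega_0}}. $$
Another change of variable $u = \frac{1}{1+w}$ yields
$$ \ell = H_0^{-1} \int_0^z \frac{dw}{(1 + w)\sqrt{1 + \Omega_0 w}}, \tag{5.58} $$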

which gives the desired relationship between spatial distance and redshift z.
This exact relationship reduces to (5.24) for small z. [See Weinberg (1972), p. 485, who
uses a different r-coordinate.]
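
The closed-form integral for the distance can be evaluated for any $\Omega_0$ with a few lines of code. The following sketch is illustrative and not part of the notes: it applies Simpson's rule to $\int_0^z dw / \big((1+w)\sqrt{1 + \Omega_0 w}\big)$, which gives the spatial distance in units of $H_0^{-1}$. The function name is a choice made here. Two checks are used: for $\Omega_0 = 1$ the integrand is $(1+w)^{-3/2}$, which integrates to $2[1 - (1+z)^{-1/2}]$, and for small $z$ expanding the integrand gives $z - \tfrac{1}{2}(1 + \tfrac{1}{2}\Omega_0) z^2$ to second order.

```python
import math

def distance_in_hubble_units(z, omega0, n=2000):
    """Simpson's-rule estimate of H0 * (spatial distance) for a pressure-free model.

    Integrates dw / ((1 + w) * sqrt(1 + omega0 * w)) from 0 to z; n must be even.
    """
    def integrand(w):
        return 1.0 / ((1.0 + w) * math.sqrt(1.0 + omega0 * w))

    h = z / n
    total = integrand(0.0) + integrand(z)
    for i in range(1, n):
        total += (4.0 if i % 2 else 2.0) * integrand(i * h)
    return total * h / 3.0

for omega0 in (0.1, 1.0, 2.0):
    print(omega0, [round(distance_in_hubble_units(z, omega0), 4) for z in (0.1, 0.5, 1.0)])
```

At fixed redshift the distance decreases as $\Omega_0$ increases, and for small $z$ all curves approach the linear relation $H_0 \times \text{distance} \approx z$.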
It is perhaps appropriate to conclude this section with a quotation:
In a discussion of cosmology, the life of the universe is soon reduced to a few
simple equations and some numbers. To compare such elegance with an incredibly
complex reality, from which, using a few photons, observers are asked to provide
observational results, is ridiculous mainly because there are too many details
between an observation and a fact.
Greenstein (1980)

5.7.6 The Einstein-de-Sitter Model

In this section we discuss certain aspects of the $k = 0$ zero pressure FRW model, i.e. the Einstein-de-Sitter model, in more detail. The model is
$$ ds^2 = -dt^2 + t^{4/3}\left[ dr^2 + r^2 (d\theta^2 + \sin^2\theta\, d\phi^2) \right], $$
with
$$ \mu = \frac{1}{6\pi t^2}, \qquad u^a = \delta^a_0 $$
[see equation (5.48)]. The spatial geometry is Euclidean. The model starts at a big-bang singularity at $t = 0$, and expands indefinitely, with the density at the present time being equal to the critical density (i.e. $\Omega_0 = 1$). The present age of the universe in this model is related to the Hubble constant according to
$$ t_0 = \tfrac{2}{3} H_0^{-1} \tag{5.59} $$
[see (5.57) and (5.52)].

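The distance-redshift relation (5.58) assumes the simple form
$$ \ell = 2H_0^{-1}\left[ 1 - \frac{1}{\sqrt{1+z}} \right], \tag{5.60} $$
which can be inverted to give
$$ z = (H_0 \ell)\, \frac{1 - \frac{1}{4}(H_0 \ell)}{\left[ 1 - \frac{1}{2}(H_0 \ell) \right]^2}. \tag{5.61} $$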
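
The two closed-form relations of the Einstein-de-Sitter model can be checked against each other. The sketch below is an illustration only: with $x$ denoting $H_0$ times the spatial distance, it implements $x = 2[1 - (1+z)^{-1/2}]$ and its inverse $z = x(1 - x/4)/(1 - x/2)^2$, and confirms that one undoes the other. The function names are introduced here.

```python
import math

def distance_from_redshift(z):
    """H0 * distance in the Einstein-de-Sitter model, as a function of z."""
    return 2.0 * (1.0 - 1.0 / math.sqrt(1.0 + z))

def redshift_from_distance(x):
    """Inverse relation; x = H0 * distance must satisfy 0 <= x < 2."""
    return x * (1.0 - 0.25 * x) / (1.0 - 0.5 * x) ** 2

for z in (0.1, 0.46, 5.0, 100.0):
    x = distance_from_redshift(z)
    print(z, x, redshift_from_distance(x))   # third column should return z

# The redshift grows without bound as x approaches 2.
for x in (1.9, 1.99, 1.999):
    print(x, redshift_from_distance(x))
```

The second loop shows the redshift increasing without limit as $H_0 \times \text{distance} \to 2$, which corresponds to the distance approaching $2H_0^{-1}$.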

Since $S(t) = t^{2/3}$, there is a particle horizon given by $r = r_h$, where
$$ r_h = \int_0^{t_0} \frac{dt}{t^{2/3}} = 3t_0^{1/3} $$
[see equation (5.44)]. The spatial distance to the horizon at the present time is
$$ \ell_h = S(t_0)\, r_h = 3t_0. $$
Using (5.59), we can write this as
$$ \ell_h = 2H_0^{-1} \approx 3.6 \times 10^{10} \text{ light years}, $$
using the value of (5.25) for $H_0$. Note that it follows from (5.60) that
$$ \lim_{\ell \to \ell_h^-} z = +\infty, $$
as might be expected. We illustrate the horizon with a spacetime diagram in Figure 5.14, drawing $t$, $r$ as Cartesian coordinates. However, since $r \ge 0$, events to the right of the $t$-axis have $\phi = 0$, while those to the left have $\phi = \pi$.
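[Figure: spacetime diagram with $t$ and $r$ drawn as Cartesian coordinates. It shows the worldline (WL) of our galaxy along the $t$-axis, the event $P$ at $t = t_0$, the past light cone at $P$, $r = 3\left( t_0^{1/3} - t^{1/3} \right)$, and the particle horizon at $r = 3t_0^{1/3}$ on either side ($\phi = 0$ and $\phi = \pi$), where the light cone meets the big-bang curvature singularity $t = 0$.]
Figure 5.14: The past light cone and the particle horizon in the Einstein-de-Sitter model.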

For the maximum observed redshift for distant galaxies, $z = 0.46$ (see footnote 1), it follows from (5.60) that
$$ \ell = 2H_0^{-1}\left[ 1 - \frac{1}{\sqrt{1.46}} \right] \approx 2H_0^{-1}(0.17) = 0.17\, \ell_h. $$
In other words, observations of distant galaxies extend to approximately $\frac{1}{5}$ of the distance of our particle horizon (in this particular model).
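
The arithmetic above can be reproduced in a few lines. This is a sketch under stated assumptions: the fraction of the horizon distance reached at redshift $z$ is $1 - (1+z)^{-1/2}$ in the Einstein-de-Sitter model, and the horizon distance is $2H_0^{-1}$. The value $H_0 = 55$ km/s/Mpc used below is a hypothetical input chosen here because it reproduces a horizon distance close to $3.6 \times 10^{10}$ light years; the value (5.25) itself is not restated in this section. The unit-conversion constants and function names are likewise introduced for illustration.

```python
import math

KM_PER_MPC = 3.0857e19          # kilometres in one megaparsec
SECONDS_PER_YEAR = 3.15576e7    # Julian year

def horizon_fraction(z):
    """Distance at redshift z divided by the particle-horizon distance (EdS model)."""
    return 1.0 - 1.0 / math.sqrt(1.0 + z)

def horizon_distance_light_years(H0_km_s_mpc):
    """2 / H0 expressed in light years (with c = 1, a time in years is a distance in light years)."""
    hubble_time_seconds = KM_PER_MPC / H0_km_s_mpc
    return 2.0 * hubble_time_seconds / SECONDS_PER_YEAR

print(horizon_fraction(0.46))                 # about 0.17
print(horizon_distance_light_years(55.0))     # assumed H0, see lead-in
```

Changing the assumed $H_0$ rescales the horizon distance in inverse proportion, while the fraction reached at a given redshift does not depend on $H_0$ at all.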
The final feature that we discuss, and which leads to a potentially observable phenomenon, is the refocusing of our past light cone.
Our past light cone is given by
$$ r = 3\left( t_0^{1/3} - t^{1/3} \right), $$
with vertex at $r = 0$, $t = t_0$. The intersection of this light cone with the hypersurface $t = \text{const.}$ is a 2-sphere
$$ r = 3\left( t_0^{1/3} - t^{1/3} \right), \qquad t = \text{constant}. $$
The area $A(t)$ of this 2-sphere is
$$ A(t) = 36\pi\, t^{4/3} \left( t_0^{1/3} - t^{1/3} \right)^2, \tag{5.62} $$
since the line-element on the 2-sphere is
$$ ds^2_{(2)} = t^{4/3} r^2 (d\theta^2 + \sin^2\theta\, d\phi^2), $$
and
$$ A(t) = \iint \sqrt{g_{(2)}}\; d\theta\, d\phi. $$
It is easily verified that this area has an absolute maximum for
$$ t = \frac{8t_0}{27}. $$
Thus as one moves into the past, the cross-sectional area of the light cone increases until $t = \frac{8t_0}{27}$, and thereafter it decreases to 0 as $t \to 0^+$. The maximum area is
$$ A_{\max} = \frac{64\pi t_0^2}{81}. $$
We can calculate the angle $\Delta\theta$ subtended at our position $P$ by an object of spatial extent $\delta$. It is straightforward to show that
$$ \Delta\theta = \frac{H_0\, \delta\, (1+z)^{3/2}}{2\left[ (1+z)^{1/2} - 1 \right]}. \tag{5.63} $$
For objects of fixed spatial extent $\delta$, the angle $\Delta\theta$ reaches a minimum for $z = 5/4$ (corresponding to $t = \frac{8t_0}{27}$), and thereafter increases as $z$ increases. This is a potentially observable effect.
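
The location of the maximum area and of the minimum angle can be confirmed numerically. The sketch below is illustrative, not part of the notes: it uses a ternary search (a generic technique for one-dimensional unimodal functions, chosen here for simplicity) on the area $36\pi\, t^{4/3}(t_0^{1/3} - t^{1/3})^2$ and on the angle per unit $H_0 \times$ extent, $(1+z)^{3/2} / \big(2[(1+z)^{1/2} - 1]\big)$. The function names and the choice $t_0 = 1$ are assumptions made for the example.

```python
import math

def argmin_unimodal(func, lo, hi, iters=200):
    """Ternary search for the minimiser of a unimodal function on [lo, hi]."""
    for _ in range(iters):
        a = lo + (hi - lo) / 3.0
        b = hi - (hi - lo) / 3.0
        if func(a) < func(b):
            hi = b
        else:
            lo = a
    return 0.5 * (lo + hi)

def cone_area(t, t0=1.0):
    """Area of the cross-section of the past light cone at time t (EdS model)."""
    return 36.0 * math.pi * t ** (4.0 / 3.0) * (t0 ** (1.0 / 3.0) - t ** (1.0 / 3.0)) ** 2

def angular_size(z):
    """Subtended angle divided by (H0 * spatial extent), as a function of redshift."""
    root = math.sqrt(1.0 + z)
    return root ** 3 / (2.0 * (root - 1.0))

t_max = argmin_unimodal(lambda t: -cone_area(t), 1e-9, 1.0)   # maximise the area
z_min = argmin_unimodal(angular_size, 1e-3, 50.0)             # minimise the angle

print(t_max, 8.0 / 27.0)
print(cone_area(t_max), 64.0 * math.pi / 81.0)
print(z_min, angular_size(z_min))
```

Both searches land on the same epoch: since $1 + z = (t_0/t)^{2/3}$ in this model, $z = 5/4$ corresponds to $t = 8t_0/27$.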
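[Figure: the past light cone traced back from the present epoch to the big bang at $t = 0$. Its cross-section is largest at $t = \frac{8t_0}{27}$ and shrinks again towards the big bang; the particle horizon $r = 3t_0^{1/3}$ is marked.]
Figure 5.15: The refocussing of the past light cone in the Einstein-de-Sitter model.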

5.7.7 Summary of Observational Data

Evidence for Isotropy about our Galaxy


(1) Distribution of galaxies on the sky is isotropic to 30% (out to 2,200 Mpc).
(2) Hubble constant is isotropic to 25%.
(3) Radio source distribution is isotropic to 5%.
(4) Microwave background radiation is isotropic to 0.1%.
(5) X-ray background radiation is isotropic to 5%.
[Ref.: MacCallum (1979), p. 534, Ellis (1980), p. 149, MacCallum (1973), p. 68, compare
differences.]
Evidence for Spatial Homogeneity
Ellis (1980): "Considered overall, the observational verification of spatial homogeneity at large distances does not seem to be possible either by direct or indirect observational data." [p. 153 of this ref.]

Comment:
Observations certainly indicate spatial inhomogeneities on length scales $\lesssim 10$ Mpc $\approx 3 \times 10^7$ light years, corresponding to clusters of galaxies. However, it is possible that the universe is spatially homogeneous on a scale of $10^8$ light years, i.e. $\mu(t)$ is the energy density when one averages over cubes of side $10^8$ light years, and spatial homogeneity means that $\mu(t)$ will have the same value for any such cube at time $t$. But we have seen that particle horizons restrict the size of the observable universe, and in the Einstein-de-Sitter model the distance to the horizon is $\approx 3.6 \times 10^{10}$ light years. Thus the observed inhomogeneities are $\approx 0.1\%$ of the present horizon distance, and so there are not many orders of magnitude left for averaging over. The situation might, in fact, be worse. A very recent redshift survey [Kirshner, Oemler, Schechter and Schectman (1981)] suggests inhomogeneities on a scale of 50-100 $h^{-1}$ Mpc, where $h = \frac{1}{2}$ if $H_0 = 50$ km/sec/Mpc. They have detected what appears to be a huge void (empty space) of side $\approx 100$ Mpc $\approx 3 \times 10^8$ light years. See also Gregory and Thompson (1981).
Evidence for the Observational Parameters
$H_0$: probably known to within a factor of 2
$q_0$: very uncertain
$\Omega_0$: "The data on $\Omega_0$ is, with a few nagging loose ends, which can and, I am sure, will be straightened out . . ., supportive of the conclusion that the mean density in the universe is more than an order of magnitude below the critical density," Gunn (1977).
$\tau_0$: uncertain, but at least plausible (which was not always the case)
[Ref.: Gunn (1977), MTW, p. 797, Liang and Sachs (1980), p. 336.]
Remark 5.2: The microwave background radiation is probably the most important observational data from a cosmological point of view, apart from the redshift observations. This radiation was first discovered by accident by Penzias & Wilson (1965), though ironically it had been predicted as a consequence of the big-bang in the 1940s (Gamow (1948)). The situation is as follows: microwave radiometers (wavelength = 20 cm to 1 mm) detect a background radiation with the following properties:
1) its spectrum is that of a black-body of temperature $T = 2.7$ K,
2) it is isotropic to within 1 part in 1000, and
3) it is extragalactic in origin.
The conventional interpretation is that this radiation was last scattered at redshift $z \approx 1400$, so that it probes the early stages of the universe.

Appendix A: Calculation of the Christoffel Symbols using the Euler-Lagrange Equations

For a Lagrangian of the form $L = g_{ij} \dot x^i \dot x^j$, the Euler-Lagrange expressions, which are defined by
$$ E_i(L) = \frac{d}{d\lambda}\left( \frac{\partial L}{\partial \dot x^i} \right) - \frac{\partial L}{\partial x^i}, $$
assume the form
$$ E_i(L) = 2g_{ij}\left( \ddot x^j + \Gamma^j_{k\ell}\, \dot x^k \dot x^\ell \right), $$
where a dot denotes differentiation with respect to the curve parameter $\lambda$. Thus by explicitly writing the Euler-Lagrange equations in the form
$$ \ddot x^j + \Gamma^j_{k\ell}\, \dot x^k \dot x^\ell = 0, $$
one can read off the Christoffel symbols. This method has the advantage that the Christoffel symbols which are zero are not explicitly calculated.
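Example: Calculate the Christoffel symbols for the 2-d metric tensor defined by
$$ ds^2 = 2\, du\, dv + f(u)\, dv^2, $$
or equivalently
$$ (g_{ij}) = \begin{pmatrix} 0 & 1 \\ 1 & f(u) \end{pmatrix} $$
with
$$ x^0 = u, \qquad x^1 = v. $$
Solution: The Lagrangian is
$$ L = 2\dot u \dot v + f(u)\dot v^2. $$
Hence
$$ \frac{\partial L}{\partial \dot u} = 2\dot v, \qquad \frac{\partial L}{\partial u} = f'(u)\dot v^2. $$
Thus the first Euler-Lagrange equation is
$$ \ddot v - \tfrac{1}{2} f'(u)\dot v^2 = 0, \tag{A1} $$
from which we infer
$$ \Gamma^1_{00} = \Gamma^1_{01} = 0, \qquad \Gamma^1_{11} = -\tfrac{1}{2} f'(u). $$
Secondly
$$ \frac{\partial L}{\partial \dot v} = 2\dot u + 2f(u)\dot v, \qquad \frac{\partial L}{\partial v} = 0. $$
Thus the 2nd Euler-Lagrange equation is
$$ 2\ddot u + 2f(u)\ddot v + 2f'(u)\dot u \dot v = 0, $$
which by means of (A1) can be simplified to
$$ \ddot u + \tfrac{1}{2} f(u) f'(u) \dot v^2 + f'(u) \dot u \dot v = 0, $$
from which we infer
$$ \Gamma^0_{00} = 0, \qquad \Gamma^0_{01} = \tfrac{1}{2} f'(u), \qquad \Gamma^0_{11} = \tfrac{1}{2} f(u) f'(u). $$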
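
The Christoffel symbols read off in this example can be cross-checked against the defining formula $\Gamma^i_{jk} = \tfrac{1}{2} g^{i\ell}(\partial_j g_{\ell k} + \partial_k g_{\ell j} - \partial_\ell g_{jk})$. The sketch below is an independent numerical check, not the method of this appendix: it differentiates the metric by central differences for the hypothetical choice $f(u) = u^2$ and compares with $\Gamma^0_{01} = \tfrac{1}{2}f'$, $\Gamma^0_{11} = \tfrac{1}{2}ff'$ and $\Gamma^1_{11} = -\tfrac{1}{2}f'$. All function names, the sample point and the step size are assumptions made for illustration.

```python
def f(u):
    """Hypothetical choice of the free function f(u), for illustration only."""
    return u * u

def metric(x):
    """Components g_ij of ds^2 = 2 du dv + f(u) dv^2 at x = (u, v)."""
    u, v = x
    return [[0.0, 1.0], [1.0, f(u)]]

def inverse_2x2(g):
    det = g[0][0] * g[1][1] - g[0][1] * g[1][0]
    return [[g[1][1] / det, -g[0][1] / det],
            [-g[1][0] / det, g[0][0] / det]]

def christoffel(x, h=1e-5):
    """Gamma[i][j][k] = Gamma^i_{jk}, using central differences of the metric."""
    dg = []                                   # dg[l][i][j] = d g_ij / d x^l
    for l in range(2):
        xp, xm = list(x), list(x)
        xp[l] += h
        xm[l] -= h
        gp, gm = metric(xp), metric(xm)
        dg.append([[(gp[i][j] - gm[i][j]) / (2.0 * h) for j in range(2)]
                   for i in range(2)])
    ginv = inverse_2x2(metric(x))
    return [[[0.5 * sum(ginv[i][l] * (dg[j][l][k] + dg[k][l][j] - dg[l][j][k])
                        for l in range(2))
              for k in range(2)] for j in range(2)] for i in range(2)]

u0 = 1.5
gamma = christoffel([u0, 0.7])
fp = 2.0 * u0                                  # f'(u) for f(u) = u^2
print(gamma[0][0][1], 0.5 * fp)                # Gamma^0_01
print(gamma[0][1][1], 0.5 * f(u0) * fp)        # Gamma^0_11
print(gamma[1][1][1], -0.5 * fp)               # Gamma^1_11
```

The remaining components come out as zero to rounding error, in agreement with the symbols that the Euler-Lagrange method never had to compute.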





Appendix B: General Frequency Shift Formula

Theorem: Let $C_1$, $C_2$ be the worldlines of two observers, which can be joined by a 1-parameter family of null geodesics. Let $u_{(1)}$, $u_{(2)}$ be the 4-velocities of the observers, and let $k_{(1)}$, $k_{(2)}$ be the tangent vectors to the null geodesics at the points at which they intersect $C_1$, $C_2$, respectively. Let $\nu_1$ be the frequency of a signal (as measured by $C_1$) sent by $C_1$ to $C_2$, and let $\nu_2$ be the frequency of the received signal (as measured by $C_2$). Then
$$ \frac{\nu_1}{\nu_2} = \frac{u^a_{(1)} k_{(1)a}}{u^b_{(2)} k_{(2)b}}. \tag{B1} $$

[Figure: two worldlines $C_1$ and $C_2$ joined by null geodesics, with the 4-velocities $u_{(1)}$, $u_{(2)}$ and the tangent vectors $k_{(1)}$, $k_{(2)}$ drawn at the points of emission and reception.]
Figure B1: Light signals transmitted from one observer to another.

Proof: Let $s$ denote an affine parameter along all the null geodesics, chosen so that $s = 0$ on $C_1$ and $s = 1$ on $C_2$. Let $\tau_1$, $\tau_2$ denote time as measured along $C_1$, $C_2$, respectively. Then the null geodesics can be described by equations of the form
$$ x^a = f^a(\tau_1, s) \tag{B2} $$
[$\tau_1$ labels the different null geodesics]. The geodesics associate with each value of $\tau_1$ on $C_1$ a value of $\tau_2$ on $C_2$ and hence define a function
$$ \tau_2 = F(\tau_1). \tag{B3} $$
The ratio of frequencies $\nu_1/\nu_2$ is simply the rate of change of $\tau_2$ with respect to $\tau_1$:
$$ \frac{\nu_1}{\nu_2} = \frac{d\tau_2}{d\tau_1}. \tag{B4} $$
Since $\tau_1$ is time along $C_1$, and $s = 0$ on $C_1$, it follows from (B2) that
$$ u^a_{(1)} = \frac{\partial f^a}{\partial \tau_1}(\tau_1, 0). \tag{B5} $$
Now $C_2$ can be described by
$$ x^a = g^a(\tau_2). $$
But $C_2$ is also given by (B2), with $s = 1$. Thus
$$ g^a(\tau_2) = f^a(\tau_1, 1), $$
where $\tau_2$ and $\tau_1$ are related by (B3). It follows that
$$ u^a_{(2)} = \frac{dg^a}{d\tau_2} = \frac{\partial f^a}{\partial \tau_1}(\tau_1, 1)\, \frac{d\tau_1}{d\tau_2}. \tag{B6} $$
The tangent vector $k^a(\tau_1, s)$ to the null geodesics is given by
$$ k^a(\tau_1, s) = \frac{\partial f^a}{\partial s}(\tau_1, s), $$
and satisfies
$$ \frac{Dk^a}{Ds} = 0. $$
Using the above equations, it follows (exercise) that
$$ \frac{\partial}{\partial s}\left[ \frac{\partial f^a}{\partial \tau_1}(\tau_1, s)\, k_a(\tau_1, s) \right] = 0. $$
Thus, setting $s = 0, 1$, respectively,
$$ \frac{\partial f^a}{\partial \tau_1}(\tau_1, 0)\, k_{(1)a} = \frac{\partial f^a}{\partial \tau_1}(\tau_1, 1)\, k_{(2)a}. \tag{B7} $$
On combining (B5), (B6) and (B7), it follows that
$$ u^a_{(2)} k_{(2)a} = u^a_{(1)} k_{(1)a}\, \frac{d\tau_1}{d\tau_2}, $$
from which the required result follows on using (B4).



Remark: The preceding theorem can be used to derive:
(1) the red-shift formula for the solar system
(2) the relativistic Doppler shift formula in SRT
(3) the cosmological red shift formula.
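
As a sketch of application (2), consider Minkowski spacetime with $c = 1$ and metric $\text{diag}(-1, 1, 1, 1)$. The setup is an assumption chosen here for illustration: the emitter $C_1$ is at rest, the receiver $C_2$ recedes along the $x$-axis with speed $v$, and the signal travels in the $+x$ direction. Since the spacetime is flat, the tangent vector $k^a$ is constant along each null geodesic, so the same $k^a$ can be used at emission and reception, and its overall scaling cancels in the frequency ratio.

```latex
% Illustrative setup (not from the notes): emitter at rest, receiver receding with speed v.
\begin{align*}
u^a_{(1)} &= (1, 0, 0, 0), &
u^a_{(2)} &= \gamma\,(1, v, 0, 0), \quad \gamma = (1 - v^2)^{-1/2}, \\
k^a &= (1, 1, 0, 0), &
k_a &= (-1, 1, 0, 0), \\
u^a_{(1)} k_a &= -1, &
u^a_{(2)} k_a &= -\gamma\,(1 - v), \\
\frac{\nu_1}{\nu_2} &= \frac{1}{\gamma\,(1 - v)}
                     = \sqrt{\frac{1 + v}{1 - v}} .
\end{align*}
```

For $v > 0$ the ratio exceeds 1, so the received frequency is lower than the emitted one (a red shift); replacing $v$ by $-v$ gives the blue shift for an approaching receiver.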
The following result facilitates the application of the theorem of this appendix. If $k^a = \frac{dx^a}{d\lambda}$ is tangent to a null geodesic $x^a = x^a(\lambda)$, then
$$ \frac{Dk^a}{D\lambda} = \alpha k^a, \qquad \alpha \text{ a scalar}, $$
and the parameter $\lambda$ is said to be an affine parameter if $\alpha = 0$. The following lemma enables one to adjust the scaling of $k^a$ so that $\lambda$ is an affine parameter.
Let $k^a$ be a null vector field, which is tangent to null geodesics. Then by the Chain Rule for covariant differentiation
$$ \frac{Dk^a}{D\lambda} = k^a{}_{;b} \frac{dx^b}{d\lambda} = k^a{}_{;b} k^b. $$
Thus the parameter $\lambda$ is an affine parameter iff $k^a{}_{;b} k^b = 0$.
Lemma: Let $k^a$ be a null vector field, i.e. $k^a k_a = 0$. If $k_a$ equals a gradient, i.e.
$$ k_a = \frac{\partial f}{\partial x^a}, $$
for some function $f$, then $k^a{}_{;b} k^b = 0$, i.e. the parameter of the geodesics is an affine parameter.
Proof: $k_a = \frac{\partial f}{\partial x^a} = f_{;a} \implies k_{a;b} = f_{;ab} = f_{;ba} = k_{b;a}$. Contract with $k^b$:
$$ k_{a;b} k^b = k^b k_{b;a}. \tag{B8} $$
But
$$ (k^b k_b)_{;a} = k^b{}_{;a} k_b + k^b k_{b;a} = 2k^b k_{b;a}. $$
Since $k^b k_b = 0$, it follows that $k^b k_{b;a} = 0$, and hence equation (B8) gives the desired result
$$ k_{a;b} k^b = 0. $$



Bibliography
[1] Adler, R., Bazin, M. and Schiffer, M. (1965): Introduction to General Relativity,
McGraw-Hill, New York, 1965 [AC6.A33].
[2] Alvager, T. et al (1964): Test of the Second Postulate of Special Relativity in the GeV
Region, Physics Letters 12, 260-262.
[3] Bergmann, P.G. (1942): Introduction to the Theory of Relativity, Prentice Hall, Englewood Cliffs, New Jersey, 1942 [AC6.B45].
[4] Birkhoff, G. and MacLane (1961): A Survey of Modern Algebra, MacMillan, NY.
[5] Blanco, V.M., and McCuskey, S.W., (1961): Basic Physics of the Solar System, Addison-Wesley, Reading, Mass. [QB501.B5].
[6] Bondi, H. (1952): Cosmology, Cambridge University Press.
[7] Bondi, H. (1964): Relativity and Common Sense [QC6.B62].
[8] Breitenberger, E. (1971): On Empirical Foundations of Special Relativity, Il Nuovo
Cimento 1B, 1-22.
[9] Brickell, F. and Clark, R. (1970): Differentiable Manifolds: an Introduction, V.N. Reinhold Company, London [QA614.3 B74].
[10] Brouwer, D. and Clemence, G.M. (1961): Methods of Celestial Mechanics, Academic
Press, NY [QB351.B7].
[11] Carter, B. (1974): Large number coincidences and the Anthropic Principle in Cosmology, in Longair (1974).
[12] Dicke, R.H. (1964): The Theoretical Significance of Experimental Relativity, Gordon
and Breach, NY [QC6.D476].
[13] Ehlers, J. (1972): Article in Relativity, Astrophysics, and Cosmology, ed. W. Israel,
D. Reidel, Boston [QB460 R44 1972].
[14] Ellis, G.F.R. (1971): Relativistic Cosmology, in Sachs (1971).


[15] Ellis, G.F.R. (1975): Cosmology & Verifiability, Q. Journal R. Astr. Soc. 16, 245-264.
[16] Ellis, G.F.R. (1980): Limits to Verification in Cosmology, Annals of the New York
Academy of Sciences 336, 130-160.
[17] Ellis, G.F.R, R. Maartens and S.D. Nell (1978): The Expansion of the Universe, Mon.
Not. R. Astr. Soc. 184, 439-465.
[18] Fowler, (1968): in Robertson & Noonan (1968).
[19] Gamow, G. (1948): The Evolution of the Universe, Nature 162, 680-682.
[20] O. Gingerich (1977) editor: Cosmology +1, Readings from Scientific American, W.H.
Freeman, San Francisco.
[21] Goldstein, H. (1965): Classical Mechanics, Addison-Wesley, Reading, MA [QA805.G6].
[22] Gregory, S.A. and Thompson, L.A. (1981): Superclusters and Voids in the Distribution
of Galaxies, Scientific American, page.
[23] Gunn, J. (1977): Observational Tests in Cosmology, Caltech Preprint (published?).
[24] Greenstein, J. (1980): The Evidence for a Universal Helium and Deuterium Abundance,
Physica Scripta 21, 759-768.
[25] Hawking, S. and Ellis, G.F.R. (1973): The large scale structure of spacetime, Cambridge
University Press.
[26] Isaak, C.R. (1969): Improved Limit on the Absence of Dispersion of Velocity of Light,
Nature 23, 161.
[27] Kirshner, R.P., Oemler, A., Schechter, P.L. and Schectman, S.A. (1981): A Million
Cubic Megaparsec Void in Bootes, Astrophysical Journal 248, L57-L60.
[28] Kron, R.G. (1982): The Most Distant Known Galaxies, Science 216, 265-269.
[29] Liang, E.P.T. and Sachs, R. (1980): Cosmology, in General Relativity and Gravitation, Vol. II, edited by A. Held, Plenum Press, London and NY.
[30] Longair, M. (1974) (editor): Confrontation of Cosmological Theories with Observational
Data, IAU Symposium No. 63, D. Reidel, Holland.
[31] Lovelock D. and Rund, H. (1975): Tensors, Differential Forms and Variational Principles, John Wiley and Sons, NY, 1975.
[32] Lyons, H. (1957): Atomic Clocks, Scientific American 196, 71-82.


[33] MacCallum, M.A.H. (1973): Cosmological Models from a Geometrical Point of View,
in Cargese Lectures in Physics, edited by E. Schatzman, Gordon & Breach, NY, Vol.
6.
[34] MacCallum, M.A.H. (1979): Anisotropic and Inhomogeneous Relativistic Cosmologies,
in General Relativity, an Einstein centenary survey, edited by S.W. Hawking, and W.
Israel, Cambridge University Press.
[35] McVittie, G.C. (1965): General Relativity and Cosmology, The University of Illinois
Press, Urbana [QC6.M3247].
[36] Misner, C.W., Thorne, K. and Wheeler, J. (1973): Gravitation, W.H. Freeman, San
Francisco.
[37] Moller, C. (1972): The Theory of Relativity, 2nd Edition, Clarendon Press, Oxford,
1972.
[38] Murray, F.J. and Miller, K.S. (1954): Existence Theorems for Ordinary Differential
Equations, New York Press [QA371 M985].
[39] Pauli, W. (1958): Theory of Relativity, Pergamon Press, New York 1958 [QC6.P323].
[40] P.J.E. Peebles (1971): Physical Cosmology, Princeton University Press.
[41] P.J.E. Peebles (1980): Large Scale Structure of the Universe, Princeton University Press.
[42] Penrose, R. (1968): Structure of Space-Time, article in Battelle Rencontres, 1967
Lectures in Mathematics and Physics, edited by C. DeWitt and J. Wheeler, W.A.
Benjamin, NY.
[43] Penzias, A.A. and Wilson, R.W. (1965): A Measurement of excess antenna temperature
at 4080 Mc/s, Astrophysical Journal 142, 419-421.
[44] Resnick, R. (1972): Relativity and Early Quantum Theory, John Wiley, NY [QC6.R3877].

[45] Rindler, W. (1966): Special Relativity, Oliver and Boyd, Edinburgh and London
[QC6.R48].
[46] Rindler, W. (1969): Essential Relativity; Special, General and Cosmological, Van Nostrand Reinhold Co., NY [QC6.R477].
[47] Robertson, H.P. and Noonan, T.W. (1968): Relativity and Cosmology, W.B. Saunders
Company, Philadelphia, 1968 [QC6.R54].
[48] Rosser, W.G.V. (1967): Introductory Relativity, Plenum Press, NY, 1967 [QC6.R573].


[49] M.P. Ryan and L.C. Shepley (1975): Homogeneous Relativistic Cosmologies, Princeton
University Press.
[50] Sachs, R.K. (1971) editor: General Relativity and Cosmology, Varuna Lectures, Academic Press, NY.
[51] Schwartz, H.M. (1968): Introduction to Special Relativity, McGraw-Hill Book Company,
NY, 1968 [QC6.S435].
[52] Sciama, D.W. (1971): Modern Cosmology, Cambridge University Press.
[53] Skinner, R. (1969): Relativity, Blaisdell Publishing Company, Waltham, MA, 1969.
[54] Sternberg, W.J. and Smith, T.L. (1952): The Theory of Potential and Spherical Harmonics, University of Toronto Press, Toronto [QA825.583].
[55] Synge, J.L. and Schild, A. (1949): Tensor Calculus, Dover Publications.
[56] Synge, J.L. (1960): Relativity: the General Theory, North Holland Publishing Company,
Amsterdam, 1960 [QC6.S93].
[57] Synge, J.L. (1972): Relativity: the Special Theory, 2nd Ed. North Holland Publishing
Company, Amsterdam, 1972 [QC6.S93].
[58] Synge, J.L. and Schild, A. (1964): Tensor Calculus, Toronto University Press.
[59] Taylor, E.F. and Wheeler, J.A. (1966): Spacetime Physics, W.H. Freeman and Company, 1966.
[60] Tolman, R.C. (1950): Relativity, Thermodynamics and Cosmology, Clarendon Press,
Oxford, 1950 [QC6.T67].
[61] Weinberg, S. (1972): Gravitation and Cosmology, John Wiley & Sons, NY.
[62] Weyl, H. (1922): Space, Time and Matter, Dover Publications.
[63] Wilson, R.W. (1980): History of the Discovery of the Cosmic Microwave Background
Radiation, Physica Scripta 21, 599-605.


Problem Sets


Problems for Chapter 1


1. (a) Calculate the components of the metric in $E^2$ relative to polar coordinates.
(b) Calculate the components of the metric in $E^3$ relative to spherical polar coordinates. Hence obtain the components of the (2-d) metric induced on the sphere
$$ x^2 + y^2 + z^2 = r^2, \qquad r = \text{constant}. $$
2. Let $V_n$ be an $n$-dimensional space with metric tensor having components $g_{ij}(x^k)$ relative to coordinates $\{x^k\}$, $i, j, k = 1, \ldots, n$. Define the Christoffel symbols of the first and second kinds and verify Ricci's Lemma, i.e. that the metric tensor components have zero covariant derivative: $g_{ij;k} = 0$.
3. The Einstein tensor is defined by
$$ G_{ij} = R_{ij} - \tfrac{1}{2} R g_{ij}, $$
where $R_{ij}$ is the Ricci tensor and $R$ the scalar curvature. Use the Bianchi identities to prove that $G^i{}_{j;i} = 0$.
4. For the metric tensor with components
$$ (g_{ij}) = \text{diag}\left( -e^{\nu(r)}, e^{\lambda(r)}, r^2, r^2 \sin^2\theta \right) $$
relative to local coordinates $(t, r, \theta, \phi) = (x^0, x^1, x^2, x^3)$ calculate the non-zero Christoffel symbols of the second kind. (We will need the results of this calculation later. This type of calculation can also be performed on the computer.)
5. Let $A_i(x^j)$ be the covariant components of a vector field in a $V_n$. Prove that the functions
$$ F_{ij} = \frac{\partial A_j}{\partial x^i} - \frac{\partial A_i}{\partial x^j} $$
form the components of an antisymmetric tensor field of rank 2. (In relativity theory, electromagnetic fields are described by a tensor of this type.)
[Hint: Consider the expression $A_{j;i} - A_{i;j}$.]
6. a) Verify that $P_i{}^j P_j{}^k = P_i{}^k$ where
$$ P_{ij} = g_{ij} + V_i V_j \quad \text{and} \quad V_i V^i = -1. $$
b) For any vector $A^j$ define
$$ \tilde A^j = P^j{}_k A^k. $$
Prove that $\tilde A^j$ is orthogonal to $V^j$.

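7. a) Prove that
$$ \Gamma^j_{jk} = \frac{\partial}{\partial x^k} \log\sqrt{g}, $$
where $g = |\det(g_{ij})|$.
b) Prove that if $V^i$ are the components of a vector field then
$$ V^i{}_{;i} = \frac{1}{\sqrt g}\left( \sqrt g\, V^i \right)_{,i}. $$
c) Prove that if $F^{ij} = -F^{ji}$ are the components of an antisymmetric tensor, then
$$ F^{ij}{}_{;j} = \frac{1}{\sqrt g}\left( \sqrt g\, F^{ij} \right)_{,j}. $$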
g
8. Consider a surface of revolution in $E^3$ defined parametrically by
$$ x = f(r)\cos\phi, \qquad y = f(r)\sin\phi, \qquad z = g(r). $$
a) Prove that the metric induced on this surface is
$$ ds^2 = \left[ f'(r)^2 + g'(r)^2 \right] dr^2 + f(r)^2 d\phi^2. $$
b) Derive an expression for the Gaussian curvature $K$ of this surface of revolution.
Hint: The Gaussian curvature of a 2-surface in $E^3$ is related to the Riemann-Christoffel tensor according to
$$ K = \frac{R_{1212}}{g}, $$
where $g = \det(g_{ij})$. Further, for a diagonal 2-d metric, one can verify that
$$ R_{1212} = -\tfrac{1}{2}\sqrt g \left[ \left( \frac{1}{\sqrt g}\, g_{22,1} \right)_{,1} + \left( \frac{1}{\sqrt g}\, g_{11,2} \right)_{,2} \right]. $$

Problems for Chapter 3


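1. For the weak field metric
$$ g_{00} = -1 + \epsilon h_{00} + O(\epsilon^2), $$
$$ g_{\alpha\beta} = \delta_{\alpha\beta} + \epsilon h_{\alpha\beta} + O(\epsilon^2), $$
verify that
$$ R_{00} = -\tfrac{1}{2}\epsilon\, \frac{\partial^2 h_{00}}{\partial x^\alpha \partial x^\alpha} + O(\epsilon^2), $$
$\alpha = 1, 2, 3$, sum over $\alpha$.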
2. Consider the 2-d spacetime $\mathbb{R}^2$ with non-flat Lorentzian metric
$$ ds^2 = -dt^2 + \exp(2t/b)\, dx^2, \qquad b > 0, \text{ const.} $$
The future direction is defined by the statement that $t$ increases into the future.
a) Verify that the curves $x = \text{const.}$ are TL geodesics.
b) A light signal is emitted at an event $(t_0, x_0)$ with $x_0 > 0$, with $x$ decreasing initially. Prove that if $x_0 > b \exp(-t_0/b)$, the light signal is never received by the observer $O$ with worldline $x = 0$.
c) The result b) means that $O$ cannot have knowledge of the whole 2-d spacetime. Find the equation of the boundary of the set of events from which $O$ can receive light signals. This boundary is called the event horizon of the observer $O$. Draw a spacetime diagram.
d) Can an inertial observer in flat spacetime have an event horizon? Can a non-inertial observer in flat spacetime have an event horizon? Give examples if possible.

Problems for Chapter 4


1. Consider the 4-d Lorentzian metric defined by
$$ ds^2 = -e^{2\nu(r)} dt^2 + e^{2\lambda(r)} dr^2 + r^2 (d\theta^2 + \sin^2\theta\, d\phi^2), $$
where $\nu(r)$, $\lambda(r)$ are arbitrary ($C^2$) functions of the coordinate $r$. You've already calculated the Christoffel symbols for this metric. Using these expressions, grind out the components $R_{ij}$ of the Ricci tensor.
2. Consider an observer with WL $r = r_0 > 2m$, $\theta = \theta_0$, $\phi = \phi_0$ in the Schwarzschild spacetime. A test particle released from rest by such an observer will have a TL geodesic with $\theta = \theta_0$, $\phi = \phi_0$ (constant) as its WL. Prove that the test particle is attracted to the source (i.e. that $r$ decreases into the future along its WL) if and only if the constant $m$ in the metric is positive.
3. By performing a suitable coordinate transformation $(r, \theta, \phi) \to (x, y, z)$ show that the Schwarzschild line-element can be written in the form
$$ ds^2 = -\left( \frac{1 - m/2R}{1 + m/2R} \right)^2 dt^2 + \left( 1 + \frac{m}{2R} \right)^4 (dx^2 + dy^2 + dz^2), $$
with
$$ R^2 = x^2 + y^2 + z^2. $$
4. A geodesic in the Schwarzschild metric is said to be circular if $r = r_0 > 2m$, $\dot\theta = 0$ along the geodesic.
(i) Prove that $\theta = \pi/2$ along any circular geodesic.
(ii) Prove that a circular geodesic is TL if $r_0 > 3m$, null if $r_0 = 3m$ and SL if $3m > r_0 > 2m$.
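5. A particle falls freely in the Schwarzschild spacetime in a radial direction (i.e. $\theta, \phi$ =
constant). Assume that the particle was released from rest at $r = \infty$. Show that its
WL is
$$t = -\frac{4m}{3}\left(\frac{r}{2m}\right)^{1/2}\left(\frac{r}{2m} + 3\right) + 2m\log\left[\frac{(r/2m)^{1/2} + 1}{(r/2m)^{1/2} - 1}\right] + \text{constant}$$
and that proper time along the WL is
$$\tau = -\frac{4m}{3}\left(\frac{r}{2m}\right)^{3/2} + \text{constant}.$$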
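The two formulas in problem 5 can be checked numerically by differentiating them. The sketch below assumes the standard first integrals for radial fall from rest at infinity, $dt/d\tau = (1 - 2m/r)^{-1}$ and $dr/d\tau = -\sqrt{2m/r}$; these are assumptions brought in for the check, the additive constants are set to zero, and the helper names are hypothetical.

```python
import math

def t_of_r(r, m):
    """Coordinate time along the WL (additive constant set to zero)."""
    x = math.sqrt(r / (2.0 * m))
    return (-(4.0 * m / 3.0) * x * (x * x + 3.0)
            + 2.0 * m * math.log((x + 1.0) / (x - 1.0)))

def tau_of_r(r, m):
    """Proper time along the WL (additive constant set to zero)."""
    return -(4.0 * m / 3.0) * (r / (2.0 * m)) ** 1.5

def dt_dr_expected(r, m):
    """(dt/dtau) / (dr/dtau) from the assumed first integrals."""
    return -math.sqrt(r / (2.0 * m)) / (1.0 - 2.0 * m / r)

def dtau_dr_expected(r, m):
    """1 / (dr/dtau) from the assumed first integrals."""
    return -math.sqrt(r / (2.0 * m))

def central_difference(f, r, m, h=1e-5):
    """Second-order finite-difference estimate of df/dr."""
    return (f(r + h, m) - f(r - h, m)) / (2.0 * h)
```

Comparing `central_difference(t_of_r, r, m)` with `dt_dr_expected(r, m)` at a few radii $r > 2m$ is a cheap way to catch sign or factor errors in a hand derivation.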

6. Prove Birkhoff's theorem: The Schwarzschild solution is the only spherically symmetric
solution of the vacuum Einstein field equations. You may assume that the metric in a
spherically symmetric spacetime can be written in the form
$$ds^2 = -e^{2\nu(r,t)}\,dt^2 + e^{2\lambda(r,t)}\,dr^2 + r^2(d\theta^2 + \sin^2\theta\,d\phi^2).$$

7. (i) Find the form of the Schwarzschild line-element relative to the coordinates $(v, r, \theta, \phi)$
where
$$v = t + r + 2m\log\left(\frac{r}{2m} - 1\right).$$
(ii) Verify that the vector field $k$ defined by $k^i = \delta^i_1$ ($r = x^1$) in these coordinates is a
null vector field.
(iii) Describe the nature (i.e. is it TL, SL or null) of the vector field defined by
$\xi^i = \delta^i_0$ ($v \equiv x^0$), where $r$ is permitted to take on values $0 < r < \infty$.
(iv) Calculate the components of the inverse metric in this coordinate system.
(v) Determine whether the normal vector to the hypersurfaces $v$ = constant is TL,
SL or null.
(This form of the Schwarzschild metric relates to the theory of black holes.)
8. Find all null geodesics of the 2-d Lorentzian metric
$$ds^2 = -\left(1 - \frac{2m}{r}\right)dv^2 + 2\,dv\,dr$$
(there are special cases which are easy to miss!)
9. Contribution of the spatial geometry to the perihelion advance
Consider the Lorentzian metric
$$ds^2 = -\left(1 - \frac{2m}{r}\right)dt^2 + dr^2 + r^2(d\theta^2 + \sin^2\theta\,d\phi^2), \qquad m = \text{constant},$$
whose associated spatial metric, defined by $t$ = const., is flat.
(i) Show that the orbital DE is
$$\frac{d^2u}{d\phi^2} + u = \frac{m\mathcal{E}^2}{h^2}(1 - 2mu)^{-2},$$
where the symbols have the same meaning as in the notes. For planetary orbits, $\mathcal{E}$ is very close to unity
and $m/r \ll 1$. Hence the DE can be approximated by
$$\frac{d^2u}{d\phi^2} + u = \frac{m}{h^2} + \frac{4m^2u}{h^2}.$$
(ii) Show that this DE leads to a perihelion advance of $\Delta\phi = \dfrac{4\pi m}{a(1 - e^2)}$ radians per rev.,
which is 2/3 the usual value (Hint: In this case the DE can be solved exactly).
Hence one regards 1/3 of the total perihelion advance as being due to the curvature
of the spatial geometry.
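The 2/3 ratio in problem 9(ii) can be illustrated numerically. The approximate DE is linear, $u'' + (1 - 4m^2/h^2)u = m/h^2$, so its solution oscillates with angular frequency $\sqrt{1 - 4m^2/h^2}$ in the orbital angle. The sketch below uses Mercury-like values and the Newtonian relation $h^2 = ma(1 - e^2)$; both are assumptions made for illustration and are not taken from the notes.

```python
import math

def advance_linear_de(m, h):
    """Advance per revolution for u'' + (1 - 4 m^2/h^2) u = m/h^2, solved exactly."""
    omega = math.sqrt(1.0 - 4.0 * m * m / (h * h))
    return 2.0 * math.pi / omega - 2.0 * math.pi

def advance_usual(m, a, e):
    """Usual first-order Schwarzschild value 6 pi m / (a (1 - e^2))."""
    return 6.0 * math.pi * m / (a * (1.0 - e * e))

# Illustrative Mercury-like values in metres (assumed, not from the notes).
M_SUN = 1.477e3      # G M / c^2 for the Sun
A_ORBIT = 5.79e10    # semi-major axis
ECC = 0.2056         # eccentricity

# Newtonian relation h^2 = m a (1 - e^2), adequate at this order (assumption).
h_squared = M_SUN * A_ORBIT * (1.0 - ECC ** 2)
ratio = advance_linear_de(M_SUN, math.sqrt(h_squared)) / advance_usual(M_SUN, A_ORBIT, ECC)
```

Here `ratio` comes out very close to 2/3, with the small difference coming from higher-order terms in $m/a$.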

10. The planet Jupiter (mass = $1.91 \times 10^{27}$ kg) has a satellite Io (mass = $7.28 \times 10^{22}$ kg)
that has a slightly elliptical orbit with orbital period of about $1.52 \times 10^5$ sec. and
mean orbital radius of about $4.13 \times 10^8$ m. Calculate the rate of precession of Io's orbit
according to general relativity.
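The arithmetic for problem 10 can be organised as below. This is a sketch under stated assumptions: it uses the usual first-order advance $6\pi m/(a(1 - e^2))$ per orbit with $e^2$ neglected (the problem only says the orbit is slightly elliptical), treats Io as a test particle, and takes rounded values of $G$, $c$ and the length of a year. The function names are hypothetical.

```python
import math

G = 6.674e-11             # gravitational constant, m^3 kg^-1 s^-2
C = 2.998e8               # speed of light, m/s
SECONDS_PER_YEAR = 3.156e7

def precession_per_orbit(mass_kg, radius_m, eccentricity=0.0):
    """Advance in radians per orbit: 6 pi m / (a (1 - e^2)), with m = G M / c^2."""
    m_geom = G * mass_kg / C ** 2          # central mass expressed in metres
    return 6.0 * math.pi * m_geom / (radius_m * (1.0 - eccentricity ** 2))

def precession_arcsec_per_year(mass_kg, radius_m, period_s, eccentricity=0.0):
    """Accumulated advance over one year, converted to arcseconds."""
    per_orbit = precession_per_orbit(mass_kg, radius_m, eccentricity)
    orbits_per_year = SECONDS_PER_YEAR / period_s
    return math.degrees(per_orbit * orbits_per_year) * 3600.0

# Data quoted in problem 10 (Jupiter's mass, Io's orbital radius and period).
io_rate = precession_arcsec_per_year(1.91e27, 4.13e8, 1.52e5)
```

With these inputs the rate is a few arcseconds per year, far larger than Mercury's per-year rate because Io completes roughly two hundred orbits per year.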

11. Derive the light deflection DE
$$\frac{d^2u}{d\phi^2} + u = 3mu^2$$
referred to in the notes.

12. On the precarious nature of existence just outside a black hole:
a) Calculate the 4-velocity $V^i$, the 4-acceleration $A^i$, and the scalar $A_iA^i$, for an observer
$r = r_0$, $\theta = \theta_0$, $\phi = \phi_0$ in the Schwarzschild spacetime with $r_0 > 2m$.
b) What happens to $A_iA^i$ as $r_0 \to 2m^+$? What dangers does this pose for an observer
who attempts to hover just outside the event horizon of a Schwarzschild black hole
in a rocket ship?
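The behaviour asked about in problem 12 b) can be tabulated numerically. The sketch below assumes the standard ingredients for a static observer: $V^t = (1 - 2m/r)^{-1/2}$, $\Gamma^r{}_{tt} = (m/r^2)(1 - 2m/r)$ and $g_{rr} = (1 - 2m/r)^{-1}$. These are quoted as assumptions rather than derived here, and the function names are hypothetical.

```python
def static_observer_acceleration(r, m):
    """Return (A^r, A_i A^i) for an observer at fixed r > 2m, theta, phi."""
    f = 1.0 - 2.0 * m / r
    v_t = f ** -0.5                       # unit 4-velocity: -f (V^t)^2 = -1
    gamma_r_tt = (m / (r * r)) * f        # assumed Christoffel symbol
    a_r = gamma_r_tt * v_t * v_t          # A^r = Gamma^r_tt (V^t)^2
    g_rr = 1.0 / f
    return a_r, g_rr * a_r * a_r          # A_i A^i = g_rr (A^r)^2

def acceleration_table(m, radii):
    """List of (r, A_i A^i) pairs, useful for seeing the growth as r -> 2m."""
    return [(r, static_observer_acceleration(r, m)[1]) for r in radii]
```

For example, `acceleration_table(1.0, [10.0, 3.0, 2.1, 2.001])` gives values that grow without bound as the radius approaches $2m$.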
13. Sketch the curves t = const., r = const., in the rv-diagram of the Eddington-Finkelstein
manifold. This gives one some insight into the nature of the breakdown of the Schwarzschild
(r, t) coordinate system.
14. Death in a black hole
By using the Eddington-Finkelstein metric, prove that a space traveller who falls into a
Schwarzschild black hole is pulled into the curvature singularity at r = 0 and destroyed
in a finite amount of proper time, no matter how powerful his rocket engines are.
15. Determine for what values of $p_1$, $p_2$, $p_3$ (constants) the metric
$$ds^2 = -dt^2 + t^{2p_1}dx^2 + t^{2p_2}dy^2 + t^{2p_3}dz^2$$
satisfies the vacuum Einstein field equations $R_{ij} = 0$.
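A candidate set of exponents for problem 15 can be tested with exact rational arithmetic. The sketch below assumes the Ricci components of a diagonal metric $-dt^2 + \sum_i a_i(t)^2 (dx^i)^2$, namely $R_{tt} = -\sum_i \ddot a_i/a_i$ and $R^i{}_i = \ddot a_i/a_i + (\dot a_i/a_i)\sum_{j \ne i} \dot a_j/a_j$ (no sum on $i$), specialised to $a_i = t^{p_i}$. These expressions are assumptions to be confirmed by your own calculation, and the function names are hypothetical.

```python
from fractions import Fraction

def kasner_ricci(p1, p2, p3):
    """Coefficients of t^-2 in R_tt and in the mixed components R^i_i (assumed formulas)."""
    p = (p1, p2, p3)
    total = sum(p)
    r_tt = -sum(q * (q - 1) for q in p)              # from -sum of a_i'' / a_i
    r_spatial = tuple(q * (total - 1) for q in p)    # p_i (p_1 + p_2 + p_3 - 1)
    return r_tt, r_spatial

def is_vacuum(p1, p2, p3):
    """True if every Ricci coefficient vanishes for these exponents."""
    r_tt, r_spatial = kasner_ricci(p1, p2, p3)
    return r_tt == 0 and all(c == 0 for c in r_spatial)
```

For instance, `is_vacuum(Fraction(-1, 3), Fraction(2, 3), Fraction(2, 3))` returns `True`, while equal exponents `Fraction(1, 3)` do not give a vacuum solution.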
16. Consider the Lorentzian metric
$$ds^2 = L^2(e^{2\beta}dx^2 + e^{-2\beta}dy^2) - 2\,du\,dv,$$
where $L$ and $\beta$ are functions of $u$ only. Calculate $\Gamma^i{}_{jk}$, $R^i{}_{jk\ell}$, $R_{ij}$ and $R$.
(This is a useful example of a gravitational wave.)
17. Consider the Lorentzian metric defined by
$$ds^2 = dx^2 + dy^2 + 2\,du\,dv + h(x, y, v)\,dv^2.$$
a) Under what conditions on $h$ will this metric satisfy the Einstein vacuum field
equations?
b) Verify that $h(x, y, v) = f(v)(x^2 - y^2) - 2g(v)xy$ gives a particular solution. This
solution in fact describes a plane gravitational wave.
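18. Consider the Kerr metric:
$$ds^2 = \rho^2\left(\frac{dr^2}{\Delta} + d\theta^2\right) + (r^2 + a^2)\sin^2\theta\,d\phi^2 - dt^2 + \frac{2mr}{\rho^2}(a\sin^2\theta\,d\phi - dt)^2,$$
with
$$\rho^2 = r^2 + a^2\cos^2\theta, \qquad \Delta = r^2 - 2mr + a^2,$$
and $m > 0$ and $a$ being constants.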


This metric (see footnote 2) satisfies the vacuum field equations $R_{ij} = 0$, is asymptotically flat, and
is interpreted as describing the gravitational field exterior to a uniformly rotating
axisymmetric star or black hole of mass m and angular momentum J = ma [using
units such that the gravitational constant is G = 1].
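a) Verify that this metric admits circular geodesics with
$$r = \text{const.}, \qquad \theta = \pi/2, \qquad \frac{d\phi}{dt} = \text{const.},$$
and that for such orbits
$$\frac{d\phi}{dt} = \frac{\omega}{1 + a\omega},$$
where $\omega = \varepsilon m^{1/2} r^{-3/2}$, with $\varepsilon = \pm 1$.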


b) Verify that these orbits are TL iff $r$ satisfies
$$r^2 - 3rm + 2a\varepsilon\sqrt{mr} > 0$$
and that in this case
$$\frac{d\tau}{dt} = (1 + a\omega)^{-1}\left(1 - \frac{3m}{r} + 2a\omega\right)^{1/2}.$$
c) Consider identical clocks carried in satellites travelling in circular orbits, one with
$\omega > 0$ (i.e. $\varepsilon = 1$) and one with $\omega < 0$ (i.e. $\varepsilon = -1$), and the same constant value
of $r$. Verify that the proper time elapsed between successive encounters of these satellites
is unequal and given by
$$\Delta\tau = \frac{\pi}{|\omega|}(1 - a\omega)(1 - 3m/r + 2a\omega)^{1/2}.$$
Thus observers in these satellites will age by different amounts as they keep
encountering each other, even though both have geodesic world lines, unlike in the
twin paradox in SRT.
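The asymmetry in problem 18 c) can be seen numerically by combining the results of parts a) and b). The sketch below writes the orbital frequency as $\omega = \varepsilon m^{1/2} r^{-3/2}$ with $\varepsilon = \pm 1$ and takes the prefactor of the elapsed proper time to be $\pi/|\omega|$; the symbol names and the prefactor are reconstructions rather than quotations from the notes, and the function names are hypothetical. The coordinate time between encounters is fixed by requiring the relative angle to reach $2\pi$.

```python
import math

def omega(r, m, eps):
    """omega = eps * m^(1/2) * r^(-3/2), with eps = +1 or -1."""
    return eps * math.sqrt(m) * r ** -1.5

def dphi_dt(r, m, a, eps):
    """Angular velocity of the circular geodesic, from part a)."""
    w = omega(r, m, eps)
    return w / (1.0 + a * w)

def dtau_dt(r, m, a, eps):
    """Rate of proper time with respect to t, from part b)."""
    w = omega(r, m, eps)
    return math.sqrt(1.0 - 3.0 * m / r + 2.0 * a * w) / (1.0 + a * w)

def elapsed_between_encounters(r, m, a, eps):
    """Proper time between meetings for the satellite with the given eps."""
    # Coordinate time for the relative angle of the two satellites to reach 2 pi.
    delta_t = 2.0 * math.pi / (dphi_dt(r, m, a, 1) - dphi_dt(r, m, a, -1))
    return delta_t * dtau_dt(r, m, a, eps)

def closed_form(r, m, a, eps):
    """(pi / |omega|) (1 - a omega) (1 - 3m/r + 2 a omega)^(1/2)."""
    w = omega(r, m, eps)
    return (math.pi / abs(w)) * (1.0 - a * w) * math.sqrt(1.0 - 3.0 * m / r + 2.0 * a * w)
```

With, say, `r = 10`, `m = 1`, `a = 0.5`, the two values of `eps` give different elapsed proper times, and the difference disappears when `a = 0`.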
19. Red-Shift for a Freely Falling Particle
Let $C$ be the world line of a radially freely falling particle in the Schwarzschild spacetime, released at infinity. Suppose that an observer $O$ ($r = r_0$; $\theta, \phi$ = const) receives
light signals emitted by this particle.
Footnote 2: The Kerr metric reduces to the Schwarzschild metric when $a = 0$.

a) Derive the frequency change formula
$$\frac{\nu_{\text{received}}}{\nu_{\text{emitted}}} = \left(1 - \sqrt{\frac{2m}{r_p}}\right)\Bigg/\sqrt{1 - \frac{2m}{r_0}},$$
where $r_p < r_0$ is the value of the $r$-coordinate at the event of emission. Compare
this to the frequency shift formula between two observers with $r, \theta, \phi$ constant
(see §4.4). How do you account for the difference on physical grounds? Also note
what happens as $r_p \to 2m^+$.
b) Verify that the time coordinate of the event of reception $Q$ of a light signal emitted
from $r = r_p$ is given by
$$t_Q = -4m\log\left(1 - \sqrt{\frac{2m}{r_p}}\right) + \{\text{terms which stay finite as } r_p \to 2m^+\}.$$
Hence verify that as the particle approaches the event horizon $r = 2m$ the frequency shift behaves according to
$$\frac{\nu_{\text{received}}}{\nu_{\text{emitted}}} \sim \exp\left(-\frac{t_Q}{4m}\right).$$
What implications does this have for the appearance of a collapsing object?
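The exponential fading in problem 19 b) can be checked numerically. The sketch below assumes three ingredients: the frequency ratio $(1 - \sqrt{2m/r_p})/\sqrt{1 - 2m/r_0}$, the infall trajectory $t(r)$ of problem 5 in Chapter 4 with its additive constant set to zero, and outgoing radial light rays along which $t - r - 2m\log(r/2m - 1)$ is constant. The function names are hypothetical. If the claimed behaviour holds, the ratio multiplied by $\exp(t_Q/4m)$ should approach a constant as $r_p \to 2m^+$.

```python
import math

def frequency_ratio(r_p, r_0, m):
    """Assumed ratio nu_received / nu_emitted for emission at r_p, reception at r_0."""
    return (1.0 - math.sqrt(2.0 * m / r_p)) / math.sqrt(1.0 - 2.0 * m / r_0)

def tortoise(r, m):
    """r + 2m log(r/2m - 1); outgoing radial light rays keep t minus this constant."""
    return r + 2.0 * m * math.log(r / (2.0 * m) - 1.0)

def infall_time(r, m):
    """t(r) for radial fall from rest at infinity (constant of integration set to zero)."""
    x = math.sqrt(r / (2.0 * m))
    return (-(4.0 * m / 3.0) * x * (x * x + 3.0)
            + 2.0 * m * math.log((x + 1.0) / (x - 1.0)))

def reception_time(r_p, r_0, m):
    """Coordinate time t_Q at which the signal emitted at r_p reaches r_0."""
    return infall_time(r_p, m) + tortoise(r_0, m) - tortoise(r_p, m)

def scaled_ratio(r_p, r_0, m):
    """frequency_ratio * exp(t_Q / 4m); should level off as r_p -> 2m."""
    return frequency_ratio(r_p, r_0, m) * math.exp(reception_time(r_p, r_0, m) / (4.0 * m))
```

Evaluating `scaled_ratio` at emission radii such as $2m(1 + 10^{-4})$ and $2m(1 + 10^{-6})$ gives nearly equal values, while `frequency_ratio` itself tends to zero.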
20. Find all null geodesics for the 2-d Lorentzian metric
$$ds^2 = -\left(1 - \frac{2m}{r} + \frac{e^2}{r^2}\right)dv^2 + 2\,dv\,dr, \qquad r > 0,$$
with $m > 0$ and $e$ being constants, and sketch them by representing $r$ and $v$ as
oblique Cartesian coordinates, as with the Eddington-Finkelstein metric. How many
horizons are there?


Problems for Chapter 5


1. Let $ds^2 = -V^2dt^2 + g_{\alpha\beta}\,dx^\alpha dx^\beta$, $u^a = \frac{1}{V}\delta^a_0$. Prove that the TL curves of the congruence
tangent to $u^a$ are geodesics if and only if $\partial V/\partial x^\alpha = 0$, $\alpha = 1, 2, 3$.
2. Let $u^a$ be the unit tangent vector field of a congruence of TL curves. The relative
motion of the observers or particles having these curves as world lines is described by
the rate of expansion tensor, which is defined by
$$\theta_{ij} = \tfrac12(u_{i;j} + u_{j;i}) + \tfrac12(\dot u_i u_j + \dot u_j u_i),$$
where
$$\dot u_i = u_{i;j}u^j$$
is the 4-acceleration of the observers.
(i) Verify that
$$\theta_{ij}u^j = u^j\theta_{ji} = 0. \qquad (1)$$
The tensor $\theta_{ij}$, being symmetric and orthogonal to $u^i$, in general has 3 distinct
eigenvalues, with associated eigenvectors orthogonal to $u^i$. Thus in general $\theta_{ij}$
will determine preferred directions orthogonal to $u^i$, and the relative motion of
the observers will be anisotropic. The condition for isotropic motion is that the 3
eigenvalues are equal, since then any vector orthogonal to $u$ will be an eigenvector,
and so there will be no preferred directions. The condition for isotropy is
$$\theta_{ij} = \lambda(g_{ij} + u_iu_j), \qquad (2)$$
the $u_iu_j$ term being required in order to satisfy equation (1).
(ii) Verify that the scalar $\lambda$ is given by
$$\lambda = \tfrac13\theta, \qquad (3)$$
where $\theta \equiv \theta_{ij}g^{ij} = u^i{}_{;i}$ is called the rate of expansion scalar.
Equations (2) and (3) suggest that we define the rate of shear tensor $\sigma_{ij}$ by
$$\sigma_{ij} = \theta_{ij} - \tfrac13\theta(g_{ij} + u_iu_j).$$
Then the motion is isotropic iff $\sigma_{ij} = 0$.
(iii) Verify that $\sigma_{ij}$ is tracefree, i.e. $\sigma^i{}_i = 0$.


3. A simple line-element which describes a spatially homogeneous but anisotropic cosmological model is
$$ds^2 = -dt^2 + X(t)^2dx^2 + Y(t)^2dy^2 + Z(t)^2dz^2,$$
with the 4-velocity given by $u^a = \delta^a_0$, where $t = x^0$.
(i) Calculate the rate of expansion tensor and rate of shear tensor for the TL congruence
defined by $u^a$. You will find it convenient to show that
$$\dot u_i = 0, \qquad u_{i;j} = u_{j;i},$$
so that in this case $\theta_{ij}$ simplifies to
$$\theta_{ij} = u_{i;j}.$$
The result in 2(i) implies that certain components of $\theta_{ij}$ are identically zero. Give
the mixed components $\theta^i{}_j$ and $\sigma^i{}_j$ as these are simplest.
(ii) What conditions on the line-element will guarantee that $\sigma^i{}_j = 0$, so that the
motion of the fundamental observers is isotropic?
4. Derive the energy conservation equation
$$\dot\mu = -3(\mu + p)\frac{\dot S}{S}$$
for the FRW line-element, by using the fact that the perfect fluid energy momentum
tensor satisfies
$$T_a{}^b{}_{;b} = 0.$$
Hint: By expanding $T_a{}^b{}_{;b} = 0$, and contracting with $u^a$, verify that
$$\dot\mu = -(\mu + p)\theta,$$
where $\theta = u^i{}_{;i}$ is the rate of expansion scalar. All that remains is to calculate $\theta$ for the
FRW line-element.
5. Assume that the pressure and density in the FRW model universe satisfy the equation
of state:
$$p = (\gamma - 1)\mu,$$
where $1 \le \gamma \le 2$ is a constant.
(i) Verify that with $S(0) = 0$, the current time $t_0$ is given by
$$t_0 = H_0^{-1}\int_0^1 \frac{du}{\sqrt{1 - \Omega_0 + \Omega_0 u^{2-3\gamma}}},$$
where $\Omega_0 = \dfrac{8\pi\mu_0}{3H_0^2}$ is the density parameter, and $H_0$ is Hubble's constant.
Hint: Integrate the energy conservation equation, and then use the Friedmann
equation.


(ii) Show that if the spatial geometry is Euclidean ($k = 0$)
$$t_0 = \frac{2}{3\gamma H_0}.$$
Comment:
This shows that the time $t_0$ since the big-bang is greatest when
$\gamma = 1$, i.e. when $p = 0$. In other words, going into the past, the singularity occurs
sooner when $p \neq 0$. This shows that (positive) pressure is counterproductive in
preventing a singularity.
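The age integral of problem 5 is easy to evaluate numerically, which gives a check on part (ii) and on the comment about pressure. The sketch below assumes the integrand $1/\sqrt{1 - \Omega_0 + \Omega_0 u^{2 - 3\gamma}}$ and uses a plain midpoint rule, which never evaluates the integrand at $u = 0$; the function name and step count are illustrative choices.

```python
def age_in_hubble_units(omega0, gamma, steps=200000):
    """Midpoint-rule estimate of H0 * t0 for density parameter omega0 and index gamma."""
    du = 1.0 / steps
    total = 0.0
    for i in range(steps):
        u = (i + 0.5) * du                   # midpoint of the i-th subinterval
        total += du / (1.0 - omega0 + omega0 * u ** (2.0 - 3.0 * gamma)) ** 0.5
    return total
```

For $\Omega_0 = 1$ the estimate can be compared with $2/(3\gamma)$: `age_in_hubble_units(1.0, 1.0)` is close to 2/3, and the value decreases as `gamma` increases towards 2.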
6. In the Einstein-de-Sitter universe (the FRW model with $p = 0$ and $k = 0$), derive the
relationship between the actual size $\ell$ of a distant object and the angle $\Delta\theta$ subtended
at our galaxy by the relevant null geodesics, namely
$$\Delta\theta = \frac{\ell H_0(1 + z)^{3/2}}{2\left[(1 + z)^{1/2} - 1\right]},$$
where $z$ is the redshift parameter, and $H_0$ is Hubble's constant. Show that for fixed
$\ell$, $\Delta\theta$ reaches a minimum for $z = 5/4$.

[Figure: Angle subtended by a distant object. Labels: null geodesics ($\theta, \phi$ = const), surfaces $r$ = const and $t$ = const, WL of our galaxy.]

7. In an FRW model, the intersection of the past light cone with a $t$ = constant hypersurface is a 2-sphere. Show that the area of this 2-sphere in the Einstein-de-Sitter universe
is
$$A(t) = 36\pi\,t^{4/3}\left(t_0^{1/3} - t^{1/3}\right)^2.$$
Show that this reaches a maximum when $t = 8t_0/27$, where $t_0$ is the current time, and
that this time corresponds to $z = 5/4$.
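The extremum at $z = 5/4$ quoted in problems 6 and 7 can be located numerically with a simple grid search. The sketch below assumes the angle formula $\Delta\theta/(\ell H_0) = (1+z)^{3/2}/(2[(1+z)^{1/2} - 1])$, the area formula $A(t) = 36\pi t^{4/3}(t_0^{1/3} - t^{1/3})^2$, and the Einstein-de-Sitter relation $1 + z = (t_0/t)^{2/3}$; the helper names and grid ranges are illustrative.

```python
import math

def angle_factor(z):
    """Delta-theta / (l * H0) as a function of redshift."""
    return (1.0 + z) ** 1.5 / (2.0 * ((1.0 + z) ** 0.5 - 1.0))

def light_cone_area(t, t0=1.0):
    """Area of the past-light-cone 2-sphere at time t."""
    return 36.0 * math.pi * t ** (4.0 / 3.0) * (t0 ** (1.0 / 3.0) - t ** (1.0 / 3.0)) ** 2

def argmin_on_grid(f, lo, hi, n=100000):
    """Return the grid point in [lo, hi] at which f is smallest."""
    best_x, best_val = lo, f(lo)
    for i in range(1, n + 1):
        x = lo + (hi - lo) * i / n
        val = f(x)
        if val < best_val:
            best_x, best_val = x, val
    return best_x

z_min = argmin_on_grid(angle_factor, 0.1, 5.0)
t_max = argmin_on_grid(lambda t: -light_cone_area(t), 0.001, 0.999)
z_at_t_max = (1.0 / t_max) ** (2.0 / 3.0) - 1.0     # with t0 = 1
```

A grid search only locates the extremum to within the grid spacing, so it supports but does not replace the calculus argument the problems ask for.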

8. Prove that the volume of a 3-sphere of radius $R$ is $V = 2\pi^2R^3$.
Hint: $ds^2 = R^2[d\chi^2 + \sin^2\chi\,(d\theta^2 + \sin^2\theta\,d\phi^2)]$, and $V = \int\sqrt{g}\;d\chi\,d\theta\,d\phi$.
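The hint in problem 8 can be turned into a short numerical check. The sketch below assumes the metric $ds^2 = R^2[d\chi^2 + \sin^2\chi(d\theta^2 + \sin^2\theta\,d\phi^2)]$ with $0 \le \chi \le \pi$, $0 \le \theta \le \pi$, $0 \le \phi < 2\pi$, so that $\sqrt{g} = R^3\sin^2\chi\,\sin\theta$ and the integral factorises. The coordinate ranges and the function name are assumptions made for illustration.

```python
import math

def three_sphere_volume(R, n=2000):
    """Midpoint-rule estimate of the integral of sqrt(g) over the 3-sphere."""
    d = math.pi / n
    chi_integral = sum(math.sin((i + 0.5) * d) ** 2 for i in range(n)) * d
    theta_integral = sum(math.sin((i + 0.5) * d) for i in range(n)) * d
    phi_integral = 2.0 * math.pi          # integrand does not depend on phi
    return R ** 3 * chi_integral * theta_integral * phi_integral
```

The result can be compared with `2 * math.pi ** 2 * R ** 3` for any radius.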

9. Consider the congruence of curves in flat spacetime
$$ds^2 = -dt^2 + dr^2 + r^2(d\theta^2 + \sin^2\theta\,d\phi^2),$$
which are tangent to the vector field
$$u^a = \delta^a_0 + w(r)\,\delta^a_3,$$
where $(t, r, \theta, \phi) = (x^0, x^1, x^2, x^3)$.
(i) What condition on $w(r)$ will guarantee that the curves are everywhere timelike?
(ii) Prove that there is no family of hypersurfaces orthogonal to these curves.
Hint: If there is such a family, then
$$u_a = \lambda\frac{\partial f}{\partial x^a}, \qquad \text{for scalars } \lambda, f.$$
This implies
$$u_{a,b} - u_{b,a} = \lambda_{,b}f_{,a} - \lambda_{,a}f_{,b}, \qquad \lambda_{,a} = \frac{\partial\lambda}{\partial x^a} \text{ etc.},$$
and hence that
$$u_{[a,b}u_{c]} = 0,$$
where [ ] denotes complete antisymmetrization, i.e.
$$u_{[a,b}u_{c]} \equiv \tfrac16\left[(u_{a,b} - u_{b,a})u_c + (u_{b,c} - u_{c,b})u_a + (u_{c,a} - u_{a,c})u_b\right].$$
Verify that $u_{[a,b}u_{c]} \neq 0$ for the given vector field.
10. In the Einstein-de-Sitter universe, derive the exact redshift distance formula,
$$\ell = 2H_0^{-1}\left(1 - \frac{1}{\sqrt{1 + z}}\right) \qquad \text{or equivalently} \qquad z = \frac{1}{\left(1 - \tfrac12 H_0\ell\right)^2} - 1.$$
Verify that this reduces to the series expansion given in the notes (see equation (5.24)).

11. Closed timelike curves
Consider the 4-d spacetime $\mathbb{R}^4 = \{(x, y, z, t) \mid -\infty < x, y, z, t < \infty\}$ with Lorentzian
metric $g$ defined by
$$ds^2 = dx^2 + dy^2 + dz^2 - \left[dt + \tfrac12 b\,(x\,dy - y\,dx)\right]^2,$$
where $b$ is a positive constant. This spacetime is a solution of the non-vacuum Einstein
field equations, with pressure-free matter and an electromagnetic field as source. The
4-velocity of the matter is the vector field
$$u = \frac{\partial}{\partial t}.$$

a) Verify that the matter is non-expanding in the sense that the divergence of its
4-velocity is zero: i.e.
$$u^i{}_{;i} = 0.$$
b) Prove, by explicitly constructing suitable curves, that a smooth closed TL curve
passes through each event of this spacetime whose coordinates $(x, y, z, t)$ satisfy
$x^2 + y^2 > 4/b^2$. Calculate the proper time elapsed in traversing one loop of these
curves.
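One natural family of candidate curves for part b) is the set of circles $x = \rho\cos\varphi$, $y = \rho\sin\varphi$ with $z$ and $t$ held constant. The sketch below assumes the line element $ds^2 = dx^2 + dy^2 + dz^2 - [dt + \tfrac12 b(x\,dy - y\,dx)]^2$ and this choice of curve, neither of which is claimed to be the intended solution; the function names and the curve parameter are hypothetical. Along such a circle $x\,dy - y\,dx = \rho^2 d\varphi$ and $dx^2 + dy^2 = \rho^2 d\varphi^2$.

```python
import math

def ds2_per_dphi2(rho, b):
    """ds^2 / dphi^2 along the circle of radius rho with z and t constant."""
    return rho ** 2 - (0.5 * b * rho ** 2) ** 2

def loop_proper_time(rho, b):
    """Proper time for one circuit; defined only when the circle is timelike."""
    q = ds2_per_dphi2(rho, b)
    if q >= 0.0:
        raise ValueError("circle is not timelike; need rho^2 > 4 / b^2")
    return 2.0 * math.pi * math.sqrt(-q)
```

The sign of `ds2_per_dphi2` changes exactly at $\rho^2 = 4/b^2$, consistent with the condition stated in the problem.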
