
Advanced General Relativity

Lecture notes by Sergei Winitzki

DRAFT September 28, 2007, version 1.1


Lecture notes on topics in advanced General Relativity, including differential geometry, singularity theorems, variational
principles, and an introduction to spinors. This text is not yet in a final form.

Copyright © 2005-2007 by Sergei Winitzki. Permission is granted to copy, distribute and/or modify this document
under the terms of the GNU Free Documentation License (version 1.2 or any later version published by the Free Software
Foundation) with an Invariant Section being chapter E, with no Front-Cover Texts and no Back-Cover Texts (see Sec. E.2 for
the conditions). The GFDL permits, among other things, unrestricted verbatim copying of the text. The source files used to
prepare a printable version of this text, as well as updates, will be found at the author's home page.

Contents
Preface
Suggested literature

1 Calculus in curved space

1.1 Summary
1.1.1 Index-free notation
1.1.2 Sample practice problems
1.2 Basic notions: Manifolds and vector fields
1.2.1 Definitions
1.2.2 Manifolds and coordinates
1.2.3 Manifolds: intrinsic picture
1.2.4 Tangent spaces
1.2.5 Tangent vectors as short curve segments
1.2.6 *Tangent space as space of derivations
1.2.7 Vector fields and flows
1.2.8 *Tangent bundle
1.2.9 Tensor fields
1.2.10 Commutator of vector fields
1.2.11 Connecting vectors
1.3 Lie derivative
1.3.1 Commutator as Lie derivative
1.3.2 Lie derivative of tensors
1.3.3 Geometric interpretation
1.4 Calculus of differential forms
1.4.1 Volume as antisymmetric tensor
1.4.2 Motivation for differential forms
1.4.3 Antisymmetric tensors
1.4.4 *Oriented volume and n-vectors
1.4.5 Determinants
1.4.6 Differential forms
1.4.7 *Canonical decomposition of 1-forms and 2-forms
1.4.8 The Poincaré lemma
1.4.9 Integration of forms
1.5 Metric
1.5.1 Motivation: metric on surfaces
1.5.2 Definition
1.5.3 Examples of metrics
1.5.4 Orthonormal frames
1.5.5 Correspondence of vectors and covectors
1.5.6 The Levi-Civita tensor
1.6 Affine connection
1.6.1 Motivation
1.6.2 General properties of connections
1.6.3 The coordinate derivative connection
1.6.4 Compatibility with the metric
1.6.5 Torsion and torsion-freeness
1.6.6 Levi-Civita connection
1.6.7 Killing vectors
1.6.8 *Koszul formula and the Lie derivative
1.6.9 Divergence of a vector field
1.7 Calculations in index-free notation
1.7.1 Abstract index notation
1.7.2 Converting expressions into index-free notation
1.7.3 Index-free computations of trace
1.7.4 Summary of calculation rules
1.8 Curvature
1.8.1 Curvature of a connection
1.8.2 Bianchi identities
1.8.3 Ricci tensor and scalar
1.8.4 Calculations with the curvature tensor
1.9 Geodesic curves, geodesic vector fields
1.9.1 Parallel transport of vectors
1.9.2 Geodesics
1.9.3 Geodesics extremize proper length
1.9.4 *Motion under external forces
1.9.5 Deviation of geodesics
1.10 Example: hypersurface of constant curvature
1.10.1 Tangent bundle and induced metric
1.10.2 Induced connection
1.10.3 Riemann tensor within the hypersurface

2 Geometry of null surfaces

2.1 Null vectors
2.1.1 Orthogonal complement spaces
2.1.2 Divergence of a null vector field
2.2 Null surfaces
2.2.1 Three-dimensional hypersurfaces
2.2.2 Integrable vector fields
2.2.3 Frobenius theorem
2.2.4 Null surfaces
2.2.5 Examples of null surfaces
2.2.6 Lightcones are null surfaces
2.2.7 Null functions
2.2.8 Null functions generate null geodesics
2.2.9 Every lightray comes from null functions
2.2.10 Conformal invariance
2.3 Raychaudhuri equation
2.3.1 Distortion tensor
2.3.2 Rotation
2.3.3 Introducing Raychaudhuri equation
2.3.4 Shear for timelike congruences
2.3.5 Shear for null congruences
2.4 Applications of Raychaudhuri equation
2.4.1 Energy conditions
2.4.2 Focusing of timelike geodesics
2.4.3 Repulsive gravity
2.4.4 Focusing of null geodesics
2.5 Null tetrad formalism

3 Asymptotically flat spacetimes

3.1 Stationary spacetimes
3.1.1 Newtonian limit
3.1.2 Redshift
3.1.3 Conformal Killing vectors
3.1.4 Gravitational potential
3.1.5 Energy
3.2 Conformal infinity
3.2.1 Conformal infinity for Minkowski spacetime
3.2.2 Conformal diagrams
3.2.3 How to draw conformal diagrams
3.3 Asymptotic flatness
3.4 Conformal radiation fields
3.4.1 Scalar field in 1+1 dimensions
3.4.2 Scalar field in 3+1 dimensions
3.4.3 Electromagnetic field
3.4.4 Gravitational radiation field
3.4.5 Asymptotic behavior of radiation

4 Global techniques

4.1 Singularity theorems
4.1.1 Singularities and geodesic incompleteness
4.1.2 Past-incompleteness of inflation
4.1.3 Conjugate points on geodesics
4.1.4 Second variation of proper length
4.1.5 Singularity in collapsing or expanding universe
4.1.6 Singularity in a closed universe
4.1.7 Singularity in gravitational collapse
4.2 Hawking's area theorem
4.3 Holographic principle

5 Variational principle

5.1 Lagrangian formulation
5.1.1 Classical field theory
5.1.2 Einstein-Hilbert action
5.1.3 Nonlinear f(R) gravity
5.1.4 Energy-momentum tensor
5.1.5 General covariance
5.1.6 Symmetries and Noether theorems
5.2 Hamiltonian formulation
5.2.1 Electrodynamics in Hamiltonian formulation
5.2.2 Hamiltonian mechanics of constrained systems
5.2.3 Gauss-Codazzi equation
5.2.4 Boundary term in Einstein-Hilbert action
5.2.5 The Hamiltonian for pure gravity
5.2.6 Constraints in General Relativity
5.3 Quantum cosmology
5.3.1 Wave function of the universe
5.3.2 Wheeler-DeWitt equation
5.3.3 Interpretation of the wave function
5.3.4 Minisuperspace

6 Tetrad methods

6.1 Tetrad formalism
6.1.1 Tetrads
6.1.2 Examples
6.1.3 Hodge duality
6.1.4 Levi-Civita connection
6.1.5 Connection as a set of 1-forms
6.1.6 *Solving equations for n-forms
6.2 Applications of tetrad formalism
6.2.1 Computing geodesic equations
6.2.2 Determining Killing vectors
6.2.3 Curvature as a set of 2-forms
6.2.4 Ricci tensor and Ricci scalar
6.2.5 Einstein-Hilbert action in tetrads
6.3 Connections on vector bundles
6.3.1 Vector bundles as generalization of tangent bundles
6.3.2 Examples of bundles
6.3.3 Covariant derivatives on vector bundles
6.3.4 Gauge theories and associated bundles
6.3.5 Tangent bundle as associated bundle

7 Spinors

7.1 Introducing spinors
7.1.1 Quaternions and rotations
7.1.2 The Lorentz group
7.1.3 Lorentz transformations of spinors
7.2 Spinor algebra
7.2.1 The fundamental 2-form
7.2.2 Relationship of spinors and vectors
7.2.3 Simplification of spinorial tensors
7.3 Equations for spinor fields
7.3.1 Spinors in curved spacetime
7.3.2 Covariant derivative on spinors
7.3.3 Maxwell equations
7.3.4 Dirac equation

A Elements of Special and General Relativity

A.1 Special Relativity
A.1.1 Spacetime
A.1.2 Motion of bodies in SR
A.2 Index notation
A.3 Transition to General Relativity
A.4 Covariant derivative
A.4.1 Curved coordinates
A.4.2 Curved space and induced metric
A.4.3 Covariant derivative
A.4.4 Properties of covariant derivative
A.4.5 Choice of connection
A.5 Curvature
A.5.1 Parallel transport
A.5.2 Riemann tensor
A.5.3 Expressing Riemann tensor through Γ
A.6 Covariant integration
A.6.1 Determinant of the metric
A.6.2 Covariant volume element
A.6.3 Derivative of the determinant
A.6.4 Covariant divergence
A.6.5 Integration by parts
A.7 Einstein's equation

B How not to learn tensor calculus

B.1 Tensor algebra
B.2 Tensor calculus
B.3 Hints

C Calculations and proofs

C.1 For Chapter 1
C.2 For Chapter 3
C.3 For Chapter 6

D Comments on literature

D.1 Comments on Ludvigsen's General Relativity

E License for this text

E.1 Author's position on commercial publishing
E.2 GNU Free Documentation License
E.2.0 Applicability and definitions
E.2.1 Verbatim copying
E.2.2 Copying in quantity
E.2.3 Modifications

Bibliography

Index

Preface

This book is a revised version of lecture notes for the course "Topics in Advanced General Relativity" taught by the
author in the fall semester of 2005/06 at Ludwig-Maximilians University, Munich. The audience included advanced
undergraduates and beginning graduate students. The choice of topics is intended to complement an introductory course
in general relativity, which is assumed as a prerequisite. The present text covers differential geometry on manifolds,
asymptotically flat spacetimes, singularity theorems, the use of variational principles in GR, tetrad formalism, and
spinor calculus. My goal is to provide a readable introduction to concepts that are covered in existing advanced texts,
such as the classic books [12] and [36], and also explain a small number of newer results that are sufficiently tightly
connected with the main topics of the lectures.

I cannot give an overview of the history of the subject, and this book is intended as a textbook rather than a research
monograph. Correspondingly, there are few references to original work. I feel that this approach is justified because
all the material (with a small number of exceptions) is well-established. Instead, the bibliography contains¹ all the
sources I consulted while preparing this book. After digesting the main ideas explained here, readers will be able to
understand research-level papers and monographs listed in the bibliography.

I would like to make some comments regarding the organization of the text. I emphasize visual and conceptual
explanations; however, I derive all the principal results in full detail. Definitions of useful mathematical notions
are motivated and illustrated by examples. New terminology is shown in boldface within or near a definition; the italics
type is reserved for emphasis. The symbol  marks the end of a derivation, a remark, or an example. This mark is intended
to aid the reader in skipping unwanted material.

The exposition in many contemporary textbooks is interspersed with unproved statements that are labeled "exercises" but
actually provide important additional information or even constitute an integral part of the development of the
material. I call such statements "pseudo-exercises" since they are rarely as straightforward as genuine practice
problems would be. It seems to me that the main reason for the existence of pseudo-exercises is the authors' desire to
avoid cluttering the text with derivations that may be omitted during a first reading. I feel that readers at an
advanced level are insufficiently motivated and/or lack the time needed to solve pseudo-exercises. So I decided that
this book will not contain any pseudo-exercises; every relevant statement² comes with a derivation.³ However, many
derivations and calculations are relegated to Appendix C. I also included a small number of short Practice problems
intended as straightforward exercises for the readers. (Solutions to those are not given.)

A comment on the status of this text is in order. The book is complete in the sense that there are no gaps in the
calculations. However, the text is presently being heavily revised and should be in a more satisfactory form by the end
of 2007. This preliminary version is now made freely available because I feel that it is useful even in the present,
unfinished state. (Warning: the present text is a draft and may contain mistakes! If you must have a set of lecture
notes in a finished form, please do not read this text until it is marked as a release version. For example, it is
presently not clear which sign convention for the Ricci tensor R I will finally adopt; because of this, signs might be
inconsistent in some equations involving R. Some comments regarding possible improvements are scattered in boldface
throughout the text.)

This text may be freely distributed according to the GNU Free Documentation License (see Sec. E.2 or
www.gnu.org/copyleft/fdl.html). Comments and suggestions are welcome. The entire text was prepared by the author on
computers running GNU/Linux, using the free document preparation systems LyX and LaTeX.

Sergei Winitzki, 2007

¹ Will contain when I finish the current revision of the book.
² An irrelevant statement is something that seems generally interesting but has no direct relevance to the main subjects
of the book. For example, the statement "Bianchi identities can be generalized to connections with nonzero torsion" is
irrelevant since I never consider such connections in this book. Irrelevant statements, sometimes accompanied by
references to additional literature, are confined to footnotes. The reader may safely ignore all the footnotes.

Suggested literature

There exist many textbooks on General Relativity written at every level. In the following table, I list some of the
more advanced textbooks covering the main topics of the present text. For an introduction to GR, see e.g. the books
[29, 10].

[12] [21] [33] [36] [32] [19] [28]
Difficulty* (1-5): 5 2 4 4 3 2 1
Diff. geometry + + + + + + +
Index-free +*
Action principle + + + +
Asympt. flatness + + + +* +*
Spinors in GR +* + +
Singularity thms. + +* +* +*
Hamiltonian GR + + +
Tetrad formalism + + +

*Please note:
A book's level of difficulty is my subjective estimate of how hard it would be for a well-prepared student to learn the
contents of the book. (Higher level is harder.)
The book [21] discusses only some of the more elementary aspects of spinor calculus and singularity theorems.
The book [36] refers the reader to [12] for derivations of some key technical statements relevant to singularity
theorems.
The book [32] treats asymptotic flatness exclusively in the spinor language.
The book [19] uses exclusively the abstract index notation; other books use non-abstract index notation (as evidenced by
the presence of the non-tensorial Christoffel symbols Γ). It also discusses only the basic facts relevant to asymptotic
flatness.
The book [28] discusses only some basic aspects of singularity theorems relevant to black holes.

The present text aims to have the difficulty level 2. Right now the exposition is somewhat lacking in the coverage of
3 At present, some pseudo-exercises still remain, but almost all of them
asymptotic flatness, and there are very few applications of
(except a very few) have full solutions. Pseudo-exercises are gradually spinors and of the tetrad formalism. 
converted to statements as the current major revision of the book pro-
gresses.

v
1 Calculus in curved space

In this chapter I attempt to motivate and explain the main ideas and suggest images that illustrate the mathematical apparatus of differential geometry. The actual explanations start in Sec. 1.2.2. Additional literature on differential geometry from a mathematical viewpoint is [22, 11]. A more modern mathematical textbook (which I did not read) is [17].

1.1 Summary

General Relativity (GR) is a currently well-established physical theory of gravitation whose mathematical apparatus is based on differential geometry (more specifically, tensor calculus in curved spaces). In this chapter, I introduce the main concepts of this calculus: manifolds, tangent spaces, covariant derivatives, and curvature. I will explain and use the index-free notation. You (the reader) will perhaps appreciate this material more fully if you are already somewhat familiar with tensor calculus and GR, at least to the extent covered in Appendix A.

The pragmatic side of this chapter is to develop a sufficiently powerful formalism for index-free calculations, which will be heavily used throughout this book. You may skip this chapter on first reading iff¹ you are familiar with the following concepts and notations.

$\mathcal{M}$    smooth manifold
$\mathbb{R}^n$    n-dimensional Euclidean space
$T_p\mathcal{M}$, $T_p^*\mathcal{M}$    (co)tangent spaces to $\mathcal{M}$ at point $p$
$T\mathcal{M}$    tangent bundle of a manifold $\mathcal{M}$
$\mathbf{v} \in T_p\mathcal{M}$    tangent vector at point $p$
$\mathbf{v} \circ f$    derivative of function $f$ w.r.t. vector field $\mathbf{v}$
$\partial_t$    vector field along a coordinate axis $t$
$[\mathbf{a}, \mathbf{b}] \equiv \mathcal{L}_{\mathbf{a}} \mathbf{b}$    commutator of two vector fields
$\mathcal{L}_{\mathbf{v}} T$    Lie derivative of tensor $T$ w.r.t. vector $\mathbf{v}$
$\mathbf{u} \otimes \mathbf{v}$    tensor product of vectors
$\omega \circ \mathbf{v}$    1-form $\omega$ applied to vector $\mathbf{v}$
$\int \omega$, $\mathrm{d}\omega$    integral / exterior differential of n-form $\omega$
$\omega_1 \wedge \omega_2$    exterior product of n-forms $\omega_1$ and $\omega_2$
$\iota_{\mathbf{v}} \omega$    insertion of vector $\mathbf{v}$ into n-form $\omega$
$g(\mathbf{a}, \mathbf{b})$    scalar product of vectors using a metric $g$
$g\mathbf{v}$    1-form corresp. to vector $\mathbf{v}$ through metric $g$
$g^{-1}\omega$    vector corresp. to 1-form $\omega$ through metric $g$
$\varepsilon$    volume n-form, Levi-Civita tensor
$\nabla_{\mathbf{a}} \mathbf{b}$    covariant derivative of vector $\mathbf{b}$ w.r.t. vector $\mathbf{a}$
$\operatorname{div} \mathbf{u}$    divergence of vector field $\mathbf{u}$
$R(\mathbf{a}, \mathbf{b})\mathbf{c}$    curvature tensor (Riemann tensor)
$R(\mathbf{a}, \mathbf{b}, \mathbf{c}, \mathbf{d})$    fully covariant Riemann tensor
$\mathrm{Ric}(\mathbf{a}, \mathbf{b})$    Ricci tensor

Concepts covered in this chapter also include: connecting vector fields; definition of the metric-compatible (Levi-Civita) connection on a manifold with a metric; torsion and torsion-freeness; properties of the Riemann tensor; geodesics and geodesic deviation; Riemann tensor for manifolds of constant curvature.

¹ If and only if.

1.1.1 Index-free notation

One approach to differential geometry is to treat vectors and tensors as multi-indexed arrays of numbers (called components), such as $T^{\alpha\beta}$ or $R_{\kappa\lambda\mu\nu}$, which depend on the choice of the coordinate system $\{x^\mu\}$. One manipulates expressions with many indices, such as $g^{\alpha\beta} g_{\beta\gamma,\delta}\, g^{\gamma\lambda}$, and checks at the end of the calculations that the results are covariant (i.e. transform correctly under changes of coordinates). This approach was pioneered by Einstein and is still the way tensor calculus is presented in the current physical literature. I assume that you were already exposed to an introductory course of GR formulated using coordinates and components. If you are not sure, please scan Sec. A.2-A.7 in Appendix A for unfamiliar equations.

In the mathematical literature, the coordinate-free approach and the index-free notation are predominant. For example, the scalar product of vectors is denoted by $g(\mathbf{a}, \mathbf{b})$ or just $\langle \mathbf{a}, \mathbf{b} \rangle$ instead of $g_{\mu\nu} a^\mu b^\nu$ or $a^\mu b_\mu$. The preference for the coordinate-free approach is, in my opinion, due to the different character of the tasks typically undertaken by mathematicians and by physicists. Mathematicians study abstract relationships between objects and are more interested in finding objects that have rich properties and whose interconnections explain earlier results on a more general level and yield new concepts. On the other hand, physicists are mostly interested in making specific computations and deriving equations that will eventually give numerical values for quantities, even if the methods of computation are not particularly general, elegant, or illuminating.

In the coordinate-free approach, one avoids introducing a particular coordinate system $\{x^\mu\}$, and one does not talk about components of tensors. Instead, one uses geometrically defined operations (such as the covariant derivative or the Lie derivative) and the algebraic properties of tensors. In a calculation where a particular tensor is sought, especially when no symmetries are present, it may be helpful to introduce a suitable coordinate system and to determine the components of that tensor. However, often one needs to perform a general calculation that does not depend on a particular tensor (e.g., an investigation of a general property of the curvature tensor). In most cases, such general calculations are easier to perform in the coordinate-free approach because coordinate systems and components offer no computational or conceptual advantages. The experience of writing this book shows that the coordinate-free approach is well suited for applications such as singularity theorems or the Hamiltonian formulation of GR.

It is natural² to use the index-free notation when one adopts the coordinate-free approach.

² Although, strictly speaking, not necessary. Instead, one could use Penrose's abstract index notation, where $a^\mu$ means a vector with an abstract label $\mu$, rather than an array of components. See Sec. 1.7.1 for more details. One book that consistently uses the coordinate-free approach together with the abstract index notation is [19].

Here is an example of a calculation in the index-free notation. Statement: If a vector $\mathbf{u}$ is geodesic then $g(\mathbf{u}, \mathbf{u})$ is constant along the flow lines of $\mathbf{u}$.
Proof: The derivative of $g(\mathbf{u}, \mathbf{u})$ along the flow of $\mathbf{u}$ is
\[ \mathcal{L}_{\mathbf{u}}\, g(\mathbf{u}, \mathbf{u}) = \nabla_{\mathbf{u}}\, g(\mathbf{u}, \mathbf{u}) = 2 g(\nabla_{\mathbf{u}} \mathbf{u}, \mathbf{u}) = 0 \]
since $\nabla_{\mathbf{u}} \mathbf{u} = 0$ for geodesic vector fields. For comparison, the same calculation in the index notation looks like this:
\[ \frac{d}{d\tau} \left( g_{\mu\nu} u^\mu u^\nu \right) = u^\alpha \left( g_{\mu\nu} u^\mu u^\nu \right)_{;\alpha} = 2 g_{\mu\nu} u^\mu u^\nu{}_{;\alpha} u^\alpha = 0 \]
since $u^\alpha u^\nu{}_{;\alpha} = 0$.

The index notation is frequently very useful and even unavoidable in some calculations, especially when one needs to manipulate traces of high-rank tensors, as is often the case in GR. Nevertheless, the index-free notation has, in my view, significant advantages in the study of more mathematically advanced topics of field theory and GR. The index-free notation is concise, emphasizes the geometrical meaning of every quantity, and completely excludes non-tensorial quantities from consideration; for instance, the non-tensorial Christoffel symbols cannot even be defined. The index-free approach is well-suited for studying the general properties of various geometrical objects used in GR. Many general derivations (such as computing the second variation of the geodesic length) can be performed faster and more transparently in the index-free notation.

In this text, I adopt the coordinate-free approach; this leaves the choice of an index-free notation or the abstract index notation. I employ index-free calculations when it is practically more convenient than the abstract index notation. The index-free approach turns out to be almost always more convenient, except for certain cases, such as the variation of the Einstein-Hilbert action with respect to the metric, where index-free calculations become cumbersome (although still possible). In those cases I use the abstract index notation. Section 1.7 presents more discussion and shows how to convert expressions between the index-free and the index notations. In this book, I also introduce an admittedly nonstandard but unambiguous and adequate index-free notation for the trace of a tensor (Sec. 1.7.3).

To make the transition to the index-free approach easier for readers accustomed to the index notation, I will sometimes mention the index representation of tensor quantities used throughout the text. The formal correspondence between the index-free and the index notations is summarized in the following table.

$\mathbf{v}$    $v^\mu \equiv \mathbf{v} \circ x^\mu$
$\mathbf{v} \circ f$    $v^\mu \partial_\mu f \equiv v^\mu f_{,\mu}$
$[\mathbf{a}, \mathbf{b}]$    $a^\mu \partial_\mu b^\nu - b^\mu \partial_\mu a^\nu$
$\nabla_{\mathbf{a}} \mathbf{b}$    $a^\mu \nabla_\mu b^\nu \equiv a^\mu b^\nu{}_{;\mu}$
$\operatorname{div} \mathbf{u}$    $\nabla_\mu u^\mu \equiv u^\mu{}_{;\mu}$
$\mathbf{u} \otimes \mathbf{v}$    $u^\mu v^\nu$
$\omega$    $\omega_\mu\, \mathrm{d}x^\mu$
$\omega \circ \mathbf{v}$    $\omega_\mu v^\mu$
$\mathrm{d}\omega$    $(\omega_{\nu,\mu} - \omega_{\mu,\nu})\, \mathrm{d}x^\mu \otimes \mathrm{d}x^\nu$
$g(\mathbf{a}, \mathbf{b})$    $g_{\mu\nu} a^\mu b^\nu \equiv a^\mu b_\mu$
$g\mathbf{v}$    $g_{\mu\nu} v^\nu \equiv v_\mu$
$g^{-1}\omega$    $g^{\mu\nu} \omega_\nu$

Remark on the roman "d": Recently, certain science publishers started enforcing the rule of using a roman "d" (for instance, writing $\mathrm{d}^2 x(\tau)/\mathrm{d}\tau^2$ or $\int \mathrm{d}^3 x$) rather than an italic $d$, as was the common practice during the previous two centuries. I decided that in the present text, the roman symbol "d" will be used to denote the exterior differential in the calculus of differential forms, while the italic $d$ will appear in other situations when the traditional usage involves an "infinitesimally small quantity". My intention is to use a roman "d" for rigorously defined objects, such as differential forms $\mathrm{d}x$, and to contrast them with "infinitesimals" $dx$, which are only heuristic tools that help the intuition. I write integrals, e.g. $\int \mathrm{d}^3 x$, with a roman "d" when the integration of a differential form is implied (note that $\mathrm{d}^3 x$ is a shorthand notation for a 3-form $\mathrm{d}x \wedge \mathrm{d}y \wedge \mathrm{d}z$). However, I write an italic $d$ in integrals where a simple one-dimensional integration $\int f(x)\, dx$ is actually used. I also write the metric interval,
\[ ds^2 = dt^2 - dx^2 - dy^2 - dz^2, \]
with an italic $d$ because this notation is essentially a jargon that bears only marginally on the calculus of differential forms; for instance, $ds$ cannot be understood as a 1-form in this notation.

When using ordinary derivatives, I write $d/d\tau$ and $d^2/d\tau^2$ using the italic $d$, for the following reasons. The operators $d/d\tau$ and $d^2/d\tau^2$ are not essentially different from the partial derivative operators $\partial/\partial x$ or $\partial^2/\partial x^2$. Traditionally, one writes $\partial f/\partial x$ when $f$ depends not only on $x$ but also on other variables $y, z, ...$ that are held constant while $f(x, y, z, ...)$ is differentiated with respect to $x$. On the other hand, one writes $df/dx$ to emphasize that $f(x)$ is a function of only one variable $x$, or that $f$ is an expression that must be reduced, through suitable substitutions, to a function of $x$ only. This notation is widely used to avoid cluttering the equations with explicit substitutions. For instance, the Euler-Lagrange equation of mechanics is written as
\[ \frac{\partial L}{\partial x} - \frac{d}{dt} \frac{\partial L}{\partial \dot{x}} = 0, \]
implying that $\partial L/\partial \dot{x}$ must be expressed as a function of $t$ before evaluating $d/dt$. However, the differential operators $\partial/\partial x$ and $d/dx$ are not related to the exterior differential $\mathrm{d}$. For this reason, I find it confusing to write $\mathrm{d}/\mathrm{d}x$ with a roman "d" when $\mathrm{d}$ means the exterior differential. So I use an italic $d$ in $d/dx$.

A related recent trend is to write a roman "e" for the base of the natural logarithm and a roman "i" for the imaginary unit ($\mathrm{i}^2 = -1$). The reason for switching to roman "d"s, "e"s, and "i"s is to avoid confusion when other quantities are denoted by the letters $d$, $e$, $i$. In this book, my intention is to make the notation unambiguous. I use a roman "i" for the imaginary unit because the letter $i$ is frequently used as an index, e.g. $x^i$. However, I use an italic $e = 2.718...$ to avoid confusion with a boldface roman $\mathbf{e}$ frequently used for basis vectors $\{\mathbf{e}_a\}$. ■

1.1.2 Sample practice problems

You should be able to solve these problems if you know the material of Chapter 1.

Null vectors. Prove that two non-parallel null vectors $\mathbf{l}$, $\mathbf{n}$ cannot be orthogonal. In other words, $g(\mathbf{l}, \mathbf{n}) = 0$ is equivalent to $\mathbf{l} = \lambda \mathbf{n}$ when $g(\mathbf{l}, \mathbf{l}) = g(\mathbf{n}, \mathbf{n}) = 0$. Here $g$ is a metric in four-dimensional space with the Lorentzian signature $(+, -, -, -)$.

Extremum of a functional on curves. Consider a functional
\[ F[\gamma] = \int_{x_1}^{x_2} f\left( g(\dot{\gamma}, \dot{\gamma}) \right) d\tau, \]
where $f(x)$ is a given smooth, nonconstant function and $\dot{\gamma}$ is the tangent vector to the curve $\gamma(\tau)$. Derive the Euler-Lagrange equations describing the extremum of the functional $F[\gamma]$.
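As a possible starting point for the last problem (a sketch, not a full solution), one may write the functional in local coordinates and apply the Euler-Lagrange equation in the form quoted in the remark on the roman "d". The local coordinates $x^\mu(\tau)$, the overdot for $d/d\tau$, the shorthand $s$, and the prime on $f$ are assumptions introduced here for illustration; they do not appear in the problem statement.

```latex
% Sketch under stated assumptions: local coordinates x^\mu(\tau) along the curve,
% overdot = d/d\tau, f' = derivative of f with respect to its argument,
% and a symmetric metric g_{\mu\nu} = g_{\nu\mu}.
\begin{align*}
  s &\equiv g(\dot\gamma,\dot\gamma) = g_{\alpha\beta}(x)\,\dot x^\alpha \dot x^\beta ,
  \qquad L(x,\dot x) = f(s), \\
  \frac{\partial L}{\partial \dot x^\mu} &= 2 f'(s)\, g_{\mu\nu}\,\dot x^\nu ,
  \qquad
  \frac{\partial L}{\partial x^\mu} = f'(s)\, g_{\alpha\beta,\mu}\,\dot x^\alpha \dot x^\beta , \\
  0 &= \frac{\partial L}{\partial x^\mu} - \frac{d}{d\tau}\frac{\partial L}{\partial \dot x^\mu}
     = f'(s)\, g_{\alpha\beta,\mu}\,\dot x^\alpha \dot x^\beta
       - \frac{d}{d\tau}\Bigl[\, 2 f'(s)\, g_{\mu\nu}\,\dot x^\nu \Bigr].
\end{align*}
```

Expanding the $\tau$-derivative and rewriting the result in covariant (or index-free) form is the remaining part of the exercise.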
Symmetric geodesics. Consider a local coordinate system $\{x, y, z\}$ in a three-dimensional space. Suppose that the metric $g$ has the form
\[ g = \alpha(x)\, dx^2 + A(y, z)\, dy^2 + C(y, z)\, dz^2, \]
where $\alpha(x) > 0$, $A(y, z) > 0$, $C(y, z) > 0$ are smooth functions that depend on $x, y, z$ as indicated. Show that lines $y = \text{const}$, $z = \text{const}$ (i.e. the coordinate lines of $x$) are geodesics. Obtain an explicit formula for a Killing vector $\mathbf{k}$ that the metric $g$ admits. Is the vector field $\mathbf{k}$ geodesic?

Extrinsic curvature. A hypersurface with normal vector field $\mathbf{n}$ is embedded in a curved space with metric $g$. The extrinsic curvature $K(\mathbf{x}, \mathbf{y})$ of the hypersurface is defined as a bilinear form
\[ K(\mathbf{x}, \mathbf{y}) = h(\nabla_{\mathbf{x}} \mathbf{n}, \mathbf{y}), \]
where $h$ is the induced metric on the hypersurface and $\mathbf{x}, \mathbf{y}$ are arbitrary vectors tangent to the hypersurface. Starting from the definition and using $g(\mathbf{n}, \mathbf{n}) = 1$, show that $K(\mathbf{x}, \mathbf{y})$ is symmetric, $K(\mathbf{x}, \mathbf{y}) = K(\mathbf{y}, \mathbf{x})$. Compute the extrinsic curvature tensor explicitly for the surface $x^2 + y^2 - z^2 = -1$ defined in 3-dimensional Euclidean space with Cartesian coordinates $\{x, y, z\}$ at the point $\{0, 0, 1\}$, using $\{x, y\}$ as the local coordinates on the surface. ■

1.2 Basic notions: Manifolds and vector fields

In nongravitational physics, one describes events in four-dimensional Minkowski spacetime $\mathbb{R}^4$ with familiar coordinates $\{t, x, y, z\}$. Physical laws are expressed as partial differential equations for various fields in spacetime. General Relativity replaces Minkowski spacetime by a curved spacetime, and partial derivatives with respect to $\{t, x, y, z\}$ are replaced by more complicated differential operations. The mathematical concept of a manifold and accompanying notions are used to formalize the idea of a "curved space" and the ways of writing differential equations for fields in such spaces.

In the following subsection 1.2.1, I will list concise definitions of manifolds, tangent spaces, vector fields, etc. If you are satisfied by or already familiar with these definitions, you may skip the entire Sec. 1.2 and continue reading from Sec. 1.3. Otherwise, you may find it helpful to read the following sections (from Sec. 1.2.2 onwards) where I explain those definitions more visually.

1.2.1 Definitions

This section concisely summarizes the definitions and properties of manifolds and tangent vectors. If you prefer visual explanations to formal definitions, you may ignore this section and start reading from Sec. 1.2.2.

A manifold $\mathcal{M}$ is a set of points (which I informally call a "space") that locally looks like a subset of $\mathbb{R}^n$. In other words, every point $p \in \mathcal{M}$ has a small neighborhood around it, which is one-to-one mapped into a subset of $\mathbb{R}^n$. This mapping is called a chart and is equivalent to specifying a set of scalar functions, $x^\mu(p)$, defined in an open neighborhood $U(p_0)$ of some point $p_0$ and giving the point $x \in \mathbb{R}^n$ corresponding to points $p \in \mathcal{M}$. These functions must uniquely identify each point $p$, so that for any two points $p, p'$ from the neighborhood $U(p_0)$ we have $x^\mu(p) = x^\mu(p')$ iff $p = p'$. Also, the image of the neighborhood $U(p_0)$ under the map $p \mapsto x^\mu(p) \in \mathbb{R}^n$ must be an open set in $\mathbb{R}^n$. Under these conditions, the functions $x^\mu(p)$ constitute a local coordinate system defined on an open subdomain $U(p_0) \subset \mathcal{M}$. Of course, different subdomains $U(p_0)$, $U(p_1)$, etc., may have different local coordinate systems. Whenever there is an intersection of two domains of $\mathcal{M}$ where local coordinate systems are defined, we get a coordinate change map $\mathbb{R}^n \to \mathcal{M} \to \mathbb{R}^n$ defined in a subset of $\mathbb{R}^n$. A manifold is smooth (differentiable) if all these coordinate change maps are smooth (differentiable) in the ordinary sense (as functions $\mathbb{R}^n \to \mathbb{R}^n$).

A particular manifold may be specified either by an explicit embedding into a larger space, or by giving a complete set of charts and re-charting maps, specifying how the charts are to be "glued together."

A typical example of a manifold is the surface of a sphere in 3-dimensional space, usually called the two-dimensional sphere (or 2-sphere) and denoted $S^2$. The 2-sphere $S^2$ is locally like $\mathbb{R}^2$, but globally is not (see Statement 1.2.2.1). It is useful to visualize a manifold as a surface embedded in a larger space, and to think that we are "flat observers" living entirely within the surface. So we cannot see the larger space and must describe the surface exclusively through its intrinsic properties.

As an example of a manifold described in a different way, consider the set of all $2 \times 2$ unitary complex matrices with unit determinant. These matrices form a group denoted $SU(2)$. (A matrix $A$ is unitary if $A^\dagger A = 1$, where $A^\dagger$ is the Hermitian conjugate, i.e. the transposed and complex conjugate matrix.) It is perhaps not immediately clear how to introduce coordinates in this set of matrices. It can be shown (see Statement 7.1.1.2 on page 143) that all matrices $A \in SU(2)$ can be described explicitly as
\[ A = \begin{pmatrix} a_0 - \mathrm{i} a_3 & -\mathrm{i} a_1 - a_2 \\ -\mathrm{i} a_1 + a_2 & a_0 + \mathrm{i} a_3 \end{pmatrix}, \]
where $a_0, ..., a_3$ are real numbers satisfying $\sum_{j=0}^{3} a_j^2 = 1$. Therefore, $SU(2)$ is equivalent (as a manifold) to a three-dimensional sphere $S^3$.

A smooth, one-to-one map from one manifold onto another whose inverse is also smooth is called a diffeomorphism. Two manifolds are called diffeomorphic if there exists a diffeomorphism between them. Diffeomorphic manifolds are topologically equivalent. For example, a sphere can be diffeomorphically mapped onto the surface of a cube, but not onto a torus. Another example we have just seen is an explicit diffeomorphism (specified using coordinates) between $SU(2)$ and $S^3$.

We will always consider smooth finite-dimensional manifolds $\mathcal{M}$ and smooth functions $f(p)$, $p \in \mathcal{M}$, on these manifolds; so we omit the word "smooth" everywhere.

In our notation, $p \in \mathcal{M}$ is a point on a manifold, scalar functions (scalar fields) are functions $f : \mathcal{M} \to \mathbb{R}$, so that $f(p)$ is a number, and boldface letters $\mathbf{v}, \mathbf{w}, ...$ are vectors and vector fields (defined below). We use Greek letters $\alpha, \beta, ..., \lambda, \mu, ...$ for tensor indices and also for numbers (scalars).

A curve on a manifold is a set of points $\gamma(\tau)$ parametrized by a real number $\tau$, i.e. a smooth map $\gamma : \mathbb{R} \to \mathcal{M}$, sometimes defined only on a subset of $\mathbb{R}$.

A derivation $D$ at a point $p \in \mathcal{M}$ is a map from scalar functions to numbers, satisfying the derivative-like properties
\[ D(f + \lambda g) = Df + \lambda Dg, \quad \text{[linearity]} \]
\[ D(fg) = D(f)\, g + f D(g), \quad \text{[Leibniz rule]} \]
\[ Df = 0 \ \text{if} \ f = \text{const around} \ p. \]
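To make the three properties concrete, here is a short check (a sketch, not taken from the original text) that the derivative along a curve through $p$ is a derivation in this sense. The symbol $D_\gamma$ and the normalization $\gamma(0) = p$ are assumptions introduced only for this illustration; $g$ denotes an arbitrary second function here, not the metric, and in the Leibniz rule the functions outside $D_\gamma$ are evaluated at $p$.

```latex
% Sketch: the derivative along a curve \gamma with \gamma(0) = p,
%   D_\gamma f \equiv \frac{d}{d\tau} f(\gamma(\tau)) \Big|_{\tau=0},
% satisfies the three defining properties of a derivation at p.
% (D_\gamma is a label introduced here; \lambda is a real number.)
\begin{align*}
  D_\gamma (f + \lambda g)
    &= \frac{d}{d\tau}\Bigl[ f(\gamma(\tau)) + \lambda\, g(\gamma(\tau)) \Bigr]_{\tau=0}
     = D_\gamma f + \lambda\, D_\gamma g , \\
  D_\gamma (f g)
    &= \frac{d}{d\tau}\Bigl[ f(\gamma(\tau))\, g(\gamma(\tau)) \Bigr]_{\tau=0}
     = (D_\gamma f)\, g(p) + f(p)\, (D_\gamma g) , \\
  D_\gamma f
    &= 0 \quad \text{if } f = \text{const around } p ,
    \text{ since then } f(\gamma(\tau)) \text{ is constant near } \tau = 0 .
\end{align*}
```

Each line uses only the ordinary rules for differentiating functions of the single real variable $\tau$.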
All the derivations at a point $p$ form a vector space called the tangent space $T_p\mathcal{M}$ at the point $p$. Vectors from the tangent space are denoted by boldface letters, e.g. $\mathbf{u} \in T_p\mathcal{M}$, and their action on functions is denoted by $\mathbf{u} \circ f$. The number $\mathbf{u} \circ f$ is interpreted as the derivative of the function $f$ at the point $p$ in the direction $\mathbf{u}$. Using a local coordinate system $\{x^\mu\}$, $\mu = 1, ..., n$, one can prove that the tangent space is an n-dimensional vector space. An explicit basis in this space is given by the $n$ coordinate derivatives $\mathbf{e}_\mu \equiv \partial/\partial x^\mu$. The components of a vector $\mathbf{v}$ in this basis can be found as $v^\mu = \mathbf{v} \circ x^\mu$, and then the vector $\mathbf{v}$ is decomposed as
\[ \mathbf{v} = \sum_{\mu=1}^{n} v^\mu \mathbf{e}_\mu = \sum_{\mu=1}^{n} v^\mu \frac{\partial}{\partial x^\mu}. \]

The union of all tangent spaces $T_p\mathcal{M}$ for every $p \in \mathcal{M}$ is called the tangent bundle of $\mathcal{M}$ and is denoted $T\mathcal{M}$. The tangent bundle is itself a manifold with the same charts as $\mathcal{M}$ but with twice the dimension of $\mathcal{M}$. This is so because every chart will map a patch of $\mathcal{M}$ into a patch of $\mathbb{R}^n$ and tangent vectors map into vectors in $\mathbb{R}^n$, thus the patch of the tangent bundle will map into a patch of $\mathbb{R}^n \times \mathbb{R}^n$.

A vector field $\mathbf{v}(p)$ on a manifold $\mathcal{M}$ is a tangent vector $\mathbf{v} \in T_p\mathcal{M}$ chosen at every point $p \in \mathcal{M}$, i.e. a map $\mathbf{v} : \mathcal{M} \to T\mathcal{M}$ such that the image $\mathbf{v}(p)$ of every point $p$ belongs to the tangent space $T_p\mathcal{M}$ at the same point $p$.

A covector is a linear map from vectors $\mathbf{v}$ into numbers. The space of covectors to tangent vectors at a point $p$ is denoted by $T_p^*\mathcal{M}$ and is called the cotangent space at point $p$. A covector $\omega(p) \in T_p^*\mathcal{M}$ chosen at every point $p \in \mathcal{M}$ is called a 1-form. The pointwise action of a 1-form $\omega$ on a vector field $\mathbf{v}$ is denoted by $\omega \circ \mathbf{v}$ and is a scalar function on $\mathcal{M}$. A tensor field on a manifold is defined similarly: it is a tensor based on the tangent space at a point $p$.

The commutator of vector fields $\mathbf{u}$ and $\mathbf{v}$ is the vector field $[\mathbf{u}, \mathbf{v}]$ defined by its action on functions $f$ by
\[ [\mathbf{u}, \mathbf{v}] \circ f \equiv \mathbf{u} \circ (\mathbf{v} \circ f) - \mathbf{v} \circ (\mathbf{u} \circ f). \]
(It can be shown that $[\mathbf{u}, \mathbf{v}]$ also satisfies the properties of a vector field; see Statement 1.2.10.1.) One says that the vector fields commute if their commutator is equal to zero. For example, coordinate basis vector fields $\partial_x$ and $\partial_y$ commute because $\partial_x \partial_y f = \partial_y \partial_x f$ for smooth functions $f(x, y)$. On the other hand, vector fields $x \partial_y$ and $\partial_x$ do not commute because
\[ x \partial_y (\partial_x f) - \partial_x (x \partial_y f) = -\partial_y f \neq 0. \]

A vector field $\mathbf{c}$ is called a connecting vector for the vector field $\mathbf{v}$ if $[\mathbf{c}, \mathbf{v}] = 0$. (Since $[\mathbf{c}, \mathbf{v}] = -[\mathbf{v}, \mathbf{c}]$, it follows that $\mathbf{v}$ is also a connecting vector for $\mathbf{c}$.) All the coordinate basis vectors $\partial/\partial x^\mu$ are connecting vectors for each other. Conversely, if one is given a set of $n$ linearly independent vectors $\mathbf{e}_1, ..., \mathbf{e}_n$ all of which are connecting for each other within a domain, then there exists a local coordinate system $\{x^\mu\}$ in that domain such that $\mathbf{e}_\mu$ is equal to the $\mu$-th coordinate basis vector $\partial/\partial x^\mu$. The flow lines of $\mathbf{e}_1, ..., \mathbf{e}_n$ are the "coordinate grid" of that coordinate system.

Figure 1.1: A visual example of a manifold is a curved surface embedded in the flat 3-dimensional space.

1.2.2 Manifolds and coordinates

If you prefer formal definitions to visual explanations, or if the definitions in Sec. 1.2.1 are entirely familiar, you may skip Sec. 1.2.2 to 1.2.11.

One of the main assumptions underlying GR is that the geometry of our universe is curved rather than flat. In a curved geometry, initially parallel lines may move closer or further apart, and the sum of the angles of some triangles may be larger or smaller than $\pi$, depending on the triangle and its location in space. Let us examine the notion of "curved space," starting from an intuitive picture.

A simple visual image of a curved space is a 2-dimensional surface embedded in the 3-dimensional Euclidean (flat) space, as shown in Fig. 1.1. Since the surface is smooth, every sufficiently small patch of the surface looks like a small piece of a flat 2-dimensional space $\mathbb{R}^2$. However, different such patches are glued together in a nontrivial way, so that the entire surface has a shape we perceive as curved.

According to GR, the geometry of the universe is similar to that of a curved surface. In a sufficiently small neighborhood of each point, the universe looks like the flat four-dimensional Minkowski spacetime $\mathbb{R}^4$, studied in Special Relativity. However, the global geometry of the universe (i.e. the geometry observed at sufficiently large distances and time intervals) is different from the Minkowski geometry.

This motivates us to consider the idea of a general kind of curved space, that is, a space which is only locally similar to $\mathbb{R}^4$ but globally very different from $\mathbb{R}^4$. The mathematical object that formalizes this intuitive idea is called a manifold. A standard, formal definition of a manifold is given in Sec. 1.2.1. I will now motivate and explain that definition, using a two-dimensional surface as a main example.

To formalize the idea that a two-dimensional surface $\mathcal{M}$ locally looks like $\mathbb{R}^2$, one says that for each point $p$ in $\mathcal{M}$ there is a small patch of $\mathcal{M}$ around $p$ which is mapped bijectively (one-to-one) into some patch of $\mathbb{R}^2$. A patch of $\mathbb{R}^2$ can be described by coordinates $\{x_1, x_2\}$ that vary in some finite ranges. Therefore, there is a one-to-one map assigning values $x_1$ and $x_2$ to points of $\mathcal{M}$ in a small patch (see Fig. 1.2). This map is called a local coordinate system or a chart. We can now formulate a first rough definition: A manifold is a set $\mathcal{M}$ such that there is a local coordinate system around every point of $\mathcal{M}$. A manifold can be visualized as a surface that is smooth near each point and does not have "corners" or "spikes." An example of a non-manifold is a sphere with a line connected to it.

A typical example of a smooth manifold is the unit 2-sphere
1.2 Basic notions: Manifolds and vector fields

the sphere, = /2. Nevertheless, this flaw is relatively in-


significant since only two points (the poles) are problematic.
x1 Such points where the chart map is not one-to-one are called
coordinate singularities of a local coordinate system. Despite
the presence of coordinate singularities, the spherical coordi-
z x2 nates constitute an almost global coordinate system, which
is very convenient for many calculations. The singular points
= /2 sometimes need to be handled specially.
x
y Practice problem: Consider the surface in R3 specified by
the equation x2 + y 2 z 2 = 1, where {x, y, z} are the stan-
dard coordinates. Prove that this surface is a two-dimensional
Figure 1.2: A local coordinate system {x1 , x2 } covers a small manifold by showing how to find local coordinates {x1 , x2 } in
patch of a manifold. the neighborhood of an arbitrary point {x, y, z} on the surface.
Is there a global coordinate system for this manifold? Answer:
No. 
specified by the equation
Although our considerations so far concerned two-
x2 + y 2 + z 2 = 1, dimensional surfaces embedded in a three-dimensional Eu-
clidean space, it is clear that completely analogous arguments
where x, y, z are the ordinary Cartesian coordinates in a three- apply to higher-dimensional hypersurfaces embedded into
dimensional Euclidean space. The 2-sphere is usually de- an even higher-dimensional Euclidean space. This picture
noted S 2 . To establish that S 2 is a manifold, it is sufficient leads to the general definition of an n-dimensional manifold:
to find a local coordinate system near each point. This is a set of points M such that each point p M belongs to a
straightforward; for instance, it is easy to see that the first chart that maps it into a subset of Rn .
two Cartesian coordinates, {x, y}, can be used as local coor-
dinates {x1 , x2 } in a neighborhood of the north pole {0, 0, 1},
even in the entire northern hemisphere (but not beyond the 1.2.3 Manifolds: intrinsic picture
equator). The coordinates {x, y} cannot be used near the point A very important idea is to imagine that there are flat ob-
{x, y, z} = {0, 1, 0} because, e.g., a part of the straight line servers living entirely within the surface M. These observers
segment {1 < x < 1, y = 0.99} in R2 that maps onto S 2 cannot see the larger space and describe the surface M ex-
becomes a small circle in S 2 . So the coordinates {x, y} do not clusively through the relationships between the points of M,
provide a one-to-one map from any subset of R2 into S 2 near without referring to any embedding into a larger space. Such
the point {0, 1, 0}. However, {x, z} can be used as a local co- a description is called intrinsic. The main fact is that the in-
ordinate system near {0, 1, 0}. trinsic description is sufficient for all purposes relevant to the
As I attempted to illustrate in Fig. 1.2, the relationship be- internal geometry of the manifold. One may certainly intro-
tween the local coordinates {x1 , x2 } and the Cartesian coor- duce an embedding of M into a larger space3 either as a
dinates {x, y, z} may be complicated. However, in some cases way to describe a particular manifold M more concisely or
one can simply select a subset of the Cartesian coordinates (for as a guide for the intuition, but the final results must be in-
instance, {x, y}) and obtain a local coordinate system covering dependent of any such embedding. In any case, it turns out
some part of M. The only requirement on the local coordi- that the intrinsic description of the geometry of M is more ele-
nates is that they should provide a one-to-one correspondence gant than a description in terms of an embedding into a larger
between a patch U M and a patch V R2 . For instance, space. Another argument in favor of the intrinsic description
any straight line segment situated within $V$ must correspond to a non-self-intersecting line segment in $M$.

It is important to note that there may be no global coordinate system covering the entire manifold $M$. This is the case, for instance, if the manifold $M$ is a closed surface, such as a sphere (see Statement 1.2.2.1 below), or a donut-shaped surface (a torus). If a global coordinate system exists, $M$ must be an infinitely extended (perhaps curved) plane-like surface that never closes on itself, that is, a surface which is topologically the same as $\mathbb{R}^2$.

It is easy to see that the global geometry of the sphere $S^2$ is different from that of $\mathbb{R}^2$, even though small patches of $S^2$ look like small patches of $\mathbb{R}^2$.

Statement 1.2.2.1: There exists no global coordinate system (chart) in $S^2$ that would map $S^2$ smoothly and one-to-one into $\mathbb{R}^2$. (Proof on page 167.) ■

The often used spherical coordinates $\{\theta, \phi\}$, defined by
$$x = r\cos\theta\cos\phi, \quad y = r\cos\theta\sin\phi, \quad z = r\sin\theta,$$
do not constitute a global coordinate system on $S^2$ because the mapping $S^2 \to \{\theta, \phi\}$ is not one-to-one at the poles of

Footnote 3: It is shown in differential geometry (Whitney's embedding theorem [37]) that any manifold can be embedded in a sufficiently high-dimensional Euclidean space.

is that according to General Relativity, we are observers living in a four-dimensional curved manifold (the spacetime), and we do not directly see any higher-dimensional space in which our spacetime is embedded as a hypersurface. Therefore, we need a way to describe the properties of our spacetime in terms of intrinsic measurements, i.e. measurements performed entirely within the spacetime.

An example of an intrinsic property is smoothness. A function is smooth (of class $C^\infty$) if it is infinitely many times differentiable. In the embedding picture, one can visualize a smooth $n$-dimensional manifold $M$ as an $n$-dimensional hypersurface smoothly embedded into an $N$-dimensional Euclidean space with coordinates $\{X^a\}$, $a = 1, ..., N$. Using a fixed smooth embedding of a manifold $M$ into $\mathbb{R}^N$, one could easily define smoothness of functions and curves on $M$. These definitions may look like this. A smooth scalar function is a map $f : M \to \mathbb{R}$ which is a smooth function of the coordinates $\{X^a\}$. A smooth curve $\gamma(\tau)$ on a manifold $M$ is a


map $\gamma : \mathbb{R} \to M$ such that the curve coordinates $X^a(\tau)$ in the embedding space are smooth functions of $\tau$. However, these definitions depend on the embedding of $M$ into $\mathbb{R}^N$. This dependence can be avoided and equivalent definitions can be given intrinsically, using local charts on $M$ that map patches of $M$ into patches of $\mathbb{R}^n$. Here is how one can define a smooth scalar function. Within one chart, a function $f : M \to \mathbb{R}$ becomes a function $\mathbb{R}^n \to \mathbb{R}$, defined on a patch of $\mathbb{R}^n$. Thus, smoothness of $f$ can be defined as smoothness of each restriction of $f$ to patches of $\mathbb{R}^n$. In brief, a function $f$ is smooth in $M$ if it is smooth in every chart. Similarly, a curve $\gamma : \mathbb{R} \to M$ is smooth if its restriction $\gamma : \mathbb{R} \to \mathbb{R}^n$ to each chart is smooth.

If we define smoothness of functions through their restrictions to charts, we need to consider what happens within an intersection of two different charts. Could it happen that a function $f$ is smooth according to one chart but not smooth according to the other chart? Let $U_1, U_2 \subset M$ be two different chart domains and $\phi_1 : U_1 \to \mathbb{R}^n$, $\phi_2 : U_2 \to \mathbb{R}^n$ the corresponding local coordinate maps. Since $\phi_1$ and $\phi_2$ are one-to-one maps, we may define a map $\phi_1 \circ \phi_2^{-1} : \mathbb{R}^n \to \mathbb{R}^n$. (This map is defined in a small patch of $\mathbb{R}^n$ which is the image of the intersection of $U_1$ and $U_2$ under $\phi_2$.) We may call this map a coordinate change map. If every coordinate change map (for every pair of intersecting charts) is a smooth map $\mathbb{R}^n \to \mathbb{R}^n$ in the usual sense, i.e. defined by smooth functions in the standard coordinates $\{x^1, ..., x^n\}$, then all charts will agree about the smoothness or otherwise of a function $f : M \to \mathbb{R}$ or a curve $\gamma : \mathbb{R} \to M$. Therefore, one adds to the definition of a smooth manifold the requirement that every coordinate change map $\phi_1 \circ \phi_2^{-1}$ is smooth, for every pair of intersecting charts. In this way, smoothness of the manifold itself is adequately described by the requirement that the coordinate change maps be smooth.

If there exists a smooth one-to-one map between two entire manifolds, these manifolds are called diffeomorphic. For example, the surface $z = x^2 + y^2$ is diffeomorphic to $\mathbb{R}^2$ because there is a one-to-one map between points $\{x, y, z\}$ on the surface and the plane $\{x, y\}$. Thus, just one chart (with $\{x, y\}$ as the coordinates) is sufficient to map this manifold into $\mathbb{R}^2$. On the other hand, a 2-sphere is not diffeomorphic to $\mathbb{R}^2$ (Statement 1.2.2.1).

1.2.4 Tangent spaces

Vectors are used in ordinary physics to specify directed magnitudes, such as velocities and forces. Therefore, it is important to develop the concept of a "directed magnitude" in a curved space.

To aid the intuition, let us visualize a manifold $M$ as a surface embedded in a larger space $\mathbb{R}^3$. An immediate example of a "directed magnitude" comes from considering the velocity of a moving point. Hence, let us imagine a point that moves along a curve $\gamma(\tau)$ within a manifold, where $\gamma$ is a map $\mathbb{R} \to M$. It is clear that the velocity of the moving point is a vector (in the larger space) that is always tangent to the surface $M$. Due to this picture, directed magnitudes defined within a manifold are sometimes called tangent vectors. (Most of the time, we will call them simply vectors.)

At every point $p \in M$, there is a tangent plane to the surface. This tangent plane is a vector subspace of $\mathbb{R}^3$. Considered as a two-dimensional vector space, it is denoted $T_p M$ and is called the tangent space to the manifold $M$ at point $p$ (see Fig. 1.3). If the manifold $M$ is an $(n-1)$-dimensional hypersurface embedded into an $n$-dimensional Euclidean space $\mathbb{R}^n$ then each tangent plane is a vector space of dimension $n-1$.

Figure 1.3: Tangent spaces at points $p_1$ and $p_2$ of a manifold, visualized as vector subspaces of $\mathbb{R}^3$.

For instance, consider a hypersurface defined by the equation $f(x) = 0$, where $x \in \mathbb{R}^3$. The normal vector $\mathbf{n}$ at a point $x_0$ has the following components,
$$n_j(x_0) = \left.\frac{\partial f}{\partial x^j}\right|_{x=x_0}.$$
Any vector $\mathbf{t}$ orthogonal to $\mathbf{n}$ is tangent to the hypersurface at the point $x_0$. Hence, the tangent plane at the point $x_0$ is the set of points $x$ in $\mathbb{R}^3$ specified by
$$\mathbf{n} \cdot (x - x_0) \equiv \sum_{j=1}^{3} (x^j - x_0^j)\, n_j(x_0) = 0.$$
The tangent space is made of vectors with components $x^j - x_0^j$ satisfying this equation. This is a two-dimensional vector space that depends on the point $x_0$ (i.e. it is a different space for each $x_0$).

It is important to realize that tangent spaces at different points are not naturally related to each other, even though every tangent space is two-dimensional and looks like $\mathbb{R}^2$. For instance, there is no natural way to add a tangent vector at a point $p_1$ to a tangent vector at another point $p_2$, or to subtract these vectors. This should be especially clear from Fig. 1.3, which shows different tangent planes. There is no obvious way to map vectors from one tangent plane into vectors from the other. We will see such a map later when we define an additional structure on the manifold called a connection. This structure literally provides a connection (i.e. a one-to-one map) between tangent spaces that are infinitesimally close (and thus, in principle, between any two tangent spaces). However, there are infinitely many possible connections on a manifold. If we select a single "preferred" connection, in effect we introduce an additional structure on the manifold.

Remark: If an embedding of $M$ into a Euclidean space is given, there is a way to construct a geometrically "preferred" connection. One imagines that the manifold $M$ and the tangent planes are solid surfaces, and that one can roll the first tangent plane without sliding and without turning around the surface $M$ along a certain path in $M$ until the location of the other plane is reached. This "mechanically motivated" procedure yields a one-to-one map from one tangent plane into another one. The mapping between tangent planes depends


on the path along which we "roll" them, but is unambiguous for infinitesimally close points. (It is only the mapping between infinitesimally close tangent spaces that is necessary to determine a connection.) The resulting connection is called the Levi-Civita connection; it is defined in Sec. 1.6.6 without referring to the mechanical construction just described. Of course, this connection depends on the embedding of $M$ into a Euclidean space. ■

So far we considered tangent vectors using the embedding picture, i.e. by imagining that the manifold $M$ is a hypersurface embedded into a larger space. However, it is much more useful to describe tangent vectors intrinsically, i.e. without such an embedding. We can try to define a tangent vector as the "directed velocity" $d\gamma/d\tau$ of some curve $\gamma(\tau)$ going through $p$. However, we cannot directly differentiate $\gamma(\tau)$ with respect to $\tau$ without an embedding because the expression $\gamma(\tau + \delta\tau) - \gamma(\tau)$ is undefined: we cannot subtract one point from another. To circumvent this problem without introducing coordinates, we focus not on points but on functions of points. (Generally, shifting the attention from an abstract set to functions defined on the set is a powerful idea that has proven useful in many areas of mathematics.)

Let $f : M \to \mathbb{R}$ be a scalar function defined on the manifold $M$. (A scalar function means a function with a numeric value, as opposed to a matrix-valued or a vector-valued function.) Suppose that a curve $\gamma(\tau)$ goes through the point $p_0$ at $\tau = 0$, i.e. that $\gamma(\tau = 0) = p_0$, and consider the directional derivative of the function $f(p)$ along the curve $\gamma$ at the point $p_0$. This directional derivative is defined as the derivative of the function $f(\gamma(\tau))$ with respect to $\tau$,
$$D_\gamma f(p_0) \equiv \left.\frac{d}{d\tau}\right|_{\tau=0} f(\gamma(\tau)).$$
This derivative is interpreted as the instantaneous rate of change of the function $f$ at point $p_0$, as one moves along the curve $\gamma$ passing through $p_0$.

To make the discussion specific, consider the manifold $M = \mathbb{R}^n$. Since points of $M$ are arrays of $n$ coordinates $x^1, ..., x^n$, a curve $\gamma(\tau)$ is specified by $n$ functions $\gamma^j(\tau)$. Then the directional derivative $D_\gamma f$ of a function $f(x^1, ..., x^n)$ at a point $p_0$ is expressed by the formula
$$D_\gamma f(p_0) = \sum_{k=1}^{n} \left.\frac{d\gamma^k}{d\tau}\right|_{\tau=0} \left.\frac{\partial f}{\partial x^k}\right|_{\{x^j\}=p_0}.$$

For our present purposes, it is important to view the directional derivative $D_\gamma$ as a map from functions to numbers. The directional derivative $D_\gamma$ depends not only on the direction of the curve $\gamma$, but also on the speed at which the points $\gamma(\tau)$ are traversed as $\tau$ changes. For instance, the value of $D_\gamma f$ will be multiplied by 2 if we keep the shape of the curve $\gamma(\tau)$ unchanged but replace the parameter $\tau$ by $2\tau$. Also, we note that $D_\gamma f$ depends on the behavior of the curve $\gamma$ in an infinitesimal neighborhood of $p_0$ but does not depend on the behavior of $\gamma$ at other points; there are many curves $\gamma$ that yield the same derivative operation $D_\gamma$ at the point $p_0$. Therefore, the operation $D_\gamma$ carries the information only about the magnitude and the direction of the "instantaneous velocity" along the curve at $p_0$. This is precisely the information we expect to be represented by a tangent vector at the point $p_0$. Thus we are motivated to view the map $D_\gamma$ itself as the tangent vector. In other words, we are going to define the space of the tangent vectors at $p_0$, i.e. the tangent space $T_{p_0} M$, as the vector space consisting of all the possible directional derivatives $D_\gamma$ at $p_0$. This will be an intrinsic definition of the tangent space because this definition does not make use of any embedding of $M$ into a larger space. This definition also does not rely on a choice of a local coordinate system because only the curve $\gamma(\tau)$, a set of points in the manifold, is used to construct the directional derivative.

It might be difficult to visualize the space of all possible directional derivatives. In order to help the intuition and to describe this space of derivatives more explicitly, let us temporarily introduce a local coordinate system $\{x^\mu\}$ around the point $p_0$. If the curve $\gamma(\tau)$ is represented in the coordinates $\{x^\mu\}$ by functions $\gamma^\mu(\tau)$, we can express the directional derivative $D_\gamma f$ of a function $f$ as follows,
$$D_\gamma f(p_0) = \sum_\mu \left.\frac{\partial f}{\partial x^\mu}\right|_{p_0} \left.\frac{d\gamma^\mu}{d\tau}\right|_{\tau=0} = \left[ \sum_\mu \left.\frac{d\gamma^\mu}{d\tau}\right|_{\tau=0} \left.\frac{\partial}{\partial x^\mu}\right|_{p_0} \right] f.$$
It is clear from this formula that the derivative operation $D_\gamma$ is represented by the differential operator
$$D_\gamma \equiv \sum_\mu \left.\frac{d\gamma^\mu}{d\tau}\right|_{\tau=0} \left.\frac{\partial}{\partial x^\mu}\right|_{p_0}$$
acting on functions $f$, where $\partial/\partial x^\mu$ is the partial derivative with respect to one coordinate $x^\mu$. Moreover, we see that all the possible directional derivatives $D_\gamma$ are described by the set of coefficients $\{d\gamma^\mu/d\tau\}$ multiplying a fixed set of differential operators $\{\partial/\partial x^\mu\}$. The coefficients $d\gamma^\mu/d\tau$ depend on the curve $\gamma$, while the operators $\partial/\partial x^\mu$ do not. Therefore, it is natural to regard $\partial/\partial x^\mu$ as the basis in the vector space of directional derivatives and $d\gamma^\mu/d\tau$ as the components of $D_\gamma$ in that basis.

In this way, we have expressed directional derivatives $D_\gamma$ as differential operators in a local coordinate system. An arbitrary directional derivative can be written as $\sum_\mu u^\mu\, \partial/\partial x^\mu$, where $u^\mu$ are some coefficients. Now it is clear that all directional derivatives form a vector space spanned by the basis $\{\partial/\partial x^\mu\}$. This vector space is the same as the tangent space $T_{p_0} M$ at $p_0$ defined above using the embedding picture. This is so because the tangent space $T_{p_0} M$ in the embedding picture consists of all the possible vectors tangent to $M$ at $p_0$, which is the same as the set of velocity vectors of all possible curves $\gamma(\tau)$ at $p_0$, i.e. the set of all possible values of the components $d\gamma^\mu/d\tau$. So we may identify the space of tangent vectors to curves at $p_0$ with the space of directional derivatives at $p_0$. These are two equivalent descriptions of the tangent space $T_{p_0} M$. A tangent vector can be understood both as a direction within the manifold and, at the same time, as a differential operator acting on scalar functions.

The natural basis $\{\partial/\partial x^\mu\}$ in the tangent space is called the coordinate basis corresponding to a chosen coordinate system $\{x^\mu\}$.

I will denote tangent vectors by boldface letters such as $\mathbf{v}$ rather than by $D_\gamma$, since the curve $\gamma$ in $D_\gamma$ plays only an auxiliary role. The action of a tangent vector $\mathbf{v}$ on a function $f$ will be denoted by $\mathbf{v} \circ f$ rather than by $D_\gamma f$. In local coordinates $\{x^\mu\}$, one can express the value of a directional derivative (tangent vector) $\mathbf{v}$ at point $p_0$ applied to a function $f$ as
$$\mathbf{v} \circ f = \sum_\mu v^\mu \left.\frac{\partial f}{\partial x^\mu}\right|_{p_0}, \qquad (1.1)$$
where $v^\mu$ are the components of the vector $\mathbf{v}$ in the basis $\{\partial/\partial x^\mu\}$. Hence, it is convenient to visualize the vector $\mathbf{v}$ as


the derivative operator
$$\mathbf{v} \equiv \sum_\mu v^\mu \frac{\partial}{\partial x^\mu}.$$
The derivative of a function $f$ at a point $p_0$ along a curve $\gamma(\tau)$ can be written as
$$D_\gamma f(p_0) \equiv \dot\gamma|_{p_0} \circ f,$$
where the notation $\dot\gamma|_{p_0}$ means the tangent vector to the curve $\gamma(\tau)$ at point $p_0$.

Remark on notation: The symbol $\circ$ is used in general to denote the application of an operation to its argument(s). For instance, a tangent vector $\mathbf{v}$ is by definition a derivative operation, which can be applied to a function $f$ and yields another function, $\mathbf{v} \circ f$. In this notation, $X \circ Y$ reads "$X$ is applied to $Y$". Below, I also occasionally use the symbol $\circ$ to denote the application of a 1-form to a vector and of multilinear forms to sets of vectors. For instance, a bilinear form $B$ applied to two vectors $\mathbf{x}$ and $\mathbf{y}$ yields a number denoted by $B \circ (\mathbf{x}, \mathbf{y})$.

Tangent vectors to curves $\gamma(\tau)$ are denoted $\dot\gamma$ where this does not cause confusion. We will not use the notation $D_\gamma$ any more. Coordinate basis vectors such as $\partial/\partial x$ or $\partial/\partial y$ are denoted $\partial_x$ and $\partial_y$ for brevity; in that case, the subscript $x$ in $\partial_x$ is not an index but a local coordinate, and we will make it clear in the text. Attention will be called to cases when the traditional index notation and the Einstein summation convention, such as $v^\mu \partial_\mu$, are used. These cases will not be common in the text. Usually, summation over indices will be written out explicitly. ■

If a vector $\mathbf{v}$ is given, the components $v^\mu$ in any local coordinate system $\{x^\mu\}$ can be found by substituting the $n$ coordinate functions $f = x^1$, $f = x^2$, ..., $f = x^n$ into Eq. (1.1). Thus, the components $v^\mu$ can be found as
$$v^\mu = \mathbf{v} \circ x^\mu.$$
This is a convenient way to compute the components of a vector in a coordinate system, if the vector is already known through another coordinate system or expressed in another way.

Example: Consider a two-dimensional manifold with local coordinates $\{x, y\}$. Suppose that a curve $\gamma(\tau)$ is specified as
$$\gamma(\tau) = \{x_0(\tau), y_0(\tau)\} = \{\cos\tau, 2\sin\tau\}$$
in these coordinates. Let us compute the tangent vector to the curve, $\mathbf{v}(\tau) \equiv \dot\gamma(\tau)$. By definition, the tangent vector $\mathbf{v}$ acts on functions $f(x, y)$ as
$$\mathbf{v} \circ f \equiv \frac{d}{d\tau} f(x_0(\tau), y_0(\tau)).$$
The tangent vectors $\partial_x$ and $\partial_y$ are a basis in the tangent space at any point. The components of $\mathbf{v}$ in the basis $\{\partial_x, \partial_y\}$ are found by using the coordinates $x$ and $y$ instead of $f$ in the above equation. We find
$$v^x = \mathbf{v} \circ x = \frac{d}{d\tau} x_0(\tau) = -\sin\tau,$$
$$v^y = \mathbf{v} \circ y = \frac{d}{d\tau} y_0(\tau) = 2\cos\tau.$$
Thus, $\mathbf{v} = -(\sin\tau)\,\partial_x + 2(\cos\tau)\,\partial_y$. By inspection, we can express $\mathbf{v}$ directly through $x$ and $y$ as
$$\mathbf{v} = -\frac{1}{2} y\,\partial_x + 2x\,\partial_y.$$
Using this formula, we can define $\mathbf{v}$ as a vector field everywhere (not only on the curve $\gamma$). However, there are infinitely many different vector fields $\mathbf{v}$ that coincide with $\dot\gamma$ on the curve $\gamma$; the vector field $\mathbf{v}$ shown above is just one such field that was, by a coincidence, easy to write down in the present case.

Let us also compute the components of the vector field $\mathbf{v}$ in a different coordinate system. For example, consider a new local coordinate system $\{a, b\}$ defined by
$$a = x + y, \quad b = e^{x-y}.$$
The inverse functions are
$$x = \frac{1}{2}(a + \ln b), \quad y = \frac{1}{2}(a - \ln b).$$
The components of $\mathbf{v}$ are found as
$$v^a = \mathbf{v} \circ a = \left(-\frac{1}{2} y\,\partial_x + 2x\,\partial_y\right) \circ (x + y) = -\frac{1}{2} y + 2x,$$
$$v^b = \mathbf{v} \circ b = \left(-\frac{1}{2} y\,\partial_x + 2x\,\partial_y\right) \circ e^{x-y} = \left(-\frac{1}{2} y - 2x\right) e^{x-y}.$$
Substituting $x, y$ through $a, b$, we can express the vector $\mathbf{v}$ in the new coordinate system as
$$\mathbf{v} = \left(\frac{3}{4} a + \frac{5}{4} \ln b\right) \partial_a - \left(\frac{5}{4} a + \frac{3}{4} \ln b\right) b\,\partial_b.$$
■

Practice problem: (a) Determine the components of the vector $\mathbf{v} = x\,\partial_x + y\,\partial_y$ in the coordinate system $\{a, b\}$, where $a = x + y$, $b = 3xy$.
(b) Given the vector $\mathbf{u} = \partial_x + x\,\partial_y$, find a coordinate system $\{X, Y\}$ (where $X$ and $Y$ are functions of $x, y$) such that $\mathbf{u} = \partial_X$.
Answers: (a) $\mathbf{v} = a\,\partial_a + 2b\,\partial_b$. (b) One suitable coordinate system is $X = x$, $Y = x^2 - 2y$, and there are infinitely many others. ■

In most calculations involving directional derivatives, we will not actually need to introduce local coordinates $\{x^\mu\}$ and the coordinate basis $\{\partial/\partial x^\mu\}$. Instead, we will use the following properties of vectors, which follow immediately from the fact that vectors are directional derivatives at a point:
$$\mathbf{v} \circ (f + g) = \mathbf{v} \circ f + \mathbf{v} \circ g, \quad \text{[linearity]} \qquad (1.2)$$
$$\mathbf{v} \circ (f g) = (\mathbf{v} \circ f)\, g + f\, \mathbf{v} \circ g, \quad \text{[Leibnitz rule]} \qquad (1.3)$$
$$\mathbf{v} \circ f = 0 \ \text{ if } f = \text{const around } p_0. \qquad (1.4)$$
For example, one can prove that the chain rule follows from the properties (1.2)-(1.4).

Statement 1.2.4.1: If $f$ is an analytic function $f : \mathbb{R} \to \mathbb{R}$; $g$ is a smooth function $M \to \mathbb{R}$; and $\mathbf{v}$ is a derivation, then
$$\mathbf{v} \circ (f(g)) = \frac{df(g)}{dg}\, \mathbf{v} \circ g.$$
(Proof on page 167.) ■
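The rule $v^\mu = \mathbf{v} \circ x^\mu$ used in the Example of this section can be checked numerically. The following is a minimal sketch in plain Python, not part of the original notes: the helper names (`directional_derivative`, `components_in_ab`, `closed_form_in_ab`), the step size `h`, and the sample point are illustrative assumptions. It applies $\mathbf{v} = -\frac{1}{2} y\,\partial_x + 2x\,\partial_y$ to the coordinate functions $a = x + y$ and $b = e^{x-y}$ by central differences and compares the result with the closed-form components in $\{a, b\}$.

```python
import math

def directional_derivative(v, f, point, h=1e-5):
    """Apply v = v^x d/dx + v^y d/dy to a scalar function f at a point.

    Partial derivatives are approximated by central differences (step h is
    an assumed, illustrative value)."""
    x, y = point
    vx, vy = v(x, y)
    df_dx = (f(x + h, y) - f(x - h, y)) / (2 * h)
    df_dy = (f(x, y + h) - f(x, y - h)) / (2 * h)
    return vx * df_dx + vy * df_dy

# Vector field from the Example: v = -(1/2) y d/dx + 2 x d/dy.
def v(x, y):
    return (-0.5 * y, 2.0 * x)

# New coordinates from the Example, viewed as scalar functions of (x, y).
def a(x, y):
    return x + y

def b(x, y):
    return math.exp(x - y)

def components_in_ab(point):
    """Components (v^a, v^b), obtained by applying v to a and to b."""
    return (directional_derivative(v, a, point),
            directional_derivative(v, b, point))

def closed_form_in_ab(point):
    """Closed-form components in {a, b}, evaluated at the same point."""
    x, y = point
    A, B = a(x, y), b(x, y)
    va = 0.75 * A + 1.25 * math.log(B)
    vb = -(1.25 * A + 0.75 * math.log(B)) * B
    return (va, vb)

point = (0.3, -0.7)  # arbitrary sample point
numeric = components_in_ab(point)
exact = closed_form_in_ab(point)
print(numeric, exact)
```

The two tuples should agree up to the finite-difference error, which illustrates that the components in a new coordinate system follow from the action of the vector on the new coordinate functions alone.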


1.2.5 Tangent vectors as short curve segments

An intuitive picture of an ordinary vector is that of a piece of a straight line with an arrow at one end. In a curved space, there are in general no naturally defined "straight lines," but a sufficiently short line segment can be considered "almost straight." So one can visualize a tangent vector $\mathbf{v} \in T_p M$ heuristically as a short line segment going from a point $p$ to a neighbor point $p'$. Let us now explore this heuristic picture, since it provides an important intuitive link between the ordinary geometric notions in flat space and the corresponding notions in the curved space calculus.

To build a correspondence between tangent vectors and line segments, consider a curve $\gamma(\tau)$ that starts at $p$ and goes through $p'$. Let us assume that $p$ corresponds to $\tau = \tau_0$, that is, $p = \gamma(\tau_0)$, and likewise $p' = \gamma(\tau_0 + \delta\tau)$, where $\delta\tau$ is heuristically a "very small number." (In other words, we will take the limit $\delta\tau \to 0$ at the end.) Now we would like to define a tangent vector $\mathbf{v}$ that corresponds to the short segment of the curve $\gamma$ between the points $p$ and $p'$.

A tangent vector is, by definition, a derivative along a curve (the derivative is applied to scalar functions). So it is natural to consider the derivative of an arbitrary function $f$ along the curve $\gamma$ at point $p$,
$$\dot\gamma \circ f \equiv \lim_{\varepsilon \to 0} \frac{f(\gamma(\tau_0 + \varepsilon)) - f(\gamma(\tau_0))}{\varepsilon}.$$
While finding the exact value of $\dot\gamma \circ f$ requires taking the limit $\varepsilon \to 0$, we will compute $\dot\gamma \circ f$ sufficiently precisely if we use the finite value $\varepsilon = \delta\tau$, provided that $\delta\tau$ is sufficiently small. Then we obtain the approximate expression
$$\delta\tau\, \dot\gamma \circ f \approx f(p') - f(p).$$
The error of this approximation is of order $\delta\tau^2$.

Thus it is natural to identify the tangent vector $\delta\tau\, \dot\gamma$ as a representation of the short curve segment between the points $p$ and $p'$ (see Fig. 1.4). Let us temporarily denote this vector by $\mathbf{v} \equiv \delta\tau\, \dot\gamma$. The vector $\mathbf{v}$ does not depend (up to terms of order $\delta\tau^2$) on the choice of the parameter $\tau$ or the curve $\gamma$.

Figure 1.4: A short curve segment between points $p$ and $p'$ can be approximated by a tangent vector $\mathbf{v}$.

Given two sufficiently close points $p$ and $p'$, how can we determine the tangent vector $\mathbf{v}$ that represents the "almost straight line" between $p$ and $p'$? We can choose some curve $\gamma(\tau)$ passing through these points, such that $\gamma(\tau_0) = p$ and $\gamma(\tau_1) = p'$. We can compute the vector $\dot\gamma$ at the point $p$, and then the formula for $\mathbf{v}$ is
$$\mathbf{v} = (\tau_1 - \tau_0)\, \dot\gamma(\tau_0) \in T_p M.$$
The tangent vector $\mathbf{v}$ represents the "arrow" between the points $p$ and $p'$ in the sense of the formula
$$f(p') - f(p) \approx \mathbf{v} \circ f, \qquad (1.5)$$
which is approximately valid for any smooth function $f$ up to second-order corrections.

1.2.6 *Tangent space as space of derivations

In this section we will see that the properties (1.2)-(1.4) are a minimal necessary set of properties describing all the possible directional derivatives. Therefore, one may forget that $D_\gamma$ was defined using some curve $\gamma(\tau)$, drop the subscript $\gamma$, call any map $D$ from functions to numbers satisfying the above properties a derivation at $p_0$, and define the tangent space $T_{p_0} M$ as the set of all possible derivations. This somewhat abstract definition of $T_p M$ is more elegant from the purely mathematical point of view but entirely equivalent to the earlier definitions, because every derivation $D$ satisfying the properties (1.2)-(1.4) is a directional derivative $D_\gamma$ along some curve $\gamma$. In other words, the only way to "differentiate a function" is to take a derivative along a curve in some direction.

To prove these statements, let us perform explicit calculations in a local coordinate system $\{x^\mu\}$. If $D$ is a derivation satisfying Eqs. (1.2)-(1.4) and $f(p)$ is a smooth function, then we can apply the chain rule (see Statement 1.2.4.1) and compute
$$D \circ f(p) = \sum_\mu \frac{\partial f}{\partial x^\mu}\, D \circ x^\mu(p),$$
where $D \circ x^\mu$ is the application of the derivation $D$ to the scalar coordinate function $x^\mu(p)$. Hence,
$$D \circ f(p) = \left[ \sum_\mu (D \circ x^\mu) \frac{\partial}{\partial x^\mu} \right] f(p) \equiv \sum_\mu u^\mu \frac{\partial f}{\partial x^\mu},$$
where $u^\mu$ are coefficients defined by $u^\mu \equiv D \circ x^\mu$. It is always possible to find a curve $\gamma(\tau)$ with the derivatives $d\gamma^\mu/d\tau = u^\mu$ at the point $p$. Thus any derivation $D$ is equivalent to a tangent vector in the previous sense.

1.2.7 Vector fields and flows

A vector field on a manifold $M$ is obtained by choosing a tangent vector $\mathbf{v}|_p$ at each point $p \in M$. Most of the time, we will only need a vector field defined in a neighborhood of some point, rather than on the entire manifold $M$.

Another way to visualize a vector field is to imagine that the manifold is filled by nonintersecting curves going through all its points (or at least through every point in some neighborhood); see Fig. 1.5. Such a collection of curves is called a congruence of curves or a flow of a vector field. If a congruence of curves is given, derivatives along curves at various points $p$ will determine tangent vectors $\mathbf{v}|_p$ at all points and thus define a vector field $\mathbf{v}$. The curves in the congruence are also called the flow lines of the vector field $\mathbf{v}$. In our notation, a curve $\gamma$ is the flow line of a vector field $\mathbf{v}$ iff $\dot\gamma(\tau) = \mathbf{v}|_{\gamma(\tau)}$ at every point $p \equiv \gamma(\tau)$ along the curve.

Flow lines of a nonzero, smooth vector field always exist and are unique, due to a well-known theorem of calculus. To see this, consider a local coordinate system $\{x^\mu\}$ where a vector field $\mathbf{v}$ is specified by its components $v^\mu$ as $\mathbf{v} = \sum_\mu v^\mu \mathbf{e}_\mu$ (the components $v^\mu$ can be found as $v^\mu = \mathbf{v} \circ x^\mu$). A curve $\gamma(\tau)$ is specified by $n$ coordinate functions $\gamma^\mu(\tau)$. The condition $\dot\gamma(\tau) = \mathbf{v}|_{\gamma(\tau)}$ translates into a system of differential equations,
$$\frac{d\gamma^\mu(\tau)}{d\tau} = v^\mu|_{p=\gamma(\tau)}.$$
We may choose the initial condition as $\gamma^\mu(0) = x^\mu_{(0)}$, where $x^\mu_{(0)}$ are the local coordinates of an initial point of the curve.
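The system of differential equations for a flow line can be integrated numerically. The following is a minimal sketch, not part of the original notes: it uses the classical fourth-order Runge-Kutta method in plain Python, and the function names (`flow_line`, `v`), the step count, and the choice of sample field are illustrative assumptions. The field is $\mathbf{v} = -\frac{1}{2} y\,\partial_x + 2x\,\partial_y$ from the Example in Sec. 1.2.4, whose flow line through the point $(1, 0)$ is the curve $\{\cos\tau, 2\sin\tau\}$.

```python
import math

def v(point):
    """Components of the sample field v = -(1/2) y d/dx + 2 x d/dy."""
    x, y = point
    return (-0.5 * y, 2.0 * x)

def flow_line(field, start, tau_max, steps):
    """Integrate d(gamma^mu)/d(tau) = v^mu(gamma) from tau = 0 to tau_max.

    Uses the classical RK4 scheme with a fixed step; returns the end point."""
    h = tau_max / steps
    gamma = tuple(start)
    for _ in range(steps):
        k1 = field(gamma)
        k2 = field(tuple(g + 0.5 * h * k for g, k in zip(gamma, k1)))
        k3 = field(tuple(g + 0.5 * h * k for g, k in zip(gamma, k2)))
        k4 = field(tuple(g + h * k for g, k in zip(gamma, k3)))
        gamma = tuple(
            g + h * (p + 2 * q + 2 * r + s) / 6
            for g, p, q, r, s in zip(gamma, k1, k2, k3, k4)
        )
    return gamma

tau = 1.0
end = flow_line(v, (1.0, 0.0), tau, 1000)
expected = (math.cos(tau), 2 * math.sin(tau))  # exact flow line through (1, 0)
print(end, expected)
```

Changing the starting point moves the computation to a different flow line of the same congruence; for this particular field all flow lines are ellipses $x^2 + y^2/4 = \text{const}$.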


Since the vector field $\mathbf{v}$ is smooth, such a system always has a unique solution $\gamma^\mu(\tau)$, which represents the required curve $\gamma(\tau)$ in the local coordinate system. (Of course, this system of equations may be complicated and so it is not always possible to obtain an explicit formula for the solution $\gamma^\mu(\tau)$. In any case, an approximate solution can be found numerically.) Therefore, the flow lines of a nonzero, smooth vector field always form a congruence that (at least locally) fills space.

Figure 1.5: A vector field $\mathbf{v}$ tangent to a congruence of curves, which are the flow lines of $\mathbf{v}$. For each point $p$, the vector $\mathbf{v}(p)$ belongs to the tangent space $T_p M$ at $p$.

Example: Given a vector field $\mathbf{v}$, consider the flow lines of $\mathbf{v}$ going through each point of some neighborhood. Let $\tau$ be the parameter along each of the flow lines $\gamma(\tau)$, such that $\dot\gamma = \mathbf{v}$ everywhere. This defines $\tau$ as a scalar function on the manifold. We can compute the scalar function $\mathbf{v} \circ \tau$. Along every flow line, we have
$$\mathbf{v} \circ \tau = \frac{d\tau}{d\tau} = 1,$$
thus $\mathbf{v} \circ \tau = 1$ everywhere. ■

If $\{x^\mu\}$ are local coordinates around a point $p_0$, we can consider a congruence of curves of varying $x^0$ and constant $x^1$, $x^2$, ... The tangent vector field to this congruence is $\partial/\partial x^0$. This congruence consists of "curved coordinate axes" of the coordinate $x^0$. Similarly, the other coordinate basis vectors $\partial/\partial x^1$, $\partial/\partial x^2$, etc., can be seen as tangent vectors to a congruence of curves made by other "coordinate axes."

1.2.8 *Tangent bundle

The set of all the tangent spaces (one tangent space $T_p M$ for each point $p \in M$) is denoted $TM$ and called the tangent bundle of the manifold $M$. The tangent bundle is itself a manifold with dimension $2n$ if $M$ has dimension $n$. This manifold is perhaps not easy to visualize as a whole, but we will not need to do anything complicated with the tangent bundle $TM$ as a manifold. Let us only verify that $TM$ has the structure of a smooth manifold.

By assumption, $M$ is a smooth manifold and, as such, can be covered by a set of charts. Consider one such chart that maps a subset $U$ of $M$ into a subset $V$ of $\mathbb{R}^n$. Tangent vectors at points $p \in U$ are mapped into tangent vectors in $V$, which are simply vectors in $\mathbb{R}^n$ since $V$ is a portion of the flat Euclidean space $\mathbb{R}^n$. Therefore, the part of $TM$ consisting of tangent planes to points $p \in U$ is one-to-one mapped into $V \times \mathbb{R}^n$. Since the set $V \times \mathbb{R}^n$ is a subset of $\mathbb{R}^n \times \mathbb{R}^n = \mathbb{R}^{2n}$, we have established that $TM$ can be covered by patches, and one can find charts that map these patches into $\mathbb{R}^{2n}$. In other words, the tangent bundle $TM$ is a smooth $2n$-dimensional manifold.

Remark: It is important to realize that the tangent bundle, as a whole manifold, is not necessarily equivalent (i.e. diffeomorphic) to the direct product $M \times \mathbb{R}^n$, even though every small patch of $TM$ is diffeomorphic to a patch of $M \times \mathbb{R}^n$. The reason is the same as that for the manifold $M$ not being diffeomorphic to $\mathbb{R}^n$, even though small patches of $M$ are diffeomorphic to patches of $\mathbb{R}^n$. Namely, the patches may be glued together in a nontrivial way such that the global geometry of $M$ is different from that of $\mathbb{R}^n$. A detailed consideration would lead us too far into differential topology, so I will give only a qualitative example. Suppose a diffeomorphism $\phi : M \times \mathbb{R}^n \to TM$ exists, such that every vector space $p \times \mathbb{R}^n$, where $p \in M$, is mapped one-to-one onto the vector space $T_p M$ via an invertible linear transformation. Then for every $p \in M$ one may apply $\phi$ to $p \times \{1, 0, 0, ..., 0\} \in M \times \mathbb{R}^n$ and obtain a nonzero vector $\mathbf{e}_1(p) \in T_p M$ at every point $p$. However, such a vector field $\mathbf{e}_1$ sometimes cannot exist; a typical example is the 2-sphere $S^2$. It is known that there exists no everywhere nonvanishing and smooth tangent vector field on a 2-sphere. This property is called the "impossibility of combing a sphere" (see Statement 6.3.2.1). So it is impossible to choose even a single nonzero vector field $\mathbf{e}_1$ that smoothly varies throughout the sphere. Thus, a diffeomorphism $\phi : S^2 \times \mathbb{R}^2 \to TS^2$ does not exist. A tangent bundle which is diffeomorphic to $M \times \mathbb{R}^n$ is called trivial. It follows that the 2-sphere $S^2$ has a nontrivial tangent bundle. ■

1.2.9 Tensor fields

A covector is a linear map from vectors $\mathbf{v}$ into numbers. I assume that you know this standard construction from linear algebra, and I will only summarize the concepts we will need. In the index notation, covectors are denoted by letters with a subscript index, $\omega_\mu$. In our notation, a covector $\omega$ applied to a vector $\mathbf{v}$ gives a number $\omega \circ \mathbf{v}$ (read "$\omega$ applied to $\mathbf{v}$"); sometimes we also write $\omega(\mathbf{v})$ for the same thing. (In the index notation, this is $\omega_\mu v^\mu$.)

Covectors themselves naturally form a vector space called the dual space. The dual space to the tangent space $T_p M$ is denoted by $T_p^* M$ and consists of covectors acting on tangent vectors at point $p$. The space $T_p^* M$ is also called the cotangent space at point $p$.

A covector field, also called a 1-form, is a choice of a covector $\omega|_p$ from the dual space (cotangent space) $T_p^* M$ at each point $p$. A 1-form $\omega$ can be seen as a linear map from vector fields to functions on $M$, by acting pointwise on a vector field $\mathbf{v}$ and producing a scalar function $f$,
$$\omega : \mathbf{v} \mapsto f; \quad \omega|_p \circ \mathbf{v}|_p \equiv f(p).$$

An example of a 1-form is the mapping of a vector $\mathbf{v}$ into the derivative of a fixed function $f$ in the direction $\mathbf{v}$, that is, the map $\mathbf{v} \mapsto \mathbf{v} \circ f$ for a fixed $f$. Clearly, this is a linear map and hence, by definition, it is a 1-form. This 1-form is called the gradient of the function $f$ and is denoted $df$. Thus, for a fixed function $f$ we define the 1-form $df$ by its action on a vector field $\mathbf{v}$ as follows,
$$(df) \circ \mathbf{v} \equiv \mathbf{v} \circ f.$$

Let us recall that tangent vectors can be understood as an approximate representation of short curve segments (see


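Sec. 1.2.5). If a vector $\mathbf{v}$ represents a short curve segment between points $p$ and $p'$ then the 1-form $df$ acting on $\mathbf{v}$ yields approximately $f(p') - f(p)$, according to Eq. (1.5).

Example: Consider a two-dimensional manifold with a local coordinate system $\{x, y\}$. The gradient of the coordinate function $x$ is the 1-form $dx$. By definition, this 1-form acts on vector fields $\mathbf{u}$ as $(dx) \circ \mathbf{u} = \mathbf{u} \circ x$. For instance, if $\mathbf{u} = \partial_x$ then obviously $\mathbf{u} \circ x = 1$. In this notation, we obtain the equation
$$(dx) \circ \mathbf{u} = (dx) \circ \partial_x = 1,$$
which may look unusual or even confusing at first glance. However, it is only the notation that is unusual; the results are always consistent.

Consider another vector field $\mathbf{v} = y^2 \partial_x$ and a function $f(x, y) = x^2 y^3$. The field $\mathbf{v}$ acts on $f$ and produces the function
$$\mathbf{v} \circ f = y^2 \frac{\partial}{\partial x} f(x, y) = 2xy^5.$$
The 1-form $df$ can be computed using standard rules of calculus as
$$df = 2xy^3\, dx + 3x^2 y^2\, dy.$$
However, note that now $df$, $dx$, and $dy$ are well-defined objects (1-forms) rather than heuristic "infinitesimals."

The action of the 1-form $df$ on the vector $\mathbf{v}$ is, by definition,
$$(df) \circ \mathbf{v} \equiv \mathbf{v} \circ f = 2xy^5.$$
Let us see how this result can be obtained using the explicit expressions for $df$ and $\mathbf{v}$. We have
$$(df) \circ \mathbf{v} = \left(2xy^3\, dx + 3x^2 y^2\, dy\right) \circ \left(y^2 \partial_x\right) = 2xy^5 \left(dx \circ \partial_x\right) + 3x^2 y^4 \left(dy \circ \partial_x\right).$$
By definition, $dx \circ \partial_x = 1$ and $dy \circ \partial_x = 0$, therefore $(df) \circ \mathbf{v} = 2xy^5$ as before. ■

Practice problem: (a) Given the vector $\mathbf{v} = \partial_x + x\,\partial_y$, find all 1-forms $\omega = a(x, y)\,dx + b(x, y)\,dy$ such that $\omega \circ \mathbf{v} = 0$.
(b) Given the 1-form $\omega = x\,dy + y\,dx$, find all vectors $\mathbf{v} = a(x, y)\,\partial_x + b(x, y)\,\partial_y$ such that $\omega \circ \mathbf{v} = 0$. ■

Similarly to vector fields, one can define tensor fields of arbitrary rank. For example, if a linear transformation $L|_p$ is defined in each tangent space $T_p M$, such that $L|_p \mathbf{v}|_p$ is another vector from $T_p M$, then we have a tensor field $L$ on the manifold $M$. Also, one defines $n$-forms as totally antisymmetric tensor fields of rank $(0, n)$. (The calculus of $n$-forms is explained in more detail in Sec. 1.4.) The operations of tensor product $L_1 \otimes L_2$, the exterior product $\omega_1 \wedge \omega_2$ of $n$-forms, and tensor contraction (for example, the contraction $\omega \circ \mathbf{v}$ of a 1-form and a vector) are naturally defined on tensor fields. These algebraic operations are performed on tensors in each tangent space $T_p M$ separately.

1.2.10 Commutator of vector fields

Since a vector field $\mathbf{v}$ can be viewed as a (first-order) differential operator acting on functions as $\mathbf{v} : f \mapsto \mathbf{v} \circ f$, one may consider the commutator of two such operators,
$$[\mathbf{u}, \mathbf{v}] \circ f \equiv \mathbf{u} \circ (\mathbf{v} \circ f) - \mathbf{v} \circ (\mathbf{u} \circ f). \qquad (1.6)$$
It turns out (see Statement 1.2.10.1) that the result is again a derivation, i.e. a first-order differential operator acting on functions, even though it may appear at first glance that $[\mathbf{u}, \mathbf{v}] \circ f$ involves second-order derivatives of $f$. For this reason, the operation $[\mathbf{u}, \mathbf{v}] \circ f$ is equivalent to some vector field acting on $f$. This vector field is denoted by $[\mathbf{u}, \mathbf{v}]$ and is called the commutator of the fields $\mathbf{u}$ and $\mathbf{v}$.

Statement 1.2.10.1: (a) It follows from the definition (1.6) and Eqs. (1.2)-(1.3) that the commutator $[\mathbf{u}, \mathbf{v}]$ satisfies the defining properties (1.2)-(1.4) of a derivation. Hence, $[\mathbf{u}, \mathbf{v}]$ is itself a vector field. (b) The components $c^\mu$ of the commutator $[\mathbf{u}, \mathbf{v}]$ in terms of the components $u^\mu$ and $v^\mu$ in a local coordinate system $\{x^\mu\}$ with a coordinate basis $\{\mathbf{e}_\mu\}$,
$$\mathbf{u} = \sum_\mu u^\mu \mathbf{e}_\mu, \quad \mathbf{v} = \sum_\mu v^\mu \mathbf{e}_\mu, \quad [\mathbf{u}, \mathbf{v}] \equiv \sum_\mu c^\mu \mathbf{e}_\mu,$$
are given by the expression
$$([\mathbf{u}, \mathbf{v}])^\mu \equiv c^\mu = \sum_\nu \left( u^\nu \frac{\partial v^\mu}{\partial x^\nu} - v^\nu \frac{\partial u^\mu}{\partial x^\nu} \right). \qquad (1.7)$$
(c) If $\mathbf{e}_\mu \equiv \partial/\partial x^\mu$ are the coordinate basis vector fields defined through a local coordinate system $\{x^\mu\}$ then $[\mathbf{e}_\mu, \mathbf{e}_\nu] = 0$.

Proof of Statement 1.2.10.1: (a) The properties (1.2) and (1.4) are obvious; the nontrivial one is the Leibnitz rule (1.3). To verify the Leibnitz rule, we compute $[\mathbf{u}, \mathbf{v}] \circ (fg)$, where $f$ and $g$ are smooth functions:
$$\begin{aligned}
[\mathbf{u}, \mathbf{v}] \circ (fg) &= \mathbf{u} \circ (g\, \mathbf{v} \circ f + f\, \mathbf{v} \circ g) - \mathbf{v} \circ (g\, \mathbf{u} \circ f + f\, \mathbf{u} \circ g) \\
&= g\, \mathbf{u} \circ (\mathbf{v} \circ f) + f\, \mathbf{u} \circ (\mathbf{v} \circ g) - g\, \mathbf{v} \circ (\mathbf{u} \circ f) - f\, \mathbf{v} \circ (\mathbf{u} \circ g) \\
&\quad + (\mathbf{u} \circ g)(\mathbf{v} \circ f) + (\mathbf{u} \circ f)(\mathbf{v} \circ g) - (\mathbf{v} \circ g)(\mathbf{u} \circ f) - (\mathbf{v} \circ f)(\mathbf{u} \circ g) \\
&= ([\mathbf{u}, \mathbf{v}] \circ f)\, g + f\, [\mathbf{u}, \mathbf{v}] \circ g.
\end{aligned}$$
(b) A straightforward calculation using the representation of the basis fields $\mathbf{e}_\mu$ as differential operators yields
$$\begin{aligned}
[\mathbf{u}, \mathbf{v}] &= \sum_\mu u^\mu \frac{\partial}{\partial x^\mu} \left( \sum_\nu v^\nu \frac{\partial}{\partial x^\nu} \right) - \sum_\mu v^\mu \frac{\partial}{\partial x^\mu} \left( \sum_\nu u^\nu \frac{\partial}{\partial x^\nu} \right) \\
&= \sum_{\mu,\nu} u^\mu \frac{\partial v^\nu}{\partial x^\mu} \frac{\partial}{\partial x^\nu} + \sum_{\mu,\nu} u^\mu v^\nu \frac{\partial}{\partial x^\mu} \frac{\partial}{\partial x^\nu} \\
&\quad - \sum_{\mu,\nu} v^\mu \frac{\partial u^\nu}{\partial x^\mu} \frac{\partial}{\partial x^\nu} - \sum_{\mu,\nu} v^\mu u^\nu \frac{\partial}{\partial x^\mu} \frac{\partial}{\partial x^\nu} \\
&= \sum_{\mu,\nu} \left( u^\mu \frac{\partial v^\nu}{\partial x^\mu} - v^\mu \frac{\partial u^\nu}{\partial x^\mu} \right) \frac{\partial}{\partial x^\nu} \equiv \sum_\nu c^\nu \frac{\partial}{\partial x^\nu}.
\end{aligned}$$
The second-derivative terms cancel because partial derivatives commute. This shows that $[\mathbf{u}, \mathbf{v}]$ is indeed a first-order differential operator, and also gives an explicit formula for $c^\nu$.
(c) For a smooth function $f(x^1, ..., x^n)$, we have
$$\frac{\partial}{\partial x^\mu} \frac{\partial}{\partial x^\nu} f = \frac{\partial}{\partial x^\nu} \frac{\partial}{\partial x^\mu} f$$
according to a well-known theorem of calculus. Therefore the basis vector fields $\mathbf{e}_\mu$ commute with each other. ■

Remark: Statement 1.2.10.1 presents two ways of proving the fact that $[\mathbf{u}, \mathbf{v}]$ is a first-order differential operation, in parts (a) and (b). The first proof (a) is performed purely algebraically, using the abstract definition of tangent vectors as (unspecified) operations $\mathbf{v} \circ (...)$ with certain properties. The other proof (b) uses a specific representation of tangent vectors as differential operators $\sum_\mu v^\mu\, \partial/\partial x^\mu$ in a local coordinate system $\{x^\mu\}$.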


Both proofs have certain advantages and disadvantages. The proof (a) is very general because it uses only some properties of $v \circ (...)$. So this proof applies not only to directional derivatives but to any operation with the properties (1.2)-(1.3), such as the Lie derivative and the covariant derivative that we will use later. The proof (a) applies equally well to derivatives and to finite difference operators, to operators in infinite-dimensional spaces, and so on. The proof (a) is also more elegant because it is index-free, does not involve unnecessary structures (such as a local coordinate system and a basis), and does not depend on the explicit representation of vectors in coordinates. This is the kind of proof a mathematician would be looking for.

On the other hand, the proof (b) is conceptually easier to understand (especially for beginners) because it consists of a specific computation in a familiar and elementary context, namely, manipulations with partial derivatives of functions. Also, the result of the proof (b) is a directly usable formula for the components of the vector $[u, v]$, while the proof (a) merely shows that $[u, v]$ is a well-defined vector. In the text that follows, I will usually not need explicit formulas for components of vectors, and therefore index-free calculations will be preferable. However, in every case one can translate the final results into components in a local coordinate system.

It is important to understand the geometric interpretation of the commutator. Imagine drawing the flow lines of two vector fields $a$ and $b$ (Fig. 1.6). Let us start from a point $p_0$, follow the flow line of $a$ for a small interval $\delta\tau$ of the parameter $\tau$, and then follow the flow line of $b$ for another interval $\delta\tau$; we will arrive at a point $p$. If we again start from $p_0$ but first follow the lines of $b$ and then the lines of $a$ for the same parameter distances $\delta\tau$, we will arrive, in general, at a different point $p'$. In the limit $\delta\tau \to 0$, the points $p$ and $p'$ will become infinitesimally close to each other and to $p_0$. As shown in Statement 1.2.10.2, the line between $p$ and $p'$ specifies a well-defined vector in the limit $\delta\tau \to 0$, namely the vector $[a, b]\,\delta\tau^2$.

Figure 1.6: The commutator $[a, b]$ of two vector fields measures the discrepancy between the points $p$ and $p'$ obtained by following the flow lines of the vector fields in different order, starting at a point $p_0$, for small intervals $\delta\tau$ of the parameter. The blue and the black arrows are the vector fields $a$ and $b$, while the dotted lines are the flow lines of these vector fields. The point $p$ is obtained by following the line $p_0 \to p_1$ and then the line $p_1 \to p$, while the point $p'$ is obtained by following the line $p_0 \to p_2$ and then $p_2 \to p'$. The small green vector between the points $p$ and $p'$ is equal to $[a, b]\,\delta\tau^2$ in the limit of small distances.

Statement 1.2.10.2: Consider an arbitrary smooth function $f$, vector fields $a$, $b$, and the points $p_0$, $p$, $p'$ in the notation of Fig. 1.6. The commutator $[a, b]$ describes the difference between $f(p)$ and $f(p')$ in the following way,

\[ \lim_{\delta\tau \to 0} \frac{f(p) - f(p')}{\delta\tau^2} = \left. ([a, b] \circ f) \right|_{p_0} . \]

(Proof on page 167.)

Practice problem: Assuming a local coordinate system $\{x, y, z\}$, determine the commutators $[a, b]$, $[a, c]$, and $[b, c]$, where $a = x\partial_x + y\partial_y + z\partial_z$, $b = x\partial_y - y\partial_x$, $c = \partial_z$.

Answers: $[a, b] = 0$, $[a, c] = -\partial_z$, $[b, c] = 0$.

1.2.11 Connecting vectors

If two vector fields $a$ and $b$ are such that $[a, b] = 0$, the vector field $a$ is called a connecting vector for the field $b$. (Since $[a, b] = -[b, a]$, the field $b$ is then also a connecting vector for the field $a$. One also says that the vector fields $a$ and $b$ commute with each other.) The notion of a connecting vector turns out to be very useful in index-free calculations because of the following geometric interpretation.

Figure 1.7: Connecting vectors $c$ (thick arrows) point in the direction of equal parameter $\tau$ along a congruence of flow lines of $v$ (thin lines). Dotted lines are the flow lines of $c$, which are also the lines of equal $\tau$.

Consider a vector field $v$ and a congruence of its flow lines (see Fig. 1.7). Suppose that a subset of the flow lines is labeled by an additional real-valued parameter $s$, so that $\gamma(\tau; s)$ is a


smooth function $\mathbb{R} \times \mathbb{R} \to \mathcal{M}$ and the tangent vector $v$ is expressed as $v = \partial\gamma/\partial\tau$. Using the function $\gamma(\tau; s)$, we may also consider the curves $\sigma(s) \equiv \gamma(\tau_0; s)$ for a fixed value of $\tau = \tau_0$. The curves $\sigma(s)$ go straight across the curves $\gamma(\tau)$, intersecting the latter at points of equal $\tau$. We can then define the tangent vector field $c$ to the curves $\sigma$ in the usual way, $c = \partial\gamma/\partial s$. The geometric interpretation of the vector $c$ is that it points sideways with respect to the curves $\gamma(\tau)$, connecting a point $p$ on a curve $\gamma(\tau; s)$ to a nearby point $p'$ on a neighbor curve, $\gamma(\tau; s')$, where $s$ and $s'$ are infinitesimally close while $\tau$ stays constant. In other words, the connecting vector $c$ connects neighboring flow lines $\gamma(\tau)$ of the field $v$ at corresponding points, i.e. at points with the same value of the parameter $\tau$. Thus, the flow lines of $c$ are also lines of constant $\tau$. Now it is easy to check that $c$ is a connecting vector for $v$ according to the above definition.

Calculation 1.2.11.1: For the vector fields $v$ and $c$ defined in the preceding paragraph through the function $\gamma(\tau; s)$, we have $[c, v] = 0$. (Details on page 167.)

Since the parameter $\tau$ for different flow lines of a given vector field $v$ can be started with different values at different points, there are infinitely many connecting vector fields for a given field $v$. The freedom of selecting a connecting field for $v$ is the same as the freedom of choosing the initial values of the parameter $\tau$ on each flow line of $v$.

If a vector $c$ is fixed at one point $p_0$, say $c(p_0) = c_0$, one can always select $c(p)$ in the neighborhood of $p_0$ in such a way that the field $c$ commutes with $v$ everywhere. To find such $c(p)$, one needs to solve the equation $[c, v] = 0$. It is easy to see from the coordinate representation

\[ [c, v] = \sum_\alpha e_\alpha \Big( \sum_\beta c^\beta \frac{\partial v^\alpha}{\partial x^\beta} - \sum_\beta v^\beta \frac{\partial c^\alpha}{\partial x^\beta} \Big) \]

that the equation $[c, v] = 0$ is a linear differential equation for the unknown vector field $c$ that involves only the derivative of $c$ along the flow lines of $v$. Therefore, it is always possible to solve that equation with any given initial condition $c_0$.

Remark: Since $[v, v] = 0$, every field $v$ can be formally considered as a connecting field for itself, although there seems to be no useful geometric interpretation of that statement.

Vector fields $\partial/\partial x^\alpha$ defined through a coordinate system $\{x^\alpha\}$ always commute. In general, any vector field $v$ can be considered as a coordinate derivative $\partial/\partial x$ in some local coordinate system that includes $x$ as one of the coordinates. To show this, it is sufficient to demonstrate that there exists a basis of connecting fields that includes $v$; then the coordinate system can be constructed by using the flow lines of these vector fields.

Statement 1.2.11.2: For any vector field $v \neq 0$ given on an $N$-dimensional manifold $\mathcal{M}$, there exist connecting fields $c_1, c_2, ..., c_{N-1}$ such that $\{v, c_1, c_2, ..., c_{N-1}\}$ is a basis at any point $p$ in a neighborhood of an initial point $p_0$. (Proof on page 167.)

The geometric interpretation of a connecting basis can be understood from Fig. 1.6. If every pair of basis vector fields $e_a$, $e_b$ are connecting, there is no discrepancy between points obtained by following the coordinate lines in different order. The absence of this discrepancy is necessary for the existence of a coordinate system $\{x^a\}$ such that $e_a = \partial/\partial x^a$.

Practice problem: (a) In a local coordinate system $\{x, y\}$, determine all connecting vector fields $c$ for the vector $u = x\partial_y$, i.e. all $c = a(x, y)\partial_x + b(x, y)\partial_y$ such that $[c, u] = 0$ everywhere.

(b) In a local coordinate system $\{x, y, z\}$, find an explicit basis of connecting vector fields $c_j$, $j = 1, 2, 3$, for the vector $v = x\partial_y + y\partial_z$.

Answers: (a) $c = f(x)\,u + g(x)\,(x\partial_x + y\partial_y)$, where $f$ and $g$ are arbitrary functions of $x$. (b) One such basis is $c_1 = x\partial_y + y\partial_z$, $c_2 = \partial_z$, $c_3 = x^2\partial_x + xy\,\partial_y + (y^2 - xz)\,\partial_z$.

1.3 Lie derivative

A vector field $v$ acts as a directional derivative on scalar functions. The Lie derivative $\mathcal{L}_v$ with respect to a vector field $v$ is an important differential operation that can be applied not only to scalars but also to all tensor fields. We now deduce the formulas for $\mathcal{L}_v$ in an axiomatic way, i.e. by assuming some desirable properties of this operation. Later we will give a geometric interpretation of $\mathcal{L}_v$.

We would like $\mathcal{L}_v$ to act as the usual directional derivative on scalar functions $f$,

\[ \mathcal{L}_v f = v \circ f. \tag{1.8} \]

Additionally, $\mathcal{L}_v$ should have the following properties, fully analogous to Eqs. (1.2)-(1.3), but applicable to arbitrary tensors $A$, $B$:

\[ \mathcal{L}_v (A + B) = \mathcal{L}_v (A) + \mathcal{L}_v (B), \tag{1.9} \]
\[ \mathcal{L}_v (A \otimes B) = \mathcal{L}_v (A) \otimes B + A \otimes \mathcal{L}_v (B), \tag{1.10} \]
\[ \mathcal{L}_v (A \circ B) = \mathcal{L}_v (A) \circ B + A \circ \mathcal{L}_v (B), \tag{1.11} \]

where $A \otimes B$ is the tensor product and $A \circ B$ here denotes any pairing of the tensors $A$ and $B$ (e.g. $\omega \circ v$ and also $v \circ f$ are pairings in this sense). As we will see, these properties uniquely define the action of $\mathcal{L}_v$ on any tensor.

1.3.1 Commutator as Lie derivative

For instance, if $f$ is a scalar and $u$, $v$ are vectors, we expect to have

\[ \mathcal{L}_v (u \circ f) = [\mathcal{L}_v (u)] \circ f + u \circ \mathcal{L}_v (f). \]

Using Eq. (1.8), this is rewritten as

\[ v \circ (u \circ f) = [\mathcal{L}_v (u)] \circ f + u \circ (v \circ f). \]

Thus,

\[ [\mathcal{L}_v (u)] \circ f = v \circ (u \circ f) - u \circ (v \circ f) = [v, u] \circ f. \]

In other words, the Lie derivative $\mathcal{L}_v$ of a vector field $u$ coincides with the commutator of $v$ and $u$,

\[ \mathcal{L}_v u = [v, u] = -\mathcal{L}_u v. \]

In the index notation, we have, according to Eq. (1.7),

\[ (\mathcal{L}_v u)^\alpha = \sum_\beta \Big( v^\beta \frac{\partial u^\alpha}{\partial x^\beta} - u^\beta \frac{\partial v^\alpha}{\partial x^\beta} \Big). \tag{1.12} \]

It is important to realize that the Lie derivative $\mathcal{L}_v u$ depends on the derivatives of $v$ (not only on the value of $v$ and the derivatives of $u$).
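The component formula (1.12) and the answers to the practice problem of Sec. 1.2.10 can be checked numerically. The following is only a minimal sketch, not part of the original notes: the function names (`directional_derivative`, `commutator`, `lie_derivative`), the sample point, and the central-difference step `h` are all illustrative assumptions. Vector fields are represented by their Cartesian components in a local coordinate system $\{x, y, z\}$.

```python
from typing import Callable, List, Sequence

Vector = List[float]
Field = Callable[[Sequence[float]], Vector]


def directional_derivative(field: Field, point: Sequence[float],
                           direction: Sequence[float], h: float = 1e-5) -> Vector:
    """Central-difference estimate of sum_beta direction^beta d(field^alpha)/dx^beta."""
    plus = [x + h * d for x, d in zip(point, direction)]
    minus = [x - h * d for x, d in zip(point, direction)]
    return [(fp - fm) / (2.0 * h) for fp, fm in zip(field(plus), field(minus))]


def commutator(u: Field, v: Field, point: Sequence[float]) -> Vector:
    """Components [u, v]^alpha = u^beta d_beta v^alpha - v^beta d_beta u^alpha."""
    u_acting_on_v = directional_derivative(v, point, u(point))
    v_acting_on_u = directional_derivative(u, point, v(point))
    return [p - q for p, q in zip(u_acting_on_v, v_acting_on_u)]


def lie_derivative(v: Field, u: Field, point: Sequence[float]) -> Vector:
    """Lie derivative of the vector field u along v, i.e. the commutator [v, u]."""
    return commutator(v, u, point)


def field_a(p: Sequence[float]) -> Vector:
    """a = x d_x + y d_y + z d_z"""
    x, y, z = p
    return [x, y, z]


def field_b(p: Sequence[float]) -> Vector:
    """b = x d_y - y d_x"""
    x, y, z = p
    return [-y, x, 0.0]


def field_c(p: Sequence[float]) -> Vector:
    """c = d_z"""
    return [0.0, 0.0, 1.0]


def field_lambda_c(p: Sequence[float]) -> Vector:
    """lambda * c with the (assumed) scalar function lambda = x^2"""
    x, y, z = p
    return [0.0, 0.0, x * x]


point = [0.7, -1.3, 2.1]  # arbitrary sample point
print(commutator(field_a, field_b, point))  # close to [0, 0, 0]
print(commutator(field_a, field_c, point))  # close to [0, 0, -1], i.e. -d_z
print(commutator(field_b, field_c, point))  # close to [0, 0, 0]
# The Lie derivative along lambda*c differs from lambda times the Lie
# derivative along c, because it involves the derivatives of lambda:
print(lie_derivative(field_lambda_c, field_a, point))  # close to [0, 0, -x^2]
```

The last line illustrates the dependence on derivatives of $v$: with $\lambda = x^2$ one has $[c, a] = \partial_z$ and $a \circ \lambda = 2x^2$, so the Lie derivative of $a$ along $\lambda c$ is $x^2\partial_z - 2x^2\partial_z = -x^2\partial_z$ rather than $x^2\partial_z$.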


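Calculation 1.3.1.1: The Lie derivative $\mathcal{L}_v u$ involves derivatives of $v$. In particular, if $u$, $v$ are vector fields and $\lambda$ is a scalar function then

\[ \mathcal{L}_{\lambda v} u = \lambda \mathcal{L}_v u - (u \circ \lambda)\, v. \]

To derive this expression, we use the antisymmetry of the commutator and the Leibnitz rule for $\mathcal{L}_v$:

\[
\begin{aligned}
\mathcal{L}_{\lambda v} u &= [\lambda v, u] = -[u, \lambda v] = -\mathcal{L}_u (\lambda v) \\
&= -\mathcal{L}_u (\lambda)\, v - \lambda \mathcal{L}_u (v) \\
&= -(u \circ \lambda)\, v + \lambda \mathcal{L}_v u.
\end{aligned}
\]

1.3.2 Lie derivative of tensors

Let us continue to investigate the properties of the operation $\mathcal{L}_v$. Consider a 1-form $\omega$ and a vector field $u$. According to Eq. (1.11), we have

\[ \mathcal{L}_v (\omega \circ u) = \mathcal{L}_v (\omega) \circ u + \omega \circ \mathcal{L}_v (u). \]

On the other hand, $\omega \circ u$ is a scalar function, so Eq. (1.8) gives

\[ \mathcal{L}_v (\omega \circ u) = v \circ (\omega \circ u). \]

Hence, the Lie derivative of a 1-form $\omega$ with respect to $v$ is the 1-form $\mathcal{L}_v \omega$ that acts on an arbitrary vector $u$ as

\[ (\mathcal{L}_v \omega) \circ u = v \circ (\omega \circ u) - \omega \circ [v, u]. \]

In the index notation,

\[ [\mathcal{L}_v (\omega)]_\alpha = v^\beta \partial_\beta \omega_\alpha + \omega_\beta \partial_\alpha v^\beta . \]

The action of the Lie derivative on an arbitrary tensor can be derived similarly, using the properties (1.8)-(1.11). For example, suppose $A$ is a bilinear form with vector values, i.e. $A(u, v)$ is a vector. (The index notation for $A$ would be $A^\alpha_{\beta\gamma}$.) Then the Lie derivative $\mathcal{L}_w A$ is defined as

\[ [\mathcal{L}_w A](u, v) = [w, A(u, v)] - A([w, u], v) - A(u, [w, v]). \]

In the index notation,

\[ (\mathcal{L}_v A)^\alpha_{\beta\gamma} = v^\mu \partial_\mu A^\alpha_{\beta\gamma} + A^\alpha_{\mu\gamma} \partial_\beta v^\mu + A^\alpha_{\beta\mu} \partial_\gamma v^\mu - A^\mu_{\beta\gamma} \partial_\mu v^\alpha . \]

The Lie derivative with respect to a coordinate basis vector of a given coordinate system has special properties with respect to that coordinate system.

Statement 1.3.2: Let $\{x^\alpha\}$ be local coordinates in a manifold, and consider the standard coordinate bases $\{\partial/\partial x^\alpha\} \equiv \{\partial_\alpha\}$ and $\{\mathrm{d}x^\alpha\}$ in the tangent and the cotangent spaces respectively.

(a) The Lie derivatives with respect to a coordinate basis vector of another basis vector or of a basis 1-form are zero,

\[ \mathcal{L}_{\partial_\alpha} (\partial_\beta) = 0, \qquad \mathcal{L}_{\partial_\alpha} (\mathrm{d}x^\beta) = 0. \]

(b) Let us represent an arbitrary tensor of rank (m, n) in the coordinate basis by an array of components. For example, a tensor $T$ of rank (1, 2) is represented by an array $T^\alpha_{\beta\gamma}$ as

\[ T = \sum_{\alpha,\beta,\gamma} T^\alpha_{\beta\gamma}\, \frac{\partial}{\partial x^\alpha} \otimes \mathrm{d}x^\beta \otimes \mathrm{d}x^\gamma . \]

If it is known that $\mathcal{L}_{\partial/\partial x^1} T = 0$, it means that the component functions $T^\alpha_{\beta\gamma}$ are independent of the coordinate $x^1$.

Proof: (a) The coordinate basis vectors commute since

\[ \mathcal{L}_{\partial_\alpha} (\partial_\beta) \circ f = \frac{\partial}{\partial x^\alpha} \frac{\partial}{\partial x^\beta} f - \frac{\partial}{\partial x^\beta} \frac{\partial}{\partial x^\alpha} f = 0 \]

for a smooth function $f$.

The Lie derivative of a 1-form is defined through the action of the 1-form on an arbitrary vector field $v$,

\[ \left( \mathcal{L}_{\partial_\alpha} \mathrm{d}x^\beta \right) \circ v = \mathcal{L}_{\partial_\alpha} \left( \mathrm{d}x^\beta \circ v \right) - \mathrm{d}x^\beta \circ \left( \mathcal{L}_{\partial_\alpha} v \right). \]

By definition of the 1-form $\mathrm{d}x^\beta$ we have

\[ \mathrm{d}x^\beta \circ v \equiv v \circ x^\beta, \qquad \mathrm{d}x^\beta \circ \left( \mathcal{L}_{\partial_\alpha} v \right) \equiv \left( \mathcal{L}_{\partial_\alpha} v \right) \circ x^\beta, \]

where $x^\beta$ is understood as a scalar function representing the $\beta$-th coordinate of a point. Hence

\[
\begin{aligned}
\left( \mathcal{L}_{\partial_\alpha} \mathrm{d}x^\beta \right) \circ v &= \partial_\alpha \circ (v \circ x^\beta) - [\partial_\alpha, v] \circ x^\beta \\
&= v \circ (\partial_\alpha \circ x^\beta) = v \circ \delta^\beta_\alpha = 0.
\end{aligned}
\]

(b) It is sufficient to show the proof on the given example involving a tensor $T$ of rank (1, 2). Using the Leibnitz rule, we compute

\[
\begin{aligned}
0 = \mathcal{L}_{\partial_1} T &= \sum_{\alpha,\beta,\gamma} \left( \mathcal{L}_{\partial_1} T^\alpha_{\beta\gamma} \right) \partial_\alpha \otimes \mathrm{d}x^\beta \otimes \mathrm{d}x^\gamma \\
&= \sum_{\alpha,\beta,\gamma} \Big( \frac{\partial}{\partial x^1} T^\alpha_{\beta\gamma} \Big)\, \partial_\alpha \otimes \mathrm{d}x^\beta \otimes \mathrm{d}x^\gamma ,
\end{aligned}
\]

since $\mathcal{L}_{\partial_1} \partial_\alpha = 0$ and $\mathcal{L}_{\partial_1} \mathrm{d}x^\beta = 0$ for every $\alpha$, $\beta$. Since the set of basic tensors of the form $\partial_\alpha \otimes \mathrm{d}x^\beta \otimes \mathrm{d}x^\gamma$ is linearly independent, it follows that every derivative $\partial_1 T^\alpha_{\beta\gamma}$ vanishes.

1.3.3 Geometric interpretation

To illustrate the geometric significance of the Lie derivative, we first note that the Lie derivative of a connecting vector field is zero. A connecting vector, such as $c$ in Fig. 1.7, is transported from the initial point by the flow of the field $v$. We can imagine that the entire neighborhood of points around the initial point $p$ is transported along the flow lines of $v$ to another nearby point $p'$; this neighborhood is deformed in a certain way by the transport. The character of this deformation is determined by the vector field $v$ in the neighborhood of the initial point $p$ and along the way to $p'$. Tangent vectors at $p$ can be visualized as directions towards points near $p$. So all tangent vectors $a, b, ... \in T_p\mathcal{M}$ are also transported to particular tangent vectors $a', b', ... \in T_{p'}\mathcal{M}$ (see Fig. 1.8). Thus the flow of $v$ determines a map between tangent spaces $T_p\mathcal{M}$ and $T_{p'}\mathcal{M}$, as long as the points $p$ and $p'$ lie on a single flow line of $v$.

We have already seen (Sec. 1.2.11) that a vector field $u$ obtained by transporting an initial vector $u|_p$ along the flow of $v$ satisfies $\mathcal{L}_v u = 0$. (The transport from an initial point defines the vector field $u$ along a single flow line of $v$, so we need to use many flow lines and many initial values to define the vector field $u$ everywhere.) In general, for two vector fields $u$ and $v$ we will have $\mathcal{L}_v u \neq 0$. So we might say (heuristically) that the Lie derivative $\mathcal{L}_v u$ measures the extent to which a vector field $u$ fails to be a connecting field for $v$.

To make these considerations somewhat more precise, let us temporarily denote by $\phi_\tau : T_p\mathcal{M} \to T_{p'}\mathcal{M}$ the map that transports tangent vectors along the flow lines of $v$ from point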


$p$ to point $p'$, where $\tau$ is the parameter distance between $p$ and $p'$. (In other words, we introduce a curve $\gamma$ parameterized by $\tau$ such that $\gamma(0) = p$ and $\gamma(\tau) = p'$, while $\dot\gamma = v$ everywhere along $\gamma$.) In this notation, the vectors in Fig. 1.8 satisfy

\[ a' = \phi_\tau (a), \qquad b' = \phi_\tau (b). \]

Figure 1.8: A neighborhood of points around $p$ is transported along the flow lines of a vector field $v$ to a neighborhood around $p'$. Initial tangent vectors $a$ and $b$ from $T_p\mathcal{M}$ are visualized as directions between nearby points. These vectors are transported into vectors $a'$ and $b'$ from $T_{p'}\mathcal{M}$.

Now we would like to compute the Lie derivative $\mathcal{L}_v u$, where $u$ is some vector field. Let us compute $\mathcal{L}_v u$ at point $p$; let us assume that $\tau$ is very small and so $\tau v$ is a vector pointing from $p$ to $p'$. We might try to write the derivative of $u$ along $v$ at point $p$ as

\[ \lim_{\tau \to 0} \frac{u|_{p'} - u|_p}{\tau}, \qquad ??? \]

but this expression is meaningless since we cannot subtract vectors $u$ at different points: these vectors belong to different tangent spaces. However, we can transport $u|_{p'} \in T_{p'}\mathcal{M}$ back to the tangent space $T_p\mathcal{M}$ using the inverse map $\phi_\tau^{-1}$. So we write the derivative of $u$ along $v$ at point $p$ as

\[ \mathcal{L}_v u = \lim_{\tau \to 0} \frac{\phi_\tau^{-1} \left( u|_{p'} \right) - u|_p}{\tau}. \tag{1.13} \]

The geometric interpretation of this expression is clear: $\mathcal{L}_v u$ is the extent to which the field $u$ fails to be transported along $v$.^4

^4 Although in this section I do not actually prove that Eq. (1.13) is equivalent to the previous definition of $\mathcal{L}_v u$, a proof can be straightforwardly filled in or found in differential geometry textbooks, albeit perhaps in a more formal or abstract language.

Similar considerations apply to the Lie derivative of a tensor $A$ with respect to a vector field $v$. Any tensor $A$ can be defined as a multilinear function of tangent vectors and covectors, while vectors and covectors are defined through the points in an infinitesimal neighborhood of $p$. Thus, any tensor $A$ is ultimately an object defined through combinations of points near $p$. When these points are transported by the flow of a vector field $v$, the corresponding transport $\phi_\tau (A)$ of the tensor $A$ is also well-defined. Then the Lie derivative $\mathcal{L}_v A$ is defined by a formula analogous to Eq. (1.13).

It follows from expressions derived above that the Lie derivative $\mathcal{L}_v A|_p$ of a tensor $A$ at a point $p$ contains not only derivatives of $A$ but also derivatives of $v$. So the tensor $\mathcal{L}_v A|_p$ depends not only on the value $v|_p$ of the vector field $v$ at the point $p$ but also on the values of $v$ in an entire neighborhood of $p$. Thus the Lie derivative $\mathcal{L}_v A|_p$ is not a true directional derivative of $A$ in the direction of $v$ at the point $p$. A true directional derivative along a curve $\gamma(\tau)$, i.e. a derivative that depends only on the value of the vector $v(p) \equiv \dot\gamma$ at the point $p$, must be independent of the derivatives of $v$ at $p$. Such a true directional derivative cannot be defined without introducing some additional structures on the manifold $\mathcal{M}$. This is so because the tensors $A(p)$ and $A(p')$ at neighboring points belong to different tensor spaces, and thus there is no definition for a quantity such as $A(p') - A(p)$. Similarly, there is no possibility to integrate vectors or tensors over a domain. We might say, in figurative language, that the tangent spaces are disconnected and a directional derivative cannot be computed without a connection between these spaces. (The structure called the connection will be introduced in Sec. 1.6 below.) When we evaluate the Lie derivative $\mathcal{L}_v$, the information about how to connect the tangent spaces comes from the flow of the vector field $v$. This allows us to relate directions around $p$ to directions around $p'$, i.e. to connect the tangent spaces $T_p\mathcal{M}$ and $T_{p'}\mathcal{M}$. However, this procedure uses information about the vector field $v$ in the entire neighborhood of $p$, not just at one point $p$.

1.4 Calculus of differential forms

The calculus of differential forms is a basic tool of differential geometry. Differential forms are not widely used in standard texts on GR because they do not provide a decisive computational advantage as long as one remains within the standard introductory material. A substantial computational advantage of differential forms is first seen when considering more advanced material, such as the Frobenius theorem or the tetrad formulation of GR. However, the calculus of forms is relatively simple, and in my view the learning effort involved is amply justified by the conceptual advantages and wide-ranging applicability of differential forms in theoretical physics and mathematics. In the following subsections I explain the motivation for introducing differential forms. Then (Sec. 1.4.3 and below) I give an overview of the basic properties of differential forms, mostly without proof. Since the calculus of forms is standard material and since it will be used in this book only as a computational tool, this brief introduction will suffice.^5

^5 The calculus of differential forms is explained, for example, in chapter 4 of [33] or in chapter 7 of [1].

1.4.1 Volume as antisymmetric tensor

The familiar notion of volume can be given a natural interpretation in terms of antisymmetric multilinear functions of vectors, i.e. antisymmetric tensors. This interpretation is a great help in calculations because antisymmetric tensors have rich properties.

We begin with the two-dimensional Euclidean plane $\mathbb{R}^2$. In two dimensions, the area plays the role of volume, so let us


consider the area of a parallelogram spanned by two vectors $u$, $v$. According to the standard formula of elementary geometry, the area of the parallelogram is

\[ A = |u|\, |v| \sin\alpha, \]

where $\alpha$ is the angle between the vectors, which is a number between 0 and $\pi$ determined by

\[ \cos\alpha \equiv \frac{u \cdot v}{|u|\, |v|}. \]

Here $|u| \equiv \sqrt{u \cdot u}$ and $u \cdot v$ is the standard Euclidean scalar product. Thus the area of a parallelogram spanned by $u$ and $v$ is a nonnegative number defined by

\[ A(u, v) \equiv \sqrt{|u|^2 |v|^2 - (u \cdot v)^2}. \]

This number appears to be a complicated nonlinear function of the vectors $u$ and $v$. It is more convenient to work with the oriented area $\tilde{A}$, which is defined as $\tilde{A} = +A$ if the pair $\{u, v\}$ has positive orientation and $\tilde{A} = -A$ otherwise. (We can fix the positive orientation to be that of the coordinate axes $\{x, y\}$ in the plane, taken in this order.) The decisive advantage of the oriented area $\tilde{A}$ over the conventional (unsigned) area $A$ is that $\tilde{A}(u, v)$ turns out to be an antisymmetric bilinear function of $u$ and $v$. In mathematics, a number-valued bilinear function of two vectors is usually called a bilinear form, while an antisymmetric bilinear form is called a 2-form.

The oriented area $\tilde{A}(u, v)$ is by definition antisymmetric in $(u, v)$, since the pair $\{v, u\}$ has opposite orientation to $\{u, v\}$. The fact that $\tilde{A}(u, v)$ is a bilinear form can be demonstrated quite straightforwardly. When one of the vectors $u$ or $v$ is multiplied by a positive number, the area of a parallelogram is multiplied by the same number. If one of the vectors is multiplied by a negative number, the orientation of the pair $\{u, v\}$ is reversed and thus $\tilde{A}(u, v)$ changes sign. We also have $\tilde{A}(0, v) = 0$. So it is easy to see that $\tilde{A}(\lambda u, v) = \lambda \tilde{A}(u, v)$, where $\lambda$ is an arbitrary real number (positive, zero, or negative). Since $\tilde{A}(u, v) = -\tilde{A}(v, u)$, the same property holds for the argument $v$. Further, one can show by an elementary geometric construction (see Fig. 1.9) that the area does not change when a multiple of $v$ is added to $u$, namely

\[ \tilde{A}(u + \lambda v, v) = \tilde{A}(u, v). \]

Figure 1.9: The area of the parallelogram spanned by a pair of vectors $\{u, v\}$ is equal to the area spanned by $\{u + \lambda v, v\}$.

In two dimensions, two vectors $\{u, v\}$ are a basis if $\tilde{A}(u, v) \neq 0$. It follows that $\tilde{A}$ is linear in each argument: we can compute $\tilde{A}(u_1 + u_2, v)$ by decomposing $u_1 = \lambda_1 u + \mu_1 v$, $u_2 = \lambda_2 u + \mu_2 v$, and then it is easy to see that

\[ \tilde{A}(u_1 + u_2, v) = \tilde{A}(u_1, v) + \tilde{A}(u_2, v). \]

The same property holds for the argument $v$. Thus, $\tilde{A}(u, v)$ is an antisymmetric bilinear form that we may call the area 2-form.

To make the above discussion less abstract, let us express all quantities in Cartesian coordinates $\{x, y\}$. The vectors $u$ and $v$ have components $\{u_1, u_2\}$ and $\{v_1, v_2\}$; the unoriented area is

\[ A(u, v) = \sqrt{(u_1^2 + u_2^2)(v_1^2 + v_2^2) - (u_1 v_1 + u_2 v_2)^2}. \]

After some algebraic simplification, this becomes

\[ A(u, v) = \sqrt{(u_1 v_2 - u_2 v_1)^2} = |u_1 v_2 - u_2 v_1| . \tag{1.14} \]

Now it is clear that $A(u, v)$ is the absolute value of a 2-form $\tilde{A}$ defined by

\[ \tilde{A}(u, v) = u_1 v_2 - u_2 v_1 = \begin{vmatrix} u_1 & v_1 \\ u_2 & v_2 \end{vmatrix}. \]

The sign of $\tilde{A}$ is chosen such that the area of the unit square spanned by the unit basis vectors $\{1, 0\}$ and $\{0, 1\}$ (in this order) is equal to 1. We note that the oriented area $\tilde{A}$ is more closely related to linear algebra than the unoriented area $A$.

A similar argument (with a suitable generalization of Fig. 1.9 to three dimensions) shows that the oriented 3-volume $\tilde{V}(a, b, c)$ of a parallelepiped spanned by three vectors $a$, $b$, $c$ in a three-dimensional Euclidean space is a totally antisymmetric trilinear form (3-form). The same conclusion holds in any number of dimensions. The $n$-dimensional oriented volume of a parallelepiped spanned by vectors $a_1, ..., a_n$ in Euclidean space is given by

\[ \tilde{V}(a_1, ..., a_n) = \begin{vmatrix} a_1^1 & \cdots & a_n^1 \\ \vdots & \ddots & \vdots \\ a_1^n & \cdots & a_n^n \end{vmatrix}, \]

where $a_j^i$ is the $i$-th component of the vector $a_j$.

Remark: determinants. The last formula suggests a close connection between antisymmetric forms and the calculus of determinants. Indeed, such a close connection exists. For instance, one can show that the oriented volume of an $n$-dimensional parallelepiped transformed by a linear map $T$ will change by a numerical factor, which depends on $T$ but does not depend on the parallelepiped. This numerical factor is equal to the determinant of $T$. In Sec. 1.4.5, we use this property as a convenient definition of the determinant of a linear transformation $T$.

A somewhat more complicated concept is the area of a parallelogram spanned by two vectors $\{u, v\}$ in a higher-dimensional Euclidean space, for example, in $\mathbb{R}^3$. The ordinary (unoriented) area $A(u, v)$ is a nonlinear function of $u$, $v$ given by Eq. (1.14). In three (or more) dimensions, one cannot define an oriented area $\tilde{A}(u, v)$ linear in $u$, $v$. Instead, it turns out that one can define a vector-valued antisymmetric, bilinear function $\mathbf{A}(u, v)$ whose absolute value is equal to the area $A(u, v)$. This function is the familiar vector product of the vectors $u$ and $v$,

\[ \mathbf{A}(u, v) = u \times v. \]

Indeed, it is well known that

\[ |u \times v| = |u|\, |v| \sin\alpha = A(u, v). \]
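The properties of the area 2-form and of the vector product stated above can be verified on concrete numbers. The following is only an illustrative sketch: the function names and the sample vectors are assumptions chosen for this example, not taken from the notes. It checks antisymmetry, invariance under adding a multiple of $v$ to $u$, the relation $A = |\tilde{A}|$ in two dimensions, and $|u \times v| = A(u, v)$ in three dimensions.

```python
import math


def oriented_area(u, v):
    """Area 2-form in Cartesian components: u1*v2 - u2*v1."""
    return u[0] * v[1] - u[1] * v[0]


def unoriented_area(u, v):
    """sqrt(|u|^2 |v|^2 - (u.v)^2); this formula works in any dimension."""
    uu = sum(x * x for x in u)
    vv = sum(x * x for x in v)
    uv = sum(x * y for x, y in zip(u, v))
    return math.sqrt(max(uu * vv - uv * uv, 0.0))


def cross(u, v):
    """Vector product in three dimensions."""
    return [u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0]]


# Two-dimensional example (sample vectors are arbitrary).
u, v, lam = [3.0, 1.0], [1.0, 2.0], 2.5
sheared = [u[0] + lam * v[0], u[1] + lam * v[1]]  # u + lambda*v

print(oriented_area(u, v), oriented_area(v, u))  # opposite signs
print(oriented_area(sheared, v))                 # same as oriented_area(u, v)
print(unoriented_area(u, v))                     # equals |oriented_area(u, v)|

# Three-dimensional example: the length of the area vector is the area.
p, q = [1.0, 2.0, 2.0], [0.0, 3.0, 4.0]
area_vector = cross(p, q)
print(math.sqrt(sum(x * x for x in area_vector)), unoriented_area(p, q))
```

For the sample vectors used here the oriented area is $3 \cdot 2 - 1 \cdot 1 = 5$, and the area vector in the second example is $\{2, -4, 3\}$ with length $\sqrt{29}$.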


Thus, one could say that in three dimensions the area has an orientation vector. Thus, it is natural to regard the fundamental area as vector-valued; the ordinary area is then computed as the absolute value of the area vector.

Remark: The same idea can be generalized to $n$-dimensional parallelepipeds embedded in $N$-dimensional space (where $N > n$). This time, it turns out that the orientation of the $n$-dimensional volume is not a vector but an antisymmetric tensor; so the oriented $n$-dimensional volume is naturally tensor-valued. The ordinary volume is a suitable absolute value of that tensor. More explanations can be found in Sec. 1.4.4. One may even change the point of view and regard the oriented volume as a basic quantity which is a certain totally antisymmetric tensor and the ordinary volume as a quantity defined as the absolute value of the oriented volume.

1.4.2 Motivation for differential forms

In standard courses of multivariate calculus, one studies integrals such as

\[ \int\!\!\int\!\!\int_{-\infty}^{+\infty} e^{-x^2 - y^2 - z^2}\, dx\, dy\, dz = \pi^{3/2}. \]

Expressions such as $dx\,dy\,dz$ occurring under the sign of multiple integration are usually explained as a purely symbolic notation. It is perhaps not obvious to a student of calculus that $dx\,dy\,dz$ above can be interpreted, quite rigorously, as an antisymmetric product of $\mathrm{d}x$, $\mathrm{d}y$, and $\mathrm{d}z$, rather than an ordinary, symmetric product. In fact, the antisymmetric interpretation of $dx\,dy\,dz$ is necessary if one wishes to obtain a consistent calculus where objects such as $\mathrm{d}x$ or $\mathrm{d}x\,\mathrm{d}y$ are well-defined and transform correctly under a change of variables. Differential forms can be understood as the rigorous mathematical objects that make precise sense out of expressions such as $dx\,dy\,dz$. In this section we explore these motivational ideas.

Let us begin by considering an example familiar from mechanics. Suppose a particle moves in three-dimensional space along a curve $x(\tau) \equiv \{X(\tau), Y(\tau), Z(\tau)\}$ and experiences a space-dependent external force $F \equiv \{F_1, F_2, F_3\}$. The work done by the force can be computed as the line integral,

\[ W = \int_{x(\tau)} (F_1\, dx + F_2\, dy + F_3\, dz). \tag{1.15} \]

This expression is computed as the ordinary integral

\[ W = \int_{\tau_1}^{\tau_2} \left( F_1 \dot{X} + F_2 \dot{Y} + F_3 \dot{Z} \right) d\tau. \]

However, this form of the expression requires a parameterization $x(\tau)$, while the work $W$ is actually independent of the choice of the parameter $\tau$ along the curve: it depends only on the force $F$ and on the layout of the given curve in space. So it is more natural to work with the line integral (1.15). We can interpret the integrand in Eq. (1.15) directly as a 1-form

\[ \omega \equiv F_1\, \mathrm{d}x + F_2\, \mathrm{d}y + F_3\, \mathrm{d}z. \]

Note that the tangent vector $\dot{x}\, d\tau$ is an approximate representation of a short curve segment (see Sec. 1.2.5) between the points $x(\tau)$ and $x(\tau + d\tau)$. The expression

\[ \left( F_1 \dot{X} + F_2 \dot{Y} + F_3 \dot{Z} \right) d\tau = \omega \circ \dot{x}\, d\tau \]

is interpreted as the 1-form $\omega$ evaluated on the tangent vector $\dot{x}\, d\tau$. This expression represents a small amount of work done along a short curve segment. So the line integral (1.15) is rewritten as

\[ W = \int_{\tau_1}^{\tau_2} (\omega \circ \dot{x})\, d\tau \equiv \int_\gamma \omega, \]

where $\gamma$ denotes the path along which the particle is moving in space (regardless of the choice of the parameter $\tau$).

Remark: Note that we do not write any "d"s after the integral sign in the last expression; all the "d"s are contained in the 1-form $\omega$. We write the roman "d" in 1-forms to emphasize the fact that $\mathrm{d}x$ is a separate, rigorously defined object, while $dx$ stands for a heuristic infinitesimal change of $x$.

In general, one defines the integral of a 1-form $\omega$ over a curve $\gamma$ as follows. One splits the curve $\gamma$ into a large number $N$ of short segments. Each segment is almost straight and thus can be approximately represented by a tangent vector (see Sec. 1.2.5). Thus we represent the $N$ segments by $N$ tangent vectors $v_1, ..., v_N$. Each segment contributes to the integral a small amount computed as $\omega(v_j) \equiv \omega \circ v_j$. The integral $\int_\gamma \omega$ is then defined as the limit of the sum of $\omega \circ v_j$,

\[ \int_\gamma \omega = \lim_{N \to \infty} \sum_{j=1}^{N} \omega \circ v_j . \]

The line integral $\int_\gamma \omega$ can be reduced to an ordinary integral by choosing a parameterization $\gamma(\tau)$ for the curve and writing

\[ \int_\gamma \omega = \int_{\tau_1}^{\tau_2} \omega \circ \dot\gamma(\tau)\, d\tau. \tag{1.16} \]

But the result does not depend on the choice of the parameterization. A line integral $\int_\gamma \omega$ is defined purely through splitting of the curve $\gamma$ into small segments, and so is a geometrically defined quantity independent of the coordinate representation of $\gamma$.

The calculus of 1-forms is compatible with the change of variables under the integral. For instance, consider a one-dimensional space and an ordinary integral $\int f(x)\, dx$. After a change of variable $x = X(\xi)$, the integral becomes

\[ \int f(x)\, dx = \int f(X(\xi))\, \frac{dX}{d\xi}\, d\xi. \]

This corresponds precisely to the chain rule for the differential operator $\mathrm{d}$,

\[ f(x)\, \mathrm{d}x = f(x)\, \mathrm{d}X(\xi) = f(x)\, X'(\xi)\, \mathrm{d}\xi. \]

So it is consistent to interpret the expression $f(x)\, dx$ within an ordinary integral as a 1-form $f(x)\, \mathrm{d}x$.

Let us now consider two-dimensional integrals (surface integrals). An integral over a surface with local coordinates $\{x, y\}$ is computed as a repeated integral,

\[ \int\!\!\int F(x, y)\, dx\, dy = \int_{x_1}^{x_2} dx \int_{y_1}^{y_2} dy\, F(x, y). \tag{1.17} \]

Now we would like to interpret the expression $F(x, y)\, dx\, dy$ as an analog and a generalization of the 1-form $f(x)\, \mathrm{d}x$ seen in the previous example.
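The definition of the line integral as a limit of sums of $\omega \circ v_j$, and its independence of the parameterization, can be illustrated numerically. The sketch below is only an example under stated assumptions: the force field $F = \{-y, x, 0\}$, the helical curve $\{\cos\tau, \sin\tau, \tau\}$ with $0 \le \tau \le \pi$, the number of segments, and all function names are chosen here for illustration and do not come from the notes. For this choice, $\omega \circ \dot\gamma = 1$, so the exact value of the integral is $\pi$.

```python
import math


def force(point):
    """Assumed sample force field F = {-y, x, 0}; it defines the 1-form omega."""
    x, y, z = point
    return (-y, x, 0.0)


def curve(tau):
    """Assumed sample curve gamma(tau) = {cos tau, sin tau, tau}."""
    return (math.cos(tau), math.sin(tau), tau)


def integrate_one_form(parameter_values):
    """Sum of omega evaluated on the segment vectors v_j.

    Each segment vector joins two consecutive points of the curve, and
    omega is evaluated at the midpoint of the segment.
    """
    total = 0.0
    for t0, t1 in zip(parameter_values, parameter_values[1:]):
        p0, p1 = curve(t0), curve(t1)
        segment = [b - a for a, b in zip(p0, p1)]
        midpoint = [(a + b) / 2.0 for a, b in zip(p0, p1)]
        total += sum(f * s for f, s in zip(force(midpoint), segment))
    return total


n = 2000  # number of segments N
# Two different ways of splitting the same curve into segments:
uniform = [math.pi * j / n for j in range(n + 1)]           # tau = pi * s
stretched = [math.pi * (j / n) ** 2 for j in range(n + 1)]  # tau = pi * s^2

work_uniform = integrate_one_form(uniform)
work_stretched = integrate_one_form(stretched)
print(work_uniform, work_stretched, math.pi)  # all three nearly equal
```

Both sums approach the same value as $N$ grows, although the segments are distributed quite differently along the curve; only the curve itself and the 1-form matter.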


As in the previous example, we can divide the surface of integration into a large number $N$ of small rectangles with sides $\delta x$ and $\delta y$. The sides of each rectangle are approximately represented by a pair of tangent vectors $\{\mathbf{u}_j, \mathbf{v}_j\}$, $j = 1, ..., N$. A rectangle number $j$ located at the point $\{x_j, y_j\}$ gives a small contribution $F(x_j, y_j)\,\delta x\,\delta y$ to the integral. It is clear that the number $F(x_j, y_j)\,\delta x\,\delta y$ can be understood as the evaluation of a bilinear form $\omega$ acting on tangent vectors $\{\mathbf{u}_j, \mathbf{v}_j\}$, i.e.

$$F(x_j, y_j)\,\delta x\,\delta y \equiv \omega(\mathbf{u}_j, \mathbf{v}_j).$$

The bilinear form $\omega$ is called a 2-form since it has two vector arguments. Then the surface integral is defined as the limit

$$\iint F\,dx\,dy = \lim_{N\to\infty} \sum_{j=1}^{N} \omega(\mathbf{u}_j, \mathbf{v}_j).$$

The 2-form $\omega$ can be temporarily written as $\omega = F\,dx\,dy$. Let us now explore its properties under the change of variables in the integral.

We first perform a simple change of variables, $\{x, y\} \to \{x, \eta\}$, where we replace $y$ by $y = Y(x, \eta)$ and $Y$ is a fixed function. To transform the integral, it is convenient to use the formula (1.17). Since the variable $x$ is held fixed within the integration over $y$, we find

$$\int_{x_1}^{x_2} dx \int_{y_1}^{y_2} dy\, F(x, y) = \int_{x_1}^{x_2} dx \int_{\eta_1(x)}^{\eta_2(x)} d\eta\, \frac{\partial Y(x, \eta)}{\partial \eta}\, F(x, Y).$$

Hence, if we wish to interpret the expression $F\,dx\,dy$ consistently as a bilinear form, we must adopt the rule

$$dx\, dY(x, \eta) = \frac{\partial Y(x, \eta)}{\partial \eta}\, dx\, d\eta.$$

However, according to the calculus of 1-forms, we have

$$dY(x, \eta) = \frac{\partial Y}{\partial x}\, dx + \frac{\partial Y}{\partial \eta}\, d\eta$$

and (assuming that the product $dx\,dy$ is bilinear)

$$dx\, dY(x, \eta) = dx \left( \frac{\partial Y}{\partial x}\, dx + \frac{\partial Y}{\partial \eta}\, d\eta \right).$$

Therefore, this calculus is consistent with the properties of two-dimensional integration only if we assume that

$$dx\, dx = 0.$$

Now it is clear that the 2-form $dx\,dy$ must be a rather special kind of product of 1-forms $dx$ and $dy$. This product is called the exterior product (also wedge product) and is denoted by the symbol $\wedge$; so one writes $dx \wedge dy$ rather than $dx\,dy$. The property

$$dx \wedge dx = 0$$

together with bilinearity means that the exterior product is antisymmetric. To see this, consider

$$d(x + y) \wedge d(x + y) = 0 \;\Rightarrow\; dx \wedge dy + dy \wedge dx = 0.$$

Let us now check the consistency of a more general change of variables, $\{x, y\} \to \{\xi, \eta\}$ where $x = X(\xi, \eta)$ and $y = Y(\xi, \eta)$. Calculations are straightforward because the only new rule is the antisymmetry of the exterior product. We find

$$dx \wedge dy = dX(\xi, \eta) \wedge dY(\xi, \eta) = \left( \frac{\partial X}{\partial \xi}\, d\xi + \frac{\partial X}{\partial \eta}\, d\eta \right) \wedge \left( \frac{\partial Y}{\partial \xi}\, d\xi + \frac{\partial Y}{\partial \eta}\, d\eta \right) = \left( \frac{\partial X}{\partial \xi} \frac{\partial Y}{\partial \eta} - \frac{\partial X}{\partial \eta} \frac{\partial Y}{\partial \xi} \right) d\xi \wedge d\eta, \tag{1.18}$$

where we used the antisymmetry properties

$$d\xi \wedge d\xi = d\eta \wedge d\eta = 0, \qquad d\xi \wedge d\eta + d\eta \wedge d\xi = 0,$$

to get the last line. Now we may recognize Eq. (1.18) as the standard formula for the change of variables, involving the two-dimensional Jacobian of the transformation $\{x, y\} \to \{\xi, \eta\}$. In this way, we can see that the unusual rule of the exterior product is a natural consequence of the known properties of multiple integration. [6]

[6] However, note that we obtained the Jacobian itself rather than its absolute value, which one has in formulas involving the area of domain. This is consistent because integrals of 2-forms are defined on oriented surfaces and are not merely integrals of the ordinary, unsigned area. Reversing the orientation of the surface will reverse the sign of the integral.

The geometric formulation of the surface integral is therefore the integral of a 2-form $\omega$ over a surface, written as

$$\int_A \omega,$$

where $A$ is a surface (or part of a surface) and $\omega$ is a 2-form. The same formulation applies to integrals over surfaces embedded in a higher-dimensional space. Consider another example familiar from physics: the flux of a magnetic field $\mathbf{B}$ through a surface $A$ is computed as a surface integral

$$\Phi = \int_A \mathbf{B} \cdot d\mathbf{S},$$

where $d\mathbf{S}$ is understood as a vector-valued "infinitesimal surface element." The flux integral is usually written in components, $\mathbf{B} \equiv \{B_x, B_y, B_z\}$, as follows:

$$\int_A \left( B_x\, dy\, dz + B_y\, dx\, dz + B_z\, dx\, dy \right).$$

However, this notation needs to be supplemented by a complicated prescription for the orientation of the surface elements. Integrals of this kind become more transparent in the notation of 2-forms. One introduces the 2-form

$$\omega \equiv B_x\, dy \wedge dz + B_y\, dz \wedge dx + B_z\, dx \wedge dy$$

and expresses the flux as

$$\Phi = \int_A \omega.$$

The (somewhat complicated) rules for changing variables in such integrals, as well as the orientation prescriptions, are automatically reproduced by the calculus of differential forms. The rules of this calculus are simple: one needs to use the chain rule for $d$ and the antisymmetry of $\wedge$.

Similarly, one can treat a three-dimensional integral,

$$\iiint F(x, y, z)\, dx\, dy\, dz,$$


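as an integral of the 3-form $F(x, y, z)\, dx \wedge dy \wedge dz$ over a three-dimensional manifold (or part of a manifold). The same rules will provide the correct formulas under any changes of variables in the integral. Collectively, $n$-forms ($n = 1, 2, 3, ...$) are called differential forms since they represent expressions involving the differentials $d$.

After this motivation, we proceed to summarize the main definitions and properties of differential forms. We begin by studying totally antisymmetric tensors.

1.4.3 Antisymmetric tensors

An $n$-form $\omega$ is a totally antisymmetric map from sets of $n$ vector fields to numbers, $\omega(\mathbf{v}_1, ..., \mathbf{v}_n) \in \mathbb{R}$, which is linear in each of its vector arguments $\mathbf{v}_j$. In other words, an $n$-form is a totally antisymmetric tensor of rank $(0, n)$. The equivalent notations $\omega \circ (\mathbf{v}_1, ..., \mathbf{v}_n) \equiv \omega(\mathbf{v}_1, ..., \mathbf{v}_n)$ will be used for convenience or clarity.

When considering vector fields on manifolds, one deals with vectors $\mathbf{v}|_p$ defined in each tangent space $T_p\mathcal{M}$. So it is natural to consider $n$-forms $\omega(\mathbf{v}_1, ..., \mathbf{v}_n; p)$ defined in each tangent space $T_p\mathcal{M}$, i.e. $n$-form fields. These $n$-form fields are called differential forms or simply $n$-forms, just as we sometimes call vector fields simply "vectors."

In a local coordinate system $\{x^\mu\}$, tangent vectors are represented by components $v^\mu$ and $n$-forms are represented by multi-indexed arrays of components, $\omega_{\mu_1 ... \mu_n}$, such that

$$\omega(\mathbf{v}_1, ..., \mathbf{v}_n) = \omega_{\mu_1 \mu_2 ... \mu_n}\, v_1^{\mu_1} v_2^{\mu_2} ... v_n^{\mu_n}.$$

The array of components $\omega_{\mu_1 ... \mu_n}$ is totally antisymmetric in all indices. We use the index-free notation in which $n$-forms are represented explicitly through basis 1-forms such as $dx$. To express the antisymmetry of an $n$-form, one writes $dx \wedge dy$, where $\wedge$ denotes the special antisymmetric product to be introduced now.

The exterior product (also called the wedge product) of two forms, say an $m$-form $\omega_1$ and an $n$-form $\omega_2$, is an $(m + n)$-form $\omega_1 \wedge \omega_2$, defined by a total antisymmetrization of products of $\omega_1$ and $\omega_2$,

$$(\omega_1 \wedge \omega_2)(\mathbf{v}_1, ..., \mathbf{v}_{m+n}) \equiv \frac{1}{m!\,n!} \sum_{\sigma} (-1)^{|\sigma|}\, \omega_1\!\left(\mathbf{v}_{\sigma(1)}, ..., \mathbf{v}_{\sigma(m)}\right) \omega_2\!\left(\mathbf{v}_{\sigma(m+1)}, ..., \mathbf{v}_{\sigma(m+n)}\right), \tag{1.19}$$

where the sum goes over all the permutations $\sigma$ of the set $\{1, 2, ..., m + n\}$, and $|\sigma| = 0, 1$ is a function showing whether the permutation $\sigma$ is even or odd. Note that the factors $m!\,n!$ in Eq. (1.19) will cancel due to the total antisymmetry of $\omega_1$ and $\omega_2$. For instance, if $\alpha$ is a 1-form and $\beta$ is a 2-form then

$$(\alpha \wedge \beta)(\mathbf{x}, \mathbf{y}, \mathbf{z}) = \alpha(\mathbf{x})\beta(\mathbf{y}, \mathbf{z}) + \alpha(\mathbf{y})\beta(\mathbf{z}, \mathbf{x}) + \alpha(\mathbf{z})\beta(\mathbf{x}, \mathbf{y}).$$

Similarly, if $\alpha$, $\beta$, and $\gamma$ are 1-forms then

$$(\alpha \wedge \beta)(\mathbf{x}, \mathbf{y}) = \alpha(\mathbf{x})\beta(\mathbf{y}) - \alpha(\mathbf{y})\beta(\mathbf{x}) = \det \begin{pmatrix} \alpha(\mathbf{x}) & \alpha(\mathbf{y}) \\ \beta(\mathbf{x}) & \beta(\mathbf{y}) \end{pmatrix},$$

$$(\alpha \wedge \beta \wedge \gamma)(\mathbf{x}, \mathbf{y}, \mathbf{z}) = \det \begin{pmatrix} \alpha(\mathbf{x}) & \alpha(\mathbf{y}) & \alpha(\mathbf{z}) \\ \beta(\mathbf{x}) & \beta(\mathbf{y}) & \beta(\mathbf{z}) \\ \gamma(\mathbf{x}) & \gamma(\mathbf{y}) & \gamma(\mathbf{z}) \end{pmatrix}.$$

The components of $\alpha \wedge \beta$ in a local coordinate system $\{x^\mu\}$ are

$$(\alpha \wedge \beta)_{\mu\nu} = \alpha_\mu \beta_\nu - \alpha_\nu \beta_\mu,$$

where $\alpha_\mu$ and $\beta_\mu$ are the components of the 1-forms $\alpha$ and $\beta$.

Example: Consider a local coordinate system $\{x, y, z\}$ and the 1-forms $\alpha = y\,dx - 2\,dz$, $\beta = x\,dy + y\,dz$. The exterior product of $\alpha$ and $\beta$ is computed as follows,

$$\alpha \wedge \beta = (y\,dx - 2\,dz) \wedge (x\,dy + y\,dz) = xy\, dx \wedge dy + y^2\, dx \wedge dz + 2x\, dy \wedge dz$$

(we used $dy \wedge dz = -dz \wedge dy$). Let us now compute how the 2-form $\alpha \wedge \beta$ acts on a pair of vectors $\mathbf{u} = 5\partial_x - y\,\partial_z$, $\mathbf{v} = \partial_x + 3z\,\partial_y$. We have $dy \circ \mathbf{u} = 0$ and $dz \circ \mathbf{v} = 0$, which simplifies the calculations:

$$(dx \wedge dy)(\mathbf{u}, \mathbf{v}) = (dx \circ \mathbf{u})(dy \circ \mathbf{v}) = 15z,$$
$$(dx \wedge dz)(\mathbf{u}, \mathbf{v}) = -(dz \circ \mathbf{u})(dx \circ \mathbf{v}) = y,$$
$$(dy \wedge dz)(\mathbf{u}, \mathbf{v}) = -(dz \circ \mathbf{u})(dy \circ \mathbf{v}) = 3yz.$$

Finally, we compute

$$(\alpha \wedge \beta)(\mathbf{u}, \mathbf{v}) = xy \cdot 15z + y^2 \cdot y + 2x \cdot 3yz = 21xyz + y^3.$$
■

A useful property is that the exterior product of linearly dependent 1-forms vanishes.

Statement 1.4.3.1: If $\omega_1, ..., \omega_k$ are some 1-forms, then $\omega_1 \wedge ... \wedge \omega_k \neq 0$ iff the set of the 1-forms $\{\omega_j\}$ is linearly independent.

Proof of Statement 1.4.3.1: If the set $\{\omega_j\}$ is linearly dependent then we can express e.g. $\omega_1$ through other $\omega_j$, and the exterior product vanishes due to antisymmetry ($\omega_j \wedge \omega_j = 0$). If the set $\{\omega_j\}$ is linearly independent then it is either already a basis or can be completed to a basis $\{\omega_1, ..., \omega_k, ..., \omega_n\}$ in the $n$-dimensional space of 1-forms. So the exterior product is (by definition) an alternating linear combination of tensor products of basis 1-forms,

$$\omega_1 \wedge ... \wedge \omega_k = \sum_{\sigma} (-1)^{|\sigma|}\, \omega_{\sigma(1)} \otimes ... \otimes \omega_{\sigma(k)},$$

where the sum is over all permutations $\sigma$ of the set $\{1, ..., k\}$. Since the set of all basis tensors $\{\omega_{j_1} \otimes ... \otimes \omega_{j_k}\}$ is by definition linearly independent, the linear combination cannot vanish. ■

1.4.4 *Oriented volume and n-vectors

In Sec. 1.4.1, we have introduced the notion of oriented area and oriented volume. Now we develop a more general picture in which certain antisymmetric tensors represent $n$-dimensional volumes embedded in $N$-dimensional space, where dimensions are arbitrary and $N \geq n$.

By analogy with the exterior product of forms, one can define the exterior product of vectors. For instance, the exterior product of two vectors is the antisymmetric tensor defined by

$$\mathbf{x} \wedge \mathbf{y} \equiv \mathbf{x} \otimes \mathbf{y} - \mathbf{y} \otimes \mathbf{x}; \qquad (\mathbf{x} \wedge \mathbf{y})^{\alpha\beta} = x^\alpha y^\beta - x^\beta y^\alpha.$$

One can consider the vector space of linear combinations of such exterior products; this is the space of antisymmetric tensors of rank (2,0). Similarly, the exterior product of $n$ vectors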


is an antisymmetric tensor of rank $(n, 0)$. Such totally antisymmetric tensors are sometimes called $n$-vectors or multivectors; for instance, $\mathbf{x} \wedge \mathbf{y}$ is called a bivector.

Using this construction, one can reinterpret an $n$-form as a linear map from $n$-vectors into numbers. For example, a 2-form $\omega$ acts on a bivector $\mathbf{x} \wedge \mathbf{y}$ as follows,

$$\omega \circ (\mathbf{x} \wedge \mathbf{y}) = \omega \circ (\mathbf{x} \otimes \mathbf{y} - \mathbf{y} \otimes \mathbf{x}) = \omega(\mathbf{x}, \mathbf{y}) - \omega(\mathbf{y}, \mathbf{x}) = 2\omega(\mathbf{x}, \mathbf{y}).$$

Notice the appearance of an extra factor 2. In the case of $n$-forms, this extra factor will be $n!$ since we will need to sum over $n!$ possible permutations of $n$ vectors.

Remark: Some textbooks use extra factors of $n!$ when defining the exterior product of $n$ 1-forms; for instance, one could define $(\alpha \wedge \beta)(\mathbf{x}, \mathbf{y}) = \frac{1}{2}\left(\alpha(\mathbf{x})\beta(\mathbf{y}) - \alpha(\mathbf{y})\beta(\mathbf{x})\right)$. Then extra factors of $n!$ will appear in some equations but disappear from some other equations. These factors play no essential role, i.e. they are cosmetic (they merely make some equations better looking). Of course, one must keep track of the extra factors when one uses formulas from different books. ■

An $n$-vector $\mathbf{a}_1 \wedge ... \wedge \mathbf{a}_n$ can serve as a representation of the $n$-dimensional volume of a parallelepiped spanned by a set of $n$ given vectors $\{\mathbf{a}_1, ..., \mathbf{a}_n\}$ in an $N$-dimensional space (where $N \geq n$). Statement 1.4.4.1 shows that the (oriented) volumes of two such parallelepipeds spanned by the sets $\{\mathbf{a}_1, ..., \mathbf{a}_n\}$ and $\{\mathbf{b}_1, ..., \mathbf{b}_n\}$ are equal iff the two corresponding $n$-vectors, $\mathbf{a}_1 \wedge ... \wedge \mathbf{a}_n$ and $\mathbf{b}_1 \wedge ... \wedge \mathbf{b}_n$, are equal. This is proved by geometric arguments similar to those in Sec. 1.4.1, and by using an $n$-dimensional generalization of Fig. 1.9. Note that an analog of Statement 1.4.3.1 holds also for vectors: an $n$-vector is nonzero iff the set of its constituent vectors is linearly independent.

Statement 1.4.4.1: We consider an $n$-dimensional space $\mathbb{R}^n$. (a) Every $n$-vector $\mathbf{a}_1 \wedge ... \wedge \mathbf{a}_n$ is proportional to every other $n$-vector. (b) The oriented volumes of two $n$-dimensional parallelepipeds spanned by the sets $\{\mathbf{a}_1, ..., \mathbf{a}_n\}$ and $\{\mathbf{b}_1, ..., \mathbf{b}_n\}$ are related by the same proportionality factor as the two corresponding $n$-vectors, $\mathbf{a}_1 \wedge ... \wedge \mathbf{a}_n$ and $\mathbf{b}_1 \wedge ... \wedge \mathbf{b}_n$. (Proof on page 168.) ■

The ordinary (scalar) volume of the parallelepiped spanned by $\{\mathbf{a}_1, ..., \mathbf{a}_n\}$ can be calculated as a suitably defined "absolute value" or the norm of the $n$-vector $\mathbf{a}_1 \wedge ... \wedge \mathbf{a}_n$,

$$\mathrm{Vol}(\mathbf{a}_1, ..., \mathbf{a}_n) = \sqrt{|\mathbf{a}_1 \wedge ... \wedge \mathbf{a}_n|^2}.$$

The norm $|\mathbf{a}_1 \wedge ... \wedge \mathbf{a}_n|$ is defined in the usual manner through the scalar product on the space of $n$-vectors, which is in turn defined through the scalar product in the vector space. For instance, the area of a parallelogram spanned by two vectors $\{\mathbf{a}, \mathbf{b}\}$ in an $N$-dimensional Euclidean space $\mathbb{R}^N$ is calculated as follows. The oriented area of the parallelogram is represented by the bivector $\mathbf{a} \wedge \mathbf{b}$. Using the standard Cartesian basis $\{\mathbf{e}_1, ..., \mathbf{e}_N\}$ in $\mathbb{R}^N$, one can decompose this bivector into basis bivectors $\{\mathbf{e}_i \wedge \mathbf{e}_j\}$, where $1 \leq i < j \leq N$, with some coefficients $A_{ij}$,

$$\mathbf{a} \wedge \mathbf{b} = \sum_{1 \leq i < j \leq N} A_{ij}\, \mathbf{e}_i \wedge \mathbf{e}_j.$$

The scalar product is then defined in the space of bivectors by postulating that the bivectors $\{\mathbf{e}_i \wedge \mathbf{e}_j\}$ constitute an orthonormal basis. The scalar product of two arbitrary bivectors is then computed as

$$\left\langle \sum_{i,j} A_{ij}\, \mathbf{e}_i \wedge \mathbf{e}_j,\; \sum_{i,j} B_{ij}\, \mathbf{e}_i \wedge \mathbf{e}_j \right\rangle = \sum_{i,j} A_{ij} B_{ij}$$

(all the sums are taken over $1 \leq i < j \leq N$). Hence, the ordinary (scalar) area of the parallelogram is

$$\mathrm{Area}(\mathbf{a}, \mathbf{b}) = \sqrt{\sum_{1 \leq i < j \leq N} |A_{ij}|^2}.$$

1.4.5 Determinants

A standard definition of the determinant of a square $n \times n$ matrix $A_{ij}$ is a formula involving the matrix elements:

$$\det(A_{ij}) \equiv \sum_{\sigma} (-1)^{|\sigma|} A_{1\sigma(1)} ... A_{n\sigma(n)},$$

where the sum is performed over all permutations $\sigma$. A permutation is a one-to-one map from the set $\{1, 2, ..., n\}$ to the same set. The quantity $|\sigma|$ is defined as 0 when $\sigma$ is an even permutation (equivalent to an even number of pair interchanges) and 1 if $\sigma$ is an odd permutation. This formula is explicit but complicated. The geometric meaning and the properties of the determinant are much more apparent if one adopts a different (but equivalent) definition.

We define the determinant $\det T$ of a linear transformation $T$ in an $n$-dimensional Euclidean space $\mathbb{R}^n$ as the volume of the image of the unit cube under the transformation $T$. In other words, we select an orthonormal basis $\{\mathbf{e}_1, ..., \mathbf{e}_n\}$ and transform it using $T$, obtaining the vectors $\{T\mathbf{e}_1, ..., T\mathbf{e}_n\}$. By definition, $\det T$ is equal to the oriented volume of the $n$-dimensional parallelepiped spanned by these $n$ vectors.

A little work using the concept of multivectors shows that the initial set of vectors $\{\mathbf{e}_1, ..., \mathbf{e}_n\}$ does not actually need to be an orthonormal basis; this is desirable, since we can then define the determinant of a transformation $T$ without the necessity to have a scalar product in the vector space.

Statement 1.4.5.1: Let $\{\mathbf{v}_1, ..., \mathbf{v}_n\}$ be $n$ arbitrary vectors; denote by $\mathrm{Vol}(\mathbf{v}_1, ..., \mathbf{v}_n)$ the oriented volume of the parallelepiped spanned by these vectors. Consider also the parallelepiped spanned by the transformed vectors $\{T\mathbf{v}_1, ..., T\mathbf{v}_n\}$. Then the transformed volume is proportional to the original volume with the factor $\det T$,

$$\mathrm{Vol}(T\mathbf{v}_1, ..., T\mathbf{v}_n) = \mathrm{Vol}(\mathbf{v}_1, ..., \mathbf{v}_n)\, \det T.$$

This factor is independent of the choice of the vectors $\{\mathbf{v}_j\}$. A similar relationship holds for the corresponding multivectors,

$$T\mathbf{v}_1 \wedge T\mathbf{v}_2 \wedge ... \wedge T\mathbf{v}_n = (\mathbf{v}_1 \wedge \mathbf{v}_2 \wedge ... \wedge \mathbf{v}_n)\, \det T.$$

(Proof on page 168.) ■

Due to Statement 1.4.5.1, we can reformulate the definition of the determinant as follows: The determinant of $T$ is equal to the factor that multiplies the volume of an arbitrary $n$-dimensional parallelepiped after the transformation $T$.

Using this definition, it is easy to derive the fundamental property of determinants: the determinant of a product of two transformations is equal to the product of the two determinants,

$$\det(AB) = (\det A)(\det B).$$
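As an illustration of the permutation-sum definition of the determinant, here is a minimal sketch in plain Python. The function names (`parity`, `det`, `matmul`, `pair`) and the sample matrices are hypothetical choices made for this example, and the sum over all n! permutations is practical only for small matrices. The sketch also evaluates the wedge-product example of Sec. 1.4.3 (α = y dx − 2 dz, β = x dy + y dz, u = 5∂x − y∂z, v = ∂x + 3z∂y) through the 2×2 determinant formula for (α∧β)(u, v) at one sample point.

```python
from itertools import permutations

def parity(perm):
    """Return 0 for an even permutation and 1 for an odd one (by counting inversions)."""
    n = len(perm)
    inversions = sum(1 for i in range(n) for j in range(i + 1, n) if perm[i] > perm[j])
    return inversions % 2

def det(a):
    """Determinant as a signed sum of products over all permutations of the columns."""
    n = len(a)
    total = 0
    for sigma in permutations(range(n)):
        term = (-1) ** parity(sigma)
        for row in range(n):
            term *= a[row][sigma[row]]
        total += term
    return total

def matmul(a, b):
    """Product of two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)] for i in range(n)]

# Hypothetical sample matrices.
A = [[2, 1, 0], [0, 3, 1], [1, 0, 1]]
B = [[1, 2, 0], [0, 1, 4], [5, 0, 1]]
print(det(A), det(B), det(matmul(A, B)))

def pair(form, vec):
    """Apply a 1-form to a vector, both given by components in the basis {x, y, z}."""
    return sum(f * v for f, v in zip(form, vec))

# Wedge-product example evaluated at the sample point (x, y, z) = (1, 2, 3).
x, y, z = 1, 2, 3
alpha, beta = (y, 0, -2), (0, x, y)
u, v = (5, 0, -y), (1, 3 * z, 0)
wedge_value = det([[pair(alpha, u), pair(alpha, v)],
                   [pair(beta, u), pair(beta, v)]])
print(wedge_value, 21 * x * y * z + y ** 3)
```

For these sample matrices the determinants are 7 and 41, and the determinant of the product is 287 = 7 · 41, in agreement with the property det(AB) = (det A)(det B).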


This property is a simple consequence of the fact that the volume of every parallelepiped transformed by $A$ is multiplied by $\det A$.

Finally, let us prove an important relationship involving volume in Euclidean space.

Statement 1.4.5.2: The volume of a parallelepiped spanned by vectors $\{\mathbf{v}_1, ..., \mathbf{v}_n\}$ in a Euclidean space $\mathbb{R}^n$ is

$$\mathrm{Vol}(\mathbf{v}_1, ..., \mathbf{v}_n) = \sqrt{\det g_{ij}}, \qquad g_{ij} \equiv \mathbf{v}_i \cdot \mathbf{v}_j,$$

where $\mathbf{v}_i \cdot \mathbf{v}_j$ is the scalar product of two vectors.

Proof of Statement 1.4.5.2: Let us consider an orthonormal basis $\{\mathbf{e}_j\}$ in this space. The volume of a parallelepiped spanned by $\{\mathbf{e}_j\}$ is, by definition, equal to 1. Since $\{\mathbf{e}_j\}$ is a basis, there exists a linear transformation $T$ that brings $\{\mathbf{e}_j\}$ into the given set of vectors $\{\mathbf{v}_j\}$. By Statement 1.4.5.1, the volume of the parallelepiped spanned by $\{\mathbf{v}_j\}$ is equal to $\det T$, so it remains to compute that determinant. Let us introduce the components of the transformation $T$ in the basis $\{\mathbf{e}_j\}$,

$$\mathbf{v}_j \equiv T\mathbf{e}_j = \sum_k T_{jk}\, \mathbf{e}_k.$$

Since $\mathbf{e}_k \cdot \mathbf{e}_l = \delta_{kl}$, we can express the matrix $g_{ij}$ as

$$g_{ij} = \mathbf{v}_i \cdot \mathbf{v}_j = \left( \sum_k T_{ik}\, \mathbf{e}_k \right) \cdot \left( \sum_l T_{jl}\, \mathbf{e}_l \right) = \sum_k T_{ik} T_{jk}.$$

It follows that the matrix $g_{ij}$ is equal to the product of the matrix $T_{ij}$ and its transpose. Therefore, the determinant of $g_{ij}$ is found as

$$\det g_{ij} = (\det T_{ij})(\det T_{ji}) = (\det T_{ij})^2,$$

where we used the fact that the determinant of the transposed matrix is the same as the determinant of the original matrix. Hence,

$$\mathrm{Vol}(\mathbf{v}_1, ..., \mathbf{v}_n) = \det T_{ij} = \sqrt{\det g_{ij}},$$

which is the desired formula. ■

1.4.6 Differential forms

Let us now consider differential forms, i.e. $n$-forms defined in the tangent space $T_p\mathcal{M}$ at every point $p$ of a manifold $\mathcal{M}$. An example of a differential form written in local coordinates $\{x, y, z\}$ is

$$\omega = x^2\, dy \wedge dz - 3y^3\, dx \wedge dy$$

(in this example, $\omega$ is a 2-form).

For scalar functions (0-forms) $f$, the 1-form $df$ is defined by

$$(df) \circ \mathbf{v} \equiv \mathbf{v} \circ f.$$

So the notation "$dx$" itself can be understood as the differential operator $d$ acting on the function $x$. It follows that $df(x) = f'(x)\,dx$, which is a familiar rule of calculus (the chain rule). For this reason, the notation $dx$ for 1-forms is convenient.

The operator $d$ can be generalized to a linear operator acting on $n$-forms and producing $(n + 1)$-forms; this operator is then called the exterior differential. So the exterior differential of an $n$-form $\omega$ is an $(n + 1)$-form written as $d\omega$.

The exterior differential $d$ can be defined either explicitly (see Eq. (1.20) below), or through the properties it satisfies. These properties are the following. For any forms $\omega_1$ and $\omega_2$ and scalar functions $f$ we have the Leibniz-like rule

$$d(f\omega_1) = df \wedge \omega_1 + f\, d\omega_1,$$
$$d(\omega_1 \wedge \omega_2) = (d\omega_1) \wedge \omega_2 + (-1)^{n_1}\, \omega_1 \wedge d\omega_2,$$

where $\omega_1$ is an $n_1$-form. Heuristically, one gets $n$ sign changes when one pulls $d$ "through" an $n$-form.

The second fundamental property of the exterior differential is $d(d\omega) = 0$ for any $n$-form $\omega$. This property can be understood heuristically as follows: $d \wedge d$ is symmetric with respect to the exchange of $d$ with $d$; each $d$ adds a vector argument into the form, but the result must be a totally antisymmetric form. So we must have $d \wedge d = 0$.

Due to the simplicity of these rules, calculations with differential forms are straightforward.

Example: Consider 1-forms $\omega_1 = x^2\,dy$ and $\omega_2 = xy\,dx + 2\,dy$. The exterior differential of $\omega_1$ is

$$d\omega_1 = d\left(x^2\, dy\right) = 2x\, dx \wedge dy + x^2\, d(dy) = 2x\, dx \wedge dy,$$

because $d(dy) = 0$. Let us compute $d\omega_2$:

$$d\omega_2 = d(xy) \wedge dx + xy\, d(dx) + 2\, d(dy) = y\, dx \wedge dx + x\, dy \wedge dx = -x\, dx \wedge dy,$$

because $dx \wedge dx = 0$ and $dy \wedge dx = -dx \wedge dy$ due to antisymmetry. Consider now the differential of $d\omega_2$,

$$d(d\omega_2) = d(-x\, dx \wedge dy) = -dx \wedge dx \wedge dy = 0.$$

Another example calculation is

$$d(xy\, dx \wedge dz) = d(xy) \wedge dx \wedge dz = -x\, dx \wedge dy \wedge dz.$$
■

A rather cumbersome but explicit formula can be derived for the differential $d\omega$ of an arbitrary $n$-form $\omega$. The $(n + 1)$-form $d\omega$ acts on vectors $\mathbf{v}_1, ..., \mathbf{v}_{n+1}$ as follows,

$$(d\omega)(\mathbf{v}_1, ..., \mathbf{v}_{n+1}) \equiv \sum_{s=1}^{n+1} (-1)^{s-1}\, \mathbf{v}_s \circ \omega(\mathbf{v}_1, ..., \hat{\mathbf{v}}_s, ..., \mathbf{v}_{n+1}) + \sum_{1 \leq r < s \leq n+1} (-1)^{r+s}\, \omega([\mathbf{v}_r, \mathbf{v}_s], \mathbf{v}_1, ..., \hat{\mathbf{v}}_r, ..., \hat{\mathbf{v}}_s, ..., \mathbf{v}_{n+1}), \tag{1.20}$$

where the hat over a vector, $\hat{\mathbf{v}}_s$, indicates the absence of the vector $\mathbf{v}_s$ among the listed arguments of $\omega$. (The sign of the commutator term is fixed by requiring agreement with Eq. (1.21) for $n = 1$.) As an example of using this formula, consider a 1-form $\omega$, then $d\omega$ is the 2-form defined by

$$(d\omega)(\mathbf{x}, \mathbf{y}) \equiv \mathbf{x} \circ (\omega(\mathbf{y})) - \mathbf{y} \circ (\omega(\mathbf{x})) - \omega([\mathbf{x}, \mathbf{y}]). \tag{1.21}$$

It is straightforward to check that this formula actually defines a bilinear map $d\omega$ that does not contain derivatives of $\mathbf{x}$ or $\mathbf{y}$.

Statement 1.4.6.1: Equation (1.21) defines a bilinear form $d\omega$ despite the apparent presence of derivatives of $\mathbf{x}$ and $\mathbf{y}$.

Proof of Statement 1.4.6.1: Obviously the formula (1.21) defines an antisymmetric function of $(\mathbf{x}, \mathbf{y})$. So it is sufficient to show that

$$(d\omega)(\lambda\mathbf{x}, \mathbf{y}) = \lambda\, (d\omega)(\mathbf{x}, \mathbf{y})$$


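when $\lambda$ is a scalar function of a point. Once this is proved, it will follow that $d\omega$ does not contain derivatives of $\mathbf{x}$ or $\mathbf{y}$.

The remaining part of the proof is thus a straightforward calculation:

$$(d\omega)(\lambda\mathbf{x}, \mathbf{y}) = \lambda\mathbf{x} \circ \omega(\mathbf{y}) - \mathbf{y} \circ (\lambda\,\omega(\mathbf{x})) - \omega([\lambda\mathbf{x}, \mathbf{y}])$$
$$= \lambda \left( \mathbf{x} \circ \omega(\mathbf{y}) - \mathbf{y} \circ \omega(\mathbf{x}) - \omega([\mathbf{x}, \mathbf{y}]) \right) - (\mathbf{y} \circ \lambda)\, \omega(\mathbf{x}) + \omega((\mathbf{y} \circ \lambda)\, \mathbf{x})$$
$$= \lambda\, (d\omega)(\mathbf{x}, \mathbf{y}).$$

Here we used Statement 1.3.1.1 to express $[\lambda\mathbf{x}, \mathbf{y}]$. ■

There is a useful relationship between the Lie derivative and the exterior differential. For convenience of notation, one introduces the insertion operation $\iota_{\mathbf{v}}$ (also called the interior product) that "inserts" the vector $\mathbf{v}$ into $n$-forms as the first argument,

$$(\iota_{\mathbf{v}}\omega)(\mathbf{v}_1, ..., \mathbf{v}_{n-1}) \equiv \omega(\mathbf{v}, \mathbf{v}_1, ..., \mathbf{v}_{n-1}).$$

The insertion operator produces $(n - 1)$-forms out of $n$-forms.

The Lie derivative can now be expressed through $\iota$ and $d$. To see how, let us first compute $\mathcal{L}_{\mathbf{v}}\omega$ for some 1-form $\omega$ and vector field $\mathbf{v}$. The 1-form $\mathcal{L}_{\mathbf{v}}\omega$ acts on an arbitrary vector $\mathbf{u}$ as

$$(\mathcal{L}_{\mathbf{v}}\omega) \circ \mathbf{u} = \mathcal{L}_{\mathbf{v}}(\omega \circ \mathbf{u}) - \omega \circ (\mathcal{L}_{\mathbf{v}}\mathbf{u}) = \mathbf{v} \circ (\omega(\mathbf{u})) - \omega([\mathbf{v}, \mathbf{u}]).$$

Let us compare this with the formula (1.21) for $d\omega$; by inspection, we notice that

$$(\mathcal{L}_{\mathbf{v}}\omega) \circ \mathbf{u} = (d\omega)(\mathbf{v}, \mathbf{u}) + \mathbf{u} \circ (\omega(\mathbf{v})).$$

Using the insertion operator $\iota_{\mathbf{v}}$, we can rewrite this as

$$(\mathcal{L}_{\mathbf{v}}\omega) \circ \mathbf{u} = (\iota_{\mathbf{v}}(d\omega)) \circ \mathbf{u} + \mathbf{u} \circ (\iota_{\mathbf{v}}\omega).$$

Finally, note that $\iota_{\mathbf{v}}\omega$ is a scalar function, so

$$\mathbf{u} \circ (\iota_{\mathbf{v}}\omega) \equiv (d(\iota_{\mathbf{v}}\omega)) \circ \mathbf{u}.$$

Thus

$$(\mathcal{L}_{\mathbf{v}}\omega) \circ \mathbf{u} = \left( \iota_{\mathbf{v}}(d\omega) + d(\iota_{\mathbf{v}}\omega) \right) \circ \mathbf{u}.$$

Omitting the arbitrary vector $\mathbf{u}$, we obtain the Cartan homotopy formula

$$\mathcal{L}_{\mathbf{v}}\omega = d(\iota_{\mathbf{v}}\omega) + \iota_{\mathbf{v}}(d\omega),$$

written also more concisely as

$$\mathcal{L}_{\mathbf{v}} = d\,\iota_{\mathbf{v}} + \iota_{\mathbf{v}}\, d. \tag{1.22}$$

This formula holds not only for 1-forms, but also for arbitrary $n$-forms $\omega$; note that the exterior differential $d$ brings $n$-forms into $(n + 1)$-forms, while $\mathcal{L}_{\mathbf{v}}$ does not change the number of arguments of $n$-forms. The Cartan homotopy formula for $n$-forms can be derived directly from the explicit definitions of $d$ and $\iota_{\mathbf{v}}$ and the Leibniz property of $\mathcal{L}_{\mathbf{v}}$, for instance by showing that $\iota_{\mathbf{v}}\, d = \mathcal{L}_{\mathbf{v}} - d\,\iota_{\mathbf{v}}$. See also e.g. the book [33], §4.5.

The following properties also hold for arbitrary $n$-form $\omega$, $n'$-form $\omega'$, and vector fields $\mathbf{v}$, $\mathbf{x}$:

$$\iota_{\mathbf{x}}(\omega \wedge \omega') = (\iota_{\mathbf{x}}\omega) \wedge \omega' + (-1)^n\, \omega \wedge \iota_{\mathbf{x}}\omega',$$
$$(\mathcal{L}_{\mathbf{v}}\, d)\,\omega = (d\, \mathcal{L}_{\mathbf{v}})\,\omega,$$
$$\mathcal{L}_{\mathbf{v}}(\omega \wedge \omega') = (\mathcal{L}_{\mathbf{v}}\omega) \wedge \omega' + \omega \wedge \mathcal{L}_{\mathbf{v}}\omega',$$
$$(\mathcal{L}_{\mathbf{v}}\, \iota_{\mathbf{x}})\,\omega = \iota_{[\mathbf{v}, \mathbf{x}]}\omega + \iota_{\mathbf{x}} \mathcal{L}_{\mathbf{v}}\omega.$$

These properties can be verified by straightforward computations which we omit.

1.4.7 *Canonical decomposition of 1-forms and 2-forms

The material is covered in [2], volume 2, chapter 22, Appendix; see also [16], chapter 5.

In this section I review some classical results obtained in the theory of differential forms. The main statements are the following.

With a suitable choice of local coordinates $\{x_1, ..., x_n\}$ in a suitable domain of an $n$-dimensional manifold:

• A given 1-form $\theta$ can be expressed in one of two ways: either as
$$\theta = x_1\, dx_2 + ... + x_{2k-1}\, dx_{2k},$$
or as
$$\theta = x_1\, dx_2 + ... + x_{2k-1}\, dx_{2k} + dx_{2k+1}.$$

• (The Darboux theorem) A given closed 2-form $\omega$ can be expressed as
$$\omega = dx_1 \wedge dx_2 + ... + dx_{2k-1} \wedge dx_{2k}.$$

• (The Poincaré lemma) Any closed $n$-form $\omega$ is locally exact: if $d\omega = 0$, there exists an $(n - 1)$-form $\alpha$ such that $\omega = d\alpha$ in some domain (but perhaps not in the entire manifold).

* Canonical decomposition of closed 2-forms (Darboux theorem)

We consider an $n$-dimensional smooth manifold. In a local coordinate system $\{x_1, ..., x_n\}$, an arbitrary 2-form $\omega$ can be expressed as

$$\omega = \frac{1}{2} \sum_{i,j=1}^{n} \omega_{ij}(x_1, ..., x_n)\, dx_i \wedge dx_j, \tag{1.23}$$

where $\omega_{ij} = -\omega_{ji}$ is an $x$-dependent matrix. However, the coordinate system may be chosen so that a particular 2-form is written in a simpler way. For example, a closed 2-form

$$A = dx_1 \wedge dx_2 + x_3\, dx_1 \wedge dx_3$$

can be rewritten as

$$A = dx_1 \wedge d\left( x_2 + \frac{1}{2} x_3^2 \right) = dx_1 \wedge dy_2, \qquad y_2 \equiv x_2 + \frac{1}{2} x_3^2.$$

In this example, a particular choice of the new coordinates $\{x_1, y_2, x_3, ..., x_n\}$ reduces the 2-form $A$ to a simpler expression involving just two of the coordinates.

It is useful to know how many different coordinates are needed to represent a given 2-form $\omega$ in the simplest possible way. Of course, it is also useful to be able to determine the new coordinates if a closed 2-form $\omega$ is specified in a given coordinate system.

The Darboux theorem (proved below) says that for any closed 2-form $\omega$ a suitable local coordinate system $\{x_1, ..., x_n\}$ can be found such that $\omega$ is locally (i.e. within a certain domain) expressed as

$$\omega = dx_1 \wedge dx_2 + dx_3 \wedge dx_4 + ... + dx_{2k-1} \wedge dx_{2k}, \tag{1.24}$$

where $k \leq n/2$ is some number.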
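Returning to the example of the closed 2-form A given above, the rewriting in terms of the new coordinate y2 can be verified in one line using only the chain rule for d and the antisymmetry of the exterior product. This worked check is an added illustration rather than part of the original derivation:

```latex
dy_2 = d\left(x_2 + \tfrac{1}{2}x_3^2\right) = dx_2 + x_3\,dx_3
\quad\Longrightarrow\quad
dx_1 \wedge dy_2 = dx_1 \wedge dx_2 + x_3\, dx_1 \wedge dx_3 = A,
\qquad
dA = dx_3 \wedge dx_1 \wedge dx_3 = 0 .
```

In the coordinates {x1, y2, x3, ..., xn} the form A therefore has the shape of Eq. (1.24) with k = 1.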


It is advantageous to find such canonical local coordinates because then calculations with $\omega$ are much easier. The required number $2k \leq n$ of different coordinates in the decomposition (1.24) is called the rank of $\omega$.

If a closed 2-form $\omega$ is given in a certain coordinate system, can one determine the rank of $\omega$ without knowing the canonical coordinates (but knowing that they exist)? In the example with the 2-form $A$ above, it is easy to guess the correct canonical coordinates. Consider another example,

$$B = dy_1 \wedge dy_2 + y_1\, dy_1 \wedge dy_3.$$

In this case, it is not obvious how to find suitable canonical coordinates $\{x_1, x_2, x_3, ...\}$ such that $B$ is expressed as in Eq. (1.24).

To determine the rank of a given closed 2-form $\omega$, the following trick can be used. One considers exterior powers of $\omega$, denoted

$$\omega^{\wedge k} \equiv \underbrace{\omega \wedge ... \wedge \omega}_{k \text{ times}}.$$

Note that for a 2-form $\omega$ the product $\omega \wedge \omega$ does not necessarily vanish. A sufficiently high power of $\omega$ will, of course, vanish. So there will be some number $k \leq n/2$ such that

$$\omega^{\wedge k} \neq 0, \qquad \omega^{\wedge (k+1)} = 0. \tag{1.25}$$

From the Darboux theorem we know that the decomposition (1.24) is possible in some coordinates $\{x_j\}$. In these coordinates, we can compute

$$\omega^{\wedge k} = k!\, dx_1 \wedge ... \wedge dx_{2k} \neq 0,$$

since by assumption all $\{x_j\}$ are independent coordinates in a local coordinate system. However, $\omega^{\wedge (k+1)} = 0$. Thus the rank of $\omega$ is $2k$, where $k$ is the largest integer such that $\omega^{\wedge k} \neq 0$ but $\omega^{\wedge (k+1)} = 0$. If $\omega = 0$, we say that its rank is zero.

For example, we can immediately see that the 2-form $B$ specified above has rank 2 because $B \wedge B = 0$. Thus, we should expect to find suitable coordinates $\{x_1, x_2, x_3, ...\}$ such that $B = dx_1 \wedge dx_2$. Determining these coordinates in practice may be far from easy; a possible method of finding $\{x_j\}$ is given in the proof of the Darboux theorem below.

Remark: The rank of a 2-form may be different at different points of the manifold. Presently, we assume that we will be dealing only with such 2-forms whose rank remains constant locally, i.e. throughout some neighborhood of some point. ■

The maximal rank of a 2-form on an $n$-dimensional manifold is $n$ (for even $n$) or $n - 1$ (for odd $n$). For a $2k$-dimensional manifold, a 2-form $\omega$ is nondegenerate if the matrix $\omega_{ij}$ is invertible (has nonzero determinant). This condition can be expressed in the following way: for any nonzero vector $\mathbf{x}$ there exists at least one vector $\mathbf{y}$ such that $\omega(\mathbf{x}, \mathbf{y}) \neq 0$; in other words, the 1-form $\iota_{\mathbf{x}}\omega$ is a nonvanishing 1-form. Equivalently: if $\iota_{\mathbf{x}}\omega = 0$ for some vector $\mathbf{x}$ then $\mathbf{x} = 0$.

Statement 1.4.7.1: For a $2k$-dimensional manifold, a 2-form $\omega$ of maximal rank $2k$ is nondegenerate.

Proof: Suppose a vector $\mathbf{x}$ is such that the 1-form $\iota_{\mathbf{x}}\omega$ vanishes, $\iota_{\mathbf{x}}\omega = 0$. We would like to prove that $\mathbf{x} = 0$. It follows from $\iota_{\mathbf{x}}\omega = 0$ that

$$\iota_{\mathbf{x}}\left( \omega^{\wedge k} \right) = k\, (\iota_{\mathbf{x}}\omega) \wedge \omega^{\wedge (k-1)} = 0.$$

The $2k$-form $\omega^{\wedge k}$ is nonzero since $\omega$ has rank $2k$, therefore $\omega^{\wedge k}$ is proportional to the volume form $dx_1 \wedge ... \wedge dx_{2k}$ with a nonzero coefficient. It follows that

$$\iota_{\mathbf{x}}(dx_1 \wedge ... \wedge dx_{2k}) = 0.$$

Writing $\mathbf{x} = \sum_{j=1}^{n} a_j\, \partial/\partial x_j$, it is straightforward to see that all the coefficients $a_j$ must vanish. Therefore, $\mathbf{x} = 0$. ■

Assuming that the Darboux theorem is true, it is easy to see that a nondegenerate 2-form $\omega$ always has maximal rank. If $\omega$ does not have maximal rank, the canonical decomposition of a 2-form $\omega$ is free of some of the coordinates, say of $x_n$. It follows that the vector $\mathbf{v} \equiv \partial/\partial x_n$ satisfies $\iota_{\mathbf{v}}\omega = 0$. Hence, $\omega$ is degenerate.

An algebraic version of the Darboux theorem is the following.

Statement 1.4.7.1: An arbitrary antisymmetric bilinear form (2-form) $B$ can be expressed as

$$B = \theta_1 \wedge \theta_2 + ... + \theta_{2k-1} \wedge \theta_{2k}, \tag{1.26}$$

where $\theta_j$ are some suitable 1-forms. The set $\{\theta_j\}$ of these 1-forms is linearly independent. The smallest required number ($2k$) of these 1-forms is the rank of $B$, which is equal to the rank of the matrix $B_{ij}$ in any basis.

Proof: The choice of a canonical basis for an antisymmetric bilinear form $B$ is a standard task of finite-dimensional linear algebra. We consider an $n$-dimensional vector space. If $B = 0$, the rank is zero and the statement is proved. If $B \neq 0$, there exists a vector $\mathbf{e}_1$ such that the 1-form $\iota_{\mathbf{e}_1} B$ is nonzero. Thus there exists another vector $\mathbf{e}_2$ such that $B(\mathbf{e}_1, \mathbf{e}_2) = 1$. Since $B$ is antisymmetric, the vectors $\mathbf{e}_1$ and $\mathbf{e}_2$ cannot be parallel to each other, thus they can be completed to a basis $\{\mathbf{e}_1, \mathbf{e}_2, \mathbf{e}_3, ..., \mathbf{e}_n\}$. We can choose the other basis vectors $\mathbf{e}_3, ..., \mathbf{e}_n$ such that they are "orthogonal" to $\mathbf{e}_1$ and $\mathbf{e}_2$ with respect to $B$, i.e.

$$B(\mathbf{e}_1, \mathbf{e}_j) = B(\mathbf{e}_2, \mathbf{e}_j) = 0, \qquad j = 3, ..., n.$$

This choice is always possible because we may add to each $\mathbf{e}_j$, $j = 3, ..., n$ a suitable linear combination of $\mathbf{e}_1$ and $\mathbf{e}_2$,

$$\tilde{\mathbf{e}}_j = \mathbf{e}_j - B(\mathbf{e}_1, \mathbf{e}_j)\, \mathbf{e}_2 + B(\mathbf{e}_2, \mathbf{e}_j)\, \mathbf{e}_1,$$

so that now $B(\mathbf{e}_1, \tilde{\mathbf{e}}_j) = B(\mathbf{e}_2, \tilde{\mathbf{e}}_j) = 0$ for every $j = 3, ..., n$. For brevity, let us denote the resulting basis again by $\{\mathbf{e}_j\}$. The result of this construction is that the two subspaces spanned by $\{\mathbf{e}_1, \mathbf{e}_2\}$ and by $\{\mathbf{e}_3, ..., \mathbf{e}_n\}$ are orthogonal to each other with respect to the bilinear form $B$.

After choosing the basis $\{\mathbf{e}_1, ..., \mathbf{e}_n\}$ in this way, we compute the dual basis 1-forms $\{\theta_1, ..., \theta_n\}$ for the basis $\{\mathbf{e}_1, ..., \mathbf{e}_n\}$ and represent the 2-form $B$ as

$$B = \theta_1 \wedge \theta_2 + B^{(1)},$$

where the new 2-form $B^{(1)}$ is equal to zero on $\mathbf{e}_1$, $\mathbf{e}_2$. Hence, it is sufficient to consider the 2-form $B^{(1)}$ within the $(n - 2)$-dimensional subspace spanned by $\{\mathbf{e}_3, ..., \mathbf{e}_n\}$. We can now apply the same construction to $B^{(1)}$ in a smaller number of dimensions; either $B^{(1)} = 0$, or we can find $\theta_3$ and $\theta_4$ such that $B^{(1)} = \theta_3 \wedge \theta_4 + B^{(2)}$, etc. The construction eventually stops because at every step the dimensionality of the space is reduced by two. At that point, we will obtain a decomposition


of the form (1.26), where the 1-forms $\theta_j$ form (by construction) a linearly independent set.

In the basis $\{\mathbf{e}_j\}$, the 2-form $B$ is represented by the $n \times n$ matrix

$$\begin{pmatrix} 0 & 1 & & & & \\ -1 & 0 & & & & \\ & & \ddots & & & \\ & & & 0 & 1 & \\ & & & -1 & 0 & \\ & & & & & 0 \end{pmatrix},$$

in which the lower right $(n - 2k) \times (n - 2k)$ block is zero.

It is straightforward to verify that the rank of the 2-form $B$ is indeed $2k$ according to the definition (1.25) of the rank. The $2k$-form $B^{\wedge k}$ is nonzero due to the linear independence of $\{\theta_j\}$,

$$B^{\wedge k} = k!\, \theta_1 \wedge ... \wedge \theta_{2k},$$

while $B^{\wedge (k+1)} = 0$. ■

The difference between the algebraic and the differential-geometric versions of the Darboux theorem is the following. According to the algebraic theorem just proved, one may reduce any 2-form $\omega$ to a canonical decomposition at any one point $p$. This decomposition will involve a certain choice of the 1-forms $\{\theta_j|_p\}$ at each $p$. However, this procedure will generally generate different sets $\{\theta_j|_p\}$ in tangent spaces at different points $p$, so there will be no local coordinate system $\{x_j\}$ in which the 1-forms $\theta_j$ are equal to the coordinate 1-forms $dx_j$ at every point $p$. Using the algebraic version of the Darboux theorem, one could obtain the decomposition (1.24) at any one point $p$ but not at neighboring points. The differential-geometric Darboux theorem says that one can actually choose a local coordinate system in which the 2-form $\omega$ has a canonical decomposition (1.24) at every point within some domain.

* Proof of the Darboux theorem

The Darboux theorem states that a closed 2-form $\omega$ of local rank $2k$ in an $n$-dimensional manifold can be written as

$$\omega = dx_1 \wedge dx_2 + ... + dx_{2k-1} \wedge dx_{2k}$$

in a suitable local coordinate system. This theorem is proved in three steps.

The first step is to show that there exists a local coordinate system $\{x_1, ..., x_n\}$ such that $\omega$ depends only on the first $2k$ coordinates $\{x_1, ..., x_{2k}\}$, i.e.

$$\omega = \frac{1}{2} \sum_{i,j=1}^{2k} \omega_{ij}(x_1, ..., x_{2k})\, dx_i \wedge dx_j.$$

Then it is sufficient to consider the 2-form $\omega$ on a $2k$-dimensional submanifold where the remaining coordinates $\{x_{2k+1}, ..., x_n\}$ have fixed values.

The first step can be proved as follows. If a 2-form $\omega \neq 0$ has rank $2k < n$ then, by Statement 1.4.7.1, at every point there exists a decomposition of the form (1.26). It follows that in the tangent space at every point $p$, there will exist a nontrivial subspace of vectors $\mathbf{v}$ such that $\iota_{\mathbf{v}}\omega = 0$. Since the 2-form $\omega$ is smooth, this subspace will also vary smoothly between points. So at least one smooth vector field $\mathbf{v} \neq 0$ can be chosen such that $\iota_{\mathbf{v}}\omega = 0$ at every point within some domain. A local coordinate system $\{x_1, ..., x_n\}$ can be chosen such that $\mathbf{v} = \partial/\partial x_n$. Then the property $\iota_{\mathbf{v}}\omega = 0$ means that the expression for $\omega$ may contain $dx_1, ..., dx_{n-1}$ but no $dx_n$. Further, we have $d\omega = 0$ and thus (by the Cartan homotopy formula)

$$\mathcal{L}_{\mathbf{v}}\omega = (d\,\iota_{\mathbf{v}} + \iota_{\mathbf{v}}\, d)\,\omega = 0.$$

The property $\mathcal{L}_{\mathbf{v}}\omega = 0$ means that the coefficients $\omega_{ij}(x_1, ..., x_n)$ in the decomposition (1.23) do not actually depend on the coordinate $x_n$ (see Statement 1.3.2). Thus, it is sufficient to study the restriction of the 2-form $\omega$ to an $(n - 1)$-dimensional submanifold of constant $x_n$. The same construction can be applied to the reduced 2-form $\omega$. If the rank of $\omega$ is smaller than $n - 1$, another coordinate $x_{n-1}$ can be found such that $\omega$ does not involve $dx_{n-1}$ and the coefficients $\omega_{ij}$ do not depend on $x_{n-1}$. Thus we will reduce $\omega$ to an $(n - 2)$-dimensional manifold, etc. Each time, the dimension of the manifold is reduced by one, until we obtain a 2-form $\omega$ of rank $2k$ on a $2k$-dimensional manifold, i.e. a 2-form of maximal rank. A 2-form of maximal rank is nondegenerate (Statement 1.4.7.1). Thus it is sufficient to prove the Darboux theorem only for nondegenerate 2-forms.

The second step is to prove that there exists a local coordinate system $\{x_1, ..., x_n\}$ such that a given closed, nondegenerate 2-form $\omega$ has constant coefficients $\omega_{ij}$ in the coordinates $\{x_j\}$,

$$\omega = \frac{1}{2} \sum_{i,j=1}^{2n} \omega^{(0)}_{ij}\, dx_i \wedge dx_j, \qquad \omega^{(0)}_{ij} = -\omega^{(0)}_{ji} = \text{const}.$$

This involves a differential-geometric argument detailed below. The third step is to reduce the nondegenerate, antisymmetric $2n \times 2n$ matrix $\omega^{(0)}_{ij}$ to the canonical form

$$\begin{pmatrix} 0 & 1 & & & \\ -1 & 0 & & & \\ & & \ddots & & \\ & & & 0 & 1 \\ & & & -1 & 0 \end{pmatrix}.$$

The reduction is performed through a change of basis. According to Statement 1.4.7.1, a basis $\{\mathbf{e}_1, ..., \mathbf{e}_{2n}\}$ can be chosen such that the matrix $\omega^{(0)}_{ij}$ assumes the canonical form shown above (there is no zero block since $\omega^{(0)}_{ij}$ has maximal rank).

It remains to perform the second step. We fix a point $p$ at which

$$\omega|_p = \frac{1}{2} \sum_{i,j=1}^{2n} \omega^{(0)}_{ij}\, dx_i \wedge dx_j,$$

and define a new 2-form $\omega_0$ at all other points by the formula

$$\omega_0 = \frac{1}{2} \sum_{i,j=1}^{2n} \omega^{(0)}_{ij}\, dx_i \wedge dx_j.$$

By construction, $\omega = \omega_0$ at $p$, while $\omega_0$ has constant coefficients in the given coordinate system. Now we would like to find a change of coordinates in a neighborhood of $p$ such that $\omega$ has constant coefficients in the new coordinates. This is equivalent to finding a diffeomorphism $f : \mathcal{M} \to \mathcal{M}$ such that $f(p) = p$ and $f(\omega) = \omega_0$. Instead of finding $f$ directly, we will find a one-parametric flow of diffeomorphisms $f_t$, $t \in [0, 1]$, such that $f_0 = \mathrm{id}$ (the identity map) and $f_1 = f$ is the map we need.

1.4 Calculus of differential forms

We would like to find a map $f_t$ that transforms $\omega$ into a 2-form $\omega_t$ which is "between" $\omega$ and $\omega_0$. The first trick is to write such an interpolation explicitly, i.e. we define

$$\omega_t \equiv \omega + t\,(\omega_0 - \omega).$$

The 2-form $\omega_0 - \omega$ is closed and thus, by the Poincaré lemma, there exists a 1-form $\theta$ such that

$$\omega_0 - \omega = d\theta$$

(perhaps, the above will hold only in a smaller star-shaped neighborhood of the initial point $p$). The second trick is to describe the flow $f_t$ through a tangent vector field $v_t$. The vector field $v_t$ will be obtained by the "duality" map from the 1-form $\theta$, using the 2-form $\omega_t$ as the "metric." Converting a 1-form into a vector field is possible only if $\omega_t$ is nondegenerate at every point and for every $t \in [0,1]$. We postpone the proof of the nondegeneracy of $\omega_t$, and presently we assume that we have a vector field $v_t$ satisfying

$$\iota_{v_t}\omega_t = \theta, \quad t \in [0,1].$$

We now define the flow $f_t$ as the flow generated by the (time-dependent!) vector field $v_t$. By construction, the flow $f_t$ transforms $\omega$ into a $t$-dependent 2-form $\omega(t)$ which satisfies the differential equation

$$\frac{d}{dt}\omega(t) = L_{v_t}\omega(t).$$

It is then straightforward to verify that the interpolation $\omega_t$ satisfies this equation (we use $d\omega_t = 0$, which holds because both $\omega$ and $\omega_0$ are closed):

$$\frac{d}{dt}\omega_t = \omega_0 - \omega = d\theta,$$
$$L_{v_t}\omega_t = (d\,\iota_{v_t} + \iota_{v_t} d)\,\omega_t = d\,\iota_{v_t}\omega_t = d\theta.$$

Thus $\omega_t = \omega(t)$ and is indeed the result of the transformation of the initial 2-form $\omega$ by the flow $f_t$. The diffeomorphism $f_1$ corresponding to $t = 1$ is the one that transforms $\omega$ into $\omega_0$.

It remains to demonstrate that the 2-form $\omega_t$ has maximal rank for every $t \in [0,1]$ and at every point within some domain. We note that $\omega_t$ interpolates between $\omega$ and $\omega_0$, while the 2-forms $\omega$ and $\omega_0$ coincide at the chosen point $p$. Since $\omega_t = \omega$ at $p$, it is clear that $\omega_t$ at $p$ remains nondegenerate for all $t$; the present task is to show that $\omega_t$ has maximal rank everywhere in some open domain around $p$. The 2-form $\omega_t$ has maximal rank $2n$ iff $\omega_t^{\wedge n} \neq 0$. The $2n$-form $\omega_t^{\wedge n}$ is proportional to the volume form $dx_1 \wedge ... \wedge dx_{2n}$, namely

$$\omega_t^{\wedge n} = f_t(x)\, dx_1 \wedge ... \wedge dx_{2n},$$

where the coefficient, temporarily denoted by $f_t$, is a function of the coordinates $\{x_j\}$. The rank of $\omega_t$ is maximal if $f_t(x) \neq 0$. We know that $f_t(p) \neq 0$ for every $t \in [0,1]$. Since the function $f_t$ is smooth, the domain where $f_t \neq 0$ is a $t$-dependent open set containing the point $p$. The intersection of all such domains for every $t \in [0,1]$ again contains a nonempty open set around $p$ (possibly smaller than the set where $\omega$ has maximal rank); here one uses that $f_t(x)$ is continuous in $t$ and that the interval $[0,1]$ is compact. Thus, we have found an open domain containing the initial point $p$ where $\omega_t$ is nondegenerate at every $t$.

* Canonical decomposition of 1-forms

The concepts of rank and canonical coordinates can be defined also for 1-forms. Heuristically, the rank of a 1-form is the smallest number of independent coordinates required to express this 1-form. This time, we do not limit ourselves to considering only closed 1-forms; but let us briefly examine the case of a closed 1-form. By the Poincaré lemma, a closed 1-form $\theta$ is locally represented as $\theta = df$ using some function $f$. The function $f$ will be nonconstant if $\theta \neq 0$. Thus, $f$ can be used as one of the coordinates in a local coordinate system; so $\theta$ can be expressed using just one coordinate. We conclude that a closed, nonzero 1-form has rank 1.

Below we will prove that any 1-form $\theta$ can be expressed (using suitable local coordinates) in one of two canonical ways: either

$$\theta = x_1 dx_2 + ... + x_{2k-1} dx_{2k} \tag{1.27}$$

(then we say that $\theta$ has rank $2k$), or

$$\theta = x_1 dx_2 + ... + x_{2k-1} dx_{2k} + dx_{2k+1} \tag{1.28}$$

(then $\theta$ has rank $2k+1$). We may call the coordinates used for such decompositions canonical coordinates of a given 1-form.

Knowing that this statement is true, how could we determine the rank of a given 1-form $\theta$, for example,

$$\theta = dx + xy\,dz,$$

without knowing the canonical coordinates? We use the following trick. Consider the 2-form $d\theta$ and compute its exterior powers, $(d\theta)^{\wedge k}$, $k = 1, 2, ...$ Write the following sequence of differential forms,

$$\theta,\ d\theta,\ \theta \wedge d\theta,\ d\theta \wedge d\theta,\ \theta \wedge d\theta \wedge d\theta,\ ... \tag{1.29}$$

The $k$-th element of this sequence ($k = 1, 2, ...$) is a $k$-form. Eventually some element of this sequence will be zero, and then all the subsequent elements will also be zero. By substituting the decompositions (1.27) and (1.28) into the sequence (1.29), one finds that the rank of $\theta$ is equal to the number of initial nonzero elements in the sequence. The rank of $\theta$ is zero if $\theta = 0$; the rank of $\theta$ is one if $\theta \neq 0$ but $d\theta = 0$; the rank of $\theta$ is two if $d\theta \neq 0$ but $\theta \wedge d\theta = 0$; etc. In this way it is straightforward to compute the rank of a 1-form given in some local coordinates. For example, the rank of $\theta = dx + xy\,dz$ is 3 (at points where $x \neq 0$) because

$$d\theta = y\,dx \wedge dz + x\,dy \wedge dz, \quad \theta \wedge d\theta = x\,dx \wedge dy \wedge dz \neq 0, \quad d\theta \wedge d\theta = 0.$$

Remark: The Carathéodory theorem is a form of the Frobenius theorem. It states that the set of null curves of a 1-form $\theta$ is surface-forming iff the form has rank 1 or 2, so that $\theta \wedge d\theta = 0$. (Null curves of $\theta$ are curves whose tangent vectors $\dot\gamma$ satisfy $\theta \circ \dot\gamma = 0$.) This is proved by an explicit construction in local coordinates, showing that a 1-form of rank 3 or higher has non-surface-forming null curves. (A non-surface-forming set of curves is a set such that curves from the set can reach any point in a neighborhood of an initial point.)

* Proof of the canonical decomposition theorem for 1-forms

The canonical decomposition theorem for 1-forms states that for any 1-form $\theta$, one can choose a local coordinate system $\{x_j\}$ such that $\theta$ is locally represented in one of the two ways, (1.27) or (1.28), as long as $\theta$ has a locally constant rank.

We prove this theorem by reducing it to the Darboux theorem. First, we determine the rank of a given 1-form $\theta$ by using the sequence (1.29). If $\theta$ has an odd rank $2k+1$, the 2-form $d\theta$ has rank $2k$ (according to the definition of rank for closed 2-forms) because $(d\theta)^{\wedge k} \neq 0$ while $(d\theta)^{\wedge (k+1)} = 0$. According

1 Calculus in curved space

to the Darboux theorem, we can then choose local coordinates Applying the Poincaré lemma to the closed 1-form
{xj } such that
x1 dx2 + ... + x2k1 dx2k ,
d = dx1 dx2 + ... + dx2k1 dx2k .
we show that there exists a function h such that
Applying the Poincaré lemma to the closed 1-form
= dh + x1 dx2 + ... + x2k1 dx2k .
x1 dx2 + ... + x2k1 dx2k , Since d has rank 2k, we have
we show that there exists a function f such that 0 = (d)
(k+1)
= (k + 1)! dh dx1 ... dx2k ,
= df + x1 dx2 + ... + x2k1 dx2k . so the function h can depend only on the subset of coordinates
consisting of the first 2k coordinates {x1 , ..., x2k }. Hence the
Since the 1-form has rank 2k + 1, we obtain 1-form depends also only on these coordinates. Let us write
0 6= (d)k = df k! dx1 ... dx2k . 2k
X
= aj dxj ,
Hence, the function f can be used as a local coordinate that is j=1
independent of {x1 , ..., x2k }. Denote this function f by x2k+1 ,
we obtain the decomposition (1.28). where aj are coefficients that depend only on {x1 , ..., x2k }.
In the second case, the 1-form has an even rank 2k. Then Since both parts of Eq. (1.32) depend only on these coordi-
the 2-form d again has rank 2k, and so we could again arrive nates, the function f is a function only of {x1 , ..., x2k }.
at Eq. (1.28), but we would like to eliminate the last term in We can now return to the task of determining a suitable
(k1)
that equation. So we need to work a little harder. function f . First we express the (2k 1)-form (d)
The idea is to find a function f such that the 1-form f has needed for Eq. (1.32) as
rank 2k 1. If such f is found, then by the previously proved
(k1)
case it will follow that (d)
= (k 1)! (a1 dx1 + a2 dx2 ) dx3 ... dx2k + ...
f = x1 dx2 + ... + x2k3 dx2k2 + dx2k1 ,
+ (k 1)! (a2k1 dx2k1 + a2k dx2k ) dx1 ... dx2k2 .
so we will obtain The unknown 1-form d ln f can then be written as
1 2k
= (x1 dx2 + ... + x2k3 dx2k2 + dx2k1 ) X
f d ln f lj dxj ,
= y1 dy2 + ... + y2k3 dy2k2 + y2k1 dy2k , (1.30) j=1

where we simply relabeled the coordinates as where lj are unknown coefficients depending on {x1 , ..., x2k }.
Using this explicit representation, we compute
1 1
y1 x1 , y2 x2 , ..., y2k1 , y2k x2k1 . (1.31) (k1)
f f (d ln f ) (d) = (k 1)! (a1 l2 a2 l1 + ...
+a2k1 l2k a2k l2k1 )dx1 ...dx2k
By assumption, the 1-form has rank 2k and thus
k k
and hence is proportional to the 2k-dimensional volume form
(d) 6= 0, (d) = 0.
k
(d) = k! dx1 ...dx2k .
k
Since (d) 6= 0, it will follow from Eq. (1.30) that all
{y1 , ..., y2k } are independent local coordinates, and thus a de- It is convenient to introduce an auxiliary vector field
composition of the form (1.27) will be found.

It remains to determine f such that the 1-form f has rank v a2 + a1 ... a2k + a2k1
2k 1. It is sufficient to find some function f such that x 1 x 2 x2k1 x 2k

k k and write
0 = (d(f )) = (df + f d)
(k1)
= kf k1 df (d)
(k1) k
+ f k (d) . (d ln f ) (d) = (k 1)! (v d ln f ) dx1 ...dx2k .
(1.33)
Thus the function f must satisfy the differential equation Since both sides of Eq. (1.32) are proportional to the volume
form, the equation is simplified to
1
(d ln f ) (d)(k1) = (d)k . (1.32) v d ln f = v ln f = 1.
k
To solve this equation for f (or rather, to show that a so- Note that the field v can be also defined without coordinates
lution exists), we need to use some additional information by the requirement
about . We know that the 2-form d has rank 2k. There- h i
(k1)
fore, by the Darboux theorem we may choose local coordi- v (d) = 0.
nates {x1 , ..., xn } such that
In other words, v is a vector field that annihilates the nonzero
d = dx1 dx2 + ... + dx2k1 dx2k . (2k 1)-form (d)(k1) . (In general, a k-form vanishes


on an $(n-k)$-dimensional hypersurface, so a $(2k-1)$-form vanishes on a 1-dimensional subspace. The vector $v$ is a basis vector in that subspace.)

We found that the only restriction on the function $f$ is that its logarithmic derivative in the direction of $v$ must equal 1. Now it is clear that a solution $f$ exists (at least locally). A particular solution $f$ can be determined by integration along the flow lines of $v$ from arbitrary initial conditions. This completes the proof.

1.4.8 The Poincaré lemma

The Poincaré lemma states that a closed $n$-form $\omega$ is locally exact: there exists an $(n-1)$-form $\theta$ such that $\omega = d\theta$ in a star-shaped neighborhood of some point.

A domain $D$ is a star-shaped neighborhood of a point $p$ if there exists a diffeomorphism flow $f_t$ that contracts the domain $D$ into the point $p$. In other words, $f_t$ is a continuous set of diffeomorphisms, defined for $t \in [0,1]$ and such that $f_0 = \mathrm{id}$ and $f_1$ maps the entire neighborhood $D$ into the single point $p$.

Suppose that $D$ is a star-shaped neighborhood of $p$ and a suitable flow $f_t$ is given. Let $v_t$ be the tangent vector field of the flow $f_t$. Suppose that $\omega$ is a closed $n$-form defined in $D$. The flow $f_t$ transforms $\omega$ into a $t$-dependent $n$-form $\omega_t$ that satisfies $\omega_0 = \omega$, $\omega_1 = 0$, and

$$\frac{d}{dt}\omega_t = L_{v_t}\omega_t.$$

Since $d\omega = 0$, we also have $d\omega_t = 0$ because $d$ commutes with diffeomorphisms. Using the Cartan homotopy formula, we find

$$\frac{d}{dt}\omega_t = (d\,\iota_{v_t} + \iota_{v_t} d)\,\omega_t = d\,\iota_{v_t}\omega_t.$$

Now we integrate this relationship over $t$,

$$-\omega = \omega_1 - \omega_0 = \int_0^1 \left(\frac{d}{dt}\omega_t\right) dt = d \int_0^1 (\iota_{v_t}\omega_t)\, dt.$$

Defining

$$\theta \equiv -\int_0^1 (\iota_{v_t}\omega_t)\, dt,$$

we find the required relationship, $\omega = d\theta$.

1.4.9 Integration of forms

A differential $n$-form can be integrated over a manifold of dimension $n$: namely, 1-forms can be integrated over curves, 2-forms over surfaces, etc. The integral of a given $n$-form $\omega$ over a given manifold $A$ of dimension $n$ is denoted by

$$\int_A \omega.$$

Note that the differential "d" is not written after the integral sign because the $n$-form $\omega$ already contains the correct number of d's.

The integral $\int_A \omega$ is defined by the following procedure. One starts by splitting the manifold $A$ into a large number $N$ of small $n$-dimensional parallelepipeds. If all the parallelepipeds are sufficiently small, one can approximate the sides of parallelepipeds by tangent vectors (see Sec. 1.2.5). Thus, to each parallelepiped numbered $j$ (where $j = 1, ..., N$) there corresponds a set of $n$ tangent vectors $\{v_1^{(j)}, ..., v_n^{(j)}\}$. The parallelepiped numbered $j$ contributes a small amount to the integral; this amount is equal to some number (the value of the function being integrated) times the $n$-dimensional volume of the parallelepiped. It is natural to regard the $n$-volume of a parallelepiped spanned by $\{v_1, ..., v_n\}$ as an $n$-form evaluated on the vectors $\{v_i\}$ (see Sec. 1.4.1). So the contribution of the $j$-th parallelepiped to the integral can be naturally described as an application of some $n$-form to the $n$ vectors $\{v_1^{(j)}, ..., v_n^{(j)}\}$. In the integral we are evaluating, this $n$-form is the given $n$-form $\omega$. Hence, to define the integral $\int_A \omega$ we evaluate the contribution $\omega(v_1^{(j)}, ..., v_n^{(j)})$ of each parallelepiped $j$ and then compute the sum of the contributions over all the parallelepipeds. Thus, the integral $\int_A \omega$ is defined as the limit $N \to \infty$ of that sum,

$$\int_A \omega \equiv \lim_{N \to \infty} \sum_{j=1}^{N} \omega(v_1^{(j)}, ..., v_n^{(j)}).$$

This limit is quite similar to the limit involved in the ordinary definition of the Riemann integral. So integrals of $n$-forms are equivalent to $n$-fold integrals in the usual sense. However, an important caveat is that integrals of $n$-forms are always performed over oriented manifolds. Since $n$-forms are antisymmetric, an integral will change sign when the orientation of the manifold is reversed.

The fundamental relationship between integration and the exterior differential is

$$\int_A d\omega = \int_{\partial A} \omega,$$

where $A$ is an $n$-dimensional manifold (or part of a manifold), $\partial A$ is its $(n-1)$-dimensional boundary, and $\omega$ is an arbitrary $(n-1)$-form. This theorem is a generalization of the Stokes and the Gauss laws, as well as the fundamental theorem of calculus $\int_a^b f'(x)\,dx = f(b) - f(a)$. Since this is a standard result, we refer the reader to textbooks (such as [1]) for more detailed explanations of this theorem.

1.5 Metric

1.5.1 Motivation: metric on surfaces

I will introduce the concept of a metric by starting from the picture of a manifold as a surface embedded in a Euclidean space $R^n$. The natural Pythagorean notion of distance in $R^n$, namely the Euclidean metric, defines the distance between two points $a, b \in R^n$ as

$$\mathrm{Distance}\,(a, b) = \sqrt{(b_1 - a_1)^2 + ... + (b_n - a_n)^2}, \tag{1.34}$$

where $a_j$ and $b_j$ ($j = 1, ..., n$) are standard coordinates of the points $a$, $b$.

Let us consider a surface $M$ embedded into $R^n$. We would like to compute the distance between two points $a, b \in M$, as measured along some path within the surface. However, if the surface has a curved shape then one does not have a simple and general formula analogous to Eq. (1.34) for distances measured within the surface. Nevertheless, if a pair of points $a$, $b$ on the surface $M$ are very close to each other, we may consider a short, almost straight curve segment within $M$ connecting these points. The length of this curve segment can be accurately estimated using the Euclidean formula (1.34).


Then the length of any curve within the surface can be determined by splitting the curve into sufficiently small segments. More precisely, one can integrate the infinitesimal lengths of infinitesimal segments along the curve.

Let us explore this idea in more detail. To be specific, let us consider a two-dimensional manifold $M$ embedded in $R^3$ as a surface $z = F(x, y)$ in standard coordinates $\{x, y, z\}$. We are interested in computing the length of a curve $\gamma(\tau) \subset M$. The curve is specified by three functions $\{x(\tau), y(\tau), z(\tau)\}$. Consider a very short curve segment between $\tau = \tau_0$ and $\tau = \tau_0 + \delta\tau$, where $\delta\tau$ is very small. This segment is an almost straight line between the close-by points $\{x_1, y_1, z_1\} \equiv \{x(\tau_0), y(\tau_0), z(\tau_0)\}$ and $\{x_2, y_2, z_2\} \equiv \{x(\tau_0 + \delta\tau), y(\tau_0 + \delta\tau), z(\tau_0 + \delta\tau)\}$ in $R^3$. So we may compute the length $\delta L$ of the segment approximately as the Euclidean distance between these points,

$$\delta L \approx \sqrt{(x_2 - x_1)^2 + (y_2 - y_1)^2 + (z_2 - z_1)^2}.$$

Further, we may approximate

$$x_2 - x_1 = x(\tau_0 + \delta\tau) - x(\tau_0) \approx \frac{dx}{d\tau}(\tau_0)\,\delta\tau, \text{ etc.,}$$

and hence

$$\delta L \approx \delta\tau \sqrt{\left(\frac{dx}{d\tau}(\tau_0)\right)^2 + \left(\frac{dy}{d\tau}(\tau_0)\right)^2 + \left(\frac{dz}{d\tau}(\tau_0)\right)^2}.$$

In the limit $\delta\tau \to 0$, this expression becomes precise since the error is of order $\delta\tau^2$. So the length of the curve $\gamma(\tau)$ between $\tau = \tau_0$ and $\tau = \tau_1$ can be found as an integral over the infinitesimal curve segments,

$$L[\gamma] = \int_{\tau_0}^{\tau_1} d\tau \sqrt{\left(\frac{dx}{d\tau}(\tau)\right)^2 + \left(\frac{dy}{d\tau}(\tau)\right)^2 + \left(\frac{dz}{d\tau}(\tau)\right)^2}.$$

So far we have used all three coordinates $x, y, z$ to describe the curve. However, the pair $\{x, y\}$ can be used as local coordinates on $M$, and so we would like to be able to compute lengths of curves without using the coordinate $z$. In other words, we would like to use only an intrinsic description of the manifold $M$. The metric represents the information one needs to be able to compute lengths in the intrinsic description. Let us now see what that information must be in our example.

Given the equation $z = F(x, y)$, we may express

$$\frac{dz(\tau)}{d\tau} = \frac{\partial F(x, y)}{\partial x}\frac{dx(\tau)}{d\tau} + \frac{\partial F(x, y)}{\partial y}\frac{dy(\tau)}{d\tau},$$

where it is implied that the derivatives of $F$ are evaluated on the curve $\gamma(\tau)$. Hence, the length $\delta L$ of an infinitesimal curve segment can be written as

$$\delta L = \delta\tau \sqrt{\left(\frac{dx}{d\tau}\right)^2 + \left(\frac{dy}{d\tau}\right)^2 + \left(\frac{\partial F}{\partial x}\frac{dx}{d\tau} + \frac{\partial F}{\partial y}\frac{dy}{d\tau}\right)^2}
= \delta\tau \sqrt{A\left(\frac{dx}{d\tau}\right)^2 + 2B\,\frac{dx}{d\tau}\frac{dy}{d\tau} + C\left(\frac{dy}{d\tau}\right)^2}, \tag{1.35}$$

where we introduced auxiliary functions $A, B, C$ expressed directly through $F$ as

$$A(x, y) \equiv 1 + \left(\frac{\partial F}{\partial x}\right)^2, \quad B \equiv \frac{\partial F}{\partial x}\frac{\partial F}{\partial y}, \quad C \equiv 1 + \left(\frac{\partial F}{\partial y}\right)^2.$$

Now we note that Eq. (1.35) looks like a quadratic form applied to the two-dimensional vector

$$\dot\gamma \equiv \left(\frac{dx}{d\tau}, \frac{dy}{d\tau}\right) \in T_{\gamma(\tau)}M.$$

Recalling the correspondence between tangent vectors from $T_pM$ and short curve segments (see Sec. 1.2.5), we can verify that the tangent vector $v$ defined by

$$v \equiv \dot\gamma(\tau_0)\,\delta\tau = \left(\frac{dx}{d\tau}\partial_x + \frac{dy}{d\tau}\partial_y\right)\delta\tau \in T_{\gamma(\tau_0)}M$$

is the vector representing the short curve segment between $\gamma(\tau_0)$ and $\gamma(\tau_0 + \delta\tau)$. Then $\delta L$ can be expressed as

$$\delta L = \delta\tau \sqrt{g(\dot\gamma, \dot\gamma)} = \sqrt{g(v, v)},$$

where $g$ is the following bilinear form,

$$g \equiv A\,dx \otimes dx + B\,(dx \otimes dy + dy \otimes dx) + C\,dy \otimes dy.$$

Thus $\delta L$ (the length of a short curve segment) can be expressed through the bilinear form $g$ alone, without using any other information (such as the function $F(x, y)$ or the embedding of $M$ in the Euclidean space $R^3$). The length of the curve $\gamma$ between $\tau_0$ and $\tau_1$ can be expressed as

$$L[\gamma] = \int_{\tau_0}^{\tau_1} d\tau \sqrt{g(\dot\gamma, \dot\gamma)}.$$

We conclude that the only information necessary to compute lengths along curves within a surface $M$ is a symmetric bilinear form $g$ defined in every tangent space $T_pM$. This bilinear form $g$ is called the metric on the manifold $M$. In our example, once we computed the metric $g$ (which, in a local coordinate system, is equivalent to determining the three functions $A, B, C$), we may stop using the embedding of $M$ in $R^3$ because it is sufficient to use the local coordinate system $\{x, y\}$. This simplifies the calculations since one needs to work with fewer coordinates, and also allows one to concentrate on the intrinsic properties of the manifold $M$. Below we will not use the embedding picture of the metric; instead, we will work directly with the metric $g$, assuming that somehow $g$ is given. It will be unimportant whether or not the metric $g$ comes from an embedding of $M$ in a larger Euclidean space. An embedding of $M$ into $R^n$ can be regarded as merely an auxiliary construction that helps visualize the concept of metric.

1.5.2 Definition

A metric on a manifold $M$ is a nondegenerate, symmetric bilinear form $g(u, v)$ defined in the tangent space $T_pM$ at each point $p \in M$. A symmetric bilinear form $g$ is called nondegenerate if for any vector $u \neq 0$ there exists a vector $v$ such that $g(u, v) \neq 0$; in other words, no nonzero vector can be orthogonal to every vector.

The nondegeneracy condition prohibits "uninteresting" metrics, such as $g = 0$. Also, it is important to note that the nondegeneracy condition permits a vector $x$ to be specified uniquely through its scalar products $g(v, x)$ with other vectors $v$. (If there were two vectors $x \neq x'$ such that $g(v, x) = g(v, x')$ for all $v$, then the vector $x - x' \neq 0$ would be orthogonal to every vector, but this is excluded by the assumption of nondegeneracy of $g$.)
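The construction of Sec. 1.5.1 can be checked numerically. The following minimal Python sketch compares the intrinsic length, computed only from the functions A, B, C of Eq. (1.35), with the length of the same curve measured in the embedding space using the Euclidean formula (1.34). The surface F(x, y) = (x^2 + y^2)/2, the curve x = y = tau, and all function names are illustrative assumptions, not part of the text above.

```python
import math

def F(x, y):
    # Assumed example surface z = F(x, y) = (x**2 + y**2) / 2.
    return 0.5 * (x * x + y * y)

def grad_F(x, y):
    # Partial derivatives (dF/dx, dF/dy) of the assumed surface.
    return x, y

def metric_coeffs(x, y):
    # Auxiliary functions A, B, C introduced after Eq. (1.35).
    Fx, Fy = grad_F(x, y)
    return 1.0 + Fx * Fx, Fx * Fy, 1.0 + Fy * Fy

def curve(tau):
    # Assumed example curve in the local coordinates {x, y}: x = y = tau.
    return tau, tau

def curve_dot(tau):
    # Tangent vector (dx/dtau, dy/dtau) of the curve above.
    return 1.0, 1.0

def intrinsic_length(tau0, tau1, n=2000):
    # Midpoint rule for the integral of sqrt(g(gamma_dot, gamma_dot)).
    h = (tau1 - tau0) / n
    total = 0.0
    for i in range(n):
        tau = tau0 + (i + 0.5) * h
        x, y = curve(tau)
        dx, dy = curve_dot(tau)
        A, B, C = metric_coeffs(x, y)
        total += math.sqrt(A * dx * dx + 2.0 * B * dx * dy + C * dy * dy) * h
    return total

def embedded_length(tau0, tau1, n=2000):
    # Sum of Euclidean distances (1.34) between nearby points in R^3.
    h = (tau1 - tau0) / n
    x, y = curve(tau0)
    prev = (x, y, F(x, y))
    total = 0.0
    for i in range(1, n + 1):
        x, y = curve(tau0 + i * h)
        point = (x, y, F(x, y))
        total += math.sqrt(sum((q - p) ** 2 for p, q in zip(prev, point)))
        prev = point
    return total

print(intrinsic_length(0.0, 1.0), embedded_length(0.0, 1.0))
```

For this choice of surface and curve, g(gamma_dot, gamma_dot) = A + 2B + C = 2 + 4 tau^2, so both numbers should agree, up to discretization error, with the closed-form value (sqrt(6) + asinh(sqrt(2)))/2.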


It is known from standard linear algebra that symmetric bilinear forms have signature, which is a set of signs independent of the choice of a basis. In GR we shall usually consider four-dimensional metrics with the signature $(+ - - -)$ as appropriate for a locally Lorentzian physical theory, but many results will be the same for arbitrary dimension and signatures of the metric. Manifolds with a metric with signature $(+ + ... +)$ are called Riemannian, and pseudo-Riemannian if the metric is not sign-definite. The familiar metric with a Lorentzian signature is the Minkowski metric, $\eta = \mathrm{diag}(1, -1, -1, -1)$.

Self-test question: The Minkowski metric $\eta$ admits null vectors $n$ such that $\eta(n, n) = 0$; for example, $n = \{1, 1, 0, 0\}$ is a null vector. Does this mean that the metric $\eta$ is degenerate?
Answer: No.

In GR, the metric describes physically measured lengths and time intervals. We focus on the mathematical properties of the metric for now.

1.5.3 Examples of metrics

As a first example, consider a flat Minkowski space $M \equiv R^4$ with standard coordinates $\{t, x, y, z\}$. The Minkowski metric $g$ can be specified by the following index-free expression,

$$g(u, v) = (u \circ t)(v \circ t) - (u \circ x)(v \circ x) - (u \circ y)(v \circ y) - (u \circ z)(v \circ z).$$

In this expression, the coordinates $\{t, x, y, z\}$ are interpreted as scalar functions on the manifold $M$, and the vector fields $u, v$ are interpreted as derivative operations applied to these scalar functions. We can then directly compute

$$g(\partial_t, \partial_t) = 1, \quad g(\partial_t, \partial_x) = 0, \quad g(\partial_t + \partial_x, \partial_t + \partial_x) = 0,$$

etc. This metric can be also written as

$$g = \mathrm{d}t \otimes \mathrm{d}t - \mathrm{d}x \otimes \mathrm{d}x - \mathrm{d}y \otimes \mathrm{d}y - \mathrm{d}z \otimes \mathrm{d}z,$$

where $\mathrm{d}t, \mathrm{d}x, \mathrm{d}y, \mathrm{d}z$ are the 1-forms defined through the coordinate functions $t, x, y, z$.

Remark on notation: In the physics literature, one usually finds the Minkowski metric written as follows,

$$ds^2 = dt^2 - dx^2 - dy^2 - dz^2.$$

Strictly speaking, this notation is inconsistent; for instance, $dt^2$ stands for the bilinear form $\mathrm{d}t \otimes \mathrm{d}t$, and yet $ds^2$ cannot be interpreted as $\mathrm{d}s \otimes \mathrm{d}s$, where $\mathrm{d}s$ is a 1-form. Equations of this kind should be understood as being no more than a traditional (or "jargon") notation for the metric, in which "$ds^2$" stands for the bilinear form $g$. Also, physicists write $dt\,dx$ for the symmetric tensor product $\frac{1}{2}(\mathrm{d}t \otimes \mathrm{d}x + \mathrm{d}x \otimes \mathrm{d}t)$. We will frequently use the physicists' metric notation for brevity (but we will denote the metric by $g$ instead of the inconsistent "$ds^2$"). Thus, we will write $dx^2$ and $dt\,dx$ when the rigorous but cumbersome notation $\mathrm{d}x \otimes \mathrm{d}x$, $\frac{1}{2}(\mathrm{d}t \otimes \mathrm{d}x + \mathrm{d}x \otimes \mathrm{d}t)$ does not yield any computational advantages. In those cases, we write $dx$ with an italic $d$ since we are not actually invoking the exterior differential $\mathrm{d}$.

More complicated metrics may involve coefficients at the coordinate 1-forms, or non-diagonal terms, e.g.

$$g = f_1(t, x, y, z)\,dt^2 + f_2(t, x, y, z)\,dt\,dx + ...$$

Here are some further examples of metrics describing spacetimes important in physics.

A Schwarzschild spacetime in the coordinates $\{t, r, \theta, \phi\}$ is described by the metric

$$g = \left(1 - \frac{2M}{r}\right) dt^2 - \left(1 - \frac{2M}{r}\right)^{-1} dr^2 - r^2 \left(d\theta^2 + (\sin^2\theta)\,d\phi^2\right). \tag{1.36}$$

This spacetime is generated by a nonrotating black hole of mass $M$ centered at $r = 0$.

A de Sitter spacetime in spatially flat coordinates $\{t, x, y, z\}$ has the metric

$$g = dt^2 - e^{2Ht}(dx^2 + dy^2 + dz^2). \tag{1.37}$$

This spacetime is used to describe (approximately) the universe that is undergoing an accelerated expansion (cosmological inflation).

Remark: In GR, a physical spacetime $M$ is a four-dimensional manifold with a metric $g$; the metric should have the Lorentzian signature $(+ - - -)$. A simple-minded view of a curved spacetime $M$ is that $M = R^4$ with coordinates $\{x^\mu\} \equiv \{t, x, y, z\}$, and a metric $g$ is specified through the components $g_{\mu\nu}(t, x, y, z)$. However, this picture is insufficiently general; for instance, spacetimes containing black holes do not have this simple structure. In some cases, it turns out that the full manifold $M$ is not covered by the coordinate system $\{x^\mu\}$ in which a metric $g$ was originally specified.

1.5.4 Orthonormal frames

This section is a very brief introduction to the notion of orthonormal frames. This subject is more extensively developed in Sec. 6.1.1.

In each tangent space $T_pM$, one can choose an orthonormal basis $\{e_a\}$, where $a$ is a label enumerating the basis vectors. We thus obtain a set of vector fields.^7 Such a basis of vector fields is called an orthonormal frame (in four dimensions, a tetrad or a vierbein). Orthonormality means that

$$g(e_a, e_b) = \eta_{ab},$$

where $\eta_{ab}$ is a diagonal matrix having diagonal elements equal to $\pm 1$, depending on the signature of the metric $g$. For instance, in GR one uses metrics with Lorentzian signature $(+ - - -)$, so $\eta_{ab}$ is the matrix

$$\eta_{ab} = \mathrm{diag}(1, -1, -1, -1) = \begin{pmatrix} 1 & & & \\ & -1 & & \\ & & -1 & \\ & & & -1 \end{pmatrix}.$$

The dual basis consists of 1-forms $\{\theta^a\}$ that can be expressed as follows (no summation over $a$),

$$\theta^a \circ u \equiv \eta_{aa}\, g(e_a, u).$$

The 1-forms $\{\theta^a\}$ are a basis in the cotangent space $T^*_pM$. The metric $g$, as a rank (0,2) tensor, can be recovered from the basis $\{\theta^a\}$ as

$$g = \sum_{a,b} \eta_{ab}\, \theta^a \otimes \theta^b; \quad g^{-1} = \sum_{a,b} \eta^{ab}\, e_a \otimes e_b. \tag{1.38}$$

A derivation of Eq. (1.38) can be found in Sec. 6.1.1.

^7 We assume that the vector fields $\{e_a\}$ are smooth at least within some patch of $M$, if it turns out to be impossible to choose a smooth orthonormal frame throughout the entire manifold $M$.


1.5.5 Correspondence of vectors and covectors

A metric $g$ determines a one-to-one map between vectors and covectors (and thus between vector fields and 1-forms). I denote this map by $\hat g$, so $\hat g v$ is the 1-form corresponding to a vector field $v$. By definition, the 1-form $\hat g v$ acts on vectors $x$ as follows,

$$(\hat g v) \circ x \equiv g(v, x).$$

Then the dual basis 1-forms $\{\theta^a\}$ are expressed as $\theta^a = \eta_{aa}\, \hat g e_a$ (no summation over $a$).

Since the correspondence between vectors and 1-forms is one-to-one, there exists the inverse map (from 1-forms to vectors), denoted $\hat g^{-1}$. This map converts a 1-form $\omega$ into the vector $v = \hat g^{-1}\omega$ such that

$$g(v, u) \equiv g(\hat g^{-1}\omega, u) \equiv \omega \circ u$$

for any vector $u$. Then the scalar product is also defined on 1-forms and is denoted by $g^{-1}$:

$$g^{-1}(\omega_1, \omega_2) \equiv g(\hat g^{-1}\omega_1, \hat g^{-1}\omega_2).$$

Note that the dual basis $\{\theta^a\}$ is orthonormal with respect to the scalar product $g^{-1}$.

Remark: In the index notation, the scalar product form $g^{-1}$ is specified by the matrix $g^{\mu\nu}$, which is inverse to the matrix $g_{\mu\nu}$ representing the components of the metric $g$.

For a scalar function $f$, the 1-form $df$ called the gradient of $f$ acts on vectors $x$ as

$$(df) \circ x \equiv x \circ f,$$

and the corresponding vector field $\hat g^{-1}(df)$ may be called the contravariant gradient of $f$.

Example: For a vector field $x$ and a scalar function $f$, we have

$$x \circ f = g(x, \hat g^{-1} df) = g^{-1}(\hat g x, df).$$

The derivative of a scalar function $\phi$ in the direction of the vector $\hat g^{-1} df$, i.e. the scalar quantity $(\hat g^{-1} df) \circ \phi$, can be also expressed as

$$(\hat g^{-1} df) \circ \phi = g^{-1}(df, d\phi) = (\hat g^{-1} d\phi) \circ f.$$

Practice problem: Suppose $S(v)$ is a transformation-valued 1-form such that for any vectors $u, v, w$ we have

$$S(v)w = S(w)v,$$
$$g(S(v)w, u) = -g(w, S(v)u).$$

Show that these conditions are satisfied only if $S(v)w \equiv 0$ for all $v, w$.
Hint: Define the auxiliary trilinear form $S(v, w, u) \equiv g(S(v)w, u)$ and investigate its symmetries.

1.5.6 The Levi-Civita tensor

Ordinary vector algebra in three-dimensional space makes use of the Levi-Civita symbol $\varepsilon_{ijk}$, which is defined as the totally antisymmetric array of numbers,

$$\varepsilon_{123} = 1, \quad \varepsilon_{ijk} = -\varepsilon_{ikj} = -\varepsilon_{jik}, \quad i, j, k = 1, 2, 3.$$

In curved space, the role of this symbol is played by a totally antisymmetric tensor field, called the Levi-Civita tensor. Consider a four-dimensional manifold (to be specific and to avoid unnecessary complications in the notation). In four dimensions, the Levi-Civita tensor has rank four and is defined as the 4-form

$$\varepsilon = \theta^0 \wedge \theta^1 \wedge \theta^2 \wedge \theta^3,$$

where $\{\theta^a\}$ is an orthonormal basis of 1-forms (see Sec. 1.5.4). We now review the motivation for this definition and the properties of the tensor $\varepsilon$.

In a three-dimensional Euclidean space, the Levi-Civita symbol $\varepsilon_{ijk}$ is closely related to volume. It is known that one can calculate the (oriented) volume of a parallelepiped spanned by three vectors $\{a, b, c\}$ by using the following explicit formula,

$$\mathrm{Vol}\,(a, b, c) = a \cdot (b \times c) = \sum_{i,j,k} \varepsilon_{ijk}\, a_i b_j c_k,$$

where $a_i, b_j, c_k$ are the Euclidean components of the three vectors (i.e. components in an orthonormal basis). We have seen in Sec. 1.4.4 that the oriented volume of such a parallelepiped is equal to the value of a 3-form evaluated on the vectors $a, b, c$. Thus, the array of numbers $\varepsilon_{ijk}$ can be interpreted as the array of components of the 3-form $\varepsilon$ in the orthonormal basis. Then we have

$$\mathrm{Vol}\,(a, b, c) = \varepsilon(a, b, c).$$

Let us generalize this construction to a curved, four-dimensional manifold $M$. In a tangent space at a fixed point $p \in M$, we look for a totally antisymmetric form $\varepsilon$ such that $\varepsilon(v_1, v_2, v_3, v_4)$ is equal to the oriented 4-volume of the parallelepiped spanned by the tangent vectors $v_1, v_2, v_3, v_4$. If $\{e_a\}$ is an orthonormal basis with a positive orientation then the 4-volume of the parallelepiped spanned by $\{e_a\}$ is equal to 1. Thus, we are looking for a 4-form $\varepsilon$ such that

$$\varepsilon(e_0, e_1, e_2, e_3) = 1.$$

Since $\theta^a \circ e_b = \delta^a_b$, it is clear that $\varepsilon = \theta^0 \wedge \theta^1 \wedge \theta^2 \wedge \theta^3$ is one such 4-form.

We defined $\varepsilon$ through a particular basis $\{\theta^a\}$, so we need to investigate whether and to what extent $\varepsilon$ depends on the choice of the basis. Of course, $\varepsilon$ changes sign when the orientation of the basis is reversed. It turns out that $\varepsilon$ does not actually depend on the choice of the basis, as long as the orientation of the basis is fixed (see Statement 1.5.6.1 below). In particular, the orthonormal frame $\{e_a\}$ may be defined only locally, i.e. different frames $\{e_a\}$ need to be used in different parts of the manifold $M$. Nevertheless, the tensor $\varepsilon$ is defined uniquely and globally, i.e. throughout the entire manifold, as long as the manifold is globally orientable. A manifold is globally orientable if a choice of orthonormal frame can be made in every chart such that the orientation of the orthonormal frames is the same in every overlapping region between two charts. A Möbius strip is an example of a manifold that is not globally orientable. It seems that in General Relativity one never needs to consider nonorientable manifolds.

Statement 1.5.6.1: The Levi-Civita tensor

$$\varepsilon = \theta^0 \wedge \theta^1 \wedge \theta^2 \wedge \theta^3$$

is invariant under changes of orthonormal basis $\{\theta^a\}$, except that it changes sign when the orientation of the basis is reversed.
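Statement 1.5.6.1 can be illustrated numerically in the simpler three-dimensional Euclidean setting. The minimal Python sketch below evaluates the volume form built from the Levi-Civita symbol on a rotated and on a reflected orthonormal basis, and also checks the formula Vol(a, b, c) = a . (b x c). The rotation angle, the sample vectors, and the helper names are illustrative assumptions; the sketch uses 0-based indices and does not cover the four-dimensional Lorentzian case treated in the text.

```python
import math

def levi_civita(i, j, k):
    # epsilon_ijk for indices in {0, 1, 2}: +1 or -1 for even or odd
    # permutations of (0, 1, 2), and 0 if any index repeats.
    return (j - i) * (k - i) * (k - j) // 2

def volume(a, b, c):
    # Vol(a, b, c) = sum over i, j, k of epsilon_ijk a_i b_j c_k.
    return sum(levi_civita(i, j, k) * a[i] * b[j] * c[k]
               for i in range(3) for j in range(3) for k in range(3))

def triple_product(a, b, c):
    # a . (b x c), computed from the explicit cross product.
    cross = (b[1] * c[2] - b[2] * c[1],
             b[2] * c[0] - b[0] * c[2],
             b[0] * c[1] - b[1] * c[0])
    return sum(a[i] * cross[i] for i in range(3))

def transform_basis(T):
    # Images T e_i of the standard basis vectors, i.e. the columns of T.
    return [[T[row][col] for row in range(3)] for col in range(3)]

angle = 0.7  # arbitrary rotation angle about the third axis
rotation = [[math.cos(angle), -math.sin(angle), 0.0],
            [math.sin(angle), math.cos(angle), 0.0],
            [0.0, 0.0, 1.0]]
reflection = [[1.0, 0.0, 0.0],
              [0.0, 1.0, 0.0],
              [0.0, 0.0, -1.0]]

a, b, c = [1.0, 2.0, 3.0], [0.5, -1.0, 4.0], [2.0, 0.0, -1.0]
print(volume(a, b, c), triple_product(a, b, c))
print(volume(*transform_basis(rotation)))    # same orientation as the old basis
print(volume(*transform_basis(reflection)))  # reversed orientation
```

The value of the volume form on the rotated basis should equal its value on the original basis, while the reflected basis should give the opposite sign, in line with the statement above.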


Proof: Let us consider a linear transformation $T$ that brings the basis $\{\mathbf{e}_a\}$ into another basis $\{\tilde{\mathbf{e}}_a\}$,
\[ \tilde{\mathbf{e}}_a = T\mathbf{e}_a, \qquad \tilde\theta^a = T^{-1}\theta^a. \]
By definition of the determinant (see Sec. 1.4.5), the oriented volume of the parallelepiped spanned by $\{\tilde{\mathbf{e}}_a\}$ is equal to $\det T$. Thus, if the new basis $\{\tilde{\mathbf{e}}_a\}$ is orthonormal and has the same orientation as the old orthonormal basis, the oriented volume of the parallelepiped spanned by $\{\tilde{\mathbf{e}}_a\}$ is also equal to 1. Hence, we must have $\det T = 1$. On the other hand,
\[ \tilde\theta^0 \wedge \tilde\theta^1 \wedge \tilde\theta^2 \wedge \tilde\theta^3 = (\det T^{-1})\,\theta^0 \wedge \theta^1 \wedge \theta^2 \wedge \theta^3 = \varepsilon, \]
by the same logic as
\[ \tilde{\mathbf{e}}_1 \wedge \tilde{\mathbf{e}}_2 \wedge \tilde{\mathbf{e}}_3 \wedge \tilde{\mathbf{e}}_4 = (\det T)\,\mathbf{e}_1 \wedge \mathbf{e}_2 \wedge \mathbf{e}_3 \wedge \mathbf{e}_4. \]
Therefore the new 4-form $\tilde\varepsilon$ defined through the new basis is equal to the old 4-form $\varepsilon$. ∎

The Levi-Civita tensor is independent of the choice of an orthonormal basis $\{\theta^a\}$, but it does depend on the metric $g$. Since all 4-forms in four-dimensional space are proportional to each other, the Levi-Civita tensors determined by different metrics will be proportional to each other. So the dependence of $\varepsilon$ on the metric can be described by a single scalar function. This description is usually given in terms of the matrix $g_{\mu\nu}$ that represents the metric $g$ in a local coordinate system $\{x^\mu\}$. By definition, the matrix elements $g_{\mu\nu}$ are equal to scalar products $g(\partial_\mu, \partial_\nu)$, where $\{\partial_\mu\} \equiv \{\partial/\partial x^\mu\}$ is a coordinate basis.

Let us now determine the explicit formula for the Levi-Civita tensor as a function of the metric $g$.

We used an orthonormal basis $\{\theta^a\}$ of 1-forms to write the 4-form $\varepsilon$, but actually we can use any other basis: since all 4-forms are proportional to each other, we only need to include a correct scalar factor. So let us use the basis $\{dx^\mu\}$. Using this basis, the 4-form $\varepsilon$ can be written as
\[ \varepsilon = f\, dx^0 \wedge dx^1 \wedge dx^2 \wedge dx^3 \equiv f\, d^4x, \]
where $f$ is some function and $d^4x$ is a shorthand notation for the 4-form made of the basis 1-forms. It remains to determine the unknown coefficient $f$. To do that, we can use a suitable modification of Statement 1.4.5.2, which shows that the 4-volume of the parallelepiped spanned by the tangent vectors $\{\partial_\mu\}$ is equal to $\pm\sqrt{|\det g_{\mu\nu}|}$. The factor $\pm$ reflects the fact that the volume is oriented and may be negative if the coordinate basis $\{\partial_\mu\}$ has a negative orientation. The absolute value $|\det g_{\mu\nu}|$ is needed to compensate for the signature of the metric $g$, which might make the determinant negative. Thus
\[ f = \varepsilon(\partial_0, \partial_1, \partial_2, \partial_3) = \mathrm{Vol}(\partial_0, \partial_1, \partial_2, \partial_3) = \pm\sqrt{|\det g_{\mu\nu}|}, \]
hence the final formula for $\varepsilon$ in local coordinates (assuming a positive orientation of the coordinate basis) is
\[ \varepsilon = \sqrt{|\det g_{\mu\nu}|}\; dx^0 \wedge dx^1 \wedge dx^2 \wedge dx^3. \]
In this way, the functional dependence of $\varepsilon$ on the metric $g$ is made explicit.

It is useful to note that $d\varepsilon = 0$. This is so because $d\varepsilon$ is a 5-form, and there are no nonzero 5-forms in four-dimensional space.

The main use of the Levi-Civita tensor in GR is to compute 4-dimensional volume in curved space.^8 Since the tangent vectors are an approximate representation of short curve segments (Sec. 1.2.5), the 4-form $\varepsilon$ yields a good approximation to the 4-dimensional volume for very small volumes. Therefore, one can compute the 4-dimensional volume of a spacetime domain by integrating the 4-form $\varepsilon$ over that domain, and one can also integrate scalar functions over a domain. The integration is defined purely geometrically, regardless of the choice of the coordinate system, once the metric $g$ is fixed. For this reason (independence of coordinates), physicists know the Levi-Civita tensor by the name covariant volume element.

^8 There are other, more advanced applications of the Levi-Civita tensor, such as the Hodge star operation, which I will consider below.

Sometimes the tensor $\varepsilon$ is simply called the volume 4-form since the tensor $\varepsilon|_p$ evaluates the volume in each tangent space $T_pM$. It is clear that the construction of the Levi-Civita tensor can be straightforwardly generalized to an $N$-dimensional manifold with a given metric and orientation. By this generalization one obtains an $N$-form, which is frequently denoted by the symbol Vol as I have done above. Below I will also sometimes denote the Levi-Civita tensor by Vol, particularly when it is used for integration over a manifold. I will use the notation $\varepsilon$ when I need to use its properties as a 4-form, e.g. when I need to evaluate it on some tangent vectors.

1.6 Affine connection

1.6.1 Motivation

To formulate the laws of physics, we need to write differential equations for some vector and tensor fields on the spacetime manifold. These differential equations involve derivatives of these vector or tensor fields in various directions. Thus we need to have a directional derivative operation that acts on arbitrary tensors.

One could think at first that the Lie derivative $\mathcal{L}_{\mathbf{v}}$ is to be used as a directional derivative operation. However, $\mathcal{L}_{\mathbf{v}}$ is not a true directional derivative because it depends on the behavior of the vector field $\mathbf{v}$ in the neighborhood of a point $p$ and not only on the value $\mathbf{v}|_p$. A true directional derivative of a tensor $A$ in a direction $\mathbf{v}$ at a point $p$ must depend only on the value $\mathbf{v}|_p$ at the given point. Defining such a true directional derivative requires some information about how to connect neighbor tangent spaces $T_pM$ and $T_{p'}M$, where $p$ and $p'$ are near-by points in $M$. If such a connection between the neighbor tangent spaces were defined, tensors at different points could be subtracted. For instance, $A|_{p'} - A|_p$ would be defined as a tensor in $T_pM$, and the derivative of a tensor $A$ along a curve $\gamma$ at a point $p \equiv \gamma(\tau_0)$ could be defined as the limit
\[ \lim_{\delta\tau \to 0} \frac{A(\gamma(\tau_0 + \delta\tau)) - A(\gamma(\tau_0))}{\delta\tau}. \]

Such a true directional derivative operation can be defined. It is usually denoted $\nabla$ (pronounced nabla or del) and called a covariant derivative, an affine connection, or simply a connection. Thus, $\nabla_{\mathbf{u}} A|_p$ is the derivative of a tensor $A$ in the direction of the vector $\mathbf{u}$ at a point $p$. The quantity $\nabla_{\mathbf{u}} A|_p$ is a tensor and does not depend on derivatives of $\mathbf{u}$ itself.

It turns out that a connection $\nabla$ is not unique because there are infinitely many ways to connect the neighbor


tangent spaces, i.e. there are infinitely many possible maps $T_pM \to T_{p'}M$. Such a connection map can be described by a transformation-valued 1-form
\[ C(\mathbf{u}) : T_pM \to T_{p'}M, \]
where $\mathbf{u}$ is the tangent vector at point $p$ in the direction of point $p'$. Thus, there are as many connections $\nabla$ as maps $C$. In a local coordinate system, the map $C$ is represented by a quantity with three indices, $C^\lambda_{\mu\nu}$, but it is not a tensor in a single tangent space at $p$: by definition, it is a map connecting different tangent spaces. (The map $C$ is related to the non-tensorial Christoffel symbol; see Sec. 1.6.3 below.)

In General Relativity, one formulates the laws of physics using a particular choice of the connection $\nabla$ (called the Levi-Civita connection, see Sec. 1.6.6 below). This connection has several physically motivated and technically convenient properties with regard to the given metric $g$ on the spacetime manifold. So the Levi-Civita connection $\nabla$ directly involves the metric $g$ and cannot be defined unless a metric is given.

A visual motivation for the Levi-Civita connection, using the idea of embedding a curved manifold in a flat space, is given in Sec. A.4.3 (Appendix A). Presently, I approach the definition of the Levi-Civita connection in a somewhat more abstract way: I first formulate the desirable properties of connections in general and then impose suitable conditions to deduce the Levi-Civita connection.

Since we are studying the general properties of tensors, we will avoid working in a local coordinate system; instead, we will perform calculations in a coordinate-free manner.

1.6.2 General properties of connections

We would like to define a derivative operator $\nabla_{\mathbf{v}}$ which maps scalars to scalars, vectors to vectors, etc., and depends only on the direction of the vector $\mathbf{v}$ at a point $p$. The condition that $\nabla_{\mathbf{v}} A$ should not contain any derivatives of $\mathbf{v}$ can be written as
\[ \nabla_{(\lambda\mathbf{v})} A = \lambda \nabla_{\mathbf{v}} A, \]
where $\lambda$ is an arbitrary scalar function. If we had such an operation $\nabla_{\mathbf{v}}$, we could define the gradient operator $\nabla$ which maps scalars to 1-forms, vectors to (1,1)-tensors, and generally tensors of rank $(m, n)$ to tensors of rank $(m, n+1)$. In the index notation, the symbol $\nabla$ will have its own index; e.g. the gradient of a tensor $A_{\alpha\beta}$ is a tensor of rank (0,3) denoted by $\nabla_\mu A_{\alpha\beta} \equiv A_{\alpha\beta;\mu}$.

Remark: Vectors are always boldface in our index-free convention, so there should be no confusion between expressions such as $\nabla_\mu a^\nu$ and $\nabla_{\mathbf{v}} \mathbf{a}$. The former is the index representation of the rank-two tensor $\nabla\mathbf{a}$, while the latter is the covariant derivative of the vector field $\mathbf{a}$ in the given direction $\mathbf{v}$. The index representation of $\nabla_{\mathbf{v}} \mathbf{a}$ is $v^\mu a^\alpha{}_{;\mu}$. The derivative of a scalar function $f$ in the direction of a vector $\mathbf{v}$ is equivalently written as $\nabla_{\mathbf{v}} f = \mathcal{L}_{\mathbf{v}} f = \mathbf{v} \circ f = (df) \circ \mathbf{v}$, according to the convenience of the moment. The index representation of $\nabla_{\mathbf{v}} f$ is $v^\mu f_{,\mu} \equiv v^\mu f_{;\mu}$. ∎

Of course, $\nabla_{\mathbf{v}}$ should also act on scalar functions as the usual directional derivative, i.e.
\[ \nabla_{\mathbf{v}} f = \mathbf{v} \circ f. \]
In addition, the operator $\nabla_{\mathbf{v}}$ should have the properties analogous to Eqs. (1.2)-(1.4), (1.10)-(1.11), except that the contraction $\circ$ now means only a contraction of tensors (e.g., application of a 1-form to a vector) and not the derivative operation $\mathbf{v} \circ f$.

Remark: Unlike the Lie derivative, the connection $\nabla$ is not expected to satisfy the property
\[ \nabla_{\mathbf{u}} (\mathbf{v} \circ f) \neq (\nabla_{\mathbf{u}} \mathbf{v}) \circ f + \mathbf{v} \circ (\nabla_{\mathbf{u}} f). \]
If we assumed that this contraction property holds, we would be forced to have $\nabla_{\mathbf{u}} \equiv \mathcal{L}_{\mathbf{u}}$. ∎

1.6.3 The coordinate derivative connection

The above properties of the operation $\nabla$ do not yet uniquely specify that operation. There are infinitely many possible affine connections. A simple way to define a connection is to fix a local coordinate system $\{x^\mu\}$ and set
\[ (\nabla_{\mathbf{v}} A)^{\ldots} \equiv v^\mu A^{\ldots}{}_{;\mu} \equiv v^\mu \frac{\partial}{\partial x^\mu} A^{\ldots} \equiv v^\mu \partial_\mu A^{\ldots}, \]
for any tensor $A$. In other words, the component $A^{\ldots}{}_{;\mu}$ of the covariant derivative of $A$ is calculated as the partial derivative with respect to $x^\mu$ of the component $A^{\ldots}$ of the tensor $A$ in the fixed coordinate system $\{x^\mu\}$. It is customary to use the semicolon in the expression $A_{\alpha\beta;\mu}$ before the extra index introduced by the covariant derivative.

In the index-free notation, we denote the derivative with respect to the coordinates in a fixed coordinate system by $\partial$. Thus, the vector $\partial_{\mathbf{u}} \mathbf{v}$ is defined through the components in the fixed coordinate system $\{x^\mu\}$ as
\[ (\partial_{\mathbf{u}} \mathbf{v})^\alpha = u^\mu v^\alpha{}_{,\mu}. \]
In the coordinate system $\{x^\mu\}$, the derivative $\partial$ is just the usual derivative, so it is easy to see that $\partial$ satisfies all the properties of a connection. Thus, $\partial$ is a well-defined affine connection which we call the coordinate derivative connection determined by the coordinate system $\{x^\mu\}$. We stress that the connection $\partial$ is tied to a particular coordinate system $\{x^\mu\}$. In a different coordinate system $\{y^\mu\}$, the components of the vector $\partial_{\mathbf{u}} \mathbf{v}$ do not have the same simple form $u^\mu\, \partial v^\alpha / \partial y^\mu$. Thus, for each coordinate system $\{x^\mu\}$ we have a different coordinate connection $\partial$.

Remark: The notation $\partial$ does not indicate explicitly the coordinate system $\{x^\mu\}$ through which $\partial$ is defined. So it is important to identify that coordinate system every time one uses $\partial$. We will rarely make use of $\partial$, but when we do, we will make it clear which coordinate system $\{x^\mu\}$ is implied in the definition of $\partial$. ∎

Let us also investigate how an arbitrary connection $\nabla$ is related to the coordinate derivative $\partial$. It is easy to verify that $\nabla - \partial$ acts on vectors $\mathbf{v}$ linearly (without involving derivatives of $\mathbf{v}$),
\[ \nabla_{\mathbf{u}} (\lambda\mathbf{v}) - \partial_{\mathbf{u}} (\lambda\mathbf{v}) = \lambda \left( \nabla_{\mathbf{u}} \mathbf{v} - \partial_{\mathbf{u}} \mathbf{v} \right). \]
Hence, for a fixed vector $\mathbf{u}$ the operator $\nabla_{\mathbf{u}} - \partial_{\mathbf{u}}$ is a linear transformation. Equivalently, we can say that $\nabla - \partial \equiv \Gamma$ is a transformation-valued 1-form and write
\[ (\nabla_{\mathbf{u}} - \partial_{\mathbf{u}})\, \mathbf{v} = \Gamma(\mathbf{u})\, \mathbf{v}. \]
The tensor $\Gamma$ is called the Christoffel tensor of the connection $\nabla$ with respect to the coordinate system $\{x^\mu\}$ in which $\partial$ is the coordinate derivative.
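
As an illustration of how the coordinate derivative depends on the coordinate system, here is a short worked example in the Euclidean plane. The labels $\partial^{(c)}$ and $\partial^{(p)}$ for the coordinate derivative connections of the Cartesian coordinates $(x, y)$ and the polar coordinates $(r, \phi)$ are introduced only for this sketch and are not notation used elsewhere in the text. The vector field $\mathbf{v} = \partial_x$ has constant Cartesian components, so $\partial^{(c)}_{\mathbf{u}} \mathbf{v} = 0$ for every $\mathbf{u}$, while its polar components vary with $\phi$:

```latex
\begin{align*}
\mathbf{v} = \partial_x
  &= \cos\phi\,\partial_r - \frac{\sin\phi}{r}\,\partial_\phi, \\
\partial^{(p)}_{\partial_\phi}\mathbf{v}
  &= \frac{\partial(\cos\phi)}{\partial\phi}\,\partial_r
   + \frac{\partial}{\partial\phi}\Bigl(-\frac{\sin\phi}{r}\Bigr)\partial_\phi
   = -\sin\phi\,\partial_r - \frac{\cos\phi}{r}\,\partial_\phi \neq 0, \\
\Gamma(\partial_\phi)\,\mathbf{v}
  &\equiv \bigl(\partial^{(c)}_{\partial_\phi} - \partial^{(p)}_{\partial_\phi}\bigr)\mathbf{v}
   = \sin\phi\,\partial_r + \frac{\cos\phi}{r}\,\partial_\phi .
\end{align*}
```

The last line is linear in the polar components $v^r = \cos\phi$, $v^\phi = -\sin\phi / r$ and involves no derivatives of them: it equals $(-r)\,v^\phi\,\partial_r + (1/r)\,v^r\,\partial_\phi$. So in this example the two coordinate derivative connections differ by a Christoffel tensor with components $\Gamma^r_{\phi\phi} = -r$ and $\Gamma^\phi_{\phi r} = 1/r$.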


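For a fixed coordinate derivative $\partial$, various choices of $\Gamma$ will give different connections $\nabla$. Thus, the Christoffel tensor $\Gamma$ parametrizes all the possible covariant derivatives $\nabla$.

Practice problem: Suppose that two different connections $\nabla$ and $\tilde\nabla$ are given (not necessarily coordinate derivative connections in any coordinate system). Using the defining properties of a connection, show that the difference $\nabla_{\mathbf{u}} \mathbf{v} - \tilde\nabla_{\mathbf{u}} \mathbf{v}$ does not depend on derivatives of $\mathbf{v}$ and can thus be described as a linear transformation of $\mathbf{v}$ (for a fixed $\mathbf{u}$). ∎

1.6.4 Compatibility with the metric

To formulate the laws of physics, we need to write differential equations for tensors on the spacetime manifold, and thus we need to use a directional derivative $\nabla$. One possible choice of $\nabla$ is the coordinate derivative $\partial$ in a fixed coordinate system $\{x^\mu\}$. But the connection $\partial$ is inconvenient not only because it depends on a fixed coordinate system (while the laws of physics are expected to be independent of coordinates), but also because it is in general incompatible with the metric in the following sense.

Consider a vector $\mathbf{v}$ which is locally constant in the direction of some other vector $\mathbf{u}$, i.e. $\nabla_{\mathbf{u}} \mathbf{v} = 0$. Then it is natural to expect that the length of the vector $\mathbf{v}$ is also constant in the direction of $\mathbf{u}$, i.e. that $\nabla_{\mathbf{u}}\, g(\mathbf{v}, \mathbf{v}) = 0$. However, this property does not necessarily hold, since we will generally have
\[ \begin{aligned} \nabla_{\mathbf{u}}\, g(\mathbf{v}, \mathbf{v}) &= (\nabla_{\mathbf{u}} g)(\mathbf{v}, \mathbf{v}) + g(\nabla_{\mathbf{u}} \mathbf{v}, \mathbf{v}) + g(\mathbf{v}, \nabla_{\mathbf{u}} \mathbf{v}) \\ &= (\nabla_{\mathbf{u}} g)(\mathbf{v}, \mathbf{v}) \neq 0. \end{aligned} \]
This inconvenience will be avoided only if $\nabla_{\mathbf{u}} g = 0$ for any vector $\mathbf{u}$. The property $\nabla g = 0$ is called compatibility with the metric (or metricity of the connection). An equivalent way to write this property is
\[ \nabla_{\mathbf{u}}\, g(\mathbf{v}, \mathbf{w}) \equiv \mathbf{u} \circ g(\mathbf{v}, \mathbf{w}) = g(\nabla_{\mathbf{u}} \mathbf{v}, \mathbf{w}) + g(\mathbf{v}, \nabla_{\mathbf{u}} \mathbf{w}). \tag{1.39} \]
In other words, $g$ behaves as a constant under a covariant derivative $\nabla_{\mathbf{u}}$, and only $\mathbf{v}$ and $\mathbf{w}$ are differentiated when one computes $\nabla_{\mathbf{u}}\, g(\mathbf{v}, \mathbf{w})$.

1.6.5 Torsion and torsion-freeness

Usual partial derivatives of scalar functions with respect to coordinates $x^\mu$ (in any local coordinate system) commute,
\[ \partial_\mu \partial_\nu f = \partial_\nu \partial_\mu f. \]
We may ask whether the same property holds for covariant derivatives,
\[ \nabla_\mu \nabla_\nu f = \nabla_\nu \nabla_\mu f. \tag{1.40} \]
At the moment this property is written using the index notation. To reformulate the property (1.40) in a geometric way (without introducing coordinates explicitly), we contract Eq. (1.40) with two arbitrary vectors $u^\mu v^\nu$ and rewrite the result using covariant derivatives,
\[ \begin{aligned} u^\mu v^\nu \nabla_\mu \nabla_\nu f &= u^\mu \nabla_\mu (v^\nu \nabla_\nu f) - (u^\mu \nabla_\mu v^\nu)(\nabla_\nu f) \\ &= \nabla_{\mathbf{u}} (\nabla_{\mathbf{v}} f) - (\nabla_{\mathbf{u}} \mathbf{v}) \circ f. \end{aligned} \]
Thus, the property (1.40) is equivalent to
\[ (\nabla_{\mathbf{u}} \mathbf{v} - \nabla_{\mathbf{v}} \mathbf{u}) \circ f = \nabla_{\mathbf{u}} (\nabla_{\mathbf{v}} f) - \nabla_{\mathbf{v}} (\nabla_{\mathbf{u}} f) = [\mathbf{u}, \mathbf{v}] \circ f. \]

It is not necessarily true that an arbitrary connection $\nabla$ satisfies this property. To describe the extent of the deviation from this property, one defines the torsion tensor,
\[ T(\mathbf{u}, \mathbf{v}) \equiv \nabla_{\mathbf{u}} \mathbf{v} - \nabla_{\mathbf{v}} \mathbf{u} - [\mathbf{u}, \mathbf{v}], \tag{1.41} \]
which is a vector-valued 2-form (see Statement 1.6.5.1 below). The property (1.40) is then equivalent to the requirement that $T(\mathbf{u}, \mathbf{v}) = 0$ for all $\mathbf{u}, \mathbf{v}$; this is called the torsion-freeness of the connection $\nabla$. Thus, if a connection $\nabla$ is torsion-free, we have the relation
\[ \mathcal{L}_{\mathbf{u}} \mathbf{v} \equiv [\mathbf{u}, \mathbf{v}] = \nabla_{\mathbf{u}} \mathbf{v} - \nabla_{\mathbf{v}} \mathbf{u}. \tag{1.42} \]
This relation is similar to Eq. (1.12), except that now covariant derivatives are used. Note that the coordinate derivative $\partial$ (defined with respect to a coordinate system $\{x^\mu\}$) is always a torsion-free connection because coordinate derivatives commute.

Statement 1.6.5.1: The function $T(\mathbf{u}, \mathbf{v})$ as defined by Eq. (1.41) does not actually involve derivatives of the vectors $\mathbf{u}$ and $\mathbf{v}$. Namely, for an arbitrary scalar function $f$, we have $T(f\mathbf{u}, \mathbf{v}) = f\, T(\mathbf{u}, \mathbf{v})$ and $T(\mathbf{u}, f\mathbf{v}) = f\, T(\mathbf{u}, \mathbf{v})$.

Proof of Statement 1.6.5.1: For arbitrary scalars $f, \lambda$ and vectors $\mathbf{u}, \mathbf{v}$ we have
\[ \begin{aligned} \nabla_{\mathbf{v}} (f\mathbf{u}) &= (\nabla_{\mathbf{v}} f)\, \mathbf{u} + f \nabla_{\mathbf{v}} \mathbf{u} = (\mathbf{v} \circ f)\, \mathbf{u} + f \nabla_{\mathbf{v}} \mathbf{u}, \\ \nabla_{f\mathbf{u}} \mathbf{v} &= f \nabla_{\mathbf{u}} \mathbf{v}, \\ [f\mathbf{u}, \mathbf{v}] \circ \lambda &= f\mathbf{u} \circ (\mathbf{v} \circ \lambda) - \mathbf{v} \circ ((f\mathbf{u}) \circ \lambda) \\ &= f\, [\mathbf{u}, \mathbf{v}] \circ \lambda - (\mathbf{v} \circ f)(\mathbf{u} \circ \lambda), \end{aligned} \]
and it follows that
\[ \begin{aligned} T(f\mathbf{u}, \mathbf{v}) \circ \lambda &= f\, T(\mathbf{u}, \mathbf{v}) \circ \lambda - ((\mathbf{v} \circ f)\, \mathbf{u}) \circ \lambda + (\mathbf{v} \circ f)(\mathbf{u} \circ \lambda) \\ &= f\, T(\mathbf{u}, \mathbf{v}) \circ \lambda. \end{aligned} \]
The analogous property for $\mathbf{v}$ will follow from the antisymmetry of $T$. ∎

We have seen that all possible connections $\nabla$ are parameterized as $\nabla = \partial + \Gamma$, where $\Gamma$ is a rank 3 tensor (the Christoffel tensor) with respect to a fixed coordinate system $\{x^\mu\}$ where $\partial$ is the coordinate derivative. We can now compute the torsion tensor $T$ through $\Gamma$. Since $\partial$ is torsion-free, it follows immediately from Eq. (1.41) that
\[ T(\mathbf{u}, \mathbf{v}) = \Gamma(\mathbf{u})\mathbf{v} - \Gamma(\mathbf{v})\mathbf{u}. \]
To visualize this property, we may interpret $\Gamma$ as a vector-valued bilinear form $\Gamma(\mathbf{u}, \mathbf{v}) \equiv \Gamma(\mathbf{u})\mathbf{v}$, and then the property $T(\mathbf{u}, \mathbf{v}) = 0$ becomes
\[ \Gamma(\mathbf{u}, \mathbf{v}) = \Gamma(\mathbf{v}, \mathbf{u}). \]
Thus a torsion-free connection is characterized by a symmetric Christoffel tensor.

Practice problem: For a given affine connection $\nabla$, does there always exist a coordinate system $\{x^\mu\}$ where $\nabla$ is equal to the coordinate derivative?
Hint: Consider the torsion of the connection $\nabla$. Answer: No. ∎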
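
The relation between the torsion and the Christoffel tensor, $T(\mathbf{u}, \mathbf{v}) = \Gamma(\mathbf{u})\mathbf{v} - \Gamma(\mathbf{v})\mathbf{u}$, can be illustrated numerically at a single point. The sketch below is only an illustration: the function names `torsion` and `symmetrize`, the storage convention `gamma[l][m][n]` for the components $\Gamma^l_{mn}$, and the sample numbers are all assumptions made here and are not taken from the text.

```python
from itertools import product


def torsion(gamma, u, v):
    """Components of T(u, v) = Gamma(u) v - Gamma(v) u at one point.

    Assumed convention: gamma[l][m][n] holds Gamma^l_{mn}, so that
    (Gamma(u) v)^l = sum over m, n of gamma[l][m][n] * u[m] * v[n].
    """
    dim = len(u)
    return [
        sum((gamma[l][m][n] - gamma[l][n][m]) * u[m] * v[n]
            for m, n in product(range(dim), repeat=2))
        for l in range(dim)
    ]


def symmetrize(gamma):
    """Return the part of gamma that is symmetric in its two lower indices."""
    dim = len(gamma)
    return [[[0.5 * (gamma[l][m][n] + gamma[l][n][m]) for n in range(dim)]
             for m in range(dim)] for l in range(dim)]


# Hypothetical component values of a Christoffel tensor in two dimensions.
gamma = [[[0.0, 1.0], [3.0, 0.5]],
         [[2.0, -1.0], [4.0, 0.0]]]
u, v = [1.0, 2.0], [-1.0, 0.5]

print("T(u, v) for a generic Gamma:   ", torsion(gamma, u, v))
print("T(v, u) (antisymmetry check):  ", torsion(gamma, v, u))
print("T(u, v) for a symmetric Gamma: ", torsion(symmetrize(gamma), u, v))
```

Only the part of `gamma` that is antisymmetric in the lower indices contributes to the result, which is why symmetrizing removes the torsion for every choice of `u` and `v`.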


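1.6.6 Levi-Civita connection

A covariant derivative $\nabla$ which is torsion-free and compatible with the metric is called the Levi-Civita connection or the metric connection. In General Relativity, this is the connection used to formulate differential equations for physical quantities. The properties of metricity and torsion-freeness are imposed for several reasons, including mathematical simplicity. Ultimately, these properties are physical assumptions, that is, hypotheses verified by experiments. See Sec. A.4.5 in Appendix A for more discussion and motivation.

We now derive an explicit formula for the Levi-Civita connection, starting from the properties (1.39) and (1.40). It will follow that such a connection is unique. Later we shall see how to describe other connections.

First examples

Before presenting a general derivation, let us make the discussion less abstract by considering examples. First we verify that the Levi-Civita connection in the Euclidean space is just the ordinary, familiar directional derivative. Consider the three-dimensional space $\mathbb{R}^3$ with coordinates $\{x, y, z\}$ and the metric
\[ g = dx^2 + dy^2 + dz^2, \]
and let us evaluate some covariant derivatives. We will be using only the assumed properties of the Levi-Civita connection and see where this will lead us.

Suppose we need to compute the vector $\nabla_{\partial_x} \partial_x$. The way to compute it is to evaluate scalar products of this vector with other vectors. For instance, using the compatibility of $\nabla$ with the metric, and the properties $g(\partial_x, \partial_x) = 1$, $g(\partial_x, \partial_y) = 0$ etc., we find
\[ \begin{aligned} g(\nabla_{\partial_x} \partial_x, \partial_x) &= \tfrac{1}{2}\, g(\nabla_{\partial_x} \partial_x, \partial_x) + \tfrac{1}{2}\, g(\partial_x, \nabla_{\partial_x} \partial_x) \\ &= \tfrac{1}{2}\, \nabla_{\partial_x}\, g(\partial_x, \partial_x) = \tfrac{1}{2}\, \partial_x (1) = 0. \end{aligned} \]
Using the torsion-freeness and the fact that $[\partial_x, \partial_y] = 0$, we also find
\[ \begin{aligned} g(\nabla_{\partial_x} \partial_x, \partial_y) &= \partial_x\, g(\partial_x, \partial_y) - g(\partial_x, \nabla_{\partial_x} \partial_y) \\ &= 0 - g(\partial_x, [\partial_x, \partial_y] + \nabla_{\partial_y} \partial_x) \\ &= -\tfrac{1}{2}\, \partial_y\, g(\partial_x, \partial_x) = 0. \end{aligned} \]
In this way, it is easy to see that $\nabla_{\partial_x} \partial_x$ has zero scalar products with every basis vector; thus it is itself zero, $\nabla_{\partial_x} \partial_x = 0$.

Let us now consider $\nabla_{\partial_x} \partial_y$. We have already seen that $g(\nabla_{\partial_x} \partial_y, \partial_x) = 0$, and further we obtain
\[ g(\nabla_{\partial_x} \partial_y, \partial_y) = \tfrac{1}{2}\, \partial_x\, g(\partial_y, \partial_y) = 0. \]
The computation of the scalar product $g(\nabla_{\partial_x} \partial_y, \partial_z)$ is a bit longer. First, we find
\[ \begin{aligned} g(\nabla_{\partial_x} \partial_y, \partial_z) &= g([\partial_x, \partial_y] + \nabla_{\partial_y} \partial_x, \partial_z) = g(\nabla_{\partial_y} \partial_x, \partial_z) \\ &= -g(\nabla_{\partial_y} \partial_z, \partial_x). \end{aligned} \]
The trick is to notice that we now have a similar structure with the coordinates $\{x, y, z\}$ cyclically transposed. Repeating the transposition twice more, we find
\[ g(\nabla_{\partial_x} \partial_y, \partial_z) = -g(\nabla_{\partial_x} \partial_y, \partial_z), \]
so it vanishes. Thus, $\nabla_{\partial_x} \partial_y = 0$. Similarly, one can show that $\nabla_{\partial_x} \partial_z = 0$, etc. In other words, the covariant derivative of every basis vector in any direction equals zero! Now, we consider an arbitrary vector field
\[ \mathbf{v} = A\, \partial_x + B\, \partial_y + C\, \partial_z, \]
where $A, B, C$ are functions of $x, y, z$. We compute
\[ \nabla_{\partial_x} \mathbf{v} = (\partial_x A)\, \partial_x + (\partial_x B)\, \partial_y + (\partial_x C)\, \partial_z, \]
since the derivatives $\nabla_{\partial_x}$ of the vectors $\partial_x, \partial_y, \partial_z$ vanish. In other words, the Levi-Civita connection $\nabla_{\partial_x}$ simply differentiates the components $(A, B, C)$ of the vector $\mathbf{v}$ in the direction $x$. This is the behavior familiar from standard vector analysis. Thus, the Levi-Civita connection $\nabla_{\partial_x}$ in flat space coincides with the familiar directional derivative $\partial/\partial x$.

As a less trivial example, let us consider a two-dimensional space with a local coordinate system $(x, y)$ and the metric
\[ g = dx^2 + f(x)\, dy^2, \]
where $f(x) \neq 0$ is some (known) function that depends only on $x$. Let us compute $\nabla_{\partial_x} \partial_y$ using the same approach as above. We find
\[ \begin{aligned} g(\nabla_{\partial_x} \partial_y, \partial_x) &= g([\partial_x, \partial_y] + \nabla_{\partial_y} \partial_x, \partial_x) = \tfrac{1}{2}\, \partial_y\, g(\partial_x, \partial_x) = 0; \\ g(\nabla_{\partial_x} \partial_y, \partial_y) &= \tfrac{1}{2}\, \partial_x\, g(\partial_y, \partial_y) = \tfrac{1}{2} f'(x). \end{aligned} \]
In this way, the vector $\nabla_{\partial_x} \partial_y$ is completely determined through its scalar products with basis vectors $\partial_x$ and $\partial_y$:
\[ \nabla_{\partial_x} \partial_y = \frac{1}{2} \frac{f'(x)}{f(x)}\, \partial_y. \]
Similarly, one may evaluate other covariant derivatives, such as
\[ \begin{aligned} g(\nabla_{\partial_y} \partial_y, \partial_y) &= 0, \\ g(\nabla_{\partial_y} \partial_y, \partial_x) &= -g(\partial_y, \nabla_{\partial_y} \partial_x) = -g(\partial_y, \nabla_{\partial_x} \partial_y) = -\tfrac{1}{2} f'(x). \end{aligned} \]
Thus
\[ \nabla_{\partial_y} \partial_y = -\tfrac{1}{2} f'(x)\, \partial_x. \]

Derivation of the Levi-Civita connection

In the examples above, we have computed covariant derivatives of some vectors by using only the defining properties (1.39) and (1.42) of the Levi-Civita connection, as well as some tricks. We now derive a general formula for $\nabla$ using the same approach and the same tricks.

Consider arbitrary vector fields $\mathbf{x}, \mathbf{y}$ and let us compute the vector $\nabla_{\mathbf{x}} \mathbf{y}$. This vector will be unambiguously determined if we compute its scalar product $g(\nabla_{\mathbf{x}} \mathbf{y}, \mathbf{z})$ with an arbitrary vector $\mathbf{z}$. So we start with the scalar product expression $g(\nabla_{\mathbf{x}} \mathbf{y}, \mathbf{z})$ and rewrite it using the assumed properties (1.39) and (1.42),
\[ \begin{aligned} g(\nabla_{\mathbf{x}} \mathbf{y}, \mathbf{z}) &= \mathbf{x} \circ g(\mathbf{y}, \mathbf{z}) - g(\mathbf{y}, \nabla_{\mathbf{x}} \mathbf{z}) \\ &= \mathbf{x} \circ g(\mathbf{y}, \mathbf{z}) - g(\mathbf{y}, [\mathbf{x}, \mathbf{z}]) - g(\mathbf{y}, \nabla_{\mathbf{z}} \mathbf{x}) \\ &= \mathbf{x} \circ g(\mathbf{y}, \mathbf{z}) - g(\mathbf{y}, [\mathbf{x}, \mathbf{z}]) - g(\nabla_{\mathbf{z}} \mathbf{x}, \mathbf{y}). \end{aligned} \]
We have related $g(\nabla_{\mathbf{x}} \mathbf{y}, \mathbf{z})$ to the same structure with $(\mathbf{x}, \mathbf{y}, \mathbf{z})$ cyclically transposed. So now we repeat the same procedure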


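twice more, replacing the term with $\nabla_{\mathbf{z}} \mathbf{x}$ by a term with $\nabla_{\mathbf{y}} \mathbf{z}$, and finally by a term with $\nabla_{\mathbf{x}} \mathbf{y}$. We obtain
\[ \begin{aligned} g(\nabla_{\mathbf{x}} \mathbf{y}, \mathbf{z}) = {} & \mathbf{x} \circ g(\mathbf{y}, \mathbf{z}) - g(\mathbf{y}, [\mathbf{x}, \mathbf{z}]) - \mathbf{z} \circ g(\mathbf{x}, \mathbf{y}) \\ & + g(\mathbf{x}, [\mathbf{z}, \mathbf{y}]) + \mathbf{y} \circ g(\mathbf{x}, \mathbf{z}) - g(\mathbf{z}, [\mathbf{y}, \mathbf{x}]) - g(\nabla_{\mathbf{x}} \mathbf{y}, \mathbf{z}). \end{aligned} \]
Thus the final formula for the Levi-Civita connection is
\[ \begin{aligned} g(\nabla_{\mathbf{x}} \mathbf{y}, \mathbf{z}) = \frac{1}{2} \Bigl[ & \mathbf{x} \circ g(\mathbf{y}, \mathbf{z}) + \mathbf{y} \circ g(\mathbf{x}, \mathbf{z}) - \mathbf{z} \circ g(\mathbf{x}, \mathbf{y}) \\ & - g(\mathbf{x}, [\mathbf{y}, \mathbf{z}]) - g(\mathbf{y}, [\mathbf{x}, \mathbf{z}]) + g(\mathbf{z}, [\mathbf{x}, \mathbf{y}]) \Bigr]. \end{aligned} \tag{1.43} \]
This is known as the Koszul formula.

Calculation 1.6.6.1: The familiar formula for the covariant derivative in the index notation,
\[ \nabla_\mu u^\nu = \partial_\mu u^\nu + \Gamma^\nu_{\mu\lambda} u^\lambda, \]
\[ \Gamma^\nu_{\mu\lambda} \equiv \frac{1}{2}\, g^{\nu\alpha} \left( \partial_\mu g_{\alpha\lambda} + \partial_\lambda g_{\alpha\mu} - \partial_\alpha g_{\mu\lambda} \right), \tag{1.44} \]
can be derived from Eq. (1.43).

Details: The Christoffel tensor $\Gamma$ describes the difference between the covariant derivative and the coordinate derivative in a particular coordinate system. Let us select a coordinate system and substitute $\mathbf{x} = \partial_\mu$, $\mathbf{y} = \partial_\lambda$, $\mathbf{z} = \partial_\alpha$ (where $\mu, \lambda, \alpha$ are fixed values of the indices) into Eq. (1.43). We note that the commutators of these vectors vanish and so
\[ 2 g(\nabla_{\partial_\mu} \partial_\lambda, \partial_\alpha) = \partial_\mu g_{\lambda\alpha} + \partial_\lambda g_{\mu\alpha} - \partial_\alpha g_{\mu\lambda}. \]
Now we can decompose an arbitrary vector $\mathbf{u}$ as $\sum_\lambda u^\lambda \partial_\lambda$ and compute
\[ \nabla_\mu u^\nu = g^{\nu\alpha}\, g(\nabla_{\partial_\mu} \mathbf{u}, \partial_\alpha) = \partial_\mu u^\nu + \Gamma^\nu_{\mu\lambda} u^\lambda. \]
∎

Note that the Levi-Civita connection $\nabla$ is defined through the metric $g$ and thus implicitly depends on $g$. For instance, the vector field $\nabla_{\mathbf{u}} \mathbf{v}$ involves not only first derivatives of $\mathbf{v}$, but also first derivatives of the metric. This is easy to see by inspection of Eq. (1.43), where the metric $g$ clearly stands under differentiation in the first three terms. It is instructive to verify this statement more directly, without using Eq. (1.43). Let us change the metric tensor $g$ as follows,
\[ g \to \tilde g = e^{2\lambda} g, \]
where $\lambda$ is a scalar function. (This transformation of the metric is called a conformal transformation because it preserves angles between vectors.) Now we can use Eq. (1.43) to determine the new connection $\tilde\nabla$. It is convenient to compute $\tilde g(\tilde\nabla_{\mathbf{x}} \mathbf{y}, \mathbf{z})$ using the new metric $\tilde g$, rather than the old metric. We find
\[ \begin{aligned} g(\tilde\nabla_{\mathbf{x}} \mathbf{y}, \mathbf{z}) = {} & (\mathbf{x} \circ \lambda)\, g(\mathbf{y}, \mathbf{z}) + (\mathbf{y} \circ \lambda)\, g(\mathbf{x}, \mathbf{z}) - (\mathbf{z} \circ \lambda)\, g(\mathbf{x}, \mathbf{y}) \\ & + g(\nabla_{\mathbf{x}} \mathbf{y}, \mathbf{z}). \end{aligned} \tag{1.45} \]
This expression contains first derivatives of $\lambda$ such as $\mathbf{x} \circ \lambda$. Therefore, the Levi-Civita connection indeed involves first derivatives of the metric.

Remark: As we have seen in Sec. 1.6.3, all the possible connections $\nabla$ (covariant derivatives) can be described by choosing a coordinate system $\{x^\mu\}$ (in which the coordinate derivative is $\partial$) and a transformation-valued 1-form $\Gamma$. Then $\nabla_{\mathbf{u}} \mathbf{v} = \partial_{\mathbf{u}} \mathbf{v} + \Gamma(\mathbf{u})\mathbf{v}$. Torsion-freeness implies the symmetry $\Gamma(\mathbf{u})\mathbf{v} = \Gamma(\mathbf{v})\mathbf{u}$, but $\Gamma$ is not otherwise constrained. Of all the possible torsion-free connections, the Levi-Civita connection is described by a particular unique choice of $\Gamma$ determined by the metricity requirement. ∎

Calculation: Recall that the (2,0) tensor $g^{-1}$ is the inverse metric defined on 1-forms, or alternatively the map from 1-forms to vectors. We now show that $\nabla_{\mathbf{u}} (g^{-1}) = 0$ for the Levi-Civita connection $\nabla$.

A simple (but less explicit) calculation is as follows. Consider the map $\hat g$ from vectors to 1-forms and the map $\hat g^{-1}$ from 1-forms to vectors. We have
\[ \hat g\, \hat g^{-1} = \hat 1, \]
where $\hat 1$ is the identity map from 1-forms to 1-forms. Then
\[ \nabla_{\mathbf{u}} (\hat g\, \hat g^{-1}) = \hat g\, \nabla_{\mathbf{u}} \hat g^{-1} = \nabla_{\mathbf{u}} \hat 1 = 0, \]
thus $\nabla_{\mathbf{u}} \hat g^{-1} = 0$.

Here is a more detailed calculation along the same lines. For an arbitrary vector $\mathbf{v}$ and 1-form $\omega$, we have
\[ \omega \circ \mathbf{v} = g(\hat g^{-1} \omega, \mathbf{v}). \]
Now we apply $\nabla_{\mathbf{u}}$ to both sides,
\[ \nabla_{\mathbf{u}} (\omega \circ \mathbf{v}) = \nabla_{\mathbf{u}}\, g(\hat g^{-1} \omega, \mathbf{v}) = g(\nabla_{\mathbf{u}} (\hat g^{-1} \omega), \mathbf{v}) + g(\hat g^{-1} \omega, \nabla_{\mathbf{u}} \mathbf{v}). \]
Since $\nabla$ satisfies Leibniz's rule with respect to tensor products and contractions, we find
\[ \begin{aligned} \nabla_{\mathbf{u}} (\omega \circ \mathbf{v}) &= (\nabla_{\mathbf{u}} \omega) \circ \mathbf{v} + \omega \circ (\nabla_{\mathbf{u}} \mathbf{v}) \\ &= g(\hat g^{-1} (\nabla_{\mathbf{u}} \omega), \mathbf{v}) + g(\hat g^{-1} \omega, \nabla_{\mathbf{u}} \mathbf{v}). \end{aligned} \]
The result $\nabla_{\mathbf{u}} \hat g^{-1} = 0$ follows. ∎

1.6.7 Killing vectors

The Lie derivative operation $\mathcal{L}_{\mathbf{v}}$ can be applied to arbitrary tensors, and in particular to the metric tensor $g$. Let us compute the Lie derivative $\mathcal{L}_{\mathbf{v}} g$ with respect to a given vector field $\mathbf{v}$. By definition, $\mathcal{L}_{\mathbf{v}} g$ is a bilinear form acting on two vectors $\mathbf{a}, \mathbf{b}$ by first letting $\mathcal{L}_{\mathbf{v}}$ act on the entire expression $g(\mathbf{a}, \mathbf{b})$ and then subtracting the Lie derivatives $\mathcal{L}_{\mathbf{v}}$ of $\mathbf{a}$ and $\mathbf{b}$:
\[ \begin{aligned} (\mathcal{L}_{\mathbf{v}} g)(\mathbf{a}, \mathbf{b}) &\equiv \mathcal{L}_{\mathbf{v}} (g(\mathbf{a}, \mathbf{b})) - g(\mathcal{L}_{\mathbf{v}} \mathbf{a}, \mathbf{b}) - g(\mathbf{a}, \mathcal{L}_{\mathbf{v}} \mathbf{b}) \\ &= \mathbf{v} \circ g(\mathbf{a}, \mathbf{b}) - g([\mathbf{v}, \mathbf{a}], \mathbf{b}) - g(\mathbf{a}, [\mathbf{v}, \mathbf{b}]). \end{aligned} \]
There exists a more convenient formula for the Lie derivative of the metric. This formula uses the Levi-Civita connection $\nabla$.

Statement 1.6.7.1: The Lie derivative of the metric $g$ with respect to a vector $\mathbf{k}$ is a bilinear form that can be computed as
\[ (\mathcal{L}_{\mathbf{k}} g)(\mathbf{a}, \mathbf{b}) = g(\nabla_{\mathbf{a}} \mathbf{k}, \mathbf{b}) + g(\mathbf{a}, \nabla_{\mathbf{b}} \mathbf{k}). \tag{1.46} \]

Proof: Let $\mathbf{k}$ be an arbitrary vector field. By definition of $\mathcal{L}_{\mathbf{k}} g$, we have for arbitrary vector fields $\mathbf{a}, \mathbf{b}$ the expression
\[ \begin{aligned} \mathcal{L}_{\mathbf{k}}\, g(\mathbf{a}, \mathbf{b}) &= (\mathcal{L}_{\mathbf{k}} g)(\mathbf{a}, \mathbf{b}) + g(\mathcal{L}_{\mathbf{k}} \mathbf{a}, \mathbf{b}) + g(\mathbf{a}, \mathcal{L}_{\mathbf{k}} \mathbf{b}) \\ &= (\mathcal{L}_{\mathbf{k}} g)(\mathbf{a}, \mathbf{b}) + g([\mathbf{k}, \mathbf{a}], \mathbf{b}) + g(\mathbf{a}, [\mathbf{k}, \mathbf{b}]). \end{aligned} \tag{1.47} \]
On the other hand, $g(\mathbf{a}, \mathbf{b})$ is a scalar function, and $\mathcal{L}_{\mathbf{k}}\, g(\mathbf{a}, \mathbf{b}) = \nabla_{\mathbf{k}}\, g(\mathbf{a}, \mathbf{b})$. Then the metricity condition $\nabla_{\mathbf{k}} g = 0$ yields
\[ \mathcal{L}_{\mathbf{k}}\, g(\mathbf{a}, \mathbf{b}) = \nabla_{\mathbf{k}}\, g(\mathbf{a}, \mathbf{b}) = g(\nabla_{\mathbf{k}} \mathbf{a}, \mathbf{b}) + g(\mathbf{a}, \nabla_{\mathbf{k}} \mathbf{b}). \]
Since $[\mathbf{k}, \mathbf{a}] = \nabla_{\mathbf{k}} \mathbf{a} - \nabla_{\mathbf{a}} \mathbf{k}$, and similarly for $[\mathbf{k}, \mathbf{b}]$, we obtain Eq. (1.46). ∎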
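
The coordinate formula (1.44) and the statement (1.46) can be checked numerically on the two-dimensional metric $g = dx^2 + f(x)\,dy^2$ from the first examples of Sec. 1.6.6. The following is a minimal sketch under stated assumptions, not a general-purpose implementation: the choice $f(x) = e^x$, the finite-difference step `h`, and all function names are hypothetical and chosen only for illustration. The index form of Eq. (1.46), $(\mathcal{L}_{\mathbf{k}} g)_{\mu\nu} = \nabla_\mu k_\nu + \nabla_\nu k_\mu$, is used together with the standard component expression $\nabla_\mu k_\nu = \partial_\mu k_\nu - \Gamma^\lambda_{\mu\nu} k_\lambda$ for a covector.

```python
import math


def metric(x, y):
    """Matrix g_{mn} of g = dx^2 + f(x) dy^2 with the sample choice f(x) = exp(x)."""
    return [[1.0, 0.0], [0.0, math.exp(x)]]


def inverse2(g):
    """Inverse of a 2x2 matrix."""
    det = g[0][0] * g[1][1] - g[0][1] * g[1][0]
    return [[g[1][1] / det, -g[0][1] / det], [-g[1][0] / det, g[0][0] / det]]


def d_metric(x, y, h=1e-5):
    """dg[a][m][n] approximates partial_a g_{mn} by central differences."""
    gxp, gxm = metric(x + h, y), metric(x - h, y)
    gyp, gym = metric(x, y + h), metric(x, y - h)
    dx = [[(gxp[m][n] - gxm[m][n]) / (2 * h) for n in range(2)] for m in range(2)]
    dy = [[(gyp[m][n] - gym[m][n]) / (2 * h) for n in range(2)] for m in range(2)]
    return [dx, dy]


def christoffel(x, y):
    """gamma[l][m][n] = Gamma^l_{mn}, computed from Eq. (1.44)."""
    ginv, dg = inverse2(metric(x, y)), d_metric(x, y)
    return [[[0.5 * sum(ginv[l][a] * (dg[m][a][n] + dg[n][a][m] - dg[a][m][n])
                        for a in range(2))
              for n in range(2)] for m in range(2)] for l in range(2)]


def lie_derivative_of_metric(k_lower, x, y, h=1e-5):
    """(L_k g)_{mn} = nabla_m k_n + nabla_n k_m, with k_lower(x, y) -> [k_x, k_y]."""
    gamma, k = christoffel(x, y), k_lower(x, y)
    dk = [[(k_lower(x + h, y)[n] - k_lower(x - h, y)[n]) / (2 * h) for n in range(2)],
          [(k_lower(x, y + h)[n] - k_lower(x, y - h)[n]) / (2 * h) for n in range(2)]]
    nabla = [[dk[m][n] - sum(gamma[l][m][n] * k[l] for l in range(2))
              for n in range(2)] for m in range(2)]
    return [[nabla[m][n] + nabla[n][m] for n in range(2)] for m in range(2)]


def k_y(x, y):
    """Covariant components of the vector d/dy: k_n = g_{ny}."""
    return [0.0, math.exp(x)]


def k_x(x, y):
    """Covariant components of the vector d/dx: k_n = g_{nx}."""
    return [1.0, 0.0]


x0, y0 = 0.3, 0.0
gamma = christoffel(x0, y0)
print("Gamma^y_{xy}:", gamma[1][0][1], " expected f'/(2f) =", 0.5)
print("Gamma^x_{yy}:", gamma[0][1][1], " expected -f'/2 =", -0.5 * math.exp(x0))
print("L_k g for k = d/dy:", lie_derivative_of_metric(k_y, x0, y0))
print("L_k g for k = d/dx:", lie_derivative_of_metric(k_x, x0, y0))
```

Up to finite-difference error, the Lie derivative of the metric along $\partial_y$ vanishes, while along $\partial_x$ its only nonzero component is the $yy$ component, equal to $f'(x)$. This agrees with the fact that this metric depends on $x$ but not on $y$.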


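A vector field $\mathbf{v}$ such that $\mathcal{L}_{\mathbf{v}} g = 0$ is called a Killing vector for the metric $g$. The Killing vector describes a special set of directions in space, such that the metric "remains constant" in these directions. It is only due to the presence of a certain symmetry that a metric could have even a single Killing vector field. (By contrast, the Levi-Civita connection gives $\nabla_{\mathbf{v}} g = 0$ for any vector field $\mathbf{v}$, but this is due to a definition of $\nabla$ rather than a special property of the metric $g$. Recall that the Lie derivative is defined independently of the metric.)

Example 1.6.7.2: Consider a 3-dimensional space with coordinates $\{x^1, x^2, x^3\} \equiv (x, y, z)$ and the metric
\[ g = \sum_{\mu,\nu=1}^{3} A_{\mu\nu}(x, z)\, dx^\mu dx^\nu, \]
where $A_{\mu\nu}(x, z)$ are some complicated functions of $x$ and $z$ (but not of $y$). Then it is intuitively clear that the metric $g$ is constant in the direction $\partial_y$. The mathematical formulation of this property is that $\mathbf{v} = \partial_y$ is the Killing vector for $g$.

We can verify this property explicitly. The value of $(\mathcal{L}_{\mathbf{v}} g)(\mathbf{a}, \mathbf{b})$ depends only on the values of $\mathbf{a}$ and $\mathbf{b}$ at each point $p$, but not on derivatives of $\mathbf{a}$ and $\mathbf{b}$. To simplify the calculations, let us choose the vector fields $\mathbf{a}$ and $\mathbf{b}$ such that
\[ [\mathbf{v}, \mathbf{a}] = 0, \qquad [\mathbf{v}, \mathbf{b}] = 0 \]
at a point $p$. In terms of the components $a^\mu$ and $b^\mu$, these conditions mean that
\[ \left. \frac{\partial a^\mu}{\partial y} \right|_p = 0, \qquad \left. \frac{\partial b^\mu}{\partial y} \right|_p = 0 \]
at a point $p$. It is always possible to adjust the derivatives of $a^\mu$ and $b^\mu$ at any single point $p$, without changing the values of the vectors at that point. Then we find, at the same point $p$,
\[ \begin{aligned} (\mathcal{L}_{\mathbf{v}} g)(\mathbf{a}, \mathbf{b}) &= \mathbf{v} \circ g(\mathbf{a}, \mathbf{b}) - g([\mathbf{v}, \mathbf{a}], \mathbf{b}) - g(\mathbf{a}, [\mathbf{v}, \mathbf{b}]) \\ &= \partial_y\, g(\mathbf{a}, \mathbf{b}) = \left. \partial_y \sum_{\mu,\nu} A_{\mu\nu}(x, z)\, a^\mu b^\nu \right|_p = 0. \end{aligned} \]
The same calculation goes through at every point $p$. Thus $\partial_y$ is a Killing vector for $g$. ∎

Example 1.6.7.3: Consider the Schwarzschild metric
\[ g = \left( 1 - \frac{2M}{r} \right) dt^2 - \left( 1 - \frac{2M}{r} \right)^{-1} dr^2 - r^2 \left( d\theta^2 + \sin^2\theta\, d\phi^2 \right). \]
This metric has the Killing vectors $\partial_t$ and $\partial_\phi$. One describes the corresponding symmetries of the spacetime by saying that the geometry is stationary (independent of time) and azimuthally symmetric (independent of the azimuthal angle $\phi$). Note that these vectors commute, $[\partial_t, \partial_\phi] = 0$, meaning that the symmetries are independent of each other. Of course, any linear combination of $\partial_t$ and $\partial_\phi$ (with constant coefficients) is also a Killing vector for $g$. ∎

Practice problem: Produce an explicit example showing that in general $\mathcal{L}_{\mathbf{v}} g \neq 0$ for a vector field $\mathbf{v}$ and a metric $g$.

Using Eq. (1.46), the Killing vector property $\mathcal{L}_{\mathbf{k}} g = 0$ can be rewritten in a more convenient way, as a differential equation for the vector $\mathbf{k}$. This equation is called the Killing equation,
\[ g(\nabla_{\mathbf{a}} \mathbf{k}, \mathbf{b}) + g(\mathbf{a}, \nabla_{\mathbf{b}} \mathbf{k}) = 0, \tag{1.48} \]
and is understood as an identity that should hold for all vectors $\mathbf{a}, \mathbf{b}$.

Remark: In the index notation, the Killing equation (1.48) is
\[ \nabla_\mu k_\nu + \nabla_\nu k_\mu = 0 \tag{1.49} \]
and involves the covariant components of the Killing vector $\mathbf{k}$. ∎

For an arbitrary vector field $\mathbf{x}$, we may consider the bilinear form $B_{(\mathbf{x})}$ defined by
\[ B_{(\mathbf{x})}(\mathbf{a}, \mathbf{b}) \equiv g(\nabla_{\mathbf{a}} \mathbf{x}, \mathbf{b}). \]
We call $B_{(\mathbf{x})}$ the distortion tensor corresponding to the vector field $\mathbf{x}$. In the index notation, the distortion tensor is written as
\[ \left( B_{(\mathbf{x})} \right)_{\mu\nu} = \nabla_\mu x_\nu. \]
The Killing equation (1.48) can be easily restated in terms of the distortion tensor
\[ B_{(\mathbf{k})}(\mathbf{a}, \mathbf{b}) \equiv g(\nabla_{\mathbf{a}} \mathbf{k}, \mathbf{b}) \tag{1.50} \]
of the vector $\mathbf{k}$. Namely, $\mathbf{k}$ is a Killing vector iff its distortion tensor $B_{(\mathbf{k})}$ is antisymmetric, i.e. $B_{(\mathbf{k})}(\mathbf{a}, \mathbf{b}) = -B_{(\mathbf{k})}(\mathbf{b}, \mathbf{a})$. In other words, $B_{(\mathbf{k})}(\mathbf{a}, \mathbf{b})$ is a 2-form if $\mathbf{k}$ is a Killing vector. The following problem derives an explicit representation of this 2-form.

Practice problem: Using Eqs. (1.43) and (1.47), show that
\[ B_{(\mathbf{k})}(\mathbf{a}, \mathbf{b}) \equiv g(\nabla_{\mathbf{a}} \mathbf{k}, \mathbf{b}) = \frac{\mathbf{a} \circ g(\mathbf{k}, \mathbf{b}) - \mathbf{b} \circ g(\mathbf{k}, \mathbf{a}) - g(\mathbf{k}, [\mathbf{a}, \mathbf{b}])}{2} \]
if $\mathbf{k}$ is a Killing vector. Then prove that the 2-form $B_{(\mathbf{k})}$ is equivalently represented as
\[ B_{(\mathbf{k})} = \frac{1}{2}\, d(\hat g \mathbf{k}), \]
where $d$ is the exterior differential and $\hat g \mathbf{k}$ is the 1-form defined by $(\hat g \mathbf{k}) \circ \mathbf{v} \equiv g(\mathbf{k}, \mathbf{v})$.
Sketch of a solution: Three of the six terms in Eq. (1.43) cancel because of Eq. (1.47). The rest follows by using the formula (1.21). ∎

1.6.8 *Koszul formula and the Lie derivative

In Sec. 1.6.6 we have derived the Koszul formula (1.43) by brute force, starting from the expression $g(\nabla_{\mathbf{x}} \mathbf{y}, \mathbf{z})$ and using the defining properties of the Levi-Civita connection $\nabla$. A shorter derivation and a more elegant-looking formula can be found using the following trick.

We will have a formula for $\nabla$ if we can compute the distortion tensor $B_{(\mathbf{x})}$ of an arbitrary vector field $\mathbf{x}$. The trick is to consider the symmetric and the antisymmetric parts of $B_{(\mathbf{x})}$ separately. Let us first consider the symmetric part,
\[ \begin{aligned} B_{(\mathbf{x})}(\mathbf{a}, \mathbf{b}) + B_{(\mathbf{x})}(\mathbf{b}, \mathbf{a}) &= g(\nabla_{\mathbf{a}} \mathbf{x}, \mathbf{b}) + g(\nabla_{\mathbf{b}} \mathbf{x}, \mathbf{a}) \\ &= g(\nabla_{\mathbf{x}} \mathbf{a} - \mathcal{L}_{\mathbf{x}} \mathbf{a}, \mathbf{b}) + g(\mathbf{a}, \nabla_{\mathbf{x}} \mathbf{b} - \mathcal{L}_{\mathbf{x}} \mathbf{b}) \\ &= \mathcal{L}_{\mathbf{x}}\, g(\mathbf{a}, \mathbf{b}) - g(\mathcal{L}_{\mathbf{x}} \mathbf{a}, \mathbf{b}) - g(\mathbf{a}, \mathcal{L}_{\mathbf{x}} \mathbf{b}) \\ &= (\mathcal{L}_{\mathbf{x}} g)(\mathbf{a}, \mathbf{b}). \end{aligned} \]
Now consider the antisymmetric part,
\[ \begin{aligned} B_{(\mathbf{x})}(\mathbf{a}, \mathbf{b}) - B_{(\mathbf{x})}(\mathbf{b}, \mathbf{a}) &= g(\nabla_{\mathbf{a}} \mathbf{x}, \mathbf{b}) - g(\nabla_{\mathbf{b}} \mathbf{x}, \mathbf{a}) \\ &= \mathbf{a} \circ g(\mathbf{x}, \mathbf{b}) - g(\mathbf{x}, \nabla_{\mathbf{a}} \mathbf{b}) - \mathbf{b} \circ g(\mathbf{x}, \mathbf{a}) + g(\mathbf{x}, \nabla_{\mathbf{b}} \mathbf{a}) \\ &= \mathbf{a} \circ g(\mathbf{x}, \mathbf{b}) - \mathbf{b} \circ g(\mathbf{x}, \mathbf{a}) - g(\mathbf{x}, [\mathbf{a}, \mathbf{b}]). \end{aligned} \]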


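Comparing with Eq. (1.21) and denoting by $\hat g \mathbf{x}$ the 1-form
\[ (\hat g \mathbf{x}) \circ \mathbf{a} \equiv g(\mathbf{x}, \mathbf{a}), \]
we find
\[ B_{(\mathbf{x})}(\mathbf{a}, \mathbf{b}) - B_{(\mathbf{x})}(\mathbf{b}, \mathbf{a}) = (d \hat g \mathbf{x})(\mathbf{a}, \mathbf{b}). \]
Finally, we restore $B_{(\mathbf{x})}$ as a half-sum of its symmetric and antisymmetric parts. Thus, we can rewrite the Koszul formula more concisely as follows,
\[ B_{(\mathbf{x})} = \frac{1}{2}\, d(\hat g \mathbf{x}) + \frac{1}{2}\, \mathcal{L}_{\mathbf{x}} g. \tag{1.51} \]
Now it becomes clear why the covariant derivative of a Killing vector $\mathbf{k}$ yields an antisymmetric 2-form $B_{(\mathbf{k})} = \frac{1}{2}\, d \hat g \mathbf{k}$.

Remark: In the index notation, the formula (1.51) is a reinterpretation of the trivial identity
\[ \nabla_\mu k_\nu = \frac{1}{2} \left( \nabla_\mu k_\nu - \nabla_\nu k_\mu \right) + \frac{1}{2} \left( \nabla_\mu k_\nu + \nabla_\nu k_\mu \right). \]
The first term represents the components of the exterior differential of the 1-form $k_\mu$, while the second term resembles the Killing equation (1.49). ∎

Practice problem: Derive a generalization of the formula (1.51) in case of a connection $\nabla$ that has a given nonzero torsion tensor $T(\mathbf{u}, \mathbf{v})$ but is still compatible with the metric $g$.
Hint: Use Eq. (1.41) and follow the derivation of Eq. (1.51).
Answer:
\[ \begin{aligned} 2 B_{(\mathbf{x})}(\mathbf{a}, \mathbf{b}) = {} & \left[ d(\hat g \mathbf{x}) + \mathcal{L}_{\mathbf{x}} g \right](\mathbf{a}, \mathbf{b}) \\ & + g(T(\mathbf{x}, \mathbf{a}), \mathbf{b}) + g(T(\mathbf{x}, \mathbf{b}), \mathbf{a}) - g(T(\mathbf{a}, \mathbf{b}), \mathbf{x}). \end{aligned} \]
∎

1.6.9 Divergence of a vector field

The divergence of a vector field $\mathbf{u}$ is a number characterizing the change in the volume of space carried by the flow of $\mathbf{u}$.

To be definite, let us consider a four-dimensional manifold $M$. Let $\mathbf{u}$ be a given vector field on $M$, and consider the flow lines of $\mathbf{u}$ as curves parameterized by a parameter $\tau$. We may select a region $V \subset M$ at an initial value $\tau = 0$, and let the flow of $\mathbf{u}$ transform this region to a different region $V(\tau)$. Such a $\tau$-dependent region of space is called comoving with the flow of $\mathbf{u}$.

It is more convenient to consider an infinitesimal volume, or a volume element. A 4-volume element of the comoving volume can be thought of as a parallelepiped spanned by the tangent vector $\mathbf{u}|_p$ and by three vectors connecting the point $p$ to nearby points on the congruence corresponding to the same value of the curve parameter $\tau$. Thus we need to choose three connecting vectors $\mathbf{a}, \mathbf{b}, \mathbf{c}$ for the field $\mathbf{u}$. Then we would like to compute the rate of change of the volume element $V \equiv \varepsilon(\mathbf{u}, \mathbf{a}, \mathbf{b}, \mathbf{c})$, where $\varepsilon$ is the Levi-Civita tensor. The result will be the derivative of $V$ along the flow, that is, $\mathcal{L}_{\mathbf{u}} V$.

If the field $\mathbf{u}$ is everywhere timelike and normalized by the condition $g(\mathbf{u}, \mathbf{u}) = 1$, we may interpret $\tau$ as the proper time measured by observers along the flow lines. Then the connecting vectors will correspond to comoving spatial directions measured at a fixed time in the reference frame of an observer. So the derivative $\mathcal{L}_{\mathbf{u}} V$ will be interpreted as the local rate of change of the comoving 3-volume of space. (The 3-volume is completed to a 4-volume by the unit vector $\mathbf{u}$.) However, for the present calculation we do not need to assume that $\mathbf{u}$ is timelike.

Since
\[ \mathcal{L}_{\mathbf{u}} \mathbf{a} = \mathcal{L}_{\mathbf{u}} \mathbf{b} = \mathcal{L}_{\mathbf{u}} \mathbf{c} = \mathcal{L}_{\mathbf{u}} \mathbf{u} = 0, \]
the quantity $\mathcal{L}_{\mathbf{u}} V$ can be written as
\[ \mathcal{L}_{\mathbf{u}} V = \mathcal{L}_{\mathbf{u}}\, \varepsilon(\mathbf{u}, \mathbf{a}, \mathbf{b}, \mathbf{c}) = (\mathcal{L}_{\mathbf{u}} \varepsilon)(\mathbf{u}, \mathbf{a}, \mathbf{b}, \mathbf{c}). \]
The tensor $\mathcal{L}_{\mathbf{u}} \varepsilon$ is a totally antisymmetric tensor (4-form), and so it must be proportional to $\varepsilon$ itself. Hence we must have $\mathcal{L}_{\mathbf{u}} \varepsilon = f \varepsilon$, where $f$ is some scalar function. This function depends on the field $\mathbf{u}$ and is (by definition) called the divergence of $\mathbf{u}$, denoted $\mathrm{div}\, \mathbf{u}$.

Statement 1.6.9.1: The quantity $\mathrm{div}\, \mathbf{u}$ can be expressed through the Levi-Civita connection $\nabla$ and an orthonormal frame $\{\mathbf{e}_a\}$, such that $g(\mathbf{e}_a, \mathbf{e}_b) = \eta_{ab}$, by the formulas
\[ \begin{aligned} \mathrm{div}\, \mathbf{u} &= \sum_a \eta_{aa}\, g(\mathbf{e}_a, \nabla_{\mathbf{e}_a} \mathbf{u}) \\ &= \sum_a \eta_{aa}\, g(\mathbf{e}_a, [\mathbf{e}_a, \mathbf{u}]). \end{aligned} \]
In the index notation, the divergence can be expressed as
\[ \mathrm{div}\, \mathbf{u} = \nabla_\mu u^\mu \equiv u^\mu{}_{;\mu}. \]

Proof of Statement 1.6.9.1: Denote by $\{\theta^a\}$ the orthonormal basis of 1-forms dual to $\{\mathbf{e}_a\}$. We can then compute
\[ \begin{aligned} \mathcal{L}_{\mathbf{u}} \varepsilon &= \mathcal{L}_{\mathbf{u}} \left( \theta^0 \wedge \theta^1 \wedge \theta^2 \wedge \theta^3 \right) \\ &= \left( \mathcal{L}_{\mathbf{u}} \theta^0 \right) \wedge \theta^1 \wedge \theta^2 \wedge \theta^3 + \theta^0 \wedge \left( \mathcal{L}_{\mathbf{u}} \theta^1 \right) \wedge \theta^2 \wedge \theta^3 + \ldots, \end{aligned} \]
where we omitted similar terms involving $\theta^2$ and $\theta^3$. We now consider the first term above. The 1-form $\mathcal{L}_{\mathbf{u}} \theta^0$ can be expressed as a linear combination of the basis 1-forms $\{\theta^a\}$,
\[ \mathcal{L}_{\mathbf{u}} \theta^0 = \sum_a \left( \left( \mathcal{L}_{\mathbf{u}} \theta^0 \right) \circ \mathbf{e}_a \right) \theta^a. \]
Due to antisymmetry of $\wedge$ within $(\mathcal{L}_{\mathbf{u}} \theta^0) \wedge \theta^1 \wedge \theta^2 \wedge \theta^3$, only the term proportional to $\theta^0$ survives, so
\[ \left( \mathcal{L}_{\mathbf{u}} \theta^0 \right) \wedge \theta^1 \wedge \theta^2 \wedge \theta^3 = \left( \left( \mathcal{L}_{\mathbf{u}} \theta^0 \right) \circ \mathbf{e}_0 \right) \theta^0 \wedge \theta^1 \wedge \theta^2 \wedge \theta^3. \]
Since $\theta^0 \circ \mathbf{e}_0 = 1$, we can simplify $(\mathcal{L}_{\mathbf{u}} \theta^0) \circ \mathbf{e}_0$ as follows,
\[ \begin{aligned} \left( \mathcal{L}_{\mathbf{u}} \theta^0 \right) \circ \mathbf{e}_0 &= \mathcal{L}_{\mathbf{u}} \left( \theta^0 \circ \mathbf{e}_0 \right) - \theta^0 \circ \mathcal{L}_{\mathbf{u}} \mathbf{e}_0 = \theta^0 \circ \left( \nabla_{\mathbf{e}_0} \mathbf{u} - \nabla_{\mathbf{u}} \mathbf{e}_0 \right) \\ &= \eta_{00}\, g(\mathbf{e}_0, \nabla_{\mathbf{e}_0} \mathbf{u} - \nabla_{\mathbf{u}} \mathbf{e}_0) = \eta_{00}\, g(\mathbf{e}_0, \nabla_{\mathbf{e}_0} \mathbf{u}), \end{aligned} \]
where we used the property
\[ g(\mathbf{e}_0, [\mathbf{e}_0, \mathbf{u}]) = g(\mathbf{e}_0, \nabla_{\mathbf{e}_0} \mathbf{u}), \]
which is due to
\[ g(\mathbf{e}_0, \mathbf{e}_0) = \mathrm{const}, \qquad g(\mathbf{e}_0, \nabla_{\mathbf{u}} \mathbf{e}_0) = \frac{1}{2}\, \mathbf{u} \circ g(\mathbf{e}_0, \mathbf{e}_0) = 0. \]
Thus
\[ \left( \mathcal{L}_{\mathbf{u}} \theta^0 \right) \wedge \theta^1 \wedge \theta^2 \wedge \theta^3 = \eta_{00}\, g(\mathbf{e}_0, \nabla_{\mathbf{e}_0} \mathbf{u})\, \varepsilon. \]
Analogous simplifications are performed for other terms, and so we obtain the first required formula. Note that
\[ g(\mathbf{e}_a, \nabla_{\mathbf{u}} \mathbf{e}_a) = 0 \]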

due to the normalization of $\mathbf{e}_a$. Thus
\[ g(\mathbf{e}_a, \nabla_{\mathbf{e}_a}\mathbf{u}) = g(\mathbf{e}_a, [\mathbf{e}_a, \mathbf{u}]). \]
The formula for $\operatorname{div}\mathbf{u}$ in the index notation can now be derived by expressing the orthonormal basis $\{\mathbf{e}_a\}$ through the coordinate basis $\{\partial_\mu\}$. Suppose that $e_a^\mu$ are the components of the orthonormal frame, so that
\[ \mathbf{e}_a = e_a^\mu \partial_\mu. \]
Then in the index notation we have
\[ g(\mathbf{e}_a, \nabla_{\mathbf{e}_a}\mathbf{u}) = g_{\mu\nu}\, e_a^\mu e_a^\lambda\, u^\nu{}_{;\lambda}. \]
Thus
\[ \operatorname{div}\mathbf{u} = \sum_a \eta_{aa}\, g_{\mu\nu}\, e_a^\mu e_a^\lambda\, u^\nu{}_{;\lambda}. \]
On the other hand, Eq. (1.38) gives
\[ \sum_a \eta_{aa}\, e_a^\mu e_a^\lambda = g^{\mu\lambda}. \]
Hence, $\operatorname{div}\mathbf{u} = g_{\mu\nu} g^{\mu\lambda} u^\nu{}_{;\lambda} = u^\nu{}_{;\nu}$. ■

Since $\mathcal{L}_{\mathbf{u}}\varepsilon = (\operatorname{div}\mathbf{u})\,\varepsilon$, our final result is
\[ \mathcal{L}_{\mathbf{u}} V = (\operatorname{div}\mathbf{u})\, \varepsilon(\mathbf{u}, \mathbf{a}, \mathbf{b}, \mathbf{c}) = (\operatorname{div}\mathbf{u})\, V. \]
Thus the quantity $\operatorname{div}\mathbf{u}$ is the relative rate of change of the volume carried by the flow of $\mathbf{u}$.

The divergence of a vector field can also be expressed in terms of the Hodge star operation.

Statement 1.6.9.2: If $\eta_{ab}$ is the canonical form of the metric, the divergence of a vector field $\mathbf{u}$ is
\[ \operatorname{div}\mathbf{u} = (\det \eta_{ab}) * d * g\mathbf{u}. \]
Remark: This expression for the divergence will be rarely useful in our calculations.

Proof of Statement 1.6.9.2: Since $\operatorname{div}\mathbf{u}$ is a scalar, we have $* \operatorname{div}\mathbf{u} = \varepsilon \operatorname{div}\mathbf{u}$, so it is sufficient to show that
\[ \varepsilon \operatorname{div}\mathbf{u} = d * g\mathbf{u}. \]
Since $g\mathbf{u}$ is a 1-form, we have
\[ * g\mathbf{u} = \iota_{\mathbf{u}}\varepsilon. \]
Using the property $d\varepsilon = 0$ and the Cartan homotopy formula (1.22), we compute
\[ d * g\mathbf{u} = d\iota_{\mathbf{u}}\varepsilon = \left(\mathcal{L}_{\mathbf{u}} - \iota_{\mathbf{u}} d\right)\varepsilon = \mathcal{L}_{\mathbf{u}}\varepsilon \equiv (\operatorname{div}\mathbf{u})\,\varepsilon. \]
■

1.7 Calculations in index-free notation

The index-free notation has its advantages and disadvantages, in comparison with the index notation.

• Index-free notation emphasizes geometric operations, such as the commutator $[\mathbf{a}, \mathbf{b}]$ or scalar product $g(\mathbf{a}, \mathbf{b})$, that would otherwise remain hidden under a mass of indices. For instance, the Bianchi identities are clearly seen to be consequences of the Jacobi identity and the assumptions about the Levi-Civita connection. Also, the geometric relations between tensors are shown explicitly, rather than encoded in the positions of indices. Thus, calculations in index-free notation help develop geometric intuition rather than merely an agility with manipulating indices. In the index notation, the same calculations appear to consist of certain lucky manipulations of indices (the so-called "juggling of indices"). It remains unclear how to guess the correct sequence of index manipulations that would be needed for a particular calculation.

• Index-free notation is in many cases more concise than the index notation, since one does not need to write indices next to each letter.

• In the index-free approach, every object is defined through geometric operations, so it is impossible to introduce a non-tensorial quantity. For instance, the non-tensorial Christoffel symbol $\Gamma^\lambda_{\mu\nu}$ in its usual form (1.44) is simply undefined; instead one might define a "Christoffel tensor" $\Gamma$, which is a transformation-valued 1-form representing the difference between two given affine connections $\nabla$ and $\tilde\nabla$,
\[ \Gamma(\mathbf{u})\mathbf{v} \equiv \tilde\nabla_{\mathbf{u}}\mathbf{v} - \nabla_{\mathbf{u}}\mathbf{v}. \]
Within the index notation, one frequently uses non-tensor quantities and calculates their components in a convenient coordinate system, and then one needs to check that the results are tensors. Operations with non-tensor quantities obscure the logic of a derivation and may lead to hard-to-spot errors.

• Index-free notation appears more abstract, whereas expressions in the index notation can be easily interpreted as just arrays of numbers.

• Index-free notation is unwieldy if complicated contractions or symmetrizations are applied to tensors of high rank. For instance, an expression such as $A^{\mu[\alpha}{}_{;\mu} B^{\beta\gamma]}$ is difficult to manipulate in the index-free notation.

On the whole, it appears that the index-free notation is more suitable for "generic" or "abstract" calculations, such as definitions of new quantities or derivations of general properties of tensors. In calculations involving complicated tensor contractions and symmetrizations, the index notation is better. For specific calculations, e.g. when components of a certain tensor are known in particular coordinates, the index notation is unavoidable.

Perhaps, the index notation should be studied first, because a certain familiarity with the index notation seems to be helpful for learning more abstract material. However, I feel that learning to think in the index-free notation is also helpful, if only for the fact that the entire mathematical literature has been using exclusively index-free notation for several decades.

1.7.1 Abstract index notation

It is somewhat cumbersome to use the index-free notation when dealing with tensors of higher rank. For example, suppose $A$ is a transformation-valued one-form such that $A(\mathbf{v})$ is a linear transformation in the tangent space, i.e. $A(\mathbf{v})\mathbf{w}$ is a
vector. Then $\nabla A$ is a transformation-valued bilinear form that acts as
\[ (\nabla A)(\mathbf{u}, \mathbf{v})\mathbf{w} \equiv \left[\nabla_{\mathbf{u}} A\right](\mathbf{v})\mathbf{w}. \]
Note that we have some difficulty in making it clear that $(\nabla A)(\mathbf{u}, \mathbf{v})$ is a derivative in the direction of $\mathbf{u}$ rather than $\mathbf{v}$, and that only $A$ is to be differentiated but not $\mathbf{v}$ or $\mathbf{w}$. We could write
\[ (\nabla A)(\mathbf{u}, \mathbf{v})\mathbf{w} \equiv \nabla_{\mathbf{u}}\left[A(\mathbf{v})\mathbf{w}\right] - A(\nabla_{\mathbf{u}}\mathbf{v})\mathbf{w} - A(\mathbf{v})\nabla_{\mathbf{u}}\mathbf{w}, \]
but manipulating such expressions is a rather cumbersome affair. In such cases, it is more convenient to write everything in the index notation, e.g.
\[ (\nabla A)(\mathbf{u}, \mathbf{v})\mathbf{w} \equiv u^\alpha v^\beta w^\gamma \nabla_\alpha A^\lambda{}_{\beta\gamma} \equiv u^\alpha v^\beta w^\gamma A^\lambda{}_{\beta\gamma;\alpha}. \]
This notation clearly shows that the vector $\mathbf{u}$ supplies the direction for the derivative, and the vectors $\mathbf{v}$ and $\mathbf{w}$ are contracted with the tensor $A$, but only $A$ is differentiated. However, when using the index notation we do not actually need to talk about components of the vectors in a basis; in fact, we have not really chosen any explicit basis so far. In most cases, the indices are actually used only as labels indicating that vectors and tensors are contracted with each other in a particular order. This is the idea of the abstract index notation introduced by R. Penrose as a re-interpretation of the familiar component notation.⁹

⁹ Penrose wrote in [25]: "What I shall present here is an entirely frame-independent algebra which allows one to calculate with indexed quantities exactly as before (but now with a clear conscience!)..." The "clear conscience" referred to is the absence of a coordinate system. However, this conscience is not really clear if one still uses non-tensor quantities such as $\Gamma^\lambda_{\mu\nu}$ or performs calculations in specially chosen coordinate systems.

The abstract index notation can be summarized as follows: An index, such as $\alpha$ in $v^\alpha$ and $u_\alpha$, does not take values 0, 1, 2, ..., but is merely a label indicating that $\mathbf{v}$ is a vector while $u$ is a 1-form. Repeated indices, e.g. $v^\alpha u_\alpha$, do not mean summation but instead indicate that a certain vector is being contracted with a certain 1-form. No particular basis or coordinate system is chosen or implied, and one is careful to perform only well-defined geometric operations on all quantities. (Geometric operations are the tensor product, the contraction of tensors, the Lie derivative, and the covariant derivative. The "ordinary" coordinate derivative $\partial/\partial x^\mu$ is disallowed.) The results of such calculations may be interpreted either as coordinate-free expressions or as component expressions valid in any coordinate system. Strictly speaking, the abstract index notation disallows non-tensorial quantities such as the Christoffel symbol $\Gamma^\lambda_{\mu\nu}$, operations with individual components of tensors such as $R_{44}$, or an implicit choice of a coordinate system (e.g. calculations in locally inertial coordinates).

The approach taken in this book is coordinate-free and the notation is almost everywhere index-free. However, we do use the abstract index notation when it is significantly more convenient. For instance, it is straightforward to express complicated tensor contractions such as $A_{\alpha\beta;\gamma} B^{\beta\gamma}$ in the index notation, but cumbersome without indices.

Remark: I would like to emphasize that the index notation such as $\nabla_\alpha A_\beta \equiv A_{\beta;\alpha}$ does not imply that the covariant derivative is applied to each component of $A$ separately. Rather, this notation refers to the components of the higher-rank tensor $(\nabla A)$. For instance, the component $\nabla_1 A_{112}$ is not equal to an operator $\nabla_1$ acting on the component $A_{112}$ of the tensor $A$. Rather, the notation $\nabla_1 A_{112}$ stands for the component $(\nabla A)_{1112}$ of the tensor $\nabla A$. If (for example) $A_{\alpha\beta} = A_{\beta\alpha}$ then the obvious property $A_{\alpha\beta;\gamma} = A_{\beta\alpha;\gamma}$ is not automatically satisfied (as it would be if $A_{\alpha\beta;\gamma}$ were a component-wise derivative of $A_{\alpha\beta}$). This property must be derived as a consequence of the properties of $\nabla$. The symbol $\nabla_\alpha$ is not a collection of fixed tensors $\nabla_0, \nabla_1, \nabla_2, \nabla_3$; rather, $\nabla_\alpha$ represents different operations when applied to different tensors. Therefore, an abstract index expression such as $A_{\alpha\beta;\gamma}$ must be interpreted and used with care.

1.7.2 Converting expressions into index-free notation

In this section we develop the methods for converting indexed expressions into an index-free form.

The correspondence between the index notation and the index-free notation is established using the basis vectors $\mathbf{e}_\alpha \equiv \partial/\partial x^\alpha$, by assuming fixed values of all the indices. For example, consider a covector (1-form) $A$. It is represented in the index notation by the symbol $A_\alpha$, while $A_{\alpha;\beta}$ means the $(\alpha, \beta)$-component of the rank (0,2) tensor $\nabla A$. For fixed $\alpha$ and $\beta$, the component $A_{\alpha;\beta}$ is the number defined by
\[ A_{\alpha;\beta} \equiv \nabla_\beta A_\alpha \equiv (\nabla A)_{\alpha\beta} = \left(\nabla_{\mathbf{e}_\beta} A\right) \circ \mathbf{e}_\alpha. \]
An equation written in the index notation, such as
\[ A_{\alpha;\beta} + g_{\alpha\beta} - 2A_{\beta;\alpha} = 0, \tag{1.52} \]
is converted to the index-free notation by first rewriting it as
\[ \nabla_\beta A_\alpha + g_{\alpha\beta} - 2\nabla_\alpha A_\beta = 0, \]
where we introduced the symbol $\nabla$ explicitly, and then by fixing $\alpha$, $\beta$ and inserting basis vectors $\{\mathbf{e}_\alpha\}$, thus obtaining
\[ \left(\nabla_{\mathbf{e}_\beta} A\right) \circ \mathbf{e}_\alpha + g(\mathbf{e}_\alpha, \mathbf{e}_\beta) - 2\left(\nabla_{\mathbf{e}_\alpha} A\right) \circ \mathbf{e}_\beta = 0. \]
However, we expect that such tensor relations can be expressed in a geometric way, i.e. as statements about arbitrary vectors, without using a particular basis $\{\mathbf{e}_\alpha\}$. Indeed, this can be done with a little work.

The basic rule of converting indexed expressions to the index-free notation is to contract each free index with an arbitrary vector. The resulting expression will then depend on a certain number of arbitrary vectors but will be independent of the choice of basis. Consider Eq. (1.52) as an example. Since Eq. (1.52) has two free indices ($\alpha$ and $\beta$), let us contract it with arbitrary vectors $\mathbf{a}$ and $\mathbf{b}$ having index representations $a^\beta$ and $b^\alpha$:
\[ a^\beta b^\alpha A_{\alpha;\beta} + a^\beta b^\alpha g_{\alpha\beta} - 2a^\beta b^\alpha A_{\beta;\alpha} = 0. \]
Then we rewrite covariant derivatives explicitly through the symbol $\nabla$:
\[ a^\beta b^\alpha \nabla_\beta A_\alpha + a^\beta b^\alpha g_{\alpha\beta} - 2a^\beta b^\alpha \nabla_\alpha A_\beta = 0. \]
Finally, this equation is rewritten in the index-free notation as
\[ \left(\nabla_{\mathbf{a}} A\right) \circ \mathbf{b} + g(\mathbf{a}, \mathbf{b}) - 2\left(\nabla_{\mathbf{b}} A\right) \circ \mathbf{a} = 0. \]
Note that $(\nabla_{\mathbf{a}} A)$ is a 1-form acting on vectors. For the purpose of facilitating further calculations with the above expression, it might be useful to rewrite the term $(\nabla_{\mathbf{a}} A) \circ \mathbf{b}$ as
\[ \left(\nabla_{\mathbf{a}} A\right) \circ \mathbf{b} = \nabla_{\mathbf{a}}(A \circ \mathbf{b}) - A \circ (\nabla_{\mathbf{a}}\mathbf{b}) = \nabla_{\mathbf{a}}(A \circ \mathbf{b}) - A \circ \left([\mathbf{a}, \mathbf{b}] + \nabla_{\mathbf{b}}\mathbf{a}\right) \]
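or in some other form. In any case, we now have an equation involving arbitrary vectors $\mathbf{a}$, $\mathbf{b}$ and the known 1-form $A$, and we are not using any basis vectors.

A somewhat more complicated example is an indexed expression involving repeated covariant derivatives, such as
\[ v^\mu{}_{;\beta\alpha} - v^\mu{}_{;\alpha\beta} = \nabla_\alpha\nabla_\beta v^\mu - \nabla_\beta\nabla_\alpha v^\mu = 0. \tag{1.53} \]
Contracting Eq. (1.53) with arbitrary vectors $a^\alpha$ and $b^\beta$, we find
\[ a^\alpha b^\beta \nabla_\alpha\nabla_\beta v^\mu - a^\alpha b^\beta \nabla_\beta\nabla_\alpha v^\mu = 0. \tag{1.54} \]
We would like to rewrite this expression in the index-free notation. But now we encounter a difficulty: Nested covariant derivatives, such as $\nabla_{\mathbf{a}}\nabla_{\mathbf{b}}\mathbf{v}$, contain also derivatives of $\mathbf{b}$, which are not present in the expression (1.54). Therefore we first rewrite
\[ a^\alpha b^\beta \nabla_\alpha\nabla_\beta v^\mu = a^\alpha \nabla_\alpha\left(b^\beta \nabla_\beta v^\mu\right) - \left(a^\alpha \nabla_\alpha b^\beta\right) \nabla_\beta v^\mu = \nabla_{\mathbf{a}}\nabla_{\mathbf{b}}\mathbf{v} - \nabla_{(\nabla_{\mathbf{a}}\mathbf{b})}\mathbf{v}, \]
and similarly for the other term in Eq. (1.54), and then finally Eq. (1.53) becomes
\[ \left(\nabla_{\mathbf{a}}\nabla_{\mathbf{b}}\mathbf{v} - \nabla_{(\nabla_{\mathbf{a}}\mathbf{b})}\mathbf{v}\right) - \left(\nabla_{\mathbf{b}}\nabla_{\mathbf{a}}\mathbf{v} - \nabla_{(\nabla_{\mathbf{b}}\mathbf{a})}\mathbf{v}\right) = [\nabla_{\mathbf{a}}, \nabla_{\mathbf{b}}]\mathbf{v} - \nabla_{[\mathbf{a},\mathbf{b}]}\mathbf{v} = 0. \]
In the last line, the first commutator is simply a shorthand notation,
\[ [\nabla_{\mathbf{a}}, \nabla_{\mathbf{b}}]\mathbf{v} \equiv \nabla_{\mathbf{a}}\nabla_{\mathbf{b}}\mathbf{v} - \nabla_{\mathbf{b}}\nabla_{\mathbf{a}}\mathbf{v}, \]
while the second commutator $[\mathbf{a}, \mathbf{b}] \equiv \mathcal{L}_{\mathbf{a}}\mathbf{b}$ is the commutator of vector fields.

Practice problem: Rewrite the following indexed expressions in an index-free manner:
\[ u^\alpha u^\beta h_{;\alpha\beta} = C_{\alpha\beta} v^\alpha g^{\beta\gamma} f_{,\gamma}; \]
\[ u_{\beta;\alpha} - u_{\alpha;\beta} = v_\alpha w_\beta - v_\beta w_\alpha; \]
\[ v^\beta u^\alpha{}_{;\beta} - u^\beta v^\alpha{}_{;\beta} = w_\beta w^{\beta;\alpha}; \]
where $u^\alpha$, $v^\alpha$, $w^\alpha$ are indexed representations of vector fields $\mathbf{u}$, $\mathbf{v}$, $\mathbf{w}$; $f$ and $h$ are scalar functions; and $C$ is a bilinear form $C(\mathbf{x}, \mathbf{y}) \equiv C_{\alpha\beta} x^\alpha y^\beta$.
Answers: $\nabla_{\mathbf{u}}\nabla_{\mathbf{u}} h - \nabla_{\nabla_{\mathbf{u}}\mathbf{u}} h = C(\mathbf{v}, g^{-1}df)$, $d(g\mathbf{u}) = (g\mathbf{v}) \wedge (g\mathbf{w})$, and $g(\mathbf{x}, [\mathbf{v}, \mathbf{u}]) = g(\mathbf{w}, \nabla_{\mathbf{x}}\mathbf{w})$, where $\mathbf{x}$ is arbitrary. ■

Note that if we used coordinate basis vectors $\partial_\alpha$ and $\partial_\beta$ instead of arbitrary vector fields $\mathbf{a}$ and $\mathbf{b}$, we would have $[\partial_\alpha, \partial_\beta] = 0$ and the term with the commutator would vanish. Thus we would have obtained a simpler formula
\[ \nabla_{\partial_\alpha}\nabla_{\partial_\beta}\mathbf{v} - \nabla_{\partial_\beta}\nabla_{\partial_\alpha}\mathbf{v} = 0, \]
which looks quite similar to the indexed expression (1.53). The simplification is due to the fact that the commutator term $\nabla_{[\mathbf{a},\mathbf{b}]}\mathbf{v}$ drops out. But the same simplification can be achieved without choosing the coordinate basis vectors. It suffices to assume that the vectors $\mathbf{a}$ and $\mathbf{b}$ commute. Since our expression is actually independent of the derivatives of $\mathbf{a}$ and $\mathbf{b}$, we can always choose commuting vector fields $\mathbf{a}$ and $\mathbf{b}$ in a neighborhood of any one point. This choice will be possible in every case when we convert an indexed expression into an index-free form, since any number of simultaneously commuting vector fields can be chosen at a point (see, for example, Statement 1.2.11.2). We will frequently use this method of simplification.

1.7.3 Index-free computations of trace

Traces of tensors can be computed in the index-free notation, although in many cases calculations are easier in the index notation. Nevertheless, it is possible to develop a sufficiently powerful formalism, so that the index notation is in principle never needed. In this section I explain the index-free approach to calculating the trace. Since the readers will certainly be more familiar with the index notation for traces, I will provide extensive comparisons between indexed and index-free approaches.

I begin by reviewing the index-free definition of the trace of a linear transformation. A transformation $T$ is expressed in a tensor form as
\[ T = \mathbf{a}_1 \otimes \omega_1 + \mathbf{a}_2 \otimes \omega_2 + ..., \tag{1.55} \]
where $\mathbf{a}_j$ are some vectors and $\omega_j$ are some covectors; by definition, any (1,1)-tensor can be decomposed in this way. Then the trace of $T$ is
\[ \operatorname{Tr} T = \omega_1 \circ \mathbf{a}_1 + \omega_2 \circ \mathbf{a}_2 + ... \tag{1.56} \]
However, this definition is not convenient in more complicated cases. For example, a tensor expressed in the index notation as $T^\lambda{}_{\alpha\beta}$ may be interpreted as a transformation-valued 1-form,
\[ \mathbf{x} \mapsto T(\mathbf{a})\mathbf{x}, \qquad (T(\mathbf{a})\mathbf{x})^\lambda \equiv T^\lambda{}_{\alpha\beta}\, a^\alpha x^\beta. \]
Suppose we would like to compute the trace of $T$, expressed in the index notation as $T^\lambda{}_{\lambda\beta}$. According to the definition (1.56), we first need to decompose $T^\lambda{}_{\alpha\beta}$ as a linear combination of tensor products, as in Eq. (1.55). This computation is cumbersome because many tensor products will be needed in a decomposition of $T^\lambda{}_{\alpha\beta}$. Also, such explicit decompositions usually require a choice of basis. For example, the identity operator can be decomposed as
\[ \hat{1} = \sum_{j=1}^{n} \mathbf{e}_j \otimes \theta^j, \]
where $\{\mathbf{e}_j\}$ is a basis and $\{\theta^j\}$ is the corresponding dual basis. (The above relation is called a decomposition of identity.) Calculations with such decompositions rapidly become unwieldy for higher-rank tensors, and so we will proceed by another route.

To be specific, let us show how one can compute the trace $T^\lambda{}_{\lambda\beta}$ in a basis-free manner. We first lower the index $\lambda$ to obtain $T_{\mu\alpha\beta} \equiv T^\lambda{}_{\alpha\beta}\, g_{\lambda\mu}$, which is in the index-free notation a trilinear form $T(\mathbf{a}, \mathbf{b}, \mathbf{c})$,
\[ T(\mathbf{a}, \mathbf{b}, \mathbf{c}) \equiv g(T(\mathbf{a})\mathbf{b}, \mathbf{c}). \]
Then we need to specify two of three arguments of $T$ which are to be contracted. For instance, the covector $T^\lambda{}_{\lambda\beta} \equiv T_{\mu\alpha\beta}\, g^{\mu\alpha}$ looks like the contraction of $T(\mathbf{a}, \mathbf{b}, \mathbf{c})$ with respect to the vectors $\mathbf{a}$ and $\mathbf{c}$. To make this relationship explicit, we use the following notation,
\[ \tau(\mathbf{b}) = \operatorname{Tr}_{(\mathbf{a},\mathbf{c})} T(\mathbf{a}, \mathbf{b}, \mathbf{c}). \tag{1.57} \]
This notation is somewhat verbose but explicit and unambiguous. The vectors $\mathbf{a}$, $\mathbf{c}$ do not enter the final object $\tau(\mathbf{b})$ and are merely auxiliary; they may be called mute vectors.

Thus we propose the following index-free definition of the trace of a multilinear tensor-valued function $T(\mathbf{a}, \mathbf{b}, \mathbf{c}, ...)$ with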
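respect to $\mathbf{a}$, $\mathbf{b}$: The tensor $T(\mathbf{a}, \mathbf{b}, \mathbf{c}, ...)$ is first written as a linear function of $\mathbf{a} \otimes \mathbf{b}$, i.e. $T(\mathbf{a} \otimes \mathbf{b}, \mathbf{c}, ...)$, and then the trace is found by substituting the inverse metric $g^{-1}$, which is a (2,0)-tensor, instead of the argument $\mathbf{a} \otimes \mathbf{b}$:
\[ \operatorname{Tr}_{(\mathbf{a},\mathbf{b})} T(\mathbf{a}, \mathbf{b}, \mathbf{c}, ...) = \operatorname{Tr}_{(\mathbf{a},\mathbf{b})} T(\mathbf{a} \otimes \mathbf{b}, \mathbf{c}, ...) \equiv T(g^{-1}, \mathbf{c}, ...). \]
The index notation for this trace would be $g^{\alpha\beta} T_{\alpha\beta\gamma...}$. In this way, the trace operation can be understood as a simple substitution $\mathbf{a} \otimes \mathbf{b} = g^{-1}$. An explicit notation for this substitution,
\[ \operatorname{Tr}_{(\mathbf{a},\mathbf{b})} T(\mathbf{a}, \mathbf{b}, \mathbf{c}, ...) \equiv \operatorname{Tr}_{\mathbf{a} \otimes \mathbf{b} = g^{-1}} T(\mathbf{a}, \mathbf{b}, \mathbf{c}, ...) \tag{1.58} \]
could be useful when one needs to specify explicitly that the metric $g$ is being used. For instance, one could then introduce several metrics $g$, $h$, ..., and compute traces with respect to these metrics. However, we will be working most of the time with one metric $g$, and so the verbose and explicit notation (1.58) will not be necessary.

Using the trace notation, we may now rewrite an arbitrary indexed expression in an index-free form. When an indexed expression contains a sum over a pair of "dummy" indices, we can first lower both indices (inserting an extra $g^{\alpha\beta}$) and then replace the $g^{\alpha\beta}$ by a pair of mute vectors.

Example: Let us rewrite the following equation,
\[ u_\lambda u^\alpha u^\beta w^\lambda{}_{;\beta\alpha} = C^\gamma{}_{\beta\gamma\delta}\, u^\beta u^\delta, \]
in the index-free notation. Using the techniques from Sec. 1.7.2, we rewrite the left-hand side as
\[ g(\mathbf{u}, \nabla_{\mathbf{u}}\nabla_{\mathbf{u}}\mathbf{w} - \nabla_{\nabla_{\mathbf{u}}\mathbf{u}}\mathbf{w}). \]
Turning now to the right-hand side, we note that it is only necessary to express the trace $C^\gamma{}_{\beta\gamma\delta}$, since all other contractions are "ordinary." The tensor $C^\alpha{}_{\beta\gamma\delta}$ can be interpreted as a transformation-valued function of two vectors,
\[ \mathbf{z} \mapsto C(\mathbf{x}, \mathbf{y})\mathbf{z}, \qquad [C(\mathbf{x}, \mathbf{y})\mathbf{z}]^\alpha \equiv C^\alpha{}_{\beta\gamma\delta}\, z^\beta x^\gamma y^\delta. \]
(In an actual calculation, such as that in the practice problem below, we may be given an explicit, index-free formula for the transformation $C(\mathbf{x}, \mathbf{y})\mathbf{z}$.) The trace $C^\gamma{}_{\beta\gamma\delta}$ is first converted into a contraction with $g^{\alpha\gamma}$,
\[ C^\gamma{}_{\beta\gamma\delta} = g^{\alpha\gamma} C_{\alpha\beta\gamma\delta}. \]
The tensor $C_{\alpha\beta\gamma\delta}$ is a function of four vectors, expressed through the original tensor $C$ as
\[ C_{\alpha\beta\gamma\delta}\, a^\alpha z^\beta x^\gamma y^\delta = g_{\alpha\lambda}\, a^\alpha C^\lambda{}_{\beta\gamma\delta}\, z^\beta x^\gamma y^\delta \equiv g(\mathbf{a}, C(\mathbf{x}, \mathbf{y})\mathbf{z}) \equiv C(\mathbf{a}, \mathbf{z}, \mathbf{x}, \mathbf{y}). \]
Thus we rewrite
\[ C^\gamma{}_{\beta\gamma\delta}\, u^\beta u^\delta = g^{\alpha\gamma} C_{\alpha\beta\gamma\delta}\, u^\beta u^\delta. \]
Finally, we introduce two mute vectors $\mathbf{a}$, $\mathbf{b}$ whose tensor product $\mathbf{a} \otimes \mathbf{b}$ will replace $g^{\alpha\gamma}$ in the contraction $g^{\alpha\gamma} C_{\alpha\beta\gamma\delta}$. The result of the replacement is
\[ g^{\alpha\gamma} C_{\alpha\beta\gamma\delta}\, u^\beta u^\delta \to a^\alpha b^\gamma C_{\alpha\beta\gamma\delta}\, u^\beta u^\delta = C(\mathbf{a}, \mathbf{u}, \mathbf{b}, \mathbf{u}) = g(\mathbf{a}, C(\mathbf{b}, \mathbf{u})\mathbf{u}). \]
The trace of this expression with respect to the mute vectors $\mathbf{a}$, $\mathbf{b}$ is written as
\[ \operatorname{Tr}_{(\mathbf{a},\mathbf{b})}\, g(\mathbf{a}, C(\mathbf{b}, \mathbf{u})\mathbf{u}) = C^\gamma{}_{\beta\gamma\delta}\, u^\beta u^\delta. \]
In this way, we are able to rewrite the initial indexed expression in an index-free manner. ■

To determine the trace such as that in Eq. (1.57) in an actual calculation, we need to have a concrete expression for the tensor $T$ in terms of other known vectors or tensors, and a procedure to implement the trace operation via an index-free calculation. I now describe such a procedure and then give examples.

Suppose we need to compute the trace of a tensor $T(\mathbf{x}, \mathbf{y}, \mathbf{z}, ...)$ with respect to $\mathbf{x}$, $\mathbf{y}$. One possibility to perform this calculation in the index-free notation is to use an explicit decomposition of the inverse metric $g^{-1}$ through tensor products of an orthonormal frame $\{\mathbf{e}_0, \mathbf{e}_1, \mathbf{e}_2, \mathbf{e}_3\}$,
\[ g^{-1} = \mathbf{e}_0 \otimes \mathbf{e}_0 - \mathbf{e}_1 \otimes \mathbf{e}_1 - \mathbf{e}_2 \otimes \mathbf{e}_2 - \mathbf{e}_3 \otimes \mathbf{e}_3, \tag{1.59} \]
which corresponds to the index notation
\[ g^{\alpha\beta} = e_0^\alpha e_0^\beta - e_1^\alpha e_1^\beta - e_2^\alpha e_2^\beta - e_3^\alpha e_3^\beta. \]
Since $T(\mathbf{x}, \mathbf{y}, \mathbf{z}, ...)$ is a bilinear form in $\mathbf{x}$ and $\mathbf{y}$, we can compute the trace $\operatorname{Tr}_{(\mathbf{x},\mathbf{y})} T(\mathbf{x}, \mathbf{y}, \mathbf{z}, ...)$ by substituting the decomposition (1.59) of the tensor $g^{-1}$ instead of the arguments $\mathbf{x}$, $\mathbf{y}$. Therefore
\[ \operatorname{Tr}_{(\mathbf{x},\mathbf{y})} T(\mathbf{x}, \mathbf{y}, \mathbf{z}, ...) = T(\mathbf{e}_0, \mathbf{e}_0, \mathbf{z}, ...) - \sum_{j=1}^{3} T(\mathbf{e}_j, \mathbf{e}_j, \mathbf{z}, ...). \tag{1.60} \]
Of course, the trace does not depend on the choice of the basis. For instance, we may also use a decomposition of the form
\[ g^{-1} = \sum_{j=1}^{n} \mathbf{u}_j \otimes \mathbf{v}_j, \tag{1.61} \]
where $\mathbf{u}_j$ and $\mathbf{v}_j$ are some suitable vectors that are not necessarily linearly independent or orthogonal (and $n \geq 4$). If we use the decomposition (1.61), we will have
\[ \operatorname{Tr}_{(\mathbf{x},\mathbf{y})} T(\mathbf{x}, \mathbf{y}, \mathbf{z}, ...) = \sum_{j=1}^{n} T(\mathbf{u}_j, \mathbf{v}_j, \mathbf{z}, ...). \]
However, it is somewhat inconvenient to have to choose particular vectors for a decomposition (1.59) or (1.61), especially since the result does not depend on the choice of these vectors. In many cases, a trace $\operatorname{Tr}_{(\mathbf{x},\mathbf{y})} T(\mathbf{x}, \mathbf{y}, ...)$ can be computed without selecting an explicit decomposition of the metric, especially when the tensor $T$ is given by an index-free expression involving fixed vectors, the metric $g$, and covariant derivatives. The following statement lists the basic building blocks for index-free trace calculations.

Statement 1.7.3.1: The trace operation has the following properties:

Linearity and distributivity:
\[ \operatorname{Tr}(A + B) = \operatorname{Tr} A + \operatorname{Tr} B, \]
\[ \operatorname{Tr}_{(\mathbf{a},\mathbf{b})} A(...)\, B(\mathbf{a}, \mathbf{b}, ...) = A(...) \operatorname{Tr}_{(\mathbf{a},\mathbf{b})} B(\mathbf{a}, \mathbf{b}, ...), \]
\[ \operatorname{Tr}_{(\mathbf{a},\mathbf{b})} \nabla_{\mathbf{c}} A(\mathbf{a}, \mathbf{b}, \mathbf{x}, ...) = \nabla_{\mathbf{c}} \operatorname{Tr}_{(\mathbf{a},\mathbf{b})} A(\mathbf{a}, \mathbf{b}, \mathbf{x}, ...). \]
Here $A$, $B$ are arbitrary tensors, $g$ is the metric, $P_1$, $P_2$ are transformations, i.e. (1,1)-tensors, and $\mathbf{a}, \mathbf{b}, ..., \mathbf{x}, ...$ are vectors.

Symmetry:
\[ \operatorname{Tr}_{(\mathbf{a},\mathbf{b})} A(\mathbf{a}, \mathbf{b}, \mathbf{x}, ...) = \operatorname{Tr}_{(\mathbf{a},\mathbf{b})} A(\mathbf{b}, \mathbf{a}, \mathbf{x}, ...), \]
\[ \operatorname{Tr}_{(\mathbf{a},\mathbf{b})}\, g(P_1 P_2 \mathbf{a}, \mathbf{b}) = \operatorname{Tr}_{(\mathbf{a},\mathbf{b})}\, g(P_2 P_1 \mathbf{a}, \mathbf{b}), \]
\[ \operatorname{Tr}_{(\mathbf{a},\mathbf{b})}\, g(P_1 \mathbf{a}, \mathbf{b}) = \operatorname{Tr}_{(\mathbf{a},\mathbf{b})}\, g(P_1^T \mathbf{a}, \mathbf{b}), \]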
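where we denote by $P^T$ the adjoint transformation to $P$: The transformation $P^T$ acts on $\mathbf{x}$ giving a vector $P^T\mathbf{x}$ that satisfies $g(P^T\mathbf{x}, \mathbf{y}) \equiv g(\mathbf{x}, P\mathbf{y})$ for arbitrary $\mathbf{y}$. (The adjoint transformation is represented in an orthogonal basis by a transposed matrix, hence the notation $P^T$.)

Relations involving the metric:
\[ \operatorname{Tr}_{(\mathbf{a},\mathbf{b})}\, g(\nabla_{\mathbf{a}}\mathbf{x}, \mathbf{b}) = \operatorname{div}\mathbf{x}; \]
\[ \operatorname{Tr}_{(\mathbf{a},\mathbf{b})}\, g(\mathbf{a}, \mathbf{b}) = 4, \]
where 4 is the dimension of spacetime.

The substitution property:
\[ \operatorname{Tr}_{(\mathbf{a},\mathbf{x})}\, g(\mathbf{a}, \mathbf{b})\,\mathbf{x} = \mathbf{b}, \]
or more generally,
\[ \operatorname{Tr}_{(\mathbf{a},\mathbf{x})}\, g(\mathbf{a}, \mathbf{b})\, A(\mathbf{x}, \mathbf{y}, \mathbf{z}, ...) = A(\mathbf{b}, \mathbf{y}, \mathbf{z}, ...) \]
for an arbitrary tensor-valued function $A(...)$ which is linear in $\mathbf{x}$.

Relations involving the Levi-Civita tensor $\varepsilon \equiv \theta^0 \wedge \theta^1 \wedge \theta^2 \wedge \theta^3$:
\[ \operatorname{Tr}_{(\mathbf{a}_1,\mathbf{b}_1)(\mathbf{a}_2,\mathbf{b}_2)}\, \varepsilon(\mathbf{a}_1, \mathbf{a}_2, \mathbf{x}, \mathbf{y})\, \varepsilon(\mathbf{b}_1, \mathbf{b}_2, \mathbf{x}', \mathbf{y}') = -2g(\mathbf{x}, \mathbf{x}')g(\mathbf{y}, \mathbf{y}') + 2g(\mathbf{x}, \mathbf{y}')g(\mathbf{y}, \mathbf{x}'); \]
\[ \operatorname{Tr}_{(\mathbf{a}_1,\mathbf{b}_1)(\mathbf{a}_2,\mathbf{b}_2)(\mathbf{a}_3,\mathbf{b}_3)}\, \varepsilon(\mathbf{a}_1, \mathbf{a}_2, \mathbf{a}_3, \mathbf{x})\, \varepsilon(\mathbf{b}_1, \mathbf{b}_2, \mathbf{b}_3, \mathbf{x}') = -6g(\mathbf{x}, \mathbf{x}'). \]
Hints: These properties are straightforward to derive using a tensor decomposition of the inverse metric $g^{-1}$ in an orthonormal basis. As a last resort, one could rewrite all expressions in the index notation.

Partial proof: Let us derive the substitution property and its generalization. To show that $\operatorname{Tr}_{(\mathbf{a},\mathbf{x})}\, g(\mathbf{a}, \mathbf{b})\,\mathbf{x} = \mathbf{b}$, we decompose $\mathbf{b} = \sum_j b_j \mathbf{e}_j$ in an orthonormal basis $\{\mathbf{e}_j\}$ and use Eq. (1.59), noting that $g(\mathbf{e}_j, \mathbf{b}) = \eta_{jj} b_j$:
\[ \operatorname{Tr}_{(\mathbf{a},\mathbf{x})}\, g(\mathbf{a}, \mathbf{b})\,\mathbf{x} = \sum_j \eta_{jj}\, g(\mathbf{e}_j, \mathbf{b})\,\mathbf{e}_j = \sum_j b_j \mathbf{e}_j = \mathbf{b}. \]
For an arbitrary tensor-valued multilinear function $A(...)$, we have $g(\mathbf{a}, \mathbf{b})\,A(\mathbf{x}, \mathbf{y}, ...) = A(g(\mathbf{a}, \mathbf{b})\,\mathbf{x}, \mathbf{y}, ...)$. Since we already derived the desired property for $g(\mathbf{a}, \mathbf{b})\,\mathbf{x}$, and since $A(...)$ is linear in its arguments, it follows that the same trace property holds also for $A(...)$. ■

Example: Let us compute the trace of the transformation $P\mathbf{x} \equiv \mathbf{x} - 2\mathbf{n}\, g(\mathbf{n}, \mathbf{x})$, where $\mathbf{n}$ is a given unit vector, $g(\mathbf{n}, \mathbf{n}) = 1$. First we define the bilinear form $P(\mathbf{x}, \mathbf{y}) \equiv g(P\mathbf{x}, \mathbf{y})$; then we compute its trace with respect to $\mathbf{x}$, $\mathbf{y}$:
\[ \operatorname{Tr} P \equiv \operatorname{Tr}_{(\mathbf{x},\mathbf{y})} P(\mathbf{x}, \mathbf{y}) = \operatorname{Tr}_{(\mathbf{x},\mathbf{y})} \left[g(\mathbf{x}, \mathbf{y}) - 2g(\mathbf{n}, \mathbf{x})g(\mathbf{n}, \mathbf{y})\right] = 4 - 2g(\mathbf{n}, \mathbf{n}) = 2. \]
■

Practice problem: For a fixed vector $\mathbf{u}$ in four-dimensional space, compute the trace
\[ \operatorname{Tr}_{(\mathbf{a},\mathbf{b})}\, g(\mathbf{a}, C(\mathbf{b}, \mathbf{u})\mathbf{u}), \]
where the transformation-valued 2-form $C$ is defined by
\[ C(\mathbf{x}, \mathbf{y})\mathbf{z} \equiv g(\mathbf{y}, \mathbf{z})\,\mathbf{x} - g(\mathbf{x}, \mathbf{z})\,\mathbf{y}. \]
Answer: $3g(\mathbf{u}, \mathbf{u})$. ■

The tensor under the trace may contain derivatives of mute vectors, as long as these derivatives can be moved out of the expression or when the derivatives ultimately cancel out. In other words, the expression under the trace should be a linear function of the mute vectors, not really involving their derivatives. Otherwise, the trace expression is undefined (meaningless). For instance, the expressions $\operatorname{Tr}_{(\mathbf{a},\mathbf{b})} \nabla_{\mathbf{a}}\mathbf{b}$ and $\operatorname{Tr}_{(\mathbf{a},\mathbf{b})}\, g(\mathbf{a}, [\mathbf{x}, \mathbf{b}])$ are meaningless because the result of a calculation according to Eq. (1.61) depends on the derivatives of the basis vector fields $\{\mathbf{e}_j\}$. Such meaningless expressions are never encountered in practical calculations (or through rewriting a well-defined indexed expression). On the other hand, the following expressions containing derivatives are well-defined,
\[ \operatorname{Tr}_{(\mathbf{a},\mathbf{b})} \nabla_{\mathbf{x}}\, g(\mathbf{a}, \nabla_{\mathbf{b}}\mathbf{y}) = \nabla_{\mathbf{x}} \operatorname{Tr}_{(\mathbf{a},\mathbf{b})}\, g(\mathbf{a}, \nabla_{\mathbf{b}}\mathbf{y}) = \nabla_{\mathbf{x}}(\operatorname{div}\mathbf{y}); \]
\[ \operatorname{Tr}_{(\mathbf{a},\mathbf{b})} \left(\nabla_{\mathbf{a}}\nabla_{\mathbf{b}}\mathbf{x} - \nabla_{\nabla_{\mathbf{a}}\mathbf{b}}\mathbf{x}\right) \equiv \Box\mathbf{x} \equiv g^{\alpha\beta}\nabla_\alpha\nabla_\beta\mathbf{x}. \]
The last line is the D'Alembert operator acting on a vector field $\mathbf{x}$. The trace is well-defined because the expression under the trace is actually independent of the derivatives of $\mathbf{a}$ and $\mathbf{b}$.

Computations with expressions of this kind can be simplified using the following trick. It is always possible to choose basis vector fields so that $\nabla_{\mathbf{e}_j}\mathbf{e}_k = 0$ for all $j$, $k$ at one point in spacetime. Hence, when computing a well-defined trace expression containing derivatives, one could simply assume that all the mute vectors are "constant vectors," in the sense that their (covariant) derivatives are all equal to zero. This assumption is also consistent with the property $\nabla g = 0$ and a decomposition (1.59) of the metric through a basis that enters the trace via Eq. (1.60). For instance, the D'Alembert operator can then be written more intuitively as follows,
\[ \Box\mathbf{x} = \operatorname{Tr}_{(\mathbf{a},\mathbf{b})} \nabla_{\mathbf{a}}\nabla_{\mathbf{b}}\mathbf{x}. \]

1.7.4 Summary of calculation rules

The formalism of coordinate-free, index-free calculations is sufficiently powerful to handle all situations involving vector fields and 1-forms, the metric, and covariant derivatives. (However, the index-free notation becomes cumbersome when one needs to manipulate tensors of higher rank.) In comparison with the index notation, this formalism is more transparent, emphasizes the geometric content of a calculation, and frequently helps to guess the necessary sequence of manipulations that lead to a desired result. I summarize the rules of index-free, coordinate-free calculations for convenience.

Vector fields are denoted by boldface letters $\mathbf{a}$, $\mathbf{v}$, $\mathbf{x}$ etc., while scalar functions are denoted by ordinary letters $a$, $b$, $f$, etc. It is sometimes convenient to use Greek letters such as $\lambda$ for scalar functions. In blackboard calculations, vectors may be underlined or even simply left unmarked if it does not create confusion.

The basic operations of tensor calculus are tensor product, tensor contraction, and the derivative. Explicit tensor products are written as $\mathbf{u} \otimes \mathbf{v}$ and seem to be seldom needed. Tensor contraction is written as an application of a function to a vector, for example $\omega \circ \mathbf{v}$ if $\omega$ is a 1-form, $g(\mathbf{a}, \mathbf{b})$ if $g$ is a metric, or $\iota_{\mathbf{v}}\omega$ if $\omega$ is an $n$-form.

The metric $g$ creates a correspondence between vectors and 1-forms; this correspondence and its inverse are denoted by $g$ and $g^{-1}$. For instance, $g\mathbf{v}$ is a 1-form if $\mathbf{v}$ is a vector.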
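
The metric correspondence just described, and the trace rules of Sec. 1.7.3, can be checked numerically. The following is a minimal Python sketch under stated assumptions: the metric is taken to be the Minkowski metric diag(1, -1, -1, -1) in a fixed orthonormal frame, vectors are plain lists of frame components, and all helper names (`g`, `lower`, `basis`, `trace`, `P`, `C`) are illustrative choices made here, not notation from these notes. The trace is evaluated by substituting the decomposition (1.59) of the inverse metric for the pair of mute vectors.

```python
# Assumption: orthonormal frame, signature (+,-,-,-); ETA holds the diagonal of g (and of g^{-1}).
ETA = [1.0, -1.0, -1.0, -1.0]

def g(a, b):
    """Scalar product g(a, b) of two vectors given by their frame components."""
    return sum(ETA[j] * a[j] * b[j] for j in range(4))

def lower(v):
    """Components of the 1-form gv, defined by (gv)(x) = g(v, x)."""
    return [ETA[j] * v[j] for j in range(4)]

def basis(j):
    """The j-th vector e_j of the orthonormal frame."""
    return [1.0 if k == j else 0.0 for k in range(4)]

def trace(T):
    """Tr_(a,b) T(a, b): substitute g^{-1} = sum_j eta_jj e_j (x) e_j for a (x) b."""
    return sum(ETA[j] * T(basis(j), basis(j)) for j in range(4))

# Example from Sec. 1.7.3: P x = x - 2 n g(n, x) with a unit vector n, g(n, n) = 1.
n = [1.0, 0.0, 0.0, 0.0]

def P(x):
    s = g(n, x)
    return [x[k] - 2.0 * n[k] * s for k in range(4)]

tr_P = trace(lambda x, y: g(P(x), y))

# Practice problem from Sec. 1.7.3: C(x, y) z = g(y, z) x - g(x, z) y.
def C(x, y, z):
    return [g(y, z) * x[k] - g(x, z) * y[k] for k in range(4)]

u = [2.0, 0.5, -1.0, 0.25]          # arbitrary test vector (hypothetical values)
tr_C = trace(lambda a, b: g(a, C(b, u, u)))

print("Tr P =", tr_P)
print("Tr_(a,b) g(a, C(b,u)u) =", tr_C, " 3 g(u,u) =", 3.0 * g(u, u))
```

The two printed traces can be compared with the values 2 and $3g(\mathbf{u}, \mathbf{u})$ obtained above by index-free manipulations. Replacing `ETA` by another diagonal of signs gives the same check for a different signature.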


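There are three basic derivative operations: the Lie derivative $\mathcal{L}_{\mathbf{v}} A$ of a tensor $A$ (the application of a vector field to a scalar field is a particular case, $\mathbf{v} \circ \lambda = \mathcal{L}_{\mathbf{v}}\lambda$); the exterior differential $d\omega$ of an $n$-form $\omega$ (where 0-forms are understood as scalar functions); and the Levi-Civita covariant derivative $\nabla_{\mathbf{v}} A$ of a tensor $A$ in the direction given by a vector $\mathbf{v}$. The quantities $\mathcal{L}_{\mathbf{v}} A$ and $\nabla_{\mathbf{v}} A$ are tensors of the same rank as $A$. The quantity $d\omega$ is an $(n+1)$-form if $\omega$ is an $n$-form.

In calculations, it is usually more convenient to use $\nabla_{\mathbf{v}}$ with vectors but $d$, $\iota_{\mathbf{v}}$, and $\mathcal{L}_{\mathbf{v}}$ with $n$-forms.

The operations $\nabla_{\mathbf{v}}$ and $\iota_{\mathbf{v}}$ are linear in $\mathbf{v}$ and local (depend only on the value of $\mathbf{v}$ at a point but not on its derivatives):
\[ \nabla_{\lambda\mathbf{v}}(...) = \lambda\nabla_{\mathbf{v}}(...), \qquad \iota_{\lambda\mathbf{v}}(...) = \lambda\iota_{\mathbf{v}}(...); \]
however, $\mathcal{L}_{\mathbf{v}}$ is not local in $\mathbf{v}$,
\[ \mathcal{L}_{\lambda\mathbf{v}}(...) \neq \lambda\mathcal{L}_{\mathbf{v}}(...). \]
The precise expression for $\mathcal{L}_{\lambda\mathbf{v}} A$ depends on the rank of the tensor $A$. The Leibniz rule with respect to tensor products holds for the derivative operations, with the exception of $d$ which needs an extra sign:
\[ \nabla_{\mathbf{a}}(\omega \otimes \theta) = (\nabla_{\mathbf{a}}\omega) \otimes \theta + \omega \otimes \nabla_{\mathbf{a}}\theta, \]
\[ \mathcal{L}_{\mathbf{a}}(\omega \otimes \theta) = (\mathcal{L}_{\mathbf{a}}\omega) \otimes \theta + \omega \otimes \mathcal{L}_{\mathbf{a}}\theta, \]
\[ d(\omega \wedge \theta) = (d\omega) \wedge \theta + (-1)^{|\omega|}\, \omega \wedge d\theta, \]
where $|\omega|$ in the expression $(-1)^{|\omega|}$ means the rank $n$ of an $n$-form $\omega$. Also
\[ \iota_{\mathbf{a}}(\omega \wedge \theta) = (\iota_{\mathbf{a}}\omega) \wedge \theta + (-1)^{|\omega|}\, \omega \wedge \iota_{\mathbf{a}}\theta. \]
Some further rules for calculations are the following ($\mathbf{a}$, $\mathbf{b}$, $\mathbf{c}$ are vectors, $\lambda$ is a scalar, $\omega$ is an $n$-form, $\theta$ is a 1-form, $g$ is the metric tensor).
\[ [\mathbf{a}, \mathbf{b}] \circ \lambda \equiv \mathbf{a} \circ (\mathbf{b} \circ \lambda) - \mathbf{b} \circ (\mathbf{a} \circ \lambda), \]
\[ \mathcal{L}_{\mathbf{a}}\mathbf{b} = [\mathbf{a}, \mathbf{b}] = \nabla_{\mathbf{a}}\mathbf{b} - \nabla_{\mathbf{b}}\mathbf{a} = -\mathcal{L}_{\mathbf{b}}\mathbf{a}, \]
\[ \mathcal{L}_{\mathbf{a}}\lambda = \mathbf{a} \circ \lambda \equiv (d\lambda) \circ \mathbf{a} \equiv \iota_{\mathbf{a}} d\lambda = \nabla_{\mathbf{a}}\lambda, \]
\[ \iota_{\mathbf{a}}\iota_{\mathbf{b}}\omega = -\iota_{\mathbf{b}}\iota_{\mathbf{a}}\omega, \]
\[ \nabla_{\mathbf{a}} g = 0, \]
\[ (d\theta)(\mathbf{a}, \mathbf{b}) = \mathbf{a} \circ (\theta \circ \mathbf{b}) - \mathbf{b} \circ (\theta \circ \mathbf{a}) - \theta \circ [\mathbf{a}, \mathbf{b}], \]
\[ (\iota_{\mathbf{a}}\omega)(\mathbf{b}, \mathbf{c}, ...) \equiv \omega(\mathbf{a}, \mathbf{b}, \mathbf{c}, ...); \]
\[ \mathcal{L}_{\mathbf{a}}\omega = d\iota_{\mathbf{a}}\omega + \iota_{\mathbf{a}} d\omega, \]
\[ dd\omega \equiv 0. \]
The symbol $\equiv$ refers to properties that hold directly by definition.

Examples: A consequence of $\nabla_{\mathbf{a}} g = 0$ and the Leibniz rule is the identity
\[ \nabla_{\mathbf{a}}\, g^{-1}\theta = g^{-1}\nabla_{\mathbf{a}}\theta \]
for a 1-form $\theta$.

The equivalence of $\nabla_{\mathbf{a}}$ and $\mathcal{L}_{\mathbf{a}}$ on scalars leads to useful simplifications in some calculations with the metric. For instance,
\[ \mathbf{a} \circ g(\mathbf{b}, \mathbf{c}) = \nabla_{\mathbf{a}}\, g(\mathbf{b}, \mathbf{c}) = \mathcal{L}_{\mathbf{a}}\, g(\mathbf{b}, \mathbf{c}); \]
\[ \nabla_{\mathbf{a}}\, g(\mathbf{b}, \mathbf{c}) = g(\nabla_{\mathbf{a}}\mathbf{b}, \mathbf{c}) + g(\mathbf{b}, \nabla_{\mathbf{a}}\mathbf{c}), \]
\[ \mathcal{L}_{\mathbf{a}}\, g(\mathbf{b}, \mathbf{c}) = (\mathcal{L}_{\mathbf{a}} g)(\mathbf{b}, \mathbf{c}) + g([\mathbf{a}, \mathbf{b}], \mathbf{c}) + g(\mathbf{b}, [\mathbf{a}, \mathbf{c}]). \]
Depending on the particular calculation, one of these equivalent forms may be used to proceed. ■

The basic rules for index-free calculations with traces are listed in Statement 1.7.3.1. Here is another example of such a calculation.

Statement 1.7.4.1: Let $g$ and $\tilde g = e^{2\lambda} g$ be two metrics related by a conformal transformation. For a vector field $\mathbf{u}$, we can then compute two different divergences $\operatorname{div}\mathbf{u}$ and $\widetilde{\operatorname{div}}\,\mathbf{u}$ with respect to the metrics $g$ and $\tilde g$. These divergences are related by the formula
\[ \widetilde{\operatorname{div}}\,\mathbf{u} = \operatorname{div}\mathbf{u} + N\,\mathbf{u} \circ \lambda, \]
where $N$ is the dimension of the manifold.

Proof of Statement 1.7.4.1: The simplest proof is through the definition of $\operatorname{div}\mathbf{u}$ as the coefficient in the equation
\[ \mathcal{L}_{\mathbf{u}}\varepsilon = (\operatorname{div}\mathbf{u})\,\varepsilon, \]
where $\varepsilon$ is the Levi-Civita tensor corresponding to the metric $g$. The Levi-Civita tensor is defined as the $N$-form with the property
\[ \varepsilon(\mathbf{e}_1, ..., \mathbf{e}_N) = 1, \]
if $\{\mathbf{e}_j\}$ is an orthonormal basis in the metric $g$. When the metric $g$ is multiplied by $e^{2\lambda}$, each vector in the orthonormal basis is multiplied by $e^{-\lambda}$. It follows that the Levi-Civita tensor is multiplied by $e^{N\lambda}$,
\[ \tilde\varepsilon = e^{N\lambda}\varepsilon. \]
Then we compute
\[ \mathcal{L}_{\mathbf{u}}\tilde\varepsilon = \left(\mathcal{L}_{\mathbf{u}} e^{N\lambda}\right)\varepsilon + e^{N\lambda}\mathcal{L}_{\mathbf{u}}\varepsilon = (N\,\mathbf{u} \circ \lambda)\,\tilde\varepsilon + (\operatorname{div}\mathbf{u})\,\tilde\varepsilon = \left(\widetilde{\operatorname{div}}\,\mathbf{u}\right)\tilde\varepsilon. \]
Let us perform another, slightly longer computation of $\widetilde{\operatorname{div}}\,\mathbf{u}$ using the formula involving the trace. We denote by $\tilde\nabla$ the Levi-Civita connection for the metric $\tilde g$. We will use the formula
\[ \widetilde{\operatorname{div}}\,\mathbf{u} = \widetilde{\operatorname{Tr}}_{(\mathbf{a},\mathbf{b})}\, \tilde g(\tilde\nabla_{\mathbf{a}}\mathbf{u}, \mathbf{b}) = e^{-2\lambda} \operatorname{Tr}_{(\mathbf{a},\mathbf{b})}\, \tilde g(\tilde\nabla_{\mathbf{a}}\mathbf{u}, \mathbf{b}). \]
Using Eq. (1.45), we compute
\[ e^{-2\lambda} \operatorname{Tr}_{(\mathbf{a},\mathbf{b})}\, \tilde g(\tilde\nabla_{\mathbf{a}}\mathbf{u}, \mathbf{b}) = \operatorname{div}\mathbf{u} + \operatorname{Tr}_{(\mathbf{a},\mathbf{b})} \left[(\mathbf{a} \circ \lambda)\, g(\mathbf{u}, \mathbf{b}) + (\mathbf{u} \circ \lambda)\, g(\mathbf{a}, \mathbf{b}) - (\mathbf{b} \circ \lambda)\, g(\mathbf{u}, \mathbf{a})\right] \]
\[ = \operatorname{div}\mathbf{u} + (\mathbf{u} \circ \lambda) \operatorname{Tr}_{(\mathbf{a},\mathbf{b})}\, g(\mathbf{a}, \mathbf{b}) = \operatorname{div}\mathbf{u} + N\,\mathbf{u} \circ \lambda. \]
■

1.8 Curvature

A motivation for introducing the concept of curvature is to describe in detail the deviation of the geometry of a manifold near each point from the flat geometry. A direct description of this deviation can be given using the concept of parallel transport (see also Sec. 1.9.1). In Appendix A a formula is derived for the parallel transport of a vector along an infinitesimally small closed curve, and thus the connection to the curvature tensor is made explicit (see Sec. A.5.3). In this approach, the curvature tensor is the object describing the transformation of vectors under parallel transport along infinitesimally small closed curves. However, calculations in this approach require using a local coordinate system because otherwise the parallel transport operation cannot be described explicitly. Presently, I introduce the curvature tensor by a different formula that can be directly used in index-free and coordinate-free calculations.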
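
Before proceeding to the definition of curvature, the conformal relation of Statement 1.7.4.1 can be illustrated numerically. The sketch below is only a plausibility check under explicit assumptions: the base metric $g$ is the flat Euclidean metric in Cartesian coordinates with $N = 3$, the functions `lam` and `u` are arbitrary smooth test choices, derivatives are approximated by central differences, and every function name is hypothetical. The first computation uses the rescaled volume form $\tilde\varepsilon = e^{N\lambda}\varepsilon$ through the standard coordinate expression $e^{-N\lambda}\partial_i(e^{N\lambda}u^i)$; the second evaluates the trace with the three conformal correction terms quoted in the proof.

```python
import math

N = 3  # assumption: flat Euclidean base metric g in Cartesian coordinates, dimension 3

def lam(p):
    """Conformal exponent lambda(x, y, z); an arbitrary smooth test function."""
    x, y, z = p
    return 0.3 * x * y + math.sin(z)

def u(p):
    """Components of an arbitrary smooth test vector field."""
    x, y, z = p
    return [x * x + y, math.cos(x) * z, x * y * z]

def partial(f, p, i, h=1e-5):
    """Central-difference estimate of the partial derivative of f along x^i at p."""
    q_plus, q_minus = list(p), list(p)
    q_plus[i] += h
    q_minus[i] -= h
    return (f(q_plus) - f(q_minus)) / (2.0 * h)

def div_g(p):
    """div u with respect to g: the sum of d_i u^i."""
    return sum(partial(lambda q, i=i: u(q)[i], p, i) for i in range(N))

def u_on_lam(p):
    """The directional derivative u o lambda."""
    up = u(p)
    return sum(up[i] * partial(lam, p, i) for i in range(N))

def div_tilde_volume(p):
    """Divergence from the rescaled volume form: e^{-N lam} d_i (e^{N lam} u^i)."""
    total = sum(
        partial(lambda q, i=i: math.exp(N * lam(q)) * u(q)[i], p, i)
        for i in range(N)
    )
    return total / math.exp(N * lam(p))

def div_tilde_trace(p):
    """Trace over the frame e_j of g(nabla_a u, b) plus the three conformal terms."""
    up = u(p)
    dl = [partial(lam, p, i) for i in range(N)]
    ul = sum(up[i] * dl[i] for i in range(N))
    total = 0.0
    for j in range(N):  # a = b = e_j, with g(e_j, e_j) = 1
        total += partial(lambda q, j=j: u(q)[j], p, j)   # g(nabla_{e_j} u, e_j)
        total += dl[j] * up[j] + ul * 1.0 - dl[j] * up[j]
    return total

p0 = [0.4, -0.7, 0.2]
expected = div_g(p0) + N * u_on_lam(p0)
print(div_tilde_volume(p0), div_tilde_trace(p0), expected)
```

The three printed numbers should agree up to the finite-difference error. The check does not replace the proof; it only illustrates that the volume-form and the trace computations give the same correction term $N\,\mathbf{u} \circ \lambda$.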


1.8.1 Curvature of a connection

The curvature of a connection $\nabla$ is a transformation-valued 2-form $R(u,v)$ defined by¹⁰

$$R(u,v)w = [\nabla_u, \nabla_v]\, w - \nabla_{[u,v]} w. \tag{1.62}$$

¹⁰ Equation (1.62) is known as the Ricci identity; it is convenient to regard it as the definition of $R(u,v)$.

At first glance, it may appear that the right-hand side of the expression (1.62) contains derivatives of the vectors $u$, $v$, $w$. However, this is not so.

Statement 1.8.1.1: The vector field $R(u,v)w$ defined by Eq. (1.62) does not depend on the derivatives of the vector fields $w$, $u$, or $v$. Thus, $R(u,v)$ is indeed a transformation-valued 2-form.

Proof of Statement 1.8.1.1: Let us verify that no derivatives of $w$ remain in $R(u,v)w$; to that end, we show that $R(u,v)\lambda w = \lambda R(u,v)w$, where $\lambda$ is an arbitrary scalar function. This is obtained by a direct computation:

$$\nabla_u \nabla_v \lambda w = \nabla_u\left(\lambda \nabla_v w + (v\circ\lambda)\, w\right)$$
$$= \lambda\nabla_u\nabla_v w + (u\circ\lambda)\nabla_v w + (u\circ(v\circ\lambda))\,w + (v\circ\lambda)\nabla_u w;$$
$$\nabla_{[u,v]}\lambda w = \lambda\nabla_{[u,v]} w + ([u,v]\circ\lambda)\,w;$$

now it is easy to see that

$$\nabla_u\nabla_v\lambda w - \nabla_v\nabla_u\lambda w = \lambda R(u,v)w + \nabla_{[u,v]}\lambda w.$$

Similarly, we show that $R(u,v)w$ is a linear function also of $v$ (i.e. does not involve derivatives of $v$) by computing $R(u,\lambda v)w$. We use Calculation 1.3.1.1 to express $[u,\lambda v]$, and obtain

$$\nabla_u\nabla_{\lambda v} w = \nabla_u(\lambda\nabla_v w) = \lambda\nabla_u\nabla_v w + (u\circ\lambda)\nabla_v w;$$
$$\nabla_{[u,\lambda v]} w = \nabla_{(u\circ\lambda)v + \lambda[u,v]}\, w = (u\circ\lambda)\nabla_v w + \lambda\nabla_{[u,v]} w;$$
$$R(u,\lambda v)w = \nabla_u\nabla_{\lambda v} w - \nabla_{\lambda v}\nabla_u w - \nabla_{[u,\lambda v]} w = \lambda R(u,v)w.$$

The analogous property for $u$ follows from the antisymmetry of $R(u,v)w$ in $(u,v)$. ■

The index notation for the curvature tensor, $R_{\alpha\beta\gamma}{}^{\delta}$, can be defined by

$$(R(x,y)z)^{\delta} \equiv x^{\alpha} y^{\beta} z^{\gamma} R_{\alpha\beta\gamma}{}^{\delta}.$$

We can also define a tensor $R(a,b,c,d)$ of rank (0,4) by lowering the index, i.e.

$$R(a,b,c,d) \equiv g(R(a,b)c, d) \equiv R_{\alpha\beta\gamma\delta}\, a^{\alpha} b^{\beta} c^{\gamma} d^{\delta}.$$

Either of these equivalent tensors $R$, when evaluated using the Levi-Civita connection, is called the Riemann tensor of the manifold.¹¹

¹¹ We are using the $(-\,+\,-)$ sign convention, according to the classification in Misner-Thorne-Wheeler [21].

1.8.2 Bianchi identities

In this section I review and derive the standard properties of the Riemann tensor.

The antisymmetry property,

$$R(a,b,c,d) = -R(b,a,c,d), \tag{1.63}$$

follows immediately from the definition (1.62). Additionally, the curvature tensor corresponding to the Levi-Civita connection has the following symmetry properties:

$$R(a,b,c,d) = -R(a,b,d,c), \tag{1.64}$$
$$R(a,b,c,d) + R(b,c,a,d) + R(c,a,b,d) = 0, \tag{1.65}$$
$$R(a,b,c,d) = R(c,d,a,b). \tag{1.66}$$

The property (1.65) is called the first Bianchi identity. These properties hold only for the Levi-Civita connection; we always work with that connection here.

Remarks: Usually, one simply says "Riemann tensor of the manifold" instead of the more precise but cumbersome phrase "Riemann tensor of the Levi-Civita connection corresponding to the given metric on the manifold." The Riemann tensor is a function of the metric that involves second derivatives of the metric (see Calculation 1.8.4.1 below). It is easier to remember the symmetries (1.63)-(1.66) of the Riemann tensor $R(a,b,c,d)$ if one thinks of the expression $g(a,c)g(b,d) - g(a,d)g(b,c)$. This expression (as will be shown below) is actually equal to the Riemann tensor of a unit sphere in Euclidean space, if $g$ is the induced metric on the sphere.¹² ■

¹² Due to the symmetry properties, the Riemann tensor has $\frac{1}{12} n^2 (n^2 - 1)$ independent scalar components for an $n$-dimensional manifold. This fact, which does not seem to have direct applications in General Relativity, nevertheless enjoys a mention (usually without a clear derivation) in many GR textbooks. Here I follow the venerable tradition.

We will now show in a series of statements that: the property (1.64) is a consequence of torsion-freeness and the compatibility of $\nabla$ with the metric; the property (1.65) is a consequence of torsion-freeness and the Jacobi identity for commutators of vectors; and Eq. (1.66) is a purely algebraic consequence of Eqs. (1.64)-(1.65).

Statement 1.8.2.1: The Riemann tensor has the property $R(a,b,x,x) = 0$, where $a$, $b$, $x$ are arbitrary vectors. The identity (1.64) then follows due to linearity of $R(\cdot,\cdot,\cdot,\cdot)$.

Idea of proof: Transform the torsion-freeness condition

$$0 = [\nabla_a, \nabla_b]\, g(x,x) - \nabla_{[a,b]}\, g(x,x)$$

using the metricity condition (1.39).

Proof of Statement 1.8.2.1: We have

$$\nabla_a\nabla_b\, g(x,x) = 2g(\nabla_a\nabla_b x, x) + 2g(\nabla_a x, \nabla_b x),$$

hence

$$[\nabla_a,\nabla_b]\, g(x,x) = 2g([\nabla_a,\nabla_b]\,x, x);$$

also,

$$\nabla_{[a,b]}\, g(x,x) = 2g(\nabla_{[a,b]} x, x);$$

finally,

$$R(a,b,x,x) = g(R(a,b)x, x) = g([\nabla_a,\nabla_b]\,x, x) - g(\nabla_{[a,b]} x, x) = \frac{1}{2}\left([\nabla_a,\nabla_b] - \nabla_{[a,b]}\right) g(x,x) = 0.$$

The identity $R(a,b,c,d) + R(a,b,d,c) = 0$ follows by setting $x = c + d$ in $R(a,b,x,x) = 0$. ■

The calculation in the preceding proof can be made shorter by assuming that the vectors $a$, $b$ commute, $[a,b] = 0$, and that the derivatives of $x$ in directions $a$ and $b$ vanish. This can be assumed without loss of generality because $R(a,b,x,x)$


does not depend on derivatives of $a$, $b$, $x$, but only on values of $a$ and $b$ and $x$ at a point, and thus these derivatives (such as $\nabla_a b$ or $\nabla_b x$) can be set to zero. (We can always find any finite number of vector fields $a, b, c, \ldots$ in the neighborhood of a point $p$ such that the values $a(p), b(p), c(p), \ldots$ at the point $p$ are prescribed and all first derivatives, such as $\nabla_a b$ or $\nabla_b c$, vanish at $p$; but second derivatives, such as $\nabla_a\nabla_b c$, will in general not vanish at $p$.)

Here is the abbreviated calculation. Assuming that $\nabla_a x = \nabla_b x = [a,b] = 0$, we have

$$0 = [a,b]\circ g(x,x) = [\nabla_a,\nabla_b]\, g(x,x) = 2g([\nabla_a,\nabla_b]\, x, x) = 2R(a,b,x,x).$$

Below we will often use the assumption of vanishing first derivatives to simplify calculations.

Statement 1.8.2.2: The property (1.65) follows by rewriting the Jacobi identity

$$[a,[b,c]] + [b,[c,a]] + [c,[a,b]] = 0$$

using covariant derivatives and the torsion-free condition (1.42).

Proof of Statement 1.8.2.2: We start with

$$[a,[b,c]] = \nabla_a [b,c] - \nabla_{[b,c]}\, a = \nabla_a\nabla_b c - \nabla_a\nabla_c b - \nabla_{[b,c]}\, a,$$

and similarly for the other terms. Then we write the Jacobi identity and rearrange the resulting expression:

$$[a,[b,c]] + [b,[c,a]] + [c,[a,b]] = \nabla_a\nabla_b c - \nabla_a\nabla_c b + \nabla_b\nabla_c a - \nabla_b\nabla_a c + \nabla_c\nabla_a b - \nabla_c\nabla_b a - \nabla_{[b,c]}\, a - \nabla_{[c,a]}\, b - \nabla_{[a,b]}\, c$$
$$= R(a,b)c + R(b,c)a + R(c,a)b = 0.$$

■

Statement 1.8.2.3: Any tensor of rank (0,4) satisfying the transposition properties (1.64) and (1.65) will also satisfy Eq. (1.66).

Proof of Statement 1.8.2.3: For brevity, we will write $abcd$ instead of $R(a,b,c,d)$ within this calculation. First, we use the transposition properties to move the indices $c$, $d$ to the left:

$$0 = abcd + bcad + cabd = abcd + bcad + (-cadb) = abcd + bcad + (adcb + dcab),$$

which means that

$$abcd - cdab = -(bcad - adbc).$$

This last property can be now used repeatedly on different indices:

$$-(bcad - adbc) = +(cabd - bdca),$$
$$+(cabd - bdca) = -(abcd - cdab).$$

Therefore $abcd - cdab = -(abcd - cdab) = 0$. ■

The Riemann tensor also satisfies the second Bianchi identity, written in index notation as

$$\nabla_{\epsilon} R_{\alpha\beta\gamma\delta} + \nabla_{\alpha} R_{\beta\epsilon\gamma\delta} + \nabla_{\beta} R_{\epsilon\alpha\gamma\delta} = 0. \tag{1.67}$$

It is somewhat cumbersome to write this identity using the index-free notation such as $R(a,b,c,d)$ because only $R$ is to be differentiated and not the vectors $a$, $b$, $c$, $d$, which is not easy to indicate without using indices. The property (1.67) has important consequences for GR.¹³

¹³ A generalization of the second Bianchi identity also exists for connections with nonvanishing torsion; see [33], 5.5.

Let us now derive the Bianchi identity for the transformation-valued tensor $R(u,v)$ defined by Eq. (1.62). We need to convert the index expression (1.67) into an index-free form. So we introduce arbitrary vector fields $a$, $b$, $c$, $v$ and consider the expression

$$c^{\epsilon} a^{\alpha} b^{\beta} v^{\gamma}\, \nabla_{\epsilon} R_{\alpha\beta\gamma}{}^{\delta} = \nabla_c R(a,b)v - R(\nabla_c a, b)v - R(a, \nabla_c b)v - R(a,b)\nabla_c v.$$

(The subtractions are needed to make sure that only $R$ is differentiated.)

Statement 1.8.2.4: The identity (1.67) holds, due to the vanishing of the sum of the twelve terms obtained via cyclic permutations of $a$, $b$, $c$ in the above formula.

Proof of Statement 1.8.2.4: We start the derivation by writing the Jacobi identity,

$$[[A,B],C] + [[B,C],A] + [[C,A],B] = 0.$$

This is a purely algebraic relation that holds for arbitrary operations $A$, $B$, $C$ in any context (as long as the composition of operations is associative). We now apply the Jacobi identity to the operations $\nabla_a$, $\nabla_b$, $\nabla_c$, and also use the fact that $\nabla_u$ is linear in $u$, to derive the following identities,

$$[[\nabla_a,\nabla_b],\nabla_c]\, v + [[\nabla_b,\nabla_c],\nabla_a]\, v + [[\nabla_c,\nabla_a],\nabla_b]\, v = 0,$$
$$\nabla_{[[a,b],c]}\, v + \nabla_{[[b,c],a]}\, v + \nabla_{[[c,a],b]}\, v = 0.$$

The Bianchi identity will follow if we subtract the first of these relations from the second and express various terms through $R(\ldots)$ using Eq. (1.62). For instance,

$$-[[\nabla_a,\nabla_b],\nabla_c]\, v + \nabla_{[[a,b],c]}\, v = -R(a,b)\nabla_c v + \nabla_c R(a,b)v - \nabla_{[a,b]}\nabla_c v + \nabla_c\nabla_{[a,b]}\, v + \nabla_{[[a,b],c]}\, v$$
$$= \nabla_c R(a,b)v - R(a,b)\nabla_c v - R([a,b],c)v.$$

Adding up the cyclic permutations of the last expression in $(a,b,c)$ and using Eq. (1.42) and the obvious property $R(a,b)v = -R(b,a)v$, we find all the terms required for the Bianchi identity. ■

1.8.3 Ricci tensor and scalar

The Ricci tensor $R_{\alpha\beta}$ is the bilinear form $\mathrm{Ric}(a,b)$ defined as the trace of $R(a,x,b,y)$ with respect to $x$ and $y$,

$$\mathrm{Ric}(a,b) = \mathrm{Tr}_{(x,y)} R(a,x,b,y).$$

The Ricci tensor is a symmetric bilinear form due to the property (1.66). In the index notation, this contraction of the Riemann tensor is written as

$$R_{\alpha\beta} = R_{\alpha\mu\beta\nu}\, g^{\mu\nu}.$$

The Ricci scalar is the trace of the Ricci tensor,

$$R \equiv \mathrm{Tr}_{(a,b)} \mathrm{Ric}(a,b) \equiv g^{\alpha\beta} R_{\alpha\beta}.$$
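As a quick numerical sanity check of the symmetries (1.63)-(1.66) and of the trace definitions of the Ricci tensor and scalar, the following sketch builds in components the model expression $g(a,c)g(b,d) - g(a,d)g(b,c)$ mentioned in the Remarks of Sec. 1.8.2. The diagonal metric diag(1, -1, -1, -1) and all function names are illustrative assumptions, not notation from the text.

```python
from itertools import product

def model_riemann(g):
    """Components R[a,b,c,d] = g_ac g_bd - g_ad g_bc (model expression, not a general Riemann tensor)."""
    n = len(g)
    return {(a, b, c, d): g[a][c] * g[b][d] - g[a][d] * g[b][c]
            for a, b, c, d in product(range(n), repeat=4)}

def has_riemann_symmetries(R, n):
    """Check Eqs. (1.63)-(1.66) component by component."""
    for a, b, c, d in product(range(n), repeat=4):
        if R[a, b, c, d] != -R[b, a, c, d]:                        # (1.63)
            return False
        if R[a, b, c, d] != -R[a, b, d, c]:                        # (1.64)
            return False
        if R[a, b, c, d] + R[b, c, a, d] + R[c, a, b, d] != 0:     # (1.65)
            return False
        if R[a, b, c, d] != R[c, d, a, b]:                         # (1.66)
            return False
    return True

def ricci_tensor(R, g_inv):
    """Ric(a,b) = Tr_(x,y) R(a,x,b,y), i.e. R_ab = R_axby g^xy."""
    n = len(g_inv)
    return [[sum(R[a, x, b, y] * g_inv[x][y]
                 for x in range(n) for y in range(n))
             for b in range(n)] for a in range(n)]

def ricci_scalar(ric, g_inv):
    """R = g^ab R_ab."""
    n = len(g_inv)
    return sum(g_inv[a][b] * ric[a][b] for a in range(n) for b in range(n))

# Assumed example metric: diag(1, -1, -1, -1); it is its own inverse.
g = [[1, 0, 0, 0], [0, -1, 0, 0], [0, 0, -1, 0], [0, 0, 0, -1]]
g_inv = g
R = model_riemann(g)
ric = ricci_tensor(R, g_inv)
scalar = ricci_scalar(ric, g_inv)
```

For this model tensor the traces give $\mathrm{Ric} = (n-1)\,g$ and $R = n(n-1)$ in $n$ dimensions, whatever metric is used, so here `scalar` evaluates to 12.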


It is sometimes more convenient to use the index notation for calculations with the Riemann and Ricci tensors. The following examples illustrate the typical calculations. We will use both the index and the index-free notations.

The Einstein equation relates the Ricci tensor $R_{\mu\nu}$ to the energy-momentum tensor of matter, $T_{\mu\nu}$, as follows,

$$R_{\mu\nu} - \frac{1}{2} g_{\mu\nu} R = -8\pi G\, T_{\mu\nu}, \tag{1.68}$$

where $G$ is Newton's constant. According to this equation, curvature of the spacetime is related to the energy density, velocity, and pressure of matter.

Statement 1.8.3.1: It follows from the Einstein equation (1.68) and the second Bianchi identity that $T_{\mu\nu}$ is always covariantly conserved, $\nabla^{\mu} T_{\mu\nu} = 0$.

Proof of Statement 1.8.3.1: Contracting Eq. (1.67) with $g^{\alpha\gamma} g^{\beta\delta}$, we can easily show that $\nabla^{\mu}\left(R_{\mu\nu} - \frac{1}{2} g_{\mu\nu} R\right) = 0$. ■

Example: We consider a four-dimensional spacetime with a known metric $g$. Consider the following relationship for a symmetric tensor $A_{\mu\nu}$, similar to Eq. (1.68):

$$A_{\mu\nu} + a\, g_{\mu\nu} A^{\lambda}{}_{\lambda} = S_{\mu\nu},$$

where $a$ is a given constant, $S_{\mu\nu}$ is a given symmetric tensor, and the trace $A^{\lambda}{}_{\lambda}$ is defined by $A^{\lambda}{}_{\lambda} \equiv A_{\mu\nu} g^{\mu\nu}$. We would like to obtain an explicit expression for $A_{\mu\nu}$ in terms of $S_{\mu\nu}$.

In the index-free notation, $A$ is a symmetric bilinear form satisfying

$$A + a\, g\, \mathrm{Tr}_{(a,b)} A(a,b) = S,$$

or more explicitly

$$A(x,y) + a\, g(x,y)\, \mathrm{Tr}_{(a,b)} A(a,b) = S(x,y).$$

The unknown tensor $A$ would be readily found from this equation if we knew its trace. So let us compute the trace of both parts with respect to $(x,y)$; since the trace of $g$ is 4, we find

$$\mathrm{Tr}_{(x,y)} A(x,y) + 4a\, \mathrm{Tr}_{(a,b)} A(a,b) = \mathrm{Tr}_{(x,y)} S(x,y).$$

It follows that

$$\mathrm{Tr}_{(x,y)} A(x,y) = \frac{1}{1+4a}\, \mathrm{Tr}_{(x,y)} S(x,y),$$

and therefore

$$A = S - \frac{a}{1+4a}\, g\, \mathrm{Tr}_{(x,y)} S(x,y).$$

In the index notation, the calculation looks as follows:

$$S_{\mu\nu} g^{\mu\nu} = \left(A_{\mu\nu} + a\, g_{\mu\nu} A^{\lambda}{}_{\lambda}\right) g^{\mu\nu} = A^{\lambda}{}_{\lambda} + a\, g_{\mu\nu} g^{\mu\nu} A^{\lambda}{}_{\lambda} = (1+4a)\, A^{\lambda}{}_{\lambda},$$

therefore

$$A_{\mu\nu} = S_{\mu\nu} - \frac{a}{1+4a}\, g_{\mu\nu} S^{\lambda}{}_{\lambda}.$$

Note that there may be no solutions, or at any rate no unique expression for $A$ in terms of $S$, when $a = -\frac{1}{4}$. ■

Practice problem: In the Einstein-Cartan theory (which is a modification of GR), the torsion tensor $T^{\lambda}{}_{\mu\nu}$ is related to the spin density tensor $S^{\lambda}{}_{\mu\nu}$ of matter by the equation

$$T^{\lambda}{}_{\mu\nu} + \delta^{\lambda}_{\mu} T^{\rho}{}_{\nu\rho} - \delta^{\lambda}_{\nu} T^{\rho}{}_{\mu\rho} = 8\pi G\, S^{\lambda}{}_{\mu\nu}.$$

Consider a more general relationship,

$$A^{\lambda}{}_{\mu\nu} + a\left(\delta^{\lambda}_{\mu} A^{\rho}{}_{\rho\nu} - \delta^{\lambda}_{\nu} A^{\rho}{}_{\rho\mu}\right) = S^{\lambda}{}_{\mu\nu},$$

where $a$ is a given constant and $S^{\lambda}{}_{\mu\nu}$ is a given tensor (both $A^{\lambda}{}_{\mu\nu}$ and $S^{\lambda}{}_{\mu\nu}$ are antisymmetric in the lower indices). Obtain an explicit expression for $A^{\lambda}{}_{\mu\nu}$ in terms of $S^{\lambda}{}_{\mu\nu}$.

Hint: Compute a suitable trace of the left-hand side of the given equation.

Answer: If $a \neq -\frac{1}{3}$, the solution is

$$A^{\lambda}{}_{\mu\nu} = S^{\lambda}{}_{\mu\nu} - \frac{a}{1+3a}\left(\delta^{\lambda}_{\mu} S^{\rho}{}_{\rho\nu} - \delta^{\lambda}_{\nu} S^{\rho}{}_{\rho\mu}\right).$$

1.8.4 Calculations with the curvature tensor

We will now perform some further calculations involving the curvature tensor. The index-free notation will be used.

Statement 1.8.4.1: A Killing vector $k$ has the following property (written in index notation),

$$\nabla_{\alpha}\nabla_{\beta} k_{\mu} = R_{\beta\mu\alpha}{}^{\nu} k_{\nu}.$$

The index-free version of the given identity is

$$g(\nabla_a\nabla_b k, c) - g(\nabla_{\nabla_a b}\, k, c) = R(b,c,a,k). \tag{1.69}$$

Outline of proof: We use the first Bianchi identity and the definition of a Killing vector. To simplify the calculations, we assume that first derivatives of the auxiliary vectors $a$, $b$, $c$ vanish.

Proof of Statement 1.8.4.1: To obtain the index-free expression shown above, we contract the given indexed formula with arbitrary vectors $a^{\alpha}$, $b^{\beta}$, $c^{\mu}$, and for convenience raise the index on $k_{\nu}$; we find

$$a^{\alpha} b^{\beta} c^{\mu}\, \nabla_{\alpha}\nabla_{\beta} k_{\mu} = c^{\mu} g_{\mu\nu}\left[\nabla_a\left(b^{\beta}\nabla_{\beta} k^{\nu}\right) - \left(\nabla_a b^{\beta}\right)\left(\nabla_{\beta} k^{\nu}\right)\right] = g(c, \nabla_a\nabla_b k) - g(c, \nabla_{\nabla_a b}\, k);$$
$$a^{\alpha} b^{\beta} c^{\mu}\, R_{\beta\mu\alpha\nu} k^{\nu} = R(b,c,a,k).$$

We note that both sides of Eq. (1.69) depend only on the values of the auxiliary vectors $a$, $b$, $c$ but not on their derivatives. So we may simplify calculations by choosing these auxiliary vectors such that all the first derivatives (e.g. $\nabla_a b$) vanish at a given point. Then we rewrite Eq. (1.69), which we need to prove for vector fields $a$, $b$, $c$ with vanishing derivatives, as

$$g(\nabla_a\nabla_b k, c) = R(b,c,a,k). \tag{1.70}$$

By assumption, the vector $k$ satisfies the Killing equation (for arbitrary $x$, $y$),

$$g(\nabla_x k, y) = -g(\nabla_y k, x).$$

The plan is to convert $\nabla_b k$ into $\nabla_c k$ using the Killing equation, to obtain a term $g(\nabla_a\nabla_c k, b)$, and to express it through $R(a,c,k,b)$, which essentially differs from what we need on


the right side of Eq. (1.70) by a cyclic permutation of $a$, $b$, $c$. To implement this plan, we need to move $\nabla_a$ temporarily to the outside of $g(\ldots)$, so that we can use the Killing equation:

$$g(\nabla_a\nabla_b k, c) = \nabla_a\, g(\nabla_b k, c) - g(\nabla_b k, \nabla_a c);$$
$$\nabla_a\, g(\nabla_b k, c) = -\nabla_a\, g(\nabla_c k, b) = -g(\nabla_a\nabla_c k, b) - g(\nabla_c k, \nabla_a b).$$

Since the first derivatives of $a$, $b$, $c$ vanish by assumption, we have simply

$$g(\nabla_a\nabla_b k, c) = -g(\nabla_a\nabla_c k, b).$$

We replace $\nabla_a\nabla_c k$ with the Riemann tensor,

$$g(\nabla_a\nabla_c k, b) = R(a,c,k,b) + g(\nabla_c\nabla_a k, b) + g(\nabla_{[a,c]}\, k, b) = R(a,c,k,b) + g(\nabla_c\nabla_a k, b),$$

since $[a,c] = 0$. Thus we have derived the property

$$g(\nabla_a\nabla_b k, c) = -R(a,c,k,b) - g(\nabla_c\nabla_a k, b).$$

Now we note that the structure of the term on the right side is exactly the same as that of the left side, except for a cyclic permutation of the vectors $a$, $b$, $c$. Since the first Bianchi identity involves the permutation of the first three arguments of $R(\ldots)$, it is convenient to replace $-R(a,c,k,b) = R(a,c,b,k)$. Thus we can apply this relation twice more, until the cyclic permutation is complete, and we find

$$g(\nabla_a\nabla_b k, c) = R(a,c,b,k) - R(c,b,a,k) + R(b,a,c,k) - g(\nabla_a\nabla_b k, c).$$

Using the first Bianchi identity, we find

$$g(\nabla_a\nabla_b k, c) = -R(c,b,a,k),$$

which coincides with Eq. (1.70). ■

As another application of the index-free method of calculation, we will derive the formula for the change in the Riemann tensor under a conformal transformation, $g \to \tilde g \equiv e^{2\lambda} g$, where $\lambda(x)$ is a fixed scalar function. The nonzero function $e^{2\lambda}$ is called the conformal factor. Since the resulting expression contains second derivatives of the conformal factor, it will follow that the Riemann tensor depends on second derivatives of the metric.

Calculation 1.8.4.1: Consider a conformal transformation of the metric, $g \to \tilde g \equiv e^{2\lambda} g$. It can be derived from Eq. (1.45) that the Riemann and Ricci tensors of the new metric $\tilde g$ are related to the old tensors in the following way. The new Riemann tensor is

$$e^{-2\lambda} \tilde R(a,b,c,d) = R(a,b,c,d) + H_{\lambda}(a,c)g(b,d) - H_{\lambda}(a,d)g(b,c) - H_{\lambda}(b,c)g(a,d) + H_{\lambda}(b,d)g(a,c) + \det\begin{pmatrix} g(a,c) & g(b,c) & g(l,c) \\ g(a,d) & g(b,d) & g(l,d) \\ g(a,l) & g(b,l) & g(l,l) \end{pmatrix},$$

where $l \equiv g^{-1} d\lambda$ and the symmetric bilinear form

$$H_{\lambda}(a,b) = H_{\lambda}(b,a) \equiv g(\nabla_a l, b) \equiv \iota_b \nabla_a d\lambda$$

is called the Hessian of the function $\lambda$. The Hessian is a tensor representing all the (covariant) second derivatives of $\lambda$ defined with respect to the old metric $g$. In the index notation, the Hessian of $\lambda$ is the tensor $\lambda_{;\alpha\beta}$.

The new Ricci tensor is

$$\widetilde{\mathrm{Ric}}(a,c) = \mathrm{Ric}(a,c) + g(a,c)\,\Box\lambda + (N-2)\left[H_{\lambda}(a,c) + g(l,l)g(a,c) - g(a,l)g(c,l)\right],$$

where

$$\Box\lambda \equiv \mathrm{Tr}_{(a,b)} H_{\lambda}(a,b)$$

denotes the covariant D'Alembert operator (the D'Alembertian) defined with respect to the old metric $g$. Note that

$$g(l,l) \equiv g^{-1}(d\lambda, d\lambda).$$

The new Ricci scalar is

$$\tilde R = e^{-2\lambda}\, \mathrm{Tr}_{(a,c)} \widetilde{\mathrm{Ric}}(a,c) = e^{-2\lambda}\left[R + 2(N-1)\,\Box\lambda + (N-1)(N-2)\, g^{-1}(d\lambda,d\lambda)\right].$$

In the index notation,

$$\Box\lambda = g^{\alpha\beta}\lambda_{;\alpha\beta}, \qquad g^{-1}(d\lambda,d\lambda) = g^{\alpha\beta}\lambda_{,\alpha}\lambda_{,\beta}.$$

(Details on page 169.) ■

1.9 Geodesic curves, geodesic vector fields

We have seen before that there is no a priori relation between tangent vectors at different points of the manifold $M$. In other words, there is no intrinsic (i.e. naturally defined) way to carry a vector or a tensor from one point to another. However, we can use a connection to define the notion of a direction-preserving, or parallel, transport.

1.9.1 Parallel transport of vectors

A vector field $v$ is called parallelly transported along a curve $\gamma(\tau)$ if the covariant derivative of $v$ along the curve vanishes,

$$\nabla_{\dot\gamma}\, v = 0. \tag{1.71}$$

The interpretation is that we carry a vector $v$ along the curve while keeping the magnitude and direction of $v$ constant. Heuristically, the only way to control the "constancy" of $v$ is through the covariant derivative in the direction $\dot\gamma$, hence the requirement (1.71).

Let us compare this condition with a similar one involving the Lie derivative,

$$\mathcal{L}_{\dot\gamma}\, v = 0. \tag{1.72}$$

A vector $v$ satisfying the condition (1.72) is a connecting vector for the congruence of curves to which $\gamma(\tau)$ belongs; it is also called Lie-propagated along the curve. The condition (1.71) depends only on the direction of $\dot\gamma$ along the curve, while the Lie derivative in Eq. (1.72) requires us to have a vector field $\dot\gamma$ defined also around the curve $\gamma(\tau)$. In other words, we need a congruence of curves and not just one curve. Different choices of the congruence around $\gamma$ will lead to different connecting vector fields $v$. The geometric meaning is that $v$ connects points on neighboring curves corresponding to the same value of the parameter $\tau$.
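The parallel transport condition (1.71) can be illustrated numerically in a coordinate chart. The sketch below assumes the unit 2-sphere with coordinates $(\theta, \phi)$, metric $d\theta^2 + \sin^2\theta\, d\phi^2$ and its standard Christoffel symbols $\Gamma^{\theta}_{\phi\phi} = -\sin\theta\cos\theta$, $\Gamma^{\phi}_{\theta\phi} = \cot\theta$; none of this coordinate setup is taken from the text, and the function names are hypothetical. A vector is transported once around the latitude circle $\theta = \theta_0$ with a fourth-order Runge-Kutta integrator.

```python
import math

def transport_along_latitude(theta0, v0, steps=2000):
    """Parallel-transport v0 = (v^theta, v^phi) once around the circle theta = theta0.

    Solves dv^a/dphi = -Gamma^a_{phi b} v^b on the unit sphere with RK4.
    """
    s, c = math.sin(theta0), math.cos(theta0)

    def rhs(v):
        v_th, v_ph = v
        # -Gamma^theta_{phi phi} v^phi = sin*cos*v^phi ; -Gamma^phi_{phi theta} v^theta = -cot*v^theta
        return (s * c * v_ph, -(c / s) * v_th)

    h = 2.0 * math.pi / steps
    v = tuple(v0)
    for _ in range(steps):
        k1 = rhs(v)
        k2 = rhs((v[0] + 0.5 * h * k1[0], v[1] + 0.5 * h * k1[1]))
        k3 = rhs((v[0] + 0.5 * h * k2[0], v[1] + 0.5 * h * k2[1]))
        k4 = rhs((v[0] + h * k3[0], v[1] + h * k3[1]))
        v = (v[0] + h * (k1[0] + 2 * k2[0] + 2 * k3[0] + k4[0]) / 6.0,
             v[1] + h * (k1[1] + 2 * k2[1] + 2 * k3[1] + k4[1]) / 6.0)
    return v

def norm_sq(theta0, v):
    """g(v, v) for the assumed sphere metric at latitude theta0."""
    return v[0] ** 2 + math.sin(theta0) ** 2 * v[1] ** 2

theta0 = math.pi / 3
v_start = (1.0, 0.0)
v_end = transport_along_latitude(theta0, v_start)
```

In this setup the transported vector keeps its norm $g(v,v)$ but comes back rotated by the angle $2\pi\cos\theta_0$; for $\theta_0 = \pi/3$ it returns reversed. The mismatch after a closed loop is the effect that the curvature tensor describes.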


To summarize: Transporting a vector from one point to another along a curve requires a relationship between tangent spaces at different points. One needs additional information to produce such a relationship. In the case of Eq. (1.71), this information comes from the chosen connection $\nabla$; in the case of Eq. (1.72), this information comes from the neighboring curves from a congruence containing the given curve $\gamma$. Clearly, the construction of a parallel transport closely reflects the intuitive notion of a vector carried along a curve "without change."

Statement 1.9.1.1: The scalar product $g(u,v)$ is constant along a curve $\gamma$ if both $u$ and $v$ are parallelly transported along the curve.

Proof: A straightforward computation gives

$$\nabla_{\dot\gamma}\, g(u,v) = g(\nabla_{\dot\gamma} u, v) + g(u, \nabla_{\dot\gamma} v) = 0.$$

■

1.9.2 Geodesics

A natural question to ask about a curve $\gamma(\tau)$ is whether the curve parallelly transports its own tangent vector $\dot\gamma(\tau)$. If the tangent vector $\dot\gamma(\tau)$ is indeed parallelly transported along the curve, heuristically one could say that the curve $\gamma(\tau)$ is "straight" in the sense that it is a line whose direction remains unchanged along the line. Within a curved (nonflat) manifold, such lines $\gamma(\tau)$ are the closest analog of straight lines.

By definition, a curve $\gamma(\tau)$ is called a geodesic if its tangent vector $\dot\gamma$ is parallelly transported along the curve.

It is worth emphasizing that the parallel transport in GR is defined through the Levi-Civita connection determined by the given metric $g$. Properties of parallel transport, geodesics, and curvature are very different if a non-Levi-Civita connection is used. In this book we always denote by $\nabla$ the Levi-Civita connection and we use no other connections, so the choice of connection will not be discussed any more.

If we temporarily assume that the tangent vector $\dot\gamma$ is a part of a vector field $v$, the condition for $\gamma$ being geodesic can be written as a differential equation on $v$, called the geodesic equation,

$$\nabla_v v = 0, \tag{1.73}$$

which however must hold only at points $p = \gamma(\tau)$ along the curve. Note that the derivative is taken only in the direction of $v$ itself, so it is sufficient to have the vector field $v$ defined only on the curve (where $v = \dot\gamma$ is the tangent vector to the curve). The derivative $\nabla_v v$ does not depend on the values of $v$ outside of the curve. Therefore, the assumption that $\dot\gamma$ is a part of a vector field $v$ is inessential, and Eq. (1.73) makes sense also for the tangent vector $v = \dot\gamma$ defined only along the curve.

A vector field $v$ that satisfies the geodesic equation (1.73) at every point (not merely along a single flow line) is called a geodesic vector field.

It is perhaps not immediately evident that the choice of the parameter $\tau$ is important for a geodesic curve, according to this definition. Namely, if $\gamma(\tau)$ is a geodesic and we change the parameter to $\tilde\tau = \tilde\tau(\tau)$, then the new curve $\tilde\gamma(\tilde\tau) \equiv \gamma(\tau(\tilde\tau))$ will not necessarily be a geodesic. If the parameter $\tau$ is chosen so that $\gamma(\tau)$ is a geodesic then $\tau$ is called an affine parameter and the vector $\dot\gamma$ is called the affine tangent vector.

Statement 1.9.2.1: If $\gamma(\tau)$ is a geodesic and the parameter is changed by $\tilde\tau = \tilde\tau(\tau)$, the new curve $\tilde\gamma(\tilde\tau)$ is again a geodesic only if the function $\tilde\tau(\tau)$ is of the form $\tilde\tau(\tau) = a\tau + b$ with constant $a$, $b$ and $a \neq 0$. Thus the affine parameter is defined up to an affine transformation.

Proof of Statement 1.9.2.1: We have

$$\dot{\tilde\gamma} = \frac{d\tau}{d\tilde\tau}\,\dot\gamma; \qquad \nabla_{\dot{\tilde\gamma}}\,\dot{\tilde\gamma} = \frac{d\tau}{d\tilde\tau}\left(\dot\gamma\circ\frac{d\tau}{d\tilde\tau}\right)\dot\gamma, \tag{1.74}$$

which vanishes only if $d\tau/d\tilde\tau = \mathrm{const}$ along the curve. ■

The square of the tangent vector, $g(\dot\gamma,\dot\gamma)$, remains constant along a geodesic due to Statement 1.9.1.1. This means that a timelike geodesic remains timelike at all times, a null geodesic remains null, and a spacelike geodesic remains spacelike. It follows from Eq. (1.74) that an affine parameter $\tau$ along a non-null geodesic can always be chosen so that $g(\dot\gamma,\dot\gamma) = \pm 1$: It is sufficient to divide $\dot\gamma$ by the constant number $\sqrt{|g(\dot\gamma,\dot\gamma)|}$ to achieve this parameterization. But a null geodesic does not have a naturally fixed affine parameter $\tau$; the freedom to replace $\tau \to a\tau + b$ remains for null geodesics.

Physical interpretation of geodesics: According to GR, massive point particles in freefall move along worldlines $\gamma(\tau)$ which are timelike geodesics in spacetime. In that case, it is standard to choose the affine parameter $\tau$ as the proper time measured along the particle's worldline, so that $g(\dot\gamma,\dot\gamma) = 1$. On the other hand, electromagnetic waves (in the geometrical optics approximation) can be pictured as rays propagating along null geodesic curves.¹⁴ So I will call null geodesic curves lightrays. In a calculation involving lightrays whose frequency is being measured, it is possible to choose the affine parameter $\tau$ for a null geodesic $\gamma(\tau)$ such that the frequency $\nu$ measured at a point by an observer with 4-velocity $u$ is numerically given by

$$\nu = g(u, \dot\gamma). \tag{1.75}$$

In that case, $h\nu$ (where $h$ is Planck's constant) is the locally measured energy of a single photon propagating along the null geodesic $\gamma(\tau)$. The 4-vector $h\dot\gamma$ can be interpreted as the 4-momentum of that photon.

¹⁴ This follows from the equations of Maxwell electrodynamics in curved space. In this book I will not derive this property.

The statements about the lightrays can be justified by considering a local coordinate system $\{x^{\mu}\}$ of an observer who has the 4-velocity $u$ and measures the frequency of an electromagnetic wave near a spacetime point $p$. In a very small neighborhood of $p$, the wave looks like a plane wave and the metric $g$ can be brought into the Minkowski form, $g \approx \eta$, by a choice of local coordinates. Then the electromagnetic potential $A$ near the point $p$ can be expressed by the familiar formula

$$A = A_0 \exp\left[i\,\eta(k,x)\right],$$

where $k$ is the wave vector, $A_0$ is the amplitude of the wave, and $x$ is a vector representing the local coordinates. (The wave vector $k$ is a null geodesic vector field.) In a reference frame of the observer, the frequency $\nu$ of the wave is the coefficient at $t$ in the expression $\exp[2\pi i \nu t]$ in the phase of the wave; in other words,

$$\nu = \frac{1}{2\pi} k_0,$$

where $k_0$ is the time component of the 4-vector $k$. The 4-velocity $u$ has components $\{1, 0, 0, 0\}$ in the observer's reference frame. Therefore, the frequency $\nu$ can be computed as


$\nu = \frac{1}{2\pi}\eta(k,u)$ in the observer's reference frame. To obtain a formula for $\nu$ valid in an arbitrary coordinate system, we need to use the metric $g$ instead of $\eta$; hence

$$\nu = \frac{1}{2\pi}\, g(k,u).$$

Since the affine parameter $\tau$ may be multiplied by a constant factor, we can choose the affine parameter $\tau$ along the null geodesic $\gamma(\tau)$ such that $\dot\gamma = \frac{1}{2\pi} k$. Then Eq. (1.75) holds. ■

Remark: Since the geodesic equation $\nabla_{\dot\gamma}\dot\gamma = 0$ is a first-order equation with smooth coefficients,¹⁵ it has a unique solution that depends only on the direction of the vector $\dot\gamma$ at an initial point $p$. Thus we have a map from vectors $v \in T_p M$ at a point $p$ to geodesic lines starting at $p$. This map is called the exponentiation map $\exp: v \mapsto \gamma(\tau)$. The resulting geodesics are sometimes denoted $\gamma(\tau) = \exp(v\tau)$, but we will not use this notation since it does not give a computational advantage. ■

¹⁵ This property may not hold at a point where the manifold is not smooth, i.e. at a physical singularity. In this book I do not attempt to consider non-smooth manifolds and do not examine the precise conditions for smoothness.

A useful property of null geodesics is formulated in the following statement.

Statement 1.9.2.2: A null geodesic $\gamma(\tau)$ remains a geodesic when the metric is rescaled by a conformal transformation, $\tilde g = e^{2\lambda} g$, under a suitable change of the affine parameter $\tau \to f(\tau)$, such that the new affine tangent vector is $e^{-2\lambda}\dot\gamma$. In other words, the shape of a null geodesic is conformally invariant.

Idea of proof: Use Eq. (1.45).

Proof of Statement 1.9.2.2: Suppose that the new affine tangent vector is $\tilde n = \mu n$, where $\mu$ is an unknown function and $n \equiv \dot\gamma$. It follows from Eq. (1.45) and $\nabla_n n = 0$ that, for arbitrary $x$,

$$g(\tilde\nabla_{\mu n}(\mu n), x) = 2\mu^2 (n\circ\lambda)\, g(n,x) + g(\nabla_{\mu n}(\mu n), x) = g(n,x)\,\mu^2\left(2\,n\circ\lambda + n\circ\ln\mu\right).$$

This will vanish if $\mu = e^{-2\lambda}$. The new affine parameter $\tilde\tau$ is found by integrating $\tilde n$ along the curve $\gamma$. ■

1.9.3 Geodesics extremize proper length

In Riemannian geometry, when the metric is positive-definite, sufficiently short geodesic curves have an important property: they are curves of shortest length among all curves connecting two given points. In GR, the metric is not positive-definite, so geodesic curves extremize the proper length. In this section we derive the geodesic equation from the corresponding variational principle.

The proper length of a curve segment $\gamma(\tau)$ is

$$L[\gamma] = \int_{\tau_1}^{\tau_2} \sqrt{g(\dot\gamma,\dot\gamma)}\, d\tau. \tag{1.76}$$

The proper length is extremized (but not necessarily maximized or minimized) if the functional $L[\gamma]$ does not change in first order in the perturbation when the curve $\gamma$ is infinitesimally perturbed (while the endpoints $\gamma(\tau_1)$, $\gamma(\tau_2)$ remain fixed). We now apply the standard considerations of the variational calculus to this condition.

Let $v \equiv \dot\gamma$ be the tangent vector along the curve. A perturbation of the curve $\gamma$ can be described as the flow of a vector field $t$ which, we imagine, is directed "transversely" to $\gamma$ and leaves the endpoints $\gamma(\tau_{1,2})$ in place. Let us denote by $\sigma$ a parameter along the lines of $t$, starting from the curve $\gamma$. Thus, the flow of $t$ infinitesimally perturbs $\gamma$ into a curve $\tilde\gamma$, when we let the lines of $t$ carry the points of $\gamma$ for an infinitesimal parameter distance $\delta\sigma$. Let $\tilde v$ be the perturbed vector field obtained by transporting the initial curve $\gamma(\tau)$ by various distances $\sigma$ along the flow of $t$. By construction, $\tilde v$ is a connecting vector for $t$, and so $\nabla_t \tilde v = \nabla_{\tilde v}\, t$. The proper length of a perturbed curve $\tilde\gamma$ can now be seen as a function of $\sigma$,

$$L[\tilde\gamma] = \int_{\tau_1}^{\tau_2} \sqrt{g(\tilde v,\tilde v)}\, d\tau.$$

Hence, the curve $\gamma$ will extremize the proper length if $dL/d\sigma = 0$ at $\sigma = 0$. The operation $d/d\sigma$ corresponds to applying (under the integral sign) a derivative $\mathcal{L}_t = \nabla_t$ along the flow of $t$,

$$\frac{d}{d\sigma} L[\tilde\gamma] = \int_{\tau_1}^{\tau_2} \nabla_t \sqrt{g(\tilde v,\tilde v)}\, d\tau = \int_{\tau_1}^{\tau_2} \frac{g(\nabla_{\tilde v}\, t, \tilde v)}{\sqrt{g(\tilde v,\tilde v)}}\, d\tau.$$

Here we have used the property $\nabla_t \tilde v = \nabla_{\tilde v}\, t$. The last expression above does not contain derivatives of $\tilde v$ and is evaluated on the initial curve $\gamma$ where $\sigma = 0$ and $\tilde v = v$, hence

$$\left.\frac{d}{d\sigma}\right|_{\sigma=0} L[\tilde\gamma] = \int_{\tau_1}^{\tau_2} \frac{g(\nabla_v t, v)}{\sqrt{g(v,v)}}\, d\tau. \tag{1.77}$$

The curve $\gamma$ extremizes the proper length if the above integral vanishes for all $t$. Since an arbitrary change of the parameter $\tau \to f(\tau)$ does not change the proper length $L[\gamma]$, we may choose $\tau$ to normalize the vector $v$ so that $g(v,v) = \mathrm{const}$ along the curve $\gamma$, where the constant may be either $\pm 1$ or zero, depending on the causal character of the curve. At the moment, we only consider segments of the curve that have the same causal character, and also assume that $g(v,v) \neq 0$ (the case of null curves will be considered below). Similarly, the perturbed curve $\tilde\gamma$ can be normalized in the same way, so that the flow of the field $t$ does not perturb the normalization and $\nabla_t\, g(v,v) = 0$. We can then drop the constant factor $\sqrt{g(v,v)}$. Since

$$g(\nabla_v t, v) = \nabla_v\, g(t,v) - g(t, \nabla_v v),$$

an integration over $\tau$ removes the total derivative term

$$\nabla_v\, g(t,v) \equiv \frac{d}{d\tau}\, g(t,v),$$

because by assumption the vector field $t$ vanishes at the endpoints. The remaining term yields the condition

$$\int_{\tau_1}^{\tau_2} g(t, \nabla_v v)\, d\tau = 0.$$

Since this condition should be satisfied for any vector field $t$, we find $\nabla_v v = 0$, which is precisely the geodesic equation (1.73). Note that we have already fixed the normalization of the curve, $g(v,v) = \mathrm{const}$, so the resulting geodesic equation is not invariant under arbitrary changes of parameter $\tau \to f(\tau)$.
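The vanishing of the first variation (1.77) on a geodesic can be checked numerically in a simple case. The sketch below assumes flat two-dimensional Minkowski space with metric diag(+1, -1) and worldlines $x(t) = \text{slope}\cdot t + \text{amp}\cdot[\sin(\pi t/T) + \tfrac{1}{2}\sin(2\pi t/T)]$, so that `amp = 0` is a straight timelike line (a geodesic) and the perturbation vanishes at both endpoints. The family of curves, the parameter values and the function names are illustrative assumptions, not taken from the text.

```python
import math

def proper_length(amp, slope=0.3, T=1.0, n=2000):
    """Proper length (1.76) of x(t) = slope*t + amp*(sin(pi t/T) + 0.5 sin(2 pi t/T)).

    Uses the midpoint rule and the assumed flat metric diag(+1, -1),
    so that g(v, v) = 1 - xdot^2 for the tangent vector v = (1, xdot).
    """
    h = T / n
    total = 0.0
    for i in range(n):
        t = (i + 0.5) * h
        w = math.pi / T
        xdot = slope + amp * w * (math.cos(w * t) + math.cos(2.0 * w * t))
        total += math.sqrt(1.0 - xdot * xdot) * h
    return total

def first_variation(amp, d=1e-4):
    """Central-difference estimate of dL/d(amp) at the given amplitude."""
    return (proper_length(amp + d) - proper_length(amp - d)) / (2.0 * d)

dL_on_geodesic = first_variation(0.0)    # straight line: first variation ~ 0
dL_off_geodesic = first_variation(0.05)  # curved worldline: clearly nonzero
```

In this example the straight worldline is a local maximum of the proper length rather than a minimum, which illustrates why the text speaks of extremizing and not of minimizing.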


Remark: For a brief time in the above calculation, we needed to use expressions such as $\mathcal{L}_t v$ and $\nabla_t v$, which require that $v$ be a vector field defined also away from the curve $\gamma(\tau)$. This minor inconvenience is easily remedied by supposing that the curve $\gamma(\tau)$ is one of the flow lines of some vector field $v$, so that $v$ is the tangent vector to $\gamma$ along the curve $\gamma$, but $v$ can have arbitrary values away from the curve as long as $v$ is smooth. Since the final result (the geodesic equation) contains only the derivative $\nabla_v v$ along the curve $\gamma$, the values of $v$ away from $\gamma$ are irrelevant. ■

Practice problem: (a) By considering the variation of the functional (1.76) but without assuming the condition $g(v,v) = \mathrm{const}$, derive a more general form of the geodesic equation,

$$\nabla_v\left(\frac{1}{\sqrt{g(v,v)}}\, v\right) = 0.$$

(However, do assume that $g(v,v) \neq 0$.)

(b) Suppose that a vector field $v$ satisfies the equation $\nabla_v v = \lambda v$ for some scalar function $\lambda$. Show that $v$ can be made geodesic by a rescaling $v \to \mu v$. ■

Remark: In the derivation above, we assumed that $g(v,v) \neq 0$; let us now consider the case of null curves, i.e. curves $\gamma(\tau)$ such that $g(\dot\gamma,\dot\gamma) = 0$ along the curve. Such null geodesics cannot be obtained by extremizing the functional (1.76), because a variation (even an infinitesimal one) of a null curve $\gamma(\tau)$ will make some portions of the curve spacelike and other portions timelike, and then $\sqrt{g(\dot\gamma,\dot\gamma)}$ will not remain well-defined. To avoid this difficulty, one can simply postulate Eq. (1.73) for null geodesics. Alternatively, a different variational principle can be used to derive the geodesic equation (1.73). Consider the variational problem with the functional

$$L[\gamma(\tau); N(\tau)] \equiv \int_{\tau_1}^{\tau_2} g(\dot\gamma,\dot\gamma)\, N(\tau)\, d\tau,$$

depending on an additional unknown function $N(\tau)$. Since the functional does not contain derivatives of $N$, this variable plays the role of a Lagrange multiplier. The variation with respect to $N(\tau)$ yields the constraint $g(\dot\gamma,\dot\gamma) = 0$, while the variation with respect to $\gamma(\tau)$ yields the equation

$$N\nabla_v v = -(v\circ N)\, v + \frac{1}{2}\, g(v,v)\, g^{-1} dN.$$

Using the constraint $g(v,v) = 0$ and choosing a function $N(\tau)$ that is constant along the flow of $v$, one recovers the geodesic equation (1.73). When the Lagrange multiplier $N(\tau)$ is chosen arbitrarily, one still obtains the geodesic curve but in a non-affine parameterization.

The variational principle can be generalized to the case of timelike or spacelike geodesics in the following way. One writes the functional

$$L[\gamma(\tau); N(\tau)] \equiv \int_{\tau_1}^{\tau_2} \left[g(\dot\gamma,\dot\gamma) - K\right] N(\tau)\, d\tau,$$

where $K$ is a constant. Variation with respect to $N(\tau)$ yields the constraint $g(\dot\gamma,\dot\gamma) = K$, while variation with respect to $\gamma(\tau)$ yields the non-affine geodesic equation as before. The value of $K$ can be chosen freely to control the normalization of the curve; the value $K = 0$ is perfectly admissible. In this way, geodesics of every kind are described through a single variational principle. ■

1.9.4 *Motion under external forces

In classical physics, equations of motion for particles and fields are derived from an action principle: The correct trajectories must extremize a certain functional called the action functional. As an example, we now derive the equations of motion for a charged massive particle in the presence of an external electromagnetic field in curved spacetime.¹⁶

¹⁶ I called the electromagnetic field "external" to emphasize that this field is assumed to be given and fixed, rather than determined dynamically.

The motion of a massive particle corresponds to a timelike worldline $\gamma(\tau)$ in spacetime. The electromagnetic field is described by a 1-form $A$, and the gravitational field by the metric $g$. We will assume that these fields are fixed, and the question is to determine the trajectory $\gamma(\tau)$ of the particle.

The action functional for the particle is

$$S[\gamma; A, g] = -m\int_{\tau_1}^{\tau_2} \sqrt{g(\dot\gamma,\dot\gamma)}\, d\tau + q\int_{\tau_1}^{\tau_2} (A\circ\dot\gamma)\, d\tau,$$

where $m$ is the rest mass and $q$ is the electric charge of the particle. The first term in the action is $-m$ times the proper time along the trajectory, while the second term is simply the integral of the 1-form $A$ along the worldline, which may be written more concisely as $\int_{\gamma} A$ (see Eq. (1.16) and the preceding explanations).

The equations of motion for the particle are derived by extremizing the functional $S$ with respect to the trajectory $\gamma(\tau)$. Although the actual computation is fairly short, I will go through it slowly and show every step.

As in Sec. 1.9.3, we introduce a vector field $t$ which is transverse to the curve $\gamma$ and vanishes at $\tau = \tau_{1,2}$. The vector field $t$ perturbs the curve $\gamma$ by shifting it "to the side." Then we construct the vector field $\tilde v$ by shifting the curve $\gamma$ along the flow lines of $t$ by different parameter distances $\sigma$. The variation of the functional $S$ is found by applying the derivative $\mathcal{L}_t$ under the integral,

$$\frac{d}{d\sigma} S[\tilde\gamma; A, g] = \int_{\tau_1}^{\tau_2} \mathcal{L}_t\left(-m\sqrt{g(\tilde v,\tilde v)} + q\, A\circ\tilde v\right) d\tau. \tag{1.78}$$

The terms under the integral are simplified using $\mathcal{L}_t \tilde v = 0$ and the Leibnitz property of the Lie derivative. The first term yields

$$\mathcal{L}_t \sqrt{g(\tilde v,\tilde v)} = \frac{(\mathcal{L}_t g)(\tilde v,\tilde v)}{2\sqrt{g(\tilde v,\tilde v)}} \overset{1}{=} \frac{g(\nabla_{\tilde v}\, t, \tilde v)}{\sqrt{g(\tilde v,\tilde v)}} \overset{2}{=} -g\left(t, \nabla_{\tilde v}\frac{\tilde v}{\sqrt{g(\tilde v,\tilde v)}}\right) + \nabla_{\tilde v}\frac{g(t,\tilde v)}{\sqrt{g(\tilde v,\tilde v)}}$$

($\overset{1}{=}$ is via Eq. (1.46), $\overset{2}{=}$ is to separate a total derivative). The integral of the total derivative $\nabla_{\tilde v}(\ldots)$ vanishes since $t$ vanishes at the endpoints $\tau_{1,2}$. Thus the first integral term in the right-hand side of Eq. (1.78) is simplified to

$$m\int_{\tau_1}^{\tau_2} g\left(t, \nabla_{\tilde v}\frac{\tilde v}{\sqrt{g(\tilde v,\tilde v)}}\right) d\tau.$$

The second term in the right-hand side of Eq. (1.78) becomes

$$\mathcal{L}_t (A\circ\tilde v) = (\mathcal{L}_t A)\circ\tilde v,$$


and then the line integral can be transformed as
\[ \int_{\tau_1}^{\tau_2} (\mathcal{L}_t A) \circ v\, d\tau \overset{1}{=} \int_\gamma \mathcal{L}_t A \overset{2}{=} \int_\gamma \left( d\iota_t A + \iota_t dA \right) \overset{3}{=} \int_\gamma \iota_t dA = \int_{\tau_1}^{\tau_2} (dA)(t, v)\, d\tau \overset{4}{=} -\int_{\tau_1}^{\tau_2} \{ \iota_t \iota_v (dA) \}\, d\tau \]
($\overset{1}{=}$ is by definition (1.16) of integrals of 1-forms over curves, $\overset{2}{=}$ is by the Cartan homotopy formula (1.22), $\overset{3}{=}$ is due to $\int df = 0$ for a function $f$ that vanishes at the endpoints of the curve, and $\overset{4}{=}$ is by definition of $\iota_t$). Putting the two terms together and evaluating the derivatives on the initial curve $\gamma$, we find
\[ \left. \frac{d}{d\sigma} \right|_{\sigma=0} S[\gamma; A, g] = -\int_{\tau_1}^{\tau_2} m\, g\!\left(t, \nabla_v \frac{v}{\sqrt{g(v,v)}}\right) d\tau - q \int_{\tau_1}^{\tau_2} \{ \iota_t \iota_v (dA) \}\, d\tau. \]
The integrand above is linear in the vector field $t$, and thus can be expressed as an auxiliary 1-form $V$ applied to $t$,
\[ \left. \frac{d}{d\sigma} \right|_{\sigma=0} S[\gamma; A, g] = \int_{\tau_1}^{\tau_2} (\iota_t V)\, d\tau = \int_{\tau_1}^{\tau_2} (V \circ t)\, d\tau, \qquad V \equiv -m\, \hat g \nabla_v \frac{v}{\sqrt{g(v,v)}} - q\, \iota_v dA. \]
The action $S[\gamma; A, g]$ is extremized if the derivative $dS/d\sigma = 0$. This can happen for an arbitrary field $t$ only if the 1-form $V$ vanishes for all $\tau$. Thus we obtain the equation
\[ V = -m\, \hat g \nabla_v \frac{v}{\sqrt{g(v,v)}} - q\, \iota_v dA = 0. \]
This is the equation of motion for a charged particle in an external electromagnetic field.

We may rewrite this equation in a more familiar form by choosing the parameter $\tau$ such that $g(v,v) = 1$ and noting that the 2-form $dA$ is the electromagnetic field tensor (2-form), usually denoted $F$:
\[ m \nabla_v v = -q\, \hat g^{-1} \iota_v F. \]
This can be recognized as a covariant form of Newton's law. The left-hand side is the proper acceleration times mass, and the right-hand side is the Lorentz force. In the index notation, this is written as
\[ m v^{\alpha} v^{\beta}{}_{;\alpha} = q\, g^{\beta\gamma} F_{\gamma\alpha} v^{\alpha}. \]
Note that the electromagnetic force appears in the right-hand side, while the gravitational force is not explicitly introduced (but is implicitly present because of the covariant derivative in the left-hand side). In GR, the effects of gravitation are described not by forces but by the modification of the equations of motion due to the presence of covariant derivatives.

1.9.5 Deviation of geodesics

Consider a geodesic congruence with an affine tangent vector field $v$ and an affine parameter $\tau$ corresponding to the proper time, so that $g(v,v) = 1$. Let us investigate how neighboring geodesics deviate from each other with $\tau$. It is natural to measure the deviation between neighbor geodesics using a connecting vector field $c$, since it would connect points corresponding to the same value of $\tau$. In the reference frame of an observer whose time coordinate is $\tau$, the neighbor geodesic thus has the coordinate $\sigma c$, where $\sigma$ is very small. Then the observer moving along a geodesic worldline $\gamma(\tau)$, such that $\dot\gamma = v|_{p=\gamma(\tau)}$, measures the 4-acceleration of the neighbor observer as
\[ a = \nabla_v \nabla_v c. \]
Let us compute this quantity using the properties
\[ [v, c] = \nabla_v c - \nabla_c v = 0, \qquad \nabla_v v = 0, \qquad \dot\gamma = v(p). \]
We find, evaluating all the quantities at the point $p$,
\[ a = \nabla_v \nabla_v c = \nabla_v \nabla_c v = R(v, c) v + \nabla_c \nabla_v v = R(v, c) v. \]
This is called the equation of geodesic deviation. In components, this equation can be written as
\[ a^{\delta} = R_{\alpha\beta\gamma}{}^{\delta}\, v^{\alpha} c^{\beta} v^{\gamma}. \]
The equation of geodesic deviation shows that the Riemann tensor $R$ has a direct physical manifestation: Inertial geodesics will accelerate with respect to each other (exhibiting a gravitational tidal effect) iff the Riemann tensor is nonzero. Furthermore, the entire Riemann tensor $R$ can be recovered if the vector-valued quantity $R(v, c)v$ is known for arbitrary $v$ and $c$; this is derived in the following statement. Thus, all the components of the Riemann tensor at a point can be (in principle) measured using only (a finite number of) geodesic deviation experiments within an infinitesimal neighborhood of that point.

Statement 1.9.5.1: The Riemann tensor at a point can be determined from a finite number of measurements of geodesic deviation using only trajectories of massive particles in a single reference frame. In other words, $R(a, b, c, d)$ can be expressed through its special values of the form $R(v, x, v, y)$ for some future-directed, timelike vectors $v$ and some spacelike vectors $x, y$ orthogonal to $v$. (Proof on page 170.) ■

1.10 Example: hypersurface of constant curvature

Let $x^{\alpha}$ be the usual rectangular coordinates in a flat space $\mathbb{R}^m$ with metric $g$, and let us consider the hypersurface $x_{\alpha} x^{\alpha} \equiv g(x, x) = A^2$, where $A$ is a given constant. This hypersurface will be a sphere if $g$ is a Euclidean metric, or a hyperboloid if $g$ is pseudo-Euclidean, but for now we assume that $g$ has a Euclidean signature and $A^2 > 0$. We shall show that this hypersurface is an $(m-1)$-dimensional space of constant curvature; this will elucidate the concept of "constant curvature." This result holds regardless of the signature of the metric and the sign of $A^2$, as long as $A^2 \neq 0$.

1.10.1 Tangent bundle and induced metric

The vector field $n^{\alpha}(x) \equiv A^{-1} x^{\alpha}$ is, by construction, everywhere normal to the hypersurface and normalized to $g(n, n) = 1$. (The normalization $g(n, n) = 1$ holds only on the hypersurface, but this is sufficient for our calculations.) Vectors $t$ tangent to the hypersurface satisfy $g(n, t) = 0$; let us call such


vectors simply tangent. The equivalent condition for tangent vectors $t$ is
\[ t \circ \left( g(x, x) - A^2 \right) = t \circ g(x, x) = 0 \]
for $x$ on the hypersurface.

Statement: The commutator of tangent vector fields is again tangent.

Hint: Tangent vectors $x$ to a hypersurface $f(p) = 0$ satisfy $x \circ f = 0$.

Derivation: This follows directly from the definition of the commutator: The vector field $[x, y]$ is defined intrinsically through curves lying within the hypersurface and derivatives of functions along these curves. So $[x, y]$ is a derivative along some curve within the hypersurface, and thus must be a tangent vector field. Here is also an explicit calculation. Suppose that the hypersurface is specified by an equation $f(p) = 0$; then a vector field $x$ is tangent if $x \circ f = 0$. By definition of the commutator we have (for tangent $x, y$)
\[ [x, y] \circ f = x \circ (y \circ f) - y \circ (x \circ f) = 0. \]
■

Let us construct a self-adjoint projector onto the tangent bundle of the hypersurface, that is, a self-adjoint linear transformation $P$ such that $P^2 = P$ and $Pu$ is everywhere tangent for any field $u$. It is easy to see that the (transformation-valued) tensor field $P$ must be defined as
\[ P u = u - n\, g(u, n) \tag{1.79} \]
and then we have $g(Pu, n) = 0$.

The condition of self-adjointness,
\[ g(Px, y) = g(x, Py), \tag{1.80} \]
can be motivated by the consideration that the projection of a vector $x$ onto the tangent space must preserve the scalar product of $x$ with vectors that are already tangent,
\[ g(x, Py) = g(Px, Py) \quad \text{for all } x, y. \]
This latter requirement expresses the usual geometric picture of projecting "straight down" to the surface: a vector $x$ is modified by subtracting a multiple of the normal vector, which preserves its scalar product with tangent vectors. By exchanging $x$ with $y$, we find
\[ g(x, Py) = g(Px, Py) = g(Py, Px) = g(y, Px), \]
which is the condition (1.80). Conversely, Eq. (1.80) with the property $P^2 = P$ gives $g(x, Py) = g(Px, Py)$.

The induced metric on the hypersurface is simply the same metric as in the larger space $\mathbb{R}^m$ but restricted to tangent vectors. For convenience of notation, we call the induced metric $h$ and define $h(u, v) = g(u, v)$ for tangent vectors $u, v$. Thus the hypersurface has the structure of a (pseudo-)Riemannian manifold. Let us now compute the curvature tensor of this manifold.

1.10.2 Induced connection

Since the hypersurface has a metric (the induced metric $h$), we may define the corresponding Levi-Civita connection $\tilde\nabla$ (also called the induced connection). This connection is defined only for vector fields tangent to the hypersurface. We note that there is a natural flat connection $\nabla$ in $\mathbb{R}^m$, which acts by the partial derivatives with respect to the Cartesian coordinates,
\[ (\nabla_u v)^{\alpha} \equiv u^{\beta} \partial_{\beta} v^{\alpha}. \]
We shall now see how to express the connection $\tilde\nabla$ on the hypersurface through the existing connection $\nabla$ in the larger space.

The first motivation for deriving the induced connection is the following heuristic consideration. If $t$ and $u$ are tangent vector fields, we can compute $\nabla_u t$, but the result is not necessarily a tangent vector field. To make it tangent, we can project $\nabla_u t$ onto the tangent bundle using the projection operator $P$, i.e. we define
\[ \tilde\nabla_u t \equiv P \nabla_u t. \tag{1.81} \]
The result is the induced covariant derivative $\tilde\nabla_u t$ that has values in the tangent bundle of the hypersurface.

However, it is not immediately obvious that this projected connection is indeed the Levi-Civita connection $\tilde\nabla$ that would be defined intrinsically on the hypersurface, without using the already existing connection $\nabla$ in a larger space. Therefore we give a second, more rigorous motivation: The Levi-Civita connection $\tilde\nabla$ on the hypersurface is defined by Eq. (1.43) if we substitute $h$, the induced metric, instead of $g$. Since $h = g$ on tangent vectors and the Levi-Civita connection for $g$ is $\nabla$, we obtain for arbitrary tangent vectors $x, y, z$
\[ h(\tilde\nabla_x y, z) = g(\nabla_x y, z). \tag{1.82} \]
However, Eq. (1.82) defines the vector $\tilde\nabla_x y$ only through its scalar product with an arbitrary tangent vector $z$. The vector $\nabla_x y$ is not necessarily tangent, but Eq. (1.82) shows that $\tilde\nabla_x y$ and $\nabla_x y$ must have equal scalar products with any tangent vector $z$ (and, of course, $\tilde\nabla_x y$ must be tangent). Therefore, $\tilde\nabla_x y$ must be the projection of $\nabla_x y$ onto the tangent bundle, as shown in Eq. (1.81).

Let us note that $g(n, t) \equiv 0$ for any tangent vector $t$, and thus
\[ 0 = \nabla_u\, g(n, t) = g(\nabla_u n, t) + g(n, \nabla_u t) \]
on the hypersurface. Then we can rewrite the induced connection (1.81) using Eq. (1.79) as
\[ \tilde\nabla_u t = \nabla_u t - n\, g(n, \nabla_u t) = \nabla_u t + n\, g(t, \nabla_u n) \equiv \nabla_u t + \Gamma(u) t, \tag{1.83} \]
where the tensor $\Gamma$ is the following transformation-valued 1-form,
\[ \Gamma(u) t \equiv n\, g(t, \nabla_u n), \qquad (\Gamma(u) t)^{\gamma} = u^{\alpha} t^{\beta}\, \Gamma^{\gamma}_{\alpha\beta}, \qquad \Gamma^{\gamma}_{\alpha\beta} \equiv n^{\gamma}\, \partial_{\alpha} n_{\beta}. \]
The tensor $\Gamma$ is called the connection 1-form or the Christoffel symbol. Substituting $n^{\alpha} = A^{-1} x^{\alpha}$, we get
\[ \partial_{\alpha} n_{\beta} = \frac{1}{A}\, g_{\alpha\beta}, \qquad \nabla_u n = \frac{1}{A}\, u, \qquad \Gamma(u) t = \frac{1}{A}\, n\, g(u, t). \tag{1.84} \]
Statement: The induced connection $\tilde\nabla$ is the Levi-Civita connection on the hypersurface, i.e. for arbitrary tangent vectors $x, y, z$ we have
\[ \tilde\nabla_x h(y, z) - h(\tilde\nabla_x y, z) - h(y, \tilde\nabla_x z) = 0, \]
\[ \tilde\nabla_x y - \tilde\nabla_y x - [x, y] = 0. \]
Derivation:
\[ \tilde\nabla_x h(y, z) \equiv \nabla_x g(y, z); \qquad h(\tilde\nabla_x y, z) = g(\nabla_x y, z); \]


then we use $\nabla_x g(y, z) - g(\nabla_x y, z) - g(y, \nabla_x z) = 0$ (the flat connection $\nabla$ is compatible with the flat metric $g$). Torsion-freeness follows by projecting onto the tangent bundle:
\[ \tilde\nabla_x y - \tilde\nabla_y x = P(\nabla_x y - \nabla_y x) = P[x, y] = [x, y], \]
the last equality holds since $[x, y]$ is tangent. ■

Self-test question: In the above calculation, the connection 1-form $\Gamma$ is obviously a tensor since it is defined in an invariant, geometric way by Eq. (1.84). However, $\Gamma$ plays the role of the Christoffel symbol because Eq. (1.83) is written in the component notation as $\tilde\nabla_{\alpha} v^{\beta} = \partial_{\alpha} v^{\beta} + \Gamma^{\beta}_{\alpha\gamma} v^{\gamma}$. It is explained in most GR textbooks that the Christoffel symbol $\Gamma^{\beta}_{\alpha\gamma}$ is "not a tensor." Is there a contradiction?

Hint: In the index-free approach, every quantity is a tensor.

Answer: Our tensor $\Gamma$ is defined using the coordinate-based connection $\nabla$ which depends on a particular, fixed coordinate system in $\mathbb{R}^m$, which is in the present case the natural Cartesian coordinate system $\{x^{\alpha}\}$. If we pass to a different coordinate system $\{y^{\alpha}\}$, say to the spherical coordinate system, the components of the tensor $\Gamma$ will have to be transformed according to the usual tensor law. The formula for $\Gamma$ will not be the same in the coordinate system $\{y^{\alpha}\}$. One could say that the connection $\partial/\partial x^{\alpha}$ itself is no longer $\partial/\partial y^{\alpha}$ in a different coordinate system $\{y^{\alpha}\}$ and its components also need to be transformed. The standard GR textbooks, on the other hand, define the components $\Gamma^{\beta}_{\alpha\gamma}$ by a fixed, non-covariant formula in an arbitrary coordinate system. This procedure does not define a tensor. So then it is no surprise that the components $\Gamma^{\beta}_{\alpha\gamma}$ do not transform as components of a tensor. ■

Practice problem: A transformation-valued tensor field $S$ is defined in the tangent bundle of the hypersurface by its action on tangent vector fields $t$ in the following way,
\[ S t = h(v, t)\, v - \frac{1}{17}\, t, \]
where $h$ is the induced metric, $v$ is a given tangent vector field which is normalized, $h(v, v) = 1$. Compute $\tilde\nabla S$, which should be a transformation-valued 1-form.

Solution: For arbitrary tangent vector fields $u$ and $t$, we have
\[ (\tilde\nabla_u S)\, t = \tilde\nabla_u (S t) - S \tilde\nabla_u t \]
\[ = \tilde\nabla_u \left( h(v, t)\, v - \frac{1}{17}\, t \right) - h(v, \tilde\nabla_u t)\, v + \frac{1}{17}\, \tilde\nabla_u t \]
\[ = h(\tilde\nabla_u v, t)\, v + h(v, t)\, \tilde\nabla_u v \]
\[ = g(\nabla_u v, t)\, v + g(v, t) \left( \nabla_u v + \frac{1}{A}\, n\, g(u, v) \right). \]
■

1.10.3 Riemann tensor within the hypersurface

Finally, let us compute the Riemann tensor of the hypersurface. By definition, we have
\[ R(x, y) z = [\tilde\nabla_x, \tilde\nabla_y]\, z - \tilde\nabla_{[x,y]} z, \]
and we write out the terms using Eq. (1.83),
\[ \tilde\nabla_x \tilde\nabla_y z = \tilde\nabla_x \left( \nabla_y z + \Gamma(y) z \right) = \nabla_x \nabla_y z + \Gamma(x) \nabla_y z + \nabla_x \left( \Gamma(y) z \right) + \Gamma(x) \Gamma(y) z; \]
\[ \tilde\nabla_{[x,y]} z = \nabla_{[x,y]} z + \Gamma([x, y])\, z; \]
\[ [\tilde\nabla_x, \tilde\nabla_y]\, z - \tilde\nabla_{[x,y]} z = \underline{[\nabla_x, \nabla_y]\, z - \nabla_{[x,y]} z} + [\Gamma(x), \Gamma(y)]\, z + \underline{\Gamma(\nabla_x y - \nabla_y x - [x, y])\, z} + (\nabla_x \Gamma)(y) z - (\nabla_y \Gamma)(x) z \]
\[ = [\Gamma(x), \Gamma(y)]\, z + (\nabla_x \Gamma)(y) z - (\nabla_y \Gamma)(x) z. \tag{1.85} \]
In the last line we used the fact that $\nabla$ is a flat and torsion-free connection, in order to cancel the underlined terms. (Note the similarity of the last formula to the standard expression for the Riemann tensor in the component notation, $R_{....} = \partial_{.}\Gamma_{...} - \partial_{.}\Gamma_{...} + \Gamma_{...}\Gamma_{...} - \Gamma_{...}\Gamma_{...}$; see Eq. (A.11) in Appendix A.)

Now we substitute Eq. (1.84) into Eq. (1.85) and find (for tangent vectors $x, y, z, t$)
\[ \Gamma(x) \Gamma(y) z = \frac{1}{A}\, n\, g(x, \Gamma(y) z) = \frac{1}{A}\, n\, g(x, n(...)) = 0, \]
\[ (\nabla_x \Gamma)(y) z = \frac{1}{A}\, (\nabla_x n)\, g(y, z) = \frac{1}{A^2}\, x\, g(y, z), \]
\[ R(x, y) z = \frac{1}{A^2} \left( x\, g(y, z) - y\, g(x, z) \right), \]
\[ R(x, y, z, t) = \frac{1}{A^2} \left( g(x, t) g(y, z) - g(x, z) g(y, t) \right) = \frac{1}{A^2} \left( h(x, t) h(y, z) - h(x, z) h(y, t) \right), \tag{1.86} \]
since $g(u, v) = h(u, v)$ for tangent vectors $u, v$. In the index notation, the above Riemann tensor is
\[ R_{\alpha\beta\gamma\delta} = \frac{1}{A^2} \left( h_{\alpha\delta} h_{\beta\gamma} - h_{\alpha\gamma} h_{\beta\delta} \right). \]
A manifold with a Riemann tensor of form (1.86) is called a space of constant curvature. Note also that the second Bianchi identity forces $A = \text{const}$ for a Riemann tensor of the form (1.86).

Our result is that the hypersurface $x_{\alpha} x^{\alpha} = A^2$ has constant curvature. The actual magnitude of the curvature is $A^{-2}$ if the metric $g$ is Euclidean and $-A^{-2}$ if it is pseudo-Euclidean with signature $(+ - ... -)$, because in the latter case the vector $n$ is timelike and the metric $h$ is negative-definite. (The cases $x_{\alpha} x^{\alpha} = -A^2$ or $g(n, n) = -1$ can be treated in a very similar way and the result is essentially the same, with appropriate sign changes.)

Remark: The following two arguments help to visualize the concept of constant curvature.

The first argument appeals to the intuitive understanding that a sphere is a "perfectly symmetrically curved" surface that has a constant curvature at all points. The above calculation of the Riemann tensor obviously holds for a sphere $|x|^2 = R^2$ in the usual Euclidean space. It makes intuitive sense to say that a sphere of radius $R$ has a constant curvature equal to $1/R$. Therefore, any space with a Riemann tensor (1.86) is a space of constant curvature, and the value of the curvature is equal to $1/A$.

The second argument is somewhat more abstract. The tensor $R_{\alpha\beta\gamma\delta} = R_{\gamma\delta\alpha\beta}$ can be viewed as a symmetric bilinear form in the space of 2-forms, $R(\omega^{(1)}, \omega^{(2)}) = R^{\alpha\beta\gamma\delta} \omega^{(1)}_{\alpha\beta} \omega^{(2)}_{\gamma\delta}$, or as a linear transformation in the space of 2-forms, $\omega_{\alpha\beta} \to R_{\alpha\beta}{}^{\gamma\delta} \omega_{\gamma\delta}$, where $\omega_{\alpha\beta} = -\omega_{\beta\alpha}$ is an arbitrary 2-form. The space of 2-forms is six-dimensional, and the bilinear form


$R(\omega^{(1)}, \omega^{(2)})$ can be represented as a symmetric $6 \times 6$ matrix. Due to this symmetry, the corresponding transformation is diagonalizable. Consider the six eigenvalues $\lambda_1(p), ..., \lambda_6(p)$ of that transformation as functions of the point $p$ on the manifold. These six scalar functions characterize (in some way) the magnitude of the curvature of the manifold at various points. Without a detailed analysis of the significance of the eigenvalues $\lambda_j$ and the corresponding eigenvectors, we may heuristically expect that, for a manifold of constant curvature, all the six eigenvalues $\lambda_j$ should be equal to each other and remain constant throughout the manifold. Thus the transformation $\omega_{\alpha\beta} \to R_{\alpha\beta}{}^{\gamma\delta} \omega_{\gamma\delta}$ must be proportional to the identity map in the space of 2-forms,
\[ R_{\alpha\beta}{}^{\gamma\delta}\, \omega_{\gamma\delta} = K \omega_{\alpha\beta}, \qquad K = \text{const}. \]
(The coefficient $K$ must be constant also by the second Bianchi identity, once we assume that $R_{\alpha\beta\gamma\delta}$ has the form given above.) Then the comparison with the sphere shows that the numeric value of the curvature is $K$. ■
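
The heuristic claim above can be checked numerically. The following is a minimal sketch, assuming a four-dimensional space with a Euclidean induced metric written in an orthonormal frame ($h$ equal to the identity matrix, so index positions do not matter); the function names are illustrative and not part of the notes. It builds the constant-curvature tensor in the index form of Eq. (1.86), $R_{\alpha\beta\gamma\delta} = A^{-2}(h_{\alpha\delta}h_{\beta\gamma} - h_{\alpha\gamma}h_{\beta\delta})$, and assembles the $6 \times 6$ matrix of the map $\omega_{\alpha\beta} \to R_{\alpha\beta\gamma\delta}\,\omega_{\gamma\delta}$ on the basis of index pairs $\alpha < \beta$.

```python
from itertools import combinations


def riemann_constant_curvature(dim, A):
    """Return R[a][b][c][d] = (h_ad h_bc - h_ac h_bd) / A^2 with h = identity.

    Assumption: orthonormal frame with Euclidean signature.
    """
    delta = lambda i, j: 1.0 if i == j else 0.0
    return [[[[(delta(a, d) * delta(b, c) - delta(a, c) * delta(b, d)) / A**2
               for d in range(dim)]
              for c in range(dim)]
             for b in range(dim)]
            for a in range(dim)]


def two_form_map(R, dim):
    """Matrix of omega_ab -> sum_{c,d} R_abcd omega_cd on the basis a < b."""
    pairs = list(combinations(range(dim), 2))
    # Both R and omega are antisymmetric in (c, d), so the full sum over
    # (c, d) equals twice the sum over pairs with c < d.
    matrix = [[2.0 * R[a][b][c][d] for (c, d) in pairs] for (a, b) in pairs]
    return pairs, matrix


pairs, M = two_form_map(riemann_constant_curvature(4, A=2.0), 4)
for row in M:
    print(row)
```

With this index ordering the matrix comes out proportional to the identity, with the same entry in all six diagonal positions. The sign and the factor of 2 in that entry depend on the index convention used here, so only the proportionality to the identity should be read as the content of the check.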

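Practice problem: Consider a surface embedded in $\mathbb{R}^m$, specified by an equation $f(p) = 0$, where $f$ is a given scalar function and $p \in \mathbb{R}^m$. Obtain the normal vector field $n$ in terms of $f$ and the flat metric $g$ in $\mathbb{R}^m$. Compute the induced metric $h$ and the Riemann tensor $R(x, y, z, t)$ in terms of $n$ and $g$.

Solution: The vector $n$ normal to the hypersurface is
\[ n = \frac{1}{N}\, \hat g^{-1}(df), \qquad N \equiv \sqrt{g^{-1}(df, df)}. \]
In other words, $g(n, x) = N^{-1}\, x \circ f$ for any vector $x$. In components:
\[ n^{\alpha} = \frac{g^{\alpha\beta} \partial_{\beta} f}{N}, \qquad N \equiv \sqrt{g^{\alpha\beta} (\partial_{\alpha} f)(\partial_{\beta} f)}. \]
Note that $g(n, n) = 1$ and so $g(\nabla_x n, n) = 0$ for any vector $x$. The projection onto the tangent bundle is
\[ P x = x - n\, g(n, x). \]
The induced connection $\tilde\nabla$ is equal to $\nabla$ projected onto the tangent bundle,
\[ \tilde\nabla_x y = P \nabla_x y = \nabla_x y - n\, g(n, \nabla_x y) = \nabla_x y + n\, g(y, \nabla_x n), \]
which can be also written as
\[ \tilde\nabla_x y = \nabla_x y + \Gamma(x) y, \qquad \Gamma(x) y \equiv n\, g(\nabla_x n, y). \]
The connection $\tilde\nabla$ is the Levi-Civita connection on the hypersurface. The curvature of $\tilde\nabla$ is
\[ R(x, y) z = [\Gamma(x), \Gamma(y)]\, z + (\nabla_x \Gamma)(y) z - (\nabla_y \Gamma)(x) z = g(\nabla_y n, z)\, \nabla_x n - g(\nabla_x n, z)\, \nabla_y n; \]
\[ R(x, y, z, t) = g(\nabla_y n, z)\, g(\nabla_x n, t) - g(\nabla_x n, z)\, g(\nabla_y n, t). \]
In the index notation, we can write
\[ R_{\alpha\beta\gamma\delta} = \left[ (\nabla_{\beta} n^{\lambda})(\nabla_{\alpha} n^{\mu}) - (\nabla_{\alpha} n^{\lambda})(\nabla_{\beta} n^{\mu}) \right] g_{\lambda\gamma}\, g_{\mu\delta} = (\partial_{\beta} n_{\gamma})(\partial_{\alpha} n_{\delta}) - (\partial_{\alpha} n_{\gamma})(\partial_{\beta} n_{\delta}), \]
since the metric $g$ in the Euclidean space $\mathbb{R}^m$ is flat. It is clear that, in general, the hypersurface $f(p) = 0$ is not a space of constant curvature. ■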
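
The solution above lends itself to a numerical spot check. The following is a minimal sketch, assuming a surface in Euclidean $\mathbb{R}^3$ with Cartesian coordinates (so $g$ is the ordinary dot product and $\nabla$ is the partial derivative); all function names and the finite-difference step sizes are illustrative choices. It evaluates $R(x, y, z, t) = g(\nabla_y n, z)\, g(\nabla_x n, t) - g(\nabla_x n, z)\, g(\nabla_y n, t)$ with $n$ obtained from $f$ by central differences, and forms the ratio $R(x, y, y, x) / (g(x,x) g(y,y) - g(x,y)^2)$, which by Eq. (1.86) equals $A^{-2}$ for the sphere $g(x, x) = A^2$.

```python
import math


def dot(a, b):
    """Euclidean scalar product g(a, b) in Cartesian coordinates."""
    return sum(x * y for x, y in zip(a, b))


def gradient(f, p, eps=1e-5):
    """Central-difference gradient of the scalar function f at the point p."""
    grad = []
    for i in range(len(p)):
        hi, lo = list(p), list(p)
        hi[i] += eps
        lo[i] -= eps
        grad.append((f(hi) - f(lo)) / (2 * eps))
    return grad


def unit_normal(f, p):
    """n = grad f / N with N = |grad f|, as in the solution above."""
    grad = gradient(f, p)
    norm = math.sqrt(dot(grad, grad))
    return [c / norm for c in grad]


def d_normal(f, p, u, eps=1e-4):
    """Flat directional derivative of n along u, by central difference."""
    hi = [pi + eps * ui for pi, ui in zip(p, u)]
    lo = [pi - eps * ui for pi, ui in zip(p, u)]
    n_hi, n_lo = unit_normal(f, hi), unit_normal(f, lo)
    return [(a - b) / (2 * eps) for a, b in zip(n_hi, n_lo)]


def riemann(f, p, x, y, z, t):
    """R(x,y,z,t) for tangent vectors x, y, z, t at the point p of f = 0."""
    dxn, dyn = d_normal(f, p, x), d_normal(f, p, y)
    return dot(dyn, z) * dot(dxn, t) - dot(dxn, z) * dot(dyn, t)


def sectional_curvature(f, p, x, y):
    """R(x,y,y,x) normalized by the squared area of the parallelogram (x, y)."""
    return riemann(f, p, x, y, y, x) / (dot(x, x) * dot(y, y) - dot(x, y) ** 2)


# Sphere of radius A = 2, and an ellipsoid with semi-axes 1, 2, 3.
sphere = lambda p: p[0] ** 2 + p[1] ** 2 + p[2] ** 2 - 4.0
ellipsoid = lambda p: p[0] ** 2 / 1.0 + p[1] ** 2 / 4.0 + p[2] ** 2 / 9.0 - 1.0

k_sphere = sectional_curvature(sphere, [2.0, 0.0, 0.0], [0, 1, 0], [0, 0, 1])
k_ell_a = sectional_curvature(ellipsoid, [1.0, 0.0, 0.0], [0, 1, 0], [0, 0, 1])
k_ell_c = sectional_curvature(ellipsoid, [0.0, 0.0, 3.0], [1, 0, 0], [0, 1, 0])
print(k_sphere, k_ell_a, k_ell_c)
```

For the sphere the ratio should be the same at every point, while for the ellipsoid it differs between the two sample points, which illustrates the closing remark that a general hypersurface $f(p) = 0$ is not a space of constant curvature.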

2 Geometry of null surfaces
Throughout this chapter, we consider a four-dimensional spacetime with a metric $g$ having the signature $(+ - - -)$, and the Levi-Civita connection $\nabla$. Most results can be straightforwardly generalized to higher dimensions, but it is important that the metric has indefinite signature of the kind $(+ - ... -)$ and, as a consequence, that there exist null directions.

2.1 Null vectors

Recall that a vector $v$ is called timelike if $g(v, v) > 0$, null if $g(v, v) = 0$, and spacelike if $g(v, v) < 0$. A geodesic curve $\gamma(\tau)$ is called a lightray if $g(\dot\gamma, \dot\gamma) = 0$ for all $\tau$. (Lightrays are worldlines of light or other radiation carried by massless particles.)

Null vectors have somewhat peculiar properties which we now review.

Statement 2.1.0.1: a) If $n$ is null and $v$ is orthogonal to $n$, i.e. $g(v, n) = 0$, then $v$ is either parallel to $n$, or spacelike. In other words: the orthogonal complement to a null vector $n$ is spanned by $n$ and two spacelike vectors. (The orthogonal complement $n^{\perp}$ of a vector $n$ is the subspace consisting of vectors $v$ such that $g(n, v) = 0$.)

b) If $n$ is null and $v$ is some vector such that $g(v, n) \neq 0$ then there exists a linear combination $n + \lambda v$ that is timelike, and another that is spacelike.

Hint: These statements concern vectors in a single tangent space $T_p M$, so it suffices to consider the Minkowski space with a standard coordinate basis $\{t, x, y, z\}$.

2.1.1 Orthogonal complement spaces

The preceding statement 2.1.0.1 shows that the orthogonal complement $n^{\perp}$ to a null vector $n$ is a three-dimensional subspace spanned by $n$ and two other spacelike vectors. For calculations, it is convenient to have a projection operator onto this subspace. We shall now investigate the properties of such projections.

We begin with an easier case: the projection onto $n^{\perp}$, where $n$ is not null. Recall that a self-adjoint projector is an operator $P$ such that $g(Px, n) = 0$ for all $x$ (any vector $x$ is projected onto $n^{\perp}$), $P^2 = P$ (the projection remains projected), and the bilinear form corresponding to $P$ is symmetric, $g(Px, y) = g(x, Py)$ for all vectors $x, y$ (self-adjointness; see Sec. 1.10.1 for a motivation). For a unit vector $n$, there is a unique self-adjoint projector onto $n^{\perp}$, given by the standard formula
\[ P x = x - n\, g(n, x). \tag{2.1} \]
(See also Eqs. (1.79) and (1.80) in Sec. 1.10.1.) It is easy to compute the trace of $P$ (see Sec. 1.7.3 for details),
\[ g(Px, y) = g(x, y) - g(n, x) g(n, y); \]
\[ \text{Tr}\, P = \text{Tr}_{(x,y)}\, g(Px, y) = 4 - g(n, n) = 4 - 1 = 3. \]
For calculations in a subspace, it is convenient to define a partial metric $h$ which coincides with the metric $g$ for vectors in the subspace but is zero on vectors orthogonal to the subspace. Thus, $h$ is defined in the entire space but can "only see" the subspace. The advantage of introducing the partial metric is that one can work with vectors from the full space, substituting vectors from the subspace only at the end of calculations.

Given a self-adjoint projector $P$ onto the subspace, the partial metric $h$ satisfying the above requirements can be expressed as
\[ h(a, b) = g(Pa, Pb) = g(Pa, b). \]
(The last equality is due to $P^2 = P$ and self-adjointness.) For a standard projector (2.1) to the space $n^{\perp}$ where $g(n, n) = 1$, the partial metric is
\[ h(a, b) = g(a, b) - g(n, a) g(n, b). \]
Now we turn to the case when the vector $n$ is null. Then the formula (2.1) fails since $g(n, n) = 0$, so $g(Pu, n) \neq 0$. Clearly, the transformation $Px$ needs to subtract from $x$ some vector whose scalar product with $n$ is nonzero. So let us try $Px = x - l\, g(n, x)$, where $l$ is a vector such that $g(l, n) = 1$. This gives $g(Px, n) = 0$ for any $x$; however, this operator $P$ is not self-adjoint, $P^T x = x - n\, g(l, x) \neq Px$, and we need to symmetrize it. So finally we define the map
\[ P x = x - l\, g(n, x) - n\, g(l, x), \tag{2.2} \]
where again $g(l, n) = 1$. It is easy to check that $P$ is now self-adjoint and that $g(Px, n) = 0$ for any $x$. The requirement $P^2 = P$ yields the condition
\[ 0 = g(l, Px) = g(l, x - l\, g(n, x) - n\, g(l, x)) = -g(l, l)\, g(n, x), \]
which can be satisfied for all $x$ only if $l$ itself is null, $g(l, l) = 0$. Thus the self-adjoint projector (2.2) is, by necessity, also a projector onto the orthogonal complement of another null vector $l$. The image of $P$ is therefore the set $(l, n)^{\perp}$ of vectors orthogonal to both $n$ and $l$, which is not the entire $n^{\perp}$ but a two-dimensional subspace within $n^{\perp}$. Accordingly, the trace of $P$ is equal to 2:
\[ \text{Tr}\, P = 4 - g(n, l) - g(n, l) = 4 - 1 - 1 = 2. \]
Statement: For null vectors $n$, there exists no self-adjoint projector onto the full subspace $n^{\perp}$.

Derivation: If $P$ were such a projector, we would have $Pn = n$. Consider a vector $v$ such that $g(n, v) \neq 0$. By assumption the vector $Pv$ must be orthogonal to $n$, and then the condition of self-adjointness yields a contradiction:
\[ 0 \neq g(n, v) = g(Pn, v) = g(n, Pv) = 0. \]
In other words, we cannot project $v$ while keeping the scalar product of $v$ and $n$ constant.

The orthogonal complement subspace $n^{\perp}$ has a naturally induced metric, which is just the restriction of the full metric $g$ onto $n^{\perp}$. However, the induced metric is degenerate since $n$


is orthogonal to every vector in the subspace. It is awkward to work with a degenerate metric; for instance, tensors specified only through their scalar products with vectors from $n^{\perp}$ become ambiguous (defined up to multiples of $n$). To avoid this inconvenience, we may further restrict the subspace $n^{\perp}$ to a two-dimensional subspace $(l, n)^{\perp}$ which is the image of a projector $P$. The induced metric is nondegenerate on the subspace $(l, n)^{\perp}$. However, this subspace is not uniquely selected since it depends on the choice of a second null vector, $l$. Despite these difficulties, we will obtain unambiguous results that are independent of the freedom in the choice of $l$.

Let us now compare the projectors for the spaces $v^{\perp}$, $n^{\perp}$, and $s^{\perp}$, where the vectors $v, n, s$ are respectively timelike, null, and spacelike:
\[ P_v x = x - g(v, x)\, v, \qquad g(v, v) = 1, \qquad \text{Tr}\, P_v = 3; \]
\[ P_n x = x - l\, g(n, x) - n\, g(l, x), \qquad g(n, l) = 1, \qquad \text{Tr}\, P_n = 2; \]
\[ P_s x = x + g(s, x)\, s, \qquad g(s, s) = -1, \qquad \text{Tr}\, P_s = 3. \]
We find that the timelike and the spacelike cases are quite similar but differ from the null case. The partial metrics are
\[ h_v(a, b) = g(a, b) - g(v, a) g(v, b); \tag{2.3} \]
\[ h_n(a, b) = g(a, b) - g(n, a) g(l, b) - g(l, a) g(n, b); \tag{2.4} \]
\[ h_s(a, b) = g(a, b) + g(s, a) g(s, b). \]
We can also write the partial inverse metrics, which are (2,0)-tensors obtained from the above partial metrics by using the map $\hat g$, according to
\[ h^{-1}(\omega_1, \omega_2) \equiv h(\hat g^{-1} \omega_1, \hat g^{-1} \omega_2), \]
where $\omega_1$ and $\omega_2$ are arbitrary 1-forms. We find
\[ h_v^{-1} = g^{-1} - v \otimes v, \]
\[ h_n^{-1} = g^{-1} - n \otimes l - l \otimes n, \]
\[ h_s^{-1} = g^{-1} + s \otimes s. \]
Note that the partial metrics are not invertible, so we do not talk of "inverse partial metrics."

It is important to investigate the signature of the partial metrics on the subspaces where they are nondegenerate (the partial signature). The easiest way to deduce the signature is to choose a suitable basis containing the vectors $v$, $n$, or $s$. For the metric $h_v$ we choose an orthogonal basis $\{v, s_1, s_2, s_3\}$, where $s_1, s_2, s_3$ are all spacelike. Then we find $h_v(s_j, s_k) = -\delta_{jk}$, so the partial signature of $h_v$ is $(- - -)$. For the metric $h_n$, we choose a basis containing the vectors $\{l, n, s_1, s_2\}$, where $s_1, s_2$ are spacelike and orthogonal to each other and to $l, n$. Then we find $h_n(..., l) = h_n(..., n) = 0$, while $h_n(s_j, s_k) = -\delta_{jk}$, and therefore the partial signature of $h_n$ is $(- -)$. Finally, for the metric $h_s$ we choose an orthogonal basis $\{s, v_1, s_2, s_3\}$, where $v_1$ is a timelike vector and $s_2, s_3$ are spacelike. Thus the partial signature of $h_s$ is $(+ - -)$.

2.1.2 Divergence of a null vector field

We have seen that there are two spacelike connecting vectors $s_1, s_2$ orthogonal to $n$ within the subspace $n^{\perp}$. Thus we can visualize how the 2-volume (i.e. area), $A(s_1, s_2)$, of the parallelogram spanned by these two vectors propagates along the null direction $n$. From elementary geometric considerations, this area can be expressed as
\[ A(s_1, s_2) = \left| g(s_1, s_1) g(s_2, s_2) - g(s_1, s_2) g(s_2, s_1) \right|^{1/2}. \]
The change of the area along the flow lines of $n$ is described by the directional derivative $\nabla_n A(s_1, s_2)$.

It is easier to compute $\nabla_n A(s_1, s_2)$ if we note that $A(s_1, s_2)$ is equal to the 4-volume spanned by $s_1$, $s_2$ and two unit vectors $x, y$ orthogonal to both $s_1$ and $s_2$. For instance, we may choose $x$ to be spacelike and $y$ to be timelike, $g(y, y) = 1 = -g(x, x)$. The 2-volume $A(s_1, s_2)$ is then interpreted as the area measured in the reference frame of an observer moving along the timelike direction $y$. (Of course, the area $A(s_1, s_2)$ is observer-independent since it is defined regardless of the choice of the auxiliary vectors $x, y$.) The 4-volume spanned by $s_1, s_2, x, y$ is expressed through the Levi-Civita symbol $\varepsilon$ as
\[ A(s_1, s_2) = \varepsilon(s_1, s_2, x, y). \tag{2.5} \]
Since $\varepsilon$ is totally antisymmetric, the expression $\varepsilon(a, b, c, d)$ can be interpreted as a linear function of the totally antisymmetric (4,0)-tensor $a \wedge b \wedge c \wedge d$, where $x \wedge y$ is the exterior (antisymmetric) product of vectors. Thus, the vectors $x$ and $y$ in Eq. (2.5) can be replaced by their linear combinations $x'$ and $y'$ as long as $x \wedge y = x' \wedge y'$. It turns out to be convenient for the calculation of $\nabla_n A$ to choose the vectors $x', y'$ as $x' = l$ and $y' = n$, where $l$ is the null vector orthogonal to both $s_1$ and $s_2$ and related to $n$ by $g(l, n) = 1$.

Statement: Indeed the choice $x' = l$, $y' = n$ is possible, i.e. $x \wedge y = l \wedge n$.

Derivation: Write the vectors $x' = l$, $y' = n$ as linear combinations of $x, y$ with unknown coefficients:
\[ l = a_1 x + b_1 y, \qquad n = a_2 x + b_2 y. \]
Since $g(x, y) = 0$, the conditions $g(l, l) = g(n, n) = 0$ give
\[ a_1^2 = b_1^2, \qquad a_2^2 = b_2^2, \]
so we could choose $a_1 = b_1$, $a_2 = -b_2$ (the signs must differ, or else $l$ will come out parallel to $n$). Then we have $1 = g(l, n) = -2 a_1 a_2$ and
\[ l \wedge n = -2 a_1 a_2\, x \wedge y = x \wedge y. \]
Thus the area is expressed as
\[ A = \varepsilon(s_1, s_2, l, n). \]
Since $s_{1,2}$ are connecting vectors, the derivative of the area $A$ along the lines of $n$ is most conveniently computed by using the Lie derivative,
\[ \frac{dA}{d\tau} = \mathcal{L}_n A = \mathcal{L}_n\, \varepsilon(s_1, s_2, l, n) = (\text{div}\, n)\, \varepsilon(s_1, s_2, l, n) + \varepsilon(s_1, s_2, \mathcal{L}_n l, n). \]
We need to retain the last term above because $l$ is not necessarily a connecting vector for $n$. Let us compute the scalar product of $\mathcal{L}_n l \equiv [n, l]$ with $n$,
\[ g(n, [n, l]) = g(n, \nabla_n l) - g(n, \nabla_l n) = \nabla_n\, g(n, l) - g(\nabla_n n, l) - \frac{1}{2} \nabla_l\, g(n, n) = 0. \]
Therefore $[l, n]$ is a linear combination of $n$, $s_1$, and $s_2$. Hence, $\varepsilon(s_1, s_2, \mathcal{L}_n l, n) = 0$ regardless of the choice of $l$. (This is to be expected since $A(s_1, s_2)$ and $\nabla_n A(s_1, s_2)$ are defined independently of the choice of $l$.) Finally, we find
\[ \frac{dA}{d\tau} = (\text{div}\, n)\, A. \tag{2.6} \]
The divergence $\text{div}\, n$ is thus interpreted as the relative rate of change in the area spanned by the vectors $s_1, s_2$ as this area is being transported by $n$.
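
The relation $dA/d\tau = (\text{div}\, n) A$ can be illustrated numerically. The following is a minimal sketch under stated assumptions that are not taken from the notes: flat Minkowski spacetime with signature $(+ - - -)$, the outgoing radial null field $n = (1, x/r, y/r, z/r)$ whose integral curves are straight lines, a base point on the $x$ axis, and finite-difference steps chosen for illustration. Connecting vectors are approximated by differences between neighboring flow lines, the area is computed from the formula for $A(s_1, s_2)$ above, and $\text{div}\, n$ is taken as the coordinate divergence $\partial_{\mu} n^{\mu}$.

```python
import math

ETA = (1.0, -1.0, -1.0, -1.0)  # diagonal of the Minkowski metric (+,-,-,-)


def g(a, b):
    """Minkowski scalar product of two 4-vectors."""
    return sum(e * x * y for e, x, y in zip(ETA, a, b))


def n_field(p):
    """Outgoing radial null vector field (an assumed example)."""
    t, x, y, z = p
    r = math.sqrt(x * x + y * y + z * z)
    return (1.0, x / r, y / r, z / r)


def flow(p0, tau):
    """Point reached from p0 after parameter distance tau along n.

    The field n is constant along its own integral curves, so the
    flow lines are straight.
    """
    n = n_field(p0)
    return tuple(pi + tau * ni for pi, ni in zip(p0, n))


def divergence(p, eps=1e-5):
    """Coordinate divergence of n at p, by central differences."""
    total = 0.0
    for i in range(4):
        hi, lo = list(p), list(p)
        hi[i] += eps
        lo[i] -= eps
        total += (n_field(hi)[i] - n_field(lo)[i]) / (2 * eps)
    return total


def area(p0, tau, delta=1e-4):
    """Area spanned by two connecting vectors after flowing by tau.

    Assumes p0 lies on the x axis, so that small shifts along y and z
    are tangent to the sphere r = const and orthogonal to n.
    """
    base = flow(p0, tau)
    pa = flow((p0[0], p0[1], p0[2] + delta, p0[3]), tau)
    pb = flow((p0[0], p0[1], p0[2], p0[3] + delta), tau)
    s1 = [(a - b) / delta for a, b in zip(pa, base)]
    s2 = [(a - b) / delta for a, b in zip(pb, base)]
    return math.sqrt(abs(g(s1, s1) * g(s2, s2) - g(s1, s2) ** 2))


p0 = (0.0, 2.0, 0.0, 0.0)
tau, h = 1.0, 1e-3
dA_dtau = (area(p0, tau + h) - area(p0, tau - h)) / (2 * h)
rhs = divergence(flow(p0, tau)) * area(p0, tau)
print(dA_dtau, rhs)
```

The two printed numbers should agree to within the finite-difference error, reflecting that the cross-sectional area of this light cone grows as $r^2$ while $\text{div}\, n = 2/r$.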


2.2 Null surfaces

Null surfaces (3-dimensional hypersurfaces orthogonal to a null vector field) play a major role in relativity. Before considering null surfaces, we shall spend some time studying 3-dimensional hypersurfaces in general.

2.2.1 Three-dimensional hypersurfaces

A convenient way to define a 3-dimensional hypersurface is through an equation $f(p) = 0$, where $f(p)$ is an auxiliary function. In this way the hypersurface can be visualized as the locus of constant $f$. A vector $t$ is tangent to the hypersurface of constant $f$ if the function $f$ remains constant in the direction of $t$, i.e. if $t \circ f = 0$. Since $x \circ f$ is a linear function of $x$, there exists a vector $n$ such that
\[ x \circ f \equiv (df) \circ x = g(n, x), \qquad n = \hat g^{-1}(df), \tag{2.7} \]
where $\hat g^{-1}$ is the map from 1-forms into vectors. The vector $n$ is called the contravariant gradient of $f$ and is the normal vector to the hypersurface because it is orthogonal to every tangent vector $t$,
\[ g(n, t) = t \circ f = 0. \]
Note that the vector field $n$ is well-defined not only on the hypersurface but in the entire space; away from the hypersurface $f(p) = 0$, the vector $n$ is the normal vector to hypersurfaces $f(p) = f_0$ for other values of $f_0$.

Below we shall almost always work with 3-dimensional hypersurfaces in 4-dimensional spacetimes, and for brevity we shall call them simply surfaces.

2.2.2 Integrable vector fields

Not every vector field $v$ admits a family of hypersurfaces for which $v$ is the normal vector at each point. This property is called hypersurface orthogonality. A vector $v$ is hypersurface orthogonal if there exists a scalar function $f$ such that $v$ is orthogonal to every vector $t$ tangent to the surfaces of constant $f$:
\[ g(t, v) = 0 \quad \text{for any } t \text{ such that } t \circ f = 0. \tag{2.8} \]
A general condition for hypersurface orthogonality is given by the Frobenius theorem (see Sec. 2.2.3). We shall now motivate and derive a simpler but more restrictive condition called integrability, which is sufficient for many purposes.

In the language of classical mechanics, the "force field" $v$ has a potential $f$ if $v$ is the contravariant gradient of $f$. In the language of forms, we say that $v$ is dual to the 1-form $df$ and write $v = \hat g^{-1}(df)$. In that case, we have $t \circ f \equiv g(t, v)$ for any vector $t$, and it is clear that $v$ is the normal vector for surfaces of constant $f$. A vector field $v$ is integrable if there exists a scalar function $f$ such that $v$ is dual to $df$.

The analogous property for 1-forms is called exactness. A 1-form $\omega$ is exact if $\omega = df$ for some scalar function $f$. Then

Example: Let us produce some explicit examples of non-integrable vector fields.

In flat space with polar coordinates, an example of a non-integrable field would be $\partial/\partial\phi$. In Cartesian coordinates, this would be $v = y\, \partial/\partial x - x\, \partial/\partial y$ which is non-integrable because $\hat g v = y\, dx - x\, dy$ and $d(\hat g v) = 2\, dy \wedge dx \neq 0$. The field $v$ is not a potential field since its flow lines are closed circles.

A form $\omega$ is closed if $d\omega = 0$. Since $dd = 0$, every exact form is closed, and locally (but not necessarily globally!) every closed form is exact. So a field $v$ is (locally) integrable if it is dual to a 1-form $\hat g v$ which is closed,
\[ d(\hat g v) = 0. \]
Let us now motivate this well-known condition by more intuitive considerations.

We are trying to find a function $f(p)$ such that $v$ is the gradient of $f$. Such a function could be found by integrating the "work" done by the "force field" $v$ from some initial point $p_0$ to the point $p$. The work corresponding to an infinitesimal displacement $\delta l$ in the direction of a vector $l$ is
\[ g(v, \delta l) = (\hat g v) \circ \delta l, \]
where $\hat g v$ is the 1-form corresponding to the vector $v$. [Recall that the 1-form $\hat g v$ acts on vectors $a$ as $(\hat g v) \circ a \equiv g(v, a)$.] So our candidate function $f$ would be defined by integrating this 1-form along a path leading from $p_0$ to $p$:
\[ f(p) = \int_{p_0}^{p} \hat g v. \]
If the function $f$ is well-defined then we will have $df = \hat g v$ as required. However, the function $f$ is well-defined only if the value $f(p)$ is independent of the path of integration, which is the case only if the integral along any closed loop $L$ vanishes, $\oint_L \hat g v = 0$. By the integration theorem (a generalization of Stokes's law),
\[ \oint_L \hat g v = \int_A d(\hat g v), \]
where $A$ is a surface enclosed by the loop $L$. Thus the two-form $d \hat g v$ must be everywhere equal to zero (this follows by considering infinitesimal loops around each point).

Let us now obtain a more convenient form of this condition. The 2-form $d \hat g v$ acts on arbitrary vectors $a, b$ according to Eq. (1.21),
\[ (d \hat g v)(a, b) = a \circ \hat g v(b) - b \circ \hat g v(a) - \hat g v([a, b]) \]
\[ = a \circ g(v, b) - b \circ g(v, a) - g(v, [a, b]) \]
\[ \equiv \nabla_a\, g(v, b) - \nabla_b\, g(v, a) - g(v, [a, b]) \]
\[ = g(\nabla_a v, b) - g(\nabla_b v, a). \]
The condition we are looking for is therefore
\[ g(\nabla_a v, b) - g(\nabla_b v, a) = 0. \tag{2.9} \]
Written in the index notation using the covariant components
of v, this condition becomes
we may say that f is the integral of , and that a vector field
is integrable when it is dual to an exact 1-form. v v = 0.
Qualitatively, the flow lines of an integrable vector field v
are good coordinate lines. An immediate example of an in- This condition is reminiscent of the curl (rotation), v, in
tegrable vector field is a coordinate basis vector field ej three-dimensional vector analysis, where a vector field is a
/xj , where j = 0, 1, 2, 3. The vector ej is orthogonal to the gradient if its curl vanishes. By analogy, the 2-form d g v is
surface of constant xj and is dual to the exact 1-form dxj . called the rotation of the vector field v.
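The index form of the integrability condition can be checked numerically. The sketch below is only an illustration under stated assumptions: it works in flat space with Cartesian coordinates (so that covariant derivatives reduce to partial derivatives), uses central finite differences, and the names `potential`, `exact_form`, and `twisted_form` are hypothetical test inputs that do not appear in the text. It computes the antisymmetric matrix $\partial_\mu v_\nu - \partial_\nu v_\mu$ for an exact 1-form $df$ and for the non-closed 1-form $y\,dx$.

```python
import math

def partial(fn, point, mu, h=1e-5):
    """Central-difference derivative of a scalar function along coordinate mu."""
    up = list(point)
    dn = list(point)
    up[mu] += h
    dn[mu] -= h
    return (fn(up) - fn(dn)) / (2 * h)

def rotation(omega, point):
    """Matrix with entries d_mu omega_nu - d_nu omega_mu (flat space assumed)."""
    n = len(point)
    comp = lambda nu: (lambda p: omega(p)[nu])
    return [[partial(comp(nu), point, mu) - partial(comp(mu), point, nu)
             for nu in range(n)] for mu in range(n)]

def potential(p):
    """Hypothetical potential f(t, x, y, z), chosen arbitrarily."""
    t, x, y, z = p
    return math.sin(t) * x + y * y * z

def exact_form(p):
    """Covariant components of df for the potential above (computed by hand)."""
    t, x, y, z = p
    return [math.cos(t) * x, math.sin(t), 2 * y * z, y * y]

def twisted_form(p):
    """The 1-form omega = y dx, which is not closed: d(omega) = dy ^ dx."""
    t, x, y, z = p
    return [0.0, y, 0.0, 0.0]

def max_abs(matrix):
    return max(abs(entry) for row in matrix for entry in row)

point = [0.3, -1.2, 0.7, 2.0]
rot_exact = rotation(exact_form, point)      # vanishes up to rounding
rot_twisted = rotation(twisted_form, point)  # has nonzero (x, y) entries
```

The rotation of the exact form vanishes to within finite-difference error, while $y\,dx$ has a nonzero component in the $(x, y)$ plane, so the vector field dual to it is not integrable.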


Another way to express the integrability condition is to define the $v$-dependent bilinear form

$$B_{(v)}(a, b) \equiv g(\nabla_a v, b), \tag{2.10}$$

and to require that $B_{(v)}$ be a symmetric bilinear form, $B_{(v)}(a, b) = B_{(v)}(b, a)$. Note that in general, the rotation 2-form $d\hat g v$ is twice the antisymmetric part of $B_{(v)}$.

Remark: We have already seen the bilinear form $B_{(v)}$ in Eq. (1.50) when we introduced Killing vectors. For a Killing vector $k$, the form $B_{(k)}$ is antisymmetric and thus equal to $\frac{1}{2} d\hat g k$; for an integrable vector field $v$, the form $B_{(v)}$ is symmetric. It follows that $B_{(v)} = 0$ for an integrable Killing vector field $v$; this means $\nabla_a v = 0$ for all $a$, i.e. the vector $v$ is constant everywhere. The existence of such a vector $v$ is a very special property of the spacetime, indicating a high degree of symmetry. The flow lines of $v$ are geometrically preferred coordinate lines. For instance, any vectors transported by the flow of $v$ are parallelly transported. If a timelike Killing vector is hypersurface orthogonal, which is a weaker property than integrability, the spacetime is called static because in that case there is a natural time coordinate $t$ such that $v$ is parallel to $\partial/\partial t$ and thus orthogonal to surfaces of constant $t$. A spacetime is called stationary if it admits a timelike (but not necessarily hypersurface-orthogonal) Killing vector.

Statement: An integrable vector field $v$ which is normalized, $g(v, v) = \mathrm{const}$, is always geodesic, $\nabla_v v = 0$.

Derivation: (See also the proof of the statement in Sec. 2.2.8 where forms are not used.) We use Eq. (2.9) to apply the 2-form $d\omega \equiv d(\hat g v)$ to $v$ and an arbitrary vector $a$:

$$(\iota_v d\omega) \circ a \equiv (d\omega)(v, a) = g(\nabla_v v, a) - \frac{1}{2}\, a \circ g(v, v) = g(\nabla_v v, a).$$

For an integrable field $v$, we have $d\omega = 0$ and so $g(\nabla_v v, a) = 0$ for all $a$; thus $\nabla_v v = 0$. Note that it does not follow from $\nabla_v v = 0$ that $d\omega = 0$, but only that $\iota_v d\omega = 0$. Thus a normalized geodesic field is not necessarily integrable.

Statement: The acceleration of a vector field $v$ is defined as the vector field $\nabla_v v$; if the field $v$ is timelike and $g(v, v) = 1$ then $\nabla_v v$ is interpreted as the acceleration of an observer moving along a worldline of $v$ with respect to the tangent geodesic line. We show that the acceleration of an integrable vector field is again integrable. (In this case, one recovers a limited version of the "Newtonian" picture, where the acceleration is determined as the gradient of a "gravitational potential," see Sec. 3.1.1.)

Derivation: By assumption, $v$ is integrable, so $B_{(v)}$ is symmetric and for an arbitrary vector $x$ we have

$$g(\nabla_v v, x) = B_{(v)}(v, x) = B_{(v)}(x, v) = g(\nabla_x v, v) = \frac{1}{2}\, x \circ g(v, v).$$

So the 1-form dual to the field $\nabla_v v$ is a gradient of the scalar function $\frac{1}{2} g(v, v)$; this is the definition of integrability.

2.2.3 Frobenius theorem

The Frobenius theorem gives a sufficient and necessary condition for a vector field $v$ to be hypersurface orthogonal. The condition is written in the language of differential forms as

$$\omega \wedge d\omega = 0, \quad \text{where } \omega \equiv \hat g v,$$

and $\hat g v$ is the 1-form dual to the vector $v$. Let us motivate the formulation of the Frobenius theorem before we begin to prove it.

A hypersurface orthogonal vector field $v$ is everywhere parallel to the contravariant gradient $\hat g^{-1}(df)$ of some function $f$ describing a family of hypersurfaces $f(p) = \mathrm{const}$. The condition of being parallel is most concisely expressed in the language of differential forms: Two 1-forms $\omega_1$ and $\omega_2$ are parallel if $\omega_1 \wedge \omega_2 = 0$. So we are prompted to reformulate the condition (2.8) in the following way. Denote by $\omega \equiv \hat g v$ the 1-form dual to $v$; then the vector $v$ is hypersurface orthogonal if there exists a function $f$ such that $\omega \wedge df = 0$.

However, it is inconvenient to involve an unknown function $f$ in the condition $\omega \wedge df = 0$. We can try to eliminate the unknown 1-form $df$ from this condition by computing

$$d(\omega \wedge df) = d\omega \wedge df = 0,$$

so $d\omega$ is also parallel to $df$ just as $\omega$ is. Heuristically we may say that both $\omega$ and $d\omega$ are "parallel" to an unknown 1-form $df$, so the exterior product of $\omega$ and $d\omega$ must vanish, $\omega \wedge d\omega = 0$. Thus we have eliminated the unknown 1-form $df$ and obtained a condition, $\omega \wedge d\omega = 0$, that conveniently involves only the given 1-form $\omega$ (equivalently, the given vector field $v$). Now we shall begin the proof of the Frobenius theorem.

We say that an $n$-form $\psi$ is parallel to a 1-form $\omega \neq 0$ if $\omega \wedge \psi = 0$. The following lemma states a useful property of parallel forms.

Lemma 1: Suppose $\psi$ is an $n$-form, $n \geq 1$, and $\omega \neq 0$ is a 1-form. If $\psi$ is parallel to $\omega$, there exists an $(n-1)$-form $\theta$ such that $\psi = \omega \wedge \theta$. (For $n = 1$ this $\theta$ will be a scalar function $\lambda$ and we will have $\psi = \lambda\omega$.)

Proof: The idea is to think that $\psi$ is "parallel" to $\omega$ and to obtain the components of $\psi$ that are "not parallel" to $\omega$. To obtain the components of a form, we can use a basis of vectors. Since $\omega \neq 0$, we can choose a basis $\{v_0, v_1, v_2, ...\}$ such that $\omega \circ v_0 = 1$ and $\omega \circ v_j = 0$ for $j \geq 1$. Then for any basis vectors $v_{s_j} \neq v_0$ we have

$$(\omega \wedge \psi)(v_0, v_{s_1}, v_{s_2}, ...) = \omega(v_0)\, \psi(v_{s_1}, v_{s_2}, ...) = \psi(v_{s_1}, v_{s_2}, ...) = 0.$$

Thus $\psi$ is zero on any set of basis vectors that does not contain $v_0$, but $\psi(v_0, v_{s_1}, v_{s_2}, ...)$ may be nonzero. So we define the $(n-1)$-form

$$\theta \equiv \iota_{v_0} \psi; \qquad \theta(x, y, ...) \equiv \psi(v_0, x, y, ...).$$

It is easy to show that $\psi = \omega \wedge \theta$ on any set of basis vectors $v_j$ and thus $\psi = \omega \wedge \theta$ on any vectors. Note that the form $\theta$ is defined up to a multiple of $\omega$, because a choice of $v_0'$ instead of $v_0$ will yield $\omega \circ (v_0' - v_0) = 0$ and thus

$$\theta' = \theta + \iota_{v_0' - v_0} \psi = \theta + \iota_{v_0' - v_0} (\omega \wedge \theta) = \theta - \omega \wedge \iota_{v_0' - v_0} \theta.$$

An immediate corollary is: If two forms $\psi_1$, $\psi_2$ are both parallel to the same 1-form $\omega$ then $\psi_1 \wedge \psi_2 = 0$.

By Lemma 1, it follows from $df \wedge d\omega = 0$ that $d\omega = df \wedge \theta$ for some 1-form $\theta$. Thus the forms $\omega$ and $d\omega$ are both parallel to


some unknown $df$ and then $\omega \wedge d\omega = 0$. This is one direction of the Frobenius theorem.

It remains to prove the converse statement, namely that if $\omega \wedge d\omega = 0$ then there exists a surface which is everywhere orthogonal to $v$.

We can try to construct such a surface by choosing some vector fields $t$ orthogonal to $v$ and following the flow lines of all such $t$. It is sufficient to choose a basis of vector fields $t_1, t_2, t_3$ orthogonal to $v$. However, the flow lines of these vector fields will not necessarily form a single surface. For instance, we may follow a flow line of a vector $t_1$ and then a line of $t_2$, and we shall not necessarily arrive at the same point as when we are following first $t_2$ and then $t_1$. For infinitesimal distances, the discrepancy is in the direction of the commutator $[t_1, t_2]$. This direction must be orthogonal to $v$ if all the flow lines are to lie within a single surface. Therefore, surfaces orthogonal to $v$ will exist if for any two vector fields $t_1, t_2$ orthogonal to $v$, the commutator $[t_1, t_2]$ is again orthogonal to $v$. We shall now see that this is indeed the case if $\omega \wedge d\omega = 0$. (This explanation needs to be made more clear and illustrated by a figure.)

Note that a tangent vector $t$ is orthogonal to $v$ if $\omega(t) = 0$, and that $d\omega(x, y)$ involves the application of $\omega$ to the commutator $[x, y]$, according to Eq. (1.21). This motivates us to formulate and prove the following statement.

Lemma 2: If vector fields $x$ and $y$ are everywhere orthogonal to $v$, and if $\omega \wedge d\omega = 0$, where $\omega$ is the 1-form dual to $v$, then the vector field $[x, y]$ is also orthogonal to $v$.

Proof: The conditions on $x$ and $y$ can be written as $\omega(x) = \omega(y) = 0$. By Lemma 1, $d\omega = \omega \wedge \theta$ for some 1-form $\theta$. Then we have

$$(d\omega)(x, y) = (\omega \wedge \theta)(x, y) = \omega(x)\theta(y) - \omega(y)\theta(x) = 0.$$

On the other hand, from Eq. (1.21) we find

$$(d\omega)(x, y) = x \circ \omega(y) - y \circ \omega(x) - \omega([x, y]) = -\omega([x, y]).$$

Thus $\omega([x, y]) = 0$.

It follows from Lemma 2 that vector fields orthogonal to $v$ will have surface-forming flow lines if $\omega \wedge d\omega = 0$. This concludes the proof of the Frobenius theorem.

Here is another useful property of hypersurface-orthogonal geodesic fields.

Statement 1: A hypersurface-orthogonal, normalized, spacelike or timelike, geodesic field $v$ is always integrable.

Proof: For a normalized and geodesic field $v$, we have $\nabla_v v = 0$ and $g(v, v) = \mathrm{const}$, so Eq. (2.9) gives

$$(\iota_v d\omega) \circ x \equiv (d\omega)(v, x) = g(\nabla_v v, x) - g(\nabla_x v, v) = 0. \tag{2.11}$$

We say that the 2-form $d\omega$ is transverse to $v$. The proof can now proceed in one of two ways. If we consider the 2-form $\iota_v(\omega \wedge d\omega)$ which vanishes since $\omega \wedge d\omega = 0$, we find

$$0 = \iota_v(\omega \wedge d\omega) = (\iota_v \omega)\, d\omega - \omega \wedge (\iota_v d\omega) = \omega(v)\, d\omega.$$

Since the field $v$ is by assumption not null, $\omega(v) \equiv g(v, v) \neq 0$, and it follows that $d\omega = 0$, i.e. the field $v$ is integrable. Another, more constructive way to prove the statement is the following. Since $v$ is hypersurface-orthogonal, there exists a family of surfaces whose normal vectors are everywhere parallel to $v$. This family of surfaces can be described by a function $\phi$, and the vector field normal to the surfaces is $\hat g^{-1} d\phi$. Since this field is parallel to $v$, there exists a scalar function $\lambda \neq 0$ such that $v = \lambda\, \hat g^{-1} d\phi$ or equivalently $\omega = \lambda\, d\phi$ and $d\omega = d\lambda \wedge d\phi$. Then Eq. (2.11) gives

$$0 = \iota_v d\omega = d\lambda(v)\, d\phi - d\phi(v)\, d\lambda \equiv (v \circ \lambda)\, d\phi - \lambda^{-1} \omega(v)\, d\lambda.$$

Since $\omega(v) \neq 0$ and $\lambda \neq 0$, we can express

$$d\lambda = \lambda\, \frac{v \circ \lambda}{\omega(v)}\, d\phi,$$

and it follows that $d\omega = d\lambda \wedge d\phi = 0$.

It is interesting to see where the argument fails for null fields $v$: We cannot divide by $\omega(v)$ because $\omega(v) = g(v, v) = 0$. Indeed, we can still find $\lambda$ and $\phi$ such that $\omega = \lambda\, d\phi$, but the transversality $\iota_v d\omega = 0$ yields only $v \circ \lambda = 0$. This condition can be satisfied by a function $\lambda$ whose gradients are orthogonal to the null vector $v$, in other words, $g^{-1}(d\lambda, \omega) = 0$. However, the gradient $d\lambda$ does not need to be parallel to $\omega$, and thus we might have $d\omega = d\lambda \wedge d\phi \neq 0$. An explicit example can be easily found in Minkowski spacetime: Choose $\phi = t - x$ and $\lambda = y$, such that $d\lambda$ is orthogonal to $d\phi$. Then the vector field dual to $\lambda\, d\phi$ is $v = y(\partial_t + \partial_x)$, and we have $\nabla_v v = 0$. So $v$ is geodesic, null, and hypersurface orthogonal (to surfaces of constant $x - t$), but non-integrable since it has nonvanishing rotation $d\omega = dy \wedge (dt - dx) \neq 0$.

Statement: A hypersurface-orthogonal field (even normalized and even timelike) does not necessarily have an integrable acceleration $\nabla_v v$.

Hint: It is not necessarily true that $d\,\iota_v d\omega = 0$ even if $\omega \wedge d\omega = 0$.

Derivation: Explicit counterexamples are the 1-forms $\omega = (x + y)\,dx$, $\omega = x\,(dt - dx)$, $\omega = \cosh x\, dt + \sinh x\, dx$ in the Minkowski spacetime. These 1-forms are dual to hypersurface-orthogonal fields since $\omega \wedge d\omega = 0$, but they do not have integrable acceleration.

Statement: A geodesic vector field $v$ is hypersurface-orthogonal everywhere if there exists a single 3-surface $\Sigma$ to which $v$ is orthogonal. The same statement holds also for Killing fields $k$ instead of geodesic vector fields.

Derivation: Recall that $v \circ g(v, v) = 0$, thus $g(v, v)$ is constant along the lines of $v$, and so the affine tangent vector $v$ can be rescaled, $v \to \tilde v = \lambda v$, where $\lambda \neq 0$ is a scalar function such that $v \circ \lambda = 0$ and $g(\tilde v, \tilde v) = \mathrm{const}$ everywhere. The vector field $\tilde v$ is normalized, geodesic, and orthogonal to the 3-surface $\Sigma$. Since $\tilde v$ is everywhere parallel to $v$, it is sufficient to show that $\tilde v$ is hypersurface orthogonal everywhere. Let $\omega$ be the 1-form dual to $\tilde v$, $\omega = \hat g \tilde v$; then $\omega \wedge d\omega = 0$ on $\Sigma$, and we need to prove that $\omega \wedge d\omega$ remains zero away from $\Sigma$. We shall now show that $\mathcal{L}_{\tilde v}(\omega \wedge d\omega) = 0$. This would mean that the Lie-propagated 3-form $\omega \wedge d\omega$ remains zero along the flow lines of $\tilde v$ when applied to a Lie-propagated basis of connecting vectors to $\tilde v$; that property is equivalent to having $\omega \wedge d\omega = 0$ everywhere. We compute

$$\begin{aligned}
(\mathcal{L}_{\tilde v}\, \omega) \circ x &= \mathcal{L}_{\tilde v}(\omega \circ x) - \omega \circ \mathcal{L}_{\tilde v}\, x \\
&= \tilde v \circ g(\tilde v, x) - g(\tilde v, [\tilde v, x]) = g(\tilde v, \nabla_x \tilde v) \\
&= \frac{1}{2}\, x \circ g(\tilde v, \tilde v) = 0,
\end{aligned}$$


since $g(\tilde v, \tilde v) = \mathrm{const}$. Recall that $\mathcal{L}$ commutes with $d$ on any $n$-forms, hence

$$\mathcal{L}_{\tilde v}\, d\omega = d\, \mathcal{L}_{\tilde v}\, \omega = 0,$$

and then we have

$$\mathcal{L}_{\tilde v}(\omega \wedge d\omega) = (\mathcal{L}_{\tilde v}\, \omega) \wedge d\omega + \omega \wedge \mathcal{L}_{\tilde v}\, d\omega = 0.$$

Alternatively, here is a geometric argument. We can construct a family of 3-surfaces $\Sigma(\tau)$ which are surfaces of constant affine parameter $\tau$ measured along flow lines of $\tilde v$, where we set $\tau = 0$ on $\Sigma$. We can choose a basis of connecting vectors $c_j$, $j = 1, 2, 3$, such that $[c_j, \tilde v] = 0$ and $c_j$ are tangent to $\Sigma$ and thus orthogonal to $\tilde v$ on $\Sigma$. These connecting vectors $c_j$ will remain orthogonal to $\tilde v$ for all $\tau$ (see Lemma 1 in Sec. 2.2.6, where we need to replace $n$ by $\tilde v$). Therefore, the 3-surfaces $\Sigma(\tau)$ are everywhere orthogonal to $\tilde v$ and thus also to $v$.

Now consider the case of a Killing field $k$ and define $\omega \equiv \hat g k$. It is sufficient to prove that $\mathcal{L}_k(\omega \wedge d\omega) = 0$. Since $k$ is a Killing vector, we have $\mathcal{L}_k g = 0$, so

$$\mathcal{L}_k\, \omega = \mathcal{L}_k\, \hat g k = \hat g\, \mathcal{L}_k k = 0.$$

We again recall that $\mathcal{L}$ commutes with $d$, thus

$$\mathcal{L}_k(\omega \wedge d\omega) = \omega \wedge \mathcal{L}_k\, d\omega = \omega \wedge d\, \mathcal{L}_k\, \omega = 0.$$

In summary, we have the following relationships between integrability, hypersurface orthogonality, and metric properties of a vector field. The condition for hypersurface orthogonality, $\omega \wedge d\omega = 0$, is weaker than the condition for integrability, $d\omega = 0$. An integrable field $v = \hat g^{-1}(df)$ is always hypersurface orthogonal (to the hypersurfaces of constant $f$). An integrable field $v$ which is normalized, $g(v, v) = \mathrm{const}$, is always geodesic. A hypersurface orthogonal, normalized, geodesic field is integrable unless it is null (a geodesic field can always be rescaled to make it normalized). The acceleration $\nabla_v v$ of an integrable field $v$ is again integrable.

Self-test question: Do we need separate considerations for surfaces whose tangent space is spanned by (a) a null vector, a timelike vector, and a spacelike vector, (b) two null vectors and a spacelike vector, (c) three null vectors?

Answer: No. Each of the described surfaces is in fact timelike, i.e. orthogonal to a spacelike vector. Their tangent spaces are also spanned by one timelike and two spacelike vectors.

2.2.4 Null surfaces

A null surface is a 3-surface whose normal vector $n$ is everywhere null and nonzero, $g(n, n) = 0$, $n \neq 0$. We shall now study the geometry of null surfaces since they have certain special and useful properties.

One unusual fact is that the normal vector is at the same time tangent to the null surface since $g(n, n) = 0$. The tangent space to a null surface is thus spanned by $n$ and two other basis vectors. The latter two vectors must be orthogonal to $n$, and thus both must be spacelike. In fact, all the tangent vectors that are not parallel to $n$ are spacelike (see the Statement 2.1.0.1). Thus, a null surface is spanned by the null direction $n$ and two spacelike directions orthogonal to $n$.

Since the tangent space contains its normal vector, the induced metric in the tangent space is degenerate since the vector $n$ is orthogonal to every vector in the tangent space. This leads to an ambiguity of the induced Levi-Civita connection on the null surface: The formula (1.43), which is the only constraint on the Levi-Civita connection, does not define the vector $\nabla_x y$ uniquely through its scalar product with an arbitrary vector $z$. Thus there are many possible Levi-Civita connections on a null surface.

Another peculiarity is that there is no unique projector onto the tangent space (see Sec. 2.1.1). A satisfactory projector (2.2) can be found only after choosing another null vector field $l$ normalized by $g(n, l) = 1$, and then the image of the projection is a two-dimensional subspace within the tangent space.

2.2.5 Examples of null surfaces

The simplest example of a null surface is the null cone of lightrays emitted from a point. Let us describe this surface explicitly in a flat Minkowski spacetime. The lightcone emitted at the origin is specified by the equation

$$t - r \equiv t - \sqrt{x^2 + y^2 + z^2} = 0. \tag{2.12}$$

The vector field $n$ normal to this surface is

$$n = \frac{\partial}{\partial t} + \frac{x}{r}\frac{\partial}{\partial x} + \frac{y}{r}\frac{\partial}{\partial y} + \frac{z}{r}\frac{\partial}{\partial z}.$$

It is easy to check that $g(n, n) = 0$. The tangent space to the lightcone at a point $(t, x, y, z)$ is spanned by the null vector $n$ and the two spacelike vectors $\partial/\partial\phi$ and $\partial/\partial\theta$, which point along the directions

$$\frac{1}{\sqrt{x^2 + y^2}} \left( y \frac{\partial}{\partial x} - x \frac{\partial}{\partial y} \right), \qquad \frac{z}{r} \left( x \frac{\partial}{\partial x} + y \frac{\partial}{\partial y} \right) - \frac{x^2 + y^2}{r} \frac{\partial}{\partial z}.$$

Another example of a null surface is the horizon of a nonrotating, spherically symmetric black hole (Schwarzschild spacetime). The metric is given by Eq. (1.36). It is well known that the black hole horizon is the surface $r - 2m = 0$. However, it is not easy to analyze vectors at $r = 2m$ since the Schwarzschild coordinates are singular there. To show that the horizon surface is null, we note that its tangent space is spanned by the two spacelike vectors $\partial/\partial\theta$, $\partial/\partial\phi$, and also by the vector $\partial/\partial t$ which is null at $r = 2m$ (but not at other values of $r$). Since $\partial/\partial t$ is null, it is also the normal vector to the surface and hence the surface is null.

2.2.6 Lightcones are null surfaces

We can consider the surface formed by all the null geodesics emitted from a given initial spacetime point $p_0$ in all directions. This surface is called the lightcone emitted at $p_0$.

Statement: The lightcone emitted at a point $p_0$ is always a null surface, in an arbitrary curved spacetime.

We begin the proof by a preliminary calculation.

Lemma 1: If $n$ is normalized, $g(n, n) = \mathrm{const}$, and geodesic, $\nabla_n n = 0$, and if $c$ is a connecting vector for $n$, then $g(n, c)$ is constant along the flow lines of $n$.

Proof: Since $\nabla_c n = \nabla_n c$ and $\nabla_n n = 0$, we have

$$n \circ g(n, c) = g(n, \nabla_n c) = g(n, \nabla_c n) = \frac{1}{2}\, c \circ g(n, n) = 0.$$

Lemma 2: The tangent space to a lightcone at any point $p$ is spanned by one null vector and two spacelike vectors that are both orthogonal to the null vector.
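The flat-space case of Lemma 2, i.e. the Minkowski lightcone of Sec. 2.2.5, can be checked numerically. The sketch below is an illustration only. It assumes the metric signature $(+,-,-,-)$ (consistent with the normalization $g(v, v) = 1$ used above for timelike vectors) and Cartesian components ordered as $(t, x, y, z)$; the function names `minkowski` and `lightcone_frame` are hypothetical.

```python
import math

def minkowski(a, b):
    """Scalar product g(a, b) in flat spacetime, signature (+,-,-,-) assumed."""
    return a[0] * b[0] - a[1] * b[1] - a[2] * b[2] - a[3] * b[3]

def lightcone_frame(x, y, z):
    """Normal vector n and two tangent directions of the lightcone t = r."""
    r = math.sqrt(x * x + y * y + z * z)
    rho2 = x * x + y * y
    rho = math.sqrt(rho2)
    n = (1.0, x / r, y / r, z / r)
    # Direction of d/d(phi): (y d/dx - x d/dy) / sqrt(x^2 + y^2)
    e_phi = (0.0, y / rho, -x / rho, 0.0)
    # Direction of d/d(theta): (z/r)(x d/dx + y d/dy) - ((x^2 + y^2)/r) d/dz
    e_theta = (0.0, z * x / r, z * y / r, -rho2 / r)
    return n, e_phi, e_theta

n, e_phi, e_theta = lightcone_frame(0.4, -1.1, 2.3)

norm_n = minkowski(n, n)                  # zero: n is null
n_dot_phi = minkowski(n, e_phi)           # zero: e_phi is orthogonal to n
n_dot_theta = minkowski(n, e_theta)       # zero: e_theta is orthogonal to n
norm_phi = minkowski(e_phi, e_phi)        # negative: spacelike
norm_theta = minkowski(e_theta, e_theta)  # negative: spacelike
```

With this signature, a negative value of $g(e, e)$ means that $e$ is spacelike, so the frame consists of one null vector and two spacelike vectors orthogonal to it, as Lemma 2 asserts.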


Proof: We first note that in the immediate vicinity of the initial point $p_0$ the lightcone can be viewed, in suitable local coordinates, as a portion of the Minkowski lightcone (2.12). Thus the tangent space in the vicinity of $p_0$ is indeed spanned by one null vector and two spacelike vectors orthogonal to it. We now need to extend this property away from the initial point $p_0$.

By assumption, the lightcone surface is built entirely from the congruence of lightrays emitted at $p_0$. Translating this assumption into the mathematical language, we say that every point on the surface lies on some lightray from the congruence, and thus the surface can be parametrically described by a function

$$\gamma(\tau; s_1, s_2) : \mathbb{R} \times \mathbb{R} \times \mathbb{R} \to M$$

such that $\gamma(\tau; s_1, s_2)$ is a lightray for fixed $s_1$ and $s_2$. (Note that there is a two-dimensional sphere of initial directions for emitting the lightrays, hence two parameters $s_{1,2}$.) In particular, $n = \partial\gamma/\partial\tau$ is the tangent vector field which is null, and the vectors $u_1 \equiv \partial\gamma/\partial s_1$, $u_2 \equiv \partial\gamma/\partial s_2$ are everywhere linearly independent, connecting vectors for $n$ (see the Statement 13). Thus we have the properties

$$g(n, n) = 0, \qquad \nabla_n n = 0, \qquad [n, u_1] = [n, u_2] = 0.$$

By construction, the connecting vectors $u_{1,2}$ are always tangent to the surface, and so is $n$. Thus the vectors $(n, u_1, u_2)$ are always a basis in the tangent space to the surface. The vector $n$ is everywhere null by assumption, so it remains to show that $u_1$ and $u_2$ remain spacelike throughout the surface, at any point $p$.

The Minkowski description of the lightcone holds for an infinitesimal vicinity of $p_0$, and thus we have $g(n, u_1) = g(n, u_2) = 0$ near $p_0$. Then it follows from Lemma 1 that $g(n, u_1) = g(n, u_2) = 0$ everywhere along the flow lines of $n$. Vectors orthogonal to a null vector and not parallel to it must be spacelike (see Statement 2.1.0.1). Since each point $p$ on the surface can be reached by some flow line of $n$ starting from $p_0$, the vectors $u_{1,2}$ remain spacelike and orthogonal to $n$ everywhere on the surface.

Corollary: The null tangent vector $n$ at a point $p$ is also the normal vector to the surface.

Proof: By Lemma 2, the tangent space at $p$ is spanned by $n$ and two spacelike connecting vectors $u_1$, $u_2$ that remain orthogonal to $n$. Thus, $n$ is orthogonal to every vector in the entire 3-dimensional tangent space, which means that $n$ is the normal vector to the surface.

We have proved that a lightcone is a null surface. A lightcone is by construction a surface built from lines (the individual lightrays). These lines are called the null generators of the surface. Now we shall use the formalism of null functions to prove the converse statement: Any null surface is always built from null generators (even though not every null surface is a lightcone emitted from one point).

2.2.7 Null functions

A null surface can be defined through an equation $\phi(p) = \phi_0$, and then the function $\phi(p)$ is called a null function. More generally, a null function $\phi(p)$ on the spacetime is any function such that $d\phi$ is a null covector, $g^{-1}(d\phi, d\phi) = 0$. Immediate examples are the functions

$$\phi(t, x, y, z) = t - r \equiv t - \sqrt{x^2 + y^2 + z^2}, \quad \text{or} \quad \phi(t, x, y, z) = t - x,$$

in the Minkowski spacetime (verify this!). A null function $\phi(p)$ defines a null hypersurface $\phi(p) = \phi_0$ for each value of $\phi_0$.

A null function can be physically interpreted as a "retarded time" carried by lightrays emitted by some sources: If a lightray reaches an event $p$, it was emitted at the time $\phi(p)$ at its source. (Here we do not assume that all the rays are emitted by one and the same source.) To remember this more easily, recall that "time stands still" for an observer moving at lightspeed, so a lightray "preserves" its emission time and "shows" it at a point $p$ as the value of $\phi(p)$.

A peculiar property of null functions $\phi(p)$ is that the normal vector $n = \hat g^{-1}(d\phi)$ is tangent to the surface of constant $\phi$. Thus a null surface $\phi(p) = \phi_0$ naturally comes with a congruence of null lines, which are the flow lines of $n$ within the surface and the null generators of the surface.

2.2.8 Null functions generate null geodesics

The usefulness of null functions comes from the fact that any null surface consists not just of some null lines, but actually of null geodesics (see the following statement). These geodesics are the null generators of the surface. Since any null surface is a surface $\phi(p) = \phi_0$ for some null function $\phi$, it will follow that null functions provide a complete description of any null surface consisting of lightrays.

Statement: If $\phi(p)$ is a null function with the normal vector $n$, the flow lines of $n$ are null geodesics. More generally, if a null curve $\gamma(\tau)$ is contained within a surface $\phi(p) = \mathrm{const}$ then $\gamma(\tau)$ is a null geodesic.

Proof: A stronger statement was proved in Sec. 2.2.2: If $n$ is integrable and $g(n, n) = \mathrm{const}$, where the constant is not necessarily zero, then $\nabla_n n = 0$. We shall give a direct derivation that does not use forms. To show that $\nabla_n n = 0$, we shall prove that $g(x, \nabla_n n) = 0$ for an arbitrary vector $x$. We compute

$$\begin{aligned}
g(x, \nabla_n n) &= n \circ g(x, n) - g(\nabla_n x, n) \\
&= n \circ g(x, n) - g(\nabla_x n + [n, x], n).
\end{aligned}$$

(We have used torsion-freeness and compatibility of $\nabla$ with the metric.) Since by assumption $g(n, n) = \mathrm{const}$ and $n = \hat g^{-1} d\phi$, we have $g(n, \nabla_x n) = 0$ and $g(x, n) = x \circ \phi$ [see also Eq. (2.7)]. Thus we find

$$\begin{aligned}
g(x, \nabla_n n) &= n \circ g(x, n) - g([n, x], n) \\
&= n \circ (x \circ \phi) - [n, x] \circ \phi = x \circ (n \circ \phi) \\
&= x \circ g(n, n) = 0.
\end{aligned}$$

It follows that the flow lines of $n$ are geodesics for integrable null $n$. A tangent space to a null surface contains only one null direction, and that direction is parallel to $n$. So any null curve contained within the surface must be a flow line of $n$ and hence a geodesic.

2.2.9 Every lightray comes from null functions

The following statement demonstrates that any null geodesic can be obtained from a suitable null function.


Statement: For any null geodesic curve $\gamma(\tau)$ there exists a null function $\phi(p)$ such that $\dot\gamma = \hat g^{-1}(d\phi)$ on the curve, so that $\gamma(\tau)$ is a null generator of $\phi(p) = 0$.

Proof: There are of course many possible null functions satisfying this condition. We shall construct one such null function $\phi(p)$ by choosing an arbitrary timelike curve $\sigma$ intersecting $\gamma$ at a point $p_0$. This curve will be the worldline of a light source. Suppose that the source emits lightrays in all directions from each point of the curve $\sigma$, and that $\gamma(\tau)$ is one of these lightrays emitted from $p_0$. If $\eta$ is a suitable parameter on the curve $\sigma(\eta)$, such that the tangent vector is normalized, $g(\dot\sigma, \dot\sigma) = 1$, then $\eta$ is the observer's proper time measured along the curve $\sigma(\eta)$. We may suppose that the point $p_0$ corresponds to $\eta = 0$. Then we define the desired null function $\phi(p)$ as the "retarded time" carried by the lightrays from the source. More precisely, we define $\phi(p) = \eta$ if the point $p$ is intersected by a lightray emitted from the point $\sigma(\eta)$. For instance, we then have $\phi(p_0) = 0$. The function $\phi(p)$ is well-defined in some neighborhood of the curve $\sigma$; more precisely, in a neighborhood where the lightcones are "well-behaved," i.e. do not form caustics and fill the entire space around the curve $\sigma$, leaving no gaps. By construction, the function $\phi(p)$ is constant along $\gamma(\tau)$ and thus the curve $\gamma(\tau)$ lies within the surface $\phi(p) = 0$, and the null vector $\dot\gamma(\tau)$ is everywhere tangent to the surface $\phi(p) = 0$. It remains to show that $\phi(p)$ is a null function; this is so because $\phi(p) = \mathrm{const}$ is by construction a lightcone surface, and we have already proved that any lightcone surface is null.

2.2.10 Conformal invariance

Under a conformal transformation of the metric, $g \to \tilde g = e^{2\lambda} g$, null vectors obviously remain null. Therefore null functions remain null under a conformal transformation (i.e. they are conformally invariant).

Also, shapes of null geodesics are conformally invariant: a null geodesic line $\gamma(\tau)$ remains a geodesic after a conformal transformation of the metric. This statement is now very easy to prove: a null geodesic is always a flow line of a gradient of some null function $\phi(p)$, the gradient operation $d$ is independent of the metric, and so the property of being a null function is conformally invariant. It follows that the normal vector $n \equiv \hat g^{-1}(d\phi)$ will change to $\tilde n = e^{-2\lambda} n$, and thus the flow lines of $\tilde n$ are the same as those of $n$. The affine parameter $\tau$ must of course be changed, but the shape of the lines will remain the same.

2.3 Raychaudhuri equation

We now consider a geodesic congruence with a given tangent vector field. The distortion of the congruence along its own flow lines will be determined by the geometry of the given spacetime. The distortion satisfies a purely kinematical relation called the Raychaudhuri equation. We shall derive separate versions of this equation for timelike and for null geodesics. We begin by introducing some geometric quantities characterizing the distortion of geodesic congruences.

2.3.1 Distortion tensor

In this section we consider a normalized, geodesic vector field $u$ which may be either timelike, spacelike, or null. We have seen in Sec. 1.6.9 that the divergence $\mathrm{div}\, u$ of a vector field $u$ measures the relative rate of change of infinitesimal volume transported by the flow of $u$. Consider a small region of space around a point $p$, and imagine that the entire region moves with the flow (we call this a comoving region). This region can be visualized as consisting of a number of small, noninteracting particles, each moving along its flow line. The region will be generally distorted by the flow, so that not only its volume but also its shape will change. The distortion of the shape is measured by tensors called the shear and the rotation, which we shall now introduce.

The shape of the comoving region is naturally characterized by connecting vectors $c$. For convenience, we can choose $c$ to be orthogonal to $u$ at the initial point $p$. It then follows from Lemma 1 in Sec. 2.2.6 that $c$ remains orthogonal to $u$ along the flow lines of $u$.

There would be no distortion of the region if every connecting vector $c$ were parallelly transported by the flow, i.e. if $\nabla_u c = 0$ for every connecting $c$. Of course, in general we expect that $\nabla_u c \neq 0$ at least for some $c$. Thus, the distortion of the volume around the point $p$ during an infinitesimal time $\delta\tau$ will be described by the derivative $\nabla_u c$ along the flow. Since

$$\nabla_u c = \nabla_c u,$$

the quantity $\nabla_u c$ is a linear function of $c$ that can be described by a $u$-dependent transformation $B(u)$, which is a (1,1) rank tensor defined by

$$\nabla_u c = \nabla_c u \equiv B(u)\, c.$$

In components,

$$B^\alpha{}_\beta = \nabla_\beta u^\alpha \equiv u^\alpha{}_{;\beta}.$$

The transformation $B(u)$ can be also written as

$$B(u) = \nabla_u - \mathcal{L}_u,$$

where it is implied that both sides act on a vector field. We call $B(u)$ the distortion tensor since it quantifies the deviation of the shape of a small comoving region from being parallelly transported.

The divergence of $u$ is equal to the trace of $B(u)$:

$$\mathrm{div}\, u \equiv \nabla_\alpha u^\alpha = \mathrm{Tr}\, B(u).$$

Even if the divergence is zero, but $B \neq 0$, the shape of the region will be changed by the flow. To analyze the information provided by $B(u)$, it is convenient to consider the bilinear form

$$B_{(u)}(x, y) \equiv g(B(u)\, x, y) = g(\nabla_x u, y).$$

We have seen in Sec. 2.2.2 that $B_{(u)}$ is a symmetric bilinear form iff $u$ is an integrable vector field.

Practice problem: Compute the divergence of the null vector field corresponding to a lightcone $t = r$ in Minkowski spacetime.

Solution: The vector field is

$$u = \partial_t + \frac{1}{r} \left( x\, \partial_x + y\, \partial_y + z\, \partial_z \right);$$

using the flat connection, we find $\mathrm{div}\, u = 2 r^{-1}$.
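The result $\mathrm{div}\, u = 2/r$ of the practice problem can be cross-checked numerically. The following is a minimal sketch, assuming Cartesian coordinates $(t, x, y, z)$ in Minkowski spacetime, where the flat connection reduces the divergence to a sum of partial derivatives $\partial_\alpha u^\alpha$; the helper names `u_field` and `divergence` are hypothetical, and the derivatives are approximated by central finite differences.

```python
import math

def u_field(p):
    """Components of u = d/dt + (x d/dx + y d/dy + z d/dz) / r."""
    t, x, y, z = p
    r = math.sqrt(x * x + y * y + z * z)
    return [1.0, x / r, y / r, z / r]

def divergence(field, point, h=1e-5):
    """Sum over alpha of d(u^alpha)/d(x^alpha), by central differences."""
    total = 0.0
    for alpha in range(len(point)):
        up = list(point)
        dn = list(point)
        up[alpha] += h
        dn[alpha] -= h
        total += (field(up)[alpha] - field(dn)[alpha]) / (2 * h)
    return total

# Sample point with r = sqrt(1 + 4 + 4) = 3.
point = [0.5, 1.0, -2.0, 2.0]
r = math.sqrt(sum(c * c for c in point[1:]))

div_u = divergence(u_field, point)
expected = 2.0 / r
```

At the sample point the finite-difference value `div_u` agrees with `expected` to within the discretization error. The time component of $u$ is constant and does not contribute, so the whole divergence comes from the radial unit vector field.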


2.3.2 Rotation

We may decompose $B_{(u)}$ into symmetric and antisymmetric parts,
\[ B_{(u)}(x,y) = \tilde B_{(u)}(x,y) + r_{(u)}(x,y), \tag{2.13} \]
\[ \tilde B_{(u)}(x,y) \equiv \frac{1}{2}\left[B_{(u)}(x,y) + B_{(u)}(y,x)\right], \]
\[ r_{(u)}(x,y) \equiv \frac{1}{2}\left[B_{(u)}(x,y) - B_{(u)}(y,x)\right]. \]
The 2-form $r_{(u)}$ is called the rotation (or the twist, or the vorticity) of the field $u$. It is easy to see from Eq. (2.9) that
\[ r_{(u)} = \frac{1}{2}\,d(gu) \equiv \frac{1}{2}\,d\omega, \]
where $\omega$ is the 1-form dual to the vector field $u$. The rotation tensor $r_{(u)}$ is thus the analog of the familiar three-dimensional curl (rotation) $\nabla\times\mathbf{v}$ of a vector field. A nonzero rotation means that the vector field is not a potential field; in our terminology, a nonzero $r_{(u)}$ means that the field $u$ is not integrable.

To visualize the shape distortion represented by a nonzero $r_{(u)}$, consider a flat (Minkowski) spacetime in which $u$ is timelike and $r_{(u)}$ is a constant tensor, represented by a constant antisymmetric matrix $r_{\mu\nu}$ in an orthogonal basis. Denote by $\hat r_{(u)}$ the transformation corresponding to the 2-form $r_{(u)}$, so that for all vectors $x, y$ we have
\[ g(\hat r_{(u)}x, y) = r_{(u)}(x,y). \]
For simplicity, suppose that the symmetric part of the distortion tensor vanishes, $\tilde B_{(u)} = 0$. Then $B_{(u)} = r_{(u)}$ and the connecting vector $c$ satisfies the equation
\[ \frac{d}{d\tau}c = B_{(u)}c = \hat r_{(u)}c, \]
with the solution
\[ c(\tau) = \exp\left(\tau\,\hat r_{(u)}\right)c_0. \]
Since $\hat r_{(u)}$ is transverse to a timelike direction, it is represented (in an orthogonal basis) by an antisymmetric $3\times 3$ matrix acting in spacelike directions. The exponential of such a matrix is a rotation matrix, thus $c(\tau)$ is related to the initial value $c_0$ by a rotation. We find that all the vectors $c$ are rotated by the same fixed angle, which is proportional to $\tau$. It follows that the entire shape of the initial region is rigidly rotated around an axis. This motivates the name "rotation" for the tensor $r_{(u)}$.

2.3.3 Introducing Raychaudhuri equation

The Raychaudhuri equation describes the change of the divergence of a geodesic vector field $u$ along its own flow lines. We straightforwardly find
\[ \nabla_u(\mathrm{div}\,u) = \mathrm{Tr}_{(x,y)}\,(\nabla_u B_{(u)})(x,y); \]
\[ (\nabla_u B_{(u)})(x,y) = u\circ B_{(u)}(x,y) - B_{(u)}(\nabla_u x, y) - B_{(u)}(x, \nabla_u y) \]
\[ = u\circ g(\nabla_x u, y) - g(\nabla_{\nabla_u x}u, y) - g(\nabla_x u, \nabla_u y). \]
Using the metricity condition and the definition of the Riemann tensor, we rewrite the first and the third terms as
\[ u\circ g(\nabla_x u, y) - g(\nabla_x u, \nabla_u y) = g(\nabla_u\nabla_x u, y) \]
\[ = R(u,x,u,y) + g(\nabla_x\nabla_u u, y) + g(\nabla_{[u,x]}u, y) \]
\[ = R(u,x,u,y) + g(\nabla_{[u,x]}u, y); \]
finally, since $[u,x] = \nabla_u x - \nabla_x u$,
\[ (\nabla_u B_{(u)})(x,y) = R(u,x,u,y) + g(\nabla_{[u,x]}u, y) - g(\nabla_{\nabla_u x}u, y) \]
\[ = R(u,x,u,y) - g(\nabla_{\nabla_x u}u, y) \]
\[ = R(u,x,u,y) - g(B_{(u)}B_{(u)}x, y). \]
The trace of the last expression with respect to $x, y$ yields
\[ \nabla_u(\mathrm{div}\,u) = -\mathrm{Ric}(u,u) - \mathrm{Tr}\left(B_{(u)}B_{(u)}\right), \tag{2.14} \]
where the last term is the trace of the square of the linear transformation $B_{(u)}$. This is the initial form of the Raychaudhuri equation which we shall later analyze in particular cases.

At this point we can use the decomposition (2.13) which separates the symmetric and the antisymmetric parts of the distortion tensor. Note that $\mathrm{Tr}(AB)$ is a nondegenerate scalar product in the linear space of square matrices. The subspaces of symmetric and antisymmetric matrices are orthogonal with respect to this scalar product. (Verify this!) Therefore with the decomposition $B = \tilde B + r$ the trace $\mathrm{Tr}(B^2)$ is simplified to
\[ \mathrm{Tr}\left(B_{(u)}B_{(u)}\right) = \mathrm{Tr}\left(\tilde B_{(u)}\tilde B_{(u)}\right) + \mathrm{Tr}\left(\hat r_{(u)}\hat r_{(u)}\right), \]
where $\hat r_{(u)}$ is the transformation associated with the 2-form $r_{(u)}$. If the vector field $u$ is integrable, its rotation vanishes and we are left only with the symmetric part $\tilde B$ of the distortion tensor.

A further simplification is possible if we separate the trace-free part from $\tilde B$. (This symmetric, trace-free part is called the shear of the field $u$.) However, this operation depends sensitively on whether the field $u$ is null. Therefore we shall consider the relevant cases separately.

2.3.4 Shear for timelike congruences

While the distortion tensor $B_{(u)}$ is not necessarily symmetric, it is always transverse to the direction of $u$:
\[ B_{(u)}(u,x) = g(\nabla_u u, x) = 0, \]
\[ B_{(u)}(x,u) = g(u, \nabla_x u) = \frac{1}{2}\,x\circ g(u,u) = 0, \]
for all $x$. This fact allows us to reduce $B_{(u)}$ to a bilinear form in a lower-dimensional space.

First we decompose the tangent space $T_pM$ into the subspace $u^\perp$ and its complement, the one-dimensional space spanned by $u$. This decomposition is conveniently expressed using a projector $P$ onto $u^\perp$ and a complementary projector $Q$ onto $u$, such that
\[ P + Q = 1, \quad x = Px + Qx \ \text{for all } x, \quad P^2 = P, \quad Q^2 = Q. \]
Note that $\mathrm{Tr}\,P = 3$ and $\mathrm{Tr}\,Q = 1$. The above decomposition of identity is equivalent to the decomposition of the metric $g$ into the partial metric $h_u$, see Eq. (2.3), and a suitable complement,
\[ g(x,y) = h_u(x,y) + g(u,x)\,g(u,y). \]
The contravariant metric is correspondingly decomposed as
\[ g^{-1} = h_u^{-1} + u\otimes u. \]
A bilinear form $B(x,y)$ can be decomposed with respect to the direction $u$ using the projectors $P$ and $Q$ as follows:
\[ B(x,y) = B(Px,Py) + B(Px,Qy) + B(Qx,Py) + B(Qx,Qy). \tag{2.15} \]
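The projector identities and the decomposition (2.15) can be checked numerically in a simple case. The sketch below assumes Minkowski spacetime with the signature (+,-,-,-), a boosted unit timelike vector `u`, and the explicit form $Qx = u\,g(x,u)$, $Px = x - u\,g(x,u)$ of the projectors; the bilinear form `B` and the test vectors are arbitrary numbers chosen only for this illustration.

```python
import math

ETA = (1.0, -1.0, -1.0, -1.0)  # diagonal Minkowski metric, signature (+,-,-,-)

def g(x, y):
    """Scalar product g(x, y) in an orthonormal Minkowski basis."""
    return sum(e * a * b for e, a, b in zip(ETA, x, y))

def projectors(u):
    """Matrices of Q x = u g(x, u) and P x = x - u g(x, u), assuming g(u, u) = 1."""
    Q = [[u[i] * ETA[j] * u[j] for j in range(4)] for i in range(4)]
    P = [[(1.0 if i == j else 0.0) - Q[i][j] for j in range(4)] for i in range(4)]
    return P, Q

def apply(M, x):
    """Apply the matrix M to the vector x."""
    return [sum(M[i][j] * x[j] for j in range(4)) for i in range(4)]

def trace(M):
    return sum(M[i][i] for i in range(4))

def bilinear(B, x, y):
    """B(x, y) for a bilinear form given by its 4x4 component matrix."""
    return sum(B[i][j] * x[i] * y[j] for i in range(4) for j in range(4))

rapidity = 0.7  # arbitrary boost, so that u is not aligned with the time axis
u = (math.cosh(rapidity), math.sinh(rapidity), 0.0, 0.0)
P, Q = projectors(u)

# An arbitrary bilinear form (neither symmetric nor transverse) and two test vectors.
B = [[0.3 * i - 0.2 * j + 0.1 * i * j for j in range(4)] for i in range(4)]
x = (1.0, 0.5, -2.0, 0.3)
y = (-0.4, 1.5, 0.7, 2.0)

lhs = bilinear(B, x, y)
rhs = sum(bilinear(B, apply(A, x), apply(C, y)) for A in (P, Q) for C in (P, Q))
print(trace(P), trace(Q), lhs - rhs)  # traces near 3 and 1, difference near 0
```

The four-term sum reproduces $B(x,y)$ for any bilinear form; transversality is only needed afterwards, to drop the three terms containing $Q$.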


So far we have not obtained any new information about $B$. However, if the form $B$ is transverse to the direction of $u$ then $B(\cdot, Qy) = 0$, and the above formula simplifies to
\[ B(x,y) = B(Px,Py). \]
Since the structure of the projectors crucially depends on whether the vector field $u$ is null, we must consider the relevant cases separately. In this section we treat the easier case when $u$ is timelike. (The spacelike case is very similar but less important in practice.) Assuming that $g(u,u) = 1$, the projectors are
\[ Px = x - u\,g(x,u), \quad Qx = u\,g(x,u). \]
The distortion tensor is effectively three-dimensional since it is nonzero only on the subspace $u^\perp$, where it coincides with its projection, denoted by $B^\perp_{(u)}$,
\[ B^\perp_{(u)}(x,y) \equiv B_{(u)}(Px,Py) = B_{(u)}(x,y). \]
The shear of the vector field $u$ is, by definition, the symmetric traceless part of the projected distortion tensor. It is convenient to express the shear using the partial metric for the subspace $u^\perp$,
\[ h_u(a,b) \equiv g(Pa,Pb) = g(a,Pb) = g(a,b) - g(u,a)\,g(u,b). \]
Note that the partial signature of this metric is $(-\,-\,-)$ and its trace is equal to 3. If a symmetric bilinear form $T(x,y)$ is nonzero only on the subspace $u^\perp$, the traceless part of $T$ is expressed as
\[ T - \frac{1}{3}(\mathrm{Tr}\,T)\,h. \]
Since $\mathrm{Tr}\,B^\perp_{(u)} = \mathrm{Tr}\,B_{(u)} = \mathrm{div}\,u$, we obtain the decomposition
\[ B_{(u)} = \frac{1}{3}(\mathrm{div}\,u)\,h + \sigma_{(u)} + r_{(u)}, \tag{2.16} \]
\[ \sigma_{(u)} \equiv \tilde B_{(u)} - \frac{1}{3}(\mathrm{div}\,u)\,h; \quad \mathrm{Tr}\,\hat\sigma_{(u)} = 0. \]
The name "shear" for the tensor $\sigma_{(u)}$ describes a linear deformation of the comoving region that changes its shape but does not change its volume.

The Raychaudhuri equation (2.14) contains the trace of the square of $B_{(u)}$, and we can now simplify this term by using the decomposition (2.16). Note that every term in that decomposition is transverse to $u$ and thus is nonzero only on the three-dimensional subspace on which $P$ projects. It is important that this subspace has a metric $h$ with a Euclidean signature; the overall minus sign of $h$ is unimportant for the present consideration. The partial metric $h$ establishes a correspondence between bilinear forms and transformations, and we shall denote transformations by a hat. For example, the bilinear form $\sigma_{(u)}$ corresponds to the transformation $\hat\sigma_{(u)}$ such that (see footnote 1)
\[ \sigma_{(u)}(x,y) = h(\hat\sigma_{(u)}x, y). \]
This relation uniquely determines the vector $\hat\sigma_{(u)}x$ within the subspace $u^\perp$.

Footnote 1: Why don't we also denote the transformation $B_{(u)}$ by a hat?

In the space of bilinear forms with the scalar product $\mathrm{Tr}(AB)$, the subspace of symmetric traceless forms is orthogonal to the subspace of "pure trace" (i.e. bilinear forms proportional to the partial metric $h$). Both these subspaces are also orthogonal to the subspace of antisymmetric forms. Therefore we can decompose the trace term as
\[ \mathrm{Tr}\left(B_{(u)}B_{(u)}\right) = \frac{1}{3}\left(\mathrm{Tr}\,B_{(u)}\right)^2 + \mathrm{Tr}\left(\hat\sigma_{(u)}\hat\sigma_{(u)}\right) + \mathrm{Tr}\left(\hat r_{(u)}\hat r_{(u)}\right). \]
The Raychaudhuri equation for timelike geodesics is then written as
\[ \nabla_u(\mathrm{div}\,u) = -\mathrm{Ric}(u,u) - \frac{1}{3}(\mathrm{div}\,u)^2 - \mathrm{Tr}\left(\hat\sigma_{(u)}\hat\sigma_{(u)}\right) - \mathrm{Tr}\left(\hat r_{(u)}\hat r_{(u)}\right). \]
Let us now analyze the individual terms in the above equation. Since a trace of a square of a symmetric matrix is nonnegative, we have $\mathrm{Tr}(\hat\sigma_{(u)}\hat\sigma_{(u)}) \geq 0$. We can demonstrate this fact formally by computing
\[ \mathrm{Tr}_{(x,y)}\,h(\hat\sigma\hat\sigma x, y) \equiv \mathrm{Tr}_{(x,y)}\,\sigma(\hat\sigma x, y) = \mathrm{Tr}_{(x,y)}\,\sigma(y, \hat\sigma x) \equiv \mathrm{Tr}_{(x,y)}\,h(\hat\sigma y, \hat\sigma x). \]
We now need to use a metric decomposition (see Sec. 1.7.3 for details of trace calculations). If $\{s_1, s_2, s_3\}$ is an orthonormal basis of (spacelike) vectors, so that $h(s_i, s_j) = -\delta_{ij}$, then the inverse metric $h^{-1}$ is decomposed as
\[ h^{-1} = -s_1\otimes s_1 - s_2\otimes s_2 - s_3\otimes s_3, \]
and we get
\[ \mathrm{Tr}_{(x,y)}\,h(\hat\sigma y, \hat\sigma x) = -\sum_{j=1}^{3} h(\hat\sigma s_j, \hat\sigma s_j) \geq 0 \]
since $h$ is negative-definite.

Statement: We have $\mathrm{Tr}(\hat r_{(u)}\hat r_{(u)}) \leq 0$.

Derivation: Same calculation as above, except for
\[ \mathrm{Tr}_{(x,y)}\,r_{(u)}(\hat r_{(u)}x, y) = -\mathrm{Tr}_{(x,y)}\,r_{(u)}(y, \hat r_{(u)}x). \]

2.3.5 Shear for null congruences

We have seen that for null vector fields $n$, building a self-adjoint projector (2.2) into $n^\perp$ requires a choice of an additional null vector field $l$. It is convenient to normalize $l$ by $g(l,n) = 1$. The definition (2.2) of the projector can be interpreted as a decomposition of the identity operator into the projector $P$ and a complementary projector $Q$,
\[ 1 = P + Q, \quad Qx \equiv l\,g(n,x) + n\,g(l,x). \]
Note that $\mathrm{Tr}\,P = 2$ and $\mathrm{Tr}\,Q = 2$. The above formula is equivalent to a decomposition of the metric $g$ into the partial metric $h_n$, given by Eq. (2.4), and a suitable complement,
\[ g(x,y) = h_n(x,y) + g(l,x)\,g(n,y) + g(n,x)\,g(l,y). \]
Note that a tensor decomposition of the inverse metric $g^{-1}$ is
\[ g^{-1} = l\otimes n + n\otimes l - s_1\otimes s_1 - s_2\otimes s_2, \tag{2.17} \]
where $s_1$ and $s_2$ are orthonormal, spacelike vectors, orthogonal to both $n$ and $l$, and spanning the image of $P$.

Let us now decompose the distortion tensor $B_{(n)}$ using the projectors $P$ and $Q$, analogously to Eq. (2.15). Since $B_{(n)}(x,n) = 0$ by transversality, we can write
\[ B_{(n)}(x,y) = B_{(n)}(Px,Py) + B_{(n)}(Px,l)\,g(n,y) + B_{(n)}(l,Py)\,g(n,x) + B_{(n)}(l,l)\,g(n,x)\,g(n,y). \]
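The null-case projectors of Sec. 2.3.5 can be illustrated in the same way. The following sketch assumes Minkowski spacetime with the signature (+,-,-,-) and one particular choice of the null pair: `n` and `l` along the $\pm z$ directions, with `s1`, `s2` along $x$ and $y$. These vectors are an assumption made only for this example; any pair with $g(l,n)=1$ would do.

```python
import math

ETA = (1.0, -1.0, -1.0, -1.0)  # diagonal Minkowski metric, signature (+,-,-,-)

def g(x, y):
    """Scalar product g(x, y) in an orthonormal Minkowski basis."""
    return sum(e * a * b for e, a, b in zip(ETA, x, y))

R2 = math.sqrt(2.0)
n = (1 / R2, 0.0, 0.0, 1 / R2)    # null vector
l = (1 / R2, 0.0, 0.0, -1 / R2)   # auxiliary null vector with g(l, n) = 1
s1 = (0.0, 1.0, 0.0, 0.0)         # spacelike, orthogonal to n and l
s2 = (0.0, 0.0, 1.0, 0.0)

def Q(x):
    """Complementary projector Q x = l g(n, x) + n g(l, x)."""
    gn, gl = g(n, x), g(l, x)
    return tuple(l[i] * gn + n[i] * gl for i in range(4))

def P(x):
    """Projector P = 1 - Q onto the 2-plane spanned by s1, s2."""
    qx = Q(x)
    return tuple(x[i] - qx[i] for i in range(4))

BASIS = [tuple(1.0 if i == k else 0.0 for i in range(4)) for k in range(4)]

def trace(op):
    """Trace of a linear map given as a function acting on vectors."""
    return sum(op(BASIS[k])[k] for k in range(4))

def inverse_metric_from_basis():
    """Components of l(x)n + n(x)l - s1(x)s1 - s2(x)s2, cf. Eq. (2.17)."""
    return [[l[i] * n[j] + n[i] * l[j] - s1[i] * s1[j] - s2[i] * s2[j]
             for j in range(4)] for i in range(4)]

print(trace(P), trace(Q))          # both close to 2
print(P(n), P(l))                  # both close to the zero vector
print(inverse_metric_from_basis()) # close to diag(1, -1, -1, -1)
```

Note that $P$ annihilates both $n$ and $l$, which is why substituting $n$ into any projected term gives zero.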



The first term, $B_{(n)}(Px,Py) \equiv B^\perp_{(n)}(x,y)$, is effectively a bilinear form on a two-dimensional space spanned by $s_1, s_2$. Note that the trace of $B^\perp_{(n)}$ is equal to the trace of $B_{(n)}$. This can be seen formally by using the general property
\[ \mathrm{Tr}_{(x,y)}\,g(x,n)\,A(y,\ldots) = A(n,\ldots); \]
in other words, a trace with $g(n,x)$ substitutes $n$ into the rest of the expression. However, substituting $n$ into any term yields zero since $Pn = 0$ and $g(n,n) = 0$. For instance,
\[ \mathrm{Tr}_{(x,y)}\,g(x,n)\,g(n,y) = g(n,n) = 0, \]
\[ \mathrm{Tr}_{(x,y)}\,B_{(n)}(Px,l)\,g(n,y) = B_{(n)}(Pn,l) = 0. \]
Therefore
\[ \mathrm{Tr}\,B^\perp_{(n)} = \mathrm{div}\,n. \]
When we compute the trace needed for the Raychaudhuri equation,
\[ \mathrm{Tr}\left(B_{(n)}B_{(n)}\right) \equiv \mathrm{Tr}_{(a,b)}\mathrm{Tr}_{(x,y)}\,B_{(n)}(a,x)\,B_{(n)}(y,b), \]
it turns out that all the terms vanish except the term $\mathrm{Tr}(B^\perp_{(n)}B^\perp_{(n)})$ involving only the projected or "transverse" distortion $B^\perp_{(n)}$,
\[ \mathrm{Tr}\left(B_{(n)}B_{(n)}\right) = \mathrm{Tr}\left(B^\perp_{(n)}B^\perp_{(n)}\right). \tag{2.18} \]
(Verifying this is left as an easy exercise.) Therefore all the other terms in the decomposition of $B_{(n)}$ are "unphysical," i.e. they do not enter the physically significant equations.

Remark: For null congruences $n$, the reduction of the distortion tensor $B_{(n)}$ to its transverse part $B^\perp_{(n)}$ can be also motivated as follows. The distortion tensor describes the change of shape in the comoving 2-volume spanned by two spacelike connecting vectors $s_{1,2}$. It is clear that these connecting vectors should be orthogonal to $n$. Also, connecting vectors could be changed by adding a multiple of $n$, e.g. $s_1 \to s_1 + \lambda n$, where $\lambda$ is a scalar function; this will not modify the orthogonality of $s_{1,2}$ and $n$. However, connecting vectors that differ only by a multiple of $n$ correspond to a change of the parameter $\tau$ in nearby curves of the congruence. Therefore, the connecting vectors $s$ and $s + \lambda n$ carry the same information about the geometry of the congruence. Thus we should only consider equivalence classes of connecting vectors up to a multiple of $n$. While there is no canonical choice of representatives for these equivalence classes, a relatively straightforward way is to choose $s_{1,2}$ within a 2-plane orthogonal to $n$ and an auxiliary null vector $l$. This is the construction used in the present section.

We can now separate the symmetric traceless part $\sigma_{(n)}$ of the projected distortion, called the shear of $n$, from the pure trace part proportional to $h_n$, and from the antisymmetric part, which is the projected rotation $r^\perp_{(n)}$,
\[ B^\perp_{(n)} = \frac{1}{2}(\mathrm{div}\,n)\,h_n + \sigma_{(n)} + r^\perp_{(n)}, \]
\[ \sigma_{(n)}(x,y) \equiv \frac{1}{2}\left[B^\perp_{(n)}(x,y) + B^\perp_{(n)}(y,x)\right] - \frac{1}{2}(\mathrm{div}\,n)\,h_n(x,y), \]
\[ r^\perp_{(n)}(x,y) \equiv r_{(n)}(Px,Py) = \frac{1}{2}\left[B^\perp_{(n)}(x,y) - B^\perp_{(n)}(y,x)\right]. \]
Note the factor $\frac{1}{2}$ instead of $\frac{1}{3}$, which is due to the projector $P$ being two-dimensional. The Raychaudhuri equation for null geodesics is then written as
\[ \nabla_n(\mathrm{div}\,n) = -\mathrm{Ric}(n,n) - \frac{1}{2}(\mathrm{div}\,n)^2 - \mathrm{Tr}\left(\hat\sigma_{(n)}\hat\sigma_{(n)}\right) - \mathrm{Tr}\left(\hat r^\perp_{(n)}\hat r^\perp_{(n)}\right). \]

2.4 Applications of Raychaudhuri equation

2.4.1 Energy conditions

The Einstein equation (1.68) specifies the matter energy-momentum tensor $T_{\mu\nu}$ that would produce any given spacetime. However, not every tensor $T_{\mu\nu}$ corresponds to physically reasonable situations. There are certain conditions that are convenient to impose on $T_{\mu\nu}$; these are summarily called energy conditions.

It is reasonable to suppose that the energy density measured locally by an observer is everywhere nonnegative (see footnote 2). This gives the weak energy condition
\[ T_{\mu\nu}u^\mu u^\nu \equiv T(u,u) \geq 0 \ \text{for all timelike } u. \]
Since null vectors are limits of timelike vectors, a consequence of this condition is the null energy condition
\[ T(n,n) \geq 0 \ \text{for all null } n. \]
In some calculations it is necessary to impose conditions directly on the Ricci tensor. One such condition is the strong energy condition,
\[ \mathrm{Ric}(u,u) \geq 0 \ \text{for all timelike } u. \]
By the Einstein equation, this is equivalent to
\[ T(u,u) - \frac{g(u,u)}{2}\,\mathrm{Tr}\,T \geq 0 \ \text{for all timelike } u. \]
Note that (contrary to what the words suggest) the strong energy condition does not imply the weak one.

Finally, the dominant energy condition is
\[ j^\mu \equiv T^\mu_{\ \nu}u^\nu \ \text{is future-directed and } j^\mu j_\mu \geq 0, \]
for all timelike $u^\mu$. This condition means that the energy-momentum flows along future-directed timelike or null lines.

By definition, a perfect fluid with the velocity field $v$, density $\rho$, and pressure $p$ is described by the energy-momentum tensor
\[ T = (\rho + p)\,v\otimes v - p\,g^{-1}; \]
\[ T^{\mu\nu} = (\rho + p)\,v^\mu v^\nu - p\,g^{\mu\nu}. \]
In many situations, the matter EMT can be represented by the perfect fluid form with a suitably chosen velocity vector $v$. It is assumed that $v$ is timelike and $g(v,v) = 1$. Note that the trace of the perfect fluid EMT is
\[ \mathrm{Tr}\,T = \rho - 3p. \]
Energy conditions can be expressed as inequalities for $p$ and $\rho$ as follows: The weak energy condition is
\[ \rho \geq 0, \quad \rho + p \geq 0, \]
the null energy condition is just
\[ \rho + p \geq 0, \]
the strong energy condition is
\[ \rho + 3p \geq 0, \quad \rho + p \geq 0, \]
and the dominant energy condition is
\[ \rho \geq 0, \quad |p| \leq \rho. \]

Footnote 2: This is true for every known classical field but is sometimes violated by quantum fields.
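The perfect-fluid form of the energy conditions is easy to tabulate. The sketch below simply encodes the four pairs of inequalities stated above; the function name `perfect_fluid_energy_conditions` and the sample equations of state are chosen only for this illustration.

```python
def perfect_fluid_energy_conditions(rho, p):
    """Evaluate the energy conditions for a perfect fluid with density rho and pressure p."""
    return {
        "weak": rho >= 0 and rho + p >= 0,
        "null": rho + p >= 0,
        "strong": rho + 3 * p >= 0 and rho + p >= 0,
        "dominant": rho >= 0 and abs(p) <= rho,
    }

samples = {
    "dust (p = 0)": (1.0, 0.0),
    "radiation (p = rho/3)": (1.0, 1.0 / 3.0),
    "vacuum-like (p = -rho)": (1.0, -1.0),
    "negative density, large pressure": (-1.0, 2.0),
}
for name, (rho, p) in samples.items():
    print(name, perfect_fluid_energy_conditions(rho, p))
```

The last sample satisfies the strong condition while violating the weak one, which illustrates the remark that the strong energy condition does not imply the weak one. The vacuum-like case shows the opposite situation: weak, null, and dominant conditions hold, but the strong condition fails.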


Statement: The above formulas represent the energy conditions for the perfect fluid EMT.

Derivation: Since the energy conditions involve an arbitrary timelike vector $u$, it is convenient to decompose that vector with respect to the fluid velocity $v$ as
\[ u = av + w, \quad w \in v^\perp, \quad g(u,u) = a^2 + g(w,w) = 1, \]
where $a$ is a suitable constant. The value of the vector $w$ will be irrelevant for the calculations since $T(v,v) = \rho$, $T(v,w) = 0$, and $T(w,w) = -p\,g(w,w) = p\,(a^2 - 1)$, while the value of $a$ is constrained only by $|a| \geq 1$. For instance, the weak energy condition gives
\[ T(u,u) = a^2\,T(v,v) + T(w,w) = a^2(\rho + p) - p \geq 0, \]
which holds for all $|a| \geq 1$ only if $\rho \geq 0$ (at $a = 1$) and $\rho + p \geq 0$ (the limit of large $a$). Other energy conditions are treated similarly.

Statement: A massless, minimally coupled scalar field $\phi$ with the EMT
\[ T = d\phi\otimes d\phi - \frac{g^{-1}(d\phi,d\phi)}{2}\,g, \]
\[ T_{\mu\nu} \equiv \phi_{,\mu}\phi_{,\nu} - \frac{1}{2}\,g_{\mu\nu}\,\phi_{,\alpha}\phi^{,\alpha}, \]
satisfies the null and the strong energy conditions.

Derivation: For any null vector $n$, we have
\[ T(n,n) = (n\circ\phi)^2 - \frac{g(n,n)}{2}\,(\ldots) = (n\circ\phi)^2 \geq 0. \]
For any timelike vector $v$, we have
\[ \mathrm{Tr}\,T = -g^{-1}(d\phi,d\phi), \]
\[ T(v,v) - \frac{1}{2}\,g(v,v)\,\mathrm{Tr}\,T = (v\circ\phi)^2 \geq 0. \]

2.4.2 Focusing of timelike geodesics

The divergence $\mathrm{div}\,u$ of a timelike vector field $u$ can be interpreted as the relative rate of change of a comoving 3-volume orthogonal to $u$. This is so because the 3-volume spanned by vectors $a, b, c$ is equal to $\varepsilon(a,b,c,x)$, where $x$ is any normalized vector orthogonal to $a, b, c$. If $a, b, c$ are connecting vectors for $u$ which are orthogonal to $u$ initially, they will remain orthogonal to $u$, so we can use $u$ as the vector $x$. Thus
\[ \frac{dV}{d\tau} = \mathcal{L}_u\,\varepsilon(a,b,c,u) = (\mathrm{div}\,u)\,\varepsilon(a,b,c,u) = (\mathrm{div}\,u)\,V. \]
Consider a congruence of timelike geodesics such that the tangent vector field $u$ is hypersurface orthogonal. Then, as we have seen, the rotation of $u$ vanishes, and the distortion tensor is symmetric. The Raychaudhuri equation is simplified to
\[ \frac{d}{d\tau}(\mathrm{div}\,u) = -\mathrm{Ric}(u,u) - \frac{1}{3}(\mathrm{div}\,u)^2 - \mathrm{Tr}\left(\hat\sigma_{(u)}\hat\sigma_{(u)}\right). \]
Imposing the strong energy condition, which holds for classical matter, we find that the right-hand side of the above equation is nonpositive, indicating that gravity makes hypersurface-orthogonal geodesics diverge less. This means that gravity is an attractive force. In fact, we can obtain a stronger result,
\[ \frac{d}{d\tau}(\mathrm{div}\,u) \leq -\frac{1}{3}(\mathrm{div}\,u)^2. \]
This inequality can be integrated with an initial value $\mathrm{div}\,u(\tau_0) = \theta_0$, which yields
\[ \mathrm{div}\,u \leq \frac{1}{\theta_0^{-1} + \frac{1}{3}(\tau - \tau_0)}. \]
If the initial value $\theta_0$ is positive, the divergence goes to zero (or becomes negative), and if it is initially negative, it diverges to negative infinity within a finite time. A negative infinite divergence means that the geodesics cross at a focal point. The set of all focal points in a given congruence is called a caustic. We conclude that the appearance of a focal point is inevitable within a finite proper time once the initial value of the divergence is negative, for hypersurface orthogonal geodesic congruences. This statement is known as the focusing theorem.

Here is an example of a hypersurface orthogonal congruence to which the focusing theorem applies.

Statement 1: A congruence of timelike geodesics emitted from one point $p$ is hypersurface orthogonal because it is orthogonal to hypersurfaces of constant proper time $\tau$.

Proof: Let $u$ be the affine tangent vector field for a congruence of geodesics emitted from $p$. We may normalize $g(u,u) = 1$. Since there are three linearly independent connecting vectors for $u$ at any point, and since these connecting vectors form a basis of the tangent space to a hypersurface of constant $\tau$, it is sufficient to show that any connecting vector field $c$ is everywhere orthogonal to $u$. It will then follow that $u$ is orthogonal to the hypersurface of constant $\tau$. Now, if $c$ is a connecting vector for $u$, it is easy to show that $\nabla_u\,g(c,u) = 0$ (see Lemma 1 in Sec. 2.2.6). Thus $g(c,u)$ is constant along the lines of the congruence, and so it remains to show that $g(c,u) = 0$ at any one point along each line. Since obviously all connecting vectors $c$ must vanish at the point $p$, we have $g(c,u) = 0$ at $p$ for every line of $u$. (Alternatively, we may consider an infinitesimal neighborhood of $p$, where the geometry of lines is approximately the same as in the Minkowski space, where geodesics are straight lines and the statement $g(c,u) = 0$ is obviously true.) Thus we have proved the statement.

2.4.3 Repulsive gravity

Suppose that the energy-momentum tensor contains a cosmological constant term
\[ T_{\mu\nu} = T^{(m)}_{\mu\nu} + \frac{1}{8\pi}\,\Lambda\,g_{\mu\nu}, \]
where $T^{(m)}$ is the matter EMT and $\Lambda > 0$ is the cosmological constant. (The cosmological constant is also called "dark energy" if it is implied that $\Lambda = \Lambda(p)$ is a slowly-varying dynamical field and its local value near a point $p$ is perceived as a cosmological "constant.") The effect of introducing a cosmological constant is described by the Raychaudhuri equation, which receives an extra term
\[ \frac{d}{d\tau}(\mathrm{div}\,u) = -8\pi\left[T^{(m)}(u,u) - \frac{1}{2}\,\mathrm{Tr}\,T^{(m)}\right] - \frac{1}{3}(\mathrm{div}\,u)^2 - \mathrm{Tr}\left(\hat\sigma_{(u)}\hat\sigma_{(u)}\right) + \Lambda. \]
The extra term will decrease the rate of focusing, which indicates that a cosmological constant corresponds to a repulsive gravitational force.
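The focusing behavior of Sec. 2.4.2 and the repulsive effect of Sec. 2.4.3 can be illustrated with a toy model. The sketch below assumes that the matter and shear terms are dropped, so that the divergence $\theta \equiv \mathrm{div}\,u$ obeys $d\theta/d\tau = -\theta^2/3 + \Lambda$ with $\tau_0 = 0$; for $\Lambda = 0$ this saturates the focusing inequality. The function names and the fourth-order Runge-Kutta integrator are choices made only for this example.

```python
def integrate_theta(theta0, tau_end, lam=0.0, steps=20000):
    """Integrate d(theta)/d(tau) = -theta**2 / 3 + lam from tau = 0 with RK4."""
    def rhs(theta):
        return -theta * theta / 3.0 + lam

    h = tau_end / steps
    theta = theta0
    for _ in range(steps):
        k1 = rhs(theta)
        k2 = rhs(theta + 0.5 * h * k1)
        k3 = rhs(theta + 0.5 * h * k2)
        k4 = rhs(theta + h * k3)
        theta += h * (k1 + 2.0 * k2 + 2.0 * k3 + k4) / 6.0
    return theta

def theta_saturated(theta0, tau):
    """Closed-form solution for lam = 0: theta = 1 / (1/theta0 + tau/3)."""
    return 1.0 / (1.0 / theta0 + tau / 3.0)

def focal_time(theta0):
    """Proper time at which the lam = 0 solution diverges (requires theta0 < 0)."""
    return -3.0 / theta0

# Initially converging congruence: theta -> -infinity at tau = 3.
print(integrate_theta(-1.0, 2.0), theta_saturated(-1.0, 2.0), focal_time(-1.0))

# With lam = 1/3 the same kind of initial convergence is reversed.
print(integrate_theta(-0.5, 20.0, lam=1.0 / 3.0))
```

In this toy model a positive $\Lambda$ prevents focusing whenever $\theta_0 > -\sqrt{3\Lambda}$, and the divergence then approaches the constant value $\sqrt{3\Lambda}$.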


2.4.4 Focusing of null geodesics

We now consider a congruence of null geodesics with a hypersurface-orthogonal tangent vector field $n$, and we would like to obtain an analogous focusing theorem. Compared with the timelike case, we have two differences: Firstly, the 3-volume spanned by vectors orthogonal to $n$ vanishes and so the interpretation of $\mathrm{div}\,n$ is altered; this issue was treated in Sec. 2.1.2 where we found that $\mathrm{div}\,n$ describes the rate of change of comoving area. Secondly, the rotation tensor $r_{(n)}$ generally does not vanish for hypersurface-orthogonal null geodesics. However, the Raychaudhuri equation involves the trace term $\mathrm{Tr}(\hat r_{(n)}\hat r_{(n)})$, and we shall now show that this term vanishes for a hypersurface-orthogonal field $n$, even though the rotation tensor $r_{(n)}$ itself does not vanish.

We have seen in Sec. 2.3.5 that the trace terms involve only the projected distortion tensor [Eq. (2.18)]. Therefore the trace term will remain the same after we project the rotation tensor onto the subspace spanned by $s_{1,2}$,
\[ \mathrm{Tr}\left(\hat r_{(n)}\hat r_{(n)}\right) = \mathrm{Tr}\left(\hat r^\perp_{(n)}\hat r^\perp_{(n)}\right). \]
We now use the fact that the rotation 2-form $r_{(n)}$ is equal to $\frac{1}{2}d\nu$, where $\nu \equiv gn$ is the 1-form dual to $n$. By the assumption of hypersurface orthogonality and the Frobenius theorem, there exist scalar functions $\lambda, \mu$ such that $\nu = e^\lambda d\mu$ and then
\[ d\nu = e^\lambda\,d\lambda\wedge d\mu = d\lambda\wedge\nu. \]
The projected rotation tensor acts on vectors $x, y$ as
\[ r^\perp_{(n)}(x,y) \equiv \frac{1}{2}(d\nu)(Px,Py) = \frac{1}{2}\left[\nu(Py)\,(d\lambda)(Px) - \nu(Px)\,(d\lambda)(Py)\right]. \]
Since $\nu(Px) = g(n,Px) = 0$ for all $x$, we have $r^\perp_{(n)} = 0$.

Since the projected rotation vanishes, we obtain the Raychaudhuri equation for null geodesics in the form
\[ \frac{d}{d\tau}(\mathrm{div}\,n) = -\mathrm{Ric}(n,n) - \frac{1}{2}(\mathrm{div}\,n)^2 - \mathrm{Tr}\left(\hat\sigma_{(n)}\hat\sigma_{(n)}\right). \]
Imposing the null energy condition, we find $\mathrm{Ric}(n,n) \geq 0$. Hence, the right-hand side of the above equation is nonpositive and
\[ \frac{d}{d\tau}(\mathrm{div}\,n) \leq -\frac{1}{2}(\mathrm{div}\,n)^2. \]
This is the focusing theorem for the null case. The conclusion is that gravity makes null geodesics focus within a finite interval of the affine parameter $\tau$.

2.5 Null tetrad formalism

The null tetrad formalism is due to Newman and Penrose and consists of writing equations in a basis where all four vectors are null and "as orthogonal as possible" with respect to each other. This leads to simplifications when writing tensor equations in components. We shall now introduce the null tetrad formalism as an elegant way of expressing the projected distortion tensor for the case of null geodesics.

The idea is to change the previously used basis $\{l, n, s_1, s_2\}$, where the vectors $l, n$ are null and $s_{1,2}$ are spacelike connecting vectors for $n$, into a null basis. Since $l$ must satisfy $g(l,n) = 1$, and so is not (and generally cannot be chosen as) a connecting vector for $n$, we are motivated to drop the requirement that $s_{1,2}$ be connecting vectors but instead demand orthonormality, $g(s_j, s_k) = -\delta_{jk}$. Then we define the complex-valued null vectors
\[ m \equiv \frac{1}{\sqrt{2}}\left(s_1 + i s_2\right), \quad \bar m \equiv \frac{1}{\sqrt{2}}\left(s_1 - i s_2\right). \tag{2.19} \]
The fact that these vectors are complex-valued is merely a mathematical formality that makes equations more compact; we expect that every physically measurable quantity is real-valued. The null basis $\{l, n, m, \bar m\}$ is called the Newman-Penrose null tetrad.

The null tetrad is "nearly orthogonal" since $g(m,\bar m) = -1$, $g(l,n) = 1$, and all the other scalar products vanish. Hence, the decomposition (2.17) of the contravariant metric $g^{-1}$ is rewritten through the null tetrad as
\[ g^{-1} = l\otimes n + n\otimes l - m\otimes\bar m - \bar m\otimes m. \]
The partial metric $h_{(n)}$ is similarly decomposed as
\[ h^{-1}_{(n)} = -m\otimes\bar m - \bar m\otimes m, \]
and the projector $P$ is
\[ Px = -m\,g(x,\bar m) - \bar m\,g(x,m). \]
Using the null tetrad decomposition above, the divergence of a hypersurface orthogonal null geodesic field $n$ can be written as
\[ \mathrm{div}\,n = \mathrm{Tr}_{(x,y)}\,B_{(n)}(x,y) \]
\[ = B_{(n)}(l,n) + B_{(n)}(n,l) - B_{(n)}(m,\bar m) - B_{(n)}(\bar m,m) \]
\[ = -2B_{(n)}(m,\bar m) \equiv -2g(\nabla_m n, \bar m), \]
because the tensor $B_{(n)}$ is transverse to $n$ and its projection onto the subspace spanned by $(m,\bar m)$ is rotation-free.

Note that for a geodesic field $n$ that is not hypersurface orthogonal, the quantity $-2B_{(n)}(m,\bar m)$ is complex-valued, and only its real part is actually equal to the divergence of $n$. However, the complex-valued "divergence"
\[ \theta \equiv -2B_{(n)}(m,\bar m) \]
is still useful for the analysis of such cases.

The other ingredient in the Raychaudhuri equation is the projected shear tensor $\sigma_{(n)}$, which can be expressed as
\[ \sigma_{(n)}(x,y) = B_{(n)}(Px,Py) - \frac{1}{2}(\mathrm{div}\,n)\,h_{(n)}(x,y) \]
\[ = \bar\sigma\,g(m,x)\,g(m,y) + \sigma\,g(\bar m,x)\,g(\bar m,y) \tag{2.20} \]
through the auxiliary (complex-valued) scalar quantity $\sigma$,
\[ \sigma \equiv B_{(n)}(m,m), \]
also called the scalar shear. Another expression for the scalar shear is
\[ \sigma \equiv g(m, \nabla_m n) = g(m,[m,n]) + g(m,\nabla_n m) = g(m,[m,n]). \]
It follows that $m$ cannot be a connecting vector for $n$ unless the shear vanishes.

The completion of a given vector $n$ to a null tetrad is of course not unique. Let us now investigate the freedom in the choice of the vectors $l, m$ and the resulting uncertainties


in the distortion quantities $\theta, \sigma$. The tetrad vectors are defined by the requirements $g(l,l) = g(m,m) = g(l,m) = 0$, $g(n,l) = -g(m,\bar m) = 1$. The vector $m$ is related by Eq. (2.19) to an orthonormal basis $s_{1,2}$ in the space $(l,n)^\perp$. Note that $\{l, n, s_1, s_2\}$ is a basis; the set of transformations that preserve the orthonormality of the basis is the Lorentz group; so the admissible transformations of the tetrad vectors are precisely the subgroup of the Lorentz group that leaves the vector $n$ unchanged. Let us describe these transformations explicitly.

Suppose that the vectors $m$ and $l$ change to
\[ m \to \tilde m = c_1 m + c_2 n + c_3 l, \]
\[ l \to \tilde l = c_4 l + c_5 m + \bar c_5\bar m + c_6 n, \]
where $c_1, c_2, c_3, c_5$ are complex and $c_4, c_6$ are real. Then it is straightforward to prove that the orthogonality relations $g(\tilde m, n) = 0$, $g(\tilde m, \bar{\tilde m}) = -1$, $g(\tilde l, n) = 1$, $g(\tilde l, \tilde m) = 0$, and $g(\tilde l, \tilde l) = 0$ require that $c_3 = 0$, $|c_1| = 1$, $c_4 = 1$, $c_5 = c_1\bar c_2$, and $c_6 = c_2\bar c_2$. Therefore, the general form of an admissible transformation is
\[ m \to \tilde m = e^{i\alpha}m + An, \tag{2.21} \]
\[ l \to \tilde l = l + e^{-i\alpha}A\bar m + e^{i\alpha}\bar A m + A\bar A n, \tag{2.22} \]
where $\alpha$ and $A$ are arbitrary scalar functions; note that $\alpha$ is real-valued while $A$ is complex-valued. We can easily see that the distortion quantities change under this transformation as
\[ \tilde\sigma = e^{2i\alpha}\sigma, \quad \tilde\theta = \theta. \tag{2.23} \]
It is useful to obtain closed-form equations for the divergence and the scalar shear of a geodesic congruence. The Raychaudhuri equation (for a hypersurface orthogonal, null, geodesic field $n$) acquires a more elegant form,
\[ \nabla_n\theta \equiv \frac{d\theta}{d\tau} = -\frac{1}{2}\theta^2 - \mathrm{Ric}(n,n) - 2\sigma\bar\sigma. \]
Since $\sigma\bar\sigma$ is invariant under the transformation (2.23), the Raychaudhuri equation does not depend on the completion of $n$ to a null tetrad.

Statement: It follows from Eq. (2.20) that $\mathrm{Tr}\left(\hat\sigma_{(n)}\hat\sigma_{(n)}\right) = 2\sigma\bar\sigma$.

Derivation: Calculation (omitted).

Under some additional assumptions about the null tetrad, a similar equation holds for the scalar shear $\sigma$,
\[ \nabla_n\sigma \equiv \frac{d\sigma}{d\tau} = -\frac{\theta + \bar\theta}{2}\,\sigma + R(n,m,n,m). \tag{2.24} \]
We shall now derive this equation and detail the necessary assumptions.

We need to compute the quantity
\[ \nabla_n\sigma \equiv \nabla_n B_{(n)}(m,m) \equiv \nabla_n\,g(\nabla_m n, m) = g(\nabla_n\nabla_m n, m) + g(\nabla_m n, \nabla_n m). \]
The appearance of nested covariant derivatives suggests that we introduce the Riemann tensor,
\[ g(\nabla_n\nabla_m n, m) = R(n,m,n,m) + g(\nabla_{[n,m]}n, m). \]
(We used $\nabla_n n = 0$.) Thus
\[ \nabla_n\sigma = R(n,m,n,m) + g(\nabla_{[n,m]}n, m) + g(\nabla_m n, \nabla_n m). \tag{2.25} \]
By virtue of the following obvious properties,
\[ g(\nabla_m n, n) = 0, \quad g(\nabla_m n, m) \equiv \sigma, \quad g(\nabla_m n, \bar m) \equiv -\frac{1}{2}\theta, \]
\[ g(\nabla_n m, n) = \nabla_n\,g(m,n) - g(m,\nabla_n n) = 0, \quad g(\nabla_n m, m) = 0, \]
we have
\[ \nabla_m n = -\sigma\bar m + \frac{1}{2}\theta\,m + \kappa n, \]
\[ \nabla_n m = \beta m + \gamma n, \]
where we introduced unknown scalar functions
\[ \kappa \equiv g(\nabla_m n, l), \quad \beta \equiv -g(\nabla_n m, \bar m), \quad \gamma \equiv g(\nabla_n m, l). \]
These functions depend on the completion of $n$ to a null tetrad and should not enter the final expression. Now we can straightforwardly evaluate:
\[ g(\nabla_m n, \nabla_n m) = \beta\sigma, \]
\[ [n,m] = \left(\beta - \frac{1}{2}\theta\right)m + \sigma\bar m + (\gamma - \kappa)\,n, \]
\[ g(\nabla_{[n,m]}n, m) = B_{(n)}([n,m], m) = \left(\beta - \frac{1}{2}\theta\right)B_{(n)}(m,m) + \sigma B_{(n)}(\bar m, m) = \left(\beta - \frac{1}{2}\theta\right)\sigma - \frac{1}{2}\sigma\bar\theta. \]
Finally, Eq. (2.25) is rewritten as
\[ \nabla_n\sigma = R(n,m,n,m) - \frac{1}{2}(\theta + \bar\theta)\,\sigma + 2\beta\sigma. \]
This equation still contains the function $\beta \equiv -g(\nabla_n m, \bar m)$ that depends on the completion of $n$ to a null tetrad and is thus arbitrary. We would like to eliminate this "gauge" dependence. Note that the function $\beta$ always has pure imaginary values and changes under the replacement (2.21) as
\[ \tilde\beta = \beta + i\nabla_n\alpha. \]
Therefore, for a given null tetrad, there exists a suitable function $\alpha$ such that the new null tetrad $\{l, n, e^{i\alpha}m, e^{-i\alpha}\bar m\}$ has $\tilde\beta = 0$. (Obviously, the function $\alpha$ can be found by integrating $i\beta$ along the flow lines of $n$.) Once we have $\beta = 0$, the required equation (2.24) follows.

Alternatively, we may redefine the scalar shear for a given null tetrad as
\[ \sigma_g \equiv e^{2i\chi}\sigma, \]
where the function $\chi$ is such that
\[ i\nabla_n\chi = g(\nabla_n m, \bar m). \]
(There is, of course, a considerable freedom in choosing the function $\chi$.) The redefined "gauge-covariant" shear $\sigma_g$ still depends on the choice of the tetrad and of the function $\chi$, but is gauge-covariant in the sense that a tetrad transformation (2.21), (2.23) yields
\[ \nabla_n\tilde\sigma_g = e^{2i\alpha}\,\nabla_n\sigma_g. \]
The quantity $\sigma_g\bar\sigma_g$ remains invariant as before, and Eq. (2.24) holds for $\sigma_g$ with any choice of tetrad.
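The algebra of the admissible tetrad transformations can be verified numerically. The following sketch assumes Minkowski spacetime with the signature (+,-,-,-), a particular tetrad built from the coordinate axes, and the transformation $\tilde m = e^{i\alpha}m + An$, $\tilde l = l + e^{-i\alpha}A\bar m + e^{i\alpha}\bar A m + A\bar A n$; the scalar product is extended to complex vectors bilinearly (without complex conjugation). All names and numerical values are chosen only for this illustration.

```python
import cmath
import math

ETA = (1.0, -1.0, -1.0, -1.0)  # diagonal Minkowski metric, signature (+,-,-,-)

def g(x, y):
    """Bilinear (not Hermitian) scalar product, extended to complex vectors."""
    return sum(e * a * b for e, a, b in zip(ETA, x, y))

def combo(*terms):
    """Linear combination of 4-vectors: combo((c1, v1), (c2, v2), ...)."""
    return tuple(sum(c * v[i] for c, v in terms) for i in range(4))

def conj(v):
    return tuple(complex(c).conjugate() for c in v)

R2 = math.sqrt(2.0)
n = (1 / R2, 0.0, 0.0, 1 / R2)       # null
l = (1 / R2, 0.0, 0.0, -1 / R2)      # null, g(l, n) = 1
m = (0.0, 1 / R2, 1j / R2, 0.0)      # m = (s1 + i s2) / sqrt(2)
mbar = conj(m)

def transform(alpha, A):
    """Return the transformed pair (m_new, l_new) for real alpha and complex A."""
    A = complex(A)
    phase = cmath.exp(1j * alpha)
    m_new = combo((phase, m), (A, n))
    l_new = combo((1.0, l),
                  (phase * A.conjugate(), m),
                  (phase.conjugate() * A, mbar),
                  (abs(A) ** 2, n))
    return m_new, l_new

m_new, l_new = transform(0.3, 0.5 - 1.2j)
print(g(m_new, m_new), g(m_new, conj(m_new)))              # near 0 and -1
print(g(l_new, n), g(l_new, m_new), g(l_new, l_new))       # near 1, 0, 0
```

The transformed vector `l_new` comes out real up to rounding, as it must, because the coefficients of `m` and `mbar` are complex conjugates of each other.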


Calculation: Verify the above equation, where = e2i and



g e2i
.
Derivation: Let be the required function for the original
tetrad and for the modified tetrad with m
= ei m. Then we
compute

n g = (2in )g + e2i n ,
= g(n ei m, ei m)
in = in in ,
= ,


g + e2i n
g = n (e2i
n ) = (2in )

g + e2i+2i n
g n ( ) + (2in )
= 2i
= e2i n g .

3 Asymptotically flat spacetimes
If a spacetime contains a finite "island" of matter surrounded by an infinite "sea" of vacuum, we intuitively expect that the spacetime is "almost flat" sufficiently far from the matter. For example, the metric (1.36) for the Schwarzschild spacetime is approximately equal to the Minkowski metric in spherical coordinates in the limit $r \to \infty$. Such spacetimes containing "well-isolated" systems are called asymptotically flat. In this chapter we shall formulate and explore the property of asymptotic flatness in more detail. We begin by considering an easier case of stationary spacetimes.

3.1 Stationary spacetimes

Suppose a metric $g$ is given in coordinates $\{x^\mu\}$, and the coefficients $g_{\mu\nu}$ are independent of one of the coordinates, say $x^1$. Then $\partial_{x^1}$ is a Killing vector for the metric $g$; in other words, the flow of the vector field $\partial_{x^1}$ leaves the metric invariant. Thus, Killing vectors provide a formalization of the concept of coordinate symmetry of the metric. Using Killing vectors, the heuristic concept of a "time-independent metric" is made precise in the following way.

A spacetime is called stationary if it possesses a timelike Killing vector field. The existence of such a vector field leads to many useful properties that we shall now explore. A stronger property is being static rather than stationary. A spacetime is static if it is stationary and the timelike Killing vector is hypersurface-orthogonal. For now, it suffices to consider stationary spacetimes.

In a stationary spacetime, the flow lines of the Killing vector $k$ can be seen as "preferred" worldlines along which the geometry does not change. It is useful to consider imaginary observers moving along these worldlines. These are called stationary observers and have 4-velocities
\[ u = \frac{1}{\sqrt{g(k,k)}}\,k. \tag{3.1} \]
Stationary observers find that the geometry of the spacetime around them is the same at all times. However, these stationary observers are not necessarily geodesic observers, i.e. they are not necessarily freely falling. For example, an observer sitting at the surface of a nonrotating planet observes that the geometry is time-independent. This is a stationary observer who is, obviously, not freely falling.

The following statement lists the most important properties of Killing vectors with regard to being geodesic.

Statement 3.1.0.1: (a) For a Killing vector $k$ the property $\nabla_k\,g(k,k) = 0$ holds. (b) The function $g(k,k)$ is everywhere constant iff the vector $k$ is geodesic. (c) In the domain where $g(k,k) \neq 0$, the normalized vector field $u$ defined by Eq. (3.1) is geodesic iff $k$ is geodesic.

Idea of proof: Compute the scalar product of $\nabla_k k$ with an arbitrary vector and use the Killing equation (1.48).

Proof of Statement 3.1.0.1: (a) The property $\nabla_k\,g(k,k) = 0$ follows directly from the Killing equation (1.48). (b) For an arbitrary $x$, we have
\[ g(\nabla_k k, x) = -g(k, \nabla_x k) = -\frac{1}{2}\,\nabla_x\,g(k,k). \]
Thus, the scalar product of $\nabla_k k$ with an arbitrary vector vanishes iff the derivative of $g(k,k)$ in an arbitrary direction vanishes. (c) Assuming that $g(k,k) \neq 0$, we compute
\[ \nabla_u u = \frac{1}{g(k,k)}\,\nabla_k k + \frac{k}{\sqrt{g(k,k)}}\,\nabla_k\frac{1}{\sqrt{g(k,k)}} = \frac{1}{g(k,k)}\,\nabla_k k, \]
where the second term vanishes by part (a). Thus $\nabla_u u = 0$ iff $\nabla_k k = 0$.

In general, a Killing vector is not normalized, so stationary worldlines generally correspond to accelerated observers rather than to freely falling observers. For example, the stationary worldlines in the Schwarzschild spacetime describe observers remaining at a constant distance from the center of mass. These observers move with a constant acceleration and thus feel a constant "pull" of gravity. Accordingly, the Killing vector $\partial_t$ is not normalized in the Schwarzschild spacetime, i.e. $g(k,k) \neq \mathrm{const}$.

3.1.1 Newtonian limit

In the framework of General Relativity, the Newtonian limit is the assumption that gravitation is weak and that there exists a globally inertial reference frame, the "absolute" spacetime. In other words, a Newtonian setting considers a set of "absolute" observers who see each other as motionless or moving with constant velocities at all times. (Of course, this assumption is only approximately correct in the actual curved spacetime; also the clocks at different points run at different rates and cannot be exactly synchronized everywhere at all times.) For example, someone sitting on the surface of the Earth could be such an "absolute observer." Since "absolute" observers do not necessarily move along geodesic lines, they will perceive the motion of freely falling bodies as an accelerated motion occurring in the "absolute" reference frame. The Newtonian theory explains this apparently accelerated motion as being due to gravitational forces. We now formalize these assumptions and derive the Newtonian equations of gravitation.

A congruence of the world-lines of the "absolute" observers is described by a timelike vector field $v$, normalized as $g(v,v) = 1$. The observers measure the "absolute time" $\tau$ along the flow lines of $v$. The central assumption of the Newtonian limit is that these observers are fixed with respect to each other, so any neighbor observers appear to move without relative acceleration. In other words, any connecting vector field $c$, such that $[c,v] = 0$, appears (approximately!) unaccelerated, $\nabla_v\nabla_v c \approx 0$.

Let us find out how the stationary observers perceive the motion of freely falling (geodesic) particles. If a geodesic has instantaneously the same tangent vector $v$ as one of the fixed observers, the observer's acceleration relative to the geodesic

71
3 Asymptotically flat spacetimes

is $\nabla_v v$, therefore the observer will measure the acceleration of the freely falling particle as $a \equiv -\nabla_v v$. Our goal is to derive a formula describing the acceleration $a$.
The plan of the derivation is the following. First we use the properties of connecting vectors to deduce that the vector field $a$ is integrable. It will follow that there exists a scalar function $\Phi$ such that $a = g^{-1} d\Phi$. This scalar function is called the gravitational potential. Then we shall derive an equation for $\Phi$, using the Einstein equation with matter sources consisting of "fixed" matter (not moving with respect to the "fixed" observers) with mass density $\rho(x)$. This equation will coincide with the (three-dimensional) Poisson equation $\Delta\Phi = 4\pi\rho$, which is equivalent to Newton's law of gravity. In this way we reproduce the Newtonian theory of gravitation.
The acceleration field $a$ is integrable if the bilinear form $B^{(a)}(x,y)$ defined by Eq. (2.10) is symmetric. The form of the expression
$$B^{(a)}(x,y) \equiv -g(\nabla_x \nabla_v v, y)$$
suggests that we should consider the combination
$$g(\nabla_v \nabla_x v - \nabla_x \nabla_v v - \nabla_{[v,x]} v, y) = R(v,x,v,y)$$
and try to simplify it using the given information about the connecting vectors. We substitute $x = c$ and obtain
$$0 = g(\nabla_v \nabla_v c, y) = g(\nabla_v \nabla_c v, y) = R(v,c,v,y) + g(\nabla_c \nabla_v v, y)$$
and therefore
$$B^{(a)}(c,y) \equiv -g(\nabla_c \nabla_v v, y) = R(v,c,v,y). \qquad (3.2)$$
Since $R(v,c,v,y) = R(v,y,v,c)$, we find
$$B^{(a)}(c,y) = B^{(a)}(y,c).$$
In fact, the above relation holds for arbitrary vector fields $x, y$ at any point, $B^{(a)}(x,y) = B^{(a)}(y,x)$, because it involves only the value of the connecting vector $c$ at one point, and not the fact that $c$ is a connecting vector. Hence, the bilinear form $B^{(a)}$ is symmetric, the vector field $a$ is integrable, and there exists a scalar function $\Phi$ such that $a = g^{-1} d\Phi$.
To derive an equation for $\Phi$, we use Eq. (3.2) and the definition of divergence,
$$\mathrm{div}\,a = \mathrm{Tr}_{(x,y)}\, g(\nabla_x a, y).$$
It follows that
$$\mathrm{div}\,a = -\mathrm{div}(\nabla_v v) = \mathrm{Tr}_{(x,y)} R(v,x,v,y) = -\mathrm{Ric}(v,v).$$
On the other hand,
$$\mathrm{div}\,a = \mathrm{div}(g^{-1} d\Phi) = \mathrm{Tr}_{(x,y)}\, g(\nabla_x g^{-1} d\Phi, y) = \mathrm{Tr}_{(x,y)}\, (\nabla_x d\Phi) \circ y = \Box\Phi,$$
where the D'Alembertian $\Box$ is defined as the trace of the bilinear form $\nabla\nabla\Phi$,
$$\Box\Phi \equiv \mathrm{Tr}_{(x,y)}\, (\nabla\nabla\Phi)(x,y) \equiv \mathrm{Tr}_{(x,y)}\, (\nabla_x \nabla_y \Phi - \nabla_{\nabla_x y} \Phi).$$
In components, $\Box\Phi = g^{\mu\nu} \nabla_\mu \nabla_\nu \Phi$. Hence, we obtain the following equation for $\Phi$,
$$\Box\Phi = -\mathrm{Ric}(v,v).$$

To obtain a concrete expression for $\mathrm{Ric}(v,v)$, we need to relate the curvature to the matter content of the spacetime. In the Newtonian approximation, the matter sources are approximately motionless particles, so the energy-momentum tensor of matter is $T_{\mu\nu} = \rho v_\mu v_\nu$, where $\rho(x)$ is the mass density. The Einstein equation,
$$R_{\mu\nu} - \frac{1}{2} g_{\mu\nu} R = 8\pi T_{\mu\nu},$$
can be rewritten as
$$R_{\mu\nu} = 8\pi \left( T_{\mu\nu} - \frac{1}{2} g_{\mu\nu} T \right).$$
Since $v^\mu v_\mu = 1$, we find $T = \rho$ and finally
$$\Box\Phi = -R_{\mu\nu} v^\mu v^\nu = -4\pi\rho.$$
This is in fact the familiar Poisson equation relating the Newtonian gravitational potential $\Phi$ to the density of matter, written in a Lorentz-covariant way. In an almost-flat spacetime, we can choose the coordinate $t$ along the flow lines of $v$, and then we have approximately
$$\Box\Phi = \partial_t^2 \Phi - \Delta\Phi = -4\pi\rho, \qquad \Delta \equiv \partial_x^2 + \partial_y^2 + \partial_z^2.$$
The time derivatives are negligible since the distribution of matter is almost stationary (in the SI units, this is made more explicit by the factor $c^{-2}$ in front of $\partial_t^2$). Hence, we recover the three-dimensional Poisson equation
$$\Delta\Phi = 4\pi\rho.$$
Note that the three-dimensional acceleration is $a^i = -\partial_i \Phi$, where $i = x, y, z$, because of the negative sign of the spatial part of the metric.

Example: Schwarzschild spacetime

The metric is given by Eq. (1.36) and the stationary observers follow lines of fixed $r, \theta, \phi$ for $r > 2m$. We expect the Newtonian approximation to hold for large $r \gg 2m$. The vector $v$ is thus proportional to $\partial_t$, and normalization $g(v,v) = 1$ yields
$$v = \left( 1 - \frac{2m}{r} \right)^{-1/2} \partial_t.$$
The vectors $\partial_\theta, \partial_\phi$ are connecting vectors for $v$, but $\partial_r$ is not,
$$[v, \partial_r] = \frac{m}{r(r-2m)}\, v.$$
However, the approximation $[v, \partial_r] \approx 0$ is good for large $r$, namely for $r \gg m$.
A calculation shows that the acceleration field $\nabla_v v$ is exactly integrable (not only in the Newtonian approximation). Let us compute the scalar products of $\nabla_v v$ with the basis vectors $v, \partial_r, \partial_\theta, \partial_\phi$:
$$g(\nabla_v v, v) = \frac{1}{2}\, v \circ g(v,v) = 0,$$
$$g(\nabla_v v, \partial_\theta) = -g(\nabla_v \partial_\theta, v) = -g(\nabla_{\partial_\theta} v, v) = 0,$$
$$g(\nabla_v v, \partial_\phi) = 0,$$
$$g(\nabla_v v, \partial_r) = -g(\nabla_v \partial_r, v) = -g(v, [v, \partial_r]) - g(v, \nabla_{\partial_r} v) = -\frac{m}{r(r-2m)}.$$
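
The radial component just computed can be cross-checked numerically. The following is a minimal sketch in Python (not part of the original notes), assuming units $G = c = 1$ and the potential $\Phi(r) = \frac{1}{2}\ln(1 - 2m/r)$ of the Schwarzschild example; the function names, the value $m = 1$ and the sample radii are illustrative choices. It compares a finite-difference derivative of $\Phi$ with the closed form $m/[r(r-2m)]$ and evaluates the proper acceleration felt by a stationary observer.

```python
import math

def phi(r, m=1.0):
    """Potential Phi(r) = (1/2) ln(1 - 2m/r), defined for r > 2m."""
    return 0.5 * math.log(1.0 - 2.0 * m / r)

def dphi_exact(r, m=1.0):
    """Closed form dPhi/dr = m / (r (r - 2m))."""
    return m / (r * (r - 2.0 * m))

def dphi_numeric(r, m=1.0, h=1e-5):
    """Central finite difference approximation of dPhi/dr."""
    return (phi(r + h, m) - phi(r - h, m)) / (2.0 * h)

def proper_acceleration(r, m=1.0):
    """Magnitude sqrt(-g(a, a)) for a = (m/r^2) d_r,
    using g(d_r, d_r) = -1/(1 - 2m/r) in the signature (+,-,-,-)."""
    return (m / r**2) / math.sqrt(1.0 - 2.0 * m / r)

# Compare the numerical and closed-form derivatives with the Newtonian m/r^2.
for r in (3.0, 10.0, 1000.0):
    print(r, dphi_numeric(r), dphi_exact(r), 1.0 / r**2, proper_acceleration(r))
```

At large $r$ the three derivative columns agree, which is the Newtonian regime; close to $r = 2m$ the proper acceleration grows without bound while the Newtonian value $m/r^2$ stays finite.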

Therefore
$$\nabla_v v = \frac{g(\nabla_v v, \partial_r)}{g(\partial_r, \partial_r)}\, \partial_r = \frac{m}{r^2}\, \partial_r.$$
The potential is easier to find from the 1-form $g \nabla_v v$, which is expressed as
$$g \nabla_v v = -\frac{m}{r(r-2m)}\, dr \equiv -d\Phi(r),$$
$$\Phi(r) \equiv \frac{1}{2} \ln\left( 1 - \frac{2m}{r} \right). \qquad (3.3)$$
Thus $\Phi(r)$ is the potential for the gravitational force measured by the stationary observers in the Schwarzschild spacetime. For large $r \gg 2m$ the potential becomes $-m/r$, which is the familiar Newtonian potential due to a point mass $m$, while for $r \to 2m$ the gravitational force approaches infinity. (Note that the coordinate $r$ does not coincide with the physical distance.)

Self-test question: Is the motion of test particles in the Schwarzschild spacetime actually the same as the Newtonian motion in the potential $\Phi(r)$ or in some other potential? Answer: No. (The gravitational force depends on velocity as well.) Not sure how to explain! Need to make this clear.

Remark: The fact that the Schwarzschild spacetime admits a potential $\Phi(r)$ is a consequence of spherical symmetry and time-independence, i.e. the metric depends only on the radius $r$. (More formally: The presence of an integrable Killing vector $\partial_t$ shows that the spacetime is static; additionally, the angular Killing vectors $\partial_\theta$ and $\partial_\phi$ indicate the spherical symmetry.) Thus the acceleration of an observer at constant $r$ is a vector directed along $\partial_r$ whose magnitude is a function only of $r$. Therefore this acceleration must be equal to a gradient of some function $\Phi(r)$. In general, all stationary spacetimes admit such a potential although it may not be spherically symmetric; see Sec. 3.1.4.

Example: de Sitter spacetime

The metric is given by Eq. (1.37) and the stationary observers follow lines of fixed $x, y, z$. The vector $v$ is simply equal to $\partial_t$ and the basis of connecting vectors is $\partial_x, \partial_y, \partial_z$. Since the field $v$ is integrable and normalized, it is geodesic and $\nabla_v v = 0$. Hence, stationary observers are at the same time freely falling and do not feel any gravity at all. However, the Newtonian approximation breaks down at sufficiently large distances. This can be seen by computing the relative acceleration of two nearby observers (see the following calculation),
$$\nabla_v \nabla_v \partial_x = H^2 \partial_x.$$
At small distances $L \ll H^{-1}$, we are allowed to use the approximation that stationary observers are not accelerated relative to each other.

Calculation: For a connecting vector $c$ which is one of $\partial_x, \partial_y, \partial_z$, show that
$$\nabla_v c = H c \qquad (3.4)$$
and thus derive the above equation. The relation (3.4) is called the Hubble law. (In a uniformly expanding spacetime, the velocity of a comoving observer is proportional to the distance to the observer.)
Solution: We can compute $\nabla_v c$ by taking scalar products; for instance,
$$g(\nabla_v c, c) = \frac{1}{2}\, v \circ g(c,c) = -H e^{2Ht},$$
while scalar products of $\nabla_v c$ with other basis vectors vanish. Thus we obtain Eq. (3.4) and as a trivial consequence, $\nabla_v \nabla_v c = H^2 c$.

3.1.2 Redshift

In a stationary spacetime, the magnitude of the Killing vector defines a scalar function, $z \equiv \sqrt{g(k,k)}$. The function $z$ is called the redshift factor because, as we shall now show, the local values of $z$ characterize the variation of the photon frequency measured by different stationary observers.
A null geodesic with a null tangent vector $n$ represents a path of a photon. The vector $n$ may be rescaled in such a way that $g(n,u)$ represents the energy $E$ of the photon in the rest frame of a timelike observer with 4-velocity $u$. From the Planck relation, $E = h\nu$, where $h$ is Planck's constant, we find that the measured frequency of the photon is $\nu = h^{-1} g(n,u)$.
In a stationary spacetime, stationary observers have worldlines with 4-velocity
$$u = \frac{1}{\sqrt{g(k,k)}}\, k = z^{-1} k.$$
Therefore, a stationary observer at a given location will measure the frequency of a given lightray with a (null) worldline $\gamma(\tau)$ as
$$h\nu = g(z^{-1} k, \dot\gamma).$$
By Statement 3.1.2.1 below, $g(k, \dot\gamma)$ remains constant along lightrays. Thus, for a given single photon moving along a null geodesic, the frequency measured by a stationary observer is proportional to the local value of $z^{-1}$. This justifies the name "redshift factor" for the function $z$.

Statement 3.1.2.1: If $k$ is a Killing vector, so that Eq. (1.48) holds, and $\gamma(\tau)$ is a geodesic, then $g(k, \dot\gamma)$ is constant along $\gamma$.

Proof of Statement 3.1.2.1: Let $v$ be a geodesic vector field containing $\dot\gamma$; then $\nabla_v v = 0$ and we compute
$$v \circ g(k,v) = g(\nabla_v k, v) = -g(v, \nabla_v k),$$
thus $v \circ g(k,v) = 0$. ∎
It follows from Statement 3.1.0.1 that $k \circ z = 0$; in other words, the redshift factor is constant for a given stationary observer. It also follows that the function $z$ is everywhere constant (i.e. there is no redshift) iff the stationary observers are geodesic.

3.1.3 Conformal Killing vectors

In cosmology, one usually considers spacetimes that are not stationary, so no timelike Killing vectors are present. Nevertheless, sometimes there exists a conformal Killing vector and then the redshift function $z \equiv \sqrt{g(k,k)}$ is still useful.
By definition, a conformal Killing vector is a vector field $k$ such that
$$\mathcal{L}_k g = 2\lambda g,$$
where $\lambda$ is some scalar function. It follows directly from Eq. (1.46) that a conformal Killing vector $k$ satisfies the equation
$$g(\nabla_a k, b) + g(a, \nabla_b k) = 2\lambda\, g(a,b),$$
where $a, b$ are arbitrary vectors. This is a modified version of Eq. (1.48).
When a spacetime admits a conformal Killing vector, the redshift function is again useful, as demonstrated by the following statement.

Statement 3.1.3.1: Assume that a spacetime admits a timelike conformal Killing vector $k$, and call observers moving along $k$ "conformally stationary." Then the frequency of a photon measured by conformally stationary observers is inversely proportional to the redshift function $z \equiv \sqrt{g(k,k)}$.

Proof of Statement 3.1.3.1: For an arbitrary geodesic field $v$ we have
$$v \circ g(k,v) = g(\nabla_v k, v) = \lambda\, g(v,v).$$
If the field $v$ is null ($g(v,v) = 0$) then
$$v \circ g(k,v) = 0,$$
so $g(k,v)$ remains constant along the geodesic lines, just as in the case of ordinary Killing vectors. Defining $z \equiv \sqrt{g(k,k)}$ and $u \equiv z^{-1} k$, we again find
$$h\nu \equiv g(u, \dot\gamma) = z^{-1} g(k, \dot\gamma) \propto z^{-1}.$$
The photon's frequency is inversely proportional to the redshift factor. ∎
Note however that $k \circ g(k,k) = 2\lambda\, g(k,k)$, so the redshift factor $z$ is not constant along the worldlines of stationary observers.

Example 3.1.3.2: For a de Sitter spacetime with the metric (1.37), it is reasonable to guess that the vector $k = f(t)\partial_t$ is a conformal Killing vector if the scalar function $f(t)$ is chosen correctly. Let us determine $f(t)$, the conformal factor $\lambda$, and the redshift $z$.
It suffices to compute $g(\nabla_a \partial_t, b)$ for $a, b$ either $\partial_t$ or $\partial_x$. We find that $\nabla_{\partial_x} \partial_t = H \partial_x$ is the only nonzero derivative (similarly for $\partial_y$ and $\partial_z$). Hence
$$g(\nabla_{\partial_t} f\partial_t, \partial_t) = \dot f\, g(\partial_t, \partial_t),$$
and $\dot f = \lambda$. Further,
$$g(\nabla_{\partial_x} f\partial_t, \partial_x) = f H\, g(\partial_x, \partial_x) \equiv \lambda\, g(\partial_x, \partial_x),$$
thus $\lambda = fH = \dot f$ and $f(t) = e^{Ht}$, $\lambda = H e^{Ht}$, $z = \sqrt{g(k,k)} = e^{Ht}$. The frequency of photons decays with time $t$ as $e^{-Ht}$. ∎

Practice problem: For a flat Friedmann-Robertson-Walker metric,
$$g = dt^2 - a^2(t)\,(dx^2 + dy^2 + dz^2),$$
where $a(t)$ is a nonzero function, show that $k = a(t)\partial_t$ is a conformal Killing vector with the factor $\lambda = \dot a(t)$.

Unlike the case of null geodesics, the behavior of timelike geodesics in the presence of a conformal Killing vector is not as straightforward to describe without additional assumptions.
Suppose that a metric $g$ has a conformal Killing vector $k$, and consider particles moving along a timelike geodesic field $v$ normalized as $g(v,v) = 1$. The relative velocity $\vec v$ of the particles with respect to the conformally stationary observers is characterized by the scalar product $g(v,u)$ according to the standard special relativistic formula for the gamma factor,
$$\gamma \equiv \frac{1}{\sqrt{1 - (\vec v)^2}} = g(v,u).$$
We would like to derive an equation for $\gamma(\tau)$ along a particle's worldline parameterized by the proper time $\tau$. This is possible in the following special case.

Calculation 3.1.3.3: Consider stationary observers whose worldlines are parallel to the conformal Killing vector $k = e^{Ht}\partial_t$ for de Sitter spacetime with the metric (1.37). A point particle of mass $m$ is moving along a timelike geodesic; the motion is slow relative to the conformally stationary observers. Show that the relative 3-velocity $\vec v$ of the particle decreases exponentially at late times. In the Newtonian approximation, the slowdown can be ascribed to a "gravitational friction force" $\vec F$ which is proportional to the velocity according to the law
$$\vec F = -mH\vec v.$$
(Details on page 171.) ∎
In a more general situation, a closed equation for $\gamma(\tau)$ can be derived if we assume additionally that $k$ is an integrable vector field.

Statement 3.1.3.4: Suppose $k$ is an integrable, timelike conformal Killing vector field, $\mathcal{L}_k g = 2\lambda g$; the vector field $u$ is defined by $u = z^{-1} k$, where $z$ is the redshift function, $z \equiv \sqrt{g(k,k)}$; and $v$ is a timelike geodesic normalized by $g(v,v) = 1$. Then the scalar product $\gamma \equiv g(u,v)$ satisfies the equation
$$v \circ \gamma = \left( 1 - \gamma^2 \right) \lambda z^{-1}. \qquad (3.5)$$
(Proof on page 171.) ∎
For a particle moving with a small relative velocity $\vec v$ with respect to conformally stationary observers, Eq. (3.5) can be interpreted in the Newtonian limit. We have $\gamma \approx 1 + \frac{1}{2}(\vec v)^2$, so Eq. (3.5) can be rewritten as the equation for the local kinetic energy,
$$\frac{d}{d\tau} E_{\rm kin} = \vec v \cdot \vec F, \qquad \vec F \equiv -m\lambda z^{-1} \vec v, \qquad E_{\rm kin} \equiv \frac{1}{2} m (\vec v)^2.$$
According to the Newtonian description from the viewpoint of a conformally stationary observer, the kinetic energy of the particle is decreased due to the force of "gravitational friction" $\vec F$ that acts opposite and proportional to the velocity $\vec v$ of the particle.

3.1.4 Gravitational potential

In Sec. 3.1.1 we have seen that the Schwarzschild spacetime possesses a gravitational potential $\Phi$ such that the 3-acceleration of freely falling bodies in stationary reference frames is described by the Newtonian formula, $\vec a = -\vec\nabla \Phi$. This fact is a manifestation of a more general property: Every stationary spacetime admits a gravitational potential in this sense.
Statement (after problem 4a in chapter 6 of [36]): If $k$ is a Killing vector, show that $\nabla_k k$ is integrable. If $k$ is timelike (a stationary spacetime), show that the
acceleration field $\nabla_u u$ is integrable for a stationary observer with the 4-velocity $u = z^{-1} k$, where $z \equiv \sqrt{g(k,k)}$ is the redshift factor. Derive the gravitational potential for the acceleration field $\nabla_u u$.

Derivation: For an arbitrary vector $x$,
$$g(\nabla_k k, x) = -g(k, \nabla_x k) = -\frac{1}{2}\, x \circ g(k,k) = -\frac{1}{2}\, x \circ z^2,$$
thus $\nabla_k k = g^{-1} df$ with $f \equiv -\frac{1}{2} z^2$. Since $u = z^{-1} k$ and $k \circ z = 0$, we find
$$\nabla_u u = \frac{1}{z} \nabla_k \frac{k}{z} = \frac{1}{z^2} \nabla_k k + \frac{k}{z} \left( k \circ \frac{1}{z} \right) = \frac{1}{z^2} \nabla_k k = -g^{-1} \frac{1}{z^2}\, d\!\left( \frac{1}{2} z^2 \right) = -g^{-1} d \ln z.$$
Thus the gravitational potential is
$$\Phi = \ln z = \frac{1}{2} \ln g(k,k), \qquad \nabla_u u = -g^{-1} d\Phi.$$
(Note the minus sign in the above formula: The 4-acceleration of a freely falling body in a stationary frame is $-\nabla_u u = g^{-1} d\Phi$. Its spatial components are $a^i = g^{ij} \partial_j \Phi \approx -\partial_i \Phi$ because of the negative sign of the spatial part of the metric, so the 3-acceleration is given by the Newtonian formula $\vec a = -\vec\nabla \Phi$.) This general expression agrees with the result (3.3) derived above for the Schwarzschild spacetime with the Killing vector $k \equiv \partial_t$,
$$\Phi = \frac{1}{2} \ln g(\partial_t, \partial_t) = \frac{1}{2} \ln\left( 1 - \frac{2m}{r} \right).$$
According to our (so far) heuristic picture of an asymptotically flat spacetime as something generated by a well-isolated clump of gravitating matter, we expect that at large distances the acceleration of stationary observers goes to zero, thus the gravitational potential $\Phi$ becomes constant. Since $\Phi = \ln z$, this is equivalent to assuming that the redshift $z$ becomes constant at infinite distances. Then it is convenient to multiply the Killing vector $k$ by a constant to achieve $z \to 1$ and $\Phi \to 0$ at infinity. After this rescaling, the redshift function $z$ describes the redshift of lightrays with respect to observers at infinite distances.

Remark: A non-asymptotically flat spacetime may involve unbounded redshifts at infinity. An example is the de Sitter spacetime with the metric (1.37), which possesses a conformal Killing vector $e^{Ht}\partial_t$ (see Example 3.1.3.2). The redshift function is $z = e^{Ht}$ and grows unboundedly along any lightray.
Note that the potential $\Phi$ is stationary:
$$k \circ \Phi = -g(k, \nabla_u u) = 0.$$
On physical grounds, we expect that the 3-acceleration of a freely falling mass points towards the center of gravity, at least when the test mass is outside of the clump of gravitating matter. Thus, the gravitational potential decreases away from infinity, and outside of the matter sources we have
$$\Phi < 0, \qquad z < 1.$$

3.1.5 Energy

In a stationary spacetime, the Killing vector $k$ generates time translations along the stationary worldlines. The local geometry of the spacetime is invariant under these time translations: for instance, the scalar product of connecting vectors remains constant. The existence of such time translations, in turn, leads to a conserved quantity, interpreted as the total energy of the spacetime.
We know that the distortion $B^{(k)}$ of a Killing vector $k$ is a 2-form,
$$B^{(k)}(x,y) \equiv g(\nabla_x k, y) = -g(\nabla_y k, x).$$
In the index notation, we have
$$B^{(k)}_{\mu\nu} = \nabla_\mu k_\nu.$$
The divergence of this 2-form is a 1-form $e$,
$$e_\nu \equiv \nabla^\mu B^{(k)}_{\mu\nu} = \nabla^\mu \nabla_\mu k_\nu \equiv \Box k_\nu; \qquad e = \Box k.$$
Written in the index-free notation, the expression for $e$ is significantly more cumbersome,
$$e \circ z \equiv \mathrm{Tr}_{(x,y)} \left( \nabla_x B^{(k)} \right)(y,z) = \mathrm{Tr}_{(x,y)} \left[ x \circ g(\nabla_y k, z) - g(\nabla_{\nabla_x y} k, z) - g(\nabla_y k, \nabla_x z) \right].$$
So we shall use the index notation in the following calculations. The 1-form $e$ is divergence-free,
$$\nabla^\nu e_\nu = \nabla^\nu \nabla^\mu B^{(k)}_{\mu\nu} = 0,$$
therefore it defines a conserved vector current $\hat e \equiv g^{-1} e$. (The fact that $e$ is divergence-free can also be understood as a consequence of the identity $d\,d\,{*B^{(k)}} = 0$, where $*$ is the Hodge star operation. Thus the integral of the 3-form $d\,{*B^{(k)}}$ over a 3-volume can be expressed through an integral of ${*B^{(k)}}$ over a closed 2-surface.) The flux of $\hat e$ through a (spacelike) 3-volume $V$ with a (timelike) normal vector $v$ is constant, up to boundary terms. Assuming that the 3-surface $V$ extends to infinity and that the current $\hat e$ vanishes sufficiently quickly at infinity so that all the boundary terms vanish, we find that the quantity
$$\mathcal{E} \equiv \int_V (e \circ v)\, d^3V \equiv \int_V e_\mu v^\mu\, d^3V = \int_V (\Box k_\mu)\, v^\mu\, d^3V$$
is independent of the choice of the 3-volume $V$. The quantity $\mathcal{E}$ is the conserved charge that corresponds to the conserved current $\hat e$. Alternatively, $\mathcal{E}$ can be written as an integral over a closed spacelike 2-surface $\Sigma$ which is the boundary of $V$,
$$\mathcal{E} = \oint_\Sigma (\nabla_\mu k_\nu)\, v^\mu n^\nu\, d^2\Sigma, \qquad (3.6)$$
where the vectors $v, n$ are normal to the 2-surface $\Sigma$. The 2-surface $\Sigma$ lies outside of any matter sources and encircles the entire matter distribution.
We shall now express $\mathcal{E}$ through the matter distribution using the Einstein equation. It is again more convenient to work in the index notation. We have seen in Statement 1.8.4.1 that
$$\Box k^\alpha = -R^\alpha{}_\beta k^\beta.$$
Therefore
$$e_\mu = \Box k_\mu = -g_{\mu\alpha} R^\alpha{}_\beta k^\beta = -R_{\mu\beta} k^\beta,$$
$$e \circ x = -\mathrm{Ric}(x,k),$$
$$\mathcal{E} = -\int_V \mathrm{Ric}(v,k)\, d^3V = -\int_V R_{\mu\nu} v^\mu k^\nu\, d^3V.$$
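
The surface form (3.6) of the conserved charge can be illustrated numerically. The sketch below (Python, not part of the original notes) makes several assumptions that should be checked against the conventions of the text: the Schwarzschild redshift factor $z^2 = 1 - 2m/r$, the normal vectors $v = z^{-1}k$ and $n = z\partial_r$, so that the integrand of (3.6) is $g(\nabla_k k, \partial_r) = -\frac{1}{2}\partial_r z^2$, and the normalization $E = -\mathcal{E}/(4\pi)$ that matches the Newtonian limit. The function names and the sample values of $m$ and $R$ are hypothetical.

```python
import math

def z_squared(r, m):
    """Redshift factor squared, z^2 = g(k, k) = 1 - 2m/r for k = d/dt."""
    return 1.0 - 2.0 * m / r

def radial_component(r, m, h=1e-4):
    """Integrand g(nabla_k k, d_r) = -(1/2) d(z^2)/dr, by central difference."""
    return -0.5 * (z_squared(r + h, m) - z_squared(r - h, m)) / (2.0 * h)

def total_energy(R, m, n_theta=100):
    """E = -(1/(4 pi)) * integral of g(nabla_k k, d_r) over the sphere r = R.

    The integrand does not depend on the angles, so the phi integral gives
    a factor 2*pi and the theta integral is done with the midpoint rule
    using the area element R^2 sin(theta) dtheta dphi.
    """
    d_theta = math.pi / n_theta
    integrand = radial_component(R, m)
    surface_integral = 0.0
    for i in range(n_theta):
        theta = (i + 0.5) * d_theta
        surface_integral += integrand * R**2 * math.sin(theta) * d_theta
    surface_integral *= 2.0 * math.pi
    return -surface_integral / (4.0 * math.pi)

# The result should be close to m for any radius R > 2m.
for R in (10.0, 50.0, 500.0):
    print(R, total_energy(R, m=1.5))
```

Under these assumptions the value does not depend on the radius of the sphere, as expected for a surface integral taken outside of all matter sources.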

We shall now show that the conserved charge $\mathcal{E}$ is proportional to the total energy $E$ of the gravitating system in the Newtonian limit, where the gravitating matter is an isolated, stationary distribution of dust with density $\rho$. The energy of such a system is
$$E = \int_V \rho\, d^3V,$$
and the energy-momentum tensor is
$$T_{\mu\nu} = \rho\, v_\mu v_\nu,$$
where $v \approx k$ are the stationary worldlines of matter particles. By the Einstein equation (1.68), we have
$$R_{\mu\nu} = 8\pi \left( T_{\mu\nu} - \frac{1}{2} T g_{\mu\nu} \right),$$
therefore
$$R_{\mu\nu} v^\mu k^\nu = 4\pi\rho\, v_\mu v_\nu v^\mu k^\nu \approx 4\pi\rho$$
and
$$\mathcal{E} = -\int_V R_{\mu\nu} v^\mu k^\nu\, d^3V \approx -4\pi \int_V \rho\, d^3V = -4\pi E.$$
Thus the total energy of the system is, in the Newtonian approximation,
$$E \approx -\frac{1}{4\pi}\, \mathcal{E}.$$
We are therefore motivated to define the total energy contained in a stationary spacetime by
$$E = -\frac{1}{4\pi}\, \mathcal{E} = -\frac{1}{4\pi} \int_V (\Box k_\mu)\, v^\mu\, d^3V.$$

Calculation: Compute the total energy of the Schwarzschild spacetime with the metric (1.36). The Killing vector is $k \equiv \partial_t$.
Solution: It is convenient to use Eq. (3.6) and integrate over a 2-sphere of a large radius $R$. Choosing the normal vectors $v = z^{-1} k$ and $n = z\partial_r$, we have
$$E = -\frac{1}{4\pi} \oint g(\nabla_v k, n)\, d^2\Sigma = -\frac{1}{4\pi} \oint g(\nabla_k k, \partial_r)\, d^2\Sigma.$$
Since (see Sec. 3.1.4)
$$z = z(r) = \sqrt{1 - \frac{2m}{r}},$$
$$g(\nabla_k k, \partial_r) = -\frac{1}{2}\, \partial_r z^2 = -\frac{m}{r^2},$$
we find
$$E = \frac{1}{4\pi} \oint \frac{m}{r^2}\, d^2\Sigma = m.$$
Thus the total energy of the Schwarzschild spacetime is equal to the mass $m$ of the source of gravity.

Remark: The total momentum and the total angular momentum of an isolated system are so far undefined. (There is no preferred reference frame with respect to which one would measure the total momentum or angular momentum.) A satisfactory definition of these quantities requires more information than the existence of a timelike Killing vector. For instance, symmetry with respect to rotations around an axis leads to conservation of the corresponding component of the angular momentum. In a curved spacetime, there are no preferred axes, so instead the rotational symmetry is formulated as the existence of a spacelike Killing vector with closed orbits (these orbits play the role of circles of constant $r, \theta$ around the $z$ axis). In a stationary spacetime there exists also a timelike Killing vector $k$. If these two Killing vectors commute, the spacetime is called azimuthally symmetric. In such a spacetime, the total angular momentum is well-defined and conserved. In the Newtonian limit, this quantity will correspond to the total angular momentum with respect to the $z$ axis.

3.2 Conformal infinity

In the previous section, we considered a stationary spacetime where the total energy of an isolated system can be defined through an integral over an infinitely remote 2-surface [see Eq. (3.6)]. It appears that such a definition only requires that a spacetime be asymptotically stationary, that is, "almost" stationary sufficiently far from the matter sources. In fact, it would be convenient if we could evaluate the 2-surface integral "at infinity" where the spacetime is "exactly" stationary (in a heuristic sense). The notion of conformal infinity allows one to make these ideas precise. We shall approach this notion by considering the elementary example of a flat Minkowski spacetime.

3.2.1 Conformal infinity for Minkowski spacetime

We shall be able to evaluate quantities "at infinity" if we extend the spacetime manifold $M$ by adding "points at infinity" to it. A convenient way to produce such an extended spacetime manifold is by using a conformal transformation of the metric, $g \to \tilde g = e^{2\lambda} g$, such that the conformal factor $e^{2\lambda}$ vanishes at infinity. Then the infinitely remote points can be located at finite distances according to the new metric $\tilde g$. Of course, the new metric $\tilde g$ is unphysical since it does not describe actual distances or time intervals in the spacetime. However, the metric $\tilde g$ is related to the physical metric $g$ by a conformal transformation, which preserves important causal properties of the spacetime. Here we describe such a conformal transformation for the Minkowski metric; later, we will consider more general spacetimes.
Consider the lightcone coordinates $\{u, v, \theta, \phi\}$ in the Minkowski spacetime, which are defined through the standard spherical coordinates $\{t, r, \theta, \phi\}$ via
$$u \equiv t + r, \qquad v \equiv t - r.$$
The radial lightrays are then lines of constant $u$ or $v$. The metric has the form
$$g = du\, dv - \frac{(u-v)^2}{4}\, dS^2, \qquad dS^2 \equiv d\theta^2 + \sin^2\theta\, d\phi^2.$$
A conformal transformation of the metric,
$$g \to \tilde g = e^{2\lambda(u,v)} g, \qquad (3.7)$$
where $\lambda(u,v)$ is an arbitrary function of $u$ and $v$ (but not of $\theta, \phi$), leaves the radial lightrays invariant (see Statement 3.2.1.1). The transformed (unphysical) metric can be written as
$$\tilde g = d\tilde u\, d\tilde v - e^{2\lambda} \frac{(u-v)^2}{4}\, dS^2 \equiv d\tilde u\, d\tilde v - \tilde r^2\, dS^2,$$
where $\tilde u, \tilde v$ are new lightcone coordinates,
$$\tilde u = f_1(u), \qquad \tilde v = f_2(v).$$
For simplicity, let us choose $f_1(s) = f_2(s) = f(s)$. (Choosing more complicated functions for $\tilde u$ and $\tilde v$ is certainly admissible but will make our work harder.) Then we have
$$e^{2\lambda} = f'(u) f'(v); \qquad \tilde r^2 = f'(u) f'(v)\, \frac{(u-v)^2}{4}.$$
It is obvious that the lines of constant $u$ are at the same time lines of constant $\tilde u$. Thus, the radial lines of constant $u$ are null generators of a 3-surface $\tilde u = {\rm const}$, which are always geodesics (see Sec. 2.2.8). In this case, the function $\lambda(u,v)$ has a particularly simple form; the following statement shows that radial lightrays remain geodesics in the metric $\tilde g = e^{2\lambda(u,v)} g$ even with an arbitrary function $\lambda(u,v)$.

Statement 3.2.1.1: (a) In a two-dimensional spacetime, any null curve is a geodesic curve. (b) The radial null geodesics (lightrays) are invariant under a conformal transformation of the form (3.7), where $g$ is a flat metric. (Proof on page 171.) ∎
The next important step is the following: the function $f$ can be chosen such that the new lightcone coordinates $\tilde u, \tilde v$ have a finite range. There are many possible choices of $f$; for example,
$$f(s) = \frac{2}{\pi} \arctan s; \qquad f(s) = \tanh s; \qquad f(s) = \mathrm{erf}\, s;$$
etc. With any of the above choices, $f(s)$ tends to $\pm 1$ as $s \to \pm\infty$. So at large values of $u, v$ the unphysical metric $\tilde g$ is flat on the sections of constant angles $\theta, \phi$ and describes the points $\tilde u, \tilde v = \pm 1$ as being at finite (unphysical) distances, although physically these points correspond to infinitely remote events $u, v = \pm\infty$. For convenience, one can introduce the coordinates $\tilde t, \tilde r, \theta, \phi$ by defining
$$\tilde u \equiv \tilde t + \tilde r, \qquad \tilde v \equiv \tilde t - \tilde r.$$
The $\tilde t, \tilde r$ (time-radial) section of the spacetime (with the angular coordinates omitted) is sketched in Fig. 3.1.
As we described above, the unphysical manifold $\tilde M$ is constructed from the original spacetime manifold $M$ by changing the metric from $g$ to $\tilde g$ and adding the infinitely far away domains $\tilde u = \pm 1$, $\tilde v = \pm 1$. Thus $\tilde M$ is a manifold with boundary. The boundary of $\tilde M$ consists of null lines (future null infinity and past null infinity), the spacelike infinity domain $i^0$, and timelike future infinity/past infinity points $i^\pm$. Note that the conformal factor $e^{2\lambda}$ vanishes on the boundary; this is what brings infinity to finite distances in the unphysical manifold.
To see that the choice of the function $f(s)$ is not entirely arbitrary, consider the spacelike infinity domain $i^0$ corresponding to $r \to \infty$ ($u \to \infty$, $v \to -\infty$). This domain is formally a 2-sphere of radius
$$\tilde r^2(i^0) = \lim_{r \to \infty} e^{2\lambda} r^2 = \lim_{u \to \infty,\ v \to -\infty} f'(u) f'(v)\, \frac{(u-v)^2}{4}.$$
Since $f(s) \to \pm 1$ as $s \to \pm\infty$, the derivative $f'(s)$ must go to zero at large $|s|$. Suppose that $f'(s)$ tends to zero as $|s|^{-n}$ for $s \to \pm\infty$; then the limit $\tilde r^2(i^0)$ depends on the value of $n$. Since $u \approx r$ and $v \approx -r$ at fixed $t$, the product $f'(u) f'(v) (u-v)^2$ scales as $r^{2-2n}$. If $n > 1$ then the limit is zero, meaning that the domain of a spacelike infinity is a sphere of zero radius, i.e. a point. In the opposite case, $n < 1$, the limit is infinite, which means that the metric $\tilde g$ diverges (is not regular) at the point $i^0$. (For $n \le 1$ the function $f(s)$ would not even approach a finite limit.) Thus we are motivated to consider only the regular case $n > 1$.

[Figure 3.1: The time-radial section of the unphysical manifold $\tilde M$ for the Minkowski spacetime. The coordinates $\tilde t, \tilde x$ have finite ranges (from $-1$ to $1$). The boundary of the diagram represents points at infinity.]

We may restrict the choice of the function $f(s)$ further. Let us consider the limit of the metric $\tilde g$ along a fixed lightray, say $u = u_0$, $v \to -\infty$. The coefficient $\tilde r^2$ of the metric is then
$$\tilde r^2 = \lim_{v \to -\infty} f'(u_0) f'(v)\, \frac{(u_0 - v)^2}{4}.$$
This limit is equal to zero when $n > 2$ but is finite and nonzero when $n = 2$. For $f'(s) \approx f_0 s^{-2}$, we find
$$\tilde r^2(\mathcal{I}^+) = \frac{1}{4} f_0 f'(u_0) > 0.$$
We expect that the structure of the null infinity should reflect the fact that null rays can be emitted in all directions spanning a 2-sphere. Thus, the null infinity at a fixed value of $\tilde u = f(u_0)$ should have a metric structure of a 2-sphere with a nonzero radius. This is the case only if $f'(s) \propto s^{-2}$ for $s \to \pm\infty$. Hence, we shall only consider these choices of $f(s)$. Then the future null infinity, denoted $\mathcal{I}^+$ ("scri-plus"), has the topological structure $\mathbb{R} \times S^2$ (a 2-sphere for each value of $\tilde u$), while a spacelike infinity is represented by a single point $i^0$.
The asymptotic form of the unphysical metric near a point ($u = u_0$, $v = -\infty$) on $\mathcal{I}^+$ is
$$\tilde g \approx d\tilde u\, d\tilde v - \frac{f_0 f'(u_0)}{4}\, dS^2,$$
while the conformal factor $\Omega$ (with $\Omega^2 = e^{2\lambda}$) near $\mathcal{I}^+$ is
$$\Omega \approx \frac{1}{f_0}\, (1 - \tilde u)(1 + \tilde v).$$
Therefore, $\Omega \propto r^{-1}$ at large distances. From the above expression for $\Omega$, it is also easy to see that the null infinity $\mathcal{I}^+$ is a null surface: The normal vector $n \equiv \tilde g^{-1} d\Omega$ is null on $\mathcal{I}^+$ (but not away from it!) since
$$\tilde g(n,n) = \tilde g^{-1}(d\Omega, d\Omega) \propto (1 - \tilde u)(1 + \tilde v) = 0 \ \text{on}\ \mathcal{I}^+.$$
Note that there still remains a considerable freedom in choosing the conformal transformations. However, within the
present scheme every lightray will always be represented in a diagram by a straight line drawn at 45° angles, and the null future boundary $\mathcal{I}^+$ will be a null surface.

Calculation: Show that the Riemann tensor has only one independent component in a 2-dimensional spacetime. Then use the following expression for the curvature scalar in the unphysical metric,
$$\tilde R = \Omega^{-2} R + 6\, \Omega^{-3}\, \Box\Omega, \qquad (3.8)$$
to show that any 2-dimensional metric can be transformed into a flat metric by a choice of a conformal factor $\Omega^2$. (The above equation is derived e.g. in [36] and [9]. See also Statement 1.8.4.1 on page 47.)
Solution: The Riemann tensor $R(a,b,c,d)$ is a symmetric bilinear function of $a \wedge b$ and $c \wedge d$; for a 2-dimensional vector space with a basis $\{e_1, e_2\}$, any exterior product $x \wedge y$ is proportional to the basis product $e_1 \wedge e_2$, so the space of exterior products $x \wedge y$ is one-dimensional. Thus $R(a,b,c,d)$ is effectively a symmetric bilinear function in one-dimensional space. Such a function is fully specified by one coefficient, say $R(e_1, e_2, e_1, e_2)$, or equivalently by the curvature scalar $R = 2R(e_1, e_2, e_1, e_2)$. It is clear from the given relation that the curvature scalar can be made to vanish by a choice of the function $\Omega$. Then the unphysical spacetime will have an identically vanishing Riemann tensor, i.e. it will be flat.

3.2.2 Conformal diagrams

In the previous section we explored the construction of a conformal infinity for Minkowski spacetime. The result was an unphysical 4-dimensional manifold $\tilde M$ containing all of $M$ and additionally some "infinitely remote" points. The metric on $\tilde M$ was unphysical since it did not represent actual distances. However, this unphysical manifold allows one to evaluate quantities "at infinity" and hence is useful for the analysis of asymptotic behavior of fields in the real spacetime.
Another important application of the unphysical manifold $\tilde M$ is as a tool to help visualize the causal structure of the physical spacetime $M$. Such a visualization is possible if one considers a 1+1-dimensional slice of the physical spacetime. This slice is a 1+1-dimensional spacetime $M^{(1+1)}$ having a certain reduced metric. This metric can be transformed into a flat Minkowski metric by a conformal transformation. Then, a coordinate transformation can be performed such that the manifold $M^{(1+1)}$ is mapped into a 1+1-dimensional unphysical manifold $\tilde M$ having a finite extent in all directions. The resulting manifold $\tilde M$ can be simply drawn as a figure on a plane. (Implicitly, the plane carries the unphysical flat Minkowski metric; in particular, straight lines drawn at 45° angles are null geodesics.) An example of such a drawing is Fig. 3.1. Figures of this kind are called conformal diagrams

Remark: There is no analog of conformal diagrams for general 3+1-dimensional spacetimes. The reason is that a general spacetime is not conformally flat and cannot be mapped into a region of Minkowski spacetime by a conformal transformation. For instance, even if we demand that the spatial sections of the spacetime be flat and assume a metric of the form
$$g = dt \otimes dt - a^2(t, x_1, x_2, x_3) \sum_{i=1}^{3} dx_i \otimes dx_i,$$
the spacetime will be conformally flat only if $a(t, x_1, x_2, x_3)$ has the special form (see footnote 4)
$$a(t, x_1, x_2, x_3) = \frac{1}{\alpha(t) + \beta(t) \sum_{i=1}^{3} \left( x_i - \xi_i(t) \right)^2},$$
where $\xi_i, \alpha, \beta$ are arbitrary functions of time. Nevertheless, in many cases a 3+1-dimensional spacetime can be adequately represented by a suitable 1+1-dimensional slice, at least for the purpose of qualitative illustration. For instance, a spherically symmetric spacetime is visualized using the time-radial half-plane $\{t, r\}$, $r \ge 0$, where each point stands for a 2-sphere of radius $r$. A conformal diagram is then drawn for the reduced 1+1-dimensional spacetime.
The reduced conformal diagram is meaningful only if the null geodesics in the 1+1-dimensional section are also geodesics in the physical 3+1-dimensional spacetime. In this case, the 1+1-dimensional section can be visualized as the set of events accessible to an observer who sends and receives signals only along a fixed spatial direction. Then a conformal diagram provides information about the causal structure of spacetime along this line of sight. ∎
After recapitulating the standard construction of conformal diagrams, I develop an easier method.

Standard procedure

A conformal diagram is defined for a 1+1-dimensional spacetime with a given line element $g_{ab}\, dx^a dx^b$. The coordinates $\{x^a\}$ (where $a = 0, 1$) must cover the entire spacetime, and several sets of overlapping coordinate patches may be used if necessary. The standard construction of the conformal diagram may be formulated as follows (see e.g. [?], chapter 3). One first finds a change of coordinates $x \to \tilde x$ such that the new variables $\tilde x^a$ have a finite range of variation; the components of the metric change according to $g_{ab}(x)\, dx^a dx^b = \tilde g_{ab}(\tilde x)\, d\tilde x^a d\tilde x^b$. One then chooses a conformal transformation of the metric (in the new coordinates),
$$\tilde g_{ab} \to \gamma_{ab}(\tilde x) = \Omega^2(\tilde x)\, \tilde g_{ab}, \qquad \Omega(\tilde x) \neq 0, \qquad (3.9)$$
such that the new metric $\gamma_{ab}$ is flat, i.e. has zero curvature. A suitable function $\Omega(\tilde x)$ always exists because all two-dimensional metrics are conformally flat. The new metric $\gamma_{ab}$ describes an unphysical, auxiliary flat spacetime. Since the metric $\gamma_{ab}$ is flat, a further change of coordinates $\tilde x \to \hat x$ (the
(also Carter-Penrose diagrams). We now explore conformal new variables x again having a finite extent) can be found to
diagrams in more detail.3 transform ab explicitly into the Minkowski metric ab ,
A conformal diagram provides a visualization of the causal
ab (
x) dxa dxb = ab dx a dx
b , ab diag (1, 1) . (3.10)
structure of a spacetime because the null geodesics remain
straight lines after the conformal transformation, and because For brevity we incorporate all the required coordinate changes
the entire spacetime is drawn as a finite figure in a plane. For into one, x x (x), and summarize the transformation of the
instance, it is easy to see the entire domain that can receive metric as
signals from a given spacetime point. 2 (x) g dxa dxb = d xa d
xb . (3.11)
ab ab
3 This and the following sections are adapted from the paper [38]. 4 The author is grateful to Matthew Parry for figuring this out.
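As a hedged illustration of the three steps summarized in Eq. (3.11), the following worked sketch applies them to the de Sitter metric with flat spatial sections, which appears later in this section as Eq. (3.19). The conformal time $T$ and the tanh compactification are choices made here for the example (they are assumptions, not the only possible ones); the signature convention $\eta_{ab} = \mathrm{diag}(1,-1)$ is that of Eq. (3.10).

```latex
\begin{align*}
  % physical metric (cf. Eq. (3.19))
  g_{ab}\,dx^a dx^b &= dt^2 - e^{2Ht}\,dx^2 , \\
  % step 1: conformal time T (a choice made for this example), with -\infty < T < 0
  T \equiv -\frac{e^{-Ht}}{H}, \quad dT = e^{-Ht}\,dt
  \quad&\Longrightarrow\quad
  g_{ab}\,dx^a dx^b = \frac{dT^2 - dx^2}{H^2 T^2} , \\
  % step 2: lightcone coordinates and a compactifying map (tanh is one admissible choice)
  u \equiv T - x, \quad v \equiv T + x, \quad
  \tilde u = \tanh u, \quad \tilde v = \tanh v
  \quad&\Longrightarrow\quad
  d\tilde u\, d\tilde v = (1-\tilde u^2)(1-\tilde v^2)\,(dT^2 - dx^2) , \\
  % step 3: total conformal factor, giving the form of Eq. (3.11)
  \Omega^2 \equiv H^2 T^2\,(1-\tilde u^2)(1-\tilde v^2)
  \quad&\Longrightarrow\quad
  \Omega^2\, g_{ab}\,dx^a dx^b = d\tilde u\, d\tilde v .
\end{align*}
```

Since $T < 0$ is equivalent to $u + v < 0$, and tanh is odd and strictly increasing, the image satisfies $\tilde u + \tilde v < 0$ together with $|\tilde u| < 1$, $|\tilde v| < 1$. With these choices the image is therefore the lower half of a diamond in the $(\tilde u, \tilde v)$ plane, and its upper edge $\tilde u + \tilde v = 0$ is a horizontal line.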

3.2 Conformal infinity

Thus the new coordinates $\tilde x^a$ map the initial spacetime onto a finite domain within a 1+1-dimensional Minkowski plane. This finite domain is a conformal diagram of the initial (physical) 1+1-dimensional spacetime. The diagram is drawn on a sheet of paper which implicitly carries the fiducial Minkowski metric $\eta_{ab}$, the vertical axis usually representing the timelike coordinate $\tilde x^0$.

The coordinate and conformal transformations severely distort the geometry of the spacetime since they bring infinite spacetime points to finite distances in the diagram plane. So one cannot expect in general that straight lines in the diagram correspond to geodesics in the physical spacetime. However, it is well known that straight lines drawn at 45° angles in a conformal diagram represent null geodesics in the physical spacetime. This follows from the fact that any null trajectory $x^a(\tau)$ in 1+1 dimensions, i.e. any solution of

$$ g_{ab}\, \dot x^a \dot x^b = 0, \quad \dot x^a \equiv \frac{dx^a}{d\tau}, \qquad (3.12) $$

is necessarily a geodesic (this is not true in higher dimensions), and Eq. (3.12) is invariant under conformal transformations of the metric. By drawing lightrays emitted from various points in the diagram, one can illustrate the causal structure of the spacetime.

A textbook example is the conformal diagram for the flat Minkowski spacetime with the metric $g_{ab} \equiv \eta_{ab}$. Calculations are conveniently done in the lightcone coordinates

$$ u \equiv x^0 - x^1, \quad v \equiv x^0 + x^1, \quad \eta_{ab}\, dx^a dx^b = du\, dv, \qquad (3.13) $$

and a suitable coordinate transformation is

$$ \tilde u = \tanh u, \quad \tilde v = \tanh v, \quad du\, dv = \frac{d\tilde u\, d\tilde v}{(1 - \tilde u^2)(1 - \tilde v^2)}. \qquad (3.14) $$

The new coordinates $\tilde u, \tilde v$ extend from $-1$ to $1$. Multiplying the metric by the conformal factor $\Omega^2(\tilde u, \tilde v) \equiv (1 - \tilde u^2)(1 - \tilde v^2)$, we obtain the fiducial spacetime,

$$ \Omega^2\, du\, dv = d\tilde u\, d\tilde v = \eta_{ab}\, d\tilde x^a d\tilde x^b, \qquad (3.15) $$

$$ \tilde u \equiv \tilde x^0 - \tilde x^1, \quad \tilde v \equiv \tilde x^0 + \tilde x^1. \qquad (3.16) $$

The new coordinates $\tilde x^a$ have a finite extent, namely $|\tilde x^0 \pm \tilde x^1| < 1$, and the resulting diagram has a diamond shape shown in Fig. 3.2. To appreciate the distortion of the spacetime geometry, we can draw the worldline of an inertial observer moving with a constant velocity. Note that the angle at which this trajectory enters the endpoints depends on the chosen conformal transformation and thus cannot serve as an indication of the observer's velocity.

Figure 3.2: A conformal diagram of the 1+1-dimensional Minkowski spacetime. Dashed lines show lightrays emitted from points A, B. The curved line is the trajectory of an inertial observer moving with a constant velocity, $x^1 = 0.3\,x^0$.

There is no analog of conformal diagrams for general 3+1-dimensional spacetimes. Nevertheless, in many cases a 3+1-dimensional spacetime can be adequately represented by a suitable 1+1-dimensional slice, at least for the purpose of qualitative illustration. For instance, a spherically symmetric spacetime is visualized as the $(t, r)$ half-plane ($r \geq 0$) where each point stands for a 2-sphere of radius $r$. A conformal diagram is then drawn for the reduced 1+1-dimensional spacetime.

The reduced conformal diagram is meaningful only if the null geodesics in the 1+1-dimensional section are also geodesics in the physical 3+1-dimensional spacetime. In this case, the 1+1-dimensional section can be visualized as the set of events accessible to an observer who sends and receives signals only along a fixed spatial direction. Then a conformal diagram provides information about the causal structure of spacetime along this line of sight.

Method of lightrays

The standard construction of conformal diagrams involves an explicit transformation of the metric to new coordinates that have a finite extent, and usually a further transformation to bring the metric to a manifestly conformally flat form. Finding these transformations requires a certain ingenuity. If the spacetime manifold is covered by several coordinate patches, a different transformation must be used in each patch. However, a conformal diagram typically consists of just a few lines and one would expect that the required computations should not be so cumbersome.

I now describe a method of drawing conformal diagrams that avoids the need for performing explicit transformations of the metric. The method is based on a qualitative analysis of intersections of lightrays. This approach is particularly suitable for the analysis of stochastic spacetimes encountered in models of eternal inflation. Such spacetimes have no symmetries and their metric is not known in closed form, so one cannot apply the standard construction of conformal diagrams.

Another motivation for the new method is the apparent redundancy involved in the standard method. It is clear that the transformations used in the standard construction are not unique. For instance, one may replace the lightcone coordinates in Eq. (3.14) by

$$ \tilde u \to f(\tilde u), \quad \tilde v \to g(\tilde v), \qquad (3.17) $$

where $f, g$ are arbitrary monotonic, bounded, and continuous functions. The shape of the diagram will vary with each possible choice of the transformations, but all resulting diagrams are equivalent in the sense that they contain the same information about the causal structure of the spacetime. One thus expects to be able to extract this information without involving specific explicit transformations of the coordinates and the metric.

The crucial observation is that this information is unambiguously represented by the geometry and topology of lightrays and their intersections. I shall now develop this idea into a self-contained approach to drawing conformal diagrams that does not involve explicit transformations.
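The Minkowski construction of Eqs. (3.13)-(3.16) can be checked numerically. The following is a minimal Python sketch, not part of the original notes: the function names `compactify`, `in_diamond`, and `arctan_map` are hypothetical and introduced only for illustration. It maps points of the physical plane into the diagram and also accepts an alternative bounded monotonic map, as an example of the freedom in the choice of the compactifying transformation.

```python
import math

def compactify(x0, x1, f=math.tanh):
    """Map a Minkowski point (x0, x1) to diagram coordinates.

    f is the bounded monotonic map applied to each lightcone coordinate;
    math.tanh reproduces Eq. (3.14). Other choices of f are assumptions
    made here to illustrate that the transformation is not unique.
    """
    u, v = x0 - x1, x0 + x1              # lightcone coordinates, Eq. (3.13)
    ut, vt = f(u), f(v)                  # compactified lightcone coordinates
    return (ut + vt) / 2.0, (vt - ut) / 2.0   # invert Eq. (3.16)

def in_diamond(xt0, xt1):
    """True if the diagram point satisfies |xt0 +/- xt1| < 1."""
    return abs(xt0 - xt1) < 1.0 and abs(xt0 + xt1) < 1.0

def arctan_map(s):
    """An alternative monotonic bounded map onto (-1, 1) (hypothetical choice)."""
    return 2.0 * math.atan(s) / math.pi

# Worldline of the inertial observer x1 = 0.3 * x0 from Fig. 3.2.
for x0 in (0.0, 1.0, 3.0, 10.0):
    xt0, xt1 = compactify(x0, 0.3 * x0)
    print(f"x0 = {x0:5.1f} -> diagram point ({xt0:.6f}, {xt1:.6f})")
```

In this sketch a lightray $u = \mathrm{const}$ is mapped to a line $\tilde x^0 - \tilde x^1 = \mathrm{const}$ for any choice of `f`, so the 45° lines of the diagram do not depend on the particular compactification. Very large coordinate values are avoided because `math.tanh` rounds to exactly 1.0 in floating point.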

3 Asymptotically flat spacetimes

New definition of conformal diagrams

A conformal diagram is a figure in the fiducial Minkowski plane satisfying certain conditions, and I first formulate a definition of a conformal diagram in terms of such conditions. A constructive procedure for drawing conformal diagrams will be presented subsequently.

A finite open domain of the plane is a conformal diagram of a given 1+1-dimensional spacetime $S$ if there exists a one-to-one correspondence between all maximally extended lightrays in $S$ and all straight line segments drawn at 45° angles within the domain of the diagram. This correspondence must be intersection-preserving, i.e. any two lightrays intersect in the physical spacetime exactly as many times as the corresponding lines intersect in the diagram. It is assumed that all null geodesics in the spacetime $S$ are either infinitely extendible or end at singularities or at explicitly introduced spacetime boundaries. Similarly, the straight line segments drawn at 45° angles in a conformal diagram must be limited only by the boundary of the diagram. Note that there are only two spatial directions in a 1+1-dimensional spacetime $S$, and that two lightrays emitted in the same direction cannot intersect.

For example, the diamond $|\tilde x^0 \pm \tilde x^1| < 1$ is a conformal diagram for the flat spacetime due to the intersection-preserving one-to-one correspondence of null lines $\tilde x^0 \pm \tilde x^1 = \mathrm{const}$ in the diagram and lightrays $x^0 \pm x^1 = \mathrm{const}$ in the physical spacetime.

For spacetimes having a nontrivial topology, appropriate topological features need to be introduced also into the fiducial Minkowski plane. At this point I do not consider such cases.

I shall now demonstrate the equivalence of the proposed definition to the standard procedure for drawing conformal diagrams. It suffices to find a conformal transformation of the form (3.11) in some neighborhood of an arbitrary (nonsingular) spacetime point. Given a diagram with an intersection-preserving correspondence of lightrays, we can introduce local lightcone coordinates $u, v$ in the diagram such that the null geodesics are locally the lines $u = \mathrm{const}$ or $v = \mathrm{const}$. By assumption, each null geodesic uniquely corresponds to a lightray in the physical spacetime. Since the correspondence is intersection-preserving, the local configuration of the null geodesics in the physical spacetime can be visualized as in Fig. 3.3. Hence the local lightcone coordinates $u, v$ become well-defined local coordinates in the physical spacetime, and again the null geodesics are the lines $u = \mathrm{const}$ or $v = \mathrm{const}$. On the other hand, these null geodesics must be solutions of Eq. (3.12), therefore

$$ g_{uu}\, \dot u^2 + 2 g_{uv}\, \dot u \dot v + g_{vv}\, \dot v^2 = 0 \quad \text{if } \dot u = 0 \text{ or } \dot v = 0. \qquad (3.18) $$

It follows that $g_{uu} = g_{vv} = 0$. Thus the metric in the local lightcone coordinates is of the form $g_{ab}\, dx^a dx^b = 2 g_{uv}(u, v)\, du\, dv$ which is explicitly conformally flat. This demonstrates the existence of a local conformal transformation bringing the physical metric $g_{ab}$ into the fiducial Minkowski metric $du\, dv = \eta_{ab}\, d\tilde x^a d\tilde x^b$ in the diagram.

Figure 3.3: The local correspondence between straight lines drawn at 45° angles in the conformal diagram (left) and lightrays in the physical spacetime (right).

Before presenting examples, I comment on the proposed definition of conformal diagrams. The definition may appear to be too broad, allowing many geometric shapes to represent the same spacetime. However, the old procedure also does not specify a particular conformal transformation of the metric and in effect admits precisely as much freedom. According to the new definition, any two different conformal diagrams of the same spacetime are equivalent in the sense that all the lightrays in those two diagrams will be in an intersection-preserving one-to-one correspondence. In the old language, there must exist a conformal transformation bringing one diagram into the other. Equivalent diagrams carry identical information about the causal structure of the physical spacetime. In the next sections I shall give examples illustrating the arbitrary and the necessary choices involved in drawing conformal diagrams.

A conformal diagram delivers information mainly through the shape of its boundary line. The boundary of a conformal diagram generally contains points representing a spacelike, timelike, or null infinity, and points belonging to explicit boundaries of the spacetime manifold (e.g. singularities) where lightrays end in the physical spacetime. The latter boundaries will be called physical boundaries to distinguish them from putative infinite boundaries whose points do not correspond to any events in the physical spacetime. (The definition of conformal diagrams contains the requirement that the diagram domain be topologically open, and so the boundary points are not supposed to belong to the diagram.) The infinite boundary is of course the most interesting feature of a diagram.

Lastly, I would like to emphasize that conformal diagrams can be drawn not only for geodesically complete spacetimes but also for artificially incomplete spacetimes, i.e. for selected subdomains of larger manifolds. In fact such artificially incomplete spacetimes are often needed in cosmological applications. Examples are a description of a collapsing star using a subdomain of the Kruskal spacetime and a description of an inflationary universe using a subdomain of the de Sitter spacetime.

Minkowski spacetime

The new definition merely lists the conditions to be satisfied by a conformal diagram. Based on these conditions, I now develop a practical procedure for drawing the diagrams, using the Minkowski spacetime as the first example.

We begin by choosing a Cauchy surface in the physical spacetime. In 1+1 dimensions, a Cauchy surface is a line $L$ such that intersections of lightrays emitted from $L$ entirely cover the part of the spacetime to the future of $L$. In the Minkowski spacetime, we may choose the line $x^0 = 0$ as the Cauchy surface $L$. The image of the Cauchy surface $L$ in the conformal diagram must be a finite curve $\tilde L$ from which lightrays can be emitted in both spatial directions. Therefore the slope of the curve $\tilde L$ must not exceed ±45° but otherwise $\tilde L$ may be drawn arbitrarily, e.g. as the curve AB in Fig. 3.4. The endpoints A, B represent a spatial infinity in the two directions.

In the Minkowski spacetime, two lightrays emitted towards each other from any two points on $L$ will eventually intersect. Therefore the domain of the conformal diagram must contain at least the triangular region ABC. On the other hand, any point outside ABC, such as the point E in Fig. 3.4, cannot belong to the diagram domain because the point E cannot be reached by any left-directed lightray emitted from $L$, and we know that all points in the Minkowski spacetime are intersection points of some lightrays. (More formally, the existence of the point E within the diagram would violate the condition that the correspondence between lightrays is intersection-preserving.) Therefore the future-directed part of the conformal diagram is bounded by the lines AC and BC.

Figure 3.4: Construction of the conformal diagram for the Minkowski spacetime. The point E cannot belong to the diagram because there are no lightrays intersecting at E.

A completely analogous consideration involving past-directed lightrays leads to the conclusion that the past-directed part of the diagram is the region ABD. Thus a possible diagram for the Minkowski spacetime is the interior of the rectangle ACBD. This diagram differs from the square-shaped diagram in Fig. 3.2 by a (finite) conformal transformation of the form (3.17).

We can also ascertain that the points C and D are the future and the past timelike infinity points. For instance, the point C is the intersection of the lines AC and BC; these lines are interpreted as putative lightrays emitted from infinitely remote points of $L$. At sufficiently late times, any inertial observer in the Minkowski spacetime will catch lightrays emitted from arbitrarily far points. The same holds for observers moving non-inertially as long as their velocity does not approach that of light. Hence all trajectories of such observers must finish at C.

Cauchy surfaces and artificial boundaries

Drawing Cauchy surfaces is a convenient starting point in the construction of conformal diagrams.

A Cauchy surface for a domain of spacetime is a 3-surface $S$ such that every timelike curve (without endpoints) within the region intersects $S$ exactly once. (It follows that a Cauchy surface must be spacelike.)

It is clear that in any 1+1-dimensional spacetime a sufficiently small neighborhood of a Cauchy surface has the same causal properties as the line $x^0 = 0$ in the Minkowski plane: namely, two nearby lightrays emitted toward each other will cross, while rays emitted from a point in opposite directions will diverge. Therefore the line $\tilde L$ representing a Cauchy surface in a conformal diagram must have a slope between −45° and 45° (see Fig. 3.5, left). We shall call such lines horizontally-directed. Other than this, there are no restrictions on the shape of the line $\tilde L$ and it may be drawn as an arbitrary horizontally-directed curve. (In some spacetimes, one needs to use several disconnected Cauchy surfaces, but I shall not consider such cases here.)

Another frequently occurring feature in conformal diagrams is a timelike boundary. For example, a spherically symmetric 3+1-dimensional spacetime is usually reduced to the 1+1-dimensional $(r, t)$ plane, where $0 < r < +\infty$. From the 1+1-dimensional point of view, the line $r = 0$ is an artificially introduced timelike boundary that can absorb and emit lightrays. The local geometry of lightrays near $r = 0$ is shown in Fig. 3.5 (right). It is clear from the figure that in a conformal diagram the timelike boundary must be represented by a line with a slope between 45° and 135°. Let us call such lines vertically-directed.

Figure 3.5: A Cauchy line must have a slope between −45° and 45° (left). A timelike boundary must be a curve with a slope between 45° and 135° (right).

As an example of using timelike boundaries, let us consider the subdomain ($t_0 < t < \infty$, $x_1 < x < x_2$) of a de Sitter spacetime with flat spatial sections, described by the metric

$$ g_{ab}\, dx^a dx^b = dt^2 - e^{2Ht}\, dx^2. \qquad (3.19) $$

This subdomain can be visualized as the future of a selected initial comoving region. The Cauchy line $t = t_0$ is connected to the two timelike boundary lines, $x = x_{1,2}$. In the coordinate system (3.19), the null geodesics are solutions of $dx/dt = \pm e^{-Ht}$ and it is easy to see that a lightray emitted at $x = 0$, $t = 0$ only reaches the values $|x| < H^{-1}$ (the limit value is the de Sitter horizon). Null geodesics emitted from the Cauchy surface and from the boundary lines are sketched in Fig. 3.6 where it is assumed that the comoving domain $x_1 < x < x_2$ contains several de Sitter horizons. A lightray emitted in the positive direction, such as the ray A, intersects left-directed lightrays emitted from the point B or from nearer points but does not intersect lightrays emitted further away, such as the ray C. We call the ray B the rightmost ray intersecting A. It is clear that the intersection of the corresponding lines A and B in the conformal diagram must occur at the
boundary of the diagram, otherwise there would exist further lines intersecting A to the right of B. Considering a ray D to the right of A, we find that the rightmost ray for D is C. The intersection of C and D is thus another point on the boundary of the conformal diagram. It follows that the boundary line must have a slope between −45° and 45°; for simplicity, we draw a straight horizontal line (Fig. 3.6, right). This line represents the (timelike and null) infinite future.

Figure 3.6: Construction of a conformal diagram for a part of de Sitter spacetime delimited by thick lines (left). The upper boundary of the diagram (right) is a horizontal line.

A timelike boundary can be interpreted as the trajectory of an observer who absorbs or emits lightrays and thus participates in the exploration of the causal structure of the spacetime. The lightrays emitted by the boundary, together with those emitted from the Cauchy surface, form the totality of all lightrays that must be bijectively mapped into straight lines in the conformal diagram. The role of timelike boundaries and Cauchy surfaces is to provide a physically motivated boundary for the part of the spacetime we are interested in.

An artificial timelike boundary may also be introduced into the spacetime with the purpose of simplifying the construction of the diagram. Below we shall show that conformal diagrams can be simply pasted together along a common timelike boundary.

3.2.3 How to draw conformal diagrams

We can now outline a general procedure for building a conformal diagram for a given 1+1-dimensional spacetime using the method of lightrays. The procedure can be applied not only to geodesically complete spacetimes but also to spacetimes with explicitly specified boundaries.

One starts by considering the future-directed part of the spacetime and by choosing a suitable Cauchy surface and, possibly, some timelike boundaries. These lines are represented in the diagram by arbitrarily drawn horizontal and vertical curves of finite extent. The endpoints of these curves correspond either to the intersection points of Cauchy lines and timelike boundaries, or to imaginary points at spacelike and timelike infinity.

Note that Cauchy surfaces and timelike boundaries are the lines on which boundary conditions for e.g. a wave equation must be specified to obtain a unique solution within a domain. One expects that any physically relevant spacetime should contain a suitable set of Cauchy surfaces and timelike boundaries, if the classical field theory is to have predictive power. Qualitative knowledge of the geometry and topology of these boundaries is required for building a conformal diagram using the method of lightrays.

After drawing the curves for the physical boundaries, it remains to determine the shape of the infinite (timelike and null) boundaries of the diagram. To this end, one can first consider right-directed lightrays emitted from various points on the Cauchy line and from the boundaries, including the limit points at infinity. For each right-directed lightray $X$ there exists a certain subset $S_X$ of (left-directed) lightrays that intersect $X$. Since the subset $S_X$ has a finite extent, there exists a rightmost ray $r_X \in S_X$. (In the de Sitter example above, the rightmost ray $r_A$ is the ray B and the rightmost ray $r_D$ is C.) By definition of the conformal diagram, the lightrays are straight lines limited only by the boundary of the conformal diagram, therefore the intersection point of $X$ and $r_X$ must belong to the boundary. In this way we have established the location of one point of the unknown boundary line, namely the endpoint of the ray $X$.

To determine the local direction of the boundary line at that point, we use the following argument. For each right-directed lightray $X$, we can consider a right-directed ray $X'$ infinitesimally close and to the right of $X$ (if no ray $X'$ can be found to the right of $X$, it means that $X$ itself belongs to the boundary of the diagram). Then there are two possibilities (see Fig. 3.7a,b): either the rightmost ray $r_X$ is also the rightmost ray $r_{X'}$ for $X'$, or the ray $r_{X'}$ is located to the right of $r_X$. In the first case, the boundary line has a 45° slope and locally coincides with $r_X$, while in the second case the boundary line is horizontally-directed. Thus we can draw a right-directed fragment of the infinite boundary that limits the ray $X$. We then continue by moving further to the right and consider the endpoint of the ray $X'$, etc.

Figure 3.7: The local direction of the infinite boundary (thick line) is found by determining the rightmost rays $r_X$, $r_{X'}$ for nearby rays $X$, $X'$. The direction is at 45° angle (a) when $r_X = r_{X'}$ and horizontal (b) when $r_X \neq r_{X'}$.

The same procedure is then repeated for left-directed lightrays, until one finishes drawing all unknown lines in the future-directed part of the diagram. In this way the infinite boundary of the conformal diagram is constructed as the locus of last intersections of lightrays. Finally one applies the same considerations to past-directed lightrays and so completes the diagram.

It follows that the infinite boundary of a conformal diagram can always be drawn as a sequence of either straight line segments directed at 45° angles, or horizontally- and vertically-directed curves. In our convention, Cauchy surfaces and artificial timelike boundaries are drawn as curved lines and infinite boundaries as straight lines (when possible).

Let us briefly consider the pasting of conformal diagrams in general. When two spacetime domains are separated by
a timelike worldline, the corresponding conformal diagrams can be pasted together along the boundary line. To verify this almost obvious statement more formally, we begin by drawing the timelike boundary as a vertically-directed line in the diagram. The shape of the conformal diagram to the right of the boundary is determined solely by the intersections of lightrays within the right half of the spacetime. Hence, the conformal diagram for the right half of the spacetime can be pasted to the right of the timelike boundary line. The same holds for the left half of the spacetime. This justifies pasting of diagrams along a common timelike boundary.

Further examples

Collapsing star. The method of lightrays does not require explicit formulae for the spacetime metric if the qualitative behavior of lightrays is known. As another example of using the lightray method, let us consider an asymptotically flat spacetime with a star collapsing to a black hole (BH).

To reduce the spacetime to 1+1 dimensions, we assume spherical symmetry and consider only the $(r, t)$ plane, where $0 < r < +\infty$ and the line $r = 0$, the center of the star, is an artificial boundary. As before, we start with the future-directed part of the diagram. We must first choose a Cauchy surface; a suitable Cauchy surface is the line $t = t_0$ where $t_0$ is a time chosen before the collapse of the star. We represent the Cauchy surface by the curve AB in the diagram (Fig. 3.8). The point A corresponds to ($r = 0$, $t = t_0$), while B is a spatial infinity ($r = \infty$, $t = t_0$). The artificial boundary $r = 0$ is represented by the vertical line AC.

Figure 3.8: Construction of the conformal diagram for the spacetime of a collapsing star. The thick dotted line CF is a Schwarzschild singularity.

To investigate the shape of the future part of the diagram, we need to analyze the intersections of lightrays emitted from faraway points of the Cauchy surface. We know from qualitative considerations of the black hole formation that lightrays can escape from the star interior only until the appearance of the BH horizon. Shortly thereafter the star center becomes a singularity that cannot emit any lightrays. Hence, among all the rays emitted from the star center at various times, there exists a last ray not captured by the BH, while rays emitted later are captured. We arbitrarily choose a point E on the boundary line $r = 0$ to represent the emission of this "last ray." Any lightray emitted from $r = 0$ before E will propagate away from the BH and so will intersect all left-directed lightrays emitted from arbitrarily remote points of the Cauchy line AB. Since all the intersection points are outside of the BH horizon, it follows that the conformal diagram contains the polygon AEFB which represents the spacetime outside the BH. The line FB is the infinite null boundary of the diagram, while the line EF is the BH horizon.

It is clear that the point C is the last point from which lightrays can be emitted from the star center and thus C is the beginning of the BH singularity. It remains to determine the shape of the diagram between the points C and F. We know that a lightray emitted from the center after E will be recaptured by the BH singularity and thus will not intersect with lightrays entering the BH horizon sufficiently late. For instance, the lightray emitted at E′ will intersect with the lightray F′ but not with a later ray F″, as shown in Fig. 3.8. Therefore the diagram boundary line connecting C and F is locally horizontally-directed. This line consists of final intersection points of lightrays emitted from the star center and those entering the BH horizon from outside. These final intersection points are located at the BH singularity which is therefore represented by the entire line CF.

The past-directed part of the conformal diagram is easy to construct. Since any two past-directed lightrays intersect, the past-directed part is similar to that for the Minkowski spacetime with a boundary at $r = 0$, namely the triangular domain ABD. Thus the diagram in Fig. 3.8 is complete.

Future part of de Sitter spacetime. In Fig. 3.6 we have drawn a conformal diagram for the subdomain of a de Sitter spacetime delimited by two timelike boundaries. We shall now construct the diagram for the future part of a de Sitter spacetime with spatially unlimited sections. (Note that the past half of the de Sitter spacetime is not covered by the flat coordinates because of incompleteness of past-directed geodesics. In this paper we shall not use the complete de Sitter spacetime but only the future of an arbitrarily chosen, unbounded, spacelike Cauchy hypersurface.)

The Cauchy surface $t = t_0$, $-\infty < x < \infty$ is drawn as a finite horizontal curve in the conformal diagram (the curved line AB in Fig. 3.9). The points A, B in the diagram represent a spacelike infinity in the two directions and do not correspond to any points in the physical spacetime. It remains to establish the shape of the infinite future boundary which must be a line connecting the points A, B to the future of the Cauchy surface. We already know from the construction of the diagram in Fig. 3.6 that this future boundary is locally horizontal. Since the behavior of lightrays emitted from all points of the Cauchy surface is the same, the future boundary may be represented by a horizontal straight line connecting the points A, B. (To keep the convention of having straight infinite boundaries, we have drawn the Cauchy surface $t = t_0$ as a curve extending downward from the straight line AB.)

Note that the same diagram also represents an inhomogeneous spacetime consisting of de Sitter-like regions with different values of the Hubble constant $H$. This is so because a difference in the local values of $H$ does not change the qualitative behavior of lightrays at infinity: the rays will intersect only if emitted from sufficiently near points. The infinite future boundary remains a horizontally-directed line as long as $H \neq 0$ everywhere.

The construction of conformal diagrams for the following
Figure 3.9: A conformal diagram for the future part of the de Sitter spacetime with flat spatial sections. The curved line represents the spatially infinite Cauchy surface $t = t_0$. The local behavior of lightrays is the same as in Fig. 3.6.

spacetimes is left as an exercise for the reader.

• A subdomain of the Minkowski spacetime between two causally separated observers moving with a constant proper acceleration in opposite directions (Fig. 3.10, left).

• A flat closed universe: the subdomain ($-\infty < t < \infty$, $x_1 < x < x_2$) of a Minkowski spacetime with the lines $x = x_1$ and $x = x_2$ identified (Fig. 3.10, right).

• An asymptotically flat spacetime with two stars collapsing into two black holes; the line of sight crosses the two star centers (Fig. 3.11).

• The future part of a de Sitter spacetime with a star collapsing into a black hole (Fig. 3.12).

• A maximally extended Schwarzschild-de Sitter spacetime (Fig. 3.13). It is interesting to note that this diagram is usually drawn with all Schwarzschild and de Sitter regions having the same size, which makes the figure unbounded (Fig. 3.13, top) despite the intention to represent the spacetime by a finite figure in the fiducial Minkowski plane. To adhere to the definition of a conformal diagram as a bounded figure, one can use a suitable conformal transformation reducing the diagram to a finite size (e.g. Fig. 3.13, bottom).

Figure 3.10: Left: The domain of Minkowski spacetime between two causally separated observers $A$, $B$ moving with a constant proper acceleration in opposite directions. The dashed lines show that the two observers cannot see each other. Right: A closed Minkowski universe. The thick lines represent the identified boundaries $x = x_{1,2}$. A lightray (dashed line) crosses the line $x = x_{1,2}$ infinitely many times.
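The requirement that a conformal diagram be a bounded figure in the fiducial Minkowski plane can be illustrated numerically. The following is a minimal sketch, assuming the common arctangent compactification of the lightcone coordinates $t \pm x$ of a 1+1-dimensional Minkowski plane. This particular map and the function name `compactify` are illustrative assumptions; they are not necessarily the construction used to draw the figures.

```python
import math

def compactify(t, x):
    """Map a point (t, x) of the 1+1 Minkowski plane into a bounded diamond.

    Assumed map (illustrative): the null coordinates u = t - x, v = t + x are
    squeezed into (-pi/2, pi/2) by arctan. This only rescales the metric
    du dv by a positive factor, so lightrays stay straight lines at 45 degrees.
    """
    U = math.atan(t - x)
    V = math.atan(t + x)
    return V + U, V - U  # bounded coordinates (T, X) with |T| + |X| < pi

# A right-moving lightray t = x + 1 keeps T - X = 2*atan(1) = pi/2 fixed.
ray = [compactify(x + 1.0, x) for x in (-1e6, -3.0, 0.0, 3.0, 1e6)]

# A uniformly accelerated observer x = sqrt(1 + t^2) (unit acceleration,
# a hypothetical choice) remains inside the same bounded diamond.
observer = [compactify(t, math.sqrt(1.0 + t * t)) for t in (-1e3, -1.0, 0.0, 1.0, 1e3)]
```

Plotting the pairs in `ray` and `observer` would give a straight diagonal line and a curve ending on the right-hand null boundaries, which is the qualitative picture of Fig. 3.10 (left) under the stated assumptions.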

Figure 3.11: A spacetime with two collapsing stars. The point $T$ is a future timelike infinity reached by an observer (thick curve) remaining between the two black holes. The lines $AT$ and $TB$ represent BH singularities and the dashed lines are the BH horizons.

Figure 3.12: The future part of a de Sitter spacetime with a collapsing star. The line $AB$ represents the BH singularity; the dashed line is the BH horizon.

3.3 Asymptotic flatness

Additional literature:
[23] A concise derivation of the peeling property.
[9] Good conceptual explanations of the asymptotic description of spacetimes in GR.
[7] A recent review.

The concept of an asymptotically flat spacetime was invented to formalize the picture of a spacetime which is similar to Minkowski near infinity. Informally, a spacetime is asymptotically flat if it can be extended via a conformal transformation to include the infinity domain, and if the structure of the infinity is then the same as that of the Minkowski spacetime. A definition can be given more formally as follows:

A spacetime manifold $M$ is asymptotically flat in the null future direction if:

1. There exists an extension of $M$ to an (unphysical) manifold $\tilde M$ with an (unphysical) regular metric $\tilde g$ and boundary $\mathcal{I}^+$, such that $\tilde g = \Omega^2 g$, where $\Omega$ is a smooth function such that $\Omega = 0$ on the boundary, but $\tilde\nabla\Omega \neq 0$ everywhere (we denote by $\tilde\nabla$ the covariant derivative with respect to the unphysical metric).


2. The boundary $\mathcal{I}^+$ is a null surface (in the unphysical metric), i.e. $\Omega$ is a null function on the boundary.

3. The boundary surface $\mathcal{I}^+$ has topology $\mathbb{R}\times S^2$.

4. The vector $\tilde\nabla\Omega$ is future-pointing at $\mathcal{I}^+$ (as appropriate for the future boundary).

5. The energy-momentum tensor of matter (equivalently, the Ricci tensor) vanishes in a neighborhood of $\mathcal{I}^+$. (This condition may be relaxed to vanishing only as $\Omega^4$ or faster.)

Figure 3.13: Diagrams for a maximally extended Schwarzschild-de Sitter spacetime. The conventional, unbounded diagram (top) and an equivalent diagram having a finite extent (bottom) are related by a conformal transformation. The thick dotted lines represent Schwarzschild singularities. Both diagrams contain infinitely many Schwarzschild and de Sitter regions.

From these conditions, we shall now derive the existence of Minkowski-like coordinates at large distances. Let $\tilde n$ be the null vector dual to $d\Omega$. First, we note that $\tilde g(\tilde n,\tilde n) = 0$ on $\mathcal{I}^+$ implies that the auxiliary function $f \equiv \Omega^{-1}\tilde g(\tilde n,\tilde n)$ is smooth on $\tilde M$. Secondly, there is a freedom to rescale the unphysical metric by a nowhere vanishing factor $e^{2\lambda}$. Under such a rescaling, the function $f$ changes as
\[ f \to e^{-\lambda}\left( f + 2\,\tilde n\circ\lambda \right) \quad \text{on } \mathcal{I}^+. \tag{3.20} \]
Therefore, a suitable choice of $\lambda$ will make $f$ vanish on $\mathcal{I}^+$. There will remain the freedom to perform conformal transformations $\tilde g \to e^{2\lambda}\tilde g$ with $\tilde n\circ\lambda = 0$, i.e. with the factor $\lambda$ constant along the flow lines of $\tilde n$.

Calculation: Derive Eq. (3.20).
Solution: A replacement $\tilde g \to e^{2\lambda}\tilde g$ requires $\Omega \to e^{\lambda}\Omega$ because the physical metric $\Omega^{-2}\tilde g$ must remain fixed. Since $\tilde g(\tilde n,\tilde n) = \tilde g^{-1}(d\Omega, d\Omega)$ and the differential $d\Omega$ is metric-independent, we find that the function $f$ transforms under the above replacements as
\[
\begin{aligned}
f = \Omega^{-1}\tilde g^{-1}(d\Omega, d\Omega) &\to e^{-\lambda}\Omega^{-1} e^{-2\lambda}\,\tilde g^{-1}\big(d(e^{\lambda}\Omega), d(e^{\lambda}\Omega)\big) \\
&= e^{-\lambda} f + 2e^{-2\lambda}\,\tilde g^{-1}(de^{\lambda}, d\Omega) + O(\Omega) \\
&= e^{-\lambda} f + 2e^{-2\lambda}\,\tilde n\circ e^{\lambda} + O(\Omega).
\end{aligned}
\]
Thus we obtain the desired relation on $\mathcal{I}^+$.

Need to check all conformal transformation equations against Appendix D of Wald!

A calculation shows that the unphysical Ricci tensor $\tilde R_{\mu\nu}$ is related to the physical one, $R_{\mu\nu}$, by (see [36] Appendix D)
\[ R_{\mu\nu} = \tilde R_{\mu\nu} + 2\Omega^{-1}\tilde\nabla_\mu\tilde\nabla_\nu\Omega + \tilde g_{\mu\nu}\left[ \Omega^{-1}\tilde\Box\Omega - 3\Omega^{-2}\tilde g(\tilde n,\tilde n) \right]. \tag{3.21} \]
We now multiply both sides of Eq. (3.21) by $\Omega$. By assumption, $R_{\mu\nu}$ vanishes in a neighborhood of $\mathcal{I}^+$, and $\tilde R_{\mu\nu}$ is regular everywhere on $\tilde M$ since it is a nonsingular manifold. Using $\Omega^{-1}\tilde g(\tilde n,\tilde n) = O(\Omega)$, we find
\[ 2\tilde\nabla_\mu\tilde\nabla_\nu\Omega + \tilde g_{\mu\nu}\tilde\Box\Omega = 0 \quad \text{on } \mathcal{I}^+. \]
Contracting with $\tilde g^{\mu\nu}$ gives $\tilde\Box\Omega = 0$, and hence $\tilde\nabla_\mu\tilde\nabla_\nu\Omega = 0$ or equivalently $\tilde\nabla_\mu\tilde n_\nu = 0$, which means that $\tilde n$ is a (null) geodesic, shear-free, and divergence-free vector field on $\mathcal{I}^+$. (The entire distortion tensor $B_{(\tilde n)}$ vanishes on $\mathcal{I}^+$.) Note that the condition $\tilde g(\tilde n,\tilde n) = 0$ could be also obtained from Eq. (3.21) if we multiply both sides by $\Omega^2$.

Since $\mathcal{I}^+$ has the topology $\mathbb{R}\times S^2$, we can select coordinates $\theta$, $\phi$ on the $S^2$ sections such that the partial metric is that of a sphere,
\[ dS^2 = \left( d\theta\otimes d\theta + \sin^2\theta\, d\phi\otimes d\phi \right) V. \]
Since the vector field $\tilde n$ is divergence-free, the cross-section area of the $S^2$ sections remains constant along the flow lines of $\tilde n$, thus the partial metric will have equal values of $V$ on every $S^2$ section. So the remaining conformal freedom ($\tilde g \to e^{2\lambda}\tilde g$ with $\lambda = \text{const}$ along the flow lines of $\tilde n$) can be used to set $V = 1$. The function $\Omega$ is a well-defined coordinate near $\mathcal{I}^+$ since $\tilde n \equiv \tilde g^{-1}d\Omega \neq 0$. Finally, we can select a fourth coordinate $u$ parametrizing the $\mathbb{R}$ part of $\mathcal{I}^+ = \mathbb{R}\times S^2$, such that $\tilde n\circ u = 1$. The coordinate system $(u, \Omega, \theta, \phi)$ is well-defined in a neighborhood of $\mathcal{I}^+$, where the metric has the form
\[ \tilde g = \frac{1}{2}\left( du\otimes d\Omega + d\Omega\otimes du \right) - d\theta\otimes d\theta - \sin^2\theta\, d\phi\otimes d\phi. \]
A coordinate system in the physical manifold $M$ away from $\mathcal{I}^+$ can be obtained by the replacement $r \equiv \Omega^{-1}$. The function $r$ will then serve as a radial coordinate near infinity since $r \to \infty$ at infinity.

The coordinate system $(u, r, \theta, \phi)$ is extended away from $\mathcal{I}^+$ into the interior of $M$ as follows. The coordinate $u$ is chosen as a null function, so that $l \equiv g^{-1}du$ is a null vector such that $g(\tilde n, l) = 1$. Then, $r$ is defined as an affine parameter along the flow lines of $l$, such that $l\circ r = 1$. The angular coordinates $\theta$, $\phi$ are kept constant along the flow lines of $l$, so $l\circ\theta = l\circ\phi = 0$. The resulting coordinate system asymptotically corresponds to spherical Minkowski coordinates $\{t, r, \theta, \phi\}$ if we define $u \equiv t + r$.⁵ This local coordinate system is called the Bondi coordinate system.

⁵ This statement is justified in more detail in [36], p. 280.

Thus, we conclude that the assumptions of asymptotic flatness are a precise way of formulating the notion of a spacetime that is almost Minkowski at large distances.

3.4 Conformal radiation fields

Very far from an isolated system, the spacetime is almost flat but there may still remain radiation (gravitational, electromagnetic, or other). We are interested in studying the structure of the radiation field at large distances. The construction


of conformal infinity is especially helpful for this task, because radiation fields are conformally invariant. Let us review some examples of conformally invariant fields.

3.4.1 Scalar field in 1+1 dimensions

The action for a massless scalar field $\phi$ in a 1+1-dimensional spacetime is
\[ S[g, \phi] = \frac{1}{2}\int d^2x\, \sqrt{-g}\; g^{-1}(d\phi, d\phi), \]
where $g \equiv \det g_{\mu\nu}$ is the determinant of the covariant metric, and so $\sqrt{-g}\,d^2x$ is the invariant 2-volume element. This action is invariant under the conformal transformation of the metric, $g \to \tilde g = \Omega^2 g$.

Calculation: Show that the action is invariant under the above transformation.
Hint: In a 1+1-dimensional spacetime, $\sqrt{-\tilde g} = \Omega^2\sqrt{-g}$. Substitute $\tilde g = \Omega^2 g$ into the action and use the independence of $d\phi$ from the metric.

3.4.2 Scalar field in 3+1 dimensions

The action for a massless, conformally coupled scalar field is
\[ S[g, \phi] = \frac{1}{2}\int d^4x\, \sqrt{-g}\left[ g^{-1}(d\phi, d\phi) - \frac{1}{6}R\phi^2 \right], \]
where $R$ is the scalar curvature. Again one can verify that the action is invariant under a conformal transformation $g \to \tilde g = \Omega^2 g$ (up to boundary terms), provided that the field is also transformed as
\[ \phi \to \tilde\phi = \Omega^{-1}\phi. \]
The power of the conformal factor needed for the correct transformation law is called the conformal weight of the field. Thus a scalar field has conformal weight 0 in 1+1 dimensions and weight $-1$ in 3+1 dimensions. Conformal invariance means that a solution of the equation of motion for $\phi$ in the metric $g$ also gives a solution for the field $\tilde\phi$ in the metric $\tilde g$.

Calculation: Using Eq. (3.8), show that the action is invariant under the above transformations (up to boundary terms).
Hint: An integration by parts yields an equivalent form of the action,
\[ S[g, \phi] = \frac{1}{2}\int d^4x\, \sqrt{-g}\left[ -\phi\,\Box\phi - \frac{1}{6}R\phi^2 \right], \qquad \Box \equiv g^{\mu\nu}\nabla_\mu\nabla_\nu. \]
Substitute $\tilde g = \Omega^2 g$, $\tilde\phi = \Omega^{-1}\phi$ into this action, and replace $\tilde R$ according to Eq. (3.8). Note that
\[ \Box(\Omega\tilde\phi) = \Omega\,\Box\tilde\phi + \tilde\phi\,\Box\Omega + 2(\nabla^\mu\Omega)(\nabla_\mu\tilde\phi). \]

3.4.3 Electromagnetic field

The electromagnetic field is described by the Maxwell tensor which is a 2-form $F$,
\[ F = dA, \qquad F_{\mu\nu} = \partial_\mu A_\nu - \partial_\nu A_\mu, \]
where the 1-form $A$ is the electromagnetic potential. The action for the Maxwell field is
\[ S[g, F] = -\frac{1}{16\pi}\int \sqrt{-g}\,d^4x\; g^{\alpha\mu} g^{\beta\nu} F_{\alpha\beta} F_{\mu\nu}. \]
It is easy to see that this action is conformally invariant (in 3+1-dimensional spacetimes), the Maxwell field $F$ having conformal weight 0.

3.4.4 Gravitational radiation field

The Einstein equation relates the Ricci tensor to the distribution of matter in an algebraic way (if there is no matter at a point $p$, the Ricci tensor is zero at $p$). The Ricci tensor, however, is only the trace of the full Riemann tensor. Therefore, the curvature of spacetime contains degrees of freedom not algebraically related to the distribution of matter. These degrees of freedom are described by the trace-free part of the Riemann tensor called the Weyl tensor,
\[ C_{\alpha\beta}{}^{\gamma\delta} = R_{\alpha\beta}{}^{\gamma\delta} + \frac{1}{2}\,\delta^{[\gamma}_{[\alpha} S^{\delta]}_{\beta]}, \]
where
\[ S_{\alpha\beta} \equiv R_{\alpha\beta} - \frac{1}{6}R\,g_{\alpha\beta} \]
is an auxiliary tensor carrying equivalent information to the Ricci tensor. Note that $R_{\alpha\beta\gamma\delta} = C_{\alpha\beta\gamma\delta}$ in vacuum (where $R_{\alpha\beta} = 0$). It can be shown that the second Bianchi identity implies the constraints
\[ \nabla^{\delta} C_{\alpha\beta\gamma\delta} + \nabla_{[\alpha} S_{\beta]\gamma} = 0, \qquad \nabla^{\alpha} S_{\alpha\beta} - \nabla_{\beta} S^{\alpha}{}_{\alpha} = 0. \]
In the absence of matter, $S_{\alpha\beta} = 0$ and thus we obtain the equation $\nabla^{\delta} C_{\alpha\beta\gamma\delta} = 0$ which determines the dynamics of pure gravitational waves.

The Weyl tensor has the simple and useful conformal transformation property
\[ \tilde C_{\alpha\beta\gamma\delta} = \Omega^2 C_{\alpha\beta\gamma\delta} \quad \text{for } \tilde g = \Omega^2 g. \]
Another way to express this property is to say that the tensor $C_{\alpha\beta\gamma}{}^{\delta}$ is conformally invariant.

3.4.5 Asymptotic behavior of radiation

Radiation fields are massless and propagate along null geodesics (lightrays). Let us consider a lightray $\gamma(\tau)$ that approaches null infinity $\mathcal{I}^+$ at a point $p$. We shall now characterize a radiation field along the lightray $\gamma$, in the asymptotically far region near infinity.

Denote by $l$ the affine tangent vector to the lightray $\gamma(\tau)$. In the unphysical metric ($\tilde g = \Omega^2 g$), the affine parameter $\tau$ should be replaced by $\tilde\tau$ with $d\tilde\tau = \Omega^2 d\tau$, so that the corresponding affine tangent vector is $\tilde l = \Omega^{-2} l$ (see Statement 1.9.2.2). Since the tangent vector $\tilde l$ is regular in the unphysical manifold, it should have a well-defined value $\tilde l_0$ on $\mathcal{I}^+$. Then the physical-space tangent vector $l$ must decay at infinity as $l \approx \tilde l_0\,\Omega^2$. We shall now complete $l$ to a null tetrad $\{l, m, n\}$ (see Sec. 2.5), obtain similar asymptotic properties for the tetrad at infinity, and study the asymptotic behavior of radiation fields using the tetrad components.

The vector $\tilde l_0$ can be completed to a null tetrad using the vector $\tilde n \equiv \tilde g^{-1}d\Omega$ and a complex-valued vector $\tilde m$ chosen in the orthogonal complement $\{\tilde n, \tilde l_0\}^{\perp}$. The (unphysical) null tetrad $\{\tilde l_0, \tilde m, \tilde n\}$ can be parallelly transported from $\mathcal{I}^+$ along $\gamma$ using the unphysical metric, which produces null vector fields $\{\tilde l, \tilde m, \tilde n\}$ defined along $\gamma$ and satisfying $\tilde g(\tilde n, \tilde l) = 1$


and $\tilde g(\tilde m, \bar{\tilde m}) = -1$ at every point. However, we need to find a physical tetrad, $\{l, m, n\}$, normalized and parallelly transported along $\gamma$ using the physical metric. We have already seen that $l = \Omega^2\tilde l$. The vector $n$ of the physical tetrad can be chosen as $n = \tilde n$ since we must have $\tilde g(\tilde n, \tilde l) = g(n, l) = 1$ everywhere along $\gamma$. Finally, the complex-valued vector $m$ is selected from the subspace $\{n, l\}^{\perp}$ to be proportional to $\tilde m$. The normalization $g(m, \bar m) = \tilde g(\tilde m, \bar{\tilde m}) = -1$ shows that the unphysical-space vector $\tilde m$ must be chosen as $\tilde m = \Omega^{-1} m$. Thus, the physical null tetrad $\{l, m, n\}$ and the unphysical one, $\{\tilde l, \tilde m, \tilde n\}$, must be related by
\[ \{\Omega^2\tilde l,\ \Omega\tilde m,\ \tilde n\} = \{l, m, n\} \]
everywhere along the lightray $\gamma$.

The convenience of the null tetrads is in the concise way they describe components of tensors. For example, the antisymmetric Maxwell tensor $F(a, b)$ for the electromagnetic field is represented by three tetrad components, usually chosen as follows,
\[ \Phi_0 \equiv F(n, m), \qquad \Phi_1 \equiv \frac{1}{2}F(n, l) - \frac{1}{2}F(m, \bar m), \qquad \Phi_2 \equiv F(\bar m, l). \]
(The properties of the tetrad components $\Phi_j$ are studied in the Statement below.) The unphysical components $\tilde\Phi_j$, $j = 0, 1, 2$, are given by the same equations with the unphysical tetrad. The relation between the physical and the unphysical components is therefore
\[ \Phi_0 = \Omega\,\tilde\Phi_0, \qquad \Phi_1 = \Omega^2\tilde\Phi_1, \qquad \Phi_2 = \Omega^3\tilde\Phi_2. \]
Since the Maxwell field is conformally invariant, the unphysical components will also satisfy the Maxwell equations and will have regular solutions on $\mathcal{I}^+$. It is now easy to see that the component $\Phi_0$ will be the slowest-decaying one,
\[ \Phi_0 \propto \frac{1}{r}, \]
while other components will decay faster. Note that $F$ and $\Phi_j$ represent the field strength, which should decay as $r^{-1}$ for a radiation field. Hence, it is clear that the intensity of outgoing radiation (i.e. energy carried away to infinity) must be independent of the two other components $\Phi_{1,2}$. In fact, the radiated power is proportional to $|\Phi_0|^2$. Since the unphysical component $\tilde\Phi_0$ is constant at infinity, the differential intensity of radiation in a given (null) direction can be obtained directly by evaluating $|\tilde\Phi_0|^2$ at a point of $\mathcal{I}^+$ that corresponds to the required asymptotic direction.

Other radiation fields exhibit analogous properties. For example, a scalar field $\phi$ has the radiation component $n\circ\phi$, while the gravitational radiation is described by the component
\[ \Psi_0 \equiv C_{\alpha\beta\gamma\delta}\, n^\alpha m^\beta n^\gamma m^\delta = R(n, m, n, m); \]
note that this component of the Weyl tensor coincides with that of the Riemann tensor. The component $\Psi_0$ decays as $\Psi_0 \propto \Omega = r^{-1}$ at infinity, and the intensity of gravitational radiation is proportional to $|\Psi_0|^2$. All the other components of the Weyl tensor decay faster at infinity.

The splitting of components according to their falloff behavior along null geodesics is called the peeling property. This property holds in general (even-dimensional) asymptotically flat spacetimes and for all massless fields (of arbitrary spin).

Statement: Consider a Lorentz transformation of the tetrad that leaves the vector $l$ constant,
\[ m \to e^{i\alpha} m + A l, \qquad n \to n + e^{i\alpha}\bar A\, m + e^{-i\alpha} A\, \bar m + A\bar A\, l, \]
where $\alpha$ is real and $A$ is complex. Under this transformation, the tetrad components $\Phi_j$ of the Maxwell tensor change as
\[ \Phi_0 \to e^{i\alpha}\Phi_0 + 2A\Phi_1 + A^2 e^{-i\alpha}\Phi_2, \qquad \Phi_1 \to \Phi_1 + A e^{-i\alpha}\Phi_2, \qquad \Phi_2 \to e^{-i\alpha}\Phi_2. \]
It follows that the leading falloff behavior of $\Phi_0 \propto r^{-1}$ is independent of the choice of the null tetrad. The quantities $F(n, l)$ and $F(m, \bar m)$, taken separately, do not carry tetrad-independent information (i.e. one of these may be set to zero by a choice of tetrad).
4 Global techniques
Under "global techniques" we understand considerations that involve entire manifolds or large domains, rather than local considerations (e.g. calculations involving properties of tensors and their derivatives at a single point). Global techniques can be used to answer questions such as whether there is a singularity anywhere, or whether a certain interesting subdomain of the spacetime is finite or infinite. In this chapter we shall introduce some of these techniques and consider some applications.

4.1 Singularity theorems

Additional literature: [8], [26]. There are not many textbooks that explain singularity theorems.
[12]: Contains all the derivations but requires you to invest a significant time in reading large portions of the book.
[21]: Contains some introductory material on singularity theorems; Chapter 34 contains a sufficiently detailed proof of Hawking's area theorem.
[36]: Devotes two chapters to mathematical developments and explains the main ideas behind singularity theorems. However, refers to [12] for proofs of some crucial statements.

4.1.1 Singularities and geodesic incompleteness

It is well known that some spacetimes occurring in general relativity contain points where the "force of gravity" becomes infinitely large (more precisely, the geometric description of the spacetime breaks down). Such spacetimes are called singular. Two major examples of singular spacetimes are the Schwarzschild spacetime and a generic Friedmann-Robertson-Walker (FRW) spacetime. In the Schwarzschild spacetime, it is relatively easy to understand that the black hole center $r = 0$ represents a singularity because the curvature becomes infinite as $r \to 0$, indicating the presence of unboundedly large tidal forces. Also, every timelike or null geodesic entering the Schwarzschild radius cannot escape hitting the center within a finite time. No lightrays or timelike geodesics can be continued past that point because the geometry is singular and the geodesic equation becomes undefined (i.e. it does not predict how a body will move past the center). Since the tidal forces will tear apart any material objects approaching the center, it is reasonable to conclude that there is no sensible way to describe the evolution of an observer past the center of a black hole. Thus the center is a place where the geometry of the spacetime cannot be described by general relativity. In an FRW spacetime with the scale factor $a(t) \to 0$ as $t \to 0$, the curvature is unbounded near $t = 0$. Similarly to the Schwarzschild case, geodesics cannot be extended to the past beyond $t = 0$, so one cannot describe what happened to an observer before $t = 0$. In this sense, $t = 0$ is the beginning of time for an FRW spacetime. It is reasonable to say that the 3-surface $t = 0$ in an FRW spacetime is singular.

Since the spacetime manifold is not smooth at singular points, it is natural to excise such points from the spacetime and to restrict the calculations to regular (nonsingular) points.¹ Hence, one finds a curious situation: gravitational singularities are not "places" and are not present anywhere in the spacetime manifold. Since all the "singular points" are removed, it is difficult to formulate a general definition of a singularity in terms of internal properties of the remaining nonsingular manifold. Nevertheless, the presence of singularities is usually manifested in several ways. For instance, the curvature may become infinite along a certain geodesic worldline, indicating that a freely falling observer following that worldline will experience unbounded tidal forces. Or, certain geodesics may have only a finite range of the affine parameter, because these geodesics "hit a singularity" and cannot be extended any further.

¹ It is conjectured that a quantum theory of gravity (to be developed) will replace singularities by some well-defined and finite new physics. At this point, it is safe to say that the classical theory of general relativity breaks down at singularities.

An accepted way to define a singular spacetime is through the non-extensibility of geodesics. Intuitively, a geodesic should be able to extend arbitrarily far (for an infinite range of the affine parameter) unless there is a singularity in the way. A spacetime is called geodesically complete if any geodesic is extendible to arbitrary values of the affine parameter. A geodesically incomplete spacetime is interpreted as singular in some way. Usually, only timelike and null geodesics are used to determine geodesic completeness. (Note that a spacetime can be intuitively singular without containing any incomplete timelike and null geodesics because the singularity may be infinitely far away, approachable after finite proper time only by a sufficiently highly accelerated observer, or by a spacelike geodesic.)

The presently known singularity theorems are mathematical statements about conditions for a spacetime to be geodesically incomplete. Thus, these theorems only indicate the presence of singularities but neither predict their location nor characterize their physical properties.

Statement:² Let $k$ be a timelike Killing vector and $\gamma(\tau)$ a (non-geodesic) timelike curve with tangent vector $v$, such that $g(v, v) = 1$. Show that
\[ |v\circ g(k, v)| \le |g(k, v)|\,|a|, \]
where the vector $a(\tau) \equiv \nabla_v v$ is the proper acceleration of the curve and $|a| \equiv \sqrt{-g(a, a)}$. Deduce that $g(v, k)$ can change only by a finite amount along the curve as long as $\int_0^\infty |a(\tau)|\,d\tau < \infty$, which means that the worldline $\gamma(\tau)$ can be realized by a rocket with a finite amount of fuel. Show that the curve cannot reach any spacetime regions where $g(k, k)$ becomes unbounded.

² After [36], Problem 9.1.

Solution: By the Killing property, $g(\nabla_v k, v) = 0$, thus $v\circ g(k, v) = g(k, a)$. It remains to show that $|g(k, a)| < |g(k, v)|\,|a|$ for timelike $k$, $v$ and spacelike $a \neq 0$ such that $g(v, a) = 0$. We may decompose $k = \alpha a + \beta v + s$, where
\[ \alpha \equiv \frac{g(k, a)}{g(a, a)}, \qquad \beta \equiv g(k, v), \]


and $s$ is a spacelike vector orthogonal to both $a$ and $v$. Then the inequality $g(k, k) > 0$ yields
\[ g(k, k) = \beta^2 - \alpha^2|a|^2 - |s|^2 > 0, \]
hence
\[ |\beta| = |g(k, v)| > |\alpha|\,|a| = \frac{1}{|a|}\,|g(k, a)|, \]
which is equivalent to the desired inequality. Rewriting it as
\[ |v\circ\ln g(k, v)| < |a|, \]
and integrating both sides, we have
\[ \left| \int_0^\infty d\tau\; v\circ\ln g(k, v) \right| \le \int_0^\infty d\tau\, |v\circ\ln g(k, v)| \le \int_0^\infty |a|\,d\tau < \infty. \]
Therefore the total change of $\ln g(k, v)$ is bounded. Since for any timelike vectors $x$, $y$ we have the inequality
\[ g(x, x)\,g(y, y) \le [g(x, y)]^2, \]
it follows that $g(k, k) \le |g(k, v)|^2$ and thus the total change of $g(k, k)$ along the curve $\gamma(\tau)$ is bounded for all $\tau$.

4.1.2 Past-incompleteness of inflation

To get a taste of singularity theorems, we start with a simple statement about the past timelike geodesic incompleteness of an inflationary spacetime.³

³ This section is based on the paper [3].

Cosmological inflation is a period of accelerated expansion of spacetime. For instance, a spatially flat FRW metric with a given scale factor $a(t)$, namely
\[ g = dt\otimes dt - a^2(t)\,d^2r, \tag{4.1} \]
\[ d^2r \equiv dx\otimes dx + dy\otimes dy + dz\otimes dz, \]
represents inflation for those times $t$ when $\dot a > 0$ and $\ddot a > 0$. The Hubble expansion rate is defined as
\[ H(t) \equiv \frac{\dot a}{a} = \frac{d}{dt}\ln a, \]
so the conditions for the presence of inflation are $\dot H + H^2 > 0$ and $H > 0$. A full discussion of possible varieties of cosmological inflation is beyond the scope of this text. We shall restrict our attention to a class of spacetimes that are qualitatively similar to the inflating FRW case, namely, to spacetimes with a metric of approximately the form (4.1), in suitable local coordinates. Locally, the parameters $a$ and $H$ will be well-defined and should satisfy the conditions $H > 0$, $\dot H + H^2 > 0$. The processes that generate inflation may result in local fluctuations of the Hubble rate $H$, but these fluctuations will occur on distance and time scales typically much larger than the horizon size $H^{-1}$.

It is clear that inflation can continue forever to the future. The main statement is that such an inflationary spacetime cannot be also eternal to the past. The way to prove this is to show that a past-directed timelike geodesic will have only a finite range of proper time. Events that preceded that time are not described by the inflationary spacetime and so may be interpreted as the presence of an initial singularity (the "beginning of time").

We first describe the inflationary character of the spacetime in a coordinate-free manner. A salient property of the spacetime (4.1) is the existence of a congruence of timelike geodesics with tangent vectors $v \equiv \partial_t$ and connecting vectors $c_{1,2,3} = \partial_x, \partial_y, \partial_z$ that satisfy the Hubble law (3.4). These geodesics have everywhere positive divergence (see the following calculation).

Calculation: For the metric (4.1), show that the vectors $c_{1,2,3} \equiv \partial_x, \partial_y, \partial_z$ are connecting for $v \equiv \partial_t$ and satisfy the Hubble law (3.4) with $H = \dot a/a$. Then show that the divergence $\mathrm{div}\,v = 3H$ and hence is everywhere positive during inflation.
Solution: The Hubble law, $\nabla_v c_j = H c_j$, follows from
\[ g(\nabla_v c_j, c_j) = \frac{1}{2}\,v\circ g(c_j, c_j) = -\frac{1}{2}\,\partial_t a^2 = -a^2 H. \]
Then we compute the divergence using the decomposition
\[ g^{-1} = v\otimes v - a^{-2}\sum_{j=1}^{3} c_j\otimes c_j \]
that follows from the fact that $\{v, a^{-1}c_1, a^{-1}c_2, a^{-1}c_3\}$ is an orthonormal basis. Hence,
\[
\begin{aligned}
\mathrm{div}\,v &= \mathrm{Tr}_{(x,y)}\, g(x, \nabla_y v) \\
&= g(v, \nabla_v v) - a^{-2}\sum_{j=1}^{3} g(c_j, \nabla_{c_j} v) \\
&= -a^{-2}\sum_{j=1}^{3} g(c_j, \nabla_v c_j) = 3H > 0.
\end{aligned}
\]
Here the first term vanishes because $v$ is geodesic, and $\nabla_{c_j} v = \nabla_v c_j$ because $c_j$ are connecting vectors for $v$.

Thus we reformulate our assumption of the presence of inflation throughout the entire past portion of the spacetime: We require the existence of a timelike geodesic congruence $v$ covering the entire past, with everywhere positive divergence $\mathrm{div}\,v > 0$. Here, the "past portion" of the spacetime is understood as the past of some Cauchy 3-surface. Note that the existence of an everywhere diverging congruence is a nontrivial condition. In any spacetime, one can always find geodesic congruences having arbitrary, positive or negative, divergence at any given point, but not necessarily everywhere to the past of a Cauchy surface.

Given an everywhere diverging geodesic congruence, we define the local Hubble expansion rate to be $H \equiv \frac{1}{3}\mathrm{div}\,v$. We shall assume, for simplicity, that there exists a constant $H_0 > 0$ such that $H \ge H_0$ everywhere. (This condition can be significantly relaxed without affecting the conclusions.)

Now we imagine that some dust-like particles follow the geodesic congruence $v$ everywhere, and consider a geodesic observer who moves through the universe and measures the velocities of the dust particles. This observer can conclude that the spacetime is inflating and measure the local value of $H$ in the following way. Let the observer's worldline be $\gamma(\tau)$ with an affine tangent vector $u \equiv \dot\gamma$, such that $g(u, u) = 1$. In principle, the observer can measure the following two (3-dimensional) quantities: the relative velocity, $\vec v_{\rm rel}(\tau)$, of the particle that passes by the observer at a time $\tau$, and the displacement vector $\vec x(\tau; \tau')$, in the observer's rest frame, between nearby particles that passed the observer at times


$\tau$ and $\tau'$. After registering two nearby particles at times $\tau$ and $\tau + \delta\tau$, the observer can calculate the separation and the relative velocity of the two particles in the (approximate) local rest frame of the particles. Let $n$ be a normalized, spacelike vector describing the separation between two particles in the particle rest frame. Then the observer can compute the quantity $g(n, \nabla_n v)$ and thus estimate the local Hubble rate from the rate of change of the separation in the direction $n$,
\[ H \equiv \frac{1}{3}\,\mathrm{div}\,v \approx -g(n, \nabla_n v) \]
(the minus sign compensates for the spacelike character of the vectors $n$ and $\nabla_n v$). Note that $n$ must be normalized, $g(n, n) = -1$, so the observed Hubble rate depends only on the direction of the observer's motion but not on its speed. Since the spacetime is approximately isotropic, the estimate of the value of $H$ from a single direction $n$ is expected to be reasonably accurate.

The next calculations derive a formula for $H$ in terms of the vectors $u$ and $v$. It will be convenient to denote $\gamma(\tau) \equiv g(u, v)$. The quantity $\gamma$ is the $\gamma$ factor of special relativity, describing the time dilation between the observer and the particle rest frames. Note that $g(u, v) > 1$ for two future-directed timelike vectors $u \neq v$ normalized to $g(u, u) = g(v, v) = 1$. Since we assume that the observer is not at rest with respect to any of the particles, we will always have $\gamma > 1$.

Calculation: Using the geodesic property of the fields $v$ and $u$, show that the vector $n$ and the instantaneous Hubble rate $H$ are expressed by
\[ n = \frac{u - \gamma v}{\sqrt{\gamma^2 - 1}}, \qquad H = -\frac{u\circ\gamma}{\gamma^2 - 1}. \]
Derive the relationship between the observed 3-dimensional quantities $\vec v_{\rm rel}(\tau)$, $\vec v_{\rm rel}(\tau + d\tau)$, $\vec x(\tau; \tau + d\tau)$, and the 4-dimensional vectors $u$, $v$, as well as $u\circ\gamma$, in the limit $d\tau \to 0$.
Solution: The vector $n\,dt$ should represent the infinitesimal displacement of the observer in the particle rest frame, where the time interval is $dt$. Spatial vectors in the particle rest frame are (spacelike) vectors orthogonal to $v$. Thus, the vector $n$ is proportional to the projection of $u$ onto the subspace $v^{\perp}$:
\[ n = \frac{u - v\,g(u, v)}{|u - v\,g(u, v)|} = \frac{u - \gamma v}{\sqrt{\gamma^2 - 1}}, \]
where we have used $g(u, u) = g(v, v) = 1$. Now we can directly compute $H \equiv -g(n, \nabla_n v)$. Since $v$ is normalized and geodesic, we have $g(v, \nabla_x v) = 0 = g(x, \nabla_v v)$ for all $x$. So the only surviving term in $g(n, \nabla_n v)$ is proportional to $g(u, \nabla_u v)$,
\[ H = -g(n, \nabla_n v) = -\frac{g(u, \nabla_u v)}{\gamma^2 - 1} = -\frac{u\circ\gamma}{\gamma^2 - 1}, \]
where the last step uses $g(u, \nabla_u v) = u\circ g(u, v)$, which holds because $u$ is geodesic. This is the desired formula. In the instantaneous inertial frame where the observer is at rest, we have $u = \{1, 0, 0, 0\}$ and thus $v = \{\gamma, \gamma\vec v_{\rm rel}\}$, where
\[ \gamma(\tau) \equiv \frac{1}{\sqrt{1 - \vec v_{\rm rel}^{\,2}(\tau)}} \]
and $\tau$ is the observer's proper time. In this way, the observer can compute $\gamma(\tau)$ and $u\circ\gamma$.

Replacing $u\circ\gamma$ by $\dot\gamma$, we can simplify the formula for $H$ to
\[ H(\tau) = \frac{1}{2}\frac{d}{d\tau}\ln\frac{\gamma(\tau) + 1}{\gamma(\tau) - 1}. \]
Since $\gamma > 1$, we have
\[ \ln\frac{\gamma(\tau) + 1}{\gamma(\tau) - 1} > 0 \quad \text{for all } \tau. \]
Finally, our hypothetical observer can integrate the instantaneous value of $H(\tau)$ along the worldline $\gamma(\tau)$. By assumption, $H \ge H_0$, therefore
\[ \int_{\tau_1}^{\tau_2} H(\tau)\,d\tau \ge H_0\,(\tau_2 - \tau_1) \]
for all $\tau_1$, $\tau_2$. On the other hand,
\[ \int_{\tau_1}^{\tau_2} H(\tau)\,d\tau = \frac{1}{2}\ln\frac{\gamma(\tau_2) + 1}{\gamma(\tau_2) - 1} - \frac{1}{2}\ln\frac{\gamma(\tau_1) + 1}{\gamma(\tau_1) - 1} < \frac{1}{2}\ln\frac{\gamma(\tau_2) + 1}{\gamma(\tau_2) - 1}. \]
Hence, there is a lower bound on the admissible values $\tau_1$ of the proper time,
\[ \tau_1 > \tau_2 - \frac{1}{2H_0}\ln\frac{\gamma(\tau_2) + 1}{\gamma(\tau_2) - 1} \equiv \tau_{\min}(\tau_2). \]
This means that no timelike geodesic can be extended to the past further than to $\tau_{\min}(\tau_2)$. Thus, the spacetime is geodesically incomplete to the past. (A similar argument shows that null geodesics are also past-incomplete.) The only geodesic worldlines that might be complete to the past are the particle worldlines.

What could the observer see at earlier times $\tau \le \tau_{\min}$? One possibility is that the observer did not exist at those times because the spacetime is singular in the distant past. In this case, we can say that a past-directed geodesic "hits a singularity" at $\tau = \tau_{\min}$ or sooner, but we cannot deduce any more information about the singularity. For instance, different observers might hit different, separate singularities. The only alternative to the presence of singularities is to conclude that the spacetime was not inflating at $\tau < \tau_{\min}$. In both cases, the inflationary epoch could not be past-eternal.

4.1.3 Conjugate points on geodesics

Proofs of several singularity theorems are based on the properties of geodesic curves. In this and the following sections we shall obtain some necessary geometric results about geodesics.

Let $u$ be a tangent vector field to a congruence which consists of geodesics emitted from a single point $p$; thus $p$ is a focal point for the congruence $u$. There is a possibility that some of the lines of $u$ will focus again at another focal point $q$. The condition for this possibility is easy to derive using the geodesic deviation equation (see Sec. 1.9.5).

Namely, consider a connecting vector $c$ for $u$. The vector $c$ satisfies the geodesic deviation equation,
\[ \nabla_u\nabla_u c - R(u, c)u = 0. \tag{4.2} \]
We now pick a single geodesic $\gamma(\tau)$ such that $\tau = 0$ at $p$, and solve Eq. (4.2) only along that geodesic for $\tau > 0$. Since we are solving a second-order equation, we need to supply initial values $c$ and $\nabla_u c$ at $\tau = 0$. At $p$, we must have $c = 0$ since $p$ is a focal point for $u$. We may pick the value $\nabla_u c(\tau = 0)$ arbitrarily, and obtain $c(\tau)$ from the deviation equation. Suppose


that there exists a particular solution $c(\tau)$ such that $c(\tau_1) = 0$ at some other point $q \equiv \gamma(\tau_1)$ on the curve. It means that an infinitesimally close, neighbor geodesic (see footnote 4) from the congruence $u$ intersects $\gamma$ at the point $q$. Moreover, note that the equation (4.2) is linear in $c$; if $c(\tau)$ is a solution, so is $\lambda c(\tau)$ for any constant $\lambda$. It follows that there exists a one-parametric set of infinitesimally close neighbor geodesics corresponding to connecting vectors $\lambda c$, and each of those geodesics intersects $\gamma$ at $q$. It is clear that $q$ is a second focal point for the congruence $u$. In that case, the point $q$ is called conjugate to $p$ along the line $\gamma(\tau)$.

Footnote 4: Generally, the focal intersection exists only among infinitesimally close geodesics! This is so because the connecting vector $c$ will generally fail to satisfy the geodesic deviation equation for a neighbor geodesic at a finite distance from $\gamma$.

Remark: A vector field $c$ is called a Jacobi field for a curve $\gamma$ if it satisfies Eq. (4.2) along $\gamma$ with $u \equiv \dot\gamma$. It is clear that Jacobi fields are essentially connecting vectors for a family of geodesics around $\gamma$. One uses the concept of Jacobi fields when one would like to examine the focusing properties of a single curve $\gamma(\tau)$ and to avoid the need to introduce a whole congruence of geodesics around $\gamma$ and to talk about intersections of infinitesimally close geodesics. For instance, one can define conjugate points without involving a congruence, in the following way: Two points $p = \gamma(\tau_1)$, $q = \gamma(\tau_2)$ on a geodesic curve $\gamma(\tau)$ are conjugate if there exists a Jacobi field $c(\tau)$, not identically zero, which vanishes at $\tau_1$ and $\tau_2$.

It is intuitively obvious that the divergence of the field $u$ tends to infinity at a focal point. To show this more formally, consider the defining property of the divergence ${\rm div}\,u$,
\[ \mathcal{L}_u V = \dot V = ({\rm div}\,u)\,V, \]
where $V$ is a 3-volume spanned by three connecting vectors transverse to $u$. As we cross a focal point, at least one of the connecting vectors $c$ that span $V$ will vanish. Thus $V$ passes through 0 at a focal point; in the generic case where the function $V(\tau)$ has only simple zeros, $V$ will also change sign after a focal point. Note that $\dot V \equiv \nabla_u V$ is always finite since $V = \varepsilon(u, c_1, c_2, c_3)$ and every connecting vector $c_j$ satisfies a linear equation and thus cannot become infinite. Therefore, any point where ${\rm div}\,u$ becomes infinite must be a focal point where $V = 0$. Also, we can rewrite the above equation as ${\rm div}\,u = \partial_\tau \ln V$. If ${\rm div}\,u$ remains finite, $\ln V$ cannot go to $-\infty$ within a finite time $\tau$. It follows that ${\rm div}\,u \to \pm\infty$ at a focal point. (In the generic case, we will have ${\rm div}\,u \to -\infty$ before a focal point and ${\rm div}\,u \to +\infty$ after a focal point.) Thus we have proved that the condition ${\rm div}\,u \to \pm\infty$ is necessary and sufficient for the existence of a focal point of a field $u$.

Finally, let us develop a qualitative picture of why conjugate points occur. We have seen that gravity generally acts as an attractive force that makes geodesics converge. Therefore, a bunch of geodesics emitted from a point $p$ will usually tend to refocus at a later point $q$. The refocusing may take a while, but once the divergence ${\rm div}\,u$ becomes negative along one line, refocusing is inevitable within a finite time. (Note that the focusing theorem applies to hypersurface orthogonal geodesics, and the congruence emitted from $p$ is hypersurface orthogonal by Statement 1 from Sec. 2.4.2. Moreover, this congruence is integrable by Statement 1 from Sec. 2.2.3.) Focal points can occur along a caustic surface or caustic line, so we select one geodesic $\gamma$ on which focusing occurs at one point $q$. Then $q$ is the point conjugate to $p$ along $\gamma$; other geodesics may possess different conjugate points to $p$, and also there may exist several conjugate points $q$, $q'$, ..., to $p$ along a single curve $\gamma$. Thus the property of $q$ to be conjugate to $p$ is a property of the local geometry around a selected geodesic $\gamma$. Note also that the focusing of geodesics does not yet indicate any singularities in the spacetime; after all, we can select congruences that focus at any point by simply emitting geodesics from that point.

The focusing theorem (see Sec. 2.4.2) shows that ${\rm div}\,u$ will reach infinity within a finite proper time if the congruence initially has negative divergence and if the strong energy condition holds. Hence, conjugate points must exist along each curve of the congruence, under the same assumptions.

4.1.4 Second variation of proper length

In Sec. 1.9.3 we have seen that a geodesic curve extremizes the proper length among all curves connecting two fixed points. For instance, in the Minkowski space, timelike geodesics maximize the proper time; note that spacelike geodesics do not minimize the proper distance, since the metric $g$ has the signature $(+\,-\,-\,-)$. However, in more general spacetimes the proper length of a geodesic does not always deliver the maximum proper time. We shall now see that the maximization of proper length is closely related to the existence of conjugate points for neighbor geodesics.

The infinitesimal change of the proper length of a curve under an infinitesimal variation is described by Eq. (1.77) of Sec. 1.9.3. For a timelike curve $\gamma(\tau)$ with a tangent vector $v = \dot\gamma$ normalized by $g(v, v) = 1$, perturbed by the flow of a vector field $t$ with a parameter $\sigma$ along the flow, we have
\[ \frac{d}{d\sigma} L[\gamma; t] = -\int_{\tau_1}^{\tau_2} g(t, \nabla_v v)\, d\tau. \]
The derivative $dL[\gamma; t]/d\sigma$ vanishes when $\gamma$ is a geodesic, but this is not sufficient to have an actual maximum of the functional $L$. The proper length is maximized if $d^2L/d\sigma^2 < 0$. To verify this condition, we again consider a vector field $v$ created by transporting the tangent vector $\dot\gamma$ along the flow lines of $t$. Then
\[ \frac{d^2}{d\sigma^2} L[\gamma; t] = -\int_{\tau_1}^{\tau_2} \nabla_t\, g(t, \nabla_v v)\, d\tau = -\int_{\tau_1}^{\tau_2} g(\nabla_t t, \nabla_v v)\, d\tau - \int_{\tau_1}^{\tau_2} g(t, \nabla_t \nabla_v v)\, d\tau. \]
(Note that $\nabla_v v \neq 0$ away from the curve $\gamma$.) Since $[v, t] = 0$, the term $\nabla_t \nabla_v v$ is expressed through the Riemann tensor as
\[ g(t, \nabla_t \nabla_v v) = R(t, v, v, t) + g(t, \nabla_v \nabla_v t). \]
Finally, on the geodesic curve $\gamma$ we have $\nabla_v v = 0$ and thus
\[ \frac{d^2}{d\sigma^2} L[\gamma; t] = \int_{\tau_1}^{\tau_2} \left( R(v, t, v, t) - g(t, \nabla_v \nabla_v t) \right) d\tau. \]
The integral expression in the right-hand side above must be nonpositive for all vector fields $t$ (and for all $\sigma$) if the curve $\gamma$ maximizes the proper length under infinitesimal variations. We cannot simplify $d^2L/d\sigma^2$ any further, but we note that the expression in the right-hand side is similar to the geodesic deviation equation (4.2). Denoting by $\hat G$ the linear operator involved in that equation, we can rewrite $d^2L/d\sigma^2$ as
\[ \frac{d^2}{d\sigma^2} L[\gamma; t] = \int_{\tau_1}^{\tau_2} g(t, \hat G t - \ddot t)\, d\tau, \]
where the precise definition of $\hat G$ is
\[ \hat G t \equiv R(v, t)v, \qquad \ddot t \equiv \nabla_v \nabla_v t. \]
The vector $t$ will be a Jacobi field if $\ddot t - \hat G t = 0$. Thus, $d^2L/d\sigma^2$ will vanish if $t$ is a Jacobi field. In the present case, the field $t$ plays the role of a perturbation field and the curve $\gamma(\tau)$ is fixed at endpoints $\gamma(\tau_1)$ and $\gamma(\tau_2)$. Thus an admissible field $t$ must vanish at endpoints. But a Jacobi field that vanishes at the endpoints can exist only if both endpoints are conjugate points with respect to the curve $\gamma$. The functional $d^2L[\gamma; t]/d\sigma^2$ is a quadratic functional of the vector field $t$. Heuristically, one expects that a negative-definite quadratic functional will not vanish at nonzero values of its argument. Hence, we expect that the proper length cannot be maximized if there exists a Jacobi field that vanishes at both the endpoints, or (as will be shown below) if one of the endpoints is conjugate to a point within the curve. Thus we found a crucial link between the existence of conjugate points and the maximization of proper length. This link is formalized in the following statement.

Statement 1 (see footnote 5): A geodesic curve $\gamma$ maximizes the proper length between points $\gamma(\tau_1)$ and $\gamma(\tau_2)$ with respect to infinitesimal variations iff $\gamma$ has no conjugate points to $\gamma(\tau_1)$ within the interval $[\tau_1, \tau_2]$.

Footnote 5: [12], Proposition 4.5.8.

Sketch of proof: If there are no conjugate points then there is a one-to-one map between a Jacobi field $c$ at a point $\gamma(\tau)$ and the initial value of the derivative $\nabla_u c$ at the initial point $\tau = \tau_1$. This map is specified by a nondegenerate transformation $\hat A(\tau)$. The functional $d^2L[\gamma; t]/d\sigma^2$ evaluated on an arbitrary vector $t$ can be expressed through the auxiliary vector field $z \equiv \hat A^{-1} t$, and shown to be a negative-definite functional of $z$ by an explicit calculation that uses the integrability of the geodesic congruence emitted at $\gamma(\tau_1)$.

Conversely, if there exists a focal point $\gamma(\tau_*)$, where $\tau_*$ is between $\tau_1$ and $\tau_2$, then there exists a Jacobi field $c$, not identically zero, and such that $c(\tau_1) = c(\tau_*) = 0$. To show that the proper length is not maximized, it is sufficient to find even a single example of a vector field $t$ such that $d^2L[\gamma; t]/d\sigma^2 > 0$. If we define the field $t_0$ along the curve $\gamma(\tau)$ by $t_0(\tau) \equiv c(\tau)$ for $\tau_1 < \tau < \tau_*$ and $t_0(\tau) \equiv 0$ for $\tau_* < \tau < \tau_2$, then we will have $d^2L[\gamma; t_0]/d\sigma^2 = 0$ except at $\tau = \tau_*$ where $t_0(\tau)$ has a corner. Now, one can deform this field $t_0(\tau)$ by an infinitesimal amount and construct a smooth field $t(\tau)$ such that still $d^2L[\gamma; t]/d\sigma^2 > 0$. The construction of $t(\tau)$ goes heuristically as follows: The field $t(\tau)$ is almost equal to $c(\tau)$ up to $\tau = \tau_*$, which produces an almost zero value of the integral, and then $t(\tau)$ can be chosen to be almost zero for $\tau_* < \tau < \tau_2$ and such that the resulting integral is positive. (The curve corresponding to $t_0(\tau)$ has a corner that can be straightened out.)

More detailed proof: First we prove that there is a maximum of $L[\gamma]$ if no conjugate points are present on $\gamma$. For brevity, we denote (covariant) derivatives along the curve $\gamma$ by an overdot. The time-dependent transformation $\hat A(\tau)$ that maps initial values of $\dot c(\tau_1)$ to final values $c(\tau)$ is the (transformation-valued) solution of the equation
\[ \ddot{\hat A} - \hat G \hat A = 0, \qquad \hat A(\tau_1) = 0, \qquad \dot{\hat A}(\tau_1) = \hat 1. \]
Note that $\hat G v = 0$ and $g(\hat G x, v) = 0$ for all $x$, in other words, $\hat G$ is transverse to the vector $v$. Thus $\hat G$ and $\hat A$ are effectively three-dimensional transformations acting in the subspace $v^\perp$. If there are no conjugate points, the transformation $\hat A$ is nondegenerate for all $\tau > \tau_1$. In that case, we will now show that $d^2L[\gamma; t]/d\sigma^2 \le 0$ for all vector functions $t(\tau)$ that satisfy $t(\tau_1) = t(\tau_2) = 0$. This will be sufficient to show that $L[\gamma]$ has a maximum. Since $\hat A^{-1}$ is well-defined, we can set $z(\tau) \equiv \hat A^{-1}(\tau) t(\tau)$; the vector $z(\tau)$ will then belong to the subspace $v^\perp$. We can express $d^2L[\gamma; t]/d\sigma^2$ through $z(\tau)$ as follows,
\[ \begin{aligned}
\frac{d^2}{d\sigma^2} L[\gamma; t] &= \int_{\tau_1}^{\tau_2} g\Big(\hat A z,\; \hat G \hat A z - \frac{d^2 (\hat A z)}{d\tau^2}\Big)\, d\tau \\
&= -\int_{\tau_1}^{\tau_2} \Big[ 2 g(\hat A z, \dot{\hat A} \dot z) + g(\hat A z, \hat A \ddot z) \Big]\, d\tau \\
&= \int_{\tau_1}^{\tau_2} \Big[ g(\dot{\hat A} z, \hat A \dot z) - g(\hat A z, \dot{\hat A} \dot z) + g(\hat A \dot z, \hat A \dot z) \Big]\, d\tau, \qquad (4.3)
\end{aligned} \]
where in the last line we have canceled the total derivative of $g(\hat A z, \hat A \dot z)$. Now let us rewrite the first two terms in the last line in Eq. (4.3) through the adjoint operator $\hat A^T$,
\[ g(\dot{\hat A} z, \hat A \dot z) - g(\hat A z, \dot{\hat A} \dot z) = g\big(z, (\dot{\hat A}^T \hat A - \hat A^T \dot{\hat A})\, \dot z\big). \]
Note that $g(\hat G x, y) = R(v, x, v, y)$, so the known symmetry of the Riemann tensor, $R(a, b, c, d) = R(c, d, a, b)$, means that $\hat G$ is self-adjoint, $\hat G^T = \hat G$. Thus, the operator $\hat A^T(\tau)$ satisfies the equation
\[ \ddot{\hat A}^T - \hat A^T \hat G = 0, \qquad \hat A^T(\tau_1) = 0, \qquad \dot{\hat A}^T(\tau_1) = \hat 1. \]
The operator $\dot{\hat A}^T \hat A - \hat A^T \dot{\hat A}$ is analogous to a Wronskian and is time-independent because
\[ \frac{d}{d\tau} \big( \dot{\hat A}^T \hat A - \hat A^T \dot{\hat A} \big) = \ddot{\hat A}^T \hat A - \hat A^T \ddot{\hat A} = \hat A^T \hat G \hat A - \hat A^T \hat G \hat A = 0. \]
Since the operator $\dot{\hat A}^T \hat A - \hat A^T \dot{\hat A}$ is equal to zero at $\tau = \tau_1$ due to the initial condition $\hat A(\tau_1) = 0$, we have $\dot{\hat A}^T \hat A - \hat A^T \dot{\hat A} = 0$ for all $\tau$. Thus
\[ g(\dot{\hat A} z, \hat A \dot z) - g(\hat A z, \dot{\hat A} \dot z) \equiv 0 \]
and the expression (4.3) is simplified to
\[ \frac{d^2}{d\sigma^2} L[\gamma; t] = \int_{\tau_1}^{\tau_2} g(\hat A \dot z, \hat A \dot z)\, d\tau. \]
Since the vector $\hat A \dot z$ belongs to the subspace $v^\perp$, it is spacelike, hence the above expression is always nonpositive.

Now we consider the second case where a conjugate point is present. We will then show that $L[\gamma]$ does not have a maximum. The presence of a conjugate point means that there exists a Jacobi field $c(\tau)$, not everywhere zero, satisfying $\ddot c = \hat G c$, $c(\tau_1) = 0$, $c(\tau_*) = 0$ for $\tau_1 < \tau_* < \tau_2$. We shall now demonstrate that $d^2L[\gamma; t]/d\sigma^2 > 0$ with a specific choice of the perturbation vector $t$. The vector $t$ will be defined by smoothing the corner of $c(\tau)$ near $\tau = \tau_*$, with a small additional perturbation:
\[ t(\tau) \equiv s(\tau) c(\tau) + \epsilon\, q(\tau)\, a, \]
where $s(\tau)$ is a smooth step-like function such that $s(\tau < \tau_*) \approx 1$, while $s(\tau > \tau_*) \approx 0$; $\epsilon > 0$ is a small constant,
$q(\tau)$ is a smooth function that vanishes at $\tau = \tau_{1,2}$ and is positive for $\tau_1 < \tau < \tau_2$, for example, $q(\tau) = (\tau - \tau_1)(\tau_2 - \tau)$; and $a$ is a constant spacelike vector (to be determined later). The function $s(\tau)$ is chosen to be nonconstant only within a very narrow interval around $\tau = \tau_*$. The idea is to have $\epsilon$ very small and $s(\tau)$ very close to the Heaviside step function, so that $t(\tau)$ is smooth but $t(\tau) \approx c(\tau)$ for $\tau_1 < \tau < \tau_*$ and $t(\tau) \approx 0$ for $\tau_* < \tau < \tau_2$. By construction, $t$ exactly vanishes at the endpoints $\tau = \tau_{1,2}$. With the above definition of $t(\tau)$, we can directly compute the relevant quantities, neglecting unimportant terms of order $\epsilon^2$ (the vector $a$ is covariantly constant along $\gamma$, and $\ddot c = \hat G c$):
\[ \hat G t - \ddot t = \epsilon \big( q(\tau)\, \hat G a - \ddot q(\tau)\, a \big) - 2 \dot s \dot c - \ddot s c, \]
\[ \begin{aligned}
\frac{d^2}{d\sigma^2} L[\gamma; t] &= \int_{\tau_1}^{\tau_2} g(t, \hat G t - \ddot t)\, d\tau \\
&= -\int_{\tau_1}^{\tau_2} g(s c, 2 \dot s \dot c + \ddot s c)\, d\tau - 2\epsilon \int_{\tau_1}^{\tau_2} q(\tau)\, g(a, 2 \dot s \dot c + \ddot s c)\, d\tau + O(\epsilon^2),
\end{aligned} \]
where in the last line we have used the self-adjointness property (derived using integration by parts),
\[ \int_{\tau_1}^{\tau_2} g(x, \hat G y - \ddot y)\, d\tau = \int_{\tau_1}^{\tau_2} g(\hat G x - \ddot x, y)\, d\tau, \]
which holds for vectors $x$, $y$ that vanish at endpoints $\tau = \tau_{1,2}$. Since $\dot s$ and $\ddot s$ are nonzero only in a narrow interval around $\tau = \tau_*$, it is sufficient to approximate
\[ c(\tau) \approx (\tau - \tau_*)\, b, \qquad b \equiv \dot c(\tau_*), \]
\[ t(\tau) \approx (\tau - \tau_*)\, s(\tau)\, b + \epsilon\, q(\tau_*)\, a, \]
where $b$ is a known spacelike vector. (With a suitable choice of the function $s(\tau)$, the error of this approximation will be of order $\epsilon^2$.) Therefore,
\[ \begin{aligned}
\frac{d^2 L[t]}{d\sigma^2} = O(\epsilon^2) &- g(b, b) \int_{\tau_1}^{\tau_2} (\tau - \tau_*) \big( 2 s \dot s + (\tau - \tau_*)\, s \ddot s \big)\, d\tau \\
&- 2 \epsilon\, q(\tau_*)\, g(a, b) \int_{\tau_1}^{\tau_2} \big( 2 \dot s + (\tau - \tau_*)\, \ddot s \big)\, d\tau.
\end{aligned} \]
The step-like function $s$ can be chosen so that the first integral in the above equation is arbitrarily small, e.g. of order $\epsilon^2$. Indeed, integration by parts yields
\[ \int_{\tau_1}^{\tau_2} (\tau - \tau_*) \big( 2 s \dot s + (\tau - \tau_*)\, s \ddot s \big)\, d\tau = -\int_{\tau_1}^{\tau_2} (\tau - \tau_*)^2\, \dot s^2\, d\tau; \]
heuristically, $\dot s(\tau) \approx -\delta(\tau - \tau_*)$ and thus $(\tau - \tau_*)\, \dot s \approx 0$. These statements can be made mathematically precise, for instance, by using an explicit formula for $s(\tau)$. The remaining integral is easily evaluated,
\[ \int_{\tau_1}^{\tau_2} \big( 2 \dot s + (\tau - \tau_*)\, \ddot s \big)\, d\tau = \int_{\tau_1}^{\tau_2} \Big[ \dot s + \frac{d}{d\tau} \big( (\tau - \tau_*)\, \dot s \big) \Big]\, d\tau = -1. \]
Finally, we note that $b \neq 0$, since by assumption $c$ is not everywhere zero and $b = \dot c \neq 0$ at the point $\tau = \tau_*$ where $c = 0$. Also, $q(\tau_*) > 0$. So we can choose the vector $a$ such that
\[ 2 q(\tau_*)\, g(a, b) = 1. \]
With this choice, we obtain
\[ \frac{d^2}{d\sigma^2} L[\gamma; t] = \epsilon + O(\epsilon^2), \]
which will be positive for small enough $\epsilon$. This concludes the proof of Statement 1.

Thus, in the presence of (at least one) conjugate point on a curve $\gamma$, the maximum proper length is not achieved by $\gamma$. For example, a body on a circular orbit in a central gravitational field does not maximize the proper time over one period after it comes back to an initial point $p$, since the elapsed proper time is smaller than that of a body at rest at $p$. In fact, the proper time is maximized by another geodesic, corresponding to a body thrown away from the center of gravity from $p$ such that it comes back to $p$ at the right time.

A related situation is a congruence of timelike geodesics normal to a given spacelike 3-surface $S$. Then this congruence is hypersurface orthogonal everywhere (not just at $S$), by an argument analogous to that of Statement 1 in Sec. 2.4.2. For a point $p$ outside of $S$, there may or may not be a geodesic that maximizes the proper distance between $p$ and $S$. If a maximizing geodesic exists, it must be one of the congruence normal to $S$. However, maximization is impossible if the congruence has focal points between $p$ and $S$. As before, a definition can be given in terms of Jacobi fields: A point $q$ on a geodesic $\gamma$ normal to $S$ is called conjugate to $S$ with respect to $\gamma$ if there exists a Jacobi field $c$ along $\gamma$, such that $c = 0$ at $q$ but $c \neq 0$ on $S$. Similarly to Statement 1 above, one can prove the following.

Statement 2: A timelike curve $\gamma$ maximizes the proper distance between a point $p$ and a spacelike 3-surface $S$ if $\gamma$ is a geodesic normal to $S$ which contains no points conjugate to $S$ before the point $p$.

The focusing theorem now shows that conjugate points must exist under certain conditions.

Statement 3: Suppose that $S$ is a spacelike 3-surface, $u$ is a normalized timelike vector field orthogonal to $S$, such that ${\rm div}\,u \le -\theta_0 < 0$ at some point $p \in S$, and the strong energy condition holds, ${\rm Ric}(x, x) \ge 0$ for all timelike $x$. Then the geodesic normal to $S$ at the point $p$ contains a conjugate point to $S$ within proper time $3/\theta_0$ from $p$.

Under the conditions of Statement 3, the geodesic $\gamma$ normal to $S$ at $p$ that stretches up to a point $q$ which is further than $3/\theta_0$ from $p$ (as measured along $\gamma$), cannot maximize the proper length between $q$ and $S$. Our conclusion is the following.

Statement 4: Suppose the conditions of Statement 3 hold, and additionally ${\rm div}\,u \le -\theta_0 < 0$ everywhere on the surface $S$. If a point $q$ can be connected to $S$ by a timelike curve of proper length $\ge 3/\theta_0$ then there is no timelike curve of maximum length connecting $q$ and $S$.

Proof: If a timelike curve $\gamma$ of maximal proper length exists and reaches $S$ at a point $p$ then $\gamma$ must be a timelike geodesic that is orthogonal to $S$ at $p$ (or else we can deform $\gamma$ or shift the point $p$ and increase the proper length). However, if the proper length of $\gamma$ is larger than $3/\theta_0$, there exists a point on $\gamma$ conjugate to $S$, and thus $\gamma$ does not maximize the proper length.

An example of this situation is a half-hyperboloid $t^2 - r^2 = 1$, $t < 0$, in the Minkowski spacetime. This spacelike 3-surface is orthogonal to every straight line (i.e. to every geodesic) passing through the origin. Thus, the origin is the focal point for all the geodesics emitted orthogonally from $S$. The half-lightcone, $t^2 - r^2 = 0$, $t < 0$, bounds the domain where conjugate points do not arise. For any point $q$ outside this half-lightcone, $t^2 - r^2 < 0$, $t \le 0$, and for any point in the upper half-spacetime, $t > 0$, one can find timelike curves of arbitrarily large proper length leading from $q$ to some point on $S$. Thus, the maximal-length curve connecting some point $q$ to $S$ must have proper length $< 1$ between $q$ and $S$.

Calculation: Compute the divergence of the future-directed normal vector field $u$ to the half-hyperboloid $t^2 - x^2 - y^2 - z^2 = 1$, $t < 0$, in the Minkowski spacetime, and show that this 3-surface satisfies the conditions of Statement 4 as well as its conclusion.

Solution: The hyperboloid is specified by the equation $g(x, x) = 1$, where $x \equiv (t, x, y, z)$ is the position vector. The future-directed normal vector field is
\[ u = -\frac{x}{\sqrt{g(x, x)}}, \]
and the Levi-Civita connection is $\nabla = \partial$, so we have $\nabla_a x = a$ and thus (on the surface)
\[ \nabla_a u = -\nabla_a \frac{x}{\sqrt{g(x, x)}} = -a + x\, g(a, x). \]
The divergence is found as
\[ {\rm div}\,u = {\rm Tr}_{(a,b)}\, g(\nabla_a u, b) = {\rm Tr}_{(a,b)} \left[ g(a, x) g(b, x) - g(a, b) \right] = 1 - 4 = -3. \]
So we may set $\theta_0 = 3$ and apply Statement 4. The proper length to the focus is $3/\theta_0 = 1$, in agreement with the geometric result.

Finally, we state an analogous theorem for null geodesics. The differences between the timelike and the null case are purely technical: 1) Jacobi fields for null geodesics $n$ are effectively two-dimensional because connecting vectors differing by a multiple of $n$ are equivalent. 2) Null geodesics containing conjugate points do not maximize the proper length between endpoints, thus the maximal proper length is positive, and there is a timelike curve connecting the endpoints. 3) One considers a spacelike 2-surface to which null geodesics are orthogonal (note that there are two null directions orthogonal to a spacelike 2-surface, but no null vectors orthogonal to a spacelike 3-surface). 4) One uses the null energy condition instead of the strong energy condition. 5) The focusing theorem for null geodesics predicts focusing within parameter interval $2/\theta_0$, instead of $3/\theta_0$ in the timelike case.

Statement 5: Suppose that $S$ is a spacelike 2-surface, $n$ is a null vector field orthogonal to $S$, such that ${\rm div}\,n \le -\theta_0 < 0$ at a point $p \in S$, and the null energy condition holds, ${\rm Ric}(x, x) \ge 0$ for all null $x$. Then the null geodesic $\gamma$ normal to $S$ at $p$, with initial tangent vector $n$, contains a conjugate point to $S$ within the affine parameter interval $2/\theta_0$. If the geodesic $\gamma$ can be extended beyond the conjugate point to some point $q$ then $\gamma$ does not maximize the proper length between $p$ and $q$, and thus $q$ can be connected to $p$ by a timelike curve.

4.1.5 Singularity in collapsing or expanding universe

We shall now derive some basic singularity theorems. Both theorems below are due to S. W. Hawking. (Stronger theorems exist but require much more technical work.)

An intuitive picture of a gravitational collapse is that of a cloud of particles that keep moving closer to each other. The mathematical expression of this property is that the congruence of the particle worldlines has a negative divergence. According to the focusing theorem, a consequence of negative divergence is a focusing of the geodesics. As a result, the geodesics fail to maximize proper length. This is the technical basis of the singularity theorems. Of course, the geodesic focusing by itself does not signal any singularities; we need additional conditions on the initial velocities of the particles to conclude that singularities will develop. In the first singularity theorem, these conditions are formulated as requirements on a spacelike 3-surface to which the particle worldlines are orthogonal.

Recall that a Cauchy surface $\Sigma$ for a domain is a 3-surface such that any inextendible timelike curve within the domain intersects $\Sigma$ exactly once. The intuitive meaning of a Cauchy surface is that a partial differential equation has a unique solution throughout the domain if appropriate initial data are specified on $\Sigma$. The theorem states that a Cauchy surface having an everywhere negative divergence leads to a singularity.

Statement 1: If $\Sigma$ is a spacelike Cauchy surface for the entire spacetime, with a normal vector field $u$ whose divergence is everywhere negative-bounded, ${\rm div}\,u \le -\theta_0 < 0$, and if the strong energy condition holds, then every timelike geodesic normal to $\Sigma$ has finite proper length no greater than $3/\theta_0$.

Sketch of a proof: Consider a point $p$ to the future of $\Sigma$. Every timelike geodesic leading from $p$ to $\Sigma$ must intersect $\Sigma$ exactly once. The points of intersection form an open set $I(\Sigma; p)$ bounded by intersection points with null geodesics from $p$ to $\Sigma$. Timelike geodesics intersecting $\Sigma$ near the boundary of $I(\Sigma; p)$ have very small proper length, so we may consider the subset $I(\Sigma; p; L) \subset I(\Sigma; p)$ of intersection points for curves with proper length $\ge L$ for some lower bound $L$. We can select $L$ such that $I(\Sigma; p; L) \neq \emptyset$, and then the subset $I(\Sigma; p; L)$ is compact. (These intuitively clear topological statements will be given a formal proof in Lemma 1 below.) Therefore, the proper time is maximized by some geodesic curve leading from $p$ to $\Sigma$. However, by Statement 4 in the previous section, we know that there are no curves maximizing the proper length and stretching further than $3/\theta_0$. Therefore the spacetime cannot include points at a proper length greater than $3/\theta_0$ from $\Sigma$. In other words, every timelike geodesic emitted from $\Sigma$ must be future incomplete.

Lemma 1: Let $I(\Sigma; p; L)$ be the set of intersection points of timelike geodesics leading from a point $p$ to a Cauchy surface $\Sigma$, whose proper length is $\ge L$, where $L > 0$ is a given constant. Then $I(\Sigma; p; L)$ is a compact set.

Proof: Suppose that the set $I(\Sigma; p; L)$ is not compact. Then there exists a sequence $\{\gamma_j\}$ of geodesics, whose proper lengths are $\ge L$, such that the corresponding intersection points $q_1$, $q_2$, ..., on $\Sigma$ constitute a sequence without accumulation points within $I(\Sigma; p; L)$. However, the initial tangent vectors $\dot\gamma_j$ at $p$ are within the past lightcone within $T_p M$, which is a compact set if we also include the null directions. Hence, the sequence $\dot\gamma_j$ of tangent vectors must have an accumulation point $\dot\gamma_*$ within the past lightcone. We can emit a geodesic $\gamma_*(\tau)$ from $p$ with the initial tangent vector $\dot\gamma_*$. If $\gamma_*(\tau)$ intersects $\Sigma$ at a point $q_*$ then $q_*$ must be an accumulation point for $\{q_j\}$. Hence the proper length of $\gamma_*$ cannot be less than $L$ and so $q_* \in I(\Sigma; p; L)$. This yields a contradiction with
the assumption that $\{q_j\}$ has no accumulation points within $I(\Sigma; p; L)$. Thus $\gamma_*(\tau)$ cannot intersect $\Sigma$. Since every timelike curve intersects $\Sigma$, we see that the vector $\dot\gamma_*$ must be null and $\gamma_*(\tau)$ is a null geodesic. Since $\gamma_*(\tau)$ does not intersect $\Sigma$, for any $\tau$ there exists a neighborhood of $\gamma_*(\tau)$ that does not intersect $\Sigma$. It is then possible to find a timelike curve (which is not necessarily a geodesic, not necessarily starting at $p$) that runs along $\gamma_*(\tau)$ for all $\tau$ and always remains away from $\Sigma$. The existence of an inextendible timelike curve that never crosses $\Sigma$ contradicts the definition of a Cauchy surface. This concludes the proof.

An interpretation of the first theorem is the following: If we see a universe that is everywhere contracting (everywhere along a single Cauchy surface), we should expect to hit a singularity within a finite time. By inverting the direction of time, we also come to the conclusion that a Cauchy surface with everywhere positive divergence contains a singularity in its past. Thus, if our universe is everywhere expanding, and if the strong or the null energy condition is satisfied, then the universe must have begun at a singularity at a finite time in the past.

Self-test question: The half-hyperboloid $t^2 - x^2 - y^2 - z^2 = 1$, $t < 0$, in the Minkowski spacetime, is an infinite spacelike surface whose normal vector $-x$ has an everywhere negative divergence. Does it follow from Statement 1 that the Minkowski spacetime must contain a singularity to the future of the half-hyperboloid?
Answer: No. (The half-hyperboloid is not a Cauchy surface.)

4.1.6 Singularity in a closed universe

The assumption that a given 3-surface $\Sigma$ is a Cauchy surface for the spacetime is quite strong, since it is a global relationship between $\Sigma$ and the entire spacetime, and not merely a local property of the surface $\Sigma$. One can replace this assumption by the requirement that $\Sigma$ be compact (this will apply to closed universes). The result will be the incompleteness of some geodesics emitted from $\Sigma$, which may fill only a part of the entire spacetime. Here is how this conclusion can be derived.

We restrict our attention to the part of the spacetime consisting of points $p$ for which there exists a timelike curve between $\Sigma$ and $p$. This part of the spacetime is called the domain of dependence $D(\Sigma)$ of the surface $\Sigma$. We would like to prove that $D(\Sigma)$ contains some incomplete geodesics if $\Sigma$ satisfies the conditions of Statement 1 above, except for being a Cauchy surface. It follows from the proof of Statement 1 that, within $D(\Sigma)$, a geodesic connecting $\Sigma$ to a point $p$ at proper length $> 3/\theta_0$ does not maximize the proper length. Therefore, any geodesic that does maximize the proper length must be shorter than $3/\theta_0$.

Let us denote by $C(\Sigma) \equiv \partial D(\Sigma)$ the boundary of the domain of dependence; it is a 3-surface called the Cauchy horizon. It can be seen that the Cauchy horizon $C(\Sigma)$ is locally a null surface (although it may also contain some corners). The fact that a boundary $\partial D$ of a domain of dependence $D$ is a null 3-surface will be derived below (see the proof of Statement 1 in Sec. 4.2). There are now two interesting possibilities: either $C(\Sigma)$ is compact, or it is not compact. Suppose first that $C(\Sigma)$ is a compact set. Note that a null surface is generated by null geodesics (see Sec. 2.2.8). Then the null generators of $C(\Sigma)$ must wind around it, coming arbitrarily near to themselves infinitely many times; informally, we may call such curves almost closed curves. Since timelike geodesics can be found arbitrarily close to null geodesics, the presence of almost closed null geodesics is an undesirable feature of the spacetime, similar to a causality violation or the presence of a time machine. A spacetime is called strongly causal if it does not admit timelike curves that repeatedly come arbitrarily close to some points. Therefore, $C(\Sigma)$ cannot be compact in a strongly causal spacetime. Now suppose $C(\Sigma)$ is not compact; then $C(\Sigma)$ contains a sequence of points $q_1$, $q_2$, ..., which does not have an accumulation point within $C(\Sigma)$. We may choose a sequence of points $\{\tilde q_j\}$ arbitrarily close to $\{q_j\}$ but within $D(\Sigma)$. Then, every point $\tilde q_j$ lies on at least one timelike curve intersecting $\Sigma$. Suppose that $L$ is the length of that curve; then the set of all timelike curves with length $\ge L$ is compact, so one of these curves has the largest proper length. So, for each point $\tilde q_j$ we have a timelike geodesic $\gamma_j(\tau)$ that has the maximal proper length among all the timelike curves connecting $\tilde q_j$ with $\Sigma$. (We have already shown that each of these curves has proper length $\tau_{\max} < 3/\theta_0$, assuming that $\gamma_j(0)$ is on $\Sigma$.) Let $p_j$ be the point where the curve $\gamma_j$ intersects $\Sigma$, and let $u_j \equiv \dot\gamma_j(0)$ be the initial tangent vector. The intersection points $p_1$, $p_2$, ..., and the corresponding initial tangent vectors $u_1$, $u_2$, ..., are sequences on a compact set $\Sigma$ and thus must accumulate at some point $p_* \in \Sigma$ and vector $u_*$. Now, we can show that the geodesic $\gamma_*(\tau)$ emitted from $\Sigma$ at $p_*$ with the initial tangent vector $u_*$ cannot be extended past the proper time $3/\theta_0$. If $\gamma_*(\tau)$ were well-defined for $\tau > 3/\theta_0$ then we would have a narrow tube-shaped neighborhood stretching along $\gamma_*(\tau)$ for $0 < \tau \le 3/\theta_0$, such that infinitely many geodesics $\gamma_j$ are inside that tube. Thus the point sequences $\{\tilde q_j\}$ and therefore $\{q_j\}$ would accumulate somewhere within the (compact) neighborhood of $\gamma_*$. This contradicts the assumption that the sequence $\{q_j\}$ has no accumulation points. Thus we have proved the following theorem.

Statement 2: If a spacetime contains a compact spacelike surface $\Sigma$ (without boundary) having a normal vector $u$ such that ${\rm div}\,u < 0$ everywhere on $\Sigma$ (thus there exists $\theta_0 > 0$ such that ${\rm div}\,u \le -\theta_0$), and if the strong energy condition holds, and if the spacetime is strongly causal (see footnote 6; it contains no almost-closed null geodesics), then there is at least one incomplete timelike geodesic emitted from $\Sigma$.

Footnote 6: S. W. Hawking also proved a version of this theorem without the strong causality condition. Reference?

A universe can contain a compact, edgeless spacelike 3-surface only if the universe is closed. In open universes, every spacelike 3-surface without boundary must stretch to infinity and so cannot be compact. The interpretation of the second singularity theorem is that a closed universe must contain a singularity if there exists an everywhere contracting spacelike section.

Note that this theorem demonstrates only the existence of a single incomplete timelike geodesic, whereas Statement 1 of the previous section showed (under stronger assumptions) that every timelike geodesic is incomplete. The presence of incomplete geodesics emitted from $\Sigma$ means that the spacetime contains a singularity somewhere to the future of $\Sigma$.

Analogous statements can be derived for null geodesics using the null energy condition.

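The Calculation in Sec. 4.1.4 found ${\rm div}\,u = -3$ for the future-directed unit normal $u = -x/\sqrt{g(x,x)}$ of the half-hyperboloid $t^2 - x^2 - y^2 - z^2 = 1$, $t < 0$. A quick finite-difference check in Cartesian Minkowski coordinates, where the divergence is simply the sum of the partial derivatives $\partial u^\mu / \partial x^\mu$, can be sketched as follows. The helper names and the step size `h` are assumptions made for this illustration.

```python
import math


def normal_field(p):
    """u = -x / sqrt(g(x, x)) for the Minkowski metric with signature (+,-,-,-)."""
    t, x, y, z = p
    norm = math.sqrt(t * t - x * x - y * y - z * z)
    return [-component / norm for component in p]


def divergence(field, p, h=1e-5):
    """Central-difference estimate of sum over mu of d(u^mu)/d(x^mu) at the point p."""
    total = 0.0
    for mu in range(4):
        plus = list(p)
        minus = list(p)
        plus[mu] += h
        minus[mu] -= h
        total += (field(plus)[mu] - field(minus)[mu]) / (2.0 * h)
    return total


def hyperboloid_point(x, y, z):
    """Point of the half-hyperboloid t^2 - x^2 - y^2 - z^2 = 1 with t < 0."""
    return [-math.sqrt(1.0 + x * x + y * y + z * z), x, y, z]


for xyz in [(0.0, 0.0, 0.0), (0.3, -1.2, 0.5), (2.0, 1.0, -3.0)]:
    p = hyperboloid_point(*xyz)
    print(xyz, divergence(normal_field, p))
```

Each printed value should be close to $-3$ regardless of the point chosen on the surface, so $\theta_0 = 3$ and the proper length to the focus is $3/\theta_0 = 1$, as found analytically.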

4.1.7 Singularity in gravitational collapse

Qualitatively, gravitational collapse occurs when a mass $m$ is concentrated within a region smaller than the Schwarzschild radius $R_s \equiv 2m$. The collapse results in a singularity, regardless of the composition of the collapsing matter.

In this section we shall consider a singularity theorem due to Penrose, showing that the formation of a singularity is inevitable once the normally outgoing lightrays start to converge. Let us investigate this condition in more detail.

There exist precisely two null directions orthogonal to a 2-dimensional subspace spanned by two spacelike vectors. Thus, a spacelike 2-surface can emit two congruences of lightrays orthogonal to it. Normally, one congruence will be converging (the ingoing lightrays) and the other diverging (the outgoing lightrays). A 2-surface $T$ is called a trapped surface if both congruences of lightrays emitted orthogonally to $T$ into the future have a negative divergence.

Penrose's argument begins by considering a spherically symmetric distribution of matter. Suppose at first that a mass $m$ is concentrated within a region of radius $r_0 < 2m$ in a spherically symmetric fashion. Then the metric outside of the matter distribution is Schwarzschild, and it is easy to see that at points $r$ such that $r_0 < r < 2m$, both lightcones emitted into the future direction are converging (have negative divergence). Thus a sphere of radius $r_1$ is a compact, spacelike, trapped 2-surface for $r_0 < r_1 < 2m$. We shall shortly prove that the existence of such a trapped 2-surface indicates the presence of a singularity in the future. Now, if the matter distribution is slightly nonspherical, the sphere will remain a trapped surface because the divergence of the lightrays will be only slightly perturbed and will not become positive. Thus a singularity arises even for asymmetric configurations of collapsing matter.

The precise formulation of the theorem is the following.

Statement 1:$^7$ If a time-orientable spacetime manifold $M$ has a non-compact Cauchy surface $C$, if the null energy condition holds to the future of $C$, and if there exists a compact, spacelike, trapped 2-surface $T$, then there exists at least one incomplete null geodesic to the future of $T$.

$^7$ [12], §8.2, Theorem 1. See also the paper [24].

Proof: Consider the future subdomain $F(T)$ of the surface $T$; this subdomain consists of all the points $p$ that can be connected to $T$ by some timelike curve. The proof of the theorem is based on studying the properties of the boundary $\partial F(T)$ of the subdomain $F(T)$. By considering geodesics near $T$, it is easy to see that the 3-surface $\partial F$ starts from $T$ as a null surface made by null geodesics emitted orthogonally from $T$. Since $T$ is compact, the everywhere negative divergence of null geodesics emitted from $T$ has an upper bound $\theta_0 < 0$. Thus, by the focusing theorem, every null geodesic will have a conjugate point to $T$ within a finite interval of the affine parameter (not larger than $2|\theta_0|^{-1}$). Let us assume that every null geodesic to the future of $T$ is complete; in particular, extendable beyond the parameter interval $2|\theta_0|^{-1}$. We know that a geodesic does not maximize proper length beyond a conjugate point; therefore, points $p$ on null geodesics beyond $2|\theta_0|^{-1}$ can be connected with $T$ by a timelike geodesic with positive proper length. Thus, points on null geodesics beyond $2|\theta_0|^{-1}$ are inside $F(T)$, and so the boundary $\partial F$ consists of finite segments of null geodesics, each segment not longer than $2|\theta_0|^{-1}$. Since $T$ is compact, the set $\partial F$ can be viewed as a subset of a compact set $T \times [0, 2|\theta_0|^{-1}]$, therefore $\partial F$ is itself compact. Since $M$ is time-orientable, there exists a smooth vector field $v$ such that $g(v, v) = 1$. By definition of the Cauchy surface, every flow line of $v$ intersects $C$ exactly once. Also, a flow line of $v$ can intersect the 3-surface $\partial F$ at most once; this follows from the definition of $\partial F$ as the boundary of the set $F(T)$. Thus, the flow lines of $v$ define a map from $\partial F$ to $C$: a point $p \in \partial F$ is mapped into the intersection of the corresponding timelike curve with $C$. This map is continuous and one-to-one between $\partial F$ and its image; however, $\partial F$ is compact while $C$ is not, which is an impossible situation. (The image of $\partial F$ must be a compact subset of $C$, so it must have a nonempty boundary within $C$ which will be mapped to a boundary of $\partial F$, and yet the boundary of $\partial F$ is empty since $\partial\partial = 0$.) Therefore we must reject the assumption that all null geodesics can be extended beyond $2|\theta_0|^{-1}$; in other words, there must exist an incomplete null geodesic to the future of $T$.

Calculation: Using the Schwarzschild metric (1.36), compute the affine tangent vector $n$ for radial null geodesics emitted outwards from a sphere of radius $r$. Show that, with the normalization of $n$ chosen in the solution below, the divergence of $n$ is $2r^{-1}$ for $r > 2m$, zero for $r = 2m$, and $-2r^{-1}$ for $r < 2m$.

Hint: In the Schwarzschild metric, a radial null geodesic has an affine tangent vector of the form $n = \alpha_1 \partial_t + \alpha_2 \partial_r$, where $\alpha_{1,2}$ are some functions of $r$ only. Lightrays emitted outwards at $r = 2m$ have the tangent vector in the direction $\partial_t$. For $r < 2m$, the radial coordinate has the meaning of time, and the future is the direction of decreasing values of $r$ (because, e.g., the proper time of a timelike observer falling into the black hole will increase with decreasing $r$).

Solution: We begin with the case $r = 2m$. The Schwarzschild metric is ill-defined at $r = 2m$, so we shall use a geometric argument instead of explicit calculations: The null vector $\partial_t$ is divergence-free because the cross-section area of the null surface $r = 2m$ is $4\pi r^2$ and thus remains constant along $\partial_t$ (see Sec. 2.1.2 and the example in Sec. 2.2.5). Now consider the case $r \neq 2m$. To compute the divergence of $n$, it is convenient to use a basis $\{l, n, e_\theta, e_\phi\}$, where $l$ and $n$ are null and $g(l, n) = 1$, while the spacelike basis vectors are $e_\theta = r^{-1}\partial_\theta$, $e_\phi = (r \sin\theta)^{-1}\partial_\phi$, in spherical coordinates. The vector $n$ should be an outward radial geodesic while $l$ points along the inward lightray (but is not necessarily an affine tangent vector to it). After we compute the vectors $n$ and $l$, it will follow that
\[
\begin{aligned}
\operatorname{div} n &= \mathrm{Tr}_{(x,y)}\, g(\nabla_x n, y) \\
&= g(\nabla_n n, l) + g(\nabla_l n, n) - g(\nabla_{e_\theta} n, e_\theta) - g(\nabla_{e_\phi} n, e_\phi) \\
&= g([n, e_\theta], e_\theta) + g([n, e_\phi], e_\phi).
\end{aligned}
\]
Now let us determine $n$ and $l$. Denote by
\[
z(r) \equiv \left(1 - \frac{2m}{r}\right)^{1/2}
\]
the redshift factor for the Schwarzschild spacetime (see Sec. 3.1.2). Assuming that
\[
n = \alpha_1 \partial_t + \alpha_2 \partial_r, \qquad l = \beta_1 \partial_t + \beta_2 \partial_r,
\]
where $\alpha_{1,2}$ and $\beta_{1,2}$ are unknown scalar functions, we use the conditions
\[
g(l, l) = g(n, n) = 0, \qquad g(l, n) = 1,
\]


and find
\[
z^2 \alpha_1^2 = \frac{1}{z^2}\alpha_2^2, \qquad
z^2 \beta_1^2 = \frac{1}{z^2}\beta_2^2, \qquad
z^2 \alpha_1 \beta_1 - \frac{1}{z^2}\alpha_2 \beta_2 = 1;
\]
thus $\alpha_{1,2}$ and $\beta_{1,2}$ may be chosen as functions only of $r$. Further, we have
\[
[l, n] = \beta_2 \left(\alpha_1' \partial_t + \alpha_2' \partial_r\right) - \alpha_2 \left(\beta_1' \partial_t + \beta_2' \partial_r\right),
\]
where the prime denotes derivatives with respect to $r$. Using the condition
\[
0 = g(\nabla_n n, l) = -g(n, \nabla_n l) = g(n, [l, n]),
\]
we find
\[
z^2 \alpha_1 \left(\alpha_1' \beta_2 - \alpha_2 \beta_1'\right) - \frac{1}{z^2}\alpha_2 \left(\alpha_2' \beta_2 - \alpha_2 \beta_2'\right) = 0.
\]
Thus the four equations determine the unknown functions $\alpha_{1,2}$, $\beta_{1,2}$. A solution is (up to a constant)
\[
n = \frac{1}{z^2}\partial_t \pm \partial_r, \qquad l = \frac{1}{2}\left(\partial_t \mp z^2 \partial_r\right),
\]
where $\pm$ corresponds to the sign of $z^2$. Finally, we compute the commutators
\[
[n, e_\theta] = [\pm\partial_r, e_\theta] = \left[\pm\partial_r, \frac{1}{r}\partial_\theta\right] = \mp\frac{1}{r} e_\theta,
\]
\[
[n, e_\phi] = \left[\pm\partial_r, \frac{1}{r \sin\theta}\partial_\phi\right] = \mp\frac{1}{r} e_\phi,
\]
and therefore (recall that $g(e_\theta, e_\theta) = g(e_\phi, e_\phi) = -1$)
\[
\operatorname{div} n = \pm\frac{2}{r}.
\]
This means that the divergence of an outward-pointing, future-directed null congruence is positive for $r > 2m$ and negative for $r < 2m$.

4.2 Hawking's area theorem

We have seen in Sec. 2.2.5 that the event horizon of a Schwarzschild black hole is a null surface. In more general situations, e.g. in the presence of several black holes that may move with respect to each other or even merge with each other, the event horizon is still a null surface. In this section we shall study the global properties of event horizons.

Consider an asymptotically flat spacetime that contains some points from which signals cannot escape to infinity.$^8$ Heuristically, the set $B$ of all such points is interpreted as a "black hole"-like region. For a point $p \notin B$, there exists a timelike curve connecting $p$ to infinity; this curve may be perturbed while remaining timelike, thus there exists a neighborhood of $p$ which is not in $B$. Thus the set $B$ is topologically closed. The event horizon $H$ is defined as the boundary of $B$; since $B$ is closed, we have $H \subset B$. The horizon $H$ is a 3-surface that separates $B$ from the domain of spacetime from which timelike curves can escape to infinity (and hence, null curves can escape as well). Thus, the horizon $H$ can also be thought of as the boundary of the past domain of dependence of the future null infinity $I^+$. Let us study the properties of $H$ in more detail.

$^8$ Note that the notion of "escaping to infinity" is well-defined if the spacetime is asymptotically flat (a "neighborhood of infinity" is the place where the spacetime becomes almost Minkowski, i.e. a neighborhood of $I^+$ on a conformal diagram). Generalizations of this notion exist for non-asymptotically flat cases, such as the de Sitter spacetime.

The first important property of an event horizon is stated by a theorem due to R. Penrose.

Statement 1: An event horizon is a null surface whose generators are null geodesics without future endpoints, except if a generator hits a singularity. (Generators cannot leave $H$ and enter the interior of $B$.)

Proof: By construction, there cannot exist any future-directed timelike curves crossing $H$ from its interior $B$ outwards; also, for any point $p \notin B$ there should exist at least one timelike curve not crossing $H$. It follows from these requirements that the 3-surface $H$ cannot be locally timelike (i.e. spanned by one timelike and two spacelike directions). Also, $H$ cannot be spacelike, because a spacelike 3-surface separates a past subdomain from a future subdomain, so the future of $H$ would have to be within $B$, but then some points to the past of $H$ will be unable to emit signals that do not enter $B$. Therefore, $H$ must be a null surface, except for points where $H$ has a "corner" so the tangent 3-space to $H$ is undefined. A null surface locally separates a past subdomain from a future subdomain, and it is clear that $B$ must be on the future side from $H$. We also know that a null surface is generated by null geodesics (see Sec. 2.2.8). Suppose that a null generator $\gamma$ had an endpoint $p$, and consider the causal structure of the tangent space at $p$. We are assuming that $p$ is not a singularity, so that $\gamma$ can be continued past $p$. It is clear that $\gamma$ cannot exit $B$; thus, since $\gamma$ leaves $H$, it must enter the interior of $B$. There exists a point $p' \notin B$ infinitesimally displaced to the past of $\gamma$, and there exists at least one null curve $\gamma'$ starting from $p'$ which escapes to infinity (or else we would have no timelike curves from $p'$ escaping to infinity, contradicting the assumption $p' \notin B$). Therefore, $\gamma'$ can never cross $H$. However, the curve $\gamma'$ is arbitrarily close to $\gamma$, hence it enters $B$. This contradiction proves that $\gamma$ cannot have an endpoint.

The condition that singularities do not occur at the horizon is motivated as follows. We expect that the future history of an asymptotically flat spacetime is completely predictable from initial data on a Cauchy 3-surface $C$ preceding the gravitational collapse. Thus, any null curve escaping to infinity must intersect $C$ at one point. (This condition is called asymptotic predictability with respect to $C$.) If a null generator $\gamma$ of a horizon $H$ stops at a singularity $p$, it means that the curve $\gamma$ cannot be continued past $p$. Since there exist null curves outside $B$ infinitely close to $\gamma$ and reaching infinity, it follows that there exists a (null and/or spacelike) curve $\gamma'$ that starts at the singularity $p$ and reaches infinity. To an observer at infinity, this curve appears to be a past-directed curve that stops at $p$. Thus, the curve $\gamma'$ does not intersect $C$ and the asymptotic behavior at infinity in the direction of $\gamma'$ cannot be predicted from initial data on $C$. Another way to express this is to say that the singularity is directly visible to an observer at infinity. There is a (so far, not proved but strongly supported) conjecture that any singularities that can occur in reasonable physical processes, such as gravitational collapse, must be hidden behind event horizons and are thus invisible to observers at infinity. This statement is called the cosmic censorship conjecture ("nature abhors naked singularities"). There are some artificial examples of spacetimes with naked singularities, but all such examples are pathological in one or


another way. Therefore, the assumption of asymptotic predictability appears reasonable.

Let us now examine the area of the event horizon. To be precise, we need to define what it means to consider an event horizon $H$ "at a certain time." Let us select a global spacelike Cauchy 3-surface $C$ in the entire spacetime. The surface $C$ may be interpreted as "the surface of constant time." The intersection $B \cap C$ of the black hole domain $B$ with $C$ will be the black hole interior at a given time. (The intersection $B \cap C$ may consist of several disconnected pieces, in which case we would say that there are several black holes at this time.) The boundary $\partial(B \cap C)$ will be a spacelike 2-surface $H \cap C$; this is the total event horizon at a given time. We have seen in Sec. 2.1.2 that the cross-section area $A$ of a null 3-surface is defined independently of the observer's reference frame, and satisfies Eq. (2.6),
\[
n \circ A = (\operatorname{div} n)\, A,
\]
if the null 3-surface has generators $n$. In Sec. 2.2.5 we showed that the Schwarzschild horizon is a null surface whose generators are $n = \partial_t$. Since the total area of the Schwarzschild horizon in the stationary reference frame is time-independent, $A = 4\pi r^2$, the field $\partial_t$ has zero divergence. In a general, nonstationary spacetime containing black holes, the area of a horizon $H$ may change with time (i.e. with the choice of the Cauchy surface $C$), and thus the divergence of the null generators $n$ of $H$ may be nonzero. However, it turns out that the divergence of $n$ cannot be negative, so the area cannot diminish with time. This is the essential content of the area theorem due to S. W. Hawking. Let us first derive the property $\operatorname{div} n \geq 0$ and then prove the area theorem.

Statement 2: The divergence of null generators of an event horizon $H$ is everywhere nonnegative, if the null energy condition is satisfied and the spacetime is asymptotically predictable.

Proof: We use the notation from the proof of Statement 1. If $\operatorname{div} n < 0$ at a point $p$ of $H$ then $\operatorname{div} n < 0$ also within a neighborhood of $p$. By the focusing theorem, the null generators of $H$ emitted from the neighborhood of $p$ will focus within a finite interval of the affine parameter. After focusing, some point $q$ on a null geodesic $\gamma$ emitted from $p$ will be reachable from $p$ by a timelike curve. A timelike curve starting at $p$ must be completely within $B$ (or else this curve will escape to infinity, but $p \in B$). Thus, the point $q$ will be inside $B$ and not on the horizon surface $H$ any more. Thus, a null geodesic $\gamma$ emitted from $p$ will have to enter $B$ within a finite interval of the affine parameter; the part of $\gamma$ within $H$ will then have an endpoint where $\gamma$ enters the interior of $B$. This contradicts Statement 1.

Statement 3: The total area of all the event horizons cannot diminish with time, under the conditions of Statements 1 and 2. In other words: For a Cauchy surface $C_1$ and another Cauchy surface $C_2$ to the future of $C_1$, the area of the intersection does not decrease, $A[H \cap C_2] \geq A[H \cap C_1]$.

Proof: Consider first the intersection $H \cap C_1$. A portion of $H$ between $C_1$ and $C_2$ will be generated by null geodesics emitted from $H \cap C_1$ to the future. Since each generator has no endpoints on $H$, it must intersect $C_2$ by the Cauchy surface property of $C_2$. The intersection of the generators with $C_2$ will have an area not smaller than $A[H \cap C_1]$, since these geodesics have everywhere nonnegative divergence. There is also a possibility that other disconnected pieces of $H$ will have formed between $C_1$ and $C_2$; this would only increase the total area $A[H \cap C_2]$. Therefore, $A[H \cap C_2] \geq A[H \cap C_1]$.

The area theorem has had a wide-reaching impact on all of fundamental physics. It has led to the hypothesis of black hole entropy,
\[
S = \frac{1}{4} A = 4\pi m^2,
\]
and to the holographic principle. Let us consider a simple application of the area theorem. In an asymptotically flat spacetime containing two distant black holes with masses $m_1$ and $m_2$, there may be some process that lets the two black holes come closer together and eventually merge. Initially, the total area of the event horizon of this spacetime is $16\pi(m_1^2 + m_2^2)$. After merging, the spacetime contains a larger black hole of mass $M$, with the area of the event horizon $16\pi M^2$. The area theorem says that $16\pi M^2 \geq 16\pi(m_1^2 + m_2^2)$ regardless of the details of the merger, and regardless of whether any extra matter or radiation was absorbed or emitted in the process of merging. This inequality sets a fundamental bound on the amount of energy that can be extracted from a black hole merger.

4.3 Holographic principle

See the review [5].

In the lecture, I covered: general ideas behind the Generalized Second Law of thermodynamics, Bekenstein's black hole entropy, Susskind's collapse argument for spacelike bound, problems with spacelike bounds (closed universes, expanding universes, almost-null spacelike surfaces), Bousso's covariant entropy bound formulated using lightsheets, and Bousso's projection theorem.

A rough map of ideas: Black holes must contain entropy for the 2nd law to hold. Hawking's area theorem supports the identification of entropy and the horizon area. Assuming that a black hole is the highest-entropy state, we obtain a bound on the entropy of matter within a (spacelike) 2-sphere. Since entropy counts the internal states, there is "enough space" on the boundary of a region to encode all the possible states within the region; hence the term "holography." Bousso's covariant entropy bound is a generalization of this bound that remains valid in more complicated situations, and yields the spacelike bound as a particular case if appropriate conditions hold. Presently, the holographic principle has the status of a strongly supported conjecture.

5 Variational principle
Additional literature: [34].

Under variational principles we understand considerations based on the variation of functionals. The commonly used variational principle states that the equations of motion for the fields are derived by extremizing a certain functional of the fields called the action or the Lagrangian. The Hamiltonian formulation of field dynamics is then derived from the Lagrangian formulation. In this chapter we shall examine Lagrangian and Hamiltonian formulations of general relativity and explore some related issues.

5.1 Lagrangian formulation

5.1.1 Classical field theory

Classical field theory describes fields, i.e. functions on spacetime $M$ or sections of some vector bundle with the spacetime $M$ as the base. The fields under consideration may be scalar fields $\phi_1$, $\phi_2$, ...; vector fields, spinor fields, and so on. Let us denote all the fields collectively by $\phi^j$, where $j$ is an index enumerating each individual field and each of their components.

Field theory is based on the variational principle (or Hamilton's principle of stationary action, or "action principle" for short): The field equations are the conditions for having a local extremum of the action functional,
\[
S[\phi] = \int_M d^4x \, L\left(\phi^i, \partial_\mu \phi^i, ...\right), \tag{5.1}
\]
where the Lagrangian density $L$ depends on the field $\phi^j$ and its (covariant) derivatives.

The action principle states that a physically realized configuration $\phi^j(x)$ of a field must be an extremum of the action functional. The variation of the action under a small change $\delta\phi^j(x)$ of fields $\phi^j(x)$ is
\[
\delta S[\phi] = \int_M d^4x \sum_j \frac{\delta S}{\delta \phi^j(x)} \delta\phi^j(x) + O\left(\left(\delta\phi^j\right)^2\right).
\]
This yields the Euler-Lagrange equation for the field,
\[
\frac{\delta S}{\delta \phi^j(x)} = 0.
\]
This method of deriving field equations is called a Lagrangian formulation of a field theory.

The currently established field theories (electrodynamics, gravitation, weak and strong interactions) are described by Lagrangian densities which depend only on the fields and their first derivatives, $L = L(\phi^j, \phi^j_{,\mu})$. For such Lagrangians, the first-order variation of the action is given by the formula
\[
\delta S = \int_M d^4x \sum_j \left( \frac{\partial L\left(\phi^i, \phi^i_{,\mu}\right)}{\partial \phi^j} - \frac{\partial}{\partial x^\mu} \frac{\partial L\left(\phi^i, \phi^i_{,\mu}\right)}{\partial \phi^j_{,\mu}} \right) \delta\phi^j(x),
\]
where the summation over $\mu$ is implied. The boundary terms vanish if
\[
\sum_j \frac{\partial L}{\partial \phi^j_{,\mu}} \delta\phi^j \to 0 \quad \text{sufficiently rapidly as } |x| \to \infty, \ |t| \to \infty,
\]
which is the usual assumption. Thus we obtain the following Euler-Lagrange equations for the fields $\phi^j$,
\[
\frac{\delta S[\phi]}{\delta \phi^j(x)} = \frac{\partial L\left(\phi^i, \phi^i_{,\mu}\right)}{\partial \phi^j} - \frac{\partial}{\partial x^\mu} \frac{\partial L\left(\phi^i, \phi^i_{,\mu}\right)}{\partial \phi^j_{,\mu}} = 0. \tag{5.2}
\]
The formula (5.2) holds for all Lagrangians that depend on fields and their first derivatives. If a Lagrangian for a field $\phi$ contains second-order derivatives such as $\phi_{;\mu\nu}$, the corresponding Euler-Lagrange equations will generally contain derivatives of third and fourth order.

If a classical field theory is to be compatible with general relativity, the Lagrangian must be generally covariant. In practice, this means that the Lagrangian should be an invariant combination of covariant derivatives of fields and the metric $g_{\mu\nu}$. The integration must be performed with the volume element $d^4x \sqrt{-g}$, where $g \equiv \det(g_{\mu\nu})$ is the determinant of the covariant metric tensor. Normally, this factor is included into the Lagrangian $L$.

The simplest example of a field is a scalar field. The Lagrangian density for a real-valued scalar field $\phi(x)$ is
\[
L\left(\phi, \partial_\mu \phi\right) = \left[ \frac{1}{2} g^{\alpha\beta} \phi_{,\alpha} \phi_{,\beta} - V(\phi) \right] \sqrt{-g}, \tag{5.3}
\]
where $V(\phi)$ is a potential that describes self-interaction of the field.

Covariant volume element

The expression $d^4x$ does not give the correct volume element if the coordinates $x$ are not Cartesian or if the spacetime is curved. It is not difficult to show that the volume element in any number of dimensions is given by the formula $dV = d^n x \sqrt{|g(x)|}$. If $u_1, ..., u_n$ are some vectors in a Euclidean space, let $G_{ij} \equiv u_i \cdot u_j$ be the $n \times n$ matrix of their pairwise scalar products. Then the volume of the $n$-dimensional parallelepiped spanned by the vectors $u_i$ is $V = \sqrt{|\det G|}$. We can prove this standard statement by considering the matrix $U_{ij}$ of components of the vectors $u_i$ in an orthonormal basis $\{e_j\}$, i.e.
\[
u_i = \sum_j U_{ij} e_j.
\]
The matrix $U$ can also be understood as a linear transformation that maps the unit parallelepiped spanned by $\{e_j\}$ to a parallelepiped spanned by $\{u_i\}$. A standard definition of the determinant of a linear transformation is the (oriented) volume of the image of a unit parallelepiped after the transformation. Therefore, the volume $V$ is $V = |\det U|$. Then we observe that the matrix $G_{ij}$ satisfies $G = U U^T$, therefore $\det G = (\det U)^2 = V^2$ and $V = \sqrt{|\det G|}$.
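As a quick check of the formula $V = \sqrt{|\det G|}$, here is a small worked example. The particular vectors and the choice of polar coordinates are illustrative assumptions and are not taken from the lecture.

```latex
% Two non-orthogonal vectors in the Euclidean plane (an illustrative choice):
%   u_1 = (1, 0),  u_2 = (1, 2).
\[
G = \begin{pmatrix} u_1\cdot u_1 & u_1\cdot u_2 \\ u_2\cdot u_1 & u_2\cdot u_2 \end{pmatrix}
  = \begin{pmatrix} 1 & 1 \\ 1 & 5 \end{pmatrix},
\qquad \det G = 5 - 1 = 4,
\qquad V = \sqrt{|\det G|} = 2 .
\]
% Direct check: the parallelogram has base 1 and height 2, so its area is 2.
%
% The same formula applied to a coordinate basis: polar coordinates (r, \theta)
% in the plane, with the basis vectors \partial_r and \partial_\theta.
\[
g = \begin{pmatrix} 1 & 0 \\ 0 & r^2 \end{pmatrix},
\qquad \sqrt{|g|} = r,
\qquad dV = r \, dr \, d\theta .
\]
```

The second part recovers the familiar area element of polar coordinates, which is the simplest instance of $dV = d^n x \sqrt{|g(x)|}$.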


In general relativity, the spacetime has a metric with signature $(+, -, -, -)$ and the determinant $\det g_{\mu\nu}$ is always negative (except at singular points where it may be zero or infinite). Therefore we change the sign of $g$ and write the volume element as $d^4x \sqrt{-g}$.

Minimal coupling to gravity

The action for a scalar field,
\[
S[\phi] = \int_M d^4x \sqrt{-g} \left[ \frac{1}{2} g^{\alpha\beta} \phi_{,\alpha} \phi_{,\beta} - V(\phi) \right], \tag{5.4}
\]
explicitly depends on $g_{\alpha\beta}$ and thus describes a field coupled to gravity. This form of coupling is called the minimal coupling to gravity; this is the minimal required interaction of a field with gravitation which necessarily follows from the requirement of compatibility with general relativity.

The Euler-Lagrange equation for a minimally coupled field follows from the action (5.4),
\[
\frac{\partial}{\partial x^\alpha} \frac{\partial L}{\partial \phi_{,\alpha}} - \frac{\partial L}{\partial \phi}
= \left( \sqrt{-g}\, g^{\alpha\beta} \phi_{,\beta} \right)_{,\alpha} + \frac{\partial V}{\partial \phi} \sqrt{-g} = 0.
\]
This equation can be rewritten in a manifestly covariant form as
\[
\phi^{;\alpha}{}_{;\alpha} + \frac{\partial V}{\partial \phi} = 0.
\]
This is similar to the Klein-Gordon equation, $\phi^{,\alpha}{}_{,\alpha} + m^2 \phi = 0$, except for the presence of covariant derivatives.

A free (i.e. noninteracting) field has the potential
\[
V(\phi) = \frac{1}{2} m^2 \phi^2.
\]
This is the simplest nontrivial potential; an additional linear term $A\phi$ can be removed by a field redefinition $\tilde\phi(x) = \phi(x) + \phi_0$. The parameter $m$ is the rest mass of the particles described by the field $\phi$ in quantum theory. The equation of motion for a free field is linear and thus describes "waves" that can cross without distorting each other. In other words, the field has no self-interaction. A field would have self-interaction if the potential $V(\phi)$ were such that $V''' \neq 0$, so that the equation of motion would be nonlinear.

Gauss's law with covariant derivatives

When computing the variation of a generally covariant action such as the action (5.4), one needs to integrate by parts. A useful shortcut in such calculations is an analog of Gauss's law with covariant derivatives.

The covariant divergence of a vector field $A^\mu$ can be written as
\[
A^\mu{}_{;\mu} = \frac{1}{\sqrt{-g}} \partial_\mu \left( \sqrt{-g} A^\mu \right).
\]
The formula analogous to the Gauss law is
\[
\int_V A^\mu{}_{;\mu} \sqrt{-g} \, d^n x = \oint_{\partial V} \sqrt{h} A^\mu d^{n-1} S_\mu,
\]
where $h$ is the partial metric on the hypersurface $\partial V$ and $d^{n-1} S_\mu$ is the oriented element of the $(n-1)$-volume on the hypersurface. Using this formula, we may reduce volume integrals of total divergences to boundary integrals. Assuming that the contribution of the boundary terms vanishes, we obtain
\[
\int d^n x \sqrt{-g} A^\mu{}_{;\mu} B = -\int d^n x \sqrt{-g} A^\mu B_{,\mu}. \tag{5.5}
\]
This formula can be used to integrate by parts: We set $A^\mu \equiv \phi_{,\alpha} g^{\alpha\mu}$, $B \equiv \psi$, and find
\[
\int d^4x \sqrt{-g} \, \phi_{,\alpha} \psi_{,\beta} g^{\alpha\beta} = -\int d^4x \sqrt{-g} \, \psi \, g^{\alpha\beta} \phi_{;\alpha\beta}.
\]
Note that the covariant derivative of the metric is zero, $g^{\alpha\beta}{}_{;\mu} = 0$, so we may lower or raise the indices under covariant derivatives at will; for example, $A^\alpha{}_{;\alpha} = A_\alpha{}^{;\alpha}$.

5.1.2 Einstein-Hilbert action

Given a metric $g_{\mu\nu}$ on a manifold $M$, one can compute the Riemann tensor $R_{\kappa\lambda\mu\nu}$ and the Ricci tensor $R_{\mu\nu}$. Einstein postulated that the metric in a spacetime must be such that the Ricci tensor satisfies the Einstein equation
\[
R_{\mu\nu} - \frac{1}{2} g_{\mu\nu} R = 8\pi G T_{\mu\nu},
\]
where $R \equiv R_{\mu\nu} g^{\mu\nu}$ and $T_{\mu\nu}$ is the energy-momentum tensor of matter. This equation can be derived from a variational principle by extremizing an appropriately chosen action functional.

We shall begin with the vacuum Einstein equation (i.e. the Einstein equation in the absence of matter, $T_{\mu\nu} = 0$),
\[
R_{\mu\nu} - \frac{1}{2} g_{\mu\nu} R = 0. \tag{5.6}
\]
This equation can be derived from the Einstein-Hilbert action,
\[
S_{EH}[g] \equiv -\frac{1}{16\pi G} \int_M R \sqrt{-g} \, d^4x, \tag{5.7}
\]
where $G$ is Newton's gravitational constant and $g \equiv \det g_{\mu\nu}$, so that $\sqrt{-g} \, d^4x$ is the covariant volume element.

The integration in Eq. (5.7) is performed over the entire spacetime $M$. Such an integral might diverge when the manifold $M$ is noncompact and if the curvature $R$ does not vanish at infinity, in which case one should limit the integration to a very large but finite subdomain $\tilde{M} \subset M$.

In the framework of the variational principle, one considers a small change in the metric,
\[
g_{\alpha\beta} \to g_{\alpha\beta} + \varepsilon h_{\alpha\beta},
\]
where $\varepsilon$ is a small number and $h_{\alpha\beta}$ is an arbitrary symmetric tensor field, which is assumed to vanish outside of the subdomain $\tilde{M}$. The first-order variation of the EH action $S_{EH}[g]$ is a linear functional of $h_{\alpha\beta}(p)$ that can be defined as
\[
\int_M d^n p \, h_{\alpha\beta}(p) \frac{\delta S_{EH}[g]}{\delta g_{\alpha\beta}(p)} \equiv \left. \frac{d}{d\varepsilon} \right|_{\varepsilon=0} S_{EH}[g + \varepsilon h].
\]
Note that the integration above is performed over all points $p \in M$ and involves the coordinate-based volume element $d^n p$ rather than the invariant volume element $\sqrt{-g} \, d^n x$. The point $p$ plays the role of a "label," much like the indices $\alpha$, $\beta$ in $h_{\alpha\beta}(p)$, and thus a simple summation over all the points $p$ is needed.
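To see how a functional derivative is read off from a definition of this kind, consider a toy functional of a single scalar function $f(p)$. The functional $F$ below is a hypothetical illustration and is not part of the lecture; the symbol $\varepsilon$ denotes the small parameter of the variation.

```latex
% Toy functional (an assumption made for illustration only):
\[
F[f] \equiv \int_M d^n p \, f(p)^2 ,
\qquad
\left.\frac{d}{d\varepsilon}\right|_{\varepsilon=0} F[f+\varepsilon h]
 = \left.\frac{d}{d\varepsilon}\right|_{\varepsilon=0}
   \int_M d^n p \, \bigl(f(p)+\varepsilon h(p)\bigr)^2
 = \int_M d^n p \; h(p)\, 2 f(p) .
\]
% Comparing with the defining form  \int_M d^n p \, h(p) \, \delta F / \delta f(p),
% and using that h(p) is arbitrary, we read off
\[
\frac{\delta F}{\delta f(p)} = 2 f(p) .
\]
```

Here the point $p$ labels the independent "components" $f(p)$ in the same way that an index labels the components of a vector, so the result is the direct analog of $\partial(\sum_i f_i^2)/\partial f_j = 2 f_j$.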


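The expression
\[
\left. \frac{d}{d\varepsilon} \right|_{\varepsilon=0} S_{EH}[g + \varepsilon h]
\]
can be interpreted as the directional derivative of the action functional $S_{EH}[g]$ in the "direction" of $h_{\alpha\beta}(p)$. (I put the word "direction" in quotes because $h_{\alpha\beta}(p)$ is a tensor-valued function on the manifold that describes an arbitrary modification of the metric $g_{\alpha\beta}(p)$ at every point.) The action $S_{EH}$ is extremized if its derivative vanishes in every "direction," i.e. if $\delta S_{EH}/\delta g_{\alpha\beta}(p) = 0$. This yields an equation for $g$. This method of deriving the Einstein equation is the Lagrangian formulation of general relativity.

Let us adopt the following, somewhat more abstract geometric picture of the variational principle. The set of all the possible metrics $g_{\alpha\beta}(p)$ can be considered as an (infinite-dimensional) manifold $\mathcal{G}$, every individual metric $g$ being a single point of $\mathcal{G}$. The Einstein-Hilbert functional $S_{EH}[g]$ is then a real-valued scalar function on this manifold, $S_{EH} : \mathcal{G} \to \mathbb{R}$. As usual, the tangent space $T_g \mathcal{G}$ to the manifold $\mathcal{G}$ is the set of all directional derivatives at a point $g_{\alpha\beta}$. Thus, a tangent vector
\[
h \equiv \int_M d^n p \, h_{\alpha\beta}(p) \frac{\delta}{\delta g_{\alpha\beta}(p)}
\]
defines the derivative in a particular "direction" $h_{\alpha\beta}$. The tangent vector $h$ acts on the function $S_{EH}[g]$, yielding
\[
h \circ S_{EH} \equiv \int_M d^n p \, h_{\alpha\beta}(p) \frac{\delta S_{EH}[g]}{\delta g_{\alpha\beta}(p)},
\]
which is the first-order variation of $S_{EH}$ in the direction $h_{\alpha\beta}$. This can be also written as
\[
\int_M d^n p \, h_{\alpha\beta}(p) \frac{\delta S_{EH}[g]}{\delta g_{\alpha\beta}(p)} \equiv \left( dS_{EH}[g] \right) \circ h,
\]
where $dS_{EH}[g]$ is the naturally defined 1-form on $\mathcal{G}$, the "gradient" of $S_{EH}$ in the space of metrics. The components of the 1-form $dS_{EH}[g]$ are $\delta S_{EH}/\delta g_{\alpha\beta}(p)$, where we interpret $p$ as an "index" on par with $\alpha$, $\beta$. The action $S_{EH}$ has a (local) extremum at a particular point $g$ if the 1-form $dS_{EH}[g]$ is equal to zero at $g$.

In the rest of this section, we shall show, by a direct calculation, that the condition $\delta S_{EH}/\delta g_{\alpha\beta}(p) = 0$ is equivalent to the vacuum Einstein equation.

It is convenient to compute the first-order variation of $S_{EH}[g]$ with respect to the inverse metric $g^{\alpha\beta}$ rather than with respect to $g_{\alpha\beta}$. We need to find the variation of $\sqrt{-g}$ and $R \equiv g^{\alpha\beta} R_{\alpha\beta}$, where $R_{\alpha\beta}$ is the Ricci tensor, under an infinitesimal change $g^{\alpha\beta} \to g^{\alpha\beta} + \delta g^{\alpha\beta}$. The calculation of $\delta\sqrt{-g}$ is easy if we use the standard formula for the derivative of the determinant of a matrix (note that $g = \det(g_{\alpha\beta})$, while $\det(g^{\alpha\beta}) = 1/\det(g_{\alpha\beta}) = g^{-1}$):
\[
\frac{\delta \sqrt{-g}}{\delta g^{\alpha\beta}} = -\frac{1}{2} g_{\alpha\beta} \sqrt{-g}.
\]

Calculation: Derive the formula
\[
\frac{\partial}{\partial A_{jk}} \det A = (\det A) \left( A^{-1} \right)_{kj},
\]
where $A \equiv A_{jk}$ is a finite-dimensional square matrix and $A^{-1}$ is the inverse matrix.

Hint: Show that
\[
\det\left( 1 + \varepsilon B \right) = 1 + \varepsilon \, \mathrm{Tr}\, B + O(\varepsilon^2),
\]
where $1$ is the identity matrix, $\varepsilon$ is small enough, and $B$ is an arbitrary matrix.

The variation of $S_{EH}[g]$ can be split into three terms (here and below the constant prefactor of Eq. (5.7) is omitted, since it does not affect the extremum condition),
\[
\begin{aligned}
\delta S_{EH}[g] &= \int_M d^4 p \, \delta\left( R_{\alpha\beta} g^{\alpha\beta} \sqrt{-g} \right) \\
&= \int_M d^4 p \sqrt{-g} \left( g^{\alpha\beta} \delta R_{\alpha\beta} + R_{\alpha\beta} \delta g^{\alpha\beta} - \frac{1}{2} R g_{\alpha\beta} \delta g^{\alpha\beta} \right).
\end{aligned}
\]
Shortly, we shall show that the first term in parentheses above,
\[
\int_M d^4 p \sqrt{-g} \, g^{\alpha\beta} \delta R_{\alpha\beta}, \tag{5.8}
\]
vanishes (under suitable boundary conditions) because it is a total divergence. Thus, we shall have
\[
\delta S_{EH}[g] = \int_M d^4 p \sqrt{-g} \left( R_{\alpha\beta} - \frac{1}{2} R g_{\alpha\beta} \right) \delta g^{\alpha\beta},
\]
which is equivalent to
\[
\frac{\delta S_{EH}[g]}{\delta g^{\alpha\beta}(p)} = \sqrt{-g} \left( R_{\alpha\beta} - \frac{1}{2} R g_{\alpha\beta} \right). \tag{5.9}
\]
Since $\sqrt{-g} \neq 0$, the condition $\delta S_{EH}/\delta g^{\alpha\beta} = 0$ is equivalent to the vacuum Einstein equation (5.6).

It remains to analyze the term (5.8). We shall now use the index-free notation for clarity. The term of interest is
\[
g^{\alpha\beta} \delta R_{\alpha\beta} = \mathrm{Tr}_{(x,y)} \delta \mathrm{Ric}(x, y). \tag{5.10}
\]
The Ricci tensor is a contraction of the Riemann tensor,
\[
\mathrm{Ric}(x, y) = \mathrm{Tr}_{(a,b)} R(x, a, y, b).
\]
To determine the variation of the Riemann tensor, we need to consider the change in the Levi-Civita connection, $\nabla \to \tilde\nabla$, under $g \to \tilde g = g + \delta g$. It is easy to show that the difference between any two connections does not contain derivatives,
\[
(\tilde\nabla - \nabla)(f x) = f (\tilde\nabla - \nabla) x,
\]
and hence acts as a transformation-valued 1-form $\delta\Gamma$,
\[
\tilde\nabla_u x - \nabla_u x \equiv \delta\Gamma(u) x.
\]
An explicit formula for $\delta\Gamma$ can be derived (see the Calculation below) but is not required for the present derivation.

Now we use the definition (1.62) to express the variation of the Riemann tensor through $\delta\Gamma$,
\[
\begin{aligned}
\delta R(u, v) w &= [\delta\Gamma(u), \nabla_v] w + [\nabla_u, \delta\Gamma(v)] w - \delta\Gamma([u, v]) w \\
&= (\nabla_u \delta\Gamma)(v) w - (\nabla_v \delta\Gamma)(u) w.
\end{aligned} \tag{5.11}
\]

Calculation: Verify Eq. (5.11). Details: omitted.

The variation of the Ricci tensor is found$^1$ as a trace of the variation of the Riemann tensor,
\[
\begin{aligned}
\delta \mathrm{Ric}(x, y) &= \mathrm{Tr}_{(a,b)} \, g(\delta R(x, a) y, b) \\
&= \mathrm{Tr}_{(a,b)} \, g\left( b, (\nabla_x \delta\Gamma)(a) y - (\nabla_a \delta\Gamma)(x) y \right).
\end{aligned}
\]

$^1$ From this point and until the end of the section, index-free computations involve traces and are more cumbersome than computations in the index notation. However, I would like to show how one could manage a complete calculation without indices. Below, an equivalent computation using indices is shown as well.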


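We cannot simplify the last expression any further, until we consider the trace (5.10),
\[
\mathrm{Tr}_{(x,y)(a,b)} \, g\left( b, (\nabla_x \delta\Gamma)(a) y \right) - \mathrm{Tr}_{(x,y)(a,b)} \, g\left( b, (\nabla_a \delta\Gamma)(x) y \right). \tag{5.12}
\]
The tensor $\delta\Gamma$ has rank (1, 2) and can be contracted in two different ways, yielding two vector fields $q_{1,2}$ defined by
\[
q_1 \equiv \mathrm{Tr}_{(x,y)} \, \delta\Gamma(x) y,
\]
\[
g(q_2, y) \equiv \mathrm{Tr}_{(a,b)} \, g(b, \delta\Gamma(a) y), \quad \text{for all } y.
\]
The vector fields $q_{1,2}$ involve derivatives of the metric perturbation $\delta g$. In the index notation, we have
\[
\left( \delta\Gamma(x) y \right)^\alpha \equiv \delta\Gamma^\alpha_{\mu\nu} x^\mu y^\nu; \qquad
q_1^\alpha \equiv g^{\mu\nu} \delta\Gamma^\alpha_{\mu\nu}, \qquad
q_2^\alpha \equiv g^{\alpha\beta} \delta\Gamma^\mu_{\mu\beta}.
\]
Now we use the properties of the trace (see Sec. 1.7.3) and the fact that $\nabla$ can be interchanged with $g$ and with the "mute" vectors $a, b, x, y$ appearing under the trace operation. The first term in Eq. (5.12) is expressed as
\[
\mathrm{Tr}_{(x,y)} \nabla_x \mathrm{Tr}_{(a,b)} \, g(b, \delta\Gamma(a) y) = \mathrm{Tr}_{(x,y)} \, g(\nabla_x q_2, y) \equiv \operatorname{div} q_2,
\]
and the second term as
\[
\begin{aligned}
\mathrm{Tr}_{(x,y)(a,b)} \, g\left( b, (\nabla_a \delta\Gamma)(x) y \right) &= \mathrm{Tr}_{(a,b)} \, g\left( b, \nabla_a \mathrm{Tr}_{(x,y)} \delta\Gamma(x) y \right) \\
&= \mathrm{Tr}_{(a,b)} \, g(b, \nabla_a q_1) \equiv \operatorname{div} q_1.
\end{aligned}
\]
Hence,
\[
g^{\alpha\beta} \delta R_{\alpha\beta} = \operatorname{div}(q_2 - q_1) \equiv \operatorname{div} q, \tag{5.13}
\]
where $q \equiv q_2 - q_1$ is an auxiliary vector field defined through $\delta g_{\alpha\beta}$. Therefore, the term (5.8) is an integral of a total divergence and thus vanishes,
\[
\int_M d^4 p \sqrt{-g} \, g^{\alpha\beta} \delta R_{\alpha\beta} = \int_M d^4 p \sqrt{-g} \, \operatorname{div}(q) = 0,
\]
as long as $q$ vanishes at the boundary $\partial M$ of the domain of integration $M$. (Since $q$ depends on the derivatives of $\delta g_{\alpha\beta}$, this condition is satisfied if $\delta g_{\alpha\beta}$ identically vanishes outside the domain $M$, or alternatively if both $\delta g_{\alpha\beta}$ and $\delta g_{\alpha\beta;\mu}$ vanish at the boundary of $M$. The Einstein-Hilbert action $S_{EH}[g]$ depends on second derivatives of the metric. It is a fortunate fact that the equations of motion (the Einstein equations) are only second-order in $g$; but, in any case, it is no surprise that the variation $\delta S_{EH}[g]$ contains boundary terms linear in $\nabla\delta g$.)

This result completes the derivation of the vacuum Einstein equation from the Einstein-Hilbert action.

Remark: The index-free computations above are cumbersome, mainly because different traces of a high-rank tensor are required. The index-free notation is best for low-rank tensor computations that do not involve complicated traces. After arriving at Eq. (5.11), it is definitely easier to proceed using the index notation.

Calculation: Using the index notation, reproduce the derivation of Eq. (5.13), starting with Eq. (5.11).

Solution: The index notation for the Riemann tensor is
\[
a^\alpha b^\beta x^\mu y^\nu R_{\alpha\beta\mu\nu} \equiv R(a, b, x, y) \equiv g(R(a, b) x, y),
\]
hence Eq. (5.11) gives
\[
\delta R_{\alpha\beta\mu}{}^{\nu} = \delta\Gamma^\nu_{\beta\mu;\alpha} - \delta\Gamma^\nu_{\alpha\mu;\beta},
\]
\[
\delta R_{\alpha\mu} = \delta R_{\alpha\beta\mu}{}^{\beta} = \delta\Gamma^\beta_{\beta\mu;\alpha} - \delta\Gamma^\beta_{\alpha\mu;\beta},
\]
\[
\begin{aligned}
g^{\alpha\mu} \delta R_{\alpha\mu} &= g^{\alpha\mu} \left( \delta\Gamma^\beta_{\beta\mu;\alpha} - \delta\Gamma^\beta_{\alpha\mu;\beta} \right) \\
&= \left( g^{\alpha\mu} \delta\Gamma^\beta_{\beta\mu} - g^{\mu\nu} \delta\Gamma^\alpha_{\mu\nu} \right)_{;\alpha} \equiv q^\alpha{}_{;\alpha}.
\end{aligned} \tag{5.14}
\]
The last term is a total divergence, as required.

Calculation: Consider the first-order variation of the Levi-Civita connection,
\[
\delta\Gamma(x) y \equiv \lim_{\varepsilon \to 0} \frac{1}{\varepsilon} \left( \tilde\nabla_x y - \nabla_x y \right),
\]
where $\nabla$ and $\tilde\nabla$ are Levi-Civita connections corresponding to the metrics $g$ and $g + \varepsilon h$, and $h$ is an arbitrary symmetric tensor of rank (0, 2). An explicit formula for the tensor $\delta\Gamma$ in terms of $h$ can be derived,
\[
2 g(\delta\Gamma(x) y, z) = (\nabla_x h)(y, z) + (\nabla_y h)(x, z) - (\nabla_z h)(x, y).
\]
In the index notation,
\[
\delta\Gamma^\lambda_{\alpha\beta} = \frac{1}{2} g^{\lambda\mu} \left( h_{\alpha\mu;\beta} + h_{\beta\mu;\alpha} - h_{\alpha\beta;\mu} \right). \tag{5.15}
\]

Derivation: We use Eq. (1.43) and choose vectors $x, y, z$ with all vanishing derivatives, i.e. vector fields $x, y, z$ such that
\[
\nabla_{(x \text{ or } y \text{ or } z)} (x \text{ or } y \text{ or } z) = 0.
\]
This choice is certainly possible at one point $p$ and does not influence the result since $\delta\Gamma(x) y$ involves no derivatives of $x$ or $y$ but only values of $x$ and $y$ at $p$. Note that with this choice, we still have $\tilde\nabla_x y \neq 0$ since $\tilde\nabla$ is a different connection from $\nabla$, however
\[
\tilde\nabla_x y - \nabla_x y = O(\varepsilon).
\]
Now Eq. (1.43) yields (up to terms of order $\varepsilon^2$)
\[
\begin{aligned}
2\varepsilon \, g(\delta\Gamma(x) y, z) + 2\varepsilon h\left( \tilde\nabla_x y, z \right)
&= 2 (g + \varepsilon h)\left( \tilde\nabla_x y, z \right) - 2 g(\nabla_x y, z) \\
&= \varepsilon \left[ x \circ h(y, z) + y \circ h(x, z) - z \circ h(x, y) \right] \\
&= \varepsilon \left[ (\nabla_x h)(y, z) + (\nabla_y h)(x, z) - (\nabla_z h)(x, y) \right],
\end{aligned}
\]
which gives the formula above after division by $\varepsilon$, because the term
\[
2\varepsilon h\left( \tilde\nabla_x y, z \right) = O(\varepsilon^2)
\]
can be disregarded.

5.1.3 Nonlinear f(R) gravity

In the Lagrangian formulation of field theory, every field $\phi$ is described by an action functional,
\[
\int_M \sqrt{-g} \, d^n p \, L[\phi],
\]
where $L$ is an appropriate Lagrangian. Thus, Einstein's theory of gravity can be viewed as a field theory where the "field" is the point-dependent metric $g_{\alpha\beta}(p)$ and the Lagrangian is the Einstein-Hilbert one, Eq. (5.7). If we choose a different Lagrangian, we may obtain a different theory of gravity. Many alternative theories of gravity have been considered (and most of them have been rejected). Motivated by the requirement of general covariance, one usually constrains the choice of gravitational Lagrangians to combinations of invariant quantities, such as $R$, $R_{\mu\nu} R^{\mu\nu}$, $R_{\kappa\lambda\mu\nu} R^{\kappa\lambda\mu\nu}$, $\Box R$, etc. These modified theories of gravity give rise to equations with high-order derivatives of $g_{\mu\nu}$ because the Ricci scalar $R[g_{\mu\nu}]$ contains second derivatives of $g_{\mu\nu}$.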

5.1 Lagrangian formulation

We will consider only a relatively simple modification of the Einstein-Hilbert action, which consists of replacing the Ricci scalar $R$ by a nonlinear function of $R$, so that the action is
\[ S_{NL}[g] = \int_{\mathcal{M}} \sqrt{-g}\,d^np\,f(R). \tag{5.16} \]
(The function $f(R)$ is nonlinear iff $f''(R) \neq 0$.) Such theories are called theories of nonlinear gravity, f(R) gravity, or scalar-tensor gravity for the reason explained below.²

² See, for instance, G. Magnano and L. M. Sokolowski, arxiv:gr-qc/9312008.

Let us now derive the equation of motion for $g_{\alpha\beta}$ from the $f(R)$ Lagrangian. We shall follow the derivation in Sec. 5.1.2 with appropriate modifications, using the index notation where convenient.

As before, the variation of $S_{NL}[g]$ can be split into three terms,
\[ \delta S_{NL}[g] = \int_{\mathcal{M}} d^4p\,\delta\!\left[f(R_{\alpha\beta}g^{\alpha\beta})\sqrt{-g}\right] = \int_{\mathcal{M}} d^4p\,\sqrt{-g}\left[f'(R)\left(g^{\alpha\beta}\delta R_{\alpha\beta} + R_{\alpha\beta}\delta g^{\alpha\beta}\right) - \frac{f(R)}{2}g_{\alpha\beta}\delta g^{\alpha\beta}\right]. \]
The term $g^{\alpha\beta}\delta R_{\alpha\beta} = q^{\alpha}{}_{;\alpha}$ is a total divergence, as shown by Eq. (5.14), but this term is now multiplied by $f'(R)$ which is not a constant since, by assumption, $f(R)$ is nonlinear. Therefore the integral
\[ \int_{\mathcal{M}} d^4p\,\sqrt{-g}\,f'(R)\,g^{\alpha\beta}\delta R_{\alpha\beta} = \int_{\mathcal{M}} d^4p\,\sqrt{-g}\,f'(R)\,q^{\alpha}{}_{;\alpha} \]
does not identically vanish any more, but instead contributes to the equation for $g_{\alpha\beta}$. Integrating by parts and omitting boundary terms, we have
\[ \int_{\mathcal{M}} d^4p\,\sqrt{-g}\,f'(R)\,q^{\alpha}{}_{;\alpha} = -\int_{\mathcal{M}} d^4p\,\sqrt{-g}\,[f'(R)]_{;\alpha}\,q^{\alpha}. \]
However, $q^{\alpha}$ still contains derivatives of $\delta g_{\alpha\beta}$, and we now need to use Eq. (5.15) to find
\[ q^{\alpha} = \delta g^{\alpha\beta}{}_{;\beta} - g_{\mu\nu}\,\delta g^{\mu\nu;\alpha}. \tag{5.17} \]

Calculation: Derive Eq. (5.17) from Eqs. (5.14) and (5.15).

Hint: Note that $\delta g^{\alpha\beta}$ is not equal to $\delta g_{\alpha\beta}$ with indices raised, but
\[ \delta g^{\alpha\beta} = -g^{\alpha\mu}g^{\beta\nu}\delta g_{\mu\nu}, \]
where $\delta g_{\alpha\beta}$ is the perturbation in $g_{\alpha\beta}$ entering Eq. (5.15). Express $q^{\alpha}$ through $\delta g^{\alpha\beta}$ rather than through $\delta g_{\alpha\beta}$.

Finally, we obtain
\[ \int_{\mathcal{M}} d^4p\,\sqrt{-g}\,[f'(R)]_{;\alpha}\,q^{\alpha} = \int_{\mathcal{M}} d^4p\,\sqrt{-g}\left([f'(R)]_{;\mu}{}^{;\mu}\,g_{\alpha\beta}\,\delta g^{\alpha\beta} - [f'(R)]_{;\alpha\beta}\,\delta g^{\alpha\beta}\right). \]
Therefore, the equation of motion for $g_{\alpha\beta}$ is
\[ f'(R)R_{\alpha\beta} - \frac{1}{2}f(R)g_{\alpha\beta} + g_{\alpha\beta}\Box f'(R) - [f'(R)]_{;\alpha\beta} = 0. \]
This equation has fourth-order derivatives of the metric and is quite complicated. In order to simplify this theory, the standard practice is to reduce the above equation to a system of second-order equations by introducing an auxiliary scalar field and changing the metric by a suitable conformal transformation.

Rather than guessing the necessary change of variables, one can perform the simplification in a systematic way by using the method of Lagrange multipliers. I will first describe the application of this method to a trivial example borrowed from classical mechanics. Then I will explain the application of that method to the theory (5.16).

In classical mechanics, the dynamics of a system may be described by the action
\[ S[\mathbf{q}] = \int L(\mathbf{q},\dot{\mathbf{q}})\,dt, \]
where $\mathbf{q}(t)$ is the trajectory of the system, which we will assume to be a vector-valued function of time. If the Lagrangian $L[\mathbf{q},\dot{\mathbf{q}}]$ depends on first derivatives of $\mathbf{q}(t)$ nonlinearly (which is usually the case), the equations of motion for $\mathbf{q}(t)$ are second-order (they contain $\ddot{\mathbf{q}}$). Suppose we would like to reduce the Lagrangian to a simpler form that generates only first-order equations of motion. One way is to introduce a new independent variable $\mathbf{v}(t)$ which is equal to the velocity $\dot{\mathbf{q}}(t)$; then one expects to obtain a system of first-order equations for $\{\mathbf{q}(t),\mathbf{v}(t)\}$. A straightforward way to replace velocities with an independent variable is to add a constraint $\dot{\mathbf{q}}-\mathbf{v}=0$ to the Lagrangian with a Lagrange multiplier. Since we have a vector-valued constraint at every moment of time $t$, the Lagrange multiplier must also be a vector-valued function of time. Let us denote this vector-valued Lagrange multiplier by $\mathbf{s}(t)$. Thus the modified action can be written as
\[ \tilde S[\mathbf{q},\mathbf{v},\mathbf{s}] = \int \left[L(\mathbf{q},\mathbf{v}) + \mathbf{s}\cdot(\dot{\mathbf{q}}-\mathbf{v})\right]dt. \tag{5.18} \]
The variation of the new action $\tilde S$ with respect to three independent vector-valued functions $\mathbf{q},\mathbf{v},\mathbf{s}$ yields equations of motion that are equivalent to the original ones.

We note that the variables $\mathbf{s}$ and $\mathbf{v}$ enter the action (5.18) as Lagrange multipliers; the action involves derivatives of $\mathbf{q}$ but no derivatives of $\mathbf{s}$ or $\mathbf{v}$. As we suspect that two Lagrange multipliers constraining a single function $\mathbf{q}$ are too many, we would like to eliminate one of these Lagrange multipliers. To this end, let us compute the variation of $\tilde S$ with respect to $\mathbf{v}$:
\[ \frac{\delta\tilde S[\mathbf{q},\mathbf{v},\mathbf{s}]}{\delta\mathbf{v}(t)} = \frac{\partial L(\mathbf{q}(t),\mathbf{v}(t))}{\partial\mathbf{v}} - \mathbf{s}(t). \]
Setting this variation to zero gives an algebraic equation that can be solved with respect to $\mathbf{v}$ (if $L$ is a nonlinear and nondegenerate function of $\mathbf{v}$). Let us denote by $\mathbf{V}(\mathbf{q},\mathbf{s})$ the functions that express $\mathbf{v}$ through $\mathbf{q}$ and $\mathbf{s}$,
\[ \mathbf{v} = \mathbf{V}(\mathbf{q},\mathbf{s}). \]
Since the action (5.18) contains $\mathbf{v}$ merely as a Lagrange multiplier (no derivatives of $\mathbf{v}$ are involved), we may now substitute $\mathbf{v}=\mathbf{V}(\mathbf{q},\mathbf{s})$ into the action (5.18) and obtain an equivalent action that depends only on $\mathbf{q}$ and $\mathbf{s}$:
\[ \tilde S[\mathbf{q},\mathbf{s}] = \int \left[L(\mathbf{q},\mathbf{V}(\mathbf{q},\mathbf{s})) - \mathbf{s}\cdot\mathbf{V}(\mathbf{q},\mathbf{s}) + \mathbf{s}\cdot\dot{\mathbf{q}}\right]dt. \]
This action is linear in time derivatives, and hence the equations of motion for $\mathbf{q}(t)$, $\mathbf{s}(t)$ are first-order in the time derivatives. One may recognize this as the Hamiltonian action, where $\mathbf{s}$ is the canonical momentum corresponding to $\mathbf{q}$, and
\[ L - \mathbf{s}\cdot\mathbf{V}(\mathbf{s},\mathbf{q}) \equiv -H(\mathbf{s},\mathbf{q}) \]
is the Hamiltonian.
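The elimination of $\mathbf{v}$ described above can be followed explicitly in a simple case. The sketch below assumes a particle of mass $m$ in a potential $U(\mathbf{q})$; this particular Lagrangian (and the symbols $m$, $U$) is chosen here only for illustration and is not taken from the text.

```latex
% Assumed illustrative Lagrangian: L(q, v) = (m/2)|v|^2 - U(q)
\begin{align*}
\tilde S[\mathbf{q},\mathbf{v},\mathbf{s}]
  &= \int \Big[ \tfrac{m}{2}|\mathbf{v}|^2 - U(\mathbf{q})
     + \mathbf{s}\cdot(\dot{\mathbf{q}}-\mathbf{v}) \Big]\,dt, \\
% variation with respect to v gives an algebraic equation for v
\frac{\delta \tilde S}{\delta \mathbf{v}} = 0
  &\;\Rightarrow\; m\mathbf{v} - \mathbf{s} = 0
  \;\Rightarrow\; \mathbf{V}(\mathbf{q},\mathbf{s}) = \mathbf{s}/m, \\
% substitute v = V(q, s) back into the action
\tilde S[\mathbf{q},\mathbf{s}]
  &= \int \Big[ \mathbf{s}\cdot\dot{\mathbf{q}}
     - \Big( \frac{|\mathbf{s}|^2}{2m} + U(\mathbf{q}) \Big) \Big]\,dt, \\
H(\mathbf{s},\mathbf{q})
  &= \mathbf{s}\cdot\mathbf{V} - L
   = \frac{|\mathbf{s}|^2}{2m} + U(\mathbf{q}), \\
% first-order equations from varying s and q
\dot{\mathbf{q}} &= \mathbf{s}/m, \qquad
\dot{\mathbf{s}} = -\,\partial U/\partial \mathbf{q}.
\end{align*}
```

Combining the two first-order equations gives $m\ddot{\mathbf{q}} = -\partial U/\partial\mathbf{q}$, i.e. the second-order equation of the original Lagrangian, so in this example the multiplier $\mathbf{s}$ is the usual momentum $m\dot{\mathbf{q}}$.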

5 Variational principle

Remark: As we have seen, the Lagrange multiplier $\mathbf{s}$ has the significance of the canonical momentum. Usually, the canonical momentum corresponding to $\mathbf{q}$ would be denoted by the letter $\mathbf{p}$. In the present derivation, I chose a different letter, $\mathbf{s}$, because I wanted to emphasize that we do not know the significance of the new variables in advance. It will not always be the case that the new variables introduced through the Lagrange multiplier method have the significance of canonical momenta.

The method of Lagrange multipliers is more general than the familiar passage from the Lagrangian to the Hamiltonian description. For instance, one could treat Lagrangians containing higher derivatives in the same systematic manner. One could also replace only some of the higher derivatives but not all, if this proves to be convenient in a particular situation.

Let us now apply the method of Lagrange multipliers to the action (5.16). At this point, we do not have the purpose of reducing the system to first-order equations, but rather to remove the nonlinearity in the $f(R)$ term. Therefore, we are motivated to introduce a new field $r$ equal to the Ricci scalar $R[g_{\alpha\beta}]$. Hence, an action equivalent to the action (5.16) is
\[ \int_{\mathcal{M}} \sqrt{-g}\,d^np\,\left[f(r) + \left(R[g_{\alpha\beta}] - r\right)\phi\right], \tag{5.19} \]
where $\phi$ is a Lagrange multiplier field. Variation of the above action with respect to $r$ yields the algebraic equation
\[ f'(r) = \phi, \]
which can be solved to express $r$ through $\phi$ (since by assumption $f''(r) \neq 0$). Let us denote by $r(\phi)$ the function obtained by solving the algebraic equation $f'(r)=\phi$. We can then substitute $r$ into the action (5.19) and obtain an equivalent action,
\[ S[g_{\alpha\beta},\phi] = \int_{\mathcal{M}} \sqrt{-g}\,d^np\,\left[f(r(\phi)) - r(\phi)\phi + \phi R[g_{\alpha\beta}]\right], \tag{5.20} \]
which now depends only on the metric $g_{\alpha\beta}$ and the new field $\phi$. Note that the field $\phi$ must be varied independently of $g_{\alpha\beta}$; the constraint $R[g_{\alpha\beta}] = r(\phi)$ will be a consequence of the equations of motion.

At this point, the action (5.20) depends linearly on $R$, and we can use another trick to simplify the equations further. Namely, we perform a suitable conformal transformation of the metric. According to Eq. (3.21) with $N=4$, a conformal transformation $g \to \tilde g$ with a conformal factor $e^{2\lambda} \neq 0$ will change the variables as follows,
\[ g_{\alpha\beta} = e^{2\lambda}\tilde g_{\alpha\beta}, \quad \sqrt{-g} = e^{4\lambda}\sqrt{-\tilde g}, \]
\[ R[g_{\alpha\beta}] = e^{-2\lambda}\left(R[\tilde g_{\alpha\beta}] + 6\tilde\Box\lambda + 6\tilde g^{\alpha\beta}\lambda_{,\alpha}\lambda_{,\beta}\right), \]
where the D'Alembert operator $\tilde\Box$ is defined with respect to the new metric $\tilde g$. We can now express the action (5.20) through the new metric. An appropriate choice of $\lambda$, namely
\[ \lambda = -\frac{1}{2}\ln\phi, \quad e^{2\lambda} = \frac{1}{\phi}, \tag{5.21} \]
cancels the factor $\phi$ in the action (5.20). The action in the new variables is
\[ S[g_{\alpha\beta},\phi] \equiv \tilde S[\tilde g_{\alpha\beta},\phi] = \int_{\mathcal{M}} \sqrt{-\tilde g}\,d^np\,\left[\frac{f(r(\phi)) - r(\phi)\phi}{\phi^2} + R[\tilde g_{\alpha\beta}] + 6\tilde\Box\lambda + 6\tilde g^{\alpha\beta}\lambda_{,\alpha}\lambda_{,\beta}\right]. \tag{5.22} \]

The term $6\tilde\Box\lambda$ is a total covariant derivative and can be omitted from the action. Also, we may express $\phi$ through $\lambda$ using Eq. (5.21) and regard $\lambda$ as a new scalar field. Introducing the auxiliary function
\[ V(\lambda) \equiv \left.\frac{r(\phi)\phi - f(r(\phi))}{\phi^2}\right|_{\phi=\phi(\lambda)}, \]
we rewrite the action (5.22) as
\[ \tilde S[\tilde g_{\alpha\beta},\lambda] = \int_{\mathcal{M}} \sqrt{-\tilde g}\,d^np\,\left[R[\tilde g_{\alpha\beta}] + 6\tilde g^{\alpha\beta}\lambda_{,\alpha}\lambda_{,\beta} - V(\lambda)\right]. \]
This is the action of usual, unmodified Einstein gravity, minimally coupled to a scalar field $\lambda$ with a self-interaction potential $V(\lambda)$. The form of the potential $V(\lambda)$ is ultimately determined by the function $f(R)$ in the original action (5.16).

The additional scalar field $\lambda$ can be heuristically viewed as a scalar component of the original gravitational field $g_{\alpha\beta}$. The metric tensor $\tilde g_{\alpha\beta}$ alone is insufficient to describe the effects of gravity in these nonlinear theories. Therefore, these models are also called scalar-tensor theories of gravity. The original field variable $g_{\alpha\beta}$ is called the Jordan frame variable, while the transformed variables $\{\lambda,\tilde g_{\alpha\beta}\}$ are called the Einstein frame variables. The name "Einstein frame" means that in that frame the theory looks like the ordinary Einstein gravity coupled to some additional matter fields. Calculations are often easier in the Einstein frame. However, the metric $\tilde g_{\alpha\beta}$ is an auxiliary variable that does not describe the physically observed metric (i.e. the action for other matter fields, such as the electromagnetic field, contains $g_{\alpha\beta}$ and not $\tilde g_{\alpha\beta}$). So one needs to perform the conformal transformation back to the Jordan frame to recover the physical metric $g_{\alpha\beta}$.

After this brief excursion into alternative theories of gravitation, I continue to consider only the currently standard theory (Einstein's General Relativity).

5.1.4 Energy-momentum tensor

Consider a field theory that contains some matter fields $\phi_j$ interacting with gravity. In the Lagrangian formulation, the total system is described by an action of the form
\[ S[\phi_j; g_{\alpha\beta}] = \int_{\mathcal{M}} \left(L_{EH}[g] + L[\phi_j; g_{\alpha\beta}]\right)\sqrt{-g}\,d^4x, \]
where $L[\phi_j; g_{\alpha\beta}]$ is a Lagrangian for the matter fields and their interactions among themselves, while
\[ L_{EH}[g] \equiv \frac{1}{16\pi G}R \]
is the Einstein-Hilbert Lagrangian for gravity. The Euler-Lagrange equation for a matter field $\phi_j$ is then
\[ \frac{\delta L[\phi; g_{\alpha\beta}]}{\delta\phi_j} = 0, \]
while the equation for the gravitational field is
\[ \frac{\delta L_{EH}[g]}{\delta g^{\alpha\beta}} + \frac{\delta L[\phi_j; g_{\alpha\beta}]}{\delta g^{\alpha\beta}} = 0. \]
We have already computed $\delta L_{EH}/\delta g^{\alpha\beta}$, and so we can rewrite this equation as the Einstein equation,
\[ R_{\alpha\beta} - \frac{1}{2}Rg_{\alpha\beta} = -16\pi G\,\frac{1}{\sqrt{-g}}\frac{\delta L[\phi_j; g_{\alpha\beta}]}{\delta g^{\alpha\beta}} \equiv -8\pi G\,T_{\alpha\beta}, \]


where we have defined the tensor $T_{\alpha\beta}$ called the energy-momentum tensor (EMT) of matter,
\[ T_{\alpha\beta} \equiv \frac{2}{\sqrt{-g}}\frac{\delta L[\phi_j; g_{\alpha\beta}]}{\delta g^{\alpha\beta}}. \tag{5.23} \]
This tensor plays the role of a source for the gravitational field. In most field theories, this tensor also describes the distribution of local energy and momentum density of the field, defined in the conventional sense.

For example, a minimally coupled scalar field with the Lagrangian (5.3) has the EMT
\[ T_{\alpha\beta} = \phi_{,\alpha}\phi_{,\beta} - \frac{1}{2}g_{\alpha\beta}\,\phi_{,\mu}\phi^{,\mu} + V(\phi)g_{\alpha\beta}. \tag{5.24} \]
The electromagnetic field is described by the Maxwell tensor $F_{\alpha\beta}$, which satisfies the Maxwell equations. These equations can be derived from the Lagrangian
\[ L_{EM}[F; g] = -\frac{\sqrt{-g}}{16\pi}F_{\alpha\beta}F^{\alpha\beta} = -\frac{\sqrt{-g}}{16\pi}F_{\alpha\beta}F_{\mu\nu}g^{\alpha\mu}g^{\beta\nu}, \tag{5.25} \]
from which also follows the EMT
\[ T_{\alpha\beta} = -\frac{1}{4\pi}\left(g^{\mu\nu}F_{\alpha\mu}F_{\beta\nu} - \frac{1}{4}F_{\mu\nu}F^{\mu\nu}g_{\alpha\beta}\right). \]

Calculation: Derive the above expressions for the EMT of a scalar field and of the Maxwell field.

Solution: If a Lagrangian is of the form $L = \sqrt{-g}\,\mathcal{L}$ then
\[ \frac{2}{\sqrt{-g}}\frac{\delta L}{\delta g^{\alpha\beta}} = -\mathcal{L}g_{\alpha\beta} + 2\frac{\partial\mathcal{L}}{\partial g^{\alpha\beta}}. \]
In the scalar field Lagrangian, the only term that depends on $g^{\alpha\beta}$ is $\frac{1}{2}g^{\alpha\beta}\phi_{,\alpha}\phi_{,\beta}$. Hence,
\[ \frac{\partial\mathcal{L}[g,\phi]}{\partial g^{\alpha\beta}} = \frac{1}{2}\phi_{,\alpha}\phi_{,\beta}, \]
\[ T_{\alpha\beta} = \phi_{,\alpha}\phi_{,\beta} - g_{\alpha\beta}\mathcal{L} \]
as required. The same procedure for the Lagrangian $L_{EM}$ yields the required answer since
\[ \frac{\partial\mathcal{L}}{\partial g^{\alpha\beta}} = -\frac{1}{16\pi}\left(F_{\alpha\mu}F_{\beta\nu}g^{\mu\nu} + F_{\mu\alpha}F_{\nu\beta}g^{\mu\nu}\right) = -\frac{g^{\mu\nu}}{8\pi}F_{\alpha\mu}F_{\beta\nu}. \]

Calculation: Compute the 0-0 component of the electromagnetic EMT and show that it coincides with the familiar expression for the energy density of the electromagnetic field,
\[ T_{00} = \frac{1}{8\pi}\left(|\vec E|^2 + |\vec B|^2\right). \]

5.1.5 General covariance

Consider a field theory where every matter field $\phi_i$, $i = 1, ..., N$, is coupled to gravity through the metric tensor $g_{\alpha\beta}$ and the covariant derivatives $\nabla_{\alpha}$. Such a field theory is invariant under general coordinate transformations; this property is called the general covariance of the theory. General coordinate transformations can be parametrized by four functions $\tilde x^{\mu}(x)$. According to the Noether theorem, the invariance with respect to gauge transformations leads to mathematical identities that hold regardless of the equations of motion. However, we can obtain useful conservation laws out of a gauge symmetry if we do use the equations of motion. We shall now show that the requirement of general covariance for a set of fields $\phi_i$ with a Lagrangian $L[\phi_i; g]$, together with the Euler-Lagrange equation of motion,
\[ \frac{\delta L}{\delta\phi_i(x)} = 0, \tag{5.26} \]
yields a conservation law for the fields' energy-momentum tensor $T_{\alpha\beta}$ defined by Eq. (5.23).

To derive the conservation law, we consider an infinitesimal coordinate transformation
\[ x \to \tilde x = x + f(x), \]
where $f$ is a vector field, which is the generator of the transformation (points are shifted along the flow lines of $f$). The matter fields $\phi_i$ are transformed by the flow of $f$ according to
\[ \phi_i(x) \to \tilde\phi_i(x) = \phi_i(\tilde x) + q_i(\tilde x), \]
where $q_i(x)$ is defined appropriately for each field, such that the local variation of the field $\phi_i$ is $\delta\phi_i = \mathcal{L}_f\phi_i$. For instance, the metric $g_{\alpha\beta}$ is transformed as
\[ g_{\alpha\beta}(x) \to \tilde g_{\alpha\beta}(x) = g_{\alpha\beta}(x) + \mathcal{L}_f g_{\alpha\beta}(x) = g_{\alpha\beta} + f_{\alpha;\beta} + f_{\beta;\alpha}. \]
We know that the action is invariant under the transformation, therefore the variation $\delta\int L\,d^4x$ must vanish:
\[ 0 = \delta\int L\,d^4x = \int\left(\frac{\delta L}{\delta g^{\alpha\beta}(x)}\delta g^{\alpha\beta}(x) + \frac{\delta L}{\delta\phi_i(x)}\delta\phi_i(x)\right)d^4x = -\int\frac{\delta L}{\delta g^{\alpha\beta}(x)}\left(f^{\alpha;\beta} + f^{\beta;\alpha}\right)d^4x + \int\frac{\delta L}{\delta\phi_i(x)}\delta\phi_i(x)\,d^4x. \tag{5.27} \]
Since the fields $\phi_i$ satisfy Eq. (5.26), the second term vanishes. Expressing the first term through the tensor $T_{\alpha\beta}$, we get
\[ \int T_{\alpha\beta}f^{\alpha;\beta}\sqrt{-g}\,d^4x = \int\left[\left(T_{\alpha\beta}f^{\alpha}\right)^{;\beta} - T_{\alpha\beta}{}^{;\beta}f^{\alpha}\right]\sqrt{-g}\,d^4x = -\int T_{\alpha\beta}{}^{;\beta}f^{\alpha}\sqrt{-g}\,d^4x = 0. \tag{5.28} \]
Here we assumed that $f^{\alpha}$ vanishes at infinity sufficiently quickly. Since Eq. (5.28) must be satisfied for arbitrary $f^{\alpha}(x)$, we conclude that the conservation law $T_{\alpha\beta}{}^{;\beta} = 0$ holds.

Remark: The absence of the covariant volume factor $\sqrt{-g}$ in Eq. (5.27) is not a mistake; the result is nevertheless a covariant quantity. The derivative with respect to $f$ is calculated using the chain rule, e.g.
\[ \delta\int L\,d^4x = \int\left(\frac{\delta L}{\delta g^{\alpha\beta}(x)}\delta g^{\alpha\beta}(x) + \frac{\delta L}{\delta\phi_i}\delta\phi_i(x)\right)d^4x, \]
and the rule requires a simple integration over $d^4x$. The correct covariant behavior is supplied by the Lagrangian $L$ that itself contains a factor $\sqrt{-g}$.

In a flat spacetime, the laws of energy and momentum conservation follow from the invariance of the action under spacetime translations. In the presence of gravitation the spacetime is curved, so in general the spacetime translations are not a physical symmetry any more. However, the action is covariant with respect to arbitrary coordinate transformations. The corresponding conservation law is the covariant


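conservation of the EMT, $T_{\alpha\beta}{}^{;\beta} = 0$. However, this law does not actually express the conservation of energy or momentum of the matter field $\phi_i$ because of the presence of the covariant derivative. That equation would be a conservation law if it had the form $\partial_{\beta}(\sqrt{-g}\,T^{\alpha\beta}) = 0$, but instead it can be shown that
\[ \partial_{\beta}\left(\sqrt{-g}\,T^{\alpha\beta}\right) = -\sqrt{-g}\,\Gamma^{\alpha}_{\mu\nu}T^{\mu\nu} \neq 0, \]
where $\Gamma^{\alpha}_{\mu\nu}$ is the Christoffel symbol. The energy of the matter fields alone, described by the energy-momentum tensor $T^{\alpha\beta}$, is not necessarily conserved; the gravitational field can change the energy and the momentum of matter.

Remark: For one scalar field $\phi$ with the Lagrangian (5.3), the conservation of energy-momentum tensor, $T^{\alpha\beta}{}_{;\beta} = 0$, where $T_{\alpha\beta}$ is given by Eq. (5.24), and the equation of motion, $g^{\alpha\beta}\phi_{;\alpha\beta} + V'(\phi) = 0$, are equivalent:
\[ 0 = T^{\alpha\beta}{}_{;\beta} = \left(\phi^{,\alpha}\phi^{,\beta}\right)_{;\beta} - \frac{1}{2}\left(\phi_{,\mu}\phi^{,\mu}\right)^{;\alpha} + V'(\phi)\phi^{,\alpha} = \left(\phi^{;\beta}{}_{;\beta} + V'(\phi)\right)\phi^{,\alpha}. \]
In a theory of a single scalar field coupled to gravity, the conservation law $T^{\alpha\beta}{}_{;\beta} = 0$ is essentially a consequence of the Einstein equation. Therefore, one may say that the Einstein equation contains the equation of motion for the scalar field. However, the same statement will not hold for theories with more than one scalar field. A single conservation law cannot yield several independent equations of motion for all the fields.

5.1.6 Symmetries and Noether theorems

The fundamental theorems of E. Noether explain the relationship between symmetries and conservation laws. We show simple examples of theories with symmetries.

Translational symmetry

A field theory is described by a Lagrangian, and a symmetry of a field theory means a transformation that leaves the Lagrangian invariant. For example, consider the theory of a scalar field $\phi$ with the Lagrangian $L[\phi]$, for instance, that of Eq. (5.3) in a flat, Minkowski spacetime, where the metric is $g_{\alpha\beta} = \eta_{\alpha\beta} = \mathrm{diag}(1,-1,-1,-1)$. Since this Lagrangian in this spacetime does not depend explicitly on the coordinates, it is invariant under a translation of fields, $\phi(\mathbf{x}) \to \phi(\mathbf{x}+\mathbf{a})$, where $\mathbf{a}$ is an arbitrary constant 4-vector. We shall now study the consequences of this symmetry for any Lagrangian $L[\phi]$ that depends only on $\phi$ and $\partial_{\mu}\phi$.

Remark: In this section only, we denote spacetime points by boldface letters $\mathbf{x} \equiv (t,x,y,z)$ for brevity. This notation is different from the notation in the rest of the text, where boldface letters are vectors or vector fields. In the Minkowski spacetime, one can identify points $(t,x,y,z)$ with vectors from $\mathbb{R}^4$, but in a curved spacetime points are not vectors.

Let us first examine the transformation $\phi(\mathbf{x}) \to \phi(\mathbf{x}+\mathbf{a})$ in more detail. The transformation is understood as the replacement of the function $\phi(\mathbf{x})$ by a different function $\tilde\phi(\mathbf{x})$,
\[ \phi(\mathbf{x}) \to \tilde\phi(\mathbf{x}) = \phi(\mathbf{x}+\mathbf{a}). \]
Note that the value of the new function at a location $\mathbf{x}$ is equal to the value of the old function at the location $\mathbf{x}+\mathbf{a}$; i.e. the value of the function is not modified beyond the necessary change due to the replacement of the argument, $\mathbf{x} \to \mathbf{x}+\mathbf{a}$.

The Lagrangian $L[\phi]$ is invariant under the replacement $\mathbf{x} \to \mathbf{x}+\mathbf{a}$ in the following sense: the value of $L[\tilde\phi](\mathbf{x})$, computed at a fixed location $\mathbf{x}$, is equal to the value of $L[\phi](\mathbf{x}+\mathbf{a})$, i.e., to the value of the same function $L[\phi]$, computed for the old field $\phi$, at the location $\mathbf{x}+\mathbf{a}$. Let us now express this invariance as a mathematical identity.

It is convenient to consider an infinitesimal displacement $\mathbf{a}$. At a given location $\mathbf{x}$, the value of any scalar function $f(\mathbf{x})$ changes under $\mathbf{x} \to \mathbf{x}+\mathbf{a}$, to first order, as
\[ \delta f(\mathbf{x}) \equiv \tilde f(\mathbf{x}) - f(\mathbf{x}) = f(\mathbf{x}+\mathbf{a}) - f(\mathbf{x}) = a^{\mu}\partial_{\mu}f + O(a^2). \]
This is the local variation of the value of the function $f$. (From now on, we shall drop the terms $O(a^2)$ since we need to consider only first-order variations.) Because of the assumption of translational invariance, the same argument applies also to the Lagrangian $L[\phi](\mathbf{x})$, considered as a function of $\mathbf{x}$,
\[ \delta L[\phi(\mathbf{x})] \equiv L[\tilde\phi(\mathbf{x})] - L[\phi(\mathbf{x})] = L[\phi](\mathbf{x}+\mathbf{a}) - L[\phi](\mathbf{x}) = a^{\mu}\partial_{\mu}L[\phi](\mathbf{x}). \]
The above relation is, essentially, the requirement of translational invariance of the Lagrangian $L$. On the other hand, the same variation $\delta L$ can be computed directly, expressing it through the local variation of the field,
\[ \delta\phi(\mathbf{x}) \equiv \tilde\phi(\mathbf{x}) - \phi(\mathbf{x}) = a^{\mu}\partial_{\mu}\phi(\mathbf{x}). \]
We have
\[ \delta L[\phi(\mathbf{x})] = \frac{\partial L}{\partial\phi}\delta\phi + \frac{\partial L}{\partial\phi_{,\mu}}\delta\phi_{,\mu}. \]
Thus the same quantity $\delta L$ can be expressed in two different ways:
\[ \delta L[\phi(\mathbf{x})] = \frac{\partial L}{\partial\phi}\delta\phi + \frac{\partial L}{\partial\phi_{,\mu}}\delta\phi_{,\mu} = a^{\mu}\partial_{\mu}L. \]
This identity expresses the invariance of the Lagrangian $L$ under translations. A conservation law can be now derived, assuming that the Euler-Lagrange equation holds for a specific field configuration $\phi(\mathbf{x})$,
\[ \frac{\partial L[\phi]}{\partial\phi} - \partial_{\mu}\frac{\partial L[\phi]}{\partial\phi_{,\mu}} = 0. \]
Namely, since $a^{\mu}$ is a constant vector, we can express $\delta\phi$ and $a^{\mu}\partial_{\mu}L$ as total divergences,
\[ \delta\phi = a^{\mu}\partial_{\mu}\phi = \phi_{,\mu}a^{\mu} = \partial_{\mu}(\phi a^{\mu}), \quad a^{\mu}\partial_{\mu}L = \partial_{\mu}(La^{\mu}), \]
and then rewrite the above identity as
\[ 0 = \frac{\partial L}{\partial\phi}\delta\phi + \frac{\partial L}{\partial\phi_{,\mu}}\delta\phi_{,\mu} - a^{\mu}\partial_{\mu}L = \left(\frac{\partial L}{\partial\phi} - \partial_{\mu}\frac{\partial L}{\partial\phi_{,\mu}}\right)\delta\phi + \partial_{\mu}\left(\frac{\partial L}{\partial\phi_{,\mu}}\phi_{,\nu}a^{\nu} - La^{\mu}\right). \]
Since the Euler-Lagrange equation is assumed to hold and $a^{\mu}$ is arbitrary, it follows that the following conservation law holds,
\[ 0 = \partial_{\mu}\left(\frac{\partial L}{\partial\phi_{,\mu}}\phi_{,\nu} - Lg^{\mu}_{\nu}\right) \equiv \partial_{\mu}T^{\mu}{}_{\nu}, \]
where
\[ T^{\mu}{}_{\nu} = \frac{\partial L}{\partial\phi_{,\mu}}\phi_{,\nu} - Lg^{\mu}_{\nu} \tag{5.29} \]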
,


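is the energy-momentum tensor of the field $\phi$. Indeed, the tensor $T^{\mu}{}_{\nu}$ coincides (after lowering the index) with the expression (5.24).

A relation of the form $\partial_{\mu}j^{\mu} \equiv \mathrm{div}\,j = 0$ in flat space is called a conservation law, and the vector field $j^{\mu}$ is called a conserved current. A simple interpretation can be given in Minkowski coordinates $(t,\vec x)$: The 4-vector $j^{\mu}$ is decomposed into the time component $j^0$ and a spatial part $\vec j$. Then $j^0$ is a density of some "substance" at a point, while $\vec j$ is its flux (the density times the 3-velocity). The conservation law $0 = j^{\mu}{}_{,\mu} = \partial_t j^0 + \mathrm{div}\,\vec j$ shows that the change in the density $j^0$ is always due to the transport of the "substance" from neighbor points, which means that the amount of the "substance" is conserved. Alternatively, we may consider a spacelike 3-surface $t = t_1$ and compute the total amount $Q(t_1)$ of the "substance" on the surface,
\[ Q(t_1) \equiv \int_{t=t_1} j^0\,d^3\vec x. \]
The conservation law then says that $Q(t)$ is constant,
\[ \frac{dQ(t)}{dt} = \int\frac{dj^0}{dt}\,d^3\vec x = -\int\left(\mathrm{div}\,\vec j\right)d^3\vec x = 0. \]
The quantity $Q$ is called the conserved charge corresponding to the current $j^{\mu}$ with respect to a 3-surface. The equations of motion for the field $\phi$ are such that the charge $Q$ is time-independent. This is the meaning of the "conservation law."

More generally, we may consider a one-parametric group of coordinate transformations
\[ \mathbf{x} \to \tilde{\mathbf{x}} \equiv \tilde{\mathbf{x}}(\mathbf{x};\varepsilon), \quad \phi(\mathbf{x}) \to \tilde\phi(\mathbf{x}) \equiv \phi(\tilde{\mathbf{x}}), \]
where $\varepsilon$ is a real parameter and $\varepsilon = 0$ corresponds to the identical transformation. If a Lagrangian $L[\phi;\mathbf{x}]$, possibly depending explicitly on $\mathbf{x}$, is invariant under this transformation group, in the sense explained above, then a similar calculation leads to the conservation law $j^{\mu}{}_{,\mu} = 0$ for a certain 4-vector field $j^{\mu}$, called the Noether current corresponding to the symmetry transformation $\mathbf{x} \to \tilde{\mathbf{x}}$. The Noether current $j^{\mu}$ is then defined by
\[ j^{\mu} = \frac{\partial L}{\partial\phi_{,\mu}}\phi_{,\nu}f^{\nu} - f^{\mu}L, \]
where
\[ f^{\mu} \equiv \left.\frac{\partial\tilde x^{\mu}}{\partial\varepsilon}\right|_{\varepsilon=0}. \]

Calculation: Derive the above expression for the Noether current $j^{\mu}$ corresponding to the transformation $\mathbf{x} \to \tilde{\mathbf{x}}$, $\phi(\mathbf{x}) \to \phi(\tilde{\mathbf{x}})$, assuming that
\[ \int d^4\mathbf{x}\,L[\phi(\mathbf{x});\mathbf{x}] = \int d^4\tilde{\mathbf{x}}\,L[\phi(\tilde{\mathbf{x}});\tilde{\mathbf{x}}]. \]

Solution: Consider a transformation with an infinitesimal value of $\varepsilon$,
\[ \mathbf{x} \to \tilde{\mathbf{x}} = \mathbf{x} + \varepsilon f(\mathbf{x}), \quad f(\mathbf{x}) \equiv \left.\frac{\partial\tilde{\mathbf{x}}(\mathbf{x};\varepsilon)}{\partial\varepsilon}\right|_{\varepsilon=0}. \]
The vector field $f$ describes the flow of the infinitesimal coordinate transformation. The local variation of the field is
\[ \delta\phi(\mathbf{x}) \equiv \tilde\phi(\mathbf{x}) - \phi(\mathbf{x}) = \phi(\tilde{\mathbf{x}}) - \phi(\mathbf{x}) = \varepsilon f^{\mu}\phi_{,\mu} \equiv \varepsilon f\circ\phi. \]
The local variation of the derivative $\phi_{,\mu}$ is slightly more complicated due to the dependence of $f$ on $\mathbf{x}$,
\[ \delta(\phi_{,\mu}) = \frac{\partial\tilde\phi(\mathbf{x})}{\partial x^{\mu}} - \frac{\partial\phi(\mathbf{x})}{\partial x^{\mu}} = \frac{\partial\phi(\mathbf{x}+\varepsilon f)}{\partial x^{\mu}} - \frac{\partial\phi(\mathbf{x})}{\partial x^{\mu}} = \varepsilon\left(f^{\nu}\phi_{,\nu}\right)_{,\mu}. \]
The local variation of the Lagrangian (at fixed $\mathbf{x}$) due to the local variation of the field is
\[ \delta L[\phi(\mathbf{x});\mathbf{x}] = \frac{\partial L}{\partial\phi}\delta\phi + \frac{\partial L}{\partial\phi_{,\mu}}\delta(\phi_{,\mu}) = \left(\frac{\partial L}{\partial\phi_{,\mu}}\delta\phi\right)_{,\mu} = \varepsilon\left(\frac{\partial L}{\partial\phi_{,\mu}}\phi_{,\nu}f^{\nu}\right)_{,\mu}, \]
where we have used the Euler-Lagrange equation. On the other hand, the invariance of the action under the transformation means that
\[ \int d^4\mathbf{x}\,L[\phi(\mathbf{x});\mathbf{x}] = \int d^4\tilde{\mathbf{x}}\,L[\phi(\tilde{\mathbf{x}});\tilde{\mathbf{x}}] \equiv \int d^4\mathbf{x}\,L[\tilde\phi(\mathbf{x});\mathbf{x}]. \]
The volume element is transformed as
\[ d^4\tilde{\mathbf{x}} = d^4\mathbf{x}\,\det\left(\frac{\partial\tilde x^{\mu}}{\partial x^{\nu}}\right) = d^4\mathbf{x}\,\det\left(\delta^{\mu}_{\nu} + \varepsilon\frac{\partial f^{\mu}}{\partial x^{\nu}}\right) = d^4\mathbf{x}\left(1 + \varepsilon\frac{\partial f^{\mu}}{\partial x^{\mu}}\right) = d^4\mathbf{x}\left(1 + \varepsilon\,\mathrm{div}f\right). \]
(Recall that $\det(1+\varepsilon A) = 1 + \varepsilon\,\mathrm{Tr}A + O(\varepsilon^2)$ for a matrix $A$.) Since
\[ L[\phi(\tilde{\mathbf{x}});\tilde{\mathbf{x}}] - L[\phi;\mathbf{x}] = \varepsilon f^{\mu}\frac{\partial L}{\partial x^{\mu}} \equiv \varepsilon f\circ L, \]
we obtain the identity (up to terms of order $\varepsilon^2$)
\[ 0 = \varepsilon\left(\frac{\partial L}{\partial\phi_{,\mu}}\phi_{,\nu}f^{\nu}\right)_{,\mu} - \varepsilon\frac{\partial f^{\mu}}{\partial x^{\mu}}L - \varepsilon f^{\mu}\frac{\partial L}{\partial x^{\mu}} = \varepsilon\left(\frac{\partial L}{\partial\phi_{,\mu}}\phi_{,\nu}f^{\nu} - f^{\mu}L\right)_{,\mu} \equiv \varepsilon j^{\mu}{}_{,\mu}. \]
Since $\varepsilon$ is arbitrary, it follows that $j^{\mu}$ is a conserved current.

If the symmetry group has $n > 1$ parameters then $n$ conservation laws can be found. Let us study this case in some more detail. Suppose we are given an $n$-parametric group of smooth transformations of a spacetime manifold $\mathcal{M}$. Continuous groups of transformations, i.e. groups that are themselves smooth multidimensional manifolds, are called Lie groups. If the symmetry group $G$ is an $n$-dimensional Lie group then each element $\gamma \in G$ acts as a transformation $\mathbf{x} \to \tilde{\mathbf{x}}(\mathbf{x};\gamma)$. An infinitesimal transformation $\mathbf{x} \to \mathbf{x} + \delta\mathbf{x}$ corresponds to an element of the group $G$ which is infinitesimally close to the identity transformation $1 \in G$. Such elements can be (heuristically) parametrized as $\gamma = \exp(\varepsilon\mathbf{v})$ or $\gamma = 1 + \varepsilon\mathbf{v}$, where $\mathbf{v}$ is a tangent vector from the ($n$-dimensional) tangent space $T_1G$ and $\exp(\mathbf{v})$ is the exponentiation map that produces points along the flow lines of a vector $\mathbf{v}$. By definition, a tangent vector $\mathbf{v}$ is a derivation operator acting on functions on $G$. The transformation of points $\mathbf{x}$ corresponding to the vector $\mathbf{v}$ can be written as
\[ \mathbf{x} \to \tilde{\mathbf{x}} = \mathbf{x} + \varepsilon f(\mathbf{x},\mathbf{v}), \]
where
\[ f(\mathbf{x},\mathbf{v}) \equiv \left.\frac{\partial\tilde{\mathbf{x}}(\mathbf{x})}{\partial\varepsilon}\right|_{\varepsilon=0} = \mathbf{v}\circ\tilde{\mathbf{x}}(\mathbf{x};\gamma), \]
and $\tilde{\mathbf{x}}(\mathbf{x};\gamma)$ is a function on $G$. It is clear that $f(\mathbf{x},\mathbf{v})$ is linear in $\mathbf{v}$, thus $f$ is a 1-form on $T_1G$ with values in $T_{\mathbf{x}}\mathcal{M}$. In the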


index notation, $f$ may be written as $f^{\mu}_{s}$, where the index $s$ is $n$-dimensional and labels the tangent space to the Lie group. The calculation leading to the definition of the Noether current yields
\[ \left(\frac{\partial L}{\partial\phi_{,\mu}}\phi_{,\nu}f^{\nu}_{s} - f^{\mu}_{s}L\right)_{;\mu}v^{s} \equiv v^{s}j^{\mu}_{s\,;\mu} = 0, \]
thus the Noether current $j^{\mu}_{s}$ is also a 1-form on $T_1G$ with values in $T_{\mathbf{x}}\mathcal{M}$. We have seen an example of such a Noether current (5.29): the group of transformations is $\mathbb{R}^4$ and thus the index $s$ in $j^{\mu}_{s}$ can be identified as a four-dimensional Minkowski index, yielding a (1,1)-tensor $T^{\mu}{}_{\nu}$. The corresponding Noether charge
\[ Q_{\nu}(t_0) = \int_{t=t_0} T^{0}{}_{\nu}\,d^3\mathbf{x} \]
represents the total 4-momentum on a spacelike 3-surface $t = t_0$. The total energy and the total momentum are conserved, $Q_{\nu}(t) = \mathrm{const}$.

Remark: The EMT (5.29) obtained as a Noether current with respect to translation symmetry is called the canonical EMT, while the tensor defined by Eq. (5.23) is called the metric EMT. The canonical and the metric EMT coincide for the scalar field but not necessarily for other fields; however, this mismatch is merely a technical problem with the calculations. Both these tensors are conserved in flat space, and therefore the difference between them is a divergence-free tensor. Indeed, we can always add a divergence of an arbitrary antisymmetric tensor to a conserved current without changing the conservation law,
\[ j^{\mu} \to \tilde j^{\mu} = j^{\mu} + \left(B^{\mu\nu} - B^{\nu\mu}\right)_{,\nu}; \quad \tilde j^{\mu}{}_{,\mu} = j^{\mu}{}_{,\mu} = 0. \]
Note that the calculation leading to the definition of the Noether current $j^{\mu}$ only shows that $j^{\mu}{}_{,\mu} = 0$ and thus does not necessarily determine the physically correct expression for $j^{\mu}$. It is the metric EMT, not the canonical EMT, that contributes to the Einstein equations and plays the role of the source of gravity.

Internal symmetry

The essence of Noether's theorem is that every symmetry of a Lagrangian leads to a conservation law. Our first example was a transformation that did not modify the value of the field (beyond the necessary change due to the coordinate shift). The second example is a field theory with an internal symmetry, i.e. a symmetry transformation that changes the value of the field but does not involve the coordinates.

Consider a complex-valued scalar field $\phi$ with the Lagrangian
\[ L[\phi] = \frac{1}{2}\left(g^{\alpha\beta}\phi_{,\alpha}\phi^{*}_{,\beta} - V(|\phi|)\right)\sqrt{-g}. \]
This Lagrangian is manifestly invariant under the group of transformations $\phi \to \tilde\phi = e^{i\alpha}\phi$, where $\alpha$ is an arbitrary real number; this is an example of an internal symmetry of the theory. The identity $L[\tilde\phi] = L[\phi]$ leads to a conservation law. Again, it is convenient to apply an infinitesimal transformation,
\[ \tilde\phi = (1 + i\alpha)\phi + O(\alpha^2), \]
and to suppress terms of order $\alpha^2$ or higher. The local variation of the field is
\[ \delta\phi(\mathbf{x}) = i\alpha\phi(\mathbf{x}), \quad \delta\phi^{*} = -i\alpha\phi^{*}, \]
hence the local variation of the Lagrangian is
\[ 0 = \delta L[\phi(\mathbf{x}),\mathbf{x}] = L[\tilde\phi(\mathbf{x}),\mathbf{x}] - L[\phi(\mathbf{x}),\mathbf{x}] \]
\[ = \frac{\partial L}{\partial\phi}\delta\phi + \frac{\partial L}{\partial\phi^{*}}\delta\phi^{*} + \frac{\partial L}{\partial\phi_{,\mu}}\delta\phi_{,\mu} + \frac{\partial L}{\partial\phi^{*}_{,\mu}}\delta\phi^{*}_{,\mu} \]
\[ = \left(\frac{\partial L}{\partial\phi_{,\mu}}\delta\phi + \frac{\partial L}{\partial\phi^{*}_{,\mu}}\delta\phi^{*}\right)_{,\mu} = i\alpha\left(\frac{\partial L}{\partial\phi_{,\mu}}\phi - \frac{\partial L}{\partial\phi^{*}_{,\mu}}\phi^{*}\right)_{,\mu}, \]
where we have used the Euler-Lagrange equations. For the above Lagrangian, the Noether current is
\[ j^{\mu} = g^{\mu\nu}\,\frac{\phi^{*}\phi_{,\nu} - \phi\,\phi^{*}_{,\nu}}{2i}, \]
and the Noether charge is interpreted as the charge density of the field $\phi$.

Infinite-dimensional (gauge) symmetry

The Noether theorem applies to a more general case: namely, an $n$-dimensional Lie group of transformations that changes both the points $\mathbf{x}$ and the values of fields $\phi_a$, $a = 1, ..., A$, according to
\[ \mathbf{x} \to \tilde{\mathbf{x}} = \mathbf{x} + f(\mathbf{x}), \quad \phi_a(\mathbf{x}) \to \tilde\phi_a(\mathbf{x}) = \phi_a(\tilde{\mathbf{x}}) + q_a(\tilde{\mathbf{x}}), \]
and adds a total divergence to the Lagrangian $L$,
\[ \int d^4\mathbf{x}\,L[\phi_a(\mathbf{x}),\mathbf{x}] = \int d^4\tilde{\mathbf{x}}\,L[\tilde\phi_a(\tilde{\mathbf{x}}),\tilde{\mathbf{x}}] + \int d^4\mathbf{x}\,\left(\partial_{\mu}C^{\mu}\right), \]
where $C^{\mu}$ is a suitable auxiliary vector field. A corresponding Noether current $j^{\mu}_{s}$ exists also in this case, indicating a conserved quantity $Q$ that remains constant by virtue of the equations of motion.

Remark: The conservation of $Q$ can be viewed as simply a consequence of the equations of motion, and derived by suitable algebraic manipulations from the Euler-Lagrange equations. However, in that case one may wonder what other conservation laws may be found and how the necessary manipulations are to be guessed. On the other hand, the Noether theorem explains that the existence of conserved quantities is necessary, as long as the Lagrangian has a continuous symmetry. (And, conversely, every conservation law is a consequence of a symmetry.) Thus, symmetries and the Noether theorem provide a more natural way to obtain conservation laws in a given field theory.

A qualitatively different situation arises if the theory is invariant under symmetry transformations parametrized by an arbitrary function $\alpha(\mathbf{x})$ of the spacetime. (This makes the group of transformations an infinite-dimensional group.) Such a symmetry transformation is called a gauge symmetry. A familiar example of a gauge symmetry is the Maxwell theory with the Lagrangian
\[ L_{EM} = -\frac{1}{16\pi}F_{\mu\nu}F^{\mu\nu}, \quad F_{\mu\nu} \equiv \partial_{\mu}A_{\nu} - \partial_{\nu}A_{\mu}, \]
which is invariant under the gauge transformation
\[ A_{\mu}(\mathbf{x}) \to \tilde A_{\mu}(\mathbf{x}) = A_{\mu}(\mathbf{x}) + \partial_{\mu}\alpha(\mathbf{x}), \]


where $\alpha(\mathbf{x})$ is an arbitrary function.

We shall now examine the consequences of a gauge symmetry. Suppose that a field theory is described by a Lagrangian $L[\phi]$ that is invariant under the (infinitesimal) internal symmetry transformations
\[ \phi \to \tilde\phi = \phi + \delta\phi(\phi;\alpha), \]
where $\delta\phi(\phi;\alpha)$ is linear in $\alpha$, and $\alpha$ is an arbitrary function of $\mathbf{x}$. For simplicity, let us assume that $\delta\phi(\phi,\alpha)$ depends only on $\alpha$ and the first derivatives $\alpha_{,\mu}$, so that
\[ \delta\phi(\phi,\alpha) = p(\phi)\alpha + q^{\mu}(\phi)\alpha_{,\mu}, \]
where $p(\phi)$ and $q^{\mu}(\phi)$ are some known coefficients. Then the local variation of the Lagrangian is
\[ 0 = \delta L[\phi(\mathbf{x}),\mathbf{x}] = \frac{\partial L}{\partial\phi}\delta\phi + \frac{\partial L}{\partial\phi_{,\mu}}\delta\phi_{,\mu} \]
\[ = \frac{\partial L}{\partial\phi}p\alpha + \frac{\partial L}{\partial\phi}q^{\mu}\alpha_{,\mu} + \frac{\partial L}{\partial\phi_{,\mu}}(p\alpha)_{,\mu} + \frac{\partial L}{\partial\phi_{,\mu}}\left(q^{\nu}\alpha_{,\nu}\right)_{,\mu} \]
\[ = \left(\frac{\partial L}{\partial\phi}p + \frac{\partial L}{\partial\phi_{,\mu}}p_{,\mu}\right)\alpha + \left(\frac{\partial L}{\partial\phi}q^{\mu} + \frac{\partial L}{\partial\phi_{,\nu}}\left(p\,\delta^{\mu}_{\nu} + q^{\mu}{}_{,\nu}\right)\right)\alpha_{,\mu} + \frac{\partial L}{\partial\phi_{,\mu}}q^{\nu}\alpha_{,\mu\nu}. \]
Since $\alpha(\mathbf{x})$ is an arbitrary function, the terms involving $\alpha$, $\alpha_{,\mu}$, $\alpha_{,\mu\nu}$, etc., must separately vanish. Thus we obtain the following three identities,
\[ \frac{\partial L}{\partial\phi}p + \frac{\partial L}{\partial\phi_{,\mu}}p_{,\mu} = 0, \]
\[ \frac{\partial L}{\partial\phi}q^{\mu} + \frac{\partial L}{\partial\phi_{,\nu}}\left(p\,\delta^{\mu}_{\nu} + q^{\mu}{}_{,\nu}\right) = 0, \]
\[ \frac{\partial L}{\partial\phi_{,\mu}}q^{\nu} + \frac{\partial L}{\partial\phi_{,\nu}}q^{\mu} = 0. \]
(The last line follows because $\alpha_{,\mu\nu}$ is an arbitrary symmetric tensor.) Note that we did not assume that the field $\phi(\mathbf{x})$ satisfies the Euler-Lagrange equation. Hence, the above identities hold simply because of the mathematical properties of the Lagrangian and the fields.

As an illustration, let us evaluate the three identities in the Maxwell theory. The field $\phi$ is now the 1-form $A_{\mu}$, and we have
\[ \frac{\partial L_{EM}}{\partial A_{\mu}} = 0, \quad \frac{\partial L_{EM}}{\partial A_{\mu,\nu}} = \frac{1}{4\pi}F^{\mu\nu}, \]
\[ \delta A_{\mu} = \alpha_{,\mu} = \delta^{\nu}_{\mu}\alpha_{,\nu} \equiv q^{\nu}_{\mu}\alpha_{,\nu}; \quad q^{\nu}_{\mu;\lambda} = 0; \quad p = 0. \]
Of the three identities, the first two yield $0 = 0$ and the last one is
\[ \frac{\partial L}{\partial A_{\lambda,\mu}}q^{\nu}_{\lambda} + \frac{\partial L}{\partial A_{\lambda,\nu}}q^{\mu}_{\lambda} = \frac{1}{4\pi}\left(F^{\nu\mu} + F^{\mu\nu}\right) = 0. \]

$\tilde\phi(\phi;\alpha)$ is another solution, for an arbitrary function $\alpha(\mathbf{x})$. The function $\alpha(\mathbf{x})$ can be chosen so that $\alpha = 0$ on $\Sigma$ but $\alpha \neq 0$ to the future of $\Sigma$. The solution $\tilde\phi(\phi(\mathbf{x});\alpha)$ will then have the same initial data on $\Sigma$ but will differ from $\phi(\mathbf{x})$ to the future of $\Sigma$. Thus, the solution of the Cauchy problem is not unique.

In fact, this lack of uniqueness was a problem initially encountered by A. Einstein who came up with the following "hole argument" when trying to derive the equations for the metric $g$. Suppose $g(x)$ is a solution for the metric in the spacetime, subject to some boundary conditions. If the boundary conditions are imposed on some 3-surface that encircles a domain of spacetime, we can find a small region (a "hole") inside the domain. Due to the general covariance of the theory, we are free to transform the coordinates, $x \to \tilde x$, in such a way that the coordinates are changed only within the hole but remain the same everywhere else. This would change the metric $g(x)$ inside the hole but not elsewhere. The new metric $\tilde g$ still satisfies the same boundary conditions as $g$. Thus, the metric at any given point $x$ is not uniquely specified by the equations of motion and boundary conditions.

The resolution of this paradox is that coordinates $x$ are arbitrary labels assigned to events in spacetime. A gauge transformation $x \to \tilde x$ merely relabels the events but does not influence the observable effects of gravitation (e.g. the distance between some physical events). Thus, solutions $g$ and $\tilde g$ are physically equivalent. In any theory with a gauge symmetry, two solutions that differ by a gauge transformation are physically equivalent.

Moreover, the freedom to choose an arbitrary function $\alpha(x)$ in a gauge transformation means that any one of the fields $\phi_j$ may be set to zero (or to any other function). This will reduce the number of unknown functions to solve for. Such a reduction is called gauge fixing. For example, the gauge symmetry $A_{\mu} \to A_{\mu} + \partial_{\mu}\alpha$ in electrodynamics allows one to fix $A_0 = 0$ and to consider the remaining three components $A_1, A_2, A_3$ as the only physically relevant fields. However, the Maxwell equations in terms of $A_1, A_2, A_3$ appear much less symmetric and more difficult to solve. Alternatively, one may choose the Lorentz gauge condition $A^{\mu}{}_{;\mu} = 0$, or another suitable condition.

To conclude: The presence of a gauge symmetry indicates that the description of the theory contains too many fields. The freedom can be eliminated by fixing the gauge. However, a description in a fixed gauge may be much less convenient than the original description.

5.2 Hamiltonian formulation

Literature sources: [28], chapter 4; [36], Appendix.

The purpose of this section is to introduce the Hamiltonian formulation of general relativity.
This is a simple mathematical consequence of the definition of
The starting point of a classical theory is the Lagrangian
F .
L(qi , qi ), where qi (t), i = 1, ..., Nq , are the generalized coor-
The invariance of a field theory under gauge transforma-
dinates, and the action functional is
) does not lead to useful conservation laws
tions (; Z
but has another important consequence for the theory. Con- S[qi (t)] = L(qi , qi )dt.
sider a Cauchy problem for the field , which consists of spec-
ifying the initial values of the field in (x) at an initial space-
A Hamiltonian formulation of the theory is then obtained
like 3-surface . Suppose that (x) is a solution of the Euler-
through the following steps:
Lagrange equation for the initial data in on . The transfor-
mation is a symmetry of the equations of motion, so Define the canonical momenta pi L/ qi .

• Express $\dot q_i$ through $p_i$ from these equations.

• Define the Hamiltonian $H(p_i, q_i) \equiv \sum_i p_i \dot q_i - L(q_i, \dot q_i)$, where $\dot q_i$ are expressed as functions of $p_i$ in the right-hand side.

• Use the Hamilton equations of motion,

\[ \frac{d}{dt} q_i = \frac{\partial H}{\partial p_i}, \qquad \frac{d}{dt} p_i = -\frac{\partial H}{\partial q_i}. \]

These equations can be also considered as Euler-Lagrange equations following from the Hamiltonian action principle,

\[ \delta \int_{t_1}^{t_2} dt \left( \sum_i p_i \dot q_i - H(p,q) \right) = 0, \tag{5.30} \]

where $p_i(t)$ and $q_i(t)$ are varied independently, subject only to the boundary conditions $\delta q_i(t_{1,2}) = 0$.

In field theory, the role of the generalized coordinates $q_i(t)$ is played by the fields $\phi_j(t,x,y,z)$, $j = 1, ..., N$. The index $i$ in $q_i$ now becomes a "condensed" index $i \equiv \{j,x,y,z\}$ in $\phi_j(t,x,y,z)$. Therefore, in the Hamiltonian formalism the dependence on time $t$ must be separated from the dependence on spatial coordinates $(x,y,z) \equiv x^a$. (We shall use Latin indices for three-dimensional components.) In a curved spacetime, there is no natural time variable, which means that we need to arbitrarily select a coordinate $t$ and the corresponding family of spacelike 3-surfaces $t = \mathrm{const}$. Such a family of nonintersecting spacelike 3-surfaces, covering the entire spacetime, is called a time foliation of the spacetime. The traditional Hamiltonian approach is noncovariant: it requires an explicit time foliation of the spacetime and treats the time dependence of every variable differently from the spatial dependence.

The Lagrangian $L(q_i, \dot q_i)$ is replaced by a Lagrangian $\mathcal{L}$, which is an integral over a 3-surface,

\[ \mathcal{L}[\phi_i, \dot\phi_i] = \int_{t=\mathrm{const}} d^3x^a \, L(\phi_i, \partial_a\phi_i, \dot\phi_i), \]

where we separated the time derivatives $\dot\phi_i$ from the spatial derivatives $\partial_a\phi_i$. The Lagrangian $\mathcal{L}$ is now a functional and is denoted by a script letter, in distinction from the Lagrangian density $L$ which is a function.

The canonical momentum $p_i(t,x^a)$ is defined as the functional derivative of $\mathcal{L}$ with respect to $\dot\phi_i(t,x^a)$,

\[ p_i(t,x^a) = \frac{\delta \mathcal{L}[\phi_i, \dot\phi_i]}{\delta \dot\phi_i(t,x^a)}. \]

Finally, the summation over $i$ in the definition of the Hamiltonian must be replaced by a summation over $j$ and an integration over the 3-surface,

\[ H[p_i, \phi_i] = \int_{t=\mathrm{const}} \sum_j p_j(t,x^a)\,\dot\phi_j(t,x^a)\,d^3x^a - \mathcal{L}[\phi_i, \dot\phi_i]. \]

We shall now derive Hamiltonian formulations for the Maxwell theory where the field $\phi$ is the 4-potential $A_\mu(x)$, and for Einstein's General Relativity where the field $\phi$ is the metric $g_{\mu\nu}(x)$.

5.2.1 Electrodynamics in Hamiltonian formulation

The Lagrangian (5.25) of the Maxwell theory is

\[ -\frac{\sqrt{-g}}{16\pi} F_{\mu\nu}F^{\mu\nu} = \frac{\sqrt{-g}}{8\pi}\left( |\vec E|^2 - |\vec B|^2 \right), \]

which (in flat space) is the familiar expression for the pressure of the electromagnetic field (see footnote 3). The spacetime metric $g$ is now considered as a known function, so only the electromagnetic potential $A$ is to be determined. (To express this, one says that $g$ is a background field.) To compute the canonical momenta corresponding to $A$, we need to separate the dependence of the Lagrangian on the time derivatives of the field $A$.

Suppose a time foliation of the spacetime is fixed and some coordinates $x^a$, $a = 1,2,3$ are chosen on equal-time surfaces $t = \mathrm{const}$. Let $n$ be a normalized, timelike vector field orthogonal to the equal-time surfaces. For each point $p$, the subspace $n^{\perp}(p) \subset T_pM$ is the tangent space to the equal-time surface intersecting the point $p$. Then we may decompose the vector field $A$ as

\[ A = nA_0 + PA, \]

where $A_0 \equiv g(n,A)$ and $Px = x - n\,g(n,x)$ is the orthogonal projector onto $n^{\perp}$. The 4-vector $PA$ is spacelike and tangent to the equal-time surfaces. Hence, in local coordinates $\{x^a\}$ on equal-time surfaces, the vector $PA$ is described by a 3-vector with components $\{A_1, A_2, A_3\} \equiv \vec A$. Similarly, we can define the 3-vectors $\vec E$ and $\vec B$. Time derivatives appear in the electric field $\vec E$,

\[ \vec E \equiv E_a = \partial_t A_a - \partial_a A_0 \equiv \partial_t \vec A - \operatorname{grad} A_0, \]

while the magnetic field

\[ \vec B \equiv B_a = \sum_{b,c=1}^{3} \varepsilon_{abc}\,\partial_b A_c = \operatorname{rot}\vec A \]

depends only on spatial derivatives. It is clear that the Lagrangian does not contain the time derivative of $A_0$. Therefore, we cannot define a canonical momentum $p^0$ for $A_0$, and there are no Hamilton equations of motion for the pair $\{A_0, p^0\}$; in this case, one says that $A_0$ is not a dynamical variable. This conclusion agrees with the existence of a gauge symmetry in electrodynamics. Namely, by performing a gauge transformation $A_\mu \to A_\mu + \partial_\mu\epsilon$ one could set $A_0$ to an arbitrary fixed function (this is called fixing the gauge). For simplicity, let us set $A_0 = 0$. Note that the variation of the Lagrangian with respect to $A_0$ gives

\[ \sum_{a=1}^{3} \partial_a\left( \sqrt{-g}\,E_a \right) \equiv \sqrt{-g}\,\operatorname{div}\vec E = 0, \tag{5.31} \]

which does not contain time derivatives and is therefore a constraint of the theory.

There might be a sign error here: $p = E$? See Wald [36], p. 461-462.

Footnote 3: Several field theories, e.g. the scalar field and the electromagnetic field, have the energy-momentum tensor of a perfect fluid form, $T_{\mu\nu} = u_\mu u_\nu (p + \rho) - g_{\mu\nu}\,p$. In that case, the Lagrangian density $L$ is equal to the pressure $p$ and the Hamiltonian density $H$ to the energy density $\rho$.
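The four steps of the Hamiltonian construction listed at the start of Sec. 5.2 can be tried out numerically on a single degree of freedom. The following is only a minimal sketch under stated assumptions: the harmonic-oscillator Lagrangian $L = \frac12 m\dot q^2 - \frac12 k q^2$, the parameter values, the finite-difference derivatives, and all function names are illustrative choices and do not come from the text.

```python
import math

M = 2.0         # illustrative mass (assumption, not from the text)
K_SPRING = 8.0  # illustrative spring constant (assumption)


def lagrangian(q, qdot):
    """L(q, qdot) = m qdot^2 / 2 - k q^2 / 2 for the assumed oscillator."""
    return 0.5 * M * qdot ** 2 - 0.5 * K_SPRING * q ** 2


def momentum(qdot):
    """Step 1: canonical momentum p = dL/dqdot (here simply m * qdot)."""
    return M * qdot


def velocity(p):
    """Step 2: express the velocity through the momentum."""
    return p / M


def hamiltonian(p, q):
    """Step 3: H(p, q) = p * qdot - L, with qdot expressed through p."""
    qdot = velocity(p)
    return p * qdot - lagrangian(q, qdot)


def dH_dp(p, q, eps=1e-6):
    """Central finite difference for dH/dp."""
    return (hamiltonian(p + eps, q) - hamiltonian(p - eps, q)) / (2 * eps)


def dH_dq(p, q, eps=1e-6):
    """Central finite difference for dH/dq."""
    return (hamiltonian(p, q + eps) - hamiltonian(p, q - eps)) / (2 * eps)


def evolve(q, p, dt, steps):
    """Step 4: integrate dq/dt = dH/dp, dp/dt = -dH/dq (leapfrog scheme)."""
    for _ in range(steps):
        p -= 0.5 * dt * dH_dq(p, q)   # half kick
        q += dt * dH_dp(p, q)         # drift
        p -= 0.5 * dt * dH_dq(p, q)   # half kick
    return q, p


if __name__ == "__main__":
    q_end, p_end = evolve(1.0, momentum(0.0), 1e-3, 1000)
    omega = math.sqrt(K_SPRING / M)
    print("q(1) numeric :", q_end)
    print("q(1) analytic:", math.cos(omega * 1.0))
    print("energy drift :", hamiltonian(p_end, q_end) - hamiltonian(0.0, 1.0))
```

The leapfrog integrator is used only because it keeps the energy error bounded; any other integrator for first-order equations would illustrate the same four steps.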

The canonical momenta $p^a$ ($a = 1,2,3$) for the spatial components of $A$ are easily found,

\[ p^a(x) = \frac{\sqrt{-g}}{4\pi} E_a = \frac{\sqrt{-g}}{4\pi} \dot A_a. \]

The time derivatives $\dot A_a$ are expressed through the momenta as

\[ \dot A_a = \frac{4\pi}{\sqrt{-g}}\,p^a, \]

hence the Hamiltonian is

\[ H[p^a, A_a, A_0] = \int_{t=\mathrm{const}} d^3x^a \left( \frac{2\pi}{\sqrt{-g}} \sum_{b=1}^{3} p_b^2(x) + \frac{\sqrt{-g}}{8\pi} |\vec B|^2 \right) = \int_{t=\mathrm{const}} d^3x^a\, \frac{\sqrt{-g}}{8\pi}\left( |\vec E|^2 + |\vec B|^2 \right), \]

which (in flat space) is the familiar expression for the total energy of the electromagnetic field at a fixed time.

The Hamilton equations of motion are

\[ \frac{\partial A_a}{\partial t} = \frac{\delta H}{\delta p^a} = \frac{4\pi}{\sqrt{-g}}\,p^a, \]
\[ \frac{\partial p^a}{\partial t} = -\frac{\delta H}{\delta A_a} = \frac{1}{4\pi}\sum_{b,c} \partial_c\left( \sqrt{-g}\,\varepsilon_{abc} B_b \right) \equiv -\frac{\sqrt{-g}}{4\pi}\,\operatorname{rot}\vec B. \]

It is straightforward to see that the above equations, together with the constraint (5.31), are equivalent to the Maxwell equations in vacuum. The constraint is merely a restriction on possible initial conditions: Once the constraint holds at an initial time, it will hold at any future time since $\partial_t \operatorname{div}\vec E = -\operatorname{div}\operatorname{rot}\vec B = 0$.

Remark: The gauge fixing condition $A_0 = 0$ does not entirely remove the gauge freedom because one can still perform a transformation $A_a \to A_a + \partial_a\epsilon$ with a function $\epsilon(x^a)$ depending only on spatial coordinates. An additional gauge fixing condition, such as

\[ \sum_a \partial_a\left( \sqrt{-g}\,A_a \right) \equiv \sqrt{-g}\,\operatorname{div}\vec A = 0, \]

will remove the remaining gauge freedom. Note that the constraint $\operatorname{div}\vec E = 0$ will be an automatic consequence of this gauge condition. The resulting gauge is called the radiation gauge because $\operatorname{div}\vec E = 0$ is the Maxwell equation in vacuum, describing the propagation of pure electromagnetic field, and $A_0 = 0$ means, heuristically, the absence of an electrostatic field component.

Since we have fixed the gauge, we had to add the constraint (5.31) to the Hamilton equations of motion. It is possible instead to keep the nondynamical field $A_0$ in the Hamiltonian (without introducing the momentum $p^0$). The constraint will then be obtained as the equation $\delta H/\delta A_0 = 0$. To get a better idea of the Hamiltonian treatment of constrained systems, in the next subsection we shall consider some simple examples from classical mechanics.

5.2.2 Hamiltonian mechanics of constrained systems

See also [18, 14] for more detailed and wide-ranging developments.

The Hamiltonian formalism involves replacing velocities $\dot q_i$ by momenta $p_i$ through the relations $p_i = \partial L/\partial\dot q_i$. However, in many cases these relations cannot be solved for some $\dot q_i$, and then the standard rule for finding the Hamiltonian cannot be used. For instance, the Lagrangian for electrodynamics is independent of $\dot A_0$ and thus the relation $p^0 = \partial L/\partial\dot A_0$ cannot be solved for $\dot A_0$. Another example is when $L$ is linear in, say, the velocity $\dot q_1$; in this case, $\partial L/\partial\dot q_1$ is independent of $\dot q_1$ and again the same problem arises: namely, we cannot express $\dot q_1$ through the momentum $p_1$. In these cases, one can use a slightly different but equivalent approach to develop a Hamiltonian formulation. This approach is known as the Faddeev-Jackiw formalism.

The main idea of the Faddeev-Jackiw approach is to recognize that the Hamilton equations of motion are first-order in time derivatives and follow, as Euler-Lagrange equations, from the action (5.30) that is linear in all the velocities $\dot q_i$. Given the Lagrangian (5.30), we do not attempt to introduce new canonical momenta for $q_i$, because we recognize that these momenta are already present in the Lagrangian (they are the variables $p_i$). Thus, the transition from a Lagrangian formulation to a Hamiltonian formulation can be seen as the replacement of a Lagrangian action $L(q,\dot q)$, which is generally nonlinear in the velocities $\dot q_i$, by another Lagrangian, namely $\tilde L \equiv \sum_i p_i\dot q_i - H(p,q)$, which is linear in the velocities and thus yields first-order Euler-Lagrange equations of motion. The replacement $L \to \tilde L$ comes at the cost of extending the Lagrangian by introducing extra momentum variables $p_i$. Therefore, we need to add only as many momentum variables as needed for the Lagrangian to become linear in all the velocities. If we notice that the Lagrangian is linear in velocities, we conclude that all the necessary momentum variables are already introduced, and the Hamilton equations will be found as the ordinary Euler-Lagrange equations for an ordinary (Lagrangian) system. All these Euler-Lagrange equations will be (at most) first-order in time derivatives, as expected for Hamilton equations. After deriving these equations, one can easily determine whether constraints are present in the system. Namely, the constraints will be any equations not containing time derivatives.

As a first example, consider the Lagrangian

\[ L_0(x,\dot x,y,\dot y,z,\dot z) = \dot x y + \frac12 m\dot z^2 - \frac{1}{2m} y^2 - V(x,z). \]

This Lagrangian is linear in $\dot x$, independent of $\dot y$, and quadratic in $\dot z$. We would like to obtain a Hamiltonian formulation for this system. Since the Lagrangian is nonlinear in $\dot z$, we introduce the canonical momentum for the variable $z$,

\[ p_z \equiv \frac{\partial L_0}{\partial\dot z} = m\dot z, \]

express $\dot z = \frac{1}{m} p_z$, and build the "temporary" Hamiltonian

\[ H_0 \equiv \left[ p_z\dot z - L_0 \right]_{\dot z = p_z/m} = -\dot x y + \frac{1}{2m} p_z^2 + \frac{1}{2m} y^2 + V(x,z). \]

Now we formulate the "temporary" Hamiltonian action principle as the variation of the new, extended Lagrangian,

\[ \tilde L_0(x,\dot x,y,z,\dot z,p_z) \equiv p_z\dot z - H_0 = p_z\dot z + \dot x y - \frac{1}{2m} p_z^2 - \frac{1}{2m} y^2 - V(x,z). \]

The Lagrangian $\tilde L_0$ is linear in all the velocities. Therefore, we do not need to introduce any more momentum variables;

instead, we simply need to relabel some of the existing vari- Of course, after determining the Hamilton equations and
ables as momenta. Presently, it is clear that y plays the role the constraints, one needs to solve these equations. Some-
of the momentum for x. Therefore, relabeling px y, we times it is easy to solve the constraints, i.e. to express some
obtain the Hamiltonian action, canonical variables as functions of others, and thus to reduce
the number of dynamical degrees of freedom. In other cases,
1 2 2

solving the constraints is a nontrivial task, and instead one
L0 (x, x,
px , z, z, pz ) = px x + pz z p + pz V (x, z),
2m x attempts to solve the dynamical equations and impose the
which describes a particle of mass m in a two-dimensional constraints afterwards. A full consideration of this topic is
potential V (x, z). The Euler-Lagrange equations of motion are beyond the scope of these lectures; there exists an extensive
literature about constrained systems, both classical and quan-
1 tized.
x = px , px = V /x,
m
1
z = pz , pz = V /z. 5.2.3 Gauss-Codazzi equation
m
In the Hamiltonian formalism, time derivatives play a com-
Clearly, there are no constraints in this model. We conclude pletely different role from spatial derivatives. To compute the
that the Lagrangian L0 is actually not constrained, despite the canonical momenta in general relativity, we will need to sep-
fact that we could not introduce the momenta px and py in arate the spatial and the temporal derivatives in the Einstein-
the conventional manner. The reason for the difficulty was Hilbert Lagrangian gR. Under spatial derivatives it
purely technical: the initial Lagrangian L0 was already half- is natural to understand the intrinsic covariant derivatives
way done becoming a Hamiltonian. However, the unusual within the 3-surfaces t = const. (We know from Sec. 1.10.1-
Lagrangian L0 is completely equivalent to the more conven- 1.10.2 that a 3-surface embedded in a manifold receives an in-
tional L 0.
duced metric and an induced Levi-Civita connection.) There-
Some systems, however, are constrained. As a further ex- fore, the first step toward a Hamiltonian formulation of GR is
ample, consider the Lagrangian to express the curvature scalar R as a sum of a term depend-
ing only on spatial derivatives of the metric and remaining
1 2
L1 (q1 , q1 , q2 , q2 ) = (q1 q2 ) . terms.
2
Consider a 3-surface embedded into a spacetime man-
This Lagrangian is quadratic in q1 , so we introduce the mo- ifold M with a metric g. The surface is spacelike if the
mentum normal vector is timelike. The surface is timelike if the
L1 tangent space to at each point, T (p), contains a timelike
p1 = q1 q2 ,
q1 vector (and thus is spanned by one timelike and two space-
solve for q1 = p1 + q2 , and obtain the temporary Hamilto- like directions). A timelike 3-surface can be considered as
nian a sub-spacetime where two-dimensional creatures might
live. (To emphasize this interpretation, a timelike 3-surface
1 2 is frequently called 2+1-dimensional.) A surface receives
H1 (q1 , p1 , q2 , p2 ) = p1 q1 L1 |q1 =p1 +q2 = p1 + p1 q2
2 the induced metric h = g|T , so the two-dimensional crea-
tures may define the induced Levi-Civita connection (3) and
and the extended Lagrangian
compute the corresponding (3-dimensional) Riemann tensor
(3)
1 R(a, b, c, d) for tangent vectors a, b, c, d. There is a rela-
1 (q1 , q1 , q2 , q2 , p1 ) = p1 q1 H1 = p1 q1 p2 p1 q2 .
L
2 1 tionship between the reduced curvature (3) R and the curva-
ture of the full 4-dimensional spacetime, which we denote
Since L 1 is linear in all the velocities, we now find the Hamil- (4) R for clarity. This relationship is called the Gauss-Codazzi
ton equations as the Euler-Lagrange equations for L 1: equation, which we shall now derive.
Let n be a (spacelike) vector field which is everywhere nor-
q1 p1 q2 = 0, p1 = 0, p 1 = 0. mal to the surface and normalized, g(n, n) = 1. (The case
of a timelike n is fully analogous.) We know from Sec. 1.10.1
Clearly, we have one constraint, p1 = 0, and one dynamical and 1.10.2 that the induced Levi-Civita connection (3) is
equation, q1 = q2 . found using the projector P = 1 n gn onto the tangent
Practice problem: The system with the Lagrangian L1 is in- bundle T , (3)
x t = x t (x)t,
variant under the gauge symmetry
where we have defined as a transformation-valued 1-form,
q1 q1 + (t), q2 q2 + t (t),
(x)t ng(t, x n). (5.32)
where (t) is an arbitrary function. Determine the identity
that this symmetry generates via Noether's theorem.
for the case of flat spacetime M, the assumption (4) R = 0 can
Practice problem: Derive the Hamilton equations of motion be easily dropped. In the derivation leading to Eq. (1.85), we
for the following Lagrangians: only need to retain the underlined term [x , y ] z [x,y] z and
to replace it by (4) R(x, y)z. This yields
x 2 x 2 (x + y)2 y 2
L3 = + xy z; L4 = + ; L5 = xy + + xy. (3)
2 2 2 2 R(x, y)z = (4) R(x, y)z + [(x), (y)] z
Determine which of these Lagrangians are constrained. + (x )(y)z (y )(x)z.
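The constrained example of this subsection, $L_1 = \frac12(\dot q_1 - q_2)^2$, lends itself to a quick numerical check. The sketch below assumes that form of $L_1$ and tests two statements: that $L_1$ is unchanged under $q_1 \to q_1 + \epsilon(t)$, $q_2 \to q_2 + \partial_t\epsilon(t)$, and that a path with $p_1 = 0$, $\dot q_1 = q_2$ satisfies the three Euler-Lagrange equations of the extended Lagrangian. The sample trajectories, the gauge function, and the helper names are arbitrary illustrative choices, and derivatives are taken by finite differences.

```python
import math


def deriv(f, t, h=1e-5):
    """Central finite-difference time derivative of a function f(t)."""
    return (f(t + h) - f(t - h)) / (2 * h)


def L1(q1dot, q2):
    """The constrained Lagrangian L1 = (q1dot - q2)^2 / 2."""
    return 0.5 * (q1dot - q2) ** 2


def L1_on_path(q1, q2, t):
    """Evaluate L1 along given trajectories q1(t), q2(t)."""
    return L1(deriv(q1, t), q2(t))


def gauge_transform(q1, q2, eps):
    """Return the transformed paths q1 + eps and q2 + d(eps)/dt."""
    q1_new = lambda t: q1(t) + eps(t)
    q2_new = lambda t: q2(t) + deriv(eps, t)
    return q1_new, q2_new


def hamilton_residuals(q1, q2, p1, t):
    """Residuals of the equations q1dot - p1 - q2 = 0, p1 = 0, p1dot = 0."""
    return (deriv(q1, t) - p1(t) - q2(t), p1(t), deriv(p1, t))


# Arbitrary (off-shell) sample paths and gauge function, chosen for illustration.
q1 = lambda t: math.sin(t) + 0.3 * t ** 2
q2 = lambda t: math.cos(2 * t)
eps = lambda t: math.exp(-t) * math.sin(3 * t)

if __name__ == "__main__":
    q1_new, q2_new = gauge_transform(q1, q2, eps)
    for t in (0.0, 0.7, 1.9):
        print(t, L1_on_path(q1, q2, t), L1_on_path(q1_new, q2_new, t))
    # A solution: q2 arbitrary, q1 its antiderivative, p1 = 0.
    print(hamilton_residuals(lambda t: 0.5 * math.sin(2 * t),
                             lambda t: math.cos(2 * t),
                             lambda t: 0.0, 0.4))
```

Because $q_2$ can be chosen freely while $q_1$ follows from $\dot q_1 = q_2$, the check also makes visible that the system has an arbitrary function in its general solution, as expected for a gauge theory.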

Substituting the definition (5.32) of , we find (for tangent vec- Remark: The reason for calling K the extrinsic curva-
tors x, y, z, t) ture is the following. The intrinsic curvature (3) R of the
3-surface will differ from the full curvature (4) R of M
(x)(y)z = ng((y)z, x n) = ng(n(...), x n) = 0, if the 3-surface is embedded into the spacetime M in a
(x )(y)z = (x n)g(z, y n) + n(...), curved way. The tensor K contains the complete infor-
g((x )(y)z, t) = g(t, x n)g(z, y n), mation needed to compute the intrinsic curvature on if the
full curvature (4) R is known. The covariant derivative (3) is
where we denoted by (...) some uninteresting terms that can- also expressed through K and (4) . Therefore, the tensor K
cel under the scalar product. Hence, completely characterizes the additional curvature on due to
the embedding of into M. Note, however, that not all the
(3)
R(x, y, z, t) =(4) R(x, y, z, t) components of (4) R can be restored from the knowledge of
+ g(t, x n)g(z, y n) g(t, y n)g(z, x n). three-dimensional tensors (3) R and K : one also needs some
information about the 4-vector field n and its derivatives.
The above expression contains the bilinear form
Statement: The relation (5.33) was derived only for tangent
B(n) (x, y) g(x n, y), vectors; the tensor (3) R vanishes if any of its arguments is par-
allel to n. It is possible, however, to express (4) R(x, y, z, n) in
which is the distortion tensor of the field n (see Sec. 2.3.1).
terms of K for tangent x, y, z:
Since the vector field n is, by construction, orthogonal to the
   
3-surface , we may extend n to a normalized geodesic vector (4)
R(x, y, z, n) = (3) y K (x, z) (3) x K (y, z).
field in a neighborhood of , and thus the tensor B(n) is sym-
metric and transverse to n. Effectively, B(n) is a 3-dimensional (5.34)
symmetric tensor in the tangent bundle T . Therefore, the We can derive Eq. (5.34), starting from Eq. (5.32).
metric g in the definition of B can be replaced by the partial Derivation: We start from
(n)
metric (3)
R(x, y)z = (4) R(x, y)z + [(x), (y)] z
B(n) (x, y) = h(x n, y), + (x )(y)z (y )(x)z.
h(x, y) g(x, y) + g(x, n)g(y, n). We write K instead of K for brevity. Substituting (x)y =
nK(x, y) and using the fact that K is symmetric and trans-
In this context, the distortion tensor B(n) is called the extrinsic
verse to n, we find
curvature tensor of the surface and denoted K ,
(x)(y)z = nK((y)z, x) = nK(n(...), ...) = 0,
K (x, y) = h(x n, y); K = h n ; .
(x )(y)z = (x n)K(y, z) + n (x K) (y, z),
The extrinsic curvature K depends only on the derivatives g((x )(y)z, n) = (x K) (y, z).
of n in tangential directions and is thus independent of the
completion of n to a vector field outside . Since (3) R(x, y, z, n) = 0, we have
Note that the same tensor K can be defined by using an (4)
R(x, y, z, n) = (y K) (x, z) (x K) (y, z).
arbitrary, unnormalized vector n , in the following way:
x n, y), The derivatives K can be replaced by (3) K because K is
K (x, y) = h( transverse to n:
y) g(x, y) 1
h(x, g(x, n)g(y, n). K(x y, z) = K((3) x y, z) + K(n(...), ...) = K((3) x y, z);
g(n, n)
(x K) (y, z) Lx (K(y, z)) K(x y, z) K(y, x z)
The resulting tensor is identical to B(n) for n = |g( )|1/2 n
n, n .
= (3) x (K(y, z)) K((3) x y, z) K(y, (3) x z)
Using the extrinsic curvature K , the 3-dimensional Rie-
mann tensor (defined only for tangent vectors x, y, z, t) can be = (3) (x K) (y, z).
rewritten as
Thus we obtain the desired formula.
(3) (4)
R(x, y, z, t) = R(x, y, z, t) + K (x, t)K (y, z)
K (y, t)K (x, z). (5.33) 5.2.4 Boundary term in Einstein-Hilbert action
This is called the Gauss-Codazzi equation. In Sec. 5.1.2 we derived the Einstein equation from a varia-
The induced connection (3) can be expressed through the tional principle by imposing the condition that both g and
extrinsic curvature K as g must vanish on a boundary surface M. This is dif-
ferent from the usual condition where only the variation of
(3)
x y = x y ng(x n, y) = x y nK (x, y). a field is constrained to vanish (but not its derivative). The
formal reason for imposing the extra
R condition was the pres-
where x, y are tangent to . In the index notation, the con-
ence of a total divergence term, M gd4 p divq, where q is
ventional way to denote induced derivatives is by a vertical
a vector field depending on the derivatives g, as defined by
bar, 
(3)
 Eq. (5.14). However, one can cancel this divergence term by
x y x y | . subtracting an extra boundary term from the Einstein-Hilbert
action, thus defining the corrected action SG [g],
Thus we can write (for a tangent vector y y )
I

y | = y ; n y K . SG SEH 2K hd3 p,

M

where K is a suitable function (the factor 2 is for later conve- The induced partial metric h is defined using the orthogonal
nience), and h is the induced metric on the 3-surface M. In projector onto the tangent bundle T M,
this section, we show how to choose the required function K
so that the variation of the action SG [g] yields simply h = g n n ; h = g n n .
Z   is every-
4 1 (For simplicity, we assume that the 3-surface M
SG [g] = gd p R Rg g (p),
M 2 where spacelike and g(n, n) = 1. For spacelike n, the induced
1 1
metric would be h = g + n n and we would need to
without any total divergence terms. The only boundary con- take h instead of h. Finally, the surface M may con-
dition will be that g should vanish on M. sist of timelike and spacelike pieces, as long as it is nowhere
An analogy with ordinary mechanics may be helpful. Con- null.) The remainder of this section is occupied by a calcula-
sider the familiar nonrelativistic Lagrangian tion showing that K h n; satisfies Eq. (5.35).
1 2 Remark: Note that the quantity K is equal to the trace of the
L(q, q)
= mq V (q),
2 namely
extrinsic curvature of the 3-surface M,
describing a particle of mass m in a potential V (q). This La- K = Tr(x,y) K M
(x, y) = g K = h K .
grangian yields the same Euler-Lagrange equations as
On the surface M, we have h n; = g n; = divn. How-
q, 1
L(q, q) = mq q V (q). ever, the function divn contains non-tangential derivatives of
2
the metric and thus cannot be used for a variational principle
However, the Lagrangian L depends on the second derivative where g is not zero.
of q. The variation of L with respect to q(t) contains bound- We begin with Eq. (5.17),
ary terms with q(t) as well as q(t), 
g(n, q) = n g ; g g ; = n g (g; g; ) .
Z t2 t2 t2 Z t2 (5.36)
1 1
Ldt = mq q + mqq
(mq + V (q)) q(t)dt. Since the variation g of the metric vanishes on M, it fol-
t1 2 t1 2 t1 t1
lows that any tangential derivative of g vanishes. Since the
At first glance, we are required to impose the boundary con- inverse partial metric h consists of tensor products of tan-
ditions q(t)
= q(t) = 0 for t = t1,2 . However, the problem gential vectors, we have
with the Lagrangian L can be fixed by adding a boundary
h g; = 0. (5.37)
term, so that the corrected action becomes
Z t2 t2 Substituting g = h + n n into Eq. (5.36), we therefore
1
Scorr [q] = L(q, q,
q)dt + mq q . find
t1 2 t1
g(n, q) = n h (g; g; ) = n h g; .
The variation of the boundary term is
t2 t2 t2 On the other hand, we consider the variation K due to the
1 1 1 variation g. Since g = 0 on the surface M, the vector field
mq q = mq q + mqq ,
2 t1 2 t1 2 t1 n remains normalized and orthogonal to the surface, and the
only varying quantity is the Levi-Civita connection , thus
which precisely cancels the unwanted term with q in the vari-  

ation L. K h n n =
h n .
Instead of adding a boundary term, we may add its deriva-
tive to the Lagrangian L, so that the corrected action is writ- Using Eqs. (5.15) and (5.37), we find
ten as
1 1
Z t2 
d

1
 K = h n (g; + g; g; ) = h n ga; .
Scorr [q] = q,
L(q, q) + mq q dt. 2 2
t1 dt 2
Therefore,
2K = h n ga; = g(n, q)
R
Of course, this is identical to the usual action Ldt for the
particle in a potential. The additional term cancels the second as required.
derivative present in L.
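The mechanical analogy can be verified numerically. The sketch below assumes the two Lagrangians of the analogy in the form $L = \frac12 m\dot q^2 - V(q)$ and $\tilde L = -\frac12 m q\ddot q - V(q)$, and checks that $\int \tilde L\,dt + \frac12 m q\dot q\,\big|_{t_1}^{t_2}$ coincides with $\int L\,dt$ for a trajectory that is not a solution of the equations of motion. The potential, the sample trajectory, the mass value, and the function names are illustrative assumptions.

```python
import math

M = 1.5  # illustrative mass (assumption)


def V(q):
    """Illustrative potential V(q) = q^4 / 4 (any smooth V would do)."""
    return 0.25 * q ** 4


# An arbitrary smooth trajectory with its exact first and second derivatives.
def q(t):
    return math.sin(t) + 0.5 * t * t


def qd(t):
    return math.cos(t) + t


def qdd(t):
    return -math.sin(t) + 1.0


def L(t):
    """Usual Lagrangian m qd^2 / 2 - V(q) evaluated on the trajectory."""
    return 0.5 * M * qd(t) ** 2 - V(q(t))


def L_tilde(t):
    """Second-derivative Lagrangian -m q qdd / 2 - V(q) on the trajectory."""
    return -0.5 * M * q(t) * qdd(t) - V(q(t))


def simpson(f, a, b, n=2000):
    """Composite Simpson rule with an even number n of subintervals."""
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3


def boundary_term(t1, t2):
    """The boundary term m q qd / 2 evaluated between t1 and t2."""
    return 0.5 * M * (q(t2) * qd(t2) - q(t1) * qd(t1))


def actions(t1, t2):
    """Return (usual action, corrected action) on the interval [t1, t2]."""
    s_usual = simpson(L, t1, t2)
    s_corr = simpson(L_tilde, t1, t2) + boundary_term(t1, t2)
    return s_usual, s_corr


if __name__ == "__main__":
    print(actions(0.0, 2.0))
```

The agreement holds for any trajectory, not only for solutions, because the two integrands differ by the total derivative $\frac{d}{dt}\left(\frac12 m q\dot q\right)$.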
In General Relativity, it is more convenient to keep the Remark: Adding a boundary integral to the Einstein-Hilbert
boundary term as such. The Einstein-Hilbert Lagrangian action is equivalent to adding a total divergence (2Kn )

gR is equivalent to other Lagrangians depending only on div(2Kn) to the Lagrangian gR, provided that the vector
first derivatives of the metric, but none of the corrected La- field n is extended from the boundary M to the entire do-
grangians can be written in a covariant form. main M (note that K is a function of n and the metric g). More
Presently, we are looking for a function K such that generally, one can add an arbitrary total divergence divB of
some vector field B to the Lagrangian, without changing the
(2K) = g(n, q), (5.35) equations of motion. When the vector field B is chosen ap-
propriately, the variation SG of the corrected action
where n is a normalized vector field orthogonal to the 3-
surface M. If we find such K, then the boundary term will 1
Z

be equal to the volume integral of div (2Kn). SG [g] [R divB] gd4 x
16G M

contains only volume and boundary terms with g . This Hence, in a basis {n, a, b, c}, where a, b, c are orthogonal to n,
indicates that SG [g] is first-order in the derivatives of g, as are the metric g has the matrix representation
the Lagrangians of most other field theories. However, the
vector field B cannot be expressed as a function of the metric 1 0 0 0
0
g alone. g = 0 ,

It seems4 that B can be expressed through a choice of four
vector fields, one of which should coincide with n on M. 0
Denote by K[n] the extrinsic curvature function evaluated on where the stars indicate the unknown nonzero components
a normalized vector n. Since we will need to vary the metric hab of the partial metric h. Note that g = h, so the
while keeping the vector n fixed, we need a formula for K[n]
covariant 4-volume element is gd4 x = hdt d3 xa ,
that includes the normalization of n,
the3 covariant 3-volume element on a surface t = const
while
g(n, n n) is hd xa .
K[n] = divn . It is clear that the Lagrangian depends only on h, therefore
g(n, n)
we choose h(t, xa ) to be the set of generalized coordinates
Choose an orthonormal frame {e0 , e1 , e2 , e3 } in the entire do- in the sense of the Hamiltonian formalism.
The next task is
main M. The vector fields {e } are orthonormal with respect to find the dependence of the Lagrangian, gR, on hab /t
to the metric g. Then define and to compute the corresponding canonical momenta pab .

The time derivative hab hab /t can be expressed in a
X
B= 2K[e ]g(e , e )e . geometric way as
h Ln h,
since the basis vectors in the 3-surface commute with n t ,
This specifies B as a function of g that contains first deriva-
i.e. Ln ea = 0:
tives of g. Then one can show that R divB depends only
on first derivatives of g. This can be shown, for instance, by hab
considering a conformal transformation g g e2 g. One = t (h(ea , eb )) = Ln (h(ea , eb )) = (Ln h) (ea , eb ).
t
computes R B
div and checks that the term containing 
P Hence, for tangent vectors x, y we have

in R cancels with the terms e (e ) coming from B.
This explanation needs some more detail. (Ln h) (x, y) = Ln (h(x, y)) h(Ln x, y) h(x, Ln y)
Thus, the explicit form of SG [g] depends not only on g but = n h(x, y) h(n x x n, y) h(x, n y y n)
also on an essentially arbitrary choice of the vector field B
Therefore, it is impossible = h(x n, y) + h(x, n y) = 2K(x, y), (5.38)
in the interior of the domain M.
to write the first-order action SG [g] as an explicit and covari- where K(, ) is the extrinsic curvature of the 3-surface t =
ant (i.e. coordinate-independent) function only of g. In a par- const. Therefore, h ab = 2Kab , and the dependence of the
ticular coordinate system, the vector field B may be chosen Einstein-Hilbert action on h ab will be found if we express
with particular numerical components so that the first-order
R
SEH [g] = d4 x gR through the extrinsic curvature Kab of
Lagrangian appears to be a function only of the components g the surfaces t = const.
and g, . However, this form of the first-order Lagrangian is We already saw a relation of precisely this sort, namely the
not covariant (it depends on a chosen coordinate system). Gauss-Codazzi equation (5.33) that expresses the 4-curvature
R (4) R through the 3-curvature (3) R and the extrinsic cur-
5.2.5 The Hamiltonian for pure gravity vature Kab . The only nontrivial calculation in this section will
be to show that
Now we shall compute the canonical momenta and the Z

Z
 
Hamiltonian for pure gravity (without matter) in general rel- d4 x g(4) R = d4 x g (3) R + Kab K ab Kaa Kbb ,
ativity. (5.39)
For simplicity, we choose a time foliation such that the up to total derivative terms that we omit. Note that Eq. (5.39)
(timelike) vector field n g 1 dt normal to the surfaces t = is very similar to Eq. (5.33), except for the different sign at the
const is normalized, g(n, n) = 1. This means that n is a K terms. We shall derive Eq. (5.39) from Eq. (5.33) at the end
geodesic vector field (why?); also, t is the proper time along of this section, and now let us assume that Eq. (5.39) is valid
the flow lines of n. This choice of the time foliation, called the and determine the canonical momenta.
synchronous gauge, may be impossible to perform globally
Since the Lagrangian density gR depends on h/t only
due to focusing of the field n. However, this choice is always through the terms containing K , the canonical momentum
ab
possible locally, because one can emit timelike geodesics nor- pab is
mal to an arbitrary initial spacelike surface and use the proper

time along the geodesics as the coordinate t. It is also con- ab 1 ( gR) h 
p = Kab K ab Kaa Kbb .
venient to choose the local coordinates xa within a surface 16G hab 32G Kab
t = const such that the basis vectors ea are connecting vectors
for n, i.e. Ln ea = 0. This completely removes the coordinate Rewriting for convenience
freedom (fixes the gauge). 
Kab K ab Kaa Kbb = Kab Kcd hac hbd hab hcd ,
The partial metric h induced on the 3-surface t = const is
we find after a simple calculation
h = g n n .
h 
4 I need to make this consideration more precise. pab = K ab Kcc hab . (5.40)
16G
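The "simple calculation" leading to Eq. (5.40) can be spot-checked numerically. The following is only a sketch under simplifying assumptions that are not part of the notes: units with $G = 1$, a diagonal positive-definite 3-metric, and sample values of $K_{ab}$ chosen arbitrarily. It differentiates the quadratic form $K_{ab}K_{cd}(h^{ac}h^{bd} - h^{ab}h^{cd})$ by central differences, treating all nine entries of $K_{ab}$ as independent, and compares the result with the right-hand side of Eq. (5.40). The helper names `quad`, `momentum` and `momentum_by_derivative` are hypothetical.

```python
import math

G = 1.0  # units with G = 1 (assumption made only for this check)

def quad(K, hinv):
    """K_ab K^ab - (K^a_a)^2 for a diagonal metric with inverse entries hinv[a]."""
    kk = sum(K[a][b] ** 2 * hinv[a] * hinv[b] for a in range(3) for b in range(3))
    tr = sum(K[a][a] * hinv[a] for a in range(3))
    return kk - tr ** 2

def momentum(K, h):
    """Right-hand side of Eq. (5.40): sqrt(h)/(16 pi G) (K^ab - K^c_c h^ab)."""
    hinv = [1.0 / x for x in h]
    sqrt_h = math.sqrt(h[0] * h[1] * h[2])
    tr = sum(K[a][a] * hinv[a] for a in range(3))
    return [[sqrt_h / (16 * math.pi * G)
             * (K[a][b] * hinv[a] * hinv[b] - (tr * hinv[a] if a == b else 0.0))
             for b in range(3)] for a in range(3)]

def momentum_by_derivative(K, h, eps=1e-3):
    """sqrt(h)/(32 pi G) d(quad)/dK_ab, evaluated by central differences."""
    hinv = [1.0 / x for x in h]
    sqrt_h = math.sqrt(h[0] * h[1] * h[2])
    p = [[0.0] * 3 for _ in range(3)]
    for a in range(3):
        for b in range(3):
            K_plus = [row[:] for row in K]
            K_minus = [row[:] for row in K]
            K_plus[a][b] += eps
            K_minus[a][b] -= eps
            slope = (quad(K_plus, hinv) - quad(K_minus, hinv)) / (2 * eps)
            p[a][b] = sqrt_h / (32 * math.pi * G) * slope
    return p

# Arbitrary sample data (illustrative values, not taken from the notes).
h = [1.0, 2.0, 0.5]
K = [[0.3, 0.1, -0.2], [0.1, -0.4, 0.05], [-0.2, 0.05, 0.7]]

p_formula = momentum(K, h)
p_derivative = momentum_by_derivative(K, h)
max_diff = max(abs(p_formula[a][b] - p_derivative[a][b])
               for a in range(3) for b in range(3))
print(max_diff)
```

Because the differentiated expression is quadratic in $K_{ab}$, the central difference is exact up to rounding, so the two results should agree to machine precision.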


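The inverse relation allows us to express the velocity $\dot h$ through the momentum $p$:
\[ \dot h_{ab} = 2K_{ab} = \frac{32\pi G}{\sqrt{h}} \left( p_{ab} - \frac{1}{2} p^c_c h_{ab} \right). \]
Now we can write the Hamiltonian for pure gravity:
\begin{align*}
H[p^{ab}, h_{ab}] &= \int_{t=\mathrm{const}} d^3x \left[ p^{ab} \dot h_{ab} - \frac{\sqrt{h}}{16\pi G} \left( {}^{(3)}R + K_{ab}K^{ab} - K^a_a K^b_b \right) \right] \\
&= \int_{t=\mathrm{const}} d^3x\, \frac{\sqrt{h}}{16\pi G} \left[ K^{ab}K_{ab} - K^a_a K^b_b - {}^{(3)}R \right] \\
&= \int_{t=\mathrm{const}} d^3x \left[ -\frac{\sqrt{h}}{16\pi G}\, {}^{(3)}R + \frac{16\pi G}{\sqrt{h}} \left( p_{ab}p^{ab} - \frac{1}{2} p^a_a p^b_b \right) \right]. \tag{5.41}
\end{align*}
The Hamilton equations of motion are cumbersome but follow straightforwardly,
\begin{align*}
\frac{\partial h_{ab}}{\partial t} &= \frac{\delta H}{\delta p^{ab}} = \frac{32\pi G}{\sqrt{h}} \left( p_{ab} - \frac{1}{2} h_{ab} p^c_c \right), \\
\frac{\partial p^{ab}}{\partial t} &= -\frac{\delta H}{\delta h_{ab}} \\
&= \frac{\delta}{\delta h_{ab}} \left[ \frac{\sqrt{h}}{16\pi G}\, {}^{(3)}R \right] - \frac{\delta}{\delta h_{ab}} \left[ \left( p^{cd}p_{cd} - \frac{1}{2} p^c_c p^d_d \right) \frac{16\pi G}{\sqrt{h}} \right] \\
&= -\frac{\sqrt{h}}{16\pi G} \left( {}^{(3)}R^{ab} - \frac{1}{2}\, {}^{(3)}R\, h^{ab} \right) + \frac{8\pi G}{\sqrt{h}} \left[ \left( p_{cd}p^{cd} - \frac{1}{2} p^c_c p^d_d \right) h^{ab} - 4 \left( p^{ac} p_c{}^b - \frac{1}{2} p^c_c p^{ab} \right) \right].
\end{align*}
Note that we used the three-dimensional Ricci tensor ${}^{(3)}R_{ab}$ when computing the variation of ${}^{(3)}R$.

The equations of motion in the Hamiltonian form are equivalent to Einstein's equations, but now it is clear that the metric $h_{ab}(t)$ and the extrinsic curvature $K_{ab}(t)$ can be found from initial data $h_{ab}(t_0)$, $K_{ab}(t_0)$ at an initial spacelike 3-surface $t = t_0$, as a solution of an appropriate Cauchy problem. This has applications in numerical relativity, i.e. a numerical solution of the Einstein equations.

Derivation of Eq. (5.39)

Let us now derive Eq. (5.39) from Eq. (5.33).

The 3-dimensional curvature ${}^{(3)}R$ is defined as the 3-dimensional trace of the induced Riemann tensor ${}^{(3)}R(a,b,c,d)$ with respect to $(a,c)$ and $(b,d)$. Since $n$ is orthogonal to the 3-surfaces, the 3-dimensional trace ${}^{(3)}\mathrm{Tr}$ of any tensor $A$ can be expressed through its 4-dimensional trace as
\[ {}^{(3)}\mathrm{Tr}_{(x,y)} A(x,y) = {}^{(4)}\mathrm{Tr}_{(x,y)} A(x,y) - A(n,n). \]
The Gauss-Codazzi equation (5.33) is
\[ {}^{(3)}R(x,y,z,t) = {}^{(4)}R(x,y,z,t) + K(x,t)K(y,z) - K(y,t)K(x,z). \]
We begin by evaluating the 3-dimensional trace of the above equation over $(x,z)$:
\begin{align*}
{}^{(3)}\mathrm{Tr}_{(x,z)}\, {}^{(3)}R(x,y,z,t) &= {}^{(3)}\mathrm{Tr}_{(x,z)}\, {}^{(4)}R(x,y,z,t) + {}^{(3)}\mathrm{Tr}_{(x,z)} K(x,t)K(y,z) - K(y,t)\, \mathrm{Tr}\, K \\
&= {}^{(4)}\mathrm{Tr}_{(x,z)}\, {}^{(4)}R(x,y,z,t) - {}^{(4)}R(n,y,n,t) + {}^{(3)}\mathrm{Tr}_{(x,z)} K(x,t)K(y,z) - K(y,t) K^a_a,
\end{align*}
where we wrote $K^a_a \equiv {}^{(3)}\mathrm{Tr}_{(x,y)} K(x,y)$. Then we evaluate the 3-trace of the result over $(y,t)$, using the fact that $R(n,n,\cdot,\cdot) = 0$ and hence
\[ {}^{(3)}\mathrm{Tr}_{(x,z)}\, {}^{(4)}R(n,x,n,z) = {}^{(4)}\mathrm{Tr}_{(x,z)}\, {}^{(4)}R(n,x,n,z). \]
We find
\[ {}^{(3)}R \equiv {}^{(3)}\mathrm{Tr}_{(x,z)(y,t)}\, {}^{(3)}R(x,y,z,t) = {}^{(4)}R - 2\, {}^{(4)}\mathrm{Tr}_{(x,z)}\, {}^{(4)}R(n,x,n,z) + K_{ab}K^{ab} - K^a_a K^b_b. \]
It remains to compute the term ${}^{(4)}\mathrm{Tr}_{(x,z)}\, {}^{(4)}R(n,x,n,z)$. I shall present a computation in the index notation and also an index-free computation, for comparison. Let us first convert this term to the index notation:
\begin{align*}
\mathrm{Tr}_{(x,z)}\, {}^{(4)}R(n,x,n,z) &\equiv \mathrm{Tr}_{(x,z)}\, z^\alpha g_{\alpha\beta} n^\mu x^\nu [\nabla_\mu, \nabla_\nu] n^\beta \\
&= g^{\alpha\nu} g_{\alpha\beta} n^\mu [\nabla_\mu, \nabla_\nu] n^\beta = n^\mu [\nabla_\mu, \nabla_\nu] n^\nu.
\end{align*}
The last expression is simplified as
\begin{align*}
n^\mu n^\nu{}_{;\nu\mu} - n^\mu n^\nu{}_{;\mu\nu} &= \left( n^\mu n^\nu{}_{;\nu} \right)_{;\mu} - n^\mu{}_{;\mu} n^\nu{}_{;\nu} - \left( n^\mu n^\nu{}_{;\mu} \right)_{;\nu} + n^\mu{}_{;\nu} n^\nu{}_{;\mu} \\
&= \mathrm{div}(...) + K_{\mu\nu}K^{\mu\nu} - K^\mu_\mu K^\nu_\nu,
\end{align*}
where we used the relation $K_{\mu\nu} = n_{\mu;\nu}$ and omitted the uninteresting total divergence terms. Finally, we obtain
\[ {}^{(3)}R = {}^{(4)}R - K_{ab}K^{ab} + K^a_a K^b_b + \mathrm{div}(...), \]
which is equivalent to Eq. (5.39).

In the index-free notation, the manipulations with traces are somewhat less transparent. We need to use the property that mute vectors have vanishing derivatives under the trace operation, as well as
\[ A(x,y,z,...) = \mathrm{Tr}_{(a,b)}\, g(a,x) A(b,y,z,...) \]
(see Sec. 1.7.3). Also, we have $\nabla_n n = 0$ and $K(x,y) = K(y,x)$. Thus,
\begin{align*}
\mathrm{Tr}_{(x,z)}\, {}^{(4)}R(n,x,n,z) &= \mathrm{Tr}_{(x,z)}\, g(z, \nabla_n \nabla_x n + \nabla_{\nabla_x n} n) \\
&= \mathrm{Tr}_{(a,b)(x,z)} \left[ g(a,n)\, g(z, \nabla_b \nabla_x n) + g(a, \nabla_x n)\, g(z, \nabla_b n) \right].
\end{align*}
The second term in brackets yields directly
\[ \mathrm{Tr}_{(a,b)(x,z)}\, g(a, \nabla_x n)\, g(z, \nabla_b n) = \mathrm{Tr}_{(a,b)(x,z)} K(a,x) K(z,b) = K_{ab}K^{ab}, \]
while the first term in brackets is transformed as
\[ g(b,n)\, g(z, \nabla_a \nabla_x n) = \nabla_a \left[ g(b,n)\, g(z, \nabla_x n) \right] - g(b, \nabla_a n)\, g(z, \nabla_x n), \]
which yields a total divergence and the term
\[ -\mathrm{Tr}_{(a,b)(x,z)}\, g(b, \nabla_a n)\, g(z, \nabla_x n) = -\left[ \mathrm{Tr}_{(a,b)} K(a,b) \right]^2 = -(K^a_a)^2. \]
Therefore we again recover Eq. (5.39).

5.2.6 Constraints in General Relativity

The Hamilton equations of motion are incomplete without the constraint equations which we neglected to derive earlier. These constraints are present because the Lagrangian $L_{EH}$ is independent of the time derivatives of the components $g_{0\mu}$ of the metric. To see this explicitly, we can repeat the derivation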


of the Hamiltonian formulation without assuming that $t$ is the proper time along geodesic lines of $n$.

We have computed the Hamiltonian in the synchronous gauge for simplicity, but the calculation can be performed in an arbitrary gauge, i.e. with an arbitrary selection of the time foliation. Suppose that a function $t$ is given on a spacetime, such that the contravariant gradient vector $\hat g^{-1} dt$ is everywhere timelike (but not necessarily normalized). We may use the function $t$ as the time coordinate and define equal-time 3-surfaces $t = \mathrm{const}$. Let us define the normal vector to the 3-surfaces, $n \equiv N \hat g^{-1} dt$, where $N$ is such that $g(n,n) = 1$, namely $N^{-2} = g^{-1}(dt, dt)$. The local coordinates $x^a$ on these 3-surfaces may be chosen arbitrarily. Given a choice of the local coordinates $x^a$, we may define the vector $\partial_t$ which acts on functions $f(t, x^a)$ as $f \to \partial f/\partial t$, the partial derivative at fixed $x^a$. The vector $\partial_t$ is not necessarily parallel to $n$, and by construction we have
\[ g(n, \partial_t) = N g(\hat g^{-1} dt, \partial_t) = N\, (dt) \circ (\partial_t) = N, \]
hence in general we can decompose $\partial_t$ as
\[ \partial_t = N n + s, \]
where $s$ is a spacelike vector tangent to the surface $t = \mathrm{const}$, which is thus equivalent to a 3-vector $s = \sum_{a=1}^{3} s^a e_a$ in the local basis $\{e_a\}$. The parameters $N$ and $s^a$ are called the lapse function and the shift vector for a chosen coordinate system. The lapse describes how much time elapses between two consecutive $t = \mathrm{const}$ surfaces along the lines of $n$, and the shift vector shows how much the local coordinate system shifts when we pass from one $t = \mathrm{const}$ surface to another along $n$.

Let us now determine the induced metric $h$ within the 3-surfaces. The matrix $h_{ab} \equiv g(e_a, e_b)$ is related to the metric $g$ in the basis $\{\partial_t, e_a\}$ by the following matrix representation (verify this!),
\[ g_{\mu\nu} = \begin{pmatrix} N^2 + h(s,s) & s_1 & s_2 & s_3 \\ s_1 & h_{11} & h_{12} & h_{13} \\ s_2 & h_{21} & h_{22} & h_{23} \\ s_3 & h_{31} & h_{32} & h_{33} \end{pmatrix}, \]
where $s_a \equiv h_{ab} s^b$. It follows from this representation that the determinants of $g$ and $h$ are related by $\sqrt{-g} = N \sqrt{h}$. (See Calculation below.)

Calculation: Show that $\det g_{\mu\nu} = N^2 \det h_{ab}$ when $g_{\mu\nu}$ is related to $h_{ab}$ by the above formula.

Solution: To compute $\det g_{\mu\nu}$ directly, subtract from the first row of the matrix $g_{\mu\nu}$ the linear combination of the other three rows with coefficients $s^a$. This does not change the determinant, but the first row of the matrix simplifies to $(N^2, 0, 0, 0)$. Hence
\[ \det g_{\mu\nu} = \det \begin{pmatrix} N^2 + s_a s^a & s_a \\ s_b & h_{ab} \end{pmatrix} = \det \begin{pmatrix} N^2 & 0 \\ s_b & h_{ab} \end{pmatrix} = N^2 \det h_{ab}. \]

The action (5.39) involves ${}^{(3)}R$, which is independent of the choice of gauge, and the extrinsic curvature $K_{ab}$. The relationship between the extrinsic curvature $K(x,y) = h(x, \nabla_y n)$ and the metric derivatives $\partial h_{ab}/\partial t$ needs to be derived now. Since $\partial_t$ is a connecting vector for $e_a$, we have
\[ \frac{\partial h_{ab}}{\partial t} \equiv \partial_t \left[ h(e_a, e_b) \right] = \mathcal{L}_{\partial_t} \left[ h(e_a, e_b) \right] = \left( \mathcal{L}_{\partial_t} h \right)(e_a, e_b), \]
and hence for arbitrary vectors $x$, $y$ tangent to a 3-surface, we find, similarly to Eq. (5.38),
\[ \left( \mathcal{L}_{\partial_t} h \right)(x,y) = h(\nabla_x \partial_t, y) + h(x, \nabla_y \partial_t). \]
Since $h(n, \cdot) = 0$, we can simplify the above terms, e.g.
\begin{align*}
h(\nabla_x \partial_t, y) &= h(\nabla_x (N n + s), y) \\
&= N h(\nabla_x n, y) + h(\nabla_x s, y) \\
&= N K(x,y) + h({}^{(3)}\nabla_x s, y).
\end{align*}
In the last line, we replaced $\nabla$ by ${}^{(3)}\nabla$ since, by definition,
\[ h({}^{(3)}\nabla_x y, z) \equiv g(\nabla_x y, z) \]
for tangent vectors $x$, $y$, $z$. Hence,
\[ \partial h_{ab}/\partial t = 2 N K_{ab} + s_{a|b} + s_{b|a}, \]
where we denote ${}^{(3)}\nabla_a$ by a vertical bar.

It now follows from Eq. (5.39) that the Lagrangian $L_{EH} = \sqrt{-g}\, R$ does not depend on the time derivatives of $N$ and $s^a$. Therefore, we do not need to introduce the canonical momenta for these nondynamical variables. As before, the only canonical momenta are $p^{ab}$ corresponding to the components $h_{ab}$ of the induced 3-metric, and a simple calculation shows that $p^{ab}$ is still related to $K_{ab}$ by Eq. (5.40). Hence, the Hamiltonian is
\[ H = \int d^3x^a\, \frac{\sqrt{h}}{16\pi G} \left[ \left( K^{ab}K_{ab} - K^a_a K^b_b - {}^{(3)}R \right) N - \left( K^{ab} - K^c_c h^{ab} \right)_{|a} s_b \right]. \]
A complete derivation of the Hamiltonian in an arbitrary gauge (including also a detailed treatment of all the boundary terms) can be found in the book [Poisson 2004], chapter 4. Here, we shall only determine the constraints of the theory.

The variables $N$ and $s^a$ enter the Hamiltonian as nondynamical Lagrange multipliers, therefore the constraints are
\begin{align*}
\frac{\delta H}{\delta N} = 0 &\;\Rightarrow\; K^{ab}K_{ab} - K^a_a K^b_b - {}^{(3)}R = 0, \\
\frac{\delta H}{\delta s_a} = 0 &\;\Rightarrow\; \left( K^{ab} - K^c_c h^{ab} \right)_{|a} = 0.
\end{align*}
The constraint $\delta H/\delta N = 0$ is called the Hamiltonian constraint or the energy constraint, while $\delta H/\delta s_a = 0$ is called the momentum constraint. We find (perhaps with surprise) that the constraints make the Hamiltonian itself vanish, $H = 0$.

Remark: A vanishing Hamiltonian is a characteristic feature of any theory which is invariant under arbitrary coordinate transformations. The constraint means that $H[p^{ab}, h_{ab}] = 0$ for $p^{ab}$, $h_{ab}$ that solve the equations of motion. It is important to realize that the constraint $H = 0$ does not make the theory trivial; the Hamiltonian $H[p^{ab}, h_{ab}]$ is a nontrivial functional of the canonical variables $p^{ab}$, $h_{ab}$ and can be used to derive the equations of motion.
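The determinant relation $\det g_{\mu\nu} = N^2 \det h_{ab}$ from the Calculation in this section can be spot-checked numerically. The sketch below builds the $4 \times 4$ matrix with entries $g_{00} = N^2 + h(s,s)$, $g_{0a} = s_a$, $g_{ab} = h_{ab}$ for arbitrarily chosen sample values (the numbers, and the helper names `det` and `adm_metric`, are illustrative assumptions, not taken from the notes) and compares the two determinants.

```python
def det(m):
    """Determinant by Laplace expansion along the first row (adequate for 4x4)."""
    if len(m) == 1:
        return m[0][0]
    total = 0.0
    for j, x in enumerate(m[0]):
        minor = [row[:j] + row[j + 1:] for row in m[1:]]
        total += (-1) ** j * x * det(minor)
    return total

def adm_metric(N, s_up, h):
    """Matrix of g in the basis {d_t, e_a}, built from lapse N, shift s^a and 3-metric h_ab."""
    s_low = [sum(h[a][b] * s_up[b] for b in range(3)) for a in range(3)]
    h_ss = sum(s_low[a] * s_up[a] for a in range(3))
    g = [[N ** 2 + h_ss] + s_low]
    for a in range(3):
        g.append([s_low[a]] + h[a][:])
    return g

# Sample data: a symmetric, negative-definite h_ab, as appropriate for signature (+,-,-,-).
N = 1.7
s_up = [0.4, -0.3, 0.2]
h = [[-2.0, 0.3, 0.1],
     [0.3, -1.5, -0.2],
     [0.1, -0.2, -1.0]]

g = adm_metric(N, s_up, h)
lhs = det(g)
rhs = N ** 2 * det(h)
print(lhs, rhs)
```

The identity is purely algebraic, so it holds for any symmetric $h_{ab}$ and any shift; the negative-definite sample only mirrors the signature convention used here.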


5.3 Quantum cosmology

Additional literature: [35].

The Hamiltonian description of General Relativity was developed to quantize that theory. We shall now describe the approach to quantization of gravity due to J. Wheeler and B. DeWitt. So far it was impossible to develop a full quantum theory of gravity based on the Hamiltonian description presented above. Technical difficulties are too great to derive in detail, for example, the quantum state that corresponds to a Schwarzschild spacetime. Therefore this approach to quantization of gravity remains formal.$^5$ However, as we shall now see, some interesting qualitative results can be obtained in the Wheeler-DeWitt approach.

$^5$ Currently, a more promising approach to canonical quantization of General Relativity is a Hamiltonian approach based on a different choice of generalized coordinates, called Ashtekar variables. The theory based on these variables is called loop quantum gravity and is beyond the scope of this book.

5.3.1 Wave function of the universe

According to the general scheme of canonical quantization, we need to replace the coordinates $q$, $p$ by operators $\hat q$, $\hat p$ satisfying the standard equal-time commutation relations. In the case of gravitation, we shall therefore write
\[ \left[ \hat p_{mn}(t, x^a), \hat h^{cd}(t, x^b) \right] = -i\, \delta^c_m \delta^d_n\, \delta^{(3)}(x^a - x^b). \]
It is easier to visualize the quantum theory in the Schrödinger picture. The quantum state in the coordinate representation is a wave function $\psi(t; q_i)$ of time $t$ and the generalized coordinate $q_i$; the operator $\hat q_i$ acts as multiplication by $q_i$, and $\hat p_i = -i\, \partial/\partial q_i$. In General Relativity, the role of $q_i$ is played by the 3-metric $h_{ab}(x^c)$, stripped of the dependence on time. Therefore, the wave function is a functional $\Psi[t; h_{ab}(x^c)]$. Heuristically, one might expect that $|\Psi[t; h_{ab}]|^2$ is the probability of observing the metric $h_{ab}(x^c)$ at time $t$. (Below, we shall see that the interpretation of this wave function is not quite as straightforward as it may appear.)

We may also consider other matter fields $\phi_j(t, x^a)$ coupled to gravity. In the Hamiltonian formalism, these fields will be described by generalized coordinates $\phi_j(x^a)$ and the corresponding canonical momenta $p_j(x^a)$. Thus, the wave function of the full theory will be a functional of $h_{ab}$ and $\phi_j$. Such a functional $\Psi[t; h_{ab}(x^c), \phi_j(x^c)]$ is called the wave function of the universe because it describes (in principle) all the possible processes anywhere in the universe.

The space consisting of all the possible gravity and matter field configurations $\{h_{ab}(x^c), \phi_j(x^c)\}$ is called the superspace.$^6$ Thus, the wave function of the universe is a (complex-valued) function on superspace.

$^6$ Note that a point of superspace is a configuration $\{h_{ab}(x^c), \phi_j(x^c)\}$ which contains functions only of 3-dimensional coordinates $x^c$, not of time $t$. This can be visualized as an instantaneous field configuration on a 3-surface of constant time.

According to the standard rules of quantum mechanics, the wave function of the universe satisfies the Schrödinger equation,
\begin{align*}
i \frac{\partial}{\partial t} \Psi[t; h_{ab}, \phi_j] &= \hat H \Psi[t; h_{ab}, \phi_j] \\
&\equiv H(\hat p^{ab}, \hat h_{ab}; \hat p_j, \hat\phi_j)\, \Psi[t; h_{ab}, \phi_j] \\
&= H\left( -i\frac{\delta}{\delta h_{ab}(x^c)}, h_{ab}(x^c); -i\frac{\delta}{\delta \phi_j(x^c)}, \phi_j(x^c) \right) \Psi[t; h_{ab}, \phi_j].
\end{align*}
Here, the operator $\hat H$ is the total Hamiltonian of the system, which is itself a functional of all the canonical variables. For instance, the Hamiltonian for pure gravity is given by Eq. (5.41). Note that the operators $\hat p^{ab}$ and $\hat p_j$ are replaced by functional derivatives with respect to $h_{ab}$ and $\phi_j$.

5.3.2 Wheeler-DeWitt equation

Since classical General Relativity is a constrained theory, the constraints must be passed on to the quantum theory as well. A comprehensive treatment of quantization for constrained systems is beyond the scope of these lectures; here we shall adopt the simplistic point of view that the constraint equations, which are of the form $C(q_i, p_i) = 0$, should be made operator-valued and imposed as operators on physical states:
\[ C(\hat q_i, \hat p_i)\, \psi(q_i) = 0. \]
In other words, quantum states satisfying the constraints are the only valid physical states of the system. It follows that the Hamiltonian constraint, $H = 0$, is translated to the restriction
\[ \hat H \Psi[t; h_{ab}, \phi_j] = 0. \tag{5.42} \]
Then the Schrödinger equation yields
\[ i \frac{\partial}{\partial t} \Psi[t; h_{ab}, \phi_j] = 0. \]
Hence, the wave function of the universe is time-independent and satisfies the Hamiltonian constraint (5.42), which is called in this context the Wheeler-DeWitt (WD) equation. The time independence of the wave function may appear to be a puzzling feature of the theory. However, it is a necessary feature: In a generally covariant theory, the time parameter $t$ is an arbitrary label on events in spacetime, and physically meaningful probabilities must be defined in terms of coordinate-independent functionals of $h_{ab}$ and $\phi_j$. We shall consider the interpretation of $\Psi[t; h_{ab}, \phi_j]$ in the next section.

Although we wrote the equations for quantized General Relativity, it remains difficult to extract any tangible results from these equations. Quite apart from the problem of solving the highly complicated equation (5.42), there is the question of the operator ordering: The Hamiltonian $H$ is a nonlinear function of $h_{ab}$ and $p^{ab}$, containing terms such as $h_{ab} h_{cd} p^{ac} p^{bd}$, hence it is unclear how to order these noncommuting operators in the quantum Hamiltonian $\hat H$. The quantum Hamiltonian is thus not a well-defined operator unless we adopt a particular prescription for the operator ordering. To determine the correct operator ordering, one needs to compute some predictions of the quantum theory with one or another operator ordering and to compare these predictions with experiments or with other known results. Such computations are generally too difficult, and thus the Wheeler-DeWitt equation (5.42) remains, in its full generality, a formula without application. (However, below we shall see that the Wheeler-DeWitt equation can be transformed into a differential equation and solved, if one restricts gravity and matter fields to spatially homogeneous configurations.)
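The operator-ordering ambiguity mentioned above can be illustrated with a one-dimensional toy model, which is an assumption made here for illustration and is not the gravitational Hamiltonian itself. The classical product $q^2 p^2$ is quantized with $\hat q$ acting as multiplication by $x$ and $\hat p = -i\, d/dx$ (units with $\hbar = 1$). Wave functions are represented as polynomial coefficient lists, and the helper names `q`, `p` and `apply` are hypothetical.

```python
def q(c):
    """Multiply a polynomial (coefficient list, c[k] multiplies x**k) by x."""
    return [0] + list(c)

def p(c):
    """Apply -i d/dx to a polynomial given by its coefficient list."""
    return [-1j * k * c[k] for k in range(1, len(c))] or [0]

def apply(ops, c):
    """Apply a product of operators to c; the rightmost operator acts first."""
    for op in reversed(ops):
        c = op(c)
    return c

f = [0, 0, 0, 1]                    # the test function x**3

ordering_A = apply([q, q, p, p], f)  # q q p p
ordering_B = apply([p, q, q, p], f)  # p q q p
ordering_C = apply([p, p, q, q], f)  # p p q q

# Each result is a multiple of x**3; the coefficients differ between orderings.
print(ordering_A[3], ordering_B[3], ordering_C[3])
```

All three operators reduce to the same classical expression $q^2 p^2$, yet they act differently on the same state, which is why a prescription for the ordering has to be supplied as extra input to the quantum theory.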


5.3.3 Interpretation of the wave function

Even if one somehow computes the wave function of the universe $\Psi$, its interpretation is nontrivial, mainly because the functional $\Psi[h_{ab}, \phi_j]$ is independent of the time parameter $t$. This fact, however, does not mean that the theory always describes a static universe! A physically meaningful "time" must be defined not as the value of the parameter $t$, which can be changed by a coordinate transformation, but through some physical process (a "clock"). Since the wave function of the universe contains information about all the processes through its dependence on $h_{ab}$ and $\phi_j$, the theory can describe an evolving universe if an appropriate "clock process" could be found.

It is important to realize that the clock process must be realized by an essentially classical rather than by an essentially quantum physical system. In the quantum language, the generalized coordinate describing the clock process must exhibit very small quantum fluctuations around a large expectation value. If the universe were in a quantum state in which quantum fluctuations of every variable (including the metric) are significant, one could not expect to observe anything resembling a flow of time. Therefore, even the possibility of an interpretation of the wave function of the universe depends on the existence of (nearly) classical systems in the universe.

Suppose that a nearly classical subsystem exists and is described by a variable $\phi_c$. According to a standard result of quantum mechanics, the wave function of the subsystem is of the semiclassical (WKB) type,
\[ \psi(\phi_c) \propto \exp\left( i S_{cl}(\phi_c) \right), \]
where $S_{cl}(\phi_c)$ is the value of the classical action on a classical trajectory $\phi(t)$, expressed as a function of the value of the variable $\phi_c$ at a final time,
\[ S_{cl}(\phi_c) = \int_{t_0}^{t} L_c(\phi, \dot\phi)\, dt, \qquad \phi(t) = \phi_c. \]
The total wave function of the universe is a product of the semiclassical part $\psi(\phi_c)$ and the quantum part $\psi_q$,
\[ \Psi[h_{ab}, \phi_j] = \exp\left( i S_{cl}(\phi_c) \right) \psi_q[\phi_c; h_{ab}, \phi_j]. \]
Let us now see how the variable $\phi_c$ can be used as a clock process. The trajectory $\phi_c(t)$ of the classical subsystem is, at least locally, a monotonic function of $t$. Therefore, the values of $\phi_c$ can be used as "time" and then the quantum part of the wave function, $\psi_q[\phi_c; h_{ab}, \phi_j]$, becomes a "time-dependent" wave function for the quantum variables $h_{ab}$, $\phi_j$. One can show (see e.g. [35]) that a Schrödinger-type equation holds for this wave function, the variable $\phi_c$ playing the role of time, if the total wave function satisfies the WD equation.

5.3.4 Minisuperspace

The WD equation is a functional differential equation for a functional on a superspace and, as such, is too complicated to be solved in general. One can obtain specific results from the WD equation if one simplifies the problem sufficiently drastically.

Note that one of the main applications of General Relativity is cosmology where one considers only the gross features of the universe, that is, fields averaged over astronomically large scales. Astronomical observations offer strong evidence that the universe around us is extremely homogeneous on large scales. Therefore, in cosmology one usually assumes that the 3-surfaces of constant time are spatially homogeneous. In other words, the 3-metric $h_{ab}(t)$ and the fields $\phi_j(t)$ depend only on the time $t$ and are independent of the spatial coordinates $x^c$. Hence, we are motivated to consider the theory of quantized gravity with spatially homogeneous metric and fields. The corresponding simplification of the WD equation consists of reducing the infinite-dimensional superspace to a finite-dimensional space containing 3-metrics $h_{ab}$ and field configurations $\phi_j$ that are spatially homogeneous, i.e. independent of the 3-coordinates $x^c$. The reduced superspace is called minisuperspace. The WD equation in minisuperspace becomes a (partial) differential equation for a wave function $\Psi(h_{ab}, \phi_j)$. The analysis of this simplified WD equation is the subject of quantum cosmology.

To be specific, let us consider a model of classical spacetime whose equal-time surfaces are homogeneous 3-spheres $S^3$ (hence, a closed universe!) with a fixed 3-metric $\gamma$. The spacetime metric is
\[ g = N^2(t)\, dt \otimes dt - a^2(t)\, \gamma, \]
where $N(t)$ is the lapse function and $a(t)$ is an unknown function of time called the scale factor. Since the lapse function is nondynamical, the scale factor is the only physical variable in the gravitational sector of the model. Further, suppose there exists a scalar field $\phi(t)$ which is, again, independent of the spatial coordinates $x^c$. Thus the minisuperspace is two-dimensional and consists of two variables, $a$ and $\phi$. We shall now quantize this cosmological model using the WD equation.

We need to express the classical Hamiltonian (5.41) for gravity through the variable $a(t)$, and add the Hamiltonian for the scalar field $\phi$,
\[ H_\phi = \int_{t=\mathrm{const}} d^3x^c\, N \sqrt{h} \left( \frac{1}{2} p_\phi^2 + h^{ab} \phi_{,a} \phi_{,b} + V(\phi) \right), \]
where $p_\phi$ is the canonical momentum for $\phi$, and
\[ V(\phi) \equiv V_0 + \frac{1}{2} m^2 \phi^2 + O(\phi^3) \]
is a potential describing the vacuum energy density $V_0$, the mass $m$, and a possible self-interaction of the field $\phi$. The partial metric is $h = -a^2 \gamma$, so $\sqrt{h} = a^3 \sqrt{\gamma}$. Since the field $\phi$ is spatially homogeneous, we may integrate over 3-spheres, assuming that the metric $\gamma$ is normalized so that
\[ \int_{t=\mathrm{const}} d^3x^c \sqrt{\gamma} = \Omega_3 = 2\pi^2, \]
where
\[ \Omega_n \equiv \frac{2\pi^{(n+1)/2}}{\Gamma\left( (n+1)/2 \right)} \]
is the $n$-volume of a unit $n$-sphere $S^n$. (This normalization of the 3-metric $\gamma$ means that the radius of the 3-sphere is equal to 1.) Thus we obtain
\[ H_\phi = N a^3 \Omega_3 \left( \frac{1}{2} p_\phi^2 + V(\phi) \right). \]
Returning to the gravitational sector of the model, we need to compute the 3-curvature scalar ${}^{(3)}R$ and the extrinsic curvature tensor $K_{ab}$. The 3-curvature ${}^{(3)}R$ depends only on the intrinsic geometry of the 3-surface $t = \mathrm{const}$, which is a 3-sphere


$S^3$ with the natural metric and radius $a(t)$, and thus a space of constant curvature; the value of the curvature is $a^{-1}(t)$. The 3-dimensional Riemann tensor for a space of constant curvature was computed in Sec. 1.10.3, hence
\[ {}^{(3)}R(a,b,c,d) = a^{-2} \left( h(a,c)h(b,d) - h(a,d)h(b,c) \right). \]
Using
\[ {}^{(3)}\mathrm{Tr}_{(a,b)}\, h(a,b) = 3, \]
we find
\[ {}^{(3)}R = {}^{(3)}\mathrm{Tr}_{(a,c)(b,d)}\, {}^{(3)}R(a,b,c,d) = 6a^{-2}. \]
It remains to compute the extrinsic curvature. We may introduce local coordinates $x^a$ on a surface $t = \mathrm{const}$; however, the explicit form of these coordinates will not be necessary for the present calculation. It is sufficient to note that the basis 3-vectors $e_a \equiv \partial/\partial x^a$ commute with each other and with $\partial_t$. Hence, from Eq. (1.43) we get
\[ g(\nabla_{e_b} \partial_t, e_c) = \frac{1}{2} \partial_t\, g(e_b, e_c) = -a \dot a\, \gamma(e_b, e_c) = \frac{\dot a}{a} h_{bc}. \]
Since the normal vector $n$ to the 3-surfaces $t = \mathrm{const}$ is $n = N^{-1} \partial_t$, the extrinsic curvature tensor is
\[ K(x,y) \equiv g(\nabla_x n, y) = N^{-1} g(\nabla_x \partial_t, y) = N^{-1} \frac{\dot a}{a} h(x,y). \]
Then we determine the relevant combination of the traces of $K$,
\[ K_{bc}K^{bc} - K^b_b K^c_c = \left( \frac{\dot a}{N a} \right)^2 \left( h_{bc}h^{bc} - h^b_b h^c_c \right) = -6 \left( \frac{\dot a}{N a} \right)^2. \]
The gravitational Lagrangian becomes
\begin{align*}
L_G &= \int_{t=\mathrm{const}} d^3x\, \frac{N \sqrt{h}}{16\pi G} \left( {}^{(3)}R + K_{ab}K^{ab} - K^a_a K^b_b \right) \\
&= \int_{t=\mathrm{const}} d^3x\, \frac{N \sqrt{h}}{16\pi G} \left( \frac{6}{a^2} - 6 \left( \frac{\dot a}{N a} \right)^2 \right).
\end{align*}
We can now integrate over the 3-volume of the 3-sphere $t = \mathrm{const}$,
\[ \int_{t=\mathrm{const}} d^3x \sqrt{h} = a^3 \Omega_3, \]
because all the terms are spatially homogeneous. Therefore,
\[ L = \frac{6 N a^3 \Omega_3}{16\pi G} \left( \frac{1}{a^2} - \frac{\dot a^2}{N^2 a^2} \right) = \frac{3\pi}{4G} N a \left( 1 - \frac{\dot a^2}{N^2} \right). \]
To compute the Hamiltonian, it remains to determine the canonical momentum $p_a$,
\[ p_a = \frac{\partial L}{\partial \dot a} = -\frac{3\pi}{2G} \frac{a \dot a}{N}. \]
Hence,
\[ \dot a = -\frac{2G}{3\pi} \frac{N}{a} p_a, \]
and the gravitational Hamiltonian is
\begin{align*}
H_G(p_a, a) &= p_a \dot a - L_G \\
&= -\frac{3\pi}{4G} N a + \frac{N}{a} p_a^2 \left( -\frac{2G}{3\pi} + \frac{3\pi}{4G} \frac{4G^2}{9\pi^2} \right) \\
&= -\frac{3\pi}{4G} N a - \frac{G}{3\pi} \frac{N}{a} p_a^2.
\end{align*}
Finally, we can write the total Hamiltonian for the minisuperspace theory,
\begin{align*}
H(p_a, a, p_\phi, \phi) &= H_G + H_\phi \\
&= \frac{N}{a} \left[ -\frac{G}{3\pi} p_a^2 - \frac{3\pi}{4G} a^2 + a^4 \Omega_3 \left( \frac{1}{2} p_\phi^2 + V(\phi) \right) \right].
\end{align*}
Therefore, the wave function of the mini-universe, $\Psi(a, \phi)$, satisfies the WD equation
\begin{align*}
&\left[ -\frac{G}{3\pi} \hat p_a^2 - \frac{3\pi}{4G} a^2 + 2\pi^2 a^4 \left( \frac{1}{2} \hat p_\phi^2 + V(\phi) \right) \right] \Psi \\
&\quad = \left[ \frac{G}{3\pi} \hbar^2 \partial_a \partial_a - \frac{3\pi}{4G} a^2 + \pi^2 a^4 \left( -\hbar^2 \partial_\phi^2 + 2V(\phi) \right) \right] \Psi = 0.
\end{align*}
This equation needs to be supplemented by boundary conditions. However, the choice of the boundary conditions is a subtle problem and we shall not discuss this issue here. In the present case, we shall simply choose the solution of the WD equation that will have a clear physical interpretation.

It is reasonable to suppose that the scale factor $a$ has a semiclassical regime where it is the large radius of the 3-sphere $t = \mathrm{const}$. (Our universe is large.) Therefore, in the semiclassical regime the value of $a$ can be used as the clock variable. We can then express the wave function in the semiclassical form with respect to the variable $a$,
\[ \Psi(a, \phi) = \exp\left( i \hbar^{-1} S(a) \right) \psi(a, \phi), \]
where the factor $\psi(a, \phi)$ is assumed to be a slowly varying function of $a$ but a quickly varying function of the quantum variable $\phi$, namely
\[ \hbar \frac{\partial \psi}{\partial a} \ll \frac{\partial \psi}{\partial \phi}. \]
Then we can expand the WD equation in powers of $\hbar$, substituting the potential $V(\phi)$,
\begin{align*}
&-\frac{G}{3\pi} \left[ S'^2 + \frac{9\pi^2 a^2}{4G^2} - \frac{6\pi^3}{G} a^4 V_0 \right] \psi \tag{5.43} \\
&\quad + \frac{i G \hbar}{3\pi} \left( S'' \psi + 2 S' \partial_a \psi \right) + \pi^2 a^4 \left( -\hbar^2 \partial_\phi^2 + m^2 \phi^2 \right) \psi = 0. \tag{5.44}
\end{align*}
The first line (5.43) above is of zeroth order in $\hbar$ and thus must be satisfied separately. This is an equation determining the behavior of the metric variable $a$. Its solution is
\[ S(a) = \int da\, \sqrt{ \frac{6\pi^3}{G} a^4 V_0 - \frac{9\pi^2 a^2}{4G^2} }. \]
To visualize the physical interpretation of this solution, we note that Eq. (5.43) is formally analogous to the semiclassical form of the stationary Schrödinger equation for a one-dimensional particle in a potential
\[ U(a) \equiv \frac{9\pi^2 a^2}{8G^2} - \frac{3\pi^3}{G} a^4 V_0. \]
This particle can tunnel from the initial state at $a = 0$ to the value
\[ a = a_0 \equiv \sqrt{\frac{3}{8\pi G V_0}}. \]
The regime $a > a_0$ corresponds to a classical motion of the particle with the velocity $da/d\tau = \sqrt{-2U(a)}$, while for $0 < a < a_0$ the motion is classically forbidden since $U(a) > 0$. The tunneling process is interpreted as the creation of a closed universe


having the initial size $a_0$. Since the assumed initial state $a = 0$ corresponds to a sphere of zero radius or, in other words, to the absence of space, this process is also called creation of a universe from nothing.
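The tunneling picture can be checked numerically. The sketch below assumes units with $G = 1$ and an arbitrary illustrative value of $V_0$ (both are assumptions), and uses the expressions $U(a) = 9\pi^2 a^2/(8G^2) - 3\pi^3 a^4 V_0/G$ and $S'^2 = 6\pi^3 a^4 V_0/G - 9\pi^2 a^2/(4G^2)$ as read from Eq. (5.43) and the definition of $U(a)$. It confirms that $U$ vanishes at $a_0 = \sqrt{3/(8\pi G V_0)}$, is positive for $0 < a < a_0$, is negative for $a > a_0$, and satisfies $S'^2 = -2U$.

```python
import math

G = 1.0    # units with G = 1 (assumption)
V0 = 0.01  # illustrative vacuum energy density (assumption)

def U(a):
    """Effective potential of the one-dimensional 'particle' analogy."""
    return 9 * math.pi ** 2 * a ** 2 / (8 * G ** 2) - 3 * math.pi ** 3 * a ** 4 * V0 / G

def S_prime_squared(a):
    """S'(a)^2 obtained by setting the bracket in Eq. (5.43) to zero."""
    return 6 * math.pi ** 3 * a ** 4 * V0 / G - 9 * math.pi ** 2 * a ** 2 / (4 * G ** 2)

a0 = math.sqrt(3 / (8 * math.pi * G * V0))  # turning point

for a in (0.5 * a0, a0, 2.0 * a0):
    print(a, U(a), S_prime_squared(a))
```

With these formulas, $S'$ is real only for $a \geq a_0$, which is the region where $a$ can serve as a semiclassical clock variable.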
Consider now the second part (5.44) of the WD equation. In the semiclassical regime, $S'(a)$ is a slowly changing function of $a$, so we can disregard $|S''| \ll S'^2$. Since $a$ is a well-defined semiclassical variable in the regime $a > a_0$, we may define the "time" $\tau$ according to the heuristic picture of a moving particle with the coordinate $a$. It is convenient to define
\[ d\tau = \frac{3\pi^3 a^4\, da}{2G \sqrt{-2U(a)}} = \frac{3\pi^3 a^4\, da}{2G\, S'(a)}, \]

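and express $a$ through $\tau$ in the range $a > a_0$. Then Eq. (5.44) can be rewritten as
\[ i\hbar \frac{\partial \psi}{\partial \tau} = \left( -\hbar^2 \partial_\phi^2 + m^2 \phi^2 \right) \psi \equiv \hat H_\phi \psi. \]
This is a familiar time-dependent Schrödinger equation for a quantum system with the Hamiltonian $\hat H_\phi$.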
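The operator $\hat H_\phi = -\hbar^2 \partial_\phi^2 + m^2 \phi^2$ has the form of a harmonic-oscillator Hamiltonian (up to normalization). As a quick illustration, the Gaussian $\exp(-m\phi^2/(2\hbar))$ is an eigenfunction with eigenvalue $\hbar m$, so multiplying it by $\exp(-i m \tau)$ gives a solution of the Schrödinger equation above. The sketch below checks the eigenvalue relation with a central finite difference; the numerical values of $\hbar$ and $m$ and the helper names are assumptions made for the example.

```python
import math

HBAR = 1.0  # units with hbar = 1 (assumption)
M = 2.0     # illustrative mass parameter (assumption)

def psi(phi):
    """Gaussian trial function exp(-m phi^2 / (2 hbar))."""
    return math.exp(-M * phi ** 2 / (2 * HBAR))

def H_phi(f, phi, d=1e-3):
    """(-hbar^2 d^2/dphi^2 + m^2 phi^2) f, using a central second difference."""
    second = (f(phi + d) - 2 * f(phi) + f(phi - d)) / d ** 2
    return -HBAR ** 2 * second + M ** 2 * phi ** 2 * f(phi)

# Compare H_phi psi with (hbar m) psi at a few sample points.
residuals = [abs(H_phi(psi, x) - HBAR * M * psi(x)) for x in (-1.0, 0.0, 0.5, 1.2)]
print(max(residuals))
```

The residual is limited only by the finite-difference step, so it shrinks as `d` is reduced (until rounding errors dominate).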
In this way, we find that the time-independent wave function of the universe $\Psi(a, \phi)$, when properly interpreted in the regime when one of the variables is classical, yields a familiar picture of the evolution of quantum systems with time. Note that in the classically forbidden regime $0 < a < a_0$, no time can be defined, and a classical interpretation of the wave function is impossible.

6 Tetrad methods
The tetrad formalism is covered in the books [21, 33, 36] with varying degrees of clarity and detail.

Note: The material on vector bundles is standard but needs to be expanded. Also, there should be more material on tetrad formalism: derivation of Einstein equation, as well as an explanation of tetrad formalism in terms of vector bundles (making geometric sense of tetrad indices, covariant exterior differential, etc.).

6.1 Tetrad formalism

In the standard approach, General Relativity is formulated as a theory of the metric. In other words, the main variable is the metric tensor $g$, while the connection and the curvature are expressed through the metric. An alternative formulation is through orthonormal bases of vector fields (called tetrads). This formulation is convenient for many calculations, and also serves to connect gravity with spinor field theory. Some approaches to quantum gravity are based on the tetrad formulation.

For convenience, we will restrict attention to a four-dimensional manifold having a metric with Lorentzian signature $(+ - - -)$. Generalizations to other dimensions and signatures are straightforward.

6.1.1 Tetrads

If a metric tensor $g$ on a manifold $M$ is specified, one can always choose an orthonormal basis $\{e_0(p), e_1(p), e_2(p), e_3(p)\}$ in each tangent space $T_pM$. One of the tetrad vectors must be timelike and the other three must be spacelike (because the metric $g$ has a Lorentzian signature). It is a useful convention to choose the vector $e_0$ timelike and the other three spacelike, so that the scalar product matrix is
\[ g(e_a, e_b) = \eta_{ab} \equiv \mathrm{diag}(1, -1, -1, -1), \]
where $\eta_{ab}$ is understood as a fixed matrix (which is numerically the same as the standard metric in Minkowski spacetime). Note that the Latin indices label the basis vectors and run (for convenience) from 0 to 3. Since these indices are not tensor indices, we will indicate summations explicitly.

We called the basis $\{e_a\}$ "orthonormal" even though the norm of some vectors is equal to $-1$, as it must be since the metric has Lorentzian signature. Essentially we have chosen a set of vector fields $\{e_a(p)\}$, defined at every point $p$. Such an orthonormal set of vector fields $\{e_a(p)\}$ is called a tetrad or a vierbein in case of four-dimensional manifolds, and a frame field or vielbein in any number of dimensions.$^1$

$^1$ The German words "vierbein" and "vielbein" are pronounced approximately as the English nonwords "fear-bine" and "feel-bine" respectively. The corresponding French term is "repère," pronounced close to "rep-pair."

In a local coordinate system $\{x^\mu\}$ where the metric is specified as a set of components $g_{\mu\nu}(p)$, the components $e_a^\mu(p)$ of the tetrad may be found by the following procedure. At a fixed spacetime point $p$, consider the matrix $g_{\mu\nu}(p)$ as a bilinear form. This bilinear form may be diagonalized by a linear coordinate transformation, according to standard procedures of linear algebra. Suppose that $\{t, x, y, z\}$ are new coordinates in which the metric is diagonal at the point $p$, e.g.
\[ g_{\mu\nu}(p) = \mathrm{diag}(A, -B, -C, -D) \equiv \begin{pmatrix} A & 0 & 0 & 0 \\ 0 & -B & 0 & 0 \\ 0 & 0 & -C & 0 \\ 0 & 0 & 0 & -D \end{pmatrix} \]
with $A, B, C, D > 0$. Then it is easy to write an orthonormal basis in the tangent space $T_pM$, for instance
\[ e_0(p) = \frac{1}{\sqrt{A}} \partial_t, \quad e_1(p) = \frac{1}{\sqrt{B}} \partial_x, \quad e_2(p) = \frac{1}{\sqrt{C}} \partial_y, \quad e_3(p) = \frac{1}{\sqrt{D}} \partial_z. \]
In this way, a tetrad may be determined at every point $p$ of the manifold.

Conversely, if a tetrad $\{\mathbf{e}_a(p)\}$ is given at every point $p$ of a manifold $M$ then we may recover the metric tensor $g$ explicitly in the following way. First we need to compute the four 1-forms $\boldsymbol{\theta}^a$ ($a = 0, 1, 2, 3$) that comprise the dual basis to $\{\mathbf{e}_a\}$. Here and below we denote vectors and forms by boldface symbols. (It is convenient to write $a$ as an upper index in $\boldsymbol{\theta}^a$, even though we presently do not consider it a tensor index.)

Construction of the dual basis: We work in the tangent space $T_pM$ at a point $p$ of a manifold $M$. Since $\{\mathbf{e}_a(p)\}$ is a basis in $T_pM$, any vector $\mathbf{v} \in T_pM$ is uniquely decomposed as a linear combination
\[ \mathbf{v} = \sum_a \lambda^a \mathbf{e}_a, \]
where $\lambda^a$ are some numbers. When the basis $\{\mathbf{e}_a\}$ is fixed, the numbers $\lambda^a$ are linear functions of $\mathbf{v}$, and thus are 1-forms which we denote $\boldsymbol{\theta}^a$; in other words, $\lambda^a \equiv \boldsymbol{\theta}^a(\mathbf{v})$. These 1-forms $\boldsymbol{\theta}^a$ yield the components of a vector in the basis $\{\mathbf{e}_a\}$, so that
\[ \mathbf{v} = \sum_a \boldsymbol{\theta}^a(\mathbf{v})\, \mathbf{e}_a. \]
This relationship holds for any vector $\mathbf{v}$; sometimes this is written as a decomposition of the identity operator,
\[ \hat 1 = \sum_a \mathbf{e}_a \otimes \boldsymbol{\theta}^a. \tag{6.1} \]
The set of 1-forms $\{\boldsymbol{\theta}^a\}$ is called the dual tetrad and is a basis in the dual tangent space $T^*M(p)$. Sometimes one calls $\boldsymbol{\theta}^a$ simply "the tetrad" for brevity.

In any case, it is straightforward to pass from $\{\mathbf{e}_a\}$ to $\{\boldsymbol{\theta}^a\}$ and back. In a local coordinate system, the dual basis has components $\theta^a_\mu$ which comprise a $4 \times 4$ matrix satisfying
\[ \sum_a \theta^a_\mu v^\mu e_a^\nu = v^\nu \]


for any $v^\mu$. Therefore,
\[ \sum_a \theta^a_\mu e_a^\nu = \delta^\nu_\mu, \]
so the matrix of components $\theta^a_\mu$ can be computed as the inverse matrix to $e_a^\mu$. ∎

Once the 1-forms $\theta^a$, $a = 0, 1, 2, 3$, are computed, the metric tensor $g$ is expressed as
\[ g(u, v) = g\Big( \sum_a \theta^a(u)\, e_a,\ \sum_b \theta^b(v)\, e_b \Big) = \sum_{a,b} \eta_{ab}\, \theta^a(u)\, \theta^b(v). \tag{6.2} \]
Rewritten in a more condensed notation, the above formula becomes
\[ g = \sum_{a,b} \eta_{ab}\, \theta^a \otimes \theta^b. \]
In components,
\[ g_{\mu\nu} = \sum_{a,b} \eta_{ab}\, \theta^a_\mu \theta^b_\nu. \]
This is an explicit formula that expresses the metric tensor $g$ through the dual tetrad. Analogously, the inverse metric $g^{-1}$ is expressed through the tetrad vectors $\{e_a\}$ as
\[ g^{-1} = \sum_{a,b} \eta^{ab}\, e_a \otimes e_b. \]
Thus, in order to define a metric structure on a manifold, one may specify a tetrad $\{e_a\}$ or alternatively a dual tetrad $\{\theta^a\}$ instead of specifying the metric tensor $g$.

Remarks:

The tetrad $\{e_a(p)\}$ can be viewed as a map from a fixed, four-dimensional spacetime $\mathbb{R}^4$ to the tangent space $T_pM$. A vector
\[ v^a \equiv \left( v^0, v^1, v^2, v^3 \right) \in \mathbb{R}^4 \]
is mapped to the tangent vector
\[ v^a \to v \equiv \sum_a v^a e_a(p) \in T_pM. \]
This map $v^a \to v$ is compatible with the scalar product in $\mathbb{R}^4$ defined by the matrix $\eta_{ab}$ and the scalar product in $T_pM$ defined by the metric $g$, namely
\[ \sum_{a,b} \eta_{ab}\, u^a v^b = g(u, v) \quad \text{if } u^a \to u,\ v^a \to v. \]
Therefore we refer to the abstract space $\mathbb{R}^4$ and the metric $\eta_{ab}$ as the fiducial Minkowski spacetime and metric. This fiducial flat spacetime can be interpreted as the instantaneous reference frame of a certain observer. This observer's four-velocity instantaneously coincides with the timelike vector $e_0(p)$ and the observer's spatial axes are chosen along the directions $e_1, e_2, e_3$.

There are infinitely many possible tetrads corresponding to one and the same metric $g$. A given tetrad $\{e_a\}$ can be transformed via a Lorentz rotation,
\[ e_a(p) \to \tilde e_a(p) \equiv \sum_b \Lambda^b{}_a(p)\, e_b(p), \]
without changing the metric $g$. Here, $\Lambda^b{}_a(p)$ is a matrix representing a Lorentz transformation in the fiducial Minkowski spacetime. This matrix $\Lambda^b{}_a(p)$ may be chosen arbitrarily at every point $p$ and thus is called a local Lorentz transformation. Conversely, any two orthonormal bases $\{e_a\}$ and $\{\tilde e_a\}$ are connected by a Lorentz transformation. Therefore, the local Lorentz transformations $\Lambda^b{}_a(p)$ represent the entire freedom of choosing the tetrad field for a fixed metric $g$.

A tetrad is essentially a set of four vector fields which must be linearly independent at every point. Since it is impossible to have globally nonzero vector fields on some manifolds (e.g. on a 2-sphere), the tetrad fields $\{e_a(p)\}$ might have to be defined only locally. In other words, different tetrad fields will have to be defined within different charts covering the manifold.

It is clear that the 1-forms $\theta^a$ can be expressed through $e_a$ and the metric as follows,
\[ \theta^a(v) \equiv \theta^a \circ v = \eta^{aa}\, g(e_a, v) = \sum_b \eta^{ab}\, g(e_b, v), \]
where $v$ is an arbitrary vector. In the notation of Sec. 1.5.5, we may rewrite this as
\[ \theta^a = \eta^{aa}\, g e_a = \sum_b \eta^{ab}\, g e_b. \]
(There is no summation over $a$ in the expressions $\eta^{aa} g e_a$ above.) Often it is convenient to raise and lower the Latin (tetrad) indices using the fiducial Minkowski metric $\eta_{ab}$. Thus, one defines
\[ \theta_a \equiv \eta_{aa}\, \theta^a = \sum_b \eta_{ab}\, \theta^b, \tag{6.3} \]
and writes equations such as
\[ g(e_a, v) = \theta_a(v). \tag{6.4} \]
However, we stress that the dual tetrad $\{\theta^a\}$ can be calculated from $\{e_a\}$ without knowledge of the metric tensor $g$.

The dual tetrad $\{\theta^a\}$ is orthonormal with respect to the inverse metric,
\[ g^{-1}(\theta^a, \theta^b) = \eta^{ab}. \]
Thus $\{\theta^a(p)\}$ is an orthonormal basis in the cotangent space $T^*M(p)$.

A note on terminology: Sometimes the frame basis $\{e_a\}$ or the dual basis $\{\theta^a\}$ are called nonholonomic bases, while the coordinate basis $\{\partial/\partial x^\mu\}$ is called holonomic. This terminology will not be used in the present text; instead we call $\{e_a\}$ an orthonormal frame basis. The term "holonomic" seems to come from mechanics textbooks. In theoretical mechanics, there is a notion of holonomic and nonholonomic constraint equations. A constraint equation is a relation of the form
\[ \sum_j A_j(q)\, \frac{dq^j}{dt} = 0, \]
where $q \equiv \{q^j\} \in \mathbb{R}^n$ is a time-dependent vector of generalized coordinates and $A_j(q)$ are some coefficients. A

6.1 Tetrad formalism

constraint equation is called holonomic if it can be rewritten as a total time derivative of a function, i.e. if there exists a function $\Phi(q)$ such that the constraint equation is simply $d\Phi(q)/dt = 0$, in other words if
\[ A_j = \frac{\partial \Phi(q)}{\partial q^j}, \quad j = 1, \dots, n. \]
We can reformulate these statements more transparently by introducing the auxiliary 1-form $A \equiv \sum_j A_j\, dq^j$. Then the constraint equation is rewritten as $A \circ \dot q = 0$, where $\dot q$ is the tangent vector to the trajectory $q(t)$. The constraint is holonomic if there exists a scalar function $\Phi(q)$ such that $A = d\Phi$; in the standard terminology, the constraint is holonomic when the 1-form $A$ is exact (equal to a differential of something). So one might call a basis $\{\theta^a\}$ holonomic if there exist functions $x^a$ such that $\theta^a = dx^a$, and nonholonomic otherwise. An orthonormal basis is holonomic only if the space is flat and $\{x^a\}$ are Cartesian coordinates; in all other cases one has $d\theta^a \neq 0$ at least for some $a$, so the basis is nonholonomic. Similarly, the orthonormal basis vectors $e_a$ usually cannot be represented as partial derivatives with respect to some coordinates; accordingly, some commutators $[e_a, e_b]$ are nonzero. Thus one calls such $\{e_a\}$ a nonholonomic basis. However, what is important for our considerations is the orthonormality of the chosen frame basis $\{e_a\}$ with respect to a given metric $g$. If we called $\{e_a\}$ a nonholonomic basis, we might create the impression that we selected some nonholonomic basis, whether orthonormal or not. However, an arbitrary non-orthonormal, nonholonomic basis would be much less useful in calculations than an orthonormal basis. For this reason, we call $\{e_a\}$ an orthonormal frame basis (or a vielbein) rather than a nonholonomic basis. ∎

6.1.2 Examples

Example 1: Consider the Schwarzschild metric (1.36). The task is to determine a tetrad basis and a dual tetrad basis. Since the metric is in a diagonal form, a possible choice of the tetrad is
\[ e_0 = \left( 1 - \frac{2M}{r} \right)^{-1/2} \partial_t, \quad e_1 = \left( 1 - \frac{2M}{r} \right)^{1/2} \partial_r, \quad e_2 = \frac{1}{r}\, \partial_\theta, \quad e_3 = \frac{1}{r \sin\theta}\, \partial_\phi. \]
Note that the tetrad is undefined (singular) if $r = 0$, $r = 2M$, or $\sin\theta = 0$ because at these locations the coordinate system $(t, r, \theta, \phi)$ and/or the metric $g$ are singular. This is an example of a tetrad defined only locally, that is, only within a certain limited coordinate patch, rather than globally in the entire spacetime. More than one coordinate patch must be used to cover the entire Schwarzschild spacetime.

The dual basis $\{\theta^a\}$ corresponding to the tetrad $\{e_a\}$ is
\[ \theta^0 = \left( 1 - \frac{2M}{r} \right)^{1/2} dt, \quad \theta^1 = \left( 1 - \frac{2M}{r} \right)^{-1/2} dr, \quad \theta^2 = r\, d\theta, \quad \theta^3 = r \sin\theta\, d\phi. \]
∎

Example 2: Suppose now that we are given a two-dimensional spacetime with local coordinates $\{t, x\}$ and the frame fields
\[ e_0 = \partial_t, \quad e_1 = e^{-Ht}\, \partial_x. \]
The task is to determine the metric tensor $g$ such that this frame is orthonormal.

We can recover the metric by first determining the dual frame,
\[ \theta^0 = dt, \quad \theta^1 = e^{Ht}\, dx, \]
and then writing
\[ g = \sum_{a,b} \eta_{ab}\, \theta^a \otimes \theta^b = dt \otimes dt - e^{2Ht}\, dx \otimes dx \equiv dt^2 - e^{2Ht}\, dx^2. \]
This metric is the two-dimensional version of de Sitter metric (1.37). ∎

6.1.3 Hodge duality

Using the metric $g$, one determines the Levi-Civita tensor $\varepsilon$, and then one can establish a one-to-one linear map between $n$-forms and $(4-n)$-forms. This map is called the Hodge duality operation. Since this operation is used relatively little in the present text (it rarely yields a computational advantage), I only sketch the construction on examples and list some basic properties of the Hodge duality.²

We work with a four-dimensional manifold where the metric $g$ and the Levi-Civita tensor $\varepsilon$ are known. First consider a 1-form $\omega$. The Hodge duality map yields a 3-form $\ast\omega$ called the Hodge dual to $\omega$; this operation is also called the Hodge star. The 3-form $\ast\omega$ is defined by its action on arbitrary vectors $x, y, z$:
\[ (\ast\omega)(x, y, z) \equiv \varepsilon(g^{-1}\omega, x, y, z). \]
Somewhat more awkwardly, this definition can be rewritten as
\[ \ast\omega = \iota_{g^{-1}\omega}\, \varepsilon. \]
It is clear that $\omega \mapsto \ast\omega$ is a linear map in $\omega$ (if we regard the metric $g$ and the tensor $\varepsilon$ as fixed).

Example: Consider a four-dimensional Minkowski space with the orthonormal basis of 1-forms $\{dt, dx, dy, dz\}$. The Levi-Civita tensor is
\[ \varepsilon = dt \wedge dx \wedge dy \wedge dz. \]
Let us compute the Hodge star applied to the 1-form $dt$. Using the vector $g^{-1} dt = \partial_t$, we find
\[ \ast(dt) = \iota_{\partial_t}\, \varepsilon = dx \wedge dy \wedge dz. \]
Similarly, we have
\[ \ast(dx) = -\iota_{\partial_x}\, \varepsilon = dt \wedge dy \wedge dz. \]
Heuristically, $\ast\omega$ must contain the "complement" of $\omega$, so that $\omega \wedge \ast\omega$ is proportional to $\varepsilon$. ∎

² I was motivated to make this excursion into the Hodge duality by reading the lecture notes [13], chapter 6, where I learned about Statement 6.1.3.2(a).

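The Hodge star examples can be checked mechanically. The following is a minimal sketch, assuming an orthonormal coframe with $\eta_{ab} = \mathrm{diag}(1,-1,-1,-1)$ and storing a form as a dictionary from increasing index tuples to coefficients. It uses the rule $\ast(\theta^{a_1} \wedge \dots \wedge \theta^{a_n}) = \eta^{a_1 a_1} \cdots \eta^{a_n a_n}\, \varepsilon(e_{a_1}, \dots, e_{a_n}, \cdot)$, which for $n = 1$ is the definition of $\ast\omega$ through the Levi-Civita tensor and for $n > 1$ anticipates the orthonormal-basis formulas of this section. The names `perm_sign`, `hodge` and `scale` are illustrative and not part of the text.

```python
from itertools import combinations

ETA = (1, -1, -1, -1)   # diagonal of the fiducial Minkowski metric eta_ab
N = len(ETA)            # dimension of the manifold


def perm_sign(seq):
    """Sign of a permutation of distinct integers, found by counting inversions."""
    inversions = sum(
        1
        for i in range(len(seq))
        for j in range(i + 1, len(seq))
        if seq[i] > seq[j]
    )
    return -1 if inversions % 2 else 1


def hodge(form):
    """Hodge star of a form stored as {increasing index tuple: coefficient}.

    The key (a1, ..., an) stands for theta^a1 ^ ... ^ theta^an;
    the empty tuple stands for the 0-form 1.
    """
    result = {}
    for idx, coeff in form.items():
        complement = tuple(k for k in range(N) if k not in idx)
        # epsilon(e_a1, ..., e_an, .) has the single component sgn(idx + complement)
        sign = perm_sign(idx + complement)
        for k in idx:
            sign *= ETA[k]          # one factor eta^{aa} per index of the form
        result[complement] = result.get(complement, 0) + sign * coeff
    return result


def scale(form, factor):
    """Multiply every coefficient of a form by a number."""
    return {idx: factor * c for idx, c in form.items()}


# Examples with index 0 = t, 1 = x, 2 = y, 3 = z:
print(hodge({(0,): 1}))     # *dt = dx ^ dy ^ dz   -> {(1, 2, 3): 1}
print(hodge({(1,): 1}))     # *dx = dt ^ dy ^ dz   -> {(0, 2, 3): 1}

# Applying the star twice to each basis n-form gives back the form
# times det(eta) * (-1)^((N - n) * n).
det_eta = 1
for s in ETA:
    det_eta *= s
for n in range(N + 1):
    for idx in combinations(range(N), n):
        basis_form = {idx: 1}
        expected = scale(basis_form, det_eta * (-1) ** ((N - n) * n))
        assert hodge(hodge(basis_form)) == expected
```

The dictionary representation only handles forms expanded in the orthonormal coframe; a form given in a coordinate basis would first have to be re-expressed through $\theta^a$.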

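Another convenient way to express $\ast\omega$ is to use an orthonormal basis $\{e_a\}$. Since
\[ g^{-1}\omega = \sum_{a,b} \eta^{ab} \left( \iota_{e_b}\omega \right) e_a, \]
we can write
\[ \ast\omega = \sum_a \eta^{aa} \left( \iota_{e_a}\omega \right) \left( \iota_{e_a}\varepsilon \right). \]
This expression can be also used as a definition of $\ast\omega$. By construction, this definition is independent of the choice of the basis $\{e_a\}$, as long as this basis is orthonormal with respect to the fixed metric $g$ and positively oriented with respect to $\varepsilon$. (We merely rewrote a basis-free definition in terms of an arbitrary orthonormal basis.) The basis independence can be made manifest by expressing $\ast\omega$ through the trace operation as follows,
\[ \ast\omega = \mathrm{Tr}_{(a,b)} \left( \iota_a \omega \right) \left( \iota_b \varepsilon \right). \]
It is clear that this can be used as a definition of $\ast\omega$ equivalent to the definitions above.

Similarly, a 2-form $\psi(a, b)$ is converted to its Hodge dual $\ast\psi$, which is also a 2-form. This 2-form can be defined using an orthonormal basis $\{e_a\}$ as follows,
\[ \ast\psi \equiv \frac{1}{2!} \sum_{a,b} \eta^{aa} \eta^{bb} \left( \iota_{e_b} \iota_{e_a} \psi \right) \left( \iota_{e_b} \iota_{e_a} \varepsilon \right). \tag{6.5} \]
As before, this definition is independent of the choice of the orthonormal basis $\{e_a\}$. This can also be seen with the following trick. Any 2-form can be decomposed into a linear combination of exterior products of 1-forms. Since the Hodge duality is a linear map, we only need to consider a single exterior product $\psi = \alpha \wedge \beta$, where $\alpha$ and $\beta$ are 1-forms. The 2-form $\ast(\alpha \wedge \beta)$ is then expressed as
\[ \left( \ast(\alpha \wedge \beta) \right)(x, y) = \varepsilon(g^{-1}\alpha, g^{-1}\beta, x, y). \]
It is straightforward to verify that this expression is equivalent to Eq. (6.5). When the definition of $\ast\psi$ is written in this basis-free manner, it becomes manifest that $\ast\psi$ depends only on the metric $g$ but not on the choice of the basis $\{e_a\}$. However, calculations using an explicit basis $\{e_a\}$ are much more convenient since one does not need to decompose all $n$-forms into linear combinations of exterior products of 1-forms. Sometimes also the index-free trace notation is convenient,
\[ \ast\psi = \frac{1}{2!}\, \mathrm{Tr}_{(a,a')(b,b')} \left( \iota_b \iota_a \psi \right) \left( \iota_{b'} \iota_{a'} \varepsilon \right). \]

So far we have seen the definitions of $\ast$ for 1-forms and 2-forms. To express the Hodge dual of an $n$-form $\omega$, one can build an expression analogous to Eq. (6.5) but with $n$ sets of vectors and the factor $1/n!$ in front:
\[ \ast\omega \equiv \frac{1}{n!} \sum_{a_1, \dots, a_n} \eta^{a_1 a_1} \cdots \eta^{a_n a_n} \left( \iota_{e_{a_n}} \cdots \iota_{e_{a_1}} \omega \right) \left( \iota_{e_{a_n}} \cdots \iota_{e_{a_1}} \varepsilon \right) = \frac{1}{n!}\, \mathrm{Tr}_{(a_1,b_1)\dots(a_n,b_n)} \left( \iota_{a_n} \cdots \iota_{a_1} \omega \right) \left( \iota_{b_n} \cdots \iota_{b_1} \varepsilon \right). \]
Here we wrote the arguments $e_{a_j}$ in the reverse order for convenience; recall that
\[ \iota_{e_{a_n}} \cdots \iota_{e_{a_1}} \omega \equiv \omega(e_{a_1}, \dots, e_{a_n}). \]
As before, this definition of $\ast\omega$ depends only on the metric $g$ and on $\varepsilon$ but not on the choice of the orthonormal basis $\{e_a\}$.

In four dimensions, the Hodge star establishes a linear map from 0-forms (scalars) to 4-forms, from 1-forms to 3-forms, etc. In particular, the 4-form $\varepsilon$ is mapped into a scalar. The numerical factor $1/n!$ leads to the relation $\ast\varepsilon = -1$:
\[ \ast\varepsilon = \frac{1}{4!} \sum_{a,b,c,d} \eta^{aa} \cdots \eta^{dd} \left( \iota_{e_d} \cdots \iota_{e_a} \varepsilon \right) \left( \iota_{e_d} \cdots \iota_{e_a} \varepsilon \right) = \eta^{00} \eta^{11} \eta^{22} \eta^{33} = \det \eta_{ab} = -1. \]

Example: The Hodge dual to a scalar function (0-form) $f$ is a 4-form $\ast f = f \varepsilon$; taking the Hodge dual again, one gets $\ast(f\varepsilon) = -f$. Therefore, $\ast\ast f = -f$ for scalar functions $f$.

Let $\{\theta^a\}$ ($a = 0, 1, 2, 3$) be an orthonormal basis of 1-forms; then we have
\[ \ast\theta^0 = \iota_{e_0} \left( \theta^0 \wedge \theta^1 \wedge \theta^2 \wedge \theta^3 \right) = \theta^1 \wedge \theta^2 \wedge \theta^3; \]
\[ \ast\theta^3 = -\iota_{e_3} \left( \theta^0 \wedge \theta^1 \wedge \theta^2 \wedge \theta^3 \right) = \theta^0 \wedge \theta^1 \wedge \theta^2; \]
\[ \ast(\theta^0 \wedge \theta^1) = -\iota_{e_1} \iota_{e_0} \left( \theta^0 \wedge \theta^1 \wedge \theta^2 \wedge \theta^3 \right) = -\theta^2 \wedge \theta^3; \]
\[ \ast(\theta^2 \wedge \theta^3) = \theta^0 \wedge \theta^1; \quad \ast(\theta^1 \wedge \theta^2 \wedge \theta^3) = \theta^0; \quad \text{etc.} \]

In four dimensions, one can derive a more convenient formula for the Hodge dual to a 4-form $\omega$ by noting that $\omega(e_a, \dots, e_d) \neq 0$ only when all the indices $a, b, c, d$ are different:
\[ \ast\omega = \frac{1}{4!} \sum_{a,b,c,d} \eta^{aa} \cdots \eta^{dd} \left( \iota_{e_d} \cdots \iota_{e_a} \omega \right) \left( \iota_{e_d} \cdots \iota_{e_a} \varepsilon \right) = (\det \eta_{ab})\, \omega(e_1, e_2, e_3, e_4) = -\omega(e_1, e_2, e_3, e_4). \]
It is clear that this formula extends from four-dimensional to $N$-dimensional space,
\[ \ast\omega = (\det \eta_{ab})\, \omega(e_1, \dots, e_N), \tag{6.6} \]
where $\eta_{ab}$ is the canonical form of the $N$-dimensional metric (i.e. the form where the metric is diagonal and has only $\pm 1$ on the diagonal). ∎

The action of the Hodge star on basis 1-forms $\{\theta^a\}$, as illustrated by the examples above, can be summarized by the following formulas:
\[ \ast 1 = \frac{1}{4!} \sum_{a,b,c,d} \varepsilon_{abcd}\, \theta^a \wedge \theta^b \wedge \theta^c \wedge \theta^d, \]
\[ \ast\theta_a = \frac{1}{3!} \sum_{b,c,d} \varepsilon_{abcd}\, \theta^b \wedge \theta^c \wedge \theta^d, \]
\[ \ast(\theta_a \wedge \theta_b) = \frac{1}{2!} \sum_{c,d} \varepsilon_{abcd}\, \theta^c \wedge \theta^d, \]
\[ \ast(\theta_a \wedge \theta_b \wedge \theta_c) = \sum_d \varepsilon_{abcd}\, \theta^d, \]
where $\varepsilon_{abcd}$ is not a tensor but a totally antisymmetric array of numbers (the Levi-Civita symbol) defined such that $\varepsilon_{0123} = 1$. (It is important to note that the 1-forms $\theta_a$ differ from $\theta^a$ by the sign factor $\eta_{aa}$.)

The example above shows that $\ast\ast f = -f$ for a scalar $f$. Since $\ast\varepsilon = -1$, we also have $\ast\ast\varepsilon = -\varepsilon$. A similar property holds for any $n$-form $\omega$.

Statement 6.1.3.1: For any $n$-form $\omega$ in $N$-dimensional manifold with canonical metric $\eta_{ab}$, one has
\[ \ast\ast\omega = (\det \eta_{ab})\, (-1)^{(N-n)n}\, \omega. \]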

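As a quick consistency check of Statement 6.1.3.1 (a worked example on two particular basis forms, not a proof), take $N = 4$, $\det \eta_{ab} = -1$, and the orthonormal-basis results $\ast\theta^0 = \theta^1 \wedge \theta^2 \wedge \theta^3$, $\ast(\theta^1 \wedge \theta^2 \wedge \theta^3) = \theta^0$, $\ast(\theta^0 \wedge \theta^1) = -\theta^2 \wedge \theta^3$, $\ast(\theta^2 \wedge \theta^3) = \theta^0 \wedge \theta^1$. For $n = 1$ and $n = 2$ one finds
\[ \ast\ast\theta^0 = \ast(\theta^1 \wedge \theta^2 \wedge \theta^3) = \theta^0, \qquad (\det \eta_{ab})\,(-1)^{(4-1)\cdot 1} = (-1)(-1) = +1; \]
\[ \ast\ast(\theta^0 \wedge \theta^1) = -\ast(\theta^2 \wedge \theta^3) = -\theta^0 \wedge \theta^1, \qquad (\det \eta_{ab})\,(-1)^{(4-2)\cdot 2} = (-1)(+1) = -1. \]
In both cases the sign obtained by applying the star twice agrees with the factor in the Statement.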

(Proof of Statement 6.1.3.1 on page 172.) ∎

It is not particularly easy to perform calculations with the Hodge star operation. For instance, there are no general identities relating the Hodge star to other operations, such as the exterior differential or the Lie derivative; so in general $\ast(\omega \wedge \psi) \neq \ast\omega \wedge \ast\psi$, $\ast d\omega \neq d \ast\omega$, $\ast L_x \omega \neq L_x \ast\omega$, etc. When doing calculations with the Hodge star, one needs to use either an explicit basis $\{e_a\}$, or the index-free trace formalism.

It follows from the example above that
\[ \theta^0 \wedge \ast\theta^0 = \varepsilon; \quad (\theta^0 \wedge \theta^1) \wedge \ast(\theta^0 \wedge \theta^1) = -\varepsilon; \quad \text{etc.} \]
On the other hand, it is easy to see that
\[ \theta^0 \wedge \ast\theta^1 = 0; \quad (\theta^0 \wedge \theta^1) \wedge \ast(\theta^0 \wedge \theta^2) = 0; \quad \text{etc.} \]
It seems that the combination $\omega_1 \wedge \ast\omega_2$ is nonzero on basis $n$-forms only when the two forms are equal. This combination is indeed a useful function of two $n$-forms. The following statement lists some properties of this function.

Statement 6.1.3.2: (a) If $\omega_1$ and $\omega_2$ are two $n$-forms in an $N$-dimensional manifold, one can compute the $N$-form $\omega_1 \wedge (\ast\omega_2)$ that is symmetric under interchange of $\omega_1$ and $\omega_2$:
\[ \omega_1 \wedge \ast\omega_2 = \omega_2 \wedge \ast\omega_1. \]
Thus the scalar $\ast(\omega_1 \wedge \ast\omega_2)$ is a symmetric bilinear function in the space of $n$-forms. This bilinear function can be viewed as a natural scalar product in the space of $n$-forms. In particular, if $\{\theta^a\}$ is an orthonormal basis of 1-forms, then an orthonormal basis in the space of $n$-forms is the set of exterior products
\[ \left\{ \theta^{a_1} \wedge \dots \wedge \theta^{a_n} \right\}, \]
with all possible sets of indices $\{a_1, \dots, a_n\}$ for which these exterior products are nonzero.

(b) If $\omega_1$, $\omega_2$ are two $n$-forms, the combination $\omega_1 \wedge \ast\omega_2$ is proportional to the volume element with the following coefficient,
\[ \omega_1 \wedge \ast\omega_2 = \frac{\mathrm{Vol}}{n!}\, \mathrm{Tr}_{(a_1,b_1)\dots(a_n,b_n)}\, \omega_1(a_1, \dots, a_n)\, \omega_2(b_1, \dots, b_n). \tag{6.7} \]

(c) Suppose $\omega_1$ and $\omega_2$ are 1-forms corresponding to vectors $v_1$ and $v_2$, i.e. $\omega_1 = g v_1$, $\omega_2 = g v_2$. Then the scalar product defined in part (a) coincides with the natural scalar product of 1-forms given by the inverse metric $g^{-1}$. More precisely,
\[ \omega_1 \wedge \ast\omega_2 = g^{-1}(\omega_1, \omega_2)\, \mathrm{Vol} = g(v_1, v_2)\, \mathrm{Vol}. \]

(d) If $\psi$ is a 2-form and $\omega_1$, $\omega_2$ are 1-forms corresponding to vectors $v_1$ and $v_2$, one has
\[ \psi \wedge \ast(\omega_1 \wedge \omega_2) = \psi(v_1, v_2)\, \mathrm{Vol}. \]
(Proof on page 172.) ∎

Remark: It is interesting to note what happens if $\omega_1$ is an $n_1$-form and $\omega_2$ is an $n_2$-form with $n_1 \neq n_2$. In that case, $\omega_1 \wedge \ast\omega_2$ is an $(n_1 + N - n_2)$-form while $\omega_2 \wedge \ast\omega_1$ is an $(n_2 + N - n_1)$-form. If $n_1 \neq n_2$, either $n_1 + N - n_2 > N$ or $n_2 + N - n_1 > N$. Hence, one of the expressions $\omega_1 \wedge \ast\omega_2$ or $\omega_2 \wedge \ast\omega_1$ is identically zero as an $n$-form with $n > N$, while the other expression is not necessarily zero. It is clear that Statement 6.1.3.2 cannot be extended to this case. ∎

Statement 6.1.3.2: If $\psi$ is a 2-form and $\omega$ is the 1-form corresponding to the vector $v = g^{-1}\omega$, then
\[ \omega \wedge \ast\psi = -\ast(\iota_v \psi), \tag{6.8} \]
where both sides are 3-forms.
Hint: Use Statement 6.1.3.2. (Proof on page 173.) ∎

Practice problem: Verify Eq. (6.8) for $\psi = \theta^0 \wedge \theta^1$, $\omega = \theta^1$, and for $\psi = \theta^2 \wedge \theta^3$, $\omega = \theta^2$. ∎

6.1.4 Levi-Civita connection

Given a metric $g$, one may determine the Levi-Civita connection $\nabla$ explicitly (see Eq. (1.43) in Sec. 1.6.6). It turns out that the Levi-Civita connection can be expressed directly through the tetrad. We begin by assuming that the metric tensor $g$ and the Levi-Civita connection $\nabla$ are known, and then derive the formula for the covariant derivative of the tetrad field. Subsequently, it will be clear how to recover the connection from the tetrad without explicitly involving the metric $g$.

Given a tetrad $\{e_a\}$, we would like to be able to compute the covariant derivative $\nabla_u v$ for arbitrary vector fields $u$, $v$. Since $\{e_a\}$ is a basis, every vector field $v$ can be expressed through $\{e_a\}$ as
\[ v = \sum_a v^a e_a, \]
where $v^a \equiv v^a(p)$ is a set of four scalar functions (the Latin index is not a tensor index). Therefore it is sufficient to be able to calculate $\nabla_u v$ for $u = e_a$ and $v = e_b$, where $a, b = 0, 1, 2, 3$. According to Eq. (1.43), we have
\[ g(\nabla_x y, z) = \frac{1}{2} \Big[ x \circ g(y, z) + y \circ g(x, z) - z \circ g(x, y) - g(x, [y, z]) - g(y, [x, z]) + g(z, [x, y]) \Big]. \]
This formula enables us to calculate $g(\nabla_{e_a} e_b, e_c)$ if we can evaluate terms such as $e_a \circ g(e_b, e_c)$ and $g(e_a, [e_b, e_c])$. Since $g(e_b, e_c) = \eta_{bc}$ is (by construction) a fixed numerical matrix, the directional derivatives of $\eta_{ab}$ are zero, so $e_a \circ g(e_b, e_c) = 0$. Thus we have
\[ g(\nabla_{e_a} e_b, e_c) = \frac{g(e_c, [e_a, e_b]) - g(e_a, [e_b, e_c]) - g(e_b, [e_a, e_c])}{2}. \]
We may rewrite this formula using the lower-index version of the dual tetrad $\theta_c$ [see Eqs. (6.3) and (6.4)] as
\[ \theta_c \circ \nabla_{e_a} e_b = \frac{\theta_c \circ [e_a, e_b] - \theta_a \circ [e_b, e_c] - \theta_b \circ [e_a, e_c]}{2}. \tag{6.9} \]
Since $\theta^c \circ v$ is the $c$-th component of a vector $v$ in the basis $\{e_a\}$, the formula (6.9) completely determines the vector $\nabla_{e_a} e_b$ if the basis vectors $e_a$ and their commutators $[e_a, e_b]$ are known.

We point out that the essential information required for computing the covariant derivative is the commutators $[e_a, e_b]$. In general, these commutators can be expressed as
\[ [e_a, e_b] = \sum_c C^c{}_{ab}\, e_c, \]
where $C^c{}_{ab} = -C^c{}_{ba}$ are some scalar coefficients (which are, of course, functions of the spacetime point $p$). The coefficients $C^c{}_{ab}$ can be explicitly computed from $\{e_a\}$ without need to


involve the metric $g$; recall that the commutator $[u, v]$ is a geometric operation defined independently of the metric. Then one has complete information about the Levi-Civita connection through Eq. (6.9). The formula (6.9) can be rewritten as
\[ \eta_{cc}\, \theta^c \circ \nabla_{e_a} e_b \equiv \theta_c \circ \nabla_{e_a} e_b = \frac{1}{2} \left( C_{c\,ab} - C_{b\,ac} - C_{a\,bc} \right), \tag{6.10} \]
where $C_{c\,ab}$ is obtained from $C^c{}_{ab}$ by lowering the first index through $\eta_{ac}$. Alternatively, we may write
\[ \nabla_{e_a} e_b = \frac{1}{2} \sum_c \left( C_{c\,ab} - C_{b\,ac} - C_{a\,bc} \right) \eta^{cc}\, e_c. \tag{6.11} \]
At this point, one can compute covariant derivatives $\nabla_u v$ of arbitrary vector fields $u$, $v$ by expressing them through the basis $\{e_a\}$ and using the Leibnitz rule.

Example: We have a two-dimensional spacetime with local coordinates $\{t, x\}$ and the given orthonormal frame
\[ e_0 = \partial_t, \quad e_1 = e^{-Ht}\, \partial_x. \]
The task is to compute $\nabla_u v$, where $u$ and $v$ are $\partial_t$ or $\partial_x$.

Let us first determine $\nabla_{e_a} e_b$ for all $a$, $b$. We begin by calculating the commutators $[e_a, e_b]$. The only nontrivial commutator is
\[ [e_0, e_1] = -H e_1. \]
It follows that $C^1{}_{01} = -C^1{}_{10} = -H$ are the only nonzero coefficients among $C^c{}_{ab}$. Lowering the first index of $C^c{}_{ab}$, we find $C_{1\,01} = -C_{1\,10} = H$. Then we use Eq. (6.11) and compute
\[ \nabla_{e_0} e_0 = 0 \quad (\text{no nonzero } C_{c\,ab}), \]
\[ \nabla_{e_0} e_1 = \frac{1}{2} \left( C_{1\,01} - C_{1\,01} \right) \eta^{11} e_1 = 0, \]
\[ \nabla_{e_1} e_0 = \frac{1}{2} \left( C_{1\,10} - C_{1\,01} \right) \eta^{11} e_1 = H e_1, \]
\[ \nabla_{e_1} e_1 = \frac{1}{2} \left( -C_{1\,10} - C_{1\,10} \right) \eta^{00} e_0 = H e_0. \]
Finally, we use the Leibnitz rule:
\[ \nabla_{\partial_t} \partial_x = \nabla_{e_0} \left( e^{Ht} e_1 \right) = \left( e_0 \circ e^{Ht} \right) e_1 + e^{Ht}\, \nabla_{e_0} e_1 = \left( \partial_t e^{Ht} \right) e^{-Ht}\, \partial_x = H \partial_x. \]
Similarly, we find $\nabla_{\partial_t} \partial_t = 0$ and $\nabla_{\partial_x} \partial_x = H e^{2Ht}\, \partial_t$. ∎

Remark: In the Koszul formula (1.43), terms with scalar products vanish for an orthonormal basis; however, terms with commutators remain. If we choose the basis vectors as coordinate derivatives $\partial_\mu$ (this is the choice usually made in calculations using index notation), the terms with commutators will vanish but terms with scalar products will remain. At first glance, it is not obvious which choice of basis has more advantages in calculations. However, it turns out that in many cases calculations are shorter in an orthonormal basis. ∎

6.1.5 Connection as a set of 1-forms

In the previous section we have seen that the information about Levi-Civita connection is carried entirely by the commutators $[e_a, e_b]$ and can thus be recovered without explicitly involving the metric $g$. A more elegant (and practically more useful) description of the Levi-Civita connection is obtained by considering the dual tetrad $\{\theta^a\}$.

Just as the commutator of vector fields is defined geometrically without involving the metric, the exterior differential $d\theta^a$ is a 2-form defined geometrically by
\[ d\theta^a(u, v) = u \circ \theta^a(v) - v \circ \theta^a(u) - \theta^a([u, v]). \]
Substituting the tetrad vectors $e_b$ and $e_c$ instead of $u$, $v$, we have
\[ d\theta^a(e_b, e_c) = -\theta^a([e_b, e_c]) \equiv -C^a{}_{bc}. \]
So the 2-forms $d\theta^a$ carry essentially the same information as the commutators of the tetrad vectors. Hence, we expect to be able to express the Levi-Civita connection through the dual tetrad $\theta^a$ and its differentials $d\theta^a$. To do this, one uses some clever tricks, which we will now show.

Since $\{\theta^a\}$ is a basis in the dual space, the 2-form $d\theta^a$ can be always decomposed as
\[ d\theta^a = \frac{1}{2} \sum_{b,c} X^a{}_{bc}\, \theta^b \wedge \theta^c \]
with some coefficients $X^a{}_{bc} = -X^a{}_{cb}$. It is easy to see that these coefficients are directly related to $C^a{}_{bc}$ (see the following Calculation).

Calculation: Show that $X^a{}_{bc} = -C^a{}_{bc}$.

Solution: Consider the 2-form $d\theta^a$ applied to arbitrary vectors $u$, $v$. Decompose $u$ and $v$ through the tetrad and compute
\[ (d\theta^a)(u, v) = (d\theta^a) \Big( \sum_b u^b e_b,\ \sum_c v^c e_c \Big) = \sum_{b,c} u^b v^c\, (d\theta^a)(e_b, e_c) = -\sum_{b,c} C^a{}_{bc}\, u^b v^c. \]
On the other hand, $u^b \equiv \theta^b \circ u$ and $v^c \equiv \theta^c \circ v$, so
\[ \sum_{b,c} X^a{}_{bc} \left( \theta^b \wedge \theta^c \right)(u, v) = \sum_{b,c} X^a{}_{bc} \left( u^b v^c - u^c v^b \right) = 2 \sum_{b,c} X^a{}_{bc}\, u^b v^c. \]
Since $u$, $v$ are arbitrary vectors, we obtain $X^a{}_{bc} = -C^a{}_{bc}$. ∎

Let us now see how to recover the Levi-Civita connection. So far we have the property
\[ d\theta^c = -\frac{1}{2} \sum_{a,b} C^c{}_{ab}\, \theta^a \wedge \theta^b, \tag{6.12} \]
which shows that the coefficients $C^c{}_{ab}$ can be easily computed by decomposing $d\theta^c$ into the basis of $\theta^a \wedge \theta^b$. However, according to Eq. (6.10), covariant derivatives of vectors involve a certain combination of $C^c{}_{ab}$,
\[ \theta_c \circ \nabla_{e_a} e_b = \frac{1}{2} \left( C_{c\,ab} - C_{b\,ac} - C_{a\,bc} \right), \]
rather than simply $C^c{}_{ab}$. (Here and below we freely lower and raise all Latin indices by implicitly using the fiducial Minkowski metric $\eta_{ab}$.) Now we note that the scalar expression $\theta^c \circ \nabla_u e_b$ linearly depends on the vector $u$ and is thus equal to some 1-form applied to $u$. Let us denote that 1-form by $\omega^c{}_b$; in other words, by definition, we set
\[ \omega^c{}_b(u) \equiv \theta^c \circ \nabla_u e_b. \]

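To illustrate this definition, write $\omega^c{}_b$ for the 1-form just introduced, $\omega^c{}_b(u) \equiv \theta^c \circ \nabla_u e_b$, and consider again the frame $e_0 = \partial_t$, $e_1 = e^{-Ht}\partial_x$ of the Example in Sec. 6.1.4, with dual frame $\theta^0 = dt$, $\theta^1 = e^{Ht} dx$. The following short evaluation is a sketch that uses only the results $\nabla_{e_0} e_0 = \nabla_{e_0} e_1 = 0$, $\nabla_{e_1} e_0 = H e_1$, $\nabla_{e_1} e_1 = H e_0$ obtained there:
\[ \omega^1{}_0(e_1) = \theta^1 \circ \nabla_{e_1} e_0 = H, \qquad \omega^0{}_1(e_1) = \theta^0 \circ \nabla_{e_1} e_1 = H, \qquad \omega^c{}_b(e_0) = \theta^c \circ \nabla_{e_0} e_b = 0. \]
A 1-form is fixed by its values on the basis vectors $e_0$, $e_1$, so
\[ \omega^1{}_0 = \omega^0{}_1 = H\, \theta^1 = H e^{Ht}\, dx, \qquad \omega^0{}_0 = \omega^1{}_1 = 0. \]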

Another way to restate the definition is to say that the 1-form $\omega^c{}_b$ satisfies
\[ \nabla_u e_b = \sum_c \omega^c{}_b(u)\, e_c \tag{6.13} \]
for any vector $u$. The 1-forms $\omega^c{}_b$ are called the connection 1-forms or the spin connection corresponding to the dual tetrad $\theta^a$.

It is clear from Eq. (6.10) that the 1-forms $\omega_{cb}$ (that is, $\omega^c{}_b$ with the first index lowered) satisfy
\[ \omega_{cb} = \frac{1}{2} \sum_a \left( C_{c\,ab} - C_{b\,ac} - C_{a\,bc} \right) \theta^a. \tag{6.14} \]
Thus, the connection forms $\omega^c{}_b$ may be computed straightforwardly from the commutator coefficients $C^c{}_{ab}$. After computing $\omega^c{}_b$, covariant derivatives of arbitrary vector fields can be expressed as follows,
\[ \nabla_u e_b = \sum_c \omega^c{}_b(u)\, e_c, \]
\[ \nabla_u v = \nabla_u \Big( \sum_b v^b e_b \Big) = \sum_b \left( u \circ v^b \right) e_b + \sum_{b,c} v^b\, \omega^c{}_b(u)\, e_c. \]
In terms of the components $v^c$ of a vector field $v$, we find the formula
\[ (\nabla_u v)^c \equiv \theta^c \circ \nabla_u v = u \circ v^c + \sum_b \omega^c{}_b(u)\, v^b. \tag{6.15} \]
It is clear that $\omega^c{}_b$ play the role of the Christoffel symbols in the formula for the covariant derivative.

The formula (6.14) is not necessarily the quickest way to compute $\omega^c{}_b$. Comparing it with Eq. (6.12), we notice that it might help to compute the following expression,
\[ \sum_b \omega_{cb} \wedge \theta^b = \frac{1}{2} \sum_{a,b} \left( C_{c\,ab} - C_{b\,ac} - C_{a\,bc} \right) \theta^a \wedge \theta^b \]
\[ = \frac{1}{2} \sum_{a,b} C_{c\,ab}\, \theta^a \wedge \theta^b. \]
(The simplification in the second line happens because the terms symmetric in $a$, $b$ cancel.) Then we can raise the index $c$, use Eq. (6.12), and obtain the following relationship,
\[ d\theta^c = -\sum_b \omega^c{}_b \wedge \theta^b. \tag{6.16} \]
The relationship (6.16) is called the first Cartan structure equation.

An immediate practical importance of this equation is that it can be used, instead of Eq. (6.14), to determine the 1-forms $\omega^c{}_b$. Of course, Eq. (6.16) alone does not specify $\omega^c{}_b$ uniquely. For instance, the 1-forms $\frac{1}{2} \sum_a C^c{}_{ab}\, \theta^a$ are also a solution of Eq. (6.16), although these 1-forms do not coincide with $\omega^c{}_b$. More generally, if $\omega^c{}_b$ is any solution of Eq. (6.16), one may add the following term,
\[ \omega^c{}_b \to \omega^c{}_b + \sum_a B^c{}_{ab}\, \theta^a, \]
where $B^c{}_{ab} = B^c{}_{ba}$ is an arbitrary array of coefficients symmetric in $(a, b)$. The new $\omega^c{}_b$ will still be a solution of Eq. (6.16). However, in addition to Eq. (6.16) the connection 1-forms $\omega^c{}_b$ must satisfy the antisymmetry requirement,
\[ \omega_{cb} = -\omega_{bc}, \tag{6.17} \]
which can be easily read from Eq. (6.14). It turns out³ that the two requirements (6.16), (6.17) are sufficient to specify $\omega^c{}_b$ uniquely, so that one does not need to use Eq. (6.14). In practice, it is often quicker to guess the solution of Eqs. (6.16), (6.17) than to follow the straightforward but longer formula (6.14).

Example: Consider a two-dimensional spacetime with local coordinates $\{t, x\}$ and a frame field $\{e_0, e_1\}$ defined as
\[ e_0 = e^{-f(t)}\, \partial_t, \quad e_1 = e^{-h(t)}\, \partial_x, \]
where $f(t)$, $h(t)$ are scalar functions depending only on $t$ (but not on $x$). The task is to compute the connection 1-forms $\omega^c{}_b$ and the covariant derivatives $\nabla_{\partial_t} \partial_x$ and $\nabla_{\partial_x} \partial_x$.

Solution: The metric corresponding to the given frame field is
\[ g = e^{2f(t)}\, dt^2 - e^{2h(t)}\, dx^2. \]
We begin by defining the dual frame,
\[ \theta^0 = e^{f(t)}\, dt, \quad \theta^1 = e^{h(t)}\, dx, \]
and the differentials,
\[ d\theta^0 = 0, \quad d\theta^1 = \dot h\, e^{h(t)}\, dt \wedge dx = -\dot h\, e^{h(t) - f(t)}\, dx \wedge \theta^0, \]
where $\dot h \equiv \partial_t h(t)$. Note that we intentionally wrote $d\theta^1$ in the form $(\dots) \wedge \theta^0$ rather than $(\dots) \wedge \theta^1$, trying to satisfy the antisymmetry property (6.17) which entails $\omega^0{}_0 = \omega^1{}_1 = 0$. Then we can try to guess a solution $\tilde\omega^c{}_b$ of Eq. (6.16) as follows,
\[ \tilde\omega^0{}_0 = 0, \quad \tilde\omega^0{}_1 = 0, \quad \tilde\omega^1{}_0 = \dot h\, e^{h(t) - f(t)}\, dx, \quad \tilde\omega^1{}_1 = 0. \]
Despite our efforts, the 1-forms $\tilde\omega^a{}_b$ are not yet the correct connection forms because they do not satisfy the antisymmetry property (6.17). To correct this, we may add a multiple of $\theta^0$ to $\tilde\omega^1{}_0$ and a multiple of $\theta^1$ to $\tilde\omega^0{}_1$. Presently, it is clear that we only need to add $\dot h\, e^{h(t) - f(t)}\, dx$ to $\tilde\omega^0{}_1$. So the correct set of the connection 1-forms is
\[ \omega_{01} = \omega^0{}_1 = \omega^1{}_0 = -\omega_{10} = \dot h\, e^{h(t) - f(t)}\, dx. \]
The covariant derivatives are now computed using the formula (6.15). Consider the vector $v = \partial_x = e^{h(t)} e_1$. In the orthonormal frame $\{e_a\}$ the vector $v$ has a single nonzero component, $v^1 = e^{h(t)}$. Therefore Eq. (6.15) is rewritten as
\[ (\nabla_u v)^c = u \circ v^c + \omega^c{}_1(u)\, v^1. \]
A direct evaluation of components yields
\[ (\nabla_u v)^0 = \omega^0{}_1(u)\, e^{h(t)} = \dot h\, e^{2h(t) - f(t)}\, u \circ x, \]
\[ (\nabla_u v)^1 = u \circ e^{h(t)} = \dot h\, e^{h(t)}\, u \circ t. \]
Therefore, for arbitrary $u$,
\[ \nabla_u v = \dot h\, e^{h(t)} \left\{ e^{h(t) - f(t)}\, (u \circ x)\, e_0 + (u \circ t)\, e_1 \right\}. \]
Substituting $u = \partial_t$ or $u = \partial_x$, we find
\[ \nabla_{\partial_t} \partial_x = \dot h\, e^{h(t)}\, e_1 = \dot h\, \partial_x; \]
\[ \nabla_{\partial_x} \partial_x = \dot h\, e^{2h(t) - f(t)}\, e_0 = \dot h\, e^{2h(t) - 2f(t)}\, \partial_t. \]
∎

³ See Sec. 6.1.6 for a general proof of this statement.

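The result of this Example can be cross-checked numerically against the general formula (6.14). The sketch below is only an illustration under stated assumptions: the sample functions $f(t) = 0.2\,t$ and $h(t) = t^2/2$, the evaluation point $t = 1.5$, and the function name `connection_components` are hypothetical choices, not taken from the text. The code lowers the first index of $C^c{}_{ab}$, evaluates $\omega_{cb}(e_a) = \frac{1}{2}(C_{c\,ab} - C_{b\,ac} - C_{a\,bc})$, and compares with $\omega_{01} = \dot h\, e^{h-f} dx$ and with $\nabla_{\partial_x}\partial_x = \dot h\, e^{2h-2f}\partial_t$.

```python
import math

ETA = (1.0, -1.0)  # fiducial metric eta_ab = diag(1, -1) for the frame {e_0, e_1}


def connection_components(C_up, eta=ETA):
    """Frame components omega[c][b][a] = omega_{cb}(e_a), following Eq. (6.14).

    C_up[c][a][b] holds C^c_{ab}, defined by [e_a, e_b] = sum_c C^c_{ab} e_c.
    """
    n = len(eta)
    # Lower the first index: C_{c ab} = eta_{cc} C^c_{ab} (eta is diagonal).
    C = [[[eta[c] * C_up[c][a][b] for b in range(n)] for a in range(n)]
         for c in range(n)]
    return [
        [
            [0.5 * (C[c][a][b] - C[b][a][c] - C[a][b][c]) for a in range(n)]
            for b in range(n)
        ]
        for c in range(n)
    ]


# Hypothetical sample functions and evaluation point, chosen only for this check.
t = 1.5
f, h, h_dot = 0.2 * t, 0.5 * t**2, t   # h_dot = dh/dt

# The only nonzero commutator is [e_0, e_1] = -h_dot * exp(-f) * e_1.
C_up = [[[0.0, 0.0], [0.0, 0.0]] for _ in range(2)]
C_up[1][0][1] = -h_dot * math.exp(-f)
C_up[1][1][0] = +h_dot * math.exp(-f)

omega = connection_components(C_up)

# omega_{01} = h_dot * exp(h - f) dx and dx(e_1) = exp(-h),
# so omega_{01}(e_1) should equal h_dot * exp(-f).
assert abs(omega[0][1][1] - h_dot * math.exp(-f)) < 1e-12

# Antisymmetry (6.17) in the two frame indices, for each argument e_a.
assert all(
    abs(omega[c][b][a] + omega[b][c][a]) < 1e-12
    for a in range(2) for b in range(2) for c in range(2)
)

# nabla_{d_x} d_x = exp(2h) nabla_{e_1} e_1 = exp(2h) omega^0_1(e_1) e_0,
# with e_0 = exp(-f) d_t; the d_t component should be h_dot * exp(2h - 2f).
nabla_xx_t = math.exp(2 * h) * (ETA[0] * omega[0][1][1]) * math.exp(-f)
assert abs(nabla_xx_t - h_dot * math.exp(2 * h - 2 * f)) < 1e-9
```

The check works with frame components at a single point, so it tests the algebra of Eq. (6.14) rather than the differentiation step that produces the commutator coefficients.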

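6.1.6 *Solving equations for n-forms

In Sec. 6.1.5 we found that the connection 1-forms $\omega^a{}_b$ can be found by solving the Cartan structure equation (6.16). In general, there are several tricks for finding general solutions of linear equations for $n$-forms in the frame basis.

The following statement shows how to solve the equations such as the Cartan structure equation in a more general case.

Statement 6.1.6.1: If any set of 2-forms $A^c$ is given (where $c$ is a frame index) and $\{\theta^c\}$ is a dual frame basis then there exists a unique set of 1-forms $\Gamma_{ab}$ such that
\[ A_a + \sum_b \Gamma_{ab} \wedge \theta^b = 0; \quad \Gamma_{ab} = -\Gamma_{ba}. \]
The 1-forms $\Gamma_{ab}$ can be expressed by the formula
\[ \Gamma_{ab} = \sum_c \theta^c\, \Gamma_{abc}, \quad \Gamma_{abc} \equiv \frac{1}{2} \left( A_{a\,bc} - A_{b\,ac} - A_{c\,ab} \right), \tag{6.18} \]
where $A_{c\,ab}$ are the coefficients in the decomposition
\[ A_c = \frac{1}{2} \sum_{a,b} A_{c\,ab}\, \theta^a \wedge \theta^b, \quad A_{c\,ab} = -A_{c\,ba}. \]
Idea of proof: Guess the formula for a solution $\Gamma_{ab}$ and then show that the difference $\Gamma_{ab} - \tilde\Gamma_{ab}$ of two possible solutions is zero. (Proof on page 173.) ∎

A consequence of Statement 6.1.6.1 for $A^c = d\theta^c$, $\Gamma_{ab} \equiv \omega_{ab}$ is that the system of equations (6.16), (6.17) has the unique solution (6.14).

The formula (6.18) can be also rewritten in a component-free manner, using the frame bases $\{e_a\}$ and $\{\theta^a\}$ instead:
\[ \Gamma_{ab} = \frac{1}{2} \Big( \iota_{e_b} A_a - \iota_{e_a} A_b - \sum_c \theta^c\, A_c(e_a, e_b) \Big). \tag{6.19} \]
It is clear that this expression is antisymmetric in $(a, b)$. The corresponding solution for the connection 1-forms can be written as
\[ \omega_{ab} = \frac{1}{2}\, \iota_{e_b} d\theta_a - \frac{1}{2}\, \iota_{e_a} d\theta_b - \frac{1}{2} \sum_c \left( \iota_{e_b} \iota_{e_a} d\theta_c \right) \theta^c. \tag{6.20} \]

It is instructive to check directly that $A_a + \sum_b \Gamma_{ab} \wedge \theta^b = 0$ holds with $\Gamma_{ab}$ defined by Eq. (6.19). We compute
\[ 2 \sum_b \Gamma_{ab} \wedge \theta^b = \sum_b \left( \iota_{e_b} A_a \right) \wedge \theta^b - \sum_b \left( \iota_{e_a} A_b \right) \wedge \theta^b - \sum_{b,c} A_c(e_a, e_b)\, \theta^c \wedge \theta^b. \tag{6.21} \]
To proceed, we note that expressions such as
\[ \sum_b \left( \iota_{e_b} A_a \right) \wedge \theta^b \]
are reminiscent of the decomposition (6.1). The following statement shows how such expressions can be simplified.

Statement 6.1.6.2: Suppose $\omega$ is any $n$-form and $\{e_a\}$ and $\{\theta^a\}$ is a basis and a dual basis for which the decomposition (6.1) holds. (These bases must be dual to each other but do not actually have to be orthonormal.) Then one has the decomposition
\[ \omega = \frac{1}{n} \sum_a \theta^a \wedge \left( \iota_{e_a} \omega \right). \tag{6.22} \]
In other words,
\[ \frac{1}{n} \sum_a \theta^a \wedge \iota_{e_a} \]
is the identity operator on $n$-forms.

Proof of Statement 6.1.6.2: Let us apply the $n$-form at the right-hand side of Eq. (6.22) to a set of $n$ arbitrary vectors $\{v_1, \dots, v_n\}$. By definition (1.19) of the exterior product, we have
\[ \Big( \sum_a \theta^a \wedge \iota_{e_a} \omega \Big)(v_1, \dots, v_n) = \sum_{j=1}^n \sum_a (-1)^{j-1}\, \theta^a(v_j)\, \omega(e_a, v_1, \dots, \hat v_j, \dots, v_n), \]
where the hat over $v_j$ means that the $j$-th vector is absent from the list of arguments. Since the $n$-form $\omega$ is linear in each argument, we may substitute
\[ \sum_a \theta^a(v_j)\, \omega(e_a, v_1, \dots) = \omega \Big( \sum_a \theta^a(v_j)\, e_a, v_1, \dots \Big) = \omega(v_j, v_1, \dots). \]
Hence,
\[ \Big( \sum_a \theta^a \wedge \iota_{e_a} \omega \Big)(v_1, \dots, v_n) = \sum_{j=1}^n (-1)^{j-1}\, \omega(v_j, v_1, \dots, \hat v_j, \dots, v_n) = \sum_{j=1}^n \omega(v_1, \dots, v_n) = n\, \omega(v_1, \dots, v_n). \]
This shows that Eq. (6.22) holds. ∎

Using Statement 6.1.6.2 with $n = 2$ for the 2-form $A_a$ (and with $n = 1$ for the 1-forms $\iota_{e_a} A_c$), we can transform Eq. (6.21) as follows,
\[ 2 \sum_b \Gamma_{ab} \wedge \theta^b = -2 A_a - \sum_b \left( \iota_{e_a} A_b \right) \wedge \theta^b + \sum_c \Big( \sum_b \theta^b\, \iota_{e_b} \left( \iota_{e_a} A_c \right) \Big) \wedge \theta^c \]
\[ = -2 A_a - \sum_c \left( \iota_{e_a} A_c \right) \wedge \theta^c + \sum_c \left( \iota_{e_a} A_c \right) \wedge \theta^c = -2 A_a. \]
Thus $A_a + \sum_b \Gamma_{ab} \wedge \theta^b = 0$.

A useful alternative way of writing the formula (6.19) is shown in the following calculation.

Calculation 6.1.6.3: The formula (6.19) can be rewritten equivalently as
\[ \Gamma_{ab} = \iota_{e_b} A_a - \iota_{e_a} A_b - \frac{1}{2}\, \iota_{e_b} \iota_{e_a} \sum_c \theta^c \wedge A_c. \tag{6.23} \]
Details: We may interchange the order of $\theta^c$ and $\iota_{e_a}$ in Eq. (6.19) by using the property of $\iota_x$,
\[ \iota_x \left( \theta^c \wedge \omega \right) = \left( \iota_x \theta^c \right) \omega - \theta^c \wedge \iota_x \omega, \]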

which holds for any $n$-form $\omega$ and for any vector $x$. Since $\iota_{e_a} \theta^c = \delta^c_a$, we find
\[ \iota_{e_a} \sum_c \theta^c \wedge A_c = \sum_c \delta^c_a A_c - \sum_c \theta^c \wedge \iota_{e_a} A_c = A_a - \sum_c \theta^c \wedge \iota_{e_a} A_c \]
and
\[ \iota_{e_b} \iota_{e_a} \sum_c \theta^c \wedge A_c = \iota_{e_b} \Big( A_a - \sum_c \theta^c \wedge \iota_{e_a} A_c \Big) = \iota_{e_b} A_a - \iota_{e_a} A_b + \sum_c \theta^c\, \iota_{e_b} \iota_{e_a} A_c. \tag{6.24} \]
Since
\[ \iota_{e_b} \iota_{e_a} A_c \equiv A_c(e_a, e_b), \]
the formula (6.19) can be rewritten as Eq. (6.23). ∎

Remark: The spin connection $\omega_{ab}$ can be found from the formula (6.23) by substituting $A_c \equiv d\theta_c$. Due to the Frobenius theorem, the expression $\theta^c \wedge A_c = \theta^c \wedge d\theta_c$ is equal to zero if the 1-forms $\theta^c$ correspond to hypersurface-orthogonal vector fields $e_c$. Since the orthonormal frame $\{e_c\}$ frequently consists of hypersurface-orthogonal vectors, calculations are simplified when the formula (6.23) is used. ∎

As an aside, we note that the formula (6.23) cannot hold for 1-forms $A_c$ (any 1-form $A_c$ is decomposed uniquely as $\sum_a A_{ac}\, \theta^a$ and the coefficients $A_{ac}$ are not necessarily antisymmetric), but it can be generalized from 2-forms $A_a$ to $n$-forms $A_a$.

Statement 6.1.6.4: Given a set of $n$-forms $A_c$ (where $c$ is a vielbein index and $n \geq 3$), there exists a set of $(n-1)$-forms $B_{ba}$ such that
\[ A_a - \sum_b \theta^b \wedge B_{ba} = 0; \quad B_{ab} = -B_{ba}. \tag{6.25} \]
A suitable set of $B_{ba}$ is determined by the equivalent expressions
\[ B_{ba} = \frac{1}{n-1} \left( \iota_{e_b} A_a - \iota_{e_a} A_b \right) - \frac{1}{n(n-1)}\, \iota_{e_b} \iota_{e_a} \sum_c \theta^c \wedge A_c \tag{6.26} \]
\[ = \frac{1}{n} \left( \iota_{e_b} A_a - \iota_{e_a} A_b \right) - \frac{1}{n(n-1)} \sum_c \theta^c \wedge \left( \iota_{e_b} \iota_{e_a} A_c \right). \tag{6.27} \]
Unlike the case $n = 2$, the set $B_{ba}$ satisfying Eq. (6.25) is not unique. (Proof on page 173.) ∎

6.2 Applications of tetrad formalism

After developing powerful techniques for manipulating the orthonormal frames, let us see what calculations are possible in this formalism. To make the notation more concise, I will now adopt the Einstein summation convention for the frame indices; for instance, I will now write $e_a \theta^a(u)$ instead of $\sum_a e_a \theta^a(u)$. (In Sec. 6.1 I did not use the Einstein summation convention but wrote all summations explicitly, to make the presentation more clear.) The frame indices are raised and lowered using the fiducial Minkowski metric $\eta_{ab}$.

Examples: Here are some previously derived identities rewritten in this notation:
\[ g(u, v) = \theta^a(u)\, \theta_a(v), \quad g = \theta^a \otimes \theta_a; \]
\[ g u = g(u, e_a)\, \theta^a = \theta_a(u)\, \theta^a, \]
\[ \mathrm{Tr}_{(a,b)} X(a, b) = X(e_a, e^a), \]
\[ \omega = \theta^a \left( \iota_{e_a} \omega \right), \]
where $\omega$ is a 1-form. ∎

6.2.1 Computing geodesic equations

The geodesic equation $\nabla_u u = 0$ is used to find trajectories of particles in GR. We now show how to use the vielbein formalism to derive the required differential equations for the particle worldline $\{x^\mu(\tau)\}$ in a local coordinate system.⁴

Assuming that $\{x^\mu\}$ is a given local coordinate system in an $n$-dimensional manifold, we are interested in determining a trajectory $\{x^\mu(\tau)\}$ such that $\nabla_{\dot x} \dot x = 0$. Suppose that an orthonormal frame $\{e_a\}$, the dual frame $\{\theta^a\}$, and the corresponding connection 1-forms $\omega^a{}_b$ are already known. We would like to determine the unknown functions $\{x^\mu(\tau)\}$. The tangent vector $\dot x$ can be expressed as
\[ \dot x = u^a e_a, \quad u^a \equiv \theta^a \circ \dot x. \]
The coefficients $u^a$ will be particular expressions involving $x^\mu(\tau)$ and $\dot x^\mu(\tau)$. Then we use Eq. (6.13) and compute
\[ \nabla_{\dot x} \dot x = \nabla_{\dot x} \left( u^a e_a \right) = \left( \dot x \circ u^a \right) e_a + u^b e_c \left( \omega^c{}_b \circ \dot x \right) = \left( \dot x \circ u^a + u^b\, \omega^a{}_b \circ \dot x \right) e_a. \]
Since $\{e_a\}$ is a basis, we obtain $n$ equations
\[ \dot x \circ u^a + \sum_b u^b\, \omega^a{}_b \circ \dot x = 0, \quad a = 1, \dots, n. \]
These equations are second-order in derivatives of $x^\mu(\tau)$, because $u^a$ contains $\dot x^\mu$, while the derivative along the curve, $\dot x \circ u^a$, produces $\ddot x^\mu$ when it acts on $\dot x^\mu$ contained in $u^a$.

The geodesic equations can be further simplified, so that precomputing $\omega^a{}_b$ is not required.

Calculation 6.2.1.1: The geodesic equation for a vector field $u$ can be written as
\[ u \circ \left( \theta_a(u) \right) + \theta_b(u)\, \iota_{e_a} \iota_u \left( d\theta^b \right) = 0, \quad a = 1, \dots, n. \tag{6.28} \]
One can derive the formula (6.28) by using the explicit relationship between $\omega^a{}_b$ and $\theta^a$, for instance, in the form (6.20). However, it is faster to use the Koszul formula (1.43) for computing the scalar product $g(\nabla_u u, e_a)$, which must vanish for every $a = 1, \dots, n$. We find (without assuming $g(u, u) = \text{const}$)
\[ g(\nabla_u u, e_a) = \frac{1}{2} \Big[ 2 u \circ g(u, e_a) - e_a \circ g(u, u) - 2 g(u, [u, e_a]) \Big] \]
\[ \overset{1}{=} u \circ \theta_a(u) - e_a \circ \frac{g(u, u)}{2} + \theta_b(u)\, \theta^b([e_a, u]) \]
\[ \overset{2}{=} u \circ \theta_a(u) + \theta_b(u) \left( d\theta^b(u, e_a) + e_a \circ \theta^b(u) \right) - e_a \circ \frac{g(u, u)}{2} \]
\[ \overset{3}{=} u \circ \theta_a(u) + \theta_b(u)\, d\theta^b(u, e_a), \]

⁴ The idea to explore geodesic equations in tetrad formalism came to me after reading the section entitled "Computing with adapted frames" and examples of the book [20].
6 Tetrad methods

where $\overset{(1)}{=}$ is due to Eqs. (6.2) and (6.4); $\overset{(2)}{=}$ is due to the definition (1.21) and the fact that $u \circ \theta^b(e_a) = u \circ \delta^b_a = 0$; and $\overset{(3)}{=}$ is due to
\[
\theta_b(u) \left( e_a \circ \theta^b(u) \right) = \frac{1}{2} e_a \circ \left( \theta_b(u)\, \theta^b(u) \right) = \frac{1}{2} e_a \circ g(u,u).
\]
Of course, $g(u,u) = \mathrm{const}$ for a geodesic field $u$, but we did not use that fact in the present derivation. As a result, the left-hand side of Eq. (6.28) is strictly equivalent to $g(\nabla_u u, e_a)$ even for non-geodesic vector fields $u$.

In practical calculations, it is frequently convenient to use the conservation law $g(\dot\gamma, \dot\gamma) = \mathrm{const}$ in place of one of the $n$ geodesic equations. (The first-order equation $g(\dot\gamma, \dot\gamma) = \mathrm{const}$ is always a consequence of the second-order geodesic equations.)

Example: Let us determine the geodesic curves in the two-dimensional metric
\[
g = e^{2f(x)} dt^2 - e^{2h(x)} dx^2 .
\]
The obvious orthonormal frame consists of the vectors $e_0 = e^{-f} \partial_t$, $e_1 = e^{-h} \partial_x$ and the corresponding 1-forms $\theta^0 = e^{f} dt$, $\theta^1 = e^{h} dx$. According to the geodesic equation (6.28), a curve $\gamma(\tau) \equiv \{t(\tau), x(\tau)\}$ with the tangent vector
\[
\dot\gamma(\tau) \equiv \frac{dt(\tau)}{d\tau} \partial_t + \frac{dx(\tau)}{d\tau} \partial_x \equiv \dot t(\tau) \partial_t + \dot x(\tau) \partial_x
\]
is geodesic if the following two equations hold,
\[
\dot\gamma \circ \left( \theta_a(\dot\gamma) \right) + \theta_b(\dot\gamma)\, (d\theta^b)(\dot\gamma, e_a) = 0, \quad a = 0, 1.
\]
(The overdot denotes $d/d\tau$ everywhere.) Presently, $d\theta^0 = f' e^{f} dx \wedge dt$ and $d\theta^1 = 0$, while
\[
\theta_0(\dot\gamma) = \theta^0(\dot\gamma) = e^{f} \dot t, \qquad \theta_1(\dot\gamma) = -\theta^1(\dot\gamma) = -e^{h} \dot x,
\]
so the two geodesic equations are
\[
\dot\gamma \circ \left( e^{f} \dot t \right) + e^{f} \dot t\, f' e^{f} (dx \wedge dt)\left( \dot\gamma, e^{-f} \partial_t \right) = 0, \qquad [a = 0]
\]
\[
-\dot\gamma \circ \left( e^{h} \dot x \right) + e^{f} \dot t\, f' e^{f} (dx \wedge dt)\left( \dot\gamma, e^{-h} \partial_x \right) = 0. \qquad [a = 1]
\]
Simplifying these equations, while keeping in mind that $\dot\gamma$ acts on $t$ and $x$ simply as $d/d\tau$, we find
\[
\ddot t + 2 f' \dot x \dot t = 0, \qquad [a = 0]
\]
\[
\ddot x + h' \dot x^2 + f' e^{2f-2h} \dot t^2 = 0. \qquad [a = 1]
\]
These are second-order equations for a general geodesic curve $\{t(\tau), x(\tau)\}$.

We now use the property $g(\dot\gamma, \dot\gamma) = \mathrm{const}$ to simplify further calculations. To be definite, let us look for timelike geodesics normalized by
\[
g(\dot\gamma, \dot\gamma) = e^{2f} \dot t^2 - e^{2h} \dot x^2 = 1.
\]
Presently it is clear that the equation for $x(\tau)$ contains only $\dot t^2$ and functions of $x$ and $\dot x$. Therefore, let us express $\dot t^2$ through $\dot x^2$ and substitute into that equation,
\[
\ddot x + h' \dot x^2 + f' e^{-2h} \left( 1 + e^{2h} \dot x^2 \right) = \ddot x + (h' + f')\, \dot x^2 + f' e^{-2h} = 0.
\]
We obtained a closed second-order equation for $x(\tau)$, and we can proceed to solve this equation by standard methods. Reducing to the first-order equation by a substitution $\dot x = v(x)$, we find
\[
v v' + (h' + f')\, v^2 + f' e^{-2h} = 0,
\]
where the prime stands for $d/dx$. The last equation can be integrated to
\[
\frac{1}{2} v^2 e^{2(h+f)} = \frac{1}{2} C^2 - \frac{1}{2} e^{2f},
\]
where $C^2 > 0$ is an integration constant. Finally, the solution of the geodesic equation for $x(\tau)$ is expressed in the form of an indefinite integral,
\[
\int^{x(\tau)} \frac{dx}{v(x)} = \int^{x(\tau)} \frac{e^{h+f}\, dx}{\sqrt{C^2 - e^{2f}}} = \tau - \tau_0 .
\]
Then the solution for $t(\tau)$ can be found by integrating the equation
\[
\dot t = e^{-f} \sqrt{1 + e^{2h} \dot x^2} = C e^{-2f(x(\tau))} .
\]
In some cases (for some functions $f$ and $h$), these equations can be integrated analytically and the geodesics obtained in closed form.

6.2.2 Determining Killing vectors

A Killing vector field $k$ satisfies $\mathcal{L}_k g = 0$ (see Sec. 1.6.7); a conformal Killing vector satisfies $\mathcal{L}_k g = 2\lambda g$ (see Sec. 3.1.3). Sometimes one would like to determine whether a Killing vector or a conformal Killing vector exists for a given metric. One can perform the necessary calculations in the vielbein formalism.

Since the metric $g$ can be expressed through the vielbein (see Sec. 6.1.1) as
\[
g = \eta_{ab}\, \theta^a \otimes \theta^b ,
\]
while $\eta_{ab}$ is a constant numeric matrix, we have
\[
\mathcal{L}_k g = \eta_{ab} \left[ (\mathcal{L}_k \theta^a) \otimes \theta^b + \theta^a \otimes \mathcal{L}_k \theta^b \right].
\]
The Lie derivatives $\mathcal{L}_k \theta^a$ can be computed easily through the Cartan homotopy formula (1.22),
\[
\mathcal{L}_k \theta^a = d\left( \theta^a \circ k \right) + \iota_k\, d\theta^a .
\]
Then one can derive the Killing equation easily.

Example: Consider a metric
\[
g = e^{2f(t)} dt^2 - e^{2h(t)} dx^2 .
\]
Let us determine whether
\[
k \equiv a(t,x)\, \partial_t + b(t,x)\, \partial_x
\]
can be a Killing vector for $g$, where $a(t,x)$ and $b(t,x)$ are unknown functions.

The dual frame is
\[
\theta^0 = e^{f(t)} dt, \qquad \theta^1 = e^{h(t)} dx .
\]
The differentials are
\[
d\theta^0 = 0, \qquad d\theta^1 = e^{h} \dot h\, dt \wedge dx .
\]


So we compute
\[
\mathcal{L}_k \theta^0 = d\left( \theta^0 \circ k \right) + \iota_k\, d\theta^0 = d\left( e^{f} a \right),
\]
\[
\mathcal{L}_k \theta^1 = d\left( \theta^1 \circ k \right) + \iota_k\, d\theta^1 = d\left( e^{h} b \right) + e^{h} \dot h\, (a\, dx - b\, dt) .
\]
Note that
\[
d(e^{f} a) = \left( e^{f} a \right)_{,t} dt + \left( e^{f} a \right)_{,x} dx ,
\]
and similarly for $d(e^{h} b)$. So the Killing equation, $\mathcal{L}_k g = 0$, becomes
\[
\theta^0 \otimes \mathcal{L}_k \theta^0 + \mathcal{L}_k \theta^0 \otimes \theta^0 - \theta^1 \otimes \mathcal{L}_k \theta^1 - \mathcal{L}_k \theta^1 \otimes \theta^1 = 0 .
\]
This is rewritten (with implied symmetric tensor products) as
\[
\left[ \left( e^{f} a \right)_{,t} dt + \left( e^{f} a \right)_{,x} dx \right] e^{f} dt - \left[ \left( e^{h} b \right)_{,t} dt + \left( e^{h} b \right)_{,x} dx + e^{h} \dot h\, (a\, dx - b\, dt) \right] e^{h} dx = 0 .
\]
Gathering terms with $dt^2$, $dt\, dx$, and $dx^2$, we obtain the equations
\[
\left( e^{f} a \right)_{,t} = 0 ,
\]
\[
\left( e^{f} a \right)_{,x} e^{f} - \left( e^{h} b \right)_{,t} e^{h} + b \dot h\, e^{2h} = 0 ,
\]
\[
\left( e^{h} b \right)_{,x} + a \dot h\, e^{h} = 0 .
\]
These equations can be simplified to
\[
\left( e^{f} a \right)_{,t} = 0 , \qquad a_{,x} = b_{,t}\, e^{2h-2f} , \qquad b_{,x} = -a \dot h .
\]
One solution is $a = 0$, $b = \mathrm{const}$, which is the obvious Killing vector $k = \partial_x$. If $a \neq 0$, we may solve the first equation as $a = q(x)\, e^{-f}$ with $q \neq 0$, differentiate the second equation by $x$, the third equation by $t$, and divide by $q$ to obtain
\[
\frac{q_{,xx}}{q} = -e^{2h-2f} \left( \ddot h - \dot h \dot f \right) .
\]
Since the right-hand side is a function of $t$ while the left-hand side is a function of $x$, solutions exist only if both sides are equal to a constant. Unless the functions $f(t)$, $h(t)$ are such that $e^{2h-2f} ( \ddot h - \dot h \dot f ) = K = \mathrm{const}$, no other Killing vectors exist besides $k = \partial_x$.

6.2.3 Curvature as a set of 2-forms

The connection forms $\omega_{ab}$ contain first derivatives of the metric and are analogous to Christoffel symbols. The Riemann tensor also can be conveniently represented as a set of 2-forms.

Consider the definition of the Riemann tensor,
\[
R(u,v) w \equiv \nabla_u \nabla_v w - \nabla_v \nabla_u w - \nabla_{[u,v]} w ,
\]
and recall that this is a transformation-valued antisymmetric bilinear form, if viewed as a function of the vectors $u$, $v$. For fixed $u$ and $v$, the object $R(u,v)$ is a linear transformation that can be described, as usual, by a matrix of coefficients in the basis $\{e_a\}$. Let us denote this matrix by $R^a{}_b(u,v)$; so we set, by definition,
\[
R^a{}_b(u,v) \equiv \theta^a \circ R(u,v) e_b . \tag{6.29}
\]
Then the transformation $R(u,v)$ acts on the vector $e_b$ as
\[
R(u,v) e_b = e_a R^a{}_b(u,v) .
\]
The 2-forms $R^a{}_b$ completely describe the curvature tensor and are called the curvature 2-forms.

It is often convenient to lower the index of the curvature 2-forms and to consider the array
\[
R_{ab} \equiv \eta_{ac} R^c{}_b .
\]
The relationship between the 2-forms $R_{ab}$ and the covariant Riemann tensor $R(a,b,c,d)$ can then be expressed as
\[
R(x, y, e_a, e_b) = R_{ba}(x,y) = \iota_y \iota_x R_{ba} .
\]
Remark: There is an obvious arbitrariness in the definition of the 2-forms $R_{ab}$: one could interchange the indices $a$ and $b$, writing $R_{ab}$ instead of $R_{ba}$ above. This is the freedom of choosing the overall sign of the Riemann tensor. While not essential, this freedom leads to annoying sign discrepancies between different textbooks.

The curvature 2-forms can be computed directly and efficiently from the connection 1-forms $\omega^c{}_b$ using the formula (6.30) below. To derive that formula, let us begin by expressing $R(u,v) e_b$ with help of Eq. (6.13). We find
\[
\nabla_u \nabla_v e_b = \nabla_u \left( \omega^a{}_b(v)\, e_a \right) = \left( u \circ \omega^a{}_b(v) \right) e_a + \omega^a{}_b(v)\, \nabla_u e_a = \left( u \circ \omega^a{}_b(v) \right) e_a + \omega^a{}_b(v)\, \omega^c{}_a(u)\, e_c ;
\]
\[
\nabla_{[u,v]} e_b = e_a\, \omega^a{}_b([u,v]) .
\]
We can now compute the coefficient at $e_a$ of $R(u,v) e_b$ (while doing that, we need to relabel the index $a \leftrightarrow c$):
\[
\theta^a \circ R(u,v) e_b = u \circ \omega^a{}_b(v) - v \circ \omega^a{}_b(u) - \omega^a{}_b([u,v]) + \omega^c{}_b(v)\, \omega^a{}_c(u) - \omega^c{}_b(u)\, \omega^a{}_c(v)
\]
\[
= (d\omega^a{}_b)(u,v) + \left( \omega^a{}_c \wedge \omega^c{}_b \right)(u,v) ,
\]
where we used the standard definitions of $d\omega^a{}_b$ and $\omega^a{}_c \wedge \omega^c{}_b$ for 1-forms $\omega^a{}_b$. The result is called the second Cartan structure equation,
\[
R^a{}_b = d\omega^a{}_b + \omega^a{}_c \wedge \omega^c{}_b . \tag{6.30}
\]
Remarks: The curvature 2-forms $R_{ab}$ (with the first index lowered through $\eta_{ac}$) are obviously antisymmetric in $a, b$ because $\omega_{ab}$ are.

The Cartan structure equations make computations of the curvature tensor significantly shorter. The simplification is due to several factors: firstly, the connection forms $\omega_{ab}$, being antisymmetric in $a, b$, have fewer nonzero coefficients than the Christoffel symbols $\Gamma^{\lambda}_{\mu\nu}$, which are symmetric in $\mu, \nu$. Secondly, many terms are cancelled automatically during the calculation of exterior differential and exterior product in Eq. (6.30).

Calculation: Compute the connection and the curvature forms for the two-dimensional metric $g = e^{2f} dt^2 - e^{2h} dx^2$, where $f(t,x)$ and $h(t,x)$ are arbitrary functions of $t$ and $x$. Compute the curvature 2-form of the two-dimensional de Sitter metric by setting $f(t,x) = 0$, $h(t,x) = Ht$.


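Solution: The dual frame is
\[
\theta^0 = e^{f} dt, \qquad \theta^1 = e^{h} dx .
\]
The differentials $d\theta^a$ are
\[
d\theta^0 = df \wedge e^{f} dt = f_{,x} e^{f} dx \wedge dt, \qquad d\theta^1 = h_{,t} e^{h} dt \wedge dx .
\]
The only nonzero connection forms are $\omega_{01}$ and $\omega_{10}$, so after some tries we get
\[
\omega^0{}_1 = \omega_{01} = \omega^1{}_0 = -\omega_{10} = f_{,x} e^{f-h} dt + h_{,t} e^{h-f} dx .
\]
The only nonzero curvature 2-forms are $R^0{}_1 = R^1{}_0$ and are found as
\[
R^0{}_1 = d\omega^0{}_1 + \sum_c \omega^0{}_c \wedge \omega^c{}_1 ;
\]
however, the sum over $c$ falls out because $\omega^0{}_0 = \omega^1{}_1 = 0$. Thus the only nontrivial component of the curvature 2-forms is
\[
R^0{}_1 = d\omega^0{}_1 = \left( f_{,x} e^{f-h} \right)_{,x} dx \wedge dt + \left( h_{,t} e^{h-f} \right)_{,t} dt \wedge dx = \left[ \left( h_{,t} e^{h-f} \right)_{,t} - \left( f_{,x} e^{f-h} \right)_{,x} \right] dt \wedge dx .
\]
Note that the calculation is significantly simplified due to the use of forms.

For the de Sitter case, we find
\[
R^0{}_1 = H^2 e^{Ht} dt \wedge dx .
\]

6.2.4 Ricci tensor and Ricci scalar

The Ricci tensor Ric is defined as the trace of the Riemann tensor,
\[
\mathrm{Ric}(a, b) \equiv \mathrm{Tr}_{(x,y)} R(a, x, b, y) .
\]
In the tetrad formulation, the first two arguments of $R(\ldots)$ are the implicit arguments of the 2-forms $R^a{}_b$, so by Eq. (6.29) we have
\[
R(a, x, e_b, y) \equiv g\left( R(a,x) e_b, y \right) = g(e_c, y)\, R^c{}_b(a, x) .
\]
We can use the basis $\{e_a\}$ to compute the trace:
\[
\mathrm{Ric}(a, e_b) = \mathrm{Tr}_{(x,y)} R(a, x, e_b, y) = R(a, e_c, e_b, e^c) = R^c{}_b(a, e_c) .
\]
Therefore, the Ricci tensor is adequately described by the set of 1-forms
\[
\mathrm{Ric}_b \equiv -\iota_{e_c} R^c{}_b = \iota_{e^c} R_{bc} ; \qquad \mathrm{Ric}(x, y) = \theta^b(y)\, \iota_x \mathrm{Ric}_b = \iota_x\, g(e^b, y)\, \mathrm{Ric}_b .
\]
Here it is important to recall that $e^b$ differs from $e_b$ and is defined by $e^b \equiv \eta^{ab} e_a$.

Similarly, the Ricci scalar is computed as the trace
\[
R = \mathrm{Tr}_{(x,y)} \mathrm{Ric}(x, y) = \iota_{e^b} \mathrm{Ric}_b = R^{ab}(e_b, e_a) . \tag{6.31}
\]
These equations offer an efficient method for practical calculations with the Ricci tensor and scalar.

Calculation 6.2.4.1: Using the tetrad formalism, we will now compute the change in the Ricci tensor and Ricci scalar due to a conformal transformation of the metric.

Let us denote by $\{\theta^a\}$ the dual basis in the metric $g$. The corresponding dual basis in the metric $\tilde g \equiv e^{2\lambda} g$ is $\{ e^{\lambda} \theta^a \}$. Suppose that $\omega^a{}_b$ are the connection 1-forms for the basis $\{\theta^a\}$. We now need to compute the modified connection forms $\tilde\omega^a{}_b$. After that, the modified Riemann and Ricci tensors will be determined.

The connection forms $\tilde\omega^a{}_b$ are found using the first Cartan structure equation (6.16):
\[
d\theta^c = -\omega^c{}_b \wedge \theta^b , \qquad d\left( e^{\lambda} \theta^c \right) = -\tilde\omega^c{}_b \wedge e^{\lambda} \theta^b .
\]
It follows that
\[
(d\lambda) \wedge \theta^c = -\left( \tilde\omega^c{}_b - \omega^c{}_b \right) \wedge \theta^b .
\]
We can now use Eq. (6.23) to express the 1-forms
\[
\Delta^c{}_b \equiv \tilde\omega^c{}_b - \omega^c{}_b
\]
explicitly through $\{\theta^a\}$ and $d\lambda$:
\[
\Delta_{ab} = \iota_{e_b} (d\lambda \wedge \theta_a) - \iota_{e_a} (d\lambda \wedge \theta_b) - \frac{1}{2} \iota_{e_b} \iota_{e_a} \left( \theta_c \wedge d\lambda \wedge \theta^c \right) .
\]
The last term vanishes since $\theta_c \wedge \theta^c = 0$, and after a simplification,
\[
\iota_{e_b} (d\lambda \wedge \theta_a) = (\iota_{e_b} d\lambda)\, \theta_a - (d\lambda)\, \eta_{ab} = (e_b \circ \lambda)\, \theta_a - \eta_{ab}\, d\lambda ,
\]
we find
\[
\Delta_{ab} = (e_b \circ \lambda)\, \theta_a - (e_a \circ \lambda)\, \theta_b . \tag{6.32}
\]
The Riemann tensor is expressed through $\omega_{ab}$ via the second Cartan structure equation (6.30). Hence, the curvature 2-forms $\tilde R_{ab}$ for the metric $\tilde g$ are related to the 2-forms $R_{ab}$ for the metric $g$ by
\[
\tilde R_{ab} = R_{ab} + d\Delta_{ab} + \omega_{ac} \wedge \Delta^c{}_b + \Delta_{ac} \wedge \omega^c{}_b + \Delta_{ac} \wedge \Delta^c{}_b .
\]
Now we need to substitute $\Delta_{ab}$ from Eq. (6.32). The individual terms can be rewritten as follows,
\[
d\Delta_{ab} = \left( d (e_b \circ \lambda) \right) \wedge \theta_a + (e_b \circ \lambda)\, d\theta_a - [a \leftrightarrow b] ,
\]
where I indicated by $[a \leftrightarrow b]$ the repetition of previous terms with the indices $a$ and $b$ interchanged. Further, we use the first Cartan structure equation to simplify
\[
\Delta_{ac} \wedge \omega^c{}_b = (e_c \circ \lambda)\, \theta_a \wedge \omega^c{}_b - (e_a \circ \lambda)\, \theta_c \wedge \omega^c{}_b = (e_c \circ \lambda)\, \theta_a \wedge \omega^c{}_b + (e_a \circ \lambda)\, d\theta_b .
\]
Using the identities
\[
(e_c \circ \lambda)\, \theta^c = \theta^c\, \iota_{e_c} (d\lambda) = d\lambda ,
\]
\[
(e_c \circ \lambda)(e^c \circ \lambda) = \mathrm{Tr}_{(a,b)} (\iota_a d\lambda)(\iota_b d\lambda) = g^{-1}(d\lambda, d\lambda) ,
\]
\[
\theta_c \wedge \theta^c = 0 ,
\]
(the first line above follows from Statement 6.1.6.2), we find
\[
\Delta_{ac} \wedge \Delta^c{}_b = \left( (e_c \circ \lambda)\, \theta_a - (e_a \circ \lambda)\, \theta_c \right) \wedge \left( (e_b \circ \lambda)\, \theta^c - (e^c \circ \lambda)\, \theta_b \right)
\]
\[
= \left[ (e_b \circ \lambda)\, \theta_a - (e_a \circ \lambda)\, \theta_b \right] \wedge d\lambda - g^{-1}(d\lambda, d\lambda)\, \theta_a \wedge \theta_b = \Delta_{ab} \wedge d\lambda - g^{-1}(d\lambda, d\lambda)\, \theta_a \wedge \theta_b .
\]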


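Putting the terms together and simplifying, we obtain
\[
\tilde R_{ab} = R_{ab} + \left( d (e_b \circ \lambda) \right) \wedge \theta_a - \left( d (e_a \circ \lambda) \right) \wedge \theta_b + (e_c \circ \lambda) \left[ \theta_a \wedge \omega^c{}_b - \theta_b \wedge \omega^c{}_a \right] + \Delta_{ab} \wedge d\lambda - g^{-1}(d\lambda, d\lambda)\, \theta_a \wedge \theta_b , \tag{6.33}
\]
where $\Delta_{ab} \equiv \tilde\omega_{ab} - \omega_{ab}$ is the difference of the connection forms given by Eq. (6.32).

At this point we note that the terms $d (e_c \circ \lambda)$ involve second derivatives of $\lambda$. It is useful to introduce explicitly the tensor consisting of the second covariant derivatives of $\lambda$. This tensor is called the Hessian of the function $\lambda$ and denoted $H_\lambda$. The Hessian is a symmetric bilinear form defined by the equivalent formulas
\[
H_\lambda(a, b) \equiv \iota_a \left( \nabla_b\, d\lambda \right) = a \circ (b \circ \lambda) - (\nabla_a b) \circ \lambda .
\]
It follows from the definition that $H_\lambda(a, b)$ contains only derivatives of $\lambda$ and of the metric, but no derivatives of $a$ or $b$.

Since $\lambda$ is a given function, we may assume that the Hessian $H_\lambda$ is a known tensor. We can now express the terms $d (e_b \circ \lambda)$ through the Hessian. First, we use the Leibnitz rule for the derivative $\nabla_a$ of a 1-form $d\lambda$ applied to a vector $b$,
\[
\nabla_a \left[ (d\lambda) \circ b \right] = (\nabla_a d\lambda) \circ b + (d\lambda) \circ \nabla_a b = H_\lambda(a, b) + (\nabla_a b) \circ \lambda .
\]
Substituting the basis vectors $a = e_a$ and $b = e_b$, we find
\[
e_a \circ \left[ (d\lambda) \circ e_b \right] = \iota_{e_a} d (e_b \circ \lambda) = H_\lambda(e_a, e_b) + (\nabla_{e_a} e_b) \circ \lambda .
\]
Using Eq. (6.13), we can express $\nabla_{e_a} e_b$ through the connection 1-form $\omega^a{}_b$ and obtain
\[
\iota_{e_a} d (e_b \circ \lambda) = \iota_{e_a} \iota_{e_b} H_\lambda + \sum_c \left( \iota_{e_a} \omega^c{}_b \right) (e_c \circ \lambda) .
\]
By dropping the insertion $\iota_{e_a}$ of the arbitrary basis vector $e_a$, we can rewrite the above as an equality of 1-forms,
\[
d (e_b \circ \lambda) = \iota_{e_b} H_\lambda + \omega^c{}_b\, (e_c \circ \lambda) .
\]
This relationship is now used to simplify Eq. (6.33) further,
\[
\tilde R_{ab} = R_{ab} + (\iota_{e_b} H_\lambda) \wedge \theta_a - (\iota_{e_a} H_\lambda) \wedge \theta_b + \Delta_{ab} \wedge d\lambda - g^{-1}(d\lambda, d\lambda)\, \theta_a \wedge \theta_b . \tag{6.34}
\]
Using Eq. (6.34), we can now compute the Ricci tensor,
\[
\widetilde{\mathrm{Ric}}_b = \iota_{\tilde e^a} \tilde R_{ba} = e^{-\lambda}\, \iota_{e^a} \tilde R_{ba} .
\]
The following auxiliary calculations are helpful,
\[
\iota_{e^a} (\theta_a \wedge \theta_b) = (N - 1)\, \theta_b ;
\]
\[
\iota_{e^a} \Delta_{ab} = \iota_{e^a} \left[ (e_b \circ \lambda)\, \theta_a - (e_a \circ \lambda)\, \theta_b \right] = (N - 1)\, (e_b \circ \lambda) ;
\]
\[
(e^a \circ \lambda)\, \Delta_{ab} = (e^a \circ \lambda) \left( (e_b \circ \lambda)\, \theta_a - (e_a \circ \lambda)\, \theta_b \right) = (e_b \circ \lambda)\, d\lambda - g^{-1}(d\lambda, d\lambda)\, \theta_b ;
\]
where $N$ is the dimension of the spacetime,
\[
N = \iota_{e^a} \theta_a .
\]
Then we find
\[
e^{\lambda}\, \widetilde{\mathrm{Ric}}_b = \iota_{e^a} \tilde R_{ba} = \mathrm{Ric}_b + \iota_{e^a} \left[ (\iota_{e_a} H_\lambda) \wedge \theta_b - (\iota_{e_b} H_\lambda) \wedge \theta_a \right] + \iota_{e^a} \left[ \Delta_{ba} \wedge d\lambda - g^{-1}(d\lambda, d\lambda)\, \theta_b \wedge \theta_a \right]
\]
\[
= \mathrm{Ric}_b + (N - 2)\, \iota_{e_b} H_\lambda + H_\lambda(e_a, e^a)\, \theta_b - (N - 2)\, (e_b \circ \lambda)\, d\lambda + (N - 2)\, g^{-1}(d\lambda, d\lambda)\, \theta_b .
\]
The trace of the Hessian $H_\lambda$ appearing in the last expression is traditionally called the (covariant) D'Alembertian of $\lambda$ and denoted by $\Box\lambda$,
\[
\Box\lambda \equiv \mathrm{Tr}_{(a,b)} H_\lambda(a, b) = H_\lambda(e_a, e^a) .
\]
The Ricci scalar is given by
\[
\tilde R = \iota_{\tilde e^b} \widetilde{\mathrm{Ric}}_b = e^{-\lambda}\, \iota_{e^b} \widetilde{\mathrm{Ric}}_b ,
\]
so we finally obtain
\[
e^{2\lambda} \tilde R = \iota_{e^b} \left( e^{\lambda}\, \widetilde{\mathrm{Ric}}_b \right) = R + 2 (N - 1)\, \Box\lambda + (N - 1)(N - 2)\, g^{-1}(d\lambda, d\lambda) .
\]
The formulas for $\widetilde{\mathrm{Ric}}$ and $\tilde R$ coincide with those obtained in Calculation 1.8.4.1. The calculation in the tetrad formalism is somewhat shorter and less tedious than the calculation in vector notation (see details on page 169). The trade-off is that one needs to introduce and use new techniques, such as connection 1-forms, the exterior product, and the insertion operator.

6.2.5 Einstein-Hilbert action in tetrads

The dynamics of General Relativity is described by Einstein equations, which are derived from the action principle. In Section 5.1.2 the action was defined as a functional of the metric tensor $g$. The metric was treated as a dynamical variable that satisfies certain equations of motion.

An alternative approach is to regard the tetrad of 1-forms $\{\theta^a\}$ as the primary dynamical variable. The metric tensor can be expressed through $\{\theta^a\}$ as $g = \sum_a \theta^a \otimes \theta_a$. However, it is also useful to work directly with the tetrad, without using the metric tensor. In this section I will explain how the Einstein-Hilbert action is expressed and the Einstein equations are derived in the tetrad formalism.$^5$

$^5$ I learned some relevant details regarding this derivation from the course notes [15], lecture 15.

The standard way of expressing the Einstein-Hilbert action in local coordinates is
\[
S_{EH} = \frac{1}{16\pi G} \int R \sqrt{-g}\, d^4 x ,
\]
where $R$ is the Ricci scalar and $G$ is Newton's constant. This formula can be rewritten with help of the volume 4-form,
\[
S_{EH} = \frac{1}{16\pi G} \int_{\mathcal{M}} R\, \mathrm{Vol} .
\]
Now we would like to express the 4-form $R\, \mathrm{Vol}$ in terms of the tetrad $\{\theta^a\}$. This can be done using the following trick. We note that $R$ is a double trace of the Riemann tensor, and that a multiple trace can be seen in Eq. (6.7). Therefore, it will be possible to express the 4-form $R\, \mathrm{Vol}$ directly through the curvature 2-forms. Using Eq. (6.31) and Statement 6.1.3.2(d), we find
\[
R\, \mathrm{Vol} = R^{ab}(e_b, e_a)\, \mathrm{Vol} = R^{ab} \wedge {*}(\theta_b \wedge \theta_a) .
\]
Using the relationship (see Sec. 6.1.3)
\[
{*}(\theta_a \wedge \theta_b) = \frac{1}{2!}\, \varepsilon_{abcd}\, \theta^c \wedge \theta^d ,
\]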


we obtain
\[
R\, \mathrm{Vol} = -\frac{1}{2}\, \varepsilon_{abcd}\, R^{ab} \wedge \theta^c \wedge \theta^d .
\]
Hence, the Einstein-Hilbert action can be rewritten as
\[
S_{EH} = \frac{1}{16\pi G} \int_{\mathcal{M}} R^{ab} \wedge {*}(\theta_b \wedge \theta_a) = -\frac{1}{32\pi G} \int_{\mathcal{M}} \varepsilon_{abcd}\, R^{ab} \wedge \theta^c \wedge \theta^d .
\]

6.3 Connections on vector bundles

6.3.1 Vector bundles as generalization of tangent bundles

The tangent bundle to a manifold can be viewed as a collection of vector spaces $T_p\mathcal{M}$ attached to each point $p$ of the manifold $\mathcal{M}$. These vector spaces are attached to points in a special way, so that the relation between tangent spaces $T_p\mathcal{M}$ and $T_{p'}\mathcal{M}$ for neighbor points $p$, $p'$ can be nontrivial.

This construction is generalized by replacing the tangent space $T_p\mathcal{M}$ by a different, perhaps more complicated, vector space $V$. The result is the concept of a vector bundle, which is the union of copies of vector spaces $V(p)$ attached to each point $p$ of a manifold $\mathcal{M}$. All the vector spaces $V(p)$ should have the same dimension. The manifold $\mathcal{M}$ is then called the base manifold and the vector space $V(p)$ is called the fiber of the bundle at point $p$. A function on $\mathcal{M}$ that assigns to each point $p$ a value in the fiber $V(p)$ is called a section of the bundle. For example, sections of the tangent bundle $T\mathcal{M}$ are vector fields $v(p)$ since the value of a vector field at a point $p$ is a vector from the space $T_p\mathcal{M}$. Sections of more general vector bundles can be visualized as "$V$-valued functions" $f(p)$ on the manifold, such that the value of $f(p)$ belongs to the fiber space $V(p)$.

The reason we need to consider more general vector bundles than $T\mathcal{M}$ is that it is convenient to represent physical fields in (classical) field theory as sections of vector bundles. Fields are usually vector-valued (or tensor-valued) functions of spacetime, e.g. a Dirac spinor field $\psi$ represents particles of spin $\frac{1}{2}$ and has values in a four-dimensional complex vector space. When we represent a field by a section of a vector bundle, the spacetime is the base manifold $\mathcal{M}$ and the value of the field at a point belongs to some vector space which is the fiber at that point. For example, spinor fields $\psi(p)$ can be thought of as sections of a spinor bundle with fibers $\mathbb{C}^4$; tensor fields $A^{\mu\nu}(p)$ are sections of a vector bundle with the fibers $T_p\mathcal{M} \otimes T_p\mathcal{M}$. Vector bundles give us a mathematical construction that describes all the physical fields in a unified language.

6.3.2 Examples of bundles

A direct product $\mathcal{M} \times V$, where $\mathcal{M}$ is any manifold and $V$ is any vector space, is obviously a vector bundle; such bundles are called trivial. Many of the vector bundles used in physics are trivial, but sometimes the bundles turn out to be nontrivial, i.e. not isomorphic to a direct product of a base and a fiber.

For example, consider the tangent bundle $TS^2$ to a sphere $S^2$. This bundle has the fiber $\mathbb{R}^2$ and the base $S^2$. If the bundle $TS^2$ were trivial, one would be able to find a trivialization of $TS^2$, i.e. a smooth, one-to-one map $\tau : S^2 \times \mathbb{R}^2 \to TS^2$ which preserves the linear structure in the fibers. The existence of such a map would imply that the image of a fixed nonzero vector from $\mathbb{R}^2$, say the basis vector $e_1$ with components $(1, 0)$, is a smooth and everywhere nonzero vector field $\tau(p; e_1)$ on $S^2$. However, such a vector field does not exist, as shown by the following statement, which is a standard theorem of topology ("one cannot comb a sphere").

Statement 6.3.2.1: There is no smooth, everywhere nonzero tangent vector field to a sphere $S^2$. (Proof on this page.)

Proof of Statement 6.3.2.1: Consider a stereographic projection of the sphere $S^2$ onto a plane. The sphere is realized as the set $\{ (x, y, z) : x^2 + y^2 + z^2 = R^2 \}$ in $\mathbb{R}^3$, and the projection is described explicitly by
\[
(x, y, z) \to \zeta \equiv \frac{x + i y}{R - z} ,
\]
where the plane is parametrized by a single complex coordinate $\zeta$. It is clear that the north pole $(0, 0, R)$ is mapped onto infinity,
\[
\lim_{x, y \to 0} \frac{x + i y}{R - \sqrt{R^2 - x^2 - y^2}} = \infty ,
\]
while the south pole $(0, 0, -R)$ is mapped onto $\zeta = 0$ (the center of the plane). Suppose we have a smooth and everywhere nonzero vector field $v$ on the sphere; then the image of $v$ under the projection is a smooth, everywhere nonzero vector field $\tilde v$ on the plane. By the smoothness assumption, $v$ is almost constant and nonzero in a sufficiently small neighborhood of the north pole. We may trace the direction of $v$ around a small closed contour $\gamma$ surrounding the pole, and compute the accumulated angle swept by $v$. This accumulated angle will be zero since $v$ is almost constant. However, after the stereographic projection the small contour $\gamma$ will become a large closed contour $\gamma_1$ on the plane. Since the interior of $\gamma$ on the sphere will be mapped to the exterior of $\gamma_1$ on the plane, the direction of $v$ is mirrored and the angle swept by $\tilde v$ when traced around $\gamma_1$ will be $4\pi$. However, the contour $\gamma_1$ can be continuously deformed into a small contour $\gamma_2$ surrounding the center $\zeta = 0$ of the plane. The angle swept by $\tilde v$ around $\gamma_2$ is zero since $\tilde v$ must be approximately constant in a sufficiently small neighborhood of $\zeta = 0$. However, the angle swept by a vector field around a closed contour is always an integer multiple of $2\pi$ and hence must remain constant if the contour is deformed continuously and if $\tilde v$ is smooth and everywhere nonzero. Thus the angle swept by $\tilde v$ around $\gamma_1$ is zero and cannot be equal to $4\pi$. This gives a contradiction. Therefore, either the vector field $v$ is not smooth around the poles (invalidating the first step of the proof), or $\tilde v$ goes to zero or is non-smooth at another point, which makes the accumulated angle jump discontinuously from $4\pi$ to $0$ during the deformation of the contours $\gamma_1 \to \gamma_2$.

So it follows from Statement 6.3.2.1 that the tangent bundle $TS^2$ is nontrivial. However, the tangent bundle of a 3-sphere, $TS^3$, is trivial. To show this, we first note that the sphere $S^3$ is the same manifold as the special unitary group $SU(2)$ (see the discussion in Sec. 1.2.2). Tangent vectors on $SU(2)$ correspond to infinitesimal unitary transformations, which are always of the form $1 + h$, where $h$ is a small anti-Hermitian matrix, $h = -h^{\dagger}$. The space of all anti-Hermitian matrices is isomorphic to $\mathbb{R}^3$. A trivialization that maps $TS^3$ into $S^3 \times \mathbb{R}^3$ can be described explicitly as follows.$^6$

$^6$ It was proved in differential topology [4] that the only spheres with trivial tangent bundles are $S^1$, $S^3$, and $S^7$. Moreover, no even-dimensional sphere $S^{2n}$ admits a continuous and everywhere nonzero tangent vector field [30]. (This information is not actually used in GR.)

A pair $(A, h) \in SU(2) \times \mathbb{R}^3$,


where both are represented by $2 \times 2$ matrices, is mapped into the tangent vector
\[
v \circ f(g) \equiv \left. \frac{\partial}{\partial s} \right|_{s=0} f\left( A (1 + s h) A^{-1} \right) ,
\]
where $f(A)$ is a function on $\mathcal{M} \equiv SU(2)$, $A \in SU(2)$, $h$ is a $2 \times 2$ anti-Hermitian matrix, and $1 + s h$ is a $2 \times 2$ matrix representing an infinitesimal unitary transformation for small $s$.

6.3.3 Covariant derivatives on vector bundles

A covariant derivative $\nabla$ on a vector bundle is a generalization of directional derivative, so that the derivative of a section along a vector field $u$ is another section. We have seen that the covariant derivative acts on various tensors in different ways (the covariant derivatives of a vector and a tensor are given by different formulae). Using the picture of bundles, we say that tensors of different ranks are sections of different bundles, and it is natural that different bundles have different covariant derivatives. (Up to now, we have been using the symbol $\nabla$ to mean different connections, depending on the rank of tensor it acts on.)

A general way to write the covariant derivative on a vector bundle is
\[
\nabla_u t = \partial_u t + \Gamma(u)\, t ,
\]
where $\partial$ is the "partial derivative" connection defined using a particular local coordinate system on the base manifold, i.e. $\partial_u \equiv u^{\mu} \partial_{\mu}$, and $\Gamma$ is a transformation-valued 1-form. The value of $\Gamma$ is a linear transformation within a fiber, and $\Gamma$ is called the connection 1-form.

6.3.4 Gauge theories and associated bundles

Gauge field theories (such as electrodynamics, electroweak theory, and chromodynamics) use vector bundles with an additional structure. Namely, a certain Lie group $G$ is acting in each fiber, such that the equations of the theory are invariant under local transformations of the group. For example, the gauge group $G$ is $U(1)$ for electromagnetism, $SU(2)$ for the electroweak theory, and $SU(3)$ for chromodynamics. The group $G$ is then called the gauge group and the vector bundle is called associated to this group (this is not a rigorous definition but is intended only as a qualitative hint). For an associated bundle, the connection 1-form $\Gamma$ has values in a representation of the Lie algebra $\mathfrak{g}$ of the group $G$. This of course limits the possible connections, leaving only the physically relevant ones. Why not give a more rigorous definition?

6.3.5 Tangent bundle as associated bundle

Let us try to see if the tangent bundle $T\mathcal{M}$ can be also viewed as a bundle on which a gauge group acts. The gauge group in that case would be the orthogonal group $SO(n)$ that acts fiberwise, preserving the metric $g$ in each fiber. (For a metric with the Lorentzian signature, the group will be $SO(3, 1)$, called also the Lorentz group.) The connection 1-form $\Gamma$ is then required to give a transformation which is one of the transformations from the representation of the Lie algebra $so(n)$, i.e. for any vector $v$ the transformation $\Gamma(v)$ must be an antisymmetric linear transformation such that
\[
g(\Gamma(v) x, y) + g(x, \Gamma(v) y) = 0 .
\]
It is easy to see that this condition coincides with the condition (1.39) expressing the compatibility of the connection $\nabla = \partial + \Gamma$ with the metric. Therefore the statement that "the connection $\nabla$ on the tangent bundle comes from the orthogonal gauge group" is the same as the assumption of the compatibility of $\nabla$ with the metric. This is just one example of how physical assumptions can be expressed in the language of gauge groups. This explanation needs to be made more clear. What exactly is invariant under a gauge transformation? Is $\nabla$ invariant?? (It isn't.) In fact, the above equation $g(\Gamma(v) x, y) + g(x, \Gamma(v) y) = 0$ seems to be incorrect!!!
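The purely algebraic part of the condition above can be tested numerically. The following is a minimal sketch, assuming an orthonormal frame with the fiducial Minkowski metric of signature (+,-,-,-); the helper names (`g`, `apply`, `generator_from_antisymmetric`, `is_metric_compatible`) and the sample coefficients are hypothetical and chosen only for illustration. It checks that a transformation whose matrix has antisymmetric lowered-index components satisfies the condition, while a symmetric one does not. It says nothing about the coordinate-dependent issues raised in the text.

```python
import random

# Fiducial Minkowski metric eta_ab, signature (+,-,-,-) (assumed orthonormal frame).
ETA = [[1, 0, 0, 0], [0, -1, 0, 0], [0, 0, -1, 0], [0, 0, 0, -1]]


def g(x, y):
    """Scalar product g(x, y) = eta_ab x^a y^b."""
    return sum(ETA[a][b] * x[a] * y[b] for a in range(4) for b in range(4))


def apply(gamma, x):
    """Apply the linear transformation with matrix gamma^a_b to the vector x."""
    return [sum(gamma[a][b] * x[b] for b in range(4)) for a in range(4)]


def generator_from_antisymmetric(w):
    """Raise the first index of an array w_ab: gamma^a_b = eta^{ac} w_cb.

    eta is diagonal with entries +1 or -1, so eta^{aa} equals eta_aa numerically.
    """
    return [[ETA[a][a] * w[a][b] for b in range(4)] for a in range(4)]


def is_metric_compatible(gamma, trials=20, tol=1e-12):
    """Check g(gamma x, y) + g(x, gamma y) = 0 on random test vectors."""
    rng = random.Random(0)  # fixed seed so that the check is reproducible
    for _ in range(trials):
        x = [rng.uniform(-1, 1) for _ in range(4)]
        y = [rng.uniform(-1, 1) for _ in range(4)]
        if abs(g(apply(gamma, x), y) + g(x, apply(gamma, y))) > tol:
            return False
    return True


# An antisymmetric w_ab mixing a boost (01) and a rotation (12).
w = [[0.0] * 4 for _ in range(4)]
w[0][1], w[1][0] = 0.7, -0.7
w[1][2], w[2][1] = -0.3, 0.3
LORENTZ_GENERATOR = generator_from_antisymmetric(w)

# A symmetric w_ab, which should violate the condition.
s = [[0.0] * 4 for _ in range(4)]
s[0][1], s[1][0] = 0.7, 0.7
NOT_A_GENERATOR = generator_from_antisymmetric(s)

if __name__ == "__main__":
    print(is_metric_compatible(LORENTZ_GENERATOR))
    print(is_metric_compatible(NOT_A_GENERATOR))
```

The check reduces to the statement that the condition holds exactly when the lowered-index matrix is antisymmetric, which is the defining property of the Lie algebra of the metric-preserving group.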

7 Spinors
Additional literature: [27, 32].

The purpose of this chapter is to explain the definition of spinors and spinor fields, and to show some basic uses of spinors in general relativity.

7.1 Introducing spinors

A spinor is a two-dimensional complex-valued vector, an element of $S$, where $S \cong \mathbb{C}^2$ is an auxiliary space (the space of spinors).$^1$ The necessity to introduce spinors became apparent only after the development of quantum mechanics: it turned out that particles such as electrons or quarks must be described by spinor-valued fields, rather than by vector or tensor fields. Below I shall explain the motivation for introducing spinors in more detail, and now let us start with a heuristic picture.

$^1$ These two-dimensional objects are also called Weyl spinors, in distinction from Dirac spinors that have four complex components and are defined using Weyl spinors. The connection between the two will be explained below, and we say "spinor" instead of "Weyl spinor."

A usual vector is a "directed magnitude," which literally means a direction along a curve leading from a point of the manifold to a neighbor point. Tensors are defined algebraically through vectors. Every tensor remains unchanged under a rotation by $2\pi$ because every point returns to its original place, and tensors are defined through directions between points. Now, a spinor is a special kind of "directed magnitude" that remains unchanged only under a $4\pi$ rotation but changes sign under a $2\pi$ rotation. Clearly, such an object cannot be a vector or a tensor, or any other object defined through points of a manifold, because every point in the neighborhood of a given center of rotation will return to its initial position after a $2\pi$ rotation. Thus, a spinor cannot be defined through the tangent space to a point on a manifold. Spinors must belong to some auxiliary vector space in which the rotation group acts in a special way. To understand how a rotation can act in such an unusual way, we need to study the rotation group in some more detail.

7.1.1 Quaternions and rotations

We consider rotations in a three-dimensional Euclidean space $\mathbb{R}^3$; the group of all orthogonal transformations ($R R^T = 1$) is $O(3)$, and its subgroup $SO(3)$ contains proper rotations, i.e. orthogonal transformations preserving the orientation ($\det R = 1$). Quaternions (invented by W. R. Hamilton) are a generalization of complex numbers where one introduces three imaginary units instead of one. Quaternions are useful in physics primarily because they represent three-dimensional rotations in a concise way. Also, spinors are quaternions, in the sense which will be made precise below. (In brief: The four-dimensional space of quaternions is equivalent to the complex, two-dimensional spinor space.) Let us now define and explore the properties of quaternions.

The set of quaternions $\mathbb{H}$ is a four-dimensional real space spanned by the basis vectors $1, h_1, h_2, h_3$. The special elements $h_j$ called quaternionic units$^2$ obey the multiplication rule
\[
h_1 * h_1 = h_2 * h_2 = h_3 * h_3 = h_1 * h_2 * h_3 = -1 . \tag{7.1}
\]
It follows from the multiplication rule that
\[
h_j * h_k = -\delta_{jk}\, 1 + \sum_{l=1}^{3} \varepsilon_{jkl}\, h_l , \tag{7.2}
\]
where $\delta$ is the Kronecker symbol and $\varepsilon_{jkl}$ is the coordinate representation of the three-dimensional Levi-Civita symbol, $\varepsilon_{123} = 1$. In particular,
\[
h_1 * h_2 = h_3 = -h_2 * h_1 ,
\]
so the quaternionic space $\mathbb{H}$ forms a non-commutative algebra.

$^2$ A more traditional notation for quaternionic units is $\mathbf{i}, \mathbf{j}, \mathbf{k}$ instead of $h_1, h_2, h_3$, but we shall need to mix quaternions with complex numbers with their own imaginary unit $i$.

Statement: Assuming Hamilton's quaternionic multiplication rule (7.1), the associativity and distributivity of quaternionic multiplication, and the property $1 * x = x * 1 = x$ for any quaternion $x$, derive Eq. (7.2).

Solution: Consider the expression $h_1 * h_2 * h_3 * h_3 = -h_1 * h_2 = -h_3$. Similarly, we find $h_2 * h_3 = h_1$. Consider the expression $h_2 * h_1 * h_2 * h_3 = -h_2$; after multiplying by $h_3 * h_2$ on the right, it follows that
\[
h_2 * h_1 = -h_2 * h_3 * h_2 = -h_1 * h_2 = -h_3 .
\]
Consider the expression $h_3 * h_1 * h_2 * h_3 = -h_3$; it follows that $h_3 * h_1 = h_2$. All other required relations are obtained in a similar way.

A general quaternion is $x \equiv x_0 1 + \sum_{j=1}^{3} x_j h_j$, where $x_j$ are real coefficients. It is sometimes useful to visualize a quaternion as a sum of a scalar $x_0$ and a 3-vector $\vec{x} \equiv (x_1, x_2, x_3)$. The conjugate quaternion is defined by
\[
\bar{x} \equiv x_0 1 - \sum_{j=1}^{3} x_j h_j .
\]
A quaternion $x$ is called purely imaginary if $\bar{x} = -x$, which means that the first coefficient vanishes, $x_0 = 0$. The magnitude $|x|$ of a quaternion $x$ is
\[
|x| \equiv \left( x_0^2 + x_1^2 + x_2^2 + x_3^2 \right)^{1/2} .
\]
It is easy to see that $\bar{x} * x = |x|^2\, 1$ and $|x * y| = |x|\, |y|$, similarly to the properties of complex numbers.

Calculation: Verify these properties.

Solution: Using the vector notation, the multiplication rule can be written concisely as
\[
(x_0 1 + \vec{x}) * (y_0 1 + \vec{y}) = (x_0 y_0 - \vec{x} \cdot \vec{y})\, 1 + x_0 \vec{y} + y_0 \vec{x} + \vec{x} \times \vec{y} .
\]
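The vector form of the multiplication rule lends itself to a direct numerical check of Eqs. (7.1) and (7.2) and of the properties of the conjugate and the magnitude. The following is a minimal sketch in Python; the class `Quaternion` and the helper names (`levi_civita`, `rule_7_1_holds`, `rule_7_2_holds`) are hypothetical and introduced only for this illustration.

```python
from math import isclose, sqrt


class Quaternion:
    """A quaternion x0*1 + x1*h1 + x2*h2 + x3*h3 with real coefficients."""

    def __init__(self, x0, x1, x2, x3):
        self.coeffs = (float(x0), float(x1), float(x2), float(x3))

    def __mul__(self, other):
        # (x0 + x)(y0 + y) = (x0*y0 - x.y) + x0*y + y0*x + x cross y
        x0, x = self.coeffs[0], self.coeffs[1:]
        y0, y = other.coeffs[0], other.coeffs[1:]
        dot = sum(a * b for a, b in zip(x, y))
        cross = (x[1] * y[2] - x[2] * y[1],
                 x[2] * y[0] - x[0] * y[2],
                 x[0] * y[1] - x[1] * y[0])
        vec = [x0 * y[i] + y0 * x[i] + cross[i] for i in range(3)]
        return Quaternion(x0 * y0 - dot, *vec)

    def conj(self):
        """Conjugate quaternion: the sign of the vector part is reversed."""
        x0, x1, x2, x3 = self.coeffs
        return Quaternion(x0, -x1, -x2, -x3)

    def norm(self):
        """Magnitude |x| = sqrt(x0^2 + x1^2 + x2^2 + x3^2)."""
        return sqrt(sum(c * c for c in self.coeffs))

    def close_to(self, other, tol=1e-9):
        """Componentwise comparison up to a small tolerance."""
        return all(isclose(a, b, abs_tol=tol)
                   for a, b in zip(self.coeffs, other.coeffs))


MINUS_ONE = Quaternion(-1, 0, 0, 0)
H = {1: Quaternion(0, 1, 0, 0),
     2: Quaternion(0, 0, 1, 0),
     3: Quaternion(0, 0, 0, 1)}


def levi_civita(j, k, l):
    """Three-dimensional Levi-Civita symbol for indices in {1, 2, 3}."""
    return (j - k) * (k - l) * (l - j) // 2


def rule_7_1_holds():
    """Check h1*h1 = h2*h2 = h3*h3 = h1*h2*h3 = -1."""
    squares = all((H[j] * H[j]).close_to(MINUS_ONE) for j in (1, 2, 3))
    triple = (H[1] * H[2] * H[3]).close_to(MINUS_ONE)
    return squares and triple


def rule_7_2_holds():
    """Check h_j*h_k = -delta_jk 1 + sum_l eps_jkl h_l for all j, k."""
    for j in (1, 2, 3):
        for k in (1, 2, 3):
            delta = 1 if j == k else 0
            rhs = Quaternion(-delta, *(levi_civita(j, k, l) for l in (1, 2, 3)))
            if not (H[j] * H[k]).close_to(rhs):
                return False
    return True


if __name__ == "__main__":
    x = Quaternion(1, 2, 3, 4)
    y = Quaternion(-0.5, 0.25, 2, -1)
    print(rule_7_1_holds(), rule_7_2_holds())
    print(isclose((x * y).norm(), x.norm() * y.norm()))
```

Storing a quaternion as a scalar plus a 3-vector keeps the product a one-line transcription of the vector formula, so the checks of (7.1) and (7.2) test the formula itself rather than a hard-coded multiplication table.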


Since $\bar x = x_0 1 - \vec x$, it follows that $\bar x x = |x|^2 1$. The property $|xy|^2 = |x|^2 |y|^2$ is straightforward to derive, recalling that the vector product $\vec x\times\vec y$ is orthogonal to both $\vec x$ and $\vec y$, and using the formula $|\vec x\times\vec y|^2 = |\vec x|^2 |\vec y|^2 - |\vec x\cdot\vec y|^2$.

An important property is that every nonzero quaternion $x$ has an inverse $x^{-1}$ such that $x x^{-1} = x^{-1} x = 1$.

Calculation: Find the inverse $x^{-1}$ to a general quaternion $x = x_0 1 + \sum_{j=1}^{3} x_j h_j$, assuming $x \neq 0$. Show that the inverse would not necessarily exist if we admitted complex coefficients $x_j$.
Solution: Since $\bar x x = |x|^2 1$, we have
$$x^{-1} = \frac{\bar x}{|x|^2} = \frac{1}{|x|^2}\left(x_0 1 - x_1 h_1 - x_2 h_2 - x_3 h_3\right).$$
The inverse exists when $|x|^2 = \sum_{j=0}^{3} x_j^2 \neq 0$, which is not guaranteed for quaternions with complex coefficients $x_j$.

Another useful property is the exponentiation formula.

Statement: The exponential of a quaternion $x$ is defined by the Taylor series,
$$\exp x = 1 + x + \frac{1}{2!}\, x x + \frac{1}{3!}\, x x x + \dots$$
Show that every nonzero quaternion $x$ is an exponential of some other quaternion $b \equiv \ln x$. Show that $\ln x$ is purely imaginary if $|x| = 1$.
Hint: First compute the exponential of a purely imaginary quaternion $b$, then use the fact that 1 commutes with purely imaginary quaternions.
Solution: If $b = \sum_{j=1}^{3} b_j h_j$ then $bb = -|b|^2 1$, and we find
$$\exp b = 1\cos|b| + \frac{b}{|b|}\sin|b|,$$
$$\exp(b_0 1 + b) = 1\, e^{b_0}\cos|b| + e^{b_0}\frac{b}{|b|}\sin|b|.$$
For an arbitrary $x = x_0 1 + \sum_{j=1}^{3} x_j h_j$, we set $x = \exp(b_0 1 + b)$ and find
$$b_0 = \ln|x|, \qquad b = \frac{x_1 h_1 + x_2 h_2 + x_3 h_3}{\sqrt{x_1^2 + x_2^2 + x_3^2}}\,\arccos\frac{x_0}{|x|}.$$

The quaternionic multiplication law admits a representation in terms of complex $2\times 2$ matrices. We may guess the required representation if we note that the multiplication rule of the Pauli matrices,
$$\sigma_1 = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}, \quad \sigma_2 = \begin{pmatrix} 0 & -i \\ i & 0 \end{pmatrix}, \quad \sigma_3 = \begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix},$$
$$\sigma_j\sigma_k = \delta_{jk} + i\sum_{l=1}^{3}\varepsilon_{jkl}\,\sigma_l,$$
is very similar to the multiplication rule of the quaternionic units $h_j$, except for the extra factors of $i$. Therefore a representation for $h_j$ by $2\times 2$ complex matrices may be found as $\hat h_j = -i\sigma_j$, $j = 1, 2, 3$, assuming (naturally) that the identity matrix $\hat 1$ is used for the quaternion 1. The general quaternion $a = a_0 1 + \sum_{j=1}^{3} a_j h_j$ will be represented by the matrix
$$\hat a = \begin{pmatrix} a_0 - i a_3 & -i a_1 - a_2 \\ -i a_1 + a_2 & a_0 + i a_3 \end{pmatrix}. \qquad (7.3)$$
Note that the determinant of this matrix is equal to $|a|^2$, and that the Hermitian conjugate matrix corresponds to $\bar a$:
$$\det\hat a = |a|^2 = a_0^2 + a_1^2 + a_2^2 + a_3^2; \qquad \hat a^\dagger = \hat{\bar a}.$$

Remark: As an additional motivation, it is helpful to derive a matrix representation of quaternions without guessing the Pauli matrices. The four-dimensional space $\mathbb{H}$ of quaternions can be interpreted as a two-dimensional complex vector space if we somehow define what it means to multiply a quaternion by a complex number,
$$(\lambda + i\mu)\, a = \,???$$
Clearly, it is sufficient to define the multiplication by $i$, and we need to imitate the property $i^2 = -1$. Since any of the unit quaternions $h_j$ satisfies $h_j^2 = -1$, we may choose for instance $h_1$ as the representative of $i$, so then we define
$$(\lambda + i\mu)\, a \equiv \lambda a + \mu\, a h_1.$$
This is clearly a linear map in $\mathbb{H}$ (note that we multiply by $h_1$ from the right). We expect that $\mathbb{H}$, considered as a complex vector space, will be two-dimensional, so any quaternion $a \in \mathbb{H}$ can be represented as a linear combination of two basis quaternions with complex coefficients. We may choose, for instance, $\{1, h_2\}$ as the basis, since $h_1$ is already chosen as the representative of $i$. (Different such choices will yield different but algebraically equivalent representations.) Then an arbitrary quaternion will be decomposed as
$$a \equiv a_0 1 + \sum_{j=1}^{3} a_j h_j = (a_0 + i a_1)\, 1 + (a_2 - i a_3)\, h_2$$
and so can be represented by an array of two complex numbers $(a_0 + i a_1,\ a_2 - i a_3) \in \mathbb{C}^2$. Since the multiplication by a fixed quaternion $x$ is a linear transformation of the vector space $\mathbb{C}^2$, it is natural to represent that transformation by a $2\times 2$ complex matrix. For example, the transformation $a \to h_3 a$ will be equivalent to
$$\begin{pmatrix} a_0 + i a_1 \\ a_2 - i a_3 \end{pmatrix} \to h_3 a = h_3\left(a_0 1 + a_1 h_1 + a_2 h_2 + a_3 h_3\right) = -a_3 1 - a_2 h_1 + a_1 h_2 + a_0 h_3$$
$$= \begin{pmatrix} -a_3 - i a_2 \\ a_1 - i a_0 \end{pmatrix} = \begin{pmatrix} 0 & -i \\ -i & 0 \end{pmatrix}\begin{pmatrix} a_0 + i a_1 \\ a_2 - i a_3 \end{pmatrix},$$
therefore $h_3$ is represented by the matrix $-i\sigma_1$. Similarly, we find $h_1 \to i\sigma_3$ and $h_2 \to -i\sigma_2$. In this way we find a possible representation of quaternions through Pauli matrices. (This representation is equivalent to the standard one, $h_j \to -i\sigma_j$, after a permutation of the unit quaternions.)

The connection between quaternions and rotations is found by using the following important trick. We consider the linear transformation
$$\hat R_c : x \to c\, x\, c^{-1},$$
for a fixed quaternion $c \neq 0$. Note that the transformation $\hat R_c$ preserves the magnitude of quaternions,
$$|\hat R_c(x)| = |x|.$$
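The matrix representation (7.3) and the transformation $x \to c\,x\,c^{-1}$ can be tried out numerically. The sketch below is an illustration under our own conventions: quaternions are 4-tuples, $2\times 2$ matrices are nested lists, and all helper names (`qmul`, `to_matrix`, `rotate`, and so on) are hypothetical.

```python
import math

def qmul(x, y):
    """Quaternion product in components, with h1 h2 = h3 and cyclic."""
    x0, x1, x2, x3 = x
    y0, y1, y2, y3 = y
    return (x0*y0 - x1*y1 - x2*y2 - x3*y3,
            x0*y1 + x1*y0 + x2*y3 - x3*y2,
            x0*y2 + x2*y0 + x3*y1 - x1*y3,
            x0*y3 + x3*y0 + x1*y2 - x2*y1)

def qconj(x):
    return (x[0], -x[1], -x[2], -x[3])

def qabs(x):
    return math.sqrt(sum(c * c for c in x))

def to_matrix(a):
    """Matrix of Eq. (7.3): a0*1 - i*(a1*sigma1 + a2*sigma2 + a3*sigma3)."""
    a0, a1, a2, a3 = a
    return [[a0 - 1j*a3, -1j*a1 - a2],
            [-1j*a1 + a2, a0 + 1j*a3]]

def matmul(p, q):
    return [[sum(p[i][k] * q[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def det(m):
    return m[0][0]*m[1][1] - m[0][1]*m[1][0]

def dagger(m):
    return [[m[j][i].conjugate() for j in range(2)] for i in range(2)]

def close(p, q, tol=1e-12):
    return all(abs(p[i][j] - q[i][j]) < tol for i in range(2) for j in range(2))

def rotate(c, x):
    """R_c(x) = c x c^{-1}, using c^{-1} = conj(c) / |c|^2."""
    n2 = qabs(c) ** 2
    c_inv = tuple(v / n2 for v in qconj(c))
    return qmul(qmul(c, x), c_inv)

# Arbitrary sample quaternions (hypothetical values).
a = (1.0, 2.0, -0.5, 3.0)
b = (-2.0, 0.5, 1.5, 1.0)

print(close(to_matrix(qmul(a, b)), matmul(to_matrix(a), to_matrix(b))))  # products match
print(det(to_matrix(a)), qabs(a) ** 2)                                   # det equals |a|^2
print(close(dagger(to_matrix(a)), to_matrix(qconj(a))))                  # dagger <-> conjugate
print(qabs(rotate(a, b)), qabs(b))                                       # magnitude preserved
```

Nested lists are used instead of a matrix library only to keep the sketch free of dependencies.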


We may normalize $|c| = 1$ without changing $\hat R_c$, then
$$\hat R_c(x) = c\, x\, \bar c \qquad (7.4)$$
and so $\hat R_c(\bar x) = \overline{\hat R_c(x)}$. Thus the three-dimensional subspace of purely imaginary quaternions is invariant under $\hat R_c$. A purely imaginary quaternion $x = -\bar x$, interpreted as a 3-vector, is transformed by $\hat R_c$ without changing its Euclidean length; thus $\hat R_c$ acts as an orthogonal transformation in $\mathbb{R}^3$, i.e. $\hat R_c \in O(3)$. Since every quaternion $c$ such that $|c| = 1$ is an exponential of a purely imaginary $b = \ln c$, we have
$$\hat R_c(x) = \exp(b)\, x\, \exp(-b),$$
and thus we can smoothly deform $\hat R_c$ into the identity map by decreasing $b$ to zero. Since the determinant $\det\hat R = \pm 1$ for $\hat R \in O(3)$, we must have $\det\hat R_c = 1$ for all $c$, and therefore $\hat R_c \in SO(3)$. In this way, normalized quaternions represent three-dimensional rotations, and multiplication of quaternions corresponds to composition of rotations: $\hat R_a\hat R_b = \hat R_{ab}$. Since $\hat R_c$ is quadratic in $c$, heuristically one may say that the quaternion $c$ is a "square root" of the rotation $\hat R_c$.

Statement 7.1.1.1: The magnitude of a quaternion $b = \ln c$ corresponds to a half of the angle of the rotation $\hat R_c$, and the direction of $b$ is the axis of rotation.
Hint: Consider an infinitesimal rotation $\hat R_{\exp b}$ with $|b| \ll 1$, and show that it rotates by the infinitesimal angle $2|b|$ around the axis parallel to $b$. Derivation: Needs to be filled in!

Statement 7.1.1.2: Show that a quaternion $c$ such that $|c| = 1$ is represented by a matrix $\hat c$ which is unitary and has unit determinant.
Hint: Use previous statements and the property $\hat{\bar c} = \hat c^\dagger$.

It is important to note that a quaternion represents only the half-angle of rotation, so a rotation by $2\pi$ may be represented by a nontrivial quaternion $c = -1$ as well as by $c = 1$. Imagine that we use a quaternion to keep track of the orientation of a rigid body; then the quaternion will change sign after a rotation by $2\pi$ and return to the original value after another rotation by $2\pi$. This unusual behavior is the main reason quaternions are useful in the construction of spinors.

Since $\hat R_{-c} = \hat R_c$, every two opposite quaternions correspond to the same three-dimensional rotation. The set of all normalized quaternions $\{c : |c| = 1\}$ is a sphere $S^3$ in the four-dimensional space $\mathbb{H}$. Thus, the set $SO(3)$ is equivalent to a sphere $S^3$ with identified opposite points. We have thus defined a projection map $S^3 \to SO(3)$ acting as $c \to \hat R_c$, and this map is everywhere 2-to-1. Such maps between manifolds are called coverings, so in particular $S^3 \to SO(3)$ is a twofold covering.

Moreover, the sphere $S^3$ is represented as the set of unitary complex matrices with unit determinant via Eq. (7.3); the group of these matrices is denoted by $SU(2)$. Note that this group is at the same time a manifold ($S^3$), and also the multiplication by a fixed matrix $A \in SU(2)$ is a smooth map $S^3 \to S^3$. Groups that are smooth manifolds are called Lie groups.

Note that the multiplication of quaternions corresponds, on the one hand, to the multiplication of matrices, and on the other hand, to multiplication of rotations. Thus the covering $SU(2) \to SO(3)$ is a group homomorphism.

There is no canonical way to define a smooth inverse map $SO(3) \to SU(2)$ because we would need to choose the sign of a quaternion according to how many $2\pi$ rotations were made, and this information cannot be derived from one element of $SO(3)$. If one were given not only an element $\hat R \in SO(3)$ but also a continuous path leading from the identity transformation $\hat 1 \in SO(3)$ to $\hat R$, then one would be able to choose a unique element of $SU(2)$ corresponding to $\hat R$. (Paths in $SO(3)$ corresponding to an odd and an even number of $2\pi$ rotations are topologically inequivalent and cannot be continuously deformed into each other.) On the other hand, we have seen that infinitesimal transformations can be uniquely mapped 1-to-1 between $SO(3)$ and $SU(2)$.

We have seen that a quaternion can be equivalently viewed as a two-dimensional complex vector $\psi \in \mathbb{C}^2$ on which quaternions $c$ act as unitary matrices $\hat c \in SU(2)$. Although the vector $\psi$ does not belong to the real three-dimensional space $\mathbb{R}^3$, we can say that $\psi$ "feels" a rotation $\hat X \in SO(3)$ performed in $\mathbb{R}^3$ because we can find a quaternion $c$ such that $\hat X = \hat R_c$ and transform the vector $\psi$ with the unitary matrix $\hat c$. (Note that unitary matrices preserve the Hermitian scalar product in $\mathbb{C}^2$.) Transformed in this manner, $\psi$ represents an unusual kind of a "directed magnitude" that "rotates only half-way." Thus, the two-dimensional complex vector $\psi$ is in fact a spinor and the auxiliary space $S \equiv \mathbb{C}^2$ is the spinor space. To summarize, spinors are quaternions viewed as elements of $\mathbb{C}^2$.

7.1.2 The Lorentz group

So far, we worked in the Euclidean space $\mathbb{R}^3$ and ignored the time dimension of the spacetime. We have seen that proper rotations $\hat R \in SO(3)$ are represented by matrices acting in a two-dimensional complex space $S$ of spinors, such that a spinor changes sign under a rotation by $2\pi$. We shall now extend the considerations to the 3+1-dimensional case.

Our goal for now is to define spinors at a point. We shall be working in the Minkowski spacetime, which we interpret as the tangent space at a point of a 3+1-dimensional spacetime manifold.

Lorentz transformations are linear transformations of the Minkowski spacetime that preserve the metric $g$.

Statement: Show that any transformation $\hat L$ preserving the metric, $g(\hat L x, \hat L y) = g(x, y)$, must be a linear transformation.
Hint: Consider an image of an orthogonal basis under $\hat L$.
Solution: An orthogonal basis $\{e_j\}$ must be transformed into another orthogonal basis $\{\hat L e_j\}$. Then, we have
$$g(\hat L(x + y), \hat L z) = g(x + y, z) = g(\hat L x, \hat L z) + g(\hat L y, \hat L z).$$
Choosing $z = e_j$ for $j = 0, \dots, 3$, we obtain the required linearity property.

All the Lorentz transformations $\hat L$ form a group $O(3,1)$, with a subgroup $SO(3,1)$ of orientation-preserving transformations ($\det\hat L = 1$). Below we shall always consider only the group $SO(3,1)$ which we shall call the Lorentz group.

Lorentz transformations include proper rotations, $SO(3) \subset SO(3,1)$, and we already know how spinors transform under $SO(3)$. Now we need to investigate how spinors may transform under other elements of the Lorentz group that are not purely rotations. These other elements are the boosts, i.e. transformations between reference frames moving with
respect to one another. (A proper rotation transforms between frames that are at rest with respect to one another.)$^{3}$ We may identify a proper rotation in a geometric way as a Lorentz transformation that has an invariant timelike vector $v_0$ and an invariant spacelike vector $s_0$, such that $g(v_0, s_0) = 0$. A transformation $\hat L$ such that $\hat L v_0 = v_0$, $\hat L s_0 = s_0$, and $g(v_0, s_0) = 0$, is interpreted as a spatial rotation in the rest frame of an observer with 4-velocity $v_0$ around the axis $s_0$. A boost cannot have invariant timelike vectors, but instead it has two invariant spacelike vectors $s_1$ and $s_2$, which are orthogonal to the direction of the boost velocity. Let us describe boosts more explicitly.

3 In the geometric approach adopted in this text, coordinate systems are not used, so Lorentz transformations change vectors rather than coordinate systems. In the coordinate-based approach, such transformations are called active, while transformations that merely change coordinate systems are called passive.

Statement: Show by deriving an explicit formula that there exists a boost $\hat L \in SO(3,1)$ transforming a timelike vector $u$ into another timelike vector $v \neq u$, and leaving invariant two spacelike vectors $s_{1,2}$, orthogonal to $u$.
Solution: The required Lorentz transformation $\hat L$ should act in the 2-plane spanned by $\{u, v\}$. Such a transformation will leave invariant every spacelike vector orthogonal to $u$ and $v$. The subspace $(u, v)^\perp$ is two-dimensional, therefore there exist two spacelike vectors orthogonal to $u$ which will be left invariant. An explicit formula for $\hat L$ can be derived e.g. by determining the coefficients $\lambda_{ij}$ in the general expression for a linear transformation in the $\{u, v\}$ plane,
$$\hat L x = \lambda_{11}\, u\, g(x, u) + \lambda_{12}\, u\, g(x, v) + \lambda_{21}\, v\, g(x, u) + \lambda_{22}\, v\, g(x, v).$$
Assuming $g(u, u) = g(v, v) = 1$, the result is
$$\hat L x = v\, g(x, u) + \frac{\gamma v - u}{\gamma^2 - 1}\, g(x, \gamma u - v),$$
where $\gamma \equiv g(u, v) > 1$ is the boost factor. Note that for $v \to u$, $\gamma \to 1$ the result is well-behaved.

The relationship between boosts and rotations is the following.

Statement 1: Every Lorentz transformation can be represented as a product of a boost and a proper rotation.
Proof: Suppose a Lorentz transformation $\hat L \in SO(3,1)$ brings a timelike vector $u$ into a (timelike) vector $v$. If $v = u$ then $\hat L$ is itself a proper rotation, so we can still say that $\hat L$ is a product of a trivial boost and a rotation. If $v \neq u$, we can find a boost $\hat L_1$ that brings $u$ into $v$. After performing the boost, the 3-plane $u^\perp$ orthogonal to $u$ is brought into the 3-plane $v^\perp$ orthogonal to $v$. Now we look for the remaining transformation $\hat L_2 \in SO(3,1)$ such that $\hat L = \hat L_2\hat L_1$. Since $\hat L_2$ should leave $v$ invariant, we have $\hat L_2(v^\perp) \subset v^\perp$, and thus $\hat L_2$ is a proper rotation.

This statement allows us to understand the topological structure of the group $SO(3,1)$.

Statement 2: The Lorentz group $SO(3,1)$ has the topology $SO(3)\times\mathbb{R}^3$.
Proof: If a boost changes a timelike vector $u$ to $v$ then the spacelike vector
$$\vec v_{\rm rel} \equiv \frac{v}{\gamma} - u,$$
such that $g(u, \vec v_{\rm rel}) = 0$, represents the relative velocity of $v$ in the frame $u$. Note that $\gamma^{-2} = 1 - |\vec v_{\rm rel}|^2$. If we consider the images of $u$ under every possible boost, we find that each 3-vector $\vec v_{\rm rel} \in \mathbb{R}^3$ uniquely corresponds to a boost. Therefore the space of all boosts has the topology $\mathbb{R}^3$. By Statement 1, every $\hat L \in SO(3,1)$ can be decomposed into a product of a boost and a proper rotation. Thus, the manifold $SO(3,1)$ can be mapped one-to-one into the manifold $SO(3)\times\mathbb{R}^3$.

7.1.3 Lorentz transformations of spinors

To determine how spinors transform under a boost, we need to extend the quaternionic picture of proper rotations to boosts. Once each Lorentz transformation is mapped to a quaternion, the representation (7.3) will yield the corresponding spinor transformation.

We can arrive at a quaternionic representation of Lorentz transformations by analogy with Eq. (7.4) if we consider quaternions with complex coefficients. Denote by $\mathbb{H}\otimes\mathbb{C}$ the space of linear combinations $c = c_0 1 + \sum_{j=1}^{3} c_j h_j$, where $c_j$ are complex coefficients. By analogy with Eq. (7.4), let us consider the quaternionic transformation$^{4}$
$$\hat L_c : x \to c\, x\, c^\dagger, \qquad (7.5)$$
where $c$ is a quaternion satisfying
$$|c|^2 \equiv c_0^2 + c_1^2 + c_2^2 + c_3^2 = 1,$$
and the quaternion $c^\dagger$ is at once the quaternionic and the complex conjugate of $c$,
$$c^\dagger \equiv c_0^* 1 - \sum_{j=1}^{3} c_j^* h_j.$$

4 Note that the formula $x \to c\, x\, c^{-1}$ does not generate Lorentz transformations!

Complex-valued quaternions do not necessarily satisfy $|c|^2 \geq 0$, but it is easy to check that the algebraic property $|xy|^2 = |x|^2 |y|^2$ still holds (its derivation is independent of the assumption that $c_j$ are real-valued). Thus we have
$$|\hat L_c x|^2 = |x|^2$$
as before.

It is easy to see that $(xy)^\dagger = y^\dagger x^\dagger$, and so the transformation $\hat L_c$ acting in the complex quaternionic space $\mathbb{H}\otimes\mathbb{C}$ preserves the subspace of "self-adjoint" or "Hermitian" quaternions $x$ such that $x^\dagger = x$:
$$(\hat L_c x)^\dagger \equiv (c\, x\, c^\dagger)^\dagger = c\, x^\dagger c^\dagger = c\, x\, c^\dagger.$$
Note that a Hermitian quaternion must have the form $x = x_0 1 + i\sum_{j=1}^{3} x_j h_j$, where $x_j$ are real. Hence, $x$ can be viewed as a real 4-vector with components $(x_0, x_1, x_2, x_3)$. Then we immediately find that the quantity $|x|^2 = x_0^2 - x_1^2 - x_2^2 - x_3^2$ is conserved by transformations $\hat L_c$. Therefore $\hat L_c$ act as Lorentz transformations in the 4-dimensional space of Hermitian quaternions $x$. (The same result holds for anti-Hermitian quaternions $x$ such that $x^\dagger = -x$.)

Hence, we have found a correspondence between quaternions $c$ and Lorentz transformations $\hat L_c$. The following simple considerations show that the inverse map, $SO(3,1) \to \mathbb{H}\otimes\mathbb{C}$,
also exists. It follows from previous calculations that every $c$ such that $|c|^2 = 1$ is equal to an exponential of a purely imaginary quaternion,
$$c = \exp b, \qquad b = b_1 h_1 + b_2 h_2 + b_3 h_3,$$
where now $b_j$ are also complex-valued. As before, only orientation-preserving transformations are generated by quaternions. Transformations $\hat L_{\exp b}$ with all real $b_j$ correspond to proper rotations that preserve the 4-vector $u \equiv (1, 0, 0, 0)$. These rotations form the subgroup $SO(3)$. Transformations with imaginary $b_j$ are boosts that preserve two spacelike vectors orthogonal to $u$ (see Calculation below). Since every Lorentz transformation is a product of a proper rotation and a boost, and since the product of quaternions corresponds to the product of transformations, it follows that for any $\hat L \in SO(3,1)$ there exists a complex-valued quaternion $c$ such that $\hat L = \hat L_c$.

Calculation: Derive the explicit formula for the transformation of the coefficients $x_j$ of the 4-vector $x$ under $x \to \hat L_c x$, where $c = \exp(i\beta h_1)$, and show that this is a boost in the direction $x_1$.

Now let us use the matrix representation (7.3) which maps quaternions $c$ into $2\times 2$ complex matrices $\hat c$ acting in $S \equiv \mathbb{C}^2$. The condition $|c|^2 = 1$ is equivalent to $\det\hat c = 1$, therefore the set of all the admissible quaternions is the same as the set of all $2\times 2$ complex matrices with unit determinant. This set is denoted by $SL(2, \mathbb{C})$ and is called the special linear group in two complex dimensions. It is clear that we have built a group homomorphism $SL(2, \mathbb{C}) \to SO(3,1)$. As before, quaternions $c$ and $-c$ generate the same Lorentz transformation $\hat L_c$. Thus, the map $SL(2, \mathbb{C}) \to SO(3,1)$ is a 2-to-1 covering.

It is easy to see that the quaternionic conjugation $c \to c^\dagger$ corresponds to the Hermitian conjugation of matrices, $\hat c \to \hat c^\dagger$. Therefore the matrix formula describing the transformation (7.5) is
$$\hat L_c\hat x = \hat c\,\hat x\,\hat c^\dagger,$$
where $\hat c \in SL(2, \mathbb{C})$ and the matrix $\hat x$ describing the 4-vector $x$ is
$$\hat x \equiv \begin{pmatrix} x_0 + x_3 & x_1 - i x_2 \\ x_1 + i x_2 & x_0 - x_3 \end{pmatrix}. \qquad (7.6)$$
Note that this matrix is Hermitian, $\hat x^\dagger = \hat x$; thus the subspace of Hermitian quaternions is equivalent to the subspace of Hermitian $2\times 2$ matrices. A concise expression for the matrix $\hat x$ is
$$\hat x = \sum_{j=0}^{3}\sigma_j x_j,$$
where $\sigma_j$, $j = 1, 2, 3$ are the Pauli matrices, and we have set $\sigma_0 \equiv \hat 1$. As before, we have
$$\det\hat x = g(x, x) = x_0^2 - x_1^2 - x_2^2 - x_3^2.$$
In this way, the Pauli matrices $\sigma_j$ provide a 1-to-1 map between the Minkowski spacetime $\mathbb{R}^4$ and the space of all $2\times 2$ Hermitian matrices.$^{5}$ Below we shall denote this map by $\hat\sigma$.

5 This map is called the Infeld-van der Waerden map, after the names of its discoverers.

Thus we have built the spinor representation of the Lorentz group in the spinor space $S = \mathbb{C}^2$. The procedure can be summarized like this: Every Lorentz transformation $\hat L \in SO(3,1)$ yields a quaternion $c$ such that $\hat L = \hat L_c$, and then we compute the corresponding matrix $\hat c$ acting in $S$. The product of Lorentz transformations $\hat L$ corresponds to the product of matrices $\hat c$.

Given a Lorentz transformation $\hat L \in SO(3,1)$, the quaternion $c$ (and thus the matrix $\hat c$) is defined only up to a sign. Hence, it is necessary to consider spinors $\psi \in \mathbb{C}^2$ as quantities that are defined only up to a sign. (In quantum mechanics, this would be a natural conclusion if the spinor $\psi$ were considered as a wave function which is defined only up to a phase.) As before, a spinor changes sign after a rotation by $2\pi$. However, now we know how to rotate the spinor in any reference frame since we derived the action of an arbitrary Lorentz transformation on a spinor.

7.2 Spinor algebra

The role of quaternions in the construction of spinors is illustrative but purely auxiliary. Quaternions provide a visual derivation of the concept of spinor spaces adapted to 3+1-dimensional spacetimes, as well as a convenient tool for computations with rotations and Lorentz transformations.$^{6}$ From now on, we may forget quaternions and simply assume that spinors are elements of a complex vector space $S = \mathbb{C}^2$ in which two structures are defined:

1. A representation up to a sign of the Lorentz group $SO(3,1)$ which "rotates by half-angle" (the spinor representation).

2. A fixed 1-to-1 correspondence between $2\times 2$ Hermitian matrices and vectors from the Minkowski spacetime $\mathbb{R}^4$.

6 A definition of spinors for higher-dimensional spacetimes can be formulated using a suitable generalization of quaternions, called Clifford algebras.

Using these properties as building blocks, we shall now study the tensor algebra generated by the space $S$ and its relation to the usual tensor algebra in the Minkowski spacetime. Since quaternions will not be used any more, we shall denote the spinorial form of Lorentz transformations simply by $\hat L_s$ rather than by $\hat c$. Also, we shall not distinguish between $SO(3,1)$ and $SL(2, \mathbb{C})$ since this distinction will play no role in our calculations.

In a real vector space $V$, tensors of rank $(p, q)$ are defined as tensor products of $p$ copies of $V$ and $q$ copies of the dual space $V^*$. To build tensors from the space $S$, we need to consider the dual space to $S$. We define the dual space $S^*$ as the space of linear functions on $S$, i.e. functions $f : S \to \mathbb{C}$ such that
$$f(b + \lambda c) = f(b) + \lambda f(c), \qquad b, c \in S,\ \lambda \in \mathbb{C}.$$
Additionally, we can define the complex conjugate dual space $\bar S^*$ as the space of anti-linear functions on $S$, i.e. functions $f : S \to \mathbb{C}$ such that
$$f(b + \lambda c) = f(b) + \bar\lambda f(c), \qquad b, c \in S,\ \lambda \in \mathbb{C}.$$
Finally, we define the space dual to $\bar S^*$ as the space $\bar S$ of complex conjugate spinors.

All the four spaces $S$, $\bar S$, $S^*$, $\bar S^*$ are two-dimensional, and a basis $\{a, b\}$ in $S$ naturally generates a dual basis $\{a^*, b^*\}$ in $S^*$, a conjugate dual basis $\{\bar a^*, \bar b^*\}$ in $\bar S^*$, and a conjugate basis $\{\bar a, \bar b\}$ in $\bar S$. There is a natural antilinear map from $S^*$
to $\bar S^*$: a linear function $f(x)$ generates an anti-linear function whose values are $\overline{f(x)}$. The corresponding map $S \to \bar S$ is also denoted by the overbar, so for instance
$$\overline{a + \lambda b} = \bar a + \bar\lambda\bar b \in \bar S \quad\text{if}\quad a, b \in S,\ \lambda \in \mathbb{C}.$$
Lorentz transformations act on dual spinors by inverse matrices $\hat L_s^{-1}$, and on conjugate spinors by Hermitian conjugate matrices $\hat L_s^\dagger$.

Spinorial tensors can be built by taking tensor products of the spinor space $S$ and its three conjugates. The rank of a spinorial tensor contains four numbers, for instance, $(1\bar 1, 2\bar 2)$, showing how many copies of each of $S$, $\bar S$, $S^*$, $\bar S^*$ participate in the tensor product. For instance, Hermitian bilinear forms are $(0, 1\bar 1)$-tensors on $S$. Complex conjugation is then a naturally defined map from $(j\bar k, l\bar m)$-tensors to $(k\bar j, m\bar l)$-tensors.

Since we have many different types of tensors, it is inevitable that we must eventually use the (abstract) index notation. It is traditional to denote indices in spinor spaces by capital letters, and indices in conjugate spaces by a prime, thus $T_{AB'}$ denotes a rank $(0, 1\bar 1)$ spinorial tensor. Note that the order of primed and unprimed indices is insignificant: $T_{AB'}$ and $T_{B'A}$ are the same object.

7.2.1 The fundamental 2-form

Since the spinor space $S$ is two-dimensional, every 2-form on $S$ is proportional to an arbitrarily selected one. There is in fact a naturally selected 2-form, which we shall call the fundamental 2-form and denote by $\varepsilon$. We shall now define the 2-form $\varepsilon$ by using the map $\hat\sigma$ and the Minkowski metric $g$.

The map $\hat\sigma$ transforms Hermitian matrices into 4-vectors, and we need to be more specific about the space of such matrices. Suppose $h$ is a Hermitian matrix and $\hat\sigma h \in \mathbb{R}^4$ is the corresponding 4-vector. Since the formula for the Lorentz transformation is
$$\hat L(\hat\sigma h) = \hat\sigma\left(\hat L_s h \hat L_s^\dagger\right),$$
it is clear that $h$ is a spinorial tensor of rank $(1\bar 1, 0)$, i.e. $h \in S\otimes\bar S$. In the index notation, we may write the application of the map $\hat\sigma$ to such a tensor as
$$(\hat\sigma h)^\mu = \sigma^\mu_{AA'}\, h^{AA'}.$$
Moreover, the property of being Hermitian means that $\bar h = h$. Hence, the Minkowski metric $g$ defines a bilinear form in the space $S\otimes\bar S$. We shall temporarily denote this bilinear form by angular brackets,
$$\langle h_1, h_2\rangle \equiv g(\hat\sigma h_1, \hat\sigma h_2).$$
On the other hand, we have (with some choice of basis in $S$) the explicit formula (7.6), from which it follows that the determinant of a Hermitian matrix $h$ is equal to $g(\hat\sigma h, \hat\sigma h)$. The determinant of $h$ is a quadratic form in the space $S\otimes\bar S$, and it is clear from the above equation that this quadratic form generates the bilinear form $\langle h_1, h_2\rangle$. However, the determinant can be expressed through the Levi-Civita symbol $\varepsilon$ as
$$\det h = \frac{1}{2}\,\varepsilon_{AB}\,\varepsilon_{A'B'}\, h^{AA'} h^{BB'}.$$
It follows that, in the same basis where Eq. (7.6) holds,
$$g_{AA'BB'} = \frac{1}{2}\,\varepsilon_{AB}\,\varepsilon_{A'B'}. \qquad (7.7)$$
Although the above formula is written in a specific basis, it can be reinterpreted as the definition of the fundamental 2-form $\varepsilon$. The form $\varepsilon$ is chosen as follows: First we take any nonzero 2-form $\tilde\varepsilon \in S^*\wedge S^*$; then we multiply $\tilde\varepsilon$ by a suitable constant so that $\varepsilon = \lambda\tilde\varepsilon$ satisfies Eq. (7.7).

Once the 2-form $\varepsilon$ is defined on $S\otimes S$, we naturally define the map $\varepsilon : S \to S^*$ and the inverse form $\varepsilon^{-1}$ on $S^*\otimes S^*$, as well as the corresponding conjugate forms. Note, however, that the 2-form $\varepsilon$ is antisymmetric and hence there is a choice of sign in defining the inverse. For instance, we might debate whether the vector $\varepsilon^{-1}a^*$ should be defined by
$$\varepsilon(\varepsilon^{-1}a^*, b) \equiv a^*\circ b$$
or by $\varepsilon(\varepsilon^{-1}a^*, b) \equiv -a^*\circ b$? The convention is to use the first argument of $\varepsilon$ (as shown above) but the second argument of $\varepsilon^{-1}$. In the index notation, this corresponds to the following definitions,
$$a_A\,\varepsilon^{BA} = a^B, \qquad b^B\,\varepsilon_{BA} = b_A, \qquad \varepsilon^{AB}\varepsilon_{BC} = \varepsilon^A{}_C = -\delta^A_C.$$
In the index notation, additional trouble arises because $\varepsilon_A{}^C = -\varepsilon^C{}_A$ if we raise and lower indices using $\varepsilon_{AB}$, so one uses the unambiguous notation $\delta_A^C = \delta^C_A$. The same conventions apply to the conjugate forms $\varepsilon_{A'B'}$ and $\varepsilon^{A'B'}$.

Statement: Show that the fundamental 2-form $\varepsilon$ is invariant under Lorentz transformations.
Hint: $\det\hat L_s = 1$.
Solution: A Lorentz transformation of $\varepsilon$ is the 2-form $\hat L\varepsilon$ defined by
$$(\hat L\varepsilon)(a, b) \equiv \varepsilon(\hat L_s a, \hat L_s b) = \varepsilon\circ(\hat L_s a\wedge\hat L_s b) = (\det\hat L_s)\,\varepsilon\circ(a\wedge b) = \varepsilon(a, b).$$
In the index notation: The transformed $\varepsilon_{AB}$ is
$$\tilde\varepsilon_{AB} = L^C_A L^D_B\,\varepsilon_{CD},$$
which is antisymmetric in $A, B$ and thus should be proportional to $\varepsilon_{AB}$. Contracting with $\varepsilon^{AB}$, we find the coefficient of proportionality to be equal to $\det\hat L_s = 1$.

7.2.2 Relationship of spinors and vectors

Given the fundamental 2-form $\varepsilon$, we may choose a preferred basis adapted to the form. The traditional notation for the basis is $\{o, \iota\}$, where $o, \iota \in S$ and $\varepsilon(o, \iota) = 1$. This is as close as possible to an orthogonal basis, given that $\varepsilon$ is antisymmetric. The basis $\{o, \iota\}$ is called the spin basis or the spin frame. The basis vectors $o, \iota$ naturally induce spin frames in the dual, conjugate, and dual conjugate spaces. It follows that the fundamental 2-form is written as
$$\varepsilon_{AB} = o_A\iota_B - \iota_A o_B; \qquad \varepsilon^{-1} = o\wedge\iota = o\otimes\iota - \iota\otimes o.$$
Using the spin frame, we obtain a natural basis in the (four-dimensional) space of $(1\bar 1, 0)$-tensors:
$$\left\{l^{AA'}, m^{AA'}, \bar m^{AA'}, n^{AA'}\right\} \equiv \left\{o^A\bar o^{A'},\ o^A\bar\iota^{A'},\ \iota^A\bar o^{A'},\ \iota^A\bar\iota^{A'}\right\} \equiv \left\{o\otimes\bar o,\ o\otimes\bar\iota,\ \iota\otimes\bar o,\ \iota\otimes\bar\iota\right\}.$$
Transforming these basis tensors by the map $\hat\sigma$, we obtain the 4-vectors $\{l, m, \bar m, n\}$. The vectors $l$ and $n$ are real-valued but $m$ and $\bar m$ are complex-valued, because the corresponding basis elements are not Hermitian. It is easy to see, however, that each of the vectors $\{l, m, \bar m, n\}$ is null. In fact, these vectors are a Newman-Penrose null tetrad (see Sec. 2.5).
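The correspondence (7.6) between 4-vectors and Hermitian matrices, the transformation $\hat x \to \hat c\,\hat x\,\hat c^\dagger$, and the nullness of $l$ and $n$ can be explored numerically. The following Python sketch is an illustration under our own assumptions: the spin frame is taken as $o = (1, 0)$, $\iota = (0, 1)$ in the basis where Eq. (7.6) holds, matrices are nested lists, and all function names are hypothetical.

```python
import math

def to_hermitian(x):
    """Matrix of Eq. (7.6) for a 4-vector (x0, x1, x2, x3)."""
    x0, x1, x2, x3 = x
    return [[x0 + x3, x1 - 1j*x2],
            [x1 + 1j*x2, x0 - x3]]

def from_hermitian(h):
    """Inverse of to_hermitian; assumes h is Hermitian."""
    x0 = (h[0][0] + h[1][1]).real / 2
    x3 = (h[0][0] - h[1][1]).real / 2
    return (x0, h[1][0].real, h[1][0].imag, x3)

def matmul(p, q):
    return [[sum(p[i][k] * q[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def dagger(m):
    return [[m[j][i].conjugate() for j in range(2)] for i in range(2)]

def minkowski(x, y):
    """Metric g(x, y) with signature (+, -, -, -)."""
    return x[0]*y[0] - x[1]*y[1] - x[2]*y[2] - x[3]*y[3]

def lorentz(c, x):
    """Action x -> c x c^dagger of a unit-determinant matrix c on a 4-vector."""
    return from_hermitian(matmul(matmul(c, to_hermitian(x)), dagger(c)))

def outer(a, b):
    """Spinorial tensor a^A conj(b)^{A'} as a 2x2 matrix."""
    return [[a[i] * b[j].conjugate() for j in range(2)] for i in range(2)]

# A matrix with det = (1+1j)*1 - 2*0.5j = 1 (hypothetical sample values).
c = [[1 + 1j, 2], [0.5j, 1]]
x = (2.0, 0.3, -1.0, 0.7)
y = lorentz(c, x)
print(minkowski(x, x), minkowski(y, y))   # equal up to rounding

# Diagonal matrix with entries exp(+-beta/2): a boost along the x3 axis.
beta = 0.3
boost = [[math.exp(beta / 2), 0], [0, math.exp(-beta / 2)]]
print(lorentz(boost, (1.0, 0.0, 0.0, 0.0)))   # components (cosh beta, 0, 0, sinh beta)

# Vectors l and n built from the assumed spin frame are null.
o, iota = (1 + 0j, 0j), (0j, 1 + 0j)
l = from_hermitian(outer(o, o))
n = from_hermitian(outer(iota, iota))
print(minkowski(l, l), minkowski(n, n))
```

The complex vectors $m$ and $\bar m$ are left out because `from_hermitian` applies only to Hermitian matrices.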


(aAbA ) is a null vector when aA S,



Statement: Show that 7.2.3 Simplification of spinorial tensors
bB S are two arbitrary spinors. Show that the tetrad
{l, m, m, n} defined above satisfies the required properties of Since the spinor space S is two-dimensional, its properties
a NP null tetrad: g(l, n) = 1, g(m, m) = 1, g(n, n) = 0, etc. are significantly simpler than those of a higher-dimensional
Find a linear transformation of the spin basis {o, } that pre- space. For instance, every antisymmetric tensor of rank (0, 2)
serves 1 = o and induces the tetrad transformation (2.21)- is proportional to . Hence, an arbitrary rank (0, 2) spinorial
(2.22). tensor TAB can be simplified into a symmetric tensor and a
multiple of ,
Hint: Use Eq. (7.7).
Answer: The spin basis can be transformed as ei/2 , 1 1
TAB = (TAB + TBA ) + (TAB TBA )
o oei/2 + Aei/2 , where and A are the parameters of 2 2
Eqs. (2.21)-(2.22). 1 CD

= T(AB) + TCD AB .
Since the map is fixed, we can identify 4-vectors and 2
(11, 0) spinorial tensors. Let us now consider how we could
Moreover, every spinorial tensor can be reduced to a combi-
construct some tensor objects out of spinors and in this way
nation of products of the 2-form and some totally symmetric
obtain a geometric interpretation for spinors.
tensors.8 For example, a rank (0, 22) tensor can be decom-
Firstly, a spinor c S defines a null vector nc = (c c). posed as
A A
In components, this is written as n = AA c c
or more

concisely nAA = cA cA . (The map is usually omitted for 1
TABA B = T(AB)A B + TCDA B CD AB .
brevity since it is a fixed structure relating the spinor and the 2
vector spaces.) One may say figuratively that a spinor is a The same decomposition can be applied to the primed indices.
square root of a null vector. The null direction nc is the prin- Now let us consider two special cases. Suppose the tensor
cipal geometric information contained in a spinor c. However, TABA B comes from an antisymmetric 4-tensor T , we have
spinors differing by a phase, cei , define the same null vector TABA B = TBAB A , and then
nc . The phase information can be extracted if we define the
flag bivector TABA B = AB A B + AB A B ,

F(c) c c + c c, where the symmetric spinorial tensor AB of rank (0, 2) is de-


fined by
F AA BB cA cB A B + AB cA cB . (7.8) 1
AB TABA B A B .

2
The spinorial tensor F AA BB is equivalent to an antisymmet- If, on the other hand, TABA B comes from a symmetric 4-
ric bivector F satisfying tensor T , then
1  
F n = 0, F F = 0. (7.9)
TABA B = T(AB)(A B ) + AB A B TCDC D CD C D .
4
It follows that there exists a (c-dependent) vector kc such that (7.10)
Thus the only nontrivial spinorial tensors we need to con-
Fc = nc k, F = n k n k , g(nc , kc ) = 0. sider are totally symmetric ones. For those, the following de-
composition property holds.
The choice of the vector kc is up to a multiple of nc , so the pair
Theorem: Every totally symmetric spinorial tensor can be
(nc , Fc ) only defines a null plane containing the null direction
decomposed into a symmetric product of spinors:
nc . This geometric configuration is similar to a flag that has
a flagpole nc and the flag plane {kc , nc }; so we call a pair T(AB) = a(A bB) , T(ABC) = a(A bB cC) ,
(nc , Fc ) a flag corresponding to a spinor c. Spinors c and c
differing by sign give the same flag, and conversely, a pair etc. The null flagpoles na , nb , etc., corresponding to the
(n, F ) determines a spinor c up to a sign.7 spinors a, b, etc., are called the principal null directions of
the spinorial tensor T .
Calculation: Show that a phase rotation, c ei c, rotates
the flag plane by the angle 2. Proof: A totally symmetric tensor of rank n is completely
These properties offer a possible geometric interpretation of determined by its values on a single vector; e.g. T (x, y, z) can
spinors and illustrate the fact that spinors are related to null be restored if we know T (x, x, x) for every x. Setting x =
directions. Since the flag bivector carries the entire physical o + z, where o and are two spinors building a spin frame,
information in a spinor (the sign of a spinor is not observ- we find that T (x, x, x) is a cubic polynomial in z. A cubic
able), one could in principle forego spinors and reformulate polynomial can be factorized as
every spinor equation in terms of unambiguously defined flag
T (x, x, x) = (a0 + a1 z) (b0 + b1 z) (c0 + c1 z) ,
bivectors. However, flags are quadratic in spinors, and thus
linear equations of motion for spinor fields will become non- where a0 , a1 , ..., c1 are suitable complex numbers (not
linear tensor equations if written in terms of flags. Spinors uniquely determined). We can interpret these numbers as the
provide a simpler formulation of many equations, especially spin frame components aA , bA , cA of three spinors a, b, c.
those with massless fields for which null directions are espe- Since TABC is totally symmetric, we have obtained a decom-
cially important. position T(ABC) = a(A bB cC) . The general case of a spinorial
tensor of rank n is treated analogously.
7 This
can be shown following [36], Eqs. (13.1.47)-(13.1.50), and deriving an
explicit formula for c given a pair (n, F ). 8 See [27] or [32] for a complete proof.
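The factorization argument in the proof above can be tried out numerically in the simplest case of a rank-2 symmetric spinorial tensor, where the polynomial is quadratic rather than cubic. The following is only a minimal sketch: the function names are illustrative (not from the text), the components are taken in a spin frame so that the spinor in the proof has components (1, z), and the degenerate case of a vanishing leading coefficient (a root at infinity) is deliberately not handled.

```python
import cmath

def principal_spinors(t00, t01, t11):
    """Factor T(xi, xi) = t00 + 2*t01*z + t11*z**2 for xi = (1, z).

    Returns spin-frame components (a0, a1), (b0, b1) such that
    T(xi, xi) = (a0 + a1*z) * (b0 + b1*z).
    Assumption of this sketch: t11 != 0, so both roots are finite.
    """
    if t11 == 0:
        raise ValueError("t11 == 0 (a root at infinity) is not handled here")
    disc = cmath.sqrt(t01 * t01 - t00 * t11)
    z1 = (-t01 + disc) / t11
    z2 = (-t01 - disc) / t11
    # t11 * (z - z1) * (z - z2), with the overall factor t11 absorbed into a
    a = (-t11 * z1, t11)
    b = (-z2, 1.0)
    return a, b

def symmetrized_product(a, b):
    """Independent components (T00, T01, T11) of the symmetrized product a_(A b_B)."""
    return (a[0] * b[0], 0.5 * (a[0] * b[1] + a[1] * b[0]), a[1] * b[1])

T = (1 + 2j, 0.5 - 1j, 3j)  # arbitrary sample values of T00, T01, T11
a, b = principal_spinors(*T)
rebuilt = symmetrized_product(a, b)
max_error = max(abs(u - v) for u, v in zip(T, rebuilt))
print(max_error)  # expected to be at the level of rounding error
```

The two factors are determined only up to rescaling one by a complex number and the other by its inverse, which is the non-uniqueness mentioned in the proof. When the discriminant vanishes, the two roots coincide and so do the two principal null directions.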


Remark: Classification of 4-tensors. Since every where is defined above as a standard, fixed matrix repre-
Minkowski 4-tensor can be mapped into a spinorial ten- senting the fiducial Minkowski metric. These four vectors
sor, we may reduce every 4-tensor first to totally symmetric obviously form a basis in the tangent space Tp M, and such a
spinorial tensors, and then to products of individual spinors. basis is called an orthonormal tetrad. We can always choose
In this way, every 4-tensor gives rise to a set of principal such a tetrad {ej (p)} in some neighborhood of any point p.
null directions. This construction provides a classification The tetrad {ej } can also be viewed as a map from an auxil-
of 4-tensors based on how many of their principal null iary Minkowski space R4 to the tangent space Tp M, whereby
directions coincide. a 4-vector (x0 , x1 , x2 , x3 ) R4 is mapped into the spacetime
P3
vector x j=0 ej xj Tp M. To contrast 4-vectors and 4-
Calculation: Derive the spinorial representation of the Levi-
tensors in the fiducial Minkowski space R4 with vectors and
Civita symbol by mapping it to spinorial tensors and
tensors defined through tangent spaces Tp M in the actual
reducing to products of the fundamental 2-form AB .
curved spacetime, the latter are sometimes called world vec-
Answer:
tors and world tensors. Since {ej (p)} is a basis, we obtain a
AA BB CC DD = i(AB CD A C B D A B C D AC BD ). 1-to-1 correspondence between R4 and Tp M; it is easy to see
(7.11) that the inverse map Tp M R4 is defined as x (xj ), where
xj g(x, ej (p)), j = 0, 1, 2, 3.
Statement: Show that the flag bivector F given by The construction of spinors at a point p in a curved space-
Eq. (7.8) satisfies time manifold M uses the fixed, Minkowski-space matrices
j
F F = 0. (7.12) (M) AB
, j = 0, 1, 2, 3. We define the modified map

(p) : S S Tp M by using a tetrad {ej (p)} which maps the

This is an additional property of flags, besides the properties result of (M) into Tp M:
given by Eq. (7.9).
3
X
(hAB ) = j AB
AB h ej Tp M.
7.3 Equations for spinor fields j=0

We have worked in the Minkowski spacetime R4 , interpreted This defines the curved-spacetime map (p) which can be
as the tangent space at a single point of a spacetime manifold. thought of as a world-vector-valued spinorial (0, 11)-tensor,
Now we shall consider a (curved) spacetime where a spinor
3 3
field should be defined at every point. X j
X j
AB (p)
AB ej (p);
AB AB ej .

j=0 j=0
7.3.1 Spinors in curved spacetime
Due to the defining property (7.13) of a tetrad, we have
A spinor field is, roughly speaking, a spinor-valued function
on a manifold M; thus we would like to write (p) S to 3
X

denote the value of the spinor field at a point p. However, g (p)
AA (p)

BB (p) = g (p)ej (p)AA
j k
ek (p)BB

spinors are tied to the Lorentz transformations in a tangent j,k=0


space Tp M, and tangent spaces differ at different points of the 3
X j k
manifold. Therefore, we cannot directly compare the values of = jk AA BB .

spinors at different points. j,k=0


A precise picture of a spinor field is provided by the con-
struction of a vector bundle (see Sec. 6.3). We consider a Therefore, the spinorial 2-form AB automatically satisfies the
spinor bundle with the spacetime M as the base and the property analogous to Eq. (7.7) at every point p,
spinor space S as the fiber. Thus, we would like to have a 1

separate copy of the spinor space S at each point p M. A g (p)
AA (p)
BB (p) = AB A B .
2
spinor field is defined9 as a (smooth) section of the spinor bun-
dle; thus, the value (p) at a point p is a spinor belonging to Note that the definition of the spinor spaces S(p) involves
the copy S(p) of the spinor space. an arbitrary choice of an orthonormal tetrad {ej (p)}. Differ-
In the previous section, we constructed the spinor space ent choices of the tetrad are related to each other by Lorentz
S through the Minkowski metric, diag(1, 1, 1, 1). transformations. (For any two orthonormal tetrads {ej } and

However, we now need to assign a spinor space S(p) to each ej , there exists a unique Lorentz transformation L such that
point p in a curved spacetime where the Minkowski metric is j = e .) The map
Le and the 2-form AB are invariant under
j
replaced by a general p-dependent metric tensor g(p). There- Lorentz transformations. Thus, any Lorentz-invariant spino-
fore, we need to modify the construction of the spinor space,
rial expression, such as AB A B g AA BB , will be mapped
so that the map and the antisymmetric tensor AB are com-
into a world scalar, independently of the choice of the tetrad.
patible with the metric g(p) at each point. This is achieved by
the following construction. Remark: Since spinors are not defined directly through
Consider a set of four vectors {ej }, j = 0, 1, 2, 3 such that points of the manifold, there is no concept of arbitrary point-
wise transformations (diffeomorphisms) directly applied to
g(ej , ek ) g ej ek = jk , (7.13) spinor fields! The only transformations that apply to spinors
9 We ignore possible topological difficulties with the construction of a spinor are Lorentz transformations L s within the spinor space. Thus
bundle for a given base manifold M. The spinor bundle exists for all one cannot define the change in a spinor field due to the flow
physically relevant spacetimes M. of an arbitrary vector field v on M; in other words, the Lie


derivative Lv of a spinor with respect to a vector is un- F is the antisymmetric tensor F . We have seen that such a
defined.10 However, this does not mean that the presence of tensor can be reduced to a symmetric spinorial tensor AB by
spinors in a theory violates general covariance. General point
transformations can be applied to the tetrad vectors ej and 1
AB = A B F AA BB ,
thus effectively change the map , which is the only connec- 2
= AB A B + AB A B .

tion between the spinor space S and the tangent space Tp M. F AA BB
Field theories involving spinors can be formulated in terms of
Lorentz invariants in the spinor space, and invariants involv- Now we shall show that the Maxwell equations can be written
ing the map and the world metric g, and then these theories concisely in terms of AB , namely
are generally covariant.
1
AA AB = A B j BB , (7.14)
2
7.3.2 Covariant derivative on spinors
where j BB j BB is the spinorial version of the familiar
 
The Levi-Civita connection (but not any other connection, charge 4-current j , ~j .
see below!) can be uniquely extended to a covariant deriva-
It is convenient to define the Hodge dual F which is again
tive v on spinor fields. In addition to the usual properties
a 2-form,
(linearity, Leibnitz rule, torsion-freeness), the Levi-Civita con- 1

nection satisfies F F ,
2
v = v , and to rewrite the Maxwell equations equivalently as
v AB = v AB = 0, F = j ,
v
= 0. ( F ) = 0.

Naturally, the gradient operator is also equivalent to the (The last equation is equivalent to [ F ] = 0, i.e. dF = 0,
(0, 1
1)-tensor AA in the spinor space, which is a consequence of the identity ddA = 0, which is

equivalent to F = dA.) Using the explicit expression (7.11)
v v v AA AA . for the Levi-Civita symbol , it is straightforward to ver-
ify that
Perhaps, it appears strange that such simple operations
as the coordinate derivative cannot be defined on spinor AA BB
F AA BB + i (F ) = 2AB A B .
fields. One way to explain the fact that only the Levi-Civita
connection is defined on spinors is to consider the flag field Then, the Maxwell equations are equivalent to
(n , F ) corresponding to a spinor field . Suppose we at-  
tempt to define a parallelly transported spinor using some
j = (F + i F ) = 2AA AB A B
connection . We say that is parallelly transported along a 
vector v if v A = 0. Then we expect that the flag (n , F ) = 2 AA AB A B ,
should also be parallelly transported along v. However, the
basic properties (7.9), (7.12) of a flag (the vector n should be since AB is constant under . Multiplying both sides by
null, F should be transverse to n , etc.) involve the metric B C , we find that Eq. (7.14) is equivalent to the Maxwell
g. Unless the connection is compatible with the metric g, a equations.
parallelly transported flag will fail to remain a valid flag; for
Statement: Consider a spinorial tensor AB = A B , where
instance, the vector n may fail to remain null, F may fail
A is some spinor, and show that the corresponding tensor
to remain transverse to n , etc. Only the metric-compatible
F describes an electromagnetic wave in vacuum.
(Levi-Civita) connection guarantees that the pair (n , F ) will
remain a valid flag after an arbitrary parallel transport. Hint: In this case, F is the flag of the spinor , so F
must satisfy the properties (7.9), (7.12) of a flag. For an elec-
Another argument is based on the concept of associated
tromagnetic field described by 3-dimensional vectors E ~ and
bundles. The spinor bundle is associated to the gauge group
~
B, we have
SL(2, C), which is essentially equivalent to the Lorentz group
SO(3, 1). The Levi-Civita connection is the gauge-invariant  
connection with respect to the gauge group, because is com- F F = 2 |B| ~ 2 |E|
~ 2 ,
patible with the metric (see Sec. 6.3.5). No other connection is ~ B.
~
F F = 2E
admissible in the associated bundle. Thus, no other connec-
tion can be defined on spinors. Calculation: Show that the energy-momentum tensor of the
electromagnetic field is
7.3.3 Maxwell equations 1 1
 

T = g F F F F g
The electromagnetic field corresponds to a particle of spin 1 4 4
and is described by a 2-form F = dA, where A is a 1-form rep- 1
TAA BB = AB A B .
resenting the electromagnetic potential. In the index notation, 2
10 Unless the vector v is a conformal Killing vector for the metric. See [27], Hint: Use Eq. (7.10) to simplify the spinorial form TABA B
vol. 2, 6.6. of the symmetric tensor T .
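The first invariant quoted in the hint for the electromagnetic wave, $F_{\mu\nu}F^{\mu\nu} = 2(|\vec B|^2 - |\vec E|^2)$, can be checked numerically. The sketch below assumes the metric signature (+, -, -, -) and the common component convention $F_{0i} = E_i$, $F_{ij} = -\varepsilon_{ijk}B_k$; these conventions and all function names are assumptions made for illustration, and this invariant does not depend on the sign choices.

```python
ETA = (1.0, -1.0, -1.0, -1.0)  # diagonal Minkowski metric, signature (+,-,-,-)

def levi_civita3(i, j, k):
    """Three-dimensional Levi-Civita symbol for indices in {0, 1, 2}."""
    return (i - j) * (j - k) * (k - i) // 2

def field_tensor(E, B):
    """Covariant components F_{mu nu} with F_{0i} = E_i, F_{ij} = -eps_{ijk} B_k."""
    F = [[0.0] * 4 for _ in range(4)]
    for i in range(3):
        F[0][i + 1] = E[i]
        F[i + 1][0] = -E[i]
        for j in range(3):
            F[i + 1][j + 1] = -sum(levi_civita3(i, j, k) * B[k] for k in range(3))
    return F

def f_squared(F):
    """F_{mu nu} F^{mu nu}; indices are raised with the diagonal metric ETA."""
    return sum(F[m][n] * ETA[m] * ETA[n] * F[m][n]
               for m in range(4) for n in range(4))

E, B = (1.0, 2.0, 3.0), (-0.5, 4.0, 0.25)  # arbitrary sample field values
lhs = f_squared(field_tensor(E, B))
rhs = 2.0 * (sum(b * b for b in B) - sum(e * e for e in E))

# A plane wave has E perpendicular to B and |E| = |B|, so the invariant vanishes.
wave = f_squared(field_tensor((0.0, 1.0, 0.0), (0.0, 0.0, 1.0)))
print(lhs, rhs, wave)
```

For the plane-wave sample the scalar product of the electric and magnetic vectors is also zero, so both invariants named in the hint vanish, as expected for a field built from the flag of a single spinor.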



The spinorial form of the Maxwell equation has its roots A B in Eq. (7.17). Therefore, only the term containing YC sur-
in group theory.11 In a heuristic language, this is the sim- vives. After some simplification, we find the equation for
A ,
plest and the most natural relativistic equation for a symmet-
ric spinorial tensor of rank (2, 0). 1 1
BA A = BA AA A = B = m2 B .
2 2

7.3.4 Dirac equation The1conventional definition of the auxiliary 1field A


is
2m A , which absorbs the awkward factor 2 m2 . Then the
The Dirac equation describes a spinor field A correspond- equations acquire a more symmetric form,
ing to a massive particle of spin 21 . We shall first consider the
Dirac equation in Minkowski spacetime and then generalize m
AA A = A ,
to a curved spacetime. 2
The simplest nontrivial Lorentz-invariant equation for a m
BA A = B .
field A would be 2

 + m2 A = 0, (7.15) At this point, we are done rewriting the second-order equa-
tion (7.15) as a system of first-order equations. We now intro-
where the DAlambert operator  is duce a four-dimensional, complex-valued
 vector S S
B
instead of the pair , A , and write the above system of

 = AA AA . equations as a first-order equation for ,

Diracs main motivation was to rewrite this second-order = m, (7.19)


equation as a system of first-order equations. The standard
way to achieve this is to introduce extra fields for first deriva- where are the required 4 4 complex matrices called the
tives. So let us introduce an auxiliary (complex conjugate and Dirac matrices,
dual) spinor field A , which will be the first derivative of A ,
 
0 AA
= 2 .
A AA A .
BB 0

We expect that the equation for A will be of the form The vector is called a Dirac spinor.
BA The Dirac equation in curved spacetime is Eq. (7.19) rather
A = (...), and we shall now derive that equation.

than Eq. (7.15). In the presence of curvature, these equations
We need to simplify the expression
are not equivalent since covariant derivatives do not com-
BA BA A mute.
A =
AA . (7.16)

Let us first consider a more general spinorial tensor of this


form,
XAA BB C BB AA C ;
when we are done simplifying XAA BB C , the term (7.16) will
be expressed as

BB AA A = A B AC BD XDB AA C . (7.17)

Since we are working in flat space where the curvature


identically vanishes, the covariant derivatives commute (even
when acting on arbitrary tensors or spinors), so

XAA BB C = XBB AA C .

Thus we can apply the decomposition (7.10), suitable for sym-


metric 4-tensors, to the first four indices of XAA BB C :
1
XAA BB C = X(AB)(A B )C + AB A B YC , (7.18)
4

YC XAA BB C AB A B .

It is easy to see that



YC = AB A B AA BB C = g C = C .

When we substitute Eq. (7.18) into Eq. (7.17), the symmet-


ric term X(AB)(A B )C will be canceled after contracting with
11 It can be shown that the homogeneous version of the spinorial Maxwell equation,
AA AB = 0, determines the irreducible representation of the
Poincaré group having mass 0 and spin 1.
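The block off-diagonal form of the Dirac matrices in Sec. 7.3.4 can be illustrated with the standard chiral (Weyl) representation built from the conventional Pauli matrices. This sketch is an assumption-laden illustration, not the normalization used in these notes: it takes $\gamma^0$ with unit off-diagonal blocks and $\gamma^i$ with blocks $\sigma^i$ and $-\sigma^i$, and checks the Clifford relation $\gamma^\mu\gamma^\nu + \gamma^\nu\gamma^\mu = 2\eta^{\mu\nu}$. That relation is what makes a first-order operator built from such matrices square to the d'Alembert operator in flat space.

```python
def matmul(a, b):
    """Product of two square matrices given as nested lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

I2 = [[1, 0], [0, 1]]
PAULI = [
    [[0, 1], [1, 0]],
    [[0, -1j], [1j, 0]],
    [[1, 0], [0, -1]],
]

def block_offdiag(upper, lower):
    """4x4 matrix with 2x2 blocks [[0, upper], [lower, 0]]."""
    m = [[0j] * 4 for _ in range(4)]
    for i in range(2):
        for j in range(2):
            m[i][j + 2] = upper[i][j]
            m[i + 2][j] = lower[i][j]
    return m

def negated(m):
    return [[-x for x in row] for row in m]

# Chiral representation (an assumed, conventional choice).
GAMMA = [block_offdiag(I2, I2)] + [block_offdiag(s, negated(s)) for s in PAULI]
ETA = (1, -1, -1, -1)  # diagonal of the Minkowski metric

def clifford_defect():
    """Largest deviation of {gamma^mu, gamma^nu} from 2 * eta^{mu nu} * identity."""
    worst = 0.0
    for mu in range(4):
        for nu in range(4):
            ab = matmul(GAMMA[mu], GAMMA[nu])
            ba = matmul(GAMMA[nu], GAMMA[mu])
            for i in range(4):
                for j in range(4):
                    target = 2 * ETA[mu] if (mu == nu and i == j) else 0
                    worst = max(worst, abs(ab[i][j] + ba[i][j] - target))
    return worst

print(clifford_defect())  # a value at rounding level confirms the Clifford relation
```

In curved spacetime the same algebraic relation holds pointwise, but the covariant derivatives no longer commute, which is why the first-order and second-order equations stop being equivalent.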

A Elements of Special and General Relativity
This is a brief review of basic concepts of relativity and differential geometry. In this appendix I
mostly use the traditional index notation to facilitate comparison with the prevalent physics
literature. The material in this appendix is assumed known in the main part of the book; the
explanations here cover only a portion of the material in standard relativity textbooks. A good
introductory textbook is [29].

Special Relativity (SR) is a theory describing the motion of light and point masses in cases when
gravity can be neglected. The name General Relativity (GR) is used to denote Einstein's theory of
gravitation, which he developed as a generalization of SR. I shall now review the mathematical
foundations of these theories, omitting most proofs.

A.1 Special Relativity

The theory of Special Relativity is based on two main postulates: (i) All the laws of physics are the
same in every inertial reference frame.¹ (ii) The speed of light, denoted $c$, is independent of the
speed of the light source. One can derive all the statements of SR from these two postulates. For
instance, it is shown that no massive body can be accelerated from rest to a velocity greater than or
equal to $c$. Also, the distance $x'$ and time $t'$ measured in a moving reference frame must be
different from the distance $x$ and time $t$ measured in a rest frame. Namely, the results of these
measurements are related by the Lorentz transformation

\[ ct' = \gamma\,(ct - \beta x), \qquad x' = \gamma\,(x - \beta ct), \qquad
   \gamma \equiv \frac{1}{\sqrt{1-\beta^2}}, \qquad \beta \equiv \frac{v}{c}, \]

where we assume that the motion is in the positive $x$ direction, and $\beta < 1$ is the relative
velocity of the two reference frames, measured in the units of $c$. Since $c$ is an absolute
constant, one can measure velocities in the units of $c$ and distances in the units of time. This
choice of measurement units is mathematically equivalent to setting $c \equiv 1$ in all equations,
and we adopt this convention everywhere in this text.

¹ Of course, it is also postulated that such inertial reference frames exist and, in particular, are
approximately realized in a laboratory freely floating in empty space.

The fact that coordinates and times are generally not measured to be the same in different reference
frames is at the core of the theory of relativity; for instance, events seen as simultaneous for one
observer may precede one another for a different observer. Thus, simultaneity is not absolute but is
only defined relative to an observer's reference frame (this is one justification for the name
"relativity").

A.1.1 Spacetime

Rather than use distances and times measured in different reference frames, one adopts a more
convenient way to describe events. Namely, one introduces the spacetime, which is an auxiliary
four-dimensional vector space $M$ with coordinates $\{t, x, y, z\}$. This space $M \equiv
\mathbb{R}^4$ is called the Minkowski spacetime. In the spacetime picture, events $\mathbf{x} \in M$,
$\mathbf{x} \equiv \{t, x, y, z\}$, are points of the spacetime (vectors such as $\mathbf{x}$ are
denoted by boldface letters). Bodies follow worldlines, which may be drawn as lines in $M$ and
parametrized by functions $x(t)$, $y(t)$, $z(t)$. It is usually more convenient to parametrize
worldlines by four functions $t(\tau)$, $x(\tau)$, $y(\tau)$, $z(\tau)$, where $\tau$ is a
real-valued parameter. Then a worldline can be represented by a point-valued function
$\mathbf{x}(\tau)$.

A Lorentz transformation between inertial frames will then be seen as a linear transformation of the
coordinates in the spacetime, $\{t, x, y, z\} \to \{t', x', y', z'\}$. However, it is natural to
think of the spacetime as an independent arena where events happen, regardless of coordinates
introduced in it. In other words, the coordinate values $(t, x, y, z)$ ascribed to an event
$\mathbf{x}$ by a particular inertial observer may vary, but the event $\mathbf{x}$ happens by
itself at a particular place-and-time, whether observed or not. (We may imagine the spacetime $M$ as
a chart containing complete histories of all the bodies and a complete record of all the events that
happened or will ever happen.) To differentiate between ordinary three-dimensional vectors and
four-dimensional vectors from $M$, the latter are frequently called 4-vectors.

It is easy to see that the Lorentz transformation preserves the quadratic form,

\[ g(\mathbf{x}, \mathbf{x}) \equiv t^2 - x^2 - y^2 - z^2, \]

called the relativistic interval or the Minkowski metric. In fact, the metric $g$ has a far greater
significance than merely an invariant of Lorentz transformations. The most important use of $g$ is
in computing distances and time intervals between events. Given two events, $\mathbf{x}$ and
$\mathbf{y}$, we may ask whether an inertial observer might pass through both $\mathbf{x}$ and
$\mathbf{y}$. If this is the case then these events will have coordinates $\{t_1, 0, 0, 0\}$ and
$\{t_2, 0, 0, 0\}$ in this observer's rest frame. The time interval, $t_2 - t_1$, measured according
to this observer's clock, is called the observer's proper time interval between $\mathbf{x}$ and
$\mathbf{y}$. This interval can be computed as

\[ \tau(\mathbf{x}, \mathbf{y}) \equiv \sqrt{(t_2 - t_1)^2}
   = \sqrt{g(\mathbf{y} - \mathbf{x}, \mathbf{y} - \mathbf{x})}. \]

The 4-vector $\mathbf{y} - \mathbf{x}$ connecting the events in the Minkowski space is then called a
timelike vector. The last term in the above formula is expressed through $g$ and is thus
Lorentz-invariant. Hence, any observer can compute the proper time $\tau(\mathbf{x}, \mathbf{y})$
using this formula.

Another possibility is the existence of a lightray connecting the events $\mathbf{x}$ and
$\mathbf{y}$. Lightrays are worldlines of bodies that move with the speed of light; for example, the
line $x = t$, $y = z = 0$ is a lightray. It is easy to see that the squared interval in this case is
$g(\mathbf{y} - \mathbf{x}, \mathbf{y} - \mathbf{x}) = 0$. The 4-vector $\mathbf{y} - \mathbf{x}$
connecting the events is then called a null vector.

Finally, there may be an inertial observer for whom the events $\mathbf{x}$, $\mathbf{y}$ are
simultaneous and thus have coordinates


$\{t_0, x_1, x_2, x_3\}$ and $\{t_0, y_1, y_2, y_3\}$. In this case,
$g(\mathbf{y} - \mathbf{x}, \mathbf{y} - \mathbf{x}) < 0$, the 4-vector $\mathbf{y} - \mathbf{x}$ is
called a spacelike vector, and the real number

\[ L(\mathbf{x}, \mathbf{y}) \equiv \sqrt{\sum_{j=1}^{3} (y_j - x_j)^2}
   = \sqrt{-g(\mathbf{y} - \mathbf{x}, \mathbf{y} - \mathbf{x})} \]

is equal to the distance between the points $\mathbf{x}$, $\mathbf{y}$ in the reference frame where
both events occur simultaneously. The quantity $L(\mathbf{x}, \mathbf{y})$ is called the proper
distance between the events. Again, this distance can be calculated in any other reference frame
using the above formula involving $g$.

Conversely, it can be shown that the sign of $g(\mathbf{y} - \mathbf{x}, \mathbf{y} - \mathbf{x})$
unambiguously specifies which of the three above cases (timelike, null, or spacelike) takes place.
Thus, the metric $g$ provides not only a means to compute the proper time and the proper distance
between events in an arbitrary reference frame, but also classifies pairs of events according to
whether their separation is timelike, null, or spacelike. Since no material bodies or signals can
propagate faster than light, only events separated by a timelike or null interval can influence or
cause each other. Hence, the metric $g$ determines which points of $M$ can causally influence each
other, i.e. describes the causal structure of the spacetime.

We have arrived at a picture of the Minkowski spacetime $M$ whose points are events and where a
metric $g$ is defined. Reference frames corresponding to different inertial observers are merely
coordinate systems that we may introduce on $M$. Using this picture, known physical laws can be
reformulated using only 4-vectors $\mathbf{x}$ and the metric $g$, without reference to a particular
coordinate system on $M$. In other words, the laws of physics have a frame-invariant character
compatible with the Lorentz transformations, which means that these laws are relativistic (agree
with Special Relativity).

Note that, according to a standard result of linear algebra, the quadratic form
$g(\mathbf{x}, \mathbf{x})$ gives rise to a symmetric bilinear form $g(\mathbf{x}, \mathbf{y})$
which we may define by

\[ g(\mathbf{x}, \mathbf{y}) = \frac{1}{2} \left( g(\mathbf{x} + \mathbf{y}, \mathbf{x} + \mathbf{y})
   - g(\mathbf{x}, \mathbf{x}) - g(\mathbf{y}, \mathbf{y}) \right). \]

Clearly, $g$ plays the role of the scalar product in the 4-dimensional vector space $M$. In
coordinates: if $\mathbf{x} \equiv \{x_0, x_1, x_2, x_3\}$ and
$\mathbf{y} \equiv \{y_0, y_1, y_2, y_3\}$ then

\[ g(\mathbf{x}, \mathbf{y}) = x_0 y_0 - x_1 y_1 - x_2 y_2 - x_3 y_3. \]

This bilinear form is also called the Minkowski metric. Two 4-vectors $\mathbf{x}$, $\mathbf{y}$ are
called orthogonal to each other if $g(\mathbf{x}, \mathbf{y}) = 0$. Note that the unusual signs in
the metric $g$ make the geometric interpretation of the scalar product $g(\mathbf{x}, \mathbf{y})$
and of the orthogonality somewhat difficult. For instance, a null vector $\mathbf{x}$, such as
$\{1, 1, 0, 0\}$, is orthogonal to itself since $g(\mathbf{x}, \mathbf{x}) = 0$. However, in any
given reference frame, the time direction is always orthogonal to every spatial direction, and the
three spatial directions are also mutually orthogonal, just like in normal Euclidean geometry. From
the point of view of linear algebra, the assumption of an inertial frame is equivalent to choosing a
basis $\{\mathbf{e}_0, \mathbf{e}_1, \mathbf{e}_2, \mathbf{e}_3\}$ in the vector space $M$ such that
$\mathbf{e}_0$ is timelike, $\mathbf{e}_{1,2,3}$ are spacelike, and
$g(\mathbf{e}_0, \mathbf{e}_0) = 1$, $g(\mathbf{e}_0, \mathbf{e}_j) = 0$,
$g(\mathbf{e}_j, \mathbf{e}_k) = -\delta_{jk}$ for $j, k = 1, 2, 3$ (the basis vectors are
orthonormal with respect to the bilinear form $g$).

A.1.2 Motion of bodies in SR

The motion of a massive body is described by a worldline $\mathbf{x}(\tau)$, where $\tau$ is an
arbitrary parameter. It is convenient to choose $\tau$ to be the proper time in the body's rest
frame. With that choice, many equations are simplified; in particular, the 4-vector called
4-velocity, defined by

\[ \mathbf{v} \equiv \dot{\mathbf{x}} \equiv \frac{d}{d\tau} \mathbf{x}, \]

satisfies $g(\mathbf{v}, \mathbf{v}) = 1$ because $g(\mathbf{v}, \mathbf{v})$ is frame-invariant and
in the body's rest frame we have $\mathbf{x} = \{\tau, 0, 0, 0\}$ and thus
$\mathbf{v} = \{1, 0, 0, 0\}$.

The 4-acceleration,

\[ \mathbf{a} \equiv \frac{d}{d\tau} \mathbf{v} = \frac{d^2}{d\tau^2} \mathbf{x}, \]

is always orthogonal to $\mathbf{v}$ in the sense of the metric $g$, namely
$g(\mathbf{v}, \mathbf{a}) = 0$. When $\tau$ is the proper time, a noninteracting body moves
according to the equation

\[ \frac{d^2}{d\tau^2} \mathbf{x} = 0. \]

Solutions of this equation are straight worldlines, which describe motion with constant velocity.

The above equation of motion can be derived from a variational principle,

\[ \delta \int \sqrt{g(\dot{\mathbf{x}}, \dot{\mathbf{x}})}\, d\tau = 0, \]

which means that the inertial trajectory is an extremum of the proper time among all the possible
worldlines $\mathbf{x}(\tau)$. (It can be shown that the extremum is in fact a global maximum.) With
the normalization $g(\dot{\mathbf{x}}, \dot{\mathbf{x}}) = 1$, the variational principle becomes
simply $\delta \int d\tau = 0$.

To describe a body of rest mass $m$ that interacts with other bodies, one writes the action

\[ S = -m \int d\tau + \int L_{\mathrm{int}}[\mathbf{x}, \dot{\mathbf{x}}]\, d\tau, \]

where the Lagrangian $L_{\mathrm{int}}[\mathbf{x}, \dot{\mathbf{x}}]$ is a function of the worldline
$\mathbf{x}(\tau)$ that represents interactions with other bodies. The trajectory $\mathbf{x}(\tau)$
of the body should extremize the action $S$ with respect to all possible worldlines. The resulting
equation of motion is the relativistic Newton's law,

\[ m \frac{d^2}{d\tau^2} \mathbf{x} = \mathbf{f}, \tag{A.1} \]

where the 4-vector $\mathbf{f}$ is the 4-force which may in general depend on $\mathbf{x}$ and
$\dot{\mathbf{x}}$. The 4-vector $\mathbf{p} \equiv m \dot{\mathbf{x}}$ is called the 4-momentum and
carries the information about the energy and the momentum of the body,
$\mathbf{p} = \{E, p_1, p_2, p_3\}$, where $\{p_1, p_2, p_3\} \equiv \vec{p}$ are the components of
the 3-momentum. Note the normalization, $g(\mathbf{p}, \mathbf{p}) = m^2$, which implies the energy
law

\[ E = \sqrt{m^2 + \vec{p}^{\,2}}. \]

A.2 Index notation

Let us now introduce the index notation. In a chosen reference frame, a 4-vector $\mathbf{x}$ has
four coordinates (also called components) that are usually denoted $x^0, x^1, x^2, x^3$ with an upper


(superscript) index; the index value 0 means time and the values 1, 2, 3 mean spatial coordinates.
The scalar product of two 4-vectors $\mathbf{x}$, $\mathbf{y}$ can then be written as

\[ g(\mathbf{x}, \mathbf{y}) = \sum_{\mu=0}^{3} \sum_{\nu=0}^{3} g_{\mu\nu} x^\mu y^\nu, \]

where $g_{\mu\nu}$ is a $4 \times 4$ matrix,

\[ g_{\mu\nu} \equiv \mathrm{diag}(1, -1, -1, -1) \equiv
   \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & -1 & 0 & 0 \\ 0 & 0 & -1 & 0 \\ 0 & 0 & 0 & -1 \end{pmatrix}. \]

The low (subscript) position of indices in the symbol $g_{\mu\nu}$ is not accidental. For brevity,
it is customary to use the Einstein summation convention, which consists of dropping the summation
signs and implying summation every time an identical index letter appears once as a subscript and
once as a superscript. Then, $g(\mathbf{x}, \mathbf{y})$ is written as $g_{\mu\nu} x^\mu y^\nu$.
Also, one does not distinguish between a vector $\mathbf{x}$ and the array $x^\mu$ of its components
in some basis; one simply says "a vector $x^\mu$." This is the essence of the index notation: It is
always assumed that a basis is chosen and that one is performing calculations with components of
vectors and tensors in the chosen basis. All 4-vectors and 4-tensors then appear as tables of
numbers, which are indexed by (Greek) letters, for example $x^\mu$, $g_{\mu\nu}$.

A Lorentz transformation is described by a matrix $\Lambda^\mu{}_\nu$ which acts on 4-vectors
$x^\mu$ as

\[ \Lambda : x^\mu \to \tilde{x}^\mu; \qquad \tilde{x}^\mu = \Lambda^\mu{}_\nu x^\nu. \]

It is easy to see that the condition of Lorentz invariance of the metric is written as

\[ g_{\mu\nu} \Lambda^\mu{}_\alpha \Lambda^\nu{}_\beta = g_{\alpha\beta}. \]

Furthermore, one introduces the Kronecker delta symbol $\delta^\mu_\nu$, which corresponds to an
identity matrix, and the inverse metric $g^{\mu\nu}$ such that

\[ g^{\mu\alpha} g_{\alpha\nu} = \delta^\mu_\nu. \]

Numerically, $g^{\mu\nu} = \mathrm{diag}(1, -1, -1, -1)$ is the same matrix as $g_{\mu\nu}$.

One also introduces the Levi-Civita symbol $\varepsilon_{\alpha\beta\gamma\delta}$ which is a rank 4
totally antisymmetric tensor, defined by the relation $\varepsilon_{0123} = 1$ in an inertial frame.

Finally, it is often convenient to work with dual vectors, also called covectors or covariant
vectors. A covector can be pictured geometrically as the slope of the (multidimensional) graph of a
function of $x^\mu$. The slope at a chosen point describes the derivative of a function in every
direction at once, in the following sense. The derivative of a function $f(x)$ in the direction
given by a vector $v^\mu$ is

\[ D_v f \equiv v^\mu \frac{\partial f}{\partial x^\mu}. \]

Thus the slope of a function $f(x)$ is adequately described by the collection of components
$s_\mu \equiv \partial f / \partial x^\mu$. If $s_\mu$ is the slope of a function, then the
derivative of that function in a direction $v^\mu$ is the number $s_\mu v^\mu$. Thus, covectors can
be thought of as linear functions of vectors.

A linear function $s$ of a vector $\mathbf{v}$ must be a linear combination of the components
$v^\mu$, $\mu = 0, 1, 2, 3$, i.e. the function $s$ must have the form

\[ s(\mathbf{v}) = s_0 v^0 + s_1 v^1 + s_2 v^2 + s_3 v^3 \equiv s_\mu v^\mu. \]

The numbers $s_\mu$ are the components of the covector $s$. Thus, in view of the Einstein summation
convention, it is natural to denote covectors by letters with lower indices.

A standard example of a covector is the function which maps $\mathbf{x}$ to the scalar product of
$\mathbf{x}$ with a fixed vector $\mathbf{b}$:

\[ s : \mathbf{x} \mapsto g(\mathbf{b}, \mathbf{x}). \]

(This is the slope of the hyperplane $z = g(\mathbf{b}, \mathbf{x})$, where the coordinate $z$ is
the ordinate axis of the plot.) It is easy to see that the components of the covector $s$ are
$s_\mu = g_{\mu\nu} b^\nu$. It is customary to refer to the relation between the vector $\mathbf{b}$
and the covector $s$ as the "lowering of an index" of $b^\mu$, and to denote the two objects
$\mathbf{b}$, $s$ by the same letter (rather than by different letters as I have done here). Thus,
one writes $b_\mu$ instead of $s_\mu$ and calls it "the covariant version of the vector
$\mathbf{b}$" or "the vector $\mathbf{b}$ with the index lowered." In this way, the metric
$g_{\mu\nu}$ can be seen as the operator that "lowers indices," while the inverse metric
$g^{\mu\nu}$ "raises indices." (In a more geometric language, $g_{\mu\nu}$ is a map from vectors to
covectors and $g^{\mu\nu}$ is the inverse map.)

A.3 Transition to General Relativity

Einstein's main motivation for introducing General Relativity was to remove the restriction to
inertial frames and to admit arbitrary non-inertial, that is, accelerated reference frames. Since
gravitation is locally equivalent to an accelerated reference frame (the equivalence principle), he
hoped to achieve a description of gravitation in this way.

Once we admit arbitrary non-inertial reference frames, we must assume that the coordinates $x^\mu$
may be curvilinear, while the metric $g_{\mu\nu}$ may be non-diagonal and generally
coordinate-dependent. Thus, the theory of General Relativity is obtained from Special Relativity by
the following steps:

(1) We replace the Minkowski spacetime $M = \mathbb{R}^4$, which is itself a vector space, by an
arbitrary four-dimensional space $\mathcal{M}$, called the spacetime manifold. Coordinates $x^\mu$
in $\mathcal{M}$ are chosen arbitrarily.

(2) The Minkowski metric $\eta_{\mu\nu} = \mathrm{diag}(1, -1, -1, -1)$ is replaced by a symmetric
tensor $g_{\mu\nu}(x) = g_{\nu\mu}(x)$ which can depend on the coordinates $x^\mu$. Therefore, we
imagine that $\mathcal{M}$ is an arbitrarily curved spacetime rather than a flat Minkowski
spacetime. (A spacetime manifold is flat if there exists a coordinate system in which the metric
tensor is $g_{\mu\nu}(p) = \eta_{\mu\nu}$ at every point $p$. A manifold is curved if no such
coordinate system can be found.)

(3) The signature of the matrix $g_{\mu\nu}$ must remain $(+\,-\,-\,-)$ everywhere. In particular,
for each event $p \in \mathcal{M}$ there must exist a local coordinate system where
$g_{\mu\nu}(p) = \mathrm{diag}(1, -1, -1, -1)$. Moreover, there exists a local coordinate system
(called a locally inertial frame) where the physical laws at the event $p$ are exactly the same as
in Special Relativity in the absence of gravitation. Thus, gravitation is locally equivalent to a
non-inertial reference frame (the equivalence principle).

(4) Physical laws, which were previously made relativistic (compatible with arbitrary inertial
frames and rewritten


through the Minkowski metric), are now reformulated in a A.4 Covariant derivative
way that is independent of the choice of coordinate systems
and admits arbitrary, non-inertial coordinates. In particular, The requirement (4) above means, in particular, that the rela-
in every equation the Minkowski metric is replaced by the ten- tivistic Newtons law (A.1) should be rewritten so that it holds
sor $g_{\mu\nu}(x)$. Scalar products and the raising/lowering of tensor indices are performed using $g_{\mu\nu}(x)$.

(5) The theory prescribes equations (the Einstein equations) that can be used to determine the metric $g_{\mu\nu}(x)$ from a given distribution of matter in the entire spacetime. The Einstein equations involve the curvature of the spacetime and the energy-momentum tensor of matter (see below).

It follows from (1) that the coordinates $x^\mu$ are not themselves 4-vectors any more, and for instance the coordinates $x^\mu$ and $y^\mu$ of two events cannot be simply subtracted to obtain a 4-vector $y^\mu - x^\mu$. Moreover, the theory must be invariant under arbitrary coordinate transformations,
\[ x^\mu \to \tilde{x}^\mu(x^0, x^1, x^2, x^3). \tag{A.2} \]
However, the derivative $\dot{x}^\mu \equiv dx^\mu/d\tau$ actually is a 4-vector because it is the difference of coordinates at infinitesimally close points, and such points belong to an almost-Minkowski local environment which exists according to (3). The fact that $u^\mu \equiv dx^\mu/d\tau$ is a 4-vector can be also formally established by considering the change of the components $u^\mu$ under the transformation (A.2),
\[ u^\mu \to \tilde{u}^\mu = \frac{d\tilde{x}^\mu}{d\tau} = \frac{\partial \tilde{x}^\mu}{\partial x^\nu}\frac{dx^\nu}{d\tau} = \frac{\partial \tilde{x}^\mu}{\partial x^\nu}\,u^\nu, \tag{A.3} \]
which is the usual linear transformation law for components of a vector under a general change of basis.

Hence, the metric $g_{\mu\nu}(x)$ still determines proper times and proper distances, but only between infinitesimally close events. To make this statement, one frequently writes a somewhat strange-looking equation
\[ ds^2 = g_{\mu\nu}\,dx^\mu dx^\nu, \]
where $dx^\mu$ means an infinitesimal displacement between two nearby points, and $ds^2$ is not really a square of any "$ds$" since $ds^2$ may also be negative. (One can regard the above equation simply as a jargon notation, i.e. a meaningless but convenient and widely understood shorthand for the words "the infinitesimal interval is equal to $g_{\mu\nu}\,dx^\mu dx^\nu$.") To determine the proper time interval between two widely separated events $x_{(0)}$ and $x_{(1)}$, we need to select a line in M leading from $x_{(0)}$ to $x_{(1)}$, say a curve $x^\mu(s)$, where $s$ is some parameter such that $x^\mu(0) = x^\mu_{(0)}$ and $x^\mu(1) = x^\mu_{(1)}$, and integrate along the curve,
\[ \int_0^1 \sqrt{g_{\mu\nu}(x^\lambda(s))\,\frac{dx^\mu}{ds}\frac{dx^\nu}{ds}}\;ds. \]
Here we assume that the curve $x^\mu(s)$ is such that $dx^\mu/ds$ is everywhere timelike, so the expression under the square root is positive. Such curves are called timelike worldlines. The resulting time interval will of course depend on the chosen curve $x^\mu(s)$, but not on the choice of the parameter $s$ along the curve. In general, it is not easy to determine whether there exists a timelike curve connecting two given points in the spacetime. It might happen that only a null or a spacelike curve connects two given events. (Two events are timelike separated if they can be connected by a timelike worldline.) Without a detailed analysis of the behavior of the metric $g_{\mu\nu}$ everywhere in M, it is not immediately clear whether two given events can causally influence each other.

in all coordinate systems. An immediate difficulty with this requirement is that the laws of physics involve derivatives of vectors, and taking a derivative of a vector or a tensor in an arbitrary coordinate system in curved space is a somewhat complicated operation. For instance, $d^2x^\mu/d\tau^2$ is not a correct expression for the $\mu$-th component of the 4-acceleration, unless we are using Cartesian coordinates in flat Minkowski spacetime. Let us examine this problem in more detail; we shall begin by considering derivatives in curved coordinates in flat space, and then generalize to curved space.

A.4.1 Curved coordinates

Suppose that $x^\mu(\tau)$ is the worldline of a particle in the flat Minkowski spacetime specified in a rectangular coordinate system $\{x^\mu\}$, and suppose that $\{\tilde{x}^\mu\}$ is a curvilinear coordinate system related to $\{x^\mu\}$ by four given functions $f^\mu(x)$, $\mu = 0, 1, 2, 3$, as $\tilde{x}^\mu = f^\mu(x^0, x^1, x^2, x^3)$. The components of the 4-velocity in the coordinate system $\{x^\mu\}$ are $u^\mu \equiv dx^\mu/d\tau$, and the components of the 4-acceleration are $a^\mu \equiv d^2x^\mu/d\tau^2$. In the coordinate system $\{\tilde{x}^\mu\}$, the new components of the 4-velocity, $\tilde{u}^\mu$, and the 4-acceleration, $\tilde{a}^\mu$, must be calculated according to Eq. (A.3):
\[ \tilde{u}^\mu = \frac{\partial f^\mu}{\partial x^\nu}\,u^\nu, \qquad \tilde{a}^\mu = \frac{\partial f^\mu}{\partial x^\nu}\,a^\nu = \frac{\partial f^\mu}{\partial x^\nu}\frac{du^\nu}{d\tau}. \tag{A.4} \]
The worldline of the particle in the new coordinate system is $\tilde{x}^\mu(\tau) \equiv f^\mu(x(\tau))$, and so one might try to compute the 4-velocity and the 4-acceleration directly in the coordinate system $\{\tilde{x}^\mu\}$ by computing the derivatives $d\tilde{x}^\mu/d\tau$ and $d^2\tilde{x}^\mu/d\tau^2$. Now, the 4-velocity will be computed correctly because, by virtue of the chain rule, $\delta f(x) = (\partial f/\partial x)\,\delta x$, and hence
\[ \frac{d\tilde{x}^\mu}{d\tau} = \frac{\partial f^\mu}{\partial x^\nu}\frac{dx^\nu}{d\tau} = \frac{\partial f^\mu}{\partial x^\nu}\,u^\nu = \tilde{u}^\mu. \]
However, the 4-acceleration $\tilde{a}^\mu$ would be found incorrectly if one uses the formula $d\tilde{u}^\mu/d\tau$:
\[ \frac{d\tilde{u}^\mu}{d\tau} = \frac{d}{d\tau}\left[\frac{\partial f^\mu}{\partial x^\nu}\,u^\nu\right] = u^\nu\frac{d}{d\tau}\left(\frac{\partial f^\mu}{\partial x^\nu}\right) + \frac{\partial f^\mu}{\partial x^\nu}\frac{du^\nu}{d\tau} = u^\nu\frac{d}{d\tau}\left(\frac{\partial f^\mu}{\partial x^\nu}\right) + \tilde{a}^\mu. \tag{A.5} \]
For a general coordinate transformation (not merely a linear change of coordinates where $\partial f^\mu/\partial x^\nu = \text{const}$), we have
\[ \frac{d}{d\tau}\left(\frac{\partial f^\mu}{\partial x^\nu}\right) \neq 0, \]
therefore $\tilde{a}^\mu \neq d\tilde{u}^\mu/d\tau$. So the formula $d\tilde{u}^\mu/d\tau$ cannot be used in a general coordinate system $\{\tilde{x}^\mu\}$ to compute the components of the 4-acceleration $\tilde{a}^\mu$. The reason for the problem is that the coordinate system $\{\tilde{x}^\mu\}$ is curved and so the derivative $d\tilde{u}^\mu/d\tau$ reflects not only the change in the 4-velocity, but also the change in the directions of coordinate axes.

A correct expression for $\tilde{a}^\mu$ is given in Eq. (A.4). Nevertheless, we would like to have a formula for $\tilde{a}^\mu$ that can be used directly in the coordinate system $\{\tilde{x}^\mu\}$. The solution of the problem is evident from Eq. (A.5),
\[ \tilde{a}^\mu = \frac{d\tilde{u}^\mu}{d\tau} - u^\nu\frac{d}{d\tau}\left(\frac{\partial f^\mu}{\partial x^\nu}\right). \]
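The difference between the naive second derivative and the correctly transformed acceleration can be seen in a small numerical experiment. The Python sketch below is only an illustration under stated assumptions: it replaces Minkowski spacetime by the flat Euclidean plane, takes uniform motion on the unit circle as the worldline, and uses polar coordinates (r, phi) as the curvilinear system; the function names are hypothetical and derivatives are estimated by finite differences.

```python
import math

def worldline(tau):
    """Flat (Cartesian) coordinates of a point moving on the unit circle."""
    return (math.cos(tau), math.sin(tau))

def to_polar(x, y):
    """The functions f^mu: Cartesian (x, y) -> curvilinear (r, phi)."""
    return (math.hypot(x, y), math.atan2(y, x))

def second_derivative(func, tau, h=1e-4):
    """Central finite-difference estimate of d^2 func / d tau^2, per component."""
    lo, mid, hi = func(tau - h), func(tau), func(tau + h)
    return tuple((a - 2.0 * b + c) / h**2 for a, b, c in zip(lo, mid, hi))

def jacobian(x, y):
    """Matrix of partial derivatives d f^mu / d x^nu for the polar map."""
    r2 = x * x + y * y
    r = math.sqrt(r2)
    return ((x / r, y / r), (-y / r2, x / r2))

def accelerations(tau):
    """Return (transformed acceleration as in Eq. (A.4), naive d^2 x~ / d tau^2)."""
    x, y = worldline(tau)
    a_flat = second_derivative(worldline, tau)
    J = jacobian(x, y)
    a_correct = tuple(sum(J[m][n] * a_flat[n] for n in range(2)) for m in range(2))
    a_naive = second_derivative(lambda t: to_polar(*worldline(t)), tau)
    return a_correct, a_naive

if __name__ == "__main__":
    correct, naive = accelerations(0.3)
    print("transformed acceleration (r, phi):", correct)
    print("naive second derivative  (r, phi):", naive)
```

In polar coordinates the worldline is r = 1, phi = tau, so the naive second derivative vanishes, while the transformed acceleration has a radial component close to -1 (the centripetal acceleration). The mismatch is exactly the correction term in the last formula.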

A.4 Covariant derivative

Here, $u^\nu$ can be expressed through $\tilde{u}^\mu$ using Eq. (A.4). Therefore, the 4-acceleration $\tilde{a}^\mu$ in a curved coordinate system can be computed from $d\tilde{u}^\mu/d\tau$ and $\tilde{u}^\mu$, if additionally we know the functions $f^\mu(x^\nu)$, $\mu = 0, 1, 2, 3$, that relate the curved coordinates to the flat ones. Note that $d/d\tau$ is effectively a derivative in the direction of the 4-velocity $u^\mu$ and can be applied to arbitrary functions of coordinates as
\[ \frac{d}{d\tau}(\ldots) = u^\alpha\frac{\partial}{\partial x^\alpha}(\ldots). \]
Since $f^\mu$ are functions of coordinates (not of $\tau$), we may write
\[ \frac{d}{d\tau}\left(\frac{\partial f^\mu}{\partial x^\nu}\right) = u^\alpha\frac{\partial}{\partial x^\alpha}\left(\frac{\partial f^\mu}{\partial x^\nu}\right). \]
Therefore, we can compute the derivative of a vector field $A^\mu(x)$ in the direction $u^\mu$ in any coordinate system: in flat coordinates $\{x^\mu\}$ as
\[ \frac{d}{d\tau}A^\mu \equiv u^\alpha\frac{\partial A^\mu}{\partial x^\alpha} \]
and in curved coordinates $\{\tilde{x}^\mu\}$ as
\[ \tilde{u}^\alpha\left[\frac{\partial \tilde{A}^\mu}{\partial \tilde{x}^\alpha} - A^\nu\frac{\partial^2 f^\mu}{\partial \tilde{x}^\alpha\,\partial x^\nu}\right] = \tilde{u}^\alpha\left[\frac{\partial \tilde{A}^\mu}{\partial \tilde{x}^\alpha} - \tilde{A}^\lambda\frac{\partial x^\nu}{\partial f^\lambda}\frac{\partial^2 f^\mu}{\partial \tilde{x}^\alpha\,\partial x^\nu}\right]. \]
The expression in brackets above is called the covariant derivative and denoted either by a semicolon or by the symbol $\nabla$ (pronounced "nabla" or "del"),
\[ \nabla_\alpha \tilde{A}^\mu \equiv \tilde{A}^\mu{}_{;\alpha} \equiv \frac{\partial \tilde{A}^\mu}{\partial \tilde{x}^\alpha} - \tilde{A}^\lambda\frac{\partial x^\nu}{\partial f^\lambda}\frac{\partial^2 f^\mu}{\partial \tilde{x}^\alpha\,\partial x^\nu}. \tag{A.6} \]
In this notation, the derivative of a vector $A^\mu$ in the direction $u^\alpha$ is written as $u^\alpha A^\mu{}_{;\alpha}$ or $u^\alpha\nabla_\alpha A^\mu$.

By construction, the covariant derivative $\tilde{A}^\mu{}_{;\alpha}$ is found by first computing the derivative $\partial A^\lambda/\partial x^\nu$ in flat coordinates (where we know the correct way to differentiate vectors) and then recalculating the components to the curved coordinates $\{\tilde{x}^\mu\}$ by using the transformation law for tensors,
\[ \tilde{A}^\mu{}_{;\alpha} = \frac{\partial A^\lambda}{\partial x^\nu}\,\frac{\partial f^\mu}{\partial x^\lambda}\,\frac{\partial x^\nu}{\partial f^\alpha}. \]
The formula (A.6) gives these components $\tilde{A}^\mu{}_{;\alpha}$ directly through the components $\tilde{A}^\mu(\tilde{x})$, without need to know the components $A^\mu$ in flat coordinates. Of course, the components $\tilde{A}^\mu{}_{;\alpha}$ transform correctly (covariantly) under a change of coordinates. This is the origin of the name "covariant derivative."

Remark: I would like to emphasize that the covariant derivative $\nabla_\alpha A^\mu$ is not a different kind of derivative. If the notion of the covariant derivative causes you difficulties, it might be instructive to focus attention on the case of flat space. In flat space, there is already a well-defined, familiar directional derivative of vectors, namely $\partial A^\mu/\partial x^\alpha$, where $\{x^\alpha\}$ is a flat (Cartesian) coordinate system. However, we have seen that the formula $\partial A^\mu/\partial x^\alpha$ only holds in flat coordinates, while in curved coordinates the correct expression for the directional derivative is the formula (A.6). Thus, the covariant derivative (A.6) can be thought of as a covariant formula for the (already well-defined and familiar) directional derivative of a vector field in flat space. The covariant formula is preferable because it holds in curved coordinates as well as in Cartesian coordinates.

Suppose that $\{\tilde{x}^\mu\}$ were actually the same coordinate system as $\{x^\mu\}$; then we would have $\partial^2 f^\mu/(\partial\tilde{x}^\alpha\,\partial x^\nu) = 0$ and Eq. (A.6) would still define the correct derivative $\partial A^\mu/\partial x^\alpha$. Therefore, the covariant derivative reduces to the coordinate derivative $\partial/\partial x^\alpha$ in flat coordinates, and so it makes sense to use the covariant derivative always. A tensor expression involving covariant derivatives, such as
\[ a^\mu = u^\alpha u^\mu{}_{;\alpha}, \]
is valid in every coordinate system.

A more concise way to represent the covariant derivative (A.6) is to write
\[ \nabla_\alpha \tilde{A}^\mu \equiv \tilde{A}^\mu{}_{;\alpha} \equiv \frac{\partial \tilde{A}^\mu}{\partial \tilde{x}^\alpha} + \Gamma^\mu_{\alpha\lambda}\tilde{A}^\lambda, \qquad \Gamma^\mu_{\alpha\lambda} \equiv -\frac{\partial x^\nu}{\partial f^\lambda}\,\frac{\partial^2 f^\mu}{\partial \tilde{x}^\alpha\,\partial x^\nu}. \]
Once the quantities $\Gamma^\mu_{\alpha\lambda}$ are computed for a given coordinate system $\{\tilde{x}^\mu\}$, it is easy to differentiate vectors directly in that coordinate system, without needing to convert tensors back to flat coordinates every time.

Note that $\Gamma^\mu_{\alpha\lambda}$ is an array of numbers that depends on the coordinate system in a nontrivial way and, despite its appearance as an indexed quantity, is not a tensor of rank (1,2). The transformation law for $\Gamma^\mu_{\alpha\lambda}$ is given in Eq. (A.9) below.

A.4.2 Curved space and induced metric

In the previous section, we found how a vector field $A^\mu(x)$ should be differentiated in an arbitrary, curved coordinate system in flat space. The method was to differentiate $A^\mu$ in a flat coordinate system and then recalculate the components to the curved coordinates. The result was a covariant formula for the derivative, $\nabla_\alpha A^\mu$. Let us now consider a more general case: a curved space.

A general manifold M can be visualized as a non-straight (curved) surface in flat Euclidean space $\mathbb{R}^n$. A direction within the manifold is represented by a vector tangent to the surface. Since tangent vectors are at the same time vectors in $\mathbb{R}^n$, we can compute scalar products of such tangent vectors by using the standard scalar product in $\mathbb{R}^n$. Thus we have automatically defined a metric on the surface. (A metric on a manifold is a quadratic form on tangent vectors.) This naturally defined metric on the surface is called the induced metric on the surface with respect to the given embedding in $\mathbb{R}^n$. The idea is that the known scalar product in $\mathbb{R}^n$ induces the scalar product of tangent vectors.

Example: A unit 2-sphere in three-dimensional Euclidean space with coordinates $\{x, y, z\}$ is defined as the locus of points satisfying
\[ x^2 + y^2 + z^2 = 1. \]
The 2-sphere is a curved two-dimensional manifold denoted $S^2$. A tangent vector to the sphere at a point $\{x, y, z\}$ can be visualized as a three-dimensional vector with components $\{v_1, v_2, v_3\}$ such that
\[ x v_1 + y v_2 + z v_3 = 0. \]
(This equation selects the tangent plane at point $\{x, y, z\}$ on the sphere.) The scalar product of two tangent vectors $\{u_1, u_2, u_3\}$ and $\{v_1, v_2, v_3\}$ is the usual Euclidean product,
\[ u \cdot v = u_1 v_1 + u_2 v_2 + u_3 v_3. \]
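The induced scalar product of this example can be checked numerically. The Python sketch below is only an illustration: it assumes the standard parametrization of the unit sphere by a polar angle theta and an azimuthal angle phi (the function names embed, tangent_basis and induced_metric are hypothetical), builds two tangent vectors by finite differences, and evaluates their Euclidean scalar products.

```python
import math

def embed(theta, phi):
    """Assumed embedding of the unit sphere: polar angle theta, azimuth phi."""
    return (math.sin(theta) * math.cos(phi),
            math.sin(theta) * math.sin(phi),
            math.cos(theta))

def tangent_basis(theta, phi, h=1e-6):
    """Central-difference estimates of dX/dtheta and dX/dphi (3-vectors)."""
    def diff(p_plus, p_minus):
        return tuple((a - b) / (2.0 * h) for a, b in zip(p_plus, p_minus))
    e_theta = diff(embed(theta + h, phi), embed(theta - h, phi))
    e_phi = diff(embed(theta, phi + h), embed(theta, phi - h))
    return e_theta, e_phi

def dot(u, v):
    """Standard Euclidean scalar product in R^3."""
    return sum(a * b for a, b in zip(u, v))

def induced_metric(theta, phi):
    """2x2 matrix of Euclidean scalar products of the tangent basis vectors."""
    basis = tangent_basis(theta, phi)
    return [[dot(ei, ej) for ej in basis] for ei in basis]

if __name__ == "__main__":
    for row in induced_metric(0.8, 1.3):
        print(row)
```

Both basis vectors are orthogonal to the position vector, as the tangent-plane condition requires, and the resulting matrix is diagonal with entries close to 1 and sin^2(theta).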

A Elements of Special and General Relativity

Points on the sphere may be labeled by intrinsic coordinates, e.g. by the spherical coordinates $\{\theta, \phi\}$. The point $\{\theta, \phi\}$ has Euclidean coordinates $\{X, Y, Z\}$, where $X, Y, Z$ are functions of $\theta$ and $\phi$,
\[ X = \sin\theta\cos\phi, \qquad Y = \sin\theta\sin\phi, \qquad Z = \cos\theta. \]
A tangent vector with three-dimensional components $\{v_1, v_2, v_3\}$ is visualized as a velocity of a point moving within the sphere. If this point moves along a trajectory $\gamma(\tau) \equiv \{\theta(\tau), \phi(\tau)\}$, where $\theta$ and $\phi$ are some functions, then the velocity vector is described by two intrinsic components $\{t_1, t_2\}$,
\[ v = \{t_1, t_2\} = \left\{\frac{d\theta}{d\tau}, \frac{d\phi}{d\tau}\right\}. \]
It is straightforward to see that the three-dimensional components $\{v_1, v_2, v_3\}$ of the same velocity vector $v$ are found as
\[ v_1 = \frac{\partial X}{\partial\theta}\frac{d\theta}{d\tau} + \frac{\partial X}{\partial\phi}\frac{d\phi}{d\tau}, \quad \text{etc.} \]
Thus the correspondence between the intrinsic components $\{t_1, t_2\}$ and the three-dimensional components $\{v_1, v_2, v_3\}$ is a linear transformation,
\[ v_1 = t_1\frac{\partial X}{\partial\theta} + t_2\frac{\partial X}{\partial\phi}, \qquad v_2 = t_1\frac{\partial Y}{\partial\theta} + t_2\frac{\partial Y}{\partial\phi}, \qquad v_3 = t_1\frac{\partial Z}{\partial\theta} + t_2\frac{\partial Z}{\partial\phi}. \]
The scalar product of two tangent vectors $u$ and $v$ can be written explicitly through the intrinsic components $u \equiv \{s_1, s_2\}$ and $v \equiv \{t_1, t_2\}$ as
\[ u \cdot v = u_1 v_1 + u_2 v_2 + u_3 v_3 = \left(s_1\frac{\partial X}{\partial\theta} + s_2\frac{\partial X}{\partial\phi}\right)\left(t_1\frac{\partial X}{\partial\theta} + t_2\frac{\partial X}{\partial\phi}\right) + \ldots \]
After a short calculation with the functions $X, Y, Z$ given above, we find
\[ u \cdot v = s_1 t_1 + \sin^2\theta\; s_2 t_2. \]
Thus the induced metric on the sphere (in coordinates $\{\theta, \phi\}$) is written as
\[ ds^2 = d\theta^2 + \sin^2\theta\, d\phi^2. \]
Note that there is no set of coordinates $\{x, y\}$ that would make a sphere a flat manifold, i.e. a space with the Euclidean metric $dx^2 + dy^2$. (If that were possible, there would be a unique shortest path connecting opposite poles of the sphere, which is clearly not the case.)

A.4.3 Covariant derivative

We now need to derive a formula for the directional derivative for vector fields in a curved space, when a flat coordinate system is not available. The formula must involve only intrinsic coordinates on the manifold and require no information about an embedding of M into a larger space.

It is a known theorem in differential geometry that any manifold M with a given metric $g_{\mu\nu}$ can be represented as a hypersurface embedded into a higher-dimensional flat space (the higher-dimensional space is sometimes called the bulk in modern physics). Then the flat coordinates $\{X^A\}$ and the flat metric $G_{AB}$ are available in the bulk, and the metric $g_{\mu\nu}$ defined within the manifold M is equal to the induced metric with respect to the embedding into the bulk. A vector field defined within the manifold M is visualized as a field of vectors $A^\mu(x)$ everywhere tangent to the surface. We can compute the derivative of a tangent vector $A$ with respect to a tangent direction $u$ as $u^B\,\partial A^A/\partial X^B$, but this vector may have a nonzero component normal to the surface. If we simply discard this component, i.e. if we project the vector $u^B\,\partial A^A/\partial X^B$ orthogonally onto the surface, we obtain a vector tangent to the surface. Let us denote this vector by $u^\alpha\nabla_\alpha A^\mu$ (this notation is justified since the projection of $u^B\,\partial A^A/\partial X^B$ is linear in $u$). The tensor $\nabla_\alpha A^\mu$ is called the covariant derivative of the vector field $A^\mu$. Thus we have obtained a derivative operation $\nabla$ defined on tangent vectors within the manifold M. Similarly, the covariant derivative is defined on arbitrary tensors.

The procedure we employed to define the covariant derivative involves the orthogonal projection of $\partial A^A/\partial X^B$ onto the tangent space of the hypersurface M. It may be unclear why it is useful to apply such a projection rather than some other operation. One can motivate this procedure by the following considerations. Imagine that a massive body is constrained to move without friction along a hypersurface M, while no other forces influence its motion. Further, imagine that the motion of the body is observed by a "flat observer" who lives entirely within the hypersurface and thus cannot see the extra dimensions. To the flat observer, such a body appears to be unforced because the constraining force normal to the surface is invisible. Therefore, the flat observer expects that the acceleration of the body, as seen from within the hypersurface, is equal to zero. Since the force acting on the body is always normal to the hypersurface, indeed the tangential components of the acceleration are always zero (but not the normal component). Therefore, the orthogonal projection of the acceleration onto the tangent space is always zero. This condition is equivalent to saying that the covariant derivative of the velocity vector field $u^\mu$ in the direction of motion is equal to zero, $u^\alpha\nabla_\alpha u^\mu = 0$. Therefore, the construction of the covariant derivative through the orthogonal projection indeed expresses the idea that the velocity of a freely moving body remains constant along the trajectory.

Note that the normal projection is constructed through the metric $G_{AB}$ and depends on the embedding of the hypersurface M into the flat space. However, the derivative operation $\nabla$ can be written solely in terms of intrinsic coordinates $\{x^\mu\}$ within the manifold M and the metric $g_{\mu\nu}$, without referring to any embedding into a larger space. The derivation is standard and somewhat lengthy, so we omit it.[2] We present only the resulting final formula for the covariant derivative,
\[ \nabla_\alpha u^\mu = \frac{\partial u^\mu}{\partial x^\alpha} + \Gamma^\mu_{\alpha\beta}u^\beta, \tag{A.7} \]
\[ \Gamma^\mu_{\alpha\beta} \equiv \frac{1}{2}\,g^{\mu\nu}\left(g_{\nu\alpha,\beta} + g_{\nu\beta,\alpha} - g_{\alpha\beta,\nu}\right). \tag{A.8} \]

[2] A derivation of an explicit formula for the covariant derivative is performed in Sec. 1.6.6 using coordinate-free calculations.

The array of numbers $\Gamma^\mu_{\alpha\beta}$ is called the Christoffel symbol corresponding to the given coordinate system and the metric $g_{\mu\nu}$. As before, the numbers $\Gamma^\mu_{\alpha\beta}$ depend on the coordinate system in such a way that the sum $\partial u^\mu/\partial x^\alpha + \Gamma^\mu_{\alpha\beta}u^\beta$ transforms correctly as a rank (1,1) tensor under an arbitrary change of coordinates, even though the two terms $\partial_\alpha u^\mu$ and $\Gamma^\mu_{\alpha\beta}u^\beta$, taken separately, do not transform correctly. The required transformation law for $\Gamma^\mu_{\alpha\beta}$ is easy to derive. Let us


temporarily write $\tilde{\Gamma}^\mu_{\alpha\beta}$ with a tilde when the covariant derivative is computed in the coordinate system $\{\tilde{x}^\mu\}$ (this is only for clarity; normally we do not use such a notation). We assume that $\nabla_\alpha u^\mu$ obeys the transformation law for rank (1,1) tensors,
\[ \tilde{\nabla}_\alpha\tilde{u}^\mu = \frac{\partial\tilde{x}^\mu}{\partial x^\nu}\,\frac{\partial x^\beta}{\partial\tilde{x}^\alpha}\,\nabla_\beta u^\nu, \]
and substitute
\[ \nabla_\beta u^\nu \equiv \frac{\partial u^\nu}{\partial x^\beta} + \Gamma^\nu_{\beta\lambda}u^\lambda, \qquad \tilde{\nabla}_\alpha\tilde{u}^\mu \equiv \frac{\partial\tilde{u}^\mu}{\partial\tilde{x}^\alpha} + \tilde{\Gamma}^\mu_{\alpha\lambda}\tilde{u}^\lambda. \]
The result is
\[ \tilde{\Gamma}^\mu_{\alpha\beta} = \frac{\partial\tilde{x}^\mu}{\partial x^\nu}\,\frac{\partial x^\lambda}{\partial\tilde{x}^\alpha}\,\frac{\partial x^\rho}{\partial\tilde{x}^\beta}\,\Gamma^\nu_{\lambda\rho} + \frac{\partial\tilde{x}^\mu}{\partial x^\nu}\,\frac{\partial^2 x^\nu}{\partial\tilde{x}^\alpha\,\partial\tilde{x}^\beta}. \tag{A.9} \]
The above transformation law contains a term that is not proportional to $\Gamma$, so the numbers $\Gamma^\mu_{\alpha\beta}$ may all vanish in one coordinate system but become nonzero in another. Clearly, the numbers $\Gamma^\mu_{\alpha\beta}$ cannot represent the components of any tensor. A quantity with the transformation law (A.9) is called a connection.

A.4.4 Properties of covariant derivative

As defined above by Eq. (A.7), the covariant derivative operator $\nabla_\alpha$ produces tensors of rank (1,1) out of vectors. Despite an extra term in the formula (A.7), the derivative $\nabla_\alpha$ satisfies the usual properties, such as
\[ \nabla_\alpha(f A^\mu) = f\,\nabla_\alpha A^\mu + A^\mu\,\nabla_\alpha f, \]
where $f$ is a scalar function and $A^\mu$ is a vector field. This is to be expected since $\nabla_\alpha$ can be thought of as merely the standard derivative operator $\partial_\alpha$ recalculated in a curved coordinate system. Note that $\nabla_\alpha f \equiv \partial f/\partial x^\alpha$ is an ordinary derivative, but no harm is done by writing $\nabla_\alpha f$ instead of $\partial_\alpha f$.

Since the derivative operation $\nabla$ is a projection of the ordinary directional derivative $\partial/\partial X^A$ in the embedding space, it follows that $\nabla$ satisfies the standard properties of a derivative, such as linearity,
\[ \nabla_\alpha\left(A^{\ldots}{}_{\ldots} + B^{\ldots}{}_{\ldots}\right) = \nabla_\alpha A^{\ldots}{}_{\ldots} + \nabla_\alpha B^{\ldots}{}_{\ldots}, \]
and the Leibnitz rule, for instance
\[ \nabla_\alpha(a^\mu b_\nu) = (\nabla_\alpha a^\mu)\,b_\nu + a^\mu\,(\nabla_\alpha b_\nu). \]
Using this property, a generalization of the formula (A.7) can be easily found for covectors or arbitrary tensors. In particular, we demand that
\[ \nabla_\alpha(a_\mu b^\mu) = (\nabla_\alpha a_\mu)\,b^\mu + a_\mu\,\nabla_\alpha b^\mu, \qquad \nabla_\alpha(x^\mu y_\mu) = \partial_\alpha(x^\mu y_\mu). \]
From this we can derive an explicit formula for the covariant derivative for an arbitrary tensor, in terms of coordinate derivatives and the Christoffel symbol. For example,
\[ \nabla_\alpha a_\mu = \partial_\alpha a_\mu - \Gamma^\beta_{\alpha\mu}a_\beta, \qquad \nabla_\alpha T^\mu{}_\nu = \partial_\alpha T^\mu{}_\nu + \Gamma^\mu_{\alpha\beta}T^\beta{}_\nu - \Gamma^\beta_{\alpha\nu}T^\mu{}_\beta. \]

For brevity, partial derivatives with respect to a coordinate are often denoted by an index with a preceding comma,
\[ u^\mu{}_{,\alpha} \equiv \frac{\partial u^\mu}{\partial x^\alpha}, \]
while covariant derivatives are denoted by an index with a preceding semicolon, $\nabla_\alpha u^\mu \equiv u^\mu{}_{;\alpha}$.

All the physical laws can be formulated in a generally covariant way (i.e., valid in any coordinate system) if we replace all coordinate derivatives $\partial_\mu$ by covariant derivatives $\nabla_\mu$. Thus, the generally covariant analog of Eq. (A.1) is
\[ m\,u^\alpha\nabla_\alpha u^\mu = f^\mu. \tag{A.10} \]
Here $u^\mu \equiv dx^\mu/d\tau$ is the 4-velocity of the particle, and we replaced $d/d\tau$ by the covariant derivative $u^\alpha\nabla_\alpha$. So far, all we have achieved is a rewriting of the known physical laws in an arbitrary coordinate system; no new physical information is introduced or derived. The second step towards General Relativity is to postulate that the correct form of the physical equations (e.g., Newton's law or the Maxwell equations) is obtained from the known laws in Special Relativity by substituting an arbitrary (but nondegenerate and Lorentzian-signature) spacetime-dependent metric $g_{\mu\nu}(x)$ instead of the flat Minkowski metric $\eta_{\mu\nu}$ and by replacing derivatives $\partial_\mu$ by covariant derivatives $\nabla_\mu$, where the Christoffel symbol $\Gamma^\mu_{\alpha\beta}$ is defined by Eq. (A.8). This step of course leads to certain changes in the physical laws due to a different $g_{\mu\nu}$ and a nontrivial $\Gamma^\mu_{\alpha\beta}$; these changes are interpreted as the influence of gravitation. The result is a reformulation of all the laws of physics in an arbitrary curved spacetime. This is, of course, a major assumption about the way gravitation affects the behavior of particles and fields: gravitation is not a special force but the natural and unavoidable influence of the nontrivial geometry on matter in a curved spacetime. Ultimately, this statement must be tested by experiments. The experimental status of General Relativity is quite satisfactory at present. The widespread acceptance of GR is due to the experimental confirmation as well as to the simplicity of this theory compared with other theories of gravitation.

A.4.5 Choice of connection

We have shown that the transformation law (A.9) can be derived merely from the requirement that $\nabla_\alpha A^\mu$ should transform as a tensor of rank (1,1). It follows that any connection $\Gamma^\mu_{\alpha\beta}$ that transforms according to Eq. (A.9) will give rise to a tensor
\[ \frac{\partial A^\mu}{\partial x^\alpha} + \Gamma^\mu_{\alpha\beta}A^\beta. \]
So one may consider alternative covariant derivatives of the form (A.7), where $\Gamma^\mu_{\alpha\beta}$ is not necessarily the Christoffel connection. Since the physical influence of gravity is described by terms containing $\Gamma$, a natural question is to decide which connection is "correct."

In GR, one chooses the Christoffel symbol (A.8) as the connection. This can be justified on the basis of experimental data confirming GR, and also by considerations of mathematical simplicity. In fact, the choice (A.8) is equivalent to the following two conditions: (i) $\Gamma^\mu_{\alpha\beta} = \Gamma^\mu_{\beta\alpha}$, and (ii) $g_{\alpha\beta;\mu} = 0$. Let us review various arguments supporting these assumptions.[3]

[3] I wish to emphasize that these are physical assumptions, i.e. something


to be ultimately tested by experiments. Theories similar to GR but with $\Gamma^\mu_{\alpha\beta} \neq \Gamma^\mu_{\beta\alpha}$ or $g_{\alpha\beta;\mu} \neq 0$ can be constructed as well. Although such theories are of course more complicated than GR, physicists do consider their physical consequences and test them experimentally. So far, no alternative theory of gravitation has been shown to surpass GR in correct experimental predictions.

The condition (i) can be seen as a consequence of the equivalence principle, which says that each event $p$ admits a locally inertial frame where Special Relativity holds in an infinitesimal neighborhood of $p$. In other words, there exists a coordinate system $x^\mu$ in which the physical laws hold with ordinary derivatives $\partial/\partial x^\mu$ instead of $\nabla_\mu$. Hence, at the event $p$ we have $\Gamma^\mu_{\alpha\beta}(p) = 0$. Recalculating $\Gamma^\mu_{\alpha\beta}$ for other reference frames using Eq. (A.9), we find that $\Gamma^\mu_{\alpha\beta}(p) = \Gamma^\mu_{\beta\alpha}(p)$ in every reference frame. Since this argument holds for any event $p$, we find that the equivalence principle entails the condition $\Gamma^\mu_{\alpha\beta} = \Gamma^\mu_{\beta\alpha}$. Alternatively, we may demand that the covariant derivatives commute on functions, i.e. $f_{;\alpha\beta} = f_{;\beta\alpha}$, where $f$ is any scalar function. It can be easily seen that this condition also forces $\Gamma^\mu_{\alpha\beta} = \Gamma^\mu_{\beta\alpha}$.

The condition (ii) is another fundamental assumption that can be motivated in many ways. For instance, $g_{\mu\nu}$ in the Minkowski spacetime is given by a constant matrix, $g_{\mu\nu} = \eta_{\mu\nu}$, thus $\partial_\alpha g_{\mu\nu} = 0$, while we expect that the same property (but involving the covariant derivative) should hold in curved spacetimes. Alternatively, we may require that the operation of lowering an index of a vector or a tensor should commute with the covariant derivative:
\[ g_{\mu\nu}A^\nu{}_{;\alpha} = (A_\mu)_{;\alpha}, \quad \text{where } A_\mu \equiv A^\nu g_{\mu\nu}. \]
This property is clearly equivalent to $g_{\mu\nu;\alpha} = 0$. Alternatively, let us consider the properties of a "locally constant" vector field. A vector field $u^\mu$ is locally constant if its covariant derivative vanishes at a point $p$, i.e. $u^\mu{}_{;\alpha}(p) = 0$. (In a locally inertial frame at $p$, this condition is identical to $\partial_\alpha u^\mu(p) = 0$.) If the vector $u^\mu$ is constant in this sense, it is natural to expect that the length of the vector $u^\mu$ is also locally constant: $\partial_\alpha(g_{\mu\nu}u^\mu u^\nu)|_p = 0$. However, this condition is equivalent to $g_{\mu\nu;\alpha}u^\mu u^\nu = 0$. Since the direction $u^\mu$ and the point $p$ are arbitrary, we must require that $g_{\mu\nu;\alpha} = 0$ everywhere.

It is a straightforward exercise to derive an explicit form of $\Gamma^\mu_{\alpha\beta}$ from the conditions (i) and (ii), and we omit the derivation (which can be found in standard textbooks). The result is the formula (A.8). Thus the intrinsic metric $g_{\mu\nu}$ uniquely defines the Christoffel symbol and the covariant derivative. In General Relativity, one never uses any other connection than the Christoffel symbol given above.

Remark: The constancy of the metric under the covariant derivative, $\nabla_\alpha g_{\mu\nu} = 0$, is a direct consequence of the construction of $\nabla$ through a projection from an embedding of the manifold M as a hypersurface in a bulk space with coordinates $X^A$. Since the bulk has a flat metric $G_{AB}$, we have $\partial_C G_{AB} = 0$, where $\partial_C \equiv \partial/\partial X^C$ is the derivative with respect to the flat coordinates in the bulk. Suppose $u$ and $v$ are two tangent vector fields; then by definition of the induced metric, $g_{\mu\nu}u^\mu v^\nu = G_{AB}u^A v^B$. Hence,
\[ \partial_C\left(G_{AB}u^A v^B\right) = G_{AB}\,(\partial_C u^A)\,v^B + G_{AB}\,u^A\,(\partial_C v^B). \]
Since $u$ and $v$ are tangent, the normal component of $\partial_C u^A$ will disappear from the scalar product $G_{AB}(\partial_C u^A)v^B$. Thus, for a derivative along a tangent direction $\alpha$,
\[ \partial_\alpha\left(G_{AB}u^A v^B\right) = g_{\mu\nu}\,(\nabla_\alpha u^\mu)\,v^\nu + g_{\mu\nu}\,u^\mu\,(\nabla_\alpha v^\nu). \]
On the other hand,
\[ \partial_\alpha\left(G_{AB}u^A v^B\right) = \partial_\alpha\left(g_{\mu\nu}u^\mu v^\nu\right), \]
and it follows that $\nabla_\alpha g_{\mu\nu} = 0$.

A.5 Curvature

A.5.1 Parallel transport

The concept of parallel transport of vectors on a curved manifold M can be easily visualized if we think of the manifold M as a surface embedded in flat space. Suppose a path $\gamma$ is given within the surface, with a tangent vector $v$. An arbitrary tangent vector $u$ can be transported along the path $\gamma$ in such a way that the derivative along the path in the bulk space, $v^B\,\partial u^A/\partial X^B$, is everywhere normal to the surface. Since (by definition) $\nabla$ is $\partial/\partial X$ projected onto the tangent plane, the condition that $v^B\,\partial u^A/\partial X^B$ has a vanishing tangential projection is equivalent to the condition $v^\alpha\nabla_\alpha u^\mu = 0$.

This motivates the following definition: A vector $u^\mu$ is parallelly transported along a path $\gamma$ if $v^\alpha\nabla_\alpha u^\mu = 0$, where $v^\alpha$ is a tangent vector to the path.

The covariant derivative can then be given a different interpretation using the parallel transport operation. Let us denote by $T_{\varepsilon a}u$ the parallel transport of a vector $u$ along a straight line segment $\varepsilon a^\mu$. Then we can say that the covariant derivative measures the rate of change in the vector $u$ compared with the change due to the parallel transport:
\[ a^\alpha\nabla_\alpha u = \lim_{\varepsilon\to 0}\frac{u(x + \varepsilon a) - T_{\varepsilon a}u(x)}{\varepsilon}. \]
It is easy to see that the scalar product of vectors is constant under a parallel transport:
\[ v^\alpha\nabla_\alpha\left(g_{\mu\nu}u^\mu w^\nu\right) = 0 \quad \text{if} \quad v^\alpha\nabla_\alpha u^\mu = v^\alpha\nabla_\alpha w^\mu = 0. \]
Note that in flat space we may introduce flat coordinates where we have $\nabla_\alpha = \partial_\alpha$, and thus a vector $u$ is parallelly transported along any curve iff all the components $u^\mu$ remain constant. If we use curved coordinates in flat space, the components of a vector may change during a parallel transport because the directions of the basis vectors change in space. However, if we execute a parallel transport of a vector along a closed path, the components of the vector will return to their initial values.

Statement: Show that the operator $a^\alpha\nabla_\alpha$ (acting on an arbitrary vector field $u$) commutes with parallel transport along $a$. In other words,
\[ a^\alpha\nabla_\alpha\left(T_{\varepsilon a}u\right) = T_{\varepsilon a}\left(a^\alpha\nabla_\alpha u\right). \]

A.5.2 Riemann tensor

In flat space, the operation of parallel transport along a closed path does not change any vectors. However, in a curved manifold, this is no longer true; e.g., on a spherical surface, a parallel transport of a vector along a spherical triangle will generally rotate the vector. The curvature of a manifold measures the failure of parallel transport around a closed curve to preserve vectors.

In general, a vector $u$ parallelly transported along a closed path will undergo a linear transformation (a rotation and a


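dilation). We may describe this transformation by a matrix $T^\mu{}_\nu(\gamma)$, so that the new vector is expressed as $\tilde{u}^\mu = T^\mu{}_\nu(\gamma)\,u^\nu$. In flat space, parallel transport along a closed path will not change any vectors (even in curved coordinates!) and thus $T^\mu{}_\nu(\gamma) \equiv \delta^\mu_\nu$. In the general case, the dependence of $T^\mu{}_\nu(\gamma)$ on the path $\gamma$ is complicated but can be simplified if we consider infinitesimally small closed paths. A simple example of such a path is a parallelogram spanned by two vectors $a$ and $b$ with sides $\varepsilon a^\mu$ and $\varepsilon b^\mu$. One can compute the change in the vector $u^\mu$ after a parallel transport along the parallelogram (see the calculation below). The result is
\[ \tilde{u}^\mu = u^\mu + \varepsilon^2 R^\mu{}_{\nu\beta\alpha}\,u^\nu a^\alpha b^\beta, \]
where $R^\mu{}_{\nu\beta\alpha}$ is called the Riemann tensor (also called the curvature tensor), which is given by the formula
\[ R^\mu{}_{\nu\beta\alpha} = \partial_\beta\Gamma^\mu_{\alpha\nu} - \partial_\alpha\Gamma^\mu_{\beta\nu} + \Gamma^\mu_{\beta\lambda}\Gamma^\lambda_{\alpha\nu} - \Gamma^\mu_{\alpha\lambda}\Gamma^\lambda_{\beta\nu}. \tag{A.11} \]
Note that the Christoffel symbol involves first derivatives of the metric $g_{\mu\nu}$, and thus the Riemann tensor involves second derivatives of the metric.

One also defines the Ricci tensor, $R_{\mu\nu} \equiv R^\alpha{}_{\mu\alpha\nu}$, and the curvature scalar $R \equiv g^{\mu\nu}R_{\mu\nu}$. These quantities are convenient for writing the equation for the gravitational field (the Einstein equation).

A.5.3 Expressing Riemann tensor through $\Gamma$

In this section we give a detailed derivation of Eq. (A.11). We will compute the matrix $T^\mu{}_\nu$ that performs the parallel transport of a vector $u^\mu$ along a closed parallelogram with vertices $x_{(0)}$, $x_{(1)} \equiv x_{(0)} + \varepsilon a^\mu$, $x_{(2)} \equiv x_{(0)} + \varepsilon(a^\mu + b^\mu)$, and $x_{(3)} \equiv x_{(0)} + \varepsilon b^\mu$, where $x_{(0)}$ is an arbitrary point in spacetime and $a^\mu$, $b^\mu$ are arbitrary vectors. The matrix $T^\mu{}_\nu$ describes the change in vectors as a result of the parallel transport as $\tilde{u}^\mu = T^\mu{}_\nu u^\nu$.

The calculations are performed in an arbitrary coordinate system covering the entire parallelogram. For brevity, we omit the indices on coordinates of points and write simply $x_{(0)}$, $x_{(0)} + \varepsilon a$, etc., instead of $x^\mu_{(0)}$, $x^\mu_{(0)} + \varepsilon a^\mu$, etc. Let us denote by $T(x; a)$ the parallel transport matrix between points $x$ and $x + \varepsilon a$. Then the total transport matrix will be expressed as the (matrix) product of the transport matrices computed for the four individual segments:
\[ T = T(x_{(3)}; -b)\,T(x_{(2)}; -a)\,T(x_{(1)}; b)\,T(x_{(0)}; a). \]
Since we are only interested in considering an infinitesimally small parallelogram, the required value of $\varepsilon$ will be small, so we need to find $T$ only up to a certain order in $\varepsilon$. In the course of the calculation it will become clear that only terms up to and including $\varepsilon^2$ are relevant.

We begin by considering the parallel transport along the segment $x_{(0)} \to x_{(0)} + \varepsilon a$. The parallel-transported vector $T(x_{(0)}; a)\,u$ can be thought of as a function of $\varepsilon$, which we temporarily denote by $u^\mu(\varepsilon)$. By construction, the derivative $a^\alpha\nabla_\alpha$ is equal to $d/d\varepsilon$, and therefore the vector $u^\mu(\varepsilon)$ satisfies the differential equation
\[ 0 = a^\alpha\nabla_\alpha u^\mu = \frac{du^\mu(\varepsilon)}{d\varepsilon} + a^\alpha u^\nu(\varepsilon)\,\Gamma^\mu_{\alpha\nu}(x_{(0)} + \varepsilon a). \]
Here we explicitly indicated the point, $x_{(0)} + \varepsilon a$, where the Christoffel symbol is evaluated. Hence, the parallel transport operator $T^\mu{}_\nu(x_{(0)}; a)$ satisfies the equation
\[ \frac{d}{d\varepsilon}T^\mu{}_\nu(x_{(0)}; a) + a^\alpha\,\Gamma^\mu_{\alpha\lambda}(x_{(0)} + \varepsilon a)\,T^\lambda{}_\nu(x_{(0)}; a) = 0, \tag{A.12} \]
with the initial condition $T^\mu{}_\nu|_{\varepsilon=0} = \delta^\mu_\nu$.

It is easy to find an approximate solution of Eq. (A.12) to first order in $\varepsilon$,
\[ T^\mu{}_\nu(x_{(0)}; a) = \delta^\mu_\nu - \varepsilon a^\alpha\,\Gamma^\mu_{\alpha\nu}(x_{(0)}) + O(\varepsilon^2), \tag{A.13} \]
where $\Gamma^\mu_{\alpha\nu}$ may be computed at $x_{(0)}$ since the variation of $\Gamma$ across the parallelogram is of order $\varepsilon$ and we are disregarding $\varepsilon^2$. The other transport operators ($T(x_{(1)}; b)$, etc.) are found by the same method. Combining the four operators, we obtain
\[ T^\mu{}_\nu = \delta^\mu_\nu - \varepsilon\left(a^\alpha + b^\alpha - a^\alpha - b^\alpha\right)\Gamma^\mu_{\alpha\nu}(x_{(0)}) + O(\varepsilon^2) = \delta^\mu_\nu + O(\varepsilon^2). \]
It is clear that the current precision is insufficient to describe the change in vectors due to parallel transport. Thus, we need to compute second-order terms.

Now it is necessary to take into account the variation of $\Gamma$ across the parallelogram. To this end, we expand $\Gamma$ in Eq. (A.12) as
\[ \Gamma^\mu_{\alpha\nu}(x_{(0)} + \varepsilon a) = \Gamma^\mu_{\alpha\nu}(x_{(0)}) + \varepsilon a^\beta\,\partial_\beta\Gamma^\mu_{\alpha\nu}(x_{(0)}). \tag{A.14} \]
Since $\Gamma$ is always multiplied by $\varepsilon$, it is sufficient to retain terms of order $\varepsilon$ in Eq. (A.14). Let us also compress the notation by introducing the matrix $\Gamma(a) \equiv a^\alpha\Gamma^\mu_{\alpha\nu}$ and omitting the indices in matrix multiplications. Then Eq. (A.12) with the substitution (A.14) becomes
\[ \frac{d}{d\varepsilon}T + \left[\Gamma(a) + \varepsilon a^\beta\partial_\beta\Gamma(a)\right]_{x_{(0)}}T + O(\varepsilon^2) = 0, \tag{A.15} \]
while the first-order solution (A.13) is written as
\[ T(x_{(0)}; a) = \hat{1} - \varepsilon\,\Gamma(a) + O(\varepsilon^2), \]
where $\hat{1} \equiv \delta^\mu_\nu$ is the identity matrix. For the present purposes, it suffices to obtain a solution of Eq. (A.15) to second order in $\varepsilon$. Therefore we consider the ansatz
\[ T(x_{(0)}; a) = \hat{1} - \varepsilon\,\Gamma(a)|_{x_{(0)}} + \varepsilon^2 U, \tag{A.16} \]
where the unknown matrix $U \equiv U^\mu{}_\nu$ is to be found, while the first-order term is copied from Eq. (A.13). Substituting Eq. (A.16) into Eq. (A.15), we find
\[ U = \frac{1}{2}\left[\Gamma(a)\Gamma(a) - a^\beta\partial_\beta\Gamma(a)\right]_{x_{(0)}}. \]
The second-order solution is then
\[ T(x_{(0)}; a) = \left[\hat{1} - \varepsilon\,\Gamma(a) + \frac{\varepsilon^2}{2}\left(\Gamma(a)\Gamma(a) - a^\beta\partial_\beta\Gamma(a)\right)\right]_{x_{(0)}}. \]
Note that the second-order terms,
\[ \Gamma(b)\Gamma(b) - b^\beta\partial_\beta\Gamma(b), \]
may be evaluated at any point, say at $x_{(0)}$, because the variation of $\Gamma$ across the parallelogram is of order $\varepsilon$. Therefore, we may simplify the above expression to
\[ T(x_{(0)}; a) = \hat{1} - \varepsilon\,\Gamma(a)|_{x_{(0)}} + \frac{\varepsilon^2}{2}\left(\Gamma(a)\Gamma(a) - a^\beta\partial_\beta\Gamma(a)\right). \]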
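Equation (A.12) is an ordinary differential equation for the transport matrix, so it can also be integrated numerically along a finite path. The Python sketch below is an illustration under stated assumptions rather than part of the derivation: it uses the unit 2-sphere with coordinates (theta, phi), takes the standard nonzero Christoffel symbols of that metric (Gamma^theta_{phi phi} = -sin(theta)cos(theta) and Gamma^phi_{theta phi} = Gamma^phi_{phi theta} = cot(theta), which are not listed in the text), transports along a circle of constant theta with tangent a = (0, 1), and uses a fourth-order Runge-Kutta step chosen here for convenience. All function names are hypothetical.

```python
import math

def gamma_of_a(theta, a):
    """Matrix Gamma(a)^mu_nu = a^alpha Gamma^mu_{alpha nu} on the unit sphere.

    Index order is (theta, phi); the Christoffel symbols are the standard
    ones for the metric d theta^2 + sin^2(theta) d phi^2 (an assumption).
    """
    a_theta, a_phi = a
    s, c = math.sin(theta), math.cos(theta)
    return [[0.0, -s * c * a_phi],
            [(c / s) * a_phi, (c / s) * a_theta]]

def matmul(A, B):
    """Product of two 2x2 matrices stored as nested lists."""
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def add_scaled(T, K, s):
    """Return T + s*K for 2x2 matrices."""
    return [[T[i][j] + s * K[i][j] for j in range(2)] for i in range(2)]

def transport_around_latitude(theta0, steps=2000):
    """Integrate dT/d(eps) = -Gamma(a) T once around the circle theta = theta0."""
    G = gamma_of_a(theta0, (0.0, 1.0))   # constant along this particular path

    def rhs(T):
        GT = matmul(G, T)
        return [[-GT[i][j] for j in range(2)] for i in range(2)]

    T = [[1.0, 0.0], [0.0, 1.0]]         # initial condition: identity matrix
    h = 2.0 * math.pi / steps
    for _ in range(steps):               # classical Runge-Kutta (RK4) steps
        k1 = rhs(T)
        k2 = rhs(add_scaled(T, k1, h / 2.0))
        k3 = rhs(add_scaled(T, k2, h / 2.0))
        k4 = rhs(add_scaled(T, k3, h))
        for i in range(2):
            for j in range(2):
                T[i][j] += h / 6.0 * (k1[i][j] + 2.0 * k2[i][j]
                                      + 2.0 * k3[i][j] + k4[i][j])
    return T

if __name__ == "__main__":
    print(transport_around_latitude(math.pi / 3))   # latitude circle
    print(transport_around_latitude(math.pi / 2))   # equator
```

Along the equator (theta0 = pi/2) the transport matrix stays equal to the identity, while for theta0 = pi/3 a vector returns with all components reversed, i.e. rotated by an angle pi. This is a finite-loop counterpart of the statement that parallel transport around a closed curve in a curved manifold does not preserve vectors.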


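By the same method, we compute the matrix describing parallel transport from $x_{(1)}$ to $x_{(2)}$,
\[ T(x_{(1)}; b) = \hat{1} - \varepsilon\,\Gamma(b)|_{x_{(1)}} + \frac{\varepsilon^2}{2}\left(\Gamma(b)\Gamma(b) - b^\beta\partial_\beta\Gamma(b)\right), \]
and all the other parallel transport matrices.

Finally, we have to multiply the four transport matrices, preserving the order of matrix multiplications and discarding any terms of order $\varepsilon^3$ or higher. Within first-order terms, we need to expand $\Gamma$ to first order in $\varepsilon$, e.g.
\[ \Gamma(b)|_{x_{(1)}} = \Gamma(b)|_{x_{(0)}} + \varepsilon a^\alpha\,\partial_\alpha\Gamma(b)|_{x_{(0)}}, \]
so that
\[ \Gamma(a)|_{x_{(2)}} - \Gamma(a)|_{x_{(0)}} = \varepsilon\left(a^\alpha + b^\alpha\right)\partial_\alpha\Gamma(a)|_{x_{(0)}} + O(\varepsilon^2), \]
\[ \Gamma(b)|_{x_{(3)}} - \Gamma(b)|_{x_{(1)}} = \varepsilon\left(-a^\alpha + b^\alpha\right)\partial_\alpha\Gamma(b)|_{x_{(0)}} + O(\varepsilon^2). \]
Within second-order terms, we may evaluate all $\Gamma$'s at $x_{(0)}$. After a straightforward calculation, we obtain
\[ T = T(x_{(3)}; -b)\,T(x_{(2)}; -a)\,T(x_{(1)}; b)\,T(x_{(0)}; a) \]
\[ = \hat{1} - \varepsilon\,\Gamma(a)|_{x_{(0)}} - \varepsilon\,\Gamma(b)|_{x_{(1)}} + \varepsilon\,\Gamma(a)|_{x_{(2)}} + \varepsilon\,\Gamma(b)|_{x_{(3)}} + \varepsilon^2\left(\Gamma(b)\Gamma(a) - \Gamma(a)\Gamma(b) - a^\alpha\partial_\alpha\Gamma(a) - b^\alpha\partial_\alpha\Gamma(b)\right) \]
\[ = \hat{1} + \varepsilon^2\left(\Gamma(b)\Gamma(a) - \Gamma(a)\Gamma(b) + b^\beta\partial_\beta\Gamma(a) - a^\alpha\partial_\alpha\Gamma(b)\right). \]
Restoring the full index notation, we can write the result as
\[ T^\mu{}_\nu = \delta^\mu_\nu + \varepsilon^2 R^\mu{}_{\nu\beta\alpha}\,a^\alpha b^\beta, \]
\[ R^\mu{}_{\nu\beta\alpha} \equiv \partial_\beta\Gamma^\mu_{\alpha\nu} - \partial_\alpha\Gamma^\mu_{\beta\nu} + \Gamma^\mu_{\beta\lambda}\Gamma^\lambda_{\alpha\nu} - \Gamma^\mu_{\alpha\lambda}\Gamma^\lambda_{\beta\nu}. \]
The last line coincides with Eq. (A.11). This calculation shows that the Riemann tensor $R^\mu{}_{\nu\beta\alpha}$ describes the effect of an infinitesimal parallel transport along a closed curve on vectors.

A.6 Covariant integration

A.6.1 Determinant of the metric

An important object in GR is the determinant of the metric, usually denoted by $g$,
\[ g \equiv \det(g_{\mu\nu}). \]
It follows that the determinant of the contravariant metric is $g^{-1} = \det(g^{\mu\nu})$. Note that the function $g$ is a determinant of a (0,2)-tensor $g_{\mu\nu}$, which is not a scalar function but depends nontrivially on the coordinate system. For instance, given two coordinate systems $\{x^\mu\}$ and $\{\tilde{x}^\mu\}$, the corresponding components $g_{\mu\nu}$ and $\tilde{g}_{\alpha\beta}$ of the metric are related by
\[ g_{\mu\nu}\,dx^\mu dx^\nu = \tilde{g}_{\alpha\beta}\,d\tilde{x}^\alpha d\tilde{x}^\beta, \]
which can be rewritten in the matrix notation (temporarily using capital letters to denote the matrices $G \equiv g_{\mu\nu}$, $\tilde{G} \equiv \tilde{g}_{\alpha\beta}$ and $S \equiv \partial\tilde{x}^\alpha/\partial x^\mu$) as
\[ G = S^T\tilde{G}S. \]
Since $\det A^T = \det A$ for any matrix $A$, we find
\[ \frac{\det g_{\mu\nu}}{\det\tilde{g}_{\alpha\beta}} = \left[\det\frac{\partial\tilde{x}^\alpha}{\partial x^\mu}\right]^2. \tag{A.17} \]
It is clear that the metric determinant is a coordinate system-dependent quantity.

Note that a physical metric has the signature $(+\,-\,-\,-)$, so its determinant is always negative.

A.6.2 Covariant volume element

The determinant of the metric is used most frequently to express integration over the manifold in a covariant way, such that functions can be integrated in arbitrary coordinate systems.

Suppose we would like to integrate a scalar function $f(x)$ over a region of the spacetime manifold. The ordinary formula $\int d^4x\,f(x)$ is good in flat (Cartesian) coordinates but unsatisfactory in curved coordinates. Namely, a different choice of coordinates, $\{\tilde{x}^\mu\}$, will generally yield a different result,
\[ \int d^4\tilde{x}\,\tilde{f}(\tilde{x}) = \int d^4x\,f(x)\,\det\left[\frac{\partial\tilde{x}^\mu}{\partial x^\nu}\right] \neq \int d^4x\,f(x), \]
unless the Jacobian happens to be equal to 1, $\det[\partial\tilde{x}^\mu/\partial x^\nu] = 1$, which is certainly a very special case. The problem is that we have a formula, $\int d^4x\,f(x)$, that works only in flat coordinates. The solution is to write all the integrals using the covariant volume element,
\[ d^4V \equiv d^4x\,\sqrt{-g(x)}, \]
instead of $d^4x$. Because of the transformation law (A.17), it is clear that the covariant formula for the integral of $f$,
\[ \int d^4x\,\sqrt{-g(x)}\,f(x), \]
yields the same answer in every coordinate system, including flat coordinate systems where $\sqrt{-g(x)} \equiv 1$ (if such coordinate systems exist).

A.6.3 Derivative of the determinant

It is often necessary to compute derivatives of the metric determinant, for instance $\partial_\mu\sqrt{-g}$. Such derivatives can be easily expressed through $g_{\alpha\beta}$ and $\partial_\mu g_{\alpha\beta}$.

In order to derive this expression, we use a general formula for the derivative of the determinant of a (nondegenerate) matrix:
\[ \frac{d}{dt}\left[\det A(t)\right] = \mathrm{Tr}\left[A^{-1}\frac{d}{dt}A(t)\right]\det A(t). \tag{A.18} \]
This formula can be found quickly from the matrix identity
\[ \det e^X = e^{\mathrm{Tr}\,X}, \]
where $X$ is an arbitrary matrix, after a substitution $A \equiv e^X$. Using Eq. (A.18), we find, in particular,
\[ \partial_\mu g = g\,g^{\alpha\beta}\,\partial_\mu g_{\alpha\beta}, \]
\[ \partial_\mu\sqrt{-g} = \frac{\sqrt{-g}}{2}\,g^{\alpha\beta}\,\partial_\mu g_{\alpha\beta}. \tag{A.19} \]

A.6.4 Covariant divergence

Divergence of a vector field,
\[ \mathrm{div}\,a \equiv \nabla_\mu a^\mu \equiv a^\mu{}_{;\mu}, \]
is an important special case where the covariant formula for the derivative is simpler than in the general form.

Using Eqs. (A.7)-(A.8), we find
\[ \nabla_\mu a^\mu = \partial_\mu a^\mu + \Gamma^\mu_{\mu\alpha}a^\alpha; \]
\[ \Gamma^\mu_{\mu\alpha} = \frac{1}{2}\,g^{\mu\nu}\left(g_{\nu\mu,\alpha} + g_{\nu\alpha,\mu} - g_{\mu\alpha,\nu}\right) = \frac{1}{2}\,g^{\mu\nu}g_{\mu\nu,\alpha}. \]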


Comparing the right-hand side in the last line with Eq. (A.19), we obtain
\[
\Gamma^\alpha_{\alpha\mu} = \frac{1}{\sqrt{-g}}\,\partial_\mu\sqrt{-g}
\]
and hence
\[
\nabla_\mu a^\mu = \partial_\mu a^\mu + a^\mu\frac{1}{\sqrt{-g}}\,\partial_\mu\sqrt{-g} = \frac{1}{\sqrt{-g}}\,\partial_\mu\left(\sqrt{-g}\,a^\mu\right). \tag{A.20}
\]
This simple formula is useful because it does not require computing the full set of Christoffel symbols.

A.6.5 Integration by parts

In field theory, one often needs to integrate by parts using the Gauss theorem for volume integrals, for instance,
\[
\int_V d^4x\,\partial_\mu u^\mu = \oint_{\partial V} d^3A_\mu\,u^\mu, \tag{A.21}
\]
where $V$ is a 4-dimensional region where a vector field $u^\mu$ is defined, $\partial V$ is the boundary 3-surface of $V$, and $d^3A_\mu$ is a directed 3-area element of the hypersurface $\partial V$. However, the identity (A.21) is valid only if we use the same fixed coordinate system at both sides of the equation. We would like to have a covariant formula such that both sides of the equation contain only covariant quantities that remain the same in any coordinate system.

To derive such a formula, we guess (heuristically) that the ordinary derivative $\partial_\mu u^\mu$ must be replaced by the covariant derivative $\nabla_\mu u^\mu$, and the volume element $d^4x$ by the covariant volume element $d^4x\,\sqrt{-g}$. In flat space, these replacements do not modify the original expressions. Let us now verify that this replacement indeed yields the correct results. The left-hand side of Eq. (A.21) is replaced by
\[
\int_V d^4x\,\partial_\mu u^\mu \;\to\; \int_V d^4x\,\sqrt{-g}\,\nabla_\mu u^\mu.
\]
This expression stays the same in every coordinate system. Using Eq. (A.20), we find
\[
\int_V d^4x\,\sqrt{-g}\,\nabla_\mu u^\mu = \int_V d^4x\,\partial_\mu\left(\sqrt{-g}\,u^\mu\right).
\]
The above equation is valid in any coordinate system; choosing some coordinate system, we may use Eq. (A.21) and obtain (still in the same fixed coordinate system)
\[
\int_V d^4x\,\partial_\mu\left(\sqrt{-g}\,u^\mu\right) = \oint_{\partial V} d^3A_\mu\,\sqrt{-g}\,u^\mu.
\]
The expression on the right-hand side is interpreted as the integral of $u^\mu$ over the covariant area element $\sqrt{-g}\,d^3A_\mu$. This expression is the same in every coordinate system. Therefore, the covariant Gauss theorem is written as
\[
\int_V d^4x\,\sqrt{-g}\,\nabla_\mu u^\mu = \oint_{\partial V} d^3A_\mu\,\sqrt{-g}\,u^\mu.
\]

A.7 Einstein's equation

We have seen the equation of motion (A.10) of a point mass in a curved spacetime. The influence of gravity on a massive body is described by the presence of the covariant derivative in the equation of motion, rather than by a force of gravity appearing in the right-hand side. When solving Eq. (A.10), the metric $g_{\mu\nu}$ (and thus the Christoffel symbol $\Gamma^\lambda_{\mu\nu}$) are considered known.

Einstein's equation describes the influence of matter on the metric $g_{\mu\nu}$:
\[
R_{\mu\nu} - \frac{1}{2}R\,g_{\mu\nu} = 8\pi G\,T_{\mu\nu},
\]
where $R_{\mu\nu}$ is the Ricci tensor, $R$ is the curvature scalar, $G$ is Newton's constant, and $T_{\mu\nu}$ is the combined energy-momentum tensor of all matter.
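As a numerical illustration of the determinant derivative formula (A.18) from Sec. A.6.3, the following minimal sketch compares a finite-difference derivative of $\det A(t)$ with $\mathrm{Tr}(A^{-1}\dot A)\det A$. The particular $2\times 2$ matrix `A(t)`, the helper names, and the step size `h` are assumptions chosen only for this example; they do not come from the text.

```python
import math

def A(t):
    # An arbitrary smooth, nondegenerate 2x2 matrix (illustrative choice).
    return [[1.0 + t, t * t],
            [math.sin(t), 2.0 + math.cos(t)]]

def dA(t):
    # Exact entrywise derivative of A(t).
    return [[1.0, 2.0 * t],
            [math.cos(t), -math.sin(t)]]

def det2(m):
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]

def inv2(m):
    d = det2(m)
    return [[m[1][1] / d, -m[0][1] / d],
            [-m[1][0] / d, m[0][0] / d]]

def trace_of_product(x, y):
    # Tr(x y) for 2x2 matrices.
    return sum(x[i][k] * y[k][i] for i in range(2) for k in range(2))

def check_det_derivative(t, h=1e-5):
    # Left-hand side of (A.18): central finite difference of det A(t).
    lhs = (det2(A(t + h)) - det2(A(t - h))) / (2.0 * h)
    # Right-hand side of (A.18): Tr(A^{-1} dA/dt) det A.
    rhs = trace_of_product(inv2(A(t)), dA(t)) * det2(A(t))
    return lhs, rhs

if __name__ == "__main__":
    print(check_det_derivative(0.3))
```

The two returned numbers should agree up to the finite-difference error. Replacing `A(t)` by the metric $g_{\alpha\beta}$ and $t$ by a coordinate $x^\mu$ gives the first line of Eq. (A.19).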

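The divergence formula (A.20) and the covariant Gauss theorem of Sec. A.6.5 can be tried out in a simple setting. The sketch below is an illustration under stated assumptions, not part of the original derivation: it uses the Euclidean plane in polar coordinates $(r,\phi)$, where the metric is $\mathrm{diag}(1, r^2)$ and the role of $\sqrt{-g}$ is played by $\sqrt{g} = r$, and a sample vector field $u^r = r + r^2\cos^2\phi$, $u^\phi = \sin\phi$ chosen only for this example. The region $V$ is the unit disk, whose boundary is the circle $r = 1$.

```python
import math

def sqrt_g(r, phi):
    # Square root of the metric determinant for g = diag(1, r^2).
    return r

def u(r, phi):
    # Sample vector field components (u^r, u^phi) in the coordinate basis.
    return (r + r * r * math.cos(phi) ** 2, math.sin(phi))

def covariant_divergence(r, phi, h=1e-5):
    # Formula (A.20): div u = (1/sqrt g) d_mu (sqrt g u^mu),
    # with the partial derivatives taken by central differences.
    def flux(rr, pp, k):
        return sqrt_g(rr, pp) * u(rr, pp)[k]
    d_r = (flux(r + h, phi, 0) - flux(r - h, phi, 0)) / (2.0 * h)
    d_phi = (flux(r, phi + h, 1) - flux(r, phi - h, 1)) / (2.0 * h)
    return (d_r + d_phi) / sqrt_g(r, phi)

def volume_integral(n=200):
    # Midpoint rule for the integral of sqrt(g) * div u over the unit disk.
    dr, dphi = 1.0 / n, 2.0 * math.pi / n
    total = 0.0
    for i in range(n):
        r = (i + 0.5) * dr
        for j in range(n):
            phi = (j + 0.5) * dphi
            total += sqrt_g(r, phi) * covariant_divergence(r, phi) * dr * dphi
    return total

def boundary_integral(n=200):
    # Integral of sqrt(g) * u^r over the boundary circle r = 1.
    dphi = 2.0 * math.pi / n
    return sum(sqrt_g(1.0, (j + 0.5) * dphi) * u(1.0, (j + 0.5) * dphi)[0] * dphi
               for j in range(n))

if __name__ == "__main__":
    print(volume_integral(), boundary_integral())
```

For this field both integrals can also be done by hand and equal $3\pi$, so the two printed numbers should agree with each other and with $3\pi$ up to discretization error.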
B How not to learn tensor calculus
This brief chapter is intended as a consolation to those stu- N N matrix ej is nonzero. Such a matrix is called nonde-
dents who have difficulty learning tensor calculus from some generate. Since the matrix ej is nondegenerate, any vector v
GR textbooks. Almost every explanation here (except for triv- can be uniquely decomposed as a linear combination of basis
ial statements) is flawed in one way or another, even though vectors:
all the equations are formally correct. Because of this, a be- v = v j ej .
ginning student might become thoroughly confused and frus-
This is why the N numbers v ( = 1, ..., N ) are called the
trated when trying to understand this material. However, the
components of the vector v . By convention, a superscript in-
fault is with the explanations and not with the student! Better
dex is used for contravariant vectors.
explanations are given elsewhere in this book.
A metric is a nondegenerate symmetric matrix g = g ,
det g 6= 0. The scalar product of vectors u and v is equal
to g u v and can be written for brevity as
B.1 Tensor algebra

g u v u v ,
Letters with Greek indices, such as A or B , denote arrays
of numbers, and indices run over 0, ..., N 1, where N is the where the quantities v are called the covariant components
dimension of space. For brevity, we shall alwaysP use the Ein- of the vector v . Covariant components transform under a
stein summation convention: we omit the sign but always change of basis as
imply summation when an index is repeated in an expression, v = S v ;
which must occur once as an upper index and once as a lower
that is, they transform in the same way as the basis vectors
index. For example, we write
do [Eq. (B.1)]. Such vectors v are called covariant vectors or
N X
X N covectors and are written with a subscript index . Then, in
A B C D A B C D . every expression there can be only one pair of repeated in-
=1 =1 dices, one subscript and one superscript, in agreement with
the Einstein summation convention. An expression violating
The Levi-Civita symbol is defined as 123...N = 1 and this rule, such as a b c , is written incorrectly, while a b c is
....... = ....... , i.e. ... is totally antisymmetric in correct.
all the indices. For example, if N = 3 then 123 = 231 = It is clear that the metric g can be used to transform con-
321 = 1, 132 = 213 = 312 = 1, and all the other compo- travariant components into covariant ones, v = g v . This
nents of are equal to zero. operation is called lowering the index. Likewise, the inverse
The Kronecker symbol is defined as = 1 if = metric g can be used to raise the index: v = g v . Since
and = 0 if 6= . This symbol represents an identity ma- raising an index after that index has been lowered should re-
trix of dimensions N N . Sometimes, the Kronecker symbol sult in the same vector, it follows that
is also written as .
It is easy to see that g g = .

1 ...N 1 ...N = N !, A tensor is a set of components with several indices, such


1 ...N 1 1 ...N 1
= (N 1)! . as A
, that transforms under a change of basis (B.1) as a suit-
able product of components of covariant and contravariant
The determinant det A of a square matrix A is the num- vectors. For example, a tensor A transforms as the prod-
ber defined by uct u v w , i.e. according to the formula
1  
det A = ... ... A1 1 ...AN N , A
= S S
1

S 1
A
.
N! 1 N N N
This tensor is said to have rank (2,1). Tensors of any rank
where the right-hand side contains N factors of A . It is well
(m, n) are defined in this way; contravariant and covariant
known that the determinant of the product of two matrices is
vectors are tensors of rank (1,0) and (0,1) respectively. It is
equal to the product of their determinants.
clear that only tensors of equal rank can be added; for exam-
An N -dimensional contravariant vector v is an array of
ple, the expression v + u does not transform as a vector and
quantities v 1 , ..., v N called components that transform un-
is therefore incorrect.
der a change of basis according to the formula
Note that u v is not the same tensor as u v , but of course

v = v S 1 , v v = v v and u v = v u .

where S is the matrix describing the change of basis,


B.2 Tensor calculus
ej ej = Sjk ek , (B.1)
So far we considered vectors and tensors in Euclidean coordi-
and S 1 is the inverse matrix. Here, a basis is a set of N con- nates, but now we need to consider arbitrary, curvilinear coor-
travariant vectors {e1 , ..., eN } such that the determinant of the dinates x . Note that x itself is not a vector any more because


the coordinates are not rectangular. However, an infinitesimal This is, of course, to be expected for a scalar quantity a b .
displacement x is a vector. To verify this, we consider an ar- Since tensors are quantities that transform as products of
bitrary change of coordinates described by arbitrary functions vectors and covectors, the covariant derivative can be defined
(x ),
x on arbitrary tensors in a similar way. For each upper index
x x = x
(x ). there is a term with + , and for each lower index a term
Then the displacement x transforms as with . For example,


x
x =
x , A
; = A +
A + A A .
x x
which is the correct transformation law for a vector under a Then one can prove that the Leibnitz rule holds for arbitrary
change of basis (B.1) with the matrix S = x / x . There- tensors,

fore, x is an example of a contravariant vector field, that 
is, a set of components v (x ) depending on the coordinates A... B ... ; = A... ; B ... + A... B ...; .
x that transform under an arbitrary change of coordinates as
In General Relativity, one uses the Christoffel symbol of the
components of a contravariant vector,
form
x
v = v . 1
x = g (g, + g, g, ) . (B.4)
2
A covariant vector field transforms as components of a co- This expression can be derived from the property = .

variant vector. Analogously, a tensor field of rank (p, q) is To derive Eq. (B.4), let us first prove that g = 0. For any
;
defined as a set of components T 1 ...p 1 ...q that transform vector X , we have g X = X , hence

under a change of coordinates as

g A ; = (A ); (B.5)
1 ...p 1 x
x p x1 xq 1 ...p
T 1 ...q = ... ... T ... .
x1 xp x 1 x q 1 q
and therefore
An important operation of tensor calculus is the covariant  
derivative. First, observe that a partial derivative of a con- g A ; = g; A + g A ;
travariant vector field, v , v /x does not transform as g; A = 0.
a tensor of rank (1,1). To get a derivative that transforms cor-
rectly, one adds a suitable correction term and thus defines The required property g; = 0 follows since A is an arbi-
the covariant derivative trary vector. Then it is a matter of juggling the indices to
derive Eq. (B.4). Namely, we write
v
v ; +
v , (B.2)
x 0 = g; = g, g g ,
where the set of quantities is called the Christoffel sym-
then exchange indices , , to get
bol. It can be verified by a direct calculation that the covariant

derivative v ; defined by Eq. (B.2) transforms as a tensor of 0 = g, g g ,
rank (1,1) as long as the Christoffel symbol transforms as
0 = g, g g .
2
= x x x x x
+ .
x
(B.3) Adding the above two equations and subtracting the initial

x x x x
x x x
one, we obtain Eq. (B.4).
Therefore, we require that the transformation law (B.3) holds
The Riemann tensor R is defined as the covariant
for the Christoffel symbol. Note that is not a tensor, derivative of the Christoffel symbol as follows,
and a suitable change of coordinates {x } { x } can make


= 0 at any point. Such coordinates are called an inertial R = + .
coordinate system at that point. However, is a ten-
It is easy to check that this formula defines a tensor (despite
sor because the second term in Eq. (B.3) drops out when we
not being a tensor), because one can notice that
write the transformation law for . Since the tensor
vanishes in a locally inertial coordinate system, it
R u = u u , (B.6)

vanishes in all coordinate systems, hence is symmetric in

(, ). which shows that R is manifestly a tensor. In flat space,

As shown in Eq. (B.2), the covariant derivative is denoted and therefore R = 0 because = .
by a semicolon and an index. The covariant derivative of a
covariant vector is defined similarly by
B.3 Hints
v
v; = v . In this section, I give you some hints as to why the preceding
x
explanations are flawed.
Then it can be shown that the covariant derivative of the scalar The Levi-Civita symbol is not merely a collection of num-
product a b coincides with the ordinary derivative, i.e. bers 0, 1, and 1, arranged in a special way, but is inter-
preted as an antisymmetric tensor or a volume N -form. The
(a b ); = a; b + a b ; =
(a b ) . Kronecker symbol is the matrix representing an identity
x


transformation (in any basis), while the matrix $\delta_{\mu\nu}$ represents (in an orthogonal basis) a bilinear form that may represent a scalar product in a Euclidean space. Determinants of matrices are not simply some complicated combinations of matrix elements. Determinants have a direct geometric meaning; for instance, the oriented volume of the image of the unit cube under a linear transformation $T$ is equal to $\det T$.

Vectors are most clearly seen as geometric quantities (elements of vector spaces), rather than collections of components that mysteriously transform themselves under a change of basis. Bases are maximal sets of linearly independent vectors, and the components of a vector are coefficients relative to a chosen basis. It is easier to regard the transformation formulas for components as consequences of the geometric picture rather than as definitions.

Covariant vectors are better seen as vectors from a different vector space (the dual space), rather than as a different kind of components of the same vector. There is no natural map between contravariant and covariant vectors unless a metric is given; and when several metrics are given, there are several such maps. The idea that there exist covariant components of vectors can be justified only if a metric is fixed once and for all.

The inverse metric $g^{\mu\nu}$ is either defined as the inverse matrix to $g_{\mu\nu}$, or is derived through the dual basis (in the dual space).

A tensor is an element of a tensor space (the space of tensor products and their linear combinations). As in the case of vectors, the components of a tensor with respect to a basis can be defined after the tensor space is defined. Then the complicated transformation law for the components will be a natural consequence of the construction, rather than a definition of a tensor.

An infinitesimal displacement $\delta x^\mu$ is a heuristic representation of a tangent vector: one imagines that a point $p$ moves by infinitesimal amount along a curve which is a flow line of a given tangent vector (see Sec. 1.2.5). Covariant tangent vectors are elements of the dual tangent space.

A covariant derivative is not simply an old derivative with a correction added to it, but rather the only type of directional derivative that can be meaningfully considered as a geometric object in a curved space. (The coordinate derivative $\partial v^\mu/\partial x^\nu \equiv \partial_\nu v^\mu$ does not exist as a geometric object since it depends on the coordinate system $\{x^\mu\}$.) However, there exist infinitely many possible covariant derivatives since there are infinitely many possible $\Gamma^\lambda_{\mu\nu}$.

The Christoffel symbol $\Gamma^\lambda_{\mu\nu}$ cannot be set to zero by a change of coordinates unless $T^\lambda{}_{\mu\nu} \equiv \Gamma^\lambda_{\mu\nu} - \Gamma^\lambda_{\nu\mu} = 0$ (torsion-freeness) already holds. If the torsion tensor $T^\lambda{}_{\mu\nu}$ is nonzero, locally inertial coordinate systems do not exist; it is a physical assumption that they exist, rather than a mathematical property.

Covariant derivative is defined on arbitrary tensors by the requirements that the Leibnitz rule hold and that $\nabla_\mu$ coincide with $\partial_\mu$ on scalar functions. These properties cannot be derived from the covariant transformation alone. The requirement of correct transformation of components is not sufficient to define covariant derivatives of arbitrary tensors, e.g. $A^{\alpha\beta}{}_{\mu}$, because one could in principle choose a different $\Gamma$ for tensors of each different rank. One needs extra information to derive the fact that the same $\Gamma$ is used in covariant derivatives of tensors of every rank. This extra information comes from assuming the Leibnitz rule and the property $\nabla_\mu f = \partial_\mu f$ for scalar functions $f$.

The assumption $g_{\mu\nu;\alpha} = 0$ is a separate physical assumption that cannot be actually derived from general properties of vectors; the argument given above assumes $g_{\mu\nu;\alpha} = 0$ tacitly in Eq. (B.5). The formula (B.4) indeed follows from $\Gamma^\lambda_{\mu\nu} = \Gamma^\lambda_{\nu\mu}$ and $g_{\mu\nu;\alpha} = 0$.

The Riemann tensor cannot be seen as a covariant derivative of $\Gamma$ because $\Gamma$ is not a tensor, and in any case, the given formula does not represent a covariant derivative of a third-rank tensor. The property (B.6) is ordinarily used as a definition of the Riemann tensor.

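Formula (B.4) for the Christoffel symbol can be evaluated numerically for a concrete metric. The following sketch is an illustration under assumptions not taken from the text: it uses the Euclidean plane in polar coordinates $(r, \phi)$ with $g = \mathrm{diag}(1, r^2)$, takes the derivatives $g_{\alpha\mu,\nu}$ by central differences, and uses hypothetical helper names. The known results $\Gamma^r_{\phi\phi} = -r$ and $\Gamma^\phi_{r\phi} = 1/r$ serve as a check.

```python
def metric(x):
    # Components g_{mu nu} in polar coordinates x = (r, phi).
    r = x[0]
    return [[1.0, 0.0],
            [0.0, r * r]]

def inv2(m):
    d = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    return [[m[1][1] / d, -m[0][1] / d],
            [-m[1][0] / d, m[0][0] / d]]

def d_metric(x, k, h=1e-6):
    # Central-difference approximation to the derivative of g_{ij}
    # with respect to the coordinate x^k.
    xp, xm = list(x), list(x)
    xp[k] += h
    xm[k] -= h
    gp, gm = metric(xp), metric(xm)
    return [[(gp[i][j] - gm[i][j]) / (2.0 * h) for j in range(2)]
            for i in range(2)]

def christoffel(x):
    # gamma[lam][mu][nu] = (1/2) g^{lam a} (g_{a mu,nu} + g_{a nu,mu} - g_{mu nu,a})
    n = 2
    g_inv = inv2(metric(x))
    dg = [d_metric(x, k) for k in range(n)]  # dg[k][i][j] = d_k g_{ij}
    gamma = [[[0.0] * n for _ in range(n)] for _ in range(n)]
    for lam in range(n):
        for mu in range(n):
            for nu in range(n):
                gamma[lam][mu][nu] = 0.5 * sum(
                    g_inv[lam][a] * (dg[nu][a][mu] + dg[mu][a][nu] - dg[a][mu][nu])
                    for a in range(n))
    return gamma

if __name__ == "__main__":
    gamma = christoffel((2.0, 0.7))
    print(gamma[0][1][1], gamma[1][0][1])  # approximately -2.0 and 0.5
```

The result is symmetric in the two lower indices by construction, which is the torsion-free case discussed in the hints above; a connection with torsion cannot be obtained from Eq. (B.4).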
C Calculations and proofs
C.1 For Chapter 1 where we introduced a different variable for convenience.
Substituting this into Eq. (C.2), we find
Proof of Statement 1.2.2.1 on page 5: Suppose such a
smooth map exists and maps the north pole {0, 0, 1} into a f (p) f (p1 ) f (p2 ) + f (p0 )
(a (b f ))|p0 = lim .
point p0 R2 . Consider a very small circle around the north 0
pole. Since the chart provides a smooth one-to-one map, this 0
circle will be mapped into a small image circle around p0
in R2 . The circle around the north pole can be smoothly con- Similarly, using the analogous relation
tracted into a point in two essentially different ways: either
by contracting it to the north pole, or by shifting it around the f (p ) f (p2 )
(a f )|p2 = lim ,
sphere towards the south pole and contracting to the south 0
pole, without ever crossing the north pole. These deforma-
tions of the circle are, by assumption, smoothly mapped into we find
corresponding deformations of the image circle in R2 . Dur- f (p ) f (p2 ) f (p1 ) + f (p0 )
ing the second deformation, the image circle around p0 in R2 (b (a f ))|p0 = lim .
0
is supposedly contracted into a point without ever crossing
0
p0 . But such a deformation is obviously impossible within
R2 . This contradicts the assumption that a smooth one-to-one
It follows that
map S 2 R2 exists. 
f (p) f (p )
Proof of Statement 1.2.4.1 on page 8: An analytic function ([a, b] f )|p0 = lim ,
0
f (z) is a sum of its Taylor series,
0
(m)
X 1 m d f
f (z) = z .
m! dz m z=0 as required. 
m=0

Since derivations are linear by Eq. (1.2), it is sufficient to prove Calculation 1.2.11.1 on page 13: Using the family of curves
the statement for f (z) = z m , m = 0, 1, 2, ... By Eq. (1.4), we ( ; s) as a coordinate grid, we may introduce a local coordi-
have v 1 = 0. It is trivial to check the statement for f (z) = z. nate system {, s, t1 , ..., tn2 } such that and s are the first two
Now use induction in m, starting with m = 1, i.e. with f (z) = coordinates. Then v = and c = s , therefore the vectors
z. Once the statement is proved for f (z) = z m1 , we have c and v are the first two vectors of the coordinate basis cor-
 responding to the local coordinate system {, s, t1 , ..., tn2 }.
v g m1 = (m 1) g m2 v g.
Hence, these vectors commute. 
Now we consider f (z) = z m and use Eq. (1.3),
 Proof of Statement 1.2.11.2 on page 13: We may
v f (g) = v gg m1 = (v g) g m1 + gv (g m1 ) complete the vector v(p0 ) 6= 0 to an arbitrary basis
= mg m1 v g, {v(p0 ), c1 (p0 ), c2 (p0 ), ...} at the initial point p0 . Then the re-
m lation [v, cj ] = 0 is a first-order differential equation for cj
thus the statement is proved for f (z) = z . 
that has a unique solution cj (p) along one flow line of v start-
Proof of Statement 1.2.10.2 on page 12: It is not necessary
ing from the initial point p0 , at least in some neighborhood of
to introduce a local coordinate system around p0 . Introduce
p0 . To see this explicitly, consider the equation [c, v] = 0 in a
the points p1 and p2 as in Fig. 1.6. By definition of a tangent
local coordinate system,
vector and its flow lines, we have
f (p1 ) f (p0 ) X v X c
(a f )|p0 = lim , (C.1) c v = 0.
0 x x

because the point p1 is obtained by following the flow line of
a for an interval , while the right-hand side of Eq. (C.1) is Consider one flow line ( ) of v, such that v is the tan-
the definition of the directional derivative along the flow line. gent vector to the curve ( ). For points p on the curve , the
The relation (C.1) holds for every function f , so let us apply it second term above is equal to the derivative of the component
to the function b f instead of f , c along this flow line,
1
(a (b f ))|p0 = lim (b f (p1 ) b f (p0 )) . (C.2) X c d
0 v = c ( ),
x d
We now need to express the function b f similarly, p

f (p) f (p1 ) assuming that ( ) = p. So the equation [c, v] = 0 at point p


(b f )|p1 = lim ,
0 becomes
f (p2 ) f (p0 ) d X v
(b f )|p0 = lim , c = c ,
0
d
x


which is an ordinary differential equation for the unknown components $c^\mu(\tau)$, to be solved with an initial condition $c^\mu(\tau_0)$. This differential equation has a unique solution (by assumption, the vector field $v$ is smooth everywhere). Thus we can in principle compute the vectors $c_j(p)$, where $p$ is any point along one flow line of $v$. The set of vectors $\{v(p), c_1(p), \ldots, c_{n-1}(p)\}$ is linearly independent at the initial point $p_0$ and will remain linearly independent along the flow line at least in some neighborhood of $p_0$. We can prove this by noting that linear independence of the vectors $\{v, c_1, \ldots, c_{n-1}\}$ can be verified by computing the determinant of their components in a local coordinate system. The vectors form a basis iff the determinant is nonzero. By assumption, the determinant is nonzero at $p_0$, and the determinant is a smooth function along the flow line. Thus the determinant remains nonzero along the flow line at least in some neighborhood of $p_0$ (perhaps in a very small neighborhood, but that is all we need presently).

So far we obtained a basis of connecting vectors along one flow line of $v$. Since the flow lines of $v$ fill at least a small patch $U \subset M$ of the manifold $M$ around the point $p_0$, the same construction gives a basis of connecting vector fields along every other flow line within $U$, again perhaps only in a sufficiently small neighborhood of $p_0$. Thus we have obtained a basis of connecting fields locally, in a neighborhood of $p_0$.

Proof of Statement 1.4.4.1 on page 20: (a) Let $\{e_1, \ldots, e_n\}$ be a basis in $\mathbb{R}^n$; we will show that $a_1 \wedge \ldots \wedge a_n$ is either equal to zero, or proportional to $e_1 \wedge \ldots \wedge e_n$ with a nonzero coefficient. It will follow that any two $n$-vectors are proportional.

First we show that $a_1 \wedge \ldots \wedge a_n = 0$ if the set $\{a_1, \ldots, a_n\}$ is linearly dependent. This follows from the antisymmetry of the exterior product: suppose
\[
a_n = \sum_{j=1}^{n-1} \lambda_j a_j,
\]
where some $\lambda_j$ are nonzero. Since
\[
a_1 \wedge \ldots \wedge a_{n-1} \wedge a_j = 0 \quad \text{for } j = 1, \ldots, n-1,
\]
we have
\[
a_1 \wedge \ldots \wedge a_n = a_1 \wedge \ldots \wedge a_{n-1} \wedge \sum_{j=1}^{n-1} \lambda_j a_j = 0.
\]
Now suppose that the set $\{a_j\}$ is linearly independent. Then every basis vector $e_k$ can be expressed as a linear combination of $\{a_j\}$ with some coefficients,
\[
e_k = \sum_j A_{kj} a_j.
\]
Let us now substitute these linear combinations into the nonzero multivector $e_1 \wedge \ldots \wedge e_n$, and simplify the resulting expression using the linearity of $\wedge$. The result will be a sum of terms of the form $a_{j_1} \wedge \ldots \wedge a_{j_n}$ with some coefficients. Only the terms with all different $j_k$ will survive. Finally, we can reorder the vectors $\{a_j\}$ at will (and if necessary change the sign of $a_1 \wedge \ldots \wedge a_n$), so eventually we will obtain simply $a_1 \wedge \ldots \wedge a_n$ with a nonzero coefficient. Therefore, the multivectors $a_1 \wedge \ldots \wedge a_n$ and $e_1 \wedge \ldots \wedge e_n$ are proportional.

(b) We need to show that the relation between $n$-dimensional volumes,
\[
\mathrm{Vol}(a_1, \ldots, a_n) = \lambda\,\mathrm{Vol}(b_1, \ldots, b_n),
\]
is equivalent to the relation between multivectors
\[
a_1 \wedge \ldots \wedge a_n = \lambda\, b_1 \wedge \ldots \wedge b_n.
\]
It is sufficient to consider the case when all the volumes involved are nonzero. By part (a), the two multivectors are always proportional with some nonzero coefficient; denote that coefficient by $\lambda$. It remains to show that the volumes are related by the same factor $\lambda$.

To show this, we will transform the parallelepiped spanned by $\{a_j\}$ into the parallelepiped spanned by $\{b_j\}$. The transformation will be performed through a sequence of steps which preserve the relationship between the volume of the parallelepiped and the multivector $a_1 \wedge \ldots \wedge a_n$. Each step will either multiply both the volume and the multivector by the same number, or leave them both unchanged. At the end, the multivector $a_1 \wedge \ldots \wedge a_n$ will be transformed into $b_1 \wedge \ldots \wedge b_n$; hence, the volumes are related by the factor $\lambda$, which will prove the statement.

The allowed steps of the transformation are of three kinds: (i) adding a multiple of $a_j$ to $a_i$,
\[
a_i \to a_i + \alpha a_j,
\]
where $\alpha \neq 0$ is some number; (ii) stretching a vector $a_i$,
\[
a_i \to \alpha a_i,
\]
where $\alpha \neq 0$; (iii) exchanging two vectors, $a_i$ with $a_j$. It is clear that the multivector
\[
a_1 \wedge \ldots \wedge a_n
\]
remains unchanged under (i), is multiplied by $\alpha$ under (ii), and changes sign under (iii). Similarly, it is easy to see that the volume of the parallelepiped spanned by $\{a_j\}$ remains unchanged under (i) due to the argument of Fig. 1.9, applied to the two-dimensional plane containing the vectors $\{a_i, a_j\}$. The volume is multiplied by $\alpha$ under (ii) and changes sign under (iii).

By assumption, the initial set $\{a_j\}$ is linearly independent (otherwise the volume is zero); thus, every vector $b_k$ can be expressed through linear combinations of $\{a_j\}$. These linear combinations can be built by a finite sequence of steps (i), (ii), or (iii). In this way, the multivector $a_1 \wedge \ldots \wedge a_n$ can be transformed into $b_1 \wedge \ldots \wedge b_n$. This concludes the proof.

Proof of Statement 1.4.5.1 on page 20: After Statement 1.4.4.1, we only need to prove the relationship between the multivectors,
\[
Tv_1 \wedge Tv_2 \wedge \ldots \wedge Tv_n = (v_1 \wedge v_2 \wedge \ldots \wedge v_n)\det T.
\]
Since any two $n$-vectors are proportional in an $n$-dimensional space, it remains to prove that the proportionality factor $\det T$ is actually independent of the choice of the vectors $\{v_j\}$. (It will follow that $\det T$ coincides with the determinant defined through an orthogonal basis.)

Let us first give the proof for $n = 2$ and then generalize to any dimension $n$. In two dimensions, we have
\[
Tv_1 \wedge Tv_2 = (v_1 \wedge v_2)\det T. \tag{C.3}
\]
The bivector $v_1 \wedge v_2$ is zero if $v_1$ is parallel to $v_2$; in this case, the statement is trivial, so let us consider the case $v_1 \wedge v_2 \neq 0$.


Then the vectors {v1 , v2 } are a basis in the plane. We would We would like to transform the expression R(a, b, c, d) to
like to show that something containing R(a, b, c, d) plus extra terms. It is clear
that we need to express everywhere through . We may
Tu1 Tu2 = (u1 u2 ) det T (C.4) work as follows. The only way to express through is by
enclosing them in the metric g and using Eq. (1.45). The dif-
for any vectors u1 , u2 . Since {v1 , v2 } is a basis, we can decom- and is a transformation-
ference between the connections
pose
valued 1-form , which is expressed as
u1 = U11 v1 + U12 v2 , u2 = U21 v1 + U22 v2 .

(a)b a b a b,

Since T is a linear map, we also have a similar decomposition
g((a)b, c) = (a ) g(b, c) + (b ) g(a, c)
(c ) g(a, b).
Tu1 = U11 Tv1 + U12 Tv2 , Tu2 = U21 Tv1 + U22 Tv2 .
We would like to write more explicitly, without enclosing it
Then we compute
in g. To this end, we introduce an auxiliary (but fixed) vector
u1 u2 = (U11 v1 + U12 v2 ) (U21 v1 + U22 v2 ) field l g1 d. Then x g(x, l) for every vector x. Using
more concisely,
the field l, we can rewrite
= (U11 U22 U12 U21 ) v1 v2 ,
   
Tu1 Tu2 = U11 Tv1 + U12 Tv2 U21 Tv1 + U22 Tv2
(a)b = g(a, l)b + g(b, l)a g(a, b)l.
= (U11 U22 U12 U21 ) Tv1 Tv2 . Now we consider the first term in the new Riemann tensor,
Therefore, the relationship (C.4) simply follows from Eq. (C.3)   
b c = a + (a)
a
b + (b) c
through multiplication by the factor U11 U22 U12 U21 .
In n dimensions, it is sufficient to consider the case when
= a b c + a (b)c (b)c,
+ (a)
v1 ... vn 6= 0, i.e. the set {vj } is a basis. We need to show
that where we omitted terms containing first derivatives of b and
Tu1 ... Tun = (u1 ... un ) det T c, since by assumption these derivatives vanish. The deriva-

for any other n-vector u ... u . Since every u can be tive a can be computed as follows,
1 n j
expressed as a linear combination of {vj },

a (b)c = g(b, a l)c + g(c, a l)b g(b, c)a l,
X
uj = Ujk vk ,
again omitting first derivatives of b, c. Finally, we simplify
k
the last term,
we may substitute these linear combinations into u1 ... un
(b)c
(a)
= g(a, l)(b)c
+ g((b)c,
l)a g(a, (b)c)l
and also into Tu1 ... Tun . Since T is a linear map,
X = g(a, l) (g(b, l)c + g(c, l)b g(b, c)l)
Tuj = Ujk Tvk , + (g(b, l)g(c, l) + g(c, l)g(b, l) g(b, c)g(l, l)) a
k
(g(b, l)g(a, c) + g(c, l)g(a, b) g(b, c)g(a, l)) l
P
so the substitution of Tuj = k Ujk Tvk into Tu1 ... Tun = (2g(b, l)g(c, l) g(b, c)g(l, l)) a + g(a, l)g(b, l)c
yields terms + g(a, l)g(c, l)b (g(b, l)g(a, c) + g(c, l)g(a, b)) l.
Tv1 ...Tvn
Now we need to antisymmetrize the resulting expression for
with
P the same coefficients as the substitution of u j = a
b c in a, b. Note that
k Ujk vk into u1 ... un . Therefore the desired relation-
ship for {uj } follows.  g(b, a l) = a g(b, l) = a (b ) ,
Calculation 1.8.4.1 on page 47: Denote by and the Levi-
and also [a, b] = 0 by assumption, therefore g(b, a l) is a
Civita connections corresponding to the metrics g and g, and symmetric bilinear form in a, b. This bilinear form is called

analogously the Riemann tensors R(...) and R(...). It is conve- the Hessian of the function ; it is a tensor representing all
nient to consider the fully covariant Riemann tensor, the (covariant) second derivatives of . Let us denote this ten-
  sor by
b, c, d) = g [
R(a, a,
b ]c
[a,b] c, d . H (a, b) = H (b, a) g(a l, b);

Since the derivatives of the arbitrary vectors a, b, c, d will in the index notation this would be ; . Then
eventually cancel, we can simplify the calculations if we as-

a (b)c = H (a, b)c + H (a, c)b g(b, c)a l,
sume that all these derivatives vanish, so a b = 0, [a, b] = 0,
etc., at a point p where we are computing the Riemann tensor.
a b 6= 0, etc.) Since [a, b] = 0, we have so the antisymmetrization of a (b)c in a, b cancels H (a, b)
(But note that
and yields
 
b, c, d) = g
e2 R(a, bc
a a c, d ,
b
a (b)c b (a)c = H (a, c)b g(b, c)a l
R(a, b, c, d) = g (a b c b a c, d) , H (b, c)a + g(a, c)b l.


The antisymmetrization of $\Gamma(a)\Gamma(b)c$ gives (after some cancellations)

\[
\Gamma(a)\Gamma(b)c - \Gamma(b)\Gamma(a)c = g(l,l)\bigl(g(a,c)b - g(b,c)a\bigr) + g(a,l)\bigl(g(b,c)l - g(c,l)b\bigr) + g(b,l)\bigl(g(c,l)a - g(a,c)l\bigr).
\]

Finally, putting the pieces together, we express the new Riemann tensor as follows,¹

\[
\begin{aligned}
e^{-2\lambda}\tilde R(a,b,c,d) = {} & R(a,b,c,d) \\
& + H(a,c)g(b,d) - H(a,d)g(b,c) \\
& - H(b,c)g(a,d) + H(b,d)g(a,c) \\
& + g(l,l)\bigl(g(a,c)g(b,d) - g(b,c)g(a,d)\bigr) \\
& + \bigl(g(a,l)g(b,c) - g(b,l)g(a,c)\bigr)g(d,l) \\
& + \bigl(g(b,l)g(a,d) - g(a,l)g(b,d)\bigr)g(c,l).
\end{aligned}
\tag{C.5}
\]

¹ Note that the Landau-Lifshitz convention has the opposite sign of R (but Ludvigsen has the same sign).

Note (out of curiosity) that the last six terms of the expression above can be rewritten more concisely as the determinant of a certain $3\times 3$ matrix,

\[
\begin{aligned}
e^{-2\lambda}\tilde R(a,b,c,d) = {} & R(a,b,c,d) \\
& + H(a,c)g(b,d) - H(a,d)g(b,c) \\
& - H(b,c)g(a,d) + H(b,d)g(a,c) \\
& + \det\begin{pmatrix} g(a,c) & g(b,c) & g(l,c) \\ g(a,d) & g(b,d) & g(l,d) \\ g(a,l) & g(b,l) & g(l,l) \end{pmatrix}.
\end{aligned}
\]

Now we can compute the Ricci tensor, assuming that the spacetime has N dimensions. We use the properties

\[
\begin{aligned}
\mathrm{Tr}_{(a,b)}\, g(a,b) &= N, \\
\mathrm{Tr}_{(a,b)}\, H(a,b) &\equiv \mathrm{Tr}_{(a,b)}\, g(\nabla_a l, b) \equiv \operatorname{div} l \equiv \Box\lambda, \\
\mathrm{Tr}_{(a,b)}\, g(a,x)F(b,c,...) &= F(x,c,...).
\end{aligned}
\]

The trace Tr is always performed with respect to the old metric g, so the new Ricci tensor is $e^{-2\lambda}$ times the trace Tr of the new Riemann tensor. In this way we derive the required expressions for $\widetilde{\mathrm{Ric}}$ and $\tilde R$. First we use Eq. (C.5) to obtain

\[
\begin{aligned}
\widetilde{\mathrm{Ric}}(a,c) &= e^{-2\lambda}\,\mathrm{Tr}_{(b,d)}\tilde R(a,b,c,d) \\
&= \mathrm{Ric}(a,c) + (N-2)H(a,c) + g(a,c)\Box\lambda \\
&\quad + g(l,l)\bigl(g(a,c)N - g(a,c)\bigr) \\
&\quad + \bigl(g(a,l)g(l,c) - g(l,l)g(a,c)\bigr) \\
&\quad + \bigl(g(a,l) - g(a,l)N\bigr)g(c,l) \\
&= \mathrm{Ric}(a,c) + (N-2)H(a,c) + g(a,c)\Box\lambda + (N-2)\bigl[g(l,l)g(a,c) - g(a,l)g(c,l)\bigr].
\end{aligned}
\]

This is the required formula for the new Ricci tensor; the new Ricci scalar is obtained straightforwardly by taking the trace of the new Ricci tensor.

Proof of Statement 1.9.5.1 on page 51: We would like to lift the restrictions on the vectors v, x, y. For a fixed v, the function R(v, x, v, y) is a symmetric bilinear form of x, y. This bilinear form can be determined for spacelike x, y by a finite number of values (say, with x and y being vectors of a spacelike basis). The requirement that vectors x, y be spacelike and orthogonal to v is then easily lifted since R(v, x, ...) is linear and antisymmetric in v, x and hence

\[
R(v,x,v,y) = R(v, x+\alpha v, v, y+\beta v)
\]

for any $\alpha$, $\beta$. Thus we can deduce R(v, x, v, y) for arbitrary x, y. Subsequently, we can regard R(v, x, v, y) for fixed x, y as a quadratic form

\[
Q_{x,y}(v,v) \equiv R(v,x,v,y). \tag{C.6}
\]

A standard trick in linear algebra is to recover a symmetric bilinear form A(a, b) if the quadratic form A(v, v) is known:

\[
A(a,b) = \frac{A(a+b,a+b) - A(a,a) - A(b,b)}{2}.
\]

Therefore, we can try to recover a symmetric bilinear form $Q_{x,y}(a,b)$ from the quadratic form $Q_{x,y}(v,v)$ as follows,

\[
Q_{x,y}(a,b) \equiv \frac{1}{2}\bigl(R(a+b,x,a+b,y) - R(a,x,a,y) - R(b,x,b,y)\bigr). \tag{C.7}
\]

To recover $Q_{x,y}(a,b)$ completely, it suffices to know the values of $Q_{x,y}(a,b)$ for arbitrary basis vectors a, b; in total, we need 10 different values (for fixed x, y). However, we are allowed to measure only $Q_{x,y}(v,v)$ for future-directed and timelike v. We note that the sum of two such vectors, $v_1 + v_2$, is again future-directed and timelike. So we may deduce $Q_{x,y}(a,b)$ for any two future-directed, timelike a and b using Eq. (C.7). We also note that a basis in the four-dimensional space can be found as a linearly independent set of 4 future-directed, timelike vectors. Such a basis will not be orthogonal, but one does not need an orthogonal basis to determine a bilinear form. Hence, for fixed x, y the bilinear form $Q_{x,y}(a,b)$ can be determined by finitely many measurements. This process needs to be repeated for 10 different combinations of x, y. We are thus able to deduce the values $Q_{x,y}(a,b)$ for arbitrary vectors a, b, x, y if we measure R(v, x, v, y) for sufficiently many future-directed timelike vectors v and spacelike vectors x, y (each time orthogonal to v).

So far we recovered the bilinear form $Q_{x,y}(a,b)$, which is symmetric in a, b and also in x, y. Despite the suggestive Eq. (C.6), we must note that $Q_{x,y}(a,b) \neq R(a,x,b,y)$. Consider R(a, x, b, y) for fixed x, y as a bilinear form of a, b. This bilinear form is not necessarily symmetric in a, b. In general, this bilinear form has a symmetric part,

\[
\frac{1}{2}\bigl(R(a,x,b,y) + R(b,x,a,y)\bigr),
\]

and an antisymmetric part,

\[
\frac{1}{2}\bigl(R(a,x,b,y) - R(b,x,a,y)\bigr).
\]

The quadratic form $Q_{x,y}(v,v)$ is defined through Eq. (C.6) and knows only about the symmetric part of R(...) as shown above. Therefore, we recover precisely the symmetric part of R(...) when we determine the bilinear form $Q_{x,y}(a,b)$. In other words, $Q_{x,y}(a,b)$ is related to R(a, x, b, y) through symmetrization in (a, b),

\[
Q_{x,y}(a,b) = \frac{1}{2}\bigl(R(a,x,b,y) + R(b,x,a,y)\bigr). \tag{C.8}
\]

To recover the full Riemann tensor, it remains to extract the antisymmetric part. A property of R not yet used is the first Bianchi identity (1.65), which we apply to the last term in Eq. (C.8) and obtain

\[
2Q_{x,y}(a,b) = R(a,x,b,y) - R(x,a,b,y) - R(a,b,x,y) = -2R(a,x,y,b) + R(a,b,y,x).
\]

We now need to express this somehow through Q. Note that a suitable combination is

\[
2Q_{b,x}(a,y) = R(a,b,y,x) + R(a,x,y,b).
\]

Hence

\[
2Q_{x,y}(a,b) = -3R(a,x,y,b) + 2Q_{b,x}(a,y); \qquad R(a,x,b,y) = \frac{2}{3}\bigl(Q_{x,y}(a,b) - Q_{b,x}(a,y)\bigr).
\]

Thus all the values of R(...) can be deduced from the given experimental data.

C.2 For Chapter 3

Details for Calculation 3.1.3.3 on page 74: Suppose that the vector field u is the 4-velocity of the stationary observers, so

\[
u = \partial_t = \frac{1}{z}k, \qquad z = e^{Ht}.
\]

Denote by v the 4-velocity of the particle. Then

\[
g(v,v) = g(u,u) = 1.
\]

The relative 3-velocity $\vec v$ is related to the relativistic gamma-factor by

\[
\gamma = \frac{1}{\sqrt{1-(\vec v)^2}} \ge 1;
\]

on the other hand, $\gamma = g(u,v)$. Therefore, we need to compute the change of the gamma-factor along the worldline of the particle. We first note that for arbitrary x, y the property

\[
g(\nabla_x k, y) + g(\nabla_y k, x) = (\mathcal{L}_k g)(x,y) = 2He^{Ht}g(x,y)
\]

holds. Thus, assuming $\nabla_v v = 0$, we have

\[
\nabla_v g(k,v) = g(\nabla_v k, v) = He^{Ht}g(v,v) = He^{Ht}.
\]

However, we need to compute $\nabla_v g(u,v)$, while g(u, v) differs from g(k, v) by a function of t only. So we need to compute the derivative

\[
\nabla_v t \equiv v \circ t = g(v,\partial_t) = g(v,u) \equiv \gamma.
\]

Then we can write

\[
\begin{aligned}
\nabla_v\gamma = \nabla_v g(u,v) &= \nabla_v\bigl[e^{-Ht}g(k,v)\bigr] \\
&= \bigl(\nabla_v e^{-Ht}\bigr)\gamma e^{Ht} + e^{-Ht}He^{Ht} \\
&= \bigl(1-\gamma^2\bigr)H.
\end{aligned}
\]

Denote by $\tau$ the proper time parameter along the worldline of the particle. Then we obtain the differential equation

\[
\nabla_v\gamma = \frac{d\gamma}{d\tau} = \bigl(1-\gamma^2\bigr)H.
\]

The general solution, $\gamma(\tau) = \coth H(\tau-\tau_0)$, exponentially approaches $\gamma = 1$ at late times. In the Newtonian limit,

\[
\gamma \approx 1 + \frac{1}{2}(\vec v)^2,
\]

the equation for $\gamma(\tau)$ becomes

\[
\vec v \cdot \frac{d}{d\tau}\vec v = -H(\vec v)^2.
\]

It is easy to see that a geodesic is a straight line in the three-dimensional space. Thus the force of friction acting on the particle is directed opposite to the velocity $\vec v$. Then, in the framework of Newton's second law, the force vector can be expressed as

\[
\vec F = m\frac{d}{d\tau}\vec v = -mH\vec v.
\]

Proof of Statement 3.1.3.4 on page 74: We need to compute

\[
\begin{aligned}
\frac{d\gamma}{d\tau} \equiv \nabla_v g(u,v) &= g\bigl(\nabla_v(z^{-1}k), v\bigr) \\
&= -g(k,v)z^{-2}\nabla_v z + z^{-1}g(\nabla_v k, v) \\
&= \frac{g(\nabla_v k, v) - g(u,v)\nabla_v z}{z}.
\end{aligned}
\]

Since $z \equiv \sqrt{g(k,k)}$, we have

\[
\nabla_v z = \frac{1}{2z}\nabla_v g(k,k) = \frac{1}{z}g(\nabla_v k, k).
\]

So it remains to compute the bilinear form $g(\nabla_x k, y)$. If the vector field k is integrable then $\hat g k$ is an exact 1-form and the first term in the Koszul formula,

\[
g(\nabla_x k, y) = \Bigl[\frac{1}{2}d(\hat g k) + \frac{1}{2}\mathcal{L}_k g\Bigr](x,y),
\]

cancels (here x, y are arbitrary vectors). For an integrable conformal Killing vector k, we therefore get

\[
g(\nabla_x k, y) = \frac{1}{2}(\mathcal{L}_k g)(x,y) = \lambda g(x,y).
\]

Using this property, we compute

\[
\frac{d\gamma}{d\tau} = \lambda\,\frac{g(v,v) - \gamma z^{-1}g(v,k)}{z} = \lambda\bigl(1-\gamma^2\bigr)z^{-1}.
\]

Proof of Statement 3.2.1.1 on page 77: (a) Consider a congruence of null curves in a two-dimensional spacetime, and denote by u the corresponding tangent vector field. Since g(u, u) = 0, we have

\[
g(u, \nabla_u u) = 0.
\]

In two dimensions, there is only a one-dimensional subspace of vectors orthogonal to a null vector u, namely the space of vectors parallel to u itself. Thus $\nabla_u u$ is parallel to u itself, so there exists a scalar function $\alpha$ such that

\[
\nabla_u u = \alpha u.
\]


This means that the flow lines of u are geodesic (but perhaps Proof of Statement 6.1.3.2 on page 129: (a) We will use the
not affinely parameterized). A geodesic null vector field can definition of through an orthonormal basis {ea }. We com-
be found as u, where is a scalar function satisfying pute the N -form

u + = 0. X k1 k1 ... kn kn  
1 2 = ekn ...ek1 2 1 ekn ...ek1 ;
n!
Such a function always exists since it is specified by a differ- k1 ,...,kn
ential equation along the flow lines of u. Since u (u) = 0,
so the flow lines of u (which are the same curves as the flow the first task is to show that this expression is symmetric with
lines of u but perhaps differently parameterized) are geodesic respect to the exchange 1 2 . We use Eq. (6.6) to compute
curves. the Hodge dual of this N -form,
(b) Radial null geodesics in the flat metric g are lines of ei-
(1 2 ) = (det ab ) (1 2 ) (e1 , ..., eN ) .
ther constant u or constant v (and constant , ). Since the new
metric is As an intermediate step, we need to compute
2
1 (u v)  
g = e2 (du dv + dv du) e2 dS 2 , 1 ekn ...ek1 (e1 , ..., eN )
2 4
X (1)||
while dS 2 = 0 if , = const, it is clear that lines of constant u = 1 (e(1) , ..., e(n) )(ek1 , ..., ekn , e(n+1) , ..., e(N ) )
or v (and constant , ) will remain null curves also according
n! (N n)!
to the metric g. Since the metric is independent of , , lines (C.9)
of constant , will be geodesics if they are geodesics in the
two-dimensional spacetime with metric In the expression (C.9), the summation is performed over all
permutations of the set {1, ..., N }. We note that
1 2
e (du dv + dv du) (ek1 , ..., ekn , e(n+1) , ..., e(N ) ) = 0
2
and coordinates (u, v) at constant , . But in a two- unless the set {k1 , ..., kn , (n + 1), ..., (N )} is also a permuta-
dimensional spacetime any null curve is a null geodesic, ac- tion of the same kind. It follows that a nonzero contribution is
cording to part (a) of this statement. Thus, we find that the possible only if the set {k1 , ..., kn } is a permutation of the set
radial null curves are also null geodesics in the metric g.  {(1), ..., (n)}. The expression for (1 2 ) contains a sum
over the indices k1 , ..., kn as well as a sum over all permuta-
tions ,
C.3 For Chapter 6
X X k1 k1 ... kn kn
Proof of Statement 6.1.3.1 on page 128: Let {ea } be an or- (1 2 ) = (det ab ) 2 (ek1 , ..., ekn )
n!
thonormal basis and {a } the corresponding dual basis. Since k1 ,...,kn

the Hodge duality map 7 is linear in , it suffices to ||


(1)
consider of the form = 1 ... n , where j are some 1- 1 (e(1) , ..., e(n) )
n! (N n)!
forms. Moreover, it is sufficient to consider the case when all
(ek1 , ..., ekn , e(n+1) , ..., e(N ) ). (C.10)
j are basis 1-forms a . Further, it is sufficient to treat the case
when 1 = 1 , ..., n = n . Thus we only need to consider the For a given permutation , there are n! sets {k1 , ..., kn } that
n-form = 1 ... n . Using the definition = 1 ... N give a nonzero contribution because they are permutations of
and the fact that {(1), ..., (n)}. The contributions of these n! sets to Eq. (C.10)
 are all equal, hence the sum over {k1 , ..., kn } can be replaced
en ...e1 1 ... n = 1
by an extra factor n!, while the indices k1 , ..., kn may be rela-
is (up to permutations) the only nonzero expression of the beled as (1), ..., (n). So we find
type  ||
ea eb ...ec 1 ... n ,
X (1)
(1 2 ) = (det ab ) (1)(1) ... (n)(n)
n! (N n)!
we can compute explicitly:
2 (e(1) , ..., e(n) )1 (e(1) , ..., e(n) )(e(1) , ..., e(N ) ).
= (1 ... n ) = 11 ... nn (en ...e1 )
= 11 ... nn n+1 ... N ; The expression above is manifestly symmetric under the ex-
change 1 2 . This already proves the first part of the
= (n+1 ... N ) 11 ... nn statement (a), but it is useful to derive a simplified form of

= 11 ... N N eN ...en+1 = (det ab ) (1)(N n)n . the expression (1 2 ). We note that

The sign factor (1)


(N n)n
appears in the identity (1)|| (e(1) , ..., e(N ) ) = 1
 
eN ...en+1 = eN ...en+1 1 ... n n+1 ... N for any permutation , and hence

= (1)
(N n)n
1 ... n X (1)(1) ... (n)(n)
(1 2 ) = (det ab )

n! (N n)!
due to the necessity to carry (N n) vectors {e N , ..., en+1 }
through the initial group of n 1-forms 1 , ..., n .  2 (e(1) , ..., e(n) )1 (e(1) , ..., e(n) ).


Since the expression above is independent of the permutation Proof of Statement 6.1.6.1 on page 132: The formula for a bc
elements {(n + 1), ..., (N )}, we may sum over them, intro- can be guessed in the following way. One notes that a bc
ducing an extra factor (N n)!. Finally, we relabel the indices should be a linear combination of Aa bc , with some indices
(1), ..., (n) as k1 , ..., kn and obtain the formula permuted. Due to the antisymmetry of Aa bc , there are only
three possible permutations (namely with either a, b, or c as
(det ab ) X
the first index). Moreover, the antisymmetry of a bc in (a, b)
(1 2 ) = k1 k1 ... kn kn
n! means that the only possibility is
k1 ,...,kn

1 (ek1 , ..., ekn )2 (ek1 , ..., ekn ). (C.11)


a bc = (Aa bc Ab ac ) + Ac ab ,
Using this explicit formula, it is straightforward to show that
the basis n-forms a1 ...an comprise an orthonormal basis where , are unknown constants. A direct substitution then
with respect to the scalar product (1 2 ). yields = = 12 .
(b) We rewrite the formula (C.11) without an explicit choice It remains to show that this solution is unique. Suppose
of basis. We note that a summation over basis vectors {ek } a b . The difference
there exist two different solutions a b and
multiplied by kk is equivalent to taking the trace, e.g. Xa b a b a b satisfies the homogeneous equations
X X
Tr(a,b) X(a, b) = kk X(ek , ek ). Xa b b = 0, Xa b = Xb a .
k b

Hence, Eq. (C.11) becomes The 1-forms Xa b can be expanded in the dual frame basis {c }
(1 2 ) as X
Xa b = Xa bc c , Xa bc = Xb ac .
(det ab )
= Tr(a1 ,b1 )...(an,bn ) 1 (a1 , ..., an )2 (b1 , ..., bn ). c
n!
Then we can write
This is equivalent to Eq. (6.7).
X X
(c) Using the explicit formula (6.7), we find Xa b b = Xa bc b c = 0.
b b,c
1 2 = Vol Tr(a,b) 1 (a)2 (b).

The trace is computed as It follows that Xa bc = Xa cb , i.e. Xa bc is symmetric in b, c.


However, it is impossible that a nonzero quantity Xa bc is
Tr(a,b) 1 (a)2 (b) = Tr(a,b) g(v1 , a)g(v2 , b) = g(v1 , v2 ). symmetric in b, c and at the same time antisymmetric in a, b.
Namely,
(d) We use Eq. (6.7) and obtain
Vol Xa bc = Xb ac = Xb ca = Xc ba = Xc ab = Xa cb = Xa bc ,
(1 2 ) = Tr(a1 ,b1 )(a2 ,b2 ) (a1 , a2 ) (1 2 ) (b1 , b2 )
2! so Xa bc = 0 for any a, b, c. Therefore, the difference between
= Vol Tr(a1 ,b1 )(a2 ,b2 ) (a1 , a2 )1 (b1 )2 (b2 ) a b and a b equals zero, i.e. the solution a b is unique. 
= Vol Tr(a1 ,b1 )(a2 ,b2 ) (a1 , a2 )g(v1 , b1 )g(v2 , b2 )
= Vol (v1 , v2 ). Proof of Statement 6.1.6.4 on page 133: The overall shape
of Eq. (6.26) can be guessed up to the values of the numerical
 coefficients. First we verify that the given expression for B a b
solves the required equations, checking the equivalence of the
Statement 6.1.3.2 on page 129: Since is a 3-form, there two given formulas, and then we investigate the uniqueness
exists a 1-form such that of the solution.
P b
= .
Let us compute b B b a , where B b a is given by
Eq. (6.26). Using Statement 6.1.6.2 for n-forms, we find
Consider an arbitrary 1-form and the expression . We
will use Statement 6.1.3.2. On the one hand, 1X b
eb Aa = Aa ,
n
b
= g 1 (, ) Vol. !
1X b X
c
X
On the other hand, eb
ea Ac = ea c Ac
n c c
b
X
= = ( ) = Aa c ea Ac .
1 1
= ( g , g ) Vol = (v x ) Vol, c

where x g1 and v g1 . We obtain the relationship Then it is straightforward to see that Eq. (6.25) holds. The
equivalence with Eq. (6.27) follows by virtue of Eq. (6.24),
g 1 (, ) = (
g 1 , g1 ) which also holds for arbitrary n-forms Aa .
that holds for all 1-forms , or equivalently We can demonstrate the non-uniqueness of the solution
B b a . One may modify a given solution B b a by adding an
x = x v arbitrary (n 1)-form X b a as long as
X
that holds for all vectors x. It follows that = v , so a X b a = 0, X b a = X a b .
= (v ).  a


Unlike the case n = 2, there exist nontrivial X b a 6= 0 satisfy-


ing these conditions. Note that X b a is an (n 1)-form while
n 3, so we may write for example

X b a = b a h,

where h is an arbitrary (n 3)-form and b is a 1-form such


P
that b b b = 0. An example of such b is
X
b Fbc c , Fbc = Fcb ,
c

where Fbc is an arbitrary but symmetric array of coefficients.


One can also add several such terms X b a with different Fbc
and h (this is not equivalent to adding one such term). Thus
there is a considerable remaining freedom in selecting the
(n 1)-forms B b a . Note that this freedom does not exist if
n = 2. 
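For contrast with the freedom found above, the uniqueness argument in the proof of Statement 6.1.6.1 rests on one algebraic fact: an array with three indices that is antisymmetric in its first two indices and symmetric in its last two must vanish. The following is a minimal sketch that checks this fact by exact linear algebra. The function names `rank` and `free_components` are illustrative and not part of the text; the array is treated simply as a list of unknown components.

```python
from fractions import Fraction
from itertools import product


def rank(rows):
    """Rank of a list of equal-length rows, by exact Gaussian elimination."""
    rows = [[Fraction(x) for x in r] for r in rows]
    found = 0
    ncols = len(rows[0]) if rows else 0
    for col in range(ncols):
        pivot = next((i for i in range(found, len(rows)) if rows[i][col] != 0), None)
        if pivot is None:
            continue
        rows[found], rows[pivot] = rows[pivot], rows[found]
        for i in range(found + 1, len(rows)):
            if rows[i][col] != 0:
                factor = rows[i][col] / rows[found][col]
                rows[i] = [x - factor * y for x, y in zip(rows[i], rows[found])]
        found += 1
    return found


def free_components(n):
    """Number of independent components of X[a][b][c] in n dimensions,
    subject to X[a,b,c] = -X[b,a,c] and X[a,b,c] = X[a,c,b]."""
    index = {abc: i for i, abc in enumerate(product(range(n), repeat=3))}
    rows = []
    for (a, b, c) in index:
        anti = [0] * len(index)
        anti[index[(a, b, c)]] += 1
        anti[index[(b, a, c)]] += 1  # X[a,b,c] + X[b,a,c] = 0
        sym = [0] * len(index)
        sym[index[(a, b, c)]] += 1
        sym[index[(a, c, b)]] -= 1  # X[a,b,c] - X[a,c,b] = 0
        rows += [anti, sym]
    return len(index) - rank(rows)


if __name__ == "__main__":
    for n in (2, 3, 4):
        print(n, free_components(n))
```

A result of zero free components in every dimension is what the chain of equalities in that proof asserts; the nontrivial solutions discussed above arise only because the unknowns there are forms of higher degree, not because this algebraic fact fails.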

D Comments on literature
D.1 Comments on Ludvigsen's General Relativity
The book is M. Ludvigsen, General relativity: a geometric approach (Cambridge University Press, 1999). Many explanations in that book are outstandingly clear, and I benefited greatly by reading it. Nevertheless, there are some minor gaffes:
1) On p. 91, eq. 9.27 is supposedly the same as eq. 9.20 when
written in full. However, these equations actually differ by
the choice of the permuted indices. The relation 9.27 can be
obtained from 9.20 only if one assumes the identity $R_{abcd} = R_{cdab}$, which Ludvigsen actually never mentions in the book.
This well-known standard identity is a consequence of 9.19
and 9.20.
2) On p. 103, eq. 10.14 contains a a = R..., while
the preceding (unnumbered) equation on p.102 contains
e a = R.... A minus sign has materialized from
nowhere! The answer (10.17) is correct, and the extra minus
sign is actually needed to compensate for an error made earlier. In the last paragraph on p. 101, Ludvigsen writes (in 3-dimensional notation) $\vec a = \nabla\phi$ whereas in fact $\vec a = -\nabla\phi$ in Newtonian gravity. (The acceleration points down, the potential grows upwards.) So the correct calculation starts by
introducing aa = a and not a .
3) On p. 108, top line, "Using the fact that $l_a n^a = 1$ and $\nabla_a l_b = \nabla_b l_a$, we have..." Actually, the same result follows with merely the assumption that $l_a$ is a null geodesic. It is not necessary to assume that $l_a$ is integrable.
4) On p. 109, top line, one cannot actually derive Eq.
(11.10) as claimed. By contracting the top equation with
la mb mc and using Eq. (11.8), which already assumes that l is
integrable, one gets

D = R(l, m, l, m) + 2la m
b (a mb ).

Now it is unclear how to show that the last term is equal to


2. There is a significant freedom in m since m is chosen
simply to be orthogonal to n, l and this is not sufficient
to fix m. In fact, the null tetrad can be changed by the
transformation m ei m. Then e2i , as shown at top
of page 110. If is a function of position then the equation
D = ... will be changed after the transformation! Thus this
equation really depends on the choice of the tetrad, and some
choices are better than others. The equation (11.10) is perhaps
obtained with a suitable choice of the tetrad, but this is not
discussed in the book.
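Regarding item 1 above, the claim that $R_{abcd} = R_{cdab}$ follows from the other algebraic identities can be checked by counting independent components. The sketch below assumes (as is standard, but not verified here against the book's numbering) that the input identities are the antisymmetry in each index pair and the cyclic identity on the first three indices. The helper names are illustrative. If adding the pair symmetry as an extra constraint does not reduce the number of free components, then the pair symmetry is already implied.

```python
from fractions import Fraction
from itertools import product


def rank(rows):
    """Rank of a list of equal-length rows, by exact Gaussian elimination."""
    rows = [[Fraction(x) for x in r] for r in rows]
    found = 0
    ncols = len(rows[0]) if rows else 0
    for col in range(ncols):
        pivot = next((i for i in range(found, len(rows)) if rows[i][col] != 0), None)
        if pivot is None:
            continue
        rows[found], rows[pivot] = rows[pivot], rows[found]
        for i in range(found + 1, len(rows)):
            if rows[i][col] != 0:
                factor = rows[i][col] / rows[found][col]
                rows[i] = [x - factor * y for x, y in zip(rows[i], rows[found])]
        found += 1
    return found


def riemann_free_components(n, impose_pair_symmetry):
    """Independent components of R[a,b,c,d] in n dimensions under the
    antisymmetries and the cyclic identity, optionally adding
    R[a,b,c,d] = R[c,d,a,b] as a further constraint."""
    index = {t: i for i, t in enumerate(product(range(n), repeat=4))}

    def row(terms):
        r = [0] * len(index)
        for sign, key in terms:
            r[index[key]] += sign
        return r

    rows = []
    for (a, b, c, d) in index:
        rows.append(row([(1, (a, b, c, d)), (1, (b, a, c, d))]))  # antisymmetric in a, b
        rows.append(row([(1, (a, b, c, d)), (1, (a, b, d, c))]))  # antisymmetric in c, d
        rows.append(row([(1, (a, b, c, d)), (1, (b, c, a, d)), (1, (c, a, b, d))]))  # cyclic
        if impose_pair_symmetry:
            rows.append(row([(1, (a, b, c, d)), (-1, (c, d, a, b))]))
    return len(index) - rank(rows)


if __name__ == "__main__":
    for n in (2, 3):
        print(n, riemann_free_components(n, False), riemann_free_components(n, True))
```

The two counts agree in each dimension tried and equal the familiar $n^2(n^2-1)/12$, which is consistent with the pair symmetry being a consequence of the other identities rather than an independent assumption.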

Extra topics: Derivation of Schwarzschild uniquely from


symmetry (Ludvigsen?) Definition of mass in spheri-
cally symmetric spacetime (see Frolov-Kofman?) ADM
and Bondi mass? Positivity of energy theorem? Energy-
momentum pseudotensors?

E License for this text
E.1 Author's position on commercial publishing
Thanks to modern technology, one can prepare an entire book electronically on a personal computer, in ready-to-print form. Sending an electronic book across the world takes at most a few minutes and costs about as much as a cup of tea. The author encourages everyone interested in reading the text to download and/or print it, in whole or in part. The two-column formatting of the text is designed to require the least possible amount of paper when printed. Everyone is also entitled to commission a print shop to produce bound copies of the text, in which case the single-column formatting may be preferred. The cost of one bound copy may be estimated as 10 to 20 US dollars.
A commercial publisher may want to offer professionally printed and bound copies for sale. Since this book is distributed with complete source, it will be a matter of minutes to reformat the book according to the taste or constraints of a particular publisher even without the author's assistance. The author welcomes commercial printing of the text, as long as the publisher adheres to the conditions of the license (the GNU FDL). Since the FDL disallows granting exclusive distribution rights, the author cannot sign a standard exclusive-rights contract with a publisher. However, the author will consider signing any publishing contract that leaves intact the conditions of the FDL.

E.2 GNU Free Documentation License

Version 1.2, November 2002
Copyright (c) 2000,2001,2002 Free Software Foundation, Inc. 59 Temple Place, Suite 330, Boston, MA 02111-1307, USA
Everyone is permitted to copy and distribute verbatim copies of this license document, but changing it is not allowed.

Preamble

The purpose of this License is to make a manual, textbook, or other functional and useful document "free" in the sense of freedom: to assure everyone the effective freedom to copy and redistribute it, with or without modifying it, either commercially or noncommercially. Secondarily, this License preserves for the author and publisher a way to get credit for their work, while not being considered responsible for modifications made by others.
This License is a kind of "copyleft", which means that derivative works of the document must themselves be free in the same sense. It complements the GNU General Public License, which is a copyleft license designed for free software.
We have designed this License in order to use it for manuals for free software, because free software needs free documentation: a free program should come with manuals providing the same freedoms that the software does. But this License is not limited to software manuals; it can be used for any textual work, regardless of subject matter or whether it is published as a printed book. We recommend this License principally for works whose purpose is instruction or reference.

E.2.0 Applicability and definitions

This License applies to any manual or other work, in any medium, that contains a notice placed by the copyright holder saying it can be distributed under the terms of this License. Such a notice grants a world-wide, royalty-free license, unlimited in duration, to use that work under the conditions stated herein. The "Document", below, refers to any such manual or work. Any member of the public is a licensee, and is addressed as "you". You accept the license if you copy, modify or distribute the work in a way requiring permission under copyright law.
A "Modified Version" of the Document means any work containing the Document or a portion of it, either copied verbatim, or with modifications and/or translated into another language.
A "Secondary Section" is a named appendix or a front-matter section of the Document that deals exclusively with the relationship of the publishers or authors of the Document to the Document's overall subject (or to related matters) and contains nothing that could fall directly within that overall subject. (Thus, if the Document is in part a textbook of mathematics, a Secondary Section may not explain any mathematics.) The relationship could be a matter of historical connection with the subject or with related matters, or of legal, commercial, philosophical, ethical or political position regarding them.
The "Invariant Sections" are certain Secondary Sections whose titles are designated, as being those of Invariant Sections, in the notice that says that the Document is released under this License. If a section does not fit the above definition of Secondary then it is not allowed to be designated as Invariant. The Document may contain zero Invariant Sections. If the Document does not identify any Invariant Sections then there are none.
The "Cover Texts" are certain short passages of text that are listed, as Front-Cover Texts or Back-Cover Texts, in the notice that says that the Document is released under this License. A Front-Cover Text may be at most 5 words, and a Back-Cover Text may be at most 25 words.
A "Transparent" copy of the Document means a machine-readable copy, represented in a format whose specification is available to the general public, that is suitable for revising the document straightforwardly with generic text editors or (for images composed of pixels) generic paint programs or (for drawings) some widely available drawing editor, and that is suitable for input to text formatters or for automatic translation to a variety of formats suitable for input to text formatters. A copy made in an otherwise Transparent file format whose markup, or absence of markup, has been arranged to thwart or discourage subsequent modification by readers is not Transparent. An image format is not Transparent if used for any substantial amount of text. A copy that is not "Transparent" is called "Opaque".


Examples of suitable formats for Transparent copies include plain ASCII without markup, Texinfo input format, LaTeX input format, SGML or XML using a publicly available DTD, and standard-conforming simple HTML, PostScript or PDF designed for human modification. Examples of transparent image formats include PNG, XCF and JPG. Opaque formats include proprietary formats that can be read and edited only by proprietary word processors, SGML or XML for which the DTD and/or processing tools are not generally available, and the machine-generated HTML, PostScript or PDF produced by some word processors for output purposes only.
The "Title Page" means, for a printed book, the title page itself, plus such following pages as are needed to hold, legibly, the material this License requires to appear in the title page. For works in formats which do not have any title page as such, "Title Page" means the text near the most prominent appearance of the work's title, preceding the beginning of the body of the text.
A section "Entitled XYZ" means a named subunit of the Document whose title either is precisely XYZ or contains XYZ in parentheses following text that translates XYZ in another language. (Here XYZ stands for a specific section name mentioned below, such as "Acknowledgements", "Dedications", "Endorsements", or "History".) To "Preserve the Title" of such a section when you modify the Document means that it remains a section "Entitled XYZ" according to this definition.
The Document may include Warranty Disclaimers next to the notice which states that this License applies to the Document. These Warranty Disclaimers are considered to be included by reference in this License, but only as regards disclaiming warranties: any other implication that these Warranty Disclaimers may have is void and has no effect on the meaning of this License.

E.2.1 Verbatim copying

You may copy and distribute the Document in any medium, either commercially or noncommercially, provided that this License, the copyright notices, and the license notice saying this License applies to the Document are reproduced in all copies, and that you add no other conditions whatsoever to those of this License. You may not use technical measures to obstruct or control the reading or further copying of the copies you make or distribute. However, you may accept compensation in exchange for copies. If you distribute a large enough number of copies you must also follow the conditions in section E.2.2.
You may also lend copies, under the same conditions stated above, and you may publicly display copies.

E.2.2 Copying in quantity

If you publish printed copies (or copies in media that commonly have printed covers) of the Document, numbering more than 100, and the Document's license notice requires Cover Texts, you must enclose the copies in covers that carry, clearly and legibly, all these Cover Texts: Front-Cover Texts on the front cover, and Back-Cover Texts on the back cover. Both covers must also clearly and legibly identify you as the publisher of these copies. The front cover must present the full title with all words of the title equally prominent and visible. You may add other material on the covers in addition. Copying with changes limited to the covers, as long as they preserve the title of the Document and satisfy these conditions, can be treated as verbatim copying in other respects.
If the required texts for either cover are too voluminous to fit legibly, you should put the first ones listed (as many as fit reasonably) on the actual cover, and continue the rest onto adjacent pages.
If you publish or distribute Opaque copies of the Document numbering more than 100, you must either include a machine-readable Transparent copy along with each Opaque copy, or state in or with each Opaque copy a computer-network location from which the general network-using public has access to download using public-standard network protocols a complete Transparent copy of the Document, free of added material. If you use the latter option, you must take reasonably prudent steps, when you begin distribution of Opaque copies in quantity, to ensure that this Transparent copy will remain thus accessible at the stated location until at least one year after the last time you distribute an Opaque copy (directly or through your agents or retailers) of that edition to the public.
It is requested, but not required, that you contact the authors of the Document well before redistributing any large number of copies, to give them a chance to provide you with an updated version of the Document.

E.2.3 Modifications

You may copy and distribute a Modified Version of the Document under the conditions of sections E.2.1 and E.2.2 above, provided that you release the Modified Version under precisely this License, with the Modified Version filling the role of the Document, thus licensing distribution and modification of the Modified Version to whoever possesses a copy of it. In addition, you must do these things in the Modified Version:
A. Use in the Title Page (and on the covers, if any) a title distinct from that of the Document, and from those of previous versions (which should, if there were any, be listed in the History section of the Document). You may use the same title as a previous version if the original publisher of that version gives permission.
B. List on the Title Page, as authors, one or more persons or entities responsible for authorship of the modifications in the Modified Version, together with at least five of the principal authors of the Document (all of its principal authors, if it has fewer than five), unless they release you from this requirement.
C. State on the Title page the name of the publisher of the Modified Version, as the publisher.
D. Preserve all the copyright notices of the Document.
E. Add an appropriate copyright notice for your modifications adjacent to the other copyright notices.
F. Include, immediately after the copyright notices, a license notice giving the public permission to use the Modified Version under the terms of this License, in the form shown in the Addendum below.
G. Preserve in that license notice the full lists of Invariant Sections and required Cover Texts given in the Document's license notice.
H. Include an unaltered copy of this License.
I. Preserve the section Entitled "History", Preserve its Title, and add to it an item stating at least the title, year, new authors, and publisher of the Modified Version as given on the Title Page. If there is no section Entitled "History" in the


Document, create one stating the title, year, authors, and pub- tions with the same name but different contents, make the ti-
lisher of the Document as given on its Title Page, then add an tle of each such section unique by adding at the end of it, in
item describing the Modified Version as stated in the previous parentheses, the name of the original author or publisher of
sentence. that section if known, or else a unique number. Make the same
J. Preserve the network location, if any, given in the Docu- adjustment to the section titles in the list of Invariant Sections
ment for public access to a Transparent copy of the Document, in the license notice of the combined work.
and likewise the network locations given in the Document for In the combination, you must combine any sections Enti-
previous versions it was based on. These may be placed in tled History in the various original documents, forming one
the History section. You may omit a network location for a section Entitled History; likewise combine any sections En-
work that was published at least four years before the Docu- titled Acknowledgements, and any sections Entitled Ded-
ment itself, or if the original publisher of the version it refers ications. You must delete all sections Entitled Endorse-
to gives permission. ments.
K. For any section Entitled Acknowledgements or Ded-
ications, Preserve the Title of the section, and preserve in the
Collections of documents
section all the substance and tone of each of the contributor
acknowledgements and/or dedications given therein. You may make a collection consisting of the Document and
L. Preserve all the Invariant Sections of the Document, un- other documents released under this License, and replace the
altered in their text and in their titles. Section numbers or the individual copies of this License in the various documents
equivalent are not considered part of the section titles. with a single copy that is included in the collection, provided
M. Delete any section Entitled Endorsements. Such a sec- that you follow the rules of this License for verbatim copying
tion may not be included in the Modified Version. of each of the documents in all other respects.
N. Do not retitle any existing section to be Entitled En- You may extract a single document from such a collection,
dorsements or to conflict in title with any Invariant Section. and distribute it individually under this License, provided
O. Preserve any Warranty Disclaimers. you insert a copy of this License into the extracted document,
If the Modified Version includes new front-matter sections and follow this License in all other respects regarding verba-
or appendices that qualify as Secondary Sections and contain tim copying of that document.
no material copied from the Document, you may at your op-
tion designate some or all of these sections as invariant. To
do this, add their titles to the list of Invariant Sections in the
Aggregation with independent works
Modified Versions license notice. These titles must be distinct A compilation of the Document or its derivatives with other
from any other section titles. separate and independent documents or works, in or on a vol-
You may add a section Entitled Endorsements, provided ume of a storage or distribution medium, is called an aggre-
it contains nothing but endorsements of your Modified Ver- gate if the copyright resulting from the compilation is not
sion by various partiesfor example, statements of peer re- used to limit the legal rights of the compilations users beyond
view or that the text has been approved by an organization as what the individual works permit. When the Document is in-
the authoritative definition of a standard. cluded an aggregate, this License does not apply to the other
You may add a passage of up to five words as a Front-Cover works in the aggregate which are not themselves derivative
Text, and a passage of up to 25 words as a Back-Cover Text, works of the Document.
to the end of the list of Cover Texts in the Modified Version. If the Cover Text requirement of section E.2.2 is applicable
Only one passage of Front-Cover Text and one of Back-Cover to these copies of the Document, then if the Document is less
Text may be added by (or through arrangements made by) than one half of the entire aggregate, the Documents Cover
any one entity. If the Document already includes a cover text Texts may be placed on covers that bracket the Document
for the same cover, previously added by you or by arrange- within the aggregate, or the electronic equivalent of covers
ment made by the same entity you are acting on behalf of, you if the Document is in electronic form. Otherwise they must
may not add another; but you may replace the old one, on ex- appear on printed covers that bracket the whole aggregate.
plicit permission from the previous publisher that added the
old one.
The author(s) and publisher(s) of the Document do not by Translation
this License give permission to use their names for publicity Translation is considered a kind of modification, so you may
for or to assert or imply endorsement of any Modified Version. distribute translations of the Document under the terms of
section E.2.3. Replacing Invariant Sections with translations
Combining documents requires special permission from their copyright holders, but
you may include translations of some or all Invariant Sections
You may combine the Document with other documents re- in addition to the original versions of these Invariant Sections.
leased under this License, under the terms defined in section You may include a translation of this License, and all the li-
4 above for modified versions, provided that you include in cense notices in the Document, and any Warrany Disclaimers,
the combination all of the Invariant Sections of all of the origi- provided that you also include the original English version
nal documents, unmodified, and list them all as Invariant Sec- of this License and the original versions of those notices and
tions of your combined work in its license notice, and that you disclaimers. In case of a disagreement between the transla-
preserve all their Warranty Disclaimers. tion and the original version of this License or a notice or dis-
The combined work need only contain one copy of this Li- claimer, the original version will prevail.
cense, and multiple identical Invariant Sections may be re- If a section in the Document is Entitled Acknowledge-
placed with a single copy. If there are multiple Invariant Sec- ments, Dedications, or History, the requirement (sec-

179
E License for this text

tion E.2.3) to Preserve its Title (section E.2.0) will typically re- Everyone is permitted to copy and distribute verbatim
quire changing the actual title. copies of this license document, but changing it is not al-
lowed.

Termination
You may not copy, modify, sublicense, or distribute the Doc-
ument except as expressly provided for under this License.
Any other attempt to copy, modify, sublicense or distribute
the Document is void, and will automatically terminate your
rights under this License. However, parties who have re-
ceived copies, or rights, from you under this License will not
have their licenses terminated so long as such parties remain
in full compliance.

Future revisions of this license


The Free Software Foundation may publish new, revised ver-
sions of the GNU Free Documentation License from time to
time. Such new versions will be similar in spirit to the present
version, but may differ in detail to address new problems or
concerns. See http://www.gnu.org/copyleft/.
Each version of the License is given a distinguishing version
number. If the Document specifies that a particular numbered
version of this License or any later version applies to it, you
have the option of following the terms and conditions either
of that specified version or of any later version that has been
published (not as a draft) by the Free Software Foundation.
If the Document does not specify a version number of this
License, you may choose any version ever published (not as a
draft) by the Free Software Foundation.

ADDENDUM: How to use this License for your


documents
To use this License in a document you have written, include
a copy of the License in the document and put the following
copyright and license notices just after the title page:
Copyright (c) <year> <your name>. Permission is granted
to copy, distribute and/or modify this document under the
terms of the GNU Free Documentation License, Version 1.2
or any later version published by the Free Software Founda-
tion; with no Invariant Sections, no Front-Cover Texts, and no
Back-Cover Texts. A copy of the license is included in the sec-
tion entitled GNU Free Documentation License.
If you have Invariant Sections, Front-Cover Texts and Back-
Cover Texts, replace the with...Texts. line with this:
with the Invariant Sections being <list their titles>, with the
Front-Cover Texts being <list>, and with the Back-Cover Texts
being <list>.
If you have Invariant Sections without Cover Texts, or some
other combination of the three, merge those two alternatives
to suit the situation.
If your document contains nontrivial examples of program
code, we recommend releasing these examples in parallel un-
der your choice of free software license, such as the GNU Gen-
eral Public License, to permit their use in free software.

Copyright
Copyright (c) 2000, 2001, 2002 Free Software Foundation, Inc.
59 Temple Place, Suite 330, Boston, MA 02111-1307, USA

180
Bibliography

[1] V. I. Arnold. Mathematical methods of classical mechanics (Springer, NY, 1997). 15, 27

[2] P. Bamberg, S. Sternberg. A course in mathematics for students of physics (Cambridge, 1990). 22

[3] A. Borde, A. Vilenkin, and A. H. Guth. Inflationary spacetimes are not past-complete. Phys. Rev. Lett. 90, 151301 (2003); online preprint gr-qc/0110012 (2001). 90

[4] R. Bott and J. Milnor. On the parallelizability of the spheres. Bull. Amer. Math. Soc. 64, 87 (1958). 138

[5] R. Bousso. The Holographic Principle. Rev. Mod. Phys. 75, 825 (2002); online preprint hep-th/0203101 (2002). 99

[6] T. Eguchi, P. B. Gilkey, and E. J. Hanson. Gravitation, gauge theories and differential geometry. Phys. Rep. 66, 213 (1980).

[7] J. Frauendiener. Conformal infinity. Living Rev. Rel. 7, 1 (2004). Online article: cited Nov. 2005, www.livingreviews.org/lrr-2004-1. 84

[8] R. P. Geroch. Singularities. In: Relativity, ed. by M. Carmeli, S. I. Fickler, and L. Witten (Plenum Press, NY, 1970), p. 259. 89

[9] R. P. Geroch. Asymptotic structure of spacetime. In: Proc. Symp. Cincinnati, Ohio, June 14-18, 1976, ed. by F. P. Esposito and L. Witten (Plenum Press, NY, 1977), p. 1. 78, 84

[10] Ø. Grøn and S. Hervik. Einstein's theory of relativity (book draft, online at www.fys.uio.no/~sigbjorh/GRbook.html, dated December 2004). v

[11] S. Gudmundsson. An introduction to Riemannian geometry (book draft, online at www.matematik.lu.se/matematiklu/personal/sigma/, dated 2006). 1

[12] S. W. Hawking and G. F. R. Ellis. The large-scale structure of space-time (Cambridge, 1973). v, 89, 93, 97

[13] A. Hebecker. Allgemeine Relativität (lecture notes for the University of Heidelberg, online at res.kaelteschaden.de/ART.pdf, dated July 2007). 127

[14] R. Jackiw. Constrained quantization without tears. Online preprint arxiv:hep-th/9306075 (1993). 113

[15] A. Kempf. General Relativity for cosmology (lecture notes for AMATH875, online at www.math.uwaterloo.ca/~akempf/amath875.shtml, dated 2005). 137

[16] S. Lang. Introduction to differentiable manifolds (Springer, 2002). 22

[17] J. M. Lee. Introduction to smooth manifolds (Springer, 2003). 1

[18] J. Lee and R. M. Wald. Local symmetries and constraints. J. Math. Phys. 31, 725 (1990). 113

[19] M. Ludvigsen. General relativity: a geometric approach (Cambridge, 1999). v, 1

[20] P. W. Michor. Topics in differential geometry (lecture notes of a course given in Vienna, online at www.mat.univie.ac.at/~michor/dgbook.pdf, draft dated April 2007). 133

[21] C. Misner, K. Thorne, and J. Wheeler. Gravitation (Freeman, 1973). v, 44, 89, 125

[22] B. O'Neill. Semi-Riemannian geometry (Academic Press, 1983). 1

[23] R. Penrose. Asymptotic properties of fields and spacetimes. Phys. Rev. Lett. 10, 66 (1963). 84

[24] R. Penrose. Gravitational collapse and space-time singularities. Phys. Rev. Lett. 14, 57 (1965). 97

[25] R. Penrose. Structure of spacetime. In: Battelle Rencontres, ed. by C. M. DeWitt and J. A. Wheeler (Benjamin, NY, 1968), p. 121. 39

[26] R. Penrose. Techniques of differential topology in relativity (SIAM, Philadelphia, 1972). 89

[27] R. Penrose and W. Rindler. Spinors and space-time (Cambridge, 1988). 141, 147, 149

[28] E. Poisson. A relativist's toolkit (Cambridge, 2004). v, 111

[29] B. F. Schutz. A first course in general relativity (Cambridge, 1985). v, 151

[30] N. Steenrod. Topology of fibre bundles (Princeton, 1951). 138

[31] S. Sternberg. Semi-riemannian geometry and GR (book draft, online at www.math.harvard.edu/~shlomo, dated September 2003).

[32] J. Stewart. Advanced general relativity (Cambridge, 1991). v, 141, 147

[33] N. Straumann. General relativity and relativistic astrophysics (Springer, 1984). v, 15, 22, 45, 125

[34] A. Trautman. Conservation laws in general relativity. In: Gravitation: an introduction to current research, ed. by L. Witten (Wiley, 1962), p. 169. 101

[35] A. Vilenkin. Interpretation of the wave function of the universe. Phys. Rev. D 39, 1116 (1989). 120, 121

[36] R. M. Wald. General relativity (University of Chicago, 1984). v, 74, 78, 85, 89, 111, 112, 125, 147

[37] H. Whitney. Differentiable manifolds. Ann. of Math. 37, 645 (1936). 5

[38] S. Winitzki. Drawing conformal diagrams for a fractal landscape. Phys. Rev. D 71, 123523 (2005); online preprint arxiv.org/abs/gr-qc/0503061 (2005). 78

Index

f(R) gravity, 105
n-form, 11, 19
  parallel to 1-form, 58
n-vectors, 20
1-form, 4, 10
  closed, 57
  exact, 57
2-form, 16
2-sphere, 3, 5

abstract index notation, 39
action principle, 50, 101
active transformation, 144
adjoint transformation, 42
affine connection, 31
affine parameter, 48
affine tangent vector, 48
affine transformation, 48
anti-linear function, 145
asymptotic predictability, 98
asymptotically flat spacetime, 71, 84
azimuthally symmetric spacetime, 36, 76

background field, 112
Bianchi identity
  first, 44
  second, 45
bilinear form, 16
bivector, 20
boldface, v
Bondi coordinate system, 85
boost, 143
bulk, 156

canonical energy-momentum tensor, 110
canonical momentum, 111
Cartan homotopy formula, 22
Cartan structure equation, 131, 135
Cauchy horizon, 96
Cauchy surface, 81
causal structure, 152
caustic, 66
chain rule, 8
Christoffel symbol, 156
  tensor or non-tensor, 53
Christoffel tensor, 32
combing a sphere, 10, 138
commutator of vectors, 4
comoving region, 62
comoving volume, 37
components, 1
conformal factor, 47
conformal invariance, 62
  of null geodesics, 49
conformal Killing vector, 73
conformal transformation, 35, 47, 76
  of Ricci tensor, 47
  of Riemann tensor, 47
conformal weight, 86
congruence, 9
conjugate point, 92, 94
conjugate quaternion, 141
connecting vector, 4, 12
connection, 31, 157
connection 1-form, 52, 139
connection 1-forms, 131
conservation law, 109
constraint, 112
constraint equation, 126
contravariant gradient, 30, 57
coordinate basis, 7
coordinate singularity, 5
cosmetic factors, 20
cosmic censorship, 98
cosmological constant, 66
cosmological inflation, 29
cosmology, 121
cotangent space, 4, 10
covariant, 1
covariant derivative, 31, 155, 156
covariant volume element, 101
covector, 4, 10
covering, 143
curvature, 44
curvature 2-forms, 135
curve on a manifold, 3

D'Alembertian, 47, 72, 137
Darboux theorem, 22
dark energy, 66
de Sitter spacetime, 29, 73, 75
decomposition of identity, 40, 63
derivation, 9
diffeomorphism, 3
differential forms, 19
Dirac spinor, 138, 150
Dirac spinors, 141
directional derivative, 7
distortion tensor, 36, 62, 115
divergence of vector fields, 37
domain of dependence, 96
dual basis, 29
dual space, 10
dual tetrad, 125

Einstein equation, 46, 102, 106
Einstein frame, 106
Einstein-Cartan theory, 46
Einstein-Hilbert action, 102
energy conditions, 65
energy-momentum tensor
  covariant conservation, 46
Euclidean metric, 27
Euler-Lagrange equation, 101
event horizon, 98
exponentiation map, 49
exterior differential, 21, 130
exterior product, 19
  of vectors, 19
exterior product of vectors, 56
extrinsic curvature, 115, 116

Faddeev-Jackiw formalism, 113
flag bivector, 147
flag of a spinor, 147
focal point, 66
focusing theorem, 66, 67
frame field, 125
free field, 102
Frobenius theorem, 57

gauge symmetry, 110
Gauss-Codazzi equation, 115
geodesic curve, 48
geodesic deviation, 51
geodesic equation, 48
geodesic vector field, 48
geodesically complete spacetime, 89
GNU Free Documentation License, v
gradient, 10, 30
gravitational potential, 72, 74
  for Schwarzschild spacetime, 73

Hamilton equations, 112
Hamiltonian action principle, 112
Hamiltonian constraint, 119
Hessian, 47, 137, 169
Hodge duality, 38, 127
Hodge star, 75, 149
hole argument, 111
holonomic basis, 126
Hubble law, 73
Hubble rate, 90

iff, 1
index notation, 39
index-free approach, 38
index-free notation, 1
  converting into, 39
induced connection, 52
induced metric, 52, 155
Infeld-van der Waerden map, 145
inflation, 90
internal symmetry, 110
intrinsic description, 5
irrelevant statement, v

Jacobi field, 92
Jacobi identity, 45
Jordan frame, 106

Killing equation, 36
Killing vector, 36, 46, 58, 71
Koszul formula, 35

Lagrange multipliers, 105
lapse function, 119
Levi-Civita connection, 7, 34
  Koszul formula, 35
  variation of, 104
Levi-Civita symbol, 30
Levi-Civita tensor, 30
Lie derivative, 13
  interpretation, 14
Lie group, 109, 143
  SL(2, C), 145
  SU(2), 143
Lie-propagated vector, 47
lightcone, 60
lightcone coordinates, 76
lightrays, 48
line integral, 17
local coordinates, 3, 4
local Lorentz transformation, 126
local variation, 108
Lorentz group, 143
  spinor representation, 145
Lorentz transformation, 151

manifold, 4
  definition, 3
  diffeomorphic, 6
  globally orientable, 30
  pseudo-Riemannian, 29
  Riemannian, 29
  smooth, 3
metric, 28
  Euclidean, 27
  nondegeneracy of, 28
metric connection, 34
metric energy-momentum tensor, 110
metricity, 33
minimal coupling
  to gravity, 102
minisuperspace, 121
Minkowski metric, 151, 152
Minkowski spacetime, 151
multivectors, 20
mute vectors, 40, 42

Newtonian limit of GR, 71
Noether current, 109
nonholonomic basis, 126
nonlinear gravity, 105
null curve, 50
null function, 61
null generator, 61
null surface, 60
null tetrad, 67
  from spin basis, 146
null vector, 29, 55

operator ordering, 120
oriented n-volume, 20
oriented area, 16
orthogonal complement, 55
orthonormal frame, 29

parallel transport, 47, 158
partial inverse metric, 56
partial metric, 55
passive transformation, 144
Pauli matrices, 142
peeling property, 87
perfect fluid, 65, 112
Poincaré lemma, 27
Poisson equation, 72
principal null direction, 147
projector, 52, 55
  for null directions, 55
  self-adjointness, 52
  self-adjointness of, 55
proper length, 49
proper rotation, 141
pseudo-exercises, v

quantum cosmology, 121
quaternion, 141
  with complex coefficients, 144

radiation gauge, 113
rank of 2-form, 23
Raychaudhuri equation, 63
  in null tetrad formalism, 68
redshift factor, 73
Ricci scalar, 45
Ricci tensor, 45
Riemann tensor, 44

scalar shear, 67
scalar-tensor gravity, 105, 106
scale factor, 121
Schwarzschild spacetime, 29, 60, 72, 127
  gravitational potential, 73
shift vector, 119
smooth function, 5
smooth manifold, 3
space of constant curvature, 53
spherical coordinates, 5
spin basis, 146
spin connection, 131
spinor, 141, 143
spinor bundle, 148
spinor field, 148
spinor space, 143
spinorial tensor, 146
star-shaped neighborhood, 27
static spacetime, 58, 71
stationary observer, 71
stationary spacetime, 58, 71
strongly causal spacetime, 96
superspace, 120
surface integral, 17
synchronous gauge, 117

tangent bundle, 4, 10, 138
tangent space, 4, 6
tangent vector, 6
tensor field, 11
tetrad, 29, 125, 148
tetrad components of Maxwell tensor, 87
tidal effect, 51
time foliation, 112
timelike surface, 60, 114
torsion, 46
torsion tensor, 33
torsion-free, 33
trace, 40
  index-free computation, 41
transverse tensor, 59, 63
twist, 63

unitary matrix, 3

vector bundle, 138
  fiber, 138
  section of, 138
  trivial, 138
  trivialization, 138
vector field, 9
  acceleration of, 58
  divergence of, 56
  flow lines of, 9
  hypersurface orthogonal, 57, 58
  integrable, 57
  rotation of, 57, 63
  shear of, 64, 65
vielbein, 125
vierbein, 29, 125
volume 4-form, 31
volume element, 31, 37
vorticity, 63

wave function of the universe, 120
wedge product, 19
Weyl spinors, 141
Wheeler-DeWitt equation, 120
world tensor, 148