Alexander Altland
Contents

1 Electrodynamics
1.1 Magnetic field energy
1.2 Electromagnetic gauge field
1.2.1 Covariant formulation
1.3 Fourier transformation
1.3.1 Fourier transform (finite space)
1.3.2 Fourier transform (infinite space)
1.3.3 Fourier transform and solution of linear differential equations
1.3.4 Algebraic interpretation of Fourier transform
1.4 Electromagnetic waves in vacuum
1.4.1 Solution of the homogeneous wave equations
1.4.2 Polarization
1.5 Green function of electrodynamics
1.5.1 Physical meaning of the Green function
1.5.2 Electromagnetic gauge field and Green functions
1.6 Field energy and momentum
1.6.1 Field energy
1.6.2 Field momentum ∗
1.6.3 Energy and momentum of plane electromagnetic waves
1.7 Electromagnetic radiation

2 Macroscopic electrodynamics
2.1 Macroscopic Maxwell equations
2.1.1 Averaged sources
2.2 Applications of macroscopic electrodynamics
2.2.1 Electrostatics in the presence of matter ∗
2.2.2 Magnetostatics in the presence of matter
2.3 Wave propagation in media
2.3.1 Model dielectric function
2.3.2 Plane waves in matter
2.3.3 Dispersion
2.3.4 Electric conductivity

3 Relativistic invariance
3.1 Introduction
3.1.1 Galilei invariance and its limitations
3.1.2 Einstein's postulates
3.1.3 First consequences
3.2 The mathematics of special relativity I: Lorentz group
3.2.1 Background
3.3 Aspects of relativistic dynamics
3.3.1 Proper time
3.3.2 Relativistic mechanics
3.4 Mathematics of special relativity II: Co– and Contravariance
3.4.1 Covariant and Contravariant vectors
3.4.2 The relativistic covariance of electrodynamics

4 Lagrangian mechanics
4.1 Variational principles
4.1.1 Definitions
4.1.2 Euler–Lagrange equations
4.1.3 Coordinate invariance of Euler–Lagrange equations
4.2 Lagrangian mechanics
4.2.1 The idea
4.2.2 Hamilton's principle
4.2.3 Lagrange mechanics and symmetries
4.2.4 Noether theorem
4.2.5 Examples

5 Hamiltonian mechanics
5.1 Hamiltonian mechanics
5.1.1 Legendre transform
5.1.2 Hamiltonian function
5.2 Phase space
5.2.1 Phase space and structure of the Hamilton equations
5.2.2 Hamiltonian flow
5.2.3 Phase space portraits
5.2.4 Poisson brackets
5.2.5 Variational principle
5.3 Canonical transformations
5.3.1 When is a coordinate transformation "canonical"?
5.3.2 Canonical transformations: why?
Chapter 1
Electrodynamics
Although derived for the specific case of a current loop, Eq. (1.1) holds for general current distributions subject to a magnetic field. (For example, for the current density carried by a point particle at x(t), j = qδ(x − x(t))v(t), we obtain E = c⁻¹ q v(t) · A(x(t)), i.e. the familiar Lorentz–force contribution to the Lagrangian of a charged particle.)
Now, assume that we shift the loop at fixed current against the magnetic field. The change in potential energy corresponding to a small shift is given by δE = c⁻¹ ∫ d³x j · δA, where ∇ × δA = δB denotes the change in the field strength. Using that ∇ × H = 4πc⁻¹ j, we represent δE as

δE = (1/4π) ∫ d³x (∇ × H) · δA = (1/4π) ∫ d³x ε_ijk (∂_j H_k) δA_i =
   = −(1/4π) ∫ d³x ε_ijk H_k ∂_j δA_i = (1/4π) ∫ d³x H · δB,

where in the integration by parts we noted that due to the spatial decay of the fields no surface terms at infinity arise. Due to the linear relation H = µ₀⁻¹ B, we may write H · δB = δ(H · B)/2, i.e. δE = (1/8π) δ ∫ d³x H · B. Finally, summing over all shifts required to bring the current loop in from infinity, we obtain

E = (1/8π) ∫ d³x H · B    (1.2)
for the magnetic field energy. Notice (a) that we have again managed to express the
energy of the system entirely in terms of the fields, i.e. without explicit reference to the
sources creating these fields and (b) the structural similarity to the electric field energy (??).
Finally, (c) the above construction shows that a separation of electromagnetism into static
and dynamic phenomena, resp., is not quite as unambiguous as one might have expected:
the derivation of the magnetostatic field energy relied on the dynamic law of induction. We
next proceed to embed the general static theory – as described by the concept of electric and
magnetic potentials – into the larger framework of electrodynamics.
Thus, from now on, we turn to the theory of the full set of Maxwell equations

∇ · D = 4πρ,    (1.3)
∇ × H − (1/c) ∂_t D = (4π/c) j,    (1.4)
∇ × E + (1/c) ∂_t B = 0,    (1.5)
∇ · B = 0.    (1.6)

In vacuum, where D = E and H = B, these assume the form

∇ · E = 4πρ,
∇ × B − (1/c) ∂_t E = (4π/c) j,
∇ × E + (1/c) ∂_t B = 0,
∇ · B = 0.    (1.7)
As in previous sections we will try to use constraints inherent to these equations to compactify
them to a smaller set of equations. As in section ??, the equation ∇ · B = 0 implies
B = ∇ × A. (1.8)
(However, we may no longer expect A to be time independent.) Now, substitute this represen-
tation into the law of induction: ∇×(E+c−1 ∂t A) = 0. This implies that E+c−1 ∂t A = −∇φ
can be written as the gradient of a scalar potential, or
E = −∇φ − (1/c) ∂_t A.    (1.9)
Substitution of (1.8) and (1.9) into the two inhomogeneous Maxwell equations yields

−∆φ − (1/c) ∂_t ∇ · A = 4πρ,
−∆A + (1/c²) ∂_t² A + ∇(∇ · A) + (1/c) ∂_t ∇φ = (4π/c) j.    (1.10)
These equations do not look particularly inviting. However, as in section ?? we may observe that the choice of the generalized vector potential A is not unique; this freedom can be used to transform Eq. (1.10) into a more manageable form. Consider an arbitrary function f : ℝ³ × ℝ → ℝ, (x, t) ↦ f(x, t). The transformation A → A + ∇f leaves the magnetic field unchanged
while the electric field changes according to E → E − c−1 ∂t ∇f . If, however, we synchronously
redefine the scalar potential as φ → φ − c−1 ∂t f , the electric field, too, will not be affected by
the transformation. Summarizing, the generalized gauge transformation
A → A + ∇f,
φ → φ − (1/c) ∂_t f,    (1.11)
leaves the electromagnetic fields (1.8) and (1.9) unchanged.
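INFO (editorial aside, not part of the original notes) The gauge invariance of E and B under (1.11) can be checked numerically. The Python sketch below uses arbitrarily chosen smooth test potentials φ, A and gauge function f (all hypothetical), computes the fields (1.8) and (1.9) by central finite differences, and compares them before and after the transformation.

```python
# Sketch (not from the notes): check gauge invariance of E and B under (1.11).
# phi, A, f below are arbitrary smooth test functions; c is any constant.
import math

c = 3.0     # plays the role of the speed of light
h = 1e-4    # step for central differences

def phi(x, t):
    return math.sin(x[0]) * math.cos(x[1] + t) + x[2] ** 2

def A(x, t):
    return (math.cos(x[1] * t), math.sin(x[2]) + t * x[0], x[0] * x[1])

def f(x, t):  # gauge function
    return math.sin(x[0] + x[1]) * math.exp(-0.1 * t) + x[2] * t

def d_dx(g, x, t, i):  # central difference w.r.t. x_i
    xp, xm = list(x), list(x)
    xp[i] += h
    xm[i] -= h
    return (g(xp, t) - g(xm, t)) / (2 * h)

def d_dt(g, x, t):     # central difference w.r.t. t
    return (g(x, t + h) - g(x, t - h)) / (2 * h)

def fields(phi_, A_, x, t):
    Ai = [lambda x, t, i=i: A_(x, t)[i] for i in range(3)]
    # E = -grad(phi) - (1/c) dA/dt ;  B = curl(A)
    E = [-d_dx(phi_, x, t, i) - d_dt(Ai[i], x, t) / c for i in range(3)]
    B = [d_dx(Ai[(i + 2) % 3], x, t, (i + 1) % 3)
         - d_dx(Ai[(i + 1) % 3], x, t, (i + 2) % 3) for i in range(3)]
    return E, B

# gauge transformed potentials: A -> A + grad f,  phi -> phi - (1/c) d_t f
def phi_new(x, t):
    return phi(x, t) - d_dt(f, x, t) / c

def A_new(x, t):
    return tuple(A(x, t)[i] + d_dx(f, x, t, i) for i in range(3))

x0, t0 = (0.3, -0.7, 1.1), 0.5
E1, B1 = fields(phi, A, x0, t0)
E2, B2 = fields(phi_new, A_new, x0, t0)
assert max(abs(a - b) for a, b in zip(E1 + B1, E2 + B2)) < 1e-5
```

The cancellation of the f-dependent terms rests on the symmetry of mixed second derivatives, which the nested difference stencils reproduce exactly.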
The gauge freedom may be used to transform the vector potential into one of several
convenient representations. Of particular relevance to the solution of the time dependent
Maxwell equations is the so–called Lorentz gauge
∇ · A + (1/c) ∂_t φ = 0.    (1.12)
Requiring the transformed potentials to obey (1.12) leads to the condition (∆ − c⁻²∂_t²)f = −(∇ · A + c⁻¹∂_t φ), i.e. a wave equation with inhomogeneity −(∇ · A + c⁻¹∂_t φ). We will see in a moment that such equations can always be solved, i.e. the Lorentz gauge can always be realized.
In the Lorentz gauge, the Maxwell equations assume the simplified form
(∆ − (1/c²) ∂_t²) φ = −4πρ,
(∆ − (1/c²) ∂_t²) A = −(4π/c) j.    (1.13)
In combination with the gauge condition (1.12), Eqs. (1.13) are fully equivalent to the set of
Maxwell equations (1.7).
Notice that the potentials comprise one scalar component, φ, and three "vectorial" components, A. While this indicates that the four components (φ, A) should form an entity, the equations above treat φ and A separately. In the following we show that there exists a much more concise and elegant covariant formulation of electrodynamics that transcends this separation. At this point, we introduce the covariant notation merely as a mnemonic aid. Later in the text, we will understand the physical principles behind the new formulation.
Definitions
Consider a vector x ∈ ℝ⁴. We represent this object in terms of four components {x^µ}, µ = 0, 1, 2, 3, where x^0 will be called the time–like component and x^i, i = 1, 2, 3, the space–like components.¹ Four–component objects of this structure will be called contravariant vectors. To a contravariant vector {x^µ}, we associate a covariant vector {x_µ} (notice the positioning of the indices as superscripts and subscripts, resp.) through

x_0 = x^0,    x_i = −x^i.    (1.14)

A co– and a contravariant vector x_µ and x′^µ can be "contracted" to produce a number:²

x_µ x′^µ ≡ x^0 x′^0 − x^i x′^i.

Here, we are employing the Einstein summation convention according to which pairs of identical upper and lower indices are summed over, x_µ x′^µ ≡ Σ_{µ=0}^{3} x_µ x′^µ, etc. Notice that

x_µ x′^µ = x^µ x′_µ.
We next introduce a number of examples of co– and contravariant vectors that will be of importance below:

. Consider a point in space time (t, r) defined by a time argument and three real space components r^i, i = 1, 2, 3. We store this information in a so-called contravariant four–component vector {x^µ}, µ = 0, 1, 2, 3, as x^0 = ct, x^i = r^i, i = 1, 2, 3. Occasionally, we write x ≡ {x^µ} = (ct, r) for short. The associated covariant vector has components {x_µ} = (ct, −r).

. Derivatives taken w.r.t. the components of the contravariant 4–vector are denoted by ∂_µ ≡ ∂/∂x^µ = (c⁻¹∂_t, ∇). Notice that this is a covariant object. Its contravariant partner is given by {∂^µ} = (c⁻¹∂_t, −∇).

. The 4-current j = {j^µ} is defined through j ≡ (cρ, j), where ρ and j are charge density and vectorial current density, resp.

. The 4–vector potential A ≡ {A^µ} is defined through A = (φ, A).

We next engage these definitions to introduce the
¹ Following a widespread convention, we denote four–component indices µ, ν, · · · = 0, 1, 2, 3 by Greek letters (from the middle of the alphabet). "Space–like" indices i, j, · · · = 1, 2, 3 are denoted by (mid–alphabet) latin letters.
² Although one may formally consider sums x^µ x′^µ of two contravariant vectors, objects of this type do not make sense in the present formalism.
A^µ → A^µ − ∂^µ f.    (1.15)
(Exercise: convince yourself that this is equivalent to (1.11).) Specifically, this freedom may
be used to realize the Lorentz gauge,
∂_µ A^µ = 0.    (1.16)

Once more, we verify that any 4-potential may be transformed to one that obeys the Lorentz condition: for if A′^µ obeys ∂_µ A′^µ ≡ λ, where λ is some function, we may define the transformed potential A^µ ≡ A′^µ − ∂^µ f. Imposing on A^µ the Lorentz condition,

0 = ∂_µ A^µ = ∂_µ (A′^µ − ∂^µ f) = λ − ∂_µ ∂^µ f,

we obtain the equivalent condition □f = λ. Here, we have introduced the d'Alembert operator

□ ≡ ∂_µ ∂^µ = (1/c²) ∂_t² − ∆.    (1.17)
c
(In a sense, this is a four–dimentional generalization of the Laplace operator, hence the four–
cornered square.) This is the covariant formulation of the condition discussed above. A gauge
transformation Aµ → Aµ + ∂ µ f of a potential Aµ in the Lorenz gauge class ∂µ Aµ = 0 is
compatible with the Lorentz condition iff f = 0. In other words, the class of Lorentz gauge
vector potentials still contains a huge gauge freedom.
The general representation of the Maxwell equations through potentials (1.10) affords the
4-representation
□A^µ − ∂^µ (∂_ν A^ν) = (4π/c) j^µ.

We may act on this equation with ∂_µ to obtain the constraint

∂_µ j^µ = 0,    (1.18)

i.e. the continuity equation for the sources. In the Lorentz gauge, the potential equation reduces to the wave equations

∂_ν ∂^ν A^µ = (4π/c) j^µ,    ∂_µ A^µ = 0.    (1.19)
Before turning to the discussion of the solution of these equations a few general remarks
are in order:
. The Lorentz condition is the prevalent gauge choice in electrodynamics because (a) it brings the Maxwell equations into a maximally simple form and (b) it will turn out below to be invariant under the most general class of coordinate transformations, the Lorentz transformations (see below). (It is worthwhile to note that the Lorentz condition does not unambiguously fix the gauge of the vector potential. Indeed, we may modify any Lorentz gauge vector potential A^µ as A^µ → A^µ + ∂^µ f, where f satisfies the homogeneous wave equation ∂^µ ∂_µ f = 0. This doesn't alter the condition ∂_µ A^µ = 0.) Other gauge conditions frequently employed include the Coulomb gauge, ∇ · A = 0. This gauge representation has also proven advantageous within the context of quantum electrodynamics. However, we won't discuss it any further in this text.
f(x) = (1/L) Σ_k e^{ikx} f_k,    (1.20)
where the sum runs over all values k = 2πn/L, n ∈ Z. Series representations of this type are
summarily called Fourier series representations. At this point, we haven’t proven that the
function f can actually be represented by a series representation of this type. However, if it
exists, the coefficients fk can be computed with the help of the identity (prove it)
(1/L) ∫₀^L dx e^{−ikx} e^{ik′x} = δ_{kk′},    (1.21)
where k = 2πn/L, k′ = 2πn′/L and δ_{kk′} ≡ δ_{nn′}. We just need to multiply Eq. (1.20) by the function e^{−ikx} and integrate over x to obtain

∫₀^L dx e^{−ikx} f(x) = (1/L) Σ_{k′} ∫₀^L dx e^{−ikx} e^{ik′x} f_{k′} = Σ_{k′} f_{k′} (1/L) ∫₀^L dx e^{−ikx} e^{ik′x} = f_k,

where in the first step we used (1.20) and in the second (1.21).
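INFO (editorial aside, not part of the original notes) The orthogonality relation (1.21) is easily checked numerically. The Python sketch below approximates the integral by a midpoint rule; for the plane-wave integrand this quadrature is exact up to rounding.

```python
# Sketch (not from the notes): midpoint-rule check of the orthogonality (1.21).
import cmath

L, M = 2.0, 2000   # interval length, number of quadrature points

def inner(n, n_prime):
    # (1/L) * integral_0^L dx exp(-i k x) exp(i k' x), with k = 2*pi*n/L
    k, kp = 2 * cmath.pi * n / L, 2 * cmath.pi * n_prime / L
    dx = L / M
    s = sum(cmath.exp(1j * (kp - k) * (m + 0.5) * dx) for m in range(M))
    return s * dx / L

for n in range(-2, 3):
    for n_prime in range(-2, 3):
        expected = 1.0 if n == n_prime else 0.0
        assert abs(inner(n, n_prime) - expected) < 1e-9
```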
We call the numbers fk ∈ C the Fourier coefficients of the function f . The relation
f_k = ∫₀^L dx e^{−ikx} f(x),    (1.22)
connecting a function with its Fourier coefficients is called the Fourier transform of f. The 'inverse' identity (1.20) expressing f through its Fourier coefficients is called the inverse Fourier transform. To repeat, at this point we haven't proven the existence of the inverse transform, i.e. the possibility to 'reconstruct' a function f from its Fourier coefficients f_k.
A criterion for the existence of the Fourier transform is that the insertion of (1.22) into the r.h.s. of (1.20) gives back the original function f:

f(x) = (1/L) Σ_k e^{ikx} ∫₀^L dx′ e^{−ikx′} f(x′) = ∫₀^L dx′ (1/L) Σ_k e^{ik(x−x′)} f(x′).

This holds provided that

(1/L) Σ_k e^{ik(x−x′)} = δ(x − x′),    (1.23)

[Figure 1.1: Representation of a function (thick line) by Fourier series expansion. The graphs show series expansions up to (1, 3, 5, 7, 9) terms.]
where δ(x) is the Dirac δ–function. For reasons that will become transparent below, Eq. (1.23)
is called a completeness relation. Referring for a proof of this relation to the exercise below,
we here merely conclude that Eq. (1.23) establishes Eq. (1.22) and its inverse (1.20) as an integral transform, i.e. the full information carried by f(x) is contained in the set of all Fourier
coefficients {fk }. By way of example, figure 1.1 illustrates the approximation of a sawtooth
shaped function f by Fourier series truncated after a finite number of terms.
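INFO (editorial aside, not part of the original notes) The truncated-series approximation illustrated in Fig. 1.1 is easy to reproduce. In the Python sketch below, the Fourier coefficients of a sawtooth (the profile is an arbitrary choice for illustration) are computed from (1.22) by simple quadrature, and partial sums of (1.20) are evaluated at an interior point; the pointwise error shrinks as more modes are kept.

```python
# Sketch (not from the notes): partial Fourier sums of a sawtooth, cf. Fig. 1.1.
import cmath

L = 1.0

def f(x):                      # sawtooth: rises linearly, jumps back at x = L
    return x / L - 0.5

def fourier_coeff(n, M=4000):  # f_k = int_0^L dx e^{-ikx} f(x), k = 2*pi*n/L
    k = 2 * cmath.pi * n / L
    dx = L / M
    return sum(cmath.exp(-1j * k * (m + 0.5) * dx) * f((m + 0.5) * dx)
               for m in range(M)) * dx

def partial_sum(x, N):         # (1/L) * sum over |n| <= N of e^{ikx} f_k
    return sum(cmath.exp(2j * cmath.pi * n * x / L) * fourier_coeff(n)
               for n in range(-N, N + 1)).real / L

x0 = 0.3                       # a point away from the discontinuity
errors = [abs(partial_sum(x0, N) - f(x0)) for N in (1, 3, 5, 9)]
assert errors[-1] < errors[0]  # more modes, better approximation
```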
EXERCISE Prove the completeness relation. To this end, add an infinitesimal "convergence generating factor" δ > 0 to the sum:

(1/L) Σ_k e^{ik(x−x′) − δ|k|} = (1/L) Σ_{n=−∞}^{∞} e^{2πin(x−x′)/L − 2π|n|δ/L}.

Next compute the sum as a convergent geometric series. Take the limit δ ↘ 0 and convince yourself that you obtain a representation of the Dirac δ–function.
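INFO (editorial aside, not part of the original notes) Carrying out the two geometric sums in the exercise gives, at finite δ, the Poisson kernel (1 − r²)/(1 − 2r cos θ + r²)/L with r = e^{−2πδ/L} and θ = 2π(x − x′)/L. The Python sketch below compares the truncated sum against this closed form and checks that its integral over one period is 1, as expected of an emerging δ-function.

```python
# Sketch (not from the notes): regularized completeness sum vs. Poisson kernel.
import cmath
import math

L, delta = 1.0, 0.02

def regularized_sum(u, nmax=4000):
    # (1/L) * sum_n exp(2*pi*i*n*u/L - 2*pi*|n|*delta/L), truncated at |n| = nmax
    s = sum(cmath.exp(2j * cmath.pi * n * u / L - 2 * cmath.pi * abs(n) * delta / L)
            for n in range(-nmax, nmax + 1))
    return s.real / L

def poisson_kernel(u):
    # closed form of the geometric series
    r = math.exp(-2 * math.pi * delta / L)
    theta = 2 * math.pi * u / L
    return (1 - r * r) / (1 - 2 * r * math.cos(theta) + r * r) / L

assert abs(regularized_sum(0.1) - poisson_kernel(0.1)) < 1e-8

# weight of the emerging delta-function: integral over one period is ~1
M = 2000
integral = sum(poisson_kernel((m + 0.5) / M * L) for m in range(M)) * L / M
assert abs(integral - 1.0) < 1e-6
```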
where the sum runs over all vectors k = (k1 , . . . , kd ), kl = 2πnl /Ll , nl ∈ Z. Eqs. (1.21)
. Why is Fourier transformation useful? There are many different answers to this question, which underlines the fact that we are dealing with an important concept: In many areas of electrical engineering, sound engineering, etc. it is customary to work with wave-like signals, i.e. signals of harmonic modulation ∼ Re e^{ikx} = cos(kx). The
(inverse) Fourier transform is but the decomposition of a general signal into harmonic
constituents. Similarly, many “signals” encountered in nature (water or sound waves
reflected from a complex pattern of obstacles, light reflected from rough surfaces, etc.)
can be interpreted in terms of a superposition of planar waves. Again, this is a situation
where a Fourier ’mode’ decomposition is useful.
While this definition is formally consistent, it has a problem: only functions decaying sufficiently fast at infinity to make f(x) exp(ikx) integrable can be transformed in this way. This makes numerous function classes of applied interest – polynomials, various types of transcendental functions, etc. – non-transformable and severely restricts the usefulness of the formalism.
However, by a little trick, we may upgrade the definition to one that is far more useful. Let us modify the transform according to

f(k) = lim_{δ↘0} ∫ dx e^{−ikx − δx²} f(x).    (1.26)

Due to the inclusion of the convergence generating factor, δ, any function that does not increase faster than exponentially can be transformed! Similarly, it will be useful to include a convergence generating factor in the reverse transform, i.e.

f(x) = lim_{δ↘0} ∫ (dk/2π) e^{ikx − δk²} f(k).

In d dimensions, the transform and its inverse thus assume the form

f(x) = lim_{δ↘0} ∫ d^dk/(2π)^d e^{ix·k − δk²} f(k),
f(k) = lim_{δ↘0} ∫ d^dx e^{−ik·x − δx²} f(x).    (1.27)
In these formulae, x² ≡ x · x is the squared norm of the vector x, etc. A few more general remarks on these definitions:
. To keep the notation simple, we will suppress the explicit reference to the convergence
generating factor, unless convergence is an issue.
. For later reference, we note that the generalization of the completeness relation to
infinite space reads as
∫ d^dk/(2π)^d e^{ik·(x−x′)} = δ(x − x′).    (1.28)
. Finally, one may wonder, whether the inclusion of the convergence generating factor
might not spoil the consistency of the transform. To check that this is not the case, we
need to verify the infinite space variant of the completeness relation (1.23), i.e.
lim_{δ↘0} ∫ d^dk/(2π)^d e^{ik·(x−x′) − δk²} = δ(x − x′).    (1.29)
(For only if this relation holds, the two equations in (1.27) form a consistent pair, cf.
the argument in section 1.3.1.)
EXERCISE Prove relation (1.29). Do the Gaussian integral over k (hint: notice that the integral factorizes into d one–dimensional Gaussian integrals) to obtain a result that in the limit δ ↘ 0 asymptotes to a representation of the δ–function.
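INFO (editorial aside, not part of the original notes) In d = 1 the Gaussian integral of the exercise gives (4πδ)^{−1/2} e^{−(x−x′)²/(4δ)}; integrating this kernel against a smooth function reproduces the function value as δ ↘ 0. A Python sketch (test function chosen arbitrarily):

```python
# Sketch (not from the notes): the Gaussian-regularized kernel acts as a
# delta-function on a smooth test function as delta -> 0.
import math

def kernel(u, delta):
    # closed form of  int dk/(2*pi) exp(i*k*u - delta*k^2)
    return math.exp(-u * u / (4 * delta)) / math.sqrt(4 * math.pi * delta)

def smeared(f, x, delta, M=4000, cutoff=5.0):
    # int dy kernel(x - y) f(y), midpoint quadrature over a window around x
    a, b = x - cutoff, x + cutoff
    dy = (b - a) / M
    return sum(kernel(x - (a + (m + 0.5) * dy), delta) * f(a + (m + 0.5) * dy)
               for m in range(M)) * dy

f = lambda y: math.cos(y) + 0.3 * y   # arbitrary smooth test function
x0 = 0.7
errs = [abs(smeared(f, x0, d) - f(x0)) for d in (0.1, 0.01, 0.001)]
assert errs[2] < errs[0]   # smaller delta, sharper kernel, smaller error
```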
Fourier transform methods thus reduce the solution of linear partial differential equations with
constant coefficients to the computation of a definite integral. Technically, this integral may be difficult to compute; conceptually, however, the task of solving the differential equation is under control.
EXAMPLE By way of example, consider the second order ordinary differential equation

(−∂_x² + λ⁻²) ψ(x) = c δ(x − x₀),

where λ, c and x₀ are constants. Application of formula (1.31) gives a solution⁴

ψ(x) = c ∫ (dk/2π) e^{−ikx} e^{ikx₀} / (k² + λ⁻²).

⁴ The solution is not unique: addition of a solution ψ₀ of the homogeneous equation (−∂_x² + λ⁻²)ψ₀(x) = 0 gives another solution ψ(x) → ψ(x) + ψ₀(x) of the inhomogeneous differential equation.
The integral above can be done by the method of residues (or looked up in a table). As a result we obtain the solution

ψ(x) = (cλ/2) e^{−|x−x₀|/λ}.

(Exercise: convince yourself that ψ is a solution to the differential equation.) In the limit λ → ∞, the differential equation collapses to a one–dimensional variant of the Poisson equation, −∂_x²ψ(x) = cδ(x − x₀) for a charge c/4π at x₀. Our solution then asymptotes to

ψ(x) = −(c/2)|x − x₀| + const.
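INFO (editorial aside, not part of the original notes) The residue evaluation, ψ(x) = (cλ/2) e^{−|x−x₀|/λ}, can be cross-checked by direct numerical integration of the Fourier integral; the constants below are arbitrary choices.

```python
# Sketch (not from the notes): quadrature check of the residue result for the
# screened 1D Green function example.
import math

lam, c_src = 0.8, 2.0    # screening length lambda and source strength c
x, x0 = 1.3, 0.4

# psi(x) = c * int dk/(2*pi) cos(k*(x - x0)) / (k^2 + lambda^-2)
# (the integrand is even in k: integrate over k > 0 and double, hence the 1/pi)
K, M = 200.0, 200000     # momentum cutoff and number of quadrature points
dk = K / M
psi = c_src * sum(math.cos((m + 0.5) * dk * (x - x0))
                  / ((m + 0.5) ** 2 * dk ** 2 + 1.0 / lam ** 2)
                  for m in range(M)) * dk / math.pi

exact = c_src * lam / 2 * math.exp(-abs(x - x0) / lam)
assert abs(psi - exact) < 1e-3
```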
However, the usefulness of Fourier transform methods has its limits. The point is that the
Fourier transform does not only turn complicated operations (derivatives) into simple ones; the converse is also true. The crux of the matter is expressed by the so–called convolution theorem:
straightforward application of the definitions (1.27) and of the completeness relation (1.29)
shows that
f(x) g(x) −→ (f ∗ g)(k) ≡ ∫ d^dp/(2π)^d f(k − p) g(p).    (1.32)
In other words, the Fourier transform turns a simple product operation into a non–local integral convolution. Specifically, this means that the Fourier transform representation of (linear) differential equations with spatially varying coefficient functions will contain convolution operations. Needless to say, the solution of the ensuing integral equations can be complicated.
Finally, notice the mathematical similarity between the Fourier transform f (x) → f (k) and
its reverse f (k) → f (x). Apart from a sign change in the phase and the constant pre–factor
(2π)−d the two operations in (1.27) are identical. This means that identities such as (1.30)
or (1.32) all have a reverse version. For example, a convolution

(f ∗ g)(x) ≡ ∫ d^dy f(x − y) g(y) −→ f(k) g(k)

transforms into a simple product, etc. In practice, one often needs to compromise and figure
out whether the real space or the Fourier transformed variant of a problem offers the best
perspectives for mathematical attack.
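INFO (editorial aside, not part of the original notes) A discrete analog of the convolution theorem (1.32) is easy to verify: for the discrete Fourier transform (implemented naively below), the transform of a pointwise product equals 1/N times the circular convolution of the transforms. A Python sketch with arbitrary sample data:

```python
# Sketch (not from the notes): discrete convolution theorem,
# DFT(f * g) = (1/N) * circular convolution of DFT(f) and DFT(g).
import cmath

N = 8
f = [0.5, -1.0, 2.0, 0.3, 0.0, 1.1, -0.7, 0.4]
g = [1.0, 0.2, -0.5, 0.8, -1.2, 0.0, 0.6, -0.3]

def dft(a):
    n = len(a)
    return [sum(a[x] * cmath.exp(-2j * cmath.pi * k * x / n) for x in range(n))
            for k in range(n)]

F, G = dft(f), dft(g)
lhs = dft([fx * gx for fx, gx in zip(f, g)])              # transform of product
rhs = [sum(F[(k - p) % N] * G[p] for p in range(N)) / N   # convolution of transforms
       for k in range(N)]
assert all(abs(a - b) < 1e-8 for a, b in zip(lhs, rhs))
```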
A word of caution: this section won’t contain difficult formulae. Nonetheless, it doesn’t
make for easy reading. Please do carefully think about the meaning of the structures introduced
below.
Consider the space V of L² functions f : [0, L] → ℂ, i.e. the space of functions for which ∫₀^L dx |f|² exists. We call this a 'space' of functions because V is a vector space: for two elements f, g ∈ V and a, b ∈ ℂ, af + bg ∈ V, the defining criterion of a complex vector
space. However, unlike the finite–dimensional spaces studied in ordinary linear algebra, the
dimension of V is infinite. Heuristically, we may think of the dimension of a complex vector
space as the number of linearly independent complex components required to uniquely specify
a vector. In the case of functions, we need the infinitely many “components” f (x), x ∈ [0, L]
for a full characterization.
In the following we aim to develop some intuition for the "linear algebra" on function spaces. To this end, it will be useful to consider the discretization of the interval [0, L] into N points, x_m = mL/N, and the vector f_N ≡ (f(x₁), . . . , f(x_N)) as a discrete representative of a (smooth) function f (see the figure for a discretization of a smooth function into N = 30 data points). We think of the N-component complex vectors f_N generated in this way as elements of an N-dimensional complex vector space V_N. Heuristically it should be clear that in the limit of infinitely fine discretization, N → ∞, the vectors f_N carry the full information on their reference functions, and V_N approaches a representative of the function space "lim_{N→∞} V_N = V ".⁵
⁵ The quotes are quite appropriate here. Many objects standard in linear algebra – traces, determinants, etc. – do not afford a straightforward extrapolation to the infinite dimensional case. However, in the present context, the ensuing structures – the subject of functional analysis – do not play an essential role, and a naive approach will be sufficient.
⟨e_i, e_j⟩ = N δ_ij.

The N–normalization of the basis vectors e_i implies that the coefficients of general vectors f are simply obtained as

f_i = ⟨e_i, f⟩.    (1.33)
What is the N → ∞ limit of these expressions? At N → ∞ the vectors e_i become the representative of functions that are "infinitely sharply peaked" around a single point (see the figure above). Increasing N, and choosing a sequence of indices i that correspond to a fixed point y ∈ [0, L], we denote this function by δ_y. Equation (1.33) then asymptotes to

f(y) = ⟨δ_y, f⟩ = ∫₀^L dx δ_y(x) f(x),

where we noted that δ_y is real. This equation identifies δ_y(x) = δ(y − x) as the δ-function.
This consideration identifies the δ–functions {δy |y ∈ [0, L]} as the function space analog of
the standard basis associated to the canonical scalar product. Notice that there are “infinitely
many” basis vectors. Much like we need to distinguish between vectors v ∈ VN and the
components of vectors, vi ∈ C, we have to distinguish between functions f ∈ V and their
“coefficients” or function values f (x) = hδx , f i ∈ C.
solutions that reflect this symmetry, i.e. one might aim to represent the solution in terms
of a basis of functions that show definite behavior under rotations. Specifically, the Fourier
transformation is a change to a function basis that behaves conveniently when acted upon by
derivative operations.
To start with, let us briefly recall the linear algebra of changes of basis in finite dimen-
sional vector spaces. Let {wα |α = 1, . . . , N } be a second basis of the space VN (besides
the reference basis {ei |i = 1, . . . , N }). For simplicity, we presume that the new basis is
orthonormal with respect to the given scalar product, i.e.
⟨w_α, w_β⟩ = N δ_{αβ}.

An arbitrary vector v may then be expanded in the new basis as v_i = Σ_α v′_α w_{α,i}, where w_{α,i} ≡ ⟨e_i, w_α⟩ are the components of the new basis vectors in the old basis. The coefficients of the expansion, v′_α, of v in the new basis are obtained by taking the scalar product with w_α: ⟨w_α, v⟩ = Σ_β v′_β ⟨w_α, w_β⟩ = N v′_α, or

v′_α = (1/N) Σ_j w*_{α,j} v_j.    (1.36)

Substituting (1.36) back into the expansion and requiring consistency for arbitrary v, we obtain the identity

δ_ij = (1/N) Σ_α w*_{α,j} w_{α,i}.    (1.37)
This is the finite dimensional variant of a completeness relation. The completeness relation holds only for orthonormalized bases. However, the converse is also true: any system of vectors {w_α | α = 1, . . . , N} satisfying the completeness criterion (1.37) is orthonormal and "complete" in the sense that it can be used to represent arbitrary vectors as in (1.34).
We may also write the completeness relation in a less index–oriented notation,

⟨e_i, e_j⟩ = Σ_α ⟨e_i, w_α⟩ ⟨w_α, e_j⟩.    (1.38)
EXERCISE Prove that the completeness relation (1.37) represents a criterion for the orthonor-
mality of a basis.
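INFO (editorial aside, not part of the original notes) Both relations can be verified explicitly for the discrete Fourier basis w_{α,i} = e^{2πiαi/N}, which is orthonormal in the N-scaled convention used here. A Python sketch:

```python
# Sketch (not from the notes): orthonormality <w_a, w_b> = N*delta_ab and
# completeness (1.37) for the discrete Fourier basis in C^N.
import cmath

N = 6
w = [[cmath.exp(2j * cmath.pi * a * i / N) for i in range(N)] for a in range(N)]

# orthonormality in the N-scaled convention
for a in range(N):
    for b in range(N):
        s = sum(w[a][i].conjugate() * w[b][i] for i in range(N))
        assert abs(s - (N if a == b else 0)) < 1e-9

# completeness (1.37): delta_ij = (1/N) * sum_a conj(w[a][j]) * w[a][i]
for i in range(N):
    for j in range(N):
        s = sum(w[a][j].conjugate() * w[a][i] for a in range(N)) / N
        assert abs(s - (1 if i == j else 0)) < 1e-9
```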
Let us now turn to the infinite dimensional generalization of these concepts. The (inverse) Fourier transform (1.20) states that an arbitrary element f ∈ V of function space (a "vector") can be expanded in a basis of functions {e_k ∈ V | k ∈ 2πL⁻¹ℤ}. The function values ("components in the standard basis") of these candidate basis functions are given by e_k(x) = e^{ikx}.⁹ The inverse Fourier transform states that an arbitrary function can be expanded as

f = (1/L) Σ_k f_k e_k,    (1.39)

which is the analog of (1.35). Substitution of e_k(x) = e^{ikx} gives (1.20). The coefficients can be obtained by taking the scalar product ⟨e_k, · ⟩ with (1.39), i.e.

f_k = ⟨e_k, f⟩ = ∫₀^L dx e^{−ikx} f(x),
which is the Fourier transform. Finally, completeness holds if the substitution of these coef-
ficients into the inverse transform (1.40) reproduces the function f (x). Since this must hold
for arbitrary functions, we are led to the condition
(1/L) Σ_k e^{−ikx} e^{iky} = δ(x − y).
The abstract representation of this relation (i.e. the analog of (1.38)) reads (check it!)

⟨δ_x, δ_y⟩ = (1/L) Σ_k ⟨δ_x, e_k⟩ ⟨e_k, δ_y⟩.
For the convenience of the reader, the most important relations revolving around completeness and basis change are summarized in table 1.1. The column "invariant" contains abstract definitions, while "components" refers to the coefficients of vectors/function values. (For better readability, the normalization factors N appearing in our specific definition of scalar products above are omitted in the table.)
⁹ However, this by itself does not yet prove their completeness as a basis!
Table 1.1: Summary of the linear algebraic interpretation of basis changes in function space. (In all formulas, a double index summation convention is implied.)
With this background, we are in a position to formulate the motivation for Fourier
transformation from a different perspective. Above, we had argued that “Fourier transfor-
mation transforms derivatives into something simple”. The derivative operation ∂x : V → V
can be viewed as a linear operator in function space: derivatives map functions onto functions,
f → ∂x f and ∂x (af + bg) = a∂x f + b∂x g, a, b ∈ C, the defining criterion for a linear map.
In linear algebra we learn that given a linear operator, A : VN → VN , it may be a good idea
to look at its eigenvectors, w_α, and eigenvalues, λ_α, i.e. Aw_α = λ_α w_α. In the specific case where A is hermitean, ⟨v, Aw⟩ = ⟨Av, w⟩ for arbitrary v, w, we even know that the set of eigenvectors {w_α} can be arranged to form an orthonormal basis.
These structures extend to function space. Specifically, consider the operator −i∂x , where
the factor (−i) has been added for convenience. A straightforward integration by parts shows
that this operator is hermitean:
⟨f, (−i∂_x) g⟩ = ∫ dx f*(x)(−i∂_x) g(x) = ∫ dx ((−i∂_x) f)*(x) g(x) = ⟨(−i∂_x) f, g⟩.
Second, we have

(−i∂_x e_k)(x) = −i∂_x exp(ikx) = k exp(ikx) = k e_k(x),

or

(−i∂_x) e_k = k e_k

for short. This demonstrates that the basis functions e_k are eigenfunctions of the hermitean operator −i∂_x, with eigenvalue k.
The one difference to Eq. (1.27) is that the sign–convention in the exponent of the (ω/t)–sector
of the transform differs from that in the (k, x)–sector. We next subject the homogeneous wave
equation

(∆ − (1/c²) ∂_t²) ψ(t, x) = 0,

(where ψ is meant to represent any of the components of the E– or B–field) to this transformation and obtain

(k² − ω²/c²) ψ(ω, k) = 0,
where k ≡ |k|. Evidently, the solution ψ must vanish for all values (ω, k), except for those
for which the factor k 2 − (ω/c)2 = 0. We may thus write
ψ(ω, k) = 2π c₊(k) δ(ω − ck) + 2π c₋(k) δ(ω + ck),
where c± ∈ C are arbitrary complex functions of the wave vector k. Substituting this
representation into the inverse transform, we obtain the general solution of the scalar ho-
mogeneous wave equation
ψ(t, x) = ∫ d³k/(2π)³ [ c₊(k) e^{i(k·x − ckt)} + c₋(k) e^{i(k·x + ckt)} ].    (1.42)
In words: the general solution of the homogeneous wave equation is a superposition of plane waves with wave vectors k, propagating at speed c along (c₊) or against (c₋) the direction of k.
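INFO (editorial aside, not part of the original notes) That each plane wave in (1.42) indeed solves the wave equation can be checked by second-order finite differences; the Python sketch below does so for one arbitrarily chosen wave vector.

```python
# Sketch (not from the notes): psi = exp(i(k.x - c|k|t)) satisfies the wave
# equation (Laplacian - (1/c^2) d_t^2) psi = 0, checked numerically.
import cmath
import math

c = 2.0
kvec = (1.0, -0.5, 0.7)                 # arbitrary wave vector
kabs = math.sqrt(sum(q * q for q in kvec))

def psi(t, x):
    phase = sum(q * xi for q, xi in zip(kvec, x)) - c * kabs * t
    return cmath.exp(1j * phase)

h = 1e-4
t0, x0 = 0.3, (0.1, 0.2, -0.4)

def second(vm, v0, vp):                 # central second difference
    return (vm - 2 * v0 + vp) / h ** 2

lap = 0
for i in range(3):
    xp, xm = list(x0), list(x0)
    xp[i] += h
    xm[i] -= h
    lap += second(psi(t0, xm), psi(t0, x0), psi(t0, xp))

dtt = second(psi(t0 - h, x0), psi(t0, x0), psi(t0 + h, x0))
assert abs(lap - dtt / c ** 2) < 1e-5   # residual of the wave equation
```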
1.4.2 Polarization
In the following we will discuss a number of physically different realizations of plane electromagnetic waves. Since B is uniquely determined by E, we will focus attention on the latter.¹⁰

¹⁰ One may ask why, then, we introduced complex notation at all. The reason is that working with exponents of phases is far more convenient than explicitly having to distinguish between the sin and cos functions that arise after real parts have been taken.
INFO It may be worthwhile to rephrase the concept of the solution of linear differential equations
by Green function methods in the present dynamical context: think of a generic inhomo-
geneity 4πg(x, t) as an additive superposition of δ–inhomogeneities, δ(y,s) (x, t) ≡ δ(x − y)δ(t − s)
concentrated in space and time around y and s, resp. (see the figure, where the vertical bars rep-
resent individual contributions δ(y,s) weighted by g(y, s) and a finite discretization in space–time
is understood.)
4πg(x, t) = ∫ d³y ds (4πg(y, s)) δ_(y,s)(x, t).

By linearity, it is then sufficient to know the solution for each point-like inhomogeneity δ_(y,s) separately. The functions G_(y,s)(x, t) ≡ G(x, t; y, s) solving this problem are the Green functions of the wave equation. Knowledge of the Green function is equivalent to a full solution of the general equation, up to the final integration (1.46).
The Green function of the scalar wave equation (1.45), G(x, t; x′, t′), is defined by the
differential equation

    (∆ − c⁻² ∂t²) G(x, t; x′, t′) = −δ(x − x′)δ(t − t′).    (1.47)

Once the solution of this equation has been obtained (which requires specification of a set of
boundary conditions), the solution of (1.45) becomes a matter of a straightforward integration:

    f(x, t) = 4π ∫ d³x′ dt′ G(x, t; x′, t′) g(x′, t′).    (1.48)
INFO In fact, the most elegant and efficient solution strategy utilizes methods of the theory
of complex functions. However, we do not assume the reader's familiarity with this piece of
mathematics and will use a more elementary technique.

We first note that for the chosen set of boundary conditions, the Green function G(x −
x′, t − t′) will depend on the difference of its arguments only. We next Fourier transform
Eq. (1.47) in the temporal variable, i.e. we act on the equation with the integral transform
f(ω) = ∫ dt/(2π) e^{iωt} f(t) (whose inverse is f(t) = ∫ dω e^{−iωt} f(ω)). The temporally
transformed equation is given by

    (∆ + k²) G(x, ω) = −(1/2π) δ(x),    (1.49)
where we defined k ≡ ω/c. If it were not for the constant k (and a trivial scaling factor (2π)⁻¹
on the r.h.s.), this equation, known in the literature as the Helmholtz equation, would be
equivalent to the Poisson equation of electrostatics. Indeed, it is straightforward to verify that
the solution of (1.49) is given by

    G±(x, ω) = (1/8π²) e^{±ik|x|}/|x|,    (1.50)

where the sign ambiguity needs to be fixed on physical grounds.
INFO To prove Eq. (1.50), we introduce polar coordinates centered around x′ and act with the
spherical representation of the Laplace operator (cf. section ??) on G±(r) = (1/8π²) e^{±ikr}/r. Noting
that the radial part of the Laplace operator, ∆r, is given by r⁻²∂r r²∂r = ∂r² + (2/r)∂r, and that
∆r (4πr)⁻¹ = −δ(x) (the equation of the Green function of electrostatics), we obtain

    (∆ + k²) G±(r, ω) = (1/8π²) (∂r² + 2r⁻¹∂r + k²) e^{±ikr}/r
      = (1/8π²) [ e^{±ikr} (∂r² + 2r⁻¹∂r) r⁻¹ + 2 (∂r e^{±ikr}) (∂r r⁻¹) + r⁻¹ (∂r² + k² + 2r⁻¹∂r) e^{±ikr} ]
      = −(1/2π) δ(x) + (1/8π²) (∓2ik r⁻² ± 2ik r⁻²) e^{±ikr} = −(1/2π) δ(x),

as required.
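As a quick sanity check (an addition of ours, not in the original notes), one may also verify numerically that away from the origin the radial Helmholtz operator annihilates both functions in (1.50):

```python
import cmath

# Check (illustrative sketch): for r > 0, G_pm(r) = e^{±ikr}/(8 pi^2 r)
# obeys (d_r^2 + (2/r) d_r + k^2) G_pm = 0, i.e. the delta function in
# Eq. (1.49) is supported at the origin only.
k = 1.3
def G(r, sign):
    return cmath.exp(sign * 1j * k * r) / (8 * cmath.pi ** 2 * r)

h, r0 = 1e-3, 2.0
residuals = []
for sign in (+1, -1):
    d1 = (G(r0 + h, sign) - G(r0 - h, sign)) / (2 * h)
    d2 = (G(r0 + h, sign) - 2 * G(r0, sign) + G(r0 - h, sign)) / h ** 2
    residuals.append(abs(d2 + (2 / r0) * d1 + k ** 2 * G(r0, sign)))
print(max(residuals) < 1e-6)   # True: zero up to discretization error
```

The delta function on the r.h.s. of (1.49) lives entirely at r = 0, which is why the check above probes only r > 0.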
For reasons to become clear momentarily, we call G+ (G− ) the retarded (advanced) Green
function of the wave equation.
Figure 1.2: Wave front propagation of the pulse created by a point source at the origin,
schematically plotted as a function of (two–dimensional) space and time; the front follows
the line x = ct. The width of the front is set by δx = cδt, where δt is the duration of the
time pulse. Its intensity decays as ∼ x⁻¹.
    f(x, t) = ∫ d³x′ g(x′, t − |x − x′|/c) / |x − x′|.    (1.52)
For any fixed instant of space and time, (x, t), the solution f(x, t) is affected by the sources
g(x′, t′) at all points in space and at the earlier times t′ = t − |x − x′|/c. Put differently, a time
t − t′ = |x − x′|/c has to pass before the amplitude of the source g(x′, t′) may cause an effect
at the observation point x at time t — the signal received at (x, t) is subject to a retardation
mechanism. When the signal is received, it arrives at strength g(x′, t′)/|x − x′|, similarly to the
Coulomb potential in electrostatics. (Indeed, for a time–independent source g(x′, t′) = g(x′),
we may enter the wave equation with a time–independent ansatz f(x), whereupon it reduces
to the Poisson equation.) Summarizing, the sources act as ‘instantaneous Coulomb charges’
which are of strength g(x′, t′) and felt at the retarded times t = t′ + |x − x′|/c.
EXAMPLE By way of example, consider a point source at the origin which ‘broadcasts’ a signal
for a short instant of time at t′ ≃ 0, g(x′, t′) = δ(x′)F(t′), where the function F is sharply
peaked around t′ = 0 and describes the temporal profile of the source. The signal is then given
by f(x, t) = F(t − |x|/c)/|x|, i.e. we obtain a pattern of outward–moving spherical waves whose
amplitude diminishes as |x|⁻¹ or, equivalently, as ∼ 1/ct (see Fig. 1.2.)
For the advanced Green function, the principle of cause–and–effect, or causality, is violated.
This implies that the advanced Green function, though mathematically a legitimate solution
of the wave equation, does not carry physical meaning. Two more remarks are in order: (a) when
solving the wave equation by techniques borrowed from the theory of complex functions, the
causality principle is built in from the outset, and the retarded Green function is automatically
selected. (b) Although the advanced Green function does not carry immanent physical meaning,
it is not a senseless object altogether. However, the utility of this object discloses itself only in
quantum theories.
Using the convolution theorem, this transforms to f(k) = (2π)⁴ 4π G(k) g(k) = −4π g(k)/(kμ k^μ). Specifi-
cally, for the vector potential, we obtain

    Aμ = −4π jμ / (kν k^ν).    (1.55)

¹³ Recall that x⁰ = ct, i.e. kμ x^μ = k₀x⁰ − k · x = ωt − k · x.
This is all we need to check that the gauge condition is fulfilled: The Fourier transform of the
Lorentz gauge relation ∂μA^μ = 0 is given by kμA^μ = 0. Probing this relation on (1.55),
we obtain kμA^μ ∝ kμj^μ = 0, where we noted that kμj^μ = 0 is the (Fourier transformed) continuity relation.
We have thus shown that the solution (1.53) conforms with the gauge constraints. As an important
byproduct, our proof above reveals an intimate connection between the gauge behaviour of the
electromagnetic potential and current conservation.
Eqs. (1.53) generalize Eqs. (??) and (??) to the case of dynamically varying sources. In
the next sections, we will explore various aspects of the physical content of these equations.
where use of Gauß's theorem has been made. The first term is the sum of the electric and
the magnetic field energy density, as derived earlier for static field configurations. We now
interpret this term as the energy stored in general electromagnetic field configurations and

    w ≡ (1/8π) (E · D + B · H)    (1.58)

as the electromagnetic energy density. What is new in electrodynamics is that the field
energy may change in time. The r.h.s. of the equation tells us that there are two mechanisms
whereby the electromagnetic field energy may be altered. First, there is a surface integral over
the so–called Poynting vector field

    S ≡ (c/4π) E × H.    (1.59)
We interpret the integral of this vector field over the surface as the energy current passing
through the surface, and the Poynting vector field itself as the energy current density. Thus,
the pair (energy density w, Poynting vector S) plays a role similar to the pair (charge density
ρ, current density j) for the matter field. However, unlike with matter, the energy of the
electromagnetic field is not conserved. Rather, the balance equation of electromagnetic
field energy above states that

    ∂t w + ∇ · S = −j · E.    (1.60)
The r.h.s. of this equation contains the matter current density j, which suggests that it describes the
conversion of electromagnetic into mechanical energy. Indeed, the temporal change of the
energy of a charged point particle in an electromagnetic field (cf. section 1.1 for a similar line
of reasoning) is given by dt U(x(t)) = −F · ẋ = −q(E + c⁻¹ v × B) · v = −qE · v, where we
observed that the magnetic component of the Lorentz force is perpendicular to the velocity
and, therefore, does not do work on the particle. Recalling that the current density of a point
particle is given by j = qδ(x − x(t))v, this expression may be rewritten as dt U = −∫ d³x j · E.
The r.h.s. of the balance equation (1.60) is the generalization of this expression to arbitrary current densities.
Energy conservation implies that the work done on a current of charged particles has to be
taken from the electromagnetic field. This explains the appearance of the mechanical energy
on the r.h.s. of the balance equation (1.60).
Equations of this structure generally represent continuity equations: suppose we have some quantity – energy, oxygen
molecules, voters of some political party – that may be described by a density distribution ρ.¹⁴

¹⁴ In the case of discrete quantities (the voters, say), ρ(x, t)d³x is the average number of “particles” present
in a volume d³x, where the reference volume d³x must be chosen large enough to make this number bigger
than unity (on average.)
It is useful to memorize the form of the continuity equation. For if a mathematical equation of the
form ∂t (function no. 1) + ∇ · (vector field) = function no. 2 crosses our way, we know that we
have encountered a conservation law, and that the vector field must be “the current” associated
to function no. 1. This can be non–trivial information in contexts where the identification of
densities/currents from first principles is difficult. For example, the structure of (1.60) identifies
w as a density and S as its current.
As a second example of a physical quantity that can be exchanged between matter and
electromagnetic fields, we consider momentum. According to Newton's equations, the change
of the total mechanical momentum Pmech carried by the particles inside a volume B is given
by the integrated force density, i.e.

    dt Pmech = ∫_B d³x f = ∫_B d³x ( ρE + c⁻¹ j × B ),
where in the second equality we inserted the Lorentz force density. Using Eqs. (??) and (??)
to eliminate the sources, we obtain

    dt Pmech = (1/4π) ∫_B d³x [ (∇ · E)E − B × (∇ × B) + c⁻¹ B × Ė ].

Using Faraday's law, c⁻¹∂tB = −∇ × E, and adding the vanishing term B(∇ · B), this can be
brought into the symmetric form

    dt Pmech = (1/4π) ∫_B d³x [ (∇ · E)E − E × (∇ × E) + B(∇ · B) − B × (∇ × B) + c⁻¹ dt (B × E) ],
¹⁵ The discussion of the field momentum in matter (cf. ??) turns out to be a delicate matter, wherefore we
prefer to stay on the safe ground of the vacuum theory.
1.6. FIELD ENERGY AND MOMENTUM 31
This equation is of the form dt(something) = (something else). Comparing to our earlier
discussion of conservation laws, we are led to interpret the ‘something’ on the l.h.s. as a
conserved quantity. Presently, this quantity is the sum of the total mechanical momentum
Pmech and the integral

    Pfield = ∫ d³x g,    g ≡ (1/4πc) E × B = S/c².    (1.63)
The structure of this expression suggests to interpret Pfield as the momentum carried by
the electromagnetic field and g as the momentum density (which happens to be given
by c⁻² × the Poynting vector.) If our tentative interpretation of the equation above as a
conservation law is to make sense, we must be able to identify its r.h.s. as a surface integral.
This in turn requires that the components of the (vector valued) integrand be representable
as Xj ≡ ∂iTij, i.e. as the divergence of a vector field Tj with components Tij. (Here,
j = 1, 2, 3 plays the role of a spectator index.) If this is the case, we may, indeed, transform
the integral to a surface integral, ∫_B d³x Xj = ∫_B d³x ∂iTij = ∮_∂B dσ n · Tj. Indeed, it is not
difficult to verify that

    [ (∇ · X)X − X × (∇ × X) ]j = ∂i ( Xi Xj − (δij/2) X · X ).
Physically, dt dσ ni Tij is the (jth component of the) momentum that gets pushed through dσ
in time dt. Thus, dσ ni Tij is the momentum per time, or force, exerted on dσ, and ni Tij the
force per area, or radiation pressure, due to the change of linear momentum in the system.
    l = x × g = (1/4πc) x × (E × B).
However, in this course we will not discuss angular momentum conservation any further.
    Ephys.(x, t) = Re E(x, t) = Re (E₁e₁ + E₂e₂) e^{i(kx₃−ωt)} = r₁u₁(x₃, t) e₁ + r₂u₂(x₃, t) e₂,

where ui(x₃, t) = cos(φi + kx₃ − ωt) and we defined Ei = ri exp(iφi) with real ri. Similarly,
the magnetic field Bphys. = Re (e₃ × E) is given by

    Bphys.(x, t) = r₁u₁(x₃, t) e₂ − r₂u₂(x₃, t) e₁.

From these relations we obtain the energy density and Poynting vector as

    w(x, t) = (1/8π) (E² + B²) = (1/4π) ( (r₁u₁)² + (r₂u₂)² )(x₃, t),
    S(x, t) = c w(x, t) e₃,
EXERCISE Check that w and S above comply with the conservation law (1.60).
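A numerical version of this exercise (our addition; the amplitudes and phases are arbitrary) checks the balance equation pointwise for the plane wave above:

```python
import math

# For w = ((r1 u1)^2 + (r2 u2)^2)/(4 pi), S = c w e3, and j = 0, check
# d_t w + d_3 S_3 = 0 by central finite differences.
c, k = 1.0, 2.0
omega = c * k
r1, r2, phi1, phi2 = 0.8, 0.3, 0.2, 1.1     # arbitrary amplitudes and phases

def w(x3, t):
    u1 = math.cos(phi1 + k * x3 - omega * t)
    u2 = math.cos(phi2 + k * x3 - omega * t)
    return ((r1 * u1) ** 2 + (r2 * u2) ** 2) / (4 * math.pi)

h, x3, t = 1e-5, 0.37, 1.42
dt_w = (w(x3, t + h) - w(x3, t - h)) / (2 * h)
d3_S3 = c * (w(x3 + h, t) - w(x3 - h, t)) / (2 * h)
print(abs(dt_w + d3_S3) < 1e-8)   # True: energy is transported, not lost
```

The cancellation works because w depends on x₃ and t only through the combination kx₃ − ωt with ω = ck; energy density is rigidly transported along e₃ at speed c.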
As in our earlier analyses of multipole fields, we assume that the observation point x is far
away from the source, r = |x| ≫ d. Under these circumstances, we may focus attention on
the spatial components of the vector potential,

    A(x) = (1/c) ∫ d³x′ j(x′) e^{ik|x−x′|} / |x − x′|,    (1.66)
where we have substituted the time–oscillatory ansatz for sources and potential into (1.53)
and divided out the time–dependent phases. From (1.66) the magnetic and electric fields are
obtained as

    B = ∇ × A,    E = ik⁻¹ ∇ × (∇ × A),    (1.67)

where in the second identity we used the source–free variant of the law of magnetic circulation,
c⁻¹∂tE = −ikE = ∇ × B.
We may now use the smallness of the ratio d/r to expand |x − x′| ≃ r − n · x′ + …, where
n is the unit vector in x–direction. Substituting this expansion into (1.66) and expanding
the result in powers of r⁻¹, we obtain a series similar in spirit to the electric and magnetic
multipole series discussed above. For simplicity, we here focus attention on the dominant
contribution to the series, obtained by approximating |x − x′| ≃ r. For reasons to become
clear momentarily, this term,

    A(x) ≃ (e^{ikr}/cr) ∫ d³x′ j(x′),    (1.68)

generates electric dipole radiation. In section ?? we have seen that for a static current
distribution, the integral ∫ j vanishes. However, for a dynamic distribution, we may engage
the Fourier transform of the continuity relation, iωρ = ∇ · j, to obtain

    ∫ d³x′ ji = ∫ d³x′ (∇′xi′) · j = −∫ d³x′ xi′ (∇′ · j) = −iω ∫ d³x′ xi′ ρ,

i.e. ∫ d³x′ j = −iω d, where d = ∫ d³x′ x′ ρ(x′) is the dipole moment of the source distribution.
We thus obtain

    A(x) ≃ −ik d e^{ikr}/r,    (1.69)
INFO This latter condition is usually met in practice. E.g., for radiation in the MHz range,
ω ∼ 10⁶ s⁻¹, the wave length is given by λ = 3 · 10⁸ m s⁻¹/10⁶ s⁻¹ ≃ 300 m, much larger than the
extension of typical antennae.
Near zone

For r ≪ λ, or kr ≪ 1, the exponential in (1.68) may be approximated by unity and we
obtain

    A(x, t) ≃ −i (k/r) d e^{iωt},

where we have re–introduced the time dependence of the sources. Using Eq. (1.67) to
compute the electric and magnetic field, we get

    B(x, t) = i (k/r²) n × d e^{iωt},
    E(x, t) = ( 3n(n · d) − d )/r³ e^{iωt}.
The electric field equals the dipole field (??) created by a dipole with time–dependent mo-
ment d exp(iωt) (cf. the figure, where the numbers on the ordinate indicate the separation
from the radiation center in units of the wave length and the color coding is a measure of the
field intensity.) This motivates the denotation ‘electric dipole radiation’. The magnetic field
is by a factor kr ≪ 1 smaller than the electric field, i.e. in the near zone the electromagnetic
field is dominantly electric. In the limit k → 0 the magnetic field vanishes and the electric
field reduces to that of a static dipole.
The analysis of the intermediate zone, kr ∼ 1, is complicated inasmuch as all powers
in an expansion of the exponential in (1.68) in kr must be retained. For a discussion of the
resulting field distribution, we refer to [1].
Far zone

For kr ≫ 1, it suffices to let the derivative operations act on the argument kr of the exponential
function. Carrying out the derivatives, we obtain

    B = k² (n × d) e^{i(kr−ωt)}/r,    E = −n × B.    (1.70)
1.7. ELECTROMAGNETIC RADIATION 35

These asymptotic field distributions have much in common with the vacuum electromagnetic
fields above. Indeed, one would expect that far away from the source (i.e. many wavelengths
away from the source), the electromagnetic field resembles a spherical wave (by which we
mean that the surfaces of constant phase form spheres.) For observation points sufficiently
far from the center of the wave, its curvature will be nearly invisible and the wave will look
approximately planar. These expectations are met by the field distributions (1.70): neglecting
all derivatives other than those acting on the combination kr (i.e. neglecting corrections of
O((kr)⁻¹)), the components of E and B obey the wave equation (i.e. the Maxwell equations in
vacuum).¹⁶ The wave fronts are spheres, outwardly propagating at a speed c. The vectors
n ⊥ E ⊥ B ⊥ n form an orthogonal set, as we saw is characteristic for a vacuum plane wave.
Finally, it is instructive to compute the energy current carried by the wave. To this end,
we recall that the physically realized values of the electromagnetic field obtain by taking the
real part of (1.70). We may then compute the Poynting vector (1.59) as

    S = (ck⁴/4πr²) n (d² − (n · d)²) cos²(kr − ωt)  ⟶  ⟨S⟩t = (ck⁴/8πr²) n (d² − (n · d)²),

where the last expression is the energy current temporally averaged over several oscillation
periods ω⁻¹. The energy current is maximal in the plane perpendicular to the dipole moment of
the source and decays according to an inverse square law. It is also instructive to compute the
total power radiated by the source, i.e. the energy current integrated over spheres of constant
radius r. (Recall that the integrated energy current accounts for the change of energy inside
the reference volume per time, i.e. the power radiated by the source.) Choosing the z–axis of
the coordinate system to be colinear with the dipole moment, we have d² − (n · d)² = d² sin²θ
and

    P ≡ ∮_S dσ n · S = (ck⁴d²/8πr²) r² ∫₀^π sin θ dθ ∫₀^{2π} dφ sin²θ = ck⁴d²/3.
Notice that the radiated power does not depend on the radius of the reference sphere, i.e.
the work done by the source is entirely converted into radiation and not, say, into a steadily
increasing density of vacuum field energy.
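The angular integration can be double-checked numerically. The sketch below (our addition; parameter values arbitrary) integrates the time-averaged flux over spheres of two different radii:

```python
import math

# Integrate <S>·n = (c k^4 / 8 pi r^2) (d^2 - (n·d)^2) over a sphere of
# radius r and compare with P = c k^4 d^2 / 3; the result must not depend on r.
c, k, d = 1.0, 0.7, 1.5                     # dipole moment d along the z-axis

def power(r, n_theta=2000):
    dtheta = math.pi / n_theta
    total = 0.0
    for i in range(n_theta):
        theta = (i + 0.5) * dtheta          # midpoint rule in theta
        S_n = c * k ** 4 / (8 * math.pi * r ** 2) * (d * math.sin(theta)) ** 2
        total += S_n * r ** 2 * math.sin(theta) * dtheta * 2 * math.pi
    return total

expected = c * k ** 4 * d ** 2 / 3
print(abs(power(2.0) / expected - 1) < 1e-4,
      abs(power(7.0) / expected - 1) < 1e-4)   # True True
```

The r² surface element exactly compensates the r⁻² decay of the averaged flux, which is the numerical counterpart of the statement that the radiated power is radius-independent.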
INFO As a concrete example of a radiation source, let us consider a center–fed linear an-
tenna, i.e. a piece of wire of length a carrying an AC current that is maximal at the center
of the wire. We model the current flow by the distribution I(z, t) = I₀(1 − 2|z|a⁻¹) exp(−iωt),
where a ≪ λ = c/ω and alignment with the z–axis has been assumed. Using the continu-
ity equation, ∂zI(z, t) + ∂tρ(z, t) = 0, we obtain the charge density (charge per unit length) in the
wire as ρ(z, t) = 2iI₀(ωa)⁻¹ sgn(z) exp(−iωt). The dipole moment of the source is thus given by

    d = e_z ∫_{−a/2}^{a/2} dz z ρ(z, t) = i I₀ (a/2ω) e_z exp(−iωt),

and the radiated power by

    P = ((ka)²/12c) I₀².
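The dipole moment quoted in the example can be reproduced by a direct integration over the wire, taken here to span z ∈ [−a/2, a/2] (a sketch we add; magnitudes only, with the time factor and the overall factor i dropped, and all numbers arbitrary):

```python
import math

# Integrate |rho(z)| * z over the wire: with |rho(z)| = 2 I0 |sgn(z)|/(omega a)
# this gives |d| = I0 a / (2 omega), and hence P = c k^4 d^2 / 3 = (ka)^2 I0^2 / (12 c).
c, omega, I0, a = 3e10, 1e6, 1.0, 100.0     # Gaussian-type units (cm, s)
k = omega / c

n = 200000
dz = a / n
moment = 0.0
for j in range(n):
    z = -a / 2 + (j + 0.5) * dz             # midpoint of the j-th segment
    moment += z * (2 * I0 / (omega * a)) * math.copysign(1.0, z) * dz

ratio_d = moment / (I0 * a / (2 * omega))
P = c * k ** 4 * moment ** 2 / 3
ratio_P = P / ((k * a) ** 2 * I0 ** 2 / (12 * c))
print(abs(ratio_d - 1) < 1e-9, abs(ratio_P - 1) < 1e-9)   # True True
```

This confirms the algebra connecting the triangular current profile, the sgn(z) charge density, and the quoted radiation power.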
The coefficient Rrad ≡ (ka)²/12c of the factor I₀² is called the radiation resistance of the antenna.
To understand the origin of this denotation, notice that [Rrad] = [c⁻¹] indeed has the dimension of
resistance (exercise.) Next recall that the power required to drive a current through a conventional
resistor R is given by P = UI = RI². Comparison with the expression above suggests to interpret

¹⁶ Indeed, one may verify (do it!) that the characteristic factors r⁻¹e^{i(kr−ωt)} are exact solutions of the wave
equations; they describe the space–time profile of a spherical wave.

Rrad as the ‘resistance’ of the antenna. However, this resistance has nothing to do with dissipative
energy losses inside the antenna (i.e. those losses that are responsible for the DC resistivity of
metals). Rather, work has to be done to feed energy into electromagnetic radiation. This work
determines the radiation resistance. Also notice that Rrad ∼ k² ∼ ω², i.e. the radiation losses
increase quadratically with frequency. This latter fact is of great technological importance.
Our results above relied on a first order expansion in the ratio d/r between the extent of
the sources and the distance of the observation point. We saw that at this order of the
expansion, the source coupled to the electromagnetic field by its electric dipole moment. A
more sophisticated description of the field may be obtained by driving the expansion in d/r
to higher order. E.g., at next order, the electric quadrupole moment and the magnetic dipole
moment enter the stage. This leads to the generation of electric quadrupole radiation and
magnetic dipole radiation. For an in–depth discussion of these types of radiation we refer
to [1].
Chapter 2
Macroscopic electrodynamics
    ∇ · e = 4πρµ,
    ∇ × b − (1/c) ∂t e = (4π/c) jµ,
    ∇ × e + (1/c) ∂t b = 0,
    ∇ · b = 0,
where ρµ and jµ are the microscopic charge and current density, respectively. Now, for several
reasons, it does not make much sense to consider these equations as they stand: First, it is
clear that attempts to get a system of O(10²³) dynamical sources under control are bound
to fail. Second, the dynamics of the microscopic sources is governed by quantum effects and
cannot be described in terms of a classical vector field j. Finally, we aren't even interested in
knowing the microscopic fields. Rather, (in classical electrodynamics) we want to understand
the behaviour of fields on ‘classical’ length scales, which generally exceed the atomic scales by
far.¹
In the following, we develop a more practice–oriented theory of electromagnetism in matter.
We aim to lump the influence of the microscopic degrees of freedom of the matter–environment
into a few effective modifiers of the Maxwell equations.
¹ For example, the wave–length of visible light is about 600 nm, whilst the typical extension of a molecule
is 0.1 nm. Thus, ‘classical’ length scales of interest are about three to four orders of magnitude larger than
the microscopic scales.
38 CHAPTER 2. MACROSCOPIC ELECTRODYNAMICS
Figure 2.1: Schematic of the spatially averaged system of sources. A zoom into a solid (left)
reveals a pattern of ions or atoms or molecules (center left). Individual ions/atoms/molecules
centered at lattice coordinates xm may carry a net charge qm and a dipole element dm (center
right). A dipole element forms if sub–atomic constituents (electrons/nuclei) are shifted asym-
metrically w.r.t. the center coordinate. In an ionic system, mobile electrons (at coordinates
xj) contribute to the charge balance.
    E ≡ ⟨e⟩,    B ≡ ⟨b⟩,

where the averaging procedure is defined by ⟨f(x)⟩ ≡ ∫ d³x′ f(x − x′) g(x′), and g is a weight
function that is unit–normalized, ∫ d³x′ g(x′) = 1, and decays over sufficiently large regions in space.
Since the averaging procedure commutes with taking derivatives w.r.t. both space and time,
∂t,x E = ⟨∂t,x e⟩ (and the same for B), the averaged Maxwell equations read as
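The claim that averaging commutes with derivatives can be illustrated in one dimension (a sketch we add here; the profile f and the Gaussian weight g are arbitrary choices):

```python
import math

# <f>(x) = int dx' f(x - x') g(x') with a normalized Gaussian weight g;
# check that d_x <f> = <d_x f> at a sample point.
L, n = 20.0, 4000
dx = L / n
grid = [-L / 2 + i * dx for i in range(n)]

def f(x):
    return math.sin(1.3 * x) * math.exp(-0.05 * x * x)

def df(x):                                   # analytic derivative of f
    return (1.3 * math.cos(1.3 * x) - 0.1 * x * math.sin(1.3 * x)) \
           * math.exp(-0.05 * x * x)

def g(x, s=0.5):                             # unit-normalized weight
    return math.exp(-x * x / (2 * s * s)) / (s * math.sqrt(2 * math.pi))

def average(func, x):
    return sum(func(x - xp) * g(xp) * dx for xp in grid)

x0, h = 0.8, 1e-4
lhs = (average(f, x0 + h) - average(f, x0 - h)) / (2 * h)   # d_x <f>
rhs = average(df, x0)                                        # <d_x f>
print(abs(lhs - rhs) < 1e-6)   # True
```

The commutation holds because the derivative acts only on the first argument of the convolution; this is what licenses replacing e, b by E, B inside the Maxwell equations below.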
    ∇ · E = 4π⟨ρµ⟩,
    ∇ × B − (1/c) ∂t E = (4π/c) ⟨jµ⟩,
    ∇ × E + (1/c) ∂t B = 0,
    ∇ · B = 0.
In the second line, we used the fact that the range over which the weight function g changes
exceeds atomic extensions by far to Taylor expand to first order in the offsets a. The zeroth
order term of the expansion is oblivious to the molecular structure, i.e. it contains only the
total charge qm = Σj qm,j of the molecules and their center positions. The first order term
contains the molecular dipole moments, dm ≡ Σj am,j qm,j. By the symbol

    P(x, t) ≡ ⟨ Σm δ(x − xm(t)) dm(t) ⟩,    (2.1)
we denote the average polarization of the medium. Evidently, P is a measure of the density
of the dipole moments carried by the molecules in the system.

The average density of the mobile carriers in the system is given by ⟨ρf(x, t)⟩ = qe ⟨Σi δ(x −
xi(t))⟩, so that we obtain the average charge density as

    ⟨ρµ(x, t)⟩ = ρav(x, t) − ∇ · P(x, t),    (2.2)

where we defined the effective or macroscopic charge density in the system,

    ρav(x, t) ≡ ⟨ Σm qm δ(x − xm(t)) + Σi qe δ(x − xi(t)) ⟩.    (2.3)
Notice that the molecules/atoms present in the medium enter the quantity ρav(x, t) as point–
like entities. Their finite polarizability is accounted for by the second term contributing to ⟨ρµ⟩.
Also notice that most solids are electrically neutral on average, i.e. the positive charge of ions
accounted for in the first term in (2.3) cancels against the negative charge of mobile electrons
(the second term), meaning that ρav(x, t) = 0. Under these circumstances,

    ⟨ρµ(x, t)⟩ = −∇ · P(x, t),    (2.4)

where P is given by the averaged polarization (2.1). The electric field E in the medium
is caused by the sum of the averaged microscopic density, ⟨ρµ⟩, and an ‘external’ charge
density,³ ρ:

    ∇ · E = 4π (⟨ρµ⟩ + ρ).    (2.5)

The magnetization M (cf. Eq. (2.7)) is the average density of magnetic dipole moments in the system. The quantity
    jav ≡ ⟨ Σi qe ẋi δ(x − xi) + Σm qm ẋm δ(x − xm) ⟩
is the current carried by the free charge carriers and the point–like approximation of the
molecules. Much like the average charge density ρav this quantity vanishes in the majority of
cases – solids usually do not support non–vanishing current densities on average, i.e. jav = 0,
meaning that the average of the microscopic current density is given by
    ⟨jµ⟩ = Ṗ + c∇ × M.    (2.8)
In analogy to Eq. (2.5), the magnetic induction has the sum of ⟨jµ⟩ and an ‘external’ current
density, j, as its source:

    ∇ × B − (1/c) ∂t E = (4π/c) (⟨jµ⟩ + j).    (2.9)
We next consider the different orders in a contributing to this expansion in turn. At zeroth order,
we obtain

    ⟨jb(x)⟩⁽⁰⁾ = Σm qm g(x − xm) ẋm = ⟨ Σm qm ẋm δ(x − xm) ⟩,

i.e. the current carried by the molecules in a point–like approximation. The first order term is
given by

    ⟨jb(x)⟩⁽¹⁾ = Σm [ g(x − xm) ḋm − (∇g(x − xm) · dm) ẋm ].
The form of this expression suggests to compare it with the time derivative of the polarization
vector,

    Ṗ = dt Σm g(x − xm) dm = Σm [ g(x − xm) ḋm − (∇g(x − xm) · ẋm) dm ].
This expression is related to the density of magnetic dipole moments carried by the molecules.
The latter is given by (cf. Eq. (??) and Eq. (2.7))

    M = (1/2c) Σ_{m,j} qm,j am,j × ȧm,j g(x − xm).

As a result of a straightforward calculation, we find that the curl of this expression is given by

    ∇ × M = (1/c) ⟨jb⟩⁽²⁾ + (1/2c) dt Σ_{m,j} qm,j am,j ( am,j · ∇g(x − xm) ).
The second term on the r.h.s. of this expression engages the time derivative of the electric
quadrupole moments ∼ Σj qm,j am,j,a am,j,b, a, b = 1, 2, 3, carried by the molecules. The fields generated by
these terms are arguably very weak, so that we will neglect them in the following.
Adding to the results above the contribution of the free carriers, ⟨jf(x)⟩ = qe ⟨Σi ẋi δ(x − xi)⟩,
we arrive at Eq. (2.6).
Eqs. (2.4) and (2.8) describe the spatial average of the microscopic sources of electromagnetic
fields in an extended medium. The expressions appearing on the r.h.s. of these equations can
be interpreted in two different ways, both useful in their own right. One option is to look at the
r.h.s. of, say, Eq. (2.4) as an effective charge density. This density obtains as the divergence
of a vector field P describing the average polarization in the medium (cf. Eq. (2.1)).
To understand the meaning of this expression, consider the cartoon of a solid shown in
Fig. 2.2. Assume that the microscopic constituents of the medium are ’polarized’ in a certain
direction.⁴ Formally, this polarization is described by a vector field P, as indicated by the
horizontal cut through the medium in Fig. 2.2. In the bulk of the medium, the ’positive’
and ’negative ends’ of the molecules carrying the polarization mutually neutralize and no net-
charge density remains. (Remember that the medium is assumed to be electrically neutral on
average.) However, the boundaries of the system carry layers of uncompensated positive and
negative polarization, and this is where a non-vanishing charge density will form (indicated
by a solid and a dashed line at the system boundaries.) Formally, the vector field P begins
and ends (has its ’sources’) at the boundaries, i.e. ∇ · P is non-vanishing at the boundaries
where it represents the effective surface charge density. This explains the appearance of ∇ · P
as a source term in the Maxwell equations.
Dynamical changes of the polarization, ∂tP ≠ 0 – due to, for example, fluctuations in the
orientation of polar molecules – correspond to the movement of charges parallel to P. The
rate ∂tP of charge movement is a current flow in P–direction.⁵ Thus, ∂tP must appear as a
bulk current source in the Maxwell equations.
Finally, a non-vanishing magnetization M (cf. the vertical cut through the system) corre-
sponds to the flow of circular currents in a plane perpendicular to M. For homogeneous M,
the currents mutually cancel in the bulk of the system, but a non-vanishing surface current
remains. Formally, this surface current density is represented by the curl of M (Exercise:
compute the curl of a vector field that is perpendicular to the cut plane and homogeneous
across the system.), i.e. c∇ × M is a current source in the effective Maxwell equations.
⁴ Such a polarization may form spontaneously, or, more frequently, be induced externally, cf. the info block
below.
⁵ Notice that a current flow j = ∂tP is not in conflict with the condition of electro-neutrality.
2.1. MACROSCOPIC MAXWELL EQUATIONS 43
Figure 2.2: Cartoon illustrating the origin of (surface) charge and current densities forming in
a solid with non-vanishing polarization and/or magnetization.
Above, we have interpreted ∇·P, ∂t P and ∇×M as charge and current sources in the Maxwell
equations. But how do we know these sources? It stands to reason that the polarization, P,
and magnetization, M, of a bulk material will sensitively depend on the electric field, E, and the
magnetic field, B, present in the system. For in the absence of such fields, E = 0, B = 0, only
few materials will ’spontaneously’ build up a non–vanishing polarization, or magnetization.⁶
Usually, it takes an external electric field, to orient the microscopic constituents of a solid so
as to add up to a bulk polarization. Similarly, a magnetic field is usually needed to generate
a bulk magnetization.
This means that the solution of the Maxwell equations poses a self–consistency
problem: An external charge distribution, ρext, immersed into a solid gives rise to an electric
field. This field in turn creates a bulk polarization, P[E], which in turn alters the charge
distribution, ρ → ρeff[E] ≡ ρext − ∇ · P[E]. (The notation indicates that the polarization
depends on E.) Our task then is to self–consistently solve the equation ∇ · E = 4πρeff[E], with
a charge distribution that depends on the sought–for electric field E. Similarly, the current
distribution j = jeff[B, E] in a solid depends on the electric and the magnetic field, which
means that Ampère's law, too, poses a self–consistency problem.
In this section we re–interpret polarization and magnetization in a manner tailor–made for
the self–consistent solution of the Maxwell equations. We start by substituting Eq. (2.2) into
(2.5). Rearranging terms, we obtain

    ∇ · (E + 4πP) = 4πρ.
⁶ Materials disobeying this rule are called ferroelectric (spontaneous formation of a polarization) or fer-
romagnetic (spontaneous magnetization), respectively. In spite of their relative scarcity, ferroelectric and
ferromagnetic materials are of huge applied importance. Application fields include storage and display technol-
ogy, sensor technology, general electronics, and, of course, magnetism.
This suggests the definition of the displacement field

    D ≡ E + 4πP.    (2.10)

In terms of D, the Maxwell equation assumes the form

    ∇ · D = 4πρ,    (2.11)
(while the electric field has the spatial average of the full microscopic charge system as its
source, ∇ · E = 4π⟨ρµ⟩.) Notice the slightly different perspectives from which Eqs. (2.5) and
(2.10) interpret the phenomenon of matter polarization. Eq. (2.5) places the emphasis on
the intrinsic charge density, ⟨ρµ⟩ = −∇ · P. The actual electric field, E, is caused by the sum of
external charges and polarization charges. In Eq. (2.10), we consider a fictitious field D, whose
sources are purely external. The field D differs from the actual electric field E by (4π×) the
polarization, P, building up in the system. Of course the two alternative interpretations of
polarization are physically equivalent. (Think about this point.)
The magnetization, M, can be treated in similar terms: Substituting Eq. (2.6) into (2.9),
we obtain

    ∇ × (B − 4πM) − (1/c) ∂t (E + 4πP) = (4π/c) j.

The form of this equation motivates the definition of the magnetic field

    H = B − 4πM.    (2.12)

Expressed in terms of this quantity, the Maxwell equation assumes the form

    ∇ × H − (1/c) ∂t D = (4π/c) j,    (2.13)
where j is the external current density. According to this equation, H plays a role similar to D:
the sources of the magnetic field H are purely external. The field H differs from the induction
B (i.e. the field a magnetometer would actually measure in the system) by (−4π×) the
magnetization M. In conceptual analogy to the electric field, E, the induction B has the sum
of external and intrinsic charge and current densities as its source (cf. Eq. (2.9).)
To make progress with Eqs. (2.10) and (2.11), we need to say more about the polarization.
Above, we have anticipated that a finite polarization is usually induced by an electric field, i.e.
P = P[E]. Now, external electric fields triggering a non–vanishing polarization are ususally
much weaker than the intrinsic microscopic fields keeping the constituents of a solid together.
In practice, this means that the polarization may be approximated by a linear functional of the
2.1. MACROSCOPIC MAXWELL EQUATIONS 45
electric field. The most general form of a linear functional reads as

P(x, t) = (2π)^{−4} ∫ d³x′ ∫ dt′ χ(x, t; x′, t′) E(x′, t′), (2.14)
where the integral kernel χ is called the electric susceptibility of the medium and the factor
(2π)^{−4} has been introduced for convenience. The susceptibility χ(x, t; x′, t′) describes how
an electric field amplitude at x′ and t′ < t affects the polarization at (x, t). Notice that the
polarization obtains by convolution of χ and E. In Fourier space, the equation assumes the
simpler form P(q) = χ(q)E(q), where q = (ω/c, q) as usual. Accordingly, the electric field
and the displacement field are connected by the linear relation
D(q) = ε(q) E(q), ε(q) ≡ 1 + 4πχ(q), (2.15)

where ε(q) is called the dielectric function.

INFO On a microscopic level, at least two different mechanisms causing field–induced polarization
need to be distinguished: Firstly, a finite electric field may cause a distortion of the electron
clouds surrounding otherwise unpolarized atoms or molecules. The relative shift of the electron
clouds against the nuclear centers causes a finite molecular dipole moment which integrates to
a macroscopic polarization P. Secondly, in many substances (mostly of organic chemical origin),
the molecules (a) carry a permanent intrinsic dipole moment and (b) are free to move. A finite
electric field will cause spatial alignment of the otherwise dis–oriented molecules. This leads to a
polarization of the medium as a whole.
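Since (2.14) is a convolution, the Fourier–space statement P(q) = χ(q)E(q) can be checked numerically. A minimal one–dimensional (in time) sketch; the single–relaxation–time kernel χ(t) ∝ e^{−t/τ} and all parameters are ad hoc assumptions, not taken from the text:

```python
import numpy as np

# Check that the convolution (2.14) becomes the pointwise product
# P(omega) = chi(omega) E(omega) under a discrete Fourier transform.
n, dt, tau = 4096, 0.01, 0.5
t = np.arange(n) * dt
chi = np.exp(-t / tau)                 # hypothetical causal response kernel
E = np.exp(-((t - 10.0) / 2.0) ** 2)   # smooth test field pulse

# direct convolution in the time domain, truncated to n samples
P_direct = np.convolve(chi, E)[:n] * dt

# pointwise product in Fourier space, transformed back
P_fourier = np.fft.ifft(np.fft.fft(chi) * np.fft.fft(E)).real * dt

assert np.allclose(P_direct, P_fourier, atol=1e-8)
```

The FFT implements a circular convolution; it agrees with the linear one here only because both signals decay to zero well inside the sampling window.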
As with the electric sector of the theory, the magnetic fields created by externally imposed
macroscopic current distributions (a) ‘magnetize’ solids, i.e. generate finite fields M, and (b)
are generally much weaker than the intrinsic microscopic fields inside a solid. This means that
we can write B = H + 4πM[H], where M[H] may be assumed to be linear in H. In analogy
to Eq. (2.14) we thus define
M(x, t) = (2π)^{−4} ∫ d³x′ ∫ dt′ χm(x, t; x′, t′) H(x′, t′), (2.16)
where the function χm is called the magnetic susceptibility of the system. The magnetic sus-
ceptibility describes how a magnetic field at (x′, t′) causes magnetization at (x, t). Everything
we said above about the electric susceptibility applies in a similar manner to the magnetic
susceptibility. Specifically, we have the Fourier space relation M(q) = χm(q)H(q) and

B(q) = µ(q) H(q), µ(q) ≡ 1 + 4πχm(q), (2.17)

where µ(q) is called the magnetic permeability. In cases where the momentum dependence
of the susceptibility is negligible, we obtain the simplified relation

B = µH, (2.18)
. The averaged charge densities and currents are sources of the electric displacement field
D and the magnetic field H, respectively.
. These fields are related to the electric field E and the magnetic induction B by Eqs. (2.15)
and (2.17), respectively.
. The actual fields one would measure in an experiment are E and B; these fields determine
the coupling between fields and matter as described by the Lorentz force law.
. Similarly, the (matter un–related) homogeneous Maxwell equations contain the fields E
and B.
2.2. APPLICATIONS OF MACROSCOPIC ELECTRODYNAMICS 47
∇ · D = 4πρ,
∇ × E = 0,
D = εE, (2.19)
where in the last equation we assumed a constant dielectric function ε for simplicity. Now assume that
we wish to explore the fields in the vicinity of a boundary separating two regions containing
different types of matter. In general, the dielectric constants characterizing the two domains will be
different, i.e. we have D = εE in region #1 and D = ε′E in region #2. (For ε′ = 1, we
describe the limiting case of a matter/vacuum interface.) Optionally, the system boundary
may carry a finite surface charge density η.
Arguing as in section ??, we find that the first and the second of the Maxwell equations
imply the boundary conditions

n · (D − D′) = 4πη, n × (E − E′) = 0, (2.20)

where the unprimed/primed quantities refer to the fields in regions #1 and #2, respectively,
and n(x) is a unit vector normal to the boundary at x. Thus,

the tangential component of the electric field and (in the absence of surface charges) the
normal component of the displacement field are continuous at the interface.
Since there are no charges present, the electric potential inside and outside the sphere
obeys the Laplace equation ∆φ = 0. Further, choosing polar coordinates such that the z–axis
is aligned with the orientation of the external field, we expect the potential to be azimuthally
symmetric, φ(r, θ, φ) = φ(r, θ). Expressed in terms of this potential, the boundary equations
(2.20) assume the form

ε ∂r φ|_{r=R−0} = ∂r φ|_{r=R+0}, (2.21)
∂θ φ|_{r=R−0} = ∂θ φ|_{r=R+0}, (2.22)
φ|_{r=R−0} = φ|_{r=R+0}, (2.23)
where R ± 0 ≡ lim_{η→0} R ± η. To evaluate these conditions, we expand the potential inside and
outside the sphere in a series of spherical harmonics (??). Due to the azimuthal symmetry of
the problem, only φ–independent functions Y_{l,0} = ((2l+1)/4π)^{1/2} P_l contribute, i.e.

φ(r, θ) = Σ_l P_l(cos θ) × { A_l r^l, r < R;  B_l r^l + C_l r^{−(l+1)}, r ≥ R },

where we have absorbed the normalization factors ((2l+1)/4π)^{1/2} in the expansion coefficients
A_l, B_l, C_l.
At spatial infinity, the potential must approach the potential φ = −Dz = −Dr cos θ =
−D r P_1(cos θ) of the uniform external field D = D e_z. Comparison with the series above
shows that B_1 = −D and B_{l≠1} = 0. To determine the as yet unknown coefficients A_l and C_l,
we consider the boundary conditions above. Specifically, Eq. (2.21) translates to the condition
Σ_l P_l(cos θ) [ε l A_l R^{l−1} + (l + 1) C_l R^{−(l+2)} + D δ_{l,1}] = 0.
Now, the Legendre polynomials are a complete set of functions, i.e. the vanishing of the l.h.s.
of the equation implies that all series coefficients must vanish individually:
C_0 = 0,
ε A_1 + 2R^{−3} C_1 + D = 0,
ε l A_l R^{l−1} + (l + 1) C_l R^{−(l+2)} = 0, l > 1.
A second set of equations is obtained from Eq. (2.22): ∂θ Σ_l P_l(cos θ)((A_l − B_l)R^l − C_l R^{−(l+1)}) = 0,
or Σ_{l>0} P_l(cos θ)((A_l − B_l)R^l − C_l R^{−(l+1)}) = const. The second condition implies (think
why!) (A_l − B_l)R^l − C_l R^{−(l+1)} = 0 for all l > 0. Substituting this condition into Eq. (2.23),
we finally obtain A_0 = 0. To summarize, the expansion coefficients are determined by the set
of equations

A_0 = 0,
(A_1 + D)R − C_1 R^{−2} = 0,
A_l R^l − C_l R^{−(l+1)} = 0, l > 1.
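As a sanity check, the two l = 1 conditions above form a small linear system that can be solved numerically and compared with the closed–form coefficients derived below; a sketch with arbitrary test values for ε, R and D:

```python
import numpy as np

# Solve the l = 1 matching conditions for a dielectric sphere (eps inside,
# vacuum outside) and compare with A1 = -3D/(eps+2), C1 = (eps-1)/(eps+2) D R^3.
eps, R, D = 3.0, 2.0, 1.5

# eps*A1 + 2 R^-3 C1 = -D    (normal component of the displacement field)
# R*A1  -   R^-2 C1 = -D*R   (tangential component of the electric field)
M = np.array([[eps, 2 * R**-3],
              [R, -R**-2]])
rhs = np.array([-D, -D * R])
A1, C1 = np.linalg.solve(M, rhs)

assert np.isclose(A1, -3 * D / (eps + 2))
assert np.isclose(C1, (eps - 1) / (eps + 2) * D * R**3)
```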
[Figure: a dielectric sphere in a uniform external field D. Induced surface polarization charges η > 0 and η < 0 accumulate on the two opposite hemispheres.]
C_1 = ((ε − 1)/(ε + 2)) D R³, A_1 = −(3/(ε + 2)) D,

while C_{l>1} = A_{l>1} = 0. The potential distorted by the presence of the sphere is thus given by

φ(r, θ) = −r E_0 cos θ × { 3/(ε + 2), r < R;  1 − ((ε − 1)/(ε + 2)) (R/r)³, r ≥ R. } (2.24)
. Inside the sphere, the electric field E = −∇φ = ∇(3E_0 z/(ε + 2)) = (3/(ε + 2)) E_0 e_z is parallel to the
external electric field, but weaker in magnitude (as long as ε > 1, which usually is the
case.) This indicates the buildup of a polarization P = ((ε − 1)/4π) E = (3/4π) ((ε − 1)/(ε + 2)) E_0.

. Outside the sphere, the electric field is given by a superposition of the external field and
the field generated by a dipole potential φ = d · x/r³ = d cos θ/r², where the dipole
moment is given by d = V P and V = 4πR³/3 is the volume of the sphere.
To understand the origin of this dipole moment, notice that in a polarizable medium, and
in the absence of external charges, 0 = ∇ · D = ∇ · E + 4π∇ · P, while the microscopic
charge density, ∇ · E = 4πρmic, need not necessarily vanish. In exceptional cases where
ρmic does not vanish upon averaging, we denote ⟨ρmic⟩ = ρpol, the polarization charge.
The polarization charge is determined by the condition

∇ · P = −ρpol.
However, arguing as in section ??, we find that the sphere (or any boundary between
two media with different dielectric constants) carries a surface polarization charge

η_ind = P′_n − P_n.

Specifically, in our problem, P′_n = 0 so that η = −P_n = −(3/4π) ((ε − 1)/(ε + 2)) E_0 cos θ. Qualitatively,
the origin of this charge is easy enough to understand: while in the bulk of the sphere
the induced dipole moments cancel each other upon spatial averaging (see the figure),
uncompensated charges remain at the surface of the medium. These surface charges
induce a finite and quite macroscopic dipole moment which is aligned opposite to E_0
and acts as a source of the electric field (but not of the displacement field).
where ω0 is the characteristic frequency of the oscillator motion of the electron, γ a relaxation
constant, and the variation of the external field over the extent of the molecule has been
neglected (cf. the discussion above.) Fourier transforming this equation, we obtain d ≡
e x(ω) = e² E(ω) m^{−1}/(ω0² − ω² − iγω). The dipole moment d(ω) of a molecule with f_i
electrons oscillating at frequency ω_i and damping rate γ_i is given by d(ω) = Σ_i f_i d(ω0 → ω_i, γ → γ_i).
Denoting the density of molecules in the medium by n, we thus obtain (cf. Eqs. (2.1), (2.10), and
(2.15))

ε(ω) = 1 + (4πne²/m) Σ_i f_i/(ω_i² − ω² − iγ_i ω). (2.28)

With suitable quantum mechanical definitions of the material constants f_i, ω_i and γ_i, the
model dielectric function (2.28) provides an accurate description of the molecular contri-
bution to the dielectric behaviour of solids.

11 We emphasize dielectric function because, as shall be seen below, the magnetic permeability has a much
lesser impact on electromagnetic wave propagation.
A schematic of the typical behaviour of the real and
imaginary part of the dielectric function is shown in Fig. 2.4. A few remarks on the profile of
these functions:
. The material constant ε employed in the 'static' sections of this chapter is given by
ε = ε(0). The imaginary part of the zero frequency dielectric function is negligible.
α(ω) = (2ω/c) Im √(ε(ω)µ(ω)) (2.29)
. For most frequencies, i.e. everywhere except for the immediate vicinity of a resonant
frequency ω_i, the absorption coefficient is negligibly small. In these regions,
(d/dω) Re ε(ω) > 0. The positivity of this derivative defines a region of normal dispersion.
Figure 2.4: Schematic of the functional profile of the function ε(ω). Each resonance frequency
ω_i is the center of a peak in the imaginary part of ε and of a resonant swing of ε in the real part.
The width of these structures is determined by the damping rate γ_i.
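Both features, a peak of Im ε near each resonance and a positive slope of Re ε away from it, can be read off a quick numerical evaluation of (2.28) for a single resonance; all constants below are arbitrary test values:

```python
import numpy as np

# Single-resonance evaluation of the model dielectric function (2.28);
# pref stands for 4*pi*n*e^2/m, with arbitrary test values throughout.
pref, wi, gi, fi = 4 * np.pi, 1.0, 0.05, 1.0

w = np.linspace(0.01, 2.0, 2000)
eps_w = 1 + pref * fi / (wi**2 - w**2 - 1j * gi * w)

# Im(eps) peaks in the immediate vicinity of the resonance frequency
assert abs(w[np.argmax(eps_w.imag)] - wi) < 0.05
# normal dispersion below the resonance: d Re(eps)/d omega > 0
assert np.all(np.diff(eps_w.real[w < 0.8]) > 0)
```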
k = Re k + i Im k = (ω/c) n(ω) + (i/2) α(ω), (2.30)
where n and α are the refraction and absorption coefficient, respectively. To better understand
the meaning of the absorption coefficient, let us compute the Poynting vector of the electro-
magnetic wave. As in section 1.4, the Maxwell equation 0 = ∇ · D = ε ∇ · E ∝ k · E0 implies
transversality of the electric field. (Here we rely on the approximate x–independence of the
dielectric function.) Choosing coordinates such that n = e3 and assuming planar polarization
for simplicity, we set E0 = E0 e1 with a real coefficient E0 . The — physically relevant — real
part of the electric field is thus given by
Re E = E_0 e_1 cos(ω(zn/c − t)) e^{−αz/2}.

|S| = (c/4π) |E × H| = (c E_0²/4π) cos²(ω(zn/c − t)) e^{−αz} −→ (c E_0²/8π) e^{−αz},
where in the last step we have averaged over several intervals of the oscillation period 2π/ω.
According to this result, the energy current carried by the wave decays exponentially, on a
length scale α^{−1} set by the absorption coefficient.
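The splitting of the complex wave number into refraction and absorption coefficients can be checked numerically; a sketch assuming µ = 1 and an arbitrary complex dielectric constant:

```python
import numpy as np

# Complex wave number k = (omega/c) sqrt(eps) (mu = 1): the real part gives
# the refraction coefficient n, the imaginary part half the absorption
# coefficient alpha, and |exp(ikz)|^2 decays as exp(-alpha z).
c, omega = 1.0, 1.0
eps_c = 2.0 + 0.3j                           # arbitrary complex dielectric constant

k = omega / c * np.sqrt(eps_c)
n = np.sqrt(eps_c).real                       # refraction coefficient
alpha = 2 * omega / c * np.sqrt(eps_c).imag   # absorption coefficient

z = np.linspace(0.0, 10.0, 101)
intensity = np.abs(np.exp(1j * k * z)) ** 2
assert np.allclose(intensity, np.exp(-alpha * z))
assert np.isclose(k.real, omega * n / c)
```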
2.3.3 Dispersion
In the previous section we have focused on the imaginary part of the dielectric function and
on its attenuating impact on individual plane waves propagating in matter. We next turn
to the discussion of the — equally important — role played by the real part. To introduce
2.3. WAVE PROPAGATION IN MEDIA 55
the relevant physical principles in a maximally simple environment, we will focus on a one–
dimensional model of wave propagation throughout. Consider, thus, the one–dimensional
variant of the wave equations (2.31),
(k² − ω²/c(ω)²) ψ(k, ω) = 0, (2.33)

where k is a one–dimensional wave 'vector' and c(ω) = c/√ε(ω) as before. (For simplicity,
we set µ = 1 from the outset.) Plane wave solutions to this equation are given by ψ(x, t) =
ψ0 exp(ik(ω)x − iωt), where k(ω) = ω/c(ω). Notice that an alternative representation of
the plane wave configurations reads as ψ(x, t) = ψ0 exp(ikx − iω(k)t), where ω(k) is defined
implicitly, viz. as a solution of the equation ω/c(ω) = k.
To understand the consequences of frequency dependent variations in the real part of the
dielectric function, we need, however, to go beyond the level of isolated plane waves. Rather,
let us consider a superposition of plane waves (cf. Eq. (1.42)),

ψ(x, t) = ∫ dk ψ0(k) e^{i(kx−ω(k)t)}, (2.34)

where ψ0(k) is an arbitrary function. (Exercise: check that this superposition solves the wave
equations.)
Specifically, let us choose the function ψ0(k) such that, at initial time t = 0, the distribution
ψ(x, t = 0) is localized in a finite region in space. Configurations of this type are called
wave packets. While there is a lot of freedom in choosing such spatially localizing envelope
functions, a convenient (and for our purposes sufficiently general) choice of the weight function
ψ0 is a Gaussian,

ψ0(k) = ψ0 exp(−(k − k0)² ξ²/4),

where ψ0 ∈ C is a fixed amplitude coefficient and ξ a coefficient of dimensionality 'length'
that determines the spatial extent of the wave packet (at time t = 0.) To check that latter
assertion, let us compute the spatial profile ψ(x, 0) of the wave packet at the initial time:
ψ(x, 0) = ψ0 ∫ dk e^{−(k−k0)² ξ²/4} e^{ikx} = ψ0 e^{ik0x} ∫ dk e^{−k² ξ²/4 + ikx} =

= ψ0 e^{ik0x} e^{−(x/ξ)²} ∫ dk e^{−k² ξ²/4} = 2√π ψ0 ξ^{−1} e^{ik0x} e^{−(x/ξ)²}.
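The Gaussian integral above can be confirmed by brute–force quadrature; a minimal sketch (the grid parameters are chosen ad hoc):

```python
import numpy as np

# Check numerically that
# int dk exp(-(k-k0)^2 xi^2/4 + i k x) = (2 sqrt(pi)/xi) exp(i k0 x - (x/xi)^2).
xi, k0 = 1.5, 3.0
k = np.linspace(k0 - 40 / xi, k0 + 40 / xi, 200001)
dk = k[1] - k[0]

for x in (0.0, 0.7, -1.3):
    lhs = np.sum(np.exp(-(k - k0) ** 2 * xi**2 / 4 + 1j * k * x)) * dk
    rhs = (2 * np.sqrt(np.pi) / xi) * np.exp(1j * k0 * x - (x / xi) ** 2)
    assert np.isclose(lhs, rhs)
```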
Figure 2.5: Left: Dispersive propagation of an initially sharply focused wave packet, whose center moves as v_g t. Right:
A more shallow wave packet suffers less drastically from dispersive deformation.
. The center of the wave packet moves to the right with a characteristic velocity v_g = ∂_k ω(k0).
Inasmuch as v_g determines the effective center velocity of a superposition of
a continuum or 'group' of plane waves, v_g is called the group velocity of the system.
Only in vacuum, where c(ω) = c is independent of frequency, do we have ω = ck, and the
group velocity v_g = ∂_k ω = c coincides with the vacuum velocity of light.
A customary yet far less useful notion is that of a phase velocity: an individual plane
wave behaves as ∼ exp(i(kx − ω(k)t)) = exp(ik(x − (ω(k)/k)t)). In view of the
structure of the exponent, one may call v_p ≡ ω(k)/k the 'velocity' of the wave. Since,
however, the wave extends over the entire system anyway, the phase velocity is not of
direct physical relevance. Notice that the phase velocity v_p = ω(k)/k = c(ω) = c/n(ω),
where the refraction coefficient has been introduced in (2.30). For most frequencies and
in almost all substances, n > 1 ⇒ v_p < c. In principle, however, the phase velocity
may become larger than the speed of light. The group velocity v_g = ∂_k ω(k) =
(∂_ω k(ω))^{−1} = (∂_ω (ω n(ω)/c))^{−1} = (n(ω) + ω ∂_ω n(ω))^{−1} c = v_p (1 + n^{−1} ω ∂_ω n)^{−1}. In
regions of normal dispersion, ∂_ω n > 0, implying that v_g < v_p. For the discussion of
anomalous cases, where v_g may exceed the phase velocity or even the vacuum velocity
of light, see [1].
. The width of the wave packet, ∆x, changes in time. Inspection of the Gaussian fac-
tor controlling the envelope of the packet shows that Re ψ ∼ e^{−(x−v_g t)² (ξ^{−2} + ξ^{∗−2})/2}, where
the symbol '∼' indicates that only the exponential dependence of the packet is taken
into account. Thus, the width of the packet is given by ξ(t) = ξ(1 + (2at/ξ²)²)^{1/2}.
According to this formula, the rate at which ξ(t) increases (the wave packet flows
apart) is the larger, the sharper the spatial concentration of the initial wave packet
was (cf. Fig. 2.5). For large times, t ≫ ξ²/a, the width of the wave packet increases
as ∆x ∼ t. The disintegration of the wave packet is caused by a phenomenon called
dispersion: The form of the packet at t = 0 is obtained by (Fourier) superposition
of a large number of plane waves. If all these plane waves propagated with the same
phase velocity (i.e. if ω(k)/k was k–independent), the form of the wave packet would
remain unchanged. However, due to the fact that in a medium plane waves of different
wave number k generally propagate at different velocities, the Fourier spectrum of the
packet at non–zero times differs from that at time zero. This in turn means that the
integrity of the form of the wave packet gets gradually lost.
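The two statements above, center motion at v_g and the width law ξ(t) = ξ(1 + (2at/ξ²)²)^{1/2}, can be reproduced by direct numerical superposition of plane waves. A sketch assuming a model quadratic dispersion ω(k) = vk + (a/2)(k − k0)², so that a = ∂²_k ω(k0); all parameters are arbitrary:

```python
import numpy as np

# Superpose plane waves psi(x,t) = int dk psi0(k) exp(i(kx - omega(k) t))
# with omega(k) = v*k + (a/2)(k-k0)^2 and measure center and width.
xi, k0, v, a = 1.0, 20.0, 1.0, 0.4
k = np.linspace(k0 - 8, k0 + 8, 3201)
dk = k[1] - k[0]
x = np.linspace(-25.0, 35.0, 1201)

psi0_k = np.exp(-(k - k0) ** 2 * xi**2 / 4)
omega = v * k + 0.5 * a * (k - k0) ** 2

results = {}
for t in (0.0, 2.0, 5.0):
    phase = np.exp(1j * (np.outer(x, k) - omega * t))   # e^{i(kx - omega t)}
    prob = np.abs(phase @ psi0_k * dk) ** 2
    xbar = np.sum(x * prob) / np.sum(prob)
    width = 2 * np.sqrt(np.sum((x - xbar) ** 2 * prob) / np.sum(prob))
    results[t] = (xbar, width)
    assert abs(xbar - v * t) < 1e-2                     # center moves at v_g = v
    assert np.isclose(width, xi * np.sqrt(1 + (2 * a * t / xi**2) ** 2), rtol=1e-3)
```

The width is extracted as twice the standard deviation of |ψ|², which for a Gaussian envelope e^{−(x−v_g t)²/ξ(t)²} equals ξ(t).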
The dispersive spreading of electromagnetic wave packets is a phenomenon of immense
applied importance. For example, dispersive deformation is one of the main factors limiting
the information load that can be pushed through fiber optical cables. (For too high loads,
the wave packets constituting the bit stream through such fibers begin to overlap, thus losing
their identity. The construction of ever more sophisticated counter measures optimizing the
data capacity of optical fibers represents a major stream of applied research.)
13 However, there is no need to worry that such anomalies conflict with Einstein's principle of relativity to be
discussed in chapter 3 below: the dynamics of a uniform wave train is not linked to the transport of energy
or other observable physical quantities, i.e. there is no actual physical entity that is transported at a velocity
larger than the speed of light.
where σ = {σ_ij(x, t)} is called the conductivity tensor. This equation states that a field may
cause current flow at different spatial locations and times. Equally, a field may cause current flow
in directions different from the field vector. (For example, in a system subject to a magnetic
field in z–direction, an electric field in x–direction will cause current flow in both x– and
y–direction, why?) As with the electric susceptibility, the non–local space dependence of the
conductivity may often be neglected, σ(x, t) ∝ δ(x), or σ(q, ω) = σ(ω) in Fourier space.
However, except for very low frequencies, the ω–dependence is generally important. One calls
the finite–frequency conductivity the 'AC conductivity', where 'AC' stands for 'alternating
current'. For ω → 0, the conductivity crosses over to the 'DC conductivity', where 'DC' stands
for 'direct current'. In the DC limit, and for an isotropic medium, the field–current relation
assumes the form of Ohm's law
j = σE. (2.36)
In the following, we wish to understand how the phenomenon of electrical conduction can be
explained from our previous considerations.
There are two different ways to specify the dynamical response of a metal to external
fields: We may either postulate a current–field relation in the spirit of Ohm's law. Or we
may determine a dielectric function which (in a more microscopic way) describes the response
of the mobile charge carriers in the system to the presence of a field. However, since that
response generally implies current flow, the simultaneous specification of both a frequency
dependent dielectric function and a current–field relation would over–determine the problem.
In the following, we discuss the two alternatives in more detail.
∇ × H(x, ω) = (1/c) (−iωε + 4πσ) E(x, ω), (2.37)
For the moment, we leave this result as it is and turn to the second approach. The conduction electrons of a
metal may be thought of as charge carriers whose confining potential is infinitely weak, ω_i = 0,
so that their motion is not tied to a reference coordinate. Still, the conduction electrons will
be subject to friction mechanisms. (E.g., scattering off atomic imperfections will impede their
ballistic motion through the solid.) We thus model the dielectric function of a metal as
ε(ω) = 1 − (4πn_e e²/m) 1/(ω² + iω/τ), (2.38)

i.e. Eq. (2.28) with a single resonance at ω_i = 0 and damping rate γ = 1/τ. Substituted into
the Maxwell equation, this gives

∇ × H(x, ω) = −(iω ε(ω)/c) E(x, ω) = (1/c) (−iω + 4πn_e e²/(m(−iω + τ^{−1}))) E(x, ω). (2.39)

Comparison with Eq. (2.37) yields

σ(ω) = n_e e²/(m(−iω + τ^{−1})), (2.40)
i.e. a formula for the AC conductivity in terms of the electron density and mass, and the impurity
collision rate. Eq. (2.40) affords a very intuitive physical interpretation:

15 ... alluding to the fact that the attenuation of the electrons is due to collisions with impurities, which take
place at a rate τ^{−1}.
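Two limits of (2.40) are worth a quick numerical look: in the DC limit the formula reduces to σ(0) = n_e e² τ/m, while at high frequencies the response becomes mainly reactive, with |σ| ≈ n_e e²/(mω). A sketch in arbitrary units:

```python
import numpy as np

# Drude AC conductivity (2.40) in arbitrary units (n_e = e = m = 1, tau = 2).
ne, e, m, tau = 1.0, 1.0, 1.0, 2.0

def sigma(omega):
    return ne * e**2 / (m * (-1j * omega + 1 / tau))

# DC limit: sigma(0) = n_e e^2 tau / m, purely real
assert np.isclose(sigma(0.0).real, ne * e**2 * tau / m)
assert abs(sigma(0.0).imag) < 1e-12
# high frequencies: |sigma| falls off as n_e e^2 / (m omega)
w = 1e4
assert np.isclose(abs(sigma(w)), ne * e**2 / (m * w), rtol=1e-4)
```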
Chapter 3
Relativistic invariance
3.1 Introduction
Consult an entry level physics textbook to re–familiarize yourself with the basic notions of
special relativity!
t0 = t
x0 = x − vt.
(x_i are particle coordinates in system K and V_ij is a pair potential which, crucially, depends
only on the differences between particle coordinates but not on any distinguished reference
point in the universe.) we find that

K′ : d²x′_i/dt′² = −∇_{x′_i} Σ_j V_ij(x′_i − x′_j),

i.e. the equations remain form invariant. This is what is meant when we say that classical
mechanics is Galilei invariant.

1 More precisely, Galilei invariance implies invariance under all transformations of the Galilei group: the
uniform motions above, rotations of space, and uniform translations of space and time.
Now, suppose a certain physical phenomenon in K (the dynamics of a water surface, say)
is effectively described by a wave equation,

K : (∆ − (1/c²) ∂t²) f(x, t) = 0.

Substitution of the Galilei transformation into this equation obtains

K′ : (∆′ − (1/c²) ∂t′² + (2/c²) (v · ∇′) ∂t′ − (1/c²) (v · ∇′)²) f(x′, t′) = 0,
i.e. an equation of modified form. At first sight this result may look worrisome. After all,
many waves (water waves, sound waves, etc.) are of pure mechanical origin which means that
wave propagation ought to be a Galilei invariant phenomenon. The resolution to this problem
lies with the fact that such waves generally propagate in a host medium (water, say.) While
this medium is at rest in K, it isn’t in K 0 . But a wave propagating in a non–stationary medium
must be controlled by a different equation of motion, i.e. there is no a priori contradiction
to the principle of Galilei invariance. Equivalently, one may observe that a wave emitted by a
stationary point source in K will propagate with velocity c (in K.) However, an observer in
K 0 will observe wave fronts propagating at different velocities. Mathematically, this distortion
of the wave pattern is described by K 0 ’s ‘wave equation’.
But what about electromagnetic waves? So far, we have never talked about a 'host
medium' supporting the propagation of electromagnetic radiation. The lack of Galilean invari-
ance of Maxwell's equations then leaves us with three possibilities:

1. Nature is Galilean invariant, but Maxwell's equations are incorrect (or, to put it more
mildly, incomplete.)

2. An electromagnetic host medium (let's call it the ether) does, indeed, exist.
Figure 3.1: Measurement of Newton's law of gravitational attraction (any other law would be
just as good) in two frames that are relatively inertial. The equations describing the force
between masses will be the same in both frames.
physics. Towards the end of the nineteenth century, experimental evidence was mounting that
the vacuum velocity of light was, independent of the observation frame, given by c; the idea
of an ether became increasingly difficult to sustain.
. The postulate of the constancy of the speed of light: the speed of light is inde-
pendent of the motion of its source.
What makes this postulate — which superficially might seem to be no more than an
innocent tribute to experimental observation — so revolutionary is that it abandons the
notion of 'absolute time'. For example, two events that are simultaneous in one inertial
frame will in general no longer be simultaneous in other inertial frames. While, more
than one hundred years later, we have grown used to its implications, the revolutionary
character of Einstein's second postulate can hardly be exaggerated.

2 ... although the constancy of the speed of light had not yet been fully established experimentally when
Einstein made his postulate.
another frame.) But how can we tell, whether a frame is inertial or not? And what, for that
matter, does the attribute ’inertial’ mean, if no transformation is involved?
Newton approached this question by proposing the concept of an 'absolute space'. Systems at rest
in this space, or in uniform motion relative to it, were 'inertial'. For practical purposes, it was
proposed that a reference system tied to the position of the fixed stars was (a good approximation
of) an absolute system. Nowadays, we know that these concepts are problematic. The fixed stars
aren’t fixed (nor for that matter is any known constituent of the universe), i.e. we are not aware
of a suitable ’anchor’ reference system. What is more, the declaration of an absolute frame is at
odds with the ideas of relativity.
In the late nineteenth century, the above definition of inertial frames was superseded by a purely
operational one. A modern formulation of the alternative definition might read:
The meaning of this definition is best exposed by considering its negative: imagine an observer
tied to a rotating disc. A particle thrown by that observer will not move in a straight line (relative
to the disc.) If the observer does not know that he/she is on a rotating frame, he/she will have to
attribute the bending of the trajectory to some 'fictitious' forces. The presence of such forces is proof
that the disc system is not inertial.
However, in most physical contexts questions concerning the 'absolute' status of a system do not
arise. Far more frequently, we care about what happens as we pass from one frame (laboratory,
say) to another (that of a particle moving relatively to the lab.) Thus, the truly important issue is
whether systems are relatively inertial or not.
What conditions will the transformation matrix Λ have to fulfill in order to qualify as a trans-
formation between inertial frames? Referring for a rigorous discussion of this question to
Ref. [2], we here merely note that symmetry conditions implied by the principle of relativ-
ity nearly, but not completely, specify the class of permissible transformations. This class of
transformations includes the Galilei transformations, which we saw are incompatible with the
physics of electromagnetic wave propagation.
At this point, Einstein's second postulate enters the game. Consider two inertial frames
K and K′ related by a homogeneous coordinate transformation. At time t = 0 the origins of
the two systems coalesce. Let us assume, however, that K′ moves relative to K at constant
velocity v. Now consider the event: at time t = 0 a light source at x = 0 emits a signal. In
K, the wave fronts of the light signal then trace out world lines propagating at the vacuum
speed of light, i.e. the spatial coordinates x of a wave front obey the relation

x² − c²t² = 0.
The crucial point now is that in spite of its motion relative to the point of emanation of the
light signal, an observer in K 0 , too, will observe a light signal moving with velocity c in all
directions. (Notice the difference to, say, a sound wave. From K 0 ’s point of view, fronts
moving along v would be slowed down while those moving in the opposite direction would
5 Exceptions include high precision measurements of gravitational forces.
propagate at higher speed, v′_sound = v_sound − v.) This means that the light front coordinates
x′ obey the same relation,

x′² − c²t′² = 0.
This additional condition unambiguously fixes the set of permissible transformations. To
formulate it in the language of linear coordinate transformations (which, in view of the linearity
of transformations between inertial frames, is the adequate one), let us define the matrix

g = {g_µν} ≡ diag(1, −1, −1, −1), (3.2)

where all non–diagonal entries are zero. The constancy of the speed of light may then be
expressed in concise form as x^T g x = 0 ⇒ x′^T g x′ = 0. This condition suggests focusing on
coordinate transformations for which the bilinear form x^T g x is conserved,

Λ^T g Λ = g. (3.4)
Linear coordinate transformations satisfying this condition are called Lorentz transforma-
tions. We thus postulate that
All laws of nature must be invariant under affine coordinate transformations (3.1)
where the homogeneous part of the transformation obeys the Lorentz condition (3.4).
The statement above characterizes the set of legitimate transformations in implicit terms. In
the next section, we aim for a more explicit description of the Lorentz transformation. These
results will form the basis for our discussion of Lorentz invariant electrodynamics further down.
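The Lorentz condition (3.4) is easy to verify for a concrete transformation; the sketch below builds a standard boost along the x–axis with β = v/c = 0.6 (an example chosen here, not taken from the text) and checks Λ^T g Λ = g together with the invariance of the light–cone condition:

```python
import numpy as np

# A boost along the x axis with beta = 0.6: check the Lorentz condition
# Lambda^T g Lambda = g and the invariance of the light cone.
g = np.diag([1.0, -1.0, -1.0, -1.0])
beta = 0.6
gamma = 1 / np.sqrt(1 - beta**2)
Lam = np.array([[gamma, -gamma * beta, 0, 0],
                [-gamma * beta, gamma, 0, 0],
                [0, 0, 1, 0],
                [0, 0, 0, 1]])

assert np.allclose(Lam.T @ g @ Lam, g)       # Eq. (3.4)
assert np.isclose(np.linalg.det(Lam), 1.0)   # proper
assert Lam[0, 0] >= 1                        # orthochronous

x = np.array([1.0, 1.0, 0.0, 0.0])           # light-like: (ct)^2 - x^2 = 0
xp = Lam @ x
assert np.isclose(xp @ g @ xp, 0.0)
```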
Figure 3.2: Schematic of the decomposition of space time into a time like region and a space
like region. The two regions are separated by a 'light cone' of light–like vectors. (A 'cone'
because in space dimensions d > 1 it acquires a conical structure.) The forward/backward
part of the light cone extends to positive/negative times.
3.2.1 Background
The bilinear form introduced above defines a scalar product on R⁴:

g : R⁴ × R⁴ → R, (3.5)
(x, y) ↦ x^T g y ≡ x^µ g_µν y^ν. (3.6)

A crucial feature of this scalar product is that it is not positive definite. For reasons to be
discussed below, we call vectors whose norm is positive, x^T g x = (ct)² − x² > 0, time like
vectors. Vectors with negative norm will be called space like and vectors of vanishing norm
light like.
We are interested in Lorentz transformations, i.e. linear transformations Λ : R⁴ → R⁴, x ↦
Λx, that leave the scalar product g invariant, Λ^T g Λ = g. (Similarly to, say, the orthogonal
transformations which leave the standard scalar product invariant.) Before exploring the struc-
ture of these transformations in detail, let us observe a number of general properties. First
notice that the set of Lorentz transformations forms a group, the Lorentz group, L. For if
Λ and Λ′ are Lorentz transformations, so is the composition ΛΛ′. The identity transformation
obviously obeys the Lorentz condition. Finally, the equation Λ^T g Λ = g implies that Λ is
non–singular (why?), i.e. it possesses an inverse, Λ^{−1}. We have thus shown that the Lorentz
transformations form a group.
Taking the determinant of the invariance relation, we find det(Λ)² = 1, i.e.
det(Λ) ∈ {−1, 1}. (Λ is a real matrix, i.e. det(Λ) ∈ R.) We may further observe that the
absolute value of the component Λ00 is never smaller than unity. This is shown by inspection
of the 00–element of the invariance relation: 1 = g00 = (Λ^T g Λ)00 = Λ00² − Σ_i Λ_{i0}². Since the
subtracted term is positive (or zero), we have |Λ00| ≥ 1. We thus conclude that L contains
four components defined by

L↑+ : det Λ = +1, Λ00 ≥ +1,   L↑− : det Λ = −1, Λ00 ≥ +1,
L↓+ : det Λ = +1, Λ00 ≤ −1,   L↓− : det Λ = −1, Λ00 ≤ −1.

Transformations belonging to any one of these subsets cannot be continuously deformed into a
transformation of a different subset (why?), i.e. the subsets are truly disjoint. In the following,
we introduce a few distinguished transformations belonging to the different components of the
Lorentz group.
Space reflection, or parity, P : (ct, x) → (ct, −x) is a Lorentz transformation. It belongs
to the component L↑− . Time inversion, T : (ct, x) → (−ct, x) belongs to the component
L↓− . The product of space reflection and time inversion, PT : (ct, x) → (−ct, −x) belongs
to the component L↓+ . Generally, the product of two transformations belonging to a given
component no longer belongs to that component, i.e. the subsets do not form subgroups of
the Lorentz group. However, the component L↑+ is exceptional in that it is a subgroup. It is
called the proper orthochronous Lorentz group, or restricted Lorentz group. The attribute
'proper' indicates that it does not contain exceptional transformations (such as parity and time
inversion) which cannot be continuously deformed back to the identity transformation. It is called
'orthochronous' because it maps the positive light cone into itself, i.e. it respects the 'direction'
of time. To see that L↑+ is a group, notice that it contains the identity transformation. Further,
its elements can be continuously contracted to unity; if this property holds for two
elements Λ, Λ′ ∈ L↑+, then it also holds for the product ΛΛ′, i.e. the product is again in L↑+.
Spatial rotations, Λ_R = diag(1, R), where R is a three-dimensional rotation matrix, trivially
fall into this subgroup. However, it also contains the second large family of homogeneous
transformations between inertial frames, uniform motions.
3.2. THE MATHEMATICS OF SPECIAL RELATIVITY I: LORENTZ GROUP 69
[Figure: two inertial frames K and K′, with axes x3 and x′3 indicated]
Consider two inertial frames K and K′ whose origins ...

    A_v^T σ3 A_v ≃ (1 + Σ_i λ_i T_i^T) σ3 (1 + Σ_i λ_i T_i) ≃ σ3 + Σ_i λ_i (T_i^T σ3 + σ3 T_i) = σ3.
7) To actually prove this, notice that from the point of view of an observer in K′ we consider a special trans-
formation with velocity −v e′1. Then apply symmetry arguments (notably the condition of parity invariance; a
mirrored universe should not differ in an observable manner in its transformation behaviour from ours) to
show that any change in these coordinates would lead to a contradiction.
70 CHAPTER 3. RELATIVISTIC INVARIANCE
This relation must hold regardless of the value of λ_i, i.e. the matrices T_i must obey the
condition T_i^T σ3 = −σ3 T_i. Up to normalization, there is only one matrix that meets this
condition, viz.

    T_i = σ1 ≡ ( 0  1 )
               ( 1  0 ).

The most general form of the matrix A_v is thus given by

    A_v = exp( λ σ1 ) = ( cosh λ   sinh λ )
                        ( sinh λ   cosh λ ),
where we denoted the single remaining parameter by λ and in the second equality expanded
the exponential in a power series. We now must determine the parameter λ = λ(v) in such
a way that it actually describes our v–dependent transformation. We first substitute Av into
the transformation law x0 = Λv x, or
    ( x′^0 )       ( x^0 )
    ( x′^1 ) = A_v ( x^1 ).
Now, the origin of K′ (having coordinate x′^1 = 0) is at coordinate x^1 = vt. Substituting this
condition into the equation above, we obtain

    tanh λ = −v/c.
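The condition tanh λ = −v/c can be checked directly: with this choice of λ, the matrix A_v maps the world line of the K′-origin, x^1 = vt, onto x′^1 = 0. A minimal numeric sketch (units with c = 1 are an assumption of the example):

```python
import math

def A(lam):
    # A_v = exp(lambda * sigma_1), acting on (x^0, x^1) = (ct, x^1)
    return [[math.cosh(lam), math.sinh(lam)],
            [math.sinh(lam), math.cosh(lam)]]

c, v = 1.0, 0.6
lam = math.atanh(-v / c)      # the condition tanh(lambda) = -v/c derived above
t = 2.5
x = [c * t, v * t]            # a world point on the K'-origin's trajectory, expressed in K
xp = [sum(A(lam)[i][j] * x[j] for j in range(2)) for i in range(2)]
assert abs(xp[1]) < 1e-12     # in K' the origin sits at x'^1 = 0, as required
```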
EXERCISE Repeat the argument above for the full transformation group, i.e. not just the set
of special Lorentz transformations along a certain axis. To this end, introduce an exponential
representation Λ = exp(Σ_i λ_i T_i) for general elements of the restricted Lorentz group. Use the
defining condition of the group to identify six linearly independent matrix generators. Among
the matrices T_i, identify three as the generators J_i of the spatial rotation group, T_i = J_i. Three
others generate special Lorentz transformations along the three coordinate axes of K. (Linear
combinations of these generators can be used to describe arbitrary special Lorentz transformations.)
. In the limit of small β = v/c, the transformation Λ_{ve1} asymptotes to the Galilei transformation,
t′ = t and x′^1 = x^1 − vt.
. We repeat that a general special Lorentz transformation can be obtained from the
transformation along e1 by a space-like rotation, Λ_v = Λ_R Λ_{ve1} Λ_R^{−1}, where R is a
rotation mapping e1 onto the unit vector in v-direction.
. Without proof (However, the proof is by and large implied by the exercise above.) we
mention that a general restricted Lorentz transformation can always be represented as
Λ = ΛR Λv ,
Thus, the coordinate transformation between the two frames assumes the general affine form
x = Lx′ + a. In K, the event takes place at some (as yet undetermined) time dt later than t.
Its spatial coordinates will be x(t) + v dt, where v = d_t x is the instantaneous velocity of the
moving observer. We thus have the identification (c dτ, 0) ↔ (c(t + dt), x + v dt). Comparing
this with the general form of the coordinate transformation, we realize that (c dτ, 0) and
(c dt, v dt) are related by a homogeneous Lorentz transformation. This implies, in particular,
that (c dτ)^2 = (c dt)^2 − (v dt)^2, or

    dτ = dt √(1 − v^2/c^2).          (3.10)
This equation expresses the famous phenomenon of the relativity of time: for a moving observer,
time proceeds more slowly than for an observer at rest. The time measured by the moving
clock, τ, is called proper time. The attribute 'proper' indicates that τ is defined with reference
to a coordinate system (the instantaneous rest frame) that can be canonically defined.
That is, while the time t associated to a given point on the world line x(t) = (ct, x(t)) will change
from system to system (as we just saw), the proper time remains the same; the proper time can
be used to parameterize the space-time curve in an invariant manner as x(τ) = (ct(τ), x(τ)).
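Eq. (3.10) is easy to evaluate numerically. A small sketch (the function name `proper_time` is ours):

```python
import math

def proper_time(dt, v, c=1.0):
    # Eq. (3.10): d(tau) = dt * sqrt(1 - v^2/c^2)
    return dt * math.sqrt(1.0 - (v / c) ** 2)

# A clock moving at 60% of the speed of light for a coordinate-time interval dt = 10
# (units with c = 1): gamma = 1.25, so the elapsed proper time is dt/gamma = 8.
assert abs(proper_time(10.0, 0.6) - 8.0) < 1e-12
```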
. In the rest frame of a particle subject to mechanical forces, or in frames moving at low
velocity v/c ≪ 1 relative to the rest frame, the equations must asymptote to Newton's
equations.
Newton's equations in the rest frame K are of course given by

    m d^2x/dτ^2 = f,

where m is the particle's mass (in its rest frame; as we shall see, the mass is not an invariant
quantity, so we had better speak of the particle's rest mass), and f is the force. As always in
3.3. ASPECTS OF RELATIVISTIC DYNAMICS 73
where E is some characteristic energy. (In the instantaneous rest frame of the particle,
E = mc2 and p = 0.)
Expressed in terms of the four–momentum, the Newton equation assumes the simple form
dτ p = f. (3.11)
Let us explore what happens to the four momentum as we boost the particle from its rest
frame to a frame moving with velocity vector v = ve1 . Qualitatively, one will expect that
in the moving frame, the particle does carry a finite space–like momentum (which will be
proportional to the velocity of the boost, v). We should also expect a renormalization of its
energy (from the point of view of the moving frame, the particle has acquired kinetic energy).
Focusing on the 0 and 1 components of the momentum (the others won't change), Eq.
(3.9) implies

    ( mc^2/c )     ( E′/c )   (  γmc^2/c )
    (    0   )  →  ( p′^1 ) ≡ ( −(γm)v  ).
This equation may be interpreted in two different ways. Substituting the explicit formula
E = mc^2, we find E′ = (γm)c^2 and p′^1 = −(γm)v. In Newtonian mechanics, we would have
expected that a particle at rest in K carries momentum p′^1 = −mv in K′. The difference to
the relativistic result is the renormalization of the mass factor: a particle that has mass m in
its rest frame appears to carry a velocity dependent mass

    m(v) = γm = m / √(1 − (v/c)^2)
in K 0 . For v → c, it becomes infinitely heavy and its further acceleration will become progres-
sively more difficult:
Particles of finite rest frame mass cannot move faster than with the speed of light.
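The boost of the rest-frame momentum can be reproduced numerically: applying A_v with tanh λ = −v/c to (mc, 0) yields E′/c = γmc and p′^1 = −(γm)v, while the invariant p^T g p = (mc)^2 is unchanged. A sketch in units with c = 1 (an assumption of the example):

```python
import math

m, c, v = 1.0, 1.0, 0.8
beta = v / c
gamma = 1.0 / math.sqrt(1.0 - beta**2)
lam = math.atanh(-beta)

p_rest = [m * c, 0.0]                     # (E/c, p^1) in the rest frame, E = m c^2
boost = [[math.cosh(lam), math.sinh(lam)],
         [math.sinh(lam), math.cosh(lam)]]
p = [sum(boost[i][j] * p_rest[j] for j in range(2)) for i in range(2)]

assert abs(p[0] - gamma * m * c) < 1e-12  # E' = (gamma m) c^2
assert abs(p[1] + gamma * m * v) < 1e-12  # p'^1 = -(gamma m) v: velocity-dependent mass
# The invariant p^T g p = (E/c)^2 - (p^1)^2 = (mc)^2 is frame independent:
assert abs(p[0]**2 - p[1]**2 - (m * c)**2) < 1e-12
```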
The energy in K′ is given by (mγ)c^2, and is again affected by mass renormalization. However,
a more useful representation is obtained by noting that p^T g p is an invariant quantity. In
This relativistic energy-momentum relation determines the relation between energy and
momentum of a free particle.
Its most important implication is that even a free particle at rest carries the energy E = mc^2,
Einstein's world-famous result. However, this energy is usually not observable (unless it gets
released in nuclear processes of mass fragmentation, with all the known consequences). What
we do observe in daily life are the changes in energy as we observe the particle in different
frames. This suggests to define the kinetic energy of a particle by

    T ≡ E − mc^2 = √((mc^2)^2 + (pc)^2) − mc^2.

For velocities v ≪ c, the kinetic energy T ≃ p^2/2m indeed reduces to the familiar non-
relativistic result. Deviations from this relation become sizeable at velocities comparable to
the speed of light.
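Both limits can be checked numerically; the sketch below (the function name `T` is ours) confirms the p^2/2m limit for small momenta and the sizeable deviation at p comparable to mc:

```python
import math

def T(p, m, c=1.0):
    # Relativistic kinetic energy, T = sqrt((mc^2)^2 + (pc)^2) - mc^2
    return math.sqrt((m * c**2) ** 2 + (p * c) ** 2) - m * c**2

m, p = 1.0, 1e-3     # small momentum, i.e. v << c
assert abs(T(p, m) - p**2 / (2 * m)) < 1e-9  # non-relativistic limit p^2/2m
# At p comparable to mc the deviation from p^2/2m is sizeable:
assert abs(T(1.0, m) - 0.5 / m) > 0.05
```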
    d p/dτ = f,
where the force acting on the particle is given by
To complete our derivation of the Newton equation in K, we need to identify the zeroth
component of the force, f0. The zeroth component of the left hand side of the Newton
equation is given by (γ/c) d_t E, i.e. (γ/c) times the rate at which the kinetic energy of the particle
changes. However, we have seen before that the rate of potential energy change is given by
d_t U = −f · v = −qE · v. Energy conservation implies that this energy gets converted into
kinetic energy, d_t T = d_t E = +qE · v. This implies the identification f0 = (γq/c) E · v.
The generalized form of Newton's equations is thus given by

    m d_τ ( γc )   q ( E · (γv)           )
          ( γv ) = c ( γcE + (γv) × B     ).
(in terms of which the four-momentum is given by p = mv, i.e. by multiplication by the rest
mass.) The equations of motion can then be expressed as

    m d_τ v^µ = (q/c) F^µν v_ν,          (3.13)

where we used the index raising and lowering convention originally introduced in chapter ??,
i.e. v^0 = v_0 and v^i = −v_i, and the matrix F is defined by
                  ( 0   −E1  −E2  −E3 )
                  ( E1   0   −B3   B2 )
    F = {F^µν} =  ( E2   B3   0   −B1 ).          (3.14)
                  ( E3  −B2   B1   0  )
The significance of Eq. (3.13) goes far beyond that of a mere reformulation of Newton's
equations: the electromagnetic field enters the equation through a matrix, the so-called
field strength tensor. This signals that the transformation behaviour of the electromagnetic
field will be different from that of vectors. Throughout much of the course, we treated the
electric field as if it were a (space-like) vector. Naively, one might have expected that in the
theory of relativity, this field gets augmented by a fourth component to become a four-vector.
However, a moment's thought shows that this picture cannot be correct. Consider, say, a
charged particle at rest in K. This particle will create a Coulomb electric field. However, in
a frame K 0 moving relative to K, the charge will be in motion. Thus, an observer in K 0 will
see an electric current plus the corresponding magnetic field. This tells us that under inertial
transformations, electric and magnetic fields get transformed into one another. We thus need
to find a relativistically invariant object accommodating the six components of the electric and
magnetic field ’vectors’. Eq. (3.14) provides a tentative answer to that problem. The idea is
that the fields enter the theory as a matrix and, therefore, transform in a way fundamentally
different from that of vectors.
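This mixing can be made quantitative with the tensor (3.14): boosting a pure electric field produces a nonzero magnetic component, while the antisymmetry of F is preserved. A numeric sketch (F′ = ΛFΛ^T for the contravariant tensor; the helper names are ours):

```python
import math

def F_matrix(E, B):
    # Field strength tensor F^{mu nu}, Eq. (3.14)
    E1, E2, E3 = E
    B1, B2, B3 = B
    return [[0.0, -E1, -E2, -E3],
            [E1, 0.0, -B3,  B2],
            [E2,  B3, 0.0, -B1],
            [E3, -B2,  B1, 0.0]]

def boost1(beta):
    gamma = 1.0 / math.sqrt(1.0 - beta**2)
    L = [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]
    L[0][0] = L[1][1] = gamma
    L[0][1] = L[1][0] = -gamma * beta
    return L

def transform(F, L):
    # F'^{mu nu} = L^mu_rho L^nu_sigma F^{rho sigma}, i.e. F' = L F L^T
    LF = [[sum(L[i][k] * F[k][j] for k in range(4)) for j in range(4)] for i in range(4)]
    return [[sum(LF[i][k] * L[j][k] for k in range(4)) for j in range(4)] for i in range(4)]

# A pure electric field along the 2-axis, no magnetic field in K:
Fp = transform(F_matrix((0.0, 1.0, 0.0), (0.0, 0.0, 0.0)), boost1(0.5))
B3_prime = Fp[2][1]              # the (2,1) entry hosts B_3 in Eq. (3.14)
assert abs(B3_prime) > 1e-6      # the moving observer sees a magnetic field
assert all(abs(Fp[i][j] + Fp[j][i]) < 1e-12
           for i in range(4) for j in range(4))  # antisymmetry survives the boost
```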
Let x = {x^µ} be a four component object that transforms under homogeneous Lorentz trans-
formations as x → Λx, or x^µ → Λ^µ_ν x^ν in components. (By Λ^µ_ν we denote the components
of the Lorentz transformation matrix. For the up/down arrangement of indices, see the discussion
below.) Quantities of this transformation behaviour are called contravariant vectors
(or contravariant tensors of first rank). Now, let us define another four-component object,
x_µ ≡ g_µν x^ν. Under a Lorentz transformation, x_µ → g_µν Λ^ν_ρ x^ρ = (Λ^{T,−1})_µ^ν x_ν. Quantities
transforming in this way will be denoted as covariant vectors (or covariant tensors of first
rank). Defining g^µν to be the inverse of the Lorentz metric (in a basis where g is diagonal,
g^{−1} = g), the contravariant ancestor of x_µ may be obtained back by 'raising the indices' as
x^µ = g^µν x_ν.
Critical readers may find these ’definitions’ unsatisfactory. Formulated in a given basis,
one may actually wonder whether they are definitions at all. For a discussion of the basis
invariant meaning behind the notions of co- and contravariance, we refer to the info block
below. However, we here proceed in a pragmatic way and continue to explore the consequences
of the definitions (which, in fact, they are) above.
INFO Consider a general real vector space V. Recall that the dual space V* is the linear space of
all linear mappings ṽ : V → R, v 7→ ṽ(v). (This space is a vector space by itself, which is why we denote
its elements by vector-like symbols, ṽ.) For a given basis {e_µ} of V, a dual basis {ẽ^µ} of V* may
be defined by the condition ẽ^µ(e_ν) = δ^µ_ν. With the expansions v = Σ_µ v^µ e_µ and ṽ = Σ_µ ṽ_µ ẽ^µ,
respectively, we have ṽ(v) = ṽ_µ v^µ. (Notice that components of objects in dual space will be
indexed by subscripts throughout.) In the literature of relativity, elements of the vector space
(then to be identified with space-time, see below) are usually called contravariant vectors while
their partners in dual space are called covariant vectors.
EXERCISE Let A : V → V be a linear map defining a basis change in V, i.e. e_µ = A_µ^ν e′_ν,
where {A_µ^ν} are the matrix elements of A. Show that
. The components of a contravariant vector v transform as v^µ → v′^µ = (A^T)^µ_ν v^ν.
. The components of a covariant vector w̃ transform as w̃_µ → w̃′_µ = (A^{−1})_µ^ν w̃_ν.
. The action of w̃ on v remains invariant, i.e. w̃(v) = w̃_µ v^µ = w̃′_ν v′^ν does not change.
(Of course, the action of the linear map w̃ on vectors must not depend on the choice of a
particular basis.)
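The invariance claimed in the last item can be checked numerically for a Lorentz boost, using the index-lowering convention v_µ = g_µν v^ν. A sketch (the helper names are ours):

```python
import math

g = [1.0, -1.0, -1.0, -1.0]     # diagonal Minkowski metric

def lower(v):
    # Index lowering: v_mu = g_{mu nu} v^nu
    return [g[i] * v[i] for i in range(4)]

def boost1(beta):
    gamma = 1.0 / math.sqrt(1.0 - beta**2)
    L = [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]
    L[0][0] = L[1][1] = gamma
    L[0][1] = L[1][0] = -gamma * beta
    return L

def apply(L, v):
    return [sum(L[i][j] * v[j] for j in range(4)) for i in range(4)]

v = [2.0, 0.5, -1.0, 3.0]
w = [1.0, 2.0, 0.0, -1.0]
L = boost1(0.4)
# The contraction w_mu v^mu is a Lorentz scalar: boosting both vectors
# (contravariant components with L) leaves it unchanged.
s  = sum(lower(w)[i] * v[i] for i in range(4))
sp = sum(lower(apply(L, w))[i] * apply(L, v)[i] for i in range(4))
assert abs(s - sp) < 1e-12
```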
3.4. MATHEMATICS OF SPECIAL RELATIVITY II: CO– AND CONTRAVARIANCE 77
may be obtained as ṽ_µ = ṽ(e_µ) = ⟨v, e_µ⟩ = v^ν g_νµ = g_µν v^ν, where in the last step we used the
symmetry of g. With the so-called index lowering convention

    v_µ = g_µν v^ν,          (3.15)
The definitions above may be extended to objects of higher complexity. A two component
quantity W^µν is called a contravariant tensor of second degree if it transforms under
Lorentz transformations as W^µν → Λ^µ_µ′ Λ^ν_ν′ W^µ′ν′. Similarly, a covariant tensor of second
degree transforms as W_µν → (Λ^{T,−1})_µ^µ′ (Λ^{T,−1})_ν^ν′ W_µ′ν′. Covariant and contravariant tensors
are related to each other by index raising/lowering, e.g. W_µν = g_µµ′ g_νν′ W^µ′ν′. A mixed
second rank tensor W_µ^ν transforms as W_µ^ν → (Λ^{T,−1})_µ^µ′ Λ^ν_ν′ W_µ′^ν′. The generalization of
these definitions to tensors of higher degree should be obvious.
Finally, we define a contravariant vector field (as opposed to a fixed vector) as a field
v^µ(x) that transforms under Lorentz transformations as v^µ(x) → v′^µ(x′) = Λ^µ_ν v^ν(x). A
Lorentz scalar (field), or tensor of degree zero, is one that does not actively transform under
Lorentz transformations: φ(x) → φ′(x′) = φ(x). Covariant vector fields, tensor fields, etc.
are defined in an analogous manner.
The summation over a contravariant and a covariant index is called a contraction. For
example, v^µ w_µ ≡ φ is the contraction of a contravariant and a covariant vector (field). As
a result, we obtain a Lorentz scalar, i.e. an object that does not change under Lorentz
transformations. The contraction of two indices lowers the tensor degree of an object by two.
For example, the contraction of a mixed tensor of degree two and a contravariant tensor of
degree one, A^µ_ν v^ν ≡ w^µ, yields a contravariant tensor of degree one, etc.
    F^µν = ∂^µ A^ν − ∂^ν A^µ.          (3.16)

(Just work out the antisymmetric combination of derivatives and compare to the definitions
B = ∇ × A, E = −∇φ − c^{−1} ∂_t A.) This implies that the four-component object A^µ indeed
transforms as a contravariant vector.
Now, we have seen in chapter 1.2 that the Lorentz condition can be expressed as ∂_µ A^µ =
0, i.e. in a Lorentz invariant form. In the Lorentz gauge, the inhomogeneous wave equations
assume the form

    □ A^µ = (4π/c) j^µ,          (3.17)

where □ = ∂_µ ∂^µ is the Lorentz invariant wave operator. Formally, this completes the proof
of the Lorentz invariance of the theory. The combination Lorentz condition/wave equations,
which we saw carries the same information as the Maxwell equations, has been proven to be
invariant. However, to stay in closer contact with the original formulation of the theory, we next
express Maxwell's equations themselves in a manifestly invariant form.
    ∂_µ F^µν = (4π/c) j^ν.          (3.18)
To obtain the covariant formulation of the homogeneous equations, a bit of preparatory work
is required: let us define the fourth rank antisymmetric tensor

              (  1,  (µ, ν, ρ, σ) = (0, 1, 2, 3) or an even permutation thereof,
    ε^µνρσ ≡  ( −1,  for an odd permutation,                                       (3.19)
              (  0,  else.
One may show (do it!) that ε^µνρσ transforms as a contravariant tensor of rank four (under
the transformations of the unit-determinant subgroup L+.) Contracting this object with the
covariant field strength tensor F_µν, we obtain a contravariant tensor of rank two,

    F̃^µν ≡ (1/2) ε^µνρσ F_ρσ,
known as the dual field strength tensor. Using (3.19), it is straightforward to verify that

                    ( 0   −B1  −B2  −B3 )
                    ( B1   0    E3  −E2 )
    F̃ = {F̃^µν} =   ( B2  −E3   0    E1 ),          (3.20)
                    ( B3   E2  −E1   0  )

i.e. that F̃ is obtained from F by the replacement E → B, B → −E. One also verifies that the
homogeneous equations assume the covariant form

    ∂_µ F̃^µν = 0.          (3.21)
This completes the invariance proof. Critical readers may find the definition of the dual tensor
somewhat unmotivated. Also, the excessive use of indices (something one normally tries
to avoid in physics) does not help to make the structure of the theory transparent. Indeed,
there exists a much more satisfactory, coordinate independent formulation of the theory in
terms of differential forms. However, as we do not assume the reader's familiarity with the
theory of forms, all we can do is encourage him/her to learn it and to advance to the truly
invariant formulation of electrodynamics not discussed in this text ...
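The duality operation can nevertheless be verified by brute force: contracting the ε-tensor with F_ρσ and comparing the result with the replacement E → B, B → −E of Eq. (3.20). A numeric sketch (the helper names are ours):

```python
from itertools import permutations

def levi_civita():
    # Totally antisymmetric tensor: sign of each permutation of (0,1,2,3).
    eps = {}
    for p in permutations(range(4)):
        sign = 1
        for i in range(4):            # count inversions to obtain the parity
            for j in range(i + 1, 4):
                if p[i] > p[j]:
                    sign = -sign
        eps[p] = sign
    return eps

def F_up(E, B):
    E1, E2, E3 = E
    B1, B2, B3 = B
    return [[0.0, -E1, -E2, -E3],
            [E1, 0.0, -B3,  B2],
            [E2,  B3, 0.0, -B1],
            [E3, -B2,  B1, 0.0]]

g = [1.0, -1.0, -1.0, -1.0]
eps = levi_civita()
E, B = (1.0, 2.0, 3.0), (0.5, -1.0, 4.0)
F = F_up(E, B)
F_down = [[g[i] * g[j] * F[i][j] for j in range(4)] for i in range(4)]
# Dual tensor: Ftilde^{mu nu} = (1/2) eps^{mu nu rho sigma} F_{rho sigma}
Ftilde = [[0.5 * sum(eps.get((m, n, r, s), 0) * F_down[r][s]
                     for r in range(4) for s in range(4))
           for n in range(4)] for m in range(4)]
# Eq. (3.20): the dual follows from F by the replacement E -> B, B -> -E
expected = F_up(B, tuple(-e for e in E))
assert all(abs(Ftilde[i][j] - expected[i][j]) < 1e-12
           for i in range(4) for j in range(4))
```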
Chapter 4
Lagrangian mechanics
82 CHAPTER 4. LAGRANGIAN MECHANICS
INFO There are a number of conceptually different types of mechanical constraints: con-
straints that can be expressed in terms of equalities such as f(q1, ..., qn; q̇1, ..., q̇n) = 0 are
called holonomic. Here, q1, ..., qn are the basic coordinates of the problem, and q̇1, ..., q̇n
the corresponding velocities. (The constraint of the Atwood machine belongs to this category:
x1 + x2 = l, where l is a constant, and xi, i = 1, 2, are the heights of the two bodies measured with
respect to a common reference height.) This has to be contrasted with non-holonomic constraints,
i.e. constraints formulated by inequalities. (Think of the molecules moving in a piston. Their
coordinates obey the constraint 0 ≤ qj ≤ Lj, j = 1, 2, 3, where Lj are the extensions of the
piston.)
Constraints explicitly involving time, f(q1, ..., qn; q̇1, ..., q̇n; t) = 0, are called rheonomic. For
example, a particle constrained to move on a moving surface is subject to a rheonomic constraint.
Constraints void of explicit time dependence are called scleronomic.
In this chapter, we will introduce the concept of variational principles to derive Lagrangian (and
later Hamiltonian) mechanics from their Newtonian ancestor. Our construction falls short of
elucidating the beautiful and important historical developments that eventually led to the modern
formulation. Also, it is difficult to motivate in advance. However, within the framework of a
short introductory course, these shortcomings are outweighed by the brevity of the derivation.
Still, it is highly recommended to consult a textbook on classical mechanics to learn more
about the historical developments that led from Newtonian to Lagrangian mechanics.
Let us begin with a few simple and seemingly uninspired manipulations of Newton's
equations mq̈ = F of a single particle (the generalization to many particles will be obvious)
subject to a conservative force F. Notice that the l.h.s. of the equation may be written
as mq̈ = d_t ∂_q̇ T, where T = T(q̇) = (m/2) q̇^2 is the particle's kinetic energy. By definition, the
conservative force affords a representation F = −∂_q U, where U = U(q) is the potential. This
means that Newton's equation may be equivalently represented as

    d_t ∂_q̇ L − ∂_q L = 0,          (4.1)

where

    L = T − U.          (4.2)
But what is this reformulation good for? To appreciate the meaning of the mathematical
structure of Eq. (4.1), we need to introduce the purely mathematical concept of functionals.
Figure 4.1: On the mathematical setting of functionals on curves (discussion, see text)
A curve may be regarded as the limiting case of an N-dimensional vector; in many aspects, variational calculus amounts
to a straightforward generalization of standard calculus. At any rate, we will see that one may
work with functionals much like with ordinary functions.
4.1.1 Definitions
In this chapter we will focus on the class of functionals relevant to classical mechanics, viz.
functionals F[γ] taking curves in (subsets of) R^n as arguments.(1) To be specific, consider the
set of all curves M ≡ {γ : I ≡ [t0, t1] → U} mapping an interval I into a subset U ⊂ R^n of
n-dimensional space(2) (see Fig. 4.1). Now, consider a mapping

    Φ : M → R,
    γ 7→ Φ[γ],          (4.3)

(1) We have actually met with more general functionals before. For example, the electric susceptibility χ[E]
is a functional of the electric field E : R4 → R3.
(2) The set U may actually be a lower dimensional submanifold of R^n. For example, we might consider curves
in a two-dimensional plane embedded in R3, etc.
It assigns to each curve its Euclidean length. Some readers may question the consistency of the
notation: on the l.h.s. we have the symbol γ (no derivatives), and on the r.h.s. γ̇ (temporal
derivatives). However, there is no contradiction here. The notation [γ] indicates dependence on
the curve as a geometric object. By definition, this contains the full information on the curve,
including all derivatives. The r.h.s. indicates that the functional L reads only partial information
on the curve, viz. that contained in first derivatives.
Consider now two curves γ, γ′ ∈ M that lie "close" to each other. (For example, we may
require that |γ(t) − γ′(t)| ≡ (Σ_i (γ_i(t) − γ′_i(t))^2)^{1/2} < ε for all t and some positive ε.) We
are interested in the increment Φ[γ] − Φ[γ′]. Defining γ′ = γ + h, the functional Φ is called
differentiable iff
EXERCISE Re-familiarize yourself with the definition of the derivative f′(x) of higher dimensional
functions f : R^k → R. Interpret the functional Φ[γ] as the limit of a function Φ : R^N → R,
{γ_i} 7→ Φ({γ_i}), where the vector {γ_i | i = 1, ..., N} is a discrete approximation of the curve γ.
Think how the definition (4.5) generalizes the notion of differentiability and how F|_γ ↔ f′(x)
generalizes the definition of a derivative.
This is about as much as we need to say/define in most general terms. In the next section
we will learn how to determine the extremal curves for an extremely important sub–family of
functionals.
    F|_γ[h] = ∫_{t0}^{t1} dt (∂_γ L − d_t ∂_γ̇ L) · h.          (4.7)

Here, we are using the shorthand notation ∂_γ L · h ≡ Σ_{i=1}^n ∂_{x_i} L(x, γ̇, t)|_{x=γ} h_i, and analogously
for ∂_γ̇ L · ḣ.
    S[γ + h] − S[γ] = ∫_{t0}^{t1} dt [L(γ + h, γ̇ + ḣ, t) − L(γ, γ̇, t)]
                    = ∫_{t0}^{t1} dt [∂_γ L · h + ∂_γ̇ L · ḣ] + O(h^2)
                    = ∫_{t0}^{t1} dt [∂_γ L − d_t(∂_γ̇ L)] · h + ∂_γ̇ L · h|_{t0}^{t1} + O(h^2),
where in the last step, we have integrated by parts. The surface term vanishes because
h(t0 ) = h(t1 ) = 0, on account of the condition γ(ti ) = (γ + h)(ti ) = γi , i = 0, 1. Comparison
with the definition (4.5) then readily gets us to (4.7).
Eq. (4.7) entails an important corollary: the local functional S is extremal on all curves
obeying the so-called Euler-Lagrange equations

    d/dt (∂L/∂γ̇_i) − ∂L/∂γ_i = 0,   i = 1, ..., n.          (4.9)
The reason is that if and only if these N conditions hold, will the linear functional (4.7) vanish
on arbitrary curves h. (Exercise: assuming that one or several of the conditions above are
violated, construct a curve h on which the functional (4.7) will not vanish.)
    S : M → R,
    γ 7→ S[γ] ≡ ∫_{t0}^{t1} dt L(γ(t), γ̇(t), t),          (4.8)
At this stage, one may observe a suspicious structural similarity between the Euler-Lagrange
equations and our early reformulation of Newton’s equations (4.1). However, before shedding
more light on this connection, it is worthwhile to illustrate the usage of Euler-Lagrange calculus
on an
EXAMPLE Let us ask what curve between fixed initial and final points γ0 and γ1, respectively, will have
extremal curve length. (Here, extremal means "shortest"; there is no "longest" curve.) To answer
this question, we need to formulate and solve the Euler-Lagrange equations of the functional (4.4).
Differentiating the Lagrangian L(γ̇) = (Σ_{i=1}^n γ̇_i γ̇_i)^{1/2}, we obtain (show it)
These equations are solved by all curves connecting γ0 and γ1 as a straight line. For example,
the "constant velocity" curve

    γ(t) = γ0 (t − t1)/(t0 − t1) + γ1 (t − t0)/(t1 − t0)

has γ̈_i = 0, thus solving the equation.
(Exercise: the most general curve connecting γ0 and γ1 by a straight line is given by γ(t) =
γ0 + f(t)(γ1 − γ0), where f(t0) = 0 and f(t1) = 1. In general, these curves describe "accelerated
motion", γ̈ ≠ 0. Still, they solve the Euler-Lagrange equations, i.e. they are of shortest geometric
length. Show it!)
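The statements of this example are easy to test with a discretized length functional (the function names are ours): the straight and an "accelerated" parameterization of the same line have the same, minimal length, while any detour is longer:

```python
import math

def length(curve, t0=0.0, t1=1.0, n=2000):
    # Discretized length functional: sum of chord lengths along the curve
    pts = [curve(t0 + (t1 - t0) * k / n) for k in range(n + 1)]
    return sum(math.dist(pts[k], pts[k + 1]) for k in range(n))

g0, g1 = (0.0, 0.0), (3.0, 4.0)
straight = lambda t: (g0[0] + t * (g1[0] - g0[0]), g0[1] + t * (g1[1] - g0[1]))
# An "accelerated" straight-line curve, gamma(t) = gamma0 + f(t)(gamma1 - gamma0), f(t) = t^2:
accelerated = lambda t: (g0[0] + t**2 * (g1[0] - g0[0]), g0[1] + t**2 * (g1[1] - g0[1]))
wiggly = lambda t: (straight(t)[0], straight(t)[1] + 0.3 * math.sin(math.pi * t))

assert abs(length(straight) - 5.0) < 1e-6     # |(3,4)| = 5
assert abs(length(accelerated) - 5.0) < 1e-6  # same geometric length: also extremal
assert length(wiggly) > 5.0                   # any detour is longer
```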
Yet, nowhere in our derivation of the Euler-Lagrange equations did we make reference to
specific properties of the coordinate system. This suggests that the Euler-Lagrange equations
assume the same form, Eq. (4.9), in all coordinate systems. (To appreciate the meaning
of this statement, compare with the Newton equations, which assume their canonical form
ẍ_i = f_i(x) only in cartesian coordinates.) Coordinate changes play an extremely important
role in analytical mechanics, especially when it comes to problems with constraints. Hence,
anticipating that the Euler-Lagrange formalism is slowly revealing itself as a replacement for
Newton's formulation, it is well invested time to take a close look at the role of coordinates.
Both a curve γ and the functional S[γ] are canonical objects; no reference to coordinates is
made here. The curve is simply a map γ : I → U, and S[γ] assigns to that map a number.
However, most of the time when we actually need to work with a curve, we do so in a system
of coordinates. Mathematically, a coordinate system of U is a diffeomorphic map

    φ : U → V,
    x 7→ φ(x) ≡ x ≡ (x1, ..., xn).

Given a second coordinate system φ′ : U → V′, the coordinate change is described by the
composition

    φ′ ◦ φ^{−1} : V → V′,
    x 7→ x′ = φ′ ◦ φ^{−1}(x).

By construction, this is a smooth and invertible map between open subsets of R^n. For example,
if x = (x1, x2, x3) are cartesian coordinates and x′ = (r, θ, φ) are spherical coordinates, we
would have x′1(x) = (x1^2 + x2^2 + x3^2)^{1/2}, etc.
The most important point is that x(t) and x′(t) describe the same curve, γ(t), only in
different representations. Specifically, if the reference curve γ is an extremal curve, the coor-
dinate representations x : I → V and x′ : I → V′ will be extremal curves, too. According to
our discussion in section 4.1.2, both curves must be solutions of the Euler-Lagrange equations,
i.e. we can draw the conclusion:
    γ extremal ⇒          (4.10)

    d/dt (∂L/∂ẋ_i) − ∂L/∂x_i = 0,
    d/dt (∂L′/∂ẋ′_i) − ∂L′/∂x′_i = 0,

where L′(x′, ẋ′, t) ≡ L(x(x′), ẋ(x′, ẋ′), t) is the Lagrangian in the φ′-coordinate representation.
In other words
EXERCISE Above we have shown the coordinate invariance of the Euler-Lagrange equations by
conceptual reasoning. However, it must be possible to obtain the same invariance properties by
brute force computation. To show that the second line in (4.10) follows from the first, use the
chain rule, d_t x′_i = Σ_j (∂x′_i/∂x_j) d_t x_j, and its immediate consequence ∂ẋ′_i/∂ẋ_j = ∂x′_i/∂x_j.
EXAMPLE Let us illustrate the coordinate invariance of the variational formalism on the example
of the functional "curve length" discussed on p. 86. Considering the case of curves in the plane,
n = 2, we might get the idea to attack this problem in polar coordinates, φ(x) = (r, ϕ). The
polar coordinate representation of the cartesian Lagrangian L(x1, x2, ẋ1, ẋ2) = (ẋ1^2 + ẋ2^2)^{1/2} reads
(verify it)
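For the record, the polar representation asked for here is L′ = (ṙ^2 + r^2 ϕ̇^2)^{1/2}. The numeric sketch below (the helper names are ours) checks that the cartesian and polar Lagrangians assign the same length to one and the same curve, a unit half-circle:

```python
import math

def action(L, q, qdot, t0=0.0, t1=1.0, n=4000):
    # Numerical action S = int dt L(q, qdot), midpoint rule
    h = (t1 - t0) / n
    return sum(L(q(t0 + (k + 0.5) * h), qdot(t0 + (k + 0.5) * h)) * h for k in range(n))

# Cartesian length Lagrangian and its polar-coordinate representation
L_cart = lambda x, xd: math.hypot(xd[0], xd[1])
L_polar = lambda x, xd: math.sqrt(xd[0]**2 + (x[0] * xd[1])**2)  # (rdot^2 + r^2 phidot^2)^{1/2}

# One and the same curve in both coordinate systems: a unit half-circle
x  = lambda t: (math.cos(math.pi * t), math.sin(math.pi * t))
xd = lambda t: (-math.pi * math.sin(math.pi * t), math.pi * math.cos(math.pi * t))
r  = lambda t: (1.0, math.pi * t)   # (r, phi)
rd = lambda t: (0.0, math.pi)       # (rdot, phidot)

assert abs(action(L_cart, x, xd) - math.pi) < 1e-4   # arc length of a unit half-circle
assert abs(action(L_polar, r, rd) - math.pi) < 1e-4  # same number in polar coordinates
```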
With these preparations at hand, we are now in a very good position to apply variational
calculus to a new and powerful reformulation of Newtonian mechanics.
V_g, and another, V_c, for the constraint force F_c. We also know that the sought-for solution curve
q : I → R3, t 7→ q(t) will extremize the action S[q] = ∫_{t0}^{t1} dt L(q, q̇). (Our problem does not
include explicit time dependence, i.e. L does not carry a time argument.) So far, we have
not gained much. But let us now play the trump card of the new formulation, its invariance
under coordinate changes.
In the above formulation of the problem, we are seeking an extremum of S[q] on the set of
all curves I → R3. However, we know that all curves in R3 will be subject to the "infinitely
strong" potential of the constraint force, unless they lie right on the string S. The action of
those generic curves will be infinitely large, and we may remove them from the set of curves
entering the variational procedure from the outset. This observation suggests representing the
problem in terms of coordinates (s, q⊥), where the one-dimensional coordinate s parameterizes
the curve, and the (3 − 1)-dimensional coordinate vector q⊥ parameterizes the space normal
to the curve. Knowing that curves with non-vanishing q⊥(t) will have an infinitely large action,
we may restrict the set of curves under consideration to M ≡ {γ : I → S, t 7→ (s(t), 0)},
i.e. to curves in S. On the string, S, the constraint force F_c vanishes. The above limitation
thus entails that the constraint forces will never explicitly appear in the variational procedure;
this is an enormous simplification of the problem. Also, our problem has effectively become
one-dimensional. The Lagrangian evaluated on curves in S is a function L(s(t), ṡ(t)). It is
much simpler than the original Lagrangian L(q, q̇) with its constraint force potential.
Before generalizing these ideas to a general solution strategy of problems with constraints,
let us illustrate its potential on a concrete problem.
EXAMPLE We consider the Atwood machine depicted on p. 81. The Lagrangian of this problem
reads

    L(q1, q2, q̇1, q̇2) = (m1/2) q̇1^2 + (m2/2) q̇2^2 − m1 g x1 − m2 g x2.
Here, q_i is the position of mass i, i = 1, 2, and x_i is its height. In principle, the sought-for solution
γ(t) = (q1(t), q2(t)) is a curve in six dimensional space. We assume that the masses are released
at rest at initial coordinates (x_i(t0), y_i(t0), z_i(t0)). The coordinates y_i and z_i will not change in
the process, i.e. effectively we are seeking solution curves in the two-dimensional space of
coordinates (x1, x2). The constraint now implies that x ≡ x1 = l − x2, or x2 = l − x. Thus,
our solution curves are uniquely parameterized by the "generalized coordinate" x as (x1, x2) =
(x, l − x). We now enter with this parameterization into the Lagrangian above to obtain

    L(x, ẋ) = ((m1 + m2)/2) ẋ^2 − (m1 − m2) g x + const.
It is important to realize that this function uniquely specifies the action

    S[x] = ∫_{t0}^{t1} dt L(x(t), ẋ(t))

of physically allowed curves. The extremal curve, which then describes the actual motion of the
two-body system, will be a solution of the equation

    d_t (∂L/∂ẋ) − ∂L/∂x = (m1 + m2) ẍ + (m1 − m2) g = 0.
4.2. LAGRANGIAN MECHANICS 91
For the given initial conditions x(t0) = x1(t0) and ẋ(t0) = 0, this equation is solved by

    x(t) = x(t0) − ((m1 − m2)/(m1 + m2)) (g/2) (t − t0)^2.
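The closed-form solution can be cross-checked by direct numerical integration of the Euler-Lagrange equation (a simple semi-implicit Euler scheme; step size and parameter values are arbitrary choices of ours):

```python
# Integrate (m1 + m2) xddot = -(m1 - m2) g and compare with the closed-form result.
m1, m2, g = 2.0, 1.0, 9.81
a = -(m1 - m2) / (m1 + m2) * g      # constant acceleration of the generalized coordinate x
x0, t0 = 1.0, 0.0

x, v, t, dt = x0, 0.0, t0, 1e-4
while t < 1.0:
    v += a * dt                      # semi-implicit Euler step
    x += v * dt
    t += dt

exact = x0 - (m1 - m2) / (m1 + m2) * g / 2 * (t - t0) ** 2
assert abs(x - exact) < 1e-2
```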
    F_j(x1, ..., x_N, t) = 0,   j = 1, ..., 3N − f.          (4.11)

    L(q, q̇, t) ≡ L(x1(q), ..., x_N(q), ẋ1(q, q̇), ..., ẋ_N(q, q̇), t).

In practice, this amounts to a substitution of x_i(t) = x_i(q(t)) into the original La-
grangian. The effective Lagrangian is a function L : V × R^f × R → R, (q, q̇, t) 7→
L(q, q̇, t).
The prescription above is equivalent to the following statement, which is known as Hamilton's
principle: consider the set of curves q : I → V at given initial and final configuration q(t0) = q0
and q(t1) = q1. The curve describing the physical motion of the system is an extremal curve
of the action functional

    S[q] = ∫_{t0}^{t1} dt L(q, q̇, t).
To conclude this section, let us transfer a number of important physical quantities from
Newtonian mechanics to the more general framework of Lagrangian mechanics: the general-
ized momentum associated to a generalized coordinate qi is defined as
pi = ∂L(q, q̇, t)/∂q̇i. (4.13)
Notice that for cartesian coordinates, pi = ∂q̇i L = ∂q̇i T = mq̇i reduces to the familiar
momentum variable. We call a coordinate qi a cyclic variable if it does not enter the
Lagrangian function, that is if ∂qi L = 0. These two definitions imply that the generalized momentum of a cyclic variable is conserved. This follows from dt pi = dt ∂q̇i L = ∂qi L = 0. In general, we call the derivative ∂qi L ≡ Fi
a generalized force. In the language of generalized momenta and forces, the Lagrange
equations (formally) assume the form of Newton-like equations,
dt pi = Fi.
For the convenience of the reader, the most important quantities revolving around the La-
grangian formalism are summarized in table 4.1.
hs : V → V,
q ↦ hs(q). (4.14)
S[hs ◦ q] = S[q]

for all s and curves q. The action is then said to be invariant under the symmetry transformation.
Suppose we have found a symmetry and its representation in terms of a family of invariant transformations. Associated to that symmetry there is a quantity that is conserved
during the dynamical evolution of the system. The correspondence between symmetry and its
conservation law is established by a famous result due to Emmy Noether:
Noether theorem (1915-18): Let the action S[q] be invariant under the transformation q ↦ hs(q). Let q(t) be a solution curve of the system (a solution of the Euler–Lagrange equations). Then, the quantity

I(q, q̇) ≡ Σ_{i=1}^{f} pi (d/ds)(hs(q))i (4.15)
is dynamically conserved:
dt I(q, q̇) = 0.
Here, pi = ∂q̇i L|_{(q,q̇)=(hs(q),ḣs(q))} is the generalized momentum of the ith coordinate,
and the quantity I(q, q̇) is known as the Noether momentum.
a solution curve samples during a time interval [t0 , tf ]. Here, tf is a free variable and the
transformed value hs (q(tf )) may differ from q(tf ). We next explore what the condition of
action invariance tells us about the Lagrangian of the theory:
0 = ds ∫_{t0}^{tf} dt L(hs(q), ḣs(q), t)
  = ∫_{t0}^{tf} dt [ ∂qi L|_{q=hs(q)} ds(hs(q))i + ∂q̇i L|_{q̇=ḣs(q)} ds(ḣs(q))i ]
  = ∫_{t0}^{tf} dt (∂qi L − dt ∂q̇i L)|_{(hs(q),ḣs(q))} ds(hs(q))i + [ ∂q̇i L|_{q̇=ḣs(q)} ds(hs(q))i ]_{t0}^{tf},

where summation over the repeated index i is implied.
Now, our reference curve is a solution curve, which means that the integrand in the last line
vanishes. We are thus led to the conclusion that [ ∂q̇i L|_{q̇=ḣs(q)} ds(hs(q))i ]_{t0}^{tf} = 0 for all values of tf, and this is equivalent to the constancy of the Noether momentum.
It is often convenient to evaluate both the symmetry transformation hs and the Noether momentum I = I(s) at infinitesimal values of the control parameter s. In practice, this means that we fix a pair (q, q̇) comprising coordinate and velocity of a solution curve. We then consider an infinitesimal transformation hε(q), where ε is infinitesimally small. The
Noether momentum is then obtained as
I(q, q̇) = Σ_{i=1}^{f} pi (d/dε)|_{ε=0} (hε(q))i. (4.16)
4.2.5 Examples
In this section, we will discuss two prominent examples of symmetries and their conservation
laws.
EXERCISE Generalize the construction above to a system of N particles in the absence of external
potentials. The (interaction) potential of the system then depends only on coordinate differences
qi − qj. Show that translation in arbitrary directions is a symmetry of the system and that this implies the conservation of the total momentum P = Σ_{j=1}^{N} mj q̇j.
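The conservation law claimed in the exercise is easy to probe numerically. The sketch below assumes a specific harmonic interaction U(q1 − q2) = (k/2)(q1 − q2)² in one dimension, with illustrative masses and initial data; it is an illustration, not the exercise's solution:

```python
# Two particles whose interaction depends only on q1 - q2 conserve the
# total momentum P = m1*v1 + m2*v2. All numbers are illustrative assumptions.
m1, m2, k, dt = 1.0, 3.0, 4.0, 1e-4
q1, q2, v1, v2 = 0.0, 1.0, 0.5, -0.2
P0 = m1 * v1 + m2 * v2          # total momentum at t = 0

for _ in range(20_000):
    f = -k * (q1 - q2)          # force on particle 1; particle 2 feels -f
    v1 += f / m1 * dt
    v2 += -f / m2 * dt
    q1 += v1 * dt
    q2 += v2 * dt

drift = abs(m1 * v1 + m2 * v2 - P0)   # stays at machine-precision level
```

Because the two velocity updates use equal and opposite forces, the total momentum is preserved exactly at every step, up to floating-point rounding.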
where the angle φ serves as a control parameter (i.e. φ assumes the role of the parameter s
above), and
R_φ^(3) ≡ (  cos(φ)   sin(φ)   0
            −sin(φ)   cos(φ)   0
              0         0      1 )
is an O(3)–matrix generating rotations by the angle φ around the 3–axis. The infinitesimal
variant of a rotation is described by

R_ε^(3) ≡ (  1   ε   0
            −ε   1   0
             0   0   1 ) + O(ε²).
I(q, q̇) = (∂L/∂q̇i) (d/dε)|_{ε=0} (R_ε^(3) q)_i = m(q̇1 q2 − q̇2 q1).
This is (the negative of) the 3-component of the particle's angular momentum. Since the choice of the 3–axis as a reference axis was arbitrary, we have established the result that all components of the angular momentum of a rotationally invariant system are conserved.
Now, we have argued above that in cases with symmetries, one should employ adjusted co-
ordinates. Presently, this means coordinates that are organized around the 3–axis: spherical,
or cylindrical coordinates. Choosing cylindrical coordinates for definiteness, the Lagrangian
assumes the form
L(r, φ, z, ṙ, φ̇, ż) = (m/2)(ṙ² + r²φ̇² + ż²) − U(r, z), (4.17)
where we noted that the Lagrangian of a rotationally invariant system does not depend on φ.
The symmetry transformation now simply acts by translation, hs (r, z, φ) = (r, z, φ + s), and
the Noether current is given by
I ≡ l3 = ∂L/∂φ̇ = m r² φ̇. (4.18)
EXAMPLE Let us briefly extend the discussion above to re-derive the symmetry optimized representation of a particle in a central potential. We choose cylindrical coordinates, such that (a) the force center lies at the origin, and (b) at time t = 0, both q and q̇ lie in the z = 0 plane. Under these conditions, the motion will stay in the z = 0 plane (exercise: show this from the Euler–Lagrange equations), and the cylindrical coordinates (r, φ, z) can be reduced to the polar coordinates (r, φ) of the invariant z = 0 plane.
L(r, ṙ) = (m/2) ṙ² − l²/(2mr²) − U(r).

The solution of its Euler–Lagrange equation,

m r̈ = −∂r U + l²/(mr³),
has been discussed in section *
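A quick numerical illustration of the conservation law behind this reduction: integrating planar motion in an assumed harmonic central potential U(r) = k r²/2 (all numbers illustrative), the combination l3 = m(q1 q̇2 − q2 q̇1) = m r² φ̇ stays constant along the trajectory.

```python
# Angular-momentum conservation for planar motion in a central potential.
# The potential U(r) = k r^2 / 2 and all initial data are illustrative assumptions.
m, k, dt = 1.0, 2.0, 1e-4
x, y, vx, vy = 1.0, 0.0, 0.0, 0.7
l3 = m * (x * vy - y * vx)      # initial angular momentum m r^2 φ̇

for _ in range(50_000):
    # central force F = -k (x, y), always parallel to the position vector
    vx += -k * x / m * dt
    vy += -k * y / m * dt
    x += vx * dt
    y += vy * dt

drift = abs(m * (x * vy - y * vx) - l3)
```

Since the force is parallel to the position vector at every step, the discrete updates leave x·vy − y·vx unchanged up to rounding, mirroring the Noether argument.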
Chapter 5
Hamiltonian mechanics
The Lagrangian approach has introduced a whole new degree of flexibility into mechanical
theory building. Still, there is room for further development. To see how, notice that in the
last chapter we have consistently characterized the state of a mechanical system in terms of
the 2f variables (q1 , . . . , qf , q̇1 , . . . , q̇f ), the generalized coordinates and velocities. It is clear
that we need 2f variables to specify the state of a mechanical system. But are coordinates
and velocities necessarily the best variables? The answer is: often yes, but not always. In
our discussion of symmetries in section 4.2.3 above, we have argued that in problems with
symmetries, one should work with variables qi that transform in the simplest possible way (i.e.
additively) under symmetry transformations. If so, the corresponding momentum pi = ∂q̇i L is
conserved. Now, if it were possible to express the velocities uniquely in terms of coordinates
and momenta, q̇ = q̇(q, p), we would be in possession of an alternative set of variables (q, p)
such that in a situation with symmetries a fraction of our variables stays constant. And that’s
the simplest type of dynamics a variable can show.
In this chapter, we show that the reformulation of mechanics in terms of coordinates and momenta as independent variables is an option. The resulting description is called Hamiltonian mechanics. Important features of this new approach include:
. It exhibits a maximal degree of flexibility in the choice of problem adjusted coordinates.
. The variables (q, p) live in a mathematical space, the so-called "phase space", which carries
a high degree of mathematical structure (much more than an ordinary vector space).
This structure turns out to be of invaluable use in the solution of complex problems.
Relatedly,
. Hamiltonian theory is the method of choice in the solution of advanced mechanical
problems. For example, the theory of (conservative) chaotic systems is almost exclusively
formulated in the Hamiltonian approach.
. Hamiltonian mechanics is the gateway into quantum mechanics. Virtually all concepts
introduced below have a direct quantum mechanical extension.
However, this does not mean that Hamiltonian mechanics is "better" than the Lagrangian theory. The Hamiltonian approach has its specific advantages, but in some cases it
may be preferable to stay on the level of the Lagrangian theory. At any rate, Lagrangian
and Hamiltonian mechanics form a pair that represents the conceptual basis of many modern
theories of physics, not just mechanics. For example, electrodynamics (classical and quantum)
can be formulated in terms of a Lagrangian and a Hamiltonian theory, and these formulations
are strikingly powerful.
f(x) = C exp(x). With z ≡ ∂x f(x) = C exp(x), i.e. x(z) = ln(z/C), the naive substitution yields g(z) ≡ f(x(z)) = z. The function g no longer knows about C, so information on this constant has been irretrievably lost. (Knowledge of g(z) is not sufficient to reconstruct f(x).)
However, it turns out that a slight extension of the above idea does the trick. Namely, consider the so-called Legendre transform of f(x),

g(z) ≡ f(x(z)) − z x(z).
We claim that g(z) defines a transformation of functions. To verify this, we need to show
that the function f (x) can be obtained from g(z) by some suitable inverse transformation.
It turns out that (up to a harmless sign change) the Legendre transform is self-inverse; just
apply it once more and you get back to the function f. Indeed, let us define the variable y ≡ ∂z g(z) = ∂x f(x(z)) ∂z x(z) − z ∂z x(z) − x(z). Now, by definition of the function z(x) we have ∂x f(x(z)) = z(x(z)) = z. Thus, the first two terms cancel, and we have y(z) = −x(z).
We next compute the Legendre transform of g(z):
h(y) = f(x(z(y))) − x(z(y)) z(y) − z(y) y.

Evaluating the relation y(z) = −x(z) on the specific argument z(y), we get y = y(z(y)) = −x(z(y)), i.e. f(x(z(y))) = f(−y), and the last two terms in the definition of h(y) are seen to cancel. We thus obtain
h(y) = f (−y),
the Legendre transform is (almost) self–inverse, and this means that by passing from f (x) to
g(z) no information has been lost.
The abstract definition of the Legendre transform with its nested variable dependencies can
be somewhat confusing. However, when applied to concrete functions, the transform is actually
easy to handle. Let us illustrate this on the example considered above: with f (x) = C exp(x),
and x = ln(z/C), we get
g(z) = C exp(ln(z/C)) − ln(z/C)z = z(1 − ln(z/C)).
(Notice that g(z) “looks” very different from f (x), but this need not worry us.) Now, let us
apply the transform once more: define y = ∂z g(z) = − ln(z/C), or z(y) = C exp(−y). This
gives
h(y) = C exp(−y)(1 + y) − C exp(−y)y = C exp(−y),
in accordance with the general result h(y) = f (−y).
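The example is easily checked numerically; the sketch below reuses the closed forms derived above (the value of C is an arbitrary choice):

```python
# Numerical check of the worked example: for f(x) = C exp(x) the Legendre
# transform is g(z) = z (1 - ln(z/C)), and transforming once more gives
# h(y) = f(-y). The constant C is an arbitrary illustrative choice.
import math

C = 2.0
f = lambda x: C * math.exp(x)
g = lambda z: z * (1.0 - math.log(z / C))   # first Legendre transform (from the text)

# second transform: y = g'(z) = -ln(z/C)  =>  z(y) = C exp(-y)
z_of_y = lambda y: C * math.exp(-y)
h = lambda y: g(z_of_y(y)) - y * z_of_y(y)  # h(y) = g(z(y)) - y z(y)

max_err = max(abs(h(y) - f(-y)) for y in (-1.0, 0.0, 0.5, 2.0))
```

The agreement is exact up to floating-point rounding, confirming that no information about C was lost in the first transform.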
The Legendre transform of multivariate functions f(x) is obtained by application of the rule to all variables: compute zi ≡ ∂xi f(x). Next construct the inverse x = x(z). Then define

g(z) = f(x(z)) − Σ_i zi xi(z). (5.2)
with the identification x ↔ q̇ and z ↔ p. (Notice that the coordinates, q, themselves play the role of spectator variables. The Legendre transform is in the variables q̇!) Let us now formulate the few steps it takes to pass to the Legendre transform of the Lagrangian:
3. Define a function

H(q, p, t) ≡ Σ_i pi q̇i(q, p, t) − L(q, q̇(q, p, t), t). (5.4)
Technically, this is the negative of L’s Legendre transform. Of course, this function
carries the same information as the Legendre transform itself.
The function H is known as the Hamiltonian function of the system. It is usually called "Hamiltonian" for short (much like the Lagrangian function is called "Lagrangian"). The notation above emphasizes the variable dependencies of the quantities q̇i, pi, etc. One usually keeps the notation more compact, e.g. by writing H = Σ_i pi q̇i − L. However, it is important to remember that in both L and Σ_i q̇i pi, the variable q̇i has to be expressed as a function of q and p. It doesn't make sense to write down formulae such as H = . . . if the right hand side contains q̇i's as fundamental variables!
Hamilton equations
Now we need to do something with the Hamiltonian function. Our goal will be to transcribe
the Euler–Lagrange equations to equations defined in terms of the Hamiltonian. The hope
then is, that these new equations contain operational advantages over the Euler–Lagrange
equations.
Now, the Euler-Lagrange equations probe changes (derivatives) of the function L. It will
therefore be a good idea, to explore what happens if we ask similar questions to the function
H. In doing so, we should keep in mind that our ultimate goal is to obtain unambiguous
information on the evolution of particle trajectories. Let us then compute
∂qi H(q, p, t) = pj ∂qi q̇j (q, p) − ∂qi L(q, q̇(q, p)) − ∂q̇j L(q, q̇(q, p))∂qi q̇j (q, p).
The first and the third term cancel, because ∂q̇j L = pj. Now, if the curve q(t) is a solution curve, the middle term equals −dt ∂q̇i L = −dt pi. We thus conclude that the (q, p)-representation of solution curves must obey the equation

ṗi = −∂qi H.
Similarly,
∂pi H(q, p, t) = q̇i(q, p) + pj ∂pi q̇j(q, p, t) − ∂q̇j L(q, q̇(q, p)) ∂pi q̇j(q, p, t) = q̇i(q, p),

where the last two terms cancel, again because ∂q̇j L = pj; this is the first Hamilton equation, q̇i = ∂pi H. Finally,
∂t H(q, p, t) = pi ∂t q̇i(q, p, t) − ∂q̇i L(q, q̇(q, p, t)) ∂t q̇i(q, p, t) − ∂t L(q, q̇, t) = −∂t L(q, q̇, t),

where the time derivative on the r.h.s. acts only on the third argument of L( . , . , t).
Let us summarize where we are: The solution curve q(t) of a mechanical system de-
fines a 2f –dimensional “curve”, (q(t), q̇(t)). Eq. (5.3) then defines a 2f –dimensional curve
(q(t), p(t)). The invertibility of the relation p ↔ q̇ implies that either representation faithfully
describes the curve. The derivation above then shows that
The solution curves (q, p)(t) of mechanical problems obey the so–called Hamilton
equations
q̇i = ∂pi H,
ṗi = −∂qi H. (5.6)
Some readers may worry about the numbers of variables employed in the description of curves:
in principle, a curve is uniquely represented by the f variables q(t). However, we have now
decided to represent curves in terms of the 2f –variables (q, p). Is there some redundancy
here? No there isn’t and the reason can be understood in different ways. First notice that in
Lagrangian mechanics, solution curves are obtained from second order differential equations in
time. (The Euler–Lagrange equations contain terms q̈i .) In contrast, the Hamilton equations
are first order in time.² The price to be paid for this reduction is the introduction of a second set of variables, p. Another way to understand the rationale behind the introduction of 2f
variables is by noting that the solution of the (2nd order differential) Euler–Lagrange equations requires the specification of 2f initial conditions.

1. Here it is important to be very clear about what we are doing: in the present context, q̇ is a variable in the Lagrange function. (We could also name it v or z, or whatever.) It is considered a free variable (no time dependence), unless the transformation q̇i = q̇i(q, p, t) becomes explicitly time dependent. Remembering the origin of this transformation, we see that this may happen if the function L(q, q̇, t) contains explicit time dependence. This happens, e.g., in the case of rheonomic constraints, or time dependent potentials.
2. Ordinary differential equations of nth order can be transformed to systems of n ordinary differential equations of first order. The passage L → H is an example of such an order reduction.
Either way, we need 2f boundary conditions, and hence 2f variables to uniquely specify the
state of a mechanical system.
EXAMPLE To make the new approach more concrete, let us formulate the Hamilton equations
of a point particle in cartesian coordinates. The Lagrangian is given by
L(q, q̇, t) = (m/2) q̇² − U(q, t),
where we allow for time dependence of the potential. Thus
pi = m q̇i,

which is inverted as q̇i(p) = pi/m. This leads to
H(q, p, t) = Σ_{i=1}^{3} pi (pi/m) − L(q, p/m, t) = p²/(2m) + U(q, t).
The Hamilton equations assume the form

q̇ = p/m,
ṗ = −∂q U(q, t).
Further, ∂t H = −∂t L = ∂t U inherits its time dependence from the time dependence of the
potential. The two Hamilton equations are recognized as a reformulation of the Newton equation.
(Substituting the time derivative of the first equation, q̈ = ṗ/m, into the second we obtain the
Newton equation mq̈ = −∂q U .)
Notice that the Hamiltonian function H = p²/(2m) + U(q, t) equals the energy of the particle! In the next
section, we will discuss this connection in more general terms.
EXAMPLE Let us now solve these equations for the very elementary example of the one-dimensional harmonic oscillator. In this case, V(q) = (mω²/2) q² and the Hamilton equations assume the form

q̇ = p/m,
ṗ = −mω² q.
For a given initial configuration x(0) = (q(0), p(0)), these equations afford the unique solution⁴

q(t) = q(0) cos ωt + (p(0)/mω) sin ωt,
p(t) = p(0) cos ωt − mω q(0) sin ωt. (5.7)
3. But notice that conditions of this type do not always uniquely specify a solution – think of examples!

4. Formally, this follows from a result of the theory of ordinary differential equations: a system of n first order differential equations ẋi = fi(x1, . . . , xn), i = 1, . . . , n affords a unique solution, once n initial conditions xi(0) have been specified.
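One can verify numerically that the closed-form solution (5.7) satisfies both Hamilton equations and conserves the energy. The sketch below uses illustrative mass, frequency, and initial data, and approximates time derivatives by central differences:

```python
# Check that the oscillator solution (5.7) obeys q̇ = p/m, ṗ = -m w^2 q
# and conserves H = p^2/(2m) + (m w^2/2) q^2. All numbers are illustrative.
import math

m, w = 1.5, 2.0
q0, p0 = 0.3, -0.4

q = lambda t: q0 * math.cos(w * t) + p0 / (m * w) * math.sin(w * t)
p = lambda t: p0 * math.cos(w * t) - m * w * q0 * math.sin(w * t)
H = lambda q_, p_: p_ ** 2 / (2 * m) + 0.5 * m * w ** 2 * q_ ** 2

eps = 1e-6
errs = []
for t in (0.0, 0.7, 2.3):
    qdot = (q(t + eps) - q(t - eps)) / (2 * eps)   # finite-difference q̇(t)
    pdot = (p(t + eps) - p(t - eps)) / (2 * eps)   # finite-difference ṗ(t)
    errs.append(abs(qdot - p(t) / m))              # first Hamilton equation
    errs.append(abs(pdot + m * w ** 2 * q(t)))     # second Hamilton equation
    errs.append(abs(H(q(t), p(t)) - H(q0, p0)))    # energy conservation
max_err = max(errs)
```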
where U is the N–body potential and mj the mass of the jth particle. Proceeding as in the example on p. 104, we readily find
H(q, p) = Σ_{j=1}^{N} pj²/(2mj) + U(q1, . . . , qN). (5.8)
The expression on the r.h.s. we recognize as the energy of the system. We are thus led to the following important conclusion:

The Hamiltonian H = T + U is given by the sum of potential energy, U(q, t), and kinetic energy, T(q, p, t), expressed as a function of coordinates and momenta.

However, our discussion above implies another very important corollary. It has shown that
as the new fundamental variables of the theory. Given a coordinate system, we may think of x as an element of a 2f–dimensional vector space. However, all that has been said in section
4.1.3 about the curves of mechanical systems and their coordinate representations carries over
to the present context: in abstract terms, the pair “(configuration space points, momenta)”
defines a mathematical space known as phase space, Γ. The formulation of coordinate
invariant descriptions of phase space is a subject beyond the scope of the present course.⁵ However, locally it is always possible to parameterize Γ in terms of coordinates, and to describe

5. Which means that Γ has the status of a 2f–dimensional manifold.
its elements through 2f –component objects such as (5.9). This means that, locally, phase
space can be identified with a 2f –dimensional vector space. However, different coordinate
representations correspond to different coordinate vector spaces, and sometimes it is important to recall that the identification (phase space) ↔ (vector space) may fail globally.
Keeping the above words of caution in mind, we will temporarily identify phase space Γ with
a 2f –dimensional vector space. We will soon see that this space contains a very particular sort
of “scalar product”. This additional structure makes phase space a mathematical object far
more interesting than an ordinary vector space. Let us start with a rewriting of the Hamilton
equations. The identification xi = qi and xf +i = pi , i = 1, . . . , f enables us to rewrite
Eq. (5.6) as,
ẋi = ∂x_{i+f} H,
ẋ_{i+f} = −∂x_i H,

or, in compact notation,

ẋ = I ∂x H, (5.11)

where the 2f × 2f matrix I has the block form

I = (  0    1f
     −1f    0  )

and 1f is the f-dimensional unit matrix. The matrix I is sometimes called the symplectic unity.
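For f = 1 the symplectic form of the equations can be spelled out concretely. The following sketch (oscillator Hamiltonian and all numbers are illustrative assumptions) applies the matrix I to the gradient ∂x H and recovers the component equations (q̇, ṗ):

```python
# Hamilton's equations in symplectic form, ẋ = I ∂x H, spelled out for f = 1
# with the oscillator H = p^2/(2m) + (m w^2/2) q^2. Numbers are illustrative.
m, w = 1.0, 2.0
q, p = 0.3, 0.5
I = [[0.0, 1.0],
     [-1.0, 0.0]]                       # symplectic unity: blocks (0, 1f; -1f, 0)
gradH = [m * w * w * q, p / m]          # ∂x H = (∂q H, ∂p H)
xdot = [sum(I[i][j] * gradH[j] for j in range(2)) for i in range(2)]
# xdot[0] = ∂p H = p/m = q̇ ;  xdot[1] = -∂q H = -m w^2 q = ṗ
```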
Figure 5.1: Visualization of the Hamiltonian vector field and its flow lines
that the solution curve x(t) is tangent to that vector, ẋ(t) = XH(x(t)). One may visualize the
situation in terms of the streamlines of a fluid. Within that analogy, the value XH (x) is a
measure of the local current flow. If one injected a drop of colored ink into the fluid, its trace
would be a representation of the curve x(t).
For a general vector field v : U → RN , x 7→ v(x), where U ⊂ RN , one may define a
parameter–dependent map
Φ : U × R → U,
(x, t) 7→ Φ(x, t), (5.15)
through the condition ∂t Φ(x, t) = v(Φ(x, t)). The map Φ is called the flow of the vector field v. Specifically, the flow of the Hamiltonian vector field XH is defined by the prescription

Φ(x, t) ≡ x(t), (5.16)

where x(t) is the solution curve with initial condition x(t = 0) = x. Equation (5.16) is a proper
definition because for a given x(0) ≡ x, the solution x(t) is uniquely defined. Further,
∂t Φ(x, t) = dt x(t) = XH (x(t)) satisfies the flow condition. The map Φ(x, t) is called the
Hamiltonian flow.
There is a corollary to the uniqueness of the curve x(t) emanating from a given initial condition x(0): phase space curves never cross.
For if they did, the crossing point would be the initial point of the two out-going stretches
of the curve, and this would be in contradiction to the uniqueness of the solution for a given
initial configuration. There is, however, one subtle caveat to the statement above: although
phase space curves do not cross, they may actually touch each other in common terminal
points. (For an example, see the discussion in section 5.2.3 below.)
Phase space portraits give a far–reaching impression of the motion of a point particle in the potential landscape.
Specifically, notice the threshold energy E2 corresponding to the local potential maximum. At
first sight, the corresponding phase space curve appears to violate the criterion of the non-existence of crossing points (cf. the "critical point" at p = 0 beneath the potential maximum).
In fact, however, this isn’t a crossing point. Rather the two incoming curves “terminate” at
this point: coming from the left, or right it takes infinite time for the particle to climb the
potential maximum.
Eventually it will rest at the maximum in an unstable equilibrium position. Similarly, the out-
going trajectories “begin” at this point. Starting at zero velocity (corresponding to zero initial
momentum), it takes infinite time to accelerate and move downhill. In this sense, the curves
touching (not crossing!) in the critical point represent idealizations that are never actually
realized. Trajectories of this type are called separatrices. Separatrices are important in that
they separate phase space into regions of qualitatively different type of motion (presently,
curves that make it over the hill and those which do not.)
EXERCISE Take a few minutes to familiarize yourself with the phase space portrait and to learn
how to “read” such representations. Sketch the periodic potential V (q) = cos(2πq/a), a = const.
and its phase space portrait.
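The classification suggested by the exercise can be sketched programmatically: comparing the conserved energy of an initial condition with the potential maximum decides between trapped motion inside a well ("libration") and motion that runs over the hills ("rotation"); the separatrix sits at E = max V. All numbers below are illustrative assumptions:

```python
# Reading the phase space portrait of V(q) = cos(2πq/a): trajectories with
# E > Vmax pass over the potential maxima, those with E < Vmax are trapped.
# Mass, a, and the sample initial conditions are illustrative.
import math

a, m = 1.0, 1.0
V = lambda q: math.cos(2 * math.pi * q / a)
Vmax = 1.0                                # separatrix energy for this potential

def motion_type(q0, p0):
    E = p0 * p0 / (2 * m) + V(q0)         # conserved energy of the initial point
    return "rotation" if E > Vmax else "libration"

low = motion_type(0.5, 0.1)    # small kick at a potential minimum
high = motion_type(0.5, 3.0)   # large momentum: passes over the maxima
```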
where the term ∂t g accounts for an optional explicit time dependence of g. The characteristic combination of derivatives governing this expression appears frequently in Hamiltonian mechanics:
Figure 5.2: Phase space portrait of a "non–trivial" one-dimensional potential. Notice the "critical point" at the threshold energy E2.
{f, g} ≡ Σ_{i=1}^{f} ( ∂f/∂pi ∂g/∂qi − ∂f/∂qi ∂g/∂pi ). (5.17)
This expression is called the Poisson bracket of two phase space functions f and g. In the invariant notation of phase space coordinates x, it assumes the form

{f, g} = −∂x f · (I ∂x g),

where I is the symplectic unit matrix defined above. The time evolution of functions may be concisely expressed in terms of the Poisson bracket of the two functions H and g:

dt g = {H, g} + ∂t g. (5.19)
Notice that the Poisson bracket operation bears similarity to a scalar product of functions.
Let CΓ be the space of smooth (and integrable) functions in phase space. Then
{ , } : CΓ × CΓ → R,
(f, g) 7→ {f, g}, (5.20)
defines a scalar product. This scalar product comes with a number of characteristic algebraic
properties:
. {f, g} = −{g, f} (antisymmetry),
. {f, c g + c′ h} = c {f, g} + c′ {f, h} (linearity),
. {c, g} = 0,
. {f, g h} = {f, g} h + g {f, h} (product rule),
. {f, {g, h}} + {h, {f, g}} + {g, {h, f}} = 0 (Jacobi identity),

where c, c′ ∈ R are constants. The first three of these relations are immediate consequences of the definition, and the fourth follows from the product rule of differentiation. The proof of the Jacobi identity amounts to a straightforward if tedious exercise in differentiation.
At this point, the Poisson bracket appears to be little more than convenient notation.
However, as we go along, we will see that it encapsulates important information on the
mathematical structure of phase space.
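Relation (5.19) can be probed numerically: the sketch below evaluates the Poisson bracket (5.17) by central differences and compares {H, g} with the time derivative of g along an oscillator solution. The sample function g = qp and all numbers are arbitrary illustrative choices:

```python
# Finite-difference check of dt g = {H, g} (no explicit time dependence of g)
# along an exact oscillator solution. Sample function and numbers are arbitrary.
import math

m, w = 1.0, 2.0
H = lambda q, p: p * p / (2 * m) + 0.5 * m * w * w * q * q
g = lambda q, p: q * p

eps = 1e-5
def poisson(f1, f2, q, p):
    """{f1, f2} = ∂p f1 ∂q f2 - ∂q f1 ∂p f2, via central differences (Eq. 5.17)."""
    dq = lambda f: (f(q + eps, p) - f(q - eps, p)) / (2 * eps)
    dp = lambda f: (f(q, p + eps) - f(q, p - eps)) / (2 * eps)
    return dp(f1) * dq(f2) - dq(f1) * dp(f2)

# exact oscillator solution through (q0, p0)
q0, p0 = 0.5, 0.3
qt = lambda t: q0 * math.cos(w * t) + p0 / (m * w) * math.sin(w * t)
pt = lambda t: p0 * math.cos(w * t) - m * w * q0 * math.sin(w * t)

t, ht = 0.4, 1e-5
dtg = (g(qt(t + ht), pt(t + ht)) - g(qt(t - ht), pt(t - ht))) / (2 * ht)
err = abs(dtg - poisson(H, g, qt(t), pt(t)))
```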
S : M → R,
x ↦ S[x] = ∫_{t0}^{t1} dt ( Σ_{i=1}^{f} pi q̇i − H(q, p, t) ). (5.21)
We now claim that the extremal curves of this functional are but the solutions of the Hamilton equations (5.6). To see this, define the function F(x, t) ≡ Σ_{i=1}^{f} pi q̇i − H(q, p, t), in terms of which S[x] = ∫_{t0}^{t1} dt F(x, t). Now, according to the general discussion of section 4.1.2, the extremal curves are solutions of the Euler–Lagrange equations
Following our general paradigm of valuing “form invariant equations”, we deem a coordi-
nate transformation “canonical”, if the following condition is met:
A diffeomorphism x ↦ X(x) defines a canonical transformation if there exists a function H′(X, t) such that the representation of solution curves (i.e. solutions of the equations ẋ = I ∂x H) in the new coordinates X = X(x) solves the "transformed Hamilton equations"

Ẋ = I ∂X H′(X, t). (5.22)
Now, this definition may leave us a bit at a loss: it does not tell how the "new" Hamiltonian H′ relates to the old one, H, and in this sense seems to be strangely "under–defined". However,
before exploring the ways by which canonical transformations can be constructively obtained,
let us take a look at possible motivations for switching to new coordinates.
conserved. Within the framework of phase space dynamics, this means that there is a function I(x) such that dt I(x(t)) = 0, where x(t) is a solution curve. We may think of I(x) as a function that is constant along the Hamiltonian flow lines.
Now, suppose we have s ≤ f symmetries and, accordingly, s conserved functions Ii, i = 1, . . . , s. Further assume that we find a canonical transformation to phase space coordinates x ↦ X = (Q, P), where P = (I1, . . . , Is, Ps+1, . . . , Pf) and Q = (φ1, . . . , φs, Qs+1, . . . , Qf). In words: the first s of the new momenta are the functions Ii. Their conjugate configuration space coordinates are called φi.⁸ The new Hamiltonian will be some function of the new coordinates and momenta, H′(φ1, . . . , φs, Qs+1, . . . , Qf, I1, . . . , Is, Ps+1, . . . , Pf). Now let's
take a look at the Hamilton equations associated to the variables (φi, Ii):

dt φi = ∂Ii H′,
dt Ii = −∂φi H′ = 0.

The second equation tells us that H′ must be independent of the variables φi:

∂H′/∂φi = 0.

At this point, it should have become clear that coordinate sets containing conserved momenta, Ii, and their conjugate coordinates, φi, are tailored to the optimal formulation of mechanical problems.
The power of these coordinates becomes fully evident in the extreme case where s = f ,
i.e. where we have as many conserved quantities as degrees of freedom. In this case
H′ = H′(I1, . . . , If),

and the Hamilton equations assume the form

dt φi = ∂Ii H′ ≡ ωi,
dt Ii = −∂φi H′ = 0.

These equations can now be trivially solved:⁹

φi(t) = ωi t + αi,
Ii(t) = Ii = const.
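For the harmonic oscillator these statements can be checked explicitly: with the conserved quantity I = H/ω and a matching angle variable, I stays constant along the exact solution while the angle advances linearly, φ(t) = ωt + α. The sketch below is illustrative; in particular, the angle convention φ = atan2(mωq, p) is an assumption, one of several equivalent choices:

```python
# Action-angle check for the 1D oscillator: I = H/w is conserved and the
# angle advances linearly. Numbers and the angle convention are illustrative.
import math

m, w = 1.0, 3.0
q0, p0 = 0.4, 0.2
qt = lambda t: q0 * math.cos(w * t) + p0 / (m * w) * math.sin(w * t)
pt = lambda t: p0 * math.cos(w * t) - m * w * q0 * math.sin(w * t)

def action(q, p):
    # I = H / w for the harmonic oscillator
    return (p * p / (2 * m) + 0.5 * m * w * w * q * q) / w

def angle(q, p):
    # angle variable conjugate to I (one common convention, assumed here)
    return math.atan2(m * w * q, p)

I0, phi0 = action(q0, p0), angle(q0, p0)
t = 0.3
I_drift = abs(action(qt(t), pt(t)) - I0)                 # I is conserved
phi_advance = (angle(qt(t), pt(t)) - phi0) % (2 * math.pi)  # equals w*t
```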
8. It is natural to designate the conserved quantities as "momenta": in section 4.2.4 we saw that the conserved quantity associated to the "natural" coordinate of a symmetry (a coordinate transforming additively under the symmetry operation) is the Noether momentum of that coordinate.
9. The problem remains solvable even if H′(I1, . . . , If, t) contains explicit time dependence. In this case, φ̇i = ∂Ii H′(I1, . . . , If, t) ≡ ωi(t), where ωi(t) is a function of known time dependence. (The time dependence is known because the variables Ii are constant and we are left with the externally imposed dependence on time.) These equations – ordinary first order differential equations in time – are solved by φi(t) = ∫_{t0}^{t} dt′ ωi(t′).
A very natural guess would be H′(X, t) = H(x(X), t), i.e. the "old" Hamiltonian expressed in new coordinates. However, we will see that this ansatz can be too restrictive.
Progress with this situation is made once we remember that solution curves x are extrema
of an action functional S[x] (cf. section 5.2.5). This functional is the x–representation of an
“abstract” functional. The same object can be expressed in terms of X–coordinates (cf. our
discussion in section 4.1.3) as S′[X] = S[x]. The form of the functional S′ follows from the condition that the X–representation of the curve be a solution of the equations (5.22). This is
equivalent to the condition
S′[X] = ∫_{t0}^{t1} dt ( Σ_{i=1}^{f} Pi Q̇i − H′(Q, P, t) ) + const.,
where “const.” is a contribution whose significance we will discuss in a moment. Indeed, the
variation of S′[X] generates Euler–Lagrange equations which, as we saw in section 5.2.5, are just the Hamilton equations (5.22).