A First Course in Optics: Jan W. Thomsen

A First Course in Optics
Jan W. Thomsen
The Niels Bohr Institute
Spring 2010
Contents
1 Introduction
2 Maxwell’s Equations
2.1 General form of Maxwell’s Equations
2.2 Maxwell’s Equations in vacuum
2.2.1 Measurements of the speed of light
2.2.2 Solutions to the wave equation
2.3 Model of electron motion in materials
2.3.1 Rayleigh scattering of light
2.3.2 Energy of an EMW
2.3.3 Momentum of light
2.4 Maxwell’s Equations for an insulator
2.4.1 Simple model of the index of refraction
2.4.2 Real and complex index of refraction
2.4.3 Group and phase velocity
2.4.4 Dispersion viewed in ω, k-diagrams
2.5 Negative refractive index
2.6 Maxwell’s Equations for a conductor
2.6.1 Frequency dependent conductivity: Drude’s model
3 Propagation of light
3.1 Fermat’s principle
3.1.1 Solutions to Euler Lagrange equations
3.2 Snell’s law
3.2.1 Applications of Snell’s law
3.3 Fresnel’s laws of reflection
3.3.1 Application of Fresnel’s equations
3.3.2 Intensity relations
3.3.3 Metals
4 Geometrical optics
4.1 Optical elements
4.1.1 Aspherical surfaces
4.1.2 Imaging with spherical surfaces
4.1.3 Common spherical lens errors
4.2 Ray tracing
4.2.1 Thick lenses
4.2.2 Combination of two thin lenses
5 Polarization of light
5.1 Polarization states of monochromatic waves
5.1.1 Jones vectors
5.1.2 Mathematical description of light: Stokes vector
5.1.3 Optically active crystals
5.1.4 Maxwell’s equations for an anisotropic medium
5.1.5 Group velocity vg in anisotropic materials
5.2 Production and manipulation of polarized light
5.2.1 Polarizers
5.2.2 Wave plates and retarders
5.2.3 Optical activity, magneto-optical effect and Faraday-rotators
6 Interference of light
6.1 Coherence of light fields
6.1.1 Interference - general considerations
6.2 Young’s experiment
6.3 Thin films
6.4 Interferometers and their applications
7 Diffraction of light
7.1 Interference of N sources - the grating
7.2 The modified Huygens-Fresnel construction
7.3 Fraunhofer diffraction
7.4 Babinet’s principle
7.5 Rayleigh’s criterion for angular resolution
1
Introduction
Before studying any field one should begin by asking an important
question: why? Why study optics? Three good reasons:
• It is inherently beautiful.
• It is very important for present and future high-tech industry.
• It is the basis for many other physics disciplines.
The field of optics is one of the most fascinating and amazing topics
of science. It is visible and very appealing! Most of us perceive the world
with the incredibly powerful visual sense, so it is very natural to be cu-
rious about how vision and optics in general actually work. Yet many
elements of the theory are abstract and indeed complicated to imagine.
But in optics we can illustrate complicated theories by simple experi-
ments and demonstrations that support our understanding of light and
how it interacts with atoms and matter. Many beautiful optical phenom-
ena can be observed in nature, such as rainbows, halos, sun dogs, green
flashes etc. A good understanding of these phenomena is a must for any
physics student. On the front cover of these notes you can see a “camera
obscura” – people speculate that this is the oldest optical instrument
used already by cavemen early in the history of mankind.
The science of optics is one of the most important fields of physics
as it provides a firm basis for many other disciplines of science and
technology. In industry, telecommunication and optical fiber based data
transfer are becoming increasingly important. As an example, every day
the fiber industry makes fibers corresponding to a length that can reach
seven times around the planet. Any improvement in these systems will
have a tremendous impact on the economy and society. Here you can make
a difference!
If you wish to study any field of science or physics in detail you are
bound to come across the field of optics. That may be laser technology
in Atomic, Molecular and Optical physics (AMO), near-field imaging
microscopes in biophysics, optical simulation of LHC detectors,
gravitational wave detectors, or telescopes in astronomy. It is therefore
not our job to turn you into an optician. However, we will provide you
with a toolbox that you can hopefully use on your way through the science
landscape. In the future, many ways and things may come and go, but
Maxwell’s equations will never be abandoned.
This set of notes was prepared in connection with an optics class
taught at the Niels Bohr Institute in spring 2010. The class was divided
into a series of lectures, problem sessions and experimental classes. As a
prerequisite, some formal knowledge of Maxwell’s equations in
differential form was assumed. These notes are a living document and you
can help improve them by pointing out things that are unclear to you to
your teachers and instructors.
The notes are intended to help you to learn the ropes in optics, but
they cover by no means the full breadth of the field. A very useful supple-
ment is the splendid (alas, expensive) book “Optics” by Eugene Hecht,
which – even if somewhat old-fashioned – provides a very good overview
and many examples accessible to undergraduates. Check out your favorite
bookstore, ask older students, or browse the web to get your hands
on the 4th or 5th edition. For a quick brush-up on the use of vector
analysis in electromagnetism you should always have your copy of
Griffiths’ “Introduction to Electrodynamics” within reach.
2
Maxwell’s Equations
In this chapter we will study Maxwell’s four equations and their solutions
in three distinct cases. These four equations are truly a milestone in
human history. So enjoy the adventure you are about to begin, one that
you may keep exploring and thinking about for the rest of your life! After
a short general introduction, we begin with the simplest case, namely
vacuum, or empty space if you like, a medium with no charges or currents.
In the second case we look at the insulator, where the conductivity is
zero, and finally, we will study a conductor, such as a metal, which has a
finite conductivity.
2.1 General form of Maxwell’s Equations

In the year 1865 James Clerk Maxwell (1831-1879) published a paper
on eight equations today condensed to the well-known four Maxwell’s
equations. A number of physicists worked on his laws. Later in 1873 he
published “A Treatise on Electricity and Magnetism”, where the equa-
tions are condensed into the four equations as we know them today.
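$$ \nabla \cdot \mathbf{D} = \rho \tag{2.1} $$
$$ \nabla \cdot \mathbf{B} = 0 \tag{2.2} $$
$$ \nabla \times \mathbf{E} = -\frac{\partial \mathbf{B}}{\partial t} \tag{2.3} $$
$$ \nabla \times \mathbf{H} = \mathbf{J} + \frac{\partial \mathbf{D}}{\partial t} \tag{2.4} $$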
Figure 2.1 Portrait of James Clerk Maxwell.
These laws must be treated with a certain amount of respect. They tell
how fields are produced, what causes them and how the fields are
interconnected. The first law we call “Gauss’ law”, the second “no
magnetic monopoles”, the third “Faraday’s law”, and the fourth
“Ampère’s law” with Maxwell’s correction. Maxwell’s equations unify
electricity, magnetism and light. Gauss’ law expresses the electric field
in terms of its sources, the charges. The second equation expresses that
there are no magnetic “charges”: magnetic sources always come with both
a south and a north pole. Faraday’s law states that a time-varying
magnetic field produces a circulating electric field, i.e., an
electromotive force. Ampère’s law states how a magnetic field is related
to its sources: moving charges and a time-varying electric field. Perhaps
the most important law for our everyday life is Faraday’s law, which
governs electric motors, generation of electricity, electromagnets etc.
As you may have observed Maxwell’s equations do not contain ex-
plicitly any material specific parameters such as the permeability µ and
permittivity ε. Those are introduced through the constitutive relations
stated below and express how fields interact with matter. They also de-
fine how the auxiliary fields D and H are related to the fields E and B
appearing in the Lorentz force law. For simple homogeneous materials
in moderate fields we have the linear field relations:
D = εE (2.5)
B = µH. (2.6)
Below we list for reference the connection to free space or vacuum per-
mittivity ε0 and vacuum permeability µ0 .
ε = ε0 (1 + χ) (2.7)
µ = µ0 (1 + χm ) (2.8)
The constants χ and χm express how easy it is to polarize or magnetize
the material, respectively. Take water as an example, which is very easy
to polarize, especially at low frequencies. Open your water tap to get a
very thin stream and bring a balloon, which you have rubbed against
your shirt, close to it. You will observe that you can bend the water
stream by almost 90 degrees with the charged balloon. For the
displacement field we have
D = ε0 E + P = ε0 (1 + χ)E (2.9)
where the quantity P = ε0 χE represents the dielectric polarization of
the material (see footnote 1), i.e., the average electric dipole moment
per unit volume. Similarly, we have
B = µ0 (H + M) = µ0 (1 + χm )H (2.11)
where M = χm H is the magnetization, i.e., the average magnetic moment
per unit volume. However, in this course we will mainly consider
magnetically inactive materials. In this case it is a good approximation
to assume µ = µ0 , which we do unless otherwise specified.
Footnote 1: In intense fields the magnetization and the electric
polarization are no longer linear in the field. For example, for the
induced polarization one has:
$$ P(E) = P_0 + \varepsilon_0 \chi^{(1)} E + \varepsilon_0 \chi^{(2)} E^2 + \varepsilon_0 \chi^{(3)} E^3 + \cdots \tag{2.10} $$
where the first term is the polarization of the medium in the absence of
an externally applied electric field (often zero), the next term is linear
in the field, the third term quadratic in the field, etc. The non-linear
terms give rise to a huge amount of new physics. We call it non-linear
optics, a very rich field of optics with phenomena such as frequency
doubling (two photons of frequency ω combine into one photon of frequency
2ω), down conversion (one photon of frequency ω splits into two new
photons of frequencies ω′ and ω″), etc.
The two constants vacuum permittivity, or permittivity of free space
ε0 , and vacuum permeability or permeability of free space µ0 , can both
be measured in static experiments. Already in Maxwell’s time these were
relatively well known quantities.
$$ \varepsilon_0 = 8.85 \times 10^{-12}\ \mathrm{F/m} \tag{2.12} $$
$$ \mu_0 = 4\pi \times 10^{-7}\ \mathrm{H/m} \tag{2.13} $$
The units can also be expressed as [F/m] = [As/(Vm)] and
[H/m] = [N/A$^2$].
Problem 2.1 How would you measure ε0 and µ0 in a static experiment?
Example 2.2 The frequency dependence of the dielectric constant
The dielectric constant links the displacement field D and the electric
field E. We also call this link a constitutive relation, i.e., a relation
describing, in this case, how the electric field interacts with materials.
We have a similar equation for the B and H fields, as you have seen above.
Generally, the dielectric constant is a complex quantity expressed as:
$$ \varepsilon = \varepsilon' + i\varepsilon'', \tag{2.14} $$
and it is even frequency dependent. Why is the dielectric “constant”
described by a complex function? Well, in this way we can describe not
only how the field amplitude is affected, but also how the phase of the
induced polarization behaves compared to the externally applied field.
Polarization in a medium does not respond instantaneously to the applied
fields, but is accompanied by a delay linked to how fast charges inside
the medium move around. Think about a parallel plate capacitor driven
with a pure sine wave. Inside the capacitor we have a dielectric, say a
gas, water or a piece of plastic. As the voltage across the capacitor
switches, the electric charges in the atoms, molecules, etc. will move and
change the fields locally. But it takes time to move, rotate or displace
charges, so there will be a phase delay; that is what we describe by the
complex coefficient. The fact that charges move around and rearrange on a
certain time scale (not infinitely fast) is the key to understanding the
four regimes depicted in figure 2.2. Notice that not all four frequency
regimes, ionic, rotational, atomic and electronic, need to be present for
a given material. For some materials the rotational regime may dominate,
for others only the ionic part. Inside metals, things are a bit different.
A good conductor is easy to polarize, but no DC electric fields can exist
inside a good conductor. So at DC we would expect ε′ to be infinite (or
very large). However, at higher frequencies this is not the case. For
example, a good conductor (Ag, Au, Al, etc.) at optical frequencies
(∼ $10^{14}$ Hz) has a negative relative permittivity ε′ ∼ −5. This
reflects that electrons cannot follow infinitely fast; they have an
inertial mass limiting how fast they may be dragged around.
Figure 2.2 Typical behavior of the real and imaginary components ε′, ε″
of the complex dielectric constant as a function of frequency. Each
of the four frequency regions, ionic, rotational, atomic and electronic,
is specific to how charges move around in the medium and to the
time scale on which they move around. Light charges such as electrons
are easy to move (electronic), while the rotational and atomic regimes
involve atom cores or molecules being moved around on a significantly
slower timescale. The high frequency part is the one responsible for
the optical phenomena which we are going to study here. Note that as the
field frequency ω → ∞, ε′ → 1, as charges are no longer able to
follow the fast changing field.
Let’s return to dielectrics. At zero frequency, i.e., at DC, we will
measure the DC dielectric constant. As the frequency increases we observe
a decreasing dielectric constant. In the frequency range 1 to $10^6$ Hz
the response is dominated by an effect called space charge polarization.
Here the mobile charge carriers are obstructed and impeded by the granular
interfaces and structure, which makes it hard for charges to move around
freely. This you may find in plastics, porous materials or other solids.
At somewhat higher frequencies, $10^6$ to $10^9$ Hz, the polarizability is
dominated by dipolar effects. Think about a molecule such as water. Most
molecules have a permanent dipole moment. This dipole moment re-aligns in
the switching electric field and thus involves a rotation. The typical
time scale for changing the orientation of the molecule (set by its
moment of inertia and the torque) corresponds to a GHz frequency scale.
As the water molecule rotates it also collides with other molecules or
atoms. This reduces the effect of the polarization and finally sets the
scale of the polarizability. In the atomic regime, $10^9$ to $10^{12}$ Hz,
there is a displacement of negative and positive ions with respect to each
other. This may be a crystal such as NaCl, where negative and positive
charge centers move with respect to each other, but it can also happen
for a regular solid. Finally we consider the electronic polarization
regime, which arises from the displacement of the valence electron
relative to the atomic core. The electron is light and may be shifted
around fast, at $10^{14}$ Hz. This particular frequency part of the
dielectric constant is what this optics course is all about, and it gives
rise to a number of beautiful and interesting phenomena. Notice finally
how high frequency phenomena leave a trace in the low frequency response
of materials but not the other way around; this is a recurrent theme in
many branches of physics.
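The phase delay discussed in Example 2.2 can be made concrete with a small numerical sketch. It assumes a relative dielectric constant ε′ + iε″ and a field written with the time convention e^{−iωt}; the helper name and the sample values are illustrative assumptions and do not describe any real material.

```python
import cmath
import math

def polarization_response(eps_real, eps_imag):
    """Return (|chi|, phase lag in radians) for a relative dielectric
    constant eps' + i*eps'' (hypothetical helper, not from the notes).

    With E(t) = Re[E0 exp(-i w t)] and P = eps0 * chi * E, chi = eps_r - 1,
    the polarization is P(t) = eps0 |chi| E0 cos(w t - phi), phi = arg(chi),
    so a positive eps'' means that P lags behind E.
    """
    chi = complex(eps_real - 1.0, eps_imag)
    return abs(chi), cmath.phase(chi)

# Illustrative (made-up) values of eps' and eps''
for eps_real, eps_imag in [(3.0, 0.0), (3.0, 0.5), (2.0, 1.0)]:
    magnitude, lag = polarization_response(eps_real, eps_imag)
    print(f"eps' = {eps_real}, eps'' = {eps_imag}: "
          f"|chi| = {magnitude:.3f}, lag = {math.degrees(lag):.1f} deg")
```

A purely real dielectric constant gives no lag, while a growing imaginary part pushes the polarization further out of phase with the driving field.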
2.2 Maxwell’s Equations in vacuum
Maxwell’s equations in vacuum reduce to:

Vacuum
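$$ \nabla \cdot \mathbf{E} = 0 \tag{2.15} $$
$$ \nabla \cdot \mathbf{B} = 0 \tag{2.16} $$
$$ \nabla \times \mathbf{E} = -\frac{\partial \mathbf{B}}{\partial t} \tag{2.17} $$
$$ \nabla \times \mathbf{B} = \varepsilon_0 \mu_0 \frac{\partial \mathbf{E}}{\partial t}, \tag{2.18} $$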
as we have no sources. The material equations become particularly sim-
ple ε = ε0 , µ = µ0 . If we take the curl of Faraday’s law we get
$$ \nabla \times (\nabla \times \mathbf{E}) = -\frac{\partial (\nabla \times \mathbf{B})}{\partial t} \tag{2.19} $$
$$ -\nabla^2 \mathbf{E} = -\varepsilon_0 \mu_0 \frac{\partial^2 \mathbf{E}}{\partial t^2}. \tag{2.20} $$
In the last step we used Ampère’s law and the vector identity
∇ × (∇ × A) = ∇(∇ · A) − ∇²A together with Gauss’ law ∇ · E = 0.
Finally, we recognize the result as a wave equation for the E-field:
$$ \nabla^2 \mathbf{E} = \frac{1}{c^2} \frac{\partial^2 \mathbf{E}}{\partial t^2} \tag{2.21} $$
with a wave speed c given by c² = 1/(ε0 µ0). A similar derivation can be
carried out for the B-field:
$$ \nabla^2 \mathbf{B} = \frac{1}{c^2} \frac{\partial^2 \mathbf{B}}{\partial t^2}. \tag{2.22} $$
These two equations constitute the electromagnetic wave equations, and we
see that one field, say the E-field, cannot exist without the B-field and
vice versa. In the framework of special relativity the two fields are in
fact a single entity.
Problem 2.3 Obtain the wave equation for the B-field using Maxwell’s
equations.
The speed of light in vacuum we find as:
$$ c = \frac{1}{\sqrt{\varepsilon_0 \mu_0}} = 299\,792\,458\ \mathrm{m/s}. \tag{2.23} $$
In the International System of Units (SI) the speed of light in vacuum
is fixed to be exactly 299 792 458 m/s. This reflects our firm belief in
the postulate of special relativity that the speed of light is the same in
all reference systems and hence a fundamental constant of nature. As it
is defined, it makes no sense to remeasure it for a better value! Today,
the SI unit of length, the meter, is defined as the distance light travels
in vacuum in 1/299 792 458 of a second.
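As a quick numerical check of the relation between c, ε0 and µ0, the following sketch evaluates the wave speed from the rounded constants of equations (2.12) and (2.13). The variable names are illustrative; because ε0 is rounded to three digits, the result only agrees with the defined value to a few parts in ten thousand.

```python
import math

EPSILON_0 = 8.85e-12           # F/m, rounded value from equation (2.12)
MU_0 = 4.0 * math.pi * 1e-7    # H/m, value from equation (2.13)
C_DEFINED = 299_792_458.0      # m/s, the defined SI value

# Wave speed from the vacuum wave equation: c^2 = 1 / (eps0 * mu0)
c_estimate = 1.0 / math.sqrt(EPSILON_0 * MU_0)
relative_deviation = (c_estimate - C_DEFINED) / C_DEFINED

print(f"c from eps0 and mu0: {c_estimate:.4e} m/s")
print(f"relative deviation from the defined value: {relative_deviation:.1e}")
```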
2.2.1 Measurements of the speed of light

The very first measurement of the speed of light was performed by the
Danish astronomer Ole Rømer and announced in 1676. He measured the
time delay in the eclipse of one of Jupiter’s four larger moons, Io, as a
function of the Earth’s position, see figure 2.3. This delay he ascribed
to the finite speed of light. At some times of the year the eclipses
appeared earlier and earlier (position 4), as the distance between Earth
and Io decreased; at other times they appeared later and later
(position 2). Based on his observations he extracted the time it takes
light to traverse the Earth’s orbit and found it to be some 22 minutes.
From this one may obtain a value of $2.14 \cdot 10^8$ m/s for the speed of
light. This value, quite low by today’s standards, is due to the fact
that the Earth’s orbital diameter was not known sufficiently well.
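The arithmetic behind Rømer's estimate is a single division, sketched below. The 22 minutes and the value 2.14 · 10^8 m/s are taken from the text; the modern value of the astronomical unit and all names are assumptions added for illustration.

```python
AU_M = 1.496e11  # m, modern astronomical unit (assumption, not from the notes)

def speed_from_orbit_crossing(orbit_diameter_m, crossing_time_s):
    """Speed of light as orbit diameter divided by the light crossing time."""
    return orbit_diameter_m / crossing_time_s

crossing_time_s = 22 * 60.0  # "some 22 minutes", as quoted in the text

# Using the modern orbital diameter together with the 22 minute delay
c_modern_diameter = speed_from_orbit_crossing(2 * AU_M, crossing_time_s)

# Orbital diameter that reproduces the quoted value of 2.14e8 m/s
implied_diameter_m = 2.14e8 * crossing_time_s

print(f"c with the modern diameter: {c_modern_diameter:.3e} m/s")
print(f"diameter implied by 2.14e8 m/s: {implied_diameter_m:.3e} m")
```

Under these assumptions even the modern diameter gives a value below today's c, which suggests that the 22 minute crossing time itself was also somewhat too long.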
Next time you go to Paris, pass by L’Observatoire de Paris (close to
metro Denfert-Rochereau) and see the gold plate with his name and the
surroundings where he did his marvelous experiments. Notice that we have
put a copy of Christiaan Huygens’ “Treatise on Light” on the course home
page, which praises Ole Rømer’s work. Read pages 7-9 and enjoy!
Problem 2.4 Time delay in the eclipse of Jupiter’s moon Io.
Assume the Earth is moving with a speed of 30 km/s and that it takes
Io 42.5 hours to complete an orbit. At, for example, position 4 (see
figure 2.3), you observe two successive eclipses. How many seconds
earlier than expected will you observe the second eclipse, compared to
the one you just observed?
Figure 2.3 Rømer’s measurement of the speed of light. The blue
planet is the Earth at four different positions (1 to 4) on its orbit
around the Sun during the year. The moon M is Io. Jupiter has four larger
moons called Europa, Ganymede, Callisto and Io, where Io is the
innermost one.
In 1728 James Bradley (1693-1762), an English astronomer, estimated
the speed of light in vacuum to be around 301,000 km/s. He used the
so-called stellar aberration to calculate the speed of light. Stellar
aberration causes the apparent position of stars to change due to the
motion of the Earth around the sun. The aberration angle is approximately
the ratio of the Earth’s orbital speed to the speed of light. He knew
the speed of the Earth around the sun fairly well and he could also
measure this stellar aberration angle with a telescope. These two facts
enabled him to calculate the speed of light in vacuum.
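Bradley's reasoning can be sketched in a few lines: for small angles the aberration angle is θ ≈ v/c, so c ≈ v/θ. The Earth speed of 30 km/s is the value used in Problem 2.4; the aberration angle of 20.5 arcseconds is an assumed modern value that does not appear in these notes, and the function name is hypothetical.

```python
import math

def speed_of_light_from_aberration(earth_speed_km_s, aberration_arcsec):
    """Small-angle estimate c = v / theta, with theta converted to radians."""
    angle_rad = math.radians(aberration_arcsec / 3600.0)
    return earth_speed_km_s / angle_rad

EARTH_SPEED_KM_S = 30.0    # orbital speed of the Earth, as in Problem 2.4
ABERRATION_ARCSEC = 20.5   # assumed aberration angle (not from the notes)

c_km_s = speed_of_light_from_aberration(EARTH_SPEED_KM_S, ABERRATION_ARCSEC)
print(f"c from stellar aberration: {c_km_s:.0f} km/s")
```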
A French physicist, Hippolyte Fizeau (1819-1896), carried out an amazing
experiment at Montmartre in Paris, close to Sacré-Cœur. He shone
a light beam between the teeth of a rapidly rotating cogwheel, see figure
2.4. A silvered mirror more than 4 km away reflected the beam back
through the same gap between the teeth of the wheel. As the wheel
rotates, the light pulses travel a distance of about 8 km. If the wheel
rotates sufficiently fast, the light on the way back will hit one of the
teeth and be absorbed. Measuring the rotation speed of the wheel and
knowing the travelled distance allowed Fizeau to determine the speed of
light to be 313,300 km/s.
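The cogwheel argument can be turned into a formula. Assuming teeth and gaps of equal width, the returning light is first blocked when the wheel advances by half a tooth period during the round trip, which gives c = 2 N f d for N teeth, rotation rate f and round-trip distance d. The sketch below uses the roughly 8 km round trip and the 313,300 km/s from the text; the tooth count is a hypothetical value chosen for illustration, and the function names are not from the notes.

```python
def blocking_rotation_rate(c_m_s, round_trip_m, n_teeth):
    """Lowest rotation rate (rev/s) at which the returning light hits a tooth.

    Assumes equal tooth and gap widths, so that blocking occurs when the
    wheel turns by half a tooth period, 1/(2*n_teeth) of a revolution,
    during the round-trip time round_trip_m / c_m_s.
    """
    return c_m_s / (2.0 * n_teeth * round_trip_m)

def speed_of_light_from_cogwheel(rate_rev_s, round_trip_m, n_teeth):
    """Invert the relation above: c = 2 * N * f * d."""
    return 2.0 * n_teeth * rate_rev_s * round_trip_m

ROUND_TRIP_M = 8.0e3   # roughly 8 km travelled distance, as in the text
N_TEETH = 720          # hypothetical tooth count (assumption)
C_FIZEAU = 313.3e6     # m/s, the value quoted in the text

rate = blocking_rotation_rate(C_FIZEAU, ROUND_TRIP_M, N_TEETH)
print(f"first blocking at about {rate:.1f} revolutions per second")
print(f"recovered c: {speed_of_light_from_cogwheel(rate, ROUND_TRIP_M, N_TEETH):.4e} m/s")
```

The point of the design is visible in the numbers: a modest mechanical rotation rate becomes a very short time gate because it is multiplied by the number of teeth.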
Figure 2.4 Fizeau’s measurement of the speed of light in 1848-49. A
light beam is sent in short pulses to a mirror at a distance of 8633 m.
The pulses are produced by a fast rotating cogwheel. The lowest rotation
speed that blocks the returning light, together with the travelled
distance, determines the speed of light. (Labels in the figure: strong
light source, semi-transparent mirror plate, observer, rotating cogwheel,
L = 4315 m.)
An alternative method, used today, consists of measuring the wavelength
and the frequency of a light beam in vacuum and then applying the
equation c = λν. A coherent light beam with a known frequency (ν),
supplied by a very frequency-stable laser, is split to follow two paths
and then recombined. By carefully changing the path length and observing
the interference pattern, the wavelength of the light λ can be determined
with high accuracy.
The main difficulty in measuring c through interferometry is to measure
the frequency of light in the optical region (typically frequencies of
$10^{14}$ Hz) and link it to the time standard, i.e., the definition of
the second. The second is defined through a hyperfine transition in the
ground level of cesium, an atomic clock running at a frequency of about
$10^{10}$ Hz.
Optical frequencies are far too high to be measured with conventional
methods. The detectors are simply too slow. This was first overcome by a
group at the US National Institute of Standards and Technology (NIST)
laboratories in Boulder, Colorado, in 1972. Using a series of photodiodes
and specially constructed metal-insulator-metal diodes, they succeeded
in coherently linking the frequency of a methane-stabilized infrared laser
to the frequency of a cesium atomic clock, nearly 10,000 times lower in
frequency. Their spectacular results for the frequency and wavelength of
the infrared laser, and the resulting value for c, are given below:
ν = 88.376181627 ± 0.000000050 THz (2.24)
λ = 3.392231376 ± 0.000000012 µm (2.25)
c = 299792456.2 ± 1.1 m/s (2.26)
This result was approximately a hundred times more precise than
previous measurements of the speed of light and paved the way for
the modern SI definition of c.
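The value in equation (2.26) follows from c = λν applied to equations (2.24) and (2.25). The sketch below reproduces it and propagates the two quoted uncertainties, assuming they are independent and add in quadrature; that assumption and the variable names are not taken from the notes.

```python
import math

nu_hz = 88.376181627e12        # frequency, equation (2.24)
sigma_nu_hz = 0.000000050e12   # quoted uncertainty of the frequency
lambda_m = 3.392231376e-6      # wavelength, equation (2.25)
sigma_lambda_m = 0.000000012e-6  # quoted uncertainty of the wavelength

c_m_s = lambda_m * nu_hz

# Relative uncertainties added in quadrature (assumes independent errors)
relative_sigma = math.sqrt((sigma_nu_hz / nu_hz) ** 2
                           + (sigma_lambda_m / lambda_m) ** 2)
sigma_c_m_s = c_m_s * relative_sigma

print(f"c = {c_m_s:.1f} +/- {sigma_c_m_s:.1f} m/s")
```

The wavelength uncertainty dominates the error budget, which is consistent with the text's remark that the frequency link was the hard-won part of the measurement.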
2.2.2 Solutions to the wave equation

Many text books will have details on the solutions to the wave equations.
For any particular problem the choice of geometry is important. Below
we list the most common solutions to the wave equations:
• Plane waves
• Spherical waves
• Cylindrical waves
• Gaussian waves (approximate solution used in laser physics)
Other types of solutions may be found depending on the geometry
considered. In this course we will mostly be dealing with plane wave
solutions. Why is that? The reason is that the plane wave solutions
constitute what is called a complete set (see footnote 2). That means any
field solution may be expressed in terms of plane waves. This is a very
powerful tool: now we only have to deal with plane waves, at least when
we draw general conclusions. Plane traveling wave solutions are of the
form:
$$ \mathbf{A}_0\, e^{i(\mathbf{k}\cdot\mathbf{r} - \omega t)} \tag{2.27} $$
where A0 and k are constant vectors. This we also call a monochromatic
wave. It contains only one frequency component ω and is an idealized
representation of real world fields. Phase fronts, or wave fronts, are the
points on the wave where the phase k·r − ωt is constant; they move with
the phase speed vp :
$$ \frac{d}{dt}(\mathbf{k}\cdot\mathbf{r} - \omega t) = 0, \quad \text{i.e.,} \quad v_p = \frac{dr}{dt} = \frac{\omega}{k}, \tag{2.28} $$
where in the last step we only looked at the magnitude. Below we will
see that vp = c. Note that if we assume a traveling plane wave solution
for, say, the E-field,
$$ \mathbf{E}(\mathbf{r}, t) = \mathbf{E}_0\, e^{i(\mathbf{k}\cdot\mathbf{r} - \omega t)}, \tag{2.29} $$
it implies that all fields are harmonic plane wave solutions of this form.
So in total we operate with a set of solutions like:
$$ \mathbf{E}(\mathbf{r}, t) = \mathbf{E}_0\, e^{i(\mathbf{k}\cdot\mathbf{r} - \omega t)}, \tag{2.30} $$
$$ \mathbf{D}(\mathbf{r}, t) = \mathbf{D}_0\, e^{i(\mathbf{k}\cdot\mathbf{r} - \omega t)}, \tag{2.31} $$
$$ \mathbf{B}(\mathbf{r}, t) = \mathbf{B}_0\, e^{i(\mathbf{k}\cdot\mathbf{r} - \omega t)}, \tag{2.32} $$
$$ \mathbf{H}(\mathbf{r}, t) = \mathbf{H}_0\, e^{i(\mathbf{k}\cdot\mathbf{r} - \omega t)}. \tag{2.33} $$
Footnote 2: With plane waves constituting a complete set, we may express
all other solutions to Maxwell’s equations as a superposition of plane
waves. You may think about it as a Taylor expansion, but in plane waves.
In quantum physics you will learn more about complete sets and how to
check if a given basis is complete.
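Just by acting with the ∇ operator on those equations, remembering the
Leibniz rule (see footnote 3) and using Maxwell’s equations, we can derive
a number of important relations for the propagation of light. For example,
from Gauss’ law and the law stating “no magnetic monopoles” we obtain:
$$ \nabla \cdot \mathbf{E} = i\mathbf{k} \cdot \mathbf{E} = 0 \quad \text{and} \quad \nabla \cdot \mathbf{B} = i\mathbf{k} \cdot \mathbf{B} = 0, \tag{2.34} $$
so both the E- and B-fields are perpendicular to the k vector, the
direction of propagation. This shows that the electromagnetic wave is
transverse. Using Faraday’s law we get:
$$ \nabla \times \mathbf{E} = i\mathbf{k} \times \mathbf{E} = -(-i\omega \mathbf{B}), \quad \text{so} \quad \mathbf{k} \times \mathbf{E} = \omega \mathbf{B}. \tag{2.35} $$
Then we can write E × B as:
$$ \mathbf{E} \times \mathbf{B} = \frac{1}{\omega}\, \mathbf{E} \times (\mathbf{k} \times \mathbf{E}). \tag{2.36} $$
Using the vector identity A × (B × C) = (A · C)B − (A · B)C together
with E · k = 0, we find
$$ \mathbf{E} \times \mathbf{B} = \frac{(\mathbf{E} \cdot \mathbf{E})}{\omega}\, \mathbf{k}. \tag{2.37} $$
Footnote 3: Two tricks that often come in handy:
$$ \nabla \cdot (\mathbf{A} f) = (\nabla \cdot \mathbf{A}) f + \mathbf{A} \cdot (\nabla f) $$
$$ \nabla \times (\mathbf{A} f) = (\nabla \times \mathbf{A}) f + (\nabla f) \times \mathbf{A} $$
where f (x, y, z) is a well-behaved function of x, y, z and A is a vector
field. For example,
$$ \nabla \cdot \left( \mathbf{E}_0 e^{i(\mathbf{k}\cdot\mathbf{r} - \omega t)} \right) = i\mathbf{k} \cdot \mathbf{E}_0 e^{i(\mathbf{k}\cdot\mathbf{r} - \omega t)} \quad \text{and} \quad \nabla \times \left( \mathbf{E}_0 e^{i(\mathbf{k}\cdot\mathbf{r} - \omega t)} \right) = i\mathbf{k} \times \mathbf{E}_0 e^{i(\mathbf{k}\cdot\mathbf{r} - \omega t)}. $$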
Problem 2.5 Show that for a traveling plane wave ω = ck , and de-
duce the phase velocity vp = c. Show also that E = cB linking the
magnitude of the electric field and the magnetic field, an important re-
lation we will use many times.
We can summarize our important conclusions as:
Traveling plane waves in vacuum
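$$ \mathbf{E} \perp \mathbf{k} \tag{2.38} $$
$$ \mathbf{B} \perp \mathbf{k} \tag{2.39} $$
$$ \mathbf{E} \times \mathbf{B} = \frac{(\mathbf{E} \cdot \mathbf{E})}{\omega}\, \mathbf{k} \tag{2.40} $$
$$ E = cB \tag{2.41} $$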
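The relations in the summary above can be checked numerically for one concrete wave. The sketch below builds a wave travelling along +z with the E-field along x, obtains B from k × E = ωB, and tests the four relations. The wavelength, the field amplitude and the small vector helpers are illustrative assumptions, and the relation ω = ck from Problem 2.5 is taken for granted.

```python
import math

C = 299_792_458.0  # m/s

def cross(a, b):
    """Cross product of two 3-vectors given as tuples."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def norm(a):
    return math.sqrt(dot(a, a))

wavelength_m = 500e-9                # illustrative wavelength
k_mag = 2.0 * math.pi / wavelength_m
omega = C * k_mag                    # omega = c k (Problem 2.5)

k = (0.0, 0.0, k_mag)                # propagation along +z
E = (1.0, 0.0, 0.0)                  # V/m, real amplitude along x
B = tuple(comp / omega for comp in cross(k, E))  # from k x E = omega B

e_cross_b = cross(E, B)
expected = tuple(dot(E, E) * comp / omega for comp in k)

print("E . k =", dot(E, k), " B . k =", dot(B, k))
print("|E| / (c |B|) =", norm(E) / (C * norm(B)))
print("E x B =", e_cross_b, " (E.E) k / omega =", expected)
```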
In figure 2.5 we show an electromagnetic wave (EMW) propagating along
the +z-direction. The line that the E-field vector traces out we call the
polarization of the EMW. In this case the wave is linearly polarized along
the x-direction. Other types of polarization are possible, such as
circularly polarized waves, where the E-field traces out a circle. We
return to this in chapter 5 of these notes.
Example 2.6 Polarization of scattered sun light.
If you look at the sky on a sunny day you will see light, mostly blue
light, scattered into your eyes. Light is scattered by molecules and tiny
particles in the air above us. Sun light is characterized by being
randomly polarized, i.e., there is no preferred direction of the E-field.
However, the light received by our eyes will be perfectly linearly
polarized when we look at 90 degrees with respect to the incoming sun
light, see figure 2.6. This follows directly from Maxwell’s equations, see
equation (2.38). Light originating from the sun is randomly polarized, but
always such that E ⊥ kout for all E-field directions. Light received by
your eye must also obey E ⊥ kin . Thus we will observe polarized light,
and the effect is maximal when the angle between kin and kout is 90
degrees. Next time you walk outside on a sunny day, bring a polarizer and
observe Maxwell’s equations at work in the sky. Alternatively, have a look
at the moon on a clear night and see wonders!
Figure 2.5 Traveling electromagnetic wave propagating in the +z
direction. Notice how the E-field and the B-field are in phase and
orthogonal to each other. The direction of the E-field we call the
polarization of the light, here linearly polarized light, polarized along
the x-direction.
Figure 2.6 Sun light is randomly polarized. When we observe the
scattered light at 90 degrees with respect to the incoming direction, the
light will be perfectly polarized. The eyes of bees and many animals
are sensitive to polarization. They use it not only as a compass for
navigation, but also to detect water surfaces and to enhance visual power.
You can also observe the effect using a flashlight and a glass of water.
Shine the flashlight through the water and observe the scattered light
at 90 degrees as you slowly add a bit of milk. When you see blue light,
try to observe the polarization through a polaroid polarizer or polaroid
sun glasses. Orienting the polarizer along the flashlight direction will
kill all the scattered light, as it is perfectly polarized perpendicular
to this direction.
In a more advanced version, add 20 grams of sodium thiosulfate to 2 l
of water stored in a small fish tank. Then add 20 ml sulfuric acid and
stir. After about 5 minutes you will see a nice blue color at 90 degrees
with respect to the incoming light. Again observe the polarization with
a polarizer.
The human eye is only marginally sensitive to polarization. In the
macula there is a weak dichroism (see footnote 4), first observed by
Haidinger in 1846. We return to that later in the chapter concerning the
polarization of light.
2.3 Model of electron motion in materials

Optical properties of matter are controlled by electrons, their motion and
how they are bound to atoms or molecules. For dielectrics such as glass,
crystals, gases and liquids, electrons are bound relatively tightly. We
can often approximate the system by a tightly bound inactive core with a
few active and less strongly bound electrons further away from the
positive nucleus. In contrast, metals have many electrons that can move
around freely in an ion lattice structure.
Generally, both positive and negative constituents of matter will
interact with an incident electromagnetic wave. If we disregard very short
wavelengths such as x-rays, where the inner core is also active, we can
classically model the system through the Lorentz force:
$$ \mathbf{F}_{\mathrm{Lorentz}} = q(\mathbf{E} + \mathbf{v} \times \mathbf{B}), \tag{2.42} $$
where q is the charge, E the electric field, v the charge velocity and B
the magnetic field.
We model the active electrons as bound to the positive charge (nucleus)
by a harmonic potential with a resonance frequency ω0 . Actually, it is a
forced damped harmonic oscillator; how about that, doing simple mechanics
pays off! The damping term has different interpretations depending on
which system is in play, say a dielectric or a metal.
Figure 2.7 An atom or molecule interacting with a plane-wave EMW
of the form $E = E_0 e^{-i\omega t}$. We can ignore the spatial dependence
following the dipole approximation, see main text for details.
Footnote 4: Dichroism means light rays having different polarizations are
absorbed by different amounts.
For dielectrics, where the electron is tightly bound, the resonance
frequency ω0 plays the role of the energy difference between pronounced
energy states. In a simple picture we imagine the dielectric to be made up
of atoms or molecules with only two states: the upper state, which we
call the excited state |e⟩, and the lower state |g⟩, so that the energy
difference between the two states becomes ∆E = Ee − Eg = ħω0 . An
electron excited to the upper state |e⟩ may decay to the lower state |g⟩
upon emission of a photon. A spontaneously emitted photon goes in a
random direction; there is no way of predicting which way it will take!
Physically, what is the damping coefficient then? Well, the damping
coefficient γ describes the finite lifetime of the upper excited state.
When a certain fraction of the upper state excitations decays by
spontaneous emission, the amplitude of the field oscillation decreases: we
lose energy. A deeper understanding of this phenomenon really requires
advanced quantum mechanics, which you will study later. So you see that
the system (atoms or molecules) actually removes energy from our incident
electromagnetic wave, as the wave drives the electrons and they lose
energy by so-called spontaneous emission into random directions. That is
why the wave is damped.
For metals the behavior is completely different. Here, the electrons
are essentially free, so ω0 → 0, and the damping γ describes the
frictional loss due to collisions with lattice vibrations and impurities
in the metal.
Both the positive (nucleus) and negative (electrons) charges will be
accelerated by the electromagnetic wave. However, the nucleus is at
least about 1800 times heavier than the electron, so the motion of the
positive charges can be neglected. Only electrons play a role! Secondly,
we can disregard the contribution of the B-field in the Lorentz force.
Electrons typically move with non-relativistic speeds $v \sim \alpha c \ll c$,
where $\alpha = 1/137$ is the fine structure constant. So $vB = vE/c \ll E$
and the E-field strongly dominates. For heavy atoms this may not be
entirely true. Newton’s second law for our model system then yields (a
1D model is sufficient for a monochromatic wave, why?):
$$\ddot{x} + \gamma\dot{x} + \omega_0^2 x = \frac{qE_0}{m}\, e^{-i\omega t}, \tag{2.43}$$
which is the well-known damped driven harmonic oscillator. Notice that
we may leave out the spatial dependence of the electromagnetic wave.
The $ikx$ term can be neglected, as $x$ is of the order of the size of the
atom, typically $a_0 = 0.5 \cdot 10^{-10}$ m, and $\lambda$ is in the range 100 nm or
larger, so $kx \sim 2\pi a_0/\lambda \ll 1$. This is what we call the dipole
approximation. The complete solution to the differential equation (2.43)
is the sum of two terms: a transient solution damped on the time scale
$1/\gamma$, typically $10^{-12}$ - $10^{-9}$ seconds, and a steady state solution⁵.
We will only be interested in the steady state solution, as we will study
how continuous wave (CW) light fields interact with materials and not
pulsed light sources. The steady state solution becomes⁶:
$$x(t) = \frac{qE_0}{m(\omega_0^2 - \omega^2 - i\gamma\omega)}\, e^{-i\omega t}. \tag{2.44}$$
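As a short check of where the steady state solution comes from, one can insert a trial solution oscillating at the driving frequency into the equation of motion (2.43). The amplitude symbol $X$ is introduced here only for illustration and is not used elsewhere in the text:

```latex
\begin{align*}
x(t) &= X e^{-i\omega t}, \qquad
\dot{x} = -i\omega X e^{-i\omega t}, \qquad
\ddot{x} = -\omega^2 X e^{-i\omega t}, \\
\left(-\omega^2 - i\gamma\omega + \omega_0^2\right) X e^{-i\omega t}
  &= \frac{qE_0}{m}\, e^{-i\omega t}
\quad\Longrightarrow\quad
X = \frac{qE_0}{m\left(\omega_0^2 - \omega^2 - i\gamma\omega\right)} .
\end{align*}
```

The exponential cancels on both sides, which is why the steady state motion has the same frequency $\omega$ as the driving field.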
Alternatively, we could also perform the above analysis using a $\cos(\omega t)$
description of the wave rather than $e^{-i\omega t}$. We will obtain the same
physics in the end when taking the real part. However, here is a trick:
by performing the analysis in the complex notation $e^{-i\omega t}$, the
attenuation and the phase delay of the incident electromagnetic wave
come out naturally, as you will see later. Before going on to the next
section we note an important thing about the dipole described by
equation (2.44). In steady state it is oscillating at exactly the same
frequency $\omega$ as the
5 As the names suggest, the particular (= steady-state) solution will
eventually dominate. Indeed, if even a slight amount of damping is
present, the homogeneous component becomes negligible after a sufficient
amount of time has elapsed. We can then focus our attention on the
particular or steady-state solution. This will allow us to exhibit the
important phenomenon of resonance.
6 If we choose $E_0 e^{+i\omega t}$ for the driving field the solution becomes
$x(t) = \frac{qE_0}{m(\omega_0^2 - \omega^2 + i\gamma\omega)}\, e^{+i\omega t}$, so be careful. Once you
have chosen your sign for the exponent, stick to it in your further
analysis. Later, under the model for the index of refraction, it is easy
to get wrong signs if you are not consistent. The negative sign, as used
here, is preferred in physics and, funnily enough, called the positive
frequency component of a wave.
incoming light wave. This is very important and has a number of
interesting consequences.
Notice, if you neglect damping, the solution for x(t) is given by:
$$x(t) = \frac{qE_0}{m(\omega_0^2 - \omega^2)}\, e^{-i\omega t}. \tag{2.45}$$
What does this mean? As the light frequency approaches the resonance
frequency $\omega_0$ of our atom or molecule, the amplitude becomes larger
and larger, and eventually it diverges. This is because damping is not
included: we keep pumping energy into the system. This is absolutely
unphysical; however, such a limited model gives the correct physics in
the limit we will be discussing, except when $\omega \sim \omega_0$.
The model presented in this chapter seems at first glance a bit
simplistic. However, it turns out to be a very powerful tool to get the
physics. Interestingly, the simple model of a driven damped harmonic
oscillator fully agrees with the result of quantum mechanics. We will
use the simple model again and again. It is a good idea to familiarize
yourself with it.
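To get a feeling for the oscillator model, the following minimal sketch evaluates the complex steady state amplitude of equation (2.44) for a few driving frequencies. The units and the parameter values (`omega0 = 1`, `gamma = 0.05`, `q E0 / m = 1`) are illustrative assumptions, not values from the text. The residual printed in the last column checks that the amplitude really satisfies the equation of motion (2.43).

```python
import cmath
import math


def steady_state_amplitude(omega, omega0, gamma, q_e0_over_m=1.0):
    """Complex amplitude X of x(t) = X exp(-i omega t), cf. equation (2.44)."""
    return q_e0_over_m / (omega0 ** 2 - omega ** 2 - 1j * gamma * omega)


omega0 = 1.0   # resonance frequency (arbitrary units, assumed)
gamma = 0.05   # damping rate (arbitrary units, assumed)

for omega in (0.2, 0.9, 1.0, 1.1, 3.0):
    X = steady_state_amplitude(omega, omega0, gamma)
    # Insert x = X exp(-i omega t) into (2.43); the residual should vanish.
    residual = (-omega ** 2 - 1j * gamma * omega + omega0 ** 2) * X - 1.0
    print(f"omega = {omega:3.1f}  |X| = {abs(X):7.3f}  "
          f"arg X = {math.degrees(cmath.phase(X)):7.2f} deg  "
          f"residual = {abs(residual):.1e}")
```

The amplitude is largest near `omega = omega0` and stays finite because of the damping; setting `gamma = 0` reproduces the diverging behavior of equation (2.45).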
2.3.1 Rayleigh scattering of light

Accelerated charges produce light and charges are accelerated by elec-
tromagnetic radiation. We assume both to be present. The energy flow
per unit time per unit area (Poynting vector) from a radiating dipole
can be shown to be given by:
$$\mathbf{S} = \frac{\ddot{P}^2 \sin^2\theta}{16\pi^2\varepsilon_0 c^3 r^2} \cdot \frac{\mathbf{r}}{r}. \tag{2.46}$$
Here $\mathbf{r}$ is the vector to the observation point and $\theta$ the angle of the
dipole axis with respect to $\mathbf{r}$. For a proof we refer to other texts and
books⁷. The amount of power radiated by an accelerated charge is
obtained by integrating the above expression for the Poynting vector over
7 J.D. Jackson, Classical Electrodynamics, 3rd ed.; L.D. Landau et al.,
The Classical Theory of Fields; R. Feynman et al., Lectures on Physics,
vol. 2; D.J. Griffiths, Introduction to Electrodynamics; Reitz et al.,
Foundations of Electromagnetic Theory, 3rd ed.
the unit sphere and using that the dipole moment can be written as
$P = qx(t)$. This gives:
$$P_{rad} = \frac{q^2}{4\pi\varepsilon_0} \frac{2a^2}{3c^3}, \tag{2.47}$$
where $a = \ddot{x}$ is the acceleration of the charge. Now we set up a simple
model of an atom or molecule, as above, to see how much power is
radiated and at what frequency. The medium we have in mind is a dilute
transparent one, such as a gas where the molecules are placed at random
and uncorrelated positions⁸, like e.g. the atmosphere of Earth. In these
systems the resonance transitions of the atoms or molecules are in the
UV or far infrared and usually not in the visible range 400 nm - 700 nm,
hence there is low absorption in the visible range. Taking the double
time derivative of equation (2.44) and using the limit $\omega_0 \gg \omega$ gives
the scattered power:
Rayleigh Scattering
$$P_{rad} \propto \frac{\omega^4}{\omega_0^4}. \tag{2.48}$$
Rayleigh scattering applies when the scatterers (particles, atoms and
molecules) are significantly smaller than the light wavelength $\lambda$.
Generally, scattering for an arbitrary scatterer size is described by
Mie theory. For small scatterer sizes Mie theory reduces to the Rayleigh
approximation.
Problem 2.7 Show the expression (2.48) for Rayleigh scattering.
Example 2.8 Rayleigh Scattering in the sky

Why is the sky blue and the sunset red? This is due to Rayleigh
scattering on molecules and small particles in the air. Looking at the
visible spectrum, about 400 nm - 650 nm, blue light will be scattered a
lot more than red light: by a factor of $(650/400)^4 \approx 7$ according to
Eq. (2.48). This means the majority of the (white) sunlight scattered in
the atmosphere will be blue. At sunset the sun rays traversing a long
path through the atmosphere will have all the blue part scattered out
8 Things change when the positions become correlated. A liquid or a piece of glass
can be as transparent as a dilute gas, despite the much higher density of
polarizable dipoles. For a more advanced discussion see the article by
A. Rojo and P. Berman, Am. J. Phys. 78, 94-101 (2010).
and only red is left. The thing is, the sky becomes a “more” intense blue
and the sunset a more intense red the more we pollute, sadly speaking.
Then we may ask why the sky is not violet. We need to include two
things: the detector response of our eyes and the wavelength distribution
of the sunlight. The eye is most sensitive to light at 550 nm and the sun
emission spectrum peaks at 500 nm. Taking that into account places the
most intense observed light in the blue.
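The wavelength scaling used in this example is easy to play with numerically. The sketch below only evaluates the $\omega^4$ (equivalently $1/\lambda^4$) law of equation (2.48) for two wavelengths; the function name `rayleigh_ratio` and the second wavelength pair are illustrative choices, not taken from the text.

```python
def rayleigh_ratio(lambda_short_nm, lambda_long_nm):
    """Ratio of Rayleigh-scattered power at the short vs. the long wavelength.

    Uses P_rad proportional to omega^4, i.e. to 1 / lambda^4, cf. equation (2.48).
    """
    return (lambda_long_nm / lambda_short_nm) ** 4


# Blue (400 nm) compared to red (650 nm), as in the example above.
print(f"400 nm vs 650 nm: {rayleigh_ratio(400.0, 650.0):.2f}")
# Another illustrative pair: 450 nm compared to 700 nm.
print(f"450 nm vs 700 nm: {rayleigh_ratio(450.0, 700.0):.2f}")
```

Note that this ratio says nothing about the solar spectrum or the response of the eye, which the text brings in separately.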
As we discussed before, you can also observe the effect using a
flashlight and a glass of water, very similar to example 2.6. Shine the
flashlight through the water and observe the scattered light at 90
degrees as you slowly add a bit of whole milk. Milk contains tiny fat
particles of sub-micron size. As you add milk slowly you will see blue
light at 90 degrees, and the flashlight will appear red when you look at
it through the milky water. In a more advanced version, add 20 grams of
sodium thiosulfate to 2 l of water in a small fish tank. Then add 20 ml
of sulfuric acid and stir. This generates small sulfur particles that
scatter the light. After about 5 minutes you will see a nice blue color
at 90 degrees and a nice sunset for light traversing the fish tank.
It is a common misconception that water is blue due to Rayleigh
scattered light from the sky that is subsequently reflected on the water
surface. This is not so. What is the color of water? Water often appears
blue due to absorption of the red part of the spectrum in water
(vibrationally coupled transitions in the molecule). Depending on the
depth, bottom conditions and particles in the water it can take a
variety of colors. Obviously, water can take many colors by reflection,
also blue from reflection of light scattered from the sky. However,
water itself has a blue color due to the fact that out of a broadband
white light spectrum mainly the red/infrared part is absorbed.
Problem 2.9 The clouds are white. Is this a result of Rayleigh
scattering? What could be the reason for the white color?
2.3.2 Energy of an EMW

The electromagnetic wave carries electric and magnetic energy. Let us
analyze the traveling wave. The total energy density $U$ (with SI units
[J/m$^3$]) is given by:
$$U = U_E + U_B = \frac{1}{2}\varepsilon_0 E^2 + \frac{1}{2\mu_0} B^2. \tag{2.49}$$
Figure 2.8 Geometry to calculate the intensity of a traveling EMW in
vacuum from the energy density $U$ (figure labels: $c\Delta t$, $A$, $\mathbf{k}$). The
amount of energy passing per unit time $\Delta t$ through a unit area $A$ in
the direction of propagation specified by the wave vector $\mathbf{k}$ defines the
intensity. We also call it energy flux per unit area.

2 2µ0
The magnetic part of the energy density can be expressed as ($E = cB$
and $\varepsilon_0\mu_0 = 1/c^2$):
$$U_B = \frac{1}{2\mu_0} B^2 = \frac{1}{2c^2\mu_0} E^2 = \frac{1}{2}\varepsilon_0 E^2 \tag{2.50}$$
so each part contributes with an equal amount, beautifully symmetric
as expected. Finally, we obtain:
$$U = \varepsilon_0 E^2 = \varepsilon_0 EBc. \tag{2.51}$$
The intensity of the traveling wave⁹ is the amount of energy passing
per unit area per unit time in the direction of propagation $\mathbf{k}$.
From the total amount of energy passing the surface area $A$ in
figure 2.8 during the time $\Delta t$ we thus find:
$$I = \frac{U \cdot c \cdot \Delta t \cdot A}{A \cdot \Delta t} = U \cdot c = \varepsilon_0 EBc^2. \tag{2.52}$$
This leads us to define a vector pointing in the direction of the wave
propagation but having the magnitude of the intensity, the so-called
Poynting’s vector S:
9 Notice we are talking about a traveling wave. For a standing wave, say between
two ideal mirrors this expression does not hold.
Poynting’s vector
$$\mathbf{S} = \frac{\mathbf{E}\times\mathbf{B}}{\mu_0} = \mathbf{E}\times\mathbf{H} \tag{2.53}$$
representing the energy flux per unit area. Note that the rightmost ex-
pression can be used also inside all materials.
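Example 2.10 Energy flux of a plane traveling monochromatic wave.
Let us assume a wave traveling in vacuum of the form:
$$\mathbf{E}(\mathbf{r},t) = \mathbf{E}_0\cos(\mathbf{k}\cdot\mathbf{r} - \omega t), \qquad \mathbf{B}(\mathbf{r},t) = \mathbf{B}_0\cos(\mathbf{k}\cdot\mathbf{r} - \omega t) \tag{2.54}$$
This leads to a time dependent Poynting vector (constant vectors
omitted):
$$S(t) = \frac{E_0 B_0}{\mu_0}\cos^2(\mathbf{k}\cdot\mathbf{r} - \omega t). \tag{2.55}$$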
Often we are interested in the time averaged value of Poynting’s vector,
especially for optical fields (visible light). They typically have
frequencies in the $10^{14}$ - $10^{15}$ Hz range, impossible for any detector to
follow. The time average of a given quantity $f(t)$ we define as
$$\langle f(t)\rangle_T = \frac{1}{T}\int_t^{t+T} f(t')\,dt', \tag{2.56}$$
where $T$ is chosen sufficiently large¹⁰. For a monochromatic field
$T = 2\pi/\omega$ is sufficient. The time averaged Poynting vector becomes:
$$\langle S(t)\rangle_T \equiv S = \frac{E_0 B_0}{2\mu_0} = \frac{1}{2}\varepsilon_0 c E_0^2, \tag{2.58}$$
since the time average of a cosine or sine squared over an optical period
or longer is equal to 1/2. Remark: detectors are sensitive to intensity
and not electric field, this includes the eye. It is important to note that
the energy flux per unit area scales as the square of the E- or B-field.
10 For a given electromagnetic field, not necessarily monochromatic, we
need to consider the limit
$$\lim_{T\to\infty}\langle f(t)\rangle_T = \lim_{T\to\infty}\frac{1}{T}\int_t^{t+T} f(t')\,dt'. \tag{2.57}$$
Example 2.11 Energy of sunlight.
Assume visible light and $E_0 = 100$ V/m. This gives $S = 13$ W/m$^2$.
Compare this to your average body radiation power of 60 W. In
conclusion, we will not consider this field very intense nor dangerous
to the human body. Let us increase the field by a factor of 10. Then the
intensity will grow by a factor of 100. Now $S = 1300$ W/m$^2$ and this
field is indeed dangerous to you. In fact 1.3 kW/m$^2$ is the solar
constant at normal incidence. The solar constant actually varies by
± 3% because of the Earth’s slightly elliptical orbit around the Sun. So
the maximal power per unit area you can harness from the sun at Earth is
about 1.3 kW/m$^2$.
If you calculate the power entering your eye (pupil radius about 1 mm in
bright light) you obtain about 4 mW. This is comparable to the danger
level for working with lasers; for red light an upper limit of 1 mW is
set (max exposure time one second). This means light levels below 1 mW
(exposure times of one second) will not seriously harm the functioning
of your eye. Most laser pointers you can buy, however, deliver about
5-20 mW total optical power or more. Most green laser pointers also emit
infrared radiation with considerable power. The output beams can all be
very dangerous, even after being reflected off common surfaces, like a
glass window or just a fingernail. Never look into lasers or laser
pointers! Never look directly at the sun!
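The numbers in this example follow directly from the time averaged Poynting vector, equation (2.58). The sketch below is a minimal calculation under stated assumptions: the function names and the two pupil radii (1 mm and 2 mm) are illustrative choices, and the constants are standard SI values.

```python
import math

EPS0 = 8.8541878128e-12   # vacuum permittivity in F/m
C = 299792458.0           # speed of light in vacuum in m/s


def mean_intensity(e0):
    """Time averaged intensity <S> = (1/2) eps0 c E0^2 in W/m^2, cf. (2.58)."""
    return 0.5 * EPS0 * C * e0 ** 2


def power_through_aperture(intensity, radius_m):
    """Power in W passing a circular aperture at normal incidence."""
    return intensity * math.pi * radius_m ** 2


for e0 in (100.0, 1000.0):
    print(f"E0 = {e0:6.0f} V/m  ->  <S> = {mean_intensity(e0):7.1f} W/m^2")

solar = mean_intensity(1000.0)
for radius_mm in (1.0, 2.0):   # assumed pupil radii, for illustration only
    power_mw = 1e3 * power_through_aperture(solar, radius_mm * 1e-3)
    print(f"pupil radius {radius_mm:.0f} mm  ->  {power_mw:5.1f} mW")
```

The quadratic dependence on the field amplitude is visible directly: ten times the field gives one hundred times the intensity.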
2.3.3 Momentum of light

Light has momentum. This has been known since the time of Johannes
Kepler (1571-1630), who argued that comet tails are formed by a “sun
wind”: the sun “blowing” on small comet dust particles and atoms. In the
framework of quantum electrodynamics and the quantization of
electromagnetic waves, photon momentum and angular momentum emerge
naturally. At this point we can see it in a classical context, from the
energy quantum (Max Planck 1900) and the special theory of relativity.
Photons move with the speed of light in vacuum, so they have zero rest
mass $m_0 = 0$. This gives us:
$$E = \left(m_0^2 c^4 + (cp)^2\right)^{1/2} = cp, \tag{2.59}$$
and the energy is given by $E = h\nu$, so
Photon momentum
$$p = \frac{h}{\lambda} \tag{2.60}$$
In a more advanced treatment of classical electrodynamics it can be
shown from the Lorentz force that the electromagnetic momentum in a
volume $V$ is given by:
$$\mathbf{P}_{field} = \frac{1}{c^2}\int_{volume} \mathbf{E}\times\mathbf{H}\; d^3r = \frac{1}{c^2}\int_{volume} \mathbf{S}\; d^3r \tag{2.61}$$
We will not show this here but refer to [Jackson] page 260. From this
expression we can find the time averaged radiation pressure (totally
absorbed by surface A) by considering a volume as shown in figure 2.8:
$$P_{light} = \frac{\langle S\rangle\, \Delta t\, c\, A}{\Delta t\, c^2 A} = \frac{\langle S\rangle}{c}. \tag{2.62}$$
This expression was, in fact, also shown by Maxwell.
Example 2.12 Radiation pressure on Earth. The radiation pressure on
the Earth due to the sun is then given by:
$$P_{light} = \frac{\langle S\rangle}{c} = 4.33 \cdot 10^{-6}\ \mathrm{N/m^2}. \tag{2.63}$$
This number you can compare to the atmospheric pressure of about $10^5$
N/m$^2$.
Example 2.13 Radiation pressure on atoms. Consider near-resonant
light, say a laser beam, interacting with a two-level atom, such as a
sodium atom. When light is absorbed the momentum of the photon is
transferred to the atom. On the other hand, by spontaneous emission the
atom emits the photon again in a random direction. The momentum transfer
from the emission process thus averages to zero over many cycles.
Finally, we are left with the net momentum transfer in the direction of
the incident light. The lifetime of the excited state is assumed to be
$\tau$. A total optical cycle takes on average $2\tau$, which gives us the force:
$$F = \frac{\Delta P}{\Delta t} = \frac{h}{2\lambda\tau}. \tag{2.64}$$
Using the numbers $M = 23 \cdot 1.66 \cdot 10^{-27}$ kg, $\lambda = 589$ nm and $\tau = 16$ ns
we find an acceleration of $a \approx 10^5\, g$. Even for thermal atoms moving
with speeds of 500 - 1000 m/s this acceleration can bring the atoms to
rest within about a millisecond on a length scale of about 50 cm. This
technique is a workhorse in many laboratories worldwide working with
optics and atoms.
Figure 2.9 The principle behind laser cooling of neutral atoms (figure
labels: life time $\tau$, v, e).
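The estimate above can be reproduced with a few lines. This is a minimal sketch using equation (2.64) and constant deceleration; the function and variable names are illustrative, and the sodium numbers are the ones quoted in the example.

```python
H = 6.62607015e-34       # Planck constant in J s
AMU = 1.66053907e-27     # atomic mass unit in kg
G = 9.81                 # gravitational acceleration in m/s^2


def scattering_force(wavelength_m, lifetime_s):
    """Mean force F = h / (2 lambda tau) from one photon per 2 tau, cf. (2.64)."""
    return H / (2.0 * wavelength_m * lifetime_s)


mass = 23 * AMU                                   # sodium atom
force = scattering_force(589e-9, 16e-9)
acceleration = force / mass
print(f"F = {force:.2e} N,  a = {acceleration:.2e} m/s^2 = {acceleration / G:.1e} g")

for v0 in (500.0, 1000.0):
    t_stop = v0 / acceleration                    # time to come to rest
    d_stop = v0 ** 2 / (2.0 * acceleration)       # distance travelled meanwhile
    print(f"v0 = {v0:6.0f} m/s  ->  t = {1e3 * t_stop:.2f} ms,  d = {100 * d_stop:.0f} cm")
```

In a real experiment the atom drifts out of resonance as it slows down, so the constant force assumed here is an upper estimate.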
The cooling effect happens as the laser light is tuned below the
resonance of the atomic transition, $\nu_{laser} < \nu_0$. Since the atom moves
against the laser beam, the Doppler effect shifts the light into
resonance, and the absorption process happens frequently. The emission,
however, takes place at $\nu_0$ with very high probability. So on average we
remove the energy $\Delta E = h(\nu_0 - \nu_{laser})$ for each absorption-emission
cycle. Remember we have about $3 \cdot 10^7$ of these cycles per second (one
per $2\tau$). This energy is removed from the center of mass motion of the
atom and it slows down. The 1997 Nobel Prize in Physics was awarded for
this spectacular achievement.
Problem 2.14 Solar sail.
Assume we travel radially away from the sun using an ideal solar sail
of area $A$ = 500 m times 500 m. You can imagine the solar sail as a
perfect mirror. You start at the position of the Earth. Find an
analytical expression for the acceleration and speed as a function of
position $r$ as you move away from the Earth’s orbit. The total mass of our
unit plus personnel is $M = 150$ kg. What is the terminal speed as
$r \to \infty$?
The “Breakthrough Starshot” is a 100 million dollar research program
aiming at reaching nearby stars in a matter of years. The proposal
involves using a tiny 5 gram “nanocraft”, called StarChip, attached to a
solar sail of several m$^2$. An array of lasers on Earth will then be used
to direct a powerful light beam approaching 50-100 gigawatts at this
sail, potentially accelerating the unit to 20 percent of the speed of
light in a matter of minutes. Is this possible according to you?
2.4 Maxwell’s Equations for an insulator

Now we will analyze Maxwell’s equations for a transparent dielectric
medium where the conductivity is zero. We assume the medium is isotropic
and homogeneous. The material you can have in mind is, say, a gas, or a
piece of glass or Plexiglas (PMMA). Maxwell’s equations in the insulator
reduce to:
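Insulator
$$\nabla\cdot\mathbf{E} = 0 \tag{2.65}$$
$$\nabla\cdot\mathbf{B} = 0 \tag{2.66}$$
$$\nabla\times\mathbf{E} = -\frac{\partial\mathbf{B}}{\partial t} \tag{2.67}$$
$$\nabla\times\mathbf{B} = \varepsilon\mu\frac{\partial\mathbf{E}}{\partial t}, \tag{2.68}$$
as we have no sources and the material equations are controlled by $\varepsilon$, $\mu$.
This gives a wave equation similar to before:
$$\nabla^2\mathbf{E} = \varepsilon\mu\frac{\partial^2\mathbf{E}}{\partial t^2},$$
with a propagation speed
$$v = \frac{1}{\sqrt{\varepsilon\mu}}.$$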
With this relation in mind we define the index of refraction $n$ as
the ratio:
Index of refraction n
$$n = \frac{c}{v} = \sqrt{\frac{\varepsilon\mu}{\varepsilon_0\mu_0}}. \tag{2.69}$$
Often we also express the index of refraction as:
$$n(\omega) = \sqrt{\frac{\varepsilon\mu}{\varepsilon_0\mu_0}} \equiv \sqrt{\varepsilon_r \cdot \mu_r},$$
where we have defined the relative permittivity $\varepsilon_r$ and permeability $\mu_r$.
Notice the index of refraction is a function of the light frequency $\omega$
since the material constants $\chi$, $\chi_m$ are functions of $\omega$. In addition, as
most materials are not magneto-optically active, $\mu \approx \mu_0$ and we can
consider $\mu_r = 1$. Unless otherwise specified we will assume $\mu_r = 1$ and
the refractive index can be expressed as:
$$n(\omega) = \sqrt{\frac{\varepsilon}{\varepsilon_0}}.$$
For harmonic waves, the wave equation (2.69) imposes the dispersion
relation:
$$\omega = \frac{c}{n(\omega)}\cdot k. \tag{2.70}$$
This relation is a more general version of the traditional $c = \nu\lambda$. We
say the wave undergoes dispersion as its velocity depends on the
frequency or, alternatively, on its wave number or wavelength. This is a
common feature of all optical media. It is not a question of whether
there is dispersion or not, but of how much.
The phase velocity in an insulator is given by:
$$v_p = \frac{c}{n(\omega)} \tag{2.71}$$
All the above considerations assumed $n$ to be real. In general this is
not true for all materials; conductors in particular may have a complex
index of refraction, see next section. For transparent insulators it is
often a good approximation.
Problem 2.15 Show equation (2.70) using the expression for a har-
monic wave E = E0 cos(ωt − k · r) and equation (2.69).
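Analogous to the case of EMW in vacuum we deduce:
Insulator
$$\mathbf{E} \perp \mathbf{k} \tag{2.72}$$
$$\mathbf{B} \perp \mathbf{k} \tag{2.73}$$
$$\mathbf{E}\times\mathbf{B} = \frac{1}{\omega}(\mathbf{E}\cdot\mathbf{E})\,\mathbf{k} \tag{2.74}$$
$$E = \frac{c}{n} B \tag{2.75}$$
$$\mathbf{S} = \mathbf{E}\times\mathbf{H} \tag{2.76}$$
$$\langle S\rangle = \frac{1}{2\mu_0}\frac{n}{c} E_0^2 \tag{2.77}$$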
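The last relation follows from the energy density, which can be written
as:
$$U = \varepsilon E_0^2 = \frac{1}{\mu_0}\left(\frac{n}{c}\right)^2 E_0^2.$$
Generally, since $|\mathbf{S}| = |\mathbf{E}\times\mathbf{H}| = nE^2/(c\mu)$, we may establish a
relation between a unit vector $\hat{\mathbf{S}}$ along the direction of $\mathbf{S}$, the flow
of energy, and the k-vector. Recall $\mathbf{S} = (\mathbf{E}\times\mathbf{B})/\mu = (E^2/\mu\omega)\,\mathbf{k}$
and $|\mathbf{k}| = n\omega/c$, so we find the powerful relation:
Isotropic insulator
$$\hat{\mathbf{S}} = \frac{c}{n\omega}\,\mathbf{k}. \tag{2.78}$$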
This relation holds in isotropic materials and shows that the flow of
energy is directed along the k-vector, which seems very natural indeed.
Later we will see that this is not always the case, in particular for non-
isotropic materials, such as uniaxial and bi-axial crystals.
2.4.1 Simple model of the index of refraction

Now we consider a model of the index of refraction for a dilute medium.
We assume in the beginning that our atoms or molecules only have one
resonance frequency ω0 as before, i.e. just two energy levels. Later we
extend it to include several energy levels and the corresponding reso-
nances. The behavior predicted by the model is generally true, even for
dense materials, with only minor changes. The model is based on the
polarization of the material as viewed from a macroscopic point of view,
total polarization of all particles, and a microscopic picture, polarization
of single atoms and molecules. Recall the displacement field
$$\mathbf{D} = \varepsilon_0\mathbf{E} + \mathbf{P} = \varepsilon\mathbf{E}, \quad \text{with } \varepsilon = \varepsilon_0(1+\chi). \tag{2.79}$$
The $\chi$ value is a measure of how easy it is to polarize the medium, that
is to say, how easy it is to displace the electrons from their
equilibrium position in the material. For later use we rewrite the
magnitude as:
$$\varepsilon = \varepsilon_0 + \frac{P}{E}.$$
The macroscopic polarization is the product of the particle density $N/V$
and the single particle polarization $P_{single}$. Our model from a previous
section, see equation (2.44), provides the time dependent displacement
$x(t)$ of the charges in the EM field. We find
$$P = P_{single}\cdot\frac{N}{V} = q\cdot x(t)\cdot\frac{N}{V},$$
where $N$ is the number of (moveable) charges, $V$ the volume, $q$ the
charge and $x(t)$ the displacement. Using equation (2.44) and (2.80) and
the above equation we find:
$$\varepsilon = \varepsilon_0 + \frac{N}{V}\cdot\frac{q^2}{m(\omega_0^2 - \omega^2 - i\gamma\omega)},$$
and the index of refraction using equation (2.70)
$$n(\omega)^2 = 1 + \frac{N}{V}\cdot\frac{q^2}{m\varepsilon_0(\omega_0^2 - \omega^2 - i\gamma\omega)}. \tag{2.80}$$
Figure 2.10 Index of refraction of transparent solids as a function of

density. (Data from http://physics.info/refraction/).
It is interesting to note that a quantum mechanical treatment leads to
exactly the same result. Generally, we expect the index of refraction to
increase with density, see figure 2.10, and with the molecular
polarizability, i.e. how easy it is to polarize the medium, see figure
2.11. Near the resonance frequency $\omega_0$ of the atoms/molecules it becomes
larger.
It is worth discussing now some features of the refractive index model
as described in equation (2.80). If we had used a model with no damping
in the expression for $x(t)$, see equation (2.45), we would obtain a
purely real index of refraction. When no damping is present we can only
have stimulated absorption and emission taking place in the direction
of the $\mathbf{k}$ vector. There is no spontaneous scattering into other
directions, i.e. no spontaneous emission. This also means we will have
no loss of the incoming wave intensity in the material. In the case
where we included damping, giving a complex $n$, we will have spontaneous
emission. Spontaneous emission of an atom or molecule takes place into
the full solid angle $4\pi$ (all directions), and the incoming wave will be
attenuated as we scatter photons in random directions.
Index of refraction n(ω)
n increases with density of dipoles per unit volume (2.81)
n increases if active electrons are loosely bound (2.82)
n changes value significantly around the resonance frequency ω0 (2.83)
For a dilute medium (such as a gas at low pressure) the refractive
index will deviate only slightly from 1 and thus equation (2.80) can be
reduced by using $n^2 - 1 = (n+1)(n-1) \simeq 2(n-1)$. We then obtain:
$$n(\omega) = 1 + \frac{1}{2}\cdot\frac{N}{V}\cdot\frac{q^2}{m\varepsilon_0(\omega_0^2 - \omega^2 - i\gamma\omega)},$$
and real and imaginary part of the index can be retrieved. The general
separation of equation (2.80) into real and imaginary parts is, of course,
also possible, but quite tedious. It is nevertheless quite instructive, so
you should do this as an exercise.
Problem 2.16 Find real and imaginary part of expression (2.84). Plot
them as a function of ω.
In figure 2.11 we plot the real $n'$ and imaginary index $n''$. Notice
the frequency region with $dn'/d\omega < 0$, where the dispersion is called
anomalous: the “rainbow” is inverted. For normal dispersion we have
$dn'/d\omega > 0$, just like in a prism, where blue colors are bent more than
red colors. The imaginary part of the index $n''$ is proportional to the
attenuation of the wave, as discussed in the next section.
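The behavior of $n'$ and $n''$ near a resonance can be explored with the dilute-medium expression above. The following sketch is only illustrative: it uses dimensionless units, and the parameter `strength` stands for the combination $Nq^2/(Vm\varepsilon_0)$ with an assumed value; `omega0` and `gamma` are likewise assumed.

```python
def dilute_index(omega, omega0=1.0, gamma=0.1, strength=0.01):
    """Complex index n = 1 + (1/2) strength / (omega0^2 - omega^2 - i gamma omega).

    'strength' stands for N q^2 / (V m eps0); all values are illustrative.
    """
    return 1.0 + 0.5 * strength / (omega0 ** 2 - omega ** 2 - 1j * gamma * omega)


def slope_of_real_part(omega, h=1e-6):
    """Central-difference estimate of dn'/domega."""
    return (dilute_index(omega + h).real - dilute_index(omega - h).real) / (2.0 * h)


for omega in (0.5, 0.9, 1.0, 1.1, 1.5):
    n = dilute_index(omega)
    kind = "normal" if slope_of_real_part(omega) > 0 else "anomalous"
    print(f"omega = {omega:3.1f}  n' = {n.real:.5f}  n'' = {n.imag:.5f}  {kind} dispersion")
```

With these assumed parameters the imaginary part peaks at the resonance, and the slope of the real part changes sign close to it.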
In a real material the atoms and molecules will have a number of
resonances $\omega_{0j}$ corresponding to different transitions. Some transitions
will be strong and some less strong. The expression (2.80) thus becomes
the sum:
$$n^2 = 1 + \frac{q^2 N}{m\varepsilon_0 V}\sum_j \frac{f_j}{\omega_{0j}^2 - \omega^2 - i\gamma_j\omega}, \tag{2.84}$$
where $f_j$ is the transition strength corresponding to transition $j$. In
quantum mechanics one can show that a sum rule $\sum_j f_j = 1$ holds.
Figure 2.11 Plot of the real $n'$ and imaginary index $n''$ as a function of
$\omega$ using equation (2.84) (two panels, each with $n$ on the vertical axis
and $\omega$ on the horizontal axis, with $\omega_0$ marked). The real part shows
strong dispersion around the resonance frequency while the imaginary
part is a Lorentz curve showing maximal absorption at $\omega = \omega_0$.
Figure 2.12 The general form of the index of refraction of a
transparent medium as a function of wavelength $\lambda$ in vacuum (axis ticks:
index 1.3 to 1.5, wavelength 300 nm to 1000 nm). At very short
wavelengths transitions become very strong and the index increases
fast. Absorption also becomes more important.
The general form of the index of refraction for transparent substances
such as glass is shown in figure 2.12. For wavelengths in the optical
domain 400 nm - 1100 nm no significant resonances are present. Closer
to the UV range, below 350 nm, a number of strong transitions typically
appear. The rising index is the result of the wing you see in figure 2.11
(there plotted as a function of $\omega$) for a number of transitions. Further
in the UV the absorption becomes significant, blocking light, and for
glass it is the reason why you cannot become tanned sunbathing behind
a window. On the webpage http://refractiveindex.info/ you are able to
retrieve refractive index data for most common substances.
For a more accurate description of dense materials we need to modify
equation (2.84), or (2.80) if you prefer. The reason is that our assump-
tions regarding the local electric field are not completely true. In dense
materials atoms will feel neighboring atoms located in the vicinity which
modifies the local electric field. This field must be added to the external
field from the EMW. In many books on EM you may find the expression
for the total field on the atom:
$$E_{tot} = E + \frac{P}{3\varepsilon_0}.$$
Using this expression the index of refraction becomes:
$$\frac{n^2 - 1}{n^2 + 2} = \frac{q^2 N}{3m\varepsilon_0 V}\sum_j \frac{f_j}{\omega_{0j}^2 - \omega^2 - i\gamma_j\omega}. \tag{2.85}$$
This is known in optics as the Lorentz-Lorenz formula or in the static
(low frequency) case as the Clausius-Mossotti relation. Summing up, you
see that in general the index of refraction has both a real and an
imaginary part. The real part is responsible for a phase shift of the
wave in the medium compared to propagation in vacuum, while the
imaginary part describes the attenuation of the wave as it propagates
through the medium.
Problem 2.17 Show expression (2.85).
2.4.2 Real and complex index of refraction

We can write the complex index of refraction as a sum of a real part and
an imaginary part:
$$n = n' + in''.$$
The findings in equations (2.72-2.76) are still correct. From the wave
equation we obtain again the dispersion relation
$$k = \omega\frac{n(\omega)}{c},$$
as before. Mathematically, we have the following two possibilities: (a)
either the wave vector is complex and $\omega$ is real or (b) $\omega$ is complex and
the wave vector is real. Obviously, we prefer to have $\omega$ real and the
k-vector complex, $\mathbf{k} = \mathbf{k}' + i\mathbf{k}''$, because we want to consider
stationary situations.
Figure 2.13 Measured index of refraction for BK7 and fused silica
glass. BK7 is the most common glass type you will find used in camera
lenses etc. The BK7 glass is a borosilicate glass compound.
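In this way we also avoid interpreting complex times and frequencies.
Our plane wave solution then becomes:
$$\mathbf{E}(\mathbf{r},t) = \mathbf{E}_0\exp\!\big(i((\mathbf{k}' + i\mathbf{k}'')\cdot\mathbf{r} - \omega t)\big) \tag{2.86}$$
$$= \big(\mathbf{E}_0 e^{-\mathbf{k}''\cdot\mathbf{r}}\big)\exp\!\big(i(\mathbf{k}'\cdot\mathbf{r} - \omega t)\big), \tag{2.87}$$
so our wave will be attenuated in the material, corresponding to the
spontaneous emission scattering discussed above. If $\mathbf{k}'$ and $\mathbf{k}''$ are in
the same direction¹¹ $\hat{\mathbf{q}}$ we have
$$k' + ik'' = \omega\frac{(n' + in'')}{c},$$
and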
11 This is usually the case when the source of light is residing inside the absorbing
medium. For light entering from the outside with an incidence angle away from
the surface normal, things become slightly more intricate. We will discuss that
when we derive and study the Fresnel equations.
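$$\mathbf{E}(\mathbf{r},t) = \mathbf{E}_0\exp\!\big(i((\mathbf{k}' + i\mathbf{k}'')\cdot\mathbf{r} - \omega t)\big) \tag{2.88}$$
$$= \mathbf{E}_0\, e^{-\frac{\omega n''}{c}\hat{\mathbf{q}}\cdot\mathbf{r}} \cdot e^{-i\omega\left(t - \frac{n'}{c}\hat{\mathbf{q}}\cdot\mathbf{r}\right)} \tag{2.89}$$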
where $\hat{\mathbf{q}}$ is a unit vector in the wave vector direction. The quantity
$\delta = c/(\omega n'')$ we call the attenuation length or sometimes the skin
depth. In most cases we will ignore this attenuation since it is very
small for a transparent insulator, and thus only work with the real part
$n'$. But for metals the attenuation is significant and here it must be
included in the description as discussed further ahead.
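To see how strongly the attenuation length depends on the imaginary part of the index, one can evaluate $\delta = c/(\omega n'')$ for a few cases. The sketch below is illustrative only: the two values of $n''$ and the wavelength of 600 nm are assumed numbers chosen to contrast a nearly transparent medium with a strongly absorbing one, not material data.

```python
import math

C = 299792458.0   # speed of light in vacuum in m/s


def attenuation_length(wavelength_vac_m, n_imag):
    """Attenuation length delta = c / (omega n'') for the field amplitude."""
    omega = 2.0 * math.pi * C / wavelength_vac_m
    return C / (omega * n_imag)


def field_fraction(distance_m, wavelength_vac_m, n_imag):
    """Fraction exp(-z / delta) of the field amplitude left after a distance z."""
    return math.exp(-distance_m / attenuation_length(wavelength_vac_m, n_imag))


wavelength = 600e-9   # assumed vacuum wavelength
for label, n_imag in (("weakly absorbing, n'' = 1e-7", 1e-7),
                      ("strongly absorbing, n'' = 3", 3.0)):
    delta = attenuation_length(wavelength, n_imag)
    print(f"{label}: delta = {delta:.3e} m")
```

Since the intensity scales as the square of the field, it drops as $e^{-2z/\delta}$, i.e. twice as fast as the amplitude.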
Notice that when a complex index of refraction is in play, the E- and
B-fields need not be in phase! To see this, look at Faraday’s law that
we used before, written with the sign that matches equation (2.74):
$$\mathbf{B} = \frac{1}{\omega}\,\mathbf{k}\times\mathbf{E} = \frac{1}{\omega}(\mathbf{k}' + i\mathbf{k}'')\times\mathbf{E}.$$
As any complex number can be written $x + iy = r \cdot e^{i\theta}$, the two vectors
will be out of phase, but still mutually perpendicular.
2.4.3 Group and phase velocity

So far we have assumed monochromatic waves where amplitude, phase,
angular frequency and wave number take definite values. This is an
idealization of nature; no such sources exist. There will always be a
certain spread in frequency and wave number, perhaps even in phase and
amplitude. These conditions can be described by an appropriately
weighted sum over monochromatic waves, and for a continuous description
of the spread you will end up with an integral over (probability)
distributions. The range of frequencies (or equivalently wave numbers)
depends on the source we are considering. Sometimes it may be a large
interval corresponding to a range on the nm scale. Lasers, for example,
have line widths in the GHz to sub-Hz range, whereas the spectral width
of fluorescent or incandescent lamps can range from GHz up to a few
100 nm¹².
The phase velocity for a monochromatic wave, which you may call the
speed of an elementary wavelet, is the speed of a point with constant
12 Different communities in optics have different preferences. Laser physicists
typically use frequency spread, while lighting engineers often prefer wavelength
spread. Knowing the dispersion relation it is, of course, easy to convert one into
the other.
phase. We found earlier:
$$v_p = \frac{\omega}{k}. \tag{2.90}$$
In dispersive media the angular frequency is a function of the wave
number, $\omega(k)$, as expressed by the dispersion relation
$$\omega = \frac{c}{n(\omega)}\cdot k, \tag{2.91}$$
so the phase velocity is not the same for all frequency components of
the wave. Some components advance faster than others and consequently
change phase with respect to one another. Consider a weighted sum of
monochromatic waves with angular frequencies $\omega(k)$, where $g(k)$ is some
peaked weight function. The resulting wave is given by the integral:
$$\Phi(x,t) = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{+\infty} dk\; g(k)\, e^{i(kx - \omega(k)t)}.$$
The pre-factor $1/\sqrt{2\pi}$ is put for convenience so that the integral can
be interpreted as the (unitary) Fourier transform. It will make no
difference in our final conclusion, and could as well be omitted. With
the Fourier transform in mind the weight function $g(k)$ can be obtained
from, see figure 2.14:
$$g(k) = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{+\infty} dx\; \Phi(x,0)\, e^{-ikx}.$$
As the k-space and the x-space are linked by the Fourier transform, the
initial spatial pulse spread $\Delta x$ and the spread $\Delta k$ in the
$g(k)$-function obey an “uncertainty” relation $\Delta x\,\Delta k \sim 1$.
Assume the case of a peaked $g(k)$ function, as depicted in figure 2.14.
Since $n(\omega)$ does not change much over a narrow range of $\omega$ we can Taylor
expand $\omega(k)$ to first order around $k = k_0$:
$$\omega(k) \simeq \omega(k_0) + \left.\frac{d\omega}{dk}\right|_{k_0}(k - k_0) + \dots \tag{2.92}$$
$$= \omega_0 + v_g (k - k_0), \tag{2.93}$$
where $v_g$ is the group velocity. This expression we insert into the
integral for $\Phi(x,t)$ above and find:
$$\Phi(x,t) = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{+\infty} dk\; g(k)\, e^{i(kx - (\omega_0 + v_g(k - k_0))t)} \tag{2.94}$$
$$= \underbrace{e^{i(k_0 x - \omega_0 t)}}_{\text{unimportant phase factor}} \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{+\infty} dk\; g(k)\, e^{i(k - k_0)\cdot(x - v_g t)}, \tag{2.95}$$
which shows that the pulse form is unchanged and moving with speed
$v_g$. More generally, in three dimensions the two velocities are defined
as:
Group velocity

∂ω
vg = (2.96)
∂k k0
Phase velocity
ω0 k
vp = (2.97)
k0 |k|
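The distinction between the two velocities can be checked numerically. The following is a minimal sketch, not part of the original notes: it assumes the illustrative dispersion relation ω(k) = √(k² + 1) and a Gaussian weight function peaked at k0 = 1 with ∆k = 0.1 (taken with a negative exponent so that it is actually peaked), evaluates the integral for Φ(x, t) as a plain Riemann sum, and locates the maximum of the envelope |Φ|. The grid limits and step sizes are arbitrary choices made for the illustration.

```python
import cmath
import math

K0 = 1.0        # centre of the weight function g(k) (assumed value)
DELTA_K = 0.1   # width of g(k) (assumed value)

def omega(k):
    """Illustrative dispersion relation omega(k) = sqrt(k^2 + 1)."""
    return math.sqrt(k * k + 1.0)

def g(k):
    """Gaussian weight function peaked at K0."""
    return math.exp(-((k - K0) / DELTA_K) ** 2)

def phase_velocity(k):
    """v_p = omega / k, equation (2.90)."""
    return omega(k) / k

def group_velocity(k, h=1e-6):
    """Central-difference estimate of d(omega)/dk."""
    return (omega(k + h) - omega(k - h)) / (2.0 * h)

# k grid covering the region where g(k) is non-negligible
STEP_K = 0.0025
KS = [0.6 + i * STEP_K for i in range(321)]

def envelope(x, t):
    """|Phi(x, t)| from a Riemann sum over the k grid."""
    total = sum(g(k) * cmath.exp(1j * (k * x - omega(k) * t)) for k in KS)
    return abs(total) * STEP_K / math.sqrt(2.0 * math.pi)

def peak_position(t, x_min=0.0, x_max=30.0, dx=0.1):
    """Grid position x at which the envelope |Phi(x, t)| is largest."""
    n = int(round((x_max - x_min) / dx)) + 1
    xs = [x_min + i * dx for i in range(n)]
    return max(xs, key=lambda x: envelope(x, t))

t = 20.0
print("v_p(k0)       =", phase_velocity(K0))
print("v_g(k0)       =", group_velocity(K0))
print("envelope peak =", peak_position(t), " v_g * t =", group_velocity(K0) * t)
```

For this dispersion relation the phase velocity at k0 is larger than the group velocity, and the envelope maximum sits near x = vg t rather than near x = vp t.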
The energy transport associated with the wave is proportional to E · E∗.
As the envelope travels with the group velocity vg, this will also be the
speed of the energy transport, at least in most cases. However, it is
possible to consider cases where the light frequency is close to a strong
absorption line of the medium, or a medium involving gain. There, the
energy transport is not given by the group velocity. But for a passive
dielectric medium it is certainly true.
Figure 2.14 The group velocity vg is the speed at which the pulse
(black envelope) travels. The phase velocity vp is the speed of constant
phase. Here the overall phase speed is vp = ω0/k0. The figure
was constructed using g(k) = exp(−(k − 1)²/∆k²), ∆k = 0.1 and
ω(k) = √(k² + 1). Time was set to t = 1.
The energy transport of a wave (in a passive medium) is generally given
by:
Energy transport in passive medium:
\[
\mathbf{v}_g = \left.\frac{\partial \omega}{\partial \mathbf{k}}\right|_{\mathbf{k}_0} \tag{2.98}
\]
Problem 2.18 For a dispersive medium ω = kc/n(ω) show that
\[
v_g = v_p + k \frac{dv_p}{dk} .
\]
Problem 2.19 Using the dispersion relation given in equation (2.70)
show that
\[
v_g = \frac{c}{n(\omega) + \omega \frac{dn}{d\omega}}
\]
Problem 2.20 Given the dispersion relation ω² − k²c² = constant,
show that
\[
v_g \cdot v_p = c^2
\]
In the presence of dispersion the group and phase velocities generally take
different values. A question often asked: is it possible for these velocities
to be larger than the speed of light in vacuum? The answer is yes for
both of them. It is a widespread misconception that the phase velocity
may exceed c while the group velocity must be below the speed of light
since it is connected to the energy transport velocity. In fact, only when
the light frequency is far from strong absorption lines, and for systems
with no gain, is it true that the group velocity is the energy transport
velocity. The group velocity may be shown to be given by:
\[
v_g = \frac{c}{n(\omega) + \omega \frac{dn}{d\omega}} , \tag{2.99}
\]
see the problem above. The feature to notice is the factor n(ω) + ω dn/dω.
In a spectral region where we have normal dispersion, dn/dω > 0, the
group velocity is less than c, and may even be significantly less than c,
as shown for example in Lene Hau’s experiment on slow light, which we
will encounter in the exercise class. In a spectral region of anomalous
dispersion, where dn/dω < 0, it is possible for the group velocity to be
larger than c, even infinite or negative. For a long time this was believed
to be in conflict with Einstein’s special theory of relativity, but
Sommerfeld and Brillouin proved it not to be¹³. The important thing here
is to prove that no signal velocity, enabling transport of information,
may exceed c. In no experiment so far have we observed transport of
information faster than the speed of light!
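To get a feeling for equation (2.99), the short sketch below evaluates vg for a few values of n and ω dn/dω. The numbers are hypothetical and chosen only to illustrate the three regimes mentioned in the text (slower than c, faster than c, negative); they do not describe a particular material.

```python
C = 299_792_458.0  # speed of light in vacuum, m/s

def group_velocity(n, omega_dn_domega):
    """v_g = c / (n + omega * dn/domega), equation (2.99)."""
    return C / (n + omega_dn_domega)

# Hypothetical values, for illustration only
normal = group_velocity(1.5, 0.1)       # normal dispersion: v_g < c
anomalous = group_velocity(1.0, -0.5)   # anomalous dispersion: v_g > c
negative = group_velocity(1.0, -1.5)    # strong anomalous dispersion: v_g < 0

for label, v in (("normal", normal), ("anomalous", anomalous), ("negative", negative)):
    print(label, v / C)
```

None of these values says anything about the signal velocity, which is the quantity constrained by relativity.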
You may wonder what is really a signal then? A pure sine wave is not
a signal since it has no beginning and no end, it extends from minus
infinity to plus infinity. We have no way of saying what the onset of
a pure sine wave is. If we want to say something about propagation
of a signal, we have to define a start, a beginning, an onset showing
an element of ”surprise” which we could not have predicted from the
eternal periodic wave motion itself.
13 See for example L Brillouin ”Wave Propagation and Group Velocity” (New York:
Academic) 1960 or P.W. Milonni ”Fast Light, Slow Light and Left-Handed
Light” (IOP) Series in Optics and Optoelectronics (2004)
[Figure 2.15: two ω, k- diagrams with ω on the vertical axis and k on the
horizontal axis. Curve labels: vg > vp, vg = vp, vg < vp; the slope
vg = dω/dk and the ratio vp are indicated.]
Figure 2.15 Dispersion relations plotted in an ω, k- diagram. The
straight line corresponds to no dispersion, ω = vk. The two other
cases show dispersion. The local slope of the curves yields the group
velocity vg, which is compared to the ratio vp = ω/k by looking at
the subtended angle.
A good example of a signal, which may be used to transfer information,
is the simple function
\[
E(t) = E_0\, \theta(t) \cdot \sin(\omega t) \tag{2.100}
\]
where θ(t) is the Heaviside step function¹⁴ and E0 the signal amplitude.
This is a classic example of a signal and one may prove that it cannot
(also in more general form) propagate with a speed larger than c, even
in a spectral region of anomalous dispersion. It takes a bit of math to
do so. Interestingly, Sommerfeld was the first to address the problem of
signal velocity and made a beautiful proof that it is always less than c.
2.4.4 Dispersion viewed in ω, k- diagrams

As we have seen in the preceding section optical waves undergo disper-
sion in materials. A useful way of plotting dispersion curves is in the
ω, k- diagram. In figure 2.15 we plot dispersion relations (artificial, just
made up) in the case where you have no dispersion ω = vp k, and two
cases where you have dispersion.
Problem 2.21 Can you find examples of mechanical waves that show
dispersion and others that do not?
14 The Heaviside step function is defined by:
\[
\theta(t) = \begin{cases} 0, & t \le 0 \\ 1, & t > 0. \end{cases}
\]
2.5 Negative refractive index

The refractive index was introduced as a complex quantity, see
equation (2.86):
\[
n = n' + i n'' , \tag{2.101}
\]
and generally we considered n′ to be positive. In fact we also assumed
Re(ε) > 0 and Re(µ) > 0, but this need not be so.
One of the perhaps most fascinating discoveries of modern optics is
the introduction of a negative refractive index. At first this seems not
to make any sense, but it does. A negative index of refraction results
from negative values of both the dielectric permittivity and the magnetic
permeability. In fact, Maxwell’s equations do not preclude the existence
of one or both of the quantities dielectric permittivity and magnetic
permeability to be negative. Nor do such materials violate any other
fundamental physical laws. If both the dielectric permittivity and the
magnetic permeability are negative one may show that this condition
must lead to a negative index of refraction! We will consider this below.
Back in 1967, Veselago¹⁵ pioneered theoretical studies on the subject
and he was the first to consider the physics of a medium that had both
negative dielectric permittivity and negative magnetic permeability in
a given frequency range (Re(ε) < 0 and Re(µ) < 0; the imaginary part
we disregard). He predicted a number of striking consequences for such
materials: a modified Snell’s law of refraction, a reversed Doppler shift,
converging lenses becoming diverging lenses, etc. Despite Veselago’s
remarkable predictions it remained a rather academic discussion, as
materials with both ε and µ negative had not yet been discovered. Most
surprisingly, it turns out that these materials may not be found in nature.
In the year 2000 a major breakthrough took place. A man-made engineered
structure was developed and experimentally shown to exhibit a
negative index of refraction (NRI) over a band of microwave
frequencies¹⁶, as shown in figure 2.17. Since then a large number of
experimental and theoretical studies have been carried out. On the
experimental side, groups are approaching the visible regime, making ever
more clever structures with peculiar characteristics.
15 V. G. Veselago, Sov. Phys. Usp. 10, 509 (1968)
16 D. R. Smith, et al., Phys. Rev. Lett. 84, 4184 (2000)
[Figure 2.16: in both top panels the surface normal is labeled ”Normal”.]
Figure 2.16 (Top) Materials with a negative index of refraction bend
beams of light the ”wrong” way. On the left you see an air interface
with an ordinary material having n > 0. According to Snell’s law
n1 sin(θ1) = n2 sin(θ2), with n1 = 1 and n2 = n, we may deduce the
refracted beam path in the material. For an NRI material, n2 = −|n|,
the beam will bend the opposite way with an angle of −|θ2|. (Bottom)
Illustration of how refraction would look in regular water (left)
as opposed to water with a negative refractive index (right).
Theoretically, why is the index of refraction negative when ε, µ are
simultaneously negative? To answer this recall equation (2.78) linking
the unit vector Ŝ along Poynting’s vector to the k-vector:
\[
\mathbf{k} = \frac{n \omega}{c}\, \hat{\mathbf{S}} , \tag{2.102}
\]
where n is the refractive index and c the speed of light in vacuum.
Notice that when both ε < 0 and µ < 0 the value n² is naturally positive,
but effectively we lose the sign due to the square. So what to do then?
Let us look at Maxwell’s equations, from which we obtain the relations
(Faraday’s law, Ampere’s law):
\begin{align}
\mathbf{k} \times \mathbf{E} &= \omega \mu \mathbf{H} \tag{2.103} \\
\mathbf{k} \times \mathbf{H} &= -\omega \varepsilon \mathbf{E} . \tag{2.104}
\end{align}
[Figure 2.17: panel (A) carries the labels ”Split-ring resonators” and ”Wires”.]
Figure 2.17 (A) An NRI material formed by split-ring resonators
and wires deposited lithographically on opposite sides of a standard
circuit board. The height of the structure is 1 cm. (B) The power
detected as a function of angle in a Snell’s law experiment performed
on a Teflon sample (blue curve) and the NRI sample (red curve).
From R. A. Shelby, et al., Science 292 77 (2001)
When ε > 0 and µ > 0, as for regular dielectric media, the vectors E,
H and k form a right-handed triad, as do the vectors E, B and k.
However, when ε < 0 and µ < 0 the vectors E, H and k form a
left-handed triad! For this reason such materials are often named
left-handed materials. To recognize the left-handed triad we may recast
equations (2.103) and (2.104) in the form (ε < 0, µ < 0):
\begin{align}
\mathbf{k} \times \mathbf{E} &= \omega |\mu| (-\mathbf{H}) \tag{2.105} \\
\mathbf{k} \times (-\mathbf{H}) &= -\omega |\varepsilon| \mathbf{E} , \tag{2.106}
\end{align}
since obviously |ε| and |µ| are now positive values. In agreement with
the conclusion above the H vector for left-handed materials is pointing
in the opposite direction compared to right-handed systems. Thus the
E, H and k system now forms a left-handed triad. Furthermore, the
Poynting vector is still given by S = E × H, so this vector is now point-
ing in the opposite direction compared to the k-vector! Then we must
conclude, based on equation (2.102), that the index of refraction is nega-
tive, otherwise this relation will not hold. This is indeed a new behavior
compared to ordinary dielectrics. The energy flow S is anti-parallel to
the k-vector - wow! Absolutely beautiful and all still in accordance with
Maxwell’s equations.
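The sign argument can be verified with a few lines of vector algebra. This is a minimal sketch in dimensionless units (ε, µ and ω are plain numbers, an assumption made purely for illustration): it fixes k along +z and E along +x, obtains H from equation (2.103), and then checks equation (2.104) and the direction of S = E × H relative to k.

```python
import math

def cross(a, b):
    """Cross product of two 3-vectors given as tuples."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def dot(a, b):
    """Scalar product of two 3-vectors."""
    return sum(x * y for x, y in zip(a, b))

def plane_wave_fields(eps, mu, omega=1.0):
    """Return (k, E, H, S) for a plane wave with k along +z and E along +x.

    eps and mu must have the same sign so that |k| = omega * sqrt(eps * mu)
    is real. Units are dimensionless (illustrative assumption).
    """
    k = (0.0, 0.0, omega * math.sqrt(eps * mu))
    E = (1.0, 0.0, 0.0)
    # Equation (2.103): k x E = omega * mu * H
    H = tuple(c / (omega * mu) for c in cross(k, E))
    S = cross(E, H)  # Poynting vector S = E x H
    return k, E, H, S

for eps, mu in ((1.0, 1.0), (-1.0, -1.0)):
    k, E, H, S = plane_wave_fields(eps, mu)
    print("eps =", eps, "mu =", mu, "S.k =", dot(S, k))
```

With both ε and µ positive the scalar product S · k is positive; with both negative it changes sign, i.e. the energy flow is anti-parallel to k.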
The novel properties of negative refractive index media and their po-
tential applications have generated a significant amount of research in-
terest over the past years. Examples of such media, termed “metamateri-
als”, have been constructed using periodic arrays of wires and split-ring
resonators, and by transmission line elements. All these exotic materi-
als have been shown to exhibit the properties originally predicted by
Veselago.
Figure 2.18 A ray-tracing program was used to calculate ray trajectories
in the cloak unit. The rays essentially follow the Poynting
vector. (A) A two-dimensional (2D) cross section of rays striking the
system developed by J. Pendry et al., diverted within the annulus of
cloaking material contained within R1 < r < R2 to emerge on the
far side undeviated from their original course. (B) A 3D view of the
same process. From J. Pendry, D. Schurig, and D. Smith, Science 312
1780-1782 (2006).
Some of the most notable consequences include effects predicted recently
by J. Pendry and co-workers. His group has theoretically shown
that cloaking is possible with metamaterials¹⁷. Yes, Harry Potter with
the invisibility cloak. Figure 2.18 shows how light is deflected around an
object based on NRI material. Notice how the light rays resume their
original direction as if the object was not present at all, spectacular.
Another interesting issue to consider is if you imagine yourself standing
inside the NRI material. Here no light will strike you and it will be
totally dark! This is not exactly what Harry Potter experiences, right?
Now you may wonder: is this only possible in theory? No, experimentally
it has been demonstrated by a group of British scientists in 2006¹⁸,
right after the theoretical paper appeared. In their demonstration, a
copper cylinder was ’hidden’ inside a cloak constructed according to the
theoretical prescription. The cloak was based on artificially structured
metamaterials, designed for operation over a relatively narrow band of
microwave frequencies. The cloak decreased scattering from the hidden
object while at the same time reducing its shadow, so that the cloak and
object combined began to resemble empty space.
17 J. Pendry, D. Schurig, and D. Smith, Science 312 1780-1782 (2006)
18 D. Schurig et al., “Metamaterial Electromagnetic Cloak at Microwave
Frequencies”, Science 314 977-980 (2006).
2.6 Maxwell’s Equations for a conductor

A metal or a good conductor can be considered as an ionic lattice embedded
inside a sea of electrons. The ionic lattice causes a periodic potential
for the electrons. It is a little tricky to make analytical calculations for
the motion of electrons inside a periodic potential and for sure we should
treat the motion quantum mechanically, including also the interaction
between electrons. To simplify things, we often adopt a free electron
model which treats the ions as a uniform fixed background of positive
charge with a sea of electrons superposed. This works, perhaps surprisingly,
quite well for most real metals. This approximation is called a free
electron gas model. We will see why it is that metals look shiny, i.e.
why they reflect most visible light. We will not be able to explain the
distinct colors of e.g. silver, gold and copper – this would require a more
refined treatment of the band structure of energy levels for electrons in
metals, which we leave to your courses in solid state physics.
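Maxwell’s equations become:
Conductor
\begin{align}
\nabla \cdot \mathbf{D} &= \rho \tag{2.107} \\
\nabla \cdot \mathbf{B} &= 0 \tag{2.108} \\
\nabla \times \mathbf{E} &= -\frac{\partial \mathbf{B}}{\partial t} \tag{2.109} \\
\nabla \times \mathbf{H} &= \mathbf{J} + \frac{\partial \mathbf{D}}{\partial t} \tag{2.110}
\end{align}
where the current density is related by Ohm’s law
\[
\mathbf{J} = \sigma \cdot \mathbf{E}
\]
to the electric field and the conductivity σ of the metal ([σ] = (Ω m)⁻¹).
Generally, the conductivity is a function of the frequency ω. Again, we
seek to derive the wave equation for the electric and magnetic field
components.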
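Firstly, we now approximate ∇ · E = 0. This follows from:
\[
\nabla \cdot (\nabla \times \mathbf{H}) = 0 = \sigma \nabla \cdot \mathbf{E} + \frac{\partial (\nabla \cdot \mathbf{D})}{\partial t} .
\]
Using Gauss’ law and Ohm’s law we can write the continuity equation
in terms of charge density as
\[
\frac{d\rho}{dt} + \frac{\sigma}{\varepsilon} \rho = 0 ,
\]
which has the solution
\[
\rho(t) = \rho_0\, e^{-t/\tau} ,
\]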
with τ = ε/σ. Since σ is typically a few 10⁶ (Ω m)⁻¹ and ε ∼ 10⁻¹² F/m,
the time constant τ becomes 10⁻¹⁸ s, significantly shorter than 1/ω for
visible light, which is 10⁻¹⁵ s. What does this mean? Generally, in a
conductor, if a charge density is built up by some process, like in our
case where the electric field of the EMW pushes the free electrons, the
charge density will rapidly decay. This is due to the repulsive nature of
the electrons and the fact that they are nearly free to move around. It
corresponds to the same case you studied in electrostatics where charges
in a conductor reside entirely on its surface. We can therefore safely
put ∇ · E = 0 for a conductor interacting with visible light or
radiation of lower frequencies¹⁹. For x-rays or even higher frequencies this
approximation is not valid.
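The order-of-magnitude comparison can be made explicit. The sketch below uses the round values quoted in the text for σ and ε, and assumes a wavelength of 500 nm as a representative visible wavelength (this wavelength is an assumption, it is not specified in the notes).

```python
import math

C = 299_792_458.0  # speed of light in vacuum, m/s

def relaxation_time(eps, sigma):
    """Charge relaxation time tau = eps / sigma."""
    return eps / sigma

def inverse_angular_frequency(wavelength_m):
    """1 / omega for light of the given vacuum wavelength."""
    return wavelength_m / (2.0 * math.pi * C)

tau = relaxation_time(eps=1e-12, sigma=1e6)          # round values from the text
inv_omega = inverse_angular_frequency(500e-9)        # assumed visible wavelength

print("tau     =", tau, "s")
print("1/omega =", inv_omega, "s")
print("ratio   =", inv_omega / tau)
```

The charge density relaxes a few hundred times faster than the optical field oscillates, which is the justification for setting ∇ · E = 0.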
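The wave equation is obtained, as before:
\[
\nabla \times (\nabla \times \mathbf{E}) = -\nabla^2 \mathbf{E} = -\frac{\partial (\nabla \times \mathbf{B})}{\partial t} = -\mu \varepsilon \frac{\partial^2 \mathbf{E}}{\partial t^2} - \sigma \mu \frac{\partial \mathbf{E}}{\partial t}
\]
We finally have:
Wave equation metal 1:
\[
-\nabla^2 \mathbf{E} + \mu \varepsilon \frac{\partial^2 \mathbf{E}}{\partial t^2} + \sigma \mu \frac{\partial \mathbf{E}}{\partial t} = 0 \tag{2.111}
\]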
The first two terms we recognize as just the “normal” wave equation
ingredients for dielectric materials. The third term, on the other hand,
we did not have before. This term is an attenuation or damping term.
The higher σ is, the better a conductor our metal is and the larger the
damping term becomes. The damping term forces the amplitude of the
EMW in the metal to decay really quickly or, in other words, over very
short distances.
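For monochromatic waves of the form E(r, t) = E(r)e^{−iωt}, we have:
\[
\nabla^2 \mathbf{E} + (\varepsilon \mu \omega^2 + i \omega \sigma \mu) \mathbf{E} = 0 ,
\]
using c² = 1/(ε0 µ0) and the definition of εr, µr in equation (2.70) we find:
\[
\nabla^2 \mathbf{E} + \frac{\omega^2}{c^2} \left( \varepsilon_r \mu_r + i \frac{\sigma \mu}{\omega \varepsilon_0 \mu_0} \right) \mathbf{E} = 0 .
\]
We restrict ourselves to non-magnetic materials, µ ≈ µ0 (at optical
frequencies), so we finally obtain:
\[
\nabla^2 \mathbf{E} + \frac{\omega^2}{c^2} \left( \varepsilon_r + i \frac{\sigma}{\omega \varepsilon_0} \right) \mathbf{E} = 0 .
\]
Wave equation metal 2:
\[
(\nabla^2 + k^2) \mathbf{E} = 0 \tag{2.112}
\]
with k complex:
\[
k^2 = \frac{\omega^2}{c^2} \left( \varepsilon_r + i \frac{\sigma}{\omega \varepsilon_0} \right)
\]
19 You might have spotted a flaw in our argument. We already encountered a
frequency dependent permittivity and we will see that the conductivity is also
frequency dependent. So how should one interpret this time constant? Let us just
state that the approximation chosen works.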
Using that n² = (ck)²/ω² and introducing the complex index of
refraction as before, we can write:
\[
(n' + i n'')^2 = \varepsilon_r + i \frac{\sigma}{\omega \varepsilon_0} = a + i b .
\]
On the right hand side we just split the square of the complex refractive
index into its real (a) and imaginary (b) part. For the components
of the complex refractive index itself we arrive after a straightforward
calculation at:
\begin{align}
n' &= \sqrt{\frac{1}{2} \left[ a + \sqrt{a^2 + b^2} \right]} , \tag{2.113} \\
n'' &= \sqrt{\frac{1}{2} \left[ -a + \sqrt{a^2 + b^2} \right]} . \tag{2.114}
\end{align}
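Equations (2.113) and (2.114) are easy to check numerically by squaring the result. The following sketch assumes b ≥ 0 (as is the case for b = σ/(ωε0) with a real, positive σ); the sample values of a and b are hypothetical.

```python
import math

def complex_index(a, b):
    """Return (n', n'') from equations (2.113) and (2.114).

    Assumes b >= 0, so that both n' and n'' are taken non-negative.
    """
    root = math.hypot(a, b)                 # sqrt(a^2 + b^2)
    n_real = math.sqrt(0.5 * (a + root))
    n_imag = math.sqrt(0.5 * (-a + root))
    return n_real, n_imag

# Hypothetical values: a metal-like case and a lossless dielectric
for a, b in ((-10.0, 2.0), (2.25, 0.0)):
    n_real, n_imag = complex_index(a, b)
    squared = complex(n_real, n_imag) ** 2  # should reproduce a + i b
    print((a, b), "->", (n_real, n_imag), "check:", squared)
```

For a < 0 with small b, as in the first example, n″ is much larger than n′, which already signals strong attenuation of the wave.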
Note that what we did is to lump the contributions of conductivity
and polarisability to the response of the material to electric fields into
a single complex dielectric constant. This is a reasonable thing to do
once we are interested in the behavior at some frequency different from
zero. Here the conduction electrons perform forced oscillatory motion of
finite amplitude, and the difference to the forced oscillations of bound
electrons is only a quantitative one. In fact, up to optical frequencies
the response of a metal is dominated by the behavior of the conduction
electrons and we can neglect the material polarization due to the much
more tightly bound core electrons. Further below, we will introduce a model
for the conductivity and its frequency dependence, and we will see that
the complex conductivity also contributes massively to the real part of
the complex permittivity.
One important characteristic feature of the permittivity of metals is that
the imaginary part is very often significantly larger than the
real part, i.e., a ≪ |b|. This is not the case for dielectrics, as you may
remember. The skin depth, i.e., the depth at which the electric field has
decreased to 1/e of its original value, can be shown to be:
\[
\delta \simeq \frac{c}{\omega} \sqrt{\frac{2}{b}} = \sqrt{\frac{2}{\mu_0 \omega \sigma}} ,
\]
in the approximation b ≫ |a|.
Problem 2.22 Show the skin depth expression above, valid for ω ≪ σ/ε.
Example 2.23 Skin depth of silver

Silver has a conductivity of σ = 6.3 · 10⁷ S/m. At a microwave frequency
of 10¹⁰ Hz we find δ = 634 nm.
Example 2.24 Skin depth of seawater

Seawater typically has a conductivity of σ = 4 S/m. This leads to a
skin depth of about δ = 25 cm at 1 MHz. This is the reason why it is very
difficult to communicate with a submerged submarine. In fact σ varies
from 2.5 S/m for cold, deep water to about 6 S/m for warm surface
water. VLF radio waves (3-30 kHz) can penetrate seawater to a depth of
approximately 15-20 meters. However, VLF radio signals cannot carry
audio (voice), and only transmit text messages at a slow data rate, some
400-500 words per minute. Why not go to frequencies of, say, 10 Hz, where
signals may reach 100 m? At these very low frequencies it becomes an
extreme challenge to produce the radiation. For 10 Hz the wavelength is
30,000 km, so the usual half-wavelength dipole antenna method cannot
be used.
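The two examples can be reproduced with the good-conductor expression δ = √(2/(µ0 ω σ)). The sketch below is only an arithmetic check: it assumes that the quoted frequencies are ordinary frequencies f, so that ω = 2πf, and that the approximation b ≫ |a| holds in each case.

```python
import math

MU0 = 4.0e-7 * math.pi  # vacuum permeability, H/m

def skin_depth(sigma, frequency_hz):
    """Skin depth sqrt(2 / (mu0 * omega * sigma)) with omega = 2 pi f."""
    omega = 2.0 * math.pi * frequency_hz
    return math.sqrt(2.0 / (MU0 * omega * sigma))

silver = skin_depth(sigma=6.3e7, frequency_hz=1e10)     # Example 2.23
seawater_mhz = skin_depth(sigma=4.0, frequency_hz=1e6)  # Example 2.24
seawater_10hz = skin_depth(sigma=4.0, frequency_hz=10.0)

print("silver, 10 GHz   :", silver * 1e9, "nm")
print("seawater, 1 MHz  :", seawater_mhz * 100.0, "cm")
print("seawater, 10 Hz  :", seawater_10hz, "m")
```

The silver and 1 MHz seawater values agree with the numbers quoted in the examples; at 10 Hz the same formula gives a skin depth of several tens of meters, the same order of magnitude as the 100 m mentioned above.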
2.6.1 Frequency dependent conductivity: Drude’s model

The conductivity is frequency dependent. This is due to the finite
inertial mass of the electrons. As the frequency of the EMW increases, the
conduction electrons follow the field with increasing difficulty, up to
a certain point where they cannot follow anymore. We thus expect
a steady decrease of the conductivity as a function of EMW frequency ω.
In Drude’s free electron model, electrons move around with a drift
speed²⁰ determined by Newton’s second law and the Lorentz force:
\[
\frac{d\mathbf{v}}{dt} = \frac{-e}{m} \mathbf{E} - \frac{\mathbf{v}}{\tau_c} .
\]
Here, τc is the timescale for collisions that scramble the electron velocity,
leading to damping of the drift speed. The differential equation looks
very similar to the equation we studied previously in connection with
Rayleigh scattering and the model of the index of refraction:
\[
\ddot{x} + \gamma \dot{x} = \frac{q E_0}{m}\, e^{-i \omega t} .
\]
But now we have ω0 → 0 as the conduction electrons are free to move
within the metal without any local restoring force. Notice γ = 1/τc. With
q = −e, the stationary solution to this differential equation is:
\[
x(t) = \frac{e E_0}{m (\omega^2 + i \gamma \omega)}\, e^{-i \omega t}
\]
The damping term is a result of electron-electron and ion-electron
interactions and collisions²¹. The electron speed is damped on a characteristic
time scale τc, where τc ∼ 10⁻¹⁴ − 10⁻¹⁵ s. To find the conductivity
as a function of the frequency of the EMW we use the expression for the
current density:
\[
\mathbf{J} = \sigma \cdot \mathbf{E} = -e n_e \mathbf{v} ,
\]
where ne is the electron density, e the electron charge and v the electron
velocity imposed by the electric field E. The electron velocity is deduced
from the stationary solution x(t) above and inserted into the expression
for the current density to yield for σ:
\begin{align}
\sigma &= \frac{e^2 n_e}{m} \cdot \frac{i}{\omega + i \gamma} \tag{2.115} \\
&= \frac{e^2 n_e \tau_c}{m \varepsilon_0} \cdot \frac{\varepsilon_0}{1 - i \omega \tau_c}
\end{align}
20 The drift speed is typically some mm/s as opposed to the mean velocity of the
electron in the atom which is of the order αc ∼ 10⁶ m/s. For conduction
electrons in a metal the relevant velocity is the Fermi velocity, which has similar
magnitude.
21 Ultimately, these collisions are the cause for the conversion of electromagnetic
energy from the driving field into heat.
This leads to a dynamic conductivity of:
Dynamic conductivity:
\[
\sigma(\omega) = \frac{\sigma_0}{1 - i \omega \tau_c} \tag{2.116}
\]
with
\[
\sigma_0 = \frac{e^2 n_e \tau_c}{m}
\]
Let us analyze equation (2.116). For very low frequencies the real part of
the conductivity dominates. At d.c., i.e., for ω → 0, the conductivity
becomes purely real and equal to σ0. We conclude that current and electric
field are in phase for low EM frequencies. As the frequency increases, the
imaginary part becomes more pronounced and a significant phase shift between
electric field and current develops. Finally, the conductivity vanishes,
σ → 0, with increasing ω, as we expected: the electrons cannot follow
any more, due to their finite inertial mass.
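The three regimes of the dynamic conductivity can be made concrete with a short numerical sketch. The values σ0 = 6 · 10⁷ S/m and τc = 10⁻¹⁴ s are assumptions chosen for illustration (τc lies in the range quoted above); they are not data for a specific metal.

```python
import cmath
import math

def drude_conductivity(omega, sigma0, tau_c):
    """Dynamic conductivity sigma(omega) = sigma0 / (1 - i omega tau_c)."""
    return sigma0 / (1.0 - 1j * omega * tau_c)

SIGMA0 = 6.0e7   # S/m, assumed d.c. conductivity
TAU_C = 1.0e-14  # s, assumed collision time

for omega_tau in (0.0, 0.01, 1.0, 100.0):
    sigma = drude_conductivity(omega_tau / TAU_C, SIGMA0, TAU_C)
    phase_deg = math.degrees(cmath.phase(sigma))
    print("omega*tau_c =", omega_tau,
          "|sigma|/sigma0 =", abs(sigma) / SIGMA0,
          "phase =", phase_deg, "deg")
```

At ω = 0 the conductivity is real, at ωτc = 1 the current lags the field by 45° and |σ| has dropped to σ0/√2, and for ωτc ≫ 1 the magnitude falls off as 1/(ωτc).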
Figure 2.19 Plot of the reflectance of a metal as a function of the
reduced frequency ω/ωp. For frequencies above the plasma frequency
the reflectance drops fast and the metal becomes transparent for
increasing frequencies. Below the plasma frequency the metal has a
reflectance close to unity. How close to unity depends on how good
the conductivity of the metal is.
We introduce a characteristic frequency ωp, the electron plasma
frequency, which plays an important role in plasma and solid state
physics:
\[
\omega_p = \sqrt{\frac{e^2 n_e}{\varepsilon_0 m}}
\]
where, as before, ne is the electron density and m the electron mass.
We can relate the electron plasma frequency to the DC-conductivity as
σ0 = ωp² ε0 τc. For Cu the plasma frequency is 1.6 · 10¹⁶ Hz. As you can
see, the plasma frequency is different for each species. Physically, the
electron plasma frequency corresponds to the characteristic oscillation
frequency of the electrons in a neutral metal when exposed to a small
charge separation from its equilibrium position, neglecting damping effects²².
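As a quick check of the number quoted for copper, the sketch below evaluates ωp from the electron density. The density ne ≈ 8.5 · 10²⁸ m⁻³ (roughly one conduction electron per Cu atom) and the collision time are assumptions that do not appear in the notes; the result is an angular frequency in rad/s.

```python
import math

E_CHARGE = 1.602176634e-19   # elementary charge, C
EPS0 = 8.8541878128e-12      # vacuum permittivity, F/m
M_E = 9.1093837015e-31       # electron mass, kg

def plasma_frequency(n_e):
    """Electron plasma frequency omega_p = sqrt(e^2 n_e / (eps0 m))."""
    return math.sqrt(E_CHARGE ** 2 * n_e / (EPS0 * M_E))

def dc_conductivity(n_e, tau_c):
    """sigma0 = e^2 n_e tau_c / m."""
    return E_CHARGE ** 2 * n_e * tau_c / M_E

N_CU = 8.5e28    # m^-3, assumed conduction electron density of copper
TAU_C = 1.0e-14  # s, assumed collision time

omega_p = plasma_frequency(N_CU)
print("omega_p(Cu) =", omega_p, "rad/s")
# The relation sigma0 = omega_p^2 * eps0 * tau_c is an identity:
print(dc_conductivity(N_CU, TAU_C), omega_p ** 2 * EPS0 * TAU_C)
```

With this assumed density the result is close to the value 1.6 · 10¹⁶ given in the text.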
If we neglect the effect of damping, τc → ∞²³, and assume εr = 1
in equation (2.112) (this means we neglect the effect of the bound core
electrons), we can to a good approximation write the index of refraction
of a metal as:
\[
n^2 = 1 - \frac{\omega_p^2}{\omega^2} \tag{2.117}
\]
22 Consider a one-dimensional case in which one of the charge species (say the
electrons) is displaced from its quasi-neutral position by a small (infinitesimal)
distance ∆x. The charge density which develops on one face of the slab is simply
ρ = e ne ∆x. An equal but opposite charge density develops on the opposite face.
The resulting electric field generated inside the slab is given by Gauss’s law:
Ex = −ρ/ε0 = −e ne ∆x/ε0. Newton’s law applied to an individual electron inside
the material gives
\[
m \frac{d^2 \Delta x}{dt^2} = e E_x = -m\, \omega_p^2\, \Delta x ,
\]
giving ∆x = (∆x)0 cos(ωp t).
23 Effectively, we only need to assume ωp τc ≫ 1 and to consider frequencies
ωτc ≫ 1.
Table 2.1 Plasma frequency. (a)

Atom        λp calculated [nm]   λp observed [nm]   νp observed [Hz]
Lithium     155                  155                1.94 · 10¹⁵
Sodium      209                  210                1.43 · 10¹⁵
Potassium   287                  315                0.95 · 10¹⁵
Rubidium    322                  340                0.88 · 10¹⁵

(a) Plasma wavelengths and frequencies for some alkali elements.
Let us analyze the consequences of equation (2.117). For ω < ωp
the index of refraction becomes purely imaginary, and the electric field decays
exponentially as a function of propagation distance. In contrast, when
ω > ωp the index of refraction becomes real, and the wave can propagate
inside the metal - the metal becomes transparent! Alkali metals are
transparent in the deep UV regime.
To look a little ahead, we consider what happens to a wave which
tries to enter a metal from the outside (vacuum). The reflectance of a
material interface (here, vacuum with n1 = 1 to a metal with index
n2 = n′ + in″) can be written as:
\[
R = \frac{n_1 - n_2}{n_1 + n_2} \cdot \overline{\left( \frac{n_1 - n_2}{n_1 + n_2} \right)} = \frac{(n' - 1)^2 + n''^2}{(n' + 1)^2 + n''^2}
\]
where the overbar denotes complex conjugation, and n1 = 1 is used in
the last step. In figure 2.19 we show a typical case of the reflectance R
of a metal plotted against the reduced frequency ω/ωp . For light fre-
quencies close to the plasma frequency ω ∼ ωp the reflectance decreases
rapidly. At even higher frequencies the reflectance goes to zero. Here the
oscillation frequency of the field becomes so fast, that the conduction
electrons cannot follow anymore and the metal becomes transparent24 .
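The qualitative shape of the reflectance curve follows directly from the undamped index n² = 1 − ωp²/ω² combined with the reflectance formula above. The sketch below is an idealized, lossless illustration (no damping, εr = 1); a real metal with finite τc has a reflectance somewhat below unity for ω < ωp.

```python
import cmath

def refractive_index(w):
    """Complex index n = sqrt(1 - 1/w^2), with w = omega / omega_p (no damping)."""
    return cmath.sqrt(1.0 - 1.0 / (w * w))

def reflectance(w):
    """R = |(1 - n) / (1 + n)|^2 for a vacuum-metal interface (n1 = 1)."""
    n = refractive_index(w)
    r = (1.0 - n) / (1.0 + n)
    return abs(r) ** 2

for w in (0.25, 0.5, 0.9, 1.1, 2.0, 5.0):
    print("omega/omega_p =", w, " R =", reflectance(w))
```

Below the plasma frequency n is purely imaginary and R = 1; above it n is real and smaller than one, and R falls quickly towards zero.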
In figure 2.20 we show the reflectance of aluminum, silver and gold.
These metals provide high average reflectivity (> 95% - 97%) over a
broad range of wavelengths and are commonly used as “all wavelength”
mirrors for various purposes²⁵. Applications range from astronomical
telescopes, broadband spectrometers and widely tunable laser sources to
solar ovens and cameras. Aluminum mirrors provided with a protective
coating, a transparent dielectric layer added to increase the mirror’s
robustness, offer an inexpensive all-round mirror over a large range of
wavelengths. In the range 400 nm - 2000 nm silver has the highest average
reflectance and is often preferred when the reflectance property is
critical. Gold mirrors have large average reflectance (> 96%) in the range
0.8 µm - 20 µm and are good in the infrared (IR) spectral region. An
additional coating layer can be added to enhance the mirror performance
(R > 98%) around a given design wavelength, but comes at the expense
of a narrower high reflective region.
24 Note that the refractive index in this approximation stays below one. This can
have useful applications for x-ray optics, where you could use total external
reflection.
25 Bathroom mirrors are usually made from aluminum or silver coated glass.
Figure 2.20 Plot of the reflectance of Al, Ag and Au metal surfaces
as a function of the incident wavelength. Metals usually have good
reflectance properties over a large range of wavelengths. The visible
range is approximately 0.4 − 0.7 µm.
Highly reflective mirrors (R > 99.5%) are achieved using a different
technique. A glass substrate is supplied with alternate coatings of thin films
(50 nm - 100 nm thickness), one layer of high index material such as
titanium dioxide TiO2 (n=2.40) or zinc sulfide ZnS (n=2.32) followed by a
layer of low index material such as magnesium fluoride MgF2 (n=1.38) or
silicon dioxide SiO2 (n=1.49). These mirrors are called dielectric mirrors
and have a limited working range of tens of nanometers. On a perfectly
smooth and clean surface the reflectance can exceed 99.999%, only
limited by coating imperfections. We will look at the working principle of
those mirrors in more detail when we discuss thin film interference in
later chapters.
3 Propagation of light
In this chapter we will look at the propagation of light, disregarding any
diffraction phenomena. In some sense you can see this chapter as being
the limit λ → 0, in which we only consider the straight line
trajectory of “rays of light” and ignore diffraction. As we will see, in
this picture light rays behave very similarly to point particles in classical
mechanics. But many results we shall derive will be directly applicable
also to plane waves, at least in simple geometries, if we associate a light
ray with the k-vector of a plane wave. We will study two important
properties. The first deals with kinematics, such as angles of reflection
and refraction. The second deals with dynamics, such as the intensity
of reflected and refracted radiation, phase changes of the refracted and
reflected waves and polarization properties of the light refracted and
reflected off surfaces.
3.1 Fermat’s principle

Looking at a given substance where the index of refraction n(r) may
depend on position inside the material, we may ask what path light will
follow in going from A to B inside the material. More generally, what
quantities are important for determining the actual path? Back in the
17th century Pierre de Fermat (1601-1665), a French mathematician,
formulated an absolutely beautiful criterion for the propagation of light:
Figure 3.1 Portrait of Pierre de Fermat (1601-1665).
Fermat’s principle: (3.1)
”Light takes the path between A and B

- which it traverses in the least amount of time”
Today we know that this is not exactly true. In fact the principle must
be modified to
Principle of stationary time: (3.2)
”Light takes the path between A and B in

which the optical path length is stationary”
By stationary time we mean here that the travel time along the actual
path taken can be either minimal, maximal or a point of inflection (a
saddle point) when compared to an infinitesimally changed path. Proof
of the principle of stationary time can be carried out rigorously using
quantum electrodynamics. Arguments can be established through
Huygens’s principle but cannot be considered a rigorous proof.
[Figure 3.2: panel (a) with labels A, B, P, Q, R; panel (b) with labels
A, B, R, M, N.]
Figure 3.2 Illustration of a stationary optical path, i.e., a maximal,
a minimal or a saddle point OPL. The index of refraction is set to 1.
We consider a mirror based on an ellipsoid. The light source is placed
in A and we consider the possible pathways from A to B. Both A
and B are located in the foci of the ellipse. Panel (a): Three different
routes from A to B. Since the mirror is an ellipse the three routes
have exactly the same path length. This applies to any ray reflected
off the mirror and ending in B. Thus, this is an example of a saddle
point. Panel (b): Two curves with a local radius of curvature larger
(red curve M) and respectively smaller (blue curve N) compared to the
ellipse. A mirror shaped as curve M leads to a minimum OPL, while
the curve N leads to an example of a maximum OPL.
Example 3.1 Minimal, maximal or saddle point? An interest-

ing question arises in connection with the modern and more rigorous
version of Fermat’s principle. In which case is the travel time, or equiv-
alently the optical path length, actually minimal, maximal or a saddle
point? It is not difficult to give examples of an optical system where the
optical path length is minimal. One example is the reflection off a flat
mirror, or refraction at an interface, as we will see in the derivation of
Snell’s law in Sec. 3.2. But what about the two other cases, is it possible
to give examples here? To give a good example that includes all three
cases, consider the ellipsoidal mirror depicted in figure 3.2. It is a mirror
formed as an ellipsoid and we are just considering the cross section of it
for convenience. We assume the index of refraction inside the ellipsoid is
n = 1. Now we imagine light from a point A, located at focus 1, traveling
to a point B, located at focus 2. Naturally, light may just pick the route
going straight from A to B. Fine, but what about all the other routes?
Consider light going any route from A to B, excluding the trivial one
just described, see figure 3.2(a). Well, since the mirror is an ellipse the
path length from A to B via the mirror is the same, no matter where you
reflect off the mirror! This is a unique property of the ellipse and cannot
be attributed to other curves. So this is a perfect example of a saddle point. Next
we consider the two cases shown in figure 3.2(b). Here we display two
curves (M, N) in addition to the ellipse. At point R the red curve M has
a curvature which is less than that of the ellipse, while the blue curve
N has a greater curvature compared to the ellipse. For the curve M you
may easily see that the distance ARB is the shortest possible, while all
other routes have longer path lengths. We may conclude this is a case of
a minimum. In case of the blue curve N it is reversed. Here the distance
ARB is the longest and all other routes are shorter. Hence this is a case
of a maximum.
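The constant path length on the ellipse, and the minimum for a flatter mirror, can be checked numerically. The following is a minimal sketch, assuming illustrative semi-axes a = 2 and b = 1 (not taken from the text) and taking the tangent line at the top of the ellipse as a stand-in for a curve of type M; the function names are hypothetical.

```python
import math

A_SEMI, B_SEMI = 2.0, 1.0                      # assumed semi-axes (illustrative)
C_FOCUS = math.sqrt(A_SEMI**2 - B_SEMI**2)     # distance from center to each focus


def path_via_ellipse(t):
    """Length A -> R -> B for a reflection point R on the ellipse at parameter t."""
    x, y = A_SEMI * math.cos(t), B_SEMI * math.sin(t)
    return math.hypot(x + C_FOCUS, y) + math.hypot(x - C_FOCUS, y)


def path_via_flat_mirror(x):
    """Length A -> R -> B for a flat mirror tangent to the ellipse at (0, b).

    The flat mirror has less curvature than the ellipse (a curve of type M).
    """
    return math.hypot(x + C_FOCUS, B_SEMI) + math.hypot(x - C_FOCUS, B_SEMI)


ellipse_lengths = [path_via_ellipse(2 * math.pi * k / 12) for k in range(12)]
flat_lengths = [path_via_flat_mirror(x) for x in (-0.5, 0.0, 0.5)]
print(ellipse_lengths)   # all equal to 2a within rounding
print(flat_lengths)      # smallest at x = 0, the common point R
```

Every reflection point on the ellipse gives the same length 2a, while on the flat mirror any point other than R gives a longer path.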
Generally, we need to consider the so-called optical path length. Why
is that? If we take a look at the principle of stationary time, the
parameter in play was the time interval ∆t:
$$\Delta t = \int_{t_1}^{t_2} dt = \int_A^B \frac{ds}{v} = \frac{1}{c}\int_A^B n\,ds \tag{3.3}$$
where n is the real part of the index of refraction. We used that v = ds/dt
and v = c/n. The differential of optical path length (OPL) is defined
as the product of the index of refraction n and the physical path length
differential ds, so
Optical path length:
$$\mathrm{OPL} = \int_A^B n(\mathbf{r})\,ds \tag{3.4}$$
For a two dimensional case, as shown in figure 3.3, we have
$ds = \sqrt{(y')^2 + 1}\,dx$, so the optical path length becomes:
$$\mathrm{OPL} = \int_{x_1}^{x_2} n(x,y)\,\sqrt{(y')^2 + 1}\,dx$$
So, when exactly is this integral stationary? Generally an integral of the
form
$$I = \int_{x_1}^{x_2} f(x, y, y')\,dx$$
is stationary if the Euler-Lagrange criteria are satisfied (see footnote 1):
Euler-Lagrange criteria: (3.5)
$$I = \int_{x_1}^{x_2} f(x, y, y')\,dx \quad \text{is stationary if} \quad \frac{d}{dx}\frac{\partial f}{\partial y'} - \frac{\partial f}{\partial y} = 0$$
Figure 3.3 Principle of stationary optical path in two dimensions.
We express the infinitesimal length ds as $ds = \sqrt{(y')^2 + 1}\,dx$ with
dy/dx = y'.
The program is clear. Specify the index of refraction n(x, y) and solve
the differential equation (3.5), then you know which path light takes.
So our optical version of the Euler-Lagrange differential equation
becomes:
$$\frac{d}{dx}\left(\frac{n(x,y)\,y'}{\sqrt{(y')^2 + 1}}\right) - \frac{\partial n(x,y)}{\partial y}\,\sqrt{(y')^2 + 1} = 0$$
In three dimensions:
$$I = \int_{z_1}^{z_2} n(x,y,z)\,\sqrt{(x')^2 + (y')^2 + 1}\;dz$$
1 For a proof see additional Note1 on the course homepage.
the conditions become:
$$\frac{d}{dz}\left(\frac{n(x,y,z)\,x'}{\sqrt{(x')^2 + (y')^2 + 1}}\right) - \frac{\partial n(x,y,z)}{\partial x}\,\sqrt{(x')^2 + (y')^2 + 1} = 0,$$
and
$$\frac{d}{dz}\left(\frac{n(x,y,z)\,y'}{\sqrt{(x')^2 + (y')^2 + 1}}\right) - \frac{\partial n(x,y,z)}{\partial y}\,\sqrt{(x')^2 + (y')^2 + 1} = 0.$$
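Using $ds/dz = \sqrt{(x')^2 + (y')^2 + 1}$ we can rewrite the 3-D
conditions as:
$$\frac{d}{ds}\left(n\,\frac{dx}{ds}\right) = \frac{\partial n}{\partial x} \tag{3.6}$$
$$\frac{d}{ds}\left(n\,\frac{dy}{ds}\right) = \frac{\partial n}{\partial y} \tag{3.7}$$
$$\frac{d}{ds}\left(n\,\frac{dz}{ds}\right) = \frac{\partial n}{\partial z}. \tag{3.8}$$
The last equation follows from:
$$\left(\frac{dx}{ds}\right)^2 + \left(\frac{dy}{ds}\right)^2 + \left(\frac{dz}{ds}\right)^2 = 1.$$
Finally we obtain:
Propagation of light: (3.9)
$$\frac{d}{ds}\left(n(\mathbf{r})\,\frac{d\mathbf{r}}{ds}\right) = \nabla n(\mathbf{r})$$
It is important to note that the path is sensitive to values of the index
of refraction and its gradient!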
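Equation (3.9) can be integrated numerically once n(r) is specified. The following is a minimal sketch, not a definitive ray tracer: it assumes a hypothetical two-dimensional linear profile n(y) = N0 + G·y with illustrative values N0 = 1.5 and G = −0.1 per unit length (neither is from the text), rewrites (3.9) as the first-order system dr/ds = p/n, dp/ds = ∇n with p = n dr/ds, and steps it with a standard fourth-order Runge-Kutta scheme.

```python
import math

N0 = 1.5     # assumed index at y = 0 (illustrative)
G = -0.1     # assumed gradient dn/dy; n increases toward negative y


def n_index(y):
    """Hypothetical linear index profile n(y) = N0 + G*y."""
    return N0 + G * y


def derivative(state):
    """Right-hand side of dr/ds = p/n, dp/ds = grad n, with p = n dr/ds."""
    x, y, px, py = state
    n = n_index(y)
    return (px / n, py / n, 0.0, G)


def rk4_step(state, h):
    """One classical Runge-Kutta step of arc length h."""
    k1 = derivative(state)
    k2 = derivative(tuple(s + 0.5 * h * k for s, k in zip(state, k1)))
    k3 = derivative(tuple(s + 0.5 * h * k for s, k in zip(state, k2)))
    k4 = derivative(tuple(s + h * k for s, k in zip(state, k3)))
    return tuple(s + h * (a + 2 * b + 2 * c + d) / 6
                 for s, a, b, c, d in zip(state, k1, k2, k3, k4))


def trace_ray(y0=0.0, angle=0.0, length=1.0, steps=1000):
    """Trace a ray launched at (0, y0) at `angle` to the x-axis over `length`."""
    n = n_index(y0)
    state = (0.0, y0, n * math.cos(angle), n * math.sin(angle))
    h = length / steps
    for _ in range(steps):
        state = rk4_step(state, h)
    return state


x_end, y_end, px_end, py_end = trace_ray()
print(x_end, y_end)   # the ray has bent toward the region of higher index
```

Because n depends on y only, the component p_x = n dx/ds stays constant along the ray, which is the continuous form of Snell's law asked for in Problem 3.3.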
Problem 3.2 Show equation (3.9).
Problem 3.3 Assume the index of refraction to be a continuous function
of y alone, i.e., n = n(y). Show by applying Snell's law (look ahead to
section 3.2) that
$$\frac{\sin\theta}{v(y)} = \text{constant},$$
where v(y) = c/n(y).
Example 3.4 Bending of light in the atmosphere

Consider the bending of laser light in the atmosphere. The density
gradient creates a gradient in the index of refraction, increasing
towards the Earth's surface. As the index is slightly different for
different colors, blue light will bend more than green light, which in
turn bends more than red light. A beautiful phenomenon that relies on
this effect is the so-called "green flash", which can be observed near
sunset at the ocean, especially when the air is clean (see footnote 2).
The green flash occurs as a (small) portion of the Sun's disk turns
green for one or two seconds. The green portion may appear right at the
horizon, which is one type of green flash, or even at the top of the
setting Sun! Flashes can also appear bluish, yellow or even purple, but
green seems to be the most common color.
Right at sunset there is a refraction-induced delay in setting, which
is a second or so longer for the blue and violet part of the spectrum
than for the red part. Generally, the red image of the Sun sets first,
followed by yellow, green, blue and violet. At the horizon the light
traverses a very long path and consequently the shortest wavelengths are
almost completely filtered out, just as in the case of Rayleigh
scattering. When the air is very clear, violet is the last color seen.
However, the atmosphere usually contains enough haze that the blue part
of the spectrum is removed and green is the last color seen at sunset.
For a green flash to appear above the Sun's disk, an additional mirage
effect is needed, where additional density gradients (mainly due to
temperature gradients) cause reflection of a part of the Sun's disk. You
might have seen the mirage effect on asphalt paving on a hot sunny day,
seeing the headlights of a car "reflected" on the asphalt.
Example 3.5 Bending of laser light in a sugar solution
Get hold of a fish tank with a volume of around 20-30 liters and cover
the bottom with a 1.5 cm layer of regular sugar. Fill the tank with hot
water by placing a cup and a saucer on the sugar and pouring slowly into
the cup. After adding the water, gently remove the cup and saucer. Let
the tank rest for 2-3 days. This will result in a strong sugar gradient
in the water, where the sugar concentration increases towards the bottom
of the tank. Consequently, we will have an index gradient, increasing
towards the bottom of the fish tank, just as in the atmosphere, and
light will bend when entering the solution. This is shown in figure 3.5.
2 For further reading consult: Andrew T. Young, Journal of the Optical
Society of America A, vol. 17, pp. 2129-2139 (2000).
Figure 3.4 Bending of laser light in the atmosphere. The density
gradient results in a gradient in the index of refraction increasing
towards the Earth's surface. This makes light bend more and more.
Example 3.6 Can you bend light around Earth in a circle?
With the refraction that takes place in the atmosphere, and given
that the index of refraction changes with height, one may ask if it is
possible to bend light around in a circle with radius of curvature equal
to that of the Earth. Obviously, when shooting a laser parallel to the
surface of the Earth we know that it will soon be reduced in intensity
and finally become invisible, at least for moderate intensities. Anyway,
we are going to investigate the problem using Fermat's principle.
Firstly, how does the index of refraction vary with height? The index of
refraction at the Earth's surface (300 K and 1 atmosphere) is about
n = 1.00029. We assume n − 1 to be proportional to the density, such
that (isothermal atmosphere):
$$n(r) - 1 = \rho \cdot \exp\bigl(-(r - R)/A\bigr) \tag{3.10}$$
where n(r) is the index of refraction at radial distance r measured from
the Earth's center, $R = 6.4 \cdot 10^6$ m is the Earth's radius, and
A = 9000 m is the 1/e height of the isothermal atmosphere. Now we apply
Fermat's principle and the optical path length OPL.
Figure 3.5 Bending of laser light in a sugar solution. The fish tank
contains water with a strong gradient of sugar concentration increasing
towards the bottom, which makes the light beam bend more and more toward
the bottom. Notice how light is reflected from the glass bottom of the
fish tank.
We are going to use the Euler-Lagrange condition. But our geometry is
better suited to polar coordinates, so recall the length element ds in
polar coordinates:
$$ds^2 = dr^2 + (r\,d\theta)^2, \tag{3.11}$$
so
$$ds = \left[\left(\frac{dr}{d\theta}\right)^2 + r^2\right]^{1/2} d\theta. \tag{3.12}$$
However, we consider a fixed radius r, which means dr/dθ = 0. The
expression we have for OPL is then (see footnote 3):
$$\mathrm{OPL} = \int_{\theta_1}^{\theta_2} n(r) \cdot r \cdot d\theta \tag{3.13}$$
The Euler-Lagrange criterion for a stationary solution becomes (the
integrand does not depend on r' = dr/dθ):
$$\frac{\partial}{\partial r}\bigl(n(r) \cdot r\bigr) = r\,\frac{\partial n}{\partial r} + n(r) = 0 \tag{3.14}$$
Substituting the expression for n(r) into this condition and evaluating
at the surface, r = R, we find:
$$\frac{R}{A} \cdot \rho = \rho + 1 \tag{3.15}$$
This is the condition on the density if light is to bend around the
Earth. With the current density we have
$$n(r = R) - 1 = n_0 - 1 = \rho_0 = 0.00029 \tag{3.16}$$
and finally we find ρ/ρ0 = 4.86. Thus if the air density were a factor
of about five higher at sea level, light could bend around in a circle
and we could watch the sunset for ever! We would feel some pressure,
though...
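The arithmetic behind the factor 4.86 is short enough to reproduce. This is a minimal sketch using only the numbers quoted in the example; the function name is hypothetical.

```python
def required_density_ratio(earth_radius, scale_height, rho0):
    """Solve (R/A)*rho = rho + 1 for rho and return rho / rho0.

    earth_radius  R    in meters
    scale_height  A    in meters (1/e height of the isothermal atmosphere)
    rho0          n0 - 1 at the surface for the present atmosphere
    """
    rho = 1.0 / (earth_radius / scale_height - 1.0)
    return rho / rho0


ratio = required_density_ratio(6.4e6, 9000.0, 0.00029)
print(round(ratio, 2))
```

Solving (3.15) gives ρ = 1/(R/A − 1) ≈ 1.41 · 10⁻³, which is about five times the present value ρ0.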
Example 3.7 Brachistochrone problem
Back in 1696 Johann Bernoulli posed a challenging problem to his
contemporary scientist colleagues in the famous journal Acta Eruditorum.
He asked the following: Given a particle with mass m moving under the
influence of gravity along some curve connecting points A and B, see
figure 3.6, what curve connecting points A and B will ensure the fastest
descent of the particle? Any type of friction is disregarded. About six
months after the problem was posed, Newton (under a pseudonym), Gottfried
Leibniz, and Guillaume de l'Hopital, among others, presented their
solutions. Here Johann spoke the famous words about Newton's solution:
"tanquam ex ungue leonem" (we know the lion by his claw), since Newton
had published his solution anonymously.
Johann's own solution was ingenious. He applied Fermat's principle and,
in that way, combined mathematics, optics and mechanics in his solution!
Let us follow Johann's solution. The velocity at any given point y is
given by $v = \sqrt{2gy}$, since we have conservation of mechanical
energy. From the geometry, see figure 3.6, we find:
$$\sin\theta = \cos\phi = \frac{1}{\sqrt{1 + \tan^2\phi}} = \frac{1}{\sqrt{1 + (y')^2}}.$$
If the mass is going from A to B in the least amount of time we can use
Snell's law, which by Fermat's principle ensures the travel time to be a
minimum. Inserting v and sin(θ) into Snell's law in the form
sin(θ)/v(y) = constant (Problem 3.3), we have:
$$y\,\bigl(1 + (y')^2\bigr) = q$$
where q is a constant.
3 We could also just argue that the OPL is arc length times index,
OPL = n(r) · l = n(r) · r · ∆θ, which is the same as obtained above.
Figure 3.6 The brachistochrone (brachistos - the shortest, chronos -
time) problem posed by Johann Bernoulli in 1696. What curve connecting
the points A and B will provide the fastest descent of a particle
under the influence of gravity? Any friction is disregarded.
The solution to this differential equation is the cycloid curve, here
parameterized by α:
$$x(\alpha) = r\,(\alpha - \sin\alpha), \qquad y(\alpha) = r\,(1 - \cos\alpha) \tag{3.17}$$
The constant r is given by r = q/2. The cycloid curve is generated by
a point on the circumference of a circle with radius r rolling along the
x-axis. You may know this curve from elementary calculus, as it has a lot
of beautiful geometric properties. For example, the length of one arch is
eight times the radius r (four times the diameter), and the area under
one arch is three times that of the generating circle, πr². The
brachistochrone problem can be solved analytically including friction,
but the solution is very complicated and messy (see footnote 4).
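That the cycloid satisfies the differential equation with q = 2r, and that one arch has length 8r, can be spot-checked numerically. The following is a minimal sketch; the function names and the chord-sum approximation of the arc length are choices made here, not part of the text.

```python
import math


def cycloid(alpha, r):
    """Point (x, y) on the cycloid of equation (3.17)."""
    return r * (alpha - math.sin(alpha)), r * (1.0 - math.cos(alpha))


def slope(alpha):
    """y' = dy/dx = (dy/dalpha) / (dx/dalpha); independent of r."""
    return math.sin(alpha) / (1.0 - math.cos(alpha))


def invariant(alpha, r):
    """Left-hand side y * (1 + y'^2) of the brachistochrone equation."""
    _, y = cycloid(alpha, r)
    return y * (1.0 + slope(alpha) ** 2)


def arch_length(r, steps=20000):
    """Approximate the length of one arch (0 <= alpha <= 2*pi) by summing chords."""
    total = 0.0
    x_prev, y_prev = cycloid(0.0, r)
    for i in range(1, steps + 1):
        x, y = cycloid(2.0 * math.pi * i / steps, r)
        total += math.hypot(x - x_prev, y - y_prev)
        x_prev, y_prev = x, y
    return total


print(invariant(1.0, 0.5), invariant(2.5, 0.5))   # both close to q = 2r = 1
print(arch_length(1.0))                            # close to 8r = 8
```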
3.1.1 Solutions to Euler Lagrange equations

For reference we list solutions to Euler-Lagrange equations in three spe-
cial cases (This is not a part of the optics exam, but listed for your
convenience.):
Case A: f does not depend on x, y: (3.18)
$$\frac{d}{dx}\frac{\partial f}{\partial y'} = \frac{\partial^2 f}{\partial y'^2} \cdot \frac{d^2 y}{dx^2} = 0$$
If $\partial^2 f / \partial y'^2 \neq 0$ then
$$\frac{d^2 y}{dx^2} = 0 \quad \text{and} \quad y = ax + b.$$
This case corresponds to straight lines, exactly the case when n(x, y) is
a constant.
Case B: y is missing from f: (3.19)
$$\frac{d}{dx}\frac{\partial f}{\partial y'} = 0 \quad \text{and} \quad \frac{\partial f}{\partial y'} = \text{constant}$$
In this case we will have n(x, y) = n(x).
4 See Ashby, N., Brittin, W. E., Love, W. F., and Wyss, W. "Brachistochrone with
Coulomb Friction." Amer. J. Phys. 43, 902-905, 1975.
Figure 3.7 Portrait of Willebrord Snellius (1580-1626), a Dutch
astronomer and mathematician.
Case C: x is missing from f: (3.20)
$$y'\,\frac{\partial f}{\partial y'} - f = \text{constant}$$
In this case we will have n(x, y) = n(y).
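As a worked illustration of Case C, consider the optical integrand with an index that depends on y only. The steps below are a sketch under the assumption that θ denotes the angle between the ray and the y-axis, so that sin θ = dx/ds (the same convention as in the brachistochrone example).

```latex
\begin{aligned}
f &= n(y)\sqrt{1+(y')^2}, \qquad
\frac{\partial f}{\partial y'} = \frac{n(y)\,y'}{\sqrt{1+(y')^2}},\\[4pt]
y'\,\frac{\partial f}{\partial y'} - f
  &= \frac{n(y)\,(y')^2 - n(y)\bigl(1+(y')^2\bigr)}{\sqrt{1+(y')^2}}
   = -\frac{n(y)}{\sqrt{1+(y')^2}} = \text{constant},\\[4pt]
\Rightarrow\quad n(y)\,\sin\theta &= \text{constant},
\qquad \sin\theta = \frac{dx}{ds} = \frac{1}{\sqrt{1+(y')^2}} .
\end{aligned}
```

With v(y) = c/n(y) this is the statement sin(θ)/v(y) = constant of Problem 3.3, obtained here without dividing the medium into thin layers.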
3.2 Snell’s law

Now we consider an interface between two insulators (say air and glass):
element one with index n1 and element two with index n2 . Figure 3.8
illustrates the geometry.
All k vectors must lie in the same plane, the plane of incidence,
spanned by the normal to the surface and the incident k1 vector. This
follows from continuity of the wave at point P. It also follows from
momentum conservation. In this geometry Snell's law holds:
Snell's law (3.21)
$$\theta_1 = \theta_3, \qquad n_1 \sin\theta_1 = n_2 \sin\theta_2$$
Figure 3.8 Light travelling from one insulator n1 into another n2.
The plane of incidence is spanned by the normal to the surface and
the incident k1 vector.
There are several ways to prove Snell's law. We will not go through all
of them here, but refer to other optics books. However, let us look at two
prominent proofs, both illustrating the kinematic nature of the law. The
first is perhaps the most general. Since there is translation symmetry
along the y-direction, momentum will be conserved in this direction.
For a photon entering along k1 and exiting along k2 the y-component
of its momentum is conserved:
$$\frac{h \sin\theta_1}{\lambda_1} = \frac{h \sin\theta_2}{\lambda_2} \tag{3.22}$$
$$\frac{n_1 \sin\theta_1}{\lambda_0} = \frac{n_2 \sin\theta_2}{\lambda_0} \tag{3.23}$$
$$n_1 \sin\theta_1 = n_2 \sin\theta_2 \tag{3.24}$$
where we used the expression for the photon momentum p = h/λ in the
first line and λj = λ0/nj in the second; h is Planck's constant and λ0
is the vacuum wavelength. The reflection law θ1 = θ3 can be shown in a
similar manner. From Fermat's principle we can also show Snell's law.
Looking at figure 3.9 we write up the OPL and demand it to be
stationary:
$$\mathrm{OPL}(x) = n_1 \sqrt{x^2 + a^2} + n_2 \sqrt{(s - x)^2 + b^2} \tag{3.25}$$
so
$$\mathrm{OPL}'(x) = 0 = n_1 \frac{x}{\sqrt{x^2 + a^2}} + n_2 \frac{-(s - x)}{\sqrt{(s - x)^2 + b^2}} \tag{3.26}$$
where we recognize the sines of the angles θ1 and θ2, so
$$n_1 \sin\theta_1 = n_2 \sin\theta_2 \tag{3.27}$$
Figure 3.9 Snell's law shown by the use of Fermat's principle
(principle of least time).
As we can see Snell’s law is a consequence of the kinematics of the

problem and it is not a consequence of Maxwell’s equations. Obviously,
the law is not in conflict with Maxwell’s equations as we shall see later.
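The Fermat argument of equations (3.25)-(3.27) can be reproduced numerically: find the crossing point x where OPL'(x) = 0 and check that Snell's law holds there. This is a minimal sketch, assuming illustrative values n1 = 1, n2 = 1.5, a = b = 1 and s = 2 (not from the text) and using plain bisection on the derivative; the function names are hypothetical.

```python
import math


def opl(x, n1, n2, a, b, s):
    """Optical path length of equation (3.25)."""
    return n1 * math.hypot(x, a) + n2 * math.hypot(s - x, b)


def d_opl(x, n1, n2, a, b, s):
    """Derivative OPL'(x) of equation (3.26)."""
    return n1 * x / math.hypot(x, a) - n2 * (s - x) / math.hypot(s - x, b)


def stationary_point(n1, n2, a, b, s, iterations=200):
    """Locate the root of OPL'(x) on [0, s] by bisection.

    OPL'(0) < 0 and OPL'(s) > 0, and OPL' is increasing, so the root is unique.
    """
    lo, hi = 0.0, s
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if d_opl(mid, n1, n2, a, b, s) > 0.0:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)


n1, n2, a, b, s = 1.0, 1.5, 1.0, 1.0, 2.0        # assumed example geometry
x0 = stationary_point(n1, n2, a, b, s)
sin1 = x0 / math.hypot(x0, a)
sin2 = (s - x0) / math.hypot(s - x0, b)
print(n1 * sin1, n2 * sin2)    # the two sides of Snell's law agree
```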
Problem 3.8 Show θ1 = θ3 using conservation of momentum.

3.2.1 Applications of Snell’s law

One important thing to note from Snell's law is that as θ1 increases so
does θ2. If n1 < n2, say an air-glass interface, then θ1 can at most be
90 degrees, which leaves
$$\sin\theta_2 = \frac{n_1}{n_2} \tag{3.28}$$
We call this angle the critical angle. It is perhaps more natural to
look at the case n1 > n2, say a water-air interface. The critical angle
becomes:
Critical angle: n1 high → n2 low (3.29)
$$\sin\theta_c = \frac{n_2}{n_1}.$$
We go from an optically dense medium to an optically less dense medium,
n1 > n2. What will happen for angles θ1 > θc? Here Snell's law has no
real solutions and all light will be reflected at the interface.
This phenomenon we call total internal reflection (TIR). Next time you
are in a swimming pool, try to look for this phenomenon. For a water-air
interface the critical angle is about 48.7 degrees.
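Equation (3.29) is easy to evaluate for a given pair of media. The following is a minimal sketch; the index n = 1.33 for water is an assumed typical value, and the function name is hypothetical.

```python
import math


def critical_angle_deg(n1, n2):
    """Critical angle (degrees) for light going from index n1 into a lower index n2."""
    if n1 <= n2:
        raise ValueError("total internal reflection requires n1 > n2")
    return math.degrees(math.asin(n2 / n1))


print(critical_angle_deg(1.33, 1.0))   # water-air
print(critical_angle_deg(1.5, 1.0))    # glass-air
```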
TIR has a number of important applications. Perhaps the most important
one today is guiding light in optical fibers. Transmission of fast
optical signals in glass core fibers is crucial for optical communication.
Another application is optical systems such as prismatic binoculars.
A further example is fingerprinting devices, which apply frustrated
total internal reflection (FTIR) in order to record an image of a
person's fingerprint without the use of ink.
Even though there are no real solutions to Snell's law for θ1 > θc,
we can find the electric field in medium 2 using the complex solutions
to Snell's law. Assume we have a plane wave input (in figure 3.8, with
the +z-axis pointing along the normal and the +x-axis pointing out of
the plane of the paper):
$$\mathbf{E}_1 = \mathbf{E}_{01}\, e^{i(\mathbf{k}_1 \cdot \mathbf{r} - \omega t)}, \tag{3.30}$$
with E01 pointing along the x-axis, for example, and
$$\mathbf{k}_1 \cdot \mathbf{r} = k_1 \sin(\theta_1)\, y - k_1 \cos(\theta_1)\, z \tag{3.31}$$
For the transmitted plane wave we have the analogous result:
$$\mathbf{E}_2 = \mathbf{E}_{02}\, e^{i(\mathbf{k}_2 \cdot \mathbf{r} - \omega t)} \tag{3.32}$$
with
$$\mathbf{k}_2 \cdot \mathbf{r} = k_2 \sin(\theta_2)\, y - k_2 \cos(\theta_2)\, z \tag{3.33}$$
$$= k_2\, y\, \frac{n_1}{n_2} \sin(\theta_1) - k_2\, z \sqrt{1 - \left(\frac{n_1}{n_2}\right)^2 \sin^2(\theta_1)} \tag{3.34}$$
If we now consider n1 > n2, then for θ1 > θc the square root is purely
imaginary. We can thus write the transmitted field as:
$$\mathbf{E}_2 = \mathbf{E}_{02}\, e^{-i\omega t}\, e^{i\alpha y}\, e^{+\beta z}, \quad \text{for } z < 0 \tag{3.35}$$
with θ1 > θc, and z < 0 as we consider the wave in medium 2. The α and
β coefficients are given by:
$$\alpha = k_2\, \frac{n_1}{n_2} \sin(\theta_1), \tag{3.36}$$
$$\beta = k_2 \sqrt{\left(\frac{n_1}{n_2}\right)^2 \sin^2(\theta_1) - 1}. \tag{3.37}$$
Notice the last term in equation (3.35) is a pure exponential decay of
the field amplitude along the z-direction, i.e., into the second medium.
We call this an evanescent wave. We have no wave propagation into
the second medium (in the z-direction), only a decaying field amplitude.
The typical distance over which the field amplitude decays is 1/β, which
is on the scale of the wavelength. This, for example, is how finger
scanners work. The finger is brought close to the surface, actually
within a wavelength or so, and modifies the field, which can then be
detected on the reflected beam. What about the Poynting vector? As we
saw in chapter 2 the Poynting vector is directed along the k-vector, in
this case k2. So the wave will travel along the +y-direction, but have a
complex component describing the decay in medium 2.
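To get a feeling for the scale 1/β from equation (3.37), the decay constant can be evaluated for a concrete case. This is a minimal sketch, assuming a glass-air interface (n1 = 1.5, n2 = 1), 45 degrees incidence and a vacuum wavelength of 633 nm; these values and the function name are illustrative choices, and k2 = 2π n2/λ0 is used for the wave number in medium 2.

```python
import math


def evanescent_decay_constant(n1, n2, theta1, wavelength_vacuum):
    """Return beta of equation (3.37) in 1/m for incidence angle theta1 (radians)."""
    s = (n1 / n2) * math.sin(theta1)
    if s <= 1.0:
        raise ValueError("below the critical angle: no evanescent wave")
    k2 = 2.0 * math.pi * n2 / wavelength_vacuum   # wave number in medium 2
    return k2 * math.sqrt(s * s - 1.0)


beta = evanescent_decay_constant(1.5, 1.0, math.radians(45.0), 633e-9)
print(1.0 / beta)   # 1/e decay length of the field amplitude, in meters
```

For these assumed numbers the amplitude decays over a few hundred nanometers, i.e., a fraction of the wavelength.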
In the optics industry optical waveguides play a huge role today, and
evanescent waves allow for the transfer of power between waveguides
which are brought within a wavelength of each other. We are talking
about 0.1 µm size objects on small chips, where light may be manipulated
to target the high speed optical communication and super computers of
tomorrow. In biophysics there are quite a number of applications too.
One of the most prominent is the total internal reflection fluorescence
(TIRF) microscope. In TIRF microscopy, incident laser light is totally
reflected within the glass of a coverslip, producing an evanescent wave
outside the glass that illuminates objects within a short range of about
100 nm beyond the coverslip. The portion of the specimen within the
evanescent field can therefore be excited to emit fluorescence
selectively.
3.3 Fresnel’s laws of reflection
So far we have focused on the kinematic properties of light propagation.

Now we turn our attention to dynamic properties. How much of the
electric field is reflected and transmitted at a given angle of incidence
and how does this depend on the polarization of the light? Here, we will
apply Maxwell’s equations using the boundary conditions. Recall the
boundary conditions for an insulator:
Figure 3.10 Portrait of A. Fresnel (1788-1827) a French physicist who

contributed significantly to the establishment of the theory of wave
optics.
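Boundary conditions Insulator: (3.38)
$$E_{\mathrm{tan}}^{(1)} = E_{\mathrm{tan}}^{(2)}$$
$$\varepsilon_r^{(1)} E_n^{(1)} - \varepsilon_r^{(2)} E_n^{(2)} = \frac{\rho_s}{\varepsilon_0}$$
$$B_n^{(1)} = B_n^{(2)}$$
$$\frac{B_{\mathrm{tan}}^{(1)}}{\mu_r^{(1)}} = \frac{B_{\mathrm{tan}}^{(2)}}{\mu_r^{(2)}}$$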
where the upper index specifies whether we consider fields in medium 1
or in medium 2. The charge distribution ρs is a static distribution as
the conductivity is zero. Since we are interested in fields oscillating at
optical frequencies, we can safely set the free surface charge density to
zero. Recall also equation (2.70) in chapter 2 for the definition of
relative permittivity εr (see footnote 5) and permeability µr.
Importantly, notice that the spatial orientation for the boundary
conditions refers to the insulator surface. When we write tangential
(subscript tan), the E_tan field component is parallel to the insulator
surface, and when we write normal component E_n (subscript n), it means
normal to the insulator surface. So, the tangential E-field components
and the normal components of the B-field are continuous across the
boundary, just to mention two examples. We will now make use of all four
relations to deduce Fresnel's equations.
5 We put an extra upper index to specify whether we talk about medium 1 or
medium 2.
Figure 3.11 Geometry used in the derivation of Fresnel's laws of
reflection. The E∥ components are in the plane of incidence spanned
by the normal and the y-axis (the plane of the paper), and E⊥,
represented by the small circle indicating the arrow tip, is sticking
out of the plane of the paper. See figure 3.12 for a 3D version.
Any incident electric (or magnetic) field can be decomposed into a
component parallel and a component perpendicular to the plane of
incidence, see figures 3.11 (footnote 6) and 3.12:
$$\mathbf{E}_i = \mathbf{E}_{i\parallel} + \mathbf{E}_{i\perp} \tag{3.39}$$
Remember that ki, kr and kt all belong to the same plane, the plane
of incidence; in fact the three k-vectors span the plane of incidence. In
the first case we assume the E-field to be perpendicular to the plane of
incidence, i.e., we only look at the component E_i⊥. The other two
cases you should do yourself. Now we are hunting for two equations in
the E-field which enable us to give the ratio of transmitted and reflected
field to the incoming field. The situation is shown in figure 3.13. Since
the field E_i⊥ is tangential we use the first of the boundary conditions
(3.38):
$$E_{i\perp} + E_{r\perp} = E_{t\perp} \tag{3.40}$$
6 Note that in fig. 3.11 the plane of incidence is the plane of the paper.
[Figure caption, beginning lost in extraction:] In the text we only
consider the component E_i⊥ and the corresponding B_i∥.
For the B-field, which we know is orthogonal to the E-field, we also look
for the tangential component, i.e. B_i∥,tan. Using figure 3.13 we find:
$$-\frac{B_{i\parallel}}{\mu_i} \cos(\theta_1) + \frac{B_{r\parallel}}{\mu_i} \cos(\theta_1) = -\frac{B_{t\parallel}}{\mu_t} \cos(\theta_2) \tag{3.41}$$
since θ1 = θ3 and we divided out µ0. The equation in the B-field we
transform into an equation in the E-field, again by use of the relations
from chapter 2, see equation (2.72):
$$B^j = \frac{n_j}{c} E^j, \quad j \in \{i, r, t\} \tag{3.42}$$
With n_i = n_r = n1, n_t = n2 and µi = µt (non-magnetic media) we
finally obtain the two equations
$$E_\perp^i + E_\perp^r = E_\perp^t \tag{3.43}$$
and
$$n_1 \left(E_\perp^i - E_\perp^r\right) \cos(\theta_1) = n_2 E_\perp^t \cos(\theta_2) \tag{3.44}$$
We look for solutions to E^r/E^i and E^t/E^i and find
$$r_\perp = \frac{E_\perp^r}{E_\perp^i} = \frac{n_1 \cos(\theta_1) - n_2 \cos(\theta_2)}{n_1 \cos(\theta_1) + n_2 \cos(\theta_2)} \tag{3.45}$$
and
$$t_\perp = \frac{E_\perp^t}{E_\perp^i} = \frac{2 n_1 \cos(\theta_1)}{n_1 \cos(\theta_1) + n_2 \cos(\theta_2)} \tag{3.46}$$
[Figure caption, beginning lost in extraction:] We only consider the
component E_i⊥ and the corresponding B_i∥. Sometimes we drop the
subscripts ⊥ and ∥.
These are the (amplitude) reflection and transmission coefficients for
an s-polarized incident wave (see footnote 7). Since we have four
boundary conditions (3.38) we will have four Fresnel equations:
Fresnel's equations: (3.47)
$$r_\perp = \frac{n_1 \cos(\theta_1) - n_2 \cos(\theta_2)}{n_1 \cos(\theta_1) + n_2 \cos(\theta_2)}$$
$$t_\perp = \frac{2 n_1 \cos(\theta_1)}{n_1 \cos(\theta_1) + n_2 \cos(\theta_2)}$$
$$r_\parallel = \frac{n_2 \cos(\theta_1) - n_1 \cos(\theta_2)}{n_2 \cos(\theta_1) + n_1 \cos(\theta_2)}$$
$$t_\parallel = \frac{2 n_1 \cos(\theta_1)}{n_2 \cos(\theta_1) + n_1 \cos(\theta_2)}$$
7 In most of the optics literature the ⊥-polarization is called s-polarization (from
the German word senkrecht), while the ∥-polarization is denoted
p-polarization.
Note that there are several ways to write the Fresnel equations. You
can see that we use both refractive indices, the incidence angle and
the refraction angle on the right hand side in (3.47). But those are not
at all independent variables; they are constrained via Snell's law! For
practical optics design applications it is often useful to eliminate the
refraction angle, since it is you who decides on which materials to use
and which incidence angles are interesting. This form is also very useful
for plotting reflectance and phase shift, so we do that at the end of
this section. Alas, the expressions become a little less easy to remember.
We can also write the equations just in terms of incidence angle and
refraction angle, which looks so neat that we will do it later, when we
discuss Brewster's angle. But now we want to look at a simple special
case, to see what the equations actually tell us and where we have to be
careful in interpreting the equations.
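The four coefficients of (3.47) translate directly into a small function with the refraction angle eliminated through Snell's law. This is a minimal sketch, not a reference implementation: the function name is hypothetical, and beyond the critical angle it simply takes the principal branch of the complex square root for cos(θ2), which is an assumption about the sign convention of the evanescent wave.

```python
import cmath
import math


def fresnel(n1, n2, theta1):
    """Return (r_s, t_s, r_p, t_p) of equation (3.47) for incidence angle theta1.

    theta1 is in radians. cos(theta2) is obtained from Snell's law and
    becomes imaginary beyond the critical angle (principal branch assumed).
    """
    cos1 = math.cos(theta1)
    sin2 = n1 * math.sin(theta1) / n2
    cos2 = cmath.sqrt(1.0 - sin2 * sin2)
    r_s = (n1 * cos1 - n2 * cos2) / (n1 * cos1 + n2 * cos2)
    t_s = 2.0 * n1 * cos1 / (n1 * cos1 + n2 * cos2)
    r_p = (n2 * cos1 - n1 * cos2) / (n2 * cos1 + n1 * cos2)
    t_p = 2.0 * n1 * cos1 / (n2 * cos1 + n1 * cos2)
    return r_s, t_s, r_p, t_p


print(fresnel(1.0, 1.5, 0.0))                        # air-glass, normal incidence
print(abs(fresnel(1.5, 1.0, math.radians(60.0))[0])) # glass-air, beyond the critical angle
```

Beyond the critical angle both reflection coefficients have unit modulus, so all the information sits in their phases.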
Example 3.9 Fresnel’s equations at normal incidence.
At normal incidence θ1 = θ2 = 0. Looking at r⊥ (r∥) and t⊥ (t∥) we
find:
$$r_\perp = \frac{n_1 - n_2}{n_1 + n_2} \quad \text{and} \quad t_\perp = \frac{2 n_1}{n_1 + n_2} \tag{3.48}$$
For an air-glass interface, n1 = 1 and n2 = 1.5, we will have r⊥ = −0.2.
The reflected wave is thus
$$E^r = -0.2 \cdot E^i = 0.2 \cdot e^{i\pi} \cdot E^i = 0.2 \cdot E_0^i \cdot e^{i(\mathbf{k} \cdot \mathbf{r} - \omega t + \pi)}. \tag{3.49}$$
This means the reflected beam undergoes a phase change of π at the
interface! Wow. You may have noticed that if we use the parallel
component at normal incidence we would get E^r = 0.2 · E^i, predicting
the same amplitude ratio but no phase shift. Actually, there is a π
phase shift, and the apparent conflict comes from the sign convention
used for the two polarizations E∥ and E⊥, as drawn in fig. 3.12. For
parallel polarization, r∥ is positive when E∥ has an upward component
for both the incident and reflected beams. Look at the figure and
imagine that the angle of incidence goes to θ1 → 0 (normal incidence);
you see that the incident and reflected electric fields are actually
pointing in opposite directions! So, for normal incidence (θ1 = 0) a
positive sign of r∥ means that the reflected amplitude has the opposite
sign (see footnote 8). On the other hand, for perpendicular
polarization, r⊥ is positive when E⊥ is in the same direction for both
the incident and reflected beams. Changing the angle of incidence in this
case does not change the direction of E⊥.
Let us examine the intensity. The reflected intensity is
I = 0.2² I0 = 0.04 I0, so at each surface 4 % of the incident light is
reflected. If you have glasses that are not anti-reflection coated you
will lose about 8 % (2 surfaces) in intensity at normal incidence. In
optical instruments you may easily have 20 surfaces, and then only a
fraction 0.96^20 ≈ 0.44 survives, i.e., less than 50 % of the incident
light will make it through the instrument. Notice that if you go from
glass to air you will have no phase change, but the power loss is the
same as before, see figure 3.18.
8 Remember our discussion is taking place right at the point where the light beam
strikes, even though we draw in Fig. 3.12 the vector further "up" the rays so that
the geometry becomes clear. For r∥ at θ1 = 0 the "plane of incidence" becomes a
line and the term upward for the reflected and incident components has no
meaning; there is no upward component anymore. In fact, the incident
reference E-field vector points into the +y-direction and the reflected component
may point in the −y-direction. Unfortunately, not all optics literature uses the
same convention, but ours is the most common one. There is no ambiguity for the
r⊥-component.
Let us consider energy conservation. Observe that:
$$r_\perp^2 + t_\perp^2 \neq 1 \tag{3.50}$$
which is surprising at first glance. True, the intensity is proportional
to the square of the electric field. When we operate with the
r-coefficient both the incident and the reflected wave are in the same
medium, namely the one with index of refraction n1. In this case r² will
be the ratio of intensities. However, when we consider the
t-coefficient, t² no longer represents the intensity ratio. We have to
take into account the change of the energy density going from one medium
into another. Remember, the intensity scales with the refractive index,
$I = \tfrac{1}{2} \varepsilon_0 c\, n E_0^2$, so we need to scale the
coefficient $t_\perp^2$ properly to $t_\perp^2\, n_2/n_1$ to account for
energy density changes. Then we obtain:
$$r_\perp^2 + t_\perp^2\, \frac{n_2}{n_1} = 1 \tag{3.51}$$
Generally, at an angle of incidence θ1 different from zero the energy
expressions for t⊥² and t∥² must be adapted further. The area of an
incoming ray bundle is different from the area of the transmitted ray
bundle, since the beam is refracted and bends; see the next section,
where we do the general case. So be careful.
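The numbers of this example are quick to verify. The following is a minimal sketch for the air-glass values used in the text (n1 = 1, n2 = 1.5); the function names are hypothetical.

```python
def normal_incidence_coefficients(n1, n2):
    """Amplitude coefficients r and t of equation (3.48)."""
    r = (n1 - n2) / (n1 + n2)
    t = 2.0 * n1 / (n1 + n2)
    return r, t


def surviving_fraction(n1, n2, surfaces):
    """Fraction of the intensity left after `surfaces` uncoated surfaces.

    Multiple reflections between surfaces are ignored.
    """
    r, _ = normal_incidence_coefficients(n1, n2)
    return (1.0 - r * r) ** surfaces


r, t = normal_incidence_coefficients(1.0, 1.5)
naive_sum = r**2 + t**2                     # equation (3.50): not equal to 1
energy_sum = r**2 + (1.5 / 1.0) * t**2      # equation (3.51): equal to 1
print(naive_sum, energy_sum, surviving_fraction(1.0, 1.5, 20))
```

With r = −0.2 and t = 0.8 the naive sum is 0.04 + 0.64 = 0.68, while the properly weighted sum is 0.04 + 1.5 · 0.64 = 1.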
Problem 3.10 Using the boundary conditions (3.38) show the two
other Fresnel equations.
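Using Snell's law we can further reduce Fresnel's equations to:
Fresnel's equations reduced form: (3.52)
$$r_\perp = \frac{\sin(\theta_2 - \theta_1)}{\sin(\theta_1 + \theta_2)}$$
$$t_\perp = \frac{2 \cos(\theta_1) \sin(\theta_2)}{\sin(\theta_1 + \theta_2)}$$
$$r_\parallel = \frac{\tan(\theta_1 - \theta_2)}{\tan(\theta_1 + \theta_2)}$$
$$t_\parallel = \frac{2 \cos(\theta_1) \sin(\theta_2)}{\sin(\theta_1 + \theta_2) \cos(\theta_1 - \theta_2)}$$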
3.3.1 Application of Fresnel’s equations

Fresnel's equations are the workhorse of most of the optics industry and
of advanced optical applications. Before considering the intensity
expressions equivalent to Fresnel's equations, let us look at the
equations for an air-glass interface, i.e., n1 = 1 and n2 = 1.5. In
figure 3.14 we plot the reflection coefficients of equation (3.47) as a
function of incident angle θ1.
The first thing to notice in figure 3.14 is the form of r⊥. It stays
negative for all values of θ1. So for this component reflections will
always be accompanied by a phase shift of π. The transmitted fields, on
the other hand, will never experience any phase shifts. Returning to
the reflection coefficients, at a particular angle, the so-called
Brewster's angle θB, the r∥ coefficient vanishes. Impressive! At this
particular angle only light with an electric field component parallel
(see footnote 9) to the glass surface is reflected! We will interpret
the physics of this angle further below.
Looking at r∥ from equation (3.52) we observe that tan(θ1 + θ2) → ∞
for θ1 + θ2 → π/2, meaning the coefficient r∥ → 0. So at Brewster's angle
θ1 + θ2 = π/2. Using Snell's law, subject to this condition, gives:
$$n_1 \sin(\theta_1) = n_2 \sin(\theta_2) = n_2 \sin\left(\frac{\pi}{2} - \theta_1\right) = n_2 \cos(\theta_1) \tag{3.53}$$
so
$$\tan(\theta_1) = \frac{n_2}{n_1} \tag{3.54}$$
which is the expression for Brewster's angle.
Brewster's angle: (3.55)
$$\tan(\theta_B) = \frac{n_2}{n_1}.$$
9 Meaning perpendicular to the plane of incidence, hence r⊥ is the relevant
coefficient.
Figure 3.14 Plot of Fresnel's reflection coefficients in the case of an
air-glass interface, n1 = 1 and n2 = 1.5. Blue curves are for the
parallel case and red curves for the perpendicular case. Notice the
particular angle, the so-called Brewster's angle θB, where the
r∥-coefficient vanishes.
For an air-glass interface we find θB ≈ 56.3 degrees and for glass-air
θB ≈ 33.7 degrees. Brewster's angle has a beautiful physical
interpretation. When θ1 + θ2 = π/2 the transmitted ray and the reflected
ray are at 90 degrees with respect to each other, see figure 3.15. But
since the E∥ electric field is perpendicular to the transmitted ray kt,
it is parallel to kr. The electric field makes electrons oscillate in
the insulator, forming oscillating dipoles, which in turn emit
radiation. But since we are viewing along the direction of dipole
oscillation (the E-field direction), no light is sent in the direction
of observation. The relation generally holds in isotropic materials and
shows that the flow of energy is directed along the k-vector, which
seems very natural indeed. Later we will see that this is not always the
case, in particular for non-isotropic materials, such as uniaxial and
bi-axial crystals.
Figure 3.15 When θ1 + θ2 = π/2 the transmitted ray and the reflected
ray are at 90 degrees with respect to each other. The dipole induced
in the material generates the reflected ray. But here the reflected ray
is along the dipole axis and no light emerges.
For applications where light is to be transported from point A to point
B with very low losses the Brewster geometry is very helpful. This fact
is exploited heavily in laser technology and optics applications.
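Brewster's angle and the condition θ1 + θ2 = π/2 can be checked together. This is a minimal sketch for the air-glass and glass-air values quoted above; the function names are hypothetical.

```python
import math


def brewster_angle(n1, n2):
    """Brewster's angle in radians from tan(theta_B) = n2 / n1, equation (3.55)."""
    return math.atan2(n2, n1)


def refraction_angle(n1, n2, theta1):
    """Refraction angle in radians from Snell's law (real solutions only)."""
    return math.asin(n1 * math.sin(theta1) / n2)


for n1, n2 in ((1.0, 1.5), (1.5, 1.0)):
    theta_b = brewster_angle(n1, n2)
    theta_2 = refraction_angle(n1, n2, theta_b)
    # At Brewster's angle the two angles add up to 90 degrees.
    print(math.degrees(theta_b), math.degrees(theta_b + theta_2))
```

Note that the two Brewster angles, for air-glass and glass-air, are each other's refraction angles and therefore also add up to 90 degrees.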
Example 3.11 Brewster’s angle applied to laser resonators.

One important application of Brewster’s angle is found in laser physics.
The laser gain medium such as a Helium Neon gas mixture is kept in a
glass tube at low pressure, see figure 3.16. To ensure high gain, optical
losses at the windows must be as low as possible. One solution would be
to coat windows using dielectric coatings. A very good coating would give
around 0.2 % loss (windows mounted perpendicular to the light beam).
Achieving lower values by optical coatings is very difficult. However,
by mounting the windows properly at Brewster’s angle losses as low as
0.01-0.005 % are possible.
Figure 3.16 TOP: Schematic drawing of a HeNe laser. The gas mix-
ture is kept in a glass cell with mounted Brewster windows to mini-
mize optical losses. The black arrow at the output indicates the po-
larization state of the laser light. BOTTOM: Picture of a Brewster
mounted window in a real HeNe laser, here marked with the white
circle.
The output polarization is consequently fixed and very clean for this
type of laser. In figure 3.16 the polarization state of the laser is indicated
with a black arrow. The perpendicular component suffers too large a loss
in the cavity to make it above the laser threshold and the laser will not
emit light with this polarization state.
3.3.2 Intensity relations

Let us consider the relative amount of energy reflected and transmitted
from an interface. At first glance it seems like we just need to take the
square modulus of our coefficients r and t. This is not correct for the t
coefficients. The reflectance R and transmittance T are defined as the
ratio of reflected or transmitted power to the incident power. That
means areas are in play, see figure 3.17. Consider first the reflected
beam (reflected power / incident power):
$$R_{\parallel,\perp} = \frac{I^r A_r}{I^i A_i} = \frac{n_1 d_1}{n_1 d_1} \left|\frac{E^r}{E^i}\right|^2 = |r|^2 \tag{3.56}$$
Figure 3.17 Considering intensity relations. For incident and reflected
beams the beam diameter is unaffected, while for the transmitted
beam the diameter changes.
In the last step we used the expression for intensity from equation
(2.76), $I = \tfrac{1}{2} \varepsilon_0 c\, n |E|^2$. So for both
R-coefficients it is just the square modulus of r, as we argued before
(see footnote 10). For the transmittance we have (transmitted
power / incident power):
\[
T_{\parallel,\perp} = \frac{I_t A_t}{I_i A_i} = \frac{n_2 d_2}{n_1 d_1}\left|\frac{E_t}{E_i}\right|^2 = |t|^2 \cdot \frac{n_2 \cos(\theta_2)}{n_1 \cos(\theta_1)} \tag{3.57}
\]
You may wonder why it is not the square of the ratio $d_2/d_1 = \cos(\theta_2)/\cos(\theta_1)$ that must be used,
as we are talking about areas. But notice it is only one dimension of the
beam cross section that changes! The circle dimension “out of the paper”
is unaffected, leaving us with an elliptic type of beam configuration where
one axis is unaffected while the other axis is changed. These expressions
ensure R∥ + T∥ = 1 and R⊥ + T⊥ = 1.
10 Note, we left out the subscripts ∥, ⊥ on the left hand side. Orthogonal
polarizations do not interfere, so you can calculate reflectance and transmittance
separately for s- and p-polarization.
Figure 3.18 Plot of the reflectance (R∥, R⊥) and associated phases
(δ∥, δ⊥) for an air-glass interface, n1 = 1, n2 = 1.5, and a glass-air interface,
n1 = 1.5, n2 = 1.
Problem 3.12 Show that R∥ + T∥ = 1 and R⊥ + T⊥ = 1.
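A quick numerical sanity check of energy conservation can be sketched as follows. It is not a substitute for the analytic proof asked for in problem 3.12. The r-coefficients are taken in the form of equations (3.58) and (3.60); the t-coefficients are the standard companion expressions, which are an assumption here because they are not restated in this section. The function names are illustrative only.

```python
import math

def fresnel_coefficients(n1, n2, theta1):
    """Return (r_par, r_perp, t_par, t_perp, cos1, cos2) for real n1, n2.

    r follows equations (3.58) and (3.60); t uses the standard companion
    formulas (assumed, not restated in this section). Only valid below the
    critical angle, where cos(theta2) is real.
    """
    cos1 = math.cos(theta1)
    sin2 = n1 / n2 * math.sin(theta1)        # Snell's law
    cos2 = math.sqrt(1.0 - sin2 ** 2)
    r_par = (n2 * cos1 - n1 * cos2) / (n2 * cos1 + n1 * cos2)
    r_perp = (n1 * cos1 - n2 * cos2) / (n1 * cos1 + n2 * cos2)
    t_par = 2 * n1 * cos1 / (n2 * cos1 + n1 * cos2)
    t_perp = 2 * n1 * cos1 / (n1 * cos1 + n2 * cos2)
    return r_par, r_perp, t_par, t_perp, cos1, cos2

def reflectance_transmittance(n1, n2, theta1):
    """Return (R_par, T_par, R_perp, T_perp) using equations (3.56), (3.57)."""
    r_par, r_perp, t_par, t_perp, cos1, cos2 = fresnel_coefficients(n1, n2, theta1)
    # Index ratio times the change of beam width, see equation (3.57).
    area_factor = (n2 * cos2) / (n1 * cos1)
    return r_par ** 2, t_par ** 2 * area_factor, r_perp ** 2, t_perp ** 2 * area_factor

for degrees in (0, 30, 60, 85):
    R_par, T_par, R_perp, T_perp = reflectance_transmittance(1.0, 1.5, math.radians(degrees))
    print(degrees, R_par + T_par, R_perp + T_perp)
```

Both sums should come out as 1 up to rounding for every angle, which is exactly what fails if the factor n2 cos(θ2)/(n1 cos(θ1)) is dropped from T.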

In figure 3.18 we show the reflectance for an air-glass interface. At the
Brewster’s angle “parallel” reflections vanish.
Example 3.13 Phases upon reflection.

In figure 3.18 we show the reflectance and phase relations for an air-glass
and a glass-air interface. The first phase plot, for the air-glass interface, follows
directly from Fresnel’s equations plotted in figure 3.14. Here you can see that
r⊥ stays negative for all incident angles, so a π phase shift is associated
with this type of reflection. The other component, r∥, stays positive
until the Brewster angle and then becomes negative.
A similar plot will explain the phase behavior for a glass-air interface
at angles below the critical angle θc. However, for angles greater than
the critical angle we have total internal reflection, and things become a bit
more complicated. Let us calculate the phase shift for θ1 ≥ θc. Recall the
Fresnel equations for glass-air (where we have eliminated θ2 using Snell’s law):
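\[
r_\parallel = \frac{n_2\cos(\theta_1) - n_1\sqrt{1 - \left(\frac{n_1}{n_2}\right)^2\sin^2(\theta_1)}}{n_2\cos(\theta_1) + n_1\sqrt{1 - \left(\frac{n_1}{n_2}\right)^2\sin^2(\theta_1)}} \tag{3.58}
\]
\[
\phantom{r_\parallel} = \frac{\cos(\theta_1) - n\sqrt{1 - n^2\sin^2(\theta_1)}}{\cos(\theta_1) + n\sqrt{1 - n^2\sin^2(\theta_1)}} \tag{3.59}
\]
\[
r_\perp = \frac{n_1\cos(\theta_1) - n_2\sqrt{1 - \left(\frac{n_1}{n_2}\right)^2\sin^2(\theta_1)}}{n_1\cos(\theta_1) + n_2\sqrt{1 - \left(\frac{n_1}{n_2}\right)^2\sin^2(\theta_1)}} \tag{3.60}
\]
\[
\phantom{r_\perp} = \frac{n\cos(\theta_1) - \sqrt{1 - n^2\sin^2(\theta_1)}}{n\cos(\theta_1) + \sqrt{1 - n^2\sin^2(\theta_1)}} \tag{3.61}
\]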
with n1 = n (glass) and n2 = 1 (air). For incident angles greater than the critical
angle, θ1 ≥ θc, the argument of the square root becomes negative and the root itself imaginary:
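\[
r_\parallel = \frac{\cos(\theta_1) - i\cdot n\cdot\sqrt{n^2\sin^2(\theta_1) - 1}}{\cos(\theta_1) + i\cdot n\cdot\sqrt{n^2\sin^2(\theta_1) - 1}}, \quad \theta_1 \ge \theta_c \tag{3.62}
\]
\[
r_\perp = \frac{n\cos(\theta_1) - i\cdot\sqrt{n^2\sin^2(\theta_1) - 1}}{n\cos(\theta_1) + i\cdot\sqrt{n^2\sin^2(\theta_1) - 1}}, \quad \theta_1 \ge \theta_c \tag{3.63}
\]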
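Both expressions are of the form
\[
\frac{x - iy}{x + iy}, \tag{3.64}
\]
with x and y real, so we express the r parameters as:
\[
|r|e^{i\delta} = \frac{x - iy}{x + iy} = |r|\frac{e^{-i\alpha}}{e^{i\alpha}} = |r|e^{-i2\alpha}, \tag{3.65}
\]
where δ is the phase we are looking for and α = arctan(y/x) is just a convenient parameter.
Numerator and denominator have the same modulus, so |r| = 1, as expected for total
internal reflection. Since δ = −2α = −2 arctan(y/x) we have: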
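\[
\tan\left(\frac{\delta_\parallel}{2}\right) = -\frac{n\sqrt{n^2\sin^2(\theta_1) - 1}}{\cos(\theta_1)}, \quad \theta_1 \ge \theta_c \tag{3.66}
\]
\[
\tan\left(\frac{\delta_\perp}{2}\right) = -\frac{\sqrt{n^2\sin^2(\theta_1) - 1}}{n\cos(\theta_1)}, \quad \theta_1 \ge \theta_c. \tag{3.67}
\]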
These phase shifts are plotted in figure 3.18. This is not just a long
calculation: the result can be used to build useful optical components.
As you can see in the figure, the difference in phase shift between the two
polarization components can reach π/4 for standard glass. So two bounces
without losses add up to a relative phase shift of π/2, large enough to convert
linearly polarized light into circularly polarized light over a wide range of
wavelengths, limited only by material dispersion. This is used in Fresnel rhombs,
named after - guess who! We will discuss more methods to manipulate
the polarization state of light in a later chapter.
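The size of the relative phase shift can be checked with a few lines of code. The following is a minimal sketch that evaluates equations (3.66) and (3.67) on a grid of angles above the critical angle and looks for the largest difference δ⊥ − δ∥. The value n = 1.5 for "standard glass", the grid resolution and the function names are assumptions made for illustration.

```python
import math

def tir_phases(n, theta1):
    """Phase shifts (delta_par, delta_perp) from equations (3.66) and (3.67).

    n is the index of the glass (glass-air interface) and theta1 the internal
    angle of incidence in radians, which must exceed the critical angle.
    """
    root = math.sqrt(n ** 2 * math.sin(theta1) ** 2 - 1.0)
    delta_par = -2.0 * math.atan(n * root / math.cos(theta1))
    delta_perp = -2.0 * math.atan(root / (n * math.cos(theta1)))
    return delta_par, delta_perp

def max_phase_difference(n, steps=20000):
    """Scan theta_c < theta1 < 90 degrees; return (largest difference, angle)."""
    theta_c = math.asin(1.0 / n)
    best_diff, best_theta = 0.0, theta_c
    for k in range(1, steps):
        theta = theta_c + (math.pi / 2 - theta_c) * k / steps
        delta_par, delta_perp = tir_phases(n, theta)
        diff = delta_perp - delta_par
        if diff > best_diff:
            best_diff, best_theta = diff, theta
    return best_diff, best_theta

diff, theta = max_phase_difference(1.5)
print("largest relative phase shift [deg]:", math.degrees(diff))
print("at internal angle of incidence [deg]:", math.degrees(theta))
```

For n = 1.5 the maximum lies only slightly above 45°, so there are two nearby angles of incidence at which a single reflection gives exactly π/4.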
Problem 3.14 Consider polaroid sunglasses or the polarizers you have
played with during the optics course. How do you calibrate the
transmittance axis of the polarizer?
3.3.3 Metals
For metals we can also consider Fresnel’s equations. The procedure is
identical to that shown above, except that now we need to take into account
the fact that metals have a complex index of refraction, often with the
imaginary part dominating. We will not consider this in detail here, but just
show the reflectance for the two cases of a nickel and a silver surface,
see figure 3.19.
When metal surfaces are involved, say an air-metal interface, we have
n1 = 1 and θ1 real, while n2 = nr + i·ni and θ2 are complex
numbers. We defined the complex index in the previous chapter. Let us
first consider normal incidence. The (intensity) reflectance is given by:
\[
R = \left|\frac{n_1 - n_2}{n_1 + n_2}\right|^2
  = \frac{n_1 - n_2}{n_1 + n_2}\cdot\overline{\left(\frac{n_1 - n_2}{n_1 + n_2}\right)}
  = \frac{(n_r - 1)^2 + n_i^2}{(n_r + 1)^2 + n_i^2} \tag{3.68}
\]
where the overbar denotes complex conjugation¹¹. This is the result we
used in equation (2.117) and the corresponding plot.
In figure 3.19 we show the reflectance of metallic silver and nickel at
532 nm. The plots are obtained from
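\[
R_\perp = \left|\frac{\cos(\theta_1) - n_2\sqrt{1 - \frac{1}{n_2^2}\sin^2(\theta_1)}}{\cos(\theta_1) + n_2\sqrt{1 - \frac{1}{n_2^2}\sin^2(\theta_1)}}\right|^2,
\qquad
R_\parallel = \left|\frac{n_2\cos(\theta_1) - \sqrt{1 - \frac{1}{n_2^2}\sin^2(\theta_1)}}{n_2\cos(\theta_1) + \sqrt{1 - \frac{1}{n_2^2}\sin^2(\theta_1)}}\right|^2 \tag{3.69}
\]
where n1 = 1 is assumed and n2 = nr + i·ni.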
Characteristic for metals is the absence of a genuine Brewster’s angle
where the reflectance goes to zero. Metals will typically have a minimum
in the R∥ component and are thus also able to marginally polarize light
upon reflection. If the imaginary part ni ≫ nr, then R → 1 and the
difference between R∥ and R⊥ will be negligible. This happens when
the conductivity is good. This explains the difference in the reflectance
of nickel and silver, as silver has a far better conductivity than nickel.
11 In the derivation we used $\overline{\left(z_1/z_2\right)} = \overline{z_1}/\overline{z_2}$.
Figure 3.19 Plot of the reflectance R∥ and R⊥ for an air-metal
interface at 532 nm. For silver we used n2 = 0.129 + i · 3.149 and for
nickel n2 = 1.737 + i · 3.193.
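The numbers behind such a plot can be reproduced with a short script. The sketch below evaluates equation (3.68) at normal incidence and the angle-dependent reflectances for n1 = 1, written directly from r⊥ and r∥ in equations (3.58) and (3.60) with a complex n2. The indices are those quoted in the caption of figure 3.19; the function names and the choice of the principal branch of the complex square root are assumptions of this sketch.

```python
import cmath
import math

def normal_incidence_reflectance(n2):
    """Equation (3.68) for an air-metal interface (n1 = 1)."""
    n_r, n_i = n2.real, n2.imag
    return ((n_r - 1) ** 2 + n_i ** 2) / ((n_r + 1) ** 2 + n_i ** 2)

def metal_reflectance(n2, theta1):
    """Return (R_par, R_perp) for n1 = 1 and complex n2 at angle theta1."""
    cos1 = math.cos(theta1)
    # Complex cos(theta2) from Snell's law; principal branch of the root.
    root = cmath.sqrt(1 - (math.sin(theta1) / n2) ** 2)
    r_perp = (cos1 - n2 * root) / (cos1 + n2 * root)
    r_par = (n2 * cos1 - root) / (n2 * cos1 + root)
    return abs(r_par) ** 2, abs(r_perp) ** 2

silver = complex(0.129, 3.149)   # values from the caption of figure 3.19
nickel = complex(1.737, 3.193)

for name, n2 in (("silver", silver), ("nickel", nickel)):
    print(name, "normal incidence:", normal_incidence_reflectance(n2))
    for degrees in (0, 30, 60, 75, 85):
        R_par, R_perp = metal_reflectance(n2, math.radians(degrees))
        print("  ", degrees, R_par, R_perp)
```

At θ1 = 0 both polarizations reduce to equation (3.68), and for nickel the R∥ column shows the shallow minimum discussed in the text rather than a true zero.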
4 Geometrical optics
In this chapter we consider the optical imaging using lenses or a collec-

tion of lenses and mirrors. Again, we will ignore any diffraction phenom-
ena and consider only the straight line trajectory of light corresponding
to the limit λ → 0. Our main task in this chapter is to understand how
the shape of an optical element determines where an image of a point source
is formed and how “perfect” this image is. While studying this intricate
question, Fermat’s principle will be very handy indeed.
In the last section we will introduce “ray tracing” as a universal and
powerful design tool for paraxial optical systems.
4.1 Optical elements

The optical elements we have in mind consist of:
• Sources, such as ideal point sources or plane wave sources.
• Mirrors which could be flat or curved.
• Lenses, which we study in two groups: spherical lenses and aspher-
ical lenses
In practice, an expanded and collimated laser beam can serve as a very
good approximation to a plane wave source. A point source can be sim-
ilarly constructed by letting a plane wave hit a small opening, an aper-
ture. Mirrors are made from metals, metal coated surfaces or coated
glass surfaces. Lenses are typically made from glass or similar dielectric
materials.
In order to form a perfect image of a perfect source, we place an
optical element, which redirects rays from the source in some manner,
at some distance from an idealized point source, as shown in figure 4.1.
Figure 4.1 An optical element forming the image of a point
source. The object (here a point source) is placed at a distance
So from the optical element, while the image is formed at a distance
Si. Note that the optical element may be a lens or even a mirror.
We assume that the optical element is made of materials with a real

index of refraction. Any absorption or scattering losses are not taken
into account. To keep things simple, we use in illustrations transmissive
optical elements, like lenses or windows and assume that light rays travel
from left to right. Once you are more familiar with this type of drawing,
you will see that it is also easy to accommodate mirrors (e.g. by unfolding
the beam path). You should also convince yourself at some point that
in almost all cases you can follow the beam path backwards from right
to left and have a valid description of an optical element¹.
Here, a few remarks about notation: With our convention of looking
at transmissive elements first, we can divide the xy-plane² into an object
space and an image space. Distances in the (left) object space are denoted
with subscript ’o’, and distances in the (right) image space with subscript ’i’.
Most optical elements we discuss are mirror symmetric around
the x-axis, so we often call this privileged axis the optical axis. Distances
on a ray along the optical axis to some reference point inside or
on the surface of an optical element we refer to as So and Si in object
and image space, respectively. Our most frequent task in the following
subsections will be to establish a relationship between So and Si.
1 We will see later in chapter 5 on polarization, that not all lossless optical
elements are reciprocal.
2 I.e. the paper plane in the drawings. Note that we implicitly assume either
rotational symmetry around the x-axis or translational symmetry along the
z-axis. We also do not consider skew rays – rays in 3-D which are not parallel to
the x-axis, but never intersect it.
4.1.1 Aspherical surfaces

To determine the shape of an optical element that forms an ideal image
of a point source, we split the task into two subtasks. We first look for
the surface that transforms waves (rays) from a perfect point source into
parallel plane waves (rays). Once we have achieved that, it is straight-
forward to solve the second part. We just need to apply the first solution
in reverse! In figure 4.2, we show a point source, which is placed at 0 on
the optical axis at a distance So to the curved surface. We assume that
light from the source will strike a reference plane Q located at Si inside
the optical element as a bundle of parallel light rays. Invoking Fermat’s
principle, we demand a constant optical path length for all rays arriving
at the reference plane:
\[
\mathrm{OPL} = n_0 S_o + n_1 S_i = n_0 d_o + n_1 d_i = \text{constant}.
\]
Accordingly, the travel time from 0 to the plane Q must be the same
for all rays.
Figure 4.2 Geometry for a curved surface collimating light rays from
a point source.
Introducing the cartesian coordinates (x, y) for points on
the surface, we can rewrite the equation as:
\[
n_0 S_o + n_1 S_i = n_0\sqrt{x^2 + y^2} + n_1\sqrt{\left((S_o + S_i) - x\right)^2}.
\]
When we transform to polar coordinates, $r = \sqrt{x^2 + y^2}$ and $x = r\cos(\phi)$,
we find accordingly:
\[
r(\phi) = \frac{(n_0 - n_1)S_o}{n_0 - n_1\cos(\phi)}
\]
for the shape of the surface achieving our task. This curve will be a
hyperbola when n1 > n0 and an ellipse when n0 > n1.
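That this surface really equalizes the optical path length can be verified numerically. The sketch below evaluates r(φ) and the path length n0·r + n1·((So + Si) − r cos(φ)) from the source to the plane Q for a few ray angles. The distances So = Si = 1 and the sampled angles are arbitrary illustrative choices, and the function names are hypothetical.

```python
import math

def surface_radius(phi, n0, n1, s_o):
    """Distance r(phi) from the point source to the collimating surface."""
    return (n0 - n1) * s_o / (n0 - n1 * math.cos(phi))

def optical_path_length(phi, n0, n1, s_o, s_i):
    """Path length from the source to the reference plane Q at x = s_o + s_i."""
    r = surface_radius(phi, n0, n1, s_o)
    return n0 * r + n1 * ((s_o + s_i) - r * math.cos(phi))

# Air to glass (hyperbola) and glass to air (ellipse).
for n0, n1 in ((1.0, 1.5), (1.5, 1.0)):
    for phi in (0.0, 0.1, 0.3, 0.5):
        print(n0, n1, phi, optical_path_length(phi, n0, n1, 1.0, 1.0))
```

For each pair of indices the printed path length is the same for every angle and equals n0·So + n1·Si. For n1 > n0 the angles have to stay below arccos(n0/n1), where the denominator of r(φ) changes sign.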
Conic sections
If you remember the more advanced bits of your education in analytical
geometry, you will recognize the last equation as the parametric
representation of a conic section. A conic section is the curve formed by the
set of points lying on the intersection of the surface of a (double) cone
with a plane, as illustrated in Fig. 4.3.
In polar coordinates, (r, φ), the conic section has the form
\[
r = \frac{\varepsilon\cdot d}{1 - \varepsilon\cdot\cos(\phi)},
\]
where ε is the eccentricity, a measure of how strongly a conic
section deviates from a circle, and d is the distance between the focus and the
directrix. When ε < 1, the curve is an ellipse. When ε = 1, it is a
parabola. For ε > 1, it is a hyperbola³.
Problem 4.1 Show that the optimal shape r(φ) is indeed a hyperbola
for n1 > n0 and an ellipse for n0 > n1. Establish ε = n1/n0 in both
cases. What happens when n0 = n1?
Plot the curves for n0 = 1 and n1 = 1.5 and vice versa.
Aspherical surface: From point source to parallel beam:
\[
r(\phi) = \frac{(n_0 - n_1)S_o}{n_0 - n_1\cos(\phi)} \tag{4.1}
\]
Ellipse for n0 > n1
Hyperbola for n1 > n0
Alternative derivation of equation (4.1)

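3 Notice, the three cases can be brought to the form:
\[
r = \frac{a\cdot(1 - \varepsilon^2)}{1 - \varepsilon\cdot\cos(\phi)} \quad \text{for } \varepsilon < 1
\]
\[
r = \frac{2a}{1 - \cos(\phi)} \quad \text{for } \varepsilon = 1
\]
\[
r = \frac{a\cdot(\varepsilon^2 - 1)}{1 - \varepsilon\cdot\cos(\phi)} \quad \text{for } \varepsilon > 1
\]
where a is a distance that characterizes the conic section.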
Figure 4.3 Conic sections are obtained by slicing cones at various

angles. Generally, four different types of curves may appear: circles,
ellipses, parabolas, or hyperbolas.
We now employ Fermat’s principle in the more traditional way. Let
us look explicitly for a stationary point, rather than reading off the
stationary optical path length directly from the geometry. Even though
the former method is perfectly ok, we present this alternative
in order to illustrate the method used in the remainder of this chapter.
As always, it begins with writing up the optical path length OPL (see
figure 4.2). We have:
\[
\mathrm{OPL}(\phi) = n_0 r(\phi) + n_1\left((S_o + S_i) - r(\phi)\cos(\phi)\right), \tag{4.2}
\]
since we want to analyze the distance r(φ) as a function of the angle φ.
According to Fermat’s principle, we should look for an extremum,
therefore,
\[
0 = \frac{d\,\mathrm{OPL}}{d\phi} = n_0 r'(\phi) - n_1\left(r'(\phi)\cos(\phi) - r(\phi)\sin(\phi)\right). \tag{4.3}
\]
Figure 4.4 Aspherical double surface as a perfect lens for imaging a
point source.
Now, we need to solve the differential equation
\[
r'(\phi)\left(n_0 - n_1\cos(\phi)\right) + n_1 r(\phi)\sin(\phi) = 0, \tag{4.4}
\]
subject to the condition r(0) = So. The solution to this
differential equation is
\[
r(\phi) = \frac{C_1}{n_0 - n_1\cos(\phi)}, \tag{4.5}
\]
where C1 is a constant, which is determined from r(0) = So,
so that the solution can be written in the form
\[
r(\phi) = \frac{(n_0 - n_1)S_o}{n_0 - n_1\cos(\phi)}, \tag{4.6}
\]
as we had before. Notice that we have to assume n0 − n1 cos(φ) ≠ 0.
Think about what this condition means!
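For readers who want to see the intermediate step, the differential equation can be solved by separation of variables. The following lines are a sketch of that standard route and use no input beyond the equation itself and the condition r(0) = So:
\[
\frac{r'(\phi)}{r(\phi)} = -\frac{n_1\sin(\phi)}{n_0 - n_1\cos(\phi)}
\quad\Rightarrow\quad
\ln r(\phi) = -\ln\left|n_0 - n_1\cos(\phi)\right| + \text{const}
\quad\Rightarrow\quad
r(\phi) = \frac{C_1}{n_0 - n_1\cos(\phi)} .
\]
Setting φ = 0 gives So = C1/(n0 − n1), i.e. C1 = (n0 − n1)So.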
Now that we have converted rays from the point source into perfect
parallel beams we can easily complete our task and construct an ideal
lens. We just mirror our surface at the plane Q, as shown in figure 4.4,
accordingly we will have a perfect image of our point source. Note that
we could also tune the distance to the image point by either inserting
a slab of glass in the middle and/or by choosing the second surface
with a different eccentricity. Up to here, we have assumed light from the
source to be perfectly monochromatic. But we know that the index of
refraction varies with the wavelength, so for a polychromatic source we
will in general observe an image position which varies with the color of
the light!
4.1.2 Imaging with spherical surfaces

Aspherical surfaces of high optical quality are, alas, very difficult to
produce. It is possible, but the production costs are typically quite high. On
the other hand, spherical surfaces are comparatively cheap to produce and
can be polished by simple means to extremely high optical quality⁴. For
that reason, optical elements with spherical surfaces are in widespread
use and hence important to understand, even if they do not provide ideal
imaging.
Let us consider a spherical surface with radius of curvature R, as
shown in figure 4.5. We have:
\[
\mathrm{OPL} = n_0 L_o + n_1 L_i \tag{4.7}
\]
\[
\phantom{\mathrm{OPL}} = n_0\left((S_o + R)^2 + R^2 - 2R(S_o + R)\cos(\phi)\right)^{1/2}
 + n_1\left((S_i - R)^2 + R^2 + 2R(S_i - R)\cos(\phi)\right)^{1/2}, \tag{4.8}
\]
where we used cos(π − φ) = − cos(φ) in the last term. We now apply
Fermat’s principle:
\[
\frac{d\,\mathrm{OPL}}{d\phi} = 0 = n_0\frac{2R(S_o + R)\sin(\phi)}{2L_o} - n_1\frac{2R(S_i - R)\sin(\phi)}{2L_i}.
\]
Accordingly,
\[
\frac{n_0(S_o + R)}{L_o} = \frac{n_1(S_i - R)}{L_i}.
\]
We rewrite the equation by collecting the terms with R and get:
\[
\frac{n_0}{L_o} + \frac{n_1}{L_i} = \frac{1}{R}\left(\frac{n_1 S_i}{L_i} - \frac{n_0 S_o}{L_o}\right).
\]
We now apply the paraxial approximation by assuming that only the very
central part of the lens comes into play, i.e., both h and φ in Fig. 4.5
are small. Doing this, we can approximate Lo ≃ So and Li ≃ Si and
finally we obtain:
Imaging with a single spherical surface:
\[
\frac{n_0}{S_o} + \frac{n_1}{S_i} = \frac{n_1 - n_0}{R} \tag{4.9}
\]
4 High optical quality means low surface roughness, measured in minute fractions
of the wavelength, and small deviations from the nominal shape on larger length
scales.
Figure 4.5 Single spherical surface with radius R used as a lens. The
point C denotes the center of the circle.
Everything in equation (4.9) is now expressed in terms of quantities that do not
depend on any particular ray path. It links the geometry and material parameters,
namely the object distance (So), the image distance (Si), the radius of curvature (R)
and the indices of refraction of the object space and the lens (n0 and n1,
respectively), i.e. we can determine any one of them when the others are known.
This equation tells us where inside the medium with index n1 an image
is formed in the paraxial approximation. If you cannot immediately come
up with an application example where this is important, take a look in
the mirror!
To interpret the equation further and to introduce important quantities,
let us now assume that Si = ∞. Accordingly, we transform light
from a point source on the optical axis to parallel rays inside the medium
n1, which represents the lens. Then we have:
\[
\frac{1}{S_o} = \frac{1}{R}\,\frac{n_1 - n_0}{n_0}.
\]
The object distance So in this special case we call the (object side) focal
length fo. In this case:
\[
f_o = \frac{n_0}{n_1 - n_0}R = 2R, \quad \text{for } n_0 = 1 \text{ and } n_1 = 1.5
\]
For the opposite case So = ∞ we obtain:
\[
f_i = \frac{n_1}{n_1 - n_0}R = 3R, \quad \text{for } n_0 = 1 \text{ and } n_1 = 1.5
\]
as the (image side) focal length fi. Our optical element has only one
spherical surface. In the first considered special case beams are parallel
in the n1 medium (glass), while in the second case beams are parallel
in the n0 medium (air). Evidently, the two focal lengths are not equal to
each other.
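The two focal lengths, and the image position for any other object distance, follow from equation (4.9) in a few lines. The sketch below is illustrative only: the function names and the radius R = 10 (in arbitrary length units) are assumptions, and math.inf is used to represent a parallel beam.

```python
import math

def image_distance(s_o, n0, n1, radius):
    """Solve equation (4.9) for S_i; pass s_o = math.inf for a parallel beam."""
    rhs = (n1 - n0) / radius - n0 / s_o
    # rhs == 0 means the refracted rays are parallel (image at infinity).
    return math.inf if rhs == 0 else n1 / rhs

def focal_lengths(n0, n1, radius):
    """Object side and image side focal lengths of a single spherical surface."""
    f_o = n0 * radius / (n1 - n0)
    f_i = n1 * radius / (n1 - n0)
    return f_o, f_i

R = 10.0
f_o, f_i = focal_lengths(1.0, 1.5, R)
print("f_o =", f_o, " f_i =", f_i)
print("object at infinity  -> S_i =", image_distance(math.inf, 1.0, 1.5, R))
print("object at f_o       -> S_i =", image_distance(f_o, 1.0, 1.5, R))
```

An object placed at fo gives an image at infinity, and a parallel beam is focused at fi, in line with the definitions above.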
Example 4.2 Interpreting eq. (4.9)

In this example we apply eq. (4.9)
to more cases and point out important sign conventions to be used when
working with the formula. We begin by looking at three cases (case 1-3) by
inspecting the formula, which is derived in the paraxial approximation
(PA). When we derived equation (4.9), we implicitly assumed light rays
propagating from left to right. This convention is very helpful to avoid
an odd number of sign errors and we will stick to it in the following. Also
for convenience, we assume rays going from air to glass, i.e., n1 − n0 is
a positive number.
Let us analyze equation (4.9) for the cases sketched in figure 4.6.
1. In this case, we investigate what happens when the object is gradually
moved away from the spherical surface, starting from a point
already to the left of the object side focal length, i.e. So > fo and
increasing (dashed ray path in fig. 4.6 panel 1). We observe that the
image location Si will move towards the left. In the extreme case,
where So → ∞, the incoming beam will be parallel to the x-axis and the
image will have moved from Si to Si′. Thus, for an object to the left of
fo the leftmost possible image position inside the lens is Si′, coinciding
with the image side focal length fi defined before.
2. Let us analyze what happens when So becomes small, in particular
smaller than the object side focal length fo, in other words, when
the object is moved close to the refracting surface. As you can see
from figure 4.6 part (2), when So is smaller than fo, rays inside the
lens diverge from the optical axis. We can nevertheless extrapolate
those rays as straight lines backwards into the object space to find
the intersection with the optical axis, which defines the location of
the image. The image distance Si becomes negative, in the sense that
the (virtual) image point sits on the object side of the surface and to
the left of the object at So. This you may also see from equation
(4.9). On the right hand side, we have a positive number: n1 − n0
is positive, and R is positive. Solving for the image term gives
\[
\frac{n_1}{S_i} = \frac{n_1 - n_0}{R} - \frac{n_0}{S_o}, \tag{4.10}
\]
and when So becomes small enough (So < fo) the right hand side is negative,
so Si must become negative. We interpret this, as written above, as
both object and (virtual) image being located on the object side.
3. In the last case we look at a spherical surface with the center of
curvature flipped to the opposite side, as shown in panel 3 of the
figure. For this case, we assume an incoming parallel beam, So → ∞.
Applying the same extrapolation as before, we can see that the image must
be to the left of the lens, i.e., again on the same side as the object.
As before, we must count Si as a negative value. However, if Si is
negative, R must be negative as well in order to obtain a sound equation (4.9).
This fixes the sign convention for the radius of curvature. A
radius of curvature must be considered positive (negative)
if the center of curvature is located to the right (left) of the
spherical surface.
Figure 4.6 Three examples of ray paths through a single spherical
surface (panels 1 to 3, object space on the left, image space on the right).
The center of curvature of the spherical surface is denoted by c.
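As a worked illustration of the three cases, take n0 = 1, n1 = 1.5 and |R| = 10 cm, so that fo = 20 cm and fi = 30 cm. These numbers are chosen here only for illustration and do not come from figure 4.6. Equation (4.9) then gives:
\[
\text{Case 1: } S_o = 60\ \text{cm},\ R = +10\ \text{cm}: \quad \frac{1.5}{S_i} = \frac{0.5}{10\ \text{cm}} - \frac{1}{60\ \text{cm}} \;\Rightarrow\; S_i = +45\ \text{cm}, \qquad S_o \to \infty \;\Rightarrow\; S_i \to 30\ \text{cm} = f_i
\]
\[
\text{Case 2: } S_o = 10\ \text{cm} < f_o,\ R = +10\ \text{cm}: \quad \frac{1.5}{S_i} = \frac{0.5}{10\ \text{cm}} - \frac{1}{10\ \text{cm}} \;\Rightarrow\; S_i = -30\ \text{cm}
\]
\[
\text{Case 3: } S_o \to \infty,\ R = -10\ \text{cm}: \quad \frac{1.5}{S_i} = -\frac{0.5}{10\ \text{cm}} \;\Rightarrow\; S_i = -30\ \text{cm}
\]
In cases 2 and 3 the negative sign of Si signals a virtual image on the object side of the surface.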
Example 4.3 Lens makers formula

Now we can build a conventional lens from a slab of glass by polishing
the two end faces into spherical shape. Let us apply what we have just
learned and deduce the lens makers formula for a thin lens of thickness
d → 0, as sketched in fig. 4.7.
Figure 4.7 Lens of thickness d.
We have to consider the effect of the two surfaces sequentially and
be mindful of our sign conventions when we stitch the two results
together. For the effect of the first surface we obtain:
\[
\frac{n_0}{S_{1o}} + \frac{n_1}{S_{1i}} = \frac{n_1 - n_0}{R_1}. \tag{4.11}
\]
To continue we need to look at the case where the beam starts inside
the lens and meets the second spherical surface with radius of curvature
R2. Here, we obtain:
\[
\frac{n_1}{S_{2o}} + \frac{n_0}{S_{2i}} = \frac{n_0 - n_1}{R_2}. \tag{4.12}
\]
These two formulas are considered exact within the paraxial approximation.
So far, we have not yet used our assumption about the vanishing
thickness of the lens (d). From the geometry, we observe that
S2o = d − S1i. With our thin lens assumption d → 0 we have S2o = −S1i.
Accordingly, the terms n1/S1i and n1/S2o cancel when we add the two
equations (4.11) and (4.12), and we arrive at:
\[
\frac{n_0}{S_{1o}} + \frac{n_0}{S_{2i}} = (n_1 - n_0)\left(\frac{1}{R_1} - \frac{1}{R_2}\right). \tag{4.13}
\]
Finally, for the compound system we identify the object distance as So = S1o
and the image distance as Si = S2i to find:
\[
\frac{1}{f} = \frac{n_1 - n_0}{n_0}\left(\frac{1}{R_1} - \frac{1}{R_2}\right) \tag{4.14}
\]
and
\[
\frac{1}{f} = \frac{1}{S_o} + \frac{1}{S_i}. \tag{4.15}
\]
These two equations are so important, that they deserve a box and a
personal name.
To a thin⁵ lens with two spherical surfaces of radii R1 and R2 one
may attribute a unique focal length f, which is given by:
Lens makers formula:
\[
\frac{1}{f} = \frac{n_1 - n_0}{n_0}\left(\frac{1}{R_1} - \frac{1}{R_2}\right) \tag{4.16}
\]
Imaging formula:
\[
\frac{1}{f} = \frac{1}{S_o} + \frac{1}{S_i} \tag{4.17}
\]
The equations (4.16) and (4.17) are the two central equations for geo-
metrical optics! We will derive these equations in a more elegant way
using ray tracing matrices presented in the next section. For now, we
will just illustrate the use of lenses and of the lens makers formula.
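As a first illustration, the lens makers formula is easily turned into a small helper. The sketch below assumes the sign convention for R1 and R2 introduced above and uses math.inf for a flat surface. The function name and the sample radii are hypothetical, and the water index 1.33 is a typical textbook value assumed here.

```python
import math

def thin_lens_focal_length(n0, n1, r1, r2):
    """Lens makers formula (4.16) for a thin lens of index n1 in a medium n0.

    r1 and r2 are signed radii of curvature (positive when the center of
    curvature lies to the right of the surface); math.inf is a flat surface.
    """
    inv_f = (n1 - n0) / n0 * (1.0 / r1 - 1.0 / r2)
    return math.inf if inv_f == 0 else 1.0 / inv_f

R = 0.10  # 10 cm, expressed in meters
print("symmetric biconvex :", thin_lens_focal_length(1.0, 1.5, R, -R))
print("symmetric biconcave:", thin_lens_focal_length(1.0, 1.5, -R, R))
print("plano-convex       :", thin_lens_focal_length(1.0, 1.5, R, math.inf))
print("air lens in water  :", thin_lens_focal_length(1.33, 1.0, R, -R))
```

For n1 = 1.5 in air the symmetric biconvex lens has f = R, and the last line shows the sign of f flipping when the surrounding medium is the denser one.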
Example 4.4 Symmetric positive and negative lenses
We start with some gymnastics with signs. Consider a glass lens in air
(n0 < n1) with radii of curvature R1 = −R2 = R, where R > 0. Because of its
shape, this lens type is called a symmetric biconvex lens⁶. The lens transforms
parallel rays from the object side to converging rays on the image side;
accordingly it is called a converging or positive lens. The opposite partner
with R1 = −R2 = −R is called a symmetric biconcave lens. This
lens instead transforms parallel rays from the object side to diverging
rays on the image side and is hence referred to as a diverging or negative
lens. Can you figure out what happens when n0 > n1, as for an air
bubble in water or a concave air gap in glass? Looking at the signs, we
conclude that the roles of the shapes will simply switch.
5 A thin lens has no thickness, according to the definition. That means the focal
length is measured relative to its center.
6 It looks like a lentil seed, and this is how the optical lens got its name, as is still
quite evident in Danish or German language.
In the following examples we will image object points which are not
on the optical axis. In order to find the location of the image graphically,
it is very convenient to draw an arrow vertically up from the optical axis
with its tip on the object point. Now, we can easily draw three rays
from the object point to locate the image point. The first ray,
starting parallel to the optical axis, has to cross the optical
axis at the image side focal point fi by definition. The second ray we draw
intersects the optical axis at the center of the lens⁷. This ray will not be
deflected by the lens. The third ray to draw from the object point crosses the
optical axis in the object side focal point fo. This third ray will be
transformed by the lens into a ray parallel to the optical axis, according
to the definition of the object side focal length. If we have set up the
drawing correctly, the three⁸ transformed rays will intersect in one point,
located at the image of the object point.
Example 4.5 Imaging with a thin lens

Look at figure 4.8. There we show imaging of an upward arrow using
a thin positive lens.
The lateral magnification of the arrow is given by M = yi/yo =
−Si/So, as is seen from the triangles △OQyo and △PQyi.
Let’s assume now the arrow is positioned at So = 30 cm and the focal
length of the lens is f = 20 cm. Where will the image be formed? What
is the magnification? By using the imaging formula, we obtain Si = +60
cm, accordingly the image is real. The magnification is M = −Si /So =
−2, hence the imaged arrow is flipped and magnified by a factor of two.
7 This is convenient for thin lenses. The proper prescription for thick lenses will be
given later.
8 Of course, we can do with just two of those rays, but it is nice to use also the
third one as a sanity check. Do that with pencil and ruler in the next figures!
Figure 4.8 Imaging with a thin positive lens. A thin lens has no
extension and may be considered as a line. This is consistent with
the ray yo Qyi through the center not deviating when traversing the
lens.
Magnification of a lens:
\[
M = -\frac{S_i}{S_o} \tag{4.18}
\]
Problem 4.6 What is the magnification if So = f ? What is it if So →

∞, as typically in a telescope? In those cases when we talk about the
magnification of an optical system we will not use M but rather use the
angular magnification, linking observation angles to image dimensions or
object dimensions to image angles. Try to construct a sensible definition
of angular magnification!
Example 4.7 Virtual image formed by a positive lens

In figure 4.9 we show the case where So < f. In this case a virtual image
is formed. For example, we take f = 30 cm and So = 10 cm. From the
imaging formula, we find Si = −15 cm; consequently the image is formed
on the same side as the object. The magnification is calculated as
M = −Si/So = +1.5, so the image is not flipped and is magnified by a
factor 1.5. This is the way you use a positive lens as a magnifying glass.
Example 4.8 Image formed by a negative lens

In figure 4.9 we also show imaging with a negative lens. For f = −30
cm and So = 20 cm, we find Si = −12 cm. The image is formed on
the same side as the object. The magnification is calculated as M =
−Si/So = +0.6. The image is not flipped and is reduced in size by a factor 0.6.
Figure 4.9 Virtual images formed by a positive and negative lens.
You can use a negative lens to increase your field of view.
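The numbers in examples 4.5, 4.7 and 4.8 can be reproduced with the imaging formula (4.17) and the magnification (4.18). The following is a minimal sketch; the function name is hypothetical, and all distances are in centimeters as in the examples.

```python
import math

def thin_lens_image(s_o, f):
    """Return (S_i, M) from the imaging formula (4.17) and equation (4.18)."""
    inv_s_i = 1.0 / f - 1.0 / s_o
    # A vanishing 1/S_i means the image lies at infinity.
    s_i = math.inf if inv_s_i == 0 else 1.0 / inv_s_i
    return s_i, -s_i / s_o

cases = (
    ("example 4.5, positive lens, real image   ", 30.0, 20.0),
    ("example 4.7, positive lens, virtual image", 10.0, 30.0),
    ("example 4.8, negative lens, virtual image", 20.0, -30.0),
)
for label, s_o, f in cases:
    s_i, m = thin_lens_image(s_o, f)
    print(label, " S_i =", round(s_i, 6), " M =", round(m, 6))
```

A positive Si marks a real image on the image side, a negative Si a virtual image on the object side, and the sign of M tells whether the image is flipped.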
In table 4.19 we summarize sign conventions for spherical surfaces (see

the figure 4.10).
Sign convention for spherical surfaces: (4.19)
So , fo positive if located left of V
Si , fi positive if located right of V
R positive when C is right of V
xo positive when left of object focal point
xi positive when right of image focal point
Here xo and xi refer to the quantities xo = So − f and xi = Si − f,
i.e. the object and image distances measured from the respective focal points.
Using these, we can obtain Newton’s image formula:
\[
x_o \cdot x_i = f^2
\]
This is a very general formulation that also applies to thick lenses.

Figure 4.10 Sign convention for curved surfaces and lenses. When the
center of curvature C is to the right of the reference vertex point
V, we associate a positive radius of curvature R > 0.
Problem 4.9 Demonstrate Newton’s image formula from above. Begin

with making a good drawing. Check that the sign conventions make good
sense here!
4.1.3 Common spherical lens errors

We have seen already that lenses with spherical surfaces do not provide
perfect imaging. We now discuss the deviations from ideality and what
we can do about them. The two most common lens errors are chromatic
aberration and spherical aberration. Chromatic aberration occurs
because the index of refraction changes with the wavelength. For the
common case of normal dispersion this means blue light “bends” more
than red light, as sketched in Fig. 4.11. For a given lens, the focal length
for blue light will be shorter than the focal length for red light, leading to
a colored blur in images. This lens error can be mitigated by using
an achromatic doublet (or achromat), in which two lenses of materials
with differing dispersion (dn(λ)/dλ) are cemented together to form one
optical element. This reduces the amount of chromatic aberration over
a certain range of wavelengths, typically some 100 nm; consequently, it
does not produce perfect correction over the entire visible range. For
more complex optical systems, which are a combination of two or more
lenses, such as an eyepiece of a microscope, it is also possible to minimize
chromatic aberrations by clever choice of distances. We will show this in
the next section using matrix methods.
Figure 4.11 Illustration of chromatic and spherical aberration.
Spherical aberration occurs because the spherical shape is not ideal for
imaging. We abandoned the aspheric surfaces as they were generally too
expensive to produce. A convex spherical lens bends rays which enter
far from its center more strongly compared to the ideal hyperbolic lens.
This means that parallel incident light rays far away from the optical
axis will come to a focus closer to the lens than rays entering close to
the center, leading again to a blurred focus. This is the price to be paid
for relying on the paraxial approximation. To minimize spherical aberration, we
must restrict ourselves to using the lens area close to the center and
compromise between image sharpness and image brightness. One has
to invest in an aspherical or more exotic lens⁹ if the application is very
critical.
Problem 4.10 Would you expect the largest spherical aberration to

be found in a lens with small f or large f ?
9 Modern production techniques also allow the production of lenses with a gradient
of the refractive index both in the radial and axial directions, which gives more
design freedom to optimize the performance of a lens.
Example 4.11 Beyond the paraxial approximation

While deriving equation (4.9) from equation (4.8), we assumed all of the
angles θ and φ shown in Fig. 4.5 to be small:
\[
\sin\theta \approx \theta - \frac{\theta^3}{3!} + \frac{\theta^5}{5!} + \cdots,
\]
and we kept only the first order term. This is the essence of the paraxial
approximation. When we perform this calculation for the next non-vanishing
order, i.e., the third order, we get:
\[
\frac{n_0}{S_o} + \frac{n_1}{S_i} = \frac{n_1 - n_0}{R}
 + h^2\left[\frac{n_0}{2S_o}\left(\frac{1}{S_o} + \frac{1}{R}\right)^2 + \frac{n_1}{2S_i}\left(\frac{1}{R} - \frac{1}{S_i}\right)^2\right],
\]
where h is the ray height above the optical axis at the point where the ray enters
the lens. Compared to the paraxial expression in eq. (4.9) we have an
additional term on the right hand side, showing the leading correction
to our first order calculation.
For a converging lens, rays that are closer to the edge (which have
larger h) bend more compared to those which are closer to the center
of the lens (which have smaller h). Therefore the edge rays will come to
a focus closer to the lens. For the special case of parallel beams entering
a biconvex lens, it can be shown, based on the third-order equation above, that the
focal length f(h) changes with ray height h as:
\[
\frac{1}{f(h)} - \frac{1}{f(0)} = \frac{1}{S_h} - \frac{1}{S_i} = A\cdot h^2, \tag{4.20}
\]
where Sh is the image distance for a ray entering at height h and Si is
the paraxial image distance. The constant A is given by:
\[
A = \frac{1}{f(0)^3}\left(\frac{3n + 2}{8n} + \frac{n^2}{8(n - 1)^2}\right), \tag{4.21}
\]
where f(0) is the usual paraxial focal length, i.e., the focal length of the
lens in the limit h → 0. This correction describes the leading order of
the spherical aberration and it occurs already for monochromatic beams.
When optical aberrations such as the spherical aberration distort even
monochromatic light, the effect is called monochromatic aberration.
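The shift of the focus with ray height is easy to see by tracing exact rays instead of expanding in h. The sketch below treats the simplest case, a single spherical surface (not the biconvex lens of equation (4.20)) hit by a ray parallel to the optical axis, and applies Snell's law without any small-angle approximation. The function name and the values R = 1, n0 = 1, n1 = 1.5 are illustrative assumptions.

```python
import math

def exact_focus(h, radius, n0=1.0, n1=1.5):
    """Axis crossing, measured from the vertex, of a ray that enters parallel
    to the optical axis at height h and is refracted once at the surface."""
    theta1 = math.asin(h / radius)                    # angle of incidence
    theta2 = math.asin(n0 / n1 * math.sin(theta1))    # Snell's law, exact
    x_surface = radius - math.sqrt(radius ** 2 - h ** 2)  # where the ray hits
    # After refraction the ray descends at angle (theta1 - theta2) to the axis.
    return x_surface + h / math.tan(theta1 - theta2)

R = 1.0
paraxial_focus = 1.5 * R / (1.5 - 1.0)   # image side focal length f_i = 3R
for h in (0.001, 0.1, 0.2, 0.3):
    print(h, exact_focus(h, R), paraxial_focus)
```

For small h the exact crossing point approaches fi = 3R, and it moves towards the surface roughly in proportion to h² as the ray height grows, which is the behavior captured by the third-order term.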
Example 4.12 Lenses and their performance
Plano-Convex lenses are a good choice for focusing parallel rays of
light to a single point. These lenses can be used to focus, collect and
collimate light. In order to minimize the spherical aberration when using a
single lens, one must limit the actively used area of the lens to one third of
its radius. The optimum case occurs when the object is placed at infinity
(parallel rays entering the lens) and the final image is a focus point.
Although an infinite conjugate ratio (object distance/image distance) is the
optimum, plano-convex lenses will still minimize spherical aberrations
up to approximately 5:1 conjugate ratio. For optimum performance, the
curved surface must face the largest object distance or the infinite
conjugate, which reduces the spherical aberration.
Figure 4.12 Most common lens types.
Bi-Convex lenses are the best choice when the object and image are
at equal or approximately equal distances from the lens. When the object
and image distances are equal (1:1 magnification), not only the spherical
aberration is minimized, but also the coma, distortion10 , and chromatic
aberration are identically canceled due to symmetry. As a guideline,
bi-convex lenses perform with minimal aberrations at conjugate ratios
between 5:1 and 1:5. Outside this magnification range, plano-convex
lenses are usually more suitable than the bi-convex lenses.
Plano-Concave lenses are the best choice where the object and
the image are at absolute conjugate ratios greater than 5:1 or less
than 1:5, in order to reduce spherical aberration, coma, and distortion.
Plano-Concave lenses bend parallel incoming rays and the outgoing rays
diverge, therefore, they have a negative focal length. The spherical aber-
ration of a Plano-Concave lens is negative and can be used to balance
aberrations created by other lenses. Similar to the Plano-Convex lenses,
the curved surface should be placed with the object at the farthest dis-
tance or the infinite conjugate11 to minimize spherical aberration.
Bi-Concave lenses are the best choice when object and image are
at absolute conjugate ratios closer to 1:1 with a converging input beam.
The output rays appear to diverge from a virtual image, which is
located on the object side of the lens; the distance from this virtual point
to the lens is known as the focal length. Similar to the Plano-Concave
lenses, the Bi-Concave lenses also have negative focal lengths and thus
cause collimated incident light to diverge. Bi-Concave lenses also
have equal radii of curvature on both sides of the lens. They are
generally used to expand the light or increase the focal length in existing
systems, such as beam expanders and projection systems.
10 We do not discuss those lens aberrations further in these notes. For a good
discussion with illustrations see Ch.6.3 in Hecht "Optics".
11 Except when used with high-energy lasers, where this should be reversed to
eliminate the possibility of a virtual focus, which might be harmful.
Positive Meniscus lenses are designed to minimize spherical aberration
and are generally used in small f/number applications (f/number
less than 2.5). The Positive Meniscus lenses have a smaller radius of
curvature on the convex side and a larger radius of curvature on the
concave side. They are thicker at the center than at the edges. When
paired with another lens, a positive meniscus lens shortens the focal
length and increases the numerical aperture of the optical system while
maintaining its angular resolution, resulting in a tighter focal spot.
For the best performance, the curved surface should face the largest
object distance or the infinite conjugate in order to reduce the
spherical aberration.
Negative Meniscus lenses are designed as an alternative to other
negative lenses. A negative meniscus lens can increase the divergence of
a beam without introducing additional spherical aberration, making it a
good choice for beam-expanding applications. The Negative Meniscus
lenses can be used to increase the focal length of another lens while
maintaining the same angular resolution of the optical system. The
Negative Meniscus lenses have a larger radius of curvature on the convex
side and a smaller radius of curvature on the concave side. They are
thinner at the center than at the edges.
Achromats (Achromatic Doublets) consist of a positive low-index
crown glass element (low dispersion, high Abbe number) cemented
to a negative high-index flint glass element (high dispersion, low Abbe
number). The elements are chosen to cancel chromatic aberration
at two well-separated wavelengths, usually in the blue and the red region
of the spectrum. Achromats thus bring two wavelengths into focus in the
same image plane, so that shifts of the focal length are virtually
eliminated across a considerable range of visible wavelengths. Usually,
these lenses are computer-optimized by the manufacturer to minimize
spherical aberration and coma when operating at an infinite conjugate
ratio. Unlike for singlet lenses, this results in a focal length that is
nearly independent of aperture and in far better off-axis performance.
Freedom from spherical aberration and coma means that achromats are
superior to singlet lenses even for monochromatic applications at any
visible wavelength.
4.2 Ray tracing

Ray tracing is a very powerful tool for analyzing an optical system both
analytically and with the help of computers. In its simplest incarnation
we assign to each optical element a 2 × 2 matrix which accounts for
its behavior and characteristics. One limitation of this method is that
all optical elements must be perfectly aligned with the optical axis. We
cannot account for tilts and rotations. Other methods based on a 3 × 3
and 5 × 5 matrix formalism can account for these perturbations. These
we will not consider here. It is also important to notice that we can only
treat linear systems, i.e. optical elements which transform ray angles and
ray distances from the optical axis linearly. The standard rules of matrix
algebra then allow us to compose optical systems out of simple elements.
Referring to figure 4.13, we count angles corresponding to a positive slope,
such as α0, as positive, while those associated with negative slopes we
count as negative.
For a given optical element we will look for the position of a ray
to begin with and its horizontal angle (y0 , α0 ). The task is to specify
the ray after the interaction with the optical element12 , i.e. to predict
(y1 , α1 ) given (y0 , α0 ). First, we will look at the simplest optical element,
namely the free space propagation through a distance L. According to
the geometry defined in figure 4.13 we have:
\[
\alpha_1 = \alpha_0 \quad\text{and}\quad y_1 = y_0 + L\tan(\alpha_0) \simeq y_0 + L\alpha_0 ,
\]
as we adopt the paraxial approximation13 . Put in matrix form we have:

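\[
\begin{pmatrix} y_1 \\ \alpha_1 \end{pmatrix}
= \begin{pmatrix} 1 & L \\ 0 & 1 \end{pmatrix}
\begin{pmatrix} y_0 \\ \alpha_0 \end{pmatrix} .
\]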
12 There is no law that says, we have to represent a ray by a column vector (y, α).
Different authors use different representations. For instance, in Hecht’s “Optics”
column vectors (α, y) are used. Be careful not to mix element matrices based on
different conventions!
13 Keep in mind that you have to measure angles in radians for the approximation
to make sense!
Figure 4.13 Principle of ray tracing. Input at the starting point

(x0 , y0 ) allows us to propagate the light to the next point (x0 +L, y1 ).
Input consists of a y-coordinate and an angle α0 . At the final point
the output is a y-coordinate and an angle α1 . It is the job of ray
tracing to specify (y1 , α1 ) given (y0 , α0 ).
The matrix M is called the transfer matrix:

\[
M = \begin{pmatrix} 1 & L \\ 0 & 1 \end{pmatrix} .
\]
Example 4.13 Propagation through a distance L1 + L2

Just to get the idea, let us consider propagation through a distance L1
followed by a distance L2. We know that this must be equivalent to
propagation through L = L1 + L2:

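\[
M_{tot} = \begin{pmatrix} 1 & L_2 \\ 0 & 1 \end{pmatrix}
\begin{pmatrix} 1 & L_1 \\ 0 & 1 \end{pmatrix}
= \begin{pmatrix} 1 & L_1 + L_2 \\ 0 & 1 \end{pmatrix} \tag{4.22}
\]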
Example 4.14 Propagation through (n0 , n1 ) interface
Looking at the figure 4.14, we are looking for the matrix M for the
interface n0 , n1 . Notice that, here: y0 = y1 , since we are right at the
interface. For the angles we apply Snell’s law:
n0 sin(α0 ) = n1 sin(α1 ) or n0 α0 = n1 α1 ,
again applying the paraxial approximation. We finally obtain:

\[
M = \begin{pmatrix} 1 & 0 \\ 0 & \frac{n_0}{n_1} \end{pmatrix} .
\]
Figure 4.14 Ray tracing at a dielectric interface (n0, n1). We adopt
the paraxial approximation such that sin(α) ≃ α.
Example 4.15 Slab of dielectric material

We now consider a material of index n1 and length d surrounded on both
sides by material with index n0 . As before, we try to find the transfer
matrix. Remembering the procedure used in the previous two examples
we can compose this as:

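\[
M = \begin{pmatrix} 1 & 0 \\ 0 & \frac{n_1}{n_0} \end{pmatrix}
\begin{pmatrix} 1 & d \\ 0 & 1 \end{pmatrix}
\begin{pmatrix} 1 & 0 \\ 0 & \frac{n_0}{n_1} \end{pmatrix}
= \begin{pmatrix} 1 & d\,\frac{n_0}{n_1} \\ 0 & 1 \end{pmatrix} . \tag{4.23}
\]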
Notice the order: the first element the ray meets, M1, is the rightmost
matrix in the product, and the last element is the leftmost, so
M = M3 · M2 · M1.
In conclusion, we have:
Transfer matrix for N elements: (4.24)
Mtot = MN · . . . M2 · M1 .
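The composition rule (4.24) is easy to try out numerically. The following is a minimal sketch in plain Python; the helper names (matmul, propagation, interface, system, trace) and the numerical values are our own illustrative choices. It reproduces Example 4.13 and Example 4.15.

```python
from functools import reduce


def matmul(a, b):
    """Return the product a*b of two 2x2 matrices stored as nested tuples."""
    return (
        (a[0][0] * b[0][0] + a[0][1] * b[1][0], a[0][0] * b[0][1] + a[0][1] * b[1][1]),
        (a[1][0] * b[0][0] + a[1][1] * b[1][0], a[1][0] * b[0][1] + a[1][1] * b[1][1]),
    )


def propagation(length):
    """Free-space propagation through a distance `length`."""
    return ((1.0, length), (0.0, 1.0))


def interface(n_in, n_out):
    """Planar interface from index n_in to index n_out (paraxial Snell's law)."""
    return ((1.0, 0.0), (0.0, n_in / n_out))


def system(*elements):
    """Total matrix M_N ... M_2 M_1; pass elements in the order the ray meets them."""
    return reduce(lambda total, m: matmul(m, total), elements)


def trace(matrix, y, alpha):
    """Apply a transfer matrix to the ray (y, alpha)."""
    return (matrix[0][0] * y + matrix[0][1] * alpha,
            matrix[1][0] * y + matrix[1][1] * alpha)


# Example 4.13: two propagation steps add up to L1 + L2.
two_steps = system(propagation(0.10), propagation(0.25))

# Example 4.15: slab of index n1 and thickness d in a medium of index n0
# (assumed values, lengths in metres).
n0, n1, d = 1.0, 1.5, 0.03
slab = system(interface(n0, n1), propagation(d), interface(n1, n0))

print(two_steps)
print(slab)
print(trace(slab, 0.0, 0.01))  # ray entering on the axis with a small angle
```

Note that `system` multiplies each new element from the left, which is exactly the right-to-left ordering of (4.24).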
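Transfer matrix for various elements: (4.25)

Refraction at a spherical surface R, n1:
\[
M = \begin{pmatrix} 1 & 0 \\ \frac{n_0 - n_1}{n_1 R} & \frac{n_0}{n_1} \end{pmatrix}
\]
Refraction at n0, n1 interface:
\[
M = \begin{pmatrix} 1 & 0 \\ 0 & \frac{n_0}{n_1} \end{pmatrix}
\]
Thin lens R1, R2, material n1:
\[
M = \begin{pmatrix} 1 & 0 \\ -\frac{1}{f} & 1 \end{pmatrix}
\]
Propagation through length L:
\[
M = \begin{pmatrix} 1 & L \\ 0 & 1 \end{pmatrix}
\]
with
\[
\frac{1}{f} = \frac{n_1 - n_0}{n_0}\left(\frac{1}{R_1} - \frac{1}{R_2}\right) .
\]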
Problem 4.16 Given a transfer matrix for an optical element (lens):

\[
M = \begin{pmatrix} A & B \\ C & D \end{pmatrix} ,
\]
show that C = −1/f, where f is the focal length. Hint: inject a ray with a
given y0 and α0 = 0, i.e., a beam parallel to the optical axis.
Example 4.17 The refraction matrix for a spherical surface
Consider a spherical surface of radius R as shown in fig.4.15 where

the surrounding medium has index n1 and refracting material has index
n2 . We construct the transfer matrix by considering refraction of a ray
at point O in the paraxial approximation where:

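\[
\begin{pmatrix} y_2 \\ \alpha_2 \end{pmatrix}
= \begin{pmatrix} A & B \\ C & D \end{pmatrix}
\begin{pmatrix} y_1 \\ \alpha_1 \end{pmatrix} \tag{4.26}
\]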
and obviously we have y1 = y2 = y. In the following, we search for the

expressions of A, B, C and D.
Figure 4.15 Refraction at a spherical surface.
Let us start by looking at the involved angles α1 , α2 and use Snell’s

law. In the small angle approximation, the acute angle between the line
CO and the optical axis is y/R, so we can write
\[
\alpha_1 = \theta_1 - \frac{y}{R} , \tag{4.27}
\]
and
\[
\alpha_2 = \theta_2 - \frac{y}{R} , \tag{4.28}
\]
(see fig.4.15). Now we apply Snell’s law n1 sin(θ1 ) = n2 sin(θ2 ), which in
the paraxial approximation becomes n1 θ1 = n2 θ2 and we conclude that:
\[
\alpha_2 = \frac{n_1}{n_2}\left(\alpha_1 + \frac{y}{R}\right) - \frac{y}{R} \tag{4.29}
\]
Now we can form the refraction matrix. Since y1 = y2 = y, we have A = 1
and B = 0. For the elements C and D we use equation (4.29) above:
\[
M = \begin{pmatrix} 1 & 0 \\ \frac{1}{R}\left(\frac{n_1}{n_2} - 1\right) & \frac{n_1}{n_2} \end{pmatrix} \tag{4.30}
\]
According to the usual sign convention, R is counted positive when C is
to the right of V, see figure 4.15. It is healthy to sanity check the
expression by looking at what happens in the limit R → ∞. In this limit we
simply obtain the refraction at a planar dielectric interface, observing
that element C → 0 as it should. Another limit to consider is the case
n1 = n2. Here the refraction matrix becomes the unit matrix, which also is
in agreement with our expectations.

Figure 4.16 A general lens is exemplified here by a meniscus lens.
Notice that using the ray tracing method we could even allow a third
index n3 to the right of the lens. With conventional methods this is
rather tedious to calculate.
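The two limits of the refraction matrix (4.30) can also be checked numerically. The sketch below uses an assumed helper `spherical_refraction` and assumed values (air to glass, R = 5 cm); it additionally traces a ray parallel to the axis to see where it crosses the axis after a single curved surface.

```python
def spherical_refraction(n1, n2, R):
    """Refraction matrix of eq. (4.30) for a surface of radius R between n1 and n2."""
    return ((1.0, 0.0), ((n1 / n2 - 1.0) / R, n1 / n2))


# Limit R -> infinity: the C element vanishes (planar interface).
flat = spherical_refraction(1.0, 1.5, float("inf"))

# Limit n1 = n2: nothing happens, so we expect the unit matrix.
no_contrast = spherical_refraction(1.3, 1.3, 0.05)

# A ray parallel to the axis at height y leaves with angle C*y and therefore
# crosses the axis a distance -y / (C*y) = -1/C behind the surface.
curved = spherical_refraction(1.0, 1.5, 0.05)
crossing_distance = -1.0 / curved[1][0]

print(flat)
print(no_contrast)
print(crossing_distance)
```

For these values the crossing distance equals n2 R/(n2 − n1), the image-side focal length of a single refracting surface.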
Example 4.18 Thin lens equation

To illustrate the power of the matrix formulation consider the compli-
cated lens depicted in figure 4.16. Using equation (4.25) we can write
the matrix as:

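\[
M = \begin{pmatrix} 1 & 0 \\ \frac{n_1 - n_0}{n_0 R_2} & \frac{n_1}{n_0} \end{pmatrix}
\begin{pmatrix} 1 & d \\ 0 & 1 \end{pmatrix}
\begin{pmatrix} 1 & 0 \\ \frac{n_0 - n_1}{n_1 R_1} & \frac{n_0}{n_1} \end{pmatrix} \tag{4.31}
\]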
If we study the limit d → 0 we simply obtain:
\[
M = \begin{pmatrix} 1 & 0 \\ \frac{n_0 - n_1}{n_0}\left(\frac{1}{R_1} - \frac{1}{R_2}\right) & 1 \end{pmatrix} ,
\]
which returns the thin lens equation in the paraxial approximation.
4.2.1 Thick lenses

Generally, we cannot ignore the thickness of a lens. Applying the
matrix multiplication shown in equation (4.31) gives us the coefficients

\[
M = \begin{pmatrix} A & B \\ C & D \end{pmatrix} ,
\]
with

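\[
\begin{aligned}
A &= 1 + \frac{(n_0 - n_1)\,d}{n_1 R_1} && (4.32)\\
B &= \frac{n_0}{n_1}\, d && (4.33)\\
C &= \frac{n_1 - n_0}{n_0 R_2}\left(1 + \frac{(n_0 - n_1)\,d}{n_1 R_1}\right) + \frac{n_0 - n_1}{n_0 R_1} && (4.34)\\
D &= 1 - \frac{(n_0 - n_1)\,d}{n_1 R_2} && (4.35)
\end{aligned}
\]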
Using the C element we can formulate the thick lens equation:
Thick lens equation: (4.36)
\[
\frac{1}{f} = \frac{n_1 - n_0}{n_0}\left[\frac{1}{R_1} - \frac{1}{R_2} + \frac{(n_1 - n_0)\, d}{n_1 R_1 R_2}\right]
\]
To avoid effects of the thick lens one often uses a plano-concave or

plano-convex lens, i.e., a lens where either R1 or R2 is infinite. In that
case we are left with the thin lens equation again.
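As a numerical cross-check, we can multiply out the three matrices of (4.31) and compare the element C with the thick lens equation (4.36). The sketch below is self-contained; the function names and the biconvex example lens (n1 = 1.5 in air, R1 = −R2 = 10 cm, d = 1 cm) are assumptions made for illustration.

```python
def matmul(a, b):
    """Return the product a*b of two 2x2 matrices stored as nested tuples."""
    return (
        (a[0][0] * b[0][0] + a[0][1] * b[1][0], a[0][0] * b[0][1] + a[0][1] * b[1][1]),
        (a[1][0] * b[0][0] + a[1][1] * b[1][0], a[1][0] * b[0][1] + a[1][1] * b[1][1]),
    )


def thick_lens(n0, n1, R1, R2, d):
    """Transfer matrix of eq. (4.31): surface R1, thickness d, surface R2."""
    first = ((1.0, 0.0), ((n0 - n1) / (n1 * R1), n0 / n1))
    inside = ((1.0, d), (0.0, 1.0))
    second = ((1.0, 0.0), ((n1 - n0) / (n0 * R2), n1 / n0))
    return matmul(second, matmul(inside, first))


def inverse_focal_length(n0, n1, R1, R2, d):
    """Right-hand side of the thick lens equation (4.36)."""
    return (n1 - n0) / n0 * (1 / R1 - 1 / R2 + (n1 - n0) * d / (n1 * R1 * R2))


# Assumed example lens, lengths in metres.
n0, n1, R1, R2, d = 1.0, 1.5, 0.10, -0.10, 0.01
M = thick_lens(n0, n1, R1, R2, d)

f_thick = 1.0 / inverse_focal_length(n0, n1, R1, R2, d)
f_thin = 1.0 / inverse_focal_length(n0, n1, R1, R2, 0.0)

print("C element      :", M[1][0])
print("-1/f from 4.36 :", -1.0 / f_thick)
print("f thick, f thin:", f_thick, f_thin)
```

Since the lens is surrounded by the same medium on both sides, the determinant AD − BC of the total matrix should come out as one, which is a useful check on any hand calculation.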
Here, an important question arises. Which reference point should be
used to measure the focal length of a thick lens? The geometrical center,
the spherical surface vertex or some other reference point? In figure 4.17
we show a thick lens. In case A a beam from the focal point is refracted
through the lens. If incoming and outgoing rays are extended as virtual
rays inside the lens, they meet in a point called a principal point. In
general, principal points for different ray angles from the object side
focal point can form a curved surface. In the paraxial approximation
these points for any ray pair lie on a plane PP1. The intersection of this
plane with the optical axis is called the principal point H1. Using rays
from the object and from image side focal points, we can construct the
two principal planes PP1, PP2 and corresponding principal points H1
and H2. These points ideally serve as reference points to define the focal
length of the lens.

Figure 4.17 Principal planes and nodal planes of a thick lens. The focal
distance is referred to the principal planes. The nodal planes coincide
with the principal planes when the surrounding media on the left and
right side of the lens have identical indices of refraction.

Using the A and B elements of M we can deduce the location of the
principal points as:

\[
|V_1 H_1| = -\frac{f\,(n_1 - 1)\,d}{R_2\, n_1} \tag{4.37}
\]
\[
|V_2 H_2| = -\frac{f\,(n_1 - 1)\,d}{R_1\, n_1} . \tag{4.38}
\]
So, the focal lengths of a thick lens are referenced to the two
principal planes. This also makes sure the focal length of a lens is
unique, when immersed in a single substance.
From the thin lens we are used to another special ray, the central ray,
which travels from an off-axis object point without angular deflection
and without displacement through the center of the lens. To find the
analogous ray for the case of a thick lens, we look for a ray which travels
from the object point without angular deflection through the lens, but
we have to allow for a parallel displacement on the image side. Extending
the object and image side rays as virtual rays inside the lens, we find
the intersection points of the virtual rays with the optical axis. Those
points are the nodal points N1 and N2, with corresponding nodal planes.
For the simple case of a thick lens surrounded by the same medium on
the object side and the image side, the principal points and the nodal
points are at the same location. In more general settings they need not
coincide. Knowing the location of principal and nodal points for a lens
can simplify the task of ray tracing considerably, which is why they are
typically determined early when designing a lens.

Figure 4.18 Schematic drawing showing the location of the principal
planes for different lens types.
4.2.2 Combination of two thin lenses

Imagine two thin lenses with focal length f1 and f2 separated by a
distance L. You can show, using the matrix method, that the focal length
of the combined lens system is given by:
Combination of two thin lenses f1 and f2 : (4.39)
\[
\frac{1}{f} = \frac{1}{f_1} + \frac{1}{f_2} - \frac{L}{f_1 \cdot f_2} ,
\]
which is an important result. Double lens systems can cancel the first
order chromatic effects. Consider the combination of two lenses f1 and
f2 described above. We can write their lens equations as:
\[
\frac{1}{f_1} = (n - 1)\left(\frac{1}{R_{11}} - \frac{1}{R_{12}}\right) = K_1 (n - 1) \tag{4.40}
\]
\[
\frac{1}{f_2} = (n - 1)\left(\frac{1}{R_{21}} - \frac{1}{R_{22}}\right) = K_2 (n - 1) \tag{4.41}
\]
so the total focal length becomes:

\[
\frac{1}{f} = K_1 (n - 1) + K_2 (n - 1) - L \cdot K_1 \cdot K_2\, (n - 1)^2 .
\]
We now want the total focal length or equivalently 1/f to be insensitive
to variations in n :

\[
0 = \frac{d\left(\frac{1}{f}\right)}{dn} = K_1 + K_2 - 2L \cdot K_1 \cdot K_2\, (n - 1) .
\]
We solve for L and obtain:
\[
L = \frac{1}{2}(f_1 + f_2) ,
\]
which is so important in optical system design that it deserves its own
box:
Double lens system to minimize chromatic aberrations: (4.42)
\[
L = \frac{1}{2}(f_1 + f_2)
\]
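A quick numerical experiment makes the result (4.42) plausible. The sketch below evaluates 1/f for two thin lenses made of the same glass and estimates its derivative with respect to n by a central difference. The shape constants K1 and K2, the index n and the function names are assumed example values, not taken from these notes.

```python
def inverse_focal_length(n, K1, K2, L):
    """1/f of two thin lenses of the same glass separated by a distance L."""
    return K1 * (n - 1) + K2 * (n - 1) - L * K1 * K2 * (n - 1) ** 2


def sensitivity(n, K1, K2, L, dn=1e-4):
    """Central-difference estimate of d(1/f)/dn."""
    return (inverse_focal_length(n + dn, K1, K2, L)
            - inverse_focal_length(n - dn, K1, K2, L)) / (2 * dn)


# Assumed example: K1 = 20 1/m, K2 = 10 1/m and n = 1.5.
K1, K2, n = 20.0, 10.0, 1.5
f1 = 1.0 / (K1 * (n - 1))
f2 = 1.0 / (K2 * (n - 1))
L_achromatic = 0.5 * (f1 + f2)

print("lenses in contact  :", sensitivity(n, K1, K2, 0.0))
print("L = (f1 + f2) / 2  :", sensitivity(n, K1, K2, L_achromatic))
```

With the lenses in contact the sensitivity is simply K1 + K2, while at the separation (4.42) it vanishes to first order, as derived above.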
Would this only apply to thin lenses? No – you should check that we
can use thick lenses, as long as we have R1 = ∞ or R2 = ∞ i.e. one of
the lens surfaces is flat (plano).
5
Polarization of light
This chapter deals with the polarization state of light. We discuss the
different polarization states of light: linearly polarized, circularly polarized,
and elliptically polarized light. We give two mathematical toolboxes to
describe the state of polarization – one for strictly monochromatic light
and a second more general one applicable also to incoherent mixtures of
light waves. Of course, we also want to know how we can change and
control the polarization state of light in practice. This will lead us from a
discussion of birefringent materials to optical devices, that can perform
almost any desired preparation and transformation of light polarization.
We know that electromagnetic waves are transverse, i.e. k · E = 0 at
least in free space. From this we can already conclude that there are just
two independent (orthogonal) polarizations. This makes the polarization
degree of freedom resemble a bit the two possible spin projections of an
electron. In fact, you will find in your quantum mechanics courses a few
mathematical tools that are just “borrowed”1 from optics. The analogy
is, however, not complete. In the quantum mechanical description of
light, the photon as a quantum of light is a spin one (S = 1) particle
(boson) with zero rest mass2 , which has two possible projections of its
spin onto the propagation direction. You certainly have heard about
research aimed at developing quantum computers based on photonic
qubits; one possible encoding of qubits relies on the two polarization
states of single photons.
1 ...and renamed, to cover up the theft...

2 We cannot define a center of mass frame for the photon, which is the deeper
reason why there are only two instead of three independent polarization states.
Figure 5.1 Linear (A) and elliptical (B) polarized states of light. For
the special case E1 = E2 and δ = π/2 we talk about circularly
polarized light.
5.1 Polarization states of monochromatic waves

The polarization state of light is defined with respect to the electric field
vector E. Any monochromatic E-field (plane waves) can be written as
the superposition of two orthogonal plane waves:
E(r, t) = E1 cos(k · r − ωt) + E2 cos(k · r − ωt + δ)
where E1 · E2 = 0 and δ is a constant phase. Without loss of generality
we can look at the problem for a wave propagating along the z-axis say:
E(r, t) = E1 cos(kz − ωt)x̂ + E2 cos(kz − ωt + δ)ŷ
To completely specify the direction of the E-field, that is to say, the
polarization, we apparently need to know three quantities: E1, E2 and
the phase δ. For exactly this reason the Stokes vector, which is
introduced later in this chapter and characterizes the polarization state
of light completely, requires three independent measurements to be
carried out.
Let us look at two important polarization states. The first one where
δ = 0 we call linearly polarized light as we have dealt with before.
Here the tip of the total E-field vector traces out a line, going from zero to
the maximal amplitude $E_{tot} = \sqrt{E_1^2 + E_2^2}$, see figure 5.1 part (A).
For δ ≠ 0 (and δ ≠ π) we have elliptically polarized light, as shown in figure
5.1 part (B). Here the E-field traces out an ellipse. The total E-field
never becomes zero, but each cartesian component in any given direction
does. When E1 = E2 and δ = π/2 the ellipse becomes a circle and we
talk about circularly polarized light. In the quantum description of
photons these polarization states (going clockwise and anticlockwise)
are associated with the two possible ±h̄ angular momentum projections
onto the z-axis.
Remark By construction monochromatic light is always polarized. This
happens as the amplitude, frequency, and phase δ are constants. If, say
the phase fluctuated in time, then at one instant the E-vector is pointing
in some particular direction the next moment in another direction. If the
phase fluctuation behaves as random noise and all phases in the interval
0-2π are equally probable, we cannot talk about polarized light. In fact,
this case corresponds to randomly polarized light, sometimes also
called unpolarized light. On the other hand, if a light beam is polarized
we cannot conclude that it is monochromatic. Take the following E-field
as a counter example:
E(r, t) = E1 cos(kz − ωt + sin(Ωt))x̂ + E2 cos(kz − ωt + sin(Ωt))ŷ.
The field is clearly not monochromatic, yet it is perfectly polarized.

Generally, for given amplitudes and phase δ, we can express equation
(5.1) as a rotated ellipse, see figure 5.2. This can be seen by
\[
E_y = E_2\cos(kz - \omega t + \delta) = E_2\cos(kz - \omega t)\cos(\delta) - E_2\sin(kz - \omega t)\sin(\delta)
\]
so
\[
\frac{E_y}{E_2} - \frac{E_x}{E_1}\cos(\delta) = -\sin(kz - \omega t)\sin(\delta) .
\]
But we can rewrite the last sine term in terms of the cosine term for Ex .
Then
\[
\left(\frac{E_y}{E_2} - \frac{E_x}{E_1}\cos(\delta)\right)^2 = \left(1 - \left(\frac{E_x}{E_1}\right)^2\right)\sin^2(\delta) .
\]
On collecting terms we get

\[
\left(\frac{E_y}{E_2}\right)^2 + \left(\frac{E_x}{E_1}\right)^2 - 2\,\frac{E_y}{E_2}\frac{E_x}{E_1}\cos(\delta) = \sin^2(\delta) ,
\]
which is the equation of a tilted ellipse with the major axis oriented at
an angle α to the x-axis:
\[
\tan(2\alpha) = \frac{2E_1 E_2\cos(\delta)}{E_1^2 - E_2^2} .
\]
Figure 5.2 Tilted ellipse with the major axis oriented at an angle α
to the x-axis.
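Polarization Ellipse: (5.1)
\[
\left(\frac{E_y}{E_2}\right)^2 + \left(\frac{E_x}{E_1}\right)^2 - 2\,\frac{E_y}{E_2}\frac{E_x}{E_1}\cos(\delta) = \sin^2(\delta) ,
\]
with
\[
\tan(2\alpha) = \frac{2E_1 E_2\cos(\delta)}{E_1^2 - E_2^2} , \qquad 0 \le \alpha \le \pi/2
\]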
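The ellipse equation can be tested numerically by sampling the field over one optical cycle. The sketch below is only an illustration: the amplitudes, the phase δ = π/3 and all function names are assumptions. It checks that every sampled field point satisfies (5.1) and compares the direction of the largest field with the tilt angle α. We use atan2 to select the branch of the arctangent; for the example values this branch coincides with the major axis.

```python
import math


def field(phase, E1, E2, delta):
    """Field components (Ex, Ey) at optical phase kz - wt = phase."""
    return E1 * math.cos(phase), E2 * math.cos(phase + delta)


def ellipse_residual(Ex, Ey, E1, E2, delta):
    """Left-hand side minus right-hand side of the polarization ellipse (5.1)."""
    lhs = (Ey / E2) ** 2 + (Ex / E1) ** 2 - 2 * (Ey / E2) * (Ex / E1) * math.cos(delta)
    return lhs - math.sin(delta) ** 2


def tilt_angle(E1, E2, delta):
    """Angle alpha from tan(2 alpha) = 2 E1 E2 cos(delta) / (E1^2 - E2^2)."""
    return 0.5 * math.atan2(2 * E1 * E2 * math.cos(delta), E1**2 - E2**2)


# Assumed example values.
E1, E2, delta = 2.0, 1.0, math.pi / 3
phases = [2 * math.pi * k / 3600 for k in range(3600)]
points = [field(p, E1, E2, delta) for p in phases]

worst = max(abs(ellipse_residual(x, y, E1, E2, delta)) for x, y in points)

# Direction of the largest field vector, folded into the interval [0, pi).
x_max, y_max = max(points, key=lambda p: p[0] ** 2 + p[1] ** 2)
major_axis = math.atan2(y_max, x_max) % math.pi

print("largest residual :", worst)
print("major axis (rad) :", major_axis)
print("alpha (rad)      :", tilt_angle(E1, E2, delta))
```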
Problem 5.1 Show that equation (5.1) defines a rotated ellipse in the
Ex , Ey frame, making an angle α with the Ex -axis.
5.1.1 Jones vectors

A convenient way of expressing states of fully polarized light (not par-
tially polarized light) is by using the so-called Jones vectors. Here, we
are not interested in the magnitude of the electric field only the relative
size and phase difference between Ex and Ey components. We generally
express the field as:
\[
E = \begin{pmatrix} E_x \\ E_y \end{pmatrix} = \begin{pmatrix} E_1 e^{i\varphi_x} \\ E_2 e^{i\varphi_y} \end{pmatrix}
\]
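A field polarized along the x-direction (horizontal), which we call E0,
becomes:
\[
E_0 = \begin{pmatrix} 1 \\ 0 \end{pmatrix}
\]
Similarly, a field polarized along the y-direction, which we call E90, is:
\[
E_{90} = \begin{pmatrix} 0 \\ 1 \end{pmatrix}
\]
For a field linearly polarized along the +45 degree direction we have:
\[
E_{45} = \frac{1}{\sqrt{2}}\begin{pmatrix} 1 \\ 1 \end{pmatrix} .
\]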
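A circularly polarized field, say with photons of angular momentum
projection +ħ, is:
\[
E_{+1} = \begin{pmatrix} E_1 e^{i\varphi_x} \\ E_1 e^{i(\varphi_x - \pi/2)} \end{pmatrix}
= E_1 e^{i\varphi_x}\begin{pmatrix} 1 \\ -i \end{pmatrix}
\;\rightarrow\; \frac{1}{\sqrt{2}}\begin{pmatrix} 1 \\ -i \end{pmatrix} .
\]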
The angular momentum of light is defined with respect to the k vector
(right hand rule). Our wave is propagating in the + z direction and the
y-component lags behind the x-component by a phase of π/2, so we must
have angular momentum +h̄ in this case.
Below, we give a summary of the polarization states in the Jones
notation.
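Polarized light Jones notation: (5.2)
\[
E_0 = \begin{pmatrix} 1 \\ 0 \end{pmatrix} , \qquad E_{90} = \begin{pmatrix} 0 \\ 1 \end{pmatrix}
\]
\[
E_{45} = \frac{1}{\sqrt{2}}\begin{pmatrix} 1 \\ 1 \end{pmatrix} , \qquad E_{135} = \frac{1}{\sqrt{2}}\begin{pmatrix} 1 \\ -1 \end{pmatrix}
\]
\[
E_{+1} = \frac{1}{\sqrt{2}}\begin{pmatrix} 1 \\ -i \end{pmatrix} , \qquad E_{-1} = \frac{1}{\sqrt{2}}\begin{pmatrix} 1 \\ i \end{pmatrix}
\]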
Notice the orthogonality of the polarization states within a row and
their normalization. Optical elements such as linear polarizers and
wave-plates3 (e.g. a λ/2-plate for rotating linear polarization and a
λ/4-plate to produce circularly polarized light) can be expressed in terms
of operators represented by 2 × 2 matrices acting by the standard rules of
(complex) matrix algebra on the Jones vector.
3 How those work we will discuss in detail in a later section. For the moment just
accept that wave-plates have a different optical path length for two orthogonal
linear polarization components.
Example 5.2 A linear polarizer oriented along the y-direction (90) is
given by:
\[
M_{90} = \begin{pmatrix} 0 & 0 \\ 0 & 1 \end{pmatrix}
\]
It will produce zero output for E0, but 1 for E90 as expected. For E45
the output is
\[
E_{out} = \frac{1}{\sqrt{2}}\begin{pmatrix} 0 & 0 \\ 0 & 1 \end{pmatrix} \cdot \begin{pmatrix} 1 \\ 1 \end{pmatrix}
= \frac{1}{\sqrt{2}}\begin{pmatrix} 0 \\ 1 \end{pmatrix} .
\]
Hence, only 50 % of the light is transmitted.
Problem 5.3 For an E+1 state, how much power is transmitted through an
M90 analyzer?
Problem 5.4 Show the orthogonality relations E0 · E90 = 0, E45 · E135 = 0
and E+1* · E−1 = 0.
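Optical elements Jones notation: (5.3)
\[
M_0 = \begin{pmatrix} 1 & 0 \\ 0 & 0 \end{pmatrix}
\]
\[
M_{90} = \begin{pmatrix} 0 & 0 \\ 0 & 1 \end{pmatrix}
\]
\[
M_{45} = \frac{1}{2}\begin{pmatrix} 1 & 1 \\ 1 & 1 \end{pmatrix}
\]
\[
M_{135} = \frac{1}{2}\begin{pmatrix} 1 & -1 \\ -1 & 1 \end{pmatrix}
\]
\[
M_{+\lambda/4} = e^{+i\pi/4}\begin{pmatrix} 1 & 0 \\ 0 & i \end{pmatrix}
\]
\[
M_{-\lambda/4} = e^{-i\pi/4}\begin{pmatrix} 1 & 0 \\ 0 & -i \end{pmatrix}
\]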
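The Jones calculus maps directly onto complex arithmetic. Below is a minimal sketch using only Python's built-in complex numbers; the helper names apply, intensity and overlap are our own. It repeats Example 5.2 and sends E45 through the quarter-wave plate M+λ/4 from box (5.3).

```python
import cmath
import math


def apply(M, E):
    """Apply a 2x2 Jones matrix M to a Jones vector E = (Ex, Ey)."""
    return (M[0][0] * E[0] + M[0][1] * E[1],
            M[1][0] * E[0] + M[1][1] * E[1])


def intensity(E):
    """Intensity |Ex|^2 + |Ey|^2 (constant prefactors omitted)."""
    return abs(E[0]) ** 2 + abs(E[1]) ** 2


def overlap(E_a, E_b):
    """|<E_a|E_b>|^2, with complex conjugation of the first vector."""
    inner = E_a[0].conjugate() * E_b[0] + E_a[1].conjugate() * E_b[1]
    return abs(inner) ** 2


s = 1.0 / math.sqrt(2.0)
E45 = (s, s)
E_minus = (s, 1j * s)  # E_{-1} of box (5.2)

M90 = ((0.0, 0.0), (0.0, 1.0))
phase = cmath.exp(1j * math.pi / 4)
M_quarter = ((phase, 0.0), (0.0, phase * 1j))  # M_{+lambda/4} of box (5.3)

# Example 5.2: half of the intensity of E45 passes the polarizer M90.
print(intensity(apply(M90, E45)))

# The quarter-wave plate turns E45 into circular light; up to a global phase
# the output equals E_{-1}, so the overlap should be one.
print(overlap(E_minus, apply(M_quarter, E45)))
```

The overlap ignores a global phase factor, which is why it is a convenient way to compare polarization states.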
Problem 5.5 Show that the matrix of a linear polarizer oriented with
transmission axis θ with respect to x-direction is given by:
\[
M_{lin}(\theta) = \begin{pmatrix} \cos^2(\theta) & \cos(\theta)\sin(\theta) \\ \cos(\theta)\sin(\theta) & \sin^2(\theta) \end{pmatrix}
\]
Problem 5.6 Show that a λ/2 plate with fast axis oriented at an angle
θ with respect to x-direction is described by the matrix:

\[
M_{\lambda/2}(\theta) = \begin{pmatrix} \cos(2\theta) & \sin(2\theta) \\ \sin(2\theta) & -\cos(2\theta) \end{pmatrix}
\]
Generally, for a series of elements the light encounters on its path, we
perform the operations element by element, leaving us with the rule:
Series of elements: (5.4)
Eout = MN · . . . · M1 · Ein
for elements in the sequence M1 , M2 , . . . , MN along the beam path.
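To see rule (5.4) at work, consider the classic three-polarizer arrangement. The sketch below assumes the matrix Mlin(θ) quoted in Problem 5.5 and uses helper names of our own choosing. Two crossed polarizers block x-polarized light, while inserting a third polarizer at 45 degrees between them lets a quarter of the intensity through.

```python
import math
from functools import reduce


def linear_polarizer(theta):
    """Jones matrix of a linear polarizer with transmission axis at angle theta."""
    c, s = math.cos(theta), math.sin(theta)
    return ((c * c, c * s), (c * s, s * s))


def apply(M, E):
    """Apply a 2x2 Jones matrix M to a Jones vector E = (Ex, Ey)."""
    return (M[0][0] * E[0] + M[0][1] * E[1],
            M[1][0] * E[0] + M[1][1] * E[1])


def through(elements, E_in):
    """E_out = M_N ... M_1 E_in for elements listed in beam-path order."""
    return reduce(lambda E, M: apply(M, E), elements, E_in)


def intensity(E):
    """Intensity |Ex|^2 + |Ey|^2 (constant prefactors omitted)."""
    return abs(E[0]) ** 2 + abs(E[1]) ** 2


E0 = (1.0, 0.0)  # light polarized along x

crossed = through([linear_polarizer(0.0), linear_polarizer(math.pi / 2)], E0)
with_middle = through(
    [linear_polarizer(0.0), linear_polarizer(math.pi / 4), linear_polarizer(math.pi / 2)],
    E0,
)

print(intensity(crossed))      # essentially zero
print(intensity(with_middle))  # about 0.25
```

Applying the matrices one after another to the vector, as `through` does, is equivalent to first forming the product MN · ... · M1 and then applying it once.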

5.1.2 Mathematical description of light: Stokes vector

We now introduce a more powerful mathematical representation, which
allows us also to deal with statistical mixtures4 of waves, i.e. partially
polarized light. To extract the three polarization parameters E1 , E2 and
δ we can perform three sets of intensity measurements. First we choose
an axis say the x-direction5 . Then, we perform the first set of intensity
measurements with a linear polarizer oriented along the x-direction with
result I(0) and along the y-direction with result I(90). The second set
of measurements is done with a linear polarizer oriented along +45 and
+135 degrees. For the final set we add a phase shift of π/2 to the field in
the y-direction and measure intensities transmitted through a polarizer
oriented along x with result I(σ+ ), and a polarizer along y with result
I(σ− ). In practice, the last part is carried out with the help of a circular
analyzer, consisting of a λ/4-plate followed by a linear polarizer M0 ·
M+λ/4 and M90 · M+λ/4 .
Recall that to obtain the intensity we need the time average of the squared
electric field ⟨E(t)²⟩_T:
\[
\langle E(t)^2 \rangle_T = \lim_{T \to \infty}\left(\frac{1}{T}\int_0^T E(t)^2\, dt\right) ,
\]
where in principle we take the limit T → ∞. For monochromatic fields

averaging over one cycle is enough, but for fluctuating fields we need to
integrate long enough to get a reasonable estimate of the limiting value.
With the results of the intensity measurements, we can now define the
Stokes vector as
4 This representation is completely equivalent to the density matrix, that you will
encounter in quantum mechanics.
5 We can choose any direction in the plane perpendicular to the k-vector.
Stokes vector: (5.5)
S0 = I(0) + I(90)
S1 = I(0) − I(90)
S2 = I(45) − I(135)
S3 = I(σ+ ) − I(σ− )
Problem 5.7 Show that for the general field in equation (5.1) the Stokes
vector becomes
\[
\begin{aligned}
S_0 &= E_1^2 + E_2^2 \\
S_1 &= E_1^2 - E_2^2 \\
S_2 &= 2E_1 E_2\cos(\delta) \\
S_3 &= 2E_1 E_2\sin(\delta) ,
\end{aligned}
\]
where we have omitted the common factor ½ ε0 c. You can use the Jones
formalism or project the E-field on the x- and y-direction. For S3 remember
to add the π/2 phase shift in one of the directions. Based on the measured
numbers S1, S2, S3, how do we extract E1, E2 and δ? Finally, show that
S0² = S1² + S2² + S3² holds for polarized light.
Problem 5.8 Let the phase δ be white noise (white random process) in
[0, 2π]. Show that S1 = S2 = S3 = 0. What is S0 ?
We summarize important properties of the Stokes vector in the fol-
lowing box.
Figure 5.3 Poincaré sphere for polarization states. Points on the unit
sphere surface represent fully polarized light. Partially polarized light
states are plotted inside the ball. Completely random polarized light
lies at the center (0, 0, 0). North and south poles correspond to right
and left circularly polarized light, while points on the equator describe
linearly polarized light. Note that walking along the equator by 90
degrees changes the linear polarization direction by only 45 degrees.
Polarized light: (5.6)
\[
S_0^2 = S_1^2 + S_2^2 + S_3^2
\]
Partially polarized light: (5.7)
\[
S_0^2 > S_1^2 + S_2^2 + S_3^2
\]
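The relations (5.6) and (5.7) can be illustrated with a few lines of code. The sketch below takes the Stokes components quoted in Problem 5.7 for a fully polarized field. It further assumes the standard result that Stokes vectors of mutually incoherent beams add component-wise, and uses the ratio sqrt(S1² + S2² + S3²)/S0 as a measure of the degree of polarization. Neither of these two ingredients is derived in these notes, and the function names are our own.

```python
import math


def stokes(E1, E2, delta):
    """Stokes vector (S0, S1, S2, S3) of a fully polarized field, as in Problem 5.7."""
    return (E1**2 + E2**2,
            E1**2 - E2**2,
            2 * E1 * E2 * math.cos(delta),
            2 * E1 * E2 * math.sin(delta))


def mix(*vectors):
    """Incoherent superposition (assumption): Stokes vectors add component-wise."""
    return tuple(sum(components) for components in zip(*vectors))


def degree_of_polarization(S):
    """Length of (S1, S2, S3) divided by S0; equals 1 for fully polarized light."""
    return math.sqrt(S[1] ** 2 + S[2] ** 2 + S[3] ** 2) / S[0]


pure = stokes(1.0, 0.5, math.pi / 3)
unpolarized = mix(stokes(1.0, 0.0, 0.0), stokes(0.0, 1.0, 0.0))
partial = mix(stokes(1.0, 0.0, 0.0), stokes(0.0, 0.5, 0.0))

for name, S in (("pure", pure), ("unpolarized", unpolarized), ("partial", partial)):
    print(name, S, degree_of_polarization(S))
```

An equal incoherent mixture of x- and y-polarized light ends up with S1 = S2 = S3 = 0, while an unequal mixture gives a vector of intermediate length, in line with (5.7).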
By normalizing (S1 , S2 , S3 ) with S0 we obtain a unit vector tracing out

a unit sphere for fully polarized light. The sphere is called the Poincaré
sphere, shown in Fig.5.3. If light is partially polarized it will be
characterized by a vector of length less than unity. Similarly to the Jones
vector formalism, we can represent optical elements that change the
polarization states by operators performing rotations and projections of
vectors on or inside the Poincaré sphere. We will return to that once we
have discussed the working principle of wave-plates. By working with the
Poincaré sphere you will see that the formalism is compact and elegant.
In fact, it is so useful that it is used under different names in several
fields of physics.
Figure 5.4 Uniaxial crystal classes. Trigonal (calcite), tetragonal and

hexagonal systems.
5.1.3 Optically active crystals

In the previous chapters we have studied isotropic optical materials. We
assigned a unique index of refraction to the material. It may be complex,
but it is a scalar quantity. Inside these materials the index of refraction is
the same no matter which direction light propagates. This is not always
the case, especially not for crystals with a lattice of low symmetry. Now,
we will study materials and optical elements which have different indices
of refraction depending on the orientation of the crystal with respect to
the E-field. Such elements we call birefringent.
We consider crystals which have a single long axis of symmetry, called
uniaxial crystals. This is the case for trigonal (calcite), tetragonal and
hexagonal lattices, see figure 5.4.
Let us see how an anisotropy in the optical properties can arise. In
figure 5.5 we show a 3-dimensional toy model of an electron bound to a
crystal lattice.
The spring constants for the bound electron motion in the (x,y) plane
are equal, but different from the spring constant in the z-direction6 . We
call the z-direction the optic axis as it possesses the largest amount
of symmetry in the system. According to our model for the index of
refraction from chapter 2, the index will be different for an E-field aligned
along the optic axis compared to a field aligned perpendicular to the
optic axis. Let us call the index in the parallel direction n∥ = ne
and the index in the perpendicular case n⊥ = no. The subscripts "e"
and "o" stand for extra-ordinary and ordinary for reasons that will
become clear later. Typically ∆n ≡ ne − no is of the order of 0.2.
6 This obviously means, that the displacement of the electron due to an applied
electric field is in general not in the direction of the applied field.
Figure 5.5 Illustration of a medium where the index of refraction

depends on the orientation of the E-field. Along the optic axis (z)
the electron is bound with a spring constant corresponding to a res-
onance frequency ω1 and in the (x,y) plane with a spring constant
corresponding to a resonance frequency ω0 . According to chapter 2
the index of refraction for an E∥-field is different from that for an E⊥-field.
Table 5.1 Birefringent crystals. Indices of refraction at 589 nm.

Crystal            Composition    no        ne
Quartz             SiO2           1.5442    1.5533
Calcite            CaCO3          1.6585    1.4864
Rutile             TiO2           2.616     2.903
ADP                NH4H2PO4       1.53      1.48
KDP                KH2PO4         1.51      1.47
Lithium niobate    LiNbO3         2.296     2.208
Lead molybdate     PbMoO4         2.386     2.262
Example 5.9 For calcite (CaCO3) one measures no = 1.658 and ne =
1.486. So ∆n = −0.172 and calcite is called a negative uniaxial crystal.
The propagation speed in the crystal now depends strongly on the
orientation of the E-field with respect to the optic axis:
\[
v_\parallel = \frac{c}{n_\parallel} = \frac{c}{n_e} , \qquad
v_\perp = \frac{c}{n_\perp} = \frac{c}{n_o} .
\]

Figure 5.6 A point source embedded in a negative uniaxial crystal.
Since ne < no wavefronts with polarization parallel to the optical axis
will move faster compared to wavefronts with polarization perpendicular
to the optical axis (dots on the circle). Wave fronts belonging to
E∥ will trace out an ellipsoid while the E⊥ wave fronts will constitute
a sphere. At the semi-minor axis both polarizations are perpendicular
to the optic axis.
In figure 5.6 we imagine a point source embedded in a negative uniaxial
crystal. Wavefronts belonging to E∥, i.e., E-fields parallel to the optic
axis, will move faster than wavefronts with E-fields perpendicular
to the optic axis (dots on the circle). This happens because ne < no. At the
semi-minor axis both polarizations are perpendicular to the optic axis
and they move with the same speed v⊥.
Example 5.10 Propagation of light through a uniaxial crystal.
Let the optic axis be oriented at 45 degrees with respect to the incoming
wave vector and lie in the plane of the paper. Consider first case (A) in
figure 5.7. Using Huygens' principle we can construct secondary wavelets
to establish the next wavefront. As we observed above the speed
Figure 5.7 Propagation of light through a uniaxial crystal such as

calcite. In panel (A) the polarization is orthogonal to the optic axis,
while in panel (B) there is a polarization component along the optic
axis.
of propagation is the same in all directions in the paper plane, when the
electric field is perpendicular to the optic axis. Our secondary wavelets
are consequently circles. The wavefront will propagate straight through
as we have seen it with isotropic media. This case is named ordinary for
that reason.
Case (B) is different. Now the electric field has a component parallel to
the optic axis and a component perpendicular to it. The secondary wavelets
will be ellipse-shaped and the beam, according to Huygens' principle, will
get a kink upwards: the ray bends! The angle between the k vector and
the direction of energy flow is about 6 degrees for calcite, so the effect
is actually quite small. Figure 5.8 provides a "zoom in" on the physics
of case (B), summarizing the different fields involved. Notice that the
E-field is no longer oriented perpendicular to the wave vector, but still
perpendicular to the Poynting vector S. However, the D vector is still
Figure 5.8 Extraordinary wave propagating through a uniaxial crystal.
The D field is perpendicular to the k vector and no longer parallel
to the E vector. The energy flow is along the Poynting vector S and
E · S = 0. For calcite the angle α between k and S is only 6 degrees.
perpendicular to the k vector. This is in sharp contrast to the isotropic
case where S ∥ k. For anisotropic media we have:
Anisotropic material generally: (5.8)
S not parallel to k
This we show in the next section using Maxwell’s equations.
5.1.4 Maxwell’s equations for an anisotropic medium

Recall Maxwell’s equations for an insulator. We just recast them into a
form useful for our purpose.
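Anisotropic Insulator
$$\nabla \cdot \mathbf{D} = 0 \qquad (5.9)$$
$$\nabla \cdot \mathbf{B} = 0 \qquad (5.10)$$
$$\nabla \times \mathbf{E} = -\frac{\partial \mathbf{B}}{\partial t} \qquad (5.11)$$
$$\nabla \times \mathbf{H} = \frac{\partial \mathbf{D}}{\partial t} \qquad (5.12)$$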
The material we investigate is anisotropic and the material equation

P = ε0 χE must be modified. We will still assume a linear response,
but now an electric field along some direction can induce polarization
also in the other directions. This happens as the spring constants in
(x,y,z)-directions may be different in anisotropic materials7 . For a proper
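description we introduce a tensor:
$$\begin{pmatrix} P_x \\ P_y \\ P_z \end{pmatrix} = \begin{pmatrix} \chi_{xx} & \chi_{xy} & \chi_{xz} \\ \chi_{yx} & \chi_{yy} & \chi_{yz} \\ \chi_{zx} & \chi_{zy} & \chi_{zz} \end{pmatrix} \begin{pmatrix} E_x \\ E_y \\ E_z \end{pmatrix} = \chi \cdot \mathbf{E}$$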
From an energy argument it is possible to show that the polarizability
tensor χ is symmetric, i.e., χi,j = χj,i . Then it is possible to diagonalize
it, i.e. to find a coordinate system in which the tensor is represented by
a diagonal matrix:
$$\chi = \begin{pmatrix} \chi_{xx} & 0 & 0 \\ 0 & \chi_{yy} & 0 \\ 0 & 0 & \chi_{zz} \end{pmatrix}.$$
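For a uniaxial crystal we can choose χxx = χyy ≠ χzz and the displacement
field D will be given by:
$$\begin{pmatrix} D_x \\ D_y \\ D_z \end{pmatrix} = \begin{pmatrix} \varepsilon_\perp & 0 & 0 \\ 0 & \varepsilon_\perp & 0 \\ 0 & 0 & \varepsilon_\parallel \end{pmatrix} \begin{pmatrix} E_x \\ E_y \\ E_z \end{pmatrix}$$
So D and E are in general not parallel.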
Uniaxial crystal: (5.13)
$$D_i = \varepsilon_i E_i$$
$$\varepsilon_1 = \varepsilon_2 = \varepsilon_\perp$$
$$\varepsilon_3 = \varepsilon_\parallel$$
We can transform ε⊥ and ε∥ to no and ne, respectively, by using the
relation n² = εr µr. Remember we assume µr ≈ 1. Later in this section we are
7 Only in two directions for a uniaxial crystal. For a biaxial crystal all three
directions will be different.
going to evaluate the indices, but first we want to work out what we can
say about the angles between the various field and wave vectors.
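For the present analysis we assume a monochromatic plane wave as
usual. Let $\mathbf{E} = \mathbf{E}_0 e^{i(\mathbf{k}\cdot\mathbf{r} - \omega t)}$; all other fields will be of the same form.
From Gauss' law we have
$$\nabla \cdot \mathbf{D} = i\mathbf{k} \cdot \mathbf{D} = 0$$
and
$$\nabla \cdot \mathbf{B} = i\mathbf{k} \cdot \mathbf{B} = 0$$
From Ampère's law we obtain:
$$\nabla \times \mathbf{H} = i\mathbf{k} \times \mathbf{H} = -i\omega \mathbf{D}$$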
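Specifically, we now show that E ⊥ H. We simply use Faraday's law:
$$\nabla \times \mathbf{E} = -\frac{\partial \mathbf{B}}{\partial t}, \qquad (5.14)$$
which with our assumed plane wave solution reduces to:
$$i\mathbf{k} \times \mathbf{E} = i\omega \mathbf{B}, \qquad (5.15)$$
so
$$\mathbf{H} = \frac{1}{\mu}\mathbf{B} = \frac{1}{\mu\omega}\,\mathbf{k} \times \mathbf{E}. \qquad (5.16)$$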
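To show the orthogonality we prove E · H = 0 by inserting the above:
$$\mathbf{E} \cdot \mathbf{H} = \frac{1}{\mu\omega}\,\mathbf{E} \cdot (\mathbf{k} \times \mathbf{E}) = -\frac{1}{\mu\omega}\,\mathbf{k} \cdot (\mathbf{E} \times \mathbf{E}) = 0. \qquad (5.17)$$
In the last step we used:
$$\mathbf{A} \cdot (\mathbf{B} \times \mathbf{C}) = -\mathbf{B} \cdot (\mathbf{A} \times \mathbf{C}). \qquad (5.18)$$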
So we can conclude:
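Figure 5.9 Vector relations in a uniaxial crystal. Two important
relations: $\mathbf{D} = -\frac{1}{\omega}\,\mathbf{k} \times \mathbf{H}$ and $\mathbf{S} = \mathbf{E} \times \mathbf{H}$.
D, H and k vectors: (5.19)
$$\mathbf{k} \cdot \mathbf{D} = 0$$
$$\mathbf{k} \cdot \mathbf{H} = 0$$
$$\mathbf{D} = -\frac{1}{\omega}\,\mathbf{k} \times \mathbf{H}$$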
In a uniaxial crystal the D, H and k vectors form a right-handed
orthogonal triad. From the definition of the Poynting vector we know that
S = E × H, so E, H and S form another right-handed system. The two
triads share the H vector. In figure 5.9 we show our findings graphically.
As shown in problem sessions we can find exact solutions for wave
propagation in uniaxial crystals. From Faraday’s law we obtain:
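$$\nabla \times (\nabla \times \mathbf{E}) = -\frac{\partial (\nabla \times \mathbf{B})}{\partial t}.$$
Performing the derivatives gives
$$\mathbf{k} \times (\mathbf{k} \times \mathbf{E}) = \mu \varepsilon \frac{\partial^2 \mathbf{E}}{\partial t^2} = -\mu \varepsilon \omega^2 \mathbf{E},$$
where we used B = µH and D = εE. Notice that here ε is the tensor from
equation (5.13). Using c² = 1/(ε0 µ0) we find
$$\mathbf{k} \times (\mathbf{k} \times \mathbf{E}) = -\frac{\mu \varepsilon}{\varepsilon_0 \mu_0} \frac{\omega^2}{c^2}\, \mathbf{E}$$
and finally
$$\mathbf{k} \times (\mathbf{k} \times \mathbf{E}) = -\frac{\omega^2}{c^2}\, \varepsilon_t \mathbf{E}$$
with
$$\varepsilon_t = \begin{pmatrix} n_o^2 & 0 & 0 \\ 0 & n_o^2 & 0 \\ 0 & 0 & n_e^2 \end{pmatrix}.$$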
Now we use our favorite theorem A × (B × C) = (A · C)B − (A · B)C
and obtain:
Field equation for uniaxial crystals (5.20)
$$(\mathbf{k} \cdot \mathbf{E})\,\mathbf{k} - k^2 \mathbf{E} + \frac{\omega^2}{c^2}\, \varepsilon_t \mathbf{E} = 0.$$
This gives a system of 3 coupled (ugly) equations in Ex, Ey, Ez which
has a non-trivial solution if the determinant is zero, leading to:
Dispersion relation for uniaxial crystals (5.21)
$$\left( \frac{k_x^2}{n_o^2} + \frac{k_y^2}{n_o^2} + \frac{k_z^2}{n_o^2} - \frac{\omega^2}{c^2} \right) \cdot \left( \frac{k_x^2}{n_e^2} + \frac{k_y^2}{n_e^2} + \frac{k_z^2}{n_o^2} - \frac{\omega^2}{c^2} \right) = 0.$$
Setting the first or the second factor in parentheses to zero we find two
independent classes of solutions for the k-vector. Fixing the frequency,
we see that for the first factor the k solutions lie on a sphere. This is the
ordinary wave solution, in good agreement with the picture from Huygens'
principle. For the second factor the k solutions form the surface of an
ellipsoid as we change direction. This represents the extraordinary solution.
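The link between the field equation (5.20) and the dispersion relation (5.21) can be checked numerically. The following is a minimal sketch, not part of the original notes: it assumes units in which ω/c = 1, the calcite indices from Table 5.1, and helper names (`field_matrix`, `k_ordinary`, `k_extraordinary`) that are purely illustrative. It builds the 3×3 matrix of (5.20) and evaluates its determinant for wave vectors on and off the two solution surfaces.

```python
import numpy as np

def field_matrix(k, n_o, n_e, omega_over_c=1.0):
    """Matrix M of equation (5.20), written as M @ E = 0.

    M = k k^T - |k|^2 I + (omega/c)^2 diag(n_o^2, n_o^2, n_e^2),
    with the optic axis along z.
    """
    k = np.asarray(k, dtype=float)
    eps_t = np.diag([n_o**2, n_o**2, n_e**2])
    return np.outer(k, k) - np.dot(k, k) * np.eye(3) + omega_over_c**2 * eps_t

def k_ordinary(theta, n_o, omega_over_c=1.0):
    """Wave vector on the sphere (first factor of 5.21), at angle theta to z."""
    k_mag = n_o * omega_over_c
    return k_mag * np.array([np.sin(theta), 0.0, np.cos(theta)])

def k_extraordinary(theta, n_o, n_e, omega_over_c=1.0):
    """Wave vector on the ellipsoid (second factor of 5.21), at angle theta to z."""
    k_mag = omega_over_c / np.sqrt(np.sin(theta)**2 / n_e**2 + np.cos(theta)**2 / n_o**2)
    return k_mag * np.array([np.sin(theta), 0.0, np.cos(theta)])

N_O, N_E = 1.6585, 1.4864      # calcite at 589 nm (Table 5.1)
theta = np.radians(45.0)       # angle between k and the optic axis

det_o = np.linalg.det(field_matrix(k_ordinary(theta, N_O), N_O, N_E))
det_e = np.linalg.det(field_matrix(k_extraordinary(theta, N_O, N_E), N_O, N_E))
# A wave vector that satisfies neither factor gives a non-zero determinant.
det_off = np.linalg.det(field_matrix(1.1 * k_ordinary(theta, N_O), N_O, N_E))

print("ordinary:", det_o, " extraordinary:", det_e, " off-shell:", det_off)
```

The determinant vanishes (to rounding error) only on the sphere and on the ellipsoid, which is the statement of (5.21).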
Problem 5.11 Let the angle between the k vector and the optic axis
be 45 degrees. Show that the angle α between k and S, shown in figure
5.8, is given by:
$$\cos(\alpha) = \frac{1}{\sqrt{2}} \cdot \frac{n_e^2 + n_o^2}{\sqrt{n_e^4 + n_o^4}}$$
and that α = 6.2 degrees for calcite. Is this the maximal α?
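As a numerical cross-check of the value quoted in Problem 5.11 (not a substitute for the derivation), the sketch below uses the geometric fact from figure 5.8 that k ⊥ D and S ⊥ E all lie in one plane, so the angle between k and S equals the angle between D and E. The function name `walkoff_angle` and the choice of the x,z-plane are assumptions made for illustration; the indices are the calcite values from Table 5.1.

```python
import numpy as np

def walkoff_angle(theta, n_o, n_e):
    """Angle (radians) between k and S for the extraordinary wave.

    theta is the angle between k and the optic axis (z). D is perpendicular
    to k and lies in the plane spanned by k and the optic axis.
    """
    D = np.array([np.cos(theta), 0.0, -np.sin(theta)])
    # E = eps^-1 D with eps proportional to diag(n_o^2, n_o^2, n_e^2)
    E = D / np.array([n_o**2, n_o**2, n_e**2])
    cos_alpha = np.dot(D, E) / (np.linalg.norm(D) * np.linalg.norm(E))
    return np.arccos(cos_alpha)

N_O, N_E = 1.6585, 1.4864  # calcite at 589 nm (Table 5.1)

alpha_numeric = np.degrees(walkoff_angle(np.radians(45.0), N_O, N_E))
# Closed form stated in Problem 5.11
alpha_formula = np.degrees(
    np.arccos((N_E**2 + N_O**2) / (np.sqrt(2.0) * np.sqrt(N_E**4 + N_O**4)))
)
print(f"alpha (vector construction) = {alpha_numeric:.2f} degrees")
print(f"alpha (closed form)         = {alpha_formula:.2f} degrees")
```

Calling `walkoff_angle` for other values of `theta` is one way to explore the last question of the problem numerically.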
5.1.5 Group velocity vg in anisotropic materials

So far, we have looked at plane wave type solutions. In order to see how
a beam of light, like the beam from a laser pointer, propagates through
a crystal, we form a wave packet in directional space. This means we
look at a superposition of plane waves, all with the same frequency
but with slightly different directions of k centered around some mean
direction. We will show that the group velocity vg = ∂ω/∂k is parallel
to the Poynting vector S = E × H. The derivation is a bit lengthy
and mathematical, but the result is simple and important. If you get
lost in trying to follow all the steps, do not despair and jump to the
end result for the group velocity. Fasten seat belts, here we start! First,
using Faraday’s and Ampere’s law we have
ωD = −k × H (5.22)
ωB = +k × E, (5.23)
so
ω(B × D) = −ω(D × B) = −B × (k × H) (5.24)
= −((B · H)k − (B · k)H) (5.25)
= −(B · H)k, (5.26)

since k ⊥ B, as we saw in equation (5.14). We use the product rule

A × (B × C) = (A · C)B − (A · B)C again, now with a triple product
containing the electric field:
ω(D × B) = D × (k × E) (5.27)
= ((D · E)k − (D · k)E) (5.28)
= (D · E)k (5.29)
where we used D ⊥ k, as stated in equation (5.19). From this we see
D · E = B · H and that we can also express the wave vector k as:
$$\mathbf{k} = \omega\, \frac{\mathbf{D} \times \mathbf{B}}{\mathbf{D} \cdot \mathbf{E}}. \qquad (5.30)$$
For the final steps we look for an expression of the form ω = A · k where
A is some vector. This will allow us to read off the group velocity directly
as vg = ∂ω/∂k = A. We can write

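$$\frac{\mathbf{D} \times \mathbf{B}}{\mathbf{D} \cdot \mathbf{E}} \cdot \frac{\mathbf{E} \times \mathbf{H}}{\mathbf{D} \cdot \mathbf{E}} = \frac{(\mathbf{D} \cdot \mathbf{E})(\mathbf{B} \cdot \mathbf{H}) - (\mathbf{B} \cdot \mathbf{E})(\mathbf{D} \cdot \mathbf{H})}{(\mathbf{D} \cdot \mathbf{E})(\mathbf{D} \cdot \mathbf{E})} = 1, \qquad (5.31)$$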
where we have used the algebraic vector identity (U × V) · (W × X) =

(U · W)(V · X) − (V · W)(U · X). To get an expression of the desired
form we insert the above way of writing k to get:

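$$\omega = \frac{\mathbf{E} \times \mathbf{H}}{\mathbf{D} \cdot \mathbf{E}} \cdot \mathbf{k}. \qquad (5.32)$$
From this we read off the group velocity as:
$$\mathbf{v}_g = \frac{\partial \omega}{\partial \mathbf{k}} = \frac{\mathbf{E} \times \mathbf{H}}{\mathbf{D} \cdot \mathbf{E}} = \frac{\mathbf{S}}{\mathbf{D} \cdot \mathbf{E}}. \qquad (5.33)$$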
This result tells us that a light beam, modeled as a wave packet of plane
waves with different directions, moves along the direction of the Poynting
vector. This is, of course, also true for light beams in isotropic materials.
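The chain of identities (5.22) to (5.33) can be verified on a concrete extraordinary wave. The sketch below is illustrative only: it assumes units with ε0 = µ0 = c = 1 and ω = 1, a non-magnetic crystal (µr = 1), calcite indices from Table 5.1, and k at 45 degrees to the optic axis. All variable names are made up for this example.

```python
import numpy as np

N_O, N_E = 1.6585, 1.4864      # calcite at 589 nm (Table 5.1)
OMEGA = 1.0                    # units with eps0 = mu0 = c = 1 (assumption)
theta = np.radians(45.0)       # angle between k and the optic axis (z)

# Extraordinary wave vector from the second factor of (5.21)
k_hat = np.array([np.sin(theta), 0.0, np.cos(theta)])
k = OMEGA / np.sqrt(np.sin(theta)**2 / N_E**2 + np.cos(theta)**2 / N_O**2) * k_hat

# D is perpendicular to k, in the plane of k and the optic axis
D = np.array([np.cos(theta), 0.0, -np.sin(theta)])
E = D / np.array([N_O**2, N_O**2, N_E**2])   # E = eps^-1 D
B = np.cross(k, E) / OMEGA                   # equation (5.23)
H = B.copy()                                 # mu = 1

S = np.cross(E, H)                           # Poynting vector
v_g = S / np.dot(D, E)                       # equation (5.33)
k_from_fields = OMEGA * np.cross(D, B) / np.dot(D, E)   # equation (5.30)

# Angle between the group velocity and the wave vector
cos_angle = np.dot(v_g, k) / (np.linalg.norm(v_g) * np.linalg.norm(k))
walkoff_deg = np.degrees(np.arccos(cos_angle))

print("omega*D + k x H =", OMEGA * D + np.cross(k, H))   # (5.22), should vanish
print("D.E - B.H       =", np.dot(D, E) - np.dot(B, H))
print("v_g . k         =", np.dot(v_g, k))                # (5.32), should equal omega
print("angle(v_g, k)   =", walkoff_deg, "degrees")
```

In this example the group velocity points along S and is tilted away from k by the same angle α that appears in Problem 5.11.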
Problem 5.12 Show that E = −vg × B and H = +vg × D
5.2 Production and manipulation of polarized light

In scientific and technological applications the production and manip-
ulation of polarized light plays an important role. In this section we
describe some of the most common ways to generate linearly and cir-
cularly polarized light. We have seen already in previous chapters, that
we can generate linearly polarized light by scattering or by reflection at
Brewster’s angle. Both methods work, but they are not terribly efficient,
so people invented smarter ways by making good use of the properties
of birefringent materials.
5.2.1 Polarizers
The job of a polarizer is to separate two orthogonal polarization
components. This can be achieved by either splitting the path, e.g. as shown in
fig. 5.7, or by selectively absorbing one of the polarization components,
as in the polaroid sheet polarizers you use in the lab experiments. The
quality of a linear polarizer is characterized by its extinction ratio
Rex, the ratio of the intensity transmitted when two such polarizers are
crossed (θ = π/2) to the intensity transmitted when the two polarizers are
parallel (θ = 0). Typically the extinction ratio of good polarizers is about
$10^{-5}$ (using quartz prisms) and values of $10^{-7}$ or better are possible
(using calcite prisms); however, going significantly beyond $10^{-7}$ is very
difficult (see footnote 8). Polaroid sheet polarizers have Rex ≈ $5 \cdot 10^{-3}$. How do we
8 Measuring the extinction ratio of a high-quality polarizer (Rex < $10^{-5}$) is a
challenge. It means detecting very low power levels, typically in the nanowatt
range, and controlling the input polarization to the required level. Any
absorption loss must be measured and taken into account as well. In some cases
Figure 5.10 Polarizer based on a Nicol calcite prism. The blue layer
is the glue cementing the prisms together. Here ne < nglue < no. The
optic axis is marked with "OA".
interpret the extinction ratio physically? Well, for a polarizer with an
extinction ratio of, say, $10^{-3}$ the direction of the polarization vector has an
angular uncertainty ∆θ, which can be estimated according to Malus's law.
The law describes the transmitted intensity through an ideal polarizer
as a function of the angle θ between the incoming polarization and the
transmission axis of the polarizer: I = I0 cos²(θ). Close to the crossed
position, θ = π/2 − ∆θ, this gives
$$R_{ex} = \frac{I}{I_0} = 10^{-3} = \cos^2\left(\frac{\pi}{2} - \Delta\theta\right) = \sin^2(\Delta\theta) \approx \Delta\theta^2 \qquad (5.34)$$
so
$$\Delta\theta = \sqrt{10^{-3}}\ \mathrm{rad} \simeq 1.8\ \mathrm{degrees}. \qquad (5.35)$$
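The estimate in (5.34) and (5.35) is easy to repeat for the other extinction ratios quoted in this section. A minimal sketch, assuming only Malus's law; the function name `angular_uncertainty` is illustrative:

```python
import math

def angular_uncertainty(extinction_ratio):
    """Solve R_ex = sin^2(dtheta) for dtheta (radians), cf. equation (5.34)."""
    return math.asin(math.sqrt(extinction_ratio))

# Extinction ratios mentioned in the text: polaroid sheet, the worked
# example, a good quartz prism polarizer, a very good calcite prism polarizer.
for r_ex in (5e-3, 1e-3, 1e-5, 1e-7):
    dtheta_deg = math.degrees(angular_uncertainty(r_ex))
    print(f"R_ex = {r_ex:.0e}  ->  dtheta = {dtheta_deg:.4f} degrees")
```

Because ∆θ scales as the square root of Rex, improving the extinction ratio by two orders of magnitude tightens the angular uncertainty by only one.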
The best performing polarizers are constructed from birefringent prisms,
taking advantage of the fact that a beam with one polarization
undergoes total internal reflection, while the orthogonal polarization
state is transmitted. This can be achieved because the
medium has different indices of refraction according to the polarization
state of the incoming light. One example is the so-called Nicol prism,
see figure 5.10. The Nicol prism really consists of two prisms glued
together such that ne < nglue < no. As the incident angle is rather large,
the ordinary beam will undergo total internal reflection, while the
extraordinary beam sees a smaller index and will be transmitted, essentially
unaffected (see footnote 9). Beautiful! The length to width ratio of the prism is typically
3:1, allowing an angular acceptance of ± 10 degrees. Most glues absorb
the detector is slightly sensitive to a polarized signal and this must be accounted
for as well.
9 Unaffected in direction. The intensity is reduced a little due to reflection losses,
but these are small since the difference in refractive indices between calcite and
the used glue, historically Canada balsam, is small.
Figure 5.11 Polarizers based on Glan prisms. Notice, the transmitted

beam is not displaced. The optic axis orientation is marked with
”OA”.
light around 250 nm, which limits the range of wavelengths where the Nicol
prism may be used. Also, the glue reduces the amount of power that can be
used. An air-spaced version of the Nicol prism, the so-called Foucault
prism, avoids the glue and can be used at considerably higher power
levels (see footnote 10).
A major drawback of the Nicol and Foucault prisms is the small but
notable parallel displacement of the output beam. Many applications
rely on the ability to rotate the polarizer without displacing the output
beam. The most common polarizing prisms used in modern optics belong
to the Glan prism “family”: Glan-Thompson, Glan-Taylor and Glan-
Foucault prism. These prisms are shown in figure 5.11. They all have the
advantage of a non displaced transmitted beam. The air gap versions
have furthermore the advantage of allowing high powers, but at the
expense of being more sensitive to alignment. Normally, the extinction
ratio for these prisms is in the range of 10−5 or better. It is always the
extraordinary ray which has the clean polarization and is used as the
output. Frequently the ordinary ray is sent into an absorbing layer or
caught in a beam dump.
There are, of course, many applications where it is desirable to keep
both of the output beams. In figure 5.12 we show common prisms for
separating or combining beams. The most common type used is the
polarization beam splitter cube (PBS). It relies on thin dielectric coating
layers (see footnote 11) at the interface between the prisms, designed to
show near perfect transmission (reflection) for incoming p-polarized
(s-polarized) light (see footnote 12). Typically, extinction ratios range from about $10^{-3}$ to
$10^{-4}$.
10 A Foucault prism has a length to width ratio of 1:1 and a reduced angular
acceptance of ± 4 degrees.
11 We will return to thin film coatings, when we discuss interference.
12 Note that this is opposite to most beam splitters based on birefringent materials.
Figure 5.12 Polarization beam splitters and combiners.
Example 5.13 The Wollaston prism
In this example we look at the Wollaston prism shown in one of the
lectures. It is a special optical component with the ability to separate
orthogonal polarization components in angle. Two prisms of birefringent
material such as calcite are mounted with the two prism bases facing
each other. The light ray enters the first prism, where the optic axis is
parallel to the plane of the paper as indicated in figure 5.13. We have normal
incidence, so no refraction takes place here. However, when the light ray
enters prism number two the optic axis is turned 90 degrees and is now
pointing out of the paper plane. Let us compute the angle β between
the two rays having orthogonal polarization states leaving the Wollaston
prism. Let the prism wedge angle be given by α. The refraction angle
belonging to the polarization direction out of the paper is denoted by
θ and the one belonging to the in-plane polarization we denote by φ.
At the prism interface, the incident angles θ1 and φ1 of the two
polarization states with respect to the interface normal both equal the
wedge angle (θ1 = φ1 = α), and the refracted angles follow from Snell's
law:
no sin(θ1 ) = no sin(α) = ne sin(θ2 ) (5.36)
ne sin(φ1 ) = ne sin(α) = no sin(φ2 ) (5.37)
This gives us the refracted beam angles θ2 , φ2 with respect to the

normal indicated on fig.5.13. The angle with respect to the normal of
the backside of the prism is then given by:
θ3 = θ2 − α (5.38)
φ3 = φ2 − α. (5.39)
Now we may compute the exit angles at the prism backside (assuming
air on the outside):
$$n_e \sin(\theta_3) = \sin(\theta_4) \qquad (5.40)$$
$$n_o \sin(\phi_3) = \sin(\phi_4). \qquad (5.41)$$
So the angle β is given by:
$$\beta = \arcsin(n_e \sin(\theta_3)) - \arcsin(n_o \sin(\phi_3)) \qquad (5.42)$$
Figure 5.13 Wollaston prism. The light emerges from the prism configuration
as two orthogonally polarized beams separated by an angle β.
The prism wedge angle is α. Refraction angles belonging to the polarization
state pointing out of the paper are denoted by θ, the orthogonal
ones by φ. At the prism interface the incident angles are θ1 = φ1 = α,
while the refracted angles are θ2, φ2 and so forth.
For a calcite prism with a wedge angle of α = 15 degrees we obtain
an angle between the two beams inside the Wollaston prism of 3.37
degrees and the angle β = 5.28 degrees. As an application example we
mention that in the first CD ROM drives the Wollaston prism played
an important role, see figure 5.14. You should trace the path of the
laser beam in the illustration to see how the prism in combination with
a waveplate is used to deflect the beam reflected off the pits of the
optical disc onto the detector. Today the Wollaston prism and the related
Nomarski prism are also used in Differential Interference Contrast (DIC)
microscopes, where the prism helps to detect optical path length gradients
in specimens by converting them into intensity differences with high contrast.
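The numbers quoted in Example 5.13 follow from applying Snell's law twice. The sketch below retraces equations (5.36) to (5.42) for calcite, using the indices of Table 5.1; the function name `wollaston_angles` is an illustrative choice, and air (n = 1) is assumed outside the prism as in the text.

```python
import math

def wollaston_angles(alpha_deg, n_o, n_e):
    """Return (separation inside prism 2, exit separation beta) in degrees."""
    alpha = math.radians(alpha_deg)
    # Refraction at the internal interface, equations (5.36) and (5.37)
    theta2 = math.asin(n_o * math.sin(alpha) / n_e)   # out-of-plane polarization
    phi2 = math.asin(n_e * math.sin(alpha) / n_o)     # in-plane polarization
    # Angles with respect to the normal of the back face, (5.38) and (5.39)
    theta3 = theta2 - alpha
    phi3 = phi2 - alpha
    # Refraction into air at the back face, (5.40) and (5.41)
    theta4 = math.asin(n_e * math.sin(theta3))
    phi4 = math.asin(n_o * math.sin(phi3))
    inside = math.degrees(theta2 - phi2)
    beta = math.degrees(theta4 - phi4)                # equation (5.42)
    return inside, beta

inside_deg, beta_deg = wollaston_angles(15.0, n_o=1.6585, n_e=1.4864)
print(f"separation inside the prism: {inside_deg:.2f} degrees")
print(f"separation after the prism:  {beta_deg:.2f} degrees")
```

The two rays leave on opposite sides of the back-face normal (θ3 and φ3 have opposite signs), which is why the exit separation is larger than the separation inside the prism.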
5.2.2 Wave plates and retarders

Wave plates and retarders are used to generate and manipulate polar-
ized light. They have the ability to convert linearly polarized light into
elliptically polarized light and vice versa. In addition, they can rotate
the direction of linearly polarized light, just to mention a few examples.
Figure 5.14 The optical bench of the read head in an early CD ROM
drive.
In many cases one seeks to convert linearly polarized light into
circularly polarized light or, conversely, circularly polarized light
into linearly polarized light. A retarder designed to rotate the direction
of linearly polarized light is called a half-wave plate (λ/2-plate). In figure 5.15
we show how to make such a plate from a thin slab of birefringent
material, say a uniaxial material, with the optic axis orthogonal to the
propagation direction of light through the plate.
When light, in this case linearly polarized light, hits the plate we can
decompose the polarization into the components along the optic axis x̂
of the crystal and perpendicular to it ŷ.
The component polarized along the x̂-direction (optic axis) advances
faster compared to the ŷ component, since no > ne for calcite. Thus one
component will be phase shifted with respect to the other component.
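The input electric field can be written as:
$$\mathbf{E}_{in} = (E_x \hat{x} + E_y \hat{y})\, e^{i(kz - \omega t)}. \qquad (5.43)$$
The electric field inside the crystal is given as:
$$\mathbf{E}_{crystal} = E_x e^{i(k_e z - \omega t)}\, \hat{x} + E_y e^{i(k_o z - \omega t)}\, \hat{y}, \qquad (5.44)$$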

Figure 5.15 Retardation plates with optic axis along x and incident
polarization at 45 degrees to the optic axis. Left: λ/2-plate, the red
wave is shifted half a wavelength with respect to the blue wave, thus
flipping the E-vector at the output. Right:λ/4-plate, the red wave is
shifted by a quarter wavelength with respect to the blue wave, thus
producing circularly polarized light at the output.
with the notation:
$$k_o = \frac{2\pi}{\lambda}\, n_o, \qquad (5.45)$$
$$k_e = \frac{2\pi}{\lambda}\, n_e, \qquad (5.46)$$
where λ is the vacuum wavelength. After the wave has traveled a distance
d in the material the accumulated phase difference becomes:
$$\delta = \frac{2\pi}{\lambda}\, |n_o - n_e|\, d. \qquad (5.47)$$
If the thickness d of the material is chosen such that δ = ±π/2, light
polarized at 45 degrees to the optic axis will exit the plate circularly
polarized. We call this plate a quarter-wave plate or λ/4-plate. If, on the
other hand, the thickness d is chosen such that δ = ±π, light will exit the
plate linearly polarized, but with the polarization direction mirrored about
the optic axis, see fig. 5.15. We call this plate a half-wave plate or
λ/2-plate. The formalism of
the Stokes vector and the Poincare sphere can be used to describe the
polarization transformation performed by arbitrary wave plates. The
action of a general wave plate with retardation angle δ is described by
a rotation of the whole Poincare sphere by the angle δ around an axis
lying in the equatorial plane. The location of the rotation axis in the
equatorial plane is given by twice the angle of the optic axis with the
horizontal.
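The action of the two plates on light polarized at 45 degrees to the optic axis can be made concrete with Jones vectors (section 5.1.1). The sketch below is an illustration under stated assumptions: the optic axis is along x, a common overall phase is dropped, and the sign of the relative phase (which component lags) is a convention chosen here, not taken from the text. The zero-order quartz thickness uses the indices of Table 5.1 at 589 nm; real wave plates are often multi-order and thicker.

```python
import numpy as np

def plate_thickness(delta, wavelength, n_o, n_e):
    """Thickness d that gives retardation delta, from equation (5.47)."""
    return delta * wavelength / (2.0 * np.pi * abs(n_o - n_e))

def retarder(delta):
    """Jones matrix of a wave plate with optic axis along x.

    Only the relative phase delta between the x and y components is kept.
    """
    return np.array([[1.0, 0.0], [0.0, np.exp(1j * delta)]])

E_in = np.array([1.0, 1.0]) / np.sqrt(2.0)    # linear polarization at 45 degrees

E_half = retarder(np.pi) @ E_in               # half-wave plate
E_quarter = retarder(np.pi / 2.0) @ E_in      # quarter-wave plate

# Zero-order quarter-wave plate made of quartz at 589 nm
d_quarter_quartz = plate_thickness(np.pi / 2.0, 589e-9, n_o=1.5442, n_e=1.5533)

print("half-wave output:   ", np.round(E_half, 3))      # mirrored linear polarization
print("quarter-wave output:", np.round(E_quarter, 3))   # equal amplitudes, 90 degree phase shift
print(f"quartz quarter-wave thickness: {d_quarter_quartz * 1e6:.2f} micrometers")
```

The small birefringence of quartz is what makes it practical here: the same retardation in calcite would require a plate roughly twenty times thinner.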
Problem 5.14 Show graphically using the Poincare sphere, that circu-
larly polarized light entering a quarter-wave plate is transformed into
linearly polarized light and determine the polarization direction with
respect to the optic axis.
5.2.3 Optical activity, magneto-optical effect and

Faraday-rotators
So far, we have considered materials which show linear birefringence.
In these materials the eigen-polarizations are linear polarizations. There
are also materials showing circular birefringence, i.e. the refractive index
for left and right circularly polarized light is different. These materials
are called optically active. On the Poincare sphere a plate of circular
birefringent material performs a rotation of the Stokes vector around the
S3 -axis by an angle given by the optical path length difference for the
two circular components. You should convince yourself that this means
linear incident polarization is transformed to linear output polarization
with a rotated orientation.
Optical activity can arise from a chiral symmetry axis of the crystalline
structure, but also in amorphous materials like liquids or glasses, when
the constituent molecules have a chiral structure, i.e. are shaped similar
to left-handed or right-handed screws. This effect can be used to measure
the prevalence of one or the other stereoisomer of a chemical compound.
This is important in chemistry and e.g. in the food industry to measure in
a quick and reliable way the concentration of different sugars (dextrose,
fructose) in sweeteners.
In pure optics applications a more important effect is the magneto-
optic effect. Many materials show a slight change of the refractive index
for left and right handed circularly polarized light when exposed to a
magnetic field. The change in refractive indices is proportional to the
strength of the magnetic field and changes sign with the direction of the
magnetic field. Consider a rod of material inside a strong magnetic field
aligned with the rod axis. Linearly polarized light propagating along the
axis will be rotated in polarization. The important difference to natural
optical activity is here, that as you invert the propagation direction
the sense of polarization rotation will be the same. Looking naively at
Maxwell’s equations, we should expect that reversing the optical path
should be the same as reversing time – so what is going on here? The

trick is, of course, that we should also invert the direction of the applied
magnetic field13 , in order to have everything running backwards, but
this we don’t do by just using a mirror to retro-reflect the beam. The
magneto-optical effect, in this geometry also called the Faraday effect,
turns out to be enormously useful14 . We can build a diode for light, an
optical isolator, with the help of a Faraday rotator, that allows linear
polarized light to propagate (nearly) without suffering losses through
the diode in one direction, while light of any polarization propagating
in the backwards direction is absorbed or deflected from the input beam
path. This is quite neat for optical amplifiers in powerful laser systems,
where you want to be sure not to destroy your source by running the
amplifier accidentally backwards.
Problem 5.15 Consider a compound optical device consisting of an in-
put linear polarizer, a Faraday rotator with rotation angle 45 degrees,
followed by another (output) polarizer rotated at 45 degrees with re-
spect to the input polarizer. Work out the polarization transformations
for both the forward and backward direction and verify that the device
works as an optical isolator.
13 We have to invert the direction of all currents for complete time reversal, also the
currents in the source of the applied magnetic field.
14 Incidentally, much of the research done in the Quantop group at NBI uses the
Faraday effect in atomic vapors.
6
Interference of light
In this chapter we will study the interference of electromagnetic waves.

This is perhaps the most pronounced effect of waves: the ability to, well,
add up. At some points in space the light intensity increases, and at
other points light plus light gives nothing! This has some significant
applications in modern optical industry and technology where highly
reflecting mirrors and anti-reflection surfaces are important.
Many beautiful colour phenomena in nature have their origin in inter-
ference1 . Thin film interference gives rise to spectacular colour effects: oil
or turpentine spots on the street, soap-bubbles, the feathers of peacocks
and some humming birds, and butterfly wings are all common examples
of thin film interference.
The famous double slit experiment by Thomas Young from 1801 is
one of the most fundamental experiments in physics. It had far reach-
ing impact on our understanding of nature. The same experiment per-
formed with matter waves of electrons, neutrons, atoms, and even large
molecules played a major role in the development of quantum physics
and its interpretation. In the early days of quantum physics P.A.M.
Dirac wrote in his famous book “The principles of quantum mechanics”
on page 9: “Each photon then interferes only with itself. Interference
between two different photons never occurs”. This is sometimes referred
to as Dirac’s law for photon interference. Today, after almost a century
with major technological advances we know that this is not entirely true,
as one can observe high-contrast interference patterns by superposing
the light from two well stabilized but otherwise completely independent
lasers. Nevertheless, Dirac’s dictum helps to sharpen our ideas about
the term photon. In modern quantum optics the concept of interference
1 In contrast to the rainbow which is due to refraction and the wavelength
dependent index of refraction.
is generalized to include the interference of probability amplitudes for
experiment outcomes. For example, the interference of two identical
photons at a beam splitter leads to a genuine quantum optical interference
effect that shows up in the correlation of photodetection events at the
beam splitter output ports. This was shown by Hong, Ou, and Mandel,
Phys. Rev. Lett. 59, 2044-2046 (1987), and the effect is used nowadays as
a quality check for single-photon sources.
6.1 Coherence of light fields

Coherence of optical fields plays an important role for understanding
how optical fields can interfere in general. To cohere means ”to stick
together”. For an EM field coherence describes our ability to predict the
amplitude and phase of waves in time and space. The ideal monochro-
matic plane wave is fully coherent: at any given point in time and space
we can predict the frequency, amplitude, and phase with certainty and
thus have a complete description of the wave. Most real sources of elec-
tromagnetic waves do not deliver waves of this kind. Similar to the con-
cept of partially polarized light, we have to allow for statistical mixtures
of waves with inherent statistical fluctuations in their properties. Mix-
tures of waves with fluctuating frequency can be characterized by a co-
herence time τc , the time interval where we can consider the field to have
monochromatic wavelike character, i.e., known stable frequency, ampli-
tude and phase. Already if we consider a pulsed or chopped wave train,
it has several frequency components ∆ν that can be related to the co-
herence time by τc ∆ν ∼ 1. This follows from the Fourier representation2
of the wave. Assume we have a finite wave pulse given by
$$E(t) = \begin{cases} E_0 & \text{for } -\tau/2 \le t \le +\tau/2 \\ 0 & \text{else} \end{cases} \qquad (6.1)$$
The Fourier transform of this field is the sinc function, so the intensity
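2 The Fourier transform of a function f is given by:
$$\hat{f}(\omega) = F(f(t)) = \int_{-\infty}^{+\infty} f(t)\, e^{-i\omega t}\, dt$$
$$f(t) = \frac{1}{2\pi} \int_{-\infty}^{+\infty} \hat{f}(\omega)\, e^{+i\omega t}\, d\omega$$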
Figure 6.1 Fourier bandwidth of time chopped source. The relation

between time and frequency bandwidth is τc ∆ν ∼ 1.
I(ω) becomes:
$$I(\omega) = E_0^2\, \tau^2\, \mathrm{sinc}^2\left(\frac{\omega \tau}{2}\right). \qquad (6.2)$$
In figure 6.1 we show a graphical representation of this. One way to
model a partially coherent wave field is to stitch rectangular
pulses together and allow for random phase jumps at the beginning
of each pulse.
Coherence time τ and source bandwidth ∆ν: (6.3)
τc ∆ν ∼ 1
Coherence length:
lc = cτc
Given the coherence time we can also define a (longitudinal) coherence

length by multiplying with the speed of light. For a monochromatic wave
τc is infinite and the spectral bandwidth is described by a delta function
in frequency.
Problem 6.1 Show that for the chopped cosine wave train
$$E(t) = \begin{cases} E_0 \cos(\omega_0 t) & \text{for } -\tau/2 \le t \le +\tau/2 \\ 0 & \text{else} \end{cases} \qquad (6.4)$$
the coherence time is related to the bandwidth as ∆ν · τ = 0.88.
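The number 0.88 in Problem 6.1 can be checked numerically. The sketch below rests on two assumptions that are not spelled out in the problem: the bandwidth ∆ν is taken as the full width at half maximum of the intensity spectrum, and ω0 τ is large enough that the spectrum near ω0 has the sinc² shape of equation (6.2), shifted to ω0. The helper names are illustrative.

```python
import math

def sinc2(x):
    """sinc^2 with the unnormalized convention sinc(x) = sin(x)/x."""
    return 1.0 if x == 0.0 else (math.sin(x) / x) ** 2

def half_max_point(lo=0.0, hi=math.pi, tol=1e-12):
    """Find x in (0, pi) with sinc2(x) = 1/2 by bisection.

    sinc2 decreases monotonically from 1 to 0 on [0, pi].
    """
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if sinc2(mid) > 0.5:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

x_half = half_max_point()
# With x = (omega - omega0) * tau / 2, the full width is d_omega = 4 * x_half / tau,
# so d_nu * tau = d_omega * tau / (2 pi) = 4 * x_half / (2 pi).
bandwidth_time_product = 4.0 * x_half / (2.0 * math.pi)
print(f"x_half = {x_half:.5f},  delta_nu * tau = {bandwidth_time_product:.3f}")
```

The result is consistent with the order-of-magnitude relation τc ∆ν ∼ 1; the exact prefactor depends on how the width of the spectrum is defined.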
In a real source such as a gas discharge lamp violent collisions
take place all the time. These mostly elastic collisions
perturb the atoms (considered here as monochromatic sources of light)
and randomly change the phase of the waves. Typically, the yellow light
from a sodium lamp has a frequency bandwidth of a few GHz. Other
gas discharge lamps can have other bandwidths. Some sources have
bandwidths in the nm range (single-color LEDs). Incandescent thermal
light sources, like the sun or ordinary bulbs used for illumination, provide
light with even broader bandwidth and very short coherence lengths.
The very best laser sources have bandwidths in the 50 mHz range.
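To get a feeling for the scales involved, the relations τc ∆ν ∼ 1 and lc = c τc from (6.3) can be evaluated for the sources mentioned above. This is an order-of-magnitude sketch: the prefactor in τc ∆ν ∼ 1 is taken as exactly 1, and the specific numbers for the sodium lamp (2 GHz) and the LED (30 nm bandwidth at 630 nm) are hypothetical example values, not data from the text.

```python
import math

C = 299_792_458.0  # speed of light in vacuum, m/s

def coherence_length(bandwidth_hz):
    """l_c = c * tau_c with tau_c = 1 / delta_nu (order-of-magnitude estimate)."""
    return C / bandwidth_hz

def bandwidth_from_wavelength(center_wavelength, delta_lambda):
    """Convert a wavelength bandwidth to a frequency bandwidth: c * dl / l^2."""
    return C * delta_lambda / center_wavelength**2

sources = {
    "sodium lamp (2 GHz, assumed)": 2e9,
    "LED (30 nm at 630 nm, assumed)": bandwidth_from_wavelength(630e-9, 30e-9),
    "very narrow laser (50 mHz)": 50e-3,
}
for name, delta_nu in sources.items():
    print(f"{name:32s} delta_nu = {delta_nu:9.3e} Hz   l_c = {coherence_length(delta_nu):9.3e} m")
```

The estimates span many orders of magnitude, from micrometers for the LED to millions of kilometers for the narrowest lasers, which is why path length differences matter so differently from one interference experiment to another.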
Problem 6.2 In the above estimate for the coherence time of light from
a sodium lamp we have considered only collisions as the mechanism for
random phase changes. We have neglected the finite natural lifetime of
the atomic excited state. This lifetime is about 16 ns for the relevant
energy level in sodium. Is it reasonable to neglect this contribution?
Experiments to demonstrate interference must take source bandwidth
and coherence length into account. Typical optical path length differ-
ences in the experiment must be kept below the coherence length of
light from the source. At longer path length difference interference ef-
fects will wash out and may not be observed.
Besides the longitudinal coherence length, the transverse coherence
length needs to be considered as well in interference experiments. All
real light sources have a finite size of the emitting surface and typically
the light emitted from different surface patches of the source will be
statistically independent. This means, if we try to superpose the waves
from different surface patches to see interference, we will most likely fail.
In the next chapter, we will see how diffraction limits our ability to know
the exact location of a source – at first sight paradoxically, this limitation
can be used to restore interference with light from real sources and even
put to good use in astronomy to measure the angular distance between
close stars with a Michelson stellar interferometer. We will come back to
this in the chapter on diffraction, but for now we want to look at general
properties of interference.
6.1.1 Interference - general considerations

Assume two monochromatic fields (hence with stable amplitudes) that
meet at a point P far from the two sources. Generally, we can write up
the total (real) electric field as:
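$$\mathbf{E}_{tot}(\mathbf{r}, t) = \mathbf{E}_1(\mathbf{r}, t) + \mathbf{E}_2(\mathbf{r}, t)$$
$$= \mathbf{E}_{01} \cos(\mathbf{k}_1 \cdot \mathbf{r} + \varphi_1) + \mathbf{E}_{02} \cos(\mathbf{k}_2 \cdot \mathbf{r} + \varphi_2). \qquad (6.5)$$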
Here, the phases ϕi = ωi t + φi contain the explicit time dependence

and the phase offset at t = 0. Placing a photodetector at position r, the
measurable quantities are intensities. So, we need to look for the time
average of $\mathbf{E}_{tot}^2$, which is proportional to intensity. For the squared field
we find:
$$\mathbf{E}_{tot}^2(\mathbf{r}, t) = \mathbf{E}_1^2 + \mathbf{E}_2^2 + 2\, \mathbf{E}_1 \cdot \mathbf{E}_2. \qquad (6.6)$$

Figure 6.2 Interference of two monochromatic point sources.
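The intensity³ becomes (modulo 1/2ε0 c):
\[ \begin{aligned}
\langle E_{tot}(\mathbf{r},t)^2\rangle &= I_1 + I_2 + 2\,\mathbf{E}_{01}\cdot\mathbf{E}_{02}\,\langle\cos(\mathbf{k}_1\cdot\mathbf{r}+\varphi_1)\cos(\mathbf{k}_2\cdot\mathbf{r}+\varphi_2)\rangle \\
&= I_1 + I_2 + \mathbf{E}_{01}\cdot\mathbf{E}_{02}\,\langle\cos((\mathbf{k}_1-\mathbf{k}_2)\cdot\mathbf{r}+\varphi_1-\varphi_2) + \cos((\mathbf{k}_1+\mathbf{k}_2)\cdot\mathbf{r}+\varphi_1+\varphi_2)\rangle \\
&= I_1 + I_2 + \mathbf{E}_{01}\cdot\mathbf{E}_{02}\,\langle\cos(\delta)\rangle
\end{aligned} \]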

with the phase δ of the interference term
δ = (k1 − k2 ) · r + ϕ1 − ϕ2 (6.7)
and
\[ I_1 = \langle E_1^2\rangle, \qquad I_2 = \langle E_2^2\rangle. \tag{6.8} \]
3 The time average is defined as $\langle f(t')\rangle_T = \frac{1}{T}\int_t^{t+T} f(t')\,dt'$. In principle, for
non-monochromatic fields we may have to let $T \to \infty$.
Here, we can draw two important conclusions: First, if the two fields
oscillate at the same frequency the interference phase and hence the
interference pattern is stable in time. For fields with different frequencies
a beat note at the frequency difference is observed. Whether or not this
beat note is actually registered with the photodetector depends on how
fast the photodetector reacts to changing intensity. Secondly, fields that
are in orthogonal polarization states will not produce interference by
virtue of the dot product structure of the interference term.
Problem 6.3 Let the sources S1 and S2 in figure 6.2 be randomly po-
larized. Would you be able to see interference?
Assume the two fields to have the same frequency and to be in the
same polarization state. Then we can write the intensity as:
\[ I = I_1 + I_2 + 2\sqrt{I_1 I_2}\,\cos(\delta). \tag{6.9} \]
Varying the phase δ the intensity takes the maximum value:
\[ I_{max} = I_1 + I_2 + 2\sqrt{I_1 I_2}, \quad \delta = m\cdot 2\pi, \quad m = 0, \pm 1, \pm 2, \ldots \tag{6.10} \]
and the minimum value:
\[ I_{min} = I_1 + I_2 - 2\sqrt{I_1 I_2}, \quad \delta = (2m+1)\cdot\pi, \quad m = 0, \pm 1, \pm 2, \ldots \tag{6.11} \]
Notice, at destructive interference the field intensity is at a minimum

but not necessarily at zero.
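Equations (6.9) to (6.11) are easy to check numerically. The following is a minimal sketch, not part of the original notes; the function names are illustrative, and the contrast (Imax − Imin)/(Imax + Imin) is a common convention assumed here rather than a quantity defined in the text.

```python
import math

def two_beam_intensity(i1, i2, delta):
    """Equation (6.9): two beams of equal frequency and polarization."""
    return i1 + i2 + 2.0 * math.sqrt(i1 * i2) * math.cos(delta)

def fringe_contrast(i1, i2):
    """(Imax - Imin) / (Imax + Imin), with Imax and Imin from (6.10) and (6.11).

    The contrast measure itself is an assumption of this sketch.
    """
    i_max = two_beam_intensity(i1, i2, 0.0)       # delta = 0: constructive
    i_min = two_beam_intensity(i1, i2, math.pi)   # delta = pi: destructive
    return (i_max - i_min) / (i_max + i_min)

print(fringe_contrast(1.0, 1.0))    # equal beams: the minimum reaches zero
print(fringe_contrast(1.0, 0.25))   # unequal beams: the minimum stays above zero
```

For unequal beam intensities the minimum stays above zero, which is the point made in the text.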
Example 6.4 If $I_1 = I_2 = I_0$ equation (6.9) becomes:
\[ I = I_0 + I_0 + 2I_0\cos(\delta) = 4I_0\cos^2\!\left(\frac{\delta}{2}\right). \tag{6.12} \]
Constructive interference: (6.13)
δ = m · 2π , m = 0, ±1, ±2, . . .
Destructive interference:
δ = (2m + 1) · π , m = 0, ±1, ±2, . . .
6.2 Young’s experiment

In 1801 Thomas Young devised a clever experiment to decide whether
light is actually a wave or a stream of particles, a matter of great debate
in those days. He did not have access to the coherent light sources
we have today, so he had to use a trick. His light sources had a very small
coherence length, but in addition were also spatially incoherent, i.e. the
transverse coherence length was extremely short. By allowing only light
waves from the source which passed through a small hole to enter the
double slit setup he could select a spatially coherent part of the wave
field for his experiment. Looking at fig.6.3 you can think of waves to
the left of screen 1 to come from all directions and to be out of phase.
After the small hole in the screen (point source) we have by diffraction
spherical waves with a suitable coherence length. It is important to have
the diffraction pattern from the hole covering both openings in screen 2
to be able to see interference fringes in the plane of observation4 . The
separation between the two holes (point sources) in screen 2 we call d and
the plane of observation is placed at a distance L away from our point
sources. If $L \gg d$ we can assume the angle θ of rays with the normal
to screen 2 to be the same for both k1 and k2 . To simplify matters we
neglect all effects of finite hole size, i.e. we assume ideal “point sources”.
Later in the chapter on diffraction we will use a more realistic model.
As seen from geometry in figure 6.3, the wave from point source 2 will
lag the one from point source 1 by an amount of d sin(θ). So,
4 If you should omit the first hole in the experiment your source must be spatially
coherent over a distance longer than d. A laser source will provide that, but also
some low pressure gas lamps such as sodium lamps. Young did not have access to
those.
Figure 6.3 In 1801 Thomas Young carried out a double slit exper-
iment to decide whether light was waves or particles. A small hole
pre-selects a spatially coherent part of the light. The spherical waves
hit two holes. The interference pattern is observed on a screen placed
a distance $L \gg d$ away. The intensity on the screen is $I = 4I_0\cos^2(\delta/2)$.
\[ \delta = (\mathbf{k}_1-\mathbf{k}_2)\cdot\mathbf{r} + \varphi_1 - \varphi_2 = \frac{2\pi}{\lambda}\,d\sin(\theta). \tag{6.14} \]
Since we assume the waves from the source to be in phase at the openings
in screen 2, we have ϕ1 = ϕ2 . Increasing now the observation angle θ,
every time source 1 is ahead of source 2 by an additional full wavelength
λ, the phase will increase by 2π. This means, constructive interference
will take place at observation angles:
\[ \sin(\theta) = m\cdot\frac{\lambda}{d}, \quad m = 0, \pm 1, \pm 2, \ldots \tag{6.15} \]
see figure 6.4. The intensity distribution becomes:

Figure 6.4 Intensity distribution in Young’s interference experiment.

Notice the location of maxima and minima depends on the wave-
length, except at sin(θ) = 0.
\[ I = 4I_0\cos^2\!\left(\frac{\delta}{2}\right). \tag{6.16} \]
Remark A remark to the above equation. As the waves from the two
openings are assumed spherical, the intensity for both source 1 and 2
scales as 1/R2 where R is the distance. That means the interference
cannot be complete, except at the center of the observation plane, where
waves from both sources have traveled the same distance. Everywhere else
in the observation plane the distances to the two sources are different.
In figure 6.4 we show the intensity distribution as a function of sin(θ),
as it could be observed in the laboratory with monochromatic light.
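The fringe positions from equation (6.15) and the intensity from equation (6.16) can be tabulated with a few lines of code. This is a sketch under the ideal point-source assumptions of the text (the 1/R² fall-off from the remark is ignored); the function names and the example values (a 633 nm source and a slit separation of 0.1 mm) are assumptions chosen for illustration.

```python
import math

def young_maxima_angles(wavelength, d):
    """Angles (in rad) of constructive interference, sin(theta) = m * wavelength / d."""
    m_max = math.floor(d / wavelength)
    angles = {}
    for m in range(-m_max, m_max + 1):
        # Clamp to [-1, 1] to guard against rounding when d is a multiple of the wavelength.
        s = max(-1.0, min(1.0, m * wavelength / d))
        angles[m] = math.asin(s)
    return angles

def young_intensity(theta, wavelength, d, i0=1.0):
    """Equation (6.16) with the phase delta from equation (6.14)."""
    delta = 2.0 * math.pi * d * math.sin(theta) / wavelength
    return 4.0 * i0 * math.cos(delta / 2.0) ** 2

angles = young_maxima_angles(633e-9, 0.1e-3)   # illustrative values
for m in (0, 1, 2):
    print(m, angles[m], young_intensity(angles[m], 633e-9, 0.1e-3))
```

Changing the wavelength moves every maximum except the central one, in line with the caption of figure 6.4.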
Problem 6.5 Express the intensity I measured on the plane of obser-

vation in terms of coordinates: y along the plane of observation and L.
Assume the incoming wave on screen 2 to be a plane wave.
6.3 Thin films

Oil on a wet road, soap bubbles, butterfly wings and some bird feathers
produce impressive colour effects. They are all a result of interference
in thin films. By thin films we understand transparent material layers
with a thickness comparable to the incoming wavelength, i.e., a few 100
nm for waves in the optical domain. Assume a geometry as shown in
Figure 6.5 Thin film of thickness d. The small electric field arrows
indicate when the field changes phase by π. This happens only at the
first interface as n1 < n2 .
figure 6.5, a freestanding oil film in air illuminated by light at normal

incidence.
Suppose the film is composed of oil with refractive index n2 = 1.5 and
the surrounding medium is air with n1 = 1. From Fresnel’s equations⁵
we can obtain the amount of reflected and transmitted power and the
phase shift for each interface. For the field amplitudes we have:

\[ r_A = \frac{n_1 - n_2}{n_1 + n_2} = -0.2, \]
so we have a ±π shift at surface A and only 4% reflected power. For

surface B:

\[ r_B = \frac{n_2 - n_1}{n_1 + n_2} = +0.2. \]
At surface B we have no phase shift and a fraction of $0.2^2 \cdot 0.96 = 0.0384$
of the power is reflected. That is about 3.8% of the initial power. The
total phase shift between the reflected beam from surface A and the ray
going through the film and returning back from surface B is then given
by:
Thin film phase shift:
\[ \delta = \frac{4\pi d}{\lambda_{oil}} + \pi = \frac{4\pi d\,n_2}{\lambda_0} + \pi \tag{6.17} \]
5 We assume s-polarization here to make the sign conventions less confusing.

λ0        δ       cos²(δ/2)
400 nm    3π      0
500 nm    2.6π    0.35
650 nm    2.2π    0.9
Table 6.1 Power reflected from a thin film with thickness d = 133 nm
at various wavelengths.
since we travel twice the distance d and have one phase shift of π. Obviously,
we cannot have complete destructive interference in this case,
as the 3.7 % that leaves the film again (0.0384 · 0.96) cannot match the 4 % reflected from the top surface, but the
interference contrast is still very high.
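The entries of table 6.1 follow directly from equation (6.17). The sketch below recomputes them for d = 133 nm and n2 = 1.5; the function names are illustrative, and only the two-beam factor cos²(δ/2) is evaluated, not the absolute reflected power.

```python
import math

def thin_film_phase(d, n_film, wavelength_vacuum):
    """Equation (6.17): round trip through the film plus one pi shift at surface A."""
    return 4.0 * math.pi * d * n_film / wavelength_vacuum + math.pi

def relative_reflection(d, n_film, wavelength_vacuum):
    """Two-beam interference factor cos^2(delta / 2) as listed in table 6.1."""
    return math.cos(thin_film_phase(d, n_film, wavelength_vacuum) / 2.0) ** 2

for lam_nm in (400, 500, 650):
    lam = lam_nm * 1e-9
    delta_in_pi = thin_film_phase(133e-9, 1.5, lam) / math.pi
    print(lam_nm, "nm:", round(delta_in_pi, 2), "pi,",
          round(relative_reflection(133e-9, 1.5, lam), 2))
```

The same two functions can be reused for Problem 6.7 by scanning the thickness d instead of the wavelength.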
Problem 6.6 Above we only took one reflection on the oil air interface
into account which is a very good approximation, why? For incident
plane waves there will naturally be an infinite number of reflections
occurring. To get a more correct expression for the amount of reflected
power, sum up the amplitudes for all beam paths and give a general
formula for the reflected power, assuming n1 < n2 .
Problem 6.7 Assume d to be small. What thickness is required for

having destructive interference (m = 1) for λ0 = 400 nm, 500 nm and
600 nm?
Example 6.8 Assume a thin film of thickness 133 nm. We shine white
light on the film. What colour will the film appear to have?
As we can see from table 6.1 the intensity reflected at 650 nm is signif-
icant. So the film appears red! At viewing angles different from normal
incidence the film will take other colours since d effectively changes.
One important application of thin films is anti-reflection coatings min-
imizing the amount of reflected light from a refracting surface. Those
reflections create unwanted, shifted and blurry images. In addition, you
lose power in an optical system. In optics experiments reflections pro-
duce ghosting (multiple interference patterns) which ultimately sets a
limit on the performance of your set-up. Here is one way to avoid that.
Consider a film with refractive index n2 deposited on a glass surface, as
shown in figure 6.6.
Let the index of the glass substrate be n3 > n2 > n1 . In that case the
Figure 6.6 Anti reflection coating of a glass surface.
reflection phases from surface 1 and surface 2 are the same and we
find
\[ \delta = \frac{4\pi t\,n_2}{\lambda_0} = \frac{4\pi t}{\lambda_{film}}. \]
If we choose the film thickness to be $t = \lambda_{film}/4$, we have $\delta = \pi$ and the
field components reflected from surface 1 and 2 cancel to first order.
Example 6.9 Let $n_1 = 1$, $n_2 = 1.35$ and $n_3 = 1.8$. We find
\[ I_1 = I_0\left(\frac{n_1 - n_2}{n_1 + n_2}\right)^2, \quad\text{and}\quad I_2 = I_0\left(\frac{n_2 - n_3}{n_2 + n_3}\right)^2, \]
so the reflected intensity becomes $I = I_1 - I_2 = 7.6\cdot 10^{-2}\,\%$. With no
anti-reflection coating we get $I \simeq 8\%$.
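The single-surface reflectances that enter this example, and the quarter-wave thickness t = λfilm/4, can be evaluated as follows. This is only a sketch: the function names are illustrative and the design wavelength of 550 nm is an assumption, since the text does not specify one.

```python
def surface_reflectance(n_a, n_b):
    """Normal-incidence power reflectance of a single interface between n_a and n_b."""
    return ((n_a - n_b) / (n_a + n_b)) ** 2

def quarter_wave_thickness(wavelength_vacuum, n_film):
    """Film thickness t = lambda_film / 4 = lambda_0 / (4 * n_film)."""
    return wavelength_vacuum / (4.0 * n_film)

n1, n2, n3 = 1.0, 1.35, 1.8
print("uncoated glass:", surface_reflectance(n1, n3))
print("surface 1:", surface_reflectance(n1, n2))
print("surface 2:", surface_reflectance(n2, n3))
# 550 nm is an assumed design wavelength, not a value from the text.
print("quarter-wave thickness in m:", quarter_wave_thickness(550e-9, n2))
```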
Problem 6.10 Show that for given n3 > n2 > n1 the best choice of
coating, $n_2 = \sqrt{n_1 n_3}$, results in zero reflection to first order.
6.4 Interferometers and their applications

We will discuss three types of interferometers, the Michelson-Morley
interferometer, the Mach-Zehnder interferometer, and finally the Sagnac
interferometer. In figures 6.7, 6.8 and 6.9 we show the amplitude splitting
interferometers6 .
6 Interferometers based on semitransparent mirrors or beam splitters are called
amplitude splitting interferometers. Another type based on beam dividers, as e.g.
the Fresnel biprism, is referred to as wavefront splitting interferometers. The
Figure 6.7 Michelson-Morley interferometer. The beam splitter is

coated on one side to give 50/50 reflection, while the backside is
AR-coated to give no reflection.
In the Michelson-Morley interferometer light from a source is split by

a coated beam splitter 50/50 (50% transmitted and 50% reflected). Two
mirrors reflect the beams back towards the 50/50 beam splitter where
they recombine in output port A to produce interference. If a perfect
plane wave is injected into port B of the interferometer and the mirrors
at the end of the two arms L1 and L2 are aligned for perfect retro-
reflection the interference phase will be constant over the cross section
of the beam exiting port A. This means the brightness at the output will
be constant over the cross section and changing from complete darkness
to the incident intensity as the path length difference is varied by moving
one end mirror or by changing the refractive index in one of the arms.
In practice, there is often a slight misalignment of the retro-reflecting
mirrors, which leads to a characteristic stripe pattern (fringes) in the
output with the width of the fringes decreasing as the tilt between the
two interfering wavefronts increases. If, even for perfect alignment, the
curvature of wavefronts after passage through arms L1 and L2 is differ-
ent, a pattern of concentric rings appears at the output.
We can write up the phase shift due to reflection from beam splitter
and mirror surfaces as
double slit experimental setup can be considered a rudimentary wavefront

splitting interferometer.
• Air → glass π
• Air → mirror π
• Glass → air 0.
If we now assume the reflecting surface of the beam splitter to face
L1, the interference phase in output A becomes:
δA = 2k(L1 − L2 ) + 2π − π = 2k(L1 − L2 ) + π,
as we neglect the thickness of the beam splitter. The 2π phase shift in the
middle expression is associated to the two air-glass interface reflections
for path L1.
The Michelson interferometer is typically used in connection with
• Characterization of optical flats (parallelism of surfaces)

• Thickness of films or thin elements
• Precise length measurements using wavelength calibrated lasers
In fundamental physics it is used in connection with
• Gravitational wave detectors such as LIGO and Virgo

• To check whether space has a preferred direction (Historically: the
aether theory, present: quantum vacuum)
In the case of perfect alignment, what happens to the light when there
is destructive interference and no output in the detector arm A? Well, the
beams traveling in the arms are sent back by the beam splitter towards
the source. This can be quite a nuisance, because not all light sources
like it to be confronted with their past output, but fortunately there are
ways to avoid it. Let us make a sanity check of the results by looking
at energy conservation. For the interference phase of light reflected back
into port B we have:
δB = 2k(L1 − L2 ) + 3π − π = 2k(L1 − L2 ) + 2π = δA + π,
Now, energy conservation demands that the total output power equals
the input power:
\[ I_{tot} = I_A + I_B = \frac{I_0}{4}\,4\cos^2(\delta_A/2) + \frac{I_0}{4}\,4\cos^2(\delta_B/2) = I_0, \]
which works out nicely as expected. Notice, we are only using Maxwell’s
equations and conservation of energy comes out naturally.
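The energy balance between the two ports can also be verified numerically. The sketch below uses the interference phases δA and δB = δA + π given above and neglects the beam splitter thickness, as the text does; the function name and the 633 nm test wavelength are illustrative assumptions.

```python
import math

def michelson_outputs(i0, k, l1, l2):
    """Intensities in output port A and back-reflected port B of an ideal Michelson."""
    delta_a = 2.0 * k * (l1 - l2) + math.pi
    delta_b = delta_a + math.pi
    i_a = i0 * math.cos(delta_a / 2.0) ** 2   # (I0 / 4) * 4 cos^2(delta_A / 2)
    i_b = i0 * math.cos(delta_b / 2.0) ** 2   # (I0 / 4) * 4 cos^2(delta_B / 2)
    return i_a, i_b

k = 2.0 * math.pi / 633e-9   # assumed wavelength for the check
for path_difference in (0.0, 50e-9, 158.25e-9, 1e-6):
    i_a, i_b = michelson_outputs(1.0, k, path_difference, 0.0)
    print(path_difference, i_a, i_b, i_a + i_b)
```

With these phase conventions port A is dark for equal arm lengths and all the light returns towards the source.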
The Mach-Zehnder interferometer, shown in Fig.6.8 is similar to the
Figure 6.8 Mach-Zehnder interferometer.
Michelson interferometer, but does not share the nasty property of
reflecting one output beam back into the source. This allows both output
ports to be monitored. By taking sum and difference of the output signals in
ports A and B, we can double the signal and reject the influence of input
intensity fluctuations. For these reasons the Mach-Zehnder configuration
is more popular than the Michelson interferometer in many applications.
Note, that in a Mach-Zehnder interferometer one can also use an unbal-
anced input beam splitter, R1 /T1 together with a 50/50 beam splitter
at the output, if one has a delicate sample in arm L1 which limits the
amount of power one can send through this path. This is often used in
spectroscopy experiments, where the signal-to-noise ratio needs to be
optimized by using enough power at the detectors, while at the same
time keeping a delicate sample “alive”.
The Sagnac interferometer differs essentially from the two previous
configurations. Light is injected into the interferometer and split 50/50
to go clockwise and anti-clockwise around the interferometer. Since the
optical path length is, at least at first sight, exactly the same for both
directions by construction, it appears pointless to use it as a measure-
ment instrument. A path length difference can only occur due to effects
which differentiate between the two path orientations. One such effect
is a global rotation of the whole setup. If the whole interferometer is
rotating with angular frequency Ω there will be a phase difference between
the two directions when overlapped on detector D. The phase can
be calculated by considering the round trip times. We assume a circular
path for simplicity, around a circle with circumference 2πR and with
area πR2 . We take into account that the beam splitter moves while light
Figure 6.9 Sagnac interferometer. The whole setup, including the

input source and the detector, rotates at angular speed Ω.
is traveling around the loop. For the clockwise round trip time we have
\[ t_2 = \frac{2\pi R}{c - \Omega R} \]
and for the anti-clockwise sense we get
\[ t_1 = \frac{2\pi R}{c + \Omega R}. \]
Thus there will be an effective length difference traveled between the
two directions:
\[ \Delta L = c(t_2 - t_1) \simeq \frac{4A\Omega}{c} \]
and a corresponding phase change:
\[ \Delta\phi = \frac{8\pi A\Omega}{c\lambda}. \]
By measuring this phase change you can determine the rotation speed
of your frame⁷! Michelson and Gale were the first to measure the ab-
solute rotation speed of the Earth in this way in 1925, in an experiment performed on the
prairies west of Chicago. They used a rectangular optical loop 2/5 mile
7 The derivation above for the phase shift leads to the correct result, but is a little
questionable in the light of special and general relativity. If you want to learn
more, take a look at the articles published on the course homepage.
long and 1/5 mile wide evacuated to reduce absorption and index vari-
ations from the air. Theory predicted a shift 236/1000 of a fringe while
they measured 230/1000 of a fringe. Today with stable laser systems we
can measure below $10^{-8}$ rad/s and readily observe changes in the earth
rotation rate of 1-2 ms per 24 h.
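To get a feeling for the numbers, the Sagnac formulas can be applied to a loop of the size quoted for the Michelson and Gale experiment. The sketch below rests on several assumptions that are not stated in the text: a latitude of about 41.8° for the site, an effective wavelength of 570 nm, the sidereal rotation rate of the Earth, and the fact that only the rotation component perpendicular to the loop contributes (a factor sin(latitude)). The function names are illustrative.

```python
import math

C = 299_792_458.0          # speed of light in m/s
MILE = 1609.344            # metres per statute mile
EARTH_RATE = 7.2921e-5     # sidereal rotation rate of the Earth in rad/s

def sagnac_phase(area, omega, wavelength):
    """Phase change 8 pi A Omega / (c lambda) from the text."""
    return 8.0 * math.pi * area * omega / (C * wavelength)

def sagnac_fringe_shift(area, omega, wavelength):
    """Path difference 4 A Omega / c expressed in units of the wavelength."""
    return 4.0 * area * omega / (C * wavelength)

area = (2.0 / 5.0 * MILE) * (1.0 / 5.0 * MILE)        # rectangular loop from the text
latitude = math.radians(41.8)                          # assumed site latitude
omega_normal = EARTH_RATE * math.sin(latitude)         # component normal to the loop
print(sagnac_fringe_shift(area, omega_normal, 570e-9)) # 570 nm is an assumed wavelength
```

Under these assumptions the shift comes out at roughly a quarter of a fringe, the same order as the values quoted above.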
As you can see the phase difference is proportional to the area of the
interferometer. With a laser and up to 5000 m of optical fibre curled up
in a ring we have a powerful tool that can be used as sensor of rotation.
Today this type of Sagnac interferometer is commonly used in all high
performance satellites. The considerably smaller and self calibrating ring
laser gyroscopes also use the Sagnac effect to measure rate of rotation
and are frequently used in aviation.
7
Diffraction of light
Unlike particles, waves have the ability to bend around corners. This
phenomenon we call diffraction. So far we have entirely disregarded
this effect. We have assumed the limit of λ → 0 in this respect. In optics,
diffraction effects become pronounced when the wavelength of light is of
similar size as the characteristic length of an object illuminated by the
light. The sharp edges of any object in the way of a wave always give
rise to diffraction.
As waves bend around and redirect, interference will take place. So
the distinction of when light is self-interfering or diffracting is rather
diffuse. Later in this chapter you will see that Young’s experiment, which we
studied in the last chapter, can be handled equally well with the theory of
diffraction. We will start this chapter with studying the interference
from multiple sources. This has not only important applications, but it
will lead naturally to the more general problem of diffraction.
The historical Huygens-Fresnel principle states that the propagation
of a wavefront in space can be seen as the result of interference of light
emitted from closely spaced virtual point sources sitting on the wavefront
in all places where the wavefront is not obstructed by opaque obstacles.
In fact, this principle has been put later by Kirchhoff and Sommerfeld
onto a more solid theoretical foundation with their scalar diffraction the-
ory1 . We will not study this theory in any detail, but rather focus on
two limiting cases, Fresnel and Fraunhofer diffraction, where the latter
has important implications for the resolving power of imaging instru-
ments. It should be mentioned, that with the computing power we have
at our hands today it is possible to solve electromagnetic wave propa-
1 If you want to learn more, chapters 10-12 and Appendix 2 in Hecht “Optics”
contain an accessible introduction into scalar diffraction theory and many
application examples.
Figure 7.1 Interference of N coherent identical sources in phase. The

point P of observation is assumed to be far away compared to d, the
distance between the sources.
gation problems numerically to any desired accuracy starting from the

wave equation, material properties, and boundary conditions. Neverthe-
less, it is important to have a good understanding of limiting cases, like
the Fraunhofer diffraction regime, in order to be able to do physically
motivated sanity checks on the software written to solve the task at
hand.
7.1 Interference of N sources - the grating

We start by looking at the interference pattern from N coherent identical
sources, e.g. little dipoles, placed on a regular grid with spacing d, as
shown in fig.7.1. For simplicity, we assume first that all source dipoles
oscillate at the same frequency and in phase. This could be achieved by
illuminating, and hence exciting the array of dipoles with a plane wave
from the left. The point P of observation2 is assumed to be far away
compared to d, the distance between the sources. From our discussion of
Young’s experiment in the last chapter we know that the phase difference
of two waves arriving at the observation point from two adjacent sources
is given by the expression:
\[ \delta = \frac{2\pi}{\lambda}\,d\sin(\theta). \]
With this in mind we can write up the total electric field at point of
observation P:
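\[ E_P = E_0(r)\,e^{i(kr_1-\omega t)} + E_0(r)\,e^{i(kr_2-\omega t)} + \ldots + E_0(r)\,e^{i(kr_N-\omega t)}. \]
Now, since $\delta = k(r_2 - r_1)$, $2\delta = k(r_3 - r_1)$ and so forth we have
\[ E_P = E_0(r)\,e^{-i\omega t}e^{ikr_1}\left(1 + e^{i\delta} + e^{i2\delta} + \ldots + e^{i(N-1)\delta}\right). \]
This is just a geometric series which sums up to be
\[ E_P = E_0(r)\,e^{-i\omega t}e^{ikr_1}\,\frac{e^{i\delta N}-1}{e^{i\delta}-1} = E_0(r)\,e^{-i\omega t}e^{ikr_1}\exp\!\left[\frac{i(N-1)\delta}{2}\right]\frac{\sin(N\delta/2)}{\sin(\delta/2)}. \tag{7.1} \]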
The observed intensity distribution becomes:
Grating formula N sources:
\[ I(\delta) = I_0\left(\frac{\sin(N\delta/2)}{\sin(\delta/2)}\right)^2 \tag{7.2} \]
with
\[ \delta = \frac{2\pi}{\lambda}\,d\sin(\theta) \]
2 Strictly speaking, we consider a certain direction of observation specified by the

angle θ. In practice, with the help of a lens we can focus all parallel rays depicted
in Fig.7.1 onto a single point on an observation screen placed in the focal plane
of the lens.
The reference intensity I0 describes the brightness we would detect, if

only a single source was present. The intensity at δ = 0, for example in
the direction normal to the line of emitters, is given by:
\[ I(\delta = 0) = I_0 N^2. \]
We see that for large values of N the intensity into directions for which
δ = 0 becomes enormously higher, because there we have constructive
interference of all N sources. This happens in all directions satisfying
\[ m = \frac{d}{\lambda}\sin(\theta_m), \]
where m is a whole number specifying the diffraction order of the grat-
ing. Inspecting this expression, we observe that for a spacing d > λ we
get solutions other than the trivial one at θ0 = 0 and that the corre-
sponding angles depend on the wavelength. This suggests immediately
an application – if we drive the emitters in phase and measure those
principal diffraction angles we can determine the wavelength of light, if
we know the spacing between the scatterers. Looking at the expression
in equation (7.2) we can also see that in the vicinity of the diffrac-
tion orders, we can approximate the sine function in the denominator
by its argument (Taylor expansion). This means around the diffraction
orders the intensity distribution follows an oscillating $\mathrm{sinc}^2(N\delta/2)$ function.
The first side maximum is less than 5% of the principal maximum
in height, so the diffraction orders really stick out. As we increase the
number of emitters the width of the main peak decreases inversely pro-
portional to N . Summarizing, we have:
Properties of the grating formula:
\[ I(0) = I_0 N^2, \qquad \text{Peak FWHM: } \Delta\theta \simeq \frac{\lambda}{Nd} \tag{7.3} \]
Example 7.1 N = 2 gives I = 4I0 cos2 (δ/2) in agreement with the

results of chapter 6.
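The grating formula (7.2) is straightforward to evaluate, as long as the principal maxima, where numerator and denominator both vanish, are treated by their limit I0 N². The following sketch does this; the function name and the tolerance used to detect a vanishing denominator are illustrative choices.

```python
import math

def grating_intensity(delta, n_sources, i0=1.0):
    """Equation (7.2); at sin(delta / 2) = 0 the limiting value I0 * N^2 is returned."""
    s = math.sin(delta / 2.0)
    if abs(s) < 1e-12:                      # principal maximum, delta = 2 pi m
        return i0 * n_sources ** 2
    return i0 * (math.sin(n_sources * delta / 2.0) / s) ** 2

# N = 2 reproduces the two-source result 4 I0 cos^2(delta / 2) of Example 7.1.
print(grating_intensity(1.0, 2), 4.0 * math.cos(0.5) ** 2)

# Height of the first side maximum relative to the principal maximum for N = 100.
n = 100
print(grating_intensity(2.86 * math.pi / n, n) / grating_intensity(0.0, n))
```

The second printed ratio stays below 5%, consistent with the statement about the first side maximum.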
Since the peak value of the emitted intensity scales as N 2 while the
width of the peak becomes narrower with 1/N as the number of emitters is
increased, the total emitted power will increase proportional to N . This
is what we should expect for dipoles radiating independently. Returning

to the application of a wavelength measurement device, we want to figure
out now how small a wavelength change we can resolve with a given N -
grating. To do this we look at the condition for a diffraction maximum
\[ m\,2\pi = \frac{2\pi}{\lambda}\sin(\theta)\,d, \]
interpret λ as a function of θ and take the derivative
\[ m\,\Delta\lambda = d\cos(\theta)\,\Delta\theta. \]
We demand that the change in diffraction angle should be as large as

the width (FWHM) of the peak from equation (7.3) and find
Grating resolution:
\[ \frac{\Delta\lambda}{\lambda} = \frac{\cos(\theta)}{m\cdot N}. \tag{7.4} \]
This states the higher the diffraction order and the higher number of
sources the better is the resolution3 .
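As a numerical illustration of equation (7.4), the sketch below first finds the diffraction angle of order m from the grating condition and then evaluates the resolvable wavelength step. The function names and the example values (589 nm light, 2 µm spacing, 5000 illuminated sources) are assumptions chosen for illustration.

```python
import math

def diffraction_angle(wavelength, d, order):
    """Angle of the m-th principal maximum, sin(theta_m) = m * wavelength / d."""
    return math.asin(order * wavelength / d)

def resolvable_step(wavelength, d, order, n_sources):
    """Smallest resolvable wavelength change from equation (7.4)."""
    theta = diffraction_angle(wavelength, d, order)
    return wavelength * math.cos(theta) / (order * n_sources)

# Illustrative values only.
print(resolvable_step(589e-9, 2e-6, 1, 5000))
print(resolvable_step(589e-9, 2e-6, 3, 5000))   # higher order, smaller step
```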
7.2 The modified Huygens-Fresnel construction

To make the connection between our simple summation of the fields
emitted by several point sources and the Huygens-Fresnel idea, we have
to let the spacing between the emitters go to zero and their number go
to infinity, while keeping the overall strength of the source finite.
Let us see how this works out by considering a monochromatic spher-
ical wave emitted by a source S and arriving at an observation point P .
It is, of course, a trivial matter to write down immediately the field at
point P given the spherical wave. But we want to take a more compli-
cated route, by drawing a spherical wavefront around the source point
S and using now the whole spherical surface as the source of secondary
waves, which should conspire to produce by interference exactly the right
amplitude at the observation point P .
3 For high resolution we need ∆λ to be small. You can see that it also helps if the
diffraction angle is close to π/2.
Figure 7.2 Propagation of a spherical wave from source point S to

observation point P . In the Huygens-Fresnel construction secondary
sources are placed on a wavefront of radius ρ. The wavefront is divided
into ring-shaped zones of equal distance r to the observation point.
(Figure adapted from E.Hecht “Optics”.)
This construction will allow us also to calculate what happens, when

we obstruct parts of the wavefront or introduce phase shifts to parts of
the wavefront.
Let ρ be the radius of the sphere around the source point S and r0 be
the distance between the surface of the sphere and the observation point
P , which lies outside the sphere, as illustrated in Fig.7.2. All points on
the sphere have, of course, the same distance to the source point, so
all virtual sources on the sphere will oscillate in phase. We cover the
surface of the sphere with ring shaped infinitesimal surface elements dS
characterized by the polar angle ϕ in a spherical coordinate system with
the north pole (ϕ = 0) along the line connecting source and observation
point. All points on the same surface element have the same distance r to
the observation point, hence waves from virtual sources in those points
arrive in phase at the observation point. The surface normal vector at
any point of a single surface element makes an angle θn with the line
connecting to the observation point. As we will see shortly, this means
waves from all points of a ring shaped surface element will contribute
with the same strength to the field at P . We realize, that the secondary
waves from the spherical surface cannot be simple spherical waves. If this
was the case, then the secondary waves from all points on the sphere
would interfere constructively in the backward direction towards the
source point. But this is not what happens in practice, so we need to
modify the angular dependence for the secondary sources such that the
backward propagating wave is avoided. This is achieved by multiplying
spherical waves with an inclination factor K(θn ) = 1/2(1 + cos(θn )),
which describes the emission pattern of the virtual sources. We can now
write the resulting field dEP at the observation point as:
\[ dE_P = \frac{K(\theta_n)\,E_0}{\rho\,r\,\lambda}\exp\!\left(ik(\rho + r) - i\pi/2\right)dS. \]
Here, E0 is the strength of the primary source and we have omitted
the time dependence, since we deal with a monochromatic source. The
peculiar extra phase shift of π/2 has to be added to keep the end result
after integrating over the spherical surface the same as the expression
one gets from just writing the primary spherical wave amplitude at the
observation point. We now want to interpret the expression. We see,
that as we scan the sphere starting from the north pole (ϕ = 0), the
distance r to the observation point grows and the inclination angle θn
slowly increases. Since r enters the optical path length in the complex
exponential function, the phase angle varies as we move away from the
north pole. It makes sense to divide the surface of the sphere into zones
of similar distance to the observation point within half a wavelength. All
surface elements within a zone will contribute with the same sign to the
resultant field at P , while contributions of adjacent zones tend to cancel
each other. We call the zones Fresnel zones. A more in depth analysis
shows4 , that when we integrate over the whole sphere, the resulting field
is to a very good approximation half as big as the contribution from the
first polar zone.
So far, this has just been a complicated way to describe the propagation
of a spherical wave between two points, but can we make any
practical use of the zone construction? A first surprising result is that
we can quadruple the intensity at the observation point by blocking all
the light apart from the first polar Fresnel zone with an opaque screen.
But we can achieve even more – if we number the zones starting from the
polar region and block all even numbered zones, we observe the on-axis
intensity grows quadratically with the number of zones. Optical elements
based on this idea are called zone plates and are used as focusing ele-
ments in wavelength regions where transparent materials are not readily
4 See, e.g. chapter 10.3 in Hecht.
Figure 7.3 Cut through the intensity distribution behind a circular

aperture of radius a. The horizontal axis scale is the inverse of the
Fresnel number, i.e. proportional to the observation point distance.
The vertical axis is scaled with the aperture radius. Dashed lines
indicate the zeroes of the far-field diffraction pattern. (Figure adapted
from E.Hecht “Optics”.)
available. Instead of blocking the light from every other zone, we can
also phase delay the light from every other zone by half a wavelength
by using transparent layers of suitable thickness. This is the basic idea
behind Fresnel lenses, which have been used for centuries in lighthouses
and nowadays can be found in more mundane applications as magni-
fiers on mobile phone display screens. Diffractive optical elements can
save a lot of space and material and play an important role in modern
integrated optics systems.
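The statements about blocking Fresnel zones can be reproduced with a very crude amplitude model. The sketch below assumes that every zone contributes the same amplitude magnitude, with alternating sign from zone to zone, and that the unobstructed field equals half the contribution of the first zone, as quoted above. In reality the zone amplitudes decrease slowly, so this is only an idealisation; the function name is illustrative.

```python
def on_axis_intensity_ratio(open_zones):
    """On-axis intensity relative to the unobstructed wave for a set of open zones.

    Idealisation: each zone has unit amplitude, odd zones count +1, even zones -1,
    and the unobstructed field is half the first-zone contribution.
    """
    amplitude = sum(1.0 if zone % 2 == 1 else -1.0 for zone in open_zones)
    unobstructed = 0.5
    return (amplitude / unobstructed) ** 2

print(on_axis_intensity_ratio([1]))          # only the first zone open
print(on_axis_intensity_ratio([1, 2]))       # two zones cancel
print(on_axis_intensity_ratio([1, 3, 5]))    # zone plate with three open odd zones
```

In this model a zone plate with k open odd zones gives 4k² times the unobstructed intensity, which is the quadratic growth mentioned in the text.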
In general, we can now judge at least qualitatively what happens with
the intensity at the observation point, if we place a screen with a centered
hole of variable diameter between the primary source and the observa-
tion point. As we gradually decrease the diameter of the opening or
Figure 7.4 The Fresnel number $F = a^2/(R\lambda)$ determines which diffrac-
tion regime your set-up is in. For Fraunhofer diffraction observation
point and/or source point must be distant enough from the diffract-
ing aperture to fulfill $F \ll 1$. The Fresnel diffraction regime occurs
when $F \sim 1$, or equivalently, when several Fresnel zones are visible
in the aperture.
equivalently move the observation point further away from the screen,
the intensity at the observation point will oscillate. The intensity varies
between almost zero, when an even number of Fresnel zones is transmit-
ted, and some maximal values, whenever an odd number of Fresnel zones
covers the opening. When the opening becomes so small that only the
first Fresnel zone is transmitted, the intensity will vary smoothly from
four times the unobstructed intensity to zero. This behavior is shown in
Fig.7.3.
To set up a proper definition for the Fresnel number and to distin-
guish the different diffraction regimes for the case of a plane screen, we
consider an obstacle or small hole of diameter 2a, as shown in Fig.7.4,
between source and observation point. We ask to what extent the spher-
ical wavefronts from a given source or to a given observation point can
be approximated as planar wavefronts. When wavefronts may be con-
sidered as plane we speak of Fraunhofer diffraction and if they are
curved we speak of Fresnel diffraction.
Let us estimate the distances when this happens. There is a distance
R to the opening, so ∆, the deviation of the plane wavefront from the
curved wavefront becomes:

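\[ \Delta = R - \sqrt{R^2 - a^2} = R - R\sqrt{1 - \frac{a^2}{R^2}} \simeq \frac{a^2}{2R} \]
For the wavefront to be approximated as planar we must have $\Delta \ll \lambda$,
so
Fraunhofer regime:
\[ R \gg \frac{a^2}{\lambda} \tag{7.5} \]
Fresnel regime:
\[ R \sim \frac{a^2}{\lambda} \]
Fresnel number:
\[ F = \frac{a^2}{R\lambda} \]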
The Fresnel number essentially counts how many Fresnel zones are
“visible” in the opening. We see that the Fraunhofer regime is charac-
terized by a Fresnel number much smaller than one, corresponding to
the case where a fraction of the first Fresnel zone already completely
covers the hole. This is the most common diffraction regime and we will
focus on the Fraunhofer regime in the following.
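As a rough illustration of how the Fresnel number separates the two regimes, the following minimal sketch evaluates F = a²/(Rλ) for a few distances. The numerical values (a 0.5 mm pinhole radius, 633 nm light) and the threshold 0.1 used to stand in for "F ≪ 1" are assumptions chosen for the example, not values from the text.

```python
def fresnel_number(a, R, wavelength):
    """Fresnel number F = a^2 / (R * lambda) for aperture radius a and distance R."""
    return a**2 / (R * wavelength)


def diffraction_regime(F, far_field_limit=0.1):
    """Rough classification. The limit 0.1 standing in for 'F << 1' is an assumption."""
    if F < far_field_limit:
        return "Fraunhofer"
    return "Fresnel"


# Illustrative numbers (not from the text): 0.5 mm pinhole radius, 633 nm light.
a = 0.5e-3
wavelength = 633e-9
for R in (0.1, 1.0, 10.0):
    F = fresnel_number(a, R, wavelength)
    print(f"R = {R:5.1f} m   F = {F:7.4f}   {diffraction_regime(F)}")
```

With these numbers the set-up only reaches the Fraunhofer regime at distances of several meters, which shows how quickly the required distance grows with the aperture size.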
7.3 Fraunhofer diffraction

Figure 7.5 Fraunhofer diffraction geometry. Both the illumination source (not shown) and the detector plane (shown) are placed at distances fulfilling R_{s/o} ≫ a²/λ from the diffracting screen.

In figure 7.5 we show the case where a wave field hits a screen with a hole or some other sort of opening. We call this aperture S with characteristic length a (e.g. radius for a circular opening). At some distance R ≫ a²/λ the wavefront appears plane and we are in the correct regime for talking about Fraunhofer diffraction. Using Huygens’ principle we consider spherical wavelets

$$\Psi = \frac{A}{r}\, e^{i(kr-\omega t)}$$

emerging from the aperture⁵. Now it is our job to sum those spherical
wavelets up at some point of observation. The phase and field inten-
sity are constant across the aperture. The field contribution from the
differential surface element dS becomes:
$$dE = \frac{E_A}{r}\, e^{i(kr-\omega t)}\, dS,$$
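where E_A is a source strength with units electric field per unit length. Coordinates (y, z) denoted with small letters belong to the aperture screen while capital coordinates (X, Y, Z) are observation coordinates. Let us look at the length of r:

$$
\begin{aligned}
r &= \left[X^2 + (Y-y)^2 + (Z-z)^2\right]^{1/2} \\
  &= R\left[\frac{X^2+Y^2+Z^2}{R^2} + \frac{y^2+z^2}{R^2} - 2\,\frac{yY+zZ}{R^2}\right]^{1/2} \\
  &\simeq R - \frac{yY+zZ}{R}
\end{aligned}
\tag{7.6}
$$

Here R² = X² + Y² + Z², so the first term in the bracket equals one. We neglect the term (y² + z²)/R² and expand the square root to first order.

5 Note that we neglect the inclination factor, since all angles in the problem can be considered small.
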
Notice the phase is linear in the screen coordinates (y, z) in this approx-
imation. This is a characteristic of Fraunhofer diffraction. We can now
sum up our field contributions over S:
Fraunhofer diffraction formula:
$$E = \frac{E_A\, e^{i(kR-\omega t)}}{R} \int_S e^{-ik(yY+zZ)/R}\, dS \tag{7.7}$$
Here we substituted r by R in the prefactor. In the phase kr of the integrand this is not possible: the difference between r and R is not small compared to the wavelength, so the integrand is a rapidly oscillating function.
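We can also write equation (7.7) as

$$
\begin{aligned}
E &= \frac{E_A\, e^{i(kR-\omega t)}}{R} \int_S e^{-ik(yY+zZ)/R}\, dS \\
  &= \frac{E_A\, e^{i(kR-\omega t)}}{R} \iint A(y,z)\, e^{-ik(yY+zZ)/R}\, dS,
\end{aligned}
\tag{7.8}
$$

where

$$
A(y,z) = \begin{cases} 1 & \text{for } (y,z) \in S \\ 0 & \text{else} \end{cases}
\tag{7.9}
$$
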
is the aperture function describing the transmissivity of the aperture screen. We see that we can also handle a semi-transparent aperture, where a (y, z) position dependent absorption⁶ takes place. The most important thing to notice: the Fraunhofer diffraction pattern is “just” the Fourier transform⁷ of the aperture function A(y, z).

Figure 7.6 Fraunhofer diffraction by a single slit of width a and length l ≫ λ.

Fraunhofer diffraction is the Fourier transform of the aperture function A(y, z) (7.10)
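The Fourier-transform statement can be checked numerically for a one-dimensional aperture. The sketch below is only an illustration under stated assumptions: the helper name `fraunhofer_field_1d`, the midpoint-rule quadrature, and the numbers (a slit ten wavelengths wide, 500 nm light) are choices made for this example. It sums the wavelets A(z) exp(−ikz sin θ) over the aperture and compares the result with the closed-form sinc expression for a uniform slit.

```python
import cmath
import math


def fraunhofer_field_1d(aperture, z_min, z_max, k, sin_theta, n=2000):
    """Midpoint-rule estimate of the integral of A(z) * exp(-i k z sin(theta)) dz."""
    dz = (z_max - z_min) / n
    total = 0j
    for j in range(n):
        z = z_min + (j + 0.5) * dz
        total += aperture(z) * cmath.exp(-1j * k * z * sin_theta)
    return total * dz


# Illustrative numbers (assumptions): 500 nm light, slit of width a = 10 wavelengths.
wavelength = 500e-9
k = 2 * math.pi / wavelength
a = 10 * wavelength


def slit(z):
    """Aperture function: 1 inside the slit, 0 outside."""
    return 1.0 if abs(z) < a / 2 else 0.0


sin_theta = 0.05
numeric = fraunhofer_field_1d(slit, -a, a, k, sin_theta)
beta = k * a * sin_theta / 2
analytic = a * math.sin(beta) / beta  # closed form for a uniform slit
print(abs(numeric), analytic)
```

Replacing `slit` by any other real or complex function of z gives the pattern of a semi-transparent or phase-shifting aperture with the same routine.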
If the aperture under consideration is asymmetric with one dimension huge compared to the wavelength⁸ we only need to integrate over the “narrow” variable where diffraction takes place.

6 With a complex aperture function we can even take position dependent phase shifts into account.
7 With our choice of signs it is actually the inverse Fourier transform, but this plays no role as soon as we look at intensities.
8 This means the Fresnel number for this direction is much bigger than one.

Example 7.2 Single elongated slit

See figure 7.6 for the geometry. As l ≫ λ we need only to integrate one variable. The electric field becomes:

$$
E = \psi \int_{-a/2}^{a/2} e^{-ikzZ/R}\, dz
  = \psi \cdot a \cdot \frac{\sin\!\left(\frac{kZa}{2R}\right)}{\frac{kZa}{2R}}
\tag{7.11}
$$

where we named

$$\psi = \frac{E_L\, e^{i(kR-\omega t)}}{R}.$$
The source strength E_L now has dimensions of an electric field, since we only integrate over one coordinate. If we put sin(θ) = Z/R the intensity can be written

Fraunhofer diffraction single slit:
$$I = I_0 \left[\frac{\sin(\beta)}{\beta}\right]^2 \tag{7.12}$$
with
$$\beta = ka\sin(\theta)/2$$

In figure 7.7 we plot the sinc expression as a function of sin(θ) for the single slit.
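A short sketch of the single slit expression (7.12) makes the position of the minima explicit: the intensity vanishes when β = mπ, i.e. at sin(θ) = mλ/a for m = ±1, ±2, ... The slit width and wavelength below are illustrative assumptions.

```python
import math


def single_slit_intensity(sin_theta, a, wavelength, I0=1.0):
    """I = I0 (sin(beta)/beta)^2 with beta = k a sin(theta) / 2 = pi a sin(theta) / lambda."""
    beta = math.pi * a * sin_theta / wavelength
    if beta == 0.0:
        return I0  # limit of sin(beta)/beta for beta -> 0
    return I0 * (math.sin(beta) / beta) ** 2


# Illustrative numbers (assumptions): 20 micrometer slit, 500 nm light.
a = 20e-6
wavelength = 500e-9
for m in (0.0, 0.5, 1.0, 1.5, 2.0):
    s = m * wavelength / a
    print(f"sin(theta) = {m:3.1f} lambda/a   I/I0 = {single_slit_intensity(s, a, wavelength):.4e}")
```

Halving the slit width doubles the angular distance to the first minimum, which is the usual inverse relation between aperture size and the width of the diffraction pattern.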
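Figure 7.7 Fraunhofer intensity profile for a single slit.

Example 7.3 Double elongated slit

See figure 7.8 for the geometry. Again as l ≫ λ we need only to integrate one variable. In the integral below the two slits are centered at z = ±d and each extends a distance a to either side of its center. The electric field becomes:

\begin{align}
E &= \psi \left( \int_{-d-a}^{-d+a} e^{-ikzZ/R}\, dz + \int_{d-a}^{d+a} e^{-ikzZ/R}\, dz \right) \tag{7.13} \\
  &= \frac{\psi}{i\xi} \left[ e^{i\xi(a+d)} - e^{-i\xi(a+d)} + e^{i\xi(a-d)} - e^{-i\xi(a-d)} \right] \tag{7.14} \\
  &= \frac{2\psi}{\xi} \left[ \sin(\xi(a-d)) + \sin(\xi(a+d)) \right] \tag{7.15} \\
  &= \psi \cdot 4a \cdot \frac{\sin(\xi a)}{\xi a} \cos(\xi d) \tag{7.16}
\end{align}

with ξ = kZ/R. In the last step we used

$$\sin(A) + \sin(B) = 2 \sin\!\left(\frac{A+B}{2}\right) \cos\!\left(\frac{A-B}{2}\right)$$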
The intensity profile becomes:

Fraunhofer diffraction double slit:
$$I = I_0 \left[\frac{\sin(\alpha)}{\alpha}\right]^2 \cos^2(\beta) \tag{7.17}$$
with
$$\alpha = ka\sin(\theta)$$
and
$$\beta = kd\sin(\theta)$$

Figure 7.8 Fraunhofer diffraction for a double slit with spacing d. Each slit has a width a and the length l ≫ λ.
How does this compare to Young’s experiment described in chapter 6? If we transform our phase variable β = kd sin(θ) into δ defined in equation (6.14) we find (remember we have a distance of 2d between the sources) cos²(β) = cos²(δ/2). This expression is in agreement with the intensity profile of Young’s double slit experiment. In addition to the cosine representing interference of two point sources we have a sinc function describing diffraction from a single slit. So, the “real” double slit interference pattern is the product of the single slit diffraction pattern with the double point source interference term (Young’s pattern).

In figure 7.9 we plot the intensity distribution as a function of sin(θ) for the double slit.

Figure 7.9 Intensity distribution (blue curve) in Fraunhofer diffraction of a double slit. Notice how the intensity is the product of Young’s interference pattern (black curve) and the single slit diffraction result (red curve).
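The product structure of the double slit pattern can be made explicit in a few lines. This is a minimal sketch using α = ka sin(θ) and β = kd sin(θ) as in (7.17); the function names and the numerical values are assumptions made for the illustration.

```python
import math


def slit_envelope(sin_theta, a, wavelength):
    """Single slit factor (sin(alpha)/alpha)^2 with alpha = k a sin(theta)."""
    alpha = 2 * math.pi * a * sin_theta / wavelength
    return 1.0 if alpha == 0.0 else (math.sin(alpha) / alpha) ** 2


def young_term(sin_theta, d, wavelength):
    """Two point source factor cos^2(beta) with beta = k d sin(theta)."""
    beta = 2 * math.pi * d * sin_theta / wavelength
    return math.cos(beta) ** 2


def double_slit_intensity(sin_theta, a, d, wavelength, I0=1.0):
    """Equation (7.17): envelope times Young's interference term."""
    return I0 * slit_envelope(sin_theta, a, wavelength) * young_term(sin_theta, d, wavelength)


# Illustrative numbers (assumptions): a = 5 micrometers, d = 25 micrometers, 500 nm light.
a, d, wavelength = 5e-6, 25e-6, 500e-9
for s in (0.0, 0.005, 0.01, 0.02, 0.05):
    print(f"sin(theta) = {s:5.3f}   I/I0 = {double_slit_intensity(s, a, d, wavelength):.4f}")
```

The fast cos² fringes are set by d, while the slow envelope that limits how many fringes are visible is set by a.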
Example 7.4 Delta function slit
Let us analyze what happens if you diffract light from a slit whose width a → 0. Our aperture function is thus a delta function⁹ δ(z). Our diffraction integral

$$E = \psi \int_{-\infty}^{+\infty} \delta(z)\, e^{-ikzZ/R}\, dz = \text{constant} \tag{7.19}$$

simply gives a constant. There will be no modulated pattern, in agreement with the single slit expression in the limit a → 0. See figure 7.7: when a → 0 the first zero moves towards infinity and we get a constant diffraction pattern. Obviously, the light intensity will be distributed over the entire Z-axis, so you may argue that the intensity also goes to zero.

9 Integrals with delta functions are particularly simple. Let δ(x − x0) be a delta function at x0. Then the integral is:
$$\int_{-\infty}^{+\infty} f(x)\, \delta(x - x_0)\, dx = f(x_0) \tag{7.18}$$
Example 7.5 Delta function double slit

If we have two delta functions in play such that

$$A(y) = \delta(y - y_0) + \delta(y + y_0) \tag{7.20}$$

where we define d so that d = 2y0, our integral becomes:

\begin{align}
E &= \psi \int_{-\infty}^{+\infty} \left( \delta(y + y_0) + \delta(y - y_0) \right) e^{-ikyY/R}\, dy \tag{7.21} \\
  &= 2\psi \left[ \frac{e^{+i\xi y_0} + e^{-i\xi y_0}}{2} \right] = 2\psi \cos(\xi y_0) \tag{7.22}
\end{align}

with ξ = kY/R. With our definition d = 2y0 and sin(θ) = Y/R the intensity becomes:

$$I = 4 I_0 \cos^2\!\left(\frac{kd\sin(\theta)}{2}\right) = 4 I_0 \cos^2\!\left(\frac{\delta}{2}\right)$$

This is Young’s result! In chapter 6 we did assume our sources to be pointlike. Here we can learn: Interference is naturally taken into account in our diffraction formalism.
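The agreement with Young's result can also be verified numerically by adding the two wavelets directly. In this sketch the field is expressed in units where |ψ|² = I0; the function names and numbers are assumptions for the illustration.

```python
import cmath
import math


def two_point_source_intensity(sin_theta, d, wavelength, I0=1.0):
    """|E|^2 for the aperture delta(y - d/2) + delta(y + d/2), in units where |psi|^2 = I0."""
    k = 2 * math.pi / wavelength
    y0 = d / 2
    field = cmath.exp(-1j * k * y0 * sin_theta) + cmath.exp(+1j * k * y0 * sin_theta)
    return I0 * abs(field) ** 2


def young_intensity(sin_theta, d, wavelength, I0=1.0):
    """Young's two-source pattern 4 I0 cos^2(k d sin(theta) / 2)."""
    k = 2 * math.pi / wavelength
    return 4 * I0 * math.cos(k * d * sin_theta / 2) ** 2


# Illustrative numbers (assumptions): sources 50 micrometers apart, 500 nm light.
d, wavelength = 50e-6, 500e-9
for s in (0.0, 0.002, 0.005, 0.01):
    print(s, two_point_source_intensity(s, d, wavelength), young_intensity(s, d, wavelength))
```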
Problem 7.6 Imagine a Young’s type of experiment using, not two, but
four point sources, two placed at the y-axis and two placed at the z-axis.
The aperture function is given by:
A(y, z) = δ(y − y0 ) + δ(y + y0 ) + δ(z − z0 ) + δ(z + z0 ).
Find the intensity distribution at the plane of observation. Note that had we done this in chapter 6, we would have had to work hard for it!
Example 7.7 Rectangular opening.

Figure 7.10 Geometry for Fraunhofer diffraction by a rectangular opening with dimensions a and b. Both a and b are comparable to the wavelength.

Figure 7.10 shows a rectangular hole with dimensions a and b, both of the order of the wavelength λ. The diffraction integral is carried out in the same way as above:

$$
E = \psi \int_{-b/2}^{+b/2} e^{-i\xi_y y}\, dy \int_{-a/2}^{+a/2} e^{-i\xi_z z}\, dz
  = a \cdot b \cdot \psi \cdot \left[\frac{\sin(\alpha)}{\alpha}\right] \left[\frac{\sin(\beta)}{\beta}\right]
\tag{7.23}
$$

where ξ_y = kY/R and ξ_z = kZ/R,
and the intensity becomes:

Figure 7.11 Intensity profile for diffraction by a square opening. Y and Z axis can be scaled to the rectangular case.

Fraunhofer diffraction rectangular aperture:
$$I = I_0 \left[\frac{\sin(\alpha)}{\alpha}\right]^2 \left[\frac{\sin(\beta)}{\beta}\right]^2 \tag{7.24}$$
with
$$\alpha = \frac{kaZ}{2R}$$
and
$$\beta = \frac{kbY}{2R}$$
Example 7.8 Circular aperture.

In figure 7.12 we show the geometry for diffraction from a circular hole with diameter D = 2a. For calculations we use polar coordinates in the aperture and detector plane:

$$\begin{pmatrix} y \\ z \end{pmatrix} = \rho \begin{pmatrix} \sin(\varphi) \\ \cos(\varphi) \end{pmatrix}$$
and
$$\begin{pmatrix} Y \\ Z \end{pmatrix} = q \begin{pmatrix} \sin(\Phi) \\ \cos(\Phi) \end{pmatrix}$$

Figure 7.12 Fraunhofer diffraction by a circular hole with diameter D = 2a.

With the differential area element dS = ρ dρ dϕ and yY + zZ = ρq cos(ϕ − Φ) we find:

$$
\begin{aligned}
E &= \psi \int_0^a \int_0^{2\pi} e^{-i\xi\rho\cos(\varphi - \Phi)}\, \rho\, d\rho\, d\varphi \\
  &= 2\pi\psi \cdot \int_0^a J_0(\xi\rho)\, \rho\, d\rho \\
  &= 2\pi a^2 \psi \cdot \frac{J_1(\xi a)}{\xi a}.
\end{aligned}
\tag{7.25}
$$

Here ξ = kq/R, and J_n is the n’th order Bessel function of the first kind. Using the usual expression sin(θ) = q/R we find the intensity

Fraunhofer diffraction circular aperture:
$$I = I_0 \left[\frac{2 J_1(\alpha)}{\alpha}\right]^2 \tag{7.26}$$
with
$$\alpha = ka\sin(\theta)$$

Figure 7.13 Intensity profile for Fraunhofer diffraction from a circular hole. Notice here the first zero is given by 1.22λ/D.

It is important to notice that the zeros are controlled by the Bessel function. This means the first zero, when plotted against sin(θ), is located at 1.22λ/D, the next at 2.23λ/D and so forth, see figure 7.13.
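The factors 1.22 and 2.23 follow from the zeros of J1. The sketch below recovers them using only the standard library: J1 is evaluated from its integral representation J1(x) = (1/π)∫₀^π cos(t − x sin t) dt with a midpoint rule, and the zeros are located by bisection. Both numerical methods and the bracketing intervals are choices made for this illustration, not part of the text.

```python
import math


def bessel_j1(x, n=2000):
    """J1(x) from (1/pi) * integral_0^pi cos(t - x sin(t)) dt, midpoint rule."""
    h = math.pi / n
    total = 0.0
    for j in range(n):
        t = (j + 0.5) * h
        total += math.cos(t - x * math.sin(t))
    return total * h / math.pi


def airy_intensity(alpha, I0=1.0):
    """Equation (7.26): I = I0 (2 J1(alpha) / alpha)^2 with alpha = k a sin(theta)."""
    if alpha == 0.0:
        return I0  # limit 2 J1(alpha)/alpha -> 1
    return I0 * (2 * bessel_j1(alpha) / alpha) ** 2


def bisect(f, lo, hi, tol=1e-10):
    """Root of f in [lo, hi], assuming f changes sign on the interval."""
    f_lo = f(lo)
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        f_mid = f(mid)
        if f_lo * f_mid <= 0:
            hi = mid
        else:
            lo, f_lo = mid, f_mid
    return 0.5 * (lo + hi)


# Brackets chosen by inspection so that each contains exactly one zero of J1.
first_zero = bisect(bessel_j1, 3.0, 4.5)
second_zero = bisect(bessel_j1, 6.5, 7.5)

# alpha = k a sin(theta) = pi D sin(theta) / lambda, so sin(theta) = (alpha / pi) * lambda / D.
print(first_zero / math.pi, second_zero / math.pi)
```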
7.4 Babinet’s principle

In this section we examine an absolutely beautiful principle called Babi-
net’s principle. It states that the diffraction pattern from an opaque
body is identical to that from a hole of the same size and shape except
for the overall forward beam intensity. We begin with a slit in one di-
mension. Extending the results to the plane is left as an exercise for the
reader. As illustrated in figure 7.14, we have two cases. In case (A) we
have an opening given by the aperture function A(z):

$$
A(z) = \begin{cases} 1 & \text{for } z \in S \\ 0 & \text{else} \end{cases}
$$

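The Fraunhofer diffraction pattern, as stated above, is given by:

\begin{align}
E &= \frac{E_A\, e^{i(kR-\omega t)}}{R} \iint A(y,z)\, e^{-ik(yY+zZ)/R}\, dS \tag{7.27} \\
  &= \Psi \int A(z)\, e^{-i\alpha z}\, dz \tag{7.28} \\
  &= \Psi\, \mathcal{F}(A) \tag{7.29}
\end{align}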
where α = kZ/R. Here the initial integral runs over all two dimensional space ℝ × ℝ; however, we reduced A(y, z) to A(z) since we are only considering a slit in one dimension, i.e., the final integral only runs over ℝ, meaning from −∞ to +∞. With F we designate the Fourier transform and Ψ is just a complex constant scaling the diffraction pattern intensity, but not influencing the shape of the diffraction pattern. Now the aperture function of an obstacle rather than a slit, corresponding to the “negative” of case (A), is given by:

$$
\tilde{A}(z) = \begin{cases} 0 & \text{for } z \in S \\ 1 & \text{else} \end{cases}
$$

and the diffraction pattern of this structure is given by E′ = Ψ′F(Ã).

Here follows the trick: the sum of A(z) and Ã(z) is constant
$$A(z) + \tilde{A}(z) = 1. \tag{7.30}$$
Let us take the Fourier transform on both sides,
$$\mathcal{F}(A) + \mathcal{F}(\tilde{A}) = \mathcal{F}(1) \propto \delta(\alpha) \tag{7.31}$$
where δ(α) is the Dirac delta function. For values α ≠ 0, i.e. away from the forward direction, the right-hand side vanishes, so F(A) = −F(Ã), and we have a connection between the two diffraction patterns:
$$|\mathcal{F}(A)|^2 = |\mathcal{F}(\tilde{A})|^2 \tag{7.32}$$
and thus we conclude the shapes of the two diffraction patterns are the same! The intensity at various points may be different as the constants Ψ and Ψ′ are different, but the shape is the same. Any physical conclusion one draws from the shape, such as the location of maxima and minima, will be the same for case (A) and (B). This is a really powerful and beautiful principle indeed.
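Babinet's principle can be illustrated numerically with a discretized aperture. The sketch below makes several assumptions that are not in the text: the infinite screen is replaced by a finite illuminated window of width W, the integral by a sum over N sample points, and the pattern is sampled at α = 2πn/W, where the transform of the window itself vanishes (the discrete counterpart of "α ≠ 0").

```python
import cmath
import math

W = 1.0   # width of the illuminated window (arbitrary units, assumed)
s = 0.1   # width of the slit in case (A) and of the obstacle in case (B)
N = 1000  # number of sample points
dz = W / N
z_samples = [-W / 2 + (j + 0.5) * dz for j in range(N)]


def slit(z):
    """Case (A): transmits only inside the slit."""
    return 1.0 if abs(z) < s / 2 else 0.0


def obstacle(z):
    """Case (B): the complementary screen, slit(z) + obstacle(z) = 1."""
    return 1.0 - slit(z)


def transform(aperture, alpha):
    """Discrete version of the integral of A(z) exp(-i alpha z) dz over the window."""
    return sum(aperture(z) * cmath.exp(-1j * alpha * z) for z in z_samples) * dz


def intensity(aperture, alpha):
    return abs(transform(aperture, alpha)) ** 2


for n in range(0, 6):
    alpha = 2 * math.pi * n / W
    print(n, intensity(slit, alpha), intensity(obstacle, alpha))
```

Only the n = 0 entry (the forward direction) differs between the two cases; for all other n the two intensities agree to rounding error.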
Figure 7.14 Case (A): Diffraction from a structure punched out of an opaque screen. The aperture function in this case is given by A(z). Case (B): The negative of (A), where an opaque obstacle with exactly the same dimensions as the structure in (A) is illuminated with a plane wave. Here the aperture function is given by Ã(z).
Figure 7.15 Rayleigh’s angular resolution criterion: Two distant objects 1 and 2 can be resolved in an optical system with opening diameter D if their angular separation is bigger than the minimum resolvable angle ∆θmin = 1.22λ/D.
7.5 Rayleigh’s criterion for angular resolution

In any optical instrument there will be some kind of aperture. The abil-
ity to resolve two objects, say for an astronomical telescope two stars,
at a given separation depends on the aperture diameter of the instru-
ment. As diffraction will take place in our instrument, images will blur
and at some point we cannot tell object 1 from object 2. On the ba-
sis of diffraction from a circular aperture Rayleigh formulated a simple
resolution criterion:
Rayleigh angular resolution criterion:
$$\Delta\theta_{\min} = \frac{1.22\,\lambda}{D} \tag{7.33}$$
The criterion corresponds to the situation where the diffraction maximum of object one is exactly at the first minimum of the diffraction pattern of object two.
Example 7.9 Telescopes
Assume a modest telescope of diameter D = 20 cm. At 500 nm the best angular resolution becomes ∆θ ≈ 0.6 arc seconds¹⁰. For D = 2.4 m you find ∆θ ≈ 1/20 arc seconds. The refractive index fluctuations in the atmosphere limit the resolution to 1-0.5 arc seconds. Bigger telescopes gather more light, but their resolution can only be improved with adaptive optics which dynamically corrects for atmospheric fluctuations.

10 One arc minute is 1/60 degree, and one arc second is 1/60 of an arc minute.
Example 7.10 Human eye
The human eye has an opening diameter in normal daylight of about D = 2 mm. This gives a resolution at 550 nm of about 3.4 · 10⁻⁴ rad. You can easily check the resolution of your eyes by drawing two lines separated by 1 mm on a piece of white paper. Look at the paper from different distances and judge when you cannot resolve the lines anymore. Then calculate the angle. You will end up with a number close to that calculated above. A fun thing to try: If you do the same experiment in darkness with a flash lamp to illuminate the paper your resolution becomes better. Why is that?
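The numbers in the two examples can be checked by evaluating (7.33) directly. The sketch below keeps the factor 1.22; estimates based on λ/D alone come out somewhat smaller. The function names are assumptions made for this illustration.

```python
import math

ARCSEC_PER_RAD = 180 * 3600 / math.pi  # one radian expressed in arc seconds


def rayleigh_angle(wavelength, D):
    """Minimum resolvable angle in radians, equation (7.33)."""
    return 1.22 * wavelength / D


def rayleigh_angle_arcsec(wavelength, D):
    """The same angle expressed in arc seconds."""
    return rayleigh_angle(wavelength, D) * ARCSEC_PER_RAD


# Telescopes from Example 7.9 at 500 nm.
for D in (0.20, 2.4):
    print(f"D = {D:4.2f} m   resolution = {rayleigh_angle_arcsec(500e-9, D):.3f} arc seconds")

# Human eye from Example 7.10: D = 2 mm at 550 nm.
eye = rayleigh_angle(550e-9, 2e-3)
print(f"eye: {eye:.2e} rad")

# Distance at which two lines 1 mm apart subtend exactly this angle.
print(f"1 mm lines merge at about {1e-3 / eye:.1f} m")
```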
