

selected topics in physics




Assigned by Juan Pablo Fernández Department of Physics

University of Massachusetts Amherst, Massachusetts


Selected Topics in Physics: Intelligent Designs © University of Massachusetts, October 2008


Sam Bingham (sbingham)
Samuel Boone (sboone)
Morgan-Elise Cervo (mcervo)
Jose Clemente (jaclemen)
Adam Cohen (afcohen)
Robert Deegan (rdeegan)
Matthew Drake (mdrake)
Christopher Emma (cpemma)
Sebastian Fischetti (sfischet)
Keith Fratus (kfratus)
Douglas Herbert (dherbert)
Paul Hughes (phughes)
Christopher Kerrigan (crkerrig)
Alexander Kiriakopoulos (akiriako)
Collin Lally (clally)
Amanda Lund (alund)
Christopher MacLellan
Matthew Mirigian
Tim Mortsolf (tmortsol)
Andrew O’Donnell (anodonne)
David Parker (dparker)
Robert Pierce (rpierce)
Richard Rines (rrines)
Daniel Rogers (drrogers)
Daniel Schmidt (dschmidt)
Jonah Zimmerman (jzimmerm)


CONTENTS

preface

i Seeing the Light

1 morgan-elise cervo: why is the sky blue?
1.1 Introduction
1.2 Waves
1.3 Electromagnetic Waves
1.4 Radiation
1.5 Blueness of the Sky
1.6 Problems

2 matthew mirigian: the physics of rainbows
2.1 Introduction
2.2 The Primary Bow
2.3 The Secondary Bow
2.4 Dispersion
2.5 Problems

3 matthew drake: the camera and how it works
3.1 Introduction
3.2 Lenses
3.3 The Camera Itself
3.4 Questions
3.5 Solutions

4 sebastian fischetti: holography: an introduction
4.1 Introduction
4.2 The Geometric Model
4.3 Types of Holograms
4.4 Making Holograms
4.5 Applications of Holography
4.6 Problems

5 colin lally: listening for the shape of a drum
5.1 Introduction
5.2 Eigenvalues
5.3 Problems

ii Mind Over Matter

6 christopher maclellan: glass
6.1 Introduction to Glass
6.2 Amorphous Solids
6.3 The Glass Transition
6.4 Simulating Amorphous Materials with Colloids
6.5 Practice Questions

7 robert pierce: the physics of splashing
7.1 Introduction
7.2 Pressure, Surface Tension, and other Concepts
7.3 Splashing
7.4 Summary
7.5 Chapter Problems
7.6 Multiple Choice Questions

8 sam bingham: freak waves
8.1 Introduction
8.2 Linear Model
8.3 Interference
8.4 Nonlinear Effects
8.5 Conclusion
8.6 Problems

9 paul hughes: friction
9.1 Overview
9.2 Amontons/Coulomb Friction
9.3 Toward A Conceptual Mesoscopic Model
9.4 Summary
9.5 Problems

10 keith fratus: neutrino oscillations in the standard model
10.1 Introduction
10.2 A Review of Quantum Theory
10.3 The Standard Model
10.4 The Weak and Higgs Mechanisms
10.5 The Origin of Neutrino Oscillations
10.6 Implications of the Existence of Neutrino Oscillations
10.7 Problems
10.8 Multiple Choice Test Problems

iii Information is Power

11 andy o’donnell: fast fourier transform
11.1 Introduction
11.2 Fourier Transform
11.3 Discrete Transform
11.4 The Fast Fourier Transform
11.5 Multiple Choice
11.6 Homework Problems

12 tim mortsolf: the physics of data storage
12.1 Introduction
12.2 Bits and Bytes – The Units of Digital Storage
12.3 Storage Capacity is Everything
12.4 The Physics of a Hard Disk Drive
12.5 Summary
12.6 Exercises
12.7 Multiple Choice Questions

13 tim mortsolf: the physics of information theory
13.1 Introduction
13.2 Information Theory – The Physical Limits of Data
13.3 Shannon’s Formula
13.4 The Physical Limits of Data Storage
13.5 Summary
13.6 Exercises
13.7 Multiple Choice Questions

14 sam boone: analytical investigation of the optimal traffic organization of social insects
14.1 Introduction
14.2 Ant Colony Optimization
14.3 Optimization By Hand
14.4 Applying ACO Meta-Heuristic to the Traveling Salesman Problem
14.5 Solution to the Traveling Salesman Problem Using an ACO Algorithm

15 christopher kerrigan: the physics of social insects
15.1 Introduction
15.2 The Electron and the Ant
15.3 Insect Current
15.4 Insect Diagram
15.5 Kirchhoff’s Rules (for ants)
15.6 Differences
15.7 The Real Element
15.8 Problems

iv What’s Out There

16 robert deegan: the discovery of neptune
16.1 Introduction
16.2 Newton’s Law of Universal Gravitation
16.3 Adams and Le Verrier
16.4 Perturbation
16.5 Methods and Modern Approaches
16.6 Practice Questions
16.7 Answers to Practice Questions

17 alex kiriakopoulos: white dwarfs
17.1 Introduction
17.2 The Total Energy
17.3 Question

18 daniel rogers: supernovae and the progenitor theory
18.1 Introduction
18.2 Creation of Heavy Elements
18.3 Dispersal of Heavy Elements
18.4 Progenitor Theory
18.5 A Mathematical Model
18.6 Conclusion
18.7 Problems

19 david parker: the equivalence principle
19.1 Introduction
19.2 Weak? Strong? Einstein?
19.3 Consequences
19.4 Example
19.5 Problems

20 richard rines: the fifth interaction
20.1 Introduction: The ‘Four’ forces
20.2 The Beginning: Testing Weak Equivalence
20.3 A New Force
20.4 The Death of the Force
20.5 Problems

21 douglas herbert: the science of the apocalypse
21.1 Introduction
21.2 Asteroid Impact
21.3 Errant Black Holes
21.4 Flood volcanism
21.5 Giant Solar Flares
21.6 Viral Epidemic

22 amanda lund: extraterrestrial intelligence
22.1 Introduction
22.2 The Possibility of Life in the Universe
22.3 The Search for Intelligent Life
22.4 Will We Find It? And When?
22.5 Problems
22.6 Multiple Choice Questions
22.7 Summary

a Bibliography

Index


Physics is to be regarded not so much as the study of something a priori given, but rather as the development of methods for ordering and surveying human experience. — N. Bohr [59]


This book has been produced as an assignment for Physics 381, Writing in Physics, taught by the Department of Physics of the University of Massachusetts Amherst in the Fall 2008 semester.


instructor Juan Pablo Fernández; 1034 LGRT

teaching assistant Benjamin Ett

class times and office hours Most of this class will be taught long distance. I plan to come to Amherst twice a month; on those days we will meet at the usual class time and hold one-on-one conferences to discuss work in progress. The best dates and times we will agree upon in class. A good fraction of our communication will take place via email. Feel free to email me at any time with any questions, comments, requests for help, etc. There is one exception, though: Any questions or comments about end-of-semester grades must be submitted in hard copy.

textbook Nicholas J. Higham, Handbook of Writing for the Mathematical Sciences, Second Edition. Philadelphia, SIAM, 1998.

description As a professional physicist you will be expected to communicate with four kinds of audiences, each of which has a direct bearing on your livelihood: professionals (including you) who work in your field, professionals who work in other fields of physics or science, students of physics or other disciplines, and the general public (i.e., the taxpayer). Most of this communication will be in writing, which in physics includes not just prose but also mathematics and displayed material. In this course you will acquaint yourself with the many different elements that contribute to successful physics writing and will put them to work in different contexts.

objectives
1. Articulate concepts, methods, and results of theoretical or experimental physics to other physicists, other scientists, students, and laypeople.
2. Be confident in the use of LaTeX and other public-domain productivity tools for scientists.
3. Appreciate the amount of work that goes into correct, clear writing.
4. Show proper respect to your readers, making sure not to write above their heads nor “write down at them.”
5. Find the right combination of prose, mathematics, tables, and graphics that will help you make your points most clearly and economically.
6. Learn to deal with the limitations of a given medium. By the same token, learn to appreciate (and not abuse) the marvels that technology affords you nowadays.
7. Practice proper attribution when making use of other people’s work.
8. Collaborate with your classmates in the development of written materials.
9. Have a working knowledge of the peer-review system of publication.

evaluation The grade for this course will be based on five writing projects assigned in the following order:
1. A journal paper
2. A grant proposal
3. A textbook
4. A science newspaper
5. A final project

In due time I will provide more details about each of the projects and propose a few different topics from which to choose. If you would rather write about something else you must tell me promptly. You will hand in two drafts of each project. The first one must be submitted in “draft” form and will receive extensive feedback. The second draft will be considered final in terms of content and form and will be assessed as such. You will have roughly three weeks to complete each project. The first week you can devote to experimenting with the physics and the technology, the second to producing the first draft, and the third to producing the final draft. At least the first two projects will be peer-reviewed: everybody will (anonymously) evaluate two papers and have theirs evaluated by two classmates. The evaluations will include a suggested grade, usually an integer from 7 to 10, and will themselves be graded. For the third and fourth projects you will have to collaborate with your classmates. The fifth project will be freestyle and may involve media other than paper.


MORGAN-ELISE CERVO: WHY IS THE SKY BLUE?




Have you ever looked up at the sky and been amazed by its brilliant shade of blue? Or wondered why the sun changes color at sunset? In this chapter we will provide an answer to these questions through the study of electromagnetic waves. We will uncover why the sky is blue and not white. We will also investigate sunsets and learn why sunsets in some locations are more beautiful than others.




To understand what is happening up in the sky we first need to review the general properties of waves. A wave is a disturbance of a continuous medium that propagates with a fixed shape at a constant velocity [36]. A familiar example is the sinusoidal wave, which can be represented by the equation,

$$ f(z, t) = A \cos[k(z - vt) + \delta]. \tag{1.1} $$

In the above equation the variable A represents the amplitude. The argument of the cosine represents the phase of the wave and δ represents a phase constant. The wavenumber k is related to the wavelength λ by k = 2π/λ. If we know the velocity v of the wave then we can find the period T (the amount of time it takes for the wave to complete a cycle), using the equation,

$$ T = \frac{2\pi}{kv}. \tag{1.2} $$

Another useful property of a wave is its frequency, which represents the number of oscillations that occur per unit time. The frequency is

$$ f = \frac{1}{T} = \frac{kv}{2\pi} = \frac{v}{\lambda}. \tag{1.3} $$

Frequency can also be expressed in terms of the angular frequency, which is given by

$$ \omega = 2\pi f = kv. \tag{1.4} $$

Now that we understand the different properties of waves it is easy to rewrite the equation for a sinusoidal wave as

$$ f(z, t) = A \cos(kz - \omega t + \delta). \tag{1.5} $$
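The relations between wavenumber, speed, period, and frequency can be collected in a few lines of Python. This is only an illustrative sketch: the function names `wave_properties` and `displacement` are made up for this example, and SI units are assumed.

```python
import math

def wave_properties(k, v):
    """Return (wavelength, period, frequency, angular frequency)
    for a sinusoidal wave with wavenumber k (rad/m) and speed v (m/s)."""
    wavelength = 2 * math.pi / k       # from k = 2*pi/lambda
    period = 2 * math.pi / (k * v)     # Eq. (1.2)
    frequency = 1 / period             # Eq. (1.3)
    omega = 2 * math.pi * frequency    # Eq. (1.4), equal to k*v
    return wavelength, period, frequency, omega

def displacement(z, t, A, k, v, delta=0.0):
    """Sinusoidal wave of Eq. (1.1): A cos[k(z - vt) + delta]."""
    return A * math.cos(k * (z - v * t) + delta)

# Hypothetical example values: a wave with a 1 m wavelength moving at 3 m/s.
wavelength, period, frequency, omega = wave_properties(2 * math.pi, 3.0)
```

Evaluating `A * math.cos(k * z - omega * t + delta)` with the `omega` returned above gives the same number as `displacement`, which is the content of rewriting the wave in terms of ω.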


1.3 electromagnetic waves

An electromagnetic wave consists of an oscillating electric field coupled to an oscillating magnetic field. The electric and magnetic fields lie orthogonal to each other and to the direction of motion of the electromagnetic wave. Electromagnetic waves are categorized by their wavelength. In Figure 1.1, we have a chart which gives the wavelengths of familiar electromagnetic waves such as x-rays and microwaves.

Figure 1.1: This chart provides wavelengths for the different types of electromagnetic waves [9].

In order to understand how the equation for electromagnetic waves is derived we will first review Maxwell’s Equations. Gauss’s law relates the electric field to the distribution of electric charge that produces it,

$$ \nabla \cdot \mathbf{E} = \frac{\rho}{\epsilon_0}, \tag{1.6} $$

where ρ represents the charge density and ε0 is the electric constant. Gauss’s law, in other words, states that electric charges are the sources of the electric field. The corresponding equation for the magnetic field, Gauss’s law for magnetism, is

$$ \nabla \cdot \mathbf{B} = 0. \tag{1.7} $$


The above equation states that the divergence of the magnetic field is zero; in other words, there are no magnetic monopoles. The third Maxwell equation is known as Faraday’s induction law and it forms the basis of electrical inductors and transformers [62]. The equation reads,

$$ \nabla \times \mathbf{E} = -\frac{\partial \mathbf{B}}{\partial t}. \tag{1.8} $$

In other words, the line integral of the electric field around a closed loop is equal to the negative of the rate of change of the magnetic flux through the loop [8]. The final Maxwell equation is a correction to Ampère’s law. Ampère’s law states that, for a static electric field, the line integral of the magnetic field around a closed loop is proportional to the current flowing through the loop. Maxwell’s correction adds a term proportional to the rate of change of the electric field. Rewritten in mathematical terms we have,

$$ \nabla \times \mathbf{B} = \mu_0 \mathbf{J} + \mu_0 \epsilon_0 \frac{\partial \mathbf{E}}{\partial t}. \tag{1.9} $$




Figure 1.2: Notice how, moving up from the horizon, the color grades from the longest wavelength to the shortest.

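Now we have the necessary tools to write an equation for an electromagnetic wave. The equations that describe an electromagnetic wave in a vacuum are as follows:

$$ \mathbf{E}(\mathbf{r}, t) = E_0 \cos(\mathbf{k} \cdot \mathbf{r} - \omega t + \delta)\,\hat{\mathbf{n}}, \tag{1.10} $$

$$ \mathbf{B}(\mathbf{r}, t) = \frac{1}{c} E_0 \cos(\mathbf{k} \cdot \mathbf{r} - \omega t + \delta)\,(\hat{\mathbf{k}} \times \hat{\mathbf{n}}), \tag{1.11} $$

where k is the propagation vector, n̂ is the polarization and ω is the angular frequency.

1.4 radiation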

Now that we know electromagnetic waves exist you might ask, where do electromagnetic waves come from? The answer is radiation. When a charge accelerates, and the current therefore changes, an electromagnetic wave is produced. Let’s consider the case of a charge being driven back and forth along a wire connecting two metal spheres. The charge as a function of time can be written as

$$ q(t) = q_0 \cos(\omega t). \tag{1.12} $$

The electric dipole produced by the oscillating charge can then be written as,

$$ \mathbf{p}(t) = q_0 d \cos(\omega t)\,\hat{\mathbf{z}}, \tag{1.13} $$

where q0 d = p0 is the maximum value of the dipole moment [36]. The retarded potential describes the potential of a time-varying current or charge distribution, taking into account the time the signal needs to travel from the source to the observer. The retarded potential of the dipole system, for a wave traveling through a vacuum at the speed of light, can be derived using the equation,

$$ V(\mathbf{r}, t) = \frac{1}{4\pi\epsilon_0} \int \frac{\rho(\mathbf{r}', t_r)}{\gamma}\, d\tau', \tag{1.14} $$

where γ is the distance from the source point r′ to the field point r and tr = t − γ/c represents the retarded time. Using spherical coordinates the equation for the retarded potential can be rewritten as,

$$ V(r, \Theta, t) = -\frac{p_0 \omega}{4\pi\epsilon_0 c} \left( \frac{\cos\Theta}{r} \right) \sin[\omega(t - r/c)]. \tag{1.15} $$







Figure 1.3: Problem Geometry

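To find the vector potential A, we use the equation for the current in the wire (the time derivative of the charge, directed along z),

$$ \mathbf{I}(t) = -q_0 \omega \sin(\omega t)\,\hat{\mathbf{z}}. \tag{1.16} $$

Making reference to Figure 1.3 we derive,

$$ \mathbf{A}(\mathbf{r}, t) = \frac{\mu_0}{4\pi} \int_{-d/2}^{d/2} \frac{-q_0 \omega \sin[\omega(t - \gamma/c)]\,\hat{\mathbf{z}}}{\gamma}\, dz. \tag{1.17} $$

Because the integration itself contributes a factor of d, to first order in d we can replace the integrand with its value at the center, leading to,

$$ \mathbf{A}(r, \Theta, t) = -\frac{\mu_0 p_0 \omega}{4\pi r} \sin[\omega(t - r/c)]\,\hat{\mathbf{z}}. \tag{1.18} $$

The electric field can then be derived by plugging the values we have obtained into the equation,

$$ \mathbf{E} = -\nabla V - \frac{\partial \mathbf{A}}{\partial t}. \tag{1.19} $$

Using the values we obtained for the vector potential and the scalar potential we find,

$$ \mathbf{E} = -\frac{\mu_0 p_0 \omega^2}{4\pi} \left( \frac{\sin\Theta}{r} \right) \cos[\omega(t - r/c)]\,\hat{\mathbf{\Theta}}. \tag{1.20} $$

To find the magnetic field we use B = ∇ × A, which gives

$$ \mathbf{B} = -\frac{\mu_0 p_0 \omega^2}{4\pi c} \left( \frac{\sin\Theta}{r} \right) \cos[\omega(t - r/c)]\,\hat{\mathbf{\Phi}}. \tag{1.21} $$

So far we have described a wave moving radially outward with angular frequency ω. We have also found the electric and magnetic fields, which are orthogonal to each other and in phase. We now find the intensity of the wave. The energy radiated by an oscillating dipole can be found from the Poynting vector [36],

$$ \mathbf{S} = \frac{1}{\mu_0} (\mathbf{E} \times \mathbf{B}). \tag{1.22} $$

When we use our values for the electric and magnetic fields and average over a complete cycle we find,

$$ \langle \mathbf{S} \rangle = \frac{\mu_0 p_0^2 \omega^4}{32\pi^2 c} \frac{\sin^2\Theta}{r^2}\,\hat{\mathbf{r}}. \tag{1.23} $$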



Figure 1.4: Plot of three color cones that shows which wavelengths they best receive [9]

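A visual interpretation of the intensity shows that the function takes the shape of a donut with the hole along the axis of the dipole. In other words there is no radiation along this axis. If we integrate the time-averaged Poynting vector ⟨S⟩ over a sphere of radius r we can find the total power radiated,

$$ \langle P \rangle = \int \langle \mathbf{S} \rangle \cdot d\mathbf{a} = \frac{\mu_0 p_0^2 \omega^4}{32\pi^2 c} \int \frac{\sin^2\Theta}{r^2}\, r^2 \sin\Theta\, d\Theta\, d\phi = \frac{\mu_0 p_0^2 \omega^4}{12\pi c}. \tag{1.24} $$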
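The angular integral in the expression for the total power can be checked numerically. The sketch below uses a simple midpoint rule; the function name `sphere_integral_sin2` and the number of steps are arbitrary choices for illustration. It estimates the integral of sin²Θ over the unit sphere, which should equal 8π/3, so that the prefactor 1/(32π²) becomes 1/(12π).

```python
import math

def sphere_integral_sin2(n_theta=20000):
    """Midpoint-rule estimate of the integral of sin^2(theta) over the unit
    sphere, i.e. of sin^3(theta) dtheta dphi. The integrand does not depend
    on phi, so the phi integral contributes a factor of 2*pi."""
    d_theta = math.pi / n_theta
    total = 0.0
    for j in range(n_theta):
        theta = (j + 0.5) * d_theta
        total += math.sin(theta) ** 3 * d_theta
    return 2 * math.pi * total

angular_factor = sphere_integral_sin2()
# Combining with the 1/(32 pi^2) prefactor of the Poynting vector:
power_prefactor = angular_factor / (32 * math.pi ** 2)
```

Because the factor r² in the area element cancels the 1/r² in the Poynting vector, the radius never enters the calculation.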


Notice that the radiated power does not depend on the radius, and therefore the size, of the sphere. However, the power depends strongly on the angular frequency ω: it is proportional to ω⁴.

1.5 blueness of the sky

Sunlight passing through the atmosphere drives the charges in the air molecules, which then radiate like the oscillating dipole described above. If we take this radiation to be described by our equations for an electromagnetic wave traveling in a vacuum, then we can say that the strong dependence on frequency in the power equation produces the blueness of the sky. We know already that white light is composed of many wavelengths; each wavelength represents a different color. Because the power is proportional to ω⁴, light of shorter wavelength (higher frequency) is radiated much more strongly than light of longer wavelength. Blue has a shorter wavelength than red, for example, and consequently the sky appears blue to us.

example 1.1: why isn’t the sky violet?
We now know that light of greater frequency is radiated most strongly. If we look at Figure 1.1 we would expect that the sky should be violet instead of blue because violet has the shortest wavelength. The reason that the sky looks blue and not violet is that our eyes are able to see some colors more easily than others. Our eyes have three types of color receptors, or cones. The cones are called blue, green and red; the name of each cone comes from the color to which the receptor responds most strongly. The red receptor best sees red, orange and yellow light. The green receptor best sees green and yellow light, and the blue receptor best sees blue light. Even though the sky appears blue we know that there is a strong presence of violet light because we see blue with red tints. If the violet light were absent the sky would appear blue with a green tint.
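The ω⁴ dependence of the radiated power can be made concrete with a short calculation. In the sketch below the wavelengths 450 nm (blue) and 700 nm (red) are representative values assumed for illustration, not numbers taken from the chapter, and `relative_radiated_power` is a hypothetical helper name.

```python
def relative_radiated_power(wavelength_a_nm, wavelength_b_nm):
    """Ratio P_a / P_b for dipole radiation of equal dipole amplitude,
    using P proportional to omega^4, i.e. to 1/wavelength^4."""
    return (wavelength_b_nm / wavelength_a_nm) ** 4

# Assumed representative wavelengths: blue ~ 450 nm, red ~ 700 nm.
blue_to_red = relative_radiated_power(450.0, 700.0)
```

With these assumed wavelengths the ratio comes out a little under 6, so blue light is radiated several times more strongly than red light of the same incident amplitude.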

So why does the sun turn red when it is setting? This is especially puzzling because we just explained that red light, with the lowest frequency, is radiated least strongly. When the sun is about to pass over the horizon, the sun’s light has to travel a greater distance through the atmosphere to reach the observer. Over this distance the particles in the earth’s atmosphere scatter more of the shorter wavelengths out of the direct beam, so the light that reaches the observer is mostly red. For this reason, the sky is red when the sun is setting. Similarly, sunsets may be more brilliant in coastal areas or very dusty areas because there are more particles in the air to scatter light.

1.6 problems

1. Using the equation for combining waves, $A_3 e^{i\delta_3} = A_1 e^{i\delta_1} + A_2 e^{i\delta_2}$, determine A3 and δ3 in terms of A1, A2, δ1, and δ2 [36].
Solution: If we use the fact that $e^{i\Theta} = \cos\Theta + i\sin\Theta$ we get

$$ A_3(\cos\delta_3 + i\sin\delta_3) = A_1(\cos\delta_1 + i\sin\delta_1) + A_2(\cos\delta_2 + i\sin\delta_2). $$

Now by separating the real and imaginary parts, we have two equations,

$$ A_3\cos\delta_3 = A_1\cos\delta_1 + A_2\cos\delta_2, $$

$$ A_3\sin\delta_3 = A_1\sin\delta_1 + A_2\sin\delta_2. $$

Dividing the second equation by the first and taking the inverse tangent,

$$ \delta_3 = \tan^{-1}\left( \frac{A_1\sin\delta_1 + A_2\sin\delta_2}{A_1\cos\delta_1 + A_2\cos\delta_2} \right). $$

A3 is found by squaring and adding the two previous equations to get,

$$ A_3 = \sqrt{A_1^2 + A_2^2 + 2A_1A_2\cos(\delta_1 - \delta_2)}. $$
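The result can be checked by adding the two complex amplitudes directly and comparing with the closed forms $A_3 = \sqrt{A_1^2 + A_2^2 + 2A_1A_2\cos(\delta_1 - \delta_2)}$ and $\tan\delta_3 = (A_1\sin\delta_1 + A_2\sin\delta_2)/(A_1\cos\delta_1 + A_2\cos\delta_2)$. This is a minimal sketch with made-up function names; it uses `atan2` rather than a plain inverse tangent so that the phase lands in the correct quadrant.

```python
import cmath
import math

def combine_waves(a1, delta1, a2, delta2):
    """Add two waves as complex amplitudes A e^{i delta};
    return the amplitude and phase of the sum."""
    total = cmath.rect(a1, delta1) + cmath.rect(a2, delta2)
    return abs(total), cmath.phase(total)

def combine_waves_closed_form(a1, delta1, a2, delta2):
    """Same quantities from the closed-form expressions."""
    a3 = math.sqrt(a1**2 + a2**2 + 2 * a1 * a2 * math.cos(delta1 - delta2))
    delta3 = math.atan2(a1 * math.sin(delta1) + a2 * math.sin(delta2),
                        a1 * math.cos(delta1) + a2 * math.cos(delta2))
    return a3, delta3

# Hypothetical example values for the two waves.
direct = combine_waves(1.0, 0.3, 2.0, 1.2)
closed = combine_waves_closed_form(1.0, 0.3, 2.0, 1.2)
```

The two tuples agree to rounding error for these inputs, which also confirms that the phases enter the amplitude through their difference.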

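2. Check that the retarded potentials of an oscillating dipole ((1.15) and (1.18)) satisfy the Lorentz gauge condition.
Solution: The Lorentz gauge condition is

$$ \nabla \cdot \mathbf{A} = -\mu_0 \epsilon_0 \frac{\partial V}{\partial t}, $$

where A is the vector potential due to the current flowing through the wire,

$$ \mathbf{A} = -\frac{\mu_0 p_0 \omega}{4\pi r} \sin[\omega(t - r/c)]\,\hat{\mathbf{z}}. $$

Writing $\hat{\mathbf{z}} = \cos\Theta\,\hat{\mathbf{r}} - \sin\Theta\,\hat{\mathbf{\Theta}}$ and using the divergence in spherical coordinates,

$$ \begin{aligned}
\nabla \cdot \mathbf{A} &= \frac{1}{r^2}\frac{\partial}{\partial r}(r^2 A_r) + \frac{1}{r\sin\Theta}\frac{\partial}{\partial\Theta}(\sin\Theta\, A_\Theta) + \frac{1}{r\sin\Theta}\frac{\partial A_\phi}{\partial\phi} \\
&= -\frac{\mu_0 p_0 \omega}{4\pi}\left\{ \frac{1}{r^2}\frac{\partial}{\partial r}\left[ r^2\,\frac{1}{r}\sin\left(\omega t - \frac{\omega r}{c}\right)\right]\cos\Theta - \frac{2\sin\Theta\cos\Theta}{r^2\sin\Theta}\sin\left(\omega t - \frac{\omega r}{c}\right) \right\} \\
&= -\frac{\mu_0 p_0 \omega}{4\pi}\left\{ \frac{1}{r^2}\left[ \sin\left(\omega t - \frac{\omega r}{c}\right) - \frac{\omega r}{c}\cos\left(\omega t - \frac{\omega r}{c}\right) \right]\cos\Theta - \frac{2\cos\Theta}{r^2}\sin\left(\omega t - \frac{\omega r}{c}\right) \right\} \\
&= \mu_0\epsilon_0\,\frac{p_0 \omega}{4\pi\epsilon_0}\left\{ \frac{1}{r^2}\sin[\omega(t - r/c)] + \frac{\omega}{rc}\cos[\omega(t - r/c)] \right\}\cos\Theta.
\end{aligned} $$

For the scalar potential we need the form that holds before the far-field approximation is made,

$$ V = \frac{p_0\cos\Theta}{4\pi\epsilon_0 r}\left\{ -\frac{\omega}{c}\sin[\omega(t - r/c)] + \frac{1}{r}\cos[\omega(t - r/c)] \right\}, $$

whose first term is (1.15). Solving for the partial derivative of the scalar potential,

$$ \begin{aligned}
\frac{\partial V}{\partial t} &= \frac{p_0\cos\Theta}{4\pi\epsilon_0 r}\left\{ -\frac{\omega^2}{c}\cos[\omega(t - r/c)] - \frac{\omega}{r}\sin[\omega(t - r/c)] \right\} \\
&= -\frac{p_0 \omega}{4\pi\epsilon_0}\left\{ \frac{1}{r^2}\sin[\omega(t - r/c)] + \frac{\omega}{rc}\cos[\omega(t - r/c)] \right\}\cos\Theta.
\end{aligned} $$

By plugging the solved values into the Lorentz gauge condition we have,

$$ \nabla \cdot \mathbf{A} = -\mu_0 \epsilon_0 \frac{\partial V}{\partial t}, $$

so the condition is satisfied.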

MATTHEW MIRIGIAN: THE PHYSICS OF RAINBOWS




Rainbows are one of nature’s most beautiful sights. Some might argue that the physics of the rainbow is equally beautiful in its neat and compact packaging of optical phenomena. The rainbow is formed by a combination of physical properties of light, namely refraction, reflection and dispersion, that occur in a drop of rain. When conditions are right a bright inner bow can be observed as well as a fainter outer bow. The inner, primary bow is red on the outside and violet on the inside, whereas the outer, secondary bow is red on the inside and violet on the outside. In this chapter we will explore the physics responsible for producing the rainbow.



2.2 the primary bow

To understand how rainbows form we start by analyzing the geometrical optics involved. At its most fundamental level, a rainbow is formed by the reflection and refraction of sunlight in spherical raindrops. To understand how the shape of the rainbow comes about it is useful to think of the light as monochromatic, made up of one wavelength. Secondly, it should be understood that the sunlight strikes raindrops as rays that are parallel to one another. Figure 2.1 shows a ray incident on the surface of a drop. The ray is parallel to an axis that, if extended backward, would pass through the sun. The ability of a medium, water in this case, to bend light is measured by its refractive index, designated n. It is the ratio of the speed of light c in a vacuum to the speed of light v in the medium,

$$ n = \frac{c}{v}. \tag{2.1} $$

Snell’s Law summarizes an important experimental observation relating the refractive indexes of the media to the angle through which light is refracted. It says that if the incident angle i and the refracted angle r are measured with respect to the normal, the ratio of the sines of the angles is equal to the inverse ratio of the corresponding indexes of refraction [60],

$$ \frac{\sin i}{\sin r} = \frac{n_2}{n_1}, \tag{2.2} $$

where n1 is the index of the medium containing the incident ray and n2 is the index of the medium containing the refracted ray.

In the drop of rain the light rays that form the primary bow are twice refracted and once reflected. The angle D1 between the incident light ray and the ray that enters the eye to produce the image of a rainbow is determined by the incident and refracted angles, i and r. More precisely it is given by

$$ D_1 = 180^\circ - 2(2r - i). \tag{2.3} $$





Figure 2.1: This is the path of a ray producing the primary bow. [93]

Figure 2.2: This is the path of a ray producing the secondary bow. [93]

It is clear that D1 is determined by the incident angle, i. If the values of D1 are plotted against i we see that D1 reaches a minimum of approximately 138◦. The supplementary angle, 42◦, corresponds to the angle above the horizon at which the peak of the primary rainbow is seen when the sun is on the horizon, and consequently accounts for the circular shape of the bow, shown in Figure 2.3. This means that as the sun rises in elevation a smaller and smaller portion of the bow is visible to the observer. This also means that an increase in elevation of the observer would allow for a larger part of the bow to be visible, and it is even possible for a complete circular rainbow to be observed [70]. Figure 2.3 demonstrates where the rainbow is visible to the observer. It is also important to note that, for an observer on the ground to see the bow, the sun cannot be more than 42◦ above the horizon.

2.3 the secondary bow

The secondary bow, which can be visible above the primary bow, is formed by two internal reflections in the drop of rain, as shown in Figure 2.2. Using the same treatment as for the primary bow we find the angle of deviation, D2, to be

$$ D_2 = 2(180^\circ) - 2(3r - i). \tag{2.4} $$
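The minimum deviation angles of both bows can be found by scanning over the incident angle. The sketch below assumes n = 1.33 for water and a ray entering from air (n = 1); the function names and the step count are illustrative choices. It uses the deviation formulas D1 = 180° − 2(2r − i) and D2 = 2(180°) − 2(3r − i), written as a single expression in the number of internal reflections.

```python
import math

def refraction_angle(i_deg, n=1.33):
    """Refracted angle inside the drop from Snell's law, sin i = n sin r."""
    return math.degrees(math.asin(math.sin(math.radians(i_deg)) / n))

def deviation(i_deg, n=1.33, reflections=1):
    """Total deviation in degrees after the given number of internal
    reflections: 180*k - 2*((k + 1)*r - i), which reduces to D1 for k = 1
    and to D2 for k = 2."""
    r = refraction_angle(i_deg, n)
    return 180.0 * reflections - 2.0 * ((reflections + 1) * r - i_deg)

def minimum_deviation(n=1.33, reflections=1, steps=9000):
    """Scan incident angles from 0 to 90 degrees and return the pair
    (incident angle, deviation) at which the deviation is smallest."""
    angles = [90.0 * j / steps for j in range(steps + 1)]
    best = min(angles, key=lambda a: deviation(a, n, reflections))
    return best, deviation(best, n, reflections)

i1, d1_min = minimum_deviation(reflections=1)  # primary bow
i2, d2_min = minimum_deviation(reflections=2)  # secondary bow
```

For n = 1.33 the two minima land within about a degree of the values of 138◦ and 231◦ quoted in the text; the exact figures shift slightly with the refractive index chosen.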

If we again plot D2 for values of i we see that a minimum occurs where D2 = 231◦. This corresponds to the angle of 51◦ at which the observer can view the peak of the secondary rainbow. Notice in Figure 2.3 that the ray is deviated in the opposite direction from that of the primary bow. This explains the reversed order of the colors compared to the primary bow.

2.4 dispersion

What we neglected in the discussion of bow formation from monochromatic light we now address in a discussion of dispersion, which is central to the dramatic separation of the visible wavelengths that compose white light. The previous sections merely explain how sunlight shining from behind the viewer is refracted and internally reflected in raindrops so as to produce a bow shape. Dispersion explains why the viewer sees the light separated into the full visible spectrum. Dispersion is simply the phenomenon in which the phase velocity of waves in a medium depends on their frequency. A simplified consequence of this effect is that the amount by which light is refracted depends on the frequency of the light.

A light ray that passes through a vacuum will arrive at some detector, like our eyes, unchanged and with constant velocity. In that case it is correct to assume that each photon that is detected originated from the light source. However, this is not true for light that passes through a medium such as air, glass, or water. The light is transmitted through the medium when the incoming photons are absorbed by the atoms of the medium and then immediately reemitted by each of the atoms in the ray’s path. The principle that light propagates by successive emission of wavelets from particles in the beam’s path is known as Huygens’ Principle and is seemingly in opposition to Newton’s corpuscular theory of light. This mechanism of transmission through media results in a slowing of the phase velocity, and thus leads to refraction as the light passes between media of differing refractive properties, for example from vacuum into glass. We saw a consequence of this earlier, in our discussion of Snell’s Law.

Newton observed the spectrum of visible light by using a prism. He determined that the index of refraction is related to the wavelength of light, which is what we call dispersion. He observed “that the rays which differ in refrangibility will have different limits of their angles of emergence, and by consequence according to their different degrees of refrangibility emerge most copiously in different angles, and being separated from one another appear each in their proper colours” [93]. This is the mechanism by which white light is split into the colors seen in rainbows.

Over a wider range of wavelengths beyond the visible, dispersion shows the consequences of complex interactions at the atomic level, such as absorption into low-energy lattice vibrations and absorption due to high-energy electronic excitations. However, as we consider the rainbow, it is enough to know that in the region of visible wavelengths the index of refraction of water increases with photon energy. As sunlight enters a raindrop the violet end of the spectrum is refracted the most and red light the least, so the full spectrum of visible wavelengths is observed. We see that the refractive index should be expressed as some function of the frequency of light, n(f).



Figure 2.3: The paths of light forming the primary and secondary bow. [89]

example 2.1: an example
Light traveling through air (n = 1) is incident on the surface of water (n = 1.33) at an angle of 30◦. To determine the direction of the refracted ray we can use Snell’s law, equation 2.2:

$$ \sin r = \frac{1}{1.33}\sin(30^\circ), \qquad r = 22.1^\circ. $$
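The same calculation can be written as a small function. This is a sketch under the assumptions of the example (air with n = 1 and water with n = 1.33); the name `refracted_angle_deg` is made up for illustration, and the function raises an error when no refracted ray exists.

```python
import math

def refracted_angle_deg(incident_deg, n1=1.0, n2=1.33):
    """Refracted angle in degrees from Snell's law, n1 sin(i) = n2 sin(r).
    Angles are measured from the normal to the surface."""
    s = n1 * math.sin(math.radians(incident_deg)) / n2
    if abs(s) > 1.0:
        raise ValueError("total internal reflection: no refracted ray")
    return math.degrees(math.asin(s))

# Air to water at 30 degrees incidence, as in the example.
r_water = refracted_angle_deg(30.0)
```

Swapping `n1` and `n2` describes a ray leaving the water, in which case the ray bends away from the normal instead of toward it.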



2.5 problems

1. A red laser produces light with wavelength 630 nm in air and 475 nm in water. Determine the index of refraction of water and the speed of the light in water.
Solution: We know that in any material v = λf. In vacuum this is c = λ0 f. The frequency is the same in all materials, and the wavelength changes in correspondence with the change in velocity. So we can say f = c/λ0 = v/λw. We can combine this with equation 2.1 and see that λw = λ0/n, so

$$ n = \frac{\lambda_0}{\lambda_w} = \frac{630\ \text{nm}}{475\ \text{nm}} = 1.33. $$

Then from n = c/v we can determine the velocity of the light in water,

$$ v = \frac{c}{n} = \frac{3.00 \times 10^8\ \text{m/s}}{1.33} = 2.26 \times 10^8\ \text{m/s}. $$

2. A secondary bow is fainter than the primary bow of a rainbow because
A. It is a partial reflection of the primary bow.
B. Only larger drops produce secondary bows.
C. The sunlight reaching the raindrops is less intense.
D. There is an extra partial reflection in the raindrops.
Answer: D. The secondary bow is produced by two internal reflections inside a raindrop, as shown in Figure 2.2.





MATTHEW DRAKE: THE CAMERA AND HOW IT WORKS

To examine the camera, we must first look back to earlier times. The earliest precursor of the modern camera was known as a “camera obscura,” which consisted of a large dark room with a pin-hole on one side. An image would be formed on the wall opposite the pin-hole as shown in Fig. 3.1. The camera obscura was first introduced as a model for the eye by Abu Ali al-Hasan ibn al-Hasan ibn al-Haytham, or Alhazen for short. Alhazen investigated the camera obscura by placing three candles outside of it, systematically obstructing each candle, and observing the effects on the image produced. Alhazen observed that the image produced was reflected about the pin-hole point, as seen in Fig. 3.1. To explain why the human eye sees images right-side up, Alhazen interpreted these findings to mean that the image was sensed on the outside of the eye, despite having the nerve and retina structure in his model [43]. The camera obscura was later improved by Girolamo Cardano when he suggested putting a convex lens in the pin-hole to help focus the image. By inserting a convex lens of proper focal length, the image would become far clearer. In his original description, however, Cardano failed to explain this mathematically [44].


Take a piece of paper, and place it opposite the lens as much removed [from the lens], that you minutely see on that paper all that is outside the house. This happens most distinctly at a certain point, which you find by approaching or withdrawing with respect to the lens, until you find the convenient place.

Figure 3.1: The inverted image is projected on to the opposite wall from the light coming in through the pin-hole. Image courtesy of [43]




Principles from the camera obscura are still used in modern cameras. The following sections explain these principles in detail, beginning with lenses.

3.2 lenses

To begin our discussion of the camera, we must first understand the role of the lens. Lenses come in two forms, converging and diverging. To use the simplest description, a converging lens is shaped like the outside of a sphere, while a diverging lens is shaped like the inside of a sphere. To understand a lens’ application to the camera, we only need to understand the thin lens approximation as opposed to more detailed equations. In general, the thin lens approximation is valid when the thickness of the lens is small compared to the object and image distances. The thin lens equation states that

$$ \frac{1}{f} = \frac{1}{s} + \frac{1}{s'}, \tag{3.1} $$

where f is the focal length, s is the distance of the object to the lens, and s′ is the distance of the image that is created by the lens, where all distances are measured along the optical axis. If the image has a positive distance, then it is a real image. If the image has a negative distance, then it is a virtual image. Two important properties emerge from the form of Eq. 3.1.

1. If the object is at the same distance as the focal length, then for Eq. 3.1 to hold, 1/s′ must go to zero and thus the image forms at |∞|.
2. If the object is very far away (approaching |∞|), then for Eq. 3.1 to hold, the image forms at the focal length of the lens.

I specify that the distances are at the absolute value of ∞ because they can be towards the positive or negative end of the optical axis. Which end of the optical axis an image is located on depends on whether the lens is converging or diverging. By definition, if the lens is a converging lens, then the focal length is positive. If the lens is a diverging lens, then the focal length is negative. Figure 3.2 shows a typical image formation from a converging lens. The optical axis has its origin at the center of the lens, with the left side of the lens typically being the negative direction.

Along with the placement of the image, we can also examine the size of the image that is formed. Because the lens bends the light rays, the size of the image will be different from the size of the object, except in a special case. The magnification of the image comes from the equation

$$ m = -\frac{s'}{s}. \tag{3.2} $$

The magnification tells us the ratio of the size of the image to the size of the object. A negative sign in the magnification tells us that the image is inverted, as is the case in the camera obscura.



Figure 3.2: The rays from point Q are refracted by the thin lens to bend toward the focal point. Because the lens is thin, the ray is considered to bend at the middle of the lens. Image courtesy of [99]

example 3.1: forming an image
Suppose there is an object 1 m away from a converging lens. If the focal length of the lens is 20 cm, where will the image be formed? What will the magnification of the image be? What do the placement of the image and the magnification tell us about the image? To answer, we apply Eq. 3.1 and solve for $s'$. Then we use Eq. 3.2 to find the magnification.

\[ \frac{1}{100}\,\mathrm{cm}^{-1} + \frac{1}{s'} = \frac{1}{20}\,\mathrm{cm}^{-1} \tag{3.3} \]

\[ \frac{1}{s'} = \frac{5}{100}\,\mathrm{cm}^{-1} - \frac{1}{100}\,\mathrm{cm}^{-1} = \frac{4}{100}\,\mathrm{cm}^{-1} \tag{3.4} \]

\[ s' = 25\ \mathrm{cm} \tag{3.5} \]

\[ m = -\frac{25\ \mathrm{cm}}{100\ \mathrm{cm}} = -0.25 \tag{3.6} \]

The negative magnification tells us that the image is inverted and one fourth of its original size. Since the image distance is positive, we also know that it is a real image.
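The arithmetic of Example 3.1, together with the two limiting properties of Eq. 3.1, can be checked with a few lines of Python. This is only a minimal sketch: the function names `image_distance` and `magnification` are illustrative choices rather than notation from the text, and all distances are assumed to be in centimeters.

```python
import math

def image_distance(f, s):
    """Solve the thin lens equation 1/f = 1/s + 1/s' for the image distance s'."""
    if math.isinf(s):
        return f                 # object at infinity: image forms at the focal length
    inverse = 1.0 / f - 1.0 / s
    if inverse == 0.0:
        return math.inf          # object at the focal length: image forms at infinity
    return 1.0 / inverse

def magnification(s, s_prime):
    """Eq. 3.2: m = -s'/s."""
    return -s_prime / s

# Example 3.1: object 100 cm from a converging lens of focal length 20 cm
s_prime = image_distance(20.0, 100.0)
m = magnification(100.0, s_prime)
print(f"s' = {s_prime:.2f} cm, m = {m:.2f}")

# A diverging lens (negative focal length) gives a negative image distance,
# i.e. a virtual image.
print(image_distance(-20.0, 100.0))
```

A positive `s_prime` with a negative `m` corresponds to the real, inverted image described in the example.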


the camera itself

Now that we have an understanding of lenses, we may apply that knowledge to understand how the camera works. The basic method that the camera uses is to form a real, inverted image onto a small screen using a converging lens. In general, only a real image may be projected onto a screen, so a converging lens is the natural choice for this application. The camera is made up of a fully enclosed box, a converging lens, a shutter to open the lens for a small period of time, and a recording medium. Figure 3.3 shows how the image of an object forms on a camera screen. The screen of the camera is typically some photosensitive material or an electronic detector, depending on whether the camera is a film camera or a digital camera. When the picture is taken, the shutter opens to allow the light in for a short period of time, and the image is first recorded inverted.



Figure 3.3: The image of the key is real, inverted, and smaller than the original object when it is seen through the camera lens. Image courtesy of [99]

The image will have to be inverted again in order to be viewable in the correct orientation. On a digital camera, an electronic process does this. On a film-style camera, this is done when printing the pictures through a screen process.

Most objects that one would want to photograph are at a distance considerably larger than the distance from the lens to the screen, and so the focal length of the lens must be correspondingly small. To see this, you can work out Example 3.1 again, but instead solve for the focal length with a small image distance. As the object distance goes towards infinity, you will see that the image distance becomes equal to the focal length.

For an image to be properly recorded onto the medium, the correct amount of light needs to reach the screen. If the screen gathers too much light, the image will look white from being too bright. If the screen gathers too little light, the image will be too dim to recognize. The amount of light gathered is controlled by the time the shutter is open, and also by a property known as the f-number. The f-number depends on the focal length of the lens and on the diameter of the aperture as controlled by the diaphragm. The aperture is a circular area where the lens is placed, and the diaphragm is an adjustable piece which controls the size of the aperture. The f-number is defined as

\[ \text{f-number} = \frac{f}{D} \tag{3.7} \]

where $f$ is the focal length and $D$ is the diameter of the aperture. The light intensity that may reach the film is proportional to the square of the inverse f-number, the time that the shutter is open, and the brightness of the object itself.

Figure 3.4: The secondary lens projects the image of the primary lens to make a magnified image. Image courtesy of [2]

Nearly all cameras contain options to zoom in or out on objects. One method of doing this is to arrange two converging lenses as shown in Fig. 3.4. The primary lens is a converging lens and brings the light rays together near its focal point, forming an inverted image. The image is then seen through the secondary lens, another converging lens. Because the image is inverted a second time, it now has the correct orientation and can be magnified. To obtain the magnification of the zoom lens, you simply combine the two magnifications of the lenses by multiplying them:

\[ m_{\text{total}} = m_1 m_2 \tag{3.8} \]

This type of zoom lens is one of the simplest and is often referred to as the “astronomer’s lens.”

3.4 questions

1. An image produced by a converging lens of an object at a distance greater than twice the focal length is
   a) real
   b) inverted
   c) smaller
   d) all of the above

2. If the focal length of the lens on a camera is 1 cm, how far should the screen be from the lens to take a picture of an object 10 meters away? Explain why camera manufacturers would place the screen a distance equal to the focal length.

3. Consider two converging lenses in an “astronomer’s lens” setup with a primary focal length of 30 cm and a separation of 50 cm. What focal length should the secondary lens have if you desire a magnification of 2 for an object at a distance of 20 m? You may use the approximation that for $s \gg f$, $s' \approx f$.

3.5 solutions

1. Answer: d. If you work out the magnification, it will be negative and have absolute value less than 1. The image distance will be positive.

2. Answer: The image distance would be $s' = \frac{1000}{999}$ cm. This is very nearly equal to the focal length, and since most pictures will have an object distance much greater than the focal length, the screen is placed a focal length away from the lens.

3. Answer: You must use Eq. 3.8, write out $m_1$ and $m_2$, and then fit those to the conditions. Since $f_2$ appears only in the expression for $s_2'$, solve to get

\[ s_2' = \frac{2 s_1 s_2}{f_1} \]

Continuing from there by applying Eq. 3.1 to the secondary lens, $1/f_2 = 1/s_2 + 1/s_2'$, you will eventually end up with

\[ f_2 = \left( \frac{f_1}{2 s_1 s_2} + \frac{1}{s_2} \right)^{-1} \tag{3.9} \]
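The numbers behind Solutions 2 and 3 can be worked out directly from Eqs. 3.1, 3.2, and 3.8. The sketch below is hedged in two ways: it uses the stated approximation that the primary image forms at the primary focal length, and it assumes that this image serves as the object for the secondary lens, so that the secondary object distance is the lens separation minus the primary focal length. The variable names are illustrative.

```python
def image_distance(f, s):
    """Thin lens equation (Eq. 3.1) solved for the image distance s'."""
    return 1.0 / (1.0 / f - 1.0 / s)

# Question 2: f = 1 cm, object 10 m = 1000 cm away
s_prime_q2 = image_distance(1.0, 1000.0)

# Question 3: two converging lenses, all lengths in cm
f1 = 30.0           # primary focal length
separation = 50.0   # distance between the two lenses
s1 = 2000.0         # object distance (20 m)
m_total = 2.0       # desired total magnification

s1_prime = f1                  # approximation: s1' is about f1 because s1 >> f1
s2 = separation - s1_prime     # assumption: the primary image is the secondary's object
m1 = -s1_prime / s1            # Eq. 3.2 for the primary lens
m2 = m_total / m1              # Eq. 3.8
s2_prime = -m2 * s2            # Eq. 3.2 for the secondary lens
f2 = 1.0 / (1.0 / s2 + 1.0 / s2_prime)   # Eq. 3.1 for the secondary lens

print(f"Q2: s' = {s_prime_q2:.4f} cm")
print(f"Q3: s2' = {s2_prime:.1f} cm, f2 = {f2:.2f} cm")
```

Under these assumptions the secondary focal length comes out slightly below the secondary object distance of 20 cm, which is what a strongly magnifying real image requires.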





sebastian fischetti: holography: an introduction

The art of holography, although a relatively recent discovery in itself, is based on the same principles of optical interference that you have undoubtedly studied in your introductory physics courses. A detailed discussion of holography would be too lengthy to include here; rather, we will only provide an introduction to the mechanisms, production, and applications of holography. The interested student is welcome to read more detailed books on the subject [47]. Furthermore, because this section assumes a thorough understanding of optical interference, we encourage you to review the topic if you feel the need; see, for example, any standard introductory physics text [100] or, for a more enjoyable read, the Feynman Lectures on Physics [32].

the geometric model

Let us begin with a brief review of the mechanism behind classical photography. An object is photographed by using an optical setup to project the three-dimensional object into a two-dimensional image. This image, which consists almost always of incoherent light, strikes a plate covered in a chemical emulsion, triggering a reaction that essentially stores the “brightness” of the light striking it. In this sense, a standard photograph only records information about the amplitude of the light waves striking each section of it, and can only record a two-dimensional image (we assume, for simplicity, that the light and photograph are monochromatic, so we needn’t worry about how the light’s wavelength is recorded).

How, then, are we able to store a third dimension in a two-dimensional plate? The answer lies in exploiting the interference effects of coherent light. If coherent light is emitted from two point sources, it will interfere with itself to yield constructive and destructive interference fringes: you have studied this phenomenon in your lab as Young’s double-slit experiment. If we were to trace the locations of constructive interference throughout space, we would obtain a set of surfaces called interference fringes. By placing a photographic plate in the setup, we can record the locations and directions of these surfaces (see Figure 4.1(a)). This will effectively create a set of “partially reflecting surfaces” within the plate, which, when illuminated with the same type of light used to create the image, will reflect light in such a way as to reproduce the original image (see Figure 4.1(b)).

example 4.1: the shape of interference fringes
Assume coherent light is being emitted from two point sources A and B, as in Figure 4.1(a). What shape will the resulting interference fringes be? The interference fringes occur at points of total constructive interference. Imagine we find a single point P in space where the beams from the two sources interfere constructively; this means that the difference in the path lengths is an integral number of wavelengths: AP − BP = nλ. The interference fringe passing through P must therefore consist of the locus of points such that the difference of the distances to A and B is a constant. This is none other than the geometric definition of a hyperbola; thus the interference fringes produced by two point sources of coherent light will be hyperbolic surfaces. In particular, the total number of interference fringes is limited to the number of wavelengths that can fit into the distance between A and B.

Figure 4.1: Figure (a) shows the hyperbolic interference fringes produced by two point sources of coherent light, and how these fringes are recorded in a photographic plate; Figure (b) shows how the interference fringes recorded by the holographic plate create a virtual image when illuminated by coherent light [47]. This is an example of a simple transmission hologram.
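The claim of Example 4.1 can be checked numerically: every point on a hyperbola with foci at A and B has the same path difference AP − BP, and since that difference can never exceed the separation AB (by the triangle inequality), the fringe order n is bounded by AB/λ. The sketch below works in two dimensions with arbitrary units; the source positions, the hyperbola parameters, and the function names are illustrative assumptions.

```python
import math

def path_difference(p, a_src, b_src):
    """Return AP - BP for a point p and two sources given as (x, y) tuples."""
    return math.dist(p, a_src) - math.dist(p, b_src)

def max_fringe_order(separation, wavelength):
    """Largest integer n for which n * wavelength still fits into the separation."""
    return math.floor(separation / wavelength)

# Two sources on the x axis, a distance 2c apart
c = 5.0
A, B = (-c, 0.0), (c, 0.0)

# Points on one branch of the hyperbola x^2/a^2 - y^2/b^2 = 1 with b^2 = c^2 - a^2,
# so that A and B are its foci
a = 2.0
b = math.sqrt(c**2 - a**2)
points = [(a * math.cosh(t), b * math.sinh(t)) for t in (-1.0, -0.3, 0.0, 0.7, 1.5)]

differences = [path_difference(p, A, B) for p in points]
print(differences)                    # each value should be close to 2a
print(max_fringe_order(2 * c, 0.5))   # bound on the fringe order for wavelength 0.5
```

In three dimensions the same argument gives hyperboloids of revolution about the line AB, which is why the text speaks of hyperbolic surfaces.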

In order to take holograms of extended objects, we exploit this same principle: we split a beam of coherent light, using one beam (the object beam) to illuminate the object to be photographed, and using the other one to illuminate the plate directly (the reference beam). The interference between the two beams will be far more complex than in the case of two point sources mentioned above, but can nonetheless be recorded within a photographic plate. By illuminating the processed plate from the same side as the reference beam, the interference patterns within the plate will reproduce the image.

Note that as a result, holograms are redundant: every section of the plate contains an image of the entire object. This can be conceptualized by imagining the plate as a “window” through which we look to see the holographic image: if all of the plate is covered except for a small opening, we can still discern the entire image through the opening, indicating that the uncovered portion of the plate still contains information about the entire image (this ceases to be true, however, as we approach the length scale of the interference fringes themselves). In this sense, holograms store far more information about an image than ordinary photographs, where each part of the plate only contains information about a small portion of the image.

example 4.2: diffraction
Based on your knowledge of the propagation of light, what important phenomenon are we neglecting to take into account in the above geometrical model? We are neglecting the diffraction of light as it passes through the various “slits” formed by the interference fringes within the plate. For our purposes, the geometrical model gives a good enough sense of what’s happening, but there are cases where it fails, at which point we need to take into account diffraction effects with the zone plate model, which will be discussed further in Example 4.3.

Figure 4.2: Here is how interference fringes are recorded in a reflection hologram; notice that the fringes are approximately parallel to the surface of the plate [47].


types of holograms

There are two broad types of holograms: transmission and reflection holograms. A transmission hologram is one in which the reference beam and the object beam strike the plate from the same side, as in the simple model shown in Figure 4.1(a). In this case, the plane of the interference fringes recorded in the plate is more or less perpendicular to the plane of the plate, which allows us to explain the image as nothing but the reflection of light off of the various “partially reflecting surfaces” produced by the interference fringes. In fact, two images are produced simultaneously. A virtual image is produced when the interference patterns within the plate cause the reference beam to diverge, making the image appear behind the plate; this is the image we generally view when looking at holograms, and is the one illustrated in Figure 4.1(b). However, the interference patterns can also cause some of the reference beam to converge into a real image, which can be projected onto a screen (or, with more difficulty, be viewed directly by placing the eye at its location). Attempting to bring the whole real image into focus at once will be impossible because of the image’s depth; this is a result of the hologram’s inherent three-dimensional information content.

In contrast, a reflection hologram is produced by illuminating the plate with the object and reference beams from opposite sides, as shown in Figure 4.2. Now, the plane of the interference fringes is approximately parallel to the plate, so that when the plate is illuminated to produce a hologram, the light undergoes Bragg reflection as it penetrates through the various interference fringes. As a result, the reflected light, and hence the image, is only visible from a relatively small range of angles. Furthermore, the virtual and real images are not produced simultaneously; if the plate is illuminated from one side, the reflected light will diverge, creating a virtual image; if the plate is illuminated from the other side, the reflected light will converge, producing a real image.

example 4.3: thick vs. thin holograms
Based on the above discussion of how light is reflected from a holographic plate to produce images, explain why thicker holographic plates tend to yield higher-quality images than thinner plates.



The thicker the emulsion used to store the interference fringes, the more well-defined the shape of the fringes, and the more effectively light can reflect off of them to produce an image. In the case of reflection holograms, the thickness of the emulsion is especially crucial, because it places an upper limit to how many Bragg-reflecting surfaces can fit within the emulsion. If the thickness of the emulsion is significantly greater than the separation between successive interference fringes, then the hologram is considered thick. If instead, the emulsion is thinner than the separation between interference fringes, the hologram is considered thin. A reflection hologram ceases to exist when it becomes too thin because the Bragg interference is lost. A thin transmission hologram can still exist, but in this case is called a surface hologram, since it essentially consists only of surface striations and yields a lower-quality image than its thick counterpart. A surface hologram cannot be explained using the geometric model we described, and instead requires use of the zone plate model described above. In general, thick holograms are better than thin, as they are capable of containing more information throughout their depth.


making holograms

Because the quality of holograms relies so heavily on the formation of interference fringes within the emulsion, the components of the optical setup cannot move relative to each other by a distance greater than the wavelength of the light being used. For creating visible-light holograms, this necessitates the use of an optical workbench damped against external vibrations. Furthermore, in order to maintain coherence of the light, a single source of light must be used, and the scale of the image is dictated by the coherence length of the light source. The typical light source is a laser, modern versions of which can have a very long coherence length, so there is essentially no limit to the scale of the hologram (except, perhaps, for budgetary concerns). Finally, the possible emulsions vary greatly, depending on the wavelength and intensity of the light used, the type of hologram being produced, budget, and desired exposure time.

Also of importance is the reduction of noise in the image; generally, this is done by adjusting the beam ratio. The beam ratio is defined as the ratio of the amplitude of the reference beam to that of the object beam, and is crucial for filtering out intermodulation noise, which is caused by the object beam interfering with itself (while we want it to interfere only with the reference beam). By changing the beam ratio, we can change the relative amplitudes of the object and reference beams, and therefore change the relative amplitudes of the various possible interference effects between them. Generally, the ideal beam ratio depends on the particular geometry of a setup, and is best found by trial and error.

To produce a high-quality transmission hologram, we set up an arrangement similar to that shown in Figure 4.3. Notice first of all that only a single laser is used; all three beams originate from the laser. The reason for this is mentioned above.
The particular arrangement illustrated is convenient because it actually uses two object beams to illuminate the object more uniformly. Generally, transmission holograms use a beam ratio greater than 1:1.



Figure 4.3: A typical setup for producing a transmission hologram [47].

Figure 4.4: A typical setup for producing a reflection hologram [47].

To produce reflection holograms, the apparatus used looks more like Figure 4.4. This particular setup works best with a beam ratio of approximately 1:1.

4.5 applications of holography

The applications of holography are varied and complex; here we can only mention them in passing, but we encourage you to research them on your own if any seem particularly interesting. The most promising application of holography is data storage. Unlike conventional storage devices (optical disks, magnetic disk drives, etc.), which store information on a two-dimensional surface, holograms are capable of storing information throughout their entire three-dimensional volume. As a result, holograms can (theoretically) store information much more densely and efficiently than current means. The current challenges that holographic data storage faces are the lack of read-write holographic media and the complexity involved in reading holograms via computerized means. Progress in this field has so far been limited, but the potential for significant advancement exists.



Figure 4.5: An example of pattern recognition using Fourier holograms. To the left, the transparency of the page in which a certain letter is to be found is illuminated by a beam of laser light; the leftmost lens creates a Fourier transform of the transparency, which is projected onto the Fourier hologram of the letter to be identified. The lens at right reverses the Fourier transform and projects an array of dots onto the screen wherever the letter was found.

The “ordinary” holograms discussed so far are lensless: they do not require focusing light onto the holographic plate, as conventional photography does. However, it is possible to make a hologram using lenses, by placing a converging lens between the illuminated object and the holographic plate such that the object is in the focal plane of the lens. The resulting hologram cannot be viewed via conventional means because the lens destroys the crisp image of the object. Nonetheless, optical information about the object is stored in the hologram. In fact, this configuration produces the Fourier transform of the object at the plate; the resulting hologram is called a Fourier hologram.

On its own, a Fourier hologram is not of much use, but it can be very useful in pattern recognition. Imagine, for instance, that we wish to find all instances of a given pattern (say, a particular letter) in a page of text. We can do so with an arrangement like the one illustrated in Figure 4.5. We first create a Fourier hologram of the desired pattern. Then we create an optical Fourier transform of the page of text and project it onto the Fourier hologram of the desired pattern. Finally, we reverse-Fourier transform the combined beams. The result will be an array of dots indicating the location of every instance of the desired pattern in the original text. You might have encountered this process in your math classes: combining two Fourier transforms by multiplication and then transforming back is equivalent to a convolution of the original functions.

4.6 problems

1. Transmission holograms are visible over a virtually 180° range (as long as the plate remains in the line of sight between yourself and the virtual image), while the angular range over which reflection holograms are visible is very limited. Explain.
Solution: This difference can be understood from the geometric model. In a transmission hologram, the plane of the interference fringes is essentially perpendicular to the plane of the plate; this geometry allows the reference beam illuminating the plate from behind to reflect in virtually any direction as it passes through the plate. On the other hand, the plane of the interference fringes in a reflection hologram is parallel to the plane of the plate, and the image is formed via Bragg reflection. In Bragg reflection, a light wave passing through multiple reflective layers is reflected multiple times, interfering constructively for some angles of incidence and destructively at others. Therefore, a reflection hologram can only be viewed over a narrow angular range in the vicinity of the angle of incidence (or reflection) at which the constructive interference is maximum.

2. Why would a higher beam ratio be preferable to a low one in the making of a hologram?
Solution: The beam ratio is meant to reduce intermodulation noise, which is due to the interference of the object beam with itself. If the beam ratio is high, the reference beam’s amplitude is stronger than the object beam’s, and so the interference of the object beam with itself gives a low amplitude compared to the high amplitude of the interference of the object beam and the reference beam, drowning out intermodulation noise.

3. Imagine that in the apparatus of Figure 4.5 we replace the screen to the right with a transparency containing an array of dots and we replace the transparency to the left with a screen; then we project a laser beam from the right. What might we expect to happen?
   a) Nothing - the apparatus only works in one direction.
   b) An array of letters will be projected on the screen at left, mirroring the array of dots.
   c) It is impossible to tell, since the answer depends on what exactly the reference hologram is of.
Solution: The correct answer is choice 3b. This is simple symmetry: the Fourier transform of the array of dots, when convolved with the Fourier hologram of the letter, will yield an array of letters.
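The pattern-recognition scheme of Figure 4.5, and the symmetry invoked in Problem 3, have a simple one-dimensional discrete analogue. The sketch below is an illustration under stated assumptions, not a model of the optics: the "page" and the "letter" are short sequences of +1 and −1 values (an encoding chosen here so that only an exact match reaches the maximum score), the transform is a naive discrete Fourier transform, and the search step uses the complex conjugate of the letter's transform, which makes it a cross-correlation (a matched filter) rather than a plain convolution.

```python
import cmath

def dft(x, inverse=False):
    """Naive discrete Fourier transform, O(n^2); adequate for short sequences."""
    n = len(x)
    sign = 1 if inverse else -1
    out = []
    for k in range(n):
        total = sum(x[j] * cmath.exp(sign * 2j * cmath.pi * j * k / n) for j in range(n))
        out.append(total / n if inverse else total)
    return out

def pad(pattern, n):
    """Zero-pad the pattern to the length of the page."""
    return list(pattern) + [0.0] * (n - len(pattern))

def correlate(signal, pattern):
    """Circular cross-correlation: multiply FT(signal) by conj(FT(pattern)), transform back."""
    S, P = dft(signal), dft(pad(pattern, len(signal)))
    return [c.real for c in dft([s * p.conjugate() for s, p in zip(S, P)], inverse=True)]

def convolve(signal, pattern):
    """Circular convolution: multiply the two transforms, then transform back."""
    S, P = dft(signal), dft(pad(pattern, len(signal)))
    return [c.real for c in dft([s * p for s, p in zip(S, P)], inverse=True)]

letter = [1, -1, 1, 1]
page = [-1, -1, 1, -1, 1, 1, -1, 1, -1, 1, 1, -1]

# Forward direction: the correlation peaks (score equal to the letter length)
# mark where the letter occurs on the page.
scores = correlate(page, letter)
dots = [i for i, score in enumerate(scores) if abs(score - len(letter)) < 1e-6]
print(dots)

# Reverse direction (Problem 3): an array of dots convolved with the letter
# reproduces the letter at each dot, with zeros elsewhere.
dot_array = [1.0 if i in dots else 0.0 for i in range(len(page))]
letters = [round(v) for v in convolve(dot_array, letter)]
print(letters)
```

Here the first line printed is `[2, 7]`, the two positions at which the letter appears in the page, and the second shows a copy of the letter starting at each of those positions.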





colin lally: listening for the shape of a drum

A classic problem in analysis is that posed by Mark Kac in his landmark paper “Can One Hear the Shape of a Drum?” [45]. We shall follow his exposition, for a while. Then, we shall become absorbed in the simpler question “Can one hear the size of a drum?” Along the way, we shall develop the important tool of normal-mode analysis, and have a quick introduction to asymptotic analysis. For now, consider the following: A membrane Ω, such as that depicted in Figure 5.1, stationary along its boundary Γ, is set in motion. Its displacement $F(x, y; t) \equiv F(\mathbf{r}; t)$ in the direction perpendicular to its original plane is known to obey the wave equation

\[ \frac{\partial^2 F}{\partial t^2} = c^2 \nabla^2 F, \tag{5.1} \]

where $c$ is some constant that we shall normalize to be $c = 1/\sqrt{2}$. There exist special solutions to the wave equation, of the form

\[ F(\mathbf{r}; t) = U(\mathbf{r})\, e^{i\omega t}, \tag{5.2} \]

which are called normal modes. Each of these solutions corresponds to a fundamental frequency at which the membrane can vibrate. By substituting $U(\mathbf{r}) e^{i\omega t}$ into (5.1), we find the corresponding equation for $U$:

\[ \frac{1}{2}\nabla^2 U + \omega^2 U = 0, \qquad U = 0 \text{ on } \Gamma. \tag{5.3} \]

An illustration of the solution of (5.3) follows as part of the example below.

Figure 5.1: A membrane Ω; from [45]

example 5.1: normal modes of a string
This is the one-dimensional limiting case of the general problem being set up above. Consider a string of length $\ell$ stretched taut between two walls (Figure 5.2). Its end points are fixed; that is, it obeys the boundary conditions

\[ F(0, t) = 0 = F(\ell, t), \]

where $F$ is the displacement of the string, as given before. We want to find the normal modes of this string. Let $F = U(x) e^{i\omega t}$ be one of the normal modes we seek. Then $U(x)$ is a solution of the equation

\[ \frac{d^2 U}{dx^2} + k^2 U = 0, \qquad (k^2 = 2\omega^2) \]

which is just (5.3) in one dimension (minus the boundary conditions included in 5.3). The general solution to this equation is

\[ U = A \cos kx + B \sin kx. \]

Note that since the boundary conditions on $F$ involve only $x$, they are effectively boundary conditions on $U$. Applying them, we have

\[ U(0) = 0 \implies A = 0, \qquad U(\ell) = 0 \implies B \sin k\ell = 0 \implies k = \frac{n\pi}{\ell}, \]

where $n$ is a positive integer. Thus,

\[ \omega = \frac{n\pi}{\sqrt{2}\,\ell}, \]

and our normal mode (now labeled by the integer $n$) is

\[ F_n(x, t) = B \sin\!\left( \frac{n\pi x}{\ell} \right) e^{i n \pi t / (\sqrt{2}\,\ell)}, \]

where $B$ is a normalization constant.
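The result of Example 5.1 can be checked numerically: the mode should vanish at both ends of the string and satisfy the wave equation (5.1) with $c = 1/\sqrt{2}$. The sketch below is only a rough check, using a finite-difference estimate of the second derivatives; the string length, the sample point, and the function names are arbitrary illustrative choices.

```python
import cmath
import math

C = 1 / math.sqrt(2)   # wave speed, normalized as in the text

def omega(n, length):
    """Normal-mode frequency omega_n = n * pi / (sqrt(2) * length)."""
    return n * math.pi * C / length

def mode(n, x, t, length, amplitude=1.0):
    """F_n(x, t) = B sin(n pi x / length) exp(i omega_n t), with B = amplitude."""
    return amplitude * math.sin(n * math.pi * x / length) * cmath.exp(1j * omega(n, length) * t)

def wave_residual(n, x, t, length, h=1e-4):
    """Finite-difference estimate of F_tt - c^2 F_xx, which should be close to zero."""
    f_tt = (mode(n, x, t + h, length) - 2 * mode(n, x, t, length) + mode(n, x, t - h, length)) / h**2
    f_xx = (mode(n, x + h, t, length) - 2 * mode(n, x, t, length) + mode(n, x - h, t, length)) / h**2
    return f_tt - C**2 * f_xx

length = 2.0
for n in (1, 2, 3):
    end_value = abs(mode(n, length, 0.3, length))       # displacement at the fixed end x = length
    residual = abs(wave_residual(n, 0.7, 0.3, length))  # how well (5.1) is satisfied
    print(n, omega(n, length), end_value, residual)
```

Both the end value and the residual should come out small compared with the unit amplitude, limited only by rounding and the finite step `h`.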



A very interesting phenomenon appears in this example: there is a discrete sequence of normal-mode frequencies ω1 ≤ ω2 ≤ ω3 ≤ . . . . Each of these frequencies corresponds to exactly one Un through the relation between k and ω.

5.2 eigenvalues

It turns out that this result holds generally, regardless of problem dimensionality or geometry. Thus, for any region Ω bounded by a smooth (i.e., differentiable) curve Γ there exists a sequence of eigenvalues λ1 ≤ λ2 ≤ . . . such that there corresponds to each an eigenfunction ψn(r), which satisfies

\[ \frac{1}{2}\nabla^2 \psi_n + \lambda_n \psi_n = 0. \]

Naturally, ψn(r) = 0 for any point r that lies on the boundary Γ. Note that the eigenfunctions are normalized such that

\[ \int_\Omega \psi_n^2(\mathbf{r})\, d^2 r = 1. \]

Figure 5.2: The string considered in Example 5.1

We are now in a position to formulate the problem to which we alluded earlier. We wish to consider two separate regions (Ω1 and Ω2) with distinct boundaries (Γ1 and Γ2). Let us then consider these membranes’ respective eigenvalue problems:

\[ \Omega_1: \quad \frac{1}{2}\nabla^2 U + \lambda U = 0, \qquad U = 0 \text{ on } \Gamma_1 \tag{5.4} \]

\[ \Omega_2: \quad \frac{1}{2}\nabla^2 V + \mu V = 0, \qquad V = 0 \text{ on } \Gamma_2 \tag{5.5} \]

Suppose that, for each n, λn = µn. We want to determine whether, if the eigenvalue spectra are identical, the two regions Ω1 and Ω2 necessarily “have the same shape.” Kac more correctly (but less intuitively) couches the question in terms of Euclidean congruence [45].

Before we continue, we should note that this particular problem has recently been answered in the negative: it is possible to have differently shaped membranes that possess the same spectrum [35]. This result was discovered only in 1992, and is highly mathematical in nature. We shall therefore do no more than note it, and proceed to examine some interesting things that can be deduced from eigenvalue spectra. Hence, we shall first see whether one can “hear” the size (really the area) of a drum.

In order to answer this question, we shall use the methods of asymptotic analysis [5]. Quite briefly, this analysis deals with finding the limiting behavior of some expression. If we have two functions f(x) and g(x), then we call them asymptotically equivalent and write

\[ f \sim g \qquad (x \to \infty) \]

if

\[ \lim_{x \to \infty} \frac{f(x)}{g(x)} = 1. \]
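As a quick illustration of this definition, the ratio of two functions that differ only in lower-order terms approaches 1 as x grows. The pair of functions below is an arbitrary example chosen for this sketch, not one taken from the text.

```python
def ratio(f, g, x):
    """Return f(x)/g(x); f ~ g as x -> infinity means this tends to 1."""
    return f(x) / g(x)

def f(x):
    return x**2 + 3 * x + 1

def g(x):
    return x**2

for x in (1e1, 1e3, 1e5):
    print(x, ratio(f, g, x))
```

The difference f(x) − g(x) itself grows without bound here, which shows that asymptotic equivalence constrains the ratio of the two functions and not their difference.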


We can thus find the qualitative behavior of f at large values of x. Proceeding, we would like to know how many eigenvalues (from our eigenproblem) exist that are less than a given number λ (keep in mind that we are considering large λ). A result posited by H. A. Lorentz (in different form than is used here) and proved by Hermann Weyl is that, for any eigenproblem, the number N(λ) of eigenvalues less than some given λ is

\[ N(\lambda) = \sum_{\lambda_n < \lambda} 1 \sim \frac{|\Omega|}{2\pi}\, \lambda, \]

where |Ω| is just the area of the membrane. It follows that

\[ |\Omega| \sim 2\pi\, \frac{N(\lambda)}{\lambda}. \]
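For a rectangular membrane the eigenvalues are known in closed form, so this estimate can be tried out directly. The sketch below relies on a standard result that is not derived in the text: with the convention of (5.4), an a-by-b rectangle has eigenfunctions sin(mπx/a) sin(nπy/b) and eigenvalues λ = (π²/2)(m²/a² + n²/b²) for positive integers m and n. The side lengths and function names are illustrative.

```python
import math

def count_eigenvalues(a, b, lam):
    """Count the eigenvalues below lam for an a-by-b rectangular membrane.

    Assumes lambda_mn = (pi^2 / 2) * (m^2 / a^2 + n^2 / b^2), m, n = 1, 2, 3, ...
    """
    count = 0
    m = 1
    while (math.pi**2 / 2) * (m / a) ** 2 < lam:
        n = 1
        while (math.pi**2 / 2) * ((m / a) ** 2 + (n / b) ** 2) < lam:
            count += 1
            n += 1
        m += 1
    return count

def estimated_area(a, b, lam):
    """Area inferred from the eigenvalue count: |Omega| ~ 2 pi N(lam) / lam."""
    return 2 * math.pi * count_eigenvalues(a, b, lam) / lam

# A 2-by-3 rectangle has true area 6
for lam in (1e2, 1e3, 1e4):
    print(lam, estimated_area(2.0, 3.0, lam))
```

The estimate approaches the true area only slowly as λ increases, which reflects the fact that the relation is asymptotic rather than exact.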


That is, the area |Ω| of a “drum” (the membrane of Figure 5.1) can be inferred from a knowledge of how many normal-mode frequencies lie below a given large value. We can “hear” the area of a drum.

5.3 problems

1. Carry out the analysis done in the example, but use the boundary conditions

\[ F(0, t) = 0, \qquad \left. \frac{\partial F}{\partial x} \right|_{x=\ell} = 0. \]

These boundary conditions correspond to the physical scenario of a string with one fixed end at $x = 0$, and one end free to move in the plane at $x = \ell$.



2. Prove the result (5.6).

3. Exam question: Can one hear the shape of a drum?
   a) Yes
   b) No

Part II: Mind Over Matter




introduction to glass

lass is a substance that people encounter frequently in their everyday lives. The average person may assume there is nothing particularly interesting about glass and would describe it as a brittle clear solid formed from molten sand. Many others believe the common misconception that glass is a liquid that flows extremely slowly and that proof can be found in medieval windowpanes that are thicker at the bottom than at the top. As we will see, this misunderstanding is not entirely rooted in fiction. Glasses are actually a diverse group of substances with a unique set of properties. Despite the fact that humans have been making use of glass for thousands of years and its importance in our everyday lives, the details that underlie the formation of glasses is still a hotly debated topic in the fields of chemistry and physics. [16] In the common sense glasses are often considered to be a group of substances like the ones we encounter in our everyday lives. These glasses are usually hard, brittle, and sometimes transparent. In the scientific sense glasses are a much broader group of substances. For example, many organic polymers such as polystyrene and polyvinyl chloride are technically glasses. Although the exact definition of glasses may vary slightly from publication to publication, glasses can be described as amorphous solids that undergo a glass transition. To further understand the properties of glasses one must understand what it means to be an amorphous solid and what a glass transition is. [27]



amorphous solids

On the macroscopic scale the differences between a solid and liquid are obvious. A commonly unknown fact about solids is that there are two fundamental types: crystalline and amorphous. The distinction between these types can only be seen on the atomic scale. Crystalline solids are made up of molecules that form a well defined lattice structure that is repeated throughout the substance. This is called long-range order, or translational periodicity. On the other hand amorphous solids have no long-range order. Atoms in amorphous solids do show connectivity that resembles the structural units of crystalline states but it is not regular and does not repeat. This quasi-periodicity is indicative of what is called short-range order [31]. The difference between the molecular structures of crystalline solids and amorphous solids is shown in Figure 1. On the molecular level liquids look exactly like amorphous solids and as one may expect the molecules in an amorphous solid have a significant degree of freedom. Then what is the difference between and amorphous solid and a liquid, and why don’t they actually flow? There are a few different ways to draw the line between amorphous solids and liquids. One way is to say that a substance is a solid when the time required for it to undergo a structural change is much longer than the



christopher maclellan: glass

Figure 6.1: Atomic Makeup of Crystalline Solids vs. Amorphous Solids [7]

time of the observation [11]. In other words, a substance is a solid when it doesn’t flow in a reasonable amount of time. In fact, some amorphous solids would need billions of years, or even longer than the universe has existed, to show a noticeable structural change due to flow. Another way to describe the difference is to define a liquid as a substance with a viscosity below an arbitrary value. For example, in his book Physics of Amorphous Materials, S. R. Elliott defines a solid as a material with a viscosity above $10^{14.6}$ poise. Although these two definitions appear different at first glance, they both attempt to quantify the same obvious difference between liquids and amorphous solids. If we return to the common misconception that glass is a slow-flowing liquid, we find that under these definitions what we commonly consider to be glass cannot be a liquid; it simply does not flow fast enough. So why are medieval windowpanes thicker at the bottom than at the top? The answer lies in the method used to create the glass, not in its atomic structure. Now that we have defined amorphous solids and how they differ from liquids and crystalline solids, we must examine the glass transition, which is the property that distinguishes glasses from other amorphous solids. [27] [16]

example 6.1: glass in our everyday lives
We have defined glass in a scientific sense, but what about the glass we use in our everyday lives? Glass in the common sense is an amorphous solid made mostly of silica (SiO2). Silica is found in many forms, the most abundant of which is sand. This is not the whole story, though. For practical reasons, almost none of what we call glass is pure silicon dioxide. Crystalline silicon dioxide has a very high melting point of around 1700 °C [27]. Because of this, soda ash (Na2CO3) is often added to the silicon dioxide, which lowers the melting point to about 1000 °C and makes the molten glass much easier to handle. To provide extra hardness and chemical stability, lime (usually added as limestone, CaCO3) or dolomite (CaMg(CO3)2) is added to the mixture; the result is called soda-lime glass. The concentrations of these materials can vary, but the makeup is commonly 60-75% silicon dioxide, 12-18% soda and 5-12% lime. Other impurities can be added to change the properties of the glass for aesthetic or functional purposes. Soda-lime glass makes up most of the glass we use every day and can be found in everything from drinking glasses to windows.


the glass transition

When a liquid is cooled it may undergo a first order phase transition and crystallize at the melting point of the substance. In our everyday experience the transition from water to ice is a perfect example of this.



Figure 6.2: A Thermodynamical View of Liquid-Solid Transitions [7]. The path AD represents the transition from liquid to glass (for an arbitrary cooling rate), while the path ABC represents the transition from liquid to crystalline solid. Notice the discontinuous change in volume at the melting point during the crystalline transition. Compare this to the glass transition, where there is a continuous change in volume at the glass transition temperature.

This is not the only thing that can happen when a liquid is cooled. If cooled fast enough, it will not crystallize and will remain in a liquid state below its melting point. This is called a supercooled liquid, and its viscosity increases as the temperature is reduced. Eventually the supercooled liquid cools to the point that it forms a glass. This transition is marked by a glass transition temperature, defined as the range in which thermodynamic variables (e.g. volume, enthalpy, entropy) begin to change. It is a range because the values of these variables change in a continuous fashion, unlike the discontinuous change that occurs at the melting point. It is important to note that a substance does not have one specific glass transition temperature. As mentioned earlier, supercooled liquids only result from sufficiently fast cooling rates, and the cooling rate that allows a supercooled liquid to form depends on the substance being cooled. The cooling rate also changes the region over which a substance can be supercooled: if one reduces the rate of cooling, the range over which the liquid can be supercooled increases, which effectively lowers the glass transition temperature. In fact, the glass transition temperature can vary by as much as 10-20% for very different cooling rates. An odd consequence is that the measured glass transition temperature depends on how the glass was prepared. Despite this knowledge of the glass transition, many questions remain. Physicists and chemists still debate its details, and new methods are constantly being formulated to analyze glass structure and the transition. [27]

6.4 simulating amorphous materials with colloids

Some of the most promising ways to study the structure of glasses are being developed using colloids to simulate the molecular structure of amorphous materials. A colloid is a mixture in which a system



Figure 6.3: A diagram of the experiment. A binary colloid is allowed to settle in a capillary tube and is imaged with a confocal microscope. The larger particles are found in higher concentrations at the bottom, while the smaller particles are found in higher concentration at the top [69]

of particles between the sizes of $10^{-7}$ and $10^{-5}$ cm is dispersed in a continuous medium [17]. In our case we will consider colloids in which the medium is a liquid and the particles are micron-sized hard spheres. When colloids are allowed to settle, the particles pack together and resemble a continuous solid. In fact, the arrangement of particles is directly analogous to the arrangement of molecules in a solid. Because of the relatively large size of colloid particles, we can measure each one’s position individually. We exploit this property of colloids to analyze their structure and extend our findings to actual solids.

In the laboratory we create a colloid made up of a binary system of the aforementioned hard spheres. We use two different sizes because it is well established that monodisperse systems only settle in crystalline arrangements. In contrast, binary systems show areas of amorphous arrangement. In our case we place the colloid in a capillary tube so that gravity determines the distribution of large and small particles. As you would expect, the larger particles are more prevalent at the bottom while the smaller particles are found more frequently at the top. In the middle there is a continuum of different compositions with respect to sphere size. It is in this middle area that a truly binary colloid with amorphous structure exists. A confocal microscope can be used to measure the position of each particle in the capillary tube. A diagram of this setup is shown in Figure 6.3. [69]

The particle arrangement is characterized by a radial distribution function $g(r)$, defined as

$$ g(r) = \frac{L^2}{2\pi \Delta r\, N(N-1)} \sum_{i \neq k} \delta\left(r - |r_{ik}|\right). $$
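As a rough illustration of how such a function might be evaluated from a list of measured particle positions, the following Python sketch bins all pair separations $|r_{ik}|$ into shells of width $\Delta r$. It is only a sketch under stated assumptions: the function name and its arguments are hypothetical, each shell is normalized by its area $2\pi r \Delta r$ (a common convention for a binned estimate, which may differ from the normalization used in [69]), and no correction is made for particles near the edge of the image.

```python
import math
import random


def radial_distribution_2d(positions, box_length, dr, n_bins):
    """Binned estimate of g(r) for N points in an L x L image.

    Assumption: each shell is normalized by its area 2*pi*r*dr, so an
    uncorrelated (ideal-gas-like) arrangement gives g(r) close to 1.
    Edge effects are ignored.
    """
    n = len(positions)
    counts = [0] * n_bins
    # Loop over all ordered pairs (i, k) with i != k.
    for i in range(n):
        xi, yi = positions[i]
        for k in range(n):
            if i == k:
                continue
            xk, yk = positions[k]
            shell = int(math.hypot(xi - xk, yi - yk) / dr)
            if shell < n_bins:
                counts[shell] += 1
    radii, g = [], []
    for shell, count in enumerate(counts):
        r = (shell + 0.5) * dr  # shell midpoint
        radii.append(r)
        g.append(count * box_length**2
                 / (2.0 * math.pi * r * dr * n * (n - 1)))
    return radii, g


# Illustrative use with made-up, uniformly random positions (no real data).
rng = random.Random(0)
demo_positions = [(rng.random(), rng.random()) for _ in range(600)]
demo_r, demo_g = radial_distribution_2d(demo_positions, 1.0, 0.01, 10)
for r, value in zip(demo_r, demo_g):
    print(f"r = {r:.3f}  g(r) = {value:.2f}")
```

For random positions like these, $g(r)$ should stay close to 1 with no oscillations; a crystalline arrangement would instead show regularly spaced peaks.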


Here $g(r)$ is the ratio of the ensemble average of the number of particles in a region $r \sim r + \Delta r$ to the average number density $N/L^2$, where $N$ is the number of particles, $L$ is the side length of the square image, and $\delta$ is the Dirac delta function. This can be thought of as the relative number of particles present at a distance $r$ from a central particle. This single function gives a quantitative measure of the degree of long-range order present [69]. The radial distribution function at different heights is shown in Figure 6.4. You can see that at z = 1.8 mm and z = 6.5 mm, $g(r)$ has some oscillatory character. This indicates that there is a regular change in density as you move further from the central particle, which is what we would expect from a structure with long-range order. This also agrees with our assertion that crystallinity exists where the composition of particles is relatively monodisperse. At z = 3.3 mm we would expect more or less equal mixing of the different size spheres and an amorphous structure. $g(r)$ at this height confirms this, as there is no pattern in how the density of particles changes as $r$ increases [69]. Using this analysis we have found a way to measure the long-range order in a substance. Although the above treatment is mainly qualitative, more analysis techniques are being developed to describe the positioning of the spheres in both 2D and 3D [69]. Nevertheless, this method has proven to be an efficient way of creating a system that is analogous to a solid and that can be experimentally measured and easily manipulated. Analyzing colloidal systems like the one above provides one of the best chances for physicists and chemists to unlock the secrets of how and why glasses form.

Figure 6.4: The radial distribution function $g(r)$ at different heights, where σ is the mean particle diameter. Notice how the oscillatory behavior first decreases and then increases as the height z increases, indicating a transition from crystalline state to glassy state and back to crystalline state. Note that the graphs are offset for clarity. [69]

6.5 practice questions

1. In a technical sense glasses are identified by their lack of
• a) long-range order
• b) short-range order
• c) flexibility
• d) color

2. When cooling a liquid to form a glass, raising the cooling rate causes the glass transition temperature to
• a) rise
• b) fall
• c) stay the same



3. Colloids are used to simulate the formation of solids because:
• a) they are the same thing
• b) colloid particles are large enough to have their individual positions measured
• c) they never show crystalline arrangements
• d) all of the above

ROBERT PIERCE: THE PHYSICS OF SPLASHING




Have you ever noticed how the water droplets that come from your faucet splash at the bottom of the sink? Have you ever watched rain droplets splash as they hit your car windshield while driving? The phenomenon known as splashing occurs in many situations such as these. Splashing is also important in technological and industrial settings, such as the ignition of fuel in your car or the way your printer puts ink onto a piece of paper. This chapter will help you understand situations like these by developing an understanding of the physics of splashing.

Scientists have studied splashing for over a hundred years. Because of the beauty of the motion involved, splashing has been one of the most highly praised phenomena studied with high-speed photography. Through the famous sketches and photographs of scientists such as Harold Edgerton and A. M. Worthington [96], the beauty of splashing has been made available to mainstream society. Yet the physics of this tremendous phenomenon is still not fully understood.

The goal of this chapter is to develop an understanding of some of the physics involved when a drop of liquid falls and strikes a smooth surface. We will see that there are many factors (characteristics of our system) that determine how the system will evolve in time. We will also review fundamental concepts such as pressure, surface tension, and viscosity to lay the groundwork for understanding splashing.



pressure, surface tension, and other concepts

Recent research led by Dr. Wendy Zhang [97], Sidney Nagel [97][98], and others has shown that air pressure has a

Figure 7.1: A photograph taken by Martin Waugh[90] as part of his liquid sculpture images. This is an image of milk splashing on a smooth surface. This is just another example of the spectacular beauty of splashing, and why splashing has grabbed the attention of many photographers around the world.




tremendous influence on whether or not a splash will occur when a liquid drop falls onto a smooth surface. The research has shown that the lower the gas pressure surrounding the splash zone, the smaller the splashes become, until they disappear altogether. High-speed imaging shows that at a certain critical air pressure the corona (a layer of liquid spreading out from the center of impact with the smooth surface) disappears altogether, and there is no splashing. Instead, the liquid just spreads out along the surface in the same way that spilled milk spreads out across the kitchen table. This area of the physics of splashing is a bit too complicated to be addressed here, so we will focus on something less difficult: which characteristics of the system produce not only a corona but splashing (when the corona breaks up into thousands of individual droplets). In order to understand these characteristics to an appropriate degree, we need to review some of the physics of pressure, viscosity, and surface tension.

7.2.1 Pressure

Pressure is defined as the force per unit area of a surface, directed perpendicularly to the surface. In our context of falling liquid drops, we will focus on atmospheric pressure, which is due to the force of the gases in the atmosphere above the surface. We have that

$$ P = \frac{F_N}{A}, \tag{7.1} $$

where $P$ is pressure, $F_N$ is the normal force on the surface, and $A$ is the area of the surface. In our system, $F_N$ will be equivalent to the weight of the gas particles above the surface, as well as the force these gas particles exert onto the surface due to their speed. For ideal gases, we may express the pressure as

$$ PV = N k_b T \implies P = \frac{N}{V} k_b T \implies P = \frac{\rho_A}{M_g} k_b T, \tag{7.2} $$

Figure 7.2: Photographs taken by Lei Xu, Wendy W. Zhang, and Sidney R. Nagel in their experiment involving alcohol drops falling onto a smooth surface. The top three frames show alcohol under an air pressure of 100 kPa, and the bottom three frames show alcohol under an air pressure of 30 kPa. The left frames (top and bottom) show the drop just above the surface at time t = 0 ms, the middle frames show the drop at time t = 0.276 ms, and the right frames show the drop at time t = 0.552 ms. We see that in the middle frames a corona, or liquid layer, is spreading out from the center, and by the time of the right frames the stress due to air pressure has won, and we see splashing.



where $M_g$ is the molecular mass of the gas $g$, $V$ is the volume, $N$ is the number of particles, $T$ is the temperature, $k_b$ is the Boltzmann constant, and $\rho_A = N M_g / V$ is the density of the gas in the atmosphere. Equation 7.2 is the Ideal Gas Law. From 7.2 we readily find the gas density in the atmosphere to be

$$ \rho_A = \frac{P M_g}{k_b T}. \tag{7.3} $$

We will see that after a drop strikes the surface, a liquid layer (the corona) will spread outwards and interact with the atmosphere. The atmosphere will exert a stress on the liquid layer that is proportional to, among other things, the density of the gas in the atmosphere. We will learn what the other factors are later on, but first make sure that you can use the above equations to analyze this system in terms of the atmospherically applied stress by working through the following example.

example 7.1: using the ideal gas law

1. When observing a liquid drop spread out after coming into contact with a smooth surface, we notice that the liquid spreads out very rapidly. Under these circumstances the stress applied to the spreading liquid layer by the atmosphere will be proportional to the density of the gas in the atmosphere. If our atmosphere is made of air, and air has a molecular mass of $M_g = 29u$ ($u = 1.66 \times 10^{-27}$ kg), what pressure does the atmosphere exert on a liquid layer that is spreading outwards? Assume that we observe this at room temperature ($T = 295$ K). The density of air is measured to be $\rho_A = 1.2$ kg/m$^3$, and Boltzmann’s constant is $k_b = 1.381 \times 10^{-23}$ m$^2$ kg s$^{-2}$ K$^{-1}$. Does this pressure look familiar?

Solution: Here we may use equation 7.2 to solve the problem. We are given values for $\rho_A$, $M_g$, and $T$. Solving for the pressure, we find

$$ P = \frac{\rho_A}{M_g} k_b T \;\rightarrow\; P = \frac{1.2\ \text{kg/m}^3}{29 \times (1.66 \times 10^{-27}\ \text{kg})}\,(1.381 \times 10^{-23}\ \text{m}^2\,\text{kg}\,\text{s}^{-2}\,\text{K}^{-1})(295\ \text{K}). \tag{7.4} $$

So we find that the pressure is

$$ P \approx 101{,}500\ \text{kg}\,\text{m}^{-1}\,\text{s}^{-2} \;\rightarrow\; P \approx 101.5\ \text{kPa}. \tag{7.5} $$

This pressure is known as atmospheric pressure. It is the pressure measured due to the atmosphere at sea level. We get this value because the density given in the problem is the density of air molecules at sea level.
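The arithmetic in this example can be checked with a few lines of Python. This is a minimal sketch, not part of the original example: the function names are hypothetical, and the atomic mass unit is taken as $1.66 \times 10^{-27}$ kg.

```python
K_B = 1.381e-23   # Boltzmann constant in J/K
U = 1.66e-27      # atomic mass unit in kg


def gas_pressure(density, molecular_mass, temperature):
    """Equation 7.2: P = (rho_A / M_g) * k_b * T."""
    return density / molecular_mass * K_B * temperature


def gas_density(pressure, molecular_mass, temperature):
    """Equation 7.3: rho_A = P * M_g / (k_b * T)."""
    return pressure * molecular_mass / (K_B * temperature)


# Values from the example: air (M_g = 29 u) at T = 295 K, rho_A = 1.2 kg/m^3.
p_air = gas_pressure(1.2, 29 * U, 295.0)
print(f"Pressure of air: {p_air:.0f} Pa")

# Inverting with equation 7.3 should return the density we started from.
print(f"Recovered density: {gas_density(p_air, 29 * U, 295.0):.3f} kg/m^3")
```

Keeping both directions of the relation as separate functions makes it easy to reuse `gas_density` later, when the density of the surrounding gas is needed to estimate the stress on the spreading liquid layer.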


7.2.2 Viscosity and Surface Tension

Viscosity may be defined as the resistance to flow in a liquid. A general way to view a system involving viscosity is to imagine a flowing liquid section that is divided into many infinitesimally thick layers from top to bottom. Individual layers will move at different velocities from each other. This is because the layers at the bottom will interact with the surface, layers at the top will move with the initial speed of the



liquid section itself, and the middle layers will move with velocities in between the two extremes. Because neighboring layers move at different velocities, they exert shear stress on one another. Let’s make shear stress more clear: stress is defined as the average force applied to a surface per unit area (see equation 7.6), and shear means that the stress is applied parallel to the surface. Shear stress results as layers move relative to other layers. Stress $\tau$ is related to force $F$ by

$$ \tau = \frac{F}{A}, \tag{7.6} $$

where $A$ is the area. Shear stress is directed parallel to the liquid surface. Ultimately we will see that the stress in the liquid acts to hold the liquid together. This stress competes with the stress on the liquid from the atmosphere (due to pressure, as described in earlier sections), and when the atmospheric stress is stronger, splashing will occur. First, let us devote some time to viscosity. In our study of viscosity we will be interested in what is known as kinematic viscosity, which relates the resistive force in a liquid due to viscosity to the inertial force of the liquid. The kinematic viscosity $\nu_L$ of a liquid is defined as

$$ \nu_L = \frac{\mu}{\rho}, \tag{7.7} $$

where $\mu$ is the dynamic viscosity of the liquid and $\rho$ is the density of the liquid. In our system of a liquid drop falling onto a smooth surface, layers of thickness $d$ will advance away from the center (where the drop initially made contact with the surface). We can estimate the thickness of the boundary layer, the layer closest to the surface, to be

$$ d = \sqrt{\nu_L t}, \tag{7.8} $$

where $t$ is the time elapsed since the drop struck the surface. Now we are in a position to consider the stress on the expanding liquid layer due to the liquid layer itself. The stress on the boundary liquid layer is due to the surface tension $\sigma$ of the liquid striving to keep the layer intact. With $d$ defined as above, the surface-tension stress (shear stress) is

$$ \Sigma_L = \frac{\sigma}{d} \;\rightarrow\; \Sigma_L = \frac{\sigma}{\sqrt{\nu_L t}}. \tag{7.9} $$
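To get a feel for how equations 7.8 and 7.9 behave in time, here is a small Python sketch. The function names are hypothetical, and the numbers are illustrative assumptions only: the kinematic viscosity $10^{-6}$ m$^2$/s is the value used in this chapter's multiple choice questions, while the surface tension 0.07 N/m is an assumed, roughly water-like value that does not come from the text.

```python
import math


def boundary_layer_thickness(nu_l, t):
    """Equation 7.8: d = sqrt(nu_L * t)."""
    return math.sqrt(nu_l * t)


def liquid_layer_stress(sigma, nu_l, t):
    """Equation 7.9: Sigma_L = sigma / sqrt(nu_L * t)."""
    return sigma / boundary_layer_thickness(nu_l, t)


NU_L = 1.0e-6   # kinematic viscosity in m^2/s
SIGMA = 0.07    # surface tension in N/m (assumed, illustrative)

for t_ms in (0.1, 0.4, 1.6):
    t = t_ms * 1e-3  # convert ms to s
    d = boundary_layer_thickness(NU_L, t)
    stress = liquid_layer_stress(SIGMA, NU_L, t)
    print(f"t = {t_ms} ms: d = {d:.2e} m, Sigma_L = {stress:.0f} Pa")
```

Because $d$ grows like $\sqrt{t}$, the stress holding the layer together falls like $1/\sqrt{t}$: quadrupling the elapsed time doubles the layer thickness and halves $\Sigma_L$.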

Now that we know the stress on the expanding liquid layer due to the liquid itself (ultimately the force trying to hold the liquid layer together), we can combine it with the stress on the expanding liquid layer due to the atmosphere (ultimately the force trying to break the liquid layer apart) and observe under what combinations we get splashing.

7.3 splashing

Splashing in our system will occur when the stress on the liquid layer due to the atmosphere is stronger than the stress on the liquid layer due to the liquid. This is because the stress due to the atmosphere is trying to break the liquid layer apart, and the stress due to the liquid is trying to keep the liquid layer together. We’ll have two equations for the



two different stresses. Earlier we discussed how the stress due to the atmosphere should depend, among other things, on the density of the air. The stress is also proportional to the rate of expansion of the liquid layer and to the speed of sound in the atmosphere. If $\Sigma_A$ is the stress due to the atmosphere, then

$$ \Sigma_A = \rho_A C_A v_l \;\longrightarrow\; \Sigma_A = \frac{P M_g}{k_b T} \sqrt{\frac{\gamma k_b T}{M_g}} \sqrt{\frac{R V_0}{2t}}. \tag{7.10} $$

Here $C_A$ is the speed of sound in the atmosphere, and $v_l$ is the velocity of the boundary layer that has undergone shear stress due to the surface$^1$. In the second form of the equation, $R$ is the initial radius of the liquid drop, $V_0$ is the velocity of the liquid drop as it lands on the surface, $\gamma$ is the adiabatic constant of the gas in the atmosphere, and $t$ is the time elapsed after the drop impact. We also saw in equation 7.9 that

$$ \Sigma_L = \frac{\sigma}{\sqrt{\nu_L t}}. $$

If we take the ratio of $\Sigma_A$ to $\Sigma_L$, we get

$$ \frac{\Sigma_A}{\Sigma_L} = \frac{\dfrac{P M_g}{k_b T} \sqrt{\dfrac{\gamma k_b T}{M_g}} \sqrt{\dfrac{R V_0}{2t}}}{\dfrac{\sigma}{\sqrt{\nu_L t}}} = \sqrt{\gamma M_g}\, P\, \frac{\sqrt{\dfrac{R V_0 \nu_L}{2 k_b T}}}{\sigma}. \tag{7.12} $$
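The two stresses and their ratio can be put side by side in a short Python sketch. Everything beyond the formulas themselves is an assumption made for illustration: the function names are hypothetical, $\gamma = 1.4$ is the usual value for air, and the surface tension and kinematic viscosity are rough, alcohol-like values that are not taken from the text. The drop radius, impact velocity, and the two pressures are the ones quoted elsewhere in this chapter.

```python
import math

K_B = 1.381e-23   # Boltzmann constant in J/K
U = 1.66e-27      # atomic mass unit in kg


def atmospheric_stress(pressure, m_gas, temp, gamma, radius, v0, t):
    """Equation 7.10: Sigma_A = rho_A * C_A * v_l."""
    rho_a = pressure * m_gas / (K_B * temp)        # gas density (eq. 7.3)
    c_a = math.sqrt(gamma * K_B * temp / m_gas)    # speed of sound in the gas
    v_l = math.sqrt(radius * v0 / (2.0 * t))       # boundary layer velocity
    return rho_a * c_a * v_l


def liquid_stress(sigma, nu_l, t):
    """Equation 7.9: Sigma_L = sigma / sqrt(nu_L * t)."""
    return sigma / math.sqrt(nu_l * t)


def stress_ratio(pressure, m_gas, temp, gamma, radius, v0, sigma, nu_l):
    """Closed form of Sigma_A / Sigma_L (eq. 7.12); the time t cancels."""
    return (math.sqrt(gamma * m_gas) * pressure
            * math.sqrt(radius * v0 * nu_l / (2.0 * K_B * temp)) / sigma)


# Illustrative inputs; GAMMA, SIGMA and NU_L are assumed values.
M_AIR, TEMP, GAMMA = 29 * U, 295.0, 1.4
RADIUS, V0 = 1.77e-3, 5.0          # drop radius (m) and impact velocity (m/s)
SIGMA, NU_L = 0.022, 1.4e-6        # surface tension (N/m), viscosity (m^2/s)

for p_kpa in (100.0, 30.0):
    ratio = stress_ratio(p_kpa * 1e3, M_AIR, TEMP, GAMMA,
                         RADIUS, V0, SIGMA, NU_L)
    print(f"P = {p_kpa:.0f} kPa: Sigma_A / Sigma_L = {ratio:.2f}")
```

Because both stresses scale as $1/\sqrt{t}$, the ratio does not depend on when the layer is observed, which is why `stress_ratio` takes no time argument.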


From this ratio we can see that a more viscous liquid will splash more easily than a less viscous liquid. This is counterintuitive, because one would think that a more viscous liquid would stay together more easily. However, a larger $\nu_L$ makes the value of the ratio in (7.12) larger, which implies that the liquid layer is more likely to break apart and splash. So splashing is clearly not a straightforward phenomenon at all.

7.4 summary

Ultimately we see that splashing is an extremely interesting phenomenon that involves some physics that even runs contrary to intuition. We see that atmospheric pressure determines whether or not a liquid will splash, and that when the pressure is high enough to allow for splashing, we can quantify when we will and will not see it. We will see splashing when the stress due to the atmosphere is greater than the stress due to the liquid layer. The stress due to the atmosphere works to pull the liquid layer apart, and the stress within the liquid layer works to hold the layer together; if the atmospheric stress is stronger, the liquid will break apart. A strong knowledge of splashing may be used to control when a given liquid does and does not splash. This, in turn, can be used to improve many forms of industry and technology, and may prove to be very important for a more efficient future.

1 For a derivation of this equation, see the paper Drop Splashing on a Dry Smooth Surface by Lei Xu, Wendy W. Zhang, and Sidney R. Nagel, 2005.




chapter problems

1. A liquid drop of alcohol with initial radius $R = 1.77$ mm falls onto a smooth surface. After the drop strikes the surface, a corona (a liquid layer) moves outward with an initial velocity $V_0 = 5$ m/s, and we observe the moving corona for $t = 0.5$ ms. Let’s suppose that our experiment is performed at room temperature ($T = 295$ K), but our atmosphere is made of helium. The pressure due to the helium atmosphere near our experiment is still equivalent to atmospheric pressure, and the molecular mass of helium is $M_g = 4u$, where $u = 1.66 \times 10^{-27}$ kg. If the speed of sound in our helium atmosphere is 1000 m/s, what will be the stress on the corona due to the atmosphere, $\Sigma_A$?

Solution: Here we may use the given conditions to solve for $\Sigma_A$ using the two parts of equation 7.13:

$$ \Sigma_A = \rho_A C_A v_l \;\longrightarrow\; \Sigma_A = \frac{P M_g}{k_b T} \sqrt{\frac{\gamma k_b T}{M_g}} \sqrt{\frac{R V_0}{2t}}. \tag{7.13} $$

Looking at the left part of the equation, we are given the speed of sound $C_A$, and we can readily find $\rho_A$ and $v_l$ by looking at their terms in the right part of the equation, namely

$$ \rho_A \longrightarrow \frac{P M_g}{k_b T}, \qquad v_l \longrightarrow \sqrt{\frac{R V_0}{2t}}. \tag{7.14} $$

Plugging in all of the values given, we find

$$ \Sigma_A = \frac{(101.3\ \text{kPa})(4u)}{(1.381 \times 10^{-23}\ \text{m}^2\,\text{kg}\,\text{s}^{-2}\,\text{K}^{-1})(295\ \text{K})}\,(1000\ \text{m/s}) \sqrt{\frac{(1.77\ \text{mm})(5\ \text{m/s})}{2(0.5\ \text{ms})}}, $$

and we arrive at the answer,

$$ \Sigma_A = (0.1651\ \text{kg/m}^3)(1000\ \text{m/s})(2.97\ \text{m/s}) \approx 490\ \text{Pa}. \tag{7.17} $$

2. Consider the answer found in the previous question. If the shear stress in the expanding liquid due to the liquid is $\Sigma_L = 450$ Pa when observing for time $t = 0.5$ ms, will there be any splashing? Explain why or why not.

Solution: From the first question we have that the stress on the expanding liquid layer due to the atmosphere is

$$ \Sigma_A \approx 490\ \text{Pa}. \tag{7.18} $$

We know that if the stress due to the atmosphere is greater than the stress due to the liquid, then the expanding liquid layer will break apart, and we will see splashing. So the ratio from equation 7.12 will be greater than 1 when we have splashing, and less than 1 when we don’t. In our case,

$$ \frac{\Sigma_A}{\Sigma_L} = \frac{490\ \text{Pa}}{450\ \text{Pa}} \approx 1.09 > 1. \tag{7.19} $$

We do see splashing. Again, this is because the stress due to the atmosphere (the stress that is trying to break the expanding liquid layer apart) is stronger than the stress due to the liquid itself (the stress that is trying to hold the liquid together). Ultimately the atmosphere wins, the liquid breaks apart, and we see splashing.




multiple choice questions

1. Which of the following does the atmospheric stress $\Sigma_A$ on the expanding liquid layer NOT depend on?
(a) The speed of sound in the atmosphere.
(b) The density of the atmosphere.
(c) The density of the expanding liquid.
(d) The speed of the expanding liquid boundary layer.
(e) None of the above.

Solution: (c) The density of the expanding liquid. See equation 7.10.

2. How thick is the boundary layer of an expanding liquid with kinematic viscosity $\nu_L = 1 \times 10^{-6}$ m$^2$/s on a smooth surface if it has been expanding for time $t = 0.4$ ms?
(a) $2 \times 10^{-5}$ m
(b) $2 \times 10^{-4}$ m
(c) $2 \times 10^{-3}$ m
(d) 2 m
(e) None of the above

Solution: (a) $2 \times 10^{-5}$ m. Here we may use equation 7.8 to solve the problem:

$$ d = \sqrt{\nu_L t} \;\longrightarrow\; d = \sqrt{(10^{-6}\ \text{m}^2/\text{s})(0.0004\ \text{s})} = 2 \times 10^{-5}\ \text{m}. $$






In the deep ocean it is common for mariners to see wave heights in the 5-meter to 12-meter range. In the worst storm conditions waves generally peak at 15 meters (49 ft), and commercial ships have been built to accommodate waves of such heights. Yet about once a week a ship at sea sinks to the bottom of the ocean [4]. Over the last two decades more than two dozen cargo ships measuring over 200 meters in length have been lost at sea [38]. Most of the time these ships sink without a mayday signal or any trace left behind of their existence. For centuries mariners have described waves that appear out of nowhere and approach 35 meters (115 ft) in height, but there was no scientific evidence to back up these anecdotes.

It is important to stress that these waves are described as appearing out of nowhere in the deep sea. Such a freakish wave is drastically different from a tsunami. A tsunami is created by a disturbance in the ocean. The most basic example is a stone dropped in water, which creates waves that travel radially outward. Likewise, a quick shift of the plates making up the sea floor creates a series of low waves with long periods. Disturbances known to create tsunamis include seismic activity, explosive volcanism, submarine landslides, and even meteorite impacts with the ocean [13]. Tsunamis occur near land, as their name suggests: tsu translates from Japanese as harbor and nami as wave. (The phenomenon was originally known as a tidal wave; however, this name incorrectly implies that it is related to the tides.) While tsunamis are much more deadly than the freak waves described by mariners (the megatsunami of December 26, 2004, killed over 250,000 people with waves in the 10-to-30-meter range along the coasts of Indonesia, India, Thailand, Sri Lanka, and elsewhere), their behavior is much better understood, as about three-fourths of tsunamis originate from submarine seismic disturbances.

These disturbances release a great amount of energy into waves that travel at speeds approaching 200 m/s with wavelengths on the order of hundreds of kilometers [67]. Tsunamis are not felt in the deep sea, where they produce wave heights of only a meter or two, which is not detectable against the typical ocean swell. However, once the waves created by the disturbance approach areas of shallow water, more energy is packed into the waves, creating waves in the tens of meters along the shore. In the 20th century over 1000 tsunamis were recorded, with 138 labeled as “damaging” tsunamis [12]. Since the last half of the 20th century oceanographers have been able to identify the instigating disturbance for a particular tsunami. This is completely different from the freak waves described by mariners, which carried the same type of lore and about as much comprehension as mythical monsters like the Loch Ness Monster. For a long time oceanographers dismissed the possibility that such waves exist with such a high rate of occurrence. What




sam bingham: freak waves

Figure 8.1: The crest heights during a 20-minute interval at the Draupner oil platform at 15:20 GMT. The peak is 18.5 meters from the surface.

is known as the linear model has been used to predict wave heights in the deep sea with great accuracy, and the commercial shipping industry has based most of its ship designs on this model. The freak waves seen by mariners are part of the natural wave spectrum but should only occur about once every ten thousand years according to the accepted linear model. Clearly something was wrong. While oceanographers had started to put more faith and energy into the study of such freak waves in the 1970s, the major breakthrough came with the Draupner wave on January 1st, 1995, at the Draupner oil platform in the North Sea. The Draupner platform, which is located directly west of Norway, had a laser-based wave sensor on an unmanned platform that recorded waves without interference from the platform’s legs [41]. It should be noted that oceanographers base much of their analysis on what is called the significant wave height, or swh, which is the average height from trough to crest of the largest one third of the waves during a given period. On January 1st at the Draupner oil platform the swh was in the 11-to-12-meter range. The maximum wave height measured that day was 26 meters and the peak elevation was 18.5 meters, as shown in Figure 8.1. The annual probability for a wave of that height in a 12-meter swh is $10^{-2}$, or once in a hundred years. While the causes of such waves are understood with much more confidence since the Draupner wave, it is important to determine what makes a wave a freak wave by nature.

8.2 linear model

Now that we understand the Draupner wave and a bit about these freak waves, let’s take a look at just how odd these waves are under the linear model. In the simplest model of ocean waves, the sea elevation is taken to be a summation of sinusoidal waves of different frequencies with random phases and amplitudes. In the linear approximation the wave field is taken as a stationary random Gaussian process. The probability density distribution is described as

$$ f(\eta) = \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-\eta^2 / 2\sigma^2}. \tag{8.1} $$

8.3 interference


In Eq. 8.1, $\eta$ is the elevation of the sea surface with a zero mean level, $\langle \eta \rangle = 0$, and $\sigma^2$ is the variance computed from the frequency spectrum $S(\omega)$:

$$ \sigma^2 = \langle \eta^2 \rangle = \int_0^\infty S(\omega)\, d\omega. \tag{8.2} $$

If we take the wind wave spectrum to be narrow, the cumulative probability function of the wave heights will be described by the Rayleigh distribution such that

$$ P(H) = e^{-H^2 / 8\sigma^2}. \tag{8.3} $$



Equation 8.3 gives the probability that the wave heights will exceed a certain height, $H$. We can now introduce the significant wave height, $H_s$, into the equations and begin to see its relevance. Extensive work by Stanislaw Massel has shown that $H_s \approx 4\sigma$ [58]. From this we can find a direct relation for the probability of a certain wave height occurring under the prevailing conditions by rewriting Eq. 8.3 as

$$ P(H) = e^{-2H^2 / H_s^2}. \tag{8.4} $$



At this point a mathematical definition of what constitutes a freak wave is necessary. Looking back at the Draupner wave, the maximum wave height would not be completely out of the ordinary for severe storm conditions that created 15-meter significant wave heights. The fact that the Draupner wave occurred with a significant wave height of 12 meters is what makes it a freak wave. From this point on we will refer to these freak waves as rogue waves, which by definition are waves that have a wave height more than double the significant wave height.

example 8.1: shallow water rogue waves
Could a rogue wave occur in shallow water of only 20 meters in depth with the top third average waves only around 2 meters in height? If one could, what would be the most probable rogue wave and with what probability would it occur?

Solution: This problem is quite simple but requires careful thought on the definition we established for a rogue wave. We stated that a rogue wave is a wave with a wave height of more than twice the significant wave height; thus a 5-meter wave would be a rogue wave if the swh was 2 meters but not if it was 4 meters. So if we return to the question, we can surely have a rogue wave in shallow water with a swh of 2 meters if the wave height is greater than 4 meters. This 4-meter wave would also be the most probable rogue wave, with a probability that is found from Eq. 8.4. With our conditions we get $P(4) = e^{-2(4)^{2}/2^{2}} = e^{-8} \approx 3.35 \times 10^{-4}$. These types of waves are referred to as shallow water rogue waves and are just as freakish as the giant ones in the deep sea. The highest recorded ratio $H/H_s$ was a wave height of 6 m in conditions with a significant wave height of 2 m.
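The exceedance probability of Eq. 8.4 and the rogue-wave criterion are easy to evaluate numerically. The following is a minimal sketch, not part of the original text; the function names `exceedance_probability` and `is_rogue` are illustrative choices.

```python
import math


def exceedance_probability(wave_height, significant_height):
    """Rayleigh exceedance probability P(H) = exp(-2 H^2 / Hs^2), as in Eq. 8.4."""
    if significant_height <= 0:
        raise ValueError("significant wave height must be positive")
    return math.exp(-2.0 * wave_height ** 2 / significant_height ** 2)


def is_rogue(wave_height, significant_height):
    """Criterion used in the text: wave height more than twice the swh."""
    return wave_height > 2.0 * significant_height


if __name__ == "__main__":
    # Threshold case of Example 8.1: H = 4 m in a sea with Hs = 2 m.
    print(exceedance_probability(4.0, 2.0))
    # A 5 m wave is rogue for Hs = 2 m but not for Hs = 4 m.
    print(is_rogue(5.0, 2.0), is_rogue(5.0, 4.0))
```

Because the probability depends only on the ratio $H/H_s$, the same function applies to shallow-water and deep-sea cases alike.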



While the Draupner wave was a tremendous breakthrough for the field of oceanography, as it proved that such waves do exist and that there



were inaccuracies in the standard linear model, it did not show how or why such waves form. In order to examine the possible causes for these waves, researchers looked at the areas in which they occurred. By plotting areas in which freak wave phenomena were described in the past, a pattern was found. A great number of the waves occur in areas where there are fast currents that run counter to the wave direction. The most extreme example of this is the Agulhas current off of the Cape of Good Hope and the tip of South Africa. The Agulhas current is the fastest current on the planet. Figure 8.2 shows 14 reported freak waves found in a study by Captain J. K. Mallory prior to 1975. His study found similar results in the Gulf Stream, the Kuroshio current, and many others in the Pacific Ocean.

Figure 8.2: These are the results of a study by Captain Mallory done in 1976. The circles represent cases where abnormal waves were observed, the dotted line is the continental shelf, and the other depths show the continental slope. The abrupt change in depth is what creates the Agulhas current and gives it its jet like function parallel to the shoreline [49].

Traveling around the tip of South Africa has always been a rough stretch for mariners, but the area of the Agulhas current created waves that mariners said could reach 30 meters in wave height. In this region the waves are generated off Antarctica and move northward towards South Africa. They run unopposed until they meet the Agulhas current, which moves in a southwest direction parallel to the east coast of South Africa. Because of this area of interaction the wind waves are made up of two different systems: the long swell waves coming from Antarctica and short, steep waves generated by the sea wind of South Africa. The superposition of these two systems will undoubtedly contribute to an increase in wave height due to possible constructive interference, but not enough to account for the rogue waves reported in the area. The explanation for these types of waves relates to geometric optics. If we think of a strong variational current as a lens, then it could focus opposing waves and create points where enough energy would converge to explain the freak waves. This is possible because the areas of strong flow in the current retard the wave advancement much more than the areas of weaker flow, which causes the wave crests to bend. This refraction focuses the waves’ energy and allows the waves to grow in height, as shown in figure 8.3.



The most commonly accepted research on this subject, variational current focusing, was done by I.V. Lavrenov in the late 1990s. His work showed that the focusing by currents would shorten the wavelength of storm-forced waves that ran against the current, and the wave trains would press together, possibly into a rogue wave. A major result that Lavrenov derived was that $H/H_0$ would reach a maximum value of 2.19 for the conditions of the Agulhas current. In this formula $H_0$ is the mean height of the swell propagating on still water. From this we can easily see that the maximum height for a wave caused by current refraction would be 2.19 times the mean height of the swell. Since rogue waves are not limited to occurring in areas where there is a strong counter current, there has to be some explanation for these other extreme waves. The solution is actually very similar to the one just discussed. The thought was to look for other possible ways that wave energy could be focused into one spot. The easiest way to do this is through the ocean basin. It is not too hard to see that geometric focusing can occur with the help of the underwater topography. The behavior of the rays in the basin, however, is rather complicated, as the real topography creates many caustics.


[Figure 8.3, panel b labels: variational current, wave crests, wave orthogonals]

Figure 8.3: a shows the interaction of the two systems that are present off the tip of South Africa [49]. b shows the refraction that waves undergo when flowing against a current that has a variable surface speed. This refraction can lead to the build up of rogue waves when the wave orthogonals converge [Figure made in Inkscape].




8.4 nonlinear effects

With the understanding of refraction applied to water waves, oceanographers could generally predict where the rogue waves would occur. Since the ships in the shipping industry are built to handle 15-meter waves, there was no need to travel through areas in which they could encounter the rogue waves. The ships could simply travel another route and there was no need to redesign millions of ships across the world. This apparent success turned out to be rather short lived. In February of 2001 two ships, the Caledonian Star and the Bremen, were nearly sunk when they came into contact with 30-meter waves in a region of the South Atlantic. In this region there is no current strong enough to create the waves and the basin cannot focus enough energy to account for the waves. Once again, something was missing from the deep ocean wave models. The breakthrough came from the world of quantum mechanics this time, and specifically from the non-linear Schrödinger equation. There is a modified version of the non-linear Schrödinger equation that can be used to describe motion in deep water. The simplified nonlinear model of 2D deep water wave trains is

\[ i\left( \frac{\partial A}{\partial t} + c_{gr} \frac{\partial A}{\partial x} \right) = \frac{\omega_0}{8k_0^{2}} \frac{\partial^{2} A}{\partial x^{2}} + \frac{\omega_0 k_0^{2}}{2} |A|^{2} A, \]



and the surface elevation, $\eta(x, t)$, is given by

\[ \eta(x, t) = \frac{1}{2} \left( A(x, t)\, e^{i(k_0 x - \omega_0 t)} + \text{c.c.} + \dots \right). \]


Here $k_0$ and $\omega_0$ are the wave number and frequency of the carrier wave, $c_{gr}$ is the group velocity, c.c. denotes the complex conjugate, (...) denotes the weak higher harmonics of the carrier wave, and $A$ is the complex wave amplitude, which is a function of $x$ and $t$. At this point what is known as the Benjamin-Feir instability comes into play. It is very well known that a uniform wave train with an amplitude of $A_0$ is unstable to the Benjamin-Feir modulational instability corresponding to long disturbances of wave number, $\Delta k$, of the wave envelope satisfying the relation

\[ \frac{\Delta k}{k_0} < 2\sqrt{2}\, k_0 A_0. \]


The highest instability will occur at $\Delta k/k_0 = 2k_0 A_0$, which gives a maximum growth rate of $\omega_0 (k_0 A_0)^{2}/2$ [48]. There has been a great deal of analytical, numerical, and experimental research on the nonlinear stage of the Benjamin-Feir instability (see Grue and Trulsen (2006) and Kharif (2003)). Through this research it is apparent that wave groups can appear or disappear on a timescale of order $1/[\omega_0 (k_0 A_0)^{2}]$. This behavior can be explained by breather solutions of the non-linear Schrödinger equation. Figure 8.4 shows the formation of waves that have very high energies and are created by the Benjamin-Feir instability. There are many different breather solutions that can create various large waves. What is referred to as the algebraic solution has a maximal wave height of 3 times that of the waves around it, and some solutions, such as the Ma-breather and the Akhmediev breather, are periodic in time and space.
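To get a feel for the numbers, the instability band, the most unstable modulation, and the growth timescale can be evaluated for a given carrier wave. This is a rough sketch only: the deep-water dispersion relation $\omega_0^{2} = g k_0$ is assumed here (it is not stated in the text), and the function name and the 200 m / 3 m example values are hypothetical.

```python
import math

G = 9.81  # gravitational acceleration in m/s^2


def benjamin_feir(wavelength, amplitude):
    """Benjamin-Feir quantities for a uniform deep-water wave train.

    Assumes the deep-water dispersion relation omega0^2 = g * k0,
    which is an added assumption and not taken from the chapter.
    """
    k0 = 2.0 * math.pi / wavelength        # carrier wave number (rad/m)
    omega0 = math.sqrt(G * k0)             # carrier frequency (rad/s)
    steepness = k0 * amplitude             # dimensionless k0 * A0
    return {
        "steepness": steepness,
        # Modulations are unstable for dk below this limit.
        "dk_limit": 2.0 * math.sqrt(2.0) * steepness * k0,
        # Modulation wave number with the highest instability.
        "dk_fastest": 2.0 * steepness * k0,
        # Maximum growth rate omega0 * (k0 A0)^2 / 2, in 1/s.
        "growth_rate": 0.5 * omega0 * steepness ** 2,
        # Timescale on which wave groups appear or disappear, in s.
        "timescale": 1.0 / (omega0 * steepness ** 2),
    }


if __name__ == "__main__":
    # Hypothetical swell: 200 m wavelength, 3 m amplitude.
    for name, value in benjamin_feir(200.0, 3.0).items():
        print(f"{name}: {value:.4g}")
```

Note that the most unstable modulation always sits at $1/\sqrt{2}$ of the band limit, independent of the carrier wave chosen.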



Figure 8.4: The evolution of a weakly modulated wave train (the numbers give the time normalized by the fundamental wave period). A highly energetic wave group forms due to the Benjamin-Feir instability [48].

In short, these breather solutions are considered as simple analytical models of freak waves. To get more qualitative with the analysis, these models say that there are two types of waves in the deep sea. There are waves that move around comprised of sines and cosines that can become focused to create the special rogue waves, and then there are these non-linear beasts that can really come out of nowhere. These non-linear waves come from an almost rogue sea where, as you watch the sea state, it is random yet tame and controlled. Then one of these waves comes up by sucking energy from the waves around it in a very bizarre non-linear fashion. This non-linear behavior also helps to explain the verticality of the waves described by the mariners. The waves become so steep that they actually can tend to break and have been likened to the white cliffs of Dover [4]. These breaking waves can exert as much as 100 tons per square meter, much greater than the 15 t/m² that the shipping industry has its ships designed for. It would be practically impossible to design ships to withstand these monsters of the deep, and because of that they will remain untamed monsters.

8.5 conclusion

One area that has received the least amount of research to date is the focusing potential of wind. Experimental research in this area has shown that without wind, waves will focus at a fixed area and create a rather high amplitude wave, but with strong wind the high amplitude wave will be further downstream and have a higher amplitude. Experimental results have also shown that rogue waves can occur under the conditions of a strong wind when they typically would not if the wind was not present. This line of research has led most to believe that strong winds can increase the rate of occurrence of rogue waves. The effect the wind has is to weaken the defocusing process, which leads to a greater chance for a rogue wave. As a result of this finding it makes sense that rogue waves are more likely to occur when storms with vicious winds are present.





1. Under the simplified linear model what is the probability of finding a 30-meter wave in a storm off the coast of Iceland in which the average of the highest third of the waves is 12 meters?
Solution: In this problem the height of the wave and the significant wave height are given, so the solution is rather simple and only requires the use of Eq. 8.4. Since $H$ is 30 and $H_s$ is 12 we simply plug those into the equation and get

\[ P(H) = e^{-2H^{2}/H_s^{2}} \;\Rightarrow\; P(30) = e^{-2(30)^{2}/12^{2}} = 3.73 \times 10^{-6}. \]

2. Which of the following is not a possible instigating factor for a tsunami?
a Meteorites landing in the ocean
b Underwater earthquakes
c Constructive interference of waves caused by wave trains in the deep ocean
d Underwater volcanoes

3. Which of the following are possible causes for rogue waves?
a Focusing of wave energy by ocean basins
b Focusing of wave energy by variational currents that run counter to the swell
c Sudden changes in the depth of the ocean caused by a steep continental slope
d Non-linear effects that lead to a giant wave from a random background of smaller waves
Answer: a, b, c.

PAUL HUGHES: FRICTION




Friction represents one of the most basic and yet impenetrable phenomena within the purview of classical physics. While its role in an idealized system can appear deceptively simple, its mechanisms and intrusions into realistic systems are of a complexity well beyond the scope of any undergraduate course, much less a single textbook chapter.



9.2 amontons/coulomb friction

The quintessential description of friction is that of Amontons at the end of the 17th century, according to which friction is a force linearly proportional to the normal force and independent of the macroscopic surface area of contact. Nearly a century later, Coulomb (of electrostatic fame) extended this description: the proportionality of friction to the normal force is also independent of the relative velocity of the surfaces in question. The coefficient of friction µ in the Coulomb/Amontons formulation does vary with the material compositions of the interfacing surfaces, however. In fact, every pair of material interfaces has its own coefficient of friction. This is where the matter begins to grow more complicated: applying this simple Amontons-Coulomb description of friction by itself would require a standardized table of friction coefficients for every possible combination of materials, as well as sub-tables to account for any lubricants or adhesives applied to the surface of interface. Velocity and temperature also play complicated roles in determining the actual friction force between two real surfaces.

example 9.1: simple friction

[Figure 9.1 diagram labels: F, n, w, v]


Figure 9.1: The basic Amontons-Coulomb model of friction, Ff = µ F.




[Diagram labels: m, n, w, θ]

A block of mass m sits on a conveyor belt inclined at an angle θ from the horizontal. The conveyor belt surface moves at a velocity v such that the block remains stationary with respect to the surroundings. The coefficient of kinetic friction between the block and the belt surface is $\mu_k = 0.3$. Using only simple Amontons/Coulomb friction, we can find $\theta(v)$. First, we know that the normal force $n = -w \cos(\theta)$: clearly, the magnitude of $n$ should be equal to that of $w$ when $\theta = 0$ and 0 when $\theta = \pi/2$, and it will be in the positive direction rather than the negative like $w = mg$. From the block’s perspective, it appears to be sliding with velocity $v_{bl}$ down along the slope of the conveyor belt due to the effective force $g_{eff} = mg \sin(\theta)$, where $v_{bl} = -v$. It is opposite this velocity that we see a friction force arise which, for the equilibrium condition stated above, satisfies

\[ F_f = g_{eff} = mg \sin(\theta) = \mu n, \tag{9.1} \]

\[ mg \sin(\theta) = \mu mg \cos(\theta), \tag{9.2} \]

\[ \tan(\theta) = \mu. \tag{9.3} \]


So we see that θ is in fact independent of velocity v, at least in the Coulomb representation of kinetic friction. The critical angle θc = arctan(µ).
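The critical angle is simple to evaluate, and the force balance behind it can be checked directly. A minimal sketch under the Amontons/Coulomb assumptions of the example; the function names are illustrative and not from the text.

```python
import math


def critical_angle(mu):
    """Critical angle theta_c = arctan(mu), in radians."""
    return math.atan(mu)


def net_force_along_slope(mass, theta, mu, g=9.8):
    """Downslope gravity component minus kinetic friction, in newtons.

    Positive means the block accelerates down the slope; zero is the
    equilibrium condition of the example.
    """
    return mass * g * (math.sin(theta) - mu * math.cos(theta))


if __name__ == "__main__":
    theta_c = critical_angle(0.3)  # coefficient from the example
    print(math.degrees(theta_c))
    print(net_force_along_slope(1.0, theta_c, 0.3))
```

The belt speed does not appear anywhere in these functions, which is the point of the example: in this model the equilibrium angle is independent of v.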

This direct relationship between incline slope and coefficient of static friction is one method of measuring µ values for different materials. Another is the arrangement of a mass m1 on a string connected via pulley to a mass m2 on a flat plane: m2 , attached to the upper test surface, with the lower attached to the level plane, provides the normal force, and m1 provides the motive force which the friction force will resist [20]. The drawback, of course, is that all µ values must be acquired experimentally, and apply only to the specific materials tested.

9.3 toward a conceptual mesoscopic model

In actuality, friction is the result of a number of independent microscopic surface phenomena appearing in tangential interactions. These phenomena depend on the properties of the surface materials as well as the substrate materials, the geometries and small-scale topologies of the interaction surfaces, viscosity and other properties of any interstitial substance such as lubrication, relative velocity of the interaction surfaces, and other influences. Even these individual sources of friction may each represent a range of models, and any one model may address any one or more of the above contributory phenomena [29].

Figure 9.2: a. Interface surfaces near contact showing asperities. b. Interface surfaces under normal force Fn , showing some deformation of asperities. c. Interface surface with lubrication, showing fluid space-filling and “coasting” ability.

Amontons’s familiar $F_f = \mu F_n$ law evolves on the microscopic scale from the assumption that the microscopic contact area $A = \alpha F_n$ increases as the product of the normal force (i.e., the force deforming the surfaces to reduce areas of separation) and the plastic deformability $\alpha$ characteristic of the weakest major constituent of the surfaces. At the same time, until the characteristic shear strength $\tau$ is exceeded, the deformation remains elastic, so that $F_f = \tau \alpha F_n$. Thus we see that $\mu = \tau \alpha$. Taking into account the direction of motion, the result is the familiar friction law,

\[ F_f = -\mu n \cdot \mathrm{sgn}(v), \tag{9.4} \]

where sgn(v) = |v|/v. This model assumes that what we call “roughness” on the macroscopic scale appears, microscopically, as “asperities” or projections from the mean plane of each surface [29]. One way in which this model can break down is in the case of microscopically smooth, geometrically complementary surfaces: the microscopic contact area reaches its maximum with very little surface deformation, meaning that an increased normal force will no longer increase shearing stress or friction. In such a case, various atomic forces (such as covalent bonds, electrostatic attraction, and van der Waals forces between dipoles) are also particularly likely to play a role: as a noteworthy example, the energy required to separate the interface surfaces is equal to the free energy associated with their bonding, minus the elastic potential energy made available as elastic deformation is relaxed [28]. Obviously, then, a limit on elastic deformation implies greater influence of adhesion forces as surfaces become smoother, and when adhesive forces become involved, the friction quickly becomes less Coulombic (that is, the relationship between friction force and normal force becomes less linear) as a dependence on the area of interface is introduced. In the same way, lubricants serve to fill gaps and minimize the deformation necessary for asperities to bypass one another by functioning somewhat like ball bearings: asperities are permitted to “coast” past one another, rather than striking, pressing, and producing a deformation on the scale of their geometric overlap; the friction gains, instead, a dependence on the lubricant’s viscosity. Unsurprisingly, then, based on this conceptual model, the highest coefficients of friction are seen



in materials which are either porous and highly elastic (e.g., rubber), or which otherwise exhibit a highly elastic ‘clinging’ action (e.g., velcro). Interfaces such as a highly porous, elastic rubber on hard, rough concrete can yield coefficients of friction well above µ = 1; others involving such materials as teflon or ice (especially near the melting point, where the contact surface can melt and self-lubricate) can drop below µ = 0.04.

9.4 summary

Friction is one of the most familiar, everyday forces, experienced whenever we rub our hands together to warm them or wrestle with a jammed door, but the mechanisms from which it evolves are very complicated and remain fertile ground for the development of new descriptions. With today’s faster and more powerful computer modeling, highly sophisticated models of friction can be tested more completely than ever before, and the advancement of nanotechnology may play a significant role in the experimental methods employed to verify new models’ predictions. The study of friction encompasses not only physics, but also mechanical engineering, chemical engineering, and even consumer products: for example, many cleaning products are specially engineered so that their residues have frictional properties we associate with a “clean” feeling.

9.5 problems

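1. A model elevator of mass m = 1.5 kg is allowed to slide freely down a shaft, starting with $v_0 = 0$ at height $h_0 = 4$ m. At $h_1 = 3$ m a brake is activated, consisting of a brake pad with coefficient of friction µ = 0.42 on a spring with spring constant k = 1000 N/m. The spring, whose natural length is $\ell_n = 5$ cm, is compressed to $\ell = 1$ cm. Where will the model elevator stop? If it strikes the bottom of the shaft, what will its velocity be?

The compressed spring supplies the normal force on the brake pad:

\[ F_{sp} = -\Delta\ell \cdot k = -(\ell - \ell_n) \cdot k = (0.04\,\mathrm{m})(1000\,\mathrm{N/m}) = 40\,\mathrm{N}. \]

The velocity when the brake engages follows from free fall through $\Delta h = -1$ m with $g = -9.8\,\mathrm{m/s^2}$:

\[ v_1^{2} = v_0^{2} + 2g\Delta h = 2(-9.8\,\mathrm{m/s^2})(-1\,\mathrm{m}) = 19.6\,\mathrm{m^2/s^2}, \qquad v_1 = -\sqrt{19.6}\,\mathrm{m/s} = -4.43\,\mathrm{m/s}. \]

Since $|n| = F_{sp}$, the friction force is $F_f = \mu n = 0.42 \cdot 40\,\mathrm{N} = 16.8\,\mathrm{N}$, so

\[ F_{total} = F_f + w = \left(16.8 + (1.5\,\mathrm{kg})(-9.8\,\mathrm{m/s^2})\right)\hat{z}\,\mathrm{N} = 2.1\,\hat{z}\,\mathrm{N}, \qquad a_{total} = \frac{F_{total}}{m} = \left(\frac{16.8}{1.5} - 9.8\right)\hat{z}\,\mathrm{m/s^2} = 1.4\,\hat{z}\,\mathrm{m/s^2}. \]

With $v_2 = v_1 + a_{total} t$ and looking for the time at which $v_2 = 0$,

\[ t_2 = -\frac{v_1}{a_{total}} = \frac{4.43}{1.4} = 3.16\,\mathrm{s}, \]

\[ h_2 = h_1 + v_1 t_2 + \tfrac{1}{2} a_{total} t_2^{2} = 3\,\mathrm{m} + (-4.43\,\mathrm{m/s})(3.16\,\mathrm{s}) + \tfrac{1}{2}(1.4\,\mathrm{m/s^2})(3.16\,\mathrm{s})^{2} = -4.01\,\mathrm{m}. \]

Because $h_2 < 0$, the brake cannot stop the elevator within the shaft (it would need $v_1^{2}/2a_{total} = 7$ m to stop). It strikes the bottom with

\[ v^{2} = v_1^{2} + 2a_{total}(0 - h_1) = 19.6 - 8.4 = 11.2\,\mathrm{m^2/s^2}, \qquad v = -3.35\,\mathrm{m/s}. \]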
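The arithmetic in this solution can be checked numerically. The following is a minimal sketch using the values given in the problem; the function name `elevator_brake` and the dictionary keys are hypothetical choices made for illustration.

```python
import math


def elevator_brake(mass=1.5, h_start=4.0, h_brake=3.0, mu=0.42,
                   spring_k=1000.0, natural_len=0.05, compressed_len=0.01,
                   g=9.8):
    """Kinematics of the braked model elevator (SI units, heights above the shaft bottom)."""
    normal = spring_k * (natural_len - compressed_len)   # spring force on the pad
    friction = mu * normal                               # Coulomb friction force
    speed_at_brake = math.sqrt(2.0 * g * (h_start - h_brake))
    decel = friction / mass - g                          # net upward acceleration
    if decel > 0.0:
        stopping_distance = speed_at_brake ** 2 / (2.0 * decel)
    else:
        stopping_distance = math.inf                     # brake too weak to slow the fall
    if stopping_distance <= h_brake:
        stop_height = h_brake - stopping_distance
        impact_speed = 0.0
    else:
        stop_height = None                               # reaches the bottom first
        impact_speed = math.sqrt(speed_at_brake ** 2 - 2.0 * decel * h_brake)
    return {
        "normal": normal,
        "friction": friction,
        "speed_at_brake": speed_at_brake,
        "decel": decel,
        "stopping_distance": stopping_distance,
        "stop_height": stop_height,
        "impact_speed": impact_speed,
    }


if __name__ == "__main__":
    for name, value in elevator_brake().items():
        print(name, value)
```

Comparing the stopping distance with the 3 m of shaft remaining below the brake point decides whether the elevator stops or strikes the bottom.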

2. A block of mass m is dropped onto an inclined conveyor belt, with which it has a coefficient of kinetic friction µ. The conveyor belt’s upper surface is running at a rate v from the lower end toward the higher end, a total distance of $\ell$, and its angle of inclination from the horizontal is θ.

a. Assuming that θ is below $\theta_c$, how long will the box slide on the conveyor belt?

We define the initial velocity $v_{block}$ of the block relative to the conveyor belt as $v_{block}(t = 0) = -v$. Thus $F_f = \mu n = \mu mg \cos(\theta)$, opposed by $mg \sin(\theta)$, so

\[ F_{total} = mg[\mu \cos(\theta) - \sin(\theta)], \qquad a = g[\mu \cos(\theta) - \sin(\theta)]. \]

The box may accelerate to match the conveyor belt’s velocity, in which case $v_1 = v_0 + at$, where $v_1 = v$ and $v_0 = 0$, so

\[ t_{stop} = \frac{v}{g[\mu \cos(\theta) - \sin(\theta)]}. \]

Conversely, the box may still be sliding relative to the conveyor belt when it reaches the end: $s_1 = s_0 + v_0 t + \tfrac{1}{2} a t^{2}$, where $s_0 = v_0 = 0$ and $s_1 = \ell$, so

\[ t_{edge} = \sqrt{\frac{2\ell}{g[\mu \cos(\theta) - \sin(\theta)]}}. \]

Whichever happens first is the time at which the block stops sliding and reaches its peak velocity.

b. If m = 1 kg, µ = 0.75, v = 20 m/s, θ = π/12, and $\ell$ = 2 m, at what horizontal distance d from the edge of the conveyor belt will the block land?

First,

\[ t_{edge} = \sqrt{\frac{4}{9.8[0.75 \cos(\pi/12) - \sin(\pi/12)]}} = 0.936\,\mathrm{s}, \qquad t_{stop} = \frac{20}{9.8[0.75 \cos(\pi/12) - \sin(\pi/12)]} = 4.38\,\mathrm{s}. \]

Therefore, we see that the block leaves the conveyor belt at t = 0.936 s with velocity $v_{edge} = at = g[\mu \cos(\theta) - \sin(\theta)]t = 4.27$ m/s and at height $h_{edge} = \ell \sin(\theta) = 0.518$ m. From here, the problem is simply one of projectile motion, with the block launched up the slope:

\[ h_{gnd} = 0 = h_{edge} + v_{edge} \sin(\theta)\, t - \tfrac{1}{2} g t^{2}, \]

whose positive root is $t_{land} = 0.457$ s, so that

\[ d = v_{edge} \cos(\theta)\, t_{land} = 1.89\,\mathrm{m}. \]
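The two competing times and the projectile stage can be evaluated together. This is a rough sketch, assuming the block is dropped at the lower end of the belt with zero initial velocity and leaves the upper end moving up the slope; the function name `conveyor_launch` and its return keys are hypothetical.

```python
import math


def conveyor_launch(mu, belt_speed, theta, length, g=9.8):
    """Slide-then-projectile solution for a block on an upward-running belt (SI units)."""
    accel = g * (mu * math.cos(theta) - math.sin(theta))  # up-slope acceleration
    if accel <= 0.0:
        raise ValueError("friction too weak: theta is not below the critical angle")
    t_stop = belt_speed / accel                 # time to match the belt speed
    t_edge = math.sqrt(2.0 * length / accel)    # time to reach the end while sliding
    # If the block matches the belt first, it rides to the edge at belt speed.
    v_edge = belt_speed if t_stop < t_edge else accel * t_edge
    h_edge = length * math.sin(theta)
    v_up = v_edge * math.sin(theta)             # vertical launch component
    # Positive root of 0 = h_edge + v_up * t - g * t^2 / 2.
    t_land = (v_up + math.sqrt(v_up ** 2 + 2.0 * g * h_edge)) / g
    distance = v_edge * math.cos(theta) * t_land
    return {"t_stop": t_stop, "t_edge": t_edge, "v_edge": v_edge,
            "h_edge": h_edge, "t_land": t_land, "distance": distance}


if __name__ == "__main__":
    result = conveyor_launch(mu=0.75, belt_speed=20.0,
                             theta=math.pi / 12, length=2.0)
    for name, value in result.items():
        print(f"{name}: {value:.3f}")
```

The mass never enters the calculation, since it cancels between the friction force and the weight.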

3. A very soft, microscopically smooth surface is in contact with a hard, smooth surface. The foremost contributor to the friction force is likely to be:
a. A very strong normal force.
b. Lack of lubrication.
c. Elastic deformation.
d. Interatomic forces.
e. Particulate contamination between the surfaces.

d: Since the surfaces are smooth, increasing normal force will play relatively little role in increasing the friction force. There is little asperity interaction to be mitigated by lubrication or to produce significant elastic deformations. Because the first surface is soft, contamination would tend to deform it and press into the deformed recess, reducing the contaminating particles’ influence.





Arguably one of the most peculiar aspects of quantum mechanics with which most undergraduates are familiar is the principle of superposition. That is, because a quantum system is treated in ways very similar to wave-like phenomena, it is possible to take two quantum states and add them together in a linear combination, generating another valid quantum state. But as it turns out, this “weird” property of quantum mechanics can be extended to some even more counter-intuitive realms. It just so happens that a fundamental particle in and of itself can exist as a linear combination of two other particles. This amazing property of fundamental particles can lead to some very interesting phenomena, such as the one we will discuss in this chapter: neutrino oscillations. But before we get into just exactly what these neutrino oscillations are, let’s start with a quick review of quantum theory, and also touch upon what it is we currently know about the fundamental particles of nature.



10.2 a review of quantum theory

As we all have been taught in our introductory quantum mechanics classes, the theory of quantum mechanics deals with the idea of a “state,” which a particle (or system of particles) may happen to find itself in. Every observable property of that particle is represented by an “operator,” or a transformation that acts on the state of the particle. We learn that the eigenfunctions (or eigenstates, as they are called) of this operator represent a set of possible states the particle can attain, and the corresponding eigenvalues represent the values of the observable property associated with that state. If a particle happens to find itself in a particular eigenstate of an observable’s operator, every time that observable is measured, it will return the corresponding eigenvalue associated with that state [37]. For example, we can have the “spin” operator,

\[ S_z = \begin{pmatrix} \hbar/2 & 0 \\ 0 & -\hbar/2 \end{pmatrix}, \tag{10.1} \]

which is the operator corresponding to the z-component of intrinsic angular momentum for a particle with a spin of 1/2, in matrix form. Because our operator is represented by a two by two matrix, this observable can only acquire two eigenvalues, which correspond to the spin pointing along or opposite to the z-axis. Actually, different components of angular momentum never commute in quantum theory, which means their values can never be simultaneously known. Thus, if the spin were to be measured to be completely along the z direction, this would imply that we know the x and y components to be zero, which is not possible. So, in reality, these two states represent the projection



keith fratus: neutrino oscillations in the standard model

of spin along the z axis being either along the positive or negative z axis [37]. We can represent the two eigenstates as vectors in a complex-valued Hilbert space:

\[ \begin{pmatrix} 1 \\ 0 \end{pmatrix}; \qquad \begin{pmatrix} 0 \\ 1 \end{pmatrix}. \tag{10.2} \]
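That these two vectors are eigenstates of the spin operator can be checked by applying the matrix to each of them. A minimal sketch, working in units where ħ = 1 (an assumption made here for convenience); the names `S_Z`, `UP`, `DOWN`, and `apply_operator` are illustrative.

```python
HBAR = 1.0  # assumed units in which hbar = 1

# z-component spin operator for a spin-1/2 particle, as a 2x2 matrix.
S_Z = ((HBAR / 2.0, 0.0),
       (0.0, -HBAR / 2.0))

UP = (1.0, 0.0)    # eigenstate with spin projection along +z
DOWN = (0.0, 1.0)  # eigenstate with spin projection along -z


def apply_operator(matrix, state):
    """Multiply a 2x2 matrix into a two-component state vector."""
    return tuple(sum(matrix[i][j] * state[j] for j in range(2))
                 for i in range(2))


if __name__ == "__main__":
    # Each result is the input vector scaled by its eigenvalue, +1/2 or -1/2.
    print(apply_operator(S_Z, UP))
    print(apply_operator(S_Z, DOWN))
```

Because the matrix is diagonal in this basis, the eigenvalues can be read directly off the diagonal.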

The theory makes a restriction on the form of the operators that can represent observables, which is that they must be hermitian. Because of this fact, all possible eigenvalues must be real, and the eigenstates of the operator must form a basis of all possible states that the particle can attain. In our example of spin, this means that the particle’s z-component of intrinsic angular momentum can in general be a linear combination of the two basis states, or,

\[ \begin{pmatrix} a \\ b \end{pmatrix} = a \begin{pmatrix} 1 \\ 0 \end{pmatrix} + b \begin{pmatrix} 0 \\ 1 \end{pmatrix}, \]


where a and b are some given complex numbers. But the question now arises, what value will we get when we measure the z-component of spin, for the particle in this generalized state? The answer is that we could get either of the two eigenvalues, with some given probability. The probability of returning each eigenvalue is given by the square of the norm of the coefficient on the associated eigenstate. This, in essence, is the principle of superposition in quantum mechanics [37]. Because there are often many physical properties that a particle can possess, there will generally be many different hermitian operators that can act on the state of a particle. Each of these operators will thus define a new basis in which we can define the state of our particle, corresponding to the eigenstates of this operator.

10.3 the standard model

The world as we know it today, at least from the perspective of physics, can generally be summed up in a theory dubbed the “Standard Model.” This theory asserts that all of physical phenomena can be traced to a few fundamental particles, interacting through a few fundamental interactions. This is often described pictorially, as in Figure 10.1. There are two types of fundamental particles, fermions, which have half-integer spin, and bosons, which have integer spin. The fermions are generally what we would consider particles that make up matter, and the bosons are the particles that transmit the fundamental “forces” that act on the fermions. The fermions tend to be arranged in three different “families,” or “generations,” with the particles in one family being identical to the particles in the next family, except for mass (in the figure, the families are delineated by the first three columns). For example, in the figure, an “up quark” is a fermion, and it interacts with other quarks partly through interaction with what is called the “strong force.” The strong force is mediated by the gluon, which is a boson, and we can think of it sort of like the two quarks interacting with each other by passing gluons back and forth. Quarks can combine together to make composite particles, such as protons (two up quarks and a down quark) or neutrons (two down quarks and an up quark) [88].



Figure 10.1: The Standard Model of elementary particle physics. Image courtesy Fermilab.

The fermions come in two types, the quarks, which are in the first two rows, and the leptons, which are in the last two rows, with one of the biggest differences among them being that the leptons do not interact with the strong force. In reality, matter particles can form composites which have integer spin and are thus bosons, and the particular bosons represented in the standard model are specifically referred to as gauge bosons. But in terms of fundamental particles, we can safely make the distinction between fermions and bosons without too much consequence. This general analogy of passing bosons back and forth to mediate a force can be extended to the rest of the standard model. For each interaction in the standard model, there is a set of mathematical rules describing how it acts on certain particles, and one or more bosons that mediate it. In addition to the strong force, there is the familiar electromagnetic force, mediated by the photon, and also the weak force, something we will discuss in more detail later. One of the more peculiar aspects of the standard model is that the strong force actually grows stronger with distance, as if the quarks were attached by springs. The result of this is that a free quark can never be observed, and that the existence of quarks must be inferred by indirect means. There are some unresolved issues in the standard model, some of which hint at physics beyond the standard model, but aside from this, the standard model, for the most part, still represents our current understanding of fundamental physics [88]. The origin of these particles is described by something called “Quantum Field Theory.” Quantum Field Theory (or QFT) generally states that each of the fundamental particles is sort of a “bump” in something called a quantum field, that can move through space. 
For example, there is an electron field that extends throughout space, and actual electrons are like ripples moving across the “surface” of that field (of course, this visual analogy can only be pushed so far). Speaking somewhat more mathematically, the fields that exist throughout space have certain normal modes of oscillation, and when energy is input into these fields, the resulting excitations of the field are what we experience as fundamental particles [14, 78]. Furthermore, QFT associates with each of these fields something called a “creation operator” (there are also annihilation operators, but never mind that for now). Many students have encountered these sorts of operators without ever having been aware of it. A common problem in introductory quantum mechanics is to solve for the possible energy eigenstates of the simple harmonic oscillator. One method for solving this problem is to use lowering and raising “ladder” operators, which create lower and higher energy states from an existing one. When we take an eigenstate of the simple harmonic oscillator, and apply the lowering operator to it, we attain the eigenstate with one less unit of energy. What we are actually modeling when we do this is the creation of a photon that is emitted from the system, as it carries away the difference of the energies of the two states. In QFT, these operators “create” all of the possible particles. A given operator acts on the vacuum of space, and “creates” an instance of the particle associated with that operator. For example, the operator associated with the electron field “creates” instances of electrons. When a photon turns into an electron-positron pair, the creation operators for the electron and positron model this process (along with the annihilation operator for the photon) [37, 14].

10.4 the weak and higgs mechanisms

While all of this may seem like useless information, seeing as how we do not intend to pursue all of the associated mathematical details of Quantum Field Theory, there is an important caveat to all of this. As we mentioned before, there are often many different operators associated with a particle. This is also true in QFT. Looking at Figure 10.1, we see a type of boson called a W particle, which is responsible for mediating an interaction called the weak force. In essence, the weak force is responsible for transforming one type of particle into another. The W boson actually represents two different particles, the $W^-$ and $W^+$ particles, which are named according to their electric charge. Because of this, a particle must lose or gain a unit of charge when emitting a W particle (remember that forces are generally mediated by the exchange of bosons). Since total electric charge must always be conserved, and because every particle has an intrinsic value of electric charge, this means that one type of particle must transform into another type of particle when emitting a W boson. There is actually another type of particle associated with the weak force, the $Z^0$ boson, but it is involved in a slightly different interaction. It is responsible for the scattering of one particle off of another, similar to how charged particles can scatter off of one another via the electromagnetic interaction. This resemblance to electromagnetic phenomena is actually due to a very deep connection between the electromagnetic and weak forces. Electroweak theory states that above a certain energy threshold, electromagnetic and weak interactions become indistinguishable, and the two interactions are “unified.” Of course, this subject could easily form the basis of an entirely separate chapter, but for now we can safely concentrate on just



the W bosons (though, not to discourage the reader from investigating this subject on his or her own!) [76]. As we can see in Figure 10.1, the fermions appear to come in pairs. Each particle in a pair is capable of turning into the other particle in the pair via the weak force. As a matter of fact, the weak interaction actually predicted the existence of the charm, top, and bottom quarks, partly due to this pair behavior. For example, it is possible for a down quark to turn into an up quark, and for a muon to turn into a muon neutrino. Of course, there is a (somewhat complex) mathematical formalism that describes how all of this occurs, which involves a set of quantum mechanical operators. So in some sense, remembering our previous discussion, it is possible to create a “basis” in which we describe the action of the weak force, represented by the eigenstates of these quantum mechanical operators. These eigenstates are often referred to as the “weak states” of the given fundamental particles. In short, we can describe the behavior of the fundamental particles of nature by discussing them in the context of how the weak force acts on them [76]. As a matter of fact, the weak force represents a sort of tarnished beauty in the standard model. It turns out that the weak force is capable of predicting the existence of the fundamental fermions, but, by itself, it predicts that they should all be massless. Of course, we know that particles in the real world have mass, so something else must be responsible for giving mass to the fundamental particles of nature. This mechanism is referred to as the Higgs mechanism. It creates “Higgs bosons” out of the vacuum, and these particles interact with the other particles in the standard model, creating a “drag” on them which we observe as inertia. Of course, it also has its own associated mathematical operators, which constitute another “basis” in which to describe the different particles of nature. 
The Higgs boson is not included in the standard model currently, because, at the time of publication, it has still not been definitively detected. However, so much of physical theory depends on its existence that it is essentially assumed to exist [76, 88]. But here’s the crucial point to all of this: it turns out that the manner in which the two mechanisms describe the fundamental particles of nature is actually different. In other words, when we approach the theory from the standpoint of the weak force, and describe the interactions of the particles from this perspective, we find that the interactions behave differently than if we were talking about them from the standpoint of the Higgs mechanism! For example, an up quark defined by the weak force is not the same particle as an up quark defined by the Higgs mechanism. So we in essence have two different descriptions of the natural world. How are we to reconcile this? [76] There is actually a simple solution to this dilemma. Quantum mechanics postulates that it is indeed possible to describe a physical system from the vantage points of two different mechanisms (or operators, or whatever you care to call your mathematical devices), and that these two representations can be related via a change of basis, which is very straightforward from a mathematical perspective, using the language of linear algebra. To talk about how we would do this, let’s consider the simplified case where we only consider the first two families of fermions (which, as it turns out, is actually a fairly reasonable simplification). For now, let’s look at the first two pairs of quarks. Because the phenomenon we are dealing with is related to the weak force, which acts on particles in pairs, we need only consider one particle from each



pair of quarks, since the weak interaction will take care of how the second particle from each pair is affected by the phenomenon. Let’s work with the down and strange quarks, and call them d’ and s’ when referring to their representation by the weak force, and d and s when discussing them in terms of the Higgs mechanism. Quantum theory says that these two representations should be related to each other by a sort of “change of basis” matrix in some given Hilbert space in which we describe the state of our physical system, which will look something like

$$\begin{pmatrix} d \\ s \end{pmatrix} = \begin{pmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{pmatrix} \begin{pmatrix} d' \\ s' \end{pmatrix}. \tag{10.4}$$

The angle θ is a measure of how much of a “rotation” there is between the two bases. If we expand this relation, we get

$$d = \cos\theta\, d' - \sin\theta\, s' \tag{10.5a}$$

$$s = \sin\theta\, d' + \cos\theta\, s'. \tag{10.5b}$$


But look at what we are saying: it is possible for one particle to actually be represented as a linear combination of other particles. This may seem incredibly counter-intuitive, and perhaps non-sensical or even impossible, but the truth of the matter is that in a mathematical sense, this explains a whole host of phenomena that would appear as inexplicable anomalies otherwise. One example that illustrates this quite simply has to do with the rate of neutron and lambda beta decay (see the first homework problem) [76]. Actually, there are other scenarios in which we can speak of linear combinations of particles. It turns out that if we have two particles whose interactions are identical, it should be possible to interchange them, and the physics of the situation should be the same overall. As a matter of fact, we can even take a generalized linear combination of these particles, and the physics will be the same. Suppose we have two particles in a physical system, called m and n. We can represent this as a vector:

$$\begin{pmatrix} m \\ n \end{pmatrix}. \tag{10.6}$$

It is then actually possible to apply a “rotation” to this set of particles, just as for the weak and mass states of the down and strange quarks. If we have a unitary matrix M that is two by two and has a determinant equal to one, then applying this matrix to the above vector actually represents a new set of particles, ones which do not change the overall physics of the system (matrices of this form are members of a group of matrices called SU(2), or the special unitary group of degree two, which has tremendous implications for high-energy physics). If you have objections to the fact that two particles can be added to each other like mathematical objects, welcome to the world of quantum mechanics. While it may seem questionable to speak of fundamental particles in this way, it fits quite naturally into the quantum mechanical description of the universe, in which a given physical system can lend itself to a variety of outcomes upon measurement, each outcome being weighted with a certain probability. Only now, we are talking about the very type of particle that we will detect upon measurement [14].
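To make the idea of a "rotation" among particle states a little more concrete, here is a minimal Python sketch. It assumes the simplest possible member of SU(2), a real rotation through an angle theta, and checks numerically that the matrix is unitary, has determinant one, and preserves total probability. The helper names (rotation, dagger, matmul, apply) and the angle used are illustrative choices, not anything taken from the references.

```python
import math

def rotation(theta):
    """2x2 rotation matrix; real rotations are a simple subset of SU(2)."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s], [s, c]]

def dagger(m):
    """Conjugate transpose of a 2x2 matrix stored as nested lists."""
    return [[complex(m[j][i]).conjugate() for j in range(2)] for i in range(2)]

def matmul(a, b):
    """Product of two 2x2 matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def apply(m, v):
    """Apply a 2x2 matrix to a two-component vector of amplitudes."""
    return [m[0][0] * v[0] + m[0][1] * v[1], m[1][0] * v[0] + m[1][1] * v[1]]

theta = 0.22                      # illustrative angle in radians (an assumption)
M = rotation(theta)
unitarity = matmul(dagger(M), M)  # should equal the identity matrix
determinant = M[0][0] * M[1][1] - M[0][1] * M[1][0]  # should equal 1

# Rotating a pure "m" state gives a mixture whose total probability is still 1.
mixed = apply(M, [1.0, 0.0])
total_probability = abs(mixed[0]) ** 2 + abs(mixed[1]) ** 2
```

A general SU(2) matrix has complex entries, so this sketch only covers the real rotations that appear in the two-family mixing discussed in this chapter.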

10.5 the origin of neutrino oscillations




But back to the issue of the weak and Higgs mechanisms. It is often customary, as mentioned previously, to call d and s “mass states,” and d’ and s’ “weak states.” Each of them can be represented as linear combinations of the others, and are related by the equation given previously. This analogy actually extends to other particles. In particular, we will consider the neutrinos, which are in the bottom row of Figure 10.1. Recent evidence suggests that they behave in a manner similar to quarks, in that their mass states and weak states are different (for a while it was believed that neutrinos were massless, and hence were not subject to this behavior, but this has been shown to not be true). Neutrinos are fermions, and there are three known types of them. They each come paired with another fermion, one that is either the electron (in the case of the electron neutrino), or one that is similar to the electron (either the muon or tau particle, being identical to the electron, except for mass) [14]. So what is the significance of this property of neutrinos? It actually just so happens that this property of neutrinos is responsible for something called neutrino oscillations. That is, it is possible for one type of neutrino to turn into another type of neutrino as it travels through space. This can also occur for quarks, but because of the fact that a free quark can never be observed, it involves the composite particles made up of quarks, so the details are somewhat simpler for the case of neutrinos. The case of neutrinos is also particularly relevant, not only because this phenomenon occurs to a greater extent among neutrinos than it does for quarks, but because the existence of neutrino oscillations implies that they interact with the Higgs mechanism, and thus indeed have mass, something that was not believed to be true for several decades. Neutrino oscillations are also significant in the sense that they are of great experimental interest. 
Our current models that describe the nuclear activity in the sun predict that we should be detecting a greater number of electron neutrinos being emitted from it than we actually are. It is believed that the explanation for this discrepancy is that of neutrino oscillations; that is, electron neutrinos emitted from the sun have the ability to change into other types as they travel between the sun and earth. There are of course many other reasons why the phenomenon is of great interest, but suffice it to say, it is a behavior that beautifully demonstrates some of the basic properties of quantum mechanics, without too much mathematical complexity [14]. Having discussed the significance of these “oscillations,” let’s see if we can get a basic mathematical derivation of them. For the sake of simplicity, we’ll consider the case of neutrino oscillations among the first two families. That is, oscillation among the electron and muon neutrinos. Once again, this is a somewhat reasonable approximation to the more complex case of three-family mixing. Because each neutrino comes paired with an electron-type particle by the weak force, we can consider the mixing matrix with respect to just one particle in each of the pairs. In other words, we can write the mixing equation with respect to just the neutrinos. If we denote the electron and muon neutrinos with respect to the Higgs mechanism as e and µ, respectively, and denote



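them as e’ and µ’ with respect to the weak mechanism, the general relation between the two representations will be given by:

$$\begin{pmatrix} e' \\ \mu' \end{pmatrix} = \begin{pmatrix} \cos\theta & \sin\theta \\ -\sin\theta & \cos\theta \end{pmatrix} \begin{pmatrix} e \\ \mu \end{pmatrix}, \tag{10.7}$$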



where θ is the angle that describes the “rotation” among the representations (note that we are now referring to the weak states as the result of the mass states being rotated through some angle, instead of the other way around; this is possible because rotations are always invertible). Let’s consider a muon neutrino that is created through the weak force. Because it is created through the weak force, and the weak force operates with respect to weak states, the neutrino can definitely be said to be a muon neutrino with respect to the weak mechanism. In other words, it is in a pure weak state. This means that it can be given by a linear combination of mass states, or,

$$|\mu'\rangle = -\sin\theta\,|e\rangle + \cos\theta\,|\mu\rangle, \tag{10.8}$$


in traditional ket notation. We now want to determine how this state evolves with time. A particle that is moving freely through empty space of course has no time-dependent influences acting on it, so we can treat this as a typical time-independent problem in quantum mechanics. If our weak state is treated as a superposition of mass states, then to describe the time dependence of our neutrino, we simply add the time-dependent factor to each mass state. So we can write
$$|\nu(t)\rangle = -\sin\theta\,|e\rangle\,e^{-iE_e t/\hbar} + \cos\theta\,|\mu\rangle\,e^{-iE_\mu t/\hbar}, \tag{10.9}$$


where $E_e$ and $E_\mu$ are the energies of the electron and muon neutrino mass states, respectively. We use the letter ν now, because as our state evolves over time, it will not necessarily be a muon neutrino, or even an electron neutrino; in general it will be some linear combination of the two. What we are interested in, of course, is the probability that we will detect the original neutrino as being an electron neutrino some time after it is created. To do this, we follow the standard prescription of quantum mechanics, which says that the probability of measuring an electron neutrino is the square of the norm of the projection of our state at a given time along the basis vector representing the electron neutrino. Now, when we detect a neutrino, we do so by studying the way it interacts with matter via the weak force. So any neutrino we detect will be in a definite weak state, since the weak force always acts on weak states. This means we need to calculate the projection of our state onto the weak state of the electron neutrino. Symbolically, we can write this as

$$P(\mu \to e) = |\langle e'|\nu(t)\rangle|^2. \tag{10.10}$$
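The prescription in equation 10.10 can also be evaluated numerically before doing any algebra. The following Python sketch builds the components of the evolved state in the mass basis, projects them onto the weak electron-neutrino state, and squares the norm. The function name transition_probability and the choice of units with ħ = 1 by default are assumptions made for illustration only.

```python
import cmath
import math

def transition_probability(theta, e_e, e_mu, t, hbar=1.0):
    """P(mu -> e) from the projection <e'|nu(t)>, computed in the mass basis."""
    # Components of |nu(t)> along the mass states |e> and |mu>.
    nu_e = -math.sin(theta) * cmath.exp(-1j * e_e * t / hbar)
    nu_mu = math.cos(theta) * cmath.exp(-1j * e_mu * t / hbar)
    # The weak electron-neutrino state has mass-basis components
    # (cos theta, sin theta); they are real, so no conjugation is needed.
    amplitude = math.cos(theta) * nu_e + math.sin(theta) * nu_mu
    return abs(amplitude) ** 2
```

At t = 0 the function returns zero, as it should for a neutrino that has just been created in a pure muon weak state, and for nonzero times the result depends only on the difference of the two energies.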

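If we take this inner product in the basis of the mass states, then mathematically, our expression for the inner product becomes

$$\langle e'|\nu(t)\rangle = \begin{pmatrix} \cos\theta & \sin\theta \end{pmatrix} \begin{pmatrix} -\sin\theta\, e^{-iE_e t/\hbar} \\ \cos\theta\, e^{-iE_\mu t/\hbar} \end{pmatrix}. \tag{10.11}$$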





The computation of the dot product is straightforward, and leads to
$$\langle e'|\nu(t)\rangle = -\cos\theta\,\sin\theta\, e^{-iE_e t/\hbar} + \sin\theta\,\cos\theta\, e^{-iE_\mu t/\hbar}, \tag{10.12}$$


which allows us to write
$$P(\mu \to e) = (\cos\theta\,\sin\theta)^2\, \left| -e^{-iE_e t/\hbar} + e^{-iE_\mu t/\hbar} \right|^2. \tag{10.13}$$


The square of the norm of a complex number of course is that number multiplied by its complex conjugate, so, using this fact, along with a trigonometric identity for the sinusoidal factor, we have

$$P(\mu \to e) = \left(\tfrac{1}{2}\sin 2\theta\right)^2 \left(-e^{-iE_e t/\hbar} + e^{-iE_\mu t/\hbar}\right)\left(-e^{iE_e t/\hbar} + e^{iE_\mu t/\hbar}\right). \tag{10.14}$$

If we expand this expression, we come to

$$P(\mu \to e) = \tfrac{1}{4}\sin^2 2\theta \left(e^0 + e^0 - \left(e^{i(E_e - E_\mu)t/\hbar} + e^{-i(E_e - E_\mu)t/\hbar}\right)\right), \tag{10.15}$$

which of course we can simplify to

$$P(\mu \to e) = \tfrac{1}{4}\sin^2 2\theta \left(2 - 2\cos\frac{\Delta E\, t}{\hbar}\right), \tag{10.16}$$

where ∆E is the difference between the two energies, and we have used the definition of the cosine function in terms of complex exponentials. If we factor out the two, and use the trigonometric identity

$$2\sin^2\frac{\chi}{2} = 1 - \cos\chi, \tag{10.17}$$

then this ultimately simplifies to

$$P(\mu \to e) = \sin^2 2\theta\, \sin^2\frac{\Delta E\, t}{2\hbar} \tag{10.18}$$

for the probability that we will measure an electron neutrino instead of the original muon neutrino. Amazingly, this real world problem with experimental relevance indeed lends itself to such a simple mathematical treatment. One of the most immediately obvious things about this expression is the factor determined by the rotation angle. Note that this factor can be any value between zero and one, determined entirely by the rotation angle. If the rotation angle happens to be equal to π/4, then the factor will be equal to one, and for any angle between zero and π/4, the factor will be less than one. If we consider the second sinusoidal term, the one that oscillates with time, we can see that the factor determined by the rotation angle becomes the amplitude of these oscillations in time. So the probability of measuring an electron neutrino will reach a maximum when it is equal to the factor determined by the rotation angle, since the maximum value of the oscillation term is of course one, being a sinusoidal function. So immediately we see that the amount of rotation, or “mixing” between the two families of neutrinos is what determines the maximum probability of measuring a switch in the type of neutrino. Anything less than 45 degrees of rotation implies that the muon neutrino will never have a one hundred percent chance



of being measured as an electron neutrino, implying that the state of the neutrino will never be exactly that of an electron neutrino (speaking in terms of weak states, of course). It is also immediately apparent that the oscillations depend on the energy difference of the two mass states. This is a feature common to most quantum mechanical systems, since the Hamiltonian, or total energy, is what determines the time evolution of a state. The best way to explore this phenomenon is with an example, similar to what might be encountered when working in an actual neutrino experiment.

example 10.1: a relativistic neutrino experiment
Suppose we have a scenario in which muon neutrinos are created with a given energy at a known location, an energy large enough that they are highly relativistic. Experimentally, we would like to know the probability of measuring an electron neutrino at a given distance from the source of muon neutrinos. Let’s see how we could go about doing this. First, we’ll make an approximation when it comes to the energy of the neutrinos. We know from special relativity that the energy of an object can be given by

$$E = \left(p^2c^2 + m^2c^4\right)^{1/2}, \tag{10.19}$$

where p is the momentum of the object, c is the speed of light, and m is the mass of the object. If we factor out the first term in the square root, we can write

$$E = pc\left(1 + \frac{m^2c^2}{p^2}\right)^{1/2}. \tag{10.20}$$

Using a Taylor expansion, the approximation

$$(1 + x)^{1/2} \approx 1 + \frac{x}{2} \tag{10.21}$$

allows us to write the energy as being approximately

$$E \approx pc\left(1 + \frac{m^2c^2}{2p^2}\right), \tag{10.22}$$

or,

$$E \approx pc + \frac{m^2c^3}{2p}. \tag{10.23}$$
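To get a feel for how good this expansion is, we can compare the exact and approximate energies numerically. The sketch below is only an illustration: the function names and the sample values of p and m are assumptions, and we work in units where c = 1.

```python
import math

def exact_energy(p, m, c=1.0):
    """Relativistic energy E = sqrt(p^2 c^2 + m^2 c^4)."""
    return math.sqrt(p ** 2 * c ** 2 + m ** 2 * c ** 4)

def approx_energy(p, m, c=1.0):
    """First-order expansion E ~ pc + m^2 c^3 / (2p), valid when pc >> mc^2."""
    return p * c + m ** 2 * c ** 3 / (2.0 * p)

def relative_error(p, m, c=1.0):
    """Fractional difference between the approximate and exact energies."""
    exact = exact_energy(p, m, c)
    return abs(approx_energy(p, m, c) - exact) / exact

# Illustrative values (assumed): momentum 100 times the mass, then equal to it.
error_relativistic = relative_error(100.0, 1.0)
error_slow = relative_error(1.0, 1.0)
```

When the momentum is a hundred times the mass, the two expressions agree to roughly one part in a billion, while for p comparable to mc the expansion is off by several percent. This is why the approximation is only appropriate for highly relativistic neutrinos.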

This approximation is reasonable, because our neutrinos will generally be moving relativistically, and this implies that the energy attributable to the mass of the particle is small compared to the kinetic energy of the particle, and the second term in equation 10.20 is small enough for the approximation to be valid. If we now consider the energy difference of the two mass states, we can use the above approximation to write

$$\Delta E = E_e - E_\mu \approx pc + \frac{m_e^2c^3}{2p} - pc - \frac{m_\mu^2c^3}{2p}, \tag{10.24}$$

which simplifies to

$$\Delta E \approx \frac{c^3}{2p}\left(m_e^2 - m_\mu^2\right), \tag{10.25}$$

or,

$$\Delta E \approx \frac{c^3}{2p}\left(\Delta m^2\right), \tag{10.26}$$



[Plot: Probability to Detect an Electron Neutrino (vertical axis) versus Distance (m) (horizontal axis)]

Figure 10.2: The probability to measure an electron neutrino as a function of distance from the source. See the second homework problem for the parameters used. Image created in MATLAB.

where $\Delta m^2$ is the difference of the squares of the two different masses. We have assumed that the momentum of the two mass states is the same, which is a somewhat reasonable approximation. Because the neutrinos move relativistically, we can assume that the distance x traveled by a neutrino is related to the time after creation by

$$t \approx \frac{x}{c}. \tag{10.27}$$

If we now use the approximations given in equations 10.26 and 10.27, our expression for the probability of measuring an electron neutrino is

$$P(\mu \to e) \approx \sin^2 2\theta\, \sin^2\!\left(\frac{c^2\,\Delta m^2\, x}{4p\hbar}\right). \tag{10.28}$$

Once again using a relativistic approximation, we can say that the overall energy of the neutrino, $E_\nu$, is roughly given by

$$E_\nu \approx pc. \tag{10.29}$$

Also, because the masses of elementary particles are typically measured in units of eV/$c^2$, it is convenient to introduce a factor of c to the numerator and denominator of the argument to the time dependent sinusoidal term. So if we make these last two changes to our expression, our final result is

$$P(\mu \to e) \approx \sin^2 2\theta\, \sin^2\!\left(\frac{c^4\,\Delta m^2\, x}{4\hbar c E_\nu}\right). \tag{10.30}$$
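Equation 10.30 is simple enough to evaluate directly. The Python sketch below tabulates the probability at a few distances from the source. The function name, the sample distances, and the unit conventions (Δm² in eV²/c⁴, distance in meters, energy in eV, with ħc ≈ 197.327 · 10⁻⁹ eV·m) are choices made for this illustration; the mixing angle, mass difference, and energy are the values quoted in the homework problems of this chapter.

```python
import math

HBAR_C_EV_M = 197.327e-9  # hbar * c in eV * m

def oscillation_probability(theta, delta_m2, x_m, energy_ev):
    """Equation 10.30: delta_m2 in eV^2/c^4, x_m in meters, energy_ev in eV."""
    # With these units the factors of c cancel, leaving a pure number.
    phase = delta_m2 * x_m / (4.0 * HBAR_C_EV_M * energy_ev)
    return math.sin(2.0 * theta) ** 2 * math.sin(phase) ** 2

# Assumed parameters, taken from the chapter's homework problems.
theta, delta_m2, energy_ev = 0.59, 8e-5, 1e6
for x_km in (0, 25, 50, 75, 100):
    p = oscillation_probability(theta, delta_m2, x_km * 1e3, energy_ev)
    print(f"x = {x_km:3d} km  ->  P = {p:.3f}")
```

Sampling the function on a finer grid of distances and plotting the result should give a curve of the same kind as the one described in the figure caption.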

Figure 10.2 shows the probability to measure an electron neutrino as a function of the distance from the source, using the relation derived in the example. The parameters used are those given in the second homework problem.

10.6 implications of the existence of neutrino oscillations

There are several reasons why our final expression in example 10.1 is of great interest to experimentalists. First, notice that the oscillations explicitly depend on the difference of the squares of the two masses. If the difference were to be zero, then our expression would be identically



zero for all time, implying that there would be a zero probability to measure an electron neutrino. In other words, there would be no oscillation. Because neutrino oscillations have now been experimentally verified, we know that neutrinos must indeed have mass (since the masses cannot all be zero and different from each other at the same time), which has tremendous implications for several branches of physics, from high energy physics to cosmology. The fact that neutrinos have mass shows a similarity between quarks and leptons. Because quarks and (some) leptons were known to have mass, it was known that quarks and (some) leptons must interact with the Higgs mechanism in some way. Thus, it was surprising that only quarks exhibited oscillation, and not leptons. Now that leptons have been shown to exhibit oscillation, it shows a similarity between the two groups of fermions. This is crucial for “Grand Unified Theories,” theories which hypothesize that above some threshold energy, all of the interactions in the standard model unify into one fundamental interaction that acts amongst all of the fermions in the standard model on an equal footing. The fact that we see slight differences amongst the different fermions, but in general they have the same behavior, suggests a sort of “broken symmetry” that would presumably be restored above this energy threshold. This energy threshold (referred to as the Planck energy) however is believed to be around $10^{28}$ eV, which is much higher than the energies that can be reached in modern accelerators (the Large Hadron Collider at CERN will only be able to reach energies of 14 TeV) [76, 14]. The fact that neutrinos have mass is also of interest to cosmologists, since it could have an influence on the rate of expansion of the universe. Neutrinos are generally created quite frequently in nuclear processes, so a large number of them flood the universe. 
Because of this, the fact that they have mass has serious implications for the energy-mass density of the universe, which, according to the theory of general relativity, affects the manner in which the fabric of space-time deforms and evolves with time [76]. The first real evidence for neutrino oscillations was provided by the Sudbury Neutrino Observatory in Sudbury, Ontario, Canada, which detected neutrinos from the sun. While previous experiments had strongly suggested the existence of neutrino oscillations, the experiment in Canada was the first one to definitively identify the number of neutrinos of each type, and the experimentally determined numbers agreed with theoretical predictions. Previous experiments had only shown a lack of electron neutrinos being detected from the sun, and gave no information regarding the number of neutrinos of other types [14]. Amazingly, without resorting to any of the more advanced mathematical tools of Quantum Field Theory, it is possible to get a quantitative understanding of a quantum mechanical effect that has enormous implications for modern research in physics. Some of the homework problems investigate the subject with specific numerical examples, to give a better idea of some of the numerical values actually involved in neutrino oscillations. Of course, any reader who is interested is encouraged to research the subject further on his or her own, since Quantum Field Theory is a very complex, intellectually rewarding subject that should begin to be accessible to students towards the end of their undergraduate study.

10.7 problems




1. A proton, as mentioned in the text, is made up of two up quarks and a down quark, while a neutron is composed of two down quarks and an up quark. Neutron beta decay is the process in which a down quark inside of a neutron turns into an up quark by emitting a $W^-$ boson, thus turning the neutron into a proton. There is a certain characteristic rate at which the weak force acts, and we might at first expect the rate of neutron decay to be equal to this characteristic rate. However, the observed rate of neutron decay is slightly less than this. This can actually be explained by the fact that the weak states of quarks are not the same as the mass states of quarks. The quarks that form composites are mass states, so the quarks inside of the neutron and proton are linear combinations of weak states. If we simplify the mixing to the first two families, we can write

$$d = \cos\theta\, d' - \sin\theta\, s' \tag{10.31a}$$

$$s = \sin\theta\, d' + \cos\theta\, s', \tag{10.31b}$$

where, as usual, d and s are the mass states, and d’ and s’ are the weak states. Explain why this might account for the rate of neutron decay being less than expected. The mismatch between these two types of quarks also accounts for the existence of another type of decay. A lambda particle is one that is comprised of an up, down, and strange quark. One of the ways it can decay is to a proton, where the strange quark turns into an up quark. This is called lambda beta decay. One might expect this decay type to be prohibited, since the up and strange quark are in two different families. Explain how this decay might be possible, and how it is related to the previously mentioned issue of neutron beta decay [76].
Solution: The key to understanding this problem is realizing that, as mentioned before, the weak force only acts on weak states, and that the weak force is the mechanism that drives the two decay types. Each of the mass states in the quark composites is a linear combination of two weak states, so each has some probability of being measured as one of the two weak states. In the neutron, the down quark that we expect to decay into the up quark is actually a linear combination of the two weak states, and the probability that it will be a down quark weak state is proportional to the square of the coefficient in the linear combination that is applied to the down quark weak state. This coefficient is the cosine of the rotation angle, which implies that squaring this term will give us the probability that we will select the down quark weak state. Of course, this implies that there is some probability that with respect to the weak force, we will have a strange weak state, which is not allowed to decay into an up quark. So the probability that the neutron will not be able to decay into a proton is given by the square of the sine of the rotation angle. So the decreased rate is essentially explained by the fact that in some sense, the down quark mass state in the neutron is not always a down quark with respect to the weak mechanism, and so the neutron will not always decay to the proton. This also explains how a lambda particle can decay into a proton. The strange quark in the lambda particle likewise has some probability of



being a down quark with respect to the weak force, given by the square of the sine of the rotation angle. So there is a finite probability that the down quark weak state component of the strange quark mass state will interact with the weak force, allowing the lambda particle to decay into a proton. The rate of neutron beta decay will be proportional to the probability that the down quark mass state in the neutron is in the weak down quark state, which is given by the square of the cosine of the rotation angle. The rate of lambda beta decay will be proportional to the probability that the strange quark mass state in the lambda particle is in the weak down quark state, which is given by the square of the sine of the rotation angle. The sum of these two probabilities of course gives unity, so in some sense, the rate of decay of the lambda particle “makes up for” the missing rate of neutron beta decay. Experimentally, the rotation angle between the first two families is found to be about 0.22 radians, for a decreased rate of neutron decay of about five percent [76].

2. For the final expression derived in example 10.1, find the spatial wavelength of oscillation.
Solution: The wavelength of the square of the sine function will be the first value of the distance that returns the sinusoidal function to zero, since the square of the sine function oscillates back and forth between zero and one. The sine of any integer multiple of π will be equal to zero, so if λ is the wavelength of oscillation, and x is the distance from the source, then the argument that will be equal to π when the distance x from the source is equal to the wavelength is simply πx/λ. Equating this to the expression for the argument found in the example, we have

$$\frac{c^4\,\Delta m^2\, x}{4\hbar c E_\nu} = \frac{\pi x}{\lambda}. \tag{10.32}$$

Rearranging this, we have

$$\lambda = \frac{4\pi\hbar c E_\nu}{c^4\,\Delta m^2} \tag{10.33}$$

for the spatial wavelength of oscillation. Note that a larger energy implies a longer oscillation wavelength, since the neutrino will travel a longer distance in the amount of time it takes to oscillate from one type of neutrino to another. A larger difference in the masses will cause a smaller wavelength, since the rate of oscillation increases with larger mass difference. Note that the wavelength of oscillation is not affected by the rotation angle among the two families of neutrinos.

3. Using the result derived in the example, find the probability that the emitted neutrino will be measured as an electron neutrino one hundred kilometers from the source if it has an energy of one million electron volts. The experimentally determined value for the mixing among the first two families is 0.59 radians, and the value found for the difference between the squares of the masses is $8 \cdot 10^{-5}\ \mathrm{eV}^2/c^4$ (see Figure 10.2) [14].
Solution: If we square the sine of twice the rotation angle, then the value we obtain is

$$\sin^2 2\theta = \sin^2(1.18) \approx 0.8549. \tag{10.34}$$

If we substitute the value of the constants into the expression for the argument to the second sinusoidal function, we have

$$\frac{c^4\,\Delta m^2\, x}{4\hbar c E_\nu} = \frac{(c^4)\cdot(8\cdot 10^{-5}\ \mathrm{eV}^2/c^4)\cdot(10^5\ \mathrm{m})}{(4)\cdot(197.327\cdot 10^{-9}\ \mathrm{eV\cdot m})\cdot(10^6\ \mathrm{eV})}, \tag{10.35}$$

where we look up the value of ħc. All of the other constants cancel, and the value of the argument becomes roughly 10.1355. Thus, the probability is approximately

$$P(\mu \to e) \approx \sin^2(1.18)\,\sin^2(10.1355) \approx 0.364. \tag{10.36}$$

Note that because of the value of the first sinusoidal term, the maximum probability to measure an electron neutrino at any given time is roughly eighty-five percent.
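The numbers in the last two problems can be cross-checked with a few lines of Python. This sketch goes through the wavelength of equation 10.33 rather than through the argument directly, which is a slightly different route to the same result. The function and variable names are illustrative assumptions, and the units are the same as in the solution above (eV and meters, with ħc ≈ 197.327 · 10⁻⁹ eV·m).

```python
import math

HBAR_C_EV_M = 197.327e-9  # hbar * c in eV * m

def oscillation_wavelength(delta_m2, energy_ev):
    """Equation 10.33 in meters; delta_m2 in eV^2/c^4, energy_ev in eV."""
    return 4.0 * math.pi * HBAR_C_EV_M * energy_ev / delta_m2

# Parameters quoted in problem 3.
theta, delta_m2, energy_ev, x_m = 0.59, 8e-5, 1e6, 1e5

wavelength = oscillation_wavelength(delta_m2, energy_ev)
phase = math.pi * x_m / wavelength        # argument of the second sine
amplitude = math.sin(2.0 * theta) ** 2    # maximum possible probability
probability = amplitude * math.sin(phase) ** 2
```

With these parameters the wavelength comes out to roughly 31 km, so a detector one hundred kilometers away sits a little more than three full oscillations from the source.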


10.8 multiple choice test problems

1. Which of the following categories would a neutrino fall under?
a) Gauge bosons
b) Quarks
c) Leptons
d) Massless particles

2. Which of the following particles have half-integer spin?
a) Only gauge bosons
b) Only quarks
c) All bosons
d) All fermions

3. What is the quark content of a neutron?
a) u, u, d
b) u, d, d
c) u, d, s
d) u, u, s


andy o’donnell: fast fourier transform




Fourier Transforms can be found throughout science and engineering. In physics they are often used in quantum mechanics to convert the wave function in position space into the wave function in momentum space. In electrical engineering they are used to convert a signal from the time domain into the frequency domain. The Fourier Transform is essential for analyzing the properties of sampled signals. However, scientists and engineers often deal not with known analytic functions but with discrete data points, and therefore they must use the discrete form of the Fourier Transform. The Discrete Fourier Transform (DFT) has one major drawback: it is computationally slow. It requires $N^2$ computations, where N is the number of samples taken, and signals are often sampled at a much higher rate than strictly necessary in order to reduce error, which makes N large. Therefore we will derive and explore the radix-2 Fast Fourier Transform (FFT) for the reader. We will also show that the number of operations we have to perform is $\frac{N}{2}\log_2(N)$. This chapter is divided into three sections. The first section will give a quick introduction to the continuous Fourier Transform and its inverse. The second section will introduce the Discrete Fourier Transform and its inverse. The third section will discuss the Fast Fourier Transform and give a derivation of the radix-2 FFT algorithm.



fourier transform

The Fourier Transform is given by

$$X(f) = \int_{-\infty}^{\infty} x(t)\, e^{-j 2\pi f t}\, dt. \tag{11.1}$$


where $x(t)$ is some function of time, $j$ is the imaginary unit $\sqrt{-1}$, and $f$ is the frequency. The Fourier Transform uses two mathematical properties to extract the frequency. In the above, we can think of $x(t)$ as a superposition of sines and cosines. The second term in the above equation, $e^{-j 2\pi f t}$, can be expanded using Euler’s Formula

$$e^{j\theta} = \cos(\theta) + j\sin(\theta). \tag{11.2}$$

Due to the orthogonality of the trig functions, the integral survives only at certain values of $2\pi f$. The equation for the inverse continuous Fourier Transform is given by

$$x(t) = \int_{-\infty}^{\infty} X(f)\, e^{j 2\pi f t}\, df. \tag{11.3}$$


Given the information on the frequency of a signal, we can then find the time domain of that signal. [54]
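The orthogonality argument can be checked numerically. The sketch below is our own illustration, not part of the original derivation: it estimates the integral of sin(2πf₁t)·sin(2πf₂t) over one second with a simple midpoint rule, for integer frequencies f₁ and f₂ chosen arbitrarily.

```python
import math


def overlap(f1, f2, samples=10000):
    """Midpoint-rule estimate of the integral of sin(2π f1 t)·sin(2π f2 t) for 0 <= t <= 1."""
    dt = 1.0 / samples
    total = 0.0
    for k in range(samples):
        t = (k + 0.5) * dt
        total += math.sin(2 * math.pi * f1 * t) * math.sin(2 * math.pi * f2 * t)
    return total * dt


print(overlap(3, 3))  # matching frequencies: close to 0.5
print(overlap(3, 5))  # different frequencies: close to 0
```

Only the matching frequency gives a nonzero result, which is why multiplying a signal by a sinusoid and integrating picks out that one frequency component.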





discrete transform

On the computer we do not deal with continuous functions but rather with discrete points. The discrete Fourier Transform and its inverse are

$$X(m) = \sum_{n=0}^{N-1} x(n)\, e^{-j 2\pi n m / N}, \tag{11.4}$$

$$x(n) = \frac{1}{N} \sum_{m=0}^{N-1} X(m)\, e^{j 2\pi n m / N}, \tag{11.5}$$


where m is the frequency domain index, n is the time domain index, and N is the number of input samples. According to the Nyquist criterion, if the sampling rate is greater than twice the highest frequency present in the signal, then the samples capture the continuous signal without loss of information. With that condition in mind, we can easily make the transition from the continuous form of the Fourier Transform to the discrete Fourier Transform. Here is an example of a discrete Fourier Transform. [54]

example 11.1: simple sine function in matlab
Let us look at the Fourier Transform of the function sin(t). If the above statement about orthogonality is true, then we expect that only one frequency will survive if we apply the Fourier Transform to sin(t). Here is some simple code for evaluating the Fourier Transform of sin(t) in MATLAB. Here we use MATLAB’s built-in function ’fft’. We can effectively think of it as a very good approximation to the continuous Fourier Transform.
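% MATLAB code to compute the Fourier Transform of sin(t)
clear all
N = 360;                      % number of samples (avoids shadowing the built-in max)
n = (1:N)';                   % sample index
x = n*(pi/180);               % angle in radians, covering one full period
y = sin(x);                   % sampled signal
index = n/N;                  % time axis, scaled so that one period lasts 1 second
freq = (0:N-1)';              % the record lasts 1 second, so bin m sits at m Hz
Y = fft(y);                   % discrete Fourier Transform of the samples
figure(1); clf;
plot(freq, abs(Y), '.-')      % plot the magnitude, since Y itself is complex
title('Fourier Transform of sin(t)')
xlabel('Frequency Hz'); ylabel('Magnitude of X(m)');
figure(2); clf;
plot(index, y)
title('sin(t)')
xlabel('Time in seconds'); ylabel('x(n)')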

As seen from the figures of the signal and its Fourier Transform, the transform is nonzero at only one position in frequency space (together with its mirror image at the upper end of the spectrum, which appears for any real-valued signal). This agrees with our expectations.
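The same experiment can be repeated without a library FFT by evaluating the DFT sum directly. The following Python sketch is our own illustration: the function name `dft` and the threshold used to decide which bins count as nonzero are assumptions, and the samples start at n = 0 rather than n = 1.

```python
import cmath
import math


def dft(x):
    """Direct evaluation of X(m) = sum over n of x(n)·exp(-j 2π n m / N); costs N² multiplications."""
    N = len(x)
    return [sum(x[n] * cmath.exp(-2j * math.pi * n * m / N) for n in range(N))
            for m in range(N)]


N = 360
signal = [math.sin(2 * math.pi * n / N) for n in range(N)]  # one period of a sine
spectrum = dft(signal)

# Indices of the bins whose magnitude is clearly above rounding error
peaks = [m for m, X in enumerate(spectrum) if abs(X) > 1e-6]
print(peaks)
print(abs(spectrum[1]))
```

Only bin 1 and its mirror image at bin N − 1 survive, each with magnitude N/2; every other bin vanishes to within rounding error.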



Figure 11.1: One period of a sine function. (Horizontal axis: Time in seconds, from 0 to 1; vertical axis from −1 to 1.)


11.4 the fast fourier transform

As you can see above, for the Discrete Fourier Transform to be useful, there needs to be a faster way to compute it. For example, for N = 2,097,152, the Fast Fourier Transform algorithm would take your computer about 10 seconds, while the Discrete Fourier Transform described above would take over three weeks. [3] This next section gives the details and derivation of the radix-2 FFT algorithm. First, we start off with our original definition of the discrete Fourier Transform. [46]

$$X(m) = \sum_{n=0}^{N-1} x(n)\, e^{-j 2\pi n m / N}. \tag{11.6}$$
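To put the quoted timings in perspective, the two operation counts can be compared directly. This short Python sketch uses our own function names and counts only complex multiplications, N² for the direct sum and (N/2)·log₂(N) for the radix-2 FFT.

```python
def dft_multiplications(n):
    """Multiplications needed by the direct DFT sum: N²."""
    return n * n


def fft_multiplications(n):
    """Multiplications needed by the radix-2 FFT: (N/2)·log2(N); N must be a power of two."""
    stages = n.bit_length() - 1
    if n < 1 or 2 ** stages != n:
        raise ValueError("N must be a power of two")
    return (n // 2) * stages


N = 2_097_152  # 2**21, the example size quoted in the text
print(dft_multiplications(N))                           # 4398046511104
print(fft_multiplications(N))                           # 22020096
print(dft_multiplications(N) / fft_multiplications(N))  # roughly 2e5
```

The ratio of about 2 × 10⁵ is of the same order as the ratio between three weeks and ten seconds.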


Figure 11.2: The FFT of a sine function. Notice how it only has discrete components. (Plot title: Fourier Transform of sin(t); horizontal axis: Frequency Hz; vertical axis: Magnitude of X(m).)



For the next part of this, we split the sum into even and odd terms and get

$$X(m) = \sum_{n=0}^{(N/2)-1} x(2n)\, e^{-j 2\pi (2n) m / N} + \sum_{n=0}^{(N/2)-1} x(2n+1)\, e^{-j 2\pi (2n+1) m / N}. \tag{11.7}$$

In the second term, we can easily pull out the phase angle. Doing that we find

$$X(m) = \sum_{n=0}^{(N/2)-1} x(2n)\, e^{-j 2\pi (2n) m / N} + e^{-j 2\pi m / N} \sum_{n=0}^{(N/2)-1} x(2n+1)\, e^{-j 2\pi (2n) m / N}. \tag{11.8}$$

Next we need to define some notation to simplify the exponential terms. We shall now use the following notation.

$$W_N = e^{-j 2\pi / N}, \quad W_N^{n} = e^{-j 2\pi n / N}, \quad W_N^{2} = e^{-j 2\pi 2 / N}, \quad W_N^{nm} = e^{-j 2\pi n m / N}. \tag{11.9}$$

Using this notation, we can then replace the above equation with

$$X(m) = \sum_{n=0}^{(N/2)-1} x(2n)\, W_N^{2nm} + W_N^{m} \sum_{n=0}^{(N/2)-1} x(2n+1)\, W_N^{2nm}. \tag{11.10}$$

Then, through algebraic manipulation we know that $W_N^{2} = e^{-j 2\pi 2 / N} = e^{-j 2\pi / (N/2)} = W_{N/2}$. This allows us to change the equation to

$$X(m) = \sum_{n=0}^{(N/2)-1} x(2n)\, W_{N/2}^{nm} + W_N^{m} \sum_{n=0}^{(N/2)-1} x(2n+1)\, W_{N/2}^{nm}. \tag{11.11}$$

Next let’s consider the $X(m + N/2)$ case and we find that

$$X(m + N/2) = \sum_{n=0}^{(N/2)-1} x(2n)\, W_{N/2}^{n(m+N/2)} + W_N^{m+N/2} \sum_{n=0}^{(N/2)-1} x(2n+1)\, W_{N/2}^{n(m+N/2)}. \tag{11.12}$$


Next we can use the following expression:

$$W_{N/2}^{n(m+N/2)} = W_{N/2}^{nm}\, W_{N/2}^{nN/2} = W_{N/2}^{nm}\left(e^{-j 2\pi n 2N / 2N}\right) = W_{N/2}^{nm}(1) = W_{N/2}^{nm}. \tag{11.13}$$


We call the expression in front of the second summation the twiddle factor, and we can simplify it as

$$W_N^{m+N/2} = W_N^{m}\, W_N^{N/2} = W_N^{m}\left(e^{-j 2\pi N / 2N}\right) = W_N^{m}(-1) = -W_N^{m}. \tag{11.14}$$
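The even/odd split of Eq. (11.11) and the sign flip of the twiddle factor in Eq. (11.14) are already enough to sketch a recursive radix-2 routine. The Python below is a minimal illustration under our own naming (`fft_radix2`, `dft`), not an optimized implementation; it assumes the input length is a power of two and includes the direct sum only so the two results can be compared.

```python
import cmath
import math


def fft_radix2(x):
    """Recursive radix-2 FFT of a sequence whose length is a power of two."""
    N = len(x)
    if N == 1:
        return [complex(x[0])]
    if N % 2:
        raise ValueError("length must be a power of two")
    even = fft_radix2(x[0::2])  # N/2-point transform of the samples x(2n)
    odd = fft_radix2(x[1::2])   # N/2-point transform of the samples x(2n+1)
    X = [0j] * N
    for m in range(N // 2):
        twiddle = cmath.exp(-2j * math.pi * m / N)  # the twiddle factor W_N^m
        X[m] = even[m] + twiddle * odd[m]           # X(m)
        X[m + N // 2] = even[m] - twiddle * odd[m]  # X(m + N/2): same sums, sign flipped
    return X


def dft(x):
    """Direct N² evaluation of the DFT, used here only as a reference."""
    N = len(x)
    return [sum(x[n] * cmath.exp(-2j * math.pi * n * m / N) for n in range(N))
            for m in range(N)]


samples = [1, 2, 3, 4, 0, -1, -2, -3]
fast = fft_radix2(samples)
slow = dft(samples)
print(max(abs(a - b) for a, b in zip(fast, slow)))  # tiny: the two methods agree
```

Each level of the recursion reuses the two half-length transforms for both X(m) and X(m + N/2), which is where the saving over the direct sum comes from.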

We can then plug this in to find that

$$X(m + N/2) = \sum_{n=0}^{(N/2)-1} x(2n)\, W_{N/2}^{nm} - W_N^{m} \sum_{n=0}^{(N/2)-1} x(2n+1)\, W_{N/2}^{nm}. \tag{11.15}$$

The only difference between the $X(m + N/2)$ and $X(m)$ equations is the minus sign on the second summation. This means we only need to evaluate the two half-length sums for the first N/2 values of m, and can then reuse them to find the final N/2 terms. This is the Fast Fourier Transform, because it greatly reduces the computational work that has to be done. [54]

11.5 multiple choice

Question: The number of multiplication steps in the FFT is
A) $N^2$
B) $N$
C) $N^3$
D) $\frac{N}{2}\log_2(N)$

Answer: D. See the text above.

11.6 homework problems

1) If N = 4, what are the first four terms of the summation of the Discrete Fourier Transform?

Answer: Let N = 4 and compute the 4 terms of the Discrete Fourier Transform output.

X(0) = x(0)cos(2π·0·0/4) − j x(0)sin(2π·0·0/4)
     + x(1)cos(2π·1·0/4) − j x(1)sin(2π·1·0/4)
     + x(2)cos(2π·2·0/4) − j x(2)sin(2π·2·0/4)
     + x(3)cos(2π·3·0/4) − j x(3)sin(2π·3·0/4)

X(1) = x(0)cos(2π·0·1/4) − j x(0)sin(2π·0·1/4)
     + x(1)cos(2π·1·1/4) − j x(1)sin(2π·1·1/4)
     + x(2)cos(2π·2·1/4) − j x(2)sin(2π·2·1/4)
     + x(3)cos(2π·3·1/4) − j x(3)sin(2π·3·1/4)

X(2) = x(0)cos(2π·0·2/4) − j x(0)sin(2π·0·2/4)
     + x(1)cos(2π·1·2/4) − j x(1)sin(2π·1·2/4)
     + x(2)cos(2π·2·2/4) − j x(2)sin(2π·2·2/4)
     + x(3)cos(2π·3·2/4) − j x(3)sin(2π·3·2/4)

X(3) = x(0)cos(2π·0·3/4) − j x(0)sin(2π·0·3/4)
     + x(1)cos(2π·1·3/4) − j x(1)sin(2π·1·3/4)
     + x(2)cos(2π·2·3/4) − j x(2)sin(2π·2·3/4)
     + x(3)cos(2π·3·3/4) − j x(3)sin(2π·3·3/4)

As you can clearly see, the amount of multiplication that needs to be done grows very quickly as N increases. The number of multiplications needed in the discrete Fourier Transform is $N^2$, as can be seen above. [54]

2) Prove linearity in the discrete Fourier Transform: if $X_{sum}(m)$ is the transform of $x_1(n) + x_2(n)$, then

$$X_{sum}(m) = X_1(m) + X_2(m). \tag{11.16}$$

Proof:

$$X_{sum}(m) = \sum_{n=0}^{N-1} \left(x_1(n) + x_2(n)\right) e^{-j 2\pi n m / N}$$

$$= \sum_{n=0}^{N-1} x_1(n)\, e^{-j 2\pi n m / N} + \sum_{n=0}^{N-1} x_2(n)\, e^{-j 2\pi n m / N} = X_1(m) + X_2(m).$$






In this chapter we are going to apply some physical principles to modern data storage technology. Digital technology exploded in the last quarter of the twentieth century with the invention of the microprocessor. However, without the data storage devices that were developed in parallel, the computer revolution would not have pervaded all areas of society. What good would a computer be to the banking industry if it could not store your account information? More importantly, what good would your desktop computer be without a hard disk drive to store your operating system and desktop applications? The first several sections of this chapter introduce some basic concepts of digital storage, such as binary number systems, that are required to understand the remaining parts of the chapter. A reader experienced with computer science concepts may wish to skip over these sections and start with section 12.4. Hard disk drives are the most common storage device used by modern computers, and section 12.4 is devoted to explaining the physical principles behind these devices. In particular, we explain some of the physics behind giant magnetoresistance, a technology which has enabled disk drives to reach storage capacities of 1 Terabyte.



bits and bytes — the units of digital storage

Before we can begin to discuss how data is physically stored, we need to understand the numerical units that are used by computers and data storage devices. People count using a base-10 number system. This means that each digit of our number system takes on one of ten different values, which are the digits 0 through 9. Most scientists believe that we use a base-10 number system because we have 10 fingers and counting is something that we first learn to do with these 10 fingers. Digital computers use a base-2 number system. The digits used by a computer are called binary digits, and they can take on only two values, either 0 or 1. Computers use a base-2 number system instead of a base-10 number system because it is much easier to design physical devices that have only two states, on and off, than devices that have ten different states. A single binary digit is called a "bit". Both digital computers and digital data storage devices operate at the physical level on these binary bits of data. Larger numbers are represented by stringing several bits together. Table 12.1 shows how binary numbers with three bits are used to represent eight different numeric values. When N binary bits are combined to form a number, the number of unique values M that they can represent is determined by

$$M = 2^N. \tag{12.1}$$



tim mortsolf: the physics of data storage

Table 12.1: Numeric values represented by 3-bit binary numbers

Binary/Base-2 Number    Number
000                     0
001                     1
010                     2
011                     3
100                     4
101                     5
110                     6
111                     7

If we invert this equation by taking the base-2 logarithm of both sides, then we get a formula to determine the minimum number of binary digits required to represent M different values (rounded up to the next integer when M is not a power of two):

$$N = \log_2 M. \tag{12.2}$$
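Equations 12.1 and 12.2 are easy to try out. In the Python sketch below the function names are our own, and `bits_required` rounds up so that it also handles values of M that are not powers of two.

```python
def values_representable(n_bits):
    """Equation 12.1: M = 2**N distinct values from N bits."""
    return 2 ** n_bits


def bits_required(m_values):
    """Equation 12.2 rounded up: the smallest N with 2**N >= M."""
    if m_values < 1:
        raise ValueError("need at least one value to represent")
    return (m_values - 1).bit_length()


# Reproduce Table 12.1: every 3-bit pattern and the number it represents
for value in range(values_representable(3)):
    print(format(value, "03b"), value)

print(bits_required(8))    # 3
print(bits_required(256))  # 8
print(bits_required(5))    # 3, since 2 bits give only 4 values
```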

A "byte" is a number formed by collecting eight binary bits into a single quantity that can take on 256 different values. The earliest common microprocessors operated on 8-bit quantities and had eight separate wires to carry the signals of these bits between the microprocessor and the memory. Today’s microprocessors operate on 64-bit numbers, but we still use bytes as the most common unit to express the size of a digital storage device. Prefixes are used to represent large numbers of bytes like the amounts used in data storage systems. Commonly used prefixes are: a kilobyte for $2^{10}$ or 1,024 bytes, a megabyte for $2^{20}$ or 1,048,576 bytes, and a gigabyte for $2^{30}$ or 1,073,741,824 bytes.

example 12.1: number of bits on a cd-rom
A standard 120 mm CD-ROM contains 703.1 megabytes of data. How many binary bits of data does a CD hold?

Solution: A megabyte is 1,048,576 bytes and each byte consists of 8 bits, so we can compute the number of bits by simply multiplying these numbers together:

$$1\ \text{CD} = 703.1\ \text{megabytes} \times \frac{1{,}048{,}576\ \text{bytes}}{1\ \text{megabyte}} \times \frac{8\ \text{bits}}{1\ \text{byte}} = 5{,}898{,}030{,}285\ \text{bits}.$$
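The unit conversion in this example can be written out as a short Python check. The constant and function names are our own; the prefixes follow the power-of-two definitions used in this chapter.

```python
KILOBYTE = 2 ** 10  # 1,024 bytes
MEGABYTE = 2 ** 20  # 1,048,576 bytes
GIGABYTE = 2 ** 30  # 1,073,741,824 bytes
BITS_PER_BYTE = 8


def megabytes_to_bits(megabytes):
    """Convert a capacity in megabytes to bits, rounded to the nearest whole bit."""
    return round(megabytes * MEGABYTE * BITS_PER_BYTE)


print(megabytes_to_bits(703.1))  # 5898030285
```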

example 12.2: using binary digits to store dna
DNA sequences are represented by biochemists with a string of alphabetic letters that represent the primary sequence of a DNA strand. For example, the string AGCTCGAT is a DNA sequence made of eight DNA bases. For almost all situations there are only four DNA bases in a DNA sequence, which we represent by 4 letters: A for adenine, G for guanine, C for cytosine, and T for thymine. How many bits of data does it take to store the value of a DNA base? If we are instead required to use bytes to store these values, what is the largest number of DNA bases that we could store in a byte?

Solution: Since there are only four different values of the DNA base, we can simply use formula 12.2 to solve for the number of bits that are required:

number of bits = log2(4) = 2.

There are several different schemes that we could use to encode these bits. Here is one that encodes the bases in alphabetical order of the symbols.

Binary/Base-2 Number    DNA base
00                      A (adenine)
01                      C (cytosine)
10                      G (guanine)
11                      T (thymine)

A byte has eight bits of information. Since each DNA base requires 2 bits for encoding, then a byte can store the value of 8/2 = 4 DNA bases in a sequence. To represent sequences longer than four DNA bases we would simply use additional bytes.
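The encoding in this example can be sketched in Python. The table of 2-bit codes is the one given above; the function names, the choice to put the first base in the highest-order bits, and the zero padding of an incomplete final byte are our own assumptions.

```python
BASE_TO_BITS = {"A": 0b00, "C": 0b01, "G": 0b10, "T": 0b11}
BITS_TO_BASE = {bits: base for base, bits in BASE_TO_BITS.items()}


def pack_dna(sequence):
    """Pack a DNA string into bytes, four bases per byte, first base in the high bits."""
    packed = bytearray()
    for start in range(0, len(sequence), 4):
        chunk = sequence[start:start + 4]
        byte = 0
        for base in chunk:
            byte = (byte << 2) | BASE_TO_BITS[base]
        byte <<= 2 * (4 - len(chunk))  # pad an incomplete final byte with zero bits
        packed.append(byte)
    return bytes(packed)


def unpack_dna(packed, n_bases):
    """Recover the first n_bases bases from bytes produced by pack_dna."""
    bases = []
    for byte in packed:
        for shift in (6, 4, 2, 0):
            bases.append(BITS_TO_BASE[(byte >> shift) & 0b11])
    return "".join(bases[:n_bases])


packed = pack_dna("AGCTCGAT")
print(len(packed), packed.hex())  # 2 bytes: 2763
print(unpack_dna(packed, 8))      # AGCTCGAT
```

The eight-base sequence from the example fits in two bytes. Because padding bits are indistinguishable from the code for A, the number of bases has to be stored separately.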


12.3 storage capacity is everything

When customers are purchasing new computer hardware, the most important considerations are performance and price. The manufacturers of computer components are under competitive pressure to regularly come out with new devices with better performance metrics. When it comes to data storage devices, and hard disk drives in particular, the metrics that matter are storage capacity, transfer rate, and access time; of these, storage capacity is by far the most important to the average consumer. Ever since the digital computer arrived, storage capacity and transfer rates have increased together. This is because the areal density of hard disk drive surfaces has increased ever since they were invented. Areal density, or bit density, is the number of bits that can be stored in a given amount of "real estate" on the hard drive platter’s surface. The units for areal density are BPSI (bits per square inch). Although we are probably far away from the peak of this trend [25], it is a trend that in the long term appears to oppose the quantum mechanical principles of uncertainty. When we consider the future of data storage, one limit that scientists envision is the ability to store a bit of information using a single atom. It might be possible or even practical to store more than one bit of information in an atom, but for the sake of this discussion let’s assume that the one bit per atom limit will one day be achieved. Quantum computers have the ability to store one bit of information per degree of freedom and provide a reasonable hope that this limit can be reached. As the size of matter used to store a single bit of information shrinks down to the size of an atom, the energy of the matter becomes very small.
The Heisenberg uncertainty principle relates the spread in energy ∆E of a quantum state to the amount of time ∆t it takes for the spread to evolve to an orthogonal state with the relation

$$\Delta E\, \Delta t \geq \hbar.$$



Figure 12.1: Transfer rates of magnetic disk drives and microscopic materials. The transfer rate of single bits of information on a magnetic drive has continued to improve as the density has increased, but we may be approaching a limit where a trade-off is required. The data point for the silicon atom shows the transfer rate of gold atoms bonded to a silicon surface that were read with a scanning tunneling microscope (STM). The data rate depicted for DNA is presumably how fast the DNA can be transcribed into an RNA molecule. The rate of RNA transcription is typically greater than 50 RNA bases per second. (Courtesy of [42])

This has been extended to relate the uncertainty in time to a particle with average energy E, which can be rearranged to give the uncertainty in time as [56]

$$\Delta t = \frac{\pi \hbar}{2 \Delta E}.$$
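To get a feel for the time scales involved, the relation ∆t = πħ/(2∆E) can be evaluated for a few energy spreads. In the Python sketch below the function name is our own and the energies are hypothetical values picked only for illustration; ħ is expressed in eV·s.

```python
import math

HBAR_EV_S = 6.582119569e-16  # reduced Planck constant in eV·s


def min_switch_time(delta_e_ev):
    """Δt = π ħ / (2 ΔE), with ΔE in electron volts and the result in seconds."""
    return math.pi * HBAR_EV_S / (2.0 * delta_e_ev)


# Hypothetical energy spreads in eV, not values taken from the text
for delta_e in (1.0, 1e-3, 1e-6):
    dt = min_switch_time(delta_e)
    print(delta_e, dt, 1.0 / dt)  # energy spread, time in seconds, implied maximum rate in Hz
```

A smaller energy spread gives a proportionally longer time, which is the trade-off between density and transfer rate that the text goes on to describe.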

This physical principle implies that in the future, when the densities used to store bits of information approach the quantum mechanical limits, there will be a trade-off between storage density and transfer rates. In fact, this is already a limit that we see in microscopic systems such as gold atoms on a silicon surface and the decoding of DNA. DNA is very dense, denser than any man-made storage device by several orders of magnitude. DNA uses approximately 32 atoms for each nucleotide of a DNA sequence; these are the letters A, G, C, and T that one learns about in an introductory biology class. Figure 12.1 shows how storage density and transfer rates have increased throughout the history of HDD development but drop off tremendously for atomic systems.

12.4 the physics of a hard disk drive

Hard disk drives (HDDs) are the major data storage devices used by today’s desktop computers for permanent storage of data. Although HDDs were introduced into the marketplace in 1956 by IBM, the earlier desktop PCs used removable floppy disk drives (FDDs) because of the high cost and relatively low data storage of HDDs available at the time. During the early 1990s, HDDs quickly supplanted FDDs in desktop PCs as technology permitted HDDs to be made at lower cost and with higher storage capacity. In this section we will explain how hard disk drives work and the physics of giant magnetoresistance (GMR), which is a newer technology that has enabled modern HDDs to have data storage capacities of 1 Terabyte.

Figure 12.2: Diagram of the major internal components of a hard disk drive. (Courtesy of [63])

12.4.1 Hard Disk Drive Components

Figure 12.2 shows the basic structure of the internal components of a HDD. We are going to focus on the three main components that deal with how the binary data is magnetically stored inside the hard drive: the platter, the read/write heads, and the magnetic surface. This section serves to introduce the reader to how modern disk drives work and cannot begin to fully describe the inner workings and technologies that these devices encompass. The references at the end of this chapter contain suggested books for an interested reader who wishes to learn more details about hard disk drive technology.

Hard Disk Platters

An HDD is a sealed unit that stores digital bits of data on hard disk platters that are coated with a magnetic surface. These platters are composed of two types of material: a substrate that forms the mechanical structure of the platter, and a magnetic surface that coats the substrate and stores the encoded data. The platters used in modern disk drives are composed of ceramic and glass and use both surfaces to store data. Modern HDDs contain more than one platter for two reasons: first, this increases the data capacity of the drive, and second, this increases the data transfer rate since each read/write head can transfer data simultaneously. The HDD platters are attached to a spindle that rotates very rapidly and at a constant speed. The rotational velocity of HDDs is specified in revolutions per minute (rpm) and is an important parameter for the consumer. As you can guess, the higher the rpm, the higher the data transfer rate and the lower the data access times. Premium desktop HDDs that are available in November of 2008 rotate at 10,000 rpm and have data transfer rates of about 1 Gbit/s. The information on a hard disk platter is organized at the highest level in a "track". A track contains a ring of data on a concentric circle of the platter surface. Each platter contains thousands of tracks and each track stores thousands of bytes of data. Each track is divided into smaller segments called sectors. A sector holds 512 bytes of data plus some additional bytes that contain error correction codes used by the drive to verify that the data in the sector is accurate.

Read/write heads

HDD read/write heads read data from and write data onto a hard disk platter. Each platter surface has its own read/write head. A disk drive with five platters has ten heads since each side of a platter has a magnetic surface. The heads are attached to a single actuator shaft that positions the heads over a specific location of the platter. The drive controller is the electronic circuitry that controls the rotation of the drive and the position of the heads. The computer interfaces with the drive controller to read and write data to the drive. The computer refers to a specific location on a hard drive by using a head number, cylinder number, and sector number. The drive controller uses the head number to determine which platter surface the data is on, the cylinder number to identify what track it is on, and the sector number to identify a specific 512 byte section of data within the track.
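The head, cylinder, and sector description leads directly to a rough capacity estimate, and the rotation speed to a rough access delay. The Python sketch below rests on simplifying assumptions of our own: every track is taken to hold the same number of sectors, the geometry numbers are hypothetical, and the average rotational latency is taken as half a revolution.

```python
SECTOR_BYTES = 512  # data bytes per sector, as stated in the text


def drive_capacity_bytes(heads, cylinders, sectors_per_track):
    """Capacity of an idealized drive in which every track holds the same number of sectors."""
    return heads * cylinders * sectors_per_track * SECTOR_BYTES


def rotation_period_ms(rpm):
    """Time for one full revolution of the platter, in milliseconds."""
    return 60_000.0 / rpm


def average_rotational_latency_ms(rpm):
    """On average the requested sector is half a revolution away from the head."""
    return rotation_period_ms(rpm) / 2.0


# Hypothetical geometry: five platters (ten heads), 50,000 cylinders, 1,000 sectors per track
print(drive_capacity_bytes(10, 50_000, 1_000))  # 256000000000 bytes
print(rotation_period_ms(10_000))               # 6.0 ms per revolution at 10,000 rpm
print(average_rotational_latency_ms(10_000))    # 3.0 ms
```

Doubling the rotation speed halves the time spent waiting for a sector to arrive under the head, which is one reason the rpm figure matters to access time.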
To access the data, the drive controller instructs the actuator to move to the requested cylinder. The actuator moves all of the heads in unison to the cylinder, placing each head directly over the track identified by the cylinder number. The drive controller then requests the head to read or write data as the sector spins past the read/write head. When the hard disk drive is turned off, the heads rest on the platter surface. When the drive is powered up and begins to spin, the air pressure from the spinning platters lifts the heads slightly to create a very tiny gap between the heads and the platter surface. A hard drive crash occurs when the head hits the surface while the platter is spinning and scratches the magnetic media.

Magnetic surface

The magnetic surface, or media layer, of a hard disk platter is only a few millionths of an inch thick. The earlier hard drive models used iron oxide media on the platter surfaces because of its low cost, ease of manufacture, and ability to maintain a strong magnetic field. However, iron oxide materials are no longer used in today’s HDDs because of their low storage density. Increases in storage density require smaller magnetic fields so that the heads only pick up the magnetic signal of the surface directly beneath them. Modern disk drives have a thin-film media surface. First a base surface called the underlayer is placed onto the platter using hard metal alloys such as NiCr. The magnetic media layer is deposited on top of the underlayer. This surface is formed by depositing a cobalt alloy magnetic material using a continuous vacuum deposition process called sputtering [68]. The magnetic media layer is finally covered with a carbon layer that protects the magnetic layer from dust and from head scratches.

12.4.2 Hard Disk Magnetic Decoding and Recording

The read/write heads in modern HDDs contain separate heads for the reading and writing functions. Although the technology for both heads has improved, most of the increases in data density have come from technology improvements of the read head. Over the last decade, the technology for read/write heads has evolved through several related but different physical processes. The original heads used in HDDs were ferrite heads that consisted of an iron core wrapped with windings, similar to that used to form a solenoid. These heads were able to perform both a reading and writing function, but the density and data rates were terrible by today’s standards. One of the technologies applied in the 1990s that led to the widespread acceptance of HDD technology was the use of the magnetoresistance effect. AMR (anisotropic magnetoresistance) read heads utilize the magnetoresistance effect to read data from a platter. The next breakthrough was the incorporation of GMR (giant magnetoresistance) read heads that utilize the giant magnetoresistance effect. We will cover the physical processes of magnetoresistance and GMR technology in the next section; this is an effect that leads to much higher areal densities than AMR and led to the explosion of HDD capacities after the turn of the century. Most HDDs in use today orient the magnetic fields parallel to the magnetic surface along the direction of the track. This magnetization scheme is referred to as LMR (longitudinal magnetic recording). Recently, the growth rate of HDD capacities has slowed because of limitations in how densely the thin-film media can reliably store magnetic bits of information. The fine structure of the magnetic cobalt alloys in the magnetic media consists of randomly shaped grains that come in a variety of different sizes. Each bit that is written onto the surface must be stored in nearly 100 grains in order for the information to be reliably stored [95].
One problem that arises as HDDs encode magnetic information at higher densities is that thermal energy can excite these grains and reverse the magnetization of regions on the surface. The amount of thermal energy required to reverse the magnetization is proportional to the number of grains used to store the magnetic information. PMR (perpendicular magnetic recording) is a new technology that is being used to further push the density envelope on information stored on the magnetic surface. PMR, as its name suggests, encodes the magnetic bits in up and down orientations that are perpendicular to the media surface. These bits are more resilient to thermal fluctuations, but require a stronger magnetic field to write the information into the grains. Bits encoded with PMR also produce a sharper magnetic response, so not only can they be more densely encoded into the surface, but they are also easier for the disk read head to interpret. Figure 12.3 shows a picture of LMR and PMR magnetically recorded bits on a media surface. Notice how the waveform produced at the read head is much sharper for PMR encoding, even though the regions used to store the information on the surface are smaller. In the future, all HDDs will probably use PMR until a newer technology comes along to replace it.

Figure 12.3: Depiction of the magnetic bit orientation and read head signals of media surfaces encoded using longitudinal magnetic recording (LMR) and perpendicular magnetic recording (PMR) technology. (Courtesy of [95])

12.4.3 Giant Magnetoresistance

Magnetoresistance is a physical process that causes the resistance of a material to change when it is in the presence of a magnetic field. Let’s explain how a HDD uses the physical process of magnetoresistance to detect the magnetic signals on a platter. The drive controller applies a voltage to the read heads and detects the current that passes through them; in this way the drive controller acts as an ammeter. As the magnetic surface spins beneath a read head, the head enters the magnetic field produced by the cobalt alloy magnetic grains on the thin-film media surface that is directly beneath it. The precise electrical resistance of the read head depends on the magnitude and direction of the field from these magnetic grains. Because the HDD platter is spinning, the magnetic field at the head changes, and the electrical resistance of the head changes in direct response to the magnetic field on the surface. The drive controller detects these changes in resistance by measuring the current that passes through the read head. Thus, the drive controller is able to interpret the magnetic field information that is encoded on the surface of the disk. The drive controller has very accurate knowledge of which portion of the disk is under the read head at any given time and is thus able to determine the values of all the magnetic bits stored in a 512 byte sector.
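The ammeter picture can be turned into a toy calculation. The Python sketch below is an idealization of our own: the voltage, the two resistance levels, and the rule that a higher current means a 1 are all hypothetical choices, and a real drive uses far more elaborate signal processing.

```python
def head_current(voltage, resistance):
    """Ohm's law: the current the controller measures for a given read head resistance."""
    return voltage / resistance


def decode_bits(resistances, voltage, threshold_current):
    """Map each measured current to a bit: 1 if the current exceeds the threshold, else 0."""
    return [1 if head_current(voltage, r) > threshold_current else 0
            for r in resistances]


# Hypothetical numbers, chosen only for illustration
V = 0.1                     # applied voltage in volts
R_HIGH, R_LOW = 50.0, 45.0  # read head resistance in ohms for the two field directions
samples = [R_HIGH, R_LOW, R_LOW, R_HIGH, R_LOW]

threshold = head_current(V, (R_HIGH + R_LOW) / 2)  # current midway between the two levels
print(decode_bits(samples, V, threshold))          # [0, 1, 1, 0, 1]
```

The closer the two resistance levels are to each other, the smaller the difference in current and the harder it becomes to separate the two cases from noise.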



As we saw in the last section, the density of HDD information is intrinsically related to how small an area the HDD can use to reliably store the magnetic information so that it cannot be modified by thermal energy. This is one important area where HDD density is increased, but it is not all that is required. The HDD read heads must also be able to reliably read back the information that is encoded ever more densely onto the surface. Just before the turn of the century, one major barrier to improved HDD storage capacity was that HDD read heads were unable to keep pace at reliably reading magnetic bits at the increased storage densities that were required. As the area used to store each magnetic bit shrinks, the magnetic fields become smaller and the resistance changes induced by magnetoresistance become weaker. At a small enough limit, the signal changes become so weak that the signal-to-noise ratio is too small to reliably decode the information. The breakthrough that addressed this problem was the incorporation of GMR (giant magnetoresistance) technology into the HDD read heads. The physics of magnetoresistance has been understood for over a century, so it is not surprising that AMR heads were quickly replaced with technologies that work at much higher areal densities. The GMR effect was independently discovered in 1988 by Albert Fert and Peter Grünberg, for which they shared the Nobel Prize in Physics in 2007 [64]. GMR is a technology that can be used to create larger changes in electrical resistance with weaker magnetic fields. The term "giant" comes not from the size of the magnets, but from the size of the effect that is produced. In the explanation of GMR that follows, the thin magnetic layers in which the GMR effect occurs are those used in the read/write heads, not the magnetic materials on the thin-media layer.
GMR occurs in thin layers of magnetic material, such as cobalt alloys, that are separated by a nonmagnetic spacer that is just a few nanometers thick. This configuration of magnetic materials produces a tremendous reduction of electrical resistance when a magnetic field is applied. The thin magnetic layers that are used in the read/write heads do not have a permanent magnetic dipole associated with them and are quite easily able to orient themselves in response to an applied magnetic field. In the absence of an external magnetic field, the magnetizations of the thin magnetic layers orient themselves in opposite, or antiparallel, directions. In this orientation, the electrical resistance is at its highest. When an external magnetic field is applied, the thin magnetic layers become aligned, and this is accompanied by a tremendous drop in electrical resistance. A graph of this effect is shown in Figure 12.4. GMR heads are designed so that one layer is always in a fixed orientation and the other layer is free to reorient itself. A fourth layer that is a strong antiferromagnet is used to "pin" down the orientation of the fixed layer. As the magnetic bits pass under the GMR head, the magnetic field from the grains on the thin-media layer directly beneath the GMR head reorients the "unpinned" layer, which produces the strong changes in electrical resistance [68]. Since the "pinned" layer is always forced into one orientation, grains that are oriented in the same direction produce little change in resistance, but grains that are oriented in the opposite direction produce large changes in resistance. Thus, a reliable signal can be encoded onto the thin-media layer in smaller regions with weaker magnetic fields.


tim mortsolf: the physics of data storage

Figure 12.4: Graph of the GMR effect on the resistance of thin magnetic alloy layers separated by a thin nonmagnetic spacer. (Courtesy of [85])



Digital computers and data storage systems store numbers using binary bits because it is practical to build devices that have two states, on and off. A byte is made by combining the values of eight bits into a single quantity and can represent $2^8 = 256$ different values. Data storage capacity and transfer rates have been increasing since the beginning of the digital era, but as we scale the devices down to the atomic level, there will be a trade-off between these two metrics. This limit is explained by the Heisenberg relation, which relates the time it takes to force a physical system into a state to the energy of that state, $\Delta E \, \Delta t \geq \hbar$. Hard disk drives (HDDs) are the major data storage devices in use today. They store information by magnetically encoding physical bits onto a hard disk platter that is coated with a thin-layer media surface composed of magnetic alloys. HDDs read information from the magnetic surface through a change in electrical resistance caused by magnetoresistance. Giant magnetoresistance (GMR) is a recent technology that responds more strongly to changes in magnetic fields by building the read heads from thin layers of magnetic alloys separated by a thin nonmagnetic spacer.

12.6 exercises

1. The human genome contains 3 billion nucleotide bases. If we stored each nucleotide in 1 byte, could we store the human genome on a single 703.1 Megabyte CD?

Solution
Using one byte per nucleotide base would require $3 \times 10^9$ bytes of storage. A 703.1 Megabyte CD holds $0.7031 \times 10^9$ bytes. A CD would not be able to store the human genome unless a more efficient storage scheme was used. In fact, the human genome has many repeated nucleotide sequences and can be compressed to fit onto a single CD.

12.7 multiple choice questions

1. Which physical form of magnetic storage has the highest areal density?
(a) AMR stored with longitudinal encoding
(b) AMR stored with perpendicular encoding
(c) GMR stored with longitudinal encoding
(d) GMR stored with perpendicular encoding
Solution (d)

2. The first generation of hard disk drives relied on the magnetoresistance effect to decode the magnetically stored bits of data.
(a) True
(b) False
Solution (a)

3. A hard disk drive uses a single read/write head for:
(a) All of the platter surfaces in the drive
(b) Each platter in the drive
(c) Each platter surface in the drive
Solution (c)

4. The GMR effect that has been used to increase hard drive density is primarily used to:
(a) Decode bits of information on the magnetic thin-layer media
(b) Encode bits of information on the magnetic thin-layer media
(c) Decode and encode bits of information on the magnetic thin-layer media
Solution (a)





In this chapter we are going to apply information theory to the data storage concepts presented in the last chapter. Information theory is a modern science used for the numerical computation of data encoding, compression, and transmission properties. The chapter is an introduction to information theory and the physical principles behind "information entropy", a concept that plays a role in data storage analogous to the role of entropy in thermodynamics.



information theory – the physical limits of data

We have seen that the technological improvements in data storage devices over the last 50 years follow a pattern in which data density roughly doubles every 18 months. At the time this textbook was written (2008), researchers announced a new hard drive density record of 803 Gb/in² on a hard drive platter surface. This record was achieved using TMR (Tunneling Magneto-Resistance) read/write heads. The growth rate of data storage described by Moore’s law is not a law that comes from physical principles, but rather an empirical rule that has accurately estimated the engineering advances in computational technology over the past half century. There are, however, some laws of data storage and computation that can be derived from physical principles. "Information theory" is the branch of science that applies the rules of physics and mathematics to obtain laws that describe how information is quantified. These laws are primarily derived from the thermodynamic equations of state for a system. Entropy is a key player in this arena, since entropy is a quantity that measures the number of available configurations (or states) of a closed system. In this section, we will introduce some of the laws of information theory and show how they can be applied to data storage devices. We will also use these laws to show that there are indeed physical limits to computation that determine the maximum information density and computational speed at which theoretical data storage devices can operate.

13.2.1 Information Theory and the Scientific Definition of Information

Information theory can be informally defined as the mathematical formulation of the methods that we use to measure, encode, and transform information. "Information" is an abstract quantity that is hard to define precisely. In this chapter we will use a narrow technical meaning of information: the encoding of a message into binary bits that can be stored on a data storage device. One of the earliest applications of information theory occurred when Samuel Morse designed an efficient code, the Morse code, to encode messages for transmission over a telegraph line. His code used a restricted alphabet of only "dash (long)" and "dot (short)" electrical signals to transmit text messages over long distances. For example, the letter C is encoded by the Morse sequence "dash, dot, dash, dot". The principles of information theory can be used to show that the Morse code is not an optimal encoding using an alphabet of just two characters, and that the encoding can be improved by about 15% [3].

tim mortsolf: the physics of information theory

The scientific treatment of information began with Hartley’s "Transmission of Information" in 1928 [40]. In this paper, he introduced a technical definition of information: the answer to a question that can assume the two values ’yes’ or ’no’ (without taking into account the meaning of the question) contains one unit of information. Claude Shannon’s publication "A Mathematical Theory of Communication" in 1948 laid a solid foundation for the scientific basis of information theory [72]. Shannon introduced the concept of "information entropy", also called the Shannon entropy, which is the minimum average message length, in bits, that must be used to encode the true value of a random variable. Today, information theory exists as a branch of mathematics that has its main applications in encryption, digital signal processing, and even the biological sciences.

13.2.2 Information Entropy and Randomness

The best way to understand the basics of information entropy is to look at an example. For our experiment, we want to flip a coin a large number of times (let’s say 1,000,000) and encode the results of this experiment into a message that we can store on a digital storage device. If the coin used in our experiment is not biased, then the two outcomes "heads" and "tails" should occur with equal probability of 1/2. We will encode each coin toss with a bit: let 1 indicate "heads" and 0 indicate "tails". Each coin toss requires exactly one bit of information, so for 1,000,000 coin tosses we would need to store 1,000,000 bits of data to record the results of our experiment. Let’s repeat this experiment, but this time let’s assume the coin is biased: let "heads" occur 3/4 of the time and "tails" occur only 1/4 of the time. It might surprise you that the results of this experiment can be recorded on average with fewer than 1,000,000 bits of information. Since the distribution of coin flips is biased, we can design an efficient coding scheme that takes advantage of the bias to encode the result of each coin flip in less than one bit. It turns out that we are able to record each coin flip using only 0.8113 bits of information on average, and can thus use a total of about 811,300 bits of data to record our results. This same process explains why we are able to compress text documents and computer images into a data file that is much smaller than the raw data that we recover when we decompress the information. A better example for explaining how we can encode the result of a coin flip in less than one bit occurs when we let the coin become even more biased: let "heads" occur 99/100 of the time and "tails" occur only 1/100 of the time. If we encoded each coin flip using a single bit as before, we would still need exactly 1,000,000 bits to record the results of our experiment. But for this experiment we can design a variable length encoding algorithm that does much better than this.

Since almost all of our coin flips are "heads", the sequence of coin flips will be a long series of "heads" with an occasional "tail" interspersed to break the series. Our encoding scheme takes advantage of this bias by recording only those places in the sequence where a "tail" flip has occurred. Since there are 1,000,000 coin flips, each "tail" requires a 20-bit number to record its position in the sequence ($2^{20} = 1{,}048{,}576$ is the smallest power of two that covers 1,000,000 positions). Out of 1,000,000 coin flips, the average number of tails will be 1,000,000/100 = 10,000. On average it will take 10,000 × 20 = 200,000 bits to record the results of each experiment. Thus, by using a very simple encoding scheme we have been able to reduce the data storage of our experiment by a factor of 5. For our initial recording scheme that recorded the value of each coin flip as a single bit, the amount of data required to store the message was always 1,000,000 bits, no more and no less. With our new encoding scheme we can only discuss the statistical distribution of the amount of data required for each experiment. If we look at the almost impossibly unlucky result where each of the 1,000,000 coin flips is "tails", which occurs with probability $(1/100)^{1{,}000{,}000}$, the encoding scheme would use 1,000,000 × 20 = 20,000,000 bits of data. This is 100 times larger than our average value of 200,000 bits that we expect to use for each trial. These improbable unlucky scenarios are balanced by scenarios where we get lucky, have far fewer coin flips with "tails" than the average result, and can encode our experiment in much less than 200,000 bits of data. The important thing to note is that almost all of the experiments will require nearly 200,000 bits of data storage, but the exact value will be different for each result. We can design an even better encoding scheme than this one.
Instead of recording the absolute sequence number of each coin flip with a "tails" result, we could just record the difference between the sequence numbers of successive "tails". For example, if "tails" occurred on the coin flips numbered

$$\{99,\ 203,\ 298,\ 302,\ 390\}, \tag{13.1}$$

we could encode these results as the differences in the positions,

$$\{99 - 0 = 99,\ 203 - 99 = 104,\ 298 - 203 = 95,\ 302 - 298 = 4,\ 390 - 302 = 88\}. \tag{13.2}$$

For this encoding scheme we use 8-bit numbers, which can encode differences less than 256, whereas our original compression scheme used a 20-bit number to save the absolute position of each "tails" result. (A gap of 256 or more would need an extra escape code; we ignore these occasional cases here.) With our latest encoding scheme, on average it will take about 10,000 × 8 = 80,000 bits to record the results of each experiment. How close is this to the best that any encoding scheme can do? The limit can be calculated with Shannon’s general formula for information uncertainty, which we will develop in the next section.

13.3 shannon’s formula

We begin our development of Shannon’s formula by introducing the principle of information content, also known as information uncertainty. We want to create a definition of information content that represents the minimum number of bits required to encode a message with M possible values. Let’s go back to the first experiment of 1,000,000 coin flips of an unbiased coin. Before we start this experiment, we have no idea what the result is going to be. Since each coin flip can have one of two results, namely "heads" or "tails", there are $M = 2^n = 2^{1{,}000{,}000}$ possible outcomes for this experiment, and each of these outcomes is equally likely since our coin flips are unbiased. It takes n = 1,000,000 bits of information to record each of these possible outcomes. We define the uncertainty, or information content, I of a message with M equally likely possible values as

$$I = \log_2 M. \tag{13.3}$$

We also require that our definition of information content be additive. For example, if we perform an experiment using 500,000 coin flips followed by a second experiment again of 500,000 coin flips, then the sum of the information content of the two experiments should be the same as the information content of a single experiment of 1,000,000 coin flips. We can see that our definition of information content does indeed satisfy this requirement. The sum of the uncertainties of two different experiments performed $n_1$ and $n_2$ times,

$$I_1 = \log_2 2^{n_1} = n_1, \qquad I_2 = \log_2 2^{n_2} = n_2, \tag{13.4}$$

equals the uncertainty of an experiment performed $n_1 + n_2$ times,

$$I = \log_2 2^{n_1 + n_2} = \log_2 \left( 2^{n_1} 2^{n_2} \right) = \log_2 2^{n_1} + \log_2 2^{n_2} = I_1 + I_2. \tag{13.5}$$

The quantity I is unitless. The unit value of I represents the number of bits required to encode a message with 2 possible outcomes,

$$I = \log_2 2 = 1. \tag{13.6}$$
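The definition (13.3) and the additivity property (13.4)-(13.5) can be checked numerically. The following is a minimal Python sketch; the function name `information_content` and the small flip counts `n1` and `n2` are illustrative choices, not part of the original text.

```python
import math

def information_content(num_messages: int) -> float:
    """Bits needed to single out one of num_messages equally likely messages, Eq. (13.3)."""
    if num_messages < 1:
        raise ValueError("num_messages must be a positive integer")
    return math.log2(num_messages)

# Two experiments of n1 and n2 fair coin flips have 2**n1 and 2**n2 outcomes.
# The flip counts are kept small here purely for illustration.
n1, n2 = 12, 20
i1 = information_content(2 ** n1)
i2 = information_content(2 ** n2)
i_combined = information_content(2 ** (n1 + n2))

print(f"I1 = {i1}, I2 = {i2}, I1 + I2 = {i1 + i2}, I(combined) = {i_combined}")

# A message with six equally likely values (one throw of a fair die)
# needs a non-integer number of bits on average.
print(f"I(6 outcomes) = {information_content(6):.3f} bits")
```

The last line hints at why a fixed three-bit code for a single die throw wastes some capacity: six equally likely outcomes carry fewer than three bits of information.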

We now need to apply the definition of information content to situations where the values encoded by the message are not equally likely, as is the case for our biased coin. Let $P_i$ be the probability of getting the message (or symbol) $M_i$; when all M messages are equally likely, $P_i = 1/M$. We define the surprisal $u_i$ as the "surprise" that we get when we encounter the ith type of symbol,

$$u_i = -\log_2 P_i. \tag{13.7}$$
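To get a feel for the size of the surprisal in (13.7), it helps to evaluate it for the coins used in this chapter. This is a small sketch only; the helper name `surprisal` is an assumption made for illustration.

```python
import math

def surprisal(probability: float) -> float:
    """Surprisal u = -log2(P) in bits, Eq. (13.7); requires 0 < P <= 1."""
    if not 0.0 < probability <= 1.0:
        raise ValueError("probability must lie in (0, 1]")
    return -math.log2(probability)

# Probabilities taken from the fair coin and the 99/100 biased coin in the text.
for label, p in [("fair coin, either side", 1 / 2),
                 ("biased coin, heads", 99 / 100),
                 ("biased coin, tails", 1 / 100)]:
    print(f"{label}: P = {p:.2f}, u = {surprisal(p):.4f} bits")
```

A common symbol carries a small fraction of a bit of surprise, while a rare one carries several bits.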

If the symbol $M_i$ occurs rarely, then $P_i$ is almost zero and $u_i$ becomes very large, indicating that we would be quite surprised to see $M_i$. However, if the symbol $M_i$ occurs almost all the time, then $P_i$ is almost one and $u_i$ becomes very small, and we would not be surprised to see $M_i$ at all. Shannon’s definition of uncertainty is the average surprisal of the values contained in a message of infinite size. This is the limit that the average surprisal of a finite message of length N converges to as the size of the message becomes infinite. First let’s determine the average surprisal $H_N$ for a message of length N that is encoded by a set of M different symbols, with each symbol $M_i$ appearing $N_i$ times,

$$H_N = \frac{N_1 u_1 + N_2 u_2 + N_3 u_3 + \dots + N_M u_M}{N} = \sum_{i=1}^{M} \frac{N_i}{N} u_i. \tag{13.8}$$

If we make this calculation in the limit of messages with an infinite number of symbols, then the frequency $N_i/N$ of symbol $M_i$ converges to its probability $P_i$,

$$P_i = \lim_{N \to \infty} \frac{N_i}{N}, \tag{13.9}$$

which we can substitute into (13.8) to get

$$H = \sum_{i=1}^{M} P_i u_i. \tag{13.10}$$

We finally arrive at Shannon’s formula for information uncertainty by substituting (13.7) into (13.10),

$$H = -\sum_{i=1}^{M} P_i \log_2 P_i \qquad \text{Shannon’s formula (bits per symbol).} \tag{13.11}$$
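Shannon’s formula (13.11) is easy to evaluate for the three coins discussed in this chapter, and the finite-message average (13.8) can be compared with it by simulation. The sketch below is illustrative only: the function names, the message length of 100,000 flips, and the random seed are assumptions, not values from the text.

```python
import math
import random
from collections import Counter

def shannon_entropy(probabilities):
    """H = -sum(P_i * log2(P_i)) in bits per symbol, Eq. (13.11)."""
    if abs(sum(probabilities) - 1.0) > 1e-9:
        raise ValueError("probabilities must sum to 1")
    # Symbols with P_i = 0 never occur and contribute nothing to the sum.
    return -sum(p * math.log2(p) for p in probabilities if p > 0)

def average_surprisal(message, probabilities):
    """H_N = sum((N_i / N) * u_i) for one finite message, Eq. (13.8)."""
    counts = Counter(message)
    n = len(message)
    return sum((count / n) * -math.log2(probabilities[symbol])
               for symbol, count in counts.items())

flips = 1_000_000
for p_heads in (1 / 2, 3 / 4, 99 / 100):
    h = shannon_entropy([p_heads, 1 - p_heads])
    print(f"P(heads) = {p_heads:.2f}: H = {h:.4f} bits/flip, "
          f"about {h * flips:,.0f} bits for {flips:,} flips")

# Illustrative check that H_N approaches H for a long random message.
rng = random.Random(0)  # fixed seed, chosen arbitrarily
probs = {"H": 3 / 4, "T": 1 / 4}
message = rng.choices(list(probs), weights=list(probs.values()), k=100_000)
print(f"H_N = {average_surprisal(message, probs):.4f} bits/flip")
```

For the 99/100 coin the entropy comes out near 0.081 bits per flip, or roughly 80,800 bits per million flips. That is well below the 200,000 bits of the scheme that stores 20-bit positions, and it is the floor against which any gap-based scheme should be judged once long gaps are accounted for.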

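example 13.1: using shannon’s formula
Earlier in this chapter, we indicated that trials from a coin biased with 3/4 "heads" to 1/4 "tails" can be recorded using 0.8113 bits of information per flip. Let’s use Shannon’s formula to prove this.

Solution
Use Shannon’s formula (13.11) with M = 2, setting $P_1 = 3/4$ and $P_2 = 1/4$:

$$\begin{aligned}
H &= -\sum_{i=1}^{M} P_i \log_2 P_i \\
  &= -\left(\tfrac{3}{4} \times \log_2 \tfrac{3}{4}\right) - \left(\tfrac{1}{4} \times \log_2 \tfrac{1}{4}\right) \\
  &= -\left(\tfrac{3}{4} \times \frac{\log_{10} 3/4}{\log_{10} 2}\right) - \left(\tfrac{1}{4} \times \frac{\log_{10} 1/4}{\log_{10} 2}\right) \\
  &= 0.8113
\end{aligned}$$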


13.4 the physical limits of data storage

In the closing section of our chapter, we combine Shannon’s formula, which we used to compute the information entropy, with the laws of thermodynamics. These will be used in a non-rigorous fashion to determine the physical limits of data storage density and data operations for a device called the "perfect disk drive". The "perfect disk drive", or PDD, is not a real device that exists today, but one we use to determine the ultimate boundaries of data storage capacity and data storage rates. The PDD stores information using the least matter per bit permitted by physics. Compared to the PDD, today’s computers are extremely inefficient. Most of the atoms of even a highly dense hard drive that we use today do not store any information, but are instead required to make the disk drive function. Even if we peel back the layers of a single surface of a hard disk drive platter, there are millions (if not more) of atoms required to store the state of just a single bit of information. Let’s consider this problem from a statistical mechanics point of view. A disk drive has an enormous number of available energy states, which come from the large number of particles from which its components are constructed. If we try to use quantum mechanics to determine the number of available energy states for these particles, then we are attempting to solve something that is impossible with today’s scientific methods. Fortunately, statistical mechanics provides us with a method to approximate the number of available states of a system from the thermodynamic equations of state. These approximations become very accurate as the temperature increases and the quantum distribution of energies becomes nearly continuous and can be treated classically. The equation

$$S = k_B \ln W \tag{13.12}$$

relates the thermodynamic entropy S to the number W of accessible states of a closed system, where $k_B$ is Boltzmann’s constant. The highest efficiency that our PDD could achieve for data density is one in which it is able to precisely encode its information into the accessible thermodynamic states. That is, we assume the PDD can assign each distinct message it might store to exactly one of these accessible states. We are not claiming this is something we will actually be able to achieve, but we are claiming this is a physical limit that the device can’t exceed. A closed system with W accessible states is then able to represent exactly W different messages. In the previous section we defined the information content as the minimum number of bits required to encode a message with M possible values. The physical relation that defines our PDD is

$$M = W, \tag{13.13}$$

which means it can encode a message with M possible values using exactly the number of states W that are accessible to the system. Now we can substitute this into (13.12) and use (13.3), written as $M = 2^I$, to get

$$S = k_B \ln M = k_B \ln 2^I = k_B I \ln 2, \tag{13.14}$$

which we rearrange to solve for I,

$$I = \frac{S}{k_B \ln 2}. \tag{13.15}$$

This formula (13.15) relates the information content we can store in the device to the thermodynamic entropy of the physical system. Unfortunately, there is no simple analytical formula that we can use to compute the entropy of an arbitrary device from knowledge of physical properties such as mass, volume, temperature, and pressure. The computation of entropy requires complete knowledge of the quantum energy levels available to each particle in the device. One author used a method to approximate a lower bound for the maximum amount of entropy in a 1 kg device that occupies a volume of 1 L, assuming that most of the energy of the system arises from blackbody photon radiation [52]. Using these simplifications, he arrives at a value of $T = 5.872 \times 10^{8}$ K for the temperature at which the maximum entropy occurs, and $S = 2.042 \times 10^{8}$ J K$^{-1}$ for the total entropy. Substituting these values into our equation for the information content gives

$$I = \frac{2.042 \times 10^{8}\ \mathrm{J\,K^{-1}}}{\left(1.381 \times 10^{-23}\ \mathrm{J\,K^{-1}}\right) \ln 2} \approx 2.132 \times 10^{31}\ \text{bits}. \tag{13.16}$$

How does this compare to today’s hard disk drive technology? At the time of this book (November 2008), disk drives with a little over 1.5 Terabytes = $1.2 \times 10^{13}$ bits of data can be purchased for desktop computer systems.
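The arithmetic behind (13.15) and (13.16), and the comparison with a 2008 drive that Example 13.2 works by hand, can be reproduced in a few lines. This is a sketch under the assumptions already stated in the text: the entropy estimate quoted from [52], the rounded value of Boltzmann’s constant, and an 18-month doubling period. The constant and function names are illustrative.

```python
import math

K_B = 1.381e-23           # Boltzmann's constant in J/K, rounded as in (13.16)
ENTROPY_1KG_1L = 2.042e8  # entropy estimate quoted from [52], in J/K
DRIVE_BITS = 1.2e13       # roughly 1.5 terabytes, the 2008 drive in the text
MONTHS_PER_DOUBLING = 18  # doubling period assumed in this chapter

def entropy_to_bits(entropy_j_per_k: float) -> float:
    """Information content I = S / (k_B ln 2), Eq. (13.15)."""
    return entropy_j_per_k / (K_B * math.log(2))

limit_bits = entropy_to_bits(ENTROPY_1KG_1L)
ratio = limit_bits / DRIVE_BITS           # how much denser the limit is
doublings = math.log2(ratio)              # number of capacity doublings needed
years = doublings * MONTHS_PER_DOUBLING / 12

print(f"limit     : {limit_bits:.3e} bits")
print(f"ratio     : {ratio:.3e}")
print(f"doublings : {doublings:.1f}")
print(f"years     : {years:.0f}")
```

With these rounded inputs the limit comes out near $2.13 \times 10^{31}$ bits, which agrees with (13.16) to three significant figures; the last digit depends on how the inputs are rounded.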

example 13.2: applying moore’s law
Use Moore’s law to see how far away we are from this theoretical limit.

Solution
Current technology permits us to mass produce desktop hard drives with $1.2 \times 10^{13}$ bits of data. We need to find out how long it will take to achieve $2.132 \times 10^{31}$ bits of data in roughly the same size device.

$$\text{increased storage ratio} = \frac{2.132 \times 10^{31}\ \text{bits}}{1.2 \times 10^{13}\ \text{bits}} = 1.78 \times 10^{18}$$

To find out how many times we have to double the storage capacity to increase it by this much, we take the base 2 logarithm of the ratio.

$$\text{number of times to double} = \log_2 \left(1.78 \times 10^{18}\right) = \frac{\log_{10} \left(1.78 \times 10^{18}\right)}{\log_{10} 2} = 60.6\ \text{times}$$

Now using Moore’s law of 18 months per doubling of storage capacity, we estimate the amount of time required to be

$$\text{time required} = 60.6\ \text{times to double} \times \frac{18\ \text{months}}{1\ \text{time to double}} \times \frac{1\ \text{year}}{12\ \text{months}} \approx 91\ \text{years}$$

Thus it will take about 90 years to develop the technology to store data at our estimated value of the maximum theoretical limit for data storage. Although this calculation makes several simplifying assumptions, it does show that there is a lot of room left to increase the data density relative to a device that precisely encodes information content into the entropic states of a physical system.

13.5 summary

Information theory is a mathematical formulation of the methods that we use to measure, encode, and transform information. Information entropy is a quantity that measures the randomness of data sequences. Shannon’s formula for information uncertainty is used to calculate this value. The lower the entropy, the less the randomness, and the more we can compress the data. The principles of thermodynamics and information entropy can be combined to estimate the maximum amount of information that can be stored at the atomic level per quantum degree of freedom.

13.6 exercises

1. Your assistant Bob rolls a six-sided die and you don’t know the result. What is the minimum number of yes/no questions that you could ask Bob to get the answer? Try to use information theory to answer this question.

Solution
Before posing a set of questions, let’s first use information theory to try to encode the result. The die can have six results, and using a binary coding scheme we need three bits to encode them. There are several different schemes that we could use to encode the result. Here is one that encodes the results in order from lower to higher values.

Encoding    Dice Throw
000         1
001         2
010         3
011         4
100         5
101         6
110         Not used
111         Not used

Our encoding scheme is not perfect since it has two unused values. But to record the result of a single dice throw, we cannot design a fixed-length code more efficient than this. If the results of many dice throws were recorded, we could record them in less than three bits per throw by using a clever compression scheme. From the encoding scheme, we can design a series of questions that ascertain the value of the bits used to encode our results. The first question tests the value of the first bit: "Is the value of the dice roll a five or a six?". The second question tests the value of the second bit: "Is the value of the dice roll a 3 or 4?". And the third question tests the value of the last bit: "Is the value of the dice roll an even number?".

2. Repeat the same problem, but this time Bob rolls two six-sided dice and you only want to determine the value of the sum.

Solution
The sum of two six-sided dice rolls will have values from 2 (when two 1’s are rolled) through 12 (when two 6’s are rolled). Our encoding scheme needs to encode eleven values, which takes four bits. However, since in this case some results are more likely than others, let’s see if we can design a more clever encoding scheme.

Binary/Base-2 Number      Sum of dice    Probability
0000                      2              1/36
0001                      3              2/36
0010                      4              3/36
0011                      5              4/36
0100, 0101                6              5/36
0110, 0111, 1000, 1001    7              6/36
1010, 1011                8              5/36
1100                      9              4/36
1101                      10             3/36
1110                      11             2/36
1111                      12             1/36

This encoding scheme does not have any unused values. We already require four bits to encode all of the possible values of the sum. However, instead of leaving some codes unused, the extra codes are assigned to those values of the dice sums that are more likely to occur, which in this case are the values 6, 7, and 8. This encoding requires a maximum of four questions to determine the value of the dice sum, but for some values it will take fewer than this, namely the ones for which we have degenerate encoding values. From the encoding scheme, we can design a series of questions that ascertain the value of the bits used to encode our results. But now we can test for a value of six using just two questions. The first question tests the value of the second bit: "Is the value of the sum a six or a seven?". For an answer of yes, the second question tests the value of the third bit: "Is the value of the sum a six?". Thus, in just two questions we know if the sum has a value of six. Similarly, with just three questions we will know if the value of the sum was a seven or an eight. All the other values would require the full four questions. However, on average our encoding scheme and associated questions require fewer than four questions to determine the result of each set of dice rolls.

Challenge: Use Shannon’s formula to determine the minimum number of bits required to encode the results. Then use our encoding scheme and compare its efficiency to this theoretical limit.

13.7 multiple choice questions

1. How many bits of information would be required on average to record the results of an unbiased coin that is flipped 1,000,000 times?
(a) 811,300
(b) 500,000
(c) 1,000,000
(d) None of the above
Solution (c)

2. How many bits of information would be required on average to record the results of 1,000,000 coin flips of a coin that is biased so that 3/4 of the flips are "heads" and 1/4 are "tails"?
(a) 811,300
(b) 500,000
(c) 1,000,000
(d) None of the above
Solution (a)



Swarm Intelligence is the property of a system in which a collection of unsophisticated agents, each interacting only locally with its environment, behaves in a functionally coherent way, allowing large scale patterns to emerge. This organizing principle is named stigmergy. The term was coined in the 1950s by Pierre-Paul Grassé to describe the relationship between social insects and the structures they create, such as ant hills, beehives and termite mounds, but the phenomenon itself has existed for billions of years. The term stigmergy literally means “driven by the mark” [86]. The bodies of all multi-cellular organisms are stigmergy formations. The phenomenon is apparent throughout nature in multiple species of social insects such as ants, wasps, termites and others. Although stigmergy has proven to be a powerful and useful mechanism for self-organized optimization, it has been studied in depth only in recent years [66]. The research that has been conducted has uncovered simple algorithms for traffic congestion control and resource usage based on local interactions as opposed to centralized systems of control. These powerful algorithms have already begun to be applied to artificial intelligence and computer programming.


14.1 introduction

14.1.1 Stigmergy in Nature

Ant trail formation, and large scale organized travel in general, is made possible by indirect communication between individual ants through the environment. The individual ants deposit pheromones while walking. As an ant travels, it will probabilistically follow the path richest in pheromones [22]. This allows ants to adapt to changes or sudden obstacles in their path, and to find the new shortest path to their destination. This powerful social system allows entire colonies of ants to make adaptive decisions based solely on local information, permitting ants to transport food over long distances back to their nest.

sam boone: analytical investigation of the optimal traffic organization of socia

The same organizing process, stigmergy, is used in a similar way by termites. When termites build mounds, they start with one termite retrieving a grain of soil and setting it in place with a sticky glue-like saliva that contains pheromones. In this case the mark referred to in the literal meaning of stigmergy is the grain of soil. The pheromone-filled saliva then attracts other termites, who glue another grain of soil on top of the last. As this process continues, the attractive signal increases as the amount of pheromone increases [86]. This process is called stigmergic building. Stigmergic building allows the termites to construct pillars and roofed galleries. The simplest form of stigmergic building is pillar construction. As the pile of soil grains accumulates upwards, it naturally forms a pillar. Once the pillar reaches a height approximately equal to that of a termite standing on its hind legs, the termites begin to add soil grains laterally. These lateral additions collectively form a shelf extending from the top of the pillar. Eventually, the shelf of one pillar meets the shelf of another nearby pillar. This connection of shelves creates a roof.

Figure 14.1: Simulated Stages of Termite Roofed Gallery Construction [86]

Figure 14.2: Common Example of a Termite Roofed Gallery Construction [86]


Applying Stigmergy in Human Life

The stigmergy concept has led to simple algorithms that have been applied to combinatorial optimization problems, including routing in communication networks, shortest path problems and material flow on factory floors [66]. While these applications can help make transportation routes more efficient, other considerations must also be taken into account, such as congestion and bottlenecking. This problem applies to user traffic on the internet as well as to motor vehicle traffic. Ant Colony Optimization (ACO) based algorithms have also begun to be used in multi-objective optimization problems. In recent years video game designers have started to use ACO based algorithms to manage artificial intelligence in their games. In particular, these algorithms are used to manage functions like multi-agent patrolling systems. For such a function to run properly, all agents must coordinate their actions so that they visit the most relevant areas as frequently and as efficiently as possible. Similar algorithms have been used in robotics, computer network management and vehicle routing. There have also been recent breakthroughs in using ACO to create artificial 3D terrain objects for video games, as well as for engineering, simulations, training environments, movies and artistic applications. ACO algorithms can also be very useful for site layout optimization, whether applied to the construction of highways and parking garages or to manufacturing plants and ports. Any construction site layout that is concerned with the positioning and timing of the temporary objects, machines and facilities used to carry out the construction process, as well as with the permanent objects being created and their ultimate purpose, stands to benefit from the solutions of these problems. Ultimately these ACO based algorithms should enhance the efficiency of these sites while decreasing their cost. While ACO research is still very young and fairly scarce, the number of possible applications seems to be very large. Fields as varied as science, mathematics, engineering, computer design, finance, economics and electronics could benefit from the application of ACO.

14.2 ant colony optimization

In 1992, Marco Dorigo proposed a metaheuristic approach called Ant Colony Optimization (ACO). Dorigo was inspired by the foraging behavior of ants, which enables them to find the shortest paths from food sources to their nest. Each ant chooses its path probabilistically, favoring the path with the highest pheromone concentration. The ACO algorithms that Dorigo proposed are based on a probabilistic model, the pheromone model, which is used to represent the chemical pheromone trails [22]. The pheromone model uses artificial ants that construct solutions incrementally by adding solution components to a partial solution under consideration. The artificial ants perform randomized walks on a graph G = (C, L), called a construction graph, whose vertices are the solution components C and whose edges are the connections L [22]. The model is applied to a particular combinatorial optimization problem by building the constraints of that problem into the artificial ants’ constructive procedure, so that in every incremental step of the solution construction only feasible solution components can be added. Depending on the problem, the use of a pheromone trail parameter $T_i$ is beneficial. The set of all pheromone trail parameters is labeled T. These parameters allow the artificial ants to make probabilistic decisions about which direction to move on the graph [10].

14.3 optimization by hand

The difficulty of an optimization problem grows with the complexity of the problem itself. The more variables the solution relies on, the more difficult the problem becomes, and the growth is often exponential. To drive home this notion we will attempt a simple two-variable optimization problem that can easily be solved by hand.

example 14.1:
Problem 1: Find two positive numbers whose sum is 12 such that the product of one number and the square of the other is a maximum.
Solution: From the statement of the problem we have
$$x + y = 12. \tag{14.1}$$
Rearranging for y gives
$$y = 12 - x. \tag{14.2}$$
Our goal is to maximize the product
$$P = x y^2, \tag{14.3}$$
which, after substituting our expression for y, becomes
$$P = x(12 - x)^2. \tag{14.4}$$


sam boone: analytical investigation of the optimal traffic organization of socia

We then take the first derivative of P with respect to x,
$$P' = \frac{d}{dx}\left[x(12 - x)^2\right] = -2x(12 - x) + (12 - x)^2, \tag{14.5}$$
which simplifies to
$$P' = (12 - x)(-3)(x - 4). \tag{14.6}$$
Therefore P' equals 0 when x = 4 or x = 12. We then calculate the value of the product at these points and, for comparison, at x = 0 and x = 8:
$$P(x = 0, y = 12) = 0, \tag{14.7}$$
$$P(x = 4, y = 8) = 256, \tag{14.8}$$
$$P(x = 8, y = 4) = 128, \tag{14.9}$$
$$P(x = 12, y = 0) = 0. \tag{14.10}$$
Finally, we are left with our solution:
$$x = 4, \quad y = 8, \tag{14.11}$$
$$P = 256. \tag{14.12}$$
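The result of Example 14.1 can be checked numerically. The sketch below is not part of the original example: it simply scans x over a fine grid (the step of 0.001 is an arbitrary choice) and reports the value that gives the largest product.

```python
# Brute-force check of Example 14.1: maximize P = x * y**2 subject to x + y = 12.

def product(x: float) -> float:
    """P(x) = x * (12 - x)**2, obtained by substituting y = 12 - x."""
    return x * (12.0 - x) ** 2

# Scan the open interval (0, 12) on a uniform grid. STEPS is an arbitrary
# resolution chosen for illustration, not a value taken from the text.
STEPS = 12000
candidates = [i * 12.0 / STEPS for i in range(1, STEPS)]

best_x = max(candidates, key=product)   # grid point with the largest product
best_y = 12.0 - best_x
best_p = product(best_x)

print(f"x = {best_x:.3f}, y = {best_y:.3f}, P = {best_p:.3f}")
```

A grid scan like this only works because there is a single free variable; the number of grid points grows exponentially with the number of variables, which is the difficulty the text describes.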


While this simple optimization problem is easily solved by hand, a more complex problem with hundreds of variables would be extremely difficult to solve this way, and the time needed to calculate the solution would be enormous. It is for these complex optimization problems that ACO based algorithms are most helpful.

14.4 applying aco meta-heuristic to the traveling salesman problem

One of the famous optimization problems to which the Ant Colony Optimization meta-heuristic has been applied is the Traveling Salesman Problem, which asks for the shortest closed path that visits every city in a given set. While this algorithm, or others of the same nature, could be adapted to variants of the problem, we will concern ourselves with a Traveling Salesman Problem in which there exists a path (edge) between every pair of cities.

14.4.1 Applicable Ant Behaviors Observed in Nature

When a colony of ants forages for food, the ants initially leave their nest and head for the foraging area in which the food lies. Let us assume that there is a natural path from the nest to the foraging area which twice forks and reconnects. There are therefore four possible paths along which the ants can travel to the food. At first, there is no information on these paths that could tell the ants which one is the shortest, so the ants choose a path at random. Naturally, the ants that arrive at the food first are the ones that chose the shortest path. These ants then take the food that they gather and head back to their nest, backtracking along the path that they took to the foraging area. Since these ants are the first to reach the food and the first to head back to the nest, their chosen path builds up pheromone more quickly than the paths chosen by their counterparts. Very quickly, the shortest path comes to possess by far the highest level of pheromone, and as a result the vast majority of the colony begins to use it. The remarkable aspect of this natural problem solving is that the physical properties of the paths or obstacles in question intrinsically hold the solution to the problem.

Figure 14.3: Double bridge experiment. (a) The ants start to randomly choose paths to foraging area. (b) The majority of the ants eventually choose the shortest path (Graph) The distribution of the percentage of ants that chose the shortest path in the double bridge experiment [24]
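The reinforcement described above can be illustrated with a very small model. The sketch below is an assumption-laden simplification, not the experiment of [24]: it uses only two branches, tracks the expected (average) traffic rather than individual ants, assumes that ants choose a branch with probability proportional to its pheromone level, and assumes that pheromone accumulates on a branch at a rate inversely proportional to its length. The function name and all parameter values are hypothetical.

```python
# Mean-field sketch of the double bridge: two branches, no individual ants.

def double_bridge(short_len=1.0, long_len=2.0, steps=1000):
    """Return the expected fraction of ants on the short branch at each step."""
    tau_short = tau_long = 1.0   # equal initial pheromone: no prior information
    history = []
    for _ in range(steps):
        # Choice probability is proportional to the pheromone on each branch.
        p_short = tau_short / (tau_short + tau_long)
        history.append(p_short)
        # Expected deposit per step: traffic on the branch times a deposit
        # rate taken to be inversely proportional to the branch length.
        tau_short += p_short / short_len
        tau_long += (1.0 - p_short) / long_len
    return history

history = double_bridge()
print(f"initial fraction on short branch: {history[0]:.3f}")
print(f"final fraction on short branch:   {history[-1]:.3f}")
```

In this model the fraction of traffic on the short branch starts at one half and grows at every step, which mirrors the qualitative behavior described in the text.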


14.4.2 Approach to the Traveling Salesman Problem

While we will use Marco Dorigo's artificial ants to solve our optimization problem, we must recall that all of our artificial ants act according to behaviors that have been observed in real ants. For this problem we will use three such ideas [23]:
1. Preference for paths rich in pheromone.
2. Increased rate of growth of pheromone levels on shorter paths.
3. Communication between ants mediated by alterations of the environment, that is, of the trail.
Our artificial ants will move from city to city on the Traveling Salesman Problem (TSP) graph based on a probabilistic function that incorporates both heuristic values that depend on the length of edges (the length of the trails between cities) and the trail accumulated on edges [23]. The artificial ants will prefer cities that are connected by edges rich in pheromone trail (in our case every pair of cities is connected by an edge). At every time step in which an ant travels to another city, it modifies the pheromone trail on the edge being used. Dorigo calls this local trail updating. Initially we place m artificial ants at randomly selected cities on our TSP graph. Once all of our ants have completed a tour, the ant that made the shortest journey modifies the edges used in its tour by adding an amount of pheromone that is inversely proportional to the tour length. This step is dubbed global trail updating.

14.4.3 Comparison Between Artificial Ants and Real Ants

As we stated earlier, the artificial ants that we are using act according to observed behaviors of real ants; however, they have been given some capabilities that real ants do not possess. All of these differences are adaptations needed for the artificial ants to be able to discover solutions to optimization problems in their new environment (the TSP graph). The fundamental difference between the artificial ants used in the ACO algorithm and the biological ants on which their design is based lies in the way that they modify the trails they travel. Whereas the ants observed in nature deposit pheromones on the surface of their environment, the artificial ants modify their surroundings by changing numerical information that is stored locally in the problem's state [24]. This information includes the ant's current history and performance and can be read and altered by ants that later visit that region. It is called the artificial pheromone trail, and it is the only means of communication between the individual members of the colony. Most ACO algorithms include an evaporation mechanism, much like real pheromone evaporation, that weakens pheromone information over time. This allows the ant colony to gradually forget its past history so that it can modify its direction without being hindered too much by its past decisions [24]. There are a few traits that the artificial ants possess and real ants do not [24]:
1. Each artificial ant possesses an internal state that contains the memory of its past actions.
2. The artificial ants deposit an amount of pheromone that correlates with the quality of the solution.
3. The timing with which the artificial ants lay their pheromone does not reflect any pattern of actual ants. For example, in our case of the Traveling Salesman Problem, the artificial ants update the pheromone trails after they have found the best solution.
4. The artificial ants in ACO algorithms can be given extra capabilities by design, such as lookahead, backtracking and local optimization, that make the overall system much more efficient.

14.5 solution to the traveling salesman problem using an aco algorithm

In the Ant Colony System we have designed, an artificial ant k located in a city r chooses a city s to travel to. This city s is chosen from among those that are not part of that ant's working memory $M_k$. The ant chooses the next city s using the rule
$$s = \arg\max_{u \notin M_k} \left\{ [\tau(r,u)] \cdot [\eta(r,u)]^{\beta} \right\} \tag{14.13}$$
if $q \le q_0$. Here "argmax" stands for the "argument of the maximum": it is the value of u at which $[\tau(r,u)] \cdot [\eta(r,u)]^{\beta}$ is at a maximum. Otherwise,
$$s = S, \tag{14.14}$$
where $\tau(r,u)$ is the amount of pheromone on the edge (r, u), $\eta(r,u)$ is a heuristic function chosen to be the inverse of the distance between cities r and u, $\beta$ is a parameter that weighs the relative importance of the pheromone level and the closeness of the cities, q is a random value drawn uniformly from [0, 1], $q_0$ is a parameter with $0 \le q_0 \le 1$, and S is a random variable selected according to a probability distribution that favors edges that are shorter and have higher build-ups of pheromone [23]. The random variable S is chosen according to the probability distribution
$$p_k(r,s) = \frac{[\tau(r,s)] \cdot [\eta(r,s)]^{\beta}}{\sum_{u \notin M_k} [\tau(r,u)] \cdot [\eta(r,u)]^{\beta}} \tag{14.15}$$
if $s \notin M_k$. Otherwise,
$$p_k(r,s) = 0, \tag{14.16}$$

where $p_k(r,s)$ is the probability that an ant k chooses to move from city r to city s [23]. Once all of the artificial ants have completed their respective tours, the ant with the shortest tour goes back and lays pheromone along the edges it used. The amount of pheromone $\Delta\tau(r,s)$ that the ant deposits is inversely proportional to the length of its tour. The global trail updating formula is
$$\tau(r,s) \leftarrow (1 - \alpha) \cdot \tau(r,s) + \alpha \cdot \Delta\tau(r,s), \tag{14.17}$$
where $\alpha$ is a parameter between 0 and 1 and $\Delta\tau(r,s) = 1/(\text{length of the shortest tour})$. Global trail updating ensures that the better solutions receive a higher reinforcement. Local trail updating, on the other hand, ensures that not all of the ants choose the same very strong edge. This is achieved by applying the local trail updating formula
$$\tau(r,s) \leftarrow (1 - \alpha) \cdot \tau(r,s) + \alpha \cdot \tau_0, \tag{14.18}$$

where $\tau_0$ is simply a parameter. The beauty of this reinforcement learning system is that the colony's behavior is governed by two pairs of formulas, (14.13) and (14.14) as well as (14.15) and (14.16). The first pair allows an ant to exploit, with probability $q_0$, the colony's accumulated experience in the form of the pheromone that has built up on edges belonging to short tours. The second pair allows the ant to explore, with probability $(1 - q_0)$, in a way that is biased towards short edges with high trail levels. In this exploration, new cities are chosen randomly according to a probability distribution that is a function of the heuristic function, the accumulated pheromone levels and the working memory $M_k$.
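The rules of this section can be put together in a short program. What follows is a minimal sketch under stated assumptions, not the implementation of [23]: the function names, the default parameter values (number of ants, number of iterations, beta, q0, alpha, tau0) and the four-city test case are all hypothetical choices made for illustration. Two simplifications are also assumed: the ants build their tours one after another rather than in parallel, and the global update is applied to the best tour found so far.

```python
import math
import random

def tour_length(tour, dist):
    """Length of the closed tour, including the return to the first city."""
    n = len(tour)
    return sum(dist[tour[i]][tour[(i + 1) % n]] for i in range(n))

def choose_next(r, unvisited, tau, dist, beta, q0, rng):
    """Pick the next city: exploit with probability q0, otherwise explore."""
    # Weight of each candidate u: tau(r,u) * eta(r,u)**beta, eta = 1/distance.
    weights = [tau[r][u] * (1.0 / dist[r][u]) ** beta for u in unvisited]
    if rng.random() <= q0:
        # Exploitation: the candidate with the largest weight (argmax).
        return unvisited[weights.index(max(weights))]
    # Exploration: sample a candidate with probability proportional to weight.
    return rng.choices(unvisited, weights=weights, k=1)[0]

def ant_colony_system(dist, n_ants=10, n_iters=100, beta=2.0, q0=0.9,
                      alpha=0.1, tau0=0.01, seed=0):
    """Return (best_tour, best_length). All defaults are illustrative guesses."""
    rng = random.Random(seed)
    n = len(dist)
    tau = [[tau0] * n for _ in range(n)]      # pheromone on every edge
    best_tour, best_len = None, math.inf
    for _ in range(n_iters):
        for _ in range(n_ants):
            start = rng.randrange(n)          # ants start at random cities
            tour = [start]                    # working memory of this ant
            unvisited = [c for c in range(n) if c != start]
            while unvisited:
                r = tour[-1]
                s = choose_next(r, unvisited, tau, dist, beta, q0, rng)
                # Local trail updating on the edge just used (symmetric graph).
                tau[r][s] = tau[s][r] = (1 - alpha) * tau[r][s] + alpha * tau0
                tour.append(s)
                unvisited.remove(s)
            length = tour_length(tour, dist)
            if length < best_len:
                best_tour, best_len = tour, length
        # Global trail updating on the edges of the best tour found so far,
        # with a deposit inversely proportional to that tour's length.
        for i in range(n):
            r, s = best_tour[i], best_tour[(i + 1) % n]
            tau[r][s] = tau[s][r] = (1 - alpha) * tau[r][s] + alpha / best_len
    return best_tour, best_len

# Hypothetical test case: four cities on the corners of a unit square.
cities = [(0, 0), (0, 1), (1, 1), (1, 0)]
dist = [[math.dist(a, b) for b in cities] for a in cities]
best_tour, best_len = ant_colony_system(dist)
print(best_tour, best_len)
```

The list `tour` plays the role of the working memory, and `unvisited` holds the cities that are not yet in it. Setting `q0` closer to 1 makes the ants follow the accumulated pheromone more greedily, while smaller values give more exploration.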



1. Homework Problem 1: An open rectangular box with a square base is to be made from $96\ \text{ft}^2$ of material. Find the dimensions that will allow the box to have the maximum possible volume.
Solution: The surface area of the box is equal to the sum of the areas of the base and the four sides. This must equal $96\ \text{ft}^2$,
$$A = 96 = x^2 + 4xy. \tag{14.19}$$
Rearranging and isolating y gives
$$y = \frac{96 - x^2}{4x}. \tag{14.20}$$
Simplifying this further,
$$y = \frac{24}{x} - \frac{1}{4}x. \tag{14.21}$$
The expression for the volume of our box is
$$V = x^2 y. \tag{14.22}$$
Substituting in our expression for y, we are left with
$$V = 24x - \frac{1}{4}x^3. \tag{14.23}$$
Now we differentiate with respect to x,
$$V' = 24 - \frac{3}{4}x^2, \tag{14.24}$$
$$V' = \frac{3}{4}(32 - x^2). \tag{14.25}$$
Therefore, when V' = 0, x must be equal to $\pm\sqrt{32}$. Since x is a measurement it must be positive. Therefore,
$$x = \sqrt{32}\ \text{ft} = 5.66\ \text{ft}, \quad y = 2.83\ \text{ft}. \tag{14.26}$$
The volume of the box is then $V = 90.5\ \text{ft}^3$.
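The box problem can also be checked numerically. The sketch below uses a ternary search, a technique not mentioned in the text, which applies here because the volume rises to a single maximum and then falls on the allowed range of x. The iteration count of 200 is an arbitrary choice.

```python
# Ternary search for the maximum of V(x) = 24x - x**3 / 4 (open box, 96 ft^2).

def volume(x: float) -> float:
    """Volume of the box as a function of the base side x (in feet)."""
    return 24.0 * x - x ** 3 / 4.0

# y = (96 - x**2) / (4x) must be positive, so x lies between 0 and sqrt(96).
lo, hi = 0.0, 96.0 ** 0.5
for _ in range(200):                 # arbitrary number of refinement steps
    m1 = lo + (hi - lo) / 3.0
    m2 = hi - (hi - lo) / 3.0
    if volume(m1) < volume(m2):
        lo = m1                      # the maximum lies to the right of m1
    else:
        hi = m2                      # the maximum lies to the left of m2

x = (lo + hi) / 2.0
y = 24.0 / x - x / 4.0
print(f"x = {x:.2f} ft, y = {y:.2f} ft, V = {volume(x):.2f} ft^3")
```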



2. Homework Problem 2: In this chapter the only specific ACO algorithm that we discussed was the one designed for the Traveling Salesman Problem. We did, however, mention that ACO algorithms have been used for many different optimization problems in many different fields. In general, what aspects of the ACO algorithm would need to be adjusted for the algorithm to be able to find a solution to a different problem?
Solution: The major aspects of an ACO algorithm that determine its problem solving abilities are the probabilistic biases that the artificial ants' movements obey, the way in which the ants lay pheromone and reinforce quality solutions, the movement constraints imposed on the ants by the problem's intrinsic design, and whether or not the artificial ants possess extra, problem-specific abilities that assist them in their search for the solution.



3. Exam Question 1: The process in which an artificial ant in an ACO algorithm for the Traveling Salesman Problem retraces its steps after every ant has completed its respective tour and adds pheromone to the edges it used is called:
a = local trail updating
b = the pheromone model
c = stigmergic building
d = global trail updating
Solution: d = global trail updating
4. Exam Question 2: Which of the following traits is not a difference between the artificial ants of the ACO algorithm and real ants?
a = The amount of pheromone that is deposited by a single ant correlates to the quality of the solution
b = A preference to choose trails that are rich in pheromone
c = The possession of an internal state that contains the memory of the owner's past actions
d = The way in which the ants alter their environment to communicate with the rest of the colony
Solution: b = A preference to choose trails that are rich in pheromone





christopher kerrigan: the physics of social insects

Physics as a whole is an area in which precise measurement and observation are integral to the expansion of the field. It is perhaps for this reason that the study of living systems under the framework of physics is an oft-overlooked application. This chapter therefore attempts to approach one of the groups of living systems most closely related to preexisting physical concepts and theories: the social insects. But just what are social insects? This question is best answered by taking into account the concept of eusociality, but since this involves at least some digression into the world of sociology, we will leave it to be explored at will by the reader. For our purposes, the physical aspects of the social insects are most important. We have a group of hundreds to thousands of insects of the same species who, individually, show almost no signs of intelligence or survival skills but who function as a group. The physical dynamics of this group in its entirety shall be our focus. It is the author's experience that nature is a thing of patterns, and this observation will be the basis for the structure of this chapter. We shall apply to the dynamics of social insects the concepts and theories derived from a more well-explored avenue of physics, namely electromagnetism. Note: This chapter serves a dual purpose, not only as a method of studying the dynamics of social insects, but also as an example of how to approach a concept with "the physicist's mind". That is, the chapter explains how to expand the ideas of physics to cover new areas.

15.2 the electron and the ant

In comparing social insect systems with electromagnetic ones, the first logical correlation to draw is between the fundamental constituents of the systems. In social insect systems, this is the individual insect (which shall hereafter be referred to by the synecdoche "ant" to save time). In electromagnetism, this is the electron. The second logical correlation is between the systems as a whole. For social insects, this is the colony. For electromagnetism, our analogy will be the circuit. So what does a circuit do? We know that individual electrons moving through a conducting material create a current. This current is used to power various components in the circuit which achieve some goal. Likewise, ants move in a constrained group in some sort of cycle to achieve some goal, namely the survival of the group. We can say with confidence that the movement of the ants is indeed cyclical simply because they are members of the colony. If they leave the center of the colony (to fulfill some goal for the group), they will return at some time (probably having fulfilled said goal). We can define the center of the colony as the place where more ants are produced. This has an obvious analogy to the battery (or EMF) in an electric circuit. Also, we can safely say that the constrained group in which the ants move is probably a single-file line, because of the width of ant tunnels (and from general observation). Knowing that the processes of both systems occur in a similar fashion, perhaps we can use the equations of electric circuits to describe the motion of the ants.

15.3 insect current

The current in a circuit is any motion of charge from one region to another, and can be described as the net charge flowing through some cross-sectional area per unit time. We should be able to define the insect current in the same manner, replacing the charge with the ant. This current is the number of ants crossing some width per unit time (assuming the ants move along a roughly two-dimensional surface, the ground, and do not walk on top of each other),
$$I = \frac{N}{t}, \tag{15.1}$$
where N is the number of ants. With this simple comparison, we can draw many conclusions about the motion of a group of social insects.

example 15.1: the ant hill
A group of one hundred ants is moving in a single-file line into the ant hill, whose entrance is just wide enough for one ant. If the ants travel at 0.3 meters per second and each ant is 10 mm long, what is their current through the ant hill?
Answer: Our line of ants is 10 mm × 100 = 1 m long. A line of this length moving at the given rate takes (1 m)/(0.3 m/s) = 10/3 seconds to pass through the entrance. Knowing both the time and the number of ants, we can easily calculate the current: I = 100/(10/3) = 30 ants per second.
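The calculation in this example can be written as a small function. This is only a sketch: the function name is hypothetical, and it assumes, as the example does, that the ants walk nose to tail with no gaps between them.

```python
def insect_current(n_ants: int, ant_length_m: float, speed_m_per_s: float) -> float:
    """Ants per second passing a point, for a gapless single-file line."""
    line_length = n_ants * ant_length_m           # total length of the line (m)
    transit_time = line_length / speed_m_per_s    # time for the line to pass (s)
    return n_ants / transit_time                  # I = N / t

# Values from the example: 100 ants, each 10 mm long, moving at 0.3 m/s.
current = insect_current(100, 0.010, 0.3)
print(f"I = {current:.1f} ants per second")
```

Under the gapless assumption the number of ants cancels out, and the current reduces to the speed divided by the length of one ant.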


15.4 insect diagram

The destinations of social insect groups can often be defined in terms of the provisions that the group needs to survive. The most important factor for the survival of our ant colony is its food, so we can interpret the possible destinations of the ants as locations with food to bring back to the center. We are now in a position to chart the movement of the colony. We know that the ants come from a source and move in lines toward a source of sustenance, which they collect and bring back to the colony center. This is analogous to the circuit, whose electrons come from a battery and travel through a conductor along various pathways which, by the definition of a complete electric circuit, must lead them back to the battery. Since this process can be charted by way of the circuit diagram, we can chart the progression of the ants in a similar manner. Let us first examine the well-known structure of the circuit diagram by referring to Fig. 15.1. We can see that the diagram depicts a battery and two resistors in parallel. If we imagine the battery as being the center of the ant colony and the resistors as being sources of food, we have what we shall refer to as an insect diagram. Ants leave the center from the "positive side," travel to one of two (for example) food sources, and return to the center on the "negative side".

Figure 15.1: Circuit

15.5 kirchhoff's rules (for ants)

We will now apply two well-known theorems to the notions that we have already established, namely those of current and diagram. The theorems to which we refer are Kirchhoff's junction rule and his loop rule. The former states that the algebraic sum of the currents into any junction is zero. In our insect terms, this means that the number of ants going into any point at which they decide which food source to travel to is equal to the total number of ants arriving at all the subsequent food sources. This should be not only true, but obvious (assuming no ant dies along the way). The loop rule states that the algebraic sum of the voltages around any closed loop of a circuit must be zero. In our terms, this may mean that there must be both an ant source and a food source in order for a path to exist. This makes logical sense, since ants do not simply go out wandering. We have therefore applied an important theorem of electrodynamics to a macroscopic and very real system.

15.6 differences

When applying a theory that works to describe some system to a totally new system, perhaps the most important relationship between the two is that of their differences. A difference that seems particularly suited to analysis is the fact that electrons travel in the direction of lower voltage, while ants seem to choose an initial path rather randomly and then perfect this path based on the movement of the returning ants and by sensing the pheromones left behind by ants who have travelled the better path. We shall explore this behavior as an exercise in making the differences clear. Assume that two ants start a journey towards food with equal probabilities of going on either of two paths (Fig. 15.2). We will say that one of the paths (the "lower" path in the diagram) is shorter, and thus it takes longer for the ant not on this path to get to the food (Fig. 15.3). Following this logic, the ant who has taken the shorter path will return to the nest faster (Fig. 15.4). This means that just outside the nest, the pheromone density is twice as high on the shorter path (because it has been passed twice, whereas the longer path has only been passed once until the other ant makes it back to the nest). Other ants leaving the nest, then, will opt for the shorter path because of the higher pheromone density. Over many iterations, the ants further reinforce the shorter path, and after a while this will be the only path used (Fig. 15.5).

Figure 15.2: Ants

Figure 15.3: Ants

15.7 the real element

We have managed to describe at least some of the motion of social insects in a quantitative manner, which was our initial goal. This method can be used to study the social insects in their environment, but it is important to remember the constraints and assumptions that we may have taken for granted. The behavior of living beings is extremely complicated to analyze, and it is therefore vital that we take our limitations into account. We can consider, for example, experiments in which ants that are put into an induced state of panic and given two doors through which to escape the "room" they are in do not leave in a symmetric manner, as a physical analysis might predict. They instead show a tendency to "follow the leader," which is a complex phenomenon far beyond the reach of this text and perhaps even of biophysics as a whole. This situation is intended to illustrate the fact that physical representations of systems are just that: representations, and should be applied as such.

Figure 15.4: Ants

Figure 15.5: Ants

15.8 problems

1. Calculate the probability that one of many ants is at one of n food sources. The source of the ants and the food sources are all on a line with a distance l between each.
Solution: We know that the nth food source is a distance n × l from the ant source. If we assume that the probability of an ant going in either direction at a food source is 1/2, we see that for a food source F(n)
$$P[F(1)] = \frac{1}{2}, \tag{15.2}$$
$$P[F(2)] = \frac{1}{2}\left(\frac{1}{2}\right), \tag{15.3}$$
$$P[F(n)] = \left(\frac{1}{2}\right)^n. \tag{15.4}$$


2. Suppose the distance between two points a and b is $d_{ab}$ and $C_{ab}$ is the pheromone density along this line. What is the probability for a single ant to go
Solution: We can imagine that the probability of an ant moving in a direction is directly proportional to the pheromone density in that direction. Since density along a line is inversely proportional to the length of the line (whereas in three dimensions it is inversely proportional to the volume), we can say that
$$P = \frac{C_{ab}}{d_{ab}}. \tag{15.5}$$

Part IV WHAT'S OUT THERE





Neptune was first officially discovered on September 23, 1846. While the discovery of any new planet is normally viewed by society as an important event, Neptune's discovery stands out in the scientific community as well because of the manner in which it was found. Unlike the other seven planets in our solar system, Neptune was not first directly observed; rather, its existence was theorized from the effects of its gravity, in accordance with Newton's laws.



16.2 newton's law of universal gravitation

Before we can look into exactly how astronomers found Neptune, we must first understand the laws that govern celestial bodies. The interaction of any two masses was described by Newton with his Law of Universal Gravitation, which states that every point mass acts on every other point mass with an attractive force along the line connecting the points. This force is proportional to the product of the two masses and inversely proportional to the square of the distance between the points,
$$F = G\frac{m_1 m_2}{r^2}, \tag{16.1}$$
where the force is in newtons and G is the gravitational constant, equal to $6.674 \times 10^{-11}\ \text{N m}^2\ \text{kg}^{-2}$ [101].

example 16.1: celestial gravitational attraction
Using Newton's law we can find the forces acting upon Uranus due to both the Sun and Neptune when Neptune and Uranus are at their closest:
Orbital radius of Neptune: $R_n = 4.50 \times 10^{12}$ m
Orbital radius of Uranus: $R_u = 2.87 \times 10^{12}$ m
$R_{nu} = R_n - R_u = 1.63 \times 10^{12}$ m
Mass of Sun: $M_s = 1.99 \times 10^{30}$ kg
Mass of Neptune: $M_n = 1.02 \times 10^{26}$ kg
Mass of Uranus: $M_u = 8.68 \times 10^{25}$ kg
$$F_{s-u} = G\frac{M_s M_u}{(R_u)^2} = 1.399 \times 10^{21}\ \text{N}$$
$$F_{n-u} = G\frac{M_n M_u}{(R_{nu})^2} = 2.224 \times 10^{17}\ \text{N}$$
So the force acting on Uranus due to the Sun's gravity is about 4 orders of magnitude larger than the force due to Neptune's gravity.
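The two forces in this example can be reproduced with a few lines of code. The sketch below uses only the values quoted in the example; the function and variable names are hypothetical.

```python
G = 6.674e-11           # gravitational constant, N m^2 kg^-2

def gravitational_force(m1: float, m2: float, r: float) -> float:
    """Magnitude of the attraction between two point masses, F = G m1 m2 / r^2."""
    return G * m1 * m2 / r ** 2

# Values quoted in the example (SI units).
R_N = 4.50e12           # orbital radius of Neptune, m
R_U = 2.87e12           # orbital radius of Uranus, m
M_S = 1.99e30           # mass of the Sun, kg
M_N = 1.02e26           # mass of Neptune, kg
M_U = 8.68e25           # mass of Uranus, kg

f_sun_uranus = gravitational_force(M_S, M_U, R_U)
f_neptune_uranus = gravitational_force(M_N, M_U, R_N - R_U)

print(f"F_s-u = {f_sun_uranus:.3e} N")
print(f"F_n-u = {f_neptune_uranus:.3e} N")
print(f"ratio = {f_sun_uranus / f_neptune_uranus:.0f}")
```

The ratio printed on the last line makes the comparison between the two forces explicit.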



robert deegan: the discovery of neptune

Figure 16.1: Predicted and Actual Orbits of Uranus and Neptune. Grey = Uranus’ predicted position, light blue = Uranus’ actual position, dark blue = Neptune’s actual position, yellow = Le Verrier’s prediction, green = Adams’ prediction [94]

Given that one object is much more massive than the other, this attractive force will cause the less massive object to accelerate towards the more massive object while having little effect on the latter. However, if the less massive object is not at rest but rather has some initial velocity and thus an initial angular momentum, then this force will not cause the lesser mass to accelerate directly towards the larger mass but rather cause it to orbit the larger object, because the angular momentum of the system must be conserved. This is the reason the planets in our solar system orbit the Sun, which has a relatively large mass. However, the orbits of the planets in our solar system are not affected by the Sun alone: in most cases the gravitational fields of adjacent planets are also strong enough to influence each other's orbits, as we shall see momentarily.

16.3 adams and le verrier

Shortly after Uranus was discovered in 1781, its predicted orbit was calculated using Newton's laws of motion and gravitation. However, by looking at data for the position of Uranus from 1754-1830 (some observations of the planet were made prior to the discovery that it was in fact a planet), and specifically over the period from 1818-1826, it was noticed that there were discrepancies between Uranus' predicted and observed orbits. Tables predicting Uranus' orbit had been made shortly after its discovery, and comparison with these tables soon revealed discrepancies too large to be accounted for by observational error or by the effects of Saturn and Jupiter on Uranus. As these discrepancies grew larger in the early 1800s they became more and more troubling, and numerous theories were postulated to account for them, one being that there was some unknown body perturbing the orbit. This theory appealed to two men, Urbain Le Verrier and John Couch Adams. Both Adams and Le Verrier set out independently to investigate it.



Observations had shown not only that Uranus was not at the correct position in its orbit at a given time, but also that its distance from the sun was at some points substantially larger than predicted. To some, this evidence further supported the hypothesis of an unknown planet, and the increased radius vector showed that this mysterious planet must lie outside Uranus' orbit. This is the assumption Adams made as he began his investigation, but he quickly ran into trouble. The typical way of determining a perturbing body's effect on an orbit was to calculate the gravitational effects of a body of known mass and position on another planet, and to subtract these from the observed orbit. One could then see the "true" orbit of the planet, as it would be without the perturbing body; adding back the effects of the perturbing planet gives the actual orbit followed by the planet, whose location can thus be predicted with great precision. In this case, however, the characteristics of the perturbing body were unknown, which made it impossible to calculate the "true" elliptical orbit, as this required knowing the perturbative effects. These effects in turn could only be found once one already knew the "true" orbit, so Adams' approach was to solve both problems simultaneously. Using this approach Adams soon ran into another problem: the perturbation caused by a more massive planet farther from Uranus would be indistinguishable from the effect of a less massive planet closer to Uranus. To avoid this issue Adams simply guessed that the average distance of the perturbing planet from the sun was twice that of Uranus, planning to come back and adjust this assumption after he had completed his calculations should it prove necessary.
From this point on Adams simply needed to write out a series of equations relating the predicted position of Uranus at each date to the position at which it was actually found. Adams wrote 21 such equations, one for every third year for which he had observational data, and solved them one at a time. Alone, none of these was enough to give the characteristics of the perturbing planet, but each narrowed down the possibilities further, until after all 21 equations had been solved Adams had a very accurate model describing the characteristics of the new planet. Adams then calculated the effect this new planet would have on Uranus' orbit and found that it accounted for all of the discrepancies between Uranus' actual and predicted orbits. Finally, Adams calculated the longitude at which he expected the new planet to be found and gave this data to the Astronomer Royal in order to confirm the existence of the new planet there, and indeed the new planet was observed to be within a few arcseconds of Adams' prediction [77]. At the same time in France, Le Verrier was going through the same processes and calculations and came to an almost identical result, by which astronomers in France independently discovered this new planet at almost exactly the same time as this was being done in England.

16.4 perturbation

It is important to examine how exactly Neptune's gravitational field perturbed the orbit of Uranus. First of all, the gravitational interaction between Neptune and Uranus pulled Uranus farther from the Sun and Neptune closer to it. As a result of this increased radius at certain points in Uranus' orbit, the average distance of Uranus from the Sun was increased. By Kepler's third law,
$$P^2 = \frac{4\pi^2 a^3}{G(M + m)},$$
we know that as a result the period of revolution of Uranus was increased slightly, and the opposite was true for Neptune [19]. However, neither of these was the most drastic or noticeable effect. Since Uranus is closer to the Sun than Neptune, it traverses its orbit faster than Neptune, and so there were points where Neptune was slightly in front of Uranus and points where it was slightly behind it. In the first situation Neptune's gravitational attraction pulled Uranus towards it, effectively speeding up Uranus' motion through its orbit and pulling it ahead of the predicted location. The opposite was true when Neptune was behind Uranus: it then pulled Uranus back and slowed its motion. This effect is what caused the discrepancies in Uranus' longitude that astronomers noticed. As Neptune and Uranus got closer together in the early 1800s the effect became more pronounced, since the gravitational force between them was increasing, and this is what caused astronomers to finally throw out their existing predictions of Uranus' orbit and try to determine what was causing these discrepancies.

Figure 16.2: Gravitational Perturbation of Uranus by Neptune [18]

16.5 methods and modern approaches

The method Adams used to predict the characteristics of Neptune was an incredibly tedious one, and at the time was thought to be the only possible way to solve the problem. Since then, however, another possible way to solve this problem has been postulated. This process involves looking at the problem as a three-body system, in which we examine the mutual gravitational interactions of the three bodies involved: the Sun, Neptune, and Uranus. This problem is certainly not a trivial one, and though numerous particular solutions have been found there is still no general solution, and many believe such a solution is actually impossible as this problem involves chaotic behavior[21]. An attempt at a particular solution for the case of the Neptune-Uranus-Sun system is beyond the level of this book, and so we shall not attempt it here. It is significant to note, though, that despite numerous advances in the field of celestial mechanics since the time of Adams and Le Verrier, the method they used is still the only practical one for solving this problem. As mentioned in this section, there are other possible ways to solve this problem, but they are far more complicated and no more accurate, so if in the present day there were a need to find the cause of some unexplained perturbations, the same method that Adams and Le Verrier followed would probably still be used (though it would be much faster and more accurate thanks to the use of modern computers).

16.6 practice questions

1. Astronomers noted that Uranus was not following its predicted elliptical orbit, but rather traversing a different ellipse. How much data is required to determine this, i.e. what is required to characterize an ellipse?

2. Assume that the gravitational pull of Neptune increased Uranus’ average distance from the Sun to $3 \times 10^{12}$ m. What would Uranus’ period then be (in years)?

16.7 answers to practice questions

1. An elliptical orbit is mathematically characterized by six numbers: the average distance between the planet and the Sun, the eccentricity of the ellipse, three angles that determine the orientation of the ellipse in space, and a point in time at which the planet is at a particular point in the orbit. The average distance of Uranus was already known to astronomers, and the eccentricity of the orbit is determined from the lengths of the semi-major and semi-minor axes. It requires two reasonably separated data points to determine these axes. Determining the orientation of the orbit in space requires three data points, as one would expect, and again some separation is required between these points. Since the points used to determine the eccentricity and the orientation of the orbit need not be separate, we can conclude that, along with knowledge of the average distance to the Sun (the semi-major axis), it requires only three reasonably separated observations to determine the orbit of a planet.

2. We can apply Kepler’s third law, taking $a = 3 \times 10^{12}$ m as Uranus’ average distance from the Sun, to find the corresponding period:

$$P^2 = \frac{4\pi^2 a^3}{G(M + m)}$$

$$a^3 = (3 \times 10^{12}\ \mathrm{m})^3 = 2.7 \times 10^{37}\ \mathrm{m^3}$$

$$G = 6.67 \times 10^{-11}\ \mathrm{m^3\,kg^{-1}\,s^{-2}}, \qquad M = 1.99 \times 10^{30}\ \mathrm{kg}, \qquad m = 1.02 \times 10^{26}\ \mathrm{kg}$$

$$P^2 = \frac{1.066 \times 10^{39}\ \mathrm{m^3}}{1.327 \times 10^{20}\ \mathrm{m^3\,s^{-2}}} = 8.03 \times 10^{18}\ \mathrm{s^2}$$

$$P = 2.83 \times 10^{9}\ \mathrm{s} = 89.8\ \text{years}$$
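The arithmetic in answer 2 can be checked with a short script. This is only a sketch: the function name `orbital_period` and the seconds-per-year conversion are assumptions made for illustration, while the constants are the ones quoted in the answer.

```python
import math

G = 6.67e-11                       # gravitational constant, m^3 kg^-1 s^-2 (as quoted above)
M_SUN = 1.99e30                    # mass of the Sun, kg (as quoted above)
M_PLANET = 1.02e26                 # planet mass m used in the answer above, kg
SECONDS_PER_YEAR = 365.25 * 86400  # assumed conversion factor


def orbital_period(a, M=M_SUN, m=M_PLANET):
    """Kepler's third law: period in seconds for a semi-major axis a in metres."""
    return math.sqrt(4 * math.pi**2 * a**3 / (G * (M + m)))


period_s = orbital_period(3e12)
period_yr = period_s / SECONDS_PER_YEAR
print(f"P = {period_s:.3g} s = {period_yr:.1f} years")
```

Since $P \propto a^{3/2}$, a small fractional change in the semi-major axis produces a fractional change in the period that is one and a half times as large.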

alex kiriakopoulos: white dwarfs




White dwarfs are class D stars in the Morgan-Keenan spectral classification. They fall under the extended spectral types and can be further differentiated into DA (H rich), DB (He I rich), DO (He II rich), DQ (C rich), DX (indeterminate), DZ (metal rich), and DC (no strong spectral lines). The additional letter indicates which spectral lines are present, and hence which elements are present in the atmosphere; these letters do not correspond to other star classes. White dwarfs are characterized by masses of about one solar mass [73], small sizes with characteristic radii of 5000 km [73], and no nuclear fusion reactions in the core. White dwarfs, neutron stars, and black holes fall under the category of compact objects. These compact objects do not burn nuclear fuel and have small radii. White dwarfs are thought to be a late evolutionary stage of less massive stars and, lacking nuclear reactions in the core, radiate their residual thermal energy slowly over time.

Sirius B, a typical white dwarf star, has a mass of about 0.75 $M_\odot$ to 0.95 $M_\odot$ and a radius of about 4700 km [73]. This corresponds to a planet-sized star with a mass nearing that of the Sun. Because of their small radii, white dwarfs have higher effective temperatures than other stars of comparable luminosity and hence appear whiter. The luminosity varies as $R^2 T^4$ for a black body [73]. Their small radii and near-solar masses make white dwarfs extremely dense stars, with mean densities of $10^6$ g/cm$^3$, about 1,000,000 times greater than that of the Sun.

All four fundamental forces are actively involved in the dynamics of these stars [73]. The tremendous densities create immense gravitational forces that pull the star together. This creates an electron degeneracy pressure that supports the star against gravitational collapse. According to the uncertainty principle, if the electrons’ positions are all well defined then they have a correspondingly high uncertainty in their momentum. Even if the temperature were zero, there would still be electrons moving about.
When the pressure due to the confinement of the matter exceeds the thermally contributed pressure, the electrons are referred to as degenerate. Black holes, however, are completely collapsed stars that had no means of creating a pressure great enough to support them against gravitational collapse.

We can simplify this problem to a particle in a box with dimensions $L_x$, $L_y$, and $L_z$, setting the potential to zero inside the box and infinite outside it. The Schrödinger equation then reads

$$-\frac{\hbar^2}{2m}\nabla^2\psi = E\psi .$$

$\psi$ factors into three functions $X(x)$, $Y(y)$, and $Z(z)$, which yields three second-order ordinary differential equations, one for each of the three functions. Solving them, setting

$$k_{x,y,z} = \frac{\sqrt{2mE_{x,y,z}}}{\hbar},$$

and taking into consideration the boundary conditions gives $k_{x,y,z} L_{x,y,z} = n_{x,y,z}\pi$. The allowed energies of this wave function are

$$E_{n_x,n_y,n_z} = \frac{\hbar^2 k^2}{2m} \qquad (17.2)$$

where $k$ is the magnitude of the wave vector $\mathbf{k}$, and each state in this system occupies a volume $\pi^3/V$ of k-space (one unit step in each of $n_x$, $n_y$, and $n_z$), with $V = L_x L_y L_z$. Since the electrons behave as identical fermions and hence are subject to the Pauli exclusion principle, only two can occupy any given state. Therefore they fill only one octant of a sphere in k-space [37],

$$\frac{1}{8}\left(\frac{4}{3}\pi k_F^3\right) = \frac{Nq}{2}\left(\frac{\pi^3}{V}\right),$$

where $k_F = (3\rho\pi^2)^{1/3}$ is the radius determined by the volume the electrons must take up, $N$ is the total number of atoms, and $q$ is the number of free electrons each atom contributes, so that $Nq$ is the total number of free electrons. Then $\rho \equiv Nq/V$ is the free electron density.

The Fermi surface here is the boundary separating the occupied and unoccupied states in k-space. The corresponding energy is called the Fermi energy, the energy of the highest occupied quantum state in this system of electrons. The Fermi energy is

$$E_F = \frac{\hbar^2}{2m}(3\rho\pi^2)^{2/3},$$

and this is the energy for a free electron gas.

17.2 the total energy

To calculate the total energy of the gas, consider a shell of thickness $dk$ which contains a volume $\frac{1}{8}(4\pi k^2)\,dk$, so that the number of electron states in that shell is $\frac{V}{\pi^2}k^2\,dk$. Each state carries energy $\hbar^2 k^2/2m$, so the energy of the shell is

$$dE = \frac{\hbar^2 k^2}{2m}\,\frac{V}{\pi^2}\,k^2\,dk .$$

Thus the total energy must be

$$E_{\text{total electron}} = \frac{\hbar^2 (3\pi^2 Nq)^{5/3}}{10\pi^2 m}\,V^{-2/3} .$$

This energy is analogous to the internal thermal energy of an ordinary gas, and it exerts a pressure on the boundaries: if the box expands by a small amount $dV$ [37], the energy changes by

$$dE = -\frac{2}{3}E\,\frac{dV}{V},$$

which resembles the work $P\,dV$ done by this electron pressure, so that

$$P = \frac{2E}{3V} = \frac{(3\pi^2)^{2/3}\hbar^2}{5m}\,\rho^{5/3} .$$
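To get a feel for the size of this pressure, the formula can be evaluated numerically. The following is a minimal sketch, not part of the original derivation: the function names are hypothetical, the mean density of $10^6$ g/cm$^3$ is the figure quoted earlier in the chapter, and the choice of one free electron for every two nucleons (as for carbon or oxygen) is an assumption. It uses the nonrelativistic expressions from the text and checks numerically that $P = 2E/3V$ agrees with the closed-form pressure.

```python
import math

HBAR = 1.0546e-34       # reduced Planck constant, J s
M_E = 9.109e-31         # electron mass, kg
M_NUCLEON = 1.6726e-27  # nucleon (proton) mass, kg


def total_electron_energy(n_electrons, volume):
    """Total energy (J) of the degenerate gas; n_electrons plays the role of N*q."""
    return (HBAR**2 * (3 * math.pi**2 * n_electrons) ** (5 / 3)
            / (10 * math.pi**2 * M_E) * volume ** (-2 / 3))


def degeneracy_pressure(rho):
    """Pressure (Pa) for a free electron density rho in electrons per m^3."""
    return (3 * math.pi**2) ** (2 / 3) * HBAR**2 / (5 * M_E) * rho ** (5 / 3)


# Assumed illustrative values: mean density 1e6 g/cm^3 = 1e9 kg/m^3,
# and one free electron for every two nucleons.
mass_density = 1e9
rho_e = 0.5 * mass_density / M_NUCLEON

volume = 1.0  # m^3, arbitrary test volume
energy = total_electron_energy(rho_e * volume, volume)
pressure = degeneracy_pressure(rho_e)

print(f"electron density:      {rho_e:.3g} m^-3")
print(f"pressure from formula: {pressure:.3g} Pa")
print(f"pressure from 2E/3V:   {2 * energy / (3 * volume):.3g} Pa")
```

The two pressure values should agree to rounding error, since the second is just the first rewritten in terms of the total energy.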


Evidently the pressure depends on the free electron density. White dwarf conditions occur when this pressure balances the gravitational pressure from the mutual attraction of the ensemble. This is the fundamental difference between white dwarfs and “normal” stars. Stars usually have cores that sustain nuclear fusion of hydrogen into helium, which counteracts the gravitational pressure. The white dwarf, however, sustains itself through the electron degeneracy pressure. This is a quantum mechanical result. It is why a cold object does not simply collapse after being continuously cooled, and it allows the white dwarf to exist.

The total energy is the sum of the total electron energy and the gravitational energy of a uniformly dense sphere of mass $M$ and radius $R$, $U = -\frac{3}{5}\frac{GM^2}{R}$, so that $E_{\text{total}} = E_{\text{total electron}} + U$. Minimizing this with respect to the radius, that is, setting $dE_{\text{total}}/dR = 0$, gives the radius

$$R = \left(\frac{9\pi}{4}\right)^{2/3}\frac{\hbar^2 q^{5/3}}{G\,m\,M_{\text{nuclei}}^2\,N^{1/3}},$$

where $m$ is the electron mass, $M_{\text{nuclei}}$ is the mass of a nucleon, $N$ is now the number of nucleons (so that $M = N M_{\text{nuclei}}$), and $q$ is the number of electrons per nucleon.
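The radius formula can be evaluated for a star of about one solar mass. This is a rough sketch under stated assumptions: the function name `white_dwarf_radius` is hypothetical, $q = 1/2$ (one electron per two nucleons) is assumed, the nucleon mass is taken to be the proton mass, and the star is treated as a uniform, nonrelativistic sphere as in the text.

```python
import math

HBAR = 1.0546e-34       # reduced Planck constant, J s
G = 6.674e-11           # gravitational constant, m^3 kg^-1 s^-2
M_E = 9.109e-31         # electron mass, kg
M_NUCLEON = 1.6726e-27  # nucleon mass, kg (proton mass assumed)
M_SUN = 1.989e30        # solar mass, kg


def white_dwarf_radius(n_nucleons, q=0.5):
    """Radius (m) that minimizes the electron plus gravitational energy."""
    prefactor = (9 * math.pi / 4) ** (2 / 3)
    return (prefactor * HBAR**2 * q ** (5 / 3)
            / (G * M_E * M_NUCLEON**2 * n_nucleons ** (1 / 3)))


n_sun = M_SUN / M_NUCLEON  # number of nucleons in one solar mass
for mass_in_suns in (0.5, 1.0, 1.4):
    radius_km = white_dwarf_radius(mass_in_suns * n_sun) / 1e3
    print(f"{mass_in_suns:.1f} solar masses -> R = {radius_km:.0f} km")
```

For one solar mass this gives a radius of roughly 7000 km, the same order of magnitude as the characteristic radii quoted at the start of the chapter, and the printed radii shrink as the mass grows because $R \propto N^{-1/3}$.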



The equation reveals a fundamental property of these compact objects: as the mass increases, the radius decreases. In this case, increasing the mass means increasing $N$, the number of nucleons. This is how objects such as black holes and neutron stars, which are compact objects like white dwarfs, can pack so much mass into a smaller and smaller radius, yielding higher densities.

Energy transport within the star occurs via conduction, because the particles’ mean free paths are increased by the high density: particles find it difficult to collide since all the lower states are filled. The coefficient of thermal conductivity becomes very large, and the interior is not far from being isothermal, making the core much hotter than the surface. A star with a surface temperature of 8000 K will have a core temperature of 5,000,000 K [80].

The star itself is composed mainly of carbon and oxygen, but the high gravitational forces separate these elements from the lighter elements, which are found on the surface. Spectroscopic techniques reveal the atmosphere to be dominated by either helium or hydrogen. In some cases, however, the atmosphere may contain other elements such as carbon and metals.

Magnetic fields in white dwarfs are thought to be due to conservation of total surface magnetic flux. A larger progenitor star generating a magnetic field at one radius will produce a much stronger magnetic field once its radius has decreased, according to conservation of magnetic flux. This explains the magnetic fields on the order of millions of gauss in white dwarf stars. The strength of the field is calculated by observing the emission of circularly polarized light.

White dwarfs, once formed, are stable and cool continuously until they can no longer emit heat or light. Once this happens, the white dwarf is referred to as a black dwarf. No black dwarfs are thought to exist yet, however, since the time required for a white dwarf to become a black dwarf is longer than the age of the universe [80].
17.3 question

Given

$$dE = \frac{\hbar^2 k^2}{2m}\,\frac{V}{\pi^2}\,k^2\,dk$$

and

$$E_{\text{total electron}} = \frac{\hbar^2 (3\pi^2 Nq)^{5/3}}{10\pi^2 m}\,V^{-2/3},$$

what is the integral of the above derivative? What are the limits of integration? What does the upper limit represent?

Answer:

$$E_{\text{total electron}} = \frac{\hbar^2 V}{2\pi^2 m}\int_0^{k_F} k^4\,dk$$

The limits are 0 to $k_F$, the radius of the occupied sphere in k-space. The k-space coordinates $k_{x,y,z}$ depend on the quantum numbers $n_{x,y,z}$ that label the energy levels.





The universe started with a big bang. The temperatures were so high in the minutes after this event that fusion reactions occurred. This resulted in the formation of elements such as hydrogen, deuterium, helium, lithium, and even small amounts of beryllium.[6] This is known as Big Bang Nucleosynthesis, and nicely explains the presence of these lighter elements in the universe. However, the brevity of this process is believed to have prevented elements heavier than beryllium from forming.[53] So what is the origin of oxygen, carbon, nitrogen, and the many other heavy elements known to man? And how do we explain the significant abundances of these elements in our solar system?



18.2 creation of heavy elements

Nuclear fusion is the process by which multiple atomic nuclei join together to form a heavier nucleus.[61] As explained before, this was widespread just after the Big Bang. The result of this process is the release of considerable amounts of energy; the resultant nucleus is smaller in mass than the sum of the original nuclei, and the difference in mass is converted into energy by Einstein’s equation, $E = mc^2$.[53] Nuclear fusion also occurs in the cores of stars and is the source of their thermal energy. In general, large stars have higher core temperatures than small stars because they experience higher internal pressures due to the effects of gravity.[53] Thus, a star’s mass determines what type of nucleosynthesis can occur in its core. In stars less massive than our sun, the dominant fusion process is proton-proton fusion. This converts hydrogen to helium. In stars with masses between one and eight solar masses (we define our sun as one solar mass), the carbon cycle fusion process takes place.[53] This converts helium into oxygen and carbon once hydrogen is depleted within the star. In very massive stars (greater than eight solar masses), carbon

Figure 18.1: Elements are produced at different depths within a star. This illustrates the elements that are produced in massive stars (not to scale). Notice the iron core.[39]



daniel rogers: supernovae and the progenitor theory

and oxygen can be further fused into neon, sodium, magnesium, sulfur and silicon. Later reactions transform these elements into calcium, iron, nickel, chromium, copper, etc. In a supernova event, neutron capture reactions lead to the formation of elements heavier than iron.[53] Thus, we see that all heavy elements are formed in the cores of stars at various points in their lives as they burn through their thermonuclear fuel. In general, the mass of a star can be used to determine what elements are formed and the abundances that are produced.

example 18.1: conversion in the sun
How much hydrogen is converted to helium each second in the sun? Use the fact that the sun’s luminosity is $3.8 \times 10^{26}$ W, and that 0.7% of the hydrogen mass becomes energy during the fusion process.

Solution: We know that the sun produces $3.8 \times 10^{26}$ W, and so $3.8 \times 10^{26}$ J are emitted each second. Now, simply use Einstein’s equation.

$$E = mc^2 \;\Rightarrow\; m = \frac{E}{c^2} = \frac{3.8 \times 10^{26}\ \mathrm{J}}{(3 \times 10^{8}\ \mathrm{m/s})^2} = 4.2 \times 10^{9}\ \mathrm{kg} \qquad (18.1)$$

This is the mass converted to energy in the sun each second. We know that this mass is only 0.7% of the mass of the hydrogen that goes into the fusion process.

$$M_H = \frac{4.2 \times 10^{9}\ \mathrm{kg}}{0.007} = 6.0 \times 10^{11}\ \mathrm{kg} \qquad (18.2)$$

So we see that the sun fuses about 600 billion kg of hydrogen each second, of which about 4 billion kg are converted into energy. The remaining 596 billion kg becomes helium.[6]
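The two steps of this example can be reproduced in a few lines. This is a minimal sketch using the rounded values from the example; the variable names are chosen here for illustration only.

```python
LUMINOSITY_SUN = 3.8e26          # W, i.e. joules emitted per second
SPEED_OF_LIGHT = 3.0e8           # m/s, rounded as in the example
MASS_TO_ENERGY_FRACTION = 0.007  # 0.7% of the fused hydrogen mass becomes energy

# Mass converted to energy each second, from E = m c^2.
mass_to_energy = LUMINOSITY_SUN / SPEED_OF_LIGHT**2

# Total hydrogen mass entering fusion each second.
hydrogen_fused = mass_to_energy / MASS_TO_ENERGY_FRACTION

# What is left over becomes helium.
helium_produced = hydrogen_fused - mass_to_energy

print(f"mass converted to energy: {mass_to_energy:.2g} kg/s")
print(f"hydrogen fused:           {hydrogen_fused:.2g} kg/s")
print(f"helium produced:          {helium_produced:.3g} kg/s")
```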


18.3 dispersal of heavy elements

A star experiences a constant struggle against collapse due to the gravitational force of its own mass. Throughout the main sequence of its life, it is able to resist gravity with thermal pressure, as the fusion of elements in its core heats up the star’s interior gas. The hot gas expands, exerting an outward pressure that balances the inward force of gravity. The life of a star ends when it is completely depleted of thermonuclear fuel, and gravity is able to overcome this outward thermal pressure.[53] The life span of a star and its final state are determined by the mass of the star. Large stars generally live shorter lives than small stars; although they have more fuel for nuclear reactions, their rate of consumption is much greater. When a relatively small star runs out of fuel, it collapses because of gravity and becomes a white dwarf. At this point, the only outward pressure is due to electron degeneracy. Since white dwarfs succumb entirely to gravity and never explode outward, the elements formed in their cores are never ejected into space.[6] Stars that are large, however, experience different effects due to greater gravitational forces during collapse. As a massive star begins to run low on hydrogen fuel, the iron it produces piles up in its core. Iron has the lowest mass per nuclear particle of all nuclei and therefore cannot release energy by fusion. Once all the matter in the core turns to iron, it can no longer generate any energy.[6] This marks the beginning of collapse. As massive stars collapse, reactions take place in which electrons and protons are forced together with such great amounts of



Figure 18.2: (a) The layered shells of elements in a massive star. (b) Collapse begins when fuel is depleted. (c) As gravity takes over, the star shrinks significantly. (d) The red area experiences enormous outward forces as hot gases pile up on the degenerate core. (e) The gases are ejected outward at high speeds. (f) All that remains is the degenerate neutron core.[39]

force that they merge to become neutrons. Quantum mechanics restricts the number of neutrons that can have low energy, as each neutron must occupy its own energy state. When neutrons are tightly packed together, as they are in this case, the number of available low energy states is small and many neutrons are forced into high energy states. The resulting neutron degeneracy pressure quickly stops gravitational collapse and the matter in the star is subjected to an enormous outward force. With gravitational collapse halted suddenly, the outer layers of gas bounce back upon hitting the degenerate core like a large wave hitting a sea wall.[53] The violent explosion that follows is known as a supernova event.

Most of the energy of a supernova explosion is released in the form of energetic neutrinos. It is this energy that initiates the formation of elements heavier than iron, as described before. The remaining energy is released as kinetic energy in the ejected matter. The shock wave sends the ejected material outward at speeds of over 10,000 km/s.[50] All that remains is the sphere of tightly packed neutrons, called a neutron star. If the original star was massive enough, the remaining neutron star may be so massive that gravity also overcomes the neutron degeneracy pressure and the core continues to collapse into a black hole. Otherwise, it becomes nothing more than the corpse of a star that has depleted its fuel supply.[53]

The expanding cloud of debris from the supernova explosion is known as a supernova remnant. The ejected gases slowly cool and fade in brightness, but they continue to move outward at high speeds. Carried with this debris is the variety of heavy elements produced in the core of the star, as well as those created by the collision of high energy neutrons during the supernova event.

example 18.2: stellar equilibrium
To maintain equilibrium, a star’s outward thermal pressure must balance inward gravitational forces. This results in enormous pressure at its core. How does the gas pressure in the core of the sun compare to the pressure of Earth’s atmosphere at sea level? The sun’s core contains about $10^{26}$ particles per cubic centimeter at a temperature of 15 million K. At sea level on Earth, the atmosphere contains $2.4 \times 10^{19}$ particles per cubic centimeter at a temperature of about 300 K.

Solution: All we need to do here is apply the ideal gas law.

$$\frac{P_{\text{sun}}}{P_{\text{Earth}}} = \frac{n_{\text{sun}} k T_{\text{sun}}}{n_{\text{Earth}} k T_{\text{Earth}}} = \frac{(1 \times 10^{26}\ \mathrm{cm^{-3}})(1.5 \times 10^{7}\ \mathrm{K})}{(2.4 \times 10^{19}\ \mathrm{cm^{-3}})(300\ \mathrm{K})} = 2 \times 10^{11} \qquad (18.3)$$

The sun’s core pressure is about 200 billion times greater than the atmospheric pressure on Earth at sea level.[6]
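The ratio in equation (18.3) is easy to verify numerically, and computing the absolute pressures gives a useful sanity check on the sea-level figure. This is a sketch only: the helper name `gas_pressure` is hypothetical, and the Boltzmann constant, which cancels in the ratio, is included just to recover pressures in pascals.

```python
K_B = 1.381e-23  # Boltzmann constant, J/K


def gas_pressure(n_per_cm3, temperature):
    """Ideal gas pressure P = n k T in pascals, with n given in particles per cm^3."""
    n_per_m3 = n_per_cm3 * 1e6  # convert cm^-3 to m^-3
    return n_per_m3 * K_B * temperature


p_sun_core = gas_pressure(1e26, 1.5e7)
p_earth = gas_pressure(2.4e19, 300.0)
ratio = p_sun_core / p_earth

print(f"sun core pressure:   {p_sun_core:.2g} Pa")
print(f"sea-level pressure:  {p_earth:.2g} Pa")
print(f"ratio:               {ratio:.2g}")
```

The sea-level value comes out close to one standard atmosphere (about $10^5$ Pa), which suggests the particle density and temperature used in the example are reasonable.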




Figure 18.3: An artist’s depiction of a solar nebula receiving supernova ejecta.[81]


18.4 progenitor theory

Now consider what would happen if part of this expelled cloud of isotopes (the ejecta) were to fall into the gravitational field of a neighboring star that is in the early stages of its formation. At this point the young star is surrounded by what is known as a protoplanetary disk, which is the disk of gas and dust from which planets and asteroids form. If the ejecta from the supernova event collided and blended with this disk, it would contribute to its chemical composition.[83] This is one of the leading theories that explains the presence of heavier elements in our own solar system.

This is impossible to prove by experiment, but fortunately there are ways to measure the probability that this process occurred by studying chemical properties of our solar system. One such property is the concentration of the stable isotope $^{60}$Ni. This has been detected and measured in meteorites that have been untouched since the formation of our solar system. There is also a correlation between the amounts of $^{60}$Ni and another stable isotope, $^{56}$Fe, in these meteorites.[65] Nuclear physics predicts that the unstable isotope $^{60}$Fe will decay into $^{60}$Ni with a half-life of 1.5 million years. Since our solar system is 4.6 billion years old, it is theorized that it was initially $^{60}$Fe that was present in the meteorites.[50] It is known that $^{60}$Fe is one of the isotopes that massive stars form in their cores.[51] Therefore, $^{60}$Ni might serve as an indicator as to how much supernova ejecta our solar system collected during its formation.

18.5 a mathematical model

This section will work through a simplified mathematical model of the progenitor theory. In the end, we will find an expression relating the radius of a young solar system to some of its chemical properties and its distance from a likely supernova. Remember that there are more factors that real astronomers take into account, but this is a general idea of how they hope to prove the validity of the progenitor theory. We start by defining several variables. Let $R$ be the radius of our solar system as it was during its formation 4.6 billion years ago. We will then call the area of our solar system $A_R = \pi R^2$. Now we define $r$ as the radius of the supernova remnant; here, we want $r$ equal to the distance between the supernova and our solar system. We will assume that the material ejected from the supernova is uniformly distributed

18.6 conclusion


over a sphere that expands as the ejecta moves outward. Thus we will define $A_r = 4\pi r^2$. Now let $M_{r60}$ be the total mass of $^{60}$Fe ejected from the supernova, and $M_{R60}$ be the total mass of $^{60}$Fe injected into our solar system. The amount of ejecta that a solar nebula can receive is inversely proportional to the square of its distance from a supernova explosion.[50]

$$M_{R60} = M_{r60}\,\frac{A_R}{A_r} \qquad (18.4)$$

Now let $M$ be the total mass of our solar system, and $M_{R56}$ be the amount of $^{56}$Fe in our solar system per unit mass. Multiplying them together gives the total amount of $^{56}$Fe in our solar system. Then we can define a relationship for the ratio of $^{60}$Fe to $^{56}$Fe injected into our solar system.[50]

$$\frac{^{60}\mathrm{Fe}}{^{56}\mathrm{Fe}} = \frac{M_{R60}}{M_{R56}\,M}$$


Since it is impossible to know exactly how much $^{60}$Fe was initially present in our solar system, we can use an estimated ratio between $^{60}$Fe and $^{56}$Fe based on the $^{60}$Ni meteoritic evidence mentioned before. This ratio has been determined to be on the order of $10^{-7}$ based on studies of the meteorites, though its precise value is still being investigated.[79]

It must be noted that there is a relatively small window of time during the formation of a solar system in which supernova ejecta is optimally received. This window is on the scale of a few million years for a star the size of our sun. Thus, we need to assume that the progenitor star has a lifetime on this scale to increase the probability of it becoming a supernova during the necessary time period. Stars that are around 60 solar masses have short lifetimes of about 3.8 million years, making a star of this size a good candidate.[50] As discussed previously, the amount of iron produced in the interior of a star throughout its lifetime can be determined from its mass. For a 60 solar mass progenitor star, we can estimate the amount of $^{60}$Fe produced to be about 0.0002512 solar masses. This was taken from research by Marco Limongi and Alessandro Chieffi in 2006.[51]

We can use estimates for the other values in the equations above as well. For instance, we know that the mass of a protoplanetary disk around a solar mass sized star is about 0.01 solar masses.[50] Also, the iron in our solar system is thought to comprise roughly 0.014% of the mass of the entire system. An estimated 91.57% of this iron consists of the isotope $^{56}$Fe. This means that the amount of $^{56}$Fe in the solar system comprises 0.01282% of its mass.[50] By combining the two equations above, it is possible to find the minimum radius of a solar nebula for it to have received the appropriate amount of the isotopes from a supernova.[50]

$$R = \sqrt{\frac{^{60}\mathrm{Fe}}{^{56}\mathrm{Fe}}\;\frac{4r^2\,(M_{R56}\,M)}{M_{r60}}}$$
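The radius expression can be evaluated with the estimates quoted above. This is a hedged sketch rather than the calculation performed in the cited studies: the function name `disk_radius` is hypothetical, $M$ is taken to be the 0.01 solar mass disk mass, the isotope ratio is set to exactly $10^{-7}$, and the supernova distances in the loop are arbitrary illustrative choices.

```python
import math

AU_PER_PARSEC = 206265.0  # astronomical units in one parsec


def disk_radius(r, fe_ratio=1e-7, m_r56=1.282e-4, m_disk=0.01, m_r60=2.512e-4):
    """Minimum disk radius R, in the same length unit as the supernova distance r.

    fe_ratio: assumed 60Fe/56Fe ratio (order of magnitude from the text)
    m_r56:    56Fe mass per unit mass of the system (0.01282%)
    m_disk:   mass M of the young system in solar masses (disk mass assumed)
    m_r60:    60Fe mass ejected by the supernova, in solar masses
    """
    return math.sqrt(fe_ratio * 4 * r**2 * m_r56 * m_disk / m_r60)


# Illustrative supernova distances in parsecs (assumed, not from the text).
for r_pc in (0.1, 0.5, 1.0):
    radius_au = disk_radius(r_pc * AU_PER_PARSEC)
    print(f"r = {r_pc:.1f} pc -> R = {radius_au:.2f} AU")
```

Because $R$ is directly proportional to $r$, doubling the distance to the supernova doubles the disk radius needed to collect the same amount of ejecta.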




Calculating the necessary radius for the protoplanetary disk of a solar mass star using known properties of our solar system suggests how probable it is that a supernova event played a role in its formation.



Protoplanetary disks rarely exceed a few hundred astronomical units (AU) and have never been known to stretch beyond 1,000 AU. The farthest we have observed a body orbiting our sun is roughly 47 AU, indicating that our own disk may have been reduced to a few tens of AU before planet formation.[50] Thus, if these calculations yield radii on this scale across a wide range of values for $r$ (again, the distance between the supernova event and our sun), then this will provide support for the progenitor theory of supernova injected isotopes.

There are, of course, many factors that determined our sun’s distance from supernova events. These include the size of the star cluster in which it formed and the ratio of the total potential and kinetic energies in that star cluster (known as the virial ratio). As these properties change, the average distances between solar mass stars and likely supernovae do as well. This is significant since stars that are further from supernova events will receive less ejecta.[50]

There is still one very important assumption that has been made as well. By calculating the radius we assume that the protoplanetary disk was face-on relative to the supernova event (to maximize the flux of the ejecta through the disk). Very little ejecta would be received by a disk if it were facing a progenitor edge-on.[50] These shortcomings are not grounds for abandoning this study, however, as this is still a likely explanation for the presence of heavy elements in our solar system. The results of these calculations are sure to bring us closer to the truth as mathematical techniques and computer programs evolve over the years to come.

18.7 problems

1. Homework Problem 1

Question: Use the result of Example 18.1. How many times does the proton-proton fusion reaction occur each second in the sun? The mass of a proton is $1.6726 \times 10^{-27}$ kg; a hydrogen nucleus is a single proton, and it takes four protons to form one helium nucleus. The mass of a helium nucleus is $6.643 \times 10^{-27}$ kg.

Solution: Here, fusion converts four hydrogen nuclei (protons) into one helium nucleus. Four protons have a mass of $6.690 \times 10^{-27}$ kg. When four protons fuse to make one helium nucleus, the amount of mass that disappears and becomes energy is as follows.

$$6.690 \times 10^{-27}\ \mathrm{kg} - 6.643 \times 10^{-27}\ \mathrm{kg} = 0.047 \times 10^{-27}\ \mathrm{kg} \qquad (18.7)$$

From Example 18.1 we know that the sun converts a total of $4.2 \times 10^{9}$ kg of mass into energy each second.

$$\frac{\text{mass lost per second}}{\text{mass lost in each reaction}} = \frac{4.2 \times 10^{9}\ \mathrm{kg/s}}{0.047 \times 10^{-27}\ \mathrm{kg}} = 8.9 \times 10^{37}\ \frac{\text{reactions}}{\mathrm{s}}$$

Thus, nearly $10^{38}$ fusion reactions occur in the sun each second.[6]

2. Homework Problem 2



Question: Explain why the ratio of $^{60}$Fe to $^{56}$Fe is used in the progenitor theory, rather than the amount of $^{60}$Fe.

Solution: Since $^{60}$Fe has a half-life of 1.5 million years, nearly all of it that was present during the formation of our solar system has decayed into $^{60}$Ni. Scientists have been able to determine a ratio of $^{60}$Ni to $^{56}$Fe from samples of meteorites. These meteorites are indicative of the chemical composition of the early solar system. Scientists have also been able to roughly determine the total amount of $^{56}$Fe present today, since it has not decayed. Multiplying the ratio of $^{60}$Ni to $^{56}$Fe by the total amount of $^{56}$Fe gives an estimate of the total amount of $^{60}$Fe injected into our solar system during its formation.

3. Test Problem 1

Question: Which of these elements had to be made in a supernova explosion?
(a) calcium
(b) oxygen
(c) uranium

Solution: (c) uranium

4. Test Problem 2

Question: Suppose there is a supernova explosion at some distance from a young solar system. If that distance is doubled, by what factor would the radius of the solar system need to be multiplied in order to receive the same amount of ejecta?
(a) 0.5
(b) 2
(c) 4

Solution: (b) 2





The basic idea of the equivalence principle is to “assume the complete physical equivalence of a gravitational field and a corresponding acceleration of the reference system.”[26] This means that being at rest on the surface of the Earth is exactly equivalent to being in an accelerating reference frame free of any gravitational fields. This idea, when originally developed by Einstein in 1907, was the beginning of Einstein’s search for a relativistic theory of gravity. There are three different ways to interpret the equivalence principle, each of which allows or disallows different theories of gravity. Currently the only theory of gravity to satisfy all three is general relativity; this is part of what makes GR peerlessly elegant. Bear in mind that most discussion of the equivalence principle arises in attempts to disprove it or place limitations on it in order to support an alternate theory of gravity. Because of this, much of the discussion requires you to check your intuitive understanding of the universe even more than GR does.



weak? strong? einstein?

The weak equivalence principle states that “All test particles at the same spacetime point in a given gravitational field will undergo the same acceleration, independent of their properties, including their rest mass.”[92] This form of the equivalence principle is very similar to Einstein’s original statement on the subject, but is referred to as “weak” because of the extent to which Einstein’s conceptualization of the equivalence principle matured. The other two interpretations of the equivalence principle use the weak equivalence principle as a starting point, assuming its truth. Very few, if any, theories of gravity contradict the weak equivalence principle.

The Einstein equivalence principle states that the result of a local nongravitational experiment in an inertial reference frame is independent of the velocity or location of the experiment. This variation is basically an extension of the Copernican idea that masses will behave exactly the same anywhere in the universe. It grows out of the postulates of special relativity and requires that all dimensionless physical values remain constant everywhere in the universe.

The strong equivalence principle states that the results of any local experiment in an inertial reference frame are independent of where and when it is conducted. The important difference between this variation and the first two, weaker, variations is that this is the only variation that accounts for self-gravitating objects, that is, objects that are so massive as to have internal gravitational interactions. This is an extremely important idea because of the importance of self-gravitating bodies, e.g. stars, to our understanding of the universe.



david parker: the equivalence principle

Figure 19.1: Bending starlight in a space-time diagram [87]



The easiest and most drastic consequence of the equivalence principle is that light will bend in a gravitational field. Imagine an elevator freely falling in a gravitational field with a laser on one wall. If the laser shines then, according to the laws of relativity, the light will hit the point on the wall that is directly opposite the laser. An observer in the elevator will correctly assert that the light traveled in a straight line. However, from the perspective of an observer on the surface of the Earth, or whatever body is causing the gravitational field, the light will still hit the point on the wall directly opposite the laser, but because of the elevator’s downward motion the light will have followed a parabolic path. This was how Einstein originally described the phenomenon of light bending in a gravitational field, and this is still the best way to describe it.

A sharp student may point out that if you continued this thought experiment by having countless elevators falling side by side, with windows to allow the laser to shine through them all, the light would actually bend at a rate twice that of a normal object falling in a gravitational field, because the elevators are all accelerating radially rather than in the same direction. And with this student I would have to agree, but Einstein got there first.

19.4 example

1. Imagine that an elevator is falling freely in Earth’s gravitational field with a laser mounted in the ceiling pointing directly at the floor. When the laser shines, would an observer at the floor of the elevator see the light as Doppler shifted? Would an observer on the surface of the Earth?

Answer: No, the light would not appear Doppler shifted to the observer inside the elevator; from his perspective he is in a completely motionless box that is free from any external fields. For the observer on the surface of the Earth, however, the light from the laser would of course be blue-shifted because of its descent into the gravitational field, or conversely, because of the acceleration of the observer up to it.

2. Are there any constraints on the equivalence principle, and if so, what are they?

Answer: There are constraints, and they actually change the nature of thought about the equivalence principle greatly. The equivalence principle is only valid in completely flat space, or in a homogeneous gravitational field. However, since there is no place in the universe where we will find either of those things, the equivalence principle can only actually be applied in an infinitesimally small section of space-time.

19.5 problems

1. An elevator on Earth is accelerating upwards at 6.6 m/sec². How long will it take a rock of 0.2 kilograms to hit the floor of the elevator if it is dropped from a height of 1.5 meters?

Answer: $1.5\,\mathrm{m} = \frac{1}{2}\,(6.6\,\mathrm{m/sec^2} + 9.8\,\mathrm{m/sec^2})\,t^2$, so $t = 0.43$ sec.

2. If granite were found to fall at a different rate than water, what consequences would this have for the principle of equivalence?

Answer: This would be a direct violation of the weak equivalence principle (WEP) and would therefore invalidate all of general relativity and most other theories of gravity in one fell swoop.

3. If the speed of light was measured in an area of flat space, and then measured again on the surface of the Earth, discounting atmosphere, in which reference frame would the speed of light be faster?

Answer: They would both be the same. The speed of light in a vacuum is a constant in all reference frames. The equivalence principle, combined with this fact, allows us to predict the bending of light in a gravitational field.
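The arithmetic in Problem 1 can be checked with a short Python sketch. The function name `fall_time` and its parameters are illustrative choices, not part of the original text. Note that the rock's mass never enters the calculation, which is the weak equivalence principle at work.

```python
import math

def fall_time(height_m, elevator_accel=0.0, g=9.8):
    """Time (s) for a dropped object to reach the floor of an elevator
    that accelerates upward at elevator_accel (m/s^2).

    In the elevator frame the upward acceleration simply adds to g,
    so h = (1/2) * (g + a) * t^2. The mass of the object does not appear.
    """
    effective_g = g + elevator_accel
    return math.sqrt(2.0 * height_m / effective_g)

# Values from Problem 1: a = 6.6 m/s^2, h = 1.5 m
t = fall_time(1.5, elevator_accel=6.6)
print(f"t = {t:.2f} s")
```

Setting `elevator_accel=0.0` recovers the ordinary free-fall time for the same height.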




introduction: the ‘four’ forces

Current models of this universe explain all interactions in terms of four fundamental forces. These forces (gravity, electromagnetism, and the strong and weak nuclear forces) are all defined by unique “sources,” or properties of individual particles that determine their attraction or repulsion in terms of the specific force. For example, observe the known electric potential for a pair of particles:

\[ U_e(r) = \frac{1}{4\pi\epsilon_0}\,\frac{q_1 q_2}{r}. \tag{20.1} \]

Here it can be seen that the only properties of the matter that determine the electrical potential are $q_1$ and $q_2$, the respective charges of the particles. Thus, the “source” of the electric portion of the electromagnetic force is the value of charge. In the case of (non-relativistic) gravity, only the gravitational mass of each particle acts as the source for the potential:

\[ U_g(r) = -G\,\frac{m_1 m_2}{r}, \tag{20.2} \]

where G is Newton’s gravitational constant.¹ Notice that in both of these cases, the potential falls off as the separation r increases, but has an infinite range. In the case of the nuclear forces, this is not the case: their ranges are finite.²

example 20.1: the resultant force
Forces are often described by their potential. Once such a potential is described, however, it is simple to determine the force experienced by a system of two particles:

\[ F_{12} = -\frac{d}{dr}\,U(r), \tag{20.3} \]

where $F_{12}$ is the force experienced by one particle away from the other. In the case of gravity:

\[ F_{12} = -\frac{d}{dr}\left(-G\,\frac{m_1 m_2}{r}\right) = -G\,\frac{m_1 m_2}{r^2}, \tag{20.4} \]

the negative sign implying the particles move toward one another.
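The step from potential to force in this example can also be checked numerically. The sketch below is only an illustration: the helper names and the test values (two 1 kg masses 1 m apart) are assumptions, and the derivative is approximated with a central finite difference rather than taken analytically.

```python
G = 6.673e-11  # m^3 kg^-1 s^-2, the value quoted in this chapter

def gravitational_potential(r, m1, m2):
    """Newtonian potential energy U_g(r) = -G m1 m2 / r, equation (20.2)."""
    return -G * m1 * m2 / r

def force_from_potential(potential, r, rel_step=1e-6):
    """Approximate F12 = -dU/dr with a central finite difference."""
    h = rel_step * r
    return -(potential(r + h) - potential(r - h)) / (2.0 * h)

# Hypothetical test values: two 1 kg masses separated by 1 m
m1, m2, r = 1.0, 1.0, 1.0
numeric = force_from_potential(lambda x: gravitational_potential(x, m1, m2), r)
analytic = -G * m1 * m2 / r**2  # closed form of equation (20.4)
print(numeric, analytic)
```

Both values come out negative, matching the sign convention of the example: a negative $F_{12}$ pulls the particles toward one another.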

In the early 1980s, some irregularities in experimental data prompted many to ponder the existence of another, fifth primary force, with a potential determined by particle properties distinct from those of the four known forces.

20.2 the beginning: testing weak equivalence

A basis of both Sir Isaac Newton’s Law of Universal Gravitation and Albert Einstein’s General Theory of Relativity, the weak equivalence principle is both elegant in its simplicity and profound in its implications. Put simply, the principle states that: “All test particles at the same spacetime point in a given gravitational field will undergo the same acceleration, independent of their properties, including their rest mass.” [91] More subtly, this implies that the inertial mass of an object (the proportionality between the force acting on the object and its acceleration) is exactly equal to its gravitational mass (the proportionality between the force an object experiences in a gravitational field and the strength of that field). This point is readily seen in observing Newton’s gravitational potential (equation (20.4)).

Late in the 19th and early in the 20th century, physicist Lorand Eötvös set out to verify this equivalence. Between 1885 and 1909, he devised, implemented, and refined an experiment which evidenced the equivalence to a much higher degree of accuracy than had been previously shown. Put simply, his procedure involved hanging different types of masses in a balance along a solid rod. Torque would then be applied to the rod from two sources: differences in gravitational force on the two masses (a measurement of the gravitational mass of the masses), and differences in the centrifugal force each mass experiences (a measurement of the inertial mass). Only if these two masses are equivalent would the rod remain perfectly stationary.

As with any physical procedure, experimental uncertainty and imperfections plagued Eötvös’s results. These were originally averaged out to provide fairly precise results in favor of weak equivalence. As subsequent tests of the weak equivalence principle, using slightly different experimental procedures, did not observe such uncertainty, the variations in Eötvös’s results were seen as the result of an imprecise experimental process.

1 Currently accepted to be $6.673 \cdot 10^{-11}\ \mathrm{m^3\,kg^{-1}\,s^{-2}}$.
2 Roughly $10^{-15}$ m for the strong force and $10^{-18}$ m for the weak force.

richard rines: the fifth interaction
More recently, however, physicists such as Ephraim Fischbach began to reexamine these irregularities in terms of a possible fifth interaction between elementary particles.

20.3 a new force

Well after the time of Eötvös, physicists were struggling to explain two seemingly unconnected phenomena. The first involved certain violations of CP-symmetry seen in the decay of $K^0_L$ mesons [34]. Such decays had been observed, but were not expected by any current theory or model of elementary particles. The second was an uncertainty in the gravitational potential: the value of the gravitational constant G has much higher experimental uncertainty than any other physical constant. Recently, experiments have been conducted providing a range of inconsistent results, from −0.1% to +0.6% [1]. A group based in Russia has measured values with a 0.7% fluctuation based on time and position [1]. Furthermore, measurements taken in mineshafts and submarines, though containing a very large range of uncertainty, have consistently provided values that are greater than the accepted number [34]. This inability to determine precisely the value of G opened the door to possible modifications to the gravitational potential, which could in turn, some physicists argued, explain the aforementioned CP-violations. Such a modification of the gravitational potential would then imply a possible fifth particle interaction.

In this light, Fischbach explained the dissimilarities between Eötvös’s results and those of his successors not as experimental imprecision, but as the result of certain experimental differences between his experiments and theirs. Eötvös’s experiment involved relatively small acting distances for gravity (namely that between objects on the surface of the Earth and Earth itself). However, many experiments and measurements conducted by later experimentalists in order to verify the weak equivalence principle relied only on much larger distances (such as those between celestial bodies). Most relevantly, the two subsequent repetitions of the experiment that were found to be most in conflict with Eötvös relied on the attraction of Earthly objects toward the sun [33].

To explain the relevance of this discrepancy, Fischbach postulated that a new force could be modeled by the Yukawa potential, which had been very effective in modeling the finite-range nuclear forces. In a 1986 paper, he proposed an additive term, in the form of a Yukawa potential, to the gravitational potential to account for the effect of a fifth particle interaction:

\[ U(r) = -\frac{G m_1 m_2}{r}\left(1 + \alpha e^{-r/\lambda}\right), \tag{20.5} \]

where α and λ were to be determined experimentally. Clearly, the factor $1 + \alpha e^{-r/\lambda}$ very quickly approaches unity as the radius increases, making the correction relevant only at closer distances. At these distances, the correction term acts equivalently to a new, rescaled value of the gravitational constant G. The factor by which the value is scaled is very strongly dependent upon the distance at which the gravity is acting, seemingly explaining the existence of such a large range of measured values. Furthermore, Fischbach found that with the correct physical parameters,³ this model accurately predicted the varying published geophysical measurements of G [33].

Motivated by the potential for modifications of the law of gravity to explain known violations of CP-symmetries [34],⁴ Fischbach plotted variations in results between different masses against various elementary properties of the material used (properties which were unknown at the time of Eötvös’s experiments). A pattern in such a plot would imply a compositionally-dependent deviation from the weak equivalence principle, and therefore the existence of a new, yet unexplained force. His results were promising: he found evidence of a linear relationship between the error in Eötvös’s results and the difference in a function of the baryon number of the particles in balance. The baryon number (B) of a particle is defined as:

\[ B = \frac{N_q - N_{\bar{q}}}{3}, \tag{20.6} \]

where $N_q$ is the number of quarks and $N_{\bar{q}}$ the number of antiquarks. Specifically, Fischbach observed the fractional difference ($\Delta\kappa$) between the acceleration of the object and the acceleration of gravity and found the following linear relationship:

\[ \Delta\kappa = \Delta\!\left(\frac{a}{g}\right) = \alpha\,\Delta\!\left(\frac{B}{\mu}\right) + \beta. \tag{20.7} \]

Figure 20.1: The linear relationship observed by Fischbach

Here, a is the acceleration of the object, g is that of gravity, µ is the atomic mass of the object, and α and β are constants.⁵ This relationship, according to Fischbach, proved Eötvös’s results to be compositionally dependent, and this dependence was associated with the fifth fundamental force. This force would have a short-range Yukawa potential:

\[ U_5(r) = -\frac{G m_1 m_2}{r}\,\alpha e^{-r/\lambda}, \tag{20.8} \]

where α and λ are functions of the baryon numbers of $m_1$ and $m_2$.

3 Very roughly, $\alpha = -(7.2 \pm 3.6) \cdot 10^{-3}$, $\lambda = (200 \pm 50)$ m.
4 For a more technical discussion of these violations as Fischbach’s (and others’) motivations, see Franklin [34].

20.4 the death of the force

Though Fischbach’s findings were initially quite persuasive, very little else came of the fifth force. Of the fourteen subsequent published experiments searching for a force coupled to baryon number, only two had positive results, and these had not been effectively reproduced [34]. Some of these results, such as those from the University of Washington’s Eöt-Wash group (see figure 20.2), directly contradicted the linear baryon-coupling results shown by Fischbach. Though some research in the area still exists, most physicists have accepted that no evidence remains that such a fifth particle interaction physically exists. At the 1990 Moriond workshop, fewer than ten years after the beginning of the search, it was established that no further experimentation was necessary to disregard the fifth interaction hypothesis [34].

5 Fischbach found that, very roughly, α = (5.65 ± 0.71) · 10−6 and β = (4.83 ± 6.44) · 10−10 .

20.5 problems


Figure 20.2: New experiments by the Eöt-Wash group provide results contradicting Fischbach’s assumption of linearity [34]



1. Write an expression for the force experienced between two objects as a result of the fifth interaction.

Answer:

\[ F_{12} = -G m_1 m_2\,\alpha \left( \frac{e^{-r/\lambda}}{r^2} + \frac{e^{-r/\lambda}}{\lambda r} \right) \]

2. Assume the additive potential (equation (20.8)) accurately models the gravitational potential. Using the rough geophysical data provided by Fischbach ($\alpha \approx -7.2 \cdot 10^{-3}$, $\lambda \approx 200$ m), write an expression for the fraction by which a measured value of the gravitational constant G on the surface of the Earth ($r = 6.38 \cdot 10^{6}$ m) would differ from that at infinity.

Answer:

\[ \frac{1}{1 - 7.2 \cdot 10^{-3}\, e^{-3.19 \cdot 10^{4}}} \]

3. Two objects of mass 1 kg, with the same composition as Earth, are separated by a distance of 1 mm. What force do they experience as a result of the fifth interaction? In what direction is this force?

Answer: $F = 4.805 \cdot 10^{-7}$ N, away from one another.

4. Though the existence of a fifth force was eventually shown to lack sufficient evidence, how could its investigation have been useful to the physicists involved? For further insight, see Franklin [34] for a fascinating discussion of this importance.
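The fifth-force expressions used in these problems are easy to evaluate numerically. The following Python sketch is illustrative only: the function names are assumptions, the constants are the rough values quoted in the chapter (G from footnote 1, α and λ from footnote 3), and the force is obtained as $-dU_5/dr$ from equation (20.8).

```python
import math

G = 6.673e-11      # m^3 kg^-1 s^-2, value quoted in the chapter
ALPHA = -7.2e-3    # rough geophysical value quoted from Fischbach
LAMBDA = 200.0     # range of the proposed force in meters (rough value)

def fifth_force(r, m1, m2, alpha=ALPHA, lam=LAMBDA):
    """F12 = -dU5/dr for U5 = -G m1 m2 alpha exp(-r/lam) / r.

    With the chapter's convention, a positive value points away
    from the other particle (repulsion).
    """
    return -G * m1 * m2 * alpha * math.exp(-r / lam) * (1.0 / r**2 + 1.0 / (lam * r))

def g_scaling(r, alpha=ALPHA, lam=LAMBDA):
    """Factor 1 + alpha * exp(-r/lam) that rescales G in equation (20.5)."""
    return 1.0 + alpha * math.exp(-r / lam)

# Two 1 kg masses at a few short separations (in meters)
for r in (1e-3, 1e-2, 1.0):
    print(f"r = {r:g} m: F5 = {fifth_force(r, 1.0, 1.0):.3e} N")

# Rescaling of G at laboratory range versus one Earth radius
print(g_scaling(100.0), g_scaling(6.38e6))
```

Because α is negative, the force comes out positive, that is, repulsive. At separations far below λ it scales almost exactly as $1/r^2$, while at one Earth radius the exponential is so small that the rescaling factor is indistinguishable from 1.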





For 500,000 years Homo sapiens has roamed the Earth, building cities and creating languages. We’ve gone from stagecoaches to space travel in the span of one human lifetime, and we’ve sent robotic scouts to other planets. It’s difficult to imagine it all coming to an end. Yet 99 percent of all species that have ever lived have gone extinct, including every one of our hominid ancestors. If humans survive for a long enough time and spread throughout the galaxy, the total number of people who will ever live might number in the trillions. It’s unlikely that we would be among the first generation to see our descendants colonize planets, but what are the odds that we would be the last generation of Homo sapiens? By some estimates, the current rate of extinctions is 10,000 times the average in the fossil record. We may be worried about spotted owls and red squirrels, but the next statistic on the list could be us.



asteroid impact

Space is filled with asteroids and comets which can pose a threat to life on Earth. Fortunately, Earth’s atmosphere protects us from the thousands of pebble-sized and smaller asteroids, weighing only a few grams, which strike Earth every day at speeds of tens of kilometers per second.¹ At these high velocities, friction with the upper atmosphere heats the meteoroids white-hot and causes immense deceleration forces. These small meteoroids are destroyed by the heat and deceleration, and are seen from Earth as shooting stars. However, some of the fragments, especially those from iron meteoroids, will reach Earth’s surface. It is estimated that 20 tons of meteorites reach Earth’s surface each day.

So, what if an asteroid hit Earth? Not just the dust that Earth collects as it sweeps through space, but something serious, like “Armageddon” or “Deep Impact”? Earth and the moon are heavily cratered from previous impacts, the most famous of which happened on Earth at the end of the Cretaceous period, about 65 million years ago. Scientists hypothesize that this impact was the cause of the End-Cretaceous (K-T) extinction, in which eighty-five percent of all species on Earth disappeared, making it the second largest mass extinction event in geological history.² Dr. Richard Muller in his book “Nemesis: The Death Star” describes the event:

“At the end of the Cretaceous period, the golden age of dinosaurs, an asteroid or comet about 5 miles in diameter (about the size of Mt. Everest) headed directly toward the Earth with a velocity of about 20 miles per second, more than 10 times faster than our fastest bullets. Many such large objects may have come close to the Earth, but this was the one that finally hit. It hardly noticed the air as it plunged through the atmosphere in a fraction of a second, momentarily leaving a trail of vacuum behind it. It hit the Earth with such force that it and the rock near it were suddenly heated to a temperature of over a million degrees Celsius, several hundred times hotter than the surface of the sun. Asteroid, rock, and water (if it hit in the ocean) were instantly vaporized. The energy released was greater than that of 100 million megatons of TNT, 100 teratons, more than 10,000 times greater than the total U.S. and Soviet nuclear arsenals. Before a minute had passed, the expanding crater was 60 miles across and 20 miles deep. (It would soon grow even larger.) Hot vaporized material from the impact had already blasted its way out through most of the atmosphere to an altitude of 15 miles. Material that a moment earlier had been glowing plasma was beginning to cool and condense into dust and rock that would be spread worldwide. The entire Earth recoiled from the impact, but only a few hundred feet. The length of the year changed by a few hundredths of a second.”³

1 A meteor is an asteroid which breaches Earth’s atmosphere; a meteorite is one that strikes Earth’s surface.
2 The Permian mass extinction occurred about 248 million years ago and was the greatest mass extinction ever recorded in Earth history, even larger than the better known K-T extinction that felled the dinosaurs.

douglas herbert: the science of the apocalypse

In 2028, the asteroid 1997 XF11 will come close to Earth but will miss our planet by about two and a half lunar distances. That is extremely close, considering how big outer space is. If something were to change its course and it did hit Earth, the result would be a 1.6 km (1 mile) wide meteorite striking the planet’s surface at about 48,000 kph (30,000 mph). The energy released during the impact is related to the kinetic energy of the asteroid before atmospheric entry begins.
At typical solar system impact speeds of 12 to 20 km/s, the energy E is approximately given as one half times the asteroid mass m times the square of the asteroid velocity v, which can be rewritten in terms of the asteroid’s density ρ and diameter l, assuming that the asteroid is approximately spherical:

\[ E = \frac{1}{2}\,m v^2 = \frac{\pi}{12}\,\rho\, l^3 v^2 \]
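As a rough numerical illustration of this formula, the sketch below evaluates the kinetic energy of a spherical impactor and converts it to megatons of TNT. The function name, the assumed rocky density of 3000 kg/m³, and the conversion factor of 4.184 · 10¹⁵ J per megaton are assumptions introduced here, not values given in the text.

```python
import math

MEGATON_TNT_J = 4.184e15  # joules per megaton of TNT (conventional definition)

def impact_energy(diameter_m, speed_m_s, density_kg_m3=3000.0):
    """Kinetic energy (J) of a spherical asteroid, E = (pi/12) * rho * l^3 * v^2.

    density_kg_m3 defaults to an assumed value for a rocky body.
    """
    return math.pi / 12.0 * density_kg_m3 * diameter_m**3 * speed_m_s**2

# A 1.6 km wide body at the upper end of typical impact speeds (20 km/s)
energy_j = impact_energy(1600.0, 20.0e3)
print(f"E = {energy_j:.2e} J = {energy_j / MEGATON_TNT_J:.2e} megatons of TNT")
```

With these assumptions the result is of order 10⁵ to 10⁶ megatons. The exact figure depends strongly on the assumed density and speed, since the energy scales linearly with ρ and quadratically with v.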



A kilometer and a half wide asteroid traveling at 20 km/s has an E roughly equal to that of a 1 million megaton bomb, which is 10 million times greater than the bomb that fell on Hiroshima. A land strike would produce a fireball several miles wide which would briefly be as hot as the surface of the sun, igniting anything in sight, and it would blast tons of sulfur-rich rock and dust high into the atmosphere, encircling the globe. As the burning debris rained back down to Earth, the soot and dust would blacken the skies for months, if not years, to come. An ocean landing would be no better, instantly vaporizing 700 cubic kilometers (435 cubic miles) of water, and blasting a tower of steam several miles into the atmosphere, again benighting the sky. The meteor itself would likely crack Earth’s crust at the ocean floor, triggering massive global earthquakes, sprouting volcanoes at weak spots in Earth’s crust, and creating a tsunami as high as the ocean is deep, moving at hundreds of kilometers an hour in every direction. Humans would likely survive, but civilization would not.

The Kuiper belt is an asteroid zone just beyond the orbit of Neptune, and it contains roughly 100,000 ice balls that are each more than 80 kilometers (50 miles) in diameter. As Neptune’s orbit perturbs the Kuiper belt, a steady rain of small comets is sent earthward. If one of the big ones heads right for us, that would certainly be the end for just about all higher forms of life on Earth.

3 The impact believed to have caused the extinction of the dinosaurs left a 300 km (186 mile) wide crater on the coast of Yucatán. The impactor had to have been at least 30 km (19 miles) across.

21.3 errant black holes

The Milky Way galaxy is full of black holes, collapsed stars about 20 km (12 miles) across. Just how full is hard to say. A black hole’s gravity is so strong that anything approaching within a certain radius will no longer be able to escape, including any light which would betray its presence. This critical radius is called the “Schwarzschild radius.”⁴ The interior of a sphere whose radius is the Schwarzschild radius is completely cut off from the rest of the Universe; the only way to “see” a black hole is to spot its gravitational lensing, the distortion of background light by foreground matter (here, the black hole). Beyond the Schwarzschild radius, the gravitational attraction of a black hole is indistinguishable from that of an ordinary star of equal mass. Based on such observations and on theoretical arguments, researchers estimate that there are about 10 million black holes in the Milky Way, including one at the core of our galaxy⁵ whose mass is as much as 2 million solar masses.

A black hole has an orbit just like any other star, so it’s not likely that one is heading toward us. If any other star were approaching us, we would know; with a black hole there would be little warning. A few decades before a close approach to Earth, astronomers would probably notice a strange perturbation in the orbits of the outer planets. As the effects grew larger, it would be possible to make increasingly precise estimates of the location and mass of the interloper. A black hole doesn’t have to come very close to Earth to bring ruin; it simply needs to pass through the solar system. Just as Neptune’s gravity disturbs the Kuiper belt, a rogue black hole could pull Earth’s orbit into an exaggerated ellipse, either too close to the sun or too far from it. It could even eject Earth from its orbit and send the planet off into deep space.

21.4 flood volcanism

In 1783, the Laki volcano in Iceland erupted, pouring out 5 cubic kilometers (3 cubic miles) of lava. Floods, ash, and fumes killed 9,000 people and 80 percent of Iceland’s livestock, and the ensuing starvation killed 25 percent of Iceland’s remaining population. Atmospheric dust lowered winter temperatures by 9 degrees in the newly independent United States, and that was minor compared to what Earth is capable of.

Sixty-five million years ago, a plume of hot rock from Earth’s mantle burst through the crust in what is now India, when continental drift moved that area over a “hot spot” in the Indian Ocean. Eruptions raged century after century, producing lava flows that exceeded 100,000 square kilometers (about 38,600 square miles) in area and 150 meters (500 feet) in thickness: the Laki eruption 3,000 times over. Some scientists believe that this Indian outburst, and not an asteroid, was responsible for the fall of the dinosaurs (the K-T extinction), since such lava flows would have produced enormous amounts of ash, altering global climatic conditions and changing ocean chemistry. An earlier, even larger event in Siberia occurred at the time of the Permian mass extinction 248 million years ago, the most thorough extermination known to paleontology. At that time 95 percent of all species were wiped out.

Sulfurous volcanic gases produce acid rains. Chlorine-bearing compounds break down the ozone layer.⁶ While they are causing short-term destruction, volcanoes also release carbon dioxide, which yields long-term greenhouse-effect global warming. The last big pulse of flood volcanism built the Columbia River plateau about 17 million years ago. If the idea of cataclysmic volcanism sounds too unlikely, Tom Bissell notes in a 2003 Harper’s Magazine article (A Comet’s Tale) that:

“...73,500 years ago what is known as a volcanic supereruption occurred in modern-day Sumatra. The resultant volcanic winter blocked photosynthesis for years and very nearly wiped out the human race. DNA studies have suggested that the sum total of human characteristics can be traced back to a few thousand survivors of this catastrophe.”

4 In 1916, German astronomer Karl Schwarzschild established the theoretical existence of black holes as the solution to Einstein’s gravitational equations. The Schwarzschild radius is equal to 3 km times the number of solar masses of the black hole; the solar masses are determined via Kepler’s laws of planetary motion.
5 Researchers also believe that there is a black hole at the center of every galaxy.

21.5 giant solar flares

More properly known as coronal mass ejections, solar flares are the outbursts caused by enormous magnetic storms on the sun, and they bombard Earth with high speed subatomic particles. A typical solar flare releases the equivalent energy of a billion hydrogen bombs, and ejects a hundred billion tons of high energy particles into space. Earth’s magnetic field and atmosphere negate the potentially lethal effects of ordinary flares, knocking the particles back into space, and steering some of the particles over the poles, creating Earth’s auroras. But while examining old astronomical records, Yale University’s Bradley Schaefer found evidence that some sunlike stars can brighten briefly by up to a factor of 20. Schaefer believes that these increases are caused by superflares millions of times more powerful than those commonly experienced by Earth. Scientists don’t know why superflares happen at all, or whether our sun could exhibit milder but still disruptive behavior. A superflare on the sun only a hundred times larger than typical would overwhelm Earth’s magnetosphere and begin disintegrating the ozone layer (see footnote 6). Such a burst would certainly kill anything basking in its glow, and according to Bruce Tsurutani of NASA’s Jet Propulsion Laboratory, “it would leave no trace in history.”

While too much solar activity could be deadly, too little of it is problematic as well. Sallie Baliunas of the Harvard-Smithsonian Center for Astrophysics says that many sunlike stars pass through extended periods of lessened activity, during which they become nearly 1 percent dimmer. A similar downturn in our own sun could send us into another ice age. According to Baliunas, there is evidence that decreased solar activity contributed to 17 of the 19 major cold episodes on Earth in the last 10,000 years.

6 Without the ozone layer, ultraviolet rays from the sun would reach the surface of the Earth at nearly full force, causing skin cancer and killing off the photosynthetic plankton in the ocean that provide oxygen to the atmosphere and support the bottom of the food chain.

21.6 viral epidemic

Viruses prosper by using the genetic material of a healthy cell to produce more virus. They multiply within the healthy cell, burst out, and attack more healthy cells. They also have the capacity to burst forth in some startling new form, and then to disappear just as quickly. In 1916, people in Europe and America began to come down with a strange sleeping sickness, which became known as encephalitis lethargica. Victims would go to sleep and not wake up. They could be roused with great difficulty, but once they were allowed to rest they would fall back into deep sleep. Some victims continued like this for months before dying. In ten years encephalitis lethargica killed five million people and then simply disappeared. The only reason this disease didn’t get much lasting attention is that the worst epidemic in history was sweeping across the world at the same time.

The Great Swine Flu, sometimes called the Great Spanish Flu, killed twenty-one million people in its first four months. Between autumn of 1918 and spring of 1919, 548,452 people in the United States died of the flu. In Britain, France, and Germany, the toll was over 220,000 dead in each country. The global toll was approximately 50 million, with some estimates as high as 100 million. Much about the 1918 flu is understood poorly, if at all. One mystery is how it managed to break out seemingly everywhere all at once, in countries separated by oceans and mountain ranges. A virus can only survive outside a host body for a few hours, so how did this virus appear in Bombay, Madrid, and Philadelphia, all within the same week?

Some consider it a miracle that other diseases have not gone rampant. Lassa fever, first detected in 1969 in West Africa, is extremely virulent. A doctor at Yale University came down with Lassa fever when he was studying it in 1969. He survived, but a technician in a different laboratory, with no direct exposure to the virus, contracted it and died. Fortunately the outbreak stopped there, but in 1990 a Nigerian living in New York contracted Lassa fever on a visit home and didn’t develop any symptoms until his return to the United States. He died undiagnosed in a Chicago hospital, without anyone having taken any special precautions, or knowing that he had contracted one of the most infectious and lethal diseases on the planet. Our lifestyles invite epidemics: air travel makes it possible to spread disease with alarming ease. An Ebola outbreak could begin in Boston, jump to Paris, and then to Tokyo before anyone ever became aware of it.

We’re accustomed to being Earth’s (and maybe the galaxy’s) dominant species, which makes it difficult to consider that we may only be here because of various chance events, or that our continued presence may be due to the absence of other chance events.

“Humans are here today because our particular line never fractured - never once at any of the billion points that could have erased us from history.”



Stephen Jay Gould





Life, as far as we know, exists only on our own planet Earth. There is no evidence of intelligence (or life of any form) in our solar system, galaxy, or beyond. But are we really alone? The idea that our small planet possesses the only biological beings in the universe seems egotistical and, in light of our recent advances in astronomy and the sciences, unlikely. Many efforts are currently underway in the search for extraterrestrial intelligence, with the hope of finding it still undampened by the lack of success. If intelligent life does indeed exist elsewhere, it remains to be seen whether it will dwell in a region close enough to make contact with, and even whether or not it will want to be found.



the possibility of life in the universe

Exploration of our solar system has provided no evidence for the existence of life anywhere within it except Earth. While ancient Mars may once have had a life-supporting environment and planetary moons such as Europa and Titan might be capable of sustaining life, no life, and certainly no intelligent life, has been observed. Despite this, modern astronomy suggests that extrasolar planets (planet-star systems like our Earth and sun) might not be as rare in galaxies as was once believed [55]. Many Jovian planets have been detected due to their large mass, but smaller, rocky planets may even outnumber these gas giants three to one. The quantity of possible life-supporting planets, along with several other factors which would determine the amount of intelligent life in the universe, is expressed concisely in the famous Drake Equation. This equation was created by Dr. Frank Drake of the University of California, Santa Cruz in 1960 [55], and defines the total number of civilizations in the universe with which we might be able to communicate ($N_C$) as

\[ N_C = R_S\, f_P\, n\, f_L\, f_I\, f_C\, L, \tag{22.1} \]

where

• $R_S$ is the formation rate of stars in a galaxy,
• $f_P$ is the fraction of stars with planetary systems,
• $n$ is the average number of habitable planets in a planetary system,
• $f_L$ is the fraction of habitable planets which develop life,
• $f_I$ is the fraction of habitable planets which develop intelligent life,
• $f_C$ is the fraction of planets with life that develop intelligent beings interested in communication, and
• $L$ is the average lifetime of a civilization.

amanda lund: extraterrestrial intelligence

Figure 22.1: One way of detecting extrasolar planets is by observing a decrease in the brightness of a star as the planet passes in front of it [30].

As the only term in this equation known with any certainty is the rate of star formation $R_S$ (about 2 or 3 per year), the projected values of $N_C$ vary greatly [75]. The fraction of stars with solar systems ($f_P$) has been estimated between 20% and 50% [55], but the remaining terms depend greatly on the optimism of the person assigning them. For instance, the average lifetime of an intelligent civilization might be anywhere between 100 and $10^{10}$ years. A belief in the tendency of civilizations to self-destruct would set L at the lower limit, while the opposing view that societies can overcome this inclination to eradicate themselves would let L be as large as the lifetime of their star [75]. As the only known example of $f_L$, $f_I$, and $f_C$ is the Earth, there is really no way to obtain an accurate estimate of these probabilities [57]. Based on the time it took life to evolve on Earth, a rough estimate of $f_L > 0.13$ could be assigned. Drake has estimated both $f_I$ and $f_C$ to be 0.01, though there is not much basis for these assumptions.

22.3 the search for intelligent life

The current quest for intelligent life mainly involves filtering through multitudes of electromagnetic emissions in an attempt to "eavesdrop" on possible sentient societies [82]. Since World War II, radio astronomy has advanced significantly, and most of the searches are being done in the radio and microwave regions of the spectrum. SETI (the Search for ExtraTerrestrial Intelligence), the main organization conducting these searches, has been using large radio telescopes in an attempt to pick up alien transmissions. Their receivers are intended to identify narrowband broadcasts, which are easy to discern even at large distances and can only be created by transmitters [84].

A particularly important signal for astronomy and SETI is the hydrogen 21-centimeter line. This special wavelength of radiation is a quantum effect of the hyperfine structure of hydrogen, which provides a very small correction to the hydrogen energy levels due to the interaction of the proton and electron spins. A transition from the triplet to the singlet state in the ground state of hydrogen corresponds to the spins of these two particles flipping from parallel to antiparallel. This is a very rare transition, but when it occurs a small amount of energy is released as a photon with a wavelength of 21 centimeters [15].

example 22.1: the 21 centimeter line
Using Planck’s relation

$$E = h\nu, \tag{22.2}$$

show that if the energy of the photon emitted in the transition from the triplet state to the singlet state of the hydrogen hyperfine structure described above is $5.9 \times 10^{-6}$ eV, the corresponding wavelength is 21 centimeters.

Solution: First we must convert the energy in eV to joules, using the conversion factor $1\ \text{eV} = 1.6 \times 10^{-19}$ J. This gives us

$$5.9 \times 10^{-6}\ \text{eV} = 9.44 \times 10^{-25}\ \text{J}. \tag{22.3}$$

Using both this value of the change in energy in joules and Planck’s constant,

$$h = 6.626 \times 10^{-34}\ \text{J s}, \tag{22.4}$$

we can determine the frequency of the photon emitted using Planck’s relation:

$$E = h\nu, \tag{22.5}$$

$$\nu = \frac{\Delta E}{h} = \frac{9.44 \times 10^{-25}\ \text{J}}{6.626 \times 10^{-34}\ \text{J s}} \approx 1420\ \text{MHz}. \tag{22.6}$$

Now, by plugging this frequency into the equation relating frequency and wavelength (where $c$ denotes the speed of light in a vacuum), we can find the corresponding wavelength:

$$\lambda = \frac{c}{\nu} = \frac{3 \times 10^{8}\ \text{m/s}}{1420\ \text{MHz}} = 0.21\ \text{m} = 21\ \text{cm}. \tag{22.7}$$
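The arithmetic of this example can also be checked numerically. The short Python sketch below is only an illustration: the function name `photon_frequency_and_wavelength` and the constant names are made up for this purpose, and the constants are the same rounded values used in the example rather than precise ones.

```python
# Rounded constants, matching the values used in the worked example.
EV_TO_JOULE = 1.6e-19    # joules per electron volt
PLANCK_H = 6.626e-34     # Planck's constant in J s
SPEED_OF_LIGHT = 3.0e8   # speed of light in m/s


def photon_frequency_and_wavelength(energy_ev):
    """Return (frequency in Hz, wavelength in m) for a photon energy given in eV."""
    energy_joule = energy_ev * EV_TO_JOULE     # convert eV to J
    frequency = energy_joule / PLANCK_H        # Planck's relation: nu = E / h
    wavelength = SPEED_OF_LIGHT / frequency    # lambda = c / nu
    return frequency, wavelength


if __name__ == "__main__":
    nu, lam = photon_frequency_and_wavelength(5.9e-6)
    print(f"frequency  = {nu / 1e6:.0f} MHz")
    print(f"wavelength = {lam * 100:.1f} cm")
```

With these rounded inputs the frequency comes out slightly above 1.4 GHz and the wavelength close to 21 cm, in agreement with the example to the precision quoted there.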

Though this type of transition is extremely uncommon, about 90% of the interstellar medium consists of atomic and molecular hydrogen; this greatly increases the probability of observing 21-centimeter radiation. This radiation is able to penetrate dust clouds in space, and as frequencies around it are protected for radio astronomy, there is limited interference from Earth [15]. Because it is such an important frequency, some scientists believe it is a likely signal for extraterrestrial intelligence to send out in a communication attempt. As a result, SETI has radio telescopes searching for the 21-centimeter line around sun-like stars and is even broadcasting its own 21-centimeter wavelength signals.

22.4 will we find it? and when?

Over the past 50 years, great advances in technology and in our understanding of physics and astronomy have transformed the search for extraterrestrial intelligence from science fiction and Hollywood horror films into an actual science. Despite this, we may still be far from finding other life in the universe. If it does indeed exist, it may be nearly impossible to locate life among the billions of stars and galaxies and light years of space that surround us, especially if any intelligent life out there does not want to be found. The search has been likened to trying to find not a needle but a single atom of a needle in a haystack [74].

Yet there are still optimists who believe we might not be so far off after all. Notable astronomers including Carl Sagan and Frank Drake, who are searching through this cosmic haystack themselves, have predicted that extraterrestrial intelligence will be found between 2015 and 2026 [74]. SETI even offers a program called SETI@home, where anyone with a computer can donate idle computing time to help analyze radio telescope data and speed up the search. It seems reasonable to believe we are not alone in the universe; the real question, then, must be whether or not we will see evidence of other intelligence, and if so, when.

Figure 22.2: SETI uses radio telescopes to try to pick up signals from extraterrestrial life [71].

22.5 problems

1. Using the Drake Equation (Eq. (22.1)) and the given values
a) $R_S = 2.5$,
b) $f_P = 0.35$,
c) $n = 1$,
d) $100 < L < 10^{10}$,
e) $0.13 < f_L < 1$,
f) $f_I = 0.01$,
g) $f_C = 0.01$,
determine the most optimistic and most pessimistic values of $N_C$.

Solution: Using

$$N_C = R_S \, f_P \, n \, f_L \, f_I \, f_C \, L, \tag{22.8}$$

the most optimistic value would use the upper limits in the ranges, and the result would be

$$N_C = 2.5 \times 0.35 \times 1 \times 1 \times 0.01 \times 0.01 \times 10^{10} = 8.75 \times 10^{5}. \tag{22.9}$$

The most pessimistic value, using the lower limits, would be

$$N_C = 2.5 \times 0.35 \times 1 \times 0.13 \times 0.01 \times 0.01 \times 100 = 1.14 \times 10^{-3}. \tag{22.10}$$

However, this pessimistic estimate is much less than 1, indicating a lack of other intelligent civilizations to communicate with.
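Because the Drake Equation is a plain product of seven factors, the two limiting cases of this problem are easy to evaluate by machine. The Python sketch below is one way to do so; the function name `drake_estimate` and its parameter names are hypothetical, chosen only to mirror the symbols of Eq. (22.1), and the inputs are the values given in the problem statement.

```python
def drake_estimate(r_s, f_p, n, f_l, f_i, f_c, lifetime):
    """Evaluate N_C = R_S * f_P * n * f_L * f_I * f_C * L."""
    return r_s * f_p * n * f_l * f_i * f_c * lifetime


# Factors that are fixed in the problem statement.
fixed = dict(r_s=2.5, f_p=0.35, n=1, f_i=0.01, f_c=0.01)

# Upper limits of the ranges for f_L and L give the optimistic case,
# lower limits give the pessimistic case.
optimistic = drake_estimate(f_l=1.0, lifetime=1e10, **fixed)
pessimistic = drake_estimate(f_l=0.13, lifetime=100, **fixed)

if __name__ == "__main__":
    print(f"optimistic  N_C = {optimistic:.3g}")
    print(f"pessimistic N_C = {pessimistic:.3g}")
```

Keeping the fixed factors in one dictionary makes it simple to vary only $f_L$ and $L$, which are the two quantities the problem gives as ranges.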

2. Some astronomers have suggested that electromagnetic waves outside the radio and microwave regions might be as good a way or even a more promising way to succeed in the search for extraterrestrial intelligence [84]. If we were to search for signals with wavelengths between 750 nm and 10 µm, in what region of the electromagnetic spectrum would this be? What would be the corresponding frequencies for these wavelengths?

Solution: These wavelengths are in the infrared region. To find their frequencies, we use

$$\nu = \frac{c}{\lambda}. \tag{22.11}$$

The upper frequency is

$$\nu = \frac{3 \times 10^{8}\ \text{m/s}}{7.5 \times 10^{-7}\ \text{m}} = 4 \times 10^{14}\ \text{s}^{-1}, \tag{22.12}$$

and the lower frequency is

$$\nu = \frac{3 \times 10^{8}\ \text{m/s}}{1 \times 10^{-5}\ \text{m}} = 3 \times 10^{13}\ \text{s}^{-1}, \tag{22.13}$$

so $3 \times 10^{13}\ \text{s}^{-1} < \nu < 4 \times 10^{14}\ \text{s}^{-1}$.
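The same conversion can be applied to both edges of a wavelength band at once. The following Python sketch is a minimal illustration; the helper name `frequency_from_wavelength` is an assumption made here, and the speed of light is rounded as in the solution.

```python
SPEED_OF_LIGHT = 3.0e8  # m/s, rounded as in the solution


def frequency_from_wavelength(wavelength_m):
    """Return the frequency in Hz of light with the given vacuum wavelength in metres."""
    return SPEED_OF_LIGHT / wavelength_m


# Band edges from the problem: 750 nm and 10 micrometres, in metres.
band_edges_m = (750e-9, 10e-6)

# The longer wavelength gives the lower frequency, so sort the results.
lower_hz, upper_hz = sorted(frequency_from_wavelength(w) for w in band_edges_m)

if __name__ == "__main__":
    print(f"{lower_hz:.1e} Hz < nu < {upper_hz:.1e} Hz")
```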


22.6 multiple choice questions

1. What is an extrasolar planet?
a) A planet in our solar system that is outside the frost line
b) A planet outside our solar system orbiting around another star
c) A massive body that is not a true planet but exists in free space and does not orbit a star
d) A planet on which life exists
Answer: b)

2. How does SETI hope to find extraterrestrial life?
a) By intercepting radio signals from intelligent civilizations
b) By building a spaceship that can travel at close to the speed of light and sending it toward extrasolar planets in the Andromeda galaxy
c) By following up on all reports of UFO sightings in hopes that one of them is valid
d) By using radio telescopes to intercept government intelligence transmissions and expose the evidence of extraterrestrials that the government has been concealing
Answer: a)


amanda lund: extraterrestrial intelligence



So far, the only life we know of in the universe exists on Earth. However, given the vastness of the universe and the huge number of galaxies, stars, and planets within it, it is unlikely that we are alone. With our current technology there is no way to tell how much intelligent life exists outside our planet or where it is located; the Drake Equation is one way to estimate the number of intelligent civilizations, but it is based mainly on guesswork. The SETI Institute has set out to find life by attempting to eavesdrop on radio signals from other civilizations. One important wavelength in this search is the 21-centimeter line, which is produced by a rare transition between two hyperfine states of hydrogen. There is no way to know when or if we will find anything, though some optimistic astronomers predict a discovery as soon as 2015.



[1] The Eöt-Wash Group, Washington University, 2008. (Cited on page 148.)
[2] Geoff Andersen. The Telescope. Princeton University Press, 2007. (Cited on page 17.)
[3] Christoph Arndt. Information Measures. Springer, 2001. (Cited on page 98.)
[4] BBC. Freak wave. BBC 2, November 14 2002. (Cited on pages 47 and 53.)
[5] Carl M. Bender and Steven A. Orszag. Advanced Mathematical Methods for Scientists and Engineers. Springer, 1999. (Cited on page 29.)
[6] Bennet, Donahue, Schneider, and Voit. The Cosmic Perspective: Fourth Edition. Pearson Education Inc., San Francisco, CA, 2007. (Cited on pages 135, 136, 137, and 140.)
[7] Ben Best. Lessons for cryonics from metallurgy and ceramics. (Cited on pages 34 and 35.)
[8] B.I. Bleaney. Electricity and Magnetism, volume 1. Oxford University Press, Great Britain, 3rd edition, 1976. (Cited on page 4.)
[9] Laura Bloom. The Electromagnetic Spectrum. University of Colorado, 2008. Electromagnetic%20Spectrum.htm. (Cited on pages 4 and 7.)
[10] Christian Blum. Ant colony optimization. http://iridia. (Cited on page 109.)
[11] Stephen Brawer. Relaxation in Viscous Liquids and Glasses. The American Ceramics Society, 1985. (Cited on page 34.)
[12] Evelyn Brown. Ocean Circulation. Butterworth Heinemann, Milton Keynes, UK, 2nd edition, 2002. (Cited on page 47.)
[13] Edward Bryant. Tsunami-The Underrated Hazard. Cambridge University Press, Cambridge, UK, 2001. (Cited on page 47.)
[14] Laura Cadonati. private communication, 2008. (Cited on pages 64, 66, 67, 72, and 74.)
[15] Laura Cadonati. Hyperfine structure. Quantum Mechanics class notes, November 2008. (Cited on page 161.)
[16] Kenneth Chang. The nature of glass remains anything but clear, July 2008. New York Times. (Cited on pages 33 and 34.)




[17] Colloid. Colloid. colloid. (Cited on page 36.)
[18] Wikimedia Commons. Gravitational perturbation. Gravitational_perturbation.svg. (Cited on page 128.)
[19] Dennis Joseph Cowles. Kepler’s third law. (Cited on page 128.)
[20] Rod Cross. Increase in friction with sliding speed. American Journal of Physics, 73(9), 2005. (Cited on page 56.)
[21] J.M.A. Danby. Fundamentals of Celestial Mechanics. MacMillan, New York City, NY, first edition, 1962. (Cited on page 128.)
[22] Marco Dorigo. Ant colony optimization. be/~mdorigo/ACO/ACO.html, 2008. (Cited on pages 107 and 109.)
[23] Marco Dorigo. Ant colonies for the traveling salesman problem. Technical report, Université Libre de Bruxelles, 1996. (Cited on pages 111 and 113.)
[24] Marco Dorigo. Ant algorithms for discrete optimization. Technical report, Université Libre de Bruxelles, 1996. (Cited on pages 111 and 112.)
[25] Belle Dumé. Magnetic recording has a speed limit. http://, Apr 21, 2004. (Cited on page 87.)
[26] Albert Einstein. Relativity: The special and general theory. Henry Holt and Company, New York, 1920. Translated by Robert W. Lawson. (Cited on page 143.)
[27] SR Elliot. Physics of Amorphous Materials. Longman, 1984. (Cited on pages 33, 34, and 35.)
[28] B. N. J. Persson et al. On the origin of Amonton’s friction law. Journal of Physics: Condensed Matter, 20, 2008. (Cited on page 57.)
[29] H. Olsson et al. Friction models and friction compensation. European Journal of Control, 4(9), 1998. (Cited on pages 56 and 57.)
[30] Extrasolar Planets. Extrasolar planets. http://www.astro.keele. Image of an artist’s representation of an extrasolar planet taken from this site. (Cited on page 160.)
[31] Adalbert Feltz. Amorphous Inorganic Materials and Glasses. VCH, 1993. (Cited on page 33.)
[32] Richard Feynman, Robert Leighton, and Matthew Sands. The Feynman Lectures on Physics. Addison-Wesley, San Francisco, 1964. (Cited on page 19.)
[33] Ephraim Fischbach. Reanalysis of the Eötvös Experiment. 1986. (Cited on page 149.)
[34] Allan Franklin. The Rise and Fall of the Fifth Force. American Institute of Physics, New York, New York, 1993. (Cited on pages 148, 149, 150, and 151.)



[35] Carolyn Gordon, David L. Webb, and Scott Wolpert. One cannot hear the shape of a drum. Bulletin of the American Mathematical Society, 27:134–138, 1992. (Cited on page 29.)
[36] David J. Griffiths. Introduction to Electrodynamics. Prentice Hall, Upper Saddle River, New Jersey, 3rd edition, 1999. (Cited on pages 3, 5, 6, and 8.)
[37] David J. Griffiths. Introduction to Quantum Mechanics. Benjamin Cummings, San Francisco, second edition, 2004. (Cited on pages 61, 62, 64, and 132.)
[38] John Grue and Karsten Trulsen. Waves in Geophysical Fluids. SpringerWienNewYork, Italy, 2006. (Cited on page 47.)
[39] R. J. Hall. Interior structure of star prior to core collapse., December 2006. (Cited on pages 135 and 137.)
[40] R.V.L. Hartley. Transmission of information. Bell System Technical Journal, July:535–563, 1928. (Cited on page 98.)
[41] Sverre Haver. Freak wave event at draupner jacket. Statoil, 2005. (Cited on page 48.)
[42] Franz Himpsel. Nanoscale memory. http://uw.physics.wisc.edu/~himpsel/memory.html, 2008. (Cited on page 88.)
[43] Keigo Iizuka. Springer Series in Optical Sciences: Engineering Optics. Springer Science+Business Media, 2008. (Cited on page 13.)
[44] Vincent Ilardi. Renaissance Vision from Spectacles to Telescopes. American Philosophical Society, 2007. (Cited on page 13.)
[45] Mark Kac. Can one hear the shape of a drum? American Mathematical Monthly, 73:1–23, 1966. (Cited on pages 27 and 29.)
[46] Edward W. Kamen and Bonnie S. Heck. Fundamentals of Signals and Systems. Prentice Hall, Standford University, 2007. (Cited on pages 81 and 84.)
[47] Joseph E Kasper and Steven A Feller. The Complete Book of Holograms: How They Work and How to Make Them. Dover, New York, 1987. (Cited on pages 19, 20, 21, and 23.)
[48] Christian Kharif and Efim Pelinovsky. Physical mechanisms of the rogue wave phenomenon. European Journal of Mechanics, 22:603–634. (Cited on pages 52 and 53.)
[49] I. V. Lavrenov. The wave energy concentration at the agulhas current off south africa. Natural Hazards, 17:117–127, March 1998. (Cited on pages 50 and 51.)
[50] L. A. Leshin, N. Ouellette, S. J. Desch, and J. J. Hester. A nearby supernova injected short-lived radionuclides into our protoplanetary disk. Chondrites and the Protoplanetary Disk: ASP Conference Series, 341, 2005. (Cited on pages 137, 138, 139, and 140.)
[51] M. Limongi and A. Chieffi. The nucleosynthesis of $^{26}$Al and $^{60}$Fe in solar metallicity stars extending in mass from 11 to 120 solar masses. The Astrophysical Journal, 647:483–500, 2006. (Cited on pages 138 and 139.)



[52] Seth Lloyd. Ultimate physical limits to computation. Nature, 406:1047–1054, 2000. (Cited on page 102.)
[53] Jim Lochner and Meredith Gibb. Introduction to supernova remnants. snrs/snrstext.html, October 2007. (Cited on pages 135, 136, and 137.)
[54] Richard G. Lyons. Understanding Digital Signal Processing. Prentice-Hall, Univ. of California Santa Cruz, 2004. (Cited on pages 79, 80, and 83.)
[55] D. J. Des Marais and M. R. Walter. Astrobiology: Exploring the origins, evolution, and distribution of life in the universe. Annual Review of Ecology and Systematics, 30:397–420, 2008. (Cited on pages 159 and 160.)
[56] N. Margolus and L. B. Levitin. The maximum speed of dynamical evolution. Physica D, 120:188–195, 1998. (Cited on page 88.)
[57] Roy Mash. Big numbers and induction in the case for extraterrestrial intelligence. Philosophy of Science, 60:204–222, 1993. (Cited on page 160.)
[58] S.R. Massel. Ocean Surface Waves; their Physics and Prediction. World Scientific Publ., New Jersey, 1996. (Cited on page 49.)
[59] N. David Mermin. Boojums All the Way Through: Communicating Science in a Prosaic Age. Cambridge University Press, Cambridge, 1990. (Cited on page xi.)
[60] Kurt Nassau. The Physics and Chemistry of Color. John Wiley and Sons, Inc., New York, second edition, 2001. (Cited on page 9.)
[61] R. Nave. Nuclear reactions in stars. http://hyperphysics., 2005. (Cited on page 135.)
[62] Rod Nave. Maxwell’s Equations. Georgia State University, 2005. electric/maxeq2.html#c3. (Cited on page 4.)
[63] BBC News. Factfile: Hard disk drive. 2/hi/technology/6677545.stm, 21 May 2007. (Cited on page 89.)
[64] The Royal Swedish Academy of Sciences. The nobel prize in physics 2007. laureates/2007/info.pdf, 2007. (Cited on page 93.)
[65] H. Palme and A. Jones. Solar system abundances of the elements. Treatise on Geochemistry, 1:41–61, 2003. (Cited on page 138.)
[66] Karsten Peters, Anders Johansson, Audrey Dussutour, and Dirk Helbing. Analytical and numerical investigation of ant behavior under crowded conditions. World Scientific Publishing Company, Institute for Transport & Economics, Dresden University of Technology, Andreas-Schubert-Str. 23, 01062 Dresden, Germany, 2008. (Cited on pages 107 and 108.)
[67] Paul R. Pinet. Invitation to Oceanography. Jones and Bartlett Publishers, Sudbury, MA, 4th edition, 2006. (Cited on page 47.)



[68] IBM Research. The giant magnetoresistive head: A giant leap for ibm research. html, 2008. (Cited on pages 91 and 93.)
[69] C. Patrick Royall, Esther C. M. Vermolen, Alfons van Blaaderen, and Hajime Tanaka. Controlling competition between crystallization and glass formation in binary colloids with an external field. Journal of Physics: Condensed Matter, 20(40), Oct 8 2008. ISSN 0953-8984. doi: 10.1088/0953-8984/20/40/404225. 2nd Conference on Colloidal Dispersions in External Fields, Bonn Bad Godesberg, Germany, Mar 31-Apr 02, 2008. (Cited on pages 36 and 37.)
[70] Francis W. Sears and Mark W. Zemansky. University Physics. Addison-Wesley Publishing Company, Reading, Massachusetts, second edition, 1955. (Cited on page 10.)
[71] SETI. Seti radio telescopes. uncategorized/2008/06/09/setiscopes.jpg. Image of SETI radio telescopes taken from this site. (Cited on page 162.)
[72] C. E. Shannon. A mathematical theory of communication. Bell System Technical Journal, 27:379–423, 623–656, 1948. (Cited on page 98.)
[73] S. L. Shapiro and Saul A. Teukolsky. Black Holes White Dwarfs, and Neutron Stars. Wiley-Interscience, 1983. (Cited on page 131.)
[74] Seth Shostak. When will we find the extraterrestrials? (And What Will We Do When We Find Them?). lecture at the University of Massachusetts Amherst, October 2008. (Cited on page 162.)
[75] Frank H. Shu. The Physical Universe: An Introduction to Astronomy. University Science Books, 1982. (Cited on page 160.)
[76] Richard Slansky, Stuart Raby, Terry Goldman, and Gerry Garvey. The oscillating neutrino: An introduction to neutrino masses and mixings. Los Alamos Science, 25:28–63, 1997. (Cited on pages 65, 66, 72, 73, and 74.)
[77] Tom Standage. The Neptune File: A Story of Astronomical Rivalry and the Pioneers of Planet Hunting. Walker Publishing Company, Inc., New York, NY, 2000. (Cited on page 127.)
[78] Boris Svistunov. private communication, 2008. (Cited on page 64.)
[79] S. Tachibana, G. R. Huss, N. T. Kita, H. Shimoda, and Y. Morishita. The abundances of iron-60 in pyroxene chondrules from unequilibrated ordinary chondrites. Lunar and Planetary Science, XXXVI:1529–1530, 2005. (Cited on page 139.)
[80] R.J. Tayler. The Stars: their structure and evolution. Cambridge, 1994. (Cited on page 133.)
[81] Ker Than. Monster busts theory. category/science, October 2007. (Cited on page 138.)
[82] D. E. Thomsen. Sifting the cosmic haystack for aliens. Science News, 121:374, 1982. (Cited on page 160.)



[83] J. J. Tobin, L. W. Looney, and B. D. Fields. Radioactive Probes of the Supernova-Contaminated Solar Nebula: Evidence that the Sun was Born in a Cluster. Faculty Publication, Univ. of Illinois at Urbana-Champaign, 2006. (Cited on page 138.)
[84] C. H. Townes. At what wavelengths should we search for signals from extraterrestrial intelligence? Proceedings of the National Academy of Sciences of the United States of America, 80:1147–1151, 2008. (Cited on pages 160 and 163.)
[85] Evgeny Y. Tsymbal. Giant magnetoresistance. http://physics., 2008. (Cited on page 94.)
[86] Scott Turner. Stigmergic building. turner/termite/stigmergicbuilding.html. (Cited on pages 107 and 108.)
[87] Syracuse University. Introducing the einstein principle of equivalence. equivalence.html, November 2008. (Cited on page 144.)
[88] Martinus Veltman. Facts and Mysteries in Elementary Particle Physics. World Scientific Publishing Company, Sassari, 2003. (Cited on pages 62, 63, and 65.)
[89] Nelson Wallace. Rainbows: How does a rainbow work? // (Cited on page 12.)
[90] Martin Waugh. Liquid sculpture. http://www.liquidsculpture.com/fine_art/image.htm?title=untitled_001, 2006. (Cited on page 39.)
[91] Paul S. Wesson. Five-dimensional Physics: Classical and Quantum Consequences of Kaluza-Klein Cosmology. World Scientific, 2006. (Cited on page 148.)
[92] Paul S. Wesson. Five-dimensional Physics: Classical and Quantum Consequences of Kaluza-Klein Cosmology. World Scientific, 2006. (Cited on page 143.)
[93] Robert J. Whitaker. Physics of the rainbow. Physics Teacher, 17:283–286, May 1974. (Cited on pages 10 and 11.)
[94] William Sheehan, Nicholas Kollerstrom, and Craig B. Waff. The case of the pilfered planet. Scientific American, page 98, 2004. (Cited on page 126.)
[95] R. Wood, Y. Hsu, and M. Schultz. Perpendicular magnetic recording technology. Hitachi Global Storage Technologies White Paper, November 2007. (Cited on pages 91 and 92.)
[96] A.M. Worthington. A Study of Splashes. MacMillan, New York US, 1963. (Cited on page 39.)
[97] Lei Xu, Wendy W. Zhang, and Sidney R. Nagel. Drop splashing on a dry smooth surface. Phys. Rev. Lett., 94, 2005. (Cited on page 39.)



[98] Lei Xu, Loreto Barcos, and Sidney R. Nagel. Splashing of liquids: Interplay of surface roughness with surrounding gas. Phys. Rev., 76, 2007. (Cited on page 39.)
[99] Hugh D. Young and Roger A. Freedman. University Physics. Addison-Wesley, 2004. (Cited on pages 15 and 16.)
[100] Hugh D Young and Roger A Freedman. University Physics, 11th Edition. Addison-Wesley, San Francisco, 2005. (Cited on page 19.)
[101] Hugh D. Young and Roger A. Freedman. University Physics. Pearson Education, Inc., San Francisco, CA, 11th edition, 2004. (Cited on page 125.)

