
San Francisco State University

Department of Physics and Astronomy


August 6, 2015

Vector Spaces in Physics

Notes

for

Ph 385: Introduction to
Theoretical Physics I

R. Bland
TABLE OF CONTENTS
Chapter 1. Vectors
A. The displacement vector.
B. Vector addition.
C. Vector products.
1. The scalar product.
2. The vector product.
D. Vectors in terms of components.
E. Algebraic properties of vectors.
1. Equality.
2. Vector Addition.
3. Multiplication of a vector by a scalar.
4. The zero vector.
5. The negative of a vector.
6. Subtraction of vectors.
7. Algebraic properties of vector addition.
F. Properties of a vector space.
G. Metric spaces and the scalar product.
1. The scalar product.
2. Definition of a metric space.
H. The vector product.
I. Dimensionality of a vector space and linear independence.
J. Components in a rotated coordinate system.
K. Other vector quantities.
Chapter 2. The special symbols δᵢⱼ and εᵢⱼₖ, the Einstein summation
convention, and some group theory.
A. The Kronecker delta symbol, δᵢⱼ
B. The Einstein summation convention.
C. The Levi-Civita totally antisymmetric tensor.
Groups.
The permutation group.
The Levi-Civita symbol.
D. The cross Product.
E. The triple scalar product.
F. The triple vector product.
The epsilon killer.
Chapter 3. Linear equations and matrices.
A. Linear independence of vectors.
B. Definition of a matrix.
C. The transpose of a matrix.
D. The trace of a matrix.
E. Addition of matrices and multiplication of a matrix by a scalar.
F. Matrix multiplication.
G. Properties of matrix multiplication.
H. The unit matrix
I. Square matrices as members of a group.
J. The determinant of a square matrix.
K. The 3x3 determinant expressed as a triple scalar product.
L. Other properties of determinants
Product law
Transpose law
Interchanging columns or rows
Equal rows or columns
M. Cramer's rule for simultaneous linear equations.
N. Condition for linear dependence.
O. Eigenvectors and eigenvalues

Chapter 4. Practical examples


A. Simple harmonic motion - a review
B. Coupled oscillations - masses and springs.
A system of two masses.
Three interconnected masses.
Systems of many coupled masses.
C. The triple pendulum

Chapter 5. The inverse; numerical methods


A. The inverse of a square matrix.
Definition of the inverse.
Use of the inverse to solve matrix equations.
The inverse matrix by the method of cofactors.
B. Time required for numerical calculations.
C. The Gauss-Jordan method for solving simultaneous linear equations.
D. The Gauss-Jordan method for inverting a matrix.

Chapter 6. Rotations and tensors


A. Rotation of axes.
B. Some properties of rotation matrices.
Orthogonality
Determinant
C. The rotation group.
D. Tensors
E. Coordinate transformation of an operator on a vector space.
F. The conductivity tensor.
G. The inertia tensor.

Chapter 6a. Space-time four-vectors.


A. The origins of special relativity.
B. Four-vectors and invariant proper time.
C. The Lorentz transformation.
D. Space-time events.
E. The time dilation.
F. The Lorentz contraction.
G. The Maxwell field tensor.

Chapter 7. The Wave Equation
A. Qualitative properties of waves on a string.
B. The wave equation.
Partial derivatives.
Wave velocity.
C. Sinusoidal solutions.
D. General traveling-wave solutions.
E. Energy carried by waves on a string.
Kinetic energy.
Potential energy.
F. The superposition principle.
G. Group and phase velocity.
Chapter 8. Standing Waves on a String
A. Boundary Conditions and Initial Conditions
String fixed at a boundary.
Boundary between two different strings.
B. Standing waves on a string.
Chapter 9. Fourier Series
A. The Fourier sine series.
The general solution.
Initial conditions.
Orthogonality.
Completeness.
B. The Fourier sine-cosine Series.
Odd and even functions.
Periodic functions in time.
C. The exponential Fourier series.
Chapter 10. Fourier Transforms and the Dirac Delta Function
A. The Fourier transform.
B. The Dirac delta function δ(x).
The rectangular delta function.
The Gaussian delta function.
Properties of the delta function.
C. Application of the Dirac delta function to Fourier transforms.
Basis states.
Functions of position x.
D. Relation to quantum mechanics.
Chapter 11. Maxwell's Equations in Special Relativity

Appendix A. Useful mathematical facts and formulae.


A. Complex numbers.
B. Some integrals and identities
C. The small-angle approximation.
D. Mathematical logical symbols.
Appendix B. Using the P&A Computer System
1. Logging on.
2. Running MatLab, Mathematica and IDL.
3. Mathematica.

Appendix C. Mathematica
1. Calculation of the vector sum using Mathematica.
2. Matrix operations in Mathematica
3. Speed test for Mathematica.

References


Chapter 1. Vectors
We are all familiar with the distinction between things which have a direction and those
which don't. The velocity of the wind (see figure 1-1) is a classical example of a vector
quantity. There are many more of interest in physics, and in this and subsequent chapters
we will try to exhibit the fundamental properties of vectors.

Figure 1-1. Where is the vector?
Vectors are intimately related to the very nature of space. Euclidean geometry (plane
and spherical geometry) was an early way of describing space. All the basic concepts of
Euclidean geometry can be expressed in terms of angles and distances. A more recent
development in describing space was the introduction by Descartes of coordinates along
three orthogonal axes. The modern use of Cartesian vectors provides the mathematical
basis for much of physics.

A. The Displacement Vector


The preceding discussion did not lead to a definition of a vector. But you can convince
yourself that all of the things we think of as vectors can be related to a single fundamental
quantity, the vector r representing the displacement from one point in space to another.
Assuming we know how to measure distances and angles, we can define a displacement
vector (in two dimensions) in terms of a distance (its magnitude) and an angle:

    r₁₂ ≡ (displacement from point 1 to point 2)
        = (distance, angle measured counterclockwise from due East)   (1-1)

(See figure 1-2.) Note that to a given pair of points there corresponds a unique displacement,
but a given displacement can link many different pairs of points. Thus the fundamental
definition of a displacement gives just its magnitude and angle.
We will use the definition above to discuss certain properties of vectors from a strictly
geometrical point of view. Later we will adopt the coordinate representation of vectors
for a more general and somewhat more abstract discussion of vectors.


Figure 1-2. A vector, specified by giving a distance and an angle.

B. Vector Addition
A quantity related to the displacement vector is the position vector for a point.
Positions are not absolute – they must be measured relative to a reference point. If we
call this point O (the "origin"), then the position vector for point P can be defined as
follows:
    r_P ≡ displacement from point O to point P   (1-2)

It seems reasonable that the displacement from point 1 to point 2 should be expressed in
terms of the position vectors for 1 and 2. We are tempted to write

    r₁₂ = r₂ − r₁   (1-3)
A "difference law" like this is certainly valid for temperatures, or even for distances along
a road, if 1 and 2 are two points on the road. But what does subtraction mean for
vectors? Do you subtract the lengths and angles, or what? When are two vectors equal?
In order to answer these questions we need to systematically develop the algebraic
properties of vectors.
We will let A, B, C, etc. represent vectors. For the moment, the only vector
quantities we have defined are displacements in space. Other vector quantities which we
will define later will obey the same rules.

Definition of Vector Addition. The sum of two vector displacements can be defined so
as to agree with our intuitive notions of displacements in space. We will define the sum
of two displacements as the single displacement which has the same effect as carrying out
the two individual displacements, one after the other. To use this definition, we need to
be able to calculate the magnitude and angle of the sum vector. This is straightforward
using the laws of plane geometry. (The laws of geometry become more complicated in
three dimensions, where the coordinate representation is more convenient.)
Let A and B be two displacement vectors, each defined by giving its length and angle:

    A = (A, θ_A),
    B = (B, θ_B).   (1-4)


Here we follow the convention of using the quantity A (without an arrow over it) to
represent the magnitude of A; and, as stated above, angles are measured
counterclockwise from the easterly direction. Now imagine points 1, 2, and 3 such that
A represents the displacement from 1 to 2, and B represents the displacement from 2 to
3. This is illustrated in figure 1-3.
Figure 1-3. Successive displacements A and B.

Definition: The sum of two given displacements A and B is the third
displacement C which has the same effect as making displacements A and
B in succession.

It is clear that the sum C exists, and we know how to find it. An example is shown in
figure 1-4 with two given vectors A and B and their sum C. It is fairly clear that the
length and angle of C can be determined (using trigonometry), since for the triangle
1-2-3, two sides and the included angle are known. The example below illustrates this
calculation.
Example: Let A and B be the two vectors shown in figure 1-4: A = (10 m, 48°),
B = (14 m, 20°). Determine the magnitude and angle from due East of their
sum C, where C = A + B. The angle opposite side C can be calculated as
shown in figure 1-4; the result is that θ₂ = 152°. Then the length of side C can be
calculated from the law of
Figure 1-4. Example of vector addition. Each vector's direction is measured counterclockwise from due
East. Vector A is a displacement of 10 m at an angle of 48° and vector B is a displacement of 14 m at an
angle of 20°.

cosines:

    C² = A² + B² − 2AB cos θ₂

giving

    C = [(10 m)² + (14 m)² − 2(10 m)(14 m) cos 152°]^(1/2) = 23.3072 m.

The angle θ₁ can be calculated from the law of sines:

    sin θ₁ / B = sin θ₂ / C

giving

    θ₁ = sin⁻¹(0.28200) = 16.38°.

The angle θ_C is then equal to 48° − θ₁ = 31.62°. The result is thus

    C = (23.3072 m, 31.62°).
One conclusion to be drawn from the previous example is that calculations using the
geometrical representation of vectors can be complicated and tedious. We will soon see
that the component representation for vectors simplifies things a great deal.
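The component representation also makes this example easy to check numerically. Here is a
minimal Mathematica sketch (the helper name polarSum is ours for illustration; it is not the
Vsum function of Appendix C): it converts each (magnitude, angle) pair to components, adds
them, and converts back.

polarSum[{a_, ta_}, {b_, tb_}] := Module[{x, y},
  (* convert each (magnitude, angle-in-degrees) pair to components, add, convert back *)
  x = a Cos[ta Degree] + b Cos[tb Degree];
  y = a Sin[ta Degree] + b Sin[tb Degree];
  {Sqrt[x^2 + y^2], ArcTan[x, y]/Degree}]

N[polarSum[{10, 48}, {14, 20}]]
(* {23.3072, 31.6202}, in agreement with the law-of-cosines result above *)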

C. Product of Two Vectors


Multiplying two scalars together is a familiar and useful operation. Can we do the
same thing with vectors? Vectors are more complicated than scalars, but there are two
useful ways of defining a vector product.

The Scalar Product. The scalar product, or dot product, combines two vectors to give a
scalar:
    A · B ≡ A B cos(θ_B − θ_A)   (1-5)


This simple definition is illustrated in figure 1-5. One special property of the dot product
is its relation to the length A of a vector A:

    A · A = A²   (1-6)

This in fact can be taken as the definition of the length, or magnitude, A.

An interesting and useful type of vector is a unit vector, defined as a vector of length 1.
We usually write the symbol for a unit vector with a hat over it instead of a vector symbol:
û. Unit vectors always have the property

    û · û = 1   (1-7)

Figure 1-5. Vectors A and B. Their dot product is equal to A B cos(θ_B − θ_A). Their cross
product A × B has magnitude A B sin(θ_B − θ_A), directed out of the paper.
Another use of the dot product is to define orthogonality of two vectors. If the angle
between the two vectors is 90°, they are usually said to be perpendicular, or orthogonal.
Since the cosine of 90° is equal to zero, we can equally well define orthogonality of two
vectors using the dot product:

    A ⊥ B ⟺ A · B = 0   (1-8)

Example. One use of the scalar product in physics is in calculating the work
done by a force F acting on a body while it is displaced by the vector
displacement d. The work done depends on the distance moved, but in a special
way which projects out the distance moved in the direction of the force:

    Work = F · d   (1-9)

Similarly, the power produced by the motion of an applied force F whose point
of application is moving with velocity v is given by

    Power = F · v   (1-10)

In both of these cases, two vectors are combined to produce a scalar.

Example. To find the component of a vector A in a direction given by the unit
vector n̂, take the dot product of the two vectors:

    (Component of A along n̂) = A · n̂   (1-11)
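As a quick illustration (the vectors here are chosen arbitrarily, not taken from the text),
this projection is a one-liner in Mathematica:

a = {3, 4, 0};            (* an arbitrary vector A *)
n = {1, 1, 0}/Sqrt[2];    (* an arbitrary unit vector n *)
a . n                     (* the component of A along n; here 7/Sqrt[2] *)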

The Vector Product. The vector product, or cross product, is considerably more
complicated than the scalar product. It involves the concept of left and right, which has
an interesting history in physics. Suppose you are trying to explain to someone on a
distant planet which side of the road we drive on in the USA, so that they could build a
car, bring it to San Francisco and drive around. Until the 1950s, it was thought that there
was no way to do this without referring to specific objects which we arbitrarily designate
as left-handed or right-handed. Then it was shown, by Madame Wu, that both the
electron and the neutrino are intrinsically left-handed! This permits us to tell the alien
how to determine which is her right hand. "Put a sample of 60Co nuclei in front of you,
on a mount where it can rotate freely about a vertical axis. Orient the nuclei in a


magnetic field until the majority of the decay electrons go downwards. The sample will
gradually start to rotate so that the edge nearest you moves to the right. This is said to be
a right-handed rotation about the vertically upward axis." The reason this works is that
the magnetic field aligns the cobalt nuclei vertically, and the subsequent nuclear decays
emit electrons preferentially in the opposite direction to the nuclear spin. (Cobalt-60
decays into nickel-60 plus an electron and an anti-electron neutrino,
    ⁶⁰Co → ⁶⁰Ni + e⁻ + ν̄_e   (1-12)
See the Feynman Lectures for more information on this subject.) Now you can just tell
the alien, "We drive on the right." (Hope she doesn't land in Australia.)

Figure 1-6. Illustration of a cross product. Can you prove that if A is along the x-axis,
then A × B is in the y-z plane?
This lets us define the cross product of two vectors A and B as shown in figure 1-5.
The cross product of these two vectors is a third vector C, with magnitude

    C = |A B sin(θ_B − θ_A)|,   (1-13)

perpendicular to the plane containing A and B, and in the sense "determined by the
right-hand rule." This last phrase, in quotes, is shorthand for the following operational
definition: Place A and B so that they both start at the same point. Choose a third
direction perpendicular to both A and B (so far, there are two choices), and call it the
upward direction. If, as A rotates towards B, it rotates in a right-handed direction, then
this third direction is the direction of C.

Example. The Lorentz Force is the force exerted on a charged particle due to
electric and magnetic fields. If the particle's charge is given by q and it is moving
with velocity v, in electric field E and magnetic field B, the force F is given by

    F = qE + qv × B   (1-14)
The second term is an example of a vector quantity created from two other
vectors.


Example. The cross product is used to find the direction of the third axis in a
three-dimensional space. Let û and v̂ be two orthogonal unit vectors,
representing the first (or x) axis and the second (or y) axis, respectively. A unit
vector ŵ in the correct direction for the third (or z) axis of a right-handed
coordinate system is found using the cross product:
    ŵ = û × v̂   (1-15)
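For the standard basis this is easy to verify with Mathematica's built-in Cross function
(a quick check of our own, not part of the original text):

Cross[{1, 0, 0}, {0, 1, 0}]
(* {0, 0, 1}: x-hat cross y-hat gives z-hat, a right-handed triple *)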

D. Vectors in Terms of Components

Until now we have discussed vectors from a purely geometrical point of view. There is
another representation, in terms of components, which makes both theoretical analysis
and practical calculations easier. It is a fact about the space that we live in that it is
possible to find three, and no more than three, vectors that are mutually orthogonal. (This
is the basis for calling our space three dimensional.) Descartes first introduced the idea
of measuring position in space by giving a distance along each of three such vectors. A
Cartesian coordinate system is determined by a particular choice of these three vectors.
In addition to requiring the vectors to be mutually orthogonal, it is convenient to take
each one to have unit length.

A set of three unit vectors defining a Cartesian coordinate system can be chosen as
follows. Start with a unit vector î in any direction you like. Then choose any second
unit vector ĵ which is perpendicular to î. As the third unit vector, take k̂ = î × ĵ.
These three unit vectors (î, ĵ, k̂) are said to be orthonormal. This means that they are
mutually orthogonal, and normalized so as to be unit vectors. We will often refer to their
directions as the x, y, and z directions. We will also sometimes refer to the three vectors
as (ê₁, ê₂, ê₃), especially when we start writing sums over the three directions.

Suppose that we have a vector A and three orthogonal unit vectors (î, ĵ, k̂), all defined
as in the previous sections by their length and direction. The three unit vectors can be
used to define vector components of A, as follows:

    Ax = A · î,
    Ay = A · ĵ,   (1-16)
    Az = A · k̂.

This suggests that we can start a discussion of vectors from a component view, by simply
defining vectors as triplets of scalar numbers:

    A = (Ax, Ay, Az)   Component Representation of Vectors   (1-17)
It remains to prove that this definition is completely equivalent to the geometrical
definition, and to define vector addition and multiplication of a vector by a scalar in terms
of components.


Let us show that these two ways of specifying a vector are equivalent - that is, to each
geometrical vector (magnitude and direction) there corresponds a single set of
components, and (conversely) to each set of components there corresponds a single
geometrical vector. The first assertion follows from the relation (1-16), showing how to
determine the triplet of components for any given geometrical vector. The dot product of
any two vectors exists, and is unique.

The converse is demonstrated in Figure 1-7. It is seen that the vector A can be written as
the sum of three vectors proportional to its three components:

    A = îAx + ĵAy + k̂Az   (1-18)

From the diagram it is clear that, given three components, there is just one such sum. So,
we have established the equivalence

Figure 1-7. Illustration of the addition of the component vectors îAx, ĵAy, and k̂Az to get the vector A.
This proves that a given set of values for (Ax, Ay, Az) leads to a unique vector A in the geometrical picture.

    A = (magnitude, direction) ⟷ (Ax, Ay, Az)   (1-19)

E. Algebraic Properties of Vectors.

As a warm-up, consider the familiar algebraic properties of scalars. This provides a road
map for defining the analogous properties for vectors.
Equality.


ab  ba
a  b and b  c  a  c
Addition and multiplication of scalars.
ab  ba
a  (b  c)  (a  b)  c  a  b  c
ab  ba
a(bc)  (ab)c  abc
a(b  c)  ab  ac
Zero, negative numbers.
a0  a
a  (a)  0
No surprises there.

Equality. We will say that two vectors are equal, meaning that they are really the same
vector, if all three of their components are equal:
    A = B ⟺ Ax = Bx, Ay = By, Az = Bz.   (1-20)

The commutative property, A = B ⟺ B = A, and the transitive property,
A = B and B = C ⟹ A = C, follow immediately, since components are scalars.

Vector Addition. We will adopt the obvious definition of vector addition using
components:
    C = A + B  ⟺  Cx = Ax + Bx,  Cy = Ay + By,  Cz = Az + Bz   DEFINITION (1-21)

That is to say, the components of the sum are the sums of the components of the vectors
being added. It is necessary to show that this is in fact the same definition as the one we
introduced for geometrical vectors. This can be seen from the geometrical construction
shown in figure 1-7a.


Figure 1-7a. The components of the sum vector C are seen to be the algebraic sum of the
components of the two vectors being summed.
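In Mathematica, where component vectors are ordinary lists, definition (1-21) is just list
addition (components made up for illustration):

{1, 2, 3} + {4, 5, 6}
(* {5, 7, 9}: each component of the sum is the sum of the corresponding components *)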

Multiplication of a vector by a scalar. We will take the following, rather obvious,
definition of multiplication of a vector A by a scalar c:

    cA = (cAx, cAy, cAz)   DEFINITION (1-23)
It is pretty clear that this is consistent with the procedure in the geometrical
representation: multiply the length by |c|, leaving the angle unchanged (or reversing the
direction, if c is negative).

The Zero Vector. We define the zero vector as follows:

    0 = (0, 0, 0)   DEFINITION (1-24)

Taken with the definition of vector addition, it is clear that the essential relation A + 0 = A


is satisfied. And a vector with all zero-length components certainly fills the bill as the
geometrical version of the zero vector.

The Negative of a Vector. The negative of a vector in terms of components is also easy
to guess:

    −A = (−Ax, −Ay, −Az)   DEFINITION (1-26)

The essential relation A + (−A) = 0 will clearly be satisfied, in terms of components. It is
also easy to prove that this corresponds to the geometrical vector with the direction
reversed; we omit this proof.

Subtraction of Vectors. Subtraction is then defined by

    A − B ≡ A + (−B)   subtraction of vectors   (1-27)

That is, to subtract a vector from another one, just add the vector's negative. The
"vector-subtraction parallelogram" for two vectors A and B is shown in figure 1-8. The
challenge is to choose the directions of A and B such that the diagonal correctly
represents head-to-tail addition of the vectors on the sides.

Figure 1-8. The vector-subtraction parallelogram. Can you put arrows on the sides of the
parallelogram so that both triangles read as correct vector-addition equations?

Algebraic Properties of Vector Addition. Vectors follow algebraic rules similar to
those for scalars:
A B  B  A commutative property of vector addition
A  ( B  C )  ( A  B)  C ) associative property of vector addition
 
a A  B  aA  aB distributive property of scalar multiplication (1-28)

 a  b  A  aA  aB another distributive property


c  dA    cd  A associative property of scalar multiplication
In the case of geometrically defined vectors, these properties are not so easy to prove,
especially the second one. But for component vectors they all follow immediately (see
the problems). And so they must be correct also for geometrical vectors.

As an illustration, we will prove the distributive law of scalar multiplication, above, for
component vectors. We use only properties of component vectors.


    a(A + B) = a(Ax + Bx, Ay + By, Az + Bz)              definition of addition of vectors
             = (a(Ax + Bx), a(Ay + By), a(Az + Bz))      definition of multiplication by a scalar
             = (aAx + aBx, aAy + aBy, aAz + aBz)         distributive property of scalar multiplication
             = (aAx, aAy, aAz) + (aBx, aBy, aBz)         definition of addition of vectors
             = a(Ax, Ay, Az) + a(Bx, By, Bz)             definition of multiplication of a vector by a scalar
             = aA + aB.   QED
The proofs of the other four properties are similar.
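The same law can also be verified symbolically in Mathematica, since operations on lists act
component by component (a sketch of our own, with Ax, Bx, etc. as arbitrary symbols):

Expand[a ({Ax, Ay, Az} + {Bx, By, Bz})] == Expand[a {Ax, Ay, Az} + a {Bx, By, Bz}]
(* True: the two sides expand to identical components *)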

F. Properties of a Vector Space.

Vectors are clearly important to physicists (and astronomers), but the simplicity and
power of representing quantities in terms of Cartesian components is such that vectors
have become a sort of mathematical paradigm. So, we will look in more detail at their
abstract properties, as members of a Vector Space.

In Table 1-1 we give a summary of the basic properties which a set of objects must have
to constitute a vector space.


A vector space is a set of objects, called vectors, with the operations of addition of
two vectors and multiplication of a vector by a scalar defined, satisfying the
following properties.

1. Closure under addition. If A and B are both vectors, then so is C = A + B.

2. Closure under scalar multiplication. If A is a vector and d is a scalar, then
B = dA is a vector.

3. Existence of a zero. There exists a zero vector 0, such that, for any vector A,
A + 0 = A.

4. Existence of a negative. For any vector A there exists a negative −A, such that
A + (−A) = 0.

5. Algebraic properties. Vector addition and scalar multiplication satisfy the
following rules:

    A + B = B + A                 (1-29)  commutative
    A + (B + C) = (A + B) + C     (1-30)  associative
    a(A + B) = aA + aB            (1-31)  distributive
    (a + b)A = aA + bA            (1-32)  distributive
    c(dA) = (cd)A                 (1-33)  associative

Table 1-1. Properties of a vector space.

Notice that in the preceding box, vectors are not specifically defined. Nor is the method
of adding them specified. We will see later that there are many different classes of
objects which can be thought of as vectors, not just displacements or other three-
dimensional objects.

Example: Check that the set of all component vectors, defined as triplets of real
numbers, does in fact satisfy all the requirements to constitute a vector space.

Referring to Table 1-1, it is easy to see that the first four properties of a vector
space are satisfied:
1. Closure under addition. If A = (Ax, Ay, Az) and B = (Bx, By, Bz) are both
vectors, then so is C = A + B = (Ax + Bx, Ay + By, Az + Bz). This follows from
the fact that the sum of two scalars gives another scalar.



2. Closure under scalar multiplication. If A is a vector and d is a scalar, then
B = dA = (dAx, dAy, dAz) is a vector. This follows from the fact that the product
of two scalars gives another scalar.

3. Zero. There exists a zero vector 0 = (0, 0, 0) such that, for any vector A,
A + 0 = (Ax + 0, Ay + 0, Az + 0) = A. This follows from the addition-of-zero
property for scalars.

4. Negative. For any vector A there exists a negative −A = (−Ax, −Ay, −Az), such
that A + (−A) = 0. Adding components gives zero for the components of the sum.

5. The algebraic properties (1-29) through (1-33) were discussed above;
they are satisfied for component vectors.

So, all the requirements for a vector space are satisfied by component vectors.
This had better be true! The whole point of vector spaces is to generalize from
component vectors in three-dimensional space to a broader category of
mathematical objects that are very useful in physics.

Example: The properties above have clearly been chosen so that the usual
definition of vectors, including how to add them, satisfies these conditions. But
the concept of a vector space is intended to be more general. What if we define
vectors in two dimensions geometrically (having a magnitude and an angle) and
we keep multiplication by a scalar the same, but we redefine vector addition in the
following way.
    C = A + B ≡ (A + B, θ_A + θ_B)   (1-34)

This might look sort of reasonable, if you didn't know better. Which of the
properties (1)-(5) in Table 1-1 are satisfied?
1. Closure under addition: OK. A + B is an acceptable magnitude, and θ_A + θ_B
is an acceptable angle. (Angles greater than 360° are wrapped around.)
2. Closure under scalar multiplication: OK.
3. Zero: the vector 0 = (0, 0) works fine; adding it on doesn't change A.
4. Negative: Not OK. There is no way to add two positive magnitudes
(magnitudes are non-negative) to get zero.
5. Algebraic properties: You can easily show that these are all satisfied.


Conclusion: With this definition of vector addition, this is not a vector space.

G. Metric Spaces and the Scalar Product

The vector space as we have just defined it lacks something important. Thinking of
displacements, the length of a displacement, measuring the distance between two points,
is essential to describing space. So we want to add a way of assigning a magnitude to a
vector. This is provided by the scalar product.

The scalar product. The components of two vectors can be combined to give a scalar as
follows:

    A · B ≡ AxBx + AyBy + AzBz   DEFINITION (1-40)

It is easy to show that this result follows from the representation of A and B in terms of
the three unit vectors of the Cartesian coordinate system:

    A · B = (îAx + ĵAy + k̂Az) · (îBx + ĵBy + k̂Bz) = AxBx + AyBy + AzBz,

where we have used the orthonormality of the unit vectors: î·î = ĵ·ĵ = k̂·k̂ = 1 and
î·ĵ = î·k̂ = ĵ·k̂ = 0. From the above definition in terms of components, it is easy to
demonstrate the following algebraic properties for the scalar product:

    A · B = B · A   (1-13)
    A · (B + C) = A · B + A · C   (1-14)
    A · (aB) = a(A · B)   (1-15)

The inner product of a displacement with itself can then be used to define a distance
between points 1 and 2:

    r₁₂ = √(r₁₂ · r₁₂) = √(x₁₂² + y₁₂² + z₁₂²)   (1-41)

From this expression we see that the distance between two points will not be zero unless
they are in fact the same point.
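A small numerical illustration in Mathematica, with vectors chosen arbitrarily; Dot
(written with a period) and Norm implement (1-40) and (1-41):

a = {1, 2, 2}; b = {2, -1, 0};   (* arbitrary component vectors *)
a . b       (* 0: these two happen to be orthogonal *)
Norm[a]     (* 3: the length Sqrt[a . a] *)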

The scalar product can be used to define the direction cosines of an arbitrary vector,
with respect to a set of Cartesian coordinate axes. The direction cosines α, β, and γ of a
vector A are defined as follows (see figure 1-9):

    cos α = (1/A) A · î = Ax/A,
    cos β = (1/A) A · ĵ = Ay/A,   (1-42)
    cos γ = (1/A) A · k̂ = Az/A.

Figure 1-9. The direction cosines for the vector A are the cosines of the three angles shown
(the angles between A and the three coordinate axes).

Specifying these three values is one way of giving the direction of a vector. However,
only two angles are required to specify a direction in space, so these three angles must
not be independent. It can be shown (see problems) that

    cos²α + cos²β + cos²γ = 1   (1-43)
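A quick numerical check of (1-43) in Mathematica, for an arbitrarily chosen vector (an
illustration, not a proof):

a = {1, 2, 2};               (* an arbitrary vector *)
Total[(a/Norm[a])^2]         (* 1: the squared direction cosines sum to one *)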

Definition of a Metric Space. The properties given in table 1-1 constitute the standard
definition of a vector space. However, inclusion of a scalar product turns a vector space
into the much more useful metric space, as defined in table 1-2. The difference between
a vector space and a metric space is the concept of distance introduced by the inner
product.
A metric space is defined as a vector space with, in addition to its usual properties,
an inner product defined which has the following properties:
1. If A and B are vectors, then the inner product A · B is a scalar.
2. A · A = 0 ⟺ A = 0.   (1-44)
3. The following algebraic properties of the inner product must be obeyed:
    A · B = B · A
    A · (B + C) = A · B + A · C   (1-45)
    A · (aB) = a(A · B)
Table 1-2. Properties of a metric space. Note that the scalar product of two
vectors as just defined has all of these properties.

H. The vector product.


The geometrical definition of the cross product of A and B results in a third vector, say,
C. The relation between A, B, and C is quite complicated, involving the idea of right-
handedness vs. left-handedness. We have already built a handedness into our coordinate
system in the way we choose the third unit vector, k̂ = î × ĵ. As a preliminary to


evaluating the cross product A × B, we work out the various cross products among the
unit vectors î, ĵ, and k̂. From equation (1-13) we see that the cross product of two
perpendicular unit vectors has magnitude 1. We use the right-hand rule and refer to
figure 1-11 to get the direction of the cross products. This gives

    î × ĵ = k̂,    ĵ × k̂ = î,    k̂ × î = ĵ,
    ĵ × î = −k̂,   k̂ × ĵ = −î,   î × k̂ = −ĵ.   (1-46)
Now it is straightforward to evaluate A × B in terms of components:

    A × B = (îAx + ĵAy + k̂Az) × (îBx + ĵBy + k̂Bz)
          = AxBy k̂ − AxBz ĵ − AyBx k̂ + AyBz î + AzBx ĵ − AzBy î;

    A × B = î(AyBz − AzBy) + ĵ(AzBx − AxBz) + k̂(AxBy − AyBx)   definition of the cross product   (1-47)

This is not as hard to memorize as you might think - stare at it for a while and notice the
permutation patterns (yz vs. zy, etc.). Later on we will have some other, even more
elegant ways of writing the cross product.
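Definition (1-47) agrees with Mathematica's built-in Cross function, which makes a handy
check (numeric vectors chosen arbitrarily):

Cross[{1, 2, 3}, {4, 5, 6}]
(* {-3, 6, -3}: the components (AyBz - AzBy, AzBx - AxBz, AxBy - AyBx) *)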

The cross product is used to represent a number of interesting physical quantities - for
example, the torque, τ = r × F, and the magnetic force, F = qv × B, to name just a
couple.

The cross product satisfies the following algebraic properties:

    A × B = −B × A   (1-48)
    A × (B + C) = A × B + A × C   (1-49)
    A × (aB) = a(A × B)   (1-50)

Note that the order matters; the cross product is not commutative.

I. Dimensionality of a vector space and linear independence.

In constructing our coordinate system we used a very specific procedure for choosing the
directions of the axes, which only works for 3 dimensions. There is a broader, general
question to be asked about any vector space: What is the minimum number of vectors
required to "represent" all the others? This minimum number n is the dimensionality of a
vector space.


Here is a more precise definition of dimensionality. Consider vectors E₁, E₂, . . ., E_m.
These vectors are said to be linearly dependent if there exist constants (scalars) c₁, c₂, . . .,
c_m, not all zero, such that

    c₁E₁ + c₂E₂ + ... + c_mE_m = 0.   (1-51)
If it is not possible to find such constants cᵢ, then the m vectors are said to be linearly
independent. Now imagine searching among all the vectors in the space to find the
largest group of linearly independent vectors. The dimensionality n of a vector space is
the largest number of linearly independent vectors which can be found in the space.*

Figure 1-10. Three vectors in a plane.

* This may sound a bit vague. Suppose you look high and low and you can only find at most two linearly
independent displacement vectors. Are you sure that the dimensionality of your space is two? What if you
just haven't found the vectors representing the third dimension (or the fourth dimension!)? This is the
subject of the short film Flatland.

Example using geometrical vectors. Consider the three vectors shown in figure 1-10.
Which of the pairs {A, B}, {A, C}, and {B, C} are linearly independent?

Solution: It is pretty clear that aA + C = 0 for some value of a about equal to 2, so
A and C are not linearly independent. But A and B do not add up to zero no matter
what scale factors are used. To see this more clearly, suppose that there exist a and b
such that aA + bB = 0; then one can solve for B, giving B = −(a/b)A. This means that
A and B lie along the same line, and this is clearly not so, showing by contradiction
that A and B are not linearly dependent. A similar line of reasoning applies to B and
C. Conclusion: {A, B} and {B, C} are the possible choices of two linearly independent
vectors.

Example using coordinates. Consider the three vectors

    E₁ = (2, 1, 1),  E₂ = (4, 3, 3),  E₃ = (3, 2, 1).   (1-52)

Are they linearly independent?

Solution: Try to find c₁, c₂, and c₃ such that


    c₁E₁ + c₂E₂ + c₃E₃ = 0.   (1-53)

This means that the sums of the x-components, y-components, and z-components must
separately add up to zero, giving three equations:

    2c₁ + 4c₂ + 3c₃ = 0   (x-component)
    c₁ + 3c₂ + 2c₃ = 0    (y-component)
    c₁ + 3c₂ + c₃ = 0     (z-component)

Now we solve for c₁, c₂, and c₃. Subtracting the third equation from the second gives
c₃ = 0. The first and second equations then become

    2c₁ + 4c₂ = 0,
    c₁ + 3c₂ = 0.

The first equation gives c₁ = −2c₂, and the second equation gives c₁ = −3c₂. The only
consistent solution is c₁ = c₂ = c₃ = 0. These three vectors are linearly independent!

This second example is a little messier and less satisfying than the previous example, and
it is clear that in 4, 5 or more dimensions the process would be difficult. In Chapter 3 we
will discuss more elegant and powerful methods for solving simultaneous linear
equations.

Solving simultaneous linear equations with Mathematica. It is hard to resist
asking Mathematica to do this problem. Here is how you do it:

Solve[{2c1+4c2+3c3==0,c1+3c2+2c3==0,c1+3c2+c3==0},{c2,c3}]
{}

This is Mathematica’s way of telling us that there is no solution.

What if we try to make E₁, E₂, and E₃ linearly dependent? If we change E₂ to (4, 3, 1),
then the sum of E₁ and E₂ is twice E₃, so the linear dependence relation

    E₁ + E₂ − 2E₃ = 0

should be satisfied; this corresponds to c₂ = c₁, c₃ = −2c₁. Let's ask Mathematica:

Solve[{2c1+4c2+3c3==0,c1+3c2+2c3==0,c1+c2+c3==0},{c2,c3}]
{{c2->c1,c3->-2 c1}}

Sure enough!!

Is this cheating? I don't think so!


J. Components in a Rotated Coordinate System. In physics there are lots of reasons
for changing from one coordinate system to another. Usually we try to work only with
coordinate systems defined by orthonormal unit vectors. Even so, the coordinates of a
vector fixed in space are different from one such coordinate system to another. As an
example, consider the vector A shown in figure 1-11.

Figure 1-11. Coordinates of a vector A in two different coordinate systems.

It has components Ax and Ay, relative to the x and y axes. But, relative to the x' and y'
axes, it has different components Ax' and Ay'. You can perhaps see from the complicated
construction in figure 1-11 that

    Ax' = Ax cos θ + Ay sin θ   (1-54)

and a similar construction leads to

    Ay' = −Ax sin θ + Ay cos θ   (1-55)

There is another, easier (algebraic rather than geometrical) way to obtain this result:

    Ax' = A · î' = (îAx + ĵAy) · î' = Ax (î · î') + Ay (ĵ · î'),
    Ay' = A · ĵ' = (îAx + ĵAy) · ĵ' = Ax (î · ĵ') + Ay (ĵ · ĵ').

Figure 1-12. Unit vectors for the unrotated (î and ĵ) and rotated (î' and ĵ') coordinate systems.

We can evaluate the dot products between unit vectors with the aid of figure 1-12, with
the result


    î · î' = cos θ,
    î · ĵ' = cos(π/2 + θ) = −sin θ,
    ĵ · î' = cos(π/2 − θ) = sin θ,   (1-57)
    ĵ · ĵ' = cos θ.

This gives for the transformation from unprimed to primed coordinates

    Ax' = Ax cos θ + Ay sin θ,
    Ay' = −Ax sin θ + Ay cos θ.   (1-58)

It is easy to generalize this procedure to three or more dimensions. However, we will
wait to do this until we have introduced some more powerful notation, in the next
chapter.
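As a sketch of a check on (1-58) in Mathematica: RotationMatrix[theta] rotates vectors
counterclockwise by theta, so the components of a fixed vector in axes rotated by theta are
obtained with its transpose (the symbols ax, ay, and theta here are arbitrary names of ours):

a = {ax, ay};                (* symbolic components of a fixed vector *)
Simplify[Transpose[RotationMatrix[theta]] . a]
(* {ax Cos[theta] + ay Sin[theta], ay Cos[theta] - ax Sin[theta]}, matching (1-58) *)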

K. Other Vector Quantities

You may wonder, "What happened to all the other vectors in physics, like velocity,
acceleration, force, . . . ?" They can ALL be derived from the displacement vector, by
taking derivatives or by using a law such as Newton's 2nd law. For example, the average
velocity of a migrating duck which flies from point 1 to point 2 in time Δt is equal to

    v₁₂ = (1/Δt)(r₂ − r₁)   (1-40)

The quantity (r₂ − r₁) is the sum of two vectors (the vector r₂ and the vector −r₁, which
is the negative of the vector r₁), and so (r₂ − r₁) is itself a vector. It is multiplied by a
scalar, the quantity 1/Δt. And the product of a vector and a scalar is a vector. So, average
velocities are vector quantities. Instantaneous velocities are obtained by taking the limit
as Δt → 0, and the same argument still applies. You should be able to reconstruct the
line of reasoning showing that the acceleration is a vector quantity.

PROBLEMS
NOTE: In working homework problems, please: (a) make a diagram with every
problem. (b) Explain what you are doing; calculation is not enough. (c) Short is good,
but not always.

Problem 1-1. Consider the 5 quantities below as possible vector quantities:

1. Compass bearing to go to San Jose.
2. Cycles of Mac G4 clock signal in one second.


3. Depth of the water above a point on the bottom of San Francisco Bay, say
somewhere under the Golden Gate Bridge.
4. Speed of wind, compass direction it comes from.
5. Distance and direction to the student union.

Explain why each of these is or is not a vector. (Be careful with number 3. Do you need
to use g , a vector in the direction of the gravitational field near the Golden Gate Bridge,
to define the water depth?)

Problem 1-2. [Mathematica] Consider two displacements: A = (5 feet, 0°),
B = (5 feet, 90°). (The angle is measured from due East, the usual x axis.)
(a) Make a drawing, roughly to scale, showing the vector addition C = A + B.
Then, using standard plane geometry and trigonometry as necessary, calculate C
(magnitude and angle).
(b) Appendix C gives a Mathematica function, Vsum, for adding two vectors. Use
Vsum to add the two given vectors. (Note that the function can be downloaded from the
Ph 385 website.)

Attach a printout showing this operation and the result. Include a brief account of what
you did and an interpretation of the results.

Problem 1-3. [Mathematica] Consider the following vectors:

    A = (1, 90°),
    B = (2, 45°),
    C = (1, 180°).

Use the Mathematica function Vsum given in Appendix C to verify that, for these three
vectors, the associative law of vector addition, (A + B) + C = A + (B + C), is satisfied.
[This is, of course, not a general proof of the associative property.]
Attach a printout showing this operation and the result. Include a brief account of what
you did and an interpretation of the results.

Problem 1-4. Consider adding two vectors A and B in two different ways, as shown in
the diagram. These vector-addition triangles correspond to the vector-addition equations

    C₁ = A + B,
    C₂ = B + A.

Show using geometrical arguments (no components!) that C₁ = C₂.


New Problem 1-4. The object of this problem is to show, by geometrical construction,
that vector addition satisfies the commutativity relation

    A + B = B + A   (to be demonstrated)

Start with the two vectors shown to the right. Draw the vector-summation diagram
forming the following sum:

    A + B + (−A) + (−B)

This should form a closed figure, adding up to 0.
(a) Explain why this figure is a parallelogram.
(b) Use this parallelogram to illustrate that A + B = B + A.
(You can use the definition of the negative of a vector as being the same vector, drawn in
the opposite direction.)
Problem 1-5. Suppose that a quadrilateral is formed by adding four vectors A, B, C,
and D, lying in a plane, such that the end of D just reaches the beginning of A. That is
to say, the four vectors add up to zero: A + B + C + D = 0.
Using vector algebra (not plane geometry), prove that the figure formed by joining the
center points of A, B, C, and D is a parallelogram. [It is sufficient to show that the
vector from the center of A to the center of B is equal to the vector from the center of
D to the center of C, and similarly for the other two parallel sides.] Note: A, B, and
C are arbitrary vectors. Do not assume that the quadrilateral itself is a square or a
parallelogram.

Problem 1-6. Consider the "vector-subtraction parallelogram" shown here, representing
the vector equation

    A + B + (−A) + (−B) = 0.

(a) Draw the "vector-subtraction parallelogram," and on it draw the two vectors
D₁ = A − B and D₂ = B + A; they should lie along the diagonals of the quadrilateral.
(b) The two diagonals are alleged to bisect each other. Using vector algebra, show this
by showing that moving along one side and then displacing halfway along D₁ leads to the
same point in the plane as displacing halfway along D₂. HINT: you can show that this
amounts to proving that B + D₁/2 = D₂/2.

Problem 1-7. Explain why acceleration is a vector quantity. You may assume that
velocity is a vector quantity and that time is a scalar. Be as rigorous as you can.


Problem 1-8. Using Hooke's law, and assuming that displacements are vectors, explain
why force should be considered as a vector quantity. Be as rigorous as you can.

Problem 1-9. Can you devise an argument to show that the electric field is a vector?
The magnetic field?

Problem 1-10. Consider a set of vectors defined as objects with magnitude (the
magnitude must be non-negative) and a single angle to give the direction (we are in a
two-dimensional space). Let us imagine defining vector addition as follows:

    A ⊕ B ≡ (max(A, B), (θ_A + θ_B)/2).

That is, the magnitude of the sum is equal to the greater of the magnitudes of the two
vectors being added, and the angle is equal to the average of their angles. We keep the
usual definition of multiplication of a vector by a scalar, as described in the text. (Note:
The symbol ⊕ indicates an alternate definition of vector addition, and is not the same as
the usual vector addition in the geometrical representation.)

In order to see if this set of vectors constitutes a vector space,
(a) Try to define the zero vector.
(b) Try to define the negative of a vector.
(c) Test to see which of the properties (1-29) through (1-33) of a vector space can be
satisfied.

New Problem 1-10. Consider a set of vectors defined as objects with magnitude (the
magnitude must be non-negative) and a single angle to give the direction (we are in a
two-dimensional space). Let us imagine defining vector addition as follows:

    A ⊕ B ≡ (|A − B|, θ_A − θ_B).

That is, the magnitude of the sum is equal to the absolute value of the difference in
magnitude of the two vectors being added, and the angle is equal to the difference of their
angles. We keep the usual definition of multiplication of a vector by a scalar, as described
in the text. (Note: The symbol ⊕ indicates an alternate definition of vector addition,
and is not the same as the usual vector addition in the geometrical representation.)

In order to see if this set of vectors constitutes a vector space,
(a) Try to define the zero vector.
(b) Try to define the negative of a vector.
(c) Test to see which of the properties (1-29) through (1-33) of a vector space can be
satisfied.

Problem 1-10a. Consider a set of vectors defined as objects with three scalar
components,

    A = (Ax, Ay, Az).


Let us imagine defining vector addition as follows:

    A ⊕ B ≡ (Ax + Bx, Ay + Bz, Az + By).

We keep the usual definition of multiplication of a vector by a scalar, as described in the
text. (Note: The symbol ⊕ indicates an alternate definition of vector addition, and is
not the same as the usual vector addition in the component representation.)

In order to see if this set of vectors constitutes a vector space,
(a) Try to define the zero vector.
(b) Try to define the negative of a vector.
(c) Test to see which of the properties (1-29) through (1-33) of a vector space can be
satisfied.

Problem 1-11. Vector addition, in the generalized sense discussed in this chapter, is a
process which turns any two given vectors into a third vector which is defined to be their
sum. Consider the space of 3-component vectors. Suppose someone suggests that the
cross product could actually be thought of as vector addition, since from any two given
vectors it produces a third vector. What is the most serious objection that you can think of
to this idea, based on the general properties of a vector space given in the chapter?

Problem 1-12. The Lorentz force on a charged particle is given by equation (1-14) in the
text. Let us consider only the second term, representing the force on a particle of charge
q, moving with velocity v in a magnetic field B:

    F = qv × B.

The power produced by the operation of such a force, moving with velocity v, is given
by P = F · v.
Using these definitions, show that the Lorentz force on a moving charged particle does
no work.
Problem 1-13. Let us consider two vectors A and B, members of an abstract metric
space. These vectors can be said to be perpendicular if A · B = 0. Using the basic
properties of the vector space (Table 1-1) and of the inner product (Table 1-2), prove that,
if A · B = 0, then A and B are linearly independent; that is, if you write

    c₁A + c₂B = 0,

prove using the inner product that c₁ = 0 and c₂ = 0. (It is assumed that neither A nor B
is equal to 0.) This is a general proof that does not depend on geometrical properties of
vectors in space. Hint: Start with A · (c₁A + c₂B) = 0.

Problem 1-14. One of the important properties of a rotation is that the length of a vector
is supposed to be invariant under rotation. Use the expressions (1-54) and (1-55) for the


coordinates of vector A in a rotated coordinate system to compare the length of A
before and after the rotation of the coordinate system. [Use A² = A · A to determine the
length of A.]

The following seven problems will all make reference to these three vectors:

    A = (4, 4, −4),  B = (0, 5, −5),  C = (7, 1, −1).
Problem 1-15. Calculate the following dot products: A · B, A · C, and B · (A + 2C).

Problem 1-16. Which vector (among A, B, and C) is the longest? Which is the
shortest?

Problem 1-17. Calculate the vector product A × (B + C).

Problem 1-18. Find a unit vector parallel to C.

Problem 1-19. Find the component of B in the direction perpendicular to the plane
containing A and C. (Hint: the component of a vector in a particular direction can be
found by taking the dot product of the vector with a unit vector in that direction.)

Problem 1-20. Find the angle between A and B.
Problem 1-21. Use Mathematica to determine whether or not A, B, and C (the three
vectors given above) are linearly independent. That is, ask Mathematica to find values of
c₁, c₂, and c₃ such that c₁A + c₂B + c₃C = 0. (See section I of this chapter, using the
"Solve" function.) Your results should be briefly annotated in such a way as to explain
what you did and to interpret the results.
Problem 1-22. Use the definition of vector addition in terms of components to prove the
associative property of vector addition, equation (1-30).
Problem 1-23. Find 2A, 3B, and 2A − 3B, when
(a) A = (1, 2, 1) and B = (2, 1, 2);
(b) A = (3, 2, 3) and B = (1, 1, 2);

6  4
(c) A   3  and B   2  ;
 
1 1
   
 
Prove that 2 A  3B is parallel to the (x,y)-plane in (b), and parallel to the z axis in (c).

Problem 1-24. Suppose that the vector D = (x₀, y₀) points from the origin to the center
of a circle. (We are working in two dimensions in this problem.) The points on the circle
are defined to be those for which the distance from the center point is equal to the
constant R. Let the vector from the origin to a point on the circle be X = (x, y). Then the
vector from the circle's center to the point on the circle is given by

    R = X − D.

The condition that the point is on the circle can then be expressed in terms of the dot
product as follows:

    R · R = R².

Show that this condition leads to the standard equation for a circle, in terms of x, y, x₀, y₀,
and R.

Problem 1-25. Consider the vector û = (1/√2)(î + ĵ) = (√2/2, √2/2, 0), lying in the
x-y plane.
(a) Show that û is a unit vector.
(b) Find another unit vector v̂ which is also in the x-y plane, and is orthogonal to û.
(c) Find a third unit vector ŵ, such that û, v̂, and ŵ are mutually orthogonal.

Problem 1-26. Show that the three direction cosines corresponding to a given vector
satisfy the relation

    cos²α + cos²β + cos²γ = 1.


Problem 1-27. Use the dot product and coordinate notation to find the cosine of the
angle between the body diagonal A of the cube shown and one side B of the cube.

Problem 1-28. Consider the following situation, analogous to the expansion of the
Universe. A swarm of particles expands through all space, with the velocity v(t) of a
given particle with position vector r(t) relative to a fixed origin O given by

    v(t) = dr/dt = f(t) r,

with f(t) some universal function of time. This is the Hubble law.
Show that the same rule applies to positions r′(t) and velocities v′(t) measured relative
to any given particle, say particle A, with position r_A(t). That is, show that for

    r′(t) = r(t) − r_A(t) and v′ = dr′/dt,

    v′ = f(t) r′.

This invariance with respect to position in the Universe is sometimes called the
cosmological principle. Can you explain why it implies that we are not at the "center of
the Universe?"

Problem 1-29. It is sometimes convenient to think of a vector B as having components
parallel to and perpendicular to another vector A; call these components B∥ and B⊥,
respectively.
(a) Show that

    B∥ = A (B · A)/A²

is parallel to A and has magnitude equal to the component B · Â of B in the direction of
A; that is, show that

    B∥ = Â (B · Â),

where as usual Â is a unit vector in the direction of A (and A is its magnitude).
(b) Consider an expression for the part of B perpendicular to A:

    B⊥ = B − A (B · A)/A².

Show that
    1. B∥ + B⊥ = B
    2. B⊥ · A = 0


Problem 1-30. Consider a cube of side a, with one corner at the origin and sides along
the x, y, and z axes as shown. Let the vector A be the displacement from the origin to
the opposite corner of the cube, as shown, and let B be the vector from the origin to the
other corner of the cube on the z axis, as shown.
Use the result of the previous problem to find the component of B perpendicular to A.
Express the result in component form, in terms of a.

Problem 1-31. Consider the set of positive real numbers (zero not included). Of course
we know how to add and multiply such numbers. But let us think of them as vectors,
with a crazy definition for vector addition, represented by the symbol ⊕. Here is the
definition of this possible vector space.

Consider the set V = {a, b, c, ... real numbers greater than zero}, with its elements
referred to as vectors, and with vector addition defined in the following way:

    a ⊕ b ≡ ab,

where ab represents the usual product of the two numbers.

Referring to Table 1-1, consider all of the properties of a vector space except those
involving scalar multiplication. Are properties 1, 3, and 4 satisfied? Are the associative
and commutative properties of vector addition, (1-29) and (1-30), satisfied? Explain.

Problem 1-32. Consider component vectors defined in the usual way, as in section E of
this chapter. However, hoping to elevate this vector space to metric-space status, we
define the inner product of two vectors A and B in the following way:

    A · B ≡ √(Ax² + Ay² + Az²) √(Bx² + By² + Bz²).

Consider the properties of the inner product listed in Table 1-2. Are they satisfied with
this definition of the inner product? Explain.


Chapter 2. The Special Symbols δᵢⱼ and εᵢⱼₖ, the Einstein
Summation Convention, and some Group Theory

Working with vector components and other numbered objects can be made easier (and
more fun) through the use of some special symbols and techniques. We will discuss two
symbols with indices, the Kronecker delta symbol and the Levi-Civita totally
antisymmetric tensor. We will also introduce the use of the Einstein summation
convention.

References. Scalars, vectors, the Kronecker delta and the Levi-Civita symbol and the
Einstein summation convention are discussed by Lea [2004], pp. 5-17. Or, search the
web. One nice discussion of the Einstein convention can be found at
http://www2.ph.ed.ac.uk/~mevans/mp2h/VTF/lecture05.pdf . You may find the other
lectures at this site helpful, too.

A. The Kronecker delta symbol, δij.

This symbol has two indices, and is defined as follows:

          { 0,  i ≠ j
    δij = {             ,   i, j = 1, 2, 3      Kronecker delta symbol     (2-1)
          { 1,  i = j
Here the indices i and j take on the values 1, 2, and 3, appropriate to a space of three-
component vectors. A similar definition could in fact be used in a space of any
dimensionality.

We will now introduce new notation for vector components, numbering them rather
than naming them. [This emphasizes the equivalence of the three dimensions.] We will
write vector components as
    ( Ax )
    ( Ay )  ↔  { Ai , i = 1,3 }                                      (2-2)
    ( Az )
We also write the unit vectors along the three axes as
    î, ĵ, k̂  ↔  êi , i = 1,3                                         (2-3)
The definition of vector components in terms of the unit direction vectors is
    Ai = A · êi ,  i = 1,3                                           (2-4)
The condition that the unit vectors be orthonormal is
    êi · êj = δij                                                    (2-5)
This one equation is equivalent to nine separate equations: î·î = 1, ĵ·ĵ = 1, k̂·k̂ = 1,
î·ĵ = 0, ĵ·î = 0, î·k̂ = 0, k̂·î = 0, ĵ·k̂ = 0, k̂·ĵ = 0 !!! [We have now stopped
writing "i,j = 1,3;" it will be understood from now on that, in a 3-dimensional space, the
"free indices" (like i and j above) can take on any value from 1 to 3.]


Example: Find the value of ĵ·k̂ obtained by using equation (2-5).

Solution: We substitute ê2 for ĵ and ê3 for k̂, giving
    ĵ·k̂ = ê2·ê3 = δ23 = 0 ,
correct since ĵ and k̂ are orthogonal.
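
These relations are easy to play with in Mathematica (the program used for matrix work
later in these notes); here is a minimal sketch, where e[i] is a name of our own choosing
for the unit vectors:

    e[i_] := IdentityMatrix[3][[i]]                (* the unit vectors e1, e2, e3 *)
    Table[e[i].e[j], {i, 3}, {j, 3}]               (* gives the 3x3 identity matrix *)
    Table[KroneckerDelta[i, j], {i, 3}, {j, 3}]    (* the same matrix: delta_ij *)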

B. The Einstein summation convention.


 
The dot product of two vectors A and B now takes on the form
               3
    A · B  =   Σ  Ai Bi .                                            (2-6)
              i=1
This is the same dot product as previously defined in equation (1-40), except that AxBx
has been replaced by A1B1 and so on for the other components.

Now, when you do a lot of calculations with vector components, you find that the sum of
an index from 1 to 3 occurs over and over again. In fact, occasions where the sum would
not be carried out over all three of the directions are hard to imagine. Furthermore, when
a sum is carried out, there are almost always two indices which have the same value - the
index i in equation (2-6) above, for example. So, the following practice makes the
equations much simpler:

The Einstein Summation Convention. In expressions involving vector or tensor indices,
whenever two indices are the same (the same symbol), it will be assumed that a sum over
that index from 1 to 3 is to be carried out. This index is referred to as a paired index;
paired indices are summed. An index which only occurs once in a term of an expression is
referred to as a free index, and is not summed.

This sounds a bit risky, doesn't it? Will you always know when to sum and when not to?
It does simplify things, though. The reference to tensor indices means indices on
elements of matrices. We will see that this convention is especially well adapted to
matrix multiplication.

So, the definition of the dot product is now
    A · B = Ai Bi ,                                                  (2-7)
the same as equation (2-6) except that the summation sign is omitted. The sum is still
carried out because the index i appears twice, and we have adopted the Einstein
summation convention.

To see how this looks in practice, let's look at the calculation of the x-component of a

vector, in our new notation. We will write the vector A , referring to the diagram of
figure 1-11, as



    A = î Ax + ĵ Ay + k̂ Az
      = êi Ai ,                                                      (2-8)
where in the second line the summation over i = {1,2,3} is implicit. Now use the
definition of the x-component,

    Ax = A1 = A · ê1 .                                               (2-9)
Combining (2-8) and (2-9), we have
    A1 = A · ê1
       = (êi Ai) · ê1
       = Ai (êi · ê1)                                                (2-10)
       = Ai δi1
       = A1
In the next to last step we used (2-5), the orthogonality condition for the unit direction
vectors.

Next we carried out one of the most important operations using the Kronecker delta
symbol, summing over one of its indices. This is also very confusing to someone seeing
it for the first time. In the last line of equation (2-10) there is an implied summation over
the index i. We will write out that summation term by term, just this once:
    Ai δi1 = A1 δ11 + A2 δ21 + A3 δ31                                (2-11)
Now refer to (2-1), the definition of the Kronecker delta symbol. What are the values of
the three delta symbols on the right-hand side of the equation above? Answer: δ11 = 1,
δ21 = 0, δ31 = 0. Substituting these values in gives
    Ai δi1 = A1 δ11 + A2 δ21 + A3 δ31
           = A1·1 + A2·0 + A3·0                                      (2-12)
           = A1
What has happened? The index "1" has been transferred from the delta symbol to A.
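
The same index transfer can be watched happening in Mathematica - a quick sketch, with
a = {a1, a2, a3} a symbolic vector of our own naming:

    a = {a1, a2, a3};
    Sum[a[[i]] KroneckerDelta[i, 1], {i, 3}]       (* gives a1: the "1" is transferred *)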

C. The Levi-Civita totally antisymmetric tensor.

The Levi-Civita symbol is an object with three vector indices,
    εijk ,   i = 1,2,3;  j = 1,2,3;  k = 1,2,3        Levi-Civita Symbol

           {  1,  (i,j,k) an even permutation of (1,2,3)
    εijk = { -1,  (i,j,k) an odd permutation of (1,2,3)              (2-13)
           {  0,  otherwise

All of its components (all 27 of them) are either equal to 0, -1, or +1. Determining which
is which involves the idea of permutations. The subscripts (i,j,k) represent three
numbers, each of which can be equal to 1, 2, or 3. A permutation of these numbers
scrambles them up, and it is a good idea to approach this process systematically. So, we
are going to discuss the permutation group.


Groups. A group is a mathematical concept, a special kind of set. It is defined as
follows:
Definition: A group G is a set of objects {A, B, C, ...} with multiplication of one
member by another defined, closed under multiplication, and with the additional
properties:
    (i) The group contains an element I called the identity, such that, for every
    element A of the group,
        AI = IA = A                                                  (2-14)
    (ii) For every element A of the group there is another element B, such that
        AB = BA = I .                                                (2-15)
    B is said to be the inverse of A:
        B = A⁻¹ .                                                    (2-16)
    (iii) Multiplication must be associative:
        A(BC) = (AB)C .                                              (2-17)
There is an additional property which only some groups have. If multiplication is
independent of the order in the product, the group is said to be Abelian.
Otherwise, the group is non-Abelian.
    AB = BA   ⇔   Abelian group .                                    (2-18)

This may seem fairly abstract. But the members of groups used in physics are usually
operators, operating on interesting things, such as vectors or members of some other
vector space. Right now we are going to consider permutation operators, operating on
sets of indices.

The Permutation Group. We will start by defining the objects operated on, then the
operators themselves. Consider the numbers 1, 2, and 3, in some order, just like the
indices on the Levi-Civita symbol:
    (a, b, c) .                                                      (2-19)
Here each letter represents one of the numbers, and they all three have to be represented.
It is pretty easy to convince yourself that the full set of possibilities is
    (a,b,c) ∈ { (1,2,3), (1,3,2), (2,1,3), (2,3,1), (3,1,2), (3,2,1) } .    (2-20)
Now the permutation group of the third degree consists of operators which re-arrange the
three numbers as follows:
    P: (a,b,c) → (a,b,c)    ( P123 ≡ I )
    P: (a,b,c) → (a,c,b)    ( P132 )
    P: (a,b,c) → (b,a,c)    ( P213 )
    P: (a,b,c) → (b,c,a)    ( P231 )                                 (2-21)
    P: (a,b,c) → (c,a,b)    ( P312 )
    P: (a,b,c) → (c,b,a)    ( P321 )


The second form of the notation shows where the numbers (1,2,3) would end up under
that permutation. The first entry is the permutation which doesn't change the order,
which is evidently the identity for the group. The group consists of just these six
members.

Examples of permutations operating on triplets of indices:


    P123 (2,3,1) = (2,3,1)
    P132 (2,3,1) = (2,1,3)
    P321 (2,3,1) = (1,3,2)                                           (2-22)
    P132 P321 (2,3,1) = (1,2,3)
Do you follow the fourth line? First the permutation P321 is carried out, giving (1,3,2);
and then the permutation P132 operates on this result, giving (1,2,3). This brings us to the
subject of multiplication of group elements. This fourth line shows us that the product of
the given two permutation-group elements is itself a permutation, namely
    P132 P321 = P312 .                                               (2-23)
Try this yourself, and verify that
    P312 (2,3,1) = (1,2,3) .                                         (2-24)
From this example, it is pretty clear that the group of six elements given above is closed
under multiplication. There is an identity, the permutation which doesn't change the
order. And it is pretty easy to identify the inverses within the group.
Example: Show that
    P312⁻¹ = P231 .
Proof: Try it out on the triplet (a,b,c):
    P312 (a,b,c) = (c,a,b) ,
    P231 (c,a,b) = (a,b,c) .
The inverse permutation P231 just reverses the effect of P312.
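
If you want to experiment with such products, the permutations of (2-21) can be mimicked
as Mathematica functions (a sketch; the function names are our own):

    p132[{a_, b_, c_}] := {a, c, b}
    p321[{a_, b_, c_}] := {c, b, a}
    p312[{a_, b_, c_}] := {c, a, b}
    p132[p321[{2, 3, 1}]]                          (* gives {1, 2, 3} *)
    p312[{2, 3, 1}]                                (* also {1, 2, 3}, as in (2-23) and (2-24) *)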

There are some simpler permutation operators related to the Pijk, the binary permutation
operators which just interchange a pair of indices, while leaving the third one unchanged.
    P12 (a,b,c) = (b,a,c)
    P13 (a,b,c) = (c,b,a)                                            (2-25)
    P23 (a,b,c) = (a,c,b)
It is easy to see that the six group members given in equation (2-21) can be written in
terms of the binary permutation operators:


    P123 = I = P12² = P13² = P23²
    P132 = P23
    P213 = P12
    P231 = P12 P13                                                   (2-26)
    P312 = P13 P12
    P321 = P13
(Remember, the right-hand operator in a product operates first.)

There is a special subset of permutations of a series of objects called the circular
permutations, where the last index comes to the front and the others all move over one
(see Figure 2-1).

(a,b,c,d,e,f,g) ---> (g,a,b,c,d,e,f)

Figure 2-1. A circular permutation.


For the six objects listed in eq. (2-21), three of them are circular permutations of (1,2,3),
namely
    (a,b,c)_circular ∈ { (1,2,3), (3,1,2), (2,3,1) } .               (2-27)
Each of these is produced by an even number of binary permutations, and the other three
are produced by an odd number of binary permutations. So, the group divides up into
three "even permutations" and three "odd permutations:"
    { P123, P312, P231 }    even permutations
                                                                     (2-28)
    { P213, P132, P321 }    odd permutations

The Levi-Civita symbol. Now we can finally use the idea of even and odd permutations
to define the Levi-Civita symbol:
           {  1,  (ijk) an even permutation of (123)
    εijk = { -1,  (ijk) an odd permutation of (123)       Levi-Civita Symbol    (2-29)
           {  0,  otherwise
Notice that there are only six non-zero symbols, three equal to +1 and three equal to -1.
And any binary permutation of the indices (interchanging two indices) changes the sign.
This is the key property in many calculations using the Levi-Civita symbol.

Example: Give the values of ε312, ε213, and ε322.


Answer: (312) is an even permutation of (123), so ε312 = +1. (213) is obtained
from (312) by permuting the first and last numbers, so it must be an odd
permutation, and ε213 = -1. (322) is not a permutation of (123) at all, so ε322 = 0.
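
Incidentally, Mathematica's built-in Signature function is exactly the Levi-Civita symbol:
it returns +1, -1, or 0 according to whether its argument is an even permutation, an odd
permutation, or not a permutation at all. The values above can be checked directly:

    Signature[{3, 1, 2}]                           (* gives  1 *)
    Signature[{2, 1, 3}]                           (* gives -1 *)
    Signature[{3, 2, 2}]                           (* gives  0 *)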

Question: Is the permutation group Abelian? What about the subgroup consisting of the
three circular permutations? Answering these questions will be left to the problems.

D. The cross product.



In the last chapter we found the following result for the cross product of two vectors A
and B in terms of their components:
    A × B = î (Ay Bz - Az By)
          + ĵ (Az Bx - Ax Bz)                                        (1-48)
          + k̂ (Ax By - Ay Bx)

Notice that there are a lot of permutations built into this definition. In particular, each
term involves a permutation of (x,y,z), with the first letter indicating the unit vector, the
second, the component of A, and the third, the component of B. Here is an elegant way of
re-writing this expression using the Levi-Civita symbol:
 

    (A × B)i = εijk Aj Bk                                            (2-30)

It may be less than obvious at first glance that (2-30) is the equivalent of (1-48). First
let's just examine the index structure of the expression. The left-hand side has a single
unpaired, or free, index, i. This means that it represents any single one of the components
of the vector A × B - we would say that it gives the i-th component of A × B. Now look
at the right-hand side. There is only one free index, and it is i, the same as on the left-
hand side. This is the way it has to be. In addition, there are two paired indices, j and k.
These have to be summed. If we were not using the Einstein summation convention, this
                               3   3
expression would read (A × B)i = Σ   Σ  εijk Aj Bk . We have decided to follow the
                              j=1 k=1
Einstein convention and so we will not write the summation signs. However, for any
given value of i, there are nine terms to evaluate.

To see exactly how this works out, let's evaluate the result of (2-30) for i=2. This
 
should give the y-component of A × B. Here it is:
    (A × B)2 = ε2jk Aj Bk
             = ε211 A1 B1 + ε212 A1 B2 + ε213 A1 B3
             + ε221 A2 B1 + ε222 A2 B2 + ε223 A2 B3                  (2-31)
             + ε231 A3 B1 + ε232 A3 B2 + ε233 A3 B3
But, most of these terms are equal to zero, because two of the indices on the Levi-Civita
symbol are the same. There are only two non-zero L.-C. symbols: ε213 = -1, and ε231 =
+1. Using these facts, we arrive at the answer

    (A × B)2 = -A1 B3 + A3 B1                                        (2-32)
This is the same as the y-component of equation (1-48), if the correct substitutions are
made for numbered instead of lettered components. So, the two versions of the cross
product agree.
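
The agreement can also be verified symbolically in Mathematica. In this sketch the
vector names are our own, and Signature (met above) plays the role of the Levi-Civita
symbol:

    a = {a1, a2, a3};  b = {b1, b2, b3};
    Simplify[Table[Sum[Signature[{i, j, k}] a[[j]] b[[k]], {j, 3}, {k, 3}], {i, 3}]
              - Cross[a, b]]                       (* gives {0, 0, 0} *)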

Example: Use the tensor form of the cross product, equation (2-30), to prove that
   
    A × B = -B × A .
Proof: There was a similar relation for the dot product - but with a plus sign!
Let's see how this works in tensor notation:
    (A × B)i = εijk Aj Bk       definition of the cross product
             = -εikj Aj Bk      permuting two indices of ε gives a minus sign
             = -εikj Bk Aj      the Aj and the Bk can be written in any order
             = -(B × A)i        definition of the cross product

In regard to the last step of this example, it is worth remarking that the particular name
given to a summed index doesn't matter - it is sort of like the dummy variable inside a
definite integral. What matters in the definition of the cross product A × B is that the
index of the components of A match with the second index of ε, and the index of the
components of B, with the third index of ε.

E. The triple scalar product.

There is a famous way of making a scalar out of three vectors. It is illustrated in figure
2-2, where the vectors A, B and C form the three independent sides of a parallelepiped.
The cross product of B and C gives the area of the base of the parallelepiped (a
parallelogram), and dotting with A gives the volume:
    Volume = A · (B × C) .                                           (2-31)

[Figure 2-2. A parallelepiped, with its sides defined by vectors A, B and C. The area of
the parallelogram forming the base of this solid is equal to BC sin θ, where θ is the angle
between B and C. This is just the magnitude of the cross product B×C. When B×C is
dotted into A, the area of the base is multiplied by the height of the solid, giving its
volume.]
Putting in the forms of the dot and cross product using ε, we have
    Volume = A · (B × C)
           = Ai (B × C)i
           = Ai εijk Bj Ck                                           (2-33)
           = εijk Ai Bj Ck

There is an identity involving the triple scalar product which is easy to demonstrate from
this form:
       
    A · (B × C) = B · (C × A) = C · (A × B) .                        (2-34)
In the geometrical interpretation of the triple scalar product, these three forms correspond
to the three possible choices of which side of the parallelepiped to call the base (see
figure 2-2).
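
Here, too, a one-line symbolic spot-check in Mathematica is possible (a sketch, with
vector names of our own choosing):

    a = {a1, a2, a3};  b = {b1, b2, b3};  c = {c1, c2, c3};
    Simplify[a.Cross[b, c] - b.Cross[c, a]]        (* gives 0 *)
    Simplify[a.Cross[b, c] - c.Cross[a, b]]        (* gives 0 *)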

F. The triple vector product.

There is another way to combine three vectors, this time giving another vector:
   
 
    D = A × (B × C)      (triple vector product)                     (2-35)
In tensor notation this becomes
   
   
    Di = (A × (B × C))i
       = εijk Aj (B × C)k                                            (2-36)
       = εijk Aj εklm Bl Cm
This is not very encouraging. It is not simple, and furthermore it conjures up the prospect
of more cross products. Do we have to live in dread of D = A × (B × C) and all of her big
sisters?

The Epsilon Killer. Happily there is a solution. There is an identity which guarantees
that there will never be more than one εijk in an expression, by reducing a product of two
epsilons to Kronecker deltas. Here it is:
    εijk εilm = δjl δkm - δjm δkl                                    (2-37)
This is the epsilon killer! Here are the important structural features. There are two
epsilons, with the first index of each one the same, so there is a sum over that index. The
other four indices (two from each epsilon) are all different, and so are not summed. We
will not prove eq. (2-37), but it is not too difficult, if you just consider all the possibilities
for the indices.

We will now use this identity to simplify the expression for the vector triple product:
  
  
    (A × (B × C))i = εijk Aj εklm Bl Cm           definition of cross product
                   = εkij εklm Aj Bl Cm           cyclic permutation of indices
                   = (δil δjm - δim δjl) Aj Bl Cm    use epsilon killer    (2-38)
                   = Am Bi Cm - Al Bl Ci          sum over paired indices of deltas


The last step has used the "index transferring" property of a sum over one index of a delta
symbol illustrated in equation (2-12). In the last line of (2-38) we can see two sums over
   
paired indices, Am Cm = A·C and Al Bl = A·B. This gives
      
    
    (A × (B × C))i = Bi (A·C) - Ci (A·B)                             (2-39)
or, in pure vector form,
    A × (B × C) = B (A·C) - C (A·B)                                  (2-40)
This is sometimes referred to as the "BAC - CAB" identity. It occurs regularly in
advanced mechanics and electromagnetic theory.
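
Before trusting "BAC - CAB" in a long calculation, you can spot-check it the same way
(again a sketch with our own symbolic vectors):

    a = {a1, a2, a3};  b = {b1, b2, b3};  c = {c1, c2, c3};
    Simplify[Cross[a, Cross[b, c]] - (b (a.c) - c (a.b))]
                                                   (* gives {0, 0, 0} *)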

PROBLEMS

In the problems below, repeated indices imply summation, according to the Einstein
summation convention. Sum from 1 to 3 unless otherwise stated.

Problem 2-1. Consider δij and εijk as defined in the text, for a three-dimensional space.
(a) How many elements does δij have? How many of them are non-zero?
(b) Give the following values:
    δ11
    δ23
    δ31
(c) How many elements does εijk have? How many are equal to zero? Which elements
are equal to -1?
(d) Give the following values:
    ε111
    ε321
    ε123
    ε132
Problem 2-2. Evaluate the following sums, implied according to the
Einstein Summation Convention.
    δii
    ε12j δj3
    ε12k δ1k
    ε1jj

Problem 2-3. Consider a possible group of permutations operating on three indices, but
consisting of only the two members
    { I, P12 }                                                       (3-25)
(a) Is this set of operators closed under multiplication? Justify your answer.


(b) Is this set of operators Abelian? Justify your answer.

Problem 2-4. Consider a possible group of permutations operating on three indices, but
consisting of only the four members
    { I, P12, P13, P23 }                                             (3-25)
(a) Is this set of operators closed under multiplication? Explain your answer.
(b) Is this set of operators Abelian? Explain your answer.

Problem 2-5. Consider the full permutation group, operating on three indices.
(a) Is the group Abelian? Explain your answer.
(b) What about the subgroup consisting of just the two circular permutations (and the
identity)? Explain your answer.
[You might approach these questions by simply trying two successive permutations, and
then reversing the order.]

Problem 2-6. Assume that the cross product D = A × B is defined by the relation
    Di = (A × B)i = εijk Aj Bk .
Show using tensor notation (rather than writing out all the terms) that the magnitude of
this vector agrees with the geometrical definition of the cross product. That is, show that
D has a magnitude equal to |AB sin θ|. [Hint: Evaluate D·D using the "epsilon-killer"
identity.]

Problem 2-7. Use tensor notation (rather than writing out all the terms) to prove the
following identity for three arbitrary vectors A, B, and C:
    A · (B × C) = B · (C × A) = C · (A × B)

Problem 2-8. (a) Use tensor notation (rather than writing out all the terms) to prove the
following identity for two arbitrary vectors A and B:
    A · (A × B) = 0
[Hint: Use the symmetries of the Levi-Civita symbol to prove that
A · (A × B) = -A · (A × B). This implies that both sides of the equation are equal to
zero.]
(b) Make a geometrical argument, based on the direction of A × B, to show that this
identity has to be satisfied.

Problem 2-9. Let êi, i = 1,2,3 be the usual three directional unit vectors of a 3-
dimensional Cartesian coordinate system, satisfying the orthonormality relation
êi·êj = δij. In terms of components, A and B can be written as
    A = Am êm ,
    B = Bj êj .
Using these definitions for A and B and using tensor notation, show that
    A · B = Ai Bi .


Chapter 3. Linear Equations and Matrices

A wide variety of physical problems involve solving systems of simultaneous linear
equations. These systems of linear equations can be economically described and
efficiently solved through the use of matrices.

A. Linear independence of vectors.

Let us consider the three vectors E1, E2 and E3 given below.

 2
 
E1   1    Ei 1 , i  1,3
1
 
 4
 
E2   3    Ei 2 , i  1,3 (3-1)
 3
 
 3
 
E3   2    Ei 3 , i  1,3
1
 
These three vectors are said to be linearly dependent if there exist constants c1, c2, and c3,
not all zero, such that

    c1 E1 + c2 E2 + c3 E3 = { Eij cj } = 0                           (3-2)
The second form in the equation above gives the i-th component of the vector sum as an
implied sum over j, invoking the Einstein summation convention. Substituting in the
values for the components Eij from equation (3-1), this vector equation is equivalent to
the three linear equations:

    2c1 + 4c2 + 3c3 = 0
     c1 + 3c2 + 2c3 = 0                                              (3-3)
     c1 + 3c2 +  c3 = 0
These relations are a special case of a more general mathematical expression,
 
    A c = d                                                          (3-4)
where c and d are vectors represented as column matrices, and A is a sort of operator
which we will represent by a square matrix.

The goal of the next three chapters is to solve equation (3-4) by the audacious leap of
faith,


 
    c = A⁻¹ d                                                        (3-5)
What is A⁻¹? Dividing by a matrix!?! We will come to this later.

B. Definition of a matrix.

A matrix is a rectangular array of numbers. Below is an example of a matrix of
dimension 3x4 (3 rows and 4 columns).

2 1 1 1 
 
A    1  2 2 1   Aij , i  1,3, j  1,4 (3-6)
1 
1  1
 2
We will follow the "RC convention" for numbering elements of a matrix, where Aij is the
element of matrix A in its i-th row
and j-th column. As an example, in
the matrix above, the elements which
are equal to -1 are A13, A21, and A34.

C. The transpose of a matrix.

The transpose Aᵀ of a matrix A is obtained by drawing a line down the diagonal of the
matrix and moving each component of the matrix across the diagonal to the position
where its image would be if there were a mirror along the diagonal of the matrix.

[Figure 3-1. The transpose operation moves elements of a matrix from one side of the
diagonal (the "mirror line") to the other.]

This corresponds to the interchange of the indices on all the components of A:
    (Aᵀ)ij = Aji      Transpose                                      (3-7)
Example: Calculate the transpose of the square matrix given below:
          ( 2 4 3 )
    A  =  ( 1 3 2 )                                                  (3-8)
          ( 1 3 1 )

Solution:
    ( 2 4 3 )  transpose   ( 2 1 1 )
    ( 1 3 2 )  -------->   ( 4 3 3 )                                 (3-9)
    ( 1 3 1 )              ( 3 2 1 )


D. The trace of a matrix.

A simple property of a square matrix is its trace, defined as follows:


    Tr(A) = Aii      Trace                                           (3-10)
This is just the sum of the diagonal components of the matrix.

Example: Find the trace of the square matrix given in (3-8).

Solution:

 2 4 3
 
Tr  1 3 2   A11  A22 A33  2  3  1  6 (3-11)
 1 3 1
 

E. Addition of Matrices and Multiplication of a Matrix by a Scalar.

These two operations are simple and obvious – you add corresponding elements, or
multiply each element by the scalar. To add two matrices, they must have the same
dimensions.

    (A + B)ij = Aij + Bij      Addition                              (3-12)
    (cA)ij = c Aij             Multiplication by a Scalar            (3-13)

F. Matrix multiplication.

One of the important operations carried out with matrices is multiplication of one
matrix by another. For any two given matrices A and B the product matrix C  A B
can be defined, provided that the number of columns of the first matrix equals the number
of rows of the second matrix. Suppose that this is true, so that A is of dimension p x n,
and B is of dimension n x q. The product matrix is then of dimension p x q.

The general rule for obtaining the elements of the product matrix C is as follows:

    C = A B
           n
    Cij =  Σ  Aik Bkj ,  i = 1,p ,  j = 1,q    Matrix Multiplication     (3-14)
          k=1
    Cij = Aik Bkj      (Einstein's version)



This is illustrated below, for A a 3 x 4 matrix and B a 4 x 2 matrix.

    A B = C
    ( 2  1 -1  1 ) ( 1  3 )     ( -3  x )
    (-1 -2  2  1 ) (-2  2 )  =  (  y -8 )                            (3-15)
    ( 1  2  1 -1 ) ( 2 -1 )     (  0  z )
                   (-1  1 )

Example: Calculate the three missing elements x, y, and z in the result matrix
above.

Solution:
              4
    x = C12 = Σ A1k Bk2
             k=1
      = 2*3 + 1*2 + (-1)*(-1) + 1*1
      = 10 ;
    y = C21
      = (-1)*1 + (-2)*(-2) + 2*2 + 1*(-1)
      = 6 ;
    z = C32
      = 1*3 + 2*2 + 1*(-1) + (-1)*1
      = 5 .
There is a tactile way of remembering how to do this multiplication, provided that the
two matrices to be multiplied are written down next to each other as in equation (3-15).
Place a finger of your left hand on Ai1, and a finger of your right hand on B1j. Multiply
together the two values under your two fingers. Then step across the matrix A from left

to right with the finger of your left hand, simultaneously stepping down the matrix B
with the finger of your right hand. As you move to each new pair of numbers, multiply
them and add to the previous sum. When you finish, you have the value of Cij. For
instance, calculating C21 in the example of equation (3-15) this way gives -1 + 4 + 4 - 1 =
6.
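
For comparison, the same product can be done in Mathematica - a sketch, assuming the
signs shown in (3-15):

    A = {{2, 1, -1, 1}, {-1, -2, 2, 1}, {1, 2, 1, -1}};
    B = {{1, 3}, {-2, 2}, {2, -1}, {-1, 1}};
    A.B                                            (* gives {{-3, 10}, {6, -8}, {0, 5}} *)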

Einstein Summation Convention. For the case of 3x3 matrices operating on 3-
component column vectors, we can use the Einstein summation convention to write
matrix operations:
matrix multiplying vector:
 
    y = A x
    yi = Aij xj                                                      (3-16)

matrix multiplying matrix:


C  AB
(3-17)
C ij  Aik Bkj

The rules for matrix multiplication may seem complicated and arbitrary. You might
ask, "Where did that come from?" Here is part of the answer. Look at the three
simultaneous linear equations given in (3-3) above. They are precisely given by

multiplying a matrix A of numerical coefficients into a column vector c of variables, to
wit:
 
    A c = 0 ;
    ( 2 4 3 ) ( c1 )   ( 0 )
    ( 1 3 2 ) ( c2 ) = ( 0 )                                         (3-18)
    ( 1 3 1 ) ( c3 )   ( 0 )
  
The square matrix above is formed of the components of the three vectors E1, E2 and E3,
placed as its columns:
    A = ( E1  E2  E3 )                                               (3-18a)
This matrix representation of a system of linear equations is very useful.

Exercise: Use the rules of matrix multiplication above to verify that (3-18) is
equivalent to (3-3).

G. Properties of matrix multiplication.

Matrix multiplication is not commutative. Unlike multiplication of scalars, matrix
multiplication depends on the order of the matrices:

    A B ≠ B A                                                        (3-19)

Matrix multiplication is thus said to be non-commutative.

Example: To investigate the non-commutativity of matrix multiplication, consider


the two 2x2 matrices A and B:
          ( 1  2 )
    A  =  ( 4  3 ) ,
                                                                     (3-20)
          ( 2 -1 )
    B  =  ( 1 -2 )

Calculate the two products A B and B A and compare.


Solution:
          ( 1  2 ) ( 2 -1 )   (  4  -5 )
    A B = ( 4  3 ) ( 1 -2 ) = ( 11 -10 )                             (3-21)
but
          ( 2 -1 ) ( 1  2 )   ( -2  1 )
    B A = ( 1 -2 ) ( 4  3 ) = ( -7 -4 )                              (3-22)
The results are completely different.
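
In Mathematica the comparison takes one line for each product:

    A = {{1, 2}, {4, 3}};  B = {{2, -1}, {1, -2}};
    A.B                                            (* gives {{4, -5}, {11, -10}} *)
    B.A                                            (* gives {{-2, 1}, {-7, -4}} *)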

Other properties. The following properties of matrix multiplication are easy to verify.
A B C   A B C Associative Property (3-23)
A B  C   A B  A C Distributive Property (3-24)

H. The unit matrix.

A useful matrix is the unit matrix, or the identity matrix,

 1 0 0
 
I   0 1 0    ij  (3-25)
 0 0 1
 

This matrix has the property that, for any square matrix A ,

    I A = A I = A                                                    (3-26)

I has the same property for matrices that the number 1 has for scalar multiplication.
This is why it is called the unit matrix.

I. Square matrices as members of a group. The rules for matrix multiplication given
above apply to matrices of arbitrary dimension. However, square matrices (number of
rows equals the number of columns) and vectors (matrices consisting of one column)
have a special interest in physics, and we will emphasize this special case from now on.
The reason is as follows: When a square matrix multiplies a column matrix, the result is
another column matrix. We think of this as the matrix "operating" on a vector to produce
another vector. Sets of operators like this, which transform one vector in a space into
another, can form groups. (See the discussion of groups in Chapter 2.) The key
characteristic of a group is that multiplication of one member by another must be defined,
in such a way that the group is closed under multiplication; this is the case for square
matrices. (An additional requirement is the existence of an inverse for each member of
the group; we will discuss inverses soon.)


Notice that rotations of the coordinate system form a group of operations: a rotation of
a vector produces another vector, and we will see that rotations of 3-component vectors
can be represented by 3x3 square matrices. This is a very important group in physics.

J. The determinant of a square matrix.

For square matrices a useful scalar quantity called the determinant can be calculated.
The definition of the determinant is rather messy. For a 2 x 2 matrix, it can be defined as
follows:

        ( a b )     | a b |
    det ( c d )  =  | c d |  =  ad - bc                              (3-27)
That is, the determinant of a 2 x 2 matrix is the product of the two diagonal elements
minus the product of the other two. This can be extended to a 3x3 matrix as follows:

    | a b c |
    | d e f |  =  a | e f |  -  b | d f |  +  c | d e |
    | g h i |       | h i |       | g i |       | g h |              (3-28)

               =  a(ei - hf) - b(di - gf) + c(dh - ge)

Example: Calculate the determinant of the square matrix A of eq. (3-18) above.

Result:

 2 4 3
 
 1 3 2   2(3  6)  4(1  2)  3(3  3) (3-29)
1 3 1
 
 2


m={{2,4,3},{1,3,2},{1,3,1}} defining a matrix
MatrixForm[m] display as a rectangular array
c m multiply by a scalar
a.b matrix product
Inverse[m] matrix inverse
MatrixPower[m,n] nth power of a matrix
Det[m] determinant
Tr[m] trace
Transpose[m] transpose
Eigenvalues[m] eigenvalues
Eigenvectors[m] eigenvectors
Eigenvalues[N[m]],Eigenvectors[N[m]] numerical eigenvalues and eigenvectors
m=Table[Random[],{3},{3}] 3x3 matrix of random numbers

Table 3-1. Some mathematical operations on matrices which Mathematica can carry
out.

The form (3-28) for the determinant can be put in a more general form if we make a few
definitions. If A is a square nxn matrix, its (i,j)-th minor Mij is defined as the
determinant of the (n-1)x(n-1) matrix formed by removing the i-th row and j-th column of
the original matrix. In (3-28) we see that a is multiplied by its minor, -b is multiplied by
the minor of b, and c is multiplied by its minor. We can then write, for a 3 x 3 matrix A ,
           3
    |A| =  Σ  A1j (-1)^(1+j) M1j                                     (3-30)
          j=1

[Notice that j occurs three times in this expression, and we have been obliged to back
away from the Einstein summation convention and write out the sum explicitly.] This
expression could be generalized in the obvious way to a matrix of an arbitrary number of
dimensions n, merely by summing from j = 1 to n .

The 3 x 3 determinant expressed with the Levi-Civita tensor.

Loving the Einstein summation convention as we do, we are piqued by having to give it
up in the preceding definition of the determinant. For 3x3 matrices we can offer the
following more elegant definition of the determinant. If we write out the determinant of a
3 x 3 matrix in terms of its components, we get
    |A| = A11 A22 A33 - A11 A23 A32 - A12 A21 A33 + A12 A23 A31
                                                                     (3-31)
        + A13 A21 A32 - A13 A22 A31
Each term is of the form A1iA2jA3k, and it is not too hard to see that the terms where (ijk)
are an even permutation of (123) have a positive sign, and the odd permutations have a
negative sign. That is,


    |A| = A1i A2j A3k εijk                                           (3-32)
(Yes, with the Einstein summation convention in force.)
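
Since Signature is the Levi-Civita symbol, (3-32) can be tested against the built-in Det -
a sketch for a fully symbolic 3x3 matrix (the array name a is ours):

    m = Array[a, {3, 3}];                          (* symbolic matrix with entries a[i,j] *)
    Simplify[Sum[m[[1, i]] m[[2, j]] m[[3, k]] Signature[{i, j, k}],
                 {i, 3}, {j, 3}, {k, 3}] - Det[m]] (* gives 0 *)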

The Meaning of the determinant. The determinant of a matrix is at first sight a rather
bizarre combination of the elements of the matrix. It may help to know (more about this
later) that the determinant, written as an absolute value, |A|, is in fact a little like the
"size" of the matrix. We will see that if the determinant of a matrix is zero, its operation
destroys some vectors - multiplying them by the matrix gives zero. This is not a good
property for a matrix, sort of like a character fault, and it can be identified by calculating
its determinant.

K. The 3 x 3 determinant expressed as a triple scalar product.

You might have noted that (3-31) looks a whole lot like a scalar triple product of three
vectors. In fact, if we define three vectors as follows:

    A1 = ( A11  A12  A13 ) ,
    A2 = ( A21  A22  A23 ) ,                                         (3-33)
    A3 = ( A31  A32  A33 ) ,
then we can write

    |A| = A1i A2j A3k εijk
                                                                     (3-34)
        = A1 · (A2 × A3)
and the matrix A can be thought of as being composed of three row vectors:

          ( A11 A12 A13 )
    A  =  ( A21 A22 A23 )                                            (3-35)
          ( A31 A32 A33 )

Thus taking the determinant is always equivalent to forming the triple product of the
three vectors composing the rows (or the columns) of the matrix.

L. Other properties of determinants.

Here are some properties of determinants, without proof.

Product law. The determinant of the product of two square matrices is the
product of their determinants:

    det(A B) = det(A) det(B)      (product law)                      (3-36)


Transpose Law. Taking the transpose does not change the determinant of a
matrix:


    det(Aᵀ) = det(A)      (transpose law)                            (3-37)


Interchanging columns or rows. Interchanging any two columns or rows of a
matrix changes the sign of the determinant.
Equal columns or rows. If any two rows of a matrix are the same, or if any two
columns are the same, the determinant of the matrix is equal to zero.

M. Cramer's rule for simultaneous linear equations.

Consider two simultaneous linear equations for unknowns x1 and x2:


 
    A x = C                                                          (3-38)

or

    ( A11 A12 ) ( x1 )   ( C1 )
    ( A21 A22 ) ( x2 ) = ( C2 )                                      (3-39)

or

    A11 x1 + A12 x2 = C1
                                                                     (3-40)
    A21 x1 + A22 x2 = C2
The last two equations can be readily solved algebraically for x1 and x2, giving

         C1 A22 - C2 A12
    x1 = ---------------- ,
         A11 A22 - A21 A12
                                                                     (3-41)
         A11 C2 - A21 C1
    x2 = ----------------
         A11 A22 - A21 A12

By inspection, the last two equations are ratios of determinants:

         | C1 A12 |
         | C2 A22 |
    x1 = --------- ,
            |A|
                                                                     (3-42)
         | A11 C1 |
         | A21 C2 |
    x2 = ---------
            |A|

This pattern can be generalized, and is known as Cramer's rule:


         | A11  A12  .  .  A1,i-1  C1  A1,i+1  .  .  A1n |
         | A21  A22  .  .  A2,i-1  C2  A2,i+1  .  .  A2n |
         |  .    .          .      .     .            .  |
         |  .    .          .      .     .            .  |
         | An1  An2  .  .  An,i-1  Cn  An,i+1  .  .  Ann |
    xi = ------------------------------------------------           (3-43)
                              |A|

As an example of using Cramer's rule, let us consider the three simultaneous linear
equations for x1, x2, and x3 which can be written
 
    A x = C ;
    ( 2 4 3 ) ( x1 )   ( 1 )
    ( 1 3 2 ) ( x2 ) = ( 1 )                                         (3-44)
    ( 1 3 1 ) ( x3 )   ( 4 )
First we calculate the determinant |A|:
          | 2 4 3 |
    |A| = | 1 3 2 |
          | 1 3 1 |

        = 2 | 3 2 | - 4 | 1 2 | + 3 | 1 3 |
            | 3 1 |     | 1 1 |     | 1 3 |                          (3-45)

        = 2(-3) - 4(-1) + 3(0)
        = -2
The values for the xi are then given by


         | 1 4 3 |
         | 1 3 2 |
         | 4 3 1 |    1(-3) - 4(-7) + 3(-9)    -2
    x1 = ---------  = ---------------------  = --  =  1 ,
            |A|                -2              -2

         | 2 1 3 |
         | 1 1 2 |
         | 1 4 1 |    2(-7) - 1(-1) + 3(3)     -4
    x2 = ---------  = --------------------  =  --  =  2 ,            (3-46)
            |A|                -2              -2

         | 2 4 1 |
         | 1 3 1 |
         | 1 3 4 |    2(9) - 4(3) + 1(0)        6
    x3 = ---------  = ------------------  =    --  =  -3 .
            |A|                -2              -2
So, the solution to the three simultaneous linear equations is
 1 
  
x  2  (3-47)
  3
 

Is this correct? The check is to substitute this vector x into equation (3-44), carry out the
multiplication of x by A, and see if you get back C.
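
In Mathematica both the solution and the check take a line each; LinearSolve solves
A x = C directly:

    A = {{2, 4, 3}, {1, 3, 2}, {1, 3, 1}};  c = {1, 1, 4};
    LinearSolve[A, c]                              (* gives {1, 2, -3} *)
    A.{1, 2, -3}                                   (* gives {1, 1, 4}, which is C again *)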

N. Condition for linear dependence.

Finally we return to the question of whether or not the three vectors E1 , E2 and E3
given in section A above are linearly independent. The linear dependence relation (3-2)
can be re-written as
   
    x1 E1 + x2 E2 + x3 E3 = 0 ,                                      (3-2a)
a special case of equation (3-38) above,
    A x = C                                                          (3-38)

where the constants C are all zero. From Cramer's rule, equation (3-43), we can
calculate xi, and they will all be zero! This seems to say that the three vectors are linearly
independent no matter what. But the exception is when |A| = 0; in that case, Cramer's
rule gives zero over zero, and is indeterminate. This is the situation (this is not quite a
proof!) where the xi in (3-2a) above are not zero, and the vectors are linearly dependent.
We can summarize this condition as follows:


    det( E1  E2  E3 ) = 0   ⇔   E1, E2, E3 are linearly dependent    (3-48)
 
Example: As an illustration, let us take the vectors E1 and E3 from eq. (3-1), and
construct a new vector E4 by taking their sum:
                       ( 2 )   ( 3 )   ( 5 )
    E4 = E1 + E3   =   ( 1 ) + ( 2 ) = ( 3 )                         (3-49)
                       ( 1 )   ( 1 )   ( 2 )
E1, E3 and E4 are clearly linearly dependent, since E1 + E3 - E4 = 0. We form the
matrix A by using E1, E3 and E4 for its columns:
                             ( 2 3 5 )
    A = ( E1  E3  E4 )   =   ( 1 2 3 )                               (3-50)
                             ( 1 1 2 )
Now the acid test for linear dependence: calculate the determinant of A:
          | 2 3 5 |
    |A| = | 1 2 3 |  =  2(1) - 3(-1) + 5(-1)  =  0                   (3-51)
          | 1 1 2 |
This confirms that E1, E3 and E4 are linearly dependent.
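
Or, as a one-line check in Mathematica (matrix from (3-50)):

    Det[{{2, 3, 5}, {1, 2, 3}, {1, 1, 2}}]         (* gives 0: linearly dependent *)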

There is a final observation to be made about the condition (3-48) for linear dependence.
The determinant of a 3x3 matrix can be interpreted as the triple scalar product of the
vectors forming its rows or columns. So, if the determinant is zero, it means that
E1 · (E2 × E3) = 0, and this has the following geometrical interpretation: If E1 is
perpendicular to E2 × E3, it must lie in the plane formed by E2 and E3; if this is so, it
is not independent of E2 and E3.

O. Eigenvectors and eigenvalues.

For most matrices A there exist special vectors V which are not changed in direction
under multiplication by A :
    A V = λ V                                                        (3-52)
In this case V is said to be an eigenvector of A and λ is the corresponding eigenvalue.

In many physical situations there are special, interesting states of a system which are
invariant under the action of some operator (that is, invariant aside from being multiplied
by a constant, the eigenvalue). Some very important operators represent the time


evolution of a system. For instance, in quantum mechanics, the Hamiltonian operator


moves a state forward in time, and its eigenstates represent "stationary states," or states of
definite energy. We will soon see examples in mechanics of coupled masses where the
eigenstates describe the normal modes of motion of the system. Eigenvectors are in
general charismatic and useful.

So, what are the eigenstates and eigenvalues of a square matrix? The eigenvalue
equation, (3-52) above, can be written
    ( A - λ I ) V = 0                                                (3-53)
or, for the case of a 3x3 matrix,
    ( A11-λ    A12     A13  ) ( V1 )   ( 0 )
    ( A21     A22-λ    A23  ) ( V2 ) = ( 0 )                         (3-54)
    ( A31      A32    A33-λ ) ( V3 )   ( 0 )
As discussed in the last section, the condition for the existence of solutions for the
variables V1 ,V2 ,V3  is that the determinant
A11   A12 A13
A21 A22   A23 (3-55)
A31 A32 A33  
vanish. This determinant will give a cubic polynomial in the variable λ, with in general
three solutions, λ(i) ∈ { λ(1), λ(2), λ(3) }. For each value, the equation
    ( A - λ(i) I ) V(i) = 0                                          (3-56)
can be solved for the components of the i-th eigenvector V(i).

Example: Rotation about the z axis.

In Chapter 1 we found the components of a vector rotated by an angle θ about the
z axis. Including the fact that the z component does not change, this rotation can
be represented as a matrix operation,
    x' = Rz(θ) x                                                     (3-57)
where
              (  cos θ   sin θ   0 )
    Rz(θ)  =  ( -sin θ   cos θ   0 )                                 (3-58)
              (    0       0     1 )
Now, based on the geometrical properties of rotating about the z axis, what vector
will not be changed? A vector in the z direction! So, an eigenvector of Rz(θ) is
            ( 0 )
    V(1)  = ( 0 )                                                    (3-59)
            ( 1 )


Try it out:
                 (  cos θ   sin θ   0 ) ( 0 )   ( 0 )
    Rz(θ) V(1) = ( -sin θ   cos θ   0 ) ( 0 ) = ( 0 )                (3-60)
                 (    0       0     1 ) ( 1 )   ( 1 )
V(1) is an eigenvector of Rz(θ), with eigenvalue λ(1) = 1.
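
Mathematica will produce the remaining eigenvalues of Rz(θ) as well - a sketch, with t
standing for θ (the ordering of the output may differ):

    Rz = {{Cos[t], Sin[t], 0}, {-Sin[t], Cos[t], 0}, {0, 0, 1}};
    Rz.{0, 0, 1}                                   (* gives {0, 0, 1}: eigenvalue 1 *)
    Eigenvalues[Rz]                                (* gives 1, Cos[t] - I Sin[t], and
                                                      Cos[t] + I Sin[t], i.e. 1 and e^(±it) *)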

PROBLEMS

Problem 3-1. Consider the square matrix
          ( 1 -1  3 )
    B  =  ( 2  3  1 ) .
          ( 1 -2  1 )

(a) Calculate the transpose of B.
(b) Verify by direct calculation that det(B) = det(Bᵀ).

Problem 3-2. Consider the two matrices
          ( 2 4 3 )
    A  =  ( 1 3 2 )
          ( 1 3 1 )
and
          ( 1 -1  3 )
    B  =  ( 2  3  1 ) .
          ( 1 -2  1 )

If C is the product matrix, C = A B, verify by direct calculation that
    det(C) = det(A B) = det(A) det(B) .

Problem 3-2a. There is a useful property of the determinant, stated in Problem 3-2
above, which is rather hard to prove algebraically:
    det(C) = det(A B) = det(A) det(B) ,
where A, B and C are square matrices of the same dimension. Let's consider a
numerical "proof" of this theorem, using Mathematica. (See Table 3-1 for a collection of
useful Mathematica operations.)
(a) Fill two 4x4 matrices - call them A and B - with random numbers.
(b) Calculate their determinants.
(c) Calculate the product matrix C = A B, and the determinant of C.
(d) See if the theorem above is satisfied in this case.


(e) Another theorem concerning square matrices is stated in Problem 3-1 above. Test
it, for both of your random matrices.

Problem 3-3. Using tensor notation and the Einstein summation convention, prove the
following theorem about the transpose of the product of two square matrices: If C is the
product matrix, C = A B, then
    Cᵀ = Bᵀ Aᵀ .
[As a starting point, under the Einstein Summation Convention,
    Cij = Ail Blj .]

Problem 3-4. Calculate det(I) and Tr(I), where I is the n x n identity matrix.

Problem 3-5. Starting with the definition (3-30) of the determinant, but generalized to n
dimensions by carrying out the sum over j from 1 to n, use the fact that interchanging two
rows of a matrix changes the sign of its determinant to prove the following expression for
the determinant of a n x n matrix:
           n
    |A| =  Σ  Aij (-1)^(i+j) Mij ,
          j=1

where i can take on any value from 1 to n. This theorem says that, while the determinant
is usually calculated as a sum of terms with the first factor coming from the top row of
the matrix, the first factor can in fact be taken from any chosen row of the matrix, if the
correct sign is factored in.

Problem 3-6. Verify by multiplying out A x that the xi’s of equation (3-47) are a
solution of equation (3-44).

Problem 3-7. Consider the system of linear equations below for x1, x2, and x3.
     x1 + x2 + x3 = 2
    -x1 + x2 + x3 = 0 .
     x1 + x2 - x3 = 6
Consider this to be a matrix equation of the form
 
    A x = c .
First, write down the matrix A. Then, using Cramer's rule, solve for x1, x2, and x3.
Finally, as a check, multiply out A x and compare it to c.

Problem 3-8. Consider the system of linear equations below for the components (x1, x2,
x3 , x4) of a four-dimensional vector,


    x1 + x2           = 2
    x1 + x2 + x3      = 3
         x2 + x3 + x4 = 3 .
              x3 + x4 = 2
These equations can be represented as a matrix equation of the form
    A x = c ,
with A a 4x4 matrix, and x and c 4-element vectors.

(a) Write down the matrix A and the vector c.
(b) Use Mathematica to solve the system of equations; that is, find the unknown vector
x. (Hint: first calculate the inverse of the matrix A.)
(c) Check the solution by multiplying x by A and comparing to c. (Do this check by
hand, not with Mathematica.)

Problem 3-9. (a) Show that the three vectors given in equation (3-1) of the text are
linearly independent.
  
(b) Make a new E3 by changing only one of its three components, such that E1, E2, and
E3 are now linearly dependent.

Problem 3-10. Consider the rotation matrix discussed in the chapter,


              (  cos θ   sin θ   0 )
    Rz(θ)  =  ( -sin θ   cos θ   0 ) .
              (    0       0     1 )
(a) Consider the vector
            ( 1 )
    V(2)  = ( i ) .
            ( 0 )
Calculate Rz(θ) V(2), and thus show that V(2) is an eigenvector of Rz(θ). [Here the
second component of V(2) is i, the complex number whose square is equal to -1.]
(b) What is the eigenvalue corresponding to the eigenvector V(2) of part (a)?

Problem 3-11. In this chapter we defined addition of matrices and multiplication of a
matrix by a scalar. Do square matrices form a vector space V? Refer to Table 1-1 for the
formal properties of a vector space.
(a) Demonstrate whether or not this vector space V is closed under addition and
multiplication by a scalar.
(b) Do the same for the existence of a zero and the existence of a negative.
(c) Show that property (1-36) is satisfied. [Hint: This is most economically done using
index notation.]
(d) A vector space without a metric is not very interesting. Could the matrix product as
we have defined it be used as an inner product? Discuss.



Chapter 4. Practical Examples.


In this chapter we will discuss solutions to two physics problems where we make use of
techniques discussed in this book. In both cases there are multiple masses, coupled to
each other so that their motions are not independent. This leads to coupled linear
equations, which are naturally treated using matrices.

A. Simple harmonic motion - a review.

We are going to discuss masses coupled by springs and a compound pendulum. Let us
start by reviewing the mathematical description of the oscillations of a single mass on a
spring or a simple pendulum.

Figure 4-1 shows the two simple systems which form the basis for the more complex
systems to be studied.

[Figure: (a) a mass m attached to a spring of constant k, displaced by x, with restoring
force kx; (b) a simple pendulum of length L and mass m in gravity g, with restoring
force mgx/L.]
Figure 4-1. (a) A mass on a spring. (b) A simple pendulum. In both cases there is a restoring
force proportional to the displacement (for small displacements in the case of the pendulum). In the
analysis of these systems we will ignore vertical forces, which just cancel.
In each case there is a restoring force proportional to the displacement:
    F ∝ displacement                                                 (4-1)
If we combine this with Newton's law of motion,
    Σ (applied forces) = m a                                         (4-2)
we obtain
    acceleration = d²x/dt² = -(some constant / m) x                  (4-3)
or
    d²x/dt² = -ω0² x      SHM                                        (4-4)
You can easily show that for the mass on a spring, ω0 = √(k/m), and for the pendulum,
ω0 = √(g/L), two famous relations.


So, how do we find a function x(t) satisfying equation (4-4)? Its graphical interpretation
is the following: the second derivative of a function gives the curvature, with a positive
second derivative making the function curve up, negative, down. So, equation (4-4) says
that the function always curves back towards the x = 0 axis, as shown in figure 4-2. Look
like a sine wave?

[Figure: a plot of x(t) versus t; where x > 0 the curve bends downward and where x < 0
it bends upward, since d²x/dt² = -(constant) x.]

Figure 4-2. The differential equation makes the curve x(t) keep curving back towards the axis,
like, for instance, a sine wave.

The equation (4-4) cannot be simply integrated to give x(t). Too bad. Second best is to
do what physicists usually do - try to guess the solution. What familiar functions do we
know which come back to the same form after two derivatives?
                    sin ω0t                       sinh ωt
    f'' = -ω²f :    cos ω0t        f'' = +ω²f :   cosh ωt
                    e^(iωt)                       e^(ωt)
                    e^(-iωt)                      e^(-ωt)
The first set of functions are the ones to use here, though they are closely related to the
second set. The general solution to equation (4-4) can be written as
    x(t) = C cos(ω0t + φ)                                            (4-5)
where C and φ are arbitrary constants. (Second-order differential equations in time
always leave two constants to be determined from initial conditions.) It is fairly easy to
show (given as a homework problem) that the following forms are equivalent to that
given in equation (4-5).
    x(t) = A sin ω0t + B cos ω0t ,        A and B real constants
    x(t) = D e^(iω0t) + E e^(-iω0t) ,     E* = D complex constants   (4-6)
    x(t) = Re[ F e^(iω0t) ] ,             F a complex constant


It turns out that the exponential forms are the easiest to work with in many calculations,
and the very easiest thing is to set
    x(t) = a e^(iω0t) .                                              (4-7)
This looks strange, since observables in physics have to be real. But what we do is to use
this form to solve any (linear) differential equation, and take the real part afterwards. It
works. We will use this form for the general solution in the examples to follow.
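
As a quick symbolic sanity check that (4-7) really solves (4-4), in Mathematica (a sketch;
w0 is our name for ω0):

    x[t_] := a Exp[I w0 t]
    Simplify[x''[t] + w0^2 x[t]]                   (* gives 0 *)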

B. Coupled oscillations - masses and springs.

Many complex physical systems display the phenomenon of resonance, where all parts of
the system move together in periodic motion, with a frequency which depends on inertial
and elastic properties of the system. The simplest example is a single point mass
connected to a single ideal spring, as shown in figure 4-1a. The mass has a sinusoidal
displacement with time which can be described by the function given in equation (4-7),
with ω0 = √(k/m) as the resonant frequency of the system, and a a complex amplitude. It is
understood that the position of the mass is actually given by the real part of the
expression (4-7); thus the magnitude of a gives the maximum displacement of the mass
from its equilibrium position, and the phase of a determines the phase of the sinusoidal
oscillation.

A system of two masses. A somewhat more complicated system is shown in figure 4-3.
Here two identical masses are connected to each other and to rigid walls by three
identical springs. The motions of masses 1 and 2 are described by their respective
displacements x1(t) and x2(t) from their equilibrium positions. The magnitude of the
force exerted by each spring is equal to k times the change in length of the spring from
the equilibrium position of the system, where it is assumed that the springs are
unstretched. For instance, the force exerted by the spring in the middle is equal to k(x2 -
x1). Taking the positive direction to be to the right, its force on m1 would be equal to +
k(x2 - x1), and its force on m2 would be equal to -k(x2 - x1). Newton's second law for the
two masses m1 and m2 then leads to the two equations

[Figure 4-3. System of two coupled oscillators: two masses m, with displacements x1 and
x2, connected to the walls and to each other by three springs of constant k.]


    m x1'' = -k x1 + k (x2 - x1)
                                                                     (4-8)
    m x2'' = -k (x2 - x1) - k x2
or, in full matrix notation,
    F = m x'' = -k K x      (generalized Hooke's law)                (4-8a)
where
          (  2 -1 )
    K  =  ( -1  2 )                                                  (4-8b)
In the absence of external forces the masses will vibrate back and forth in some
complicated way. A mode of vibration where both masses move at the same frequency,
in some fixed phase relation, is called a normal mode, and the associated frequencies are
referred to as the resonant frequencies of the system. Such a motion is described by
    x1(t) = a1 e^(iωt)
                                                                     (4-9)
    x2(t) = a2 e^(iωt)
or
    x(t) = a e^(iωt) .                                               (4-9a)
Note that the frequency ω is the same for both masses, but the amplitude and phase,
determined by a1 or a2, are in general different for each mass.

Substituting (4-9a) into (4-8a), and using the fact that d²/dt² e^(iωt) = -ω² e^(iωt), we
obtain two coupled linear equations for the two undetermined constants of the motion
a1 and a2:
    2a1 - a2 = λ a1
                                                                     (4-10)
    -a1 + 2a2 = λ a2
or
    K a = λ a .                                                      (4-10a)

Here we have introduced a dimensionless constant
    λ = ω²/ω0² ,                                                     (4-11)
where ω is the angular frequency of this mode of oscillation, and
    ω0 = √(k/m)                                                      (4-12)
is a constant characteristic of the system, with the dimensions of an angular frequency.
Note that ω0 is not necessarily the actual frequency of any of the normal modes of the
system; the frequency of a given normal mode will be given by ω = ω0 λ^(1/2).

Equation (4-10a) is the eigenvalue equation for the matrix K, and the eigenvalues are
determined by re-writing (4-10a) as
    ( K - λ I ) a = 0 .                                              (4-16)


This system of linear equations will have solutions when the determinant of the matrix
K - λI is equal to zero. This leads to the characteristic equation:
    | K - λI | = | 2-λ   -1  |
                 | -1   2-λ  |
               = (2-λ)² - 1
               = λ² - 4λ + 3                                         (4-17)
               = (λ-1)(λ-3) = 0
There are thus two values of λ for which equation (4-16) has a solution: λ(1) = 1 and
λ(2) = 3, corresponding to frequencies of oscillation ω(1) = ω0 and ω(2) = √3 ω0. We will
investigate the nature of the oscillation for each of these resonant frequencies.

Case 1. ω(1) = ω0. This is the same frequency as for the single mass-on-a-spring of
figure 4-1a. How can the interconnected masses resonate at this same frequency? A
good guess is that they will move with a1 = a2, so that the distance between m1 and m2 is
always equal to the equilibrium distance, and the spring connecting m1 and m2 exerts no
force on either mass. To verify this, we substitute λ = 1 into equation (4-16) and solve
for a1 and a2:
2 1 
 K   I  a   1 2    a
 
. (4-18)
 1 1
 a  0
 1 1 
giving two equations for two unknowns:
    a1 - a2 = 0
                                                                     (4-19)
    -a1 + a2 = 0
Both equations tell us the same thing:
    a1 = a2 .                                                        (4-20)
Both masses have the same displacement at any given time, so the spring joining them
never influences their motion, and their resonant frequency is the same as if the central
spring was not there.

Case 2. ω(2) = √3 ω0. This frequency is higher than for the single mass-on-a-spring of
figure 4-1a, so the middle spring must be stretched in such a way as to reinforce the effect
of the outer springs. We might guess that the two masses are moving in opposite
directions. Then as they separated, the middle spring would pull them both back towards
the center, while the outside springs pushed them back towards the center. The
acceleration would be greater and the vibration faster. We can see if this is right by
substituting λ = 3 into equation (4-16) and solving for a1 and a2:


 2   (2) 1 
 K   (2)
I   1 2   (2)  a
a 
 
. (4-21)
 1 1
 a  0
 1 1
giving the equations
    a1 + a2 = 0
                                                                     (4-22)
    a1 + a2 = 0
confirming that
    a1 = -a2 .                                                       (4-23)
Thus we have the following eigenvalues and eigenvectors for the matrix K:
             1  ( 1 )
    a(1) = ---- ( 1 ) ,     λ(1) = 1
            √2
                                                                     (4-24)
             1  (  1 )
    a(2) = ---- ( -1 ) ,    λ(2) = 3
            √2
The equations above only determined the ratios of components of a; I have added the
factor of 1/√2 to normalize the vectors to a magnitude of 1.
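
This is just what Mathematica's Eigensystem reports for K, up to ordering and
normalization (a sketch):

    K = {{2, -1}, {-1, 2}};
    Eigensystem[K]                                 (* gives {{3, 1}, {{-1, 1}, {1, 1}}} *)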

Three interconnected masses. With three masses instead of two, at positions x1, x2 and
x3, the three coupled equations still have the form of equation (4-8a), with

[Figure 4-4. System of three coupled oscillators: three masses m and four springs k.]

          (  2 -1  0 )
    K  =  ( -1  2 -1 )                                               (4-25)
          (  0 -1  2 )
and characteristic equation
                 | 2-λ   -1    0  |
    | K - λI | = | -1   2-λ   -1  |  =  0 .                          (4-26)
                 |  0    -1   2-λ |
It will be left to the problems to find the three normal-mode frequencies and to determine
the way the masses move in each case.

Systems of many coupled masses. A long chain of masses coupled with springs is a
commonly used model of vibrations in solids and in long molecules. It would not be too


hard to write down the matrix K corresponding to such a long chain. However,
analyzing the solutions requires more advanced methods which we have not yet
developed.

C. The triple pendulum

There is an interesting problem which illustrates the power (and weaknesses) of the
trained physicist. Consider three balls, suspended from a fixed point, as shown in figure
4-5a.

[Figure 4-5. Three balls, forming a compound pendulum: masses m1, m2, m3 hang on
strings of lengths L1, L2, L3, with transverse displacements x1, x2, x3 and string tensions
T1, T2, T3. (a) Hanging from the ceiling, at rest. (b) Oscillating in the first normal mode.
(c) Free-body diagram for ball 2.]

If the balls are displaced from equilibrium and released, they can move in rather
complicated ways. A further amusing problem is to imagine making the point of support
move back and forth, or in a circle. We may not get quite this far, for lack of time.

To make a tractable problem, take the usual scandalous physics approach of simplifying
the problem, as follows:
1. Consider only motion in a plane, consisting of the vertical direction and a transverse
direction.
2. Consider only small displacements. The idea is to be able to make the small-angle
approximation to trigonometric functions.
3. Take all three masses to be equal, given by m, and take the three string lengths to be
equal, given by L.

Now the problem looks like figure 4-5b. The three variables of the problem are the
transverse positions of the three balls. The forces on the three balls are not too hard to


calculate. For instance, the free-body diagram for ball 2 is shown in Figure 4-5c. In the
small-angle approximation,
 2  sin  2   x2  x1  / L
. (4-27)
3  sin 3   x3  x2  / L
Also, reasoning that the string tensions mainly just hold the balls up, they are given by

$$T_1=3mg,\qquad T_2=2mg,\qquad T_3=mg.$$   (4-28)
The vertical forces automatically cancel. For forces in the horizontal direction, Newton's
second law ("ma = F") for this ball then gives

$$m\ddot x_2=-T_2\sin\theta_2+T_3\sin\theta_3
=-\frac{2mg}{L}(x_2-x_1)+\frac{mg}{L}(x_3-x_2)
=-2m\omega_0^2(x_2-x_1)+m\omega_0^2(x_3-x_2).$$   (4-29)

Here we have used the fact that a simple pendulum consisting of a mass m on a string of
length L oscillates with an angular frequency of

$$\omega_0=\sqrt{\frac{g}{L}}.$$   (4-30)
Similar reasoning for the other two masses leads to the three coupled equations in three
unknowns,

$$\begin{aligned}
m\ddot x_1&=-3m\omega_0^2 x_1+2m\omega_0^2(x_2-x_1)\\
m\ddot x_2&=-2m\omega_0^2(x_2-x_1)+m\omega_0^2(x_3-x_2)\\
m\ddot x_3&=-m\omega_0^2(x_3-x_2).
\end{aligned}$$   (4-31)

We now look for normal modes, where

$$\vec x=\vec a\,e^{i\omega t}.$$   (4-32)

Substituting into equation (4-31) gives a factor of $-\omega^2$ on the left-hand side, suggesting
that we define a dimensionless variable as before,

$$\lambda=\frac{\omega^2}{\omega_0^2},$$   (4-33)

giving

$$\lambda\overline{I}\,\vec a=\overline{K}\,\vec a,$$   (4-34)

with

$$\overline{K}=\begin{pmatrix}5&-2&0\\-2&3&-1\\0&-1&1\end{pmatrix}.$$   (4-35)


This is the classic eigenvector-eigenvalue equation,

$$\overline{K}\vec a=\lambda\vec a.$$   (4-36)

(You might want to fill in the steps yourself leading from equation (4-31) to this point.)

In this way, the physical concept of a search for stationary patterns of relative
displacements of the masses translates into the mathematical idea of finding the
eigenvectors of the matrix K .

As with the coupled masses, we write this equation in the form

$$\left(\overline{K}-\lambda\overline{I}\right)\vec a=
\begin{pmatrix}5-\lambda&-2&0\\-2&3-\lambda&-1\\0&-1&1-\lambda\end{pmatrix}\vec a=0.$$   (4-37)

Solutions will exist if and only if the determinant of the matrix
$\overline{K}-\lambda\overline{I}$ vanishes, leading to the "characteristic equation" for
the eigenvalues,

$$\left|\lambda\overline{I}-\overline{K}\right|=
\begin{vmatrix}\lambda-5&2&0\\2&\lambda-3&1\\0&1&\lambda-1\end{vmatrix}
=(\lambda-5)\left[(\lambda-3)(\lambda-1)-1\right]-4(\lambda-1)
=\lambda^3-9\lambda^2+18\lambda-6=0.$$   (4-38)

This is a cubic, with three roots, and is hard to solve
analytically. There is in principle a closed-form solution, but it
is pretty hairy. Here is how Mathematica does it:

NSolve[x^3 - 9 x^2 + 18 x - 6 == 0, x]
{{x -> 0.415775}, {x -> 2.29428}, {x -> 6.28995}}

Another pretty good way, however, is just to calculate values
using Excel until you get close. In the spreadsheet below (starting at
λ = 0 and stepping by Δλ = 0.1) you can see that the cubic goes through zero somewhere near λ
= 0.4, and again near λ = 2.2. You can easily make the step
smaller and pin down the values, as well as finding the third
root. The values are given in Table I.

  λ      λ³ - 9λ² + 18λ - 6
  0.0      -6
  0.1      -4.289
  0.2      -2.752
  0.3      -1.383
  0.4      -0.176
  0.5       0.875
  0.6       1.776
  0.7       2.533
  0.8       3.152
  0.9       3.639
  1.0       4
  1.1       4.241
  1.2       4.368
  1.3       4.387
  1.4       4.304
  1.5       4.125
  1.6       3.856
  1.7       3.503
  1.8       3.072
  1.9       2.569
  2.0       2
  2.1       1.371
  2.2       0.688
  2.3      -0.043
  2.4      -0.816
  2.5      -1.625
  2.6      -2.464
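A third option, for readers who prefer a programming language to a spreadsheet: NumPy will return all three roots of the cubic at once. A minimal sketch, assuming NumPy is available:

import numpy as np

# coefficients of lambda^3 - 9 lambda^2 + 18 lambda - 6, highest power first
coeffs = [1., -9., 18., -6.]
print(np.roots(coeffs))        # approx. 6.28995, 2.29428, 0.415775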


  motion           eigenvalue λ      normalized frequency ω/ω₀
  (single ball)        —                  1.0000
  mode 1             0.4158               0.64487
  mode 2             2.2943               1.5147
  mode 3             6.2899               2.5080

Table I. Eigenvalues for the three normal modes of the three-ball
system, and the corresponding frequency, given in terms of the
frequency for a single ball on a string of length L.

Next, for each of the three eigenvalues, we must determine the corresponding
eigenvector. This amounts to solving the system of three homogeneous linear equations,

$$\begin{pmatrix}5-\lambda^{(i)}&-2&0\\-2&3-\lambda^{(i)}&-1\\0&-1&1-\lambda^{(i)}\end{pmatrix}\vec a^{(i)}=0.$$   (4-39)

Here $\lambda^{(i)}$ and $\vec a^{(i)}$ are the i-th eigenvalue and eigenvector, respectively. For instance, for
the first eigenvalue given above, this gives

$$\begin{pmatrix}4.5842&-2&0\\-2&2.5842&-1\\0&-1&0.5842\end{pmatrix}\vec a^{(1)}=0.$$   (4-40)

The magnitude of the eigenvector is not determined, since any multiple of the eigenvector
would still be an eigenvector, with the same eigenvalue. So, let's take the first
component of $\vec a$ to be equal to 1. Then we can find the ratios $a_2/a_1$ and $a_3/a_1$ from

$$\begin{pmatrix}4.5842&-2&0\\-2&2.5842&-1\\0&-1&0.5842\end{pmatrix}\begin{pmatrix}1\\a_2\\a_3\end{pmatrix}=0.$$   (4-41)

For instance, the equation from the first line of the matrix is

$$4.5842\times1-2a_2=0,$$   (4-42)

giving

$$a_2=2.2921.$$   (4-43)

Next, multiply the third line in the matrix by 2.5842 and add it to the second line, to give

$$\begin{pmatrix}4.5842&-2&0\\-2&0&0.5097\\0&-1&0.5842\end{pmatrix}\begin{pmatrix}1\\a_2\\a_3\end{pmatrix}=0.$$   (4-44)

The equation from the second line is

$$-2+0.5097\,a_3=0,$$   (4-45)

giving

$$a_3=3.9240.$$   (4-46)

Or,

$$\vec a^{(1)}=\begin{pmatrix}1\\2.2921\\3.9240\end{pmatrix}$$   (4-47)

for the first eigenvector! In this mode, the coordinates of the three balls are given by

$$\vec x(t)=\begin{pmatrix}x_1(t)\\x_2(t)\\x_3(t)\end{pmatrix}
=\vec a^{(1)}e^{i\omega_1 t}\;\rightarrow\;
\begin{pmatrix}\cos\omega_1 t\\2.2921\cos\omega_1 t\\3.9240\cos\omega_1 t\end{pmatrix}.$$   (4-48)

Note that the balls all move in the same direction, in this mode.

The other eigenvectors can be found in a similar way. The exact values are left to the
problems. But figure 4-6 shows the displacements of the balls in the three modes. The
higher the mode (and the higher the frequency), the more the balls move in opposite
directions.
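For comparison, here is a numerical sketch (again assuming NumPy) that produces all three modes at once; rescaling each eigenvector so its first component is 1 reproduces the convention used above:

import numpy as np

K = np.array([[ 5., -2.,  0.],
              [-2.,  3., -1.],
              [ 0., -1.,  1.]])

lam, a = np.linalg.eigh(K)   # ascending eigenvalues: 0.4158, 2.2943, 6.2899
a = a / a[0, :]              # set the first component of each mode to 1

print(lam)
print(a)                     # column 0 should be (1, 2.2921, 3.9240)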


Figure 4-6. The three normal modes for the triple pendulum. The balls are shown at maximum
displacement, when they are all (momentarily) at rest.

PROBLEMS

Problem 4-1. (a) Using identities from Appendix A, show that

$$C\cos(\omega_0 t+\phi)=A\sin\omega_0 t+B\cos\omega_0 t$$

and find A and B in terms of C and $\phi$.
(b) Using identities from Appendix A, show that

$$C\cos(\omega_0 t+\phi)=De^{i\omega_0 t}+Ee^{-i\omega_0 t}$$

and find D and E in terms of C and $\phi$. (Here C is taken to be real.)

Problem 4-2. Find the normal-mode frequencies $\omega_i$, $i=1,3$ for the problem described
in the text (see fig. 4-4) of three identical masses connected by identical springs. Express
the frequencies in terms of $\lambda=\dfrac{\omega^2}{\omega_0^2}$, where $\omega_0=\sqrt{\dfrac{k}{m}}$.

Problem 4-3. Find the normal modes for the problem described in the text (see figure 4-
4) of three masses connected by springs.

Problem 4-4. Consider a system of two masses and three springs, connected as shown in
figure 4-3, but with the middle spring of spring constant equal to 2k.


(a) Try and guess what the normal modes will be - directions of motion of the masses
and frequencies.
(b) Write the equations of motion, find the characteristic equation, and solve it, and so
determine the frequencies of the two normal modes. Compare with your guesses in part
(a).

Problem 4-5. Find the eigenvectors $\vec a^{(2)}$ and $\vec a^{(3)}$ for the triple pendulum corresponding to
the second and third eigenvalues, $\lambda^{(2)}$ and $\lambda^{(3)}$. Give a qualitative interpretation, in terms of
the co- or counter-motion of the balls, with respect to the first one.

Problem 4-6. Repeat the analysis of the multiple pendulum in the text, but for two balls,
rather than three. You should determine the two normal-mode frequencies $\omega_i$ and the
normal-mode eigenvectors $\vec a^{(i)}$. In this case it should be possible to find the eigenvalues
exactly, without having to resort to numerical methods. Discuss the solution.


Chapter 5. The Inverse; Numerical Methods


In Chapter 3 we discussed the solution of systems of simultaneous linear algebraic
equations which could be written in the form

$$\overline{A}\vec x=\vec C$$   (5-1)
using Cramer's rule. There is another, more elegant way of solving this equation, using
the inverse matrix. In this chapter we will define the inverse matrix and give an
expression related to Cramer's rule for calculating the elements of the inverse matrix. We
will then discuss another approach, that of Gauss-Jordan elimination, for solving
simultaneous linear equations and for calculating the inverse matrix. We will discuss the
relative efficiencies of the two algorithms for numerical inversion of large matrices.

A. The inverse of a square matrix.

Definition of the inverse. The inverse of a scalar number c is another scalar, say d, such
that the product of the two is equal to 1: cd=1. For instance, the inverse of the number
5 is the number 0.2 . We have defined multiplication of one matrix by another in a way
very analogous to multiplication of one scalar by another. We will therefore make the
following definition.

Definition: For a given square matrix $\overline{A}$, the matrix $\overline{B}$ is said to be the inverse
of $\overline{A}$ if

$$\overline{B}\,\overline{A}=\overline{A}\,\overline{B}=\overline{I}.$$   (5-2)

We then write $\overline{B}=\overline{A}^{-1}$.

Notice that we have not guaranteed that the inverse of a given matrix exists. In fact,
many matrices do not have an inverse. We shall see below that the condition for a square
matrix A to have an inverse is that its determinant not be equal to zero.

Use of the inverse to solve matrix equations. Now consider the matrix equation just
given,

$$\overline{A}\vec x=\vec C.$$   (5-1)

We can solve this equation by multiplying on both sides of the equation by $\overline{A}^{-1}$:

$$\overline{A}^{-1}\overline{A}\vec x=\overline{A}^{-1}\vec C;\qquad
\overline{I}\vec x=\overline{A}^{-1}\vec C;\qquad
\vec x=\overline{A}^{-1}\vec C.$$   (5-3)

Thus, knowing the inverse of the matrix $\overline{A}$ lets us immediately write down the solution
$\vec x$ to equation (5-1).

As an example, let us consider the case where $\overline{A}$ is a 2×2 matrix.


 
A x  C;
 1 2  x1   4  (5-4)
     
  1 1  x 2   5 
 
If we knew the inverse of A , we could immediately calculate C  A x . In this simple
case, we can guess the inverse matrix. We write out the condition for the inverse,
A A 1  I ;
 1 2  * *  ? ? 
  I ij 
(5-5)
    
  1 1  * *  ? ? 
As a first guess we try to make I12 come out to zero; one possibility is
A A 1  I ;
 1 2  * 2   ? 0 
  I ij 
(5-6)
    
  1 1  *  1   ? ? 
Now we arrange for I21 to be zero:
A A 1  I ;
 1 2 1 2   ? 0 
  I ij 
(5-7)
    
  1 1 1  1  0 ? 
If we now look at the diagonal elements of I , they come out to be I11 = 3 and I22 = -3.
We can fix this up by changing the sign of the (1,2) and (2,2) elements of the inverse, and
by multiplying it by 1/3. So we have
A A 1  I ;
 1 2  1 1  2   1 0 
        I ij ; (5-7)
  1 1  3 1 1   0 1 
1 1  2 
A 1   
3 1 1 
Now that we have the inverse matrix, we can calculate the values $x_1$ and $x_2$:

$$\vec x=\overline{A}^{-1}\vec C
=\frac{1}{3}\begin{pmatrix}1&-2\\1&1\end{pmatrix}\begin{pmatrix}4\\5\end{pmatrix}
=\frac{1}{3}\begin{pmatrix}-6\\9\end{pmatrix}
=\begin{pmatrix}-2\\3\end{pmatrix}.$$   (5-8)
So, the solution to the two simultaneous linear equations is supposed to be $x_1=-2$, $x_2=3$.
We will write out the two equations in long form and substitute in:

$$\begin{pmatrix}1&2\\-1&1\end{pmatrix}\begin{pmatrix}x_1\\x_2\end{pmatrix}=\begin{pmatrix}4\\5\end{pmatrix}
\;\Rightarrow\;
\begin{matrix}x_1+2x_2=4\\-x_1+x_2=5\end{matrix}
\;\Rightarrow\;
\begin{matrix}-2+2\times3=4\\-(-2)+3=5\end{matrix}
\;\Rightarrow\;
\begin{matrix}4=4\\5=5.\end{matrix}$$

It checks out!
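The same check can be done numerically. A minimal sketch, assuming NumPy:

import numpy as np

A = np.array([[ 1., 2.],
              [-1., 1.]])
C = np.array([4., 5.])

Ainv = np.linalg.inv(A)        # (1/3) * [[1, -2], [1, 1]]
print(Ainv @ C)                # [-2.  3.]
print(np.linalg.solve(A, C))   # same answer, without forming the inverse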

The inverse matrix by the method of cofactors. Guessing the inverse has worked for a
2×2 matrix - but it gets harder for larger matrices. There is a way to calculate the inverse
using cofactors, which we state here without proof:

$$\left(\overline{A}^{-1}\right)_{ij}
=\frac{\mathrm{cof}_{ji}(\overline{A})}{|\overline{A}|}
=\frac{(-1)^{j+i}M_{ji}(\overline{A})}{|\overline{A}|}.$$   (5-9)

(Here the minor $M_{pq}(\overline{A})$ is the determinant of the matrix obtained by removing the p-th
row and q-th column from the matrix $\overline{A}$.)
Note that you cannot calculate the inverse of a matrix using equation (5-9) if the matrix
is singular (that is, if its determinant is zero). This is a general rule for square matrices:

$$|\overline{A}|=0\;\Longleftrightarrow\;\text{inverse does not exist}$$

Example: Find the inverse of the matrix

$$\overline{A}=\begin{pmatrix}1&2\\-1&1\end{pmatrix}.$$   (5-10)

Here are the calculations of the four elements of $\overline{A}^{-1}$. First calculate the determinant:

$$|\overline{A}|=\begin{vmatrix}1&2\\-1&1\end{vmatrix}=1-(-2)=3.$$   (5-11)

Then the matrix elements:


$$\begin{aligned}
\left(\overline{A}^{-1}\right)_{11}&=\frac{\mathrm{cof}_{11}(\overline{A})}{|\overline{A}|}=\frac{(-1)^{1+1}A_{22}}{|\overline{A}|}=\frac{1}{3};\\
\left(\overline{A}^{-1}\right)_{12}&=\frac{\mathrm{cof}_{21}(\overline{A})}{|\overline{A}|}=\frac{(-1)^{2+1}A_{12}}{|\overline{A}|}=-\frac{2}{3};\\
\left(\overline{A}^{-1}\right)_{21}&=\frac{\mathrm{cof}_{12}(\overline{A})}{|\overline{A}|}=\frac{(-1)^{1+2}A_{21}}{|\overline{A}|}=\frac{1}{3};\\
\left(\overline{A}^{-1}\right)_{22}&=\frac{\mathrm{cof}_{22}(\overline{A})}{|\overline{A}|}=\frac{(-1)^{2+2}A_{11}}{|\overline{A}|}=\frac{1}{3}.
\end{aligned}$$   (5-12)

So,

$$\overline{A}^{-1}=\frac{1}{3}\begin{pmatrix}1&-2\\1&1\end{pmatrix}.$$   (5-13)

Check that this inverse works:

$$\overline{A}\,\overline{A}^{-1}
=\begin{pmatrix}1&2\\-1&1\end{pmatrix}\cdot\frac{1}{3}\begin{pmatrix}1&-2\\1&1\end{pmatrix}
=\frac{1}{3}\begin{pmatrix}3&0\\0&3\end{pmatrix}=\overline{I};\qquad
\overline{A}^{-1}\overline{A}
=\frac{1}{3}\begin{pmatrix}1&-2\\1&1\end{pmatrix}\begin{pmatrix}1&2\\-1&1\end{pmatrix}
=\frac{1}{3}\begin{pmatrix}3&0\\0&3\end{pmatrix}=\overline{I}.$$   (5-14)

Example: Calculate the inverse of the following 3×3 matrix using the method of
cofactors:

$$\overline{A}=\begin{pmatrix}2&4&3\\1&3&2\\1&3&1\end{pmatrix}.$$   (5-15)

Solution: This is getting too long-winded. We will just do two representative elements
of $\overline{A}^{-1}$.

$$|\overline{A}|=\begin{vmatrix}2&4&3\\1&3&2\\1&3&1\end{vmatrix}
=2(3-6)-4(1-2)+3(3-3)=-2;$$

$$\left(\overline{A}^{-1}\right)_{11}
=\frac{\mathrm{cof}_{11}(\overline{A})}{|\overline{A}|}
=\frac{(-1)^{1+1}\begin{vmatrix}3&2\\3&1\end{vmatrix}}{-2}
=\frac{-3}{-2}=\frac{3}{2};\qquad
\left(\overline{A}^{-1}\right)_{23}
=\frac{\mathrm{cof}_{32}(\overline{A})}{|\overline{A}|}
=\frac{(-1)^{3+2}\begin{vmatrix}2&3\\1&2\end{vmatrix}}{-2}
=\frac{-1}{-2}=\frac{1}{2}.$$   (5-16)
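Equation (5-9) translates almost line for line into code. The sketch below is illustrative only: the function name cofactor_inverse is mine, and it leans on np.linalg.det for the minors rather than expanding them recursively.

import numpy as np

def cofactor_inverse(A):
    """Inverse of a square matrix by the method of cofactors, eq. (5-9)."""
    A = np.asarray(A, dtype=float)
    n = A.shape[0]
    det = np.linalg.det(A)
    if np.isclose(det, 0.0):
        raise ValueError("matrix is singular; no inverse exists")
    inv = np.empty((n, n))
    for i in range(n):
        for j in range(n):
            # minor M_ji: delete row j and column i, then take the determinant
            minor = np.delete(np.delete(A, j, axis=0), i, axis=1)
            inv[i, j] = (-1) ** (i + j) * np.linalg.det(minor)
    return inv / det

A = [[2, 4, 3], [1, 3, 2], [1, 3, 1]]
print(cofactor_inverse(A))     # entry (1,1) is 3/2, entry (2,3) is 1/2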

B. Time required for numerical calculations.


Let's estimate the computer time required to invert a matrix by the method of cofactors.
The quantity of interest is the number of floating-point operations required to carry out
the inverse. The inverse of an n×n matrix involves calculating n² cofactors, each of them
requiring the calculation of the determinant of an (n-1)×(n-1) matrix. So we need to
know the number of operations involved in calculating a determinant. Let's start with a
2×2 determinant. There are two multiplications, and an addition to add the two terms:
n=2 gives 3 FLOPs. (FLOP = Floating-Point Operation.) To do a 3×3 determinant, the
three elements in the top row are each multiplied by a 2×2 determinant and added
together: 3×(3 FLOPs) + 2 FLOPs for addition; n=3 requires 3×3 + 2 FLOPs. Now we
can proceed more or less by induction. It is pretty clear that the determinant of a 4×4
matrix requires 4 calculations of a 3×3 determinant: → 4×3×3 FLOPs. And for a 5×5
determinant, 5×4×3×3 operations. It is a pretty good approximation to say the following:

No. of operations for an n×n determinant ≈ n!   (5-17)

This means that calculating the inverse by the cofactor method (n² cofactors) requires
about n²·n! FLOPs.
A fast PC can today do about 10 GigaFLOPs/sec. This leads to the table given below,
showing the execution time to invert matrices of increasing dimension.

  n    n! (determinant)   n²·n! (cofactor inverse)   time, s (cofactors)   4n³ (Gauss-Jordan)   time, s (Gauss-Jordan)
  2    2                  8                          8E-10                 32                   3.2E-09
  3    6                  54                         5.4E-09               108                  1.08E-08
  4    24                 384                        3.84E-08              256                  2.56E-08
  5    120                3000                       3E-07                 500                  5E-08
  6    720                25920                      2.592E-06             864                  8.64E-08
  7    5040               246960                     2.4696E-05            1372                 1.372E-07
  8    40320              2580480                    2.58048E-04           2048                 2.048E-07
  9    362880             29393280                   2.939328E-03          2916                 2.916E-07
  10   3628800            362880000                  3.6288E-02            4000                 4E-07
  11   39916800           4829932800                 0.48299328            5324                 5.324E-07
  12   479001600          68976230400                6.89762304            6912                 6.912E-07
  13   6227020800         1.05237E+12                105.2366515           8788                 8.788E-07
  14   8.7178E+10         1.70869E+13                1708.694508           10976                1.0976E-06
  15   1.3077E+12         2.94227E+14                29422.67328           13500                1.35E-06
  16   2.0923E+13         5.35623E+15                535623.4211           16384                1.6384E-06
  17   3.5569E+14         1.02794E+17                1.027937E+07          19652                1.9652E-06
  18   6.4024E+15         2.07437E+18                2.074369E+08          23328                2.3328E-06
  19   1.2165E+17         4.39139E+19                4.391388E+09          27436                2.7436E-06
  20   2.4329E+18         9.73161E+20                9.731608E+10          32000                3.2E-06
  21   5.1091E+19         2.25311E+22                2.25311E+12           37044                3.7044E-06
  22   1.124E+21          5.44016E+23                5.44016E+13           42592                4.2592E-06
  23   2.5852E+22         1.36757E+25                1.36757E+15           48668                4.8668E-06
  24   6.2045E+23         3.57378E+26                3.57378E+16           55296                5.5296E-06

Table 5-1. Floating-point operations required for calculation of n×n determinants and
inverses of n×n matrices, and computer time required for the matrix inversion. Results
are given for two different numerical methods. (As a useful conversion number, the
number of seconds in a year is about 3.14 × 10⁷.)

It can be seen from the table that the inversion of a 24×24 matrix by the method of
cofactors would take about 3.6 × 10¹⁶ seconds on a fast computer - roughly a billion
years. This suggests that a more economical algorithm is desirable for inverting large
matrices!
Teasing Mathematica: Try this calculation of a determinant.
n=500
m=Table[Random[],{n},{n}];
Det[m]
Does this suggest that the algorithm used for Table 5-1 is not the fastest known?
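The same teasing works in Python. A sketch, assuming NumPy is installed:

import numpy as np

# A 500x500 random determinant, the NumPy analogue of the Mathematica teaser.
# It returns almost instantly, because the determinant is computed via an LU
# factorization (roughly n^3 operations), not by cofactor expansion (n! operations).
rng = np.random.default_rng()
m = rng.random((500, 500))
print(np.linalg.det(m))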

C. The Gauss-Jordan method for solving simultaneous linear equations.

There is a method for solving simultaneous linear equations that avoids the determinants
required in Cramer's method, and which takes many fewer operations for large matrices.
We will illustrate this method for two simultaneous linear equations, and then for three.
Consider the 2×2 matrix equation solved above,

$$\overline{A}\vec x=\vec C;\qquad
\begin{pmatrix}1&2\\-1&1\end{pmatrix}\begin{pmatrix}x_1\\x_2\end{pmatrix}=\begin{pmatrix}4\\5\end{pmatrix}.$$   (5-4)

This corresponds to the two linear equations

$$x_1+2x_2=4,\qquad -x_1+x_2=5.$$   (5-18)
A standard approach to such equations would be to add or subtract a multiple of one
equation from another to eliminate one variable from one of the equations. If we add the
first equation to the second, we get

$$\begin{Bmatrix}x_1+2x_2=4\\-x_1+x_2=5\end{Bmatrix}
\xrightarrow{\text{add eq. (1) to eq. (2)}}
\begin{Bmatrix}x_1+2x_2=4\\0+3x_2=9\end{Bmatrix}$$   (5-19)

Now we eliminate $x_2$ from the top equation, by subtracting 2/3 times the bottom equation:

$$\begin{Bmatrix}x_1+2x_2=4\\0+3x_2=9\end{Bmatrix}
\xrightarrow{\text{subtract }(2/3)\times\text{eq. (2) from eq. (1)}}
\begin{Bmatrix}x_1+0=-2\\0+3x_2=9\end{Bmatrix}$$   (5-20)

And finally, multiply the second equation by 1/3:

$$\begin{Bmatrix}x_1+0=-2\\0+3x_2=9\end{Bmatrix}
\xrightarrow{\text{multiply eq. (2) by }1/3}
\begin{Bmatrix}x_1+0=-2\\0+x_2=3\end{Bmatrix}$$   (5-21)
So we have found that x1 = -2 and x2 = 3, as determined earlier in the chapter using the
inverse.


Note that the same operations could have been carried out using just the coefficients of
the equations, and omitting $x_1$ and $x_2$, as follows. The assembly of the coefficients of $x_1$
and $x_2$ and the constants on the right of the equation is referred to as the augmented
matrix.

$$\begin{pmatrix}1&2&\big|&4\\-1&1&\big|&5\end{pmatrix}
\xrightarrow{\text{add (1) to (2)}}
\begin{pmatrix}1&2&\big|&4\\0&3&\big|&9\end{pmatrix}
\xrightarrow{\text{subtract }(2/3)\times\text{(2) from (1)}}
\begin{pmatrix}1&0&\big|&-2\\0&3&\big|&9\end{pmatrix}
\xrightarrow{\text{multiply (2) by }1/3}
\begin{pmatrix}1&0&\big|&-2\\0&1&\big|&3\end{pmatrix}$$   (5-22)

The results for $x_1$ and $x_2$ appear in the column to the right.

Example: Use the Gauss-Jordan method to solve the system of linear equations
represented by

$$\overline{A}\vec x=\vec C;\qquad
\begin{pmatrix}2&4&3\\1&3&2\\1&3&1\end{pmatrix}\begin{pmatrix}x_1\\x_2\\x_3\end{pmatrix}=\begin{pmatrix}1\\1\\4\end{pmatrix}.$$   (5-23)

Solution: We set up the augmented matrix, and then set about making the matrix part of
it diagonal, with ones on the diagonal. This is done in the following systematic fashion.
First use the first equation to eliminate $A_{21}$ and $A_{31}$. Next use the second equation to
eliminate $A_{32}$.

$$\begin{pmatrix}2&4&3&\big|&1\\1&3&2&\big|&1\\1&3&1&\big|&4\end{pmatrix}
\xrightarrow{\text{subtract }\tfrac12\times\text{(1) from (2) and from (3)}}
\begin{pmatrix}2&4&3&\big|&1\\0&1&1/2&\big|&1/2\\0&1&-1/2&\big|&7/2\end{pmatrix}
\xrightarrow{\text{subtract (2) from (3)}}
\begin{pmatrix}2&4&3&\big|&1\\0&1&1/2&\big|&1/2\\0&0&-1&\big|&3\end{pmatrix}$$   (5-24)

Next we work upwards, using equation (3) to eliminate $A_{23}$ and $A_{13}$. After that, equation
(2) is used to eliminate $A_{12}$. At this point the matrix is diagonal. The final step is to
multiply equations (1) and (3) by a constant which makes the diagonal elements of $\overline{A}$
become unity.


$$\begin{pmatrix}2&4&3&\big|&1\\0&1&1/2&\big|&1/2\\0&0&-1&\big|&3\end{pmatrix}
\xrightarrow{\text{add }\tfrac12\times\text{(3) to (2), add }3\times\text{(3) to (1)}}
\begin{pmatrix}2&4&0&\big|&10\\0&1&0&\big|&2\\0&0&-1&\big|&3\end{pmatrix}
\xrightarrow{\text{subtract }4\times\text{(2) from (1)}}
\begin{pmatrix}2&0&0&\big|&2\\0&1&0&\big|&2\\0&0&-1&\big|&3\end{pmatrix}
\xrightarrow{\text{multiply (1) by }\tfrac12\text{ and (3) by }-1}
\begin{pmatrix}1&0&0&\big|&1\\0&1&0&\big|&2\\0&0&1&\big|&-3\end{pmatrix}$$   (5-25)

The solution for the unknown x's is thus $x_1=1$, $x_2=2$, $x_3=-3$.

SUMMARY: Work your way through the matrix, zeroing the off-diagonal elements, IN
THE ORDER SHOWN BELOW, zeroing ONE, then TWO, then THREE, etc. If you try
to invent your own scheme of adding and subtracting rows, it may or may not work.

$$\begin{pmatrix}\cdot&\text{SIX}&\text{FIVE}\\ \text{ONE}&\cdot&\text{FOUR}\\ \text{TWO}&\text{THREE}&\cdot\end{pmatrix}$$
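Here is a bare-bones sketch of the same elimination strategy in NumPy. It is illustrative only: it uses no pivoting, so it fails if a zero happens to land on the diagonal, and the function name gauss_jordan_solve is mine.

import numpy as np

def gauss_jordan_solve(A, C):
    n = len(C)
    aug = np.hstack([np.asarray(A, float), np.asarray(C, float).reshape(n, 1)])
    for k in range(n):
        aug[k] /= aug[k, k]                  # put a 1 on the diagonal
        for i in range(n):
            if i != k:
                aug[i] -= aug[i, k] * aug[k] # zero the rest of column k
    return aug[:, -1]

A = [[2, 4, 3], [1, 3, 2], [1, 3, 1]]
C = [1, 1, 4]
print(gauss_jordan_solve(A, C))              # [ 1.  2. -3.]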

D. The Gauss-Jordan method for inverting a matrix.

There is a very similar procedure which leads directly to calculating the inverse of a
square matrix. Suppose that $\overline{B}$ is the inverse of $\overline{A}$. Then

$$\overline{A}\,\overline{B}=\overline{I};\qquad
\begin{pmatrix}A_{11}&A_{12}&A_{13}\\A_{21}&A_{22}&A_{23}\\A_{31}&A_{32}&A_{33}\end{pmatrix}
\begin{pmatrix}B_{11}&B_{12}&B_{13}\\B_{21}&B_{22}&B_{23}\\B_{31}&B_{32}&B_{33}\end{pmatrix}
=\begin{pmatrix}1&0&0\\0&1&0\\0&0&1\end{pmatrix}.$$   (5-26)

This can be thought of as three sets of three simultaneous linear equations:

$$\overline{A}\begin{pmatrix}B_{11}\\B_{21}\\B_{31}\end{pmatrix}=\begin{pmatrix}1\\0\\0\end{pmatrix},\qquad
\overline{A}\begin{pmatrix}B_{12}\\B_{22}\\B_{32}\end{pmatrix}=\begin{pmatrix}0\\1\\0\end{pmatrix},\qquad
\overline{A}\begin{pmatrix}B_{13}\\B_{23}\\B_{33}\end{pmatrix}=\begin{pmatrix}0\\0\\1\end{pmatrix}.$$   (5-27)

These three sets of equations can be solved simultaneously, using a larger augmented
matrix, as follows:


 2 4 3 1 0 0 2 4 3 1 0 0
  subtract 1/2 x (1) from (2) and from (3)  
 1 3 2 0 1 0          0 1 1 / 2  1 / 2 1 0
 1 3 1 0 0 1  0 1  1/ 2  1/ 2 0 1 
  
2 4 3 1 0 0  add 1./2* (3) to (2),  2 4
0 1 3 3 
   
    0 1 1 / 2  1 / 2 1 0  add
subtract (2) from (3)
3   0 1
0  1/ 2 1/ 2 1/ 2 
* (3) to (1)

0 0 1 0  1 1  1 0 0 0
 1 1 
 
2 0 0 3  5 1  multiply(1) by 1/2  1 0 0 3 / 2  5 / 2 1 / 2 
subtract 4 x (2)    
from
 (1)
 0 1 0  1 / 2 1 / 2 1 / 2  and (3)
by-1
  0 1 0  1 / 2 1 / 2 1 / 2 
0 0 1 0  1 1  0 0 1 0 1  1 
 
(5-28)
So, the result is
 3 5 1 
1 1 
B  A   1 1 1  (5-29)
2 
 0 2  2
The check is to multiply A by its inverse:
 3  5 1  2 4 3   2 0 0  1 0 0
1   1   
BA    1 1 1  1 3 2    0 2 0    0 1 0  (5-29)
2   2  0 0 2  0 0 1
 0 2  2  1 3 1     
So the inverse just calculated is correct.
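As usual, the hand computation can also be checked numerically. A minimal sketch, assuming NumPy:

import numpy as np

A = np.array([[2., 4., 3.],
              [1., 3., 2.],
              [1., 3., 1.]])
B = 0.5 * np.array([[ 3., -5.,  1.],
                    [-1.,  1.,  1.],
                    [ 0.,  2., -2.]])

print(np.allclose(B, np.linalg.inv(A)))   # True
print(np.allclose(B @ A, np.eye(3)))      # True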

Time for numerical inverse. Let us estimate the time to invert a matrix by this
numerical method. The process of zeroing out one element of the left-hand matrix
requires multiplying the line to be subtracted by a constant (2n FLOPs) and subtracting it
(2n FLOPs). This must be done for (approximately) n² matrix elements. So the number
of floating-point operations is about equal to 4n³ for matrix inversion by the Gauss-
Jordan method. Consulting Table 5-1 shows that, for a 24×24 matrix, the time required
is less than a millisecond, comparing favorably with the billion or so years for the
method of cofactors.

Number of operations to calculate the inverse of an n×n matrix:
  method          number of FLOPs
  cofactor        n²·n!
  Gauss-Jordan    4n³

PROBLEMS
Problem 5-1. (a) Use the method of cofactors to find the inverse of
the matrix


 1 1 1 
 
C   1 1 1  .
 1 1  1
 
1
(b) Check your result by verifying that C C  I .

Problem 5-2. Use the Mathematica function Inverse to find the inverse of the matrix

$$\overline{C}=\begin{pmatrix}1&1&1\\1&-1&1\\1&1&-1\end{pmatrix}.$$

(See Appendix C for the necessary Mathematica operations.) Check your result.

Problem 5-3. Prove that if an operator $\overline{A}$ has both a left inverse (call it $\overline{B}$) and a right
inverse (call it $\overline{C}$), then they are the same; that is, if

$$\overline{B}\,\overline{A}=\overline{I}$$

and

$$\overline{A}\,\overline{C}=\overline{I},$$

then

$$\overline{B}=\overline{C}.$$

[Be careful to assume only the properties of $\overline{B}$ and $\overline{C}$ that are given above. It is not to
be assumed that $\overline{A}$, $\overline{B}$ and $\overline{C}$ are matrices.]

Problem 5-5. Suppose that $\overline{B}$ and $\overline{C}$ are members of a group with distributive
multiplication defined, each having an inverse (both left-inverse and right-inverse). Let
$\overline{A}$ be equal to the product of $\overline{B}$ and $\overline{C}$, that is,

$$\overline{A}=\overline{B}\,\overline{C}.$$

Now consider the group member $\overline{D}$, given by

$$\overline{D}=\overline{C}^{-1}\overline{B}^{-1}.$$

Show by direct multiplication that $\overline{D}$ is both a left inverse and a right inverse of $\overline{A}$.
[Be careful to assume only the properties of $\overline{B}$ and $\overline{C}$ that are given above. It is not to
be assumed that $\overline{A}$, $\overline{B}$ and $\overline{C}$ are matrices.]

Problem 5-6. (a) Use the method of Gauss-Jordan elimination to find the inverse of the
matrix

$$\overline{C}=\begin{pmatrix}1&1&1\\1&0&1\\1&1&-1\end{pmatrix}.$$

(b) Check your result by verifying that $\overline{C}\,\overline{C}^{-1}=\overline{I}$.


Chapter 6. Rotations and Tensors


There is a special kind of linear transformation which is used to transform coordinates
from one set of axes to another set of axes (with the same origin). Such a transformation
is called a rotation. Rotations have great practical importance, in applications ranging
from tracking spacecraft to graphical display software. They also play a central role in
theoretical physics.

The linear transformation corresponding to a rotation can be described by a matrix. We
will describe the properties of rotation matrices, and discuss some special ones.

A. Rotation of Axes.

In chapter 1 we considered a rotation of a Cartesian coordinate system about the z-axis
by an angle θ. A more general case is shown in figure 6-1, where the axes are
transformed by rotation into an arbitrary second set of axes. (We take both sets of axes
to be right handed. Transformation to a left-handed coordinate system is to be avoided,
unless you really know what you are doing.) A typical vector $\vec A$ is also shown.

Figure 6-1. Two sets of Cartesian coordinate axes. The (x',y',z') system is obtained from the
(x,y,z) system by a rotation of the coordinate system.

Figure 6-2 (a) shows the components of $\vec A$ in the $(\hat e_1,\hat e_2,\hat e_3)$ system, and
figure 6-2 (b) shows its coordinates in the $(\hat e_1',\hat e_2',\hat e_3')$ system. We see that we can write
$\vec A$, using components in the unprimed system, as

$$\vec A=\hat e_i A_i;$$   (6-1)

or, using components in the primed frame, as

$$\vec A=\hat e_i' A_i'.$$   (6-2)

The standard definitions of components are clear from (6-1) and (6-2):

$$A_i=\hat e_i\cdot\vec A,\qquad A_i'=\hat e_i'\cdot\vec A.$$   (6-3)

We can, however, relate the primed components to the unprimed, as follows:


$$A_j'=\hat e_j'\cdot\vec A
=\hat e_j'\cdot(A_l\hat e_l)
=(\hat e_j'\cdot\hat e_l)\,A_l
=C_{jl}A_l.$$   (6-4)

Figure 6-2. The vector $\vec A$ expressed in terms of its $(\hat e_1,\hat e_2,\hat e_3)$ components (a), and in terms of its
$(\hat e_1',\hat e_2',\hat e_3')$ components (b).

That is to say that the relationship between primed and unprimed components can be
expressed as a matrix equation,

$$\vec A'=\overline{C}\,\vec A,$$   (6-5)

where the transformation matrix $\overline{C}$ is given by

$$\overline{C}=\{C_{jl}\}=(\hat e_j'\cdot\hat e_l).$$   (6-6)

There is an interpretation of the coefficients $C_{jl}$ in terms of angles. Since all of the $\hat e_j'$
and $\hat e_l$ are unit vectors, the dot product of any two is equal to the cosine of the angle
between them:

$$\hat e_m'\cdot\hat e_n=\cos\theta_{m'n}.$$   (6-7)

Example: Consider a rotation by an angle θ in the counter-clockwise sense about the z
axis, as described in Chapter 1. Since the z axis is the same in the primed and unprimed
systems, certain of the coefficients $C_{jl}=\hat e_j'\cdot\hat e_l$ are especially simple:

$$C_{33}=1,\qquad C_{31}=C_{32}=C_{13}=C_{23}=0.$$   (6-8)

The other coefficients can be read off from figure (2-4):

$$C_{11}=\hat i'\cdot\hat i=\cos\theta,\quad
C_{22}=\hat j'\cdot\hat j=\cos\theta,\quad
C_{12}=\hat i'\cdot\hat j=\sin\theta,\quad
C_{21}=\hat j'\cdot\hat i=-\sin\theta,$$   (6-9)

and so

$$\overline{C}=\begin{pmatrix}\cos\theta&\sin\theta&0\\-\sin\theta&\cos\theta&0\\0&0&1\end{pmatrix}=R_z(\theta).$$   (6-10)

The matrices corresponding to rotations about the x and y axes can be calculated in a
similar way; the results are summarized below.

$$R_x(\theta)=\begin{pmatrix}1&0&0\\0&\cos\theta&\sin\theta\\0&-\sin\theta&\cos\theta\end{pmatrix},\qquad
R_y(\theta)=\begin{pmatrix}\cos\theta&0&-\sin\theta\\0&1&0\\\sin\theta&0&\cos\theta\end{pmatrix},\qquad
R_z(\theta)=\begin{pmatrix}\cos\theta&\sin\theta&0\\-\sin\theta&\cos\theta&0\\0&0&1\end{pmatrix}.$$   (6-11)

B. Some properties of rotation matrices.

Orthogonality. Consider the definition of the transformation matrix $\overline{C}$:

$$\overline{C}=\{C_{jl}\}=(\hat e_j'\cdot\hat e_l).$$   (6-6)

If we form the product of $\overline{C}$ and its transpose, we get

$$\left(\overline{C}\,\overline{C}^T\right)_{il}
=C_{im}\left(C^T\right)_{ml}
=C_{im}C_{lm}
=(\hat e_i'\cdot\hat e_m)(\hat e_l'\cdot\hat e_m)
=\hat e_i'\cdot\left[\hat e_m(\hat e_m\cdot\hat e_l')\right].$$   (6-12)

But the vector $\hat e_l'$, like any vector, can be written in terms of its components in the un-
primed frame,

$$\hat e_l'=\hat e_m(\hat e_m\cdot\hat e_l').$$   (6-13)

In this sum, $(\hat e_m\cdot\hat e_l')$ is the m-th component of the vector $\hat e_l'$ in the unprimed frame. So,
equation (6-12) becomes

$$\left(\overline{C}\,\overline{C}^T\right)_{il}=\hat e_i'\cdot\hat e_l'=\delta_{il},$$   (6-14)

or

$$\overline{C}\,\overline{C}^T=\overline{I}.$$   (6-15)

The equivalent statement in index notation is

$$C_{ij}C_{lj}=\delta_{il}.$$   (6-15a)

A similar calculation for the product $\overline{C}^T\overline{C}$ leads to

$$\overline{C}^T\overline{C}=\overline{I},\qquad C_{li}C_{lj}=\delta_{ij}.$$   (6-15b)

This leads us to define orthogonality as follows:

For an orthogonal matrix, the transpose matrix is also the inverse matrix:
$$\overline{C}^T\overline{C}=\overline{C}\,\overline{C}^T=\overline{I},\qquad
C_{li}C_{lj}=C_{il}C_{jl}=\delta_{ij}.$$   (6-15c)

[Note: We have not used the condition that either $\hat e_i$, $i=1,3$ or $\hat e_i'$, $i=1,3$ form a right-
handed coordinate system, so orthogonal matrices include both rotations and
transformations consisting of a rotation and a reflection. More about this distinction
later.]

Example: Show that the rotation matrix of (6-10) above is orthogonal.

Solution:

$$\overline{C}\,\overline{C}^T
=\begin{pmatrix}\cos\theta&\sin\theta&0\\-\sin\theta&\cos\theta&0\\0&0&1\end{pmatrix}
\begin{pmatrix}\cos\theta&-\sin\theta&0\\\sin\theta&\cos\theta&0\\0&0&1\end{pmatrix}
=\begin{pmatrix}\cos^2\theta+\sin^2\theta&0&0\\0&\sin^2\theta+\cos^2\theta&0\\0&0&1\end{pmatrix}
=\overline{I}.$$   (6-16)

Determinant. It is easy to verify the following fact: the determinant of an orthogonal
matrix must be equal to either +1 or -1. (The proof is left to the problems.) If its
determinant is equal to +1, it is said to be a proper orthogonal matrix. All rotation
matrices are proper orthogonal matrices, and vice versa: a rotation matrix must have
determinant equal to +1, and it must also have the property that its transpose is equal to
its inverse.

NOTE: All rotation matrices are orthogonal; but not all orthogonal matrices are
rotation matrices. Can you explain why?

C. The Rotation Group

Rotations are linear operators which transform three coordinates {x_i} as seen in one
coordinate system into three coordinates {x'_i} in another system. All vectors transform in
this same way, as given in equation (6-4) above:

$$A_j'=C_{jl}A_l.$$   (6-4)

We have shown that $\overline{C}$ must be orthogonal, and have determinant equal to +1.
Multiplication of two rotation matrices gives another square matrix, and it must also be a
rotation matrix, since carrying out two rotations, one after another, can be represented by
a single rotation. So, the rotation group is defined as follows.

Definition. The "rotation group" consists of the set of real orthogonal 3×3 matrices with
determinant equal to +1.
(a) Group multiplication is just matrix multiplication according to the standard rules.
(b) The group is closed under multiplication.
(c) The identity matrix is the identity of the group.
(d) For every element A of the group, there exists an element A⁻¹, the inverse of A,
such that $AA^{-1}=I$.
(e) Multiplication satisfies $A(BC)=(AB)C$.
These properties are easy to prove and are left to the exercises.

D. Tensors

What are tensors? Tensors look like matrices; but only certain types of matrices are
tensors, as we shall see. They must transform in a certain way under a rotation of the
coordinate system. Vectors, with one index, are tensors of the first rank. Other objects,
such as the rotation matrices themselves, may have more than one index. If they
transform in the right way, they are considered to be higher-order tensors. We will first
discuss rotations, then define tensors more precisely.

Under a rotation of the coordinate system, the components of the displacement vector
$\vec x$ change in the following way:

$$x_i'=R_{ij}x_j.$$   (6-17)
The tensor transformation rules represent a generalization of the rule for the
transformation of the displacement vector.

Tensors of Rank Zero. A rank-zero tensor is an object g which is invariant under
rotation:

$$\text{Rank-zero tensor:}\quad g'=g.$$   (6-17a)


Tensors of Rank One. Modeled on the rotation properties of vectors, we define a
tensor of rank one as an object with a single index, C = {C_i, i = 1,3} or just C_i for short,
such that the components C_i' in a rotated coordinate system are given by

$$\text{First-Rank Tensor:}\quad C_i'=R_{ij}C_j.$$   (6-18)

Tensors of Rank Two. A second-rank tensor is an object with two indices,
D = {D_ij, i,j = 1,3} or D_ij for short, such that its components D_ij' in a rotated coordinate
system are given by

$$\text{Second-Rank Tensor:}\quad D_{ij}'=R_{ik}R_{jl}D_{kl}.$$   (6-19)

That is to say, each of the two indices transforms like a single vector index.

Tensors of Rank Three. Following the same pattern, a third-rank tensor H_ijk follows
the transformation law

$$\text{Third-Rank Tensor:}\quad H_{ijk}'=R_{il}R_{jm}R_{kn}H_{lmn}.$$   (6-19)

And so on.

Much of the power of tensor analysis lies in the ease with which new tensors can be
created, with their transformation properties under rotations determined simply by the
number of tensor indices. Here are two theorems about the creation of tensors, which we
offer without proof.

Tensor Products: The direct product of a tensor of rank n and a tensor of rank m
is a tensor of rank n+m.

Thus, for instance, the product of two vectors, AiBj, is a second-rank tensor.

Contraction of Indices: If two tensor indices are set equal (and so summed), the
resulting object is a tensor of rank lower by two than that of the original tensor.

Example: Let us consider the contraction of the two indices of the tensor A_iB_j:

$$A_iB_i=\vec A\cdot\vec B.$$   (6-20)

The fact that this dot product is a zero-rank tensor guarantees that it has the same
value in all coordinate systems.

Special Tensors. The Kronecker delta symbol and the Levi-Civita symbol have
indices which look like tensor indices - but their components are defined to be the same
regardless of the coordinate system. Can they still transform like tensors, of rank two and
three, respectively? This is in fact the case; the proof is left for the problems. This
means that these two tensors can be used in conjunction with vectors and other tensors to
make new tensors.


Example: Consider the following tensor operations: First form the direct
product $\varepsilon_{ijk}A_lB_m$ of the Levi-Civita tensor with two vectors $\vec A$ and $\vec B$. Then
perform two contractions of indices, by setting l=j and m=k. This produces the
following object, with one free index:

$$\varepsilon_{ijk}A_lB_m\Big|_{l=j,\;m=k}
=\varepsilon_{ijk}A_jB_k=(\vec A\times\vec B)_i.$$   (6-21)

This is what we call the cross product or vector product. The theorems about
tensor indices guarantee, since it has one free index, that it does in fact transform
like a vector under rotations.
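This contraction can be carried out explicitly on a computer. A sketch, assuming NumPy; the einsum call performs exactly the double contraction of eq. (6-21):

import numpy as np

eps = np.zeros((3, 3, 3))
for i, j, k in [(0, 1, 2), (1, 2, 0), (2, 0, 1)]:
    eps[i, j, k] = 1.0     # even permutations of the indices
    eps[i, k, j] = -1.0    # odd permutations

A = np.array([1., 2., 3.])
B = np.array([4., 5., 6.])

print(np.einsum('ijk,j,k->i', eps, A, B))   # [-3.  6. -3.]
print(np.cross(A, B))                       # the same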

Example: The trace and determinant of a tensor are both scalars, since all the
free tensor indices are removed by contraction. Thus, the trace and determinant
of a matrix are the same in all coordinate systems, provided that the matrix itself
transforms like a tensor.

E. Coordinate Transformation of an Operator on a Vector Space.

Here is an explanation of the transformation law for a second-rank tensor. Consider an
operator $\overline{O}$ operating on a vector $\vec A$ to produce another vector $\vec B$:

$$\vec B=\overline{O}\vec A.$$   (6-22)

Tensors are all about looking at something from another point of view - in a rotated
coordinate system. So, if we write quantities in the rotated coordinate system with a
prime, the other point of view is

$$\vec B'=\overline{O}'\vec A'.$$   (6-23)

Under rotations, every vector is just multiplied by $\overline{R}$. If we apply this to both sides of
equation (6-22) it becomes

$$\overline{R}\vec B=\overline{R}\,\overline{O}\vec A.$$   (6-24)

Now insert between $\overline{O}$ and $\vec A$ a factor of the identity matrix, which we can write as
$\overline{I}=\overline{R}^{-1}\overline{R}=\overline{R}^T\overline{R}$, where we have used the fact that R's inverse is just its transpose:

$$\overline{R}\vec B=\overline{R}\,\overline{O}\vec A
=\overline{R}\,\overline{O}\,\overline{R}^T\overline{R}\vec A
=\left(\overline{R}\,\overline{O}\,\overline{R}^T\right)\left(\overline{R}\vec A\right).$$   (6-25)

This just shows the operator working in the rotated coordinate system, as shown in
(6-23), provided that the operator in the rotated coordinate system is given by

$$\overline{O}'=\overline{R}\,\overline{O}\,\overline{R}^T.\qquad\text{Similarity Transformation}$$   (6-26)
This process, of sandwiching a square matrix between R and its inverse, is called a
similarity transformation.

But . . . shouldn't $\overline{O}$ transform like a second-rank tensor? Let's write (6-26) in tensor
notation,

$$O_{ij}'=R_{il}O_{lm}\left(R^T\right)_{mj}
=R_{il}O_{lm}R_{jm}
=R_{il}R_{jm}O_{lm}.$$   (6-27)

This is exactly the transformation law for a second-rank tensor given in eq. (6-19).

This means that laws like

$$\vec B=\overline{O}\vec A$$   (6-28)

can be used in any coordinate system, provided that vectors and operators are
transformed appropriately from one coordinate system to another. This sounds a little
like Einstein's (really Galileo's) postulate about the laws of physics being the same in all
inertial frames. And this is no accident. When laws are written explicitly in terms of
tensors, they are said to be in "covariant form." It makes for beautiful math, and
sometimes beautiful physics.

F. The Conductivity Tensor. There are many simple physical laws relating vectors.
For instance, the version of Ohm's law for a distributed medium is

$$\vec J=\sigma\vec E.$$   (6-29)

This law asserts that when an electric field $\vec E$ is applied to a region of a conducting
material, there is a flux of charge $\vec J$ in the same direction, proportional to the
magnitude of $\vec E$. This seems simple and right. But is it right? ("A physical law
should be as simple as possible -- but no simpler," Einstein.)

Figure 6-3. Graphite, represented as layers (of hexagonally bonded carbon) stacked one on top
of another.

A more general relation can be obtained, following the principle that $\vec J$ must
transform like a vector. We can combine $\vec J$ in various ways with tensor quantities
representing the medium, as long as there is one free index. The two obvious relations
are

$$\vec J=\sigma\vec E\;\Leftrightarrow\;J_i=\sigma E_i\quad\text{(scalar conductivity)};\qquad
\vec J=\overline{\sigma}\vec E\;\Leftrightarrow\;J_i=\sigma_{ij}E_j\quad\text{(tensor conductivity)}.$$   (6-30)

The scalar relation causes current to flow in the direction of the electric field. But the
tensor relation allows the current to flow off in a different direction. What kind of
medium would permit this?

Graphite is a material which is quite anisotropic, so we might expect a relation of more
complicated directionality. Figure 6-3 shows the model we will take, of planes of carbon
atoms where electrons flow easily, the planes stacked in such a way that current flow
from one plane to another is more difficult. We can set up a conductivity matrix $\overline{\sigma}$ as
follows. From symmetry, we would expect that an electric field in the x-y plane would
cause a current in the direction of $\vec E$, with conductivity σ₀ "in the planes." And, also
from symmetry, an electric field in the z direction should cause current to flow in the z
direction, but with a much smaller conductivity σ₁. This can be written in matrix form as

$$\vec J=\overline{\sigma}\vec E
=\begin{pmatrix}\sigma_0&0&0\\0&\sigma_0&0\\0&0&\sigma_1\end{pmatrix}
\begin{pmatrix}E_1\\E_2\\E_3\end{pmatrix}.$$   (6-31)

If you multiply this out, you see that the first two components of $\vec E$ (the x and y
components) are multiplied by σ₀, and the third component, by σ₁.

But what if for some reason it is better to use a rotated set of axes? Then the applied field
and the conductivity tensor both need to be transformed to the new coordinate system,
and their product will produce the current density in the new coordinate system. Note
that scalar quantities, like J  E , are zero-th rank tensors and should have the same value
in both coordinate systems.

Numerical Example. In equation (6-31), let's take σ₁ = 0.1 σ₀ (weak current
perpendicular to the graphite planes) and have the electric field in the y-z plane, at an
angle of 45° from the y axis, as shown in figure (6-3):

$$\vec E=E_0\begin{pmatrix}0\\[2pt]\frac{\sqrt2}{2}\\[2pt]\frac{\sqrt2}{2}\end{pmatrix},\qquad
\overline{\sigma}=\sigma_0\begin{pmatrix}1&0&0\\0&1&0\\0&0&0.1\end{pmatrix}.$$   (6-32a)

This gives for the current density $\vec J$ in the original coordinate system,

$$\vec J=\overline{\sigma}\vec E
=\sigma_0E_0\begin{pmatrix}1&0&0\\0&1&0\\0&0&0.1\end{pmatrix}
\begin{pmatrix}0\\[2pt]\frac{\sqrt2}{2}\\[2pt]\frac{\sqrt2}{2}\end{pmatrix}
=\sigma_0E_0\begin{pmatrix}0\\[2pt]\frac{\sqrt2}{2}\\[2pt]\frac{\sqrt2}{20}\end{pmatrix}.$$   (6-32b)

The z component of the current is small compared to the y component, and so the current
vector moves closer to the plane of good conduction.

Now let's see what happens in a rotated coordinate system. Let's go to the system where
the electric field is along the z axis. This requires a rotation about the x axis of θ = -45°.
Using the form of this matrix given in eq. (6-11) above, we have

$$\overline{R}=R_x(-45°)
=\begin{pmatrix}1&0&0\\[2pt]0&\frac{\sqrt2}{2}&-\frac{\sqrt2}{2}\\[2pt]0&\frac{\sqrt2}{2}&\frac{\sqrt2}{2}\end{pmatrix}.$$   (6-33)

In this system the electric field vector is simpler:

$$\vec E'=\overline{R}\vec E
=E_0\begin{pmatrix}1&0&0\\[2pt]0&\frac{\sqrt2}{2}&-\frac{\sqrt2}{2}\\[2pt]0&\frac{\sqrt2}{2}&\frac{\sqrt2}{2}\end{pmatrix}
\begin{pmatrix}0\\[2pt]\frac{\sqrt2}{2}\\[2pt]\frac{\sqrt2}{2}\end{pmatrix}
=E_0\begin{pmatrix}0\\0\\1\end{pmatrix}.$$   (6-34)
And the conductivity tensor becomes

$$\overline{\sigma}'=\overline{R}\,\overline{\sigma}\,\overline{R}^T
=\sigma_0\begin{pmatrix}1&0&0\\[2pt]0&\frac{\sqrt2}{2}&-\frac{\sqrt2}{2}\\[2pt]0&\frac{\sqrt2}{2}&\frac{\sqrt2}{2}\end{pmatrix}
\begin{pmatrix}1&0&0\\0&1&0\\0&0&0.1\end{pmatrix}
\begin{pmatrix}1&0&0\\[2pt]0&\frac{\sqrt2}{2}&\frac{\sqrt2}{2}\\[2pt]0&-\frac{\sqrt2}{2}&\frac{\sqrt2}{2}\end{pmatrix}
=\sigma_0\begin{pmatrix}1&0&0\\0&0.55&0.45\\0&0.45&0.55\end{pmatrix}.$$   (6-35)

Now here is the punch line: We can either rotate $\vec J$ (calculated in the original coordinate
system) into the rotated coordinate system; or we can calculate it directly, using the
electric field and the conductivity tensor expressed in the rotated coordinate system.
Here goes:


  
1 0 0  0 
    0 
 2 2  2   
J   RJ   0 E0  0       0 E0  0.45  ; (6-36)
2 2 2  0.55 
    
 2 2  2 
0  
 2 2  20 
And the other way:
1 0 0  0   0 
    
J    E    0 E0  0 0.55 0.45  0    0 E0  0.45  ; (6-36)
 0 0.45 0.55  1   0.55 
    
It works - same result either way. Here is the interpretation of this example.

The equation

$$\vec J=\overline{\sigma}\vec E$$

produces the current density flowing in a medium (a vector) in terms of the electric field
in the medium (also a vector). But . . . the current does not necessarily flow in the
direction of the applied electric field. The most general vector result caused by another
vector is given by the operation of a second-rank tensor!
In a simple physics problem, the conductivity tensor will have a simple form, in a
coordinate system which relates to the structure of the material. But what if we want to
know the current flowing in some other coordinate system? Just transform the applied
field and the conductivity tensor to that coordinate system, and use

$$\vec J'=\overline{\sigma}'\vec E'.$$
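The whole numerical example fits in a few lines of NumPy, which makes a nice check of the arithmetic above (a sketch; values are in units of σ₀ and E₀):

import numpy as np

s = np.sqrt(2) / 2
sigma = np.diag([1.0, 1.0, 0.1])    # conductivity tensor, in units of sigma_0
E = np.array([0.0, s, s])           # applied field, in units of E_0

Rm = np.array([[1, 0, 0],           # R_x(-45 degrees), eq. (6-33)
               [0, s, -s],
               [0, s,  s]])

J = sigma @ E
print(Rm @ J)                       # [0.    0.45  0.55]

sigma_p = Rm @ sigma @ Rm.T         # similarity transformation, eq. (6-26)
E_p = Rm @ E                        # [0, 0, 1]
print(sigma_p @ E_p)                # the same: [0.    0.45  0.55]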
G. The Inertia Tensor.

In classical mechanics, a rigid body which is rotating about its center of mass has angular
momentum. The rate of rotation at a particular instant of time can be described by a
vector, $\vec\omega$, which is in the direction of the axis of rotation, with magnitude equal to the
angular speed, in radians/sec.

Figure 6-4. Displacement and velocity, and the resulting angular momentum.

The angular momentum is also given by a vector, $\vec L$, which is equal to the sum (or
integral) of the individual angular momenta of the parts of the rigid body, according to

$$d\vec L=dm\,\vec r\times\vec v.$$   (6-37)

In certain cases, such as when the rotation is about an axis of symmetry of the object, the
relation between $\vec L$ and $\vec\omega$ is simple:

$$\vec L=I\vec\omega.$$   (6-38)

Here I is the moment of inertia discussed in a first course in mechanics.

Figure 6-5. A rigid body, rotating about an axis through the origin.

However, surprisingly enough, the resultant total angular momentum vector $\vec L$ is not in
general parallel to $\vec\omega$. The general relation involves an angular-momentum tensor, $\overline{I}$,
rather than the scalar moment of inertia. The general relation is

$$\vec L=\overline{I}\vec\omega,$$   (6-39)

where now, for a body approximated by N discrete point masses,

$$\vec L=\sum_{i=1}^{N}\vec L_i\qquad\text{and}\qquad
\vec L_i=m_i\vec r_i\times\vec v_i$$   (6-40)

is the angular momentum of the i-th mass. The velocity of a point in the rotating body is
given by

$$\vec v_i=\vec\omega\times\vec r_i,$$

and so

$$d\vec L=dm\,\vec r\times\vec v=dm\,\vec r\times(\vec\omega\times\vec r).$$

We will evaluate the double cross product using tensor notation and the epsilon killer:

$$\begin{aligned}
dL_i&=dm\left[\vec r\times(\vec\omega\times\vec r)\right]_i\\
&=dm\,\varepsilon_{ijk}r_j(\vec\omega\times\vec r)_k\\
&=dm\,\varepsilon_{ijk}r_j\varepsilon_{klm}\omega_l r_m\\
&=dm\,\varepsilon_{kij}\varepsilon_{klm}r_j\omega_l r_m\\
&=dm\left(\delta_{il}\delta_{jm}-\delta_{im}\delta_{jl}\right)r_j\omega_l r_m\\
&=dm\left(r^2\omega_i-r_i\,\vec r\cdot\vec\omega\right).
\end{aligned}$$   (6-41)

Problems

The Pauli matrices are special 2×2 matrices associated with the spin of particles like the
electron. They are

$$\sigma_1=\begin{pmatrix}0&1\\1&0\end{pmatrix},\qquad
\sigma_2=\begin{pmatrix}0&-i\\i&0\end{pmatrix},\qquad
\sigma_3=\begin{pmatrix}1&0\\0&-1\end{pmatrix}.$$

(Here "i" represents the square root of -1.)

Problem 6-1. (a) What are the traces of the Pauli matrices?
(b) What are the determinants of the Pauli matrices?
(c) Are the Pauli matrices orthogonal matrices? Test each one.


Problem 6-2. The Pauli matrices are claimed to satisfy the relation

$$\sigma_j\sigma_k-\sigma_k\sigma_j=2i\,\varepsilon_{jkl}\sigma_l.$$

Test this relation, for j=2, k=3.

Problem 6-3. In Problem 6-1 above one of the Pauli matrices, $\sigma_2=\begin{pmatrix}0&-i\\i&0\end{pmatrix}$, did not
turn out to be orthogonal. For matrices with complex elements a generalized relationship
must be used:

$$\overline{C}\,\overline{C}^\dagger=\overline{C}^\dagger\overline{C}=\overline{I}\;\Longleftrightarrow\;\overline{C}\text{ is unitary,}$$

where the "Hermitian conjugate" matrix $\overline{C}^\dagger$ is the complex conjugate of the transpose
matrix.
(a) Calculate the Hermitian conjugate $\sigma_2^\dagger$ of the second Pauli matrix.
(b) Check and see if it is unitary.
[Unitary matrices become important in quantum mechanics, where one cannot avoid
working with complex numbers.]

Problem 6-5. Using the fact that the determinant of the product of two matrices is the
product of their determinants, show that the determinant of an orthogonal matrix must be
equal to either +1 or -1. Hint: Start with the orthogonality condition $\overline{A}^T\overline{A}=\overline{I}$.

Problem 6-6. For each of the three matrices below, say whether or not it is orthogonal,
and whether or not it is a rotation matrix. Justify your answers.

$$\overline{A}=\begin{pmatrix}0&0&1\\0&-1&0\\-1&0&0\end{pmatrix},\qquad
\overline{B}=\frac{1}{5}\begin{pmatrix}4&3&0\\-3&4&0\\0&0&5\end{pmatrix},\qquad
\overline{C}=\begin{pmatrix}\frac{\sqrt2}{2}&0&-\frac{\sqrt2}{2}\\[2pt]0&1&0\\[2pt]\frac{\sqrt2}{2}&0&\frac{\sqrt2}{2}\end{pmatrix}.$$

Problem 6-7. Consider rotations $R_x(\theta)$ and $R_z(\phi)$, with the matrices $R_x$ and $R_z$ as defined
in the text. Here θ represents the angle of a rotation about the x axis and φ represents the
angle of a rotation about the z axis.
First, calculate the matrices representing the result of carrying out the two rotations in
different orders:

$$\overline{A}=R_x(\theta)R_z(\phi)$$

and

$$\overline{B}=R_z(\phi)R_x(\theta).$$

Comparing $\overline{A}$ and $\overline{B}$, do you find that the two rotations $R_x(\theta)$ and $R_z(\phi)$ commute?


Next, look at your results in the small-angle approximation (an approximation where
you keep only terms up to first order in the angles θ and φ). Does this change your
conclusion?

Problem 6-8. Show algebraically that the rotation group is closed under multiplication.
That is, show that, for $\overline{A}$ and $\overline{B}$ members of the rotation group, the product matrix
$\overline{C}=\overline{A}\,\overline{B}$ is orthogonal and has determinant equal to +1.

Problem 6-10. The Kronecker delta symbol is defined to have constant components,
independent of the coordinate system. Can this be true? That is to say, is it true that

$$\delta_{ij}'=R_{il}R_{jm}\delta_{lm}\stackrel{?}{=}\delta_{ij}\,?$$

Use the orthogonality of the transformation matrix $R_{ij}$ to show that this is true.

Problem 6-11. The Levi-Civita symbol $\varepsilon_{ijk}$, defined previously, has specified constant
values for all of its components, in any frame of reference. How can such an object be a
tensor - that is, how can these constant values be consistent with the transformation
properties of tensor indices? Well, they are! Consider the transformation equation

$$\varepsilon_{ijk}'=R_{il}R_{jm}R_{kn}\varepsilon_{lmn},$$

where $\overline{R}$ is a rotation matrix, satisfying the orthogonality condition $R_{ij}R_{ik}=\delta_{jk}$, and $\varepsilon_{ijk}$ is
the Levi-Civita symbol. The question is, does ε' have the same values for all of its
components as ε? As a partial proof (covering 21 of the 27 components), prove that
$\varepsilon_{ijk}'=0$ if any two of the indices {i,j,k} are equal.
Note: in carrying out this proof, set two indices to the same value - but do not sum over
that index. This operation falls outside of the Einstein summation convention.

Problem 6-12. Tensor transformation of the Levi-Civita symbol. (Challenge
Problem) Prove that under transformations of tensor indices by an orthogonal matrix $\overline{R}$,
the Levi-Civita tensor transforms according to the rule

$$\varepsilon_{ijk}'=\varepsilon_{ijk}\det(\overline{R}).$$
Problem 6-13 Invariance of length under rotation. The length of a vector is a scalar,
and so should not change from one coordinate system to another. To check this, use the
vectors E and J from the conductivity example in the text. See if they have the same
magnitude in both coordinate systems used.

Problem 6-14 Invariance of a scalar product under rotation. The inner product of
two vectors is a scalar, and so should not change from one coordinate system to another.
To check this, use the vectors E and J from the numerical example in the discussion of
tensor conductivity in this chapter. See if the inner product E  J is the same in both
coordinate systems used.


Problem 6-15. Return to the numerical example in the discussion of tensor conductivity
in this chapter. Suppose that the electric field $\vec E$ in the original system still has
magnitude E₀, but is directed along the positive z axis; that is,

$$\vec E=\begin{pmatrix}0\\0\\E_0\end{pmatrix}.$$

(a) Calculate the current density $\vec J=\overline{\sigma}\vec E$ in the unprimed system.
(b) Now consider the same rotation, using $\overline{R}$ from the example in the text, and calculate
$\vec J'=\overline{\sigma}'\vec E'$ in the primed system. (Note: $\overline{\sigma}'$ has already been calculated in the notes.)
(c) Now calculate the current density in the primed system by the alternate method,
$\vec J'=\overline{R}\vec J$. Compare with the result from part (b).


Chapter 6a. Space-time four-vectors.


The preceding chapters have focused on a description of space in terms of three
independent, equivalent coordinates. Here we discuss the addition of time as a fourth
coordinate in "space-time." This leads to the consideration of space-time
transformations, or Lorentz transformations, which are an extension to four dimensions
of rotations in three dimensions. Special relativity is introduced here as a generalization
of the invariance of length under rotations in three-space. The transformation of the
Maxwell field tensor under relativistic "boosts" is introduced as an application to
electromagnetic theory. A later chapter, intended to follow the study of Maxwell's
equations, shows how covariant tensor calculus leads to these equations.

A. The origins of special relativity.

This course is mainly about mathematical methods, so I will completely ignore the rich
history of scientific discovery and speculation that led to the integration of space and
time into four-space. I will just go over the most compelling reasons based on modern
science for requiring something like special relativity.

1. There are lots of kinds of electromagnetic radiation known: light, radio waves, X-
rays, WiFi and microwave ovens. All of these disturbances travel at a certain special
speed, c = 3 × 10⁸ m/s.
2. Beams of electrons are commonplace. Electrons in radio and television tubes respond
like most massive particles, speeding up in response to forces acting on them. However,
at particle accelerators such as SLAC, it is observed that as particles approach the speed
c, they are harder and harder to speed up. They never quite get up to speed c.
3. Radioactive particles are also commonplace; they decay with a characteristic half life.
But mu mesons which are held in circular orbits by a magnetic field live longer than
when at rest. Furthermore, the light curves of supernovae moving away from us at high
velocity are stretched out in time, indicating that decaying iron and nickel isotopes are
exhibiting a longer half life, presumably because they are moving with a speed
approaching c.

Note that c is a special property of nature, even for phenomena which have nothing to do
with light. However, we always call it "the speed of light."

So - it is not too dumb to propose the following: we have admired the simplicity of
coordinate transformations from one frame of reference to a rotated frame. Space
coordinates are changed, but time is not involved. However, the phenomena described
above suggest that a moving observer sees time differently. We thus suppose that
transforming to the point of view of a moving observer involves a special sort of
"rotation" in space and time.

B. Four-vectors and invariant proper time.


We will add a time coordinate to the usual 3-component space vector. Time does not
have the correct units - but ct (time multiplied by the speed of light) does. Thus we have
a four-vector:

$$x^\mu=\begin{pmatrix}ct\\x\\y\\z\end{pmatrix}.$$   (6a-1)

[Note about indices. I will use Greek letters such as μ, ν, λ, and ξ (mu, nu, lambda and
ksi) to label space-time components, as distinct from i, j, k, l . . . for space components.
There is also an issue about upper and lower indices, corresponding to contra-variant and
co-variant indices. I will try to avoid discussing their difference in detail, leaving that for
a more specialized course.] A space-time index takes on one of the four values 0, 1, 2, 3.
Thus x¹, x², and x³ are just the usual x, y, and z, with the fourth coordinate as x⁰ = ct.

Now, what do we do with a four-vector? An important property of a three-space vector is
its length. For a vector $\vec A=(A_1,A_2,A_3)^T$, the length $A=\sqrt{\vec A^2}$ is given by

$$\vec A^2=\delta_{ij}A_iA_j=A_1^2+A_2^2+A_3^2.$$   (6a-2)

For the special case of the position vector $\vec r$, the scalar length r is given by

$$r^2=\delta_{ij}x_ix_j=x_1^2+x_2^2+x_3^2.$$   (6a-3)

We recall that under rotation the components x₁, x₂, and x₃ can change, but in such a way
that the length r is invariant. In fact, orthogonal transformations are defined to be just
those linear coordinate transformations which leave the length of vectors invariant.

In space-time, the corresponding length is called the proper time τ. It is defined this way:

$$(c\tau)^2=g_{\mu\nu}x^\mu x^\nu,$$   (6a-4)

where $g_{\mu\nu}$ is the metric tensor, defined as follows:

$$g_{\mu\nu}=\begin{pmatrix}1&0&0&0\\0&-1&0&0\\0&0&-1&0\\0&0&0&-1\end{pmatrix}.$$   (6a-5)

This gives, in terms of the common variables t, x, y, and z,

$$(c\tau)^2=g_{\mu\nu}x^\mu x^\nu=(ct)^2-x^2-y^2-z^2.$$   (6a-6)

This is a bit more complicated than the three-space version. For one thing, the inner
product is formed using not the Kronecker delta function δ_ij, but the metric tensor g_μν.
Note that it has two lower indices. A general rule for using four-space indices is the
following: When carrying out a contraction by setting two indices equal, one must be a
lower index and the other, an upper index. The Einstein summation convention is of
course in force, where a paired index is assumed to be summed, from 0 to 3.

There is one major difference between the length of a three-vector and the length of a
four-vector. Because of the minus sign in the metric tensor, the length of the vector can
come out to be zero, or even negative. This is a warning that, while time has been
introduced as a vector component, it is really still different from the space components.

C. The Lorentz transformation.

The defining property of a three-vector is how it changes when the frame of reference (of
the observer) is rotated about an axis. The corresponding change in the frame of
reference for a 4-vector is from that of a "stationary" observer to that of an observer
moving with velocity v in a particular direction, usually taken to be the x-direction. (See
figure 6a-1.) What is the transformation which changes space and time coordinates, in
such a way as to leave the 4-vector length unchanged? Here it is - the Lorentz
transformation:

$$\Lambda^\mu{}_\nu=\begin{pmatrix}\gamma&-\gamma\beta&0&0\\-\gamma\beta&\gamma&0&0\\0&0&1&0\\0&0&0&1\end{pmatrix}
\qquad\text{The Lorentz Transformation}$$   (6a-7)

where

$$\gamma=\frac{1}{\sqrt{1-\beta^2}}$$   (6a-8a)

and

$$\beta=\frac{v}{c}.$$   (6a-8b)

These symbols form the language of relativity. The symbol β is often just referred to as
the velocity; it is a dimensionless velocity formed by dividing the actual velocity (the
relative velocity of the two frames of reference) by the speed of light. The two symbols
γ and γβ are functions of the velocity β. They play the role for the Lorentz
transformation that cos θ and sin θ play in the rotation $R_z(\theta)$; instead of
$\cos^2\theta+\sin^2\theta=1$ we have

$$\gamma^2-(\gamma\beta)^2=1.$$   (6a-9)

The role of the Lorentz transformation matrix   given above is to calculate four-space
coordinates in a "moving" coordinate system S' , in terms of those in a "stationary"


system S. These systems are illustrated below in figure 6a-1.

Figure 6a-1. Two rest-frames, S and S', in relative motion.
The frame S' is in motion, relative to S, with velocity v in the x-direction. The Lorentz
transformation from S to S' is
$$x' = \Lambda x = \begin{pmatrix} \gamma & -\eta & 0 & 0 \\ -\eta & \gamma & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix}\begin{pmatrix} ct \\ x_1 \\ x_2 \\ x_3 \end{pmatrix} = \begin{pmatrix} \gamma\,ct - \eta\,x_1 \\ -\eta\,ct + \gamma\,x_1 \\ x_2 \\ x_3 \end{pmatrix} \qquad (6a-10)$$
or, in the form often quoted for the Lorentz transformation,
$$t' = \gamma\left(t - \frac{v}{c^2}\,x\right), \qquad x' = \gamma\,(x - vt), \qquad y' = y, \qquad z' = z \qquad (6a-11)$$

The inverse of this transformation is pretty easy to figure out. It is just obtained by changing v to $-v$: $\gamma$ is unchanged, and $\eta$ changes to $-\eta$, giving
$$\Lambda^{-1} = \begin{pmatrix} \gamma & \eta & 0 & 0 \\ \eta & \gamma & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix} \qquad \text{The Inverse Lorentz Transformation} \quad (6a-12)$$
and
$$\begin{pmatrix} ct \\ x_1 \\ x_2 \\ x_3 \end{pmatrix} = \Lambda^{-1}\,x' = \begin{pmatrix} \gamma & \eta & 0 & 0 \\ \eta & \gamma & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix}\begin{pmatrix} ct' \\ x_1' \\ x_2' \\ x_3' \end{pmatrix} \qquad (6a-13)$$
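As a quick cross-check on equations (6a-7), (6a-12) and (6a-4), here is a short numerical sketch (my own code and names, assuming numpy), verifying that the boost times its inverse is the identity and that the proper time is invariant:

```python
import numpy as np

def boost(beta):
    """Lorentz boost along x, equation (6a-7); eta = gamma * beta."""
    gamma = 1.0 / np.sqrt(1.0 - beta**2)
    eta = gamma * beta
    return np.array([[gamma, -eta, 0, 0],
                     [-eta, gamma, 0, 0],
                     [0, 0, 1, 0],
                     [0, 0, 0, 1.0]])

L = boost(0.6)
Linv = boost(-0.6)                         # reversing v gives the inverse, eq. (6a-12)
print(np.allclose(L @ Linv, np.eye(4)))    # True: Lambda * Lambda^{-1} = I

g = np.diag([1.0, -1, -1, -1])
x = np.array([5.0, 3.0, 1.0, 2.0])
xp = L @ x
print(np.isclose(x @ g @ x, xp @ g @ xp))  # True: (c*tau)^2 is invariant
```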

D. Space-time events.

Some of the most interesting effects in special relativity involve objects in motion - that
is, in motion with respect to the observer. The frame of reference in which the object is
not in motion is called its rest frame. Properties of an object, in space or time, may vary
according to the observer's state of motion with respect to the object, and the properties
observed in the rest frame are considered to be fundamental properties of the object. For
instance, the intrinsic length of an object is that measured in its rest frame. And the time
for an object to do something (go to sleep, then wake up, for instance) is properly
measured in the object's rest frame. We will now show that time intervals are stretched
out ("dilated") if the object is moving, and lengths are shortened ("contracted").

The basis of geometry in three-space consists of points, specified by the coordinates (x, y, z). In four-space we talk instead of events, specified by time and position, e.g., for event A,
$$x_A = \begin{pmatrix} ct_A \\ x_A \\ y_A \\ z_A \end{pmatrix}. \qquad (6a-14)$$
A point in space can be marked by driving a stake in the ground, or carving your name on
a tree. For a four-space version, some authors imagine setting off a small bomb, so that a
black mark gives the position, and the sound adds the time. It is of special interest to
consider the 4-space displacement between two events. Thus, the displacement from
event A to event B is
$$\Delta x = x_B - x_A = \begin{pmatrix} c(t_B - t_A) \\ x_B - x_A \\ y_B - y_A \\ z_B - z_A \end{pmatrix}. \qquad (6a-15)$$

E. The time dilation.

Let us consider two events, A and B, happening to an object in frame S'. This is the rest
frame of the particle, so they both happen at a single point, which we will take to be the
origin, x' = y' = z' = 0. Let the first event happen at time t' = 0, and the second at time t' = T0. So, the four-vector positions of A and B, in frame S', are


0  cT0   cT0 


     
0 0  0 
xA    , xB   , and x  xB  xA   .
0  0   0 
     
0  0   0 
Now we use the inverse Lorentz transformation to go to the frame S of the observer:
   0 0  cT0    cT0 

 1     0 0  
 
0   cT0 

x    x   (6a-16)
 0 0 1 0  0   0 
    
 0 0 0 1  0   0 
This tells us that the time interval T observed in S is a factor of $\gamma$ greater than the time interval T0 observed in the rest frame. That is,
$$T = \gamma\, T_0 \qquad \text{time dilation} \quad (6a-17)$$


Example. Suppose a rocket going to Mars travels at relativistic speed $\beta = 0.1$, that is, at 10% of the speed of light. (This is not actually very practical.) How long would a year of an astronaut's life (observed in her rest frame, moving with the rocket) appear to take?

The time-dilation factor $\gamma$ is
$$\gamma = \frac{1}{\sqrt{1-\beta^2}} = \frac{1}{\sqrt{1-0.01}} = 1.005$$
So, the length of the dilated year (as we see it, not in her rest frame) is
$$T = \gamma\, T_0 = 1.005 \text{ years}.$$
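For readers who want to try other speeds, here is a two-line sketch of the same computation (my own code, not the text's):

```python
import numpy as np

def gamma(beta):
    # Time-dilation factor of equation (6a-8a).
    return 1.0 / np.sqrt(1.0 - beta**2)

print(gamma(0.1))   # ~1.005: one astronaut year appears as 1.005 of our years
print(gamma(0.8))   # 5/3, the speed used in Problem 6a-5 below
```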

F. The Lorentz contraction.

According to the Lorentz contraction, fast-moving objects appear shorter than they
actually are. Let us see how this works. Suppose that a stick of length L0 (if measured in
its rest frame S') is in fact observed in frame S, where it appears to be moving at velocity
v along the x-axis. Let events A and B be observations of the two ends of the stick in its
rest frame, as shown in figure 6a-2 below.


Figure 6a-2. Two events A and B as seen in system S'. They mark observations (not necessarily simultaneous) of the two ends of a stick of length L0.
Events A and B are separated by a distance L0, the length of the stick. We do not require
them to be at the same time, since the stick is not moving. So we can take A to be at the
origin, at time $t' = 0$, and event B to be at the end of the stick, at undetermined time $t'_B$. Thus,
$$x'_A = \begin{pmatrix} 0 \\ 0 \\ 0 \\ 0 \end{pmatrix}, \qquad x'_B = \begin{pmatrix} ct'_B \\ L_0 \\ 0 \\ 0 \end{pmatrix}, \qquad \text{and} \qquad \Delta x' = x'_B - x'_A = \begin{pmatrix} ct'_B \\ L_0 \\ 0 \\ 0 \end{pmatrix}.$$
We use the inverse Lorentz transformation to see the length of the stick in frame S:
$$\Delta x = \Lambda^{-1}\,\Delta x' = \begin{pmatrix} \gamma & \eta & 0 & 0 \\ \eta & \gamma & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix}\begin{pmatrix} ct'_B \\ L_0 \\ 0 \\ 0 \end{pmatrix} = \begin{pmatrix} \gamma\, ct'_B + \eta L_0 \\ \eta\, ct'_B + \gamma L_0 \\ 0 \\ 0 \end{pmatrix} \qquad (6a-18)$$
We are interested in events A and B which occur at the same time in frame S. And with this condition, the separation of events A and B in frame S is the length of the stick, as observed with the stick in motion. That is,
$$\Delta x = \begin{pmatrix} \gamma\, ct'_B + \eta L_0 \\ \eta\, ct'_B + \gamma L_0 \\ 0 \\ 0 \end{pmatrix} = \begin{pmatrix} 0 \\ L \\ 0 \\ 0 \end{pmatrix} \qquad (6a-19)$$
This gives two equations, for the first two components. The first one can be used to eliminate $t'_B$, giving
$$t'_B = -\frac{\eta}{\gamma c}\,L_0,$$
and then the second equation gives
$$L = \eta c\, t'_B + \gamma L_0 = -\frac{\eta^2}{\gamma}L_0 + \gamma L_0 = \left(\frac{\gamma^2 - \eta^2}{\gamma}\right) L_0$$
or, using $\gamma^2 - \eta^2 = 1$,
$$L = \frac{1}{\gamma}\,L_0 \qquad \text{Lorentz contraction} \quad (6a-20)$$

G. The Maxwell field tensor.

The electric field E and the magnetic field B each have three components which
seem to be related to the directions in space. But how do they fit into relativistic four-
space? There are no obvious scalar quantities to provide the fourth component of their
four-vectors. Furthermore, the magnetic field has some dubious qualities for a true
vector. For one, it is derived from a cross product of vectors, and so does not reverse sign under the parity transformation, as all true vectors do.

There is another interesting argument indicating that the relativistic transformation


properties of the electric and magnetic fields are complicated. Under Lorentz
transformation, the four components of the four-vector are re-arranged amongst
themselves. But the transformation to a moving coordinate system turns a pure electric
field into a combination of electric and magnetic fields. This can be understood in a very
rough way from the following observation. An electrostatic field can be produced by a
distribution of fixed charges. But if one shifts to a moving coordinate system, the
charges are moving, constituting currents, which generate magnetic fields.


A concrete example can make a prediction for the transformation of electric into
magnetic fields. Consider two line charges, as shown in figure 6a-3 below.

Figure 6a-3. Two static line-charge distributions, producing an electrostatic electric field. Near the origin, the electric field is in the positive y direction.

This distribution of stationary charge produces an electrostatic field, as shown. Near the origin, the electrostatic field is in the +y direction. Now, what does this look like to an
observer in system S', moving in the +x direction? This is shown in figure 6a-4.

Figure 6a-4. Two linear distributions of charge moving in the -x'-direction, producing both an electric field and a magnetic field. Near the x'-axis, the magnetic field resulting from the two "wires" is in the -z'-direction.

There is now a magnetic field, in the negative z' direction. There is another, more subtle prediction. Because of the Lorentz contraction, the wires appear shorter, and so the charge density on the wires is greater, and the electric field should be stronger.

So, we have this prediction for the transformation of electromagnetic fields: Suppose
that there is just an electric field present in S, in the positive y-direction. Then in the S'
frame, the field transformation should produce a magnetic field in the negative z-
direction, and a stronger electric field, still in the positive y-direction. Now we will
postulate a transformation law for electromagnetic fields, and see if this prediction is
fulfilled.

Here is the field-strength tensor of special relativity.


 0  Ex  Ey  Ez 
 
Ex 0  Bz By 
F   (6a-21)
 Ey Bz 0  Bx 
 
 Ez  By Bx 0 
The electric and magnetic fields are seen to be components of a four-tensor, rather than
forming parts of four-vectors. This may seem more complicated than necessary. As I. I. Rabi said on hearing about the mu meson: who asked for that? Well, we know how to transform a tensor into a moving frame of reference, and so we can see what a simple electric field looks like to a moving observer.

The tensor transformation works just like with three-vectors and rotations $R$, except that the Lorentz transformation matrix $\Lambda_{\mu\nu}$ plays the role for four-tensors that $R$ played for three-tensors. The electromagnetic fields as seen in the moving system are thus
$$F'_{\mu\nu} = \Lambda_{\mu\sigma}\,F_{\sigma\tau}\,\left(\Lambda^T\right)_{\tau\nu},$$
or, written out,
$$F' = \Lambda F \Lambda^T = \begin{pmatrix} \gamma & -\eta & 0 & 0 \\ -\eta & \gamma & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix}\begin{pmatrix} 0 & -E_x & -E_y & -E_z \\ E_x & 0 & -B_z & B_y \\ E_y & B_z & 0 & -B_x \\ E_z & -B_y & B_x & 0 \end{pmatrix}\begin{pmatrix} \gamma & -\eta & 0 & 0 \\ -\eta & \gamma & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix}$$
Carrying out the two matrix multiplications gives
$$F' = \begin{pmatrix} 0 & -(\gamma^2-\eta^2)E_x & -(\gamma E_y - \eta B_z) & -(\gamma E_z + \eta B_y) \\ (\gamma^2-\eta^2)E_x & 0 & -(\gamma B_z - \eta E_y) & \gamma B_y + \eta E_z \\ \gamma E_y - \eta B_z & \gamma B_z - \eta E_y & 0 & -B_x \\ \gamma E_z + \eta B_y & -(\gamma B_y + \eta E_z) & B_x & 0 \end{pmatrix} \qquad (6a-22)$$
And finally, using the relation $\gamma^2 - \eta^2 = 1$, we have a grand result:


$$F'_{\mu\nu} = \begin{pmatrix} 0 & -E_x & -(\gamma E_y - \eta B_z) & -(\gamma E_z + \eta B_y) \\ E_x & 0 & -(\gamma B_z - \eta E_y) & \gamma B_y + \eta E_z \\ \gamma E_y - \eta B_z & \gamma B_z - \eta E_y & 0 & -B_x \\ \gamma E_z + \eta B_y & -(\gamma B_y + \eta E_z) & B_x & 0 \end{pmatrix} \qquad (6a-23)$$

Let's try to absorb this result. To start with, note that the matrix is still anti-symmetric; this is a cross check on the algebra. Next, we see that the x-components of both $\vec E$ and $\vec B$, in the direction of the relative velocity, do not change. However, the transverse fields get all scrambled up. We can write the four non-trivial field transformation equations like this:
$$E'_y = \gamma E_y - \eta B_z$$
$$E'_z = \gamma E_z + \eta B_y \qquad (6a-24)$$
$$B'_y = \gamma B_y + \eta E_z$$
$$B'_z = \gamma B_z - \eta E_y$$
We see that the transverse components of both fields get bigger. This is what we predicted for the E field. And a bit of the other field, in the other transverse direction, gets added on. Here is another simple check: take the zero-velocity limit. Do you find that the fields do not change?

Finally, let's consider the example above. We predicted that transforming a positive Ey
would give a negative Bz. Look at the fourth equation above: that is just what happens.
I. I. Rabi would love it.

Note on units: The form of $F_{\mu\nu}$ given is in Gaussian units. To use SI units, replace $E_i$ by $E_i/c$.
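Since the matrix algebra in equation (6a-22) is easy to get wrong by hand, here is a numerical sketch of the transformation (my own code and variable names) for the pure-$E_y$ case of figure 6a-4:

```python
import numpy as np

# Check of equation (6a-24) against F' = Lambda F Lambda^T (Gaussian units),
# for a pure electric field E = (0, Ey, 0).
beta = 0.6
gamma = 1.0 / np.sqrt(1.0 - beta**2)
eta = gamma * beta

Ex, Ey, Ez = 0.0, 1.0, 0.0
Bx, By, Bz = 0.0, 0.0, 0.0

F = np.array([[0, -Ex, -Ey, -Ez],
              [Ex, 0, -Bz, By],
              [Ey, Bz, 0, -Bx],
              [Ez, -By, Bx, 0.0]])
Lam = np.array([[gamma, -eta, 0, 0],
                [-eta, gamma, 0, 0],
                [0, 0, 1, 0],
                [0, 0, 0, 1.0]])

Fp = Lam @ F @ Lam.T
print(Fp[2, 0])   # E'_y = gamma * Ey: the transverse E field is stronger
print(Fp[1, 2])   # -B'_z = eta * Ey, i.e. B'_z = -eta*Ey < 0, as predicted
```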

Problems
Problem 6a-1. The algebra of special relativity leans heavily on the following dimensionless symbols:
$$\beta = \frac{v}{c}, \qquad \gamma = \frac{1}{\sqrt{1-\beta^2}}, \qquad \eta = \gamma\beta$$
Here v represents the velocity of one frame of reference with respect to the other.
(a) What are the limiting values of these three symbols as the velocity v approaches the speed of light c?
(b) Calculate the value of $\gamma^2 - \eta^2$. The result should be independent of the velocity.


(c) The inverse of the Lorentz boost
$$\Lambda = \begin{pmatrix} \gamma & -\eta & 0 & 0 \\ -\eta & \gamma & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix}$$
is obtained by reversing the velocity v, which causes the change $\eta \to -\eta$. The result is given above, in equation (6a-12). Carry out the matrix multiplication to demonstrate that this works; that is, show by direct calculation that
$$\Lambda\,\Lambda^{-1} = I$$

Problem 6a-2. The magnitude of the 4-position vector,
$$(c\tau)^2 = (ct)^2 - x^2 - y^2 - z^2,$$
should be invariant under Lorentz transformation. Using the relations given in equation (6a-11) above, calculate
$$(ct')^2 - x'^2 - y'^2 - z'^2$$
and see if the invariance works out.

Problem 6a-3. The energy of an object at rest is (famously) given by $E = mc^2$, where m (as always in our discussions) represents the object's rest mass. And, in the object's rest frame, its four-momentum
$$p = \begin{pmatrix} E \\ p_x c \\ p_y c \\ p_z c \end{pmatrix} \qquad \text{(general case)}$$
becomes
$$p = \begin{pmatrix} mc^2 \\ 0 \\ 0 \\ 0 \end{pmatrix} \qquad \text{(in the particle's rest frame)}.$$
(a) Set the invariant length-squared of the first expression above equal to the invariant length-squared of the second one. Solve for the object's energy E, in terms of its momentum p and its rest mass m. (The speed of light, c, will be there too.)
(b) Multiply the second four-vector by the Lorentz-transformation matrix $\Lambda^{-1}$ (that is, transform it to a frame of reference moving backwards along the x-axis, with velocity $-\beta$) and use the result to derive expressions for the object's energy and momentum as a function of the relativistic velocity of the particle.

Problem 6a-4. The nearest star to our sun is about 3 light-years away. That is,
something traveling from Earth at speed v = c would take 3 years to get to the star,
according to observers in the Earth frame of reference. Consider a rocket, carrying an astronaut, traveling to the star from Earth at speed $\beta$. The time T to get there would be
$$T = \frac{3\ \text{years}}{\beta}.$$
(a) How fast would the rocket have to travel, in m/sec, to get to the star within a reasonable life expectancy of the astronaut, say $T_0$ = 50 years? (Start by calculating the value of $\beta$.) Note: here you can approximate $\gamma \approx 1$, so astronaut time and Earth time will be about the same.
(b) Answer the same question about travel to Andromeda, 1,500,000 light-years away, also in 50 years of astronaut time. Note: in calculating the travel time in the Earth frame you can approximate $\beta \approx 1$, so that the travel time in the Earth frame is about 1,500,000 years.

Problem 6a-5. Recent studies of distant supernovae played a central role in the discovery of dark energy. One researcher (Prof. Gerson Goldhaber) pointed out that the relativistic time dilation should be observable for the most distant galaxies, which are moving close to the speed of light. This is because the decrease in the brightness of a supernova over the first 100 days after the initial explosion is due to the decay of the isotope Fe58, which has a half-life of 20 days. According to the theory of special relativity, this half-life should appear longer to us. (The half-life is the time for the decay rate to decrease by a factor of 1/2.)
If the galaxy is moving with speed $\beta = 0.8$, how long should it take for its light to decrease in intensity by a factor of 1/2? By a factor of 1/128?


Chapter 7. The Wave Equation


The vector spaces that we have described so far were finite dimensional. Describing position in
the space we live in requires an infinite number of position vectors, but they can all be represented
as linear combinations of three carefully chosen linearly independent position vectors. There are
other analogous situations in which a complicated problem can be simplified by focusing attention
on a set of basis vectors. In addition to providing mathematical convenience, these basis vectors
often turn out to be interesting objects in their own right.

The pressure field inside a closed cylindrical tube (an ''organ pipe'') is a function of time and three space coordinates which can vary in an almost arbitrary way. Yet there are certain simple patterns which form a basis for describing all the others, and which are recognizably important and interesting. In this case they are the fundamental resonance of the organ pipe and its overtones. They are the ''notes'' which we use to make music, identified by our ears as basis vectors of its response space. A similar situation occurs with the vibrating strings of guitars and violas (or with the wires strung between telephone poles), where arbitrary waves moving on the string can be represented as a superposition of functions corresponding again to musical notes.

Figure 7-1. A pulse traveling down a telephone wire (solid curve), and a reflected pulse traveling back (dashed curve).

Another analogous situation occurs with quantum-mechanical electron waves near a positively
charged nucleus. An arbitrarily complicated wave function can be described as a linear
superposition of a series of ''states,'' each of which corresponds (in a certain sense) to one of the
circular Bohr electron orbits.

We will choose the stretched string to examine in mathematical detail. It is the easiest of these systems to describe, with a reasonable set of simplifying assumptions. The mathematical variety that it offers is a rich preview of mathematical physics in general.

Figure 7-2. Modes of resonance of a vibrating string (n = 1 and n = 2).

A. Qualitative properties of waves on a string
Many interesting and
intellectually useful experiments can be carried out by exciting waves on a stretched string, wire or


cord and observing their propagation. Snapping a clothesline or hitting a telephone line with a rock
produces a pulse which travels along the line as shown in figure 7-1, keeping its shape. When it
reaches the point where the line is attached, it reverses direction and heads back to the place where
it started, upside down. The pulse maintains its shape as it travels, only changing its form when it
reaches the end of the string.

A string which has vibrated for a long time tends to "relax" into simpler motions which are
adapted to the length of the string. Figure 7-2 shows two such motions. The upper motion has a
sort of sinusoidal shape at any given moment, with a point which never moves (a node) at either
end of the string. If the string is touched with a finger at the midpoint, it then vibrates as
sketched in the lower diagram, with nodes at the ends and also at the touched point. It will be
noticed that the musical note made by the string in the second mode of vibration is an octave higher
than the note from the first mode.

Another interesting motion can be observed by stretching the string into the shape of a pulse like
that shown in figure 7-1, and then releasing it, so that the string is initially motionless. The pulse is
observed to divide into two pulses, going in opposite directions, which run back and forth along the
string. But after a time, the string "relaxes" into a motion like that in the upper part of figure 7-2.

How can a pulse propagate without its shape changing? Why does it reverse the direction of its displacement after reflection? Why does the guitar string vibrate with a sinusoidal displacement? Why does the mode with an additional displacement node vibrate with twice the frequency? Why does a traveling pulse relax into a stationary resonant mode? We will try to build up a mathematical description of waves in a string which predicts all these properties.

B. The wave equation.

Figure 7-3. Forces exerted on an element of string.

The propagation of a wave disturbance through a medium is described in terms of forces exerted
on an element of mass of the medium by its surroundings, and the resulting acceleration of the mass
element. In a string, any small length of the string experiences two forces, due to the rest of the
string pulling it to the side, in the two different directions. Figure 7-3 shows an element of the
string of length dx, at a point where the string is curving downwards. The forces pulling on each
end of the string are also shown, and it is clear that there is a net downwards force, due to the string
tension. We will write down Newton's law for this string element, and show that it leads to the


partial differential equation known as the wave equation. But first, we will discuss a set of
approximations which makes this equation simple.

First, we will make the small-angle approximation, assuming here that the angle $\theta$ which the string makes with the horizontal is small ($\theta \ll 1$). In this approximation, $\cos\theta \approx 1$ and $\sin\theta \approx \theta$. Secondly, we will assume that the tension T is constant throughout the string. These two assumptions ($\cos\theta \approx 1$ and T constant) result in a net force of zero in the longitudinal (x) direction, so that the motion of the string is purely transverse. We will ignore a possible small longitudinal motion and assume that it would have only a small effect on the transverse motion which we are interested in.

In the transverse (y) direction, the forces do not cancel. The transverse component of the force on the left-hand side of the segment of string has magnitude $T\sin\theta$. We will relate $\sin\theta$ to the slope of the string, according to the relation from analytic geometry
$$\text{slope} = \frac{dy}{dx} = \frac{\text{rise}}{\text{run}} = \tan\theta \approx \sin\theta. \qquad (7-1)$$
The last, approximate, equality is due to the small-angle approximation. Thus the transverse force at each end has magnitude approximately equal to $T\,\dfrac{dy}{dx}$, and Newton's second law for the transverse motion gives
$$\sum F_y = (dm)\, a_y$$
$$T\left.\frac{\partial y}{\partial x}\right|_{x+dx} - T\left.\frac{\partial y}{\partial x}\right|_{x} = (\mu\, dx)\,\frac{\partial^2 y}{\partial t^2} \qquad (7-2)$$
Here we have used the linear mass density $\mu$ (mass per unit length) to calculate $dm = \mu\, dx$.

Partial derivatives.

dy y
In the equation above we replaced the slope by , a partial derivative. We need to explain
dx x
briefly the difference between these two types of derivatives.

The displacement of the string is a function of two variables, the position x along the string, and
the time t. There are two physically interesting ways to take the derivative of such a function. The
rate of change of y with respect to x, at a fixed instant of time, is the slope of the string at that time,
and the second derivative is related to the curvature of the string. Similarly, the rate of change of y
with respect to time at a fixed position gives the velocity of the string at that point, and the second
time derivative gives the acceleration. These derivatives of a function with respect to one variable,
while holding all other variables constant, are referred to as partial derivatives.

The partial derivatives of an arbitrary function g(u,v) of two independent variables u and v are defined as follows:
$$\left(\frac{\partial g}{\partial u}\right)_v = \lim_{\Delta u \to 0}\frac{g(u+\Delta u, v) - g(u,v)}{\Delta u}$$
$$\left(\frac{\partial g}{\partial v}\right)_u = \lim_{\Delta v \to 0}\frac{g(u, v+\Delta v) - g(u,v)}{\Delta v} \qquad (7-3)$$
The subscript indicates the variable which is held constant while the other is varied, and it can often be left off without ambiguity. Second partial derivatives can be defined, too:
$$\left(\frac{\partial^2 g}{\partial u^2}\right)_v = \lim_{\Delta u \to 0}\frac{\frac{\partial g}{\partial u}(u+\Delta u, v) - \frac{\partial g}{\partial u}(u,v)}{\Delta u}$$
$$\left(\frac{\partial^2 g}{\partial v^2}\right)_u = \lim_{\Delta v \to 0}\frac{\frac{\partial g}{\partial v}(u, v+\Delta v) - \frac{\partial g}{\partial v}(u,v)}{\Delta v} \qquad (7-4)$$
$$\frac{\partial^2 g}{\partial u\,\partial v} = \lim_{\Delta v \to 0}\frac{\frac{\partial g}{\partial u}(u, v+\Delta v) - \frac{\partial g}{\partial u}(u,v)}{\Delta v}$$
There is a fourth second-order partial derivative which we have omitted - but, for mathematically well-behaved functions, it can be shown that
$$\frac{\partial^2 g}{\partial u\,\partial v} = \frac{\partial^2 g}{\partial v\,\partial u} \qquad (7-5)$$
The definitions given above are the formal definitions. Often in practice, however, taking a partial derivative simply means taking the derivative with respect to the stated variable, treating all other variables as constants.

Example. Calculate all the first and second partial derivatives of the function $g(u,v) = u^3 v + \sin u$.

Solution: Evaluate the six partial derivatives. We will note in particular whether or not the two cross partial derivatives come out to be equal, as they should.
$$\frac{\partial g}{\partial u} = \frac{\partial}{\partial u}\left(u^3 v + \sin u\right) = 3u^2 v + \cos u \qquad (a)$$
$$\frac{\partial g}{\partial v} = \frac{\partial}{\partial v}\left(u^3 v + \sin u\right) = u^3 \qquad (b)$$
$$\frac{\partial^2 g}{\partial u^2} = \frac{\partial}{\partial u}\left(3u^2 v + \cos u\right) = 6uv - \sin u \qquad (c)$$
$$\frac{\partial^2 g}{\partial v^2} = \frac{\partial}{\partial v}\left(u^3\right) = 0 \qquad (d) \qquad (7-6)$$
$$\frac{\partial^2 g}{\partial u\,\partial v} = \frac{\partial}{\partial u}\left(u^3\right) = 3u^2 \qquad (e)$$
$$\frac{\partial^2 g}{\partial v\,\partial u} = \frac{\partial}{\partial v}\left(3u^2 v + \cos u\right) = 3u^2 \qquad (f)$$
Check the two cross partial derivatives. Sure enough, $\dfrac{\partial^2 g}{\partial u\,\partial v} = \dfrac{\partial^2 g}{\partial v\,\partial u}$.
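The same six derivatives can be checked symbolically; here is a short sketch (mine, assuming the sympy package):

```python
import sympy as sp

u, v = sp.symbols('u v')
g = u**3 * v + sp.sin(u)

# The six partial derivatives of the example, equation (7-6).
print(sp.diff(g, u))        # 3*u**2*v + cos(u)
print(sp.diff(g, v))        # u**3
print(sp.diff(g, u, u))     # 6*u*v - sin(u)
print(sp.diff(g, v, v))     # 0
# The two cross partials agree, as equation (7-5) promises:
print(sp.diff(g, u, v) == sp.diff(g, v, u))   # True; both are 3*u**2
```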
uv vu

Wave velocity.

Now we will work out the partial differential equation resulting from Newton's second law. Starting with equation (7-2), we get
$$\frac{\left.\frac{\partial y}{\partial x}\right|_{x+dx} - \left.\frac{\partial y}{\partial x}\right|_{x}}{dx} = \frac{\mu}{T}\,\frac{\partial^2 y}{\partial t^2} \qquad (7-7)$$
But we recognize the left-hand side as just the definition of the second partial derivative with respect to x, provided we let dx be arbitrarily small:
$$\lim_{dx \to 0}\frac{\left.\frac{\partial y}{\partial x}\right|_{x+dx} - \left.\frac{\partial y}{\partial x}\right|_{x}}{dx} = \frac{\partial^2 y}{\partial x^2} \qquad (7-8)$$
And so we get the differential equation for the string, also known as the wave equation:
$$\frac{\partial^2 y}{\partial x^2} = \frac{\mu}{T}\,\frac{\partial^2 y}{\partial t^2} \qquad (7-9)$$
It is easy to see that the dimensions of the two constants must be such that a characteristic velocity v can be defined as follows:
$$v = \sqrt{\frac{T}{\mu}} \qquad (7-10)$$
In terms of this characteristic velocity, the differential equation becomes
$$\frac{\partial^2 y}{\partial x^2} = \frac{1}{v^2}\,\frac{\partial^2 y}{\partial t^2} \qquad \text{The Wave Equation} \quad (7-11)$$
Now we must find the solutions to this partial differential equation, and find out what it is that moves at speed v.

C. Sinusoidal solutions.

Since we associate the sinusoidal shape with waves, let's just try the function
$$y(x,t) = A\sin(kx - \omega t) \qquad (7-12)$$

Figure 7-4. A sinusoidal wave, at t = 0 and at a later time such that $\omega\Delta t$ = 1.

At a fixed time (say, t = 0), this function (shown in figure 7-4) describes a sinusoidal wave, $\sin kx$, which repeats itself when its argument increases by $2\pi$. The distance over which it repeats itself is called the wavelength $\lambda$, which is thus related to k as follows:
$$k\lambda = 2\pi \;\Rightarrow\; k = \frac{2\pi}{\lambda} \qquad (7-13)$$

Similarly, if we observe the motion at x = 0 as a function of time, it repeats itself after a time interval T, as the argument increases by $2\pi$. This means
$$\omega T = 2\pi \;\Rightarrow\; \omega = \frac{2\pi}{T} \qquad (7-14)$$
Here $\omega$ is the angular frequency, in radians per second, related to the frequency f (in cycles per second, or Hz) by $\omega = 2\pi f$. This leads to the set of relations for sinusoidal waves,
$$k = \frac{2\pi}{\lambda}, \qquad \omega = \frac{2\pi}{T}, \qquad f = \frac{\omega}{2\pi} = \frac{1}{T} \qquad (7-15)$$

We will now see if this sinusoidal function is a solution to the wave equation, by substitution:
$$\frac{\partial^2}{\partial x^2}A\sin(kx - \omega t) = \frac{1}{v^2}\,\frac{\partial^2}{\partial t^2}A\sin(kx - \omega t)$$
$$-k^2 A\sin(kx - \omega t) = \frac{1}{v^2}\left(-\omega^2\right)A\sin(kx - \omega t) \qquad (7-16)$$
The factor of $\sin(kx - \omega t)$ cancels out on both sides, and it is clear that the sine wave is a solution to the wave equation only if k and $\omega$ are related in a certain way:
$$\frac{\omega}{k} = \frac{\lambda}{T} = v \qquad (7-17)$$
If we adopt this relation, we can write the sine wave in the following equivalent ways:
$$\sin(kx - \omega t) = \sin 2\pi\left(\frac{x}{\lambda} - \frac{t}{T}\right) = \sin k(x - vt) \qquad (7-18)$$

Example: Consider transverse waves on the lowest string of a guitar (the E string). Let us
suppose that the tension in the string is equal to 300 N (about 60 lbs), and that the 65-cm-long
string has a mass of 5 g (about the weight of a nickel). The fundamental resonance of this string
has a wavelength equal to twice the length of the string. Calculate the mass density of the string
and the speed of transverse waves in the string. Also calculate f, $\omega$, and k for this resonance.

Solution: The mass density is
$$\mu = \frac{m}{L} = \frac{0.005\ \text{kg}}{0.65\ \text{m}} = 0.00769\ \text{kg/m} \qquad (7-19)$$
The wave speed is then
$$v = \sqrt{\frac{T}{\mu}} = \sqrt{\frac{300\ \text{kg-m/s}^2}{0.00769\ \text{kg/m}}} = 197.5\ \text{m/s} \qquad (7-20)$$
For a wavelength of $\lambda$ = 1.3 m we get
$$k = \frac{2\pi}{\lambda} = \frac{2\pi}{1.3\ \text{m}} = 4.833\ \text{m}^{-1}$$
$$\omega = kv = \left(4.833\ \text{m}^{-1}\right)\left(197.5\ \text{m/s}\right) = 954\ \text{rad/sec} \qquad (7-21)$$
$$f = \frac{\omega}{2\pi} = 152\ \text{Hz}$$
Is this right for an E string? Using a frequency of 440 Hz for A above middle C, an A two octaves down would have a frequency of (440 Hz)/4 = 110 Hz, and the E above that note would be at f = (110 Hz)(1.5) = 165 Hz. [An interval of an octave corresponds to a factor of two in frequency; a fifth, as between A and E, corresponds to a factor of 1.5.]

So, the frequency of the E string is a bit low. What do you do to correct it? Check the equations
to see if tightening the string goes in the right direction.
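A quick numerical restatement of the example (my own sketch; the names are not from the text):

```python
import numpy as np

# Numbers from the E-string example: T = 300 N, m = 5 g, L = 0.65 m.
T, m, L = 300.0, 0.005, 0.65
mu = m / L                      # mass density, eq. (7-19)
v = np.sqrt(T / mu)             # wave speed, eq. (7-10)
lam = 2 * L                     # fundamental wavelength = twice the string length
f = v / lam                     # since v = lambda * f, from eq. (7-17)
print(mu, v, f)                 # 0.00769 kg/m, ~197.5 m/s, ~152 Hz
# Raising T raises v and therefore f, so tightening the string does go the right way.
```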


Example. Water waves are a type of transverse wave, propagating along the water surface. A
typical speed for waves traveling over the continental shelf (water depth of 40 m) is v = 20 m/s. If
the wavelength is $\lambda$ = 220 m, find the frequency and period with which a buoy will bob up and
down as the wave passes. Also calculate  and k.

Solution. With the velocity and wavelength known, the period of the oscillation and the
frequency can be calculated:
$$T = \frac{\lambda}{v} = \frac{220\ \text{m}}{20\ \text{m/s}} = 11\ \text{sec}, \qquad f = \frac{1}{T} = 0.091\ \text{Hz} \qquad (7-22)$$
If you want to see if this period is reasonable, go to the web site http://facs.scripps.edu/surf/nocal.html and look at "the swells" off the coast right now.

And $\omega$ and k:
$$\omega = 2\pi f = 0.571\ \text{rad/sec}, \qquad k = \frac{2\pi}{\lambda} = \frac{2\pi}{220\ \text{m}} = 0.0286\ \text{m}^{-1} \qquad (7-23)$$
As a cross check, see if $\omega/k$ gives back the wave speed; this is an important relation:
$$\frac{\omega}{k} = \frac{0.571\ \text{rad/sec}}{0.0286\ \text{m}^{-1}} = 20\ \text{m/s} \qquad (7-24)$$
It works out.

D. General traveling-wave solutions.


It is heartening to have guessed a solution to the wave equation so easily. But how many other
solutions are there, and what do they look like? Is a sine wave the only wave that travels along at
speed v, without changing its form?

In fact, it is easy to see that a huge variety of waves have this property. [However, later on we
will see that the wave keeps its shape intact only if the wave speed does not depend on the
wavelength.] Consider any function whose argument is $(x - vt)$,
$$y(x,t) = f(x - vt) \qquad (7-30)$$
We can take two partial derivatives with respect to x and two partial derivatives with respect to t, using the chain rule, and see what happens.
$$\frac{\partial}{\partial x}f(x - vt) = f'(x - vt)\,\frac{\partial}{\partial x}(x - vt) = f'(x - vt) \qquad (7-31)$$

Here $f'$ is the derivative of the function f with respect to its argument. Taking the second partial derivative with respect to x gives
$$\frac{\partial^2}{\partial x^2}f(x - vt) = \frac{\partial}{\partial x}f'(x - vt) = f''(x - vt)\,\frac{\partial}{\partial x}(x - vt) = f''(x - vt) \qquad (7-32)$$
Taking the time derivative is similar, except that in place of $\frac{\partial}{\partial x}(x - vt) = 1$, we have $\frac{\partial}{\partial t}(x - vt) = -v$. So,
$$\frac{\partial}{\partial t}f(x - vt) = f'(x - vt)\,\frac{\partial}{\partial t}(x - vt) = -v\,f'(x - vt) \qquad (7-33)$$
and
$$\frac{\partial^2}{\partial t^2}f(x - vt) = \frac{\partial}{\partial t}\left[-v\,f'(x - vt)\right] = -v\,f''(x - vt)\,\frac{\partial}{\partial t}(x - vt) = v^2 f''(x - vt) \qquad (7-34)$$

Figure 7-5. Gaussian wave form, with $\sigma$ = 1.
Substituting into the wave equation shows it to be satisfied:
$$\frac{\partial^2 y}{\partial x^2} = \frac{1}{v^2}\,\frac{\partial^2 y}{\partial t^2}$$
$$f''(x - vt) = \frac{1}{v^2}\left[v^2 f''(x - vt)\right] = f''(x - vt) \qquad (7-35)$$
As an illustration, let's consider the Gaussian function $g(u) = e^{-u^2/2\sigma^2}$ shown in figure 7-5. We can use this function to interpret the meaning of the parameter v in the wave equation. The peak of the Gaussian occurs when the variable u is equal to zero. Suppose the argument of the function is $x - vt$ instead of u:
$$g(x - vt) = e^{-(x - vt)^2/2\sigma^2} \qquad (7-36)$$
At t = 0, the peak of the function occurs when x = 0. But at a later time $t = \Delta t$, the peak as a function of position will occur at a value $\Delta x$ such that the argument $\Delta x - v\Delta t$ vanishes; this gives
$$\Delta x - v\Delta t = 0 \;\Rightarrow\; \frac{\Delta x}{\Delta t} = v \qquad \text{wave velocity} \quad (7-37)$$


Thus, the velocity with which the peak of the Gaussian appears to move is just the constant v which
occurs in the wave equation. From now on, we will refer to it as the wave velocity.
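A short sketch (my own code, not the text's) of equation (7-36), showing that the peak of the Gaussian tracks x = vt:

```python
import numpy as np

# The peak of a Gaussian pulse g(x - v*t) moves at the wave velocity v.
sigma, v = 1.0, 2.0
x = np.linspace(-10, 30, 4001)

def g(x, t):
    return np.exp(-(x - v * t)**2 / (2 * sigma**2))

for t in (0.0, 5.0, 10.0):
    peak = x[np.argmax(g(x, t))]
    print(t, peak)          # peak sits at x = v*t: 0, 10, 20
```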

E. Energy carried by waves on a string.


One of the paradoxical aspects of wave motion has to do with the question of what exactly about
the wave is moving. The string itself does not move in the direction of wave propagation. It seems
to be just the shape, or the profile made by the displaced string, which moves. But it can be seen
that there is energy in the moving string, and an argument can be made that energy is transported
by the wave, in the direction of the wave propagation.

Kinetic energy.

It is rather clear that there is kinetic energy due to the transverse motion associated with the displacement $y(x - vt)$. Referring to figure 7-6a, we can see that the kinetic energy of the string between x and x + dx is given by
$$d(KE) = \frac{1}{2}(dm)\left(\frac{\partial y(x,t)}{\partial t}\right)^2 = \frac{1}{2}(\mu\, dx)\left(\frac{\partial y(x,t)}{\partial t}\right)^2 \qquad (7-38)$$
Potential energy.

The potential energy due to the displacement of the string from the equilibrium position is equal
to the work done in stretching an undisplaced string out to the position of the string at a given
instant of time. Figure 7-6b shows how this calculation proceeds. We will write the displacement
of the string at this instant of time as y(x,t0), where the subscript on t0 reminds us that this time is
held fixed. We imagine exerting at every point on the string just the force necessary to hold it in
position. Then gradually the string is stretched from a straight line out to its shape y(x,t0) at time t0.
Let $\eta(x) = \alpha\, y(x,t_0)$ be the displacement of the string during this process. As the parameter $\alpha$ goes from 0 to 1, the string displacement goes from $\eta(x) = 0$ to $\eta(x) = y(x,t_0)$.

In deriving the wave equation we calculated the force on a length dx of the string, due to the tension T and the curvature of the string:
$$F_y = T\left(\left.\frac{\partial y}{\partial x}\right|_{x+dx} - \left.\frac{\partial y}{\partial x}\right|_{x}\right) = T\, dx\;\frac{\left.\frac{\partial y}{\partial x}\right|_{x+dx} - \left.\frac{\partial y}{\partial x}\right|_{x}}{dx} = T\, dx\,\frac{\partial^2 y}{\partial x^2} \qquad (7-39)$$


Figure 7-6. Graphs of a traveling wave. The upper curve (a) shows an element of mass on the string and its
transverse velocity. The lower curve (b) shows the virtual process of stretching a string from the equilibrium
position to its actual position at an instant of time, to facilitate calculating the work done in this virtual process.

When the displacement is equal to $\alpha\, y(x,t_0)$, the force will be equal to $\alpha\, T\, dx\,\dfrac{\partial^2 y}{\partial x^2}$. We now calculate the work done on length dx of the string (by the agent holding it in place, working against this force) as $\alpha$ goes from 0 to 1:
$$W = -\int_{\alpha=0}^{1} F_y(x,\alpha)\, d\eta = -\int_{0}^{1} F_y(x,\alpha)\, y(x,t_0)\, d\alpha = -\int_{0}^{1}\left[\alpha\, T\, dx\,\frac{\partial^2 y}{\partial x^2}\right] y\, d\alpha = -T\, dx\, y\,\frac{\partial^2 y}{\partial x^2}\int_0^1 \alpha\, d\alpha = -\frac{1}{2}\, T\, y\,\frac{\partial^2 y}{\partial x^2}\, dx \qquad (7-40)$$
Thus we have, for a general wave motion y(x,t),


$$\text{KE per unit length} = \frac{1}{2}\,\mu\left(\frac{\partial y}{\partial t}\right)^2$$
$$\text{PE per unit length} = -\frac{1}{2}\, T\, y\,\frac{\partial^2 y}{\partial x^2} \qquad (7-41)$$
Example. Calculate the kinetic, potential, and total energy per unit length for the sinusoidal traveling wave
$$y(x,t) = A\sin(kx - \omega t) \qquad (7-42)$$
Solution. We will use the relations (7-41). The KE per unit length is
$$\text{KE per unit length} = \frac{1}{2}\,\mu\left(\frac{\partial}{\partial t}A\sin(kx - \omega t)\right)^2 = \frac{1}{2}\,\mu\,\omega^2 A^2\cos^2(kx - \omega t); \qquad (7-43)$$
and the PE per unit length is
$$\text{PE per unit length} = -\frac{1}{2}\, T\, y\,\frac{\partial^2 y}{\partial x^2} = -\frac{1}{2}\, T\left[A\sin(kx - \omega t)\right]\left[-k^2 A\sin(kx - \omega t)\right] = \frac{1}{2}\, T\, k^2 A^2\sin^2(kx - \omega t) = \frac{1}{2}\,\mu\,\omega^2 A^2\sin^2(kx - \omega t). \qquad (7-44)$$
In the last step we have used the relations $\omega = kv$ (general for all sinusoidal waves) and $v = \sqrt{T/\mu}$ (waves on a string). Note the similarity of the expressions for the two types of energy. The only difference is the presence of the factor of $\cos^2(kx - \omega t)$ in one case, and $\sin^2(kx - \omega t)$ in the other. If we use the well-known property that either $\sin^2 u$ or $\cos^2 u$ averages to 1/2 when averaged over a large number of cycles, we see that the average energy per unit length is the same in the two modes of energy storage, and we have
$$\left\langle\text{KE per unit length}\right\rangle_{time} = \left\langle\text{PE per unit length}\right\rangle_{time} = \frac{1}{4}\, T\, k^2 A^2. \qquad (7-45)$$
Here the brackets $\langle\ \rangle_{time}$ represent an average over time. The equality of the average energy in these two types of energy is an example of a situation often found in physics where different "degrees of freedom" of a system share equally in the energy. Note that the average could have been carried out over position x along the string rather than over time, and the result would have been the same.
The total energy per unit length is constant, since $\sin^2(kx - \omega t) + \cos^2(kx - \omega t) = 1$ for any time or position along the string:
$$\text{Total energy per unit length} = \text{KE per unit length} + \text{PE per unit length} = \frac{1}{2}\, T\, k^2 A^2 = \frac{1}{2}\,\mu\,\omega^2 A^2 \qquad (7-46)$$

Example. Consider the E string on a guitar as discussed in a previous example, with T = 300 N, $\mu$ = 0.00769 kg/m, k = 4.833 m$^{-1}$, and $\omega$ = 954 rad/sec. Take the amplitude of the string's deflection to be A = 1 cm, and calculate the average energy density for kinetic and potential energy, and the total energy in the string.

Solution. Just plug into the previous relations.
$$\left\langle\text{KE per unit length}\right\rangle = \left\langle\text{PE per unit length}\right\rangle = \frac{1}{4}\,\mu\,\omega^2 A^2 = \frac{1}{4}\left(0.00769\ \text{kg/m}\right)\left(954\ \text{rad/sec}\right)^2\left(0.01\ \text{m}\right)^2 = 0.175\ \text{J/m} \qquad (7-47)$$
The total energy density is twice this, giving for the total energy
$$\text{total energy} = \left(\text{total energy density}\right)\times\left(\text{length of string}\right) = \left(0.350\ \text{J/m}\right)\left(0.65\ \text{m}\right) = 0.228\ \text{J} \qquad (7-48)$$
This isn't much. If the note dies away in one second, the maximum average audio power would be
a quarter of a watt. Aren't guitars louder than that? Oops - what's that great big black box on stage
beside the guitar?
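The same plug-in, as a sketch in code (my own, not the text's):

```python
import numpy as np

# E-string numbers from the example: check eqs. (7-47) and (7-48).
mu, omega, A, length = 0.00769, 954.0, 0.01, 0.65
avg_density = 0.25 * mu * omega**2 * A**2   # average KE (= average PE) per unit length
total = 2 * avg_density * length            # KE + PE, over the whole string
print(avg_density, total)                   # ~0.175 J/m, ~0.228 J
```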

F. The superposition principle.


There is a very important property of linear differential equations, like the wave equation, called
the superposition principle, which can be stated as follows for the wave equation:

The Superposition Principle. Suppose that $f_1(x,t)$ and $f_2(x,t)$ are two solutions to the wave equation as given above. Then any linear combination $g(x,t)$ of $f_1$ and $f_2$, of the form
$$g(x,t) = \alpha f_1(x,t) + \beta f_2(x,t), \qquad (7-49)$$
where $\alpha$ and $\beta$ are arbitrary constants, is also a solution to the wave equation.

This is pretty easy to prove. First we calculate the partial derivatives:
$$\frac{\partial^2 g}{\partial x^2} = \frac{\partial^2}{\partial x^2}\left[\alpha f_1(x,t) + \beta f_2(x,t)\right] = \alpha\,\frac{\partial^2 f_1}{\partial x^2} + \beta\,\frac{\partial^2 f_2}{\partial x^2} \qquad (7-50)$$
and
$$\frac{1}{v^2}\frac{\partial^2 g}{\partial t^2} = \frac{1}{v^2}\frac{\partial^2}{\partial t^2}\left[\alpha f_1(x,t) + \beta f_2(x,t)\right] = \alpha\,\frac{1}{v^2}\frac{\partial^2 f_1}{\partial t^2} + \beta\,\frac{1}{v^2}\frac{\partial^2 f_2}{\partial t^2} \qquad (7-51)$$
Now substitute into the wave equation, in the form
$$\frac{\partial^2 y}{\partial x^2} - \frac{1}{v^2}\frac{\partial^2 y}{\partial t^2} = 0 \qquad (7-52)$$
For $g(x,t)$ this becomes
$$\frac{\partial^2 g}{\partial x^2} - \frac{1}{v^2}\frac{\partial^2 g}{\partial t^2} \stackrel{?}{=} 0$$
$$\alpha\,\frac{\partial^2 f_1}{\partial x^2} + \beta\,\frac{\partial^2 f_2}{\partial x^2} - \alpha\,\frac{1}{v^2}\frac{\partial^2 f_1}{\partial t^2} - \beta\,\frac{1}{v^2}\frac{\partial^2 f_2}{\partial t^2} \stackrel{?}{=} 0 \qquad (7-53)$$
Regrouping, we have
$$\alpha\left(\frac{\partial^2 f_1}{\partial x^2} - \frac{1}{v^2}\frac{\partial^2 f_1}{\partial t^2}\right) + \beta\left(\frac{\partial^2 f_2}{\partial x^2} - \frac{1}{v^2}\frac{\partial^2 f_2}{\partial t^2}\right) = 0\,?\quad \text{YES!} \qquad (7-54)$$
Since $f_1$ and $f_2$ each satisfies the wave equation, the two expressions in parentheses vanish separately, and the equality is satisfied.
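The regrouping argument can also be checked symbolically; here is a sketch (mine, assuming sympy) with one arbitrary pair of solutions:

```python
import sympy as sp

# Symbolic check of the superposition proof, eqs. (7-50)-(7-54).
x, t, v, a, b = sp.symbols('x t v alpha beta', positive=True)
f1 = sp.sin(x - v*t)              # any two solutions of the wave equation will do
f2 = sp.exp(-(x + v*t)**2)
g = a*f1 + b*f2
residual = sp.diff(g, x, 2) - sp.diff(g, t, 2)/v**2
print(sp.simplify(residual))      # 0
```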

This principle gives us a great deal of freedom in constructing solutions to the wave equation. A
very common approach to wave problems involves representing a complicated wave solution as the
superposition of simpler waves, as illustrated in the next section.

G. Dispersion; Group and phase velocity.


The wave equation with constant coefficients as given by equation (7-11) has traveling-wave
solutions which all travel at the same speed. This is true for sinusoidal solutions (equation (7-12))
or for arbitrarily shaped pulses (equation (7-30)). A wave medium where this is the case is referred
to as a "non-dispersive" medium, for the following reason: while sinusoidal waves propagate
without changing their shape, wave pulses do not; in general, as a wave pulse propagates, it
changes its shape, in most cases getting broader and broader as it travels. This broadening is
referred to as dispersion.

In a dispersive medium, there are two important velocities to be considered. The "phase
velocity," which we have represented with the symbol v, is the velocity of a sinusoidal wave of a
certain frequency or wavelength. The other velocity is the "group velocity," which we will denote
by u. The group velocity has the following definition, which will really only be clear after we have
discussed Fourier series and transforms. A pure sinusoidal solution to the wave equation represents
a disturbance which is completely delocalized, in the sense that it has the same amplitude for all
positions on the string and for all times. If one wanted to represent a short light signal from a laser,
for instance, as used in communications, it would make sense to use a modified wave which has a


beginning and an end. In order to do this, a linear superposition of sinusoidal waves of different
wavelengths can be used. However, this means superposing waves traveling at different velocities!
If they are initially lined up so as to add up to give a narrow pulse, or "wave packet," it makes sense
that after some time elapses they will no longer be properly aligned. The result of such a process of
superposition, however, is that the wave packet moves at a velocity which is entirely different from
the velocity of the component sinusoidal waves!
$$u\ (\text{the group velocity}) \neq v\ (\text{the phase velocity}) \qquad (7-55)$$
This remarkable fact has to be seen to be believed. There is a rather simple example of superposition of waves which illustrates it. Let's superimpose two sinusoidal waves of equal amplitude A, and slightly different wave numbers $k_1 = k_0 + \Delta k$ and $k_2 = k_0 - \Delta k$; that is, the two wave numbers are separated by an amount $2\Delta k$, and centered about the value $k_0$. We will assume that we know the "dispersion relation:"
$$\omega = \omega(k) \qquad \text{the dispersion relation} \quad (7-56)$$
For non-dispersive media where v is a constant, this is a very simple linear relation, $\omega = kv$, obtained from equation (7-17). However, if v is a function of k, the relation becomes more complicated. In any event, this relation allows the angular frequencies $\omega_1 = \omega_0 + \Delta\omega$ and $\omega_2 = \omega_0 - \Delta\omega$ to be determined from $k_1$ and $k_2$. We now carry out the superposition.
$$y(x,t) = A\sin(k_1 x - \omega_1 t) + A\sin(k_2 x - \omega_2 t) = A\sin\left[(k_0 x - \omega_0 t) + (\Delta k\,x - \Delta\omega\,t)\right] + A\sin\left[(k_0 x - \omega_0 t) - (\Delta k\,x - \Delta\omega\,t)\right] \qquad (7-57)$$
Now we use the trigonometric identities
$$\sin(a + b) = \sin a\cos b + \cos a\sin b, \qquad \sin(-b) = -\sin b, \qquad \cos(-b) = \cos b \qquad (7-58)$$
to obtain the result
$$y(x,t) = A\sin(k_0 x - \omega_0 t)\cos(\Delta k\,x - \Delta\omega\,t) + A\cos(k_0 x - \omega_0 t)\sin(\Delta k\,x - \Delta\omega\,t)$$
$$\qquad + A\sin(k_0 x - \omega_0 t)\cos(\Delta k\,x - \Delta\omega\,t) - A\cos(k_0 x - \omega_0 t)\sin(\Delta k\,x - \Delta\omega\,t)$$
$$= 2A\sin(k_0 x - \omega_0 t)\cos(\Delta k\,x - \Delta\omega\,t) \qquad (7-59)$$
Here is the interpretation of this result. The first sinusoidal factor, $\sin(k_0 x - \omega_0 t)$, represents a traveling sinusoidal wave, with velocity
$$v = \frac{\omega_0}{k_0} \qquad (7-60)$$
The second factor is a slowly varying function of x and t which modulates the first sine wave,
turning it into a sort of wave packet, or rather a series of wave packets. This is illustrated
graphically in figure 7-7.
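The algebra of equation (7-59) is easy to verify numerically; here is a sketch (my own code and parameter choices):

```python
import numpy as np

# Check of equation (7-59): two close sinusoids add up to carrier times envelope.
k0, dk, w0, dw = 1.0, 0.05, 1.0, 0.02
x = np.linspace(0, 200, 2001)
t = 3.0
direct = np.sin((k0+dk)*x - (w0+dw)*t) + np.sin((k0-dk)*x - (w0-dw)*t)
packet = 2*np.sin(k0*x - w0*t)*np.cos(dk*x - dw*t)
print(np.allclose(direct, packet))   # True
```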


Figure 7-7. Motion of a wave packet with the dispersion relation for deep-water gravity waves, $\omega = \sqrt{gk}$. The two plots show the waveform at t = 0 and at a later time, one period of the carrier later. It can be seen that the carrier travels faster than the envelope, as predicted for this dispersion relation.

Here we have plotted the solution given in equation (7-59), using the dispersion relation for deep-water gravity waves,
$$\omega = \sqrt{gk}, \qquad (7-61)$$
where g is the acceleration of gravity, for two different instants of time. Note that the displacement of the envelope is different from the displacement of the carrier at the central frequency. In this picture, the lobes produced by the envelope are thought of as the wave packets, and the rate of displacement of the envelope is interpreted as the group velocity. From the form of the envelope function,
$$y_{envelope} \propto \cos(\Delta k\,x - \Delta\omega\,t), \qquad (7-62)$$
we can see how fast it propagates. For a sinusoid with argument $kx - \omega t$, the propagation speed is equal to $\omega/k$. So, for the envelope, we get
$$v_{envelope} = \frac{\Delta\omega}{\Delta k} = u. \qquad (7-63)$$
We will interpret this ratio as the derivative of $\omega$ with respect to k. So, our final result for phase and group velocities is
$$\omega = \omega(k) \qquad \text{dispersion relation}$$
$$v = \frac{\omega}{k} \qquad \text{phase velocity} \qquad (7-64)$$
$$u = \frac{d\omega}{dk} \qquad \text{group velocity}$$
Example. For deep-water gravity waves such as storm waves propagating across the Pacific Ocean, the velocity of sinusoidal waves is given by
$$v = \sqrt{\frac{g}{k}}. \qquad (7-65)$$
Find the dispersion relation $\omega(k)$, and calculate the group velocity u. For waves of wavelength 200 m, calculate the phase and group velocities.

Solution. From the definitions of $\omega$ and k we know that
$$v = \frac{\lambda}{T} = \frac{\omega}{k}.$$
Solving for $\omega$ and using equation (7-65) gives
$$\omega = kv = k\sqrt{\frac{g}{k}} = \sqrt{kg} \qquad \text{the dispersion relation}$$
The group velocity is obtained by taking a derivative:
$$u = \frac{d\omega}{dk} = \frac{d}{dk}\sqrt{kg} = \frac{1}{2}\sqrt{\frac{g}{k}}.$$
Note that the group velocity is exactly equal to half of the phase velocity.

For a wavelength of $\lambda$ = 200 m, the wave number is
$$k = \frac{2\pi}{\lambda} = \frac{2\pi}{200\ \text{m}} = 0.0314\ \text{m}^{-1}.$$
The phase and group velocities are then
$$v = \sqrt{\frac{g}{k}} = \sqrt{\frac{9.8\ \text{m/s}^2}{0.0314\ \text{m}^{-1}}} = 17.7\ \text{m/s}\ (\text{about 40 mph}), \qquad u = \frac{1}{2}\,v = 8.83\ \text{m/s}.$$
[Just in case you ever need it, the general expression for the velocity of gravity waves in water of depth h is
$$v = \sqrt{\frac{g}{k}\tanh(kh)} \;\longrightarrow\; \begin{cases} \sqrt{gh} & \text{shallow water} \\ \sqrt{g/k} & \text{deep water} \end{cases}$$
reducing correctly to the limiting forms for deep and shallow water given here and in section C above.]
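A compact numerical check of this example (my own sketch):

```python
import numpy as np

# Deep-water gravity waves: omega = sqrt(g*k), eq. (7-61).
g = 9.8
lam = 200.0
k = 2 * np.pi / lam
v = np.sqrt(g / k)          # phase velocity, eq. (7-65)
u = 0.5 * v                 # group velocity: d(sqrt(g*k))/dk = v/2
print(k, v, u)              # ~0.0314 1/m, ~17.7 m/s, ~8.8 m/s
```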


Problems
Problem 7-1. The derivative as a limit. To illustrate the derivative as the limit of a rate of change, consider the function $f(t) = \frac{1}{2}at^2$.
(a) Use the standard rules for taking derivatives (learned in your calculus course) to calculate $\dfrac{df}{dt}$.
(b) Now use the definition
$$\frac{df}{dt} = \lim_{\Delta t \to 0}\frac{f(t + \Delta t) - f(t)}{\Delta t}$$
to calculate $\dfrac{df}{dt}$. You will need to make an approximation which is appropriate when $\Delta t$ is small.
Problem 7-2. Calculation of partial derivatives. In the functions below, x, y, z, and t are to be considered as variables, and other letters represent constants.
(a) $\dfrac{\partial}{\partial x}\left(2axy^2\right)$
(b) $\dfrac{\partial}{\partial y}\left(2axy^2\right)$
(c) $\dfrac{\partial^2}{\partial x^2}\left(2axy^2\right)$
(d) $\dfrac{\partial^2}{\partial y^2}\left(2axy^2\right)$
(e) $\dfrac{\partial^2}{\partial x\,\partial y}\left(2axy^2\right)$
(f) $\dfrac{\partial}{\partial x}\,A\sin(kx - \omega t)$
(g) $\dfrac{\partial}{\partial t}\,A\sin(kx - \omega t)$
(h) $\dfrac{\partial^2}{\partial x^2}\,A\sin(kx - \omega t)$
(i) $\dfrac{\partial}{\partial z}\,B\exp\left(-\dfrac{(z - ct)^2}{2d^2}\right)$

Problem 7-3. Solution to the wave equation. Show by direct substitution that the following functions satisfy the wave equation,
$$\frac{\partial^2}{\partial x^2}f(x,t) = \frac{1}{v^2}\frac{\partial^2}{\partial t^2}f(x,t).$$
You should assume that the relation $v = \dfrac{\omega}{k}$ holds.
(a) $f(x,t) = A\cos k(x - vt)$
(b) $f(x,t) = C\sin kx\,\cos\omega t$
(b) f ( x, t )  C sin kx cost

Problem 7-4. Derivation of the wave equation for sound. Waves traveling along a tube of gas can be pictured as shown in the diagram. The small rectangle shows an element of gas, which is considered to be at position x in the absence of any wave motion. As a wave passes, the element of gas is displaced by an amount y(x,t), which depends on time and on the position x. The motion of the element of gas is determined by the pressure on either side, as shown in the lower part of the diagram.

Figure p7.4. Forces on an element of gas in an "organ pipe," leading to the wave motion in the gas.
(a) There will be no net force on the element of gas as long as the pressure is the same on both sides. However, if there is a variation of pressure with position, there will be a net force $\Delta F$ on the element of gas, of width $\Delta x$, where the force is positive on the left-hand side of the gas element and negative on the right-hand side, and follows the general relation $F = PA$. The force will thus be
$$\Delta F = \text{force on left-hand side} - \text{force on right-hand side}.$$
Show that this force is given by
$$\Delta F = -A\,\frac{\partial P}{\partial x}\,\Delta x.$$


(b) We will assume that the pressure in the gas varies by only a small amount from $p_0$, the ambient background pressure (''atmospheric pressure''):
$$P = p_0 + p(x,t).$$
The variation of p(x,t) with position is in turn related to the displacement of the gas, through the change in the volume V of the element of gas. A uniform displacement of the gas, such that every part of the gas moves by the same amount, does not change the volume of a gas element. However, if y is a function of x, there is a change in volume. We can use the adiabatic gas law, $PV^\gamma = \text{constant}$, to relate the small pressure variation p to the corresponding change in volume of the element of gas:
$$dP = -\gamma\,\frac{p_0}{V_0}\,\Delta V = p(x,t).$$
Here $p_0$ and $V_0$ are the pressure and volume of the element of gas in the absence of a wave, and $\gamma$ is a constant related to the number of degrees of freedom of the gas molecules ($\gamma$ = 7/5 for air). In this case, the change in the volume of the element of gas is $\Delta V = A\left[y(x + \Delta x) - y(x)\right]$. Show that this relation and the relation from the adiabatic gas law lead to the partial-derivative expression
$$p = -\gamma\,p_0\,\frac{\partial y}{\partial x}.$$
(c) Now combine the results for parts (a) and (b) and use Newton's 2nd law to find the wave equation for the displacement y(x,t). Show that the wave velocity v can be given in terms of the ambient pressure $p_0$ and the gas density $\rho = m/V_0$ by
$$v = \sqrt{\frac{\gamma\,p_0}{\rho}}.$$

Problem 7-5. Wave speed in telephone wires. Suppose someone wants to send telegraph signals
across the country using the transmission of transverse wave pulses along the wires, rather than
electrical signals. Make a reasonable estimate of the maximum tension in the wires and of their
mass density, and calculate the wave velocity. Assume that the waves run along the wires between
telephone poles, and that they pass right by each telephone pole without reflection. How long
would it take for such a signal to propagate from New York to San Francisco (about 5000 km)?

Problem 7-6. Energy density in a string. The expressions derived in the text for kinetic and potential energy density are
$$\text{KE per unit length} = \frac{1}{2}\,\mu\left(\frac{\partial y}{\partial t}\right)^2$$
$$\text{PE per unit length} = -\frac{1}{2}\,T\,y\,\frac{\partial^2 y}{\partial x^2} \qquad (7-41)$$
It is a bit surprising that derivatives with respect to time and position enter in such a different way, considering how symmetrical their role is in the wave equation. You can fix this up.
The total potential energy over the interval A < x < B is obtained by integrating the energy density given above:
$$\text{PE between A and B} = \int_{x_A}^{x_B}\left(-\frac{1}{2}\,T\,y\,\frac{\partial^2 y}{\partial x^2}\right)dx$$


Do an integration by parts with respect to x, and show that an equally good expression for the potential energy density is
$$\text{PE per unit length} = \frac{1}{2}\,T\left(\frac{\partial y}{\partial x}\right)^2$$
You must explain why this is reasonable. Be careful about the evaluation at the endpoints that is involved in integrating by parts.

Problem 7-7. Phase and group velocity for the deBroglie wave. In 1923 Louis deBroglie advanced the idea that a free particle of mass m could be described by a wave function, of the form
$$\psi_{deB}(x,t) = A\,e^{i(px - Et)/\hbar} = A\,e^{i(kx - \omega t)}$$
where $\hbar$ is Planck's constant, and we will take the non-relativistic forms for the momentum and energy of the particle,
$$p = m v_{part}, \qquad E = \frac{1}{2}m v_{part}^2,$$
where $v_{part}$ is the particle's velocity.
(a) Find expressions for k and $\omega$ in terms of the particle velocity $v_{part}$.
(b) Show that the dispersion relation for this wave is
$$\omega = \omega(k) = \frac{\hbar}{2m}\,k^2.$$
(c) Calculate the phase velocity and the group velocity for these waves and show how each one relates to the particle velocity $v_{part}$.
(d) It is generally argued that the group velocity is the velocity with which the waves carry energy. Do your answers to (c) support this argument?

Problem 7-8. Phase and group velocity for the deBroglie wave of a relativistic particle. In 1923 Louis deBroglie advanced the idea that a free particle of mass m could be described by a wave function, of the form
$$\psi_{deB}(x,t) = A\,e^{-i\,p_\mu x^\mu/\hbar} = A\,e^{-i(Et - px)/\hbar} = A\,e^{-i(\omega t - kx)}$$
where $\hbar$ is Planck's constant. Note that the phase of the particle, $p_\mu x^\mu = Et - px$, is a relativistic invariant. We will take the relativistic forms for the momentum and energy of the particle,
$$E = \gamma\,mc^2, \qquad p = \eta\,mc,$$
where $\gamma = \dfrac{1}{\sqrt{1-\beta^2}}$, $\eta = \gamma\beta$, and $\beta = \dfrac{v}{c}$, with v the particle velocity.
(a) Find expressions for k and $\omega$ in terms of $\gamma$, $\eta$ and $\beta$.
(b) Show that the dispersion relation for this wave is
$$\omega = \omega(k) = \sqrt{k^2 c^2 + \left(\frac{mc^2}{\hbar}\right)^2}.$$
(c) Calculate the phase velocity $\dfrac{\omega}{k}$ and the group velocity $\dfrac{d\omega}{dk}$. (It will be easiest to work with the results of part (a).) The result, in terms of $\beta$, is very simple and surprising.
(d) Particles are not supposed to go faster than the speed of light. What do your results of part (c) say about this?


Chapter 8. Standing Waves on a String


The superposition principle for solutions of the wave equation guarantees that a sum of waves,
each satisfying the wave equation, also represents a valid solution. In the next section we start with
a superposition of waves going in both directions and adjust the superposition to satisfy certain
requirements of the wave's surroundings.

A. Boundary conditions and initial conditions.


The wave equation results from requiring that a small segment of the string obey Newton's
second law. This is not sufficient to completely specify the behavior of a given string. In general,
we will find it necessary to specify initial conditions, given at a particular time, and boundary
conditions, given at particular places on the string. These conditions determine which of the
manifold of possible motions of the string actually takes place.

The wave equation is a partial differential equation, and is second order in derivatives with
respect to time, and second order in derivatives with respect to position. In general, a second-order
differential equation requires two side conditions to completely determine the solution. For
instance, the motion of a body moving in the vertical direction in the Earth's gravitational field is
not determined until two conditions, such as the initial position and initial velocity, are specified. It
is possible to vary the conditions - for instance, the velocity specified at two different times can
replace position and velocity specified at t = 0.

In the case of the wave equation, to determine the time dependence two conditions must be given,
at a specified time and at all positions on the string. For instance, for a plucked guitar string, the
initial conditions could be that initially the string has zero velocity at all points, and is displaced in
a triangle waveform, with the maximum displacement at the point where it is plucked. Two
additional conditions, the boundary conditions, are required to determine the spatial dependence of
the solution. Each condition specifies something about the displacement of the string, at one
particular point and for all time. For instance, for the guitar string, the displacement of the two
endpoints of the string is required to be zero for all time.

String fixed at a boundary.

A very important type of boundary condition for waves on a string is imposed by fixing one point
on the string. This is usually a point of support for the string, where the tension is applied.
Imagine a traveling-wave pulse like that shown in figure 8-1, traveling from left to right and
approaching a point of attachment of the string, where it cannot move up and down as the wave
passes. The shape of this pulse obviously has to change as it passes over this fixed point. But the
wave equation says that this pulse will propagate forever, without changing its direction or shape.
How can we get out of this impasse?

The answer is shown in figure 8-1. Here the string occupies the x < 0 part of space, and is
attached to a wall at x = 0, imposing the boundary condition
$$y(x = 0, t) = 0 \qquad (8.1)$$


The pulse traveling to the right can be represented by the function
$$y_{right}(x,t) = f(x - vt), \qquad (8.2)$$
where the functional form f(u) determines its shape, and the argument $x - vt$ turns it into a wave
traveling to the right. It is clear that for times when the pulse overlaps the fixed point of the string,
yright alone is not the correct solution to the wave equation, since it does not vanish at the point
where the string is attached.

It is the superposition principle that saves us. The solution is illustrated graphically in figure 8-1.

Figure 8-1. A wave pulse traveling from left to right has just started to impinge on a fixed point of the string. The condition that y = 0 at the fixed point is satisfied by the linear superposition of an inverted pulse traveling in the opposite direction.

In the figure, a second pulse is shown, identical in shape to the first but (a) inverted, and (b) traveling in the opposite direction, from right to left. The figure shows the first pulse disappearing behind the
wall, a region which we call "ghost space," and the second emerging from behind the wall, coming
out of ghost space. The part of space with x > 0 is not really part of the problem - there is no string
there. But for visualizing this problem it is helpful to imagine an invisible continuation of the
string with positive x. We can then picture the erect and inverted pulses both propagating on an
infinite string, of which we only see the part with x < 0. If the inverted pulse arrives at just the right
time, its negative displacement cancels the positive displacement of the erect pulse, and the
resulting zero displacement satisfies the boundary condition for the string.


How do we write this solution mathematically? If y_right = f(x − vt) represents the shape f(u),
right side up and traveling to the right with velocity v, then y_left = −f(−x − vt) represents the same
shape f(u), inverted and traveling with velocity −v, to the left. So, a possible solution to the wave
equation which satisfies the boundary condition at the fixed end is

    y(x, t) = y_right + y_left = f(x − vt) − f(−x − vt).    (8.3)
Here is how you convince yourself that this is the solution we want. Suppose that the pulse shape
f(u) has its peak at u = 0, and vanishes except when u is fairly close to zero. Now consider the
solution y(x,t) given above, for large negative times. Each term is zero except when its particular
argument is near zero. So, the first pulse will be centered at a large negative x, in the "real world"
part of the string, and the second pulse will be centered at large positive x, out of sight in the "ghost
world." Thus, the initial conditions for this solution are that, at some large negative time, there is a
pulse of shape f(u), with transverse velocity such that it travels to the right. Next, consider the
solution for large positive times. Now y_right peaks at positive x, out of sight in the "ghost world,"
and y_left peaks at negative x, where we can see it. Finally, check to see that the boundary condition
is satisfied:

    y(x = 0, t) = y_right(x = 0, t) + y_left(x = 0, t)
                = f(−vt) − f(−vt)                          (8.4)
                = 0
The form of the solution is such that at the point x = 0 the displacement vanishes, for all times.
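A quick numerical check of this construction is easy. The Python sketch below is an added
illustration, not part of the original notes; the Gaussian pulse shape and all the numerical values
are arbitrary assumptions (any pulse shape f(u) works equally well).

    import numpy as np

    def f(u):
        # an arbitrary pulse shape, peaked at u = 0; any shape will do
        return np.exp(-u**2)

    v = 2.0                            # assumed wave speed, arbitrary units
    t = np.linspace(-5.0, 5.0, 201)    # a range of times

    def y(x, t):
        # erect pulse moving right plus inverted "ghost" pulse moving left, eq. (8.3)
        return f(x - v*t) - f(-x - v*t)

    # the displacement at the wall, x = 0, should vanish at every time
    print(np.max(np.abs(y(0.0, t))))   # prints 0.0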

Boundary between two different strings.

We can phrase a more general boundary condition for a string. Suppose that at x = 0 the string
attaches, not to a wall, but to another string with a different mass density. There will in general be
both a reflected pulse and a pulse transmitted across the boundary. The situation is shown in figure
8-2, with incident and reflected waves in region 1 and a transmitted wave in region 2.

Figure 8-2. A boundary between two parts of a string with different wave propagation velocities.
Shown are a wave incident from the left, a transmitted wave propagating to the right, and a
reflected wave propagating to the left.

The mass density in region 1 is μ1, corresponding to wave speed v1 = √(T/μ1), and in region 2 a mass
density of μ2 gives v2 = √(T/μ2). The waves in region 1 must travel at the same speed, in order that the
emerging wave comes out at the same rate that the disappearing wave comes in. Thus in region 1
we will use the function

    y1(x, t) = A f(x − v1 t) + B f(−x − v1 t),    x < 0 (region 1)    (8.5)

In region 2, the wave must have a different shape; if it travels slower, it must appear shorter when
plotted as a function of position, so that it emerges into region 2 during the precise time interval
when the incoming wave of region 1 disappears. This means that we have to use the function

    y2(x, t) = C g(x − v2 t) = C f((v1/v2)(x − v2 t)),    x > 0 (region 2)    (8.6)
This is an awkward-looking function; however, notice that
(a) the functions used for y1 are solutions of the wave equation traveling with speed v1 and
the function used for y2 is a solution of the wave equation traveling with speed v2.
(b) If we set x to zero, all of the functions used have a common factor of f(-v1t).
We will now use these properties to match boundary conditions at x = 0.
The boundary conditions at a boundary between two regions of the string with different
propagation speeds are:
Boundary Conditions for Waves on a String
y1 ( x  0, t )  y2 ( x  0, t ) continuity of the displacement (8.7)
y1 y
 2 continuity of the slope
x x 0 x x 0
The reason for having the displacement the same on each side of the boundary is obvious - the two
strings are attached to each other. The reason for the slopes being equal can be understood by
considering the forces acting at the point where the two strings join (x = 0). The situation is similar
to that shown in figure 7-3; the magnitude of the force exerted from the right-hand side and from
the left-hand side is the same, but if the slopes are not the same, there will be a net transverse force.
For a finite difference in slope, there would be a finite force acting on an infinitesimal element of
string, giving it an infinite transverse acceleration. This is not possible; if there were a finite


transverse force, the string would just move quickly up or down until the slopes equalized. Thus
the slope may be assumed to be continuous across the boundary.
Applying the boundary conditions gives
y1 ( x  0, t )  y2 ( x  0, t )
Af (v1t )  Bf (v1t )  Cf (v1t ) (8.8)
A B  C
and
y1 y 2

x x 0
x x 0

v1
Af (v1t )  Bf  v1t   Cf  v1t  (8.9)
v2
v1
A B  C
v2
We can add equation 8-8 to equation 8-9, eliminating B, and solve for the "transmission
coefficient" T ≡ C/A. Similarly one can eliminate C and find the "reflection coefficient" R ≡ B/A.
The results are:

    T = C/A = 2 v2 / (v2 + v1)
                                            (8.10)
    R = B/A = (v2 − v1) / (v2 + v1)

Example. Consider the limiting case where the second string, in region 2 with x > 0, has a much
higher mass density than the string in region 1. Show that this leads to the result that we found
above for a string fixed at one end.

Solution. In the case where μ2 ≫ μ1, the relation v = √(T/μ) tells us that v2 ≪ v1. In this limit, the
transmission and reflection coefficients become

    T = 2 v2 / (v1 + v2) → 0
                                            (8.11)
    R = (v2 − v1) / (v1 + v2) → −1
That is, there is no transmitted wave, and the reflected wave is inverted and of the same magnitude
as the incident wave. This is the result that we found above for simple pulse reflection from a fixed
end.
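Equation (8.10) is easy to explore numerically. The Python sketch below is an added
illustration; the tension and mass-density values are arbitrary assumptions.

    import numpy as np

    def coefficients(mu1, mu2, tension=1.0):
        # transmission and reflection coefficients of eq. (8.10)
        v1 = np.sqrt(tension / mu1)
        v2 = np.sqrt(tension / mu2)
        return 2*v2/(v1 + v2), (v2 - v1)/(v1 + v2)   # T, R

    print(coefficients(1.0, 1.0))    # identical strings: T = 1, R = 0
    print(coefficients(1.0, 1e6))    # much heavier string 2: T -> 0, R -> -1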


B. Standing waves on a string.

Now we are ready to take on one of the famous problems in physics - the vibrations of a string
held fixed at both ends. The most familiar application of this theory is to the stringed musical
instruments, but the mathematical patterns displayed here are applied very widely in physics.
The problem is stated pictorially in figure 8-3.

Figure 8-3. A string stretched between two fixed points, at x = 0 and x = L.

The string will be described by a wave function y(x, t) which must satisfy the wave equation for
all x in the interval (0, L), and which must obey the boundary conditions

    y(x = 0, t) = 0
                        (8.12)
    y(x = L, t) = 0
There are many such functions. The irregularly wiggling profile shown in figure 8-3 is a possible
"snapshot" of the position of the string at one instant of time - if you held the string in that position
and let it go, it would certainly do something! We are going to look however for certain special
solutions, called standing waves, of the form
y  x, t   X  x  cos t . (8.13)
This represents a wave which always has the same shape, determined by the function X(x) of
position only, with its time variation restricted to a sinusoidal modulation of this shape. Solutions
like this where everything varies together sinusoidally with time are sometimes referred to as
"normal modes." They are a sort of continuous version of the normal modes for vibrating and
oscillating systems which we discussed in Chapter 4.

Now we substitute this form into the wave equation:

    ∂^2[X(x) cos ωt]/∂x^2 = (1/v^2) ∂^2[X(x) cos ωt]/∂t^2

    cos ωt · d^2X/dx^2 = −(ω^2/v^2) X cos ωt    (8.14)

    d^2X/dx^2 = −(ω^2/v^2) X ≡ −k^2 X
The partial differential equation has turned into an ordinary differential equation, thanks to writing
y(x,t) as a product of a function of x and a function of t. [You may recognize this from a course on
differential equations as the method of separation of variables. It makes it easy to find a variety of
solutions to the wave equation. It is a trickier matter to convince yourself that you find all the
solutions this way.]

8-6
Vector Spaces in Physics 8/6/2015

Equation 8-14 is fairly easy to solve:


X  x   A sin kx  B cos kx . (8.15)
giving
y  x, t   X  x  cos t   A sin kx  B cos kx  cos t . (8.16)
How can we make this solution satisfy the boundary conditions, equation 8-12, for all values of t?
The only way to make the function vanish at x = 0 is to take B  0 . The solution is now
y  x, t   A sin kx cos t . (8.17)
To make this vanish at x = L requires the sine to vanish there:
sin kL  0
n . (8.18)
 k  kn  , n = 1, 2, 3, . . . .
L
Note that choosing fixed values of k also chooses corresponding values of ω, via the relation
ω = kv. This is where the resonant frequencies of the string are determined!
Thus, the solutions of the type A sin kx cos ωt satisfying the boundary conditions are

    y_n(x, t) = sin(k_n x) cos(ω_n t) ≡ φ_n
    k_n = nπ/L
    λ_n = 2L/n                                 n = 1, 2, 3, . . .    (8.19)
    ω_n = nπv/L
    f_n = nv/(2L)

The functions φ_n are the normal modes for the vibrating string, representing the special shapes of
the string which oscillate in time without changing their form. They are described in figure 8-4,
where the envelope function sin(k_n x) is plotted for increasing values of n.


Figure 8-4. The normal modes of a vibrating string.
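To attach numbers to equation (8.19), here is a short Python sketch, added as an illustration;
the guitar-like length, tension, and mass density are invented values, not taken from the text.

    import numpy as np

    L = 0.65            # string length in meters (assumed)
    tension = 70.0      # tension in newtons (assumed)
    mu = 4.0e-4         # mass per unit length in kg/m (assumed)
    v = np.sqrt(tension / mu)

    for n in range(1, 6):
        k_n = n * np.pi / L          # allowed wave numbers, eq. (8.18)
        f_n = n * v / (2 * L)        # resonant frequencies, eq. (8.19)
        print(n, round(k_n, 2), round(f_n, 1))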



Problems
Problem 8-1. Reflection and transmission coefficients for sinusoidal waves. Consider the
problem discussed in the text of a sinusoidal wave propagating down a string from left to right, as
shown in figure 8.2. The string changes its mass density at x = 0, from a value μ1 to a value μ2, with
a corresponding change in propagation velocity, from v1 to v2. There is a reflected wave which
travels back along the left-hand string, in the opposite direction but with the same velocity, and a
transmitted wave, which continues to the right with velocity v2. Take the waves on the string to be
sinusoidal traveling waves, given by

    y1(x, t) = A cos(k1 x − ωt) + B cos(k1 x + ωt),    x < 0
    y2(x, t) = C cos(k2 x − ωt),    x > 0

Note that the wave number k = ω/v is different in the two parts of the string, but that the
frequency ω is the same.
Use the conditions of continuity of the displacement and its derivative (eqn (8.7)) to derive
the expressions for the reflection and transmission coefficients T=C/A and R=B/A in terms of v1
and v2, as given in equation (8.10). [The relations in equation (8-10) were derived for the more


general case of an arbitrary traveling wave. You are checking them for the special case of a
sinusoidal wave.]

Problem 8-2. In section A above we considered the case of a wave incident on the boundary
between strings of different mass densities, in the limit where the second string was much heavier
than the first (μ2 ≫ μ1). Carry out the same sort of analysis of the limiting case where the second
string is much lighter (μ2 ≪ μ1). Describe what the reflection of a pulse traveling down the string
would look like in this limiting case.


Chapter 9. Fourier Series

A. The Fourier Sine Series


The general solution. In Chapter 8 we found solutions to the wave equation for a string fixed at
both ends, of length L, and with wave velocity v,

    y_n(x, t) = A_n sin(k_n x) cos(ω_n t)
    k_n = nπ/L
    λ_n = 2L/n                                 n = 1, 2, 3, . . .    (9-1)
    ω_n = nπv/L
    f_n = nv/(2L)
We are now going to proceed to describe an arbitrary motion of the string with both ends fixed in
terms of a linear superposition of normal modes:

n x general solution, string 
y  x, t    An sin cos n t   , (9-2)
n 1 L fixed at both ends 
n v
where n  and the coefficients An are to be determined. Here are some things to be noted:
L
(1) We are assuming for the moment that this equation is true - that is, in the limit where
we include an infinite number of terms, that the infinite series is an arbitrarily good approximation
to the solution y(x, t).
(2) Each term in the series is separately a solution to the wave equation satisfying the
boundary conditions, and so the series sum itself is such a solution.
(3) This series is sort of like an expansion of a vector in terms of a set of basis vectors. In
this picture the coefficients A_n are the coordinates of the function y(x, t).
(4) We still have to specify initial conditions and find a method to ensure that they are
satisfied. This is the next order of business.

Initial conditions.

The solution above satisfies the boundary conditions appropriate to the problem,
y(0, t) = y(L, t) = 0. But no initial conditions have been proposed. A complete set of initial
conditions would consist of specifying the displacement of the string at some initial time, say t = 0,
and the velocity of the string at the same time. These conditions might look as follows:


y  x, 0   f  x 
y . (9-3)
 g ( x)
t t 0

It might be noticed by the astute observer that if we take the partial derivative of equation (9-2)
with respect to time and set t = 0, it vanishes identically. (Each term would have a factor of sin ω_n t,
which vanishes at t = 0.) This means that we have already built into this solution the condition
that the string is not moving at t = 0; or, in terms of the initial conditions just stated,

    g(x) = 0.    (9-4)

This is a limitation; if we wanted equation (9-2) to be completely general, we would have to add
another set of terms multiplied by factors of sin ω_n t. This makes things quite a bit more
complicated and does not add very much to the understanding of Fourier series; we will just live
with the limitation to wave motions where the string is stationary at t = 0.

This leaves us with the condition on the displacement at t = 0, which takes on the form

    y(x, 0) = f(x) = Σ_{n=1}^∞ A_n sin(nπx/L).    (9-5)

The series

    f(x) = Σ_{n=1}^∞ A_n sin(nπx/L)    Fourier sine series    (9-6)
is a very famous equation in mathematics, representing the statement that any function of x defined
on the interval [0,L] and vanishing at the ends of the interval can be represented as a linear
superposition of the sine waves vanishing at the ends of the interval. We will now spend some time
seeing how this works.

Orthogonality.

n x
If the functions sin are to play the role of basis vectors in this process, it would be nice if
L
they were orthogonal. To define orthogonality we need to define an inner product. For functions
defined on the interval [0,L], a useful definition of an inner product is the following:
L
u, v   u  x  v  x  dx
x 0
inner product (9-7)

n x
Orthogonality of two functions u  x  and v  x  means that u, v  0 . For the functions sin ,
L
if this inner product is evaluated, the result is
n x m x n x m x
L
L
sin ,sin   sin sin dx   nm (9-8)
L L x 0
L L 2


where  nm is the Kronecker delta symbol. (The proof of this fact is left to the problems.) So, the
n x
functions sin almost constitute an orthonormal basis for the space of functions we are
L
2
considering. They could be properly normalized by multiplying each one by a factor of . This
L
is not usually done, just to keep the equations simpler.

Now we need a way to find the coefficients {A_n, n = 1, 2, 3, ...}. If we remember the method for
determining the coordinates of vectors, it is very easy:

    A_m = (2/L) ⟨f(x), sin(mπx/L)⟩
                                                    inversion of Fourier series    (9-9)
        = (2/L) ∫_{x=0}^{L} f(x) sin(mπx/L) dx
This important relation can be verified as follows:

    f(x) = Σ_{n=1}^∞ A_n sin(nπx/L)

    (2/L) ∫_{x=0}^{L} f(x) sin(mπx/L) dx = (2/L) ∫_{x=0}^{L} Σ_{n=1}^∞ A_n sin(nπx/L) sin(mπx/L) dx

                                         = (2/L) Σ_{n=1}^∞ A_n ∫_{x=0}^{L} sin(nπx/L) sin(mπx/L) dx    (9-10)

                                         = (2/L) Σ_{n=1}^∞ A_n (L/2) δ_nm

                                         = A_m
Here we have of course used the orthogonality relation, equation 9-8. Thus for any given initial
condition f(x) we can calculate the coefficients A_n, and use equation 9-2 to calculate the position
at any time, to any desired accuracy.
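The inversion formula (9-9) translates directly into a few lines of code. The following Python
sketch is an added illustration; the parabolic initial shape is an arbitrary assumed example. It
computes the first few coefficients A_m by numerical quadrature.

    import numpy as np

    L = 1.0
    x = np.linspace(0.0, L, 2001)
    f = x * (L - x)                   # an assumed shape, vanishing at both ends

    # A_m = (2/L) * integral of f(x) sin(m pi x / L) over [0, L], eq. (9-9)
    for m in range(1, 6):
        A_m = (2.0/L) * np.trapz(f * np.sin(m*np.pi*x/L), x)
        print(m, A_m)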

Completeness.

n x
The functions sin are not only orthogonal. They are a “complete” representation of
L
functions of x over the interval 0 ≤ x ≤ L, meaning that such functions can be represented arbitrarily
well by a linear combination of these sine functions. A somewhat more precise statement of this
completeness condition is that given by Dirichlet:
For any function which is piecewise continuous over the interval [0,L], the Fourier series

n x

n 1
An sin
L
(9-11)

converges to f(x) at every point. At points of discontinuity, the series converges to the
average of f(x-) and f(x+).


Figure 9-1. Initial conditions for the triangle wave, initially displaced by an amount h at the center.
So we have found one of the most important expansions of a function as a series of orthogonal
functions:

    The Fourier Sine Series

    0 < x < L

    f(x) = Σ_{n=1}^∞ A_n sin(nπx/L)    (9-12)

    with

    A_m = (2/L) ∫_{x=0}^{L} f(x) sin(mπx/L) dx

Example. Calculate the Fourier coefficients of a Fourier sine series for the symmetrical triangle
wave, shown in figure 9-1.

Solution. The function f(x) can be written


    f(x) = { (2h/L) x,          0 ≤ x ≤ L/2
           { (2h/L)(L − x),     L/2 ≤ x ≤ L        (9-13)
and then the coefficients An can be evaluated:
m x
L
2
Am   f  x  sin dx
L x 0 L
2  2h  m x m x
L/2 L
2 2h
   x  sin dx    L  x  sin dx (9-14)
L x 0  L  L L xL / 2 L L
m x m x m x
L/2 L L
4h 4h 4h
2 
 x sin dx  2  x sin dx   sin dx
L x 0 L L xL / 2 L L x L / 2 L
There are various ways to do tiresome integrals like these. My preferred method is to change to
dimensionless variables, then do the integrals by parts. So I will change to the variable u = mπx/L,
giving
2 m / 2 m
4h  L    4h L m
L  m   u0   sin udu
Am  2    u sin u du  u sin u du 
u  m / 2  L m u  m / 2
m / 2 m
4h  m / 2 m  4h m

m 2 2


u cos u u 0
 u cos u m / 2
 
u 0
cos u du  
u  m / 2
sin u du  
 m
cos u u  m / 2

4h  m m m / 2 m  4h  m 
 2 2 2 cos  m cos m  sin u u 0  sin u u  m / 2    cos m  cos 
m  2 2  m  2 
8h m
 2 2 sin
m 2
(9-15)
This finally simplified rather well. We could leave the result as it is, but another change shows the
pattern of the values of the coefficients better. We observe that sin(mπ/2) vanishes for m even; and
for m odd,

    sin(mπ/2) = (−1)^{(m−1)/2}    (m odd)    (9-16)
Thus the result is

    A_m = { 0,                                  m even
          { (8h/(m^2 π^2)) (−1)^{(m−1)/2},      m odd        (9-17)
The corresponding series for f(x) is

    f(x) = (8h/π^2) sin(πx/L) − (8h/(9π^2)) sin(3πx/L) + (8h/(25π^2)) sin(5πx/L) − . . .
                                                                                            (9-18)
         = h [ 0.81057 sin(πx/L) − 0.09006 sin(3πx/L) + 0.03242 sin(5πx/L) − . . . ]

The contribution from each of the first five terms is plotted in figure 9-2. Note that it is possible
just from comparing the function to be represented with each normal mode to determine which
ones will have a zero coefficient. For the case of the triangle wave, the n = 2 and n = 4 sine waves
are anti-symmetric about the center of the interval (x = 0.5), and the triangle wave is symmetric
about this point. This makes it pretty clear that none of the even-n modes are going to contribute.
It is also fairly clear that for the odd-n modes, the contributions from integrating over the intervals
[0,0.5] and [0.5,1] will be equal. So, retrospectively, one could have just done the first half of the
integral (the easier half), for odd n, multiplying by 2 to account for the other half of the integral.
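Equation (9-17) can also be checked numerically. The sketch below is an added illustration, with
h = L = 1 assumed; it compares the quadrature value of each coefficient against the closed form.

    import numpy as np

    L, h = 1.0, 1.0
    x = np.linspace(0.0, L, 4001)
    f = np.where(x < L/2, 2*h*x/L, 2*h*(L - x)/L)     # triangle wave, eq. (9-13)

    for m in range(1, 8):
        A_num = (2.0/L) * np.trapz(f * np.sin(m*np.pi*x/L), x)
        A_exact = 0.0 if m % 2 == 0 else (8*h/(m*np.pi)**2) * (-1)**((m - 1)//2)
        print(m, round(A_num, 5), round(A_exact, 5))   # eq. (9-17)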


n x
Figure 9-2. The Fourier series for the triangle wave. There is a graph for each value of sin (dash-dot line), the
L
nth term in the expansion (dashed line), and the nth partial sum (dark line).

B. The Fourier sine-cosine series

The preceding discussion was based on the analysis of a string fixed at x = 0 and x = L, and it
made sense to expand the initial displacement in terms only of functions obeying the same
restrictions. However, there is a more general form of the Fourier series which gives a complete
representation of all functions over the interval [-L,L], independent of their symmetry and with no
requirement that they vanish at any particular points. This is the series


    The Fourier Sine-Cosine Series

    −L < x < L    (9-19)

    f(x) = B0/2 + Σ_{n=1}^∞ [ A_n sin(nπx/L) + B_n cos(nπx/L) ]

The inner product for this interval is now

    ⟨u, v⟩ = ∫_{x=−L}^{L} u(x) v(x) dx    (9-20)

and the inversion formulae for this series are

    A_m = (1/L) ∫_{x=−L}^{L} f(x) sin(mπx/L) dx
                                                    (9-21)
    B_m = (1/L) ∫_{x=−L}^{L} f(x) cos(mπx/L) dx

Odd and even functions.

In the inversion formulae, equation (9-21) above, the range of integration, from −L to L, is
symmetric about the point x = 0. The functions sin(mπx/L) and cos(mπx/L) have definite symmetry
about x = 0, and this can make the job of carrying out the integrations in equation (9-21) easier.
Functions are said to be even or odd about x = 0 if they satisfy one of the following conditions:

    g(−x) = g(x)      even function
                                        (9-22)
    h(−x) = −h(x)     odd function
Example. Here are some functions of x. Which ones are even or odd?
(a) sin x
(b) x^2
(c) x^3
(d) e^x

Solution.
(a) From the properties of the sine function,

    sin(−x) = −sin(x).

So, this is an odd function.
(b) Using simple algebra,

    (−x)^2 = (−1)^2 x^2 = x^2.

This is therefore an even function.
(c) Similarly,

    (−x)^3 = (−1)^3 x^3 = −x^3.

This is therefore an odd function.
(d) Compare

    f(x) = e^x

and

    f(−x) = e^{−x}.

This function is always positive, so it cannot be odd. And, while f(x) is greater than 1 for x > 0,
f(−x) is less than 1, so f(x) cannot be even. So, this function is neither even nor odd.

There are some general conclusions which can be drawn about odd and even functions. It is easy
to see that
    - The product of two even functions is an even function.
    - The product of two odd functions is an even function.
    - The product of an even function and an odd function is an odd function.
Some important properties of their integrals can also be demonstrated. Let g(x) be an even
function of x, and h(x) an odd function of x. Then it is easy to see that

    ∫_{−L}^{L} g(x) dx = 2 ∫_{0}^{L} g(x) dx    (9-23)

and

    ∫_{−L}^{L} h(x) dx = 0    (9-24)

Periodic functions in time.

The Fourier expansion of functions of time plays a very important role in the analysis of signals, in
electrical engineering and other fields. A periodic function of time with period T can be expanded
as follows:

    f(t) = B0/2 + Σ_{n=1}^∞ [ A_n sin(ω_n t) + B_n cos(ω_n t) ]    (9-25)

where

    ω_n = 2πn/T.    (9-26)

The inversion formulae are

    A_n = (2/T) ∫_{−T/2}^{T/2} f(t) sin(ω_n t) dt,
                                                        (9-27)
    B_n = (2/T) ∫_{−T/2}^{T/2} f(t) cos(ω_n t) dt.

Example. Calculate the coefficients of the Fourier series for the square wave shown in
figure (9-3), consisting of a pulse waveform with value 1 from t = -T/4 to t = T/4, and zero
over the rest of the interval [-T/2, T/2]. (The value of the function for all other values of t is
determined by its periodicity.)


Figure 9-3. A train of square pulses, consisting of the periodic extension of a single pulse defined over the interval
[-T/2, T/2].

Solution. Continuing this pulse as a periodic function of t leads to an infinite train of
square pulses, as shown. The function has the value 1 during half of each period (a "mark-
to-space ratio" of 1). Note that each of the functions in the expansion [Fourier sine-cosine
time series] is periodic with period T, and so the periodic continuation of the function is
automatic.

In calculating the coefficients, we can make use of the theorems stated above concerning
odd and even functions. The function of time f(t) which we are expanding is an even
function about t = 0. The integral for An thus is an integral over a symmetric interval of an
odd function, and vanishes. The integral for Bn can also be simplified by taking twice the
integral from 0 to T/2. Thus,
An  0. (9-28).
and

    B_n = (2/T) ∫_{−T/2}^{T/2} f(t) cos(ω_n t) dt
        = (4/T) ∫_{0}^{T/2} f(t) cos(ω_n t) dt
        = (4/T) ∫_{0}^{T/4} cos(ω_n t) dt
        = (4/T) (1/ω_n) sin(ω_n t) |_{0}^{T/4}            (9-29)
        = (4/(ω_n T)) sin(ω_n T/4)
        = (2/(nπ)) sin(nπ/2)
Summarizing,


An  0
n
sin
Bn  2
n
2
(9-30)
For n = 0, we can sort of use l'Hopital's rule to take the limit as n → 0 to determine that B_0 = 1.
Or, just do the integral:

    B_0 = (2/T) ∫_{−T/4}^{T/4} 1 dt = 1.    (9-31)
Then, using properties of the sine function, we can re-write the result as

    A_n = 0,

    B_n = { (2/(nπ)) (−1)^{(n−1)/2},    n odd
          { 0,                          n even and not 0    (9-32)
          { 1,                          n = 0

As always, B0/2 represents the average value of the function.
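The Python sketch below is an added illustration (T = 1 is an arbitrary choice); it builds a partial
sum of the series using the coefficients (9-30)/(9-32) and confirms that it approaches 1 inside the
pulse and 0 outside it.

    import numpy as np

    T = 1.0
    t = np.linspace(-T/2, T/2, 1001)

    partial = np.full_like(t, 0.5)                  # B_0/2, the average value
    for n in range(1, 40):
        B_n = (2.0/(n*np.pi)) * np.sin(n*np.pi/2)   # eq. (9-30)
        partial += B_n * np.cos(2*np.pi*n*t/T)

    # should be near 1 inside the pulse (|t| < T/4) and near 0 outside it
    print(partial[np.abs(t) < T/8].mean(), partial[np.abs(t) > 3*T/8].mean())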

C. The complex Fourier series.


There is a way of writing the sine and cosine functions in terms of the exponential function with
complex argument, using the relations
ei  ei
cos  
2 (9-33)
e  ei
i
sin  
2i
(These relations can be derived easily from the Euler relation defining the exponential with
complex argument, e^{iθ} = cos θ + i sin θ.) If we substitute these relations into eq. (9-25), we get

    f(t) = B0/2 + Σ_{n=1}^∞ [ A_n sin(ω_n t) + B_n cos(ω_n t) ]

         = B0/2 + Σ_{n=1}^∞ [ e^{iω_n t} (B_n/2 + A_n/(2i)) + e^{−iω_n t} (B_n/2 − A_n/(2i)) ]    (9-34)

         = Σ_{n=−∞}^∞ C_n e^{iω_n t}
where

    C_n = (B_n − iA_n)/2
                                    n = 0, 1, 2, . . .    (9-35)
    C_{−n} = (B_n + iA_n)/2

The inverse relations, giving A_n and B_n in terms of the C's, are

    A_n = i (C_n − C_{−n})
                                    (9-36)
    B_n = (C_n + C_{−n})

If An and Bn are real numbers, f(t) is real. However, the series (9-34) is in fact more general, and
can be used to expand any function of t, real or complex. For complex numbers, the generalized
inner product is
    ⟨u, v⟩ = ∫_{−T/2}^{T/2} u*(t) v(t) dt    (9-37)

where u*(t) represents the complex conjugate of the function u(t). It is easy to show that the
exponential functions used in this series obey the orthogonality relation

    ⟨e^{iω_n t}, e^{iω_m t}⟩ = ∫_{−T/2}^{T/2} e^{−iω_n t} e^{iω_m t} dt = T δ_nm    (9-38)
This leads to the inversion formula, which we give with the expansion formula for completeness:

    The Complex Fourier Series

    f(t) = Σ_{n=−∞}^∞ C_n e^{iω_n t}
                                                        (9-39)
    C_n = (1/T) ∫_{−T/2}^{T/2} f(t) e^{−iω_n t} dt
Example. Calculate the coefficients Cn for the square-wave signal shown in figure 9-3.

Solution. Substitute into the inversion formula.


    C_n = (1/T) ∫_{−T/2}^{T/2} f(t) e^{−iω_n t} dt

        = (1/T) ∫_{−T/4}^{T/4} e^{−iω_n t} dt

        = (1/T) (−1/(iω_n)) e^{−iω_n t} |_{−T/4}^{T/4}    (9-40a)

        = (1/(iω_n T)) (e^{iω_n T/4} − e^{−iω_n T/4})

        = (2/(ω_n T)) sin(ω_n T/4)

or

    C_n = (1/(nπ)) sin(nπ/2),    n = . . . −2, −1, 0, 1, 2, . . .

        = { 1/2,                         n = 0
          { 0,                           n even, n ≠ 0    (9-40)
          { (1/(nπ)) (−1)^{(n−1)/2},     n odd
As with the coefficient B_n, the case of n = 0 has to be interpreted in terms of the limit as
n → 0, giving C_0 = 1/2, the correct average value for the square-wave function.

Problems
Problem 9-1. Orthogonality of the sine functions. The spatial dependence of the standing-wave
solutions for waves on a string of length L is given by
n x
f n ( x )  sin , n = 1,2,3, . . . .
L
Prove the orthogonality relation for these functions,
L
f n , f m   nm ,
2
where the inner product is defined by
L
fn , fm  f
x 0
n ( x) f m ( x)dx .

To carry out this integral, substitute in the definition of the sine function in terms of complex
exponentials,
nx nx
i i
nx e L
e L
sin 
L 2i
[Please do this integral "by hand," not with Mathematica.]

Problem 9-2. Odd and even functions. For each of the following functions, determine whether
it is even, odd, or neither, under reflection about the origin.
(a) cos x
(b) x sin x
(c) x
(d) 2x^2 + 3x + 4

Problem 9-3. Products of odd and even functions. Suppose that f(x) and g(x) are even functions
and h(x) is an odd function - that is,


f ( x)  f ( x) ,
g ( x)  g ( x) ,
h(  x )   h( x ) .
Prove the following two properties:

(a) The product of two even functions is an even function; that is, the function u(x) = f(x)*g(x) is
an even function.

(b) The product of an even and an odd function is odd; that is, the function v(x) = f(x)*h(x) is an
odd function.

Problem 9-4. Integral of odd and even functions over a symmetrical range of integration.
Prove that, for f(x) an even function about x=0 and h(x) an odd function,
    ∫_{−a}^{a} h(x) dx = 0

and

    ∫_{−a}^{a} f(x) dx = 2 ∫_{0}^{a} f(x) dx.

To carry out this proof, break the integral up into an integral from -a to 0 and another from 0 to a,
and, for the integral over negative values of x, make a change of variable, from x to a new variable
y, with y = -x.

Problem 9-5. Fourier coefficients for a square wave. Calculate the coefficients An in the Fourier
sine series,

n x
f  x    An sin , 0 xL
n 1 L
for the square-wave function.
 1, 0  x  L
f ( x)  
0, x  0 or x  L

Problem 9-6. Numerical evaluation of the expansion of the square wave. Use Excel to make a
series of graphs, showing the curve obtained from the partial sums of 1, 3, 5, 7, and 9 terms of the
expansion of the square wave discussed in problem 9.5. Each graph should show the n-th term in
the sum, and the n-th partial sum. Take L = 1, so that the x-axis on the plots runs from 0 to 1. (In
case you didn't do problem 9.5, the answer is

n x
f  x    An sin , 0 xL
n 1 L
 4
 , n odd
An   n .)

0, n even
Here is a suggested layout for a spreadsheet.


            nth term                                    nth partial sum
    x       1      3      5      7      9        1      3        5        7        9
    0       0      0      0      0      0        0      0        0        0        0
    0.01    0.04   0.0399 0.0398 0.0397 0.0395   0.04   0.07993  0.11977  0.15945  0.19892
    0.02    0.0799 0.0795 0.0787 0.0774 0.0758   0.0799 0.15947  0.23817  0.31561  0.39141
Problem 9-7. Comparison of sine-cosine series and exponential series. The coefficients Cn of
the exponential Fourier series for the square wave were calculated in the text, and are given by
    C_n = sin(nπ/2) / (nπ),    n = −∞, . . . , ∞.

These coefficients are supposed to be related to those of the sine-cosine Fourier series as follows:

    A_n = i (C_n − C_{−n})
    B_n = (C_n + C_{−n})

Using these relations, calculate the coefficients A_n and B_n for the square wave. Then compare with
equation 9-30 in the text. NOTE: you will obtain a very elegant solution if you use the relation
between the coefficients of the exponential series for the square wave,

    C_{−n} = C_n.
C n  Cn .


Chapter 10. Fourier Transforms and the Dirac Delta Function

A. The Fourier transform.


The Fourier-series expansions which we have discussed are valid for functions either defined over a
finite range (−T/2 ≤ t ≤ T/2, for instance) or extended to all values of time as a periodic function.
This does not cover the important case of a single, isolated pulse. But we can approximate an
isolated pulse by letting the boundaries of the region of the Fourier series recede farther and farther
away towards ±∞, as shown in figure 10-1. We will now outline the corresponding mathematical
limiting process. It will transform the Fourier series, a superposition of sinusoidal waves with
discrete frequencies ω_n, into a superposition of a continuous spectrum of frequencies ω.

Figure 10-1. Evolution of a periodic train of pulses into a single isolated pulse, as the domain of the Fourier series goes
from [−T/2, T/2] to [−∞, ∞].

As a starting point we rewrite the Fourier series, equation 9-39, as follows:



f (t )   C e
n 
n
i nt
n
T /2
(10-1)
1
f  t  e dt
T T/ 2
 in t
Cn 

The only change we have made is to add, in the upper expression, a factor of Δn for later use;
Δn = (n + 1) − n = 1 is the range of the variable n for each step in the summation. We now imagine
letting T get larger and larger. This means that the frequencies


    ω_n = 2πn/T    (10-2)
in the sum get closer and closer together. In the large-n approximation we can replace the integer
variable n by a continuous variable n, so that

    C_n → C(n)
    ω_n → ω(n)
    Δn → dn                                    (10-3)
    Σ_{n=−∞}^∞ → ∫_{n=−∞}^∞
We thus have

    f(t) = ∫_{n=−∞}^∞ C(n) e^{iω(n)t} dn
                                                        (10-4)
    C(n) = (1/T) ∫_{−T/2}^{T/2} f(t) e^{−iω(n)t} dt
Next we change variables in the first integral from n to ω = ω(n) = 2πn/T:

    f(t) = (T/2π) ∫_{ω=−∞}^∞ C(ω) e^{iωt} dω
                                                        (10-5)
    C(ω) = (1/T) ∫_{−T/2}^{T/2} f(t) e^{−iωt} dt
Now define

    g(ω) = (T/√(2π)) C(ω).    (10-6)

This gives

    f(t) = (1/√(2π)) ∫_{ω=−∞}^∞ g(ω) e^{iωt} dω
                                                        (10-7)
    g(ω) = (1/√(2π)) ∫_{−T/2}^{T/2} f(t) e^{−iωt} dt
Finally, we take the limit T → ∞, giving the standard form for the Fourier transform:

    f(t) = (1/√(2π)) ∫_{−∞}^∞ g(ω) e^{iωt} dω    Inverse Fourier Transform    (10-8)

    g(ω) = (1/√(2π)) ∫_{−∞}^∞ f(t) e^{−iωt} dt    Fourier Transform    (10-9)
There are a lot of notable things about these relations. First, there is a great symmetry in the roles
of time and frequency; a function is completely specified either by f(t) or by g(). Describing a
function with f(t) is sometimes referred to as working in the "time domain," while using g() is
referred to as working in the "frequency domain." Second, both of these expressions have the form
of an expansion of a function in terms of a set of basis functions. For f(t), the basis functions are


(1/√(2π)) e^{iωt}; for g(ω), the complex conjugate of this function, (1/√(2π)) e^{−iωt}, is used. Finally, the
function g(ω) emerges as a measure of the "amount" of frequency ω which the function f(t)
contains. In many applications, plotting g(ω) gives more information about the function than
plotting f(t) itself.

Example - the Fourier transform of the square pulse. Let us consider the case of an isolated
square pulse of length T, centered at t = 0:
    f(t) = { 1,    −T/4 ≤ t ≤ T/4
           { 0,    otherwise        (10-10)
This is the same pulse as that shown in figure 9-3, without the periodic extension. It is
straightforward to calculate the Fourier transform g(ω):

    g(ω) = (1/√(2π)) ∫_{t=−∞}^∞ f(t) e^{−iωt} dt

         = (1/√(2π)) ∫_{t=−T/4}^{T/4} e^{−iωt} dt

         = (1/√(2π)) (−1/(iω)) (e^{−iωT/4} − e^{iωT/4})    (10-11)

         = (T/(2√(2π))) · sin(ωT/4)/(ωT/4)

Here we have used the relation sin θ = (e^{iθ} − e^{−iθ})/(2i). We have also written the dependence
on ω in the form sinc x ≡ (sin x)/x. This well known function peaks at zero and falls off on both
sides, oscillating as it goes, as shown in figure 10-2.
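As a numerical cross-check of (10-11), the transform integral can be evaluated by quadrature and
compared with the sinc form. The sketch below is an added illustration; the value of T and the
sample frequencies are arbitrary.

    import numpy as np

    T = 2.0
    t = np.linspace(-T/2, T/2, 8001)
    f = (np.abs(t) <= T/4).astype(float)        # the isolated square pulse

    for w in (0.5, 1.0, 5.0):
        g_num = np.trapz(f * np.exp(-1j*w*t), t) / np.sqrt(2*np.pi)
        # np.sinc(x) = sin(pi x)/(pi x), so sin(u)/u = np.sinc(u/pi)
        g_ana = (T/(2*np.sqrt(2*np.pi))) * np.sinc(w*T/(4*np.pi))
        print(w, round(g_num.real, 6), round(g_ana, 6))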

B. The Dirac delta function δ(x).


The Dirac delta function was introduced by the theoretical physicist P.A.M. Dirac, to describe a
strange mathematical object which is not even a proper mathematical function, but which has many
uses in physics. The Dirac delta function is more properly referred to as a distribution, and Dirac
played a hand in developing the theory of distributions. Here is the definition of δ(x):

    δ(x) = 0,    x ≠ 0
                                    (10-12)
    ∫_{−∞}^∞ δ(x) dx = 1


Isn't this a great mathematical joke? This function is zero everywhere! Well, almost everywhere,
except for being undefined at x = 0. How can this be of any use? In particular, how can its integral
be anything but zero?

Figure 10-2. The Fourier transform of a single square pulse. This function is sometimes called the sinc function.

As an intellectual aid, let's compare this function with the Kronecker delta symbol, which (not
coincidentally) has the same symbol:

    δ_ij = { 0,    i ≠ j
           { 1,    i = j
                                    (10-13)
    Σ_{i=1}^{3} δ_ij = 1

There are some similarities. But the delta function is certainly not equal to 1 at x = 0; for the
integral over all x to be equal to 1, δ(x) must certainly diverge at x = 0. In fact, all the definitions
that I know of a Dirac delta function involve a limiting procedure, in which δ(x) goes to infinity.
Here are a couple of them.

The rectangular delta function

Consider the function

    δ(x) = lim_{a→0} { 1/a,    |x| < a/2
                     { 0,      |x| > a/2        (10-14)

This function, shown in figure 10-3, is a rectangular pulse of width a and height h = 1/a. Its area is
equal to A = ∫_{−∞}^∞ f(x) dx = h·a = 1, so it satisfies the integral requirement for the delta
function. And in the limit that a → 0, it vanishes at all points except x = 0. This is one perfectly
valid representation of the Dirac delta function.

Figure 10-3. Rectangular function, becoming a delta function in the limit a → 0.

The Gaussian delta function

Another example, which has the advantage of being an analytic function, is
    δ(x) = lim_{σ→0} [ (1/(√(2π) σ)) e^{−x^2/(2σ^2)} ]    (10-15)

The function inside the limit is the Gaussian function,

    g_σ(x) = (1/(√(2π) σ)) e^{−x^2/(2σ^2)}    (10-16)

in a form often used in statistics which is normalized so that ∫_{−∞}^∞ g_σ(x) dx = 1, and so that
the standard deviation of the distribution about x = 0 is equal to σ. A graph of the Gaussian shape
was given earlier in this chapter; the width of the curve at half maximum is about equal to 2σ.
(See figure 10-4.) It is clear that in the limit as σ goes to zero this function is zero everywhere
except at x = 0 (where it diverges, due to the factor 1/σ), maintaining the normalization condition
all the while.

Figure 10-4. The Gaussian function, becoming a delta function in the limit σ → 0.

Properties of the delta function

By making a change of variable one can define the delta function in a more general way, so that
the special point where it diverges is x = a (rather than x = 0):

    δ(x − a) = 0,    x ≠ a
                                        (10-17)
    ∫_{−∞}^∞ δ(x − a) dx = 1

Two useful properties of the delta function are given below:

    ∫_{−∞}^∞ f(x) δ(x − a) dx = f(a),    (10-18)

    ∫_{−∞}^∞ f(x) δ′(x − a) dx = −f′(a).    (10-19)

Here the prime indicates the first derivative.

The property given in equation (10-18) is fairly easy to understand; while carrying out the integral,
the integrand vanishes except very near to x = a; so, it makes sense to replace f(x) by the constant
value f(a) and take it out of the integral. The second property, Eqn. (10-19), can be demonstrated
using integration by parts. The proof will be left to the problems.
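Both properties can be demonstrated numerically by substituting a Gaussian of small but finite
width for the delta function, as in (10-15)-(10-16). The sketch below is an added illustration;
f(x) = cos x, a = 0.3, and σ = 0.01 are arbitrary assumed values.

    import numpy as np

    x = np.linspace(-10.0, 10.0, 200001)
    a, sigma = 0.3, 0.01

    delta = np.exp(-(x - a)**2/(2*sigma**2)) / (np.sqrt(2*np.pi)*sigma)  # eq. (10-16)
    f = np.cos(x)

    print(np.trapz(f*delta, x), np.cos(a))       # eq. (10-18): both ~ f(a)

    ddelta = -(x - a)/sigma**2 * delta           # derivative of the narrow Gaussian
    print(np.trapz(f*ddelta, x), np.sin(a))      # eq. (10-19): both ~ -f'(a)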

C. Application of the Dirac delta function to Fourier transforms

Another form of the Dirac delta function, given either in k-space or in ω-space, is the following:

    (1/2π) ∫_{−∞}^∞ e^{i(k−k0)x} dx = δ(k − k0)
                                                        (10-20)
    (1/2π) ∫_{−∞}^∞ e^{i(ω−ω0)x} dx = δ(ω − ω0)
We will not prove this fact, but just make an argument for its plausibility. Look at the integral (10-
20), for the case when k = k0. The exponential factor is just equal to 1 in that case, and it is
clear that the integral diverges. On the other hand, if k is not equal to k0, it is plausible that the
oscillating nature of the argument makes the integral vanish. If we accept these properties, we can
interpret the Fourier transform as an expansion of a function in terms of an orthonormal basis, just
as the Fourier series is an expansion in terms of a series of orthogonal functions. Here is the picture.

Basis states

The functions

    ê(ω) = (1/√(2π)) e^{iωt}    (10-21)

constitute a complete orthonormal basis for the space of ''smooth'' functions on the interval
−∞ < t < ∞. We are not going to prove completeness; as with the Fourier series, the fact that the
expansion approximates a function well is usually accepted as sufficient by physicists. The
orthonormality is defined using the following definition of an inner product of two (possibly
complex) functions u(t) and v(t):


u , v   u * vdt . (10-22)

where u* represents the complex conjugate of the function u(t). Now the inner product of two basis
states is



eˆ , eˆ    eˆ  eˆ dt
*


1  it it
2 
 e e dt (10-23)

  (    )
(The proof of the last line in the equation above is beyond the scope of these notes - sorry.) This is
the equivalent of the orthogonality relation for sine waves, equation (9-8), and shows how the Dirac
delta function plays the same role for the Fourier transform that the Kronecker delta function plays
for the Fourier series expansion.

We now use this property of the basis states to derive the Fourier inversion integral. Suppose that
we can expand an arbitrary function of t in terms of the exponential basis states:
    f(t) = (1/√(2π)) ∫_{−∞}^∞ g(ω) e^{iωt} dω    (10-24)

Here g(ω) represents a sort of weighting function, still to be determined, to include just the right
amount of each frequency to correctly reproduce the function f(t). This is a continuous analog of
the representation of a vector A in terms of its components,

    A = Σ_j A_j ê_j    (10-25)

The components A_k are obtained by taking the inner product of the k-th basis vector with A:

    A_k = ê_k · A    (10-26)
If the analogy holds true, we would expect the weighting function (the ''coordinate'') to be
determined by

    g(ω) = ⟨(1/√(2π)) e^{iωt}, f(t)⟩

         = ∫_{−∞}^∞ [ (1/√(2π)) e^{iωt} ]* f(t) dt    (10-27)

         = (1/√(2π)) ∫_{−∞}^∞ e^{−iωt} f(t) dt
This is exactly the inverse Fourier transformation which we postulated previously in equation
(10-9). We can now prove that it is correct. We start with the inner product of the basis vector
with f(t):

    ⟨(1/√(2π)) e^{iωt}, f(t)⟩ = (1/√(2π)) ∫_{−∞}^∞ e^{−iωt} f(t) dt
                                                                                              (10-28)
                              = (1/√(2π)) ∫_{−∞}^∞ e^{−iωt} [ (1/√(2π)) ∫_{−∞}^∞ g(ω′) e^{iω′t} dω′ ] dt
Here we have substituted in the Fourier expansion for f(t), changing the variable of integration to
’ to keep it distinct. Now we change the order of the integrations:


 
1 1 i   t

2
e it , f (t ) 

 g ( )d  2 e

dt (10-29)

and next use the definition of the delta function, giving

    ⟨(1/√(2π)) e^{iωt}, f(t)⟩ = ∫_{−∞}^∞ g(ω′) δ(ω′ − ω) dω′    (10-30)
                              = g(ω)
as we had hoped to show.

Functions of position x

The Fourier transform can of course be carried out for functions of a position variable x, expanding
in terms of basis states

    ê(k) = (1/√(2π)) e^{ikx}    (10-31)

Here

    k = 2π/λ    (10-32)

is familiar as the wave number for traveling waves. In terms of x and k, the Fourier transform takes
on the form

    f(x) = (1/√(2π)) ∫_{−∞}^∞ g(k) e^{ikx} dk    Inverse Fourier Transform
                                                                                (10-33)
    (−∞ < x < ∞,  −∞ < k < ∞)

    g(k) = (1/√(2π)) ∫_{−∞}^∞ f(x) e^{−ikx} dx    Direct Fourier Transform    (10-34)

D. Relation to Quantum Mechanics


The expressions just above may evoke memories of the formalism of quantum mechanics. It is a
basic postulate of quantum mechanics that a free particle of momentum p is represented by a wave
function
    ψ_p(x) = A e^{ipx/ħ} = A e^{ikx}    (10-35)

where ħ = h/(2π), h is Planck's constant, and k = 2π/λ, giving back the deBroglie relationship
between a particle's momentum and wavelength,


    λ = h/p.    (10-36)
The wave function  p  x  corresponds to a particle with a precisely defined wavelength, but whose
spatial extent goes from x = - to x = . The Fourier transform thus represents a linear
superposition of wave functions with different wavelengths, and can be used to create a ''wave
packet'' f(x) which occupies a limited region of space. However, there is a price to pay; now the
wave function corresponds to a particle whose wavelength, and momentum, is no longer exactly
determined. The interplay between the uncertainty in position and the uncertainty in momentum is
one of the most famous results of quantum mechanics. It is often expressed in terms of the
uncertainty principle,

xp  . (10-37)
2
We can see how this plays out in a practical example by creating a wave packet with the weighting
function

    g(k) = (a/√π)^{1/2} e^{−(k−k0)^2 a^2 / 2}.    (10-38)

In quantum mechanics the probability distribution is obtained by taking the absolute square of the
wave function. Here g(k) is the wave function in "k-space," with a corresponding probability
distribution given by

    P(k) = |g(k)|^2 = (a/√π) e^{−(k−k0)^2 a^2},    (10-39)

plotted in figure 10-5.

Figure 10-5. Probability function P(k) = (a/√π) exp(−(k−k0)^2 a^2).

Comparing to our previous expression for the normalized Gaussian with standard deviation σ,

    g_σ(x) = (1/(√(2π) σ)) e^{−x^2/(2σ^2)},    (10-40)

we see that P(k) is a Gaussian probability distribution which is normalized to unity and which has a
standard deviation (of k, about k0) of 1/(a√2); that is,

    Δk ≡ σ_k = 1/(a√2).    (10-41)
In terms of the corresponding particle momentum p,

    Δp = ħ Δk = ħ/(a√2),    (10-42)

where we have taken the rms deviation of the momentum distribution about its average value to
represent the uncertainty in p. We see that this distribution can be made as narrow as desired,
corresponding to determining the momentum as accurately as desired, by making a large.

Now, what is the corresponding uncertainty in position? We have to carry out the Fourier
transform to find out:

    f(x) = (1/√(2π)) ∫_{−∞}^∞ g(k) e^{ikx} dk

         = (1/√(2π)) (a/√π)^{1/2} ∫_{−∞}^∞ e^{−(k−k0)^2 a^2/2} e^{ikx} dk    (10-43)

         = (1/√(2π)) (a/√π)^{1/2} ∫_{−∞}^∞ e^{−(a^2/2)[(k−k0)^2 − (2i/a^2)kx]} dk

We will work with the quantity in square brackets, collecting terms linear in k and performing an
operation called completing the square:

    (k − k0)^2 − (2i/a^2)kx = k^2 − 2k(k0 + ix/a^2) + k0^2

                            = k^2 − 2k(k0 + ix/a^2) + (k0 + ix/a^2)^2 − (k0 + ix/a^2)^2 + k0^2    (10-44)

                            = (k − k0 − ix/a^2)^2 − 2ik0 x/a^2 + x^2/a^4

The factor (k − k0 − ix/a^2)^2 is the "completed square." Substituting into the previous equation
gives

    f(x) = (1/√(2π)) (a/√π)^{1/2} [ ∫_{−∞}^∞ e^{−(a^2/2)(k − k0 − ix/a^2)^2} dk ] e^{ik0 x} e^{−x^2/(2a^2)}    (10-45)

The quantity in square brackets is the integral of a Gaussian function and is equal to √(2π)/a; so we
have

    f(x) = (1/(a√π)^{1/2}) e^{ik0 x} e^{−x^2/(2a^2)}    (10-46)


This is a famous result: the Fourier transform of a Gaussian is a Gaussian! Furthermore, the wave
function in ''position space,'' f(x), also gives a Gaussian probability distribution:

    P(x) = |f(x)|^2 = (1/(a√π)) e^{−x^2/a^2}    (10-47)

This is a normalized probability distribution, plotted in figure 10-6, with an rms deviation for x,
about zero, of

    Δx ≡ σ_x = a/√2.    (10-48)

So, the product of the uncertainties in position space and k-space is

    Δx Δk = (a/√2) · (1/(a√2)) = 1/2    (10-49)
and

    Δx Δp = ħ Δx Δk = ħ/2.    (10-50)

The Gaussian wave function satisfies the uncertainty principle, as it should.
The uncertainty principle requires the product of the uncertainties of any two "conjugate variables"
(x and p_x, or E and t, for example) to be greater than or equal to ħ/2, for any possible wave
function. The Gaussian wave function is a special case, the only form of the wave function for
which the lower limit given by the uncertainty principle is exactly satisfied.

Figure 10-6. Probability function P(x) = (1/(a√π)) e^{−x^2/a^2}.
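The whole chain - transform the Gaussian g(k), square to get the probability distributions, read off
the widths - can also be verified numerically. The Python sketch below is an added illustration
(a = 1.7 and k0 = 5.0 are arbitrary); the printed product Δx Δk should come out 1/2.

    import numpy as np

    a, k0 = 1.7, 5.0
    k = np.linspace(k0 - 8.0/a, k0 + 8.0/a, 1201)
    x = np.linspace(-8.0*a, 8.0*a, 1201)

    g = np.sqrt(a/np.sqrt(np.pi)) * np.exp(-(k - k0)**2 * a**2 / 2)   # eq. (10-38)
    # f(x) = (1/sqrt(2 pi)) * integral of g(k) exp(ikx) dk, eq. (10-33)
    f = np.trapz(g[None, :] * np.exp(1j*np.outer(x, k)), k, axis=1) / np.sqrt(2*np.pi)

    Pk, Px = g**2, np.abs(f)**2
    dk = np.sqrt(np.trapz((k - k0)**2 * Pk, k) / np.trapz(Pk, k))
    dx = np.sqrt(np.trapz(x**2 * Px, x) / np.trapz(Px, x))
    print(dk, 1/(a*np.sqrt(2)))     # eq. (10-41)
    print(dx, a/np.sqrt(2))         # eq. (10-48)
    print(dx*dk)                    # 1/2, eq. (10-49)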

Problems
Problem 10-1. Fourier Transform of a Rectangular Spectral Function.

Calculate the (inverse) Fourier transform f(x), given by eq. (10-33), for the spectral function g(k)
given by
 1
1 k 
g (k )   2a .

0 otherwise


Make a plot of the result, f(x) vs x. Indicate the values of x at the zero crossings.

Problem 10-2. Theorem Concerning the Dirac Delta Function. Use integration by parts to
prove the theorem given in eq. 10-19,

 f ( x) x  a dx   f (a)



(Here f′ represents the derivative of f with respect to its argument, and δ′ represents the derivative
of δ with respect to its argument.)

Problem 10-3. Fourier Transform of a Gaussian. Calculate the Fourier transform (eq. 10-34)
of the wave function
    f(x) = (1/(a√π)^{1/2}) e^{ik0 x} e^{−x^2/(2a^2)}

and verify that the result is

    g(k) = (a/√π)^{1/2} e^{−(k−k0)^2 a^2 / 2}.


Chapter 11. Maxwell's Equations in Special Relativity. [1]

In Chapter 6a we saw that the electromagnetic fields E and B can be considered as
components of a space-time four-tensor. This tensor, the Maxwell field tensor F^{μν},
transforms under relativistic "boosts" with the same coordinate-transformation matrix
Λ^μ_ν used to carry out the Lorentz transformation on the space-time vector x^μ. Since
then we have introduced vector differential calculus, centered around the gradient
operator ∇. This operator, operating on the fields E and B and the potentials Φ and
A, can be used to express the four Maxwell equations, giving a complete theory of the
electromagnetic field.

In this chapter we will start to put these equations into "covariant form," expressed in
terms equally valid in any Lorentz frame.

In Chapter 6a we introduced the position four-vector,

    x^μ = ( ct, x, y, z )ᵀ,

the electromagnetic-field four-tensor,

    F^{μν} = (   0      −Ex/c    −Ey/c    −Ez/c )
             (  Ex/c      0       −Bz       By  )
             (  Ey/c     Bz        0       −Bx  )
             (  Ez/c    −By       Bx        0   )

and the Lorentz transformation matrix

    Λ^μ_ν = (  γ     −γβ     0     0 )
            ( −γβ     γ      0     0 )
            (  0      0      1     0 )
            (  0      0      0     1 )

which is used in the "usual tensor way" to transform vector and tensor indices from one
rest frame to another. Contraction of indices must always be between an upper and a
lower index, with the metric tensor


[Footnote: One may note factors of 1/c which were not present in the form of the field-strength
tensor introduced in Chapter 6a. This is a pesky issue of unit systems. The form given here gives
the correct constants in the SI system of units.]


1 0 0 0 
 
0 1 0 0 
g   g   
 0 0 1 0 
 
 0 0 0 1
used to raise or lower indices. Upper indices are referred to as "contravariant" indices,
and lower indices, as "covariant" indices, referring to details of tensor analysis which we
hope to avoid discussing here.
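The statement that such contractions are frame-independent rests on the property
Λᵀ g Λ = g of the boost matrix - a standard fact, though not spelled out above. A small
numpy sketch, added here as an illustration with an arbitrary β = 0.6 and an arbitrary
sample event, confirms it and the invariance of x_μ x^μ:

    import numpy as np

    beta = 0.6
    gamma = 1.0/np.sqrt(1.0 - beta**2)

    Lam = np.array([[ gamma,      -gamma*beta, 0.0, 0.0],
                    [-gamma*beta,  gamma,      0.0, 0.0],
                    [ 0.0,         0.0,        1.0, 0.0],
                    [ 0.0,         0.0,        0.0, 1.0]])
    g = np.diag([1.0, -1.0, -1.0, -1.0])

    print(np.allclose(Lam.T @ g @ Lam, g))   # True: the metric is preserved

    x = np.array([2.0, 1.0, 0.5, -0.3])      # (ct, x, y, z) of some event
    xp = Lam @ x
    print(x @ g @ x, xp @ g @ xp)            # equal invariant intervals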

More Four-Vectors

Let's just see some other combinations of a scalar and a three-vector which form four-
vectors.
    four-momentum:   p^μ = ( E, p_x c, p_y c, p_z c )ᵀ,     p_μ = ( E, −p_x c, −p_y c, −p_z c )

    four-current:    J^μ(x^ν) = ( cρ, J_x, J_y, J_z )ᵀ,     J_μ = ( cρ, −J_x, −J_y, −J_z )

    four-potential:  A^μ(x^ν) = ( Φ/c, A_x, A_y, A_z )ᵀ,    A_μ = ( Φ/c, −A_x, −A_y, −A_z )

    four-gradient:   ∂_μ ≡ ∂/∂x^μ = ( (1/c) ∂/∂t, ∂/∂x, ∂/∂y, ∂/∂z )
Note the cool scalar invariants formed by contracting certain of these vectors with
themselves.


x x  c 2t 2  x 2  y 2  z 2 The proper-time interval, invariant under Lorentz trans.


p  p  E 2  p 2 c 2  m 2 c 4 A particle's invariant mass-squared.
1 2 2 2 2
       2
The wave-equation operator, or the d'Alembertian.
c 2 t 2 x 2 y 2 z 2

Here we are interested in seeing what important relations of electromagnetism can be
expressed simply in covariant language. Here is an interesting contraction to form a
four-scalar:

    ∂_μ J^μ = ∂ρ/∂t + ∇·J = 0    Conservation of charge.

Remember? Positive divergence of J requires a decrease in the charge density.

Now let's show where Maxwell's equations come from. Since the divergence of the
electric field equals the charge density, probably the divergence of the field-strength
tensor equals the four-vector combination of charge and current.

    ∂_μ F^{μν} = ( ∂/∂(ct)  ∂/∂x  ∂/∂y  ∂/∂z ) (   0      −Ex/c    −Ey/c    −Ez/c )
                                               (  Ex/c      0       −Bz       By  )
                                               (  Ey/c     Bz        0       −Bx  )
                                               (  Ez/c    −By       Bx        0   )

               = ( (1/c) (∂Ex/∂x + ∂Ey/∂y + ∂Ez/∂z)    )      ( μ_0 cρ  )
                 ( −(1/c^2) ∂Ex/∂t + ∂Bz/∂y − ∂By/∂z   )  =   ( μ_0 J_x )
                 ( −(1/c^2) ∂Ey/∂t + ∂Bx/∂z − ∂Bz/∂x   )      ( μ_0 J_y )
                 ( −(1/c^2) ∂Ez/∂t + ∂By/∂x − ∂Bx/∂y   )      ( μ_0 J_z )

This gives a stack of four equations,

    (1/c) ∇·E = μ_0 cρ

    ∂Bz/∂y − ∂By/∂z = μ_0 J_x + ε_0 μ_0 ∂Ex/∂t

    ∂Bx/∂z − ∂Bz/∂x = μ_0 J_y + ε_0 μ_0 ∂Ey/∂t

    ∂By/∂x − ∂Bx/∂y = μ_0 J_z + ε_0 μ_0 ∂Ez/∂t

Or, in old Earth-bound three-vector notation,

    ∇·E = ρ/ε_0

    ∇×B = μ_0 J + ε_0 μ_0 ∂E/∂t

Here we have the two most complicated of Maxwell's equations, the source equations.
And you might notice that the famous "displacement-current" term, invented by Maxwell
to make the wave equation work, has appeared as by magic:

    J_displacement = ε_0 ∂E/∂t.
Well, that is about as much excitement as most people can bear. But if you are good
for more - - - nobody really likes the curl. Let's set the four-curl of the field-strength
tensor equal to zero. This will of course involve the four-dimensional version of the
Levi-Civita totally anti-symmetric tensor,
 1,   an even permutation of  0123

  1,   an odd permutation of  0123
 0 otherwise

Then

    0 = ε^{μνλσ} ∂_ν F_{λσ}

      = ( −∂_1 (2Bx) − ∂_2 (2By) − ∂_3 (2Bz)                )
        (  ∂_0 (2Bx) + ∂_2 ((2/c)Ez) − ∂_3 ((2/c)Ey)        )
        (  ∂_0 (2By) + ∂_3 ((2/c)Ex) − ∂_1 ((2/c)Ez)        )
        (  ∂_0 (2Bz) + ∂_1 ((2/c)Ey) − ∂_2 ((2/c)Ex)        )

The top line gives

    ∇·B = 0

and the next three lines give

    ∇×E = −∂B/∂t

These complete Maxwell's equations. Good enough for one day.
[1] http://www.phy.duke.edu/~rgb/Class/phy319/phy319/node135.html


Appendix A. Useful Mathematical Facts and Formulae

1. Complex Numbers

Complex numbers in general have both a real and an imaginary part. Here "i" represents
the square root of -1. If C represents a complex number, it can be written
C  AiB, (AA-1)
with A and B real numbers. Thus,
A  Re  C 
(AA-2)
B  Im  C 
There is another representation of a complex number, in terms of its magnitude ρ and
phase θ:

    C = ρ e^{iθ},    (AA-3)

There is a very useful relation between the complex exponential representation and the
real trigonometric functions, the Euler equation:

    e^{iθ} = cos θ + i sin θ    (AA-4)

and the inverse relations,

    cos θ = (e^{iθ} + e^{−iθ}) / 2
                                        (AA-5)
    sin θ = (e^{iθ} − e^{−iθ}) / (2i)
From equation (AA-4) one can deduce some useful special values for the complex
exponential:

    e^{iπ} = −1
                        (AA-6)
    e^{i2π} = 1

And from equation (AA-5) one easily deduces

    cos(−A) = cos A
                            (AA-7)
    sin(−A) = −sin A

2. An Integral and Two Identities.

    ∫_{−∞}^∞ e^{−u^2} du = √π    (AB-1)

    sin(A + B) = sin A cos B + sin B cos A
                                                (AB-2)
    cos(A + B) = cos A cos B − sin A sin B

3. Power Series and the Small-Angle Approximation. It is especially convenient to
expand a function of a dimensionless variable as a power series when the variable can be
taken to be reasonably small. Some useful power series are given below.


    sin x = x − x^3/3! + x^5/5! − . . .

    cos x = 1 − x^2/2! + x^4/4! − . . .
                                                                            (AC-1)
    e^x = 1 + x + x^2/2! + x^3/3! + . . .

    (1 + x)^n = 1 + nx + (n(n−1)/2!) x^2 + (n(n−1)(n−2)/3!) x^3 + . . . + n x^{n−1} + x^n
The small-angle approximation usually amounts to keeping terms up through the first
non-constant term, as below.
 1 
sin  
 11  1  2
cos   2

1 x 11  x
 (AC-2)
1 x
x 11  1 x
1  x  2

1  x  x 11  nx

n
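The quality of these approximations is easy to inspect numerically; the brief Python sketch below
(an added illustration) prints the error of each form at a few sample arguments.

    import numpy as np

    for x in (0.5, 0.1, 0.01):
        print(x,
              np.sin(x) - x,                 # error of sin x ~ x
              np.cos(x) - (1 - x**2/2),      # error of cos x ~ 1 - x^2/2
              1/(1 + x) - (1 - x),           # error of 1/(1+x) ~ 1 - x
              (1 + x)**3 - (1 + 3*x))        # error of (1+x)^n ~ 1 + nx, n = 3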

4. Mathematical Logical Symbols.

Some symbols used in mathematical logic make definitions and other mathematical
discussions easier. Here are some that I might use in class.
    ∃    there exists
    ∀    for all or for every
    ∈    is contained in (as an element of a set)
    ∋    such that
    ⇒    implies


Appendix B. Using the P&A Computer System

The Physics and Astronomy Department maintains a rather complete computer system
for use by P&A faculty and students, located in TH 124. There are about 10 PC's running
Linux, and several others running Windows. The Linux machines are typical of the
platform used by many scientists for calculation. They should all have MatLab, Python,
IDL and Mathematica installed locally. The information given below is rather volatile,
since the computer system is currently in a rapid state of change [written August 24,
2009.] Up-to-date information on this system can be obtained over the web at
http://www.physics.sfsu.edu/compsys/compsys.html
1. Logging on. To use the Physics and Astronomy computer system you need a system
account. Your instructor should request it for you; when it is set up, you need to go to the
Department office to get your password, and to ask Peter Verdone (TH 104 or TH 212)
for a PIN to get you into TH 123. (If you have the password but not yet a PIN, a teacher
or fellow student will probably let you into the room on a temporary basis.) We will
assume that you have your password and can get into TH 123.
(1) Choose one of the Linux machines. Pressing [enter] brings up a prompt for
entering your user name and password. A first-time user may receive some startup
messages.
(2) A desktop should open. If you only get a command window, start the desktop by
entering the command startx.
(3) From the desktop, open a terminal window by clicking on the third icon from the
left on the task bar at the bottom of the screen.
(4) I prefer to work from x-terminal windows. To open one of these, enter (from the
command line of the terminal window)

th123-22:bland% xterm &

NOTE: the first part of this instruction, th123-22:bland%, represents the command-line
prompt, and you do not type it. It will have your name, not bland. The command "xterm"
should open another window. The "&" character at the end of the line detaches the xterm
window from the window executing the instruction. Otherwise the terminal window is
useless as long as the xterm window is open.

2. Running Mathematica, MatLab and IDL.

You should be able to run these programs natively on all of the machines in TH 123,
whether Linux or Windows. If your machine does not seem to have Mathematica, it
might help to connect to some other machine. For instance, to connect to th123-21, enter

th123-22:bland% ssh th123-21
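NOTE: whether X connections are forwarded automatically depends on the local ssh
configuration (an assumption about this particular system); if windows fail to open after
connecting, try requesting forwarding explicitly:

th123-22:bland% ssh -X th123-21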

Next, check to see that X-windows communication is open by running the clock window:

th123-21:bland% xclock &


A small window with a clock face should appear on your desktop. If it does not,
something is wrong. If xclock worked, you can run one of the big computation programs.

th123-21:bland% matlab &

or

th123-21:bland% idl
th123-21:bland% mathematica &

(On the Linux machines the launch commands are lower-case; commands are case-sensitive.)

All of this can be done from Windows, too. You need to run an X-windows host, such
as xwin32 or xming, then connect to th123-22.sfsu.edu using a secure shell application
such as SSH. (See the Department computer system site,
http://www.physics.sfsu.edu/compsys/compsys.html, for more information.) Once
connected, open an xterm window, and continue as above. You will find that many
operations take a lot longer this way, due to the network connection. And with
Mathematica there are often problems with fonts. I recommend that you start the
semester working in the computer room, and set up remote access later if you want.

3. Mathematica.

This section gives some details on using Mathematica. Follow these instructions to load
and execute Mathematica. One warning - you may have to turn the "Num Lock" selection
off for Mathematica to respond to the keyboard.

th123-22:bland% mathematica &

A Mathematica icon should appear at once in the lower left-hand corner of the desktop,
and in a few seconds a Mathematica window will open. If instead you receive a message
telling you that the proper fonts are not installed, something is wrong. Try moving to
another workstation.
Here is a Mathematica command to enter, just to make sure that it is working:

Plot[Sin[x], {x, 0, 4 Pi}]

Execute the instruction with [shift][enter]. (Just pressing the [enter] key moves to a new
line, but does not execute the instruction.) A graph of two cycles of a sine wave should
appear.

To use the function for vector addition discussed in the text, carry out the following
commands from a terminal window:
th123-22:bland% ls Shows the files in your home directory
th123-22:bland% mkdir 385 Create a directory for Ph 385
th123-22:bland% cd 385 Change to the Ph 385 directory
th123-22:bland% cp ~bland/export/385/vectsum2.nb .
(Note the space and period at the end of the line.)
th123-22:bland% ls Should now show the file vectsum2.nb


Now you can open the file vectsum2.nb with Mathematica and do the vector-addition
exercises.

If you prefer to work from home, and if you have Mathematica on your own computer,
you can download the file vectsum2.nb from the download page on the course website
and work on it there.


Appendix C. Mathematica

See Appendix B for instructions on accessing Mathematica on the Department computer
system. Here we will just present a routine for numerical addition of vectors. The
(magnitude, angle) representation for vectors is assumed.

1. Calculation of the vector sum using Mathematica. Today we rely increasingly on
"machines" to carry out laborious calculations like the preceding one for us.
Mathematica is one such machine, and we will use it in this course.
As a start, we will use Mathematica to carry out the solution to the numerical example
given in Chapter 1, section B, illustrated in figures (1-4) and (1-5). Here is the method.
If we can determine the angle θ1 of the triangle 1-2-3, we can calculate the angle θC
(θC = θA − θ1) and the magnitude C of the sum vector (C² = A² + B² − 2AB cos θ2,
where θ2 is determined from θ2 + θA − θB = 180°). We will show how to do the
calculation using Mathematica. [The following Mathematica commands are taken from
the file ~bland/export/385/vectsum2.nb, also available on the Ph 385 download page.
Rather than typing everything in, feel free to copy this file to your directory and open
it with Mathematica.]

[Figure C-1: the triangle construction for the vector sum, cf. figures (1-4) and (1-5).]

The first command defines a Mathematica function which calculates the vector sum of
the two vectors A and B; the second shows a sample call to this function; and the
following line shows the answer given by Mathematica. To run this function yourself,
just type in the commands exactly as shown, and press [Shift][Enter].

Vsum[{Amag_, Athdeg_, Bmag_, Bthdeg_}] := (
  Ath = Athdeg*Pi/180. ;               (* convert input angles to radians *)
  Bth = Bthdeg*Pi/180. ;
  Th2 = Pi - Ath + Bth;                (* interior angle at vertex 2: Th2 + ThA - ThB = Pi *)
  Cmag = Sqrt[Amag^2 + Bmag^2 - 2 Amag Bmag Cos[Th2]];   (* law of cosines *)
  CosTh1 = (Amag^2 + Cmag^2 - Bmag^2)/(2 Amag Cmag);     (* law of cosines again *)
  SinTh1 = Bmag*Sin[Th2]/Cmag;                           (* law of sines *)
  Th1 = ArcTan[CosTh1, SinTh1];        (* two-argument ArcTan gets the quadrant right *)
  Cth = Ath - Th1;                     (* direction of the sum vector *)
  {Cmag, Cth*180./Pi}                  (* return {magnitude, angle in degrees} *)
  ) <shift><enter>
Vsum[{10.,48.,14.,20.}] <shift><enter>
Out[2] = {23.3072,31.6205}

We will use this function for a variety of numerical calculations with vectors in the
(magnitude, angle) form.
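As a cross-check, here is a sketch of the same calculation done with Cartesian
components instead of the triangle construction. (The function Vsum2 is not part of
vectsum2.nb; it is introduced here for illustration. Degree and the two-argument
ArcTan are built in.)

Vsum2[{Amag_, Athdeg_, Bmag_, Bthdeg_}] := Module[{ax, ay, bx, by, cx, cy},
  ax = Amag Cos[Athdeg Degree]; ay = Amag Sin[Athdeg Degree];  (* components of A *)
  bx = Bmag Cos[Bthdeg Degree]; by = Bmag Sin[Bthdeg Degree];  (* components of B *)
  cx = ax + bx; cy = ay + by;                                  (* components of C = A + B *)
  {Sqrt[cx^2 + cy^2], ArcTan[cx, cy]/Degree}]  (* back to (magnitude, angle) form *)
Vsum2[{10., 48., 14., 20.}]   (* gives {23.3072, 31.6205}, agreeing with Vsum *)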


2. Matrix operations in Mathematica.

m={{2,4,3},{1,3,2},{1,3,1}}            defining a matrix
MatrixForm[m]                          display as a rectangular array
c m                                    multiply by a scalar c
a.b                                    matrix product
Inverse[m]                             matrix inverse
MatrixPower[m,n]                       nth power of a matrix
Det[m]                                 determinant
Tr[m]                                  trace
Transpose[m]                           transpose
Eigenvalues[m]                         eigenvalues
Eigenvectors[m]                        eigenvectors
Eigenvalues[N[m]], Eigenvectors[N[m]]  numerical eigenvalues and eigenvectors
m=Table[Random[],{3},{3}]              3x3 matrix of random numbers
<<Graphics`                            load graphics package
Solve[....,x]                          solve simultaneous linear equations
Plot3D[f[x,y],{x,...},{y,...}]         3-D plot
Table[...,{500},{500}]                 create stuff to plot
ListPlot[Abs[Eigenvalues[m]]]          plot magnitudes of the matrix eigenvalues
N[Pi,100]                              pi to a hundred places
Simplify[%]                            simplify the preceding answer
//N                                    give numerical result

Table C-1. Some useful mathematical operations which Mathematica can carry out.
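As a quick exercise of Table C-1 (a sketch; all of these are built-in operations), try the
matrix defined in its first row:

m = {{2, 4, 3}, {1, 3, 2}, {1, 3, 1}};
MatrixForm[m]       (* displays m as a rectangular array *)
Det[m]              (* -2 *)
Tr[m]               (* 6 *)
Inverse[m]          (* {{3/2, -5/2, 1/2}, {-1/2, 1/2, 1/2}, {0, 1, -1}} *)
m . Inverse[m]      (* the 3x3 unit matrix *)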

3. Speed test for Mathematica. Here is a short sequence of commands to see how long
Mathematica takes to calculate the determinant of a matrix.
n=3
m=Table[Random[],{n},{n}];
Det[m]
Note: the semicolon at the end of the second line prevents printing out the matrix m. For
small values of n you could remove the semicolon, but for large values of n the printout
will take forever.
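To get an actual number, wrap the determinant in the built-in function Timing, which
returns {cpu time, result}; a minimal sketch:

n = 500;
m = Table[Random[], {n}, {n}];
Timing[Det[m]][[1]]    (* first element: CPU seconds spent evaluating Det[m] *)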
