You are on page 1of 141

Lecture Notes for MATH6106

March 25, 2010

1
Contents
1 Vectors 4
1.1 Points in Space . . . . . . . . . . . . . . . . . . . . . . . . . . 4
1.2 Distance between Points . . . . . . . . . . . . . . . . . . . . . 4
1.3 Scalars and Vectors . . . . . . . . . . . . . . . . . . . . . . . . 5
1.4 Vectors and Coordinate Systems . . . . . . . . . . . . . . . . . 5
1.5 Vector Operations . . . . . . . . . . . . . . . . . . . . . . . . . 7
1.6 Revision of Vectors . . . . . . . . . . . . . . . . . . . . . . . . 12
1.7 Quiz . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14

2 Topics in 3D Geometry 19
2.1 Planes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19
2.2 Lines . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20
2.3 Surfaces . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21
2.4 Functions of several variables . . . . . . . . . . . . . . . . . . 23
2.5 Change of Coordinates . . . . . . . . . . . . . . . . . . . . . . 25

3 Partial Derivatives 31
3.1 First Order Partial Derivatives . . . . . . . . . . . . . . . . . . 31
3.2 Higher Order Partial Derivatives . . . . . . . . . . . . . . . . . 33
3.3 Chain Rule . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35
3.4 Maxima and Minima . . . . . . . . . . . . . . . . . . . . . . . 38

4 A little Vector Calculus 43


4.1 Gradient . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 43
4.2 Directional Derivative . . . . . . . . . . . . . . . . . . . . . . 45
4.3 Curl and Divergence . . . . . . . . . . . . . . . . . . . . . . . 48
4.4 Laplacian . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 50

5 Linear Equations 56
5.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . 56
5.2 Row Echelon Form . . . . . . . . . . . . . . . . . . . . . . . . 57
5.3 Soving Systems of Linear Equations . . . . . . . . . . . . . . . 59

6 Linear Independence and Gram Schmidt 64


6.1 Linear Combination of Vectors . . . . . . . . . . . . . . . . . . 64
6.2 Linear Independence of Vectors . . . . . . . . . . . . . . . . . 66
6.3 Orthogonal and orthonormal vectors . . . . . . . . . . . . . . 68
6.4 Gram-Schmidt Process . . . . . . . . . . . . . . . . . . . . . . 70

7 Matrices 74
7.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . 74
7.2 Matrix Operations . . . . . . . . . . . . . . . . . . . . . . . . 74
7.3 Special matrices . . . . . . . . . . . . . . . . . . . . . . . . . . 76
7.4 Determinants . . . . . . . . . . . . . . . . . . . . . . . . . . . 77
7.5 Inverses . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 79

8 Eigenvalues, Eigenvectors and Diagonalization 83


8.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . 83
8.2 Method to find eigenvalues and eigenvectors . . . . . . . . . . 83
8.3 Dimension of Eigenspace . . . . . . . . . . . . . . . . . . . . . 97
8.4 Diagonalization . . . . . . . . . . . . . . . . . . . . . . . . . . 101
8.5 Diagonalization of symmetric matrices . . . . . . . . . . . . . 105

9 Operators and Commutators 112


9.1 Operators . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 112
9.2 Linear Operators . . . . . . . . . . . . . . . . . . . . . . . . . 114
9.3 Composing Operators . . . . . . . . . . . . . . . . . . . . . . . 117
9.4 Commutators . . . . . . . . . . . . . . . . . . . . . . . . . . . 118

10 Fourier Series 125


10.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . 125
10.2 Definition of a Fourier Series . . . . . . . . . . . . . . . . . . . 125
10.3 Why the coefficients are found as they are. . . . . . . . . . . . 126
10.4 Fourier Series . . . . . . . . . . . . . . . . . . . . . . . . . . . 128
10.5 Odd and even functions . . . . . . . . . . . . . . . . . . . . . 132
10.6 Functions with period 2L . . . . . . . . . . . . . . . . . . . . . 136
10.7 Parseval’s Result . . . . . . . . . . . . . . . . . . . . . . . . . 139
1 Vectors
1.1 Points in Space
A point in two dimensional (2D) space is represented on the XY -plane using
two coordinates (x, y).

Example. Plot P (1, 2).

3D Coordinate system
The coordinate system in 3D space consists of three axes X,Y and Z. To
determine the directions of these axes, we follow the right hand rule:
We first define a plane in 3D space as the XY plane. Then place your right
hand on the X-axis, curl your fingers in the direction of the Y -axis, and let
your thumb stick out. Your thumb then points in the direction of the positive
Z axis.

A point in three dimensional (3D) space is represented using three coor-


dinates (x, y, z).

Example. Plot P (1, 2, 3).

1.2 Distance between Points


In 2D-space, the distance between two points P (x1 , y1 ) and Q(x2 , y2 ) is
p
d(P, Q) = (x2 − x1 )2 + (y2 − y1 )2 .
Similarly in 3D-space, the distance between two points P (x1 , y1 , z1 ) and
Q(x2 , y2 , z2 ) is
p
d(P, Q) = (x2 − x1 )2 + (y2 − y1 )2 + (z2 − z1 )2 .

Example. Find the distance between P (1, 2, 3) and Q(0, 0, 1).


p p
d(P, Q) = (x2 − x1 )2 + (y2 − y1 )2 = (0 − 1)2 + (0 − 2)2 + (1 − 3)2 = 3.

1.3 Scalars and Vectors


Certain quantities in physical applications of mathematics are represented
by vectors. To understand what a vector is it is perhaps better to first
understand what a scalar is:
Definition. A scalar quantity is defined completely by a single number, e.g.
length, area, volume, mass, time etc. It has only magnitude.
Definition. (Physical) A vector quantity is defined by magnitude and
direction, e.g. force, velocity, acceleration.
Example. 1. Force is reresented by a vector; the direction of the vector
describes the direction in whch the force is applied, and the magnitude
of the vector indicates the strength applied.
2. Velocity of a car is represented by a vector; the direction gives the
direction of motion and the magnitude gives the speed of the car.
Note. Two vectors with the same length and same direction are equal, we
do not care about their position in space.

1.4 Vectors and Coordinate Systems


Vectors can occur in two, three or even higher dimensions.
Definition. (Geometric) A vector in 2D-space can be represented by a
directed line segment, with an initial point and a terminal point. We denote
a vector by an over head arrow, e.g. ~v . Then the magnitude of ~v is given by
the length of the directed line segment, and is denotd by |~v |, and the direction
of ~v is given by the direction of the line segment.
Example. Plot the vector P~Q with initial point P (2, 3) and terminal point
Q(−1, 4). Find the magnitude of the vector P~Q.


|P~Q| =
p
(−1 − 2)2 + (4 − 3)2 = 10

Example. We can also represent the vector P~Q as the difference of the
~ and OP
vector OQ ~ (Use diagram.)

P~Q = OQ
~ − OP ~
= (−1, 4) − (2, 3)
= (−3, 1)
Then the directed line segment from O(0, 0) to R(−3, 1) also represents the
vector P~Q. Verify that the magnitude of both line segments is the same.

~ = (−3 − 0)2 + (1 − 0)2 = 10
p
|OR|
The two vectors also point in the same direction.
Definition. (Algebraic) A vector in 2D-space is represented by two com-
ponents (x, y), indicating the change in the X and Y directions from the
origin (0, 0).
Note. When we say a point P (−3, 1) in 2D-space, we mean the point with
x-coordiante −3 and y-coordinate 1. When we say the vector ~v = (−3, 1)
in 2D-space, we mean the directed line segment
p from the point
√ (0, 0) to the
2 2
point (−3, 1), which has magnitude or length (−3) + 1 = 10 and points
in the direction of the line segment.
Definition. A vector in 3D-space is given by a three tuple (x, y, z), indi-
cating the change in the X, Y and Z directions from the origin.
Note. We represent a vector ~v = (x, y, z) in 3D-space as the directed
line
p segment from the point (0, 0, 0) to the point (x, y, z), which has length
x2 + y 2 + z 2 and points in the direction of the line segment.
Example. Plot the vector ~v = (1, 2, 3) and determine it’s length.
1.5 Vector Operations
Addition and Subtraction
We add and subtract vectors component-wise.
Example. Let ~v1 = (1, 0, 1) and ~v2 = (−1, 2, 3). Then
~v1 + ~v2 = (1 − 1, 0 + 2, 1 + 3) = (0, 2, 4)
and
~v1 − ~v2 = (1 − (−1), 0 − 2, 1 − 3) = (2, −2, −2).

Scalar Multiplication
If c is a real number and ~v = (x, y, z), then
c~v = (cx, cy, cz).
This is known as scaling - we change the length but not the direction of v.
p p
|c~v | = c2 x2 + c2 y 2 + c2 z 2 = c x2 + y 2 + z 2 = c|~v |

Special vectors
The zero vector: ~o = (0, 0, 0).
Definition. A unit vector is a vector with length equal to one. i.e. |~v | = 1.
In 2D space, the unit vectors in the X and Y direction are special and denoted
by:
~i = (1, 0)
~j = (0, 1)

Similarly, in 3D space, the unit vectors in the X, Y and Z directions are


denoted by:
~i = (1, 0, 0)
~j = (0, 1, 0)
~k = (0, 0, 1)
Example. Using the special unit vectors i, j, k, we get a different way of
representing a vector,

~v = (x, y, z) = x~i + y~j + z~k.

Consider ~v = (13, 6, −8). Then,

~v = (13, 6, −8)
= 13(1, 0, 0) + 6(0, 1, 0) − 8(0, 0, 1)
= 13~i + 6~j − 8~k

This representation is often useful and we will use both notations in the
course.
Example. Find a unit vector in the direction of the vector ~v = (6, 8, 10).

Dot Product
Definition. The dot product of 2 vectors ~a = (a1 , a2 , a3 ) and ~b = (b1 , b2 , b3 )
in 3-dimensional space is defined to be the scalar

~a.~b = a1 b1 + a2 b2 + a3 b3 .

This is the algebraic definition of the dot product

On the other hand, if θ is the angle between the vectors ~a and ~b, using the
Law of Cosines on the triangle with sides ~a, ~b, ~a − ~b we get,

|~a − ~b|2 = |~a|2 + |~b|2 − 2|~a|.|~b| cos θ


(a1 − b1 )2 + (a2 − b2 )2 + (a3 − b3 )2 = a21 + a22 + a23 + b21 + b22 + b23 − 2|~a|.|~b| cos θ
a1 b1 + a2 b2 + a3 b3 = |~a|.|~b| cos θ
~a.~b = |~a|.|~b| cos θ

Definition. Therefore we could equivalenty define the dot product to be

~a.~b = |~a|.|~b| cos θ.

This is the geometric definition of the dot product.


Remark. The two main applications of taking dot products;
(i) to find the angle between vectors and
(ii) to determine if two vectors are perpendicular.

Example. Find the dot product of ~a = (1, −1, 3) and ~b = (2, −1, 0). Also
find the angle between the two vectors.
Solution:

~a.~b = 1.2 + (−1)(−1) + 3.0 = 3
√ √
|~a| = 1+1+3= 5
√ √
|~b| = 4+1= 5
~a.~b
cos θ = = 3/5
|~a||~b|
3
θ = cos−1 ( )
5

Proposition Let ~a and ~b be non-zero vectors. The vectors, ~a and ~b, are
perpendicular to each other if and only if ~a · ~b = 0.

Solution: Let ~a and ~b be non zero vectors, then |~a| =


6 0 and |~b| =
6 0. Let θ
be the angle between the two vectors.

~a · ~b = 0
⇔ |~a|.|~b| cos θ = 0
6 0, |~b| =
⇔ cos θ = 0; Since |~a| = 6 0, we can divide the equation by them.
⇔ θ = π/2

Therefore the angle between ~a and ~b is 90 degrees.

Example. Determine if the vectors ~a = (1, −1, 1) and ~b = (1, 2, 1) are per-
pendicular to each other.

~a · ~b = (1)(1) + (−1)(2) + (1)(1) = 0


Therefore by the above proporsition, the vectors are perpendicular to
each other.
Cross Product
Definition. Given 2 vectors ~a = (a1 , a2 , a3 ) and ~b = (b1 , b2 , b3 ) in 3-dimensional
space the cross product ~a × ~b is a vector defined as follows:

~a × ~b = (|~a|.|~b| sin θ)~n

where ~n is a unit vector which is perpendicular to the plane containing ~a and


~b, determined by the right-hand-rule. i.e. place your right hand on ~a and
curl it towards ~b, ~n now points in the direction of your thumb. This is the
geometric definition of the cross-product.

We also have an algebraic definition of the cross-product, but before we can


define it, we need to be able to compute determinants!

Definition. The determinant of a 2 × 2 matrix is defined as follows:



a b
c d = ad − bc.

Example. Compute the determinant.



1 3
−5 6 = 1.6 − 3(−5) = 9.

Definition. The determinant of a 3 × 3 matrix is defined as follows:



a11 a12 a13
a21 a22 a23 = a11 a22 a23 − a12 a21 a23 + a13 a21 a22

a32 a33 a31 a33 a31 a33
a31 a32 a33
= a11 (a22 a33 − a32 a23 ) − a12 (a21 a33 − a31 a23 ) + a13 (a21 a32 − a31 a22 )

Example. Compute the determinant.



4 2 1
−6 3
− 2 −2 3
−2 −6
−2 −6 3 = 4 + 1
5 0 −7 0 −7 5
−7 5 0
= 4(−15) − 2(21) + (1)(−52)
= −154
Definition. Given 2 vectors ~a = (a1 , a2 , a3 ) and ~b = (b1 , b2 , b3 ) in 3-dimensional
space the cross product ~a × ~b can also be defined as follows:
~i ~j ~k

~a × ~b = a1 a2 a3

b1 b2 b3

where ~i = (1, 0, 0), ~j = (0, 1, 0), ~k = (0, 0, 1) are unit vectors along the X,
Y , Z axes respectively. However while calculating the determinant they only
have symbolic value.

Remark. The cross product operates on 2 vectors and produces a new vec-
tor.

Example. Consider the two vectors ~a = (2, 1, 0) and ~b = (−1, 1, 0) in 3-


dimensional space. Find the cross-product of the two vectors and verify that
it is indeed perpendicular to both ~a and ~b. Also find the magnitude of ~a × ~b.

Solution:
~j ~k

~i
~

~a × b = 2 1 0
−1 1 0

1 0 ~ 2 0 ~ 2 1 ~
= i− j+ k
1 0 −1 0 −1 1
= 0~i − 0~j + 3~k
= 0(1, 0, 0) − 0(0, 1, 0) + 3(0, 0, 1)
= (0, 0, 3)

Observe that the two vectors lie in the XY -plane and the cross product lies
on the Z-axis and is indeed perpendicular to the plane containing the two
vectors. But we can verify this more concretely using the dot product. Let
~n = ~a × ~b = (0, 0, 3).

~n · ~a = (2, 1, 0).(0, 0, 3) = 0 + 0 + 0 = 0.
~n · ~b = (−1, 1, 0).(0, 0, 3) = 0 + 0 + 0 = 0.
Since both the dot products are zero, we can conclude that ~a × ~b is perpen-
dicular to both ~a and ~b.
Finally the magnitude of ~a × ~b is given by,

|~a × ~b| = 0 + 0 + 9 = 3.

Remark. The two important properties of cross products are:

(1) The direction of the cross product ~a × ~b is perpendicular to both ~a and


~b. In particular it points in the direction of your thumb if the fingers of
your right hand curl from ~a to ~b.

(2) The magnitude of ~a × ~b is given by |~a|.|~b| sin θ, if θ is the angle between


~a and ~b. It is the area of the parallelogram determined by ~a and ~b.

1.6 Revision of Vectors


Example. Find a unit vector in the direction of the following vectors:

(i) ~v1 = (2, −3).

(ii) ~v2 = (1, −2, 5).

(iii) ~v3 = (1, 1, 1, 1).

Solution: To get a unit vector in the direction of a given vector, all we


need to do is scale the size of the vector by the inverse of its length.

(i) ~v1 = (2, −3). p √


The length of ~v1 is |~v1 | = 22 + (−3)2 = 13.
A unit vector u1 in the direction of v1 may then be obtained as follows:
1 2 −3
u1 = √ (2, −3) = ( √ , √ ).
13 13 13

(ii) ~v2 = (1, −2, 5). p √


The length of ~v2 is |~v2 | = 12 + (−2)2 + 52 = 30.
A unit vector u2 in the direction of v2 may then be obtained as follows:
1 1 −2 5
u2 = √ (1, −2, 5) = ( √ , √ , √ ).
30 30 30 30
(iii) ~v3 = (1, 1, 1, 1). √ √
The length of ~v3 is |~v3 | = 12 + 12 + 12 + 12 = 4 = 2.
A unit vector u3 in the direction of v3 may then be obtained as follows:
1 1 1 1 1
u3 = (1, 1, 1, 1) = ( , , , ).
2 2 2 2 2

Example. Consider the two vectors ~a = (2, 1, −4) and ~b = (3, −2, 5) in
3-dimensional space. Find the cross-product of the two vectors.

Solution:
~k

~i ~j

~a × ~b =

2 1 −4
3 −2 5


1 −4 ~ 2 −4 ~ 2 1 ~k

= i− j+
−2 5 3 5 3 −2
= −3~i − 22~j − 7~k
= (−3, −22, −7)
1.7 Quiz
d
1. Find dx
(tan(x2 ) cos(ex ).
Solution:
d d d
(tan(x2 ) cos(ex ) = tan(x2 ) (cos(ex ) + cos(ex ) (tan(x2 ))
dx dx dx
= tan(x )(− sin(e ))e + cos(e )(sec2 (x2 ))2x
2 x x x

2. Find the stationary points of y = x3 − 3x.


Solution: Let f (x) = x3 − 3x. Since f (x) is continuous, the stationary
points of f (x) are the solutions of the equation f 0 (x) = 0.

f (x) = x3 − 3x
f 0 (x) = 3x2 − 3
3x2 − 3 = 0
3(x − 1)(x + 1) = 0

The stationary points are: {1, −1}.


R
3. Find x ln xdx.
Solution:
x2
u = ln(x); v=
2
dx
du = ; dv = xdx
Z Zx Z
x ln(x)dx = udv = uv − vdu
x2 ln(x)
Z 2
x dx
= −
2 2 x
2 Z
x ln(x) x
= − dx
2 2
x2 ln(x) x2
= − + C.
2 4

4. Find (1 − x2 )−1/2 dx.


R
Solution: Z
1 u
√ du = sin−1 ( ) + c.
a2 − u2 a
Therefore, Z
(1 − x2 )−1/2 dx = sin−1 (x) + c.

 
2 −1
5. Find the determinant of .
−6 2
Solution:
2 −1
−6 2 = (2)(2) − (−1)(−6) = −2

6. Let ~u = (2, 3, 5), ~v = (1, 1, −1). Find

(a) The unit vector in the direction of ~v = (1, 1, −1) is,


1 1 1 1 −1
p (1, 1, −1) = √ (1, 1, −1) = ( √ , √ , √ ).
12 + 12 + (−1)2 3 3 3 3

(b)

3~u − 5~v = 3(2, 3, 5) − 5(1, 1, −1) = (6, 9, 15) − (5, 5, −5) = (1, 4, 20).

(c)
~u · ~v = (2, 3, 5).(1, 1, −1) = 2 + 3 − 5 = 0.
(d)

~j ~k

~i

~u × ~v = 23 5
11 −1

3 5 ~ 2 5 ~ 2 3 ~
= i− j+ k
1 −1 1 −1 1 1
= (3(−1) − (5)(1))~i − (2(−1) − (5)(1))~j + ((2)(1) − (3)(1))~k
= −8~i + 7~j − ~k
= (−8, 7, −1)
7. Determine if ~u = (−2, 2, 1, −1) is perpendicular to ~v = (−2, −3, 1, 1).

Solution:

~u.~v = (−2)(−2) + (2)(−3) + (1)(1) + (−1)(1) = −2 6= 0.

Hence the two vectors are not perpendicular.


Summary
1. The dot product of 2 vectors ~a = (a1 , a2 , a3 ) and ~b = (b1 , b2 , b3 ) in
3-dimensional space is defined to be the scalar

~a.~b = a1 b1 + a2 b2 + a3 b3 .

or
~a.~b = |~a|.|~b| cos θ.

2. Let ~a and ~b be non-zero vectors. The vectors, ~a and ~b, are perpendic-
ular to each other if and only if ~a · ~b = 0.

3. Given 2 vectors ~a = (a1 , a2 , a3 ) and ~b = (b1 , b2 , b3 ) in 3-dimensional space


the cross product ~a × ~b can be defined as follows:
~i ~j ~k

~a × ~b = a1 a2 a3

b1 b2 b3

where ~i = (1, 0, 0), ~j = (0, 1, 0), ~k = (0, 0, 1) are unit vectors along the X,
Y , Z axes respectively.

4. The direction of the cross product ~a × ~b is perpendicular to both


~a and ~b. In particular it points in the direction of your thumb if the
fingers of your right hand curl from ~a to ~b.

5. The magnitude of ~a × ~b is given by |~a|.|~b| sin θ, if θ is the angle between ~a


and ~b. It is the area of the parallelogram determined by ~a and ~b.
References
1. A complete set of notes on Pre-Calculus, Single Variable Calculus, Mul-
tivariable Calculus and Linear Algebra. Here is a link to the chapter on
Vectors.
http://tutorial.math.lamar.edu/Classes/CalcII/VectorsIntro.aspx.

2. A collection of examples, animations and notes on Multivariable Calculus.


http://people.usd.edu/ jflores/MultiCalc02/WebBook/Chapter1 3/Graphics/Chapter131 /Demo

3. Gallery of animated and graphical demonstrations of calculus and related


topics, from the University of Minnesota.
http://www.math.umn.edu/ nykamp/m2374/readings/completelist.html.

4. Links to various resources on Calculus.


http://www.calculus.org/.

5. Engineering Mathematics, by K. A. Stroud.


2 Topics in 3D Geometry
In two dimensional space, we can graph curves and lines. In three dimensional
space, there is so much extra space that we can graph planes and surfaces in
addition to lines and curves. Here we will have a very brief introduction to
Geometry in three dimensions.

2.1 Planes
Just as it is easy to write the equation of a line in 2D space, it is easy to
write the equation of a plane in 3D space.

The point-normal equation of a plane


A vector perpendicular to a plane is said to be normal to the plane and is
called a normal vector, or simply a normal.

To write the equation of a plane we need a point P (x0 , y0 , z0 ) on the plane


and a normal vector ~n = (a, b, c) to the plane.

Let P = (x0 , y0 , z0 ) be a point on the plane and ~n be a vector perpendicular


to the plane. Then a point Q(x, y, z) lies on the plane,

⇔ the vector P~Q lies on the plane,

⇔ P~Q and ~n are perpendicular,

⇔ ~n · P~Q = 0,

⇔ (a, b, c) · (x − x0 , y − y0 , z − z0 ) = 0,

⇔ a(x − x0 ) + b(y − y0 ) + c(z − z0 ) = 0.

Definition. The point-normal equation of a plane that contains the point


P (x0 , y0 , z0 ) and has normal vector ~n = (a, b, c) is

a(x − x0 ) + b(y − y0 ) + c(z − z0 ) = 0.

Example. Let P be a plane determined by the points A = (1, 2, 3), B =


(2, 3, 4), and C = (−2, 0.3). Find a vector which is normal to the plane. Find
an equation of the plane.
Solution: We need a point on the plane and a normal to the plane. The
~ AC
vector AB× ~ = (2, −3, 1) is a normal to the plane and we take A = (1, 2, 3)
as a point on the plane (you can choose B or C instead of A if you want).
The equation on the plane in point-normal form is:
2(x − 1) − 3(y − 2) + (z − 3) = 0
or equivalently,
2x − 3y + z = −1
Observe that the coefficients of x, y and z are (2, −3, 1) which is the normal
to the plane.

2.2 Lines
Vector equation of a line
To write the vector equation of a line, we need a point P (x0 , y0 , z0 ) on the
line and a vector ~v = (a, b, c) that is parallel to the line.

Definition. The vector equation of a line that contains the point P (x0 , y0 , z0 )
and is parallel to the vector ~v = (a, b, c) is:
P + t~v = ~r, where t is scalar.
or,
(x0 , y0 , z0 ) + t(a, b, c) = (x, y, z)
(x0 + ta, y0 + tb, z0 + tc) = (x, y, z)

Parametric equation of a line


The parametric equation of a line is derived from the vector equation of a
line.
Definition. The parametric equation of a line that contains the point P (x0 , y0 , z0 )
and is parallel to the vector ~v = (a, b, c) is:
x = x0 + ta
y = y0 + tb
z = z0 + tc
Example. Let L which passes through the points P (1, 1, 1) and Q(3, 2, 1).
Find a vector which is parallel to the line. Find the vector-equation and
parametric equation of the line.

Solution: The vector P~Q = (2, 1, 0) is parallel to the line and we take the
point P (1, 1, 1) on the line.
The vector equation of the line:

(1, 1, 1) + t(2, 1, 0) = (x, y, z)

The parametric equations of the line:

x = 1 + 2t
y = 1+t
z = 1

Example. Find the equation of the plane which contains the point (0, 1, 2)
and is perpendicular to the line (1, 1, 1) + t(2, 1, 0) = (x, y, z).

2.3 Surfaces
The graph in 3D space of an equation in x, y and z is a surface. Often the
graph is too difficut to draw, but here we sketch the graph of a few special
types of equations whose graphs are easy to visualize.

Cylindrical surfaces
The graph in 3D space of an equation containing only one or two of the three
variables x, y, z is called a cylindrical surface.

Example. Plot y = x2 .

Plot x2 + y 2 = 5.
Quadric Surfaces
The graph in 2D space of a second degree equation in x and y is an ellipse,
parabola or hyperbola. In 3D space, the graph of a second degree equation
in x, y and z is one of six quadric surfaces.
x2 y2 z2
1. Ellipsoid a2
+ b2
+ c2
=1
x2 y2
2. Elliptic Cone z2 = a2
+ b2

x2 y2
3. Elliptic Paraboloid z= a2
+ b2

y2 x2
4. Hyperbolic Parabolid z= b2
− a2

x2 y2 z2
5. Elliptic Hyperboloid of one sheet a2
+ b2
− c2
=1
2 y2 z2
6. Elliptic Hyperboloid of two sheets − xa2 − b2
+ c2
=1

Cross-sections of some quadric surfaces


2.4 Functions of several variables
So far you have studied about functions of one variable, e.g.

f (x) = x + x2 .

You have learned how to graph these functions, perhaps how to determine
their domain and range. You have gone much further in your quest to under-
stand functions; you have learned how to differentiate them, then to calculate
the maximum and minimum, then to integrate them. At the end of this term
(in March) we will learn how to write a function as a sum of simpler func-
tions, i.e. a Fourier Series Expansion of a function.

However now we will learn something new. We will learn about functions
of several variables, e.g.

f (x, y) = x2 + y 2 , g(x, y) = 2xy + 7, h(x, y) = ex + 2y.

You will very quickly see that although the concepts are new, the techniques
are old and familiar.

Graphs and Level Sets


To draw the graph of f (x) = x2 , we drew the graph of the equation

y = x2 .

Similarly, to draw the graph of the equation f (x, y) = x2 + y 2 , we draw the


graph of the equation
z = x2 + y 2 .
We now use the methods developed in the last lecture to draw graphs. Note
however that it is difficult to graph general surfaces.

Remark. The graph of a function f (x) of one variable is the graph of the
equation y = f (x), a curve in 2D space. The graph of a function f (x, y) of
two variables is the graph of the equation z = f (x, y), a surface in 3D space.
The graph of a function f (x, y, z) is a set of points in 4D space and we cannot
draw the graph.
One way to understand functions of two or more variables is by using
level sets.
Example. An example of level sets is a topographic map, which maps hills
and valleys in a region by drawing curves indicating height or elevation. If
h(x, y) is the height function over a region, say the Himalayas, then if we
mark all the points (x, y) on the ground at which the height of the mountain
is h(x, y) = 3000m, we get the level set for L = 3000. If we draw the level
sets for different heights, e.g 2000m, 3000m, 4000m, 5000m, 6000m, 7000m,
8000m, we get a rough topographic map for the Himalayas.
Note that we draw the level sets on the ground, i.e. in the domain.
Definition. We fix a number C. The level set of a function of two variables
f (x, y) is the set of points (x, y) in the domain which satisfy the equation

f (x, y) = C.

For every real number C, we get a level set.


In general, for a fixed number C and a function of several variables f , we
define the level set to be the collection of points in the domain which satisfy
the equation f = C.

Example. Let f (x) = x2 and say the constant c = 1. The level set is the
set of points such that

x2 = 1
x = 1, −1

Hence the level set for c = 1 is {1, -1}.

Example. Find the level sets of the function f (x, y) = x2 +y 2 for C = 1, 4, 9.

Solution: The level sets are

C = 1 : x2 + y 2 = 1; a circle of radius 1
C = 4 : x2 + y 2 = 4; a circle of radius 2
C = 9 : x2 + y 2 = 9; a circle of radius 3
Example. Let f (x, y, z) = x2 + y 2 + z 2 and say the constant c = 1. The
level set is the set of points such that

x2 + y 2 + z 2 = 1

Hence the level set for c = 1 is the set of points lying on the unit sphere.

2.5 Change of Coordinates


Before we describe cylindrical and spherical coordinate systems, we will recall
the polar coordinate system in 2D space.

Polar Coordinates
The polar coordinate system is equivalent to the rectangular coordinate sys-
tem. It locates points using two coordinates r and θ. The coordinate r is the
distance from a point to the origin, and θ is the angle used in trignometry
which measures the counterclockwise rotation from the positive X-axis.

Conversion from rectangular to polar coordinates:


Let (x, y) be a point in the rectangular coordinate system.
p
r = x2 + y 2
y
tan θ =
x
Conversion from polar to rectangular coordinates:
Let (r, θ) be a point in the polar coordinate system.

x = r cos θ
y = r sin θ

Example. Express in polar coordiantes the portion of the unit disc that lies
in the first quadrant.
Solution: The region may be expressed in polar coordinates as

0 ≤ r ≤ 1; 0 ≤ θ ≤ π/2
Example. Express in polar coordinates the function

f (x, y) = x2 + y 2 + 2yx.

Solution: We substitute x = r cos θ and y = r sin θ in f , to get

f (r, θ) = r2 cos2 (θ) + r2 sin2 (θ) + 2r2 sin(θ) cos(θ) = r2 (1 + sin(2θ)).

Cylindrical Coordinates
This is a three dimensional extension of plane polar coordinates. Cylindrical
coordinates are given by the 3-tuple (r, θ, z), the polar coordinates of the
X, Y plane and the rectangular coordinate z. Given a point (x, y, z) in 3-
dimensional space, to calculate r and θ we project the point to the XY-plane
(x, y, z) 7→ (x, y). Then calculate (r, θ) as in the previous section.
Cylindrical to Rectangular Conversion Formulas:

x = r cos θ
y = r sin θ
z = z

Rectangular to Cylindrical Conversion Formulas:


p
r = x2 + y 2
y
θ = tan−1 ( )
x
z = z

Example. 1. Express in cylindrical coordinates the function


p
f (x, y, z) = x2 + y 2 + z 2 − 2z x2 + y 2

2. Express in rectangular coordinates the equation

r = sin θ
Spherical coordinates
Spherical coordinates consist of the 3-tuple (ρ, θ, φ). These are determined
as follows:

1. ρ = the distance from the origin to the point.

2. θ = the same angle that we saw in polar/cylindrical coordinates.

3. φ = the angle between the positive z-axis and the line from the origin to
the point.

Spherical to Rectangular Conversion Formulas:

x = ρ sin(φ) cos(θ)
y = ρ sin(φ) sin(θ)
z = ρ cos φ

Rectangular to Spherical Conversion Formulas:


p
ρ = x2 + y 2 + z 2
y
θ = tan−1 ( )
x
−1 z
φ = cos ( )
ρ

Example. 1. Express in spherical coordinates the function

f (x, y, z) = x2 + y 2 + z 2

2. Express in rectangular coordinates the equation

ρ=5
Summary
1. The point-normal equation of a plane that contains the point P (x0 , y0 , z0 )
and has normal vector ~n = (a, b, c) is
a(x − x0 ) + b(y − y0 ) + c(z − z0 ) = 0.

2. The vector equation of a line that contains the point P (x0 , y0 , z0 ) and
is parallel to the vector ~v = (a, b, c) is:
P + t~v = ~r, where t is scalar.

3. The parametric equation of a line that contains the point P (x0 , y0 , z0 )


and is parallel to the vector ~v = (a, b, c) is:
x = x0 + ta
y = y0 + tb
z = z0 + tc

4. We fix a number C. The level set of a function of two variables f (x, y)


is the set of points (x, y) in the domain which satisfy the equation
f (x, y) = C.

5. The polar coordinate system has two coordinates r and θ. The coor-
dinate r is the distance from a point to the origin, and θ is the angle
used in trignometry which measures the counterclockwise rotation from
the positive X-axis.
Conversion from rectangular to polar coordinates:
Let (x, y) be a point in the rectangular coordinate system.
p
r = x2 + y 2
y
tan θ =
x
Conversion from polar to rectangular coordinates:
Let (r, θ) be a point in the polar coordinate system.
x = r cos θ
y = r sin θ
6. Cylindrical Coordinate System is a three dimensional extension of
plane polar coordinates. Cylindrical coordinates are given by the 3-tuple
(r, θ, z), the polar coordinates of the X, Y plane and the rectangular
coordinate z.
Cylindrical to Rectangular Conversion Formulas:
x = r cos θ
y = r sin θ
z = z

Rectangular to Cylindrical Conversion Formulas:


p
r = x2 + y 2
y
θ = tan−1 ( )
x
z = z

7. Spherical coordinates consist of the 3-tuple (ρ, θ, φ). These are deter-
mined as follows:
1. ρ = the distance from the origin to the point.
2. θ = the same angle that we saw in polar/cylindrical coordinates.
3. φ = the angle between the positive z-axis and the line from the origin
to the point.
Spherical to Rectangular Conversion Formulas:
x = ρ sin(φ) cos(θ)
y = ρ sin(φ) sin(θ)
z = ρ cos φ

Rectangular to Spherical Conversion Formulas:


p
ρ = x2 + y 2 + z 2
y
θ = tan−1 ( )
x
−1 z
φ = cos ( )
ρ
References

1. A complete set of notes on Pre-Calculus, Single Variable Calculus, Mul-


tivariable Calculus and Linear Algebra. Here is a link to the chapter on
Lines, Planes and Quadric Surfaces. Also read the section on cylindrical
and spherical coordiantes.
http://tutorial.math.lamar.edu/Classes/CalcIII/3DSpace.aspx.

2. Another gallery of animated and graphical demonstrations of calculus and


related topics, from the University of Minnesota.
http://www.math.umn.edu/%7Erogness/quadrics/.

3. Links to various resources on Calculus.


http://www.calculus.org/.
3 Partial Derivatives
3.1 First Order Partial Derivatives
A function f (x) of one variable has a first order derivative denoted by f 0 (x)
or
df f (x + h) − f (x)
= lim .
dx h→0 h
It calculates the slope of the tangent line of the function f at x.

A function f (x, y) of two variables has two first order partials ∂f , ∂f .


∂x ∂y
Just like in the one variable case, the partial derivatives are also related to
the tangent plane of the function at a point (x, y)
Definition. ∂f ∂x
is defined as the derivative of f with respect to x with y
treated as a constant.
∂f f (x + h, y) − f (x, y)
= lim .
∂x h→0 h
∂f
∂y
is defined as the derivative of f with respect to y with x treated as a
constant.
∂f f (x, y + k) − f (x, y)
= lim .
∂y k→0 k
∂f
Similarly for f (x, y, z) we can define three first order partial derivatives: ∂x
,
∂f ∂f
, .
∂y ∂z

Example. 1.

f (x, y) = x2 y 3
∂f ∂x2 y 3 ∂x2
= = y3 = 2xy 3
∂x ∂x ∂x
∂f ∂x2 y 3
= = 3x2 y 2
∂y ∂y

2. f (x, y, z) = ze2x+3y+4z
3. f (x, y) = x2 + y 2

Example. Consider the function f (x, y) = x2 /y. Calculate the first order
partial derivatives and evaluate them at the point P (2, 1).
Solution:
x2
f (x, y) =
y
∂f 2x
=
∂x y
∂f
|{x=2,y=1} = 4
∂x
∂f x2
= − 2
∂y y
∂f 1
|{x=2,y=1} = −
∂y 9
Remark. Partial derivatives are used in the same manner as the derivative
of a function of one variable. The partial of f (x, y) with respect to x is
the rate of change (or the slope) of f with respect to x as y stays constant.
Similarly the partial of f (x, y) with respect to y is the rate of change (or the
slope) of f with respect to y as x stays constant. For instance in the above
example, the slope of the function at the point P (2, 1) in the x direction is
4, and the slope of the function at P in the y direction is −1/9.

Theorem Let f (x, y) be a function of two variables and let P = (x0 , y0 , z0 )


be a point on the graph. The equation of the tangent plane to the graph of
f (x, y) at the point P is given by,
∂f ∂f
z − z0 = |{x0 ,y0 } (x − x0 ) + |{x ,y } (y − y0 ).
∂x ∂y 0 0
We will see why this is true after we study about Directional Derivative and
the Gradient. For now we accept it as a result and use it in some examples.
Example. We consider the function f (x, y) = x3 + 2xy − y + 3. Compute
the tangent plane at the point P0 = (1, 2, 6) on the graph of the function.
Solution: (Solution:) We first check that the point P0 (x0 , y0 , z0 ) = (1, 2, 6)
is on the graph of the function:

f (1, 2) = 13 + 2.1.2 − 2 + 3
= 1+4−2+3
= 6
So the point P0 does indeed lie on the graph of f .
Then the equation of the tangent plane is given by;
∂f ∂f
z − z0 = |(x − x0 ) + |{x ,y } (y − y0 ).
∂x {x0 ,y0 } ∂y 0 0

∂f
|{1,2} = 3x2 + 2y|{1,2} = 7
∂x
∂f
|{1,2} = 2x − 1|{1,2} = 1
∂y
z − 6 = 7(x − 1) + 1(y − 2)
= 7x − 7 + y − 2
z = 7x + y − 3
Therefore the tangent plane to the graph of f at (1, 2, 6) is
7x + y − z − 3 = 0.

3.2 Higher Order Partial Derivatives


If f is a function of several variables, then we can find higher order partials
in the following manner.
Definition. If f (x, y) is a function of two variables, then ∂f ∂x
and ∂f
∂y
are
also functions of two variables and their partials can be taken. Hence we can
differentiate them with respect to x and y again and find,

∂2f
∂x2
, the derivative of f taken twice with respect to x,
∂2f
∂x∂y
, the derivative of f with respect to y and then with respect to x,
∂2f
∂y∂x
, the derivative of f with respect to x and then with respect to y,
∂2f
∂y 2
, the derivative of f taken twice with respect to y.

∂ f 3
We can carry on and find ∂x∂y 2 , which is taking the derivative of f first with

respect to y twice, and then differentiating with respect to x, etc. In this


manner we can find nth-order partial derivatives of a function.
∂2f ∂2f
Theorem ∂x∂y
and ∂y∂x
are called mixed partial derivatives. They are equal
∂2f ∂2f
when ∂x∂y
and ∂y∂x
are continuous.

Note. In this course all the fuunctions we will encounter will have equal
mixed partial derivatives.

Example. 1. Find all partials up to the second order of the function

f (x, y) = x4 y 2 − x2 y 6 .

Solution:
∂f
= 4x3 y 2 − 2xy 6
∂x
∂f
= 2x4 y − 6x2 y 5
∂y
∂2f
= 12x2 y 2 − 2y 6
∂x2
∂2f
= 8x3 y − 12xy 5
∂y∂x
∂2f
= 2x4 − 30x2 y 4
∂y 2
∂2f
= 8x3 − 12xy 5
∂x∂y
Notations:
∂f
fx =
∂x
∂f
fy =
∂y
∂2f
fxx =
∂x2
∂2f
fyy =
∂y 2
∂2f
fxy =
∂x∂y
3.3 Chain Rule
You are familiar with the chain rule for functions of one variable: if f is
a function of u, denoted by f = f (u), and u is a function of x, denoted
u = u(x). Then
df df du
= .
dx du dx

Chain Rules for First-Order Partial Derivatives


For a two-dimensional version, suppose z is a function of u and v, denoted

z = z(u, v)

and u and v are functions of x and y,

u = u(x, y) and v = v(x, y)

then
∂z ∂z ∂u ∂z ∂v
= +
∂x ∂u ∂x ∂v ∂y
∂z ∂z ∂u ∂z ∂v
= +
∂y ∂u ∂y ∂v ∂y
oo z PPP
o PPP
ooo
∂z ∂z
∂u
oo PP∂vP
ooo PPP
wooo PP'
u v
 ??? ∂u  ??? ∂y
∂x  ∂x 
∂u ∂v ∂v
 ??∂y  ??
 ??  ??
 ?  
x y x y

Example. 1. Find the first partial derivatives using chain rule.

z = z(u, v)
u = xy
v = 2x + 3y
Solution:
∂z ∂z ∂u ∂z ∂v
= +
∂x ∂u ∂x ∂v ∂x
∂z ∂z
= y +2
∂u ∂v
∂z ∂z ∂u ∂z ∂v
= +
∂y ∂u ∂y ∂v ∂y
∂z ∂z
= x +3
∂u ∂v

Chain Rule for Second Order Partial Derivatives


To find second order partials, we can use the same techniques as first order
partials, but with more care and patience!

Example. Let

z = z(u, v)
u = x2 y
v = 3x + 2y
∂2z
1. Find ∂y 2
.
∂2z
Solution: We will first find ∂y 2
.

∂z ∂z ∂u ∂z ∂v ∂z ∂z
= + = x2 +2 .
∂y ∂u ∂y ∂v ∂y ∂u ∂v
Now differentiate again with respect to y to obtain

∂2z ∂ 2 ∂z ∂ ∂z
2
= (x )+ (2 )
∂y ∂y ∂u ∂y ∂v
∂ ∂z ∂ ∂z
= x2 ( ) + 2 ( )
∂y ∂u ∂y ∂v
Note that z is a function of u and v, and u and v are functions of x and
∂z ∂z
y. Then the partial derivatives ∂u and ∂v are also initially functions of
u and v and eventually functions of x and y. In other words we have
the same dependence diagram as z.
∂ ∂z ∂2z ∂2z
Note. ( )
∂y ∂u
6= ∂y∂u
). Never write ∂y∂u
as it is mathematically mean-
ingless.

∂z
p ∂u NNNN ∂ 2 z
∂2z
∂u2 ppppp NNN∂v∂u
pp pp NNN
NNN
ppp N'
w p
u ? ∂u v ? ∂v
 ?? ∂y  ??? ∂y
∂x 
∂u ∂v
 ??? ∂x
 ??
 ??  ?
  
x y x y

∂z
N
∂2z
ppppp ∂v NNNNN ∂ 22z
∂u∂v
ppp N∂v
NNN
p pp NNN
pw p p N'
u ? ∂u v
 ??? ∂y  ??? ∂y
∂x  ∂x 
∂u ∂v ∂v
 ??  ??
 ??  ??
   
x y x y
Therefore
∂2z 2
2 ∂ z ∂u ∂ 2 z ∂v ∂ 2 z ∂u ∂ 2 z ∂v
= x ( + ) + 2( + )
∂y 2 ∂u2 ∂y ∂v∂u ∂y ∂u∂v ∂y ∂v 2 ∂y
∂2z ∂2z ∂2z ∂2z
= x2 (x2 2 + 2 ) + 2(x2 + 2 2)
∂u ∂v∂u ∂u∂v ∂v
The mixed partials are equal so the answer simplifies to

∂2z 2
4∂ z
2
2 ∂ z ∂2z
= x + 4x + 4 .
∂y 2 ∂u2 ∂u∂v ∂v 2

∂2z
2. Find ∂x∂y

Solution: We have
∂z ∂z ∂u ∂z ∂v ∂z ∂z
= + = x2 +2 .
∂y ∂u ∂y ∂v ∂y ∂u ∂v
Now differentiate again with respect to x to obtain
∂2z ∂ 2 ∂z ∂ ∂z
= (x )+ (2 )
∂x∂y ∂x ∂u ∂x ∂v
∂ ∂z ∂z ∂ ∂z
= x2 ( ) + 2x +2 ( )
∂x ∂u ∂u ∂x ∂v
2 2
∂ z ∂u ∂ z ∂v ∂z ∂ 2 z ∂u ∂ 2 z ∂v
= x2 ( 2 + ) + 2x + 2( + )
∂u ∂x ∂v∂u ∂x ∂u ∂u∂v ∂x ∂v 2 ∂x
∂2z ∂2z ∂z ∂2z ∂2z
= x2 (2xy 2 + 3 ) + 2x + 2(2xy + 3 2)
∂u ∂v∂u ∂u ∂u∂v ∂v
2 2 2
∂ z ∂ z ∂z ∂ z
= 2x3 y 2 + (3x2 + 4xy) ) + 2x +6 2
∂u ∂v∂u ∂u ∂v

3.4 Maxima and Minima


Recall from 1-dimensional calculus, to find the points of maxima and minima
of a function, we first find the critical points i.e where the tangent line is
horizontal f 0 (x) = 0. Then
(i) If f 00 (x) > 0 the gradient is increasing and we have a local minimum.

(ii) If f 00 (x) < 0 the gradient is decreasing and we have a local maximum.

(iii) If f 00 (x) = 0 it is inconclusive.

Critical Points of a function of 2 variables


We are now interested in finding points of local maxima and minima for a
function of two variables.
Definition. A function f (x, y) has a relative minimum (resp. maximum)
at the point (a, b) if f (x, y) ≥ f (a, b) (resp. f (x, y) ≤ f (a, b)) for all points
(x, y) in some region around (a, b)
Definition. A point (a, b) is a critical point of a function f (x, y) if one of
the following is true
(i) fx (a, b) = 0 and fy (a, b) = 0

(ii) fx (a, b) and/or fy (a, b) does not exist.


Classification of Critical Points
We will need two quantities to classify the critical points of f (x, y):

1. fxx , the second partial derivative of f with respect to x.


2
2. H = fxx fyy − fxy the Hessian

If the Hessian is zero, then the critical point is degenerate. If the Hessian
is non-zero, then the critical point is non-degenerate and we can classify the
points in the following manner:

case(i) If H > 0 and fxx < 0 then the critical point is a relative maximum.

case(ii) If H > 0 and fxx > 0 then the critical point is a relative minimum.

case(iii) If H < 0 then the critical point is a saddle point.

Example. Find and classify the critical points of

12x3 + y 3 + 12x2 y − 75y.

Solution: We first find the critical points of the function.

fx = 36x2 + 24xy = 12x(3x + 2y)


fy = 3y 2 + 12x2 − 75 = 3(4x2 + y 2 − 25).

The critical points are the points where fx = 0 and fy = 0

fx = 0
12x(3x + 2y) = 0

Therefore either x = 0 or 3x + 2y = 0. We handle the two cases separately;

case(i) x = 0.
Then substituting this in fy we get fy = 3(y 2 − 25) = 0 implies y = ±5.
case(ii) 3x + 2y = 0.
Then y = −3x/2 and substituting this in fy we get,

9x2
fy = 3(4x2 + − 25)
4
3
= (16x2 + 9x2 − 100)
4
3
= (25x2 − 100)
4
75 2
= (x − 4)
4
fy = 0
75 2
(x − 4) = 0
4
x2 − 4 = 0
x = ±2

Thus we have found four critical points: (0, 5), (0, −5), (2, −3), (−2, 3).
We must now classify these points.

fxx = 72x + 24y = 24(3x + y)


fxy = 24x
fyy = 6y
2
H = fxx fyy − fxy
= (24)(3x + y)(6y) − (24x)2

Points fxx H Type


(0, 5) 120 3600 Minimum
(0, −5) -120 3600 Maximum
(2, −3) 72 -3600 Saddle
(−2, 3) -72 -3600 Saddle
Summary
∂f
1. The First Order Partial Derivative is defined as the derivative of
∂x
f with respect to x, with y treated as a constant.
∂f
Similarly we can define .
∂y
∂f ∂f
2. and are also functions of two variables and hence their partials
∂x ∂y
can be taken. We can differentiate them with respect to x and y again
and find Higher Order Partial Derivatives,
∂2f ∂2f ∂2f ∂2f ∂3f
, , , , , etc.
∂x2 ∂x∂y ∂y∂x ∂y 2 ∂x3
3. Chain Rule for Partial Dericatives. Suppose z is a function of u and
v, denoted z = z(u, v) and u and v are functions of x and y, u = u(x, y)
and v = v(x, y), then
∂z ∂z ∂u ∂z ∂v ∂z ∂z ∂u ∂z ∂v
= + , = +
∂x ∂u ∂x ∂v ∂y ∂y ∂u ∂y ∂v ∂y

4. A point (a, b) is a critical point of a function f (x, y) if one of the following


is true
(i) fx (a, b) = 0 and fy (a, b) = 0
(ii) fx (a, b) and/or fy (a, b) does not exist.
5. We will need two quantities to classify the critical points of f (x, y):
1. fxx , the second partial derivative of f with respect to x.
2
2. H = fxx fyy − fxy the Hessian
If the Hessian is zero, then the critical point is degenerate. If the Hessian
is non-zero, then the critical point is non-degenerate and we can classify
the points in the following manner:
case(i) If H > 0 and fxx < 0 then the critical point is a relative maximum.
case(ii) If H > 0 and fxx > 0 then the critical point is a relative minimum.
case(iii) If H < 0 then the critical point is a saddle point.
References

1. Engineering Mathematics, by K. A. Stroud.

2. Engineering Mathematics, by K. A. Stroud.

3. A complete set of notes on Pre-Calculus, Single Variable Calculus, Mul-


tivariable Calculus and Linear Algebra. Here is a link to the chapter on
Partial Differentiation.
http://tutorial.math.lamar.edu/Classes/CalcIII/PartialDerivatives.aspx.

4. Here is a link to the chapter on Higher Order Partial Differentiation.


http://tutorial.math.lamar.edu/Classes/CalcIII/HighOrderPartialDerivs.aspx.

Here is a link to the chapter on Chain Rules.


http://tutorial.math.lamar.edu/Classes/CalcIII/ChainRule.aspx.

Here is a link to the chapter on Maxima and Minima.


http://tutorial.math.lamar.edu/Classes/CalcIII/RelativeExtrema.aspx

5. A collection of examples, animations and notes on Multivariable Calculus.


http://people.usd.edu/ jflores/MultiCalc02/WebBook/Chapter 15/

6. Links to various resources on Calculus.


http://www.calculus.org/.
4 A little Vector Calculus
4.1 Gradient
Vector Function/ Vector Fields
The functions of several variables we have so far studied would take a point
(x, y, z) and give a real number f (x, y, z). We call these types of functions
scalar-valued functions i.e. for example

f (x, y, z) = x2 + 2xyz.

We are now going to talk about vector-valued functions, where we take a


point (x, y, z) and the value of f (x, y, z) is a vector. i.e. for example

f (x, y, z) = (y, x, z 2 ) = y~i + x~j + z 2~k.

Definition. A vector function is a function that takes one or more variables


and returns a vector.

Example. 1. A vector function of a single variable:

r(t) = (2 + t, 3 + 2t, 1 − 3t).

Let us look at a few values.

f (2) = (4, 7, −5), f (−2)= (0, −1, 7),


f (1) = (3, 5, −2), f (−1) = (1, 1, 4),
f (0) = (2, 3, 1)

2. A vector function of three variables:

f (x, y, z) = (y, x, z 2 ).

Let us look at a few values.

f (0, 0, 0) = (0, 0, 0), f (1, 0, 0)= (0, 1, 0),


f (0, 1, 0) = (1, 0, 0), f (0, 0, 1)= (0, 0, 1),
f (1, 2, 1) = (2, 1, 1), f (2, 1, 2) = (1, 2, 4)
Definition. A vector field is an assignment of a vector to every point in
space.
Example. 1. In a magnetic field, we can assign a vector which describes
the force and direction to every point in space.
2. Normal to a surface. For a surface, at every point on the surface, we
can get a tangent plane and hence a normal vector to every point on the
surface.
Example. Let f (x, y, z) = x2 + 2xyz be a scalar-valued function. We then
define a vector-valued function by taking its partial derivatives.
∂f ∂f ∂f
∇f = ( , , )
∂x ∂y ∂z
= (2x + 2yz, 2xz, 2xy)
This kind of vector function has a special name, the gradient.
Definition. Suppose that f (x, y) is a scalar-valued function of two vari-
ables. Then the gradient of f is the vector function defined as,
∂f ∂f ∂f ∂f
∇f = ( , ) = ~i + ~j.
∂x ∂y ∂x ∂y
Similarly if f (x, y, z) is a scalar-valued function of three variables. Then the
gradient of f is the vector function defined as,
∂f ∂f ∂f ∂f ∂f ∂f ~
∇f = ( , , ) = ~i + ~j + k.
∂x ∂y ∂z ∂x ∂y ∂z
Remark. The gradient of a scalar function is a vector field which points in
the direction of the greatest rate of increase of the scalar function, and whose
magnitude is the greatest rate of change.
Example. Consider the scalar-valued function defined by f (x, y) = x2 + y 2 .
Find the gradient of f at the point x = 2, y = 5.
Solution: Then gradient of f , is a vector function given by,
∂f ∂f
∇f = ( , )
∂x ∂y
= (2x, 2y)
∇f (2, 5) = (4, 10)
4.2 Directional Derivative
For a function of 2 variables f (x, y), we have seen that the function can be
used to represent the surface

z = f (x, y)

and recall the geometric interpretation of the partials:

(i) fx (a, b)-represents the rate of change of the function f (x, y) as we vary
x and hold y = b fixed.

(ii) fy (a, b)-represents the rate of change of the function f (x, y) as we vary
y and hold x = a fixed.

We now ask, at a point P can we calculate the slope of f in an arbitrary


direction?
Recall the definition of the vector function ∇f ,
∂f ∂f 
∇f = , .
∂x ∂y
We observe that,

∇f · î = fx
∇f · ĵ = fy

This enables us to calculate the directional derivative in an arbitrary direc-


tion, by taking the dot product of ∇f with a unit vector, ~u, in the desired
direction.

Definition. The directional derivative of the function f in the direction ~u


denoted by D~u f , is defined to be,
∇f · ~u
D~u f =
|~u|

Example. What is the directional derivative of f (x, y) = x2 + xy, in the


direction ~i + 2~j at the point (1, 1)?
Solution: We first find ∇f .
∂f ∂f
∇f = ( , )
∂x ∂y
= (2x + y, x)
∇f (1, 1) = (3, 1)

Let u = ~i + 2~j. √ √ √
|~u| = 12 + 22 = 1+4= 5.

∇f · ~u
D~u f (1, 1) =
|~u|
(3, 1).(1, 2)
= √
5
(3)(1) + (1)(2)
= √
5
5
= √
5

= 5

Properties of the Gradient deduced from the formula of Directional


Derivatives

∇f · ~u
D~u f =
|~u|
|∇f ||~u| cos θ
=
|~u|
= |∇f | cos θ

1. If θ = 0, i.e. i.e. ~u points in the same direction as ∇f , then D~u f is


maximum. Therefore we may conclude that

(i) ∇f points in the steepest direction.


(ii) The magnitude of ∇f gives the slope in the steepest di-
rection.
2. At any point P , ∇f (P ) is perpendicular to the level set through
that point.

Example. 1. Let f (x, y) = x2 + y 2 and let P = (1, 2, 5). Then P lies on


the graph of f since f (1, 2) = 5. Find the slope and the direction of the
steepest ascent at P on the graph of f

Solution: • We use the first property of the Gradient vector. The


direction of the steepest ascent at P on the graph of f is the direction
of the gradient vector at the point (1, 2).

∂f ∂f
∇f = ( , )
∂x ∂y
= (2x, 2y)
∇f (1, 2) = (2, 4).

• The slope of the steepest ascent at P on the graph of f is the mag-


nitude of the gradient vector at the point (1, 2).
√ √
|∇f (1, 2)| = 22 + 42 = 20.

2. Find a normal vector to the graph of the equation f (x, y) = x2 + y 2 at


the point (1, 2, 5). Hence write an equation for the tangent plane at the
point (1, 2, 5).

Solution: We use the second property of the gradient vector. For a


function g, ∇g(P ) is perpendicular to the level set. So we want our
surface z = x2 + y 2 to be the level set of a function.
Therefore we define a new function,

g(x, y, z) = x2 + y 2 − z.

Then our surface is the level set

g(x, y, z) = 0
x2 + y 2 − z = 0
z = x2 + y 2
∂g ∂g ∂g
∇g = ( , , )
∂x ∂y ∂z
= (2x, 2y, −1)
∇g(1, 2, 5) = (2, 4, −1)

By the above property, ∇g(P ) is perpendicular to the level set g(x, y, z) =


0. Therefore ∇g(P ) is the required normal vector.
Finally an equation for the tangent plane at the point (1, 2, 5) on the
surface is given by

2(x − 1) + 4(y − 2) − 1(z − 5) = 0.

4.3 Curl and Divergence


We denoted the gradient of a scalar function f (x, y, z) as
∂f ∂f ∂f
∇f = ( , , )
∂x ∂y ∂z
∂ ∂ ∂
Let us separate or isolate the operator ∇ = ( ∂x , ∂y , ∂z ). We can then define
various physical quantities such as div, curl by specifying the action of the
operator ∇.

Divergence
Definition. Given a vector field ~v (x, y, z) = (v1 (x, y, z), v2 (x, y, z), v3 (x, y, z)),
the divergence of ~v is a scalar function defined as the dot product of the vector
operator ∇ and ~v ,

Div ~v = ∇ · ~v
∂ ∂ ∂
= ( , , ) · (v1 , v2 , v3 )
∂x ∂y ∂z
∂v1 ∂v2 ∂v3
= + +
∂x ∂y ∂z

Example. Compute the divergence of (x − y)~i + (x + y)~j + z~k.


Solution:

~v = ((x − y), (x + y), z)


∂ ∂ ∂
∇ = ( , , )
∂x ∂y ∂z
Div ~v = ∇ · ~v
∂ ∂ ∂
= ( , , ) · ((x − y), (x + y), z)
∂x ∂y ∂z
∂(x − y) ∂(x + y) ∂z
= + +
∂x ∂y ∂z
= 1+1+1
= 3

Curl
Definition. The curl of a vector field is a vector function defined as the
cross product of the vector operator ∇ and ~v ,

i j k
∂ ∂ ∂
Curl ~v = ∇ × ~v = ∂x ∂y ∂z

v1 v2 v3
∂v3 ∂v2 ∂v3 ∂v1 ∂v2 ∂v1
= ( − )i − ( − )j + ( − )k
∂y ∂z ∂x ∂z ∂x ∂y

Example. Compute the curl of the vector function (x − y)~i + (x + y)~j + z~k.

Solution:

i j k
∂ ∂ ∂

Curl ~v = ∇ × ~v = ∂x ∂y

∂z
(x − y) (x + y) z
∂z ∂(x + y) ∂z ∂(x − y) ∂(x + y) ∂(x − y)
= ( − )i − ( − )j + ( − )k
∂y ∂z ∂x ∂z ∂x ∂y
= (0 − 0)~i − (0 − 0)~j + (1 − (−1))~k
= 2~k
4.4 Laplacian
We have seen above that given a vector function, we can calculate the diver-
gence and curl of that function. A scalar function f has a vector function
∇f associated to it. We now look at Curl(∇f ) and Div(∇f ).

Curl(∇f ) = ∇ × ∇f
∂fz ∂fy ∂fx ∂fz ∂fy ∂fx
= ( − )i + ( − )j + ( − )k
∂y ∂z ∂z ∂x ∂x ∂y
= (fyz − fzy )i + (fzx − fxz )j + (fxy − fyx )k
= 0
Div(∇f ) = ∇ · ∇f
∂ ∂ ∂ ∂f ∂f ∂f
= ( , , )·( , , )
∂x ∂y ∂z ∂x ∂y ∂z
∂2f ∂2f ∂2f
= + +
∂x2 ∂y 2 ∂z 2
Definition. The Laplacian of a scalar function f (x, y) of two variables is
defined to be Div(∇f ) and is denoted by ∇2 f ,
∂2f ∂2f
∇2 f = + .
∂x2 ∂y 2
The Laplacian of a scalar function f (x, y, z) of three variables is defined to
be Div(∇f ) and is denoted by ∇2 f ,
∂2f ∂2f ∂2f
∇2 f = + + .
∂x2 ∂y 2 ∂z 2
Example. Compute the Laplacian of f (x, y, z) = x2 + y 2 + z 2 .
Solution:
∂2f ∂2f ∂2f
∇2 f = + +
∂x2 ∂y 2 ∂z 2
∂2x ∂2y ∂2z
= + +
∂x ∂y ∂z
= 2+2+2
= 6.
We have the following identities for the Laplacian in different coordinate
systems:
∂2f ∂2f ∂2f
Rectangular : ∇2 f = + +
∂x2 ∂y 2 ∂z 2
1 ∂ ∂f  1 ∂ 2 f
P olar : ∇2 f = r + 2 2
r ∂r ∂r r ∂θ
1 ∂ ∂f 1 ∂2f ∂2f
Cylindrical : ∇2 f =

r + 2 2 + 2
r ∂r ∂r r ∂θ ∂z
1 ∂ ∂f 1 ∂ ∂f  1 ∂2f
Spherical : ∇2 f = 2 ρ2

+ 2 sin θ + 2 2
ρ ∂ρ ∂ρ ρ sin θ ∂θ ∂θ ρ sin θ ∂φ2
Example. Consider the same function f (x, y, z) = x2 + y 2 + z 2 . We have
seen that in rectangular coordinates we get

2 ∂2f ∂2f ∂2f


∇f= + 2 + 2 = 6.
∂x2 ∂y ∂z
We now calculate this in cylindrical and spherical coordinate systems, using
the formulas given above.

1. Cylindrical Coordinates.
We have x = r cos θ and y = r sin θ so
f (r, θ, z) = r2 cos2 θ + r2 sin2 θ + z 2 = r2 + z 2 .
Using the above formula:
1 ∂ ∂f  1 ∂ 2 f ∂2f
∇2 f = r + 2 2 + 2
r ∂r ∂r r ∂θ ∂z
1 ∂ ∂(2z)
= (r2r) + 0 +
r ∂r ∂z
1
= (4r) + 2
r
= 4+2
= 6

2. Spherical Coordinates. p
We have x = ρ sin φ cos θ, y = ρ sin φ sin θ, z = ρ cos φ and ρ = x2 + y 2 + z 2 ,
so
f (r, θ, z) = ρ2 .
Using the above formula:

1 ∂ 2 ∂f  1 ∂ ∂f  1 ∂2f
∇2 f = ρ + sin θ +
ρ2 ∂ρ ∂ρ ρ2 sin θ ∂θ ∂θ ρ2 sin2 θ ∂φ2
1 ∂ 2
= (ρ 2ρ) + 0 + 0
ρ2 ∂ρ
1 ∂
= 2
(2ρ3 )
ρ ∂ρ
1
= (6ρ2 )
ρ2
= 6.

These three different calculations all produce the same result because ∇2
is a derivative with a real physical meaning, and does not depend on the
coordinate system being used.
Summary
1. If f (x, y) is a scalar-valued function of two variables. Then the gradient
of f is the vector function defined as,
∂f ∂f ∂f ∂f
∇f = ( , ) = ~i + ~j.
∂x ∂y ∂x ∂y

2. Important Properties about the Gradient.


• ∇f points in the steepest direction.
• The magnitude of ∇f gives the slope in the steepest direc-
tion.
• At any point P , ∇f (P ) is perpendicular to the level set through
that point.
3. The directional derivative of the function f in the direction ~u denoted
by D~u f , is defined to be,
∇f · ~u
D~u f =
|~u|

4. Given a vector field ~v (x, y, z) = (v1 (x, y, z), v2 (x, y, z), v3 (x, y, z)), the di-
vergence of the vector field, ~v is a scalar function defined as the dot
product of the vector operator ∇ and ~v ,
Div ~v = ∇ · ~v
∂ ∂ ∂
= ( , , ) · (v1 , v2 , v3 )
∂x ∂y ∂z
∂v1 ∂v2 ∂v3
= + +
∂x ∂y ∂z

5. The curl of a vector field is a vector function defined as the cross


product of the vector operator ∇ and ~v ,

i j k
∂ ∂ ∂
Curl ~v = ∇ × ~v = ∂x ∂y ∂z
v1 v2 v3
∂v3 ∂v2 ∂v3 ∂v1 ∂v2 ∂v1
= ( − )i − ( − )j + ( − )k
∂y ∂z ∂x ∂z ∂x ∂y
6. We have the following identities for the Laplacian in different coordinate
systems:

∂2f ∂2f ∂2f


Rectangular : ∇2 f = + +
∂x2 ∂y 2 ∂z 2
1 ∂ ∂f  1 ∂ 2 f
P olar : ∇2 f = r + 2 2
r ∂r ∂r r ∂θ
1 ∂ ∂f 1 ∂2f ∂2f
Cylindrical : ∇2 f =

r + 2 2 + 2
r ∂r ∂r r ∂θ ∂z
1 ∂ 2 ∂f 1 ∂ ∂f  1 ∂2f
Spherical : ∇2 f = 2

ρ + 2 sin θ + 2 2
ρ ∂ρ ∂ρ ρ sin θ ∂θ ∂θ ρ sin θ ∂φ2
References

1. A briliant animated example, showing that the maximum slope at a point


occurs in the direction of the gradient vector. The animation shows:

• a surface
• a unit vector rotating about the point (1, 1, 0), (shown as a rotating
black arrow at the base of the figure)
• a rotating plane parallel to the unit vector, (shown as a grey grid)
• the traces of the planes in the surface, (shown as a black curve on
the surface)
• the tangent lines to the traces at (1, 1, f (1, 1)), (shown as a blue
line)
• the gradient vector (shown in green at the base of the figure)

http://archives.math.utk.edu/ICTCM/VOL10/C009/dd.gif

2. A complete set of notes on Pre-Calculus, Single Variable Calculus, Multi-


variable Calculus and Linear Algebra.
Here is a link to the chapter on Directional Derivatives.
http://tutorial.math.lamar.edu/Classes/CalcIII/DirectionalDeriv.aspx.

Here is a link to the chapter on Curl and Divergence.


http://tutorial.math.lamar.edu/Classes/CalcIII/CurlDivergence.aspx
5 Linear Equations
5.1 Introduction
A linear equation is one in which all the unknown variables occur with a
power of one. i.e. 2x + y = 5 is a linear equation, but 2x2 + y = 5 is not a
linear equation, since x has power 2. Systems of linear equations are common
in all branches of science. In school you have learned how to simultaeously
solve a system of two linear equations.

Example. Solve the system of two linear equations:

3x1 + 2x2 = 7
−x1 + x2 = 6

Then x1 = −1 and x2 = 5 is a solution. Geometrically the solution, (−1, 5)


is the point of intersection of the two lines given by the two equations.

Here is an example of larger system of equations from Chemistry.

Example. Under certain controled conditions, mix toulene C7 H8 and nitric


acid HN O3 to produce trinitrotoluene (TNT) C7 H5 O6 N3 along with wa-
ter. In what proportion should those components be mixed, i.e. solve for
x, y, z, w?

Solution: The number of atoms of each element before the reaction must
equal the number present afterward!

xC7 H8 + yHN O3 → zC7 H5 O6 N3 + wH2 O

So we get the system:

7x = 7z
8x + y = 5z + 2w
y = 3z
3y = 6z + w

To find the answer we need to solve the above system of linear equations. In
this chapter we will learn an easy method to solve such a system.
5.2 Row Echelon Form
In this section we learn Gauss’ Method to solve a system of linear equations.
We will use an example to understand the method.

Example. Find the solution set of the linear system

x− y =0
2x − 2y + z + 2w =4
y + w =0
2z + w =5

We can abbreviate this linear system with the linear array of numbers,
whose entries are the coefficients of the equations:
 
1 −1 0 0 0
 2 −2 1 2 4 
 0 1 0 1 0 .
 

0 0 2 1 5

The vertical bar just reminds us that the coefficients of the system are on
the left hand side of the bar and the constants are on the right. We call this
an augmented matrix. We can now proceed with Gauss’ method.
   
1 −1 0 0 0 1 −1 0 0 0
 2 −2 1 2 4  R2 =−2R1 +R2  0 0 1 2 4 

 0 1 0 1 0 
 → 
 0 1 0 1 0 

0 0 2 1 5 0 0 2 1 5
 
1 −1 0 0 0
R2 ↔R3  0 1 0 1 0 
→ 
 0 0 1 2 4 

0 0 2 1 5
 
1 −1 0 0 0
 0 1 0 1
R4 =−2R3 +R4

0 
 0 0 1 2 4 
0 0 0 −3 −3
The resulting equations are:

x−y =0
y + w =0
z + 2w =4
w =1

Back substitution now gives w = 1, z = 2, y = −1, x = −1.

Remark. We transformed the original set of equations into a new simpler


set of equations, using row operations to put the array into Row Echelon
Form. You probably have two questions;

1. Why do both the original system and the new simpler system has the
same solution set?
Answer: There is a theorem which says that, if a linear system is changed
to another by one of these operations

• an equation is swapped with another


• an equation has both sides multiplied by a nonzero constant
• an equation is replaced by the sum of itself and a multiple of another

then the two systems have the same set of solutions.

2. What is Row Echelon Form?


Answer: A matrix is in Row Echelon Form if it has the following form:

• any all zero rows are at the bottom of the reduced matrix
• in a non-zero row, the first-from-left nonzero value is 1
• the first 1 in each row is to the right of the first 1 in the row above

Algorithm to put a matrix into Row Echelon Form


(i) Make the first entry of the first row non-zero by doing a swap if neces-
sary.

(ii) Make the first entry of the first row 1 by multiplying by the reciprocal.
The first row is now in the correct form.
(iii) Make the first entry in all the rows below the first row 0, by using
 the1
1
 0 
 
 0 
in the first row above to subtract. The first column should be   . 

 
 . 
.

(iv) Now we can disregard the first row and first column because they are in
the correct form, and only consider the remaining sub-matrix. Repeat
steps (i)-(iv) for the smaller sub-matrix. Continue till you run out of
rows.

5.3 Soving Systems of Linear Equations


The solution set of a system of linear equations can be;

(i) Empty

(ii) A one point set, i.e. a unique solution

(iii) Infinite

An example where the solution set is empty.

Example. Consider the system

x + 2y = 4
x + 2y = 8

The augmented matrix:


   
1 2 4 R2 =−R1 +R2 1 2 4

1 2 8 0 0 4

We get the absurd relation 0 = 4. This implies that the system has no
solutions. Geometrically the two equations are two parallel lines and hence
there is no point of inersection.
An example where the solution set is infinite.

Example. Consider the linear system of equations

x1 + 2x2 =4
x2 − x3 = 0
x1 + 2x3 = 4.

The associated augmented matrix:


 
1 2 0 4
 0 1 −1 0 
1 0 2 4

We now reduce the matrix to row echelon form:


   
1 2 0 4 1 2 0 4
R =−R +R3
 0 1 −1 0  3 →1  0 1 −1 0 
1 0 2 4 0 −2 2 0
 
1 2 0 4
R3 =2R2 +R3
→  0 1 −1 0 
0 0 0 0

We have a bottom row of zeros. The second row gives x2 = x3 and the first
row gives x1 + 2x2 = 4. We can express x1 in terms of x3 to get x1 = 4 − 2x3 .
Thus we can write the solution set for the system in the following manner:
 
x1
S = { x2  |x2 = x3 and x1 = 4 − 2x3 }
x3
 
4 −2x3
= { x3 }
x3
   
4 −2
= { 0  +  1  x3 |x3 ∈ R}
0 1
Example. Consider the linear system

x+y+ z− w =1
y− z+ w = −1
3x + 6z − 6w =6
−y+ z− w =1

The associated augmented matrix


 
1 1 1 −1 1
 0
 1 −1 1 −1 

 3 0 6 −6 6 
0 −1 1 −1 1

Gauss’ Method:
   
1 1 1 −1 1 1 1 1 −1 1
 0 1 −1 1 −1 
 R3 =−3R +R3  0 1 −1 1 −1 
 →1  
 3 0 6 −6 6   0 −3 3 −3 3 
0 −1 1 −1 1 0 −1 1 −1 1
 
1 1 1 −1 1
R3 =3R2 +R3 ,R4 =R2 +R4  0 1 −1 1 −1 
→ 
 0 0 0

0 0 
0 0 0 0 0

x+y+z−w =1
y − z + w = −1

From the second equation we get, y = −1 + z − w and substituting this in


equation one we get, x + (−1 + z − w) + z − w = 1. Solving for x we get
x = 2 − 2z + 2w. The soution set:
 
x
 y 
S = {  z  |y = −1 + z − w and x = 2 − 2z + 2w}

w
 
2 −2z +2w
 −1 +z −w 
= { }
 z 
w
     
2 −2 2
 −1   1   −1 
= { 0  +  1  z +  0  w|z, w ∈ R}
    

0 0 1
Summary
1. A matrix is in Row Echelon Form if it has the following form:

• any all zero rows are at the bottom of the reduced matrix
• in a non-zero row, the first-from-left nonzero value is 1
• the first non-zero entry in each row is to the right of the first non-zero
entry in the row above

2. Algorithm to put a matrix into Row Echelon Form:

(i) Make the first entry, of the first row, non-zero by doing a swap if
necessary.
(ii) Make the first entry, of the first row 1, by multiplying by the recip-
rocal. The first row is now in the correct form.
 
1
 0 
 
 0 
(iii) The first column should be   . . We do it, by making the first

 
 . 
.
entry in all the rows below the first row 0, by using the 1 in the first
row.
(iv) Now we can disregard the first row and first column because they
are in the correct form, and only consider the remaining sub-matrix.
Repeat steps (i)-(iv) for the smaller sub-matrix. Continue till you
run out of rows.

3. Algorithm to solve a system of equations:

(i) Find the augmented matrix


(ii) Reduce the augmented matrix to row echelon form
(iii) Back substitute to find the solution set.
6 Linear Independence and Gram Schmidt
6.1 Linear Combination of Vectors
A linear combination of vectors in R2 is a sum of the form
   
x1 y1
α +β .
x2 y2
e.g.    
1 0
2 + (−4) .
−1 1
A linear combination of vectors in R3 is a sum of the form
   
x1 y1
α  x2  + β  y2  .
x3 y3
e.g.    
1 0
2  −1  + (−4)  1  .
1 3
     
9 1 2 3
Example. Write  8  as a linear combination of the vectors  2 , −1 , 1 .
3 3 0 −1

Solution: We need to find scalars like x, y, z such that


      
1 2 3 9
x  2  + y  −1  + z  1  =  8 
3 0 −1 3
   
x +2y +3z 9
 2x −y +z  =  8 
3x −z 3
We get a system of three linear equations:

x + 2y + 3z = 9
2x − y + z = 8
3x − z=3
We can use the methods of the last chapter to find the solution set of this
system of linear equations. The associated augmented matrix:
 
1 2 3 9
 2 −1 1 8 
3 0 −1 3

We now reduce the matrix to row echelon form:


   
1 2 3 9 1 2 3 9
R2 =−2R1 +R2,R3 =−3R1 +R3
 2 −1 1 8  →  0 −5 −5 −10 
3 0 −1 3 0 −6 −10 −24
 
R2 = −1 R2
1 2 3 9
5
→  0 1 1 2 
0 −6 −10 −24
 
1 2 3 9
R3 =6R2 +R3
→  0 1 1 2 
0 0 −4 −12
 
−1
R3 = 4 R3
1 2 3 9
→  0 1 1 2 
0 0 1 3

Therefore we get the relations:

z = 3
y = 2 − z = 2 − 3 = −1
x = 9 − 2y − 3z = 9 − 2(−1) − 3(3) = 2

Therefore we write,
       
1 2 3 9
(2)  2  + (−1)  −1  + (3)  1  =  8 
3 0 −1 3
6.2 Linear Independence of Vectors
Definition. A set of vectors S is linearly independent if there is no non-
trivial linear combination of vectors that sums to zero.
Remark. If a set is not linearly independent, it is linearly dependent..
Example. 1. Consider the linear combination which sums to zero,
     
1 0 0
α +β =
0 1 0
     
α 0 0
+ =
0 β 0
   
α 0
=
β 0
α = 0
β = 0
   
1 0
is a unique solution. Therefore the vectors and are linearly
0 1
independent.
2. Consider the linear combination which sums to zero
       
1 0 1 0
x +y +z =
0 1 1 0
       
x 0 z 0
+ ++ =
0 y z 0
   
x+z 0
=
y+z 0
x+z = 0; x = −z
y+z = 0; y = −z
So for any value of z, say z = 1, we can choose x = y = −z = −1 and get
       
1 0 1 0
(−1) + (−1) + (1) =
0 1 1 0
a non-trivial linear
 combination
   which sums
 to zero.
1 0 1
So the vectors , and are not linearly independent.
0 1 1
3. Determine whether the set of vectors {[1, 1, 0], [1, 0, 1], [2, 1, 1]} is linearly
independent.
Solution: Take a linear combination of the vectors which sums to zero.
       
1 1 0 0
x 1
  +y 0  +z  1  =  0 
2 1 1 0
   
x +y +2z 0
 x +z  =  0 
y +z 0
We get a system of three linear equations:

x + y + 2z = 0
x + z=0
y+ z=0

We can use the methods of the last chapter to find the solution set of this
system of linear equations. If x = y = z = 0 is a unique solution, then
the vectors are linearly independent, other wise we can find a dependency.
The associated augmented matrix:
 
1 1 2 0
 1 0 1 0 
0 1 1 0
We now reduce the matrix to row echelon form:
   
1 1 2 0 1 1 2 0
 1 0 1 0  R2 =−R +R2
→1  0 −1 −1 0 
0 1 1 0 0 1 1 0
 
1 1 2 0
R3 =R2 +R3
→  0 1 1 0 
0 0 0 0
We have a bottom row of zeros and hence cannot have a unique solution.
The second row gives y + z = 0 and the first row gives x + y + 2z = 0. So

y = −z
x = −y − 2z = −(−z) − 2z = −z
If we choose z = 1, then x = −1 and y = −1we get a non-trivial linear
combination which sums to zero:
       
1 1 2 0
(−1) 1
  + (−1) 0
  + (1) 1
  =  0 
0 1 1 0

Therefore the set of vectors is not linearly independent.

Theorem Let S = {~v1 , ~v2 , ..., ~vn } be a set of n, n-dimensional vectors, then
the set S is linearly independent if and only if the determinant of the matrix
having the n vectors as columns is non-zero.

Example. 1. We can apply this result to the example above. Let {[1, 1, 0], [1, 0, 1], [2, 1, 1]}
be three, 3-dimensional vectors.

1 1 2

1 0 1 = 1(0 − 1) − 1((1 − 0) + 2(1 − 0) = −1 − 1 + 2 = 0.

0 1 1

Since the determinant is zero, the set of vectors are not linearly indepen-
dent.

2. Consider the set {[1, −1, 1], [1, 0, 1], [1, 1, 2]}.

1 1 1

−1 0 1 = 1(0 − 1) − 1((−1)(2) − (1)(1)) + 1((−1)(1) − 0) = 1 6= 0.

1 1 2

Since the determinant is non-zero, the set of vectors are linearly indepen-
dent.

6.3 Orthogonal and orthonormal vectors


Definition. We say that 2 vectors are orthogonal if they are perpendicular
to each other. i.e. the dot product of the two vectors is zero.

Definition. We say that a set of vectors {~v1 , ~v2 , ..., ~vn } are mutually or-
thogonal if every pair of vectors is orthogonal. i.e.

~vi .~vj = 0, for all i 6= j.


    
1 √1 1

Example. The set of vectors  0  ,  2  ,  − 2  is mutually
−1 1 1
orthogonal.

(1, 0, −1).(1, 2, 1) = 0

(1, 0, −1).(1, − 2, 1) = 0
√ √
(1, 2, 1).(1, − 2, 1) = 0

Definition. A set of vectors S is orthonormal if every vector in S has


magnitude 1 and the set of vectors are mutually orthogonal.

Example. We just checked that the vectors


     
1 √1 1

~v1 =  0  , ~v2 =  2  , ~v3 =  − 2 
−1 1 1

are mutually orthogonal. The vectors however are not normalized (this term
is sometimes used to say that the vectors are not of magnitude 1). Let
   √ 
1 1/ 2
~v1 1
~u1 = = √  0 = 0√ 
|~v1 | 2 −1 −1/ 2
   
√1 √1/2
~v2 1
~u2 = =  2  =  2/2 
|~v2 | 2
1 1/2
   
1
√ 1/2

~v3 1
~u3 = =  − 2  =  − 2/2 
|~v3 | 2
1 1/2

The set of vectors {~u1 , ~u2 , ~u3 } is orthonormal.

Proposition An orthogonal set of non-zero vectors is linearly independent.


6.4 Gram-Schmidt Process
Given a set of linearly independent vectors, it is often useful to convert them
into an orthonormal set of vectors. We first define the projection operator.

Definition. Let ~u and ~v be two vectors. The projection of the vector ~v on


~u is defined as folows:
(~v .~u)
Proj~u~v = ~u.
|~u|2
   
1 1
Example. Consider the two vectors ~v = and ~u = .
1 0
These two vectors are linearly independent.
However they are not orthogonal to each other. We create an orthogonal
vector in the following manner:

~v1 = ~v − (Proj~u~v )
   
(1)(1) + (1)(0) 1 1
Proj~u~v = √ = (1)
2
( 1 +0 ) 2 2 0 0
     
1 1 0
~v1 = − (1) =
1 0 1

~v1 thus constructed is orthogonal to ~u.

The Gram-Schmidt Algorithm:


Let v1 , v2 , ..., vn be a set of n linearly independent vectors in Rn . Then we
can construct an orthonormal set of vectors as follows:
~
u1
Step 1. Let ~u1 = ~v1 . ~e1 = |~
u1 |
.
~
u2
Step 2. Let ~u2 = ~v2 − Proj~u1 ~v2 . ~e2 = |~
u2 |
.
~
u3
Step 3. Let ~u3 = ~v3 − Proj~u1 ~v3 − Proj~u2 ~v3 . ~e3 = |~
u3 |
.
~
u4
Step 4. Let ~u4 = ~v4 − Proj~u1 ~v4 − Proj~u2 ~v4 − Proj~u3 ~v4 . ~e4 = |~
u4 |
.


Example. We will apply the Gram-Schmidt algorithm to orthonormalize
the set of vectors
     
1 1 1
~v1 =  −1 , ~v2 =
  0 , ~v3 =
  1 .
1 1 2
To apply the Gram-Schmidt, we first need to check that the set of vectors
are linearly independent.

1 1 1

−1 0 1 = 1(0 − 1) − 1((−1)(2) − (1)(1)) + 1((−1)(1) − 0) = 1 6= 0.

1 1 2
Therefore the vectors are linearly independent.
Gram-Schmidt algorithm:
Step 1. Let
 
1
~u1 = ~v1 =  −1 
1
 
1
~u1 1
~e1 = = √  −1  .
|~u1 | 3 1

Step 2. Let
~u2 = ~v2 − Proj~u1 ~v2
   
1 1
(1, 0, 1).(1, −1, 1)   = 2  −1 
Proj~u1 ~v2 = −1
12 + (−1)2 + 12 3
1 1
   
1 1
~u2 =  0  − 2  −1 
3
1 1
 
1/3
=  2/3 
1/3
 
1/3
~u2 3
~e2 = = √  2/3  .
|~u2 | 6 1/3
Step 3. Let

~u3 = ~v3 − Proj~u1 ~v3 − Proj~u2 ~v3


   
1 1
(1, 1, 2).(1, −1, 1)  2
Proj~u1 ~v3 = 2 2 1
−1  =  −1 
1 + (−1) + 1 3
1 1
   
1/3 1/3
(1, 1, 2).(1/3, 2/3, 1/3)  5
Proj~u2 ~v3 = 2/3  =  2/3 
(1/3)2 + (2/3)2 + (1/3)2 2
1/3 1/3
     
1 1 1/3
2 5
~u3 =  1  −  −1  −  2/3 
3 2
2 1 1/3
 
−1/2
=  0 
1/2
 
√ −1/2
~u3
~e3 = = 2 0 .
|~u3 |
1/2

Example. Consider the vectors {[3, 0, 4], [−1, 0, 7], [2, 9, 11]} Check that the
vectors are linearly independent and use the Gram-Schmidt process to find
orthogonal vectors.

Ans. {[3, 0, 4], [−4, 0, 3], [0, 9, 0]} Check that the vectors are mutually orthog-
onal.
Summary
We assume ~v1 , ~v2 , ~v3 are vectors and a, b, c are real numbers for this summary.

1. To write vector ~v as a linear combination of vectors ~v1 , ~v2 , ~v3 , we have to


find real numbers a, b, c such that

~v = a~v1 + b~v2 + c~v3 .

This amounts to solving a system to linear equations.


2. A set of vectors ~v1 , ~v2 , ~v3 is linearly independent if the only solution of the
vector equation
a~v1 + b~v2 + c~v3 = ~0,
is a = b = c = 0.
If one can find a non-zero solution, then the vectors are linearly dependent.
i.e. we can express one vector as a linear combination of the remaining
vectors.
3. We say that a set of vectors {~v1 , ~v2 , ..., ~vn } are mutually orthogonal if every
pair of vectors is orthogonal. i.e.

~vi .~vj = 0, for all i 6= j.

4. A set of vectors S is orthonormal if every vector in S has magnitude 1


and the set of vectors are mutually orthogonal.
5. The Gram-Schmidt Algorithm:
Let v1 , v2 , ..., vn be a set of n linearly independent vectors in Rn . Then
we can construct an orthonormal set of vectors as follows:
~
u1
Step 1. Let ~u1 = ~v1 . ~e1 = |~
u1 |
.
~
u2
Step 2. Let ~u2 = ~v2 − Proj~u1 ~v2 . ~e2 = |~
u2 |
.
~
u3
Step 3. Let ~u3 = ~v3 − Proj~u1 ~v3 − Proj~u2 ~v3 . ~e3 = |~
u3 |
.
~
u4
Step 4. Let ~u4 = ~v4 − Proj~u1 ~v4 − Proj~u2 ~v4 − Proj~u3 ~v4 . ~e4 = |~
u4 |
.


7 Matrices
7.1 Introduction
Definition. A m × n matrix is a rectangular array of numbers having m
rows and n columns.
 
a11 a12 . . . a1n
 a21 a22 . . . a2n 
 
 a31 a32 . . . a3n 
A=
 .

 . . . 

 . . . . 
am1 am2 . . . amn

The element aij represents the entry in the ith row and jth column. We
sometimes denote A by (aij )m×n .

7.2 Matrix Operations


Addition
We can only add two matrices of the same dimension i.e. same number of
rows and columns. We then add element-wise.

Example.      
2 1 4 3 6 4
+ = .
1 3 5 6 6 9

Scalar Multiplication
If c is a real number and A = (aij )m×n is a matrix then cA = (caij )m×n .

Example.    
2 1 10 5
5 = .
1 3 5 15

Matrix Multiplication
Given a matrix A = (aij )m×n and a matrix B = (bij )r×s , we can only multiply
them if n = r. In such a case the multiplication is defined to be the matrix
C = (cij )m×s as follows:
n
X
cij = aik bkj .
k=1

We may view the cij th element as the dot product of the ith row of the
matrix A and jth column of the matrix B.

Example. 1.
    
2 1 4 3 (2)(4) + (1)(5) (2)(3) + (1)(6)
=
1 3 5 6 (1)(4) + (3)(5) (1)(3) + (3)(6)
 
13 12
= .
19 21
    
1 2 3 2 0 0 4 12 4
2.  2 1 0   1 3 2  =  5 3 2 
0 3 1 0 2 0 3 11 6
7.3 Special matrices
(a) A matrix is called a square matrix if the number of rows is equal to the
number of columns.

(b) The transpose of a square matrix A = (aij ) is the matrix AT = (aji ).


The rows of A become the columns of AT .
 
1 2 3
Example. The transpose of A =  2 1 0  is
0 3 1
 
1 2 0
AT =  2 1 3  .
3 0 1

(c) A square matrix is said to be symmetric if A = AT , i.e. aij = aji , ∀i, j.

(d) A square matrix is said to be a diagonal matrix, if all the non-diagonal


elements in the matrix are zero.
 
3 0 0
e.g.  0 −4 0  .
0 0 1

(e) A square matrix is said to be the identity matrix, if it is a diagonal


matrix and all the non-zero elements are 1.
 
1 0 0
e.g.  0 1 0  .
0 0 1

The identity marix has the property that

AI = A = IA

for all square matrices A.


7.4 Determinants
Definition. The determinant of a 2 × 2 matrix is defined as follows:

a b
c d = ad − bc.

Definition. The determinant of a 3 × 3 matrix is defined as follows:



a11 a12 a13
a21 a22 a23 = a11 a22 a23 − a12 a21 a23 + a13 a21 a22

a32 a33 a31 a33 a31 a33
a31 a32 a33
= a11 (a22 a33 − a32 a23 ) − a12 (a21 a33 − a31 a23 ) + a13 (a21 a32 − a31 a22 )

Definition. The definition of determinants can be generalized for a square


matrix of any size. Let A = (aij ) be a square n × n matrix. Fix a row i.
Then the determinant is defined to be,
n
X
det (A) = (−1)i+j aij det (Aij ),
j=1

where Aij is the minor matrix obtained by deleting row i and column j. It is
called the ij Cof actor of A. T

Example. Find the determinant of


 
1 1 0 −1
 0 −1 1 4 
A=  0 0
.
2 2 
1 0 1 2
Choose row i = 1.
 
−1 1 4
A11 =  0 2 2 
0 1 2
det (A11 ) = −2
 
0 1 4
A12 = 0 2 2 
1 1 2
det (A11 ) = −6
 
0 −1 4
A13 = 0 0 2 
1 0 2
det (A11 ) = −2
 
0 −1 1
A14 = 0 0 2 
1 0 1
det (A11 ) = − 2
X n
det (A) = (−1)1+j a1j det (A1j )
j=1

= (−1)2 (1)det (A11 ) + (−1)3 (1)det (A12 ) + (−1)4 (0)det (A13 ) + (−1)5 (−1)det (A14 )
= (1)(1)(−2) + (−1)(1)(−6) + (1)(0)(−2) + (−1)(−1)(−2)
=2
7.5 Inverses
Definition. A matrix I which has 1’s on the diagonal and 0’s everywhere
else is called the identity matrix. This matrix has the property that AI = A.
 
1 0
Example. 1. A 2 × 2 identity matrix has the form I = .
0 1
 
1 0 0
2. A 3 × 3 identity matrix has the form I =  0 1 0 .
0 0 1
Definition. Let A = (aij ) be a square matrix. A matrix B = (bij ) is called
the inverse of A if
AB = BA = I.
Remark. A matrix A has an inverse if and only if det(A) 6= 0.

Inverse of a 2 × 2 matrix
 
a b
Definition. The inverse of a 2 × 2 matrix, A = is given by,
c d
 
−1 1 d −b
A = .
−c a
ad − bc
 
1 1
Example. Find the inverse of the matrix A = .
1 2
Solution: We first check that the inverse of A exists!
det(A) = (1)(2) − (1)(1) = 1 6= 0.
Hence the inverse of A must exist and is given by
 
−1 1 2 −1
A = .
1 −1 1
We check that this is correct by multiplying A−1 A to see if we get the
identity matrix.
    
2 −1 1 1 (2)(1) + (−1)(1) (1)(1) + (−1)(2)
=
−1 1 1 2 (−1)(1) + (1)(2) (−1)(1) + (1)(2)
 
1 0
= .
0 1
We can find the inverse in another way: Write as (A|I2 )
 
1 1 1 0
.
1 2 0 1

Then use matrix row operations to get it into the form (I2 |B).
 
1 0 a b
.
0 1 c d

You will find that B = A−1 .


     
1 1 1 0 R2 =R2 −R1 1 1 1 0 R1 =R1 −R2 1 0 2 −1
→ →
1 2 0 1 0 1 −1 1 0 1 −1 1

Observe that the matrix obtained


   
a b 2 −1
=
c d −1 1

is the inverse of A.

Inverse of a 3 × 3 matrix
The method is the same as in the 2 × 2 case:
• Check determinant A is non-zero.

• Rewrite as (A|I3 ).

• Use row operations to put in the form (I3 |A−1 ).


 
1 −1 1
Example. Let A =  0 −2 1 . Find A−1 .
−2 −3 0
1. Check that det(A) 6= 0.

det(A) = 1(0 − (−3)(1)) − (−1)(0 − (−2)(1)) + 1(0 − (−2)(−2))


= 3+2−4
= 1 6= 0

Therefore the inverse exists.


2. Rewrite as (A|I3 ).  
1 −1 1 1 0 0
 0 −2 1 0 1 0  .
−2 −3 0 0 0 1

3. Use row operations to put in the form (I3 |A−1 ).


   
1 −1 1 1 0 0 1 −1 1 1 0 0
 0 −2 1 0 1 0  R3 =R→ 3 +2R1
 0 −2 1 0 1 0 
−2 −3 0 0 0 1 0 −5 2 2 0 1
 
−1
R2 = 2 R2
1 −1 1 1 0 0
→  0 1 −1/2 0 −1/2 0 
0 −5 2 2 0 1
 
1 −1 1 1 0 0
R3 =R3 +5R2
→  0 1 −1/2 0 −1/2 0 
0 0 −1/2 2 −5/2 1
 
1 −1 1 1 0 0
R3 =−2R3
→  0 1 −1/2 0 −1/2 0 
0 0 1 −4 5 −2
 
R2 =R2 + 12 R3
1 −1 1 1 0 0
→  0 1 0 −2 2 −1 
0 0 1 −4 5 −2
 
1 −1 0 5 −5 2
R1 =R1 −R3
→  0 1 0 −2 2 −1 
0 0 1 −4 5 −2
 
1 0 0 3 −3 1
R1 =R1 +R2
→  0 1 0 −2 2 −1 
0 0 1 −4 5 −2

4.  
3 −3 1
A−1 =  −2 2 −1  .
−4 5 −2
Summary
1. The determinant of a 2 × 2 matrix is defined as follows:

a b
c d = ad − bc.

2. The determinant of a 3 × 3 matrix is defined as follows:



a11 a12 a13
a22 a23 a21 a23 a21 a22
a32 a33 − a12 a31 a33 + a13 a31 a33
a21 a22 a23 = a11

a31 a32 a33
= a11 (a22 a33 − a32 a23 ) − a12 (a21 a33 − a31 a23 ) + a13 (a21 a32 − a31 a22 )

3. The method to find the inverse of a matrix:

• Check determinant A is non-zero.


• Rewrite as (A|I3 ).
• Use row operations to put in the form (I3 |A−1 ).
8 Eigenvalues, Eigenvectors and Diagonaliza-
tion
8.1 Introduction
Definition. Suppose that A is an n × n square matrix. Suppose also that
~x is a non-zero vector in Rn and that λ is a scalar so that,
A~x = λ~x.
We then call ~x an eigenvector of A and λ an eigenvalue of A.
 
4 2
Example. Suppose A = .
  1 3
2
Then is an eigenvector associated to the eigenvalue 5 because
1
    
4 2 2 2
A~x = =5 = 5~x.
1 3 1 1
 
−1
Also is an eigenvector associated to the eigenvalue 2 because
1
    
4 2 −1 −1
A~x = =2 = 2~x.
1 3 1 1
In this chapter we will learn how to find these eigenvalues and eigenvec-
tors.

8.2 Method to find eigenvalues and eigenvectors


We start with A~x = λ~x and rewrite it as follows,
A~x = λI~x
λI~x − A~x = 0
(λI − A)~x = 0

Theorem λ is an eigenvalue of A if and only if λI − A is not invertible if


and only if det (λI − A) = 0.
Steps to find eigenvalues and eigenvectors:
1. Form the characteristic equation
det(λI − A) = 0.

2. To find all the eigenvalues of A, solve the characteristic equation.


3. For each eigenvalue λ, the corresponding set of eigenvectors are the non-
zero solutions the linear system of equations
(λI − A)~x = 0

Example. Find the eigenvalues and eigenvectors of A.


 
4 2
1. A = .
1 3
Solution: (i) The characteristic equation of A.
We must first find the matrix λI − A.
   
1 0 4 2
λI − A = λ −
0 1 1 3
   
λ 0 4 2
= −
0 λ 1 3
 
λ−4 2
=
1 λ−3
The characteristic equation:
det(λI − A) = 0

λ−4 2
= 0
1 λ−3
(λ − 4)(λ − 3) − 2 = 0
λ2 − 7λ + 12 − 2 = 0
λ2 − 7λ + 10 = 0
(ii) To find all the eigenvalues of A, solve the characteristic equa-
tion.
λ2 − 7λ + 10 = 0
(λ − 5)(λ − 2) = 0
So we have two eigenvalues λ1 = 2, λ2 = 5.
(iii) For each eigenvalue λ, the corresponding set of eigenvectors
are the non-zero solutions of the linear system of equations
given by,
(λI − A)~x = 0.
case(i) λ1 = 2.
(2I − A)~x = 0
    
2 − 4 −2 x1 0
=
−1 2 − 3 x2 0
    
−2 −2 x1 0
=
−1 −1 x2 0
   
−2x1 − 2x2 0
=
−x1 − x2 0
We get two equations:
2x1 + 2x2 = 0
x1 + x2 = 0
Both equations give the relation x1 = −x2 . Therefore the set of
eigenvectors corresponding to λ = 2 is given by:
 
x1
{ |x1 = −x2 }
x2
 
−x2
{ }
x2
 
−1
{ x2 |x2 is a non-zero real number.}
1
 
−1
An eigenvector corresponding to λ1 = 2 is .
1
case(i) λ2 = 5.
(5I − A)~x = 0
    
5 − 4 −2 x1 0
=
−1 5 − 3 x2 0
    
1 −2 x1 0
=
−1 2 x2 0
   
x1 − 2x2 0
=
−x1 + 2x2 0
We get two equations:

x1 − 2x2 = 0
−x1 + 2x2 = 0

Both equations give the relation x1 = 2x2 . Therefore a general


eigenvector corresponding to λ2 = 2 is,
 
x1
{ |x1 = 2x2 }
x2
 
2x2
{ }
x2
 
2
{ x2 |x2 is a non-zero real number.}
1
 
2
An eigenvector corresponding to λ2 = 5 is .
1

2. An example of three distinct eigenvalues.


 
4 0 1
A =  −1 −6 −2  .
5 0 0

Solution: (i) Form the Characteristic Equation.


The characteristic equation is:

det (λI − A) = 0

λ−4 0 −1

1 λ + 6 2 = 0

−5 0 λ
(λ − 4)((λ + 6)(λ) − 0) − 1(0 − (−5)(λ + 6)) = 0
λ3 + 2λ2 − 29λ − 30 = 0

(ii) Find the eigenvalues.


We need to solve the characteristic equation. i.e. we need to factor-
ize the characteristic polynomial. We can factorize it by either using
long division or by directly trying to spot a common factor.

Method 1: Long Division.


We want to factorize this cubic polynomial. In general it is quite dif-
ficult to guess what the factors may be. We try λ = ±1, ±2, ±3, etc.
and hope to quickly find one factor. Let us try λ = −1. We divide
the polynomial λ3 + 2λ2 − 29λ − 30 by λ + 1, to get,

λ2 +λ −30
3 2
λ + 1 |λ +2λ −29λ −30;
−λ3 +λ2
λ2 −29λ −30
− λ2 +λ
−30λ −30
− −30λ −30
0
The quotient is λ2 + λ − 30.
The remainder is 0.
Therefore λ3 + 2λ2 − 29λ − 30 = (λ + 1)(λ2 + λ − 30) + 0.
λ3 + 2λ2 − 29λ − 30 = 0
(λ + 1)(λ2 + λ − 30) = 0
(λ + 1)(λ + 6)(λ − 5) = 0
Therefore the eigenvalues are: {−1, −6, 5}.

Method 2: Direct factorization by spotting common factor.



λ−4 0 0

1 λ + 6 2 = 0

−5 0 λ
(λ − 4)((λ + 6)(λ) − 0) − 1(0 − (−5)(λ + 6)) = 0
(λ − 4)((λ + 6)(λ)) − 5(λ + 6) = 0
(λ + 6)(λ(λ − 4) − 5) = 0
(λ + 6)(λ2 − 4λ − 5) = 0
(λ + 6)(λ − 5)(λ + 1) = 0
Therefore the eigenvalues of A are: λ1 = −1, λ2 = −6 and λ3 = 5.
(iii) Find Eigenvectors corresponding to each Eigenvalue:

We now need to find eigenvectors corresponding to each eigenvalue.


case(i) λ1 = −1.
The eigenvectors are the solution space of the following system:
    
−5 0 −1 x1 0
 1 5 2   x2  =  0 
−5 0 −1 x3 0
−1
−5x1 − x3 = 0; x1 = x3
5
−9
x1 + 5x2 + 2x3 = 0; x2 = x3
25
The set of eigenvectors corresponding to λ1 = −1 is,
 
x1
−1 −9
{  x2  |x1 = x 3 , x2 = x3 }
5 25
x3
 −1 
x
5 3
−9
{  x  |x3 is a non-zero real number}
25 3
x3
 −1 
5
−9
{ x3  25
 |x3 is a nn-zero real number}
1
−1
 
5
−9
An eigenvector corresponding to λ1 = −1 is  25
.
1
case(ii) λ2 = −6.
The eigenvectors are the solution space of the following system:
    
−10 0 −1 x1 0
 1 0 2   x2  =  0 
−5 0 −6 x3 0
−10x1 − x3 = 0; x3 = −10x1
x1 + 2x3 = 0; x1 = −2x3 = −2(−10)x1
x1 = 0 = x3
The set of eigenvectors corresponding to λ2 = −6 is,
 
x1
{  x2  |x1 = x3 = 0}
x3
 
0
{  x2  |x2 is a non-zero real number}
0
 
0
{ x2  1  |x2 is a non-zero real number}
0
 
0
An eigenvector corresponding to λ2 = −6 is  1 .
0
case(iii) λ3 = 5.
The eigenvectors are the solution space of the following system:
    
1 0 −1 x1 0
 1 11 2   x2  =  0 
−5 0 5 x3 0
x1 = x3
−3
x2 = x3
11
The set of eigenvectors corresponding to λ3 = 5 is,
 
x1
−3
{  x2  |x1 = x3 , x2 = x3 }
11
x3
 
x3
{  −3 x  |x3 is a non-zero real number}
11 3
x3
 
1
{ x3  −3 11
 |x3 is a non-zero real number}
1
 
1
An eigenvector corresponding to λ3 = 5 is  −3 11
.
1
3. An example of repeated eigenvalue having two eigenvectors.
 
0 1 1
A =  1 0 1 .
1 1 0

Solution: (i) Form the Characteristic Equation.


The characteristic equation is:

λ −1 −1

−1 λ −1 = 0

−1 −1 λ
λ(λ2 − (−1)(−1)) − (−1)((−1)λ − (−1)(−1)) − 1((−1)λ − (−1)(−1)) = 0
λ3 − 3λ2 − 2 = 0

(ii) Find the eigenvalues.


We need to solve the characteristic equation. i.e. we need to factor-
ize the characteristic polynomial. We can factorize it by either using
long division or by directly trying to spot a common factor.

Method 1: Long Division.


We want to factorize this cubic polynomial. In general it is quite dif-
ficult to guess what the factors may be. We try λ = ±1, ±2, ±3, etc.
and hope to quickly find one factor. Let us try λ = −1. We divide
the polynomial λ3 + 2λ2 − 29λ − 30 by λ + 1, to get,

λ2 −λ −2
3
λ + 1 |λ −3λ −2;
−λ3 +λ2
−λ2 −3λ −2
− −λ2 −λ
−2λ −2
− −2λ −2
0
The quotient is λ2 − λ − 2.
The remainder is 0.
Therefore λ3 − 3λ − 2 = (λ + 1)(λ2 − λ − 2) + 0.

λ3 − 3λ − 2 = 0
(λ + 1)(λ2 − λ − 2) = 0
(λ + 1)(λ + 1)(λ − 2) = 0

Therefore the eigenvalues are: {−1, 2}.

Method 2: Direct factorization by spotting common factor.


λ −1 −1

−1 λ −1 = 0

−1 −1 λ
λ(λ2 − (−1)(−1)) − (−1)((−1)λ − (−1)(−1)) − 1((−1)λ − (−1)(−1)) = 0
λ(λ2 − 1) + 1(−λ − 1) − 1(1 + λ) = 0
λ(λ − 1)(λ + 1) − 1(λ + 1) − 1(1 + λ) = 0
(λ + 1)(λ(λ − 1) − 1 − 1) = 0
(λ + 1)(λ2 − λ − 2) = 0
(λ + 1)2 (λ − 2) = 0

Therefore the eigenvalues of A are: {−1, 2}.


(iii) Find Eigenvectors corresponding to each Eigenvalue:

We now need to find eigenvectors corresponding to each eigenvalue.


case(i) λ1 = −1.
The eigenvectors are the solution space of the following system:
    
−1 −1 −1 x1 0
 −1 −1 −1   x2  =  0 
−1 −1 −1 x3 0
−x1 − x2 − x3 = 0; x1 = −x2 − x3
The set of eigenvectors corresponding to λ1 = −1 is,
 
x1
{  x2  |x1 = −x2 − x3 }
x3
 
−x2 − x3
{  x2  | at least one of x2 and x3 is a non-zero real number}
x3
   
−1 −1
{ x2  1  + x3  0  | at least one of x2 and x3 is a non-zero real number}
0 1

We therefore can get two linearly independent eigenvectors cor-


responding
  λ1 = −1:
to  
−1 −1
~v1 =  1 , ~v2 =  0 .
0 1
case(ii) λ2 = 2.
The eigenvectors are the solution space of the following system:
    
2 −1 −1 x1 0
 −1 2 −1   x2  =  0 
−1 −1 2 x3 0
2x1 − x2 − x3 = 0
−x1 + 2x2 − x3 = 0
−x1 − x2 + 2x3 = 0

Since this system of equations looks fairly complicated, it may


be a good idea to use row reduction to simplify the system.
   
2 −1 −1 0 −1 −1 2 0
↔R1
 −1 2 −1 0  R3→  −1 2 −1 0 
−1 −1 2 0 2 −1 −1 0
 
1 1 −2 0
R1 =(−1)R1
→  −1 2 −1 0 
2 −1 −1 0
 
1 1 −2 0
R =R2 −2R1
−−3−−− −−→  0 3 −3 0 
R2 =R1 +R2
0 −3 3 0
 
1 1 −2 0
R2 =1/3R2
→  0 1 −1 0 
0 −3 3 0
 
1 1 −2 0
R3 =R3 +3R2
→  0 1 −1 0 
0 0 0 0
The resulting set of equations is:

x2 − x3 = 0 =⇒ x2 = x3
x1 + x2 − 2x3 = 0 =⇒ x1 = x3

The set of eigenvectors corresponding to λ2 = 2 is,


 
x1
{  x2  |x1 = x2 = x3 }
x3
 
x3
{  x3  |x3 is a non-zero real number}
x3
 
1
{ x3  1  |x3 is a non-zero real number}
1
 
1
An eigenvector corresponding to λ2 = 2 is  1 .
1
4. An example of a repeated eigenvalue having only one linearly independent
eigenvector.  
6 3 −8
A =  0 −2 0  .
1 0 −3
Solution: The characteristic polynomial for A:

λ − 6 −3 8

0 λ+2 0 = 0

−1 0 λ+3
(λ − 6)((λ + 2)(λ + 3) − 0) − 3(0 − 0) + 8(0 − (−1)(λ + 2)) = 0
λ3 − λ2 − 16λ − 20 = 0
(λ + 2)(λ2 − 3λ − 10) = 0
(λ + 2)2 (λ − 5) = 0
Therefore the eigenvalues of A are: λ1 = −2, λ2 = 5.
case(i) λ1 = −2.

The eigenvectors are the solution space of the following system:


    
−8 −3 8 x1 0
 0 0 0   x2  =  0 
−1 0 1 x3 0
−x1 + x3 = 0 =⇒ x1 = x3
−8x1 − 3x2 + 8x3 = 0 =⇒ x2 = 0
The set of eigenvectors corresponding to λ1 = −2 is,
 
x1
{  x2  |x1 = x3 , x2 = 0}
x3
 
x3
{  0  |x3 is a non-zero real number}
x3
 
1
{ x3  0  |x3 is a non-zero real number}
1
 
1
An eigenvector corresponding to λ1 = −2 is  0 .
1
case(ii) λ2 = 5.
The eigenvectors are the solution space of the following system:
    
−1 −3 8 x1 0
 0 7 0   x2  =  0 
−1 0 8 x3 0
−x1 + 8x3 = 0 =⇒ x1 = 8x3
7x2 = 0 =⇒ x2 = 0

The set of eigenvectors corresponding to λ2 = 5 is,


 
x1
{  x2  |x1 = 8x3 , x2 = 0}
x3
 
8x3
{  0  |x3 is a non-zero real number}
x3
 
8
{ x3  0  |x3 is a non-zero real number}
1
 
8
An eigenvector corresponding to λ2 = 5 is  0 .
1

5. An example of an eigenvalue repeated three times having only two linearly


independent eigenvectors.
 
4 0 −1
A =  0 3 0 .
1 0 2
Solution: The characteristic equation for A:

det (λI − A) = 0

λ−4 0 1

0
λ−3 0 = 0
−1 0 λ−2
λ3 − 9λ2 + 27λ − 27 = 0
(λ − 3)3 = 0

Therefore A has only one eigenvalue: λ = 3.


We now find the eigenvectors corresponding to λ = 3:
    
−1 0 1 x1 0
 0 0 0   x2  =  0 
−1 0 1 x3 0
−x1 + x3 = 0 =⇒ x1 = x3

The set of eigenvectors corresponding to λ = 3 is,


 
x1
{  x2  |x1 = x3 }
x3
 
x3
{  x2  | at least one of x2 and x3 is a non-zero real number}
x3
   
0 1
{ x2  1  + x3  0  | at least one on x2 and x3 is a non-zero real number}
0 1

Therefore we can find two linearly independent eigenvectors corresponding


to the eigenvalue λ = 3:
   
0 1
 1  and  0  .
0 1
8.3 Dimension of Eigenspace
This section is optional for the exam. Here we will see why a repeated
eigenvalue can have one or more linearly independent eigenvectors associated
to it. First some definitions.
Definition. The rank of a matrix is defined to be the number of linearly
independent rows or columns.

Remark. We use the following results to determine the rank of a matrix.


• Let A be a square matrix and let Aechelon be the row echelon form of
the matrix. Then both A and Aechelon have the same number of linearly
independent rows.
• Therefore, Rank(A) = Rank(Aechelon ).
• But Rank(Aechelon )= number of linearly independent rows of Aechelon
= number of non-zero rows of Aechelon
• We may conclude that
Rank(A) = Rank(Aechelon ) = number of non-zero rows in Aechelon .
   
−1 −1 −1 1 1 1
Example. 1. Rank  −1 −1 −1  = Rank  0 0 0  = 1.
−1 −1 −1 0 0 0
   
−8 −3 8 1 0 −1
2. Rank  0 0 0  = Rank  0 −3 0  = 2.
−1 0 1 0 0 0
   
−1 0 1 −1 0 1
3. Rank  0 0 0  = Rank  0 0 0  = 1.
−1 0 1 0 0 0
Definition. Let A be a square matrix of size n × n having an eigenvalue λ.
The set of eigenvectors corresponding to the eigenvalue λ, alongwith the zero
vector, is called the eigenspace corresponding to eigenvalue λ.
The dimension of the eigenspace is defined to be the maximum number of
linearly independent vectors in the eigenspace.
Theorem Let A be a matrix of size n × n and let λ be an eigenvalue of A
repeating k times. Then the dimension of the eigenspace of λ or equivalently
the maximum number of linearly independent eigenvectors corresponding to
λ is:
n − Rank(λI − A).

Example. Let us use this result to see why some repeated eigenvalues had
one or more linearly independent eigenvalues in the examples in the previous
section.
 
0 1 1
1. Recall A =  1 0 1 
1 1 0
• has eigenvalue λ = −1 repeating 2 times.
• We found two linearly independent eigenvectors corresponding to this
eigenvalue.
   
−1 −1
~v1 =  1 , ~v2 =  0 .
0 1
We will use the above theorem to confirm that we should indeed expect
to find two linearly independent eigenvectors. By the above theorem, the
maximum number of linearly independent eigenvectors corresponding to
λ = −1 is:

= 3 − Rank(λI − A)
 
−1 −1 −1
= 3 − Rank  −1 −1 −1 
−1 −1 −1
=3−1
=2
 
6 3 −8
2. Recall A =  0 −2 0 
−1 0 1
• has eigenvalue λ = −2 repeating 2 times.
• We found only one linearly independent eigenvector corresponding
to this
 eigenvalue.

1
~v1 =  0 .
1
We will use the above theorem to confirm that we should indeed expect
to find only one linearly independent eigenvector. By the above theorem,
the maximum number of linearly independent eigenvectors corresponding
to λ = −2 is:

= 3 − Rank(λI − A)
 
−8 −3 8
= 3 − Rank  0 0 0 
−1 0 1
=3−2
=1
 
4 0 −1
3. Recall A =  0 3 0 
1 0 2
• has eigenvalue λ = 3 repeating 3 times.
• We found only two linearly independent eigenvectors corresponding
to this
 eigenvalue.
  
0 1
~v1 =  1 , ~v2 =
  0 .
0 1
We will use the above theorem to confirm that we should indeed expect to
find only two linearly independent eigenvectors. By the above theorem,
the maximum number of linearly independent eigenvectors corresponding
to λ = 3 is:

= 3 − Rank(λI − A)
 
−1 0 1
= 3 − Rank  0 0 0 
−1 0 1
=3−1
=2
8.4 Diagonalization
Definition. A square matrix of size n is called a diagonal matrix if it is of
the form  
d1 0 . . . 0
 0 d2 0 . . 0 
 
 . . . 
 
 . . . 
 
 0 . . dn−1 0 
0 . . . 0 dn
Definition. Two square matrices A and B of size n are said to be similar,
if there exists an invertible matrix P such that
B = P −1 AP

Definition. A matrix A is diagonalizable if A is similar to a diagonal ma-


trix. i.e. there exists an invertible matrix P such that P −1 AP = D where D
is a diagonal matrix. In this case, P is called a diagonalizing matrix.

Example. Recall the example we did in the last chapter.


 
4 2
A= .
1 3
Solution: In previous lectures we have found,
the eigenvalues of A are: λ1 = 2, λ2 = 5.  
−1
An eigenvector corresponding to λ1 = 2 is .
 1
2
An eigenvector corresponding to λ2 = 5 is .
1
Define a new matrix P such that its columns are the eigenvectors of A.
 
−1 2
P = .
1 1
Note that since eigenvectors corresponding to distinct eigenvalues are linearly
independent, the columns of P are linearly independent, hence P is an in-
vertible matrix. (We can also confirm this by verifying that the determinant
of P is non-zero).  
−1 −1 1 −2
P =
3 −1 −1
   
−1 −1 1 −2 4 2 −1 2
P AP =
3 −1 −1 1 3 1 1
  
−1 2 −4 −1 2
=
3 −5 −5 1 1
 
−1 −6 0
=
3 0 −15
 
2 0
= ,
0 5
which is a diagonal matrix. Therefore A is diagonalizable and a diagonalizing
matrix for A is  
−1 2
P = ,
1 1
and the corresponding diagonal matrix D is
 
2 0
D= .
0 5

Theorem Let A be a square matrix of size n × n. Then A is diagonalizable


if and only if there exists n linearly independent eigenvectors. In this case a
diagonalizing matrix P is formed by taking as its columns the eigenvectors of
A,
P −1 AP = D.
where D is a diagonal matrix whose diagonal entries are eigenvalues corre-
sponding to the eigenvectors of A. The ith diagonal entry is the eigenvalue
corresponding to the eigenvector in the ith column of P .

Proposition Let A be a square matrix of size n × n. If λ1 and λ2 are two


distinct eigenvalues of A, and ~v1 and ~v2 are eigenvectors corresponding to λ1
and λ2 respectively, then ~v1 and ~v2 are linearly independent.

Corollary Let A be a square matrix of size n. If A has n distinct eigen-


values then A is diagonalizable.
Example. For the following matrices:
(i) Find the eigenvalues and eigenvectors of A.
(ii) Is A diagonalizable?
(iii) If it is diagonalizable, find a diagonalizing matrix P such that P −1 AP =
D, where D is a diagonal matrix.
1.  
4 0 1
A =  −1 −6 −2  .
5 0 0
Solution: (i) In previous lectures we have found,
The eigenvalues of A are: λ1 = −1, λ2 = −6 
and λ3= 5.
−1
5
−9
An eigenvector corresponding to λ1 = −1 is  25
.
 1
0
An eigenvector corresponding to λ2 = −6 is  1 .
 0
1
−3 
An eigenvector corresponding to λ3 = 5 is 
11
.
1
(ii) The matrix A is diagonalizable, since A is a square matrix of size
3 × 3 and we have found three linearly independent eigenvectors
for A.
(iii) A diagonalizing matrix P has the eigenvectors as its columns. Let
 −1 
5
0 1
P =  −925
1 −311
.
1 0 1
The corresponding diagonal matrix D will have the eigenvalues as
diagonal entries corresponding to the columns of P . i.e. the ith
diagonal entry will be the eigenvalue corresponding to the eigen-
vector in the ith column.
 
−1 0 0
D =  0 −6 0  .
0 0 5
We will now verify that P −1 AP = D.
   −1 
4 0 1 5
0 1
AP =  −1 −6 −2   −9 25
1 −3
11

5 0 0 1 0 1
 
1/5 0 5
=  9/25 −6 −15/11 
−1 0 5
 −1  
5
0 1 −1 0 0
P D =  −9 25
1 −311
  0 −6 0 
1 0 1 0 0 5
 
1/5 0 5
=  9/25 −6 −15/11 
−1 0 5
AP = P D (1)

Since the columns of P are linearly independent, P has non-zero de-


terminant and is therefore invertible. We multiply (1) by P −1 on both
sides.

P −1 AP = P −1 P D,
= ID
=D

2. An example where the matrix is not diagonalizable.


 
6 3 −8
A =  0 −2 0  .
−1 0 1

Solution: In previous lectures we have found,


A had two eigenvalues:
λ1 = −1 occurring with multiplicity 2, and
λ2 = 5 occurring with multiplicity 1.
We could find only one linearly independent eigenvector corresponding
to λ1 = −1,
and one linearly independent eigenvector corresponding to λ2 = 5.
Therefore we could find only two linearly independent eigenvectors. By
the above theorem, A is not diagonalizable.

8.5 Diagonalization of symmetric matrices


Definition. Let A be a square matrix of size n. A is a symmetric matrix
if AT = A

Definition. A matrix P is said to be orthogonal if its columns are mutually


orthogonal.

Definition. A matrix P is said to be orthonormal if its columns are unit


vectors and P is orthogonal.

Proposition An orthonormal matrix P has the property that

P −1 = P T .

Theorem If A is a real symmetric matrix then there exists an orthonormal


matrix P such that

(i) P T AP = D, where D a diagonal matrix.

(ii) The diagonal entries of D are the eigenvalues of A.

(iii) If λi 6= λj then the eigenvectors are orthogonal.

(iv) The column vectors of P are linearly independent eigenvectors of A,


that are mutually orthogonal.
Example. Recall  
0 1 1
A =  1 0 1 .
1 1 0

(i) Find the eigenvalues and eigenvectors of A.

(ii) Is A diagonalizable?

(iii) Find an orthonormal matrix P such that P T AP = D, where D is a


diagonal matrix.

Solution: We have found the eigenvalues and eigenvectors of this matrix


in a previous lecture.

(i), (ii) Observe that A is a real symmetric matrix. By the above theorem, we
know that A is diagonalizable. i.e. we will be able to find a sufficient
number of linearly independent eigenvectors.
The eigenvalues of A were; −1, 2. We found  two linearly
 independent
 
−1 −1
eigenvectors corresponding to λ1 = −1: ~v1 =  1 , ~v2 =  0 .
0  1
1
And one eigenvector corresponding to λ2 = 2:  1 .
1

(iii) We now want to find an orthonormal diagonalizing matrix P .


Since A is a real symmetric matrix, eigenvectors corresponding to dis-
tinct
  eigenvalues are orthogonal.
 i.e.  
1 −1 −1
 1  is orthogonal to  1  and  0 .
1 0 1
However
  the eigenvectors
 corresponding
 to eigenvalue λ1 = −1, ~v1 =
−1 −1
 1  and ~v2 =  0  are not orthogonal to each other, since
0 1
we chose them from the eigenspace by making arbitrary choices*. We
will have to use Gram Schmidt to get two corresponding orthogonal
vectors.

~u1 = ~v1
 
−1
((−1, 1, 0).(−1, 0, 1)) 
Proj~u1 ~v2 = 1 
2
0
 
−1
1
= 1 
2
0
~u2 = ~v2 − Proj~u1 ~v2
   
−1 −1/2
=  0  −  1/2 
1 0
 
−1/2
=  −1/2 
1

Note that since ~u1 = ~v1 and ~u2 is a linear combination of ~v1 and ~v2 ,
the two new vectors are also eigenvectors of λ = −1. We now have a
set of orthogonal eigenvectors:
     
1 −1 −1/2
{ 1  ,  1  ,  −1/2 }.
1 0 1
We normalize the vectors to get a set of orthonormal vectors:
 √   √   √ 
1/√3 −1/√ 2 −1/√6
{ 1/√3  ,  1/ 2  ,  −1/√ 6 }.
1/ 3 0 2/ 6
We are now finally ready to write the orthonormal diagonalizing matrix:
 √ √ √ 
1√3 −1/√ 2 −1/√6
P =  1√3 1/ 2 −1/√ 6 
1 3 0 2/ 6
and the corresponding diagonal matrix D
 
2 0 0
D =  0 −1 0 .
0 0 −1

We will now verify that P T AP = D.


  √ √ √ 
0 1 1 1√3 −1/√ 2 −1/√6
AP =  1 0 1   1√3 1/ 2 −1/√ 6 
1 1 0 1 3 0 2/ 6
 √ √ √ 
2√3 1/ √2 1/√6
=  2√3 −1/ 2 1/ √6 
2 3 0 −2/ 6
 √ √ √  
1√3 −1/√ 2 −1/√6 2 0 0
P D =  1√3 1/ 2 −1/√ 6   0 −1 0 
1 3 0 2/ 6 0 0 −1
 √ √ √ 
2√3 1/ √2 1/√6
=  2√3 −1/ 2 1/ √6 
2 3 0 −2/ 6
AP = P D (1)

Since the columns of P are linearly independent, P has non-zero determinant


and is therefore invertible. We multiply (1) by P −1 on both sides.
P −1 AP = P −1 P D,
= ID
=D (2)
Also since P is orthonormal, we have
P −1 = P T
i.e. P P T = I = P T P .
 √ √ √  √ √ √   
1√3 −1/√ 2 −1/√6 1 √3 1/√3 1/ 3 1 0 0
 1 3 1/ 2 −1/ 6   −1 2 1/ 2 0√  =  0 1 0  .
√ √ √ √
1 3 0 2/ 6 −1 6 −1/ 6 2/ 6 0 0 1
Therefore from (2) and since P −1 = P T we finally get the relation

P T AP = D.

Note. *:Look back at how we selected the eigenvecors ~v1 and ~v2 ; we chose
x2 = 1, x3 = 0 to get ~v1 and x2 = 0, x3 = 1 to get ~v2 . If we had chosen
x2 = 1, x3 = 0 to get ~v1 and x2 = −1/2, x3 = 1 to get ~v2 , then ~v1 and ~v2
would be orthogonal. However it is much easier to make arbitrary choices
for x1 and x2 and then use the Gram Schmidt Process to orthogonalize the
vectors as we have done in this example.
Summary
1. Suppose that A is an n × n square matrix. Suppose also that ~x is a
non-zero vector in Rn and that λ is a scalar so that,

A~x = λ~x.

We then call ~x an eigenvector of A and λ an eigenvalue of A.

2. Steps to find eigenvalues and eigenvectors:

(i) The characteristic equation of A is:

det(λI − A) = 0.

(ii) To find all the eigenvalues of A, solve the characteristic equation for
λ.
(iii) For each eigenvalue λ, to find the corresponding set of eigenvectors,
find a non-zero solution to the linear system of equations:

(λI − A)~x = 0

3. Let A be a square matrix of size 3 × 3. Then A is diagonalizable if


and only if there exists 3 linearly independent eigenvectors. In this case a
diagonalizing matrix P is formed by taking as its columns the eigenvectors
of A. Then we have
P −1 AP = D.
where D is a diagonal matrix whose ith diagonal entry is the eigenvalue
corresponding to the eigenvector in the ith column of P .

4. If A is a real symmetric matrix of size 3 × 3 then A is diagonalizable and


there exists an othogonal matrix P such that

(i) P T AP = D, D a diagonal matrix.


(ii) The diagonal entries of D are the eigenvalues of A
(iii) The column vectors of P are eigenvectors of A
(iv) If λi 6= λj then the eigenvectors are orthogonal.
Note. Point (iv) above says that if we have 3 distinct eigenvalues, then
the 3 corresponding eigenvectors will be mutually orthogonal and we can
define P to be the matrix whose columns are these eigenvectors.
However if we have a repeated root, then the 2 eigenvectors corresponding
to the repeated eigenvalue will not necessarily be orthogonal. Here we will
have to use Gram Schmidt to orthogonalize the vectors. Note that the
orthogonalized vectors will also be eigenvectors of the repeated eigenvalue.

References
1. http://tutorial.math.lamar.edu/Classes/LinAlg/EVal EVect Intro.aspx

2. http://www.sosmath.com/matrix/diagonal/diagonal.html

3. K. A. Stroud, Engineering Mathematics


9 Operators and Commutators
9.1 Operators
Operators are commonly used to perform a specific mathematical operation
on another function. The operation can be to take the derivative or integrate
with respect to a particular term, or to multiply, divide, add or subtract a
number or term with regards to the initial function. Operators are commonly
used in physics, mathematics and chemistry.

Example. 1. A regular function can be thought of as an operator.

(i)
f : x 7→ ax,
where a is a real number.
Operator: the function f
Operates on: real numbers
Action: multiply by a.
(ii) Squaring
f : x 7→ x2
Operator: the function f
Operates on: real numbers
Action: squaring

2. Operators which acts on functions:

(i)
â : f (x) 7→ af (x),
where a is a real number.
Operator: â
Operates on: scalar functions
Action: multiply by a.
(ii)
x̂ : f (x) 7→ xf (x),
Operator: x̂
Operates on: scalar functions
Action: multiply by x.
(iii) Differentiation Operator
d
D : f (x) 7→ f (x)
dx
Operator: D
Operates on: scalar functions
Action: Differentiate with respect to x.
We can similarly define partial differentiation operators, grad opera-
tor, integral operator, etc.
(iv) The Momentum Operator
~ ∂
P̂ =
i ∂x
where:
– ~ is Planck’s constant,
– i is the imaginary unit.
~ ∂ ~ ∂ψ(x, t)
P̂ = : ψ(x, t) 7→
i ∂x i ∂x
Operator: P̂
Operates on: wave function, ψ(x, t).

(iv) The Energy Operator


~∂
Ê =
i ∂t
where:
– ~ is Planck’s constant,
– i is the imaginary unit.
~∂ ~ ∂ψ(x, t)
Ê = : ψ(x, t) 7→
i ∂t i ∂t
Operator: Ê
Operates on: wave function, ψ(x, t).

(iv) The Hamiltonian Operator


~2 ∂ 2
Ĥ = −
2m ∂x2
where:
– ~ is Planck’s constant,
– m is the mass of the particle (regard as constant).
– i is the imaginary unit.
~2 ∂ 2 ~2 ∂2
Ĥ = − : ψ(x, t) →
7 −
2m ∂x2 2m ∂ψ(x, t)x2
Operator: Ĥ
Operates on: wave function, ψ(x, t).

3. Operators which act on vectors.


   
1 1 x1
(i) Let A = be a square matrix and ~v = be an arbi-
0 1 x2
trary vector. Define an operator in the following manner:

~v →
 A : 7 A~
v    
x1 1 1 x1 x1 + x2
7 → = .
x2 0 1 x2 x2

Operator: A
Operates on: two dimensional
 vectors
 
x1 x1 + x2
Action: maps a vector to .
x2 x2

9.2 Linear Operators


An operator O is a linear operator if it satisfies the following two conditions:

(i) O(f + g) = O(f ) + O(g).

(ii) O(λf ) = λO(f ), where λ is a scalar.

Example. Determine if the following operators are linear:

1. O = â : f (x) 7→ af (x).
Solution:

O(f + g) = a(f + g)
= af + ag
= O(f ) + O(g)
O(λf ) = a(λf )
= λaf
= λO(f )

Therefore O is linear.

2. O = squaring : x 7→ x2 .
Solution:

O(x + y) = (x + y)2
= x2 + y 2 + 2xy
= O(x) + O(y) + 2xy
O(1 + 2) = O(3) = 9
O(1) + O(2) = 1+4=5

Therefore O is not linear.


d
3. D : f (x) 7→ dx
f (x).
Solution:
d
D(f + g) = (f + g)
dx
d d
= f+ g
dx dx
= D(f ) + D(g)
d
D(λf ) = (λf )
dx
d
= λ f
dx
= λD(f )

Therefore D is linear.
 
1 1
4. A = : ~v 7→ A~v .
0 1
Solution:
     
1 1 x1 y1
A(~v + w)
~ = ( + )
0 1 x2 y2
   
1 1 x1 + y1
= ( )
0 1 x2 + y2
 
(x1 + y1 ) + (x2 + y2 )
=
x2 + y2
   
x1 + x2 y1 + y2
= +
x2 y2
     
1 1 x1 1 1 y1
= +
0 1 x2 0 1 y2
= A(~v ) + A(w)~
   
1 1 x1
A(λ~v ) = (λ )
0 1 x2
  
1 1 λx1
=
0 1 λx2
 
λx1 + λx2
=
λx2
 
x1 + x2
= λ
x2
  
1 1 x1
= λ
0 1 x2
= λA~v

Therefore A is linear.
9.3 Composing Operators
When we have two operators and we want to apply them to a function in
sequence, we use the notation O1 ◦ O2 .

O1 ◦ O2 : f 7→ O1 (O2 (f )).

First apply O2 and then O1 . The order is important.

Example. 1. Consider the following two operators.

O1 : f (x) 7→ xf (x)
O2 : g(x) 7→ (g(x))2

Find O1 ◦ O2 and O2 ◦ O1 .

Solution: We first find O1 ◦ O2 .

O1 ◦ O2 (h(x)) = O1 (O2 (h(x))


= O1 ((h(x)2 )
= x(h(x)2 )
O1 ◦ O2 : h(x) 7→ x(h(x)2

We now find O2 ◦ O1 .

O2 ◦ O1 (h(x)) = O2 (O1 (h(x))


= O2 (xh(x))
= x2 (h(x)2 )
O2 ◦ O1 : h(x) 7→ x2 (h(x)2 )

Observe that O1 ◦ O2 6= O2 ◦ O1 . Hence the order in which you apply the


operators is important.
9.4 Commutators
Let OA and OB be two operators. The commutator of OA and OB is the
operator defined as
[OA , OB ] = OA ◦ OB − OB ◦ OA .
Example. 1. Consider the following two operators.
 
1 0
OA : ~v 7→ A~v , A =
0 0
 
1 1
OB : ~v 7→ B~v , B =
1 0
Find [OA , OB ].
Solution: We first find OA ◦ OB .
OA ◦ OB (~v ) = OA (OB (~v ))
= OA (B~v )
= AB~v  
1 1
OA ◦ OB : ~v 7→ AB~v , AB =
0 0
      
x1 1 1 x1 x1 + x2
7→ =
x2 0 0 x2 0
We now find O2 ◦ O1 .
OB ◦ OA (~v ) = OB (OA (~v ))
= OB (A~v )
= BA~v  
1 0
OB ◦ OA : ~v 7→ BA~v , BA =
1 0
      
x1 1 0 x1 x1
7→ =
x2 1 0 x2 x1
Therefore the commutator of OA and OB is
[OA , OB ] = OA ◦ OB − OB ◦ OA
[OA , OB ](~v ) = (OA ◦ OB − OB ◦ OA )(~v )
= (AB − BA)(~v )
2. Consider the following two operators.
d
D : f (x) 7→ (D)f (x) = f (x)
dx
d
xD : f (x) 7→ (xD)f (x) = x f (x)
dx

Find [D, xD].

Solution:

(D ◦ xD)(f (x)) = D(xD(f (x)))


d
= D(x f (x))
dx
d d
= (x f (x))
dx dx
d2 f df
= x 2+
dx dx
2
= (xD + D)(f (x))
(xD ◦ D)(f (x)) = xD(D(f (x)))
d
= xD( f (x))
dx
d d
= x ( f (x))
dx dx
d2 f
= x 2
dx
= (xD2 )(f (x))

Therefore the commutator of D and xD is

[D, xD] = D ◦ xD − xD ◦ D
= xD2 + D − xD2
= D
3. Consider the two operators
d
O1 = D + 1 : f (x) 7→ (D + 1)f (x) = f (x) + f (x)
dx
d
O1 = D + 2 : f (x) 7→ (D + 2)f (x) = f (x) + 2f (x)
dx

Find the commutator [O1 , O2 ].


Solution:

O1 ◦ O2 (f (x)) = O1 (O2 (f (x)))


d
= O1 ( f (x) + 2f (x))
dx
d d d
= ( f (x) + 2f (x)) + ( f (x) + 2f (x))
dx dx dx
d2 f df df
= +2 + + 2f
dx2 dx dx
= (D2 + 3D + 2)(f (x))
O2 ◦ O1 (f (x)) = O2 (O1 (f (x)))
d
= O2 ( f (x) + 2f (x))
dx
d d d
= ( f (x) + f (x)) + 2( f (x) + f (x))
dx dx dx
2
df df df
= 2
+ + 2 + 2f
dx dx dx
= (D2 + 3D + 2)(f (x))

Therefore the commutator of O1 and O2 is

[O1 , O2 ] = O1 ◦ O2 − O2 ◦ O1
= (D2 + 3D + 2) − (D2 + 3D + 2) = 0

∂ ∂
4. Find the commutator [x ∂y , y ∂x ].
Solution:
∂ ∂ ∂ ∂ ∂ ∂
[x ,y ] = x ◦y −y ◦x
∂y ∂x ∂y ∂x ∂x ∂y
∂ ∂ ∂ ∂
(x ◦ y )(f ) = x (y (f )
∂y ∂x ∂y ∂x
∂2f ∂f
= x(y + )
∂y∂x ∂x
∂2 ∂
= (xy + x )(f )
∂y∂x ∂x
∂ ∂ ∂ ∂
(y ◦ x )(f ) = y (x (f )
∂x ∂y ∂x ∂y
∂2f ∂f
= y(x + )
∂x∂y ∂y
∂2 ∂
= (xy + y )(f )
∂x∂y ∂x
Therefore the commutator is:
∂ ∂ ∂ ∂ ∂ ∂
[x ,y ] = x ◦y −y ◦x
∂y ∂x ∂y ∂x ∂x ∂y
2
∂ ∂ ∂2 ∂
= (xy + x ) − (xy +y )
∂y∂x ∂x ∂x∂y ∂x
∂ ∂
= x −y
∂x ∂y

5. Find the commutator of the following two operators.



Ê : ψ(x, t) 7→ i~ ψ(x, t)
∂t
t̂ : ψ(x, t) 7→ tψ(x, t)

Solution:
∂ ∂
[Ê, t̂]ψ(x, t) = (i~ ◦ t̂ − t̂ ◦ i~ )(ψ(x, t))
∂t ∂t
∂ ∂
= (i~ ◦ t̂)(ψ(x, t)) − (t̂ ◦ i~ )(ψ(x, t))
∂t ∂t
∂ ∂(ψ(x, t))
= i~ (tψ(x, t)) − ti~
∂t ∂t
∂(ψ(x, t)) ∂t ∂(ψ(x, t))
= i~t + i~(ψ(x, t)) − ti~
∂t ∂t ∂t
= i~(ψ(x, t))
6. Find the commutator of the following two operators.
~2 ∂ 2
Ĥ : ψ(x, t) 7→ − ψ(x, t)
2m ∂x2
x̂ : ψ(x, t) 7→ xψ(x, t)

Solution:

[Ĥ, x̂] = Ĥ ◦ x̂ − x̂ ◦ Ĥ
~2 ∂ 2
(Ĥ ◦ x̂)(ψ(x, t)) = − (x̂(ψ))
2m ∂x2
~2 ∂ 2
=− (x(ψ))
2m ∂x2
~2 ∂ ∂
=− ( (xψ))
2m ∂x ∂x
~2 ∂ ∂ψ
=− (x + ψ)
2m ∂x ∂x
~2 ∂ ∂ψ ∂ψ
=− ( (x ) + )
2m ∂x ∂x ∂x
~2 ∂ 2 ψ ∂ψ
=− (x 2 + 2 )
2m ∂x ∂x
~2 x ∂ 2 ~2 ∂
Ĥ ◦ x̂ = − −
2m ∂x2 m ∂x
2 2
~ ∂
(x̂ ◦ Ĥ)(ψ(x, t)) = (x̂ ◦ − )(ψ)
2m ∂x2
~2 x ∂ 2 ψ
=−
2m ∂x2
~2 x ∂ 2
x̂ ◦ Ĥ = −
2m ∂x2
~2 ∂ 2 ~2 ∂ ~2 x ∂ 2
[Ĥ, x̂] = − x 2− +
2m ∂x m ∂x 2m ∂x2
2
~ ∂
=−
m ∂x

7. Prove the following identity for three operators A,B, C:

[A ◦ B, C] = A ◦ [B, C] + [A, C] ◦ B.
Solution:

A ◦ [B, C] + [A, C] ◦ B = A ◦ (B ◦ C − C ◦ B) + (A ◦ C − C ◦ A) ◦ B
= A◦B◦C −A◦C ◦B+A◦C ◦B−C ◦A◦B
= (A ◦ B) ◦ C − C ◦ (A ◦ B)
= [A ◦ B, C]

8. Find the commutator [D2 , ex ].

Solution: We can find the commutator directly

[D2 , ex ] = D2 ◦ ex − ex ◦ D2

or by using the above identity we get,

[D2 , ex ] = D ◦ [D, ex ] + [D, ex ] ◦ D.

Try the direct method at home. In class we will use the identity to find
the commutator.

We first find the commutator [D, ex ].


d x
(D ◦ ex )(f ) = (e f )
dx
df dex
= ex +f
dx dx
df
= ex + f ex
dx
= (ex D + ex )(f )
(ex ◦ D)(f ) = (ex D)(f )
[D, ex ] = (ex D + ex ) − (ex D)
= ex
[D2 , ex ] = D ◦ [D, ex ] + [D, ex ] ◦ D
= D ◦ ex + ex ◦ D
= (ex D + ex ) + ex D
= ex (2D + 1)
Summary
1. Operators are commonly used to perform a specific mathematical opera-
tion on another function.

2. An operator O is a linear operator if it satisfies the following two condi-


tions:

(i) O(f + g) = O(f ) + O(g).


(ii) O(λf ) = λO(f ), where λ is a scalar.

3. Let OA and OB be two operators. The commutator of OA and OB is the


operator defined as

[OA , OB ] = OA ◦ OB − OB ◦ OA .
10 Fourier Series
10.1 Introduction
When the French mathematician Joseph Fourier (1768-1830) was trying to
study the flow of heat in a metal plate, he had the idea of expressing the
heat source as an infinite series of sine and cosine functions. Although the
original motivation was to solve the heat equation, it later became obvious
that the same techniques could be applied to a wide array of mathematical
and physical problems.
In this course, we will learn how to find Fourier series to represent periodic
functions as an infinite series of sine and cosine terms.
A function f (x) is said to be periodic with period T , if

f (x + T ) = f (x), for all x.

The period of the function f (t) is the interval between two successive repe-
titions.

10.2 Definition of a Fourier Series


Let f be a bounded function defined on the [−π, π] with at most a finite
number of maxima and minima and at most a finite number of discontinuities
in the interval. Then the Fourier series of f is the series
1
f (x) = a0 + a1 cos x + a2 cos 2x + a3 cos 3x + ...
2
+ b1 sin x + b2 sin 2x + b3 sin 3x + ...

where the coefficients an and bn are given by the formulae


1 π
Z
a0 = f (x)dx
π −π
Z π
1
an = f (x)cos(nx)dx
π −π
1 π
Z
bn = f (x)sin(nx)dx
π −π
10.3 Why the coefficients are found as they are.
We want to derive formulas for a0 , an and bn . To do this we take advantage
of some properties of sinusoidal signals. We use the following orthogonality
conditions:

Orthogonality conditions
(i) The average value of cos(nx) and sin(nx) over a period is zero.
Z π
cos(nx)dx = 0
−π
Z π
sin(nx)dx = 0
−π

(ii) The average value of sin(mx)cos(nx) over a period is zero.


Z π
sin(mx)cos(nx)dx = 0
−π

(iii) The average value of sin(mx)sin(nx) over a period,


Z π 
π if m = n 6= 0
sin(mx)sin(nx)dx =
−π 0 otherwise

(iv) The average value of cos(mx)cos(nx) over a period,



Z π  2π if m = n = 0
cos(mx)cos(nx)dx = π if m = n 6= 0
−π 
0 if m 6= n

Remark. The following trigonometric identities are useful to prove the


above orthogonality conditions:
1
cos A cos B = [cos(A − B) + cos(A + B)]
2
1
sin A sin B = [cos(A − B) − cos(A + B)]
2
1
sin A cos B = [sin(A − B) + sin(A + B)]
2
Finding an
We assume that we can represent the function by
1
f (x) = a0 + a1 cos x + a2 cos 2x + a3 cos 3x + ...
2
+ b1 sin x + b2 sin 2x + b3 sin 3x + ... (1)

Multiply (1) by cos(nx), n ≥ 1 and integrate from −π to π and assume it is


permissible to integrate the series term by term.
Z π Z π Z π Z π
1
f (x) cos(nx)dx = a0 cos(nx) + a1 cos x cos(nx)dx + a2 cos 2x cos(nx)dx + ...
−π 2 −π −π −π
Z π Z π
+ b1 sin x cos(nx)dx + b2 sin 2x cos(nx)dx + ...
−π −π
= an π, because of the above orthogonality conditions
1 π
Z
an = f (x) cos(nx)dx
π −π

Similarly if we multiply (1) by sin(nx), n ≥ 1 and integrate from −π to


π, we can find the formula for the coefficient bn . To find a0 , simply integrate
(1) from −π to π.
10.4 Fourier Series
Definition. Let f (x) be a 2π-periodic function which is integrable on [−π, π].
Set
1 π
Z
a0 = f (x)dx
π −π
1 π
Z
an = f (x)cos(nx)dx
π −π
Z π
1
bn = f (x)sin(nx)dx
π −π

then the trigonometric series



1 X
a0 + (an cos(nx) + bn sin(nx))
2 n=1

is called the Fourier series associated to the function f (x).

Remark. Notice that we are not saying f (x) is equal to its Fourier Series.
Later we will discuss conditions under which that is actually true.

Example. Find the Fourier coefficients and the Fourier series of the square-
wave function f defined by

0 if −π ≤ x < 0
f (x) = and f (x + 2π) = f (x)
1 if 0 ≤ x < π

Solution: So f is periodic with period 2π and its graph is:


Using the formulas for the Fourier coefficients we have
1 π
Z
a0 = f (x)dx
π −π
Z 0
1 π
Z
1
= ( 0dx + 1dx)
π −π π 0
1
= (π)
π
=1
1 π
Z
an = f (x) cos(nx)dx
π −π
Z 0
1 π
Z
1
= ( 0dx + cos(nx)dx)
π −π π 0
1  sin(nx) π
= (0 + 0
)
π n
1
= (sin nπ − sin 0)

=0
1 π
Z
bn = f (x) sin(nx)dx
π −π
Z 0
1 π
Z
1
= ( 0dx + sin(nx)dx)
π −π π 0
1  cos(nx) π
= (− 0
)
π n
1
= − (cos nπ − cos 0)

1
= − ((−1)n − 1), since cos nπ = (−1)n .
( nπ
0, if n is even
= 2
, if n is odd

The Fourier Series of f is therefore
1
f (x) = a0 + a1 cos x + a2 cos 2x + a3 cos 3x + ...
2
+ b1 sin x + b2 sin 2x + b3 sin 3x + ...
1
= + 0 + 0 + 0 + ...
2
2 2 2
+ sin x + 0 sin 2x + sin 3x + 0 sin 4x + sin 5x + ...
π 3π 5π
1 2 2 2
= + sin x + sin 3x + sin 5x + ...
2 π 3π 5π
Remark. In the above example we have found the Fourier Series of the
square-wave function, but we don’t know yet whether this function is equal
to its Fourier series. If we plot
1 2
+ sin x
2 π
1 2 2
+ sin x + sin 3x
2 π 3π
1 2 2 2
+ sin x + sin 3x + sin 5x+
2 π 3π 5π
1 2 2 2 2
+ sin x + sin 3x + sin 5x + sin 7x+
2 π 3π 5π 7π
1 2 2 2 2 2 2 2
+ sin x + sin 3x + sin 5x + sin 7x + sin 9x + sin 11x + sin 13x,
2 π 3π 5π 7π 9π 11π 13π
we see that as we take more terms, we get a better approximation to the
square-wave function. The following theorem tells us that for almost all
points (except at the discontinuities), the Fourier series equals the function.

Theorem (Fourier Convergence Theorem) If f is a periodic func-


tion with period 2π and f and f 0 are piecewise continuous on [−π, π], then
the Fourier series is convergent. The sum of the Fourier series is equal to
f (x) at all numbers x where f is continuous. At the numbers x where f is
discontinuous, the sum of the Fourier series is the average value. i.e.
1
[f (x+ ) + f (x− )].
2
Remark. If we apply this result to the example above, the Fourier Series
is equal to the function at all points except −π, 0, π. At the discontinuity 0,
observe that
f (0+ ) = 1 and f (0− ) = 0
The average value is 1/2.
Therefore the series equals to 1/2 at the discontinuities.
10.5 Odd and even functions
Definition. A function is said to be even if

f (−x) = f (x) for all real numbers x.

A function is said to be odd if

f (−x) = −f (x) for all real numbers x.

Example. cos x, x2 , |x| are examples of even functions. sin x, x, x3 are


examples of odd functions. The product of two even functions is even, the
product of two odd functions is also even. The product of an even and odd
function is odd.

Remark. If f is an odd function then


Z π
f (x)dx = 0,
−π

while if f is an even function then


Z π Z π
f (x)dx = 2 f (x)dx.
−π 0

(i) If f (x) is odd, then

• f (x) cos(nx) is odd hence an = 0 and



• f (x) sin(nx) is even hence bn = π2 0 f (x) sin(nx)dx

(ii) If f (x) is even, then


2

• f (x) cos(nx) is even hence an = π 0
f (x) cos(nx)dx and
• f (x) sin(nx) is even hence bn = 0
Example. 1. Let f be a periodic function of period 2π such that
f (x) = x for − π ≤ x < π
Find the Fourier series associated to f .
Solution: So f is periodic with period 2π and its graph is:

We first check if f is even or odd.


f (−x) = −x = −f (x), so f (x) is odd.
Therefore,
an = 0
2 π
Z
bn = f (x)sin(nx)dx
π 0
Using the formulas for the Fourier coefficients we have
2 π
Z
bn = x sin(nx)dx
π 0
Z π
2  cos nx π cos nx
= ( −x 0
− (− )dx)
π n 0 n
2 1 sin nx
= ( [−π cos nπ] + [ 2 ]π0 )
π n n
2
= − cos nπ
n
−2/n if n is even
=
2/n if n is odd
The Fourier Series of f is therefore
f (x) = b1 sin x + b2 sin 2x + b3 sin 3x + ...
2 2 2 2
= 2 sin x − sin 2x + sin 3x − sin 4x + sin 5x + ...
2 3 4 5
2. Let f be a periodic function of period 2π such that

f (x) = π 2 − x2 for − π ≤ x < π

Solution: So f is periodic with period 2π and its graph is:

We first check if f is even or odd.

f (−x) = π 2 − (−x)2 = π 2 − x2 = f (x), so f (x) is even.

Since f is even,

bn = 0
2 π
Z
an = f (x) cos(nx)dx
π 0
Using the formulas for the Fourier coefficients we have
2 π
Z
an = f (x) cos(nx)dx
π 0
2 π 2
Z
= (π − x2 ) cos(nx)dx
π 0
Z π
2  2 2 sin nx π
 sin nx
= ( (π − x ) 0
− −2x dx)
π n 0 n
Z π
2 sin nπ sin 0 sin nx
= ([(π 2 − π 2 ) − (π 2 − 0) ]+ 2x dx)
π n n 0 n
22 π
Z
= x sin(nx)dx
πn 0
Z π
22  cos nx π cos nx
= ( −x 0
− (− )dx)
πn n 0 n
221 sin nx
= ([−π cos nπ] + [ 2 ]π0 )
πnn n
4
= − 2 cos nπ
n
−4/n2 if n is even
=
4/n2 if n is odd

It remains to calculate a0 .
2 π 2
Z
a0 = (π − x2 )dx
π 0
2 x3
= [π 2 x − ]π0
π 3
2

=
3
The Fourier Series of f is therefore
1
f (x) = a0 + a1 cos x + a2 cos 2x + a3 cos 3x + ...
2
b1 sin x + b2 sin 2x + b3 sin 3x + ...
2π 2 1 1 1 1
= + 4(cos x − cos 2x + cos 3x − cos 4x + cos 5x + ....)
3 4 9 16 25
10.6 Functions with period 2L
If a function has period other than 2π, we can find its Fourier series by making
a change of variable. Suppose f (x) has period 2L, that is f (x + 2L) = f (x)
for all x. If we let
πx Lt
t= and g(t) = f (x) = f ( )
L π
then, g has period 2π and x = ±L corresponds to t = ±π. We know the
Fourier series of g(t):

1 X
a0 + (an cos nt + bn sin nt).
2 n=1

where
1 π
Z
a0 = f (t)dt
π −π
1 π
Z
an = f (t) cos(nt)dt
π −π
1 π
Z
bn = f (t) sin(nt)dt
π −π
Lt πx
If we now use the Substitution Rule with x = π
, then t = L
, dt = ( Lπ )dx,
we have the following:
Definition. If f (x) is a piecewise continuous function on [−L, L], its Fourier
Series is ∞
1 X nπx nπx
a0 + an cos( ) + bn sin( ),
2 n=1
L L
where
1 L
Z
a0 = f (x)dx
L −L
1 L
Z
nπx
an = f (x) cos( )dx
L −L L
1 L
Z
nπx
bn = f (x) sin( )dx
L −L L
Example. Find the Fourier series of the triangular wave function defined
by f (x) = |x| − a for −a ≤ x ≤ a and f (x + 2) = f (x) for all x.

Solution: So f is periodic with period 2a and its graph is:

We first check if f is even or odd.

f (−x) = | − x| − a = |x| − a = f (x), so f (x) is even.

Therefore,

bn = 0
2 a
Z
nπx
an = f (x) cos( )dx
a 0 a
Using the formulas for the Fourier coefficients we have
2 a
Z
nπx
an = (|x| − a) cos( )dx
a 0 a
2 a
Z
nπx
= (x − a) cos( )dx
a 0 a
sin( nπx sin( nπx
Z a
2  a
) a a
)
= ( (x − a) nπ 0
− a nπ dx)
a a 0 a
2 a2  nπx a
= ( 2 2 cos( ) 0)
a nπ a
2a
= 2 2 (cos nπ − 1)

(
0 if n is even
= 4a
− 2 2 if n is odd

It remains to determine a0 .
2 a
Z
a0 = |x| − adx
a 0
2 a
Z
= x − adx
a 0
2 x2
= [ − ax]a0
a 2
= −a

The Fourier Series of f is therefore

f (x) = a0 + a1 cos x + a2 cos 2x + a3 cos 3x + ...


a 4a πx 1 3πx 1 5πx
= − − 2 (cos( ) + cos( )+ cos( ) + ...)
2 π a 9 a 25 a
10.7 Parseval’s Result
The folowing result can be used to find the sum of certain series from calcu-
lated Fourier series.
Theorem Let f be a bounded function defined in the interval [−L, L] with
at most a finite number of maxima and minima and at most a finite number
of discontinuities in the interval [−L, L], and associated to the Fourier series

1 X
a0 + (an cos nx + bn sin nx).
2 n=1

Then ∞
Z L
1 1 X
(f (x))2 dx = a20 + (a2n + b2n ).
L −L 2 n=1

Example. Let f be a periodic function defined by f (x) = x for all −π ≤


x ≤ π. Use Parseval’s result to show that

π2 X 1
= 2
.
6 n=1
n

Solution: We have shown that


2 2 2 2
f (x) = 2 sin x − sin 2x + sin 3x − sin 4x + sin 5x + ...
2 3 4 5
Using Parseval’s Identity

1 π
Z
2 1 2 X 2
(f (x)) dx = a0 + (an + b2n )
π −π 2 n=1
1  x3 π 1 1
= 4(1 + + + ...)
π 3 −π 4 9
2π 2 1 1
= 4(1 + + + ...)
3 4 9
π2 1 1
= 1 + + + ...
6 4 9

π2 X 1
=
6 n=1
n2
Summary
1. The Fourier Series of a periodic function f (x) defined on the interval
[−π, π] is:
1
a0 + a1 cos x + a2 cos 2x + a3 cos 3x + ...
2
+ b1 sin x + b2 sin 2x + b3 sin 3x + ...
where the coefficients an and bn are given by the formulae
1 π
Z
a0 = f (x)dx
π −π
1 π
Z
an = f (x)cos(nx)dx
π −π
1 π
Z
bn = f (x)sin(nx)dx
π −π
2. Even and Odd Functions for functions with period 2π.
If f (x) is odd, (i.e. f (−x) = −f (x)) , then
• an = 0 and

• bn = π2 0 f (x) sin(nx)dx
If f (x) is even, (i.e. f (−x) = f (x)), then

• an = π2 0 f (x) cos(nx)dx and
• bn = 0
3. The Fourier Series of a periodic function f (x) defined on the interval
[−L, L] is:

1 X nπx nπx
a0 + an cos( ) + bn sin( ),
2 n=1
L L
where
1 L
Z
a0 = f (x)dx
L −L
1 L
Z
nπx
an = f (x) cos( )dx
L −L L
1 L
Z
nπx
bn = f (x) sin( )dx
L −L L
4. Even and Odd Functions for functions with period 2L.
If f (x) is odd, (i.e. f (−x) = −f (x)) , then

• an = 0 and
RL
• bn = π2 0 f (x) sin( nπx
L
)dx

If f (x) is even, (i.e. f (−x) = f (x)), then


RL
• an = π2 0 f (x) cos( nπxL
)dx and
• bn = 0

5. Parseval’s Result
Z L ∞
1 1 X
(f (x))2 dx = a20 + (a2n + b2n ).
L −L 2 n=1

References
1. http://www.intmath.com/Fourier-series/Fourier-intro.php

2. http://www.falstad.com/fourier/e-index.html

3. Ian S. Murphy, Advanced Calculus

4. Mary L Boas, Mathematical Methods in the Physical Sciences.