
Prelims: Introductory Calculus

2013 Michaelmas Term. Lecturer: Z. Qian


Mathematical Institute, Oxford

November 6, 2013

1 Standard integrals, integration by parts


It is important to grasp some basic techniques for evaluating integrals, such as the method of substitution, integration by parts, etc. You should read Richard Earl's lecture notes, posted on the course web page, about a few standard substitutions.

Before presenting several examples, we need to introduce some notation. Recall that in A-level math, we write the integral of a function $f(x)$ as $\int f(x)\,dx$ (indefinite integral), or $\int_0^1 f(x)\,dx$ in the case of a definite integral. $f(x)$ is called the integrand, and the expression $f(x)\,dx$ is called a differential form (of first order). It will be beneficial to introduce the notion of differentials. If $y = f(x)$ is a function of one variable $x$ on some interval, then $dy = df(x) = f'(x)\,dx$ is called the differential of $f$. With this definition, we may write $f'(x)$ as a fraction of differentials $\frac{dy}{dx}$.

The Fundamental Theorem of Calculus (we will return to the FTC in Analysis III - Integration, Trinity term) says that
$$\int df(x) = \int f'(x)\,dx = f(x) + C$$
where $C$ is an arbitrary constant.


The chain rule for derivatives implies that the differential of a function is invariant under substitutions. More precisely, suppose $y = f(x)$ is a function of $x$; making the substitution $x = g(t)$, $y = f(g(t))$ becomes a function of $t$. Then $dx = g'(t)\,dt$, so that
$$dy = f'(x)\,dx = f'(g(t))g'(t)\,dt = \frac{d}{dt}f(g(t))\,dt. \quad \text{[Chain rule]}$$
That is, $df(x) = df(g(t))$ if $x = g(t)$. In other words, when we work out the differential $df(x)$ it doesn't matter whether we consider $x$ as a variable or as a function of another variable. This principle also applies to differential forms of first order. The substitution method can thus be summarized as the following equality:
$$\int f(x)\,dx = [\text{substitute } x = \varphi(t)] \int f(\varphi(t))\,d\varphi(t) = \int f(\varphi(t))\varphi'(t)\,dt.$$

There is a similar version for definite integrals.

Example 1.1 Evaluate $I = \int_0^1 \frac{dx}{\sqrt{4 - 2x - x^2}}$.

The integral is close to $\int \frac{dx}{\sqrt{1 - x^2}}$, which equals $\sin^{-1} x$ up to a constant, so we attempt to use this known integral. By completing the square we may write
$$4 - 2x - x^2 = 5 - (x+1)^2 = 5\left(1 - \left(\frac{x+1}{\sqrt{5}}\right)^2\right),$$
so, making the substitution $t = \frac{x+1}{\sqrt{5}}$, $dx = \sqrt{5}\,dt$, where $t: \frac{1}{\sqrt{5}} \to \frac{2}{\sqrt{5}}$, we have
$$I = \int_{1/\sqrt{5}}^{2/\sqrt{5}} \frac{\sqrt{5}\,dt}{\sqrt{5}\sqrt{1 - t^2}} = \int_{1/\sqrt{5}}^{2/\sqrt{5}} \frac{dt}{\sqrt{1 - t^2}} = \sin^{-1}\frac{2}{\sqrt{5}} - \sin^{-1}\frac{1}{\sqrt{5}}.$$
Now let us recall the technique of integration by parts, which is in many respects the soul of analysis. Integration by parts is the integral form of the product rule for derivatives. Since $(fg)' = f'g + fg'$, we have
$$f(x)g(x) = \int g(x)f'(x)\,dx + \int f(x)g'(x)\,dx;$$
rearranging the terms, we obtain
$$\int f(x)g'(x)\,dx = f(x)g(x) - \int g(x)f'(x)\,dx.$$
Similarly we have
$$\int_a^b f(x)g'(x)\,dx = f(x)g(x)\Big|_a^b - \int_a^b g(x)f'(x)\,dx,$$
or, in terms of differentials, we can rewrite the preceding formula as
$$\int_a^b f(x)\,dg(x) = f(x)g(x)\Big|_a^b - \int_a^b g(x)\,df(x).$$
However, there is no general rule to tell us how to split an integrand into $f(x)g'(x)$.

Example 1.2 Consider $I = \int x e^x\,dx$. Then
$$I = \int x\,de^x = xe^x - \int e^x\,dx = (x - 1)e^x + C.$$

Example 1.3 Now let us consider $I_n = \int x^n e^x\,dx$ where $n = 1, 2, 3, \dots$. Using integration by parts,
$$I_n = \int x^n\,de^x = x^n e^x - \int e^x\,dx^n = x^n e^x - n\int x^{n-1} e^x\,dx = x^n e^x - nI_{n-1},$$
which gives a reduction formula. Repeating the use of integration by parts, one can eventually work out the result. For example
$$I_2 = x^2 e^x - 2I_1 = \left(x^2 - 2(x - 1)\right)e^x + C$$
and
$$I_3 = x^3 e^x - 3I_2 = \left(x^3 - 3\left(x^2 - 2(x - 1)\right)\right)e^x + C,$$
etc.
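As a quick sanity check (an aside, not part of the original notes), the reduction formula is easy to verify with Python's sympy; the helper `I` below is illustrative:

```python
import sympy as sp

x = sp.symbols('x')

def I(n):
    """Antiderivative of x^n e^x built from the reduction I_n = x^n e^x - n I_{n-1}."""
    if n == 0:
        return sp.exp(x)  # I_0 = e^x (up to a constant)
    return x**n * sp.exp(x) - n * I(n - 1)

# Compare with sympy's own antiderivative: the derivatives of both must agree.
direct = sp.integrate(x**3 * sp.exp(x), x)
print(sp.simplify(sp.diff(I(3) - direct, x)))  # 0
```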

Example 1.4 Consider $I_n = \int \cos^n x\,dx$ where $n$ is a non-negative integer. Split the integrand $\cos^n x$ into $\cos^{n-1} x \cos x = \cos^{n-1} x\,(\sin x)'$, and perform integration by parts. Then
$$I_n = \int \cos^{n-1} x\,(\sin x)'\,dx = \int \cos^{n-1} x\,d\sin x$$
$$= \cos^{n-1} x \sin x - \int \sin x\,d\cos^{n-1} x$$
$$= \cos^{n-1} x \sin x - (n-1)\int \sin x \cos^{n-2} x\,(-\sin x)\,dx$$
$$= \cos^{n-1} x \sin x + (n-1)\int \sin^2 x \cos^{n-2} x\,dx.$$
Applying the identity $\sin^2 x = 1 - \cos^2 x$ in the last integral, we obtain
$$I_n = \cos^{n-1} x \sin x + (n-1)\int (1 - \cos^2 x)\cos^{n-2} x\,dx = \cos^{n-1} x \sin x + (n-1)I_{n-2} - (n-1)I_n.$$
Collecting the $I_n$ together, we obtain
$$nI_n = (n-1)I_{n-2} + \cos^{n-1} x \sin x,$$
so that
$$I_n = \frac{n-1}{n}I_{n-2} + \frac{1}{n}\cos^{n-1} x \sin x,$$
which reduces the calculation of $I_n$ to $I_0$ or $I_1$, both easy to evaluate. For example
$$\int_0^{\pi/2} \cos^n x\,dx = \frac{n-1}{n}\int_0^{\pi/2} \cos^{n-2} x\,dx + \frac{1}{n}\cos^{n-1} x \sin x\,\Big|_0^{\pi/2} = \frac{n-1}{n}\int_0^{\pi/2} \cos^{n-2} x\,dx$$
$$= \begin{cases} \dfrac{n-1}{n}\cdot\dfrac{n-3}{n-2}\cdots\displaystyle\int_0^{\pi/2} \cos x\,dx & \text{if } n \text{ is odd}, \\[2ex] \dfrac{n-1}{n}\cdot\dfrac{n-3}{n-2}\cdots\displaystyle\int_0^{\pi/2} dx & \text{if } n \text{ is even}. \end{cases}$$
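To see the reduction at work numerically (an aside, not part of the original notes; it assumes scipy is available, and the function name `wallis` is illustrative):

```python
import math
from scipy.integrate import quad

def wallis(n):
    """Integral of cos^n x over [0, pi/2] via the reduction formula."""
    if n == 0:
        return math.pi / 2
    if n == 1:
        return 1.0
    return (n - 1) / n * wallis(n - 2)

for n in range(6):
    numeric, _ = quad(lambda t: math.cos(t) ** n, 0, math.pi / 2)
    print(n, wallis(n), numeric)  # the two values agree for each n
```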

Example 1.5 Consider $I = \int e^x \sin x\,dx$. We have
$$I = -\int e^x\,d\cos x = -e^x \cos x + \int e^x \cos x\,dx$$
$$= -e^x \cos x + \int e^x\,d\sin x$$
$$= -e^x \cos x + e^x \sin x - \int e^x \sin x\,dx$$
$$= -e^x \cos x + e^x \sin x - I,$$
so that
$$2I = -e^x \cos x + e^x \sin x + C.$$

2 First order differential equations


An (ordinary) differential equation is an equation involving an independent variable $x$, a function $y(x)$ and its derivatives:
$$F(x, y, y', \dots, y^{(n)}) = 0.$$
By solving for the highest order derivative $y^{(n)}$ in terms of the lower order derivatives $y^{(k)}$, $k < n$, and $x$, the above equation may be written as
$$y^{(n)} = f(x, y, y', \dots, y^{(n-1)}). \tag{2.1}$$
Such an equation is called an $n$-th order differential equation. If $n = 1$, then it is called a first order differential equation. Thus a first order differential equation has the general form $y' = f(x, y)$, or implicitly $F(x, y, y') = 0$.

A function $y = \varphi(x)$ defined on some interval $J$ is called a solution of (2.1) if
$$\varphi^{(n)}(x) = f(x, \varphi(x), \varphi'(x), \dots, \varphi^{(n-1)}(x)) \quad \forall x \in J.$$

A function $y = \varphi(x)$ which contains $n$ independent arbitrary constants $C_1, \dots, C_n$ is called the general solution of (2.1) if 1) it is a solution for any choice of $C_1, \dots, C_n$, and 2) any solution of (2.1) has this form.

The concept of general solutions is not in itself very useful. We are often interested in the so-called initial value problems or boundary value problems. Observe that in order to determine the constants $C_1, \dots, C_n$ we in general need $n$ conditions, which appear as initial conditions. More precisely, an initial condition for the $n$-th order differential equation (2.1) may be formulated as
$$y(x_0) = y_0, \dots, y^{(n-1)}(x_0) = y_{n-1},$$
where $x_0 \in J$ and $y_0, \dots, y_{n-1}$ are given data.


A differential equation is called an (inhomogeneous) linear differential equation if it is linear in $y, y', \dots, y^{(n)}$, so that a linear differential equation may be written in the following general form:
$$a_n(x)y^{(n)} + a_{n-1}(x)y^{(n-1)} + \dots + a_0(x)y = h(x)$$
where $a_n, \dots, a_0$ and $h$ are functions of $x$. If $h \equiv 0$, then the linear equation is homogeneous.

A first order linear differential equation can thus be put in the following general form:
$$y' + p(x)y = q(x).$$

2.1 Separable first order DE


Consider a first order differential equation $\frac{dy}{dx} = f(x, y)$. It is separable if $f(x, y) = a(x)b(y)$, so that $\frac{dy}{dx} = a(x)b(y)$. Dividing the equation by $b(y)$ and multiplying by $dx$, we separate the variables $x$ and $y$ and write the equation as
$$\frac{dy}{b(y)} = a(x)\,dx.$$
Integrating both sides of the equation, we obtain
$$\int \frac{dy}{b(y)} = \int a(x)\,dx,$$
which gives the solutions of a separable equation implicitly. If $y_0$ is a root of $b(y) = 0$, then clearly the constant function $y = y_0$ is also a solution.

Example 2.1 Find the general solution to
$$x(y^2 - 1) + y(x^2 - 1)\frac{dy}{dx} = 0.$$
The equation is separable and can be rearranged as
$$\frac{x\,dx}{x^2 - 1} + \frac{y\,dy}{y^2 - 1} = 0.$$
After integration we obtain
$$\ln|x^2 - 1| + \ln|y^2 - 1| = C$$
($C$ is a constant), which can be put in the form
$$(x^2 - 1)(y^2 - 1) = C.$$
The constant functions $y = 1$ and $y = -1$ are solutions, but they are already included in the above general form with $C = 0$.

Example 2.2 Find the solution to $(1 + e^x)yy' = e^x$ satisfying the initial condition $y(0) = 1$.

The equation is separable:
$$y\,dy = \frac{e^x}{1 + e^x}\,dx.$$
After integration we obtain the general solution
$$\frac{1}{2}y^2 = \ln(1 + e^x) + C.$$
To match the initial condition, we set $x = 0$ and $y = 1$ in the general solution to determine the constant $C = \frac{1}{2} - \ln 2$, so that $\frac{1}{2}y^2 = \ln(1 + e^x) + \frac{1}{2} - \ln 2$. After simplification we have
$$y^2 = \ln\left[\frac{e}{4}(1 + e^x)^2\right].$$
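Readers who want to check such computations mechanically can use sympy's dsolve (an aside, not part of the original notes; sympy should handle this separable initial value problem, though the printed form of the answer may differ):

```python
import sympy as sp

x = sp.symbols('x')
y = sp.Function('y')

# (1 + e^x) y y' = e^x with y(0) = 1
ode = sp.Eq((1 + sp.exp(x)) * y(x) * y(x).diff(x), sp.exp(x))
print(sp.dsolve(ode, y(x), ics={y(0): 1}))
# equivalent to y^2 = ln(e (1 + e^x)^2 / 4) after simplification
```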
Some differential equations of first order can be transformed by proper substitutions to separable
equations.

Example 2.3 Find the general solution to $y' = \sin(x + y + 1)$.

Let $u(x) = x + y(x) + 1$, so that $u' = 1 + y'$. The original equation can be formulated as a DE for $u$, namely $u' = 1 + \sin u$, which is separable. Dividing the equation by $1 + \sin u$, we write it as
$$\frac{du}{1 + \sin u} = dx.$$
Integrating the equation, we obtain
$$\int \frac{du}{1 + \sin u} = \int dx.$$
Let us evaluate the integral on the left hand side:
$$\int \frac{du}{1 + \sin u} = \int \frac{(1 - \sin u)\,du}{(1 + \sin u)(1 - \sin u)} = \int \frac{(1 - \sin u)\,du}{1 - \sin^2 u} = \int \frac{(1 - \sin u)\,du}{\cos^2 u}$$
$$= \int \frac{du}{\cos^2 u} - \int \frac{\sin u\,du}{\cos^2 u} = \int \frac{du}{\cos^2 u} + \int \frac{d\cos u}{\cos^2 u} = \tan u - \frac{1}{\cos u} + C.$$
Therefore
$$\tan u - \frac{1}{\cos u} = x + C,$$
or, in terms of $y$ and $x$, the solution is given by
$$\tan(x + y + 1) - \frac{1}{\cos(x + y + 1)} = x + C,$$
or
$$\sin(x + y + 1) - 1 = (x + C)\cos(x + y + 1).$$
We also have the solutions $x + y + 1 = 2n\pi - \frac{\pi}{2}$, where $n$ ranges over the integers.

2.2 Homogeneous equations

Consider a first order differential equation $\frac{dy}{dx} = f(x, y)$. If the function $f(x, y)$ (of two variables) is homogeneous, i.e. $f(x, y) = h(\frac{y}{x})$ where $h$ is a function of one variable, then we can make the substitution $u(x) = \frac{y(x)}{x}$, so that $y = xu$. The product rule gives $\frac{dy}{dx} = u + x\frac{du}{dx}$, and the equation may be written as
$$u + x\frac{du}{dx} = h(u),$$
which is separable.
Example 2.4 Find the general solutions to $xy' = \sqrt{x^2 - y^2} + y$. The equation, after dividing both sides by $x$, is homogeneous:
$$y' = \sqrt{1 - \left(\frac{y}{x}\right)^2} + \frac{y}{x},$$
so we make the substitution $u = \frac{y}{x}$ and change the equation to
$$u + x\frac{du}{dx} = \sqrt{1 - u^2} + u.$$
Rearrange the equation: $\frac{du}{\sqrt{1 - u^2}} = \frac{dx}{x}$. Integrating both sides, we obtain
$$\sin^{-1} u = \ln|x| + C,$$
or, in terms of $y$, the general solutions are given by $\sin^{-1}(\frac{y}{x}) = \ln|x| + C$, together with the solutions $\frac{y}{x} = 1$ and $\frac{y}{x} = -1$.

Some differential equations of first order can be transformed into homogeneous ones by simple substitutions. For example, consider the following type of first order differential equation:
$$\frac{dy}{dx} = f\left(\frac{a_1 x + b_1 y + c_1}{a_2 x + b_2 y + c_2}\right).$$
If $c_1 = c_2 = 0$ then the equation is homogeneous, so we consider the case where $c_1$ or $c_2$ does not vanish. If
$$\begin{vmatrix} a_1 & b_1 \\ a_2 & b_2 \end{vmatrix} = 0$$
and $b_1 \neq 0$, then we make the substitution $u(x) = a_1 x + b_1 y(x)$ to transform the equation into a separable one. For the case where
$$\begin{vmatrix} a_1 & b_1 \\ a_2 & b_2 \end{vmatrix} \neq 0,$$
we make the translation $x = t + k$, $y = z + l$ such that
$$a_1 k + b_1 l + c_1 = 0, \quad a_2 k + b_2 l + c_2 = 0.$$
Considering $t$ as a new independent variable and $z$ as a function of $t$, we have
$$z(t) = y(x) - l = y(t + k) - l;$$
therefore, by the chain rule,
$$\frac{dz}{dt} = \frac{dy}{dx}.$$
The differential equation we are interested in becomes
$$\frac{dz}{dt} = f\left(\frac{a_1 t + b_1 z}{a_2 t + b_2 z}\right),$$
which is homogeneous.

Example 2.5 Find the general solution to
$$y' = 2\left(\frac{y + 2}{x + y - 1}\right)^2.$$
Solve the linear system
$$l + 2 = 0, \quad k + l - 1 = 0$$
to obtain $l = -2$ and $k = 3$. Let $x = t + 3$ and $y = z - 2$. Then the differential equation can be written as
$$z' = 2\left(\frac{z}{t + z}\right)^2,$$
which is homogeneous. Now making the standard substitution $u(t) = \frac{z(t)}{t}$, so that $z' = u + tu'$, we get
$$u + t\frac{du}{dt} = \frac{2u^2}{(1 + u)^2},$$
which is separable. Rearrange the equation:
$$t\frac{du}{dt} = \frac{2u^2 - u(1 + u)^2}{(1 + u)^2} = -\frac{u(1 + u^2)}{(1 + u)^2},$$
and separate the variables to obtain
$$-\frac{(1 + u)^2}{u(1 + u^2)}\,du = \frac{dt}{t}. \tag{2.2}$$
Since
$$\int \frac{(1 + u)^2}{u(1 + u^2)}\,du = \int \left(\frac{1}{u} + \frac{2}{1 + u^2}\right)du = \ln|u| + 2\tan^{-1} u,$$
by integrating equation (2.2) we obtain
$$\ln|u| + 2\tan^{-1} u = -\ln|t| + C.$$
In terms of $x$ and $y$ the general solution is given by
$$\ln|y + 2| + 2\tan^{-1}\frac{y + 2}{x - 3} = C.$$

2.3 Linear differential equations of first order

Consider a linear differential equation of first order
$$\frac{dy}{dx} + p(x)y = q(x) \tag{2.3}$$
where $p$ and $q$ are two continuous functions. The corresponding homogeneous equation $\frac{dz}{dx} + p(x)z = 0$ is separable, and has the general solution
$$z(x) = Ce^{-\int p(x)\,dx},$$
where $\int p(x)\,dx$ is a primitive of $p(x)$ and $C$ is an arbitrary constant. It follows that $z(x)e^{\int p(x)\,dx}$ is a constant, so that
$$\frac{d}{dx}\left(z(x)e^{\int p(x)\,dx}\right) = 0,$$
which is in turn equivalent to the homogeneous equation $z' + p(x)z = 0$.
Next we consider the inhomogeneous equation (2.3). The previous discussion suggests considering the differential of $y(x)e^{\int p(x)\,dx}$; employing the product rule for derivatives, we obtain
$$\frac{d}{dx}\left(y(x)e^{\int p(x)\,dx}\right) = e^{\int p(x)\,dx}\left(\frac{dy}{dx} + p(x)y\right) = q(x)e^{\int p(x)\,dx}, \tag{2.4}$$
so by integrating both sides of the equation we obtain
$$ye^{\int p(x)\,dx} = \int q(x)e^{\int p(x)\,dx}\,dx + C;$$
dividing the equality by $e^{\int p(x)\,dx}$, we obtain the general solution of (2.3):
$$y = e^{-\int p(x)\,dx}\left(\int q(x)e^{\int p(x)\,dx}\,dx + C\right). \tag{2.5}$$
The function $e^{\int p(x)\,dx}$, which is multiplied by $y$ to form $ye^{\int p(x)\,dx}$, is called an integrating factor for the inhomogeneous equation (2.3).

We may describe the above procedure for obtaining general solutions of first order linear differential equations as follows; it includes an idea that can be applied in other situations, and thus is worth learning.
Observe that $z(x) = e^{-\int p(x)\,dx}$ is a non-trivial solution to the corresponding homogeneous equation $z' + p(x)z = 0$. In order to obtain the general solution to the inhomogeneous equation (2.3), we make use of the solution $z(x)$ by making the substitution
$$u(x) = \frac{y(x)}{z(x)} \tag{2.6}$$
(a standard substitution whenever $z$ is a known function related to the differential equation we are interested in; we will use this idea in several instances later on), and turn (2.3) into a differential equation in $u$. Of course, according to the explicit form of $z(x)$ we have $u(x) = y(x)e^{\int p(x)\,dx}$, and (2.4) just says that
$$u' = q(x)e^{\int p(x)\,dx},$$
which can be integrated to obtain the solution $u$.


Example 2.6 Solve the differential equation $y' + 2xy = 2xe^{-x^2}$.

First work out an integrating factor $r(x) = e^{\int 2x\,dx} = e^{x^2}$. Multiplying both sides of the equation by $r(x)$, we obtain
$$e^{x^2}y' + 2xe^{x^2}y = 2x,$$
that is,
$$\frac{d}{dx}\left(e^{x^2}y\right) = 2x.$$
After integration we obtain
$$e^{x^2}y = x^2 + C,$$
so that $y = (x^2 + C)e^{-x^2}$ is the general solution.
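The same integrating-factor computation can be reproduced symbolically (an aside, not part of the original notes):

```python
import sympy as sp

x = sp.symbols('x')
y = sp.Function('y')

# y' + 2x y = 2x e^{-x^2}; the integrating factor is e^{x^2}
ode = sp.Eq(y(x).diff(x) + 2 * x * y(x), 2 * x * sp.exp(-x**2))
print(sp.dsolve(ode, y(x)))  # Eq(y(x), (C1 + x**2)*exp(-x**2))
```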


Example 2.7 Bernoulli's equation is the non-linear first order equation
$$\frac{dy}{dx} + p(x)y = q(x)y^n$$
where $n \neq 0, 1$ (but $n$ is not necessarily an integer).

Dividing by $y^n$, the equation becomes
$$\frac{1}{y^n}\frac{dy}{dx} + p(x)y^{1-n} = q(x).$$
Using the transformation $z = y^{1-n}$, the equation is transformed into a linear equation,
$$\frac{dz}{dx} + (1 - n)p(x)z = (1 - n)q(x),$$
so that
$$y^{1-n} = e^{-(1-n)\int p(x)\,dx}\left((1 - n)\int q(x)e^{(1-n)\int p(x)\,dx}\,dx + C\right).$$

3 Linear differential equations


Differential equations of second order play a special role in science. Many physical laws are expressed as second order ordinary or partial differential equations, such as the dynamics described by Newton's law of gravity, or fluid dynamics, which is governed by the Navier-Stokes equations.

3.1 Structure of general solutions to linear differential equations

Let us first describe the structure of solutions to linear differential equations. Recall that the general linear differential equation of order $n$ is an equation that can be written
$$a_n(x)y^{(n)} + \dots + a_1(x)y' + a_0(x)y = f(x) \tag{3.1}$$
where the $a_i$ are continuous functions (on some interval) and $a_n \neq 0$.

Suppose $y_p$ is a particular solution of (3.1); then clearly $y$ is a solution of (3.1) if and only if $y - y_p$ is a solution of the corresponding homogeneous linear DE of $n$-th order,
$$a_n(x)y^{(n)} + \dots + a_1(x)y' + a_0(x)y = 0. \tag{3.2}$$

If $y_1$ and $y_2$ are two solutions of (3.2), then so is any linear combination $\alpha y_1 + \beta y_2$; moreover, there are $n$ linearly independent solutions $y_1, \dots, y_n$ of (3.2) such that the general solution is
$$y = C_1 y_1 + \dots + C_n y_n,$$
where $C_1, \dots, C_n$ are arbitrary constants. That is, the collection of all solutions to a homogeneous linear equation of $n$-th order is a vector space of dimension $n$. It follows that the general solution to (3.1) is given by
$$y = C_1 y_1 + \dots + C_n y_n + y_p,$$
where $y_p$ is a particular solution of (3.1), and $C_1 y_1 + \dots + C_n y_n$ is the general solution to the corresponding homogeneous equation (3.2).
Let us examine again the general observation we used to solve the general linear differential equation of first order. That is, if there is a non-trivial function $z(x)$ which has some connection to the differential equation we are interested in (for example, for a linear equation, the function may be a solution to the corresponding homogeneous equation), we can make use of the known function in a canonical way by making the substitution $u(x) = \frac{y(x)}{z(x)}$ and working with the differential equation that $u$ must satisfy.

Obviously the constant zero function is a trivial solution to any homogeneous linear equation, which of course gives us no useful information. Suppose however that we know, say by inspection, a non-trivial solution $z(x)$ of the homogeneous equation (3.2); then we may reduce the equation to a lower order differential equation. Let us demonstrate this idea for homogeneous second order differential equations, for simplicity.

Suppose $z(x) \neq 0$ is a non-trivial solution to a homogeneous linear differential equation of second order,
$$p(x)\frac{d^2 y}{dx^2} + q(x)\frac{dy}{dx} + r(x)y = 0. \tag{3.3}$$
Make the standard substitution $u(x) = \frac{y(x)}{z(x)}$, so that $y = uz$. Then $y' = u'z + uz'$ and $y'' = u''z + 2u'z' + uz''$; substituting these into (3.3), we obtain
$$p(x)\left(u''z + 2u'z' + uz''\right) + q(x)\left(u'z + uz'\right) + r(x)uz = 0.$$
Rearranging the above equation and using the fact that $z$ is a solution of (3.3), we get
$$p(x)z(x)u'' + \left(2p(x)z'(x) + q(x)z(x)\right)u' = 0, \tag{3.4}$$
which is a homogeneous differential equation of first order for $u'$.

Example 3.1 Verify that $z(x) = \frac{1}{x}$ is a solution to
$$xy'' + 2(1 - x)y' - 2y = 0,$$
and hence find its general solution.

Since $z' = -x^{-2}$ and $z'' = 2x^{-3}$, we can easily see that $z$ is a solution. Making the substitution $y(x) = \frac{1}{x}u(x)$ in the equation, we obtain a differential equation for $u$:
$$x\cdot\frac{1}{x}u'' + \left(2x\cdot(-x^{-2}) + 2(1 - x)\cdot\frac{1}{x}\right)u' = 0.$$
Let $w = u'$ and simplify the above equation:
$$w' - 2w = 0,$$
which is separable, and has the general solution $w(x) = C_1 e^{2x}$. Integrating $w$, we obtain
$$u(x) = \int w(x)\,dx = C_1 e^{2x} + C_2,$$
so that
$$y(x) = \frac{1}{x}\left(C_1 e^{2x} + C_2\right) \tag{3.5}$$
is the general solution, where $C_1$ and $C_2$ are arbitrary constants.
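A quick symbolic verification of (3.5) (an aside, not part of the original notes):

```python
import sympy as sp

x, C1, C2 = sp.symbols('x C1 C2')

# Candidate general solution y = (C1 e^{2x} + C2) / x
y = (C1 * sp.exp(2 * x) + C2) / x

# Substitute into x y'' + 2(1 - x) y' - 2 y; the residual should vanish
residual = x * y.diff(x, 2) + 2 * (1 - x) * y.diff(x) - 2 * y
print(sp.simplify(residual))  # 0
```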

Example 3.2 Find the general solution to the inhomogeneous linear equation
$$xy'' + 2(1 - x)y' - 2y = 12x.$$
We have found the general solution of the corresponding homogeneous equation, given by (3.5); thus, according to the structure of solutions to linear equations, we only need to find a particular solution. Since the coefficients of the equation are all polynomials in $x$, we may look for a solution of the form $y(x) = ax + b$, where $a, b$ are constants. Plugging $y'' = 0$, $y' = a$ and $y = ax + b$ into the equation gives
$$2a(1 - x) - 2(ax + b) = 12x,$$
so we should have $2a - 2b = 0$ and $-2a - 2a = 12$, so that $a = -3$ and $b = -3$. Thus $y_0(x) = -3x - 3$ is a particular solution, and the general solution is given by
$$y(x) = \frac{1}{x}\left(C_1 e^{2x} + C_2\right) - 3x - 3.$$

3.2 Linear ODE with constant coefficients

For a homogeneous linear ODE with constant coefficients,
$$y^{(n)} + a_{n-1}y^{(n-1)} + \dots + a_1 y' + a_0 y = 0, \tag{3.6}$$
where $a_{n-1}, \dots, a_0$ are constants, we can construct the general solution if we can find the roots of the auxiliary equation
$$m^n + a_{n-1}m^{n-1} + \dots + a_1 m + a_0 = 0. \tag{3.7}$$
The auxiliary equation comes from the following observation. Since the derivative of $e^{mx}$ is $me^{mx}$, it is reasonable to search for a solution $y = e^{mx}$. Substituting $y^{(k)} = m^k e^{mx}$ into (3.6), we have
$$\left(m^n + a_{n-1}m^{n-1} + \dots + a_1 m + a_0\right)e^{mx} = 0;$$
thus $e^{mx}$ is a solution if and only if $m$ is a root of (3.7), as long as $m$ is real. If $m = \alpha + \beta i$ is a complex root of the auxiliary equation, then, since the coefficients $a_{n-1}, \dots, a_0$ are real numbers, $\bar{m} = \alpha - \beta i$ is also a root. Now the complex functions $e^{mx}$ and $e^{\bar{m}x}$ both satisfy the differential equation (3.6), so the real and imaginary parts of
$$e^{mx} = e^{\alpha x}\cos(\beta x) + ie^{\alpha x}\sin(\beta x)$$
(Euler's formula) are solutions of (3.6); i.e., if $m = \alpha + \beta i$ is a complex root of the auxiliary equation, then
$$y_1(x) = e^{\alpha x}\cos(\beta x) \quad \text{and} \quad y_2(x) = e^{\alpha x}\sin(\beta x)$$
are a pair of linearly independent solutions of (3.6).

If $m$ is a repeated root of the auxiliary equation with multiplicity $k \geq 2$, then $e^{mx}, xe^{mx}, \dots, x^{k-1}e^{mx}$ are solutions. A similar conclusion is valid for complex roots. We are therefore able to construct $n$ linearly independent solutions of (3.6) via the roots of the auxiliary equation.
Example 3.3 Consider the harmonic motion described by
$$\frac{d^2 y}{dx^2} + \omega^2 y = 0$$
where $\omega \neq 0$ is real. The auxiliary equation is $m^2 + \omega^2 = 0$, which has two complex roots $m = \omega i$ and $\bar{m} = -\omega i$. So we have two independent solutions $\cos\omega x$ and $\sin\omega x$, and the general solution is
$$y(x) = A\cos\omega x + B\sin\omega x,$$
where $A, B$ are arbitrary constants.

Example 3.4 Solve the equation
$$\frac{d^3 y}{dx^3} - 4\frac{d^2 y}{dx^2} + \frac{dy}{dx} + 6y = 0.$$
The auxiliary equation
$$m^3 - 4m^2 + m + 6 = 0$$
has roots $-1, 2, 3$, so the general solution is
$$y(x) = C_1 e^{-x} + C_2 e^{2x} + C_3 e^{3x}.$$
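Roots of an auxiliary polynomial are easy to check numerically (an aside, not part of the original notes):

```python
import numpy as np

# Auxiliary equation of Example 3.4: m^3 - 4m^2 + m + 6 = 0
print(np.roots([1, -4, 1, 6]))  # approximately [ 3.  2. -1.]
```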

The situation for second order differential equations with constant coefficients is particularly simple. Consider the homogeneous linear equation
$$\frac{d^2 y}{dx^2} + a\frac{dy}{dx} + by = 0 \tag{3.8}$$
where $a, b$ are two real numbers.

Theorem 3.5 Suppose the auxiliary equation
$$m^2 + am + b = 0$$
has two roots $m_1$ and $m_2$.

1) If $m_1 \neq m_2$ are real, then the general solution is given by
$$y(x) = C_1 e^{m_1 x} + C_2 e^{m_2 x}.$$
2) If $m = m_1 = m_2$ is a repeated real root, then the general solution is
$$y(x) = (C_1 + C_2 x)e^{mx}.$$
3) If $m_1 = \alpha + \beta i$ is a complex root ($\beta \neq 0$), so that $m_2 = \alpha - \beta i$, then the general solution is
$$y(x) = e^{\alpha x}\left(C_1\cos\beta x + C_2\sin\beta x\right).$$

Proof. Note that $a = -(m_1 + m_2)$ and $b = m_1 m_2$. We consider 1) and 2) first. In either case $e^{m_1 x}$ is a solution, so we make the substitution $y(x) = u(x)e^{m_1 x}$ in the differential equation. Since
$$y' = \left(u' + m_1 u\right)e^{m_1 x}$$
and
$$y'' = \left(u'' + 2m_1 u' + m_1^2 u\right)e^{m_1 x},$$
we obtain
$$u'' + 2m_1 u' + m_1^2 u + a\left(u' + m_1 u\right) + bu = 0.$$
Using the fact that $m_1$ is a root and that $a = -(m_1 + m_2)$, we have
$$u'' - (m_2 - m_1)u' = 0.$$
Thus, if $m_2 - m_1 \neq 0$,
$$u'(x) = C_1 e^{(m_2 - m_1)x},$$
and integrating the equation, we obtain
$$u(x) = C_1 e^{(m_2 - m_1)x} + C_2,$$
which proves 1). If $m_2 - m_1 = 0$ then $u'' = 0$, so integrating twice, we obtain
$$u(x) = C_1 + C_2 x,$$
which shows 2).

Example 3.6 Solve the differential equation
$$\frac{d^2 y}{dx^2} - 2\frac{dy}{dx} + 5y = 0.$$
The auxiliary equation $m^2 - 2m + 5 = 0$ has complex roots $m = 1 + 2i$ and $\bar{m} = 1 - 2i$, so the general solution is
$$y(x) = C_1 e^x\cos 2x + C_2 e^x\sin 2x.$$

Next we give some examples of inhomogeneous linear equations.

Example 3.7 Solve the equation
$$\frac{d^2 y}{dx^2} + 4y = \sin 3x.$$
It is easy to find the general solution of the corresponding homogeneous equation
$$\frac{d^2 y}{dx^2} + 4y = 0,$$
whose auxiliary equation $m^2 + 4 = 0$ has two complex roots $\pm 2i$. Now $\sin 3x$ is the imaginary part of $e^{3ix}$, and $3i$ is not a root of the auxiliary equation; thus we search for a particular solution $y_p(x) = A\sin 3x$. Plugging it into the equation, we find $A = -\frac{1}{5}$. Hence the general solution is
$$y(x) = C_1\cos 2x + C_2\sin 2x - \frac{1}{5}\sin 3x.$$
Example 3.8 Consider
$$\frac{d^2 y}{dx^2} + 4\frac{dy}{dx} + 4y = \sin 3x.$$
The auxiliary equation $m^2 + 4m + 4 = 0$ has the repeated root $-2$. By a simple inspection there is no particular solution of the form $A\sin 3x$; instead we look for a particular solution
$$y_p(x) = A\cos 3x + B\sin 3x.$$
Then
$$-9A + 12B + 4A = 0$$
and
$$-9B - 12A + 4B = 1.$$
Thus
$$A = -\frac{12}{169}, \quad B = -\frac{5}{169}.$$
The general solution is
$$y(x) = (C_1 x + C_2)e^{-2x} - \frac{12}{169}\cos 3x - \frac{5}{169}\sin 3x.$$

Example 3.9 Let us now consider
$$\frac{d^2 y}{dx^2} + 4y = \sin 2x.$$
We have seen that $\sin 2x$ is a solution to the corresponding homogeneous equation, so we look for a particular solution
$$y_p(x) = Ax\cos 2x + Bx\sin 2x.$$
Then $B = 0$ and $A = -\frac{1}{4}$, so the general solution is
$$y(x) = C_1\cos 2x + C_2\sin 2x - \frac{1}{4}x\cos 2x.$$
Example 3.10 Find a particular solution to
$$\frac{d^2 y}{dx^2} + 4y = \sin x + \sin 2x.$$
By a simple inspection, $y_1 = \frac{1}{3}\sin x$ is a particular solution to
$$\frac{d^2 y}{dx^2} + 4y = \sin x,$$
and we know from the previous example that $y_2 = -\frac{1}{4}x\cos 2x$ is a particular solution to
$$\frac{d^2 y}{dx^2} + 4y = \sin 2x.$$
Thus
$$y_p = \frac{1}{3}\sin x - \frac{1}{4}x\cos 2x$$
is a particular solution.

Example 3.11 Let us consider the inhomogeneous linear equation
$$\frac{d^2 y}{dx^2} - 3\frac{dy}{dx} + 2y = f(x)$$
where $f(x)$ is a given function.

The auxiliary equation $m^2 - 3m + 2 = 0$ has two real roots $1$ and $2$, so the general solution of the corresponding homogeneous equation is $C_1 e^x + C_2 e^{2x}$.

1) Suppose $f(x) = \sin x$, which is the imaginary part of $e^{ix}$. Since $i$ is not a root of the auxiliary equation, we may search for a particular solution $y_p = A\sin x + B\cos x$ (trying just $A\sin x$ would not work, because of the $y'$ term). Feeding $y_p$, $y_p' = A\cos x - B\sin x$ and $y_p'' = -y_p$ into the differential equation,
$$-y_p - 3(A\cos x - B\sin x) + 2y_p = \sin x,$$
and collecting the terms in $\sin x$ and $\cos x$, we obtain
$$(A + 3B - 1)\sin x + (B - 3A)\cos x = 0.$$
Setting $A + 3B - 1 = 0$ and $B - 3A = 0$ and solving the system, we obtain $A = \frac{1}{10}$ and $B = \frac{3}{10}$. The general solution is given by
$$y = C_1 e^x + C_2 e^{2x} + \frac{1}{10}\sin x + \frac{3}{10}\cos x.$$
2) $f(x) = e^{3x}$. Since $3$ is not a root of the auxiliary equation, we search for a particular solution $y_p = Ae^{3x}$. Feeding it into the differential equation,
$$(9A - 9A + 2A)e^{3x} = e^{3x},$$
we obtain a particular solution $y_p = \frac{1}{2}e^{3x}$.

If however $f(x) = e^{2x}$, then we may attempt a particular solution $y_p = Axe^{2x}$, as $e^{2x}$ is a solution of the corresponding homogeneous equation. Using the equations $y_p' = Ae^{2x} + 2y_p$ and
$$y_p'' = 2Ae^{2x} + 2y_p' = 4Ae^{2x} + 4y_p,$$
feeding them into the differential equation,
$$4Ae^{2x} + 4y_p - 3\left(Ae^{2x} + 2y_p\right) + 2y_p = e^{2x},$$
and collecting the terms in $e^{2x}$ and $y_p$,
$$(4A - 3A - 1)e^{2x} = 0,$$
so that $A = 1$, i.e. $y_p = xe^{2x}$ is a solution. Hence the general solution to
$$\frac{d^2 y}{dx^2} - 3\frac{dy}{dx} + 2y = e^{2x}$$
is given by
$$y = C_1 e^x + C_2 e^{2x} + xe^{2x}.$$
3) $f(x) = xe^{2x}$. Since $2$ is a root of the auxiliary equation, we may search for a particular solution of the form $y_p = (Ax^2 + Bx)e^{2x}$ (we have included $Bxe^{2x}$ as well, since $e^{2x}$ is a solution of the homogeneous equation, but $xe^{2x}$ is not).

4) $f(x) = e^x\sin x$, which is the imaginary part of $e^{(1+i)x}$; since $1 + i$ is not a root of the auxiliary equation, we may search for a particular solution $y_p = (A\cos x + B\sin x)e^x$.

5) $f(x) = \sin^2 x$. Since $\sin^2 x = \frac{1}{2} - \frac{1}{2}\cos 2x$, we may attempt a particular solution of the form $y_p = A + B\cos 2x + C\sin 2x$.
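The undetermined-coefficient answers above are easy to confirm with a computer algebra system (an aside, not part of the original notes); for instance, for case 1):

```python
import sympy as sp

x = sp.symbols('x')
y = sp.Function('y')

ode = sp.Eq(y(x).diff(x, 2) - 3 * y(x).diff(x) + 2 * y(x), sp.sin(x))
print(sp.dsolve(ode, y(x)))
# particular part: sin(x)/10 + 3*cos(x)/10, as found above
```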

4 Some facts about matrices


An $m \times n$ matrix $A$ is an array of numbers arranged into $m$ rows and $n$ columns:
$$A = \begin{pmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & & \vdots \\ a_{m1} & a_{m2} & \cdots & a_{mn} \end{pmatrix}.$$
In short we may write $A = (a_{ij})$, where $a_{ij}$ is the entry in the $i$th row and $j$th column. If $m = n$, then $A$ is called a square matrix.
Let us concentrate on $2 \times 2$ matrices. You will learn the general theory of matrices in linear algebra (topics in your paper Mathematics I).

First of all, we have elementary operations on $2 \times 2$ matrices: if
$$A = \begin{pmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{pmatrix}, \quad B = \begin{pmatrix} b_{11} & b_{12} \\ b_{21} & b_{22} \end{pmatrix},$$
then we can form a matrix $A + B$ by simply adding their corresponding entries:
$$A + B = \begin{pmatrix} a_{11} + b_{11} & a_{12} + b_{12} \\ a_{21} + b_{21} & a_{22} + b_{22} \end{pmatrix}.$$
If $\lambda$ is a number we may form a matrix
$$\lambda A = \begin{pmatrix} \lambda a_{11} & \lambda a_{12} \\ \lambda a_{21} & \lambda a_{22} \end{pmatrix}.$$
That is, $A \pm B = (a_{ij} \pm b_{ij})$ and $\lambda A = (\lambda a_{ij})$. The more interesting operation is the multiplication of two matrices, defined as follows:
$$AB = \begin{pmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{pmatrix}\begin{pmatrix} b_{11} & b_{12} \\ b_{21} & b_{22} \end{pmatrix} = \begin{pmatrix} a_{11}b_{11} + a_{12}b_{21} & a_{11}b_{12} + a_{12}b_{22} \\ a_{21}b_{11} + a_{22}b_{21} & a_{21}b_{12} + a_{22}b_{22} \end{pmatrix}.$$
That is, if $AB = (c_{ij})$ then the entry
$$c_{ij} = (a_{i1}, a_{i2})\begin{pmatrix} b_{1j} \\ b_{2j} \end{pmatrix} = \text{dot product of } (a_{i1}, a_{i2}) \text{ and } (b_{1j}, b_{2j}).$$

Example 4.1 Let
$$A = \begin{pmatrix} 2 & 1 \\ 1 & 3 \end{pmatrix} \quad \text{and} \quad B = \begin{pmatrix} 0 & 3 \\ 2 & 5 \end{pmatrix}.$$
Find $A + B$, $A - B$, $\lambda A$, $AB$ and $BA$.
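These operations are one-liners in numpy (an aside, not part of the original notes; the entries below are taken as printed above):

```python
import numpy as np

A = np.array([[2, 1], [1, 3]])
B = np.array([[0, 3], [2, 5]])

print(A + B)   # entrywise sum
print(A - B)   # entrywise difference
print(2 * A)   # scalar multiple (lambda = 2)
print(A @ B)   # matrix product AB
print(B @ A)   # BA; note AB != BA in general
```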

In general we have $A + B = B + A$, $C(A + B) = CA + CB$ and $(AB)C = A(BC)$, but the multiplication of matrices is in general not commutative.

The determinant of a $2 \times 2$ matrix $A$ is denoted by $\det(A)$ or $|A|$, and defined by
$$\det(A) = \begin{vmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{vmatrix} = a_{11}a_{22} - a_{12}a_{21}.$$
The mapping $A \to \det(A)$ is not additive, but it is multiplicative, i.e.
$$\det(AB) = \det(A)\det(B) = \det(BA).$$
We will use $I$ to denote the identity matrix
$$\begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}.$$
It is trivial that $IA = AI = A$ for any $2 \times 2$ matrix $A$. Clearly $\det(I) = 1$.

Given a $2 \times 2$ matrix $A$, we say a $2 \times 2$ matrix $B$ (if one exists) is an inverse matrix of $A$ if $AB = BA = I$. Since $\det(AB) = \det(A)\det(B)$, a necessary condition for the existence of an inverse matrix of $A$ is that $\det(A) \neq 0$. It turns out this condition is also sufficient.
Theorem 4.2 Let $A = (a_{ij})$ be a $2 \times 2$ matrix. Then $A$ has an inverse matrix if and only if $\det(A) \neq 0$. In this case, the inverse matrix is unique, and is thus denoted by $A^{-1}$, given by
$$A^{-1} = \frac{1}{\det(A)}\begin{pmatrix} a_{22} & -a_{12} \\ -a_{21} & a_{11} \end{pmatrix}.$$
Proof. By a direct computation we can see that $A^{-1}$ defined as above is an inverse matrix. If $B$ is an inverse of $A$, then
$$B = B(AA^{-1}) = (BA)A^{-1} = IA^{-1} = A^{-1},$$
so the inverse matrix is unique.

We observe that $\det(A) = 0$, i.e. $a_{11}a_{22} = a_{21}a_{12}$, means the two column vectors $\begin{pmatrix} a_{11} \\ a_{21} \end{pmatrix}$ and $\begin{pmatrix} a_{12} \\ a_{22} \end{pmatrix}$ are proportional, that is, they are linearly dependent.
 
Let us consider $\mathbb{R}^2$ as the vector space of column vectors $v = \begin{pmatrix} v_1 \\ v_2 \end{pmatrix}$ (also considered as $2 \times 1$ matrices). Let $A = (a_{ij})$ be a $2 \times 2$ matrix. Then we associate with $A$ a linear mapping from $\mathbb{R}^2$ to $\mathbb{R}^2$, also denoted by $A$, defined by
$$Av = \begin{pmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{pmatrix}\begin{pmatrix} v_1 \\ v_2 \end{pmatrix} = \begin{pmatrix} a_{11}v_1 + a_{12}v_2 \\ a_{21}v_1 + a_{22}v_2 \end{pmatrix}.$$
Proposition 4.3 Let $A = (a_{ij})$ be a $2 \times 2$ matrix.

1) The linear system $Av = 0$ has non-zero solutions if and only if $\det(A) = 0$.

2) The linear system $Av = \lambda v$ has a solution $v \neq 0$ if and only if $\lambda$ is an eigenvalue of $A$, that is, $\det(A - \lambda I) = 0$ (which is called the characteristic equation of $A$). In this case, $v$ is called an eigenvector (corresponding to the eigenvalue $\lambda$).
A square matrix $A = (a_{ij})$ is diagonal if $a_{ij} = 0$ for all $i \neq j$.

Theorem 4.4 Suppose a $2 \times 2$ matrix $A = (a_{ij})$ has distinct real eigenvalues $\lambda_1$ and $\lambda_2$, and let $v_i = \begin{pmatrix} v_{1i} \\ v_{2i} \end{pmatrix}$ be corresponding eigenvectors with eigenvalues $\lambda_i$ ($i = 1, 2$). Let
$$P = (v_1, v_2) = \begin{pmatrix} v_{11} & v_{12} \\ v_{21} & v_{22} \end{pmatrix}.$$
Then
$$P^{-1}AP = \begin{pmatrix} \lambda_1 & 0 \\ 0 & \lambda_2 \end{pmatrix}.$$
Proof. First we show that $P$ is invertible, which is equivalent to $v_1$ and $v_2$ being linearly independent. Suppose $\alpha v_1 + \beta v_2 = 0$, so that $\alpha Av_1 + \beta Av_2 = 0$. Hence $\alpha\lambda_1 v_1 + \beta\lambda_2 v_2 = 0$. It follows that $\beta(\lambda_2 - \lambda_1)v_2 = 0$, so that $\beta = 0$, and similarly $\alpha = 0$. Therefore $v_1$ and $v_2$ are linearly independent, and $P^{-1}$ exists.

By definition,
$$AP = A(v_1, v_2) = (Av_1, Av_2) = (\lambda_1 v_1, \lambda_2 v_2) = P\begin{pmatrix} \lambda_1 & 0 \\ 0 & \lambda_2 \end{pmatrix}.$$
Since $P^{-1}$ exists,
$$P^{-1}AP = \begin{pmatrix} \lambda_1 & 0 \\ 0 & \lambda_2 \end{pmatrix}.$$

Example 4.5 Find all the eigenvalues and eigenvectors of the following matrices:
$$A = \begin{pmatrix} 2 & 1 \\ 6 & 3 \end{pmatrix}, \quad B = \begin{pmatrix} 2 & 1 \\ 0 & 2 \end{pmatrix} \quad \text{and} \quad C = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}.$$
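numpy computes eigenvalues and eigenvectors directly (an aside, not part of the original notes; entries as printed above):

```python
import numpy as np

for name, M in [('A', [[2, 1], [6, 3]]),
                ('B', [[2, 1], [0, 2]]),
                ('C', [[0, 1], [1, 0]])]:
    vals, vecs = np.linalg.eig(np.array(M, dtype=float))
    print(name, vals)  # the columns of vecs are corresponding eigenvectors
```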

5 Systems of linear differential equations


Consider a linear differential equation of order $n$:
$$y^{(n)} + a_{n-1}y^{(n-1)} + \dots + a_0 y = f(t).$$
By introducing the functions $y_k = y^{(k)}$, $k = 0, \dots, n-1$, this linear equation of order $n$ is equivalent to the following system of linear equations of first order:
$$\begin{cases} y_{n-1}' = -a_{n-1}y_{n-1} - \dots - a_0 y_0 + f(t), \\ y_{n-2}' = y_{n-1}, \\ \quad\vdots \\ y_0' = y_1. \end{cases}$$
For example, a second order linear differential equation
$$\frac{d^2 y}{dt^2} + a\frac{dy}{dt} + by = f(t)$$
is equivalent to the system
$$\begin{cases} \frac{dx}{dt} = -ax - by + f(t), \\ \frac{dy}{dt} = x. \end{cases}$$
In terms of matrix notation, it can be written as
$$\begin{pmatrix} \frac{dx}{dt} \\ \frac{dy}{dt} \end{pmatrix} = \begin{pmatrix} -a & -b \\ 1 & 0 \end{pmatrix}\begin{pmatrix} x \\ y \end{pmatrix} + \begin{pmatrix} f(t) \\ 0 \end{pmatrix}.$$

Example 5.1 Solve the following initial value problem:
$$\begin{cases} \frac{dx}{dt} = 3x + y, & x(0) = 1; \\ \frac{dy}{dt} = 6x + 4y, & y(0) = 1. \end{cases}$$
From the first equation, $y = \frac{dx}{dt} - 3x$; substitute this into the second equation to obtain
$$\frac{d^2 x}{dt^2} - 3\frac{dx}{dt} = 6x + 4\frac{dx}{dt} - 12x,$$
so $x$ solves the homogeneous linear equation of second order
$$\frac{d^2 x}{dt^2} - 7\frac{dx}{dt} + 6x = 0,$$
whose auxiliary equation has roots $1$ and $6$, so $x(t) = C_1 e^t + C_2 e^{6t}$. Since $x(0) = 1$ and $x'(0) = y(0) + 3x(0) = 4$, we have
$$C_1 + C_2 = 1, \quad C_1 + 6C_2 = 4.$$
Thus $C_2 = \frac{3}{5}$ and $C_1 = \frac{2}{5}$, and
$$x(t) = \frac{2}{5}e^t + \frac{3}{5}e^{6t}, \quad y(t) = -\frac{4}{5}e^t + \frac{9}{5}e^{6t}.$$
We next describe another method, which is contained in the following theorem.

Theorem 5.2 Consider the system of linear equations with constant coefficients
$$\begin{pmatrix} \frac{dx}{dt} \\ \frac{dy}{dt} \end{pmatrix} = \begin{pmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{pmatrix}\begin{pmatrix} x \\ y \end{pmatrix}.$$
Suppose $A = (a_{ij})$ has distinct eigenvalues $\lambda_1$ and $\lambda_2$ with corresponding eigenvectors $v_k = \begin{pmatrix} v_{1k} \\ v_{2k} \end{pmatrix}$ ($k = 1, 2$). Then the general solution of the system is given by
$$\begin{pmatrix} x(t) \\ y(t) \end{pmatrix} = C_1 e^{\lambda_1 t}v_1 + C_2 e^{\lambda_2 t}v_2,$$
where $C_1$ and $C_2$ are arbitrary constants.

Proof. Let $P = (v_1, v_2)$. We know that $P^{-1}$ exists. The system may be written as
$$\begin{pmatrix} x \\ y \end{pmatrix}' = A\begin{pmatrix} x \\ y \end{pmatrix} = AP\,P^{-1}\begin{pmatrix} x \\ y \end{pmatrix}.$$
In order to use the fact that
$$P^{-1}AP = \begin{pmatrix} \lambda_1 & 0 \\ 0 & \lambda_2 \end{pmatrix},$$
we introduce a pair of functions $z_1$ and $z_2$ defined by
$$z(t) = \begin{pmatrix} z_1(t) \\ z_2(t) \end{pmatrix} = P^{-1}\begin{pmatrix} x(t) \\ y(t) \end{pmatrix}.$$
Then
$$\frac{d}{dt}z(t) = P^{-1}\begin{pmatrix} \frac{d}{dt}x(t) \\ \frac{d}{dt}y(t) \end{pmatrix} = P^{-1}AP\,z(t) = \begin{pmatrix} \lambda_1 & 0 \\ 0 & \lambda_2 \end{pmatrix}\begin{pmatrix} z_1(t) \\ z_2(t) \end{pmatrix}.$$
That is,
$$z_1'(t) = \lambda_1 z_1(t) \quad \text{and} \quad z_2'(t) = \lambda_2 z_2(t),$$
so that $z_k(t) = C_k e^{\lambda_k t}$ ($k = 1, 2$). Hence
$$\begin{pmatrix} x(t) \\ y(t) \end{pmatrix} = P\begin{pmatrix} z_1(t) \\ z_2(t) \end{pmatrix} = (v_1, v_2)\begin{pmatrix} z_1(t) \\ z_2(t) \end{pmatrix} = C_1 e^{\lambda_1 t}v_1 + C_2 e^{\lambda_2 t}v_2.$$
Remark 5.3 If $\lambda_1 = \alpha + \beta i$ is complex (where $\beta \neq 0$) and $v = v_1 + v_2 i$ is a (complex) eigenvector, then the general solution is given by
$$\begin{pmatrix} x(t) \\ y(t) \end{pmatrix} = C_1 e^{\alpha t}\left((\cos\beta t)\,v_1 - (\sin\beta t)\,v_2\right) + C_2 e^{\alpha t}\left((\sin\beta t)\,v_1 + (\cos\beta t)\,v_2\right),$$
where $C_1, C_2$ are arbitrary constants.

Example 5.4 Solve the system of linear differential equations
$$\begin{pmatrix} \frac{dx}{dt} \\ \frac{dy}{dt} \end{pmatrix} = \begin{pmatrix} 0 & 1 \\ -2 & 3 \end{pmatrix}\begin{pmatrix} x \\ y \end{pmatrix}.$$
Solve the characteristic equation $\det(A - \lambda I) = 0$, i.e. $\lambda^2 - 3\lambda + 2 = 0$, to obtain the eigenvalues $\lambda_1 = 1$ and $\lambda_2 = 2$. For $\lambda_1 = 1$, solve the linear system
$$\begin{pmatrix} 0 - 1 & 1 \\ -2 & 3 - 1 \end{pmatrix}\begin{pmatrix} c_1 \\ c_2 \end{pmatrix} = 0$$
to obtain $c_1 = c_2$, so $\begin{pmatrix} 1 \\ 1 \end{pmatrix}$ is an eigenvector with eigenvalue $1$. Similarly, solve the linear system
$$\begin{pmatrix} 0 - 2 & 1 \\ -2 & 3 - 2 \end{pmatrix}\begin{pmatrix} c_1 \\ c_2 \end{pmatrix} = 0$$
to obtain an eigenvector $\begin{pmatrix} 1 \\ 2 \end{pmatrix}$, so the general solution is
$$\begin{pmatrix} x(t) \\ y(t) \end{pmatrix} = C_1 e^t\begin{pmatrix} 1 \\ 1 \end{pmatrix} + C_2 e^{2t}\begin{pmatrix} 1 \\ 2 \end{pmatrix}.$$
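The eigenvalue computation in this example can be checked mechanically (an aside, not part of the original notes):

```python
import numpy as np

A = np.array([[0.0, 1.0], [-2.0, 3.0]])
vals, vecs = np.linalg.eig(A)
print(vals)  # [1. 2.]
print(vecs)  # columns proportional to (1, 1) and (1, 2)
```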

Example 5.5 Solve the system
$$\begin{pmatrix} \frac{dx}{dt} \\ \frac{dy}{dt} \end{pmatrix} = \begin{pmatrix} 2 & -5 \\ 2 & -4 \end{pmatrix}\begin{pmatrix} x \\ y \end{pmatrix}.$$
The characteristic equation of the matrix in the system,
$$\begin{vmatrix} 2 - \lambda & -5 \\ 2 & -4 - \lambda \end{vmatrix} = \lambda^2 + 2\lambda + 2 = 0,$$
has a pair of conjugate complex roots $\lambda_1 = -1 + i$ and $\lambda_2 = -1 - i$. For $\lambda_1 = -1 + i$ the linear system
$$\begin{pmatrix} 2 - \lambda_1 & -5 \\ 2 & -4 - \lambda_1 \end{pmatrix}\begin{pmatrix} c_1 \\ c_2 \end{pmatrix} = 0$$
has a solution vector
$$\begin{pmatrix} 5 \\ 3 \end{pmatrix} + \begin{pmatrix} 0 \\ -1 \end{pmatrix}i;$$
thus
$$\begin{pmatrix} x(t) \\ y(t) \end{pmatrix} = C_1 e^{-t}\left(\cos t\begin{pmatrix} 5 \\ 3 \end{pmatrix} - \sin t\begin{pmatrix} 0 \\ -1 \end{pmatrix}\right) + C_2 e^{-t}\left(\sin t\begin{pmatrix} 5 \\ 3 \end{pmatrix} + \cos t\begin{pmatrix} 0 \\ -1 \end{pmatrix}\right).$$

Example 5.6 Solve the initial value problem for the linear system
$$\begin{pmatrix} \frac{dx}{dt} \\ \frac{dy}{dt} \end{pmatrix} = \begin{pmatrix} 2 & 1 \\ -4 & 6 \end{pmatrix}\begin{pmatrix} x \\ y \end{pmatrix}, \qquad x(0) = 1,\ y(0) = 1.$$
The characteristic equation of the matrix in the system,
$$\begin{vmatrix} 2 - \lambda & 1 \\ -4 & 6 - \lambda \end{vmatrix} = (\lambda - 4)^2 = 0,$$
has the repeated root $4$, so the solutions are built from $e^{4t}$ and $te^{4t}$. Taking into account the initial condition, we may set $x(t) = (At + 1)e^{4t}$ and $y(t) = (Bt + 1)e^{4t}$, and feed them into the system to obtain $A = -1$ and $B = -2$. Thus the solution to the initial value problem is given by
$$x(t) = (1 - t)e^{4t}, \quad y(t) = (1 - 2t)e^{4t}.$$

6 Partial derivatives, chain rule


From this section on, we study functions of several variables.

6.1 Computations of partial derivatives

Let us begin with a (real) function of two variables, $u = f(x, y)$, defined on an open subset such as an open disk, and begin with the partial derivatives of $f$. By saying a subset $U$ of $\mathbb{R}^2$ (resp. $\mathbb{R}^n$) is open we mean that for any point $p \in U$ there is an open disk (resp. an open ball in $\mathbb{R}^n$) $B_p(r)$ centered at $p$ with radius $r > 0$ such that $B_p(r) \subseteq U$.

Holding $y = y_0$ constant, consider $f(x, y_0)$ as a function of $x$; if its derivative (in $x$) exists at $x_0$, i.e.
$$\lim_{x \to x_0} \frac{f(x, y_0) - f(x_0, y_0)}{x - x_0}$$
exists, then the limit is called the partial derivative of $f$ in $x$ at $(x_0, y_0)$, denoted by one of the following notations:
$$\frac{\partial u}{\partial x},\ \frac{\partial f(x_0, y_0)}{\partial x};\quad u_x,\ f_x(x_0, y_0);\quad D_x u,\ D_x f(x_0, y_0).$$
It was C. G. J. Jacobi who first proposed to use the symbol $\partial$ instead of $d$ for partial derivatives. Similarly we may introduce the partial derivative in $y$, denoted by $\frac{\partial u}{\partial y}$ etc. The definition of partial derivatives applies as well to functions of three variables, and to functions of several variables.

Example 6.1 Find the partial derivatives of $u = y^x$. Holding $y$ constant, $u$ is an exponential function and $\frac{\partial u}{\partial x} = y^x\ln y$, while holding $x$ constant it is a power function, so that $\frac{\partial u}{\partial y} = xy^{x-1}$.

Example 6.2 Find the partial derivatives of $u = \frac{x}{x^2 + y^2 + z^2}$. The results are
$$\frac{\partial u}{\partial x} = -\frac{2x^2}{(x^2 + y^2 + z^2)^2} + \frac{1}{x^2 + y^2 + z^2} = \frac{y^2 + z^2 - x^2}{(x^2 + y^2 + z^2)^2},$$
$$\frac{\partial u}{\partial y} = -\frac{2xy}{(x^2 + y^2 + z^2)^2}, \quad \frac{\partial u}{\partial z} = -\frac{2xz}{(x^2 + y^2 + z^2)^2}.$$

Example 6.3 Let $u = yf(x^2 - y^2)$, where $y \neq 0$ and $f(t)$ is a differentiable function with continuous derivative $f'(t)$. Then
$$\frac{1}{x}\frac{\partial u}{\partial x} + \frac{1}{y}\frac{\partial u}{\partial y} = \frac{u}{y^2}.$$
In fact, according to the chain rule we have
$$\frac{\partial u}{\partial x} = 2xyf'(x^2 - y^2), \quad \frac{\partial u}{\partial y} = f(x^2 - y^2) - 2y^2 f'(x^2 - y^2),$$
so that
$$\frac{1}{x}\frac{\partial u}{\partial x} + \frac{1}{y}\frac{\partial u}{\partial y} = \frac{1}{y}f(x^2 - y^2) = \frac{u}{y^2}.$$

Suppose $u = f(x, y)$ has partial derivatives $\frac{\partial u}{\partial x}$ and $\frac{\partial u}{\partial y}$ on an open subset, so that $\frac{\partial u}{\partial x}$ and $\frac{\partial u}{\partial y}$ are functions of the variables $x$ and $y$. If the partial derivative of $\frac{\partial u}{\partial x}$ in $x$ exists, then $\frac{\partial}{\partial x}\left(\frac{\partial u}{\partial x}\right)$ is called a second order partial derivative of $u$, denoted by $\frac{\partial^2 u}{\partial x^2}$ or by any of the following:
$$\frac{\partial^2 f(x_0, y_0)}{\partial x^2};\quad u_{xx},\ f_{xx}(x_0, y_0);\quad D^2_{xx}u,\ D^2_{xx}f(x_0, y_0).$$
Similarly $\frac{\partial}{\partial y}\left(\frac{\partial u}{\partial x}\right)$ is denoted by $\frac{\partial^2 u}{\partial y \partial x}$, etc. Higher order derivatives can be defined inductively.

Example 6.4 Find the partial derivatives of $u = \tan^{-1}\frac{x}{y}$ up to second order. In fact
$$\frac{\partial u}{\partial x} = \frac{y}{x^2 + y^2}, \quad \frac{\partial u}{\partial y} = -\frac{x}{x^2 + y^2},$$
and
$$\frac{\partial^2 u}{\partial x^2} = \frac{\partial}{\partial x}\left(\frac{y}{x^2 + y^2}\right) = -\frac{2xy}{(x^2 + y^2)^2},$$
$$\frac{\partial^2 u}{\partial x \partial y} = \frac{\partial}{\partial y}\left(\frac{y}{x^2 + y^2}\right) = \frac{x^2 - y^2}{(x^2 + y^2)^2} = \frac{\partial^2 u}{\partial y \partial x},$$
$$\frac{\partial^2 u}{\partial y^2} = \frac{\partial}{\partial y}\left(-\frac{x}{x^2 + y^2}\right) = \frac{2xy}{(x^2 + y^2)^2}.$$
In particular, $u$ solves the Laplace equation
$$\frac{\partial^2 u}{\partial x^2} + \frac{\partial^2 u}{\partial y^2} = 0.$$
We can carry on to find
$$\frac{\partial^3 u}{\partial y \partial x^2} = \frac{\partial}{\partial y}\left(-\frac{2xy}{(x^2 + y^2)^2}\right) = \frac{6xy^2 - 2x^3}{(x^2 + y^2)^3}, \quad \text{etc.}$$
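The mixed partials and the harmonicity of $u$ can be verified symbolically (an aside, not part of the original notes):

```python
import sympy as sp

x, y = sp.symbols('x y')
u = sp.atan(x / y)

print(sp.simplify(sp.diff(u, x, y) - sp.diff(u, y, x)))  # 0: mixed partials agree
print(sp.simplify(sp.diff(u, x, 2) + sp.diff(u, y, 2)))  # 0: u solves Laplace's equation
```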

6.2 The chain rule


Let us concentrate on functions of two variables for simplicity, but what we are going to do can be generalized to functions of several variables with proper modifications.

Lemma 6.5 Suppose that $u = f(x, y)$, defined on an open subset $U$, has first order partial derivatives $\frac{\partial u}{\partial x}$ and $\frac{\partial u}{\partial y}$ which are continuous functions on $U$. Let $(x_0, y_0) \in U$, $\Delta x = x - x_0$, $\Delta y = y - y_0$ and $\Delta u = f(x, y) - f(x_0, y_0)$. Then
$$\Delta u = \frac{\partial f(x_0, y_0)}{\partial x}\Delta x + \frac{\partial f(x_0, y_0)}{\partial y}\Delta y + \varepsilon, \tag{6.1}$$
where the remainder $\varepsilon$ depends on $(x_0, y_0)$ and $(\Delta x, \Delta y)$, and
$$\lim_{(x,y)\to(x_0,y_0)} \frac{\varepsilon}{\sqrt{\Delta x^2 + \Delta y^2}} = 0.$$
That is, $\varepsilon$ is small in comparison with $\Delta x$ and $\Delta y$; thus the main part of the increment $\Delta u$ at $(x_0, y_0)$ is
$$\frac{\partial f(x_0, y_0)}{\partial x}\Delta x + \frac{\partial f(x_0, y_0)}{\partial y}\Delta y,$$
which is linear in the increments $(\Delta x, \Delta y)$ of the independent variables, and is called the first order differential of $f$ at $(x_0, y_0)$.

Proof. By a simple inspection we can easily see that
$$\varepsilon = \left(\frac{f(x, y) - f(x_0, y)}{\Delta x} - \frac{\partial f(x_0, y)}{\partial x}\right)\Delta x + \left(\frac{f(x_0, y) - f(x_0, y_0)}{\Delta y} - \frac{\partial f(x_0, y_0)}{\partial y}\right)\Delta y + \left(\frac{\partial f(x_0, y)}{\partial x} - \frac{\partial f(x_0, y_0)}{\partial x}\right)\Delta x,$$
where we have used the convention that, if $\Delta x = 0$, then the term
$$\frac{f(x, y) - f(x_0, y)}{\Delta x} = 0,$$
and similarly, if $\Delta y = 0$,
$$\frac{f(x_0, y) - f(x_0, y_0)}{\Delta y} = 0.$$
Then, since
$$\frac{\Delta x}{\sqrt{\Delta x^2 + \Delta y^2}}, \quad \frac{\Delta y}{\sqrt{\Delta x^2 + \Delta y^2}}$$
are bounded, as $\sqrt{\Delta x^2 + \Delta y^2} \to 0$ we have
$$\left(\frac{f(x, y) - f(x_0, y)}{\Delta x} - \frac{\partial f(x_0, y)}{\partial x}\right)\frac{\Delta x}{\sqrt{\Delta x^2 + \Delta y^2}} \to 0$$
and
$$\left(\frac{f(x_0, y) - f(x_0, y_0)}{\Delta y} - \frac{\partial f(x_0, y_0)}{\partial y}\right)\frac{\Delta y}{\sqrt{\Delta x^2 + \Delta y^2}} \to 0.$$
Since the partial derivatives are continuous,
$$\frac{\partial f(x_0, y)}{\partial x} - \frac{\partial f(x_0, y_0)}{\partial x} \to 0$$
as $\sqrt{\Delta x^2 + \Delta y^2} \to 0$. Putting these facts together, we may conclude that
$$\frac{\varepsilon}{\sqrt{\Delta x^2 + \Delta y^2}} \to 0 \quad \text{as } \sqrt{\Delta x^2 + \Delta y^2} \to 0.$$
The first order differential of $u = f(x, y)$, denoted by $du$ or $df$, is defined as
$$df = \frac{\partial f(x, y)}{\partial x}dx + \frac{\partial f(x, y)}{\partial y}dy.$$
The function
$$z = f(x_0, y_0) + \frac{\partial f(x_0, y_0)}{\partial x}(x - x_0) + \frac{\partial f(x_0, y_0)}{\partial y}(y - y_0)$$
is the linear approximation of $z = f(x, y)$ near the point $(x_0, y_0)$. The above linear equation in $(x, y, z)$ represents the tangent plane to the surface graph of the function $z = f(x, y)$ at the point $(x_0, y_0, f(x_0, y_0))$. We will return to this topic in the following lectures.

Lemma 6.6 (Chain rule for functions of two variables) Suppose that $f(x, y)$ is a function on an open subset $U \subseteq \mathbb{R}^2$ with continuous partial derivatives $\frac{\partial f}{\partial x}$ and $\frac{\partial f}{\partial y}$, and suppose $x = \varphi(t)$ and $y = \psi(t)$ are two differentiable functions on an interval $(a, b)$ such that $(\varphi(t), \psi(t)) \in U$ for every $t \in (a, b)$. Let $F(t) = f(\varphi(t), \psi(t))$. Then $F$ is differentiable on $(a, b)$ and
$$\frac{dF(t)}{dt} = \frac{\partial f}{\partial x}\frac{d\varphi(t)}{dt} + \frac{\partial f}{\partial y}\frac{d\psi(t)}{dt}, \tag{6.2}$$
where the partials $\frac{\partial f}{\partial x}$ and $\frac{\partial f}{\partial y}$ are evaluated at $x = \varphi(t)$ and $y = \psi(t)$.

Proof. For $t_0$ and $t$ in $(a, b)$ we have
$$F(t) - F(t_0) = \frac{\partial f}{\partial x}\Delta x + \frac{\partial f}{\partial y}\Delta y + \varepsilon,$$
where
$$\Delta x = \varphi(t) - \varphi(t_0), \quad \Delta y = \psi(t) - \psi(t_0).$$
Hence
$$\frac{F(t) - F(t_0)}{\Delta t} = \frac{\partial f}{\partial x}\frac{\Delta x}{\Delta t} + \frac{\partial f}{\partial y}\frac{\Delta y}{\Delta t} + \frac{\varepsilon}{\Delta t}.$$
Letting $\Delta t \to 0$, we obtain
$$\frac{\Delta x}{\Delta t} \to \varphi'(t_0), \quad \frac{\Delta y}{\Delta t} \to \psi'(t_0),$$
$$\frac{\varepsilon}{\Delta t} = \frac{\varepsilon}{\sqrt{\Delta x^2 + \Delta y^2}}\sqrt{\left(\frac{\Delta x}{\Delta t}\right)^2 + \left(\frac{\Delta y}{\Delta t}\right)^2}\cdot\frac{|\Delta t|}{\Delta t} \to 0,$$
and therefore
$$F'(t_0) = \lim_{t \to t_0}\frac{F(t) - F(t_0)}{\Delta t} = \frac{\partial f}{\partial x}\lim_{t \to t_0}\frac{\Delta x}{\Delta t} + \frac{\partial f}{\partial y}\lim_{t \to t_0}\frac{\Delta y}{\Delta t} + \lim_{t \to t_0}\frac{\varepsilon}{\Delta t} = \frac{\partial f}{\partial x}\frac{d\varphi(t_0)}{dt} + \frac{\partial f}{\partial y}\frac{d\psi(t_0)}{dt}.$$

Suppose we make the change of variables $x = \varphi(s, t)$ and $y = \psi(s, t)$, and assume that $\varphi$ and $\psi$ have continuous partial derivatives. Consider $F(s, t) = f(\varphi(s, t), \psi(s, t))$. Holding $t$ constant and applying the chain rule (6.2) in the variable $s$, we obtain
$$\frac{\partial F}{\partial s} = \frac{\partial f}{\partial x}\frac{\partial \varphi}{\partial s} + \frac{\partial f}{\partial y}\frac{\partial \psi}{\partial s},$$
and similarly
$$\frac{\partial F}{\partial t} = \frac{\partial f}{\partial x}\frac{\partial \varphi}{\partial t} + \frac{\partial f}{\partial y}\frac{\partial \psi}{\partial t}.$$
In terms of matrices, the chain rule may be put into a neat form:
$$\left(\frac{\partial F}{\partial s}, \frac{\partial F}{\partial t}\right) = \left(\frac{\partial f}{\partial x}, \frac{\partial f}{\partial y}\right)\begin{pmatrix} \frac{\partial \varphi}{\partial s} & \frac{\partial \varphi}{\partial t} \\ \frac{\partial \psi}{\partial s} & \frac{\partial \psi}{\partial t} \end{pmatrix};$$
the $2 \times 2$ matrix on the right hand side is called the first order total derivative (or the Jacobian matrix) of the transformation $x = \varphi(s, t)$, $y = \psi(s, t)$, denoted by $D(\varphi, \psi)$.

If all the functions involved have continuous higher order partial derivatives, then we may repeat the use of the chain rule.
Example 6.7 Let $F(s, t) = f(\varphi(s, t), \psi(s, t))$. Evaluate $\frac{\partial^2 F}{\partial s \partial t}$.

By definition $\frac{\partial^2 F}{\partial s \partial t} = \frac{\partial}{\partial s}\left(\frac{\partial F}{\partial t}\right)$, so that
$$\frac{\partial^2 F}{\partial s \partial t} = \frac{\partial}{\partial s}\left(\frac{\partial f}{\partial x}\frac{\partial \varphi}{\partial t} + \frac{\partial f}{\partial y}\frac{\partial \psi}{\partial t}\right)$$
$$= \frac{\partial}{\partial s}\left(\frac{\partial f}{\partial x}\frac{\partial \varphi}{\partial t}\right) + \frac{\partial}{\partial s}\left(\frac{\partial f}{\partial y}\frac{\partial \psi}{\partial t}\right)$$
$$= \frac{\partial}{\partial s}\left(\frac{\partial f}{\partial x}\right)\frac{\partial \varphi}{\partial t} + \frac{\partial f}{\partial x}\frac{\partial^2 \varphi}{\partial s \partial t} + \frac{\partial}{\partial s}\left(\frac{\partial f}{\partial y}\right)\frac{\partial \psi}{\partial t} + \frac{\partial f}{\partial y}\frac{\partial^2 \psi}{\partial s \partial t} \quad \text{[product rule]}$$
$$= \left(\frac{\partial^2 f}{\partial x^2}\frac{\partial \varphi}{\partial s} + \frac{\partial^2 f}{\partial x \partial y}\frac{\partial \psi}{\partial s}\right)\frac{\partial \varphi}{\partial t} + \frac{\partial f}{\partial x}\frac{\partial^2 \varphi}{\partial s \partial t} \quad \text{[chain rule applied to } \tfrac{\partial f}{\partial x}\text{]}$$
$$\quad + \left(\frac{\partial^2 f}{\partial y \partial x}\frac{\partial \varphi}{\partial s} + \frac{\partial^2 f}{\partial y^2}\frac{\partial \psi}{\partial s}\right)\frac{\partial \psi}{\partial t} + \frac{\partial f}{\partial y}\frac{\partial^2 \psi}{\partial s \partial t} \quad \text{[chain rule applied to } \tfrac{\partial f}{\partial y}\text{]}.$$
Here, the important thing to keep in mind when working with higher partial derivatives is that the symbols $\frac{\partial f}{\partial x}$ and $\frac{\partial f}{\partial y}$ are again functions of $x$ and $y$, hence of $s$ and $t$, so we have to apply the chain rule to these functions as well.

As a direct consequence of the chain rule, we can show that first order differentials are invariant under substitutions. To be more precise, if $F(s, t) = f(\varphi(s, t), \psi(s, t))$, then
$$df = \frac{\partial f}{\partial x}dx + \frac{\partial f}{\partial y}dy = \frac{\partial F}{\partial s}ds + \frac{\partial F}{\partial t}dt,$$
where
$$dx = \frac{\partial \varphi}{\partial s}ds + \frac{\partial \varphi}{\partial t}dt, \quad dy = \frac{\partial \psi}{\partial s}ds + \frac{\partial \psi}{\partial t}dt.$$
The invariance of first differentials under change of variables is useful in evaluating partial derivatives, but more importantly, it implies that differentials of functions are globally defined objects which do not depend on the coordinates we use to evaluate them.
objects which do not depend on the coordinates we use to evaluate them.
Let us write down the chain rule for several variable functions.
Suppose that f (x1 , , xm ) is a function of m variables which has continuous partial derivatives.
Consider change of variables given by

x1 = 1 (t1 , , tn ),

.. .. (6.3)
. .
xm = m (t1 , , tn )

i
where n N and 1 , , m are functions of (t1 , , tn ) which have continuous derivatives tj .
Let
F (t1 , , tn ) = f (1 (t1 , , tn ), , m (t1 , , tn )).
Then
F f 1 f m
= + + (6.4)
tj x1 tj xm tj
where j = 1, , n. In terms of matrix notations, the chain rule may be put in the following form
1
1

    t1 tn
F F f f . .. ..
, , = , , .. . . (6.5)
t1 tn x1 xm m m
t1 tn

where the m n matrix on the right-hand side of (6.5) is called the first order total derivative
associated with the transformation (6.3), denoted by D(1 , , m ). A careful study about the
total derivatives for vector valued functions such as (6.3) will be the topic of Part A Multi-Variable
Calculus (Trinity Term in your second year).

Example 6.8 Consider $u = x^y$ where $x = \varphi(t)$ and $y = \psi(t)$, so that $u(t) = \varphi(t)^{\psi(t)}$. According to the chain rule,
$$u'(t) = \frac{\partial u}{\partial x}\varphi'(t) + \frac{\partial u}{\partial y}\psi'(t) = \varphi'(t)\,yx^{y-1} + \psi'(t)\,x^y\ln x = \varphi'\psi\varphi^{\psi - 1} + \psi'\varphi^{\psi}\ln\varphi.$$

Example 6.9 Let $u = f(x, y, z)$ have continuous partial derivatives, and let $x = \eta - \zeta$, $y = \zeta - \xi$ and $z = \xi - \eta$. Work out the matrix of the first order total derivative of the transformation:
$$\begin{pmatrix} \frac{\partial x}{\partial \xi} & \frac{\partial x}{\partial \eta} & \frac{\partial x}{\partial \zeta} \\ \frac{\partial y}{\partial \xi} & \frac{\partial y}{\partial \eta} & \frac{\partial y}{\partial \zeta} \\ \frac{\partial z}{\partial \xi} & \frac{\partial z}{\partial \eta} & \frac{\partial z}{\partial \zeta} \end{pmatrix} = \begin{pmatrix} 0 & 1 & -1 \\ -1 & 0 & 1 \\ 1 & -1 & 0 \end{pmatrix},$$
so, by the chain rule,
$$\left(\frac{\partial f}{\partial \xi}, \frac{\partial f}{\partial \eta}, \frac{\partial f}{\partial \zeta}\right) = \left(\frac{\partial f}{\partial x}, \frac{\partial f}{\partial y}, \frac{\partial f}{\partial z}\right)\begin{pmatrix} 0 & 1 & -1 \\ -1 & 0 & 1 \\ 1 & -1 & 0 \end{pmatrix} = \left(-\frac{\partial f}{\partial y} + \frac{\partial f}{\partial z},\ \frac{\partial f}{\partial x} - \frac{\partial f}{\partial z},\ -\frac{\partial f}{\partial x} + \frac{\partial f}{\partial y}\right),$$
that is,
$$\frac{\partial u}{\partial \xi} = -\frac{\partial f}{\partial y} + \frac{\partial f}{\partial z}, \quad \frac{\partial u}{\partial \eta} = \frac{\partial f}{\partial x} - \frac{\partial f}{\partial z}, \quad \frac{\partial u}{\partial \zeta} = -\frac{\partial f}{\partial x} + \frac{\partial f}{\partial y}.$$
Using the chain rule again, we have
$$\frac{\partial^2 u}{\partial \eta \partial \xi} = \frac{\partial}{\partial \eta}\left(-\frac{\partial f}{\partial y} + \frac{\partial f}{\partial z}\right) = \left(\frac{\partial}{\partial x} - \frac{\partial}{\partial z}\right)\left(-\frac{\partial f}{\partial y} + \frac{\partial f}{\partial z}\right) = -\frac{\partial^2 f}{\partial y \partial x} + \frac{\partial^2 f}{\partial y \partial z} + \frac{\partial^2 f}{\partial z \partial x} - \frac{\partial^2 f}{\partial z^2}.$$

6.3 Partial derivatives for implicit functions


The chain rule allows us to evaluate partial derivatives of implicit functions. Let us look at an example first.

Example 6.10 Let $y = y(x)$ be the function implicitly given by the equation $x^2 + y^2 = 1$, where $x > 0$ and $y > 0$. Of course we may solve for $y$ to obtain $y = \sqrt{1 - x^2}$, and thus
$$\frac{dy}{dx} = -\frac{2x}{2\sqrt{1 - x^2}} = -\frac{x}{\sqrt{1 - x^2}}.$$
We can also work out the derivative $\frac{dy}{dx}$ using just the equation $x^2 + y^2 = 1$. Taking derivatives of both sides of the equation in $x$, keeping in mind that $y$ is a function of $x$, we obtain
$$\frac{d}{dx}\left(x^2 + y^2\right) = \frac{d}{dx}1 = 0,$$
so that
$$2x + 2y\frac{dy}{dx} = 0;$$
solving for $\frac{dy}{dx}$, we obtain $\frac{dy}{dx} = -\frac{x}{y}$, which gives just the same answer.

The idea used in the previous example can be applied to evaluate partial derivatives of implicit functions. Suppose that $y = y(x)$ is a function of $x$ implicitly given by an equation
$$F(x, y) = 0.$$
In order to solve for $y$ from the equation, determining a function $y = y(x)$ at least locally, we need to impose some conditions. Let us assume the partial derivatives of $F$ (considering $x, y$ as independent variables) exist and are continuous. To find the derivative $\frac{dy}{dx}$, we take derivatives of both sides of the equation $F(x, y) = 0$ in $x$, keeping in mind that $y = y(x)$ is a function of $x$. Then
$$\frac{\partial F}{\partial x} + \frac{\partial F}{\partial y}\frac{dy}{dx} = 0.$$
The left hand side is the result of applying the chain rule to $F$ with $x = x$ and $y = y(x)$. In order to be able to solve for $\frac{dy}{dx}$ we need to assume that $F_y \neq 0$, and then we obtain
$$\frac{dy}{dx} = -\frac{F_x}{F_y}.$$
In fact the condition $F_y \neq 0$ ensures that the equation $F(x, y) = 0$ determines locally a function $y = y(x)$.
The previous idea applies to implicit functions of several variables. For example, if $z = z(x, y)$ is a function implicitly given by the equation
$$F(x, y, z) = 0,$$
and if the partial derivatives of $F$ (considering $x, y, z$ as independent variables) are continuous and $F_z \neq 0$, then, by taking the derivative in $x$ holding $y$ constant, we obtain
$$F_x + F_z\frac{\partial z}{\partial x} = 0, \tag{6.6}$$
so that
$$\frac{\partial z}{\partial x} = -\frac{F_x}{F_z}.$$
Similarly we have
$$\frac{\partial z}{\partial y} = -\frac{F_y}{F_z}.$$
To compute the second partial derivatives, we continue the same procedure. Taking the derivative of both sides of (6.6) in $y$, we obtain
$$F_{xy} + F_{xz}\frac{\partial z}{\partial y} + \left(F_{zy} + F_{zz}\frac{\partial z}{\partial y}\right)\frac{\partial z}{\partial x} + F_z\frac{\partial^2 z}{\partial x \partial y} = 0,$$
and solving for $\frac{\partial^2 z}{\partial x \partial y}$, we obtain
$$\frac{\partial^2 z}{\partial x \partial y} = -\frac{F_{xy} + F_{xz}\frac{\partial z}{\partial y} + \left(F_{zy} + F_{zz}\frac{\partial z}{\partial y}\right)\frac{\partial z}{\partial x}}{F_z},$$
etc., though the results become increasingly complicated.

Finally we mention that the same idea applies to several functions of several variables. For example, from the system
$$\begin{cases} F(x, y, z) = 0, \\ G(x, y, z) = 0, \end{cases} \tag{6.7}$$

we hope to solve for $y$ and $z$ in terms of the variable $x$, so that $y = y(x)$ and $z = z(x)$. Saying that $y(x)$ and $z(x)$ are solutions means that if we substitute $(y, z)$ in the system (6.7) by $(y(x), z(x))$, then
$$F(x, y(x), z(x)) = 0, \quad G(x, y(x), z(x)) = 0 \tag{6.8}$$
hold identically over the range of $x$. Therefore, by taking derivatives of both sides of the equations in $x$ and employing the chain rule, we have
$$F_x + F_y\frac{dy}{dx} + F_z\frac{dz}{dx} = 0, \quad G_x + G_y\frac{dy}{dx} + G_z\frac{dz}{dx} = 0, \tag{6.9}$$
which is a linear system in $(\frac{dy}{dx}, \frac{dz}{dx})$, and can be put in matrix form, namely
$$\begin{pmatrix} F_y & F_z \\ G_y & G_z \end{pmatrix}\begin{pmatrix} \frac{dy}{dx} \\ \frac{dz}{dx} \end{pmatrix} = -\begin{pmatrix} F_x \\ G_x \end{pmatrix}. \tag{6.10}$$
We may solve for $\frac{dy}{dx}$ and $\frac{dz}{dx}$ as long as
$$\det\begin{pmatrix} F_y & F_z \\ G_y & G_z \end{pmatrix} = F_y G_z - F_z G_y \neq 0. \tag{6.11}$$
[In fact, near a point $(x, y, z)$ where $F_y G_z - F_z G_y \neq 0$, one can show that $(y, z)$ can be solved from the system (6.7) at least locally, which is a part of the conclusion of the so-called Inverse Function Theorem. The proper formulation and proof of the inverse function theorem will be among the topics of the Part A option Multi-Variable Calculus in Hilary term.] Indeed, since under the condition (6.11) the matrix
$$\begin{pmatrix} F_y & F_z \\ G_y & G_z \end{pmatrix}$$
is invertible,
$$\begin{pmatrix} \frac{dy}{dx} \\ \frac{dz}{dx} \end{pmatrix} = -\begin{pmatrix} F_y & F_z \\ G_y & G_z \end{pmatrix}^{-1}\begin{pmatrix} F_x \\ G_x \end{pmatrix} = -\frac{1}{F_y G_z - F_z G_y}\begin{pmatrix} G_z & -F_z \\ -G_y & F_y \end{pmatrix}\begin{pmatrix} F_x \\ G_x \end{pmatrix} = -\frac{1}{F_y G_z - F_z G_y}\begin{pmatrix} G_z F_x - G_x F_z \\ F_y G_x - F_x G_y \end{pmatrix}.$$
Hence
$$\frac{dy}{dx} = -\frac{G_z F_x - G_x F_z}{F_y G_z - F_z G_y} = -\begin{vmatrix} F_x & F_z \\ G_x & G_z \end{vmatrix}\Big/\begin{vmatrix} F_y & F_z \\ G_y & G_z \end{vmatrix}$$
and
$$\frac{dz}{dx} = -\frac{F_y G_x - F_x G_y}{F_y G_z - F_z G_y} = -\begin{vmatrix} F_y & F_x \\ G_y & G_x \end{vmatrix}\Big/\begin{vmatrix} F_y & F_z \\ G_y & G_z \end{vmatrix}.$$

Example 6.11 Let $y = y(x)$ and $z = z(x)$ be the functions satisfying the equations
$$\frac{x^2}{a^2} + \frac{y^2}{b^2} + \frac{z^2}{c^2} = 1 \quad \text{and} \quad x + y + z = 0,$$
where $x > 0$, $y > 0$ and $z > 0$. Find the derivatives $\frac{dy}{dx}$ and $\frac{dz}{dx}$.

We differentiate the equations in $x$, keeping in mind that $y$ and $z$ are functions of $x$; according to the chain rule,
$$\frac{2x}{a^2} + \frac{2y}{b^2}\frac{dy}{dx} + \frac{2z}{c^2}\frac{dz}{dx} = \frac{d}{dx}1 = 0 \tag{6.12}$$
and
$$1 + \frac{dy}{dx} + \frac{dz}{dx} = \frac{d}{dx}0 = 0. \tag{6.13}$$
From (6.13) we obtain
$$\frac{dz}{dx} = -1 - \frac{dy}{dx},$$
and substituting this into (6.12) gives
$$\frac{x}{a^2} + \frac{y}{b^2}\frac{dy}{dx} - \frac{z}{c^2}\left(1 + \frac{dy}{dx}\right) = 0,$$
from which we may solve for $\frac{dy}{dx}$; hence
$$\frac{dy}{dx} = \frac{\frac{z}{c^2} - \frac{x}{a^2}}{\frac{y}{b^2} - \frac{z}{c^2}}, \quad \frac{dz}{dx} = \frac{\frac{y}{b^2} - \frac{x}{a^2}}{\frac{z}{c^2} - \frac{y}{b^2}}$$
at those points where $\frac{y}{b^2} - \frac{z}{c^2} \neq 0$.
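The formulas of Example 6.11 can be confirmed by solving the linear system (6.9) symbolically (an aside, not part of the original notes; `dy` and `dz` below are stand-in symbols for the two derivatives):

```python
import sympy as sp

x, y, z, a, b, c = sp.symbols('x y z a b c')
dy, dz = sp.symbols('dy dz')  # stand-ins for dy/dx and dz/dx

F = x**2/a**2 + y**2/b**2 + z**2/c**2 - 1
G = x + y + z

# System (6.9): F_x + F_y y' + F_z z' = 0 and G_x + G_y y' + G_z z' = 0
eqs = [sp.diff(F, x) + sp.diff(F, y) * dy + sp.diff(F, z) * dz,
       sp.diff(G, x) + sp.diff(G, y) * dy + sp.diff(G, z) * dz]
sol = sp.solve(eqs, [dy, dz])
print(sp.simplify(sol[dy]))  # (z/c^2 - x/a^2) / (y/b^2 - z/c^2) after simplification
print(sp.simplify(sol[dz]))  # (y/b^2 - x/a^2) / (z/c^2 - y/b^2) after simplification
```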

6.4 Some differential operators

The symbols for partial differentiation such as $\frac{\partial}{\partial x}$, $\frac{\partial}{\partial y}$ etc. may be considered as operations acting on functions (which have continuous partial derivatives), sending $f$ to its partial derivatives $\frac{\partial f}{\partial x}$, $\frac{\partial f}{\partial y}$ etc. It is useful to be familiar with some differential operators which are used extensively in science.

The symbol $\nabla$. In the $n$-dimensional Euclidean space $\mathbb{R}^n$, the symbol $\nabla$ means total differentiation. Under the Cartesian coordinate system $(x_1, \dots, x_n)$, $\nabla$ denotes the total derivative
$$\nabla = \left(\frac{\partial}{\partial x_1}, \dots, \frac{\partial}{\partial x_n}\right).$$
When $\nabla$ is applied to a function $f(x_1, \dots, x_n)$ with continuous partial derivatives, $\nabla f$ means the total derivative
$$\nabla f = \left(\frac{\partial f}{\partial x_1}, \dots, \frac{\partial f}{\partial x_n}\right),$$
called the gradient (vector field) of $f$. $\nabla f$ may be considered a function taking values in $\mathbb{R}^n$ (such a function is called a vector-valued function, and in this special case, where the number of component functions equals the dimension $n$ of $\mathbb{R}^n$, also a vector field in $\mathbb{R}^n$).

On the other hand, if
$$u(x_1, \dots, x_n) = (u_1(x_1, \dots, x_n), \dots, u_n(x_1, \dots, x_n))$$
is a function of $n$ variables defined on $U \subseteq \mathbb{R}^n$, taking values in $\mathbb{R}^n$ (such a vector valued function $u$ is called a vector field on $U$), then we may take the dot product of $\nabla$ and $u$ to obtain a real valued function:
$$\nabla\cdot u = \left(\frac{\partial}{\partial x_1}, \dots, \frac{\partial}{\partial x_n}\right)\cdot(u_1, \dots, u_n) = \frac{\partial u_1}{\partial x_1} + \dots + \frac{\partial u_n}{\partial x_n},$$
which is called the divergence of the vector field u.
We have seen that if $f(x_1, \dots, x_n)$ is a scalar function on $U \subseteq \mathbb{R}^n$, then its gradient $\nabla f$ is a vector field, so we may take the dot product of $\nabla$ with $\nabla f$ to obtain
$$\nabla\cdot\nabla f = \left(\frac{\partial}{\partial x_1}, \dots, \frac{\partial}{\partial x_n}\right)\cdot\left(\frac{\partial f}{\partial x_1}, \dots, \frac{\partial f}{\partial x_n}\right) = \frac{\partial^2 f}{\partial x_1^2} + \dots + \frac{\partial^2 f}{\partial x_n^2},$$
which is called the Laplacian of $f$, denoted by $\Delta f$. Thus we introduce the second order differential operator
$$\Delta = \frac{\partial^2}{\partial x_1^2} + \dots + \frac{\partial^2}{\partial x_n^2},$$
called the Laplace operator in $\mathbb{R}^n$. We extend the operation of $\Delta$ to vector valued functions as follows. Suppose
$$f(x_1, \dots, x_n) = \left(f^1(x_1, \dots, x_n), \dots, f^m(x_1, \dots, x_n)\right)$$
is a vector valued function (where $m \in \mathbb{N}$) defined on $U \subseteq \mathbb{R}^n$; then we define
$$\Delta f = \left(\Delta f^1, \dots, \Delta f^m\right).$$
The curl operator. In 3-dimensional Euclidean space $\mathbb{R}^3$, besides the dot product, there is another multiplication of two vectors, called the cross product. Recall that, under the Cartesian coordinate system $(x, y, z)$, if $a = (a_1, a_2, a_3)$ and $b = (b_1, b_2, b_3)$ then the cross product $a \times b$ is defined by
$$a \times b = \begin{vmatrix} i & j & k \\ a_1 & a_2 & a_3 \\ b_1 & b_2 & b_3 \end{vmatrix} = \begin{vmatrix} a_2 & a_3 \\ b_2 & b_3 \end{vmatrix}i - \begin{vmatrix} a_1 & a_3 \\ b_1 & b_3 \end{vmatrix}j + \begin{vmatrix} a_1 & a_2 \\ b_1 & b_2 \end{vmatrix}k$$
$$= (a_2 b_3 - a_3 b_2,\ a_3 b_1 - a_1 b_3,\ a_1 b_2 - a_2 b_1),$$
where $i, j, k$ are the standard basis vectors of $\mathbb{R}^3$: $i = (1, 0, 0)$, $j = (0, 1, 0)$ and $k = (0, 0, 1)$. $a \times b$ is the unique vector which is perpendicular to both $a$ and $b$, obeying the right hand rule, with magnitude $|a \times b| = |a||b|\sin\theta(a, b)$, where $0 \leq \theta(a, b) \leq \pi$ is the angle between $a$ and $b$.

We apply this definition by replacing $a$ with $\nabla = \left(\frac{\partial}{\partial x}, \frac{\partial}{\partial y}, \frac{\partial}{\partial z}\right)$ and $b$ with a vector field $u = (u_1, u_2, u_3)$, where $u_1, u_2, u_3$ are functions on $U \subseteq \mathbb{R}^3$ with continuous partial derivatives, and define the curl of the vector field $u$ by
$$\nabla \times u = \begin{vmatrix} i & j & k \\ \frac{\partial}{\partial x} & \frac{\partial}{\partial y} & \frac{\partial}{\partial z} \\ u_1 & u_2 & u_3 \end{vmatrix} = \begin{vmatrix} \frac{\partial}{\partial y} & \frac{\partial}{\partial z} \\ u_2 & u_3 \end{vmatrix}i - \begin{vmatrix} \frac{\partial}{\partial x} & \frac{\partial}{\partial z} \\ u_1 & u_3 \end{vmatrix}j + \begin{vmatrix} \frac{\partial}{\partial x} & \frac{\partial}{\partial y} \\ u_1 & u_2 \end{vmatrix}k$$
$$= \left(\frac{\partial u_3}{\partial y} - \frac{\partial u_2}{\partial z},\ \frac{\partial u_1}{\partial z} - \frac{\partial u_3}{\partial x},\ \frac{\partial u_2}{\partial x} - \frac{\partial u_1}{\partial y}\right).$$
$\nabla \times u$ is again a vector field on $U \subseteq \mathbb{R}^3$, also called the vorticity of $u$.

Example 6.12 Let $f(x, y, z) = x^2 + y^2 + z^2$. Compute
$$\nabla f = 2(x, y, z)$$
and
$$\Delta f = 2(1 + 1 + 1) = 6,$$
a constant function.
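Gradient, divergence and the Laplacian are easy to experiment with symbolically (an aside, not part of the original notes):

```python
import sympy as sp

x, y, z = sp.symbols('x y z')
f = x**2 + y**2 + z**2

grad_f = [sp.diff(f, v) for v in (x, y, z)]
laplacian_f = sum(sp.diff(g, v) for g, v in zip(grad_f, (x, y, z)))
print(grad_f)       # [2*x, 2*y, 2*z], i.e. 2(x, y, z)
print(laplacian_f)  # 6, the divergence of the gradient
```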

6.5 Change of coordinates and Jacobians


Sometimes it is useful to choose a special coordinate system better suited to a specific problem. Let $(x, y)$ (respectively $(x, y, z)$ in $\mathbb{R}^3$) be the Cartesian coordinate system in $\mathbb{R}^2$ (resp. in $\mathbb{R}^3$). Consider another coordinate system $(u, v)$ given by equations $u = u(x, y)$ and $v = v(x, y)$, or equivalently $x = x(u, v)$ and $y = y(u, v)$. We call the mapping $(x, y) \to (u, v)$ a transformation of coordinates, or a change of variables. We only consider those transformations which have continuous partial derivatives. According to the chain rule, if $f(x, y)$ is a function with continuous partial derivatives, then
$$\left(\frac{\partial f}{\partial u}, \frac{\partial f}{\partial v}\right) = \left(\frac{\partial f}{\partial x}, \frac{\partial f}{\partial y}\right)\begin{pmatrix} \frac{\partial x}{\partial u} & \frac{\partial x}{\partial v} \\ \frac{\partial y}{\partial u} & \frac{\partial y}{\partial v} \end{pmatrix}.$$
The determinant of the first order total derivative (the Jacobian matrix),
$$\det\begin{pmatrix} \frac{\partial x}{\partial u} & \frac{\partial x}{\partial v} \\ \frac{\partial y}{\partial u} & \frac{\partial y}{\partial v} \end{pmatrix} = \frac{\partial x}{\partial u}\frac{\partial y}{\partial v} - \frac{\partial x}{\partial v}\frac{\partial y}{\partial u},$$
is called the Jacobian of the transformation, denoted by $\frac{\partial(x, y)}{\partial(u, v)}$, i.e.
$$\frac{\partial(x, y)}{\partial(u, v)} = \begin{vmatrix} \frac{\partial x}{\partial u} & \frac{\partial x}{\partial v} \\ \frac{\partial y}{\partial u} & \frac{\partial y}{\partial v} \end{vmatrix};$$
it is the density of area elements in the new coordinate system $(u, v)$ in the following sense. Suppose the transformation $u = u(x, y)$, $v = v(x, y)$ sends a domain $U$ in the $xy$-plane one-to-one and onto a domain $D$ in the $uv$-plane; then
$$\int_U f(x, y)\,dxdy = \int_D f(x(u, v), y(u, v))\left|\frac{\partial(x, y)}{\partial(u, v)}\right|dudv.$$
That is to say, under the transformation $(x, y) \to (u, v)$, the area element $dxdy$ in the $xy$-plane is equivalent to $\left|\frac{\partial(x, y)}{\partial(u, v)}\right|dudv$, where $dudv$ is the area element in the $uv$-plane.
Example 6.13 (Parabolic coordinate system) The coordinates $(u, v)$ given by the relations
$$x = \frac{1}{2}(u^2 - v^2), \quad y = uv$$
are called the parabolic coordinates in the plane. The Jacobian matrix and Jacobian are given by
$$\begin{pmatrix} \frac{\partial x}{\partial u} & \frac{\partial x}{\partial v} \\ \frac{\partial y}{\partial u} & \frac{\partial y}{\partial v} \end{pmatrix} = \begin{pmatrix} u & -v \\ v & u \end{pmatrix}, \quad \frac{\partial(x, y)}{\partial(u, v)} = u^2 + v^2.$$
The transformation $(x, y) \to (u, v)$ is conformal in the sense that
$$(dx)^2 + (dy)^2 = (u\,du - v\,dv)^2 + (v\,du + u\,dv)^2 = (u^2 + v^2)\left((du)^2 + (dv)^2\right),$$
and
$$\frac{\partial^2}{\partial x^2} + \frac{\partial^2}{\partial y^2} = \frac{1}{u^2 + v^2}\left(\frac{\partial^2}{\partial u^2} + \frac{\partial^2}{\partial v^2}\right).$$

6.5.1 Polar coordinate system


If $P = (x, y) \in \mathbb{R}^2$ and $(x, y) \neq (0, 0)$, then we can determine the position of $(x, y)$ by its distance $r$ to $(0, 0)$ and the angle $\theta$ from the $x$-axis to $\overrightarrow{OP}$, so that $x = r\cos\theta$ and $y = r\sin\theta$. Then the Jacobian matrix is
$$\begin{pmatrix} \frac{\partial x}{\partial r} & \frac{\partial x}{\partial \theta} \\ \frac{\partial y}{\partial r} & \frac{\partial y}{\partial \theta} \end{pmatrix} = \begin{pmatrix} \cos\theta & -r\sin\theta \\ \sin\theta & r\cos\theta \end{pmatrix},$$
so that its Jacobian is $\frac{\partial(x, y)}{\partial(r, \theta)} = r$.

On the other hand, $r = \sqrt{x^2 + y^2}$ and $\tan\theta = \frac{y}{x}$. The Jacobian matrix of this inverse transformation is given by
$$\begin{pmatrix} \frac{\partial r}{\partial x} & \frac{\partial r}{\partial y} \\ \frac{\partial \theta}{\partial x} & \frac{\partial \theta}{\partial y} \end{pmatrix} = \begin{pmatrix} \frac{x}{\sqrt{x^2 + y^2}} & \frac{y}{\sqrt{x^2 + y^2}} \\ -\frac{y}{x^2 + y^2} & \frac{x}{x^2 + y^2} \end{pmatrix},$$
so that its Jacobian is $\frac{\partial(r, \theta)}{\partial(x, y)} = \frac{1}{\sqrt{x^2 + y^2}} = \frac{1}{r}$. Hence
$$\frac{\partial(x, y)}{\partial(r, \theta)}\cdot\frac{\partial(r, \theta)}{\partial(x, y)} = 1.$$

If $f(x, y)$ is a function with continuous partial derivatives and $F(r, \theta) = f(r\cos\theta, r\sin\theta)$, then
$$\begin{cases} \frac{\partial F}{\partial r} = \cos\theta\,\frac{\partial f}{\partial x} + \sin\theta\,\frac{\partial f}{\partial y}, \\ \frac{\partial F}{\partial \theta} = -r\sin\theta\,\frac{\partial f}{\partial x} + r\cos\theta\,\frac{\partial f}{\partial y}. \end{cases} \tag{6.14}$$
It is also useful to express $\frac{\partial f}{\partial x}$ and $\frac{\partial f}{\partial y}$ in terms of $\frac{\partial f}{\partial r}$ and $\frac{\partial f}{\partial \theta}$, which can be achieved by solving for $\frac{\partial f}{\partial x}$, $\frac{\partial f}{\partial y}$ from the above linear system:
$$\left(\frac{\partial f}{\partial x}, \frac{\partial f}{\partial y}\right) = \left(\frac{\partial f}{\partial r}, \frac{\partial f}{\partial \theta}\right)\begin{pmatrix} \cos\theta & -r\sin\theta \\ \sin\theta & r\cos\theta \end{pmatrix}^{-1} = \frac{1}{r}\left(\frac{\partial f}{\partial r}, \frac{\partial f}{\partial \theta}\right)\begin{pmatrix} r\cos\theta & r\sin\theta \\ -\sin\theta & \cos\theta \end{pmatrix}$$
$$= \left(\cos\theta\,\frac{\partial f}{\partial r} - \frac{\sin\theta}{r}\frac{\partial f}{\partial \theta},\ \sin\theta\,\frac{\partial f}{\partial r} + \frac{\cos\theta}{r}\frac{\partial f}{\partial \theta}\right),$$
that is,
$$\begin{cases} \frac{\partial f}{\partial x} = \cos\theta\,\frac{\partial f}{\partial r} - \frac{\sin\theta}{r}\frac{\partial f}{\partial \theta}, \\ \frac{\partial f}{\partial y} = \sin\theta\,\frac{\partial f}{\partial r} + \frac{\cos\theta}{r}\frac{\partial f}{\partial \theta}. \end{cases} \tag{6.15}$$
It follows directly from (6.15) that
$$\left(\frac{\partial f}{\partial x}\right)^2 + \left(\frac{\partial f}{\partial y}\right)^2 = \left(\frac{\partial f}{\partial r}\right)^2 + \frac{1}{r^2}\left(\frac{\partial f}{\partial \theta}\right)^2. \tag{6.16}$$

If $f(x, y)$ is a function with continuous partial derivatives up to order 2, then
$$\Delta f = \frac{\partial^2 f}{\partial x^2} + \frac{\partial^2 f}{\partial y^2}.$$
We wish to work out the Laplace operator in the polar coordinate system. To this end, we continue by computing the second order partial derivatives. In fact,
$$\frac{\partial^2 f}{\partial r^2} = \frac{\partial}{\partial r}\left(\cos\theta\,\frac{\partial f}{\partial x} + \sin\theta\,\frac{\partial f}{\partial y}\right) = \cos\theta\,\frac{\partial}{\partial r}\frac{\partial f}{\partial x} + \sin\theta\,\frac{\partial}{\partial r}\frac{\partial f}{\partial y}$$
$$= \cos\theta\left(\cos\theta\,\frac{\partial^2 f}{\partial x^2} + \sin\theta\,\frac{\partial^2 f}{\partial y \partial x}\right) + \sin\theta\left(\cos\theta\,\frac{\partial^2 f}{\partial x \partial y} + \sin\theta\,\frac{\partial^2 f}{\partial y^2}\right)$$
$$= \cos^2\theta\,\frac{\partial^2 f}{\partial x^2} + 2\sin\theta\cos\theta\,\frac{\partial^2 f}{\partial x \partial y} + \sin^2\theta\,\frac{\partial^2 f}{\partial y^2}$$
and
$$\frac{\partial^2 f}{\partial \theta^2} = \frac{\partial}{\partial \theta}\left(-r\sin\theta\,\frac{\partial f}{\partial x} + r\cos\theta\,\frac{\partial f}{\partial y}\right)$$
$$= -r\cos\theta\,\frac{\partial f}{\partial x} - r\sin\theta\,\frac{\partial f}{\partial y} - r\sin\theta\left(-r\sin\theta\,\frac{\partial^2 f}{\partial x^2} + r\cos\theta\,\frac{\partial^2 f}{\partial x \partial y}\right) + r\cos\theta\left(-r\sin\theta\,\frac{\partial^2 f}{\partial y \partial x} + r\cos\theta\,\frac{\partial^2 f}{\partial y^2}\right)$$
$$= r^2\sin^2\theta\,\frac{\partial^2 f}{\partial x^2} - 2r^2\sin\theta\cos\theta\,\frac{\partial^2 f}{\partial x \partial y} + r^2\cos^2\theta\,\frac{\partial^2 f}{\partial y^2} - r\cos\theta\,\frac{\partial f}{\partial x} - r\sin\theta\,\frac{\partial f}{\partial y}.$$
Hence
$$\frac{\partial^2 f}{\partial r^2} + \frac{1}{r^2}\frac{\partial^2 f}{\partial \theta^2} = \frac{\partial^2 f}{\partial x^2} + \frac{\partial^2 f}{\partial y^2} - \frac{\cos\theta}{r}\frac{\partial f}{\partial x} - \frac{\sin\theta}{r}\frac{\partial f}{\partial y} = \frac{\partial^2 f}{\partial x^2} + \frac{\partial^2 f}{\partial y^2} - \frac{1}{r}\frac{\partial f}{\partial r};$$
in other words,
$$\frac{\partial^2 f}{\partial x^2} + \frac{\partial^2 f}{\partial y^2} = \frac{\partial^2 f}{\partial r^2} + \frac{1}{r^2}\frac{\partial^2 f}{\partial \theta^2} + \frac{1}{r}\frac{\partial f}{\partial r}. \tag{6.17}$$
Therefore, in the polar coordinate system $(r, \theta)$ the Laplace operator is
$$\Delta = \frac{\partial^2}{\partial r^2} + \frac{1}{r^2}\frac{\partial^2}{\partial \theta^2} + \frac{1}{r}\frac{\partial}{\partial r}. \tag{6.18}$$
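Formula (6.18) can be sanity-checked on a concrete harmonic function (an aside, not part of the original notes; `polar_laplacian` is an illustrative helper):

```python
import sympy as sp

r, th = sp.symbols('r theta', positive=True)

def polar_laplacian(f):
    """Laplacian in polar coordinates, formula (6.18)."""
    return f.diff(r, 2) + f.diff(th, 2) / r**2 + f.diff(r) / r

# f = x^2 - y^2 = r^2 cos(2 theta) is harmonic, so the result should be 0
print(sp.simplify(polar_laplacian(r**2 * sp.cos(2 * th))))  # 0
```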

6.5.2 Cylindrical coordinate system in $\mathbb{R}^3$

Let $(x, y, z)$ be the standard coordinate system. In the cylindrical polar coordinates we keep the $z$-coordinate and use polar coordinates for $(x, y)$; that is,
$$x = r\cos\theta, \quad y = r\sin\theta, \quad z = z,$$
where $0 \leq \theta < 2\pi$. The Jacobian matrix is given by
$$\begin{pmatrix} \cos\theta & -r\sin\theta & 0 \\ \sin\theta & r\cos\theta & 0 \\ 0 & 0 & 1 \end{pmatrix}$$
and the Jacobian is $\frac{\partial(x, y, z)}{\partial(r, \theta, z)} = r$. The inverse transformation is given by
$$r = \sqrt{x^2 + y^2}, \quad \tan\theta = \frac{y}{x}, \quad z = z.$$
The Laplace operator in the cylindrical coordinates is
$$\Delta = \frac{\partial^2}{\partial r^2} + \frac{1}{r^2}\frac{\partial^2}{\partial \theta^2} + \frac{1}{r}\frac{\partial}{\partial r} + \frac{\partial^2}{\partial z^2}. \tag{6.19}$$

6.5.3 Spherical coordinates in $\mathbb{R}^3$

Let $(x, y, z)$ be the Cartesian coordinates of a general point $P \in \mathbb{R}^3$, $P \neq (0, 0, 0)$. Let $\rho$ be the distance between $P$ and $O$: $\rho = \sqrt{x^2 + y^2 + z^2}$, and let $\phi$ be the angle from the $z$-axis to the position vector $\overrightarrow{OP}$, so that $z = \rho\cos\phi$ where $0 \leq \phi \leq \pi$. Change $(x, y)$ to its polar coordinates $(r\cos\theta, r\sin\theta)$, where $r$ is the distance from $O$ to the perpendicular projection of $P$ onto the $xy$-plane, so that $r = \rho\sin\phi$. In terms of the spherical coordinates $(\rho, \phi, \theta)$ we have
$$x = \rho\sin\phi\cos\theta, \quad y = \rho\sin\phi\sin\theta, \quad z = \rho\cos\phi, \tag{6.20}$$
where $\rho \geq 0$, $0 \leq \phi \leq \pi$ and $0 \leq \theta < 2\pi$. The Jacobian matrix of the transformation (6.20) can be computed directly:
$$\begin{pmatrix} \frac{\partial x}{\partial \rho} & \frac{\partial x}{\partial \phi} & \frac{\partial x}{\partial \theta} \\ \frac{\partial y}{\partial \rho} & \frac{\partial y}{\partial \phi} & \frac{\partial y}{\partial \theta} \\ \frac{\partial z}{\partial \rho} & \frac{\partial z}{\partial \phi} & \frac{\partial z}{\partial \theta} \end{pmatrix} = \begin{pmatrix} \sin\phi\cos\theta & \rho\cos\phi\cos\theta & -\rho\sin\phi\sin\theta \\ \sin\phi\sin\theta & \rho\cos\phi\sin\theta & \rho\sin\phi\cos\theta \\ \cos\phi & -\rho\sin\phi & 0 \end{pmatrix}.$$
Hence the Jacobian, expanding along the last row, is given by
$$\frac{\partial(x, y, z)}{\partial(\rho, \phi, \theta)} = \begin{vmatrix} \sin\phi\cos\theta & \rho\cos\phi\cos\theta & -\rho\sin\phi\sin\theta \\ \sin\phi\sin\theta & \rho\cos\phi\sin\theta & \rho\sin\phi\cos\theta \\ \cos\phi & -\rho\sin\phi & 0 \end{vmatrix}$$
$$= \cos\phi\begin{vmatrix} \rho\cos\phi\cos\theta & -\rho\sin\phi\sin\theta \\ \rho\cos\phi\sin\theta & \rho\sin\phi\cos\theta \end{vmatrix} + \rho\sin\phi\begin{vmatrix} \sin\phi\cos\theta & -\rho\sin\phi\sin\theta \\ \sin\phi\sin\theta & \rho\sin\phi\cos\theta \end{vmatrix}$$
$$= \cos\phi\cdot\rho^2\sin\phi\cos\phi + \rho\sin\phi\cdot\rho\sin^2\phi = \rho^2\sin\phi\left(\cos^2\phi + \sin^2\phi\right) = \rho^2\sin\phi.$$
The inverse transformation can be worked out as follows:
$$\rho = \sqrt{x^2 + y^2 + z^2}, \quad \tan\phi = \frac{\sqrt{x^2 + y^2}}{z}, \quad \tan\theta = \frac{y}{x}.$$
Finally let us consider the Laplace operator in R3

2 2 2
= + +
x2 y 2 z 2

and we wish to write the Laplace operator in the spherical coordinate system. Suppose f (x, y, z)
has continuous derivatives up to second order. First, we use the cylindrical coordinates x = r cos ,
y = r sin , z = z. Then, according to (6.17)

2f 2f 2f 1 f 1 2f
+ = + + . (6.21)
x2 y 2 r2 r r r2 2

Next, we use the change of variables: z = cos and r = sin . Notice that (, ) are the
polar coordinates for (z, r), thus, according to (6.14)
(
f f f
= cos z + sin r ,
f f f (6.22)
= sin z + cos r ,

and, according to (6.17)


2f 2f 2f 1 f 1 2f
+ = + + . (6.23)
z 2 r2 2 2 2
Putting (6.21) and (6.23) together to obtain

2f 2f 1 f 1 2f
f = + + +
r2 z 2 r r r2 2
2
f 1 f 2
1 f 1 f 1 2f
= + + + + . (6.24)
2 2 2 r r r2 2
f
On the other hand, by solving r from (6.22) we have

f f cos f
= sin + (6.25)
r

and substituting it into (6.24) we finally obtain

2f 1 f 1 2f
f = + +
2 2 2
1 2f
 
1 f cos f
+ sin + + 2 2
sin r
2
f 2
1 f 1 2
f 2 f cot f
= 2
+ 2 2
+ 2 2 2
+ + 2 .
cos

That is, under the spherical coordinate system the Laplace operator in ℝ³ can be written as
\[
\Delta=\frac{\partial^2}{\partial\rho^2}+\frac{1}{\rho^2}\frac{\partial^2}{\partial\phi^2}+\frac{1}{\rho^2\sin^2\phi}\frac{\partial^2}{\partial\theta^2}+\frac{2}{\rho}\frac{\partial}{\partial\rho}+\frac{\cot\phi}{\rho^2}\frac{\partial}{\partial\phi}. \tag{6.26}
\]

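As a sanity check on (6.26), one can verify the formula symbolically on a concrete test function; a minimal sketch in Python's sympy (an illustration we add, with an arbitrary test function):

    import sympy as sp

    rho, phi, theta = sp.symbols('rho phi theta', positive=True)
    x, y, z = sp.symbols('x y z')

    spherical = {x: rho*sp.sin(phi)*sp.cos(theta),
                 y: rho*sp.sin(phi)*sp.sin(theta),
                 z: rho*sp.cos(phi)}

    f = x*y*z + sp.exp(z)*sp.cos(x)   # arbitrary smooth test function

    # Cartesian Laplacian, rewritten in spherical coordinates
    lhs = (sp.diff(f, x, 2) + sp.diff(f, y, 2) + sp.diff(f, z, 2)).subs(spherical)

    # right-hand side of (6.26) applied to F(rho, phi, theta)
    F = f.subs(spherical)
    rhs = (sp.diff(F, rho, 2) + sp.diff(F, phi, 2)/rho**2
           + sp.diff(F, theta, 2)/(rho**2*sp.sin(phi)**2)
           + 2*sp.diff(F, rho)/rho + sp.cot(phi)*sp.diff(F, phi)/rho**2)

    print(sp.simplify(lhs - rhs))   # 0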
6.6 Some simple partial differential equations
An equation involving several variables, functions and their partial derivatives is called a partial
differential equation (abbreviated as PDE or PDEs for simplicity).
Example 6.14 z = x² + y² is a solution to the following PDE
\[
x\frac{\partial z}{\partial y}-y\frac{\partial z}{\partial x}=0
\]
such that when x = 0 then z = y².
Example 6.15 z = y − x² is a solution to the following PDE
\[
x\frac{\partial z}{\partial x}+(y+x^2)\frac{\partial z}{\partial y}=z
\]
which satisfies the condition that when x = 2 then z = y − 4.
Example 6.16 (The heat equation) Consider the one-dimensional heat equation:
\[
\frac{\partial u(x,t)}{\partial t}=\frac{\sigma^2}{2}\frac{\partial^2 u(x,t)}{\partial x^2}
\]
where σ > 0 is a constant.
By an inspection, we can see that the Gaussian probability density function
\[
u(x,t)=\frac{1}{\sqrt{2\pi\sigma^2 t}}\,e^{-\frac{x^2}{2\sigma^2 t}}\qquad\text{for } t>0
\]
is a positive solution to the heat equation for t > 0.
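This claim is a routine, if slightly tedious, differentiation; it can also be checked symbolically. A minimal sketch in Python's sympy (our addition, not part of the notes):

    import sympy as sp

    x, t, sigma = sp.symbols('x t sigma', positive=True)

    u = 1/sp.sqrt(2*sp.pi*sigma**2*t) * sp.exp(-x**2/(2*sigma**2*t))

    # residual of the heat equation u_t - (sigma^2/2) u_xx
    residual = sp.diff(u, t) - sigma**2/2*sp.diff(u, x, 2)
    print(sp.simplify(residual))   # 0, so u solves the heat equation for t > 0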
Let us search for solutions u(x, t) which are separable. To this end, make substitution u(x, t) =
g(x)h(t). Since
\[
\frac{\partial u(x,t)}{\partial t}=g(x)h'(t)\qquad\text{and}\qquad \frac{\partial^2 u(x,t)}{\partial x^2}=g''(x)h(t),
\]
thus, the heat equation becomes
\[
g(x)h'(t)=\frac{\sigma^2}{2}g''(x)h(t).
\]
Separate the variables to obtain
\[
\frac{h'(t)}{h(t)}=\frac{\sigma^2}{2}\frac{g''(x)}{g(x)}.
\]
Since h'(t)/h(t) depends only on t, while the equation implies that it equals a quantity depending only on x, h'(t)/h(t) must be a constant function. Similarly, g''(x)/g(x) is a constant independent of x or t. Therefore we must have
\[
\frac{h'(t)}{h(t)}=\frac{\sigma^2}{2}\frac{g''(x)}{g(x)}=\lambda
\]
where λ is a constant. The heat equation is thus transformed into a pair of linear ODEs
\[
h'(t)=\lambda h(t),\qquad g''(x)=\frac{2\lambda}{\sigma^2}g(x).
\]
The first ODE has a general solution h(t) = C₁e^{λt}, and the second ODE has a general solution:

1. If λ > 0, then
\[
g(x)=C_2e^{\sqrt{2\lambda/\sigma^2}\,x}+C_3e^{-\sqrt{2\lambda/\sigma^2}\,x}.
\]
2. If λ < 0, then
\[
g(x)=C_2\cos\left(\sqrt{-2\lambda/\sigma^2}\,x\right)+C_3\sin\left(\sqrt{-2\lambda/\sigma^2}\,x\right).
\]
3. If λ = 0, then
\[
g(x)=C_2+C_3x.
\]
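These case distinctions can be reproduced with a computer algebra system; for instance, for the case λ > 0, a sketch in Python's sympy (our addition; the output is shown up to how sympy names the constants):

    import sympy as sp

    x = sp.symbols('x')
    lam, sigma = sp.symbols('lambda sigma', positive=True)
    g = sp.Function('g')

    # case lambda > 0: g'' = (2*lambda/sigma^2) g
    ode = sp.Eq(g(x).diff(x, 2), 2*lam/sigma**2*g(x))
    print(sp.dsolve(ode, g(x)))
    # Eq(g(x), C1*exp(-sqrt(2)*sqrt(lambda)*x/sigma)
    #          + C2*exp(sqrt(2)*sqrt(lambda)*x/sigma))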

7 Gradient vectors, normal vectors to surfaces


In this part we consider curves and surfaces in ℝ³. For simplicity, let us declare that all functions
we will encounter in this part are defined on open subsets (unless otherwise specified), and have
continuous partial derivatives.
The graph of a function y = f(x) defined on (a, b) is a curve in the plane ℝ². The derivative f'(x₀) measures the slope of the line tangent to the graph at (x₀, f(x₀)): tan α = f'(x₀), where α is the angle from the x-axis to the tangent line at (x₀, f(x₀)). The equation for the tangent line at (x₀, f(x₀)) is a linear equation
\[
y-f(x_0)=f'(x_0)(x-x_0).
\]
The graph of y = f (x) has a natural parameterization: we may write the coordinates (x, y) on
the graph as the following
x = t, y = f (t) (7.1)
and consider t ∈ (a, b) as a parameter. The mapping t ↦ (t, f(t)) is called a parameterized curve in the plane. Similarly, the tangent line at (x₀, f(x₀)) has a natural parameterization, namely given as
\[
x=t,\qquad y=f(x_0)+f'(x_0)(t-x_0),
\]
where (x, y) is a general point lying on the tangent line. In terms of vector notations, it can be written as
\[
(x-x_0,\; y-f(x_0))=(1, f'(x_0))(t-x_0),
\]
i.e. (x − x₀, y − f(x₀)) is parallel to the vector (1, f'(x₀)), which is called the tangent vector of the parameterized curve. Note that the first coordinate 1 appears as dx/dt = 1, so the tangent vector at (x(t), y(t)) can be written as (x'(t), y'(t)), where x(t) = t and y(t) = f(t) for the graph of y = f(x).
We generalize this notion to the concept of a parameterized curve in the plane ℝ². That is, a parameterized curve in the plane ℝ² is a mapping t ↦ (x(t), y(t)), i.e.

x = x(t), y = y(t),

where t ∈ (a, b) (some interval) is a parameter. Since dy/dx = y'(t)/x'(t), the tangent line to the curve at a point (x(t₀), y(t₀)) has a parameterization
\[
x=x(t_0)+x'(t_0)(t-t_0),\qquad y=y(t_0)+y'(t_0)(t-t_0),
\]

which represents the line passing through (x(t₀), y(t₀)) with direction vector (x'(t₀), y'(t₀)).
A curve in ℝ² can also be described implicitly by an equation such as F(x, y) = 0. You should be familiar with the standard quadratic curves such as circles, ellipses, parabolas and hyperbolas (for a revision you may refer to Richard Earl's notes).

Example 7.1 Consider an ellipse defined implicitly by the equation

\[
\frac{x^2}{a^2}+\frac{y^2}{b^2}=1,
\]
which has a parameterization defined by

x = a cos t, y = b sin t

where 0 ≤ t < 2π. A tangent vector at (a cos t, b sin t) is thus given as (−a sin t, b cos t).
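Tangent vectors of parameterized curves are obtained by differentiating each coordinate; a small sketch in Python's sympy (our addition; the semi-axes are sample values):

    import sympy as sp

    t = sp.symbols('t')
    a, b = 3, 2   # sample semi-axes, chosen only for illustration

    x, y = a*sp.cos(t), b*sp.sin(t)
    tangent = (sp.diff(x, t), sp.diff(y, t))
    print(tangent)                                  # (-3*sin(t), 2*cos(t))
    print([c.subs(t, sp.pi/4) for c in tangent])    # tangent vector at t = pi/4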

A parameterized curve in the space ℝ³ may be described by a vector valued function of one variable t, i.e. a mapping t ↦ γ(t) where
\[
\gamma(t)=(x(t),y(t),z(t)),\qquad t\in(a,b). \tag{7.2}
\]
The tangent vector to the curve at γ(t₀) is the vector (x'(t₀), y'(t₀), z'(t₀)), and the line tangent to the curve at (x(t₀), y(t₀), z(t₀)) is given by the equation
\[
\mathbf{r}=\gamma(t_0)+\gamma'(t_0)(t-t_0),
\]
that is
\[
x=x(t_0)+x'(t_0)(t-t_0),\quad y=y(t_0)+y'(t_0)(t-t_0),\quad z=z(t_0)+z'(t_0)(t-t_0).
\]

7.1 Normal vectors, tangent planes


Let us consider smooth surfaces in ℝ³. As for the case of curves, the graph of a function z = f(x, y) on a domain U is considered as a parameterized surface

(x, y) ↦ (x, y, f(x, y)),  (x, y) ∈ U. (7.3)

By relabelling the variables, the graph of z = f(x, y) is a parameterized surface defined by the mapping (u, v) ↦ (u, v, f(u, v)), where (u, v) are two parameters. In general, a mapping

(u, v) ↦ (x(u, v), y(u, v), z(u, v)) (7.4)

where (u, v) runs through an open subset U ⊆ ℝ², is called a parameterized surface in the space ℝ³. The mapping or the parameterized surface is often written as

x = x(u, v),  y = y(u, v),  z = z(u, v).

When (u, v) runs through the subset U, the point (x(u, v), y(u, v), z(u, v)) draws out a surface in the space, namely the image of U under the mapping (7.4).

A surface S may be described by an equation

F (x, y, z) = 0, (7.5)

where, in order to avoid technical difficulty, we assume that ∇F ≠ 0. For example, a sphere x² + y² + z² = R² has a parameterized representation in terms of the spherical coordinates (φ, θ) [notice that the equation of the sphere in spherical coordinates takes a simple form: ρ = R]:

x = R sin φ cos θ,  y = R sin φ sin θ,  z = R cos φ,

where φ ∈ [0, π] and θ ∈ [0, 2π) are two parameters.


The graph of a function z = f(x, y) is a parameterized surface, but it can also be described by the equation
\[
z-f(x,y)=0.
\]
Let us now define the concept of the tangent plane at a point on the surface. Let P = (x₀, y₀, z₀) ∈ S, the surface defined by the equation (7.5), and let γ(t) = (x(t), y(t), z(t)) be any parameterized curve on the surface S passing through the point P, say γ(0) = (x₀, y₀, z₀) = P. Then
\[
F(x(t),y(t),z(t))=0\qquad\forall t,
\]
so, by differentiating in t at t = 0 and employing the chain rule, we obtain
\[
F_x(x_0,y_0,z_0)x'(0)+F_y(x_0,y_0,z_0)y'(0)+F_z(x_0,y_0,z_0)z'(0)=0. \tag{7.6}
\]
Recall that ∇F = (F_x, F_y, F_z) is the gradient vector field of F, so we may rewrite (7.6) as
\[
\nabla F(x_0,y_0,z_0)\cdot\gamma'(0)=0, \tag{7.7}
\]
which says the tangent vector γ'(0) is perpendicular to the gradient of F. Since γ can be any curve on the surface S, γ'(0) can be any vector tangent to the surface S at P, so (7.7) means that any vector tangent to the surface S at P is perpendicular to the gradient vector ∇F(x₀, y₀, z₀). Therefore all tangent vectors to S at the point P lie on the plane passing through P and perpendicular to ∇F(x₀, y₀, z₀), which is called the tangent plane to S at P. We therefore call ∇F(x₀, y₀, z₀) a normal vector to the surface S at P.
Suppose that (x, y, z) belongs to the tangent plane at P, so that (x − x₀, y − y₀, z − z₀) lies on the tangent plane; it must then be perpendicular to the normal vector ∇F(x₀, y₀, z₀), thus
\[
F_x(x_0,y_0,z_0)(x-x_0)+F_y(x_0,y_0,z_0)(y-y_0)+F_z(x_0,y_0,z_0)(z-z_0)=0, \tag{7.8}
\]
which is the equation for the tangent plane to S at P = (x₀, y₀, z₀).


If the surface S is the graph of a function z = f(x, y), we may take F(x, y, z) = z − f(x, y); thus a normal vector at (x, y, f(x, y)) is the gradient vector of F, which is (−f_x, −f_y, 1). Therefore an equation for the tangent plane to the graph of z = f(x, y) at (x₀, y₀, f(x₀, y₀)) is given by
\[
-f_x(x_0,y_0)(x-x_0)-f_y(x_0,y_0)(y-y_0)+z-z_0=0, \tag{7.9}
\]
which we have already seen before.
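The recipe "normal vector = gradient, tangent plane = (7.8)" is mechanical and easy to automate; a sketch in Python's sympy (our addition; the ellipsoid and the point on it are sample choices):

    import sympy as sp

    x, y, z = sp.symbols('x y z')

    # implicit surface F = 0: a sample ellipsoid
    F = x**2 + 2*y**2 + 3*z**2 - 6
    P = (1, 1, 1)   # a point on the surface: 1 + 2 + 3 - 6 = 0

    grad = [sp.diff(F, v) for v in (x, y, z)]
    normal = [g.subs(dict(zip((x, y, z), P))) for g in grad]   # (2, 4, 6)

    # tangent plane (7.8): normal . ((x, y, z) - P) = 0
    plane = sum(n*(v - p) for n, v, p in zip(normal, (x, y, z), P))
    print(sp.Eq(sp.expand(plane), 0))   # 2*x + 4*y + 6*z - 12 = 0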

Consider on the other hand a parameterized surface S, described by a vector valued function of two parameters (u, v):
\[
x=x(u,v),\qquad y=y(u,v)\qquad\text{and}\qquad z=z(u,v), \tag{7.10}
\]
where (u, v) runs through an open subset D ⊆ ℝ². Let (u₀, v₀) ∈ D and let
\[
P=(x_0,y_0,z_0)=(x(u_0,v_0),\,y(u_0,v_0),\,z(u_0,v_0))
\]
be a point on the surface S. Consider the parameterized curve γ₁ defined by
\[
\gamma_1(u)=(x(u,v_0),\,y(u,v_0),\,z(u,v_0))
\]
and the parameterized curve γ₂ defined by
\[
\gamma_2(v)=(x(u_0,v),\,y(u_0,v),\,z(u_0,v)),
\]
where u (resp. v) is considered as a parameter. Then both curves γ₁ and γ₂ lie on the surface and pass through P, and the tangent vectors γ₁'(u₀) and γ₂'(v₀) are two tangent vectors to the parameterized surface S at P. Thus, by definition, γ₁'(u₀) × γ₂'(v₀) (the cross product of the two vectors γ₁'(u₀) and γ₂'(v₀)) is a vector perpendicular to both vectors γ₁'(u₀) and γ₂'(v₀) [Geometry I, Prelims], and therefore γ₁'(u₀) × γ₂'(v₀) is a normal vector to the surface S. On the other hand, by the definition of partial derivatives,
\[
\gamma_1'=\frac{\partial\sigma}{\partial u}=\left(\frac{\partial x}{\partial u},\frac{\partial y}{\partial u},\frac{\partial z}{\partial u}\right),\qquad
\gamma_2'=\frac{\partial\sigma}{\partial v}=\left(\frac{\partial x}{\partial v},\frac{\partial y}{\partial v},\frac{\partial z}{\partial v}\right),
\]
where σ(u, v) = (x(u, v), y(u, v), z(u, v)). Thus ∂σ/∂u × ∂σ/∂v is a normal vector to the parameterized surface S, which is given, according to the definition of the cross product, by
\[
\frac{\partial\sigma}{\partial u}\times\frac{\partial\sigma}{\partial v}=
\begin{vmatrix}
\mathbf{i} & \mathbf{j} & \mathbf{k}\\[1mm]
\frac{\partial x}{\partial u} & \frac{\partial y}{\partial u} & \frac{\partial z}{\partial u}\\[1mm]
\frac{\partial x}{\partial v} & \frac{\partial y}{\partial v} & \frac{\partial z}{\partial v}
\end{vmatrix}.
\]
The tangent plane to S at σ(u₀, v₀) has an equation
\[
\left(\frac{\partial\sigma}{\partial u}(u_0,v_0)\times\frac{\partial\sigma}{\partial v}(u_0,v_0)\right)\cdot\left(\mathbf{r}-\sigma(u_0,v_0)\right)=0, \tag{7.11}
\]
where r = (x, y, z) is the position vector of a general point in the tangent plane. In terms of the Cartesian coordinates, (7.11) can be written, by working out the dot product, as
\[
\begin{vmatrix}
x-x_0 & y-y_0 & z-z_0\\[1mm]
\frac{\partial x}{\partial u} & \frac{\partial y}{\partial u} & \frac{\partial z}{\partial u}\\[1mm]
\frac{\partial x}{\partial v} & \frac{\partial y}{\partial v} & \frac{\partial z}{\partial v}
\end{vmatrix}=0, \tag{7.12}
\]
where the partial derivatives are evaluated at (u₀, v₀), and (x₀, y₀, z₀) = σ(u₀, v₀).
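The cross product recipe is likewise easy to carry out symbolically; a sketch in Python's sympy (our addition), using the graph surface (u, v) ↦ (u, v, u² + v²) as a sample:

    import sympy as sp

    u, v = sp.symbols('u v')

    sigma = sp.Matrix([u, v, u**2 + v**2])   # sample parameterized surface

    su = sigma.diff(u)          # tangent vector along u
    sv = sigma.diff(v)          # tangent vector along v
    normal = su.cross(sv)
    print(normal.T)             # Matrix([[-2*u, -2*v, 1]])
    # consistent with the normal (-f_x, -f_y, 1) for the graph of f(u,v) = u^2 + v^2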

Example 7.2 The sphere with radius R > 0 may be described implicitly by the equation
\[
x^2+y^2+z^2=R^2,
\]
so a normal vector to the tangent plane at (x₀, y₀, z₀) is ∇F(x₀, y₀, z₀) = 2(x₀, y₀, z₀), which has the same direction as the position vector, and the tangent plane has an equation
\[
x_0(x-x_0)+y_0(y-y_0)+z_0(z-z_0)=0.
\]
Since the point (x₀, y₀, z₀) lies on the sphere, the equation can be simplified as
\[
x_0x+y_0y+z_0z=R^2.
\]

The sphere may be parameterized via the spherical coordinates, which gives the parameterized surface
\[
x=R\sin\phi\cos\theta,\qquad y=R\sin\phi\sin\theta,\qquad z=R\cos\phi,
\]
where 0 ≤ φ ≤ π and 0 ≤ θ < 2π; hence the tangent plane at a point (x₀, y₀, z₀) has an equation
\[
\begin{vmatrix}
x-x_0 & y-y_0 & z-z_0\\[1mm]
\frac{\partial x}{\partial\phi} & \frac{\partial y}{\partial\phi} & \frac{\partial z}{\partial\phi}\\[1mm]
\frac{\partial x}{\partial\theta} & \frac{\partial y}{\partial\theta} & \frac{\partial z}{\partial\theta}
\end{vmatrix}
=\begin{vmatrix}
x-x_0 & y-y_0 & z-z_0\\
R\cos\phi\cos\theta & R\cos\phi\sin\theta & -R\sin\phi\\
-R\sin\phi\sin\theta & R\sin\phi\cos\theta & 0
\end{vmatrix}
=R^2\begin{vmatrix}
x-x_0 & y-y_0 & z-z_0\\
\cos\phi\cos\theta & \cos\phi\sin\theta & -\sin\phi\\
-\sin\phi\sin\theta & \sin\phi\cos\theta & 0
\end{vmatrix}=0,
\]
which may be simplified as the following
\[
\sin\phi\cos\theta\,(x-x_0)+\sin\phi\sin\theta\,(y-y_0)+\cos\phi\,(z-z_0)=0.
\]

7.2 Directional derivatives


Suppose that F(x, y, z) has continuous partial derivatives; then its gradient vector by definition is ∇F = (F_x, F_y, F_z). Suppose γ(t) = (x(t), y(t), z(t)) (where t ∈ (a, b)) is a parameterized curve with continuous derivatives; then
\[
f(t)=F\circ\gamma(t)=F(x(t),y(t),z(t))
\]
is differentiable, and, by the chain rule,
\[
\frac{df}{dt}=F_x\,x'(t)+F_y\,y'(t)+F_z\,z'(t)=\nabla F(\gamma(t))\cdot\gamma'(t).
\]

In particular, if v = (v₁, v₂, v₃) is a non-zero vector, and we take
\[
\gamma(t)=(x_0+v_1t,\; y_0+v_2t,\; z_0+v_3t),
\]
the line passing through (x₀, y₀, z₀), then the derivative
\[
\frac{d}{dt}F\circ\gamma(0)=\nabla F(x_0,y_0,z_0)\cdot v=v_1F_x+v_2F_y+v_3F_z
\]
is called the directional derivative of F in the direction v, denoted by D_vF; hence
\[
D_vF=\nabla F\cdot v.
\]
By definition
\[
D_vF(x_0,y_0,z_0)=\lim_{t\to 0}\frac{F(x_0+v_1t,\,y_0+v_2t,\,z_0+v_3t)-F(x_0,y_0,z_0)}{t}.
\]
The previous discussion can be stated as the following

Proposition 7.3 Suppose that F(x, y, z) is a function on an open subset U ⊆ ℝ³ with continuous partial derivatives, and γ(t) is a parameterized curve in U with tangent vector v at γ(0), i.e. γ'(0) = v. Then
\[
\frac{d}{dt}F\circ\gamma(0)=D_vF(\gamma(0))=\nabla F(\gamma(0))\cdot v. \tag{7.13}
\]
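Computing a directional derivative thus amounts to one gradient and one dot product; a short sketch in Python's sympy (our addition; the function, direction and point are sample choices):

    import sympy as sp

    x, y, z = sp.symbols('x y z')

    F = x**2*y + sp.sin(z)     # sample function
    v = (1, 2, 2)              # direction vector
    P = {x: 1, y: 1, z: 0}     # base point

    grad = [sp.diff(F, s) for s in (x, y, z)]
    DvF = sum(vi*g.subs(P) for vi, g in zip(v, grad))
    print(DvF)   # grad F(1,1,0) = (2, 1, 1), so D_v F = 1*2 + 2*1 + 2*1 = 6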

8 Taylor's theorem
Suppose that f(x) is a function defined on [a, b] with derivatives of any order. For a given natural number n we search for a polynomial in (x − a) of degree n
\[
p_n(x)=a_0+a_1(x-a)+\cdots+a_n(x-a)^n
\]
so that f(x) agrees with p_n(x) up to nth order derivatives at a, that is f^{(k)}(a) = p_n^{(k)}(a) for k = 0, 1, ..., n. Since p_n^{(k)}(a) = k! a_k for k = 0, ..., n, we obtain a_k = f^{(k)}(a)/k! and therefore
\[
p_n(x)=f(a)+f'(a)(x-a)+\cdots+\frac{f^{(n)}(a)}{n!}(x-a)^n, \tag{8.1}
\]
which is called Taylor's expansion (of order n) for f at the point a. We have the following theorem which will be proved in Prelims Analysis II in Hilary term.

Theorem 8.1 (Taylor's theorem for one variable function) Suppose f(x) has derivatives at a up to nth order; then
\[
f(x)=f(a)+f'(a)(x-a)+\cdots+\frac{f^{(n)}(a)}{n!}(x-a)^n+o((x-a)^n) \tag{8.2}
\]
as x → a [the right-hand side is called Taylor's expansion of f at a with Peano's remainder]. That is
\[
\lim_{x\to a}\frac{f(x)-p_n(x)}{(x-a)^n}=0.
\]

Taylor's theorem says the Taylor expansion of nth order is a good approximation of f near a, up to (x − a)ⁿ.
We can have a better estimate for the difference f(x) − p_n(x) if f has derivatives on [a, b] up to (n + 1)th order. Namely we have

Theorem 8.2 (Taylor's Theorem) Suppose f(x) has derivatives on [a, b] up to (n + 1)th order. Then for any x ∈ (a, b] there is ξ ∈ (a, x) such that
\[
f(x)=f(a)+f'(a)(x-a)+\cdots+\frac{f^{(n)}(a)}{n!}(x-a)^n+\frac{f^{(n+1)}(\xi)}{(n+1)!}(x-a)^{n+1}. \tag{8.3}
\]

In particular, if f has derivatives of any order, and if
\[
\frac{M_n}{n!}(b-a)^n\to 0\quad\text{as }n\to\infty,
\]
where M_n = sup_{[a,b]} |f^{(n)}(x)|, then
\[
f(x)=f(a)+f'(a)(x-a)+\cdots+\frac{f^{(n)}(a)}{n!}(x-a)^n+\cdots\qquad\forall x\in[a,b].
\]
For example, we can easily see that
\[
\cos x=1-\frac{x^2}{2!}+\frac{x^4}{4!}-\cdots+(-1)^n\frac{x^{2n}}{(2n)!}+\cdots\qquad\forall x\in(-\infty,\infty).
\]
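Such expansions can be generated automatically; for instance, in Python's sympy (our addition):

    import sympy as sp

    x = sp.symbols('x')
    print(sp.series(sp.cos(x), x, 0, 8))
    # 1 - x**2/2 + x**4/24 - x**6/720 + O(x**8)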

Let us now consider a function f(x, y) of two variables defined on an open subset U. Suppose (x₀, y₀) ∈ U. We search for a Taylor type expansion of f(x, y) near (x₀, y₀). Let us assume that f has continuous partial derivatives up to order n. Let (x, y) ∈ U be close to (x₀, y₀) so that the line segment
\[
\gamma(t)=(1-t)(x_0,y_0)+t(x,y)=(x_0,y_0)+t(x-x_0,\,y-y_0)
\]
(where t ∈ [0, 1]) between (x₀, y₀) and (x, y) lies in U. Consider the one variable function
\[
g(t)=f\circ\gamma(t),\qquad t\in[0,1].
\]

Then g has derivatives on [0, 1] up to order n, so we can apply Taylor's Theorem at a = 0. To simplify our computations below we introduce the vector notation v = (x − x₀, y − y₀). We want to calculate g^{(k)}(0) for k = 0, 1, .... Clearly g(0) = f(x₀, y₀) and
\[
g'(0)=\nabla f(x_0,y_0)\cdot v
\]
as we have seen in the previous sections. In general
\[
g'(t)=\nabla f(\gamma(t))\cdot\gamma'(t)=\nabla f(\gamma(t))\cdot v
=f_x(\gamma(t))(x-x_0)+f_y(\gamma(t))(y-y_0),
\]

so that, by differentiating in t again, we obtain
\[
\begin{aligned}
g''(t)&=(x-x_0)\,\nabla f_x(\gamma(t))\cdot v+(y-y_0)\,\nabla f_y(\gamma(t))\cdot v\\
&=(x-x_0)\left(f_{xx}(\gamma(t))(x-x_0)+f_{xy}(\gamma(t))(y-y_0)\right)\\
&\quad+(y-y_0)\left(f_{yx}(\gamma(t))(x-x_0)+f_{yy}(\gamma(t))(y-y_0)\right)\\
&=f_{xx}(\gamma(t))(x-x_0)^2+f_{xy}(\gamma(t))(x-x_0)(y-y_0)\\
&\quad+f_{xy}(\gamma(t))(y-y_0)(x-x_0)+f_{yy}(\gamma(t))(y-y_0)^2,
\end{aligned}
\]
and from this we can see the pattern for the kth derivative, namely
\[
g^{(k)}(t)=\sum_{\substack{i+j=k\\ i,j\ge 0}}\binom{k}{i}\frac{\partial^k f(\gamma(t))}{\partial x^i\partial y^j}(x-x_0)^i(y-y_0)^j,
\]
and therefore
\[
g^{(k)}(0)=\sum_{\substack{i+j=k\\ i,j\ge 0}}\frac{k!}{i!j!}\frac{\partial^k f(x_0,y_0)}{\partial x^i\partial y^j}(x-x_0)^i(y-y_0)^j. \tag{8.4}
\]

According to Taylor's theorem for one variable function, applied to g at a = 0:
\[
f(x,y)=f(x_0,y_0)+\sum_{k=1}^{n}\frac{1}{k!}\sum_{\substack{i+j=k\\ i,j\ge 0}}\binom{k}{i}\frac{\partial^k f(x_0,y_0)}{\partial x^i\partial y^j}(x-x_0)^i(y-y_0)^j+o(|(x-x_0,y-y_0)|^n) \tag{8.5}
\]

as (x, y) → (x₀, y₀). If f has partial derivatives on U up to (n + 1)th order, and the segment between (x₀, y₀) and (x, y) lies in U, then there is θ ∈ (0, 1) (depending on n, (x₀, y₀), (x, y) and the function f) such that
\[
\begin{aligned}
f(x,y)&=f(x_0,y_0)+\sum_{k=1}^{n}\sum_{\substack{i+j=k\\ i,j\ge 0}}\frac{1}{i!j!}\frac{\partial^k f(x_0,y_0)}{\partial x^i\partial y^j}(x-x_0)^i(y-y_0)^j\\
&\quad+\sum_{\substack{i+j=n+1\\ i,j\ge 0}}\frac{1}{i!j!}\frac{\partial^{n+1} f(\xi)}{\partial x^i\partial y^j}(x-x_0)^i(y-y_0)^j \tag{8.6}
\end{aligned}
\]
where
\[
\xi=\theta(x_0,y_0)+(1-\theta)(x,y).
\]
The right-hand side of (8.6) is called the Taylor expansion of the two variable function f(x, y) at (x₀, y₀).
To memorize this formula, you should compare it with the binomial expansion
\[
(a+b)^k=\sum_{\substack{i+j=k\\ i,j\ge 0}}\frac{k!}{i!j!}a^ib^j,
\]
which corresponds to the kth derivative term in the Taylor expansion. But notice that the combination numbers in the binomial expansion are k!/(i!j!), while in Taylor's expansion they turn out to be 1/(i!j!).
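Formula (8.6) (without the remainder) translates directly into code; a sketch in Python's sympy computing the Taylor polynomial of a sample two variable function at the origin (our addition):

    import sympy as sp

    x, y = sp.symbols('x y')
    f = sp.exp(x)*sp.sin(y)    # sample function
    x0, y0, n = 0, 0, 3        # expansion point and order

    def d(expr, var, k):
        # k-th derivative, treating k = 0 as the identity
        return expr if k == 0 else sp.diff(expr, var, k)

    taylor = sum(
        d(d(f, x, i), y, k - i).subs({x: x0, y: y0})
        * (x - x0)**i * (y - y0)**(k - i)
        / (sp.factorial(i)*sp.factorial(k - i))
        for k in range(n + 1) for i in range(k + 1))
    print(sp.expand(taylor))   # y + x*y + x**2*y/2 - y**3/6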
It is particularly interesting for n = 1. For simplicity, suppose U = B_R(x₀, y₀) is an open disk centered at (x₀, y₀) with radius R > 0. Suppose all first and second partial derivatives are continuous on U. Then for any (x, y) ∈ U there is ξ ∈ U such that
\[
\begin{aligned}
f(x,y)&=f(x_0,y_0)+\nabla f(x_0,y_0)\cdot(x-x_0,\,y-y_0)\\
&\quad+\frac{1}{2}\left(f_{xx}(\xi)(x-x_0)^2+2f_{xy}(\xi)(x-x_0)(y-y_0)+f_{yy}(\xi)(y-y_0)^2\right). \tag{8.7}
\end{aligned}
\]
The remainder in (8.7) appears as a quadratic form in x − x₀ and y − y₀ with coefficients the second partial derivatives, which can be written in terms of matrix multiplication as
\[
(x-x_0,\; y-y_0)\begin{pmatrix} f_{xx} & f_{xy}\\ f_{xy} & f_{yy}\end{pmatrix}\begin{pmatrix} x-x_0\\ y-y_0\end{pmatrix}.
\]
The square matrix
\[
\begin{pmatrix} f_{xx} & f_{xy}\\ f_{xy} & f_{yy}\end{pmatrix}
\]
is called the Hessian matrix of f, denoted by D²f.
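sympy can produce this matrix directly; a one-line sketch (our addition; the function is a sample):

    import sympy as sp

    x, y = sp.symbols('x y')
    f = x**3 + x*y**2 - 2*y    # sample function

    print(sp.hessian(f, (x, y)))   # Matrix([[6*x, 2*y], [2*y, 2*x]])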
Similarly we may write down the Taylor expansion for a function of several variables. Suppose f(x₁, ..., x_k) is defined on a ball B_r(a) centered at a = (a₁, ..., a_k) with radius r > 0, with continuous partial derivatives of any order. Then, for any x = (x₁, ..., x_k) ∈ B_r(a) and n = 1, 2, ..., we have
\[
f(x)=f(a)+\nabla f(a)\cdot(x-a)+\cdots
+\frac{1}{n!}\sum_{\substack{i_1+\cdots+i_k=n\\ 0\le i_1,\ldots,i_k\le n}}\binom{n}{i_1,\ldots,i_k}\frac{\partial^n f(a)}{\partial x_1^{i_1}\cdots\partial x_k^{i_k}}(x_1-a_1)^{i_1}\cdots(x_k-a_k)^{i_k}
+O(|x-a|^{n+1})
\]
as x → a, where
\[
\binom{n}{i_1,\ldots,i_k}=\frac{n!}{i_1!\cdots i_k!}.
\]

9 Critical points
In this part we apply Taylor's theorem to the study of multi-variable functions near critical points.
For simplicity, we concentrate on two variable functions, though the techniques we are going to
develop apply to several variable functions with necessary modifications.
First of all we introduce the notions of local extrema. Let f(x, y) be a function defined on a subset A ⊆ ℝ². A point (x₀, y₀) ∈ A is a local maximum (resp. local minimum) of f if there is an open ball B_r(x₀, y₀) ⊆ A for some r > 0 such that
\[
f(x,y)\le f(x_0,y_0)\qquad\forall (x,y)\in B_r(x_0,y_0) \tag{9.1}
\]
(resp.
\[
f(x,y)\ge f(x_0,y_0)\qquad\forall (x,y)\in B_r(x_0,y_0)). \tag{9.2}
\]
On the other hand, we say (x₀, y₀) ∈ A is a (global) maximum (resp. (global) minimum) if f(x, y) ≤ f(x₀, y₀) (resp. f(x, y) ≥ f(x₀, y₀)) for every (x, y) ∈ A. We should note that a global maximum (or a global minimum) of a function is not necessarily a local one: for example, consider the function f(x, y) = x² + y² defined on A = {(x, y) : x² + y² ≤ 1}, the closed unit disk. Then every point on the unit circle is a global maximum, but not a local one.

Theorem 9.1 (Fermat) Suppose that f(x, y), defined on an open subset U, has continuous partial derivatives, and (x₀, y₀) ∈ U is a local maximum (or a local minimum). Then
\[
\frac{\partial f(x_0,y_0)}{\partial x}=\frac{\partial f(x_0,y_0)}{\partial y}=0. \tag{9.3}
\]
That is, the gradient vector ∇f(x₀, y₀) = 0.

Proof. Consider the local maximum case. There is δ > 0 such that B_δ(x₀, y₀) ⊆ U and (9.1) holds. For any unit vector v = (v₁, v₂) let γ(t) = (x₀, y₀) + tv, and consider the one variable function g(t) = f∘γ(t). Then g(t) ≤ g(0) for any t ∈ (−δ, δ), and g'(0) exists by the chain rule. On the other hand
\[
g'(0)=\lim_{\substack{t\to 0\\ t>0}}\frac{g(t)-g(0)}{t}\le 0
\]
and
\[
g'(0)=\lim_{\substack{t\to 0\\ t<0}}\frac{g(t)-g(0)}{t}\ge 0,
\]
so we must have g'(0) = 0. But g'(0) is just the directional derivative of f in v = (v₁, v₂), so that
\[
D_vf(x_0,y_0)=v_1\frac{\partial f(x_0,y_0)}{\partial x}+v_2\frac{\partial f(x_0,y_0)}{\partial y}=0
\]
for any unit vector (v₁, v₂), which yields (9.3).
Any point (x₀, y₀) such that ∇f(x₀, y₀) = 0 is called a critical (or stationary) point. Fermat's theorem says local extrema must be stationary points; therefore we search for local extrema among the stationary points. Taylor's expansion allows us to say more about whether a stationary point is a local extreme point or not.
To this end, we have to look at the remainder term which appears in Taylor's expansion, i.e. the term
\[
f_{xx}(\xi)(x-x_0)^2+2f_{xy}(\xi)(x-x_0)(y-y_0)+f_{yy}(\xi)(y-y_0)^2.
\]
By considering the quadratic form aα² + 2cαβ + bβ², whose discriminant is 4(c² − ab), we have the following

Lemma 9.2 1) If c² − ab < 0 and a > 0 (so b > 0 as well), then
\[
a\alpha^2+2c\alpha\beta+b\beta^2\ge 0,
\]
and equality holds if and only if α = β = 0.
2) If c² − ab < 0 and a < 0 (so b < 0 as well), then
\[
a\alpha^2+2c\alpha\beta+b\beta^2\le 0,
\]
and equality holds if and only if α = β = 0.

Together with Taylor's expansion we are now in a position to derive further information about stationary points.

Theorem 9.3 Suppose that f(x, y), defined on an open subset U, has continuous derivatives up to second order, and suppose (x₀, y₀) ∈ U is a critical point: ∇f(x₀, y₀) = 0.
1) If
\[
\left(\frac{\partial^2 f(x_0,y_0)}{\partial x\partial y}\right)^2-\frac{\partial^2 f(x_0,y_0)}{\partial x^2}\frac{\partial^2 f(x_0,y_0)}{\partial y^2}<0,\qquad \frac{\partial^2 f(x_0,y_0)}{\partial x^2}>0, \tag{9.4}
\]
then (x₀, y₀) is a local minimum.
2) If
\[
\left(\frac{\partial^2 f(x_0,y_0)}{\partial x\partial y}\right)^2-\frac{\partial^2 f(x_0,y_0)}{\partial x^2}\frac{\partial^2 f(x_0,y_0)}{\partial y^2}<0,\qquad \frac{\partial^2 f(x_0,y_0)}{\partial x^2}<0, \tag{9.5}
\]
then (x₀, y₀) is a local maximum.

Proof. Since all partial derivatives up to second order are continuous, we can choose a small δ > 0 so that the open disk B_δ(x₀, y₀) ⊆ U and (9.4) (resp. (9.5)) holds not only at a = (x₀, y₀) but also at any point of B_δ(a). For any x ∈ B_δ(a), according to Taylor's theorem, there is ξ ∈ B_δ(a) (though depending on x) such that
\[
f(x)=f(a)+\nabla f(a)\cdot(x-a)+\frac{1}{2}\left(f_{xx}(\xi)(x-x_0)^2+2f_{xy}(\xi)(x-x_0)(y-y_0)+f_{yy}(\xi)(y-y_0)^2\right).
\]
Suppose (9.4) holds; then it holds on B_δ(a) for small δ > 0, so that by Lemma 9.2
\[
f_{xx}(\xi)(x-x_0)^2+2f_{xy}(\xi)(x-x_0)(y-y_0)+f_{yy}(\xi)(y-y_0)^2\ge 0,
\]
which, since ∇f(a) = 0, yields f(x) ≥ f(a) on B_δ(a); so a is a local minimum.


A natural question is, of course, what can we say if
\[
\left(\frac{\partial^2 f(x_0,y_0)}{\partial x\partial y}\right)^2-\frac{\partial^2 f(x_0,y_0)}{\partial x^2}\frac{\partial^2 f(x_0,y_0)}{\partial y^2}\ge 0.
\]
If
\[
\left(\frac{\partial^2 f(x_0,y_0)}{\partial x\partial y}\right)^2-\frac{\partial^2 f(x_0,y_0)}{\partial x^2}\frac{\partial^2 f(x_0,y_0)}{\partial y^2}=0 \tag{9.6}
\]
then, based only on the information about the first and second partial derivatives at (x₀, y₀), we cannot determine the sign of
\[
f_{xx}(\xi)(x-x_0)^2+2f_{xy}(\xi)(x-x_0)(y-y_0)+f_{yy}(\xi)(y-y_0)^2
\]
appearing in the Taylor expansion, so in this case we are unable to tell whether (x₀, y₀) is a local extreme point or not.
On the other hand, if
\[
\left(\frac{\partial^2 f(x_0,y_0)}{\partial x\partial y}\right)^2-\frac{\partial^2 f(x_0,y_0)}{\partial x^2}\frac{\partial^2 f(x_0,y_0)}{\partial y^2}>0 \tag{9.7}
\]
then, by continuity, the same inequality remains valid on a small disk near (x₀, y₀), and thus
\[
f_{xx}(\xi)(x-x_0)^2+2f_{xy}(\xi)(x-x_0)(y-y_0)+f_{yy}(\xi)(y-y_0)^2
\]
is indefinite, i.e. it can take both positive and negative values; so in this case the stationary point (x₀, y₀) is not a local extreme point. Such a critical point is called a saddle point.

Example 9.4 Consider f(x, y) = sin x + sin y − sin(x + y). Find the maximum and minimum values of f on the triangle enclosed by the x-axis, the y-axis and the line x + y = 2π.

The triangle is bounded and closed, and f is continuous, so f achieves its maximum and minimum values. A global extremum must lie either on the boundary of the triangle, i.e. x = 0, 0 ≤ y ≤ 2π; y = 0, 0 ≤ x ≤ 2π; x + y = 2π, 0 ≤ x, y ≤ 2π, or in the interior of the triangle. In the latter case, a global extreme point must be a local one, hence must be a critical point of f. Hence we first locate the possible critical points inside the triangle by solving the following system
\[
\frac{\partial f}{\partial x}=\cos x-\cos(x+y)=0,\qquad
\frac{\partial f}{\partial y}=\cos y-\cos(x+y)=0,
\]
to obtain only one critical point (2π/3, 2π/3), with f(2π/3, 2π/3) = 3√3/2. On the other hand, on the boundary f(x, y) = 0, so (2π/3, 2π/3) is the global maximum.
Since
\[
\frac{\partial^2 f}{\partial x^2}=-\sin x+\sin(x+y),\qquad
\frac{\partial^2 f}{\partial y^2}=-\sin y+\sin(x+y),\qquad
\frac{\partial^2 f}{\partial x\partial y}=\sin(x+y),
\]
at (2π/3, 2π/3) the discriminant
\[
D=\sin^2\frac{4\pi}{3}-\left(-\sin\frac{2\pi}{3}+\sin\frac{4\pi}{3}\right)^2=\frac{3}{4}-3<0
\]
and
\[
\frac{\partial^2 f}{\partial x^2}\left(\frac{2\pi}{3},\frac{2\pi}{3}\right)=-\sin\frac{2\pi}{3}+\sin\frac{4\pi}{3}=-\sqrt{3}<0,
\]
so that (2π/3, 2π/3) is a local maximum.
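The classification above can be double-checked numerically; a sketch in Python's sympy (our addition):

    import sympy as sp

    x, y = sp.symbols('x y')
    f = sp.sin(x) + sp.sin(y) - sp.sin(x + y)

    cp = {x: 2*sp.pi/3, y: 2*sp.pi/3}    # the critical point found above
    H = sp.hessian(f, (x, y)).subs(cp)

    D = H[0, 1]**2 - H[0, 0]*H[1, 1]
    print(sp.simplify(D), sp.simplify(H[0, 0]))   # -9/4 and -sqrt(3): local maximum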

There is a generalization to functions of several variables. To this end we have to borrow a notion about symmetric matrices from linear algebra. We say an n × n symmetric matrix A = (a_ij) (where a_ij = a_ji for any pair (i, j)) is positive definite (resp. negative definite) if
\[
Av\cdot v=\sum_{i,j=1}^{n}a_{ij}v_iv_j\ge 0\quad(\text{resp. }\le 0)\qquad\forall v=(v_1,\ldots,v_n)\in\mathbb{R}^n, \tag{9.8}
\]
and equality holds if and only if v = 0.

For a function f(x₁, ..., x_n) of n variables with continuous partial derivatives up to second order, the Hessian matrix D²f is the n × n matrix with entry ∂²f/∂x_i∂x_j at the ith row and jth column, i.e.
\[
D^2f=\begin{pmatrix}
\frac{\partial^2 f}{\partial x_1^2} & \cdots & \frac{\partial^2 f}{\partial x_1\partial x_n}\\
\vdots & \ddots & \vdots\\
\frac{\partial^2 f}{\partial x_n\partial x_1} & \cdots & \frac{\partial^2 f}{\partial x_n^2}
\end{pmatrix}, \tag{9.9}
\]
which is a symmetric matrix-valued function.

Theorem 9.5 Suppose f(x) is a function of n variables x = (x₁, ..., x_n) defined on an open subset U ⊆ ℝⁿ which has continuous partial derivatives up to second order. Let a = (a₁, ..., a_n) be a critical point: ∇f(a) = 0.
1) If the Hessian matrix D²f(a) is positive definite, then a is a local minimum of f.
2) If the Hessian matrix D²f(a) is negative definite, then a is a local maximum of f.

The proof follows from a discussion via Taylor's expansion at the critical point a.
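Positive definiteness of a concrete Hessian can be tested directly; a sketch in Python's sympy (our addition; the function is a sample with a critical point at the origin):

    import sympy as sp

    x1, x2, x3 = sp.symbols('x1 x2 x3')
    f = x1**2 + 2*x2**2 + 3*x3**2 + x1*x2   # sample function, grad f(0) = 0

    H = sp.hessian(f, (x1, x2, x3))
    print(H.is_positive_definite)   # True, so the origin is a local minimum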

10 Lagrange's multipliers
In this part we develop a method for locating relative local extrema. Let us first consider the following problem in three variables. Let f(x, y, z) be a function defined on a subset U ⊆ ℝ³. We wish to locate the local extrema of f(x, y, z) subject to the following constraint
\[
F(x,y,z)=0. \tag{10.1}
\]
We say (x₀, y₀, z₀) ∈ U is a (relative) local minimum subject to (10.1) if F(x₀, y₀, z₀) = 0 and there is a small ball B centered at (x₀, y₀, z₀) with radius δ > 0 such that f(x, y, z) ≥ f(x₀, y₀, z₀) for every (x, y, z) ∈ B which satisfies (10.1).

Theorem 10.1 Let f(x, y, z) and F(x, y, z) be two functions on an open subset U ⊆ ℝ³. Suppose that both functions f and F have continuous partial derivatives, and that the gradient vector field ∇F ≠ 0 on U. Let (x₀, y₀, z₀) ∈ U be a local maximum or local minimum of f(x, y, z) subject to the constraint (10.1). Then there is a real number λ such that ∇f(x₀, y₀, z₀) = λ∇F(x₀, y₀, z₀).

Proof. Since ∇F ≠ 0 on U, the equation (10.1) defines a surface
\[
S=\{(x,y,z)\in U : F(x,y,z)=0\}.
\]
By assumption, (x₀, y₀, z₀) ∈ S is a local maximum or minimum of the restriction of the function f to S. Given any differentiable curve γ(t) = (x(t), y(t), z(t)) lying on the surface S and passing through (x₀, y₀, z₀), i.e.
\[
F\circ\gamma(t)=0\quad\forall t\in(-\delta,\delta),\qquad \gamma(0)=(x_0,y_0,z_0),
\]
consider h(t) = f∘γ(t). Then by the definition of relative local extrema, 0 is a local maximum or minimum of the function h(t). Therefore, by Fermat's theorem, h'(0) = 0. On the other hand, according to the chain rule,
\[
h'(0)=\nabla f(\gamma(0))\cdot\gamma'(0)=0,
\]
which means that ∇f(x₀, y₀, z₀) is perpendicular to γ'(0). Since γ(t) is any curve lying on the surface S passing through (x₀, y₀, z₀), γ'(0) can be any tangent vector to S at (x₀, y₀, z₀). Therefore ∇f(x₀, y₀, z₀) must be perpendicular to the tangent plane of S at (x₀, y₀, z₀). It follows that ∇f(x₀, y₀, z₀) either equals 0 or is a non-zero normal vector to S at (x₀, y₀, z₀). On the other hand a normal vector to S at (x₀, y₀, z₀) is ∇F(x₀, y₀, z₀); therefore ∇f(x₀, y₀, z₀) and ∇F(x₀, y₀, z₀) are parallel. Since ∇F(x₀, y₀, z₀) ≠ 0, there is λ such that ∇f(x₀, y₀, z₀) = λ∇F(x₀, y₀, z₀).
As a by-product, we have proved that if (x₀, y₀, z₀) ∈ S is a relative local maximum or minimum of f(x, y, z) along S (i.e. satisfying the constraint (10.1)), then ∇f(x₀, y₀, z₀) is perpendicular to the level surface S : F(x, y, z) = 0.
According to the previous theorem, in order to find the constrained extrema of f we should look among those (x, y, z) ∈ U and real numbers λ which satisfy the following system
\[
\begin{cases}
\nabla f(x,y,z)=\lambda\nabla F(x,y,z),\\
F(x,y,z)=0.
\end{cases} \tag{10.2}
\]
[We often assume that ∇F(x, y, z) ≠ 0.] Of course we are interested in those (x, y, z) ∈ U such that there is a real number λ which solves the system (10.2). In practice, we need to solve for (x, y, z), but there is no need to know the explicit value of λ. The constant λ introduced here to help us locate the relative extrema is called a Lagrange multiplier.
Introduce a function G(x, y, z, λ) = f(x, y, z) − λF(x, y, z). Then the system (10.2) may be written as
\[
\frac{\partial G}{\partial x}=\frac{\partial G}{\partial y}=\frac{\partial G}{\partial z}=\frac{\partial G}{\partial\lambda}=0,
\]
which means a solution (x, y, z, λ) to (10.2) is just a critical point of G(x, y, z, λ).
Example 10.2 Maximize f(x, y, z) = x + y subject to the constraint x² + y² + z² = 1.
To use the method of Lagrange multipliers, set
\[
G(x,y,z,\lambda)=x+y-\lambda\left(x^2+y^2+z^2-1\right)
\]
and look for the critical points of G by solving the system
\[
\begin{aligned}
\frac{\partial G}{\partial x}&=1-2\lambda x=0,\\
\frac{\partial G}{\partial y}&=1-2\lambda y=0,\\
\frac{\partial G}{\partial z}&=-2\lambda z=0,\\
\frac{\partial G}{\partial\lambda}&=-\left(x^2+y^2+z^2-1\right)=0.
\end{aligned}
\]
The first equation implies that λ ≠ 0; hence the third equation gives z = 0, and the first and second give x = y = 1/(2λ). Substituting these into the constraint, we obtain
\[
\left(\frac{1}{2\lambda}\right)^2+\left(\frac{1}{2\lambda}\right)^2+0^2=1,
\]
so that 4λ² = 2. Thus there are two possible relative extrema
\[
\left(\sqrt{\tfrac{1}{2}},\sqrt{\tfrac{1}{2}},0\right)\qquad\text{and}\qquad\left(-\sqrt{\tfrac{1}{2}},-\sqrt{\tfrac{1}{2}},0\right).
\]
Since the sphere S : x² + y² + z² = 1 is compact (bounded and closed), and the function f(x, y, z) = x + y is continuous, it must achieve its maximum and minimum values [we will prove this kind of statement in Prelims Analysis II]. Therefore the maximum of f subject to the constraint is
\[
f\left(\sqrt{\tfrac{1}{2}},\sqrt{\tfrac{1}{2}},0\right)=\sqrt{2},
\]
while
\[
f\left(-\sqrt{\tfrac{1}{2}},-\sqrt{\tfrac{1}{2}},0\right)=-\sqrt{2}
\]
is the constrained minimum value of f over the unit sphere.
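The same system can be handed to a computer algebra solver; a sketch in Python's sympy (our addition):

    import sympy as sp

    x, y, z, lam = sp.symbols('x y z lambda', real=True)

    f = x + y
    F = x**2 + y**2 + z**2 - 1

    eqs = [sp.diff(f, v) - lam*sp.diff(F, v) for v in (x, y, z)] + [F]
    for s in sp.solve(eqs, [x, y, z, lam], dict=True):
        print(s, '  f =', sp.simplify(f.subs(s)))
    # two critical points, with f = sqrt(2) (maximum) and f = -sqrt(2) (minimum)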
To conclude our discussion, let us describe the general form of the method of Lagrange multipliers. Suppose that f(x₁, ..., x_n) and F₁(x₁, ..., x_n), ..., F_k(x₁, ..., x_n) are functions of n variables defined on an open subset U ⊆ ℝⁿ, where n, k ∈ ℕ. Suppose that f, F₁, ..., F_k have continuous partial derivatives. Then the local extrema of f(x₁, ..., x_n) subject to the following constraints
\[
F_1(x_1,\ldots,x_n)=0,\ \ldots,\ F_k(x_1,\ldots,x_n)=0
\]
are among the solutions to the following system
\[
\frac{\partial G}{\partial x_1}=\cdots=\frac{\partial G}{\partial x_n}=\frac{\partial G}{\partial\lambda_1}=\cdots=\frac{\partial G}{\partial\lambda_k}=0, \tag{10.3}
\]
where
\[
G(x_1,\ldots,x_n,\lambda_1,\ldots,\lambda_k)=f(x_1,\ldots,x_n)-\lambda_1F_1(x_1,\ldots,x_n)-\cdots-\lambda_kF_k(x_1,\ldots,x_n);
\]
the additional constants λ₁, ..., λ_k are called the Lagrange multipliers.

Example 10.3 Find the extreme points of f(x, y, z) = x + y + z subject to the conditions x² + y² = 2 and y² + z² = 2.

Construct the function
\[
G(x,y,z,\lambda_1,\lambda_2)=x+y+z-\lambda_1(x^2+y^2-2)-\lambda_2(y^2+z^2-2).
\]
We want to solve, in order to locate the extreme points, the following system
\[
\begin{aligned}
\frac{\partial G}{\partial x}&=1-2\lambda_1x=0,\\
\frac{\partial G}{\partial y}&=1-2(\lambda_1+\lambda_2)y=0,\\
\frac{\partial G}{\partial z}&=1-2\lambda_2z=0,
\end{aligned}
\]
together with the constraints x² + y² = 2 and y² + z² = 2. Thus λ₁, λ₂, λ₁ + λ₂ ≠ 0 and x = 1/(2λ₁), z = 1/(2λ₂) and y = 1/(2(λ₁ + λ₂)). From the constraints we deduce that x² = z²; the case x = −z is impossible, since it would force λ₂ = −λ₁, contradicting λ₁ + λ₂ ≠ 0. Hence x = z, which implies that λ₁ = λ₂, and so x = z = 1/(2λ₁) and y = 1/(4λ₁). Using the constraints again we obtain
\[
\left(\frac{1}{2\lambda_1}\right)^2+\left(\frac{1}{4\lambda_1}\right)^2=2,
\]
which leads to the solutions λ₁ = ±(1/4)√(5/2). Thus the possible constrained extreme points are
\[
\left(2\sqrt{\tfrac{2}{5}},\ \sqrt{\tfrac{2}{5}},\ 2\sqrt{\tfrac{2}{5}}\right)\qquad\text{and}\qquad \left(-2\sqrt{\tfrac{2}{5}},\ -\sqrt{\tfrac{2}{5}},\ -2\sqrt{\tfrac{2}{5}}\right),
\]
at which the function f achieves the relative maximum and minimum values respectively.
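With two constraints the computation is the same, just with two multipliers; a sketch in Python's sympy (our addition):

    import sympy as sp

    x, y, z, l1, l2 = sp.symbols('x y z lambda1 lambda2', real=True)

    f = x + y + z
    F1 = x**2 + y**2 - 2
    F2 = y**2 + z**2 - 2

    eqs = [sp.diff(f - l1*F1 - l2*F2, v) for v in (x, y, z)] + [F1, F2]
    for s in sp.solve(eqs, [x, y, z, l1, l2], dict=True):
        print(s[x], s[y], s[z], '  f =', sp.simplify(f.subs(s)))
    # the two points above, with f = sqrt(10) and f = -sqrt(10)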
