
Mathematics Review

Compiled and Edited by Romann M. Weber*

*The overall plan of coverage in this document was based on Cracking the GRE Math Subject Test, 3rd Edition by Steven A. Leduc, many of whose results have been incorporated here. Certain items from that reference have been corrected in this document, and new results from other sources have been included.

Precalculus

Analytic Geometry

Parabolas, each defined as the locus of points equidistant from a point F (the focus) and a line D (the directrix), have equations of the form y = ±(1/(4p))x² or x = ±(1/(4p))y², where the focus is located, respectively, at the point (0, ±p) or (±p, 0) (with the sign matching that of the equation above) and the directrix has the equation y = ∓p or x = ∓p, respectively. More generally, a parabola with vertex at (h, k) has the equation

y − k = ±(1/(4p))(x − h)²  or  x − h = ±(1/(4p))(y − k)².

A circle of radius r centered at (h, k) has the equation (x − h)² + (y − k)² = r². It can be parameterized by x = h + r cos t, y = k + r sin t.

An ellipse is defined as the locus of points such that the sum of the distances from each point on the graph to two given fixed points (the foci) is a given constant. Ellipses have a major and a minor axis and follow the equation (x − h)²/a² + (y − k)²/b² = 1, where the semi-axes have lengths a and b on either side of the center, (h, k), terminating at the vertices. In the case of a > b, the major axis is parallel to the x-axis, and the foci are located at the points (h ± c, k), where c = √(a² − b²). For the case a < b, the major axis is parallel to the y-axis, and the foci are located at the points (h, k ± c), where c = √(b² − a²). The eccentricity, which measures the “flatness” of the ellipse, is 0 ≤ e = c/a < 1 (taking a to be the semi-major axis length). An ellipse can be parameterized by x = h + a cos t, y = k + b sin t.

A hyperbola is defined as the locus of points in the plane such that the difference between the distances from every point on the hyperbola to two fixed points (the foci) is a given constant. Depending on its orientation, it has either the equation (x − h)²/a² − (y − k)²/b² = 1 (opening horizontally), with the foci located at (h ± c, k), or (y − k)²/b² − (x − h)²/a² = 1 (opening vertically), with the foci located at (h, k ± c). In either case, c = √(a² + b²), and the asymptotes are the lines y − k = ±(b/a)(x − h).
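These locus definitions are easy to sanity-check numerically. The following minimal Python sketch (with arbitrarily chosen sample values for h, k, a, and b) verifies that points on an ellipse keep a constant distance sum of 2a to the two foci:

import math

# Hypothetical ellipse parameters; foci at (h ± c, k) since a > b.
h, k, a, b = 1.0, -2.0, 5.0, 3.0
c = math.sqrt(a**2 - b**2)
f1, f2 = (h - c, k), (h + c, k)

for t in [0.0, 0.7, 1.9, 3.1, 5.0]:          # sample points x = h + a cos t, y = k + b sin t
    x, y = h + a * math.cos(t), k + b * math.sin(t)
    d = math.dist((x, y), f1) + math.dist((x, y), f2)
    assert abs(d - 2 * a) < 1e-9              # focal-distance sum equals 2a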

Polynomials
The rational roots theorem states that for a polynomial p_n(x) = Σ_{i=0}^{n} a_i x^i with each a_i ∈ ℤ, if there are any rational roots r ∈ ℚ, then they are of the form r = s/t, where s, t ∈ ℤ and s | a₀ and t | a_n. Complex and radical roots always come in conjugate pairs (for polynomials with rational coefficients).

For any polynomial p_n(x) = Σ_{i=0}^{n} a_i x^i = a_n Π_{j=1}^{n} (x − r_j), the sum and product of the roots are given by

Σ_{j=1}^{n} r_j = −a_{n−1}/a_n  and  Π_{j=1}^{n} r_j = (−1)^n a₀/a_n.
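As a quick numerical check of both results, consider the sample cubic p(x) = 2x³ − 3x² − 11x + 6, whose roots are 3, −2, and 1/2 (each rational root s/t indeed has s | 6 and t | 2); a minimal sketch:

coeffs = [2, -3, -11, 6]          # a_n, ..., a_0 for p(x) = 2x^3 - 3x^2 - 11x + 6
roots = [3.0, -2.0, 0.5]

a_n, a_n1, a_0 = coeffs[0], coeffs[1], coeffs[-1]
n = len(coeffs) - 1

assert abs(sum(roots) - (-a_n1 / a_n)) < 1e-12      # sum = -a_{n-1}/a_n = 1.5
prod = 1.0
for r in roots:
    prod *= r
assert abs(prod - (-1) ** n * a_0 / a_n) < 1e-12    # product = (-1)^n a_0/a_n = -3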

Logarithms
The basic identities of logarithms are the following:

• log_b(Π_{i=1}^{n} x_i) = Σ_{i=1}^{n} log_b x_i

• log_b(x₁/x₂) = log_b x₁ − log_b x₂

• log_b x^a = a log_b x

• b^(log_b x) = x

• (log_b a)(log_a x) = log_b x

Trigonometry
The sine function is odd, meaning that sin ( −θ ) = − sin θ , and the cosine function is even,

meaning that cos ( −θ ) = cos θ . Any trigonometric function is equal to its co-function

operating on its complement. In other words:

sin x = cos(π/2 − x), tan x = cot(π/2 − x), sec x = csc(π/2 − x), and vice versa.

θ       sin θ      cos θ       tan θ
0       0          1           0
π/6     1/2        (√3)/2      (√3)/3
π/4     (√2)/2     (√2)/2      1
π/3     (√3)/2     1/2         √3
π/2     1          0           ∞
2π/3    (√3)/2     −1/2        −√3
3π/4    (√2)/2     −(√2)/2     −1
5π/6    1/2        −(√3)/2     −(√3)/3
π       0          −1          0

Other values can be computed using the following identities:

tan x = sin x / cos x

cot x = cos x / sin x

csc x = 1 / sin x

sec x = 1 / cos x

sin²x + cos²x = sec²x − tan²x = csc²x − cot²x = 1

sin(x ± y) = sin x cos y ± cos x sin y ⇒ sin 2x = 2 sin x cos x

cos(x ± y) = cos x cos y ∓ sin x sin y ⇒ cos 2x = cos²x − sin²x = 1 − 2sin²x = 2cos²x − 1

tan(x ± y) = (tan x ± tan y) / (1 ∓ tan x tan y) ⇒ tan 2x = 2 tan x / (1 − tan²x)

sin(x/2) = ±√((1 − cos x)/2)

cos(x/2) = ±√((1 + cos x)/2)

tan(x/2) = sin x / (1 + cos x)

Hyperbolic Functions

sinh x = (e^x − e^(−x))/2

cosh x = (e^x + e^(−x))/2

Differential Calculus
Sequences

A sequence, formally defined as a function on the set of positive integers, is an infinite,

ordered list of terms. A sequence (x_n) approaches a limit L if for every ε > 0, there is an integer N such that |x_n − L| < ε whenever n > N. This is equivalent to challenging someone to name a positive number, no matter how small, and then producing a point in the sequence beyond which every term is within that distance of the limit. If such an L exists, the sequence is said to converge. If no such L exists, the sequence is said to diverge.

A sequence is monotonic if it is either strictly increasing or decreasing for every n

from some point on.

• Every convergent sequence is bounded, but the converse is not necessarily true (consider x_n = (−1)^n, which is bounded above and below but diverges).

• If a sequence is monotonic and bounded, it is convergent.

• If lim_{n→∞} a_n = A and k is a constant, lim_{n→∞} ka_n = kA.

• If lim_{n→∞} a_n = A and lim_{n→∞} b_n = B, then:

o (a_n + b_n) → A + B

o (a_n − b_n) → A − B

o (a_n b_n) → AB

o (a_n/b_n) → A/B, assuming B ≠ 0

• If k is a positive constant, then (1/n^k) → 0.

• If k > 1, then (1/k^n) → 0.

• Assume lim_{n→∞} a_n = lim_{n→∞} c_n = L. If (b_n) is a sequence such that a_n ≤ b_n ≤ c_n for every n > N, then lim_{n→∞} b_n = L. This is known as the sandwich theorem (or the squeeze theorem).

• If a_n = f(n), then (a_n) → L if lim_{x→∞} f(x) = L.

Functions

A function f : A → B is injective (or one-to-one) if for any x, y ∈ A, f(x) = f(y) ⇒ x = y. Equivalently, x ≠ y ⇒ f(x) ≠ f(y). In other words, f maps distinct objects to distinct objects. (Graphically, an injective function is intersected at most once by any horizontal line.) A function f : A → B is surjective (or onto) if for any b ∈ B there exists an a ∈ A such that f(a) = b. This means that every element of B is “spoken for” by at least one element in the domain. A function that is both injective and surjective is bijective.

Let f be a function defined on a subset of the real line. We seek to examine the behavior of f near a point a, which may not be included in its domain. If (x_n) is a sequence converging to a, all of whose terms are in the domain of f, and the sequence (f(x_n)) → L as (x_n) → a, we call L the limit of f as x approaches a. The limit of a function exists only if it is the same whether approached from above or below. A limit is also defined in the following way:

lim_{x→a} f(x) = L ⇔ ∀ε > 0, ∃δ > 0 : 0 < |x − a| < δ ⇒ |f(x) − L| < ε.

• If lim_{x→a} f(x) = L₁ and lim_{x→a} g(x) = L₂, then:

o lim_{x→a} [f(x) + g(x)] = L₁ + L₂

o lim_{x→a} [f(x) − g(x)] = L₁ − L₂

o lim_{x→a} [f(x)g(x)] = L₁L₂

o lim_{x→a} [f(x)/g(x)] = L₁/L₂ (assuming L₂ ≠ 0)

• Assume that lim_{x→a} f(x) = L and lim_{x→a} h(x) = L. If there is a positive number δ such that f(x) ≤ g(x) ≤ h(x) for all x satisfying 0 < |x − a| < δ, then lim_{x→a} g(x) = L. This is another version of the sandwich (or squeeze) theorem.

Continuity

A function f is continuous at a if lim_{x→a} f(x) = f(a). A function is continuous on an interval if it is continuous at every point in that interval.

• The Extreme Value Theorem: If f is a function continuous on a closed interval

[ a, b] , then f obtains an absolute minimum value, m, at some point c ∈ [ a, b] and

an absolute maximum value, M, at some point d ∈ [ a, b ] . That is, there exist

points c and d such that f (c ) ≤ f ( x ) ≤ f ( d ) for all x ∈ [ a, b ] .

• Bolzano’s Theorem: If f is a function continuous on the closed interval [a, b] such that f(a) and f(b) have opposite signs, then there’s a point c between a and b such that f(c) = 0. (The bisection sketch after this list turns this observation into a root-finding method.)

• The Intermediate Value Theorem: Let f be a function continuous on the closed

interval [ a, b] . Let m be the absolute minimum value and M be the absolute

maximum value of f on [ a, b] . For every number Y such that m ≤ Y ≤ M , there is

at least one value c ∈ [ a, b] such that f (c) = Y .
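As promised above, here is a minimal bisection sketch built directly on Bolzano’s theorem; the function and interval are arbitrary examples.

def bisect(f, a, b, tol=1e-10):
    # Bolzano: a sign change of a continuous f on [a, b] guarantees a root.
    if f(a) * f(b) > 0:
        raise ValueError("f(a) and f(b) must have opposite signs")
    while b - a > tol:
        m = (a + b) / 2
        # Keep the half-interval on which the sign change (and a root) persists.
        if f(a) * f(m) <= 0:
            b = m
        else:
            a = m
    return (a + b) / 2

root = bisect(lambda x: x**3 - 2, 1.0, 2.0)   # approximates 2**(1/3)
print(root)                                    # ~1.2599210498...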

Derivatives

Rules of Differentiation

• The derivative of a sum is the sum of the derivatives: ( f + g )′ ( x ) = f ′( x) + g ′( x)

• The derivative of a constant times a function: ( kf )′ ( x ) = kf ′( x)

• The product rule: ( fg )′ ( x ) = f ( x ) g ′( x ) + f ′( x) g ( x)

• The quotient rule: (f/g)′(x) = [g(x)f′(x) − f(x)g′(x)] / [g(x)]²

• The chain rule (for composite functions): (f ∘ u)′(x) = f′(u(x)) · u′(x)

• The inverse-function rule: If f⁻¹ is the inverse of f, and f has a nonzero derivative at x₀, then f⁻¹ has a derivative at y₀ = f(x₀) equal to (f⁻¹)′(y₀) = 1 / f′(x₀).

Common Derivatives

d(k) = 0
d(u^k) = k u^(k−1) du
d(e^u) = e^u du
d(a^u) = (ln a) a^u du
d(ln u) = (1/u) du
d(log_a u) = du / (u ln a)
d(sin u) = cos u du
d(cos u) = −sin u du
d(tan u) = sec²u du
d(cot u) = −csc²u du
d(sec u) = sec u tan u du
d(csc u) = −csc u cot u du
d(arcsin u) = du / √(1 − u²)
d(arccos u) = −du / √(1 − u²)
d(arctan u) = du / (1 + u²)

If f = k Π_{i=1}^{n} u_i^(α_i), then f′ = f · [Σ_{i=1}^{n} α_i u_i′/u_i].

Implicit Differentiation
For a function f(x, y) = c, we can define y′ = −f_x/f_y = −(∂f/∂x)/(∂f/∂y) when f_x and f_y exist and f_y ≠ 0.

Theorems Concerning Differentiable Functions

Rolle’s Theorem

Assume that f is a continuous function on the closed interval [ a, b] , with f ( a ) = f (b)

and f differentiable everywhere in ( a, b ) . Then there is at least one point c ∈ ( a, b ) such

that f ′(c) = 0 .

Mean Value Theorem (for Derivatives)

Assume that f is a continuous function on the closed interval [ a, b] and differentiable

everywhere in ( a, b ) . Then there is at least one point c ∈ ( a, b ) such that

(b − a ) f ′(c ) = f (b) − f ( a ) . This theorem states that there is at least one point between a

and b at which the slope of the tangent line is equal to the slope of the secant line through

( a, f ( a ) ) and ( b, f (b) ) .

Maxima and Minima

f′(c)   f″(c)   f⁽ⁿ⁾(c)*                    f(c)
0       > 0     N/A                         Local minimum
0       < 0     N/A                         Local maximum
0       0       f⁽ⁿ⁾(c) > 0, with n even    Local minimum
0       0       f⁽ⁿ⁾(c) < 0, with n even    Local maximum
0       0       Nonzero, with n odd         f(c) is not a local maximum or minimum

*Here n is the smallest integer such that the nth derivative is nonzero.

For a function defined on a closed interval, an absolute extremum will occur either at a

point where the first derivative vanishes, where the derivative is not defined, or at an

endpoint of the interval.

Integral Calculus

Common Integrals

∫ k du = ku + C

∫ u^k du = u^(k+1)/(k+1) + C (if k ≠ −1); ln|u| + C (if k = −1)

∫ e^u du = e^u + C

∫ a^u du = a^u/(ln a) + C

∫ sin u du = −cos u + C

∫ cos u du = sin u + C

∫ sec²u du = tan u + C

∫ csc²u du = −cot u + C

∫ sec u tan u du = sec u + C

∫ csc u cot u du = −csc u + C

∫ du/√(1 − u²) = arcsin u + C

∫ du/(1 + u²) = arctan u + C

Integration by Parts

∫ u dv = uv − ∫ v du
The object when applying this method is to choose u and dv so that the remaining integral on the right-hand side, ∫ v du, is easier to evaluate than the original.

Trigonometric Substitution

For integrands of a certain form containing square roots, it can be useful to make

substitutions that take advantage of trigonometric identities.

For an integrand containing . . .    Make the substitution . . .
√(a² − u²)                           u = a sin θ,  du = a cos θ dθ
√(a² + u²)                           u = a tan θ,  du = a sec²θ dθ
√(u² − a²)                           u = a sec θ,  du = a sec θ tan θ dθ

Partial Fractions

If an integrand is a rational function of the form P(x)/Q(x), with deg P < deg Q, we can

evaluate the integral by splitting the integrand into more manageable units, a process

called partial-fraction decomposition. The denominator is factored, and each factor

becomes a denominator of a new rational function on the right-hand side. Any denominator term of the form (ax + b)^n will be represented on the right by n fractions in ascending powers from 1 to n. The numerators on the right side are made up of single place-holding coefficients (typically A, B, C, etc.), which are to be solved for. In the case

that an irreducible quadratic occurs in the factored polynomial, the place-holding

coefficient in the numerator over that term would be replaced by a polynomial of the

form Bx + C . Multiplying everything out will work when solving for the coefficients,

but this could be quite a time-consuming series of steps. Since the resulting

decomposition will have to be true for all values of x in the domain, it is useful to

multiply by one term at a time and make substitutions that get the terms to cancel out

(namely, values of x that make terms equal zero—roots of the denominator polynomial).
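A computer algebra system can carry out the decomposition directly. A minimal sketch, assuming SymPy is available (the rational function is an arbitrary example):

import sympy as sp

x = sp.symbols('x')
expr = (3*x + 5) / ((x + 1) * (x - 2))
# apart() automates the place-holding-coefficient procedure described above.
print(sp.apart(expr, x))   # -> 11/(3*(x - 2)) - 2/(3*(x + 1))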

Theorems Concerning Integrable Functions

Differentiating Under the Integral Sign

d/dx ∫_{a(x)}^{b(x)} f(t) dt = f(b(x))·b′(x) − f(a(x))·a′(x)

Mean Value Theorem (for Integrals)

If f is a function continuous on the interval [a, b], then there is at least one point c ∈ [a, b] such that ∫_a^b f(x) dx = f(c)(b − a). Here we see that f(c) is the average value of the function.

Polar Coordinates

T: (x, y) → (r, θ), where T(x, y) = (√(x² + y²), arctan(y/x)). Polar representations are not unique, so this transformation is not universally true.

T: (r, θ) → (x, y), where T(r, θ) = (r cos θ, r sin θ).

Area in polar coordinates: A = ∫_α^β ½ r² dθ.

Area under (or within) a curve described parametrically:

If (x, y) = (x(t), y(t)) and the curve is traced out clockwise, beginning and ending at the same point (i.e., x(a) = x(b) and y(a) = y(b)), then A = ∫_a^b y(t) x′(t) dt = −∫_a^b x(t) y′(t) dt.

The sign is reversed if the curve is traced counterclockwise.
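A quick numerical check of this formula, using an ellipse traced counterclockwise (so the sign is reversed), recovers the familiar area πab; a minimal sketch:

import math

a, b, n = 3.0, 2.0, 100_000          # arbitrary semi-axes; x = a cos t, y = b sin t
dt = 2 * math.pi / n
area = 0.0
for i in range(n):
    t = i * dt
    x_prime = -a * math.sin(t)
    y = b * math.sin(t)
    area += -y * x_prime * dt        # counterclockwise: A = -∫ y(t) x'(t) dt
print(area, math.pi * a * b)         # both ≈ 18.8495...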

Volumes of Solids of Revolution

If the graph of a function f is rotated around a line, a solid is generated. As each point of the graph revolves about the axis, the segment joining it to the axis sweeps out a disk, so the value of f from point to point defines the radius of that disk. We can calculate the volume of the solid formed using the following formulas:

For a function f revolved about the x-axis: V = ∫ dV = ∫_a^b π[f(x)]² dx.

For a function g revolved about the y-axis: V = ∫ dV = ∫_c^d π[g(y)]² dy.

In the case of a solid being formed with a gap between the defining function and the axis

(in which each disk becomes a washer), that gap being defined by a second function, we

have: V = ∫_a^b π{[f(x)]² − [g(x)]²} dx.

Arc Length

For a function y = f ( x) , the length of any portion of the curve can be calculated as

follows, the decision being whether to integrate along the x-axis (where x runs from a to

b) or the y-axis (where y runs from c to d): s = ∫ ds = ∫_a^b √(1 + (dy/dx)²) dx = ∫_c^d √(1 + (dx/dy)²) dy.

Series

An arithmetic sequence is a list of numbers of the form (a_n) = a₁, a₁ + d, a₁ + 2d, …, a₁ + (n − 1)d, with each term differing from the one before by the addition of a constant d. The sum of a finite arithmetic series can be computed easily by the following formula: S_n = (n/2)(a₁ + a_n) = (n/2)(2a₁ + (n − 1)d), where it should be clear that the average of the first and last terms of the sequence is being multiplied by the number of members of the sequence.

A geometric sequence is a list of numbers of the form (a_n) = a₁, a₁r, a₁r², a₁r³, …, a₁r^(n−1), with each term differing from the one before by multiplication by a constant ratio r. The sum of a finite geometric series can be computed by the following formula: S_n = a₁(1 − r^n)/(1 − r) = (a₁ − r·a_n)/(1 − r), where r ≠ 1.

An infinite series converges if the sequence (s_n) of its partial sums converges to a finite limit S. An infinite geometric series converges if and only if |r| < 1, in which case Σ_{n=0}^{∞} r^n = 1/(1 − r). For an infinite series to converge, the terms must tend to zero as n tends to infinity. (The converse is not true, as illustrated by the harmonic series Σ_{n=1}^{∞} 1/n = ∞.)

• If Σa_n converges, then Σka_n converges to kΣa_n for every constant k. If Σa_n diverges, then so does Σka_n for every constant k ≠ 0.

• If Σa_n and Σb_n both converge, then Σ(a_n + b_n) converges to Σa_n + Σb_n.

• The so-called p-series Σ 1/n^p converges for every p > 1 and diverges for every p ≤ 1. (See the numerical sketch after this list.)

• The comparison test. Assume that 0 ≤ a_n ≤ b_n for all n > N. Then if Σb_n converges, Σa_n converges; if Σa_n diverges, then Σb_n diverges.

• The ratio test. Given a series Σa_n of nonnegative terms, form the limit lim_{n→∞} a_{n+1}/a_n = L. If L < 1, then Σa_n converges; if L > 1, then Σa_n diverges. The test is inconclusive if L = 1.

• The root test. Given a series Σa_n of nonnegative terms, form the limit lim_{n→∞} (a_n)^(1/n) = L. If L < 1, then Σa_n converges; if L > 1, then Σa_n diverges. The test is inconclusive if L = 1.

• The integral test. If f(x) is a positive, monotonically decreasing function for x ≥ 1 such that f(n) = a_n for every positive integer n, then Σ_{n=1}^{∞} a_n converges if and only if ∫_1^∞ f(x) dx converges.

• The alternating series test. A series of the form Σ_{n=1}^{∞} (−1)^(n+1) a_n (with a_n ≥ 0 for all n) will converge provided that the terms a_n decrease monotonically, with a limit of zero. Note that this is only necessarily true for alternating series.

• A series Σa_n converges absolutely if Σ|a_n| converges. Every absolutely convergent series is convergent. (If Σa_n converges but Σ|a_n| does not, Σa_n is said to converge conditionally.)
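Here is the sketch promised above: partial sums contrasting the divergent harmonic series (p = 1), which grows like ln n, with the convergent p = 2 series, which approaches π²/6 ≈ 1.644934.

for n in (10**2, 10**4, 10**6):
    harmonic = sum(1 / k for k in range(1, n + 1))      # p = 1: keeps climbing
    p2 = sum(1 / k**2 for k in range(1, n + 1))         # p = 2: stalls near pi^2/6
    print(n, round(harmonic, 4), round(p2, 6))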

A power series in x is an infinite series whose terms are an x n , where each an is a

constant. (Such a series could technically be considered an infinite-degree polynomial.)

The convergence tests mentioned above are useful for the analysis of power series, which

are only useful if they converge. By the ratio test, the series will converge absolutely if:

lim_{n→∞} |a_{n+1} x^(n+1)| / |a_n x^n| = |x| · lim_{n→∞} |a_{n+1}/a_n| < 1.

We set L = lim_{n→∞} |a_{n+1}/a_n|. If L = 0, then the power series converges for all x; if L = ∞, then the power series converges only for x = 0; and if L is positive and finite, then the power series converges absolutely for |x| < 1/L and diverges for |x| > 1/L. The set of all values of x for which the series converges is called the interval of convergence. Every power series in x falls into one of three categories:

1. The series converges for all x ∈ (−R, R) and diverges for all |x| > R, where R is called the radius of convergence and R = 1/L, where L is defined as above. (Whether the series converges at the endpoints of the interval must be checked on a case-by-case basis.)

2. The power series converges absolutely for all x; the interval of convergence is

( −∞, ∞ ) , and the radius of convergence is ∞ .


3. The series converges only for x = 0 , so the radius of convergence is 0.


Functions are frequently defined by power series. If f ( x) = ∑ an x n , the domain of f
n =0

must be a subset of the interval of convergence for the series.

Within the interval of convergence of the power series:

1. The function f is continuous, differentiable, and integrable.

2. The power series can be differentiated term by term:

f(x) = Σ_{n=0}^{∞} a_n x^n ⇒ f′(x) = Σ_{n=0}^{∞} d/dx(a_n x^n) = Σ_{n=1}^{∞} n a_n x^(n−1).

3. The power series can be integrated term by term:

f(x) = Σ_{n=0}^{∞} a_n x^n ⇒ ∫_0^x f(t) dt = Σ_{n=0}^{∞} (∫_0^x a_n t^n dt) = Σ_{n=0}^{∞} a_n x^(n+1)/(n + 1).

If a function f can be represented by a power series, that series is the Taylor series:

f(x) = Σ_{n=0}^{∞} f⁽ⁿ⁾(0)/n! · x^n.

Function     Partial Expansion             Series Form                             Interval of Convergence
1/(1 − x)    1 + x + x² + x³ + ⋯           Σ_{n=0}^{∞} x^n                         (−1, 1)
1/(1 + x)    1 − x + x² − x³ + ⋯           Σ_{n=0}^{∞} (−1)^n x^n                  (−1, 1)
ln(1 + x)    x − x²/2 + x³/3 − ⋯           Σ_{n=1}^{∞} (−1)^(n+1) x^n/n            (−1, 1]
e^x          1 + x + x²/2! + x³/3! + ⋯     Σ_{n=0}^{∞} x^n/n!                      (−∞, ∞)
sin x        x − x³/3! + x⁵/5! − ⋯         Σ_{n=0}^{∞} (−1)^n x^(2n+1)/(2n+1)!     (−∞, ∞)
cos x        1 − x²/2! + x⁴/4! − ⋯         Σ_{n=0}^{∞} (−1)^n x^(2n)/(2n)!         (−∞, ∞)

A Taylor series can be truncated to form a Taylor polynomial for use in approximating the value of a function. A Taylor polynomial of degree n, P_n(x) = Σ_{k=0}^{n} f⁽ᵏ⁾(0)/k! · x^k, will have an error, called the remainder, exactly equal to f(x) − P_n(x) = f⁽ⁿ⁺¹⁾(c)/(n + 1)! · x^(n+1) for some c between 0 and x.

The Taylor series mentioned above is a special case of a more general series in powers of (x − a), which has Taylor coefficients a_n = f⁽ⁿ⁾(a)/n!. The remainder of a Taylor polynomial built on such a series would be equal to f(x) − P_n(x) = f⁽ⁿ⁺¹⁾(c)/(n + 1)! · (x − a)^(n+1) for some c between a and x.
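A short numerical illustration: approximating e^x at x = 1 with a degree-6 Taylor polynomial and confirming the actual error sits below the remainder bound (for f = exp and 0 < c < x ≤ 1, f⁽ⁿ⁺¹⁾(c) = e^c ≤ e).

import math

x, n = 1.0, 6
P_n = sum(x**k / math.factorial(k) for k in range(n + 1))   # degree-6 Taylor polynomial
error = abs(math.exp(x) - P_n)
bound = math.e * x**(n + 1) / math.factorial(n + 1)          # Lagrange remainder bound
print(P_n, error, bound)    # error ≈ 2.26e-4, safely below bound ≈ 5.39e-4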

Multivariable Calculus and Vector Analysis

In three dimensions, it is often convenient to think in terms of vectors. Vectors are often represented by an ordered triple or a sum of vector components. Whereas an ordered triple could be regarded as a point in ℝ³, the vector it represents corresponds to the directed segment connecting the origin to this point. Component notation, which would represent the vector connecting the origin to the point (x, y, z) as xî + yĵ + zk̂, makes use of orthogonal unit vectors. Vectors are added and subtracted component-wise.

The norm of a vector is its magnitude, or length. For a vector A = x_a î + y_a ĵ + z_a k̂, ‖A‖ = A = √(x_a² + y_a² + z_a²).

There are two ways to multiply vectors. One, the dot (or scalar) product, results in a scalar. For two vectors, A = x_a î + y_a ĵ + z_a k̂ and B = x_b î + y_b ĵ + z_b k̂, the dot product is equal to A · B = ‖A‖‖B‖ cos θ = x_a x_b + y_a y_b + z_a z_b, where θ is the angle between the two vectors. Since the dot product can be computed component-wise, it is not necessary to know the angle between the vectors, and the dot product can be useful when the angle is sought. For any two vectors perpendicular to each other, the dot product is zero. The dot product is commutative: A · B = B · A.

The projection of B onto A, denoted projA B , which can be thought of as B’s

shadow cast on A by light shining perpendicular to A, is:

proj_A B = ((A · B)/(A · A)) A.

The cross (or vector) product is the other way to multiply vectors. The result is

a vector perpendicular to each of the vectors being multiplied, following the right-hand

rule. For two vectors A = xa ˆi + ya ˆj + za kˆ and B = xb ˆi + yb ˆj + zbkˆ the cross product is

equal to:

A × B = | î    ĵ    k̂   |
        | x_a  y_a  z_a |
        | x_b  y_b  z_b |,

and ‖A × B‖ = ‖A‖‖B‖ sin θ. This magnitude is equal to the area of the parallelogram spanned by A and B. The cross product is anticommutative: A × B = −(B × A).

The triple scalar product, the absolute value of which is equal to the volume of a

parallelepiped formed by three vectors A, B, and C, is equal to:

[A, B, C] = (A × B) · C = | x_a  y_a  z_a |
                          | x_b  y_b  z_b |
                          | x_c  y_c  z_c |.
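A minimal sketch of these products, assuming NumPy is available (the vectors are arbitrary examples), checking the geometric identities quoted above:

import numpy as np

A = np.array([1.0, 2.0, 2.0])           # |A| = 3
B = np.array([2.0, 0.0, 0.0])           # |B| = 2

dot = A @ B                              # = |A||B| cos θ -> 2.0
cross = np.cross(A, B)                   # perpendicular to both A and B
assert abs(cross @ A) < 1e-12 and abs(cross @ B) < 1e-12

C = np.array([0.0, 1.0, 3.0])
triple = np.cross(A, B) @ C              # (A × B) · C; |triple| = box volume
print(dot, cross, triple)                # 2.0 [ 0.  4. -4.] -8.0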

Lines

Just as a slope and a point define a line in ℝ², a point and a vector can define a line in ℝ³. Specifically, if we have a point, P₀ = (x₀, y₀, z₀), on a line L and a vector, v = (v₁, v₂, v₃), parallel to the line, we can say that any point P = (x, y, z) is on the line if and only if the vector connecting that point to P₀, P₀P = (x − x₀, y − y₀, z − z₀), is parallel to v, which is to say that (x − x₀, y − y₀, z − z₀) = tv = t(v₁, v₂, v₃) for some scalar t (note above the order of the points when subtracting). This is equivalent to three parametric equations:

L: x = x₀ + tv₁, y = y₀ + tv₂, z = z₀ + tv₃.

These can each be solved for t and equated to yield the symmetric equations of L:

(x − x₀)/v₁ = (y − y₀)/v₂ = (z − z₀)/v₃.

In the case that any component of v is equal to zero, the term containing that component

is eliminated from the equation.

Planes

Planes in ℝ³ are uniquely defined by a point, P₀ = (x₀, y₀, z₀), and a vector, n, normal to the plane. Since any line in the plane must be perpendicular to the normal vector, we have a clue as to the form that the equation of a plane must take. Specifically, the point P = (x, y, z) is on the plane if and only if P₀P · n = (x − x₀)n₁ + (y − y₀)n₂ + (z − z₀)n₃ = 0, which can be rewritten as n₁x + n₂y + n₃z = d, where d = n₁x₀ + n₂y₀ + n₃z₀.

Cylinders

We can take a curve C ⊂ ℝ² and extend it into ℝ³ as a cylinder (which appears to maintain this name even if C is not a closed curve). We do this by connecting to each point in C a line parallel to some given line v, referred to as the generator. In the case that this line is perpendicular to the plane of the curve, this extension is automatic and leaves the original equation unchanged. (Since the equation is in ℝ², it is missing the variable, say z, that would make it an equation in ℝ³. That missing variable can then take on all values once the curve is considered in ℝ³, and the curve is converted into a right cylinder.)

In the case that the line v = (v₁, v₂, v₃) is not perpendicular to the plane of the curve, we recall the parametric and symmetric equations of the line from above in order to generate our equation for the cylinder. Without loss of generality, we consider a curve in the x-y plane, y = f(x). We say that a point on this curve in ℝ³ is (x₀, y₀, 0), where we note that no relationship for z is described by the equation of the curve. Any line from a point on that curve will have to obey the following relationship:

x − x₀ = v₁t
y − y₀ = v₂t
z = v₃t.

We can solve these equations for t and equate them, getting:

(x − x₀)/v₁ = (y − y₀)/v₂ = z/v₃.

These equations can be solved for the “naught values,” one at a time, getting:

x₀ = x − (v₁/v₃)z
y₀ = y − (v₂/v₃)z.

Recall that these are the points on the curve, so the equation of the cylinder in ℝ³ becomes:

y − (v₂/v₃)z = f(x − (v₁/v₃)z).

Note the pattern in the naught values above. Note also that y = f ( x) was an arbitrary

choice. We could just as easily have had our function be f ( x, y ) = c or considered a

curve in the y-z plane or the x-z plane.

Surfaces of Revolution

A curve in ℝ² can be rotated about an axis to form a surface in ℝ³. Each point on the original curve will trace out a circle around the axis about which it’s rotating. The distance to that axis can be found by the distance formula.

This is the basis for the following substitutions, which are used to obtain

equations for the surface:

Curve         Revolved around    Replace                Obtaining
f(x, y) = 0   x-axis             y by ±√(y² + z²)       f(x, ±√(y² + z²)) = 0
              y-axis             x by ±√(x² + z²)       f(±√(x² + z²), y) = 0
f(x, z) = 0   x-axis             z by ±√(y² + z²)       f(x, ±√(y² + z²)) = 0
              z-axis             x by ±√(x² + y²)       f(±√(x² + y²), z) = 0
f(y, z) = 0   y-axis             z by ±√(x² + z²)       f(y, ±√(x² + z²)) = 0
              z-axis             y by ±√(x² + y²)       f(±√(x² + y²), z) = 0

Cylindrical and Spherical Coordinates

Cylindrical coordinates are an extension of polar coordinates to ℝ³ by adding the familiar z coordinate. The Cartesian coordinates (x, y, z) are transformed to (r, θ, z) = (√(x² + y²), arctan(y/x), z), where we repeat the warning about the non-uniqueness of the angle θ.

Spherical coordinates are closer in spirit to polar coordinates than even

cylindrical coordinates. They take the form ( ρ , φ ,θ ) , where ρ is the length of the radius

vector from the origin, φ is the angle the radius vector makes with the positive z-axis,

and θ is the angle between the positive x-axis and the projection (or shadow, as it may be

helpful to think) of the radius vector on the x-y plane, r.

The following relations hold:

ρ² = x² + y² + z²
x = r cos θ = ρ sin φ cos θ
y = r sin θ = ρ sin φ sin θ
z = ρ cos φ
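A minimal round-trip sketch of these relations (atan2 sidesteps the arctan quadrant ambiguity noted above):

import math

def to_spherical(x, y, z):
    rho = math.sqrt(x*x + y*y + z*z)
    phi = math.acos(z / rho)             # angle from the positive z-axis
    theta = math.atan2(y, x)             # angle of the shadow r in the x-y plane
    return rho, phi, theta

def to_cartesian(rho, phi, theta):
    return (rho * math.sin(phi) * math.cos(theta),
            rho * math.sin(phi) * math.sin(theta),
            rho * math.cos(phi))

p = (1.0, -2.0, 2.0)                      # arbitrary sample point
assert all(abs(u - v) < 1e-12 for u, v in zip(p, to_cartesian(*to_spherical(*p))))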

Tangent Plane to a Surface

For a function z = f ( x, y ) describing a surface, the equation of the plane tangent to that

surface at a point P = ( x0 , y0 , z0 ) is:

z − z₀ = (∂z/∂x)|_P · (x − x₀) + (∂z/∂y)|_P · (y − y₀).

In the case that the surface is described implicitly by f ( x, y , z ) = c , we find the above

partial derivatives by implicit differentiation. (There is another option as well, which

we’ll discuss in the section on directional derivatives.) Equations for tangent planes can

be used for linear approximations, analogous to tangent lines.

Chain Rule for Partial Derivatives

To use the chain rule for derivatives of multivariate functions, care must be taken to

account for every “path” to the variable of interest from the others. For example,

consider the function z = F ( x, u , v ) , where u = f ( x, y ) and v = g ( x, y ) . Let’s say that

we want to calculate the partial derivative of z with respect to x. We notice that there are

three paths to x, one directly from F and one each from u and v. We then have:

(∂z/∂x)_y = (∂z/∂x)_{u,v} + (∂z/∂u)_{x,v} (∂u/∂x)_y + (∂z/∂v)_{x,u} (∂v/∂x)_y.

The subscripts are a bit of bookkeeping, reminding us which variables are treated as

constants in the partial differentiation steps.

Directional Derivatives and the Gradient

The main differential operator on ℝ³ is del, denoted ∇, which is equal to:

∇ = (∂/∂x)î + (∂/∂y)ĵ + (∂/∂z)k̂.

A scalar function f can be converted into a vector called the gradient, and the del

operator can also operate on vectors through the divergence and curl, all of which will

be discussed later.

Consider a surface z = f ( x, y ) and a point P = ( x0 , y0 ) in the domain of f. If we

want to find the rate of change of the surface f, we must first give thought to the direction

we’re heading. (Think about standing on a rocky hill. Its rate of change is not uniform; it

will be different depending on whether we head forward or back, left or right.) The

directional derivative takes into account both the point we’re interested in and the

direction we’re heading, which we indicate with the unit vector û :

D_u f|_P = ∇f|_P · û.

Examining this equation and noting the employment of the dot product, we know that D_u f|_P = ‖∇f|_P‖ ‖û‖ cos θ, where θ is the angle between the unit vector and the gradient. Clearly, the norm of any unit vector is one, so this reduces to D_u f|_P = ‖∇f|_P‖ cos θ, which tells us that the directional derivative attains its maximum value when cos θ = 1, which is when θ = 0. This tells us that the gradient, ∇f, points in the direction in which f increases most rapidly, and its magnitude gives the maximum rate of increase.

Similarly, −∇f points in the direction in which f decreases most rapidly. Examining a level surface f(x, y, z) = c reveals that the directional derivative is zero along any direction tangent to that surface (f is constant there). This tells us that for a function f(x, y, z), the vector ∇f|_P is perpendicular to the level surface of f that contains P. Similarly, for a function f(x, y), the vector ∇f|_P is perpendicular to the level curve of f that contains P. This fact gives us another option for writing the equation of the tangent plane at P = (x₀, y₀, z₀):

f_x|_P · (x − x₀) + f_y|_P · (y − y₀) + f_z|_P · (z − z₀) = 0.
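A finite-difference sketch (for an arbitrary example function) confirming that the maximum directional derivative equals ‖∇f‖:

import math

def f(x, y):
    return x**2 + 3*x*y                  # hypothetical surface

def grad(x, y, h=1e-6):                  # numerical gradient
    fx = (f(x + h, y) - f(x - h, y)) / (2*h)
    fy = (f(x, y + h) - f(x, y - h)) / (2*h)
    return fx, fy

def directional(x, y, ux, uy, h=1e-6):   # D_u f at (x, y) for a unit vector (ux, uy)
    return (f(x + h*ux, y + h*uy) - f(x - h*ux, y - h*uy)) / (2*h)

x0, y0 = 1.0, 2.0
gx, gy = grad(x0, y0)                    # analytically (2x + 3y, 3x) = (8, 3)
best = max(directional(x0, y0, math.cos(t), math.sin(t))
           for t in [i * 2 * math.pi / 720 for i in range(720)])
print(math.hypot(gx, gy), best)          # both ≈ 8.544: max rate = ‖∇f‖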

For completeness, we’ll briefly mention the divergence and curl of a vector field.

The divergence, a scalar quantity, can be interpreted as the net flow of “density” out of a

unit of space. It is defined as follows:

div F = ∇ · F = ∂F₁/∂x + ∂F₂/∂y + ∂F₃/∂z.

The curl, a vector quantity, can be interpreted as the rotation of a vector field. It is equal

to the maximum circulation at each point and is oriented perpendicularly to the plane of

circulation, following the right-hand rule. It is defined as follows:

curl F = ∇ × F = | î     ĵ     k̂    |
                 | ∂/∂x  ∂/∂y  ∂/∂z |
                 | F₁    F₂    F₃   |.

Maxima and Minima

Just as with single-variable functions, multivariable functions attain critical points when

their derivatives equal zero. Critical points are also not necessarily maxima or minima in

multivariable functions. Saddle points, which are critical points that are neither maxima

nor minima, can exist for a function z = f ( x, y ) at a point P0 = ( x0 , y0 ) if z = f ( x, y0 )

has a maximum at x = x0 while z = f ( x0 , y ) has a minimum at y = y0 (or vice versa).

We can evaluate the possibilities by computing the Hessian, the determinant of the Hessian matrix:

Δ = det [ f_xx  f_xy ; f_yx  f_yy ]|_{P₀} = f_xx(P₀) f_yy(P₀) − [f_xy(P₀)]²,

where we make use of the fact that f_xy = f_yx for functions with continuous second partials. (A worked sketch follows the list below.)

• If ∆ > 0 and f xx ( P0 ) < 0 , then f attains a local maximum at P0 .

• If ∆ > 0 and f xx ( P0 ) > 0 , then f attains a local minimum at P0 .

• If ∆ < 0 , then f has a saddle point at P0 .

• If ∆ = 0 , then no conclusion can be drawn.
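As promised, a second-derivative-test sketch for the arbitrary example f(x, y) = x³ − 3x + y², whose critical points satisfy f_x = 3x² − 3 = 0 and f_y = 2y = 0 (so f_xx = 6x, f_yy = 2, f_xy = 0, giving Δ = 12x at (±1, 0)):

for x0, y0 in [(1.0, 0.0), (-1.0, 0.0)]:
    f_xx, f_yy, f_xy = 6 * x0, 2.0, 0.0
    delta = f_xx * f_yy - f_xy**2            # the Hessian determinant at (x0, y0)
    if delta > 0:
        kind = "local minimum" if f_xx > 0 else "local maximum"
    elif delta < 0:
        kind = "saddle point"
    else:
        kind = "inconclusive"
    print((x0, y0), kind)   # (1, 0): local minimum; (-1, 0): saddle point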

When solving maxima and minima problems subject to a constraint, the constraint

function is solved for one of the variables and substituted into the main function,

differentiated and equated to zero, and then solved as usual for critical points. These

points, along with the endpoints of any intervals given, must be tested to find the

extrema.

The Lagrange Multiplier

In some cases, the constraint equation will not be easy to solve for one of the variables.

In such a case (and in others), it can help to employ the method of the Lagrange

multiplier, which takes advantage of several interesting facts. Let’s say we’re looking to

maximize f ( x, y ) subject to a constraint g ( x, y ) = c . If M is an extreme value of f,

attained at the point P = ( x0 , y0 ) , then the level curve f ( x, y ) = M and the curve

g ( x, y ) = c share the same tangent line at P. Because they share the tangent line, they

also share the normal line at P. Since ∇f is normal to the curve f ( x, y ) = M and ∇g is

normal to the curve g ( x, y ) = c , the gradients must be parallel, which is to say that one is

a scalar multiple of the other. That is, ∇f = λ∇g for some scalar λ , called the Lagrange

multiplier, whose value is unimportant in itself, although we will solve for it. This

assumes, of course, that ∇g is not the zero vector.

The gradients are computed, and the resulting components on the left and right

side are equated and treated as simultaneous equations that are solved for λ to find the

necessary relationship between the independent variables. (The parameter λ disappears

at this step.) The new relationship is substituted into the original equation to be

maximized and solved for the remaining variable. Critical values are then obtained and

tested.
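A minimal symbolic sketch of the method, assuming SymPy is available: extremize the arbitrary example f = xy subject to g: x + y = 10 by solving ∇f = λ∇g together with the constraint (λ drops out of the final answer, as noted above).

import sympy as sp

x, y, lam = sp.symbols('x y lambda_')
f, g = x * y, x + y - 10

eqs = [sp.diff(f, x) - lam * sp.diff(g, x),    # y = λ
       sp.diff(f, y) - lam * sp.diff(g, y),    # x = λ
       g]                                      # constraint x + y = 10
print(sp.solve(eqs, [x, y, lam]))              # [(5, 5, 5)]: f attains 25 at (5, 5)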

Line Integrals

The standard definite integral computes the area between a curve and a straight-line

segment of one of the coordinate axes. But integration can follow more complicated

paths, giving rise to the line (or path or contour) integral.

Line Integrals with Respect to Arc Length

This is a close relative of the standard Riemann integral, in that the value of a function at

a point is multiplied by the length of the curve containing that point. Imagine, for

instance, a function f ( x, y ) and a curve C, which is defined parametrically. If we break

the curve into n tiny pieces, we have lim_{n→∞} Σ_{i=1}^{n} f(x_i, y_i) Δs_i = ∫_C f ds. Geometrically, this is equivalent to calculating the area of a curtain with C as its base and f(x, y) as its height

for each point ( x, y ) along C. The integral is evaluated by first parameterizing C:

C: x = x(t), y = y(t), for a ≤ t ≤ b.

The curve C is considered to be directed, meaning that it has a definite initial and final point. (This detail is similar to the idea that ∫_a^b f(x) dx = −∫_b^a f(x) dx.) Since for a differential treatment of the curve we have (ds)² = (dx)² + (dy)², we can write:

ds/dt = ±√((dx/dt)² + (dy/dt)²) = ±√([x′(t)]² + [y′(t)]²),

where the sign we choose depends on the direction we’re taking along our path. We then have ∫_C f ds = ∫_a^b f(x(t), y(t)) (ds/dt) dt.

The treatment in ℝ³ is identical, as ∫_C f ds = ∫_a^b f(x(t), y(t), z(t)) (ds/dt) dt, where

ds/dt = ±√((dx/dt)² + (dy/dt)² + (dz/dt)²) = ±√([x′(t)]² + [y′(t)]² + [z′(t)]²).

In ℝ³, we can interpret the line integral using the example of C as a curved wire with linear density f(x, y, z). The line integral of f over C would give the total mass of the wire.
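A minimal numerical sketch of an arc-length line integral on an arbitrary example: f(x, y) = x² over the quarter unit circle C: x = cos t, y = sin t, 0 ≤ t ≤ π/2 (where ds/dt = 1), whose exact value is π/4.

import math

n = 100_000
dt = (math.pi / 2) / n
total = 0.0
for i in range(n):
    t = (i + 0.5) * dt                                 # midpoint rule
    x = math.cos(t)
    ds_dt = math.hypot(-math.sin(t), math.cos(t))      # = 1 on the unit circle
    total += x**2 * ds_dt * dt
print(total, math.pi / 4)                              # both ≈ 0.785398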

The Line Integral of a Vector Field

A parametric curve given by the vector equation r = r (t ) = ( x(t ), y (t ) ) is considered

smooth if the derivative r ′(t ) is continuous and nonzero. A curve is considered

piecewise smooth if it is composed of finitely many smooth curves joined at consecutive

endpoints. A vector field is a vector-valued function. Let D be a region of the x-y plane

on which a pair of continuous functions, M ( x, y ) and N ( x, y ) , are both defined. Then

the function F( x, y ) = M ( x, y )ˆi + N ( x, y )ˆj = ( M ( x, y ), N ( x, y ) ) is a continuous vector

field on D. For a curve C parameterized by r (t ) = x(t )ˆi + y (t )ˆj = ( x(t ), y (t ) ) for a ≤ t ≤ b ,

we define the line integral of the vector field F as:

∫_C F · dr = ∫_a^b F(r(t)) · r′(t) dt
           = ∫_a^b F(x(t), y(t)) · (x′(t), y′(t)) dt
           = ∫_a^b [M(x(t), y(t)) x′(t) + N(x(t), y(t)) y′(t)] dt.

The situation in ℝ³ is defined analogously. If we consider F to be a force field, the line integral can be interpreted as the work done on a particle to move it along the path C.

In the case that F is equal to the gradient of a scalar field (i.e., a real-valued function), then it is a gradient field. A function f such that F = ∇f is called a potential for F. The fundamental theorem of calculus for line integrals says that the line integral of a gradient field depends only on the endpoints of the path and not the path itself. For any piecewise smooth curve C oriented from A to B and a continuously differentiable function f defined on C, we have ∫_C ∇f · dr = f(B) − f(A). For F(x, y) = M(x, y)î + N(x, y)ĵ = (M(x, y), N(x, y)) to be a gradient field, it is necessary (but not sufficient) that ∂M/∂y = ∂N/∂x. This simply states that f_xy = f_yx, which will be the case for functions with continuous mixed partials. If the domain of F, which we’ll call R, is a simply connected region, meaning that the interior of every simple closed curve drawn in R is contained in R, then the above mixed-partials equality condition is sufficient.

It should be clear that for a gradient field F, ∮_C F · dr = 0 for any closed path C (where the circle on the integral symbol indicates a closed path).

Double Integrals

When evaluating double integrals over a region R, it is helpful to examine the region to

select the limits of integration that provide for the easiest calculation of each iterated

integral. For example, when evaluating the integral ∫∫ f ( x, y) dx dy ,


R
an imaginary

horizontal line is placed over the region, and expressions for x in terms of y are derived.

An integral with respect to x will then be evaluated, and the y terms will be substituted in

as the limits of integration. An imaginary vertical line will determine the y limits, which

should be numerical (the outside integral should not contain variables). An integral with

respect to y will be evaluated between these limits, and a numerical result will be

obtained. The situation is exactly reversed if the differentials are switched in the integral.

In short:

∫∫_R f(x, y) dA = ∫_{y=c}^{y=d} ∫_{x=g(y)}^{x=h(y)} f(x, y) dx dy = ∫_{x=a}^{x=b} ∫_{y=G(x)}^{y=H(x)} f(x, y) dy dx

for some functions h, g, H, and G that describe the region R in terms of the variables

needed.

Double integrals are often evaluated in polar coordinates. The procedure is the

same as with Cartesian double integrals, except for one thing: The element of area

dA = r dr dθ . The reason for this becomes clear when one considers that these small

elements are “curvilinear rectangles,” the curved side of which will have arc length r dθ .

When evaluating an integral that corresponds to the volume of a region bounded above

by some function and bounded below by another function, the function providing the

upper bound is integrated, while the lower-bound function is treated as the region R.
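Numerically, an iterated integral with variable inner limits can be sketched with SciPy’s dblquad (assuming SciPy is available); here, the arbitrary example ∫∫_R xy dA over the triangle 0 ≤ y ≤ x ≤ 1:

from scipy.integrate import dblquad

# dblquad integrates func(y, x) with y as the inner variable; the inner
# y-limits may depend on x, while the outer x-limits are numerical.
val, err = dblquad(lambda y, x: x * y,   # integrand f(x, y); inner variable first
                   0, 1,                 # outer limits: x from 0 to 1
                   lambda x: 0,          # inner limit y = G(x) = 0
                   lambda x: x)          # inner limit y = H(x) = x
print(val)                               # exact value: 1/8 = 0.125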

Green’s Theorem

Green’s theorem is a powerful tool for dealing with integrals, since it relates double

integrals to line integrals, one of which might be easier to evaluate in a given situation.

First we define a simple closed curve, which is a curve that does not intersect itself.

Circles, ellipses, squares, and rectangles are examples of simple closed curves. A curve

is oriented, with the positive direction defined as the direction that, if walked, would put

the interior of the region bounded by the curve on your left.

Consider a simple closed curve C enclosing a region R, such that C is the

boundary of R. If M ( x, y ) and N ( x, y ) are functions that are defined and have

continuous partial derivatives on C and throughout R, Green’s theorem states:

∮_C M dx + N dy = ∫∫_R (∂N/∂x − ∂M/∂y) dA.

A clever selection of M(x, y) = −y and N(x, y) = x gives us the interesting result ∫∫_R dA = ½ ∮_C (x dy − y dx) by Green’s theorem. (Evaluating the integral with either M or N set to zero while leaving the other unchanged will give the area of the region R, hence the halving of the result above. That is, ∮_C x dy = −∮_C y dx = A_R.)

Differential Equations

Separable Equations

A differential equation of the form dy/dx = f(x)/g(y) is called separable, since its variables can be separated out. Its solution is straightforward: ∫ g(y) dy = ∫ f(x) dx.

Homogeneous Equations

A function of two variables, f ( x, y ) , is said to be homogeneous of degree n if there is a

constant n such that f (tx, ty ) = t n f ( x, y ) for all t, x, and y for which both sides are

defined. A differential equation M ( x, y ) dx + N ( x, y ) dy = 0 is homogeneous if M and N

are both homogeneous functions of the same degree. Homogeneous equations are

soluble by introducing the substitution y = vx , where v is considered as a function of x.

This substitution makes the differential equation separable after algebraic manipulation.

Exact Equations

Given a function f ( x, y ) , its total differential is defined as:

df = (∂f/∂x) dx + (∂f/∂y) dy.

This tells us that the family of curves f ( x, y ) = c satisfies the differential equation

df = 0. If there exists a function f(x, y) such that ∂f/∂x = M(x, y) and ∂f/∂y = N(x, y), then M(x, y) dx + N(x, y) dy = 0 is called an exact differential, and it has the general solution f(x, y) = c. An equation is exact if ∂M/∂y = ∂N/∂x, which we recognize as the now-familiar equating of the mixed partials, f_xy = f_yx.

Nonexact Equations and Integrating Factors

A differential equation that is not exact can be made exact by being multiplied through by

an integrating factor, µ , which is a function in one of the variables of the differential

equation. Any nonexact differential equation that has a solution also has an integrating

factor, even if that factor is not always particularly easy to find. There are, however,

some cases for which the process of finding it is straightforward.

Given a nonexact differential equation M ( x, y ) dx + N ( x, y ) dy = 0 , for which

M y − N x ≠ 0 , we have the following two cases:

1. If (M_y − N_x)/N is a function of x alone, call this function ξ(x). Then µ(x) = exp(∫ ξ(x) dx) is an integrating factor.

2. If (M_y − N_x)/(−M) is a function of y alone, call this function ψ(y). Then µ(y) = exp(∫ ψ(y) dy) is an integrating factor.

First-Order Linear Equations

A first-order linear equation is defined as a differential equation of the form:

dy/dx + P(x) y = Q(x).

Equations of this type are soluble through the use of an integrating factor, µ(x) = exp(∫ P(x) dx). Multiplying through yields µ(x) dy/dx + µ(x)P(x) y = µ(x)Q(x). The left-hand side becomes (µy)′, and the equation becomes d/dx(µy) = µQ. Integrating both sides gives the general solution:

y = (1/µ) ∫ (µQ) dx.
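A worked instance of the integrating-factor procedure: for y′ + y = x, µ(x) = e^x, so (e^x y)′ = x e^x and y = x − 1 + Ce^(−x). A minimal symbolic check, assuming SymPy is available:

import sympy as sp

x = sp.symbols('x')
y = sp.Function('y')
sol = sp.dsolve(sp.Eq(y(x).diff(x) + y(x), x), y(x))
print(sol)    # Eq(y(x), C1*exp(-x) + x - 1), matching the hand computation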

Higher-Order Linear Equations with Constant Coefficients

The general second-order linear differential equation with constant coefficients is

ay′′ + by ′ + cy = d ( x) .

In the case that d ( x) = 0 , the equation is said to be homogeneous (the definition here,

which is more common, differs from that of the other type of homogeneous equation).

The general solution of any nonhomogeneous equation is y = yh + y p , where yh is the

general solution of the corresponding homogeneous equation, and y p is any particular

solution of the nonhomogeneous equation.

The homogeneous equation is solved by converting it into its corresponding

auxiliary polynomial, in which the nth derivative of y becomes the nth power of m:

ay″ + by′ + cy = 0 → am² + bm + c = 0.

This equation is solved for its roots, which form the basis of the solution of the

differential equation according to the following three cases:

1. The roots are real and distinct: The general solution is y = c₁e^(m₁x) + c₂e^(m₂x).

2. The roots are real and identical: The general solution is y = c₁e^(m₁x) + c₂xe^(m₁x).

3. The roots are complex conjugates, m = α ± βi: The general solution is y = e^(αx)(c₁ cos βx + c₂ sin βx).

Linear Algebra

Matrices

An m × n matrix is an array of numbers corresponding to a group of m row vectors and n

column vectors. For a matrix A, the aij entry is in row i and column j. Matrices of the

same size can be added and subtracted element by element. That is, for matrices A and B,

( A ± B )ij = aij ± bij . Clearly, matrix addition and subtraction are commutative. Matrices

can be multiplied by scalars, yielding a new matrix in which each element has been

multiplied by that scalar.

Two matrices can be multiplied together only if the number of columns of the first

matrix equals the number of rows of the second. An m × n matrix multiplied by an n × p

matrix yields an m × p matrix. Matrix multiplication is defined such that

( AB )ij = ri ( A) ⋅ c j ( B) , where ri ( A) denotes the ith row vector of the matrix A and c j ( B )

denotes the jth column vector of the matrix B.

The identity matrix I has entries defined by the Kronecker delta:

δ_ij = 1 if i = j, 0 if i ≠ j.

An inverse of a matrix A is another matrix A−1 such that the product AA−1 = A−1 A = I . A

matrix that has an inverse is called invertible. There are several methods for finding an

inverse matrix, which we’ll discuss shortly.

An augmented matrix is a coefficient matrix with the column vector of the

constants from the right-hand side of the simultaneous equations appended on the right.

An augmented matrix is solved by putting it in echelon form, in which the matrix is

upper triangular with any zero rows at the bottom, with the first nonzero entry in any row

appearing to the right of any nonzero entry in the row above. There are three elementary

row operations that are used to reduce a matrix to echelon form:

1. Multiplying a row by a nonzero constant.

2. Interchanging two rows.

3. Adding a multiple of one row to another row.

Variables are solved for by working from the bottom up of the reduced matrix, a

procedure called back-substitution. In order to obtain a unique solution for

simultaneous equations in n unknowns, n equations are required. (It is also required that

they be linearly independent, a concept we’ll visit shortly.) In the case that infinitely

many solutions to a system are possible, it may be the case that one or more of the

variables are free, in which case they can be represented by parameters. The other

variables can be solved for in terms of the parameters.

Any system of linear equations can be written as Ax = b , where A is the system’s

coefficient matrix, x is the column vector of the unknowns, and b is the column vector of

the constant terms to the right of the equal sign. In the case that A is invertible, the

system can be solved using the relation A−1 Ax = A−1b ⇒ x = A−1b .
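The elimination-plus-back-substitution procedure just described is easy to sketch in code; a minimal version (assuming a unique solution exists):

def solve(A, b):
    n = len(A)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]   # augmented matrix
    for i in range(n):
        p = max(range(i, n), key=lambda r: abs(M[r][i]))
        M[i], M[p] = M[p], M[i]                      # row interchange (operation 2)
        for r in range(i + 1, n):
            m = M[r][i] / M[i][i]
            for c in range(i, n + 1):                # add a multiple of a row (operation 3)
                M[r][c] -= m * M[i][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):                   # back-substitution, bottom up
        s = sum(M[i][c] * x[c] for c in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x

print(solve([[2.0, 1.0], [1.0, 3.0]], [5.0, 10.0]))  # [1.0, 3.0]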

Vector Spaces

A vector space V is a set that is closed under the two operations addition and scalar

multiplication. This means that for any x1 , x 2 ∈V , x1 + x 2 ∈V and kx1 , kx 2 ∈ V for any

scalar k. It should be evident from these conditions that the zero vector is required to be

in any vector space.

If A is an m × n matrix, the set of all vectors x such that Ax = 0 is called the nullspace of A, denoted N(A). The nullspace is a subspace of ℝⁿ. It should be apparent that an invertible matrix will only have the trivial subspace, the zero vector, as its nullspace.

Consider a collection of n-vectors, v₁, v₂, …, v_m. Any expression of the form Σ_{i=1}^{m} k_i v_i, where the k_i represent scalars, is called a linear combination of the vectors v_i. The set of all possible linear combinations of the vectors v_i is called their span. For example, since every vector in ℝ³ can be expressed as a linear combination of î, ĵ, and k̂, their span is ℝ³. The span of any collection of n-vectors is a subspace of ℝⁿ. The vectors v_i are called linearly independent if Σ_{i=1}^{m} k_i v_i = 0 ⇒ k_i = 0 for all i. This means that no combination of some vectors cancels other vectors out, or that no vector can be written as a linear combination of the others. If this is not the case, the vectors are said to be linearly dependent. A minimal spanning set of vectors is called a basis for the vector space they span. The number of vectors in the basis is the dimension of the vector space.

Let {a₁, a₂, …, a_n} be a set of n-vectors in ℝⁿ, and let A be an n × n matrix such that {a₁, a₂, …, a_n} form the columns of A. The vector set {a₁, a₂, …, a_n} is linearly independent if and only if det A ≠ 0, which is equivalent to saying that A is nonsingular.

In an m × n matrix, the columns can be regarded as m-vectors, and the rows can be regarded as n-vectors. The maximum number of linearly independent columns is called the column rank, and the maximum number of linearly independent rows is called the row rank. In any matrix, the column rank is equal to the row rank; this is simply the rank of the matrix. The subspace of ℝᵐ spanned by the columns is called the column space, CS(A), and the subspace of ℝⁿ spanned by the rows is called the row space, RS(A). A vector b is in CS(A) if and only if there is a collection of scalars k_i such that Σ_{i=1}^{n} k_i c_i = b, where the c_i are the column vectors. This requires that Ak = b have a solution for k = (k₁, k₂, …, k_n). The row space of A is best considered as CS(Aᵀ).



Determinants

The determinant is only defined for square matrices. For an n × n matrix A, we first define the minor, M_ij, as the determinant of the (n − 1) × (n − 1) matrix that results from eliminating row i and column j from matrix A. The cofactor of any matrix entry a_ij is cof(a_ij) = (−1)^(i+j) M_ij. The determinant of any square matrix A can be computed by the Laplace expansion:

det A = Σ_{j=1}^{n} a_ij cof(a_ij) = Σ_{i=1}^{n} a_ij cof(a_ij).

This tells us that determinants can be evaluated along any row or column. This fact

makes it clear that any matrix with a column or row made up of only zeros will have a

zero determinant.
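The Laplace expansion translates directly into a recursive routine; a minimal sketch, expanding along the first row (it does O(n!) work, so practical code uses elimination instead):

def det(A):
    n = len(A)
    if n == 1:
        return A[0][0]
    total = 0
    for j in range(n):
        minor = [row[:j] + row[j+1:] for row in A[1:]]   # delete row 0 and column j
        total += (-1) ** j * A[0][j] * det(minor)        # a_0j * cof(a_0j)
    return total

print(det([[1, 2, 3],
           [4, 5, 6],
           [7, 8, 10]]))   # -3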

The adjugate matrix of A is the transpose of the cofactor matrix of A. That is, Adj A = [cof(a_ij)]ᵀ. This matrix is important in calculating the inverse of a matrix, since

A⁻¹ = (1/det A) Adj A.

Cramer’s rule allows a square linear system Ax = b to be solved exclusively by

determinants. Let A j denote the matrix formed by replacing column j of A with the

column vector b. Then Cramer’s rule says:

x_j = det A_j / det A.

The Wronskian can be helpful when evaluating functions for linear independence

in a vector space. For functions f₁, f₂, …, f_n of x, we define the Wronskian as the determinant

W[f₁, f₂, …, f_n](x) = | f₁(x)        f₂(x)        ⋯   f_n(x)        |
                       | f₁′(x)       f₂′(x)       ⋯   f_n′(x)       |
                       | ⋮            ⋮                 ⋮            |
                       | f₁⁽ⁿ⁻¹⁾(x)   f₂⁽ⁿ⁻¹⁾(x)   ⋯   f_n⁽ⁿ⁻¹⁾(x)   |.

If the Wronskian is nonzero, the functions are linearly independent. A zero Wronskian,

however, does not guarantee linear dependence.

Linear Transformations

For vector spaces V and W, a linear transformation (or linear map) is a function T : V → W such that T(x₁ + x₂) = T(x₁) + T(x₂) and T(kx) = kT(x) for any vectors x, x₁, and x₂ in V and any scalar k. If W = V, then T is called a linear operator. There is a connection to matrices in that, for an m × n matrix A, defining a function T : ℝⁿ → ℝᵐ by T(x) = Ax gives a linear transformation. Conversely, if T : ℝⁿ → ℝᵐ is a linear transformation, there exists an m × n matrix A such that T(x) = Ax for all x in ℝⁿ. In the case that ℝⁿ and ℝᵐ are considered with their standard bases, B = {ê_i}, in which there is a 1 in position i and 0 elsewhere, column i of A is simply the image of ê_i. In this case, A is called the standard matrix for T.

Let T : ℝⁿ → ℝᵐ be a linear transformation. The set of all vectors x in ℝⁿ that get mapped to 0 is called the kernel of T: ker T = {x : T(x) = 0}. It is a subspace of the domain space, and its dimension is called the nullity of T. The set of all images of T is called the range of T: R(T) = {T(x) : x ∈ ℝⁿ}. This is a subspace of ℝᵐ, and its dimension is called the rank of T. If T is given by T(x) = Ax, then the kernel of T is the same as the nullspace of A, and the range of T is the same as the column space of A. The rank plus nullity theorem states that the sum of the nullity and rank equals the dimension of the domain. So for T above, dim(ker T) + dim R(T) = n.

Let T : ℝⁿ → ℝⁿ be a linear operator. If T is bijective, then T has an inverse, T⁻¹, defined so that T⁻¹(y) = x if T(x) = y. If A is the matrix representation of T, then A⁻¹ is the representative of T⁻¹. Further, if the matrix A represents the transformation S : ℝᵐ → ℝⁿ and B represents the transformation T : ℝⁿ → ℝᵖ, then the matrix product BA represents the composition T ∘ S.

Eigenvalues and Eigenvectors

In general, multiplying a square matrix A by a compatible nonzero vector x does not

produce a scalar multiple of x. However, if it happens that Ax = λ x for some scalar λ ,

then λ is called an eigenvalue (or characteristic value) of A, and the nonzero vector x

is called an eigenvector (or characteristic vector). Eigenvalues and eigenvectors are

defined only for square matrices. To find these values, we must find scalars and vectors

that satisfy the equation Ax = λ x , which can be rewritten as ( A − λ I ) x = 0 . This will

only be nontrivially soluble if A − λI is noninvertible, which means that

det ( A − λ I ) = 0 . Calculation of this determinant produces the characteristic equation.

For 2 × 2 square matrices, this is a quadratic that must be solved for λ . Once the

eigenvalues are found, each is substituted into the equation ( A − λ I ) x = 0 and solved for

the corresponding eigenvector (which may be a family of vectors due to possible free

variables). It is an interesting property that ∑ λ = tr( A) , where tr( A) is the trace of the

matrix, the sum of its diagonal entries. Also, ∏ λ = det A . (In both cases, it is

understood that the sum and product are over all eigenvalues for A.)

The eigenspace of A corresponding to λ , Eλ ( A) = {x : Ax = λ x} , is the collection

of all eigenvectors corresponding to λ along with the zero vector. Eλ ( A) = N ( A − λ I ) .

The Cayley-Hamilton theorem states that every square matrix satisfies its own

characteristic equation. This is useful for expressing an integer power of a matrix A as a linear polynomial in A. That is, if det(A − λI) = Σ_{i=0}^{n} α_i λ^i = 0, then Σ_{i=0}^{n} α_i A^i = 0, where we consider A⁰ = I.
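A minimal numerical sketch for an arbitrary 2 × 2 example, assuming NumPy is available: the eigenvalues come from λ² − tr(A)λ + det(A) = 0, and the trace/determinant and Cayley-Hamilton identities all check out.

import numpy as np

A = np.array([[4.0, 1.0],
              [2.0, 3.0]])
eigvals = np.linalg.eigvals(A)                          # [5, 2]
assert abs(eigvals.sum() - np.trace(A)) < 1e-9          # Σλ = tr(A) = 7
assert abs(eigvals.prod() - np.linalg.det(A)) < 1e-9    # Πλ = det(A) = 10

# Cayley-Hamilton for the 2x2 case: A² - tr(A)·A + det(A)·I = 0
CH = A @ A - np.trace(A) * A + np.linalg.det(A) * np.eye(2)
assert np.allclose(CH, 0)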

Number Theory

If a and b are positive integers, the division algorithm says that we can find unique

integers q and r such that b = qa + r , with 0 ≤ r < a . This algorithm is employed in

sequence to find the greatest common divisor of two integers in the Euclidean

algorithm: Given two numbers, a and b (where we’ll assume a > b ), we know we can

find a quotient and remainder q1 and r1 such that a = q1b + r1 . We then apply the division

algorithm to the divisor and the first remainder: b = q2 r1 + r2 . We continue this process

until there is no remainder (i.e., r_{n−2} = q_n r_{n−1} + r_n, where r_n = 0). The divisor in the step

that yields no remainder is the greatest common divisor, or gcd, of the two numbers. If

gcd( a, b) = 1 , then a and b are said to be relatively prime. The product of the greatest

common divisor and the least common multiple of two numbers is equal to the product of

the two numbers: [ gcd(a, b)] ⋅ [ lcm(a, b)] = ab .

The Diophantine equation ax + by = c has integral solutions if and only if

gcd(a, b) | c, in which case there are infinitely many. Any one solution (x₁, y₁) ∈ ℤ² reveals the others:

x = x₁ + (b / gcd(a, b))·t

y = y₁ − (a / gcd(a, b))·t

for any t ∈ ℤ.

The greatest common divisor of any two numbers can always be written as a

linear combination of those two numbers. This is done by working backward through the

Euclidean algorithm and substituting in steps that lead to the original numbers.
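Working backward through the Euclidean algorithm is exactly what the extended Euclidean algorithm automates; a minimal recursive sketch in Python:

    def extended_gcd(a, b):
        # Returns (g, x, y) with a*x + b*y = g = gcd(a, b).
        if b == 0:
            return a, 1, 0
        g, x, y = extended_gcd(b, a % b)
        return g, y, x - (a // b) * y

    g, x, y = extended_gcd(252, 105)
    assert g == 21 and 252 * x + 105 * y == 21

The pair (x, y) also supplies one solution of the Diophantine equation ax + by = c whenever g | c: scale (x, y) by c/g and generate the rest with the parameterization above.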

Congruence

For integers a, b, and n, we say that a is congruent to b modulo n if a − b is divisible by

n. That is, a ≡ b mod n if n | ( a − b) .

• If ab ≡ ac mod n, then

  o b ≡ c mod n if gcd(a, n) = 1

  o b ≡ c mod (n / gcd(a, n)) if gcd(a, n) > 1

• Fermat's little theorem states that if p is a prime and a is an integer:

  o a^(p−1) ≡ 1 mod p if p does not divide a

  o a^p ≡ a mod p for any integer a

The linear congruence equation, ax ≡ b mod n, has a solution if and only if gcd(a, n)

divides b. If gcd(a, n) = 1, the solution is unique mod n; if gcd(a, n) > 1, the solution is

unique mod n / gcd(a, n).
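A linear congruence solver follows directly from the extended Euclidean algorithm sketched earlier (Python; extended_gcd as defined above):

    def solve_congruence(a, b, n):
        # Solve a·x ≡ b (mod n); returns (x, modulus) or None if unsolvable.
        g, x0, _ = extended_gcd(a, n)
        if b % g != 0:
            return None                      # solvable only when gcd(a, n) | b
        m = n // g
        return (x0 * (b // g)) % m, m        # unique solution mod n/gcd(a, n)

    assert solve_congruence(6, 4, 10) == (4, 5)   # 6·4 = 24 ≡ 4 (mod 10)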

Abstract Algebra

Binary Structures and Groups

Let S be a nonempty set. A function f : S × S → S defined on every ordered pair of

elements of S to give a result that is also in S is called a binary operation on S. Binary

operations are typically written showing the set and the operation. In the case of f above,

we might write (S, ∗).

A binary operation, ∗, is said to be associative if, for every a, b, c ∈ S, the

following equation always holds: a∗(b∗c) = (a∗b)∗c. A binary structure whose binary

operation is associative is called a semigroup.

Given a binary structure, (S, ∗), an element e ∈ S with the property a∗e = e∗a = a

for every a ∈ S is called the identity. A semigroup with an identity is called a monoid.

Let (S, ∗) be a monoid, and let a be an element in S. If there is an element a⁻¹ ∈ S

such that a∗a⁻¹ = a⁻¹∗a = e, we call a⁻¹ the inverse of a. A monoid with the property that

every element in S has an inverse is called a group.

If the binary operation of a group (S, ∗) has the property that a∗b = b∗a for every

a, b ∈ S, the binary operation is commutative. A semigroup, monoid, or group whose

binary operation is commutative is said to be abelian.

If the group (S, ∗) contains precisely n elements for some positive integer n, then

the group is finite of order n. Otherwise, we say the group is infinite.

For any group, the inverse of a product reverses the order: (x∗y)⁻¹ = y⁻¹∗x⁻¹.

Cyclic Groups

A group G with the property that there exists an element a ∈ G such that

G = {a^n : n ∈ ℤ} is said to be cyclic, and the element a is called a generator of

the group. (We clarify here that a⁰ is the identity, a¹ = a, a² = a∗a, etc., and

a^(−n) = (a⁻¹)^n.) A cyclic group has at least one generator.

Consider the set ℤ_n = {0, 1, 2, …, n − 1} and the group (ℤ_n, ⊕), the group whose

binary operation is addition modulo n. The integer m is a generator of (ℤ_n, ⊕) if and

only if m is relatively prime to n. In more general terms, let G be a cyclic group with

generator a, and let n be the smallest positive integer such that a^n = e. Then the element a^m is a

generator of G if and only if m is relatively prime to n.
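A brute-force Python sketch confirms that the generators of (ℤ_n, ⊕) are exactly the residues coprime to n (n = 12 is an arbitrary example):

    from math import gcd

    def generators_mod_n(n):
        # m generates (Z_n, ⊕) iff its multiples hit every residue class.
        return [m for m in range(1, n)
                if len({(m * k) % n for k in range(n)}) == n]

    n = 12
    assert generators_mod_n(n) == [m for m in range(1, n) if gcd(m, n) == 1]
    # -> [1, 5, 7, 11]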

Subgroups

Let (G, ∗) be a group. If there's a subset H ⊆ G such that (H, ∗) is also a group, then H

is a subgroup of G, and we write H ≤ G. (If H ≠ G, we can write H < G to denote H

as a proper subgroup of G.) Every group has at least two subgroups: the trivial

subgroup, {e} , consisting of just the identity element, and the group itself.

Let a be an element of a group G. With the binary operation defined on G, the set

{a^n : n ∈ ℤ}, also denoted by ⟨a⟩, is a subgroup of G called the cyclic subgroup

generated by a. It consists of all the integer powers of a. In the case that n is a negative

integer, we consider a⁻¹ to be the inverse of a, so a^(−n) = a⁻¹∗a⁻¹∗…∗a⁻¹ (n times).

We can define the order of a group element as follows: the order of a ∈ G is the

order of ⟨a⟩, the cyclic subgroup generated by a. Equivalently, the order of an element

is the smallest positive integer n such that a^n = e, where e is the identity. The cyclic subgroup

generated by a is the smallest subgroup of G containing a. More generally, if a_i ∈ G for

every i in some indexing set I, then the subgroup generated by {a_i} is the subgroup

consisting of all finite products of terms of the form a_i^(n_i) and is the smallest subgroup of G

containing all the elements a_i. If the subgroup is all of G, then we say that G is

generated by {a_i} and that the elements a_i are generators of G.

• Lagrange's theorem: Let G be a finite group. If H is a subgroup of G, then

the order of H divides the order of G.

• Let G be a finite, abelian group of order n. Then G has at least one subgroup

of order d for every positive divisor d of n.

• Let G be a finite, cyclic group of order n. Then G has exactly one subgroup—

a cyclic subgroup—of order d for every positive divisor d of n. If G is

generated by a, then the subgroup generated by the element b = a^m has order

d = n / gcd(m, n). (If m = 0, we say that gcd(m, n) = n.)

• Cauchy's theorem: Let G be a finite group of order n, and let p be a prime

that divides n. Then G has at least one subgroup of order p.

• Sylow's first theorem: Let G be a finite group of order n, and let n = p^k·m,

where p is a prime that does not divide m. Then G has at least one subgroup of

order p^i for every i ∈ [0, k] ⊂ ℤ.

Isomorphisms

Consider the multiplication table for the three-element group below:

∗ e a b

e e a b

a a b e

b b e a

We compare that to the table for the group ℤ₃:

⊕ 0 1 2

0 0 1 2

1 1 2 0

2 2 0 1

We notice that, although the symbols are different, the structure is exactly the same.

Structurally identical groups are said to be isomorphic. We denote an isomorphism

between two groups G1 and G2 by G1 ≅ G2 . Isomorphic groups share identical structural

properties, so if we know details about one group and also know that it is isomorphic to

another group, we know the other group’s details automatically. These details include

the order of the group, the number of subgroups of a particular order, whether it is cyclic

or abelian, etc.

The Classification of Finite Abelian Groups

Let (G₁, ∗₁) and (G₂, ∗₂) be groups. On the set G₁ × G₂ = {(a, b) : a ∈ G₁ ∧ b ∈ G₂}, define

a binary operation, ∗, such that (a₁, a₂)∗(b₁, b₂) = (a₁ ∗₁ b₁, a₂ ∗₂ b₂). Then (G₁ × G₂, ∗) is a

group, called the direct product of the groups G₁ and G₂. If G₁ and G₂ are finite, and

G₁ has order m and G₂ has order n, then G₁ × G₂ has order mn. If G₁ and G₂ are both

abelian, then the notation G₁ ⊕ G₂ is sometimes used, and the resulting abelian group is

called the direct sum of G₁ and G₂. This definition can be generalized to any number of

groups, not just two.

• The direct sum ℤ_{m_1} ⊕ ℤ_{m_2} ⊕ … ⊕ ℤ_{m_k} is cyclic if and only if gcd(m_i, m_j) = 1

for every distinct pair m_i and m_j, in which case ℤ_{m_1} ⊕ ℤ_{m_2} ⊕ … ⊕ ℤ_{m_k} is

isomorphic to ℤ_{m_1·m_2·⋯·m_k}.

• Every finite abelian group G is isomorphic to a direct sum of the form

ℤ_{(p_1)^{k_1}} ⊕ ℤ_{(p_2)^{k_2}} ⊕ … ⊕ ℤ_{(p_r)^{k_r}}, where the p_i are (not necessarily distinct) primes,

and the k_i are (not necessarily distinct) positive integers. The collection of

prime powers, (p_i)^{k_i}, for a given representation of G are known as the

elementary divisors of G.

• Every finite abelian group G is isomorphic to a direct sum of the form

ℤ_{m_1} ⊕ ℤ_{m_2} ⊕ … ⊕ ℤ_{m_t}, where m_1 ≥ 2 and m_i | m_{i+1} for every i = 1, …, t − 1. The

integers m_i are not necessarily distinct, but the list m_1, …, m_t is unique. These

integers are called the invariant factors of G.
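For example, up to isomorphism the abelian groups of order 12 are ℤ₄ ⊕ ℤ₃ ≅ ℤ₁₂ (elementary divisors 4, 3; invariant factor 12) and ℤ₂ ⊕ ℤ₂ ⊕ ℤ₃ ≅ ℤ₂ ⊕ ℤ₆ (elementary divisors 2, 2, 3; invariant factors 2, 6).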

Group Homomorphisms

Let (G, ∘) and (G′, ∗) be groups. A function φ : G → G′ with the property that

φ(a ∘ b) = φ(a) ∗ φ(b) for all elements a, b ∈ G is called a group homomorphism. An

injective (one-to-one) homomorphism is called a monomorphism; a surjective (onto)

homomorphism is called an epimorphism; a bijective (one-to-one and onto)

homomorphism is called an isomorphism. We recall the earlier fact that two groups that

are structurally identical are isomorphic; this is true if and only if there exists a bijective

homomorphism between the two groups. A homomorphism from a group to itself is

called an endomorphism, and an isomorphism from a group to itself is called an

automorphism.

For any group homomorphism φ : G → G ′ :

• If e is the identity in G, then φ (e) is the identity in G′ .

• If g ∈ G has finite order m, then the order of φ(g) ∈ G′ divides m.

• If a −1 is the inverse of a in G, then φ (a −1 ) is the inverse of φ (a ) in G′ .

• If H is a subgroup of G, then φ ( H ) is a subgroup of G′ , where

φ ( H ) = {φ (h) : h ∈ H } .

• If G is finite, then the order of φ (G ) divides the order of G; if G′ is finite, then

the order of φ (G ) also divides the order of G′ .

• If H ′ is a subgroup of G′ , then φ −1 ( H ′) is a subgroup of G, where

φ −1 ( H ′) = {h ∈ G : φ (h) ∈ H ′} .

• If φ : G → G ′ is a homomorphism of groups, then {e′} , where e′ is the identity in

G′ , is the trivial subgroup of G′ . The inverse image of {e′} is a subgroup of G.

This subgroup is the kernel of φ , which is defined by ker φ = { g ∈ G : φ ( g ) = e′} .

A homomorphism is a monomorphism if and only if its kernel is trivial.

Rings

A set R, together with two binary operations (we'll choose addition, +, and multiplication,

·, here), is called a ring if the following conditions are satisfied:

• (R, +) is an abelian group;

• (R, ·) is a semigroup;

• The distributive laws hold; namely, for every a, b, c ∈ R, we have:

a·(b + c) = a·b + a·c and (a + b)·c = a·c + b·c.

If the multiplicative semigroup (R, ·) is a monoid (that is, if it has a multiplicative

identity), then R is called a ring with unity. If the operation of multiplication is

commutative, then R is a commutative ring. For S ⊆ R satisfying the ring

requirements, we call (S, +, ·) a subring of (R, +, ·). The characteristic of a ring is the

smallest positive integer n such that na = 0 for every a ∈ R. If no such n exists, as in the case of

the infinite rings ℤ, ℚ, ℝ, and ℂ, the ring is said to have characteristic zero. For cases

when char R > 0, it is sufficient to check for the smallest n such that n·1 = 0.

Ring Homomorphisms

Let ( R , +, × ) and ( R′, ⊕, ⊗ ) be rings. A function φ : R → R′ is called a ring

homomorphism if both of the following conditions hold for all a, b ∈ R :

φ(a + b) = φ(a) ⊕ φ(b)

φ(a × b) = φ(a) ⊗ φ(b).

For any ring homomorphism φ : R → R′ :

• The kernel of a ring homomorphism is the set ker φ = {a ∈ R : φ (a) = 0′} , where 0′

is the additive identity in R ′ . The kernel of a ring homomorphism φ : R → R′ is a

subring of R.

• The image of R, φ(R) = {φ(r) : r ∈ R}, is a subring of R′.

• The image of 0, the additive identity in R, must be 0′ , the additive identity in R ′ .

It follows from this that φ ( − r ) = −φ (r ) for all r ∈ R , where −r is the additive

inverse of r in R.

Fields

Let a be a nonzero element of a ring R with unity. Recall that the multiplicative structure

(R, ·) is not required to be a group, so a may not have an inverse. If it does have a

multiplicative inverse, then a is called a unit. If every nonzero element of R is a unit—

namely, if (R*, ·) is a group, where R* = R − {0}—then R is called a division ring. A

commutative division ring is called a field.

Additional Topics

Set Theory

• The symmetric difference of two sets is written as A∆B = ( A − B ) ∪ ( B − A) .

• DeMorgan's laws say that:

(A ∩ B)^C = A^C ∪ B^C

(A ∪ B)^C = A^C ∩ B^C

• Other useful identities include:

(A ∪ B) − C = (A − C) ∪ (B − C)

(A ∩ B) − C = (A − C) ∩ (B − C)

A ∩ (B ∪ C) = (A ∩ B) ∪ (A ∩ C)

A ∪ (B ∩ C) = (A ∪ B) ∩ (A ∪ C)

Two sets are equivalent if there exists a bijection between them. If there exists a

bijection between a set and the positive integers, ℤ⁺, we say that the set is countably

infinite with cardinality ℵ₀. Some sets are uncountably infinite, meaning that no

bijection exists between them and the positive integers. The cardinality of the reals is

called the cardinality of the continuum, denoted sometimes as 2^ℵ₀, since ℝ ≈ ℘(ℤ⁺).

(The power set of A, ℘(A), is the set of all subsets of A; it has cardinality 2^card A.)

Combinatorics

For k objects, there are k! possible arrangements of their order, each called a permutation.

For k objects chosen from n total objects, there are P(n, k) possible arrangements, where

P(n, k) = n! / (n − k)!.

If order is not important (as in the drawing of a lottery or a hand of cards), the

possibilities are referred to as combinations, and there are C (n, k ) possibilities, where

C(n, k) = P(n, k) / k! = n! / [(n − k)! k!].

We note that the combination C (n, k ) is equal to the binomial coefficient, which comes

from the binomial theorem:

(a + b)^n = Σ_{k=0}^n C(n, k) a^(n−k) b^k.

When repetition is allowed, P_r(n, k) = n^k, and C_r(n, k) = C(n + k − 1, k).
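These counting formulas are all available in (or easily built from) Python's math module; a quick sketch with the arbitrary values n = 10, k = 3:

    from math import comb, factorial, perm

    n, k = 10, 3
    assert perm(n, k) == factorial(n) // factorial(n - k)   # P(n, k) = 720
    assert comb(n, k) == perm(n, k) // factorial(k)         # C(n, k) = 120
    assert n ** k == 1000                                   # P_r(n, k)
    assert comb(n + k - 1, k) == 220                        # C_r(n, k)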

A generalized statement of the pigeonhole principle says that if you are given n

different objects, each of which is painted one of c different colors, then for any integer

k ≤ ⌊(n − 1)/c⌋ + 1, there are at least k objects painted the same color.

Probability and Statistics

Let S be a nonempty set. A Boolean algebra (or simply an algebra) of sets on S is a

nonempty subfamily of the power set of S, E ⊆℘( S ) , that satisfies the following two

conditions:

1. If A and B are sets in E , then so are A ∪ B and A ∩ B .

2. If A ∈ E, then so is A^C = S − A.

By definition, this set is nonempty, so it contains some set A, and by the second condition

listed above, it also contains the complement of A. By the first condition above, it also

contains the union of these sets, which means it contains all of S (and, necessarily, the

empty set).

Let E be an algebra of sets on S. A function P : E → [ 0,1] is called a probability

measure on E (or S if E is understood) if all of the following conditions are met:

1. P (∅ ) = 0 .

2. P ( S ) = 1 .

3. P ( A ∪ B ) = P ( A) + P ( B ) for every pair of disjoint sets A and B in E .

A probability space is a set S together with an algebra of sets on S and a probability

measure on S. The set S is called the sample space, the elements of S are called

outcomes, and the sets in E (which are subsets of S) are called events. With this in

mind, P ( A) is interpreted to be the probability that event A occurs. Two events, A and B,

are considered mutually exclusive if P ( A ∩ B ) = 0 .

• P ( A ∪ B ) = P ( A) + P ( B ) − P ( A ∩ B ) .

• A and B are independent if and only if P(A ∩ B) = P(A)·P(B).

A Bernoulli trial is an experiment in which there are only two possible outcomes,

often termed a success or a failure. The probability of exactly k successes in n such trials

is given by an application of the binomial theorem:

P(k; n, p) = C(n, k) p^k q^(n−k),

where p is the probability of a success on any given trial and q = 1 − p is the probability

of failure. The distribution of all possibilities for k ∈ [0, n] ⊂ ℤ is called the binomial

distribution.

Let (S, E, P) be a probability space. A function X : S → ℝ is called a random

variable. To each outcome ω ∈ S, the function assigns some real number, X(ω). If we

consider the subset {ω : X (ω ) ≤ t} ⊂ S and assume it to be a member of the family E ,

this subset is an event. We can associate a function to the random variable such that

FX (t ) = P ({ω : X (ω ) ≤ t}) , and we call FX the distribution function of X. We can

abbreviate this by writing FX (t ) = P ( X ≤ t ) . This function gives the probability that the

random variable will take on a value no greater than t. We can also calculate the

probability that the random variable will be in a certain interval, ( t1 , t2 ] , by

P(t1 < X ≤ t2 ) = P( X ≤ t2 ) − P ( X ≤ t1 ) = FX (t2 ) − FX (t1 ) . This fact is arrived at by noting

that the events X ≤ t1 and t1 < X ≤ t2 are disjoint and considering their union, X ≤ t2 . An

algebraic rearrangement of the probabilities gives the expression above. To close or open

the interval we're considering, we need only to add or subtract (respectively) the

probability at the endpoint (e.g., P(t₁ ≤ X ≤ t₂) = P(X ≤ t₂) − P(X ≤ t₁) + P(X = t₁)).

Random variables are often defined so that the probability that X will equal any

particular value is zero. Meaningful results only come from considering an interval that

X can fall into. Such a random variable (and its distribution function) is called

continuous. The derivative of the distribution function, f_X(t) = F_X′(t), is called the

probability density function of X. We require this function to be nonnegative and

integrable, so that F_X(t₂) − F_X(t₁) = ∫_{t₁}^{t₂} f_X(t) dt. It is also required that

∫_{−∞}^{∞} f_X(t) dt = 1. We also have the following result:

P(X ≤ t) = F_X(t) = ∫_{−∞}^{t} f_X(x) dx.

If X is a continuous random variable, we can calculate the expectation (or mean)

of X by:

E(X) = µ(X) = ∫_{−∞}^{∞} t·f_X(t) dt.

The variance of X is given by:

Var(X) = σ²(X) = ∫_{−∞}^{∞} [t − µ(X)]² f_X(t) dt.

The standard deviation is equal to the square root of the variance.

A random variable is said to be normally distributed if its probability density

function has the form:

f_X(t) = (1 / (σ√(2π))) · exp(−(t − µ)² / (2σ²)).

This density has no elementary antiderivative and must be integrated numerically; it is often useful to consult tables for the

values of interest. These tables often make use of a change of variable: u = (t − µ)/σ.

When the mean is set to zero and the standard deviation is set to one, we get the

standard normal probability density for the standardized normal random variable Z:

f_Z(u) = (1 / √(2π)) · exp(−u² / 2),

which is related to the original random variable by the equation:

P(t₁ < X ≤ t₂) = P((t₁ − µ)/σ < Z ≤ (t₂ − µ)/σ).

The integral of f Z (u ) gives the standard normal probability distribution, which is

commonly denoted by Φ :

Φ(z) = ∫_{−∞}^{z} (1/√(2π)) e^{−u²/2} du.

For two extended real numbers z1 < z2 , we have:

P( z1 < Z ≤ z2 ) = Φ( z2 ) − Φ ( z1 ) .

A brief table of values is given below.

z 0 0.5 1 1.5 2 2.5 3+

Φ(z) 0.5 0.69 0.84 0.93 0.97 0.99 ≈1

(For negative values of z, we note that Φ ( − z ) = 1 − Φ ( z ) .)

Returning briefly to the binomial distribution, we can compute

P(a₁ ≤ X ≤ a₂) = Σ_{k=a₁}^{a₂} C(n, k) p^k q^(n−k),

but this can be unwieldy for large n. With µ = np and σ = √(npq), we can obtain an

approximation of the binomial distribution with a normal distribution, where

P(a₁ ≤ X ≤ a₂) ≈ Φ((a₂ − µ + ½)/σ) − Φ((a₁ − µ − ½)/σ).
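A comparison of the exact binomial sum against the continuity-corrected normal approximation (a sketch using only the Python standard library; NormalDist supplies Φ, and the parameters are arbitrary):

    from math import comb, sqrt
    from statistics import NormalDist

    n, p, a1, a2 = 100, 0.5, 45, 55
    q = 1 - p

    exact = sum(comb(n, k) * p**k * q**(n - k) for k in range(a1, a2 + 1))

    mu, sigma = n * p, sqrt(n * p * q)
    Phi = NormalDist().cdf
    approx = Phi((a2 - mu + 0.5) / sigma) - Phi((a1 - mu - 0.5) / sigma)

    print(round(exact, 4), round(approx, 4))   # both ≈ 0.7287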

Point-Set Topology

Let X be a nonempty set. A topology on X, denoted by T , is a family of subsets of X,

for which the following three properties always hold:

1. ∅ and X are in T .

2. If O1 and O2 are in T , then so is their intersection, O1 ∩ O2 .

3. If {O_i}_{i∈I} is any collection of sets from T, then their union, ∪_{i∈I} O_i, is also in T.

In sum, a topology is always closed under finite intersections and arbitrary unions. The

sets in T are known as open sets. A set is open if all of its elements are interior points,

which means that every point p ∈ O is contained in an open set S_p that is itself

contained in O. That is, p ∈ S_p ⊆ O. A point p is called an accumulation point of

a set A if every open set G containing p contains a point of A different from p.

A set X together with a topology T is called a topological space, ( X , T) . A

Hausdorff space is a topological space such that for every pair of distinct points

x, y ∈ X there exist disjoint open sets Ox and Oy such that x ∈ Ox and y ∈ Oy .

For any nonempty set X, the collection {∅, X } is always a topology on X, called

the indiscrete (or trivial) topology. The power set of X, ℘( X ) , is also always a

topology on X, and it is referred to as the discrete topology. If T1 and T2 are topologies

on X , we say that T1 is finer than T2 (or that T2 is coarser than T1 ) if T1 ⊇ T2 . We

know that ℘( X ) ⊃ {∅, X } , and we say that these collections represent the extremes:

℘(X) is the finest possible topology on X, and {∅, X} is the coarsest possible topology

on X. For T₁ ⊇ T₂, we can also write T₁ ≥ T₂ to indicate fineness and coarseness.

Subspace Topology

If (X, T) is a topological space, we can use T to define a topology on a subset

S ⊂ X. A subset U ⊂ S is said to be open in S if U = O ∩ S for some set O that is open

in X. It is important to point out that U need not be open in X for this definition to apply.

Consider the sets O = (−b, b) and S = [−a, a], both in ℝ, with a < b. Then

U = O ∩ S = S = [−a, a] is said to be open in S, since O is open in ℝ, even though U is

not open in ℝ. In this way, we can define the subspace (or relative) topology, T_S, with

S a subspace of X. Written more compactly, T_S = {S ∩ U : U ∈ T}.

Interior, Exterior, Boundary, Limit Points, and Closure

Let (X, T) be a topological space, and let A be a subset—not necessarily open—

of X. The interior of A, denoted int(A), is the union of all open sets contained in A, or,

equivalently, the largest open set contained in A. The exterior of A, denoted ext(A), is

the union of all open sets that do not intersect A; equivalently, we can say that

ext(A) = int(A^C). The boundary of A, denoted bd(A), is the set of all x ∈ X such that

every open set containing x intersects both A and the complement of A; equivalently,

bd(A) = (int(A) ∪ ext(A))^C. A limit point of A is an accumulation point, which is

defined above. The set of all the limit points of A is called the derived set of A, denoted

A′. The closure of A, denoted cl(A), can be defined in two equivalent ways:

cl(A) = int(A) ∪ bd(A) = A ∪ A′. A perhaps more useful definition says that the closure

of A is the intersection of all closed supersets of A, that is, the intersection of the

complements of all open sets that are disjoint from A.

A set is closed if and only if it contains all of its boundary and limit points;

equivalently, A is closed if A = cl( A) . Also, a set is closed if and only if its complement

is open; a set is open if and only if its complement is closed. A set A is open if and only

if A = int( A) . It is possible for sets to be simultaneously open and closed. (Consider the

discrete topology, made up of the power set, which contains every possible subset, all of

which are considered open. But since the complement of every open subset is also

contained in the power set, every subset is both open and closed.)
Basis for a Topology

Let X be a nonempty set, and let B be a collection of subsets of X satisfying the

following properties:

1. For every x ∈ X , there is at least one set B ∈B such that x ∈ B .

2. If B1 and B2 are sets in B and x ∈ B1 ∩ B2 , then there exists a set B3 ∈ B such that

x ∈ B3 ⊆ B1 ∩ B2 .

The collection B is called a basis, and the sets in B are known as basis elements. A

basis is used to generate a topology in that the topology consists of all possible unions of

the basis elements. A subset O ⊂ X belongs to the topology T generated by B if, for

every x ∈ O , there exists a basis element B such that x ∈ B ⊆ O .

If ( X , TX ) and (Y , TY ) are topological spaces, we can define a product topology

on the Cartesian product X ×Y by setting up the following standard basis:

B = {OX × OY : OX ∈ TX ∧ OY ∈ TY } .

Connectedness

Let (X, T) be a topological space. If there exist disjoint, nonempty open sets O₁

and O₂ such that O₁ ∪ O₂ = X, then X is said to be disconnected. If, however, T

contains no pair of disjoint, nonempty sets whose union is X, then X is said to be a connected space.

The following criteria can be used to identify connected spaces:

1. If A and B are connected and they intersect, then their union is also connected.

This holds for any number of connected sets, as long as their intersection is

nonempty.

2. Let A be a connected set, and let B be any set such that A ⊆ B ⊆ cl( A) . Then B is

connected.

3. The Cartesian product of connected spaces is connected.

4. Let X be a topological space with the property that any two points, x1 , x2 ∈ X can

be joined by a continuous path. Then X is said to be path-connected, and any

path-connected space is connected. The converse of this statement is not true.

Compactness
Let ( X , T ) be a topological space. A covering of X is a collection of subsets of X

whose union is X. An open covering is a covering consisting entirely of open sets. If

every open covering of X contains a finite sub-collection that also covers X, then X is said

to be a compact space. Compact spaces can be identified using the following criteria:

1. Let X be a compact topological space, and let S be a subset of X. If S is closed,

then it’s compact. The converse is true if X is Hausdorff.

2. The Cartesian product of compact spaces is compact.


3. The Heine-Borel theorem: A subset of ℝ^n is compact if and only if it's both

closed and bounded.


(A subset A ⊂ ℝ^n is bounded if there exists some positive number M such that the norm

of every point x ∈ A is less than M. That is, for x = (x₁, x₂, …, x_n),

‖x‖ = √((x₁)² + (x₂)² + … + (x_n)²) < M.)

Metric Spaces

Let X be a nonempty set, and let d : X × X → ℝ be a real-valued function defined

on ordered pairs of points in X. The function d is said to be a metric on X if the

following properties hold for all x, y , z ∈ X :

1. d ( x, y ) ≥ 0 , and d ( x, y ) = 0 if and only if x = y .

2. d ( x, y ) = d ( y , x) .

3. d ( x, z ) ≤ d ( x, y ) + d ( y , z ) .

We call d ( x, y ) the distance between x and y. A set X together with a metric on X is

called a metric space. For ε > 0 , the set Bd ( x, ε ) = { x′ ∈ X : d ( x, x′) < ε } is called an ε-

ball, the open ball of radius ε centered on x. The collection of all ε-balls,

B = { Bd ( x, ε ) : x ∈ X ∧ ε > 0} , is a basis for a topology on X, called the metric topology

(induced by d). In this topology, a subset O ⊂ X is open if and only if, for every x ∈ O,

there exists a positive number ε_x such that B_d(x, ε_x) ⊆ O, which says that every point in

the set must be the center of an ε-ball.

Continuity

We recall the definition of a continuous function, which says that a function f is

continuous at the point x0 if, for every ε > 0 , there exists a number δ > 0 such that

|x − x₀| < δ ⇒ |f(x) − f(x₀)| < ε. We can generalize this to metric spaces by saying the

following: Let ( X1 , d1 ) and ( X 2 , d2 ) be metric spaces. A function f : X 1 → X 2 is

continuous at the point x0 if, for every ε > 0 , there exists a number δ > 0 such that

d1 ( x, x0 ) < δ ⇒ d 2 ( f ( x), f ( x0 ) ) < ε . The function f is simply called continuous if it is

continuous at every x0 ∈ X 1 . We can also say that f : X 1 → X 2 is continuous at the point

x0 if for every open set O containing f ( x0 ) , the inverse image f −1 (O ) is an open set

containing x₀. This is equivalent to saying f⁻¹(B_{d₂}(f(x₀), ε)) ⊇ B_{d₁}(x₀, δ) for suitable

ε , δ > 0 . Finally, for topological spaces ( X 1 , T1 ) and ( X 2 , T2 ) , a function f : X 1 → X 2

is continuous if, for every open set O ⊆ X₂ (i.e., for every O ∈ T₂), the inverse image,

f −1 (O ) is open in X 1 (i.e., f −1 (O ) ∈ T1 ). Moreover, f is continuous at the point x0 if for

every open set O ⊂ X 2 containing f ( x0 ) there is an open set O1 ⊂ X 1 such that

f(O₁) ⊆ O. If the range of f is a topological space generated by a basis, it is sufficient to

check that the inverse images of the basis elements are open in X₁. The following are

some facts about continuous maps between topological spaces, f : ( X 1 , T1 ) → ( X 2 , T2 ) :

1. The set f −1 (C ) is closed in X 1 for every closed subset C ⊂ X 2 . (In fact, this is

the case if and only if the map f is continuous.)

2. If C is a connected subset of X 1 , then f (C ) is a connected subset of X 2 .

3. If C is a compact subset of X 1 , then f (C ) is a compact subset of X 2 .

4. Let f : X → ℝ be a continuous function, with X a compact space. Then there

exist points a, b ∈ X such that f(a) ≤ f(x) ≤ f(b) for every x ∈ X. Any real-

valued function defined on a compact space—particularly a closed, bounded

subset of ℝ^n—achieves an absolute maximum at some point in X and an absolute

minimum at some point in X.

A map f : X 1 → X 2 is said to be an open map if the image of every open set in X 1 is

open in X 2 . (Note the difference between this and the definition of continuity.) If

f : X 1 → X 2 is bijective and both f and f −1 are continuous (meaning that f is a

continuous, open map), then f is called a homeomorphism. This is analogous to the

concept of an isomorphism between groups or rings. That is, if an isomorphism exists

between two algebraic structures, that structure is preserved, and they are algebraically

identical. Homeomorphic topological spaces are topologically identical, since their

topological structure is preserved. For example, a homeomorphism (g ∘ f)(x) exists

between the open interval (0, 1) and ℝ, since

(0, 1) → (−π/2, π/2) → ℝ under f(x) = π(x − ½) followed by g(y) = tan y.

Let f : X 1 → X 2 be a bijective, continuous map of topological spaces. If X 1 is

compact and X 2 is Hausdorff, then f is a homeomorphism.

Real Analysis

Consider a set of real numbers X ⊂ ℝ. If X is bounded above, there is a smallest number

u such that u ≥ x for every x ∈ X . This number is the least upper bound of X, also

known as the supremum or sup X . Similarly, if X is bounded below, there is a greatest

number l such that l ≤ x for every x ∈ X . This number is called the greatest lower

bound, also known as the infimum of X or inf X . The supremum and infimum are

unique for any bounded set of real numbers.

A Cauchy sequence is a sequence of numbers, (x_n)_{n=1}^∞, such that for any ε > 0,

no matter how small, there exists a number N such that, for m, n > N, |x_n − x_m| < ε. This

means that the elements of a Cauchy sequence eventually grow arbitrarily close to each

other. Every Cauchy sequence of real numbers converges. Any metric space in which

every Cauchy sequence is guaranteed to converge to a point in the space is called a

every Cauchy sequence is guaranteed to converge to a point in the space is called a

complete space. By this definition, ℝ is complete.

Lebesgue Measure

Let A be any subset of ℝ, and find a countable, open covering of A by intervals

of the form (a_i, b_i), so that A ⊆ ∪_{i=1}^∞ (a_i, b_i). We define a function

µ* : ℘(ℝ) → ℝ ∪ {∞} by the equation:

µ*(A) = inf{ Σ_{i=1}^∞ (b_i − a_i) : A ⊆ ∪_{i=1}^∞ (a_i, b_i) },

where we permit µ*(A) = ∞. We now restrict this function to a subfamily, M ⊂ ℘(ℝ),

where M is defined as follows: a subset M ⊂ ℝ is a member of M if and only if

µ*(A) = µ*(A ∩ M) + µ*(A ∩ M^C) for every A ∈ ℘(ℝ). The sets in M are called

Lebesgue measurable sets, and the restriction of µ* to M is denoted by µ. This

function is called the Lebesgue measure. A set M for which µ(M) = 0 is said to be of

measure zero.

Every open and closed set in ℝ is Lebesgue measurable, and every finite or

countably infinite subset of ℝ is also measurable. The complement of a measurable set

is measurable, and a finite or countably infinite union or intersection of measurable sets is

measurable. Not all of ℘(ℝ) is measurable, but almost all subsets of ℝ that arise in

practice are measurable. Some properties of the measure and measurable sets follow:

1. The empty set is measurable and has measure zero. If M = {m} is a one-element

(or singleton) set, then µ(M) = 0. It follows from this that if M is a finite or

countably infinite subset of ℝ, then µ(M) = 0. For instance, µ(ℤ) = µ(ℚ) = 0.

2. If M = ( a, b ) , [ a, b ) , ( a, b ] , or [ a, b] , then the measure is just the length of the

interval, µ ( M ) = b − a . If M is a finite union of disjoint intervals, then µ ( M ) is

the sum of the lengths of the intervals. If M is a measurable set that contains an

infinite interval, then µ(M) = ∞. In general, if {M_i} is any countable collection

of disjoint, measurable sets, then µ(∪_{i=1}^∞ M_i) = Σ_{i=1}^∞ µ(M_i).

3. If M 1 , M 2 ∈ M and M 1 ⊆ M 2 , then µ ( M 1 ) ≤ µ ( M 2 ) .

A function f : ℝ → ℝ is said to be Lebesgue measurable if, for every open set

O ⊂ ℝ, the inverse image, f⁻¹(O), is a Lebesgue measurable set. Since every open

subset of ℝ is measurable, it follows from our earlier definition of continuity that every

continuous function is Lebesgue measurable. Still, there do exist noncontinuous

functions that are measurable. The sum, difference, and product of measurable functions

are measurable. Also, if f is measurable, then so is |f|.

For any A ⊂ ℝ, we can define a function on ℝ, the characteristic function of A:

χ_A(x) = 1 if x ∈ A, and χ_A(x) = 0 if x ∉ A.

This function is measurable if and only if A is a measurable set. Using this function, we

can construct another function, the step function, which is a finite linear combination of

characteristic functions with real coefficients. Such a function typically takes the form

s(x) = Σ_{i=1}^n a_i χ_{A_i}(x). Step functions provide the foundation for constructing the Lebesgue

integral of a general function.

Let f be a measurable function such that f(x) ≥ 0 for every x ∈ ℝ. Then there

exists a sequence of step functions, ( s1 , s2 ,…) , such that 0 ≤ s1 ≤ s2 ≤ … and for which

lim n →∞ sn ( x) = f ( x) .

Let s = Σ_{i=1}^n a_i χ_{A_i} be a step function such that every set A_i is measurable. Then s is

a measurable function, and we say that s is Lebesgue integrable if a_i ≠ 0 ⇒ µ(A_i) < ∞.

In this case, the Lebesgue integral is:

∫ s dµ = Σ_{i=1}^n a_i µ(A_i).

Let f be a measurable function such that f ≥ 0 . If every step function s, with

s ≥ 0 and s ≤ f , is integrable and ∫ s dµ is finite, then we say that f is Lebesgue

integrable. The value of the Lebesgue integral of f is:

∫ f d µ = sup {∫ s d µ} ,
where the supremum is taken over all integrable, nonnegative step functions, s, such that

s ≤ f. For functions that are not everywhere nonnegative, we consider that every

function can be written as the difference of two nonnegative functions, f = f⁺ − f⁻,

where f⁺ = ½(|f| + f) = max(f, 0), the positive part, and f⁻ = ½(|f| − f) = max(−f, 0),

the negative part. This means that an arbitrary measurable function f is Lebesgue

integrable if both its positive and negative parts are Lebesgue integrable, and

∫ f dµ = ∫ f⁺ dµ − ∫ f⁻ dµ.

Lebesgue integration takes a different approach from Riemann integration. Whereas a

Riemann integral splits the domain of f into partitions and calculates areas under a curve

based on rectangles of height f, becoming exact as the rectangles’ widths grow

infinitesimal, the Lebesgue integral splits the range of f into partitions, approximating f

from below by step functions and passing to the limit.

Complex Analysis

A complex number z = x + iy can be written in polar form as z = r(cos θ + i sin θ), where

r = |z| = √(zz̄) = √(x² + y²) and θ, called the argument of z, is equal to arctan(y/x),

corresponding to the angle made with the positive real axis. The principal value of θ,

written Arg z, falls in the range −π < θ ≤ π.

Two complex numbers can be easily multiplied and divided in polar form:

z₁z₂ = r₁r₂(cos(θ₁ + θ₂) + i sin(θ₁ + θ₂))

z₁/z₂ = (r₁/r₂)(cos(θ₁ − θ₂) + i sin(θ₁ − θ₂)).

The Taylor series expansion for the polar form of complex numbers provides us with the

interesting fact that complex numbers can also be expressed exponentially:

z = re^{iθ} = r(cos θ + i sin θ),

in an expression known as Euler’s formula. Following from this is de Moivre’s

formula, which says:

(cos θ + i sin θ)^n = cos(nθ) + i sin(nθ).

Since the number 1 can be expressed in exponential form as e^{i(2πk)} for any integer k, we

can express the nth roots of unity by:

{ e^{i(2πk/n)} = cos(2πk/n) + i sin(2πk/n) : k = 0, 1, …, n − 1 }.

The nth roots of any complex number can be found by multiplying the principal nth root

by the nth roots of unity.
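A short Python sketch of this recipe using the cmath module (the cube roots of 8i serve as an arbitrary example):

    import cmath

    n, z = 3, 8j
    r, theta = cmath.polar(z)
    principal = r ** (1 / n) * cmath.exp(1j * theta / n)   # principal nth root
    unity = [cmath.exp(2j * cmath.pi * k / n) for k in range(n)]

    roots = [principal * w for w in unity]                 # all nth roots of z
    assert all(abs(w**n - z) < 1e-9 for w in roots)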

The logarithm of a complex number can be defined thanks to our ability to

express such a number exponentially. But we must recall that adding any integer

multiple of 2π to the argument of z leaves the complex number essentially unchanged.

Therefore, any nonzero complex number has infinitely many logarithms, but we define

the principal value by Log z = ln|z| + i Arg z. Using the concept of the complex

logarithm, we can define complex powers, z^w, where z, w ∈ ℂ. We have z^w = e^{w Log z}.

Inspired by Euler’s formula, we can develop formulas for calculating

trigonometric functions of complex numbers. In particular, we have:

cos z = (e^{iz} + e^{−iz})/2 and sin z = (e^{iz} − e^{−iz})/(2i).

The rest of the functions are defined by applying their familiar relations to the definitions

for the sine and cosine above. The trigonometric identities defined for real numbers are

valid for complex numbers as well. One major difference between the complex sine and

cosine and their real counterparts is that, while |cos x| ≤ 1 and |sin x| ≤ 1 for x ∈ ℝ, the

complex versions are unbounded, and the norms |sin z| and |cos z| can take on any

nonnegative real value.

The hyperbolic functions are defined for real numbers by:

cosh x = (e^x + e^{−x})/2 and sinh x = (e^x − e^{−x})/2.

They share some properties with their non-hyperbolic namesakes, such as cosh 0 = 1 and

sinh 0 = 0 , cosh ( − x ) = cosh x and sinh ( − x ) = − sinh x . Then there is the identity

cosh 2 x − sinh 2 x = 1 , which is similar to the Pythagorean trig identity for the sine and

cosine.

From our definitions, we see that

cos(iz ) = cosh z
cosh(iz ) = cos z
sin(iz ) = i sinh z
sinh(iz ) = i sin z.

With these formulas, we can develop equations that allow us to evaluate cos z and sin z ,

where z = x + iy , in terms of real x and y. Using the angle sum formulas and the relations

directly above, we get:

cos z = cos(x + iy) = cos x cosh y − i sin x sinh y

sin z = sin(x + iy) = sin x cosh y + i cos x sinh y.

These are used to obtain the following:

|cos z|² = cos²x + sinh²y

|sin z|² = sin²x + sinh²y.

Derivatives of Complex-Valued Functions

Functions of a complex variable can be differentiated like their real counterparts.

In fact, a table of derivatives for real-valued functions will apply for complex-valued

functions. First, it is important to remark that f(z) = f(x + iy) = u(x, y) + iv(x, y) for

some real-valued functions u and v. We say that ℜ(f) = u(x, y), denoting the real part,

and ℑ(f) = v(x, y), denoting the imaginary part. Returning to the variable z, we notice

that there is a slight difference in how the definition of the derivative behaves, which

gives rise to some interesting results. The difference quotient

f′(z) = lim_{h→0} [f(z + h) − f(z)] / h

looks the same for the complex variable z, but it is important to note here that h is also a

complex number with real and imaginary parts. Since the field ℂ is not ordered, h can

take infinitely many routes to zero. Applying the definition twice, with h approaching the

origin from the real axis in one case and the imaginary axis in another, we see that

∂u/∂x + i·∂v/∂x = f′(z) = ∂v/∂y − i·∂u/∂y

if f is differentiable. Equating the real and imaginary parts of the expression above, we

get the Cauchy-Riemann equations:

∂u/∂x = ∂v/∂y and ∂v/∂x = −∂u/∂y,

which must be satisfied for f to be differentiable.
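The Cauchy-Riemann equations can be checked symbolically for any candidate function (a sketch assuming SymPy; f(z) = z² is an arbitrary entire example):

    import sympy as sp

    x, y = sp.symbols('x y', real=True)
    f = (x + sp.I * y)**2                      # f(z) = z², an entire function
    u, v = sp.re(sp.expand(f)), sp.im(sp.expand(f))

    # Cauchy-Riemann: u_x = v_y and v_x = -u_y
    assert sp.simplify(sp.diff(u, x) - sp.diff(v, y)) == 0
    assert sp.simplify(sp.diff(v, x) + sp.diff(u, y)) == 0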

If f ( z ) = f ( x + iy ) = u ( x, y ) + iv ( x, y ) is differentiable throughout some open set

O, then the Cauchy-Riemann equations hold throughout that set. Differentiating the C-R

equations again, we get:

∂²u/∂x² = ∂/∂x(∂u/∂x) = ∂/∂x(∂v/∂y) = ∂²v/∂x∂y and ∂²u/∂y² = ∂/∂y(∂u/∂y) = ∂/∂y(−∂v/∂x) = −∂²v/∂y∂x,

which tells us that

∂²u/∂x² + ∂²u/∂y² = 0.

This is Laplace’s equation. A function satisfying this equation in some open set O is

said to be harmonic in O. Using the same method as above, we also see that

∂²v/∂x² + ∂²v/∂y² = 0.

Written another way, we have

∇ 2u = ∇ 2 v = 0 ,

where ∇ 2 = ∇ ⋅∇ is the scalar Laplacian operator. If f ( z ) is differentiable throughout

some open set O, then ℜ( f ) and ℑ( f ) are harmonic in O.

If f ( z ) is differentiable at z0 and at every point throughout some open set in the

complex plane containing z0 , then f ( z ) is said to be analytic at the point z0 . If f is

differentiable throughout the open set O, then f is said to be analytic in O. If f is analytic

everywhere in the complex plane, then f is said to be an entire function. If f is analytic,

then ℜ( f ) and ℑ( f ) , its component functions, must have continuous partial derivatives

of all orders. Also, if f ( z ) is analytic at z0 , then so is f ′( z ) . This implies that the

derivatives of all orders must also be analytic. This is a strong result without a

counterpart for real-valued functions.

We can calculate complex line integrals over parameterized curves in the

complex plane. If the curve C is given by z (t ) = x(t ) + iy (t ) for a ≤ t ≤ b , where

a, b ∈ , we have:

∫_C f(z) dz = ∫_a^b f(z(t))·z′(t) dt = ∫_a^b F′(z(t)) dt = F(z(b)) − F(z(a)),

where (d/dz)F(z) = f(z). This expression suggests two methods for attacking the line

integral, one of which may be easier than the other in a given situation. Using the first

method, one calculates f ( x + iy ) before substituting in the parameterized values of x and

y. Then the product with z ′(t ) is calculated, and the integral is carried out with respect to

the parameter t over the limits of integration, a and b. The second method, which appears

far easier, employs the antiderivative of f ( z ) , telling us that

∫_a^b f(z(t))·z′(t) dt = ∫_{z(a)}^{z(b)} f(z) dz.

We note that this method changes the limits of integration from real numbers to complex

numbers, as one would expect after changing the variable of integration from a real

variable to a complex variable.

The following are some important theorems regarding analytic functions:

1. Cauchy's theorem: If f(z) is analytic throughout a simply connected, open set

D, then for every closed path C in D, we have ∫_C f(z) dz = 0. If f(z) is an

entire function, then ∫_C f(z) dz = 0 for every closed path in ℂ.

2. Morera's theorem: If f(z) is continuous throughout an open, connected set

O ⊂ ℂ, and ∫_C f(z) dz = 0 for every closed curve C in O, then f(z) is analytic

in O.

3. Cauchy's integral formulas: If f(z) is analytic at all points within and on a

simple, closed path C surrounding the point z₀, then

f(z₀) = (1/(2πi)) ∫_C f(z)/(z − z₀) dz.

We also know that the nth derivative is analytic at z₀ for all n ∈ ℕ, and we have

f⁽ⁿ⁾(z₀) = (n!/(2πi)) ∫_C f(z)/(z − z₀)^{n+1} dz. This tells us that if we know the value of f(z) at

every point on a curve, then we also know its value—and the value of all its

derivatives—at any interior point.

4. Cauchy's derivative estimates: Let f(z) be analytic on and within the circle of

radius r centered at z₀, |z − z₀| = r. If M is the maximum value of |f(z)| on the

circle, then |f⁽ⁿ⁾(z₀)| ≤ n!M/r^n.

5. Liouville’s theorem: If f ( z ) is an entire function that is bounded, then f ( z )

must be a constant function.

6. The maximum principle: Let O be an open, connected subset of the complex

plane, and let f(z) be a function that is analytic in O. If there is a point z₀ ∈ O

such that |f(z)| ≤ |f(z₀)| for every z ∈ O, then f(z) is a constant function. This

says that if f(z) is analytic and not constant, then |f(z)| attains no maximum

value in O. If O is bounded, then |f(z)| must achieve a maximum value in

cl(O); since it can't do so inside, it must achieve a maximum on the boundary of

O.
Taylor Series for Complex-Valued Functions

Taylor series can be formed for complex-valued functions in the same way that they

can for real-valued functions. The only difference is that the idea of the interval of

convergence is replaced by the disk of convergence, with |z − z₀| < R representing the

interior of an open disk of radius R centered at z₀.

The power series Σ_{n=0}^∞ a_n(z − z₀)^n converges absolutely for all z that satisfy

|z − z₀| < R and diverges for all z such that |z − z₀| > R, where a_n = f⁽ⁿ⁾(z₀)/n! and

R = 1 / lim_{n→∞} ⁿ√|a_n|. If lim_{n→∞} ⁿ√|a_n| = ∞, then the series converges only for

z = z₀. If lim_{n→∞} ⁿ√|a_n| = 0, then the series converges for all z. Every function that is

analytic in an open, connected subset O of the complex plane can be expanded in a power

series whose disk of convergence lies within O.

Let z₀ be a point in the complex plane, and let R be a positive number. The set of

all z such that 0 < |z − z₀| < R is called the punctured open disk of radius R centered at z₀.

If a function f ( z ) is not analytic at a point z0 but is analytic at some point in every

punctured disk centered at z0 , then z0 is said to be a singularity of f ( z ) . If f ( z ) is

analytic at every point in some punctured open disk centered at z0 , then we call z0 an

isolated singularity.

Assuming now that f ( z ) has an isolated singularity at z0 , if there is a positive

integer n such that

f(z) = g(z)/(z − z₀)^n,

with g ( z ) analytic in a nonpunctured disk centered at z0 and g ( z0 ) ≠ 0 , then the

singularity z0 is called a pole of order n. If we can’t write f ( z ) in this form, then z0 is

said to be an essential singularity. Functions with singularities may require a

generalization of the Taylor series in order to be expanded to regions in which their

singularities occur.

Laurent Series

An annulus is a ring formed by two concentric circles. An annulus centered at

z₀ is represented by the double inequality R₁ < |z − z₀| < R₂, where R₁ is allowed to be

zero and R₂ is allowed to be ∞. If f(z) is analytic in some annulus centered at z₀, then

f(z) can be expanded in a Laurent series, which takes the form:

Σ_{n=1}^∞ a_{−n}(z − z₀)^{−n} + Σ_{n=0}^∞ a_n(z − z₀)^n.

The sum on the right, referred to as the analytic part, is simply the Taylor series for the

function. The sum on the left, referred to as the singular (or principal) part, uses

Laurent coefficients, which are defined by:

a_{−n} = (1/(2πi)) ∫_C f(z)/(z − z₀)^{−n+1} dz,

where C is a simple, closed curve contained in the annulus. (This definition aside, in

practice, the Laurent coefficients are typically derived algebraically from the Taylor

series.)

If the singular part of the Laurent series contains at least one term, but only

finitely many terms, then z0 is a pole. In particular, if the singular part has a− n = 0 for

all n greater than some integer k, where a− k ≠ 0 , then z0 is a pole of order k. If, however,

the singular part contains infinitely many terms, then z0 is an essential singularity.

Examining the definition of the Laurent coefficients, we see that

a_{−1} = (1/(2πi)) ∫_C f(z) dz, which suggests that we can easily calculate the integral if we have

the value of that coefficient. That coefficient is called the residue of f(z) at the

singularity z₀, written Res(z₀, f), and we see that ∫_C f(z) dz = 2πi·Res(z₀, f). For a

pole of order k, we have the formula:

Res(z₀, f) = (1/(k − 1)!) · lim_{z→z₀} d^{k−1}/dz^{k−1} [(z − z₀)^k f(z)].

If the curve C surrounds more than one singularity of f(z), z₁, z₂, …, z_n, we have:

∫_C f(z) dz = 2πi · Σ_{m=1}^n Res(z_m, f).
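Residues can be computed symbolically as a check on the order-k pole formula (a sketch assuming SymPy; the function below, with a double pole at 0 and a simple pole at 1, is an arbitrary example):

    import sympy as sp

    z = sp.symbols('z')
    f = 1 / (z**2 * (z - 1))                   # pole of order 2 at 0, order 1 at 1

    # Order-2 pole at z = 0: (1/1!)·lim_{z→0} d/dz [z²·f(z)]
    res0 = sp.limit(sp.diff(z**2 * f, z, 1), z, 0) / sp.factorial(1)
    assert res0 == sp.residue(f, z, 0) == -1
    assert sp.residue(f, z, 1) == 1
    # A contour enclosing both poles therefore gives 2πi·(−1 + 1) = 0.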

Gallery of Graphs

(The original gallery of reference graphs is not reproduced in this extraction.)