
# Mathematics Review


A draft of a mathematics review I wrote based on material drawn from several sources. It is at the level covered by the GRE subject test in mathematics, addressing material ranging from conic sections to multivariable calculus, vector analysis, differential equations, abstract algebra, basic topology, complex analysis, and a little bit on Lebesgue integrals. It's for anyone who wants a compact refresher. I knocked it out fairly quickly, so if there are any errors that have crept in, please let me know.


10/14/2011



Compiled and Edited by Romann M. Weber*
Precalculus
Analytic Geometry
Parabolas, each defined as the locus of points equidistant from a point F (the focus) and a line D (the directrix), have equations of the form $y = \pm\frac{1}{4p}x^2$ or $x = \pm\frac{1}{4p}y^2$, where the focus is located, respectively, at the point $(0, \pm p)$ or $(\pm p, 0)$ (with the sign matching that of the equation above) and the directrix has the equation $y = \mp p$ or $x = \mp p$, respectively. More generally, a parabola with its vertex at $(h, k)$ has the equation
$$y - k = \pm\frac{1}{4p}(x - h)^2 \quad\text{or}\quad x - h = \pm\frac{1}{4p}(y - k)^2.$$
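A numerical sanity check of the focus-directrix definition, offered as a sketch (the helper names and sample values are our own): for $p > 0$, every point on $y - k = \frac{1}{4p}(x - h)^2$ should be as far from the focus $(h, k + p)$ as from the directrix $y = k - p$.

```python
import math

def parabola_y(x, h, k, p):
    """y-value on y - k = (x - h)^2 / (4p): vertex (h, k), opening upward when p > 0."""
    return k + (x - h) ** 2 / (4 * p)

def equidistant(h, k, p, xs, tol=1e-9):
    """True if every sampled point is as far from the focus as from the directrix."""
    focus = (h, k + p)       # focus lies p units above the vertex
    directrix_y = k - p      # directrix is the horizontal line y = k - p
    for x in xs:
        y = parabola_y(x, h, k, p)
        to_focus = math.dist((x, y), focus)
        to_directrix = abs(y - directrix_y)
        if not math.isclose(to_focus, to_directrix, rel_tol=tol):
            return False
    return True

print(equidistant(h=2.0, k=-1.0, p=0.5, xs=[-3, -1, 0, 2, 4.5]))  # True
```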
A circle of radius r centered at $(h, k)$ has the equation $(x - h)^2 + (y - k)^2 = r^2$. It can be parameterized by $x = h + r\cos t$, $y = k + r\sin t$.
An ellipse is defined as the locus of points such that the sum of the distances from each point on the graph to two given fixed points (the foci) is a given constant. Ellipses have a major and a minor axis and follow the equation
$$\frac{(x - h)^2}{a^2} + \frac{(y - k)^2}{b^2} = 1,$$
where the semi-axes, of lengths a (horizontal) and b (vertical), extend on either side of the center, $(h, k)$, terminating at the vertices. In the case $a > b$, the major axis is parallel to the x-axis, and the foci are located at the points $(h \pm c, k)$, where $c = \sqrt{a^2 - b^2}$. For the case $a < b$, the major axis is parallel to the y-axis, and the foci are located at the points $(h, k \pm c)$, where $c = \sqrt{b^2 - a^2}$. The eccentricity, which measures the “flatness” of the ellipse, is the ratio of c to the semi-major axis: $e = c/a$ when $a > b$ (or $e = c/b$ when $a < b$), so that $0 \le e < 1$, with $e = 0$ for a circle. An ellipse can be parameterized by $x = h + a\cos t$, $y = k + b\sin t$.

* The overall plan of coverage in this document was based on Cracking the GRE Math Subject Test, 3rd Edition by Steven A. Leduc, many of whose results have been incorporated here. Certain items from that reference have been corrected in this document, and new results from other sources have been included.
A hyperbola is defined as the locus of points in the plane such that the absolute value of the difference between the distances from each point on the hyperbola to two fixed points (the foci) is a given constant. Depending on its orientation, it has either the equation
$$\frac{(x - h)^2}{a^2} - \frac{(y - k)^2}{b^2} = 1$$
(opening horizontally), with the foci located at $(h \pm c, k)$, or
$$\frac{(y - k)^2}{b^2} - \frac{(x - h)^2}{a^2} = 1$$
(opening vertically), with the foci located at $(h, k \pm c)$. In either case, $c = \sqrt{a^2 + b^2}$, and the asymptotes are the lines $y - k = \pm\frac{b}{a}(x - h)$.
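The focal definitions of the ellipse and the hyperbola can be spot-checked numerically. In this sketch the function names and sample values are illustrative, the ellipse is parameterized as $x = h + a\cos t$, $y = k + b\sin t$ with $a > b$, and the right branch of the horizontal hyperbola is parameterized with $\cosh$ and $\sinh$ (our own choice, not from the text). In both cases the focal sum or difference should come out to $2a$.

```python
import math

def ellipse_focal_sum(a, b, t, h=0.0, k=0.0):
    """Sum of distances from (h + a cos t, k + b sin t) to the foci, assuming a > b."""
    c = math.sqrt(a * a - b * b)          # c^2 = a^2 - b^2 for an ellipse
    point = (h + a * math.cos(t), k + b * math.sin(t))
    return math.dist(point, (h - c, k)) + math.dist(point, (h + c, k))

def hyperbola_focal_diff(a, b, t, h=0.0, k=0.0):
    """|d1 - d2| for (h + a cosh t, k + b sinh t) on the right branch of a horizontal hyperbola."""
    c = math.sqrt(a * a + b * b)          # c^2 = a^2 + b^2 for a hyperbola
    point = (h + a * math.cosh(t), k + b * math.sinh(t))
    return abs(math.dist(point, (h - c, k)) - math.dist(point, (h + c, k)))

for t in (0.0, 0.7, 2.0, -1.3):
    print(round(ellipse_focal_sum(5, 3, t), 9), round(hyperbola_focal_diff(3, 4, t), 9))
# each line should read 10.0 6.0, i.e. 2a in both cases
```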
Polynomials
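The rational roots theorem states that for a polynomial $p(x) = \sum_{i=0}^{n} a_i x^i$ with each $a_i \in \mathbb{Z}$, if there are any roots $r \in \mathbb{Q}$, then they are of the form $r = \frac{s}{t}$ (in lowest terms), where $s, t \in \mathbb{Z}$ and $s \mid a_0$ and $t \mid a_n$. For polynomials with real coefficients, nonreal complex roots always come in conjugate pairs; for polynomials with rational coefficients, radical roots of the form $a + b\sqrt{c}$ likewise come paired with $a - b\sqrt{c}$.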
For any polynomial
$$p(x) = \sum_{i=0}^{n} a_i x^i = a_n \prod_{j=1}^{n} (x - r_j),$$
the sum and product of the roots are given by
$$\sum_{j=1}^{n} r_j = -\frac{a_{n-1}}{a_n} \quad\text{and}\quad \prod_{j=1}^{n} r_j = (-1)^n \frac{a_0}{a_n}.$$
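Both polynomial results lend themselves to a short computation. The sketch below (helper names are our own; it assumes $a_0 \neq 0$ and $a_n \neq 0$) enumerates the candidates $\pm s/t$ with $s \mid a_0$ and $t \mid a_n$, keeps the actual roots using exact fractions, and then checks the sum and product formulas on a cubic whose three roots are all rational.

```python
from fractions import Fraction
from math import prod

def divisors(n):
    """Positive divisors of a nonzero integer n."""
    n = abs(n)
    return [d for d in range(1, n + 1) if n % d == 0]

def evaluate(coeffs, x):
    """Evaluate sum(coeffs[i] * x**i) by Horner's rule; coeffs[0] is a_0."""
    result = 0
    for a in reversed(coeffs):
        result = result * x + a
    return result

def rational_roots(coeffs):
    """All rational roots of an integer polynomial, assuming a_0 != 0 and a_n != 0."""
    a0, an = coeffs[0], coeffs[-1]
    candidates = {Fraction(sign * s, t)
                  for s in divisors(a0) for t in divisors(an) for sign in (1, -1)}
    return sorted(r for r in candidates if evaluate(coeffs, r) == 0)

# p(x) = 2x^3 - 3x^2 - 11x + 6 = (2x - 1)(x + 2)(x - 3), stored as [a_0, a_1, a_2, a_3]
p = [6, -11, -3, 2]
roots = rational_roots(p)
print(roots)   # [Fraction(-2, 1), Fraction(1, 2), Fraction(3, 1)]
n = len(p) - 1
print(sum(roots) == Fraction(-p[n - 1], p[n]))          # True: sum = -a_{n-1}/a_n
print(prod(roots) == Fraction((-1) ** n * p[0], p[n]))  # True: product = (-1)^n a_0/a_n
```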
Logarithms
The basic identities of logarithms are the following:
• $\log_b\left(\prod_{i=1}^{n} x_i\right) = \sum_{i=1}^{n} \log_b x_i$
• $\log_b\left(\frac{x_1}{x_2}\right) = \log_b x_1 - \log_b x_2$
• $\log_b x^a = a\log_b x$
• $b^{\log_b x} = x$
• $(\log_b a)(\log_a x) = \log_b x$
Trigonometry
The sine function is odd, meaning that $\sin(-\theta) = -\sin\theta$, and the cosine function is even, meaning that $\cos(-\theta) = \cos\theta$. Any trigonometric function is equal to its co-function operating on its complement. In other words:
$$\sin x = \cos\left(\frac{\pi}{2} - x\right), \qquad \tan x = \cot\left(\frac{\pi}{2} - x\right), \qquad \sec x = \csc\left(\frac{\pi}{2} - x\right),$$
and vice versa.
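| $\theta$ | $\sin\theta$ | $\cos\theta$ | $\tan\theta$ |
|---|---|---|---|
| $0$ | $0$ | $1$ | $0$ |
| $\pi/6$ | $1/2$ | $\sqrt{3}/2$ | $\sqrt{3}/3$ |
| $\pi/4$ | $\sqrt{2}/2$ | $\sqrt{2}/2$ | $1$ |
| $\pi/3$ | $\sqrt{3}/2$ | $1/2$ | $\sqrt{3}$ |
| $\pi/2$ | $1$ | $0$ | undefined |
| $2\pi/3$ | $\sqrt{3}/2$ | $-1/2$ | $-\sqrt{3}$ |
| $3\pi/4$ | $\sqrt{2}/2$ | $-\sqrt{2}/2$ | $-1$ |
| $5\pi/6$ | $1/2$ | $-\sqrt{3}/2$ | $-\sqrt{3}/3$ |
| $\pi$ | $0$ | $-1$ | $0$ |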
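The table can be checked against floating-point library values. In this sketch `TABLE` transcribes the rows by hand, with `None` marking $\tan(\pi/2)$, which is undefined; the tolerance is our own choice.

```python
import math

r2, r3 = math.sqrt(2), math.sqrt(3)
# (theta, sin, cos, tan); None marks tan(pi/2), which is undefined
TABLE = [
    (0.0,             0.0,    1.0,     0.0),
    (math.pi / 6,     1 / 2,  r3 / 2,  r3 / 3),
    (math.pi / 4,     r2 / 2, r2 / 2,  1.0),
    (math.pi / 3,     r3 / 2, 1 / 2,   r3),
    (math.pi / 2,     1.0,    0.0,     None),
    (2 * math.pi / 3, r3 / 2, -1 / 2,  -r3),
    (3 * math.pi / 4, r2 / 2, -r2 / 2, -1.0),
    (5 * math.pi / 6, 1 / 2,  -r3 / 2, -r3 / 3),
    (math.pi,         0.0,    -1.0,    0.0),
]

def table_matches(tol=1e-12):
    """Compare every row with math.sin, math.cos and math.tan."""
    for theta, s, c, t in TABLE:
        if not (math.isclose(math.sin(theta), s, abs_tol=tol)
                and math.isclose(math.cos(theta), c, abs_tol=tol)):
            return False
        if t is not None and not math.isclose(math.tan(theta), t, abs_tol=tol):
            return False
    return True

print(table_matches())  # True
```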
Other values can be computed using the following identities:
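$$\tan x = \frac{\sin x}{\cos x} \qquad \cot x = \frac{\cos x}{\sin x} \qquad \csc x = \frac{1}{\sin x} \qquad \sec x = \frac{1}{\cos x}$$
$$\sin^2 x + \cos^2 x = \sec^2 x - \tan^2 x = \csc^2 x - \cot^2 x = 1$$
$$\sin(x \pm y) = \sin x\cos y \pm \cos x\sin y \;\Rightarrow\; \sin 2x = 2\sin x\cos x$$
$$\cos(x \pm y) = \cos x\cos y \mp \sin x\sin y \;\Rightarrow\; \cos 2x = \cos^2 x - \sin^2 x = 1 - 2\sin^2 x = 2\cos^2 x - 1$$
$$\tan(x \pm y) = \frac{\tan x \pm \tan y}{1 \mp \tan x\tan y} \;\Rightarrow\; \tan 2x = \frac{2\tan x}{1 - \tan^2 x}$$
$$\sin\frac{x}{2} = \pm\sqrt{\frac{1 - \cos x}{2}} \qquad \cos\frac{x}{2} = \pm\sqrt{\frac{1 + \cos x}{2}} \qquad \tan\frac{x}{2} = \frac{\sin x}{1 + \cos x}$$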
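A quick way to catch a sign error in identities like these is to test them at random angles. This sketch (helper name and sampling range are our own; angles are kept in $(-0.7, 0.7)$ so that $\tan$ and the denominators stay away from their singularities) checks several of the identities above.

```python
import math
import random

def identities_hold(x, y, tol=1e-9):
    """Spot-check several of the identities above at one pair of angles."""
    pairs = [
        (math.sin(x) ** 2 + math.cos(x) ** 2, 1.0),
        (1 / math.cos(x) ** 2 - math.tan(x) ** 2, 1.0),                      # sec^2 - tan^2
        (math.sin(x + y), math.sin(x) * math.cos(y) + math.cos(x) * math.sin(y)),
        (math.cos(x - y), math.cos(x) * math.cos(y) + math.sin(x) * math.sin(y)),
        (math.tan(x + y), (math.tan(x) + math.tan(y)) / (1 - math.tan(x) * math.tan(y))),
        (math.cos(2 * x), 1 - 2 * math.sin(x) ** 2),
        (math.tan(x / 2), math.sin(x) / (1 + math.cos(x))),
    ]
    return all(math.isclose(lhs, rhs, rel_tol=tol, abs_tol=tol) for lhs, rhs in pairs)

random.seed(0)
samples = [(random.uniform(-0.7, 0.7), random.uniform(-0.7, 0.7)) for _ in range(100)]
print(all(identities_hold(x, y) for x, y in samples))  # True
```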

Hyperbolic Functions
$$\sinh x = \frac{e^x - e^{-x}}{2} \qquad \cosh x = \frac{e^x + e^{-x}}{2}$$
Differential Calculus
Sequences
A sequence, formally defined as a function on the set of positive integers, is an infinite, ordered list of terms. A sequence $(x_n)$ approaches a limit L if for every $\varepsilon > 0$, there’s an integer N such that $|x_n - L| < \varepsilon$ whenever $n > N$. This is equivalent to challenging someone to think of a positive number, no matter how small, and always being able to answer with a point in the sequence beyond which every term lies within that distance of the limit. If such an L exists, the sequence is said to converge. If no such L exists, the sequence is said to diverge.
A sequence is monotonic if it is either nondecreasing or nonincreasing (for the results below, it suffices for this to hold for every n from some point on).
• Every convergent sequence is bounded, but the converse is not necessarily true (consider $x_n = (-1)^n$, which is bounded above and below but diverges).
• If a sequence is monotonic and bounded, it is convergent.
• If $\lim_{n\to\infty} a_n = A$ and k is a constant, then $\lim_{n\to\infty} k a_n = kA$.
• If $\lim_{n\to\infty} a_n = A$ and $\lim_{n\to\infty} b_n = B$, then:
  o $(a_n + b_n) \to A + B$
  o $(a_n - b_n) \to A - B$
  o $(a_n b_n) \to AB$
  o $\left(\frac{a_n}{b_n}\right) \to \frac{A}{B}$, assuming $B \neq 0$
• If k is a positive constant, then $\left(\frac{1}{n^k}\right) \to 0$.
• If $k > 1$, then $\left(\frac{1}{k^n}\right) \to 0$.
• Assume $\lim_{n\to\infty} a_n = \lim_{n\to\infty} c_n = L$. If $(b_n)$ is a sequence such that $a_n \le b_n \le c_n$ for every $n > N$, then $\lim_{n\to\infty} b_n = L$. This is known as the sandwich theorem (or the squeeze theorem).
• If $a_n = f(n)$, then $(a_n) \to L$ if $\lim_{x\to\infty} f(x) = L$.
Functions
A function $f: A \to B$ is injective (or one-to-one) if for any $x, y \in A$, $f(x) = f(y) \Rightarrow x = y$. Equivalently, $x \neq y \Rightarrow f(x) \neq f(y)$. In other words, f maps distinct objects to distinct objects. (Graphically, the graph of an injective function is intersected at most once by any horizontal line.) A function $f: A \to B$ is surjective (or onto) if for any $b \in B$ there exists an $a \in A$ such that $f(a) = b$. This means that every element of the codomain B is “spoken for” by at least one element in the domain. A function that is both injective and surjective is bijective.
Let f be a function defined on a subset of the real line. We seek to examine the behavior of f near a point a, which may not be included in its domain. If, for every sequence $(x_n)$ converging to a whose terms lie in the domain of f and differ from a, the sequence $(f(x_n)) \to L$, we call L the limit of f as x approaches a. The limit of a function exists only if it is the same whether a is approached from above or below. A limit is also defined in the following way:
$$\lim_{x\to a} f(x) = L \iff \forall\varepsilon > 0,\ \exists\delta > 0 : 0 < |x - a| < \delta \Rightarrow |f(x) - L| < \varepsilon.$$
• If $\lim_{x\to a} f(x) = L_1$ and $\lim_{x\to a} g(x) = L_2$, then:
  o $\lim_{x\to a}\left[f(x) + g(x)\right] = L_1 + L_2$
  o $\lim_{x\to a}\left[f(x) - g(x)\right] = L_1 - L_2$
  o $\lim_{x\to a}\left[f(x)\,g(x)\right] = L_1 L_2$
  o $\lim_{x\to a}\left[\frac{f(x)}{g(x)}\right] = \frac{L_1}{L_2}$ (assuming $L_2 \neq 0$)
• Assume that $\lim_{x\to a} f(x) = L$ and $\lim_{x\to a} h(x) = L$. If there is a positive number δ such that $f(x) \le g(x) \le h(x)$ for all x satisfying $0 < |x - a| < \delta$, then $\lim_{x\to a} g(x) = L$. This is another version of the sandwich (or squeeze) theorem.
Continuity
A function f is continuous at a if $\lim_{x\to a} f(x) = f(a)$. A function is continuous on an interval if it is continuous at every point in that interval.
• The Extreme Value Theorem: If f is a function continuous on a closed interval $[a, b]$, then f attains an absolute minimum value, m, at some point $c \in [a, b]$ and an absolute maximum value, M, at some point $d \in [a, b]$. That is, there exist points c and d such that $f(c) \le f(x) \le f(d)$ for all $x \in [a, b]$.
• Bolzano’s Theorem: If f is a function continuous on the closed interval $[a, b]$ such that $f(a)$ and $f(b)$ have opposite signs, then there’s a point c between a and b such that $f(c) = 0$.
• The Intermediate Value Theorem: Let f be a function continuous on the closed interval $[a, b]$. Let m be the absolute minimum value and M be the absolute maximum value of f on $[a, b]$. For every number Y such that $m \le Y \le M$, there is at least one value $c \in [a, b]$ such that $f(c) = Y$.
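Bolzano’s theorem is what justifies the bisection method for locating a root: repeatedly halve the interval and keep the half on which f still changes sign. A minimal sketch, assuming f is continuous and $f(a)$, $f(b)$ have opposite signs (the name `bisect_root` and its tolerance parameters are our own):

```python
import math

def bisect_root(f, a, b, tol=1e-12, max_iter=200):
    """Return c in [a, b] with f(c) close to 0, assuming f continuous and f(a) f(b) < 0."""
    fa, fb = f(a), f(b)
    if fa == 0:
        return a
    if fb == 0:
        return b
    if fa * fb > 0:
        raise ValueError("f(a) and f(b) must have opposite signs")
    for _ in range(max_iter):
        m = (a + b) / 2
        fm = f(m)
        if fm == 0 or (b - a) / 2 < tol:
            return m
        # keep the half-interval on which the sign change persists
        if fa * fm < 0:
            b = m
        else:
            a, fa = m, fm
    return (a + b) / 2

print(bisect_root(lambda x: x ** 3 - 2, 1.0, 2.0))  # approximately 1.259921 (cube root of 2)
```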
Derivatives
Rules of Differentiation
• The derivative of a sum is the sum of the derivatives: $(f + g)'(x) = f'(x) + g'(x)$
• The derivative of a constant times a function: $(kf)'(x) = k f'(x)$
• The product rule: $(fg)'(x) = f'(x)\,g(x) + f(x)\,g'(x)$
• The quotient rule: $\left(\frac{f}{g}\right)'(x) = \frac{g(x)\,f'(x) - f(x)\,g'(x)}{[g(x)]^2}$
• The chain rule (for composite functions): $\left(f(u(x))\right)' = f'(u(x))\,u'(x)$
• The inverse-function rule: If $f^{-1}$ is the inverse of f, and f has a nonzero derivative at $x_0$, then $f^{-1}$ has a derivative at $y_0 = f(x_0)$ equal to
$$\left(f^{-1}\right)'(y_0) = \frac{1}{f'(x_0)}.$$
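These rules are easy to spot-check with a central difference $\frac{f(x + h) - f(x - h)}{2h}$. The sketch below is illustrative only (the helper `derivative`, the step size, and the test functions are our own choices): it checks the product and quotient rules for $f = \sin$, $g = \exp$, and the inverse-function rule for $\ln = \exp^{-1}$.

```python
import math

def derivative(f, x, h=1e-5):
    """Central-difference approximation (f(x + h) - f(x - h)) / (2h)."""
    return (f(x + h) - f(x - h)) / (2 * h)

x0 = 0.8
f, fp = math.sin, math.cos    # f = sin, f' = cos
g, gp = math.exp, math.exp    # g = exp, g' = exp

product = derivative(lambda x: f(x) * g(x), x0)
quotient = derivative(lambda x: f(x) / g(x), x0)
print(math.isclose(product, fp(x0) * g(x0) + f(x0) * gp(x0), rel_tol=1e-7))                 # True
print(math.isclose(quotient, (g(x0) * fp(x0) - f(x0) * gp(x0)) / g(x0) ** 2, rel_tol=1e-7))  # True

# Inverse-function rule: ln = exp^(-1), so (ln)'(y0) = 1 / exp'(x0) where y0 = exp(x0)
y0 = math.exp(x0)
print(math.isclose(derivative(math.log, y0), 1 / math.exp(x0), rel_tol=1e-7))               # True
```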
Common Derivatives
$d(k) = 0$
$d(u^k) = k u^{k-1}\,du$
$d(e^u) = e^u\,du$
$d(a^u) = (\ln a)\,a^u\,du$
$d(\ln u) = \frac{1}{u}\,du$
$d(\log_a u) = \frac{1}{u\ln a}\,du$
$d(\sin u) = \cos u\,du$
$d(\cos u) = -\sin u\,du$
$d(\tan u) = \sec^2 u\,du$
$d(\cot u) = -\csc^2 u\,du$
$d(\sec u) = \sec u\tan u\,du$
$d(\csc u) = -\csc u\cot u\,du$
$d(\arcsin u) = \frac{du}{\sqrt{1 - u^2}}$
$d(\arccos u) = -\frac{du}{\sqrt{1 - u^2}}$
$d(\arctan u) = \frac{du}{1 + u^2}$
If $f = k\prod_{i=1}^{n} u_i^{\alpha_i}$, then $f' = f \cdot \left[\sum_{i=1}^{n} \alpha_i \frac{u_i'}{u_i}\right]$.
Implicit Differentiation
For a curve defined implicitly by $f(x, y) = c$, we can define
$$y' = \frac{dy}{dx} = -\frac{\partial f/\partial x}{\partial f/\partial y} = -\frac{f_x}{f_y}$$
when $f_x$ and $f_y$ exist and $f_y \neq 0$.
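As a worked example (our own choice), take the circle $x^2 + y^2 = 25$ at $(3, 4)$: here $f_x = 2x$ and $f_y = 2y$, so $y' = -x/y = -3/4$. The sketch below (helper name hypothetical) computes this and cross-checks it against the explicit upper half $y = \sqrt{25 - x^2}$.

```python
import math

def implicit_slope(fx, fy, x, y):
    """dy/dx = -f_x / f_y on a curve f(x, y) = c, valid where f_y != 0."""
    denom = fy(x, y)
    if denom == 0:
        raise ZeroDivisionError("f_y vanishes here, so this formula does not apply")
    return -fx(x, y) / denom

slope = implicit_slope(lambda x, y: 2 * x, lambda x, y: 2 * y, 3.0, 4.0)
print(slope)  # -0.75

# Cross-check with a central difference on the explicit branch y = sqrt(25 - x^2)
h = 1e-6
explicit = (math.sqrt(25 - (3 + h) ** 2) - math.sqrt(25 - (3 - h) ** 2)) / (2 * h)
print(math.isclose(slope, explicit, rel_tol=1e-6))  # True
```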
Theorems Concerning Differentiable Functions
Rolle’s Theorem
Assume that f is a continuous function on the closed interval $[a, b]$, with $f(a) = f(b)$, and f differentiable everywhere in $(a, b)$. Then there is at least one point $c \in (a, b)$ such that $f'(c) = 0$.
Mean Value Theorem (for Derivatives)
Assume that f is a continuous function on the closed interval $[a, b]$ and differentiable everywhere in $(a, b)$. Then there is at least one point $c \in (a, b)$ such that $f'(c)\,(b - a) = f(b) - f(a)$. This theorem states that there is at least one point between a and b at which the slope of the tangent line is equal to the slope of the secant line through $(a, f(a))$ and $(b, f(b))$.
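The theorem only asserts that c exists, but for a concrete function it can often be located numerically. A sketch using bisection (`mvt_point` is a hypothetical helper; it assumes $f'(c)$ minus the secant slope changes sign on $[a, b]$, which the theorem itself does not guarantee):

```python
def mvt_point(f, fprime, a, b, tol=1e-12):
    """Find c in [a, b] with f'(c) = (f(b) - f(a)) / (b - a) by bisection.

    Assumes g(c) = f'(c) - slope changes sign on [a, b]; the theorem guarantees
    that some c exists, but not this sign change, so the helper can fail.
    """
    slope = (f(b) - f(a)) / (b - a)

    def g(c):
        return fprime(c) - slope

    lo, hi = a, b
    if g(lo) * g(hi) > 0:
        raise ValueError("f'(c) - slope does not change sign on [a, b]")
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if g(lo) * g(mid) <= 0:
            hi = mid    # a sign change (or a zero) lies in [lo, mid]
        else:
            lo = mid
    return (lo + hi) / 2

# f(x) = x^3 on [0, 2]: the secant slope is 4, so 3c^2 = 4 and c = 2 / sqrt(3)
c = mvt_point(lambda x: x ** 3, lambda x: 3 * x ** 2, 0.0, 2.0)
print(c)  # approximately 1.1547005
```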
Maxima and Minima
| $f'(c)$ | $f''(c)$ | $f^{(n)}(c)$* | Conclusion about $f(c)$ |
|---|---|---|---|
| $0$ | $> 0$ | N/A | Local minimum |
| $0$ | $< 0$ | N/A | Local maximum |
| $0$ | $0$ | $> 0$, with n even | Local minimum |
| $0$ | $0$ | $< 0$, with n even | Local maximum |
| $0$ | $0$ | Nonzero, with n odd | $f(c)$ is not a local maximum or minimum |

*Here n is the smallest integer such that the nth derivative is nonzero.
For a function defined on a closed interval, an absolute extremum will occur either at a
point where the first derivative vanishes, where the derivative is not defined, or at an
endpoint of the interval.
Integral Calculus
Common Integrals
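$\int k\,du = ku + C$
$$\int u^k\,du = \begin{cases} \dfrac{u^{k+1}}{k+1} + C & \text{if } k \neq -1 \\ \ln|u| + C & \text{if } k = -1 \end{cases}$$
$\int e^u\,du = e^u + C$
$\int a^u\,du = \frac{1}{\ln a}\,a^u + C$
$\int \sin u\,du = -\cos u + C$
$\int \cos u\,du = \sin u + C$
$\int \sec^2 u\,du = \tan u + C$
$\int \csc^2 u\,du = -\cot u + C$
$\int \sec u\tan u\,du = \sec u + C$
$\int \csc u\cot u\,du = -\csc u + C$
$\int \frac{du}{\sqrt{1 - u^2}} = \arcsin u + C$
$\int \frac{du}{1 + u^2} = \arctan u + C$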
Integration by Parts
$$\int u\,dv = uv - \int v\,du$$
The key when applying this method is deciding which part of the integrand is u and which part is dv: dv should be easy to integrate, and the new integral $\int v\,du$ on the right-hand side should be easier to evaluate than the original.
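As a worked example (our own choice, not from the source), take $\int_0^1 x e^x\,dx$ with $u = x$ and $dv = e^x\,dx$, so $du = dx$ and $v = e^x$:
$$\int_0^1 x e^x\,dx = \left[x e^x\right]_0^1 - \int_0^1 e^x\,dx = e - (e - 1) = 1.$$
The sketch below confirms this with composite Simpson’s rule, a numerical method introduced here only for verification.

```python
import math

def simpson(f, a, b, n=1000):
    """Composite Simpson's rule with an even number n of subintervals."""
    if n % 2:
        raise ValueError("n must be even")
    h = (b - a) / n
    total = f(a) + f(b) + sum((4 if i % 2 else 2) * f(a + i * h) for i in range(1, n))
    return total * h / 3

# By parts: [x e^x] from 0 to 1, minus the integral of e^x from 0 to 1
by_parts = (1 * math.exp(1) - 0 * math.exp(0)) - (math.exp(1) - math.exp(0))
numeric = simpson(lambda x: x * math.exp(x), 0.0, 1.0)
print(by_parts, numeric)  # both approximately 1.0
```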
Trigonometric Substitution
For integrands of a certain form containing square roots, it can be useful to make
substitutions that take advantage of trigonometric identities.
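| For an integrand containing . . . | Make the substitution $u =$ . . . | so that $du =$ . . . |
|---|---|---|
| $\sqrt{a^2 - u^2}$ | $a\sin\theta$ | $a\cos\theta\,d\theta$ |
| $\sqrt{a^2 + u^2}$ | $a\tan\theta$ | $a\sec^2\theta\,d\theta$ |
| $\sqrt{u^2 - a^2}$ | $a\sec\theta$ | $a\sec\theta\tan\theta\,d\theta$ |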

Partial Fractions
If an integrand is a rational function of the form $P(x)/Q(x)$, with $\deg P < \deg Q$, we can evaluate the integral by splitting the integrand into more manageable units, a process called partial-fraction decomposition. The denominator is factored, and each factor becomes a denominator of a new rational function on the right-hand side. Any denominator term of the form $(ax + b)^n$ will be represented on the right by n fractions whose denominators are its powers, in ascending order from 1 to n. The numerators on the right side are made up of single place-holding coefficients (typically A, B, C, etc.), which are to be solved for. In the case that an irreducible quadratic occurs in the factored polynomial, the place-holding coefficient in the numerator over that term would be replaced by a polynomial of the form $Bx + C$. Multiplying everything out will work when solving for the coefficients, but this could be quite a time-consuming series of steps. Since the decomposition must hold for all x in the domain, the polynomial identity obtained by clearing denominators holds for every x, so it is useful to multiply through by one factor at a time and substitute values of x that make the other terms vanish (namely, roots of the denominator polynomial).
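When the denominator factors into distinct linear terms, the substitution trick above is often called the Heaviside cover-up method. A minimal sketch under exactly that assumption (distinct roots, monic denominator, $\deg P < \deg Q$; the helpers `poly_eval` and `cover_up` are hypothetical), using exact fractions:

```python
from fractions import Fraction

def poly_eval(coeffs, x):
    """Evaluate sum(coeffs[i] * x**i)."""
    return sum(c * x ** i for i, c in enumerate(coeffs))

def cover_up(numerator, roots):
    """Coefficients A_j with P(x) / prod(x - r_j) = sum of A_j / (x - r_j).

    Assumes distinct roots r_j and deg P < len(roots). Multiplying by (x - r_j)
    and setting x = r_j makes every other term vanish.
    """
    coeffs = []
    for j, r in enumerate(roots):
        others = 1
        for k, s in enumerate(roots):
            if k != j:
                others *= (r - s)
        coeffs.append(Fraction(poly_eval(numerator, r)) / others)
    return coeffs

# (x + 5) / ((x - 1)(x + 2)) = A / (x - 1) + B / (x + 2)
A, B = cover_up([5, 1], [1, -2])
print(A, B)  # 2 -1
```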
Theorems Concerning Integrable Functions
Differentiating Under the Integral Sign
$$\frac{d}{dx}\int_{a(x)}^{b(x)} f(t)\,dt = f(b(x))\,b'(x) - f(a(x))\,a'(x)$$

Mean Value Theorem (for Integrals)
If f is a function continuous on the interval $[a, b]$, then there is at least one point $c \in [a, b]$ such that $\int_a^b f(x)\,dx = f(c)\,(b - a)$. Here we see that $f(c)$ is the average value of the function on $[a, b]$.
Polar Coordinates
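$$(x, y) \xrightarrow{T} (r, \theta): \qquad T(x, y) = \left(\sqrt{x^2 + y^2},\ \arctan\frac{y}{x}\right)$$
Polar representations are not unique: θ is determined only up to multiples of 2π, and $\arctan(y/x)$ alone cannot distinguish $(x, y)$ from $(-x, -y)$, so this formula for θ must be adjusted according to the quadrant of the point.
$$(r, \theta) \xrightarrow{\bar{T}} (x, y): \qquad \bar{T}(r, \theta) = (r\cos\theta,\ r\sin\theta)$$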
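In code, the quadrant problem with $\arctan(y/x)$ is usually avoided with a two-argument arctangent. A sketch (function names are our own; `math.atan2` returns an angle in $(-\pi, \pi]$, which is one of many valid choices of θ):

```python
import math

def to_polar(x, y):
    """(x, y) -> (r, theta) with theta in (-pi, pi]; atan2 uses the signs of x and y."""
    return math.hypot(x, y), math.atan2(y, x)

def to_cartesian(r, theta):
    """(r, theta) -> (r cos theta, r sin theta)."""
    return r * math.cos(theta), r * math.sin(theta)

x, y = -1.0, -1.0
print(math.atan(y / x))         # 0.785398... = pi/4: the naive formula lands in the wrong quadrant
r, theta = to_polar(x, y)
print(r, theta)                 # 1.414213..., -2.356194... = -3pi/4
print(to_cartesian(r, theta))   # approximately (-1.0, -1.0)
```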
Area in polar coordinates: $A = \frac{1}{2}\int_{\alpha}^{\beta} r^2\,d\theta$.
Area under (or within) a curve described parametrically:
If $(x, y) = (x(t), y(t))$ and the curve is traced out clockwise, beginning and ending at the same point (i.e., $x(a) = x(b)$ and $y(a) = y(b)$), then
$$A = \int_a^b y(t)\,x'(t)\,dt = -\int_a^b x(t)\,y'(t)\,dt.$$
The sign is reversed if the curve is traced counterclockwise.
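Both area formulas can be checked numerically. In this sketch (Simpson’s rule and the example curves are our own choices), the cardioid $r = 1 + \cos\theta$ should enclose $\frac{3\pi}{2}$, and the unit circle traced clockwise as $x = \cos t$, $y = -\sin t$ should give π from either parametric integral.

```python
import math

def simpson(f, a, b, n=2000):
    """Composite Simpson's rule with an even number n of subintervals."""
    h = (b - a) / n
    total = f(a) + f(b) + sum((4 if i % 2 else 2) * f(a + i * h) for i in range(1, n))
    return total * h / 3

# Polar area: (1/2) * integral of r^2 for the cardioid r = 1 + cos(theta), 0 <= theta <= 2 pi
polar_area = 0.5 * simpson(lambda th: (1 + math.cos(th)) ** 2, 0.0, 2 * math.pi)

# Clockwise unit circle: x = cos t, y = -sin t, so x'(t) = -sin t and y'(t) = -cos t
area_y_dx = simpson(lambda t: (-math.sin(t)) * (-math.sin(t)), 0.0, 2 * math.pi)  # integral of y x'
area_x_dy = -simpson(lambda t: math.cos(t) * (-math.cos(t)), 0.0, 2 * math.pi)   # minus integral of x y'

print(polar_area, 3 * math.pi / 2)
print(area_y_dx, area_x_dy, math.pi)  # all three approximately equal
```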
Volumes of Solids of Revolution
If the region between the graph of a function f and an axis is revolved around that axis, a solid is generated. As each point on the graph is rotated about the axis, it traces out a circle, so each cross section perpendicular to the axis is a disk whose radius is the value of f at that point. We can calculate the volume of the solid formed using the following formulas:
For a function f revolved about the x-axis: $V = \int_a^b dV = \pi\int_a^b [f(x)]^2\,dx$.
For a function $x = g(y)$ revolved about the y-axis: $V = \int_c^d dV = \pi\int_c^d [g(y)]^2\,dy$.
In the case of a solid being formed with a gap between the defining function and the axis (in which each disk becomes a washer), that gap being defined by a second function g with $f(x) \ge g(x) \ge 0$, we have:
$$V = \pi\int_a^b \left\{[f(x)]^2 - [g(x)]^2\right\}dx.$$
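As a sanity check of the disk and washer formulas (a sketch; Simpson’s rule and the examples are our own choices): revolving $y = \sqrt{R^2 - x^2}$ about the x-axis should give a sphere of volume $\frac{4}{3}\pi R^3$, and revolving the region between $y = x$ and $y = x^2$ on $[0, 1]$ should give $\pi\left(\frac{1}{3} - \frac{1}{5}\right) = \frac{2\pi}{15}$.

```python
import math

def simpson(f, a, b, n=2000):
    """Composite Simpson's rule with an even number n of subintervals."""
    h = (b - a) / n
    total = f(a) + f(b) + sum((4 if i % 2 else 2) * f(a + i * h) for i in range(1, n))
    return total * h / 3

def disk_volume(f, a, b):
    """pi * integral of f(x)^2 dx: revolve y = f(x) about the x-axis."""
    return math.pi * simpson(lambda x: f(x) ** 2, a, b)

def washer_volume(f, g, a, b):
    """pi * integral of (f^2 - g^2) dx, assuming f(x) >= g(x) >= 0 on [a, b]."""
    return math.pi * simpson(lambda x: f(x) ** 2 - g(x) ** 2, a, b)

R = 2.0
sphere = disk_volume(lambda x: math.sqrt(R * R - x * x), -R, R)
print(sphere, 4 / 3 * math.pi * R ** 3)   # both approximately 33.5103

washer = washer_volume(lambda x: x, lambda x: x * x, 0.0, 1.0)
print(washer, 2 * math.pi / 15)           # both approximately 0.418879
```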
Arc Length
For a function $y = f(x)$, the length of any portion of the curve can be calculated as follows, the decision being whether to integrate along the x-axis (where x runs from a to b) or the y-axis (where y runs from c to d):
$$s = \int ds = \int_a^b \sqrt{1 + \left(\frac{dy}{dx}\right)^2}\,dx = \int_c^d \sqrt{1 + \left(\frac{dx}{dy}\right)^2}\,dy.$$
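For a check with a known answer (our choice of example): $y = \cosh x$ has $y' = \sinh x$ and $1 + \sinh^2 x = \cosh^2 x$, so its length on $[0, 1]$ is $\int_0^1 \cosh x\,dx = \sinh 1$. The sketch below evaluates the arc-length integral with Simpson’s rule.

```python
import math

def simpson(f, a, b, n=2000):
    """Composite Simpson's rule with an even number n of subintervals."""
    h = (b - a) / n
    total = f(a) + f(b) + sum((4 if i % 2 else 2) * f(a + i * h) for i in range(1, n))
    return total * h / 3

def arc_length(fprime, a, b):
    """Length of y = f(x) on [a, b]: integral of sqrt(1 + f'(x)^2) dx."""
    return simpson(lambda x: math.sqrt(1 + fprime(x) ** 2), a, b)

print(arc_length(math.sinh, 0.0, 1.0), math.sinh(1.0))  # both approximately 1.175201
```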
Series
An arithmetic series is the sum of a sequence of numbers taking the form $a_1, a_1 + d, a_1 + 2d, \ldots, a_1 + (n - 1)d$, with each term differing from the one before by the addition of a constant d. The sum of a finite arithmetic series can be computed easily by the following formula:
$$S_n = \frac{n}{2}\left(a_1 + a_n\right) = \frac{n}{2}\left[2a_1 + (n - 1)d\right],$$
where it should be clear that the average of the first and last terms of the sequence is being multiplied by the number of terms in the sequence.
A geometric series is the sum of a sequence of numbers taking the form $a_1, a_1 r, a_1 r^2, a_1 r^3, \ldots, a_1 r^{n-1}$, with each term differing from the one before by multiplication by a constant ratio r. The sum of a finite geometric series can be computed by the following formula:
$$S_n = \frac{a_1 - r a_n}{1 - r} = a_1\,\frac{1 - r^n}{1 - r}, \qquad \text{where } r \neq 1.$$
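The closed forms can be compared with brute-force summation. A sketch using exact fractions (helper names and sample values are our own):

```python
from fractions import Fraction

def arithmetic_sum(a1, d, n):
    """S_n = (n / 2)(2 a_1 + (n - 1) d)."""
    return Fraction(n, 2) * (2 * a1 + (n - 1) * d)

def geometric_sum(a1, r, n):
    """S_n = a_1 (1 - r^n) / (1 - r), assuming r != 1."""
    return a1 * (1 - r ** n) / (1 - r)

a1, d, n = 3, 4, 10
print(arithmetic_sum(a1, d, n), sum(a1 + k * d for k in range(n)))   # 210 210

a1, r = Fraction(5), Fraction(-1, 2)
print(geometric_sum(a1, r, n), sum(a1 * r ** k for k in range(n)))   # 1705/512 1705/512
```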
An infinite series converges if the sequence $(s_n)$ of its partial sums converges to a finite limit S. An infinite geometric series converges if and only if $|r| < 1$, in which case $\sum_{n=0}^{\infty} r^n = \frac{1}{1 - r}$. For an infinite series to converge, the terms must tend to zero as n tends to infinity. (The converse is not true, as illustrated by the harmonic series $\sum_{n=1}^{\infty} \frac{1}{n} = \infty$.)
• If $\sum a_n$ converges, then $\sum k a_n$ converges to $k\sum a_n$ for every constant k. If $\sum a_n$ diverges, then so does $\sum k a_n$ for every constant $k \neq 0$.
• If $\sum a_n$ and $\sum b_n$ both converge, then $\sum (a_n + b_n)$ converges to $\sum a_n + \sum b_n$.
• The so-called p-series $\sum \frac{1}{n^p}$ converges for every $p > 1$ and diverges for every $p \le 1$.
• The comparison test. Assume that $0 \le a_n \le b_n$ for all $n > N$. Then if $\sum b_n$ converges, $\sum a_n$ converges; if $\sum a_n$ diverges, then $\sum b_n$ diverges.
• The ratio test. Given a series $\sum a_n$ of positive terms, form the limit $L = \lim_{n\to\infty} \frac{a_{n+1}}{a_n}$. If $L < 1$, then $\sum a_n$ converges; if $L > 1$, then $\sum a_n$ diverges. The test is inconclusive if $L = 1$.
• The root test. Given a series $\sum a_n$ of nonnegative terms, form the limit $L = \lim_{n\to\infty} \sqrt[n]{a_n}$. If $L < 1$, then $\sum a_n$ converges; if $L > 1$, then $\sum a_n$ diverges. The test is inconclusive if $L = 1$.
• The integral test. If $f(x)$ is a positive, monotonically decreasing function for $x \ge 1$ such that $f(n) = a_n$ for every positive integer n, then $\sum_{n=1}^{\infty} a_n$ converges if and only if $\int_1^{\infty} f(x)\,dx$ converges.
• The alternating series test. A series of the form $\sum_{n=1}^{\infty} (-1)^{n+1} a_n$ (with $a_n \ge 0$ for all n) will converge provided that the terms $a_n$ decrease monotonically, with a limit of zero. Note that this conclusion depends on the alternating signs: a series of nonnegative terms that decrease monotonically to zero need not converge (the harmonic series is an example).
• A series $\sum a_n$ converges absolutely if $\sum |a_n|$ converges. Every absolutely convergent series is convergent. (If $\sum a_n$ converges but $\sum |a_n|$ does not, $\sum a_n$ is said to converge conditionally.)
A power series in x is an infinite series whose terms are $a_n x^n$, where each $a_n$ is a constant. (Such a series can loosely be thought of as an infinite-degree polynomial.) The convergence tests mentioned above are useful for the analysis of power series, which are only useful where they converge. By the ratio test, the series will converge absolutely if:
$$\lim_{n\to\infty}\left|\frac{a_{n+1}x^{n+1}}{a_n x^n}\right| = \lim_{n\to\infty}\left|\frac{a_{n+1}}{a_n}\right|\cdot|x| < 1.$$
We set $L = \lim_{n\to\infty}\left|\frac{a_{n+1}}{a_n}\right|$. If $L = 0$, then the power series converges for all x; if $L = \infty$, then the power series converges only for $x = 0$; and if L is positive and finite, then the power series converges absolutely for $|x| < \frac{1}{L}$ and diverges for $|x| > \frac{1}{L}$. The set of all values of x for which the series converges is called the interval of convergence. Every power series in x falls into one of three categories:
1. The series converges for all $x \in (-R, R)$ and diverges for all $|x| > R$, where R is called the radius of convergence and $R = \frac{1}{L}$, with L defined as above (when that limit exists). (Whether the series converges at the endpoints of the interval must be checked on a case-by-case basis.)
2. The power series converges absolutely for all x; the interval of convergence is $(-\infty, \infty)$, and the radius of convergence is ∞.
3. The series converges only for $x = 0$, so the radius of convergence is 0.
Functions are frequently defined by power series. If $f(x) = \sum_{n=0}^{\infty} a_n x^n$, the domain of f must be a subset of the interval of convergence for the series.
Within the interval of convergence of the power series (more precisely, on its interior, $(-R, R)$):
1. The function f is continuous, differentiable, and integrable.
2. The power series can be differentiated term by term:
$$f(x) = \sum_{n=0}^{\infty} a_n x^n \;\Rightarrow\; f'(x) = \sum_{n=0}^{\infty} \frac{d}{dx}\left(a_n x^n\right) = \sum_{n=1}^{\infty} n a_n x^{n-1}.$$
3. The power series can be integrated term by term:
$$f(x) = \sum_{n=0}^{\infty} a_n x^n \;\Rightarrow\; \int_0^x f(t)\,dt = \sum_{n=0}^{\infty} \int_0^x a_n t^n\,dt = \sum_{n=0}^{\infty} \frac{a_n}{n + 1}x^{n+1}.$$
If a function f can be represented by a power series, that series is the Taylor series:
$$f(x) = \sum_{n=0}^{\infty} \frac{f^{(n)}(0)}{n!}x^n.$$
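| Function | Partial Taylor expansion | Summation form | Interval of convergence |
|---|---|---|---|
| $\frac{1}{1 - x}$ | $1 + x + x^2 + x^3 + \cdots$ | $\sum_{n=0}^{\infty} x^n$ | $(-1, 1)$ |
| $\frac{1}{1 + x}$ | $1 - x + x^2 - x^3 + \cdots$ | $\sum_{n=0}^{\infty} (-1)^n x^n$ | $(-1, 1)$ |
| $\ln(1 + x)$ | $x - \frac{x^2}{2} + \frac{x^3}{3} - \cdots$ | $\sum_{n=1}^{\infty} (-1)^{n+1}\frac{x^n}{n}$ | $(-1, 1]$ |
| $e^x$ | $1 + x + \frac{x^2}{2!} + \frac{x^3}{3!} + \cdots$ | $\sum_{n=0}^{\infty} \frac{x^n}{n!}$ | $(-\infty, \infty)$ |
| $\sin x$ | $x - \frac{x^3}{3!} + \frac{x^5}{5!} - \cdots$ | $\sum_{n=0}^{\infty} (-1)^n\frac{x^{2n+1}}{(2n+1)!}$ | $(-\infty, \infty)$ |
| $\cos x$ | $1 - \frac{x^2}{2!} + \frac{x^4}{4!} - \cdots$ | $\sum_{n=0}^{\infty} (-1)^n\frac{x^{2n}}{(2n)!}$ | $(-\infty, \infty)$ |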
A Taylor series can be truncated to form a Taylor polynomial for use in approximating the value of a function. A Taylor polynomial of degree n,
$$P_n(x) = \sum_{k=0}^{n} \frac{f^{(k)}(0)}{k!}x^k,$$
will have an error, called the remainder, exactly equal to
$$f(x) - P_n(x) = \frac{f^{(n+1)}(c)}{(n+1)!}x^{n+1}$$
for some c between 0 and x.
The Taylor series mentioned above is a special case of a more general series in powers of $(x - a)$, which has Taylor coefficients $a_n = \frac{f^{(n)}(a)}{n!}$. The remainder of a Taylor polynomial built on such a series would be equal to
$$f(x) - P_n(x) = \frac{f^{(n+1)}(c)}{(n+1)!}(x - a)^{n+1}$$
for some c between a and x.
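To see the remainder formula at work, the sketch below (`taylor_exp` is our own helper) compares Taylor polynomials of $e^x$ about 0 with the true value at $x = 1$. Every derivative of $e^x$ is $e^x$, so the remainder is $\frac{e^c}{(n+1)!}x^{n+1}$ for some c between 0 and x; for $x > 0$ the error is therefore positive and at most $\frac{e^x x^{n+1}}{(n+1)!}$.

```python
import math

def taylor_exp(x, n):
    """Degree-n Taylor polynomial of e^x about 0."""
    return sum(x ** k / math.factorial(k) for k in range(n + 1))

x = 1.0
for n in (2, 4, 6, 8):
    error = math.exp(x) - taylor_exp(x, n)
    # the remainder e^c x^(n+1) / (n+1)! with 0 < c < x is at most e^x x^(n+1) / (n+1)!
    bound = math.exp(x) * x ** (n + 1) / math.factorial(n + 1)
    print(n, error, bound, 0 < error <= bound)   # the last column should be True for every n
```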
Multivariable Calculus and Vector Analysis
In three dimensions, it is often convenient to think in terms of vectors. Vectors are often represented by an ordered triple or as a sum of vector components. Whereas an ordered triple could be regarded as a point in $\mathbb{R}^3$, the vector it represents corresponds to the directed line segment from the origin to this point. Component notation, which would represent the vector connecting the origin to the point $(x, y, z)$ as $x\hat{\mathbf{i}} + y\hat{\mathbf{j}} + z\hat{\mathbf{k}}$, makes use of orthogonal unit vectors. Vectors are added and subtracted component-wise.
The norm of a vector is its magnitude, or length. For a vector $\mathbf{A} = a_x\hat{\mathbf{i}} + a_y\hat{\mathbf{j}} + a_z\hat{\mathbf{k}}$, $\|\mathbf{A}\| = A = \sqrt{a_x^2 + a_y^2 + a_z^2}$.
There are two ways to multiply vectors. One, the dot (or scalar) product, results in a scalar. For two vectors, $\mathbf{A} = a_x\hat{\mathbf{i}} + a_y\hat{\mathbf{j}} + a_z\hat{\mathbf{k}}$ and $\mathbf{B} = b_x\hat{\mathbf{i}} + b_y\hat{\mathbf{j}} + b_z\hat{\mathbf{k}}$, the dot product is equal to $\mathbf{A}\cdot\mathbf{B} = AB\cos\theta = a_x b_x + a_y b_y + a_z b_z$, where θ is the angle between the two vectors. Since the dot product can be computed component-wise, it is not necessary to know the angle between the vectors, and the dot product can be useful when the angle is sought. For any two vectors perpendicular to each other, the dot product is zero. The dot product is commutative: $\mathbf{A}\cdot\mathbf{B} = \mathbf{B}\cdot\mathbf{A}$.
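The projection of B onto A, denoted $\operatorname{proj}_{\mathbf{A}}\mathbf{B}$, which can be thought of as B’s shadow cast on A by light shining perpendicular to A, is:
$$\operatorname{proj}_{\mathbf{A}}\mathbf{B} = \left(\mathbf{B}\cdot\hat{\mathbf{A}}\right)\hat{\mathbf{A}} = \frac{\mathbf{A}\cdot\mathbf{B}}{\mathbf{A}\cdot\mathbf{A}}\,\mathbf{A}.$$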
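The cross (or vector) product is the other way to multiply vectors. The result is a vector perpendicular to each of the vectors being multiplied, following the right-hand rule. For two vectors $\mathbf{A} = a_x\hat{\mathbf{i}} + a_y\hat{\mathbf{j}} + a_z\hat{\mathbf{k}}$ and $\mathbf{B} = b_x\hat{\mathbf{i}} + b_y\hat{\mathbf{j}} + b_z\hat{\mathbf{k}}$ the cross product is equal to:
$$\mathbf{A}\times\mathbf{B} = \begin{vmatrix} \hat{\mathbf{i}} & \hat{\mathbf{j}} & \hat{\mathbf{k}} \\ a_x & a_y & a_z \\ b_x & b_y & b_z \end{vmatrix},$$
and $\|\mathbf{A}\times\mathbf{B}\| = AB\sin\theta$. This magnitude is equal to the area of the parallelogram spanned by A and B. The cross product is anticommutative: $\mathbf{A}\times\mathbf{B} = -(\mathbf{B}\times\mathbf{A})$.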
The triple scalar product, the absolute value of which is equal to the volume of a parallelepiped formed by three vectors A, B, and C, is equal to:
$$[\mathbf{A}, \mathbf{B}, \mathbf{C}] = (\mathbf{A}\times\mathbf{B})\cdot\mathbf{C} = \begin{vmatrix} a_x & a_y & a_z \\ b_x & b_y & b_z \\ c_x & c_y & c_z \end{vmatrix}.$$
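The vector formulas above translate directly into code. A minimal sketch using plain tuples (helper names are our own), with the cross product written out from the cofactor expansion of the determinant:

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def cross(a, b):
    """Cofactor expansion of the determinant with rows (i, j, k), a, b."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def norm(a):
    return math.sqrt(dot(a, a))

def proj(b, a):
    """Projection of b onto a: ((a . b) / (a . a)) a."""
    scale = dot(a, b) / dot(a, a)
    return tuple(scale * x for x in a)

A, B, C = (1, 2, 3), (4, 5, 6), (0, 1, 1)
print(dot(A, B))                                  # 32
print(cross(A, B), cross(B, A))                   # (-3, 6, -3) (3, -6, 3): anticommutative
print(dot(cross(A, B), A), dot(cross(A, B), B))   # 0 0: perpendicular to both factors
print(proj(B, A))                                 # (32/14) * A, approximately (2.2857, 4.5714, 6.8571)
print(dot(cross(A, B), C))                        # 3: triple product; |3| is the parallelepiped volume
```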
Lines
Just as a slope and a point define a line in $\mathbb{R}^2$, a point and a vector can define a line in $\mathbb{R}^3$. Specifically, if we have a point, $P_0 = (x_0, y_0, z_0)$, on a line L and a vector, $\mathbf{v} = (v_1, v_2, v_3)$, parallel to the line, we can say that any point $P = (x, y, z)$ is on the line if and only if the vector from $P_0$ to that point, $\overrightarrow{P_0P} = (x - x_0, y - y_0, z - z_0)$, is parallel to v, which is to say that $(x - x_0, y - y_0, z - z_0) = t(v_1, v_2, v_3) = t\mathbf{v}$ for some scalar t (note above the order of the points when subtracting). This is equivalent to three parametric equations:
$$L: \quad x = x_0 + tv_1, \qquad y = y_0 + tv_2, \qquad z = z_0 + tv_3.$$
These can each be solved for t and equated to yield the symmetric equations of L:
$$\frac{x - x_0}{v_1} = \frac{y - y_0}{v_2} = \frac{z - z_0}{v_3}.$$
In the case that any component of v is equal to zero, the corresponding coordinate is constant along L (for example, $z = z_0$ if $v_3 = 0$); that term is dropped from the chain of equalities and stated as a separate equation.
Planes
Planes in $\mathbb{R}^3$ are uniquely defined by a point, $P_0 = (x_0, y_0, z_0)$, and a vector, n, normal to the plane. Since any line in the plane must be perpendicular to the normal vector, we have a clue as to the form that the equation of a plane must take. Specifically, the point $P = (x, y, z)$ is on the plane if and only if
$$\overrightarrow{P_0P}\cdot\mathbf{n} = (x - x_0)n_1 + (y - y_0)n_2 + (z - z_0)n_3 = 0,$$
which can be rewritten as $n_1 x + n_2 y + n_3 z = d$, where $d = n_1 x_0 + n_2 y_0 + n_3 z_0$.
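A common use of the point-normal form: given three non-collinear points, take the cross product of two edge vectors as n and compute d from any one of the points. A sketch under that assumption (helper names hypothetical; any nonzero multiple of n describes the same plane):

```python
def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def plane_through(p0, p1, p2):
    """Return (n, d) with n1 x + n2 y + n3 z = d for three non-collinear points."""
    u = tuple(b - a for a, b in zip(p0, p1))     # edge vector P0 -> P1
    v = tuple(b - a for a, b in zip(p0, p2))     # edge vector P0 -> P2
    n = cross(u, v)                              # normal to both edges, hence to the plane
    d = sum(ni * xi for ni, xi in zip(n, p0))    # d = n1 x0 + n2 y0 + n3 z0
    return n, d

def on_plane(n, d, p):
    return sum(ni * xi for ni, xi in zip(n, p)) == d

n, d = plane_through((1, 0, 0), (0, 2, 0), (0, 0, 3))
print(n, d)                                             # (6, 3, 2) 6, i.e. 6x + 3y + 2z = 6
print(on_plane(n, d, (0, 2, 0)), on_plane(n, d, (1, 1, 1)))   # True False
```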
Cylinders
We can take a curve $C \subset \mathbb{R}^2$ and extend it into $\mathbb{R}^3$ as a cylinder (which appears to maintain this name even if C is not a closed curve). We do this by connecting to each point in C a line parallel to some given vector v, referred to as the generator. In the case that this line is perpendicular to the plane of the curve, this extension is automatic and leaves the original equation unchanged. (Since the equation is in $\mathbb{R}^2$, it is missing the variable, say z, that would make it an equation in $\mathbb{R}^3$. That missing variable can then take on all values once the curve is considered in $\mathbb{R}^3$, and the curve is converted into a right cylinder.)
In the case that the line is not perpendicular to the plane of the curve, we recall the parametric and symmetric equations of the line from above in order to generate our equation for the cylinder. Without loss of generality, we consider a curve in the x-y plane, $y = f(x)$, and a generator $\mathbf{v} = (v_1, v_2, v_3)$ with $v_3 \neq 0$. We say that a point on this curve in $\mathbb{R}^3$ is $(x_0, y_0, 0)$, where we note that no relationship for z is described by the equation of the curve. Any line from a point on that curve will have to obey the following relationship:
$$x - x_0 = v_1 t, \qquad y - y_0 = v_2 t, \qquad z = v_3 t.$$
We can solve these equations for t and equate them, getting:
$$\frac{x - x_0}{v_1} = \frac{y - y_0}{v_2} = \frac{z}{v_3}.$$
These equations can be solved for the “naught values,” one at a time, getting:
$$x_0 = x - \frac{v_1}{v_3}z, \qquad y_0 = y - \frac{v_2}{v_3}z.$$
Recall that these are the points on the curve, so the equation of the cylinder in $\mathbb{R}^3$ becomes:
$$y - \frac{v_2}{v_3}z = f\left(x - \frac{v_1}{v_3}z\right).$$
Note the pattern in the naught values above. Note also that $y = f(x)$ was an arbitrary choice. We could just as easily have had our function be $f(x, y) = c$ or considered a curve in the y-z plane or the x-z plane.
Surfaces of Revolution
A curve in $\mathbb{R}^2$ can be rotated about an axis to form a surface in $\mathbb{R}^3$. Each point on the original curve will trace out a circle around the axis about which it’s rotating. The distance to that axis can be found by the distance formula. This is the basis for the following substitutions, which are used to obtain equations for the surface:

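| Curve | Revolved around | Replace | by | Obtaining |
|---|---|---|---|---|
| $f(x, y) = 0$ | x-axis | y | $\pm\sqrt{y^2 + z^2}$ | $f\left(x, \pm\sqrt{y^2 + z^2}\right) = 0$ |
| $f(x, y) = 0$ | y-axis | x | $\pm\sqrt{x^2 + z^2}$ | $f\left(\pm\sqrt{x^2 + z^2}, y\right) = 0$ |
| $f(x, z) = 0$ | x-axis | z | $\pm\sqrt{y^2 + z^2}$ | $f\left(x, \pm\sqrt{y^2 + z^2}\right) = 0$ |
| $f(x, z) = 0$ | z-axis | x | $\pm\sqrt{x^2 + y^2}$ | $f\left(\pm\sqrt{x^2 + y^2}, z\right) = 0$ |
| $f(y, z) = 0$ | y-axis | z | $\pm\sqrt{x^2 + z^2}$ | $f\left(y, \pm\sqrt{x^2 + z^2}\right) = 0$ |
| $f(y, z) = 0$ | z-axis | y | $\pm\sqrt{x^2 + y^2}$ | $f\left(\pm\sqrt{x^2 + y^2}, z\right) = 0$ |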

Cylindrical and Spherical Coordinates
Cylindrical coordinates are an extension of polar coordinates to $\mathbb{R}^3$ by adding the familiar z coordinate. The Cartesian coordinates $(x, y, z)$ are transformed to $(r, \theta, z) = \left(\sqrt{x^2 + y^2},\ \arctan\frac{y}{x},\ z\right)$, where we repeat the warning about the non-uniqueness of the angle θ.
Spherical coordinates are closer in spirit to polar coordinates than even cylindrical coordinates. They take the form $(\rho, \phi, \theta)$, where ρ is the length of the radius vector from the origin, φ is the angle the radius vector makes with the positive z-axis, and θ is the angle between the positive x-axis and the projection (or shadow, as it may be helpful to think) of the radius vector on the x-y plane, r.
The following relations hold:
$$\rho^2 = x^2 + y^2 + z^2$$
$$x = r\cos\theta = \rho\sin\phi\cos\theta$$
$$y = r\sin\theta = \rho\sin\phi\sin\theta$$
$$z = \rho\cos\phi$$
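These relations give a round trip between Cartesian and spherical coordinates. In this sketch (function names are our own) `math.acos` returns φ in $[0, \pi]$, and `math.atan2` is used in place of $\arctan(y/x)$ to pick θ in the correct quadrant; both are choices of this sketch rather than part of the text.

```python
import math

def spherical_to_cartesian(rho, phi, theta):
    """phi is measured from the +z axis, theta from the +x axis in the x-y plane."""
    r = rho * math.sin(phi)    # length of the radius vector's shadow on the x-y plane
    return r * math.cos(theta), r * math.sin(theta), rho * math.cos(phi)

def cartesian_to_spherical(x, y, z):
    rho = math.sqrt(x * x + y * y + z * z)
    phi = math.acos(z / rho) if rho else 0.0   # angle from the +z axis, in [0, pi]
    theta = math.atan2(y, x)                   # quadrant-aware angle, in (-pi, pi]
    return rho, phi, theta

point = (1.0, -2.0, 2.0)
rho, phi, theta = cartesian_to_spherical(*point)
print(rho)                                       # 3.0
print(spherical_to_cartesian(rho, phi, theta))   # approximately (1.0, -2.0, 2.0)
```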

Tangent Plane to a Surface
For a function $z = f(x, y)$ describing a surface, the equation of the plane tangent to that surface at a point $P = (x_0, y_0, z_0)$ is:
$$z - z_0 = \left.\frac{\partial z}{\partial x}\right|_P (x - x_0) + \left.\frac{\partial z}{\partial y}\right|_P (y - y_0).$$
In the case that the surface is described implicitly by $f(x, y, z) = c$, we find the above partial derivatives by implicit differentiation. (There is another option as well, which we’ll discuss in the section on directional derivatives.) Equations for tangent planes can be used for linear approximations, analogous to tangent lines.
Chain Rule for Partial Derivatives
To use the chain rule for derivatives of multivariate functions, care must be taken to account for every “path” to the variable of interest from the others. For example, consider the function $z = F(x, u, v)$, where $u = f(x, y)$ and $v = g(x, y)$. Let’s say that we want to calculate the partial derivative of z with respect to x. We notice that there are three paths to x, one directly from F and one each from u and v. We then have:
$$\left(\frac{\partial z}{\partial x}\right)_y = \left(\frac{\partial z}{\partial x}\right)_{u,v} + \left(\frac{\partial z}{\partial u}\right)_{x,v}\left(\frac{\partial u}{\partial x}\right)_y + \left(\frac{\partial z}{\partial v}\right)_{x,u}\left(\frac{\partial v}{\partial x}\right)_y.$$
The subscripts are a bit of bookkeeping, reminding us which variables are treated as
constants in the partial differentiation steps.
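The main differential operator on $\mathbb{R}^3$ is del, denoted ∇, which is equal to:
$$\nabla = \hat{\mathbf{i}}\frac{\partial}{\partial x} + \hat{\mathbf{j}}\frac{\partial}{\partial y} + \hat{\mathbf{k}}\frac{\partial}{\partial z}.$$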
Applied to a scalar function f, del produces a vector called the gradient, ∇f; it can also operate on vector fields through the divergence and curl. All three are discussed below.
Consider a surface $z = f(x, y)$ and a point $P = (x_0, y_0)$ in the domain of f. If we want to find the rate of change of the surface f, we must first give thought to the direction we’re heading. (Think about standing on a rocky hill. Its rate of change is not uniform; it will be different depending on whether we head forward or back, left or right.) The directional derivative takes into account both the point we’re interested in and the direction we’re heading, which we indicate with the unit vector $\hat{\mathbf{u}}$:
$$D_{\hat{\mathbf{u}}} f(P) = \nabla f(P)\cdot\hat{\mathbf{u}}.$$
Examining this equation and noting the employment of the dot product, we know that $D_{\hat{\mathbf{u}}} f(P) = \|\nabla f(P)\|\,\|\hat{\mathbf{u}}\|\cos\theta$, where θ is the angle between the unit vector and the gradient. Clearly, the norm of any unit vector is one, so this reduces to
$$D_{\hat{\mathbf{u}}} f(P) = \|\nabla f(P)\|\cos\theta,$$
which tells us that the directional derivative attains its maximum value when $\cos\theta = 1$, which is when $\theta = 0$. This tells us that the gradient, ∇f, points in the direction in which f increases most rapidly, and its magnitude gives the maximum rate of increase.
26
Similarly, points in the direction in which f decreases most rapidly. Examining a
level surface reveals that the gradient will equal zero at any point on that
level surface. This tells us that for a function , the vector
f −∇
( , , ) f x y z c =
( , , ) f x y z
P
f ∇ is perpendicular
to the level surface of f that contains P. Similarly, for a function , the vector ( , ) f x y
P
f ∇ is perpendicular to the level curve of f that contains P. This fact gives us another
option for writing the equation of the tangent plane at : ( )
0 0 0
, , P x y z =

0 0
( ) ( ) ( )
x y z
P P
P
f x x f y y f z z ⋅ − + ⋅ − + ⋅ − =
0
0 .
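A small numerical sketch of $D_{\hat{\mathbf{u}}} f = \nabla f \cdot \hat{\mathbf{u}}$ follows, in Python. The surface $f(x, y) = x^2 + 3xy$, the point $(1, 2)$, and the use of central differences to approximate the gradient are all illustrative assumptions.

```python
import math

# Hypothetical example surface (not from the text): f(x, y) = x**2 + 3*x*y
def f(x, y):
    return x ** 2 + 3 * x * y

def gradient(func, x, y, h=1e-6):
    # central-difference approximation of (f_x, f_y) at (x, y)
    fx = (func(x + h, y) - func(x - h, y)) / (2 * h)
    fy = (func(x, y + h) - func(x, y - h)) / (2 * h)
    return fx, fy

def directional_derivative(func, x, y, direction):
    # normalize the direction so that it is a unit vector u-hat
    dx, dy = direction
    norm = math.hypot(dx, dy)
    ux, uy = dx / norm, dy / norm
    fx, fy = gradient(func, x, y)
    return fx * ux + fy * uy  # grad f . u-hat

# At P = (1, 2) the exact gradient is (2x + 3y, 3x) = (8, 3).
print(directional_derivative(f, 1.0, 2.0, (3, 4)))  # about 36/5 = 7.2
print(directional_derivative(f, 1.0, 2.0, (8, 3)))  # about sqrt(73): the maximum rate
```

Heading along the gradient itself gives the largest value, $\|\nabla f\| = \sqrt{73}$, and a direction perpendicular to it, such as $(-3, 8)$, gives zero, matching the level-curve argument above.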
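For completeness, we’ll briefly mention the divergence and curl of a vector field
$\mathbf{F} = (F_1, F_2, F_3)$. The divergence, a scalar quantity, can be interpreted as the net outward
flow (flux) per unit volume at a point. It is defined as follows:
$$\operatorname{div}\mathbf{F} = \nabla\cdot\mathbf{F} = \frac{\partial F_1}{\partial x} + \frac{\partial F_2}{\partial y} + \frac{\partial F_3}{\partial z}.$$
The curl, a vector quantity, can be interpreted as the rotation of a vector field. Its magnitude
equals the maximum circulation per unit area at each point, and it is oriented perpendicularly to the plane of
that circulation, following the right-hand rule. It is defined as follows:
$$\operatorname{curl}\mathbf{F} = \nabla\times\mathbf{F} = \begin{vmatrix} \hat{\mathbf{i}} & \hat{\mathbf{j}} & \hat{\mathbf{k}} \\ \dfrac{\partial}{\partial x} & \dfrac{\partial}{\partial y} & \dfrac{\partial}{\partial z} \\ F_1 & F_2 & F_3 \end{vmatrix}.$$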
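To make the two definitions concrete, here is a Python sketch that approximates the divergence and curl by central differences. The field $\mathbf{F} = (xy,\ yz,\ zx)$ and the helper name `partial` are hypothetical choices for illustration. For this field, $\nabla\cdot\mathbf{F} = x + y + z$ and $\nabla\times\mathbf{F} = (-y, -z, -x)$.

```python
# Hypothetical field (not from the text): F = (x*y, y*z, z*x).
def F(x, y, z):
    return (x * y, y * z, z * x)

def partial(func, i, j, p, h=1e-6):
    # approximate d F_i / d x_j at point p by central differences (0-based indices)
    plus, minus = list(p), list(p)
    plus[j] += h
    minus[j] -= h
    return (func(*plus)[i] - func(*minus)[i]) / (2 * h)

def divergence(func, p):
    # dF1/dx + dF2/dy + dF3/dz
    return sum(partial(func, k, k, p) for k in range(3))

def curl(func, p):
    d = lambda i, j: partial(func, i, j, p)
    return (d(2, 1) - d(1, 2),   # dF3/dy - dF2/dz
            d(0, 2) - d(2, 0),   # dF1/dz - dF3/dx
            d(1, 0) - d(0, 1))   # dF2/dx - dF1/dy

print(divergence(F, (1.0, 2.0, 3.0)))  # about 6, i.e., x + y + z
print(curl(F, (1.0, 2.0, 3.0)))        # about (-2, -3, -1), i.e., (-y, -z, -x)
```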
Maxima and Minima
Just as with single-variable functions, multivariable functions attain critical points where
their first partial derivatives all equal zero (or fail to exist). Critical points are also not necessarily maxima or minima in
multivariable functions. Saddle points, which are critical points that are neither maxima
nor minima, can exist for a function $z = f(x, y)$ at a point $P_0 = (x_0, y_0)$ if
$z = f(x, y_0)$ has a maximum at $x = x_0$ while $z = f(x_0, y)$ has a minimum at $y = y_0$ (or vice versa).
We can evaluate the possibilities by computing the Hessian, the determinant of the
Hessian matrix:
$$\Delta = \det\begin{bmatrix} f_{xx} & f_{xy} \\ f_{yx} & f_{yy} \end{bmatrix}_{P_0} = f_{xx}(P_0)\,f_{yy}(P_0) - \left[f_{xy}(P_0)\right]^2,$$
where we make use of the fact that $f_{xy} = f_{yx}$ for functions whose second partial derivatives are continuous.
• If $\Delta > 0$ and $f_{xx}(P_0) < 0$, then f attains a local maximum at $P_0$.
• If $\Delta > 0$ and $f_{xx}(P_0) > 0$, then f attains a local minimum at $P_0$.
• If $\Delta < 0$, then f has a saddle point at $P_0$.
• If $\Delta = 0$, then no conclusion can be drawn.
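The four cases above translate directly into a small decision function. This Python sketch assumes the second partial derivatives have already been evaluated at the critical point; the function name is hypothetical.

```python
def classify_critical_point(fxx, fyy, fxy):
    """Second-derivative test at a critical point P0, given f_xx(P0), f_yy(P0), f_xy(P0)."""
    delta = fxx * fyy - fxy ** 2   # determinant of the Hessian matrix
    if delta > 0:
        # here f_xx cannot be 0, because f_xx * f_yy > f_xy**2 >= 0
        return "local maximum" if fxx < 0 else "local minimum"
    if delta < 0:
        return "saddle point"
    return "inconclusive"

# f(x, y) = x**2 - y**2 at the origin: f_xx = 2, f_yy = -2, f_xy = 0
print(classify_critical_point(2, -2, 0))    # saddle point
# f(x, y) = -(x**2 + y**2) at the origin: f_xx = f_yy = -2, f_xy = 0
print(classify_critical_point(-2, -2, 0))   # local maximum
```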
When solving maxima and minima problems subject to a constraint, the constraint
equation can be solved for one of the variables and substituted into the function to be optimized,
which is then differentiated, set equal to zero, and solved as usual for critical points. These
points, along with the endpoints of any intervals given, must be tested to find the
extrema.
The Lagrange Multiplier
In some cases, the constraint equation will not be easy to solve for one of the variables.
In such a case (and in others), it can help to employ the method of the Lagrange
multiplier, which takes advantage of several interesting facts. Let’s say we’re looking to
maximize $f(x, y)$ subject to a constraint $g(x, y) = c$. If M is an extreme value of f,
attained at the point $P = (x_0, y_0)$, then the level curve $f(x, y) = M$ and the curve
$g(x, y) = c$ share the same tangent line at P. Because they share the tangent line, they
also share the normal line at P. Since $\nabla f$ is normal to the curve $f(x, y) = M$ and $\nabla g$ is
normal to the curve $g(x, y) = c$, the gradients must be parallel, which is to say that one is
a scalar multiple of the other. That is, $\nabla f = \lambda\nabla g$ for some scalar λ, called the Lagrange
multiplier, whose value is unimportant in itself, although we will solve for it. This
assumes, of course, that $\nabla g$ is not the zero vector at P.
The gradients are computed, and the resulting components on the left and right
side are equated and treated as simultaneous equations that are solved for λ to find the
necessary relationship between the independent variables. (The parameter λ disappears
at this step.) The new relationship is substituted into the constraint equation $g(x, y) = c$,
which is then solved for the remaining variable. Critical values are then obtained and
tested.
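To see these steps on a concrete case (the function and constraint here are illustrative assumptions, not from the text), maximize $f(x, y) = xy$ subject to $g(x, y) = x + y = 10$:
$$\begin{aligned}
\nabla f &= (y,\ x), \qquad \nabla g = (1,\ 1), \\
\nabla f = \lambda\nabla g \;&\Rightarrow\; y = \lambda \text{ and } x = \lambda \;\Rightarrow\; x = y, \\
x + y = 10 \;&\Rightarrow\; x = y = 5, \qquad f(5, 5) = 25.
\end{aligned}$$
Along the constraint, $f = x(10 - x)$ is a downward-opening parabola, so this single critical point is the maximum.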
Line Integrals
The standard definite integral computes the area between a curve and a straight-line
segment of one of the coordinate axes. But integration can follow more complicated
paths, giving rise to the line (or path or contour) integral.
Line Integrals with Respect to Arc Length
This is a close relative of the standard Riemann integral, in that the value of a function at
a point on a curve is multiplied by the length of a small piece of the curve containing that point. Imagine, for
instance, a function $f(x, y)$ and a curve C, which is defined parametrically. If we break
the curve into n tiny pieces, with $(x_i, y_i)$ a point in the ith piece and $\Delta s_i$ its length, we have
$$\lim_{n\to\infty}\sum_{i=1}^{n} f(x_i, y_i)\,\Delta s_i = \int_C f\,ds.$$
Geometrically, this is
equivalent to calculating the area of a curtain with C as its base and $f(x, y)$ as its height
for each point $(x, y)$ along C. The integral is evaluated by first parameterizing C:
$$C:\ \begin{cases} x = x(t) \\ y = y(t) \end{cases} \qquad \text{for } a \le t \le b.$$
The curve C is considered to be directed, meaning that it has a definite initial and final
point. (This detail is similar to the idea that $\int_a^b f(x)\,dx = -\int_b^a f(x)\,dx$.) Since for a
differential treatment of the curve we have $(ds)^2 = (dx)^2 + (dy)^2$, we can write:
$$\frac{ds}{dt} = \pm\sqrt{\left(\frac{dx}{dt}\right)^2 + \left(\frac{dy}{dt}\right)^2} = \pm\sqrt{[x'(t)]^2 + [y'(t)]^2},$$
where the sign we choose depends on the direction we’re taking along our path: it is chosen so
that ds is positive, so with t increasing from a to b, as here, we take the positive root. We then
have
$$\int_C f\,ds = \int_a^b f\big(x(t), y(t)\big)\,\frac{ds}{dt}\,dt.$$
The treatment in $\mathbb{R}^3$ is identical, as
$$\int_C f\,ds = \int_a^b f\big(x(t), y(t), z(t)\big)\,\frac{ds}{dt}\,dt,$$
where
$$\frac{ds}{dt} = \pm\sqrt{\left(\frac{dx}{dt}\right)^2 + \left(\frac{dy}{dt}\right)^2 + \left(\frac{dz}{dt}\right)^2} = \pm\sqrt{[x'(t)]^2 + [y'(t)]^2 + [z'(t)]^2}.$$
In $\mathbb{R}^3$, we can interpret the line integral using the example of C as a curved wire with
linear density $f(x, y, z)$. The line integral of f over C would give the total mass of the
wire.
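The parameterized formula $\int_C f\,ds = \int_a^b f(x(t), y(t))\,(ds/dt)\,dt$ can be checked numerically. The Python sketch below uses the midpoint rule and assumes the derivative functions $x'(t)$ and $y'(t)$ are supplied; the example integrand $f = xy$ on the quarter unit circle is a hypothetical choice whose exact value is $\int_0^{\pi/2}\cos t\sin t\,dt = 1/2$.

```python
import math

def line_integral_ds(f, x, y, dx, dy, a, b, n=10_000):
    """Midpoint-rule approximation of the integral of f ds over C: (x(t), y(t)), a <= t <= b.

    dx and dy are the derivative functions x'(t) and y'(t); ds/dt is the positive root.
    """
    h = (b - a) / n
    total = 0.0
    for i in range(n):
        t = a + (i + 0.5) * h
        speed = math.hypot(dx(t), dy(t))   # ds/dt = sqrt(x'(t)**2 + y'(t)**2)
        total += f(x(t), y(t)) * speed * h
    return total

# Hypothetical example: f(x, y) = x*y along x = cos t, y = sin t, 0 <= t <= pi/2.
result = line_integral_ds(lambda x, y: x * y, math.cos, math.sin,
                          lambda t: -math.sin(t), math.cos, 0.0, math.pi / 2)
print(result)  # close to 0.5
```

Setting $f = 1$ recovers the arc length of C itself ($\pi/2$ here), which is a handy check on any parameterization.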
The Line Integral of a Vector Field
A parametric curve given by the vector equation $\mathbf{r} = \mathbf{r}(t) = \big(x(t), y(t)\big)$ is considered
smooth if the derivative $\mathbf{r}'(t)$ is continuous and nonzero. A curve is considered
piecewise smooth if it is composed of finitely many smooth curves joined at consecutive
endpoints. A vector field is a vector-valued function. Let D be a region of the xy-plane
on which a pair of continuous functions, $M(x, y)$ and $N(x, y)$, are both defined. Then
the function
$$\mathbf{F}(x, y) = M(x, y)\,\hat{\mathbf{i}} + N(x, y)\,\hat{\mathbf{j}} = \big(M(x, y), N(x, y)\big)$$
is a continuous vector field on D. For a curve C parameterized by $\mathbf{r}(t) = x(t)\,\hat{\mathbf{i}} + y(t)\,\hat{\mathbf{j}} = \big(x(t), y(t)\big)$ for $a \le t \le b$,
we define the line integral of the vector field F as:
$$\begin{aligned}
\int_C \mathbf{F}\cdot d\mathbf{r} &= \int_a^b \mathbf{F}\big(\mathbf{r}(t)\big)\cdot\mathbf{r}'(t)\,dt \\
&= \int_a^b \mathbf{F}\big(x(t), y(t)\big)\cdot\big(x'(t), y'(t)\big)\,dt \\
&= \int_a^b \Big[M\big(x(t), y(t)\big)\,x'(t) + N\big(x(t), y(t)\big)\,y'(t)\Big]\,dt.
\end{aligned}$$
The situation in $\mathbb{R}^3$ is defined analogously. If we consider F to be a force field, the line
integral can be interpreted as the work done by the field on a particle moving along the path C.
In the case that F is equal to the gradient of a scalar field (i.e., a real-valued
function), then it is a gradient field. A function f such that $\mathbf{F} = \nabla f$ is called a potential
for F. The fundamental theorem of calculus for line integrals says that the line
integral of a gradient field depends only on the endpoints of the path and not the path
itself. For any piecewise smooth curve C oriented from A to B and a continuously
differentiable function f defined on C, we have
$$\int_C \nabla f\cdot d\mathbf{r} = f(B) - f(A).$$
For $\mathbf{F}(x, y) = M(x, y)\,\hat{\mathbf{i}} + N(x, y)\,\hat{\mathbf{j}}$ (with M and N having continuous first partial derivatives) to be a gradient field, it is necessary
(but not sufficient) that $\partial M/\partial y = \partial N/\partial x$. This simply states that $f_{xy} = f_{yx}$, which will be
the case when f has continuous second partial derivatives. If the domain of F, which we’ll call R, is a simply
connected region (meaning that the interior of every simple closed curve drawn in R is
contained in R), then the above mixed-partials equality condition is sufficient.
It should be clear that for a gradient field F,
$$\oint_C \mathbf{F}\cdot d\mathbf{r} = 0$$
for any closed path C
(where the circle on the integral symbol indicates a closed path).
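The following Python sketch evaluates $\int_C M\,dx + N\,dy$ with the midpoint rule to illustrate both path independence and its failure. The fields and paths are hypothetical examples: $\mathbf{F} = (y, x) = \nabla(xy)$ is a gradient field, so two different paths from $(0, 0)$ to $(1, 1)$ should both give $f(1, 1) - f(0, 0) = 1$; $\mathbf{F} = (-y, x)$ fails the mixed-partials test, and its integral around the unit circle is $2\pi$ rather than 0.

```python
import math

def line_integral(M, N, x, y, dx, dy, a, b, n=10_000):
    """Midpoint-rule approximation of the integral of M dx + N dy along r(t) = (x(t), y(t)), a <= t <= b.

    dx and dy are the derivative functions x'(t) and y'(t).
    """
    h = (b - a) / n
    total = 0.0
    for i in range(n):
        t = a + (i + 0.5) * h
        X, Y = x(t), y(t)
        total += (M(X, Y) * dx(t) + N(X, Y) * dy(t)) * h   # F(r(t)) . r'(t) dt
    return total

# Gradient field F = (y, x) = grad(xy) along two different paths from (0, 0) to (1, 1)
M, N = (lambda X, Y: Y), (lambda X, Y: X)
straight = line_integral(M, N, lambda t: t, lambda t: t, lambda t: 1.0, lambda t: 1.0, 0.0, 1.0)
parabola = line_integral(M, N, lambda t: t, lambda t: t * t, lambda t: 1.0, lambda t: 2 * t, 0.0, 1.0)

# F = (-y, x): dM/dy = -1 but dN/dx = 1, so it is not a gradient field
loop = line_integral(lambda X, Y: -Y, lambda X, Y: X, math.cos, math.sin,
                     lambda t: -math.sin(t), math.cos, 0.0, 2 * math.pi)
print(straight, parabola, loop)   # approximately 1, 1, and 6.2832
```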
Double Integrals
When evaluating double integrals over a region R, it is helpful to examine the region to
select the limits of integration that provide for the easiest calculation of each iterated
integral. For example, when evaluating the integral $\iint_R f(x, y)\,dx\,dy$, an imaginary
horizontal line is placed over the region, and expressions for x in terms of y are derived.
An integral with respect to x will then be evaluated, and the y terms will be substituted in
as the limits of integration. An imaginary vertical line will determine the y limits, which
should be numerical (the limits of the outside integral should not contain variables). An integral with
respect to y will be evaluated between these limits, and a numerical result will be
obtained. The situation is exactly reversed if the differentials are switched in the integral.
In short:
$$\iint_R f(x, y)\,dA = \int_{y=c}^{y=d}\int_{x=g(y)}^{x=h(y)} f(x, y)\,dx\,dy = \int_{x=a}^{x=b}\int_{y=G(x)}^{y=H(x)} f(x, y)\,dy\,dx$$
for some functions h, g, H, and G that describe the region R in terms of the variables
needed.
Double integrals are often evaluated in polar coordinates. The procedure is the
same as with Cartesian double integrals, except for one thing: The element of area is
$dA = r\,dr\,d\theta$. The reason for this becomes clear when one considers that these small
elements are “curvilinear rectangles,” the curved side of which will have arc length $r\,d\theta$
and the straight side of which will have length $dr$.
When evaluating an integral that corresponds to the volume of a solid bounded above
by a surface $z = f(x, y)$ and below by a surface $z = g(x, y)$ over a region R of the xy-plane,
the integrand is the difference of the two: $V = \iint_R [f(x, y) - g(x, y)]\,dA$. When the solid sits
directly on the xy-plane ($g = 0$), only the upper-bound function is integrated, and the lower boundary is simply the region R.
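A rough numerical check of the polar area element: the Python sketch below sums $f\,r\,\Delta r\,\Delta\theta$ over a midpoint grid of curvilinear rectangles. The integrand $x^2 + y^2$ over the unit disk is a hypothetical example whose exact value is $\int_0^{2\pi}\int_0^1 r^2\cdot r\,dr\,d\theta = \pi/2$; the grid sizes are arbitrary.

```python
import math

def polar_double_integral(f, r_max, n_r=400, n_theta=400):
    """Midpoint-rule approximation of the integral of f over the disk r <= r_max, using dA = r dr dtheta."""
    dr = r_max / n_r
    dtheta = 2 * math.pi / n_theta
    total = 0.0
    for i in range(n_r):
        r = (i + 0.5) * dr
        for j in range(n_theta):
            theta = (j + 0.5) * dtheta
            # f evaluated at the center of the curvilinear rectangle, times its area r*dr*dtheta
            total += f(r * math.cos(theta), r * math.sin(theta)) * r * dr * dtheta
    return total

approx = polar_double_integral(lambda x, y: x * x + y * y, 1.0)
print(approx, math.pi / 2)   # the two values agree to about five decimal places
```

Dropping the factor r from the sum gives a visibly wrong answer, which is a quick way to see why the area element is $r\,dr\,d\theta$ rather than $dr\,d\theta$.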
Green’s Theorem
Green’s theorem is a powerful tool for dealing with integrals, since it relates double
integrals to line integrals, one of which might be easier to evaluate in a given situation.
First we define a simple closed curve, which is a closed curve that does not intersect itself
(other than where its starting and ending points coincide).
Circles, ellipses, squares, and rectangles are examples of simple closed curves. Such a curve
is oriented, with the positive direction defined as the direction that, if walked, would put
the interior of the region bounded by the curve on your left (counterclockwise, for a circle).
Consider a simple closed curve C enclosing a region R, such that C is the
boundary of R. If $M(x, y)$ and $N(x, y)$ are functions that are defined and have
continuous partial derivatives on C and throughout R, Green’s theorem states (with C positively oriented):
$$\oint_C M\,dx + N\,dy = \iint_R \left(\frac{\partial N}{\partial x} - \frac{\partial M}{\partial y}\right) dA.$$
A clever selection of $M(x, y) = -y$ and $N(x, y) = x$ makes the integrand on the right equal to 2 and gives us the interesting result
$$\iint_R dA = \frac{1}{2}\oint_C x\,dy - y\,dx$$
by Green’s theorem. (Evaluating the integral with either M or N
set to zero while leaving the other unchanged will give the area of the region R, hence the
halving of the result above. That is,
$A = \oint_C x\,dy = -\oint_C y\,dx$.)
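Applied to a polygon, the area formula $A = \frac{1}{2}\oint_C x\,dy - y\,dx$ becomes the familiar "shoelace" formula, since on each straight edge from $(x_0, y_0)$ to $(x_1, y_1)$ the integral of $x\,dy - y\,dx$ works out to $x_0y_1 - x_1y_0$. The Python sketch below assumes the vertices are listed in order around the boundary.

```python
def polygon_area(vertices):
    """Signed area of a simple polygon via A = (1/2) * closed integral of (x dy - y dx).

    Each straight edge from (x0, y0) to (x1, y1) contributes x0*y1 - x1*y0.
    A counterclockwise (positive) orientation gives a positive area.
    """
    total = 0.0
    n = len(vertices)
    for k in range(n):
        x0, y0 = vertices[k]
        x1, y1 = vertices[(k + 1) % n]   # wrap around to close the curve
        total += x0 * y1 - x1 * y0
    return total / 2

print(polygon_area([(0, 0), (4, 0), (0, 3)]))          # 6.0
print(polygon_area([(0, 0), (1, 0), (1, 1), (0, 1)]))  # 1.0
```

Listing the same triangle clockwise returns $-6.0$, reflecting the orientation convention used in Green’s theorem.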
Differential Equations
Separable Equations
A differential equation of the form $\dfrac{dy}{dx} = \dfrac{f(x)}{g(y)}$ is called separable, since its variables can
be separated out. Its solution is straightforward: $\int g(y)\,dy = \int f(x)\,dx$.
Homogeneous Equations
A function of two variables, $f(x, y)$, is said to be homogeneous of degree n if there is a
constant n such that $f(tx, ty) = t^n f(x, y)$ for all t, x, and y for which both sides are
defined. A differential equation $M(x, y)\,dx + N(x, y)\,dy = 0$ is homogeneous if M and N
are both homogeneous functions of the same degree. Homogeneous equations are
soluble by introducing the substitution $y = vx$, where v is considered as a function of x.
This substitution makes the differential equation separable after algebraic manipulation.
Exact Equations
Given a function $f(x, y)$, its total differential is defined as:
$$df = \frac{\partial f}{\partial x}\,dx + \frac{\partial f}{\partial y}\,dy.$$
This tells us that the family of curves $f(x, y) = c$ satisfies the differential equation
$df = 0$. If there exists a function $f(x, y)$ such that $\partial f/\partial x = M(x, y)$ and
$\partial f/\partial y = N(x, y)$, then $M(x, y)\,dx + N(x, y)\,dy = 0$ is called an exact differential equation, and it
has the general solution $f(x, y) = c$. When M and N have continuous first partial derivatives on a
simply connected region, an equation is exact if and only if $\partial M/\partial y = \partial N/\partial x$, which we
recognize as the now-familiar equating of the mixed partials, $f_{xy} = f_{yx}$.
Nonexact Equations and Integrating Factors
A differential equation that is not exact can be made exact by being multiplied through by
an integrating factor, µ, which in the simplest cases (those below) is a function of only one of the variables of the differential
equation. Any nonexact differential equation that has a solution also has an integrating
factor, even if that factor is not always particularly easy to find. There are, however,
some cases for which the process of finding it is straightforward.
Given a nonexact differential equation $M(x, y)\,dx + N(x, y)\,dy = 0$, for which
$M_y - N_x \ne 0$, we have the following two cases:
1. If $(M_y - N_x)/N$ is a function of x alone, call this function $\xi(x)$. Then
$\mu(x) = \exp\left(\int \xi(x)\,dx\right)$ is an integrating factor.
2. If $(M_y - N_x)/(-M)$ is a function of y alone, call this function $\psi(y)$. Then
$\mu(y) = \exp\left(\int \psi(y)\,dy\right)$ is an integrating factor.
First-Order Linear Equations
A first-order linear equation is defined as a differential equation of the form:
$$\frac{dy}{dx} + P(x)\,y = Q(x).$$
Equations of this type are soluble through the use of an integrating factor,
$\mu(x) = \exp\left(\int P(x)\,dx\right)$. Multiplying through yields $\mu(x)\dfrac{dy}{dx} + \mu(x)P(x)\,y = \mu(x)Q(x)$. Since $\mu'(x) = \mu(x)P(x)$, the
left-hand side becomes $\dfrac{d}{dx}(\mu y)$, and the equation becomes $\dfrac{d}{dx}(\mu y) = \mu Q$. Integrating
both sides gives the general solution:
$$y = \frac{1}{\mu}\int \mu Q\,dx,$$
where the constant of integration in $\int \mu Q\,dx$ supplies the arbitrary constant of the general solution.
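As a short worked example (the equation is chosen for illustration and is not from the text), take $\dfrac{dy}{dx} + 2y = 4$, so $P(x) = 2$ and $Q(x) = 4$:
$$\begin{aligned}
\mu(x) &= e^{\int 2\,dx} = e^{2x}, \\
\frac{d}{dx}\left(e^{2x}y\right) &= 4e^{2x}, \\
e^{2x}y &= 2e^{2x} + C, \\
y &= 2 + Ce^{-2x}.
\end{aligned}$$
Substituting back confirms the result: $y' + 2y = -2Ce^{-2x} + 4 + 2Ce^{-2x} = 4$.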
Higher-Order Linear Equations with Constant Coefficients
The general second-order linear differential equation with constant coefficients is
$ay'' + by' + cy = d(x)$.
In the case that $d(x) = 0$, the equation is said to be homogeneous (the definition here,
which is more common, differs from that of the other type of homogeneous equation).
The general solution of any nonhomogeneous equation is $y = y_h + y_p$, where $y_h$ is the
general solution of the corresponding homogeneous equation, and $y_p$ is any particular
solution of the nonhomogeneous equation.
The homogeneous equation is solved by converting it into its corresponding
auxiliary polynomial, in which the nth derivative of y becomes the nth power of m:
$$ay'' + by' + cy = 0 \;\rightarrow\; am^2 + bm + c = 0.$$
This equation is solved for its roots, which form the basis of the solution of the
differential equation according to the following three cases:
1. The roots are real and distinct: The general solution is $y = c_1e^{m_1x} + c_2e^{m_2x}$.
2. The roots are real and identical: The general solution is $y = c_1e^{m_1x} + c_2xe^{m_1x}$.
3. The roots are complex conjugates, $m = \alpha \pm \beta i$: The general solution is
$y = e^{\alpha x}(c_1\cos\beta x + c_2\sin\beta x)$.
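The three cases can be selected mechanically from the discriminant of the auxiliary equation. This Python sketch returns the general solution as a readable string; the function name and output format are illustrative choices, and it assumes real coefficients with $a \ne 0$.

```python
import math

def general_solution(a, b, c):
    """General solution of a*y'' + b*y' + c*y = 0 (a != 0) via the auxiliary equation a*m**2 + b*m + c = 0."""
    disc = b * b - 4 * a * c
    if disc > 0:                         # case 1: real, distinct roots
        s = math.sqrt(disc)
        m1, m2 = (-b + s) / (2 * a), (-b - s) / (2 * a)
        return f"c1*e^({m1:g}x) + c2*e^({m2:g}x)"
    if disc == 0:                        # case 2: real, repeated root
        m = -b / (2 * a)
        return f"c1*e^({m:g}x) + c2*x*e^({m:g}x)"
    alpha = -b / (2 * a)                 # case 3: complex roots alpha +/- beta*i
    beta = math.sqrt(-disc) / (2 * abs(a))
    return f"e^({alpha:g}x)*(c1*cos({beta:g}x) + c2*sin({beta:g}x))"

print(general_solution(1, -3, 2))  # c1*e^(2x) + c2*e^(1x)
print(general_solution(1, 2, 1))   # c1*e^(-1x) + c2*x*e^(-1x)
print(general_solution(1, 2, 5))   # e^(-1x)*(c1*cos(2x) + c2*sin(2x))
```

The exact comparison `disc == 0` is appropriate for integer coefficients like these; with floating-point coefficients a tolerance would be needed.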
Linear Algebra
Matrices
An $m \times n$ matrix is an array of numbers corresponding to a group of m row vectors and n
column vectors. For a matrix A, the entry $a_{ij}$ is in row i and column j. Matrices of the
same size can be added and subtracted element by element. That is, for matrices A and B,
$(A \pm B)_{ij} = a_{ij} \pm b_{ij}$. Clearly, matrix addition is commutative (subtraction, as with numbers, is not). Matrices
can be multiplied by scalars, yielding a new matrix in which each element has been
multiplied by that scalar.
Two matrices can be multiplied together only if the number of columns of the first
matrix equals the number of rows of the second. An $m \times n$ matrix multiplied by an $n \times p$
matrix yields an $m \times p$ matrix. Matrix multiplication is defined such that
$(AB)_{ij} = \mathbf{r}_i(A)\cdot\mathbf{c}_j(B)$, where $\mathbf{r}_i(A)$ denotes the ith row vector of the matrix A and $\mathbf{c}_j(B)$
denotes the jth column vector of the matrix B.
The identity matrix I has entries defined by the Kronecker delta:
$$\delta_{ij} = \begin{cases} 1 & \text{if } i = j \\ 0 & \text{if } i \ne j. \end{cases}$$
An inverse of a square matrix A is another matrix $A^{-1}$ such that the product $AA^{-1} = A^{-1}A = I$. A
matrix that has an inverse is called invertible. There are several methods for finding an
inverse matrix, which we’ll discuss shortly.
An augmented matrix is a coefficient matrix with the column vector of the
constants from the right-hand side of the simultaneous equations appended on the right.
An augmented matrix is solved by putting it in echelon form, in which any zero rows are at the bottom
and the first nonzero entry in each row appears to the right of the first nonzero entry in the row above
(for a square system, the coefficient part is then upper triangular). There are three elementary
row operations that are used to reduce a matrix to echelon form:
1. Multiplying a row by a nonzero constant.
2. Interchanging two rows.
3. Adding a multiple of one row to another row.
Variables are solved for by working from the bottom up of the reduced matrix, a
procedure called back-substitution. In order to obtain a unique solution for
simultaneous equations in n unknowns, n equations are required. (It is also required that
they be linearly independent, a concept we’ll visit shortly.) In the case that infinitely
many solutions to a system are possible, one or more of the
variables will be free, in which case they can be represented by parameters. The other
variables can be solved for in terms of the parameters.
Any system of linear equations can be written as $A\mathbf{x} = \mathbf{b}$, where A is the system’s
coefficient matrix, x is the column vector of the unknowns, and b is the column vector of
the constant terms to the right of the equal sign. In the case that A is invertible, the
system can be solved using the relation
$$A\mathbf{x} = \mathbf{b} \;\Rightarrow\; A^{-1}A\mathbf{x} = A^{-1}\mathbf{b} \;\Rightarrow\; \mathbf{x} = A^{-1}\mathbf{b}.$$
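The reduction to echelon form and back-substitution described above can be sketched in Python for a square, invertible system. Exact `Fraction` arithmetic is an illustrative choice that avoids rounding; the example system is hypothetical.

```python
from fractions import Fraction

def solve(A, b):
    """Solve A x = b (A square and invertible) by row reduction to echelon form and back-substitution."""
    n = len(A)
    # build the augmented matrix [A | b] with exact rational entries
    M = [[Fraction(v) for v in row] + [Fraction(rhs)] for row, rhs in zip(A, b)]
    for col in range(n):
        # elementary row operation 2: swap a row with a nonzero pivot into place
        pivot = next((r for r in range(col, n) if M[r][col] != 0), None)
        if pivot is None:
            raise ValueError("matrix is singular")
        M[col], M[pivot] = M[pivot], M[col]
        # elementary row operation 3: add multiples of the pivot row to clear entries below it
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            M[r] = [entry - factor * p for entry, p in zip(M[r], M[col])]
    # back-substitution, working from the bottom row up
    x = [Fraction(0)] * n
    for i in range(n - 1, -1, -1):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x

# Hypothetical system: x + y + z = 6, 2y + 5z = -4, 2x + 5y - z = 27
print(solve([[1, 1, 1], [0, 2, 5], [2, 5, -1]], [6, -4, 27]))  # x = 5, y = 3, z = -2
```

Operation 1 (scaling a row) is not needed here because back-substitution divides by each pivot directly.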
Vector Spaces
A vector space V is a set that is closed under the two operations addition and scalar
multiplication (which must also satisfy the usual axioms, such as associativity and distributivity). This means that
$\mathbf{x}_1 + \mathbf{x}_2 \in V$ and $k\mathbf{x}_1, k\mathbf{x}_2 \in V$ for any $\mathbf{x}_1, \mathbf{x}_2 \in V$ and for any
scalar k. It should be evident from these conditions that the zero vector is required to be
in any vector space (take $k = 0$).
If A is an $m \times n$ matrix, the set of all vectors x such that $A\mathbf{x} = \mathbf{0}$ is called the
nullspace of A, denoted $N(A)$. The nullspace is a subspace of $\mathbb{R}^n$. It should be
apparent that an invertible matrix will only have the trivial subspace, the zero vector, as
its nullspace.
Consider a collection of n-vectors, $\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_m$. Any expression of the form
$\sum_{i=1}^{m} k_i\mathbf{v}_i$, where the $k_i$ represent scalars, is called a linear combination of the vectors $\mathbf{v}_i$.
The set of all possible linear combinations of the vectors $\mathbf{v}_i$ is called their span. For
example, since every vector in $\mathbb{R}^3$ can be expressed as a linear combination of $\hat{\mathbf{i}}$, $\hat{\mathbf{j}}$, and
$\hat{\mathbf{k}}$, their span is $\mathbb{R}^3$. The span of any collection of n-vectors is a subspace of $\mathbb{R}^n$. The
vectors $\mathbf{v}_i$ are called linearly independent if $\sum_{i=1}^{m} k_i\mathbf{v}_i = \mathbf{0} \Rightarrow k_i = 0$ for all i. This means
that no combination of some vectors cancels other vectors out, or that no vector can be
written as a linear combination of the other vectors. If this is not the case, the vectors are
said to be linearly dependent. A minimal spanning set of vectors is called a basis for
the vector space they span. The number of vectors in a basis is the dimension of
the vector space.
Let $\{\mathbf{a}_1, \mathbf{a}_2, \ldots, \mathbf{a}_n\}$ be a set of n-vectors in $\mathbb{R}^n$, and let A be an $n \times n$ matrix such
that $\mathbf{a}_1, \mathbf{a}_2, \ldots, \mathbf{a}_n$ form the columns of A. The vector set $\{\mathbf{a}_1, \mathbf{a}_2, \ldots, \mathbf{a}_n\}$ is linearly
independent if and only if $\det A \ne 0$, which is equivalent to saying that A is nonsingular.
In an $m \times n$ matrix, the columns can be regarded as m-vectors, and the rows can
be regarded as n-vectors. The maximum number of linearly independent columns is
called the column rank, and the maximum number of linearly independent rows is called
the row rank. In any matrix, the column rank is equal to the row rank; this is simply the
rank of the matrix. The subspace of $\mathbb{R}^m$ spanned by the columns is called the column
space, $CS(A)$, and the subspace of $\mathbb{R}^n$ spanned by the rows is called the row space,
$RS(A)$. A vector b is in $CS(A)$ if and only if there is a collection of scalars $k_i$ such that
$\sum_{i=1}^{n} k_i\mathbf{c}_i = \mathbf{b}$, where the $\mathbf{c}_i$ are the column vectors. This requires that $A\mathbf{k} = \mathbf{b}$ have a solution
for $\mathbf{k} = (k_1, k_2, \ldots, k_n)^T$. The row space of A is best considered as $CS(A^T)$.
Determinants
The determinant is only defined for square matrices. For an $n \times n$ matrix A, we first
define the minor, $M_{ij}$, as the determinant of the $(n-1) \times (n-1)$ matrix that results from
eliminating row i and column j from matrix A. The cofactor of any matrix entry $a_{ij}$ is
$\operatorname{cof}(a_{ij}) = (-1)^{i+j}M_{ij}$. The determinant of any square matrix A can be computed by the
Laplace expansion along any row i or any column j:
$$\det A = \sum_{j=1}^{n} a_{ij}\operatorname{cof}(a_{ij}) = \sum_{i=1}^{n} a_{ij}\operatorname{cof}(a_{ij}).$$
This tells us that determinants can be evaluated along any row or column. This fact
makes it clear that any matrix with a column or row made up of only zeros will have a
zero determinant.
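A direct Python transcription of the Laplace expansion along the first row is shown below. It is only a sketch for small matrices: the recursion does on the order of $n!$ work, so row reduction is preferred for anything large.

```python
def det(A):
    """Determinant by Laplace (cofactor) expansion along the first row."""
    n = len(A)
    if n == 1:
        return A[0][0]
    total = 0
    for j in range(n):
        # minor M_1j: delete the first row and column j
        minor = [row[:j] + row[j + 1:] for row in A[1:]]
        # (-1)**j with 0-based j has the same sign as (-1)**(i+j) with 1-based i = 1
        cofactor = (-1) ** j * det(minor)
        total += A[0][j] * cofactor
    return total

print(det([[2, -3, 1], [2, 0, -1], [1, 4, 5]]))  # 49
print(det([[1, 2, 3], [0, 0, 0], [4, 5, 6]]))    # 0 (a row of zeros)
```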
The adjugate matrix of A is the transpose of the cofactor matrix of A. That is,
$\operatorname{adj} A = \left[\operatorname{cof}(a_{ij})\right]^T$. This matrix is important in calculating the inverse of a matrix, since
$$A^{-1} = \frac{1}{\det A}\operatorname{adj} A.$$
Cramer’s rule allows a square linear system $A\mathbf{x} = \mathbf{b}$ with $\det A \ne 0$ to be solved exclusively by
determinants. Let $A_j$ denote the matrix formed by replacing column j of A with the
column vector b. Then Cramer’s rule says:
$$x_j = \frac{\det A_j}{\det A}.$$
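Cramer’s rule is short to express in code. The Python sketch below includes its own small cofactor-expansion `det` so that it runs on its own, and it assumes integer entries so that `Fraction` can return exact answers; the example systems are hypothetical.

```python
from fractions import Fraction

def det(A):
    # cofactor expansion along the first row (adequate for small matrices)
    if len(A) == 1:
        return A[0][0]
    return sum((-1) ** j * A[0][j] * det([row[:j] + row[j + 1:] for row in A[1:]])
               for j in range(len(A)))

def cramer(A, b):
    """Solve the square system A x = b with Cramer's rule: x_j = det(A_j) / det(A)."""
    d = det(A)
    if d == 0:
        raise ValueError("det(A) = 0, so Cramer's rule does not apply")
    x = []
    for j in range(len(A)):
        # A_j: column j of A replaced by the column vector b
        Aj = [row[:j] + [bi] + row[j + 1:] for row, bi in zip(A, b)]
        x.append(Fraction(det(Aj), d))
    return x

# Hypothetical system: 2x + y = 3, x + 3y = 5
print(cramer([[2, 1], [1, 3]], [3, 5]))  # [Fraction(4, 5), Fraction(7, 5)]
```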
The Wronskian can be helpful when evaluating functions for linear independence
in a vector space. For $n - 1$ times differentiable functions $f_1(x), f_2(x), \ldots, f_n(x)$, we define the Wronskian as the
determinant
$$W[f_1, f_2, \ldots, f_n](x) = \begin{vmatrix} f_1(x) & f_2(x) & \cdots & f_n(x) \\ f_1'(x) & f_2'(x) & \cdots & f_n'(x) \\ \vdots & \vdots & \ddots & \vdots \\ f_1^{(n-1)}(x) & f_2^{(n-1)}(x) & \cdots & f_n^{(n-1)}(x) \end{vmatrix}.$$
If the Wronskian is nonzero at some point of an interval, the functions are linearly independent on that interval. A zero Wronskian,
however, does not guarantee linear dependence.
Linear Transformations
For vector spaces V and W, a linear transformation (or linear map) is a function
$T: V \to W$ such that $T(\mathbf{x}_1 + \mathbf{x}_2) = T(\mathbf{x}_1) + T(\mathbf{x}_2)$ and $T(k\mathbf{x}) = kT(\mathbf{x})$ for any vectors $\mathbf{x}$,
$\mathbf{x}_1$, and $\mathbf{x}_2$ in V and any scalar k. If $W = V$, then T is called a linear operator. There is
a connection to matrices in that, for an $m \times n$ matrix A, defining a function $T: \mathbb{R}^n \to \mathbb{R}^m$
by $T(\mathbf{x}) = A\mathbf{x}$ gives a linear transformation. Conversely, if $T: \mathbb{R}^n \to \mathbb{R}^m$ is a linear
transformation, there exists an $m \times n$ matrix A such that $T(\mathbf{x}) = A\mathbf{x}$ for all x in $\mathbb{R}^n$. In the
case that $\mathbb{R}^n$ and $\mathbb{R}^m$ are considered with their standard bases, $B = \{\hat{\mathbf{e}}_i\}$, in which $\hat{\mathbf{e}}_i$ has a
1 in position i and 0 elsewhere, then column i in A is simply the image $T(\hat{\mathbf{e}}_i)$. In this case,
A is called the standard matrix for T.
Let $T: \mathbb{R}^n \to \mathbb{R}^m$ be a linear transformation. The set of all vectors x in $\mathbb{R}^n$ that
get mapped to 0 is called the kernel of T:
$\ker T = \{\mathbf{x} : T(\mathbf{x}) = \mathbf{0}\}$. It is a subspace of the
domain space, and its dimension is called the nullity of T. The set of all images of T is
called the range of T:
$R(T) = \{T(\mathbf{x}) : \mathbf{x} \in \mathbb{R}^n\}$. This is a subspace of $\mathbb{R}^m$, and its
dimension is called the rank of T. If T is given by $T(\mathbf{x}) = A\mathbf{x}$, then the kernel of T is the
same as the nullspace of A, and the range of T is the same as the column space of A.
The rank plus nullity theorem states that the sum of the nullity and rank equals the
dimension of the domain. So for T above, $\dim(\ker T) + \dim R(T) = n$.
Let $T: \mathbb{R}^n \to \mathbb{R}^n$ be a linear operator. If T is bijective, then T has an inverse, $T^{-1}$,
defined so that $T^{-1}(\mathbf{y}) = \mathbf{x}$ if $T(\mathbf{x}) = \mathbf{y}$. If A is the matrix representation of T, then $A^{-1}$ is
the representative of $T^{-1}$. Further, if the matrix A represents the transformation
$S: \mathbb{R}^n \to \mathbb{R}^m$ and B represents the transformation $T: \mathbb{R}^p \to \mathbb{R}^n$, then the matrix product
AB represents the composition $S \circ T$.
Eigenvalues and Eigenvectors
In general, multiplying a square matrix A by a compatible nonzero vector x does not
produce a scalar multiple of x. However, if it happens that $A\mathbf{x} = \lambda\mathbf{x}$ for some scalar λ,
then λ is called an eigenvalue (or characteristic value) of A, and the nonzero vector x
is called an eigenvector (or characteristic vector). Eigenvalues and eigenvectors are
defined only for square matrices. To find these values, we must find scalars and vectors
that satisfy the equation $A\mathbf{x} = \lambda\mathbf{x}$, which can be rewritten as $(A - \lambda I)\mathbf{x} = \mathbf{0}$. This will
only be nontrivially soluble if $A - \lambda I$ is noninvertible, which means that
$\det(A - \lambda I) = 0$. Calculation of this determinant produces the characteristic equation.
For $2 \times 2$ matrices, this is a quadratic that must be solved for λ. Once the
eigenvalues are found, each is substituted into the equation $(A - \lambda I)\mathbf{x} = \mathbf{0}$ and solved for
the corresponding eigenvector (which may be a family of vectors due to possible free
variables). It is an interesting property that $\sum\lambda = \operatorname{tr}(A)$, where $\operatorname{tr}(A)$ is the trace of the
matrix, the sum of its diagonal entries. Also, $\prod\lambda = \det A$. (In both cases, it is
understood that the sum and product are over all eigenvalues for A, counted with multiplicity and including complex ones.)
The eigenspace of A corresponding to λ, $E_\lambda(A) = \{\mathbf{x} : A\mathbf{x} = \lambda\mathbf{x}\}$, is the collection
of all eigenvectors corresponding to λ along with the zero vector. Equivalently, $E_\lambda(A) = N(A - \lambda I)$.
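For a $2 \times 2$ matrix the characteristic equation is $\lambda^2 - \operatorname{tr}(A)\,\lambda + \det A = 0$, so the eigenvalues follow from the quadratic formula. This Python sketch uses `cmath.sqrt` so that complex eigenvalues are handled too; the example matrix is hypothetical.

```python
import cmath

def eigenvalues_2x2(A):
    """Both eigenvalues of A = [[a, b], [c, d]] from its characteristic equation.

    det(A - lambda*I) = lambda**2 - tr(A)*lambda + det(A) = 0, solved with the quadratic formula.
    """
    (a, b), (c, d) = A
    trace = a + d
    determinant = a * d - b * c
    root = cmath.sqrt(trace * trace - 4 * determinant)   # complex sqrt covers complex eigenvalues
    return (trace + root) / 2, (trace - root) / 2

l1, l2 = eigenvalues_2x2([[4, 1], [2, 3]])
print(l1, l2)            # (5+0j) (2+0j)
print(l1 + l2, l1 * l2)  # (7+0j) (10+0j): the trace and the determinant
```

For λ = 5, solving $(A - 5I)\mathbf{x} = \mathbf{0}$ gives the eigenvector family $t(1, 1)$, and for λ = 2 it gives $t(1, -2)$.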
The Cayley-Hamilton theorem states that every square matrix satisfies its own
characteristic equation. This is useful for expressing an integer power of an $n \times n$ matrix A as a
polynomial in A of degree at most $n - 1$ (a linear polynomial in A when A is $2 \times 2$). That is, if
$\det(A - \lambda I) = \sum_{i=0}^{n}\alpha_i\lambda^i = 0$, then $\sum_{i=0}^{n}\alpha_i A^i = 0$, where we
consider $A^0 = I$.
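For a $2 \times 2$ matrix with $t = \operatorname{tr}(A)$ and $d = \det A$, the theorem gives $A^2 = tA - dI$, so every power $A^k$ reduces to the form $pA + qI$. The Python sketch below updates p and q with that identity, and uses a plain matrix product (`matmul`, a helper written for this example) only to check the answer.

```python
def matmul(X, Y):
    """Plain matrix product, used here only to check the result."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y))) for j in range(len(Y[0]))]
            for i in range(len(X))]

def power_2x2(A, k):
    """A**k for a 2x2 matrix, written as p*A + q*I using the Cayley-Hamilton theorem.

    With t = tr(A) and d = det(A), A**2 = t*A - d*I. If A**n = p*A + q*I, then
    A**(n+1) = p*A**2 + q*A = (p*t + q)*A - p*d*I.
    """
    (a, b), (c, e) = A
    t, d = a + e, a * e - b * c
    p, q = 0, 1                      # A**0 = 0*A + 1*I
    for _ in range(k):
        p, q = p * t + q, -p * d
    return [[p * a + q, p * b], [p * c, p * e + q]]

A = [[1, 1], [1, 0]]
print(power_2x2(A, 10))   # [[89, 55], [55, 34]]

# compare with nine explicit multiplications
P = A
for _ in range(9):
    P = matmul(P, A)
print(P == power_2x2(A, 10))   # True
```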
Number Theory
If a and b are positive integers, the division algorithm says that we can find unique
integers q and r such that $b = qa + r$, with $0 \le r < a$. This algorithm is employed in
sequence to find the greatest common divisor of two integers in the Euclidean
algorithm: Given two numbers, a and b (where we’ll assume $a > b$), we know we can
find a quotient $q_1$ and remainder $r_1$ such that $a = q_1b + r_1$. We then apply the division
algorithm to the divisor and the first remainder: $b = q_2r_1 + r_2$. We continue this process
until there is no remainder (i.e., $r_{n-2} = q_nr_{n-1} + r_n$, where $r_n = 0$). The divisor in the step
that yields no remainder, $r_{n-1}$, is the greatest common divisor, or gcd, of the two numbers. If
$\gcd(a, b) = 1$, then a and b are said to be relatively prime. The product of the greatest
common divisor and the least common multiple of two numbers is equal to the product of
the two numbers:
$$\gcd(a, b)\cdot\operatorname{lcm}(a, b) = ab.$$
The Diophantine equation $ax + by = c$ (with a and b not both zero) has integral
solutions if and only if $\gcd(a, b) \mid c$, in which case it has infinitely many. Any one solution
$(x_1, y_1) \in \mathbb{Z}^2$ reveals the others:
$$x = x_1 + \frac{b}{\gcd(a, b)}\,t, \qquad y = y_1 - \frac{a}{\gcd(a, b)}\,t$$
for any $t \in \mathbb{Z}$.
The greatest common divisor of any two numbers can always be written as a
linear combination, with integer coefficients, of those two numbers. This is done by working backward through the
Euclidean algorithm and substituting in steps that lead to the original numbers.
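The Euclidean algorithm, its "work backward" extension, and the Diophantine solution family can be sketched together in Python. The recursive `extended_gcd` is one common way to carry out the back-substitution; the function names are illustrative, and positive a and b are assumed.

```python
def gcd(a, b):
    """Euclidean algorithm: replace (a, b) by (b, a mod b) until the remainder is 0."""
    while b != 0:
        a, b = b, a % b
    return a

def extended_gcd(a, b):
    """Return (g, x, y) with g = gcd(a, b) = a*x + b*y."""
    if b == 0:
        return a, 1, 0
    g, x1, y1 = extended_gcd(b, a % b)
    # b*x1 + (a mod b)*y1 = g, and a mod b = a - (a // b)*b
    return g, y1, x1 - (a // b) * y1

def diophantine(a, b, c, t=0):
    """The solution (x, y) of a*x + b*y = c for parameter t, or None if gcd(a, b) does not divide c."""
    g, x0, y0 = extended_gcd(a, b)
    if c % g != 0:
        return None
    x1, y1 = x0 * (c // g), y0 * (c // g)       # one particular solution
    return x1 + (b // g) * t, y1 - (a // g) * t

print(gcd(252, 198))                # 18
print(252 * 198 // gcd(252, 198))   # lcm(252, 198) = 2772
print(diophantine(252, 198, 36))    # (8, -10), since 252*8 - 198*10 = 36
```

Here `extended_gcd(252, 198)` returns `(18, 4, -5)`, expressing the gcd as $18 = 252 \cdot 4 + 198 \cdot (-5)$.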
Congruence
For integers a, b, and n, we say that a is congruent to b modulo n if $a - b$ is divisible by
n. That is, $a \equiv b \bmod n$ if $n \mid (a - b)$.
• If $ab \equiv ac \bmod n$, then
  o $b \equiv c \bmod n$ if $\gcd(a, n) = 1$
  o $b \equiv c \bmod \dfrac{n}{\gcd(a, n)}$ if $\gcd(a, n) > 1$
• Fermat’s little theorem states that if p is a prime and a is an integer:
  o $a^{p-1} \equiv 1 \bmod p$ if p does not divide a
  o $a^p \equiv a \bmod p$ for any integer a
The linear congruence equation, $ax \equiv b \bmod n$, has a solution if and only if $\gcd(a, n)$
divides b. If $\gcd(a, n) = 1$, the solution is unique mod n; if $\gcd(a, n) > 1$, the solution is
unique mod $\dfrac{n}{\gcd(a, n)}$.
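The solvability criterion for $ax \equiv b \bmod n$ suggests a direct procedure: divide through by $g = \gcd(a, n)$, then multiply by an inverse modulo $n/g$. This Python sketch relies on the built-in `pow(x, -1, m)` for the modular inverse, which assumes Python 3.8 or later; the example congruences are hypothetical.

```python
from math import gcd

def solve_linear_congruence(a, b, n):
    """All solutions x in {0, ..., n-1} of a*x ≡ b (mod n); empty list if gcd(a, n) does not divide b."""
    g = gcd(a, n)
    if b % g != 0:
        return []
    a1, b1, n1 = a // g, b // g, n // g
    # a1 and n1 are relatively prime, so a1 is invertible mod n1 (pow(..., -1, m) needs Python 3.8+)
    x0 = (b1 * pow(a1, -1, n1)) % n1
    # unique mod n1 = n / gcd(a, n); listed mod n, that is g residues
    return [x0 + k * n1 for k in range(g)]

print(solve_linear_congruence(6, 4, 10))   # [4, 9]
print(solve_linear_congruence(6, 5, 10))   # []

# Fermat's little theorem: a**(p-1) ≡ 1 (mod p) when p is prime and p does not divide a
p = 13
print(all(pow(a, p - 1, p) == 1 for a in range(1, p)))  # True
```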
Abstract Algebra
Binary Structures and Groups
Let S be a nonempty set. A function $f: S \times S \to S$ defined on every ordered pair of
elements of S to give a result that is also in S is called a binary operation on S. Binary
operations are typically written showing the set and the operation. In the case of f above,
we might write $(S, *)$.
A binary operation, ∗, is said to be associative if, for every $a, b, c \in S$, the
following equation always holds: $(a * b) * c = a * (b * c)$. A binary structure whose binary
operation is associative is called a semigroup.
Given a binary structure, $(S, *)$, an element $e \in S$ with the property $a * e = e * a = a$
for every $a \in S$ is called the identity. A semigroup with an identity is called a monoid.
Let $(S, *)$ be a monoid, and let a be an element in S. If there is an element $\bar{a} \in S$
such that $a * \bar{a} = \bar{a} * a = e$, we call $\bar{a}$ the inverse of a. A monoid with the property that
every element in S has an inverse is called a group.
If the binary operation of a group $(S, *)$ has the property that $a * b = b * a$ for every
$a, b \in S$, the binary operation is commutative. A semigroup, monoid, or group whose
binary operation is commutative is said to be abelian.
If the group $(S, *)$ contains precisely n elements for some positive integer n, then
the group is finite of order n. Otherwise, we say the group is infinite.
For any group, the inverse of a product is $(x * y)^{-1} = y^{-1} * x^{-1}$.
Cyclic Groups
A group G with the property that there exists an element $a \in G$ such that
$G = \{a^n : n \in \mathbb{Z}\}$ is said to be cyclic, and the element a is called a generator of
the group. (We clarify here that $a^0 = e$ is the identity, $a^1 = a$, $a^2 = a * a$, etc., and that negative powers are
powers of the inverse of a. For a finite group, the powers $a^n$ with $n = 0, 1, 2, \ldots$ already give all of G.) A cyclic
group has at least one generator.
Consider the set $\mathbb{Z}_n = \{0, 1, 2, \ldots, n - 1\}$ and the group $(\mathbb{Z}_n, \oplus)$, the group whose
binary operation is addition modulo n. The integer m is a generator of $(\mathbb{Z}_n, \oplus)$ if and
only if m is relatively prime to n. In more general terms, let G be a finite cyclic group with
generator a, and let n be the smallest positive integer such that $a^n = e$. Then the element $a^m$ is a
generator of G if and only if m is relatively prime to n.
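The generator criterion for $(\mathbb{Z}_n, \oplus)$ is easy to confirm by brute force. This Python sketch builds each cyclic subgroup $\langle m\rangle$ by repeated addition mod n and compares the generators it finds with the integers relatively prime to n; the helper names are illustrative.

```python
from math import gcd

def generated_subgroup(m, n):
    """The cyclic subgroup <m> of (Z_n, addition mod n): all multiples of m reduced mod n."""
    elements, x = {0}, m % n
    while x not in elements:
        elements.add(x)
        x = (x + m) % n
    return elements

def generators(n):
    """Elements m of Z_n with <m> equal to all of Z_n, found by brute force."""
    return [m for m in range(n) if len(generated_subgroup(m, n)) == n]

n = 12
print(generators(n))                             # [1, 5, 7, 11]
print([m for m in range(n) if gcd(m, n) == 1])   # [1, 5, 7, 11], the same list
```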
Subgroups
Let $(G, *)$ be a group. If there’s a subset $H \subseteq G$ such that $(H, *)$ is also a group, then H
is a subgroup of G, and we write $H \le G$. (If $H \ne G$, we can write $H < G$ to denote H
as a proper subgroup of G.) Every group has at least two subgroups (which coincide only when G = {e}): the trivial
subgroup, $\{e\}$, consisting of just the identity element, and the group itself.
Let a be an element of a group G. With the binary operation defined on G, the set
$\{a^n : n \in \mathbb{Z}\}$, also denoted by $\langle a\rangle$, is a subgroup of G called the cyclic subgroup
generated by a. It consists of all the integer powers of a. For negative powers, we take $a^{-1}$ to be the
inverse, $\bar{a}$, so that $a^{-n} = \bar{a} * \bar{a} * \cdots * \bar{a}$ (n times) for a positive integer n.
We can define the order of a group element as follows: the order of $a \in G$ is the
order of $\langle a\rangle$, the cyclic subgroup generated by a. Equivalently, the order of an element of finite order
is the smallest positive integer n such that $a^n = e$, where e is the identity. The cyclic subgroup
generated by a is the smallest subgroup of G containing a. More generally, if $a_i \in G$ for
every i in some indexing set I, then the subgroup generated by $\{a_i\}$ is the subgroup
consisting of all finite products of terms of the form $a_i^{n_i}$ and is the smallest subgroup of G
containing all the elements $a_i$. If the subgroup is all of G, then we say that G is
generated by $\{a_i\}$ and that the elements $a_i$ are generators of G.
• Lagrange’s theorem: Let G be a finite group. If H is a subgroup of G, then
the order of H divides the order of G.
• Let G be a finite, abelian group of order n. Then G has at least one subgroup
of order d for every positive divisor d of n.
• Let G be a finite, cyclic group of order n. Then G has exactly one subgroup,
a cyclic subgroup, of order d for every positive divisor d of n. If G is
generated by a, then the subgroup generated by the element $b = a^m$ has order
$d = n/\gcd(m, n)$. (If $m = 0$, we say that $\gcd(m, n) = n$.)
• Cauchy’s theorem: Let G be a finite group of order n, and let p be a prime
that divides n. Then G has at least one subgroup of order p.
• Sylow’s first theorem: Let G be a finite group of order n, and let $n = p^km$,
where p is a prime that does not divide m. Then G has at least one subgroup of
order $p^i$ for every $i \in [0, k] \subset \mathbb{Z}$.
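The order formula for a cyclic group can be checked in $(\mathbb{Z}_n, \oplus)$, where the generator is 1 and the element "$a^m$" is simply m. This Python sketch computes each element’s order directly and compares it with $n/\gcd(m, n)$. It also confirms that every order divides n, as Lagrange’s theorem requires. Python’s `math.gcd(0, n)` returns n, which matches the convention stated above for $m = 0$.

```python
from math import gcd

def element_order(m, n):
    """Order of m in (Z_n, +): the smallest k >= 1 with k*m ≡ 0 (mod n)."""
    k = 1
    while (k * m) % n != 0:
        k += 1
    return k

n = 12
for m in range(n):
    # order of a**m in a cyclic group of order n is n / gcd(m, n)
    assert element_order(m, n) == n // gcd(m, n)
    # Lagrange: the order of <m> divides n
    assert n % element_order(m, n) == 0
print(sorted({element_order(m, n) for m in range(n)}))   # [1, 2, 3, 4, 6, 12]
```

The orders that appear are exactly the positive divisors of 12, one cyclic subgroup for each.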
Isomorphisms
Consider the multiplication table for the three-element group below:
·   e   a   b
e   e   a   b
a   a   b   e
b   b   e   a
We compare that to the table for the group $\mathbb{Z}_3$ under addition modulo 3:
⊕   0   1   2
0   0   1   2
1   1   2   0
2   2   0   1
We notice that, although the symbols are different, the structure is exactly the same: relabeling e, a, b as 0, 1, 2 turns the first table into the second. Structurally identical groups are said to be isomorphic. We denote an isomorphism between two groups $G_1$ and $G_2$ by $G_1 \cong G_2$. Isomorphic groups share identical structural properties, so if we know details about one group and also know that it is isomorphic to another group, we know the other group’s details automatically. These details include the order of the group, the number of subgroups of a particular order, whether it is cyclic or abelian, etc.
The Classification of Finite Abelian Groups
Let $(G_1, \cdot_1)$ and $(G_2, \cdot_2)$ be groups. On the set $G_1 \times G_2 = \{(a, b) : a \in G_1 \wedge b \in G_2\}$, define a binary operation, $\cdot$, such that $(a_1, a_2) \cdot (b_1, b_2) = (a_1 \cdot_1 b_1, \; a_2 \cdot_2 b_2)$. Then $(G_1 \times G_2, \cdot)$ is a group, called the direct product of the groups $G_1$ and $G_2$. If $G_1$ and $G_2$ are finite, and $G_1$ has order m and $G_2$ has order n, then $G_1 \times G_2$ has order mn. If $G_1$ and $G_2$ are both abelian, then the notation $G_1 \oplus G_2$ is sometimes used, and the resulting abelian group is called the direct sum of $G_1$ and $G_2$. This definition can be generalized to any number of groups, not just two.
• The direct sum $\mathbb{Z}_{m_1} \oplus \mathbb{Z}_{m_2} \oplus \cdots \oplus \mathbb{Z}_{m_k}$ is cyclic if and only if $\gcd(m_i, m_j) = 1$ for every distinct pair $m_i$ and $m_j$, in which case it is isomorphic to $\mathbb{Z}_{m_1 m_2 \cdots m_k}$.
• Every finite abelian group G is isomorphic to a direct sum of the form $\mathbb{Z}_{(p_1)^{k_1}} \oplus \mathbb{Z}_{(p_2)^{k_2}} \oplus \cdots \oplus \mathbb{Z}_{(p_r)^{k_r}}$, where the $p_i$ are (not necessarily distinct) primes, and the $k_i$ are (not necessarily distinct) positive integers. The collection of prime powers $(p_i)^{k_i}$ is unique up to the order of the summands; these prime powers are known as the elementary divisors of G.
• Every finite abelian group G is isomorphic to a direct sum of the form $\mathbb{Z}_{m_1} \oplus \mathbb{Z}_{m_2} \oplus \cdots \oplus \mathbb{Z}_{m_t}$, where $m_1 \ge 2$ and $m_i \mid m_{i+1}$ for every $i < t$. The integers $m_1, \ldots, m_t$ are not necessarily distinct, but the list is unique. These integers are called the invariant factors of G.
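A minimal sketch, assuming the elementary divisors are supplied as hypothetical (p, k) pairs meaning $p^k$, shows how the two classifications fit together: the largest invariant factor collects the highest power of each prime, the next one collects the next-highest powers, and so on. It also encodes the pairwise-gcd test for when $\mathbb{Z}_{m_1} \oplus \cdots \oplus \mathbb{Z}_{m_k}$ is cyclic. The function names are illustrative only.

```python
from collections import defaultdict
from math import gcd, prod


def invariant_factors(elementary_divisors: list[tuple[int, int]]) -> list[int]:
    """Convert elementary divisors, given as (p, k) pairs meaning p**k, into
    invariant factors m_1 | m_2 | ... | m_t, smallest first."""
    if not elementary_divisors:
        return []                                # trivial group
    powers = defaultdict(list)
    for p, k in elementary_divisors:
        powers[p].append(p ** k)
    for p in powers:
        powers[p].sort(reverse=True)             # largest power of each prime first
    t = max(len(v) for v in powers.values())
    factors = []
    for j in range(t):
        # The j-th largest invariant factor multiplies the j-th largest power of
        # each prime; primes with fewer powers contribute nothing.
        factors.append(prod(v[j] for v in powers.values() if j < len(v)))
    return sorted(factors)


def direct_sum_is_cyclic(moduli: list[int]) -> bool:
    """Z_{m_1} + ... + Z_{m_k} is cyclic iff the m_i are pairwise coprime."""
    return all(gcd(moduli[i], moduli[j]) == 1
               for i in range(len(moduli)) for j in range(i + 1, len(moduli)))


# Z_2 + Z_4 + Z_3 has elementary divisors 2, 4, 3 and invariant factors 2, 12.
print(invariant_factors([(2, 1), (2, 2), (3, 1)]))   # [2, 12]
print(direct_sum_is_cyclic([2, 4, 3]))                # False: gcd(2, 4) = 2
```

So, for example, $\mathbb{Z}_2 \oplus \mathbb{Z}_4 \oplus \mathbb{Z}_3 \cong \mathbb{Z}_2 \oplus \mathbb{Z}_{12}$, which is not cyclic.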
Group Homomorphisms
Let $(G, \cdot)$ and $(G', *)$ be groups. A function $\phi : G \to G'$ with the property that $\phi(a \cdot b) = \phi(a) * \phi(b)$ for all elements $a, b \in G$ is called a group homomorphism. An injective (one-to-one) homomorphism is called a monomorphism; a surjective (onto) homomorphism is called an epimorphism; a bijective (one-to-one and onto) homomorphism is called an isomorphism. We recall the earlier fact that two groups that are structurally identical are isomorphic; this is true if and only if there exists a bijective homomorphism between the two groups. A homomorphism from a group to itself is called an endomorphism, and an isomorphism from a group to itself is called an automorphism.
For any group homomorphism $\phi : G \to G'$:
• If e is the identity in G, then $\phi(e)$ is the identity in $G'$.
• If $g \in G$ has finite order m, then $\phi(g) \in G'$ has finite order dividing m.
• If $a^{-1}$ is the inverse of a in G, then $\phi(a^{-1})$ is the inverse of $\phi(a)$ in $G'$.
• If H is a subgroup of G, then $\phi(H)$ is a subgroup of $G'$, where $\phi(H) = \{\phi(h) : h \in H\}$.
• If G is finite, then the order of $\phi(G)$ divides the order of G; if $G'$ is finite, then the order of $\phi(G)$ also divides the order of $G'$.
• If $H'$ is a subgroup of $G'$, then $\phi^{-1}(H')$ is a subgroup of G, where $\phi^{-1}(H') = \{h \in G : \phi(h) \in H'\}$.
• If $e'$ is the identity in $G'$, then $\{e'\}$ is the trivial subgroup of $G'$, and its inverse image is a subgroup of G. This subgroup is the kernel of $\phi$, which is defined by $\ker \phi = \{g \in G : \phi(g) = e'\}$.
A homomorphism is a monomorphism if and only if its kernel is trivial.
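To make the list above concrete, here is a sketch built on a hypothetical example map: reduction modulo 4 from $\mathbb{Z}_{12}$ to $\mathbb{Z}_4$, both taken as additive groups. It brute-force checks the homomorphism property, computes the kernel, and confirms that the order of $\phi(g)$ divides the order of g. Because the kernel is not trivial, this $\phi$ is not a monomorphism.

```python
def is_homomorphism(phi, n: int, m: int) -> bool:
    """Check phi((a + b) mod n) == (phi(a) + phi(b)) mod m for all a, b in Z_n."""
    return all(phi((a + b) % n) == (phi(a) + phi(b)) % m
               for a in range(n) for b in range(n))


def kernel(phi, n: int) -> list[int]:
    """Elements of Z_n that phi sends to 0, the identity of the target group."""
    return [g for g in range(n) if phi(g) == 0]


def order(a: int, n: int) -> int:
    """Order of a in Z_n: the smallest k >= 1 with k*a = 0 (mod n)."""
    k = 1
    while (k * a) % n != 0:
        k += 1
    return k


def phi(x: int) -> int:
    """Example map Z_12 -> Z_4: reduction mod 4 (well defined because 4 divides 12)."""
    return x % 4


assert is_homomorphism(phi, 12, 4)
print(kernel(phi, 12))   # [0, 4, 8]: a nontrivial kernel, so phi is not a monomorphism
for g in range(12):
    # The order of phi(g) divides the order of g.
    assert order(g, 12) % order(phi(g), 4) == 0
```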
Rings
A set R, together with two binary operations (we’ll choose addition, +, and multiplication, $\cdot$, here), is called a ring if the following conditions are satisfied:
• $(R, +)$ is an abelian group;
• $(R, \cdot)$ is a semigroup;
• The distributive laws hold; namely, for every $a, b, c \in R$, we have $a \cdot (b + c) = a \cdot b + a \cdot c$ and $(a + b) \cdot c = a \cdot c + b \cdot c$.
If the multiplicative semigroup $(R, \cdot)$ is a monoid (that is, if it has a multiplicative identity), then R is called a ring with unity. If the operation of multiplication is commutative, then R is a commutative ring. For $S \subset R$ satisfying the ring requirements under the same operations, we call $(S, +, \cdot)$ a subring of $(R, +, \cdot)$. The characteristic of a ring is the smallest positive integer n such that $na = 0$ for every $a \in R$. If no such n exists, as in the case of the infinite rings $\mathbb{Z}$, $\mathbb{Q}$, $\mathbb{R}$, and $\mathbb{C}$, the ring is said to have characteristic zero. For a ring with unity and $\operatorname{char} R > 0$, it is sufficient to check for the smallest positive n such that $n \cdot 1 = 0$.
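The shortcut for rings with unity can be sketched as follows. The example assumes the ring $\mathbb{Z}_{m_1} \times \cdots \times \mathbb{Z}_{m_k}$ with componentwise addition and multiplication (a ring analogue of the direct product defined earlier for groups, not defined in the text itself). Its unity is $(1, \ldots, 1)$, so the characteristic is the smallest positive n with $n \cdot (1, \ldots, 1) = 0$, which works out to the least common multiple of the $m_i$. The function name is hypothetical.

```python
from math import lcm  # math.lcm needs Python 3.9+


def characteristic_of_product(moduli: list[int]) -> int:
    """Characteristic of the ring Z_{m_1} x ... x Z_{m_k} (componentwise + and *).

    The ring has unity (1, ..., 1), so it suffices to find the smallest
    n >= 1 with n * (1, ..., 1) = (0, ..., 0), i.e. n divisible by every m_i.
    """
    n = 1
    while any(n % m != 0 for m in moduli):
        n += 1
    return n


print(characteristic_of_product([4, 6]))       # 12
assert characteristic_of_product([4, 6]) == lcm(4, 6)
assert characteristic_of_product([5]) == 5     # Z_p has characteristic p
```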
Ring Homomorphisms
Let $(R, +, \times)$ and $(R', \oplus, \otimes)$ be rings. A function $\phi : R \to R'$ is called a ring homomorphism if both of the following conditions hold for all $a, b \in R$:
$$\phi(a + b) = \phi(a) \oplus \phi(b), \qquad \phi(a \times b) = \phi(a) \otimes \phi(b).$$
For any ring homomorphism $\phi : R \to R'$:
• The kernel of $\phi$ is the set $\ker \phi = \{a \in R : \phi(a) = 0'\}$, where $0'$ is the additive identity in $R'$. The kernel of a ring homomorphism is a subring of R.
• The image of R, $\phi(R) = \{\phi(r) : r \in R\}$, is a subring of $R'$.
• The image of 0, the additive identity in R, must be $0'$, the additive identity in $R'$. It follows from this that $\phi(-r) = -\phi(r)$ for all $r \in R$, where $-r$ is the additive inverse of r in R.
Fields
Let a be a nonzero element of a ring R with unity. Recall that the multiplicative structure $(R, \cdot)$ is not required to be a group, so a may not have an inverse. If it does have a multiplicative inverse, then a is called a unit. If every nonzero element of R is a unit (namely, if $(R^*, \cdot)$, where $R^* = R - \{0\}$, is a group), then R is called a division ring. A commutative division ring is called a field.
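For the rings $\mathbb{Z}_n$ (integers modulo n), a residue a is a unit exactly when $\gcd(a, n) = 1$; this standard fact is assumed here rather than proved in the text. The sketch lists the units and tests whether every nonzero element is one, which, since $\mathbb{Z}_n$ is commutative, is the same as asking whether $\mathbb{Z}_n$ is a field. It relies on Python's built-in three-argument `pow` with exponent −1 (Python 3.8+) to compute inverses; the helper names are illustrative.

```python
from math import gcd


def units(n: int) -> list[int]:
    """Units of Z_n: the residues 1 <= a < n with gcd(a, n) = 1."""
    return [a for a in range(1, n) if gcd(a, n) == 1]


def inverse(a: int, n: int) -> int:
    """Multiplicative inverse of a unit a in Z_n (raises ValueError otherwise)."""
    return pow(a, -1, n)   # three-argument pow with exponent -1: Python 3.8+


def is_field(n: int) -> bool:
    """Z_n (n >= 2) is a field exactly when every nonzero element is a unit."""
    return len(units(n)) == n - 1


print(units(8))                                    # [1, 3, 5, 7]
print([n for n in range(2, 16) if is_field(n)])    # [2, 3, 5, 7, 11, 13]
```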
Set Theory
• The symmetric difference of two sets is written as $A \,\Delta\, B = (A - B) \cup (B - A)$.
• De Morgan’s laws say that:
$$(A \cap B)^C = A^C \cup B^C, \qquad (A \cup B)^C = A^C \cap B^C.$$
• $(A \cup B) - C = (A - C) \cup (B - C)$ and $(A \cap B) - C = (A - C) \cap (B - C)$.
• $A \cap (B \cup C) = (A \cap B) \cup (A \cap C)$ and $A \cup (B \cap C) = (A \cup B) \cap (A \cup C)$.
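The identities above can be spot-checked exhaustively on a small universe with Python's built-in set operators (`^` for symmetric difference, `-` for difference, `&` and `|` for intersection and union). This is only a sanity check on one finite example, with the universe U chosen arbitrarily; it is not a proof.

```python
from itertools import combinations

U = frozenset(range(4))   # a small example universe; complements are taken in U


def subsets(s):
    """All subsets of s, as frozensets."""
    items = sorted(s)
    return [frozenset(c) for r in range(len(items) + 1)
            for c in combinations(items, r)]


def comp(a):
    """Complement relative to the universe U."""
    return U - a


count = 0
for A in subsets(U):
    for B in subsets(U):
        assert A ^ B == (A - B) | (B - A)            # symmetric difference
        assert comp(A & B) == comp(A) | comp(B)      # De Morgan
        assert comp(A | B) == comp(A) & comp(B)
        for C in subsets(U):
            assert (A | B) - C == (A - C) | (B - C)
            assert (A & B) - C == (A - C) & (B - C)
            assert A & (B | C) == (A & B) | (A & C)  # distributive laws
            assert A | (B & C) == (A | B) & (A | C)
            count += 1
print(count)   # 4096 triples (A, B, C) checked
```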
Two sets are equivalent if there exists a bijection between them. If there exists a bijection between a set and the positive integers, $\mathbb{Z}^+$, we say that the set is countably infinite with cardinality $\aleph_0$. Some sets are uncountably infinite, meaning that no bijection exists between them and the positive integers. The cardinality of the reals is called the cardinality of the continuum, sometimes denoted $2^{\aleph_0}$, since $\mathbb{R} \approx \wp(\mathbb{Z}^+)$. (The power set of A, $\wp(A)$, is the set of all subsets of A; it has cardinality $2^{\operatorname{card} A}$.)
Combinatorics
For k objects, there are $k!$ possible arrangements of their order, each of which is called a permutation. For k objects chosen from n total objects, there are $P(n, k)$ possible arrangements, where
$$P(n, k) = \frac{n!}{(n - k)!}.$$
If order is not important (as in the drawing of a lottery or a hand of cards), the possibilities are referred to as combinations, and there are $C(n, k)$ possibilities, where
$$C(n, k) = \frac{P(n, k)}{k!} = \frac{n!}{k!\,(n - k)!} = \binom{n}{k}.$$
We note that the combination $C(n, k)$ is equal to the binomial coefficient, which comes from the binomial theorem:
$$(a + b)^n = \sum_{k=0}^{n} \binom{n}{k} a^{n-k} b^k.$$
When repetition is allowed, $P_r(n, k) = n^k$ and $C_r(n, k) = C(n + k - 1, k)$.
A generalized statement of the pigeonhole principle says that if you are given n different objects, each of which is painted one of c different colors, then for any integer $k \le \lfloor (n - 1)/c \rfloor + 1$, there are at least k objects painted the same color.
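The counting formulas of this subsection translate directly into code. The sketch below assumes Python 3.8+ (for `math.comb` and `math.perm`, used only as cross-checks); the helper names `P`, `C`, `P_rep`, `C_rep`, and `pigeonhole_bound` are illustrative. The last loop brute-forces every coloring of a few small sets of objects to confirm that $\lfloor (n-1)/c \rfloor + 1$ is exactly the largest color class that can be guaranteed.

```python
from itertools import product
from math import comb, factorial, perm   # comb/perm (Python 3.8+) used only as cross-checks


def P(n: int, k: int) -> int:
    """Ordered arrangements of k objects chosen from n: n!/(n-k)!."""
    return factorial(n) // factorial(n - k)


def C(n: int, k: int) -> int:
    """Unordered selections of k objects from n: P(n, k)/k!."""
    return P(n, k) // factorial(k)


def P_rep(n: int, k: int) -> int:
    """Ordered selections when repetition is allowed: n**k."""
    return n ** k


def C_rep(n: int, k: int) -> int:
    """Unordered selections when repetition is allowed: C(n + k - 1, k)."""
    return C(n + k - 1, k)


def pigeonhole_bound(n: int, c: int) -> int:
    """Largest k guaranteed to share a color: floor((n - 1)/c) + 1."""
    return (n - 1) // c + 1


assert P(5, 2) == perm(5, 2) == 20
assert C(52, 5) == comb(52, 5) == 2598960     # five-card hands from a 52-card deck
assert C_rep(3, 2) == 6                        # {aa, bb, cc, ab, ac, bc}
a, b, n = 2, 3, 7                              # binomial theorem spot check
assert sum(C(n, k) * a ** (n - k) * b ** k for k in range(n + 1)) == (a + b) ** n

# Brute force: over all ways to color n objects with c colors, the smallest
# possible "largest color class" equals the pigeonhole bound.
for n in range(1, 7):
    for c in range(1, 4):
        worst = min(max(col.count(x) for x in range(c))
                    for col in product(range(c), repeat=n))
        assert worst == pigeonhole_bound(n, c)
```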
Probability and Statistics
Let S be a nonempty set. A Boolean algebra (or simply an algebra) of sets on S is a nonempty subfamily of the power set of S, $\mathcal{E} \subseteq \wp(S)$, that satisfies the following two conditions:
1. If A and B are sets in $\mathcal{E}$, then so are $A \cup B$ and $A \cap B$.
2. If $A \in \mathcal{E}$, then so is $A^C = S - A$.
By definition, this family is nonempty, so it contains some set A, and by the second condition listed above, it also contains the complement of A. By the first condition above, it also contains the union of these sets, which means it contains all of S (and, by the second condition, its complement, the empty set).
Let $\mathcal{E}$ be an algebra of sets on S. A function $P : \mathcal{E} \to [0, 1]$ is called a probability measure on $\mathcal{E}$ (or on S if $\mathcal{E}$ is understood) if all of the following conditions are met:
1. $P(\varnothing) = 0$.
2. $P(S) = 1$.
3. $P(A \cup B) = P(A) + P(B)$ for every pair of disjoint sets A and B in $\mathcal{E}$.
A probability space is a set S together with an algebra of sets on S and a probability measure on S. The set S is called the sample space, the elements of S are called outcomes, and the sets in $\mathcal{E}$ (which are subsets of S) are called events. With this in mind, $P(A)$ is interpreted to be the probability that event A occurs. Two events, A and B, are considered mutually exclusive if $P(A \cap B) = 0$.
• $P(A \cup B) = P(A) + P(B) - P(A \cap B)$.
• A and B are independent if and only if $P(A \cap B) = P(A)\,P(B)$.
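A finite sample space makes these rules easy to check exactly. The sketch below assumes two fair dice (36 equally likely outcomes) and uses `fractions.Fraction` to avoid rounding; the events A, B, and D are arbitrary examples chosen for illustration.

```python
from fractions import Fraction
from itertools import product

# Sample space: ordered outcomes of rolling two fair dice (36 outcomes).
S = list(product(range(1, 7), repeat=2))


def P(event) -> Fraction:
    """Probability of an event (a set of outcomes) under equally likely outcomes."""
    return Fraction(len(event), len(S))


A = {w for w in S if w[0] % 2 == 0}   # first die shows an even number
B = {w for w in S if sum(w) == 7}     # the two dice sum to 7
D = {w for w in S if sum(w) >= 10}    # the two dice sum to at least 10

assert P(A | D) == P(A) + P(D) - P(A & D)   # inclusion-exclusion
print(P(A & B), P(A) * P(B))                # 1/12 1/12  -> A, B independent
print(P(A & D), P(A) * P(D))                # 1/9 1/12   -> A, D not independent
```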
A Bernoulli trial is an experiment in which there are only two possible outcomes, often termed a success or a failure. The probability of exactly k successes in n independent such trials is given by an application of the binomial theorem:
$$P(k; n, p) = \binom{n}{k} p^k q^{n-k},$$
where p is the probability of a success on any given trial and $q = 1 - p$ is the probability of failure. The distribution of all possibilities for $k \in [0, n] \subset \mathbb{Z}$ is called the binomial distribution.
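A direct transcription of $P(k; n, p)$, assuming Python 3.8+ for `math.comb`; the parameters n = 10 and p = 0.3 are arbitrary. Summing over all k recovers $(p + q)^n = 1$, and the mean of the distribution comes out to np.

```python
from math import comb


def binomial_pmf(k: int, n: int, p: float) -> float:
    """P(k; n, p) = C(n, k) p^k q^(n - k), with q = 1 - p."""
    q = 1 - p
    return comb(n, k) * p ** k * q ** (n - k)


n, p = 10, 0.3
dist = [binomial_pmf(k, n, p) for k in range(n + 1)]
assert abs(sum(dist) - 1) < 1e-12                                    # (p + q)^n = 1
assert abs(sum(k * pk for k, pk in enumerate(dist)) - n * p) < 1e-9  # mean is np
print(round(binomial_pmf(3, n, p), 4))                               # 0.2668
```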
Let $(S, \mathcal{E}, P)$ be a probability space. A function $X : S \to \mathbb{R}$ is called a random variable. To each outcome $\omega \in S$, the function assigns some real number, $X(\omega)$. If we consider the subset $\{\omega : X(\omega) \le t\} \subset S$ and assume it to be a member of the family $\mathcal{E}$, this subset is an event. We can associate a function $F_X$ to the random variable such that $F_X(t) = P(\{\omega : X(\omega) \le t\})$, and we call $F_X$ the distribution function of X. We can abbreviate this by writing $F_X(t) = P(X \le t)$. This function gives the probability that the random variable will take on a value no greater than t. We can also calculate the probability that the random variable will be in a certain interval, $(t_1, t_2]$, by
$$P(t_1 < X \le t_2) = P(X \le t_2) - P(X \le t_1) = F_X(t_2) - F_X(t_1).$$
This fact is arrived at by noting that the events $X \le t_1$ and $t_1 < X \le t_2$ are disjoint and considering their union, $X \le t_2$. An algebraic rearrangement of the probabilities gives the expression above. To close or open the interval we’re considering, we need only add or subtract (respectively) the probability at the endpoint (e.g., $P(t_1 \le X \le t_2) = P(X \le t_2) - P(X \le t_1) + P(X = t_1)$).
Random variables are often defined so that the probability that X will equal any particular value is zero. Meaningful results only come from considering an interval that X can fall into. Such a random variable (and its distribution function) is called continuous. The derivative of the distribution function, $f_X = F_X'$, is called the probability density function of X. We require this function to be nonnegative and integrable, so that
$$F_X(t_2) - F_X(t_1) = \int_{t_1}^{t_2} f_X(t)\,dt.$$
It is also required that $\int_{-\infty}^{\infty} f_X(t)\,dt = 1$.
We also have the following result:
$$P(X \le t) = F_X(t) = \int_{-\infty}^{t} f_X(x)\,dx.$$
If X is a continuous random variable, we can calculate the expectation (or mean) of X by:
$$\mu_X = E(X) = \int_{-\infty}^{\infty} t\, f_X(t)\,dt.$$
The variance of X is given by:
$$\sigma_X^2 = \operatorname{Var}(X) = \int_{-\infty}^{\infty} (t - \mu_X)^2 f_X(t)\,dt.$$
The standard deviation is equal to the square root of the variance.
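The integrals above can be approximated numerically. This sketch assumes an example density $f_X(t) = 2t$ on $[0, 1]$ (zero elsewhere), whose exact values are $\int f_X = 1$, $\mu_X = 2/3$, and $\sigma_X^2 = 1/18$, and it uses a plain composite midpoint rule rather than any particular library routine.

```python
def integrate(g, a: float, b: float, n: int = 100_000) -> float:
    """Composite midpoint rule for the integral of g over [a, b]."""
    h = (b - a) / n
    return h * sum(g(a + (i + 0.5) * h) for i in range(n))


def f(t: float) -> float:
    """Example density (an assumption): f(t) = 2t on [0, 1], zero elsewhere."""
    return 2 * t if 0 <= t <= 1 else 0.0


# f vanishes outside [0, 1], so integrating over [0, 1] covers (-inf, inf).
total = integrate(f, 0, 1)
mu = integrate(lambda t: t * f(t), 0, 1)
var = integrate(lambda t: (t - mu) ** 2 * f(t), 0, 1)
print(total, mu, var, var ** 0.5)   # about 1, 2/3, 1/18, and the standard deviation
```

Here $F_X(t) = t^2$ on $[0, 1]$, so, for instance, the integral of f from 0.2 to 0.5 should come out near $0.25 - 0.04 = 0.21$.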
A random variable is said to be normally distributed if its probability density function has the form:
$$f_X(t) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left(-\frac{(t - \mu)^2}{2\sigma^2}\right).$$
This function has no elementary antiderivative, so its integrals must be evaluated numerically; it is often useful to consult tables for the values of interest. These tables often make use of a change of variable: $u = (t - \mu)/\sigma$.
When the mean is set to zero and the standard deviation is set to one, we get the standard normal probability density for the standardized normal random variable Z:
$$f_Z(u) = \frac{1}{\sqrt{2\pi}} \exp\left(-\frac{u^2}{2}\right),$$
which is related to the original random variable by the equation:
$$P(t_1 < X \le t_2) = P\left(\frac{t_1 - \mu}{\sigma} < Z \le \frac{t_2 - \mu}{\sigma}\right).$$
The integral of $f_Z(u)$ gives the standard normal probability distribution, which is commonly denoted by $\Phi$:
$$\Phi(z) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{z} e^{-u^2/2}\,du.$$
For two extended real numbers $z_1 < z_2$, we have:
$$P(z_1 < Z \le z_2) = \Phi(z_2) - \Phi(z_1).$$
A brief table of values (rounded to two decimal places) is given below.
z      0     0.5   1     1.5   2     2.5   3+
Φ(z)   0.5   0.69  0.84  0.93  0.98  0.99  ≈ 1
(For negative values of z, we note that $\Phi(-z) = 1 - \Phi(z)$.)
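Tables aside, $\Phi$ can be computed from the error function through the standard identity $\Phi(z) = \tfrac{1}{2}\left[1 + \operatorname{erf}(z/\sqrt{2})\right]$, using Python's `math.erf`. The loop prints the tabulated values to four decimal places, and the final line gives the familiar probability of landing within one standard deviation of the mean.

```python
from math import erf, sqrt


def Phi(z: float) -> float:
    """Standard normal distribution function, written in terms of the error function."""
    return 0.5 * (1 + erf(z / sqrt(2)))


for z in (0, 0.5, 1, 1.5, 2, 2.5, 3):
    print(z, round(Phi(z), 4))

# Symmetry: Phi(-z) = 1 - Phi(z); and P(z1 < Z <= z2) = Phi(z2) - Phi(z1).
assert abs(Phi(-1.3) - (1 - Phi(1.3))) < 1e-12
print(round(Phi(1) - Phi(-1), 4))   # about 0.6827: within one standard deviation
```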
Returning briefly to the binomial distribution, we can compute
$$P(a_1 \le X \le a_2) = \sum_{k=a_1}^{a_2} \binom{n}{k} p^k q^{n-k},$$
but this can be unwieldy for large n. With $\mu = np$ and $\sigma = \sqrt{npq}$, we can approximate the binomial distribution by a normal distribution, where
$$P(a_1 \le X \le a_2) \approx \Phi\left(\frac{a_2 + \tfrac{1}{2} - \mu}{\sigma}\right) - \Phi\left(\frac{a_1 - \tfrac{1}{2} - \mu}{\sigma}\right).$$
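To see how good the approximation is, the sketch below compares the exact sum with the continuity-corrected normal estimate for an arbitrary example, n = 100 and p = 1/2, over the range 45 ≤ X ≤ 55. It redefines $\Phi$ via `math.erf` so that it runs on its own; the function names are illustrative.

```python
from math import comb, erf, sqrt


def Phi(z: float) -> float:
    """Standard normal distribution function via the error function."""
    return 0.5 * (1 + erf(z / sqrt(2)))


def binomial_range_exact(a1: int, a2: int, n: int, p: float) -> float:
    """P(a1 <= X <= a2) for a binomial(n, p) variable, summed term by term."""
    q = 1 - p
    return sum(comb(n, k) * p ** k * q ** (n - k) for k in range(a1, a2 + 1))


def binomial_range_normal(a1: int, a2: int, n: int, p: float) -> float:
    """Normal approximation with mu = np, sigma = sqrt(npq) and the +/- 1/2 correction."""
    mu, sigma = n * p, sqrt(n * p * (1 - p))
    return Phi((a2 + 0.5 - mu) / sigma) - Phi((a1 - 0.5 - mu) / sigma)


exact = binomial_range_exact(45, 55, 100, 0.5)
approx = binomial_range_normal(45, 55, 100, 0.5)
print(f"exact = {exact:.4f}, normal approximation = {approx:.4f}")
```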
Point-Set Topology
Let X be a nonempty set. A topology on X, denoted by $\mathcal{T}$, is a family of subsets of X for which the following three properties always hold:
1. $\varnothing$ and X are in $\mathcal{T}$.
2. If $O_1$ and $O_2$ are in $\mathcal{T}$, then so is their intersection, $O_1 \cap O_2$.
3. If $\{O_i\}_{i \in I}$ is any collection of sets from $\mathcal{T}$, then their union, $\bigcup_{i \in I} O_i$, is also in $\mathcal{T}$.
In sum, a topology is always closed under finite intersections and arbitrary unions. The sets in $\mathcal{T}$ are known as open sets. In the standard topology on $\mathbb{R}$, a set O is open if all of its elements are interior points, which means that every point $p \in O$ is contained in an open interval $S_p$ that is itself contained in O. That is, $p \in S_p \subset O$. A point $p \in \mathbb{R}$ is called an accumulation point of a set A if every open set G containing p contains a point of A different from p.
A set X together with a topology $\mathcal{T}$ is called a topological space, $(X, \mathcal{T})$. A Hausdorff space is a topological space such that for every pair of distinct points $x, y \in X$ there exist disjoint open sets $O_x$ and $O_y$ such that $x \in O_x$ and $y \in O_y$.
For any nonempty set X, the collection $\{\varnothing, X\}$ is always a topology on X, called the indiscrete (or trivial) topology. The power set of X, $\wp(X)$, is also always a topology on X, and it is referred to as the discrete topology. If $\mathcal{T}_1$ and $\mathcal{T}_2$ are topologies on X, we say that $\mathcal{T}_1$ is finer than $\mathcal{T}_2$ (or that $\mathcal{T}_2$ is coarser than $\mathcal{T}_1$) if $\mathcal{T}_1 \supseteq \mathcal{T}_2$. We know that $\wp(X) \supseteq \{\varnothing, X\}$, and we say that these collections represent the extremes: $\wp(X)$ is the finest possible topology on X, and $\{\varnothing, X\}$ is the coarsest possible topology on X. For $\mathcal{T}_1 \supseteq \mathcal{T}_2$, we can also write $\mathcal{T}_1 \succsim \mathcal{T}_2$ to indicate fineness and coarseness.
The Subspace Topology
If $(X, \mathcal{T})$ is a topological space, we can use $\mathcal{T}$ to define a topology on a subset $S \subset X$. A subset $U \subset S$ is said to be open in S if $U = O \cap S$ for some set O that is open in X. It is important to point out that U need not be open in X for this definition to apply. Consider the sets $O = (-b, b)$ and $S = [-a, a]$, both in $\mathbb{R}$, with $0 < a < b$. Then $U = O \cap S = S = [-a, a]$ is said to be open in S, since O is open in $\mathbb{R}$, even though U is not open in $\mathbb{R}$. In this way, we can define the subspace (or relative) topology, $\mathcal{T}_S$, with S a subspace of X. Written more compactly,
$$\mathcal{T}_S = \{S \cap U : U \in \mathcal{T}\}.$$
Interior, Exterior, Boundary, Limit Points, and Closure
Let $(X, \mathcal{T})$ be a topological space, and let A be a subset (not necessarily open) of X. The interior of A, denoted $\operatorname{int}(A)$, is the union of all open sets contained in A, or, equivalently, the largest open set contained in A. The exterior of A, denoted $\operatorname{ext}(A)$, is the union of all open sets that do not intersect A; equivalently, we can say that $\operatorname{ext}(A) = \operatorname{int}(A^C)$. The boundary of A, denoted $\operatorname{bd}(A)$, is the set of all $x \in X$ such that every open set containing x intersects both A and the complement of A; equivalently, $\operatorname{bd}(A) = \left(\operatorname{int}(A) \cup \operatorname{ext}(A)\right)^C$. A limit point of A is an accumulation point, which is defined above. The set of all the limit points of A is called the derived set of A, denoted $A'$. The closure of A, denoted $\operatorname{cl}(A)$, can be defined in two equivalent ways: $\operatorname{cl}(A) = A \cup A' = \operatorname{int}(A) \cup \operatorname{bd}(A)$. A perhaps more useful definition says that the closure of A is the intersection of all closed supersets of A, that is, the intersection of all complements $O^C$ of open sets $O \in \mathcal{T}$ such that $A \subseteq O^C$.
A set is closed if and only if it contains all of its boundary points, or, equivalently, all of its limit points; in other words, A is closed if and only if $\operatorname{cl}(A) = A$. Also, a set is closed if and only if its complement is open; a set is open if and only if its complement is closed. A set A is open if and only if $\operatorname{int}(A) = A$. It is possible for sets to be simultaneously open and closed. (Consider the discrete topology, made up of the power set, which contains every possible subset, all of which are considered open. But since the complement of every open subset is also contained in the power set, every subset is both open and closed.)
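On a finite set these definitions can be computed directly. The sketch below assumes a small example topology on $X = \{1, 2, 3\}$ and represents sets as frozensets; for a finite family, closure under pairwise unions and intersections is enough to verify the axioms. It computes the interior as a union of open subsets and the closure as an intersection of closed supersets, and it takes the boundary as $\operatorname{cl}(A) - \operatorname{int}(A)$, which agrees with the definition above. The helper names are illustrative.

```python
from itertools import chain

X = frozenset({1, 2, 3})
# An example topology on X, chosen only for illustration.
T = {frozenset(), frozenset({1}), frozenset({1, 2}), X}


def is_topology(T, X) -> bool:
    """For a finite family, pairwise unions and intersections suffice."""
    if frozenset() not in T or X not in T:
        return False
    return all((a & b) in T and (a | b) in T for a in T for b in T)


def interior(A):
    """Union of all open sets contained in A."""
    return frozenset(chain.from_iterable(O for O in T if O <= A))


def closure(A):
    """Intersection of all closed sets (complements of open sets) that contain A."""
    result = X
    for O in T:
        if A <= X - O:
            result &= X - O
    return result


def boundary(A):
    """Closure minus interior; equals the complement of int(A) union ext(A)."""
    return closure(A) - interior(A)


A = frozenset({1, 3})
print(sorted(interior(A)), sorted(closure(A)), sorted(boundary(A)))   # [1] [1, 2, 3] [2, 3]
```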
Basis for a Topology
Let X be a nonempty set, and let $\mathcal{B}$ be a collection of subsets of X satisfying the following properties:
1. For every $x \in X$, there is at least one set $B \in \mathcal{B}$ such that $x \in B$.
2. If $B_1$ and $B_2$ are sets in $\mathcal{B}$ and $x \in B_1 \cap B_2$, then there exists a set $B_3 \in \mathcal{B}$ such that $x \in B_3 \subseteq B_1 \cap B_2$.
The collection $\mathcal{B}$ is called a basis, and the sets in $\mathcal{B}$ are known as basis elements. A basis is used to generate a topology in that the topology consists of all possible unions of the basis elements. A subset $O \subset X$ belongs to the topology $\mathcal{T}$ generated by $\mathcal{B}$ if, for every $x \in O$, there exists a basis element B such that $x \in B \subseteq O$.
If $(X, \mathcal{T}_X)$ and $(Y, \mathcal{T}_Y)$ are topological spaces, we can define a product topology on the Cartesian product $X \times Y$ by setting up the following standard basis:
$$\mathcal{B} = \{O_X \times O_Y : O_X \in \mathcal{T}_X \wedge O_Y \in \mathcal{T}_Y\}.$$
Connectedness
Let $(X, \mathcal{T})$ be a topological space. If there exist disjoint, nonempty open sets $O_1$ and $O_2$ such that $O_1 \cup O_2 = X$, then X is said to be disconnected. If, however, $\mathcal{T}$ contains no pair of disjoint, nonempty sets whose union is X, then X is said to be a connected space.
The following criteria can be used to identify connected spaces:
1. If A and B are connected and they intersect, then their union is also connected. This holds for any number of connected sets, as long as their intersection is nonempty.
2. Let A be a connected set, and let B be any set such that $A \subseteq B \subseteq \operatorname{cl}(A)$. Then B is connected.
3. The Cartesian product of connected spaces is connected.
4. Let X be a topological space with the property that any two points, $x_1, x_2 \in X$, can be joined by a continuous path. Then X is said to be path-connected, and any path-connected space is connected. The converse of this statement is not true.
Compactness
Let $(X, \mathcal{T})$ be a topological space. A covering of X is a collection of subsets of X whose union is X. An open covering is a covering consisting entirely of open sets. If every open covering of X contains a finite subcollection that also covers X, then X is said to be a compact space. Compact spaces can be identified using the following criteria:
1. Let X be a compact topological space, and let S be a subset of X. If S is closed, then it’s compact. The converse is true if X is Hausdorff.
2. The Cartesian product of compact spaces is compact.
3. The Heine-Borel theorem: A subset of $\mathbb{R}^n$ is compact if and only if it’s both closed and bounded.
(A subset $A \subset \mathbb{R}^n$ is bounded if there exists some positive number M such that the norm of every point $\mathbf{x} \in A$ is less than M. That is, for $\mathbf{x} = (x_1, x_2, \ldots, x_n)$, $\|\mathbf{x}\| = \sqrt{x_1^2 + x_2^2 + \cdots + x_n^2} < M$.)
Metric Spaces
Let X be a nonempty set, and let $d : X \times X \to \mathbb{R}$ be a real-valued function defined on ordered pairs of points in X. The function d is said to be a metric on X if the following properties hold for all $x, y, z \in X$:
1. $d(x, y) \ge 0$, and $d(x, y) = 0$ if and only if $x = y$.
2. $d(x, y) = d(y, x)$.
3. $d(x, z) \le d(x, y) + d(y, z)$.
We call $d(x, y)$ the distance between x and y. A set X together with a metric on X is called a metric space. For $\varepsilon > 0$, the set $B_d(x, \varepsilon) = \{x' \in X : d(x, x') < \varepsilon\}$ is called an ε-ball, the open ball of radius ε centered on x. The collection of all ε-balls, $\mathcal{B} = \{B_d(x, \varepsilon) : x \in X \wedge \varepsilon > 0\}$, is a basis for a topology on X, called the metric topology (induced by d). In this topology, a subset $O \subset X$ is open if and only if, for every $x \in O$, there exists a positive number $\varepsilon_x$ such that $B_d(x, \varepsilon_x) \subseteq O$, which says that every point in the set must be the center of an ε-ball contained in the set.
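A finite sample of points can only refute a metric, not prove one, but a quick check is still instructive. This sketch uses the taxicab metric $d((x_1, y_1), (x_2, y_2)) = |x_1 - x_2| + |y_1 - y_2|$ on $\mathbb{R}^2$ as an example, tests the three axioms on a handful of integer points, and shows that squared distance on $\mathbb{R}$ fails the triangle inequality. The helper names are illustrative.

```python
from itertools import product


def taxicab(p, q):
    """Taxicab (L1) distance between points of R^2."""
    return abs(p[0] - q[0]) + abs(p[1] - q[1])


def check_metric(d, points) -> bool:
    """Test the three metric axioms on every triple from a finite sample."""
    for x, y, z in product(points, repeat=3):
        if d(x, y) < 0 or (d(x, y) == 0) != (x == y):   # axiom 1
            return False
        if d(x, y) != d(y, x):                          # axiom 2
            return False
        if d(x, z) > d(x, y) + d(y, z):                 # axiom 3
            return False
    return True


def in_ball(d, center, eps, p) -> bool:
    """Is p in the open ball B_d(center, eps) = {p : d(center, p) < eps}?"""
    return d(center, p) < eps


pts = [(0, 0), (1, 2), (-3, 1), (2, -2), (0, 5)]
print(check_metric(taxicab, pts))                          # True
print(check_metric(lambda a, b: (a - b) ** 2, [0, 1, 2]))  # False: 4 > 1 + 1
print(in_ball(taxicab, (0, 0), 1, (0.5, 0.4)))             # True: distance 0.9
```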
Continuity
We recall the definition of a continuous function, which says that a function f is continuous at the point $x_0$ if, for every $\varepsilon > 0$, there exists a number $\delta > 0$ such that $|x - x_0| < \delta \Rightarrow |f(x) - f(x_0)| < \varepsilon$. We can generalize this to metric spaces by saying the following: Let $(X_1, d_1)$ and $(X_2, d_2)$ be metric spaces. A function $f : X_1 \to X_2$ is continuous at the point $x_0$ if, for every $\varepsilon > 0$, there exists a number $\delta > 0$ such that $d_1(x, x_0) < \delta \Rightarrow d_2(f(x), f(x_0)) < \varepsilon$. The function f is simply called continuous if it is continuous at every $x_0 \in X_1$. We can also say that $f : X_1 \to X_2$ is continuous at the point $x_0$ if, for every open set O containing $f(x_0)$, the inverse image $f^{-1}(O)$ contains an open set containing $x_0$. This is equivalent to saying that for every $\varepsilon > 0$ there is a $\delta > 0$ with $B_{d_1}(x_0, \delta) \subseteq f^{-1}\left(B_{d_2}(f(x_0), \varepsilon)\right)$. Finally, for topological spaces $(X_1, \mathcal{T}_1)$ and $(X_2, \mathcal{T}_2)$, a function $f : X_1 \to X_2$ is continuous if, for every open set $O \subset X_2$ (i.e., for every $O \in \mathcal{T}_2$), the inverse image, $f^{-1}(O)$, is open in $X_1$ (i.e., $f^{-1}(O) \in \mathcal{T}_1$). Moreover, f is continuous at the point $x_0$ if for every open set $O_2 \subset X_2$ containing $f(x_0)$ there is an open set $O_1 \subset X_1$ containing $x_0$ such that $f(O_1) \subseteq O_2$. If the codomain of f is a topological space generated by a basis, it is sufficient to check that the inverse images of the basis elements are open in $X_1$. The following are some facts about continuous maps between topological spaces, $f : (X_1, \mathcal{T}_1) \to (X_2, \mathcal{T}_2)$:
1. The set $f^{-1}(C)$ is closed in $X_1$ for every closed subset $C \subset X_2$. (In fact, this is the case if and only if the map f is continuous.)
2. If C is a connected subset of $X_1$, then $f(C)$ is a connected subset of $X_2$.
3. If C is a compact subset of $X_1$, then $f(C)$ is a compact subset of $X_2$.
4. Let $f : X \to \mathbb{R}$ be a continuous function, with X a nonempty compact space. Then there exist points $a, b \in X$ such that $f(a) \le f(x) \le f(b)$ for every $x \in X$. In other words, any continuous real-valued function defined on a nonempty compact space (in particular, a closed, bounded subset of $\mathbb{R}^n$) achieves an absolute maximum at some point in X and an absolute minimum at some point in X.
A map $f : X_1 \to X_2$ is said to be an open map if the image of every open set in $X_1$ is open in $X_2$. (Note the difference between this and the definition of continuity.) If $f : X_1 \to X_2$ is bijective and both f and $f^{-1}$ are continuous (meaning that f is a continuous, open map), then f is called a homeomorphism. This is analogous to the concept of an isomorphism between groups or rings. That is, if an isomorphism exists between two algebraic structures, that structure is preserved, and they are algebraically identical. Homeomorphic topological spaces are topologically identical, since their topological structure is preserved. For example, the composition $g \circ f$ is a homeomorphism between the open interval $(0, 1)$ and $\mathbb{R}$, where
$$(0, 1) \xrightarrow{\;f(x) = \pi\left(x - \frac{1}{2}\right)\;} \left(-\tfrac{\pi}{2}, \tfrac{\pi}{2}\right) \xrightarrow{\;g(y) = \tan y\;} \mathbb{R}.$$
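Numerically, the composite map $h = g \circ f$, $h(x) = \tan(\pi x - \pi/2)$, and its inverse $h^{-1}(y) = (\arctan y + \pi/2)/\pi$ (obtained here by solving $y = h(x)$ for x) undo each other, and both are continuous, which is what makes h a homeomorphism. The sketch only spot-checks a few points; the names `h` and `h_inv` are illustrative.

```python
from math import atan, isclose, pi, tan


def h(x: float) -> float:
    """(g o f)(x) = tan(pi*x - pi/2), mapping (0, 1) onto R."""
    return tan(pi * x - pi / 2)


def h_inv(y: float) -> float:
    """Inverse map R -> (0, 1), obtained by solving y = tan(pi*x - pi/2) for x."""
    return (atan(y) + pi / 2) / pi


for x in (0.01, 0.25, 0.5, 0.9, 0.999):
    assert isclose(h_inv(h(x)), x)      # h_inv undoes h
print(h(0.001), h(0.999))               # large negative and large positive values
```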
Let $f : X_1 \to X_2$ be a bijective, continuous map of topological spaces. If $X_1$ is compact and $X_2$ is Hausdorff, then f is a homeomorphism.
Real Analysis
Consider a set of real numbers $X \subset \mathbb{R}$. If X is nonempty and bounded above, there is a smallest number u such that $u \ge x$ for every $x \in X$. This number is the least upper bound of X, also known as the supremum, $\sup X$. Similarly, if X is nonempty and bounded below, there is a greatest number l such that $l \le x$ for every $x \in X$. This number is called the greatest lower bound, also known as the infimum of X, $\inf X$. The supremum and infimum are unique for any nonempty, bounded set of real numbers.
A Cauchy sequence is a sequence of numbers, $(x_n)_{n=1}^{\infty}$, such that for any $\varepsilon > 0$, no matter how small, there exists a number N such that, for all $m, n > N$, $|x_n - x_m| < \varepsilon$. This means that the terms of a Cauchy sequence eventually all lie arbitrarily close to one another (not merely successive terms). Every Cauchy sequence of real numbers converges. Any metric space in which every Cauchy sequence is guaranteed to converge to a point in the space is called a complete space. By this definition, $\mathbb{R}$ is complete.
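Completeness depends on the space. As an illustration not taken from the text, Newton's iteration $x_{k+1} = (x_k + 2/x_k)/2$ started at 1 produces a Cauchy sequence of rational numbers, computed exactly below with `fractions.Fraction`. Its limit, $\sqrt{2}$, is irrational, so the sequence converges in $\mathbb{R}$ but not in $\mathbb{Q}$; this is why $\mathbb{Q}$ is not complete. The function name is hypothetical.

```python
from fractions import Fraction


def newton_sqrt2(steps: int) -> list[Fraction]:
    """Exact rational iterates x_{k+1} = (x_k + 2/x_k)/2, starting from x_0 = 1."""
    xs = [Fraction(1)]
    for _ in range(steps):
        x = xs[-1]
        xs.append((x + 2 / x) / 2)
    return xs


xs = newton_sqrt2(6)
# Gaps between consecutive terms shrink (quadratically, for Newton's method),
# consistent with the sequence being Cauchy.
gaps = [abs(xs[k + 1] - xs[k]) for k in range(len(xs) - 1)]
assert all(g2 < g1 for g1, g2 in zip(gaps, gaps[1:]))
# Yet no term squares to 2: the limit sqrt(2) is not in Q.
assert all(x * x != 2 for x in xs)
print(xs[:4], float(xs[-1]))   # 1, 3/2, 17/12, 577/408, ... approaching sqrt(2)
```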
Lebesgue Measure
Let $\overline{\mathbb{R}}$ denote the set $\mathbb{R} \cup \{\infty\}$. Let A be any subset of $\mathbb{R}$, and find a countable, open covering of A by intervals of the form $(a_i, b_i)$. We have $A \subseteq \bigcup_{i=1}^{\infty} (a_i, b_i)$. We define a function $\mu^* : \wp(\mathbb{R}) \to \overline{\mathbb{R}}$ (the Lebesgue outer measure) by the equation:
$$\mu^*(A) = \inf \left\{ \sum_{i=1}^{\infty} (b_i - a_i) : A \subseteq \bigcup_{i=1}^{\infty} (a_i, b_i) \right\},$$
where the infimum is taken over all such coverings and we permit $\mu^*(A) = \infty$. We now restrict this function to a subfamily, $\mathcal{M} \subset \wp(\mathbb{R})$, where $\mathcal{M}$ is defined as follows: a subset $M \subset \mathbb{R}$ is a member of $\mathcal{M}$ if and only if
$$\mu^*(A) = \mu^*(A \cap M) + \mu^*(A \cap M^C) \quad \text{for every } A \in \wp(\mathbb{R}).$$
The sets in $\mathcal{M}$ are called Lebesgue measurable sets, and the restriction of $\mu^*$ to $\mathcal{M}$ is denoted by $\mu$. This function is called the Lebesgue measure. A set M for which $\mu(M) = 0$ is said to be of measure zero.
Every open set and every closed set in $\mathbb{R}$ is Lebesgue measurable, and every finite or countably infinite subset of $\mathbb{R}$ is also measurable. The complement of a measurable set is measurable, and a finite or countably infinite union or intersection of measurable sets is measurable. Not every set in $\wp(\mathbb{R})$ is measurable, but almost all subsets of $\mathbb{R}$ that arise in practice are measurable. Some properties of the measure and measurable sets follow:
1. The empty set is measurable and has measure zero. If $M = \{m\}$ is a one-element (or singleton) set, then $\mu(M) = 0$. It follows from this that if M is a finite or countably infinite subset of $\mathbb{R}$, then $\mu(M) = 0$. For instance, $\mu(\mathbb{Z}) = \mu(\mathbb{Q}) = 0$.
2. If $M = (a, b)$, $[a, b)$, $(a, b]$, or $[a, b]$, then the measure is just the length of the interval, $\mu(M) = b - a$. If M is a finite union of disjoint intervals, then $\mu(M)$ is the sum of the lengths of the intervals. If M is a measurable set that contains an infinite interval, then $\mu(M) = \infty$. In general, if $\{M_i\}$ is any countable collection of disjoint, measurable sets, then
$$\mu\left(\bigcup_{i=1}^{\infty} M_i\right) = \sum_{i=1}^{\infty} \mu(M_i).$$
3. If $M_1, M_2 \in \mathcal{M}$ and $M_1 \subseteq M_2$, then $\mu(M_1) \le \mu(M_2)$.
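Property 2 suggests a simple way to measure a finite union of intervals that may overlap: merge the overlaps so that the pieces are disjoint, then add their lengths. The sketch assumes intervals are given as (a, b) pairs with finite endpoints; whether each endpoint is included does not matter, since single points have measure zero (property 1). The function name is illustrative.

```python
def measure_of_union(intervals: list[tuple[float, float]]) -> float:
    """Lebesgue measure of a finite union of bounded intervals.

    Overlapping (or touching) intervals are merged first, so the remaining
    blocks are disjoint and their lengths can simply be added.
    """
    pieces = sorted((a, b) for a, b in intervals if b > a)
    total = 0.0
    cur_a = cur_b = None
    for a, b in pieces:
        if cur_b is None or a > cur_b:      # starts a new disjoint block
            if cur_b is not None:
                total += cur_b - cur_a
            cur_a, cur_b = a, b
        else:                               # overlaps the current block: extend it
            cur_b = max(cur_b, b)
    if cur_b is not None:
        total += cur_b - cur_a
    return total


print(measure_of_union([(0, 1), (0.5, 2), (3, 4)]))   # 3.0: (0, 2) and (3, 4)
```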
A function $f : \mathbb{R} \to \mathbb{R}$ is said to be Lebesgue measurable if, for every open set $O \subset \mathbb{R}$, the inverse image, $f^{-1}(O)$, is a Lebesgue measurable set. Since every open subset of $\mathbb{R}$ is measurable, it follows from our earlier definition of continuity that every continuous function is Lebesgue measurable. Still, there do exist noncontinuous functions that are measurable. The sum, difference, and product of measurable functions are measurable. Also, if f is measurable, then so is $|f|$, defined by $|f|(x) = |f(x)|$.
For any $A \subset \mathbb{R}$, we can define a function on $\mathbb{R}$, the characteristic function of A:
$$\chi_A(x) = \begin{cases} 1 & \text{if } x \in A \\ 0 & \text{if } x \notin A \end{cases}.$$
This function is measurable if and only if A is a measurable set. Using this function, we can construct another function, the step function, which is a finite linear combination of characteristic functions with real coefficients. Such a function typically takes the form $s(x) = \sum_{i=1}^{n} a_i \chi_{A_i}(x)$. Step functions provide the foundation for constructing the Lebesgue integral of a general function.
Let f be a measurable function such that $f(x) \ge 0$ for every $x \in \mathbb{R}$. Then there exists a sequence of step functions, $(s_1, s_2, \ldots)$, such that $0 \le s_1 \le s_2 \le \cdots$ and for which $\lim_{n \to \infty} s_n(x) = f(x)$.
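One standard construction of such a sequence (assumed here; the text does not specify one) rounds f down to the nearest multiple of $2^{-n}$ and caps the result at n: $s_n(x) = \min\left(n, \lfloor 2^n f(x) \rfloor / 2^n\right)$. Each $s_n$ takes finitely many values, and the sequence increases to f. The sketch evaluates it at sample points for the example $f(x) = x^2$.

```python
from math import floor


def s(n: int, f, x: float) -> float:
    """n-th approximation: f(x) rounded down to a multiple of 2**-n, capped at n.
    As a function of x it takes only finitely many values (k / 2**n, k = 0..n*2**n)."""
    return min(n, floor(f(x) * 2 ** n) / 2 ** n)


def f(x: float) -> float:
    """Example nonnegative function (an assumption for illustration)."""
    return x * x


x = 1.7
values = [s(n, f, x) for n in range(1, 12)]
assert all(a <= b for a, b in zip(values, values[1:]))   # 0 <= s_1(x) <= s_2(x) <= ...
assert f(x) - values[-1] < 2 ** -10                       # within 2**-n of f(x) once n >= f(x)
print(values)
```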
Let $s = \sum_{i=1}^{n} a_i \chi_{A_i}$ be a step function such that every set $A_i$ is measurable. Then s is a measurable function, and we say that s is Lebesgue integrable if $a_i \ne 0 \Rightarrow \mu(A_i) < \infty$. In this case, the Lebesgue integral is:
$$\int s\,d\mu = \sum_{i=1}^{n} a_i \,\mu(A_i).$$
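For step functions whose sets $A_i$ are finite unions of disjoint intervals (a simplifying assumption; general measurable sets cannot be listed this way), the integral is just a weighted sum of lengths. The sketch represents s as a hypothetical list of (coefficient, intervals) pairs and refuses to integrate when a nonzero coefficient sits on a set of infinite measure.

```python
import math


def step_integral(terms):
    """Integral of s = sum_i a_i * chi_{A_i}.

    terms: list of (a_i, A_i), where A_i is a list of pairwise disjoint
    intervals (lo, hi); mu(A_i) is taken to be their total length.
    """
    total = 0.0
    for coeff, intervals in terms:
        if coeff == 0:
            continue                    # 0 * mu(A_i) contributes nothing, even if mu is infinite
        mu = sum(hi - lo for lo, hi in intervals)
        if math.isinf(mu):
            raise ValueError("not integrable: nonzero coefficient on a set of infinite measure")
        total += coeff * mu
    return total


s = [(3, [(0, 2)]), (-1, [(5, 6)]), (0, [(10, math.inf)])]
print(step_integral(s))   # 3*2 + (-1)*1 + 0 = 5.0
```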
Let f be a measurable function such that $f \ge 0$. If every step function s with $0 \le s \le f$ is integrable and the values $\int s\,d\mu$ have a finite supremum, then we say that f is Lebesgue integrable. The value of the Lebesgue integral of f is:
$$\int f\,d\mu = \sup \left\{ \int s\,d\mu \right\},$$
where the supremum is taken over all integrable, nonnegative step functions, s, such that $s \le f$. For functions that are not everywhere nonnegative, we consider that every function can be written as the difference of two nonnegative functions, $f = f^+ - f^-$, where
$$f^+ = \tfrac{1}{2}\left(|f| + f\right) = \max(f, 0), \qquad f^- = \tfrac{1}{2}\left(|f| - f\right) = \max(-f, 0)$$
are the positive part and the negative part, respectively. This means that an arbitrary measurable function f is Lebesgue integrable if both its positive and negative parts are Lebesgue integrable, and
$$\int f\,d\mu = \int f^+\,d\mu - \int f^-\,d\mu.$$
Lebesgue integration takes a different approach from Riemann integration. Whereas a Riemann integral splits the domain of f into partitions and calculates areas under a curve based on rectangles of height f, becoming exact as the rectangles’ widths grow infinitesimal, the Lebesgue integral splits the range of f into partitions, approximates f from below by step functions built on the resulting level sets, and takes the limit of their integrals.
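The contrast can be seen numerically for the example $f(x) = x^2$ on [0, 1] (exact integral 1/3). The Riemann-style sum below partitions the domain; the Lebesgue-style sum partitions the range into levels $y_k = k/n$ and weights each level by the measure of $\{x : y_k \le f(x) < y_{k+1}\}$, which for this particular f is known in closed form as $[\sqrt{y_k}, \sqrt{y_{k+1}})$. For a general f that measure would have to be found some other way, so this is a sketch for one example, not a general integrator.

```python
from math import sqrt


def f(x: float) -> float:
    """Example integrand on [0, 1]; its exact integral is 1/3."""
    return x * x


def riemann_lower(n: int) -> float:
    """Partition the domain [0, 1] into n equal pieces; f is increasing there,
    so the left endpoint gives the smallest height on each piece."""
    return sum(f(i / n) / n for i in range(n))


def lebesgue_lower(n: int) -> float:
    """Partition the range [0, 1] into levels y_k = k/n and weight y_k by the
    measure of {x in [0, 1] : y_k <= f(x) < y_(k+1)}, which for f(x) = x**2
    is the interval [sqrt(y_k), sqrt(y_(k+1))). The single point where
    f(x) = 1 has measure zero and is ignored."""
    return sum((k / n) * (sqrt((k + 1) / n) - sqrt(k / n)) for k in range(n))


for n in (10, 100, 1000):
    print(n, riemann_lower(n), lebesgue_lower(n))   # both approach 1/3 from below
```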
Complex Analysis
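A complex number $z = x + iy$ can be written in polar form as $z = r(\cos\theta + i\sin\theta)$, where $r = |z| = \sqrt{z\bar{z}} = \sqrt{x^2 + y^2}$ and $\theta$, called the argument of z, satisfies $\tan\theta = y/x$; that is, $\theta = \arctan(y/x)$ when $x > 0$, while for $x \le 0$ the quadrant of (x, y) must be taken into account. The argument corresponds to the angle made with the positive real axis. The principal value of $\theta$, written $\operatorname{Arg} z$, falls in the range $-\pi < \theta \le \pi$.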
Two complex numbers can be easily multiplied and divided in polar form:
$$z_1 z_2 = r_1 r_2\left[\cos(\theta_1 + \theta_2) + i\sin(\theta_1 + \theta_2)\right],$$
$$\frac{z_1}{z_2} = \frac{r_1}{r_2}\left[\cos(\theta_1 - \theta_2) + i\sin(\theta_1 - \theta_2)\right].$$
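Python's `cmath` module works directly with polar form: `cmath.polar` returns $(|z|, \operatorname{Arg} z)$ and `cmath.rect` converts back. The sketch shows why $\arctan(y/x)$ alone can give the wrong quadrant and checks the product and quotient rules above on arbitrary example values. (One caveat: `cmath.phase` follows the sign of the imaginary part, so a negative real number whose imaginary part is −0.0 gets −π rather than π.)

```python
import cmath
import math

z = complex(-1, -1)                        # a point in the third quadrant
r, theta = cmath.polar(z)                  # (|z|, Arg z)
print(r, theta)                            # sqrt(2) and -3*pi/4
print(math.atan(z.imag / z.real))          # pi/4: the quadrant is lost
assert math.isclose(theta, math.atan2(z.imag, z.real))
assert math.isclose(r, math.sqrt((z * z.conjugate()).real))

# Moduli multiply (divide) and arguments add (subtract).
z1, z2 = cmath.rect(2, 0.4), cmath.rect(3, 1.1)
r_prod, t_prod = cmath.polar(z1 * z2)
r_quot, t_quot = cmath.polar(z1 / z2)
assert math.isclose(r_prod, 6) and math.isclose(t_prod, 1.5)
assert math.isclose(r_quot, 2 / 3) and math.isclose(t_quot, -0.7)
```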
Comparing the Taylor series of $e^{i\theta}$, $\cos\theta$, and $\sin\theta$ provides us with the interesting fact that complex numbers can also be expressed exponentially:
$$z = re^{i\theta} = r(\cos\theta + i\sin\theta),$$
in an expression known as Euler’s formula. Following from this is de Moivre’s formula, which says:
$$(\cos\theta + i\sin\theta)^n = \cos n\theta + i\sin n\theta.$$
Since the number 1 can be expressed in exponential form as $e^{2\pi i k}$ for any integer k, we can express the nth roots of unity by:
$$\left\{ e^{2\pi i k/n} = \cos\left(\frac{2\pi k}{n}\right) + i\sin\left(\frac{2\pi k}{n}\right) : k = 0, 1, \ldots, n - 1 \right\}.$$
The nth roots of any nonzero complex number can be found by multiplying the principal nth root by the nth roots of unity.
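Following the recipe above, this sketch computes all n-th roots of a nonzero w as the principal root $|w|^{1/n} e^{i \operatorname{Arg}(w)/n}$ times each n-th root of unity. For the example $w = 8i$ and n = 3, the roots are $\sqrt{3} + i$, $-\sqrt{3} + i$, and $-2i$. The function name is illustrative.

```python
import cmath
import math


def nth_roots(w: complex, n: int) -> list[complex]:
    """All n-th roots of a nonzero w: principal root times the n-th roots of unity."""
    r, theta = cmath.polar(w)                           # |w| and Arg w
    principal = cmath.rect(r ** (1 / n), theta / n)
    unity = [cmath.exp(2j * math.pi * k / n) for k in range(n)]
    return [principal * u for u in unity]


roots = nth_roots(8j, 3)
for z in roots:
    assert abs(z ** 3 - 8j) < 1e-9                      # each really is a cube root
print([complex(round(z.real, 6), round(z.imag, 6)) for z in roots])
```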
The logarithm of a complex number can be defined thanks to our ability to express such a number exponentially. But we must recall that adding any integer multiple of $2\pi$ to the argument of z leaves the complex number unchanged. Therefore, any nonzero complex number has infinitely many logarithms, but we define the principal value by $\operatorname{Log} z = \ln|z| + i\operatorname{Arg} z$. Using the concept of the complex logarithm, we can define complex powers, $z^w$, where $z, w \in \mathbb{C}$ and $z \ne 0$. For the principal value, we have $z^w = e^{w \operatorname{Log} z}$.
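A short sketch of the principal logarithm and principal powers, built from `math.log` and `cmath.phase` so that each piece of $\operatorname{Log} z = \ln|z| + i\operatorname{Arg} z$ is visible (`cmath.log` uses the same branch). The helper names `Log` and `cpow` are illustrative. It confirms that the other logarithms differ by multiples of $2\pi i$ and that $i^i = e^{-\pi/2}$ is real.

```python
import cmath
import math


def Log(z: complex) -> complex:
    """Principal logarithm ln|z| + i Arg z, for z != 0 (cmath.log uses the same branch)."""
    # Note: cmath.phase follows the sign of z.imag, so a negative real number
    # with imaginary part -0.0 gets argument -pi rather than pi.
    return complex(math.log(abs(z)), cmath.phase(z))


def cpow(z: complex, w: complex) -> complex:
    """Principal value of z**w, defined as exp(w Log z) for z != 0."""
    return cmath.exp(w * Log(z))


z = complex(-1, 0)
assert cmath.isclose(Log(z), complex(0, math.pi))
# Every other logarithm differs by an integer multiple of 2*pi*i.
for k in range(-2, 3):
    assert cmath.isclose(cmath.exp(Log(z) + 2j * math.pi * k), z)
print(cpow(1j, 1j))   # i**i = e**(-pi/2), a real number (about 0.2079)
```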
Inspired by Euler’s formula, we can develop formulas for calculating trigonometric functions of complex numbers. In particular, we have:
$$\cos z = \frac{e^{iz} + e^{-iz}}{2} \quad \text{and} \quad \sin z = \frac{e^{iz} - e^{-iz}}{2i}.$$
The rest of the functions are defined by applying their familiar relations to the definitions for the sine and cosine above. The trigonometric identities that hold for real numbers are valid for complex numbers as well. One major difference between the complex sine and cosine and their real counterparts is that, while $|\cos x| \le 1$ and $|\sin x| \le 1$ for $x \in \mathbb{R}$, the complex versions are unbounded, and the moduli $|\sin z|$ and $|\cos z|$ can take on any nonnegative real value.
The hyperbolic functions are defined for real numbers by:
$$\cosh x = \frac{e^{x} + e^{-x}}{2} \quad\text{and}\quad \sinh x = \frac{e^{x} - e^{-x}}{2}.$$
They share some properties with their non-hyperbolic namesakes, such as $\cosh 0 = 1$, $\sinh 0 = 0$, $\cosh(-x) = \cosh x$, and $\sinh(-x) = -\sinh x$. Then there is the identity $\cosh^2 x - \sinh^2 x = 1$, which is similar to the Pythagorean trig identity for the sine and cosine.
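A small Python sketch that builds $\cosh$ and $\sinh$ directly from the definitions (the `_def` names are hypothetical, to distinguish them from `math.cosh` and `math.sinh`) and checks the listed properties at a few sample points.

```python
import math


def cosh_def(x: float) -> float:
    """cosh x = (e^x + e^{-x}) / 2"""
    return (math.exp(x) + math.exp(-x)) / 2


def sinh_def(x: float) -> float:
    """sinh x = (e^x - e^{-x}) / 2"""
    return (math.exp(x) - math.exp(-x)) / 2


print(cosh_def(0.0), sinh_def(0.0))  # 1.0 0.0
for x in (-2.0, 0.5, 1.3):
    # evenness of cosh, oddness of sinh, and cosh^2 - sinh^2 = 1
    print(x, cosh_def(-x) - cosh_def(x), sinh_def(-x) + sinh_def(x),
          cosh_def(x) ** 2 - sinh_def(x) ** 2)
```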
From our definitions, we see that
$$\cos(iz) = \cosh z, \qquad \cosh(iz) = \cos z,$$
$$\sin(iz) = i\sinh z, \qquad \sinh(iz) = i\sin z.$$
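These four relations can be spot-checked at an arbitrary point with the standard library's `cmath` functions, which implement the usual complex sine, cosine, and hyperbolic functions. The test point `z` is an arbitrary choice.

```python
import cmath

z = 0.8 - 1.1j
relations = {
    "cos(iz) = cosh z": (cmath.cos(1j * z), cmath.cosh(z)),
    "cosh(iz) = cos z": (cmath.cosh(1j * z), cmath.cos(z)),
    "sin(iz) = i sinh z": (cmath.sin(1j * z), 1j * cmath.sinh(z)),
    "sinh(iz) = i sin z": (cmath.sinh(1j * z), 1j * cmath.sin(z)),
}
for label, (lhs, rhs) in relations.items():
    print(f"{label}: {cmath.isclose(lhs, rhs)}")
```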
With these formulas, we can develop equations that allow us to evaluate $\cos z$ and $\sin z$, where $z = x + iy$, in terms of real $x$ and $y$. Using the angle sum formulas and the relations directly above, we get:
$$\cos z = \cos(x + iy) = \cos x\cosh y - i\sin x\sinh y$$
$$\sin z = \sin(x + iy) = \sin x\cosh y + i\cos x\sinh y.$$
These are used to obtain the following:
$$|\cos z|^2 = \cos z\,\cos\bar{z} = \cos^2 x + \sinh^2 y$$
$$|\sin z|^2 = \sin z\,\sin\bar{z} = \sin^2 x + \sinh^2 y.$$
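The following Python sketch evaluates $\cos z$ and $\sin z$ from real $x$ and $y$ using the formulas above and compares them, along with the modulus formulas, against `cmath`. The helper names and the sample point are hypothetical.

```python
import cmath
import math


def cos_xy(x: float, y: float) -> complex:
    """cos(x + iy) = cos x cosh y - i sin x sinh y"""
    return complex(math.cos(x) * math.cosh(y), -math.sin(x) * math.sinh(y))


def sin_xy(x: float, y: float) -> complex:
    """sin(x + iy) = sin x cosh y + i cos x sinh y"""
    return complex(math.sin(x) * math.cosh(y), math.cos(x) * math.sinh(y))


x, y = 2.1, -0.7
z = complex(x, y)
print(cos_xy(x, y), cmath.cos(z))
print(sin_xy(x, y), cmath.sin(z))
# |cos z|^2 = cos^2 x + sinh^2 y and |sin z|^2 = sin^2 x + sinh^2 y
print(abs(cmath.cos(z)) ** 2, math.cos(x) ** 2 + math.sinh(y) ** 2)
print(abs(cmath.sin(z)) ** 2, math.sin(x) ** 2 + math.sinh(y) ** 2)
```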
Derivatives of Complex-Valued Functions
Functions of a complex variable can be differentiated like their real counterparts. In fact, the familiar rules from a table of derivatives for real-valued functions (the power, product, quotient, and chain rules, and formulas such as $\frac{d}{dz}e^z = e^z$ and $\frac{d}{dz}\sin z = \cos z$) apply to complex-valued functions as well. First, it is important to remark that $f(z) = f(x + iy) = u(x,y) + iv(x,y)$ for some real-valued functions $u$ and $v$. We say that $\Re(f) = u(x,y)$, denoting the real part, and $\Im(f) = v(x,y)$, denoting the imaginary part. Returning to the variable $z$, we notice that there is a slight difference in how the definition of the derivative behaves, which gives rise to some interesting results. The definition
$$f'(z) = \lim_{h \to 0} \frac{f(z + h) - f(z)}{h}$$
looks the same for the complex variable $z$, but it is important to note here that $h$ is also a complex number with real and imaginary parts. Since $h$ ranges over the plane rather than a line, it can take infinitely many routes to zero, and the limit must be the same along all of them. Applying the definition twice, with $h$ approaching the origin along the real axis in one case and along the imaginary axis in another, we see that
$$f'(z) = \frac{\partial u}{\partial x} + i\frac{\partial v}{\partial x} = \frac{\partial v}{\partial y} - i\frac{\partial u}{\partial y}$$
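if $f$ is differentiable. Equating the real and imaginary parts of the expression above, we get the Cauchy-Riemann equations:
$$\frac{\partial u}{\partial x} = \frac{\partial v}{\partial y} \quad\text{and}\quad \frac{\partial v}{\partial x} = -\frac{\partial u}{\partial y},$$
which must be satisfied for $f$ to be differentiable.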
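The direction-dependence of the difference quotient can be seen numerically. In this Python sketch (helper names and the step `eps` are assumptions), $e^z$ gives the same quotient along both axes, while $\bar{z}$ (with $u = x$, $v = -y$, so $\partial u/\partial x = 1 \neq -1 = \partial v/\partial y$) gives $1$ along the real axis and $-1$ along the imaginary axis, so it fails the Cauchy-Riemann equations and is nowhere differentiable.

```python
import cmath


def difference_quotient(f, z: complex, h: complex) -> complex:
    """(f(z + h) - f(z)) / h for a small complex step h."""
    return (f(z + h) - f(z)) / h


def conjugate(z: complex) -> complex:
    """f(z) = x - iy, so u = x and v = -y: u_x = 1 but v_y = -1."""
    return z.conjugate()


z0, eps = 0.3 + 0.4j, 1e-6
results = {}
for name, f in (("exp", cmath.exp), ("conjugate", conjugate)):
    along_real = difference_quotient(f, z0, eps)       # h -> 0 along the real axis
    along_imag = difference_quotient(f, z0, 1j * eps)  # h -> 0 along the imaginary axis
    results[name] = (along_real, along_imag)
    print(name, along_real, along_imag)
```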
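If $f(z) = f(x + iy) = u(x,y) + iv(x,y)$ is differentiable throughout some open set $O$, then the Cauchy-Riemann equations hold throughout that set. Differentiating the C-R equations again, we get:
$$\frac{\partial^2 u}{\partial x^2} = \frac{\partial}{\partial x}\left(\frac{\partial u}{\partial x}\right) = \frac{\partial}{\partial x}\left(\frac{\partial v}{\partial y}\right) = \frac{\partial^2 v}{\partial x\,\partial y} \quad\text{and}\quad \frac{\partial^2 u}{\partial y^2} = \frac{\partial}{\partial y}\left(\frac{\partial u}{\partial y}\right) = \frac{\partial}{\partial y}\left(-\frac{\partial v}{\partial x}\right) = -\frac{\partial^2 v}{\partial y\,\partial x},$$
which, since the mixed partial derivatives of $v$ are continuous and therefore equal, tells us that
$$\frac{\partial^2 u}{\partial x^2} + \frac{\partial^2 u}{\partial y^2} = 0.$$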
This is Laplace’s equation. A function satisfying this equation in some open set $O$ is said to be harmonic in $O$. Using the same method as above, we also see that
$$\frac{\partial^2 v}{\partial x^2} + \frac{\partial^2 v}{\partial y^2} = 0.$$
Written another way, we have $\nabla^2 u = \nabla^2 v = 0$, where $\nabla^2 = \nabla\cdot\nabla$ is the scalar Laplacian operator. If $f(z)$ is differentiable throughout some open set $O$, then $\Re(f)$ and $\Im(f)$ are harmonic in $O$.
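Harmonicity can be checked numerically with a finite-difference Laplacian. This Python sketch uses the standard five-point stencil (a swapped-in numerical technique, not from the text) on the real and imaginary parts of an arbitrarily chosen entire function, $f(z) = ze^z$, and contrasts them with $x^2 + y^2$, whose Laplacian is 4. All helper names are hypothetical.

```python
import cmath


def laplacian(g, x: float, y: float, h: float = 1e-3) -> float:
    """Five-point finite-difference approximation of g_xx + g_yy at (x, y)."""
    return (g(x + h, y) + g(x - h, y) + g(x, y + h) + g(x, y - h) - 4 * g(x, y)) / h**2


def f(z: complex) -> complex:
    """An entire function chosen for illustration: z e^z."""
    return z * cmath.exp(z)


def u(x: float, y: float) -> float:
    return f(complex(x, y)).real


def v(x: float, y: float) -> float:
    return f(complex(x, y)).imag


def not_harmonic(x: float, y: float) -> float:
    """|z|^2 = x^2 + y^2 has Laplacian 4, so it is not the real part of any analytic function."""
    return x * x + y * y


point = (0.5, -0.3)
print(laplacian(u, *point), laplacian(v, *point))  # both approximately 0
print(laplacian(not_harmonic, *point))             # approximately 4
```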
If $f(z)$ is differentiable at $z_0$ and at every point throughout some open set in the complex plane containing $z_0$, then $f(z)$ is said to be analytic at the point $z_0$. If $f$ is differentiable throughout the open set $O$, then $f$ is said to be analytic in $O$. If $f$ is analytic everywhere in the complex plane, then $f$ is said to be an entire function. If $f$ is analytic, then $\Re(f)$ and $\Im(f)$, its component functions, must have continuous partial derivatives of all orders. Also, if $f(z)$ is analytic at $z_0$, then so is $f'(z)$. This implies that the derivatives of all orders must also be analytic. This is a strong result without a counterpart for real-valued functions.
We can calculate complex line integrals over parameterized curves in the complex plane. If the curve $C$ is given by $z(t) = x(t) + iy(t)$ for $a \le t \le b$, where $a, b \in \mathbb{R}$, we have:
$$\int_C f(z)\,dz = \int_a^b f(z(t))\,z'(t)\,dt = \int_a^b \frac{d}{dt}F(z(t))\,dt = F(z(b)) - F(z(a)),$$
where $\frac{d}{dz}F(z) = f(z)$; the last two equalities require that $f$ have such an antiderivative $F$ throughout an open set containing $C$. This expression suggests two methods for attacking the line integral, one of which may be easier than the other in a given situation. Using the first method, one calculates $f(x + iy)$ before substituting in the parameterized values of $x$ and $y$. Then the product with $z'(t)$ is calculated, and the integral is carried out with respect to the parameter $t$ over the limits of integration, $a$ and $b$. The second method, which appears far easier when an antiderivative is available, employs the antiderivative of $f(z)$, telling us that
$$\int_a^b f(z(t))\,z'(t)\,dt = \int_{z(a)}^{z(b)} f(z)\,dz.$$
We note that this method changes the limits of integration from real numbers to complex numbers, as one would expect after changing the variable of integration from a real variable to a complex one.
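The two methods can be compared on a small example. This Python sketch assumes $f(z) = z^2$ along the parabola $z(t) = t + it^2$, $0 \le t \le 1$ (both chosen for illustration); method 1 is approximated with the composite midpoint rule (a numerical stand-in for doing the $t$-integral by hand), and method 2 uses the antiderivative $F(z) = z^3/3$.

```python
def midpoint_line_integral(f, z, dz, a: float, b: float, n: int = 2000) -> complex:
    """Method 1: integrate f(z(t)) z'(t) dt over [a, b] with the composite midpoint rule."""
    step = (b - a) / n
    total = 0j
    for k in range(n):
        t = a + (k + 0.5) * step
        total += f(z(t)) * dz(t)
    return total * step


def f(w: complex) -> complex:
    return w * w  # f(z) = z^2


def F(w: complex) -> complex:
    return w ** 3 / 3  # an antiderivative: F'(z) = z^2


def z(t: float) -> complex:
    return complex(t, t * t)  # the parabola z(t) = t + i t^2


def dz(t: float) -> complex:
    return complex(1.0, 2.0 * t)  # its derivative z'(t) = 1 + 2it


method1 = midpoint_line_integral(f, z, dz, 0.0, 1.0)
method2 = F(z(1.0)) - F(z(0.0))  # Method 2: F(z(b)) - F(z(a))
print(method1, method2)
```

Method 2 only applies when an antiderivative exists along the whole path: for $f(z) = 1/z$ around a circle centered at the origin, $z(b) = z(a)$ would suggest an answer of 0, yet the integral actually equals $2\pi i$.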
The following are some important theorems regarding analytic functions:
1. Cauchy’s theorem: If $f(z)$ is analytic throughout a simply connected, open set $D$, then for every closed path $C$ in $D$, we have $\oint_C f(z)\,dz = 0$. If $f(z)$ is an entire function, then $\oint_C f(z)\,dz = 0$ for every closed path $C$ in $\mathbb{C}$.
2. Morera’s theorem: If $f(z)$ is continuous throughout an open, connected set $O \subset \mathbb{C}$, and $\oint_C f(z)\,dz = 0$ for every closed curve $C$ in $O$, then $f(z)$ is analytic in $O$.
3. Cauchy’s integral formulas: If $f(z)$ is analytic at all points within and on a simple, closed path $C$ that surrounds the point $z_0$ and is traversed counterclockwise, then
$$f(z_0) = \frac{1}{2\pi i}\oint_C \frac{f(z)}{z - z_0}\,dz.$$
We also know that the $n$th derivative $f^{(n)}(z)$ is analytic at $z_0$ for all $n \in \mathbb{N}$, and we have
$$f^{(n)}(z_0) = \frac{n!}{2\pi i}\oint_C \frac{f(z)}{(z - z_0)^{n+1}}\,dz.$$
This tells us that if we know the value of $f(z)$ at every point on a curve, then we also know its value, and the values of all its derivatives, at any interior point.
4. Cauchy’s derivative estimates: Let $f(z)$ be analytic on and within the circle $|z - z_0| = r$. If $M$ is the maximum value of $|f(z)|$ on the circle, then
$$\left|f^{(n)}(z_0)\right| \le \frac{n!\,M}{r^n}.$$
5. Liouville’s theorem: If $f(z)$ is an entire function that is bounded, then $f(z)$ must be a constant function.
6. The maximum principle: Let $O$ be an open, connected subset of the complex plane, and let $f(z)$ be a function that is analytic in $O$. If there is a point $z_0 \in O$ such that $|f(z)| \le |f(z_0)|$ for every $z \in O$, then $f(z)$ is a constant function. This says that if $f(z)$ is analytic and not constant, then $|f(z)|$ attains no maximum value in $O$. If $O$ is bounded and $f$ is also continuous on $\operatorname{cl}(O)$, then $|f(z)|$ must achieve a maximum value in $\operatorname{cl}(O)$; since it can’t do so inside, it must achieve a maximum on the boundary of $O$.
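Several of these theorems can be illustrated numerically with one helper. The Python sketch below approximates contour integrals over a circle with the trapezoid rule (a swapped-in numerical method, very accurate for smooth periodic integrands) and uses the entire function $e^z$, whose derivatives all equal $e^z$, as an assumed test case. It checks Cauchy’s theorem, the integral formulas for $n = 0$ and $n = 3$, Cauchy’s estimate (with $M$ approximated by sampling the circle), and the maximum principle on sampled interior points. The helper names are hypothetical.

```python
import cmath
import math


def circle_integral(g, z0: complex, r: float, n: int = 256) -> complex:
    """Trapezoid rule for the counterclockwise integral of g over |z - z0| = r."""
    total = 0j
    for k in range(n):
        w = r * cmath.exp(2j * math.pi * k / n)  # w = z(t) - z0 at t = 2 pi k / n
        total += g(z0 + w) * 1j * w  # g(z(t)) z'(t), since z'(t) = i w
    return total * (2 * math.pi / n)


def cauchy_derivative(f, z0: complex, n: int, r: float = 1.0) -> complex:
    """f^(n)(z0) = n!/(2 pi i) times the integral of f(z)/(z - z0)^(n+1) over |z - z0| = r."""
    integral = circle_integral(lambda z: f(z) / (z - z0) ** (n + 1), z0, r)
    return math.factorial(n) * integral / (2j * math.pi)


f, z0, r = cmath.exp, 0.2 + 0.1j, 1.0  # exp is entire, and every derivative of exp is exp

print(circle_integral(f, z0, r))  # Cauchy's theorem: approximately 0
print(cauchy_derivative(f, z0, 0), f(z0))  # integral formula for f(z0)
print(cauchy_derivative(f, z0, 3), f(z0))  # third derivative

# Cauchy's estimate |f^(n)(z0)| <= n! M / r^n, with M approximated by sampling the circle
boundary = [z0 + r * cmath.exp(2j * math.pi * k / 1000) for k in range(1000)]
M = max(abs(f(z)) for z in boundary)
print(abs(cauchy_derivative(f, z0, 3)), math.factorial(3) * M / r**3)

# Maximum principle: sampled interior values of |f| stay below the boundary maximum
interior = [
    z0 + complex(a, b) / 50
    for a in range(-49, 50)
    for b in range(-49, 50)
    if a * a + b * b < 49 * 49
]
print(max(abs(f(z)) for z in interior) <= M)
```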
Taylor Series for Complex-Valued Functions
Taylor series can be formed for complex-valued functions in the same way that they can for real-valued functions. The only difference is that the idea of the interval of convergence is replaced by the disk of convergence, with $|z - z_0| < R$ representing the interior of an open disk of radius $R$ centered at $z_0$.
The power series $\sum_{n=0}^{\infty} a_n (z - z_0)^n$ converges absolutely for all $z$ that satisfy $|z - z_0| < R$ and diverges for all $z$ such that $|z - z_0| > R$, where
$$a_n = \frac{f^{(n)}(z_0)}{n!} \quad\text{and}\quad \frac{1}{R} = \lim_{n \to \infty} \sqrt[n]{|a_n|}$$
(when this limit does not exist, $\limsup$ takes its place). If $\lim_{n \to \infty} \sqrt[n]{|a_n|} = \infty$, then the series converges only for $z = z_0$. If $\lim_{n \to \infty} \sqrt[n]{|a_n|} = 0$, then the series converges for all $z$. Every function that is analytic in an open, connected subset $O$ of the complex plane can be expanded about any point $z_0 \in O$ in a power series that converges to the function on every open disk centered at $z_0$ that lies within $O$.
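To see the disk of convergence at work, this Python sketch sums two standard Taylor series about $z_0 = 0$: $e^z$ (with $a_n = 1/n!$, so $\sqrt[n]{|a_n|} \to 0$ and $R = \infty$) and $1/(1 - z)$ (with $a_n = 1$, so $R = 1$). The helper name and the truncation lengths are assumptions chosen for illustration.

```python
import cmath
import math


def power_series_sum(coeffs, z: complex, z0: complex = 0) -> complex:
    """Partial sum of a_n (z - z0)^n over the supplied coefficients a_0, a_1, ..."""
    total, power = 0j, 1 + 0j
    for a in coeffs:
        total += a * power
        power *= z - z0
    return total


exp_coeffs = [1 / math.factorial(n) for n in range(60)]  # e^z: |a_n|^(1/n) -> 0, R infinite
geo_coeffs = [1.0] * 200  # 1/(1 - z): |a_n|^(1/n) = 1, so R = 1

print(exp_coeffs[59] ** (1 / 59))  # already small, and shrinking as n grows

z_far = 3 + 4j
print(power_series_sum(exp_coeffs, z_far), cmath.exp(z_far))  # converges far from z0
print(power_series_sum(geo_coeffs, 0.5j), 1 / (1 - 0.5j))  # inside |z| < 1
print(abs(power_series_sum(geo_coeffs, 1.1)))  # outside: partial sums blow up
```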

Let $z_0$ be a point in the complex plane, and let $R$ be a positive number. The set of all $z$ such that $0 < |z - z_0| < R$ is called the punctured open disk of radius $R$ centered at $z_0$. If a function $f(z)$ is not analytic at a point $z_0$ but is analytic at some point in every punctured disk centered at $z_0$, then $z_0$ is said to be a singularity of $f(z)$. If $f(z)$ is analytic at every point in some punctured open disk centered at $z_0$, then we call $z_0$ an isolated singularity.
Assuming now that $f(z)$ has an isolated singularity at $z_0$, if there is a positive integer $n$ such that
$$f(z) = \frac{g(z)}{(z - z_0)^n},$$
with $g(z)$ analytic in a nonpunctured disk centered at $z_0$ and $g(z_0) \neq 0$, then the singularity $z_0$ is called a pole of order $n$. If $f(z)$ can instead be redefined at $z_0$ so as to be analytic there (as $\sin z / z$ can at $0$), then $z_0$ is called a removable singularity. If neither is possible, then $z_0$ is said to be an essential singularity. Expanding a function about one of its singularities requires a generalization of the Taylor series.
Laurent Series
An annulus is a ring formed by two concentric circles. An annulus centered at $z_0$ is represented by the double inequality $R_1 < |z - z_0| < R_2$, where $R_1$ is allowed to be zero and $R_2$ is allowed to be $\infty$. If $f(z)$ is analytic in some annulus centered at $z_0$, then $f(z)$ can be expanded in a Laurent series, which takes the form:
$$f(z) = \sum_{n=1}^{\infty} a_{-n}(z - z_0)^{-n} + \sum_{n=0}^{\infty} a_n (z - z_0)^n.$$
The sum on the right, referred to as the analytic part, has the form of a Taylor series (it is exactly the Taylor series of $f$ when $f$ is analytic at $z_0$). The sum on the left, referred to as the singular (or principal) part, uses the Laurent coefficients with negative indices, which are defined by:
$$a_{-n} = \frac{1}{2\pi i}\oint_C \frac{f(z)}{(z - z_0)^{-n+1}}\,dz,$$
where $C$ is a simple, closed curve contained in the annulus that encircles $z_0$ counterclockwise. (This definition aside, in practice, the Laurent coefficients are typically derived algebraically from known Taylor series.)
For the Laurent series on a punctured disk $0 < |z - z_0| < R$: if the singular part contains at least one nonzero term, but only finitely many, then $z_0$ is a pole. In particular, if the singular part has $a_{-n} = 0$ for all $n$ greater than some integer $k$, where $a_{-k} \neq 0$, then $z_0$ is a pole of order $k$. If, however, the singular part contains infinitely many nonzero terms, then $z_0$ is an essential singularity.
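The contour-integral definition of the Laurent coefficients can be evaluated numerically to classify a singularity. In this Python sketch, the integral over $|z - z_0| = r$ reduces to an average of $f(z_0 + w)\,w^{-n}$ over equally spaced points on the circle (the trapezoid rule, a swapped-in numerical method). The examples are assumed for illustration: $\sin z / z^3 = z^{-2} - 1/6 + \cdots$ has a pole of order 2 at 0, and $e^{1/z} = \sum z^{-n}/n!$ has an essential singularity there.

```python
import cmath
import math


def laurent_coefficient(f, z0: complex, n: int, r: float = 1.0, samples: int = 512) -> complex:
    """Approximate a_n = 1/(2 pi i) times the integral of f(z)/(z - z0)^(n+1) over |z - z0| = r.

    With z = z0 + w and w = r e^{it}, the integral reduces to the average of f(z0 + w) w^(-n)
    over the circle, which the trapezoid rule estimates with equally spaced samples.
    """
    total = 0j
    for k in range(samples):
        w = r * cmath.exp(2j * math.pi * k / samples)
        total += f(z0 + w) * w ** (-n)
    return total / samples


def pole_example(z: complex) -> complex:
    return cmath.sin(z) / z ** 3  # z^-2 - 1/6 + z^2/120 - ...: pole of order 2


def essential_example(z: complex) -> complex:
    return cmath.exp(1 / z)  # sum of z^-n / n!: essential singularity


for n in range(1, 6):
    print(n, laurent_coefficient(pole_example, 0, -n), laurent_coefficient(essential_example, 0, -n))
```

For `pole_example` only $a_{-2}$ comes out nonzero (approximately 1), while every $a_{-n}$ of `essential_example` is approximately $1/n!$.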
Examining the definition of the Laurent coefficients, we see that
$$a_{-1} = \frac{1}{2\pi i}\oint_C f(z)\,dz,$$
which suggests that we can easily calculate the integral if we have the value of that coefficient. That coefficient is called the residue of $f(z)$ at the singularity $z_0$, written $\operatorname{Res}(z_0, f)$, and we see that $\oint_C f(z)\,dz = 2\pi i\operatorname{Res}(z_0, f)$. For a pole of order $k$, we have the formula:
$$\operatorname{Res}(z_0, f) = \frac{1}{(k-1)!}\lim_{z \to z_0}\frac{d^{k-1}}{dz^{k-1}}\left[(z - z_0)^k f(z)\right].$$
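As a worked check of the pole-order formula, take the illustrative function $f(z) = e^z/(z - 1)^2$, which has a pole of order $k = 2$ at $z_0 = 1$. The formula gives $\operatorname{Res}(1, f) = \frac{1}{1!}\frac{d}{dz}e^z\big|_{z=1} = e$. This Python sketch approximates that derivative with a central difference and compares it with $\frac{1}{2\pi i}\oint_C f(z)\,dz$ computed by the trapezoid rule on a small circle (both numerical methods are swapped in for illustration; helper names are hypothetical).

```python
import cmath
import math


def circle_integral(f, center: complex, r: float, samples: int = 400) -> complex:
    """Trapezoid rule for the counterclockwise integral of f over |z - center| = r."""
    total = 0j
    for k in range(samples):
        w = r * cmath.exp(2j * math.pi * k / samples)
        total += f(center + w) * 1j * w  # f(z(t)) z'(t), with z'(t) = i w
    return total * (2 * math.pi / samples)


def f(z: complex) -> complex:
    return cmath.exp(z) / (z - 1) ** 2  # pole of order k = 2 at z0 = 1


def g(z: complex) -> complex:
    return (z - 1) ** 2 * f(z)  # (z - z0)^k f(z); its singularity at z0 is removable


# Pole-order formula with k = 2: Res = (1/1!) * lim_{z -> 1} g'(z), via a central difference
h = 1e-4
residue_by_formula = (g(1 + h) - g(1 - h)) / (2 * h)

# Definition: Res = a_{-1} = (1/(2 pi i)) times the integral of f around a small circle about z0
residue_by_integral = circle_integral(f, 1.0, 0.5) / (2j * math.pi)

print(residue_by_formula, residue_by_integral, math.e)  # exact value: e
```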
If the curve $C$ surrounds more than one singularity of $f(z)$, say $z_1, z_2, \ldots, z_n$, and $f$ is analytic at every other point within and on $C$, we have:
$$\oint_C f(z)\,dz = 2\pi i\sum_{m=1}^{n}\operatorname{Res}(z_m, f).$$
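A numerical illustration of this sum, as a Python sketch with an assumed example: $f(z) = z/(z^2 + 1)$ has simple poles at $\pm i$. Applying the $k = 1$ case of the formula above to $p(z)/q(z)$ with $q(z_m) = 0$ gives $\operatorname{Res}(z_m, f) = p(z_m)/q'(z_m) = z_m/(2z_m) = 1/2$, so the integral over $|z| = 2$ should be $2\pi i$. The trapezoid-rule helper is a swapped-in numerical method.

```python
import cmath
import math


def circle_integral(f, center: complex, r: float, samples: int = 400) -> complex:
    """Trapezoid rule for the counterclockwise integral of f over |z - center| = r."""
    total = 0j
    for k in range(samples):
        w = r * cmath.exp(2j * math.pi * k / samples)
        total += f(center + w) * 1j * w  # f(z(t)) z'(t), with z'(t) = i w
    return total * (2 * math.pi / samples)


def f(z: complex) -> complex:
    return z / (z * z + 1)  # simple poles at z = i and z = -i


# Simple poles of p/q: Res = p(z_m) / q'(z_m), here with p(z) = z and q'(z) = 2z
poles = [1j, -1j]
residues = [zm / (2 * zm) for zm in poles]  # each residue is 1/2

by_theorem = 2j * math.pi * sum(residues)
by_integral = circle_integral(f, 0, 2.0)  # |z| = 2 surrounds both poles
print(by_theorem, by_integral)
```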
Gallery of Graphs
