
Mathematics Review

Compiled and Edited by Romann M. Weber*

*The overall plan of coverage in this document was based on Cracking the GRE Math Subject Test, 3rd Edition by Steven A. Leduc, many of whose results have been incorporated here. Certain items from that reference have been corrected in this document, and new results from other sources have been included.

Precalculus

Analytic Geometry

Parabolas, each defined as the locus of points equidistant from a point F (the focus) and a line D (the directrix), have equations of the form y = ±(1/(4p))x² or x = ±(1/(4p))y², where the focus is located, respectively, at the point (0, ±p) or (±p, 0) (with the sign matching that of the equation above) and the directrix has the equation y = ∓p or x = ∓p, respectively. More generally, a parabola with vertex at (h, k) has the equation

y − k = ±(1/(4p))(x − h)²  or  x − h = ±(1/(4p))(y − k)².

A circle of radius r centered at (h, k) has the equation (x − h)² + (y − k)² = r². It can be parameterized by x = h + r cos t, y = k + r sin t.

An ellipse is defined as the locus of points such that the sum of the distances from each point on the graph to two given fixed points (the foci) is a given constant. Ellipses have a major and a minor axis and follow the equation (x − h)²/a² + (y − k)²/b² = 1, where the semi-axes have lengths a and b on either side of the center, (h, k), terminating at the vertices. In the case of a > b, the major axis is parallel to the x-axis, and the foci are located at the points (h ± c, k), where c = √(a² − b²). For the case a < b, the major axis is parallel to the y-axis, and the foci are located at the points (h, k ± c), where c = √(b² − a²). The eccentricity, which measures the “flatness” of the ellipse, is 0 ≤ e = c/a < 1 (taking a to be the semi-major axis length). An ellipse can be parameterized by x = h + a cos t, y = k + b sin t.

A hyperbola is defined as the locus of points in the plane such that the difference between the distances from every point on the hyperbola to two fixed points (the foci) is a given constant. Depending on its orientation, it has either the equation (x − h)²/a² − (y − k)²/b² = 1 (opening horizontally), with the foci located at (h ± c, k), or (y − k)²/b² − (x − h)²/a² = 1 (opening vertically), with the foci located at (h, k ± c). In either case, c = √(a² + b²), and the asymptotes are the lines y − k = ±(b/a)(x − h).
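These locus definitions are easy to sanity-check numerically. The following minimal Python sketch (with arbitrarily chosen sample values for h, k, a, and b) verifies that points on an ellipse keep a constant distance sum of 2a to the two foci:

import math

# Hypothetical ellipse parameters; foci at (h ± c, k) since a > b.
h, k, a, b = 1.0, -2.0, 5.0, 3.0
c = math.sqrt(a**2 - b**2)
f1, f2 = (h - c, k), (h + c, k)

for t in [0.0, 0.7, 1.9, 3.1, 5.0]:          # sample points x = h + a cos t, y = k + b sin t
    x, y = h + a * math.cos(t), k + b * math.sin(t)
    d = math.dist((x, y), f1) + math.dist((x, y), f2)
    assert abs(d - 2 * a) < 1e-9              # focal-distance sum equals 2a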

Polynomials
The rational roots theorem states that for a polynomial p_n(x) = Σ_{i=0}^{n} a_i x^i with each a_i ∈ ℤ, if there are any rational roots r ∈ ℚ, then they are of the form r = s/t, where s, t ∈ ℤ and s | a₀ and t | a_n. Complex and radical roots always come in conjugate pairs (for polynomials with rational coefficients).

For any polynomial p_n(x) = Σ_{i=0}^{n} a_i x^i = a_n Π_{j=1}^{n} (x − r_j), the sum and product of the roots are given by

Σ_{j=1}^{n} r_j = −a_{n−1}/a_n  and  Π_{j=1}^{n} r_j = (−1)^n a₀/a_n.
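As a quick numerical check of both results, consider the sample cubic p(x) = 2x³ − 3x² − 11x + 6, whose roots are 3, −2, and 1/2 (each rational root s/t indeed has s | 6 and t | 2); a minimal sketch:

coeffs = [2, -3, -11, 6]          # a_n, ..., a_0 for p(x) = 2x^3 - 3x^2 - 11x + 6
roots = [3.0, -2.0, 0.5]

a_n, a_n1, a_0 = coeffs[0], coeffs[1], coeffs[-1]
n = len(coeffs) - 1

assert abs(sum(roots) - (-a_n1 / a_n)) < 1e-12      # sum = -a_{n-1}/a_n = 1.5
prod = 1.0
for r in roots:
    prod *= r
assert abs(prod - (-1) ** n * a_0 / a_n) < 1e-12    # product = (-1)^n a_0/a_n = -3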

Logarithms
The basic identities of logarithms are the following:

• log_b(Π_{i=1}^{n} x_i) = Σ_{i=1}^{n} log_b x_i

• log_b(x₁/x₂) = log_b x₁ − log_b x₂

• log_b x^a = a log_b x

• b^(log_b x) = x

• (log_b a)(log_a x) = log_b x

Trigonometry
The sine function is odd, meaning that sin ( −θ ) = − sin θ , and the cosine function is even,

meaning that cos ( −θ ) = cos θ . Any trigonometric function is equal to its co-function

operating on its complement. In other words:

sin x = cos(π/2 − x), tan x = cot(π/2 − x), sec x = csc(π/2 − x), and vice versa.

θ       sin θ      cos θ       tan θ
0       0          1           0
π/6     1/2        (√3)/2      (√3)/3
π/4     (√2)/2     (√2)/2      1
π/3     (√3)/2     1/2         √3
π/2     1          0           ∞
2π/3    (√3)/2     −1/2        −√3
3π/4    (√2)/2     −(√2)/2     −1
5π/6    1/2        −(√3)/2     −(√3)/3
π       0          −1          0

Other values can be computed using the following identities:

tan x = sin x / cos x

cot x = cos x / sin x

csc x = 1 / sin x

sec x = 1 / cos x

sin²x + cos²x = sec²x − tan²x = csc²x − cot²x = 1

sin(x ± y) = sin x cos y ± cos x sin y ⇒ sin 2x = 2 sin x cos x

cos(x ± y) = cos x cos y ∓ sin x sin y ⇒ cos 2x = cos²x − sin²x = 1 − 2sin²x = 2cos²x − 1

tan(x ± y) = (tan x ± tan y) / (1 ∓ tan x tan y) ⇒ tan 2x = 2 tan x / (1 − tan²x)

sin(x/2) = ±√((1 − cos x)/2)

cos(x/2) = ±√((1 + cos x)/2)

tan(x/2) = sin x / (1 + cos x)

Hyperbolic Functions

sinh x = (e^x − e^(−x))/2

cosh x = (e^x + e^(−x))/2

Differential Calculus
Sequences

A sequence, formally defined as a function on the set of positive integers, is an infinite,

ordered list of terms. A sequence (x_n) approaches a limit L if for every ε > 0, there is an integer N such that |x_n − L| < ε whenever n > N. This is equivalent to challenging someone to name a positive number, no matter how small, and then producing a point in the sequence beyond which every term is within that distance of the limit. If such an L exists, the sequence is said to converge. If no such L exists, the sequence is said to diverge.

A sequence is monotonic if it is either strictly increasing or decreasing for every n

from some point on.

• Every convergent sequence is bounded, but the converse is not necessarily true (consider x_n = (−1)^n, which is bounded above and below but diverges).

• If a sequence is monotonic and bounded, it is convergent.

• If lim_{n→∞} a_n = A and k is a constant, lim_{n→∞} ka_n = kA.

• If lim_{n→∞} a_n = A and lim_{n→∞} b_n = B, then:

o (a_n + b_n) → A + B

o (a_n − b_n) → A − B

o (a_n b_n) → AB

o (a_n/b_n) → A/B, assuming B ≠ 0

• If k is a positive constant, then (1/n^k) → 0.

• If k > 1, then (1/k^n) → 0.

• Assume lim_{n→∞} a_n = lim_{n→∞} c_n = L. If (b_n) is a sequence such that a_n ≤ b_n ≤ c_n for every n > N, then lim_{n→∞} b_n = L. This is known as the sandwich theorem (or the squeeze theorem).

• If a_n = f(n), then (a_n) → L if lim_{x→∞} f(x) = L.

Functions

A function f : A → B is injective (or one-to-one) if for any x, y ∈ A, f(x) = f(y) ⇒ x = y. Equivalently, x ≠ y ⇒ f(x) ≠ f(y). In other words, f maps distinct objects to distinct objects. (Graphically, an injective function is intersected at most once by any horizontal line.) A function f : A → B is surjective (or onto) if for any b ∈ B there exists an a ∈ A such that f(a) = b. This means that every element of B is “spoken for” by at least one element in the domain. A function that is both injective and surjective is bijective.

Let f be a function defined on a subset of the real line. We seek to examine the behavior of f near a point a, which may not be included in its domain. If (x_n) is a sequence converging to a, all of whose terms are in the domain of f, and the sequence (f(x_n)) → L as (x_n) → a, we call L the limit of f as x approaches a. The limit of a function exists only if it is the same whether approached from above or below. A limit is also defined in the following way:

lim_{x→a} f(x) = L ⇔ ∀ε > 0, ∃δ > 0 : 0 < |x − a| < δ ⇒ |f(x) − L| < ε.

• If lim_{x→a} f(x) = L₁ and lim_{x→a} g(x) = L₂, then:

o lim_{x→a} [f(x) + g(x)] = L₁ + L₂

o lim_{x→a} [f(x) − g(x)] = L₁ − L₂

o lim_{x→a} [f(x)g(x)] = L₁L₂

o lim_{x→a} [f(x)/g(x)] = L₁/L₂ (assuming L₂ ≠ 0)

• Assume that lim_{x→a} f(x) = L and lim_{x→a} h(x) = L. If there is a positive number δ such that f(x) ≤ g(x) ≤ h(x) for all x satisfying 0 < |x − a| < δ, then lim_{x→a} g(x) = L. This is another version of the sandwich (or squeeze) theorem.

Continuity

A function f is continuous at a if lim_{x→a} f(x) = f(a). A function is continuous on an interval if it is continuous at every point in that interval.

• The Extreme Value Theorem: If f is a function continuous on a closed interval

[ a, b] , then f obtains an absolute minimum value, m, at some point c ∈ [ a, b] and

an absolute maximum value, M, at some point d ∈ [ a, b ] . That is, there exist

points c and d such that f (c ) ≤ f ( x ) ≤ f ( d ) for all x ∈ [ a, b ] .

• Bolzano’s Theorem: If f is a function continuous on the closed interval [a, b] such that f(a) and f(b) have opposite signs, then there’s a point c between a and b such that f(c) = 0. (The bisection sketch after this list turns this observation into a root-finding method.)

• The Intermediate Value Theorem: Let f be a function continuous on the closed

interval [ a, b] . Let m be the absolute minimum value and M be the absolute

maximum value of f on [ a, b] . For every number Y such that m ≤ Y ≤ M , there is

at least one value c ∈ [ a, b] such that f (c) = Y .
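As promised above, here is a minimal bisection sketch built directly on Bolzano’s theorem; the function and interval are arbitrary examples.

def bisect(f, a, b, tol=1e-10):
    # Bolzano: a sign change of a continuous f on [a, b] guarantees a root.
    if f(a) * f(b) > 0:
        raise ValueError("f(a) and f(b) must have opposite signs")
    while b - a > tol:
        m = (a + b) / 2
        # Keep the half-interval on which the sign change (and a root) persists.
        if f(a) * f(m) <= 0:
            b = m
        else:
            a = m
    return (a + b) / 2

root = bisect(lambda x: x**3 - 2, 1.0, 2.0)   # approximates 2**(1/3)
print(root)                                    # ~1.2599210498...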

Derivatives

Rules of Differentiation

• The derivative of a sum is the sum of the derivatives: ( f + g )′ ( x ) = f ′( x) + g ′( x)

• The derivative of a constant times a function: ( kf )′ ( x ) = kf ′( x)

• The product rule: ( fg )′ ( x ) = f ( x ) g ′( x ) + f ′( x) g ( x)

• The quotient rule: (f/g)′(x) = [g(x)f′(x) − f(x)g′(x)] / [g(x)]²

• The chain rule (for composite functions): (f ∘ u)′(x) = f′(u(x)) · u′(x)

• The inverse-function rule: If f⁻¹ is the inverse of f, and f has a nonzero derivative at x₀, then f⁻¹ has a derivative at y₀ = f(x₀) equal to (f⁻¹)′(y₀) = 1 / f′(x₀).

Common Derivatives

d(k) = 0
d(u^k) = k u^(k−1) du
d(e^u) = e^u du
d(a^u) = (ln a) a^u du
d(ln u) = (1/u) du
d(log_a u) = du / (u ln a)
d(sin u) = cos u du
d(cos u) = −sin u du
d(tan u) = sec²u du
d(cot u) = −csc²u du
d(sec u) = sec u tan u du
d(csc u) = −csc u cot u du
d(arcsin u) = du / √(1 − u²)
d(arccos u) = −du / √(1 − u²)
d(arctan u) = du / (1 + u²)

If f = k Π_{i=1}^{n} u_i^(α_i), then f′ = f · [Σ_{i=1}^{n} α_i u_i′/u_i].

Implicit Differentiation
For a function f(x, y) = c, we can define y′ = −f_x/f_y = −(∂f/∂x)/(∂f/∂y) when f_x and f_y exist and f_y ≠ 0.

Theorems Concerning Differentiable Functions

Rolle’s Theorem

Assume that f is a continuous function on the closed interval [ a, b] , with f ( a ) = f (b)

and f differentiable everywhere in ( a, b ) . Then there is at least one point c ∈ ( a, b ) such

that f ′(c) = 0 .

Mean Value Theorem (for Derivatives)

Assume that f is a continuous function on the closed interval [ a, b] and differentiable

everywhere in ( a, b ) . Then there is at least one point c ∈ ( a, b ) such that

(b − a ) f ′(c ) = f (b) − f ( a ) . This theorem states that there is at least one point between a

and b at which the slope of the tangent line is equal to the slope of the secant line through

( a, f ( a ) ) and ( b, f (b) ) .

Maxima and Minima

f′(c)   f″(c)   f⁽ⁿ⁾(c)*                    f(c)
0       > 0     N/A                         Local minimum
0       < 0     N/A                         Local maximum
0       0       f⁽ⁿ⁾(c) > 0, with n even    Local minimum
0       0       f⁽ⁿ⁾(c) < 0, with n even    Local maximum
0       0       Nonzero, with n odd         f(c) is not a local maximum or minimum

*Here n is the smallest integer such that the nth derivative is nonzero.

For a function defined on a closed interval, an absolute extremum will occur either at a

point where the first derivative vanishes, where the derivative is not defined, or at an

endpoint of the interval.

Integral Calculus

Common Integrals

∫ k du = ku + C

∫ u^k du = u^(k+1)/(k+1) + C (if k ≠ −1); ln|u| + C (if k = −1)

∫ e^u du = e^u + C

∫ a^u du = a^u/(ln a) + C

∫ sin u du = −cos u + C

∫ cos u du = sin u + C

∫ sec²u du = tan u + C

∫ csc²u du = −cot u + C

∫ sec u tan u du = sec u + C

∫ csc u cot u du = −csc u + C

∫ du/√(1 − u²) = arcsin u + C

∫ du/(1 + u²) = arctan u + C

Integration by Parts

∫ u dv = uv − ∫ v du
The object when applying this method is to choose u and dv so that the remaining integral on the right-hand side, ∫ v du, is easier to evaluate than the original.

Trigonometric Substitution

For integrands of a certain form containing square roots, it can be useful to make

substitutions that take advantage of trigonometric identities.

For an integrand containing . . .    Make the substitution . . .
√(a² − u²)                           u = a sin θ,  du = a cos θ dθ
√(a² + u²)                           u = a tan θ,  du = a sec²θ dθ
√(u² − a²)                           u = a sec θ,  du = a sec θ tan θ dθ

Partial Fractions

If an integrand is a rational function of the form P(x)/Q(x), with deg P < deg Q, we can

evaluate the integral by splitting the integrand into more manageable units, a process

called partial-fraction decomposition. The denominator is factored, and each factor

becomes a denominator of a new rational function on the right-hand side. Any denominator term of the form (ax + b)^n will be represented on the right by n fractions in ascending powers from 1 to n. The numerators on the right side are made up of single place-holding coefficients (typically A, B, C, etc.), which are to be solved for. In the case

that an irreducible quadratic occurs in the factored polynomial, the place-holding

coefficient in the numerator over that term would be replaced by a polynomial of the

form Bx + C . Multiplying everything out will work when solving for the coefficients,

but this could be quite a time-consuming series of steps. Since the resulting

decomposition will have to be true for all values of x in the domain, it is useful to

multiply by one term at a time and make substitutions that get the terms to cancel out

(namely, values of x that make terms equal zero—roots of the denominator polynomial).
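A computer algebra system can carry out the decomposition directly. A minimal sketch, assuming SymPy is available (the rational function is an arbitrary example):

import sympy as sp

x = sp.symbols('x')
expr = (3*x + 5) / ((x + 1) * (x - 2))
# apart() automates the place-holding-coefficient procedure described above.
print(sp.apart(expr, x))   # -> 11/(3*(x - 2)) - 2/(3*(x + 1))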

Theorems Concerning Integrable Functions

Differentiating Under the Integral Sign

d/dx ∫_{a(x)}^{b(x)} f(t) dt = f(b(x))·b′(x) − f(a(x))·a′(x)

Mean Value Theorem (for Integrals)

If f is a function continuous on the interval [a, b], then there is at least one point c ∈ [a, b] such that ∫_a^b f(x) dx = f(c)(b − a). Here we see that f(c) is the average value of the function.

Polar Coordinates

T: (x, y) → (r, θ), where T(x, y) = (√(x² + y²), arctan(y/x)). Polar representations are not unique, so this transformation is not universally true.

T: (r, θ) → (x, y), where T(r, θ) = (r cos θ, r sin θ).

Area in polar coordinates: A = ∫_α^β ½ r² dθ.

Area under (or within) a curve described parametrically:

If (x, y) = (x(t), y(t)) and the curve is traced out clockwise, beginning and ending at the same point (i.e., x(a) = x(b) and y(a) = y(b)), then A = ∫_a^b y(t) x′(t) dt = −∫_a^b x(t) y′(t) dt.

The sign is reversed if the curve is traced counterclockwise.
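A quick numerical check of this formula, using an ellipse traced counterclockwise (so the sign is reversed), recovers the familiar area πab; a minimal sketch:

import math

a, b, n = 3.0, 2.0, 100_000          # arbitrary semi-axes; x = a cos t, y = b sin t
dt = 2 * math.pi / n
area = 0.0
for i in range(n):
    t = i * dt
    x_prime = -a * math.sin(t)
    y = b * math.sin(t)
    area += -y * x_prime * dt        # counterclockwise: A = -∫ y(t) x'(t) dt
print(area, math.pi * a * b)         # both ≈ 18.8495...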

Volumes of Solids of Revolution

If the graph of a function f is rotated around a line, a solid is generated. As each point of the graph revolves about the axis, the segment joining it to the axis sweeps out a disk, so the value of f from point to point defines the radius of that disk. We can calculate the volume of the solid formed using the following formulas:

For a function f revolved about the x-axis: V = ∫ dV = ∫_a^b π[f(x)]² dx.

For a function g revolved about the y-axis: V = ∫ dV = ∫_c^d π[g(y)]² dy.

In the case of a solid being formed with a gap between the defining function and the axis

(in which each disk becomes a washer), that gap being defined by a second function, we

have: V = ∫_a^b π{[f(x)]² − [g(x)]²} dx.

Arc Length

For a function y = f ( x) , the length of any portion of the curve can be calculated as

follows, the decision being whether to integrate along the x-axis (where x runs from a to

b) or the y-axis (where y runs from c to d): s = ∫ ds = ∫_a^b √(1 + (dy/dx)²) dx = ∫_c^d √(1 + (dx/dy)²) dy.

Series

An arithmetic sequence is a list of numbers of the form (a_n) = a₁, a₁ + d, a₁ + 2d, …, a₁ + (n − 1)d, with each term differing from the one before by the addition of a constant d. The sum of a finite arithmetic series can be computed easily by the following formula: S_n = (n/2)(a₁ + a_n) = (n/2)(2a₁ + (n − 1)d), where it should be clear that the average of the first and last terms of the sequence is being multiplied by the number of members of the sequence.

A geometric sequence is a list of numbers of the form (a_n) = a₁, a₁r, a₁r², a₁r³, …, a₁r^(n−1), with each term differing from the one before by multiplication by a constant ratio r. The sum of a finite geometric series can be computed by the following formula: S_n = a₁(1 − r^n)/(1 − r) = (a₁ − r·a_n)/(1 − r), where r ≠ 1.

An infinite series converges if the sequence (s_n) of its partial sums converges to a finite limit S. An infinite geometric series converges if and only if |r| < 1, in which case Σ_{n=0}^{∞} r^n = 1/(1 − r). For an infinite series to converge, the terms must tend to zero as n tends to infinity. (The converse is not true, as illustrated by the harmonic series Σ_{n=1}^{∞} 1/n = ∞.)

• If Σa_n converges, then Σka_n converges to kΣa_n for every constant k. If Σa_n diverges, then so does Σka_n for every constant k ≠ 0.

• If Σa_n and Σb_n both converge, then Σ(a_n + b_n) converges to Σa_n + Σb_n.

• The so-called p-series Σ 1/n^p converges for every p > 1 and diverges for every p ≤ 1. (See the numerical sketch after this list.)

• The comparison test. Assume that 0 ≤ a_n ≤ b_n for all n > N. Then if Σb_n converges, Σa_n converges; if Σa_n diverges, then Σb_n diverges.

• The ratio test. Given a series Σa_n of nonnegative terms, form the limit lim_{n→∞} a_{n+1}/a_n = L. If L < 1, then Σa_n converges; if L > 1, then Σa_n diverges. The test is inconclusive if L = 1.

• The root test. Given a series Σa_n of nonnegative terms, form the limit lim_{n→∞} (a_n)^(1/n) = L. If L < 1, then Σa_n converges; if L > 1, then Σa_n diverges. The test is inconclusive if L = 1.

• The integral test. If f(x) is a positive, monotonically decreasing function for x ≥ 1 such that f(n) = a_n for every positive integer n, then Σ_{n=1}^{∞} a_n converges if and only if ∫_1^∞ f(x) dx converges.

• The alternating series test. A series of the form Σ_{n=1}^{∞} (−1)^(n+1) a_n (with a_n ≥ 0 for all n) will converge provided that the terms a_n decrease monotonically, with a limit of zero. Note that this is only necessarily true for alternating series.

• A series Σa_n converges absolutely if Σ|a_n| converges. Every absolutely convergent series is convergent. (If Σa_n converges but Σ|a_n| does not, Σa_n is said to converge conditionally.)
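Here is the sketch promised above: partial sums contrasting the divergent harmonic series (p = 1), which grows like ln n, with the convergent p = 2 series, which approaches π²/6 ≈ 1.644934.

for n in (10**2, 10**4, 10**6):
    harmonic = sum(1 / k for k in range(1, n + 1))      # p = 1: keeps climbing
    p2 = sum(1 / k**2 for k in range(1, n + 1))         # p = 2: stalls near pi^2/6
    print(n, round(harmonic, 4), round(p2, 6))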

A power series in x is an infinite series whose terms are an x n , where each an is a

constant. (Such a series could technically be considered an infinite-degree polynomial.)

The convergence tests mentioned above are useful for the analysis of power series, which

are only useful if they converge. By the ratio test, the series will converge absolutely if:

lim_{n→∞} |a_{n+1} x^(n+1)| / |a_n x^n| = |x| · lim_{n→∞} |a_{n+1}/a_n| < 1.

We set L = lim_{n→∞} |a_{n+1}/a_n|. If L = 0, then the power series converges for all x; if L = ∞, then the power series converges only for x = 0; and if L is positive and finite, then the power series converges absolutely for |x| < 1/L and diverges for |x| > 1/L. The set of all values of x for which the series converges is called the interval of convergence. Every power series in x falls into one of three categories:

1. The series converges for all x ∈ (−R, R) and diverges for all |x| > R, where R is called the radius of convergence and R = 1/L, where L is defined as above. (Whether the series converges at the endpoints of the interval must be checked on a case-by-case basis.)

2. The power series converges absolutely for all x; the interval of convergence is

( −∞, ∞ ) , and the radius of convergence is ∞ .


3. The series converges only for x = 0 , so the radius of convergence is 0.


Functions are frequently defined by power series. If f ( x) = ∑ an x n , the domain of f
n =0

must be a subset of the interval of convergence for the series.

Within the interval of convergence of the power series:

1. The function f is continuous, differentiable, and integrable.

2. The power series can be differentiated term by term:

f(x) = Σ_{n=0}^{∞} a_n x^n ⇒ f′(x) = Σ_{n=0}^{∞} d/dx(a_n x^n) = Σ_{n=1}^{∞} n a_n x^(n−1).

3. The power series can be integrated term by term:

f(x) = Σ_{n=0}^{∞} a_n x^n ⇒ ∫_0^x f(t) dt = Σ_{n=0}^{∞} (∫_0^x a_n t^n dt) = Σ_{n=0}^{∞} a_n x^(n+1)/(n + 1).

If a function f can be represented by a power series, that series is the Taylor series:

f(x) = Σ_{n=0}^{∞} f⁽ⁿ⁾(0)/n! · x^n.

Function     Partial Expansion             Series Form                             Interval of Convergence
1/(1 − x)    1 + x + x² + x³ + ⋯           Σ_{n=0}^{∞} x^n                         (−1, 1)
1/(1 + x)    1 − x + x² − x³ + ⋯           Σ_{n=0}^{∞} (−1)^n x^n                  (−1, 1)
ln(1 + x)    x − x²/2 + x³/3 − ⋯           Σ_{n=1}^{∞} (−1)^(n+1) x^n/n            (−1, 1]
e^x          1 + x + x²/2! + x³/3! + ⋯     Σ_{n=0}^{∞} x^n/n!                      (−∞, ∞)
sin x        x − x³/3! + x⁵/5! − ⋯         Σ_{n=0}^{∞} (−1)^n x^(2n+1)/(2n+1)!     (−∞, ∞)
cos x        1 − x²/2! + x⁴/4! − ⋯         Σ_{n=0}^{∞} (−1)^n x^(2n)/(2n)!         (−∞, ∞)

A Taylor series can be truncated to form a Taylor polynomial for use in approximating the value of a function. A Taylor polynomial of degree n, P_n(x) = Σ_{k=0}^{n} f⁽ᵏ⁾(0)/k! · x^k, will have an error, called the remainder, exactly equal to f(x) − P_n(x) = f⁽ⁿ⁺¹⁾(c)/(n + 1)! · x^(n+1) for some c between 0 and x.

The Taylor series mentioned above is a special case of a more general series in powers of (x − a), which has Taylor coefficients a_n = f⁽ⁿ⁾(a)/n!. The remainder of a Taylor polynomial built on such a series would be equal to f(x) − P_n(x) = f⁽ⁿ⁺¹⁾(c)/(n + 1)! · (x − a)^(n+1) for some c between a and x.
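A short numerical illustration: approximating e^x at x = 1 with a degree-6 Taylor polynomial and confirming the actual error sits below the remainder bound (for f = exp and 0 < c < x ≤ 1, f⁽ⁿ⁺¹⁾(c) = e^c ≤ e).

import math

x, n = 1.0, 6
P_n = sum(x**k / math.factorial(k) for k in range(n + 1))   # degree-6 Taylor polynomial
error = abs(math.exp(x) - P_n)
bound = math.e * x**(n + 1) / math.factorial(n + 1)          # Lagrange remainder bound
print(P_n, error, bound)    # error ≈ 2.26e-4, safely below bound ≈ 5.39e-4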

Multivariable Calculus and Vector Analysis

In three dimensions, it is often convenient to think in terms of vectors. Vectors are often represented by an ordered triple or a sum of vector components. Whereas an ordered triple could be regarded as a point in ℝ³, the vector it represents corresponds to the directed segment connecting the origin to this point. Component notation, which would represent the vector connecting the origin to the point (x, y, z) as xî + yĵ + zk̂, makes use of orthogonal unit vectors. Vectors are added and subtracted component-wise.

The norm of a vector is its magnitude, or length. For a vector A = x_a î + y_a ĵ + z_a k̂, ‖A‖ = A = √(x_a² + y_a² + z_a²).

There are two ways to multiply vectors. One, the dot (or scalar) product, results in a scalar. For two vectors, A = x_a î + y_a ĵ + z_a k̂ and B = x_b î + y_b ĵ + z_b k̂, the dot product is equal to A · B = ‖A‖‖B‖ cos θ = x_a x_b + y_a y_b + z_a z_b, where θ is the angle between the two vectors. Since the dot product can be computed component-wise, it is not necessary to know the angle between the vectors, and the dot product can be useful when the angle is sought. For any two vectors perpendicular to each other, the dot product is zero. The dot product is commutative: A · B = B · A.

The projection of B onto A, denoted projA B , which can be thought of as B’s

shadow cast on A by light shining perpendicular to A, is:

proj_A B = ((A · B)/(A · A)) A.

The cross (or vector) product is the other way to multiply vectors. The result is

a vector perpendicular to each of the vectors being multiplied, following the right-hand

rule. For two vectors A = xa ˆi + ya ˆj + za kˆ and B = xb ˆi + yb ˆj + zbkˆ the cross product is

equal to:

A × B = | î    ĵ    k̂   |
        | x_a  y_a  z_a |
        | x_b  y_b  z_b |,

and ‖A × B‖ = ‖A‖‖B‖ sin θ. This magnitude is equal to the area of the parallelogram spanned by A and B. The cross product is anticommutative: A × B = −(B × A).

The triple scalar product, the absolute value of which is equal to the volume of a

parallelepiped formed by three vectors A, B, and C, is equal to:

[A, B, C] = (A × B) · C = | x_a  y_a  z_a |
                          | x_b  y_b  z_b |
                          | x_c  y_c  z_c |.
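A minimal sketch of these products, assuming NumPy is available (the vectors are arbitrary examples), checking the geometric identities quoted above:

import numpy as np

A = np.array([1.0, 2.0, 2.0])           # |A| = 3
B = np.array([2.0, 0.0, 0.0])           # |B| = 2

dot = A @ B                              # = |A||B| cos θ -> 2.0
cross = np.cross(A, B)                   # perpendicular to both A and B
assert abs(cross @ A) < 1e-12 and abs(cross @ B) < 1e-12

C = np.array([0.0, 1.0, 3.0])
triple = np.cross(A, B) @ C              # (A × B) · C; |triple| = box volume
print(dot, cross, triple)                # 2.0 [ 0.  4. -4.] -8.0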

Lines

Just as a slope and a point define a line in ℝ², a point and a vector can define a line in ℝ³. Specifically, if we have a point, P₀ = (x₀, y₀, z₀), on a line L and a vector, v = (v₁, v₂, v₃), parallel to the line, we can say that any point P = (x, y, z) is on the line if and only if the vector connecting that point to P₀, P₀P = (x − x₀, y − y₀, z − z₀), is parallel to v, which is to say that (x − x₀, y − y₀, z − z₀) = tv = t(v₁, v₂, v₃) for some scalar t (note above the order of the points when subtracting). This is equivalent to three parametric equations:

L: x = x₀ + tv₁, y = y₀ + tv₂, z = z₀ + tv₃.

These can each be solved for t and equated to yield the symmetric equations of L:

(x − x₀)/v₁ = (y − y₀)/v₂ = (z − z₀)/v₃.

In the case that any component of v is equal to zero, the term containing that component

is eliminated from the equation.

Planes

Planes in ℝ³ are uniquely defined by a point, P₀ = (x₀, y₀, z₀), and a vector, n, normal to the plane. Since any line in the plane must be perpendicular to the normal vector, we have a clue as to the form that the equation of a plane must take. Specifically, the point P = (x, y, z) is on the plane if and only if P₀P · n = (x − x₀)n₁ + (y − y₀)n₂ + (z − z₀)n₃ = 0, which can be rewritten as n₁x + n₂y + n₃z = d, where d = n₁x₀ + n₂y₀ + n₃z₀.

Cylinders

We can take a curve C ⊂ ℝ² and extend it into ℝ³ as a cylinder (which appears to maintain this name even if C is not a closed curve). We do this by connecting to each point in C a line parallel to some given line v, referred to as the generator. In the case that this line is perpendicular to the plane of the curve, this extension is automatic and leaves the original equation unchanged. (Since the equation is in ℝ², it is missing the variable, say z, that would make it an equation in ℝ³. That missing variable can then take on all values once the curve is considered in ℝ³, and the curve is converted into a right cylinder.)

In the case that the line v = (v₁, v₂, v₃) is not perpendicular to the plane of the curve, we recall the parametric and symmetric equations of the line from above in order to generate our equation for the cylinder. Without loss of generality, we consider a curve in the x-y plane, y = f(x). We say that a point on this curve in ℝ³ is (x₀, y₀, 0), where we note that no relationship for z is described by the equation of the curve. Any line from a point on that curve will have to obey the following relationship:

x − x₀ = v₁t
y − y₀ = v₂t
z = v₃t.

We can solve these equations for t and equate them, getting:

(x − x₀)/v₁ = (y − y₀)/v₂ = z/v₃.

These equations can be solved for the “naught values,” one at a time, getting:

x₀ = x − (v₁/v₃)z
y₀ = y − (v₂/v₃)z.

Recall that these are the points on the curve, so the equation of the cylinder in ℝ³ becomes:

y − (v₂/v₃)z = f(x − (v₁/v₃)z).

Note the pattern in the naught values above. Note also that y = f ( x) was an arbitrary

choice. We could just as easily have had our function be f ( x, y ) = c or considered a

curve in the y-z plane or the x-z plane.

Surfaces of Revolution

A curve in ℝ² can be rotated about an axis to form a surface in ℝ³. Each point on the original curve will trace out a circle around the axis about which it’s rotating. The distance to that axis can be found by the distance formula.

This is the basis for the following substitutions, which are used to obtain

equations for the surface:

Curve         Revolved around    Replace                Obtaining
f(x, y) = 0   x-axis             y by ±√(y² + z²)       f(x, ±√(y² + z²)) = 0
              y-axis             x by ±√(x² + z²)       f(±√(x² + z²), y) = 0
f(x, z) = 0   x-axis             z by ±√(y² + z²)       f(x, ±√(y² + z²)) = 0
              z-axis             x by ±√(x² + y²)       f(±√(x² + y²), z) = 0
f(y, z) = 0   y-axis             z by ±√(x² + z²)       f(y, ±√(x² + z²)) = 0
              z-axis             y by ±√(x² + y²)       f(±√(x² + y²), z) = 0

Cylindrical and Spherical Coordinates

Cylindrical coordinates are an extension of polar coordinates to ℝ³ by adding the familiar z coordinate. The Cartesian coordinates (x, y, z) are transformed to (r, θ, z) = (√(x² + y²), arctan(y/x), z), where we repeat the warning about the non-uniqueness of the angle θ.

Spherical coordinates are closer in spirit to polar coordinates than even

cylindrical coordinates. They take the form ( ρ , φ ,θ ) , where ρ is the length of the radius

vector from the origin, φ is the angle the radius vector makes with the positive z-axis,

and θ is the angle between the positive x-axis and the projection (or shadow, as it may be

helpful to think) of the radius vector on the x-y plane, r.

The following relations hold:

ρ² = x² + y² + z²
x = r cos θ = ρ sin φ cos θ
y = r sin θ = ρ sin φ sin θ
z = ρ cos φ
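A minimal round-trip sketch of these relations (atan2 sidesteps the arctan quadrant ambiguity noted above):

import math

def to_spherical(x, y, z):
    rho = math.sqrt(x*x + y*y + z*z)
    phi = math.acos(z / rho)             # angle from the positive z-axis
    theta = math.atan2(y, x)             # angle of the shadow r in the x-y plane
    return rho, phi, theta

def to_cartesian(rho, phi, theta):
    return (rho * math.sin(phi) * math.cos(theta),
            rho * math.sin(phi) * math.sin(theta),
            rho * math.cos(phi))

p = (1.0, -2.0, 2.0)                      # arbitrary sample point
assert all(abs(u - v) < 1e-12 for u, v in zip(p, to_cartesian(*to_spherical(*p))))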

Tangent Plane to a Surface

For a function z = f ( x, y ) describing a surface, the equation of the plane tangent to that

surface at a point P = ( x0 , y0 , z0 ) is:

z − z₀ = (∂z/∂x)|_P · (x − x₀) + (∂z/∂y)|_P · (y − y₀).

In the case that the surface is described implicitly by f ( x, y , z ) = c , we find the above

partial derivatives by implicit differentiation. (There is another option as well, which

we’ll discuss in the section on directional derivatives.) Equations for tangent planes can

be used for linear approximations, analogous to tangent lines.

Chain Rule for Partial Derivatives

To use the chain rule for derivatives of multivariate functions, care must be taken to

account for every “path” to the variable of interest from the others. For example,

consider the function z = F ( x, u , v ) , where u = f ( x, y ) and v = g ( x, y ) . Let’s say that

we want to calculate the partial derivative of z with respect to x. We notice that there are

three paths to x, one directly from F and one each from u and v. We then have:

(∂z/∂x)_y = (∂z/∂x)_{u,v} + (∂z/∂u)_{x,v} (∂u/∂x)_y + (∂z/∂v)_{x,u} (∂v/∂x)_y.

The subscripts are a bit of bookkeeping, reminding us which variables are treated as

constants in the partial differentiation steps.

Directional Derivatives and the Gradient

The main differential operator on ℝ³ is del, denoted ∇, which is equal to:

∇ = (∂/∂x)î + (∂/∂y)ĵ + (∂/∂z)k̂.

A scalar function f can be converted into a vector called the gradient, and the del

operator can also operate on vectors through the divergence and curl, all of which will

be discussed later.

Consider a surface z = f ( x, y ) and a point P = ( x0 , y0 ) in the domain of f. If we

want to find the rate of change of the surface f, we must first give thought to the direction

we’re heading. (Think about standing on a rocky hill. Its rate of change is not uniform; it

will be different depending on whether we head forward or back, left or right.) The

directional derivative takes into account both the point we’re interested in and the

direction we’re heading, which we indicate with the unit vector û :

D_u f|_P = ∇f|_P · û.

Examining this equation and noting the employment of the dot product, we know that D_u f|_P = ‖∇f|_P‖ ‖û‖ cos θ, where θ is the angle between the unit vector and the gradient. Clearly, the norm of any unit vector is one, so this reduces to D_u f|_P = ‖∇f|_P‖ cos θ, which tells us that the directional derivative attains its maximum value when cos θ = 1, which is when θ = 0. This tells us that the gradient, ∇f, points in the direction in which f increases most rapidly, and its magnitude gives the maximum rate of increase.

Similarly, −∇f points in the direction in which f decreases most rapidly. Examining a level surface f(x, y, z) = c reveals that the directional derivative is zero along any direction tangent to that surface (f is constant there). This tells us that for a function f(x, y, z), the vector ∇f|_P is perpendicular to the level surface of f that contains P. Similarly, for a function f(x, y), the vector ∇f|_P is perpendicular to the level curve of f that contains P. This fact gives us another option for writing the equation of the tangent plane at P = (x₀, y₀, z₀):

f_x|_P · (x − x₀) + f_y|_P · (y − y₀) + f_z|_P · (z − z₀) = 0.
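A finite-difference sketch (for an arbitrary example function) confirming that the maximum directional derivative equals ‖∇f‖:

import math

def f(x, y):
    return x**2 + 3*x*y                  # hypothetical surface

def grad(x, y, h=1e-6):                  # numerical gradient
    fx = (f(x + h, y) - f(x - h, y)) / (2*h)
    fy = (f(x, y + h) - f(x, y - h)) / (2*h)
    return fx, fy

def directional(x, y, ux, uy, h=1e-6):   # D_u f at (x, y) for a unit vector (ux, uy)
    return (f(x + h*ux, y + h*uy) - f(x - h*ux, y - h*uy)) / (2*h)

x0, y0 = 1.0, 2.0
gx, gy = grad(x0, y0)                    # analytically (2x + 3y, 3x) = (8, 3)
best = max(directional(x0, y0, math.cos(t), math.sin(t))
           for t in [i * 2 * math.pi / 720 for i in range(720)])
print(math.hypot(gx, gy), best)          # both ≈ 8.544: max rate = ‖∇f‖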

For completeness, we’ll briefly mention the divergence and curl of a vector field.

The divergence, a scalar quantity, can be interpreted as the net flow of “density” out of a

unit of space. It is defined as follows:

div F = ∇ · F = ∂F₁/∂x + ∂F₂/∂y + ∂F₃/∂z.

The curl, a vector quantity, can be interpreted as the rotation of a vector field. It is equal

to the maximum circulation at each point and is oriented perpendicularly to the plane of

circulation, following the right-hand rule. It is defined as follows:

curl F = ∇ × F = | î     ĵ     k̂    |
                 | ∂/∂x  ∂/∂y  ∂/∂z |
                 | F₁    F₂    F₃   |.

Maxima and Minima

Just as with single-variable functions, multivariable functions attain critical points when

their derivatives equal zero. Critical points are also not necessarily maxima or minima in

multivariable functions. Saddle points, which are critical points that are neither maxima

nor minima, can exist for a function z = f ( x, y ) at a point P0 = ( x0 , y0 ) if z = f ( x, y0 )

has a maximum at x = x0 while z = f ( x0 , y ) has a minimum at y = y0 (or vice versa).

We can evaluate the possibilities by computing the Hessian, the determinant of the Hessian matrix:

Δ = det [ f_xx  f_xy ; f_yx  f_yy ]|_{P₀} = f_xx(P₀) f_yy(P₀) − [f_xy(P₀)]²,

where we make use of the fact that f_xy = f_yx for functions with continuous second partials. (A worked sketch follows the list below.)

• If ∆ > 0 and f xx ( P0 ) < 0 , then f attains a local maximum at P0 .

• If ∆ > 0 and f xx ( P0 ) > 0 , then f attains a local minimum at P0 .

• If ∆ < 0 , then f has a saddle point at P0 .

• If ∆ = 0 , then no conclusion can be drawn.
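As promised, a second-derivative-test sketch for the arbitrary example f(x, y) = x³ − 3x + y², whose critical points satisfy f_x = 3x² − 3 = 0 and f_y = 2y = 0 (so f_xx = 6x, f_yy = 2, f_xy = 0, giving Δ = 12x at (±1, 0)):

for x0, y0 in [(1.0, 0.0), (-1.0, 0.0)]:
    f_xx, f_yy, f_xy = 6 * x0, 2.0, 0.0
    delta = f_xx * f_yy - f_xy**2            # the Hessian determinant at (x0, y0)
    if delta > 0:
        kind = "local minimum" if f_xx > 0 else "local maximum"
    elif delta < 0:
        kind = "saddle point"
    else:
        kind = "inconclusive"
    print((x0, y0), kind)   # (1, 0): local minimum; (-1, 0): saddle point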

When solving maxima and minima problems subject to a constraint, the constraint

function is solved for one of the variables and substituted into the main function,

differentiated and equated to zero, and then solved as usual for critical points. These

points, along with the endpoints of any intervals given, must be tested to find the

extrema.

The Lagrange Multiplier

In some cases, the constraint equation will not be easy to solve for one of the variables.

In such a case (and in others), it can help to employ the method of the Lagrange

multiplier, which takes advantage of several interesting facts. Let’s say we’re looking to

maximize f ( x, y ) subject to a constraint g ( x, y ) = c . If M is an extreme value of f,

attained at the point P = ( x0 , y0 ) , then the level curve f ( x, y ) = M and the curve

g ( x, y ) = c share the same tangent line at P. Because they share the tangent line, they

also share the normal line at P. Since ∇f is normal to the curve f ( x, y ) = M and ∇g is

normal to the curve g ( x, y ) = c , the gradients must be parallel, which is to say that one is

a scalar multiple of the other. That is, ∇f = λ∇g for some scalar λ , called the Lagrange

multiplier, whose value is unimportant in itself, although we will solve for it. This

assumes, of course, that ∇g is not the zero vector.

The gradients are computed, and the resulting components on the left and right

side are equated and treated as simultaneous equations that are solved for λ to find the

necessary relationship between the independent variables. (The parameter λ disappears

at this step.) The new relationship is substituted into the original equation to be

maximized and solved for the remaining variable. Critical values are then obtained and

tested.
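A minimal symbolic sketch of the method, assuming SymPy is available: extremize the arbitrary example f = xy subject to g: x + y = 10 by solving ∇f = λ∇g together with the constraint (λ drops out of the final answer, as noted above).

import sympy as sp

x, y, lam = sp.symbols('x y lambda_')
f, g = x * y, x + y - 10

eqs = [sp.diff(f, x) - lam * sp.diff(g, x),    # y = λ
       sp.diff(f, y) - lam * sp.diff(g, y),    # x = λ
       g]                                      # constraint x + y = 10
print(sp.solve(eqs, [x, y, lam]))              # [(5, 5, 5)]: f attains 25 at (5, 5)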

Line Integrals

The standard definite integral computes the area between a curve and a straight-line

segment of one of the coordinate axes. But integration can follow more complicated

paths, giving rise to the line (or path or contour) integral.

Line Integrals with Respect to Arc Length

This is a close relative of the standard Riemann integral, in that the value of a function at

a point is multiplied by the length of the curve containing that point. Imagine, for

instance, a function f ( x, y ) and a curve C, which is defined parametrically. If we break

the curve into n tiny pieces, we have lim_{n→∞} Σ_{i=1}^{n} f(x_i, y_i) Δs_i = ∫_C f ds. Geometrically, this is equivalent to calculating the area of a curtain with C as its base and f(x, y) as its height

for each point ( x, y ) along C. The integral is evaluated by first parameterizing C:

C: x = x(t), y = y(t), for a ≤ t ≤ b.

The curve C is considered to be directed, meaning that it has a definite initial and final point. (This detail is similar to the idea that ∫_a^b f(x) dx = −∫_b^a f(x) dx.) Since for a differential treatment of the curve we have (ds)² = (dx)² + (dy)², we can write:

ds/dt = ±√((dx/dt)² + (dy/dt)²) = ±√([x′(t)]² + [y′(t)]²),

where the sign we choose depends on the direction we’re taking along our path. We then have ∫_C f ds = ∫_a^b f(x(t), y(t)) (ds/dt) dt.

The treatment in ℝ³ is identical, as ∫_C f ds = ∫_a^b f(x(t), y(t), z(t)) (ds/dt) dt, where

ds/dt = ±√((dx/dt)² + (dy/dt)² + (dz/dt)²) = ±√([x′(t)]² + [y′(t)]² + [z′(t)]²).

In ℝ³, we can interpret the line integral using the example of C as a curved wire with linear density f(x, y, z). The line integral of f over C would give the total mass of the wire.
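A minimal numerical sketch of an arc-length line integral on an arbitrary example: f(x, y) = x² over the quarter unit circle C: x = cos t, y = sin t, 0 ≤ t ≤ π/2 (where ds/dt = 1), whose exact value is π/4.

import math

n = 100_000
dt = (math.pi / 2) / n
total = 0.0
for i in range(n):
    t = (i + 0.5) * dt                                 # midpoint rule
    x = math.cos(t)
    ds_dt = math.hypot(-math.sin(t), math.cos(t))      # = 1 on the unit circle
    total += x**2 * ds_dt * dt
print(total, math.pi / 4)                              # both ≈ 0.785398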

The Line Integral of a Vector Field

A parametric curve given by the vector equation r = r (t ) = ( x(t ), y (t ) ) is considered

smooth if the derivative r ′(t ) is continuous and nonzero. A curve is considered

piecewise smooth if it is composed of finitely many smooth curves joined at consecutive

endpoints. A vector field is a vector-valued function. Let D be a region of the x-y plane

on which a pair of continuous functions, M ( x, y ) and N ( x, y ) , are both defined. Then

the function F( x, y ) = M ( x, y )ˆi + N ( x, y )ˆj = ( M ( x, y ), N ( x, y ) ) is a continuous vector

field on D. For a curve C parameterized by r (t ) = x(t )ˆi + y (t )ˆj = ( x(t ), y (t ) ) for a ≤ t ≤ b ,

we define the line integral of the vector field F as:

∫_C F · dr = ∫_a^b F(r(t)) · r′(t) dt
           = ∫_a^b F(x(t), y(t)) · (x′(t), y′(t)) dt
           = ∫_a^b [M(x(t), y(t)) x′(t) + N(x(t), y(t)) y′(t)] dt.

The situation in ℝ³ is defined analogously. If we consider F to be a force field, the line integral can be interpreted as the work done on a particle to move it along the path C.

In the case that F is equal to the gradient of a scalar field (i.e., a real-valued function), then it is a gradient field. A function f such that F = ∇f is called a potential for F. The fundamental theorem of calculus for line integrals says that the line integral of a gradient field depends only on the endpoints of the path and not the path itself. For any piecewise smooth curve C oriented from A to B and a continuously differentiable function f defined on C, we have ∫_C ∇f · dr = f(B) − f(A). For F(x, y) = M(x, y)î + N(x, y)ĵ = (M(x, y), N(x, y)) to be a gradient field, it is necessary (but not sufficient) that ∂M/∂y = ∂N/∂x. This simply states that f_xy = f_yx, which will be the case for functions with continuous mixed partials. If the domain of F, which we’ll call R, is a simply connected region, meaning that the interior of every simple closed curve drawn in R is contained in R, then the above mixed-partials equality condition is sufficient.

It should be clear that for a gradient field F, ∮_C F · dr = 0 for any closed path C (where the circle on the integral symbol indicates a closed path).

Double Integrals

When evaluating double integrals over a region R, it is helpful to examine the region to

select the limits of integration that provide for the easiest calculation of each iterated

integral. For example, when evaluating the integral ∫∫ f ( x, y) dx dy ,


R
an imaginary

horizontal line is placed over the region, and expressions for x in terms of y are derived.

An integral with respect to x will then be evaluated, and the y terms will be substituted in

as the limits of integration. An imaginary vertical line will determine the y limits, which

should be numerical (the outside integral should not contain variables). An integral with

respect to y will be evaluated between these limits, and a numerical result will be

obtained. The situation is exactly reversed if the differentials are switched in the integral.

In short:

∫∫_R f(x, y) dA = ∫_{y=c}^{y=d} ∫_{x=g(y)}^{x=h(y)} f(x, y) dx dy = ∫_{x=a}^{x=b} ∫_{y=G(x)}^{y=H(x)} f(x, y) dy dx

for some functions h, g, H, and G that describe the region R in terms of the variables

needed.

Double integrals are often evaluated in polar coordinates. The procedure is the

same as with Cartesian double integrals, except for one thing: The element of area

dA = r dr dθ . The reason for this becomes clear when one considers that these small

elements are “curvilinear rectangles,” the curved side of which will have arc length r dθ .

When evaluating an integral that corresponds to the volume of a region bounded above

by some function and bounded below by another function, the function providing the

upper bound is integrated, while the lower-bound function is treated as the region R.
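Numerically, an iterated integral with variable inner limits can be sketched with SciPy’s dblquad (assuming SciPy is available); here, the arbitrary example ∫∫_R xy dA over the triangle 0 ≤ y ≤ x ≤ 1:

from scipy.integrate import dblquad

# dblquad integrates func(y, x) with y as the inner variable; the inner
# y-limits may depend on x, while the outer x-limits are numerical.
val, err = dblquad(lambda y, x: x * y,   # integrand f(x, y); inner variable first
                   0, 1,                 # outer limits: x from 0 to 1
                   lambda x: 0,          # inner limit y = G(x) = 0
                   lambda x: x)          # inner limit y = H(x) = x
print(val)                               # exact value: 1/8 = 0.125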

Green’s Theorem

Green’s theorem is a powerful tool for dealing with integrals, since it relates double

integrals to line integrals, one of which might be easier to evaluate in a given situation.

First we define a simple closed curve, which is a curve that does not intersect itself.

Circles, ellipses, squares, and rectangles are examples of simple closed curves. A curve

is oriented, with the positive direction defined as the direction that, if walked, would put

the interior of the region bounded by the curve on your left.

Consider a simple closed curve C enclosing a region R, such that C is the

boundary of R. If M ( x, y ) and N ( x, y ) are functions that are defined and have

continuous partial derivatives on C and throughout R, Green’s theorem states:

∮_C M dx + N dy = ∫∫_R (∂N/∂x − ∂M/∂y) dA.

A clever selection of M(x, y) = −y and N(x, y) = x gives us the interesting result ∫∫_R dA = ½ ∮_C (x dy − y dx) by Green’s theorem. (Evaluating the integral with either M or N set to zero while leaving the other unchanged will give the area of the region R, hence the halving of the result above. That is, ∮_C x dy = −∮_C y dx = A_R.)

Differential Equations

Separable Equations

A differential equation of the form dy/dx = f(x)/g(y) is called separable, since its variables can be separated out. Its solution is straightforward: ∫ g(y) dy = ∫ f(x) dx.

Homogeneous Equations

A function of two variables, f ( x, y ) , is said to be homogeneous of degree n if there is a

constant n such that f (tx, ty ) = t n f ( x, y ) for all t, x, and y for which both sides are

defined. A differential equation M ( x, y ) dx + N ( x, y ) dy = 0 is homogeneous if M and N

are both homogeneous functions of the same degree. Homogeneous equations are

soluble by introducing the substitution y = vx , where v is considered as a function of x.

This substitution makes the differential equation separable after algebraic manipulation.

Exact Equations

Given a function f ( x, y ) , its total differential is defined as:

df = (∂f/∂x) dx + (∂f/∂y) dy.

This tells us that the family of curves f ( x, y ) = c satisfies the differential equation

df = 0. If there exists a function f(x, y) such that ∂f/∂x = M(x, y) and ∂f/∂y = N(x, y), then M(x, y) dx + N(x, y) dy = 0 is called an exact differential, and it has the general solution f(x, y) = c. An equation is exact if ∂M/∂y = ∂N/∂x, which we recognize as the now-familiar equating of the mixed partials, f_xy = f_yx.

Nonexact Equations and Integrating Factors

A differential equation that is not exact can be made exact by being multiplied through by

an integrating factor, µ , which is a function in one of the variables of the differential

equation. Any nonexact differential equation that has a solution also has an integrating

factor, even if that factor is not always particularly easy to find. There are, however,

some cases for which the process of finding it is straightforward.

Given a nonexact differential equation M ( x, y ) dx + N ( x, y ) dy = 0 , for which

M y − N x ≠ 0 , we have the following two cases:

1. If (M_y − N_x)/N is a function of x alone, call this function ξ(x). Then µ(x) = exp(∫ ξ(x) dx) is an integrating factor.

2. If (M_y − N_x)/(−M) is a function of y alone, call this function ψ(y). Then µ(y) = exp(∫ ψ(y) dy) is an integrating factor.

First-Order Linear Equations

A first-order linear equation is defined as a differential equation of the form:

dy/dx + P(x) y = Q(x).

Equations of this type are soluble through the use of an integrating factor, µ(x) = exp(∫ P(x) dx). Multiplying through yields µ(x) dy/dx + µ(x)P(x) y = µ(x)Q(x). The left-hand side becomes (µy)′, and the equation becomes d/dx(µy) = µQ. Integrating both sides gives the general solution:

y = (1/µ) ∫ (µQ) dx.
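A worked instance of the integrating-factor procedure: for y′ + y = x, µ(x) = e^x, so (e^x y)′ = x e^x and y = x − 1 + Ce^(−x). A minimal symbolic check, assuming SymPy is available:

import sympy as sp

x = sp.symbols('x')
y = sp.Function('y')
sol = sp.dsolve(sp.Eq(y(x).diff(x) + y(x), x), y(x))
print(sol)    # Eq(y(x), C1*exp(-x) + x - 1), matching the hand computation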

Higher-Order Linear Equations with Constant Coefficients

The general second-order linear differential equation with constant coefficients is

ay′′ + by ′ + cy = d ( x) .

In the case that d ( x) = 0 , the equation is said to be homogeneous (the definition here,

which is more common, differs from that of the other type of homogeneous equation).

The general solution of any nonhomogeneous equation is y = yh + y p , where yh is the

general solution of the corresponding homogeneous equation, and y p is any particular

solution of the nonhomogeneous equation.

The homogeneous equation is solved by converting it into its corresponding

auxiliary polynomial, in which the nth derivative of y becomes the nth power of m:

ay″ + by′ + cy = 0 → am² + bm + c = 0.

This equation is solved for its roots, which form the basis of the solution of the

differential equation according to the following three cases:

1. The roots are real and distinct: The general solution is y = c₁e^(m₁x) + c₂e^(m₂x).

2. The roots are real and identical: The general solution is y = c₁e^(m₁x) + c₂xe^(m₁x).

3. The roots are complex conjugates, m = α ± βi: The general solution is y = e^(αx)(c₁ cos βx + c₂ sin βx).

Linear Algebra

Matrices

An m × n matrix is an array of numbers corresponding to a group of m row vectors and n

column vectors. For a matrix A, the aij entry is in row i and column j. Matrices of the

same size can be added and subtracted element by element. That is, for matrices A and B,

( A ± B )ij = aij ± bij . Clearly, matrix addition and subtraction are commutative. Matrices

can be multiplied by scalars, yielding a new matrix in which each element has been

multiplied by that scalar.

Two matrices can be multiplied together only if the number of columns of the first

matrix equals the number of rows of the second. An m × n matrix multiplied by an n × p

matrix yields an m × p matrix. Matrix multiplication is defined such that

( AB )ij = ri ( A) ⋅ c j ( B) , where ri ( A) denotes the ith row vector of the matrix A and c j ( B )

denotes the jth column vector of the matrix B.

The identity matrix I has entries defined by the Kronecker delta:

δ_ij = 1 if i = j, 0 if i ≠ j.

An inverse of a matrix A is another matrix A−1 such that the product AA−1 = A−1 A = I . A

matrix that has an inverse is called invertible. There are several methods for finding an

inverse matrix, which we’ll discuss shortly.

An augmented matrix is a coefficient matrix with the column vector of the

constants from the right-hand side of the simultaneous equations appended on the right.

An augmented matrix is solved by putting it in echelon form, in which the matrix is

upper triangular with any zero rows at the bottom, with the first nonzero entry in any row

appearing to the right of any nonzero entry in the row above. There are three elementary

row operations that are used to reduce a matrix to echelon form:

1. Multiplying a row by a nonzero constant.

2. Interchanging two rows.

3. Adding a multiple of one row to another row.

Variables are solved for by working from the bottom up of the reduced matrix, a

procedure called back-substitution. In order to obtain a unique solution for

simultaneous equations in n unknowns, n equations are required. (It is also required that

they be linearly independent, a concept we’ll visit shortly.) In the case that infinitely

many solutions to a system are possible, it may be the case that one or more of the

variables are free, in which case they can be represented by parameters. The other

variables can be solved for in terms of the parameters.

Any system of linear equations can be written as Ax = b , where A is the system’s

coefficient matrix, x is the column vector of the unknowns, and b is the column vector of

the constant terms to the right of the equal sign. In the case that A is invertible, the

system can be solved using the relation A−1 Ax = A−1b ⇒ x = A−1b .
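The elimination-plus-back-substitution procedure just described is easy to sketch in code; a minimal version (assuming a unique solution exists):

def solve(A, b):
    n = len(A)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]   # augmented matrix
    for i in range(n):
        p = max(range(i, n), key=lambda r: abs(M[r][i]))
        M[i], M[p] = M[p], M[i]                      # row interchange (operation 2)
        for r in range(i + 1, n):
            m = M[r][i] / M[i][i]
            for c in range(i, n + 1):                # add a multiple of a row (operation 3)
                M[r][c] -= m * M[i][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):                   # back-substitution, bottom up
        s = sum(M[i][c] * x[c] for c in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x

print(solve([[2.0, 1.0], [1.0, 3.0]], [5.0, 10.0]))  # [1.0, 3.0]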

Vector Spaces

A vector space V is a set that is closed under the two operations addition and scalar

multiplication. This means that for any x1 , x 2 ∈V , x1 + x 2 ∈V and kx1 , kx 2 ∈ V for any

scalar k. It should be evident from these conditions that the zero vector is required to be

in any vector space.

If A is an m × n matrix, the set of all vectors x such that Ax = 0 is called the nullspace of A, denoted N(A). The nullspace is a subspace of ℝⁿ. It should be apparent that an invertible matrix will only have the trivial subspace, the zero vector, as its nullspace.

Consider a collection of n-vectors, v₁, v₂, …, v_m. Any expression of the form Σ_{i=1}^{m} k_i v_i, where the k_i represent scalars, is called a linear combination of the vectors v_i. The set of all possible linear combinations of the vectors v_i is called their span. For example, since every vector in ℝ³ can be expressed as a linear combination of î, ĵ, and k̂, their span is ℝ³. The span of any collection of n-vectors is a subspace of ℝⁿ. The vectors v_i are called linearly independent if Σ_{i=1}^{m} k_i v_i = 0 ⇒ k_i = 0 for all i. This means that no combination of some vectors cancels other vectors out, or that no vector can be written as a linear combination of the others. If this is not the case, the vectors are said to be linearly dependent. A minimal spanning set of vectors is called a basis for the vector space they span. The number of vectors in the basis is the dimension of the vector space.

Let {a₁, a₂, …, a_n} be a set of n-vectors in ℝⁿ, and let A be an n × n matrix such that {a₁, a₂, …, a_n} form the columns of A. The vector set {a₁, a₂, …, a_n} is linearly independent if and only if det A ≠ 0, which is equivalent to saying that A is nonsingular.

In an m × n matrix, the columns can be regarded as m-vectors, and the rows can be regarded as n-vectors. The maximum number of linearly independent columns is called the column rank, and the maximum number of linearly independent rows is called the row rank. In any matrix, the column rank is equal to the row rank; this is simply the rank of the matrix. The subspace of ℝᵐ spanned by the columns is called the column space, CS(A), and the subspace of ℝⁿ spanned by the rows is called the row space, RS(A). A vector b is in CS(A) if and only if there is a collection of scalars k_i such that Σ_{i=1}^{n} k_i c_i = b, where the c_i are the column vectors. This requires that Ak = b have a solution for k = (k₁, k₂, …, k_n). The row space of A is best considered as CS(Aᵀ).



Determinants

The determinant is only defined for square matrices. For an n × n matrix A, we first define the minor, M_ij, as the determinant of the (n − 1) × (n − 1) matrix that results from eliminating row i and column j from matrix A. The cofactor of any matrix entry a_ij is cof(a_ij) = (−1)^(i+j) M_ij. The determinant of any square matrix A can be computed by the Laplace expansion:

det A = Σ_{j=1}^{n} a_ij cof(a_ij) = Σ_{i=1}^{n} a_ij cof(a_ij).

This tells us that determinants can be evaluated along any row or column. This fact

makes it clear that any matrix with a column or row made up of only zeros will have a

zero determinant.
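The Laplace expansion translates directly into a recursive routine; a minimal sketch, expanding along the first row (it does O(n!) work, so practical code uses elimination instead):

def det(A):
    n = len(A)
    if n == 1:
        return A[0][0]
    total = 0
    for j in range(n):
        minor = [row[:j] + row[j+1:] for row in A[1:]]   # delete row 0 and column j
        total += (-1) ** j * A[0][j] * det(minor)        # a_0j * cof(a_0j)
    return total

print(det([[1, 2, 3],
           [4, 5, 6],
           [7, 8, 10]]))   # -3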

The adjugate matrix of A is the transpose of the cofactor matrix of A. That is, Adj A = [cof(a_ij)]ᵀ. This matrix is important in calculating the inverse of a matrix, since

A⁻¹ = (1/det A) Adj A.

Cramer’s rule allows a square linear system Ax = b to be solved exclusively by

determinants. Let A j denote the matrix formed by replacing column j of A with the

column vector b. Then Cramer’s rule says:

x_j = det A_j / det A.

The Wronskian can be helpful when evaluating functions for linear independence

in a vector space. For functions f₁, f₂, …, f_n of x, we define the Wronskian as the determinant

W[f₁, f₂, …, f_n](x) = | f₁(x)        f₂(x)        ⋯   f_n(x)        |
                       | f₁′(x)       f₂′(x)       ⋯   f_n′(x)       |
                       | ⋮            ⋮                 ⋮            |
                       | f₁⁽ⁿ⁻¹⁾(x)   f₂⁽ⁿ⁻¹⁾(x)   ⋯   f_n⁽ⁿ⁻¹⁾(x)   |.

If the Wronskian is nonzero, the functions are linearly independent. A zero Wronskian,

however, does not guarantee linear dependence.

Linear Transformations

For vector spaces V and W, a linear transformation (or linear map) is a function T : V → W such that T(x₁ + x₂) = T(x₁) + T(x₂) and T(kx) = kT(x) for any vectors x, x₁, and x₂ in V and any scalar k. If W = V, then T is called a linear operator. There is a connection to matrices in that, for an m × n matrix A, defining a function T : ℝⁿ → ℝᵐ by T(x) = Ax gives a linear transformation. Conversely, if T : ℝⁿ → ℝᵐ is a linear transformation, there exists an m × n matrix A such that T(x) = Ax for all x in ℝⁿ. In the case that ℝⁿ and ℝᵐ are considered with their standard bases, B = {ê_i}, in which there is a 1 in position i and 0 elsewhere, column i of A is simply the image of ê_i. In this case, A is called the standard matrix for T.

Let T : ℝⁿ → ℝᵐ be a linear transformation. The set of all vectors x in ℝⁿ that get mapped to 0 is called the kernel of T: ker T = {x : T(x) = 0}. It is a subspace of the domain space, and its dimension is called the nullity of T. The set of all images of T is called the range of T: R(T) = {T(x) : x ∈ ℝⁿ}. This is a subspace of ℝᵐ, and its dimension is called the rank of T. If T is given by T(x) = Ax, then the kernel of T is the same as the nullspace of A, and the range of T is the same as the column space of A. The rank plus nullity theorem states that the sum of the nullity and rank equals the dimension of the domain. So for T above, dim(ker T) + dim R(T) = n.

Let T : ℝⁿ → ℝⁿ be a linear operator. If T is bijective, then T has an inverse, T⁻¹, defined so that T⁻¹(y) = x if T(x) = y. If A is the matrix representation of T, then A⁻¹ is the representative of T⁻¹. Further, if the matrix A represents the transformation S : ℝᵐ → ℝⁿ and B represents the transformation T : ℝⁿ → ℝᵖ, then the matrix product BA represents the composition T ∘ S.

Eigenvalues and Eigenvectors

In general, multiplying a square matrix A by a compatible nonzero vector x does not

produce a scalar multiple of x. However, if it happens that Ax = λ x for some scalar λ ,

then λ is called an eigenvalue (or characteristic value) of A, and the nonzero vector x

is called an eigenvector (or characteristic vector). Eigenvalues and eigenvectors are

defined only for square matrices. To find these values, we must find scalars and vectors

that satisfy the equation Ax = λ x , which can be rewritten as ( A − λ I ) x = 0 . This will

only be nontrivially soluble if A − λI is noninvertible, which means that

det ( A − λ I ) = 0 . Calculation of this determinant produces the characteristic equation.

For 2 × 2 square matrices, this is a quadratic that must be solved for λ . Once the

eigenvalues are found, each is substituted into the equation ( A − λ I ) x = 0 and solved for

the corresponding eigenvector (which may be a family of vectors due to possible free

variables). It is an interesting property that ∑ λ = tr( A) , where tr( A) is the trace of the

matrix, the sum of its diagonal entries. Also, ∏ λ = det A . (In both cases, it is

understood that the sum and product are over all eigenvalues for A.)

The eigenspace of A corresponding to λ , Eλ ( A) = {x : Ax = λ x} , is the collection

of all eigenvectors corresponding to λ along with the zero vector. Eλ ( A) = N ( A − λ I ) .

The Cayley-Hamilton theorem states that every square matrix satisfies its own

characteristic equation. This is useful for expressing an integer power of a matrix A as a linear polynomial in A. That is, if det(A − λI) = Σ_{i=0}^{n} α_i λ^i = 0, then Σ_{i=0}^{n} α_i A^i = 0, where we consider A⁰ = I.
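A minimal numerical sketch for an arbitrary 2 × 2 example, assuming NumPy is available: the eigenvalues come from λ² − tr(A)λ + det(A) = 0, and the trace/determinant and Cayley-Hamilton identities all check out.

import numpy as np

A = np.array([[4.0, 1.0],
              [2.0, 3.0]])
eigvals = np.linalg.eigvals(A)                          # [5, 2]
assert abs(eigvals.sum() - np.trace(A)) < 1e-9          # Σλ = tr(A) = 7
assert abs(eigvals.prod() - np.linalg.det(A)) < 1e-9    # Πλ = det(A) = 10

# Cayley-Hamilton for the 2x2 case: A² - tr(A)·A + det(A)·I = 0
CH = A @ A - np.trace(A) * A + np.linalg.det(A) * np.eye(2)
assert np.allclose(CH, 0)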

Number Theory

If a and b are positive integers, the division algorithm says that we can find unique

integers q and r such that b = qa + r , with 0 ≤ r < a . This algorithm is employed in

sequence to find the greatest common divisor of two integers in the Euclidean

algorithm: Given two numbers, a and b (where we’ll assume a > b ), we know we can

find a quotient and remainder q1 and r1 such that a = q1b + r1 . We then apply the division

algorithm to the divisor and the first remainder: b = q2 r1 + r2 . We continue this process

until there is no remainder (i.e., r_{n−2} = q_n r_{n−1} + r_n, where r_n = 0). The divisor in the step

that yields no remainder is the greatest common divisor, or gcd, of the two numbers. If

gcd( a, b) = 1 , then a and b are said to be relatively prime. The product of the greatest

common divisor and the least common multiple of two numbers is equal to the product of

the two numbers: [ gcd(a, b)] ⋅ [ lcm(a, b)] = ab .

The Diophantine equation ax + by = c has integral solutions if and only if

gcd(a, b) | c, in which case there are infinitely many. Any one solution (x₁, y₁) ∈ ℤ² reveals the others:

x = x₁ + (b / gcd(a, b))·t

y = y₁ − (a / gcd(a, b))·t

for any t ∈ ℤ.

The greatest common divisor of any two numbers can always be written as a

linear combination of those two numbers. This is done by working backward through the

Euclidean algorithm and substituting in steps that lead to the original numbers.
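Working backward through the Euclidean algorithm is exactly what the extended Euclidean algorithm automates; a minimal recursive sketch in Python:

    def extended_gcd(a, b):
        # Returns (g, x, y) with a*x + b*y = g = gcd(a, b).
        if b == 0:
            return a, 1, 0
        g, x, y = extended_gcd(b, a % b)
        return g, y, x - (a // b) * y

    g, x, y = extended_gcd(252, 105)
    assert g == 21 and 252 * x + 105 * y == 21

The pair (x, y) also supplies one solution of the Diophantine equation ax + by = c whenever g | c: scale (x, y) by c/g and generate the rest with the parameterization above.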

Congruence

For integers a, b, and n, we say that a is congruent to b modulo n if a − b is divisible by

n. That is, a ≡ b mod n if n | ( a − b) .

• If ab ≡ ac mod n, then

  o b ≡ c mod n if gcd(a, n) = 1

  o b ≡ c mod (n / gcd(a, n)) if gcd(a, n) > 1

• Fermat's little theorem states that if p is a prime and a is an integer:

  o a^(p−1) ≡ 1 mod p if p does not divide a

  o a^p ≡ a mod p for any integer a

The linear congruence equation, ax ≡ b mod n, has a solution if and only if gcd(a, n)

divides b. If gcd(a, n) = 1, the solution is unique mod n; if gcd(a, n) > 1, the solution is

unique mod n / gcd(a, n).
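A linear congruence solver follows directly from the extended Euclidean algorithm sketched earlier (Python; extended_gcd as defined above):

    def solve_congruence(a, b, n):
        # Solve a·x ≡ b (mod n); returns (x, modulus) or None if unsolvable.
        g, x0, _ = extended_gcd(a, n)
        if b % g != 0:
            return None                      # solvable only when gcd(a, n) | b
        m = n // g
        return (x0 * (b // g)) % m, m        # unique solution mod n/gcd(a, n)

    assert solve_congruence(6, 4, 10) == (4, 5)   # 6·4 = 24 ≡ 4 (mod 10)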

Abstract Algebra

Binary Structures and Groups

Let S be a nonempty set. A function f : S × S → S defined on every ordered pair of

elements of S to give a result that is also in S is called a binary operation on S. Binary

operations are typically written showing the set and the operation. In the case of f above,

we might write (S, ∗).

A binary operation, ∗, is said to be associative if, for every a, b, c ∈ S, the

following equation always holds: a∗(b∗c) = (a∗b)∗c. A binary structure whose binary

operation is associative is called a semigroup.

Given a binary structure, (S, ∗), an element e ∈ S with the property a∗e = e∗a = a

for every a ∈ S is called the identity. A semigroup with an identity is called a monoid.

Let (S, ∗) be a monoid, and let a be an element in S. If there is an element a⁻¹ ∈ S

such that a∗a⁻¹ = a⁻¹∗a = e, we call a⁻¹ the inverse of a. A monoid with the property that

every element in S has an inverse is called a group.

If the binary operation of a group (S, ∗) has the property that a∗b = b∗a for every

a, b ∈ S, the binary operation is commutative. A semigroup, monoid, or group whose

binary operation is commutative is said to be abelian.

If the group (S, ∗) contains precisely n elements for some positive integer n, then

the group is finite of order n. Otherwise, we say the group is infinite.

For any group, the inverse of a product reverses the order: (x∗y)⁻¹ = y⁻¹∗x⁻¹.

Cyclic Groups

A group G with the property that there exists an element a ∈ G such that

G = {a^n : n ∈ ℤ} is said to be cyclic, and the element a is called a generator of

the group. (We clarify here that a⁰ is the identity, a¹ = a, a² = a∗a, etc., and

a^(−n) = (a⁻¹)^n.) A cyclic group has at least one generator.

Consider the set ℤ_n = {0, 1, 2, …, n − 1} and the group (ℤ_n, ⊕), the group whose

binary operation is addition modulo n. The integer m is a generator of (ℤ_n, ⊕) if and

only if m is relatively prime to n. In more general terms, let G be a cyclic group with

generator a, and let n be the smallest positive integer such that a^n = e. Then the element a^m is a

generator of G if and only if m is relatively prime to n.
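A brute-force Python sketch confirms that the generators of (ℤ_n, ⊕) are exactly the residues coprime to n (n = 12 is an arbitrary example):

    from math import gcd

    def generators_mod_n(n):
        # m generates (Z_n, ⊕) iff its multiples hit every residue class.
        return [m for m in range(1, n)
                if len({(m * k) % n for k in range(n)}) == n]

    n = 12
    assert generators_mod_n(n) == [m for m in range(1, n) if gcd(m, n) == 1]
    # -> [1, 5, 7, 11]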

Subgroups

Let (G, ∗) be a group. If there's a subset H ⊆ G such that (H, ∗) is also a group, then H

is a subgroup of G, and we write H ≤ G. (If H ≠ G, we can write H < G to denote H

as a proper subgroup of G.) Every group has at least two subgroups: the trivial

subgroup, {e} , consisting of just the identity element, and the group itself.

Let a be an element of a group G. With the binary operation defined on G, the set

{a^n : n ∈ ℤ}, also denoted by ⟨a⟩, is a subgroup of G called the cyclic subgroup

generated by a. It consists of all the integer powers of a. In the case that n is a negative

integer, we consider a⁻¹ to be the inverse of a, so a^(−n) = a⁻¹∗a⁻¹∗…∗a⁻¹ (n times).

We can define the order of a group element as follows: the order of a ∈ G is the

order of ⟨a⟩, the cyclic subgroup generated by a. Equivalently, the order of an element

is the smallest positive integer n such that a^n = e, where e is the identity. The cyclic subgroup

generated by a is the smallest subgroup of G containing a. More generally, if a_i ∈ G for

every i in some indexing set I, then the subgroup generated by {a_i} is the subgroup

consisting of all finite products of terms of the form a_i^(n_i) and is the smallest subgroup of G

containing all the elements a_i. If the subgroup is all of G, then we say that G is

generated by {a_i} and that the elements a_i are generators of G.

• Lagrange's theorem: Let G be a finite group. If H is a subgroup of G, then

the order of H divides the order of G.

• Let G be a finite, abelian group of order n. Then G has at least one subgroup

of order d for every positive divisor d of n.

• Let G be a finite, cyclic group of order n. Then G has exactly one subgroup—

a cyclic subgroup—of order d for every positive divisor d of n. If G is

generated by a, then the subgroup generated by the element b = a^m has order

d = n / gcd(m, n). (If m = 0, we say that gcd(m, n) = n.)

• Cauchy's theorem: Let G be a finite group of order n, and let p be a prime

that divides n. Then G has at least one subgroup of order p.

• Sylow's first theorem: Let G be a finite group of order n, and let n = p^k·m,

where p is a prime that does not divide m. Then G has at least one subgroup of

order p^i for every i ∈ [0, k] ⊂ ℤ.

Isomorphisms

Consider the multiplication table for the three-element group below:

∗ e a b

e e a b

a a b e

b b e a

We compare that to the table for the group ℤ₃:

⊕ 0 1 2

0 0 1 2

1 1 2 0

2 2 0 1

We notice that, although the symbols are different, the structure is exactly the same.

Structurally identical groups are said to be isomorphic. We denote an isomorphism

between two groups G1 and G2 by G1 ≅ G2 . Isomorphic groups share identical structural

properties, so if we know details about one group and also know that it is isomorphic to

another group, we know the other group’s details automatically. These details include

the order of the group, the number of subgroups of a particular order, whether it is cyclic

or abelian, etc.

The Classification of Finite Abelian Groups

Let (G₁, ∗₁) and (G₂, ∗₂) be groups. On the set G₁ × G₂ = {(a, b) : a ∈ G₁ ∧ b ∈ G₂}, define

a binary operation, ∗, such that (a₁, a₂)∗(b₁, b₂) = (a₁ ∗₁ b₁, a₂ ∗₂ b₂). Then (G₁ × G₂, ∗) is a

group, called the direct product of the groups G₁ and G₂. If G₁ and G₂ are finite, and

G₁ has order m and G₂ has order n, then G₁ × G₂ has order mn. If G₁ and G₂ are both

abelian, then the notation G₁ ⊕ G₂ is sometimes used, and the resulting abelian group is

called the direct sum of G₁ and G₂. This definition can be generalized to any number of

groups, not just two.

• The direct sum ℤ_{m_1} ⊕ ℤ_{m_2} ⊕ … ⊕ ℤ_{m_k} is cyclic if and only if gcd(m_i, m_j) = 1

for every distinct pair m_i and m_j, in which case ℤ_{m_1} ⊕ ℤ_{m_2} ⊕ … ⊕ ℤ_{m_k} is

isomorphic to ℤ_{m_1·m_2·⋯·m_k}.

• Every finite abelian group G is isomorphic to a direct sum of the form

ℤ_{(p_1)^{k_1}} ⊕ ℤ_{(p_2)^{k_2}} ⊕ … ⊕ ℤ_{(p_r)^{k_r}}, where the p_i are (not necessarily distinct) primes,

and the k_i are (not necessarily distinct) positive integers. The collection of

prime powers, (p_i)^{k_i}, for a given representation of G are known as the

elementary divisors of G.

• Every finite abelian group G is isomorphic to a direct sum of the form

ℤ_{m_1} ⊕ ℤ_{m_2} ⊕ … ⊕ ℤ_{m_t}, where m_1 ≥ 2 and m_i | m_{i+1} for every i = 1, …, t − 1. The

integers m_i are not necessarily distinct, but the list m_1, …, m_t is unique. These

integers are called the invariant factors of G.
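For example, up to isomorphism the abelian groups of order 12 are ℤ₄ ⊕ ℤ₃ ≅ ℤ₁₂ (elementary divisors 4, 3; invariant factor 12) and ℤ₂ ⊕ ℤ₂ ⊕ ℤ₃ ≅ ℤ₂ ⊕ ℤ₆ (elementary divisors 2, 2, 3; invariant factors 2, 6).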

Group Homomorphisms

Let (G, ∘) and (G′, ∗) be groups. A function φ : G → G′ with the property that

φ(a ∘ b) = φ(a) ∗ φ(b) for all elements a, b ∈ G is called a group homomorphism. An

injective (one-to-one) homomorphism is called a monomorphism; a surjective (onto)

homomorphism is called an epimorphism; a bijective (one-to-one and onto)

homomorphism is called an isomorphism. We recall the earlier fact that two groups that

are structurally identical are isomorphic; this is true if and only if there exists a bijective

homomorphism between the two groups. A homomorphism from a group to itself is

called an endomorphism, and an isomorphism from a group to itself is called an

automorphism.

For any group homomorphism φ : G → G ′ :

• If e is the identity in G, then φ (e) is the identity in G′ .

• If g ∈ G has finite order m, then the order of φ(g) ∈ G′ divides m.

• If a −1 is the inverse of a in G, then φ (a −1 ) is the inverse of φ (a ) in G′ .

• If H is a subgroup of G, then φ ( H ) is a subgroup of G′ , where

φ ( H ) = {φ (h) : h ∈ H } .

• If G is finite, then the order of φ (G ) divides the order of G; if G′ is finite, then

the order of φ (G ) also divides the order of G′ .

• If H ′ is a subgroup of G′ , then φ −1 ( H ′) is a subgroup of G, where

φ −1 ( H ′) = {h ∈ G : φ (h) ∈ H ′} .

• If φ : G → G ′ is a homomorphism of groups, then {e′} , where e′ is the identity in

G′ , is the trivial subgroup of G′ . The inverse image of {e′} is a subgroup of G.

This subgroup is the kernel of φ , which is defined by ker φ = { g ∈ G : φ ( g ) = e′} .

A homomorphism is a monomorphism if and only if its kernel is trivial.

Rings

A set R, together with two binary operations (we'll choose addition, +, and multiplication,

·, here), is called a ring if the following conditions are satisfied:

• (R, +) is an abelian group;

• (R, ·) is a semigroup;

• The distributive laws hold; namely, for every a, b, c ∈ R, we have:

a·(b + c) = a·b + a·c and (a + b)·c = a·c + b·c.

If the multiplicative semigroup (R, ·) is a monoid (that is, if it has a multiplicative

identity), then R is called a ring with unity. If the operation of multiplication is

commutative, then R is a commutative ring. For S ⊆ R satisfying the ring

requirements, we call (S, +, ·) a subring of (R, +, ·). The characteristic of a ring is the

smallest positive integer n such that na = 0 for every a ∈ R. If no such n exists, as in the case of

the infinite rings ℤ, ℚ, ℝ, and ℂ, the ring is said to have characteristic zero. For cases

when char R > 0, it is sufficient to check for the smallest n such that n·1 = 0.

Ring Homomorphisms

Let ( R , +, × ) and ( R′, ⊕, ⊗ ) be rings. A function φ : R → R′ is called a ring

homomorphism if both of the following conditions hold for all a, b ∈ R :

φ(a + b) = φ(a) ⊕ φ(b)

φ(a × b) = φ(a) ⊗ φ(b).

For any ring homomorphism φ : R → R′ :

• The kernel of a ring homomorphism is the set ker φ = {a ∈ R : φ (a) = 0′} , where 0′

is the additive identity in R ′ . The kernel of a ring homomorphism φ : R → R′ is a

subring of R.

• The image of R, φ(R) = {φ(r) : r ∈ R}, is a subring of R′.

• The image of 0, the additive identity in R, must be 0′ , the additive identity in R ′ .

It follows from this that φ ( − r ) = −φ (r ) for all r ∈ R , where −r is the additive

inverse of r in R.

Fields

Let a be a nonzero element of a ring R with unity. Recall that the multiplicative structure

(R, ·) is not required to be a group, so a may not have an inverse. If it does have a

multiplicative inverse, then a is called a unit. If every nonzero element of R is a unit—

namely, if (R*, ·) is a group, where R* = R − {0}—then R is called a division ring. A

commutative division ring is called a field.

Additional Topics

Set Theory

• The symmetric difference of two sets is written as A∆B = ( A − B ) ∪ ( B − A) .

• DeMorgan's laws say that:

(A ∩ B)^C = A^C ∪ B^C

(A ∪ B)^C = A^C ∩ B^C

• Other useful identities include:

(A ∪ B) − C = (A − C) ∪ (B − C)

(A ∩ B) − C = (A − C) ∩ (B − C)

A ∩ (B ∪ C) = (A ∩ B) ∪ (A ∩ C)

A ∪ (B ∩ C) = (A ∪ B) ∩ (A ∪ C)

Two sets are equivalent if there exists a bijection between them. If there exists a

bijection between a set and the positive integers, ℤ⁺, we say that the set is countably

infinite with cardinality ℵ₀. Some sets are uncountably infinite, meaning that no

bijection exists between them and the positive integers. The cardinality of the reals is

called the cardinality of the continuum, denoted sometimes as 2^ℵ₀, since ℝ ≈ ℘(ℤ⁺).

(The power set of A, ℘(A), is the set of all subsets of A; it has cardinality 2^card A.)

Combinatorics

For k objects, there are k! possible arrangements of their order, each called a permutation.

For k objects chosen from n total objects, there are P(n, k) possible arrangements, where

P(n, k) = n! / (n − k)!.

If order is not important (as in the drawing of a lottery or a hand of cards), the

possibilities are referred to as combinations, and there are C (n, k ) possibilities, where

C(n, k) = P(n, k) / k! = n! / [(n − k)! k!].

We note that the combination C (n, k ) is equal to the binomial coefficient, which comes

from the binomial theorem:

(a + b)^n = Σ_{k=0}^n C(n, k) a^(n−k) b^k.

When repetition is allowed, P_r(n, k) = n^k, and C_r(n, k) = C(n + k − 1, k).
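These counting formulas are all available in (or easily built from) Python's math module; a quick sketch with the arbitrary values n = 10, k = 3:

    from math import comb, factorial, perm

    n, k = 10, 3
    assert perm(n, k) == factorial(n) // factorial(n - k)   # P(n, k) = 720
    assert comb(n, k) == perm(n, k) // factorial(k)         # C(n, k) = 120
    assert n ** k == 1000                                   # P_r(n, k)
    assert comb(n + k - 1, k) == 220                        # C_r(n, k)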

A generalized statement of the pigeonhole principle says that if you are given n

different objects, each of which is painted one of c different colors, then for any integer

k ≤ ⌊(n − 1)/c⌋ + 1, there are at least k objects painted the same color.

Probability and Statistics

Let S be a nonempty set. A Boolean algebra (or simply an algebra) of sets on S is a

nonempty subfamily of the power set of S, E ⊆℘( S ) , that satisfies the following two

conditions:

1. If A and B are sets in E , then so are A ∪ B and A ∩ B .

2. If A ∈ E, then so is A^C = S − A.

By definition, this set is nonempty, so it contains some set A, and by the second condition

listed above, it also contains the complement of A. By the first condition above, it also

contains the union of these sets, which means it contains all of S (and, necessarily, the

empty set).

Let E be an algebra of sets on S. A function P : E → [ 0,1] is called a probability

measure on E (or S if E is understood) if all of the following conditions are met:

1. P (∅ ) = 0 .

2. P ( S ) = 1 .

3. P ( A ∪ B ) = P ( A) + P ( B ) for every pair of disjoint sets A and B in E .

A probability space is a set S together with an algebra of sets on S and a probability

measure on S. The set S is called the sample space, the elements of S are called

outcomes, and the sets in E (which are subsets of S) are called events. With this in

mind, P ( A) is interpreted to be the probability that event A occurs. Two events, A and B,

are considered mutually exclusive if P ( A ∩ B ) = 0 .

• P ( A ∪ B ) = P ( A) + P ( B ) − P ( A ∩ B ) .

• A and B are independent if and only if P(A ∩ B) = P(A)·P(B).

A Bernoulli trial is an experiment in which there are only two possible outcomes,

often termed a success or a failure. The probability of exactly k successes in n such trials

is given by an application of the binomial theorem:

P(k; n, p) = C(n, k) p^k q^(n−k),

where p is the probability of a success on any given trial and q = 1 − p is the probability

of failure. The distribution of all possibilities for k ∈ [0, n] ⊂ ℤ is called the binomial

distribution.

Let (S, E, P) be a probability space. A function X : S → ℝ is called a random

variable. To each outcome ω ∈ S, the function assigns some real number, X(ω). If we

consider the subset {ω : X (ω ) ≤ t} ⊂ S and assume it to be a member of the family E ,

this subset is an event. We can associate a function to the random variable such that

FX (t ) = P ({ω : X (ω ) ≤ t}) , and we call FX the distribution function of X. We can

abbreviate this by writing FX (t ) = P ( X ≤ t ) . This function gives the probability that the

random variable will take on a value no greater than t. We can also calculate the

probability that the random variable will be in a certain interval, ( t1 , t2 ] , by

P(t1 < X ≤ t2 ) = P( X ≤ t2 ) − P ( X ≤ t1 ) = FX (t2 ) − FX (t1 ) . This fact is arrived at by noting

that the events X ≤ t1 and t1 < X ≤ t2 are disjoint and considering their union, X ≤ t2 . An

algebraic rearrangement of the probabilities gives the expression above. To close or open

the interval we're considering, we need only to add or subtract (respectively) the

probability at the endpoint (e.g., P(t₁ ≤ X ≤ t₂) = P(X ≤ t₂) − P(X ≤ t₁) + P(X = t₁)).

Random variables are often defined so that the probability that X will equal any

particular value is zero. Meaningful results only come from considering an interval that

X can fall into. Such a random variable (and its distribution function) is called

continuous. The derivative of the distribution function, f_X(t) = F_X′(t), is called the

probability density function of X. We require this function to be nonnegative and

integrable, so that F_X(t₂) − F_X(t₁) = ∫_{t₁}^{t₂} f_X(t) dt. It is also required that

∫_{−∞}^{∞} f_X(t) dt = 1. We also have the following result:

P(X ≤ t) = F_X(t) = ∫_{−∞}^{t} f_X(x) dx.

If X is a continuous random variable, we can calculate the expectation (or mean)

of X by:

E(X) = µ(X) = ∫_{−∞}^{∞} t·f_X(t) dt.

The variance of X is given by:

Var(X) = σ²(X) = ∫_{−∞}^{∞} [t − µ(X)]² f_X(t) dt.

The standard deviation is equal to the square root of the variance.

A random variable is said to be normally distributed if its probability density

function has the form:

f_X(t) = (1 / (σ√(2π))) · exp(−(t − µ)² / (2σ²)).

This density has no elementary antiderivative and must be integrated numerically; it is often useful to consult tables for the

values of interest. These tables often make use of a change of variable: u = (t − µ)/σ.

When the mean is set to zero and the standard deviation is set to one, we get the

standard normal probability density for the standardized normal random variable Z:

f_Z(u) = (1 / √(2π)) · exp(−u² / 2),

which is related to the original random variable by the equation:

P(t₁ < X ≤ t₂) = P((t₁ − µ)/σ < Z ≤ (t₂ − µ)/σ).

The integral of f Z (u ) gives the standard normal probability distribution, which is

commonly denoted by Φ :

Φ(z) = ∫_{−∞}^{z} (1/√(2π)) e^{−u²/2} du.

For two extended real numbers z1 < z2 , we have:

P( z1 < Z ≤ z2 ) = Φ( z2 ) − Φ ( z1 ) .

A brief table of values is given below.

z 0 0.5 1 1.5 2 2.5 3+

Φ(z) 0.5 0.69 0.84 0.93 0.97 0.99 ≈1

(For negative values of z, we note that Φ ( − z ) = 1 − Φ ( z ) .)

Returning briefly to the binomial distribution, we can compute

P(a₁ ≤ X ≤ a₂) = Σ_{k=a₁}^{a₂} C(n, k) p^k q^(n−k),

but this can be unwieldy for large n. With µ = np and σ = √(npq), we can obtain an

approximation of the binomial distribution with a normal distribution, where

P(a₁ ≤ X ≤ a₂) ≈ Φ((a₂ − µ + ½)/σ) − Φ((a₁ − µ − ½)/σ).
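A comparison of the exact binomial sum against the continuity-corrected normal approximation (a sketch using only the Python standard library; NormalDist supplies Φ, and the parameters are arbitrary):

    from math import comb, sqrt
    from statistics import NormalDist

    n, p, a1, a2 = 100, 0.5, 45, 55
    q = 1 - p

    exact = sum(comb(n, k) * p**k * q**(n - k) for k in range(a1, a2 + 1))

    mu, sigma = n * p, sqrt(n * p * q)
    Phi = NormalDist().cdf
    approx = Phi((a2 - mu + 0.5) / sigma) - Phi((a1 - mu - 0.5) / sigma)

    print(round(exact, 4), round(approx, 4))   # both ≈ 0.7287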

Point-Set Topology

Let X be a nonempty set. A topology on X, denoted by T , is a family of subsets of X,

for which the following three properties always hold:

1. ∅ and X are in T .

2. If O1 and O2 are in T , then so is their intersection, O1 ∩ O2 .

3. If {O_i}_{i∈I} is any collection of sets from T, then their union, ∪_{i∈I} O_i, is also in T.

In sum, a topology is always closed under finite intersections and arbitrary unions. The

sets in T are known as open sets. A set is open if all of its elements are interior points,

which means that every point p ∈ O is contained in an open set S_p that is itself

contained in O. That is, p ∈ S_p ⊆ O. A point p is called an accumulation point of

a set A if every open set G containing p contains a point of A different from p.

A set X together with a topology T is called a topological space, ( X , T) . A

Hausdorff space is a topological space such that for every pair of distinct points

x, y ∈ X there exist disjoint open sets Ox and Oy such that x ∈ Ox and y ∈ Oy .

For any nonempty set X, the collection {∅, X } is always a topology on X, called

the indiscrete (or trivial) topology. The power set of X, ℘( X ) , is also always a

topology on X, and it is referred to as the discrete topology. If T1 and T2 are topologies

on X , we say that T1 is finer than T2 (or that T2 is coarser than T1 ) if T1 ⊇ T2 . We

know that ℘( X ) ⊃ {∅, X } , and we say that these collections represent the extremes:

℘(X) is the finest possible topology on X, and {∅, X} is the coarsest possible topology

on X. For T₁ ⊇ T₂, we can also write T₁ ≥ T₂ to indicate fineness and coarseness.

Subspace Topology

If (X, T) is a topological space, we can use T to define a topology on a subset

S ⊂ X. A subset U ⊂ S is said to be open in S if U = O ∩ S for some set O that is open

in X. It is important to point out that U need not be open in X for this definition to apply.

Consider the sets O = (−b, b) and S = [−a, a], both in ℝ, with a < b. Then

U = O ∩ S = S = [−a, a] is said to be open in S, since O is open in ℝ, even though U is

not open in ℝ. In this way, we can define the subspace (or relative) topology, T_S, with

S a subspace of X. Written more compactly, T_S = {S ∩ U : U ∈ T}.

Interior, Exterior, Boundary, Limit Points, and Closure

Let (X, T) be a topological space, and let A be a subset—not necessarily open—

of X. The interior of A, denoted int(A), is the union of all open sets contained in A, or,

equivalently, the largest open set contained in A. The exterior of A, denoted ext(A), is

the union of all open sets that do not intersect A; equivalently, we can say that

ext(A) = int(A^C). The boundary of A, denoted bd(A), is the set of all x ∈ X such that

every open set containing x intersects both A and the complement of A; equivalently,

bd(A) = (int(A) ∪ ext(A))^C. A limit point of A is an accumulation point, which is

defined above. The set of all the limit points of A is called the derived set of A, denoted

A′. The closure of A, denoted cl(A), can be defined in two equivalent ways:

cl(A) = int(A) ∪ bd(A) = A ∪ A′. A perhaps more useful definition says that the closure

of A is the intersection of all closed supersets of A, that is, the intersection of the

complements of all open sets that are disjoint from A.

A set is closed if and only if it contains all of its boundary and limit points;

equivalently, A is closed if A = cl( A) . Also, a set is closed if and only if its complement

is open; a set is open if and only if its complement is closed. A set A is open if and only

if A = int( A) . It is possible for sets to be simultaneously open and closed. (Consider the

discrete topology, made up of the power set, which contains every possible subset, all of

which are considered open. But since the complement of every open subset is also

contained in the power set, every subset is both open and closed.)
Basis for a Topology

Let X be a nonempty set, and let B be a collection of subsets of X satisfying the

following properties:

1. For every x ∈ X , there is at least one set B ∈B such that x ∈ B .

2. If B1 and B2 are sets in B and x ∈ B1 ∩ B2 , then there exists a set B3 ∈ B such that

x ∈ B3 ⊆ B1 ∩ B2 .

The collection B is called a basis, and the sets in B are known as basis elements. A

basis is used to generate a topology in that the topology consists of all possible unions of

the basis elements. A subset O ⊂ X belongs to the topology T generated by B if, for

every x ∈ O , there exists a basis element B such that x ∈ B ⊆ O .

If ( X , TX ) and (Y , TY ) are topological spaces, we can define a product topology

on the Cartesian product X ×Y by setting up the following standard basis:

B = {OX × OY : OX ∈ TX ∧ OY ∈ TY } .

Connectedness

Let (X, T) be a topological space. If there exist disjoint, nonempty open sets O₁

and O₂ such that O₁ ∪ O₂ = X, then X is said to be disconnected. If, however, T

contains no pair of disjoint, nonempty sets whose union is X, then X is said to be a connected space.

The following criteria can be used to identify connected spaces:

1. If A and B are connected and they intersect, then their union is also connected.

This holds for any number of connected sets, as long as their intersection is

nonempty.

2. Let A be a connected set, and let B be any set such that A ⊆ B ⊆ cl( A) . Then B is

connected.

3. The Cartesian product of connected spaces is connected.

4. Let X be a topological space with the property that any two points, x1 , x2 ∈ X can

be joined by a continuous path. Then X is said to be path-connected, and any

path-connected space is connected. The converse of this statement is not true.

Compactness
Let ( X , T ) be a topological space. A covering of X is a collection of subsets of X

whose union is X. An open covering is a covering consisting entirely of open sets. If

every open covering of X contains a finite sub-collection that also covers X, then X is said

to be a compact space. Compact spaces can be identified using the following criteria:

1. Let X be a compact topological space, and let S be a subset of X. If S is closed,

then it’s compact. The converse is true if X is Hausdorff.

2. The Cartesian product of compact spaces is compact.


3. The Heine-Borel theorem: A subset of ℝ^n is compact if and only if it's both

closed and bounded.


(A subset A ⊂ ℝ^n is bounded if there exists some positive number M such that the norm

of every point x ∈ A is less than M. That is, for x = (x₁, x₂, …, x_n),

‖x‖ = √((x₁)² + (x₂)² + … + (x_n)²) < M.)

Metric Spaces

Let X be a nonempty set, and let d : X × X → ℝ be a real-valued function defined

on ordered pairs of points in X. The function d is said to be a metric on X if the

following properties hold for all x, y , z ∈ X :

1. d ( x, y ) ≥ 0 , and d ( x, y ) = 0 if and only if x = y .

2. d ( x, y ) = d ( y , x) .

3. d ( x, z ) ≤ d ( x, y ) + d ( y , z ) .

We call d ( x, y ) the distance between x and y. A set X together with a metric on X is

called a metric space. For ε > 0 , the set Bd ( x, ε ) = { x′ ∈ X : d ( x, x′) < ε } is called an ε-

ball, the open ball of radius ε centered on x. The collection of all ε-balls,

B = { Bd ( x, ε ) : x ∈ X ∧ ε > 0} , is a basis for a topology on X, called the metric topology

(induced by d). In this topology, a subset O ⊂ X is open if and only if, for every x ∈ O,

there exists a positive number ε_x such that B_d(x, ε_x) ⊆ O, which says that every point in

the set must be the center of an ε-ball.

Continuity

We recall the definition of a continuous function, which says that a function f is

continuous at the point x0 if, for every ε > 0 , there exists a number δ > 0 such that

|x − x₀| < δ ⇒ |f(x) − f(x₀)| < ε. We can generalize this to metric spaces by saying the

following: Let ( X1 , d1 ) and ( X 2 , d2 ) be metric spaces. A function f : X 1 → X 2 is

continuous at the point x0 if, for every ε > 0 , there exists a number δ > 0 such that

d1 ( x, x0 ) < δ ⇒ d 2 ( f ( x), f ( x0 ) ) < ε . The function f is simply called continuous if it is

continuous at every x0 ∈ X 1 . We can also say that f : X 1 → X 2 is continuous at the point

x0 if for every open set O containing f ( x0 ) , the inverse image f −1 (O ) is an open set

containing x₀. This is equivalent to saying f⁻¹(B_{d₂}(f(x₀), ε)) ⊇ B_{d₁}(x₀, δ) for suitable

ε , δ > 0 . Finally, for topological spaces ( X 1 , T1 ) and ( X 2 , T2 ) , a function f : X 1 → X 2

is continuous if, for every open set O ⊆ X₂ (i.e., for every O ∈ T₂), the inverse image,

f −1 (O ) is open in X 1 (i.e., f −1 (O ) ∈ T1 ). Moreover, f is continuous at the point x0 if for

every open set O ⊂ X 2 containing f ( x0 ) there is an open set O1 ⊂ X 1 such that

f(O₁) ⊆ O. If the range of f is a topological space generated by a basis, it is sufficient to

check that the inverse images of the basis elements are open in X₁. The following are

some facts about continuous maps between topological spaces, f : ( X 1 , T1 ) → ( X 2 , T2 ) :

1. The set f −1 (C ) is closed in X 1 for every closed subset C ⊂ X 2 . (In fact, this is

the case if and only if the map f is continuous.)

2. If C is a connected subset of X 1 , then f (C ) is a connected subset of X 2 .

3. If C is a compact subset of X 1 , then f (C ) is a compact subset of X 2 .

4. Let f : X → ℝ be a continuous function, with X a compact space. Then there

exist points a, b ∈ X such that f(a) ≤ f(x) ≤ f(b) for every x ∈ X. Any real-

valued function defined on a compact space—particularly a closed, bounded

subset of ℝ^n—achieves an absolute maximum at some point in X and an absolute

minimum at some point in X.

A map f : X 1 → X 2 is said to be an open map if the image of every open set in X 1 is

open in X 2 . (Note the difference between this and the definition of continuity.) If

f : X 1 → X 2 is bijective and both f and f −1 are continuous (meaning that f is a

continuous, open map), then f is called a homeomorphism. This is analogous to the

concept of an isomorphism between groups or rings. That is, if an isomorphism exists

between two algebraic structures, that structure is preserved, and they are algebraically

identical. Homeomorphic topological spaces are topologically identical, since their

topological structure is preserved. For example, a homeomorphism (g ∘ f)(x) exists

between the open interval (0, 1) and ℝ, since

(0, 1) → (−π/2, π/2) → ℝ under f(x) = π(x − ½) followed by g(y) = tan y.

Let f : X 1 → X 2 be a bijective, continuous map of topological spaces. If X 1 is

compact and X 2 is Hausdorff, then f is a homeomorphism.

Real Analysis

Consider a set of real numbers X ⊂ ℝ. If X is bounded above, there is a smallest number

u such that u ≥ x for every x ∈ X . This number is the least upper bound of X, also

known as the supremum or sup X . Similarly, if X is bounded below, there is a greatest

number l such that l ≤ x for every x ∈ X . This number is called the greatest lower

bound, also known as the infimum of X or inf X . The supremum and infimum are

unique for any bounded set of real numbers.

A Cauchy sequence is a sequence of numbers, (x_n)_{n=1}^∞, such that for any ε > 0,

no matter how small, there exists a number N such that, for m, n > N, |x_n − x_m| < ε. This

means that the elements of a Cauchy sequence eventually grow arbitrarily close to each

other. Every Cauchy sequence of real numbers converges. Any metric space in which

every Cauchy sequence is guaranteed to converge to a point in the space is called a

every Cauchy sequence is guaranteed to converge to a point in the space is called a

complete space. By this definition, ℝ is complete.

Lebesgue Measure

Let A be any subset of ℝ, and find a countable, open covering of A by intervals

of the form (a_i, b_i), so that A ⊆ ∪_{i=1}^∞ (a_i, b_i). We define a function

µ* : ℘(ℝ) → ℝ ∪ {∞} by the equation:

µ*(A) = inf{ Σ_{i=1}^∞ (b_i − a_i) : A ⊆ ∪_{i=1}^∞ (a_i, b_i) },

where we permit µ*(A) = ∞. We now restrict this function to a subfamily, M ⊂ ℘(ℝ),

where M is defined as follows: a subset M ⊂ ℝ is a member of M if and only if

µ*(A) = µ*(A ∩ M) + µ*(A ∩ M^C) for every A ∈ ℘(ℝ). The sets in M are called

Lebesgue measurable sets, and the restriction of µ* to M is denoted by µ. This

function is called the Lebesgue measure. A set M for which µ(M) = 0 is said to be of

measure zero.

Every open and closed set in ℝ is Lebesgue measurable, and every finite or

countably infinite subset of ℝ is also measurable. The complement of a measurable set

is measurable, and a finite or countably infinite union or intersection of measurable sets is

measurable. Not all of ℘(ℝ) is measurable, but almost all subsets of ℝ that arise in

practice are measurable. Some properties of the measure and measurable sets follow:

1. The empty set is measurable and has measure zero. If M = {m} is a one-element

(or singleton) set, then µ(M) = 0. It follows from this that if M is a finite or

countably infinite subset of ℝ, then µ(M) = 0. For instance, µ(ℤ) = µ(ℚ) = 0.

2. If M = ( a, b ) , [ a, b ) , ( a, b ] , or [ a, b] , then the measure is just the length of the

interval, µ ( M ) = b − a . If M is a finite union of disjoint intervals, then µ ( M ) is

the sum of the lengths of the intervals. If M is a measurable set that contains an

infinite interval, then µ(M) = ∞. In general, if {M_i} is any countable collection

of disjoint, measurable sets, then µ(∪_{i=1}^∞ M_i) = Σ_{i=1}^∞ µ(M_i).

3. If M 1 , M 2 ∈ M and M 1 ⊆ M 2 , then µ ( M 1 ) ≤ µ ( M 2 ) .

A function f : ℝ → ℝ is said to be Lebesgue measurable if, for every open set

O ⊂ ℝ, the inverse image, f⁻¹(O), is a Lebesgue measurable set. Since every open

subset of ℝ is measurable, it follows from our earlier definition of continuity that every

continuous function is Lebesgue measurable. Still, there do exist noncontinuous

functions that are measurable. The sum, difference, and product of measurable functions

are measurable. Also, if f is measurable, then so is |f|.

For any A ⊂ ℝ, we can define a function on ℝ, the characteristic function of A:

χ_A(x) = 1 if x ∈ A, and χ_A(x) = 0 if x ∉ A.

This function is measurable if and only if A is a measurable set. Using this function, we

can construct another function, the step function, which is a finite linear combination of

characteristic functions with real coefficients. Such a function typically takes the form

s(x) = Σ_{i=1}^n a_i χ_{A_i}(x). Step functions provide the foundation for constructing the Lebesgue

integral of a general function.

Let f be a measurable function such that f(x) ≥ 0 for every x ∈ ℝ. Then there

exists a sequence of step functions, ( s1 , s2 ,…) , such that 0 ≤ s1 ≤ s2 ≤ … and for which

lim n →∞ sn ( x) = f ( x) .

Let s = Σ_{i=1}^n a_i χ_{A_i} be a step function such that every set A_i is measurable. Then s is

a measurable function, and we say that s is Lebesgue integrable if a_i ≠ 0 ⇒ µ(A_i) < ∞.

In this case, the Lebesgue integral is:

∫ s dµ = Σ_{i=1}^n a_i µ(A_i).

Let f be a measurable function such that f ≥ 0 . If every step function s, with

s ≥ 0 and s ≤ f , is integrable and ∫ s dµ is finite, then we say that f is Lebesgue

integrable. The value of the Lebesgue integral of f is:

∫ f d µ = sup {∫ s d µ} ,
where the supremum is taken over all integrable, nonnegative step functions, s, such that

s ≤ f. For functions that are not everywhere nonnegative, we consider that every

function can be written as the difference of two nonnegative functions, f = f⁺ − f⁻,

where f⁺ = ½(|f| + f) = max(f, 0), the positive part, and f⁻ = ½(|f| − f) = max(−f, 0),

the negative part. This means that an arbitrary measurable function f is Lebesgue

integrable if both its positive and negative parts are Lebesgue integrable, and

∫ f dµ = ∫ f⁺ dµ − ∫ f⁻ dµ.

Lebesgue integration takes a different approach from Riemann integration. Whereas a

Riemann integral splits the domain of f into partitions and calculates areas under a curve

based on rectangles of height f, becoming exact as the rectangles’ widths grow

infinitesimal, the Lebesgue integral splits the range of f into partitions, approximating f

from below by step functions and passing to the limit.

Complex Analysis

A complex number z = x + iy can be written in polar form as z = r(cos θ + i sin θ), where

r = |z| = √(zz̄) = √(x² + y²) and θ, called the argument of z, is equal to arctan(y/x),

corresponding to the angle made with the positive real axis. The principal value of θ,

written Arg z, falls in the range −π < θ ≤ π.

Two complex numbers can be easily multiplied and divided in polar form:

z₁z₂ = r₁r₂(cos(θ₁ + θ₂) + i sin(θ₁ + θ₂))

z₁/z₂ = (r₁/r₂)(cos(θ₁ − θ₂) + i sin(θ₁ − θ₂)).

The Taylor series expansion for the polar form of complex numbers provides us with the

interesting fact that complex numbers can also be expressed exponentially:

z = re^{iθ} = r(cos θ + i sin θ),

in an expression known as Euler’s formula. Following from this is de Moivre’s

formula, which says:

(cos θ + i sin θ)^n = cos(nθ) + i sin(nθ).

Since the number 1 can be expressed in exponential form as e^{i(2πk)} for any integer k, we

can express the nth roots of unity by:

{ e^{i(2πk/n)} = cos(2πk/n) + i sin(2πk/n) : k = 0, 1, …, n − 1 }.

The nth roots of any complex number can be found by multiplying the principal nth root

by the nth roots of unity.
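A short Python sketch of this recipe using the cmath module (the cube roots of 8i serve as an arbitrary example):

    import cmath

    n, z = 3, 8j
    r, theta = cmath.polar(z)
    principal = r ** (1 / n) * cmath.exp(1j * theta / n)   # principal nth root
    unity = [cmath.exp(2j * cmath.pi * k / n) for k in range(n)]

    roots = [principal * w for w in unity]                 # all nth roots of z
    assert all(abs(w**n - z) < 1e-9 for w in roots)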

The logarithm of a complex number can be defined thanks to our ability to

express such a number exponentially. But we must recall that adding any integer

multiple of 2π to the argument of z leaves the complex number essentially unchanged.

Therefore, any nonzero complex number has infinitely many logarithms, but we define

the principal value by Log z = ln|z| + i Arg z. Using the concept of the complex

logarithm, we can define complex powers, z^w, where z, w ∈ ℂ. We have z^w = e^{w Log z}.

Inspired by Euler’s formula, we can develop formulas for calculating

trigonometric functions of complex numbers. In particular, we have:

cos z = (e^{iz} + e^{−iz})/2 and sin z = (e^{iz} − e^{−iz})/(2i).

The rest of the functions are defined by applying their familiar relations to the definitions

for the sine and cosine above. The trigonometric identities defined for real numbers are

valid for complex numbers as well. One major difference between the complex sine and

cosine and their real counterparts is that, while |cos x| ≤ 1 and |sin x| ≤ 1 for x ∈ ℝ, the

complex versions are unbounded, and the norms |sin z| and |cos z| can take on any

nonnegative real value.

The hyperbolic functions are defined for real numbers by:

cosh x = (e^x + e^{−x})/2 and sinh x = (e^x − e^{−x})/2.

They share some properties with their non-hyperbolic namesakes, such as cosh 0 = 1 and

sinh 0 = 0 , cosh ( − x ) = cosh x and sinh ( − x ) = − sinh x . Then there is the identity

cosh 2 x − sinh 2 x = 1 , which is similar to the Pythagorean trig identity for the sine and

cosine.

From our definitions, we see that

cos(iz ) = cosh z
cosh(iz ) = cos z
sin(iz ) = i sinh z
sinh(iz ) = i sin z.

With these formulas, we can develop equations that allow us to evaluate cos z and sin z ,

where z = x + iy , in terms of real x and y. Using the angle sum formulas and the relations

directly above, we get:

cos z = cos(x + iy) = cos x cosh y − i sin x sinh y

sin z = sin(x + iy) = sin x cosh y + i cos x sinh y.

These are used to obtain the following:

|cos z|² = cos²x + sinh²y

|sin z|² = sin²x + sinh²y.

Derivatives of Complex-Valued Functions

Functions of a complex variable can be differentiated like their real counterparts.

In fact, a table of derivatives for real-valued functions will apply for complex-valued

functions. First, it is important to remark that f(z) = f(x + iy) = u(x, y) + iv(x, y) for

some real-valued functions u and v. We say that ℜ(f) = u(x, y), denoting the real part,

and ℑ(f) = v(x, y), denoting the imaginary part. Returning to the variable z, we notice

that there is a slight difference in how the definition of the derivative behaves, which

gives rise to some interesting results. The difference quotient

f′(z) = lim_{h→0} [f(z + h) − f(z)] / h

looks the same for the complex variable z, but it is important to note here that h is also a

complex number with real and imaginary parts. Since the field ℂ is not ordered, h can

take infinitely many routes to zero. Applying the definition twice, with h approaching the

origin from the real axis in one case and the imaginary axis in another, we see that

∂u/∂x + i·∂v/∂x = f′(z) = ∂v/∂y − i·∂u/∂y

if f is differentiable. Equating the real and imaginary parts of the expression above, we

get the Cauchy-Riemann equations:

∂u/∂x = ∂v/∂y and ∂v/∂x = −∂u/∂y,

which must be satisfied for f to be differentiable.
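The Cauchy-Riemann equations can be checked symbolically for any candidate function (a sketch assuming SymPy; f(z) = z² is an arbitrary entire example):

    import sympy as sp

    x, y = sp.symbols('x y', real=True)
    f = (x + sp.I * y)**2                      # f(z) = z², an entire function
    u, v = sp.re(sp.expand(f)), sp.im(sp.expand(f))

    # Cauchy-Riemann: u_x = v_y and v_x = -u_y
    assert sp.simplify(sp.diff(u, x) - sp.diff(v, y)) == 0
    assert sp.simplify(sp.diff(v, x) + sp.diff(u, y)) == 0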

If f ( z ) = f ( x + iy ) = u ( x, y ) + iv ( x, y ) is differentiable throughout some open set

O, then the Cauchy-Riemann equations hold throughout that set. Differentiating the C-R

equations again, we get:

∂²u/∂x² = ∂/∂x(∂u/∂x) = ∂/∂x(∂v/∂y) = ∂²v/∂x∂y and ∂²u/∂y² = ∂/∂y(∂u/∂y) = ∂/∂y(−∂v/∂x) = −∂²v/∂y∂x,

which tells us that

∂²u/∂x² + ∂²u/∂y² = 0.

This is Laplace’s equation. A function satisfying this equation in some open set O is

said to be harmonic in O. Using the same method as above, we also see that

∂²v/∂x² + ∂²v/∂y² = 0.

Written another way, we have

∇ 2u = ∇ 2 v = 0 ,

where ∇ 2 = ∇ ⋅∇ is the scalar Laplacian operator. If f ( z ) is differentiable throughout

some open set O, then ℜ( f ) and ℑ( f ) are harmonic in O.

If f ( z ) is differentiable at z0 and at every point throughout some open set in the

complex plane containing z0 , then f ( z ) is said to be analytic at the point z0 . If f is

differentiable throughout the open set O, then f is said to be analytic in O. If f is analytic

everywhere in the complex plane, then f is said to be an entire function. If f is analytic,

then ℜ( f ) and ℑ( f ) , its component functions, must have continuous partial derivatives

of all orders. Also, if f ( z ) is analytic at z0 , then so is f ′( z ) . This implies that the

derivatives of all orders must also be analytic. This is a strong result without a

counterpart for real-valued functions.

We can calculate complex line integrals over parameterized curves in the

complex plane. If the curve C is given by z (t ) = x(t ) + iy (t ) for a ≤ t ≤ b , where

a, b ∈ , we have:

∫_C f(z) dz = ∫_a^b f(z(t))·z′(t) dt = ∫_a^b F′(z(t)) dt = F(z(b)) − F(z(a)),

where (d/dz)F(z) = f(z). This expression suggests two methods for attacking the line

integral, one of which may be easier than the other in a given situation. Using the first

method, one calculates f ( x + iy ) before substituting in the parameterized values of x and

y. Then the product with z ′(t ) is calculated, and the integral is carried out with respect to

the parameter t over the limits of integration, a and b. The second method, which appears

far easier, employs the antiderivative of f ( z ) , telling us that

∫_a^b f(z(t))·z′(t) dt = ∫_{z(a)}^{z(b)} f(z) dz.

We note that this method changes the limits of integration from real numbers to complex

numbers, as one would expect after changing the variable of integration from a real

variable to a complex variable.

The following are some important theorems regarding analytic functions:

1. Cauchy's theorem: If f(z) is analytic throughout a simply connected, open set

D, then for every closed path C in D, we have ∫_C f(z) dz = 0. If f(z) is an

entire function, then ∫_C f(z) dz = 0 for every closed path in ℂ.

2. Morera's theorem: If f(z) is continuous throughout an open, connected set

O ⊂ ℂ, and ∫_C f(z) dz = 0 for every closed curve C in O, then f(z) is analytic

in O.

3. Cauchy's integral formulas: If f(z) is analytic at all points within and on a

simple, closed path C surrounding the point z₀, then

f(z₀) = (1/(2πi)) ∫_C f(z)/(z − z₀) dz.

We also know that the nth derivative is analytic at z₀ for all n ∈ ℕ, and we have

f⁽ⁿ⁾(z₀) = (n!/(2πi)) ∫_C f(z)/(z − z₀)^{n+1} dz. This tells us that if we know the value of f(z) at

every point on a curve, then we also know its value—and the value of all its

derivatives—at any interior point.

4. Cauchy's derivative estimates: Let f(z) be analytic on and within the circle of

radius r centered at z₀, |z − z₀| = r. If M is the maximum value of |f(z)| on the

circle, then |f⁽ⁿ⁾(z₀)| ≤ n!M/r^n.

5. Liouville’s theorem: If f ( z ) is an entire function that is bounded, then f ( z )

must be a constant function.

6. The maximum principle: Let O be an open, connected subset of the complex

plane, and let f(z) be a function that is analytic in O. If there is a point z₀ ∈ O

such that |f(z)| ≤ |f(z₀)| for every z ∈ O, then f(z) is a constant function. This

says that if f(z) is analytic and not constant, then |f(z)| attains no maximum

value in O. If O is bounded, then |f(z)| must achieve a maximum value in

cl(O); since it can't do so inside, it must achieve a maximum on the boundary of

O.
Taylor Series for Complex-Valued Functions

Taylor series can be formed for complex-valued functions in the same way that they

can for real-valued functions. The only difference is that the idea of the interval of

convergence is replaced by the disk of convergence, with |z − z₀| < R representing the

interior of an open disk of radius R centered at z₀.

The power series Σ_{n=0}^∞ a_n(z − z₀)^n converges absolutely for all z that satisfy

|z − z₀| < R and diverges for all z such that |z − z₀| > R, where a_n = f⁽ⁿ⁾(z₀)/n! and

R = 1 / lim_{n→∞} ⁿ√|a_n|. If lim_{n→∞} ⁿ√|a_n| = ∞, then the series converges only for

z = z₀. If lim_{n→∞} ⁿ√|a_n| = 0, then the series converges for all z. Every function that is

analytic in an open, connected subset O of the complex plane can be expanded in a power

series whose disk of convergence lies within O.

Let z₀ be a point in the complex plane, and let R be a positive number. The set of

all z such that 0 < |z − z₀| < R is called the punctured open disk of radius R centered at z₀.

If a function f ( z ) is not analytic at a point z0 but is analytic at some point in every

punctured disk centered at z0 , then z0 is said to be a singularity of f ( z ) . If f ( z ) is

analytic at every point in some punctured open disk centered at z0 , then we call z0 an

isolated singularity.

Assuming now that f ( z ) has an isolated singularity at z0 , if there is a positive

integer n such that

f(z) = g(z)/(z − z₀)^n,

with g ( z ) analytic in a nonpunctured disk centered at z0 and g ( z0 ) ≠ 0 , then the

singularity z0 is called a pole of order n. If we can’t write f ( z ) in this form, then z0 is

said to be an essential singularity. Functions with singularities may require a

generalization of the Taylor series in order to be expanded to regions in which their

singularities occur.

Laurent Series

An annulus is a ring formed by two concentric circles. An annulus centered at

z₀ is represented by the double inequality R₁ < |z − z₀| < R₂, where R₁ is allowed to be

zero and R₂ is allowed to be ∞. If f(z) is analytic in some annulus centered at z₀, then

f(z) can be expanded in a Laurent series, which takes the form:

Σ_{n=1}^∞ a_{−n}(z − z₀)^{−n} + Σ_{n=0}^∞ a_n(z − z₀)^n.

The sum on the right, referred to as the analytic part, is simply the Taylor series for the

function. The sum on the left, referred to as the singular (or principal) part, uses

Laurent coefficients, which are defined by:

a_{−n} = (1/(2πi)) ∫_C f(z)/(z − z₀)^{−n+1} dz,

where C is a simple, closed curve contained in the annulus. (This definition aside, in

practice, the Laurent coefficients are typically derived algebraically from the Taylor

series.)

If the singular part of the Laurent series contains at least one term, but only

finitely many terms, then z0 is a pole. In particular, if the singular part has a− n = 0 for

all n greater than some integer k, where a− k ≠ 0 , then z0 is a pole of order k. If, however,

the singular part contains infinitely many terms, then z0 is an essential singularity.

Examining the definition of the Laurent coefficients, we see that

a_{−1} = (1/(2πi)) ∫_C f(z) dz, which suggests that we can easily calculate the integral if we have

the value of that coefficient. That coefficient is called the residue of f(z) at the

singularity z₀, written Res(z₀, f), and we see that ∫_C f(z) dz = 2πi·Res(z₀, f). For a

pole of order k, we have the formula:

Res(z₀, f) = (1/(k − 1)!) · lim_{z→z₀} d^{k−1}/dz^{k−1} [(z − z₀)^k f(z)].

If the curve C surrounds more than one singularity of f(z), z₁, z₂, …, z_n, we have:

∫_C f(z) dz = 2πi · Σ_{m=1}^n Res(z_m, f).
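Residues can be computed symbolically as a check on the order-k pole formula (a sketch assuming SymPy; the function below, with a double pole at 0 and a simple pole at 1, is an arbitrary example):

    import sympy as sp

    z = sp.symbols('z')
    f = 1 / (z**2 * (z - 1))                   # pole of order 2 at 0, order 1 at 1

    # Order-2 pole at z = 0: (1/1!)·lim_{z→0} d/dz [z²·f(z)]
    res0 = sp.limit(sp.diff(z**2 * f, z, 1), z, 0) / sp.factorial(1)
    assert res0 == sp.residue(f, z, 0) == -1
    assert sp.residue(f, z, 1) == 1
    # A contour enclosing both poles therefore gives 2πi·(−1 + 1) = 0.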

Gallery of Graphs

(The original gallery of reference graphs is not reproduced in this extraction.)