Precalculus
Analytic Geometry
Parabolas, each defined as the locus of points equidistant from a point F (the focus) and a line D (the directrix), have equations of the form y = ± x²/(4p) or x = ± y²/(4p), where the focus is located, respectively, at the point (0, ±p) or (±p, 0) (with the sign matching that of the equation above) and the directrix has the equation y = ∓p or x = ∓p. A parabola with vertex translated to (h, k) has the equation y − k = ± (x − h)²/(4p) or x − h = ± (y − k)²/(4p).
An ellipse is defined as the locus of points such that the sum of the distances from each point on the graph to two given fixed points (the foci) is a given constant. Ellipses have a major and a minor axis and follow the equation (x − h)²/a² + (y − k)²/b² = 1, where the axes have length a and b on either side of the center, (h, k), terminating at the vertices. In the case of a > b, the major axis is parallel to the x-axis, and the foci are located at the
*The overall plan of coverage in this document was based on Cracking the GRE Math Subject Test, 3rd Edition by Steven A. Leduc, many of whose results have been incorporated here. Certain items from that reference have been corrected in this document, and new results from other sources have been included.
points (h ± c, k), where c = √(a² − b²). For the case a < b, the major axis is parallel to the y-axis, and the foci are located at the points (h, k ± c), where c = √(b² − a²). The eccentricity, which measures the "flatness" of the ellipse, is given by 0 ≤ e = c/a ≤ 1.
A hyperbola is defined as the locus of points in the plane such that the difference between the distances from every point on the hyperbola to two fixed points (the foci) is a given constant. Hyperbolas centered at (h, k) follow the equation (x − h)²/a² − (y − k)²/b² = 1 (opening left and right) or (y − k)²/a² − (x − h)²/b² = 1 (opening up and down). In either case, c = √(a² + b²); for the first form, the asymptotes are the lines y − k = ± (b/a)(x − h).
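The focus definitions above are easy to confirm numerically. The following sketch (my own illustration, with arbitrary sample numbers) samples points on an ellipse and verifies that the sum of the distances to the foci is the constant 2a:

```python
import math

# Ellipse (x - h)^2/a^2 + (y - k)^2/b^2 = 1 with a > b:
# foci at (h ± c, k) with c = sqrt(a^2 - b^2); sum of focal distances = 2a.
h, k, a, b = 1.0, -2.0, 5.0, 3.0
c = math.sqrt(a**2 - b**2)
f1, f2 = (h + c, k), (h - c, k)

for t in [0.0, 0.7, 1.9, 3.1, 5.0]:
    x, y = h + a * math.cos(t), k + b * math.sin(t)  # point on the ellipse
    d = math.dist((x, y), f1) + math.dist((x, y), f2)
    assert abs(d - 2 * a) < 1e-9
print("sum of focal distances is constant:", 2 * a)
```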
Polynomials
The rational roots theorem states that for a polynomial pₙ(x) = ∑ᵢ₌₀ⁿ aᵢxⁱ with each aᵢ ∈ ℤ, if there are any rational roots r ∈ ℚ, then they are of the form r = s/t, where s, t ∈ ℤ, s divides a₀, and t divides aₙ.
For any polynomial pₙ(x) = ∑ᵢ₌₀ⁿ aᵢxⁱ = aₙ ∏ⱼ₌₁ⁿ (x − rⱼ), the sum and product of the roots are given by ∑ⱼ₌₁ⁿ rⱼ = −aₙ₋₁/aₙ and ∏ⱼ₌₁ⁿ rⱼ = (−1)ⁿ a₀/aₙ.
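These root formulas can be checked on a concrete cubic (an example of mine, not from the source): expanding 2(x − 1)(x − 2)(x − 3) = 2x³ − 12x² + 22x − 12 and comparing coefficient ratios with the known roots:

```python
# p(x) = 2(x - 1)(x - 2)(x - 3) = 2x^3 - 12x^2 + 22x - 12
a3, a2, a1, a0 = 2, -12, 22, -12
roots = [1, 2, 3]

assert sum(roots) == -a2 / a3            # sum of roots = -a_{n-1}/a_n = 6
assert 1 * 2 * 3 == (-1)**3 * a0 / a3    # product = (-1)^n a_0/a_n = 6
print("Vieta's formulas hold for this cubic")
```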
Logarithms
The basic identities of logarithms are the following:

log_b(∏ᵢ₌₁ⁿ xᵢ) = ∑ᵢ₌₁ⁿ log_b xᵢ
log_b(x₁/x₂) = log_b x₁ − log_b x₂
log_b xᵃ = a log_b x
b^(log_b x) = x
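Each identity can be spot-checked numerically; the sketch below (my own illustration, using Python's math module and arbitrary sample values) verifies all four:

```python
import math

b, x1, x2, a = 2.0, 8.0, 5.0, 3.0

# log_b(x1 * x2) = log_b x1 + log_b x2
assert math.isclose(math.log(x1 * x2, b), math.log(x1, b) + math.log(x2, b))
# log_b(x1 / x2) = log_b x1 - log_b x2
assert math.isclose(math.log(x1 / x2, b), math.log(x1, b) - math.log(x2, b))
# log_b(x^a) = a log_b x
assert math.isclose(math.log(x1**a, b), a * math.log(x1, b))
# b^(log_b x) = x
assert math.isclose(b**math.log(x1, b), x1)
print("all logarithm identities verified")
```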
Trigonometry
The sine function is odd, meaning that sin(−θ) = −sin θ, and the cosine function is even, meaning that cos(−θ) = cos θ. Any trigonometric function is equal to its co-function evaluated at the complementary angle: sin x = cos(π/2 − x), tan x = cot(π/2 − x), sec x = csc(π/2 − x), and vice versa.
θ       sin θ     cos θ     tan θ
π/4     √2/2      √2/2      1
π/3     √3/2      1/2       √3
π/2     1         0         ∞
2π/3    √3/2      −1/2      −√3
3π/4    √2/2      −√2/2     −1
5π/6    1/2       −√3/2     −√3/3
π       0         −1        0
Other values can be computed using the following identities:
tan x = sin x / cos x
cot x = cos x / sin x
csc x = 1 / sin x
sec x = 1 / cos x
cos(x ± y) = cos x cos y ∓ sin x sin y ⇒ cos 2x = cos²x − sin²x = 1 − 2 sin²x = 2 cos²x − 1
sin(x/2) = ±√((1 − cos x)/2)
cos(x/2) = ±√((1 + cos x)/2)
tan(x/2) = sin x / (1 + cos x)
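As a quick numeric spot-check of the half-angle formulas (an illustration of mine, not from the source), take x in (0, π) so that x/2 is a first-quadrant angle and the positive square roots apply:

```python
import math

x = 1.1  # any x in (0, pi) makes x/2 first-quadrant, so the + roots apply

assert math.isclose(math.sin(x / 2), math.sqrt((1 - math.cos(x)) / 2))
assert math.isclose(math.cos(x / 2), math.sqrt((1 + math.cos(x)) / 2))
assert math.isclose(math.tan(x / 2), math.sin(x) / (1 + math.cos(x)))
print("half-angle identities verified at x =", x)
```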
Hyperbolic Functions
sinh x = (eˣ − e⁻ˣ)/2
cosh x = (eˣ + e⁻ˣ)/2
Differential Calculus
Sequences
A sequence is an ordered list of terms. A sequence (xₙ) approaches a limit L if for every ε > 0, there's an integer N such that |xₙ − L| < ε when n > N. This is equivalent to challenging someone to think of a positive number, no matter how small, and then producing a point in the sequence beyond which every term lies within that distance of the limit. If such an L exists, the sequence is said to converge to L. Some useful facts:
• Every convergent sequence is bounded, but the converse is not necessarily true.
• If (aₙ) → A and (bₙ) → B, then:
  o (aₙ + bₙ) → A + B
  o (aₙ − bₙ) → A − B
  o (aₙbₙ) → AB
  o (aₙ/bₙ) → A/B, assuming B ≠ 0
• If k is a positive constant, then (1/nᵏ) → 0.
• If k > 1, then (1/kⁿ) → 0.
• If aₙ ≤ bₙ ≤ cₙ for every n > N, and lim aₙ = lim cₙ = L as n → ∞, then lim bₙ = L. This is known as the sandwich theorem (or the squeeze theorem).
Functions
A function f : A → B is injective (or one-to-one) if f(a₁) = f(a₂) implies a₁ = a₂. (Graphically, the graph of an injective function is intersected only once by a horizontal line.) A function f : A → B is surjective (or onto) if for any b ∈ B there exists an a ∈ A such that f(a) = b. This means that every element of the codomain is "spoken for" by at least one element in the domain. A function that is both injective and surjective is bijective.
Let f be a function defined on a subset of the real line. We seek to examine the behavior of f near a point a. The limit lim_{x→a} f(x) = L can be defined by requiring that f(xₙ) → L for every sequence converging to a, all of whose terms are in the domain of f. The limit of a function exists only if it is the same whether approached from above or below. If lim_{x→a} f(x) = L₁ and lim_{x→a} g(x) = L₂, then:
o lim_{x→a} [f(x) + g(x)] = L₁ + L₂
o lim_{x→a} [f(x) − g(x)] = L₁ − L₂
o lim_{x→a} [f(x)g(x)] = L₁L₂
o lim_{x→a} [f(x)/g(x)] = L₁/L₂ (assuming L₂ ≠ 0)
Continuity
• Bolzano’s Theorem: If f is a function continuous on the closed interval [a, b] such that f(a) and f(b) have opposite signs, then there’s a point c between a and b such that f(c) = 0.
Derivatives
Rules of Differentiation
• The quotient rule: (f/g)′(x) = [g(x)f′(x) − f(x)g′(x)] / [g(x)]²
• The derivative of an inverse function: (f⁻¹)′(y) = 1/f′(x), where y = f(x)
Common Derivatives
d(k) = 0
d(uᵏ) = kuᵏ⁻¹ du
d(eᵘ) = eᵘ du
d(aᵘ) = (ln a) aᵘ du
d(ln u) = du/u
d(log_a u) = du/(u ln a)
d(sin u) = cos u du
d(cos u) = −sin u du
d(tan u) = sec²u du
d(cot u) = −csc²u du
d(arcsin u) = du/√(1 − u²)
d(arccos u) = −du/√(1 − u²)
d(arctan u) = du/(1 + u²)

If f = k ∏ᵢ₌₁ⁿ uᵢ^(αᵢ), then f′ = f · [∑ᵢ₌₁ⁿ αᵢ uᵢ′/uᵢ].
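The last formula (logarithmic differentiation) can be sanity-checked against a finite-difference derivative; the function below is an example of my own, f(x) = 5x³(sin x)²:

```python
import math

def f(x):
    # f = k * u1^a1 * u2^a2 with k = 5, u1 = x (a1 = 3), u2 = sin x (a2 = 2)
    return 5 * x**3 * math.sin(x)**2

def f_prime(x):
    # f' = f * (a1 * u1'/u1 + a2 * u2'/u2)
    return f(x) * (3 / x + 2 * math.cos(x) / math.sin(x))

x, h = 0.8, 1e-6
numeric = (f(x + h) - f(x - h)) / (2 * h)  # central difference
assert math.isclose(f_prime(x), numeric, rel_tol=1e-6)
print("logarithmic differentiation formula agrees with finite differences")
```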
Implicit Differentiation
For a function f(x, y) = c, we can define y′ = −f_x/f_y = −(∂f/∂x)/(∂f/∂y) when f_x and f_y exist and f_y ≠ 0.
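For instance (an illustration of the formula, not from the source), the circle x² + y² = 25 gives y′ = −f_x/f_y = −x/y, which matches the slope obtained by differentiating y = √(25 − x²) directly:

```python
import math

# f(x, y) = x^2 + y^2 = 25; f_x = 2x, f_y = 2y, so y' = -x/y.
x = 3.0
y = math.sqrt(25 - x**2)          # the point (3, 4) on the circle
implicit_slope = -x / y           # -f_x / f_y

h = 1e-6                          # central-difference slope of y(x)
numeric = (math.sqrt(25 - (x + h)**2) - math.sqrt(25 - (x - h)**2)) / (2 * h)
assert math.isclose(implicit_slope, numeric, rel_tol=1e-6)
print("y' at (3, 4):", implicit_slope)   # -0.75
```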
Theorems Concerning Differentiable Functions
Rolle’s Theorem
If f is continuous on the closed interval [a, b], differentiable on (a, b), and f(a) = f(b), then there is at least one point c in (a, b) such that f′(c) = 0.
The mean value theorem states that if f is continuous on [a, b] and differentiable on (a, b), then there is a point c in (a, b) at which (b − a)f′(c) = f(b) − f(a). In other words, there is at least one point between a and b at which the slope of the tangent line is equal to the slope of the secant line through (a, f(a)) and (b, f(b)).
A critical point c can be classified by examining f′(c), f″(c), and, more generally, f⁽ⁿ⁾(c), where n is the smallest integer such that the nth derivative is nonzero: if n is even, f(c) is a local minimum when f⁽ⁿ⁾(c) > 0 and a local maximum when f⁽ⁿ⁾(c) < 0; if n is odd, f(c) is neither.
For a function defined on a closed interval, an absolute extremum will occur either at a point where the first derivative vanishes, where the derivative is not defined, or at an endpoint of the interval.
Integral Calculus
Common Integrals
∫ k du = ku + C
∫ uᵏ du = uᵏ⁺¹/(k + 1) + C (if k ≠ −1), or ln|u| + C (if k = −1)
∫ eᵘ du = eᵘ + C
∫ aᵘ du = aᵘ/ln a + C
∫ sin u du = −cos u + C
∫ cos u du = sin u + C
∫ sec²u du = tan u + C
∫ csc²u du = −cot u + C
∫ du/(1 + u²) = arctan u + C
Integration by Parts
∫ u dv = uv − ∫ v du

The object when applying this method is carefully selecting which part of the integrand is u and which part is dv, so that the integral ∫ v du on the right-hand side is easier to evaluate than the original.
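As an example of the formula (my own, not from the source), take ∫₀¹ x eˣ dx with u = x and dv = eˣ dx; the parts result can be compared against a direct Riemann sum:

```python
import math

# Integration by parts: ∫ x e^x dx = x e^x - ∫ e^x dx = x e^x - e^x,
# so ∫_0^1 x e^x dx = (1·e - e) - (0 - 1) = 1.
by_parts = (1 * math.e - math.e) - (0 * 1 - 1)

# Midpoint Riemann sum as an independent check.
n = 100_000
riemann = sum((i + 0.5) / n * math.exp((i + 0.5) / n) for i in range(n)) / n

assert math.isclose(by_parts, 1.0)
assert math.isclose(riemann, by_parts, rel_tol=1e-6)
print("∫_0^1 x e^x dx =", by_parts)
```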
Trigonometric Substitution
For integrands of a certain form containing square roots, it can be useful to make the following substitutions:

Form           u              du
√(a² − u²)     u = a sin θ    du = a cos θ dθ
√(a² + u²)     u = a tan θ    du = a sec²θ dθ
Partial Fractions
If an integrand is a rational function of the form P(x)/Q(x), with deg P < deg Q, we can evaluate the integral by splitting the integrand into more manageable units, a process known as partial fraction decomposition. The denominator is factored, and each repeated linear factor (x − r)ⁿ contributes terms whose denominators run through ascending powers from 1 to n. The numerators on the right side are made up of single place-holding coefficients (typically A, B, C, etc.), which are to be solved for. In the case that an irreducible quadratic occurs in the factored polynomial, the place-holding coefficient in the numerator over that term would be replaced by a polynomial of the form Bx + C. Multiplying everything out will work when solving for the coefficients, but this could be quite a time-consuming series of steps. Since the resulting decomposition will have to be true for all values of x in the domain, it is useful to multiply by one term at a time and make substitutions that get the terms to cancel out (namely, values of x that make terms equal zero: roots of the denominator polynomial).
The derivative of an integral with respect to a variable limit is given by:

d/dx ∫_{a(x)}^{b(x)} f(t) dt = f(b(x))b′(x) − f(a(x))a′(x)

The mean value theorem for integrals states that if f is continuous on [a, b], there is a c ∈ [a, b] such that ∫_a^b f(x) dx = f(c)(b − a). Here we see that f(c) is the average value of the function.
Polar Coordinates
The transformations between Cartesian and polar coordinates are:

(x, y) →ᵀ (r, θ),  T(x, y) = (√(x² + y²), arctan(y/x))
(r, θ) →ᵀ (x, y),  T(r, θ) = (r cos θ, r sin θ)

Polar representations are not unique, so the first transformation must be interpreted with care (the arctangent determines θ only up to the correct quadrant).
Area in polar coordinates: A = ∫_α^β ½ r² dθ.
If (x, y) = (x(t), y(t)) and the curve is traced out clockwise, beginning and ending at the same point (i.e., x(a) = x(b) and y(a) = y(b)), then A = ∫_a^b y(t)x′(t) dt = −∫_a^b x(t)y′(t) dt.
If a function f is rotated around a line, a solid is generated. Clearly, as any point on the line is rotated about an axis, a disk is traced out. The value of f from point to point therefore defines the length of the radius of that disk at any point. We can calculate the volume of the solid by integrating the areas of these disks.

For a function f revolved about the x-axis: V = ∫ dV = ∫_a^b π[f(x)]² dx.
For a function g revolved about the y-axis: V = ∫ dV = ∫_c^d π[g(y)]² dy.

In the case of a solid being formed with a gap between the defining function and the axis (in which each disk becomes a washer), that gap being defined by a second function, we have: V = ∫_a^b π{[f(x)]² − [g(x)]²} dx.
Arc Length
For a function y = f(x), the length of any portion of the curve can be calculated as follows, the decision being whether to integrate along the x-axis (where x runs from a to b) or the y-axis (where y runs from c to d):

s = ∫ ds = ∫_a^b √(1 + (dy/dx)²) dx = ∫_c^d √(1 + (dx/dy)²) dy
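To illustrate the formula (example mine), the length of y = x^(3/2) on [0, 1] has the closed form (13√13 − 8)/27, since 1 + (dy/dx)² = 1 + 9x/4 integrates elementarily; a numeric quadrature of the arc-length integrand agrees:

```python
import math

# Arc length of y = x^(3/2) on [0, 1]: s = ∫_0^1 sqrt(1 + 9x/4) dx
exact = (13 * math.sqrt(13) - 8) / 27            # antiderivative evaluated

n = 200_000                                      # midpoint-rule quadrature
s = sum(math.sqrt(1 + 9 * ((i + 0.5) / n) / 4) for i in range(n)) / n

assert math.isclose(s, exact, rel_tol=1e-7)
print("arc length ≈", round(s, 6))
```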
Series
An arithmetic sequence has the form (aₙ) = a₁, a₁ + d, a₁ + 2d, …, a₁ + (n − 1)d, with each term differing from the one before by the addition of a constant d. The sum of a finite arithmetic series can be computed easily by the following formula: Sₙ = (n/2)(a₁ + aₙ) = (n/2)(2a₁ + (n − 1)d), where it should be clear that the average of the first and last terms of the sequence is being multiplied by the number of terms.

A geometric sequence has the form (aₙ) = a₁, a₁r, a₁r², a₁r³, …, a₁rⁿ⁻¹, with each term differing from the one before by the multiplication of a constant ratio r. The sum of a finite geometric series can be computed by the following formula: Sₙ = a₁(1 − rⁿ)/(1 − r) = (a₁ − raₙ)/(1 − r), where r ≠ 1.
An infinite series converges if its sequence of partial sums approaches a finite limit S. An infinite geometric series converges if and only if |r| < 1, in which case ∑_{n=0}^∞ rⁿ = 1/(1 − r). For an infinite series to converge, the terms must tend to zero as n tends to infinity. (The converse is not true, as illustrated by the harmonic series ∑_{n=1}^∞ 1/n = ∞.)
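The geometric series facts above can be watched in action (a quick illustration of mine): partial sums of rⁿ approach 1/(1 − r) when |r| < 1, and the finite-sum formula matches a direct sum:

```python
# Partial sums of the geometric series sum r^n for r = 1/3:
r = 1 / 3
limit = 1 / (1 - r)                       # = 1.5

partial = sum(r**n for n in range(60))    # S_59
assert abs(partial - limit) < 1e-12       # already indistinguishable from 1.5

# Finite-sum formula S_n = a1 (1 - r^n)/(1 - r) with a1 = 1, n = 10 terms:
assert abs(sum(r**n for n in range(10)) - (1 - r**10) / (1 - r)) < 1e-12
print("sum of (1/3)^n =", limit)
```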
The so-called p-series ∑ 1/nᵖ converges for every p > 1 and diverges for every p ≤ 1.
The ratio test. Given a series ∑aₙ of nonnegative terms, form the limit lim_{n→∞} aₙ₊₁/aₙ = L. If L < 1, then ∑aₙ converges; if L > 1, then ∑aₙ diverges. The test is inconclusive if L = 1.
The root test. Given a series ∑aₙ of nonnegative terms, form the limit lim_{n→∞} ⁿ√aₙ = L. If L < 1, then ∑aₙ converges; if L > 1, then ∑aₙ diverges. The test is inconclusive if L = 1.
The integral test. If f is a positive, decreasing function defined on [1, ∞) such that f(n) = aₙ for every positive integer n, then ∑_{n=1}^∞ aₙ converges if and only if ∫₁^∞ f(x) dx converges.
The alternating series test. A series of the form ∑_{n=1}^∞ (−1)ⁿ⁺¹aₙ (with aₙ ≥ 0 for all n) will converge provided that the terms aₙ decrease monotonically, with a limit of zero.
A power series in x is an infinite series whose terms are aₙxⁿ, where each aₙ is a constant. The convergence tests mentioned above are useful for the analysis of power series, which are only useful if they converge. By the ratio test, the series will converge absolutely if:

lim_{n→∞} |aₙ₊₁xⁿ⁺¹| / |aₙxⁿ| = |x| · lim_{n→∞} |aₙ₊₁/aₙ| < 1

If we call the latter limit L, then: if L = 0, the power series converges for every x; if L = ∞, then the power series converges only for x = 0; and if L is positive and finite, then the power series converges absolutely for |x| < 1/L and diverges for |x| > 1/L.
The set of all values of x for which the series converges is called the interval of convergence. Every power series falls into one of the following cases:

1. The series converges for all x ∈ (−R, R) and diverges for all |x| > R, where R is called the radius of convergence and R = 1/L, where L is defined as above. (Whether the series converges at the endpoints of the interval must be checked on a case-by-case basis.)
2. The power series converges absolutely for all x; the interval of convergence is (−∞, ∞).

Functions are frequently defined by power series. If f(x) = ∑_{n=0}^∞ aₙxⁿ, the domain of f is the interval of convergence of the series.
Within the interval of convergence of the power series:

f(x) = ∑_{n=0}^∞ aₙxⁿ ⇒ f′(x) = ∑_{n=0}^∞ d/dx(aₙxⁿ) = ∑_{n=1}^∞ naₙxⁿ⁻¹

f(x) = ∑_{n=0}^∞ aₙxⁿ ⇒ ∫₀ˣ f(t) dt = ∑_{n=0}^∞ (∫₀ˣ aₙtⁿ dt) = ∑_{n=0}^∞ aₙ/(n + 1) xⁿ⁺¹
If a function f can be represented by a power series, that series is the Taylor series:

f(x) = ∑_{n=0}^∞ f⁽ⁿ⁾(0)/n! xⁿ
Function     Expansion                      ∑ form                                    Interval
1/(1 − x)    1 + x + x² + x³ + ⋯            ∑_{n=0}^∞ xⁿ                              (−1, 1)
1/(1 + x)    1 − x + x² − x³ + ⋯            ∑_{n=0}^∞ (−1)ⁿxⁿ                         (−1, 1)
ln(1 + x)    x − x²/2 + x³/3 − ⋯            ∑_{n=1}^∞ (−1)ⁿ⁺¹xⁿ/n                     (−1, 1]
eˣ           1 + x + x²/2! + x³/3! + ⋯      ∑_{n=0}^∞ xⁿ/n!                           (−∞, ∞)
sin x        x − x³/3! + x⁵/5! − ⋯          ∑_{n=0}^∞ (−1)ⁿx²ⁿ⁺¹/(2n + 1)!            (−∞, ∞)
cos x        1 − x²/2! + x⁴/4! − ⋯          ∑_{n=0}^∞ (−1)ⁿx²ⁿ/(2n)!                  (−∞, ∞)
A Taylor series can be truncated to form a Taylor polynomial for use in approximating the value of a function. A Taylor polynomial of degree n, Pₙ(x) = ∑_{k=0}^n f⁽ᵏ⁾(0)/k! xᵏ, will have an error, called the remainder, exactly equal to f(x) − Pₙ(x) = f⁽ⁿ⁺¹⁾(c)/(n + 1)! xⁿ⁺¹ for some c between 0 and x.
The Taylor series mentioned above is a special case of a more general series in powers of (x − a), which has Taylor coefficients aₙ = f⁽ⁿ⁾(a)/n!. The remainder of a Taylor polynomial about a is f(x) − Pₙ(x) = f⁽ⁿ⁺¹⁾(c)/(n + 1)! (x − a)ⁿ⁺¹ for some c between a and x.
In three dimensions, it is often convenient to think in terms of vectors. Vectors are often identified with points: a point in space corresponds to the arrow connecting this point to the origin. Component notation, which would represent the vector connecting the origin to the point (x, y, z) as xî + yĵ + zk̂, makes use of the standard unit basis vectors î, ĵ, and k̂.
There are two ways to multiply vectors. One, the dot (or scalar) product, results in a scalar and is equal to A · B = |A||B| cos θ = x_a x_b + y_a y_b + z_a z_b, where θ is the angle between the two vectors. Since the dot product can be computed component-wise, it is not necessary to
know the angle between the vectors, and the dot product can be useful when the angle is
sought. For any two vectors perpendicular to each other, the dot product is zero. The dot
product is commutative: A ⋅ B = B ⋅ A .
The projection of B onto A is given by proj_A B = |proj_A B| Â = (A · B / A · A) A.
The cross (or vector) product is the other way to multiply vectors. The result is a vector perpendicular to each of the vectors being multiplied, following the right-hand rule, and its components are given by the determinant:

        | î    ĵ    k̂   |
A × B = | x_a  y_a  z_a |
        | x_b  y_b  z_b |

and |A × B| = |A||B| sin θ. This magnitude is equal to the area of the parallelogram spanned by the two vectors.

The triple scalar product, the absolute value of which is equal to the volume of the parallelepiped spanned by three vectors, is:

                          | x_a  y_a  z_a |
[A, B, C] = (A × B) · C = | x_b  y_b  z_b |
                          | x_c  y_c  z_c |
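A minimal sketch (mine, not from the source) of the dot, cross, and triple scalar products and the identities mentioned above:

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def cross(a, b):
    return (a[1]*b[2] - a[2]*b[1],
            a[2]*b[0] - a[0]*b[2],
            a[0]*b[1] - a[1]*b[0])

A, B, C = (1, 2, 3), (4, 0, -1), (2, 1, 1)
AxB = cross(A, B)

# The cross product is perpendicular to both factors:
assert dot(AxB, A) == 0 and dot(AxB, B) == 0

# |A × B| = |A||B| sin θ, using cos θ from the dot product:
cos_t = dot(A, B) / (math.sqrt(dot(A, A)) * math.sqrt(dot(B, B)))
lhs = math.sqrt(dot(AxB, AxB))
rhs = math.sqrt(dot(A, A)) * math.sqrt(dot(B, B)) * math.sqrt(1 - cos_t**2)
assert math.isclose(lhs, rhs)

# Triple scalar product = volume of the parallelepiped (up to sign):
print("[A, B, C] =", dot(AxB, C))
```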
Lines
Just as a slope and a point define a line in ℝ², a point and a vector can define a line in ℝ³. Specifically, if we have a point, P₀ = (x₀, y₀, z₀), on a line L and a vector, v = (v₁, v₂, v₃), parallel to the line, we can say that any point P = (x, y, z) is on the line if and only if the vector P₀P is a scalar multiple of v, that is, P₀P = tv (being careful about the order of the points when subtracting). This is equivalent to three parametric
equations:
L:  x = x₀ + tv₁
    y = y₀ + tv₂
    z = z₀ + tv₃
These can each be solved for t and equated to yield the symmetric equations of L:
(x − x₀)/v₁ = (y − y₀)/v₂ = (z − z₀)/v₃
In the case that any component of v is equal to zero, the term containing that component is omitted from the chain of equalities, and the corresponding variable is simply set equal to its constant value.
Planes
Planes in ℝ³ are uniquely defined by a point, P₀ = (x₀, y₀, z₀), and a vector, n, normal to
the plane. Since any line in the plane must be perpendicular to the normal vector, we
have a clue as to the form that the equation of a plane must take. Specifically, the point
P = (x, y, z) is on the plane if and only if P₀P · n = (x − x₀)n₁ + (y − y₀)n₂ + (z − z₀)n₃ = 0.
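For example (numbers mine), the plane through P₀ = (1, 2, 3) with normal n = (2, −1, 4) can be checked by confirming that displacement vectors within the plane are orthogonal to n:

```python
# Plane through P0 with normal n: (x-x0)n1 + (y-y0)n2 + (z-z0)n3 = 0,
# i.e. 2x - y + 4z = d with d = 2*1 - 2 + 4*3 = 12.
P0, n = (1, 2, 3), (2, -1, 4)
d = sum(p * c for p, c in zip(P0, n))

def on_plane(P):
    return sum(p * c for p, c in zip(P, n)) == d

assert on_plane(P0)
assert on_plane((0, 0, 3))        # 0 - 0 + 12 = 12
Q = (6, 0, 0)                     # 12 - 0 + 0 = 12
assert on_plane(Q)
# P0Q lies in the plane, hence is perpendicular to n:
P0Q = tuple(q - p for p, q in zip(P0, Q))
assert sum(v * c for v, c in zip(P0Q, n)) == 0
print("plane 2x - y + 4z =", d)
```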
Cylinders
We can take a curve C ⊂ ℝ² and extend it into ℝ³ as a cylinder (which appears to maintain this name even if C is not a closed curve). We do this by connecting to each point in C a line parallel to some given line v, referred to as the generator. In the case that this line is perpendicular to the plane of the curve, this extension is automatic and leaves the original equation unchanged. (Since the equation is in ℝ², it is missing the variable, say z, that would make it an equation in ℝ³. That missing variable can then take on all values once the curve is considered in ℝ³, and the curve is converted into a right cylinder.)
In the case that the line v = (v₁, v₂, v₃) is not perpendicular to the plane of the curve, we recall the parametric and symmetric equations of the line from above in order to generate our equation for the cylinder. Without loss of generality, we consider a curve y = f(x) in the x-y plane, and we note that no relationship for z is described by the equation of the curve. Any line from a point on that curve will have to obey the following relationship:
x − x0 = v1t
y − y0 = v2t .
z = v3t
We can solve these equations for t and equate them, getting:

(x − x₀)/v₁ = (y − y₀)/v₂ = z/v₃

These equations can be solved for the "naught values," one at a time, getting:

x₀ = x − (v₁/v₃)z
y₀ = y − (v₂/v₃)z

Recall that these are the points on the curve, so the equation of the cylinder in ℝ³ becomes:

y − (v₂/v₃)z = f(x − (v₁/v₃)z)
Note the pattern in the naught values above. Note also that y = f(x) was an arbitrary choice for the equation of the curve; the same pattern applies to any relation between x and y.
Surfaces of Revolution
A curve in ℝ² can be rotated about an axis to form a surface in ℝ³. Each point on the original curve will trace out a circle around the axis about which it's rotating. This is the basis for the following substitutions, which are used to obtain the equation of the surface of revolution:
Curve          Revolved around    Replace              Obtaining
f(x, y) = 0    x-axis             y by ±√(y² + z²)     f(x, ±√(y² + z²)) = 0
f(x, y) = 0    y-axis             x by ±√(x² + z²)     f(±√(x² + z²), y) = 0
f(x, z) = 0    x-axis             z by ±√(y² + z²)     f(x, ±√(y² + z²)) = 0
f(x, z) = 0    z-axis             x by ±√(x² + y²)     f(±√(x² + y²), z) = 0
f(y, z) = 0    y-axis             z by ±√(x² + z²)     f(y, ±√(x² + z²)) = 0
f(y, z) = 0    z-axis             y by ±√(x² + y²)     f(±√(x² + y²), z) = 0
Cylindrical coordinates are an extension of polar coordinates to ℝ³ obtained by adding the vertical coordinate: (r, θ, z) = (√(x² + y²), arctan(y/x), z), where we repeat the warning about the non-uniqueness of the angle θ.
Spherical coordinates are defined in a similar spirit to cylindrical coordinates. They take the form (ρ, φ, θ), where ρ is the length of the radius vector from the origin, φ is the angle the radius vector makes with the positive z-axis, and θ is the angle between the positive x-axis and the projection (or shadow, as it may be imagined) of the radius vector onto the x-y plane.
The following relations hold:
ρ² = x² + y² + z²
x = r cos θ = ρ sin φ cos θ
y = r sin θ = ρ sin φ sin θ
z = ρ cos φ
For a function z = f(x, y) describing a surface, the equation of the plane tangent to that surface at a point P = (x₀, y₀, z₀) is:

z − z₀ = (∂z/∂x)|_P · (x − x₀) + (∂z/∂y)|_P · (y − y₀)
In the case that the surface is described implicitly by f(x, y, z) = c, we find the above partial derivatives by implicit differentiation. (A more direct method uses the gradient, which we'll discuss in the section on directional derivatives.) Equations for tangent planes can then be written down immediately.
To use the chain rule for derivatives of multivariate functions, care must be taken to account for every "path" to the variable of interest from the others. For example, suppose z = F(x, u, v), where u and v are themselves functions of x and y, and we want to calculate the partial derivative of z with respect to x. We notice that there are three paths to x, one directly from F and one each from u and v. We then have:

(∂z/∂x)_y = (∂z/∂x)_{u,v} + (∂z/∂u)_{x,v}(∂u/∂x)_y + (∂z/∂v)_{x,u}(∂v/∂x)_y
The subscripts are a bit of bookkeeping, reminding us which variables are treated as constants in each differentiation. The vector differential operator del is defined as:

∇ = (∂/∂x)î + (∂/∂y)ĵ + (∂/∂z)k̂
A scalar function f can be converted into a vector called the gradient, and the del
operator can also operate on vectors through the divergence and curl, all of which will
be discussed later.
Directional Derivatives
If we want to find the rate of change of the surface f, we must first give thought to the direction we're heading. (Think about standing on a rocky hill. Its rate of change is not uniform; it will be different depending on whether we head forward or back, left or right.) The directional derivative takes into account both the point we're interested in and the unit vector û in the direction we're heading:

D_u f|_P = ∇f|_P · û
Examining this equation and noting the employment of the dot product, we know that D_u f|_P = |∇f|_P| |û| cos θ, where θ is the angle between the unit vector and the gradient. Since |û| = 1, the directional derivative attains its maximum value when cos θ = 1, which is when θ = 0. This tells us that the gradient, ∇f, points in the direction in which f increases most rapidly, and its magnitude gives the maximum rate of increase.
Similarly, −∇f points in the direction in which f decreases most rapidly. Examining a level surface f(x, y, z) = c reveals that the directional derivative is zero in any direction tangent to the surface, so the vector ∇f|_P is perpendicular to the level surface of f that contains P. Similarly, for a function f(x, y), the vector ∇f|_P is perpendicular to the level curve of f that contains P. This fact gives us another way to write the tangent plane at a point P = (x₀, y₀, z₀):

f_x|_P · (x − x₀) + f_y|_P · (y − y₀) + f_z|_P · (z − z₀) = 0
For completeness, we'll briefly mention the divergence and curl of a vector field F = F₁î + F₂ĵ + F₃k̂. The divergence, a scalar quantity, can be interpreted as the net flow of "density" out of an infinitesimal region around a point: div F = ∇ · F = ∂F₁/∂x + ∂F₂/∂y + ∂F₃/∂z. The curl, a vector quantity, can be interpreted as the rotation of a vector field. It is equal to the maximum circulation at each point and is oriented perpendicularly to the plane of circulation:

                 | î      ĵ      k̂     |
curl F = ∇ × F = | ∂/∂x   ∂/∂y   ∂/∂z  |
                 | F₁     F₂     F₃    |
Just as with single-variable functions, multivariable functions attain critical points when their derivatives equal zero. Critical points are also not necessarily maxima or minima in multivariable functions. Saddle points, which are critical points that are neither maxima nor minima, can exist for a function z = f(x, y) at a point P₀ = (x₀, y₀) if, for example, z = f(x, y₀) has a maximum there while z = f(x₀, y) has a minimum. We can evaluate the possibilities by computing the Hessian, the determinant of the Hessian matrix evaluated at P₀:

Δ = det [ f_xx  f_xy ; f_yx  f_yy ]|_{P₀} = f_xx(P₀)f_yy(P₀) − [f_xy(P₀)]²

If Δ > 0, then P₀ is a local minimum when f_xx(P₀) > 0 and a local maximum when f_xx(P₀) < 0; if Δ < 0, then P₀ is a saddle point; if Δ = 0, the test is inconclusive.
When solving maxima and minima problems subject to a constraint, the constraint
function is solved for one of the variables and substituted into the main function,
differentiated and equated to zero, and then solved as usual for critical points. These
points, along with the endpoints of any intervals given, must be tested to find the
extrema.
In some cases, the constraint equation will not be easy to solve for one of the variables.
In such a case (and in others), it can help to employ the method of the Lagrange
multiplier, which takes advantage of several interesting facts. Let's say we're looking to maximize f(x, y) subject to the constraint g(x, y) = c. If the maximum value M of f is attained at the point P = (x₀, y₀), then the level curve f(x, y) = M and the curve
g ( x, y ) = c share the same tangent line at P. Because they share the tangent line, they
also share the normal line at P. Since ∇f is normal to the curve f ( x, y ) = M and ∇g is
normal to the curve g ( x, y ) = c , the gradients must be parallel, which is to say that one is
a scalar multiple of the other. That is, ∇f = λ∇g for some scalar λ , called the Lagrange
multiplier, whose value is unimportant in itself, although we will solve for it. This is the basis of the method. The gradients are computed, and the resulting components on the left and right side are equated and treated as simultaneous equations that are solved for λ to find the relationship between the variables. (The constraint equation may also be used at this step.) The new relationship is substituted into the original equation to be
maximized and solved for the remaining variable. Critical values are then obtained and
tested.
Line Integrals
The standard definite integral computes the area between a curve and a straight-line segment of one of the coordinate axes. But integration can follow more complicated paths. The line integral is a close relative of the standard Riemann integral, in that the value of a function at a point is multiplied by the length of the small piece of curve containing that point. Imagine, for instance, a curve C in the plane and a function f(x, y) defined on it. Cutting the curve into n tiny pieces, we have lim_{n→∞} ∑ᵢ₌₁ⁿ f(xᵢ, yᵢ)Δsᵢ = ∫_C f ds. Geometrically, this is
equivalent to calculating the area of a curtain with C as its base and f(x, y) as its height at each point. To evaluate such an integral, we parametrize the curve:

C: x = x(t), y = y(t) for a ≤ t ≤ b
The curve C is considered to be directed, meaning that it has a definite initial and final point. (This detail is similar to the idea that ∫_a^b f(x) dx = −∫_b^a f(x) dx.) Since for a parametrized curve

ds/dt = ±√((dx/dt)² + (dy/dt)²) = ±√([x′(t)]² + [y′(t)]²),

where the sign we choose depends on the direction we're taking along our path, we then have ∫_C f ds = ∫_a^b f(x(t), y(t)) (ds/dt) dt.
The treatment in ℝ³ is identical, as ∫_C f ds = ∫_a^b f(x(t), y(t), z(t)) (ds/dt) dt, where

ds/dt = ±√((dx/dt)² + (dy/dt)² + (dz/dt)²) = ±√([x′(t)]² + [y′(t)]² + [z′(t)]²)
In ℝ³, we can interpret the line integral using the example of C as a curved wire with linear density f(x, y, z). The line integral of f over C would give the total mass of the
wire.
A vector field is a vector-valued function. Let D be a region of the x-y plane on which a vector field F(x, y) = M(x, y)î + N(x, y)ĵ is defined, and let C be a curve in D with parametrization r(t) = (x(t), y(t)) for a ≤ t ≤ b. Then the line integral of F along C is:

∫_C F · dr = ∫_a^b F(r(t)) · r′(t) dt
           = ∫_a^b F(x(t), y(t)) · (x′(t), y′(t)) dt
           = ∫_a^b [M(x(t), y(t))x′(t) + N(x(t), y(t))y′(t)] dt
The situation in ℝ³ is defined analogously. If we consider F to be a force field, the line
integral can be interpreted as the work done on a particle to move it along the path C.
In the case that F is equal to the gradient of a scalar field (i.e., a real-valued function f), the field is called a gradient field, and f is called a potential for F. The fundamental theorem of calculus for line integrals says that the line integral of a gradient field depends only on the endpoints of the path and not the path itself: for any piecewise smooth curve C oriented from A to B and a continuously differentiable potential f, we have ∫_C ∇f · dr = f(B) − f(A). For F = Mî + Nĵ to be a gradient field, it is necessary (but not sufficient) that ∂M/∂y = ∂N/∂x. This simply states that f_xy = f_yx, which will be the case for functions with continuous second partial derivatives. If the domain of F, which we'll call R, is a simply connected region (meaning that the interior of every simple closed curve drawn in R is also contained in R), then this condition is sufficient as well.
It should be clear that for a gradient field F, ∮_C F · dr = 0 for any closed path C in the domain.
Double Integrals
When evaluating double integrals over a region R, it is helpful to examine the region to select the limits of integration that provide for the easiest calculation of each iterated integral. To integrate with respect to x first, an imaginary horizontal line is placed over the region, and expressions for x in terms of y are derived. An integral with respect to x will then be evaluated, and the y terms will be substituted in as the limits of integration. An imaginary vertical line will determine the y limits, which should be numerical (the outside integral should not contain variables). An integral with respect to y will be evaluated between these limits, and a numerical result will be obtained. The situation is exactly reversed if the differentials are switched in the integral. In short:
∬_R f(x, y) dA = ∫_{y=c}^{y=d} ∫_{x=g(y)}^{x=h(y)} f(x, y) dx dy = ∫_{x=a}^{x=b} ∫_{y=G(x)}^{y=H(x)} f(x, y) dy dx
for some functions h, g, H, and G that describe the region R in terms of the variables
needed.
Double integrals are often evaluated in polar coordinates. The procedure is the
same as with Cartesian double integrals, except for one thing: The element of area
dA = r dr dθ . The reason for this becomes clear when one considers that these small
elements are “curvilinear rectangles,” the curved side of which will have arc length r dθ .
When evaluating an integral that corresponds to the volume of a region bounded above by some function and bounded below by another function, the integrand is the difference between the upper-bound function and the lower-bound function, integrated over the region R in the plane.
Green’s Theorem
Green’s theorem is a powerful tool for dealing with integrals, since it relates double
integrals to line integrals, one of which might be easier to evaluate in a given situation.
First we define a simple closed curve, which is a curve that does not intersect itself.
Circles, ellipses, squares, and rectangles are examples of simple closed curves. A curve
is oriented, with the positive direction defined as the direction that, if walked, would put the region on one's left. Green's theorem states that:

∮_C M dx + N dy = ∬_R (∂N/∂x − ∂M/∂y) dA
A useful consequence is an area formula: ∬_R dA = ½ ∮_C x dy − y dx by Green's theorem. (Evaluating the integral with either M or N set to zero while leaving the other unchanged will give the area of the region R, hence the factor of ½ when both terms are kept.)
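The area formula can be tested on a curve where the answer is known (example mine): parametrizing the unit circle counterclockwise, ½∮(x dy − y dx) should return π:

```python
import math

# ½ ∮ (x dy - y dx) over the unit circle x = cos t, y = sin t, 0 <= t <= 2π,
# approximated with a midpoint sum in t.
n = 100_000
dt = 2 * math.pi / n
area = 0.0
for i in range(n):
    t = (i + 0.5) * dt
    x, y = math.cos(t), math.sin(t)
    dx, dy = -math.sin(t) * dt, math.cos(t) * dt
    area += 0.5 * (x * dy - y * dx)

assert math.isclose(area, math.pi, rel_tol=1e-9)
print("area of unit circle ≈", area)
```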
Differential Equations
Separable Equations
A differential equation of the form dy/dx = f(x)/g(y) is called separable, since its variables can be separated: writing g(y) dy = f(x) dx, each side can be integrated independently.
Homogeneous Equations
A function f(x, y) is called homogeneous of degree n if there is a constant n such that f(tx, ty) = tⁿf(x, y) for all t, x, and y for which both sides are defined. The equation M(x, y) dx + N(x, y) dy = 0 is called homogeneous if M and N are both homogeneous functions of the same degree. Homogeneous equations are solved with the substitution y = vx (so that dy = v dx + x dv). This substitution makes the differential equation separable after algebraic manipulation.
Exact Equations
Recall the total differential of a function f(x, y):

df = (∂f/∂x) dx + (∂f/∂y) dy

This tells us that the family of curves f(x, y) = c satisfies the differential equation M(x, y) dx + N(x, y) dy = 0 whenever M = ∂f/∂x and N = ∂f/∂y; such an equation has the general solution f(x, y) = c. An equation is exact if ∂M/∂y = ∂N/∂x, which we recognize as the same condition met by gradient fields.
A differential equation that is not exact can be made exact by being multiplied through by an integrating factor, a function chosen to turn it into an exact equation. Any nonexact differential equation that has a solution also has an integrating factor, even if that factor is not always particularly easy to find. There are, however, important special cases. A first-order linear differential equation has the form:

dy/dx + P(x)y = Q(x)
Equations of this type are soluble through the use of an integrating factor, μ(x) = exp ∫P(x) dx. Multiplying through yields μ(x) dy/dx + μ(x)P(x)y = μ(x)Q(x). The left-hand side becomes (μy)′, and the equation becomes d/dx(μy) = μQ. Integrating both sides and dividing by μ gives:

y = (1/μ) ∫ μQ dx
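As a worked instance of the integrating-factor method (example mine): for y′ + 2y = 4 with y(0) = 0, we have μ = e^(2x), giving y = 2(1 − e^(−2x)); the solution can be verified against the equation numerically:

```python
import math

# y' + 2y = 4, y(0) = 0. Integrating factor mu = e^(2x):
# (e^(2x) y)' = 4 e^(2x)  =>  e^(2x) y = 2 e^(2x) + C  =>  y = 2 + C e^(-2x).
# y(0) = 0 gives C = -2.
def y(x):
    return 2 * (1 - math.exp(-2 * x))

for x in [0.0, 0.5, 1.3, 2.0]:
    h = 1e-6
    y_prime = (y(x + h) - y(x - h)) / (2 * h)   # numeric derivative
    assert math.isclose(y_prime + 2 * y(x), 4.0, rel_tol=1e-6)
print("y = 2(1 - e^(-2x)) satisfies y' + 2y = 4")
```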
A second-order linear differential equation with constant coefficients has the form:

ay″ + by′ + cy = d(x)

In the case that d(x) = 0, the equation is said to be homogeneous (the definition here, which is more common, differs from that of the other type of homogeneous equation). Such an equation is solved by means of its auxiliary polynomial, in which the nth derivative of y becomes the nth power of m: am² + bm + c = 0. This equation is solved for its roots, m₁ and m₂, which form the basis of the solution of the differential equation:

1. The roots are real and distinct: The general solution is y = c₁e^(m₁x) + c₂e^(m₂x).
2. The roots are real and identical: The general solution is y = c₁e^(m₁x) + c₂xe^(m₁x).
3. The roots are a complex conjugate pair α ± iβ: The general solution is y = e^(αx)(c₁ cos βx + c₂ sin βx).
Linear Algebra
Matrices
An m × n matrix is a rectangular array of numbers that can be viewed as a stack of row vectors or a collection of column vectors. For a matrix A, the aᵢⱼ entry is in row i and column j. Matrices of the same size can be added and subtracted element by element. That is, for matrices A and B, (A ± B)ᵢⱼ = aᵢⱼ ± bᵢⱼ. Clearly, matrix addition and subtraction are commutative. Matrices can be multiplied by scalars, yielding a new matrix in which each element has been multiplied by that scalar.

Two matrices can be multiplied together only if the number of columns of the first equals the number of rows of the second, in which case (AB)ᵢⱼ = rᵢ(A) · cⱼ(B), where rᵢ(A) denotes the ith row vector of the matrix A and cⱼ(B) denotes the jth column vector of B. The identity matrix I is the square matrix whose entries are the Kronecker delta:

δᵢⱼ = 1 if i = j, 0 if i ≠ j
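A minimal sketch (mine, not from the source) of the row-by-column product rule and the role of the identity matrix:

```python
def matmul(A, B):
    # (AB)_ij = (row i of A) · (column j of B)
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))]
            for i in range(len(A))]

def identity(n):
    # Kronecker delta entries: 1 on the diagonal, 0 elsewhere
    return [[1 if i == j else 0 for j in range(n)] for i in range(n)]

A = [[1, 2], [3, 4]]
B = [[0, 1], [1, 0]]

assert matmul(A, identity(2)) == A == matmul(identity(2), A)
assert matmul(A, B) == [[2, 1], [4, 3]]      # AB swaps the columns of A
assert matmul(B, A) == [[3, 4], [1, 2]]      # BA swaps the rows: AB != BA
print("matrix multiplication is not commutative")
```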
An inverse of a matrix A is another matrix A⁻¹ such that the product AA⁻¹ = A⁻¹A = I. A matrix that has an inverse is called invertible. There are several methods for finding an inverse; one uses row reduction of an augmented matrix. (An augmented matrix for a linear system consists of the coefficient matrix with the constants from the right-hand side of the simultaneous equations appended on the right.)
A matrix is in echelon form when it is upper triangular with any zero rows at the bottom, with the first nonzero entry in any row appearing to the right of any nonzero entry in the row above. There are three elementary row operations used to reach this form: swapping two rows, multiplying a row by a nonzero constant, and adding a multiple of one row to another. Variables are solved for by working from the bottom up of the reduced matrix, a process known as back-substitution. To determine a unique solution of simultaneous equations in n unknowns, n equations are required. (It is also required that they be linearly independent, a concept we'll visit shortly.) In the case that infinitely many solutions to a system are possible, it may be the case that one or more of the variables are free, in which case they can be represented by parameters; the other variables are then expressed in terms of those parameters. A linear system can be written as Ax = b, where A is the coefficient matrix, x is the column vector of the unknowns, and b is the column vector of the constant terms to the right of the equal sign. In the case that A is invertible, the unique solution is x = A⁻¹b.
Vector Spaces
A vector space V is a set that is closed under the two operations addition and scalar multiplication. This means that for any x₁, x₂ ∈ V and any scalar k, we have x₁ + x₂ ∈ V and kx₁ ∈ V. It should be evident from these conditions that the zero vector is required to be an element of every vector space.
If A is an m × n matrix, the set of all vectors x such that Ax = 0 is called the nullspace of A, denoted N(A). The nullspace is a subspace of ℝⁿ. It should be apparent that an invertible matrix will only have the trivial subspace, containing just the zero vector, as its nullspace.
An expression of the form ∑_{i=1}^{m} k_i v_i, where the k_i represent scalars, is called a
linear combination of the vectors v_i.
The set of all possible linear combinations of the vectors v_i is called their span. For
example, given the standard unit vectors î, ĵ, and k̂, their span is ℝ³. The span of any
collection of n-vectors is a subspace of ℝⁿ. The
vectors v_i are called linearly independent if ∑_{i=1}^{m} k_i v_i = 0 ⇒ k_i = 0 for all i. This means
that no combination of some vectors cancels other vectors out, or that no vector can be
written as a linear combination of other vectors. If this is not the case, the vectors are
said to be linearly dependent. A minimal spanning set of vectors is called a basis for
the vector space they represent. The number of vectors in the basis is the dimension of
the space. Let A be an n × n matrix such
that {a₁, a₂, …, aₙ} form the columns of A. The vector set {a₁, a₂, …, aₙ} is linearly
independent if and only if A is invertible.
In an m × n matrix, the columns can be regarded as m-vectors, and the rows can
be regarded as n-vectors. The maximum number of linearly independent columns is
called the column rank, and the maximum number of linearly independent rows is called
the row rank. In any matrix, the column rank is equal to the row rank; this is simply the
rank of the matrix. The subspace of ℝᵐ spanned by the columns is called the column
space, CS( A) , and the subspace of ℝⁿ spanned by the rows is called the row space,
RS( A) . The system Ax = b has a solution precisely when b lies in the column space,
that is, when there are scalars k_i with ∑_{i=1}^{n} k_i c_i = b, where c_i are the column vectors.
This requires that Ak = b have a solution.
Determinants
The determinant is only defined for square matrices. For an n × n matrix A, we first
define the minor, M ij , as the determinant of the ( n − 1) × (n − 1) matrix that results from
eliminating row i and column j from matrix A. The cofactor of any matrix entry a_ij is
cof(a_ij) = (−1)^{i+j} M_ij. The determinant can then be computed by a
Laplace expansion along any row i or down any column j:
det A = ∑_{j=1}^{n} a_ij cof( a_ij ) = ∑_{i=1}^{n} a_ij cof( a_ij ).
This tells us that determinants can be evaluated along any row or column. This fact
makes it clear that any matrix with a column or row made up of only zeros will have a
zero determinant.
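The Laplace expansion translates directly into a recursive procedure. The sketch below (helper names minor and det are our own) computes determinants this way; it is fine for small matrices, though the running time grows factorially with n:

```python
def minor(A, i, j):
    # The submatrix obtained by deleting row i and column j.
    return [[A[r][c] for c in range(len(A)) if c != j]
            for r in range(len(A)) if r != i]

def det(A):
    """Determinant by Laplace (cofactor) expansion along the first row."""
    n = len(A)
    if n == 1:
        return A[0][0]
    # det A = sum over j of a_0j * (-1)^j * M_0j
    return sum(A[0][j] * (-1) ** j * det(minor(A, 0, j)) for j in range(n))

print(det([[1, 2], [3, 4]]))                    # -2
print(det([[2, 0, 0], [0, 3, 0], [0, 0, 4]]))   # 24
```

Expanding along a row or column with many zeros (such as the diagonal matrix above) minimizes the number of nonzero terms.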
The adjugate matrix of A is the transpose of the cofactor matrix of A. That is,
Adj A = [cof( a_ij )]ᵀ. This matrix is important in calculating the inverse of a matrix, since
A⁻¹ = (1 / det A) · Adj A.
Cramer’s rule allows a square linear system Ax = b to be solved exclusively by
determinants. Let A_j denote the matrix formed by replacing column j of A with the
column vector b. Then
x_j = det A_j / det A.
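For the 2 × 2 case, Cramer's rule can be written out directly; a minimal sketch (the function name cramer_2x2 is our own):

```python
def cramer_2x2(A, b):
    """Solve a 2x2 system by Cramer's rule: x_j = det(A_j)/det(A),
    where A_j is A with column j replaced by b."""
    det = lambda M: M[0][0] * M[1][1] - M[0][1] * M[1][0]
    d = det(A)
    if d == 0:
        raise ValueError("det A = 0: Cramer's rule does not apply")
    A1 = [[b[0], A[0][1]], [b[1], A[1][1]]]  # column 1 replaced by b
    A2 = [[A[0][0], b[0]], [A[1][0], b[1]]]  # column 2 replaced by b
    return det(A1) / d, det(A2) / d

# 2x + y = 5 and x + 3y = 10 give x = 1, y = 3.
print(cramer_2x2([[2, 1], [1, 3]], [5, 10]))  # (1.0, 3.0)
```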
The Wronskian can be helpful when evaluating functions for linear independence.
For functions f₁, …, fₙ that are differentiable at least n − 1 times, it is the
determinant whose first row is f₁(x), …, fₙ(x), whose second row is f₁′(x), …, fₙ′(x),
and whose rows continue with successively higher derivatives down to the (n − 1)th:
W[ f₁, f₂, …, fₙ ](x) = det [ f_j^{(i−1)}(x) ].
If the Wronskian is nonzero, the functions are linearly independent. A zero Wronskian,
however, does not by itself establish linear dependence.
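In the 2 × 2 case the Wronskian reduces to f·g′ − g·f′, which is easy to check numerically; the helper name wronskian2 below is our own, and the derivatives are supplied analytically:

```python
import math

def wronskian2(f, fp, g, gp, x):
    """2x2 Wronskian W[f, g](x) = f(x) g'(x) - g(x) f'(x),
    with the derivatives fp and gp supplied analytically."""
    return f(x) * gp(x) - g(x) * fp(x)

# For f = sin and g = cos:
# W = sin(x)(-sin(x)) - cos(x)cos(x) = -1 for every x,
# so sin and cos are linearly independent.
for x in (0.0, 1.0, 2.5):
    print(round(wronskian2(math.sin, math.cos,
                           math.cos, lambda t: -math.sin(t), x), 10))  # -1.0
```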
Linear Transformations
For vector spaces V and W, a linear transformation (or linear map) is a function
T: V → W satisfying T(x₁ + x₂) = T(x₁) + T(x₂) and T(kx) = kT(x) for all vectors and
scalars. Any m × n matrix A defines a function T: ℝⁿ → ℝᵐ
by T( x) = Ax , which gives a linear transformation. Conversely, if T : ℝⁿ → ℝᵐ is a linear
transformation, then T can be represented by an m × n matrix A. In the
case that ℝⁿ and ℝᵐ are considered with their standard bases, B = {ê_i}, in which there is a
1 in position i and 0 elsewhere, then column i in A is simply the image of ê_i.
Let T : ℝⁿ → ℝᵐ be a linear transformation. The set of all vectors x in ℝⁿ that
T maps to the zero vector is called the kernel of T. The kernel is a subspace of the
domain space, and its dimension is called the nullity of T. The set of all images of T is
called the range of T, and its dimension is the rank of T. If T is represented by the
matrix A, then the kernel of T is the
same as the nullspace of A, and the range of T is the same as the column space of A.
The rank plus nullity theorem states that the sum of the nullity and rank equals the
dimension of the domain, n.
Let T : ℝⁿ → ℝⁿ be a linear operator. If T is bijective, then T has an inverse, T⁻¹,
which is itself linear and is represented by the matrix A⁻¹.
If there is a scalar λ and a nonzero vector x such that Ax = λx,
then λ is called an eigenvalue (or characteristic value) of A, and the nonzero vector x
is called a corresponding eigenvector. Eigenvalues and eigenvectors are
defined only for square matrices. To find these values, we must find scalars and vectors
that satisfy the equation Ax = λ x , which can be rewritten as ( A − λ I ) x = 0 . This will
have nonzero solutions x precisely when det( A − λI ) = 0, the characteristic equation.
For 2 × 2 square matrices, this is a quadratic that must be solved for λ . Once the
eigenvalues are found, each is substituted into the equation ( A − λ I ) x = 0 and solved for
the corresponding eigenvector (which may be a family of vectors due to possible free
variables). It is an interesting property that ∑ λ = tr( A) , where tr( A) is the trace of the
matrix, the sum of its diagonal entries. Also, ∏ λ = det A . (In both cases, it is
understood that the sum and product are over all eigenvalues for A.)
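For a 2 × 2 matrix the characteristic quadratic can be solved directly. The sketch below (function name eig2 is our own) also illustrates the trace and determinant identities just stated:

```python
import cmath

def eig2(A):
    """Eigenvalues of a 2x2 matrix, from the characteristic equation
    det(A - λI) = λ² - tr(A)·λ + det(A) = 0."""
    (a, b), (c, d) = A
    tr, det = a + d, a * d - b * c
    disc = cmath.sqrt(tr * tr - 4 * det)   # complex sqrt handles complex roots
    return (tr + disc) / 2, (tr - disc) / 2

A = [[2, 1], [1, 2]]
l1, l2 = eig2(A)
print(l1, l2)            # (3+0j) (1+0j)
print(l1 + l2, l1 * l2)  # (4+0j) (3+0j): the trace is 4 and determinant is 3
```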
The Cayley-Hamilton theorem states that every square matrix satisfies its own
characteristic equation, in the sense that substituting A for λ turns it into a
matrix polynomial in A that equals the zero matrix. That is, if
det ( A − λ I ) = ∑_{i=0}^{n} α_i λ^i = 0, then ∑_{i=0}^{n} α_i A^i = 0, where we
consider A⁰ = I.
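The theorem is easy to spot-check for a 2 × 2 matrix, where the characteristic polynomial is λ² − tr(A)·λ + det(A); the helper matmul below is our own:

```python
def matmul(A, B):
    # Product of two 2x2 matrices.
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

# Cayley-Hamilton for 2x2: A² - tr(A)·A + det(A)·I should be the zero matrix.
A = [[1, 2], [3, 4]]
tr = A[0][0] + A[1][1]                        # 5
det = A[0][0] * A[1][1] - A[0][1] * A[1][0]   # -2
A2 = matmul(A, A)
I = [[1, 0], [0, 1]]
result = [[A2[i][j] - tr * A[i][j] + det * I[i][j] for j in range(2)]
          for i in range(2)]
print(result)  # [[0, 0], [0, 0]]
```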
Number Theory
If a and b are positive integers, the division algorithm says that we can find unique
integers q and r such that a = qb + r with 0 ≤ r < b. We apply the division algorithm in
sequence to find the greatest common divisor of two integers in the Euclidean
algorithm: Given two numbers, a and b (where we’ll assume a > b ), we know we can
find a quotient and remainder q₁ and r₁ such that a = q₁b + r₁ . We then apply the division
algorithm to the divisor and the first remainder: b = q₂r₁ + r₂ . We continue this process,
dividing each remainder into the previous divisor; the divisor in the step
that yields no remainder is the greatest common divisor, or gcd, of the two numbers. If
gcd( a, b) = 1 , then a and b are said to be relatively prime. The product of the greatest
common divisor and the least common multiple of two numbers is equal to the product of
the numbers themselves: gcd(a, b) · lcm(a, b) = ab.
The linear Diophantine equation ax + by = c has integer solutions exactly when
gcd(a, b) divides c. If (x₁, y₁) is one particular solution, then every solution has the form
x = x₁ + t · b/gcd(a, b),  y = y₁ − t · a/gcd(a, b)
for any t ∈ ℤ.
The greatest common divisor of any two numbers can always be written as a
linear combination of those two numbers. This is done by working backward through the
Euclidean algorithm and substituting in steps that lead to the original numbers.
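Working backward through the algorithm is exactly what the extended Euclidean algorithm automates; a standard iterative Python version (the function name extended_gcd is our own) carries the coefficients forward instead:

```python
def extended_gcd(a, b):
    """Return (g, x, y) with g = gcd(a, b) = a*x + b*y, carrying the
    linear-combination coefficients through each division step."""
    x0, y0, x1, y1 = 1, 0, 0, 1
    while b:
        q, r = divmod(a, b)
        a, b = b, r
        x0, x1 = x1, x0 - q * x1
        y0, y1 = y1, y0 - q * y1
    return a, x0, y0

g, x, y = extended_gcd(1071, 462)
print(g, x, y)             # 21 -3 7
print(1071 * x + 462 * y)  # 21, so gcd(1071, 462) = -3·1071 + 7·462
```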
Congruence
If ab ≡ ac mod n , then
• b ≡ c mod n if gcd( a, n) = 1 ;
• in general, b ≡ c mod n/gcd(a, n).
The linear congruence equation, ax ≡ b mod n , has a solution if and only if gcd( a, n)
divides b. If gcd( a, n) = 1 , the solution is unique mod n; if gcd( a, n) > 1 , the solution is
unique mod n/gcd(a, n).
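A short sketch that implements this criterion (the function name solve_congruence is our own), reducing by g = gcd(a, n) and then applying a modular inverse:

```python
from math import gcd

def solve_congruence(a, b, n):
    """Solve ax ≡ b (mod n). Returns all solutions in [0, n),
    or [] if gcd(a, n) does not divide b."""
    g = gcd(a, n)
    if b % g:
        return []
    # Reduce to (a/g)x ≡ (b/g) (mod n/g), where a/g is now invertible.
    a, b, m = a // g, b // g, n // g
    x0 = (b * pow(a, -1, m)) % m           # the unique solution mod n/g
    return [x0 + k * m for k in range(g)]  # the g solutions mod n

# 6x ≡ 4 (mod 10): gcd(6, 10) = 2 divides 4, giving two solutions.
print(solve_congruence(6, 4, 10))  # [4, 9], since 6·4 = 24 ≡ 4 and 6·9 = 54 ≡ 4
```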
Abstract Algebra
A binary operation on a set S is a function from S × S into S. Binary structures with
operations are typically written showing the set and the operation; for a binary operation
∘ on S, we might write ( S, ∘ ). A binary operation is associative if the
following equation always holds: a ∘ (b ∘ c) = ( a ∘ b) ∘ c . A binary structure whose binary
operation is associative is called a semigroup.
Given a binary structure, ( S, ∘ ) , an element e ∈ S with the property a ∘ e = e ∘ a = a
for every a ∈ S is called the identity. A semigroup with an identity is called a monoid.
If, for an element a ∈ S, there exists an element ā ∈ S
such that a ∘ ā = ā ∘ a = e , we call ā the inverse of a. A monoid with the property that
every element has an inverse is called a group.
If the binary operation of a group ( S, ∘ ) has the property that a ∘ b = b ∘ a for every
pair of elements, the group is called abelian (or commutative).
If the group ( S, ∘ ) contains precisely n elements for some positive integer n, then
the group is finite and has order n.
Cyclic Groups
A group G with the property that there exists an element a ∈ G such that
G = {aⁿ : n = 0, 1, 2, …} is said to be cyclic, and the element a is called the generator of
the group. (We clarify here that a⁰ is the identity, a¹ = a , a² = a ∘ a , etc.) A cyclic
group is always abelian.
Consider the set ℤₙ = {0, 1, 2, …, n − 1} and the group ( ℤₙ, ⊕ ) , the group whose
operation is addition mod n. This group is cyclic, and an element m ∈ ℤₙ is a generator if and
only if m is relatively prime to n. In more general terms, let G be a cyclic group with
generator a, and let n be the smallest positive integer such that aⁿ = e . Then the element aᵐ is a
generator of G if and only if gcd(m, n) = 1.
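These generator criteria are easy to check by brute force for (ℤₙ, ⊕); the helper names below are our own:

```python
from math import gcd

def generators(n):
    """Elements m of Z_n = {0, ..., n-1} that generate (Z_n, +):
    m is a generator iff gcd(m, n) = 1."""
    return [m for m in range(1, n) if gcd(m, n) == 1]

def cyclic_subgroup(m, n):
    """The subgroup of (Z_n, +) generated by m: the multiples of m mod n."""
    seen, x = [], 0
    while True:
        x = (x + m) % n
        seen.append(x)
        if x == 0:
            return sorted(seen)

print(generators(6))          # [1, 5]
print(cyclic_subgroup(5, 6))  # [0, 1, 2, 3, 4, 5] -> 5 generates all of Z_6
print(cyclic_subgroup(2, 6))  # [0, 2, 4]          -> 2 does not
```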
Subgroups
Let ( G, ∘ ) be a group. If there’s a subset H ⊆ G such that ( H , ∘ ) is also a group, then H
is called a subgroup of G. (If H is a proper subset of G, it’s known
as a proper subgroup of G.) Every group has at least two subgroups: the trivial
subgroup, {e} , consisting of just the identity element, and the group itself.
Let a be an element of a group G. With the binary operation defined on G, the set
{aⁿ : n ∈ ℤ}, also denoted by ⟨a⟩, is a subgroup of G called the cyclic subgroup
generated by a. It consists of all the integer powers of a. In the case that n is a negative
integer, aⁿ is understood to mean (a⁻¹)^(−n).
We can define the order of a group element as follows: the order of a ∈ G is the
order of the cyclic subgroup it generates; equivalently, it
is the smallest positive integer n such that aⁿ = e , where e is the identity. The cyclic subgroup
generated by a then contains exactly that many elements. If elements a_i ∈ G are given for
every i in some indexing set I, then the subgroup generated by {a_i} is the subgroup
consisting of all finite products of terms of the form a_i^{n_i} and is the smallest subgroup of G
containing all the elements a_i. If this subgroup is all of G, then we say that G is
generated by the elements a_i.
Let G be a finite, abelian group of order n. Then G has at least one subgroup
of order d for every positive divisor d of n.
Let G be a finite, cyclic group of order n. Then G has exactly one subgroup—
necessarily cyclic—of each order d dividing n. Finally, let G be a finite group of order p^k·m,
where p is a prime that does not divide m. Then G has at least one subgroup of
order p^k.
Isomorphisms
∘ | e a b        ⊕ | 0 1 2
e | e a b        0 | 0 1 2
a | a b e        1 | 1 2 0
b | b e a        2 | 2 0 1
We notice that, although the symbols are different, the structure is exactly the same; the
two groups are isomorphic. Isomorphic groups share all structural
properties, so if we know details about one group and also know that it is isomorphic to
another group, we know the other group’s details automatically. These details include
the order of the group, the number of subgroups of a particular order, whether it is cyclic
or abelian, etc.
Given groups G₁ and G₂, the Cartesian product G₁ × G₂, with its operation defined
componentwise, forms a group, called the direct product of the groups G₁ and G₂. If G₁ and G₂ are finite, and
G1 has order m and G2 has order n, then G1 × G2 has order mn. If G1 and G2 are both
abelian, then the notation G1 ⊕ G2 is sometimes used, and the resulting abelian group is
abelian, then the notation G₁ ⊕ G₂ is sometimes used, and the resulting abelian group is
called the direct sum of G₁ and G₂ . This definition can be generalized to any number of
groups. The group ℤ_{m₁} × ℤ_{m₂} × ⋯ × ℤ_{m_k} is cyclic and
isomorphic to ℤ_{m₁m₂⋯m_k} if and only if the integers m_i are pairwise relatively prime.
Every finite abelian group is isomorphic to a direct product of cyclic groups of
prime-power order, ℤ_{(p₁)^{k₁}} × ℤ_{(p₂)^{k₂}} × ⋯, where the p_i are (not necessarily
distinct) primes and the k_i are (not necessarily distinct) positive integers. The collection of
prime powers, (p_i)^{k_i}, for a given representation of G are known as the
elementary divisors of G. Alternatively, G can be written as ℤ_{m₁} × ⋯ × ℤ_{m_t}, where
each m_i divides m_{i+1}; the integers m_i are not necessarily distinct, but the list m₁, …, m_t is unique. These
integers are called the invariant factors of G.
Group Homomorphisms
Let ( G, ∘ ) and ( G′, ∗) be groups. A function φ : G → G′ with the property that
φ(a ∘ b) = φ(a) ∗ φ(b) for all a, b ∈ G is called a homomorphism. A bijective
homomorphism is called an isomorphism. We recall the earlier fact that two groups that
are structurally identical are isomorphic; this is true if and only if there exists a bijective
homomorphism between them. An isomorphism from a group onto itself is called an
automorphism. Some properties of homomorphisms:
• If H is a subgroup of G, then φ ( H ) is a subgroup of G′ , where
φ ( H ) = {φ (h) : h ∈ H } .
• If H ′ is a subgroup of G′ , then the inverse image φ⁻¹( H ′) is a subgroup of G, where
φ⁻¹( H ′) = {h ∈ G : φ (h) ∈ H ′} .
Rings
A set R, together with two binary operations (we’ll choose addition, +, and multiplication,
·), is called a ring, ( R, +, · ), provided that:
• ( R, + ) is an abelian group;
• ( R, · ) is a semigroup;
• The distributive laws hold; namely, for every a, b, c ∈ R , we have:
a · (b + c) = a·b + a·c and (a + b) · c = a·c + b·c.
If a subset S ⊆ R itself satisfies these requirements, we call ( S , +, · ) a subring of ( R, +, · ) . The characteristic of a ring is the
smallest integer n such that na = 0 for every a ∈ R . If no such n exists, as in the case of
the infinite rings ℤ, ℚ, ℝ, and ℂ, the ring is said to have characteristic zero. For cases
when char R > 0 and the ring has unity, it is sufficient to find the smallest n such that n · 1 = 0.
Ring Homomorphisms
A function φ : R → R′ between rings ( R, +, × ) and ( R′, ⊕, ⊗ ) is a ring homomorphism if,
for all a, b ∈ R,
φ (a + b) = φ (a ) ⊕ φ (b) and φ (a × b) = φ (a ) ⊗ φ (b).
• The kernel of a ring homomorphism is the set ker φ = {a ∈ R : φ (a) = 0′} , where 0′
is the zero element of R′. The kernel is a
subring of R.
• If φ carries unity to unity and r is a unit of R, then φ(r)⁻¹ = φ(r⁻¹), the image of the
inverse of r in R.
Fields
Let a be a nonzero element of a ring R with unity. Recall that the multiplicative structure
( R, · ) is not required to be a group, so a may not have an inverse. If it does have a
multiplicative inverse, a is called a unit. If every nonzero element of R is a unit—
namely, if ( R*, · ) is a group, where R* denotes the set of nonzero elements—then R is
called a division ring. A commutative division ring is called a field.
Additional Topics
Set Theory
For any sets A, B, and C, the following identities hold:
• ( A ∩ B )^C = A^C ∪ B^C and ( A ∪ B )^C = A^C ∩ B^C (De Morgan’s laws)
• ( A ∪ B) − C = ( A − C ) ∪ ( B − C ) and ( A ∩ B) − C = ( A − C ) ∩ ( B − C )
• A ∩ (B ∪ C ) = ( A ∩ B) ∪ ( A ∩ C ) and A ∪ (B ∩ C ) = ( A ∪ B) ∩ ( A ∪ C )
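These identities can be spot-checked on concrete finite sets with Python's built-in set type (the universe U below is our own choice, needed to form complements):

```python
# Spot-check De Morgan's laws and the distributive laws on small sets.
U = set(range(10))          # universe, used for complements
A = {1, 2, 3, 4}
B = {3, 4, 5, 6}
C = {4, 6, 8}

comp = lambda S: U - S      # S^C relative to U
assert comp(A & B) == comp(A) | comp(B)
assert comp(A | B) == comp(A) & comp(B)
assert (A | B) - C == (A - C) | (B - C)
assert (A & B) - C == (A - C) & (B - C)
assert A & (B | C) == (A & B) | (A & C)
assert A | (B & C) == (A | B) & (A | C)
print("all identities hold")
```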
Two sets are equivalent if there exists a bijection between them. If there exists a
bijection between a set and the positive integers, ℤ⁺, we say that the set is countably
infinite with cardinality ℵ₀ . Some sets are uncountably infinite, meaning that no
bijection exists between them and the positive integers. The cardinality of the reals is
2^{ℵ₀}, the cardinality of the continuum.
(The power set of A, ℘( A) , is the set of all subsets of A; it has cardinality 2^{card A}.)
Combinatorics
For k objects, there are k ! possible arrangements of their order, called a permutation.
For k objects chosen from n total objects, there are P ( n, k ) possible arrangements, where
P ( n, k ) = n! / (n − k )!.
If order is not important (as in the drawing of a lottery or a hand of cards), the
possibilities are referred to as combinations, and there are C (n, k ) possibilities, where
C (n, k ) = P(n, k ) / k! = n! / ((n − k )! k !).
We note that the combination C (n, k ) is equal to the binomial coefficient, which appears
in the binomial theorem:
(a + b)ⁿ = ∑_{k=0}^{n} C(n, k) a^{n−k} b^{k}.
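Python's math module exposes these counts directly; a quick check of P(n, k), C(n, k), and the binomial theorem:

```python
from math import comb, factorial, perm

# P(n, k) = n!/(n-k)!  and  C(n, k) = P(n, k)/k! = n!/((n-k)! k!)
n, k = 5, 2
print(perm(n, k))   # 20 ordered arrangements
print(comb(n, k))   # 10 unordered selections
assert comb(n, k) == factorial(n) // (factorial(n - k) * factorial(k))

# Binomial theorem check: the sum of C(n, j) a^(n-j) b^j equals (a+b)^n.
a, b = 2, 3
assert sum(comb(n, j) * a ** (n - j) * b ** j
           for j in range(n + 1)) == (a + b) ** n
print("binomial theorem verified for (2+3)^5 =", (a + b) ** n)
```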
A generalized statement of the pigeonhole principle says that if you are given n
different objects, each of which is painted one of c different colors, then for any integer
k with n > ck, at least one color must appear on at least k + 1 of the objects.
Probability and Statistics
An event space for a set S is a
nonempty subfamily of the power set of S, E ⊆ ℘( S ) , that satisfies the following two
conditions:
1. If A and B are in E , then so is their union, A ∪ B .
2. If A ∈ E , then so is A^C = S − A .
By definition, this set is nonempty, so it contains some set A, and by the second condition
listed above, it also contains the complement of A. By the first condition above, it also
contains the union of these sets, which means it contains all of S (and, necessarily, the
empty set).
A function P that assigns a real number P(A) to each event A and satisfies
1. P (∅ ) = 0 ,
2. P ( S ) = 1 , and
3. P ( A ∪ B ) = P ( A) + P ( B ) for disjoint events A and B
is called a probability
measure on S. The set S is called the sample space, the elements of S are called
outcomes, and the sets in E (which are subsets of S) are called events. With this in
mind, P ( A) is interpreted to be the probability that event A occurs. Two events, A and B,
are said to be independent if P ( A ∩ B ) = P( A) · P( B). For any two events:
• P ( A ∪ B ) = P ( A) + P ( B ) − P ( A ∩ B ) .
A Bernoulli trial is an experiment in which there are only two possible outcomes,
often termed a success or a failure. The probability of exactly k successes in n such trials
is given by
P(k; n, p) = C(n, k) p^k q^{n−k},
where p is the probability of a success on any given trial and q = 1 − p is the probability
of a failure. Taken together, these probabilities make up the binomial
distribution.
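A direct transcription of the binomial formula (the function name binom_pmf is our own):

```python
from math import comb

def binom_pmf(k, n, p):
    """P(k; n, p) = C(n, k) p^k q^(n-k): probability of exactly k
    successes in n Bernoulli trials with success probability p."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)

# Exactly 2 heads in 4 fair-coin tosses: C(4, 2)/16 = 6/16 = 0.375.
print(binom_pmf(2, 4, 0.5))   # 0.375
# The probabilities over k = 0..n sum to 1 (up to rounding error).
print(sum(binom_pmf(k, 10, 0.3) for k in range(11)))
```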
A random variable, X, is a function that assigns a real number to each outcome in the
sample space. For any real t, the set of outcomes for which X ≤ t is a subset of S, and
this subset is an event. We can associate a function to the random variable such that
its value at t is the probability of this event; we
abbreviate this by writing FX (t ) = P ( X ≤ t ) . This function gives the probability that the
random variable will take on a value no greater than t. We can also calculate the
probability that X falls in an interval, P(t₁ < X ≤ t₂) = FX(t₂) − FX(t₁), by noting
that the events X ≤ t₁ and t₁ < X ≤ t₂ are disjoint and considering their union, X ≤ t₂ . An
algebraic rearrangement of the probabilities gives the expression above. To close or open
the interval we’re considering, we need only to add or subtract (respectively) the
probability of equality at an endpoint.
Random variables are often defined so that the probability that X will equal any
particular value is zero. Meaningful results only come from considering an interval that
X can fall into. Such a random variable (and its distribution function) is called
continuous. In this case there is a nonnegative probability density function, f_X, that is
integrable, so that FX (t₂ ) − FX (t₁ ) = ∫_{t₁}^{t₂} f_X (t ) dt . It is also required that
∫_{−∞}^{∞} f_X (t ) dt = 1 . In terms of the density,
P( X ≤ t ) = FX (t ) = ∫_{−∞}^{t} f_X ( x) dx .
With the density in hand, we define the mean (or expectation) of X by:
E ( X ) = µ ( X ) = ∫_{−∞}^{∞} t f_X (t ) dt ,
and the variance of X by:
Var( X ) = σ ²( X ) = ∫_{−∞}^{∞} [t − µ ( X )]² f_X (t ) dt .
A normally distributed random variable with mean µ and standard deviation σ has the
probability density function
f_X (t ) = (1 / (σ√(2π))) exp( −(t − µ)² / (2σ²) ).
This function must be integrated numerically; it is often useful to consult tables for
values of the corresponding cumulative distribution.
When the mean is set to zero and the standard deviation is set to one, we get the
standard normal probability density for the standardized normal random variable Z:
f_Z (u ) = (1/√(2π)) exp( −u²/2 ),
where Z = (X − µ)/σ. Probabilities for X are then found by standardizing:
P (t₁ < X ≤ t₂ ) = P( (t₁ − µ)/σ < Z ≤ (t₂ − µ)/σ ).
The cumulative distribution function of the standard normal random variable is
commonly denoted by Φ :
Φ( z ) = ∫_{−∞}^{z} (1/√(2π)) e^{−u²/2} du ,
so that P( z₁ < Z ≤ z₂ ) = Φ( z₂ ) − Φ ( z₁ ) .
For a binomial random variable X, range probabilities are computed exactly as
P (a₁ ≤ X ≤ a₂ ) = ∑_{k=a₁}^{a₂} C(n, k) p^k q^{n−k},
but this can be unwieldy for large n. With µ = np and σ = √(npq), we can obtain an
accurate approximation by treating X as an approximately normal random variable with
that mean and standard deviation.
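A sketch comparing the exact binomial sum with the normal approximation (helper names are our own; the ±½ shift is the standard continuity correction):

```python
from math import comb, erf, sqrt

def Phi(z):
    """Standard normal CDF, computed from the error function."""
    return 0.5 * (1 + erf(z / sqrt(2)))

def binom_range_exact(a1, a2, n, p):
    q = 1 - p
    return sum(comb(n, k) * p ** k * q ** (n - k) for k in range(a1, a2 + 1))

def binom_range_normal(a1, a2, n, p):
    """Normal approximation with mu = np and sigma = sqrt(npq),
    using a continuity correction of +/- 1/2."""
    mu, sigma = n * p, sqrt(n * p * (1 - p))
    return Phi((a2 + 0.5 - mu) / sigma) - Phi((a1 - 0.5 - mu) / sigma)

n, p = 100, 0.5
print(binom_range_exact(45, 55, n, p))    # ≈ 0.73
print(binom_range_normal(45, 55, n, p))   # ≈ 0.73, agreeing to about 3 digits
```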
Point-Set Topology
A topology on a nonempty set X is a collection T of subsets of X such that:
1. ∅ and X are in T .
2. If O₁ and O₂ are in T , then so is their intersection, O₁ ∩ O₂ .
3. If {Oi }i∈I is any collection of sets from T , then their union, ∪_{i∈I} Oi , is also in T .
In sum, a topology is always closed under finite intersections and arbitrary unions. The
sets in T are known as open sets. A set is open if all of its elements are interior points,
which means that for every point p ∈ O , p is contained in an open interval that is itself
a subset of O. A
Hausdorff space is a topological space such that for every pair of distinct points, there
are disjoint open sets each containing one of the points.
For any nonempty set X, the collection {∅, X } is always a topology on X, called
the indiscrete (or trivial) topology. The power set of X, ℘( X ) , is also always a
topology on X, called the discrete topology. We
know that ℘( X ) ⊇ {∅, X } , and we say that these collections represent the extremes:
℘( X ) is the finest possible topology on X, and {∅, X } is the coarsest possible topology
on X.
Let ( X , T ) be a topological space, and let U be a nonempty subset of X. The
collection T_U = {O ∩ U : O ∈ T } is a topology on U, called the subspace (or relative)
topology, and its members are said to be open in U. It is important to point out that U
need not be open in X for this definition to apply.
Consider the sets O = ( −b, b ) and S = [ −a, a ] , both in ℝ, with a < b . Then
O ∩ S = S is open in the subspace topology T_S on S, even though S is
not open in ℝ.
The interior of a set A ⊆ X, denoted int( A) , is the union of all open sets contained
in A. The exterior of A, denoted ext( A) , is
the union of all open sets that do not intersect A; equivalently, we can say that
ext( A) = int( A^C ) . The boundary of A, denoted bd( A) , is the set of all x ∈ X such that
every open set containing x intersects both A and the complement of A; equivalently,
bd( A) consists of the points belonging to neither int( A) nor ext( A). A point x is a limit
point of A if every open set containing x contains a point of A other than x itself. The
set of all the limit points of A is called the derived set of A, denoted
A′. The closure of A satisfies
cl( A) = int( A) ∪ bd( A) = A ∪ A′ . A perhaps more useful definition says that the closure
of A is the smallest closed set containing A—the intersection of all closed sets that
contain A.
A set is closed if and only if it contains all of its boundary and limit points;
equivalently, A is closed if A = cl( A) . Also, a set is closed if and only if its complement
is open; a set is open if and only if its complement is closed. A set A is open if and only
if A = int( A) . It is possible for sets to be simultaneously open and closed. (Consider the
discrete topology, made up of the power set, which contains every possible subset, all of
which are considered open. But since the complement of every open subset is also
contained in the power set, every subset is both open and closed.)
Basis for a Topology
Let X be a nonempty set, and let B be a collection of subsets of X satisfying the
following properties:
1. Every x ∈ X is contained in at least one set in B .
2. If B1 and B2 are sets in B and x ∈ B1 ∩ B2 , then there exists a set B3 ∈ B such that
x ∈ B3 ⊆ B1 ∩ B2 .
The collection B is called a basis, and the sets in B are known as basis elements. A
basis is used to generate a topology in that the topology consists of all possible unions of
the basis elements. A subset O ⊆ X belongs to the topology T generated by B if, for
every x ∈ O , there is a basis element B ∈ B with x ∈ B ⊆ O . For topological spaces
( X , T_X ) and ( Y , T_Y ), the product topology on X × Y is generated by the basis
B = {O_X × O_Y : O_X ∈ T_X ∧ O_Y ∈ T_Y } .
Connectedness
Let ( X , T ) be a topological space. If there exist disjoint, nonempty open sets O₁
and O₂ whose union is X, then X is said to be disconnected. If the topology
contains no pair of disjoint, nonempty open sets whose union is X, then X is said to be a
connected space.
The following criteria can be used to identify connected spaces:
1. If A and B are connected and they intersect, then their union is also connected.
This holds for any number of connected sets, as long as their intersection is
nonempty.
2. Let A be a connected set, and let B be any set such that A ⊆ B ⊆ cl( A) . Then B is
connected.
4. Let X be a topological space with the property that any two points, x1 , x2 ∈ X , can
be joined by a continuous path in X. Then X is connected (such a space is called
path-connected).
Compactness
Let ( X , T ) be a topological space. A covering of X is a collection of subsets of X
whose union contains X; if all of the sets in the covering are open, it is an open covering. If
every open covering of X contains a finite sub-collection that also covers X, then X is said
to be a compact space. Compact spaces can be identified using the following criteria:
1. A subset of ℝⁿ is compact if and only if it is closed and bounded (the Heine–Borel
theorem). (A set is bounded if there exists an M such that
( x₁ )² + ( x₂ )² + ⋯ + ( xₙ )² < M for every point x = ( x₁, x₂, …, xₙ ) in the set.)
Metric Spaces
Let X be a nonempty set, and let d : X × X → ℝ be a real-valued function defined
on ordered pairs of points in X. The function d is said to be a metric on X if the
following hold for all x, y, z ∈ X:
1. d ( x, y ) ≥ 0 , with d ( x, y ) = 0 if and only if x = y .
2. d ( x, y ) = d ( y , x) .
3. d ( x, z ) ≤ d ( x, y ) + d ( y , z ) .
A set X together with a metric d is
called a metric space. For ε > 0 , the set Bd ( x, ε ) = { x′ ∈ X : d ( x, x′) < ε } is called an ε-
ball, the open ball of radius ε centered on x. The collection of all ε-balls forms a basis
for a topology on X, the metric topology. A set O is open in this topology if, for each x ∈ O,
there exists a positive number ε_x such that Bd ( x, ε_x ) ⊆ O , which says that every point in
O is an interior point.
Continuity
We recall the definition of a continuous function, which says that a function f
between metric spaces ( X₁, d₁ ) and ( X₂, d₂ ) is
continuous at the point x₀ if, for every ε > 0 , there exists a number δ > 0 such that
d₁( x, x₀ ) < δ implies d₂( f ( x), f ( x₀ ) ) < ε . In topological terms, f is continuous at
x₀ if for every open set O containing f ( x₀ ) , the inverse image f ⁻¹(O ) is an open set
containing x₀ . This is equivalent to saying f ⁻¹( B_{d₂}( f ( x₀ ), ε ) ) ⊇ B_{d₁}( x₀ , δ ) for suitable
δ given each ε.
More generally, a function f between topological spaces X₁ and X₂
is continuous if, for every open set O ⊆ X₂ (i.e., for every O ∈ T₂ ), the inverse image,
f ⁻¹(O), is open in X₁. If the topology on X₂ is generated by a basis, it is enough to
check that the inverse images of the basis elements are open in X₁ . The following are
also useful:
1. The set f ⁻¹(C ) is closed in X₁ for every closed subset C ⊂ X₂ . (In fact, this is
equivalent to continuity.)
2. A map f is called open if the image f (O) of every open set O in X₁ is
open in X₂ . (Note the difference between this and the definition of continuity.) If
f is a continuous bijection whose inverse is also continuous, then f is called a
homeomorphism. A homeomorphism between topological spaces is analogous to the
concept of an isomorphism between groups or rings. That is, if an isomorphism exists
between two algebraic structures, that structure is preserved, and they are algebraically
identical; in the same way, a homeomorphism preserves topological structure, and the
spaces are topologically identical.
Real Analysis
A nonempty set X ⊆ ℝ that is bounded above has a least number
u such that u ≥ x for every x ∈ X . This number is the least upper bound of X, also
known as the supremum of X, sup X. Similarly, a nonempty set bounded below has a greatest
number l such that l ≤ x for every x ∈ X . This number is called the greatest lower
bound, also known as the infimum of X or inf X . The supremum and infimum are
not required to be elements of X themselves.
A sequence ( x_n ) of real numbers is a Cauchy sequence if for every ε > 0 ,
no matter how small, there exists a number N such that, for m, n > N , |x_n − x_m| < ε . This
means that the successive elements of a Cauchy sequence grow arbitrarily close to each
other. Every Cauchy sequence of real numbers converges. Any metric space in which
every Cauchy sequence converges is said to be complete.
Lebesgue Measure
Let ℝ̄ denote the set ℝ ∪ {∞} . Let A be any subset of ℝ, and find a countable,
open covering of A by intervals of the form ( a_i , b_i ) , so that A ⊆ ∪_{i=1}^{∞} ( a_i , b_i ) . We define
the outer measure of A as
µ*( A) = inf { ∑_{i=1}^{∞} ( b_i − a_i ) : A ⊆ ∪_{i=1}^{∞} ( a_i , b_i ) },
the infimum of the total covering length over all such coverings. A set whose outer
measure is 0 is said to have
measure zero.
Every open set and every closed set in ℝ is Lebesgue measurable, and every finite or
countable union of measurable sets is
measurable. Not all of ℘( ℝ ) is measurable, but almost all subsets of ℝ that arise in
practice are measurable. Some properties of the measure and measurable sets follow:
1. The empty set is measurable and has measure zero. If M = {m} is a one-element
set, it is measurable with measure zero, and likewise for any countable set. An
interval is measurable with measure equal to its length, and a finite union of
disjoint intervals has measure equal to
the sum of the lengths of the intervals. If M is a measurable set that contains an
interval of length l, then µ( M ) ≥ l.
2. If { M_i }_{i=1}^{∞} is a countable collection
of disjoint, measurable sets, then µ( ∪_{i=1}^{∞} M_i ) = ∑_{i=1}^{∞} µ ( M_i ) .
3. If M₁ , M₂ ∈ M and M₁ ⊆ M₂ , then µ ( M₁ ) ≤ µ ( M₂ ) .
A real-valued function f is called measurable if the inverse image of every open set is a
measurable set. Since every open
subset of ℝ is measurable, it follows from our earlier definition of continuity that every
continuous function is measurable. The Lebesgue integral is defined for
functions that are measurable. The sum, difference, and product of measurable functions
are measurable as well.
A particularly important example is the characteristic function of a set A:
χ_A ( x) = 0 if x ∉ A, and χ_A ( x) = 1 if x ∈ A.
This function is measurable if and only if A is a measurable set. Using this function, we
can construct another function, the step function, which is a finite linear combination of
characteristic functions with real coefficients. Such a function typically takes the form
s ( x) = ∑_{i=1}^{n} a_i χ_{A_i} ( x) . Step functions provide the foundation for constructing the Lebesgue
integral: for every nonnegative measurable function f, there
exists a sequence of step functions, ( s₁ , s₂ ,…) , such that 0 ≤ s₁ ≤ s₂ ≤ … and for which
lim_{n→∞} s_n ( x) = f ( x) .
Let s = ∑_{i=1}^{n} a_i χ_{A_i} be a step function such that every set A_i is measurable. Then s is
integrable, with
∫ s dµ = ∑_{i=1}^{n} a_i µ ( A_i ) .
For a nonnegative measurable function f, the Lebesgue integral is defined as
∫ f dµ = sup { ∫ s dµ } ,
where the supremum is taken over all integrable, nonnegative step functions, s, such that
s ≤ f . For functions that are not everywhere nonnegative, we consider that every
measurable function can be written as f = f⁺ − f⁻,
where f⁺ = ½( |f| + f ) = max ( f , 0 ) , the positive part, and f⁻ = ½( |f| − f ) = max ( − f , 0 ) ,
the negative part. This means that an arbitrary measurable function f is Lebesgue
integrable if both its positive and negative parts are Lebesgue integrable, and
∫ f dµ = ∫ f⁺ dµ − ∫ f⁻ dµ .
Whereas the Riemann integral splits the domain of f into partitions and calculates areas
under a curve as those partitions become
infinitesimally fine, the Lebesgue integral splits the range of f into partitions and approximates f
from below by step functions, weighting each value by the measure of the set on which it is attained.
Complex Analysis
A complex number z = x + iy can be written in polar form as z = r (cos θ + i sin θ ) = re^{iθ},
where r = |z| and θ is the angle made with the positive real axis. The principal value of θ ,
denoted Arg z, satisfies −π < Arg z ≤ π.
Two complex numbers can be easily multiplied and divided in polar form:
z₁z₂ = r₁r₂ e^{i(θ₁+θ₂)} and z₁/z₂ = (r₁/r₂) e^{i(θ₁−θ₂)}.
The Taylor series expansion for the polar form of complex numbers provides us with
Euler’s formula, e^{iθ} = cos θ + i sin θ, and with it de Moivre’s formula:
( cos θ + i sin θ )ⁿ = cos nθ + i sin nθ .
The n distinct nth roots of unity are
{ e^{i(2πk/n)} = cos( 2πk/n ) + i sin( 2πk/n ) : k = 0, 1, …, n − 1 }.
The nth roots of any complex number can be found by multiplying the principal nth root
of that number by each of the nth roots of unity.
The logarithm of a complex number can be defined thanks to our ability to
express such a number exponentially. But we must recall that adding any integer
multiple of 2π to the argument leaves the number itself unchanged, since
e^{i(θ + 2πk)} = e^{iθ}.
Therefore, any nonzero complex number has infinitely many logarithms, but we define
the principal value by Log z = ln |z| + i Arg z . Using the concept of the complex
exponential, the sine and cosine are defined for complex arguments by
cos z = ( e^{iz} + e^{−iz} ) / 2 and sin z = ( e^{iz} − e^{−iz} ) / (2i).
The rest of the functions are defined by applying their familiar relations to the definitions
for the sine and cosine above. The trigonometric identities defined for real numbers are
valid for complex numbers as well. One major difference between the complex sine and
cosine and their real counterparts is that, while |cos x| ≤ 1 and |sin x| ≤ 1 for x ∈ ℝ, the
complex versions are unbounded, and the norms |sin z| and |cos z| can take on any
nonnegative real value. The hyperbolic cosine and sine are defined by
cosh x = ( e^x + e^{−x} ) / 2 and sinh x = ( e^x − e^{−x} ) / 2.
They share some properties with their non-hyperbolic namesakes, such as cosh 0 = 1 and
sinh 0 = 0 , cosh ( − x ) = cosh x and sinh ( − x ) = − sinh x . Then there is the identity
cosh 2 x − sinh 2 x = 1 , which is similar to the Pythagorean trig identity for the sine and
cosine.
From our definitions, we see that
cos(iz ) = cosh z
cosh(iz ) = cos z
sin(iz ) = i sinh z
sinh(iz ) = i sin z.
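These identities can be verified numerically with Python's cmath module at a few sample points:

```python
import cmath

# Spot-check the identities relating the trig and hyperbolic functions.
for z in (0.3 + 0.7j, -1.2 + 0.4j, 2 - 1j):
    assert cmath.isclose(cmath.cos(1j * z), cmath.cosh(z))
    assert cmath.isclose(cmath.cosh(1j * z), cmath.cos(z))
    assert cmath.isclose(cmath.sin(1j * z), 1j * cmath.sinh(z))
    assert cmath.isclose(cmath.sinh(1j * z), 1j * cmath.sin(z))
print("identities verified")
```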
With these formulas, we can develop equations that allow us to evaluate cos z and sin z ,
where z = x + iy , in terms of real x and y. Using the angle sum formulas and the relations
above, we obtain
cos z = cos x cosh y − i sin x sinh y and sin z = sin x cosh y + i cos x sinh y,
from which
|cos z|² = cos² x + sinh² y and |sin z|² = sin² x + sinh² y.
Derivatives of Complex-Valued Functions
Functions of a complex variable can be differentiated like their real counterparts.
In fact, a table of derivatives for real-valued functions will apply for complex-valued
functions. First, it is important to remark that f ( z ) = f ( x + iy ) = u ( x, y ) + iv ( x, y ) for
some real-valued functions u and v. We say that ℜ( f ) = u ( x, y ) , denoting the real part,
and ℑ( f ) = v ( x, y ) , the imaginary part. Note, however,
that there is a slight difference in how the definition of the derivative behaves, which
has far-reaching consequences. The definition
f ′( z ) = lim_{h→0} ( f ( z + h) − f ( z ) ) / h
looks the same for the complex variable z, but it is important to note here that h is also a
complex number with real and imaginary parts. Since the field ℂ is not ordered, h can
take infinitely many routes to zero. Applying the definition twice, with h approaching the
origin from the real axis in one case and the imaginary axis in another, we see that
∂u/∂x + i ∂v/∂x = f ′( z ) = ∂v/∂y − i ∂u/∂y
if f is differentiable. Equating the real and imaginary parts of the expression above, we
obtain the Cauchy-Riemann equations:
∂u/∂x = ∂v/∂y and ∂v/∂x = −∂u/∂y.
If f is differentiable throughout some open set
O, then the Cauchy-Riemann equations hold throughout that set. Differentiating the C-R
equations again (assuming the mixed second partials are continuous, hence equal), we find
∂²u/∂x² = (∂/∂x)(∂v/∂y) = ∂²v/∂x∂y and ∂²u/∂y² = (∂/∂y)(−∂v/∂x) = −∂²v/∂y∂x,
so that
∂²u/∂x² + ∂²u/∂y² = 0.
This is Laplace’s equation. A function satisfying this equation in some open set O is
said to be harmonic in O. Using the same method as above, we also see that
∂²v/∂x² + ∂²v/∂y² = 0.
Thus both component functions are harmonic:
∇²u = ∇²v = 0 ,
where ∇² = ∇ ⋅∇ is the scalar Laplacian operator. If f ( z ) is differentiable throughout
an open set O, then f is said to be analytic in O. If f is analytic,
then ℜ( f ) and ℑ( f ) , its component functions, must have continuous partial derivatives
satisfying the Cauchy-Riemann equations, and the
derivatives of all orders must also be analytic. This is a strong result without a
counterpart for real-valued functions.
Complex line integrals are taken along a curve C in the plane. For a curve C
parameterized by z(t), with t running over an interval with endpoints
a, b ∈ ℝ, we have:
∫_C f ( z ) dz = ∫_a^b f ( z (t ) ) ⋅ z ′(t ) dt = ∫_a^b (d/dt) F ( z (t ) ) dt = F ( z (b) ) − F ( z (a) ) ,
where (d/dz) F ( z ) = f ( z ) . This expression suggests two methods for attacking the line
integral, one of which may be easier than the other in a given situation. Using the first
method, the parameterization z(t) is substituted into f, expressing the integrand in terms of t, or x and
y. Then the product with z ′(t ) is calculated, and the integral is carried out with respect to
the parameter t over the limits of integration, a and b. The second method, which appears
after a change of variables, evaluates an antiderivative of f at the complex endpoints:
∫_a^b f ( z (t ) ) ⋅ z ′(t ) dt = ∫_{z(a)}^{z(b)} f ( z ) dz .
We note that this method changes the limits of integration from real numbers to complex
numbers, as one would expect after changing the variable with respect to which the
integral is performed.
Morera’s theorem states that if f ( z ) is continuous in an open, connected set
O ⊆ ℂ, and ∫_C f ( z ) dz = 0 for every closed curve C in O, then f ( z ) is analytic
in O.
Cauchy’s integral formula states that if f ( z ) is analytic on and within a
simple, closed path C surrounding the point z₀ , then
f ( z₀ ) = (1/(2πi)) ∮_C f ( z ) / ( z − z₀ ) dz .
We also know that the nth derivative is analytic at z₀ for all n ∈ ℕ, and we have
f ⁽ⁿ⁾( z₀ ) = ( n!/(2πi) ) ∮_C f ( z ) / ( z − z₀ )^{n+1} dz . This tells us that if we know the value of f ( z ) at
every point on a curve, then we also know its value—and the value of all its
derivatives—at every point enclosed by the curve. Cauchy’s inequality follows: if f is
analytic on and inside a circle of radius r centered at z₀ , and |f(z)| ≤ M on that
circle, then |f ⁽ⁿ⁾( z₀ )| ≤ n!M / rⁿ .
5. Liouville’s theorem: If f ( z ) is an entire function that is bounded, then f ( z )
must be constant. (A function is entire if it is analytic in the whole complex plane.)
Taylor Series for Complex-Valued Functions
Taylor series can be formed for complex-valued functions in the same way that they
can for real-valued functions. The only difference is that the idea of the interval of
convergence is replaced by the disk of convergence, with |z − z₀| < R representing the
open disk of radius R about the center z₀.
The power series ∑_{n=0}^{∞} a_n ( z − z₀ )ⁿ converges absolutely for all z that satisfy
|z − z₀| < R and diverges for all z such that |z − z₀| > R , where a_n = f ⁽ⁿ⁾( z₀ ) / n! for a
Taylor series and R is the radius of convergence. If
lim_{n→∞} |a_n|^{1/n} = 0 , then the series converges for all z. Every function that is analytic in an
open, connected subset O of the complex plane can be expanded in a power series whose
disk of convergence lies within O.
Let z0 be a point in the complex plane, and let R be a positive number. The set of
all z such that 0 < z − z0 < R is called the punctured open disk of radius R centered at z0 .
If f is not analytic at z₀ itself but is
analytic at every point in some punctured open disk centered at z₀ , then we call z₀ an
isolated singularity.
If f can be written in the form
f ( z ) = g ( z ) / ( z − z₀ )ⁿ,
where g is analytic at z₀ and g( z₀ ) ≠ 0, then z₀ is called a pole of order n. Poles are
among the most common ways in which
singularities occur.
A function analytic in an annulus centered at z₀ can be expanded there in a Laurent series:
∑_{n=1}^{∞} a_{−n} ( z − z₀ )^{−n} + ∑_{n=0}^{∞} a_n ( z − z₀ )ⁿ.
The sum on the right, referred to as the analytic part, is simply the Taylor series for the
function. The sum on the left, referred to as the singular (or principal) part, uses
negative powers of ( z − z₀ ); its coefficients are given by
a_{−n} = (1/(2πi)) ∮_C f ( z ) / ( z − z₀ )^{−n+1} dz ,
where C is a simple, closed curve contained in the annulus. (This definition aside, in
practice, the Laurent coefficients are typically derived algebraically from the Taylor
series.)
If the singular part of the Laurent series contains at least one term, but only
finitely many terms, then z0 is a pole. In particular, if the singular part has a− n = 0 for
all n greater than some integer k, where a− k ≠ 0 , then z0 is a pole of order k. If, however,
the singular part contains infinitely many terms, then z0 is an essential singularity.
Setting n = 1 in the coefficient formula above gives
a_{−1} = (1/(2πi)) ∮_C f ( z ) dz , which suggests that we can easily calculate the integral if we have
the value of that coefficient. That coefficient is called the residue of f ( z ) at the
singularity z₀, written Res ( z₀ , f ). For a pole of order k, it can be computed as
Res ( z₀ , f ) = (1/( k − 1)!) ⋅ lim_{z→z₀} (d^{k−1}/dz^{k−1}) [ ( z − z₀ )^k f ( z ) ].
The residue theorem states that if f is analytic on and inside a simple, closed curve C
except at isolated singularities z₁, …, z_M enclosed by C, then
∮_C f ( z ) dz = 2π i ⋅ ∑_{m=1}^{M} Res ( z_m , f ) .
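The residue theorem can be checked numerically by parameterizing a contour, just as in the line-integral discussion above; the helper name contour_integral is our own. For f(z) = 1/z, which has a simple pole at 0 with residue 1, the integral around the unit circle should be 2πi:

```python
import cmath

def contour_integral(f, radius=1.0, center=0j, n=20000):
    """Numerically integrate f around the circle |z - center| = radius,
    using the parameterization z(t) = center + r e^{it}, 0 <= t <= 2π."""
    total = 0j
    dt = 2 * cmath.pi / n
    for k in range(n):
        t = k * dt
        z = center + radius * cmath.exp(1j * t)
        dz = 1j * radius * cmath.exp(1j * t) * dt   # z'(t) dt
        total += f(z) * dz
    return total

I = contour_integral(lambda z: 1 / z)
print(I)                   # ≈ 2πi, i.e. about 6.2832j
print(2 * cmath.pi * 1j)   # the exact value predicted by the residue theorem
```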
Gallery of Graphs