A Tourist Brochure
John Duggan
November 29, 2013
Contents
1 Opening Remarks
2 Set Theory
3 Linear Algebra
4 Euclidean Topology
6 Differentiability
7 Lebesgue Measure
8 Differential Topology
9 Measurable Functions
10 Convergence of Measures
11 Convergence of Functions
12 Product Measurability
13 Metric Spaces
15 Transition Probabilities
16 Continuous Correspondences
17 Measurable Correspondences
18 Topological Spaces
20 Maximal Elements
A Technical Details
Opening Remarks
This survey is predicated on the assumption that there is some value in gathering
together a subset of mathematical tools in a systematic, if very selective and
terse, way. Aside from the obvious disadvantages of a cursory and incomplete
treatment, the advantages are that a variety of concepts can be distilled to
their essence, connections between concepts can be drawn more easily, and it
makes for relatively quick reading (because there are no proofs). The trade-off
is, basically, depth for breadth of understanding. I have tried to select tools that
would be especially useful to someone whose goal is mathematical modeling (at
least in the fields of economics or political science), rather than pure math. This
reflects, admittedly, both the preferences and the limitations of the author. It
may go without saying, but my intention in writing this is to provide an overview
of some aspects of mathematical analysis; it is not to write a math book that
is suitable for citation or as a substitute for serious study.
A distinguishing feature of the survey is that it does not attempt to present
definitions and results in their greatest generality; in some cases, I introduce
concepts in very simple terms, then return to them later in greater generality,
then again in even greater generality. I have deliberately developed the material
avoiding derivatives as linear approximations (focussing instead on directional
derivatives), with almost no reference to Borel sets or $\sigma$-algebras (relying on the
Lebesgue measurable sets for needed measurable structure), without mention
of normed linear spaces (limiting the discussion to metric properties of these
spaces) or the theory of linear operators, defining $L^p$-spaces only for real-valued
(rather than vector-valued) functions, and pushing discussion of abstract topological spaces to the end. This approach entails considerable redundancy and
some loss of generality (the absence of the Borel sets is particularly inconvenient), but I hope it is an effective pedagogical technique for those of us outside
the measure zero set of natural mathematicians, who see concepts more clearly
with less intervening structure.
The inspiration for the current endeavor is a pair of mathematical summaries
in outstanding books by Werner Hildenbrand and by Andreu Mas-Colell; both
provide excellent coverage of somewhat different slices of mathematics, and I
draw on some of that material here. For a better understanding, the reader
should consult any of a large number of deeper and more thorough references.
I list a few below, but the book by Aliprantis and Border is noteworthy for its
encyclopedic scope; many of the results presented here are just special cases of
more general statements in that book.
[1] C. Aliprantis and K. Border (2006) Infinite Dimensional Analysis: A
Hitchhiker's Guide, 3rd ed., Springer: New York, NY.
[2] C. Aliprantis and O. Burkinshaw (1990) Principles of Real Analysis, 2nd
ed., Academic Press: New York, NY.
Set Theory
Mathematical analysis takes the concept of a set and membership within a set
as primitive notions. Roughly, a set is a collection of objects that belong to
it, but this begs the questions, "What is a collection?" and "How does an
object belong to a collection?" Formally, we assume that sets and membership
satisfy the standard axioms of Zermelo-Fraenkel set theory,¹ which allows us
to capture the familiar number systems. In particular, the Zermelo-Fraenkel
axioms provide a way of working with the natural numbers $\mathbb{N}$, the integers $\mathbb{Z}$,
the rational numbers $\mathbb{Q}$, the real numbers $\mathbb{R}$, the non-negative real numbers $\mathbb{R}_+$,
and the positive real numbers $\mathbb{R}_{++}$. Normally, we will take as given a universe
of discourse that contains all objects of interest, and we consider sets, denoted
$A, B, X, Y$, etc., made up of objects, denoted $a, b, x, y$, etc., from the universe
of discourse. When referring specifically to natural numbers, we typically use
the variables $i, j, k, \ell, m, n$ to denote such objects. In general, we write $a \in A$
if $a$ belongs to $A$ and $a \notin A$ otherwise. We typically specify a set by listing its
elements between braces, e.g., $\{1, 2, 3\}$, or by defining a property possessed by
exactly the elements of the set, e.g., $\{x \in \mathbb{N} \mid 1 \le x \le 3\}$. In general, if the
universe of discourse is $X$ and if $p(x)$ is a well-formed formula containing just
one free variable $x$, then we can form the set of elements possessing the property
described by $p(x)$ as $\{x \in X \mid p(x)\}$, as long as the predicate function does not
refer to the set itself.²
The real numbers include the following important types of sets: the closed
interval $[a, b] = \{x \in \mathbb{R} \mid a \le x \le b\}$, the open interval
$(a, b) = \{x \in \mathbb{R} \mid a < x < b\}$, and the half-closed intervals
$[a, b) = \{x \in \mathbb{R} \mid a \le x < b\}$ and $(a, b] = \{x \in \mathbb{R} \mid a < x \le b\}$.
We also have half-lines, written $[a, \infty)$, $(-\infty, b]$, $(a, \infty)$,
and $(-\infty, b)$, where the infinity symbol indicates that the set extends without
bound in one direction. We sometimes work with the extended real numbers,
$\overline{\mathbb{R}} = \mathbb{R} \cup \{-\infty, \infty\}$, where by convention $\infty$ represents a quantity larger than
all other extended real numbers, and we write $x < \infty$ for all $x \in \overline{\mathbb{R}} \setminus \{\infty\}$.
Similarly, $-\infty$ is a quantity less than all other extended real numbers. Also by
1 For a brief but instructive overview, see Appendix A.2 of Dudley (2002) Real Analysis
and Probability, Cambridge University Press: Cambridge, MA.
2 The risk is Russell's paradox, in which we try to form the set of all sets that do not include
themselves.
[Figure: intersection, union, and relative complement]
$$\bigcap_{i=1}^{n} A_i = \{x \mid \text{for all } i = 1, \dots, n,\ x \in A_i\}$$
$$\bigcup_{i=1}^{n} A_i = \{x \mid \text{for some } i = 1, \dots, n,\ x \in A_i\}$$
for the intersection and union of the sets. The new sets consist, respectively, of
the elements belonging to all and to some of the original sets. Given a set $A$,
a collection $\{A_1, \dots, A_n\}$ of nonempty subsets of $A$ is a finite partition of $A$ if
$A = \bigcup_{i=1}^{n} A_i$ and it is pairwise disjoint, i.e., for all distinct $i, j = 1, \dots, n$, we
have $A_i \cap A_j = \emptyset$. In the above, the sets are indexed by $i$, which ranges over
the index set $\{1, 2, \dots, n\}$. When there is one set for each natural number, we
may index the sets by the natural numbers, as in $A_1, A_2, \dots$, and write
$$\bigcap_{i=1}^{\infty} A_i \quad\text{and}\quad \bigcup_{i=1}^{\infty} A_i$$
for the intersection and union of the collection, which consist, respectively, of
the elements belonging to all and to some of the sets. Finally, we can extend
these operations to arbitrary collections of sets. When $\mathcal{A}$ is a collection of sets,
we write
$$\bigcap \mathcal{A} = \{x \mid \text{for all } A \in \mathcal{A},\ x \in A\}$$
$$\bigcup \mathcal{A} = \{x \mid \text{for some } A \in \mathcal{A},\ x \in A\}$$
Given one set for each natural number and indexing the sets as $A_1, A_2, \dots$, we
define two limiting operations: the limit supremum (or limsup), denoted
$$\limsup A_n = \bigcap_{m=1}^{\infty} \bigcup_{n=m}^{\infty} A_n,$$
consists of any element $x$ that belongs to infinitely many $A_n$, and the limit
infimum (or liminf), denoted
$$\liminf A_n = \bigcup_{m=1}^{\infty} \bigcap_{n=m}^{\infty} A_n,$$
consists of any $x$ that eventually (for $n$ high enough) belongs to all $A_n$. When
$\limsup A_n = \liminf A_n$, we write $\lim A_n$ for this limit.
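As a rough illustration of the two limiting operations, the sketch below evaluates the intersection-of-unions and union-of-intersections formulas for a periodic sequence of finite sets. The function name `lim_sup_inf` and the `horizon` parameter are hypothetical, and the computation is only valid under the stated assumption that the sequence repeats a fixed finite pattern, so that every tail considered contains a full period.

```python
def lim_sup_inf(period, horizon=12):
    """Return (lim sup, lim inf) for the sequence that repeats `period` forever.

    Assumption: the sequence is periodic, so a finite horizon is enough as long
    as each tail we look at still contains one full period.
    """
    p = len(period)
    if horizon < p:
        raise ValueError("horizon must cover at least one period")
    seq = [set(period[n % p]) for n in range(horizon)]
    starts = range(horizon - p + 1)  # tails that contain a full period
    tail_unions = [set().union(*seq[m:]) for m in starts]
    tail_intersections = [set.intersection(*seq[m:]) for m in starts]
    lim_sup = set.intersection(*tail_unions)      # in infinitely many A_n
    lim_inf = set().union(*tail_intersections)    # in all A_n eventually
    return lim_sup, lim_inf


if __name__ == "__main__":
    # A_n alternates between {1, 2} and {2, 3}.
    print(lim_sup_inf([{1, 2}, {2, 3}]))
```

In the alternating example, 1 and 3 each belong to infinitely many of the sets but not to all of them eventually, while 2 belongs to every set, so the two limits differ and $\lim A_n$ is not defined.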
Note that the natural numbers satisfy the well-ordering axiom with respect to
the usual "greater than or equal to" relation: for every nonempty subset $Y$ of
natural numbers, there exists $x \in Y$ such that for all $y \in Y$, we have $y \ge x$;
this important property enables proof by induction. This axiom is not satisfied
by the real numbers with respect to the "greater than or equal to" relation, for
there is no smallest positive real number. The real numbers do satisfy the
completeness axiom, which states that if a nonempty subset is bounded below,
then it has an infimum. Formally, $A \subseteq \mathbb{R}$ is bounded below if there exists $c \in \mathbb{R}$
such that for all $x \in A$, we have $c \le x$. If $A$ is nonempty and bounded below,
then the infimum (or greatest lower bound) of $A$, denoted $\inf A$, is the greatest
lower bound of $A$, i.e., it is a lower bound of $A$, and it satisfies $\inf A \ge c$ for
all other lower bounds $c$. By convention, if $A$ is empty, then $\inf A = \infty$; and
if $A$ is not bounded below, then $\inf A = -\infty$. The completeness axiom simply
states that if $A$ is nonempty and bounded below, then it does indeed have an
infimum, in which case the infimum is unique. Note that the rational numbers
do not satisfy the completeness axiom.
A set $A \subseteq \mathbb{R}$ is bounded above if there exists $c \in \mathbb{R}$ such that for all $x \in A$,
we have $x \le c$. If $A$ is nonempty and bounded above, then the supremum (or
least upper bound) of $A$, denoted $\sup A$, is the least upper bound of $A$, i.e., it
is an upper bound of $A$, and it satisfies $\sup A \le c$ for all other upper bounds
$c$. By convention, if $A$ is empty, then $\sup A = -\infty$; and if $A$ is not bounded
above, then $\sup A = \infty$. By the completeness axiom, every nonempty set that is
bounded above has a supremum, in which case it is unique. A set $A$ is bounded
if it is bounded above and below. A real number $x \in \mathbb{R}$ is a maximum of $A$,
denoted $\max A$, if it is a supremum of $A$ and $x \in A$; it is a minimum of $A$,
denoted $\min A$, if it is an infimum of $A$ and $x \in A$. The well-ordering property
of the natural numbers can then be rephrased as saying that every nonempty
subset has a minimum with respect to the usual ordering.
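The conventions for empty sets can be made concrete for finite sets of numbers, where the infimum and supremum coincide with the minimum and maximum whenever the set is nonempty. The helper names `inf`, `sup`, and `truncated_min` below are hypothetical and are meant only as a sketch; the last helper hints at why the infimum of $\{1/n \mid n \in \mathbb{N}\}$ is $0$ even though no element of the set attains it.

```python
import math
from fractions import Fraction


def inf(A):
    """Infimum of a finite set of reals; +infinity for the empty set by convention."""
    return min(A) if A else math.inf


def sup(A):
    """Supremum of a finite set of reals; -infinity for the empty set by convention."""
    return max(A) if A else -math.inf


def truncated_min(N):
    """Minimum of {1/n : n = 1, ..., N}; it stays positive but shrinks toward 0."""
    return min(Fraction(1, n) for n in range(1, N + 1))


if __name__ == "__main__":
    print(inf(set()), sup(set()))
    print(inf({3, 1, 2}), sup({3, 1, 2}))
    print(truncated_min(10), truncated_min(1000))
```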
Given two sets, $A$ first and $B$ second, an ordered pair is a list, typically denoted
$(x, y)$, consisting of an element of $A$ in the first component and an element of $B$
in the second, and the set of all ordered pairs is the Cartesian product $A \times B$.
This concept is distinct from the set $\{x, y\}$, because we now distinguish between
$(x, y)$ and $(y, x)$ (when $x$ and $y$ are distinct). To extend this idea, suppose we
have a finite number $n$ of sets $A_1, \dots, A_n$. An ordered $n$-tuple $(a_1, \dots, a_n)$ of
elements from these sets is a list such that $a_i$ is an element of $A_i$ for $i = 1, \dots, n$.
The set of all such ordered $n$-tuples is the Cartesian product, written $A_1 \times \cdots \times A_n$,
or simply $\prod_{i=1}^{n} A_i$. If the sets are identical, i.e., $A_i = A$ for all $i = 1, \dots, n$,
then this can be simplified further to $A^n$. And when $A = \mathbb{R}$, we refer to $\mathbb{R}^n$ as
Euclidean space, and $n$ is the dimension of the space.
We can extend this idea one step further to take the product of an infinite number
of sets, one for each natural number and indexed as $A_1, A_2, \dots$. An element of
the Cartesian product $\prod_{i=1}^{\infty} A_i$ is a sequence, which we can denote $(a_1, a_2, \dots)$,
such that $a_i \in A_i$ for each $i = 1, 2, \dots$. When the sets in the product are the
same, say $A_i = X$ for each $i = 1, 2, \dots$, we can write $X^{\infty}$ for the Cartesian
product, and an element $(x_1, x_2, \dots)$ of $X^{\infty}$ is a sequence in $X$. Although the
parenthetic notation extends the formalism of ordered $n$-tuple, it is common
convention to denote a sequence in $X$ using set-theoretic notation $\{x_i\}$.
Technically, this is not entirely correct, because the list representation for sets does
not entail a particular ordering of elements, while we count two sequences that
differ with respect to the ordering of elements as distinct.
Given sets $X$ and $Y$, a function (or mapping) from $X$ to $Y$ assigns to each
element $x \in X$ a particular element $y \in Y$, which may vary with $x$. Such
a function is often denoted $f$, or more informatively, $f \colon X \to Y$, and we let
$f(x) \in Y$ be the element assigned to $x$. Here, $X$ is the domain of the function
and $Y$ is the codomain. When $X \subseteq Y$, a trivial example of a function $f \colon X \to Y$
is the identity function, defined by $f(x) = x$. Given $A \subseteq X$ and $B \subseteq Y$, the
image of $A$ and preimage of $B$ are, respectively,
$$f(A) = \{f(x) \mid x \in A\} \quad\text{and}\quad f^{-1}(B) = \{x \in X \mid f(x) \in B\}.$$
and
respectively, with weak and strict lower contour sets defined analogously, with
the reverse inequalities.
Given $Y \subseteq X$ and $f \colon X \to \mathbb{R}$, an element $x \in Y$ is a maximizer of $f$ over $Y$
if $f(x)$ is a maximum of $f(Y)$, i.e., if for all $y \in Y$, $f(x) \ge f(y)$; it is simply
a maximizer if it maximizes $f$ over its domain. The concept of minimizer is
defined similarly; in fact, $x$ minimizes $f$ if and only if it maximizes $-f$. The
maximum value of $f$ over $Y$, when it is well-defined, is sometimes denoted
$\max_{x \in Y} f(x)$, and the minimum value is denoted $\min_{x \in Y} f(x)$. Denote the set
of maximizers of $f$ over $Y$ by $\arg\max_{x \in Y} f(x)$, and denote the minimizers of $f$
over $Y$ by $\arg\min_{x \in Y} f(x)$.
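For a finite set $Y$, the set of maximizers can be computed directly from the definition. The following is a minimal sketch; the names `arg_max` and `arg_min` are hypothetical, and the second function simply uses the observation that $x$ minimizes $f$ exactly when it maximizes $-f$.

```python
def arg_max(f, Y):
    """Return the set of maximizers of f over a finite collection Y."""
    Y = list(Y)
    if not Y:
        return set()  # no maximizers over the empty set
    best = max(f(y) for y in Y)
    return {y for y in Y if f(y) == best}


def arg_min(f, Y):
    """Minimizers of f are exactly the maximizers of -f."""
    return arg_max(lambda y: -f(y), Y)


if __name__ == "__main__":
    Y = range(-2, 3)
    print(arg_max(lambda x: x * x, Y))  # two maximizers: the endpoints
    print(arg_min(lambda x: x * x, Y))  # a single minimizer
```

The example shows that the set of maximizers need not be a singleton, which is why it is defined as a set rather than as a single element.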
We say a set $A$ is countable if there is an injective mapping $f \colon A \to \mathbb{N}$, and it
Linear Algebra
Recall that the product of $n$ copies of the real line is a finite-dimensional
Euclidean space, where $n$ is the dimension of the space. An element of $\mathbb{R}^n$ is an
ordered $n$-tuple $x = (x_1, \dots, x_n)$ called a vector. Notable vectors are the zero
vector with zeroes in each coordinate, denoted simply $0$, and the $i$th unit
coordinate vector $e_i = (0, \dots, 0, 1, 0, \dots, 0)$, which is defined for each $i = 1, \dots, n$ and
has a one in the $i$th coordinate and zeroes elsewhere. We can multiply a vector
$x = (x_1, \dots, x_n) \in \mathbb{R}^n$ by a scalar $\alpha \in \mathbb{R}$ (applying $\alpha$ to each coordinate of $x$)
and add two vectors $x, y \in \mathbb{R}^n$ (summing coordinate by coordinate), writing the
new vectors as $\alpha x$ and $x + y$, respectively. Formally,
$$\alpha x = (\alpha x_1, \dots, \alpha x_n) \quad\text{and}\quad x + y = (x_1 + y_1, \dots, x_n + y_n).$$
$$\{x + y \mid x \in A,\ y \in B\}$$
and
$$H^{++}_{p,c} = \{x \in \mathbb{R}^n \mid p \cdot x > c\}.$$
Given a linear function $f \colon \mathbb{R}^n \to \mathbb{R}$ with non-zero gradient $p$ and value $c$, the
hyperplane of $f$ at $c$, denoted $H_{p,c}$, is the level set of $f$ at $c$, i.e.,
$H_{p,c} = \{x \in \mathbb{R}^n \mid p \cdot x = c\}$. Clearly, $H_{p,c}$ is a linear space if and only if $c = 0$,
which holds if and only if $H_{p,c}$ contains the zero vector. In general, a hyperplane
is any level set of a linear function with non-zero gradient. Let two sets
$X, Y \subseteq \mathbb{R}^n$ be convex and disjoint; then a weak version of the separating
hyperplane theorem establishes that the two sets can be separated by a linear
function with non-zero gradient, i.e., there is a linear function $f$ with gradient
$p \ne 0$ such that for all $x \in X$ and all $y \in Y$, we have $f(x) = p \cdot x \le p \cdot y = f(y)$.
A sharper form of this result is stated with the help of topological ideas in Section 4.
[Figure: convex sets $X$ and $Y$ separated by the hyperplane $f = c$ with gradient $p$]
Given a convex set $X \subseteq \mathbb{R}^n$, a function $f \colon X \to \mathbb{R}$ is concave if for all distinct
$x, y \in X$ and all $\alpha \in (0, 1)$, we have $f(\alpha x + (1 - \alpha) y) \ge \alpha f(x) + (1 - \alpha) f(y)$;
and the definition of a strictly concave function is identical but with strict
inequality replacing weak. The definition of convex and strictly convex function
is obtained by replacing the weak and strict "greater than" relation, respectively,
with weak and strict "less than." We say $f \colon X \to \mathbb{R}$ is quasi-concave if for all
distinct $x, y \in X$ and all $\alpha \in (0, 1)$, we have $f(\alpha x + (1 - \alpha) y) \ge \min\{f(x), f(y)\}$;
and the definition of strictly quasi-concave function is identical but with strict
inequality replacing weak. And, as above, the definition of quasi-convex and
strictly quasi-convex function is obtained by replacing "greater than" with "less
than." The set of maximizers of a quasi-concave function is convex, and the set
of maximizers of a strictly quasi-concave function is either singleton or empty;
analogous statements for minimizers of (strictly) quasi-convex functions apply.
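The two inequalities can be spot-checked numerically for functions of one variable. The sketch below tests them on a finite grid with a few weights $\alpha$; the function names, the grid, and the tolerance `tol` are all hypothetical choices, and passing the check is only evidence, not a proof, since the definitions quantify over all points and all weights.

```python
def is_concave_on(f, points, alphas=(0.25, 0.5, 0.75), tol=1e-12):
    """Check f(a x + (1-a) y) >= a f(x) + (1-a) f(y) on the given sample."""
    return all(
        f(a * x + (1 - a) * y) >= a * f(x) + (1 - a) * f(y) - tol
        for x in points for y in points if x != y
        for a in alphas
    )


def is_quasi_concave_on(f, points, alphas=(0.25, 0.5, 0.75), tol=1e-12):
    """Check f(a x + (1-a) y) >= min{f(x), f(y)} on the given sample."""
    return all(
        f(a * x + (1 - a) * y) >= min(f(x), f(y)) - tol
        for x in points for y in points if x != y
        for a in alphas
    )


if __name__ == "__main__":
    grid = [i / 4 for i in range(-8, 9)]  # sample of the interval [-2, 2]
    print(is_concave_on(lambda x: -x * x, grid))        # concave
    print(is_concave_on(lambda x: x ** 3, grid))        # not concave
    print(is_quasi_concave_on(lambda x: x ** 3, grid))  # monotone, so quasi-concave
    print(is_quasi_concave_on(lambda x: x * x, grid))   # not quasi-concave
```

The cubic illustrates that quasi-concavity is strictly weaker than concavity: any monotone function of one variable is quasi-concave, whether or not it is concave.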
Given vectors $x_1, \dots, x_m \in \mathbb{R}^n$, we say $\{x_1, \dots, x_m\}$ is linearly dependent if
there exist scalars $\alpha_1, \dots, \alpha_m \in \mathbb{R}$ (not all zero) such that
$$\alpha_1 x_1 + \alpha_2 x_2 + \cdots + \alpha_m x_m = 0,$$
and the set is linearly independent if it is not linearly dependent. It turns out
that $\{x_1, \dots, x_m\}$ is linearly independent if and only if no vector $x_j$ belongs to the span of
the other vectors. The rank of $\{x_1, \dots, x_m\}$ is the size of the largest linearly
independent subset of the set; this is positive as long as at least one $x_j$ is
nonzero. Given a linear space $L$, a basis is a linearly independent set that spans $L$.
Every nontrivial linear space has a basis consisting of no more than $n$ vectors;
the bases of a given linear space all contain the same number of elements; and
this number is the dimension of $L$, written $\dim(L)$. In particular, $\dim(\{0\}) = 0$,
and the rank of $\{x_1, \dots, x_m\}$ is the dimension of its span.
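One way to compute the rank of a finite set of vectors is row reduction. The sketch below uses Gaussian elimination with exact rational arithmetic; this is a standard technique swapped in for illustration rather than something described in the text, and the names `rank` and `is_linearly_independent` are hypothetical.

```python
from fractions import Fraction


def rank(vectors):
    """Rank of a list of equal-length vectors, via Gaussian elimination."""
    rows = [[Fraction(v) for v in vec] for vec in vectors]
    ncols = len(rows[0]) if rows else 0
    r = 0  # number of pivot rows found so far
    for c in range(ncols):
        # Find a row at or below position r with a nonzero entry in column c.
        pivot = next((i for i in range(r, len(rows)) if rows[i][c] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        # Eliminate column c from all rows below the pivot row.
        for i in range(r + 1, len(rows)):
            factor = rows[i][c] / rows[r][c]
            rows[i] = [a - factor * b for a, b in zip(rows[i], rows[r])]
        r += 1
    return r


def is_linearly_independent(vectors):
    """A set of m vectors is linearly independent exactly when its rank is m."""
    return rank(vectors) == len(vectors)


if __name__ == "__main__":
    print(rank([(1, 0, 0), (0, 1, 0), (1, 1, 0)]))  # third vector is in the span
    print(is_linearly_independent([(1, 2), (3, 4)]))
    print(is_linearly_independent([(1, 2), (2, 4)]))
```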
A function $f \colon \mathbb{R}^n \to \mathbb{R}^m$ takes vector values, $f(x) = (f_1(x), \dots, f_m(x))$, and
we say that such a function is linear if each component function $f_j$ is linear,
$j = 1, \dots, m$, extending the above definition. Given a linear function
$f = (f_1, \dots, f_m) \colon \mathbb{R}^n \to \mathbb{R}^m$, two linear spaces are of interest:
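$$\ker(f) = \{x \in \mathbb{R}^n \mid f(x) = 0\} \quad\text{and}\quad \operatorname{range}(f) = \{f(x) \mid x \in \mathbb{R}^n\},$$
and by the fundamental theorem of linear algebra,
$$\dim(\ker(f)) + \dim(\operatorname{range}(f)) = n.$$
Letting $p_1, \dots, p_m$ be the gradients corresponding to the linear component
functions of $f$, $\ker(f) = \{0\}$ if and only if the rank of $\{p_1, \dots, p_m\}$ is $n$,
which holds if and only if for each $y \in \operatorname{range}(f)$, there is a unique $x \in \mathbb{R}^n$ such
that $y = f(x)$, i.e., $f$ is injective. If, in addition, $m = n$, then $\operatorname{range}(f)$ has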
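$$A = \begin{bmatrix} a_{1,1} & \cdots & a_{1,n} \\ \vdots & \ddots & \vdots \\ a_{m,1} & \cdots & a_{m,n} \end{bmatrix},$$
$$At = \begin{bmatrix} \sum_{j=1}^{n} t_j a_{1,j} \\ \vdots \\ \sum_{j=1}^{n} t_j a_{m,j} \end{bmatrix}
\quad\text{and}\quad
s'A = \begin{bmatrix} \sum_{i=1}^{m} s_i a_{i,1} & \cdots & \sum_{i=1}^{m} s_i a_{i,n} \end{bmatrix}.$$
These operations are associative when combined, i.e., $(s'A)t = s'(At)$, and
produce a $1 \times 1$ matrix, i.e., a number. When $A$ is square, say $n \times n$, the
product is simply
$$t'At = \sum_{i=1}^{n} \sum_{j=1}^{n} t_i t_j a_{i,j}.$$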
Of course, we can interpret the $i$th row of $A$ as the gradient $p_i = (a_{i,1}, \dots, a_{i,n})$ of
a linear function $f_i$, $i = 1, \dots, m$, in which case $Ax$ is the vector $(f_1(x), \dots, f_m(x))$
for each $x \in \mathbb{R}^n$. As this observation suggests, a linear function $f \colon \mathbb{R}^n \to \mathbb{R}^m$
has a unique matrix representation $A$ with the $i$th row of $A$ equal to the gradient
of $f_i$; and conversely, every $m \times n$ matrix $A$ determines a unique linear function
$f \colon \mathbb{R}^n \to \mathbb{R}^m$ with the gradient of $f_i$ equal to the $i$th row of $A$.
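The row-as-gradient reading of a matrix translates directly into code: each coordinate of $Ax$ is the inner product of one row with $x$. The sketch below represents a matrix as a list of rows; the names `dot`, `apply_matrix`, and `quad_form` are hypothetical, and the last function evaluates the double sum $\sum_i \sum_j t_i t_j a_{i,j}$ for a square matrix.

```python
def dot(p, x):
    """Inner product of two vectors of equal length."""
    return sum(pi * xi for pi, xi in zip(p, x))


def apply_matrix(A, x):
    """Return Ax, i.e., (f_1(x), ..., f_m(x)) where row i of A is the gradient of f_i."""
    return [dot(row, x) for row in A]


def quad_form(A, t):
    """Return the number t'At for a square matrix A."""
    return dot(t, apply_matrix(A, t))


if __name__ == "__main__":
    A = [[1, 2], [3, 4], [5, 6]]  # a 3 x 2 matrix: a linear map from R^2 to R^3
    print(apply_matrix(A, [1, 1]))
    print(quad_form([[2, 1], [1, 3]], [1, -1]))
```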
The row rank of the $m \times n$ matrix $A$ is the rank of the set of vectors making up
its rows, and the column rank of $A$ is the rank of its columns; another form of
the fundamental theorem of linear algebra is that these quantities are the same,
and we can then unambiguously refer to the rank of a matrix. It has full column
(resp. row) rank if the rank of the columns is $n$ (resp. of the rows is $m$); equivalently,
the columns (resp. rows) are linearly independent. We say an $n \times n$ matrix $A$
has full rank (or is non-singular) if its row and column rank equal $n$; otherwise,
it is singular; it is symmetric if $A = A'$, or equivalently if $a_{i,j} = a_{j,i}$ for all
Euclidean Topology
Euclidean topology refers to the study of intervals, balls, and their generalization
to higher dimensions. Formally, the open ball of radius $r > 0$ around a vector
$x \in \mathbb{R}^n$ is the set $B_r(x) = \{y \in \mathbb{R}^n \mid \|y - x\| < r\}$ of vectors strictly
within distance $r$ of $x$, depicted in the figure for $n = 2$. The sphere of radius
$r > 0$ around $x$ is the set $S_r(x) = \{y \in \mathbb{R}^n \mid \|y - x\| = r\}$ of vectors
that are exactly distance $r$ from $x$; in the figure, this set consists of the dashed
boundary of the open ball. The disc of radius $r$ around $x$, denoted
$D_r(x) = \{y \in \mathbb{R}^n \mid \|y - x\| \le r\}$, is the union of the open ball and sphere of
radius $r$ around $x$.
[Figure: the open ball $B_r(x)$ of radius $r$ around $x$, with dashed boundary]
A vector $x$ in $\mathbb{R}^n$ is an interior point of a set $Y \subseteq \mathbb{R}^n$ if there exists $\epsilon > 0$ such
that $B_\epsilon(x) \subseteq Y$; it is a boundary point of $Y$ if for all $\epsilon > 0$, $B_\epsilon(x) \cap Y \ne \emptyset$ and
$B_\epsilon(x) \setminus Y \ne \emptyset$; and it is a closure point of $Y$ if for all $\epsilon > 0$, $B_\epsilon(x) \cap Y \ne \emptyset$. The
boundary of $Y$, denoted $\mathrm{bd}(Y)$, consists of all of its boundary points; the interior
of $Y$, denoted $\mathrm{int}(Y)$, consists of all of its interior points, i.e., $\mathrm{int}(Y) = Y \setminus \mathrm{bd}(Y)$;
and the closure of $Y$, denoted $\mathrm{clos}(Y)$, consists of all interior and boundary
points, i.e., $\mathrm{clos}(Y) = \mathrm{int}(Y) \cup \mathrm{bd}(Y)$. A set $Y$ is open if $Y = \mathrm{int}(Y)$, and it is
closed if $Y = \mathrm{clos}(Y)$; equivalently, $Y$ is open if it is disjoint from its boundary,
and it is closed if it contains its boundary. Note that $\mathrm{int}(Y)$ is the largest
open subset of $Y$, in the sense that if $G \subseteq Y$ is open, then $G \subseteq \mathrm{int}(Y)$. Also,
$\mathrm{clos}(Y)$ is the smallest closed set containing $Y$, so that if $F \supseteq Y$ is closed, then
$\mathrm{clos}(Y) \subseteq F$. Furthermore, a set is closed if and only if its complement is open,
and it is open if and only if its complement is closed. It is easily verified that
$\emptyset$ and $\mathbb{R}^n$ are both open; arbitrary unions of open sets are open; and finite
intersections of open sets are open. Furthermore, $\emptyset$ and $\mathbb{R}^n$ are both closed;
finite unions of closed sets are closed; and arbitrary intersections of closed sets
are closed. Of course, closed intervals and closed half-lines are closed subsets of
the real line, closed half-spaces are closed subsets of $\mathbb{R}^n$, and any finite subset
of $\mathbb{R}^n$ is automatically closed.
Recall that a sequence in $\mathbb{R}^n$ is a countably infinite list $(x^1, x^2, \dots) \in (\mathbb{R}^n)^{\infty}$ of
vectors indexed by the natural numbers $m \in \mathbb{N}$, with the convention being that
we instead use set-theoretic notation, as in $\{x^m\}$.⁴ We say a sequence $\{x^m\}$
converges to a vector $x$ if it eventually becomes arbitrarily close to $x$: for all
$\epsilon > 0$, there exists $k$ such that for all $m \ge k$, we have $x^m \in B_\epsilon(x)$. This is
written $x^m \to x$ or $\lim_{m \to \infty} x^m = x$, the sequence is convergent, and $x$ is the
limit of the sequence. A sequence $\{\alpha_m\}$ of real numbers diverges to infinity,
written $\lim_{m \to \infty} \alpha_m = \infty$, if for all $c \in \mathbb{R}$, there exists $k$ such that for all $m \ge k$,
we have $\alpha_m \ge c$. Similarly, the sequence diverges to negative infinity, written
$\lim_{m \to \infty} \alpha_m = -\infty$, if $\{-\alpha_m\}$ diverges to infinity. In either case, the sequence
is divergent. The sequence is increasing if higher indices are assigned to weakly
higher numbers, i.e., $m \ge k$ implies $\alpha_m \ge \alpha_k$; and it is decreasing if $\{-\alpha_m\}$
is increasing. Write $\alpha_m \uparrow \alpha$ if the sequence is increasing and converges to $\alpha$,
and write $\alpha_m \downarrow \alpha$ if it is decreasing and converges to $\alpha$. If a sequence of real
numbers is increasing and bounded above, or if it is decreasing and bounded
below, then it is convergent. Of course, a sequence is strictly increasing if it is
increasing and has no repetitions, i.e., $m \ne k$ implies $\alpha_m \ne \alpha_k$, and it is strictly
decreasing if $\{-\alpha_m\}$ is strictly increasing.
As an example, given a sequence $\{\alpha_m\}$ of non-negative real numbers, the
sequence $\{\beta_m\}$ of partial sums defined by $\beta_m = \sum_{k=1}^{m} \alpha_k$ is called a series and
is increasing. In particular, we have
$$\sum_{m=1}^{\infty} \frac{1}{2^m} = \lim_{m \to \infty} \sum_{k=1}^{m} \frac{1}{2^k} = 1.$$
A sequence $\{\alpha_m\}$ of real numbers converges asymptotically to $\alpha$ if it converges to
$\alpha$ but does not hit it infinitely often: $\alpha_m \to \alpha$ and there exists $k$ such that
for all $m \ge k$, we have $\alpha_m \ne \alpha$. Given a
sequence $\{\alpha_m\}$ in $\mathbb{R}$, define $\limsup_{m \to \infty} \alpha_m = \lim_{m \to \infty} \sup\{\alpha_k \mid k \ge m\}$, and
define $\liminf_{m \to \infty} \alpha_m = \lim_{m \to \infty} \inf\{\alpha_k \mid k \ge m\}$. These ideas should not be
confused with $\limsup A_n$ and $\liminf A_n$ in the context of set theory.
4 Note that we now use superscripts to index elements of a sequence, because subscripts
may appear to refer to the coordinate of a vector.
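The geometric series and the tail suprema behind the limsup can be explored with exact arithmetic. The sketch below is illustrative only: the names `partial_sum`, `alpha`, `tail_sup`, and `tail_inf` are hypothetical, the example sequence $\alpha_m = (-1)^m (1 + 1/m)$ is an assumption chosen for illustration, and the tails are truncated at a finite `horizon`, which happens to be harmless here because the extreme values of each tail occur at its first even or odd index.

```python
from fractions import Fraction


def partial_sum(m):
    """Partial sum of 1/2^k for k = 1, ..., m."""
    return sum(Fraction(1, 2 ** k) for k in range(1, m + 1))


def alpha(m):
    """Example sequence oscillating between values near +1 and near -1."""
    return (-1) ** m * (1 + Fraction(1, m))


def tail_sup(seq, m, horizon):
    """sup{seq(k) : m <= k <= horizon}, a finite stand-in for sup{seq(k) : k >= m}."""
    return max(seq(k) for k in range(m, horizon + 1))


def tail_inf(seq, m, horizon):
    """inf{seq(k) : m <= k <= horizon}, a finite stand-in for inf{seq(k) : k >= m}."""
    return min(seq(k) for k in range(m, horizon + 1))


if __name__ == "__main__":
    for m in (1, 5, 10):
        print(m, partial_sum(m), 1 - partial_sum(m))  # the gap to 1 halves each step
    for m in (10, 100):
        print(m, tail_sup(alpha, m, 1000), tail_inf(alpha, m, 1000))
```

The partial sums increase toward 1, while the tail suprema of the example sequence decrease toward 1 and the tail infima increase toward $-1$, so its limsup and liminf differ and the sequence does not converge.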
Given two sequences $\{x^m\}$ and $\{y^m\}$ in $\mathbb{R}^n$, we say $\{y^m\}$ is a subsequence of
$\{x^m\}$ if there is a mapping $\varphi \colon \mathbb{N} \to \mathbb{N}$ that is strictly increasing and such that
for all $m \in \mathbb{N}$, we have $y^m = x^{\varphi(m)}$. Intuitively, the subsequence $\{y^m\}$ is the
result of deleting elements of the original sequence $\{x^m\}$. For another way of
viewing a subsequence of $\{x^m\}$, let $\{m_k\}$ be a strictly increasing sequence of
natural numbers indexed by $k$; then for each $k$, $x^{m_k}$ is an element of the original
sequence, and $\{x^{m_k}\}$ is a selection of elements from the original, one for each
natural number $k$. Thus, using different indexing variables for clarity, $\{y^k\}$ is a
subsequence of $\{x^m\}$ if and only if there is a strictly increasing sequence $\{m_k\}$
of natural numbers such that for all $k = 1, 2, \dots$, we have $y^k = x^{m_k}$. For this
reason, a subsequence of $\{x^m\}$ is usually written $\{x^{m_k}\}$, where it is understood
that $\{m_k\}$ is a strictly increasing sequence of natural numbers indexed by $k$. In
some contexts, there is actually no need to have special notation for a
subsequence of $\{x^m\}$, and we can then just use the notation for the original sequence.
A sequence $\{x^m\}$ converges to $x$ in $\mathbb{R}^n$ if and only if every subsequence of $\{x^m\}$
converges to $x$.
A set $Y \subseteq \mathbb{R}^n$ is closed if and only if the limits of all convergent sequences in
$Y$ belong to $Y$. Given a sequence $\{Y_m\}$ of subsets of $\mathbb{R}^n$, define the topological
limit superior and the topological limit inferior of the sequence, respectively, as
$$\mathrm{ls}(\{Y_m\}) = \left\{ x \in \mathbb{R}^n \;\middle|\; \text{for all } \epsilon > 0 \text{ and all } n, \text{ there exists } m \ge n \text{ such that } Y_m \cap B_\epsilon(x) \ne \emptyset \right\}$$
and
$$\mathrm{li}(\{Y_m\}) = \left\{ x \in \mathbb{R}^n \;\middle|\; \text{for all } \epsilon > 0, \text{ there exists } n \text{ such that for all } m \ge n,\ Y_m \cap B_\epsilon(x) \ne \emptyset \right\}.$$
The former consists of limits of all subsequences of elements from the sets $\{Y_m\}$,
and the latter consists of limits of all sequences of elements from the sets. Both
sets are closed. For the case of a sequence $\{x^m\}$, which can be viewed as a
sequence of singleton sets $Y_m = \{x^m\}$, the set $\mathrm{ls}(\{x^m\})$ is referred to as the
accumulation points of the sequence, and $\{x^m\}$ converges to $x$ if and only if
$\mathrm{ls}(\{x^m\}) = \mathrm{li}(\{x^m\}) = \{x\}$. Given a sequence $\{\alpha_m\}$ of real numbers that
is bounded above, it follows that $\limsup_{m \to \infty} \alpha_m = \sup(\mathrm{ls}(\{\alpha_m\}))$, i.e., the
limsup of the sequence identifies its largest accumulation point; similar remarks
hold for the liminf of a sequence that is bounded below.
A set $Y \subseteq \mathbb{R}^n$ is bounded if there exists $r > 0$ such that $Y \subseteq B_r(0)$. It is compact
if it is closed and bounded; equivalently, $Y$ is compact if every sequence in $Y$
has a subsequence that converges to an element of $Y$. We say a collection $\mathcal{U}$
of open subsets of $\mathbb{R}^n$ is an open cover of $Y$ if $Y \subseteq \bigcup \mathcal{U}$, and we say $\mathcal{U}'$ is a
subcover of $\mathcal{U}$ if it is an open cover of $Y$ and $\mathcal{U}' \subseteq \mathcal{U}$. Then $Y$ is compact if
and only if every open cover of $Y$ has a finite subcover.⁵ Let $\mathcal{K}$ be a collection
of closed subsets of $K$ satisfying the finite intersection property, which means
that for all $m$ and all $K_1, \dots, K_m \in \mathcal{K}$, we have $\bigcap_{j=1}^{m} K_j \ne \emptyset$. If $K$ is compact,
then the collection has nonempty intersection, i.e., $\bigcap \mathcal{K} \ne \emptyset$. Conversely, if
every collection $\mathcal{K}$ of closed subsets of $K$ with the finite intersection property
has nonempty intersection, then $K$ is compact. It is easily verified that $\emptyset$ is
compact; finite unions of compact sets are compact; and arbitrary intersections
of compact sets are compact. Finite subsets are automatically compact. If a
set $K$ is compact, then every closed subset of $K$ is compact. A special feature
of $\mathbb{R}^n$ is that the convex hull of a compact set is compact. Every nonempty,
compact subset of the real line has a maximum and a minimum.
A set $Y \subseteq \mathbb{R}^n$ is connected if there do not exist open sets $U, V \subseteq \mathbb{R}^n$ such that
$U \cap V = \emptyset$, $Y \subseteq U \cup V$, $U \cap Y \ne \emptyset$, and $V \cap Y \ne \emptyset$. In words, $Y$ is connected
if it is not possible to break $Y$ into two parts by intersecting it with two disjoint
open sets. Every convex set is connected (but not vice versa).
Given $x \in \mathbb{R}^n$, a function $f \colon \mathbb{R}^n \to \mathbb{R}$ is continuous at $x$ if for every sequence
$\{x^m\}$ in $\mathbb{R}^n$ converging to $x$, we have $f(x^m) \to f(x)$. It is continuous if it is
continuous at every $x \in \mathbb{R}^n$; equivalently, for every open set $G \subseteq \mathbb{R}$, the preimage
$f^{-1}(G)$ is open. Note that the Euclidean norm $\|x - y\|$ in $\mathbb{R}^n$ is continuous.
More precisely, denote vectors in $\mathbb{R}^{2n}$ by $(x, y)$, where $x, y \in \mathbb{R}^n$, and define
$f \colon \mathbb{R}^{2n} \to \mathbb{R}$ by $f(x, y) = \|x - y\|$; then $f$ is continuous. We can combine
continuous functions to make new continuous functions. Let $f, g \colon \mathbb{R}^n \to \mathbb{R}$ be
continuous at $x \in \mathbb{R}^n$, let $h \colon \mathbb{R} \to \mathbb{R}$ be continuous at $f(x)$, and let $\alpha, \beta \in \mathbb{R}$.
Then:
1. $\alpha f + \beta g$ is continuous at $x$,
2. $f g$ is continuous at $x$,
3. $\max\{f(x), g(x)\}$ is continuous at $x$,
4. $h \circ f$ is continuous at $x$.
If $f \colon \mathbb{R}^n \to \mathbb{R}$ is continuous and $K \subseteq \mathbb{R}^n$ is compact, then by Weierstrass's
theorem, the image $f(K)$ is compact, and thus there exists a maximizer of $f$
over $K$. By the intermediate value theorem, if $f \colon \mathbb{R}^n \to \mathbb{R}$ is continuous and
$Y \subseteq \mathbb{R}^n$ is connected, then the image $f(Y)$ is convex.
We can now revisit the separating hyperplane theorem and give a more general
condition under which the separation result holds. Let two sets $X, Y \subseteq \mathbb{R}^n$ be
5 By the Heine-Borel theorem, a set in $\mathbb{R}^n$ is closed and bounded if and only if every open
cover of the set has a finite subcover; and by the Bolzano-Weierstrass theorem, every sequence
in a set in $\mathbb{R}^n$ has a convergent subsequence with limit in the set if and only if every open
cover of the set has a finite subcover.
Note the difference between this open ball and the open balls defined above for
Euclidean space: this one is restricted to $X$. A sequence $\{x^m\}$ in $X$ converges
in the relative topology to $x \in X$ if for all $\epsilon > 0$, there exists $k$ such that for
all $m \ge k$, we have $x^m \in B^X_\epsilon(x)$. We can then define the notions of relative
boundary, interior, and closure just as we did above. For example, given $Y \subseteq X$,
we say $x$ is a boundary point of $Y$ relative to $X$ if for all $\epsilon > 0$, we have
$B^X_\epsilon(x) \cap Y \ne \emptyset$ and $B^X_\epsilon(x) \setminus Y \ne \emptyset$. As the other definitions are structurally
identical to the originals, there is no need to repeat all of them. We may use
the notation $\mathrm{bd}_X$, $\mathrm{int}_X$, and $\mathrm{clos}_X$ to distinguish the relative versions of these
concepts.
We then say $Y \subseteq X$ is open relative to $X$ (or open in the relative topology
on $X$, or open in $X$, or just relatively open) if $Y$ is disjoint from its relative
boundary; equivalently, if for all $x \in Y$, there is some $\epsilon > 0$ such that $B^X_\epsilon(x) \subseteq Y$.
A set $Y$ is closed relative to $X$ (or closed in the relative topology on $X$,
7 The
this is equivalent to the original definition, because $U$ and $V$ are open relative
to $X$ if and only if there exist open sets $U', V' \subseteq \mathbb{R}^n$ such that $U = U' \cap X$ and
$V = V' \cap X$. Thus, we do not define a concept of connectedness relative to $X$.
Given $X \subseteq \mathbb{R}^n$, the definition of a continuous function $f \colon X \to \mathbb{R}$ is nearly
identical to the original: given $x \in X$, a function $f \colon X \to \mathbb{R}$ is continuous at
$x$ if for every sequence $\{x^m\}$ converging to $x$ in $X$, we have $f(x^m) \to f(x)$.
It is continuous if it is continuous at every $x \in X$; equivalently, for every
open $G \subseteq \mathbb{R}$, the preimage $f^{-1}(G)$ is relatively open in $X$. Again, we can
combine continuous functions to make continuous functions: given continuous
$f, g \colon X \to \mathbb{R}$, the functions $\alpha f + \beta g$, $f g$, and $\max\{f(x), g(x)\}$ are continuous;
and compositions of continuous functions are continuous. Again, the image
of a compact set under a continuous function is compact; and the image of a
connected set under a continuous function is convex. Notions of convergence of a
sequence $\{f_m\}$ of functions $f_m \colon X \to \mathbb{R}$ to $f \colon X \to \mathbb{R}$ extend straightforwardly:
the sequence converges uniformly if $\sup\{|f_m(x) - f(x)| \mid x \in X\} \to 0$, and it
converges pointwise if for all $x \in X$, $f_m(x) \to f(x)$.
More generally, a vector-valued mapping $f \colon X \to \mathbb{R}^m$ is continuous if each
component function is continuous; equivalently, for every $x \in X$ and every
sequence $\{x^m\}$ converging to $x$ in $X$, $f(x^m) \to f(x)$; equivalently, for every
open set $G \subseteq \mathbb{R}^m$, the preimage $f^{-1}(G)$ is open relative to $X$. Again, we can
combine continuous functions to make continuous functions, and compositions
of continuous functions are continuous: if $f \colon X \to \mathbb{R}^m$ is continuous, if $Y \subseteq \mathbb{R}^m$
contains $f(X)$, and if $g \colon Y \to \mathbb{R}^k$ is continuous, then $g \circ f$ is continuous. By
Weierstrass's theorem, the image of a compact set under a continuous function
is compact; and by the intermediate value theorem, the image of a connected
set under a continuous function is connected. Given any finite set $X \subseteq \mathbb{R}^n$, note
that every subset $Y \subseteq X$ is open relative to $X$ and compact, and every function
$f \colon X \to \mathbb{R}$ is continuous.
Given a set $X \subseteq \mathbb{R}^n$ and a function $f \colon X \to \mathbb{R}^n$, an element $x \in X$ is a fixed point of
$f$ if $f(x) = x$. By Brouwer's fixed point theorem, if $X$ is nonempty, compact, and
convex, and if $f$ is continuous with $f(X) \subseteq X$, then the function has at least one
fixed point. There may, however, be multiple fixed points, e.g., when $f$ is the identity
function. A function $f \colon X \to \mathbb{R}^n$ is a contraction mapping on $X$ if $f(X) \subseteq X$
and there exists $\beta \in [0, 1)$ such that for all $x, y \in X$, $\|f(x) - f(y)\| \le \beta \|x - y\|$,
in which case $\beta$ is the modulus of the contraction. By the contraction mapping
theorem, if $X$ is a nonempty, closed subset of $\mathbb{R}^n$ and $f \colon X \to \mathbb{R}^n$ is a contraction
mapping on $X$, then $f$ has a unique fixed point.
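For a contraction mapping, the unique fixed point can be approximated by iterating the function from any starting point; this is the usual constructive argument behind the contraction mapping theorem, though the text does not spell it out. The sketch below handles the one-dimensional case only, and the name `fixed_point`, the tolerance, and the iteration cap are hypothetical choices.

```python
import math


def fixed_point(f, x0, tol=1e-12, max_iter=10_000):
    """Iterate x -> f(x) until successive values differ by at most tol.

    Assumption: f is a contraction on a closed set containing x0 and all iterates.
    """
    x = x0
    for _ in range(max_iter):
        nxt = f(x)
        if abs(nxt - x) <= tol:
            return nxt
        x = nxt
    raise RuntimeError("iteration did not converge within max_iter steps")


if __name__ == "__main__":
    # f(x) = x/2 + 1 is a contraction on R with modulus 1/2; its fixed point is 2.
    print(fixed_point(lambda x: 0.5 * x + 1, 0.0))
    print(fixed_point(lambda x: 0.5 * x + 1, 100.0))
    # cos maps [0, 1] into itself and is a contraction there (|sin| <= sin(1) < 1).
    print(fixed_point(math.cos, 0.5))
```

Starting the first example from two different points illustrates uniqueness: both runs approach the same value.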
Another variation on the usual Euclidean topology is the product topology,
which starts with subsets $X_i \subseteq \mathbb{R}^{n_i}$, $i = 1, \dots, k$, where we let $n = \sum_{i=1}^{k} n_i$.
We then consider the product $X = \prod_{i=1}^{k} X_i$, which consists of ordered $k$-tuples
$x = (x^1, \dots, x^k)$ with component $x^i$ belonging to set $X_i$, $i = 1, \dots, k$. We define
$$\prod_{i=1}^{k} B^{X_i}_r(x^i).$$
continuous. The converse does not hold. For example, define $f : [0, 1]^2 \to \mathbb{R}$ as
follows: if $y = 0$, then $f(y, z) = 0$; otherwise, $f(y, z) = \max\left\{1 - \frac{|y - z|}{y}, 0\right\}$. This
function is continuous in each argument, but the sequence $\{(\frac{1}{m}, \frac{1}{m})\}$ converges
to zero, while $f(\frac{1}{m}, \frac{1}{m}) = 1 \not\to 0 = f(0, 0)$. Thus, continuity in each argument
separately does not imply joint continuity.
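The counterexample can be checked directly. The sketch below evaluates the function along the diagonal and along each axis; the sample values of $m$ and the variable names are illustrative assumptions.

```python
def f(y, z):
    """f(y, z) = 0 if y == 0, else max(1 - |y - z| / y, 0), on [0, 1]^2."""
    if y == 0:
        return 0.0
    return max(1.0 - abs(y - z) / y, 0.0)

ms = [10, 100, 1000, 10_000]                     # hypothetical sample indices
along_diagonal = [f(1 / m, 1 / m) for m in ms]   # joint approach to (0, 0)
along_y_axis = [f(1 / m, 0.0) for m in ms]       # z fixed at 0, y -> 0
along_z_axis = [f(0.0, 1 / m) for m in ms]       # y fixed at 0, z -> 0
```

The function stays at one along the diagonal but is zero along both axes, matching $f(0, 0) = 0$ only in the latter cases.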
Differentiability
f (x + t) f (x)
,
Figure 2: Gradients
with $a < b$, if $f : [a, b] \to \mathbb{R}$ is differentiable on $(a, b)$ and continuous at $a$ and $b$,
then there is some $x \in (a, b)$ such that
\[
Df(x) = \frac{f(b) - f(a)}{b - a}.
\]
special notation
\[
D_{st} f(x) = D_s(D_t f)(x).
\]
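$D_i(D_j f)(x)$,
and we refer to this as the $ij$th cross partial derivative of $f$ at $x$. The Hessian
of $f$ at $x$ is then the $n \times n$ matrix of cross partial derivatives at $x$,
\[
D^2 f(x) =
\begin{pmatrix}
D_{11} f(x) & D_{12} f(x) & \cdots & D_{1n} f(x) \\
D_{21} f(x) & D_{22} f(x) & \cdots & D_{2n} f(x) \\
\vdots & \vdots & \ddots & \vdots \\
D_{n1} f(x) & D_{n2} f(x) & \cdots & D_{nn} f(x)
\end{pmatrix}.
\]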
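\[
\sum_{i=1}^{n} \sum_{j=1}^{n} D_{ij} f(x) s_i t_j.
\]
\[
\begin{pmatrix} s_1 & s_2 & \cdots & s_n \end{pmatrix}
\begin{pmatrix}
D_{11} f(x) & D_{12} f(x) & \cdots & D_{1n} f(x) \\
D_{21} f(x) & D_{22} f(x) & \cdots & D_{2n} f(x) \\
\vdots & \vdots & \ddots & \vdots \\
D_{n1} f(x) & D_{n2} f(x) & \cdots & D_{nn} f(x)
\end{pmatrix}
\begin{pmatrix} t_1 \\ t_2 \\ \vdots \\ t_n \end{pmatrix}
\]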
lim Dg(xm ) =
lim Df (xm ) =
To see that (2) can only be stated for one direction, consider $f : \mathbb{R} \to \mathbb{R}$ defined
by $f(x) = -x^4$; this function is strictly concave, but $D^2 f(0) = 0$.
Given $X \subseteq \mathbb{R}^n$, if $f : X \to \mathbb{R}$ is directionally differentiable at $x^* \in \mathrm{int}(X)$, and
if $x^*$ is a local maximizer of $f$, i.e., there is an open set $U \subseteq \mathbb{R}^n$ such that
$f(x^*) = \max_{x \in U} f(x)$, then for every direction $t$, we have $D_t f(x^*) = 0$. If,
in addition, $f$ is twice directionally differentiable, then $D_t^2 f(x^*) \le 0$ for every
direction $t$. If, in addition, $f$ is $C^2$, then $D^2 f(x^*)$ is negative semi-definite.
Conversely, assuming $f$ is $C^2$ and $x^* \in \mathrm{int}(X)$, if $Df(x^*) = 0$ and $D^2 f(x^*)$ is
negative definite, then $x^*$ is a strict local maximizer of $f$, i.e., there exists open
$G \subseteq X$ such that $x^* \in G$ and for all $x \in G \setminus \{x^*\}$, $f(x^*) > f(x)$. Assuming
$X \subseteq \mathbb{R}^n$ is convex and $f : X \to \mathbb{R}$ is $C^1$ and concave, if $x^* \in \mathrm{int}(X)$ and
$Df(x^*) = 0$, then $x^*$ is a maximizer of $f$.
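The sufficient conditions (zero derivative and negative definite Hessian) can be checked numerically with finite differences. This is a rough sketch under stated assumptions: the quadratic test function, the step sizes, and the helper names `gradient`, `hessian`, and `is_negative_definite_2x2` are all hypothetical, and the definiteness check only covers the two-variable case.

```python
def gradient(f, x, h=1e-5):
    """Central-difference approximation of the vector of partial derivatives at x."""
    g = []
    for i in range(len(x)):
        up = list(x); up[i] += h
        dn = list(x); dn[i] -= h
        g.append((f(up) - f(dn)) / (2 * h))
    return g

def hessian(f, x, h=1e-4):
    """Central-difference approximation of the matrix of cross partials D_ij f(x)."""
    n = len(x)
    H = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            def shifted(si, sj):
                p = list(x)
                p[i] += si * h
                p[j] += sj * h
                return f(p)
            H[i][j] = (shifted(1, 1) - shifted(1, -1)
                       - shifted(-1, 1) + shifted(-1, -1)) / (4 * h * h)
    return H

def is_negative_definite_2x2(H):
    """A symmetric 2x2 matrix is negative definite iff H11 < 0 and det(H) > 0."""
    return H[0][0] < 0 and H[0][0] * H[1][1] - H[0][1] * H[1][0] > 0

def f(x):
    # Hypothetical concave quadratic with maximizer at the origin.
    return -x[0] ** 2 - x[0] * x[1] - x[1] ** 2

x_star = [0.0, 0.0]
g = gradient(f, x_star)
H = hessian(f, x_star)
```

Here the approximate Hessian is close to the exact matrix with rows $(-2, -1)$ and $(-1, -2)$, so the candidate point passes the second-order test.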
Lebesgue Measure
S
.
(Ri )
(Y ) = inf
rectangles with Y m
i=1 Ri
i=1
= (Y Z) + (Y Z).
That is, we can use Y to break up any Z and compute the size of Z in each part
separately, as in Figure 3 where a (curvy) measurable set is used to partition an
arbitrary (jagged) set.10 Letting Ln denote the Lebesgue measurable subsets of
10 In some treatments, the term measurable applied to a set may mean Borel measurable.
The class of Borel sets is the smallest collection of sets that contains all open sets and is closed
with respect to complements and countable unions. The class is included among the Lebesgue
measurable sets, and in fact the Lebesgue measurable sets are the completion of the Borel sets
with respect to Lebesgue measure. There are Lebesgue measurable sets that are not Borel
measurable. More general treatments may define measurability with respect to an abstract
$\sigma$-algebra. These considerations will come up from time to time in footnotes or the appendix,
but they are mostly outside the scope of this survey.
11 The definition of completeness differs from Definition 10.34 of Aliprantis and Border
(2006). Equivalence of the definitions for $\sigma$-finite measures follows from Theorems 4.2 and
4.3, and discussion on p.84, in Kingman and Taylor (1966) Introduction to Measure and
Probability, Cambridge Press: New York, NY.
X
[
(Yi ).
=
Yi
i=1
i=1
X
[
(Yi ).
Yi
i=1
i=1
X
\
(Yi ).
(Yj )
Yi
i=1
i=1
(Y I[0,1] ) + |Y Q[0,1] |,
where $|\cdot|$ denotes the number (possibly infinite) of rationals between zero and one
belonging to $Y$. This measure is $\sigma$-finite, but it is not outer regular: every open
set $U$ containing $I_{[0,1]}$ contains $Q_{[0,1]}$ (an infinite number of rational numbers),
12 These results follow from Theorem 12.5 in Aliprantis and Border (2006), except that the
latter result concerns approximation of Borel sets. By Theorem 10.23 (part 6) of Aliprantis
and Border (2006), given measurable Y , there is a Borel set B Y with (B) = (Y ),
so Y can be approximated from above by open sets. Note as well that there is a Borel set
C (Y ) with (C) = (Y ), and therefore C Y is a Borel set with (C) = (Y ), so Y can
be approximated from below by compact sets.
13 See Theorem 4.5 of Kingman and Taylor (1966) Introduction to Measure and Probability,
Cambridge Press: New York, NY.
S
and if (
m=n Ym ) < for some n, then
lim sup (Yn ).
lim sup Yn
n
Differential Topology
We now briefly study the differentiable structure of sets in $\mathbb{R}^n$, and for this
it is necessary to consider in more detail functions $f : U \to \mathbb{R}^m$ defined on
an open set $U \subseteq \mathbb{R}^n$. The values of such a function can be decomposed as
$f(x) = (f_1(x), \ldots, f_m(x))$, where each component function satisfies $f_i : U \to \mathbb{R}$,
$i = 1, \ldots, m$. Then $f : U \to \mathbb{R}^m$ is $r$-times continuously partially differentiable,
or $C^r$, if each component function $f_i$ is $C^r$. In this case, the derivative of $f$ at
$x \in U$ is an $m \times n$ matrix, the Jacobian matrix of $f$ at $x$,
\[
Df(x) =
\begin{pmatrix}
D_1 f_1(x) & D_2 f_1(x) & \cdots & D_n f_1(x) \\
\vdots & \vdots & \ddots & \vdots \\
D_1 f_m(x) & D_2 f_m(x) & \cdots & D_n f_m(x)
\end{pmatrix},
\]
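To make the layout of the Jacobian concrete, the following sketch approximates it by central differences, with rows indexed by component functions and columns by variables. The helper name `jacobian`, the step size, and the example map from $\mathbb{R}^2$ to $\mathbb{R}^3$ are assumptions for illustration only.

```python
def jacobian(f, x, h=1e-6):
    """Approximate the m x n matrix with (i, j) entry D_j f_i(x) by central differences."""
    n = len(x)
    cols = []
    for j in range(n):
        up = list(x); up[j] += h
        dn = list(x); dn[j] -= h
        f_up, f_dn = f(up), f(dn)
        cols.append([(a - b) / (2 * h) for a, b in zip(f_up, f_dn)])
    m = len(cols[0])
    # cols[j][i] approximates D_j f_i; transpose so that row i belongs to component f_i.
    return [[cols[j][i] for j in range(n)] for i in range(m)]

def f(x):
    # Hypothetical example map f : R^2 -> R^3.
    return [x[0] * x[1], x[0] ** 2, x[0] + 3 * x[1]]

J = jacobian(f, [1.0, 2.0])
```

At the point $(1, 2)$, the exact Jacobian of this example has rows $(2, 1)$, $(2, 0)$, and $(1, 3)$, which the approximation should match up to discretization error.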
Df j (x ) =
Dfxj (xj ) =
Dj f1 (x )
Dj f2 (x )
..
.
Dj fm (x )
[Figure: level sets $f^1 = c^1$ and $f^2 = c^2$ with derivatives $Df^1(x^*)$ and $Df^2(x^*)$; manifold $M$ with tangent space $T_x M$ and its translate $\{x\} + T_x M$]
0,
the set of solutions near x has a manifold structure, and each equation reduces
the dimensionality of the solutions near x by one. This justifies pictures, as
in Figure 2, that depict the level sets of differentiable functions f : R2 R as
one-dimensional, smooth curves: by the preimage theorem, the level set of f
at a value c will have these properties as long as the gradient of f is non-zero
everywhere on the level set. To see that the result holds only for regular values,
define f : R2 R by f (x, y) = x2 y 2 , and note that the level set of f at zero
is not smooth.
Again, consider an open set $U \subseteq \mathbb{R}^n$ and a $C^r$ function $f : U \to \mathbb{R}^m$. By Sard's
theorem, if $r > \max\{n - m, 0\}$, then the set of critical values of $f$ is measurable and has Lebesgue measure zero. An implication is that the set of regular
{x U | f (x, p) = 0}
small variations of $(x, p)$, so that $Df(x, p)$ has full row rank, then even if the
set of solutions $(x_1, \ldots, x_n)$ to the equations
\[
\begin{aligned}
f_1(x_1, \ldots, x_n, p) &= 0 \\
&\;\;\vdots \\
f_m(x_1, \ldots, x_n, p) &= 0
\end{aligned}
\]
has poor structure, almost all perturbations $p'$ of the system will produce a set
of solutions with a manifold structure.
It is noteworthy that the full rank condition in the transversality theorem allows
variation of the parameter $p$ as well as $x$; thus, if the parameterization of $f$ is
sufficiently rich, the condition should hold. An application of interest is the case
in which $m > n$. Then, under the assumptions of the theorem, for almost all
$p \in P$, the set of solutions to $f(x, p) = 0$ has dimension $n - m < 0$, which means
that the set is empty: there are no solutions to the system. This is depicted
in Figure 7 for the case $n = k = 1$ and $m = 2$. Here, we write $D_x f(x^*, p^*)$
for the first column of the Jacobian of $f$ (corresponding to the variable $x$)
and $D_p f(x^*, p^*)$ for the second column (corresponding to $p$), and clearly these
columns have rank two, so the Jacobian of $f$ has full row rank at $(x^*, p^*)$.
Assuming the rank condition does not fail at any other solutions to $f(x, p) = 0$,
zero is a regular value of $f$, and the transversality theorem applies. Although
$x^*$ does solve the system of equations given parameter $p^*$, an increase of $p^*$ to,
say, $p'$, shifts the trajectory of $f$ given $p^*$ (the solid line) to the trajectory given
$p'$ (the dashed line), and there is no solution.
For a very simple application of the transversality theorem, let U Rn be open
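[Figure 7: the case $n = k = 1$ and $m = 2$, showing the columns $D_x f(x^*, p^*)$ and $D_p f(x^*, p^*)$ and the trajectories of $f$ for two parameter values]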
Measurable Functions
Implications are that level sets of a measurable function are measurable, and every continuous function with measurable domain is measurable.16 A measurable
function is sometimes referred to as a random variable.
Measurable functions can be combined in a number of ways to produce new
measurable functions. Let $f, g : X \to \mathbb{R}$ be measurable, let $\{f_m\}$ be a sequence
of measurable functions $f_m : X \to \mathbb{R}$, $m = 1, 2, \ldots$, and let $\alpha, \beta \in \mathbb{R}$. Then:17
1. $\alpha f + \beta g$ is measurable,
2. $fg$ is measurable,
3. $\max\{f(x), g(x)\}$ is measurable as a function of $x$,
4. if $h : X \to \mathbb{R}$ satisfies $f_m(x) \to h(x)$ for all $x \in X$, then $h$ is measurable.
x X
f (x) <
lim
m
m
m
m
k=1
which may be infinite. See Figure 8. If $\int f(x)\,dx < \infty$, then $f$ is Lebesgue
integrable (or simply integrable). For general measurable $f$ (taking possibly
negative values), define
$f^+(x) = \max\{f(x), 0\}$
and
Rx
a
DG(x)dx
x X
f (x) <
m
m
m
m
lim
k=1
which may be infinite. If $\int f(x)\,\mu(dx) < \infty$, then $f$ is integrable with respect to
$\mu$ (or $\mu$-integrable). For general measurable $f : X \to \mathbb{R}$, if either $f^+$ or $f^-$ is
integrable with respect to $\mu$, then the integral of $f$ with respect to $\mu$ is
\[
\int f(x)\,\mu(dx) = \int f^+(x)\,\mu(dx) - \int f^-(x)\,\mu(dx).
\]
If both $f^+$ and $f^-$ are $\mu$-integrable, then $f$ is $\mu$-integrable. Say $f$ is essentially bounded with respect to $\mu$ (or $\mu$-bounded) if there exists $c > 0$ such that
$\mu(\{x \in X \mid |f(x)| > c\}) = 0$. If $\mu$ is finite and $f$ is $\mu$-bounded, then $f$ is
$\mu$-integrable. As before, we write $\int_Y f(x)\,\mu(dx)$ for $\int f(x) I_{X \cap Y}(x)\,\mu(dx)$. If $\mu$ is a
probability measure, then the integral $\int f(x)\,\mu(dx)$ is called the expected value
of $f$, and the vector of integrals $(\int x_1\,\mu(dx), \ldots, \int x_n\,\mu(dx))$ is the expected value
(or mean) of $x$.
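Let $\mu$ be a measure on $\mathbb{R}^n$, let $X \subseteq \mathbb{R}^n$ be measurable, and let $f, g : X \to \mathbb{R}$ be
measurable. Then:
1. if $f$ is non-negative, then $\int f(x)\,\mu(dx) = 0$ if and only if $f(x) = 0$ for
$\mu$-almost all $x$,
2. if either $f$ or $g$ is $\mu$-integrable, then $\int [f(x) + g(x)]\,\mu(dx) = \int f(x)\,\mu(dx) + \int g(x)\,\mu(dx)$,
3. for all $\alpha \in \mathbb{R}$, $\int \alpha f(x)\,\mu(dx) = \alpha \int f(x)\,\mu(dx)$,
4. if $f(x) \le g(x)$ for $\mu$-almost all $x \in X$, then $\int f(x)\,\mu(dx) \le \int g(x)\,\mu(dx)$.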
Note that property 2 implies that if $A$ and $B$ are disjoint measurable sets, then
\[
\int_{A \cup B} f(x)\,\mu(dx) = \int_A f(x)\,\mu(dx) + \int_B f(x)\,\mu(dx).
\]
|f (x)|(dx).
and ignoring the subset of the domain on which it takes infinite values, the
integrand on the left-hand side, $\limsup f_m(x)$, is a $\mu$-integrable function of $x$.
More precisely, if we define $f(x) = 0$ when $\limsup_{m \to \infty} f_m(x) = \infty$ and $f(x) =
\limsup_{m \to \infty} f_m(x)$ otherwise, the function $f$ is integrable. The assumption that
$\{f_m\}$ is dominated by integrable $g$ can be weakened somewhat: it suffices to
assume that each $f_m$ is dominated by a $\mu$-integrable $g_m : X \to \mathbb{R}$, and that there
is a $\mu$-integrable function $g : X \to \mathbb{R}$ satisfying $g_m(x) \to g(x)$ for $\mu$-almost all
$x \in X$ and $\int g_m(x)\,\mu(dx) \to \int g(x)\,\mu(dx)$.
An easy consequence of Fatou's lemma is the following. Let $\{f_m\}$ be a sequence of $\mu$-integrable functions $f_m : X \to \mathbb{R}$, $m = 1, 2, \ldots$. Assume that for
$\mu$-almost all $x \in X$, the sequence $\{f_m(x)\}$ is increasing, and assume that
$\lim_{m \to \infty} \int f_m(x)\,\mu(dx) < \infty$. Then by Levi's monotone convergence theorem,
there exists a $\mu$-integrable function $f : X \to \mathbb{R}$ such that $f_m(x) \to f(x)$ for
$\mu$-almost all $x \in X$ and $\int f_m(x)\,\mu(dx) \to \int f(x)\,\mu(dx)$.
This result has useful implications for integrals of parameterized functions. Let
$X \subseteq \mathbb{R}^n$ be measurable, let $P \subseteq \mathbb{R}^k$, and consider any bounded function
$f : X \times P \to \mathbb{R}$, where we view $p \in P$ as a parameter. Assume that for all
$x \in X$, the function $f_x : P \to \mathbb{R}$ defined by $f_x(p) = f(x, p)$ is continuous; and
assume that for all $p \in P$, the function $f_p : X \to \mathbb{R}$ defined by $f_p(x) = f(x, p)$
is measurable. Suppose further that there is a $\mu$-integrable function $g : X \to \mathbb{R}$
such that for $\mu$-almost all $x$ and for all $p$, $|f(x, p)| \le g(x)$. Let $\{p_m\}$ be a
sequence of parameters converging to $p$ in $P$. Then the parameterized integrals
converge: $\int f(x, p_m)\,\mu(dx) \to \int f(x, p)\,\mu(dx)$. In other words, the integral
$\int f(x, p)\,\mu(dx)$ is a continuous function of the parameter $p$. While these limiting
results are stated for a fixed measure $\mu$, in Section 10 we allow the measures to
vary in a continuous way along the sequence.
Given a measure $\mu$ on $\mathbb{R}^n$, a density function for $\mu$ is any measurable function
$f : \mathbb{R}^n \to \mathbb{R}$ with non-negative values such that for all measurable $Y \subseteq \mathbb{R}^n$,
we have $\int_Y f(x)\,dx = \mu(Y)$. If $f$ is a density function and $\mu$ is a probability
measure, so that $\int f(x)\,dx = 1$, then $f$ is a probability density function for $\mu$, but
sometimes the term density function is used to refer to a probability density
function. A measure $\mu$ on $\mathbb{R}^n$ is absolutely continuous (with respect to Lebesgue
measure) if for all measurable $Y$, $\lambda(Y) = 0$ implies $\mu(Y) = 0$. For example, any
measure defined by a density function is absolutely continuous with respect to
Lebesgue measure; for a counterexample, the counting measure is not absolutely
continuous. The Radon-Nikodym theorem states that if a measure $\mu$ on $\mathbb{R}^n$ is
finite and absolutely continuous with respect to Lebesgue measure, then it has
a density that is unique up to sets of measure zero, i.e., if $f$ and $g$ are both
densities for $\mu$, then $f(x) = g(x)$ for almost all $x$.
More generally, given two measures $\mu$ and $\nu$ on $\mathbb{R}^n$, say $\nu$ is absolutely continuous
with respect to $\mu$, written $\nu \ll \mu$, if for all measurable $Y \subseteq \mathbb{R}^n$, $\mu(Y) = 0$ implies
$\nu(Y) = 0$. The Radon-Nikodym theorem establishes that if $\nu$ is finite and $\mu$
is $\sigma$-finite, then there is a measurable function $f : \mathbb{R}^n \to \mathbb{R}$ with non-negative
values such that for all measurable $Y$, we have $\nu(Y) = \int_Y f(x)\,\mu(dx)$, where $f$
is the density of $\nu$ with respect to $\mu$. Conversely, starting with an arbitrary
measurable mapping $f : \mathbb{R}^n \to \mathbb{R}$ with non-negative values and any measure $\mu$
on $\mathbb{R}^n$, we can define a mapping $\nu$ from measurable subsets $Y \subseteq \mathbb{R}^n$ to the
extended real numbers (possibly including infinity) by $\nu(Y) = \int_Y f(x)\,\mu(dx)$.
Then $\nu$, so-defined, is a measure on $\mathbb{R}^n$; it is absolutely continuous with respect
to $\mu$; and $f$ is a density for $\nu$ with respect to $\mu$.
Given a probability measure $\mu$, the expected value of a concave function $f$ is
less than or equal to the value of the function at the mean of $x$. To be more
precise, let $X \subseteq \mathbb{R}$ be measurable and convex, let $\mu$ be a probability measure
on $\mathbb{R}$ with $\mu(X) = 1$, and let $f : X \to \mathbb{R}$ be concave; then $f$ is measurable (in
fact, it is directionally differentiable at $\mu$-almost all $x \in \mathrm{int}(X)$), and Jensen's
inequality holds:
\[
f\left(\int x\,\mu(dx)\right) \ge \int f(x)\,\mu(dx).
\]
f (x)(dx),
again with strict inequality if $f$ is strictly concave and $\mu$ is not degenerate on
some $x$.
We can define measurability and integration for vector-valued functions as well.
Given measurable $X \subseteq \mathbb{R}^n$, a mapping $f : X \to \mathbb{R}^m$ with values $f(x) =
(f_1(x), \ldots, f_m(x))$ is measurable if each component mapping $f_i : X \to \mathbb{R}$ is measurable, $i = 1, \ldots, m$; equivalently, $f$ is measurable if for all open $G \subseteq \mathbb{R}^m$, the
inverse image $f^{-1}(G)$ is measurable. For a measure $\mu$ on $\mathbb{R}^n$, the integral of the
vector-valued function is just the vector of integrals of its components,
\[
\int f(x)\,\mu(dx) = \left( \int f_1(x)\,\mu(dx), \ldots, \int f_m(x)\,\mu(dx) \right),
\]
and $f$ is $\mu$-integrable if each component $f_i$ is $\mu$-integrable, $i = 1, \ldots, m$. When $\mu$
is a probability measure, the expected value of $f$ is just $\int f(x)\,\mu(dx)$, permitting
a more compact statement of Jensen's inequality: given measurable and convex
$X \subseteq \mathbb{R}^n$ and $f : X \to \mathbb{R}$ concave, we have $f(\int x\,\mu(dx)) \ge \int f(x)\,\mu(dx)$.
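Jensen's inequality is easy to verify for a probability measure with finite support, where the integrals reduce to weighted sums. The following sketch makes that assumption; the support points, the weights, and the choice of the logarithm as the concave function are illustrative and not taken from the survey.

```python
import math

# Hypothetical discrete probability measure: mass weights[i] at points[i].
points = [1.0, 2.0, 4.0, 8.0]
weights = [0.1, 0.2, 0.3, 0.4]

mean = sum(w * x for w, x in zip(weights, points))                 # integral of x
value_at_mean = math.log(mean)                                     # f(mean), f = log is concave
expected_value = sum(w * math.log(x) for w, x in zip(weights, points))  # integral of f
```

Because the logarithm is strictly concave and this measure is not degenerate, the inequality `value_at_mean >= expected_value` holds strictly here.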
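A set $K$ of measurable functions $f : X \to \mathbb{R}^m$ is uniformly $\mu$-integrable if for all
$\epsilon > 0$, there exists $\delta > 0$ such that for all $f \in K$ and all measurable $Y \subseteq X$
with $\mu(Y) < \delta$, we have $\int_Y \|f(x)\|\,\mu(dx) < \epsilon$. In other words, for every sequence
$\{Y_k\}$ of measurable subsets with $\mu(Y_k) \to 0$, we have
\[
\sup\left\{ \int_{Y_k} \|f(x)\|\,\mu(dx) \;\middle|\; f \in K \right\} \to 0
\]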
functions
are dominated by a $\mu$-integrable function $g : X \to \mathbb{R}$, and assume that
$\{\int f^k(x)\,\mu(dx)\}$ is a convergent sequence. Then there is a $\mu$-integrable function
$f : X \to \mathbb{R}^m$ such that:19
(i) for $\mu$-almost all $x \in X$, $f(x) \in \mathrm{ls}(\{f^k(x)\})$,
(ii) $\int f(x)\,\mu(dx) = \lim_{k \to \infty} \int f^k(x)\,\mu(dx)$.
Thus, we can select from the pointwise accumulation points of $\{f^k(x)\}$ for almost all $x$ in a way that preserves the limit of the integrals $\{\int f^k(x)\,\mu(dx)\}$.
10 Convergence of Measures
$\mathbb{R}$, we have $\int f(x)\,\mu_m(dx) \to \int f(x)\,\mu(dx)$. And third, $\mu_m \to \mu$ weakly if and only if any of the following
conditions hold:
19 See
f dn
and so on. In general, given $m$, define intervals $I_m^1, \ldots, I_m^{2^m}$ so that $I_m^k =
[(k - 1)/2^m, k/2^m]$, and let $f_m : [0, 1] \to \mathbb{R}$ be two times the indicator function
of $J_m = \bigcup \{I_m^k \mid k = 1, \ldots, 2^m - 1, \text{ odd}\}$. Note that the Lebesgue measure of
each $J_m$ is one half. As depicted in Figure 9, these functions have an increasingly
jagged, sawtooth appearance (I will refer to this useful example several times
in the sequel). Defining the measure $\mu_m$ by $\mu_m(Y) = \int_Y f_m(x)\,dx$ for all measurable $Y \subseteq \mathbb{R}$, it can be shown that the sequence $\{\mu_m\}$ converges set-wise to the
uniform distribution, i.e., the measure $\mu$ given by the density $f = I_{[0,1]}$. But then
\[
\mu_m(J_m) - \mu(J_m) = \int_{J_m} [f_m(x) - f(x)]\,dx = \int_{J_m} [2 - 1]\,dx = \frac{1}{2}
\]
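The sawtooth construction can be reproduced with exact rational arithmetic. The sketch below assumes the densities equal two on the odd-indexed dyadic intervals and zero elsewhere on $[0, 1]$; the helper names `J`, `overlap`, `mu_m`, and `uniform`, and the test interval $[0, 3/10]$, are illustrative choices.

```python
from fractions import Fraction

def J(m):
    """Odd-indexed dyadic intervals [(k-1)/2^m, k/2^m], k = 1, 3, ..., 2^m - 1."""
    return [(Fraction(k - 1, 2**m), Fraction(k, 2**m)) for k in range(1, 2**m, 2)]

def overlap(a, b, c, d):
    """Length of the intersection of [a, b] and [c, d]."""
    return max(Fraction(0), min(b, d) - max(a, c))

def mu_m(m, a, b):
    """Measure of [a, b] under the density that is 2 on J_m and 0 elsewhere."""
    return 2 * sum(overlap(a, b, lo, hi) for lo, hi in J(m))

def uniform(a, b):
    """Measure of [a, b] under the uniform distribution on [0, 1]."""
    return overlap(a, b, Fraction(0), Fraction(1))

m = 10
lebesgue_of_J = sum(hi - lo for lo, hi in J(m))         # uniform measure of J_m
mu_m_of_J = 2 * lebesgue_of_J                           # mu_m(J_m)
gap_on_J = mu_m_of_J - lebesgue_of_J                    # stays at 1/2 for every m
gap_on_fixed_set = abs(mu_m(m, Fraction(0), Fraction(3, 10)) - uniform(Fraction(0), Fraction(3, 10)))
```

On a fixed interval the two measures become close as $m$ grows, while on the moving sets $J_m$ the gap stays at one half.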
f (x)(dx)
lim sup
m
fm (x)m (dx),
fm (x)m (dx)
f (x)(dx) < .
Of course, if the sequence $\{f_m\}$ is uniformly bounded, in the sense that $\sup\{|f_m(x)| \mid
x \in X, m \in \mathbb{N}\} < \infty$, then the antecedent is automatically satisfied, and we obtain convergence of the integrals.
This result has obvious implications for integrals of parameterized functions.
Let $X \subseteq \mathbb{R}^n$ be measurable, let $P \subseteq \mathbb{R}^k$, and consider any bounded function
21 See Theorem 15.22 of Aliprantis and Border (2006). Their result is stated for probability
measures defined on the Borel $\sigma$-algebra, but we can apply it to the sequence of probability
measures restricted to the Borel sets.
22 See Proposition 17, p.269, of Royden (2008). Royden considers the liminf of integrals of
non-negative functions, so we apply his result to $\{g_m - f_m\}$ and $g - f$.
23 See Proposition 18, p.270, of Royden (2008).
The latter formulation allows for a variable integrating measure, but it assumes
that the sequence $\{\mu_m\}$ converges to $\mu$ set-wise. We can relax this to weak convergence if we impose correspondingly stricter requirements on the sequence of
functions. Let $\{\mu_m\}$ be a sequence of probability measures on $\mathbb{R}^n$ that converge
weakly to the probability measure $\mu$ on $\mathbb{R}^n$, let $X \subseteq \mathbb{R}^n$ be measurable, let $F$
be a set of measurable functions $f : X \to \mathbb{R}$ that is uniformly bounded, in the
generalized sense that $\sup\{|f(x)| \mid x \in X, f \in F\} < \infty$. Furthermore, assume
that for every $x$, the function $s_x : X \to \mathbb{R}$ defined by
\[
s_x(y) = \sup\{|f(x) - f(y)| \mid f \in F\}
\]
is continuous. Then $\mu_m \to \mu$ implies that $\int f(x)\,\mu_m(dx) \to \int f(x)\,\mu(dx)$ uniformly in $f$,
i.e., for all $\epsilon > 0$, there exists $k$ such that for all $m \ge k$ and all $f \in F$, we have
$|\int f(x)\,\mu_m(dx) - \int f(x)\,\mu(dx)| < \epsilon$.24 Now let $\{f_m\}$ be a uniformly bounded sequence of functions $f_m : X \to \mathbb{R}$ converging pointwise to the continuous function
$f : X \to \mathbb{R}$. In the appendix, I show that $\int f_m(x)\,\mu_m(dx) \to \int f(x)\,\mu(dx)$, which
provides a version of Lebesgue's dominated convergence theorem that allows the
integrating probability measure to vary in a weakly continuous way.
The preceding formulation of the dominated convergence theorem has the following implication for parameterized integrals. Let $X \subseteq \mathbb{R}^n$ be measurable, let
$P \subseteq \mathbb{R}^k$, and consider any bounded function $f : X \times P \to \mathbb{R}$. Assume that for
all $x \in X$, the function $f_x : P \to \mathbb{R}$ defined by $f_x(p) = f(x, p)$ is continuous; and
assume that for all $p \in P$, the function $f_p : X \to \mathbb{R}$ defined by $f_p(x) = f(x, p)$ is
measurable. Let $\{\mu_m\}$ be a sequence of probability measures on $\mathbb{R}^n$ converging
weakly to $\mu$, and let $\{p_m\}$ be a sequence of parameters converging to $p$ in $P$
such that $f(x, p)$ is continuous in $x$. Then the parameterized integrals converge:
$\int f(x, p_m)\,\mu_m(dx) \to \int f(x, p)\,\mu(dx)$. In comparison with the previous convergence result, the advantage is that we allow weak convergence of the integrating
measure, but the disadvantage is that we assume the limiting function $f(x, p)$
is continuous in $x$, and we focus on probability measures.
11 Convergence of Functions
that if $f_m \to f$ in measure, then the sequence converges in mean as long as
$\int |f_m(x)|\,\mu(dx) \to \int |f(x)|\,\mu(dx)$ (and $|f_m| + |f|$ is $\mu$-integrable for all $m$). Convergence in measure implies that a subsequence of $\{f_m\}$ converges to $f$ pointwise
almost everywhere; in fact, assuming $\mu$ is finite, a sequence $\{f_m\}$ converges to
$f$ in measure if and only if every subsequence of $\{f_m\}$ itself possesses a subsequence that converges pointwise almost everywhere to $f$. In the above diagram,
dashed arrows indicate that the direction of implication holds for a subsequence
of functions.
It may also be useful to consider counterexamples for some unstated directions of implication. That pointwise almost everywhere convergence does not
imply uniform almost everywhere convergence can be seen from the sequence of functions
$f_m : [0, 1] \to \mathbb{R}$ defined by $f_m(x) = \max\{1 - mx, 0\}$. To see that convergence
in $p$th mean and convergence in measure do not imply pointwise convergence,
let $f_1 = I_{[0,1/2]}$, $f_2 = I_{[1/2,1]}$, $f_3 = I_{[0,1/4]}$, $f_4 = I_{[1/4,1/2]}$, etc. In general, to define
$f_m$ for $m \ge 2$, let $k$ be the greatest nonnegative integer such
that $n = \sum_{\ell=1}^{k} 2^\ell < m$, and let $f_m$ be the indicator function of the interval
$\left[\frac{m - n - 1}{2^{k+1}}, \frac{m - n}{2^{k+1}}\right]$. These functions cycle continuously through the unit interval
with ever smaller support (i.e., the Lebesgue measure of $f_m^{-1}(\{1\})$ becomes arbitrarily small). Thus, they converge to the function $f = 0$ in $p$th mean and
in measure, but they do not converge pointwise almost everywhere. To see
that convergence in measure does not imply convergence in $p$th mean unless
$\int |f_m(x)|\,\mu(dx) \to \int |f(x)|\,\mu(dx)$, let $f_m = m I_{[1 - \frac{1}{m}, 1]}$, so that $\{f_m\}$ converges
to $f = 0$ in measure; indeed, $\lambda(\{x \in \mathbb{R} \mid |f_m(x) - f(x)| > 0\}) = \frac{1}{m} \to 0$.
But $\int |f_m(x)|^p\,dx = m^{p-1} \ge 1$ for all $m$, so it does not converge in $p$th mean for any
$p \in [1, \infty)$. Finally, to see that pointwise almost everywhere convergence does
not imply convergence in measure unless $\mu$ is finite, define $f_m : \mathbb{R} \to \mathbb{R}$ by
$f_m = I_{[m-1,m]}$ for all $m$, and note that $\{f_m\}$ converges pointwise to $f = 0$,
but $\lambda(\{x \in \mathbb{R} \mid |f_m(x) - f(x)| \ge 1\}) = 1$ for all $m$, so it does not converge in
measure. See Figure 10.
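The cycling sequence of indicator functions can be generated directly from the index rule in the text. This is a sketch under the reading that $f_m$ is the indicator of $[(m - n - 1)/2^{k+1}, (m - n)/2^{k+1}]$ with $n = \sum_{\ell=1}^{k} 2^\ell$; the helper names `interval` and `f` and the evaluation point $3/10$ are assumptions for illustration.

```python
from fractions import Fraction

def interval(m):
    """Endpoints of the support of f_m, following the index rule described above."""
    k = 0
    # k is the greatest nonnegative integer with sum_{l=1}^{k} 2^l < m.
    while sum(2**l for l in range(1, k + 2)) < m:
        k += 1
    n = sum(2**l for l in range(1, k + 1))
    return Fraction(m - n - 1, 2**(k + 1)), Fraction(m - n, 2**(k + 1))

def f(m, x):
    """Indicator function of the m-th interval, evaluated at x."""
    lo, hi = interval(m)
    return 1 if lo <= x <= hi else 0

x = Fraction(3, 10)                                   # hypothetical evaluation point
values_at_x = [f(m, x) for m in range(1, 63)]         # covers five full sweeps of [0, 1]
support_lengths = [interval(m)[1] - interval(m)[0] for m in (1, 3, 7, 15, 31, 63)]
```

The support lengths (which equal the integral of $|f_m|^p$ for every $p$) shrink to zero, yet the values at the fixed point return to one once per sweep, so the sequence does not converge there.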
To see that convergence in $p$th mean does not generally imply convergence in
$q$th mean when $p > q$ unless $\mu$ is finite, note that for arbitrary $c > 0$, we can
choose $y > 0$ sufficiently close to zero such that $\ln(y) < -\frac{c}{p - q}$. Thus, for each
$m$, we can choose $y_m > 0$ satisfying $q \ln(y_m) > \ln(m) + p \ln(y_m)$, and then we
can choose $\alpha_m = 1/y_m^q > 0$, which implies
\[
q \ln(y_m) = -\ln(\alpha_m) > \ln(m) + p \ln(y_m),
\]
or equivalently, $\alpha_m (y_m)^q = 1 > m \alpha_m (y_m)^p$. Now define $f_1 = y_1 I_{[0, \alpha_1]}$ as $y_1$
times the indicator function of $[0, \alpha_1]$, define $f_2 = y_2 I_{[\alpha_1, \alpha_1 + \alpha_2]}$, etc. In general,
define $f_m = y_m I_{[\alpha_{m-1}, \alpha_{m-1} + \alpha_m]}$, and let $f = 0$ be identically zero. Then
\[
\int |f_m(x) - f(x)|^p\,dx = \alpha_m (y_m)^p < \frac{1}{m},
\]
[Figure: counterexample panels labeled "pointwise a.e.", "in measure", and "in pth mean", each marked "does not imply"]
m (ym )q = 1,
Rnk , and let fk : X k R be a measurable function; and let X Rn be measurable, let be a measure on Rn , and let f : X Rm be a measurable function.
Assume that f , along withR each fk , satisfies the absolute continuity condition,
and assume that lim supk |fk (x)k (dx)| < .R If the sequence {f
R k } converges
in distribution to f , then f is -integrable, and fk (x)k (dx) f (x)(dx).26
12 Product Measurability
X
[
( )(Ai ).
=
Ai
( )
i=1
i=1
Not all measures on a product space are product measures. But any measure
on Rn = R Rm determines a measure on each factor. The marginal measure
of on R is the measure defined by (Y ) = (Y Rm ) for all measurable
Y R . Similarly, the marginal on Rm is defined by m (Z) = (R Z) for
all measurable Z Rm . Assuming they are -finite and absolutely continuous
with respect to Lebesgue measure, these marginal measures determine a product
measure, m , on Rn , but it will not be the case that = m unless the
initial measure was indeed a product measure. In that case, if = , then
the marginals will coincide with the factors in the product: = and m = .
We can characterize weak convergence of product measures in terms of convergence of marginals. Consider a sequence { k } of measures on Rn = R Rm
converging weakly to the measure . Then the sequence {k } of marginals on
R converges weakly to the marginal , and similarly for the marginals on Rm .
Indeed, consider any Y R with -measure zero boundary in R , and note
that bd(Y Rm ) = bd(Y ) Rm , which has -measure zero in Rn . Then
33 See
34 See
k (Y ) = k (Y Rm ) (Y Rm ) = (Y ).
The converse holds for product measures: letting k = k k for all k and
= , if k weakly and k weakly, then k weakly.35
13 Metric Spaces
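\[
\mathrm{li}(\{Y_m\}) = \left\{ x \in X \;\middle|\; \text{for all } \epsilon > 0, \text{ there exists } n \text{ such that for all } m \ge n,\ Y_m \cap B_\epsilon(x) \ne \emptyset \right\}.
\]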
The former consists of limits of all subsequences of elements from the sets {Ym },
and the latter consists of limits of all sequences of elements from the sets. Both
sets are closed. For the case of a sequence {xm }, which can be viewed as a
sequence of singleton sets Ym = {xm }, an element of the set ls({xm }) is referred
to as an accumulation point of the sequence.
A set $Y \subseteq X$ is bounded if it is contained in a ball $B_r(x)$ for some $r > 0$ and
some $x \in X$. A set $Y \subseteq X$ is compact if every sequence in $Y$ has a subsequence
that converges to an element of $Y$; equivalently, $Y$ is compact if and only if
every open cover of $Y$ has a finite subcover; and $Y$ is compact if and only
if every collection of closed subsets of $Y$ with the finite intersection property
has nonempty intersection. As before, the empty set and all finite sets are
compact; finite unions of compact sets are compact; and arbitrary intersections
of compact sets are compact. Every compact set is closed and bounded, but
the converse need not hold. For example, if $X = (0, 1)$ is the open unit interval
with the Euclidean metric, it is closed when viewed as a metric space and is
bounded, but it is not compact. For a deeper example (one that is essentially
infinite-dimensional), note that the sawtooth sequence in Figure 9 belongs to
the closure of the unit ball around zero in the space of bounded functions defined
on $[0, 1]$ (in the supremum metric), but it has no convergent subsequence. For a
1
x)} belongs to the closure of the unit
continuous example, the sequence {cos( 2n
ball around zero in the space of bounded, continuous functions defined on [0, 1]
(in the supremum metric) but also has no convergent subsequence. Every closed
subset of a compact subset is compact. As usual, a set Y X is connected if
36 We revert to subscripting indices, as metric spaces do not generally possess a vector
structure; thus, we cannot interpret xm as the value of the mth coordinate of x.
g(p) = x.37
We can also generalize the definition of measurability to mappings taking values
in a metric space. Given a measurable set $X \subseteq \mathbb{R}^n$ and metric space $Y$, a
function $f : X \to Y$ is measurable if for every open $G \subseteq Y$, the pre-image
$f^{-1}(G)$ is measurable. Thus, if $f$ is continuous, then it is measurable.
Given a metric space $X$ and a function $f : X \to X$, an element $x$ is a fixed
point of $f$ if $f(x) = x$. A function $f : X \to X$ is a contraction mapping on $X$
if there exists $\beta \in [0, 1)$ such that for all $x, y \in X$, we have $\rho(f(x), f(y)) \le
\beta \rho(x, y)$, in which case $\beta$ is the modulus of the contraction. By the contraction
mapping theorem, if $X$ is a nonempty, complete metric space and $f : X \to X$ is
a contraction mapping, then $f$ has a unique fixed point.
A mixture space is a set X for which a mapping : X X [0, 1] X is
defined, where we write x + (1 )y for (x, y, ), and possesses certain intuitive properties; intuitively, the mapping acts just like the usual idea of
convex combination of vectors in Euclidean
Pmspace. Since convex combinations
are independent of order, we can write i=1 i xi without ambiguity.38 Following the analysis of Rn , we say x is a convex combination of m elements
x1 , . .P
. , xm if there exist non-negative coefficients that sum to one and such that
m
x = i=1 i xi . A subset Y X of a metric mixture space is convex if for all
x, y Y and all (0, 1), we have x + (1 )y. And given any A X, the
convex hull of A is
I N is finite, xi Y for
X
,
conv(Y ) =
i xi all i I, P
i 0 for all i I,
and iI i = 1
iI
which consists of all convex combinations of all finite subsets of Y . As before,
A is convex if and only if A = conv(A), so that the convex hull of a set is itself
convex, and in fact it is the intersection of all convex supersets of A.
The set X is a metric mixture space if X is a metric space (with metric ), and
the mapping is continuous with respect to (x, y, ), i.e., for all (x, y, ) and
all sequences {(xm , ym , m )} with xm x, ym y, and m , we have
(m xm + (1 m )ym , x + (1 )y) 0. The metric is quasi-convex if for
all x, y, z, w X and all [0, 1],
(x + (1 )z, y + (1 )w)
37 See Theorem 9.3 of Loomis and Sternberg (1968) Advanced Calculus, Addison-Wesley
Publishing: Reading, MA.
38 To be more precise, it must be that for all x, y, z X and all , , [0, 1] with
+ + = 1, we have (i) 1x + 0y = x, (ii) x + (1 )y = (1 )y + x, and
(iii) x + (1 )
y+
z = y + (1 )
x+
z .
1
1
1
1
I show in the appendix that in a metric mixture space, the convex hull of a finite
set is compact; and assuming X is complete, the closure of the convex hull of
each compact subset is compact.
In contrast to $\mathbb{R}^n$, the latter result is not true if stated without taking the closure
of the convex hull: in general metric spaces, there are examples of compact sets
with non-compact convex hulls. Let $X \subseteq \mathbb{R}^{\mathbb{N}}$ be the metric mixture space of
summable sequences $x = (x_1, x_2, \ldots)$, so that for all $x \in X$, we have $\sum_{i=1}^{\infty} |x_i| < \infty$.
We endow this space with the metric $\rho(x, y) = \sum_{i=1}^{\infty} |x_i - y_i|$. Let $x^k =
(0, \ldots, 0, 1/2^k, 0, \ldots)$ be the sequence with $1/2^k$ in the $k$th coordinate and zeroes
elsewhere, and note that the sequence $\{x^k\}$ converges to the sequence with all
zero entries, i.e., $\rho(x^k, 0) \to 0$, so the union $Y = \{x^k \mid k = 1, 2, \ldots\} \cup \{0\}$ is
compact, and every $x \in \mathrm{conv}(Y)$ is such that $x_i = 0$ for all but finitely many
coordinates. For each $k$, define the sequence $y^k \in X$ by $y^k = (\frac{1}{4}, \frac{1}{16}, \ldots, \frac{1}{2^{2k}}, 0, \ldots)$,
and note that
\[
y^k = \frac{1}{2} x^1 + \frac{1}{4} x^2 + \cdots + \frac{1}{2^k} x^k + \left( 1 - \sum_{i=1}^{k} \frac{1}{2^i} \right) 0,
\]
so $y^k \in \mathrm{conv}(Y)$. But then $\{y^k\}$ converges to the limit $y = (\frac{1}{4}, \frac{1}{16}, \frac{1}{64}, \ldots)$ with
$i$th coordinate $y_i = 1/2^{2i}$, which has positive entries in all coordinates, and
therefore $y \notin \mathrm{conv}(Y)$. Thus, $\mathrm{conv}(Y)$ is not closed, let alone compact.
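The convex combinations in this example can be checked with exact arithmetic on finite prefixes of the sequences. The sketch below assumes the summation metric restricted to the first `length` coordinates, which ignores a negligible tail of the limit; the helper names `x_seq`, `y_seq`, and `rho` are hypothetical.

```python
from fractions import Fraction

def x_seq(k, length):
    """First `length` coordinates of x^k: 1/2^k in coordinate k, zero elsewhere."""
    return [Fraction(1, 2**k) if i == k else Fraction(0) for i in range(1, length + 1)]

def y_seq(k, length):
    """Build y^k as sum_i (1/2^i) x^i plus the leftover weight on the zero sequence."""
    coeffs = [Fraction(1, 2**i) for i in range(1, k + 1)]
    weight_on_zero = 1 - sum(coeffs)
    y = [Fraction(0)] * length
    for i, c in enumerate(coeffs, start=1):
        y = [a + c * b for a, b in zip(y, x_seq(i, length))]
    return y, coeffs + [weight_on_zero]

def rho(x, y):
    """Summation metric on the truncated sequences."""
    return sum(abs(a - b) for a, b in zip(x, y))

length = 12
limit = [Fraction(1, 4**i) for i in range(1, length + 1)]   # prefix of the limit y
y3, weights3 = y_seq(3, 6)
distances = [rho(y_seq(k, length)[0], limit) for k in (2, 4, 8)]
```

Each $y^k$ uses non-negative weights summing to one and has only finitely many nonzero coordinates, while the distances to the limit shrink, consistent with the limit lying outside the convex hull.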
14
Next are some examples of useful metric spaces. Of note, we will see that all
but the pointwise (and set-wise) convergence concepts and weak (and weak*)
convergence are metrizable, in the sense that we can define a metric on the
appropriate space (of functions or measures) such that a sequence in the space
converges to some limit if and only if the distance between the elements of the
sequence and the limit point goes to zero.
1. Countable Cartesian products. Returning to the definition of the
product topology, let $X_i \subseteq \mathbb{R}^{n_i}$ for $i = 1, \ldots, k$ with $n = \sum_{i=1}^{k} n_i$, and let
$X = \prod_{i=1}^{k} X_i$ be the product space. We can define the product metric on
this space by
\[
\rho(x, y) = \sum_{i=1}^{k} \|x^i - y^i\|_i,
\]
where $\|\cdot\|_i$ denotes the Euclidean norm in $\mathbb{R}^{n_i}$. With this metric, a sequence
$\{x_m\}$ in $X$ converges to $x \in X$ if and only if it converges in each component,
i.e., letting $x_m = (x_m^1, \ldots, x_m^k)$ and $x = (x^1, \ldots, x^k)$, we have $x_m^i \to x^i$ for
each $i = 1, \ldots, k$. That is, $\rho(x_m, x) \to 0$ if and only if $\{x_m\}$ converges to
$x$ in the product topology, showing that convergence in the product topology
is metrizable; but we knew that anyway, because a sequence converges in the
product topology on the finite product $X$ if and only if it converges in the space
$\mathbb{R}^n$ in which $X$ is contained.
This extends straightforwardly to the product of a countably infinite Q
number
k=1
k
1
2k+1 rk
||xk y k ||k .
2rk , so (x, y)
k=1 2k < , so this metric is well-defined. With this
metric, a sequence $\{x_m\}$ converges to $x$ in $X$ if and only if it converges in
each component, i.e., letting $x_m = (x_m^1, x_m^2, \ldots)$ and $x = (x^1, x^2, \ldots)$, we have
$x_m^i \to x^i$ for each $i = 1, 2, \ldots$. Whether the product is finite or countably
infinite, $X$ is separable; if each $X_i$ is closed, then $X$ is closed and complete; and
by Tychonoff's product theorem, if each $X_i$ is compact, then $X$ is compact.
We focus on the more interesting case of an infinite product set. If each Xi
is a bounded, convex subset of Rni , then X is a metric mixture space: given
X
1
=
||(xk z k ) + (1 )(y k wk )||k
2k+1 rk
k=1
k=1
1
||xk z k ||k + (1 )||y k wk ||k
2k+1 rk
Combining the metric version of Schauder's theorem with previous results, assume each $X_i$ is a nonempty, compact, convex subset of $\mathbb{R}^{n_i}$, and equip the product space $X$ with the product metric; then every continuous function $f : X \to X$
has at least one fixed point.
The above choice of metric for the infinite product space relied on the assumption that all (or all but finitely many) component sets $X_i$ are bounded. In
general, when some $X_i$ are not bounded, we can define a version of the product
metric as follows:
\[
\rho(x, y) = \sum_{k=1}^{\infty} \frac{1}{2^k} \, \frac{\|x^k - y^k\|_k}{1 + \|x^k - y^k\|_k}.
\]
With this metric, a sequence $\{x_m\}$ again converges to $x$ in $X$ if and only if it converges in each component, i.e., letting $x_m = (x_m^1, x_m^2, \ldots)$ and $x = (x^1, x^2, \ldots)$,
we have $x_m^i \to x^i$ for each $i = 1, 2, \ldots$. Again, $X$ is separable; if each $X_i$ is
closed, then $X$ is closed and complete; and by Tychonoff's product theorem, if
each $X_i$ is compact, then $X$ is compact. In contrast to the initial definition of
the product metric, however, the latter definition may not be convex.
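As a rough numerical illustration of the product metric for unbounded components, the following Python sketch evaluates a finite truncation of the sum. The function name `product_metric`, the truncation to 20 components, and the use of the Euclidean distance in each component are assumptions made for illustration only.

```python
import math

def product_metric(x, y):
    """Finite truncation of sum_k 2^{-k} * d_k / (1 + d_k), where d_k is the
    Euclidean distance between the k-th components of x and y."""
    total = 0.0
    for k, (xk, yk) in enumerate(zip(x, y), start=1):
        d = math.dist(xk, yk)
        total += (d / (1.0 + d)) / 2 ** k
    return total

# Componentwise convergence: every component of x_m equals (1/m, 1/m).
origin = [(0.0, 0.0)] * 20
for m in (1, 10, 100, 1000):
    x_m = [(1.0 / m, 1.0 / m)] * 20
    print(m, product_metric(x_m, origin))
```

Each term is bounded by $2^{-k}$, so the sum stays below one however far apart the components are; this is why no boundedness assumption on the component sets is needed here.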
The above concepts can be extended to products of metric spaces. Let $X_i$ be a
metric space with metric $\rho_i$. Define the product space $X = \prod_{i=1}^{\infty} X_i$ with the
product metric
$$\rho(x, y) = \sum_{i=1}^{\infty} \frac{1}{2^i r_i} \rho_i(x_i, y_i),$$
Combining the metric version of Schauder's theorem with previous results, assume $X \subseteq \mathbb{R}^n$ is nonempty and compact, and let $K \subseteq C_b(X)$ be a nonempty,
convex, bounded, closed, equicontinuous set; then every continuous function
$f : K \to K$ has at least one fixed point.
The analysis of the space of continuous functions can be extended to metric
spaces. If $X$ is a compact metric space, then the space $C_b(X)$ consists of all
bounded, continuous mappings $f : X \to \mathbb{R}$; and equipped with the supremum
$$\rho_p(f, g) = \left( \int |f(x) - g(x)|^p \, \mu(dx) \right)^{\frac{1}{p}}.$$
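Let $L^\infty(X, \mu)$ consist of all functions $f : X \to \mathbb{R}$ that are $\mu$-bounded, and define
$\rho_\infty$ as follows: given $f, g \in L^\infty(X, \mu)$, let
$\rho_\infty(f, g) =$ [...]
If $\mu$ is finite and $f, g \in L^\infty(X, \mu)$, then $\lim_{p \to \infty} \rho_p(f, g) = \rho_\infty(f, g)$. Furthermore, if $\mu$ is finite, then $p < q$ implies $L^q(X, \mu) \subseteq L^p(X, \mu)$.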
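Next are two fundamental inequalities for (equivalence classes of) measurable
functions. Recall that $p$ and $q$ are conjugate if $p, q \in (1, \infty)$ and $\frac{1}{p} + \frac{1}{q} = 1$. Then
Hölder's inequality is that if $p$ and $q$ are conjugate, then for all $f \in L^p(X, \mu)$
and all $g \in L^q(X, \mu)$, we have
$$\int |f(x) g(x)| \, \mu(dx) \leq \left( \int |f(x)|^p \, \mu(dx) \right)^{\frac{1}{p}} \left( \int |g(x)|^q \, \mu(dx) \right)^{\frac{1}{q}}.$$
Minkowski's inequality is that for all $f, g \in L^p(X, \mu)$, we have
$$\left( \int |f(x) + g(x)|^p \, \mu(dx) \right)^{\frac{1}{p}} \leq \left( \int |f(x)|^p \, \mu(dx) \right)^{\frac{1}{p}} + \left( \int |g(x)|^p \, \mu(dx) \right)^{\frac{1}{p}}.$$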
Analogously, if $f, g \in L^\infty(X, \mu)$, and if $|f(x)| \leq c$ and $|g(x)| \leq d$ for $\mu$-almost
all $x \in X$, then $\int |f(x) + g(x)| \, \mu(dx) \leq c + d$. An implication of Minkowski's
inequality is that any linear combination of functions in $L^p(X, \mu)$ also belongs
to $L^p(X, \mu)$. Another implication is that for all $f, g, h \in L^p(X, \mu)$, we have
$\rho_p(f, g) \leq \rho_p(f, h) + \rho_p(h, g)$, delivering the triangle inequality and confirming
that $\rho_p$ is in fact a metric.
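To make the two inequalities concrete, the following Python sketch checks Hölder's and Minkowski's inequalities on a finite set carrying a weighted counting measure, where integrals reduce to weighted sums. The function names and the sample data are illustrative assumptions, not part of the text.

```python
def lp_norm(f, weights, p):
    """(sum_x w(x) |f(x)|^p)^(1/p): the L^p norm for a measure on a finite set."""
    return sum(w * abs(v) ** p for v, w in zip(f, weights)) ** (1.0 / p)

def holder_gap(f, g, weights, p):
    """Right side minus left side of Hoelder's inequality, with q conjugate to p."""
    q = p / (p - 1.0)
    lhs = sum(w * abs(a * b) for a, b, w in zip(f, g, weights))
    return lp_norm(f, weights, p) * lp_norm(g, weights, q) - lhs

def minkowski_gap(f, g, weights, p):
    """Right side minus left side of Minkowski's inequality."""
    total = [a + b for a, b in zip(f, g)]
    return lp_norm(f, weights, p) + lp_norm(g, weights, p) - lp_norm(total, weights, p)

f = [1.0, -2.0, 3.0]
g = [0.5, 4.0, -1.0]
weights = [0.2, 0.3, 0.5]
print(holder_gap(f, g, weights, 3.0), minkowski_gap(f, g, weights, 3.0))
```

Both gaps are non-negative for any choice of data, up to floating-point rounding, and the Minkowski gap is exactly the slack in the triangle inequality for $\rho_p$.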
The spaces $L^p(X, \mu)$, with $p \in [1, \infty)$ or $p = \infty$, are complete when equipped
with the metric $\rho_p$. For $p \in [1, \infty)$, if $\mu$ is finite, then $L^p(X, \mu)$ is separable,
but $L^\infty(X, \mu)$ is separable if and only if $\mu$ has finite support, a very restrictive
condition. For the case of $X = \mathbb{R}^n$, $\lambda$ Lebesgue measure, and $p \in [1, \infty)$, the
Kolmogorov-Riesz compactness theorem states that a subset $K \subseteq L^p(\mathbb{R}^n, \lambda)$ is
compact if and only if all of the following hold:39
(i) $K$ is closed and bounded,
(ii) for every sequence $\{f_m\}$ in $K$ and every sequence $\{x_m\}$ in $\mathbb{R}^n$ with $x_m \to 0$,
we have $\int |f_m(x_m + y) - f_m(y)|^p \, dy \to 0$,
(iii) for every sequence $\{f_m\}$ in $K$ and for every sequence $r_m \to \infty$, we have
$\int_{\mathbb{R}^n \setminus B_{r_m}(0)} |f_m(y)|^p \, dy \to 0$.
where $Y^\epsilon = \{x \in X \mid \text{there exists } y \in Y \text{ with } \|x - y\| < \epsilon\}$. The collection of all
such open balls and arbitrary unions of open balls is sometimes called the weak*
topology.41 A sequence $\{\mu_m\}$ of probability measures in $P(X)$ converges to
$\mu \in P(X)$ if and only if for all measurable $Y \subseteq \mathbb{R}^n$ with $\mu(\mathrm{bd}(Y)) = 0$, we have
The measure of distance between sets we use is the Hausdorff metric, defined
Borel sets. Although the Lebesgue measurable sets include some sets that are not Borel, this
discrepancy does not affect this survey, because the Prohorov distance between two measures
on $\mathbb{R}^n$ equals the distance between the restrictions of the measures to the Borel $\sigma$-algebra.
Indeed, for every measurable Y , Theorem 10.23 (part 7) yields a -measure zero set B such
that Y B is Borel and a -measure zero set C such that Y C is Borel; then Y (B C)
is Borel, and (Y (B C)) ((Y (B C)) ) + implies (Y ) (Y ) + . See Section
18 for a general treatment of topological spaces.
42 The above observations hold also when X is a complete and separable metric space,
but then we must define probability measures on the Borel measurable sets. With that
modification, P(X) equipped with the Prohorov metric is complete and separable; and if in
addition X is compact, then so is P(X).
43 See the discussion on p.161 of Dunford and Schwartz (1958).
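$$h(Y, Z) = \max\left\{ \max_{y \in Y} \min_{z \in Z} \|y - z\|, \; \max_{z \in Z} \min_{y \in Y} \|y - z\| \right\},$$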
and a set $G \subseteq K(X)$ is open if for all $Y \in G$, there exists $\epsilon > 0$ such that
$B_\epsilon(Y) \subseteq G$. Here, of course, the open ball $B_r(Y)$ is a set of sets: it consists
of compact sets that are, in a sense, close to $Y$. Convergence in the Hausdorff
metric has a simple characterization when $X$ is compact.
Indeed, assuming $X$ is compact, let $\{Y_m\}$ be a sequence of sets in $K(X)$, and
let $Y \in K(X)$. Then the following conditions in conjunction are necessary and
sufficient for $h(Y_m, Y) \to 0$:
(i) for every subsequence $\{Y_{m_k}\}$ in $K(X)$ and every $x \in X$, if $x_{m_k} \in Y_{m_k}$ for
all $k$ and $x_{m_k} \to x$, then $x \in Y$,
(ii) for all $x \in Y$, there is a sequence $\{x_m\}$ in $X$ such that $x_m \in Y_m$ for all $m$
and $x_m \to x$.
In other words, $\{Y_m\}$ converges to $Y$ in the Hausdorff metric if and only if
$\mathrm{ls}(\{Y_m\}) = \mathrm{li}(\{Y_m\}) = Y$. Conditions (i) and (ii) are together referred to as closed
convergence of the sequence of sets. Interesting properties of the metric space
$K(X)$ equipped with the Hausdorff metric are that it is separable; if $X$ is closed,
then $K(X)$ is complete; and if $X$ is compact, then so is $K(X)$. When $X \subseteq \mathbb{R}^n$
is convex, $K(X)$ is also convex: given $Y, Z \in K(X)$ and $\alpha \in [0, 1]$, $\alpha Y + (1 - \alpha) Z$ is a
nonempty, compact subset of $X$. In fact, I show in the appendix that $K(X)$ is a
metric mixture space and that the Hausdorff metric is quasi-convex. Combining
the metric version of Schauder's theorem with previous results, assume $X \subseteq \mathbb{R}^n$
is nonempty, compact, and convex; then every continuous function $f : K(X) \to
K(X)$ has at least one fixed point.
The above concepts can be extended to subsets of a metric space $X$ with metric
$\rho$, defining Hausdorff distance using the metric on $X$. Again, let $K(X)$ be the
collection of nonempty, compact subsets of $X$, and define the Hausdorff metric
as
$$h_\rho(Y, Z) = \max\left\{ \max_{y \in Y} \min_{z \in Z} \rho(y, z), \; \max_{z \in Z} \min_{y \in Y} \rho(y, z) \right\}.$$
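For finite sets, which are always compact, the max-min formula for the Hausdorff metric can be evaluated directly. The following Python sketch does so under the assumption that points are tuples of floats and that the underlying metric defaults to the Euclidean distance; the function name `hausdorff` is illustrative.

```python
import math

def hausdorff(Y, Z, dist=math.dist):
    """Hausdorff distance between two finite, nonempty sets of points:
    the larger of the two one-sided max-min distances."""
    y_to_z = max(min(dist(y, z) for z in Z) for y in Y)
    z_to_y = max(min(dist(y, z) for y in Y) for z in Z)
    return max(y_to_z, z_to_y)

Y = [(0.0, 0.0), (1.0, 0.0)]
Z = [(0.0, 0.0)]
print(hausdorff(Y, Z))  # the point (1, 0) of Y is at distance 1 from Z
```

Taking the maximum of both one-sided terms matters: in this example the term measured from $Z$ to $Y$ is zero, because $Z \subseteq Y$, while the term measured from $Y$ to $Z$ is one.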
15 Transition Probabilities
The transition probability $\pi$ satisfies the Feller property if for every bounded,
continuous function $f : Y \to \mathbb{R}$, the function $g : Z \to \mathbb{R}$ defined by $g(z) = \int f(y) \, \pi(dy|z)$ is continuous, rather than only measurable. Given $\pi$, we can
define the mapping $P_\pi : Z \to P(Y)$ by $P_\pi(z) = \pi(\cdot|z)$. Then, in fact, $\pi$ satisfies
the Feller property if and only if $P_\pi$ is continuous with the Prohorov metric on
$P(Y)$.45 Thus, if $z_k \to z$ in $Z$, then $\pi(\cdot|z_k) \to \pi(\cdot|z)$ weakly. Therefore, by a
version of Lebesgue's dominated convergence theorem in Section 10, if $\{f^k\}$ is
a uniformly bounded sequence of functions $f^k : Y \to \mathbb{R}$ converging pointwise to
the continuous function $f : Y \to \mathbb{R}$, then $z_k \to z$ in $Z$ implies
$\int f^k(y) \, \pi(dy|z_k) \to \int f(y) \, \pi(dy|z)$.
Because the Feller property is quite useful, we note the following sufficient condition. Let $h : Y \times Z \to \mathbb{R}$ be a density for $\pi$ that is Carathéodory, i.e., the
function $h_y : Z \to \mathbb{R}$ defined by $h_y(z) = h(y, z)$ is continuous for all $y$, and the
function $h_z : Y \to \mathbb{R}$ defined by $h_z(y) = h(y, z)$ is measurable for all $z$. And
assume that the set $\{h_z : z \in Z\}$ is dominated by the $\lambda$-integrable function
$g : Y \to \mathbb{R}$. Let $z_k \to z$ in $Z$, and consider any bounded, continuous function
$f : Y \to \mathbb{R}$. By Lebesgue's dominated convergence theorem, we have
$$\int f(y) \, \pi(dy|z_k) = \int f(y) h(y, z_k) \, dy \to \int f(y) h(y, z) \, dy = \int f(y) \, \pi(dy|z),$$
which establishes the Feller property.
48
R =
i=1 Yi and for all i = 1, 2, . . ., sup{(Yi |z) | z Z} < .
Maintaining that $\pi$ is a transition probability and that $\nu$ and $\pi(\cdot|z)$ are absolutely continuous with respect to Lebesgue measure for all $z$, let $f : Y \times Z \to \mathbb{R}$
be measurable. Then a general form of Fubini's theorem states that if $f$ is
$\pi(\cdot|z) \otimes \nu$-integrable, then for $\nu$-almost all $z$, the function $f_z : Y \to \mathbb{R}$ defined
46 We maintain this absolute continuity condition for presentational purposes (to avoid
measure-theoretic issues), but this construction can accommodate probability measures $\pi(\cdot|z)$
that are not absolutely continuous, in which case the measure $\pi(\cdot|z) \otimes \nu$ must be defined on
the Borel $\sigma$-algebra.
47 At issue is whether the Lebesgue measurable sets are included among the sets measurable
with respect to the outer measure $(\pi(\cdot|z) \otimes \nu)^*$. The argument is similar to the product measure
case (which is found in the appendix) and is omitted.
48 See Theorem 2.6.2 of Ash (1972) Measure, Integration, and Functional Analysis, Academic Press: New York, NY. Although Ash states this result for Borel sets, we accommodate
Lebesgue measurable subsets of $\mathbb{R}^n$ using the absolute continuity assumption, as in the product
measure case in the appendix.
$$\mu(A \cap (\mathbb{R}^\ell \times C)) = \int_C P(A|z) \, \nu(dz)$$
$$P\left( \bigcup_{i=1}^{\infty} A_i \,\Big|\, z \right) = \sum_{i=1}^{\infty} P(A_i|z).$$
A weakness of this countable additivity property is that the exceptional set can
depend on the collection $\{A_1, A_2, \ldots\}$. In other words, there need not be a
single $\nu$-measure zero set such that for all $z$ outside this exceptional set, $P(\cdot|z)$
is a probability measure.
In fact, any given probability measure $\mu$ on $\mathbb{R}^n$ admits a regular conditional
probability, a mapping $\pi : L^\ell \times \mathbb{R}^m \to \mathbb{R}$ such that for $\nu$-almost all $z$, $\pi(\cdot|z)$ is
a probability measure on $\mathbb{R}^\ell$ and $\{P(A|\cdot) \mid A \in L^n\}$ is a system of conditional
probabilities, where $P(A|z) = \pi(A_z|z)$ for all measurable $A \subseteq \mathbb{R}^n$ and all $z$.52
An implication is that for each measurable $B \subseteq \mathbb{R}^\ell$, $\pi(B|z)$ is measurable as a
function of $z$, and therefore $\pi$ is a transition probability. Furthermore, we have
$$\mu(A) = \int P(A|z) \, \nu(dz) = \int \pi(A_z|z) \, \nu(dz)$$
for all measurable $A \subseteq \mathbb{R}^n$, and therefore $\mu = \pi(\cdot|z) \otimes \nu$. That is, the probability
measure $\mu$ induces the marginal $\nu$ and the transition probability $\pi$ (a regular
conditional probability), which return the initial probability measure. Note that
absolute continuity of $\pi(\cdot|z)$ with respect to Lebesgue measure is not used here.
51 See Theorem 6.4.6 of Ash (1972) Real Analysis and Probability, Academic Press: New
York, NY.
52 See Theorems 10.2.1 and 10.2.2 of Dudley (2002) Real Analysis and Probability, Cambridge University Press: Cambridge, UK. Dudley establishes a mapping $\pi$ such that $\mu(A) = \int \pi(A_z|z) \, \nu(dz)$ for all $A$ belonging to the product $\sigma$-algebra $L^\ell \otimes L^m$. By Theorem 10.23
(parts 6 and 7) of Aliprantis and Border (2006), for each measurable A Rn , there is a set
C L Lm (in fact, a Borel set) with -measure zero such that A \ C A C. Then
((A \ C)z |z) and ((A C)z |z) are measurable functions of z and equal for -almost all z;
moreover, these functions are equal to (Az |z) for -almost all z, and therefore the latter
function is measurable. Finally, it follows that (A) is equal to the integral across sections.
Transition probabilities are often used to model discrete time Markov chains,
in which case we equate $Y = Z$ and view an element $z \in Z$ as the state of
a system; then $\pi(\cdot|z)$ specifies the probability distribution over next period's
state, conditional on the current state being $z$. Given a transition probability
$\pi : L(Z) \times Z \to \mathbb{R}$, a distribution $\mu \in P(Z)$ over the current state induces a
distribution $T_\pi(\mu)$ over next period's state as follows: for all measurable subsets
$A \subseteq \mathbb{R}^m$,
$$T_\pi(\mu)(A) = \int \pi(A|z) \, \mu(dz).$$
The mapping $T_\pi$ is referred to as the adjoint of the transition probability.
A probability measure $\mu$ is an invariant distribution (or stationary distribution)
if $T_\pi(\mu) = \mu$, i.e., it is a fixed point of $T_\pi$. Note that if $\pi$ satisfies the Feller
property, then the mapping $T_\pi : P(Z) \to P(Z)$ is continuous with the Prohorov
metric on $P(Z)$. Indeed, let $\mu_k \to \mu$ weakly, and let $f : Z \to \mathbb{R}$ be any bounded,
continuous function. Then
$$\int_z f(z) \, T_\pi(\mu_k)(dz) = \int_z f(z) \int_{z'} \pi(dz|z') \, \mu_k(dz')$$
$$= \int_{z'} \left[ \int_z f(z) \, \pi(dz|z') \right] \mu_k(dz')$$
$$\to \int_{z'} \left[ \int_z f(z) \, \pi(dz|z') \right] \mu(dz')$$
$$= \int_z f(z) \, T_\pi(\mu)(dz),$$
where the second and third equalities follow from linearity of the integral in
the integrating measure, and the limit follows from the Feller property and
weak convergence. In fact, the converse holds as well.53 By the metric version
of Schauder's fixed point theorem, if $Z$ is compact and $\pi$ satisfies the Feller
property, so $P(Z)$ is compact and $T_\pi$ is continuous, then $\pi$ admits at least one
invariant distribution.
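When the state space is finite, the adjoint is a finite sum and an invariant distribution can be approximated by applying it repeatedly. The following Python sketch assumes a two-state chain whose transition probability is stored as a row-stochastic list of lists, `kernel[i][j]` being the probability of moving from state `i` to state `j`; the names and the numbers are illustrative.

```python
def adjoint(kernel, mu):
    """T(mu)(j) = sum_i kernel[i][j] * mu[i]: the distribution over next
    period's state when the current state is distributed according to mu."""
    n = len(mu)
    return [sum(kernel[i][j] * mu[i] for i in range(n)) for j in range(n)]

def iterate_adjoint(kernel, mu, steps):
    """Apply the adjoint `steps` times to the initial distribution mu."""
    current = list(mu)
    for _ in range(steps):
        current = adjoint(kernel, current)
    return current

kernel = [[0.9, 0.1],
          [0.5, 0.5]]
mu = iterate_adjoint(kernel, [1.0, 0.0], 200)
print(mu)                   # close to the invariant distribution (5/6, 1/6)
print(adjoint(kernel, mu))  # applying the adjoint again leaves it nearly unchanged
```

For this kernel the invariant distribution can be found by hand from $\mu_0 = 0.9 \mu_0 + 0.5 \mu_1$ and $\mu_0 + \mu_1 = 1$, which gives $(5/6, 1/6)$. Plain iteration converges here because the chain does not cycle; that is not guaranteed in general.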
To explore further the dynamic interpretation of transition probabilities, we
consider a transition probability $\pi : L(Z) \times Z \to \mathbb{R}$ satisfying Doeblin's condition:
there exist a finite measure $\psi$ on $\mathbb{R}^m$ and $\epsilon > 0$ such that for every measurable
set $C \subseteq \mathbb{R}^m$ and every $z \in Z$, $\psi(C) \leq \epsilon$ implies $\pi(C|z) \leq 1 - \epsilon$.54 In words, this
requires that the transition probability not be too concentrated on small sets,
where the meaning of "small" is given by $\psi$ and $\epsilon$. (Note that the measure $\psi$
in Doeblin's condition must satisfy $\psi(Z) > 0$.) Because Doeblin's condition is
quite useful, we note that it is satisfied if $\pi$ has a density $h : Z \times Z \to \mathbb{R}$ that is
bounded. Indeed, if $b \geq 1$ is an upper bound for $h$, then we can set $\psi = \lambda$ and
$\epsilon = 1/2b$; then given measurable $C$ with $\psi(C) \leq \epsilon$, we have $\lambda(C) \leq 1/2b$, and
therefore
$$\pi(C|z) = \int_C h(y, z) \, dy \leq \lambda(C) b \leq \frac{1}{2} \leq 1 - \frac{1}{2b} = 1 - \epsilon$$
for all $z$, as required.
Then a measurable set $C \subseteq Z$ is invariant (or absorbing or self-supporting) if
for all $z \in C$, we have $\pi(C|z) = 1$. And an invariant set $E$ is ergodic if it is
nonempty and contains no invariant set $C$ with smaller $\psi$-measure, i.e., there is
no invariant set $C \subseteq E$ with $\psi(C) < \psi(E)$. Every invariant set, and therefore
every ergodic set, has $\psi$-measure of at least $\epsilon$. Furthermore, if $E$ and $E'$ are
ergodic sets, then it must be either that $E$ and $E'$ are equivalent with respect
to $\psi$, i.e., $\psi(E \setminus E') = \psi(E' \setminus E) = 0$, or that they are disjoint with respect to
$\psi$, i.e., $\psi(E \cap E') = 0$. As a consequence, the number of meaningfully distinct
ergodic sets is bounded above by $\psi(Z)/\epsilon$. Furthermore, there is at least one
ergodic set.55 An invariant distribution $\mu$ is an ergodic distribution if there is
an ergodic set $E$ such that $\mu(E) = 1$.
Say a set $F$ is transient if for all $z \in Z$, we have $\pi(F|z) < 1$. Then $Z$ can
be partitioned into a transient set $F$ and a finite number $k$ of ergodic sets,
$E^1, \ldots, E^k$, such that every ergodic set $E$ is equivalent to some $E^i$, $i = 1, \ldots, k$,
with respect to $\psi$. For each ergodic set $E^i$, there is a unique ergodic distribution
$\mu^i$ satisfying $\mu^i(E^i) = 1$. Moreover, every invariant distribution is a convex
combination of the ergodic distributions $\mu^1, \ldots, \mu^k$ associated with these ergodic
sets: if $\mu$ is an invariant distribution, then there exist non-negative weights
$\alpha_1, \ldots, \alpha_k$ with $\sum_{i=1}^{k} \alpha_i = 1$ such that $\mu = \sum_{i=1}^{k} \alpha_i \mu^i$.
To complete the analysis of dynamics, define the transition probabilities $\pi^r$,
$r = 1, 2, \ldots$, as follows: $\pi^1 = \pi$, and for $r \geq 2$, for all measurable $C \subseteq \mathbb{R}^m$ and
all $z \in Z$, we specify $\pi^r(C|z) = \int \pi^{r-1}(C|z') \, \pi(dz'|z)$. That is, $\pi^r(C|z)$ is the
probability that given initial state $z$, the state belongs to the set $C$ after $r$ steps
of the process. Then, beginning from any state, the probability that the state
reaches an ergodic set (and remains there) approaches one over time and at a
geometric rate: there exist constants $c > 0$ and $d \in [0, 1)$ such that for all $r$ and
all $z \in Z$, we have
$$\pi^r\left( \bigcup_{i=1}^{k} E^i \,\Big|\, z \right) \geq 1 - c d^r.$$
55 See pp.206-207 of Doob (1953) Stochastic Processes, John Wiley and Sons: New York,
NY.
Let $T^r = T_{\pi^r}$, so that given an initial distribution $\mu$ over states, the distribution in $r$ steps is $T^r(\mu)$. Beginning with any distribution over states, the
average distribution over future states converges (in a strong sense) to an invariant distribution: for every probability measure $\mu$ with $\mu(Z) = 1$, there is
an invariant distribution $\mu^*$ such that the average distribution over $r$ steps, i.e., $\frac{1}{r} \sum_{t=1}^{r} T^t(\mu)$, satisfies
$$\rho_v\left( \frac{1}{r} \sum_{t=1}^{r} T^t(\mu), \mu^* \right) \to 0,$$
where $\rho_v$ is the total variation metric on $P(Z)$.56 In fact, if the initial distribution puts probability one on some ergodic set, say $\mu(E^i) = 1$, then the
limit distribution will be $\mu^* = \mu^i$, the ergodic distribution corresponding to the
ergodic set.
The reason the above limit results are stated for the average distributions is
that the Markov chain can cycle, even within an ergodic set: for example, if
$Z = \{z_1, z_2\}$ and $\pi(\{z_2\}|z_1) = \pi(\{z_1\}|z_2) = 1$, then starting from $z_1$, the chain
alternates endlessly between the two states. To preclude such cycles, it is sufficient to add the Feller property and the following mixing condition: for every
ergodic set $E^i$, there exists $z_i \in E^i$ such that for every open set $G \subseteq \mathbb{R}^m$ with
$z_i \in G$ and for every $z \in E^i$, we have $\pi(G|z) > 0$. Under the latter condition,
in combination with the Doeblin and Feller properties, the above limit results
can be stated not for the sequence of average distributions, but for the sequence
of distributions over states in each period, as in
$$\rho_v(T^r(\mu), \mu^*) \to 0,$$
56 See Theorem 5.6, discussion on pp.207-208, and Theorem 5.7 of Doob (1953) Stochastic
Processes, John Wiley and Sons: New York, NY.
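The two-state cycling example can be checked numerically. The Python sketch below starts the chain at the first state, then compares the distribution after $r$ steps with the average distribution over $r$ steps. It assumes that the total variation distance between two distributions on a finite set is half the sum of absolute differences, which is one common normalization and may differ from the one used in this survey; all names are illustrative.

```python
def adjoint(kernel, mu):
    """Distribution over next period's state, given current distribution mu."""
    n = len(mu)
    return [sum(kernel[i][j] * mu[i] for i in range(n)) for j in range(n)]

def total_variation(mu, nu):
    """Half the l1 distance between two distributions on a finite set."""
    return 0.5 * sum(abs(a - b) for a, b in zip(mu, nu))

def distribution_after(kernel, mu, r):
    """Distribution over states after r steps."""
    current = list(mu)
    for _ in range(r):
        current = adjoint(kernel, current)
    return current

def average_distribution(kernel, mu, r):
    """Average of the distributions after 1, 2, ..., r steps."""
    total = [0.0] * len(mu)
    current = list(mu)
    for _ in range(r):
        current = adjoint(kernel, current)
        total = [s + c for s, c in zip(total, current)]
    return [s / r for s in total]

flip = [[0.0, 1.0],
        [1.0, 0.0]]   # the chain alternates between its two states
start = [1.0, 0.0]
invariant = [0.5, 0.5]
for r in (11, 101, 1001):
    print(r,
          total_variation(distribution_after(flip, start, r), invariant),
          total_variation(average_distribution(flip, start, r), invariant))
```

The distance from the period-$r$ distribution to the invariant distribution stays at one half for every $r$, while the distance from the average distribution shrinks in proportion to $1/r$.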
Weak convergence of the sequence $\{\pi_k\}$ of transition probabilities to $\pi$ is equivalent to the following condition:57 for every measurable $C \subseteq \mathbb{R}^m$ and every
bounded, continuous function $f : Y \to \mathbb{R}$, we have
$$\int_C \int f(y) \, \pi_k(dy|z) \, \nu(dz) \to \int_C \int f(y) \, \pi(dy|z) \, \nu(dz).$$
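$$\rho(\pi, \pi') = \sum_{i=1}^{\infty} \sum_{j=1}^{\infty} \frac{1}{2^{i+j} \nu(B_j)} \left| \int_{z \in B_j} \int_y f_i(y) \, \pi(dy|z) \, \nu(dz) - \int_{z \in B_j} \int_y f_i(y) \, \pi'(dy|z) \, \nu(dz) \right|$$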
16 Continuous Correspondences
Figure 11: Upper hemicontinuity
all $m$, we have $y \in \varphi(x)$.60 Assuming $Y$ is compact, the converse direction holds
as well. Recall that the product space $X \times Y$ can be endowed with the product
metric, making it a metric space, and that a sequence $\{(x_m, y_m)\}$ converges to
$(x, y)$ in the product space if and only if $x_m \to x$ in $X$ and $y_m \to y$ in $Y$.
Defining the graph of $\varphi$ as
$$\mathrm{graph}(\varphi) = \{(x, y) \in X \times Y \mid y \in \varphi(x)\},$$
the correspondence $\varphi$ has closed graph if, fittingly, its graph is closed in the product space $X \times Y$. Assuming $Y$ is compact, a correspondence with closed
values is upper hemicontinuous if and only if it has closed graph. By a version
of Weierstrass' theorem for correspondences, the image of a compact set under
an upper hemi-continuous correspondence with compact values is compact: if
$\varphi : X \rightrightarrows Y$ is upper hemi-continuous and has compact values, then for all compact $K \subseteq X$, the image $\varphi(K) = \bigcup \{\varphi(x) \mid x \in K\}$ is a compact subset of $Y$.
Figure 12: Lower hemicontinuity
correspondence $\varphi$ has open graph if, fittingly, its graph is open in the product
space $X \times Y$. Every correspondence with open graph is lower hemi-continuous.
In fact, every correspondence with open lower sections, i.e., for all $y \in Y$,
$$\{x \in X \mid y \in \varphi(x)\}$$
is open, is lower hemi-continuous.
A correspondence $\varphi : X \rightrightarrows Y$ is continuous at $x \in X$ if it is both upper and
lower hemi-continuous at $x$. It is continuous if it is continuous at all $x \in X$, i.e.,
it is both upper and lower hemi-continuous. If $\varphi$ has singleton values, so there
is a function $f : X \to Y$ such that $\varphi(x) = \{f(x)\}$ for all $x \in X$, then upper
hemi-continuity, lower hemi-continuity, and continuity of the correspondence
are equivalent conditions, and they in turn are equivalent to continuity of the
function $f$. Assuming $Y$ is compact, and letting $K(Y)$ denote the nonempty,
compact subsets of $Y$ with the Hausdorff metric, $\varphi : X \rightrightarrows Y$ is continuous with
closed values if and only if the function $F : X \to K(Y)$ defined by $F(x) = \varphi(x)$
is continuous.
Upper hemi-continuity is preserved by finite unions, and under weak conditions
by arbitrary intersections and products: letting $X$ and $Y$ be metric spaces, and
letting $\{\varphi_i \mid i \in I\}$ be a collection of upper hemi-continuous correspondences
$\varphi_i : X \rightrightarrows Y$ indexed by elements of the set $I$,61
61 The result for intersections uses Aliprantis and Border's (2006) Corollary 3.21, which
implies that every metric space is a regular topological space, and their Theorem 17.25.
• [...]
$$\bigcup_{i=1}^{m} \varphi_i(x)$$
is upper hemi-continuous,
• if each $\varphi_i$ has closed graph, or if each $\varphi_i$ has closed values and at least
one is compact-valued, then the correspondence $\varphi : X \rightrightarrows Y$ defined by
$$\varphi(x) = \bigcap_{i \in I} \varphi_i(x)$$
is upper hemi-continuous,
• if $I$ is countable and each $\varphi_i$ has compact values, then the correspondence
$\varphi : X \rightrightarrows Y$ defined by
$$\varphi(x) = \prod_{i \in I} \varphi_i(x)$$
is lower hemi-continuous,
62 This relies on Lemma 17.22 of Aliprantis and Border (2006), with the fact that every
metric space is normal, as established in their Corollary 3.21.
$$\prod_{i=1}^{k} \varphi_i(x)$$
$= \{x \in X \mid f(x, p) = 0\}$.
is upper hemi-continuous with nonempty, compact, convex values with the Prohorov metric on $P(Y)$. Adding the assumption that $X$ is a separable metric
space, a stronger result is possible: then $\Phi$ is upper hemi-continuous if and only
if $\varphi$ is upper hemi-continuous; and $\Phi$ is lower hemi-continuous if and only if $\varphi$ is
lower hemi-continuous.63 For another special case, let $X \subseteq \mathbb{R}^n$ be measurable,
and define the support correspondence $\sigma : P(X) \rightrightarrows X$ by
$$\sigma(\mu) = \mathrm{supp}(\mu).$$
Then $\sigma$ is lower hemi-continuous and has closed values with the Prohorov metric
on $P(X)$.
Given metric space $X$, a special case of Michael's selection theorem is that
if $\varphi : X \rightrightarrows \mathbb{R}^m$ is lower hemi-continuous and has nonempty, closed, convex
values, then it admits a continuous selection, i.e., there is a continuous function
$f : X \to \mathbb{R}^m$ such that for all $x \in X$, we have $f(x) \in \varphi(x)$.
Let $X$ and $P$ be metric spaces, let $f : X \times P \to \mathbb{R}$ be a continuous function, and
let $\varphi : P \rightrightarrows X$ be a continuous correspondence with nonempty, compact values.
Then Berge's theorem of the maximum (or simply the maximum theorem) states
that the correspondence $\varphi^* : P \rightrightarrows X$ defined by
$$\varphi^*(p) = \arg\max_{x \in \varphi(p)} f(x, p)$$
For purposes of comparison, note that we can obtain a fixed point result from
Michael's selection theorem by replacing upper hemi-continuity in Glicksberg's
theorem with lower hemi-continuity: a correspondence $\varphi$ satisfying these conditions admits a continuous selection $f : X \to X$, which by the metric version
of Schauder's theorem admits a fixed point $x = f(x) \in \varphi(x)$.
17 Measurable Correspondences
m (x)
m=1
is lower measurable,
if X is separable, then the correspondence : X Y defined by
(x)
m (x)
m=1
: X Y k defined by
(x)
m (x)
m=1
is lower measurable.
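A correspondence $\varphi : X \rightrightarrows Y$ is lower measurable if and only if the correspondence $\bar{\varphi} : X \rightrightarrows Y$ defined by $\bar{\varphi}(x) = \mathrm{clos}(\varphi(x))$, the pointwise closure of $\varphi$, is
lower measurable.
When $Y = \mathbb{R}^m$ and the correspondence $\varphi : X \rightrightarrows \mathbb{R}^m$ is lower measurable, I
show in the appendix that the pointwise convex hull of $\varphi$ is lower measurable;
that is, the correspondence $\hat{\varphi} : X \rightrightarrows \mathbb{R}^m$ defined by $\hat{\varphi}(x) = \mathrm{conv}(\varphi(x))$ is
lower measurable. And when $\varphi : X \rightrightarrows \mathbb{R}^m$ has nonempty, compact values,
the correspondence $\varphi$ is lower measurable if and only if the correspondence
$\Phi : X \rightrightarrows P(\mathbb{R}^m)$ defined by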
(x) =
{x X | f (x, p) = 0}
65 See Theorem 3 of Himmelberg and Van Vleck (1975) "Multifunctions with Values in
a Space of Probability Measures," Journal of Mathematical Analysis and Applications, 50:
108-112.
$x \in X$, we have $\sup \|\varphi(x)\| = \sup\{\|y\| \mid y \in \varphi(x)\} \leq f(x)$. If $\varphi$ is integrably bounded and lower measurable with nonempty, closed values, then by
the Kuratowski-Ryll-Nardzewski selection theorem, since $\mathbb{R}^m$ is complete and
separable, it has a $\mu$-integrable selection, and therefore the correspondence is
$\mu$-integrable.
If $\mu$ is finite and $\varphi : X \rightrightarrows \mathbb{R}^m$ is $\mu$-integrably bounded with closed values, then
the integral $\int \varphi(x) \, \mu(dx)$ is a compact set.66 A version of Lyapunov's theorem
for correspondences states that if $\mu$ is finite and atomless, then the integral
$\int \varphi(x) \, \mu(dx)$ is convex; in fact, as long as $\varphi(x)$ is convex for each atom $x \in X$,
the integral will be convex for a general finite measure $\mu$.67 Now assume that $\mu$ is
finite and that $\varphi$ is $\mu$-integrably bounded and lower measurable with nonempty,
closed values. Then the convex hull of the integral is equal to the integral of the
convex hull:68
$$\mathrm{conv}\left( \int \varphi(x) \, \mu(dx) \right) = \int \mathrm{conv}(\varphi(x)) \, \mu(dx).$$
It follows that if $\mu$ is also atomless, then
$$\int \varphi(x) \, \mu(dx) = \int \mathrm{conv}(\varphi(x)) \, \mu(dx).$$
The latter claim may not seem surprising when $m = 1$ and $\varphi$ is real-valued,
given our intuition from the intermediate value theorem. To see how things
work in multiple dimensions, consider Figure 13, where $\varphi : [0, 1] \rightrightarrows \mathbb{R}^2$ maps
real numbers $x$ to pairs $(y_1, y_2)$ and takes as its constant value the boundary
of a triangle; then the center of the triangle, denoted $y^*$, clearly belongs to
$\int \mathrm{conv}(\varphi(x)) \, \mu(dx)$. To obtain it as the integral of a selection of $\varphi$, we select
the vertices of the triangle over an appropriate range (selecting each vertex over
roughly one third of the interval), thereby weighting the vertices to yield $y^*$.
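The triangle construction can be carried out exactly. The Python sketch below assumes Lebesgue measure on $[0, 1]$ and a particular triangle with vertices $(0,0)$, $(3,0)$, and $(0,3)$, chosen only for illustration. It integrates the step selection that equals each vertex on an interval of length one third and confirms that the resulting point is not on the boundary of the triangle, so it is not a value of the correspondence itself. Function names are hypothetical.

```python
from fractions import Fraction

def integrate_step_selection(vertices, lengths):
    """Integral over [0, 1] of a selection equal to vertices[i] on an
    interval of length lengths[i]; the lengths must sum to one."""
    assert sum(lengths) == 1
    dim = len(vertices[0])
    return tuple(sum(w * v[d] for v, w in zip(vertices, lengths)) for d in range(dim))

def on_segment(p, a, b):
    """True if the point p lies on the closed segment from a to b in the plane."""
    cross = (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0])
    if cross != 0:
        return False
    return (min(a[0], b[0]) <= p[0] <= max(a[0], b[0])
            and min(a[1], b[1]) <= p[1] <= max(a[1], b[1]))

def on_triangle_boundary(p, vertices):
    """True if p lies on one of the three edges of the triangle."""
    return any(on_segment(p, vertices[i], vertices[(i + 1) % 3]) for i in range(3))

vertices = [(0, 0), (3, 0), (0, 3)]
third = Fraction(1, 3)
center = integrate_step_selection(vertices, [third, third, third])
print(center, on_triangle_boundary(center, vertices))
```

Other interior points arise in the same way by changing the interval lengths to the barycentric weights of the target point.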
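It is sometimes of interest to consider measurable sets $X \subseteq \mathbb{R}^n$ and $P \subseteq \mathbb{R}^k$ and a
correspondence $\varphi : X \times P \rightrightarrows \mathbb{R}^m$, where the values $\varphi(x, p)$ of the correspondence
are parameterized by $p$. For $p \in P$, define $\varphi_p : X \rightrightarrows \mathbb{R}^m$ by $\varphi_p(x) = \varphi(x, p)$; let
$\nu$ be a finite measure on $\mathbb{R}^k$; and let $\pi : L^n \times P \to \mathbb{R}$ be a transition probability on
$X$, i.e., a mapping such that (i) for all $p \in P$, $\pi(\cdot|p)$ is a probability measure on
$\mathbb{R}^n$ with $\pi(X|p) = 1$, and (ii) for all measurable $Y \subseteq \mathbb{R}^n$, $\pi(Y|p)$ is measurable
as a function of $p$. Assume that $\pi$ satisfies the Feller property, so that $p_m \to p$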
Figure 13: Lyapunov's theorem for correspondences
[...] $p \mapsto \int \varphi_p(x) \, \pi(dx|p)$ is indicated in blue, and two selections from $\varphi_p(\cdot)$ and
$\varphi_{p'}(\cdot)$ are indicated in red.71
Let $X \subseteq \mathbb{R}^n$ be measurable and $P$ be a metric space, and consider a correspondence $\varphi : X \times P \rightrightarrows \mathbb{R}^m$. Letting $\mu$ be a finite measure on $\mathbb{R}^n$, we can
compute the integral $\int \varphi_p(x) \, \mu(dx)$ for each $p$. This determines a correspondence $\Phi : P \rightrightarrows \mathbb{R}^m$ defined as
$$\Phi(p) = \int \varphi_p(x) \, \mu(dx).$$
We have already seen conditions under which any measurable selection $g$ of
$\Phi$ can be generated by a jointly measurable selection $f$ of $\varphi$. Now we consider
the properties of $\Phi$ as $p$ varies; in particular, this correspondence will be upper
hemi-continuous quite generally. Assume that for $\mu$-almost all $x \in X$, the
correspondence $\varphi_x : P \rightrightarrows \mathbb{R}^m$ defined by $\varphi_x(p) = \varphi(x, p)$ has closed graph;
and assume that the family $\{\varphi_p \mid p \in P\}$ of correspondences is uniformly integrably bounded, i.e., there is a $\mu$-integrable function $f : X \to \mathbb{R}$ such that
endows $X \times P$ with the Borel $\sigma$-algebra. The assumption of the Feller property implies that the
mapping $p \mapsto \pi(\cdot|p)$ is Borel measurable. Using Castaing's theorem, lower measurability of $\varphi$
implies that it is the pointwise closure of a countable set $\{f_i\}$ of measurable selections. For each
$f_i$, Theorem 10.35 of Aliprantis and Border (2006) implies that there is a function $g_i$ that is
Borel measurable and equal to $f_i$ outside a measurable set $Z_i$ with Lebesgue measure zero. Let
$Z = \bigcup_{i=1}^{\infty} Z_i$, which is also measure zero, and define the correspondence $\tilde{\varphi}(z) = \mathrm{clos}(\{g_i(z)\})$.
Applying Castaing's theorem again, $\tilde{\varphi}$ is lower measurable with respect to the Borel $\sigma$-algebra
on $X \times P$. Moreover, for each $z \in (X \times P) \setminus Z$, we have $\varphi(z) = \mathrm{clos}(\{f_i(z)\}) = \mathrm{clos}(\{g_i(z)\}) =
\tilde{\varphi}(z)$. Artstein's theorem can then be applied to $\tilde{\varphi}$.
71 For the reader who has printed out this document on a laser printer, there are two red
lines, one for $p$ and one for $p'$, and there is one blue line, which varies over $p$.
for all $p \in P$ and all $x \in X$, $\sup \|\varphi_p(x)\| \leq f(x)$. Then a version of Fatou's
lemma for correspondences states that $\Phi$ has closed graph.72 Note as well that
if $\mu$ is atomless, then Lyapunov's theorem for correspondences implies that $\Phi$ is
convex-valued.
The above form of Fatou's lemma holds the integrating probability measure
fixed, in contrast to the analysis of jointly measurable selections. We recover
that generality by adding convex values. Returning to the variable measure
formulation, now let $X \subseteq \mathbb{R}^n$ be compact, let $P$ be a metric space, and let
$\pi : L^n \times P \to \mathbb{R}$ be a transition probability on $X$, i.e., a mapping satisfying (i)
and (ii), above. Assume that $\pi$ satisfies the Feller property, and assume that
$\varphi : X \times P \rightrightarrows \mathbb{R}^m$ is upper hemi-continuous and has non-empty, compact values.
Then the correspondence $\Phi : P \rightrightarrows \mathbb{R}^m$ defined by
$$\Phi(p) = \int \varphi_p(x) \, \pi(dx|p)$$
has closed graph.73
18 Topological Spaces
$\bigcap_{i=1}^{k} G_i \in T$,
A topological space is any set $X$ together with a topology $T$ (though it is customary to refer to $X$ itself, leaving the topology implicit). A set $U \subseteq X$ is open
if $U \in T$, and it is closed if its complement is open, i.e., $X \setminus U \in T$. We then define the boundary
of a set $Y \subseteq X$, denoted $\mathrm{bd}(Y)$, to consist of every $x \in X$ such that for all
$U \in T$ with $x \in U$, we have $U \cap Y \neq \emptyset$ and $U \cap (X \setminus Y) \neq \emptyset$. Extending the usual definition, a
set is open if and only if it is disjoint from its boundary, and it is closed if and
only if it contains its boundary. The interior of $Y$ is $\mathrm{int}(Y) = Y \setminus \mathrm{bd}(Y)$, the
closure of $Y$ is $\mathrm{clos}(Y) = Y \cup \mathrm{bd}(Y)$, and these sets comprise, respectively, the
largest open subset and smallest closed superset of $Y$. As always, finite unions
and arbitrary intersections of closed sets are closed. Normally, the focus is on
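relative topology on $X \subseteq \mathbb{R}^n$:
$$T^e(X) = \left\{ \bigcup G \;\Big|\; G \subseteq \{B_r(x) \cap X \mid r > 0, x \in \mathbb{R}^n\} \right\}$$
product topology on $X = \prod_{i=1}^{k} X_i$ with $X_i \subseteq \mathbb{R}^{n_i}$:
$$T = \left\{ \bigcup G \;\Big|\; G \subseteq \left\{ \prod_{i=1}^{k} B_{r_i}(x_i) \;\Big|\; r_i > 0, x_i \in \mathbb{R}^{n_i}, i = 1, \ldots, k \right\} \right\},$$
$$\{Y \cap G \mid G \in T\}.$$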
so again the collection $\{B_r(x) \mid r > 0, x \in X\}$ of open balls is a base for the
topology, and the metric topology is generated by this base; we may also say it
is generated by the metric $\rho$. A topology $T$ on a space $X$ is metrizable if there is
a metric $\rho$ on $X$ that generates the topology in the above way, i.e., $T = T_\rho$. Note
that a set $Y \subseteq \mathbb{R}^n$ is open if and only if for all $x \in Y$, there exists $\epsilon_x > 0$ such
that $B_{\epsilon_x}(x) \subseteq Y$, so $Y = \bigcup_{x \in Y} B_{\epsilon_x}(x)$, so the Euclidean topology is metrizable
by the Euclidean metric. The discrete topology is metrizable by the discrete
metric. There are topologies that are not metrizable.
there exists $n$ such that $U_n \subseteq V$. In such spaces, we can dispense with nets,
and the above equivalences for closed sets and continuous functions can all be
stated with sequences only. Assuming $X$ is first countable and $Y$ is an arbitrary
topological space:
• a set $F \subseteq X$ is closed if and only if for every convergent sequence $\{x_m\}$
in $F$ with limit $x \in X$, we have $x \in F$,
• a function $f : X \to Y$ is continuous if and only if for every convergent
sequence $\{x_m\}$ in $X$ with limit $x \in X$, we have $f(x_m) \to f(x)$ in $Y$.
Every metric space (including Euclidean space with the usual norm) is first
countable. There are topological spaces that are not first countable.
A vector space is a set $X$, which contains a zero vector, $0$, and on which addition
and scalar multiplication are defined and possess certain intuitive properties:
$x + y = y + x$
$(x + y) + z = x + (y + z)$
$x + 0 = x$
$x + (-1)x = 0$
$1x = x$
$\alpha(\beta x) = (\alpha\beta)x$
$\alpha(x + y) = \alpha x + \alpha y$
$(\alpha + \beta)x = \alpha x + \beta x$,
where $x, y, z$ range over $X$ and $\alpha, \beta$ range over the real numbers. As usual,
given a subset $A \subseteq X$ of a vector space, $\mathrm{conv}(A)$ denotes the set of every convex
combination of vectors in $A$, i.e., it consists of every $\alpha_1 x_1 + \cdots + \alpha_n x_n$ such
that $x_1, \ldots, x_n \in A$ and the weights $\alpha_i$ are non-negative and sum to one. A vector space $X$ is a
topological vector space (or tvs or linear topological space) when it is endowed
with a topology such that addition and scalar multiplication are continuous.74
Of course, $\mathbb{R}^n$ is a topological vector space with the Euclidean topology. Letting
$X$ be a topological space and $Y$ a topological vector space, if $f, g : X \to Y$ are
continuous and $\alpha, \beta \in \mathbb{R}$, then $\alpha f + \beta g$ is continuous.
Given a vector space $X$, a set $C \subseteq X$ is convex if for all $x, y \in C$ and all
$\alpha \in (0, 1)$, we have $\alpha x + (1 - \alpha) y \in C$. As usual, given $Y \subseteq X$ and a function
$f : Y \to X$, say $x \in Y$ is a fixed point of $f$ if $f(x) = x$. The topological vector
space $X$ is locally convex if for every $x \in X$ and every open set $V$ containing
74 In some treatments (e.g., Dunford and Schwartz (1958)), a topological vector space is by
definition also Hausdorff.
In words, the Lebesgue integral of the indicator function of any finite set is zero,
and the net of these integrals (directed by the collection D of finite subsets) does
not converge to the integral of the function taking a constant value of one, which
is one.
Given topological spaces $X$ and $Y$, a correspondence $\varphi : X \rightrightarrows Y$ assigns a
set $\varphi(x) \subseteq Y$ to each $x \in X$. The original definitions of upper and lower
hemi-continuity were entirely topological, i.e., they involved only open sets, so
they extend to the current setting unchanged. In particular, a correspondence
$\varphi : X \rightrightarrows Y$ is upper hemicontinuous if for every $x \in X$ and every open set
$V \subseteq Y$ with $\varphi(x) \subseteq V$, there is an open set $U \subseteq X$ with $x \in U$ such that for all
$z \in U$, we have $\varphi(z) \subseteq V$.
As usual, given a correspondence $\varphi : X \rightrightarrows X$, say $x \in X$ is a fixed point of
$\varphi$ if $x \in \varphi(x)$. Given a locally convex, Hausdorff topological vector space $X$,
Glicksberg's theorem states that if $K \subseteq X$ is nonempty, compact, and convex,
and if $\varphi : K \rightrightarrows K$ is upper hemicontinuous with the relative topology on $K$ and
has nonempty, closed, convex values, then $\varphi$ has at least one fixed point. This
generalizes Schauder's theorem, and all of the above applications of Schauder
can be extended to correspondences in the obvious way.75 It is noteworthy
that compactness in Glicksberg's theorem can be weakened somewhat. Let $C$
be a nonempty, closed, convex subset of a locally convex, Hausdorff topological
vector space; let $\varphi : C \rightrightarrows C$ have closed graph and nonempty, convex values;
and assume that there is a compact set $K \subseteq X$ such that $\varphi(x) \subseteq K$ for all
$x \in C$. Then the Bohnenblust-Karlin theorem states that $\varphi$ has at least one
fixed point.
19
$T^q_w$.
This means that when $\mu$ is finite, a compact subset $K$ in the weak topology
on $L^q(X, \mu)$ will also be compact in the weak topology on $L^p(X, \mu)$; and given
any continuous function $\phi : L^p(X, \mu) \to \mathbb{R}$ with the weak topology on $L^p(X, \mu)$,
the restriction $\phi|_{L^q(X, \mu)} : L^q(X, \mu) \to \mathbb{R}$ will be continuous with the weak topology on
$L^q(X, \mu)$. Similar remarks hold for the weak* topologies on $L^p(X, \mu)$, with
$p \in (1, \infty)$ and $p = \infty$.
Given $p \in [1, \infty)$, if a set $F \subseteq L^p(X, \mu)$ is closed (resp. open, compact) in the
weak topology, then it is weakly closed (resp. weakly open, weakly compact).
Given a topological space $Y$, a function $\phi \colon L^p(X, \mu) \to Y$ is weakly continuous if
it is continuous with respect to the weak topology on $L^p(X, \mu)$, i.e., for all open
$G \subseteq Y$, the pre-image $\phi^{-1}(G)$ is weakly open. Given a subset $L \subseteq L^p(X, \mu)$,
the relative weak topology on $L$ is
$$T_w^p(L) = \{L \cap G \mid G \in T_w^p\}.$$
For example, in the discussion above, we argued that $T_w^p(L^q(X, \mu)) \subseteq T_w^q$ when
$\mu$ is finite and $p < q$. A set $F \subseteq L$ is weakly closed (resp. weakly open, weakly
compact) relative to $L$ if it is closed (resp. open, compact) in the topology
$T_w^p(L)$. The function $\phi \colon L \to Y$ is weakly continuous relative to $L$ if for all open
$G \subseteq Y$, $\phi^{-1}(G)$ is weakly open relative to $L$. Analogously, given $p \in (1, \infty)$ or
$p = \infty$, we define weak* closed, weak* open, and weak* compact sets and weak*
continuous functions using the topology $T_*^p$. Given a subset $L \subseteq L^p(X, \mu)$, the
relative weak* topology on $L$ is
$$T_*^p(L) = \{L \cap G \mid G \in T_*^p\},$$
and we define relatively weak* closed, open, and compact sets and relatively
weak* continuous functions as above.
The weak topology on $L^p(X, \mu)$, with $p \in [1, \infty)$, underlies the notion of weak
convergence. Although we have defined weak convergence for sequences, the
topological approach raises the possibility of convergent nets, as well as sequences.
Given $p$ and $q$ conjugate with $p \in [1, \infty)$, a net $\{f_\alpha\}$ converges in
the weak topology (or just weakly) to $f$ in $L^p(X, \mu)$ if it converges in the weak
topology $T_w^p$. I show in the appendix that after some simplification this reduces
to the following: $f_\alpha \to f$ weakly in $L^p(X, \mu)$ if and only if for all $\epsilon > 0$, all $n$,
and all $g_1, \ldots, g_n \in L^q(X, \mu)$, there exists $\alpha$ such that for all $\beta \geq \alpha$, we have
$f_\beta \in U(f, \epsilon, g_1, \ldots, g_n)$. Equivalently, $f_\alpha \to f$ weakly if for all $g \in L^q(X, \mu)$,
$$\int (f_\alpha(x) - f(x))\,g(x)\,\mu(dx) \to 0.$$
Thus, the weak topology extends the notion of weak convergence of sequences
in $L^p(X, \mu)$ with $p \in [1, \infty)$. Consistent with the discussion in the preceding
paragraph, assuming $\mu$ is finite and $p < q$, if $\{f_\alpha\} \subseteq L^q(X, \mu)$ converges to
$f \in L^q(X, \mu)$ in the weak topology $T_w^q$, then it converges in the weak topology
$T_w^p$. By definition, for $p \in (1, \infty)$, weak convergence and weak* convergence
coincide. Finally, a net $\{f_\alpha\}$ converges in the weak* topology (or just weak*) to
$f$ in $L^\infty(X, \mu)$ if it converges in the weak* topology $T_*^\infty$ described above: for
all $\epsilon > 0$, all $n$, and all $g_1, \ldots, g_n \in L^1(X, \mu)$, there exists $\alpha$ such that for all
$\beta \geq \alpha$, we have $f_\beta \in U(f, \epsilon, g_1, \ldots, g_n)$. Equivalently, $f_\alpha \to f$ weak* if for all
$g \in L^1(X, \mu)$,
$$\int (f_\alpha(x) - f(x))\,g(x)\,\mu(dx) \to 0.$$
So, again, the weak* topology extends the notion of weak* convergence of a
sequence of functions.
For the special case of a sequence $\{f_m\}$ in $L^1(X, \mu)$, we can provide some
characterization of weak limits. Recall from the sawtooth example in Figure 9 that a
sequence may converge weakly yet possess no subsequence (or subnet) that converges
pointwise $\mu$-almost everywhere to its limit $f$ (which takes a constant value
of one in the figure), but there is still a characterization in terms of convex hulls
of pointwise limits. Recall that a set $K \subseteq L^1(X, \mu)$ is uniformly $\mu$-integrable
if for all $\epsilon > 0$, there exists $\delta > 0$ such that for all $f \in K$ and all measurable
$Y \subseteq X$ with $\mu(Y) < \delta$, we have $\int_Y |f(x)|\,\mu(dx) < \epsilon$. Let $\{f_m\}$ be a uniformly
$\mu$-integrable sequence that weakly converges to $f$ in $L^1(X, \mu)$. If $\mu$ is finite, then for
$\mu$-almost all $x \in X$, $f(x)$ is contained in the convex hull of accumulation points
of the sequence $\{f_m(x)\}$, i.e., $f(x) \in [\liminf f_m(x), \limsup f_m(x)]$ (where we
ignore the $\mu$-measure zero set on which one of these limits may be infinite). In the
sawtooth example, we have $f(x) = 1 \in [0, 2] = [\liminf f_m(x), \limsup f_m(x)]$ for
almost all $x$. More generally, let $\{f^m\}$ be a sequence of vector-valued functions
$f^m = (f_1^m, \ldots, f_n^m)$ and $f = (f_1, \ldots, f_n)$ such that $f^m \colon X \to \mathbb{R}^n$ is measurable
for each $m$, let $f \colon X \to \mathbb{R}^n$ be measurable, and assume that $\{f_i^m\}$ is uniformly
$\mu$-integrable and weakly converges to $f_i$ in $L^1(X, \mu)$, $i = 1, \ldots, n$. If $\mu$ is finite,
then for $\mu$-almost all $x \in X$, we have:78
$$f(x) \in \operatorname{conv} \operatorname{ls}\{f^m(x)\}.$$
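The gap between weak and pointwise convergence can be checked numerically. Figure 9 is not reproduced here, so the sketch below assumes a hypothetical square wave in its place: $f_m$ equals 2 on every odd cell $[j/2^m, (j+1)/2^m)$ of $[0, 1]$ and 0 on every even cell. The code pairs $f_m$ with the single test function $g(x) = x$ and compares the result with the pairing of the constant function one; it illustrates the weak limit for that one $g$ and is not a proof of weak convergence.

```python
from fractions import Fraction

def pairing_with_identity(m):
    """Exact integral of f_m(x) * g(x) over [0, 1] with g(x) = x.

    f_m is an assumed square wave (standing in for the sawtooth of Figure 9):
    it equals 2 on [j/N, (j+1)/N) when j is odd and 0 when j is even, N = 2**m.
    """
    n_cells = 2 ** m
    total = Fraction(0)
    for j in range(1, n_cells, 2):           # odd cells, where f_m = 2
        a, b = Fraction(j, n_cells), Fraction(j + 1, n_cells)
        total += 2 * (b * b - a * a) / 2     # 2 * (integral of x over [a, b])
    return total

limit = Fraction(1, 2)  # integral of 1 * x over [0, 1], the pairing of the weak limit
gaps = [abs(pairing_with_identity(m) - limit) for m in range(1, 9)]
print(gaps)
```

Each $f_m$ takes only the values 0 and 2, so the pointwise values never settle at 1, yet the pairings approach the pairing of the constant function one, which lies in the interval $[0, 2]$ spanned by the pointwise accumulation points.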
The weak and weak* topologies are Hausdorff when $\mu$ is $\sigma$-finite, as shown in
the appendix. Furthermore, each basic open set $U(f, \epsilon, g_1, \ldots, g_n)$ is convex,
which implies that the weak and weak* topologies are locally convex. The weak
and weak* topologies are not generally metrizable. Recall that for $p \in [1, \infty)$
or $p = \infty$, we denote by $D_r^p(f)$ the disc of radius $r$ around $f$ in the metric $\rho_p$.
A special case of Alaoglu's theorem is that for all $p \in (1, \infty)$, all $r > 0$, and all
$f \in L^p(X, \mu)$, the disc $D_r^p(f)$ is compact in the weak* (or equivalently, weak)
topology on $L^p(X, \mu)$; and for all $r > 0$ and all $f \in L^\infty(X, \mu)$, the disc $D_r^\infty(f)$ is
compact in the weak* topology on $L^\infty(X, \mu)$. But Alaoglu's theorem does not
apply to the discs $D_r^1(f)$ in $L^1(X, \mu)$ with the weak topology; in general, this
disc need not be weakly compact (see the discussion in the next paragraph).
Of course, a set is contained in some disc $D_r^p(f)$ if and only if it is a bounded subset of
$L^p(X, \mu)$. It follows that for $p \in (1, \infty)$, if a subset $K \subseteq L^p(X, \mu)$ is convex,
bounded, and closed in the metric topology $T^p$, then it is compact in the weak
(equivalently, weak*) topology.
Weak compactness in $L^1(X, \mu)$ presents special difficulties, not because the weak
topology is very strong (recall that it is generated by basic sets indexed by only
functions $g_1, \ldots, g_n \in L^\infty(X, \mu)$), but because the space is large. Assuming $\mu$
is finite, however, a set $K \subseteq L^1(X, \mu)$ is compact in the weak topology if and only
if it is bounded in the $L^1$-metric, weakly closed, and uniformly $\mu$-integrable.79
An implication is that for convex $K \subseteq L^1(X, \mu)$, the set $K$ is compact in the
weak topology if and only if it is closed and bounded in the metric topology
and uniformly $\mu$-integrable. To see why uniform $\mu$-integrability is crucial for
compactness in $L^1(X, \mu)$, return to the earlier example of the sequence $\{f_m\}$ of
functions $f_m \colon [0, 1] \to \mathbb{R}$ defined by $f_m = m I_{[1 - \frac{1}{m}, 1]}$. Note that $\int |f_m(x)|\,dx = 1$
for all $m$, so the sequence belongs to the disc $D_1^1(0)$ of radius one around the zero
function in $L^1([0, 1], \lambda_{[0,1]})$, where $\lambda_{[0,1]}$ is the restriction of Lebesgue measure to
measurable subsets of the unit interval. But there is no subsequence (or subnet)
of $\{f_m\}$ that converges to any function in the weak topology. Indeed, if there
were such a function, say $f$, it must be that $f$ takes the value of zero on $[0, 1)$.
Letting $g_1 \in L^\infty([0, 1], \lambda_{[0,1]})$ take the constant value of one, weak convergence
implies $\int f_m(x) g_1(x)\,dx \to \int f(x) g_1(x)\,dx$, but this implies that $\int f(x)\,dx = 1$,
which is impossible. Obviously, the sequence $\{f_m\}$ is not uniformly integrable,
because there are arbitrarily small intervals $[1 - \frac{1}{m}, 1]$ over which integrals of $f_m$ do
not go to zero. In contrast, the corresponding example in $L^2(X, \mu)$ would specify
$f_m = \sqrt{m}\, I_{[1 - \frac{1}{m}, 1]}$, so that $f_m \in D_1^2(0)$, but then $\int f_m(x) g_1(x)\,dx = \frac{1}{\sqrt{m}} \to 0$,
avoiding the difficulty in $L^1(X, \mu)$.
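The failure of uniform integrability in this example can be verified with closed-form integrals. The following sketch uses hypothetical helper names and relies only on the definitions above: it computes the integral of $|f_m|$ over a small set $Y = [1 - \delta, 1]$ for the spike $f_m = m I_{[1 - 1/m, 1]}$, and the pairing of the rescaled spike $\sqrt{m}\, I_{[1 - 1/m, 1]}$ with the constant function one.

```python
import math
from fractions import Fraction

def tail_mass_l1_spike(m, delta):
    """Integral of |f_m| over Y = [1 - delta, 1], where f_m = m on [1 - 1/m, 1], 0 elsewhere."""
    overlap = min(Fraction(delta), Fraction(1, m))  # length of Y intersected with the spike
    return m * overlap

def worst_tail_mass(delta, max_m=2000):
    """Largest tail mass over m = 1, ..., max_m for a fixed small set of measure delta."""
    return max(tail_mass_l1_spike(m, delta) for m in range(1, max_m + 1))

def pairing_l2_spike(m):
    """Integral of sqrt(m) * I_[1 - 1/m, 1] against the constant function one."""
    return math.sqrt(m) / m

# However small delta is, some f_m still puts its entire unit mass on Y.
print(worst_tail_mass(Fraction(1, 100)), worst_tail_mass(Fraction(1, 1000)))
# The rescaled spikes pair with the constant one to 1/sqrt(m), which vanishes.
print(pairing_l2_spike(10 ** 6))
```

The first line of output stays at one no matter how small $\delta$ is, which is exactly the violation of uniform integrability; the rescaled example has no such obstruction.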
The above result on compactness in $L^1(X, \mu)$ can be applied to selections of
correspondences. Assume $\mu$ is a finite measure, and let $\varphi \colon X \rightrightarrows \mathbb{R}$ be $\mu$-integrably
bounded and lower measurable with closed, convex values. Then the set of
$\mu$-integrable selections,
$$S(\varphi) = \{ f \in L^1(X, \mu) \mid f(x) \in \varphi(x) \text{ for } \mu\text{-almost all } x \in X \},$$
is a weakly compact subset of $L^1(X, \mu)$.80 I provide a short proof of this result
in the appendix. Note that the assumption of convex values is crucial for the
result: the constant correspondence $\varphi \colon [0, 1] \rightrightarrows \mathbb{R}$ defined by $\varphi(x) = \{0, 2\}$
satisfies the remaining conditions of the result, and the sawtooth functions in
Figure 9 are selections from $\varphi$, but they converge weakly to the function with
constant value one, which is not a selection from $\varphi$.
79 See Theorem 15, p.76, of J. Diestel and J. Uhl (1977) Vector Measures, American Mathematical Society: Providence RI.
Combining Schauder's theorem with previous results:
• If $K \subseteq L^1(X, \mu)$ is nonempty, convex, bounded, closed in the metric topology,
and uniformly $\mu$-integrable, then every weakly continuous $\phi \colon K \to K$
has at least one fixed point.
• Given $p \in (1, \infty)$, if $K \subseteq L^p(X, \mu)$ is nonempty, convex, bounded, and
closed in the metric topology, then every weakly continuous $\phi \colon K \to K$
has at least one fixed point.
• If $K \subseteq L^\infty(X, \mu)$ is nonempty, convex, bounded, and closed in the weak*
topology, then every weak* continuous $\phi \colon K \to K$ has at least one fixed
point.
Using Glicksberg's theorem, these results remain true if we replace weakly (or
weak*) continuous functions $\phi \colon K \to K$ with weakly (or weak*) upper hemicontinuous
correspondences $\varphi \colon K \rightrightarrows K$ such that for all $f \in K$, $\varphi(f)$ is nonempty,
convex, and closed in the metric topology.
Although the weak and weak* topologies are not metrizable, we can consider
the relative topologies they induce on discs. This is potentially useful because it
can allow us to work with sequences in relevant sets, avoiding problems entailed
by nets. Specifically, for $p \in (1, \infty)$ and $f \in L^p(X, \mu)$, we give the disc $D_r^p(f)$
the relative topology induced by the weak* topology on $L^p(X, \mu)$; that is, we
endow the space $D_r^p(f)$ with the topology
$$T_*^p(r, f) = T_*^p(D_r^p(f)) = \{D_r^p(f) \cap G \mid G \in T_*^p\}.$$
$$= \{D_r^p(f) \cap G \mid G \in T_w^p\},$$
and ask whether the relative topology $T_w^p(r, f)$ is metrizable. For the case
$p \in (1, \infty)$, these relative topologies are the same, i.e., $T_w^p(r, f) = T_*^p(r, f)$, because
the weak and weak* topologies coincide. When $p = 1$, however, the relative
weak topology $T_w^1(r, f)$ on the disc $D_r^1(f)$ is not metrizable unless $L^\infty(X, \mu)$
is separable, which holds only if $\mu$ has finite support. Thus, $L^1(X, \mu)$ again
presents certain difficulties.
Although the discs in $L^1(X, \mu)$ with the weak topology may not be metrizable, it
turns out that compactness nevertheless has a sequential characterization. For
$p \in [1, \infty)$, say $K \subseteq L^p(X, \mu)$ is weakly sequentially compact if every sequence
$\{f_m\}$ in $K$ has a subsequence that converges weakly to a limit in $K$.83 By
the Eberlein-Šmulian theorem, a set $K \subseteq L^p(X, \mu)$ is compact in the weak
topology if and only if it is weakly sequentially compact.84 Because every weakly
convergent sequence is bounded,85 an implication is that every weakly compact
set is bounded. In fact, even though the disc $D_r^1(f)$ is not metrizable, if a set
$K \subseteq L^1(X, \mu)$ is compact in the weak topology, then the relative topology on
$K$ induced by the weak topology, $T_w^1(K) = \{G \cap K \mid G \in T_w^1\}$, is metrizable.86
Of course, a set $K \subseteq L^p(X, \mu)$, with $p \in (1, \infty)$ or $p = \infty$, is weak* sequentially
compact if every sequence $\{f_m\}$ in $K$ has a subsequence that converges weak*
to a limit in $L^p(X, \mu)$.
Given a subset $K \subseteq L^p(X, \mu)$, with $p \in [1, \infty)$, a function $\phi \colon K \to K$ is weakly
sequentially continuous if for all $f \in K$ and all sequences $\{f_m\}$ in $K$ weakly
converging to $f$, we have $\phi(f_m) \to \phi(f)$ weakly. Analogously, given $K \subseteq L^p(X, \mu)$,
with $p \in (1, \infty)$ or $p = \infty$, the function $\phi$ is weak* sequentially continuous if the
same condition holds after replacing weak convergence with weak*. Combining
Schauder's theorem with previous results:
• Given $p \in [1, \infty)$, if $K \subseteq L^p(X, \mu)$ is nonempty, convex, and weakly
sequentially compact, then every weakly sequentially continuous $\phi \colon K \to K$
has at least one fixed point.
82 See
$$\int f(x)\,g(x)\,\mu(dx)$$
$$\Pi^p = \{p_f \mid f \in L^p(X, \mu)\}$$
be the set of mappings so-defined. Note that $p_f \in \prod_{g \in L^q(X,\mu)} \mathbb{R} = \mathbb{R}^{L^q(X,\mu)}$,
and let $T$ denote the product topology on $\mathbb{R}^{L^q(X,\mu)}$. Then the weak topology
on $L^p(X, \mu)$ is essentially the relative topology on $\Pi^p$ induced by the product
topology on $\mathbb{R}^{L^q(X,\mu)}$, i.e., we have
$$T_w^p \approx \{\Pi^p \cap G \mid G \in T\},$$
20 Maximal Elements
The axiom of choice is one of the axioms usually imposed in set theory.
Given any arbitrary collection $\mathcal{A}$ of nonempty sets, it says there is a function
$f \colon \mathcal{A} \to \bigcup \mathcal{A}$ such that for all $A \in \mathcal{A}$, $f(A) \in A$. The set of such mappings is
$\prod \mathcal{A}$, the Cartesian product of the collection. Often, we assume the collection is
indexed by a set $I$ via a bijective mapping $\alpha \mapsto A_\alpha$ from $I$ to $\mathcal{A}$, and the product is written
$\prod_{\alpha \in I} A_\alpha$. Thus, the axiom of choice is simply that the Cartesian product of
nonempty sets is nonempty.
There are several equivalent reformulations of the axiom of choice. A relation on
$X$ is a two-place predicate expressing a property possessed by pairs of elements
in $X$. Formally, a relation, typically denoted $\geq$, can be viewed as a subset of
$X \times X$, and we write simply $x \geq y$ for $(x, y) \in\ \geq$. A partial order is a relation
$\geq$ that is (i) reflexive, i.e., for all $x \in X$, $x \geq x$, (ii) anti-symmetric, i.e., for all $x, y \in X$,
$x \geq y$ and $y \geq x$ implies $x = y$, and (iii) transitive, i.e., for all $x, y, z \in X$, $x \geq y$
and $y \geq z$ implies $x \geq z$. Given the relation $\geq$ on $X$, the set $C \subseteq X$ is a chain
if for all $x, y \in C$, either $x \geq y$ or $y \geq x$. An element $x \in X$ is an upper bound
of $C$ if for all $y \in C$, $x \geq y$. An element $x \in X$ is maximal for the partial order
$\geq$ if for all $y \in X$, $y \geq x$ implies $x = y$. It turns out that the axiom of choice is
equivalent to Zorn's lemma, which states that given any set $X$ and partial order
$\geq$ on $X$, if every chain has an upper bound, then there is at least one maximal
element.
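For a finite set the existence of maximal elements needs no appeal to Zorn's lemma, but a finite case makes the definitions concrete. The sketch below is an illustration only; the divisibility order on $\{1, \ldots, 12\}$ and the function names are assumptions chosen for the example, not taken from the text.

```python
def divides_order(x, y):
    """x >= y in the (assumed) divisibility order: y divides x."""
    return x % y == 0

def maximal_elements(elements, geq):
    """Elements x such that y >= x implies y == x, for all y in the set."""
    return [x for x in elements if all(y == x or not geq(y, x) for y in elements)]

X = list(range(1, 13))
result = maximal_elements(X, divides_order)
print(result)
```

Maximal elements need not be unique, and none of them dominates the whole set: the numbers with no proper multiple in $X$ are all maximal, although most pairs of them are not comparable.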
A well-ordering of a set $X$ is a partial order $\geq$ on $X$ such that every nonempty subset
$Y$ of $X$ possesses a unique dominant element, i.e., there exists $x \in Y$ such that
for all $y \in Y \setminus \{x\}$, we have $x \geq y$. The axiom of choice is equivalent to the
well-ordering principle, which states that given any nonempty set $X$, there is
a well-ordering of $X$. Of course, the reverse of the usual ordering of natural
numbers is a well-ordering (because every nonempty subset of natural numbers
has a unique minimum), but a well-ordering of the real numbers, for example,
cannot be explicitly constructed. The usual "greater than" relation does not
suffice, of course, because there is no greatest real number, nor for that matter
is there a greatest real number in the open interval $(0, 1)$.
Given a partial order $\geq$ on $X$ and a chain $C$, say the chain is maximal if there
is no other chain that contains it, i.e., there does not exist $x \in X \setminus C$ such that
for all $y \in C$, either $x \geq y$ or $y \geq x$. The axiom of choice is equivalent to
the Hausdorff maximality principle, which states that given any set $X$ and any
partial order $\geq$ of $X$, there is at least one maximal chain.
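In a finite partially ordered set a maximal chain can be built greedily, which gives a concrete picture of the Hausdorff maximality principle. The sketch below is an illustration under assumed names, again using divisibility on $\{1, \ldots, 12\}$ as a hypothetical example; the greedy pass is a finite-case device and says nothing about the infinite case, where the axiom of choice is needed.

```python
def divides_order(x, y):
    """x >= y in the (assumed) divisibility order: y divides x."""
    return x % y == 0

def is_chain(candidate, geq):
    """True when every pair of elements in the candidate is comparable."""
    return all(geq(x, y) or geq(y, x) for x in candidate for y in candidate)

def extend_to_maximal_chain(elements, geq, start=()):
    """Scan the elements once, adding each one that keeps the set a chain.

    A rejected element is incomparable to something already in the chain,
    and that element is never removed, so a single pass yields a maximal chain.
    """
    chain = list(start)
    for x in elements:
        if x not in chain and is_chain(chain + [x], geq):
            chain.append(x)
    return chain

X = list(range(1, 13))
chain = extend_to_maximal_chain(X, divides_order)
print(chain)
```

Starting the scan from a different element, for instance with `start=(3,)`, generally produces a different maximal chain, so maximal chains are not unique.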
As an application, let $X$ be a topological space and $\mathcal{K}$ any nonempty collection
of nonempty, compact subsets of $X$. Then there is a nonempty, compact set
$K \subseteq X$ that is minimal with respect to set inclusion among the sets in $\mathcal{K}$, i.e.,
there does not exist $K' \in \mathcal{K} \setminus \{K\}$ such that $K' \subseteq K$. To see this, let $\geq$ be the
Given the reflexive relation $\succsim$, let $\succ$ be the associated dual relation defined so
that for all $x, y \in X$, the statement $x \succ y$ holds if and only if it is not the
case that $y \succsim x$. It follows that $\succ$ is irreflexive, i.e., for all $x \in X$, not $x \succ x$.
Given the relation $\succ$, say an element $x \in X$ is undominated if for all $y \in X$,
not $y \succ x$. Undominated elements are always maximal. Condition (iii′) can be
reformulated as follows: there does not exist a finite set $Y \subseteq X$ such that for
all $x \in X$, there exists $y \in Y$ with $y \succ x$. The dominant elements of $\succsim$ coincide
with the undominated elements of $\succ$, and therefore under (i), (ii), and (iii′), the
relation $\succ$ has at least one undominated element.
A relation $\succsim$ on $X$ is complete if for all $x, y \in X$, either $x \succsim y$ or $y \succsim x$. This
is equivalent to the condition that the dual $\succ$ is asymmetric, in the sense that
for all $x, y \in X$, not both $x \succ y$ and $y \succ x$. Note that completeness strengthens
reflexivity, and asymmetry strengthens irreflexivity; and when $\succsim$ is complete,
or equivalently when $\succ$ is asymmetric, the dominant elements of $\succsim$ and undominated
elements of $\succ$ coincide with the maximal elements of these relations. We
say $\succsim$ is negatively acyclic if for all $n$ and all finite sets $\{x_1, \ldots, x_n\} \subseteq X$, either
$x_n \succsim x_1$ or there exists $i = 1, \ldots, n - 1$ such that $x_i \succsim x_{i+1}$. Equivalently,
the dual $\succ$ is acyclic if there do not exist $n$ and a finite set $\{x_1, \ldots, x_n\} \subseteq X$
such that $x_1 \succ x_2 \succ \cdots \succ x_n \succ x_1$. Note that negative acyclicity strengthens
completeness, and acyclicity strengthens asymmetry. If $\succsim$ is negatively acyclic,
or equivalently if $\succ$ is acyclic, then the finite dominance property holds,
immediately providing an additional set of sufficient conditions for existence.
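On a finite set, acyclicity of the dual relation is enough by itself to guarantee an undominated element, and the contrast with a cyclic relation is easy to check by hand. The sketch below is an illustration with assumed names: `strictly_better(y, x)` stands for $y \succ x$, and the two three-element relations are hypothetical examples.

```python
def undominated(elements, strictly_better):
    """Elements x for which no y in the set satisfies y > x."""
    return [x for x in elements if not any(strictly_better(y, x) for y in elements)]

def has_cycle(elements, strictly_better):
    """Detect a cycle x1 > x2 > ... > xn > x1 by depth-first search."""
    WHITE, GREY, BLACK = 0, 1, 2
    color = {x: WHITE for x in elements}

    def visit(x):
        color[x] = GREY                      # x is on the current search path
        for y in elements:
            if strictly_better(x, y):
                if color[y] == GREY or (color[y] == WHITE and visit(y)):
                    return True              # reached an element already on the path
        color[x] = BLACK                     # every path out of x is cycle-free
        return False

    return any(color[x] == WHITE and visit(x) for x in elements)

X = ["a", "b", "c"]
acyclic_pairs = {("a", "b"), ("b", "c")}                 # a > b > c; a and c unranked
cyclic_pairs = acyclic_pairs | {("c", "a")}              # adds c > a, closing a cycle

acyclic = lambda y, x: (y, x) in acyclic_pairs
cyclic = lambda y, x: (y, x) in cyclic_pairs

print(has_cycle(X, acyclic), undominated(X, acyclic))
print(has_cycle(X, cyclic), undominated(X, cyclic))
```

The acyclic relation here is not transitive (a and c are unranked), which shows that acyclicity is a weaker requirement than being the strict part of an ordering.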
In vector spaces, the finite dominance property is also implied by a convexity
condition on the upper sections of a relation. Letting $X$ be a convex subset
of a vector space, we say a relation $\succsim$ on $X$ is semi-convex if for all $x \in X$,
$x \notin \mathrm{conv}(\succ_x)$, where $\succ_x = \{y \in X \mid y \succ x\}$ is the upper section of the dual. This
implies that $\succsim$ is reflexive. If $X$ is in addition a subset of a Hausdorff topological
vector space and $\succsim_x$ is closed for all $x \in X$, then semi-convexity implies the
finite dominance property. Indeed, consider any finite set $Y = \{y_1, \ldots, y_k\}$, and
note that the span of $Y$ is essentially a finite-dimensional Euclidean space.89
Letting $\succsim'_{y_j} = \succsim_{y_j} \cap\ \mathrm{span}(Y)$, it follows that $\{\succsim'_{y_1}, \ldots, \succsim'_{y_k}\}$ is a collection of
closed subsets of finite-dimensional Euclidean space, and semi-convexity implies
that for all $I \subseteq \{1, \ldots, k\}$, we have $\mathrm{conv}(\{y_j \mid j \in I\}) \subseteq \bigcup \{\succsim'_{y_j} \mid j \in I\}$. Then
by the KKM theorem, $\bigcap \{\succsim'_{y_j} \mid j = 1, \ldots, k\} \neq \emptyset$, as required. These observations yield
another form of the existence result. Let $X$ be a convex subset of a Hausdorff
topological vector space, and assume (i) for all $x \in X$, $\succsim_x$ is closed, (ii) for
some $x \in X$, $\succsim_x$ is compact, and (iii′) $\succsim$ is semi-convex. Then the relation $\succsim$
has at least one dominant element, or equivalently, the dual $\succ$ has at least one
undominated element.90
89 See
Technical Details
I address details related to the space $L^p(\mathbb{R}^n, \mu)$ of equivalence classes of measurable
functions, the space $\mathcal{P}(X)$ of probability measures with support in $X \subseteq \mathbb{R}^n$,
and the space $\mathcal{K}(X)$ of nonempty, compact, convex subsets of compact $X \subseteq \mathbb{R}^n$.
I verify the metric characterization of weak convergence of transition probabilities.
I address pointwise convex hulls of upper and lower hemi-continuous and
lower measurable correspondences. I establish some claims about the weak and
weak* topologies on $L^p(X, \mu)$ and $L^\infty(X, \mu)$ stated above. And I give conditions
under which measurable selections from a correspondence are weakly compact
in $L^1(X, \mu)$.
Uniform integrability. Let $\mu$ be a finite measure on $\mathbb{R}^n$, and let $f \colon \mathbb{R}^n \to \mathbb{R}$
be a $\mu$-integrable function. Suppose in order to deduce a contradiction that
$\{f\}$ is not uniformly $\mu$-integrable, so there exists $\epsilon > 0$ such that for all $\delta > 0$,
there is a measurable $Y$ with $\mu(Y) < \delta$ and $\int_Y |f(x)|\,\mu(dx) \geq \epsilon$. Then for each $m$, there
exists measurable $Y_m$ with $\mu(Y_m) < \frac{1}{m}$ and $\int_{Y_m} |f(x)|\,\mu(dx) \geq \epsilon$. Defining the
sequence $\{f_m\}$ by $f_m(x) = f(x) I_{\mathbb{R}^n \setminus Y_m}(x)$ for all $m$, it follows that
$\int |f_m(x) - f(x)|\,\mu(dx) = \int_{Y_m} |f(x)|\,\mu(dx) \geq \epsilon$ for all $m$ and $f_m \to f$ in measure. Convergence
in measure implies there is a subsequence (still indexed by $m$) that converges
pointwise $\mu$-almost everywhere (see Section 11). But then, since $|f_m| \leq |f|$, Lebesgue's dominated
convergence theorem implies $\int |f_m(x) - f(x)|\,\mu(dx) \to 0$, a contradiction.
Lebesgue's dominated convergence theorem. Let $X \subseteq \mathbb{R}^n$ be measurable,
let $P \subseteq \mathbb{R}^k$, and consider any bounded function $f \colon X \times P \to \mathbb{R}$. Assume that for
all $x \in X$, the function $f_x \colon P \to \mathbb{R}$ defined by $f_x(p) = f(x, p)$ is continuous; and
assume that for all $p \in P$, the function $f_p \colon X \to \mathbb{R}$ defined by $f_p(x) = f(x, p)$ is
measurable. Let $\{\mu_m\}$ be a sequence of probability measures on $\mathbb{R}^n$ converging
weakly to $\mu$, and let $\{p_m\}$ be a sequence of parameters converging to $p$ in
$P$ such that $f(x, p)$ is continuous in $x$. Consider any $\epsilon > 0$. Let $b$ be any
bound for $f$. For each natural number $m$, define the function $f_m \colon X \to \mathbb{R}$
by $f_m(x) = f(x, p_m)$, and define $f_0 \colon X \to \mathbb{R}$ by $f_0(x) = f(x, p)$. Then $K =
\{f_m \mid m = 0, 1, 2, \ldots\}$ is uniformly bounded by $b$, and for each $x$, $s_x(y) =
\sup\{|f_m(x) - f_m(y)| \mid m = 1, 2, \ldots\} = \sup\{|f(x, p_m) - f(y, p_m)| \mid m = 1, 2, \ldots\}$
is continuous in $y$. Therefore, $\int f_m(x)\,\mu_k(dx) \to \int f_m(x)\,\mu(dx)$ uniformly across
$m$ as $k \to \infty$. In particular, there exists $k$ such that for all $m \geq k$, we have
$|\int f_m(x)\,\mu_m(dx) - \int f_m(x)\,\mu(dx)| < \frac{\epsilon}{2}$. Furthermore, continuity of $f_x$ implies
that for all $x$, $f_m(x) = f(x, p_m) = f_x(p_m) \to f_x(p) = f(x, p) = f_0(x)$, i.e.,
$f_m \to f_0$ pointwise. By Lebesgue's dominated convergence, there exists $\ell$ such
that for all $m \geq \ell$, we have $|\int f_m(x)\,\mu(dx) - \int f_0(x)\,\mu(dx)| < \frac{\epsilon}{2}$. Setting
$\bar{m} = \max\{k, \ell\}$, the triangle inequality yields, for all $m \geq \bar{m}$,
$$\left|\int f_m(x)\,\mu_m(dx) - \int f_0(x)\,\mu(dx)\right| < \frac{\epsilon}{2} + \frac{\epsilon}{2} = \epsilon,$$
as required.
Convergence in measure. I claim that if $f_m \to f$ in measure, then the
sequence converges in mean as long as $\int |f_m(x)|\,\mu(dx) \to \int |f(x)|\,\mu(dx)$ (and
$|f_m| + |f|$ is $\mu$-integrable for all $m$). Suppose to the contrary that there exists
$\epsilon > 0$ such that for a subsequence (still indexed by $m$, for simplicity) we have
$\int |f_m(x) - f(x)|\,\mu(dx) \geq \epsilon$ for all $m$. Going to a further subsequence (still indexed
by $m$), we can assume that $f_m \to f$ pointwise almost everywhere. Define the
measurable functions $g = 2|f|$ and for each $m$, $g_m = |f_m| + |f|$, and note
that $g_m \to g$ pointwise almost everywhere, and $\int g_m(x)\,\mu(dx) \to \int g(x)\,\mu(dx) < \infty$.
Furthermore, note that $|f_m(x) - f(x)| \to 0$ for $\mu$-almost all $x \in X$, and
that for all $m$ and all $x \in X$, $|f_m(x) - f(x)| \leq |f_m(x)| + |f(x)| = g_m(x)$.
Therefore, Lebesgue's dominated convergence theorem implies that
$\int |f_m(x) - f(x)|\,\mu(dx) \to 0$, a contradiction. Thus, $f_m \to f$ in mean, as required.
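The role of the condition on the integrals of $|f_m|$ can be seen in closed form. The sketch below is an illustration with assumed names and a hypothetical family of step functions on $[0, 1]$ under Lebesgue measure: $f_m$ equals a given height on $[0, 1/m]$ and zero elsewhere, with limit $f = 0$.

```python
from fractions import Fraction

def measure_of_deviation(height, m, eps):
    """Lebesgue measure of {x in [0, 1] : |f_m(x)| > eps}, f_m = height on [0, 1/m], 0 elsewhere."""
    return Fraction(1, m) if height > eps else Fraction(0)

def l1_norm(height, m):
    """Integral of |f_m| over [0, 1], which is also the L1 distance to the limit f = 0."""
    return height * Fraction(1, m)

eps = Fraction(1, 2)
for m in (10, 100, 1000):
    # Tall spike (height m): converges to 0 in measure, but the L1 norm stays at 1.
    print(m, measure_of_deviation(m, m, eps), l1_norm(m, m))
    # Unit step (height 1): converges in measure and the L1 norm also vanishes.
    print(m, measure_of_deviation(1, m, eps), l1_norm(1, m))
```

For the tall spike the integrals of $|f_m|$ equal one and do not approach the integral of $|f| = 0$, so the hypothesis of the claim fails and so does convergence in mean; for the unit step the hypothesis holds and the sequence converges in mean.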
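Product measures. Let $\mathcal{R}_n$ be the collection of all half-open rectangles in $\mathbb{R}^n$.
A quasi-measure on $\mathbb{R}^n$ is a mapping $\mu \colon \mathcal{R}_n \to \mathbb{R}_+$ such that:93
(i) $\mu(\emptyset) = 0$,
(ii) for all pairwise disjoint collections of rectangles $R_1, R_2, \ldots$ such that
$\bigcup_{i=1}^{\infty} R_i$ is itself a half-open rectangle in $\mathbb{R}^n$, we have
$$\mu\left(\bigcup_{i=1}^{\infty} R_i\right) = \sum_{i=1}^{\infty} \mu(R_i).$$
$$\mu^*(Y) = \inf\left\{ \sum_{i=1}^{\infty} \mu(R_i) \;\middle|\; R_1, R_2, \ldots \text{ rectangles with } Y \subseteq \bigcup_{i=1}^{\infty} R_i \right\}.$$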
the empty set and is closed with respect to complements and countable unions,
i.e., it is a $\sigma$-algebra. The restriction of $\mu^*$ to $\Sigma_\mu$ is a measure, and (normally
without risk of confusion) it is just denoted $\mu$ as well. Thus, we begin with a
quasi-measure defined on half-open rectangles, and we extend it to a measure on
the $\sigma$-algebra of $\mu$-measurable sets. By Lemma 10.19 of Aliprantis and Border
(2006), the measure space $(\mathbb{R}^n, \Sigma_\mu, \mu)$ is complete, i.e., for all $Y \subseteq \mathbb{R}^n$ and all
$Z \in \Sigma_\mu$ with $Y \subseteq Z$ and $\mu(Z) = 0$, we have $Y \in \Sigma_\mu$.
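Now let $\mu$ and $\nu$ be measures on $\mathbb{R}^\ell$ and $\mathbb{R}^m$, respectively, that are $\sigma$-finite and
absolutely continuous with respect to Lebesgue measure. Let $\lambda_n$ be Lebesgue
measure on $\mathbb{R}^n$. Define the quasi-measure $\mu \otimes \nu$ on $\mathbb{R}^n$ as follows: write any
half-open rectangle $R \in \mathcal{R}_n$ as $R = P \times Q$, where $P \in \mathcal{R}_\ell$ and $Q \in \mathcal{R}_m$,
and specify $(\mu \otimes \nu)(R) = \mu(P)\nu(Q)$. Then $\Sigma_{\mu \otimes \nu}$ is the collection of
$\mu \otimes \nu$-measurable sets, and $\mu \otimes \nu$ is a measure on $\Sigma_{\mu \otimes \nu}$. Moreover, the measure space
$(\mathbb{R}^n, \Sigma_{\mu \otimes \nu}, \mu \otimes \nu)$ is complete and $\sigma$-finite. To show that $\mu \otimes \nu$ is a measure
on $\mathbb{R}^n$, it suffices to prove that $\Sigma_{\mu \otimes \nu}$ contains all Lebesgue measurable sets. To
this end, let $\mathcal{B}_\ell$ be the $\sigma$-algebra of Borel sets on $\mathbb{R}^\ell$, and define $\mathcal{B}_m$ and $\mathcal{B}_n$
similarly. Then $\mathcal{B}_\ell \subseteq \Sigma_\mu$ and $\mathcal{B}_m \subseteq \Sigma_\nu$, and therefore:94
$$\mathcal{B}_n = \mathcal{B}_\ell \otimes \mathcal{B}_m \subseteq \Sigma_\mu \otimes \Sigma_\nu \subseteq \Sigma_{\mu \otimes \nu},$$
where the first equality follows from Theorem 4.44 of Aliprantis and Border
(2006), and the last inclusion follows from Theorem 10.47 of Aliprantis and
Border (2006).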
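Consider any measurable $A \in \mathcal{L}_n = \Sigma_{\lambda_n}$ with Lebesgue measure zero, i.e., $\lambda_n(A) =
0$. I claim that $A \in \Sigma_{\mu \otimes \nu}$. By Theorem 10.23 (part 6), there is a measurable
set $B \in \mathcal{B}_n = \mathcal{B}_\ell \otimes \mathcal{B}_m$ such that $A \subseteq B$ and $\lambda_n(B) = 0$. Let
$f = I_B \colon \mathbb{R}^n \to \mathbb{R}$ be the indicator function of $B$, and note that $f$ is measurable;
furthermore, it is measurable with respect to $\Sigma_{\mu \otimes \nu}$, i.e., for all open $G \subseteq \mathbb{R}$, we
have $f^{-1}(G) \in \Sigma_{\mu \otimes \nu}$. Obviously, $f$ is $\lambda_n$-integrable. For each $z \in \mathbb{R}^m$, define
$f_z \colon \mathbb{R}^\ell \to \mathbb{R}$ by $f_z(y) = f(y, z) = I_B(y, z)$. Fubini's theorem implies that $f_z$ is
measurable and in fact $\lambda_\ell$-integrable for all $z$ outside a Lebesgue measure zero set
(and thus outside a $\nu$-measure zero set), that the function $g \colon \mathbb{R}^m \to \mathbb{R}$ defined
by $g(z) = \int f_z(y)\,dy$ is measurable, and that $\lambda_n(B) = \int g(z)\,dz = 0$.95 This
implies that the set $B_z = \{y \in \mathbb{R}^\ell \mid (y, z) \in B\}$ has Lebesgue measure zero for
$\lambda_m$-almost all $z$, and by absolute continuity, we have $\mu(B_z) = 0$ for $\nu$-almost all
$z$. Defining $g \colon \mathbb{R}^m \to \mathbb{R}$ by $g(z) = \int f_z(y)\,\mu(dy)$, it follows that $g(z) = 0$ for all
$z$ outside a $\nu$-measure zero set. Applying Tonelli's theorem, using $\sigma$-finiteness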
94 Recall that the class of Borel sets is the smallest collection of sets that contains all open
sets and is closed with respect to complements and countable unions; the class is included
among the Lebesgue measurable sets. See Definition 4.14 of Aliprantis and Border (2006)
for details on Borel sets. See Definition 4.43 of Aliprantis and Border (2006) for details on
product -algebras.
95 See Theorem 22.6 of Aliprantis and Burkinshaw (1990) or Theorem 11.27 of Aliprantis
and Border (2006) for an exact statement of Fubini's theorem.
of $\mu$ and $\nu$, we have:96
$$(\mu \otimes \nu)(B) = \int \left[ \int f(y, z)\,\mu(dy) \right] \nu(dz) = \int g(z)\,\nu(dz) = 0.$$
x
.
Since f
i=1 i i
is continuous, the image f (X ) = conv(F ) is compact. Second, assume
$X$ is complete, let $K \subseteq X$ be a compact subset, and let $C = \mathrm{clos}(\mathrm{conv}(K))$
be the closure of the convex hull. Because $X$ is complete and $C$ is closed,
it follows that $C$ is itself a complete metric space, so to show compactness it
suffices to show that $C$ is totally bounded. Consider any $\epsilon > 0$, and note that
$\{B_{\epsilon/3}(x) \mid x \in K\}$ is an open covering of $K$. Since $K$ is compact, there is a
finite subcover $\{B_{\epsilon/3}(x_1), \ldots, B_{\epsilon/3}(x_m)\}$ of open balls centered at $x_1, \ldots, x_m \in K$.
Letting $F = \{x_1, \ldots, x_m\}$, it follows from the above that the convex hull
$H = \mathrm{conv}(F)$ is compact. Of course, $H \subseteq C$. Again, $\{B_{\epsilon/3}(x) \mid x \in H\}$ is an
open covering of $H$, so there is a finite subcover $\{B_{\epsilon/3}(y_1), \ldots, B_{\epsilon/3}(y_n)\}$ centered
at $y_1, \ldots, y_n \in H$. Now, consider any $z \in \mathrm{conv}(K)$, so there exist $z_1, \ldots, z_k \in K$
such that $z$ can be written as the convex combination $\sum_{j=1}^{k} \alpha_j z_j$. For each $j$,
there exists $w_j \in F$ such that $z_j \in B_{\epsilon/3}(w_j)$. Letting $w = \sum_{j=1}^{k} \alpha_j w_j$, we have
k
k
X
X
X
X
1
j yj max{(x1 , y1 ), . . . , (x , y )}
j xj ,
(xm , ym ) =
.
m
j=1
j=1
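$$\leq |\alpha_m| \left( \int |f_m(x) - f(x)|^p\,dx \right)^{\frac{1}{p}} + |\alpha_m - \alpha| \left( \int |f(x)|^p\,dx \right)^{\frac{1}{p}}$$
$$+\ |1 - \alpha_m| \left( \int |g_m(x) - g(x)|^p\,dx \right)^{\frac{1}{p}} + |\alpha - \alpha_m| \left( \int |g(x)|^p\,dx \right)^{\frac{1}{p}} \to 0,$$
where the inequality follows from Minkowski's inequality, and the limit follows
from the facts that $|\alpha_m| \leq 1$, and $\lim \rho_p(f_m, f) = \lim \rho_p(g_m, g) = \lim |\alpha_m - \alpha| =
0$. Therefore, $K$ is a metric mixture space. To see that the metric $\rho_p$ is quasi-convex
(in fact, convex) on $K$, consider $f, g, \phi, \psi \in K$ and any $\alpha \in (0, 1)$. Then
$$\rho_p(\alpha f + (1 - \alpha) g, \alpha \phi + (1 - \alpha) \psi) = \left( \int |\alpha (f(x) - \phi(x)) + (1 - \alpha)(g(x) - \psi(x))|^p\,dx \right)^{\frac{1}{p}}$$
$$\leq \alpha \left( \int |f(x) - \phi(x)|^p\,dx \right)^{\frac{1}{p}} + (1 - \alpha) \left( \int |g(x) - \psi(x)|^p\,dx \right)^{\frac{1}{p}}$$
$$= \alpha \rho_p(f, \phi) + (1 - \alpha) \rho_p(g, \psi),$$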
(Y ) + (1 )(Y ) +
m (Y ) + (1 )m (Y ) + .
Thus, for all > 0, we can choose k high enough that for all m k and all
[0, 1], we have r (m + (1 )m , + (1 )) . Furthermore, for
all > 0 and all m such that |m | , we have for all measurable Y X
and all , P(X),
m (Y ) + (1 m ) (Y )
(Y ) + (1 ) (Y )
(Y ) + (1 ) (Y ) +
m (Y ) + (1 m ) (Y ) + .
Thus, for all > 0, we can choose k high enough that for all m k and all
, P(X), we have r (m + (1 m ), + (1 ) ) . Combining these
observations, it follows that for all > 0, we can choose k high enough that for
all m k,
r (m m + (1 m )m , + (1 ))
r (m m + (1 m )m , m + (1 m ))
+r (m + (1 m ), + (1 ))
+
2 2
,
and
(Y ) (Y ) +
(Y ) + (1 ) (Y ) + + (1 )
(Y ) + (1 )(Y ) + + (1 ),
= max
min
||a b || = ||a b||,
a A b B
but this contradicts ||a b|| = minb B ||a b ||. Therefore, the Hausdorff metric
is quasi-convex.
Transition probabilities. To see that R(Y, Z, ) equipped with the metric r
is a metric mixture space, let {k } and {k } be sequences converging, respectively, to and in R(Y, Z, ), and let k in [0, 1]. Then we have
r (k k + (1 k )k , + (1 ) )
Z
Z
X
X
1
=
f
(y)(
+
(1
)
)(dy|z)
(dz)
i
k k
k k
2i+j (Bj ) zBj
y
i=1 j=1
Z
Z
X
i=1 j=1
Z
1
(k )
2i+j (Bj )
zBj
zBj
Z
+ ( k )
+ (1 )
0,
fi (y)k (dy|z) (dz)
fi (y)(k )(dy|z) (dz)
zBj
Z
zBj
Z
Z
(dz)
fi (y)k (dy|z)
fi (y)(k )(dy|z) (dz)
Then we have
r (1 + (1 )3 , 2 + (1 )4 )
Z
Z
X
X
1
1
3
=
fi (y)( + (1 ) )(dy|z) (dz)
2i+j (Bj ) zBj
y
i=1 j=1
Z
Z
fi (y)(2 + (1 )4 )(dy|z) (dz)
y
zBj
Z
Z
XX
1
1
2
=
f
(y)(
)(dy|z)
(dz)
i
2i+j (Bj ) zBj
y
i=1 j=1
Z
Z
(1 )
fi (y)(3 4 )(dy|z) (dz)
r
zBj
r 3
y
4
( , ) + (1 ) ( , ),
as required.
Convex hulls of upper hemi-continuous correspondences. Assume $X$ is a
metric space, $Y$ is a metric mixture space, $\varphi \colon X \rightrightarrows Y$ is upper hemi-continuous,
and $\psi(x) = \mathrm{conv}(\varphi(x))$ is compact for all $x \in X$. Consider any $x \in X$ and any
open $V \subseteq Y$ with $\psi(x) \subseteq V$. I claim that there exists $\epsilon > 0$ such that for all
$y \in \psi(x)$, we have $B_\epsilon(y) \subseteq V$. Indeed, otherwise there are sequences $\{y_m\}$ and
$\{z_m\}$ in $Y$ such that for all $m$, $y_m \in \psi(x)$ and $z_m \in B_{\frac{1}{m}}(y_m) \setminus V$. Since $\psi(x)$ is
compact, there is a convergent subsequence of $\{y_m\}$ (still indexed by $m$) with
limit $y \in \psi(x)$. Moreover, $\rho(z_m, y) \leq \rho(z_m, y_m) + \rho(y_m, y) \to 0$, so $z_m \to y$.
But then $z_m \in V$ for sufficiently high $m$, a contradiction. Thus, we can choose
$\epsilon > 0$ as in the claim; in particular, $G = \bigcup \{B_\epsilon(y) \mid y \in \psi(x)\} \subseteq V$. Since
$\varphi(x) \subseteq G$ and $\varphi$ is upper hemi-continuous, there is an open set $U \subseteq X$ with
$x \in U$ such that for all $z \in U$, $\varphi(z) \subseteq G$. Now consider $z \in U$ and $w \in \psi(z)$, so
$w = \sum_{j=1}^{m} \alpha_j w_j$ is a convex combination of elements $w_1, \ldots, w_m \in \varphi(z) \subseteq G$.
For each $j = 1, \ldots, m$, there exists $y_j \in \psi(x)$ such that $\rho(w_j, y_j) < \epsilon$. Let
$y = \sum_{j=1}^{m} \alpha_j y_j \in \psi(x)$. By repeated application of quasi-convexity of $\rho$, we
have
$$\rho(w, y) = \rho\left( \sum_{j=1}^{m} \alpha_j w_j, \sum_{j=1}^{m} \alpha_j y_j \right) \leq \max\{\rho(w_1, y_1), \ldots, \rho(w_m, y_m)\} < \epsilon,$$
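$U_i \subseteq X$ such that $x \in U_i$ and for all $z \in U_i$, we have $\varphi(z) \cap B_\epsilon(y_i) \neq \emptyset$.
Let $U = \bigcap_{i=1}^{m} U_i$, and consider any $z \in U$. Since $z \in U_i$, there exists
$w_i \in \varphi(z) \cap B_\epsilon(y_i)$. Then $w = \sum_{i=1}^{m} \alpha_i w_i \in \psi(z)$, and by repeated application of
quasi-convexity of $\rho$, we have $\rho(y, w) \leq \max\{\rho(y_1, w_1), \ldots, \rho(y_m, w_m)\} < \epsilon$.
This implies $w \in B_\epsilon(y) \subseteq V$, and therefore $\psi(z) \cap V \neq \emptyset$, as required.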
m+1
X
i yi ,
i=1
and define m+1 : X R(m+1)m by m+1 (x) = [(x)]m+1 , and note that is
continuous and m+1 is lower measurable. Then (x) = (m+1 (x) ) for
each x X. For every open G Rm+1 , we have
n
o
{x X | (x) G 6= } =
x X | m+1 (x) X ( 1 (G)) 6= ,
$Z_n = \{x \in X \mid n - 1 < f(x) - f'(x) \leq n\}$ for $n = 1, 2, \ldots$. Since $\mu(Z) =
\sum_{n=1}^{\infty} \mu(Z_n) > 0$, there is some $n$ such that $0 < \mu(Z_n) < \infty$. Define $g \colon X \to \mathbb{R}$
by $g(x) = (f(x) - f'(x)) I_{Z_n}(x)$, and note that $g$ takes values strictly above $n - 1$
and bounded by $n$ on $Z_n$, zero elsewhere. Therefore, $g \in L^q(X, \mu)$. Now define
$$\epsilon = \frac{1}{2} \int (f(x) - f'(x))\,g(x)\,\mu(dx) = \frac{1}{2} \int_{Z_n} (f(x) - f'(x))^2\,\mu(dx),$$
and note that $\epsilon > 0$. I claim that $U(f, \epsilon, g, g)$ and $U(f', \epsilon, g, g)$ are disjoint.
Indeed, if $h$ belongs to both sets, then we have both
$$\int (f(x) - h(x))\,g(x)\,\mu(dx) < \epsilon \quad \text{and} \quad \int (f'(x) - h(x))\,g(x)\,\mu(dx) > -\epsilon,$$
but this implies $\int (f(x) - f'(x))\,g(x)\,\mu(dx) < 2\epsilon$, a contradiction. Thus, $f$ and
$f'$ are separated by disjoint basic open sets, as required. The argument for the
weak* topology is similar.
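Compactness of selections. Assume $\mu$ is finite, $X \subseteq \mathbb{R}^n$ is measurable, and
$\varphi \colon X \rightrightarrows \mathbb{R}$ is $\mu$-integrably bounded and lower measurable with closed, convex
values. Since $\varphi$ is $\mu$-integrably bounded, there exists a $\mu$-integrable function
$g \colon X \to \mathbb{R}$ such that for all $x \in X$, we have $\sup |\varphi(x)| = \sup\{|y| \mid y \in \varphi(x)\} \leq
g(x)$. Consider any $\epsilon > 0$, and using the fact that $\{g\}$ is uniformly $\mu$-integrable,
choose $\delta > 0$ such that for all measurable $Y \subseteq \mathbb{R}^n$ with $\mu(Y) < \delta$, we have
$\int_Y |f(x)|\,\mu(dx) \leq \int_Y g(x)\,\mu(dx) < \epsilon$ for all $f \in S(\varphi)$. Thus, $S(\varphi)$ is uniformly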
Index
C r manifold, 34
C r manifold with boundary, 34
Lp -metric, 66
-integrable
uniformly, 68
-integrable correspondence, 87
-integrable function, 41
-integrably bounded, 87
-measure zero, 29
-measurable, 111
-algebra, 28
-field, 28
-finite measure, 29
r-times continuously partially differentiable function, 26, 32
r-times partially differentiable function,
26
weakly, 100
weakly relative to L, 100
closed values, 79
closure of set, 14
closure point, 13
coarser topology, 92
codomain, 6
column rank, 12
combination
convex, 9, 61, 95
linear, 9
compact set
Euclidean space, 15
metric space, 59
product topology, 21
relative topology, 19
topological space, 94
weak sequentially, 104
weak*, 100
weak* relative to L, 100
weak* sequentially, 104
weakly, 100
weakly relative to L, 100
compact values, 79
complement, 4
complete measure, 28
complete metric space, 60
complete relation, 108
completeness axiom, 5
composition, 7
concave function, 10
conditional probability of an event, 74
conjugate, 52, 99
connected set, 16, 19, 21, 59, 94
continuous correspondence, 81
continuous correspondence at x, 81
continuous function
Euclidean space, 16
metric space, 60
product topology, 22
relative topology, 20
topological space, 94
continuous function at x, 16
continuous selection, 84
continuous vector-valued function
Euclidean space, 17
metric space, 60
product topology, 22
relative topology, 20
continuously partially differentiable function, 24
continuously twice partially differentiable
function, 25
contour set
strict lower, 7
strict upper, 7
weak lower, 7
weak upper, 7
contraction mapping
Euclidean space, 20
metric space, 61
contraction mapping theorem
Euclidean space, 20
metric space, 61
convergence
asymptotic, 14
in metric space, 59
in product topology, 21, 96
in relative topology, 18
in topological space (net), 94
in topological space (sequence), 93
convergence in L∞-metric, 49
convergence in Lp-metric, 49
convergence in pth mean, 49
convergence in distribution, 53
convergence in essential supremum metric, 49
convergence in mean, 49
convergence in measure, 49
convergence in probability, 49
convergence in total variation norm, 45
convergence in weak topology, 100
convergence in weak* topology, 101
convergence of functions
almost sure, 49
in L∞-metric, 49
in Lp-metric, 49
in pth mean, 49
in distribution, 53
in essential supremum metric, 49
in mean, 49
in measure, 49
in probability, 49
in product topology, 96
in weak topology, 100
in weak* topology, 101
pointwise, 17, 20
pointwise almost everywhere, 49
uniform, 17, 20
uniform almost everywhere, 49
weak, 52
weak*, 52
weakly order p, 52
convergence of measures
in total variation, 45
set-wise, 45
strong, 45
uniformly set-wise, 45
weak, 45
weak*, 45
convergent net, 94
convergent net (in product topology), 96
convergent sequence, 14, 93
convex, 65, 67
convex combination
Euclidean space, 9
metric mixture space, 61
vector space, 95
convex function, 11
convex hull
Euclidean space, 9
metric mixture space, 61
convex metric, 62
convex set
Euclidean space, 9
metric mixture space, 61
vector space, 95
correspondence, 79
-integrable, 87
-integrably bounded, 87
closed graph, 80
lower hemicontinuous, 80
lower measurable, 85
open graph, 81
open lower sections, 81
upper hemicontinuous, 79, 98
weak* sequentially uhc, 105
weakly measurable, 85
weakly sequentially uhc, 105
countable additivity, 29
countable set, 7
countable sub-additivity, 30
countably infinite set, 8
counting measure, 29
critical point, 35
critical value, 35
cross partial derivative, 25
De Morgan's law, 4
decreasing function, 7
decreasing sequence, 14
degenerate on x, 31
dense set, 60
density function, 43
density with respect to , 43
derivative of f at x in direction t, 23
determinant, 13
diagonally dominant matrix, 13
diffeomorphic sets, 34
diffeomorphism, 33
differentiable at x in direction t, 23
differentiable function, 23
dimension of Euclidean space, 6
dimension of linear space, 11
directed set, 93
direction (in R^n), 10
direction (on arbitrary set), 93
directionally differentiable at x, 23
directionally differentiable function, 23
disc, 13, 68
discrete metric, 58
discrete topology, 92
disjoint, 4
pairwise disjoint, 4
distribution of a function, 39
divergent sequence, 14
Doeblin's condition, 75
domain, 6
dominant element, 106, 107
dominated functions, 42, 44
dot product, 9
dual relation, 108
Eberlein-Šmulian theorem, 104
Egoroff's theorem, 52
empty set, 4
equicontinuous, 65
ergodic distribution, 76
ergodic set, 76
essentially bounded function, 41
Euclidean metric, 58
Euclidean norm, 9
Euclidean space, 6
expected value, 41
expected value of vector-valued function, 44
extended real numbers, 3
extension, 7
Fatou's lemma
correspondences, 91
real-valued function, 42
variable measure, 47
vector-valued function, 44
Feller property, 71
Filippov's implicit function theorem, 87
finite cylinder set, 96
finite dominance property, 107
finite intersection property, 16
finite measure, 29
finite partition, 4
finite set, 8
first countable, 94
fixed point, 20, 61, 84, 95, 98
Fubini's theorem, 56, 72
full rank, 12
column, 12
row, 12
function, 6
C^1, 24
C^2, 25
C^r, 26, 32
-integrable, 41
r-times partially differentiable, 26
bijective, 7
Borel measurable, 39
bounded, 7
bounded on Y, 7
Carathéodory, 55, 86
concave, 10
continuous, 16, 17, 20, 22, 60, 94
convex, 11
decreasing, 7
diffeomorphism, 33
differentiable, 23
directionally differentiable, 23
distribution of, 39
essentially bounded, 41
increasing, 7
indicator, 7
injective, 6
integrable, 40
jointly measurable, 54
linear, 10, 11
local diffeomorphism, 33
measurable, 38, 44, 61
partially differentiable, 24
quasi-concave, 11
quasi-convex, 11
strictly concave, 11
strictly convex, 11
strictly decreasing, 7
strictly increasing, 7
strictly quasi-concave, 11
strictly quasi-convex, 11
surjective, 7
twice directionally differentiable, 24
twice partially differentiable, 25
weak* continuous, 100
weak* continuous relative to L, 100
weak* sequentially continuous, 104
weakly continuous, 100
weakly continuous relative to L, 100
weakly sequentially continuous, 104
fundamental theorem of calculus, 40
fundamental theorem of linear algebra, 11, 12
generated topology (from base), 92
Glicksberg's fixed point theorem
metric space, 84
topological space, 98
gradient, 24
graph of correspondence, 80
Hölder's inequality, 66
half-open rectangle, 27
half-space
closed, 10
open, 10
Hausdorff metric, 69, 70
Hausdorff topological space, 92
Heine-Borel theorem, 16
Hessian matrix, 25
hyperplane, 10
identity function, 6
identity matrix, 12
image, 6
implicit function theorem, 33
increasing function, 7
increasing sequence, 14
index, 4
index set, 4
indicator function, 7
infimum, 5
injective, 6
integers, 3
integrable function, 40
integrable vector-valued function, 44
integrable with respect to , 41
integral of f, 40
integral of f with respect to , 41
integral of correspondence, 87
integral of vector-valued function, 44
interior of set, 13
interior point, 13, 21, 58
intermediate value theorem, 16, 20, 22, 60, 94
intersection, 4
invariant distribution, 75
invariant set, 76
inverse function theorem, 33
inverse mapping, 7
inverse matrix, 13
irreflexive relation, 108
isomorphism, 12
Jacobian matrix, 32
Jensen's inequality, 44
jointly measurable function, 54
orthogonal, 9
outer Lebesgue measure, 27
outer regular measure, 30
pairwise disjoint, 4
partial derivative of f , 24
partial derivative of f at x, 24
partial order, 106
partially differentiable at x, 24
partially differentiable function, 24
partition, 5
finite, 4
measurable, 69
pointwise a.e. convergence, 49
pointwise convergence
net of functions, 96
sequence of functions, 17, 20
power set, 4
preimage, 6
preimage theorem, 35
probability density function, 43
probability measure, 29
product measure, 55
product metric, 63, 64
product rule, 23
product topology, 96
Prohorov metric, 68
quasi-concave function, 11
quasi-convex function, 11
quasi-convex metric, 61
quasi-measure, 111
Radon-Nikodym theorem, 43
random variable, 39
range, 6
rank, 11, 12
rational numbers, 3
real numbers, 3
rectangle (half-open), 27
reflexive relation, 106
regular conditional probability, 74
regular measure, 30
regular point, 35
regular value, 35
relation, 106
acyclic, 108
anti-symmetric, 106
asymmetric, 108
complete, 108
dominant element, 107
dual, 108
irreflexive, 108
maximal element, 106, 107
negatively acyclic, 108
partial order, 106
reflexive, 106
semi-convex, 108
transitive, 106
undominated element, 108
relative complement, 4
relative topology, 92
relative weak topology, 100
relative weak* topology, 100
restriction, 7
Riesz representation theorem, 99
row rank, 12
Russell's paradox, 3
Sard's theorem, 35
scalar, 8
Schauder's fixed point theorem
metric mixture space, 62
topological space, 96
section of a set, 55
self-supporting set, 76
semi-convex relation, 108
separable metric space, 60
separating hyperplane theorem, 10, 17
sequence, 6
Cauchy, 60
convergent, 14
decreasing, 14
divergent, 14
in R^n, 14
in X, 6
increasing, 14
strictly decreasing, 14
strictly increasing, 14
subsequence of, 15
series, 14
set, 3
absorbing, 76
bounded, 15, 19, 21, 59
closed, 21
closed in Euclidean space, 14
compact, 15, 19, 21, 94
compact in metric space, 59
complement, 4
connected, 16, 19, 21, 59, 94
convex, 9, 61, 95
countable, 7
countably infinite, 8
directed, 93
ergodic, 76
finite, 8
intersection, 4
invariant, 76
linearly independent, 11
measurable, 27
measure zero, 28
nowhere dense, 29
open in Euclidean space, 14
open in product topology, 21
power set, 4
relative complement, 4
relatively closed, 18
relatively open, 18
self-supporting, 76
singleton, 8
transient, 76
uncountably infinite, 8
union, 4
set-wise convergence of measures, 45
simplex, 9
singleton, 8
singular matrix, 12
Skorohod's theorem, 40
span, 9
sphere, 13
square matrix, 12
stationary distribution, 75
stochastic kernel, 71
Stone-Weierstrass theorem, 65
strict local maximizer, 26
strict lower contour set, 7
strict upper contour set, 7
strictly concave function, 11
vector space, 95
weak convergence of functions, 52, 100
weak convergence of measures, 45
weak convergence of transition probabilities, 78
weak lower contour set, 7
weak order p convergence of functions, 52
weak topology, 99
weak upper contour set, 7
weak* closed, 100
weak* closed relative to L, 100
weak* compact, 100
weak* compact relative to L, 100
weak* continuous function, 100
weak* continuous function relative to L, 100
weak* convergence of functions, 52, 101
weak* convergence of measures, 45
weak* open, 100
weak* open relative to L, 100
weak* sequentially compact set, 104
weak* sequentially continuous function, 104
weak* sequentially upper hemi-continuous correspondence, 105
weak* topology, 99
weak-strong convergence of transition probabilities, 78
weaker topology, 92
weakly closed, 100
weakly closed relative to L, 100
weakly compact, 100
weakly compact relative to L, 100
weakly continuous function, 100
weakly continuous function relative to L, 100
weakly measurable correspondence, 85
weakly open, 100
weakly open relative to L, 100
weakly sequentially compact set, 104
weakly sequentially continuous function, 104
weakly sequentially upper hemi-continuous correspondence, 105