
Basic Concepts in Mathematical Analysis

A Tourist Brochure
John Duggan
November 29, 2013

Contents
1 Opening Remarks
2 Set Theory
3 Linear Algebra
4 Euclidean Topology
5 Relative and Product Topologies
6 Differentiability
7 Lebesgue Measure
8 Differential Topology
9 Measurable Functions
10 Convergence of Measures
11 Convergence of Functions
12 Product Measurability
13 Metric Spaces
14 Special Metric Spaces
15 Transition Probabilities
16 Continuous Correspondences
17 Measurable Correspondences
18 Topological Spaces
19 Weak and Weak* Topologies
20 Maximal Elements
A Technical Details

1 Opening Remarks

This survey is predicated on the assumption that there is some value in gathering
together a subset of mathematical tools in a systematic, if very selective and
terse, way. Aside from the obvious disadvantages of a cursory and incomplete
treatment, the advantages are that a variety of concepts can be distilled to
their essence, connections between concepts can be drawn more easily, and it
makes for relatively quick reading (because there are no proofs). The trade-off
is, basically, depth for breadth of understanding. I have tried to select tools that
would be especially useful to someone whose goal is mathematical modeling (at
least in the fields of economics or political science), rather than pure math. This
reflects, admittedly, both the preferences and the limitations of the author. It
may go without saying, but my intention in writing this is to provide an overview
of some aspects of mathematical analysis; it is not to write a math book that
is suitable for citation or as a substitute for serious study.
A distinguishing feature of the survey is that it does not attempt to present
definitions and results in their greatest generality; in some cases, I introduce
concepts in very simple terms, then return to them later in greater generality,
then again in even greater generality. I have deliberately developed the material
avoiding derivatives as linear approximations (focusing instead on directional
derivatives), with almost no reference to Borel sets or $\sigma$-algebras (relying on the
Lebesgue measurable sets for needed measurable structure), without mention
of normed linear spaces (limiting the discussion to metric properties of these
spaces) or the theory of linear operators, defining $L^p$-spaces only for real-valued
(rather than vector-valued) functions, and pushing discussion of abstract topological
spaces to the end. This approach entails considerable redundancy and
some loss of generality (the absence of the Borel sets is particularly inconvenient),
but I hope it is an effective pedagogical technique for those of us outside
the measure zero set of natural mathematicians, who see concepts more clearly
with less intervening structure.
The inspiration for the current endeavor is a pair of mathematical summaries
in outstanding books by Werner Hildenbrand and by Andreu Mas-Colell; both
provide excellent coverage of somewhat different slices of mathematics, and I
draw on some of that material here. For a better understanding, the reader
should consult any of a large number of deeper and more thorough references.
I list a few below, but the book by Aliprantis and Border is noteworthy for its
encyclopedic scope; many of the results presented here are just special cases of
more general statements in that book.
[1] C. Aliprantis and K. Border (2006) Infinite Dimensional Analysis: A Hitchhiker's Guide, 3rd ed., Springer: New York, NY.
[2] C. Aliprantis and O. Burkinshaw (1990) Principles of Real Analysis, 2nd ed., Academic Press: New York, NY.
[3] N. Dunford and J. Schwartz (1958) Linear Operators, Part 1: General Theory, Interscience: New York, NY.
[4] V. Guillemin and A. Pollack (1974) Differential Topology, Prentice-Hall: Englewood Cliffs, NJ.
[5] W. Hildenbrand (1974) Core and Equilibria of a Large Economy, Princeton University Press: Princeton, NJ.
[6] A. Mas-Colell (1985) The Theory of General Economic Equilibrium: A Differentiable Approach, Econometric Society Monographs, Cambridge University Press: New York, NY.
[7] H. Royden (1988) Real Analysis, 3rd ed., Macmillan Publishing Company: New York, NY.
[8] C. Simon and L. Blume (1994) Mathematics for Economists, W.W. Norton and Company: New York, NY.
I do not provide references for results that, in my opinion, are very standard. I
do attempt to direct the reader to the appropriate references for ones that are
less obvious, and I address some technical details in the appendix and numerous
footnotes. Most of the results I present on finite-dimensional analysis can be
found in Simon and Blume (1994), while most on measure theory and infinite-dimensional analysis are contained in Aliprantis and Border (2006).
That said, the survey does cover some material that is not commonly found,
and it does have some slightly novel touches. The graphs of trajectories of a
smooth function, in Figure 4, do not seem to have been used before; the coverage
of convergence notions for functions is, I think, fairly systematic; I give a
metric version of Glicksberg's fixed point theorem and apply it to a variety of
metric spaces in Section 13, including the space of compact subsets of a compact
subset of Euclidean space with the Hausdorff metric; I collect a number of
results on transition probabilities, including a useful notion of convergence of
transition probabilities, and consider connections to regular conditional probabilities.
I state a version of Fatou's lemma for correspondences with a variable
integrating measure; I think there is some value in collecting results, which are
usually stated for abstract normed spaces, for the special case of weak and weak*
topologies on $L^p$ spaces in Section 19; and the application of Zorn's lemma to
the existence of maximal elements is perhaps deeper than normally expected.
I begin with elementary set theory in Section 2, then summarize the linear and
topological structure of Euclidean space in Sections 3 and 4, with the first mention
of slightly more exotic topological notions in Section 5. Basic ideas from
calculus are surveyed in Section 6. Section 7 introduces the concept of measure,
Section 8 combines ideas from calculus and measure theory in a quick summary
of differential topology, and Sections 9-12 return to topics of measurability:
measurable functions, convergence of measures and functions, and measures on
product spaces. The analysis of Euclidean space is extended to metric spaces
in Sections 13 and 14. Section 15 contains material on transition probabilities.
Sections 16 and 17 present results on set-valued functions, called correspondences.
The analysis is extended further to general topological spaces in Sections 18 and 19.
Section 20 provides a statement and equivalent formulations of
the axiom of choice and conditions for existence of maximal elements. Finally,
a short appendix addresses some details that, as far as I know, are relatively
hard to find in the literature.

2 Set Theory

Mathematical analysis takes the concept of a set and membership within a set
as primitive notions. Roughly, a set is a collection of objects that belong to
it, but this begs the questions, "What is a collection?" and "How does an
object belong to a collection?" Formally, we assume that sets and membership
satisfy the standard axioms of Zermelo-Fraenkel set theory,1 which allows us
to capture the familiar number systems. In particular, the Zermelo-Fraenkel
axioms provide a way of working with the natural numbers $\mathbb{N}$, the integers $\mathbb{Z}$,
the rational numbers $\mathbb{Q}$, the real numbers $\mathbb{R}$, the non-negative real numbers $\mathbb{R}_+$,
and the positive real numbers $\mathbb{R}_{++}$. Normally, we will take as given a universe
of discourse that contains all objects of interest, and we consider sets, denoted
$A, B, X, Y$, etc., made up of objects, denoted $a, b, x, y$, etc., from the universe
of discourse. When referring specifically to natural numbers, we typically use
the variables $i, j, k, \ell, m, n$ to denote such objects. In general, we write $a \in A$
if $a$ belongs to $A$ and $a \notin A$ otherwise. We typically specify a set by listing its
elements between braces, e.g., $\{1, 2, 3\}$, or by defining a property possessed by
exactly the elements of the set, e.g., $\{x \in \mathbb{N} \mid 1 \leq x \leq 3\}$. In general, if the
universe of discourse is $X$ and if $p(x)$ is a well-formed formula containing just
one free variable $x$, then we can form the set of elements possessing the property
described by $p(x)$ as $\{x \in X \mid p(x)\}$, as long as the predicate function does not
refer to the set itself.2
The real numbers include the following important types of sets: the closed
interval $[a, b] = \{x \in \mathbb{R} \mid a \leq x \leq b\}$, the open interval $(a, b) = \{x \in \mathbb{R} \mid a < x < b\}$,
and the half-closed intervals $[a, b) = \{x \in \mathbb{R} \mid a \leq x < b\}$ and $(a, b] = \{x \in \mathbb{R} \mid a < x \leq b\}$.
We also have half-lines, written $[a, \infty)$, $(-\infty, b]$, $(a, \infty)$,
and $(-\infty, b)$, where the infinity symbol indicates that the set extends without
bound in one direction. We sometimes work with the extended real numbers,
$\overline{\mathbb{R}} = \mathbb{R} \cup \{-\infty, \infty\}$, where by convention $\infty$ represents a quantity larger than
all other extended real numbers, and we write $x < \infty$ for all $x \in \overline{\mathbb{R}} \setminus \{\infty\}$.
Similarly, $-\infty$ is a quantity less than all other extended real numbers. Also by
convention, adding any real number to $\infty$ or multiplying $\infty$ by any positive real
number returns $\infty$, and similarly for $-\infty$. In the context of the extended real
numbers, we can write $[a, \infty]$ (resp. $(a, \infty]$) for the interval consisting of $[a, \infty)$
(resp. $(a, \infty)$) together with the infinite quantity $\infty$, and similarly for $-\infty$.

1 For a brief but instructive overview, see Appendix A.2 of Dudley (2002) Real Analysis and Probability, Cambridge University Press: Cambridge, MA.
2 The risk is Russell's paradox, in which we try to form the set of all sets that do not include themselves.
We write $A \subseteq B$ if all elements of $A$ belong to $B$, $A \subsetneq B$ if $A \subseteq B$ but not
$B \subseteq A$, and $A = B$ if $A \subseteq B$ and $B \subseteq A$. Of note, we identify sets that contain
the same elements, so the specification of a set does not entail any ordering of
the elements within the set. It follows that if $A$ contains no elements, then we
have $A \subseteq B$ trivially, and furthermore, if $A$ and $B$ are sets with no elements,
then $A = B$. That is, there is a unique empty set, which we denote $\emptyset$.
We can build sets from sets in various ways. Let $A$ and $B$ be sets. Then the
complement of $A$, denoted $\overline{A}$, consists of all elements of the universe of discourse
that do not belong to $A$. Furthermore, we define:
\begin{align*}
A \cap B &= \{x \mid x \text{ belongs to } A \text{ and } B\} && \text{intersection}\\
A \cup B &= \{x \mid x \text{ belongs to } A \text{ or } B\} && \text{union}\\
A \setminus B &= \{x \mid x \text{ belongs to } A \text{ but not } B\} && \text{relative complement}
\end{align*}
Of course, $A \setminus B = A \cap \overline{B}$. Finally, $2^A$ consists of all subsets of $A$ and is called
the power set of $A$. If two sets have no elements in common, i.e., $A \cap B = \emptyset$,
then we say they are disjoint. Various rules for manipulation of sets follow from
these definitions, but we note De Morgan's law, which states that $\overline{A \cap B} = \overline{A} \cup \overline{B}$
and $\overline{A \cup B} = \overline{A} \cap \overline{B}$.
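As a quick check of De Morgan's law, take the universe of discourse $X = \{1, 2, 3, 4\}$ with $A = \{1, 2\}$ and $B = \{2, 3\}$; then
\[
\overline{A \cup B} \;=\; \overline{\{1, 2, 3\}} \;=\; \{4\} \;=\; \{3, 4\} \cap \{1, 4\} \;=\; \overline{A} \cap \overline{B}.
\]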
The operations of intersection and union can be extended to combine more than
one set at a time. If there is a finite number $n$ of sets, denoted $A_1, \ldots, A_n$, we
write, respectively,
\[
\bigcap_{i=1}^n A_i = \{x \mid \text{for all } i = 1, \ldots, n,\ x \in A_i\}
\quad\text{and}\quad
\bigcup_{i=1}^n A_i = \{x \mid \text{for some } i = 1, \ldots, n,\ x \in A_i\}
\]
for the intersection and union of the sets. The new sets consist, respectively, of
the elements belonging to all and to some of the original sets. Given a set $A$,
a collection $\{A_1, \ldots, A_n\}$ of nonempty subsets of $A$ is a finite partition of $A$ if
$A = \bigcup_{i=1}^n A_i$ and it is pairwise disjoint, i.e., for all distinct $i, j = 1, \ldots, n$, we
have $A_i \cap A_j = \emptyset$. In the above, the sets are indexed by $i$, which ranges over
the index set $\{1, 2, \ldots, n\}$. When there is one set for each natural number, we
may index the sets by the natural numbers, as in $A_1, A_2, \ldots$, and write
\[
\bigcap_{i=1}^{\infty} A_i
\quad\text{and}\quad
\bigcup_{i=1}^{\infty} A_i
\]
for the intersection and union of the collection, which consist, respectively, of
the elements belonging to all and to some of the sets. Finally, we can extend
these operations to arbitrary collections of sets. When $\mathcal{A}$ is a collection of sets,
we write
\[
\bigcap \mathcal{A} = \{x \mid \text{for all } A \in \mathcal{A},\ x \in A\}
\quad\text{and}\quad
\bigcup \mathcal{A} = \{x \mid \text{for some } A \in \mathcal{A},\ x \in A\}
\]
for the intersection and union, respectively, of the members of $\mathcal{A}$. And a partition
of $A$ is a pairwise disjoint collection $\mathcal{A}$ such that $A = \bigcup \mathcal{A}$.

Given one set for each natural number and indexing the sets as $A_1, A_2, \ldots$, we
define two limiting operations: the limit supremum (or limsup), denoted
\[
\limsup_n A_n \;=\; \bigcap_{m=1}^{\infty} \bigcup_{n=m}^{\infty} A_n,
\]
consists of any element $x$ that belongs to infinitely many $A_n$, and the limit
infimum (or liminf), denoted
\[
\liminf_n A_n \;=\; \bigcup_{m=1}^{\infty} \bigcap_{n=m}^{\infty} A_n,
\]
consists of any $x$ that eventually (for $n$ high enough) belongs to all $A_n$. When
$\limsup_n A_n = \liminf_n A_n$, we write $\lim_n A_n$ for this limit.
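For instance, if $A_n = \{0\}$ for odd $n$ and $A_n = \{1\}$ for even $n$, then each point of $\{0, 1\}$ belongs to infinitely many $A_n$ but no point belongs to all sufficiently late $A_n$, so
\[
\limsup_n A_n = \{0, 1\}
\quad\text{and}\quad
\liminf_n A_n = \emptyset,
\]
and the limit of this sequence of sets does not exist.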
Note that the natural numbers satisfy the well-ordering axiom with respect to
the usual greater than or equal to relation: for every nonempty subset $Y$ of
natural numbers, there exists $x \in Y$ such that for all $y \in Y$, we have $x \leq y$;
this important property enables proof by induction. This axiom is not satisfied
by the real numbers with respect to the greater than or equal to relation, for
there is no smallest positive real number. The real numbers do satisfy the
completeness axiom, which states that if a nonempty subset is bounded below,
then it has an infimum. Formally, $A \subseteq \mathbb{R}$ is bounded below if there exists $c \in \mathbb{R}$
such that for all $x \in A$, we have $c \leq x$. If $A$ is nonempty and bounded below,
then the infimum (or greatest lower bound) of $A$, denoted $\inf A$, is the greatest
lower bound of $A$, i.e., it is a lower bound of $A$, and it satisfies $\inf A \geq c$ for
all other lower bounds $c$. By convention, if $A$ is empty, then $\inf A = \infty$; and
if $A$ is not bounded below, then $\inf A = -\infty$. The completeness axiom simply
states that if $A$ is nonempty and bounded below, then it does indeed have an
infimum, in which case the infimum is unique. Note that the rational numbers
do not satisfy the completeness axiom.

A set $A \subseteq \mathbb{R}$ is bounded above if there exists $c \in \mathbb{R}$ such that for all $x \in A$,
we have $x \leq c$. If $A$ is nonempty and bounded above, then the supremum (or
least upper bound) of $A$, denoted $\sup A$, is the least upper bound of $A$, i.e., it
is an upper bound of $A$, and it satisfies $\sup A \leq c$ for all other upper bounds
$c$. By convention, if $A$ is empty, then $\sup A = -\infty$; and if $A$ is not bounded
above, then $\sup A = \infty$. By the completeness axiom, every nonempty set that is
bounded above has a supremum, in which case it is unique. A set $A$ is bounded
if it is bounded above and below. A real number $x \in \mathbb{R}$ is a maximum of $A$,
denoted $\max A$, if it is a supremum of $A$ and $x \in A$; it is a minimum of $A$,
denoted $\min A$, if it is an infimum of $A$ and $x \in A$. The well-ordering property
of the natural numbers can then be rephrased as saying that every nonempty
subset has a minimum with respect to the usual ordering.
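To illustrate, the open interval $A = (0, 1)$ is bounded, with
\[
\inf A = 0 \quad\text{and}\quad \sup A = 1;
\]
neither belongs to $A$, so $A$ has no minimum and no maximum. By contrast, $\min [0, 1] = 0$ and $\max [0, 1] = 1$.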
Given two sets, $A$ first and $B$ second, an ordered pair is a list, typically denoted
$(x, y)$, consisting of an element of $A$ in the first component and an element of $B$
in the second, and the set of all ordered pairs is the Cartesian product $A \times B$.
This concept is distinct from the set $\{x, y\}$, because we now distinguish between
$(x, y)$ and $(y, x)$ (when $x$ and $y$ are distinct). To extend this idea, suppose we
have a finite number $n$ of sets $A_1, \ldots, A_n$. An ordered $n$-tuple $(a_1, \ldots, a_n)$ of
elements from these sets is a list such that $a_i$ is an element of $A_i$ for $i = 1, \ldots, n$.
The set of all such ordered $n$-tuples is the Cartesian product, written $A_1 \times \cdots \times A_n$,
or simply $\prod_{i=1}^n A_i$. If the sets are identical, i.e., $A_i = A$ for all $i = 1, \ldots, n$,
then this can be simplified further to $A^n$. And when $A = \mathbb{R}$, we refer to $\mathbb{R}^n$ as
Euclidean space, and $n$ is the dimension of the space.

We can extend this idea one step further to take the product of an infinite number
of sets, one for each natural number and indexed as $A_1, A_2, \ldots$. An element of
the Cartesian product $\prod_{i=1}^{\infty} A_i$ is a sequence, which we can denote $(a_1, a_2, \ldots)$,
such that $a_i \in A_i$ for each $i = 1, 2, \ldots$. When the sets in the product are the
same, say $A_i = X$ for each $i = 1, 2, \ldots$, we can write $X^{\infty}$ for the Cartesian
product, and an element $(x_1, x_2, \ldots)$ of $X^{\infty}$ is a sequence in $X$. Although the
parenthetic notation extends the formalism of ordered $n$-tuples, it is common
convention to denote a sequence in $X$ using set-theoretic notation $\{x_i\}$. Technically,
this is not entirely correct, because the list representation for sets does
not entail a particular ordering of elements, while we count two sequences that
differ with respect to the ordering of elements as distinct.
Given sets $X$ and $Y$, a function (or mapping) from $X$ to $Y$ assigns to each
element $x \in X$ a particular element $y \in Y$, which may vary with $x$. Such
a function is often denoted $f$, or more informatively, $f \colon X \to Y$, and we let
$f(x) \in Y$ be the element assigned to $x$. Here, $X$ is the domain of the function
and $Y$ is the codomain. When $X \subseteq Y$, a trivial example of a function $f \colon X \to Y$
is the identity function, defined by $f(x) = x$. Given $A \subseteq X$ and $B \subseteq Y$, the
image of $A$ and preimage of $B$ are, respectively,
\[
f(A) = \{f(x) \mid x \in A\}
\quad\text{and}\quad
f^{-1}(B) = \{x \in X \mid f(x) \in B\}.
\]
The range of $f$ is the image $f(X)$. We say $f$ is injective if distinct elements
$x, x' \in X$ are assigned distinct elements, i.e., $x \neq x'$ implies $f(x) \neq f(x')$; it
is surjective if $Y = f(X)$; and it is bijective if it is injective and surjective.
If a mapping $f \colon X \to Y$ is injective, then it has an inverse mapping, denoted
$f^{-1} \colon f(X) \to X$, such that for all $x \in X$, we have $f^{-1}(f(x)) = x$. Given a
mapping $f \colon X \to Y$ and a subset $Z \subseteq X$, the restriction of $f$ to $Z$ is the function
$f|_Z \colon Z \to Y$ defined by $f|_Z(x) = f(x)$ for all $x \in Z$. Given sets $X, Y, Z$ with
$Z \subseteq X$ and a mapping $f \colon Z \to Y$, a mapping $g \colon X \to Y$ is an extension of $f$
to $X$ if $g|_Z = f$. Given sets $X, Y, A, B$ and mappings $f \colon X \to Y$ and $g \colon A \to B$
satisfying $f(X) \subseteq A$, we define the composition of $g$ with $f$ as the mapping
$g \circ f \colon X \to B$ defined by $(g \circ f)(x) = g(f(x))$ for all $x \in X$.
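As a concrete example, take $f \colon \mathbb{R} \to \mathbb{R}$ defined by $f(x) = x^2$. Then
\[
f([-1, 2]) = [0, 4]
\quad\text{and}\quad
f^{-1}([1, 4]) = [-2, -1] \cup [1, 2],
\]
and $f$ is neither injective (since $f(-1) = f(1)$) nor surjective (since no negative number is in the range); its restriction $f|_{[0,\infty)}$ is injective, with inverse mapping $y \mapsto \sqrt{y}$ defined on the range $[0, \infty)$.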
An ordered $n$-tuple $(x_1, \ldots, x_n)$ from sets $A_1, \ldots, A_n$ can be identified with the
function $\sigma \colon \{1, \ldots, n\} \to \bigcup_{i=1}^n A_i$ defined by $\sigma(i) = x_i$. Similarly, the sequence
$(x_1, x_2, \ldots)$ of elements from $A_1, A_2, \ldots$ can be identified with the function
$\sigma \colon \mathbb{N} \to \bigcup_{i=1}^{\infty} A_i$ defined by $\sigma(i) = x_i$. Thus, functions provide an alternative
way of expressing the selection of elements from sets in a certain order.
Given sets $X$ and $Y \subseteq X$, a real-valued function $f \colon X \to \mathbb{R}$ is bounded on $Y$
if $f(Y)$ is a bounded set; it is simply bounded if it is bounded on $X$. Given
real-valued functions $f, g \colon X \to \mathbb{R}$, we can scale and sum these functions to
produce new functions: for $\alpha, \beta \in \mathbb{R}$, $\alpha f + \beta g$ is the function that takes value
$\alpha f(x) + \beta g(x)$ at each $x \in X$; for another example, $fg$ takes value $f(x)g(x)$ for
each $x \in X$. Given $X \subseteq \mathbb{R}$, a function $f \colon X \to \mathbb{R}$ is increasing if for all $x, y \in X$
with $x \geq y$, we have $f(x) \geq f(y)$, and it is decreasing if $-f$ is increasing; it is
strictly increasing if it is increasing and injective, and it is strictly decreasing
if $-f$ is strictly increasing. For an arbitrary set $X$ and subset $A \subseteq X$, the
mapping $I_A \colon X \to \mathbb{R}$ is the indicator function of the set $A$, i.e., $I_A(x) = 1$ for
$x \in A$ and $I_A(x) = 0$ otherwise. Given $f \colon X \to \mathbb{R}$ and a value $c \in \mathbb{R}$, the level
set of $f$ at $c$ is $\{x \in X \mid f(x) = c\}$, and the weak and strict upper contour sets
of $f$ at $c$ are
\[
\{x \in X \mid f(x) \geq c\}
\quad\text{and}\quad
\{x \in X \mid f(x) > c\},
\]
respectively, with weak and strict lower contour sets defined analogously, with
the reverse inequalities.
Given $Y \subseteq X$ and $f \colon X \to \mathbb{R}$, an element $x \in Y$ is a maximizer of $f$ over $Y$
if $f(x)$ is a maximum of $f(Y)$, i.e., if for all $y \in Y$, $f(x) \geq f(y)$; it is simply
a maximizer if it maximizes $f$ over its domain. The concept of minimizer is
defined similarly; in fact, $x$ minimizes $f$ if and only if it maximizes $-f$. The
maximum value of $f$ over $Y$, when it is well-defined, is sometimes denoted
$\max_{x \in Y} f(x)$, and the minimum value is denoted $\min_{x \in Y} f(x)$. Denote the set
of maximizers of $f$ over $Y$ by $\arg\max_{x \in Y} f(x)$, and denote the minimizers of $f$
over $Y$ by $\arg\min_{x \in Y} f(x)$.
We say a set $A$ is countable if there is an injective mapping $f \colon A \to \mathbb{N}$, and it
is uncountably infinite (or simply uncountable) otherwise. If there is a bijective
mapping $f \colon A \to \mathbb{N}$, then $A$ is countably infinite. Of course, the set is finite
if it is countable but not countably infinite, in which case there is a natural
number $n$ and a bijective mapping $f \colon A \to \{1, \ldots, n\}$. Then $n$ is the size of $A$,
and we write $|A| = n$. If $|A| = 1$, then it is singleton. If a set $A$ is finite or
countably infinite, then we may index its elements by the natural numbers, as
in $A = \{a_1, \ldots, a_n\}$ or $A = \{a_1, a_2, \ldots\}$, as we have already done on occasion.
If there is a bijective mapping $f \colon A \to \mathbb{R}$, then $A$ is uncountably infinite, and
we say it has the cardinality of the continuum.
As we did for the operations of intersection and union, we can take arbitrary
products of sets as well. Let $\mathcal{A}$ be an arbitrary collection of sets, let $I$ be an
index set, and let $\iota \colon I \to \mathcal{A}$ be a bijective function that serves to index members
of $\mathcal{A}$ by elements of the set $I$; then for each $\alpha \in I$, $\iota(\alpha)$ is the set in $\mathcal{A}$ attached
to the index $\alpha$. It is customary to simply write $A_\alpha$ for this set, submerging the
indexing function and mimicking the notation for indexing finite or countable
sets. By the axiom of choice,3 we can then take the product of sets in $\mathcal{A}$ indexed
by $I$; this gives us a new set, denoted $\prod_{\alpha \in I} A_\alpha$, and an element of this product
set is a list, where we now have an entry in the list for each $\alpha \in I$, and
the entry corresponding to $\alpha$ consists of an element of $A_\alpha$. So an element of the
product set assigns an element of $A_\alpha$ to each $\alpha$, so we can view it as a mapping
$\sigma \colon I \to \bigcup \{A_\alpha \mid \alpha \in I\}$ such that for all $\alpha \in I$, we have $\sigma(\alpha) \in A_\alpha$. In particular,
the set of functions from $X$ to $Y$ can be formalized as the product $\prod_{x \in X} Y$,
and an element of this product set is just a function $f \colon X \to Y$. In fact, the
notation $Y^X$ is sometimes used to represent the set of mappings from $X$ to $Y$.

3 Linear Algebra

Recall that the product of $n$ copies of the real line is a finite-dimensional Euclidean
space, where $n$ is the dimension of the space. An element of $\mathbb{R}^n$ is an
ordered $n$-tuple $x = (x_1, \ldots, x_n)$ called a vector. Notable vectors are the zero
vector with zeroes in each coordinate, denoted simply $0$, and the $i$th unit coordinate
vector $e_i = (0, \ldots, 0, 1, 0, \ldots, 0)$, which is defined for each $i = 1, \ldots, n$ and
has a one in the $i$th coordinate and zeroes elsewhere. We can multiply a vector
$x = (x_1, \ldots, x_n) \in \mathbb{R}^n$ by a scalar $\lambda \in \mathbb{R}$ (applying $\lambda$ to each coordinate of $x$)
and add two vectors $x, y \in \mathbb{R}^n$ (summing coordinate by coordinate), writing the
new vectors as $\lambda x$ and $x + y$, respectively. Formally,
\[
\lambda x \;\equiv\; (\lambda x_1, \ldots, \lambda x_n)
\quad\text{and}\quad
x + y \;\equiv\; (x_1 + y_1, \ldots, x_n + y_n).
\]
We can extend these operations to sets of vectors as well, where
\[
A + B \;\equiv\; \{x + y \mid x \in A,\ y \in B\}
\]
consists of every vector of the form $x + y$ with $x \in A$ and $y \in B$.

3 More on the axiom of choice in Section 20.

Given $m$ vectors $x^1, \ldots, x^m \in \mathbb{R}^n$ and corresponding scalars $\lambda_1, \ldots, \lambda_m$, we call
\[
\lambda_1 x^1 + \lambda_2 x^2 + \cdots + \lambda_m x^m
\]
a linear combination of the vectors. Note that $x = x_1 e_1 + \cdots + x_n e_n$. A nonempty
set $L \subseteq \mathbb{R}^n$ is a linear space if it is closed with respect to linear combinations,
i.e., every linear combination of every finite subset belongs to $L$. Note that $\mathbb{R}^n$ is
a linear space, $\{0\}$ is a (trivial) linear space, and every linear space necessarily
contains the zero vector. Another example is the span (or linear span) of a set
$\{x^1, \ldots, x^m\}$ of vectors, which consists of all linear combinations of $x^1, \ldots, x^m$.
More generally, given any $A \subseteq \mathbb{R}^n$, we define the span of $A$, denoted $\mathrm{span}(A)$,
as the set of all linear combinations of all finite subsets of $A$. If $L \subseteq \mathbb{R}^n$ is
a linear space, then there exist $x^1, \ldots, x^n \in L$ such that $L$ is contained in
$\mathrm{span}(\{x^1, \ldots, x^n\})$, i.e., every $y \in L$ is a linear combination of $x^1, \ldots, x^n$. The
latter statement gives an upper bound on the number of vectors needed to span
$L$. It may be that $L$ is contained in the span of a smaller finite subset, and the
size of the smallest such finite set is the dimension of $L$; we will return to this
shortly.
Given $m$ vectors $x^1, \ldots, x^m \in \mathbb{R}^n$, a convex combination of the vectors is any linear
combination such that all coefficients are non-negative and sum to one. A set
$X \subseteq \mathbb{R}^n$ is convex if for all $x, y \in X$ and all $\lambda \in (0, 1)$, we have $\lambda x + (1 - \lambda) y \in X$.
A special case is the convex hull of a set $\{x^1, \ldots, x^m\}$ of vectors, which consists
of all convex combinations of $x^1, \ldots, x^m$. The unit simplex in $\mathbb{R}^n$ is
\[
\Delta \;=\; \left\{ x = (x_1, \ldots, x_n) \in \mathbb{R}^n \;\middle|\; \sum_{i=1}^n x_i = 1 \text{ and } x_i \geq 0 \text{ for all } i = 1, \ldots, n \right\},
\]
which is the convex hull of $\{e_1, \ldots, e_n\}$. More generally, given any set $A \subseteq \mathbb{R}^n$,
we define the convex hull of $A$, denoted $\mathrm{conv}(A)$, as the set of all convex combinations
of all finite subsets of $A$. An important result is Caratheodory's theorem,
which states that for every $x \in \mathrm{conv}(A)$, there exist $x^1, \ldots, x^{n+1} \in A$ such that
$x \in \mathrm{conv}(\{x^1, \ldots, x^{n+1}\})$. That is, even though the set $A$ may be infinite, an
element of the convex hull of $A$ can be written as a convex combination of $n + 1$
elements from the set. Evidently, $A$ is convex if and only if $A = \mathrm{conv}(A)$, so
that the convex hull of a set is itself convex, and in fact it is the intersection of
all convex supersets of $A$, so we can say unambiguously that it is the smallest
convex set containing $A$. Given any convex sets $A, B \subseteq \mathbb{R}^n$ and any $\lambda \in (0, 1)$,
the weighted sum of sets $\lambda A + (1 - \lambda) B$ is convex.
The dot product of two vectors in $\mathbb{R}^n$ is $x \cdot y = \sum_{i=1}^n x_i y_i$, and geometrically
$x \cdot y = 0$ means that the vectors are orthogonal, i.e., they form a right angle. In
this case, we may write $x \perp y$, and if $x$ is orthogonal to every element of a subset
$A \subseteq \mathbb{R}^n$, then we may write $x \perp A$. The Euclidean norm in $\mathbb{R}^n$ is defined by
\[
\|x - y\| \;=\; \sqrt{(x - y) \cdot (x - y)} \;=\; \sqrt{\sum_{i=1}^n (x_i - y_i)^2}
\]
for all $x, y \in \mathbb{R}^n$, which gives us a measure of distance between $x$ and $y$. The
Euclidean norm possesses the following properties:

(i) for all $x \in \mathbb{R}^n$, $\|x\| \geq 0$, where equality holds if and only if $x = 0$,

(ii) for all $x \in \mathbb{R}^n$ and all $\lambda \in \mathbb{R}$, $\|\lambda x\| = |\lambda|\,\|x\|$,

(iii) for all $x, y \in \mathbb{R}^n$, $\|x + y\| \leq \|x\| + \|y\|$.

Property (iii) is known as the triangle inequality and implies that $\bigl|\,\|x\| - \|y\|\,\bigr| \leq \|x - y\|$.
A fundamental fact about Euclidean space, known as the
Cauchy-Schwarz inequality, is that for all $x, y \in \mathbb{R}^n$, $|x \cdot y| \leq \|x\|\,\|y\|$, where
equality holds if and only if there exist $\alpha, \beta \in \mathbb{R}$, not both zero, with $\alpha x = \beta y$. A vector $t \in \mathbb{R}^n$
is a direction if it is norm one, i.e., $\|t\| = 1$.
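For a quick numerical check of the Cauchy-Schwarz inequality, take $x = (1, 2)$ and $y = (2, 1)$ in $\mathbb{R}^2$: then
\[
|x \cdot y| = 4 \;\leq\; \|x\|\,\|y\| = \sqrt{5}\cdot\sqrt{5} = 5,
\]
and the inequality is strict because $x$ and $y$ are not proportional.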
A function $f \colon \mathbb{R}^n \to \mathbb{R}$ is linear if for all $x, y \in \mathbb{R}^n$ and all $\alpha, \beta \in \mathbb{R}$, we have
$f(\alpha x + \beta y) = \alpha f(x) + \beta f(y)$; or equivalently, there exists a fixed gradient $p \in \mathbb{R}^n$
such that for all $x \in \mathbb{R}^n$, we have $f(x) = p \cdot x$. Given linear $f$ with non-zero
gradient $p$ and value $c \in \mathbb{R}$, the closed half-space of $f$ at $c$, denoted $H^{+}_{p,c}$, is the
weak upper contour set of $f$ at $c$, and the open half-space of $f$ at $c$, denoted
$H^{++}_{p,c}$, is the strict upper contour set of $f$ at $c$, i.e.,
\[
H^{+}_{p,c} = \{x \in \mathbb{R}^n \mid p \cdot x \geq c\}
\quad\text{and}\quad
H^{++}_{p,c} = \{x \in \mathbb{R}^n \mid p \cdot x > c\}.
\]
We then write $H^{-}_{p,c}$ for $H^{+}_{-p,-c}$ and $H^{--}_{p,c}$ for $H^{++}_{-p,-c}$. In general, a set $H \subseteq \mathbb{R}^n$ is
a closed half-space or open half-space if there exist non-zero $p \in \mathbb{R}^n$ and value
$c$ such that, respectively, $H = H^{+}_{p,c}$ or $H = H^{++}_{p,c}$. For a finite set $\{x^1, \ldots, x^m\}$,
the convex hull of the set is in fact the intersection of all closed half-spaces
containing it.

Given a linear function $f \colon \mathbb{R}^n \to \mathbb{R}$ with non-zero gradient $p$ and value $c$, the hyperplane
of $f$ at $c$, denoted $H_{p,c}$, is the level set of $f$ at $c$, i.e., $H_{p,c} = \{x \in \mathbb{R}^n \mid p \cdot x = c\}$.
Clearly, $H_{p,c}$ is a linear space if and only if $c = 0$, which holds if and only
if $H_{p,c}$ contains the zero vector. In general, a hyperplane is any level set of a linear
function with non-zero gradient. Let two sets $X, Y \subseteq \mathbb{R}^n$ be
convex and disjoint; then a weak version of the
separating hyperplane theorem establishes that
the two sets can be separated by a linear function
with non-zero gradient, i.e., there is a linear
function $f$ with gradient $p \neq 0$ such that for all
$x \in X$ and all $y \in Y$, we have $f(x) = p \cdot x \leq p \cdot y = f(y)$. [Figure: the hyperplane $f = c$ with gradient $p$ separating the convex sets $X$ and $Y$.] A sharper form of
this result is stated with the help of topological ideas in Section 4.
Given a convex set $X \subseteq \mathbb{R}^n$, a function $f \colon X \to \mathbb{R}$ is concave if for all distinct
$x, y \in X$ and all $\lambda \in (0, 1)$, we have $f(\lambda x + (1 - \lambda) y) \geq \lambda f(x) + (1 - \lambda) f(y)$;
and the definition of a strictly concave function is identical but with strict inequality
replacing weak. The definition of convex and strictly convex function
is obtained by replacing the weak and strict greater than relation, respectively,
with weak and strict less than. We say $f \colon X \to \mathbb{R}$ is quasi-concave if for all
distinct $x, y \in X$ and all $\lambda \in (0, 1)$, we have $f(\lambda x + (1 - \lambda) y) \geq \min\{f(x), f(y)\}$;
and the definition of strictly quasi-concave function is identical but with strict
inequality replacing weak. And, as above, the definition of quasi-convex and
strictly quasi-convex function is obtained by replacing greater than with less
than (and the minimum with the maximum). The set of maximizers of a quasi-concave function is convex, and the set
of maximizers of a strictly quasi-concave function is either singleton or empty;
analogous statements for minimizers of (strictly) quasi-convex functions apply.
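To illustrate the distinction, $f(x) = -x^2$ on $\mathbb{R}$ is strictly concave (hence strictly quasi-concave), with unique maximizer $x = 0$. On the other hand, $f(x) = x^3$ is quasi-concave but not concave: every weak upper contour set $\{x \mid x^3 \geq c\}$ is a half-line, hence convex, yet
\[
f\bigl(\tfrac{1}{2}\cdot 0 + \tfrac{1}{2}\cdot 2\bigr) = 1 \;<\; \tfrac{1}{2}f(0) + \tfrac{1}{2}f(2) = 4,
\]
so the concavity inequality fails.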
Given vectors $x^1, \ldots, x^m \in \mathbb{R}^n$, we say $\{x^1, \ldots, x^m\}$ is linearly dependent if
there exist scalars $\lambda_1, \ldots, \lambda_m \in \mathbb{R}$ (not all zero) such that
\[
\lambda_1 x^1 + \lambda_2 x^2 + \cdots + \lambda_m x^m \;=\; 0,
\]
and the set is linearly independent if it is not linearly dependent. It turns out
that $\{x^1, \ldots, x^m\}$ is linearly independent if and only if no vector $x^j$ belongs to the span of
the other vectors. The rank of $\{x^1, \ldots, x^m\}$ is the size of the largest linearly
independent subset of the set; this is positive as long as at least one $x^j$ is non-zero.
Given a linear space $L$, a basis is a linearly independent set that spans $L$.
Every nontrivial linear space has a basis consisting of no more than $n$ vectors;
the bases of a given linear space all contain the same number of elements; and
this number is the dimension of $L$, written $\dim(L)$. In particular, $\dim(\{0\}) = 0$,
and the rank of $\{x^1, \ldots, x^m\}$ is the dimension of its span.
A function $f \colon \mathbb{R}^n \to \mathbb{R}^m$ takes vector values, $f(x) = (f_1(x), \ldots, f_m(x))$, and
we say that such a function is linear if each component function $f_j$ is linear,
$j = 1, \ldots, m$, extending the above definition. Given a linear function $f = (f_1, \ldots, f_m) \colon \mathbb{R}^n \to \mathbb{R}^m$,
two linear spaces are of interest:
\[
\ker(f) \;=\; \{x \in \mathbb{R}^n \mid f_1(x) = \cdots = f_m(x) = 0\},
\]
which is the kernel (or null space) of $f$, and
\[
\mathrm{range}(f) \;=\; \{(f_1(x), \ldots, f_m(x)) \mid x \in \mathbb{R}^n\},
\]
which, unsurprisingly, is the range of $f$. By the fundamental theorem of linear
algebra, we have
\[
\dim(\ker(f)) + \dim(\mathrm{range}(f)) \;=\; n.
\]
Letting $p^1, \ldots, p^m$ be the gradients corresponding to the linear component functions
of $f$, we have $\ker(f) = \{0\}$ if and only if $\{p^1, \ldots, p^m\}$ spans $\mathbb{R}^n$,
which holds if and only if for each $y \in \mathrm{range}(f)$, there is a unique $x \in \mathbb{R}^n$ such
that $y = f(x)$, i.e., $f$ is injective. If, in addition, $m = n$, then $\mathrm{range}(f)$ has
dimension equal to $n$, and it follows that $f$ is an isomorphism, i.e., it is linear
and bijective.
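As a small illustration of the dimension formula, consider the linear map $f \colon \mathbb{R}^3 \to \mathbb{R}^2$ defined by
\[
f(x_1, x_2, x_3) \;=\; (x_1 + x_2,\ x_2 + x_3).
\]
Its kernel is the line spanned by $(1, -1, 1)$, so $\dim(\ker(f)) = 1$, and its range is all of $\mathbb{R}^2$, so $\dim(\mathrm{range}(f)) = 2$; indeed $1 + 2 = 3 = n$.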
A matrix is a rectangular array of numbers, such as
\[
A \;=\; \begin{pmatrix} a_{1,1} & \cdots & a_{1,n} \\ \vdots & \ddots & \vdots \\ a_{m,1} & \cdots & a_{m,n} \end{pmatrix},
\]
with entry $a_{i,j}$ in row $i$, column $j$. The above matrix has dimension $m \times n$.
The transpose of a matrix $A$, denoted $A'$, is the result of reflecting $A$ across
its diagonal, so the entry in row $i$, column $j$ of $A'$ is $a_{j,i}$. A matrix is
square if $m = n$, as for example the identity matrix, denoted $I$, which has ones
on the diagonal and zeroes elsewhere. Given a vector $t = (t_1, \ldots, t_n)$, we may
sometimes view $t$ as an $n \times 1$ column matrix and the transpose $t'$ as the equivalent
$1 \times n$ row matrix. We can right-multiply $A$ by $t = (t_1, \ldots, t_n)$ or left-multiply
by $s = (s_1, \ldots, s_m)$ to obtain, respectively, a new $m \times 1$ column matrix or $1 \times n$
row matrix,
\[
At \;=\; \begin{pmatrix} \sum_{j=1}^n t_j a_{1,j} \\ \vdots \\ \sum_{j=1}^n t_j a_{m,j} \end{pmatrix}
\quad\text{and}\quad
s'A \;=\; \Bigl( \textstyle\sum_{i=1}^m s_i a_{i,1} \;\; \cdots \;\; \sum_{i=1}^m s_i a_{i,n} \Bigr).
\]
These operations are associative when combined, i.e., $(s'A)t = s'(At)$, and
produce a $1 \times 1$ matrix, i.e., a number. When $A$ is square, say $n \times n$, the
product is simply
\[
t'At \;=\; \sum_{i=1}^n \sum_{j=1}^n t_i t_j a_{i,j}.
\]
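To make the notation concrete, let
\[
A = \begin{pmatrix} 1 & 2 \\ 3 & 4 \end{pmatrix}, \qquad t = (1, -1).
\]
Then $At = (1\cdot 1 + 2\cdot(-1),\ 3\cdot 1 + 4\cdot(-1)) = (-1, -1)$ viewed as a column, and $t'At = 1\cdot(-1) + (-1)\cdot(-1) = 0$; expanding the double sum directly gives $t'At = a_{1,1} - a_{1,2} - a_{2,1} + a_{2,2} = 1 - 2 - 3 + 4 = 0$, as it should.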

Of course, we can interpret the $i$th row of $A$ as the gradient $p^i = (a_{i,1}, \ldots, a_{i,n})$ of
a linear function $f_i$, $i = 1, \ldots, m$, in which case $Ax$ is the vector $(f_1(x), \ldots, f_m(x))$
for each $x \in \mathbb{R}^n$. As this observation suggests, a linear function $f \colon \mathbb{R}^n \to \mathbb{R}^m$
has a unique matrix representation $A$ with the $i$th row of $A$ equal to the gradient
of $f_i$; and conversely, every $m \times n$ matrix $A$ determines a unique linear function
$f \colon \mathbb{R}^n \to \mathbb{R}^m$ with the gradient of $f_i$ equal to the $i$th row of $A$.
The row rank of the $m \times n$ matrix $A$ is the rank of the set of vectors making up
its rows, and the column rank of $A$ is the rank of its columns; another form of
the fundamental theorem of linear algebra is that these quantities are the same,
and we can then unambiguously refer to the rank of a matrix. It has full column
(resp. row) rank if the rank of the columns is $n$ (resp. the rank of the rows is $m$); equivalently,
the columns (resp. rows) are linearly independent. We say an $n \times n$ matrix $A$
has full rank (or is non-singular) if its row and column rank equal $n$; otherwise,
it is singular; it is symmetric if $A = A'$, or equivalently if $a_{i,j} = a_{j,i}$ for all
$i, j = 1, \ldots, n$; it is negative semi-definite if for all non-zero $t \in \mathbb{R}^n$, we have
$t'At \leq 0$; and it is negative definite if the latter inequality always holds strictly.
When an $n \times n$ matrix $A$ has full rank, it has a unique inverse matrix $A^{-1}$
such that for all $x \in \mathbb{R}^n$, $x = AA^{-1}x = A^{-1}Ax$. Note that a negative definite
matrix necessarily has full rank: indeed, if the zero vector can be obtained by
a linear combination of columns of $A$ with weights $\lambda_1, \ldots, \lambda_n$ (not all zero),
then we can define $t = (\lambda_1, \ldots, \lambda_n)$ to obtain $t'At = 0$. A square matrix $A$
is diagonally dominant if for each row $i$, we have $|a_{i,i}| \geq \sum_{j \neq i} |a_{i,j}|$, and it is
strictly diagonally dominant if the latter inequality holds strictly for each row.
Every symmetric, diagonally dominant matrix with non-positive entries along
the diagonal is negative semi-definite; and every symmetric, strictly diagonally
dominant matrix with negative entries along the diagonal is negative definite.
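For instance, the symmetric matrix
\[
A = \begin{pmatrix} -2 & 1 \\ 1 & -2 \end{pmatrix}
\]
is strictly diagonally dominant with negative diagonal entries ($|-2| > |1|$ in each row), and indeed for non-zero $t = (t_1, t_2)$,
\[
t'At \;=\; -2t_1^2 + 2t_1 t_2 - 2t_2^2 \;=\; -(t_1 - t_2)^2 - t_1^2 - t_2^2 \;<\; 0,
\]
so $A$ is negative definite.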
To every square matrix $A$, we assign a number called the determinant of $A$
and denoted $\det(A)$. To compute the determinant, let $A_{i,j}$ be the submatrix
obtained by deleting the $i$th row and $j$th column of $A$. Assuming $A$ is $n \times n$,
we define
\[
\det(A) \;=\; \sum_{j=1}^n a_{i,j} (-1)^{i+j} \det(A_{i,j}),
\]
where we specify that the determinant of a $1 \times 1$ matrix is just the single element
of that matrix, and the expansion may be taken along any row $i$. The
determinant is in fact characterized by three properties: (i) $\det(I) = 1$, (ii)
if $A$ and $B$ are identical except that column $j$ in $B$ is equal to column $j$ of
$A$ multiplied by a constant $c$, then $\det(B) = c \det(A)$, (iii) the determinant
is unchanged by adding a multiple of one column to another column. Other
properties are that switching columns of $A$ reverses the sign of the determinant,
and $\det(AB) = \det(A)\det(B)$. An especially useful property of the determinant
is that $\det(A) \neq 0$ if and only if the matrix $A$ is non-singular.
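As a worked example, expanding along the first row of a $3 \times 3$ matrix,
\[
\det \begin{pmatrix} 1 & 2 & 0 \\ 0 & 3 & 1 \\ 2 & 0 & 1 \end{pmatrix}
\;=\; 1 \cdot \det\begin{pmatrix} 3 & 1 \\ 0 & 1 \end{pmatrix}
- 2 \cdot \det\begin{pmatrix} 0 & 1 \\ 2 & 1 \end{pmatrix}
+ 0
\;=\; 1(3) - 2(-2) \;=\; 7,
\]
which is non-zero, so this matrix is non-singular.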

4 Euclidean Topology

Euclidean topology refers to the study of intervals, balls, and their generalization
to higher dimensions. Formally, the open ball of radius $r > 0$ around a vector
$x \in \mathbb{R}^n$ is the set $B_r(x) = \{y \in \mathbb{R}^n \mid \|y - x\| < r\}$ of vectors strictly
within distance $r$ of $x$, depicted in the figure for $n = 2$. The sphere of radius $r > 0$ around $x$ is
the set $S_r(x) = \{y \in \mathbb{R}^n \mid \|y - x\| = r\}$ of vectors
that are exactly distance $r$ from $x$; in the figure,
this set consists of the dashed boundary of the
open ball. The disc of radius $r$ around $x$, denoted
$D_r(x) = \{y \in \mathbb{R}^n \mid \|y - x\| \leq r\}$, is the union of the open ball and sphere of
radius $r$ around $x$. [Figure: the open ball $B_r(x)$ of radius $r$ around $x$, with dashed boundary $S_r(x)$.]
A vector $x$ in $\mathbb{R}^n$ is an interior point of a set $Y \subseteq \mathbb{R}^n$ if there exists $\varepsilon > 0$ such
that $B_\varepsilon(x) \subseteq Y$; it is a boundary point of $Y$ if for all $\varepsilon > 0$, $B_\varepsilon(x) \cap Y \neq \emptyset$ and
$B_\varepsilon(x) \setminus Y \neq \emptyset$; and it is a closure point of $Y$ if for all $\varepsilon > 0$, $B_\varepsilon(x) \cap Y \neq \emptyset$. The
boundary of $Y$, denoted $\mathrm{bd}(Y)$, consists of all of its boundary points; the interior
of $Y$, denoted $\mathrm{int}(Y)$, consists of all of its interior points, i.e., $\mathrm{int}(Y) = Y \setminus \mathrm{bd}(Y)$;
and the closure of $Y$, denoted $\mathrm{clos}(Y)$, consists of all interior and boundary
points, i.e., $\mathrm{clos}(Y) = \mathrm{int}(Y) \cup \mathrm{bd}(Y)$. A set $Y$ is open if $Y = \mathrm{int}(Y)$, and it is
closed if $Y = \mathrm{clos}(Y)$; equivalently, $Y$ is open if it is disjoint from its boundary,
and it is closed if it contains its boundary. Note that $\mathrm{int}(Y)$ is the largest
open subset of $Y$, in the sense that if $G \subseteq Y$ is open, then $G \subseteq \mathrm{int}(Y)$. Also,
$\mathrm{clos}(Y)$ is the smallest closed set containing $Y$, so that if $F \supseteq Y$ is closed, then
$\mathrm{clos}(Y) \subseteq F$. Furthermore, a set is closed if and only if its complement is open,
and it is open if and only if its complement is closed. It is easily verified that
$\emptyset$ and $\mathbb{R}^n$ are both open; arbitrary unions of open sets are open; and finite
intersections of open sets are open. Furthermore, $\emptyset$ and $\mathbb{R}^n$ are both closed;
finite unions of closed sets are closed; and arbitrary intersections of closed sets
are closed. Of course, closed intervals and closed half-lines are closed subsets of
the real line, closed half-spaces are closed subsets of $\mathbb{R}^n$, and any finite subset
of $\mathbb{R}^n$ is automatically closed.
Recall that a sequence in $\mathbb{R}^n$ is a countably infinite list $(x^1, x^2, \ldots) \in (\mathbb{R}^n)^\infty$ of
vectors indexed by the natural numbers $m \in \mathbb{N}$, with the convention being that
we instead use set-theoretic notation, as in $\{x^m\}$.4 We say a sequence $\{x^m\}$
converges to a vector $x$ if it eventually becomes arbitrarily close to $x$: for all
$\varepsilon > 0$, there exists $k$ such that for all $m \geq k$, we have $x^m \in B_\varepsilon(x)$. This is
written $x^m \to x$ or $\lim_{m} x^m = x$, the sequence is convergent, and $x$ is the
limit of the sequence. A sequence $\{\alpha_m\}$ of real numbers diverges to infinity,
written $\lim_m \alpha_m = \infty$, if for all $c \in \mathbb{R}$, there exists $k$ such that for all $m \geq k$,
we have $\alpha_m \geq c$. Similarly, the sequence diverges to negative infinity, written
$\lim_m \alpha_m = -\infty$, if $\{-\alpha_m\}$ diverges to infinity. In either case, the sequence
is divergent. The sequence is increasing if higher indices are assigned weakly
higher numbers, i.e., $m \geq k$ implies $\alpha_m \geq \alpha_k$; and it is decreasing if $\{-\alpha_m\}$
is increasing. Write $\alpha_m \uparrow \alpha$ if the sequence is increasing and converges to $\alpha$,
and write $\alpha_m \downarrow \alpha$ if it is decreasing and converges to $\alpha$. If a sequence of real
numbers is increasing and bounded above, or if it is decreasing and bounded
below, then it is convergent. Of course, a sequence is strictly increasing if it is
increasing and has no repetitions, i.e., $m \neq k$ implies $\alpha_m \neq \alpha_k$, and it is strictly
decreasing if $\{-\alpha_m\}$ is strictly increasing.
As an example, given a sequence $\{\alpha_m\}$ of non-negative real numbers, the sequence
$\{\beta_m\}$ of partial sums defined by $\beta_m = \sum_{k=1}^m \alpha_k$ is called a series and
is increasing. In particular, we have $\sum_{m=1}^\infty \frac{1}{2^m} = \lim_m \sum_{k=1}^m \frac{1}{2^k} = 1$. A
sequence $\{\alpha_m\}$ of real numbers converges asymptotically to $\alpha$ if it converges to
$\alpha$ but does not hit it infinitely often: $\alpha_m \to \alpha$ and there exists $k$ such that
for all $m \geq k$, we have $\alpha_m \neq \alpha$. In this case, we write $\alpha_m \to \alpha$ asymptotically. Given a
sequence $\{\alpha_m\}$ in $\mathbb{R}$, define $\limsup_m \alpha_m = \lim_m \sup\{\alpha_k \mid k \geq m\}$, and
define $\liminf_m \alpha_m = \lim_m \inf\{\alpha_k \mid k \geq m\}$. These ideas should not be
confused with $\limsup_n A_n$ and $\liminf_n A_n$ in the context of set theory.

4 Note that we now use superscripts to index elements of a sequence, because subscripts may appear to refer to the coordinate of a vector.
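For example, the sequence $\alpha_m = (-1)^m \bigl(1 + \tfrac{1}{m}\bigr)$ does not converge, but
\[
\limsup_m \alpha_m = 1
\quad\text{and}\quad
\liminf_m \alpha_m = -1,
\]
since $\sup\{\alpha_k \mid k \geq m\}$ decreases to $1$ and $\inf\{\alpha_k \mid k \geq m\}$ increases to $-1$.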
Given two sequences $\{x^m\}$ and $\{y^m\}$ in $\mathbb{R}^n$, we say $\{y^m\}$ is a subsequence of
$\{x^m\}$ if there is a mapping $\sigma \colon \mathbb{N} \to \mathbb{N}$ that is strictly increasing and such that
for all $m \in \mathbb{N}$, we have $y^m = x^{\sigma(m)}$. Intuitively, the subsequence $\{y^m\}$ is the
result of deleting elements of the original sequence $\{x^m\}$. For another way of
viewing a subsequence of $\{x^m\}$, let $\{m_k\}$ be a strictly increasing sequence of
natural numbers indexed by $k$; then for each $k$, $x^{m_k}$ is an element of the original
sequence, and $\{x^{m_k}\}$ is a selection of elements from the original, one for each
natural number $k$. Thus, using different indexing variables for clarity, $\{y^k\}$ is a
subsequence of $\{x^m\}$ if and only if there is a strictly increasing sequence $\{m_k\}$
of natural numbers such that for all $k = 1, 2, \ldots$, we have $y^k = x^{m_k}$. For this
reason, a subsequence of $\{x^m\}$ is usually written $\{x^{m_k}\}$, where it is understood
that $\{m_k\}$ is a strictly increasing sequence of natural numbers indexed by $k$. In
some contexts, there is actually no need to have special notation for a subsequence
of $\{x^m\}$, and we can then just use the notation for the original sequence.
A sequence $\{x^m\}$ converges to $x$ in $\mathbb{R}^n$ if and only if every subsequence of $\{x^m\}$
converges to $x$.
A set $Y \subseteq \mathbb{R}^n$ is closed if and only if the limits of all convergent sequences in
$Y$ belong to $Y$. Given a sequence $\{Y_m\}$ of subsets of $\mathbb{R}^n$, define the topological
limit superior and the topological limit inferior of the sequence, respectively, as
\[
\mathrm{ls}(\{Y_m\}) \;=\; \left\{ x \in \mathbb{R}^n \;\middle|\; \text{for all } \varepsilon > 0 \text{ and all } n, \text{ there exists } m \geq n \text{ such that } Y_m \cap B_\varepsilon(x) \neq \emptyset \right\}
\]
and
\[
\mathrm{li}(\{Y_m\}) \;=\; \left\{ x \in \mathbb{R}^n \;\middle|\; \text{for all } \varepsilon > 0, \text{ there exists } n \text{ such that for all } m \geq n,\ Y_m \cap B_\varepsilon(x) \neq \emptyset \right\}.
\]
The former consists of limits of all subsequences of elements from the sets $\{Y_m\}$,
and the latter consists of limits of all sequences of elements from the sets. Both
sets are closed. For the case of a sequence $\{x^m\}$, which can be viewed as a
sequence of singleton sets $Y_m = \{x^m\}$, the set $\mathrm{ls}(\{x^m\})$ is referred to as the set of
accumulation points of the sequence, and $\{x^m\}$ converges to $x$ if and only if
$\mathrm{ls}(\{x^m\}) = \mathrm{li}(\{x^m\}) = \{x\}$. Given a sequence $\{\alpha_m\}$ of real numbers that
is bounded above, it follows that $\limsup_m \alpha_m = \sup(\mathrm{ls}(\{\alpha_m\}))$, i.e., the
limsup of the sequence identifies its largest accumulation point; similar remarks
hold for the liminf of a sequence that is bounded below.
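As a simple illustration, let $Y_m = \{(-1)^m\} \subseteq \mathbb{R}$. Every ball around $1$ or $-1$ meets $Y_m$ for infinitely many $m$, but no point is eventually close to all $Y_m$, so
\[
\mathrm{ls}(\{Y_m\}) = \{-1, 1\}
\quad\text{and}\quad
\mathrm{li}(\{Y_m\}) = \emptyset.
\]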
A set $Y \subseteq \mathbb{R}^n$ is bounded if there exists $r > 0$ such that $Y \subseteq B_r(0)$. It is compact
if it is closed and bounded; equivalently, $Y$ is compact if every sequence in $Y$
has a subsequence that converges to an element of $Y$. We say a collection $\mathcal{U}$
of open subsets of $\mathbb{R}^n$ is an open cover of $Y$ if $Y \subseteq \bigcup \mathcal{U}$, and we say $\mathcal{U}'$ is a
subcover of $\mathcal{U}$ if it is an open cover of $Y$ and $\mathcal{U}' \subseteq \mathcal{U}$. Then $Y$ is compact if
and only if every open cover of $Y$ has a finite subcover.5 Let $\mathcal{K}$ be a collection
of closed subsets of $K$ satisfying the finite intersection property, which means
that for all $m$ and all $K_1, \ldots, K_m \in \mathcal{K}$, we have $\bigcap_{j=1}^m K_j \neq \emptyset$. If $K$ is compact,
then the collection has nonempty intersection, i.e., $\bigcap \mathcal{K} \neq \emptyset$. Conversely, if
every collection $\mathcal{K}$ of closed subsets of $K$ with the finite intersection property
has nonempty intersection, then $K$ is compact. It is easily verified that $\emptyset$ is
compact; finite unions of compact sets are compact; and arbitrary intersections
of compact sets are compact. Finite subsets are automatically compact. If a
set $K$ is compact, then every closed subset of $K$ is compact. A special feature
of $\mathbb{R}^n$ is that the convex hull of a compact set is compact. Every nonempty,
compact subset of the real line has a maximum and a minimum.
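To see how the open-cover criterion can fail, note that the bounded set $(0, 1)$ is not compact: the collection
\[
\bigl\{ \bigl(\tfrac{1}{m}, 1\bigr) \;\bigm|\; m = 2, 3, \ldots \bigr\}
\]
is an open cover of $(0, 1)$, but any finite subcollection covers only $(\tfrac{1}{m}, 1)$ for its largest $m$, leaving points near $0$ uncovered; equivalently, the sequence $x^m = \tfrac{1}{m}$ in $(0, 1)$ has no subsequence converging to a point of $(0, 1)$.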
A set $Y \subseteq \mathbb{R}^n$ is connected if there do not exist open sets $U, V \subseteq \mathbb{R}^n$ such that
$U \cap V = \emptyset$, $Y \subseteq U \cup V$, $U \cap Y \neq \emptyset$, and $V \cap Y \neq \emptyset$. In words, $Y$ is connected
if it is not possible to break $Y$ into two parts by intersecting it with two disjoint
open sets. Every convex set is connected (but not vice versa).
Given $x \in \mathbb{R}^n$, a function $f \colon \mathbb{R}^n \to \mathbb{R}$ is continuous at $x$ if for every sequence
$\{x^m\}$ in $\mathbb{R}^n$ converging to $x$, we have $f(x^m) \to f(x)$. It is continuous if it is
continuous at every $x \in \mathbb{R}^n$; equivalently, for every open set $G \subseteq \mathbb{R}$, the preimage
$f^{-1}(G)$ is open. Note that the Euclidean norm $\|x - y\|$ in $\mathbb{R}^n$ is continuous.
More precisely, denote vectors in $\mathbb{R}^{2n}$ by $(x, y)$, where $x, y \in \mathbb{R}^n$, and define
$f \colon \mathbb{R}^{2n} \to \mathbb{R}$ by $f(x, y) = \|x - y\|$; then $f$ is continuous. We can combine
continuous functions to make new continuous functions. Let $f, g \colon \mathbb{R}^n \to \mathbb{R}$ be
continuous at $x \in \mathbb{R}^n$, let $h \colon \mathbb{R} \to \mathbb{R}$ be continuous at $f(x)$, and let $\alpha, \beta \in \mathbb{R}$.
Then:

1. $\alpha f + \beta g$ is continuous at $x$,

2. $fg$ is continuous at $x$,

3. $\max\{f, g\}$ is continuous at $x$,

4. $h \circ f$ is continuous at $x$.

If $f \colon \mathbb{R}^n \to \mathbb{R}$ is continuous and $K \subseteq \mathbb{R}^n$ is compact, then by Weierstrass'
theorem, the image $f(K)$ is compact, and thus there exists a maximizer of $f$
over $K$. By the intermediate value theorem, if $f \colon \mathbb{R}^n \to \mathbb{R}$ is continuous and
$Y \subseteq \mathbb{R}^n$ is connected, then the image $f(Y)$ is convex.
We can now revisit the separating hyperplane theorem and give a more general
condition under which the separation result holds. Let two sets $X, Y \subseteq \mathbb{R}^n$ be
convex and such that $Y$ has nonempty interior
disjoint from $X$, i.e., $\mathrm{int}(Y) \neq \emptyset$ and $X \cap \mathrm{int}(Y) = \emptyset$;
then the separating hyperplane theorem establishes
that the two sets can be separated by a
linear function with non-zero gradient, i.e., there
is a linear function $f$ with gradient $p \neq 0$ such
that for all $x \in X$ and all $y \in Y$, we have
$f(x) = p \cdot x \leq p \cdot y = f(y)$. Moreover, if $X$ is
compact and $Y$ is closed, and if $X \cap Y = \emptyset$, then $f$ can be chosen so that the
previous weak inequalities hold strictly. [Figure: the hyperplane $f = c$ with gradient $p$ separating $X$ from $Y$.]

5 By the Heine-Borel theorem, a set in $\mathbb{R}^n$ is closed and bounded if and only if every open cover of the set has a finite subcover; and by the Bolzano-Weierstrass theorem, every sequence in a set in $\mathbb{R}^n$ has a convergent subsequence with limit in the set if and only if every open cover of the set has a finite subcover.
We can consider a sequence $\{f_m\}$ of functions $f_m \colon \mathbb{R}^n \to \mathbb{R}$ and analyze its
limiting properties as $m$ gets large. Analogous to the case of a sequence of
vectors in $\mathbb{R}^n$, it may be that the sequence approaches, to an arbitrary degree,
another function $f \colon \mathbb{R}^n \to \mathbb{R}$; in contrast to sequences of vectors, however, there
are now multiple compelling notions of convergence that can be defined. The
sequence $\{f_m\}$ converges uniformly to $f$ if for all $\varepsilon > 0$, there exists $m$ such
that for all $k \geq m$ and all $x \in \mathbb{R}^n$, we have $|f_k(x) - f(x)| < \varepsilon$; or equivalently,
$\sup\{|f_m(x) - f(x)| \mid x \in \mathbb{R}^n\} \to 0$. The sequence converges pointwise if for
all $x \in \mathbb{R}^n$, we have $f_m(x) \to f(x)$. Obviously, uniform convergence implies
pointwise convergence, but the converse does not hold generally: define $f_m \colon \mathbb{R} \to \mathbb{R}$
by $f_m(x) = \max\{1 - m|x|, 0\}$, define $f(0) = 1$ and $f(x) = 0$ for all $x \in \mathbb{R} \setminus \{0\}$,
and note that $f_m \to f$ pointwise but not uniformly.
More generally, we can consider vector-valued mappings $f = (f_1, \ldots, f_m) \colon \mathbb{R}^n \to \mathbb{R}^m$.
Such a mapping is continuous if each component function $f_i$ is continuous;
equivalently, for every sequence $\{x^k\}$ in $\mathbb{R}^n$ converging to $x \in \mathbb{R}^n$, we have
$f(x^k) \to f(x)$ in $\mathbb{R}^m$; or equivalently, for every open set $G \subseteq \mathbb{R}^m$, the preimage
$f^{-1}(G)$ is open. If $f \colon \mathbb{R}^n \to \mathbb{R}^m$ is continuous and $K \subseteq \mathbb{R}^n$ is compact, then
the image $f(K)$ is compact; and if $f \colon \mathbb{R}^n \to \mathbb{R}^m$ is continuous and $Y \subseteq \mathbb{R}^n$ is
connected, then the image $f(Y)$ is connected (though not necessarily convex).

Given vectors $y^1, \ldots, y^m \in \mathbb{R}^n$, let $K = \mathrm{conv}(\{y^1, \ldots, y^m\})$ denote the convex
hull, and let $\{Y_1, \ldots, Y_m\}$ be any collection of closed subsets of $\mathbb{R}^n$ such that for
every subset $I \subseteq \{1, \ldots, m\}$, we have
\[
\mathrm{conv}(\{y^j \mid j \in I\}) \;\subseteq\; \bigcup \{Y_j \mid j \in I\}.
\]
Note the implication that $y^j \in Y_j$, $j = 1, \ldots, m$. By the KKM theorem, the
intersection $K \cap \bigcap_{j=1}^m Y_j$ is compact and nonempty. This is depicted in Figure
1 for the $m = n = 3$ case.6

6 See Lemma 17.43 of Aliprantis and Border (2006). The abbreviation refers to Knaster, Kuratowski, and Mazurkiewicz.

[Figure 1: KKM Theorem, showing the convex hull of $y^1, y^2, y^3$ covered by the closed sets $Y_1, Y_2, Y_3$.]

5 Relative and Product Topologies

In general, a topology is a convention for specifying some sets as open, some
as closed, and others as neither open nor closed. In Euclidean space, there is a
standard convention for doing so, but these ideas can be extended beyond $\mathbb{R}^n$
to more general spaces. When $X \subseteq \mathbb{R}^n$ is the set of interest, we often employ
the relative topology on $X$.7 To define the idea of an open set in this context,
we define the relatively open ball,
\[
B^X_r(y) \;=\; \{x \in X \mid \|x - y\| < r\}.
\]
Note the difference between this open ball and the open balls defined above for
Euclidean space: this one is restricted to $X$. A sequence $\{x^m\}$ in $X$ converges
in the relative topology to $x \in X$ if for all $\varepsilon > 0$, there exists $k$ such that for
all $m \geq k$, we have $x^m \in B^X_\varepsilon(x)$. We can then define the notions of relative
boundary, interior, and closure just as we did above. For example, given $Y \subseteq X$,
we say $x$ is a boundary point of $Y$ relative to $X$ if for all $\varepsilon > 0$, we have
$B^X_\varepsilon(x) \cap Y \neq \emptyset$ and $B^X_\varepsilon(x) \setminus Y \neq \emptyset$. As the other definitions are structurally
identical to the originals, there is no need to repeat all of them. We may use
the notation $\mathrm{bd}_X$, $\mathrm{int}_X$, and $\mathrm{clos}_X$ to distinguish the relative versions of these
concepts.

7 The term topology is defined formally in Section 18.
We then say $Y \subseteq X$ is open relative to $X$ (or open in the relative topology
on $X$, or open in $X$, or just relatively open) if $Y$ is disjoint from its relative
boundary; equivalently, if for all $x \in Y$, there is some $\varepsilon > 0$ such that $B^X_\varepsilon(x) \subseteq Y$.
A set $Y$ is closed relative to $X$ (or closed in the relative topology on $X$,
or closed in $X$, or just relatively closed) if it contains its relative boundary;
equivalently, if every sequence in $Y$ with limit in $X$ converges to an element
of $Y$. It may be useful to give equivalent formulations of relatively open and
closed sets: $U$ is open in $X$ if and only if there is an open subset $V \subseteq \mathbb{R}^n$ in
Euclidean space such that $U = X \cap V$; and $E$ is closed in $X$ if
and only if there is a closed subset $F \subseteq \mathbb{R}^n$
in Euclidean space such that $E = X \cap F$. [Figure: a relatively open set $U = X \cap V$, where $V$ is open in $\mathbb{R}^n$.]
The largest relatively open subset of $Y \subseteq X$
is $\mathrm{int}_X(Y)$; and the smallest relatively closed
superset of $Y \subseteq X$ is $\mathrm{clos}_X(Y)$. A set $Y \subseteq X$ is relatively closed if and only if $X \setminus Y$ is
relatively open, and it is relatively open if and only if $X \setminus Y$ is relatively closed. As
above, $\emptyset$ and $X$ are both open in $X$; arbitrary unions of relatively open sets
are relatively open; and finite intersections of relatively open sets are relatively
open. Furthermore, $\emptyset$ and $X$ are both closed in $X$; finite unions of relatively
closed sets are relatively closed; and arbitrary intersections of relatively closed
sets are relatively closed.
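For a concrete example, take $X = [0, 1]$. The set $Y = [0, \tfrac{1}{2})$ is open in $X$, since
\[
Y \;=\; X \cap \bigl(-1, \tfrac{1}{2}\bigr)
\]
and $(-1, \tfrac{1}{2})$ is open in $\mathbb{R}$, even though $Y$ is not open as a subset of $\mathbb{R}$ (the point $0$ is not an interior point of $Y$ in $\mathbb{R}$).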
A set $Y \subseteq X$ is bounded if it is contained in the ball $B^X_r(x)$ for some $r > 0$
and some $x \in X$; equivalently, if it is contained in $B_r(x)$ for some $r > 0$ and
some $x \in \mathbb{R}^n$. We do not define a set $Y \subseteq X$ to be compact relative to $X$
if it is bounded and relatively closed: every $X$ is closed relative to itself, but
then we would say the interval $(0, 1)$ is compact relative to itself, which is not
a useful convention. Instead, we extend the original definition of compactness
by defining $Y \subseteq X$ to be compact if every sequence in $Y$ has a subsequence
that converges to an element of $Y$. An implication is that, unlike openness
and closedness (but like boundedness), compactness is not context-dependent:
a subset $Y \subseteq X$ is compact as a subset of $X$ if and only if it is compact as
a subset of $\mathbb{R}^n$. Thus, we do not define a concept of compactness relative to
$X$. Paralleling the result for Euclidean space, $Y$ is compact if and only if every
open cover of $Y$ relative to $X$ has a finite subcover, where now "open cover"
refers to the open sets relative to $X$. And $Y$ is compact if and only if every
collection of relatively closed subsets of $Y$ with the finite intersection property
has nonempty intersection. Assuming $X$ is a closed subset of $\mathbb{R}^n$ in the usual
Euclidean topology, it is indeed true that a set $Y \subseteq X$ is compact if and only if it
is closed relative to $X$ and bounded, but we no longer take this as the definition
of compactness. As before, the empty set and all finite sets are compact; finite
unions of compact sets are compact; and arbitrary intersections of compact sets
are compact. Every compact set is relatively closed, and if $Y$ is compact, then
every relatively closed subset of $Y$ is compact.
Like compactness (and boundedness), connectedness is independent of context:
we can define $Y \subseteq X$ to be connected if there do not exist relatively open sets
$U, V \subseteq X$ such that $U \cap V = \emptyset$, $Y \subseteq U \cup V$, $U \cap Y \neq \emptyset$, and $V \cap Y \neq \emptyset$. But
this is equivalent to the original definition, because $U$ and $V$ are open relative
to $X$ if and only if there exist open sets $U', V' \subseteq \mathbb{R}^n$ such that $U = U' \cap X$ and
$V = V' \cap X$. Thus, we do not define a concept of connectedness relative to $X$.
Given $X \subseteq \mathbb{R}^n$, the definition of a continuous function $f \colon X \to \mathbb{R}$ is nearly
identical to the original: given $x \in X$, a function $f \colon X \to \mathbb{R}$ is continuous at
$x$ if for every sequence $\{x^m\}$ converging to $x$ in $X$, we have $f(x^m) \to f(x)$.
It is continuous if it is continuous at every $x \in X$; equivalently, for every
open $G \subseteq \mathbb{R}$, the preimage $f^{-1}(G)$ is relatively open in $X$. Again, we can
combine continuous functions to make continuous functions: given continuous
$f, g \colon X \to \mathbb{R}$, the functions $f + g$, $fg$, and $\max\{f, g\}$ are continuous;
and compositions of continuous functions are continuous. Again, the image
of a compact set under a continuous function is compact; and the image of a
connected set under a continuous function is convex. Notions of convergence of a
sequence $\{f_m\}$ of functions $f_m \colon X \to \mathbb{R}$ to $f \colon X \to \mathbb{R}$ extend straightforwardly:
the sequence converges uniformly if $\sup\{|f_m(x) - f(x)| \mid x \in X\} \to 0$, and it
converges pointwise if for all $x \in X$, $f_m(x) \to f(x)$.

More generally, a vector-valued mapping $f \colon X \to \mathbb{R}^m$ is continuous if each
component function is continuous; equivalently, for every $x \in X$ and every
sequence $\{x^k\}$ converging to $x$ in $X$, $f(x^k) \to f(x)$; equivalently, for every
open set $G \subseteq \mathbb{R}^m$, the preimage $f^{-1}(G)$ is open relative to $X$. Again, we can
combine continuous functions to make continuous functions, and compositions
of continuous functions are continuous: if $f \colon X \to \mathbb{R}^m$ is continuous, if $Y \subseteq \mathbb{R}^m$
contains $f(X)$, and if $g \colon Y \to \mathbb{R}^k$ is continuous, then $g \circ f$ is continuous. By
Weierstrass' theorem, the image of a compact set under a continuous function
is compact; and by the intermediate value theorem, the image of a connected
set under a continuous function is connected. Given any finite set $X \subseteq \mathbb{R}^n$, note
that every subset $Y \subseteq X$ is open relative to $X$ and compact, and every function
$f \colon X \to \mathbb{R}$ is continuous.
Given a set $X \subseteq \mathbb{R}^n$ and a function $f \colon X \to \mathbb{R}^n$, an element $x$ is a fixed point of
$f$ if $f(x) = x$. By Brouwer's fixed point theorem, if $X$ is nonempty, compact, and
convex, and if $f$ is continuous with $f(X) \subseteq X$, then the function has at least one
fixed point. There may, however, be multiple fixed points, e.g., if $f$ is the identity
function. A function $f \colon X \to \mathbb{R}^n$ is a contraction mapping on $X$ if $f(X) \subseteq X$
and there exists $\beta \in [0, 1)$ such that for all $x, y \in X$, $\|f(x) - f(y)\| \leq \beta \|x - y\|$,
in which case $\beta$ is the modulus of the contraction. By the contraction mapping
theorem, if $X$ is a nonempty, closed subset of $\mathbb{R}^n$ and $f \colon X \to \mathbb{R}^n$ is a contraction
mapping on $X$, then $f$ has a unique fixed point.
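For a simple illustration of the contraction mapping theorem, let $X = \mathbb{R}$ and $f(x) = \tfrac{1}{2}x + 1$. Then
\[
|f(x) - f(y)| = \tfrac{1}{2}|x - y|,
\]
so $f$ is a contraction with modulus $\tfrac{1}{2}$, and its unique fixed point solves $x = \tfrac{1}{2}x + 1$, namely $x = 2$; iterating $f$ from any starting point produces a sequence converging to $2$.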
Another variation on the usual Euclidean topology is the product topology,
which starts with subsets $X_i \subseteq \mathbb{R}^{n_i}$, $i = 1, \ldots, k$, where we let $n = \sum_{i=1}^k n_i$.
We then consider the product $X = \prod_{i=1}^k X_i$, which consists of ordered $k$-tuples
$x = (x^1, \ldots, x^k)$ with component $x^i$ belonging to set $X_i$, $i = 1, \ldots, k$. We define
the open ball of radius $r$ around $x = (x^1, \ldots, x^k) \in X$ in the product space as
the product of relatively open balls,
\[
B^X_r(x) \;\equiv\; \prod_{i=1}^k B^{X_i}_r(x^i).
\]
A sequence $\{x_m\}$ of ordered $k$-tuples $x_m = (x^1_m, \ldots, x^k_m)$ in $X$ converges in the
product topology to $x = (x^1, \ldots, x^k) \in X$ if for all $\varepsilon > 0$, there exists $\ell$ such that
for all $m \geq \ell$, we have $x_m \in B^X_\varepsilon(x)$.8 Breaking this definition down, it reduces
to the simple statement that a sequence $\{(x^1_m, \ldots, x^k_m)\}$ of ordered $k$-tuples in
$X$ converges to $(x^1, \ldots, x^k) \in X$ if and only if $x^i_m \to x^i$ for all $i = 1, \ldots, k$. That is, the
sequence converges if and only if it converges in all coordinates.

8 We revert to indexing sequences by subscripts, as now superscripts index the factor in the product. Thus, $x^i_m$ is a vector in $\mathbb{R}^{n_i}$.
Given a subset Y X, we can define the boundary, interior, and closure of Y
according to the usual conventions, and we can distinguish between open and
closed sets. For example, given Y X, we say x = (x1 , . . . , xk ) is a boundary
point of Y if for all > 0, BX (x) Y 6= and BX (x) \ Y 6= , and Y is open if
it is disjoint from its boundary; equivalently, for all x = (x1 , . . . , xk ) Y , there
exists > 0 such that BX (x) Y . As usual, a set Y X is closed if it contains
its boundary; equivalently, if for every sequence {xm } with limit x X, we have
x Y . The largest open subset of Y X is intX (Y ); and the smallest closed
superset of Y X is closX (Y ). A set Y X is open if and only if X \ Y is
closed; and Y is closed if and only if X \ Y is open. As above, and X are both
open; arbitrary unions of open sets are open; and finite intersections of open
sets are open. Furthermore, and X are both closed; finite unions of closed
sets are closed; and arbitrary intersections of closed sets are closed.
A set is bounded if it is contained in a ball B_r^X(x) for some r > 0 and some x ∈ X; equivalently, if it is contained in B_r(x) for some r > 0 and some x ∈ Rn. A set Y ⊆ X is compact if every sequence in Y has a subsequence that converges to an element of Y. Then Y is compact if and only if every open cover of Y (using the above definition of open set) has a finite subcover; and it is compact if and only if every collection of closed subsets (using the above definition of closed set) of Y with the finite intersection property has nonempty intersection. If each Xi is closed, then Y is compact if and only if it is closed and bounded, but (as in the discussion of the relative topology) this equivalence is no longer definitional. The empty set and all finite sets are compact; finite unions of compact sets are compact; and arbitrary intersections of compact sets are compact. Every compact set is closed, and if Y is compact, then every closed subset of Y is compact. A very special case of the Tychonoff product theorem is that if each Xi is compact in the Euclidean space Rni, i = 1, . . . , k, then X is itself compact in the product topology. A set Y ⊆ X is connected if there do not exist open sets U, V ⊆ X such that U ∩ V = ∅, Y ⊆ U ∪ V, U ∩ Y ≠ ∅, and V ∩ Y ≠ ∅.
8 We revert to indexing sequences by subscripts, as now superscripts index the factor in the product. Thus, x^i_m is a vector in Rni.

A function f : X → R is continuous if for every x = (x^1, . . . , x^k) ∈ X and every sequence {xm} converging to x in X, we have f(xm) → f(x); equivalently, for every open set G ⊆ R, the preimage f^{-1}(G) is open as a subset of X. This definition extends to vector-valued functions in the usual way: a function f : X → Rm is continuous if for every x ∈ X and every sequence {xm} converging to x in X, f(xm) → f(x); equivalently, for every open G ⊆ Rm, the preimage f^{-1}(G) is open relative to X. We can combine continuous functions in the usual way to produce new continuous functions; by Weierstrass's theorem, the image of a compact set under a continuous function is compact; and by the intermediate value theorem, the image of a connected set under a continuous function is connected.
A generalization of Brouwer's theorem is Browder's theorem, which considers a function f : X × [0, 1] → Rn with values f(x, t), where t ∈ [0, 1] is viewed as a parameter. Assume that X ⊆ Rn is nonempty, open, and convex, that f is continuous, and that there is a compact set K ⊆ X with f(X × [0, 1]) ⊆ K. Then for each t ∈ [0, 1], there is a fixed point x ∈ K such that f(x, t) = x, and there exist a connected set C ⊆ K × [0, 1] and x0, x1 ∈ K such that (x0, 0), (x1, 1) ∈ C and for all (x, t) ∈ C, f(x, t) = x.9 That is, there is a connected set of fixed points from parameter value t = 0 to t = 1.
9 See Theorem 1 of Mas-Colell (1974) A Note on a Theorem of F. Browder, Mathematical Programming, 6: 229–233.
As the notation for open balls suggests, the product topology on X = Π_{i=1}^k Xi is precisely the relative topology on X when viewed as a subset of Rn. More precisely, a set Y ⊆ X is open (resp. closed) in the product topology on X if and only if it is open (resp. closed) in the relative topology on X viewed as a subset of Rn; and a sequence {xm} in X converges to x ∈ X in the product topology if and only if it converges in the relative topology on X as a subset of Rn. This equivalence is fortuitous, because it implies that we do not require different notions of continuity for functions defined on a multidimensional domain, depending on how we factor the domain.
To be more specific, let X = Y × Z ⊆ Rn, where Y ⊆ Rℓ, Z ⊆ Rm, and n = ℓ + m. Then a function f : X → R is continuous on X (with the Euclidean topology of Rn) if and only if the same function f : Y × Z → R is continuous on Y × Z (with the product topology). To emphasize this, we may refer to the function as jointly continuous in its arguments. For each y ∈ Y, we can define the function fy : Z → R by fy(z) = f(y, z), and for each z ∈ Z, we can define fz : Y → R by fz(y) = f(y, z). If f : X → R is continuous, then it is continuous in each argument: for all y ∈ Y and z ∈ Z, the functions fy and fz are continuous. The converse does not hold. For example, define f : [0, 1]^2 → R as follows: if y = 0, then f(y, z) = 0; otherwise,
\[
f(y, z) = \max\left\{ 1 - \frac{|y - z|}{y},\, 0 \right\}.
\]
This function is continuous in each argument, but the sequence {(1/m, 1/m)} converges to (0, 0), while f(1/m, 1/m) = 1 ↛ 0 = f(0, 0). Thus, continuity in each argument separately does not imply joint continuity.
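A quick numerical check of this example (my own illustration, not in the original): evaluating f along the diagonal sequence (1/m, 1/m) versus along the axes shows the failure of joint continuity at the origin.

```python
def f(y, z):
    # Separately continuous on [0,1]^2 but not jointly continuous at (0,0).
    if y == 0:
        return 0.0
    return max(1.0 - abs(y - z) / y, 0.0)

for m in (1, 10, 100, 1000):
    print(m, f(1/m, 1/m), f(1/m, 0.0), f(0.0, 1/m))
# f(1/m, 1/m) = 1 for every m, while f(0, 0) = 0: no joint continuity,
# even though each one-variable section converges to 0.
```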

6 Differentiability

Given a set X ⊆ Rn, a function f : X → R, and an element x ∈ int(X), we say f is differentiable at x in direction t if for every sequence {λk} of nonzero real numbers converging to zero, the sequence
\[
\frac{f(x + \lambda_k t) - f(x)}{\lambda_k}
\]
converges. In this case, the limit, which is
\[
\lim_{\lambda \to 0} \frac{f(x + \lambda t) - f(x)}{\lambda},
\]
must in fact be unique; we call the limit the derivative of f at x in direction t, and we denote it Dt f(x). We say f is directionally differentiable at x if it is differentiable at x in every direction. When the domain is an open set U ⊆ Rn, we say f : U → R is directionally differentiable if it is directionally differentiable at every x ∈ U. More generally, given X ⊆ Rn, a function f : X → R, and an open set U ⊆ X, we say f is directionally differentiable on U if the restriction f|U is directionally differentiable. When n = 1, so there are just two directions, t ∈ {−1, 1}, we then say simply that f is differentiable.
Let f, g : X → R be differentiable at x ∈ int(X) in direction t, let h : f(X) → R be differentiable at f(x), and let α, β ∈ R. Then:
1. αf + βg is differentiable at x in direction t, and Dt(αf + βg)(x) = αDt f(x) + βDt g(x),
2. f g is differentiable at x in direction t, and Dt(f g)(x) = Dt f(x)g(x) + f(x)Dt g(x),
3. h ∘ f is differentiable at x in direction t, and Dt(h ∘ f)(x) = Dh(y)Dt f(x) with y = f(x).
These are known as the sum rule, product rule, and chain rule.
A fundamental result from univariate calculus, which is of great use in multivariate analysis, is the mean value theorem, which states that given a, b ∈ R with a < b, if f : [a, b] → R is differentiable on (a, b) and continuous at a and b, then there is some x ∈ (a, b) such that
\[
Df(x) = \frac{f(b) - f(a)}{b - a}.
\]
Figure 2: Gradients

The function f : X → R is partially differentiable at x if it is differentiable at x in direction ei for each i = 1, . . . , n. The ith partial derivative of f at x is the derivative of f at x in the direction ei, and it is denoted Di f(x) or ∂f/∂xi(x). When U ⊆ Rn is open, the function f : U → R is partially differentiable if it is partially differentiable at every x ∈ U. The vector of partial derivatives at x is the gradient of f at x, and it is denoted by Df(x) or ∇f(x). An implication of the Cauchy-Schwarz inequality is that the gradient at x points in the direction of steepest ascent of the function at x, and the gradient of f at x is orthogonal to the level set of f through x, as in Figure 2.
If f : U → R is partially differentiable, then for each x ∈ U and each coordinate i = 1, . . . , n, the partial derivative Di f(x) is well-defined. Then we may consider the mapping Di f : U → R, which gives the ith partial derivative of f at every element of the domain. We call Di f the ith partial derivative of f. If all partial derivatives Di f : U → R are continuous, then f is continuously partially differentiable, or C1. Every C1 function is, in fact, continuous. Furthermore, every C1 function f is directionally differentiable, and directional derivatives are straightforward to calculate: for all x ∈ U and every direction t, we have Dt f(x) = Df(x) · t.
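To make the formula Dt f(x) = Df(x) · t concrete, here is a small finite-difference check (my own sketch, using a hypothetical C1 function f(x) = x1^2 + 3 x1 x2, not an example from the text):

```python
import numpy as np

def f(x):
    return x[0]**2 + 3.0 * x[0] * x[1]   # a hypothetical C^1 function

def grad_f(x):
    return np.array([2.0 * x[0] + 3.0 * x[1], 3.0 * x[0]])

def directional_derivative(f, x, t, h=1e-6):
    # One-sided difference quotient approximating D_t f(x).
    return (f(x + h * t) - f(x)) / h

x = np.array([1.0, 2.0])
t = np.array([0.6, 0.8])                  # an arbitrary direction
print(directional_derivative(f, x, t))    # approximately 7.2
print(grad_f(x) @ t)                      # exactly Df(x) . t = 7.2
```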
Given a directionally differentiable function f : U → R, if each directional derivative Dt f : U → R is also directionally differentiable, then f is twice directionally differentiable. For the derivative of Dt f at x in direction s, we use the special notation
\[
D_{st} f(x) = D_s(D_t f)(x).
\]

When s = t, we write D_t^2 f(x) for the second derivative in direction t at x. If f : U → R is partially differentiable and each partial derivative Di f : U → R is itself partially differentiable, then we say f is twice partially differentiable. In this case, for the ith partial derivative of Dj f(x), we use the special notation
\[
D_{ij} f(x) = D_i(D_j f)(x),
\]
and we refer to this as the ijth cross partial derivative of f at x. The Hessian of f at x is then the n × n matrix of cross partial derivatives at x,
\[
D^2 f(x) = \begin{pmatrix}
D_{11} f(x) & D_{12} f(x) & \cdots & D_{1n} f(x) \\
D_{21} f(x) & D_{22} f(x) & \cdots & D_{2n} f(x) \\
\vdots & \vdots & \ddots & \vdots \\
D_{n1} f(x) & D_{n2} f(x) & \cdots & D_{nn} f(x)
\end{pmatrix}.
\]
If f is twice partially differentiable and for all i, j = 1, . . . , n, the cross partial derivative D_{ij} f : U → R is continuous, then f is continuously twice partially differentiable, or C2.
The calculation of cross partial derivatives of C2 functions is simplified by Young's theorem, which states that the order in which partial derivatives are taken is irrelevant: if f : U → R is C2, then for all i, j = 1, . . . , n and all x ∈ U, we have D_{ij} f(x) = D_{ji} f(x). Thus, the Hessian matrix of a C2 function f is always symmetric. Furthermore, under the conditions of Young's theorem, we can easily calculate second directional derivatives: for all directions s and t, we have
\[
D_{st} f(x) = \sum_{i=1}^{n} \sum_{j=1}^{n} D_{ij} f(x)\, s_i t_j.
\]

Of course, this can be written
\[
\begin{pmatrix} s_1 & s_2 & \cdots & s_n \end{pmatrix}
\begin{pmatrix}
D_{11} f(x) & D_{12} f(x) & \cdots & D_{1n} f(x) \\
D_{21} f(x) & D_{22} f(x) & \cdots & D_{2n} f(x) \\
\vdots & \vdots & \ddots & \vdots \\
D_{n1} f(x) & D_{n2} f(x) & \cdots & D_{nn} f(x)
\end{pmatrix}
\begin{pmatrix} t_1 \\ t_2 \\ \vdots \\ t_n \end{pmatrix},
\]
or more concisely, viewing s and t as column vectors and s′ as the transpose of s, we have D_{st} f(x) = s′ D^2 f(x) t.
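As an illustration (mine, not the author's), the Hessian can be approximated by second differences, and one can then verify both the symmetry asserted by Young's theorem and the identity D_{st} f(x) = s′ D^2 f(x) t, here for a hypothetical C2 function:

```python
import numpy as np

def f(x):
    return x[0]**2 * x[1] + np.sin(x[1])   # a hypothetical C^2 function

def hessian(f, x, h=1e-4):
    # Second-difference approximation of D_ij f(x).
    n = len(x)
    H = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            e_i, e_j = np.eye(n)[i], np.eye(n)[j]
            H[i, j] = (f(x + h*e_i + h*e_j) - f(x + h*e_i)
                       - f(x + h*e_j) + f(x)) / h**2
    return H

x = np.array([1.0, 2.0])
H = hessian(f, x)
print(np.allclose(H, H.T, atol=1e-3))        # symmetry (Young's theorem)
s, t = np.array([1.0, 0.0]), np.array([0.0, 1.0])
print(s @ H @ t)                             # approx D_12 f(x) = 2*x_1 = 2
```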
We can extend the above definitions to higher orders of differentiability. We refer to partial derivatives as first order derivatives and to cross partial derivatives as second order derivatives. More generally, for a function f : U → R, if all cross partial derivatives D_{i1,...,i_{r−1}} f of order r − 1 are well-defined and each D_{i1,...,i_{r−1}} f : U → R is itself partially differentiable, then f is r-times partially differentiable, and we write D_{i1,...,i_{r−1},i_r} f(x) for the i_r th partial derivative of D_{i1,...,i_{r−1}} f at x, where i_r = 1, . . . , n. It is r-times continuously partially differentiable, or Cr, if each cross partial derivative of order r is continuous. The function is C0 if it is continuous, and it is C∞ if it is Cr for every r = 0, 1, 2, . . ..
To extend the concept of differentiability to functions defined on more general domains, let X ⊆ Rn be contained in the closure of its interior, i.e., X ⊆ clos(int(X)). Then a function f : X → R is r-times continuously partially differentiable, or Cr, for r = 1, 2, . . ., if it is the restriction of a Cr function defined on an open superset of X, i.e., there exist an open set U ⊆ Rn and a Cr function g : U → R such that f = g|X. Note that the partial derivatives of a C1 function are uniquely defined on the entire domain X. Indeed, if x ∈ bd(X), then there is a sequence {xm} in int(X) converging to x; and if g : U → R and h : V → R are both C1 extensions of f to an open superset of X, then
\[
Dg(x) = \lim_{m \to \infty} Dg(x_m) = \lim_{m \to \infty} Df(x_m) = \lim_{m \to \infty} Dh(x_m) = Dh(x),
\]
and therefore Df(x) = Dg(x) = Dh(x) independently of the extension of f. Thus, we can extend the partial derivatives of a C1 function to the domain X and unambiguously write Df : X → Rn. Similar remarks hold for Cr functions and derivatives of higher orders. In the sequel, however, we will usually consider Cr functions defined on an open domain, with results for functions on general domains left implicit.
We can give characterizations of concavity and strict concavity in terms of second directional derivatives. If X ⊆ Rn is convex and f : X → R is C2, then
1. f is concave if and only if for all x ∈ X, D2 f(x) is negative semi-definite.
2. if D2 f(x) is negative definite for all x ∈ X, then f is strictly concave.
To see that (2) can only be stated for one direction, consider f : R → R defined by f(x) = −x^4; this function is strictly concave, but D2 f(0) = 0.
Given X ⊆ Rn, if f : X → R is directionally differentiable at x* ∈ int(X), and if x* is a local maximizer of f, i.e., there is an open set U ⊆ X with x* ∈ U such that f(x*) = max_{x∈U} f(x), then for every direction t, we have Dt f(x*) = 0. If, in addition, f is twice directionally differentiable, then D_t^2 f(x*) ≤ 0 for every direction t. If, in addition, f is C2, then D2 f(x*) is negative semi-definite. Conversely, assuming f is C2 and x* ∈ int(X), if Df(x*) = 0 and D2 f(x*) is negative definite, then x* is a strict local maximizer of f, i.e., there exists an open set G ⊆ X such that x* ∈ G and for all x ∈ G \ {x*}, f(x*) > f(x). Assuming X ⊆ Rn is convex and f : X → R is C1 and concave, if x* ∈ int(X) and Df(x*) = 0, then x* is a maximizer of f.
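A small numerical companion (my own, with a hypothetical function): at an interior critical point, checking that the Hessian's eigenvalues are all negative certifies negative definiteness, and hence a strict local maximizer.

```python
import numpy as np

# Hypothetical C^2 function with a critical point at the origin.
def f(x):
    return -(x[0]**2) - 2.0 * x[1]**2 + x[0] * x[1]

# Exact gradient and Hessian for this example.
grad = lambda x: np.array([-2*x[0] + x[1], -4*x[1] + x[0]])
H = np.array([[-2.0, 1.0], [1.0, -4.0]])

x_star = np.zeros(2)
print(grad(x_star))             # [0. 0.]: the first-order condition holds
print(np.linalg.eigvalsh(H))    # both eigenvalues negative => negative
                                # definite => x_star is a strict local max
```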
7 Lebesgue Measure

A half-open rectangle in Rn is any product set R = (a1, b1] × · · · × (an, bn] of half-open intervals. The Lebesgue measure of a half-open rectangle R is the product of the lengths of the intervals,
\[
\lambda(R) = (b_1 - a_1)(b_2 - a_2) \cdots (b_n - a_n).
\]
Given Y ⊆ Rn, we can cover Y with countably many half-open rectangles R1, R2, . . ., and we can approximate the size of Y by Σ_i λ(Ri). This approximation is better when we use more, smaller rectangles. Then the outer Lebesgue measure of Y, which may be infinite, is
\[
\lambda^*(Y) = \inf\left\{ \sum_{i=1}^{\infty} \lambda(R_i) \,\middle|\, R_1, R_2, \ldots \text{ are half-open rectangles with } Y \subseteq \bigcup_{i=1}^{\infty} R_i \right\}.
\]
See Figure 3. Technically, this defines a mapping λ* : 2^{Rn} → R̄+ from subsets of Rn to the non-negative extended real numbers (including ∞).
Figure 3: Outer measure of Lebesgue measurable set
A set Y is Lebesgue measurable (or simply measurable) if for all Z ⊆ Rn,
\[
\lambda^*(Z) = \lambda^*(Y \cap Z) + \lambda^*(Z \setminus Y).
\]
That is, we can use Y to break up any Z and compute the size of Z in each part separately, as in Figure 3, where a (curvy) measurable set is used to partition an arbitrary (jagged) set.10 Letting Ln denote the Lebesgue measurable subsets of Rn, we henceforth restrict the mapping λ* to Ln and let λ = λ*|Ln denote this restriction. The mapping λ is referred to as Lebesgue measure, and given measurable Y ⊆ Rn, the Lebesgue measure of Y is precisely λ(Y). If a measurable set Z satisfies λ(Z) = 0, then Z is Lebesgue measure zero (or simply measure zero). If a property holds at all x outside a measurable set Z with λ(Z) = 0, then we say it holds almost always or holds for almost every x (sometimes abbreviated a.e. x). Given measurable X ⊆ Rn, a property holds for almost every x ∈ X if it holds for all x ∈ X \ Z, where Z is measurable with λ(Z) = 0.
10 In some treatments, the term measurable applied to a set may mean Borel measurable. The class of Borel sets is the smallest collection of sets that contains all open sets and is closed with respect to complements and countable unions. The class is included among the Lebesgue measurable sets, and in fact the Lebesgue measurable sets are the completion of the Borel sets with respect to Lebesgue measure. There are Lebesgue measurable sets that are not Borel measurable. More general treatments may define measurability with respect to an abstract σ-algebra. These considerations will come up from time to time in footnotes or the appendix, but they are mostly outside the scope of this survey.
Here are some facts about measurable sets.
1. The sets ∅ and Rn are measurable.
2. If Y is measurable, then the complement Rn \ Y is measurable.
3. If Y1, Y2, . . . are measurable, then ⋃_{i=1}^∞ Yi is measurable.
4. If Y1, Y2, . . . are measurable, then ⋂_{i=1}^∞ Yi is measurable.
5. Every half-open rectangle is measurable.
6. All closed sets are measurable.
7. All open sets are measurable.
Conditions 1–3 are equivalent to the statement that the Lebesgue measurable sets form a σ-algebra (or σ-field). A special property of the Lebesgue measurable sets is that Lebesgue measure is complete, in the sense that if a measurable set Z has Lebesgue measure zero and Y ⊆ Z, then Y is measurable.11
A particularly interesting measurable subset of the unit interval is the Cantor set, denoted C, defined as follows. Let C0 = [0, 1]; then define C1 = C0 \ (1/3, 2/3) by removing the middle third of C0, leaving the union of two disjoint intervals; then define C2 by removing the middle thirds of the intervals [0, 1/3] and [2/3, 1], leaving the union of four disjoint intervals; then define C3 by removing their middle thirds, and so on. In general, for n ≥ 2, define Cn = C_{n−1} \ Dn, where
\[
D_n = \bigcup\left\{ \left(\tfrac{k}{3^n}, \tfrac{k+1}{3^n}\right) \,\middle|\, k = 1, \ldots, 3^n - 2,\ k \text{ odd} \right\}
\]
is the union of alternating open subintervals of length 1/3^n. Then the Cantor set is C = ⋂_{i=1}^∞ Ci. This set has the following properties:
(i) it is closed,
(ii) it is nowhere dense, i.e., its closure contains no open set,
(iii) it has Lebesgue measure zero,
(iv) it has the cardinality of the continuum.
11 The definition of completeness differs from Definition 10.34 of Aliprantis and Border (2006). Equivalence of the definitions for σ-finite measures follows from Theorems 4.2 and 4.3, and discussion on p. 84, in Kingman and Taylor (1966) Introduction to Measure and Probability, Cambridge Press: New York, NY.
In particular, there exist uncountable sets with Lebesgue measure zero. There are subsets of the unit interval that are not measurable, but the proof of this claim is non-constructive: the existence of such sets relies on the axiom of choice.
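To see numerically why the Cantor set has measure zero (an illustration of my own), note that each stage Cn is a union of 2^n closed intervals of length 3^{-n}, so λ(Cn) = (2/3)^n → 0, and C is contained in every Cn.

```python
# Total length of the n-th stage C_n of the Cantor construction:
# 2^n intervals, each of length 3^(-n).
for n in range(0, 21, 5):
    print(n, (2.0 / 3.0) ** n)
# The lengths shrink to zero, so the Cantor set C, which lies inside
# every C_n, has outer measure zero.
```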
In general, a function µ : Ln → R̄+ mapping Lebesgue measurable sets to the non-negative extended real numbers (including ∞) is a measure on Rn if it satisfies the following:
(i) µ(∅) = 0,
(ii) for all pairwise disjoint collections Y1, Y2, . . . of measurable sets,
\[
\mu\left(\bigcup_{i=1}^{\infty} Y_i\right) = \sum_{i=1}^{\infty} \mu(Y_i).
\]
Condition (ii) is known as countable additivity. Of course, Lebesgue measure λ is a particular measure. Another simple example is the counting measure, which assigns to a measurable set Y its cardinality |Y| if Y is finite, and ∞ otherwise.
Given a measure µ on Rn, if µ(Rn) < ∞, then µ is finite. If µ(Rn) = 1, then µ is a probability measure. If there is a countable collection {Y1, Y2, . . .} of measurable sets such that Rn = ⋃_{i=1}^∞ Yi and µ(Yi) < ∞ for all i = 1, 2, . . ., then µ is σ-finite. Lebesgue measure is not finite, but it is σ-finite. If a measurable set Z satisfies µ(Z) = 0, then it is µ-measure zero, and if a property holds at all x outside a µ-measure zero set Z, then it holds for µ-almost all x. Given measurable X ⊆ Rn, a property holds for µ-almost every x ∈ X if it holds for all x ∈ X \ Z for some µ-measure zero set Z.
The Caratheodory extension theorem establishes several facts about measures (and Lebesgue measure in particular):
1. If Y and Z are measurable and Y ⊆ Z, then µ(Y) ≤ µ(Z).
2. If Y and Z are measurable and µ(Y) < ∞, then µ(Y \ Z) = µ(Y) − µ(Y ∩ Z).
3. If Y1, Y2, . . . are measurable, then
\[
\mu\left(\bigcup_{i=1}^{\infty} Y_i\right) \leq \sum_{i=1}^{\infty} \mu(Y_i).
\]

4. If Y1, Y2, . . . are measurable and µ(Yj) < ∞ for some j = 1, 2, . . ., then
\[
\mu\left(\bigcap_{i=1}^{\infty} Y_i\right) \geq \mu(Y_j) - \sum_{i=1}^{\infty} \mu(Y_j \setminus Y_i).
\]
The property in condition 3 is known as countable sub-additivity.


Using the structure of Euclidean space, we can also state several results on approximation of measurable sets for a finite measure µ.12
1. For all measurable Y and all ε > 0, there exists an open set U ⊇ Y such that µ(U) < µ(Y) + ε.
2. For all measurable Y and all ε > 0, there exists a compact set K ⊆ Y such that µ(K) > µ(Y) − ε.
Condition 1 means that µ is outer regular, and condition 2 means that µ is tight. If, in addition to being outer regular and tight, µ(K) < ∞ for every compact set K, then µ is regular. Lebesgue measure is regular.13 As well, every finite measure µ is regular, and it follows that for every measurable Y with µ(Y) < ∞ and every ε > 0, there exist an open set U and a compact set K satisfying K ⊆ Y ⊆ U and |µ(U) − µ(K)| < ε.
Given that every finite measure µ is regular, and that Lebesgue measure is regular, one might suspect that every σ-finite measure is regular. This is not the case, however, as the following example shows. Let Q[0,1] denote the rational numbers between zero and one, and let I[0,1] = [0, 1] \ Q[0,1] denote the irrational numbers between zero and one. Note that Q[0,1] is countable, so we can index the rationals in the unit interval as a1, a2, . . . (the precise way the rationals are enumerated is irrelevant), and that Q[0,1] has Lebesgue measure zero, while I[0,1] has Lebesgue measure one. Define the measure ν on R using the uniform distribution on I[0,1] and counting measure on Q[0,1]. Formally, given measurable Y ⊆ R, we define
\[
\nu(Y) = \lambda(Y \cap I_{[0,1]}) + |Y \cap Q_{[0,1]}|,
\]
where |Y ∩ Q[0,1]| denotes the number (possibly infinite) of rationals between zero and one belonging to Y. This measure is σ-finite, but it is not outer regular: every open set U containing I[0,1] contains an infinite number of rational numbers in the unit interval, and therefore ν(U) = ∞. It is true, nevertheless, that for every measure µ on Rn, every measurable Y ⊆ Rn such that µ(Y) < ∞, and all ε > 0, there exists a compact set K ⊆ Y such that µ(K) > µ(Y) − ε.
12 These results follow from Theorem 12.5 in Aliprantis and Border (2006), except that the latter result concerns approximation of Borel sets. By Theorem 10.23 (part 6) of Aliprantis and Border (2006), given measurable Y, there is a Borel set B ⊇ Y with µ(B) = µ(Y), so Y can be approximated from above by open sets. Note as well that there is a Borel set C ⊇ Rn \ Y with µ(C) = µ(Rn \ Y), and therefore Rn \ C ⊆ Y is a Borel set with µ(Rn \ C) = µ(Y), so Y can be approximated from below by compact sets.
13 See Theorem 4.5 of Kingman and Taylor (1966) Introduction to Measure and Probability, Cambridge Press: New York, NY.
Measures possess certain continuity properties. Letting Y1, Y2, . . . be measurable, we have
\[
\mu\left(\liminf_{n \to \infty} Y_n\right) \leq \liminf_{n \to \infty} \mu(Y_n),
\]
and if µ(⋃_{m=n}^∞ Ym) < ∞ for some n, then
\[
\mu\left(\limsup_{n \to \infty} Y_n\right) \geq \limsup_{n \to \infty} \mu(Y_n).
\]
This implies that µ is continuous for increasing sequences of sets: if Y1, Y2, . . . are measurable and Yk ⊆ Yk+1 for all k = 1, 2, . . ., then lim_k µ(Yk) = µ(lim_k Yk). Similarly, µ is continuous for decreasing sequences of sets: if Y1, Y2, . . . are measurable and Yk+1 ⊆ Yk for all k = 1, 2, . . ., and if µ(Yk) < ∞ for some k, then lim_k µ(Yk) = µ(lim_k Yk).
For every measure µ, there is a unique closed set, called the support of µ and denoted supp(µ), such that
(i) µ(Rn \ supp(µ)) = 0,
(ii) for all open G ⊆ Rn with G ∩ supp(µ) ≠ ∅, we have µ(G ∩ supp(µ)) > 0.
If µ is finite, then the support of µ is the intersection of all closed sets F with µ(F) = µ(Rn).
Given a measure µ on Rn, x ∈ Rn is an atom of µ if µ({x}) > 0, and µ is atomless (or non-atomic) if it admits no atoms. If there is an atom x ∈ Rn such that µ(Rn \ {x}) = 0, then µ is degenerate on x. By Lyapunov's theorem, the range of a vector of atomless measures is convex, i.e., given any m and any atomless measures µ1, . . . , µm, the set
\[
\{(\mu_1(Y), \ldots, \mu_m(Y)) \mid Y \text{ is measurable}\}
\]
is convex.14 Of course, if m = 1 and µ is a probability measure, then the range is the unit interval [0, 1]. We can actually extend the idea of measure to a vector measure, which is a mapping that takes measurable sets Y to vectors µ(Y) = (µ1(Y), . . . , µm(Y)) such that each µi is a measure, i = 1, . . . , m. It is atomless if each µi is atomless. Then Lyapunov's theorem states that the range of an atomless vector measure is convex.
14 Although Lyapunov's theorem holds for vectors of measures of any finite length, the result does not hold for infinite-dimensional (countably or otherwise) vectors of measures. For further details, see Example 1 of Sun (1997) Integration of Correspondences on Loeb Spaces, Transactions of the American Mathematical Society, 349: 129–153.

8 Differential Topology

We now briefly study the differentiable structure of sets in Rn, and for this it is necessary to consider in more detail functions f : U → Rm defined on an open set U ⊆ Rn. The values of such a function can be decomposed as f(x) = (f1(x), . . . , fm(x)), where each component function satisfies fi : U → R, i = 1, . . . , m. Then f : U → Rm is r-times continuously partially differentiable, or Cr, if each component function fi is Cr. In this case, the derivative of f at x ∈ U is an m × n matrix, the Jacobian matrix of f at x,
\[
Df(x) = \begin{pmatrix}
D_1 f_1(x) & D_2 f_1(x) & \cdots & D_n f_1(x) \\
D_1 f_2(x) & D_2 f_2(x) & \cdots & D_n f_2(x) \\
\vdots & \vdots & \ddots & \vdots \\
D_1 f_m(x) & D_2 f_m(x) & \cdots & D_n f_m(x)
\end{pmatrix},
\]
with rows corresponding to the gradients of the component functions. This should not be confused with the Hessian of a real-valued function. Alternatively, the notation Jf(x) could be used for the Jacobian matrix.
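For concreteness (a sketch of mine, not the author's), the Jacobian of a hypothetical map f : R2 → R2 can be approximated column by column with difference quotients:

```python
import numpy as np

def f(x):
    # Hypothetical C^1 map from R^2 to R^2.
    return np.array([x[0] * x[1], x[0] + np.exp(x[1])])

def jacobian(f, x, h=1e-6):
    n, m = len(x), len(f(x))
    J = np.zeros((m, n))
    for j in range(n):
        e_j = np.zeros(n); e_j[j] = 1.0
        J[:, j] = (f(x + h * e_j) - f(x)) / h   # jth column = D_j f(x)
    return J

x = np.array([1.0, 0.0])
print(jacobian(f, x))
# Exact Jacobian at (1, 0): [[0, 1], [1, 1]] (rows are gradients of f_1, f_2).
```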
A function f : U → Rm has two useful graphical representations that allow us to see its Jacobian matrix: one representation is useful when the domain has dimension equal to two (n = 2) and the other when the range has dimension equal to two (m = 2). In the first case, we can graph the level sets of each component function fi through x* ∈ U, with gradients orthogonal to the corresponding level sets. See the left panel in Figure 4, where we suppose f(x*) = (c1, c2). This contour map is the traditional approach to graphing a vector-valued mapping, and by plotting the gradients Dfi(x*), it allows us to see the rows of the Jacobian matrix.
When the domain has dimensionality greater than two, however, drawing level sets is problematic. In this case, as long as m = 2, we can plot, in the range of f, the trajectories of the function at a given x* ∈ U as we vary each coordinate xj, holding all other coordinates fixed. To be more precise, given x*, we write x*_{−j} = (x*_1, . . . , x*_{j−1}, x*_{j+1}, . . . , x*_n) for the vector in R^{n−1} obtained by deleting coordinate x*_j in x*; we let U_{x*_{−j}} = {xj ∈ R | (xj, x*_{−j}) ∈ U} be the section of U at x*_{−j}, and we define f_{x*_{−j}} : U_{x*_{−j}} → Rm by f_{x*_{−j}}(xj) = f(xj, x*_{−j}). That is, f_{x*_{−j}} gives the values of f as we vary the jth coordinate of x*, holding all other coordinates fixed. The image f_{x*_{−j}}(U_{x*_{−j}}) of this mapping is the trajectory of f along the jth coordinate given x*. The Jacobian matrix of f_{x*_{−j}} at x*_j is
\[
D_j f(x^*) = D f_{x^*_{-j}}(x^*_j) = \begin{pmatrix} D_j f_1(x^*) \\ D_j f_2(x^*) \\ \vdots \\ D_j f_m(x^*) \end{pmatrix},
\]
which is just the jth column of the Jacobian of f at x*. Geometrically, this vector will be tangent to the trajectory of f along the jth coordinate at x*, as in the right panel of Figure 4. As long as the range of f has dimension equal to two, we can graph the trajectories of f at a particular x* and plot the vectors Dj f(x*), which allows us to see the columns of the Jacobian matrix.
Figure 4: Level sets and trajectories
Note that the rows of Df(x*) in the left panel of Figure 4 and the columns of Df(x*) in the right panel of the figure are linearly independent, so the Jacobian of f at x* has full rank. This means that we can vary f(x) over an open set by small perturbations of x near x*. Letting r ≥ 0 and given open U ⊆ Rn, a Cr function f : U → Rm is a Cr diffeomorphism if it is injective, and the inverse f^{-1} : f(U) → Rn can be extended to a Cr function defined on an open set V, i.e., there exist an open set V ⊆ Rm and a Cr mapping g : V → Rn such that f(U) ⊆ V and g|_{f(U)} = f^{-1}. The mapping f : U → Rm is a Cr local diffeomorphism at x ∈ U if there is an open set V ⊆ U with x ∈ V such that the restriction f|_V : V → Rm is a Cr diffeomorphism.
Let r ≥ 1, let U ⊆ Rn be open, let f : U → Rm be Cr, and assume the Jacobian Df(x*) is non-singular at some x* ∈ U; in particular, this implies that Df(x*) is square, so m = n. By the inverse function theorem, the mapping f is a local diffeomorphism at x*. Moreover, the derivative of the inverse is the inverse of the derivative: Df^{-1}(f(x*)) = Df(x*)^{-1}. Thus, locally around x*, the function f behaves like a bijective mapping.
We can also examine the zeroes of a function as we vary its parameters. Let r ≥ 1, let U ⊆ Rn and P ⊆ Rm be open, and let f : U × P → Rn be Cr, where we view p ∈ P as a parameter of the function. Assume that the derivative with respect to x, denoted Dx f(x*, p*), is non-singular at some (x*, p*) ∈ U × P with f(x*, p*) = 0. By the implicit function theorem, there are open sets V ⊆ U and Q ⊆ P and a Cr function g : Q → V such that g(p*) = x*, and for all (x, p) ∈ V × Q, f(x, p) = 0 holds if and only if g(p) = x. Moreover, Dg(p*) = −Dx f(x*, p*)^{-1} Dp f(x*, p*). That is, as we vary p near p*, there is a unique solution (x1, . . . , xn) to the system of equations
\[
\begin{aligned}
f_1(x_1, \ldots, x_n, p) &= 0 \\
&\ \vdots \\
f_n(x_1, \ldots, x_n, p) &= 0
\end{aligned}
\]
near (x*_1, . . . , x*_n), and this solution varies in a Cr way as a function of the parameter p.
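A numerical sketch of the implicit function theorem (my own example, with hypothetical f(x, p) = x^3 + x − p and n = m = 1): the solution x = g(p) of f(x, p) = 0 varies smoothly in p, and the slope Dg(p*) = −Dx f(x*, p*)^{-1} Dp f(x*, p*) can be checked against a difference quotient.

```python
def f(x, p):
    return x**3 + x - p          # hypothetical f with D_x f = 3x^2 + 1 > 0

def solve_x(p, x0=0.0, iters=50):
    # Newton's method in x for fixed parameter p.
    x = x0
    for _ in range(iters):
        x -= f(x, p) / (3 * x**2 + 1)
    return x

p_star = 2.0
x_star = solve_x(p_star)                                 # x* = 1
implicit_slope = -1.0 / (3 * x_star**2 + 1) * (-1.0)     # -Dx f^{-1} Dp f = 0.25
h = 1e-6
finite_diff = (solve_x(p_star + h) - solve_x(p_star - h)) / (2 * h)
print(implicit_slope, finite_diff)   # the two agree to high accuracy
```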
Two sets, say N ⊆ Rn and M ⊆ Rm, are Cr-diffeomorphic to one another if there exists an open set U ⊆ Rn with N ⊆ U and a Cr diffeomorphism f : U → Rm such that f(N) = M. In this case, the sets are essentially one and the same, from a differentiable point of view.
A set M ⊆ Rm is an n-dimensional Cr manifold if it is locally diffeomorphic to an open subset of Rn; that is, for all x ∈ M, there is an open subset U ⊆ Rn with 0 ∈ U and a Cr diffeomorphism f : U → Rm such that f(0) = x and f(U) is an open subset of M in the relative topology, i.e., there is an open set G ⊆ Rm such that f(U) = M ∩ G. Thus, at every x ∈ M, the manifold is locally equivalent to n-dimensional Euclidean space. And M is an n-dimensional Cr manifold with boundary if for all x ∈ M, there is an open subset U ⊆ Rn with 0 ∈ U and a Cr diffeomorphism f : U → Rm such that f(0) = x and the image f(U ∩ (R^{n−1} × R+)) is an open subset of M in the relative topology. Given x ∈ M and a diffeomorphism f in the definition of Cr manifold, we can view the matrix Df(0) as a linear mapping from Rn to Rm, so range(Df(0)) is a linear subspace of dimension n in Rm, and in fact it is independent of the particular choice of the diffeomorphism f. This set, denoted Tx M, is the tangent space to M at x. See Figure 5 for a one-dimensional manifold in R2, where the image f(U) is the thick section of M.
For other examples, given any x ∈ Rn and r > 0, the open ball Br(x) is an n-dimensional C∞ manifold, the disc Dr(x) is an n-dimensional C∞ manifold with boundary, and the sphere Sr(x) is an (n − 1)-dimensional C∞ manifold. In fact, every open subset of Rn is an n-dimensional C∞ manifold. In R3, the projected ball of radius r > 0 around zero, {x ∈ R3 | ||x|| < r and x3 = 0}, is a two-dimensional C∞ manifold. A subset Y ⊆ Rn is a zero-dimensional manifold if and only if its elements are isolated, in the sense that for all x ∈ Y, there exists ε > 0 such that B_ε(x) ∩ Y = {x}. An implication is that if Y is a zero-dimensional manifold and is a compact subset of Rn, then it is finite. By convention, the empty set is a manifold of all dimensions.
Figure 5: One-dimensional manifold
Given an open set U ⊆ Rn and a Cr function f : U → Rm with r ≥ 1, say y ∈ Rm is a regular value of f if for all x ∈ U with f(x) = y, the Jacobian Df(x) at x has full row rank. Otherwise, if there exists x ∈ U such that f(x) = y and Df(x) has row rank less than m, then y is a critical value. Note that if y is not in the range f(U) of f, then it is by convention a regular value. Say x ∈ U is a regular point if Df(x) has full row rank, and say x ∈ U is a critical point if Df(x) does not have full row rank. That is, y is a regular value if every element of f^{-1}({y}) is a regular point, and it is a critical value if some element of f^{-1}({y}) is a critical point. By the preimage theorem, if y is a regular value of f, then f^{-1}({y}) is a Cr manifold of dimension n − m. In particular, if zero is a regular value of f, then given a solution x = (x1, . . . , xn) ∈ U to the system of equations
\[
\begin{aligned}
f_1(x_1, \ldots, x_n) &= 0 \\
&\ \vdots \\
f_m(x_1, \ldots, x_n) &= 0,
\end{aligned}
\]
the set of solutions near x has a manifold structure, and each equation reduces the dimensionality of the solutions near x by one. This justifies pictures, as in Figure 2, that depict the level sets of differentiable functions f : R2 → R as one-dimensional, smooth curves: by the preimage theorem, the level set of f at a value c will have these properties as long as the gradient of f is non-zero everywhere on the level set. To see that the result holds only for regular values, define f : R2 → R by f(x, y) = x^2 − y^2, and note that the level set of f at zero is not smooth.
Again, consider an open set U ⊆ Rn and a Cr function f : U → Rm. By Sard's theorem, if r > max{n − m, 0}, then the set of critical values of f is measurable and has Lebesgue measure zero. An implication is that the set of regular values of f is dense in Rm, i.e., for all y ∈ Rm and all ε > 0, the open ball B_ε(y) contains at least one regular value. Furthermore, given any function f : U → Rm, almost all additive perturbations, e.g., f(x) + y with y ∈ Rm, will have the property that zero is a regular value: note that zero is a regular value of f(x) + y if and only if −y is a regular value of f.
Given a nonempty, open, convex set U ⊆ Rn, let f : U × R → Rn be a Cr function with values f(x, t), where t ∈ [0, 1] is interpreted as a parameter. Define g : U × R → Rn by g(x, t) = x − f(x, t). With Sard's theorem, which implies that zero is a regular value for almost all additive perturbations of a given function, it follows that for almost every additive perturbation of f (and therefore of g), zero is a regular value; and furthermore, for almost every additive perturbation of f, zero is also a regular value of the function g0 : U → Rn defined by g0(x) = g(x, 0). So assume zero is a regular value of g and g0. By the preimage theorem, the set of solutions to g(x, t) = 0, i.e., g^{-1}({0}) = {(x, t) ∈ U × R | g(x, t) = 0}, is a Cr manifold of dimension (n + 1) − n = 1 in R^{n+1}, so it is a union, essentially, of loops and arcs. Assuming there is a compact set K ⊆ U such that f(U × [0, 1]) ⊆ K, Browder's theorem implies that for all t ∈ [0, 1], there exists x ∈ K such that f(x, t) = x; and furthermore, there exist a connected set C ⊆ K × [0, 1] and x0, x1 ∈ K such that (x0, 0), (x1, 1) ∈ C and for all (x, t) ∈ C, f(x, t) = x, i.e., C ⊆ g^{-1}({0}). Thus, (x, t) ∈ C means that x is a fixed point of the function f(·, t) parameterized by t. As a connected subset of a one-dimensional manifold, C is diffeomorphic to the open interval (0, 1) or to the unit sphere S1(0). In the latter case, however, Dg0(x0) does not have full rank, contradicting the assumption that zero is a regular value of g0. Thus, there is a connected, smooth path of fixed points from the fixed point x0, for parameter value t = 0, to the fixed point x1, for parameter value t = 1.15 This observation underlies homotopy methods for computation of solutions to systems of equations. See Figure 6.
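The homotopy idea can be sketched numerically (my own toy example, not the author's): starting from a fixed point of f(·, 0), follow the path of fixed points by solving x = f(x, t) as t is stepped from 0 to 1, warm-starting each solve at the previous solution.

```python
import numpy as np

def f(x, t):
    # Hypothetical parameterized map on R with a unique fixed point for each t.
    return 0.5 * np.cos(x) + t

def follow_fixed_points(steps=10):
    x = 0.5 * np.cos(0.0)            # rough starting guess at t = 0
    path = []
    for t in np.linspace(0.0, 1.0, steps + 1):
        for _ in range(100):         # damped fixed-point iteration at this t
            x = 0.5 * (x + f(x, t))
        path.append((t, x))
    return path

for t, x in follow_fixed_points():
    print(f"t = {t:.1f}, fixed point x = {x:.6f}, residual = {x - f(x, t):+.1e}")
```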
Figure 6: Smooth path of solutions
15 See Theorem 2 of Mas-Colell (1974) A Note on a Theorem of F. Browder, Mathematical Programming, 6: 229–233.
Next, consider open sets U ⊆ Rn and P ⊆ Rk, and let f : U × P → Rm be Cr. Assume zero is a regular value of f, i.e., for all (x, p) ∈ U × P with f(x, p) = 0, the Jacobian matrix Df(x, p) has full row rank. By the transversality theorem, if r > n, then for almost all p ∈ P, the set
\[
f(\cdot, p)^{-1}(\{0\}) = \{x \in U \mid f(x, p) = 0\}
\]
is a Cr manifold of dimension n − m. Intuitively, if given any (x*, p*) satisfying f(x*, p*) = 0, we can obtain values of f in an open set around zero by arbitrarily small variations of (x, p), so that Df(x, p) has full row rank, then even if the set of solutions (x1, . . . , xn) to the equations
\[
\begin{aligned}
f_1(x_1, \ldots, x_n, p) &= 0 \\
&\ \vdots \\
f_m(x_1, \ldots, x_n, p) &= 0
\end{aligned}
\]
has poor structure, almost all perturbations p of the system will produce a set of solutions with a manifold structure.
It is noteworthy that the full rank condition in the transversality theorem allows variation of the parameter p as well as x; thus, if the parameterization of f is sufficiently rich, the condition should hold. An application of interest is the case in which m > n. Then, under the assumptions of the theorem, for almost all p ∈ P, the set of solutions to f(x, p) = 0 has dimension n − m < 0, which means that the set is empty: there are no solutions to the system. This is depicted in Figure 7 for the case n = k = 1 and m = 2. Here, we write Dx f(x*, p*) for the first column of the Jacobian of f (corresponding to the variable x) and Dp f(x*, p*) for the second column (corresponding to p), and clearly these columns have rank two, so the Jacobian of f has full row rank at (x*, p*). Assuming the rank condition does not fail at any other solutions to f(x, p) = 0, zero is a regular value of f, and the transversality theorem applies. Although x* does solve the system of equations given parameter p*, an increase of p* to, say, p**, shifts the trajectory of f given p* (the solid line) to the trajectory given p** (the dashed line), and there is no solution.
Figure 7: Transversality theorem
For a very simple application of the transversality theorem, let U ⊆ Rn be open with n ≥ 2, let f : U → R be a C^{n+1} function, and assume that the gradient of f is non-zero on the level set of f at value c. To prove that this level set has Lebesgue measure zero, define the mapping g : R × U → R2 by
\[
g(z, x) = \begin{pmatrix} f(x) - c \\ z \end{pmatrix}.
\]
The Jacobian matrix of g is
\[
Dg(z, x) = \begin{pmatrix} 0 & Df(x) \\ 1 & 0 \end{pmatrix},
\]
which has full row rank at all (z, x) such that g(z, x) = 0. By the transversality theorem, for almost all x ∈ U, the set {z ∈ R | g(z, x) = 0} has dimension no greater than 1 − 2 = −1 < 0, i.e., for almost all x, the set {z ∈ R | g(z, x) = 0} is empty, which implies f(x) ≠ c. Thus, the level set of f at value c has Lebesgue measure zero.
Finally, consider X ⊆ Rn and open P ⊆ Rk, and let f : X × P → R be continuous. Given x ∈ X, let fx : P → R be the mapping defined by fx(p) = f(x, p), and similarly given p ∈ P, define fp : X → R by fp(x) = f(x, p). Assume that for all x ∈ X, fx is C1, and moreover that the partial derivatives Dj fx(p) are continuous in (x, p) for each j = 1, . . . , k. Furthermore, assume that for all distinct x, x′ ∈ X and all p ∈ P with f(x, p) = f(x′, p), we have Dfx(p) ≠ Dfx′(p). Then for almost all p ∈ P, the function fp(x) has at most one maximizer. Note that if g : X → R is any continuous function, and if f is a linear perturbation of g, i.e., k = n and f(x, p) = g(x) + p · x, then the assumptions of the latter result hold: if x ≠ x′, then Dfx(p) = x ≠ x′ = Dfx′(p). Thus, we can perturb any continuous function so that it has at most one maximizer.

9 Measurable Functions

Given a measurable set X ⊆ Rn, a function f : X → R is measurable if for all c ∈ R, {x ∈ X | f(x) ≤ c} is measurable. The following are equivalent definitions:
{x ∈ X | f(x) > c} is measurable for all c ∈ R,
{x ∈ X | f(x) ≥ c} is measurable for all c ∈ R,
{x ∈ X | f(x) < c} is measurable for all c ∈ R,
f^{-1}(G) is measurable for all open G ⊆ R.
Implications are that level sets of a measurable function are measurable, and every continuous function with measurable domain is measurable.16 A measurable function is sometimes referred to as a random variable.
16 In some contexts, a measurable function may be intended to mean Borel measurable, i.e., for all open G ⊆ R, f^{-1}(G) is a Borel set. In other contexts, an abstract σ-algebra may be given, and measurability may be defined with respect to it. These more abstract notions of measurability are for the most part outside the scope of this survey.
17 The composition of measurable functions in our framework is not guaranteed to be measurable, because some Lebesgue measurable sets are not Borel measurable.
Measurable functions can be combined in a number of ways to produce new measurable functions. Let f, g : X → R be measurable, let {fm} be a sequence of measurable functions fm : X → R, m = 1, 2, . . ., and let α, β ∈ R. Then:17
1. αf + βg is measurable,
2. f g is measurable,
3. max{f(x), g(x)} is measurable as a function of x,
4. if h : X → R satisfies fm(x) → h(x) for all x ∈ X, then h is measurable.
In words, property 4 means that pointwise limits of measurable functions are measurable. Note that, in contrast, continuity is not preserved by pointwise limits: recall fm : [0, 1] → R defined by fm(x) = max{1 − mx, 0}, which converges pointwise to a discontinuous function. The conclusion of property 4 continues to hold if h is the pointwise limit of {fm} almost everywhere, i.e., if fm(x) → h(x) for almost all x, then h is measurable. Furthermore, this implies that if {fm(x)} is increasing and bounded outside a measurable set Z ⊆ Rn with measure zero, then the function h : X → R defined by h(x) = sup_m fm(x) for x ∈ X \ Z, and h(x) = 0 otherwise, is measurable.
A measurable function, in combination with a measure on its domain, induces a distribution on its range. Given a measure µ on Rn, a measurable set X ⊆ Rn, and a measurable function f : X → R, we avoid technical measurability issues by assuming that for every measurable set Y ⊆ R with Lebesgue measure zero, the set f^{-1}(Y) = {x ∈ X | f(x) ∈ Y} is measurable and has µ-measure zero. Then we can define a measure µf on R as follows: for every measurable Y ⊆ R, µf(Y) = µ(f^{-1}(Y)).18 This is the distribution of f. The distribution of f generated by µ is sometimes denoted µf^{-1}. Reversing the direction of the analysis, letting ν be a finite measure on R, we can ask whether ν is the distribution of some random variable. In fact, we can generate any given finite measure using Lebesgue measure on the unit interval: by Skorohod's theorem, for every finite measure ν on R, there exists a measurable function f : [0, 1] → R such that ν = λf^{-1}, where λ is Lebesgue measure restricted to [0, 1].
18 By construction, the distribution µf = µf^{-1} is a Borel measure, but the technical issue is whether the Lebesgue measurable sets are included among the µf-measurable sets. We simplify matters by imposing the absolute continuity condition. By Theorem 10.23 (part 7) of Aliprantis and Border (2006), every Lebesgue measurable set Y ⊆ Rn is the disjoint union of a Borel set B and a Lebesgue measure zero set C. By Theorem 10.23 (part 6), there is a Borel set C′ ⊇ C with Lebesgue measure zero. Then C′ is µf-measurable with µf(C′) = 0, and it follows that C is µf-measurable, and therefore so is Y = B ∪ C.
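Skorohod's theorem is the abstract version of the familiar inverse-CDF construction. A hedged sketch of mine: for a probability measure ν on R with CDF F, the quantile function f(u) = inf{x | F(x) ≥ u} defined on [0, 1] satisfies ν = λf^{-1}. Numerically, for a hypothetical discrete target measure:

```python
import numpy as np

# Target probability measure nu on {0, 1, 2} with weights 0.2, 0.5, 0.3.
support = np.array([0.0, 1.0, 2.0])
weights = np.array([0.2, 0.5, 0.3])
cdf = np.cumsum(weights)

def quantile(u):
    # f(u) = inf{x : F(x) >= u}; it pushes Lebesgue measure on [0,1] forward to nu.
    return support[np.searchsorted(cdf, u)]

rng = np.random.default_rng(0)
u = rng.uniform(size=100_000)                      # draws from lambda on [0, 1]
samples = quantile(u)
print([np.mean(samples == s) for s in support])    # approx [0.2, 0.5, 0.3]
```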
Measurable functions are almost continuous, in the sense that discontinuities of a measurable function can be confined to sets of arbitrarily small measure. In fact, we can approximate any sequence of functions in this way. Let X ⊆ Rn be measurable, and let {fm} be a sequence of measurable functions fm : X → R. By Lusin's theorem, if µ is finite, then for all ε > 0, there is a compact subset K ⊆ X such that µ(X \ K) < ε and the restriction fm|K to K of each fm is continuous, m = 1, 2, . . ., with the relative topology on K. As a digression, given any closed subset F ⊆ Rn and any continuous function f : F → R, an implication of the Tietze extension theorem is that f can be extended to a continuous function on Rn, i.e., there exists continuous g : Rn → R such that g|F = f.
Given measurable X ⊆ Rn and a non-negative, measurable function f : X → R, the Lebesgue integral (or simply integral) of f, denoted ∫ f(x)dx, is
\[
\lim_{m \to \infty} \sum_{k=1}^{m^2} \frac{k-1}{m}\, \lambda\!\left(\left\{ x \in X \,\middle|\, \frac{k-1}{m} \leq f(x) < \frac{k}{m} \right\}\right),
\]
which may be infinite. See Figure 8. If ∫ f(x)dx < ∞, then f is Lebesgue integrable (or simply integrable). For general measurable f (taking possibly negative values), define
\[
f^+(x) = \max\{f(x), 0\} \quad \text{and} \quad f^-(x) = -\min\{f(x), 0\}.
\]
If either f^+ or f^- is integrable, then the integral of f is
\[
\int f(x)dx = \int f^+(x)dx - \int f^-(x)dx.
\]
If both f^+ and f^- are integrable, then −∞ < ∫ f(x)dx < ∞, and f is Lebesgue integrable (or simply integrable). If the integral of f is well-defined and Y ⊆ Rn is measurable, then ∫_Y f(x)dx is written for ∫ f(x) I_{X∩Y}(x)dx, where recall that I_{X∩Y} : X → R is the indicator function of the set X ∩ Y.
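The definition can be mimicked numerically (my own sketch): estimate λ of each level set on a fine grid and form the layered sum, here for the hypothetical choice f(x) = x^2 on X = [0, 1], whose integral is 1/3.

```python
import numpy as np

def lebesgue_sum(f, grid, m):
    """Layered-sum approximation of the integral of a non-negative f:
    sum over k of ((k-1)/m) * lambda({x : (k-1)/m <= f(x) < k/m}),
    with each level set's measure estimated by counting grid points."""
    dx = grid[1] - grid[0]
    values = f(grid)
    total = 0.0
    for k in range(1, m * m + 1):
        level_set = (values >= (k - 1) / m) & (values < k / m)
        total += ((k - 1) / m) * level_set.sum() * dx
    return total

grid = np.linspace(0.0, 1.0, 20_001)
for m in (2, 4, 8, 16, 32):
    print(m, round(lebesgue_sum(lambda x: x**2, grid, m), 4))
# The sums increase toward the true value 1/3 as m grows.
```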
Figure 8: Lebesgue integral
Every Riemann integrable function f : [a, b] → R is Lebesgue integrable, and for such functions the integrals coincide, i.e., ∫ f(x)dx = ∫_a^b f(x)dx. Since continuous functions are Riemann integrable, the fundamental theorem of calculus implies that if f : [a, b] → R is continuous, then the mapping F : [a, b] → R defined by F(x) = ∫_a^x f(t)dt is C1, and DF(x) = f(x) for all x ∈ [a, b]; conversely, if G : [a, b] → R is C1, then for all x ∈ [a, b], we have G(x) − G(a) = ∫_a^x DG(t)dt.

For a general measure µ on Rn, measurable X ⊆ Rn, and measurable function f : X → R with non-negative values, the integral of f with respect to µ, denoted ∫ f(x)µ(dx), is
\[
\lim_{m \to \infty} \sum_{k=1}^{m^2} \frac{k-1}{m}\, \mu\!\left(\left\{ x \in X \,\middle|\, \frac{k-1}{m} \leq f(x) < \frac{k}{m} \right\}\right),
\]
which may be infinite. If ∫ f(x)µ(dx) < ∞, then f is integrable with respect to µ (or µ-integrable). For general measurable f : X → R, if either f^+ or f^- is integrable with respect to µ, then the integral of f with respect to µ is
\[
\int f(x)\mu(dx) = \int f^+(x)\mu(dx) - \int f^-(x)\mu(dx).
\]
If both f^+ and f^- are µ-integrable, then f is µ-integrable. Say f is essentially bounded with respect to µ (or µ-bounded) if there exists c > 0 such that µ({x ∈ X | |f(x)| > c}) = 0. If µ is finite and f is µ-bounded, then f is µ-integrable. As before, we write ∫_Y f(x)µ(dx) for ∫ f(x) I_{X∩Y}(x)µ(dx). If µ is a probability measure, then the integral ∫ f(x)µ(dx) is called the expected value of f, and the vector of integrals (∫ x1 µ(dx), . . . , ∫ xn µ(dx)) is the expected value (or mean) of x.
Let µ be a measure on Rn, let X ⊆ Rn be measurable, and let f, g : X → R be measurable. Then:
1. if f is non-negative, then ∫ f(x)µ(dx) = 0 if and only if f(x) = 0 for µ-almost all x,
2. if either f or g is µ-integrable, then ∫ [f(x) + g(x)]µ(dx) = ∫ f(x)µ(dx) + ∫ g(x)µ(dx),
3. for all α ∈ R, ∫ αf(x)µ(dx) = α ∫ f(x)µ(dx),
4. if f(x) ≤ g(x) for µ-almost all x ∈ X, then ∫ f(x)µ(dx) ≤ ∫ g(x)µ(dx).
Note that property 2 implies that if A and B are disjoint measurable sets, then
\[
\int_{A \cup B} f(x)\mu(dx) = \int_{A} f(x)\mu(dx) + \int_{B} f(x)\mu(dx).
\]
And properties 3 and 4 imply that |∫ f(x)µ(dx)| ≤ ∫ |f(x)|µ(dx).

Let {fm} be a sequence of µ-integrable functions fm : X → R, m = 1, 2, . . ., and suppose the functions are dominated by a µ-integrable function g : X → R, in the sense that g(x) ≥ |fm(x)| for all m and µ-almost all x ∈ X. Then by Fatou's lemma, we have
\[
\int \limsup_{m \to \infty} f_m(x)\,\mu(dx) \;\geq\; \limsup_{m \to \infty} \int f_m(x)\,\mu(dx),
\]
and ignoring the subset of the domain on which it takes infinite values, the integrand on the left-hand side, lim sup_m fm(x), is a µ-integrable function of x. More precisely, if we define f(x) = 0 when lim sup_m fm(x) = ∞ and f(x) = lim sup_m fm(x) otherwise, the function f is integrable. The assumption that {fm} is dominated by integrable g can be weakened somewhat: it suffices to assume that each fm is dominated by a µ-integrable gm : X → R, and that there is a µ-integrable function g : X → R satisfying gm(x) → g(x) for µ-almost all x ∈ X and ∫ gm(x)µ(dx) → ∫ g(x)µ(dx).

An easy consequence of Fatou's lemma is the following. Let {fm} be a sequence of µ-integrable functions fm : X → R, m = 1, 2, . . .. Assume that for µ-almost all x ∈ X, the sequence {fm(x)} is increasing, and assume that lim_m ∫ fm(x)µ(dx) < ∞. Then by Levi's monotone convergence theorem, there exists a µ-integrable function f : X → R such that fm(x) → f(x) for µ-almost all x ∈ X and ∫ fm(x)µ(dx) → ∫ f(x)µ(dx).

Again let {fm} be a sequence of µ-integrable functions fm : X → R, and suppose that they are dominated by a µ-integrable function g : X → R. If f : X → R satisfies fm(x) → f(x) for µ-almost all x ∈ X, then Lebesgue's dominated convergence theorem states that f is µ-integrable and
\[
\int f(x)\mu(dx) = \lim_{m \to \infty} \int f_m(x)\mu(dx).
\]
Thus, integrals are (modulo domination by a µ-integrable function) continuous with respect to pointwise limits of µ-integrable functions.

This result has useful implications for integrals of parameterized functions. Let X ⊆ Rn be measurable, let P ⊆ Rk, and consider any bounded function f : X × P → R, where we view p ∈ P as a parameter. Assume that for all x ∈ X, the function fx : P → R defined by fx(p) = f(x, p) is continuous; and assume that for all p ∈ P, the function fp : X → R defined by fp(x) = f(x, p) is measurable. Suppose further that there is a µ-integrable function g : X → R such that for µ-almost all x and for all p, |f(x, p)| ≤ g(x). Let {pm} be a sequence of parameters converging to p in P. Then the parameterized integrals converge: ∫ f(x, pm)µ(dx) → ∫ f(x, p)µ(dx). In other words, the integral ∫ f(x, p)µ(dx) is a continuous function of the parameter p. While these limiting results are stated for a fixed measure µ, in Section 10 we allow the measures to vary in a continuous way along the sequence.
Given a measure µ on Rn, a density function for µ is any measurable function f : Rn → R with non-negative values such that for all measurable Y ⊆ Rn, we have ∫_Y f(x)dx = µ(Y). If f is a density function and µ is a probability measure, so ∫ f(x)dx = 1, then it is a probability density function for µ, but sometimes the term density function is used to refer to a probability density function. A measure µ on Rn is absolutely continuous (with respect to Lebesgue measure) if for all measurable Y, λ(Y) = 0 implies µ(Y) = 0. For example, any measure defined by a density function is absolutely continuous with respect to Lebesgue measure; for a counterexample, the counting measure is not absolutely continuous. The Radon-Nikodym theorem states that if a measure µ on Rn is finite and absolutely continuous with respect to Lebesgue measure, then it has a density that is unique up to sets of measure zero, i.e., if f and g are both densities for µ, then f(x) = g(x) for almost all x.
More generally, given two measures µ and ν on Rn, say ν is absolutely continuous with respect to µ, written ν ≪ µ, if for all measurable Y ⊆ Rn, µ(Y) = 0 implies ν(Y) = 0. The Radon-Nikodym theorem establishes that if ν is finite and µ is σ-finite, then there is a measurable function f : Rn → R with non-negative values such that for all measurable Y, we have ν(Y) = ∫_Y f(x)µ(dx), where f is the density of ν with respect to µ. Conversely, starting with an arbitrary measurable mapping f : Rn → R with non-negative values and any measure µ on Rn, we can define a mapping ν from measurable subsets Y ⊆ Rn to the extended real numbers (possibly including infinity) by ν(Y) = ∫_Y f(x)µ(dx). Then ν, so defined, is a measure on Rn; it is absolutely continuous with respect to µ; and f is a density for ν with respect to µ.
Given a probability measure, the expected value of a concave function f is less than or equal to the value of the function at the mean of x. To be more precise, let X ⊆ R be measurable and convex, let µ be a probability measure on R with µ(X) = 1, and let f : X → R be concave; then f is measurable (in fact, it is directionally differentiable at µ-almost all x ∈ int(X)), and Jensen's inequality states that
\[
f\!\left(\int x\,\mu(dx)\right) \;\geq\; \int f(x)\,\mu(dx).
\]
Moreover, strict inequality holds if f is strictly concave and µ is not degenerate on some x. More generally, if X ⊆ Rn is measurable and convex, if µ is a probability measure on Rn with µ(X) = 1, and if f : X → R is concave, then f is measurable, and
\[
f\!\left(\int x_1\,\mu(dx), \ldots, \int x_n\,\mu(dx)\right) \;\geq\; \int f(x)\,\mu(dx),
\]
again with strict inequality if f is strictly concave and µ is not degenerate on some x.
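A quick numerical illustration of Jensen's inequality (mine, using a hypothetical discrete probability measure and the strictly concave function f(x) = log(1 + x)):

```python
import numpy as np

x = np.array([0.0, 1.0, 4.0])        # support of a discrete probability measure
p = np.array([0.2, 0.5, 0.3])        # its weights
f = lambda t: np.log(1.0 + t)        # strictly concave on [0, infinity)

mean_x = p @ x
print(f(mean_x))                     # f evaluated at the mean
print(p @ f(x))                      # expected value of f: strictly smaller here
```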
We can define measurability and integration for vector-valued functions as well. Given measurable X ⊆ Rn, a mapping f : X → Rm with values f(x) = (f1(x), . . . , fm(x)) is measurable if each component mapping fi : X → R is measurable, i = 1, . . . , m; equivalently, f is measurable if for all open G ⊆ Rm, the inverse image f^{-1}(G) is measurable. For a measure µ on Rn, the integral of the vector-valued function is just the vector of integrals of its components,
\[
\int f(x)\mu(dx) = \left( \int f_1(x)\mu(dx), \ldots, \int f_m(x)\mu(dx) \right),
\]
and f is µ-integrable if each component fi is µ-integrable, i = 1, . . . , m. When µ is a probability measure, the expected value of f is just ∫ f(x)µ(dx), permitting a more compact statement of Jensen's inequality: given measurable and convex X ⊆ Rn and f : X → R concave, we have f(∫ x µ(dx)) ≥ ∫ f(x)µ(dx).
A set K of measurable functions f : X → Rm is uniformly µ-integrable if for all ε > 0, there exists δ > 0 such that for all f ∈ K and all measurable Y ⊆ X with µ(Y) < δ, we have ∫_Y ||f(x)||µ(dx) < ε. In other words, for every sequence {Yk} of measurable subsets with µ(Yk) → 0, we have
\[
\sup\left\{ \int_{Y_k} ||f(x)||\,\mu(dx) \,\middle|\, f \in K \right\} \to 0
\]
as k → ∞. I confirm in the appendix that, as one would expect, if µ is finite and f : X → R is µ-integrable, then the singleton set {f} is uniformly µ-integrable. As well, if the set K of functions is dominated by a µ-integrable function g : X → R, in the sense that g(x) ≥ ||f(x)|| for all f ∈ K and µ-almost all x ∈ X, then K is uniformly µ-integrable.
We can state a generalized form of Fatou's lemma in multiple dimensions. Let {f^k} be a sequence of µ-integrable functions f^k : X → Rm, assume that the functions are dominated by a µ-integrable function g : X → R, and assume that {∫ f^k(x)µ(dx)} is a convergent sequence. Then there is a µ-integrable function f : X → Rm such that:19
(i) for µ-almost all x ∈ X, f(x) ∈ ls({f^k(x)}),
(ii) ∫ f(x)µ(dx) = lim_k ∫ f^k(x)µ(dx).
Thus, we can select from the pointwise accumulation points of {f^k(x)} for µ-almost all x in a way that preserves the limit of the integrals {∫ f^k(x)µ(dx)}.
19 See Lemma 3, p. 69, of Hildenbrand (1974).

10 Convergence of Measures

We can consider a sequence {µm} of measures on Rn and, as with functions, define several compelling notions of convergence to another measure µ on Rn.20 The sequence {µm} converges to µ . . .
uniformly set-wise if for every sequence {Ym} of measurable sets of Rn, we have |µm(Ym) − µ(Ym)| → 0,
set-wise if for every measurable set Y ⊆ Rn, we have µm(Y) → µ(Y),
weakly if for every measurable set Y ⊆ Rn with µ(bd(Y)) = 0, we have µm(Y) → µ(Y).
Uniform set-wise convergence is usually called convergence in total variation norm (or sometimes strong convergence). Set-wise convergence is sometimes (at the risk of confusion) called strong convergence. Sometimes weak convergence is referred to as weak* convergence.
These convergence concepts have equivalent formulations. First, µm → µ uniformly set-wise if and only if for all sequences {fm} of measurable functions fm : Rn → R with sup{|fm(x)| | x ∈ Rn} ≤ 1 for all m, we have |∫ fm(x)µm(dx) − ∫ fm(x)µ(dx)| → 0. Second, µm → µ set-wise if and only if for all bounded, measurable functions f : Rn → R, we have ∫ f(x)µm(dx) → ∫ f(x)µ(dx). And third, µm → µ weakly if and only if any of the following conditions hold:
(i) ∫ f dµm → ∫ f dµ for all bounded, continuous functions f : Rn → R,
(ii) lim sup_m µm(F) ≤ µ(F) for each closed F ⊆ Rn,
(iii) lim inf_m µm(G) ≥ µ(G) for each open G ⊆ Rn.
In fact, weak convergence is often defined using condition (i).
20 These notions of convergence are often defined on the σ-algebra of Borel sets. Because the Lebesgue measurable sets are the completion of the Borel sets, Theorem 10.23 (part 6) of Aliprantis and Border (2006) implies that the difference is immaterial. Indeed, given Lebesgue measurable Y: there exists Borel measurable B0 such that Y ⊆ B0 and µ(Y) = µ(B0); and for each m, there exists Borel measurable Bm such that Y ⊆ Bm and µm(Y) = µm(Bm). Then B = ⋂_{m=0}^∞ Bm is Borel measurable, and we have µ(Y) = µ(B) and, for all m, µm(Y) = µm(B). Thus, uniform set-wise and set-wise convergence defined for Borel sets imply the conditions defined here. Weak convergence can be formulated in topological terms (see conditions (i)–(iii) following the definition), so a sequence of measures defined on the Borel σ-algebra converges weakly if and only if the completions of those measures converge weakly.
The above convergence notions are listed in decreasing strength: uniform set-wise convergence implies set-wise convergence, which implies weak convergence. To see that weak convergence does not generally imply set-wise convergence, let n = 1 and let µm be the unit mass on 1/m, i.e., µm({1/m}) = 1 and µm(R \ {1/m}) = 0. Then {µm} converges weakly to the measure µ defined as the unit mass on {0}, but µm({0}) = 0 for all m, so the sequence does not converge set-wise to µ. To see that set-wise convergence does not generally imply uniform set-wise convergence, let f1 = 2 I_{[0,1/2]} be two times the indicator function of [0, 1/2], let f2 be two times the indicator function of [0, 1/4] ∪ [2/4, 3/4], let f3 be two times the indicator function of
[0, 1/8] ∪ [2/8, 3/8] ∪ [4/8, 5/8] ∪ [6/8, 7/8],
and so on. In general, given m, define intervals I^1_m, . . . , I^{2^m}_m so that I^k_m = [(k − 1)/2^m, k/2^m], and let fm : [0, 1] → R be two times the indicator function of Jm = ⋃{I^k_m | k = 1, . . . , 2^m − 1, k odd}. Note that the Lebesgue measure of each Jm is one half. As depicted in Figure 9, these functions have an increasingly jagged, sawtooth appearance (I will refer to this useful example several times in the sequel). Defining the measure µm by µm(Y) = ∫_Y fm(x)dx for all measurable Y ⊆ R, it can be shown that the sequence {µm} converges set-wise to the uniform distribution, i.e., the measure µ given by the density f = I_{[0,1]}. But then
\[
\mu_m(J_m) - \mu(J_m) = \int_{J_m} [f_m(x) - f(x)]\,dx = \int_{J_m} [2 - 1]\,dx = \frac{1}{2}
\]
for all m, so the sequence does not converge uniformly set-wise.
Figure 9: Sawtooth sequence
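The sawtooth computation can be verified directly (a sketch of mine): for each m, the density fm equals 2 on Jm and 0 elsewhere, so µm(Jm) = 1 while the uniform limit gives µ(Jm) = 1/2.

```python
import numpy as np

def J_m_indicator(x, m):
    # J_m is the union of the odd-indexed dyadic intervals of length 2^(-m).
    k = np.floor(x * 2**m).astype(int)      # which dyadic interval x falls in
    return (k % 2 == 0)                     # intervals I^1, I^3, ... (k = 0, 2, ...)

grid = np.linspace(0.0, 1.0, 1_000_001)[:-1]
dx = 1.0 / len(grid)
for m in (1, 2, 5, 10):
    ind = J_m_indicator(grid, m)
    mu_m = np.sum(2.0 * ind) * dx           # integral of f_m over J_m
    mu = np.sum(1.0 * ind) * dx             # uniform measure of J_m
    print(m, round(mu_m, 3), round(mu, 3))  # about 1.0 versus 0.5 for every m
```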


Extending the original definition for a single finite measure, we say a set M of probability measures on Rn is tight if for all ε > 0, there is a compact set K such that for all µ ∈ M, we have µ(K) > 1 − ε. If the sequence {µm} of probability measures converges weakly to µ, then the set {µm | m = 1, 2, . . .} ∪ {µ} is tight.21
A generalized form of Fatou's lemma allows the integrating measure to vary in a set-wise continuous way: let {µm} be a sequence of measures on Rn that converge set-wise to the measure µ on Rn, let X ⊆ Rn be measurable, let {fm} and {gm} be sequences of measurable functions fm, gm : X → R, and let f, g : X → R be measurable functions such that fm is dominated by gm, i.e., for all m and µ-almost all x, |fm(x)| ≤ gm(x), and such that fm(x) → f(x) and gm(x) → g(x) for µ-almost all x ∈ X. Then:22
\[
\int g_m(x)\mu_m(dx) \to \int g(x)\mu(dx) < \infty
\]
implies
\[
\int f(x)\mu(dx) \;\geq\; \limsup_{m \to \infty} \int f_m(x)\mu_m(dx),
\]
with a corresponding generalization of Levi's monotone convergence theorem.


A similar generalization is possible for Lebesgue's dominated convergence theorem, as follows: let {µm} be a sequence of measures on Rn that converge set-wise to the measure µ on Rn, let X ⊆ Rn be measurable, let {fm} and {gm} be sequences of measurable functions fm, gm : X → R, and let f, g : X → R be measurable functions such that fm is dominated by gm and such that fm(x) → f(x) and gm(x) → g(x) for µ-almost all x ∈ X. Then:23
\[
\int g_m(x)\mu_m(dx) \to \int g(x)\mu(dx) < \infty
\]
implies
\[
\int f_m(x)\mu_m(dx) \to \int f(x)\mu(dx) < \infty.
\]
Of course, if the sequence {fm} is uniformly bounded, in the sense that sup{|fm(x)| | x ∈ X, m ∈ N} < ∞, then the antecedent is automatically satisfied, and we obtain convergence of the integrals.
This result has obvious implications for integrals of parameterized functions. Let X ⊆ Rn be measurable, let P ⊆ Rk, and consider any bounded function f : X × P → R. Assume that for all x ∈ X, the function fx : P → R defined by fx(p) = f(x, p) is continuous; and assume that for all p ∈ P, the function fp : X → R defined by fp(x) = f(x, p) is measurable. Let {μm} be a sequence of finite measures on Rn converging set-wise to μ, and let {pm} be a sequence of parameters converging to p in P. Then the parameterized integrals converge: ∫ f(x, pm)μm(dx) → ∫ f(x, p)μ(dx).

21 See Theorem 15.22 of Aliprantis and Border (2006). Their result is stated for probability measures defined on the Borel σ-algebra, but we can apply it to the sequence of probability measures restricted to the Borel sets.
22 See Proposition 17, p.269, of Royden (2008). Royden considers the liminf of integrals of non-negative functions, so we apply his result to {gm − fm} and g − f.
23 See Proposition 18, p.270, of Royden (2008).

The latter formulation allows for a variable integrating measure, but it assumes that the sequence {μm} converges to μ set-wise. We can relax this to weak convergence if we impose correspondingly stricter requirements on the sequence of functions. Let {μm} be a sequence of probability measures on Rn that converge weakly to the probability measure μ on Rn, let X ⊆ Rn be measurable, and let F be a set of measurable functions f : X → R that is uniformly bounded, in the generalized sense that sup{|f(x)| | x ∈ X, f ∈ F} < ∞. Furthermore, assume that for every x, the function sx : X → R defined by

sx(y) = sup{|f(x) − f(y)| | f ∈ F}

is continuous. Then weak convergence μm → μ implies that ∫ f(x)μm(dx) → ∫ f(x)μ(dx) uniformly in f, i.e., for all ε > 0, there exists k such that for all m ≥ k and all f ∈ F, we have |∫ f(x)μm(dx) − ∫ f(x)μ(dx)| < ε.24 Now let {fm} be a uniformly bounded sequence of functions fm : X → R converging pointwise to the continuous function f : X → R. In the appendix, I show that ∫ fm(x)μm(dx) → ∫ f(x)μ(dx), which provides a version of Lebesgue's dominated convergence theorem that allows the integrating probability measure to vary in a weakly continuous way.

The preceding formulation of the dominated convergence theorem has the following implication for parameterized integrals. Let X ⊆ Rn be measurable, let P ⊆ Rk, and consider any bounded function f : X × P → R. Assume that for all x ∈ X, the function fx : P → R defined by fx(p) = f(x, p) is continuous; and assume that for all p ∈ P, the function fp : X → R defined by fp(x) = f(x, p) is measurable. Let {μm} be a sequence of probability measures on Rn converging weakly to μ, and let {pm} be a sequence of parameters converging to p in P such that f(·, p) is continuous in x. Then the parameterized integrals converge: ∫ f(x, pm)μm(dx) → ∫ f(x, p)μ(dx). In comparison with the previous convergence result, the advantage is that we allow weak convergence of the integrating measure, but the disadvantage is that we assume the limiting function f(x, p) is continuous in x, and we focus on probability measures.

11 Convergence of Functions

We can also consider several different notions of convergence for a sequence of measurable functions, in addition to the uniform and pointwise convergence already defined. Let μ be a measure on Rn, let X ⊆ Rn be measurable, let {fm} be a sequence of measurable functions fm : X → R, and let f : X → R be measurable: the sequence {fm} converges to f ...

uniform almost everywhere if there is a μ-measure zero set Z ⊆ X such that the functions are bounded and fm → f uniformly outside Z, i.e., |fm| + |f| is bounded on X \ Z, and
sup{|fm(x) − f(x)| | x ∈ X \ Z} → 0,

pointwise almost everywhere if for μ-almost every x ∈ X, fm(x) → f(x),

in pth mean if |fm|^p + |f|^p is μ-integrable for all m, and
∫ |fm(x) − f(x)|^p μ(dx) → 0,

in measure if for all ε > 0, we have μ({x ∈ X | |fm(x) − f(x)| > ε}) → 0,

where p ∈ [1, ∞). Uniform almost everywhere convergence could be called convergence in essential supremum metric or convergence in L∞-metric; pointwise almost everywhere convergence is sometimes called almost sure convergence; convergence in pth mean could be called convergence in Lp-metric, and when p = 1, convergence in pth mean is sometimes called convergence in mean; and when μ is a probability measure, convergence in measure is sometimes called convergence in probability.

24 See Exercise 8, p.17, of Billingsley (1968) Convergence of Probability Measures, John Wiley and Sons: New York, NY.
The above convergence notions are listed in roughly decreasing strength, as depicted below.

[Diagram: uniform a.e. implies pointwise a.e., which implies convergence in pth mean (p ∈ [1, ∞)), which implies convergence in measure, with side conditions (μ finite; ∫ |fm(x)|^p μ(dx) → ∫ |f(x)|^p μ(dx)) attached to the relevant arrows.]

Clearly, uniform almost everywhere convergence implies pointwise almost everywhere convergence. If fm → f pointwise almost everywhere, then the sequence converges in pth mean as long as ∫ |fm(x)|^p μ(dx) → ∫ |f(x)|^p μ(dx) (and |fm|^p + |f|^p is integrable for all m).25 Uniform almost everywhere convergence implies convergence in pth mean for all p ∈ [1, ∞). If fm → f in pth mean, if p > q, and if μ is finite, then the sequence converges in qth mean. Convergence in pth mean implies convergence in measure; and assuming μ is finite, pointwise almost everywhere convergence also implies convergence in measure. In the converse direction, if fm → f in pth mean, then there is a subsequence of {fm} that converges to f pointwise almost everywhere. I show in the appendix that if fm → f in measure, then the sequence converges in mean as long as ∫ |fm(x)|μ(dx) → ∫ |f(x)|μ(dx) (and |fm| + |f| is μ-integrable for all m). Convergence in measure implies that a subsequence of {fm} converges to f pointwise almost everywhere; in fact, assuming μ is finite, a sequence {fm} converges to f in measure if and only if every subsequence of {fm} itself possesses a subsequence that converges pointwise almost everywhere to f. In the above diagram, dashed arrows indicate that the direction of implication holds for a subsequence of functions.

25 The condition ∫ |fm(x)|^p μ(dx) → ∫ |f(x)|^p μ(dx) will hold, by Lebesgue's dominated convergence theorem, if the sequence {|fm|^p} is dominated by a μ-integrable function.
It may also be useful to consider counterexamples for some unstated directions of implication. That pointwise almost everywhere convergence does not imply uniform almost everywhere convergence can be seen from the sequence of functions fm : [0, 1] → R defined by fm(x) = max{1 − mx, 0}. To see that convergence in pth mean and convergence in measure do not imply pointwise convergence, let f1 = I[0,1/2], f2 = I[1/2,1], f3 = I[0,1/4], f4 = I[1/4,1/2], etc. In general, to define fm for m ≥ 2, let k be the greatest nonnegative integer such that n = Σ_{ℓ=1}^k 2^ℓ < m, and let fm be the indicator function of the interval [(m−n−1)/2^{k+1}, (m−n)/2^{k+1}]. These functions cycle continuously through the unit interval with ever smaller support (i.e., the Lebesgue measure of f_m^{-1}({1}) becomes arbitrarily small). Thus, they converge to the function f = 0 in pth mean and in measure, but they do not converge pointwise almost everywhere. To see that convergence in measure does not imply convergence in pth mean unless ∫ |fm(x)|μ(dx) → ∫ |f(x)|μ(dx), let fm = m I[1−1/m, 1], so that {fm} converges to f = 0 in measure; indeed, μ({x ∈ R | |fm(x) − f(x)| > 0}) = 1/m → 0. But ∫ |fm(x)|^p dx = m^{p−1} ≥ 1 for all m, so the sequence does not converge in pth mean for any p ∈ [1, ∞). Finally, to see that pointwise almost everywhere convergence does not imply convergence in measure unless μ is finite, define fm : R → R by fm = I[m−1,m] for all m, and note that {fm} converges pointwise to f = 0, but μ({x ∈ R | |fm(x) − f(x)| ≥ 1}) = 1 for all m, so it does not converge in measure. See Figure 10.
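A short Python sketch of the "typewriter" construction above (purely illustrative) makes both claims visible: the support of fm shrinks, so the sequence converges to zero in measure and in pth mean, yet any fixed point x is covered by infinitely many of the supports, so fm(x) does not converge.

```python
def typewriter_interval(m):
    """Support [a, b] of f_m: the intervals sweep [0, 1] at dyadic level k+1."""
    k, n = 0, 0
    while n + 2**(k + 1) < m:          # greatest k with sum_{l=1}^{k} 2^l < m
        n += 2**(k + 1)
        k += 1
    width = 1.0 / 2**(k + 1)
    a = (m - n - 1) * width
    return a, a + width

for m in (1, 2, 5, 20, 100):
    a, b = typewriter_interval(m)
    print(f"m={m:3d}  support=[{a:.4f}, {b:.4f}]  Lebesgue measure={b - a:.5f}")

x = 0.3                                 # any fixed point keeps being hit: no pointwise limit
hits = [m for m in range(1, 201) if typewriter_interval(m)[0] <= x <= typewriter_interval(m)[1]]
print("indices m <= 200 with f_m(0.3) = 1:", hits)
```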
To see that convergence in pth mean does not generally imply convergence in qth mean when p > q unless μ is finite, note that for arbitrary c > 0, we can choose y > 0 sufficiently close to zero such that ln(y) < −c/(p − q). Thus, for each m, we can choose ym > 0 satisfying q ln(ym) > ln(m) + p ln(ym), and then we can choose εm = 1/ym^q > 0, which implies

q ln(ym) = −ln(εm) > ln(m) + p ln(ym),

or equivalently, εm(ym)^q = 1 > m εm(ym)^p. Now define f1 = y1 I[0,ε1] as y1 times the indicator function of [0, ε1], define f2 = y2 I[ε1, ε1+ε2], etc. In general, define fm = ym I[ε1+⋯+εm−1, ε1+⋯+εm], and let f = 0 be identically zero. Then

∫ |fm(x) − f(x)|^p dx = εm(ym)^p < 1/m,

so fm → f in pth mean. But

∫ |fm(x) − f(x)|^q dx = εm(ym)^q = 1,

so the sequence does not converge in qth mean.

[Figure 10: Convergence counterexamples: pointwise a.e. does not imply uniform a.e.; in pth mean and in measure do not imply pointwise a.e.; in measure does not imply in pth mean; pointwise a.e. does not imply in measure.]
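The same construction is easy to check numerically; the sketch below takes p = 2 and q = 1 and ym = 1/(2m) (which satisfies the logarithmic inequality above), and computes the pth and qth moments of fm exactly from the width εm and height ym of its support.

```python
p, q = 2.0, 1.0                        # p > q
for m in (1, 10, 100, 1000):
    y_m = 1.0 / (2 * m)                # any y_m < 1/m works when p - q = 1
    eps_m = 1.0 / y_m**q               # length of the interval on which f_m = y_m
    p_mean = eps_m * y_m**p            # integral of |f_m|^p  -> 0
    q_mean = eps_m * y_m**q            # integral of |f_m|^q  =  1 for every m
    print(f"m={m:5d}  int |f_m|^p = {p_mean:.5f}   int |f_m|^q = {q_mean:.5f}")
```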


Pointwise almost everywhere convergence almost implies almost uniform convergence. Indeed, let X Rn be measurable, and let {fm } be a sequence of
measurable functions fm : X R that converges pointwise almost everywhere
to f : X R. By Egoroff s theorem, if is finite, then for all > 0, there is
a measurable set Y X such that (Y ) < and the sequence {fm |X\Y } of
restricted functions converges uniformly to f |X\Y .
Next, we define a class of convergence concepts parameterized by p ranging from 1 to ∞. For p ∈ (1, ∞), let q ∈ (1, ∞) satisfy 1/p + 1/q = 1, so that p and q are conjugate. Let X ⊆ Rn be measurable, let {fm} be a sequence of measurable functions fm : X → R, and let f : X → R be measurable: the sequence {fm} converges to f ...

weakly if |fm| + |f| is μ-integrable for all m, and for all measurable g : X → R such that g is μ-bounded, we have
∫ (fm(x) − f(x))g(x)μ(dx) → 0,

weakly order p if |fm|^p + |f|^p is μ-integrable for all m, and for all measurable g : X → R such that |g|^q is μ-integrable, we have
∫ (fm(x) − f(x))g(x)μ(dx) → 0,

weak* if |fm| + |f| is μ-bounded for all m, and for all μ-integrable g : X → R, we have
∫ (fm(x) − f(x))g(x)μ(dx) → 0.

Note that the definition of weak convergence depends on the order p, but this dependence by convention is sometimes left implicit. Then for p ∈ (1, ∞), weak convergence order p is simply referred to as weak convergence, with p understood; and specifically when 1 < p < ∞, it is also referred to as weak* convergence. To gain intuition, it may be useful to consider the case in which fm(x) ≥ f(x) for all m and μ-almost all x ∈ X; in this case, when μ is finite, there is no loss of generality in assuming that g is a probability density function, so the integral ∫ (fm(x) − f(x))g(x)μ(dx) can be interpreted as the expected value of fm − f. Then fm → f weakly if the expected value of the functions fm converges to the expected value of f for every probability density.
To relate these convergence notions to those introduced previously, we note that weak convergence is, as the name suggests, weak. Indeed, convergence in pth mean implies weak convergence order p. That the converse does not hold generally can be seen from the sawtooth example in Figure 9, where the sequence {fm} does not converge in pth mean, but it does converge weakly order p to the function f that is identically one, i.e., to f = I[0,1].
A last concept of convergence of functions makes use of the notion of weak convergence of measures. Consider a measurable set X ⊆ Rn, a measure μ on Rn, and a measurable function f : X → Rm satisfying, as in Section 9, the following absolute continuity condition: for every measurable set Y ⊆ Rm with Lebesgue measure zero, the set {x ∈ X | f(x) ∈ Y} is measurable and has μ-measure zero. Recall that the distribution of f is μ ∘ f^{-1}. Previous notions of convergence have required the sequence of functions to be defined on the same domain with respect to the same measure, but we can now generalize this considerably. For each k = 1, 2, ..., let X^k ⊆ R^{n_k} be a measurable set, let μ^k be a measure on R^{n_k}, let f^k : X^k → Rm be measurable, let X ⊆ Rn be measurable, let μ be a measure on Rn, and let f : X → Rm be a measurable function. Assume that each f^k satisfies the absolute continuity condition with respect to μ^k, and that f satisfies the condition with respect to μ. Let ν^k be the distribution of f^k for k = 1, 2, ..., and let ν be the distribution of f. Then {f^k} converges to f in distribution if ν^k → ν weakly. By a change of variables, the sequence converges in distribution if and only if for all bounded, continuous functions g : Rm → R, we have ∫ g(f^k(x))μ^k(dx) → ∫ g(f(x))μ(dx). Note that by setting g identically equal to one, a necessary condition for convergence in distribution is that μ^k(X^k) → μ(X).
If the functions are real-valued, i.e., m = 1, and they are defined with respect to the same finite measure μ = μ^k, then convergence in pth mean implies convergence in distribution. Indeed, otherwise there is a bounded, continuous g, a subsequence (still indexed by k for simplicity), and an ε > 0 such that for all k, we have |∫ (g(f^k(x)) − g(f(x)))μ(dx)| ≥ ε. But by convergence in pth mean, there is a further subsequence (still indexed by k) such that f^k converges pointwise to f outside a measurable set with μ-measure zero. Given any x outside that exceptional set, continuity of g implies g(f^k(x)) → g(f(x)). Since g is bounded and μ is finite, the sequence {g ∘ f^k} is dominated by a μ-integrable function, and then Lebesgue's dominated convergence theorem implies ∫ g(f^k(x))μ(dx) → ∫ g(f(x))μ(dx), a contradiction. The same argument, combined with a version of Lebesgue's dominated convergence theorem from Section 10, shows that pointwise almost everywhere convergence implies convergence in distribution, as long as μ^k → μ set-wise.
We can now state a last generalization of Lebesgue's dominated convergence theorem that replaces set-wise and weak convergence with convergence in distribution. For each k = 1, 2, ..., let X^k ⊆ R^{n_k} be measurable, let μ^k be a measure on R^{n_k}, and let f^k : X^k → R be a measurable function; and let X ⊆ Rn be measurable, let μ be a measure on Rn, and let f : X → R be a measurable function. Assume that f, along with each f^k, satisfies the absolute continuity condition, and assume that lim sup_k ∫ |f^k(x)|μ^k(dx) < ∞. If the sequence {f^k} converges in distribution to f, then f is μ-integrable, and ∫ f^k(x)μ^k(dx) → ∫ f(x)μ(dx).26

12 Product Measurability

For all j = 1, 2, ..., let L^j be the collection of Lebesgue measurable subsets of R^j, and let λ^j be the corresponding Lebesgue measure. Recall that the outer Lebesgue measure λ^{*,n} is defined as the infimum of Σ_i λ^n(Ri) over collections of half-open rectangles Ri ⊆ Rn covering any given set. Here, λ^n(Ri) = λ^1((a1, b1]) ⋯ λ^1((an, bn]) is defined by taking the product of Lebesgue measures of half-open intervals in the real line. For all measurable sets Y ⊆ R^ℓ and Z ⊆ Rm, the product Y × Z is a measurable subset of Rn, and in this case, λ^n(Y × Z) = λ^ℓ(Y)λ^m(Z). Of course, there are measurable subsets of Rn that are not product sets. It may be that a product set Y × Z ⊆ Rn is measurable in Rn but Y is not measurable in R^ℓ.27
To examine the consistency of the definition of Lebesgue measure, choose natural numbers ℓ and m such that ℓ + m = n. Denote by P = (a1, b1] × ⋯ × (aℓ, bℓ] a half-open rectangle in R^ℓ, and denote by Q = (aℓ+1, bℓ+1] × ⋯ × (an, bn] a half-open rectangle in Rm. Note that R = P × Q is a half-open rectangle in Rn, and that λ^n(R) = λ^ℓ(P)λ^m(Q). Therefore, given any A ⊆ Rn, the outer measure λ^{*,n}(A) can be equivalently defined as the infimum of Σ_i λ^ℓ(Pi)λ^m(Qi) over collections of half-open rectangles Ri = Pi × Qi covering A. Thus, we can decompose rectangles in Rn arbitrarily and evaluate each factor using Lebesgue measure for the Euclidean space of appropriate dimensionality. In short, the construction of Lebesgue measure in multidimensional spaces is independent of how we decompose rectangular sets. A useful implication of this observation is that given A ⊆ Rn and ℓ + m = n, we do not require a separate notion of outer measure if we view A as a subset of R^ℓ × Rm rather than Rn. And given a function f : X → R defined on a set X = Y × Z ⊆ Rn, where Y ⊆ R^ℓ, Z ⊆ Rm, and n = ℓ + m, we do not need to consider a different notion of integral if instead of viewing f(x) as a function of x ∈ Rn, we view f(y, z) as a function of two variables y ∈ R^ℓ and z ∈ Rm. When the argument of a function is decomposed in this way, we sometimes say it is jointly measurable to signify that it is measurable.
26 See statement 42, p.52, of Hildenbrand (1974).
27 To see this, let Y be any non-measurable subset of R^ℓ and Z be any measure zero subset of Rm. Note that Y × Z ⊆ R^ℓ × Z, and that R^ℓ × Z is measure zero in Rn. Since Lebesgue measure is complete, every subset of R^ℓ × Z is measurable in Rn, and therefore Y × Z ∈ L^n.

Given measurable Y ⊆ R^ℓ and Z ⊆ Rm, if a function f : Y × Z → R is measurable, then it is measurable in each argument: for each y ∈ Y, the function fy : Z → R defined by fy(z) = f(y, z) is measurable, and for each z ∈ Z, the function fz : Y → R defined by fz(y) = f(y, z) is measurable.28 The converse direction holds if f : Y × Z → R is a Caratheodory function, which means that (i) for each y ∈ Y, fy is measurable, and (ii) for each z ∈ Z, fz is continuous. Thus, a Caratheodory function f(y, z) is defined to be measurable in one argument and continuous in the other but is in fact jointly measurable. This converse direction does not hold if (ii) is weakened to the assumption that fz is measurable.29 Recall from Section 5, as well, that joint continuity of f does not follow if (i) is strengthened to the assumption that fy is continuous.
The foregoing discussion suggests a general method for taking products of two measures. Let μ be a measure on R^ℓ, and let ν be a measure on Rm, and assume that both measures are σ-finite and, to avoid measure-theoretic issues, absolutely continuous with respect to Lebesgue measure. We will define a measure μ × ν on Rn as follows. For all half-open rectangles P and Q in R^ℓ and Rm, respectively, define (μ × ν)(P × Q) = μ(P)ν(Q). Define the outer measure (μ × ν)*(A) for every set A ⊆ Rn as the infimum of Σ_i μ(Pi)ν(Qi) over collections of half-open rectangles covering A. Defining the product measure μ × ν as the restriction of (μ × ν)* to L^n, I show in the appendix that the function μ × ν is indeed a measure on Rn,30 i.e., it takes non-negative values and

(i) (μ × ν)(∅) = 0,

(ii) for all pairwise disjoint collections A1, A2, ... of measurable sets in Rn,

(μ × ν)(∪_{i=1}^∞ Ai) = Σ_{i=1}^∞ (μ × ν)(Ai).

Given a measurable subset A ⊆ Rn and y ∈ R^ℓ, let Ay = {z ∈ Rm | (y, z) ∈ A} be the section of A at y; similarly, for z ∈ Rm, let A^z = {y ∈ R^ℓ | (y, z) ∈ A} be the section at z. Then for all y ∈ R^ℓ outside a μ-measure zero exceptional set, the set Ay is measurable, and the function φ : R^ℓ → R defined by φ(y) = ν(Ay) outside the exceptional set (and equal to zero on the exceptional set) is measurable; similarly, for all z outside a ν-measure zero exceptional set, A^z is measurable, and the function ψ : Rm → R defined by ψ(z) = μ(A^z) outside the exceptional set (and equal to zero on the exceptional set) is measurable. Furthermore, the product measure of A can be computed as

(μ × ν)(A) = ∫ φ(y)μ(dy) = ∫ ψ(z)ν(dz),

integrating across sections of the set.31

28 See Theorem 4.48 of Aliprantis and Border (2006).
29 See the discussion of the Sierpiński set on pp.151-152 of Aliprantis and Border (2006).
30 At issue is whether the Lebesgue measurable sets are included among the (μ × ν)*-measurable sets. We maintain the assumption that μ and ν are absolutely continuous with respect to Lebesgue measure for purposes of presentation, but all results in this section extend to general measures defined on the Borel sets.


Now consider a function f : X → R defined on X = Y × Z ⊆ Rn, where Y ⊆ R^ℓ and Z ⊆ Rm are measurable, and write x = (y, z), where y ∈ Y, z ∈ Z, and n = ℓ + m. As above, for each y ∈ Y, define fy : Z → R by fy(z) = f(y, z), and for each z ∈ Z, define fz : Y → R by fz(y) = f(y, z). An implication of Fubini's theorem is that if f is Lebesgue integrable, then for almost all y ∈ Y, fy is Lebesgue integrable; and for almost all z ∈ Z, fz is Lebesgue integrable. In fact, the functions g : Y → R and h : Z → R defined by

g(y) = ∫ fy(z)dz  and  h(z) = ∫ fz(y)dy

outside the corresponding exceptional sets (and equal to zero on the exceptional sets) are Lebesgue integrable. Furthermore, the integral of f is independent of the order of integration:

∫ f(x)dx = ∫ g(y)dy = ∫ h(z)dz.

Whereas Fubini's theorem assumes that f is Lebesgue integrable, Tonelli's theorem gives conditions for integrability: if f : Y × Z → R is measurable, then for almost all y, fy : Z → R is measurable; if fy is Lebesgue integrable for almost all y, then the function g defined above is measurable; and if the analogous function defined from |f| (that is, y ↦ ∫ |fy(z)|dz) is Lebesgue integrable, then f is Lebesgue integrable, and Fubini's theorem then implies that the order of integration is irrelevant.
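As a quick numerical sanity check on the order of integration (an illustration only, with the integrals approximated by the trapezoid rule), consider the integrable function f(y, z) = y e^{−yz} on [0, 2] × [0, 3]:

```python
import numpy as np

f = lambda y, z: y * np.exp(-y * z)    # an integrable function on [0, 2] x [0, 3]

y = np.linspace(0.0, 2.0, 1001)
z = np.linspace(0.0, 3.0, 1501)
Y, Z = np.meshgrid(y, z, indexing="ij")
F = f(Y, Z)

g = np.trapz(F, z, axis=1)             # g(y) = int f(y, z) dz
h = np.trapz(F, y, axis=0)             # h(z) = int f(y, z) dy
print(np.trapz(g, y), np.trapz(h, z))  # the two iterated integrals agree up to discretization error
```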
The above results extend to product measures. Let μ be a measure on R^ℓ and ν a measure on Rm, and assume μ and ν are absolutely continuous with respect to Lebesgue measure (to ensure that μ × ν is a well-defined measure on Rn). By Fubini's theorem, if f : Y × Z → R is (μ × ν)-integrable, then for μ-almost all y, fy is ν-integrable, and for ν-almost all z, fz is μ-integrable; the functions defined by

g(y) = ∫ fy(z)ν(dz)  and  h(z) = ∫ fz(y)μ(dy)

outside the corresponding exceptional sets (and equal to zero on the exceptional sets) are, respectively, μ-integrable and ν-integrable; and the order of integration is irrelevant:32

∫ f(x)(μ × ν)(dx) = ∫ g(y)μ(dy) = ∫ h(z)ν(dz).
Conversely, by Tonelli's theorem, if f is measurable, then for μ-almost all y, fy is measurable; if fy is ν-integrable for μ-almost all y, then the function g defined above is measurable; and if the analogous function defined from |f| is μ-integrable and μ and ν are absolutely continuous with respect to Lebesgue measure (so μ × ν is well-defined), then f is (μ × ν)-integrable, and Fubini's theorem can be applied.33

31 See Proposition 18, p.307, of Royden (1988).
32 See Theorem 19, p.307, of Royden (1988).
We can build on Fubini's theorem to record results on parameterized integrals. Let X ⊆ R^ℓ and P ⊆ Rm be measurable, and consider a measurable function f : X × P → R. Let μ be a measure on R^ℓ, and let ν be Lebesgue measure on Rm. By Fubini's theorem, if f is (μ × ν)-integrable, then it follows that the integral over x, ∫ f(x, p)μ(dx), is a measurable function of p; here, we set the integral equal to zero on the measure zero set of p such that f(·, p) is not μ-integrable. Now assume that the parameterization is continuous: for each x ∈ X, the function fx : P → R defined by fx(p) = f(x, p) is continuous, i.e., f is a Caratheodory function. Suppose further that there is a μ-integrable function g : X → R such that for μ-almost all x and for all p, |f(x, p)| ≤ g(x). By Lebesgue's dominated convergence theorem, the integral ∫ f(x, p)μ(dx) is a continuous function of p. Finally, assume that the parameterization is smooth: for each x ∈ X, the function fx : P → R is partially differentiable. Suppose further that there is a μ-integrable function g : X → R such that for all i = 1, ..., m, for μ-almost all x, and for all p, |Di fx(p)| ≤ g(x). Then the integral ∫ f(x, p)μ(dx) is a partially differentiable function of p, and

Di ∫ f(x, p)μ(dx) = ∫ Di fx(p)μ(dx)

for all i = 1, ..., m.34 Thus, quite generally, we can pass differentiation through the integral sign.
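A small numerical check of differentiation under the integral sign, with μ taken to be Lebesgue measure on [0, 1] and f(x, p) = sin(px) (so the domination condition holds with g(x) = 1), compares a finite-difference derivative of the integral with the integral of the partial derivative:

```python
import numpy as np

f  = lambda x, p: np.sin(p * x)          # parameterized integrand
df = lambda x, p: x * np.cos(p * x)      # partial derivative in p; |df| <= 1 on [0, 1]

x = np.linspace(0.0, 1.0, 100_001)
I = lambda p: np.trapz(f(x, p), x)       # I(p) = int_0^1 f(x, p) dx

p0, eps = 1.3, 1e-5
finite_diff = (I(p0 + eps) - I(p0 - eps)) / (2 * eps)
swap_order  = np.trapz(df(x, p0), x)     # int_0^1 D_p f(x, p0) dx
print(finite_diff, swap_order)           # the two values agree to several decimal places
```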

Not all measures on a product space are product measures. But any measure on Rn = R^ℓ × Rm determines a measure on each factor. The marginal measure of μ on R^ℓ is the measure μ_ℓ defined by μ_ℓ(Y) = μ(Y × Rm) for all measurable Y ⊆ R^ℓ. Similarly, the marginal on Rm is defined by μ_m(Z) = μ(R^ℓ × Z) for all measurable Z ⊆ Rm. Assuming they are σ-finite and absolutely continuous with respect to Lebesgue measure, these marginal measures determine a product measure, μ_ℓ × μ_m, on Rn, but it will not be the case that μ = μ_ℓ × μ_m unless the initial measure was indeed a product measure. In that case, if μ = ν × τ, then the marginals will coincide with the factors in the product: μ_ℓ = ν and μ_m = τ.

We can characterize weak convergence of product measures in terms of convergence of marginals. Consider a sequence {μ^k} of measures on Rn = R^ℓ × Rm converging weakly to the measure μ. Then the sequence {μ^k_ℓ} of marginals on R^ℓ converges weakly to the marginal μ_ℓ, and similarly for the marginals on Rm. Indeed, consider any Y ⊆ R^ℓ with μ_ℓ-measure zero boundary in R^ℓ, and note that bd(Y × Rm) = bd(Y) × Rm, which has μ-measure zero in Rn. Then

μ^k_ℓ(Y) = μ^k(Y × Rm) → μ(Y × Rm) = μ_ℓ(Y).

The converse holds for product measures: letting μ^k = ν^k × τ^k for all k and μ = ν × τ, if ν^k → ν weakly and τ^k → τ weakly, then μ^k → μ weakly.35

33 See Theorem 20, p.309, of Royden (1988).
34 See Theorem 20.4 of Aliprantis and Burkinshaw (1990).

13 Metric Spaces

It is sometimes useful to consider sets that cannot be meaningfully imbedded in any finite-dimensional Euclidean space. Given such a set X, we may yet want to impose further structure, such as a concept of distance between any two elements, in the form of a metric, which is a function ρ : X × X → R satisfying three properties:
(i) for all x, y ∈ X, ρ(x, y) ≥ 0, where equality holds if and only if x = y,
(ii) for all x, y ∈ X, ρ(x, y) = ρ(y, x),
(iii) for all x, y, z ∈ X, ρ(x, z) ≤ ρ(x, y) + ρ(y, z).
Property (iii) is the metric version of the triangle inequality. A metric space is a set X with a metric ρ. We have already seen one metric space: Euclidean space with the Euclidean metric, defined by ρe(x, y) = ||x − y||. In fact, any subset X ⊆ Rn is a metric space in its own right when equipped with the Euclidean metric. For a different (but trivial) example, an arbitrary set X is a metric space with the discrete metric, which is defined as ρd(x, y) = 0 if x = y and ρd(x, y) = 1 otherwise. For a more interesting example, we could consider a set of bounded functions defined on Rn, in which case a common measure of distance is given by the supremum metric, defined by ρs(f, g) = sup{|f(x) − g(x)| | x ∈ Rn}.
Though spare, the structure of a metric space permits us to make topological distinctions along the usual lines. The ball of radius r around x ∈ X is Br(x) = {y ∈ X | ρ(x, y) < r}, and then we define the boundary, interior, and closure of a set Y as usual, using the notation bd(Y), int(Y), and clos(Y). For example, given Y ⊆ X, we say x is a boundary point of Y if for all ε > 0, we have Bε(x) ∩ Y ≠ ∅ and Bε(x) \ Y ≠ ∅. And we define open and closed sets following the above conventions. In particular, a set Y ⊆ X is open if it is disjoint from its boundary; equivalently, if for all x ∈ Y, there is some ε > 0 such that Bε(x) ⊆ Y. And Y is closed if it contains its boundary. As usual, a set Y ⊆ X is closed if and only if its complement X \ Y is open; and Y is open if and only if X \ Y is closed. As above, ∅ and X are both open; arbitrary unions of open sets are open; and finite intersections of open sets are open. Furthermore, ∅ and X are both closed; finite unions of closed sets are closed; and arbitrary intersections of closed sets are closed.

35 See Theorem 3.2 of Billingsley (1968) Convergence of Probability Measures, John Wiley and Sons: New York, NY. Although Billingsley considers measures defined on the Borel σ-algebra, if the sequence {μ^k} restricted to the Borel sets converges weakly to μ, then so does the sequence of completions.

As usual, a sequence {xm} in a metric space X is a countably infinite list of elements in X indexed by the natural numbers m ∈ N.36 Again, a subsequence {x_{m_k}} is a selection from this list, where {m_k} is a strictly increasing sequence of natural numbers indexed by k. A sequence {xm} in X converges to x ∈ X if for all ε > 0, there exists n such that for all m ≥ n, xm ∈ Bε(x); equivalently, if ρ(xm, x) → 0. As usual, a set Y ⊆ X is closed if and only if every convergent sequence in Y has its limit in Y; and as in Euclidean space, {xm} converges to x if and only if every subsequence converges to this same limit.
Given a sequence {Ym} of subsets of X, define the topological limit superior and the topological limit inferior of the sequence, respectively, as

ls({Ym}) = {x ∈ X | for all ε > 0 and all n, there exists m ≥ n such that Ym ∩ Bε(x) ≠ ∅}

and

li({Ym}) = {x ∈ X | for all ε > 0, there exists n such that for all m ≥ n, Ym ∩ Bε(x) ≠ ∅}.

The former consists of limits of all subsequences of elements from the sets {Ym}, and the latter consists of limits of all sequences of elements from the sets. Both sets are closed. For the case of a sequence {xm}, which can be viewed as a sequence of singleton sets Ym = {xm}, an element of the set ls({xm}) is referred to as an accumulation point of the sequence.
A set Y ⊆ X is bounded if it is contained in a ball Br(x) for some r > 0 and some x ∈ X. A set Y ⊆ X is compact if every sequence in Y has a subsequence that converges to an element of Y; equivalently, Y is compact if and only if every open cover of Y has a finite subcover; and Y is compact if and only if every collection of closed subsets of Y with the finite intersection property has nonempty intersection. As before, the empty set and all finite sets are compact; finite unions of compact sets are compact; and arbitrary intersections of compact sets are compact. Every compact set is closed and bounded, but the converse need not hold. For example, if X = (0, 1) is the open unit interval with the Euclidean metric, it is closed when viewed as a metric space and is bounded, but it is not compact. For a deeper example (one that is essentially infinite-dimensional), note that the sawtooth sequence in Figure 9 belongs to the closure of the unit ball around zero in the space of bounded functions defined on [0, 1] (in the supremum metric), but it has no convergent subsequence. For a continuous example, the sequence {cos(2^n x)} belongs to the closure of the unit ball around zero in the space of bounded, continuous functions defined on [0, 1] (in the supremum metric) but also has no convergent subsequence. Every closed subset of a compact set is compact. As usual, a set Y ⊆ X is connected if there do not exist open sets U, V ⊆ X such that U ∩ V = ∅, Y ⊆ U ∪ V, U ∩ Y ≠ ∅, and V ∩ Y ≠ ∅.

36 We revert to subscripting indices, as metric spaces do not generally possess a vector structure; thus, we cannot interpret xm as the value of the mth coordinate of x.
In addition to compactness and connectedness, we now consider several more esoteric properties of metric spaces. A metric space X is complete if every Cauchy sequence in X converges to an element x ∈ X, where a sequence {xm} is Cauchy if for all ε > 0, there exists m such that for all k, ℓ ≥ m, we have ρ(xk, xℓ) < ε. The metric space X is totally bounded if for all ε > 0, there exist n and elements x1, ..., xn ∈ X such that X ⊆ ∪_{i=1}^n Bε(xi). The metric space X is separable if there is a countable subset {x1, x2, ...} ⊆ X that is dense, i.e., for all x ∈ X and all ε > 0, there exists n satisfying xn ∈ Bε(x). And X is locally compact if for each x, there exists r > 0 such that the disc Dr(x) = {y ∈ X | ρ(x, y) ≤ r} is compact. Every totally bounded metric space is bounded and separable. Every locally compact metric space is complete. A metric space X is compact if and only if it is both complete and totally bounded. Every subset of a separable metric space is itself a separable metric space, and every closed subset of a complete metric space is itself a complete metric space. Euclidean space is complete, locally compact, and separable; the latter property follows, for example, because the subset of vectors with rational coordinates is countable and dense. In fact, if X ⊆ Rn is equipped with the Euclidean metric, then X is automatically separable, and it is complete as long as X is a closed subset of Rn.
A real-valued function f : X → R defined on a metric space X is continuous if for every sequence {xm} converging to x in X, we have f(xm) → f(x); equivalently, if for every open G ⊆ R, the preimage f^{-1}(G) is open. More generally, given metric spaces X and Y, a function f : X → Y is continuous if for every sequence {xm} converging to x in X, we have f(xm) → f(x) in Y; equivalently, f is continuous if for every open G ⊆ Y, the preimage f^{-1}(G) is open. If Y = Rn, then we can always combine continuous functions in the usual way to produce new continuous functions, e.g., letting X, Y, and Z be any metric spaces, if f : X → Y and g : Y → Z are continuous with f(X) ⊆ Y, then g ∘ f is continuous; and if f, g : X → Rn are continuous and α, β ∈ R, then αf + βg is continuous. As always, by Weierstrass' theorem, the image of a compact set under a continuous function is compact; and by the intermediate value theorem, the image of a connected set under a continuous function is connected.
We can then extend the implicit function theorem to a metric space of parameters. Let U ⊆ Rn be open, and let P be a metric space. Assume f : U × P → Rn is continuous, that for all p ∈ P, the function fp : U → Rn defined by fp(x) = f(x, p) is C¹, that f(x*, p*) = 0 for some (x*, p*) ∈ U × P, and that the derivative Dfp*(x*) is non-singular. Then by a metric version of the implicit function theorem, there are open sets V ⊆ U and Q ⊆ P with (x*, p*) ∈ V × Q and a continuous function g : Q → V such that g(p*) = x*, and for all (x, p) ∈ V × Q, f(x, p) = 0 if and only if g(p) = x.37
We can also generalize the definition of measurability to mappings taking values in a metric space. Given a measurable set X ⊆ Rn and a metric space Y, a function f : X → Y is measurable if for every open G ⊆ Y, the pre-image f^{-1}(G) is measurable. Thus, if f is continuous, then it is measurable.
Given a metric space X and a function f : X → X, an element x is a fixed point of f if f(x) = x. A function f : X → X is a contraction mapping on X if there exists β ∈ [0, 1) such that for all x, y ∈ X, we have ρ(f(x), f(y)) ≤ βρ(x, y), in which case β is the modulus of the contraction. By the contraction mapping theorem, if X is a nonempty, complete metric space and f : X → X is a contraction mapping, then f has a unique fixed point.
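The proof of the contraction mapping theorem is constructive: iterating f from any starting point produces a Cauchy (hence convergent) sequence whose limit is the fixed point. A minimal Python sketch, using the contraction x ↦ cos(x) on the complete space [0.5, 1] (where the modulus is sin(1) < 1):

```python
import math

def fixed_point(f, x0, tol=1e-12, max_iter=1000):
    """Iterate x_{k+1} = f(x_k) until successive iterates are within tol."""
    x = x0
    for _ in range(max_iter):
        x_new = f(x)
        if abs(x_new - x) < tol:
            return x_new
        x = x_new
    return x

print(fixed_point(math.cos, 0.7))   # approximately 0.739085, the unique fixed point of cos on [0.5, 1]
```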
A mixture space is a set X for which a mapping X × X × [0, 1] → X is defined, where we write αx + (1 − α)y for the image of (x, y, α), and this mapping possesses certain intuitive properties; intuitively, the mapping acts just like the usual idea of convex combination of vectors in Euclidean space. Since convex combinations are independent of order, we can write Σ_{i=1}^m αi xi without ambiguity.38 Following the analysis of Rn, we say x is a convex combination of m elements x1, ..., xm if there exist non-negative coefficients that sum to one and such that x = Σ_{i=1}^m αi xi. A subset Y ⊆ X of a metric mixture space is convex if for all x, y ∈ Y and all α ∈ (0, 1), we have αx + (1 − α)y ∈ Y. And given any A ⊆ X, the convex hull of A is

conv(A) = {Σ_{i∈I} αi xi | I ⊆ N is finite, xi ∈ A for all i ∈ I, αi ≥ 0 for all i ∈ I, and Σ_{i∈I} αi = 1},

which consists of all convex combinations of all finite subsets of A. As before, A is convex if and only if A = conv(A), so that the convex hull of a set is itself convex, and in fact it is the intersection of all convex supersets of A.

The set X is a metric mixture space if X is a metric space (with metric ρ) and the mixture mapping is continuous with respect to (x, y, α), i.e., for all (x, y, α) and all sequences {(xm, ym, αm)} with xm → x, ym → y, and αm → α, we have ρ(αm xm + (1 − αm)ym, αx + (1 − α)y) → 0. The metric ρ is quasi-convex if for all x, y, z, w ∈ X and all α ∈ [0, 1],

ρ(αx + (1 − α)z, αy + (1 − α)w) ≤ max{ρ(x, y), ρ(z, w)}.

37 See Theorem 9.3 of Loomis and Sternberg (1968) Advanced Calculus, Addison-Wesley Publishing: Reading, MA.
38 To be more precise, it must be that for all x, y, z ∈ X and all α, β, γ ∈ [0, 1] with α + β + γ = 1, we have (i) 1x + 0y = x, (ii) αx + (1 − α)y = (1 − α)y + αx, and (iii) αx + (1 − α)[(β/(1 − α))y + (γ/(1 − α))z] = βy + (1 − β)[(α/(1 − β))x + (γ/(1 − β))z].

Obviously, a sufficient condition for quasi-convexity of ρ is that it is convex, i.e., for all x, y, z, w ∈ X and all α ∈ [0, 1],

ρ(αx + (1 − α)z, αy + (1 − α)w) ≤ αρ(x, y) + (1 − α)ρ(z, w).

I show in the appendix that in a metric mixture space, the convex hull of a finite set is compact; and assuming X is complete, the closure of the convex hull of each compact subset is compact.
In contrast to Rn, the latter result is not true if stated without taking the closure of the convex hull: in general metric spaces, there are examples of compact sets with non-compact convex hulls. Let X ⊆ R^N be the metric mixture space of summable sequences x = (x1, x2, ...), so that for all x ∈ X, we have Σ_{i=1}^∞ |xi| < ∞. We endow this space with the metric ρ(x, y) = Σ_{i=1}^∞ |xi − yi|. Let x^k = (0, ..., 0, 1/2^k, 0, ...) be the sequence with 1/2^k in the kth coordinate and zeroes elsewhere, and note that the sequence {x^k} converges to the sequence with all zero entries, i.e., ρ(x^k, 0) → 0, so the union Y = {x^k | k = 1, 2, ...} ∪ {0} is compact, and every x ∈ conv(Y) is such that xi = 0 for all but finitely many coordinates. For each k, define the sequence y^k ∈ X by y^k = (1/4, 1/16, ..., 1/2^{2k}, 0, 0, ...), and note that

y^k = (1/2)x^1 + (1/4)x^2 + ⋯ + (1/2^k)x^k + (1 − Σ_{i=1}^k 1/2^i)·0,

so y^k ∈ conv(Y). But then {y^k} converges to the limit y = (1/4, 1/16, 1/64, ...) with ith coordinate yi = 1/2^{2i}, which has positive entries in all coordinates, and therefore y ∉ conv(Y). Thus, conv(Y) is not closed, let alone compact.

A metric version of Schauder's fixed point theorem states that if X is a nonempty, compact metric mixture space with quasi-convex metric, and if f : X → X is continuous, then f has at least one fixed point. In the appendix, I prove a metric version of Glicksberg's fixed point theorem, which generalizes Schauder's theorem to correspondences.
As indicated above, given a metric space X with metric ρ and a subset Y ⊆ X, we can view Y as a metric space when equipped with ρ. Formally, we must associate Y with the restriction ρ' = ρ|_{Y×Y} of ρ to Y × Y. Then Y is a metric space in its own right with the metric ρ'. A sequence {xm} in Y converges to x ∈ Y for the metric ρ' if ρ'(xm, x) → 0; equivalently, ρ(xm, x) → 0. Open and closed subsets of Y are defined in the usual way in terms of the metric ρ', and in fact a set G' ⊆ Y is open for the metric ρ' if and only if there is an open set G ⊆ X such that G' = Y ∩ G; and a set F' ⊆ Y is closed for the metric ρ' if and only if there is a closed set F ⊆ X such that F' = Y ∩ F. Furthermore, Y is a compact subset of X if and only if Y is itself a compact metric space. This generalizes the relative topology in Euclidean spaces in Section 5 and specializes the concept of relative topology in Section 18.

14 Special Metric Spaces

Next are some examples of useful metric spaces. Of note, we will see that all
but the pointwise (and set-wise) convergence concepts and weak (and weak*)
convergence are metrizable, in the sense that we can define a metric on the
appropriate space (of functions or measures) such that a sequence in the space
converges to some limit if and only if the distance between the elements of the
sequence and the limit point goes to zero.
1. Countable Cartesian products. Returning to the definition of the product topology, let Xi ⊆ R^{n_i} for i = 1, ..., k with n = Σ_{i=1}^k n_i, and let X = Π_{i=1}^k Xi be the product space. We can define the product metric on this space by

ρ(x, y) = Σ_{i=1}^k ||x^i − y^i||_i,

where ||·||_i denotes the Euclidean norm in R^{n_i}. With this metric, a sequence {xm} in X converges to x ∈ X if and only if it converges in each component, i.e., letting xm = (x^1_m, ..., x^k_m) and x = (x^1, ..., x^k), we have x^i_m → x^i for each i = 1, ..., k. That is, ρ(xm, x) → 0 if and only if {xm} converges to x in the product topology, showing that convergence in the product topology is metrizable; but we knew that anyway, because a sequence converges in the product topology on the finite product X if and only if it converges in the space Rn in which X is contained.
This extends straightforwardly to the product of a countably infinite number of subsets Xi ⊆ R^{n_i}, i = 1, 2, .... Then the Cartesian product X = Π_{i=1}^∞ Xi consists of sequences x = (x^1, x^2, ...), where each component x^i is a vector in R^{n_i} belonging to the set Xi. Assume for now that each Xi is bounded, so there is some ri > 0 such that Xi ⊆ B_{ri}(0). Define a metric on X as follows: for each x = (x^1, x^2, ...) and y = (y^1, y^2, ...), let

ρ(x, y) = Σ_{k=1}^∞ (1/(2^{k+1} rk)) ||x^k − y^k||_k.

Note that given any k and any x^k, y^k ∈ Xk, we have ||x^k − y^k||_k ≤ ||x^k|| + ||y^k|| ≤ 2rk, so ρ(x, y) ≤ Σ_{k=1}^∞ 2^{-k} < ∞, so this metric is well-defined. With this metric, a sequence {xm} converges to x in X if and only if it converges in each component, i.e., letting xm = (x^1_m, x^2_m, ...) and x = (x^1, x^2, ...), we have x^i_m → x^i for each i = 1, 2, .... Whether the product is finite or countably infinite, X is separable; if each Xi is closed, then X is closed and complete; and by Tychonoff's product theorem, if each Xi is compact, then X is compact.
We focus on the more interesting case of an infinite product set. If each Xi is a bounded, convex subset of R^{n_i}, then X is a metric mixture space: given any x, y ∈ X and any α ∈ (0, 1), the convex combination αx + (1 − α)y = (αx^1 + (1 − α)y^1, αx^2 + (1 − α)y^2, ...) belongs to X; and given sequences {xm} converging to x in X, {ym} converging to y in X, and {αm} converging to α in [0, 1], we have αm xm + (1 − αm)ym → αx + (1 − α)y in the product topology. Moreover, the product metric is convex: for all x, y, z, w ∈ X and all α ∈ (0, 1), we have

ρ(αx + (1 − α)y, αz + (1 − α)w)
= Σ_{k=1}^∞ (1/(2^{k+1} rk)) ||α(x^k − z^k) + (1 − α)(y^k − w^k)||_k
≤ Σ_{k=1}^∞ (1/(2^{k+1} rk)) [α||x^k − z^k||_k + (1 − α)||y^k − w^k||_k]
= αρ(x, z) + (1 − α)ρ(y, w).

Combining the metric version of Schauder's theorem with previous results, assume each Xi is a nonempty, compact, convex subset of R^{n_i}, and equip the product space X with the product metric; then every continuous function f : X → X has at least one fixed point.
The above choice of metric for the infinite product space relied on the assumption that all (or all but finitely many) component sets Xi are bounded. In general, when some Xi are not bounded, we can define a version of the product metric as follows:

ρ(x, y) = Σ_{k=1}^∞ (1/2^k) ||x^k − y^k||_k / (1 + ||x^k − y^k||_k).

With this metric, a sequence {xm} again converges to x in X if and only if it converges in each component, i.e., letting xm = (x^1_m, x^2_m, ...) and x = (x^1, x^2, ...), we have x^i_m → x^i for each i = 1, 2, .... Again, X is separable; if each Xi is closed, then X is closed and complete; and by Tychonoff's product theorem, if each Xi is compact, then X is compact. In contrast to the initial definition of the product metric, however, the latter definition may not be convex.
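A short Python sketch of the second product metric (truncated to finitely many terms, with components supplied lazily as functions of the index) illustrates that componentwise convergence drives the metric to zero even when the tail components are unbounded:

```python
def product_metric(x, y, terms=200):
    """Truncation of rho(x, y) = sum_k 2^{-k} d_k/(1 + d_k), where d_k = |x_k - y_k|."""
    total = 0.0
    for k in range(1, terms + 1):
        d = abs(x(k) - y(k))
        total += (d / (1.0 + d)) / 2**k
    return total

zero = lambda k: 0.0
for m in (1, 5, 10, 20):
    # x^m is 0 in the first m coordinates and equal to m afterwards: it converges to 0
    # componentwise, and the product metric goes to 0 despite the unbounded tail.
    xm = lambda k, m=m: 0.0 if k <= m else float(m)
    print(m, product_metric(xm, zero))
```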
The above concepts can be extended to products of metric spaces. Let Xi be a metric space with metric ρi. Define the product space X = Π_{i=1}^∞ Xi with the product metric

ρ(x, y) = Σ_{i=1}^∞ (1/(2^i ri)) ρi(xi, yi),

assuming sup{ρi(xi, yi) | xi, yi ∈ Xi} ≤ ri for each i = 1, 2, .... Again, a sequence {xm} converges to x in X if and only if it converges in each component. Whether the product is finite or countably infinite: if each Xi is separable, then X is separable; if each Xi is complete, then X is complete; and if each Xi is a compact metric space, then Tychonoff's product theorem implies that X is compact.
2. Bounded, bounded and measurable, and bounded and continuous functions. Given X ⊆ Rn, let Fb(X) consist of all bounded, real-valued functions f : X → R, equipped with the supremum metric ρs defined by ρs(f, g) = sup{|f(x) − g(x)| | x ∈ X}. The metric space Fb(X) with the supremum metric is complete. Note that a sequence {fm} converges to f in Fb(X) if and only if sup{|fm(x) − f(x)| | x ∈ X} → 0. That is, ρs(fm, f) → 0 if and only if {fm} converges uniformly to f, showing that uniform convergence is metrizable. Given measurable X ⊆ Rn, let Mb(X) consist of all mappings f : X → R that are bounded and measurable, equipped with the supremum metric; again, Mb(X) is complete.
Endowing X ⊆ Rn with the relative topology, let Cb(X) consist of all mappings f : X → R that are bounded and continuous, equipped with the supremum metric. If X is compact, then Cb(X) is complete and, by the Stone-Weierstrass theorem, separable. A set K ⊆ Cb(X) is equicontinuous if for all ε > 0, there exists δ > 0 such that for all f ∈ K and all x, y ∈ X, ||x − y|| < δ implies |f(x) − f(y)| < ε. Assuming X is compact, the Arzela-Ascoli theorem states that a subset of Cb(X) is compact if and only if it is bounded, closed, and equicontinuous. Thus, a bounded, closed, equicontinuous set K ⊆ Cb(X) is itself a compact metric space with the supremum metric. A subset K ⊆ Cb(X) is convex if for all f, g ∈ K and all α ∈ (0, 1), the convex combination of functions, αf + (1 − α)g, belongs to K. Equipping any convex K ⊆ Cb(X) with the supremum metric, it becomes a metric mixture space: given sequences {fm} converging to f in K, {gm} converging to g in K, and {αm} converging to α in [0, 1], we have αm fm + (1 − αm)gm → αf + (1 − α)g uniformly. Moreover, the supremum metric on K is convex: for all f, g, φ, ψ ∈ Cb(X) and all α ∈ (0, 1), we have

ρs(αf + (1 − α)g, αφ + (1 − α)ψ)
= sup{|α(f(x) − φ(x)) + (1 − α)(g(x) − ψ(x))| | x ∈ X}
≤ α sup{|f(x) − φ(x)| | x ∈ X} + (1 − α) sup{|g(x) − ψ(x)| | x ∈ X}
= αρs(f, φ) + (1 − α)ρs(g, ψ).

Combining the metric version of Schauder's theorem with previous results, assume X ⊆ Rn is nonempty and compact, and let K ⊆ Cb(X) be a nonempty, convex, bounded, closed, equicontinuous set; then every continuous function f : K → K has at least one fixed point.
The analysis of the space of continuous functions can be extended to metric spaces. If X is a compact metric space, then the space Cb(X) consists of all bounded, continuous mappings f : X → R; and equipped with the supremum metric, it is complete and separable. In fact, consider metric spaces X and Y, with respective metrics ρ1 and ρ2, and assume X is compact. Then define Cb(X, Y) to be the set of all bounded, continuous functions f : X → Y with the supremum metric

ρ(f, g) = sup{ρ2(f(x), g(x)) | x ∈ X}.

If Y is separable, then Cb(X, Y) is separable; and if Y is complete, then Cb(X, Y) is complete.
3. Equivalence classes of measurable functions. Assume a measure μ on Rn and partition the real-valued, measurable functions on Rn into μ-equivalence classes; specifically, given measurable f : Rn → R, let [f] consist of all measurable g : Rn → R such that μ({x ∈ Rn | f(x) ≠ g(x)}) = 0. By convention, however, the equivalence class [f] is simply denoted f, with the caution that f is identified by properties of integrals over sets of positive μ-measure; unless a singleton {x} has positive μ-measure, the function f cannot be evaluated at x. Given measurable X ⊆ Rn, let L^p(X, μ) consist of all equivalence classes of measurable functions f : X → R such that ∫ |f(x)|^p μ(dx) < ∞, where p ∈ [1, ∞). Define the L^p-metric ρp as follows: given f, g ∈ L^p(X, μ), let

ρp(f, g) = (∫ |f(x) − g(x)|^p μ(dx))^{1/p}.

Let L^∞(X, μ) consist of all functions f : X → R that are μ-bounded, and define ρ∞ as follows: given f, g ∈ L^∞(X, μ), let

ρ∞(f, g) = inf{c ∈ R | μ({x ∈ X | |f(x) − g(x)| > c}) = 0}.

If μ is finite and f, g ∈ L^∞(X, μ), then lim_{p→∞} ρp(f, g) = ρ∞(f, g). Furthermore, if μ is finite, then p < q implies L^q(X, μ) ⊆ L^p(X, μ).
Next are two fundamental inequalities for (equivalence classes of) measurable functions. Recall that p and q are conjugate if p, q ∈ (1, ∞) and 1/p + 1/q = 1. Then Hölder's inequality is that if p and q are conjugate, then for all f ∈ L^p(X, μ) and all g ∈ L^q(X, μ), we have

∫ |f(x)g(x)|μ(dx) ≤ (∫ |f(x)|^p μ(dx))^{1/p} (∫ |g(x)|^q μ(dx))^{1/q}.

Note the similarity between Hölder's inequality with p = q = 2 and the Cauchy-Schwartz inequality; not coincidentally, they are very much analogous, with f and g playing the role of vectors x and y, and the integral ∫ f(x)g(x)μ(dx) being similar to the dot product x · y. Of course, if f ∈ L^∞(X, μ), g ∈ L^1(X, μ), and |f(x)| ≤ c for μ-almost all x ∈ X, then we have the analogous inequality, ∫ |f(x)g(x)|μ(dx) ≤ c ∫ |g(x)|μ(dx).
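Hölder's inequality is easy to test numerically; the sketch below samples two arbitrary functions on [0, 1], approximates the integrals by the trapezoid rule (a weighted sum with non-negative weights, so the discrete inequality holds exactly), and uses the conjugate pair p = 3, q = 3/2:

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 10_001)
f = rng.normal(size=x.size)              # two arbitrary bounded "functions" on [0, 1]
g = rng.normal(size=x.size)

p, q = 3.0, 1.5                          # conjugate exponents: 1/3 + 2/3 = 1
lhs = np.trapz(np.abs(f * g), x)
rhs = np.trapz(np.abs(f)**p, x)**(1 / p) * np.trapz(np.abs(g)**q, x)**(1 / q)
print(lhs <= rhs, lhs, rhs)              # Hoelder's inequality holds
```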

And Minkowski's inequality is that if p ∈ [1, ∞), then for all measurable f, g ∈ L^p(X, μ), we have

(∫ |f(x) + g(x)|^p μ(dx))^{1/p} ≤ (∫ |f(x)|^p μ(dx))^{1/p} + (∫ |g(x)|^p μ(dx))^{1/p}.

Analogously, if f, g ∈ L^∞(X, μ), and if |f(x)| ≤ c and |g(x)| ≤ d for μ-almost all x ∈ X, then |f(x) + g(x)| ≤ c + d for μ-almost all x ∈ X. An implication of Minkowski's inequality is that any linear combination of functions in L^p(X, μ) also belongs to L^p(X, μ). Another implication is that for all f, g, h ∈ L^p(X, μ), we have ρp(f, g) ≤ ρp(f, h) + ρp(h, g), delivering the triangle inequality and confirming that ρp is in fact a metric.
The spaces L^p(X, μ), with p ∈ [1, ∞) or p = ∞, are complete when equipped with the metric ρp. For p ∈ [1, ∞), if μ is finite, then L^p(X, μ) is separable, but L^∞(X, μ) is separable if and only if μ has finite support, a very restrictive condition. For the case of X = Rn, μ Lebesgue measure, and p ∈ [1, ∞), the Kolmogorov-Riesz compactness theorem states that a subset K ⊆ L^p(Rn, μ) is compact if and only if all of the following hold:39

(i) K is closed and bounded,

(ii) for every sequence {fm} in K and every sequence {xm} in Rn with xm → 0, we have ∫ |fm(xm + y) − fm(y)|^p dy → 0,

(iii) for every sequence {fm} in K and for every sequence rm → ∞, we have ∫_{Rn \ B_{rm}(0)} |fm(y)|^p dy → 0.

Of course, condition (iii) holds automatically if the functions f ∈ K are zero outside some bounded set common to the functions. Thus, when p ∈ [1, ∞), a set K ⊆ L^p(Rn, μ) satisfying (i)–(iii) is itself a compact metric space with the metric ρp. A subset K ⊆ L^p(Rn, μ) is convex if for all f, g ∈ K and all α ∈ (0, 1), the function αf + (1 − α)g belongs to K. I show in the appendix that, in fact, a convex set K ⊆ L^p(Rn, μ) with the metric ρp is a metric mixture space and that the metric ρp is convex. Combining the metric version of Schauder's theorem with previous results, let p ∈ [1, ∞), and assume K ⊆ L^p(Rn, μ) is a nonempty, convex set satisfying (i)–(iii); then every continuous function φ : K → K has at least one fixed point.
Let L^0(X, μ) consist of μ-equivalence classes of all measurable functions f : X → R, which obviously contains all of the L^p spaces. Assuming μ is finite, define the metric ρm on this space as follows: for all f, g ∈ L^0(X, μ),

ρm(f, g) = ∫ |f(x) − g(x)| / (1 + |f(x) − g(x)|) μ(dx),

making L^0(X, μ) a complete metric space. Of note, a sequence {fm} in L^0(X, μ) converges to f ∈ L^0(X, μ) if and only if fm → f in measure, which shows that convergence in measure (translated to equivalence classes of measurable functions) is metrizable.

39 This result can be found in Theorem 21, p.301, of Dunford and Schwartz (1958), or Theorem 5 of H. Hanche-Olsen and H. Holden (2010) The Kolmogorov-Riesz Compactness Theorem, Expositiones Mathematicae, 28: 385-394.
Recall that we have defined separate, weaker notions of convergence for (equivalence classes of) measurable functions, namely, weak and weak* convergence of order p. Although we do not give a closed form, it turns out that for p ∈ (1, ∞) and p = ∞, given any f ∈ L^p(X, μ) and any r > 0, the concept of weak* convergence on the disc of radius r around f, denoted

D^p_r(f) = {g ∈ L^p(X, μ) | ρp(f, g) ≤ r},

is metrizable. That is, there is a metric ρ_{p,*} defined on D^p_r(f) such that a sequence {gm} converges weak* order p to g in D^p_r(f) if and only if ρ_{p,*}(gm, g) → 0. Moreover, the metric space D^p_r(f) equipped with ρ_{p,*} is compact. Thus, given a set K ⊆ D^p_r(f) that is nonempty, convex, and closed (in the sense that K contains all weak* order p limits of sequences in K), and given a function φ : K → K that is weak* continuous (in the sense that gm → g weak* order p implies φ(gm) → φ(g) weak* order p), the metric version of Schauder's theorem implies that φ has at least one fixed point. These observations hold for the disc in L^1(X, μ), suitably modified to address certain technical complexities. A set K ⊆ L^1(X, μ) is uniformly μ-integrable if for all ε > 0, there exists δ > 0 such that for all f ∈ K and all measurable Y ⊆ X with μ(Y) < δ, we have ∫_Y |f(x)|μ(dx) < ε. If K ⊆ D^1_r(f) is weakly closed (in the sense that K contains all limits of weakly convergent sequences in K) and uniformly μ-integrable, then there is a metric ρ_{1,w} such that {gm} converges weakly to g in K if and only if ρ_{1,w}(gm, g) → 0, and K, equipped with the metric ρ_{1,w}, is compact.40
4. Probability measures. Another special metric space of interest is the set of probability measures with support in a fixed, nonempty, measurable set X ⊆ Rn,

P(X) = {μ | μ is a probability measure on Rn with μ(X) = 1}.

A commonly used measure of distance is the Prohorov metric, denoted ρr and defined as follows: for μ, ν ∈ P(X), let

ρr(μ, ν) = inf{ε > 0 | for all measurable Y ⊆ X, μ(Y) ≤ ν(Y^ε) + ε and ν(Y) ≤ μ(Y^ε) + ε},

where Y^ε = {x ∈ X | there exists y ∈ Y with ||x − y|| < ε}. The collection of all open balls in this metric, together with arbitrary unions of such balls, is sometimes called the weak* topology.41 A sequence {μm} of probability measures in P(X) converges to μ ∈ P(X) if and only if for all measurable Y ⊆ Rn with μ(bd(Y)) = 0, we have μm(Y) → μ(Y). That is, ρr(μm, μ) → 0 if and only if μm → μ weakly, showing that weak convergence of probability measures is metrizable. Interesting properties of the space P(X) equipped with the Prohorov metric are that it is separable; if X is closed, then P(X) is complete; and if X is compact, then so is P(X).42 Note that P(X) is closed with respect to convex combinations: for all μ, ν ∈ P(X) and all α ∈ (0, 1), we can define the measure αμ + (1 − α)ν by (αμ + (1 − α)ν)(Y) = αμ(Y) + (1 − α)ν(Y) for all measurable Y ⊆ Rn. I show in the appendix that, in fact, P(X) is a metric mixture space, and the Prohorov metric is quasi-convex. Combining the metric version of Schauder's theorem with previous results, assume X ⊆ Rn is nonempty and compact; then every continuous function f : P(X) → P(X) has at least one fixed point.

40 See Section 19 for more details on weak and weak* convergence.
41 A caveat is that P(X) is usually defined to consist of probability measures defined on Borel sets. Although the Lebesgue measurable sets include some sets that are not Borel, this discrepancy does not affect this survey, because the Prohorov distance between two measures on Rn equals the distance between the restrictions of the measures to the Borel σ-algebra. Indeed, for every measurable Y, Theorem 10.23 (part 7) yields a μ-measure zero set B such that Y ∪ B is Borel and a ν-measure zero set C such that Y ∪ C is Borel; then Y ∪ (B ∪ C) is Borel, and μ(Y ∪ (B ∪ C)) ≤ ν((Y ∪ (B ∪ C))^ε) + ε implies μ(Y) ≤ ν(Y^ε) + ε. See Section 18 for a general treatment of topological spaces.
42 The above observations hold also when X is a complete and separable metric space, but then we must define probability measures on the Borel measurable sets. With that modification, P(X) equipped with the Prohorov metric is complete and separable; and if in addition X is compact, then so is P(X).
Another metric on P(X) is the total variation metric, denoted ρv and defined as:

ρv(μ, ν) = sup{Σ_{i=1}^k |μ(Yi) − ν(Yi)| | {Y1, ..., Yk} is a finite, measurable partition of X},

where of course a collection Y is a measurable partition of X if it is a partition of X and each Y ∈ Y is measurable. In fact, we can define the sup metric on P(X) by ρs(μ, ν) = sup{|μ(Y) − ν(Y)| | Y ∈ L^n}, and it turns out that for all μ, ν ∈ P(X), we have ρv(μ, ν) = 2ρs(μ, ν); thus, convergence in the total variation metric is equivalent to convergence in the sup metric. Given a sequence {μm} in P(X) and μ ∈ P(X), we therefore have μm → μ uniformly set-wise if and only if ρv(μm, μ) → 0, so that uniform set-wise convergence of probability measures is metrizable. The space P(X) with the total variation metric is complete, whether or not X is closed.43

43 See the discussion on p.161 of Dunford and Schwartz (1958).
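For two probability measures on a finite set, both metrics reduce to simple sums, and the identity ρv = 2ρs can be checked directly (a minimal sketch; the supremum over events is attained at the set where μ puts more mass than ν, and the supremum over partitions is attained at the partition into singletons):

```python
import numpy as np

mu = np.array([0.1, 0.2, 0.3, 0.2, 0.2])   # two probability measures on {1, ..., 5}
nu = np.array([0.3, 0.1, 0.2, 0.2, 0.2])

rho_s = np.sum(np.maximum(mu - nu, 0.0))    # sup metric: attained at {i : mu_i > nu_i}
rho_v = np.sum(np.abs(mu - nu))             # total variation: attained at the partition into singletons
print(rho_v, 2 * rho_s, np.isclose(rho_v, 2 * rho_s))
```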
5. Compact subsets. We can also define a metric on the space of compact subsets of a fixed, nonempty set X ⊆ Rn, denoted

K(X) = {Y ⊆ X | Y is nonempty and compact}.

The measure of distance between sets we use is the Hausdorff metric, defined as follows: for Y, Z ∈ K(X), let

ρh(Y, Z) = max{ max_{y∈Y} min_{z∈Z} ||y − z||, max_{z∈Z} min_{y∈Y} ||y − z|| }.

As described above, the open ball of radius r > 0 is

Br(Y) = {Z ∈ K(X) | ρh(Y, Z) < r},

and a set G ⊆ K(X) is open if for all Y ∈ G, there exists ε > 0 such that Bε(Y) ⊆ G. Here, of course, the open ball Br(Y) is a set of sets: it consists of compact sets that are, in a sense, close to Y. Convergence in the Hausdorff metric has a simple characterization when X is compact.

Indeed, assuming X is compact, let {Ym} be a sequence of sets in K(X), and let Y ∈ K(X). Then the following conditions in conjunction are necessary and sufficient for ρh(Ym, Y) → 0:

(i) for every subsequence {Y_{m_k}} in K(X) and every x ∈ X, if x_{m_k} ∈ Y_{m_k} for all k and x_{m_k} → x, then x ∈ Y,

(ii) for all x ∈ Y, there is a sequence {xm} in X such that xm ∈ Ym for all m and xm → x.

In other words, {Ym} converges to Y in the Hausdorff metric if and only if ls({Ym}) = li({Ym}) = Y. Conditions (i) and (ii) are together referred to as closed convergence of the sequence of sets. Interesting properties of the metric space K(X) equipped with the Hausdorff metric are that it is separable; if X is closed, then K(X) is complete; and if X is compact, then so is K(X). When X ⊆ Rn is convex, K(X) is also convex: given Y, Z ∈ K(X) and α ∈ [0, 1], αY + (1 − α)Z is a nonempty, compact subset of X. In fact, I show in the appendix that K(X) is a metric mixture space and that the Hausdorff metric is quasi-convex. Combining the metric version of Schauder's theorem with previous results, assume X ⊆ Rn is nonempty, compact, and convex; then every continuous function f : K(X) → K(X) has at least one fixed point.
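For finite point sets the two max-min terms in the Hausdorff metric can be computed directly from the matrix of pairwise Euclidean distances; a minimal numpy sketch:

```python
import numpy as np

def hausdorff(Y, Z):
    """Hausdorff distance between finite point sets in R^n (one point per row)."""
    D = np.linalg.norm(Y[:, None, :] - Z[None, :, :], axis=-1)   # pairwise distances
    return max(D.min(axis=1).max(),    # sup over y in Y of dist(y, Z)
               D.min(axis=0).max())    # sup over z in Z of dist(z, Y)

Y = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
Z = np.array([[0.1, 0.0], [1.0, 0.1], [0.0, 0.9], [0.5, 0.5]])
print(hausdorff(Y, Z))   # driven here by the point (0.5, 0.5) of Z, which is far from Y
```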
The above concepts can be extended to subsets of a metric space X with metric ρ, defining the Hausdorff distance using the metric on X. Again, let K(X) be the collection of nonempty, compact subsets of X, and define the Hausdorff metric as

ρh(Y, Z) = max{ max_{y∈Y} min_{z∈Z} ρ(y, z), max_{z∈Z} min_{y∈Y} ρ(y, z) }.

If X is separable, then K(X) is separable; if X is separable and locally compact, then K(X) is complete; and if X is compact, then K(X) is compact.

15  Transition Probabilities

Given a measurable set $Y \subseteq \mathbb{R}^{\ell}$, let $\mathcal{L}(Y)$ be the collection of measurable subsets of $Y$, and let $Z \subseteq \mathbb{R}^m$ be measurable. A mapping $\pi \colon \mathcal{L}(Y) \times Z \to \mathbb{R}$ with values denoted $\pi(B|z)$ is a transition probability (or stochastic kernel or Markov kernel) if (i) for all $z \in Z$, $\pi(\cdot|z)$ is a probability measure on $\mathbb{R}^{\ell}$ such that $\pi(Y|z) = 1$, and (ii) for all measurable $B \subseteq \mathbb{R}^{\ell}$, $\pi(B|z)$ is measurable as a function of $z$. More generally, $\pi(\cdot|z)$ may be any finite measure (not necessarily a probability measure), in which case the mapping is a Young measure. We focus here on transition probabilities, which are useful for modeling Markov chains, behavioral strategies, and stationary Markovian strategies, among other things. A useful fact is that given a transition probability $\pi$ and any bounded, continuous function $f \colon Y \to \mathbb{R}$, the function $g \colon Z \to \mathbb{R}$ defined by $g(z) = \int f(y)\,\pi(dy|z)$ is measurable.44

The transition probability $\pi$ satisfies the Feller property if for every bounded, continuous function $f \colon Y \to \mathbb{R}$, the function $g \colon Z \to \mathbb{R}$ defined by $g(z) = \int f(y)\,\pi(dy|z)$ is continuous, rather than only measurable. Given $\pi$, we can define the mapping $P \colon Z \to \mathcal{P}(Y)$ by $P(z) = \pi(\cdot|z)$. Then, in fact, $\pi$ satisfies the Feller property if and only if $P$ is continuous with the Prohorov metric on $\mathcal{P}(Y)$.45 Thus, if $z_k \to z$ in $Z$, then $\pi(\cdot|z_k) \to \pi(\cdot|z)$ weakly. Therefore, by a version of Lebesgue's dominated convergence theorem in Section 10, if $\{f^k\}$ is a uniformly bounded sequence of functions $f^k \colon Y \to \mathbb{R}$ converging pointwise to the continuous function $f \colon Y \to \mathbb{R}$, then $z_k \to z$ in $Z$ implies $\int f^k(y)\,\pi(dy|z_k) \to \int f(y)\,\pi(dy|z)$.
Because the Feller property is quite useful, we note the following sufficient condition. Let $h \colon Y \times Z \to \mathbb{R}$ be a density for $\pi$ that is Caratheodory, i.e., the function $h_y \colon Z \to \mathbb{R}$ defined by $h_y(z) = h(y, z)$ is continuous for all $y$, and the function $h_z \colon Y \to \mathbb{R}$ defined by $h_z(y) = h(y, z)$ is measurable for all $z$. And assume that the set $\{h_z \mid z \in Z\}$ is dominated by the integrable function $g \colon Y \to \mathbb{R}$. Let $z_k \to z$ in $Z$, and consider any bounded, continuous function $f \colon Y \to \mathbb{R}$. By Lebesgue's dominated convergence theorem, we have

$$\int f(y)\,\pi(dy|z_k) \;=\; \int f(y)\,h(y, z_k)\,dy \;\to\; \int f(y)\,h(y, z)\,dy \;=\; \int f(y)\,\pi(dy|z),$$

which establishes the Feller property.
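A quick numerical illustration of this sufficient condition (my own sketch; the density and test function are hypothetical choices): take $Y = Z = [0,1]$ and $h(y,z) = (1+yz)/(1+z/2)$, which is continuous in $z$ for each $y$, measurable in $y$, bounded, and integrates to one over $Y$ for each $z$; then $z \mapsto \int f(y)\,h(y,z)\,dy$ varies continuously, as a grid evaluation confirms.

```python
import numpy as np

# Hypothetical density on Y = Z = [0, 1], normalized so that its integral over y is 1.
def h(y, z):
    return (1.0 + y * z) / (1.0 + z / 2.0)

def g(z, f, n=10_000):
    # g(z) = integral of f(y) h(y, z) dy, approximated by the midpoint rule
    y = (np.arange(n) + 0.5) / n
    return np.mean(f(y) * h(y, z))

f = lambda y: np.cos(3 * y)          # a bounded, continuous test function
zs = np.linspace(0.0, 1.0, 201)
gs = np.array([g(z, f) for z in zs])
# Small increments in z produce small changes in g(z), consistent with the
# Feller property (continuity of z -> integral of f with respect to pi(.|z)).
print(np.max(np.abs(np.diff(gs))))   # small, and shrinking as the z-grid is refined
```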

Under absolute continuity assumptions, a transition probability $\pi \colon \mathcal{L}(Y) \times Z \to \mathbb{R}$ and a probability measure $\mu$ on $\mathbb{R}^m$ with $\mu(Z) = 1$ determine a probability measure on $\mathbb{R}^n = \mathbb{R}^{\ell} \times \mathbb{R}^m$, where $n = \ell + m$. This probability measure is denoted $\pi(\cdot|z) \otimes \mu$. Given any product $R = P \times Q$ of half-open rectangles $P \subseteq \mathbb{R}^{\ell}$ and $Q \subseteq \mathbb{R}^m$, we specify that

$$(\pi(\cdot|z) \otimes \mu)(R) \;=\; \int_Q \pi(P|z)\,\mu(dz),$$

and we extend this to an arbitrary set $A \subseteq \mathbb{R}^{\ell} \times \mathbb{R}^m$ by taking the infimum of $\sum_i (\pi(\cdot|z) \otimes \mu)(R_i)$ over collections of half-open rectangles covering $A$. Assuming that $\mu$ and $\pi(\cdot|z)$ are absolutely continuous with respect to Lebesgue measure for all $z$,46 the resulting mapping $\pi(\cdot|z) \otimes \mu$ is a probability measure on $\mathbb{R}^n$.47 Note that if the transition probability $\pi(\cdot|z)$ is constant in $z$, so we can write it simply as a probability measure $\nu$ on $\mathbb{R}^{\ell}$, then $\pi(\cdot|z) \otimes \mu = \nu \otimes \mu$, so the operation we have defined generalizes the concept of product measure.

44 The argument for this follows the proof of Theorem 19.7 in Aliprantis and Border (2006). The proof that their condition 3 implies condition 4 is valid if we replace Borel measurability in condition 3 with (Lebesgue) measurability; then instead of Borel measurability, we deduce (Lebesgue) measurability of the map $z \mapsto \int f(y)\,\pi(dy|z)$ for each measurable simple function, and the remainder of the proof carries over.

45 See Theorem 19.14 of Aliprantis and Border (2006).
Assume $\pi$ is a transition probability, as above. Given a measurable subset $A \subseteq \mathbb{R}^n = \mathbb{R}^{\ell} \times \mathbb{R}^m$ and $z \in \mathbb{R}^m$, recall that $A_z = \{y \in \mathbb{R}^{\ell} \mid (y, z) \in A\}$ is the section of $A$ at $z \in \mathbb{R}^m$. For all $z \in Z$ outside a $\mu$-measure zero exceptional set, the set $A_z$ is measurable, and if $\mu$ and $\pi(\cdot|z)$ are absolutely continuous with respect to Lebesgue measure for all $z$, then the function $\phi \colon \mathbb{R}^m \to \mathbb{R}$ defined by $\phi(z) = \pi(A_z|z)$ outside the exceptional set (and equal to zero on the exceptional set) is measurable. Furthermore, the measure of $A$ is

$$(\pi(\cdot|z) \otimes \mu)(A) \;=\; \int \phi(z)\,\mu(dz) \;=\; \int \pi(A_z|z)\,\mu(dz),$$

integrating across sections of the set; in contrast to the case of product measures, we now measure sections using the transition probability, which may itself vary with $z$. In fact, the above construction is quite general: it holds when $\mu$ is any $\sigma$-finite measure and the collection $\{\pi(\cdot|z) \mid z \in Z\}$ is uniformly $\sigma$-finite, i.e., there is a countable collection $\{Y_1, Y_2, \ldots\}$ of measurable subsets of $\mathbb{R}^{\ell}$ such that $\mathbb{R}^{\ell} = \bigcup_{i=1}^{\infty} Y_i$ and for all $i = 1, 2, \ldots$, $\sup\{\pi(Y_i|z) \mid z \in Z\} < \infty$.48
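For a concrete check of the section formula (my own illustration, with all specific choices hypothetical): let $\mu$ be uniform on $Z = [0,1]$, let $\pi(\cdot|z)$ be uniform on $Y = [0,1]$ for every $z$, and let $A = \{(y,z) \mid y \le z\}$. Then $\pi(A_z|z) = z$, so the formula gives $(\pi(\cdot|z)\otimes\mu)(A) = \int_0^1 z\,dz = 1/2$, which a simple simulation reproduces.

```python
import random

random.seed(0)
N = 200_000
hits = 0
for _ in range(N):
    z = random.random()          # z ~ mu, uniform on Z = [0, 1]
    y = random.random()          # y ~ pi(.|z), here uniform on Y = [0, 1] for every z
    if y <= z:                   # membership in A = {(y, z) : y <= z}
        hits += 1

# Monte Carlo estimate of (pi(.|z) (x) mu)(A); the section formula predicts 0.5.
print(hits / N)                  # approximately 0.5
```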

Maintaining that $\pi$ is a transition probability and that $\mu$ and $\pi(\cdot|z)$ are absolutely continuous with respect to Lebesgue measure for all $z$, let $f \colon Y \times Z \to \mathbb{R}$ be measurable. Then a general form of Fubini's theorem states that if $f$ is $\pi(\cdot|z) \otimes \mu$-integrable, then the function $f_z \colon Y \to \mathbb{R}$ defined by $f_z(y) = f(y, z)$ is $\pi(\cdot|z)$-integrable for all $z$ outside an exceptional set of $\mu$-measure zero; the function defined by

$$g(z) \;=\; \int f_z(y)\,\pi(dy|z)$$

outside the exceptional set (and equal to zero on the exceptional set) is $\mu$-integrable; and

$$\int f(y, z)\,(\pi(\cdot|z) \otimes \mu)(d(y, z)) \;=\; \int g(z)\,\mu(dz) \;=\; \int_z \Bigl[\int f_z(y)\,\pi(dy|z)\Bigr]\mu(dz).$$

Conversely, by a general form of Tonelli's theorem, if $f$ is measurable and for $\mu$-almost all $z$, $f_z$ is $\pi(\cdot|z)$-integrable, then the function $g$ defined above is measurable; and if $g$ is $\mu$-integrable, then $f$ is $\pi(\cdot|z) \otimes \mu$-integrable.49

46 We maintain this absolute continuity condition for presentational purposes (to avoid measure-theoretic issues), but this construction can accommodate probability measures $\pi(\cdot|z)$ that are not absolutely continuous, in which case the measure $\pi(\cdot|z) \otimes \mu$ must be defined on the Borel $\sigma$-algebra.

47 At issue is whether the Lebesgue measurable sets are included among the sets measurable with respect to the outer measure $\pi(\cdot|z) \otimes \mu$. The argument is similar to the product measure case (which is found in the appendix) and is omitted.

48 See Theorem 2.6.2 of Ash (1972) Measure, Integration, and Functional Analysis, Academic Press: New York, NY. Although Ash states this result for Borel sets, we accommodate Lebesgue measurable subsets of $\mathbb{R}^n$ using the absolute continuity assumption, as in the product measure case in the appendix.
For a transition probability $\pi$ such that $\mu$ and $\pi(\cdot|z)$ are absolutely continuous with respect to Lebesgue measure for all $z$, it follows from the Radon-Nikodym theorem that for each $z$, there is a density $h_z \colon Y \to \mathbb{R}$ for $\pi(\cdot|z)$. Because these densities are selected independently for each $z$, there is no guarantee that they vary in a measurable way across $z$. In fact, however, there is a measurable function $h \colon Y \times Z \to \mathbb{R}$ such that for all $z$, $h(\cdot, z)$ is a density for $\pi(\cdot|z)$.50 That is, the densities may be specified to be jointly measurable. Then we can write the integral of $f$ with respect to $\pi(\cdot|z) \otimes \mu$ as

$$\int f(y, z)\,(\pi(\cdot|z) \otimes \mu)(d(y, z)) \;=\; \int_z \int_y f(y, z)\,\pi(dy|z)\,\mu(dz) \;=\; \int_z \int_y f(y, z)\,h(y, z)\,dy\,\mu(dz) \;=\; \int f(y, z)\,h(y, z)\,(\lambda \otimes \mu)(d(y, z)),$$

where $\lambda$ denotes Lebesgue measure on $\mathbb{R}^{\ell}$, and we can use the standard version of Fubini's theorem to conclude that the order of integration is irrelevant.
We have seen that a collection $\{\pi(\cdot|z) \mid z \in Z\}$ of probability measures on $\mathbb{R}^{\ell}$ satisfying the measurability conditions of a transition probability, given a probability measure $\mu$ on $\mathbb{R}^m$, induces a probability measure on $\mathbb{R}^n = \mathbb{R}^{\ell} \times \mathbb{R}^m$. Conversely, a probability measure $\nu$ on $\mathbb{R}^n$ induces a probability measure on $\mathbb{R}^m$ via the marginal probability $\nu_m$, denoted $\mu$ in the present context, defined by $\mu(C) = \nu(\mathbb{R}^{\ell} \times C)$ for all measurable $C \subseteq \mathbb{R}^m$. To construct a corresponding collection of probability measures on $\mathbb{R}^{\ell}$, first consider any measurable $A \subseteq \mathbb{R}^n$.
49 See Theorem 2.6.4 of Ash (1972) Measure, Integration, and Functional Analysis, Academic Press: New York, NY. Ash considers Borel measurable functions, but we may apply
his result via Theorem 10.35 of Aliprantis and Border (2006).
50 See Proposition 1.1 of Orey (1971) Limit Theorems for Markov Chain Transition Probabilities, van Nostrand Reinhold: New York, NY.


A function $P(A|\cdot) \colon Z \to \mathbb{R}$ is a conditional probability of $A$ if it takes values between zero and one, it is measurable, and

$$\nu(A \cap (\mathbb{R}^{\ell} \times C)) \;=\; \int_C P(A|z)\,\mu(dz)$$

for every measurable $C \subseteq \mathbb{R}^m$. In this context, we refer to $A$ and $C$ as events, and $P(A|z)$ is interpreted as the conditional probability that the event $A$ occurs given the information that $z$ is the value of the conditioning variable.
A system of conditional probabilities is a collection $\{P(A|\cdot) \mid A \in \mathcal{L}^n\}$ of conditional probabilities for each measurable set. It is known that if $\{A_1, A_2, \ldots\}$ is any countable, pairwise disjoint collection of events, then for all $z$ outside a $\mu$-measure zero set, the conditional probabilities are countably additive:51

$$P\Bigl( \bigcup_{i=1}^{\infty} A_i \Bigm| z \Bigr) \;=\; \sum_{i=1}^{\infty} P(A_i|z).$$

A weakness of this countable additivity property is that the exceptional set can depend on the collection $\{A_1, A_2, \ldots\}$. In other words, there need not be a single $\mu$-measure zero set such that for all $z$ outside this exceptional set, $P(\cdot|z)$ is a probability measure.
In fact, any given probability measure $\nu$ on $\mathbb{R}^n$ admits a regular conditional probability, a mapping $\pi \colon \mathcal{L}^{\ell} \times \mathbb{R}^m \to \mathbb{R}$ such that for $\mu$-almost all $z$, $\pi(\cdot|z)$ is a probability measure on $\mathbb{R}^{\ell}$ and $\{P(A|\cdot) \mid A \in \mathcal{L}^n\}$ is a system of conditional probabilities, where $P(A|z) = \pi(A_z|z)$ for all measurable $A \subseteq \mathbb{R}^n$ and all $z$.52 An implication is that for each measurable $B \subseteq \mathbb{R}^{\ell}$, $\pi(B|z)$ is measurable as a function of $z$, and therefore $\pi$ is a transition probability. Furthermore, we have

$$\nu(A) \;=\; \int P(A|z)\,\mu(dz) \;=\; \int \pi(A_z|z)\,\mu(dz)$$

for all measurable $A \subseteq \mathbb{R}^n$, and therefore $\nu = \pi(\cdot|z) \otimes \mu$. That is, the probability measure $\nu$ induces the marginal $\mu$ and the transition probability $\pi$ (a regular conditional probability), which return the initial probability measure. Note that absolute continuity of $\pi(\cdot|z)$ with respect to Lebesgue measure is not used here.
51 See Theorem 6.4.6 of Ash (1972) Real Analysis and Probability, Academic Press: New
York, NY.
52 See Theorems 10.2.1 and 10.2.2 of Dudley (2002) Real Analysis and Probability, Cambridge University Press: Cambridge, MA. Dudley establishes a mapping $\pi$ such that $\nu(A) = \int \pi(A_z|z)\,\mu(dz)$ for all $A$ belonging to the product $\sigma$-algebra $\mathcal{L}^{\ell} \otimes \mathcal{L}^m$. By Theorem 10.23 (parts 6 and 7) of Aliprantis and Border (2006), for each measurable $A \subseteq \mathbb{R}^n$, there is a set $C \in \mathcal{L}^{\ell} \otimes \mathcal{L}^m$ (in fact, a Borel set) with $\nu$-measure zero such that $A \setminus C$ and $A \cup C$ belong to $\mathcal{L}^{\ell} \otimes \mathcal{L}^m$ and $A \setminus C \subseteq A \subseteq A \cup C$. Then $\pi((A \setminus C)_z|z)$ and $\pi((A \cup C)_z|z)$ are measurable functions of $z$ and equal for $\mu$-almost all $z$; moreover, these functions are equal to $\pi(A_z|z)$ for $\mu$-almost all $z$, and therefore the latter function is measurable. Finally, it follows that $\nu(A)$ is equal to the integral across sections.


Transition probabilities are often used to model discrete time Markov chains, in which case we equate $Y = Z$ and view an element $z \in Z$ as the state of a system; then $\pi(\cdot|z)$ specifies the probability distribution over next period's state, conditional on the current state being $z$. Given a transition probability $\pi \colon \mathcal{L}(Z) \times Z \to \mathbb{R}$, a distribution $\mu \in \mathcal{P}(Z)$ over the current state induces a distribution $T(\mu)$ over next period's state as follows: for all measurable subsets $A \subseteq \mathbb{R}^m$,

$$T(\mu)(A) \;=\; \int \pi(A|z)\,\mu(dz).$$

The mapping $T$ is referred to as the adjoint of the transition probability.
A probability measure $\mu$ is an invariant distribution (or stationary distribution) if $T(\mu) = \mu$, i.e., it is a fixed point of $T$. Note that if $\pi$ satisfies the Feller property, then the mapping $T \colon \mathcal{P}(Z) \to \mathcal{P}(Z)$ is continuous with the Prohorov metric on $\mathcal{P}(Z)$. Indeed, let $\mu_k \to \mu$ weakly, and let $f \colon Z \to \mathbb{R}$ be any bounded, continuous function. Then

$$\int f(z)\,T(\mu_k)(dz) \;=\; \int_{z'} \Bigl[\int_z f(z)\,\pi(dz|z')\Bigr]\mu_k(dz') \;\to\; \int_{z'} \Bigl[\int_z f(z)\,\pi(dz|z')\Bigr]\mu(dz') \;=\; \int f(z)\,T(\mu)(dz),$$

where the equalities follow from the definition of $T$ and linearity of the integral in the integrating measure, and the limit follows from the Feller property (the inner integral is a bounded, continuous function of $z'$) and weak convergence. In fact, the converse holds as well.53 By the metric version of Schauder's fixed point theorem, if $Z$ is compact and $\pi$ satisfies the Feller property, so $\mathcal{P}(Z)$ is compact and $T$ is continuous, then $\pi$ admits at least one invariant distribution.
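As a finite-state illustration (my own sketch, not from the survey): on $Z = \{1, 2, 3\}$ a transition probability is simply a row-stochastic matrix, the adjoint acts on row vectors by $T(\mu) = \mu P$, and an invariant distribution is a fixed point of that map.

```python
import numpy as np

# Transition probability on the finite state space {0, 1, 2}:
# P[z, :] is the probability distribution pi(.|z) over next period's state.
P = np.array([[0.5, 0.4, 0.1],
              [0.2, 0.6, 0.2],
              [0.3, 0.3, 0.4]])

def adjoint(mu, P):
    # T(mu)(A) = sum_z pi(A|z) mu(z); for finite Z this is the row vector mu @ P.
    return mu @ P

mu = np.array([1.0, 0.0, 0.0])       # start with all mass on state 0
for _ in range(200):                 # iterate the adjoint
    mu = adjoint(mu, P)

print(mu)                                # approximate invariant distribution
print(np.allclose(mu, adjoint(mu, P)))   # True: T(mu) = mu up to numerical error
```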
To explore further the dynamic interpretation of transition probabilities, we consider a transition probability $\pi \colon \mathcal{L}(Z) \times Z \to \mathbb{R}$ satisfying Doeblin's condition: there exist a finite measure $\varphi$ on $\mathbb{R}^m$ and $\varepsilon > 0$ such that for every measurable set $C \subseteq \mathbb{R}^m$ and every $z \in Z$, $\varphi(C) \le \varepsilon$ implies $\pi(C|z) \le 1 - \varepsilon$.54 In words, this requires that the transition probability not be too concentrated on small sets, where the meaning of small is given by $\varphi$ and $\varepsilon$. (Note that the measure $\varphi$ in Doeblin's condition must satisfy $\varphi(Z) > 0$.) Because Doeblin's condition is quite useful, we note that it is satisfied if $\pi$ has a density $h \colon Z \times Z \to \mathbb{R}$ that is bounded. Indeed, if $b \ge 1$ is an upper bound for $h$, then we can set $\varphi = \lambda$ (Lebesgue measure) and $\varepsilon = 1/2b$; then given measurable $C$ with $\varphi(C) \le \varepsilon$, we have $\lambda(C) \le 1/2b$, and therefore

$$\pi(C|z) \;=\; \int_C h(y, z)\,dy \;\le\; \lambda(C)\,b \;\le\; \frac{1}{2} \;\le\; 1 - \frac{1}{2b} \;=\; 1 - \varepsilon$$

for all $z$, as required.

53 See Theorem 19.14 of Aliprantis and Border (2006).

54 This is actually stronger than the standard definition of Doeblin's condition, which imposes the current condition on $\pi^r(C|z)$ for some $r = 1, 2, \ldots$, where the latter is the probability, given the starting point $z$, that the state transitions to the set $C$ in $r$ steps. A formal definition is provided below.
Then a measurable set $C \subseteq Z$ is invariant (or absorbing or self-supporting) if for all $z \in C$, we have $\pi(C|z) = 1$. And an invariant set $E$ is ergodic if it is nonempty and contains no invariant set $C$ with smaller $\varphi$-measure, i.e., there is no invariant set $C \subseteq E$ with $\varphi(C) < \varphi(E)$. Every invariant set, and therefore every ergodic set, has $\varphi$-measure of at least $\varepsilon$. Furthermore, if $E$ and $E'$ are ergodic sets, then it must be either that $E$ and $E'$ are equivalent with respect to $\varphi$, i.e., $\varphi(E \setminus E') = \varphi(E' \setminus E) = 0$, or that they are disjoint with respect to $\varphi$, i.e., $\varphi(E \cap E') = 0$. As a consequence, the number of meaningfully distinct ergodic sets is bounded above by $\varphi(Z)/\varepsilon$. Furthermore, there is at least one ergodic set.55 An invariant distribution $\mu$ is an ergodic distribution if there is an ergodic set $E$ such that $\mu(E) = 1$.
Say a set $F$ is transient if for all $z \in Z$, we have $\pi(F|z) < 1$. Then $Z$ can be partitioned into a transient set $F$ and a finite number $k$ of ergodic sets, $E^1, \ldots, E^k$, such that every ergodic set $E$ is equivalent to some $E^i$, $i = 1, \ldots, k$, with respect to $\varphi$. For each ergodic set $E^i$, there is a unique ergodic distribution $\mu^i$ satisfying $\mu^i(E^i) = 1$. Moreover, every invariant distribution is a convex combination of the ergodic distributions $\mu^1, \ldots, \mu^k$ associated with these ergodic sets: if $\mu$ is an invariant distribution, then there exist non-negative weights $\alpha_1, \ldots, \alpha_k$ with $\sum_{i=1}^{k} \alpha_i = 1$ such that $\mu = \sum_{i=1}^{k} \alpha_i \mu^i$.
To complete the analysis of dynamics, define the transition probabilities $\pi^r$, $r = 1, 2, \ldots$, as follows: $\pi^1 = \pi$, and for $r \ge 2$, for all measurable $C \subseteq \mathbb{R}^m$ and all $z \in Z$, we specify $\pi^r(C|z) = \int \pi^{r-1}(C|z')\,\pi(dz'|z)$. That is, $\pi^r(C|z)$ is the probability that given initial state $z$, the state belongs to the set $C$ after $r$ steps of the process. Then, beginning from any state, the probability that the state reaches an ergodic set (and remains there) approaches one over time and at a geometric rate: there exist constants $c > 0$ and $d \in [0, 1)$ such that for all $r$ and all $z \in Z$, we have

$$\pi^r\Bigl( \bigcup_{i=1}^{k} E^i \Bigm| z \Bigr) \;\ge\; 1 - c\,d^r.$$

55 See pp.206–207 of Doob (1953) Stochastic Processes, John Wiley and Sons: New York, NY.

Let $T^r$ denote the $r$-fold composition $T \circ \cdots \circ T$, so that given an initial distribution $\mu$ over states, the distribution in $r$ steps is $T^r(\mu)$. Beginning with any distribution over states, the average distribution over future states converges (in a strong sense) to an invariant distribution: for every probability measure $\mu$ with $\mu(Z) = 1$, there is an invariant distribution $\nu$ such that the average distribution over $r$ steps, i.e., $\frac{1}{r}\sum_{t=1}^{r} T^t(\mu)$, converges to $\nu$ as $r$ goes to infinity; or equivalently,

$$\rho_v\Bigl( \frac{1}{r}\sum_{t=1}^{r} T^t(\mu),\; \nu \Bigr) \;\to\; 0,$$

where $\rho_v$ is the total variation metric on $\mathcal{P}(Z)$.56 In fact, if the initial distribution $\mu$ puts probability one on some ergodic set, say $\mu(E^i) = 1$, then the limit distribution will be $\nu = \mu^i$, the ergodic distribution corresponding to the ergodic set.
The reason the above limit results are stated for the average distributions is that the Markov chain can cycle, even within an ergodic set: for example, if $Z = \{z_1, z_2\}$ and $\pi(\{z_2\}|z_1) = \pi(\{z_1\}|z_2) = 1$, then starting from $z_1$, the chain alternates endlessly between the two states. To preclude such cycles, it is sufficient to add the Feller property and the following mixing condition: for every ergodic set $E^i$, there exists $z_i \in E^i$ such that for every open set $G \subseteq \mathbb{R}^m$ with $z_i \in G$ and for every $z \in E^i$, we have $\pi(G|z) > 0$. Under the latter condition, in combination with the Doeblin and Feller properties, the above limit results can be stated not for the sequence of average distributions, but for the sequence of distributions over states in each period, as in

$$\rho_v(T^r(\mu), \nu) \;\to\; 0,$$

where $\mu$ is an initial distribution with $\mu(Z) = 1$ and $\nu$ is invariant. And if we strengthen the mixing condition so that there exists $z^* \in Z$ such that for every open set $G$ containing $z^*$ and every $z \in Z$, we have $\pi(G|z) > 0$, then there is a unique ergodic set (up to $\varphi$-equivalences) and a unique invariant distribution, say $\nu$; then starting from any initial distribution over states, the induced distribution over states converges in the total variation metric to $\nu$, providing the strongest possible ergodicity properties.
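The two-state cycling example above is easy to simulate (my own sketch): the iterates $T^r(\mu)$ oscillate forever, while the Cesàro averages $\frac{1}{r}\sum_{t=1}^{r} T^t(\mu)$ converge to the invariant distribution $(1/2, 1/2)$, which is why the general limit results are stated for averages.

```python
import numpy as np

# Two-state chain that cycles: pi({z2}|z1) = pi({z1}|z2) = 1.
P = np.array([[0.0, 1.0],
              [1.0, 0.0]])

mu = np.array([1.0, 0.0])    # start with probability one on z1
records, running_sum = [], np.zeros(2)
for t in range(1, 11):
    mu = mu @ P              # T^t(mu)
    running_sum += mu
    records.append((mu.copy(), running_sum / t))

for mu_t, avg_t in records[-3:]:
    print(mu_t, avg_t)
# The iterates alternate between (0,1) and (1,0); the averages approach (0.5, 0.5),
# the unique invariant distribution.
```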
Note that these strong ergodicity properties hold if $\pi$ has a density $h \colon Z \times Z \to \mathbb{R}$ such that (i) $h$ is Caratheodory, (ii) $\{h_z \mid z \in Z\}$ is dominated by an integrable $g \colon Z \to \mathbb{R}$, (iii) $h$ is bounded, (iv) $Z$ has nonempty interior in $\mathbb{R}^m$, and (v) $h$ is everywhere positive, i.e., $h(y, z) > 0$ for all $y, z \in Z$. Of course, (i) and (ii) together imply the Feller property, and (iii) yields Doeblin's condition. To see that the strong mixing condition obtains, choose any $z^* \in \mathrm{int}(Z)$, any open set $G$ containing $z^*$, and any $z \in Z$. Since $z^*$ is interior to $Z$, there is an open set $G' \subseteq \mathbb{R}^m$ such that $z^* \in G' \subseteq Z$. Then $G \cap G'$ is a nonempty open set (it contains $z^*$), which accordingly has positive Lebesgue measure, and therefore $\pi(G|z) \ge \pi(G \cap G'|z) = \int_{G \cap G'} h(y, z)\,dy > 0$, as required.

56 See Theorem 5.6, discussion on pp.207–208, and Theorem 5.7 of Doob (1953) Stochastic Processes, John Wiley and Sons: New York, NY.

To consider the metric properties of transition probabilities, let $Y \subseteq \mathbb{R}^{\ell}$ and $Z \subseteq \mathbb{R}^m$ be measurable, and let $\mu$ be a probability measure on $\mathbb{R}^m$ with $\mu(Z) = 1$. A mapping $f \colon Y \times Z \to \mathbb{R}$ is a Caratheodory integrand if it is a Caratheodory function, so (i) for all $y \in Y$, the function $f_y \colon Z \to \mathbb{R}$ defined by $f_y(z) = f(y, z)$ is measurable, and (ii) for all $z \in Z$, the function $f_z \colon Y \to \mathbb{R}$ defined by $f_z(y) = f(y, z)$ is continuous, and if in addition, (iii) there is a $\mu$-integrable mapping $g \colon Z \to \mathbb{R}$ such that for all $z \in Z$, $\sup\{|f(y, z)| \mid y \in Y\} \le g(z)$. An implication is that for all $z \in Z$, the function $f_z$ is bounded and continuous.
Let $\pi_k \colon \mathcal{L}(Y) \times Z \to \mathbb{R}$, $k = 1, 2, \ldots$, and $\pi \colon \mathcal{L}(Y) \times Z \to \mathbb{R}$ be transition probabilities. Generalizing the notion of weak convergence of measures, we say the sequence $\{\pi_k\}$ of transition probabilities converges weakly to $\pi$ if for every Caratheodory integrand $f$, we have

$$\int_z \int_y f(y, z)\,\pi_k(dy|z)\,\mu(dz) \;\to\; \int_z \int_y f(y, z)\,\pi(dy|z)\,\mu(dz).$$
Weak convergence of the sequence $\{\pi_k\}$ of transition probabilities to $\pi$ is equivalent to the following condition:57 for every measurable $C \subseteq \mathbb{R}^m$ and every bounded, continuous function $f \colon Y \to \mathbb{R}$, we have

$$\int_C \int_y f(y)\,\pi_k(dy|z)\,\mu(dz) \;\to\; \int_C \int_y f(y)\,\pi(dy|z)\,\mu(dz).$$

Other possible terminology for this form of convergence is narrow convergence or weak-strong convergence.
To gain some insight into weak convergence of transition probabilities, reconsider the sawtooth sequence in Figure 9, but transform each element $f_k$ of the sequence into a transition probability $\pi_k \colon \mathcal{L}^1 \times [0, 1] \to \mathbb{R}$ as follows: for all $z \in [0, 1]$, $\pi_k(\cdot|z)$ places probability one on $f_k(z)$, i.e., $\pi_k(\{f_k(z)\}|z) = 1$. Whereas the sequence of functions $\{f_k\}$ converges weakly to the function that takes a constant value equal to one, this sequence of transition probabilities converges weakly to the transition probability $\pi$ that places probability one half on 0 and 2, i.e., for all $z \in [0, 1]$, $\pi(\{0\}|z) = \pi(\{2\}|z) = \tfrac{1}{2}$.
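A numerical check of this example (my own sketch; since Figure 9 is not reproduced here, I use as a stand-in a sequence that alternates between the values 0 and 2 on successive subintervals of width $1/(2k)$, which likewise converges weakly to the constant function 1): the integral $\int_0^1 \int f(y)\,\pi_k(dy|z)\,dz = \int_0^1 f(f_k(z))\,dz$ equals $\tfrac12 f(0) + \tfrac12 f(2)$, the value under the limiting kernel, and not $f(1)$, the value at the weak limit of the functions themselves.

```python
import numpy as np

def f_k(z, k):
    # Stand-in for the Figure 9 functions (an assumption): alternates between 0 and 2
    # on subintervals of width 1/(2k), so f_k converges weakly to the constant 1.
    return 0.0 if int(2 * k * z) % 2 == 0 else 2.0

f = lambda y: y ** 2                       # a bounded, continuous test function
z = (np.arange(100_000) + 0.5) / 100_000   # midpoint grid on Z = [0, 1]

for k in (1, 10, 100):
    # integral over z of (integral of f with respect to pi_k(.|z)) = mean of f(f_k(z))
    print(k, np.mean([f(f_k(zi, k)) for zi in z]))
# Output is 2.0 = (f(0) + f(2)) / 2 for every k, the value under the limiting kernel
# pi({0}|z) = pi({2}|z) = 1/2, rather than f(1) = 1.
```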
Given measurable $Y \subseteq \mathbb{R}^{\ell}$ and $Z \subseteq \mathbb{R}^m$ and probability measure $\mu$ on $\mathbb{R}^m$ with $\mu(Z) = 1$, let $\mathcal{R}(Y, Z, \mu)$ denote the set of equivalence classes of transition probabilities $\pi \colon \mathcal{L}(Y) \times Z \to \mathbb{R}$, where we identify any two transition probabilities $\pi$ and $\pi'$ that differ only on a $\mu$-measure zero set, i.e., $\mu(\{z \in Z \mid \pi(\cdot|z) \ne \pi'(\cdot|z)\}) = 0$. Then weak convergence of transition probabilities is metrizable in the sense that there is a metric, $\rho_r$, on $\mathcal{R}(Y, Z, \mu)$ such that $\pi_k \to \pi$ weakly if and only if $\rho_r(\pi_k, \pi) \to 0$.58 In case the transition probabilities are constant in $z$, so they are simply probability measures, the metric reduces to the Prohorov metric, justifying the similar notation. Indeed, there is a countable collection $\{f_i\}$ of bounded, continuous functions $f_i \colon Y \to \mathbb{R}$ such that we can set

$$\rho_r(\pi, \pi') \;=\; \sum_{i=1}^{\infty} \sum_{j=1}^{\infty} \frac{1}{2^{i+j}\,\mu(B_j)} \left| \int_{z \in B_j} \int_y f_i(y)\,\pi(dy|z)\,\mu(dz) \;-\; \int_{z \in B_j} \int_y f_i(y)\,\pi'(dy|z)\,\mu(dz) \right|,$$

where $B_1, B_2, \ldots$ are the relatively open balls in $Z$ defined by $B_j = Z \cap B_r(x)$ for rational radius $r$ and vectors $x$ with rational coordinates. As with the case of probability measures, if $Y$ is compact, then the set $\mathcal{R}(Y, Z, \mu)$ endowed with the metric $\rho_r$ is compact.59 Assuming $Y$ is compact, I show in the appendix that the set $\mathcal{R}(Y, Z, \mu)$ of transition probabilities equipped with the metric $\rho_r$ is a metric mixture space and that the metric $\rho_r$ is convex. With the metric version of Schauder's theorem, it follows that every continuous function $f \colon \mathcal{R}(Y, Z, \mu) \to \mathcal{R}(Y, Z, \mu)$ has at least one fixed point.

57 See Theorem 2.2 of Balder (1988) Generalized Equilibrium Results for Games with Incomplete Information, Mathematics of Operations Research, 13: 265–276.

16  Continuous Correspondences

Given metric spaces $X$ and $Y$, a correspondence from $X$ to $Y$, denoted $\Gamma \colon X \rightrightarrows Y$, is a mapping from $X$ to subsets of $Y$, i.e., $\Gamma(x) \subseteq Y$ for all $x \in X$. As a special case, we may view a function $f \colon X \to Y$ as a correspondence that takes only singleton sets as values; it is associated to the correspondence $\Gamma \colon X \rightrightarrows Y$ defined by $\Gamma(x) = \{f(x)\}$ for all $x \in X$. A correspondence $\Gamma \colon X \rightrightarrows Y$ has nonempty values (resp. closed values, compact values) if for all $x \in X$, $\Gamma(x) \ne \emptyset$ (resp. $\Gamma(x)$ is closed, $\Gamma(x)$ is compact). In contrast to the case of functions, there are two main notions of continuity of correspondences, upper and lower hemicontinuity, each generalizing the usual notion for functions.
The correspondence $\Gamma \colon X \rightrightarrows Y$ is upper hemicontinuous at $x \in X$ if for every open set $V \subseteq Y$ with $\Gamma(x) \subseteq V$, there is an open set $U \subseteq X$ such that $x \in U$ and for all $z \in U$, we have $\Gamma(z) \subseteq V$. It is upper hemicontinuous if it is upper hemicontinuous at every $x \in X$. See Figure 11. Equivalently, the correspondence is upper hemi-continuous if for every closed set $F \subseteq Y$, the set $\{x \in X \mid \Gamma(x) \cap F \ne \emptyset\}$ is closed. A correspondence with closed values is upper hemicontinuous only if for all $x \in X$, all $y \in Y$, and all sequences $\{x_m\}$ converging to $x$ in $X$ and $\{y_m\}$ converging to $y$ in $Y$ such that $y_m \in \Gamma(x_m)$ for all $m$, we have $y \in \Gamma(x)$.60 Assuming $Y$ is compact, the converse direction holds as well. Recall that the product space $X \times Y$ can be endowed with the product metric, making it a metric space, and that a sequence $\{(x_m, y_m)\}$ converges to $(x, y)$ in the product space if and only if $x_m \to x$ in $X$ and $y_m \to y$ in $Y$.

Figure 11: Upper hemicontinuity

58 See Proposition 4.1.1 of Balder (2002) A Unifying Pair of Cournot-Nash Equilibrium Existence Results, Journal of Economic Theory, 102: 437–470.

59 See Theorem 4.1.1 of Balder (2002) A Unifying Pair of Cournot-Nash Equilibrium Existence Results, Journal of Economic Theory, 102: 437–470.

Defining the graph of $\Gamma$ as

$$\mathrm{graph}(\Gamma) \;=\; \{(x, y) \in X \times Y \mid y \in \Gamma(x)\},$$

the correspondence has closed graph if, fittingly, its graph is closed in the product space X Y . Assuming Y is compact, a correspondence with closed
values is upper hemicontinuous if and only if it has closed graph. By a version
of Weierstrass theorem for correspondences, the image of a compact set under
an upper hemi-continuous correspondence with compact values is compact: if
: X Y is upper hemi-continuous
S and has compact values, then for all compact K X, the image (K) = {(x) | x K} is a compact subset of Y .

The correspondence $\Gamma \colon X \rightrightarrows Y$ is lower hemicontinuous at $x \in X$ if for every open set $V \subseteq Y$ with $\Gamma(x) \cap V \ne \emptyset$, there is an open set $U \subseteq X$ such that $x \in U$ and for all $z \in U$, $\Gamma(z) \cap V \ne \emptyset$. It is lower hemicontinuous if it is lower hemicontinuous at every $x \in X$. See Figure 12. Equivalently, the correspondence is lower hemi-continuous if for every closed set $F \subseteq Y$, the set $\{x \in X \mid \Gamma(x) \subseteq F\}$ is closed. For yet another equivalence, the correspondence is lower hemi-continuous if and only if for all $x \in X$, all $y \in \Gamma(x)$, and all sequences $\{x_m\}$ converging to $x$ in $X$, there exist a subsequence $\{x_{m_k}\}$ of $\{x_m\}$ and a corresponding sequence $\{y_k\}$ in $Y$ such that $y_k \to y$ and for all $k$, $y_k \in \Gamma(x_{m_k})$.

Figure 12: Lower hemicontinuity

60 This follows from Aliprantis and Border's (2006) Corollary 3.21, which implies that every metric space is a regular topological space, and their Theorem 17.10.

The correspondence $\Gamma$ has open graph if, fittingly, its graph is open in the product space $X \times Y$. Every correspondence with open graph is lower hemi-continuous. In fact, every correspondence with open lower sections, i.e., for all $y \in Y$, $\{x \in X \mid y \in \Gamma(x)\}$ is open, is lower hemi-continuous.
A correspondence : X Y is continuous at x X if it is both upper and
lower hemi-continuous at x. It is continuous if it is continuous at all x X, i.e.,
it is both upper and lower hemi-continuous. If has singleton values, so there
is a function f : X Y such that (x) = {f (x)} for all x X, then upper
hemi-continuity, lower hemi-continuity, and continuity of the correspondence
are equivalent conditions, and they in turn are equivalent to continuity of the
function f . Assuming Y is compact, and letting K(Y ) denote the nonempty,
compact subsets of Y with the Hausdorff metric, : X Y is continuous with
closed values if and only if the function F : X K(Y ) defined by F (x) = (x)
is continuous.
Upper hemi-continuity is preserved by finite unions, and under weak conditions by arbitrary intersections and products: letting $X$ and $Y$ be metric spaces, and letting $\{\Gamma_i \mid i \in I\}$ be a collection of upper hemi-continuous correspondences $\Gamma_i \colon X \rightrightarrows Y$ indexed by elements of the set $I$,61

• if $I = \{1, \ldots, m\}$ is finite, then the correspondence $\Gamma \colon X \rightrightarrows Y$ defined by $\Gamma(x) = \bigcup_{i=1}^{m} \Gamma_i(x)$ is upper hemi-continuous,

• if each $\Gamma_i$ has closed graph, or if each $\Gamma_i$ has closed values and at least one is compact-valued, then the correspondence $\Gamma \colon X \rightrightarrows Y$ defined by $\Gamma(x) = \bigcap_{i \in I} \Gamma_i(x)$ is upper hemi-continuous,

• if $I$ is countable and each $\Gamma_i$ has compact values, then the correspondence $\Gamma \colon X \rightrightarrows Y^I$ defined by $\Gamma(x) = \prod_{i \in I} \Gamma_i(x)$ is upper hemi-continuous (with the product metric on $Y^I$).

In fact, the last result extends to arbitrary products with the product topology on $Y^I$, defined in Section 18. If $\Gamma \colon X \rightrightarrows Y$ is upper hemi-continuous, then the pointwise closure of $\Gamma$ is upper hemi-continuous; that is, the correspondence $\overline{\Gamma} \colon X \rightrightarrows Y$ defined by $\overline{\Gamma}(x) = \mathrm{clos}(\Gamma(x))$ is upper hemi-continuous.62

61 The result for intersections uses Aliprantis and Border's (2006) Corollary 3.21, which implies that every metric space is a regular topological space, and their Theorem 17.25.
When Y = Rm and the correspondence : X Rm is upper hemi-continuous
and has compact values, the pointwise convex hull of is upper hemi-continuous;
that is, the correspondence : X Y defined by (x) = conv((x)) is upper
hemi-continuous. This uses the fact that the convex hull of a compact subset of
Rn is compact. In a general metric mixture space, however, we saw in Section
13 that this property does not always hold. I show in the appendix that this
result remains true when Y is a metric mixture space, : X Y is upper
hemi-continuous, and (x) = conv((x)) is compact for all x X.
Lower hemi-continuity is preserved by arbitrary unions and finite products: letting $\{\Gamma_i \mid i \in I\}$ be a collection of lower hemi-continuous correspondences $\Gamma_i \colon X \rightrightarrows Y$ indexed by elements of the set $I$,

• the correspondence $\Gamma \colon X \rightrightarrows Y$ defined by $\Gamma(x) = \bigcup_{i \in I} \Gamma_i(x)$ is lower hemi-continuous,

• if $I = \{1, \ldots, k\}$ is finite, then the correspondence $\Gamma \colon X \rightrightarrows Y^k$ defined by $\Gamma(x) = \prod_{i=1}^{k} \Gamma_i(x)$ is lower hemi-continuous (with the product metric on $Y^k$).

Intersections, even finite ones, of lower hemi-continuous correspondences need not be lower hemi-continuous. Given a finite collection of correspondences $\Gamma_i$, $i = 1, \ldots, n$, with open lower sections, however, the intersection of correspondences $\Gamma = \bigcap_{i=1}^{n} \Gamma_i$ will have open lower sections, and therefore it is lower hemi-continuous.

62 This relies on Lemma 17.22 of Aliprantis and Border (2006), with the fact that every metric space is normal, as established in their Corollary 3.21.
If the correspondence : X Y is lower hemi-continuous, then the pointwise
closure of is lower hemi-continuous; that is, the correspondence : X
Y defined by (x) = clos((x)) is lower hemi-continuous. Conversely, if the
pointwise closure of a correspondence is lower hemi-continuous, then so is .
When Y = Rm and the correspondence : X Rm is lower hemi-continuous,
the pointwise convex hull of is lower hemi-continuous; that is, the correspondence : X Y defined by (x) = conv((x)) is lower hemi-continuous. In
fact, I show in the appendix that this result remains true when Y is a metric
mixture space and : X Y is lower hemi-continuous.
For an example of an upper hemi-continuous correspondence, let X and P be
metric spaces, and let f : X P Rm be a continuous function. Define the
correspondence $\Gamma \colon P \rightrightarrows X$ by

$$\Gamma(p) \;=\; \{x \in X \mid f(x, p) = 0\}.$$

Assuming $X$ is compact, the correspondence $\Gamma$ is upper hemi-continuous. In


words, the zeroes of a continuous function vary upper hemi-continuously with
parameters. This construction does not generally yield a lower hemi-continuous
correspondence, but assume that X Rm is open, that f is not only continuous
but for all p P , the function fp (x) = f (x, p) is a C 1 function of x, and that
for all p P , zero is a regular value of f . Then the correspondence defined
above is lower hemi-continuous. Indeed, consider any p P and any open set
V Rm with V (p) 6= , so there exists x X with fp (x) = 0. Since zero
is a regular value, the derivative Dfp (x) is non-singular, and then the metric
version of the implicit function theorem yields open sets V Rm and Q P
with (x, p) V Q and a continuous mapping g : Q V such that g(p) = x
and for all (x , p ) V Q, f (x , p ) = 0 if and only if g(p ) = x . Setting
U = g 1 (V ) Q, we have an open set such that for all p U , f (g(p ), p ) = 0,
so g(p ) V (p ), as required for lower hemi-continuity.
For another example, let $X$ be a metric space, let $Y \subseteq \mathbb{R}^m$ be compact, and let $\Gamma \colon X \rightrightarrows Y$ be upper hemi-continuous with nonempty, closed values. Then the correspondence $\Phi \colon X \rightrightarrows \mathcal{P}(Y)$ defined by

$$\Phi(x) \;=\; \{\mu \in \mathcal{P}(Y) \mid \mu(\Gamma(x)) = 1\} \;=\; \mathcal{P}(\Gamma(x))$$

is upper hemi-continuous with nonempty, compact, convex values with the Prohorov metric on $\mathcal{P}(Y)$. Adding the assumption that $X$ is a separable metric space, a stronger result is possible: then $\Gamma$ is upper hemi-continuous if and only if $\Phi$ is upper hemi-continuous; and $\Gamma$ is lower hemi-continuous if and only if $\Phi$ is lower hemi-continuous.63 For another special case, let $X \subseteq \mathbb{R}^n$ be measurable, and define the support correspondence $S \colon \mathcal{P}(X) \rightrightarrows X$ by

$$S(\mu) \;=\; \mathrm{supp}(\mu).$$

Then $S$ is lower hemi-continuous and has closed values with the Prohorov metric on $\mathcal{P}(X)$.
Given metric space $X$, a special case of Michael's selection theorem is that if $\Gamma \colon X \rightrightarrows \mathbb{R}^m$ is lower hemi-continuous and has nonempty, closed, convex values, then it admits a continuous selection, i.e., there is a continuous function $f \colon X \to \mathbb{R}^m$ such that for all $x \in X$, we have $f(x) \in \Gamma(x)$.

Let $X$ and $P$ be metric spaces, let $f \colon X \times P \to \mathbb{R}$ be a continuous function, and let $\Gamma \colon P \rightrightarrows X$ be a continuous correspondence with nonempty, compact values. Then Berge's theorem of the maximum (or simply the maximum theorem) states that the correspondence $M \colon P \rightrightarrows X$ defined by

$$M(p) \;=\; \arg\max_{x \in \Gamma(p)} f(x, p)$$

is upper hemi-continuous and has nonempty, compact values; furthermore, the maximized value function $v \colon P \to \mathbb{R}$ defined by $v(p) = \max_{x \in \Gamma(p)} f(x, p)$ is continuous.
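As a quick numerical illustration of the maximum theorem (my own sketch; the constraint set, objective, and grid are hypothetical choices): take $P = [0,1]$, the constant correspondence $\Gamma(p) = [0,1]$ discretized to a grid, and $f(x,p) = -(x-p)^2$; the brute-force value function varies continuously in $p$, and the maximizer set moves without sudden explosions, as Berge's theorem predicts.

```python
import numpy as np

X = np.linspace(0.0, 1.0, 1001)          # grid standing in for the compact set X = [0, 1]
f = lambda x, p: -(x - p) ** 2           # a continuous objective

def argmax_and_value(p):
    # Gamma(p) = [0, 1] for every p (a continuous, compact-valued correspondence);
    # brute-force maximization over the grid approximates M(p) and v(p).
    vals = f(X, p)
    best = vals.max()
    return X[np.isclose(vals, best)], best

for p in (0.25, 0.2501, 0.75):
    M_p, v_p = argmax_and_value(p)
    print(p, M_p, v_p)
# Small changes in p produce small changes in v(p) (continuity of the value function)
# and small movements of the maximizer set (upper hemi-continuity of M).
```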
Extending the concept of a fixed point of a mapping to a correspondence
: X X, say x X is a fixed point of if x (x). Kakutanis fixed
point theorem states that if X Rn is nonempty, compact, and convex, and if
: X X is an upper hemi-continuous correspondence with nonempty, closed,
convex values, then has at least one fixed point. This can be extended to more
general spaces. A metric version of Glicksbergs fixed point theorem, proved in
the appendix, states that if X is a nonempty, compact metric mixture space
with a quasi-convex metric, and if : X X is an upper hemi-continuous
correspondence with nonempty, closed, convex values, then has at least one
fixed point. This generalizes the metric version of Schauders theorem, and all
of the applications to fixed point results in the previous section can be stated for
upper hemi-continuous correspondences with nonempty, closed, convex values.
63 See Theorem 3 of Himmelberg and Van Vleck (1975) Multifunctions with Values in
a Space of Probability Measures, Journal of Mathematical Analysis and Applications, 50:
108112.


For purposes of comparison, note that we can obtain a fixed point result from
Michaels selection theorem by replacing upper hemi-continuity in Glicksbergs
theorem with lower hemi-continuity: a correspondence satisfying these conditions admits a continuous selection f : X X, which by the metric version
of Schauders theorem admits a fixed point x = f (x) (x).

17  Measurable Correspondences

Given measurable $X \subseteq \mathbb{R}^n$ and metric space $Y$, a correspondence $\Gamma \colon X \rightrightarrows Y$ is lower measurable (or weakly measurable) if for every open set $V \subseteq Y$, the set

$$\{x \in X \mid \Gamma(x) \cap V \ne \emptyset\}$$

is measurable. If $\Gamma$ has singleton values, so there is a function $f \colon X \to Y$ such that $\Gamma(x) = \{f(x)\}$ for all $x \in X$, then lower measurability of $\Gamma$ is equivalent to measurability of the function $f$. Clearly, a lower hemi-continuous correspondence is lower measurable. Less obviously, assuming $Y$ is complete and separable, every upper hemi-continuous correspondence with closed values is also lower measurable.64 There are other notions of measurability for correspondences, but they will not be defined here.
Lower measurability is preserved by countable unions, often countable products, and sometimes countable intersections: letting $\{\Gamma_m\}$ be a sequence of lower measurable correspondences $\Gamma_m \colon X \rightrightarrows Y$,

• the correspondence $\Gamma \colon X \rightrightarrows Y$ defined by $\Gamma(x) = \bigcup_{m=1}^{\infty} \Gamma_m(x)$ is lower measurable,

• if $Y$ is separable, then the correspondence $\Gamma \colon X \rightrightarrows Y^{\infty}$ defined by $\Gamma(x) = \prod_{m=1}^{\infty} \Gamma_m(x)$ is lower measurable (with the product metric on $Y^{\infty}$),

• if $Y$ is separable, if each $\Gamma_m$ has closed values, and if for each $x \in X$, there is some $m$ such that $\Gamma_m(x)$ is compact, then the correspondence $\Gamma \colon X \rightrightarrows Y$ defined by $\Gamma(x) = \bigcap_{m=1}^{\infty} \Gamma_m(x)$ is lower measurable.

64 To see this, let $V \subseteq Y$ be open, and note that $\{x \in X \mid \Gamma(x) \cap V \ne \emptyset\}$ is the projection of $\mathrm{graph}(\Gamma) \cap (X \times V)$ onto $X$. Since $\mathrm{graph}(\Gamma)$ is closed and $X \times V$ is open, the intersection is Borel and, therefore, analytic. Moreover, the projection mapping is continuous, so its projection is analytic (see remarks on p.446 of Aliprantis and Border (2006)). Thus, $\{x \in X \mid \Gamma(x) \cap V \ne \emptyset\}$ is analytic. By Theorem 12.41 of Aliprantis and Border (2006), it is universally measurable. And since Lebesgue measure is complete, it follows that the set is measurable.
A correspondence $\Gamma \colon X \rightrightarrows Y$ is lower measurable if and only if the correspondence $\overline{\Gamma} \colon X \rightrightarrows Y$ defined by $\overline{\Gamma}(x) = \mathrm{clos}(\Gamma(x))$, the pointwise closure of $\Gamma$, is lower measurable.

When $Y = \mathbb{R}^m$ and the correspondence $\Gamma \colon X \rightrightarrows \mathbb{R}^m$ is lower measurable, I show in the appendix that the pointwise convex hull of $\Gamma$ is lower measurable; that is, the correspondence $\Gamma^c \colon X \rightrightarrows \mathbb{R}^m$ defined by $\Gamma^c(x) = \mathrm{conv}(\Gamma(x))$ is lower measurable. And when $\Gamma \colon X \rightrightarrows \mathbb{R}^m$ has nonempty, compact values, the correspondence $\Gamma$ is lower measurable if and only if the correspondence $\Phi \colon X \rightrightarrows \mathcal{P}(\mathbb{R}^m)$ defined by

$$\Phi(x) \;=\; \{\mu \in \mathcal{P}(\mathbb{R}^m) \mid \mu(\Gamma(x)) = 1\} \;=\; \mathcal{P}(\Gamma(x))$$

is lower measurable with the Prohorov metric on $\mathcal{P}(\mathbb{R}^m)$.65


Assuming $X \subseteq \mathbb{R}^n$ is measurable and $Y$ is a complete, separable metric space, a special case of the Kuratowski-Ryll-Nardzewski selection theorem is that every lower measurable correspondence $\Gamma \colon X \rightrightarrows Y$ with nonempty, closed values admits a measurable selection, i.e., there is a measurable function $f \colon X \to Y$ such that for all $x \in X$, we have $f(x) \in \Gamma(x)$. In fact, more can be said. Again, let $\Gamma$ have nonempty, closed values. Castaing's theorem states that if there is a countable set $\{f_m\}$ of measurable mappings such that for all $x \in X$,

$$\Gamma(x) \;=\; \mathrm{clos}(\{f_1(x), f_2(x), \ldots\}),$$

then $\Gamma$ is lower measurable. The converse holds as well if either $Y$ is complete and separable or if $Y$ is separable and $\Gamma$ has compact values.
Next, we extend the notion of Caratheodory function to accommodate a metric space of parameters. Given metric spaces $X$ and $Y$ and measurable $P \subseteq \mathbb{R}^n$, the function $f \colon X \times P \to Y$ is a Caratheodory function if (i) for each $x \in X$, the mapping $f_x \colon P \to Y$ defined by $f_x(p) = f(x, p)$ is measurable, and (ii) for each $p \in P$, the mapping $f_p \colon X \to Y$ defined by $f_p(x) = f(x, p)$ is continuous. Assuming $X$ is compact and $Y = \mathbb{R}^n$, the correspondence $\Gamma \colon P \rightrightarrows X$ defined by

$$\Gamma(p) \;=\; \{x \in X \mid f(x, p) = 0\}$$

is lower measurable with compact values. In words, the zeroes of a Caratheodory function vary lower measurably with parameters.

65 See Theorem 3 of Himmelberg and Van Vleck (1975) Multifunctions with Values in a Space of Probability Measures, Journal of Mathematical Analysis and Applications, 50: 108–112.
Again consider metric spaces $X$ and $Y$, measurable $P \subseteq \mathbb{R}^n$, and Caratheodory function $f \colon X \times P \to Y$. Let $\Gamma \colon P \rightrightarrows X$ be a lower measurable correspondence with nonempty, compact values. Let $g \colon P \to Y$ be a measurable function such that for each $p \in P$, there exists $x \in X$ with $g(p) = f(x, p)$. Assuming $X$ and $Y$ are separable, a special case of Filippov's implicit function theorem establishes that the element $x$ solving the latter equation can be chosen as a measurable function of $p$. More precisely, the correspondence $\Phi \colon P \rightrightarrows X$ defined by

$$\Phi(p) \;=\; \{x \in \Gamma(p) \mid f(x, p) = g(p)\}$$

is lower measurable and admits a measurable selection, i.e., there is a measurable function $h \colon P \to X$ such that for all $p \in P$, we have $h(p) \in \Phi(p)$ and $g(p) = f(h(p), p)$.
For a metric space $X$ and measurable $P \subseteq \mathbb{R}^n$, let $f \colon X \times P \to \mathbb{R}$ be a Caratheodory function, and let $\Gamma \colon P \rightrightarrows X$ be a lower measurable correspondence. Assuming $X$ is separable, the measurable maximum theorem implies that the correspondence $M \colon P \rightrightarrows X$ defined by

$$M(p) \;=\; \arg\max_{x \in \Gamma(p)} f(x, p)$$

is lower measurable, admits a measurable selection, and has nonempty, compact values; furthermore, the maximized value function $v \colon P \to \mathbb{R}$ defined by $v(p) = \max_{x \in \Gamma(p)} f(x, p)$ is measurable.
Given measurable $X \subseteq \mathbb{R}^n$, recall that $f \colon X \to \mathbb{R}^m$ is measurable if each component function of $f = (f_1, \ldots, f_m)$ is measurable, and that the integral of $f$ with respect to a measure $\mu$ on $\mathbb{R}^n$ is $\int f(x)\,\mu(dx) = \bigl(\int f_1(x)\,\mu(dx), \ldots, \int f_m(x)\,\mu(dx)\bigr)$. We can extend this notion to the integral of a correspondence by collecting the integrals of all $\mu$-integrable selections: given correspondence $\Gamma \colon X \rightrightarrows \mathbb{R}^m$, the Aumann integral (or simply integral) of $\Gamma$ is

$$\int \Gamma(x)\,\mu(dx) \;=\; \Bigl\{ \int f(x)\,\mu(dx) \;\Bigm|\; f \text{ is a } \mu\text{-integrable selection from } \Gamma \Bigr\},$$

and $\Gamma$ is $\mu$-integrable if $\int \Gamma(x)\,\mu(dx) \ne \emptyset$. Note that in contrast to the usual integral, the integral $\int \Gamma(x)\,\mu(dx)$ is a subset of $\mathbb{R}^m$. If $\Gamma$ is lower measurable and has singleton values, so $\Gamma(x) = \{f(x)\}$ for a measurable function $f$, then $\Gamma$ is $\mu$-integrable if and only if $f$ is $\mu$-integrable, and in this case $\int \Gamma(x)\,\mu(dx) = \{\int f(x)\,\mu(dx)\}$, as one would hope.
Given measurable $X \subseteq \mathbb{R}^n$, a correspondence $\Gamma \colon X \rightrightarrows \mathbb{R}^m$ is $\mu$-integrably bounded if there exists a $\mu$-integrable function $f \colon X \to \mathbb{R}$ such that for all $x \in X$, we have $\sup\|\Gamma(x)\| = \sup\{\|y\| \mid y \in \Gamma(x)\} \le f(x)$. If $\Gamma$ is $\mu$-integrably bounded and lower measurable with nonempty, closed values, then by the Kuratowski-Ryll-Nardzewski selection theorem, since $\mathbb{R}^m$ is complete and separable, it has a $\mu$-integrable selection, and therefore the correspondence is $\mu$-integrable.
If $\mu$ is finite and $\Gamma \colon X \rightrightarrows \mathbb{R}^m$ is $\mu$-integrably bounded with closed values, then the integral $\int \Gamma(x)\,\mu(dx)$ is a compact set.66 A version of Lyapunov's theorem for correspondences states that if $\mu$ is finite and atomless, then the integral $\int \Gamma(x)\,\mu(dx)$ is convex; in fact, as long as $\Gamma(x)$ is convex for each atom $x \in X$, the integral will be convex for a general finite measure $\mu$.67 Now assume that $\mu$ is finite and that $\Gamma$ is $\mu$-integrably bounded and lower measurable with nonempty, closed values. Then the convex hull of the integral is equal to the integral of the convex hull:68

$$\mathrm{conv}\Bigl(\int \Gamma(x)\,\mu(dx)\Bigr) \;=\; \int \mathrm{conv}(\Gamma(x))\,\mu(dx).$$

It follows that if $\mu$ is also atomless, then

$$\int \Gamma(x)\,\mu(dx) \;=\; \int \mathrm{conv}(\Gamma(x))\,\mu(dx).$$

The latter claim may not seem surprising when $m = 1$ and $\Gamma$ is real-valued, given our intuition from the intermediate value theorem. To see how things work in multiple dimensions, consider Figure 13, where $\Gamma \colon [0,1] \rightrightarrows \mathbb{R}^2$ maps real numbers $x$ to pairs $(y_1, y_2)$ and takes as its constant value the boundary of a triangle; then the center of the triangle, denoted $y^*$, clearly belongs to $\int \mathrm{conv}(\Gamma(x))\,dx$. To obtain it as the integral of a selection of $\Gamma$, we select the vertices of the triangle over an appropriate range (selecting each vertex over roughly one third of the interval), thereby weighting the vertices to yield $y^*$.
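A direct computation of this example (my own sketch; the particular triangle is a hypothetical choice): with $\Gamma$ constant equal to the boundary of a triangle and Lebesgue measure on $[0,1]$, the selection that picks each vertex on one third of the interval integrates exactly to the centroid $y^*$, even though no single value of the selection equals $y^*$.

```python
import numpy as np

# Vertices of the triangle whose boundary is the (constant) value of Gamma on [0, 1];
# the centroid y* is their average.
V = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
centroid = V.mean(axis=0)                      # y* = (1/3, 1/3)

def selection(x):
    # A measurable selection from Gamma: vertex 0 on [0,1/3), vertex 1 on [1/3,2/3),
    # vertex 2 on [2/3,1]. Each value lies on the boundary of the triangle.
    return V[min(int(3 * x), 2)]

# Integrate the selection over [0,1] (midpoint rule); the result is the centroid,
# illustrating that y* lies in the Aumann integral of Gamma.
grid = (np.arange(30_000) + 0.5) / 30_000
integral = np.mean([selection(x) for x in grid], axis=0)
print(integral, centroid)                      # both approximately [1/3, 1/3]
```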
It is sometimes of interest to consider measurable sets X Rn and P Rk and a
correspondence : X P Rm , where the values (x, p) of the correspondence
are parameterized by p. For p P , define p : X Rm by p (x) = (x, p); let
be a finite measure on Rk ; and let : Ln P R be a transition probability on
X, i.e., a mapping such that (i) for all p P , (|p) is a probability measure on
Rn with (X|p) = 1, and (ii) for all measurable Y Rn , (Y |p) is measurable
as a function of p. Assume that satisfies the Feller property, so that pm p
66 See Proposition 7, p.73, of Hildenbrand (1974).

67 See Theorem 3, p.62, of Hildenbrand (1974) for convexity. The more general statement is found in Corollary 18.1.2 of Klein and Thompson (1984) Theory of Correspondences, John Wiley and Sons: New York, NY.

68 This follows from Theorem 4, p.64, of Hildenbrand (1974) after two observations. Let $f \colon X \to \mathbb{R}$ be a $\mu$-integrable function such that for all $x \in X$, $\sup\|\Gamma(x)\| \le f(x)$. First, because $\Gamma$ is lower measurable with closed values, Theorem 18.6 of Aliprantis and Border (2006) implies that it has measurable graph. Second, because $f$ is measurable, Proposition 1(a), p.59, of Hildenbrand (1974) implies that the correspondence $\Lambda(x) = f(x)\mathbf{1} + \Gamma(x)$ has measurable graph. Its values are subsets of $\mathbb{R}^m_+$, so Hildenbrand's result may be applied to $\Lambda$.

Figure 13: Lyapunov's theorem for correspondences

in P implies (|pm ) (|p) weakly.69 Thus, the probability measure on X


varies continuously with the parameter. Assume is lower measurable, i.e., for
all open G Rm , the set {(x, p) X P | (x, p) G 6= } is measurable
in Rn+k , and has nonempty, compact values. Furthermore, assume that for all
p P , the correspondence p is (|p)-integrably bounded. By definition of the
integral
of a correspondence, if a measurable function g : P Rm satisfies g(p)
R
p (x)(dx|p) for all p P , then for each p P , there is a selection fp from the
correspondence p that is measurable as a function of x and integrates to g(p).
Thus, we have a family {fp | p P } of selections that are each measurable with
respect to x; but the definition of the integral does not provide any information
about how these selections vary with $p$. Under the above assumptions, however, there is a function $f \colon X \times P \to \mathbb{R}^m$ that is measurable as a function of both variables (viewing $X \times P$ as a subset of $\mathbb{R}^{n+k}$) and such that for $\mu$-almost all $p$ and $\pi(\cdot|p)$-almost all $x$, we have $f(x, p) \in \Gamma(x, p)$ and

$$\int f(x, p)\,\pi(dx|p) \;=\; g(p).$$

That is, we can choose a selection from that is measurable as a function


of both variables (x, p) and determines the desired integrals g(p) as we vary
the parameter p.70 See Figure 14, where a selection from the correspondence
69 This assumption is stronger than needed. For the current result, all that is needed is that
(|p) vary measurably with respect to the Borel -algebra on P . This condition is implied
by the Feller property.
70 See the theorem of Artstein (1989) Parameterized Integration of Multifunctions with
Applications to Control and Optimization, SIAM Journal of Control and Optimization,
27: 13691380. The application of the latter result is complicated by the fact that Artstein


$p \mapsto \int \Gamma_p(x)\,\pi(dx|p)$ is indicated in blue, and two selections from $\Gamma_p(\cdot)$ and $\Gamma_{p'}(\cdot)$ are indicated in red.71

Figure 14: Jointly measurable selection
Let $X \subseteq \mathbb{R}^n$ be measurable and $P$ be a metric space, and consider a correspondence $\Gamma \colon X \times P \rightrightarrows \mathbb{R}^m$. Letting $\mu$ be a finite measure on $\mathbb{R}^n$, we can compute the integral $\int \Gamma_p(x)\,\mu(dx)$ for each $p$. This determines a correspondence $\Phi \colon P \rightrightarrows \mathbb{R}^m$ defined as

$$\Phi(p) \;=\; \int \Gamma_p(x)\,\mu(dx).$$

We have already seen conditions under which any measurable selection $g$ of $\Phi$ can be generated by a jointly measurable selection $f$ of $\Gamma$. Now we consider the properties of $\Phi$ as $p$ varies; in particular, this correspondence will be upper hemi-continuous quite generally. Assume that for $\mu$-almost all $x \in X$, the correspondence $\Gamma_x \colon P \rightrightarrows \mathbb{R}^m$ defined by $\Gamma_x(p) = \Gamma(x, p)$ has closed graph; and assume that the family $\{\Gamma_p \mid p \in P\}$ of correspondences is uniformly integrably bounded, i.e., there is a $\mu$-integrable function $f \colon X \to \mathbb{R}$ such that
endows X P with the Borel -algebra. The assumption of the Feller property implies that the
mapping p (|p) is Borel measurable. Using Castaings theorem, lower measurability of
implies that it is the pointwise closure of a countable set {fi } of measurable selections. For each
fi , Theorem 10.35 of Aliprantis and Border (2006) implies that there is a function gi that is
BorelSmeasurable and equal to fi outside a measurable set Zi with Lebesgue measure zero. Let
Z=
i=1 Zi , which is also measure zero, and define the correspondence (z) = clos({gi (z)}).
Applying Castaings theorem again, is lower measurable with respect to the Borel -algebra
on X P . Moreover, for each z (X P )\ Z, we have (z) = clos({fi (z)}) = clos({gi (z)}) =
(z). Artsteins theorem can then be applied to .
71 For the reader who has printed out this document on a laser printer, there are two red
lines, one for p and one for p , and there is one blue line, which varies over p.


for all $p \in P$ and all $x \in X$, $\sup\|\Gamma_p(x)\| \le f(x)$. Then a version of Fatou's lemma for correspondences states that $\Phi$ has closed graph.72 Note as well that if $\mu$ is atomless, then Lyapunov's theorem for correspondences implies that $\Phi$ is convex-valued.

The above form of Fatou's lemma holds the integrating probability measure fixed, in contrast to the analysis of jointly measurable selections. We recover that generality by adding convex values. Returning to the variable measure formulation, now let $X \subseteq \mathbb{R}^n$ be compact, let $P$ be a metric space, and let $\pi \colon \mathcal{L}^n \times P \to \mathbb{R}$ be a transition probability on $X$, i.e., a mapping satisfying (i) and (ii), above. Assume that $\pi$ satisfies the Feller property, and assume that $\Gamma \colon X \times P \rightrightarrows \mathbb{R}^m$ is upper hemi-continuous and has non-empty, compact values. Then the correspondence $\Phi \colon P \rightrightarrows \mathbb{R}^m$ defined by

$$\Phi(p) \;=\; \int \Gamma_p(x)\,\pi(dx|p)$$

has closed graph.73

18  Topological Spaces

We now consider imposing the minimal structure on a set $X$ needed in order to distinguish open from closed sets. In fact, we simply specify the collection $\mathcal{T}$ of sets, called a topology, that we count as open, with the understanding that a set is closed if and only if it is the complement of an open set. A topology can be defined on any set $X$, with the only restrictions being that $\emptyset$ and $X$ are open, that finite intersections of open sets are open, and that arbitrary unions of open sets are open. More formally, we require:

• $\emptyset, X \in \mathcal{T}$,

• for all finite collections $\mathcal{G} = \{G_1, \ldots, G_k\} \subseteq \mathcal{T}$, we have $\bigcap_{i=1}^{k} G_i \in \mathcal{T}$,

• for all collections $\mathcal{G} \subseteq \mathcal{T}$, we have $\bigcup \mathcal{G} \in \mathcal{T}$.

(A small computational check of these axioms on a finite set is sketched below.)
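The following Python sketch (mine, not part of the survey; the example collection is hypothetical) checks the three axioms for a candidate collection of subsets of a small finite set; on finite sets, closure under finite intersections and arbitrary unions reduces to closure under pairwise intersections and unions.

```python
from itertools import combinations

X = frozenset({1, 2, 3})
# Candidate topology on X, given as a collection of frozensets.
T = {frozenset(), frozenset({1}), frozenset({1, 2}), X}

def is_topology(T, X):
    # Axiom 1: the empty set and X belong to T.
    if frozenset() not in T or X not in T:
        return False
    # Axioms 2 and 3: on a finite set it suffices to check pairwise
    # intersections and pairwise unions.
    for A, B in combinations(T, 2):
        if A & B not in T or A | B not in T:
            return False
    return True

print(is_topology(T, X))                          # True
print(is_topology(T | {frozenset({2, 3})}, X))    # False: {1,2} & {2,3} = {2} is missing
```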

A topological space is any set $X$ together with a topology $\mathcal{T}$ (though it is customary to refer to $X$ itself, leaving the topology implicit). A set $U \subseteq X$ is open if $U \in \mathcal{T}$, and it is closed if $X \setminus U \in \mathcal{T}$. We then define the boundary of a set $Y \subseteq X$, denoted $\mathrm{bd}(Y)$, to consist of every $x \in X$ such that for all $U \in \mathcal{T}$ with $x \in U$, we have $U \cap Y \ne \emptyset$ and $U \setminus Y \ne \emptyset$. Extending the usual definition, a set is open if and only if it is disjoint from its boundary, and it is closed if and only if it contains its boundary. The interior of $Y$ is $\mathrm{int}(Y) = Y \setminus \mathrm{bd}(Y)$, the closure of $Y$ is $\mathrm{clos}(Y) = Y \cup \mathrm{bd}(Y)$, and these sets comprise, respectively, the largest open subset and smallest closed superset of $Y$. As always, finite unions and arbitrary intersections of closed sets are closed. Normally, the focus is on
72 See Proposition 8, p.73, of Hildenbrand (1974).

73 See Lemma A.2 of Duggan (2011) Coalitional Bargaining Equilibria, mimeo.

Hausdorff topological spaces, meaning that distinct elements can be separated by open sets, i.e., for all distinct $x, y \in X$, there exist open sets $U, V \subseteq X$ such that $x \in U$, $y \in V$, and $U \cap V = \emptyset$. An implication is that in a Hausdorff space, singleton sets (and therefore finite sets) are always closed.
We can always give a topology to any set: simple examples are the trivial topology, denoted $\mathcal{T}^t = \{\emptyset, X\}$, which consists of just two sets, and the discrete topology, denoted $\mathcal{T}^d = 2^X$, which consists of the power set of $X$. Clearly, a set can have more than one topology, and our choice of which topology to work with is a matter of analytical convenience. Given two topologies $\mathcal{T}$ and $\mathcal{T}'$ on a space $X$, say $\mathcal{T}$ is weaker (or coarser) than $\mathcal{T}'$ if $\mathcal{T} \subseteq \mathcal{T}'$, and say $\mathcal{T}$ is stronger (or finer) than $\mathcal{T}'$ if $\mathcal{T} \supseteq \mathcal{T}'$. The fundamental analytical trade-off is compactness vs. continuity. Both properties are generally desirable, but there is tension between them: a coarser topology on $X$ has fewer open coverings, and therefore compactness is easier to achieve, but a finer topology on $X$ provides more open sets, so that a function with domain $X$ is more likely to be continuous; the formal definitions that follow should make the trade-off more transparent.
In fact, we have implicitly seen several examples of topologies in earlier sections:

• the Euclidean topology on $\mathbb{R}^n$,
$$\mathcal{T}^e \;=\; \Bigl\{ \bigcup \mathcal{G} \;\Bigm|\; \mathcal{G} \subseteq \{B_r(x) \mid r > 0,\, x \in \mathbb{R}^n\} \Bigr\},$$

• the relative topology on $X \subseteq \mathbb{R}^n$,
$$\mathcal{T}^e(X) \;=\; \Bigl\{ \bigcup \mathcal{G} \;\Bigm|\; \mathcal{G} \subseteq \{B_r(x) \cap X \mid r > 0,\, x \in \mathbb{R}^n\} \Bigr\},$$

• the product topology on $X = \prod_{i=1}^{k} X_i$ with $X_i \subseteq \mathbb{R}^{n_i}$,
$$\mathcal{T} \;=\; \Bigl\{ \bigcup \mathcal{G} \;\Bigm|\; \mathcal{G} \subseteq \Bigl\{ \prod_{i=1}^{k} B_{r_i}(x_i) \;\Bigm|\; r_i > 0,\, x_i \in \mathbb{R}^{n_i},\, i = 1, \ldots, k \Bigr\} \Bigr\},$$

and its extension to countably infinite products of sets. In each case, the topology consists of arbitrary unions of open balls, suitably defined. More generally, if there is a collection $\mathcal{B}$ of subsets of $X$ such that a topology $\mathcal{T}$ consists of arbitrary unions of subcollections of $\mathcal{B}$, i.e., $\mathcal{T} = \{\bigcup \mathcal{G} \mid \mathcal{G} \subseteq \mathcal{B}\}$, then $\mathcal{T}$ is generated by $\mathcal{B}$, and $\mathcal{B}$ is a base for the topology.
The idea of relative topology, introduced in Section 5 for $\mathbb{R}^n$ and mentioned at the end of Section 13 for metric spaces, can be extended to subsets of arbitrary topological spaces. Let $X$ be a space with topology $\mathcal{T}$, and let $Y \subseteq X$. Then the relative topology on $Y$ induced by $\mathcal{T}$ is

$$\mathcal{T}(Y) \;=\; \{Y \cap G \mid G \in \mathcal{T}\}.$$

Assuming $\mathcal{T}$ is Hausdorff, $\mathcal{T}(Y)$ will be Hausdorff for every $Y \subseteq X$, and if in addition $Y$ is finite, then $\mathcal{T}(Y)$ will be discrete.

Every metric space $X$ with metric $\rho$ determines a particular topology, the metric topology, which is denoted $\mathcal{T}^{\rho}$ and defined as arbitrary unions of open balls, i.e.,

$$\mathcal{T}^{\rho} \;=\; \Bigl\{ \bigcup \mathcal{B} \;\Bigm|\; \mathcal{B} \subseteq \{B_r(x) \mid r > 0,\, x \in X\} \Bigr\},$$

so again the collection $\{B_r(x) \mid r > 0,\, x \in X\}$ of open balls is a base for the topology, and the metric topology is generated by this base; we may also say it is generated by the metric $\rho$. A topology $\mathcal{T}$ on a space $X$ is metrizable if there is a metric $\rho$ on $X$ that generates the topology in the above way, i.e., $\mathcal{T} = \mathcal{T}^{\rho}$. Note that a set $Y \subseteq \mathbb{R}^n$ is open if and only if for all $x \in Y$, there exists $\varepsilon_x > 0$ such that $B_{\varepsilon_x}(x) \subseteq Y$, so $Y = \bigcup_{x \in Y} B_{\varepsilon_x}(x)$, so the Euclidean topology is metrizable by the Euclidean metric. The discrete topology is metrizable by the discrete metric. There are topologies that are not metrizable.

Without a notion of distance in a general topological space, we cannot define the concept of a ball around $x \in X$ of a given radius, but we can define convergence using only open sets: a sequence $\{x_m\}$ in $X$ converges to $x \in X$ if for every open set $U \in \mathcal{T}$ containing $x$, there exists $n$ such that for all $m \ge n$, we have $x_m \in U$. Unfortunately, sequences are not sufficient to capture the idea of close proximity in general topological spaces, so we need to define a notion of generalized sequence, or net. To define this idea formally, we begin with a set $D$ and a subset $\succeq\;\subseteq D \times D$, where given $\alpha, \beta \in D$, we write $\alpha \succeq \beta$ instead of $(\alpha, \beta) \in\;\succeq$. We say that $\succeq$ is a direction on $D$ if (i) for all $\alpha \in D$, $\alpha \succeq \alpha$, (ii) for all $\alpha, \beta, \gamma \in D$, $\alpha \succeq \beta$ and $\beta \succeq \gamma$ implies $\alpha \succeq \gamma$, and (iii) for all $\alpha, \beta \in D$, there exists $\gamma \in D$ satisfying $\gamma \succeq \alpha$ and $\gamma \succeq \beta$. Then together with $\succeq$, we call $D$ a directed set. Finally, a net in $X$ with directed set $D$ is any mapping $\chi \colon D \to X$. Instead of $\chi(\alpha)$, we typically write $x_\alpha$, and instead of the functional notation, we follow convention and denote a net by $\{x_\alpha\}$, submerging the directed set $D$ and mimicking the notation for sequences. In particular, a sequence $\{x_m\}$ is a net with the set $\mathbb{N}$ of natural numbers directed by the usual greater than or equal to relation.

Consider nets $\{x_\alpha\}$ and $\{y_\beta\}$ on directed sets $A$ and $B$ with directions $\succeq$ and $\trianglerighteq$, respectively. We say $\{y_\beta\}$ is a subnet of $\{x_\alpha\}$ if there is a mapping $\eta \colon B \to A$ such that (i) for all $\beta \in B$, we have $y_\beta = x_{\eta(\beta)}$, and (ii) for each $\alpha \in A$, there exists $\beta \in B$ such that for all $\beta' \in B$, $\beta' \trianglerighteq \beta$ implies $\eta(\beta') \succeq \alpha$. Clearly, the notion of subnet extends the idea of a subsequence, where instead of assuming the function $\eta \colon B \to A$ is strictly increasing with respect to the directions on $A$ and $B$, we assume condition (ii) above. The latter condition (ii) is more permissive when the directions $\succeq$ and $\trianglerighteq$ are not complete, i.e., when there exist $\alpha, \alpha' \in A$ such that neither $\alpha \succeq \alpha'$ nor $\alpha' \succeq \alpha$. Every subsequence of a sequence is a subnet, but there are subnets on directed sets (other than the natural numbers) that do not correspond to subsequences.

A net $\{x_\alpha\}$ in $X$ converges to $x \in X$, written $x_\alpha \to x$, if for all open sets $U \in \mathcal{T}$ with $x \in U$, there exists $\alpha \in D$ such that for all $\beta \in D$ with $\beta \succeq \alpha$, we have $x_\beta \in U$; equivalently, when $\mathcal{B}$ is a base for $\mathcal{T}$, $x_\alpha \to x$ if for all $U \in \mathcal{B}$ with $x \in U$, there exists $\alpha \in D$ such that for all $\beta \in D$ with $\beta \succeq \alpha$, we have $x_\beta \in U$. For the special case of a sequence, this definition of convergence for nets coincides with the original definition in Euclidean space or a metric space. Nets behave like sequences in the respect that $x_\alpha \to x$ if and only if every subnet of $\{x_\alpha\}$ converges to $x$. Moreover, a topology is Hausdorff if and only if every net $\{x_\alpha\}$ converges to at most one element $x \in X$. Analogous to the case of metric spaces, a set $Y \subseteq X$ is closed if and only if every convergent net in $Y$ has limit in $Y$.
A set K X is compact if every open cover of K has a finite subcover; equivalently, K is compact if and only if every net {x } in K has a subnet that
converges to some x K; and Y is compact if and only if every collection of
closed subsets of Y with the finite intersection property has nonempty intersection. As before, the empty set and all finite sets are compact; finite unions of
compact sets are compact; and arbitrary intersections of compact sets are compact. When X is Hausdorff, every compact subset K X is closed, but this is
not generally true of non-Hausdorff spaces: if X has two or more elements and
is endowed with the trivial topology, then every singleton is compact but not
closed. It is true, however, that every closed subset of a compact set is compact.
As usual, a set Y X is connected if there do not exist open sets U, V X
such that U V = , Y U V , U Y 6= , and V Y 6= .
A function f : X R is continuous if for every open G R, we have f 1 (G)
T. We can generalize this notion of continuity to functions f : X Y , where Y
is any metric space, or even more generally, any topological space: the function
is continuous if for all open G Y , the pre-image f 1 (G) X is open. Note
that there are two topologies implicitly invoked in this definition: a topology on
X and a topology on Y . Due to complexities of some exotic topological spaces,
we cannot give equivalent statements in terms of sequences, but we can use nets
instead: a function f : X Y is continuous if and only if for every net {x }
in X converging to x X, the net {f (x )} converges to f (x) in Y . Thus, the
definition for metric spaces extends. Note that with the trivial topology, X is
automatically compact, but only constant functions are continuous; with the
discrete topology, every function is automatically continuous, but X is compact
if and only if it is finite. Again, by Weierstrass theorem, the image of a compact set under a continuous function is compact; and by the intermediate value
theorem, the image of a connected set under a continuous function is connected.
Letting X, Y , and Z be any topological spaces, if f : X Y and g : Y Z are
continuous with f (X) Y , then g f is continuous.
A topological space X is first countable if for every x ∈ X, there is a countable collection {U_n | n = 1, 2, …} ⊆ T such that for every open V ∈ T with x ∈ V, there exists n such that x ∈ U_n ⊆ V. In such spaces, we can dispense with nets, and the above equivalences for closed sets and continuous functions can all be stated with sequences only. Assuming X is first countable and Y is an arbitrary topological space:

• a set F ⊆ X is closed if and only if for every convergent sequence {x_m} in F with limit x ∈ X, we have x ∈ F,

• a function f : X → Y is continuous if and only if for every convergent sequence {x_m} in X with limit x ∈ X, we have f(x_m) → f(x) in Y.

Every metric space (including Euclidean space with the usual norm) is first countable. There are topological spaces that are not first countable.
A vector space is a set X, which contains a zero vector, 0, and on which addition and scalar multiplication are defined and possess certain intuitive properties:

x + y = y + x
(x + y) + z = x + (y + z)
x + 0 = x
x + (−1)x = 0
1x = x
α(λx) = (αλ)x
λ(x + y) = λx + λy
(α + λ)x = αx + λx,

where x, y, z range over X and α, λ range over the real numbers. As usual, given a subset A ⊆ X of a vector space, conv(A) denotes the set of all convex combinations of vectors in A, i.e., it consists of every λ_1 x_1 + ⋯ + λ_n x_n such that x_1, …, x_n ∈ A and the weights λ_i are non-negative and sum to one. A vector space X is a topological vector space (or tvs or linear topological space) when it is endowed with a topology such that addition and scalar multiplication are continuous.74 Of course, R^n is a topological vector space with the Euclidean topology. Letting X be a topological space and Y a topological vector space, if f, g : X → Y are continuous and α, λ ∈ R, then αf + λg is continuous.
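For finite A ⊆ R^n, membership in conv(A) is a linear feasibility problem: y ∈ conv(A) if and only if some nonnegative weights summing to one combine the points of A into y. A minimal sketch using scipy (my own illustration; the points are arbitrary):

```python
# Checks whether y lies in conv({x_1, ..., x_k}) in R^n by solving the linear
# feasibility problem  sum_i w_i x_i = y,  sum_i w_i = 1,  w_i >= 0.
import numpy as np
from scipy.optimize import linprog

def in_convex_hull(points, y):
    points = np.asarray(points, dtype=float)         # shape (k, n)
    k, n = points.shape
    A_eq = np.vstack([points.T, np.ones((1, k))])     # n+1 equality constraints
    b_eq = np.concatenate([np.asarray(y, dtype=float), [1.0]])
    res = linprog(c=np.zeros(k), A_eq=A_eq, b_eq=b_eq, bounds=[(0, None)] * k)
    return res.success

A = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]              # vertices of a triangle
print(in_convex_hull(A, (0.25, 0.25)))                # True: inside the triangle
print(in_convex_hull(A, (0.8, 0.8)))                  # False: outside the triangle
```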
Given a vector space X, a set C ⊆ X is convex if for all x, y ∈ C and all λ ∈ (0, 1), we have λx + (1 − λ)y ∈ C. As usual, given Y ⊆ X and a function f : Y → X, say x ∈ Y is a fixed point of f if f(x) = x. The topological vector space X is locally convex if for every x ∈ X and every open set V containing x, there is a convex open set U such that x ∈ U ⊆ V. Given a locally convex, Hausdorff topological vector space X, Schauder's theorem states that if K ⊆ X is nonempty, compact, and convex and if f : K → K is continuous with the relative topology on K, then f has at least one fixed point.

74 In some treatments (e.g., Dunford and Schwartz (1958)), a topological vector space is by definition also Hausdorff.
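In the finite-dimensional case, where Schauder's theorem reduces to Brouwer's, approximate fixed points can often be located by simple iteration. The sketch below (my own illustration, not from the text) iterates an arbitrary continuous self-map of the unit square; iteration is not guaranteed to converge in general, but it does for this contraction-like example.

```python
# Iterates an arbitrary continuous map f of the compact convex set [0,1]^2 into
# itself and reports an approximate fixed point (whose existence Brouwer's, and
# hence Schauder's, theorem guarantees).
import numpy as np

def f(p):
    x, y = p
    return np.array([0.5 * np.cos(y) + 0.25, 0.5 * np.sin(x) + 0.25])

p = np.array([0.1, 0.9])
for _ in range(200):
    p = f(p)
print("approximate fixed point:", np.round(p, 6), " f(p) - p =", np.round(f(p) - p, 9))
```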
To extend the product topology from countable products to arbitrary ones, let I be an arbitrary index set, and let {X_α | α ∈ I} be a collection of metric spaces. Recall that the product X = Π_{α∈I} X_α consists of mappings f : I → ∪{X_α | α ∈ I} such that for all α ∈ I, we have f(α) ∈ X_α. We can give the space X a topology as follows. First, say Y ⊆ X is a finite cylinder set in X if (i) it is a product set, i.e., Y = Π_{α∈I} Y_α, (ii) there is a finite set F ⊆ I such that for all α ∈ F, Y_α is an open subset of X_α, and (iii) for all α ∈ I \ F, Y_α = X_α. Next, let 𝒞 be the collection of finite cylinder sets in X. Finally, define the product topology on X to consist of arbitrary unions of finite cylinder sets in X:

T = { ∪𝒟 | 𝒟 ⊆ 𝒞 }.

Thus, the finite cylinder sets form a base for the product topology.

A net {f } in X with directed set D converges in the product topology to f X


if for every open set U T with f U , there exists D such that for all
D with  , we have f U . This definition is unchanged if we replace
the open set U with a finite cylinder set C F , and we see that f f in
the product topology if and only if for all I, f () f (). For this reason,
convergence in the product topology is sometimes called pointwise convergence.
By Tychonoff s product theorem, if each X is compact, then X is compact in
the product topology.
When each factor X_α in the product is a topological space, we define the product topology T in exactly the same way. And when each X_α is in fact a topological vector space, the product X = Π_{α∈I} X_α is itself a topological vector space with vector operations defined pointwise: that is, given f, g ∈ X and scalars λ, δ ∈ R, λf + δg is the function defined by (λf + δg)(α) = λf(α) + δg(α) for all α ∈ I. Moreover, if each factor is Hausdorff, then X is also Hausdorff. To see this, consider distinct f, g ∈ X, and choose β ∈ I such that f(β) ≠ g(β). Since X_β is Hausdorff, there exist disjoint open sets U, V ⊆ X_β such that f(β) ∈ U and g(β) ∈ V. Define the finite cylinder set Y so that Y_α = U if α = β and Y_α = X_α otherwise; and define Z so that Z_α = V if α = β and Z_α = X_α otherwise. Then Y and Z are disjoint open sets in the product topology containing, respectively, f and g, as required. Finally, if each factor is locally convex, then X is also locally convex. Indeed, consider any f ∈ X and any finite cylinder set Y = Π_{α∈I} Y_α ⊆ X with f ∈ Y. Let F ⊆ I be a finite set of indices satisfying conditions (ii) and (iii) above. For each α ∈ F, local convexity of X_α yields a convex open set U_α ⊆ X_α such that f(α) ∈ U_α ⊆ Y_α. Define the finite cylinder set Y′ by Y′_α = U_α for α ∈ F, and Y′_α = Y_α = X_α otherwise, and note that this is a convex open subset of Y containing f, as required. Obviously, the above remarks carry over if each X_α is a convex subset of a locally convex, Hausdorff topological vector space. Combining Schauder's theorem with previous results, it follows that if X = Π_{α∈I} X_α is a product of nonempty, compact, convex subsets of locally convex, Hausdorff topological vector spaces, and if f : X → X is continuous with the product topology on X, then f has at least one fixed point.
For the special case of a sequence {f_m} in an arbitrary product set X = Π_{α∈I} X_α and f ∈ X, we have f_m → f in the product topology if and only if for all α ∈ I, f_m(α) → f(α). When I = R^n and X_α = R for all α, the product space X = Π_{α∈I} X_α = R^{R^n} consists of all functions f : R^n → R, and a sequence {f_m} of functions converges to f in X pointwise if and only if it converges in the product topology. Thus, the abstract treatment above demonstrates that pointwise convergence of real-valued functions can be obtained from a topological framework. When the index set I is countable, so the product X is either a finite or a countably infinite product, and each factor X_i is a bounded subset of Euclidean space R^{n_i}, i ∈ I, we have seen that convergence in the product metric is equivalent to pointwise convergence. In this case, the product topology is metrizable, and the open balls {B_r(x) | r > 0, x ∈ X} form a base for the product topology; in particular, the product topology is first countable.
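A tiny computational sketch (my own, not from the text) of pointwise, i.e., product-topology, convergence without metric convergence: the functions f_m(x) = x^m on [0, 1].

```python
# Coordinatewise (product-topology) limit of f_m(x) = x^m on [0,1].
sample_points = [0.0, 0.5, 0.9, 0.99, 0.999, 1.0]
for m in (1, 10, 100, 1000):
    print(f"m={m:5d}", [round(x ** m, 4) for x in sample_points])
# Each coordinate f_m(x) converges (to 0 for x < 1 and to 1 at x = 1), so the
# sequence converges in the product topology; but sup over x < 1 of |f_m(x)|
# equals 1 for every m, so the convergence is not uniform, and the pointwise
# limit is not even continuous.
```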
But the product topology is not first countable in general, and there can be convergent nets with no corresponding convergent sequences. For example, let I = [0, 1] and X_α = R for all α ∈ [0, 1], so X = R^{[0,1]} consists of all functions f : [0, 1] → R; and let D consist of all finite subsets F ⊆ [0, 1] directed by set inclusion, i.e., F′ ≽ F if and only if F ⊆ F′. I claim that the net {I_F} of indicator functions I_F : [0, 1] → R of finite sets F ∈ D converges to the function that takes a constant value equal to one, i.e., I_F → I_{[0,1]}. To see this, consider any x ∈ [0, 1] and F̄ = {x}; then for every F with F ⊇ F̄, we have I_F(x) = 1. Thus, the net {I_F} converges pointwise, and therefore in the product topology, to I_{[0,1]}. It follows that I_{[0,1]} is in the closure of the set {I_F | F ⊆ [0, 1], F finite}, but there is no sequence in this set that converges to I_{[0,1]}. A word of caution: assuming each X_α is a compact metric space, Tychonoff's product theorem implies that X = Π_{α∈I} X_α is compact, so every sequence {f_m} in X will indeed have a convergent subnet, but it need not have a convergent subsequence; see the preceding example.
In sum, when considering uncountable products, we must deal with nets and the problems they entail. Of note, while Lebesgue's dominated convergence theorem provides general conditions under which we can pass the pointwise limit of a sequence of measurable functions through the integral sign, it does not hold for nets. To see this, note that in the above example of indicator functions of finite sets, we have

∫ I_F(x) dx = 0 ↛ ∫ I_{[0,1]}(x) dx = 1.

In words, the Lebesgue integral of the indicator function of any finite set is zero, and the net of these integrals (directed by the collection D of finite subsets) does not converge to the integral of the function taking a constant value of one, which is one.
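The following minimal numerical sketch (my own illustration, not part of the survey) mirrors this computation: along the net, I_F is eventually equal to 1 at any fixed point, yet each integral is zero.

```python
# Along the net of finite sets F, I_F(x0) is eventually 1 at any fixed x0, yet
# the integral of I_F stays at 0, while the pointwise limit I_[0,1] integrates to 1.
import numpy as np

rng = np.random.default_rng(0)
xs = np.linspace(0.0, 1.0, 10_001)         # evaluation grid for the integral
x0 = 0.123456                               # an arbitrary fixed point of [0, 1]

def indicator(F, x):
    """Indicator function of the finite set F, evaluated at the points x."""
    return np.isin(x, np.asarray(list(F), dtype=float)).astype(float)

for size in (1, 10, 100, 1000):
    F = set(rng.random(size).tolist()) | {x0}        # a finite set containing x0
    vals = indicator(F, xs)
    # The points of F (almost surely) miss the grid, so the average value over
    # the grid, our stand-in for the Lebesgue integral, is 0.
    print(f"|F| = {len(F):5d}  I_F(x0) = {indicator(F, np.array([x0]))[0]:.0f}"
          f"  integral of I_F ≈ {vals.mean():.6f}")

print("integral of I_[0,1] = 1  (the pointwise limit of the net)")
```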
Given topological spaces X and Y, a correspondence φ : X ⇉ Y assigns a set φ(x) ⊆ Y to each x ∈ X. The original definitions of upper and lower hemi-continuity were entirely topological, i.e., they involved only open sets, so they extend to the current setting unchanged. In particular, a correspondence φ : X ⇉ Y is upper hemicontinuous if for every x ∈ X and every open set V ⊆ Y with φ(x) ⊆ V, there is an open set U ⊆ X with x ∈ U such that for all z ∈ U, we have φ(z) ⊆ V.

As usual, given a correspondence φ : X ⇉ X, say x ∈ X is a fixed point of φ if x ∈ φ(x). Given a locally convex, Hausdorff topological vector space X, Glicksberg's theorem states that if K ⊆ X is nonempty, compact, and convex, and if φ : K ⇉ K is upper hemicontinuous with the relative topology on K and has nonempty, closed, convex values, then φ has at least one fixed point. This generalizes Schauder's theorem, and all of the above applications of Schauder can be extended to correspondences in the obvious way.75 It is noteworthy that compactness in Glicksberg's theorem can be weakened somewhat. Let C be a nonempty, closed, convex subset of a locally convex, Hausdorff topological vector space X; let φ : C ⇉ C have closed graph and nonempty, convex values; and assume that there is a compact set K ⊆ X such that φ(x) ⊆ K for all x ∈ C. Then the Bohnenblust-Karlin theorem states that φ has at least one fixed point.

19 Weak and Weak* Topologies

In this section, we fix a measurable set X ⊆ R^n and a measure μ on R^n. Combining the terminology from Sections 11 and 14, a sequence {f_m} converges weakly to f in L^1(X, μ) if for all g ∈ L^∞(X, μ), we have

∫ (f_m(x) − f(x)) g(x) μ(dx) → 0.

For p ∈ (1, ∞), a sequence {f_m} converges weakly (or weak*) to f in L^p(X, μ) if for all g ∈ L^q(X, μ), we have

∫ (f_m(x) − f(x)) g(x) μ(dx) → 0,

where 1/p + 1/q = 1.

75 See Section 20 for a different set of conditions for a fixed point that relies on convex values and open lower sections.
And a sequence {f_m} converges weak* to f in L^∞(X, μ) if for all g ∈ L^1(X, μ), we have

∫ (f_m(x) − f(x)) g(x) μ(dx) → 0.

These notions of convergence lead to a class of topologies on the spaces L^p(X, μ) and L^∞(X, μ). In what follows, we combine the cases p = 1, p ∈ (1, ∞), and p = ∞ by specifying that p and q are conjugate if both 1 < p < ∞ and 1/p + 1/q = 1, or if both p = 1 and q = ∞, or if both p = ∞ and q = 1.

Given p ∈ [1, ∞) or p = ∞, and letting q be conjugate to p, consider the collection U^p of sets of the form

U^p(f, ε, g_1, …, g_n) = { h ∈ L^p(X, μ) : |∫ (f(x) − h(x)) g_i(x) μ(dx)| < ε, i = 1, …, n },

where f ∈ L^p(X, μ), ε > 0, n ∈ N, and g_1, …, g_n ∈ L^q(X, μ). Collecting all unions of sets of the above form, we have the topology generated by the base U^p. When p ∈ [1, ∞) (but not p = ∞), this topology is called the weak topology on L^p(X, μ) and is denoted T_w^p; and when p ∈ (1, ∞) or p = ∞ (but not p = 1), the topology is called the weak* topology on L^p(X, μ) and is denoted T_*^p. Note that the weak and weak* topologies coincide for p ∈ (1, ∞); we do not define the weak topology on L^∞(X, μ), and we do not define the weak* topology on L^1(X, μ).76
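To make the basic sets U^p(f, ε, g_1, …, g_n) concrete, the following minimal numerical sketch (my own illustration, not part of the survey) checks membership of a candidate h in such a set on X = [0, 1] with Lebesgue measure; the particular f, h, test functions, and ε are arbitrary choices.

```python
# Checks |∫ (f - h) g_i dx| < eps for each test function g_i on [0, 1].
import numpy as np

xs = np.linspace(0.0, 1.0, 100_001)
dx = xs[1] - xs[0]

def pairing(u, g):
    return (u * g).sum() * dx                    # stand-in for ∫ u(x) g(x) dx

f = xs ** 2                                       # reference point f
h = xs ** 2 + 0.05 * np.sin(40 * np.pi * xs)      # candidate h: rapid oscillation around f
tests = [np.ones_like(xs), xs, np.cos(2 * np.pi * xs)]   # g_1, g_2, g_3
eps = 1e-3

in_U = all(abs(pairing(f - h, g)) < eps for g in tests)
print([round(abs(pairing(f - h, g)), 6) for g in tests], "->", in_U)
# h is not within eps of f in the metric rho_p (the oscillation has amplitude
# 0.05), yet it lies in this weak basic neighborhood of f, which is one way to
# see that the weak topology is coarser than the metric topology.
```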
For p ∈ [1, ∞) or p = ∞, denote the metric topology on L^p(X, μ) with the metric ρ_p by T^p. As the name suggests, the weak topology on L^p(X, μ) with p ∈ [1, ∞) is weaker than the metric topology, and similarly, the weak* topology on L^∞(X, μ) is weaker than the metric topology. Thus, there are generally more closed subsets in the metric topology than the weak topology. It turns out, however, that the metric and weak topologies have the same closed, convex sets. Indeed, let p ∈ [1, ∞), and consider any set F ⊆ L^p(X, μ) that is convex, i.e., we have λf + (1 − λ)g ∈ F for all f, g ∈ F and all λ ∈ (0, 1); then F is closed in the metric topology T^p if and only if it is closed in the weak topology T_w^p.77

Recall that when μ is finite and 1 ≤ p < q < ∞, we have L^q(X, μ) ⊆ L^p(X, μ), which implies that the base U^p generating the weak topology on L^p(X, μ) is smaller, in a sense, than the base U^q generating the weak topology on L^q(X, μ). This implies that the weak topology on L^p(X, μ) (when relativized to L^q(X, μ)) is weaker, or coarser, than the weak topology on L^q(X, μ), i.e., the weak topology on L^q(X, μ) is stronger, or finer, than the weak topology on L^p(X, μ) (when relativized to L^q(X, μ)). More precisely,

{G ∩ L^q(X, μ) | G ∈ T_w^p} ⊆ T_w^q.

This means that when μ is finite, a compact subset K in the weak topology on L^q(X, μ) will also be compact in the weak topology on L^p(X, μ); and given any continuous function ψ : L^p(X, μ) → R with the weak topology on L^p(X, μ), the restriction ψ|_{L^q(X,μ)} → R will be continuous with the weak topology on L^q(X, μ). Similar remarks hold for the weak* topologies on L^p(X, μ), with p ∈ (1, ∞) and p = ∞.

76 These terminological conventions hinge on whether L^p(X, μ) consists of all continuous linear functions on L^q(X, μ) or vice versa, where p and q are conjugate. This question is addressed by the Riesz representation theorem and is outside the scope of this survey.
77 See Theorem 13, p. 422, of Dunford and Schwartz (1958).
Given p ∈ [1, ∞), if a set F ⊆ L^p(X, μ) is closed (resp. open, compact) in the weak topology, then it is weakly closed (resp. weakly open, weakly compact). Given a topological space Y, a function ψ : L^p(X, μ) → Y is weakly continuous if it is continuous with respect to the weak topology on L^p(X, μ), i.e., for all open G ⊆ Y, the pre-image ψ⁻¹(G) is weakly open. Given a subset L ⊆ L^p(X, μ), the relative weak topology on L is

T_w^p(L) = {L ∩ G | G ∈ T_w^p}.

For example, in the discussion above, we argued that T_w^p(L^q(X, μ)) ⊆ T_w^q when μ is finite and p < q. A set F ⊆ L is weakly closed (resp. weakly open, weakly compact) relative to L if it is closed (resp. open, compact) in the topology T_w^p(L). The function ψ : L → Y is weakly continuous relative to L if for all open G ⊆ Y, ψ⁻¹(G) is weakly open relative to L. Analogously, given p ∈ (1, ∞) or p = ∞, we define weak* closed, weak* open, and weak* compact sets and weak* continuous functions using the topology T_*^p. Given a subset L ⊆ L^p(X, μ), the relative weak* topology on L is

T_*^p(L) = {L ∩ G | G ∈ T_*^p},

and we define relatively weak* closed, open, and compact sets and relatively weak* continuous functions as above.
The weak topology on L^p(X, μ), with p ∈ [1, ∞), underlies the notion of weak convergence. Although we have defined weak convergence for sequences, the topological approach raises the possibility of convergent nets, as well as sequences. Given p and q conjugate with p ∈ [1, ∞), a net {f_α} converges in the weak topology (or just weakly) to f in L^p(X, μ) if it converges in the weak topology T_w^p. I show in the appendix that after some simplification this reduces to the following: f_α → f weakly in L^p(X, μ) if and only if for all ε > 0, all n, and all g_1, …, g_n ∈ L^q(X, μ), there exists ᾱ such that for all α ≽ ᾱ, we have f_α ∈ U^p(f, ε, g_1, …, g_n). Equivalently, f_α → f weakly if for all g ∈ L^q(X, μ),

∫ (f_α(x) − f(x)) g(x) μ(dx) → 0.

Thus, the weak topology extends the notion of weak convergence of sequences in L^p(X, μ) with p ∈ [1, ∞). Consistent with the discussion in the preceding paragraph, assuming μ is finite and p < q, if {f_α} ⊆ L^q(X, μ) converges to f ∈ L^q(X, μ) in the weak topology T_w^q, then it converges in the weak topology T_w^p. By definition, for p ∈ (1, ∞), weak convergence and weak* convergence coincide. Finally, a net {f_α} converges in the weak* topology (or just weak*) to f in L^∞(X, μ) if it converges in the weak* topology T_*^∞ described above: for all ε > 0, all n, and all g_1, …, g_n ∈ L^1(X, μ), there exists ᾱ such that for all α ≽ ᾱ, we have f_α ∈ U^∞(f, ε, g_1, …, g_n). Equivalently, f_α → f weak* if for all g ∈ L^1(X, μ),

∫ (f_α(x) − f(x)) g(x) μ(dx) → 0.

So, again, the weak* topology extends the notion of weak* convergence of a sequence of functions.
For the special case of a sequence {f_m} in L^1(X, μ), we can provide some characterization of weak limits. Recall from the sawtooth example in Figure 9 that a sequence may converge weakly yet possess no subsequence (or subnet) that converges pointwise μ-almost everywhere to its limit f (which takes a constant value of one in the figure), but there is still a characterization in terms of convex hulls of pointwise limits. Recall that a set K ⊆ L^1(X, μ) is uniformly μ-integrable if for all ε > 0, there exists δ > 0 such that for all f ∈ K and all measurable Y ⊆ X with μ(Y) < δ, we have ∫_Y |f(x)| μ(dx) < ε. Let {f_m} be a uniformly μ-integrable sequence that weakly converges to f in L^1(X, μ). If μ is finite, then for μ-almost all x ∈ X, f(x) is contained in the convex hull of accumulation points of the sequence {f_m(x)}, i.e., f(x) ∈ [lim inf f_m(x), lim sup f_m(x)] (where we ignore the μ-measure zero set on which one of these limits may be infinite). In the sawtooth example, we have f(x) = 1 ∈ [0, 2] = [lim inf f_m(x), lim sup f_m(x)] for almost all x. More generally, let {f^m} be a sequence of vector-valued functions f^m = (f_1^m, …, f_n^m) and f = (f_1, …, f_n) such that f^m : X → R^n is measurable for each m, let f : X → R^n be measurable, and assume that {f_i^m} is uniformly μ-integrable and weakly converges to f_i in L^1(X, μ), i = 1, …, n. If μ is finite, then for μ-almost all x ∈ X, we have:78

f(x) ∈ conv( ls{f^m(x)} ).

And since convergence in the weak topology of L^p(X, μ) implies convergence in the weak topology of L^1(X, μ), the characterization extends to L^p(X, μ), with p ∈ (1, ∞) and p = ∞.

78 See Proposition C of Artstein (1979) "A Note on Fatou's Lemma in Several Dimensions," Journal of Mathematical Economics, 6: 277-282. The statement here follows from Artstein's result after noting that if the component functions f_i^m converge weakly to f_i in L^1(X, μ), then the vector-valued functions f^m converge weakly to f in the L^1-space of R^n-valued functions.
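The following rough numerical sketch (my own; the explicit sawtooth formula is an assumption, chosen only to oscillate between 0 and 2 with increasingly fine teeth, in the spirit of Figure 9) illustrates the characterization: the pairings ∫ f_m g dx approach ∫ 1·g dx, and the constant limit 1 lies in [lim inf f_m(x), lim sup f_m(x)] = [0, 2].

```python
# Sawtooth-type functions oscillating between 0 and 2 converge weakly to the
# constant 1, even though they do not converge pointwise.
import numpy as np

xs = np.linspace(0.0, 1.0, 200_001)
dx = xs[1] - xs[0]
g = np.exp(xs)                                 # an arbitrary test function in L^inf

def sawtooth(m, x):
    return 2.0 * np.abs((m * x) % 2.0 - 1.0)   # ranges over [0, 2], period 2/m

target = (np.ones_like(xs) * g).sum() * dx     # ∫ 1·g dx
for m in (2, 8, 32, 128, 512):
    f_m = sawtooth(m, xs)
    print(f"m={m:4d}  ∫ f_m g dx ≈ {(f_m * g).sum() * dx:.5f}   (∫ 1·g dx ≈ {target:.5f})")
```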


The weak and weak* topologies are Hausdorff when μ is σ-finite, as shown in the appendix. Furthermore, each basic open set U^p(f, ε, g_1, …, g_n) is convex, which implies that the weak and weak* topologies are locally convex. The weak and weak* topologies are not generally metrizable. Recall that for p ∈ [1, ∞) or p = ∞, we denote by D_r^p(f) the disc of radius r around f in the metric ρ_p. A special case of Alaoglu's theorem is that for all p ∈ (1, ∞), all r > 0, and all f ∈ L^p(X, μ), the disc D_r^p(f) is compact in the weak* (or equivalently, weak) topology on L^p(X, μ); and for all r > 0 and all f ∈ L^∞(X, μ), the disc D_r^∞(f) is compact in the weak* topology on L^∞(X, μ). But Alaoglu's theorem does not apply to the discs D_r^1(f) in L^1(X, μ) with the weak topology; in general, this disc need not be weakly compact (see the discussion in the next paragraph). Of course, a set is contained in some D_r^p(f) if and only if it is a bounded subset of L^p(X, μ). It follows that for p ∈ (1, ∞), if a subset K ⊆ L^p(X, μ) is convex, bounded, and closed in the metric topology T^p, then it is compact in the weak (equivalently, weak*) topology.
Weak compactness in L^1(X, μ) presents special difficulties, not because the weak topology is very strong (recall that it is generated by basic sets indexed only by functions g_1, …, g_n ∈ L^∞(X, μ)), but because the space is large. Assuming μ is finite, however, a set K ⊆ L^1(X, μ) is compact in the weak topology if and only if it is bounded in the L^1-metric, weakly closed, and uniformly μ-integrable.79 An implication is that for convex K ⊆ L^1(X, μ), the set K is compact in the weak topology if and only if it is closed and bounded in the metric topology and uniformly μ-integrable. To see why uniform μ-integrability is crucial for compactness in L^1(X, μ), return to the earlier example of the sequence {f_m} of functions f_m : [0, 1] → R defined by f_m = m I_{[1−1/m, 1]}. Note that ∫ |f_m(x)| dx = 1 for all m, so the sequence belongs to the disc D_1^1(0) of radius one around the zero function in L^1([0, 1], μ_{[0,1]}), where μ_{[0,1]} is the restriction of Lebesgue measure to measurable subsets of the unit interval. But there is no subsequence (or subnet) of {f_m} that converges to any function in the weak topology. Indeed, if there were such a function, say f, it must be that f takes the value of zero on [0, 1). Letting g_1 ∈ L^∞([0, 1], μ_{[0,1]}) take the constant value of one, weak convergence implies ∫ f_m(x) g_1(x) dx → ∫ f(x) g_1(x) dx, but this implies that ∫ f(x) dx = 1, which is impossible. Obviously, the sequence {f_m} is not uniformly integrable, because there are arbitrarily small intervals [1 − 1/m, 1] over which the integrals of f_m do not go to zero. In contrast, the corresponding example in L^2(X, μ) would specify f_m = √m I_{[1−1/m, 1]}, so that f_m ∈ D_1^2(0), but then ∫ f_m(x) g_1(x) dx = 1/√m → 0, avoiding the difficulty in L^1(X, μ).

79 See Theorem 15, p. 76, of J. Diestel and J. Uhl (1977) Vector Measures, American Mathematical Society: Providence, RI.
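A quick numerical sketch of this example (my own, not part of the text) tabulates the relevant integrals for the L^1 spikes and for the rescaled L^2 version.

```python
# f_m = m*1_[1-1/m,1] stays in the unit ball of L^1[0,1] but pairs to 1 against
# g = 1 for every m, so no subsequence converges weakly to the only candidate
# limit (zero); the rescaled g_m = sqrt(m)*1_[1-1/m,1] stays in the unit ball of
# L^2[0,1] and its pairing with g = 1 tends to 0.
import numpy as np

xs = np.linspace(0.0, 1.0, 200_001)
dx = xs[1] - xs[0]

def integral(values):
    return values.sum() * dx                 # Riemann-sum stand-in for ∫ on [0,1]

for m in (2, 10, 100, 1000):
    spike = (xs >= 1.0 - 1.0 / m).astype(float)
    f_m = m * spike                          # the L^1 example
    g_m = np.sqrt(m) * spike                 # the corresponding L^2 example
    print(f"m={m:5d}  ∫|f_m|dx≈{integral(np.abs(f_m)):.3f}"
          f"  ∫ f_m·1 dx≈{integral(f_m):.3f}"
          f"  ∫ g_m^2 dx≈{integral(g_m**2):.3f}"
          f"  ∫ g_m·1 dx≈{integral(g_m):.3f}")
```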
The above result on compactness in L^1(X, μ) can be applied to selections of correspondences. Assume μ is a finite measure, and let φ : X ⇉ R be μ-integrably bounded and lower measurable with closed, convex values. Then the set of μ-integrable selections,

S(φ) = { f ∈ L^1(X, μ) | f(x) ∈ φ(x) for μ-almost all x ∈ X },

is a weakly compact subset of L^1(X, μ).80 I provide a short proof of this result in the appendix. Note that the assumption of convex values is crucial for the result: the constant correspondence φ : [0, 1] ⇉ R defined by φ(x) = {0, 2} satisfies the remaining conditions of the result, and the sawtooth functions in Figure 9 are selections from φ, but they converge weakly to the function with constant value one, which is not a selection from φ.
Combining Schauder's theorem with previous results:

• If K ⊆ L^1(X, μ) is nonempty, convex, bounded, closed in the metric topology, and uniformly μ-integrable, then every weakly continuous ψ : K → K has at least one fixed point.

• Given p ∈ (1, ∞), if K ⊆ L^p(X, μ) is nonempty, convex, bounded, and closed in the metric topology, then every weakly continuous ψ : K → K has at least one fixed point.

• If K ⊆ L^∞(X, μ) is nonempty, convex, bounded, and closed in the weak* topology, then every weak* continuous ψ : K → K has at least one fixed point.

Using Glicksberg's theorem, these results remain true if we replace weakly (or weak*) continuous functions ψ : K → K with weakly (or weak*) upper hemicontinuous correspondences φ : K ⇉ K such that for all f ∈ K, φ(f) is nonempty, convex, and closed in the metric topology.
Although the weak and weak* topologies are not metrizable, we can consider the relative topologies they induce on discs. This is potentially useful because it can allow us to work with sequences in relevant sets, avoiding problems entailed by nets. Specifically, for p ∈ (1, ∞) and f ∈ L^p(X, μ), we give the disc D_r^p(f) the relative topology induced by the weak* topology on L^p(X, μ); that is, we endow the space D_r^p(f) with the topology

T_*^p(r, f) = T_*^p(D_r^p(f)) = {D_r^p(f) ∩ G | G ∈ T_*^p}.

When p ∈ (1, ∞), this topology is actually metrizable.81 The same is true for f ∈ L^∞(X, μ) and the disc D_r^∞(f) with the relative topology induced by T_*^∞, denoted T_*^∞(r, f) and defined similarly: the relative topology T_*^∞(r, f) is metrizable.82 When p ∈ [1, ∞), we could also define the relative topology induced by the weak topology,

T_w^p(r, f) = T_w^p(D_r^p(f)) = {D_r^p(f) ∩ G | G ∈ T_w^p},

and ask whether the relative topology T_w^p(r, f) is metrizable. For the case p ∈ (1, ∞), these relative topologies are the same, i.e., T_w^p(r, f) = T_*^p(r, f), because the weak and weak* topologies coincide. When p = 1, however, the relative weak topology T_w^1(r, f) on the disc D_r^1(f) is not metrizable unless L^∞(X, μ) is separable, which holds only if μ has finite support. Thus, L^1(X, μ) again presents certain difficulties.

80 This is a version of Theorem 3.1 of Yannelis (1991) "Integration of Banach-valued Correspondences," BEPR Reprint Number 91-023, pp. 235, Springer-Verlag.
81 This follows from Theorem 6.30 of Aliprantis and Border (2006) and the fact that L^p(X, μ) is separable for p ∈ [1, ∞) when X ⊆ R^n. See also Theorem 1, p. 426, of Dunford and Schwartz (1958).

Although the discs in L^1(X, μ) with the weak topology may not be metrizable, it turns out that compactness nevertheless has a sequential characterization. For p ∈ [1, ∞), say K ⊆ L^p(X, μ) is weakly sequentially compact if every sequence {f_m} in K has a subsequence that converges weakly to a limit in K.83 By the Eberlein-Šmulian theorem, a set K ⊆ L^p(X, μ) is compact in the weak topology if and only if it is weakly sequentially compact.84 Because every weakly convergent sequence is bounded,85 an implication is that every weakly compact set is bounded. In fact, even though the disc D_r^1(f) is not metrizable, if a set K ⊆ L^1(X, μ) is compact in the weak topology, then the relative topology on K induced by the weak topology, T_w^1(K) = {G ∩ K | G ∈ T_w^1}, is metrizable.86 Of course, a set K ⊆ L^p(X, μ), with p ∈ (1, ∞) or p = ∞, is weak* sequentially compact if every sequence {f_m} in K has a subsequence that converges weak* to a limit in L^p(X, μ).
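For intuition about weak sequential compactness, the following numerical sketch (my own, not from the text) pairs the functions f_m(x) = sin(2πmx), which lie in a disc of L^2[0,1] and have no metric-convergent subsequence, against a few fixed test functions; the pairings tend to zero, consistent with weak convergence of the sequence to the zero function. The test functions are arbitrary choices.

```python
# sin(2*pi*m*x) has ||f_m - f_k||_2^2 = 1 for m != k (no L^2-convergent
# subsequence), yet ∫ f_m g dx -> 0 for each fixed test function g.
import numpy as np

xs = np.linspace(0.0, 1.0, 100_001)
dx = xs[1] - xs[0]
tests = {"g=1": np.ones_like(xs), "g=x": xs, "g=exp(x)": np.exp(xs)}

for m in (1, 4, 16, 64, 256):
    f_m = np.sin(2 * np.pi * m * xs)
    norm2 = np.sqrt((f_m ** 2).sum() * dx)
    pairs = "  ".join(f"∫f_m·{n} dx≈{((f_m * g).sum() * dx):+.4f}" for n, g in tests.items())
    print(f"m={m:4d}  ||f_m||_2≈{norm2:.3f}  {pairs}")
```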
Given a subset K ⊆ L^p(X, μ), with p ∈ [1, ∞), a function ψ : K → K is weakly sequentially continuous if for all f ∈ K and all sequences {f_m} in K weakly converging to f, we have ψ(f_m) → ψ(f) weakly. Analogously, given K ⊆ L^p(X, μ), with p ∈ (1, ∞) or p = ∞, the function ψ is weak* sequentially continuous if the same condition holds after replacing weak convergence with weak*. Combining Schauder's theorem with previous results:

• Given p ∈ [1, ∞), if K ⊆ L^p(X, μ) is nonempty, convex, and weakly sequentially compact, then every weakly sequentially continuous ψ : K → K has at least one fixed point.

• If K ⊆ L^∞(X, μ) is nonempty, convex, and weak* sequentially compact, then every weak* sequentially continuous ψ : K → K has at least one fixed point.

82 See the previous footnote.
83 Note that Dunford and Schwartz (1958) give a weaker definition of sequential compactness (p. 21) that only requires that {f_m} have some limit in L^p(X, μ).
84 See Theorem 6.34 of Aliprantis and Border (2006), or Theorem 1, p. 430, of Dunford and Schwartz (1958).
85 See Theorem 3.4.11, part (a), of Ash (1972) Measure, Integration, and Functional Analysis, Academic Press: New York, NY. Interestingly, this result does not hold for nets: if the support of μ is infinite (the typical case), then for all p ∈ [1, ∞) (or p = ∞), there is a net {f_α} that weakly (or weak*) converges to the function that is identically zero, yet ρ_p(f_α, 0) diverges to infinity; see Lemma 6.28 of Aliprantis and Border (2006).
86 See Theorem 6.32 of Aliprantis and Border (2006), or Theorem 3, p. 434, of Dunford and Schwartz (1958).
Given a set K ⊆ L^p(X, μ), a correspondence φ : K ⇉ K is weakly sequentially upper hemi-continuous if for every F ⊆ K such that F is relatively weakly closed, i.e., K \ F ∈ T_w^p(K), and for every sequence {f_m} weakly converging to f in K with φ(f_m) ∩ F ≠ ∅ for all m, we have φ(f) ∩ F ≠ ∅. Analogously, the correspondence φ is weak* sequentially upper hemi-continuous if the same condition holds with weak convergence replaced by weak*. Then, using Glicksberg's theorem, the first of the above two fixed point results remains true if we replace weakly sequentially continuous functions ψ : K → K with weakly sequentially upper hemicontinuous correspondences φ : K ⇉ K such that for all f ∈ K, φ(f) is nonempty, convex, and closed in the metric topology. The second result extends to correspondences as well, though now we specify that φ(f) be weak* closed.
The weak and weak* topologies have deep connections to the product topology. First, consider p and q conjugate with p ∈ [1, ∞). To each f ∈ L^p(X, μ), we can associate a function π_f^p : L^q(X, μ) → R defined by

π_f^p(g) = ∫ f(x) g(x) μ(dx)

for all g ∈ L^q(X, μ). Let

Π^p = {π_f^p | f ∈ L^p(X, μ)}

be the set of mappings so defined. Note that π_f^p ∈ Π_{g∈L^q(X,μ)} R = R^{L^q(X,μ)}, and let T denote the product topology on R^{L^q(X,μ)}. Then the weak topology on L^p(X, μ) is essentially the relative topology on Π^p induced by the product topology on R^{L^q(X,μ)}, i.e., we have

T_w^p ≅ {Π^p ∩ G | G ∈ T},

where the equivalence sign indicates that we identify an element f ∈ L^p(X, μ) with the element π_f^p ∈ Π^p. In particular, f_α → f in the weak topology on L^p(X, μ) if and only if π_{f_α}^p → π_f^p in the product topology on R^{L^q(X,μ)}. Analogous remarks hold for the weak* topology on L^∞(X, μ), now defining π_f^∞ : L^1(X, μ) → R similarly, letting Π^∞ be the set of such mappings, and endowing R^{L^1(X,μ)} with the product topology. In fact, Alaoglu's theorem on compactness of discs in the weak* topology is obtained from the application of Tychonoff's theorem via this connection.


20 Maximal Elements

The axiom of choice is one of the axioms usually imposed in axiomatic set theory. Given any arbitrary collection 𝒜 of nonempty sets, it says there is a function f : 𝒜 → ∪𝒜 such that for all A ∈ 𝒜, f(A) ∈ A. The set of such mappings is Π𝒜, the Cartesian product of the collection. Often, we assume the collection is indexed by a set I via a bijective mapping I → 𝒜, α ↦ A_α, and the product is written Π_{α∈I} A_α. Thus, the axiom of choice is simply that the Cartesian product of nonempty sets is nonempty.

There are several equivalent reformulations of the axiom of choice. A relation on X is a two-place predicate expressing a property possessed by pairs of elements in X. Formally, a relation, typically denoted ≽, can be viewed as a subset of X × X, and we write simply x ≽ y for (x, y) ∈ ≽. A partial order is a relation ≽ that is (i) reflexive, i.e., for all x ∈ X, x ≽ x, (ii) anti-symmetric, i.e., for all x, y ∈ X, x ≽ y and y ≽ x implies x = y, and (iii) transitive, i.e., for all x, y, z ∈ X, x ≽ y and y ≽ z implies x ≽ z. Given the relation ≽ on X, the set C ⊆ X is a chain if for all x, y ∈ C, either x ≽ y or y ≽ x. An element x ∈ X is an upper bound of C if for all y ∈ C, x ≽ y. An element x ∈ X is maximal for the partial order ≽ if for all y ∈ X, y ≽ x implies x = y. It turns out that the axiom of choice is equivalent to Zorn's lemma, which states that given any set X and partial order ≽ on X, if every chain has an upper bound, then there is at least one maximal element.
A well-ordering of a set X is a partial order ≽ on X such that every nonempty subset Y ⊆ X possesses a unique dominant element, i.e., there exists x ∈ Y such that for all y ∈ Y \ {x}, we have x ≽ y. The axiom of choice is equivalent to the well-ordering principle, which states that given any nonempty set X, there is a well-ordering of X. Of course, the reverse of the usual ordering of the natural numbers is a well-ordering (because every nonempty subset of natural numbers has a unique minimum), but a well-ordering of the real numbers, for example, cannot be explicitly constructed. The usual greater-than relation does not suffice, of course, because there is no greatest real number, nor for that matter is there a greatest real number in the open interval (0, 1).
Given a partial order ≽ on X and a chain C, say the chain is maximal if there is no other chain that contains it, i.e., there does not exist x ∈ X \ C such that for all y ∈ C, either x ≽ y or y ≽ x. The axiom of choice is equivalent to the Hausdorff maximality principle, which states that given any set X and any partial order ≽ of X, there is at least one maximal chain.
As an application, let X be a topological space and 𝒦 any nonempty collection of nonempty, compact subsets of X. Then there is a nonempty, compact set K* ⊆ X that is minimal with respect to set inclusion among the sets in 𝒦, i.e., there does not exist K ∈ 𝒦 \ {K*} such that K ⊆ K*. To see this, let ≽ be the relation of set inclusion on 𝒦: K ≽ K′ if K ⊆ K′. This relation is a partial ordering of 𝒦, and the Hausdorff maximality principle implies that 𝒦 contains a maximal chain, say C. Note that for all K, K′ ∈ C, either K ⊆ K′ or K′ ⊆ K. Thus, the collection C has the finite intersection property, and by compactness, the intersection K* = ∩C is nonempty. Moreover, since the intersection of compact sets is compact, K* is compact. Suppose there is some K ∈ 𝒦 \ {K*} such that K ⊆ K*. It follows that K ∉ C, but then C ∪ {K} is a chain that contains C as a proper subset, contradicting maximality. We conclude that K* is minimal among 𝒦.87
From this observation, we can deduce a general existence result for maximal elements of a relation. First note that we can extend the concept of dominant element to a general relation ≿ on X as follows: an element x ∈ X is maximal if for all y ∈ X, y ≿ x implies x ≿ y; for a well-ordering, an element is maximal if and only if it is dominant. Let X be a topological space, and let ≿ satisfy: (i) for all x ∈ X, the upper section of ≿ at x, denoted ≿_x = {y ∈ X | y ≿ x}, is closed, (ii) for some x̄ ∈ X, ≿_x̄ is compact, and (iii) ≿ is transitive. Then ≿ has at least one maximal element.88 Indeed, using (ii), choose any x̄ such that ≿_x̄ is compact. Define Z = {x ∈ X | ≿_x ∪ {x} ⊆ ≿_x̄ ∪ {x̄}}, and consider the collection 𝒦 = {≿_x ∪ {x} | x ∈ Z}. By (i), this collection is nonempty and consists of nonempty, compact subsets of X. Therefore, there exists z ∈ Z such that ≿_z ∪ {z} ∈ 𝒦 is minimal with respect to set inclusion. I claim that z is maximal, for consider any x such that x ≿ z. By (iii), it follows that ≿_x ∪ {x} ⊆ ≿_z ∪ {z} ⊆ ≿_x̄ ∪ {x̄}, and in particular, x ∈ Z, which implies ≿_x ∪ {x} ∈ 𝒦. Since ≿_z ∪ {z} is minimal among 𝒦, we have ≿_z ∪ {z} ⊆ ≿_x ∪ {x}, and therefore z ≿ x, and it follows that z is maximal.
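For intuition, the following small finite example (my own sketch, not from the text) checks maximality directly; on a finite set the topological conditions (i) and (ii) are automatic, so transitivity alone delivers a maximal element.

```python
# A transitive relation on a finite set has a maximal element z, i.e., one with
# (y relates to z implies z relates to y) for all y.  The relation R is an
# arbitrary example; (y, x) in R means y ≿ x.
from itertools import product

X = ["a", "b", "c", "d"]
R = {("a", "a"), ("b", "b"), ("c", "c"), ("d", "d"),
     ("b", "a"), ("c", "a"), ("c", "b"), ("d", "a")}   # transitive by inspection

def is_transitive(R, X):
    return all((x, z) in R for x, y, z in product(X, repeat=3)
               if (x, y) in R and (y, z) in R)

def maximal_elements(R, X):
    return [x for x in X if all((x, y) in R for y in X if (y, x) in R)]

assert is_transitive(R, X)
print("maximal elements:", maximal_elements(R, X))     # ['c', 'd']
```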
We can extend the analysis of dominant elements to non-transitive relations as well. For a reflexive relation ≿ on X, we say an element x ∈ X is dominant if for all y ∈ X, we have x ≿ y. Dominant elements are always maximal. Let X be a topological space, and assume (i) for all x ∈ X, ≿_x is closed, (ii) for some x̄ ∈ X, ≿_x̄ is compact, and (iii′) for every finite set Y ⊆ X, there exists x ∈ X such that for all y ∈ Y, x ≿ y. Then the relation ≿ has at least one dominant element. Indeed, using (ii), choose x̄ such that ≿_x̄ is compact. If there is no dominant element belonging to ≿_x̄, then

≿_x̄ ⊆ ∪{ {x ∈ X | not x ≿ y} | y ∈ X },

and by (i), the collection on the righthand side above is an open cover of ≿_x̄. By compactness, there is a finite subcover indexed by y_1, …, y_k. Set Y = {x̄, y_1, …, y_k}. By condition (iii′), which might be called the finite dominance property, there exists z ∈ X such that for all y ∈ Y, z ≿ y, which is impossible.
87 Note that we have not shown that K* ∈ 𝒦. For this, it is sufficient that the collection 𝒦 is closed with respect to intersections of chains.
88 See Proposition A1 of Banks, Duggan, and Le Breton (2006) "Social Choice and Electoral Competition in the General Spatial Model," Journal of Economic Theory, 126: 194-234.
Given the reflexive relation ≿, let > be the associated dual relation defined so that for all x, y ∈ X, the statement x > y holds if and only if it is not the case that y ≿ x. It follows that > is irreflexive, i.e., for all x ∈ X, not x > x. Given the relation >, say an element x ∈ X is undominated if for all y ∈ X, not y > x. Undominated elements are always maximal. Condition (iii′) can be reformulated as follows: there does not exist a finite set Y ⊆ X such that for all x ∈ X, there exists y ∈ Y with y > x. The dominant elements of ≿ coincide with the undominated elements of >, and therefore under (i), (ii), and (iii′), the relation > has at least one undominated element.

A relation ≿ on X is complete if for all x, y ∈ X, either x ≿ y or y ≿ x. This is equivalent to the condition that the dual > is asymmetric, in the sense that for all x, y ∈ X, not both x > y and y > x. Note that completeness strengthens reflexivity, and asymmetry strengthens irreflexivity; and when ≿ is complete, or equivalently when > is asymmetric, the dominant elements of ≿ and undominated elements of > coincide with the maximal elements of these relations. We say ≿ is negatively acyclic if for all n and all finite sets {x_1, …, x_n} ⊆ X, either x_n ≿ x_1 or there exists i = 1, …, n − 1 such that x_i ≿ x_{i+1}. Equivalently, the dual > is acyclic if there do not exist n and a finite set {x_1, …, x_n} ⊆ X such that x_1 > x_2 > ⋯ > x_n > x_1. Note that negative acyclicity strengthens completeness, and acyclicity strengthens asymmetry. If ≿ is negatively acyclic, or equivalently if > is acyclic, then the finite dominance property holds, immediately providing an additional set of sufficient conditions for existence.
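The acyclicity condition is easy to check computationally on finite sets; the sketch below (my own illustration, with an arbitrary relation) verifies that the dual has no cycles and lists the undominated elements, which are guaranteed to exist in this case.

```python
# When the dual relation > is acyclic, every nonempty finite X contains an
# undominated element (no y with y > x).  (y, x) in P means y > x.
X = [1, 2, 3, 4, 5]
P = {(2, 1), (3, 1), (3, 2), (4, 2), (5, 4)}

def has_cycle(P, X):
    """Depth-first search for a >-cycle."""
    succ = {x: [y for (a, y) in P if a == x] for x in X}
    color = {x: "white" for x in X}          # white = unvisited, grey = on the stack
    def dfs(v):
        color[v] = "grey"
        for w in succ[v]:
            if color[w] == "grey" or (color[w] == "white" and dfs(w)):
                return True
        color[v] = "black"
        return False
    return any(color[x] == "white" and dfs(x) for x in X)

undominated = [x for x in X if not any((y, x) in P for y in X)]
print("acyclic:", not has_cycle(P, X))       # True
print("undominated elements:", undominated)  # [3, 5]
```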
In vector spaces, the finite dominance property is also implied by a convexity condition on the upper sections of a relation. Letting X be a convex subset of a vector space, we say a relation ≿ on X is semi-convex if for all x ∈ X, x ∉ conv(>_x), where >_x = {y ∈ X | y > x} is the upper section of the dual. This implies that ≿ is reflexive. If X is in addition a subset of a Hausdorff topological vector space and ≿_x is closed for all x ∈ X, then semi-convexity implies the finite dominance property. Indeed, consider any finite set Y = {y_1, …, y_k}, and note that the span of Y is essentially a finite-dimensional Euclidean space.89 Letting F_j = ≿_{y_j} ∩ span(Y), it follows that {F_1, …, F_k} is a collection of closed subsets of finite-dimensional Euclidean space, and semi-convexity implies that for all I ⊆ {1, …, k}, we have conv({y_j | j ∈ I}) ⊆ ∪{F_j | j ∈ I}. Then by the KKM theorem, ∩{F_j | j = 1, …, k} ≠ ∅, as required. These observations yield another form of the existence result. Let X be a convex subset of a Hausdorff topological vector space, and assume (i) for all x ∈ X, ≿_x is closed, (ii) for some x̄ ∈ X, ≿_x̄ is compact, and (iii″) ≿ is semi-convex. Then the relation ≿ has at least one dominant element, or equivalently, the dual > has at least one undominated element.90
89 See Theorem 5.21 of Aliprantis and Border (2006).
90 See Theorem 2 of Yannelis (1985) "Maximal Elements Over Non-Compact Subsets of Linear Topological Spaces," Economics Letters, 17: 133-136.
For a different application, let Y be a Hausdorff topological vector space, and let X ⊆ Y be nonempty. We say a correspondence φ : X ⇉ Y possesses the KKM property if for every n and every finite set {x_1, …, x_n} ⊆ X we have conv({x_1, …, x_n}) ⊆ ∪_{i=1}^n φ(x_i). Note the implication that x_i ∈ φ(x_i) for all i = 1, …, n. Then the above results on maximality imply that if φ has compact values and possesses the KKM property, then ∩{φ(x) | x ∈ X} ≠ ∅. In fact, the result holds if the requirement that φ is compact-valued is relaxed to the requirement that φ(x) is closed for all x and compact for some x.91 The key to this result is that given φ, we can define an associated relation ≿ on X so that for all x, y ∈ X, x ≿ y if and only if x ∈ φ(y). Then (i) and (ii) are obviously satisfied, and the KKM property implies semi-convexity of ≿. It follows that ≿ possesses a dominant element, say z, and then z ∈ ∩{φ(x) | x ∈ X}, as required.

Finally, the above results on existence of maximal elements have implications for fixed points of correspondences under conditions unlike those of Glicksberg's fixed point theorem. Assume X is a nonempty, convex, compact subset of a Hausdorff topological vector space, and let φ : X ⇉ X be a correspondence with nonempty, convex values and open lower sections, i.e., for all y ∈ X, the set {x ∈ X | y ∈ φ(x)} is open. Then φ has at least one fixed point. In fact, compactness of X can be weakened to the assumption that X \ {x ∈ X | y ∈ φ(x)} is compact for some y ∈ X.92 The key to this result is that given φ satisfying the preceding conditions, we can define an associated relation ≿ on X so that for all x, y ∈ X, x ≿ y if and only if y ∉ φ(x); or equivalently, y ∈ φ(x) if and only if y > x. As above, (i) and (ii) are obviously satisfied. If there were no fixed point, then we would have x ∉ φ(x) for all x, implying semi-convexity of ≿, and then there is a dominant element, say z. But since z is dominant, it follows that for all y ∈ X, we have y ∉ φ(z), i.e., φ(z) = ∅, violating the assumption of nonempty values. This contradiction establishes the fixed point result. Compared to the fixed point theorem derived from Michael's selection theorem in Section 16, we have dropped the assumption of closed values but strengthened the assumption of lower hemi-continuity to open lower sections.

91 See Lemma 1 of Fan (1961) "A Generalization of Tychonoff's Fixed Point Theorem," Mathematische Annalen, 33: 469-489.
92 See Theorem 1 of Yannelis (1985) "Maximal Elements Over Non-Compact Subsets of Linear Topological Spaces," Economics Letters, 17: 133-136.

A Technical Details

Here, I confirm that a single μ-integrable function is uniformly μ-integrable. I prove a version of Lebesgue's dominated convergence theorem with variable integrating probability measure. I give conditions under which convergence in measure implies convergence in mean. I establish that under assumed conditions, product measures are indeed measures. I investigate compactness of convex hulls in metric mixture spaces. I give a short proof of the metric version of Glicksberg's theorem (and therefore the metric version of Schauder's theorem). I address details related to the space L^p(R^n, μ) of equivalence classes of measurable functions, the space P(X) of probability measures with support in X ⊆ R^n, and the space K(X) of nonempty, compact, convex subsets of compact X ⊆ R^n. I verify the metric characterization of weak convergence of transition probabilities. I address pointwise convex hulls of upper and lower hemi-continuous and lower measurable correspondences. I establish some claims about the weak and weak* topologies on L^p(X, μ) and L^∞(X, μ) stated above. And I give conditions under which measurable selections from a correspondence are weakly compact in L^1(X, μ).
Uniform integrability. Let be a finite measure on Rn , and let f : Rn R
be a -integrable function. Suppose in order to deduce a contradiction that
{f } is not uniformly -integrable,R so for all > 0, there exists > 0 and
. Then for each m, there
measurable Y with (Y ) < and Y |f (x)|(dx)
R
1
exists measurable Ym with (Ym ) < m
and Ym |f (x)|(dx) . Defining the
R
sequence {fm }R by fm (x) = f (x)IRn \Ym (x) for all m, it follows that |fm (x)
f (x)|(dx) = Ym f (x)(x) for all m and fm f in measure. Convergence
in measure implies there is a subsequence (still indexed by m) that converges
pointwise -almost everywhereR (see Section 11). But then Lebesgues dominated
convergence theorem implies |fm (x) f (x)|(dx) 0, a contradiction.
Lebesgues dominated convergence theorem. Let X Rn be measurable,
let P Rk , and consider any bounded function f : X P R. Assume that for
all x X, the function fx : P R defined by fx (p) = f (x, p) is continuous; and
assume that for all p P , the function fp : X R defined by fp (x) = f (x, p) is
measurable. Let {m } be a sequence of probability measures on Rn converging
weakly to , and let {pm } be a sequence of parameters converging to p in
P such that f (x, p) is continuous in x. Consider any > 0. Let b be any
bound for f . For each natural number m, define the function fm : X R
by fm (x) = f (x, pm ), and define f0 : X R by f0 (x) = f (x, p). Then K =
{fm | m = 0, 1, 2, . . .} is uniformly bounded by b, and for each x, sx (y) =
sup{|fm (x) fm (y) | m = 1, 2,R . . .} = sup{|f (x, pRm ) f (y, pm )| | m = 1, 2, . . .}
is continuous in y. Therefore, fm (x)k (dx) fm (x)(dx) uniformly across
mR as k . In particular,
there exists k such that for all m k, we have
R
| fm (x)m (dx) fm (x)(dx)| < 2 . Furthermore, continuity of fx implies
that for all x, fm (x) = f (x, pm ) = fx (pm ) fx (p) = f (x, p) = f0 (x), i.e.,
fm f0 pointwise. By Lebesgues
dominated Rconvergence, there exists such
R
that for all m , we have | fm (x)(dx) f (x)(dx)| < 2 . Setting m =

110

max{k, }, it follows that for all m m, we have


Z

Z


fm (x)m (dx) f (x)(dx)


Z
Z

Z
Z






fm (x)m (dx) fm (x)(dx) + fm (x)(dx) f (x)(dx)

+
2 2
,

as required.
Convergence in measure. I claim that
R if fm f in Rmeasure, then the
sequence converges in mean as long as |fm (x)|(dx) |f (x)|(dx) (and
|fm | + |f | is -integrable for all m). Suppose to the contrary that there exists
R > 0 such that for a subsequence (still indexed by m, for simplicity) we have
|fm (x)f (x)|(x) for all m. Going to a further subsequence (still indexed
by m), we can assume that fm f pointwise almost everywhere. Define the
measurable functions g = 2|f | and for each m,
R gm = |fm | + R|f |, and note
that gm g pointwise almost everywhere, and gm (x)(dx) g(x)(dx) <
. Furthermore, note that |fm (x) f (x)| 0 for -almost all x X, and
that for all m and all x X, |fm (x) f (x)| |fm (x)| + |f (x)|R = gm (x).
Therefore, Lebesgues dominated convergence theorem implies that |fm (x)
f (x)|(dx) 0, a contradiction. Thus, fm f in mean, as required.
Product measures. Let Rn be the collection of all half-open rectangles in Rn .
A quasi-measure on Rn is a mapping : Rn : R+ such that:93
(i) () = 0,
(ii) for all pairwise disjoint collections of rectangles R1 , R2 , . . . such that
is itself a half-open rectangle in Rn , we have
!

X
[
=
(Yi ).

Yi

i=1

Ri

i=1

i=1

A quasi-measure is extended to a function defined on the power set of Rn


as follows:
)
(m

X
m N and R1 , . . . , Rm are half-open


Sm
.
(Ri )
(Y ) = inf
rectangles with Y i=1 Ri
i=1

A set Y Rn is -measurable if for all Z Rn , we have (Z) = (Z Y ) +


(Z \ Y ). Letting be the collection of all -measurable sets, it contains
93 As far as I know, the concept of quasi-measure is not usually defined explicitly. It is
simply a measure defined on the semi-ring of half-open rectangles.

the empty set and is closed with respect to complements and countable unions,
i.e., it is a -algebra. The restriction of to is a measure, and (normally
without risk of confusion) it is just denoted as well. Thus, we begin with a
quasi-measure defined on half-open rectangles, and we extend it to a measure on
the -algebra of -measurable sets. By Lemma 10.19 of Aliprantis and Border
(2006), the measure space (Rn , , ) is complete, i.e., for all Y Rn and all
Z with Y Z and (Z) = 0, we have Y .
Now let and be measures on R and Rm , respectively, that are -finite and
absolutely continuous with respect to Lebesgue measure. Let be Lebesgue
measure on Rn . Define the quasi-measure on Rn as follows: write any
half-open rectangle R Rn as R = P Q, where P R and Q Rm ,
and specify ( )(R) = (P )(Q). Then is the collection of measurable sets, and is a measure on . Moreover, the measure space
(Rn , , ) is complete and -finite. To show that is a measure
on Rn , it suffices to prove that contains all Lebesgue measurable sets. To
this end, let B be the -algebra of Borel sets on R , and define Bm and Bn
similarly. Then B and Bm , and therefore:94
Bn = B Bm ,
where the first equality follows from Theorem 4.44 of Aliprantis and Border
(2006), and the last inclusion follows from Theorem 10.47 of Aliprantis and
Border (2006).
Consider any measurable A Ln = with Lebesgue measure zero, i.e., (A) =
0. I claim that A . By Theorem 10.23 (part 6), there is a measurable
set B Bn = B Bm such that A B and n (B) = 0. Let
f = IB : Rn R be the indicator function of B, and note that f is measurable;
furthermore, it is measurable with respect to , i.e., for all open G R, we
have f 1 (G) . Obviously, f is -integrable. For each z Rm , define
fz : R R by fz (y) = f (y, z) = IB (y, z). Fubinis theorem implies that fz is
measurable and in fact -integrable for all y outside a Lebesgue measure zero set
m
(and thus outside
a -measure zero set), that the function
R
R g : R R95defined
by g(z) = fz (y)dy is measurable, and that (B) = g(z)dz = 0.
This
implies that the set Bz = {y R | (y, z) B} has Lebesgue measure zero for
-almost all z, and by absolute continuity,
we have (By ) = 0 for -almost all
R
y. Defining g : Rm R by g(z) = fz (y)(dy), it follows that g(z) = 0 for all
z outside a -measure zero set. Applying Tonellis theorem, using -finiteness
94 Recall that the class of Borel sets is the smallest collection of sets that contains all open
sets and is closed with respect to complements and countable unions; the class is included
among the Lebesgue measurable sets. See Definition 4.14 of Aliprantis and Border (2006)
for details on Borel sets. See Definition 4.43 of Aliprantis and Border (2006) for details on
product -algebras.
95 See Theorem 22.6 of Aliprantis and Burkinshaw (1990) or Theorem 11.27 of Aliprantis
and Border (2006) for an exact statement of Fubinis theorem.

of and , we have:96
( )(B) =

Z Z


Z
f (y, z)(dy) (dz) =
g(z)(dz) = 0.

Since the measure space (Rn , , ) is complete, it follows that A .


Finally, consider A Ln = with (A) > 0. By Theorem 10.23 (part 7) of
Aliprantis and Border (2006), there is a set C such that (C) = 0, A C = ,
and AC Bn = B Bm . Completeness of Lebesgue measure implies
that C is measurable and has Lebesgue measure zero: C Ln and (C) = 0.
From the foregoing, it follows that C , and then A = (AC)\C .
We conclude that every Lebesgue measurable subset of Rn is ()-measurable.
Convex hulls in metric mixture spaces. First, let F = {x1 , . . . , xm } be
a finite subset of a metric mixture space X with metric . Letting be the
unit simplex in Rm , note that X is compact in the
topology,
Pproduct
m
and define the mapping f : X X by f (x, ) =

x
.
Since f
i=1 i i
is continuous, the image f (X ) = conv(F ) is compact. Second, assume
X is complete, let K X be a compact subset, and let C = clos(conv(K))
be the closure of the convex hull. Because X is complete and C is closed,
it follows that C is itself a complete metric space, so to show compactness it
suffices to show that C is totally bounded. Consider any > 0, and note that
{B 3 (x) | x K} is an open covering of K. Since K is compact, there is a
finite subcover {B 3 (x1 ), . . . , B 3 (xm )} of open balls centered at x1 , . . . , xm
K. Letting F = {x1 , . . . , xm }, it follows from the above that the convex hull
H = conv(F ) is compact. Of course, H C. Again, {B 3 (x) | x H} is an
open covering of H, so there is a finite subcover {B 3 (y1 ), . . . , B 3 (yn )} centered
at y1 , . . . , yn H. Now, consider any z conv(K), so there exist z1 , . . . , zk K
Pk
such that z can be written as the convex combination j=1 j zj . For each j,
Pk
there exists wj F such that zj B 3 (wj ). Letting w = j=1 j wj , we have

k
k
X
X

j wj max{(z1 , w1 ), . . . , (zk , wk )} < .


(z, w) =
j zj ,
3
j=1
j=1
Since w H, there is some yi H such that w B 3 (yi ). Thus, (z, yi )
Sn
, and we conclude that conv(K) i=1 B 23 (yi ). Finally,
(z, w) + (w, yi ) < 2
S3n
it follows that C i=1 B (yi ), and therefore it is totally bounded.

Metric Glicksberg theorem. Assume X is a nonempty, compact, metric


mixture space with quasi-convex metric, and let : X X be an upper hemicontinuous correspondence with nonempty, closed, convex values. For each natural number m, compactness of X implies the open covering {B m1 (x) | x X}
96 See Theorem 11.27 of Aliprantis and Burkinshaw (1990) or Theorem 11.28 of Aliprantis
and Border (2006) for an exact statement of Tonellis theorem.

has a finite subcover, {B m1 (z1 ), . . . , B m1 (zk )}, where z1 , . . . , zk X. Let Zm =


{z1 , . . . , zk } and Xm = conv(Zm ). Define the correspondences m : Xm Zm
by

[
arg min (z, y) | y (x)
m (x) =
zZm

and m : Xm Xm by m (x) = conv (m (x)), i.e., m (x) consists of convex


combinations of elements zi that minimize distance, among z1 , . . . , zk , to some
y (x). Note that the correspondence m has nonempty, closed values, while
m has nonempty, closed, convex values. To see that m has closed graph, it
suffices to consider x Xm , a sequence x x in Xm , and zi m (x ) for all .
For each , there exists y (x ) such that zi arg minzZm (z, y ). Since X
is compact, we can go to a subsequence (still indexed by m) such that
y y X, and because has
X
closed graph, we have y (x). By
z1 = x 1
y1
the theorem of the maximum, we have

zi arg minzZm ||z y||, and therex3


(x3 )
fore z m (x), establishing that m
X3
z3
y2
3 (x3 )
has closed graph. It follows that m ,

as the pointwise convex hull of m ,


z2 = x 2
has closed graph as well. (This is
proved later in the appendix.) Therefore, because Xm is finite-dimensional
(and equivalent to a nonempty, compact, convex subset of Rm1 ), Kakutanis theorem implies there is a fixed point
xm Xm such that xm m (xm ), as depicted above and to the right for m = 3.
By construction, there exist x1 , . . . , x m (xm ) and 1 , . . . , 0 with
P
P

j=1 j xj . Furthermore, for each j = 1, . . . , ,


j=1 j = 1 such that xm =
there exists yj (xm ) with (xj , yj ) = minzZm (z, yj ), and therefore
P
1
(xj , yj ) m
. Let ym = j=1 j yj , and note because has convex values,
ym (xm ). By iterated application of quasi-convexity of , we then have

X
X
1
j yj max{(x1 , y1 ), . . . , (x , y )}
j xj ,
(xm , ym ) =
.
m
j=1
j=1

Letting m go to infinity, compactness of X allows us to pass to a subsequence


(still indexed by m) such that xm x X and ym y X, and closed graph
of implies y (x ). And from the preceding argument, we have y = x ,
which yields x (x ).
Equivalence classes of measurable functions. To see that when p [1, ),
every convex set K Lp (Rn , ) is a metric mixture space, let {fm } be a sequence converging to f in K, let {gm } be a sequence converging to g in K, and
let {m } be a sequence converging to in [0, 1]. Then we have


p (m fm + (1 m )gm , f + (1 )g)
Z
 p1
=
|m fm (x) + (1 m )gm (x) f (x) (1 )g(x)|p dx


 p1 
 p1
Z
Z
|m | |fm (x) f (x)|p dx
+ |m | |f (x)|p dx


 p1 
 p1
Z
Z
+ |1 m | |gm (x) g(x)|p dx
+ | m | |g(x)|p dx

0,

where the inequality follows from Minkowskis inequality, and the limit follows
from the facts that |m | 1, and lim p (fm , f ) = lim p (gm , g) = lim |m | =
0. Therefore, K is a metric mixture space. To see that the metric p is quasiconvex (in fact, convex) on K, consider f, g, , K and any (0, 1). Then
p (f + (1 )g, + (1 ))
Z
 p1
p
=
|(f (x) (x)) + (1 )(g(x) (x))| dx

Z

Z
 p1
 p1
+ (1 )
|g(x) (x)|p dx
|f (x) (x)|p dx

= p (f, ) + (1 )p (g, ),

where again the inequality follows from Minkowskis inequality.


Probability measures. To see that, given measurable X Rn , P(X) is a
metric mixture space, let {m } be a sequence converging to in P(X), let {m }
be a sequence converging to in P(X), and let {m } be a sequence converging
to in [0, 1]. Note that for all > 0 and all m such that r (m , ) and
r (m , ) , we have for all measurable Y X and all [0, 1],
m (Y ) + (1 )m (Y )
(Y ) + (1 )(Y )

(Y ) + (1 )(Y ) +
m (Y ) + (1 )m (Y ) + .

Thus, for all > 0, we can choose k high enough that for all m k and all
[0, 1], we have r (m + (1 )m , + (1 )) . Furthermore, for
all > 0 and all m such that |m | , we have for all measurable Y X
and all , P(X),
m (Y ) + (1 m ) (Y )

(Y ) + (1 ) (Y )

(Y ) + (1 ) (Y ) +

m (Y ) + (1 m ) (Y ) + .

Thus, for all > 0, we can choose k high enough that for all m k and all
, P(X), we have r (m + (1 m ), + (1 ) ) . Combining these
observations, it follows that for all > 0, we can choose k high enough that for
all m k,
r (m m + (1 m )m , + (1 ))

r (m m + (1 m )m , m + (1 m ))
+r (m + (1 m ), + (1 ))

+
2 2
,

and we conclude that m m + (1 m )m + (1 ). Therefore, P(X)


is a metric mixture space.
To see that the Prohorov metric r is convex, consider any , , , P(X) and
any [0, 1]. Note that for all , > 0, if
(Y ) (Y ) +

and

(Y ) (Y ) +

for all measurable Y X, then


(Y ) + (1 )(Y )

(Y ) + (1 ) (Y ) + + (1 )

for all measurable Y X. An analogous argument yields


(Y ) + (1 ) (Y )

(Y ) + (1 )(Y ) + + (1 ),

and we conclude that r (+(1), +(1) ) r (, )+(1)r (, ),


as required.
Compact subsets. To see that when X Rn is compact and convex, K(X) is a
metric mixture space, let {Ym } be a sequence converging to Y in K(X), let {Zm }
be a sequence converging to Z in K(X), and let {m } be a sequence converging
to in [0, 1]. It must be shown that m Ym + (1 m)Zm Y + (1 )Z. To
check condition (i) in the definition of closed convergence, it suffices to consider
a sequence {xm } with xm m Ym +(1m)Zm for all m and xm x in X. For
all m, there exist ym Ym and zm Zm such that xm = m ym + (1 m )zm .
Since X is compact, we may go to a subsequence (still indexed by m) such that
ym y and zm z in X. It follows that x = y + (1 )z. By condition (i) of
closed convergence, we have y Y and z Z, and therefore x Y + (1 )Z,
as required. To check condition (ii), consider any x Y + (1 )Z, so there
exist y Y and z Z such that x = y + (1 )z. By condition (ii) of closed
convergence, there is a sequence {ym } such that ym Ym for all m and ym y,
and there is a sequence {zm } such that zm Zm for all m and zm z. Then
for all m, xm = m ym + (1 m )zm m Ym + (1 m )Zm and xm x, as
required. Therefore, K(X) is a metric mixture space.
It remains to be shown that the Hausdorff metric on K(X) is quasi-convex.
Consider any Y, Z, S, T K(X) and any [0, 1]. Let A = Y + (1 )S and
B = Z + (1 )T . Assume without loss of generality that


h (A, B)

= max
min
||a b || = ||a b||,

a A b B

where a A and b B. Let (y, z, s, t) Y Z S T satisfy a = y + (1


)s and b = z + (1 )t. Suppose in order to deduce a contradiction that
h (A, B) > max{h (Y, Z), h (S, T )}. In particular, h (A, B) > h (Y, Z) and
y Y imply that minz Z ||y z || < h (A, B), so there exists z Z such that
||y z || < h (A, B). Similarly, h (A, B) > h (S, T ) and s S imply there
exists t T with ||s t || < h (A, B). Note that b = z + (1 )t B.
Furthermore, we have
||a b ||

= ||(y z ) + (1 )(s t )||


||y z || + (1 )||s t ||

< ||a b||,

but this contradicts ||a b|| = minb B ||a b ||. Therefore, the Hausdorff metric
is quasi-convex.
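The following small computation (my own illustration, with randomly generated finite subsets of $\mathbb{R}^2$ standing in for elements of $K(X)$) checks the quasi-convexity inequality for the Hausdorff metric numerically:

```python
import itertools
import numpy as np

rng = np.random.default_rng(2)

def hausdorff(A, B):
    """Hausdorff distance between two finite point sets in R^2."""
    D = np.linalg.norm(A[:, None, :] - B[None, :, :], axis=2)  # pairwise distances
    return max(D.min(axis=1).max(), D.min(axis=0).max())

def mix(a, Y, S):
    """The set of points a*y + (1-a)*s with y in Y and s in S."""
    return np.array([a * y + (1 - a) * s for y, s in itertools.product(Y, S)])

Y, Z, S, T = (rng.random((5, 2)) for _ in range(4))
for a in (0.2, 0.5, 0.8):
    lhs = hausdorff(mix(a, Y, S), mix(a, Z, T))
    rhs = max(hausdorff(Y, Z), hausdorff(S, T))
    assert lhs <= rhs + 1e-12
    print(f"a={a}: {lhs:.4f} <= {rhs:.4f}")
```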
Transition probabilities. To see that $R(Y,Z,\nu)$ equipped with the metric $r$ is a metric mixture space, let $\{\pi_k\}$ and $\{\pi'_k\}$ be sequences converging, respectively, to $\pi$ and $\pi'$ in $R(Y,Z,\nu)$, and let $\alpha_k \to \alpha$ in $[0,1]$. Then we have
\begin{align*}
& r(\alpha_k\pi_k + (1-\alpha_k)\pi'_k,\ \alpha\pi + (1-\alpha)\pi') \\
&= \sum_{i=1}^{\infty}\sum_{j=1}^{\infty} \frac{1}{2^{i+j}\nu(B_j)} \left| \int_{z \in B_j}\!\int_{y} f_i(y)\,(\alpha_k\pi_k + (1-\alpha_k)\pi'_k)(dy|z)\,\nu(dz) - \int_{z \in B_j}\!\int_{y} f_i(y)\,(\alpha\pi + (1-\alpha)\pi')(dy|z)\,\nu(dz) \right| \\
&\le \sum_{i=1}^{\infty}\sum_{j=1}^{\infty} \frac{1}{2^{i+j}\nu(B_j)} \Bigg[ |\alpha_k - \alpha| \left| \int_{z \in B_j}\!\int_{y} f_i(y)\,\pi_k(dy|z)\,\nu(dz) \right| + \alpha \left| \int_{z \in B_j}\!\int_{y} f_i(y)\,(\pi_k - \pi)(dy|z)\,\nu(dz) \right| \\
&\qquad\qquad\qquad + |\alpha - \alpha_k| \left| \int_{z \in B_j}\!\int_{y} f_i(y)\,\pi'_k(dy|z)\,\nu(dz) \right| + (1-\alpha) \left| \int_{z \in B_j}\!\int_{y} f_i(y)\,(\pi'_k - \pi')(dy|z)\,\nu(dz) \right| \Bigg] \\
&\to 0,
\end{align*}
where the limit follows from $\alpha_k \to \alpha$, $\pi_k \to \pi$, and $\pi'_k \to \pi'$.

To see that the metric $r$ is convex, consider transition probabilities $\pi^1, \pi^2, \pi^3, \pi^4$ and $\alpha \in [0,1]$. Then we have
\begin{align*}
& r(\alpha\pi^1 + (1-\alpha)\pi^3,\ \alpha\pi^2 + (1-\alpha)\pi^4) \\
&= \sum_{i=1}^{\infty}\sum_{j=1}^{\infty} \frac{1}{2^{i+j}\nu(B_j)} \left| \int_{z \in B_j}\!\int_{y} f_i(y)\,(\alpha\pi^1 + (1-\alpha)\pi^3)(dy|z)\,\nu(dz) - \int_{z \in B_j}\!\int_{y} f_i(y)\,(\alpha\pi^2 + (1-\alpha)\pi^4)(dy|z)\,\nu(dz) \right| \\
&= \sum_{i=1}^{\infty}\sum_{j=1}^{\infty} \frac{1}{2^{i+j}\nu(B_j)} \left| \alpha \int_{z \in B_j}\!\int_{y} f_i(y)\,(\pi^1 - \pi^2)(dy|z)\,\nu(dz) + (1-\alpha) \int_{z \in B_j}\!\int_{y} f_i(y)\,(\pi^3 - \pi^4)(dy|z)\,\nu(dz) \right| \\
&\le \alpha\, r(\pi^1, \pi^2) + (1-\alpha)\, r(\pi^3, \pi^4),
\end{align*}
as required.
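For concreteness (my own finite stand-in, not from the text): with two states $z$, three outcomes $y$, a two-element family $\{f_i\}$, and $B_j = \{j\}$, the series metric sketched above reduces to a finite weighted sum, and the convexity inequality can be checked directly. Each kernel is represented as a row-stochastic matrix pi[z, y].

```python
import numpy as np

rng = np.random.default_rng(3)
nu = np.array([0.5, 0.5])                 # measure on Z = {0, 1}
fs = [np.array([0.0, 1.0, 2.0]),          # bounded test functions on Y
      np.array([1.0, 0.0, 1.0])]

def random_kernel():
    k = rng.random((2, 3))
    return k / k.sum(axis=1, keepdims=True)

def r(p, q):
    """Finite toy version of the weighted-sum metric on transition kernels."""
    total = 0.0
    for i, f in enumerate(fs, start=1):
        for j in range(2):                # B_j = {j}, so nu(B_j) = nu[j]
            diff = nu[j] * (p[j] @ f - q[j] @ f)
            total += abs(diff) / (2 ** (i + j + 1) * nu[j])
    return total

p1, p2, p3, p4 = (random_kernel() for _ in range(4))
for a in (0.3, 0.7):
    lhs = r(a * p1 + (1 - a) * p3, a * p2 + (1 - a) * p4)
    rhs = a * r(p1, p2) + (1 - a) * r(p3, p4)
    assert lhs <= rhs + 1e-12
    print(f"a={a}: {lhs:.4f} <= {rhs:.4f}")
```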
Convex hulls of upper hemi-continuous correspondences. Assume $X$ is a metric space, $Y$ is a metric mixture space, $\phi : X \rightrightarrows Y$ is upper hemi-continuous, and $\psi(x) = \mathrm{conv}(\phi(x))$ is compact for all $x \in X$. Consider any $x \in X$ and any open $V \subseteq Y$ with $\psi(x) \subseteq V$. I claim that there exists $\epsilon > 0$ such that for all $y \in \psi(x)$, we have $B_\epsilon(y) \subseteq V$. Indeed, otherwise there are sequences $\{y_m\}$ and $\{z_m\}$ in $Y$ such that for all $m$, $y_m \in \psi(x)$ and $z_m \in B_{\frac{1}{m}}(y_m) \setminus V$. Since $\psi(x)$ is compact, there is a convergent subsequence of $\{y_m\}$ (still indexed by $m$) with limit $y \in \psi(x)$. Moreover, $\rho(z_m, y) \le \rho(z_m, y_m) + \rho(y_m, y) \to 0$, so $z_m \to y$. But then $z_m \in V$ for sufficiently high $m$, a contradiction. Thus, we can choose $\epsilon > 0$ as in the claim; in particular, $G = \bigcup\{B_\epsilon(y) \mid y \in \psi(x)\} \subseteq V$. Since $\phi(x) \subseteq G$ and $\phi$ is upper hemi-continuous, there is an open set $U \subseteq X$ with $x \in U$ such that for all $z \in U$, $\phi(z) \subseteq G$. Now consider $z \in U$ and $w \in \psi(z)$, so $w = \sum_{j=1}^{m} \lambda_j w_j$ is a convex combination of elements $w_1, \ldots, w_m \in \phi(z) \subseteq G$. For each $j = 1, \ldots, m$, there exists $y_j \in \phi(x)$ such that $\rho(w_j, y_j) < \epsilon$. Let $y = \sum_{j=1}^{m} \lambda_j y_j \in \psi(x)$. By repeated application of quasi-convexity of $\rho$, we have
\[
\rho(w, y) \;=\; \rho\!\left( \sum_{j=1}^{m} \lambda_j w_j,\ \sum_{j=1}^{m} \lambda_j y_j \right) \;\le\; \max\{\rho(w_1, y_1), \ldots, \rho(w_m, y_m)\} \;<\; \epsilon,
\]
which implies $w \in V$, as required.
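The "repeated application" of quasi-convexity can be made explicit by induction on the number of terms (this elaboration is mine, using only the quasi-convexity property of $\rho$ on mixtures): if $\lambda_1 < 1$, then both sums are $\lambda_1$-mixtures, so
\[
\rho\!\left( \sum_{j=1}^{m} \lambda_j w_j,\ \sum_{j=1}^{m} \lambda_j y_j \right)
\;\le\; \max\!\left\{ \rho(w_1, y_1),\ \rho\!\left( \sum_{j=2}^{m} \tfrac{\lambda_j}{1-\lambda_1}\, w_j,\ \sum_{j=2}^{m} \tfrac{\lambda_j}{1-\lambda_1}\, y_j \right) \right\},
\]
and iterating over the remaining terms yields the bound $\max\{\rho(w_1,y_1), \ldots, \rho(w_m,y_m)\}$; if $\lambda_1 = 1$, the bound is immediate.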

Convex hulls of lower hemi-continuous correspondences. Assume $X$ is a metric space, $Y$ is a metric mixture space, and $\phi : X \rightrightarrows Y$ is lower hemi-continuous, and let $\psi(x) = \mathrm{conv}(\phi(x))$. Consider any $x \in X$ and any open $V \subseteq Y$ with $\psi(x) \cap V \ne \emptyset$. Let $y \in \psi(x) \cap V$, so $y = \sum_{j=1}^{m} \lambda_j y_j$ is a convex combination of elements $y_1, \ldots, y_m \in \phi(x)$, and let $\epsilon > 0$ satisfy $B_\epsilon(y) \subseteq V$. Since $\phi$ is lower hemi-continuous, for each $i = 1, \ldots, m$, there is an open set $U_i \subseteq X$ such that $x \in U_i$ and for all $z \in U_i$, we have $\phi(z) \cap B_\epsilon(y_i) \ne \emptyset$. Let $U = \bigcap_{i=1}^{m} U_i$, and consider any $z \in U$. Since $z \in U_i$, there exists $w_i \in \phi(z) \cap B_\epsilon(y_i)$. Then $w = \sum_{i=1}^{m} \lambda_i w_i \in \psi(z)$, and by repeated application of quasi-convexity of $\rho$, we have $\rho(y, w) \le \max\{\rho(y_1, w_1), \ldots, \rho(y_m, w_m)\} < \epsilon$. This implies $w \in B_\epsilon(y) \subseteq V$, and therefore $\psi(z) \cap V \ne \emptyset$, as required.

Convex hulls of lower measurable correspondences. Assume $X$ is a separable metric space and the correspondence $\phi : X \rightrightarrows \mathbb{R}^m$ is lower measurable, and define $\psi : X \rightrightarrows \mathbb{R}^m$ by $\psi(x) = \mathrm{conv}(\phi(x))$. Note that by Carathéodory's theorem, every $y \in \psi(x)$ can be written as a convex combination of $m+1$ elements of $\phi(x)$, i.e., $y = \sum_{i=1}^{m+1} \lambda_i y_i$ with $y_i \in \phi(x)$, $i = 1, \ldots, m+1$. Letting $\Delta$ denote the unit simplex in $\mathbb{R}^{m+1}$, define $\gamma : \mathbb{R}^{(m+1)m} \times \Delta \to \mathbb{R}^m$ by
\[
\gamma(y_1, \ldots, y_{m+1}, \lambda_1, \ldots, \lambda_{m+1}) \;=\; \sum_{i=1}^{m+1} \lambda_i y_i,
\]
and define $\phi^{m+1} : X \rightrightarrows \mathbb{R}^{(m+1)m}$ by $\phi^{m+1}(x) = [\phi(x)]^{m+1}$, and note that $\gamma$ is continuous and $\phi^{m+1}$ is lower measurable. Then $\psi(x) = \gamma(\phi^{m+1}(x) \times \Delta)$ for each $x \in X$. For every open $G \subseteq \mathbb{R}^m$, we have
\[
\{x \in X \mid \psi(x) \cap G \ne \emptyset\} \;=\; \left\{ x \in X \mid \phi^{m+1}(x) \cap \pi(\gamma^{-1}(G)) \ne \emptyset \right\},
\]
where $\pi$ is the projection mapping from $\mathbb{R}^{(m+1)m} \times \Delta$ onto $\mathbb{R}^{(m+1)m}$. By continuity of $\gamma$, the pre-image $\gamma^{-1}(G)$ is open, and the projection of the open set $\gamma^{-1}(G)$ is open, and then lower measurability of $\phi^{m+1}$ implies that $\{x \in X \mid \psi(x) \cap G \ne \emptyset\}$ is measurable, as required.
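As a concrete illustration of the Carathéodory step (my own example, not from the text): in $\mathbb{R}^2$, so $m = 2$, a point in the convex hull of four points can already be written as a convex combination of $m + 1 = 3$ of them.

```python
import numpy as np

pts = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
y = np.array([0.6, 0.6])          # lies in the convex hull of all four points

# Weights on the subset {(1,0), (0,1), (1,1)}, found by solving the 3x3 linear
# system given by the two coordinates plus the requirement that weights sum to 1.
subset = pts[[1, 2, 3]]
lam = np.array([0.4, 0.4, 0.2])

assert np.isclose(lam.sum(), 1.0) and (lam >= 0).all()
assert np.allclose(lam @ subset, y)
print("y =", lam @ subset, "as a convex combination of 3 of the 4 points")
```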
Weak and weak* convergence. By definition, for $p$ and $q$ conjugate with $p \in [1,\infty)$, a net $\{f_\alpha\}$ converges weakly to $f$ in $L^p(X,\mu)$ if for all $f' \in L^p(X,\mu)$, all $\epsilon > 0$, all $n$, and all $g_1, \ldots, g_n \in L^q(X,\mu)$, if $f \in U(f', \epsilon, g_1, \ldots, g_n)$, then there exists $\bar{\alpha}$ such that for all $\alpha \ge \bar{\alpha}$, we have $f_\alpha \in U(f', \epsilon, g_1, \ldots, g_n)$. This can be simplified somewhat, because we only need to consider the basic open sets $U(f, \epsilon, g_1, \ldots, g_n)$ defined at $f$ itself. That is, $f_\alpha \to f$ weakly if and only if:

for all $\epsilon > 0$, all $n$, and all $g_1, \ldots, g_n \in L^q(X,\mu)$, there exists $\bar{\alpha}$ such that for all $\alpha \ge \bar{\alpha}$, we have $f_\alpha \in U(f, \epsilon, g_1, \ldots, g_n)$.

The "only if" direction is immediate. To see the "if" direction, suppose that the latter condition holds. Consider any $f' \in L^p(X,\mu)$, any $\epsilon > 0$, any $n$, and any $g_1, \ldots, g_n \in L^q(X,\mu)$ such that $f \in U(f', \epsilon, g_1, \ldots, g_n)$. Let
\[
\eta \;=\; \max_{i=1,\ldots,n} \int (f'(x) - f(x)) g_i(x)\, \mu(dx),
\]
and note that $\eta < \epsilon$. Let $\delta = \epsilon - \eta > 0$. I claim that $U(f, \delta, g_1, \ldots, g_n) \subseteq U(f', \epsilon, g_1, \ldots, g_n)$. Indeed, consider any $h \in U(f, \delta, g_1, \ldots, g_n)$, so that $\int (f(x) - h(x)) g_i(x)\, \mu(dx) < \delta$ for each $i$. Then
\[
\int (f'(x) - h(x)) g_i(x)\, \mu(dx) \;=\; \int (f'(x) - f(x)) g_i(x)\, \mu(dx) + \int (f(x) - h(x)) g_i(x)\, \mu(dx) \;<\; \eta + \delta \;=\; \epsilon,
\]
so $h \in U(f', \epsilon, g_1, \ldots, g_n)$, establishing the claim. By assumption, there exists $\bar{\alpha}$ such that for all $\alpha \ge \bar{\alpha}$, we have $f_\alpha \in U(f, \delta, g_1, \ldots, g_n)$ and, therefore, $f_\alpha \in U(f', \epsilon, g_1, \ldots, g_n)$. We conclude that $f_\alpha \to f$ weakly, as required. An analogous argument holds for weak* convergence.
To see that the weak topology for $p \in [1,\infty)$ is Hausdorff if $\mu$ is $\sigma$-finite, consider any distinct $f, f' \in L^p(X,\mu)$, so the set $\{x \in X \mid f(x) - f'(x) \ne 0\}$ has positive measure. Let $Y = \{x \in X \mid f(x) - f'(x) > 0\}$, and without loss of generality assume $\mu(Y) > 0$. It may be that the function $|f - f'|^q$ is not $\mu$-integrable. Because $\mu$ is $\sigma$-finite, there is a countable collection $\{W_m\}$ of measurable sets such that $\mathbb{R}^n = \bigcup_{m=1}^{\infty} W_m$ and for all $m$, $\mu(W_m) < \infty$. Then $Y = \bigcup_{m=1}^{\infty} (Y \cap W_m)$. Since $\mu$ is countably additive, this implies $\mu(Y) \le \sum_{m=1}^{\infty} \mu(Y \cap W_m)$, which implies that $Z = Y \cap W_m$ is a measurable subset of $Y$ with positive, finite $\mu$-measure for some $m$. Define a measurable partition $\{Z_1, Z_2, \ldots\}$ of $Z$ by $Z_n = \{x \in Z \mid n - 1 < f(x) - f'(x) \le n\}$ for $n = 1, 2, \ldots$. Since $\mu(Z) = \sum_{n=1}^{\infty} \mu(Z_n) > 0$, there is some $n$ such that $0 < \mu(Z_n) < \infty$. Define $g : X \to \mathbb{R}$ by $g(x) = (f(x) - f'(x)) I_{Z_n}(x)$, and note that $g$ takes values strictly above $n-1$ and bounded by $n$ on $Z_n$, and zero elsewhere. Therefore, $g \in L^q(X,\mu)$. Now define
\[
\epsilon \;=\; \frac{1}{2} \int (f(x) - f'(x)) g(x)\, \mu(dx) \;=\; \frac{1}{2} \int_{Z_n} (f(x) - f'(x))^2\, \mu(dx),
\]
and note that $\epsilon > 0$. I claim that $U(f, \epsilon, g, -g)$ and $U(f', \epsilon, g, -g)$ are disjoint. Indeed, if $h$ belongs to both sets, then we have both
\[
\int (f(x) - h(x)) g(x)\, \mu(dx) < \epsilon \quad\text{and}\quad \int (f'(x) - h(x)) g(x)\, \mu(dx) > -\epsilon,
\]
but this implies $\int (f(x) - f'(x)) g(x)\, \mu(dx) < 2\epsilon$, a contradiction. Thus, $f$ and $f'$ are separated by disjoint basic open sets, as required. The argument for the weak* topology is similar.
Compactness of selections. Assume $\mu$ is finite, $X \subseteq \mathbb{R}^n$ is measurable, and $\phi : X \rightrightarrows \mathbb{R}$ is $\mu$-integrably bounded and lower measurable with closed, convex values. Since $\phi$ is $\mu$-integrably bounded, there exists a $\mu$-integrable function $g : X \to \mathbb{R}$ such that for all $x \in X$, we have $\sup |\phi(x)| = \sup\{|y| \mid y \in \phi(x)\} \le g(x)$. Consider any $\epsilon > 0$, and using the fact that $\{g\}$ is uniformly $\mu$-integrable, choose $\delta > 0$ such that for all measurable $Y \subseteq \mathbb{R}^n$ with $\mu(Y) < \delta$, we have $\int_Y |f(x)|\, \mu(dx) \le \int_Y g(x)\, \mu(dx) < \epsilon$ for all $f \in S(\phi)$. Thus, $S(\phi)$ is uniformly $\mu$-integrable. It is clearly bounded in the $L^1$-metric. Since $\phi$ has convex values, it follows that $S(\phi)$ is convex, so to show it is weakly closed, it suffices to show it is closed in the metric topology generated by $\rho_1$. To this end, let $\{f_m\}$ be a sequence in $S(\phi)$ that converges to $f$ in $L^1(X,\mu)$, i.e., $\rho_1(f_m, f) \to 0$. Then there is a subsequence (still indexed by $m$) that converges pointwise $\mu$-almost everywhere. Let $Y_0 \subseteq X$ be a measurable set such that $\mu(X \setminus Y_0) = 0$ and for all $x \in Y_0$, $f_m(x) \to f(x)$. For each natural number $m$, let $Y_m \subseteq X$ be a measurable set such that $\mu(X \setminus Y_m) = 0$ and for all $x \in Y_m$, $f_m(x) \in \phi(x)$. Let $Y = \bigcap_{m=0}^{\infty} Y_m$, and note that $\mu(X \setminus Y) = 0$. For each $x \in Y$, we have $f_m(x) \to f(x)$; since $f_m(x) \in \phi(x)$ for all $m$ and $\phi(x)$ is closed, it follows that $f(x) \in \phi(x)$. Thus, $f \in S(\phi)$, and we conclude that $S(\phi)$ is closed. Finally, since $S(\phi)$ is bounded, weakly closed, and uniformly $\mu$-integrable, it is weakly compact.


Index
C^r manifold, 34
C^r manifold with boundary, 34
L^p-metric, 66
μ-integrable
uniformly, 68
μ-integrable correspondence, 87
μ-integrable function, 41
μ-integrably bounded, 87
μ-measure zero, 29
-measurable, 111
σ-algebra, 28
σ-field, 28
σ-finite measure, 29
r-times continuously partially differentiable function, 26, 32
r-times partially differentiable function,
26

Borel measurable function, 39


Borel set, 27
boundary
in topological space, 91
boundary of set, 13
boundary point, 13
boundary point relative to X, 18
bounded, 7
bounded above, 5
bounded below, 5
bounded function, 7
bounded on Y , 7
bounded set, 6
above, 5
below, 5
in Euclidean space, 15
in metric space, 59
in product of Euclidean space, 21
absolutely continuous, 43
in subset of Euclidean space, 19
absolutely continuous with respect to μ, 43
Browder's theorem, 22, 36
Brouwer's fixed point theorem, 20
absorbing set, 76
accumulation point, 15, 59
Cantor set, 28
acyclic relation, 108
Caratheodory extension theorem, 29
adjoint, 75
Caratheodory function, 55, 86
Alaoglus theorem, 102
Caratheodorys theorem, 9
almost sure convergence, 49
cardinality of the continuum, 8
anti-symmetric relation, 106
Cartesian product, 6, 106
Arzela-Ascoli theorem, 65
Cauchy sequence, 60
asymmetric relation, 108
Cauchy-Schwarz, 10
asymptotic convergence, 14
chain, 106
atom of a measure, 31
chain rule, 23
atomless measure, 31
closed convergence, 70
atomless vector measure, 31
closed graph, 80
Aumann integral, 87
closed half-space, 10
axiom of choice, 8, 106
closed set, 14, 21
in X, 19
base (for topology), 92
in metric space, 58
basis, 11
in relative topology, 18
Berges maximum theorem, 84
in topological space, 91
bijective, 7
relatively, 19
Bohnenblust-Karlin theorem, 98
weak*, 100
Bolzano-Weierstrass theorem, 16
weak* relative to L, 100

weakly, 100
weakly relative to L, 100
closed values, 79
closure of set, 14
closure point, 13
coarser topology, 92
codomain, 6
column rank, 12
combination
convex, 9, 61, 95
linear, 9
compact set
Euclidean space, 15
metric space, 59
product topology, 21
relative topology, 19
topological space, 94
weak sequentially, 104
weak*, 100
weak* relative to L, 100
weak* sequentially, 104
weakly, 100
weakly relative to L, 100
compact values, 79
complement, 4
complete measure, 28
complete metric space, 60
complete relation, 108
completeness axiom, 5
composition, 7
concave function, 10
conditional probability of an event, 74
conjugate, 52, 99
connected set, 16, 19, 21, 59, 94
continuous correspondence, 81
continuous correspondence at x, 81
continuous function
Euclidean space, 16
metric space, 60
product topology, 22
relative topology, 20
topological space, 94
continuous function at x, 16
continuous selection, 84
continuous vector-valued function
Euclidean space, 17

metric space, 60
product topology, 22
relative topology, 20
continuously partially differentiable function, 24
continuously twice partially differentiable
function, 25
contour set
strict lower, 7
strict upper, 7
weak lower, 7
weak upper, 7
contraction mapping
Euclidean space, 20
metric space, 61
contraction mapping theorem
Euclidean space, 20
metric space, 61
convergence
asymptotic, 14
in metric space, 59
in product topology, 21, 96
in relative topology, 18
in topological space (net), 94
in topological space (sequence), 93
convergence in L -metric, 49
convergence in Lp -metric, 49
convergence in pth mean, 49
convergence in distribution, 53
convergence in essential supremum metric, 49
convergence in mean, 49
convergence in measure, 49
convergence in probability, 49
convergence in total variation norm, 45
convergence in weak topology, 100
convergence in weak* topology, 101
convergence of functions
almost sure, 49
in L -metric, 49
in Lp -metric, 49
in pth mean, 49
in distribution, 53
in essential supremum metric, 49
in mean, 49
in measure, 49


in probability, 49
in product topology, 96
in weak topology, 100
in weak* topology, 101
pointwise, 17, 20
pointwise almost everywhere, 49
uniform, 17, 20
uniform almost everywhere, 49
weak, 52
weak*, 52
weakly order p, 52
convergence of measures
in total variation, 45
set-wise, 45
strong, 45
uniformly set-wise, 45
weak, 45
weak*, 45
convergent net, 94
convergent net (in product topology),
96
convergent sequence, 14, 93
convex, 65, 67
convex combination
Euclidean space, 9
metric mixture space, 61
vector space, 95
convex function, 11
convex hull
Euclidean space, 9
metric mixture space, 61
convex metric, 62
convex set
Euclidean space, 9
metric mixture space, 61
vector space, 95
correspondence, 79
-integrable, 87
-integrably bounded, 87
closed graph, 80
lower hemicontinuous, 80
lower measurable, 85
open graph, 81
open lower sections, 81
upper hemicontinuous, 79, 98
weak* sequentially uhc, 105

weakly measurable, 85
weakly sequentially uhc, 105
countable additivity, 29
countable set, 7
countable sub-additivity, 30
countably infinite set, 8
counting measure, 29
critical point, 35
critical value, 35
cross partial derivative, 25
De Morgans law, 4
decreasing function, 7
decreasing sequence, 14
degenerate on x, 31
dense set, 60
density function, 43
density with respect to μ, 43
derivative of f at x in direction t, 23
determinant, 13
diagonally dominant matrix, 13
diffeomorphic sets, 34
diffeomorphism, 33
differentiable at x in direction t, 23
differentiable function, 23
dimension of Euclidean space, 6
dimension of linear space, 11
directed set, 93
direction (in Rn ), 10
direction (on arbitrary set), 93
directionally differentiable at x, 23
directionally differentiable function, 23
disc, 13, 68
discrete metric, 58
discrete topology, 92
disjoint, 4
pairwise disjoint, 4
distribution of a function, 39
divergent sequence, 14
Doeblins condition, 75
domain, 6
dominant element, 106, 107
dominated functions, 42, 44
dot product, 9
dual relation, 108



Eberlein-Šmulian theorem, 104
Egoroffs theorem, 52
empty set, 4
equicontinuous, 65
ergodic distribution, 76
ergodic set, 76
essentially bounded function, 41
Euclidean metric, 58
Euclidean norm, 9
Euclidean space, 6
expected value, 41
expected value of vector-valued function, 44
extended real numbers, 3
extension, 7

concave, 10
continuous, 16, 17, 20, 22, 60, 94
convex, 11
decreasing, 7
diffeomorphism, 33
differentiable, 23
directionally differentiable, 23
distribution of, 39
essentially bounded, 41
increasing, 7
indicator, 7
injective, 6
integrable, 40
jointly measurable, 54
linear, 10, 11
local diffeomorphism, 33
Fatous lemma
measurable, 38, 44, 61
correspondences, 91
partially differentiable, 24
real-valued function, 42
quasi-concave, 11
variable measure, 47
quasi-convex, 11
vector-valued function, 44
strictly concave, 11
Feller property, 71
strictly convex, 11
Filippovs implicit function theorem, 87
strictly decreasing, 7
finite cylinder set, 96
strictly increasing, 7
finite dominance property, 107
strictly quasi-concave, 11
finite intersection property, 16
strictly quasi-convex, 11
finite measure, 29
surjective, 7
finite partition, 4
twice directionally differentiable, 24
finite set, 8
twice partially differentiable, 25
first countable, 94
weak* continuous, 100
fixed point, 20, 61, 84, 95, 98
weak* continuous relative to L, 100
Fubinis theorem, 56, 72
weak* sequentially continuous, 104
full rank, 12
weakly continuous, 100
column, 12
weakly continuous relative to L,
row, 12
100
function, 6
weakly sequentially continuous, 104
C^1, 24
fundamental theorem of calculus, 40
C^2, 25
fundamental theorem of linear algebra,
C^r, 26, 32
11, 12
-integrable, 41
r-times partially differentiable, 26
generated topology (from base), 92
bijective, 7
Glicksbergs fixed point theorem
Borel measurable, 39
metric space, 84
bounded, 7
topological space, 98
bounded on Y , 7
gradient, 24
Caratheodory, 55, 86
graph of correspondence, 80

greatest lower bound, 5

Jensens inequality, 44
jointly measurable function, 54

Hölder's inequality, 66
half-open rectangle, 27
half-space
closed, 10
open, 10
Hausdorff metric, 69, 70
Hausdorff topological space, 92
Heine-Borel theorem, 16
Hessian matrix, 25
hyperplane, 10

Kakutanis fixed point theorem, 84


kernel, 11
KKM property, 109
KKM theorem, 17
Kolmogorov-Riesz compactness theorem,
67
Kuratowski-Ryll-Nardzewski selection
theorem, 86

identity function, 6
identity matrix, 12
image, 6
implicit function theorem, 33
increasing function, 7
increasing sequence, 14
index, 4
index set, 4
indicator function, 7
infimum, 5
injective, 6
integers, 3
integrable function, 40
integrable vector-valued function, 44
integrable with respect to μ, 41
integral of f , 40
integral of f with respect to , 41
integral of correspondence, 87
integral of vector-valued function, 44
interior of set, 13
interior point, 13, 21, 58
intermediate value theorem, 16, 20, 22,
60, 94
intersection, 4
invariant distribution, 75
invariant set, 76
inverse function theorem, 33
inverse mapping, 7
inverse matrix, 13
irreflexive relation, 108
isomorphism, 12
Jacobian matrix, 32

least upper bound, 6


Lebesgue integrable, 40
Lebesgue integral, 40
Lebesgue measurable set, 27
Lebesgue measure, 27, 28
Lebesgue measure zero, 28
Lebesgues dominated convergence theorem
convergence in distribution, 53
fixed measure, 42
set-wise convergence, 47
weak convergence, 48
level set, 7
Levis monotone convergence theorem,
42
liminf, 5
limit infimum, 5
limit of sequence, 14
limit supremum, 5
limsup, 5
linear combination, 9
linear function, 10, 11
linear space, 9
linear span, 9
linear topological space, 95
linearly independent, 11
local diffeomorphism, 33
local maximizer, 26
locally compact, 60
locally convex tvs, 95
lower hemicontinuous correspondence,
80
lower hemicontinuous correspondence
at x, 80


lower measurable correspondence, 85


Lusins theorem, 40
Lyapunovs theorem, 31
for correspondences, 88
manifold, 34
manifold with boundary, 34
mapping, 6
marginal measure, 57
Markov kernel, 71
matrix, 12
maximal chain, 106
maximal element, 106, 107
maximizer of f over Y , 7
maximizer of a function, 7
maximum, 6
maximum theorem, 84
mean, 41
mean value theorem, 23
measurable function, 38
to metric space, 61
measurable maximum theorem, 87
measurable partition, 69
measurable selection, 86
measurable set, 27
measurable vector-valued function, 44
measure
product, 55
measure on Rn , 29
metric, 58
Lp -, 66
discrete, 58
Euclidean, 58
Hausdorff, 69
Prohorov, 68
supremum, 58
total variation, 69
metric mixture space, 61
metric space, 58
complete, 60
mixture, 61
separable, 60
totally bounded, 60
metric topology, 93
metrizable, 63
metrizable topology, 93

Michaels selection theorem, 84


minimizer of a function, 7
minimum, 6
Minkowskis inequality, 67
mixture space, 61
modulus of contraction, 20, 61
narrow convergence of transition probabilities, 78
natural numbers, 3
negative definite matrix, 13
negative semi-definite matrix, 13
negatively acyclic relation, 108
net, 93
non-atomic measure, 31
non-singular matrix, 12
nonempty values, 79
nowhere dense, 29
null space, 11
numbers
extended real numbers, 3
integers, 3
natural, 3
rational, 3
real, 3
open ball, 13
open cover, 15
open graph, 81
open half-space, 10
open lower sections, 81
open set
Euclidean space, 14
in X, 18
in metric space, 58
in product topology, 21
in relative topology, 18
in topological space, 91
relative to X, 18
relatively, 18
weak*, 100
weak* relative to L, 100
weakly, 100
weakly relative to L, 100
ordered n-tuple, 6
ordered pair, 6


orthogonal, 9
outer Lebesgue measure, 27
outer regular measure, 30
pairwise disjoint, 4
partial derivative of f , 24
partial derivative of f at x, 24
partial order, 106
partially differentiable at x, 24
partially differentiable function, 24
partition, 5
finite, 4
measurable, 69
pointwise a.e. convergence, 49
pointwise convergence
net of functions, 96
sequence of functions, 17, 20
power set, 4
preimage, 6
preimage theorem, 35
probability density function, 43
probability measure, 29
product measure, 55
product metric, 63, 64
product rule, 23
product topology, 96
Prohorov metric, 68
quasi-concave function, 11
quasi-convex function, 11
quasi-convex metric, 61
quasi-measure, 111
Radon-Nikodym theorem, 43
random variable, 39
range, 6
rank, 11, 12
rational numbers, 3
real numbers, 3
rectangle (half-open), 27
reflexive relation, 106
regular conditional probability, 74
regular measure, 30
regular point, 35
regular value, 35
relation, 106

acyclic, 108
anti-symmetric, 106
asymmetric, 108
complete, 108
dominant element, 107
dual, 108
irreflexive, 108
maximal element, 106, 107
negatively acyclic, 108
partial order, 106
reflexive, 106
semi-convex, 108
transitive, 106
undominated element, 108
relative complement, 4
relative topology, 92
relative weak topology, 100
relative weak* topology, 100
restriction, 7
Riesz representation theorem, 99
row rank, 12
Russells paradox, 3
Sards theorem, 35
scalar, 8
Schauders fixed point theorem
metric mixture space, 62
topological space, 96
section of a set, 55
self-supporting set, 76
semi-convex relation, 108
separable metric space, 60
separating hyperplane theorem, 10, 17
sequence, 6
Cauchy, 60
convergent, 14
decreasing, 14
divergent, 14
in Rn , 14
in X, 6
increasing, 14
strictly decreasing, 14
strictly increasing, 14
subsequence of, 15
series, 14
set, 3

absorbing, 76
bounded, 15, 19, 21, 59
closed, 21
closed in Euclidean space, 14
compact, 15, 19, 21, 94
compact in metric space, 59
complement, 4
connected, 16, 19, 21, 59, 94
convex, 9, 61, 95
countable, 7
countably infinite, 8
directed, 93
ergodic, 76
finite, 8
intersection, 4
invariant, 76
linearly independent, 11
measurable, 27
measure zero, 28
nowhere dense, 29
open in Euclidean space, 14
open in product topology, 21
power set, 4
relative complement, 4
relatively closed, 18
relatively open, 18
self-supporting, 76
singleton, 8
transient, 76
uncountably infinite, 8
union, 4
set-wise convergence of measures, 45
simplex, 9
singleton, 8
singular matrix, 12
Skorohods theorem, 40
span, 9
sphere, 13
square matrix, 12
stationary distribution, 75
stochastic kernel, 71
Stone-Weierstrass theorem, 65
strict local maximizer, 26
strict lower contour set, 7
strict upper contour set, 7
strictly concave function, 11

strictly convex function, 11


strictly decreasing function, 7
strictly decreasing sequence, 14
strictly diagonally dominant matrix, 13
strictly increasing function, 7
strictly increasing sequence, 14
strictly quasi-concave function, 11
strictly quasi-convex function, 11
strong convergence of measures, 45
stronger topology, 92
subcover, 15
subnet, 93
subsequence, 15
sum rule, 23
support of measure, 31
supremum, 5
supremum metric, 58
surjective, 7
symmetric matrix, 12
system of conditional probabilities, 74
tangent space to manifold, 34
theorem of the maximum
measurable version, 87
Tietze extension theorem, 40
tight measure, 30
tight set of measures, 46
Tonellis theorem, 56, 73
topological liminf, 15, 59
topological limsup, 15, 59
topological space, 91
topological vector space, 95
topology, 91
discrete, 92
metric, 93
metrizable, 93
product, 96
relative, 92
relative weak, 100
relative weak*, 100
stronger, 92
trivial, 92
weak, 99
weak*, 99
weaker, 92
total variation metric, 69

totally bounded metric space, 60


trajectory of function, 32
transient set, 76
transition probabilities
narrow convergence, 78
weak convergence, 78
weak-strong convergence, 78
transition probability, 71
transitive relation, 106
transpose, 12
transversality theorem, 36
trivial topology, 92
twice directionally differentiable function, 24
twice partially differentiable function,
25
Tychonoffs product theorem
finite-dimensional, 21
metric space, 63
topological space, 96
uncountably infinite set, 8
undominated element, 108
uniform almost everywhere convergence,
49
uniform convergence, 17, 20
uniform set-wise convergence of measures, 45
uniformly μ-integrable, 44, 68
uniformly σ-finite collection of measures, 72
uniformly bounded, 47, 48
union, 4
unit coordinate vector, 8
unit simplex, 9
upper bound, 106
upper hemicontinuous correspondence,
98
metric space, 79
topological space, 98
upper hemicontinuous correspondence
at x, 79
upper section of relation, 107
vector, 8
vector measure, 31

vector space, 95
weak convergence of functions, 52, 100
weak convergence of measures, 45
weak convergence of transition probabilities, 78
weak lower contour set, 7
weak order p convergence of functions,
52
weak topology, 99
weak upper contour set, 7
weak* closed, 100
weak* closed relative to L, 100
weak* compact, 100
weak* compact relative to L, 100
weak* continuous function, 100
weak* continuous function relative to
L, 100
weak* convergence of functions, 52, 101
weak* convergence of measures, 45
weak* open, 100
weak* open relative to L, 100
weak* sequentially compact set, 104
weak* sequentially continuous function,
104
weak* sequentially upper hemi-continuous
correspondence, 105
weak* topology, 99
weak-strong convergence of transition
probabilities, 78
weaker topology, 92
weakly closed, 100
weakly closed relative to L, 100
weakly compact, 100
weakly compact relative to L, 100
weakly continuous function, 100
weakly continuous function relative to
L, 100
weakly measurable correspondence, 85
weakly open, 100
weakly open relative to L, 100
weakly sequentially compact set, 104
weakly sequentially continuous function,
104
weakly sequentially upper hemi-continuous
correspondence, 105


Weierstrass theorem, 16, 20, 22, 60, 94


correspondences, 80
well-ordering, 106
well-ordering axiom, 5
Young measure, 71
Youngs theorem, 25
zero vector
Euclidean space, 8
topological vector space, 95
Zorns lemma, 106
