A.1 Overview
A.1.1 Definitions
In many introductory statistics and probability courses, one encounters discrete
and continuous random variables and vectors. These are all special cases of a
more general type of random quantity that we will study in this text. Before we
can introduce the more general type of random quantity, we need to generalize
the sums and integrals that figure so prominently in the distributions of discrete
and continuous random variables and vectors. The generalization is through the
concept of a measure (to be defined shortly), which is a way of assigning numerical
values to the "sizes" of sets.
Example A.1. Let S be a nonempty set, and let A ⊆ S. Define μ(A) to be
the number of elements of A. Then μ(S) > 0, μ(∅) = 0, and if A₁ ∩ A₂ = ∅,
μ(A₁ ∪ A₂) = μ(A₁) + μ(A₂). Note that μ(A) = ∞ is possible if S has infinitely
many elements. The measure μ described here is called counting measure on S.
Example A.2. Let A be an interval of real numbers. If A is bounded, let μ(A)
be the length of A. If A is unbounded, let μ(A) = ∞. It is easy to see that
μ(ℝ) = ∞, μ(∅) = 0, and if A₁ ∩ A₂ = ∅ and A₁ ∪ A₂ is an interval, then
μ(A₁ ∪ A₂) = μ(A₁) + μ(A₂). The measure μ described here is called Lebesgue
measure.
Example A.3. Let f : ℝ → ℝ⁺ be a continuous function. Define, for each
interval A, μ(A) = ∫_A f(x)dx. Then μ(ℝ) > 0, μ(∅) = 0, and if A₁ ∩ A₂ = ∅ and
A₁ ∪ A₂ is an interval, then μ(A₁ ∪ A₂) = μ(A₁) + μ(A₂).
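The additivity property shared by Examples A.1–A.3 can be checked directly for counting measure on a finite set. The following Python sketch is our own illustration (the function and variable names are not from the text):

```python
# Illustration: counting measure on a finite set S and the additivity
# property mu(A1 ∪ A2) = mu(A1) + mu(A2) for disjoint A1, A2.

def counting_measure(A):
    """Counting measure: mu(A) = the number of elements of A."""
    return len(A)

S = {1, 2, 3, 4, 5, 6}
A1, A2 = {1, 2}, {4, 5}    # disjoint subsets of S

disjoint = (A1 & A2 == set())
additive = counting_measure(A1 | A2) == counting_measure(A1) + counting_measure(A2)
```

For an infinite S the same formula applies, with μ(A) = ∞ allowed for infinite A.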
Since measure will be used to give sizes to sets, the domain of a measure will
be a collection of sets. In general, we cannot assign sizes to all sets, but we need
enough sets so that we can take unions and complements. A collection of sets
that is closed under taking complements and finite unions is called a field. A field
that is closed under taking countable unions is called a σ-field.
Example A.4. Let S be any set. Let 𝒜 = {S, ∅}. This σ-field is called the trivial
σ-field. As a second example, let A ⊂ S, and let 𝒜 = {S, A, Aᶜ, ∅}. Let B be another
subset of S, and let 𝒜 = {S, A, B, Aᶜ, Bᶜ, A ∩ B, A ∩ Bᶜ, ...}. Such examples
grow rapidly. The largest σ-field is the collection of all subsets of S, called the
power set of S and denoted 2^S.
Example A.5. One field of subsets of ℝ is the collection of all unions of finitely
many disjoint intervals (unbounded intervals are allowed). This collection is not
a σ-field, however.
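For a finite set the closure properties defining a field can be verified by brute force. Here is a sketch in our own Python (all names are ours) of the four-element σ-field {S, A, Aᶜ, ∅} from Example A.4:

```python
# Build the sigma-field {S, A, A^c, empty} generated by a single proper
# subset A of S, and check the field closure properties exhaustively.

def sigma_field_from(S, A):
    """Smallest sigma-field on S containing the subset A."""
    S, A = frozenset(S), frozenset(A)
    return {S, A, S - A, frozenset()}

def is_field(S, F):
    """Closed under complements and pairwise unions; for finite S this
    also makes F a sigma-field."""
    S = frozenset(S)
    return all(S - B in F for B in F) and all(B | C in F for B in F for C in F)

F = sigma_field_from({1, 2, 3}, {1})
```

On a finite S every field is automatically a σ-field, since there are only finitely many distinct unions to form; Example A.5 shows that this fails on ℝ.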
Infinite measures are difficult to deal with unless they behave like finite mea-
sures in certain important ways. If there exists a countable partition of the set S
such that each element of the partition has finite μ measure, then we say that μ
is σ-finite. When an abstract measure is mentioned in this text, it will generally
be safe to assume that it is σ-finite unless the contrary is clear from context.
A.1.3 Integration
The integral of a function with respect to a measure is a way to generalize the
Riemann integral. The interested reader should be able to verify
that the integral as defined here is an extension of the Riemann integral. That
is, if the Riemann integral of a function over a closed and bounded interval
exists, then so does the integral as defined here, and the two are equal. We
define the integral in stages. We start with nonnegative simple functions. If f is
a nonnegative simple function represented as f(s) = ∑_{i=1}^k aᵢ I_{Aᵢ}(s), with the aᵢ
distinct and the Aᵢ mutually disjoint, then the integral of f with respect to μ is
∫ f(s)dμ(s) = ∑_{i=1}^k aᵢ μ(Aᵢ). If 0 times ∞ occurs in such a sum, the result is 0
by convention. The integral of a nonnegative simple function is allowed to be ∞.
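The defining formula ∫ f dμ = ∑ aᵢ μ(Aᵢ) for a simple function is a finite sum, so it can be computed directly. A sketch in our own Python, with counting measure playing the role of μ:

```python
# Integral of a nonnegative simple function f = sum_i a_i * 1_{A_i}
# with respect to a measure mu: sum_i a_i * mu(A_i).

def integral_simple(terms, mu):
    """terms is a list of (a_i, A_i) with the a_i distinct, A_i disjoint."""
    return sum(a * mu(A) for a, A in terms)

counting = len                        # counting measure on a finite set
f = [(2.0, {1, 2, 3}), (5.0, {7})]    # f = 2 on {1,2,3}, f = 5 on {7}, 0 elsewhere
value = integral_simple(f, counting)  # 2*3 + 5*1
```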
For general nonnegative measurable functions, we define the integral of f with
respect to μ as ∫ f(s)dμ(s) = sup_{g ≤ f, g simple} ∫ g(s)dμ(s). For general functions f,
let f⁺(s) = max{f(s), 0} and f⁻(s) = −min{f(s), 0} (the positive and negative
parts of f, respectively). Then f(s) = f⁺(s) − f⁻(s). The integral of f with
respect to μ is

∫ f(s)dμ(s) = ∫ f⁺(s)dμ(s) − ∫ f⁻(s)dμ(s),

if at least one of the two integrals on the right is finite. If both are infinite, the
integral is undefined. We say that f is integrable if the integral of f is defined and
is finite. The integral is defined above in terms of its values at all points in S.
Sometimes we wish to consider only a subset A ⊆ S. The integral of f over A
with respect to μ is

∫_A f(s)dμ(s) = ∫ I_A(s)f(s)dμ(s).
Among the useful properties of integrals are that smaller functions have smaller integrals, and that two integrable functions that
have the same integral over every set are equal almost everywhere. Another useful
property, given in Theorem A.54, is that a nonnegative integrable function f leads
to a new measure ν by means of the equation ν(A) = ∫_A f(s)dμ(s).
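For a discrete measure the property just cited from Theorem A.54 can be seen in a few lines. The following is our own illustration (the names `weights` and `nu` are ours), with μ a finite discrete measure:

```python
# nu(A) = integral over A of f d(mu) defines a new measure: here mu puts
# mass weights[s] on each point s, so the integral is a weighted sum.

weights = {1: 1.0, 2: 1.0, 3: 0.5}   # a finite discrete measure mu

def nu(A, f):
    """nu(A) = sum over s in A of f(s) * mu({s})."""
    return sum(f(s) * weights[s] for s in A)

f = lambda s: s * s
left = nu({1} | {2, 3}, f)           # nu of a disjoint union
right = nu({1}, f) + nu({2, 3}, f)   # sum of the pieces
```

Additivity of ν over disjoint sets is exactly additivity of the weighted sum.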
The most important theorems concern the interchange of limits with integra-
tion. Let {fₙ}_{n=1}^∞ be a sequence of measurable functions such that fₙ(x) → f(x)
a.e. [μ]. The monotone convergence theorem A.52 says that if the fₙ are nonneg-
ative and fₙ(x) ≤ f(x) a.e. [μ], then

lim_{n→∞} ∫ fₙ(x)dμ(x) = ∫ f(x)dμ(x). (A.7)

The dominated convergence theorem A.57 says that if there exists an integrable
function g such that |fₙ(x)| ≤ g(x), a.e. [μ], then (A.7) holds.
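The monotone convergence theorem can be watched numerically for counting measure on {1, 2, ...}: the truncations fₙ = f·I_{{1,...,n}} increase to f, and their integrals are partial sums increasing to the full series. Our own sketch:

```python
# Monotone convergence for counting measure on the positive integers:
# the integrals of the truncations f_n are the partial sums of the series,
# which increase to integral f dmu = sum over x of f(x).

f = lambda x: 1.0 / 2 ** x                      # the full series sums to 1
partials = [sum(f(x) for x in range(1, n + 1))  # integral of f_n d(mu)
            for n in (1, 5, 50)]
```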
Part 1 of Theorem A.38 says that measurable functions into each of two mea-
surable spaces combine into a jointly measurable function. Measures and inte-
gration can also be extended from several spaces into the product space. For
example, suppose that μᵢ is a measure on the space (Sᵢ, 𝒜ᵢ) for i = 1, 2. To de-
fine a measure on (S₁ × S₂, 𝒜₁ ⊗ 𝒜₂), we can proceed as follows. For each product
set A = A₁ × A₂, define μ₁ × μ₂(A) = μ₁(A₁)μ₂(A₂). The Carathéodory exten-
sion theorem A.22 allows us to extend this definition to all of the product space.
Lebesgue measure on ℝ², denoted dxdy, is such a product measure. Not every
measure on a product space is a product measure. Product probability measures
will correspond to independent random variables.
Extending integration to product spaces proceeds through two famous theo-
rems. Tonelli's theorem A.69 says that a nonnegative function f satisfies

∫ f(x, y)d(μ₁ × μ₂)(x, y) = ∫ [∫ f(x, y)dμ₂(y)] dμ₁(x) = ∫ [∫ f(x, y)dμ₁(x)] dμ₂(y).

Fubini's theorem A.70 says that the same equations hold if f is integrable with
respect to μ₁ × μ₂. These results also extend to finite product spaces S₁ × ⋯ × Sₙ.
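For two finite counting measures, Tonelli's theorem reduces to the elementary fact that a nonnegative double sum may be computed in either order. A sketch in our own notation:

```python
# Tonelli for finite counting measures: iterated sums of a nonnegative
# function agree regardless of the order of summation.

S1, S2 = range(3), range(4)
f = lambda x, y: (x + 1) * (y + 2)   # nonnegative on S1 x S2

by_rows = sum(sum(f(x, y) for y in S2) for x in S1)   # integrate y first
by_cols = sum(sum(f(x, y) for x in S1) for y in S2)   # integrate x first
```

Fubini's theorem extends the same interchange to integrable (not necessarily nonnegative) f.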
• Let μ₁, μ₂, ... be a collection of measures on the same space (S, 𝒜). Let
a₁, a₂, ... be a collection of positive numbers. Then μ = ∑_{i=1}^∞ aᵢμᵢ is a
measure and μᵢ ≪ μ for all i.
The last example above is important because it tells us that for every countable
collection of measures, there is a single measure such that all measures in the
collection are absolutely continuous with respect to it.
The Radon–Nikodym theorem A.74 says that the first part of Example A.8 is
the most general form of absolute continuity with respect to σ-finite measures.
That is, if μ₁ is σ-finite and μ₂ ≪ μ₁, then there exists an extended real-valued
measurable function f such that μ₂(A) = ∫_A f(x)dμ₁(x). In addition, if g is
μ₂-integrable, then ∫ g(x)dμ₂(x) = ∫ g(x)f(x)dμ₁(x). The function f is called
the Radon–Nikodym derivative of μ₂ with respect to μ₁ and is usually denoted
(dμ₂/dμ₁)(s).
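For discrete measures the Radon–Nikodym derivative is simply the ratio of point masses, and both identities above can be checked directly. The following sketch uses our own names, not the book's notation:

```python
# dmu2/dmu1 for discrete measures is the ratio of masses on the support
# of mu1, assuming mu2 << mu1 (no mu2-mass where mu1 has none).

def rn_derivative(mu2, mu1):
    return {s: mu2.get(s, 0.0) / m for s, m in mu1.items() if m > 0}

mu1 = {'a': 2.0, 'b': 1.0, 'c': 4.0}
mu2 = {'a': 1.0, 'c': 2.0}           # absolutely continuous w.r.t. mu1
f = rn_derivative(mu2, mu1)

A = {'a', 'b'}
mu2_of_A = sum(mu2.get(s, 0.0) for s in A)
integral_over_A = sum(f[s] * mu1[s] for s in A)   # should equal mu2(A)
```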
A similar theorem, A.81, relates integrals with respect to measures on two
different spaces. It says that a function f : S₁ → S₂ induces a measure on the
range S₂. If μ₁ is a measure on S₁, then define μ₂(A) = μ₁(f⁻¹(A)). Integrals
with respect to μ₂ can be written as integrals with respect to μ₁ in the following
way: ∫ g(y)dμ₂(y) = ∫ g(f(x))dμ₁(x). The measure μ₂ is called the measure
induced on S₂ by f from μ₁.
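For a discrete μ₁, the induced measure and the change-of-variables identity can be written out in a few lines. A sketch in our own code:

```python
# Induced (pushforward) measure mu2(A) = mu1(f^{-1}(A)) for a discrete mu1,
# plus the identity: integral g d(mu2) = integral g(f) d(mu1).

def pushforward(mu1, f):
    mu2 = {}
    for s, m in mu1.items():
        mu2[f(s)] = mu2.get(f(s), 0.0) + m
    return mu2

mu1 = {0: 0.2, 1: 0.3, 2: 0.5}
f = lambda s: s % 2                  # f collapses the points 0 and 2
mu2 = pushforward(mu1, f)

g = lambda y: 10.0 * y
lhs = sum(g(y) * m for y, m in mu2.items())      # integral of g d(mu2)
rhs = sum(g(f(s)) * m for s, m in mu1.items())   # integral of g(f) d(mu1)
```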
A.2 Measures
A measure is a way of assigning numerical values to the "sizes" of sets. The
collection of sets whose sizes are given by a measure is a σ-field. (See Examples A.4
and A.5 on page 571.)
Definition A.9. A nonempty collection of subsets 𝒜 of a set S is called a field
if
• A ∈ 𝒜 implies Aᶜ ∈ 𝒜,
• A₁, A₂ ∈ 𝒜 implies A₁ ∪ A₂ ∈ 𝒜.
A field 𝒜 is called a σ-field if {Aₙ}_{n=1}^∞ ∈ 𝒜 implies ∪_{i=1}^∞ Aᵢ ∈ 𝒜.
Proposition A.10. Let N be an arbitrary set of indices, and let 𝒴 = {𝒜_α : α ∈
N} be an arbitrary collection of σ-fields of subsets of a set S. Then ∩_{α∈N} 𝒜_α is
also a σ-field of subsets of S.
Because of Proposition A.10 and the fact that 2^S is a σ-field, it is easy to
see that, for every collection of subsets C of S, there is a smallest σ-field 𝒜 that
contains C, namely the intersection of all σ-fields that contain C.
Definition A.11. Let C be the collection of intervals in ℝ. The smallest σ-field
containing C is called the Borel σ-field. In general, if S is a topological space, and
B is the smallest σ-field that contains all of the open sets, then B is called the
Borel σ-field.
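When S is finite, the smallest σ-field containing a collection C can be computed directly, mirroring the intersection argument above: close C under complements and unions until nothing new appears. A sketch in our own Python:

```python
# Generate the smallest sigma-field on a finite set S containing the
# collection C, by repeated closure under complements and pairwise unions.

from itertools import combinations

def generated_sigma_field(S, C):
    S = frozenset(S)
    F = {frozenset(), S} | {frozenset(A) for A in C}
    while True:
        new = {S - A for A in F} | {A | B for A, B in combinations(F, 2)}
        if new <= F:          # nothing new: F is closed, hence a sigma-field
            return F
        F |= new

F = generated_sigma_field({1, 2, 3}, [{1}])
```

On infinite sets (e.g., the Borel σ-field on ℝ) no such constructive enumeration is possible; the intersection characterization is what guarantees existence.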
In addition to the Borel σ-field, the product σ-field is also generated by a simple
collection of sets.
Definition A.12.
• Let N be an index set, and let {S_α}_{α∈N} be a collection of sets. Define
S = ∏_{α∈N} S_α. We call S a product space.
• For each α ∈ N, let 𝒜_α be a σ-field of subsets of S_α. Define the product
σ-field as follows: ⊗_{α∈N} 𝒜_α is the smallest σ-field that contains all sets of
the form ∏_{α∈N} A_α, where A_α ∈ 𝒜_α for all α and all but finitely many A_α
are equal to S_α.
In the special case in which N = {1, 2}, we use the notation S = S₁ × S₂, and the
product σ-field is denoted 𝒜₁ ⊗ 𝒜₂.
Proposition A.13. The Borel σ-field B^k of ℝ^k is the same as the product σ-
field of k copies of (ℝ, B¹).
There are other types of collections of sets that are related to σ-fields. Some-
times it is easier to prove results about these other collections and then use the
theorems that follow to infer similar results about σ-fields.
Definition A.14. Let S be a set. A collection Π of subsets of S is called a π-
system if A, B ∈ Π implies A ∩ B ∈ Π. A collection 𝒜 is called a λ-system if
S ∈ 𝒜, A ∈ 𝒜 implies Aᶜ ∈ 𝒜, and {Aₙ}_{n=1}^∞ ∈ 𝒜 with Aᵢ ∩ Aⱼ = ∅ for i ≠ j
implies ∪_{i=1}^∞ Aᵢ ∈ 𝒜.
As in Proposition A.10, the intersection of arbitrarily many π-systems is a
π-system, and so too with λ-systems. The following propositions are also easy to
prove.

Proposition A.15. If S is a set and C is a collection of subsets of S such that
C is a π-system and a λ-system, then C is a σ-field.

Proposition A.16. If S is a set and 𝒜 is a λ-system of subsets, then A, A ∩ B ∈ 𝒜
implies A ∩ Bᶜ ∈ 𝒜.
for each k, μ(Aₖ) = μ(A₁) − ∑_{i=1}^{k−1} μ(Bᵢ). Also,

A₁ = lim_{k→∞} Aₖ ∪ (∪_{i=1}^∞ Bᵢ),

and all of the sets on the right-hand side are disjoint. It follows that

Aₖ = A₁ \ ∪_{i=1}^{k−1} Bᵢ,

and

lim_{k→∞} μ(Aₖ) = μ(A₁) − ∑_{i=1}^∞ μ(Bᵢ) = μ(lim_{k→∞} Aₖ). □
¹¹This theorem is used in the proofs of Lemma A.72 and Theorems B.90
and 1.61. There is a second Borel–Cantelli lemma, which involves probability
measures, but we will not use it in this text. See Problem 20 on page 663. The
set whose measure is the subject of this theorem is sometimes called "Aₙ infinitely
often" because it is the set of points that are in infinitely many of the Aₙ.
¹²This theorem is used to prove the existence of many common measures
(including product measure) and in the proofs of Lemma A.24 and of Theo-
rems B.118, B.131, and B.133.
Theorem A.22 (Carathéodory extension theorem). Suppose that C is a field of
subsets of a set S and that μ : C → ℝ is σ-finite, nonnegative, extended
real-valued, and countably additive and satisfies μ(∅) = 0. Then there is a unique
extension of μ to a measure on a measure space¹³ (S, 𝒜, μ*). (That is, C ⊆ 𝒜
and μ(A) = μ*(A) for all A ∈ C.)
PROOF. The proof will proceed as follows. First, we will define μ* and 𝒜. Then we
will show that μ* is monotone and subadditive, that C ⊆ 𝒜, that 𝒜 is a σ-field,
that μ* is countably additive on 𝒜, that μ* extends μ, and finally that μ* is the
unique extension.
For each B ∈ 2^S, define

μ*(B) = inf ∑_{i=1}^∞ μ(Aᵢ), (A.23)

where the inf is taken over all {Aᵢ}_{i=1}^∞ such that B ⊆ ∪_{i=1}^∞ Aᵢ and Aᵢ ∈ C for all
i. Let

𝒜 = {A : μ*(C) ≥ μ*(C ∩ A) + μ*(C ∩ Aᶜ) for all C ∈ 2^S}.
First, we show that μ* is monotone and subadditive. Clearly, μ*(A) ≤ μ(A)
for all A ∈ C, and B₁ ⊆ B₂ implies μ*(B₁) ≤ μ*(B₂). It is also easy to see that
μ*(B₁ ∪ B₂) ≤ μ*(B₁) + μ*(B₂) for all B₁, B₂ ∈ 2^S. In fact, if {Bₙ}_{n=1}^∞ ∈ 2^S,
then μ*(∪_{i=1}^∞ Bᵢ) ≤ ∑_{i=1}^∞ μ*(Bᵢ). The proof is to notice that the collection of
numbers whose inf is μ* of the union includes all of the sums of the numbers
whose infima are the μ* values being added together.
Next, we show that C ⊆ 𝒜. Let A ∈ C and C ∈ 2^S. Since μ* is subadditive, we
only need to show that μ*(C) ≥ μ*(C ∩ A) + μ*(C ∩ Aᶜ). If μ*(C) = ∞, this is
clearly true. So let μ*(C) < ∞. From the definition of μ*, for every ε > 0, there
exists a collection {Aᵢ}_{i=1}^∞ of elements of C such that ∑_{i=1}^∞ μ(Aᵢ) < μ*(C) + ε.
Since μ(Aᵢ) = μ(Aᵢ ∩ A) + μ(Aᵢ ∩ Aᶜ) for every i, we have

μ*(C) + ε > ∑_{i=1}^∞ μ(Aᵢ ∩ A) + ∑_{i=1}^∞ μ(Aᵢ ∩ Aᶜ) ≥ μ*(C ∩ A) + μ*(C ∩ Aᶜ).

Since this is true for every ε > 0, it must be that μ*(C) ≥ μ*(C ∩ A) + μ*(C ∩ Aᶜ),
hence A ∈ 𝒜.
Next, we show that 𝒜 is a σ-field. It is clear that ∅ ∈ 𝒜 and that A ∈ 𝒜 implies
Aᶜ ∈ 𝒜, by the symmetry in the definition of 𝒜. Let A₁, A₂ ∈ 𝒜 and C ∈ 2^S. We
can write

μ*(C) = μ*(C ∩ A₁) + μ*(C ∩ A₁ᶜ)
      = μ*(C ∩ A₁) + μ*(C ∩ A₁ᶜ ∩ A₂) + μ*(C ∩ A₁ᶜ ∩ A₂ᶜ)
      ≥ μ*(C ∩ [A₁ ∪ A₂]) + μ*(C ∩ [A₁ ∪ A₂]ᶜ),

where the first two equalities follow from A₁, A₂ ∈ 𝒜, and the last follows from
the subadditivity of μ*. So, A₁ ∪ A₂ ∈ 𝒜. Let {Aₙ}_{n=1}^∞ ∈ 𝒜; then we can write
¹³The usual statement of this theorem includes the additional claim that the
measure space (S, 𝒜, μ*) is complete. A measure space is complete if every subset
of every set with measure 0 is in the σ-field.
Appendix A. Measure and Integration Theory
A = ∪_{i=1}^∞ Aᵢ = ∪_{i=1}^∞ Bᵢ, where each Bᵢ ∈ 𝒜 and the Bᵢ are disjoint. (This just
makes use of complements and finite unions of elements of 𝒜 being in 𝒜.) Let
Dₙ = ∪_{i=1}^n Bᵢ and C ∈ 2^S. Since Aᶜ ⊆ Dₙᶜ and Dₙ ∈ 𝒜 for each n, we have

μ*(C) ≥ ∑_{i=1}^n μ*(C ∩ Bᵢ) + μ*(C ∩ Aᶜ).
intervals of the form (a, b] with a = −∞ and/or b = ∞ possible.¹⁴ The collection
C of all unions of finitely many disjoint intervals of this form is easily seen to be
a field. If (a₁, b₁], ..., (aₙ, bₙ] are mutually disjoint, set

μ(∪_{i=1}^n (aᵢ, bᵢ]) = ∑_{i=1}^n μ((aᵢ, bᵢ]).

It is not hard to see that this extension of μ to C is well defined. This means that if
∪_{i=1}^n (aᵢ, bᵢ] = ∪_{j=1}^m (c_j, d_j], where (c₁, d₁], ..., (c_m, d_m] are also mutually disjoint,
then ∑_{i=1}^n μ((aᵢ, bᵢ]) = ∑_{j=1}^m μ((c_j, d_j]). If μ is finite for every interval, then it is
σ-finite. To see that μ is countably additive on C, suppose that μ((a, b]) = F(b) −
F(a), where F is nondecreasing and continuous from the right. If {(aₙ, bₙ]}_{n=1}^∞ is
a sequence of disjoint intervals and (a, b] is an interval such that ∪_{n=1}^∞ (aₙ, bₙ] ⊆
(a, b], then it is not difficult to see that ∑_{n=1}^∞ μ((aₙ, bₙ]) ≤ μ((a, b]). If (a, b] ⊆
∪_{n=1}^∞ (aₙ, bₙ], we can also prove that ∑_{n=1}^∞ μ((aₙ, bₙ]) ≥ μ((a, b]) (see Problem 7
on page 603). Together these facts imply that μ is countably additive on C.
The proof of Theorem A.22 leads us to the following useful result. Its proof is
adapted from Halmos (1950).

Lemma A.24.¹⁵ Let (S, 𝒜, μ) be a σ-finite measure space. Suppose that C is a
field such that 𝒜 is the smallest σ-field containing C. Then, for every A ∈ 𝒜 and
ε > 0, there is C ∈ C such that μ(C △ A) < ε.¹⁶

PROOF. Clearly, μ and C satisfy the conditions of Theorem A.22, so that μ is equal
to the μ* in the proof of that theorem. Let A ∈ 𝒜 and ε > 0 be given. It follows
from (A.23) that there exists a sequence {Aᵢ}_{i=1}^∞ in C such that A ⊆ ∪_{i=1}^∞ Aᵢ and
∑_{i=1}^∞ μ(Aᵢ) < μ(A) + ε.
¹⁴If b = ∞, we mean (a, ∞) by (a, b]. That is, we do not intend ∞ to be a
point in the space S.
¹⁵This lemma is used in the proof of the Kolmogorov zero-one law B.68.
¹⁶The symbol △ here refers to the symmetric difference operator on pairs of
sets. We define C △ A to be (C ∩ Aᶜ) ∪ (Cᶜ ∩ A).
Similarly, since μⱼ(∪_{i=1}^n [Cᵢ ∩ A]) can be written as a linear combination of values of μⱼ
at sets of the form A ∩ C, where C ∈ Π is the intersection of finitely many of
C₁, ..., Cₙ, it follows from A ∈ 𝒢_C that μ₁(∪_{i=1}^n [Cᵢ ∩ A]) = μ₂(∪_{i=1}^n [Cᵢ ∩ A])
for all n, hence μ₁(A) = μ₂(A). □
¹⁷This theorem is used in the proofs of Theorems B.32, B.46, B.118, B.131,
and 1.115, Lemma A.64, and Corollary B.44.
A.3 Measurable Functions
Definition A.27. Suppose that S is a set with a σ-field 𝒜 of subsets, and let T
be another set with a σ-field C of subsets. Suppose that f : S → T is a function.
We say f is measurable if for every B ∈ C, f⁻¹(B) ∈ 𝒜. If f is measurable,
one-to-one, and onto and f⁻¹ is measurable, we say that f is bimeasurable. If
T = ℝ, the real numbers, and C = B, the Borel σ-field, then if f is measurable,
we say that f is Borel measurable.
Proposition A.28. Suppose that (S, 𝒜) and (T, C) are measurable spaces. Sup-
pose that f : S → T is a function.
• If 𝒜 = 2^S, then f is measurable.
• If C = {T, ∅}, then f is measurable.
• If 𝒜 = {S, ∅}, {y} ∈ C for every y ∈ T, and f is measurable, then f is
constant.
As examples, if S = T = ℝ and 𝒜 = B is the Borel σ-field, then all continuous
functions are measurable. But many discontinuous functions are also measurable.
For example, step functions are measurable. All monotone functions are measur-
able. In fact, it is very difficult to describe a nonmeasurable function without
using some heavy mathematics.
The following theorems make it easier to show that a function is measurable.
Theorem A.29. Let N, S, and T be arbitrary sets. Let {A_α : α ∈ N} be a
collection of subsets of T, and let A be an arbitrary subset of T. Let f : S → T
be a function. Then

f⁻¹(∪_{α∈N} A_α) = ∪_{α∈N} f⁻¹(A_α),
f⁻¹(∩_{α∈N} A_α) = ∩_{α∈N} f⁻¹(A_α),
f⁻¹(Aᶜ) = f⁻¹(A)ᶜ.
PROOF. For the union, if s ∈ f⁻¹(∪_{α∈N} A_α), then f(s) ∈ ∪_{α∈N} A_α, hence there
exists α such that f(s) ∈ A_α, so s ∈ f⁻¹(A_α) and s ∈ ∪_{α∈N} f⁻¹(A_α). If s ∈
∪_{α∈N} f⁻¹(A_α), then there exists α such that s ∈ f⁻¹(A_α), hence f(s) ∈ A_α,
hence f(s) ∈ ∪_{α∈N} A_α, hence s ∈ f⁻¹(∪_{α∈N} A_α). This proves the first equality.
The second is almost identical in that "there exists α" is merely replaced by "for
all α" in the above proof. For the complement, if s ∈ f⁻¹(Aᶜ), then f(s) ∈ Aᶜ
and f(s) ∉ A. Hence, s ∉ f⁻¹(A) and s ∈ f⁻¹(A)ᶜ. If s ∈ f⁻¹(A)ᶜ, then
s ∉ f⁻¹(A) and f(s) ∉ A. So, f(s) ∈ Aᶜ and s ∈ f⁻¹(Aᶜ). □
Definition A.31. The σ-field f⁻¹(C) in Corollary A.30 is called the σ-field gen-
erated by f.

A measurable function also generates a σ-field of subsets of its image.

Proposition A.32. Let (T, C) be a measurable space. Let U ⊆ T be arbitrary
(possibly not even in C). Define C_U = {U ∩ B : B ∈ C}. Then C_U is a σ-field of
subsets of U.
f⁻¹(∪_{i=1}^∞ Aᵢ) = ∪_{i=1}^∞ f⁻¹(Aᵢ) ∈ 𝒜,
²⁰This theorem is used in the proofs of Lemma A.35, Proposition A.36, Corol-
lary A.37, Theorems A.38, B.75, and B.133, and to prove that stochastic processes
are measurable.
²¹This lemma is used in the proofs of Theorems A.38 and A.74.
²²This proposition is used in the proof of Theorem A.38.
Another example of the use of Theorem A.34 is the proof that all continuous
functions are measurable. The result follows because the Borel σ-field is the
smallest σ-field containing open sets.

Corollary A.37. Let (S, 𝒜) and (T, B) be topological spaces with their Borel
σ-fields. If f : S → T is continuous, then f is measurable.
Here are some properties of measurable functions that will prove useful.
Theorem A.38. Let (S, 𝒜) be a measurable space.
1. Let N be an index set, and let {(T_α, C_α)}_{α∈N} be a collection of measurable
spaces. For each α ∈ N, let f_α : S → T_α be a function. Define f : S →
∏_{α∈N} T_α by f(s) = {f_α(s)}_{α∈N}. Then f is measurable (with respect to the
product σ-field) if and only if each f_α is measurable.
2. If (V, C₁) and (U, C₂) are measurable spaces and f : S → V and g : V → U
are measurable, then g(f) : S → U is measurable.
3. Let f and g be measurable functions from S to ℝⁿ, and let a be a constant
scalar and let b ∈ ℝⁿ be constant. Then the following functions are also
measurable: f + g and af + b. If n = 1, then f·g and f/g are also measurable,
where f/g can be set equal to an arbitrary constant when g = 0.
4. If, for each n, fₙ is a measurable, extended real-valued function, then
supₙ fₙ, infₙ fₙ, lim supₙ fₙ, and lim infₙ fₙ are all measurable.
5. Let (T, C) be a metric space with Borel σ-field. If f_k : S → T is a measurable
function for each k = 1, 2, ... and lim_{k→∞} f_k(s) = f(s) for all s, then f is
measurable.
6. Let (T, C) be a metric space with Borel σ-field, and let μ be a measure on
(S, 𝒜). If f_k : S → T is a measurable function for each k = 1, 2, ... and
lim_{k→∞} f_k(s) exists a.e. [μ], then there is a measurable f : S → T such
that lim_{k→∞} f_k(s) = f(s), a.e. [μ].
PROOF. (1) Suppose that f is measurable. To show that f_α is measurable, let
B_α ∈ C_α and let B_β = T_β for β ≠ α. Set C = ∏_{β∈N} B_β, which is in the
product σ-field, because all but finitely many B_β equal the entire space T_β. Then
f_α⁻¹(B_α) = f⁻¹(C). Since f is measurable, f⁻¹(C) ∈ 𝒜. Now, suppose that each
f_α is measurable, and let B = ∏_{α∈N} B_α, with B_α ∈ C_α for all α and all but finitely
many B_α (say B_{α₁}, ..., B_{αₙ}) equal to T_α. Then f⁻¹(B) = ∩_{i=1}^n f_{αᵢ}⁻¹(B_{αᵢ}) ∈ 𝒜.
Since the sets of the form B generate the product σ-field, f⁻¹(B) ∈ 𝒜 for all B
in the product σ-field according to Theorem A.34.
(2) Let A ∈ C₂. We need to prove that g(f)⁻¹(A) ∈ 𝒜. First, note that
g(f)⁻¹ = f⁻¹(g⁻¹). Since g is measurable, g⁻¹(A) ∈ C₁. Since f is measurable,
f⁻¹(g⁻¹(A)) ∈ 𝒜. So g(f)⁻¹(A) ∈ 𝒜.
(3) The arithmetic parts of the theorem are all similar. They all follow from
parts 2 and 1. For example, h(x, y) = x + y is a measurable function from ℝ²
to ℝ, so h(f, g) = f + g is measurable. For the quotient, a little more care is
needed. Let h(x, y) = x/y when y ≠ 0 and let it be an arbitrary constant when
y = 0. Then h is measurable since {(x, y) : y = 0} is in B². It follows that h(f, g)
is measurable.
(4) Let f = supₙ fₙ. Then, for each finite b, {s : f(s) ≤ b} = ∩_{n=1}^∞ {s :
fₙ(s) ≤ b} ∈ 𝒜. Also {s : f(s) = −∞} = ∩_{n=1}^∞ {s : fₙ(s) = −∞} ∈ 𝒜, and
{s : f(s) = ∞} = ∩_{i=1}^∞ ∪_{n=1}^∞ {s : fₙ(s) > i} ∈ 𝒜. Similar arguments work for
inf. Since lim supₙ fₙ = inf_k sup_{n≥k} fₙ and lim infₙ fₙ = sup_k inf_{n≥k} fₙ, these
are also measurable.
(5) Let d be the metric in T. For each closed set C ∈ C, and each m, let
C_m = {t : d(t, C) < 1/m}. For each closed C, define

A*(C) = ∩_{m=1}^∞ ∪_{n=1}^∞ ∩_{k=n}^∞ f_k⁻¹(C_m). (A.39)

It is easy to see that A*(C) ∈ 𝒜 is the set of all s such that lim_{n→∞} fₙ(s) ∈
C. Obviously, f⁻¹(C) consists of those s such that lim_{n→∞} fₙ(s) ∈ C. Hence,
f⁻¹(C) = A*(C) ∈ 𝒜, and Proposition A.36 says that f is measurable.
(6) Let G = {s : lim_{k→∞} f_k(s) does not exist}, and let G ⊆ C with μ(C) = 0.
Let t ∈ T, and define f(s) = t for s ∈ C and f(s) = lim_{k→∞} f_k(s) for s ∈ Cᶜ.
Apply part 5 to the restrictions of the functions {f_k}_{k=1}^∞ to Cᶜ to conclude that
f restricted to Cᶜ (call the restriction g) is measurable. If A ∈ C, f⁻¹(A) =
g⁻¹(A) ∈ 𝒜 if t ∉ A and f⁻¹(A) = g⁻¹(A) ∪ C ∈ 𝒜 if t ∈ A. So f is measurable.
□
Part 6 is particularly useful in that it allows us to treat the limit of a sequence
of measurable functions as a measurable function even if the limit only exists
almost everywhere. This is only useful, however, if we can show that functions
that are equal almost everywhere have similar properties.
Many theorems about measurable functions are proven first for a special class of
measurable functions called simple functions and then extended to all measurable
functions using some limit theorems.
Definition A.40. A measurable function f is called simple if it assumes only
finitely many distinct values.
A simple function is often expressed in terms of its values. Let f be a simple
function taking values in ℝⁿ for some n. Suppose that {a₁, ..., a_k} are the dis-
tinct values assumed by f, and let Aᵢ = f⁻¹({aᵢ}). Then f(s) = ∑_{i=1}^k aᵢ I_{Aᵢ}(s).
The most fundamental limit theorem is the following.

Theorem A.41. If f is a nonnegative measurable function, then there exists a
sequence of simple functions {fᵢ}_{i=1}^∞ such that for all s ∈ S, fᵢ(s) ↑ f(s).
PROOF. For k = 1, ..., i2^i, let A_{k,i} = {s : (k−1)/2^i ≤ f(s) < k/2^i}. Define
A_{0,i} = {s : f(s) ≥ i}. Then A_{0,i}, A_{1,i}, ..., A_{i2^i,i} are disjoint and their union is S.
Define

fᵢ(s) = (k−1)/2^i if s ∈ A_{k,i} for k > 0, and fᵢ(s) = i if s ∈ A_{0,i}.

It is clear that fᵢ(s) ≤ f(s) for all i and s, and each fᵢ is a simple function. Since,
for k > 0, A_{k,i} = A_{2k−1,i+1} ∪ A_{2k,i+1}, and A_{0,i} = A_{0,i+1} ∪ A_{i2^{i+1}+1,i+1} ∪ ⋯ ∪
A_{(i+1)2^{i+1},i+1}, it is easy to see that fᵢ(s) ≤ f_{i+1}(s) for all i and all s. It is also
easy to see that for each s with f(s) < ∞, there exists n such that for i ≥ n,
|f(s) − fᵢ(s)| ≤ 2^{−i}. Hence fᵢ(s) ↑ f(s). □
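The approximating functions fᵢ in this proof can be written out directly. The following is our own sketch of the construction (the names are ours), which exhibits the three properties used in the proof: fᵢ ≤ f, fᵢ is nondecreasing in i, and fᵢ is within 2⁻ⁱ of f wherever f < i:

```python
# The dyadic approximation from the proof of Theorem A.41:
# f_i(s) = (k-1)/2^i on A_{k,i} for k > 0, and f_i(s) = i where f(s) >= i.

import math

def dyadic_approx(f, i):
    def fi(s):
        if f(s) >= i:
            return float(i)
        # largest dyadic multiple of 2^-i that does not exceed f(s)
        return math.floor(f(s) * 2 ** i) / 2 ** i
    return fi

f = lambda s: s * s
f2, f3 = dyadic_approx(f, 2), dyadic_approx(f, 3)
```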
The following theorem will be very useful throughout the study of statistics. It
says that one function g is a function of another f if and only if g is measurable
with respect to the σ-field generated by f.
A.4 Integration
The integral of a function with respect to a measure is a way to generalize the
notion of weighted average. We define the integral in stages. We start with non-
negative simple functions.
Definition A.44. Let f be a nonnegative simple function represented as f(s) =
∑_{i=1}^k aᵢ I_{Aᵢ}(s), with the aᵢ distinct and the Aᵢ mutually disjoint. Then, the in-
tegral of f with respect to μ is ∫ f(s)dμ(s) = ∑_{i=1}^k aᵢ μ(Aᵢ). If 0 times ∞ occurs
in such a sum, the result is 0 by convention.

The integral of a nonnegative simple function is allowed to be ∞. It turns out
that the formula for the integral of a nonnegative simple function is more general
than in Definition A.44.
The integral of f with respect to μ is

∫ f(s)dμ(s) = ∫ f⁺(s)dμ(s) − ∫ f⁻(s)dμ(s),

if at least one of the two integrals on the right is finite. If both are infinite, the
integral is undefined. We say that f is integrable if the integral of f is defined
and is finite.
The integral is defined above in terms of its values at all points in S. Sometimes
we wish to consider only a subset of S.

Definition A.48. If A ⊆ S and f is measurable, the integral of f over A with
respect to μ is

∫_A f(s)dμ(s) = ∫ I_A(s)f(s)dμ(s).
∫ af(s)dμ(s) = a ∫ f(s)dμ(s).

3. If f and g are integrable with respect to μ, and f ≤ g, a.e. [μ], then
∫ f(s)dμ(s) ≤ ∫ g(s)dμ(s).

The proofs of the next few theorems are essentially borrowed from Royden
(1968).
Theorem A.50 (Fatou's lemma). Let {fₙ}_{n=1}^∞ be a sequence of nonnegative
measurable functions. Then

∫ lim inf_{n→∞} fₙ(s)dμ(s) ≤ lim inf_{n→∞} ∫ fₙ(s)dμ(s).

PROOF. Let f(s) = lim inf_{n→∞} fₙ(s). Since

∫ f(s)dμ(s) = sup_{simple φ ≤ f} ∫ φ(s)dμ(s),

it suffices to show that lim inf_{n→∞} ∫ fₙ(s)dμ(s) ≥ ∫ φ(s)dμ(s) for every simple
φ ≤ f. Since this is clearly true if φ(s) = 0, a.e. [μ], we will assume that μ(A) > 0, where
A = {s : φ(s) > 0}. Let φ ≤ f be simple, let ε > 0, and let δ and M be the
smallest and largest positive values that φ assumes. For each n, define

Aₙ = {s ∈ A : f_k(s) > (1 − ε)φ(s) for all k ≥ n}.

Since (1 − ε)φ(s) < f(s) for all s ∈ A, ∪_{n=1}^∞ Aₙ = A and Aₙ ⊆ Aₙ₊₁ for all n.
Let Bₙ = A ∩ Aₙᶜ. Then

∫ fₙ(s)dμ(s) ≥ ∫_{Aₙ} fₙ(s)dμ(s) ≥ (1 − ε) ∫_{Aₙ} φ(s)dμ(s). (A.51)
If μ(Bₙ) = ∞ for some n = n₀, then μ(A) = ∞ and ∫ φ(s)dμ(s) = ∞, since φ takes
on only finitely many different values. The rightmost integral in (A.51) is at least
δμ(Aₙ), which goes to ∞ as n increases, hence lim inf_{n→∞} ∫ fₙ(s)dμ(s) = ∞ and
the result is true. So, assume μ(Bₙ) < ∞ for all n. Since ∩_{n=1}^∞ Bₙ = ∅, it follows
from Theorem A.19 that lim_{n→∞} μ(Bₙ) = 0. So, there exists N such that n > N
implies μ(Bₙ) < ε. Since

∫_{Aₙ} φ(s)dμ(s) ≥ ∫ φ(s)dμ(s) − Mμ(Bₙ),

if ∫ φ(s)dμ(s) = ∞, the result is true again. If ∫ φ(s)dμ(s) = K < ∞, then for
every n ≥ N,

∫ fₙ(s)dμ(s) ≥ (1 − ε)[K − Mμ(Bₙ)] ≥ K − ε[(1 − ε)M + K],

hence

lim inf_{n→∞} ∫ fₙ(s)dμ(s) ≥ ∫ φ(s)dμ(s) − ε[(1 − ε)M + K].

Since this is true for every ε > 0,

lim inf_{n→∞} ∫ fₙ(s)dμ(s) ≥ ∫ φ(s)dμ(s). □
Theorem A.52 (Monotone convergence theorem). Let {fₙ}_{n=1}^∞ be a se-
quence of measurable nonnegative functions, and let f be a measurable function
such that fₙ(x) ≤ f(x) a.e. [μ] and fₙ(x) → f(x) a.e. [μ]. Then

lim_{n→∞} ∫ fₙ(x)dμ(x) = ∫ f(x)dμ(x).
Rearranging the terms in the first and last expressions gives the desired result. If
both f and g have infinite integral of the same sign, then it follows easily, using
Proposition A.49, that f + g has infinite integral of the same sign. Finally, if only
one of f and g has infinite integral, it also follows easily from Proposition A.49
that f + g has infinite integral of the same sign. □
A nonnegative function can be used to create a new measure.

Theorem A.54. Let (S, 𝒜, μ) be a measure space, and let f : S → ℝ be non-
negative and measurable. Then ν(A) = ∫_A f(s)dμ(s) is a measure on (S, 𝒜).

PROOF. Clearly, ν is nonnegative and ν(∅) = 0, since f(s)I_∅(s) = 0, a.e. [μ].
Let {Aₙ}_{n=1}^∞ be disjoint. For each n, define gₙ(s) = f(s)I_{Aₙ}(s) and fₙ(s) =
∑_{i=1}^n gᵢ(s). Define A = ∪_{n=1}^∞ Aₙ. Then 0 ≤ fₙ ≤ f I_A, a.e. [μ] and fₙ converges
to f I_A, a.e. [μ]. So, the monotone convergence theorem A.52 says that

lim_{n→∞} ∫ fₙ(s)dμ(s) = ν(A). (A.55)

Also, ν(Aᵢ) = ∫ gᵢ(s)dμ(s), for each i. It follows from Theorem A.53 that

∫ fₙ(s)dμ(s) = ∑_{i=1}^n ν(Aᵢ). (A.56)

Take the limit as n → ∞ of both sides of (A.56) and compare to
(A.55) to see that ν is countably additive. □
∫ f(x)dμ(x) ≤ lim inf_{n→∞} ∫ fₙ(x)dμ(x).
PROOF. Let fₙ⁺, fₙ⁻, f⁺, and f⁻ be the positive and negative parts of fₙ and
f. We will prove that the result holds for nonnegative functions and take the
difference to get the general result. Let ε > 0 and let c be large enough so that
supₙ ∫_{{x : fₙ(x) > c}} fₙ(x)dμ(x) < ε. The functions

gₙ(x) = fₙ(x) if fₙ(x) ≤ c, and gₙ(x) = c if fₙ(x) > c,

are bounded by c. Applying the dominated convergence theorem A.57 to the gₙ
and using the choice of c, one obtains

∫ f(x)dμ(x) ≥ lim sup_{n→∞} ∫ fₙ(x)dμ(x) − ε.

Since this is true for every ε > 0, we have ∫ f(x)dμ(x) ≥
lim sup_{n→∞} ∫ fₙ(x)dμ(x). Combining this with Fatou's lemma A.50 gives the result. □
So, B_x ∈ 𝒜₂. Let C be the collection of all sets B ⊆ S₁ × S₂ such that B_x ∈ 𝒜₂.
If B ∈ C, then (Bᶜ)_x = {y : (x, y) ∉ B} = (B_x)ᶜ, so Bᶜ ∈ C. Let {Bₙ}_{n=1}^∞ ∈ C.
Then it is easy to see that

(∪_{n=1}^∞ Bₙ)_x = {y : (x, y) ∈ ∪_{n=1}^∞ Bₙ} = ∪_{n=1}^∞ {y : (x, y) ∈ Bₙ} = ∪_{n=1}^∞ (Bₙ)_x ∈ 𝒜₂.
(A.62)

Clearly, S₁ × S₂ ∈ C, so C is a σ-field containing all product sets; hence it contains
𝒜₁ ⊗ 𝒜₂. Next, let f_B(x) = μ₂(B_x) for B ∈ 𝒜₁ ⊗ 𝒜₂. Write S₁ × S₂ = ∪_{n=1}^∞ Eₙ with
Eₙ = A_{1n} × A_{2n} and μᵢ(A_{in}) < ∞ for all n and i = 1, 2 and with the Eₙ disjoint.
Then let f_{B,n}(x) = μ₂((B ∩ Eₙ)_x). It follows that f_B = ∑_{n=1}^∞ f_{B,n}. If we can show
that f_{B,n} is measurable for each n, then so is f_B, since they are nonnegative, and
the sum is well defined. If B = B₁ × B₂, then f_{B,n}(x) = I_{A_{1n} ∩ B₁}(x) μ₂(A_{2n} ∩ B₂),
which is a measurable function. Let D be the collection of all sets D ⊆ S₁ × S₂
²⁸This lemma is used in the proofs of Lemmas A.64 and A.67 and Theo-
rems A.69 and B.46.
∑_{m=1}^∞ f_{D_m,n}(x),

where the first equality follows from the definition of ν₁, the fact that μ₂ is
countably additive, and (A.62); the second equality follows from the monotone
convergence theorem A.52 and the fact that ∑_{n=1}^m μ₂((Bₙ)_x) ≤ ∑_{n=1}^∞ μ₂((Bₙ)_x)
for all m; and the last equality follows from the definition of ν₁. This proves that ν₁
(and so too ν₂) is a measure. Note that if B = A₁ × A₂, then ν₁(B) = μ₁(A₁)μ₂(A₂) =
ν₂(B). The product sets generate 𝒜₁ ⊗ 𝒜₂, and S₁ × S₂ can be written as a countable
union of product sets such that each one has finite ν₁ = ν₂ measure. By Theorem A.26, ν₁ agrees
with ν₂ on all of 𝒜₁ ⊗ 𝒜₂. □
Definition A.65. Let (Sᵢ, 𝒜ᵢ, μᵢ) for i = 1, 2 be σ-finite measure spaces. Define
the product measure μ₁ × μ₂ on (S₁ × S₂, 𝒜₁ ⊗ 𝒜₂) as the common value of the
two measures ν₁ and ν₂ in Lemma A.64.

Lebesgue measure on ℝ², denoted dxdy, is a product measure. Not every
measure on a product space is a product measure. Product probability measures
will correspond to independent random variables. (See Theorem B.66.)
Proposition A.66. Let μ be a measure on a product space (S₁ × S₂, 𝒜₁ ⊗ 𝒜₂).
Then μ is a product measure if and only if there exist set functions μᵢ : 𝒜ᵢ →
ℝ for i = 1, 2 such that, for every A₁ ∈ 𝒜₁ and A₂ ∈ 𝒜₂, μ(A₁ × A₂) =
μ₁(A₁)μ₂(A₂).
Lemma A.67. Let f be a measurable function from S₁ × S₂ to ℝ such that
either {x ∈ S₁ : ∫ |f(x, y)|dμ₂(y) = ∞} ⊆ A ∈ 𝒜₁, where μ₁(A) = 0, or f ≥ 0.
Then, there is a measurable (possibly extended real-valued) function g : S₁ →
ℝ ∪ {±∞} such that g(x) = ∫ f(x, y)dμ₂(y), a.e. [μ₁]. If f is the indicator of a
measurable set B, then g(x) = μ₂(B_x), a.e. [μ₁].
by (A.68). Since 0 ≤ ∫ fₙ(x, y)dμ₂(y) ≤ ∫ f(x, y)dμ₂(y) for all x and n, and
lim_{n→∞} ∫ fₙ(x, y)dμ₂(y) = ∫ f(x, y)dμ₂(y) as in the proof of Lemma A.67, it
follows from the monotone convergence theorem A.52 that

∫ f(x, y)d(μ₁ × μ₂)(x, y) = ∫ [∫ f(x, y)dμ₂(y)] dμ₁(x).

The proof that the iterated integrals can be calculated in the other order is
similar. □
For every ε > 0, there is δ such that μ1(A) < δ implies μ2(A) < ε. (A.73)

This is a contradiction. □
The following theorem says that the first part of Example A.8 on page 574 is the most general form of absolute continuity with respect to σ-finite measures. The proof is mostly borrowed from Royden (1968).

Theorem A.74 (Radon-Nikodym theorem). Let μ1 and μ2 be measures on (S, A) such that μ2 ≪ μ1 and μ1 is σ-finite. Then there exists an extended real-valued measurable function f : S → [0, ∞] such that for every A ∈ A,

μ2(A) = ∫_A f(s) dμ1(s). (A.75)

The function f is called the Radon-Nikodym derivative of μ2 with respect to μ1 and it is unique a.e. [μ1]. The Radon-Nikodym derivative is sometimes denoted (dμ2/dμ1)(s). If μ2 is σ-finite, then f is finite a.e. [μ1].
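Before the proof, a concrete finite sketch (not from the text) may help: when both measures live on a finite set and are given by mass functions, the Radon-Nikodym derivative is just the pointwise ratio of masses wherever μ1 puts positive mass, and integrating it against μ1 recovers μ2 as in (A.75).

```python
# Sketch: Radon-Nikodym derivative for measures on a finite set, where
# mu2 << mu1. Here f(x) = mu2({x}) / mu1({x}) wherever mu1({x}) > 0.
S = [0, 1, 2, 3]
m1 = {0: 0.5, 1: 0.25, 2: 0.25, 3: 0.0}   # mu1 mass function
m2 = {0: 0.1, 1: 0.6, 2: 0.3, 3: 0.0}     # mu2 mass function; mu2 << mu1

f = {x: (m2[x] / m1[x] if m1[x] > 0 else 0.0) for x in S}

def mu2_via_f(A):
    # integral of f over A with respect to mu1 recovers mu2(A), as in (A.75)
    return sum(f[x] * m1[x] for x in A)

assert abs(mu2_via_f({1, 2}) - (m2[1] + m2[2])) < 1e-12
```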
PROOF. First, we prove uniqueness a.e. [μ1]. Suppose that such an f exists. Let g be another function such that f and g are not a.e. [μ1] equal. Let A_n = {x : f(x) > g(x) + 1/n} and B_n = {x : f(x) < g(x) − 1/n}. Since f and g are not equal a.e. [μ1], there exists n such that either μ1(A_n) > 0 or μ1(B_n) > 0. Let A be a subset of either A_n or B_n with finite positive measure. Then ∫_A f(x) dμ1(x) ≠ ∫_A g(x) dμ1(x). Hence g ≠ dμ2/dμ1.
The proof of existence proceeds as follows. First, we show that we can reduce to the case in which μ1 is finite. Then, we create a collection of signed measures ν_α indexed by a real number α. For each α we find a set A_α such that every subset of A_α has nonnegative ν_α measure and every subset of the complement B_α has nonpositive ν_α measure. We then show that B_β ⊆ B_α for β ≥ α, which allows us to define f(x) = sup{α : x ∈ B_α}. Finally, we show that f satisfies (A.75) and (A.76).

Now, we prove that we need only consider finite μ1. Since μ1 is σ-finite, let {A_i}_{i=1}^∞ be disjoint elements of A such that μ1(A_i) < ∞ and S = ∪_{i=1}^∞ A_i. Let μ_{j,i} be μ_j restricted to A_i for j = 1, 2 and each i. Then μ_{2,i} ≪ μ_{1,i} for each i and each μ_{1,i} is finite. Suppose that for each i we can find f_i as in the theorem with μ_j replaced by μ_{j,i} for j = 1, 2. Then f(x) = Σ_{i=1}^∞ I_{A_i}(x) f_i(x) is the function required by the theorem as stated. Hence, we prove the theorem only for the case in which μ1 is finite.
Suppose that μ1 is finite, and define the signed measure ν_α = αμ1 − μ2 for each nonnegative rational number α. (Note that ν_α(A) never equals ∞, although it may equal −∞.) For each α, define

λ_α = sup{ν_α(A) : A ∈ P_α},

where P_α is the collection of all sets A every subset of which has nonnegative ν_α measure. That is, λ_α is the supremum of the signed measures of sets all of whose subsets have nonnegative signed measure.32 Since ∅ ∈ P_α, λ_α ≥ 0. Let {A_i}_{i=1}^∞ be such that λ_α = lim_{i→∞} ν_α(A_i), and let A_α = ∪_{i=1}^∞ A_i. Since every subset of A_α can be written as a union of subsets of the A_i, it follows that A_α ∈ P_α, hence λ_α ≥ ν_α(A_α). Since A_α \ A_i ⊆ A_α, it follows that ν_α(A_α \ A_i) ≥ 0 for all i and ν_α(A_α) = ν_α(A_α \ A_i) + ν_α(A_i) ≥ ν_α(A_i) for all i. It follows that λ_α ≤ ν_α(A_α). Hence λ_α = ν_α(A_α) < ∞. Define B_α = (A_α)^c.
Next, we prove that every subset of B_α has nonpositive signed measure.33 If not, let B ⊆ B_α be such that ν_α(B) > 0. If B has no subsets with negative signed measure,

32The sets in P_α are often called the positive sets relative to the signed measure ν_α.
33Such sets are called negative sets relative to the signed measure ν_α.
then BuA'" E P'" and v",(A"'UB) > >''''' a contradiction. So, let n1 be the smallest
positive integer such that there is a subset B1 ~ B with v",(Bd < -l/n1. For
each k > 1, let nk be the smallest positive integer such that there exists a subset
Bk ~ B \ u7,;;} Bi with V",(Bk) < -link. Now, let C = B \ Uk"=lBk. Clearly
v"'(C) > O. If we prove that C has no subsets with negative signed measure,
then C E P'" and we have another contradiction. So, suppose that D ~ C has
v",(D) = -€ < O. Since v",(B) > 0, it must be that 2:~=1 V",(Bk) > -00.
Hence limk_oo nk = 00. So, there is k such that I/(nk+1 - 1) < €. Notice that
D ~ C ~ B\U~=lBk. Since v",(D) < -l/(nk+l-I), this contradicts the definition
of nk+1.
If β > α, we have

ν_α(A_α ∩ B_β) ≥ 0 and ν_β(A_α ∩ B_β) ≤ 0.

Subtract the first inequality from the second to get (β − α)μ1(A_α ∩ B_β) ≤ 0, from which it follows that μ1(A_α ∩ B_β) = 0. Since ν_β(A) ≥ ν_α(A) for β ≥ α, we can assume that A_α ⊆ A_β if β ≥ α. It follows that B_β ⊆ B_α for β ≥ α, and we can define f(x) = sup{α : x ∈ B_α}. Since B_0 = S, f(x) ≥ 0 for all x. It is easy to see that f(x) ≥ α if x ∈ B_α and f(x) ≤ α if x ∈ A_α. It is also easy to see that {x : f(x) > b} = ∪_{α>b} B_α. Since this is a countable union of measurable sets, it is measurable. By Lemma A.35, f is measurable.
Next, we prove that (A.75) holds for every A ∈ A. Let A ∈ A be arbitrary and let ε > 0 be given. Let N > μ1(A)/ε be a positive integer. Define E_k = A ∩ B_{k/N} ∩ A_{(k+1)/N} and E_∞ = A \ ∪_{k=1}^∞ A_{k/N}. Then A = ∪_{k=0}^∞ E_k ∪ E_∞ and the E_j are all disjoint. So μ2(A) = μ2(E_∞) + Σ_{k=0}^∞ μ2(E_k). By construction f(x) ∈ [k/N, (k+1)/N] for all x ∈ E_k and f(x) = ∞ for all x ∈ E_∞. Since ν_{k/N}(E_k) ≤ 0 and ν_{(k+1)/N}(E_k) ≥ 0, we have, for finite k,

|μ2(E_k) − ∫_{E_k} f(x) dμ1(x)| ≤ (1/N) μ1(E_k). (A.77)

If μ1(E_∞) > 0, then μ2(E_∞) = ∞ since ν_{k/N}(E_∞) ≤ 0 for all k. If μ1(E_∞) = 0, then μ2(E_∞) = 0 by absolute continuity. Either way, μ2(E_∞) = ∫_{E_∞} f(x) dμ1(x). Adding this into the sum of (A.77) over all finite k gives
{x ∈ C_0 : (dQ_0/dλ)(x) = 0} ⊆ ∪_{n=1}^∞ {x ∈ C_n : (dQ_n/dλ)(x) = 0},

which implies that C_0 ∈ D and λ(C_0) = c.
Since Q_0 ∈ Q, we now need only prove that ν ≪ Q_0 for all ν ∈ N to finish the proof. Suppose that Q_0(A) = 0 and ν ∈ N. We must prove ν(A) = 0. Since Q_0(A ∩ C_0) = 0 and dQ_0/dλ(x) > 0 for all x ∈ C_0, it follows that λ(A ∩ C_0) = 0 and hence ν(A ∩ C_0) = 0. Let C = {x : dν/dλ(x) > 0}. Then ν(A ∩ C_0^c ∩ C^c) = 0 since dν/dλ(x) = 0 for x ∈ C^c. Let D = A ∩ C_0^c ∩ C, which is disjoint from C_0. If λ(D) > 0, then it follows easily that C_0 ∪ D ∈ D, and λ(C_0 ∪ D) > λ(C_0) contradicts λ(C_0) = c. Hence λ(D) = 0 and ν(D) = 0, which implies ν(A) = ν(A ∩ C_0) + ν(A ∩ C_0^c ∩ C^c) + ν(D) = 0. □
There is a chain rule for Radon-Nikodym derivatives.
Theorem A.79 (Chain rule).35 Let ν and η be σ-finite measures and suppose that μ ≪ ν ≪ η. Then

(dμ/dη)(s) = (dμ/dν)(s) (dν/dη)(s), a.e. [η].

PROOF. It follows from (A.76) that

μ(A) = ∫_A (dμ/dν)(s) dν(s) = ∫_A (dμ/dν)(s) (dν/dη)(s) dη(s).
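The chain rule can be checked directly on a finite set, where each Radon-Nikodym derivative is a ratio of mass functions (a sketch, not from the text; the masses below are arbitrary):

```python
# Sketch: the chain rule for Radon-Nikodym derivatives on a finite set,
# with mu << nu << eta, all three given by mass functions.
S = [0, 1, 2]
eta = {0: 1.0, 1: 2.0, 2: 4.0}
nu  = {0: 0.5, 1: 1.0, 2: 1.0}    # nu << eta
mu  = {0: 0.25, 1: 0.5, 2: 0.25}  # mu << nu

dmu_dnu  = {x: mu[x] / nu[x] for x in S}
dnu_deta = {x: nu[x] / eta[x] for x in S}
dmu_deta = {x: mu[x] / eta[x] for x in S}

# d(mu)/d(eta) = d(mu)/d(nu) * d(nu)/d(eta), pointwise
for x in S:
    assert abs(dmu_deta[x] - dmu_dnu[x] * dnu_deta[x]) < 1e-12
```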
34This theorem is used in the proofs of Lemmas 2.15 and 2.24. It appears as
Theorem 2 in Appendix 3 of Lehmann (1986) and is attributed to Halmos and
Savage (1949).
35This theorem is used in the proof of Lemma 2.15.
integrable, then

∫ g(y) dμ2(y) = ∫ g(f(x)) dμ1(x). (A.82)

That (A.82) is true for all nonnegative simple functions follows by adding the far ends of this equation (multiplied by positive constants). The monotone convergence theorem A.52 allows us to extend the equality to all nonnegative integrable functions. By subtraction, we can extend to all integrable functions. □

Definition A.83. The measure μ2 in Theorem A.81 is called the measure induced on (S2, A2) by f from μ1.

If the measure μ1 in Theorem A.81 is not finite, and the function f is not one-to-one, the measure μ2 may not be very interesting.
Example A.84. Let S1 = ℝ², S2 = ℝ, μ1 equal Lebesgue measure on ℝ², and f(x, y) = x. Let the two σ-fields be Borel σ-fields. The measure μ2 that f induces on (S2, A2) from μ1 is the following. If A ∈ A2 and the Lebesgue measure of A is 0, then μ2(A) = 0. Otherwise, μ2(A) = ∞. Although μ2 is absolutely continuous with respect to Lebesgue measure, it is not σ-finite. The only functions g that are integrable with respect to μ2 are those that are almost everywhere 0.

If μ1 is σ-finite, there is a way to avoid the problem in Example A.84 by making use of the following result.
Theorem A.85.36 A measure μ on a space (S, A) is σ-finite if and only if there exists an integrable function f : S → ℝ such that f > 0, a.e. [μ].

PROOF. For the "if" part, let f be as in the statement of the theorem. Let 0 < ∫ f(s) dμ(s) = c < ∞. Let A_n = {s : 1/n ≤ f(s) < 1/(n − 1)}, for n = 1, 2, .... We see that A_1 = {s : f(s) ≥ 1} and S = ∪_{n=1}^∞ A_n. We can write
Example A.86 (Continuation of Example A.84; see page 601). Let h(x, y) = exp(−[x² + y²]/2). It is known that h is integrable with respect to μ1 and h is everywhere strictly positive. Let μ1′(C) = ∫_C h(x, y) dμ1(x, y). Then μ1′ ≪ μ1 and μ1 ≪ μ1′. The measure μ2′ induced on (S2, A2) from μ1′ by f(x, y) = x is μ2′(B) = √(2π) ∫_B exp(−x²/2) dx. A function g : S2 → ℝ is integrable with respect to μ2′ if and only if exp(−x²/2) g(x) is integrable with respect to Lebesgue measure.
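For a discrete analogue of Definition A.83 (a sketch, not from the text; the toy measure below is hypothetical), the induced measure simply pushes each atom of μ1 forward through f:

```python
# Sketch: the measure induced on S2 by f from mu1 (Definition A.83),
# for a discrete mu1 on a finite S1 and the projection f(x, y) = x.
from collections import defaultdict

mu1 = {(0, 0): 0.2, (0, 1): 0.3, (1, 0): 0.1, (1, 1): 0.4}  # toy measure on S1

def f(point):
    return point[0]  # project onto the first coordinate

mu2 = defaultdict(float)
for point, mass in mu1.items():
    mu2[f(point)] += mass  # mu2(B) = mu1(f^{-1}(B))

assert abs(mu2[0] - 0.5) < 1e-12 and abs(mu2[1] - 0.5) < 1e-12
```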
A.7 Problems
Section A.2:
1. Let S be a set and let A be the collection of all subsets of S that either are countable or have countable complement. Prove that A is a σ-field.
2. Prove Proposition A.10 on page 575.
3. Prove Proposition A.13 on page 576. (Hint: First, show that every open ball in ℝ^k is the union of countably many open rectangles. Then prove that the smallest σ-field containing open balls must be the same as the smallest σ-field containing open rectangles.)
4. Prove that B+ defined on page 571 is a σ-field of subsets of the extended real numbers.
5. Prove Proposition A.15 on page 576.
6. Prove Proposition A.16 on page 576.
7. *Let F : ℝ → ℝ be a nondecreasing function that is continuous from the right. For each interval (a, b], define μ((a, b]) = F(b) − F(a).
(a) Suppose that {(a_n, b_n]}_{n=1}^∞ is a sequence of disjoint intervals such that ∪_{n=1}^∞ (a_n, b_n] ⊆ (a, b]. Prove that Σ_{n=1}^∞ μ((a_n, b_n]) ≤ μ((a, b]). (Hint: Prove it for finite collections and take a limit.)
(b) Suppose that {(a_n, b_n]}_{n=1}^∞ is a sequence of disjoint intervals such that (a, b] ⊆ ∪_{n=1}^∞ (a_n, b_n]. Prove that Σ_{n=1}^∞ μ((a_n, b_n]) ≥ μ((a, b]). (Hint: First, prove it for finite collections by induction. For infinite collections, let μ((a, b]) > ε > 0. Cover a compact interval [a + δ, b] with finitely many open intervals (a_n, b_n + δ_n) such that |μ((a, b]) − μ((a + δ, b])| < ε/2 and |Σ_{n=1}^∞ μ((a_n, b_n]) − Σ_{n=1}^∞ μ((a_n, b_n + δ_n])| < ε/2. This can be done by using continuity from the right.)
(c) Prove that μ is countably additive on the smallest field containing intervals of the form (a, b]. (Hint: Deal separately with finite and semi-infinite intervals.)
8. A measure space (S, A, μ) is complete if A ⊆ B ∈ A and μ(B) = 0 implies A ∈ A. Let (S, C, μ) be a measure space, and let D = {D : ∃ A, C ∈ C with D Δ A ⊆ C and μ(C) = 0}. For each D ∈ D, define μ*(D) = μ(A), where D Δ A ⊆ C and μ(C) = 0. Show that μ* is well defined and that (S, D, μ*) is a complete measure space.
Section A.3:
Section A.4:
(b) Let ε > 0. Prove that there exists a simple function g such that for all measures μ satisfying μ(S) = 1, |∫ f(x) dμ(x) − ∫ g(x) dμ(x)| < ε.
20. Prove the following alternative type of monotone convergence theorem: Let {f_n}_{n=1}^∞ be a sequence of integrable functions such that f_n(x) converges monotonically to f(x) a.e. [μ]. Then ∫ f(x) dμ(x) is defined and ∫ f(x) dμ(x) = lim_{n→∞} ∫ f_n(x) dμ(x). (Hint: Use the dominated convergence theorem A.57 on the positive parts of f_n and the monotone convergence theorem A.52 on the negative parts, or vice versa, depending on whether the convergence is from above or below.)
21. Let (S, A, μ) be a measure space, let {g_n}_{n=1}^∞ be a sequence of integrable functions that converges a.e. [μ], and let g be another integrable function. Suppose that for all C ∈ A,

lim_{n→∞} ∫_C g_n(s) dμ(s) = ∫_C g(s) dμ(s).
Section A.5:
Section A.6:
Probability Theory
B.1 Overview
B.1.1 Mathematical Probability
The measure theoretic definition of probability is that a measure space (S, A, μ) is called a probability space and μ is called a probability if μ(S) = 1. Each element of A is called an event. A measurable function X from S to some other space (X, B) is called a random quantity. The most popular type of random quantity is a random variable, which occurs when X is ℝ with the Borel σ-field. The probability measure μ_X induced on (X, B) by X from μ is called the distribution of X.

Example B.1. Let S = X = ℝ with Borel σ-field. Let f be a nonnegative function such that ∫ f(x) dx = 1. Define μ(A) = ∫_A f(x) dx and X(s) = s. Then X is a continuous random variable with density f, and μ_X = μ. If we let ν denote Lebesgue measure, then μ_X ≪ ν with dμ_X/dν = f.
Example B.2. Let S = ℝ with Borel σ-field. Let X = {x_1, x_2, ...} be a countable set. Let f be a nonnegative function defined on X such that Σ_{i=1}^∞ f(x_i) = 1. Define μ(A) = Σ_{i : x_i ∈ A} f(x_i). Then X is a discrete random variable with probability mass function f, and μ_X = μ. If we let ν denote counting measure on X, then μ ≪ ν with dμ/dν = f.
In both of these examples, we will say that f is the density of X with respect to ν.
When there is one probability space (S, A, μ) from which all other probabilities are induced by way of random quantities, then the probability in that one space will be denoted Pr. So, for example, if μ_X is the distribution of a random quantity X and if B ∈ B, then Pr(X ∈ B) = μ(X^{−1}(B)) = μ_X(B).
The expected value or mean or expectation of a random variable X is defined (and denoted) as E(X) = ∫ x dμ_X(x), if the integral exists, where μ_X is the distribution of X. If X is a vector of random variables (called a random vector), then E(X) will stand for the vector with coordinates equal to the means of the coordinates of X.

The (in)famous law of the unconscious statistician, B.12, is very useful for calculating means of functions of random quantities. It says that E[f(X)] = ∫ f(x) dμ_X(x). For example, the variance of a random variable X with mean c is Var(X) = E([X − c]²), which can be calculated as ∫ (x − c)² dμ_X(x). The covariance between two random variables X and Y with means c_X and c_Y, respectively, is Cov(X, Y) = E([X − c_X][Y − c_Y]).
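These definitions can be evaluated mechanically for a discrete joint distribution (a sketch, not from the text; the uniform pmf below is an arbitrary choice):

```python
# Sketch: mean, variance, and covariance for a small discrete joint
# distribution, computed directly from the definitions in the text.
pts = {(0, 0): 0.25, (0, 1): 0.25, (1, 0): 0.25, (1, 1): 0.25}  # toy joint pmf

EX = sum(p * x for (x, y), p in pts.items())
EY = sum(p * y for (x, y), p in pts.items())
VarX = sum(p * (x - EX) ** 2 for (x, y), p in pts.items())
CovXY = sum(p * (x - EX) * (y - EY) for (x, y), p in pts.items())

assert EX == 0.5 and EY == 0.5
assert VarX == 0.25
assert CovXY == 0.0  # X and Y are independent under this pmf
```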
B.1.2 Conditioning
We begin with a heuristic derivation of the important concepts using the special
case of discrete random quantities. Afterwards, we define the important terms in
a more rigorous way.
Consider the case of two random quantities X and Y, each of which assumes at most countably many distinct values, X ∈ X = {x_1, ...} and Y ∈ Y = {y_1, ...}. Let p_{ij} = Pr(X = x_i, Y = y_j). Then

E(f(X)|Y = y_j) = Σ_{i=1}^∞ f(x_i) p_{i|j}.

From the conditional distribution, we could define a measure on (X, 2^X) by

μ_{X|Y}(A|y_j) = Σ_{x_i ∈ A} p_{i|j}.
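The discrete conditional expectation above, together with the law of total probability E[E(f(X)|Y)] = E[f(X)] discussed below, can be checked on a toy joint pmf (a sketch, not from the text; the numbers are arbitrary):

```python
# Sketch: discrete conditional expectation, with p_{i|j} = Pr(X=x_i | Y=y_j),
# checking the identity E[E(f(X)|Y)] = E[f(X)].
p = {(1, 10): 0.1, (1, 20): 0.3, (2, 10): 0.2, (2, 20): 0.4}  # Pr(X=x, Y=y)

def f(x):
    return x * x

pY = {y: sum(q for (x2, y2), q in p.items() if y2 == y) for y in (10, 20)}
condE = {y: sum(q * f(x) for (x, y2), q in p.items() if y2 == y) / pY[y]
         for y in pY}  # E(f(X) | Y = y)

total = sum(pY[y] * condE[y] for y in pY)          # E[E(f(X)|Y)]
direct = sum(q * f(x) for (x, y), q in p.items())  # E[f(X)]
assert abs(total - direct) < 1e-12
```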
Σ_{j=1}^∞ Σ_{i=1}^∞ f(x_i) p_{i|j} p_{·j}

in general.
will be used as the property that defines conditional expectation. Through the definition of conditional expectation, we will define conditional probability and conditional distributions in general.

Theorem B.21 says that, in general, if a random variable X has finite mean and if C is a sub-σ-field of A, then a function g : S → ℝ exists which is measurable with respect to the σ-field C and such that (B.3) holds.

This is the general version of what we worked out above for discrete random variables in which C was the σ-field generated by Y. We will use the symbol E(X|C) to stand for the function g. The two important features E(X|C) possesses are that it is measurable with respect to the σ-field C and that it satisfies (B.3). Any function that equals E(X|C) a.s. [μ] will also satisfy (B.3), so there may be many functions that satisfy the definition of conditional expectation. All such functions are called versions of the conditional expectation. When we say that a random variable equals E(X|C), we will mean that it is a version of E(X|C).

Notice that we can set B = S in (B.3) and the equation becomes E(X) = E[E(X|C)]. This result is called the law of total probability. A useful generalization is given in Theorem B.70.

If C is the σ-field generated by another random quantity Y, then the symbol E(X|Y) is usually used instead of E(X|C). For the case in which C is the σ-field generated by Y, some special notation is introduced. We saw in Theorem A.42 that a function is measurable with respect to the σ-field generated by Y if and
μ(A) = ∫_A (1/(√3 π)) exp{−(2/3)(s1² + s2² − s1 s2)} ds1 ds2.

Suppose that X(s) = s1 and Y(s) = s2 when s = (s1, s2). Now E(|X|) = √(2/π) < ∞. We claim that g(s) = s2/2 and h(t) = t/2 satisfy the conditions required to be E(X|Y)(s) and E(X|Y = t), respectively. First, note that the σ-field generated by Y is A_Y = {ℝ × C : C is Borel measurable}, and μ_Y is the measure with density exp(−t²/2)/√(2π). It is clear that any measurable function of s2 alone is A_Y measurable. Let B = ℝ × C, so that E(X I_B) equals

∫_{−∞}^∞ ∫_C (1/(√3 π)) s1 exp{−(2/3)(s1² + s2² − s1 s2)} ds2 ds1
= ∫_C ∫_{−∞}^∞ s1 √(2/(3π)) exp{−(2/3)(s1 − s2/2)²} (1/√(2π)) exp{−s2²/2} ds1 ds2
= ∫_C (s2/2) (1/√(2π)) exp{−s2²/2} ds2,

since the inner integral is the mean, s2/2, of an N(s2/2, 3/4) distribution.
where σ² = σ1² + σ2² + 2ρσ1σ2. The pair (X, Y) does not have a joint density with respect to Lebesgue measure on ℝ³, but it does have a joint density with respect to the measure ν on ℝ³ defined as follows. For each A ⊆ ℝ³, let A′ = {(x1, x2) : (x1, x2, x1 + x2) ∈ A}. Let ν(A) = λ2(A′), where λ_k is Lebesgue measure on ℝ^k for k = 1, 2. Then f_{X,Y}(x, y) = f_X(x) is the joint density of (X, Y) with respect to ν, and

f_X(x)/f_Y(y) = (1/(√(2π) σ*)) exp(−(1/(2σ*²)) (x1 − μ1 − ([σ1² + ρσ1σ2]/σ²)(y − μ1 − μ2))²),

1See Problem 25 on page 664. If X = ℝ^k, the same idea can be used.
That is, X_n →_D X if and only if the joint CDFs F_n of X_n converge to the joint CDF F of X at all points at which F is continuous. Since we will not need to use this characterization, we will not prove it.
3This theorem is used in making sense of the notation E_θ when introducing parametric models.
μ = ∫ x dF(x) ≥ ∫_{[c,∞)} x dF(x) ≥ c ∫_{[c,∞)} dF(x) = c Pr(X ≥ c).

Divide the extreme parts by c to get the result. □
The following well-known inequality follows trivially from the Markov inequality B.15.

Corollary B.16 (Tchebychev's inequality).5 Suppose that X is a random variable with finite variance σ² and finite mean μ. Then, for all c > 0,

Pr(|X − μ| ≥ c) ≤ σ²/c².
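Both bounds can be verified exactly for a small discrete distribution (a sketch, not from the text; the pmf is arbitrary):

```python
# Sketch: checking the Markov and Tchebychev bounds for a discrete
# nonnegative random variable with pmf {value: probability}.
pmf = {0: 0.2, 1: 0.5, 4: 0.3}

mu = sum(p * x for x, p in pmf.items())
var = sum(p * (x - mu) ** 2 for x, p in pmf.items())

c = 2.0
markov_lhs = sum(p for x, p in pmf.items() if x >= c)           # Pr(X >= c)
cheb_lhs = sum(p for x, p in pmf.items() if abs(x - mu) >= c)   # Pr(|X-mu| >= c)

assert markov_lhs <= mu / c        # Markov: Pr(X >= c) <= E(X)/c
assert cheb_lhs <= var / c ** 2    # Tchebychev: Pr(|X-mu| >= c) <= var/c^2
```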
Another well-known inequality involves convex functions.6 The proof of this theorem resembles the proofs in Ferguson (1967) and Berger (1985).

Theorem B.17 (Jensen's inequality).7 Let g be a convex function defined on a convex subset X of ℝ^k and suppose that Pr(X ∈ X) = 1. If E(X) is finite, then E(X) ∈ X and g(E(X)) ≤ E(g(X)).
PROOF. First, we prove that E(X) ∈ X by induction on the dimension of X. Without loss of generality, we can assume that E(X) = 0, since we can subtract E(X) from X and from every element of X, and E(X) ∈ X if and only if 0 ∈ X − E(X). If k = 0, then X = {0} and E(X) = 0. Suppose that 0 ∈ X for all X with dimension strictly less than m ≤ k. Now suppose that X and X have dimension m and 0 ∉ X. Since X and {0} are disjoint convex sets, the separating hyperplane theorem C.5 says that there is a nonzero vector v and a constant c such that, for every x ∈ X, v^T x ≤ c and 0 ≥ c.8 If we let Y = v^T X, then we have Pr(Y ≤ c) = 1 and E(Y) = 0 ≥ c. It follows that Pr(Y = c) = 1 and c = 0. Hence, X lies in the (m − 1)-dimensional convex set Z = X ∩ {x : v^T x = 0}. It follows that 0 ∈ Z ⊆ X.
Next, we prove the inequality by induction on k. For k = 0, E(g(X)) = g(E(X)), since X is degenerate. Suppose that the inequality holds for all dimensions up to m − 1 < k. Let X have dimension m. Define the subset of ℝ^{m+1},

X′ = {(x, z) : x ∈ X, z ∈ ℝ, and g(x) ≤ z}.

Since αg(x1) + (1 − α)g(x2) ≥ g(y) and w ≥ αg(x1) + (1 − α)g(x2), it follows that (y, w) ∈ X′, so X′ is convex. It is also clear that (E(X), g(E(X))) is a boundary point of X′. The supporting hyperplane theorem C.4 says that there is a vector v = (v_x, v_z) such that, for all (x, z) ∈ X′, v_x^T x + v_z z ≥ v_x^T E(X) + v_z g(E(X)). Since (x, z1) ∈ X′ implies (x, z2) ∈ X′ for all z2 > z1, it cannot be that v_z < 0, since then lim_{z→∞} v_x^T x + v_z z = −∞, a contradiction. Since (x, g(x)) ∈ X′ for all x ∈ X, it follows that v_x^T X + v_z g(X) ≥ v_x^T E(X) + v_z g(E(X)), from which we conclude

v_z g(E(X)) ≤ v_x^T [X − E(X)] + v_z g(X). (B.18)

Taking expectations of both sides of this gives v_z g(E(X)) ≤ v_z E[g(X)]. If v_z > 0, the proof is complete. If v_z = 0, then (B.18) becomes 0 ≤ v_x^T [X − E(X)], which implies v_x^T [X − E(X)] = 0 with probability 1. Hence X lies in an (m − 1)-dimensional space, and the induction hypothesis finishes the proof. □
The famous Cauchy-Schwarz inequality for vectors9 has a probabilistic version.

Theorem B.19 (Cauchy-Schwarz inequality).10 Let X1 and X2 be two random vectors of the same dimension such that E(||X_i||²) is finite for i = 1, 2. Then

[E(X1^T X2)]² ≤ E(||X1||²) E(||X2||²). (B.20)

PROOF. Let Z = 1 if X1^T X2 ≥ 0 and Z = −1 if X1^T X2 < 0. Let Y = ||X1 + cZX2||², where c = −√(E||X1||² / E||X2||²). Then Y ≥ 0 and Z² = 1. So
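The inequality (B.20) can be checked numerically for a pair of discrete random vectors (a sketch, not from the text; the equally likely states below are arbitrary):

```python
# Sketch: the probabilistic Cauchy-Schwarz inequality (B.20), checked for
# two random vectors (X1, X2) taking three equally likely joint values.
states = [((1.0, 0.0), (2.0, 1.0)),
          ((0.0, 2.0), (1.0, -1.0)),
          ((1.0, 1.0), (0.0, 3.0))]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

n = len(states)
E_dot = sum(dot(x1, x2) for x1, x2 in states) / n   # E(X1^T X2)
E_n1 = sum(dot(x1, x1) for x1, x2 in states) / n    # E(||X1||^2)
E_n2 = sum(dot(x2, x2) for x1, x2 in states) / n    # E(||X2||^2)

assert E_dot ** 2 <= E_n1 * E_n2
```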
B.3 Conditioning
B.3.1 Conditional Expectations
Section B.1.2 contains a heuristic derivation of the important concepts in condi-
tioning using the special case of discrete random quantities. We now turn to a
more general presentation.
Theorem B.21.11 Let (S, A, μ) be a probability space, and suppose that X : S → ℝ is a measurable function with E(|X|) < ∞. Let C be a sub-σ-field of A. Then there exists a C measurable function g : S → ℝ which satisfies
PROOF. Use Theorem A.54 to construct two measures μ+ and μ− on (S, C):

It is clear that μ+ ≪ μ and μ− ≪ μ. The Radon-Nikodym theorem A.74 tells us that there are C measurable functions g+ and g− such that
Each such function is called a version of the conditional mean. If Y : S → Y and C is the sub-σ-field generated by Y, then E(X|C) is also called the conditional mean of X given Y, denoted E(X|Y). If, in addition, the σ-field of subsets of Y contains singletons, let h : Y → ℝ be the function such that g = h(Y). Then h(t) is denoted by E(X|Y = t).

When we say that a random variable equals E(X|Y), we will mean that it is a version of E(X|Y). The following propositions are immediate consequences of the above definitions.

Proposition B.24. Let (S, A, μ) be a probability space, and let (Y, C) be a measurable space such that C contains singletons. Let X : S → ℝ and Y : S → Y be measurable. Let μ_Y be the measure on Y induced from μ by Y. A function g : Y → ℝ is a version of E(X|Y = t) if and only if for all B ∈ C, ∫_B g(t) dμ_Y(t) = E(X I_B(Y)).
Proposition B.25.
• If Z and W are both versions of E(X|C), then Z = W, a.s.
• If X is C measurable, then E(X|C) = X, a.s.

Proposition B.26. If C = {S, ∅}, the trivial σ-field, then E(X|C) = E(X).
Proposition B.28. Let (S, A, μ) be a probability space and let X : S → ℝ, Y : S → (Y, B1), and Z : S → (Z, B2) be measurable functions. Let μ_Y and μ_Z be the measures induced on Y and Z by Y and Z, respectively, from μ. Suppose that E(|X|) < ∞ and that Z is a one-to-one function of Y, that is, there exists a bimeasurable h : Y → Z such that Z = h(Y). Then E(X|Y = y) = E(X|Z = h(y)), a.s. [μ_Y].
Conditional probability is the special case of conditional expectation in which X = I_A.

Definition B.29. Let (S, A, μ) be a probability space. For each A ∈ A, the conditional probability of A given C (or given Y if C is the σ-field generated by Y) is Pr(A|C) = E(I_A|C). If Pr(·|C)(s) is a probability on (S, A) for all s ∈ S, then the conditional probabilities given C are called regular conditional probabilities.

It turns out that under very general conditions (see Theorem B.32), we can choose the functions Pr(A|C) in such a way that they are regular conditional probabilities. In the future, we will assume that this is done in all such cases. If C is the σ-field generated by Y, then Pr(A|Y = y) will be used to stand for E(I_A|Y = y) as in the discussion following Corollary B.22.
If X : S → X is a random quantity, its conditional distribution is the collection of conditional probabilities on X induced from the restriction of conditional probabilities on S to the σ-field generated by X.

Definition B.30. Let (S, A, μ) be a probability space and let (X, B) be a measurable space. Suppose that X : S → X is a measurable function. Let P be the probability on (X, B) induced by X from μ. Let C be a sub-σ-field of A. For each B ∈ B, let P(B|C) = Pr(A|C), where A = X^{−1}(B). We say that any set of functions from S to [0, 1] of the form
We can use Problem 3 on page 662 once again to prove that μ(N_q) = 0 for all q, hence μ(N) = 0. Similarly, we can show that μ(L) = 0, where L is the set

{s : lim_{r→−∞, r rational} Pr(Z ≤ r|C)(s) ≠ 0} ∪ {s : lim_{r→∞, r rational} Pr(Z ≤ r|C)(s) ≠ 1}.
We would like to prove that all Polish spaces are Borel spaces. First, we prove that ℝ^∞ is a Borel space (Lemma B.36). Then we prove that there exist bimeasurable maps between Polish spaces and measurable subsets of ℝ^∞ (Lemma B.40). The following simple proposition pieces these results together.

Proposition B.35. If X is a Borel space and there exists a bimeasurable function f : Y → X, then Y is a Borel space.

Lemma B.36. The infinite product space ℝ^∞ is a Borel space.
PROOF. The idea of the proof14 is the following. We start by transforming each coordinate to the interval (0, 1) using a continuous function with continuous inverse. For each number in (0, 1) we find a base 2 expansion, which is a sequence of 0s and 1s. We then take these sequences (one for each coordinate) and merge them into a single sequence, which we then interpret as the base 2 expansion of a number in (0, 1). If this sequence of transformations is bimeasurable, we have our function φ.

Let ψ : ℝ^∞ → (0, 1)^∞ be defined by applying such a coordinatewise transformation. For x ∈ (0, 1), define

Z_j(x) = 1 if 2Y_{j−1}(x) ≥ 1, and Z_j(x) = 0 if not;   Y_j(x) = 2Y_{j−1}(x) − Z_j(x).

For each j, Z_j is a measurable function. It is easy to see that Z_j(x) is the jth digit in a base 2 expansion of x with infinitely many 0s. Note also that Y_j(x) ∈ [0, 1) for all j and x.
Create the following triangular array of integers:

1
2 3
4 5 6
7 8 9 10
11 12 13 14 15

Let the jth integer from the top of the ith column be ℓ(i, j). Clearly, each integer t appears once and only once as ℓ(i, j) for some i and j.15 Define

h(y1, y2, ...) = Σ_{i=1}^∞ Σ_{j=1}^∞ Z_j(y_i) 2^{−ℓ(i,j)}. (B.37)
Then h is clearly a measurable function from (0, 1)^∞ to a subset R of (0, 1). There is a countable subset of (0, 1) which is not in the image of h. These are the numbers with only finitely many 0s in one or more of the subsequences {ℓ(i, j)}_{j=1}^∞ of their base 2 expansion for i = 1, 2, .... For example, the number c = Σ_{i=0}^∞ 2^{−i(i+1)/2−1} is not in R.16 Since the complement of a countable set is measurable, the set R is measurable.
We define φ = h ∘ ψ. If we can show that h has a measurable inverse, the proof is complete. For each x ∈ R, define

Combining (B.37), (B.38), (B.39), and the fact that every integer appears once and only once as ℓ(i, j) for some i and j, we see that h(φ1(x), φ2(x), ...) = x, so that (φ1, φ2, ...) is the inverse of h and it is measurable. □
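The enumeration of the triangular array can be sketched concretely (not from the text; the closed form below is derived directly from the array, whose kth row holds the integers k(k−1)/2 + 1, ..., k(k+1)/2):

```python
# Sketch: the triangular-array pairing l(i, j) = "jth entry from the top of
# the ith column", and its inverse, verifying that it is a bijection between
# pairs of positive integers and the positive integers.
def l(i, j):
    k = i + j - 1                  # the row containing column i, position j
    return k * (k - 1) // 2 + i

def inverse(t):
    k = 1
    while k * (k + 1) // 2 < t:    # smallest k with t <= k(k+1)/2
        k += 1
    i = t - k * (k - 1) // 2
    return i, k - i + 1

# Each positive integer appears exactly once among the first 29 full rows.
seen = {l(i, j) for i in range(1, 30) for j in range(1, 30) if i + j <= 30}
assert seen == set(range(1, 29 * 30 // 2 + 1))
assert all(inverse(l(i, j)) == (i, j) for i in range(1, 10) for j in range(1, 10))
```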
Lemma B.40. If (X, B) is a Polish space with the Borel σ-field and metric d, then it is a Borel space.17

PROOF. All we need to prove is that there exists a bimeasurable f : X → G, where G is a measurable subset of ℝ^∞. We then use Lemma B.36 and Proposition B.35. Let {x_n}_{n=1}^∞ be a countable dense subset of X, and let d be the metric on X. Define the function f : X → ℝ^∞ by
15It is easy to check the following. For each integer t, let k = inf{n : t ≤ n(n + 1)/2}. Then r(t) = 1 + k(k + 1)/2 − t and s(t) = k + 1 − r(t) have the property that ℓ(r(t), s(t)) = t, r(ℓ(i, j)) = i, and s(ℓ(i, j)) = j.

16This number corresponds to having 1s in the first column of the triangular array but nowhere else. Clearly, 0 < c < 1, but it is impossible to have 1s in the entire first column, since this would require x_1 = 1. Even if x_1 = 1 had been allowed, its base 2 expansion would have ended in infinitely many 0s rather than infinitely many 1s.

17This proof is adapted from p. 219 of Billingsley (1968) and Theorem 15.8 of Royden (1968).
all n. Since {x_n}_{n=1}^∞ is dense, there exists a subsequence {x_{n_j}}_{j=1}^∞ such that lim_{j→∞} x_{n_j} = x. It follows that 0 = lim_{j→∞} d(x, x_{n_j}) = lim_{j→∞} d(y, x_{n_j}); hence lim_{j→∞} x_{n_j} = y, and y = x.

Next, we prove that f^{−1} : f(X) → X is continuous. Suppose that a sequence of points {f(y_n)}_{n=1}^∞ converges to f(y). Let lim_{j→∞} x_{n_j} = y. Then lim_{j→∞} d(y, x_{n_j}) = 0. But d(y, x_{n_j}) is the n_j coordinate of f(y), which in turn is the limit (as n → ∞) of the n_j coordinate of f(y_n). For each j, d(y_n, y) ≤ d(y_n, x_{n_j}) + d(y, x_{n_j}). Let ε > 0 and let j be large enough so that d(y, x_{n_j}) < ε/2. Now, let N be large enough so that n ≥ N implies d(y_n, x_{n_j}) < d(y, x_{n_j}) + ε/2. It follows that, if n ≥ N, d(y_n, y) < ε. Hence lim_{n→∞} y_n = y and f^{−1} is continuous, hence measurable.
Finally, we will prove that the image G of f is a measurable subset of ℝ^∞. We will do this by proving that G is the intersection of countably many open subsets of Ḡ.18 Let G_n be the following set:

{x ∈ ℝ^∞ : ∃ O_x, a neighborhood of x with d(a, b) ≤ 1/n for all a, b ∈ f^{−1}(O_x)}.

Since O_x ⊆ G_n for each x ∈ G_n, G_n is open. Also, since f and f^{−1} are continuous, it is easy to see that G ⊆ G_n for all n. Let G′ = Ḡ ∩ ∩_{n=1}^∞ G_n. For each x ∈ G′, let O_{x,n} ⊆ G_n be such that O_{x,1} ⊇ O_{x,2} ⊇ ... and that d(a, b) ≤ 1/n for all a, b ∈ f^{−1}(O_{x,n}). Note that f^{−1}(O_{x,n}) ⊇ f^{−1}(O_{x,n+1}) for all n. If y_n ∈ f^{−1}(O_{x,n}) for every n, then {y_n}_{n=1}^∞ is a Cauchy sequence, since n, m ≥ N implies d(y_n, y_m) ≤ 1/N. Hence, there is a limit y to the sequence. It is easy to see that if there were two such sequences with limits y and y′, then d(y, y′) < ε for all ε > 0, hence y = y′. So we can define a function h : G′ → X by h(x) = y. If x ∈ G, then clearly h(x) = f^{−1}(x). If x′ ∈ O_{x,n}, then d(h(x), h(x′)) ≤ 1/n, so h is continuous. We now prove that G′ ⊆ G, which implies that G = G′ and the proof will be complete. Let x ∈ G′, and let x_n ∈ G be such that x_n → x. (This is possible since G′ ⊆ Ḡ.) Since h is continuous, f^{−1}(x_n) → h(x). If y_n = f^{−1}(x_n) and y = h(x), then y_n → y and f(y_n) → f(y) ∈ G, since f is continuous. But f(y_n) = x_n, so f(y) = x, and the proof is complete. □
Next, we show that products of Borel spaces are Borel spaces.

Lemma B.41. Let (X_n, B_n) be a Borel space for each n. The product spaces ∏_{i=1}^n X_i for all finite n and ∏_{n=1}^∞ X_n with product σ-fields are Borel spaces.

PROOF. We will prove the result for the infinite product. The proofs for finite products are similar. If X_n = ℝ for all n, the result is true by Lemma B.36. For general X_n, let φ_n : X_n → R_n and φ* : ℝ^∞ → R* be bimeasurable, where R_n and R* are measurable subsets of ℝ. Then, it is easy to see that the map φ that applies φ_n to the nth coordinate and then applies φ* is bimeasurable from ∏_{n=1}^∞ X_n onto a measurable subset of ℝ.

18We use the symbol Ḡ to stand for the closure of the set G. The closure of a subset G of a topological space is the smallest closed set containing G. A set is closed if and only if its complement is open.
Lemma B.42.¹⁹ Let C[0,1] be the set of all bounded continuous functions from
[0,1] to ℝ. Let ρ(f, g) = sup_{x∈[0,1]} |f(x) − g(x)|. Then ρ is a metric on C[0,1]
and C[0,1] is a Polish space.
PROOF. That ρ is a metric is easy to see. To see that C[0,1] is separable, let D_k be the
set of functions that take on rational values at the points 0, 1/k, …, (k − 1)/k, 1
and are linear between these values. Let D = ⋃_{k=1}^∞ D_k. The set D is countable.
Every continuous function on a compact set is uniformly continuous, so let f ∈
C[0,1] and ε > 0. Let δ be small enough so that |x − y| < δ implies |f(x) − f(y)| <
ε/4, and let k be larger than 1/δ. There exists g ∈ D_k such that |g(i/k) − f(i/k)| < ε/4
for each i = 0, …, k. For i/k < x < (i + 1)/k, |f(x) − f(i/k)| < ε/4, and
|g(x) − g(i/k)| < ε/2, so |f(x) − g(x)| < ε. To see that C[0,1] is complete, let
{f_n}_{n=1}^∞ be a Cauchy sequence. Then, for all x, {f_n(x)}_{n=1}^∞ is a Cauchy sequence
of real numbers that converges to some number f(x). We need to show that the
convergence of f_n to f is uniform. To the contrary, assume that there exists ε such
that, for each n, there is x_n such that |f_n(x_n) − f(x_n)| > ε. We know that there
exists n such that m > n implies |f_n(x) − f_m(x)| < ε/2 for all x. In particular,
|f_n(x_n) − f_m(x_n)| < ε/2 for all m > n. Since lim_{m→∞} f_m(x_n) = f(x_n), it follows
that there exists m such that |f_m(x_n) − f(x_n)| < ε/2, a contradiction. □
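The density argument in this separability proof is easy to probe numerically. The sketch below (plain Python; `pl_approx`, `sup_dist`, and the test function are our own illustrations, not from the text) builds the piecewise-linear interpolant of f at the nodes i/k, an element of the countable set D_k, and watches the sup distance ρ shrink as k grows:

```python
import math

def pl_approx(f, k):
    """Piecewise-linear interpolant of f at the nodes 0, 1/k, ..., 1
    (an element of the countable set D_k from the proof)."""
    vals = [f(i / k) for i in range(k + 1)]
    def g(x):
        i = min(int(x * k), k - 1)      # index of the node interval containing x
        t = x * k - i                   # position within that interval
        return (1 - t) * vals[i] + t * vals[i + 1]
    return g

def sup_dist(f, g, m=10000):
    """rho(f, g) = sup |f(x) - g(x)|, approximated on a grid of m + 1 points."""
    return max(abs(f(j / m) - g(j / m)) for j in range(m + 1))

f = lambda x: math.sin(2 * math.pi * x) + x * x   # a continuous function on [0, 1]

d10 = sup_dist(f, pl_approx(f, 10))
d100 = sup_dist(f, pl_approx(f, 100))
```

Refining the node grid (increasing k) drives ρ(f, g) toward 0, which is the content of the density claim.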
Because Borel spaces have σ-fields that look just like the Borel σ-field of the
real numbers, their σ-fields are generated by countably many sets. The countable
field that generates the Borel σ-field of ℝ is the collection of all sets that are
unions of finitely many disjoint intervals (including degenerate ones and infinite
ones) with rational endpoints.
Proposition B.43.²⁰ Let (X, B) be a Borel space. Then there exists a countable
field C such that B is the smallest σ-field containing C.
Because a field is a π-system, Theorem A.26 and Proposition B.43 imply the
following.
Corollary B.44. Let (X, B) be a Borel space, and let C be a countable field that
generates B. If μ₁ and μ₂ are σ-finite measures on B that agree on C, then they
agree on B.
f_Y(y) = ∫_X f_{X,Y}(x, y) dν_X(x).

Note that

    ∫ I_{A×B}(x, y) dμ_{X|Y}(x|y) = I_B(y) μ_{X|Y}(A|y).   (B.48)

It follows from (B.48) that η and μ agree on the collection of all product sets
(a π-system that generates B₁ ⊗ B₂). Theorem A.26 implies that they agree on
B₁ ⊗ B₂. By linearity of integrals and the monotone convergence theorem A.52,
if h is nonnegative, then

    ∫ [∫ h(x, y) dμ_{X|Y}(x|y)] dμ_Y(y) = ∫ h(x, y) dη(x, y) = ∫ h(x, y) dμ(x, y)
        = ∫ h(x, y) f(x, y) dν(x, y) = ∫ h(x, y) f_{X|Y}(x|y) f_Y(y) dν(x, y),   (B.50)

where the second equality follows from the fact that η and μ are the same
measure, the third follows from the fact that dμ/dν = f, and the fourth follows
from (B.49). If h is integrable with respect to ν, then (B.50) applies to h⁺,
h⁻, and |h|, and all three results are finite. Also, ∫ |h(x, y)| dν_{X|Y}(x|y) is
measurable and ν_Y({y : ∫ |h(x, y)| dν_{X|Y}(x|y) = ∞}) = 0. So ∫ h⁺(x, y) dν_{X|Y}(x|y)
and ∫ h⁻(x, y) dν_{X|Y}(x|y) are both finite almost surely, and their difference is
∫ h(x, y) dν_{X|Y}(x|y), a measurable function. It now follows that (B.47) holds. □
The measures ν_Y and ν_{X|Y} in Theorem B.46 are not unique. In the proof, we
could easily have defined ν_Y several ways, such as ν_Y(A) = ∫_A g(y) dμ_Y(y) for any
strictly positive function g with finite μ_Y integral. A corresponding adjustment
would have to be made to the definition of ν_{X|Y}:

    μ_{X|Y}(A|y) = ∫_A f_{X|Y}(x|y) dν_{X|Y}(x|y).
²¹The condition that the joint distribution have a density with respect to a
measure ν in Theorem B.52 is always met, since ν can be taken equal to the joint
distribution. The theorem applies even if ν is not the joint distribution, however.
measure the conditional distributions are all dominated by the same measure.
(See Problem 15 on page 663.) In general, however, the conditional distribution
of X given Y = y is dominated by a measure that depends on y. For example, if
Y = g(X), the joint distribution of (X, Y) is not dominated by a product measure
even if the distribution of X is dominated. (See also Problem 7 on page 662.)
Nevertheless, we have the following result.
Corollary B.55.²² Let (S, A, μ) be a probability space, let (Y, B₂) be a measurable
space such that B₂ contains all singletons, and let (X, B₁) be a Borel space with
ν_X a σ-finite measure on (X, B₁). Let X : S → X and g : X → Y be measurable
functions. Let Y = g(X). Suppose that the distribution of X has density f_X with
respect to ν_X. Define ν on (X × Y, B₁ ⊗ B₂) by ν(C) = ν_X({x : (x, g(x)) ∈ C}).
Let μ_{X,Y} be the probability induced on (X × Y, B₁ ⊗ B₂) by (X, Y) from μ. Let the
probability induced on (Y, B₂) by Y from μ be denoted μ_Y. Then μ_{X,Y} ≪ ν with
Radon–Nikodym derivative f_{X,Y}(x, y) = f_X(x) I_{{g(x)}}(y). Also, the conditions of
Theorem B.46 hold, and we can write

    dμ_Y/dν_Y (y) = f_Y(y) = ∫_X I_{{g(x)}}(y) f_X(x) dν_{X|Y}(x|y),

    f_{X|Y}(x|y) = { f_X(x)/f_Y(y)  if y = g(x),
                   { 0              otherwise.
    x₁ = r cos(θ₁),
    x₂ = r sin(θ₁) cos(θ₂),
    ⋮
    x_{n−1} = r sin(θ₁) ⋯ cos(θ_{n−1}),
    x_n = r sin(θ₁) ⋯ sin(θ_{n−1}).

The Jacobian is r^{n−1} j(θ), where j is some function of θ alone. The Jacobian for
the transformation to v and θ is v^{(n/2)−1} j(θ)/2. The integral of j(θ) over all θ
is 2π^{n/2}/Γ(n/2).
²²This corollary is used in the proof of Theorem 2.86 and in Example 3.106.
²³The calculation in this example is used again in Example 4.121.
628 Appendix B. Probability Theory
    f_V(v) = π^{n/2} v^{(n/2)−1} h(v) / Γ(n/2).

The conditional density of X given V = v is then

    f_{X|V}(x|v) = (Γ(n/2) v^{1−(n/2)} / π^{n/2}) I_{{v}}(xᵀx)

with respect to the measure ν_{X|V}(C|v) = ∫_{C'} v^{(n/2)−1} j(θ) dλ_{n−1}(θ)/2, where
C' = {θ : (v^{1/2} cos(θ₁), …, v^{1/2} sin(θ₁) ⋯ sin(θ_{n−1})) ∈ C}, so that

    μ_{X|V}(C|v) = (Γ(n/2) / (2π^{n/2})) ∫_{C'} j(θ) dλ_{n−1}(θ).

It is easy to see that μ_{X|V}(·|v) is the uniform distribution over the sphere of
radius v^{1/2} in n dimensions.
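The conclusion can be illustrated by simulation (our own sketch, not from the text): if X is spherically symmetric, here standard normal in n = 3 dimensions so that the density depends on x only through xᵀx, then given V = XᵀX the direction X/V^{1/2} is uniform on the unit sphere, so each coordinate of the direction has mean 0 and mean square 1/n:

```python
import random, math

random.seed(0)
n, reps = 3, 20000
mean_dir = [0.0] * n     # sample means of the direction coordinates
mean_sq = [0.0] * n      # sample means of their squares
for _ in range(reps):
    x = [random.gauss(0, 1) for _ in range(n)]   # spherically symmetric X
    r = math.sqrt(sum(xi * xi for xi in x))      # r = V^(1/2), V = X^T X
    u = [xi / r for xi in x]                     # point on the unit sphere
    for i in range(n):
        mean_dir[i] += u[i] / reps
        mean_sq[i] += u[i] * u[i] / reps
```

Uniformity on the sphere forces E(u_i) = 0 and E(u_i²) = 1/n, and the squared coordinates of each sample sum to exactly 1.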
every n and every collection of sets A₁ ∈ A_{i₁}, …, A_n ∈ A_{i_n}, we have

    Pr(⋂_{j=1}^n A_j | Y) = ∏_{j=1}^n Pr(A_j | Y), a.s.   (B.58)

If, in addition, Y is constant almost surely, we say {X_i}_{i∈N} are independent.
Under the same conditions as above, if all of the conditional distributions of
the X_i given Y are the same, then we say {X_i}_{i∈N} are conditionally IID given
Y. If, in addition, Y is constant almost surely, we say {X_i}_{i∈N} are IID.
Example B.59. Let F be a joint CDF of n random variables X₁, …, X_n, and
let μ be the corresponding measure on ℝⁿ. Then μ is a product measure if and
only if X₁, …, X_n are independent (see Proposition B.66).
Example B.60 (Continuation of Example B.56; see page 627).²⁴ Transform to
(Y, V), where Y = X/V^{1/2}. Then, the conditional distribution of Y given V is
given by
where D' = {θ : (cos(θ₁), …, sin(θ₁) ⋯ sin(θ_{n−1})) ∈ D}. We note that this
formula does not depend on v; hence Y is independent of V. In addition, it is
easy to see that μ_{Y|V}(·|v) is just the uniform distribution over the sphere of
radius 1 in n dimensions.
The use of conditional independence in predictive inference is based on the
following theorem.
Theorem B.61.²⁵ Let N be an index set, let Y and {X_i}_{i∈N} be a collection
of random quantities, and let A_i be the σ-field generated by X_i. Then {X_i}_{i∈N}
are conditionally independent given Y if and only if for every n and m and
every set of distinct indices i₁, …, i_n, j₁, …, j_m and every collection of sets A₁ ∈
A_{i₁}, …, A_n ∈ A_{i_n}, we have
(B.62)
PROOF. For the "if" part, we will assume (B.62) and prove (B.58) by induction
on n. For n = 1, there is nothing to prove. Assuming (B.58) is true for all n ≤ k,
we now prove it for n = k + 1. Let A_j ∈ A_{i_j} for j = 1, …, k + 1. According to
(B.62) and (B.58) for n = k, we have

    Pr(B ∩ ⋂_{i=1}^{k+1} A_i) = Pr(B ∩ A_{k+1} ∩ ⋂_{i=1}^k A_i)
        = ∫_B Pr(A_{k+1}|Y)(s) ∏_{i=1}^k Pr(A_i|Y)(s) dμ(s)
        = ∫_B ∏_{i=1}^{k+1} Pr(A_i|Y)(s) dμ(s).

The equality of the first and last terms above for all B ∈ A_Y means that
∏_{i=1}^{k+1} Pr(A_i|Y) = Pr(⋂_{i=1}^{k+1} A_i|Y), a.s., which is what we need to complete the
induction.
For the "only if" part, we will assume (B.58) and prove (B.62). For a function
g to be the left-hand side of (B.62), it must be measurable with respect to the
σ-field A_{Y,m} generated by Y, X_{j₁}, …, X_{j_m}, and satisfy
(B.63)
for all C ∈ A_{Y,m}. Clearly, the right-hand side of (B.62) is measurable with respect
to A_{Y,m}. If C = C_Y ∩ C_X, where C_Y ∈ A_Y and C_X is in the σ-field generated by
X_{j₁}, …, X_{j_m}, then

    ∫_C g(s) dμ(s)

Combining these gives that |∫_C g(s) dμ(s) − Pr(C ∩ ⋂_{i=1}^n A_i)| < ε. Since ε is arbitrary, (B.63) holds for all C ∈ A_{Y,m}. □
A particular case of interest involves three random quantities. Theorem B.64
says that when there are only two Xs in Theorem B.61, we can check conditional
independence by checking only one of the equations of the form (B.62).
Theorem B.64.²⁶ Let X, Y, and Z be three random quantities, and let A_X, A_Y,
and A_Z be the σ-fields generated by each of them. Suppose that for all A ∈ A_X,
Pr(A|Y, Z) = Pr(A|Y). Then X and Z are conditionally independent given Y.
PROOF. We need to check that for every A ∈ A_X and B ∈ A_Z, Pr(A ∩ B|Y) =
Pr(A|Y) Pr(B|Y). Equivalently, for all such A and B, and all C ∈ A_Y, we must
show
(B.69)
Since A ∈ C, it follows that A ∈ C_{n+1}; hence A and C_k are independent for
every k. It follows that μ(C_k ∩ A) = μ(A)μ(C_k). It follows from (B.69) that
μ(A) = μ(A)², and hence either μ(A) = 0 or μ(A) = 1. □
The σ-field C in Theorem B.68 is often called the tail σ-field of the sequence
{X_n}_{n=1}^∞. An interesting feature of the tail σ-field is that limits are measurable
with respect to it.²⁸ (See Problem 21 on page 663.)
where the last equality follows since T = E(Z|B) and C ∈ B. Since E(T|C) is C
measurable, equating the first and last entries of the above string of equations
means that E(T|C) satisfies the condition required for it to equal E(Z|C). □
When B and C are the σ-fields generated by two random quantities X and
Y, respectively, C ⊆ B means Y is a function of X. So, Theorem B.70 can be
rewritten in this case.
Corollary B.71. Let X : S → U₁, Y : S → U₂, and Z : S → ℝ be measurable
functions such that E(|Z|) < ∞. Suppose that Y is a function of X. Then,
E(Z|Y) = E{E(Z|X)|Y}, a.s. [μ].
The most popular special case of this corollary occurs when Y is constant.
Corollary B.72.²⁹ Let (S, A, μ) be a probability space. Let X : S → U₁ and
Z : S → ℝ be measurable functions such that E(|Z|) < ∞. Then, E(Z) =
E{E(Z|X)}.
This is the special case of Theorem B.70 when C is the trivial σ-field.
The following theorem implies that if a conditional mean given X depends on
X only through h(X), then it is also the conditional mean given h(X).
Theorem B.73.³⁰ Let (S, A, μ) be a probability space and let B and C be sub-
σ-fields of A with C ⊆ B. Let Z : S → ℝ be measurable such that E(|Z|) <
²⁸The tail σ-field will play a role in the proofs of Corollary 1.63 and Theorem 1.49.
²⁹This corollary is used in the proof of Theorem B.75.
³⁰This theorem is used in the proofs of Theorems 1.49 and 2.6.
To prove this we first note that f(s) = h(X₁(s), X₂(s)) is measurable with respect
to the σ-field generated by (X₁, X₂), A_{X₁,X₂}. All that remains is to show that
it satisfies the integral condition required to be E(Z|X₁, X₂). That is, for all
C ∈ A_{X₁,X₂},

    E(Z I_C) = ∫_C f(s) dμ(s).   (B.76)

Let μ₂ be the measure on (X₂, B₂) induced by X₂ from μ. First, suppose that
C = A ∩ B, where A ∈ A_{X₁} and B ∈ A_{X₂}. The last hypothesis of the theorem
says that for all A ∈ A_{X₁}, E(Z I_A | X₂ = y) = ∫_A h(X₁(s), y) dμ^{(2)}(s|y). If μ_{1|2}(·|y)
is the probability on (X₁, B₁) induced by X₁ from μ^{(2)}(·|y), then μ_{1|2}(·|y) is also
the conditional distribution of X₁ given X₂ = y as in Theorem B.46. Suppose
B.4 Limit Theorems
There are several types of convergence that will be of interest to us. They involve
sequences of random quantities or sequences of distributions.

    lim_{n→∞} ∫_B p_n(x) dν(x) = ∫_B p(x) dν(x), for all B ∈ B.
PROOF. Let δ_n(x) = p_n(x) − p(x), and let δ_n⁺ and δ_n⁻ be its positive and negative
parts. Clearly, both lim_{n→∞} δ_n⁺ = 0 and lim_{n→∞} δ_n⁻ = 0, a.e. [ν]. Since
0 ≤ δ_n⁻ ≤ p is true, it follows from the dominated convergence theorem A.57
that lim_{n→∞} ∫_B δ_n⁻(x) dν(x) = 0 for all B. Since both p_n and p are densities,
∫_X δ_n(x) dν(x) = 0 for all n. It follows that lim_{n→∞} ∫_X δ_n⁺(x) dν(x) = 0. Since
I_B(x) δ_n⁺(x) ≤ δ_n⁺(x) for all x, it follows from Proposition A.58 that

    lim_{n→∞} ∫_B δ_n⁺(x) dν(x) = 0.

So, lim_{n→∞} ∫_B [p_n(x) − p(x)] dν(x) = 0 for all B. □
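The conclusion of this proof (essentially Scheffé's theorem) can be seen in a concrete family of densities. In the sketch below (our own example, not from the text), p_n(x) = (1 + 1/n) x^{1/n} on [0, 1] converges pointwise to the uniform density p ≡ 1, and the measures of the sets B = [0, b] converge accordingly; the integrals are available in closed form, ∫_0^b p_n(x) dx = b^{1 + 1/n}:

```python
# p_n(x) = (1 + 1/n) x**(1/n) on [0, 1] are densities converging pointwise
# (a.e.) to the uniform density p(x) = 1.  The theorem then gives
# lim_n  integral_B p_n dnu = integral_B p dnu for every measurable B.
# For B = [0, b] both sides are known exactly:

def P_n(b, n):
    # measure of [0, b] under the density p_n
    return b ** (1.0 + 1.0 / n)

def P(b):
    # measure of [0, b] under the uniform density
    return b

b = 0.4
errs = [abs(P_n(b, n) - P(b)) for n in (1, 10, 100, 1000)]
```

The errors shrink monotonically toward 0, matching the displayed limit.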
Since defining convergence requires a topology, the following definitions require
that the random quantities lie in various types of topological spaces.
Definition B.80. Let {X_n}_{n=1}^∞ be a sequence of random quantities and let X
be another random quantity, all taking values in the same topological space X.
Suppose that lim_{n→∞} E(f(X_n)) = E(f(X)) for every bounded continuous function
f : X → ℝ; then we say that X_n converges in distribution to X, which is
written X_n →_D X.
Convergence in distribution is sometimes defined in terms of probability measures.
The reason is that if X_n →_D X, the actual values of X_n and of X do not
play any role in the convergence. All that matters is the distributions of X_n and
of X.
Definition B.81. Let {P_n}_{n=1}^∞ be a sequence of probability measures on a topological
space (X, B), where B contains all open sets. Let P be another probability
on (X, B). We say that P_n converges weakly³⁵ to P (denoted P_n →_w P) if, for each
bounded continuous function g : X → ℝ, lim_{n→∞} ∫ g(x) dP_n(x) = ∫ g(x) dP(x).
³⁵This is not exactly the same as the concept of weak convergence in normed
linear spaces [see, for example, Dunford and Schwartz (1957), p. 419]. The collection
of all probability measures on a space (X, B) can be considered a subset
of a normed linear space C consisting of all finite signed measures ν (see Definition
A.18) with the norm being sup_{B∈B} |ν(B)|. Weak convergence of a sequence
{ν_n}_{n=1}^∞ in this space would require the convergence of L(ν_n) for every bounded
linear functional L on C. Every bounded measurable function g on (X, B) determines
a bounded linear functional L_g on C by L_g(ν) = ∫ g(x) dν(x), where the
integral with respect to a signed measure can be defined as in Problem 27 on
page 605. Hence, weak convergence of a sequence of probability measures would
require convergence of the means of all bounded measurable functions. In particular,
lim_{n→∞} P_n(B) = P(B) for all measurable sets B, not just those for which
P assigns 0 probability to the boundary (see the portmanteau theorem B.83 on
page 636). Alternatively, we can consider the set of bounded continuous functions
f : X → ℝ as a normed linear space N with ‖f‖ = sup_x |f(x)|. Then the set of
finite signed measures C is a set of bounded linear functionals on N using the
definition ν(f) = ∫ f(x) dν(x). Weak* convergence of a sequence {ν_n}_{n=1}^∞ in C to
ν is defined as the convergence of ν_n(f) to ν(f) for all f ∈ N. This is precisely
convergence in distribution. Hence, it would make more sense to call convergence
in distribution weak* convergence rather than weak convergence. Since the tradition
in probability theory is to call it weak convergence, we will continue to do
so.
It is easy to see that these two types of convergence are the same.
Proposition B.82. Let P_n be the distribution of X_n, and let P be the distribution
of X. Then, X_n →_D X if and only if P_n →_w P.
Since we will usually be dealing with X spaces that are metric spaces, there are
some equivalent ways to define convergence in distribution or weak convergence.
The proofs of Theorems B.83 and B.88 are adapted from Billingsley (1968).
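Definition B.80 can be exercised directly before turning to the portmanteau theorem. In this sketch (our own; the quadrature helper `E` is an assumption of the illustration), X is uniform on [0, 1], X_n = X + 1/n converges in distribution to X, and E f(X_n) → E f(X) for a bounded continuous test function:

```python
import math

def E(f, shift=0.0, m=100000):
    # E f(X + shift) for X ~ uniform[0, 1], computed by midpoint quadrature
    return sum(f((j + 0.5) / m + shift) for j in range(m)) / m

f = math.atan                                  # bounded and continuous on R
gap = [abs(E(f, 1.0 / n) - E(f)) for n in (1, 10, 100)]
```

The gaps |E f(X_n) − E f(X)| decrease toward 0 as n grows, which is exactly the defining property of X_n →_D X for this f.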
Theorem B.83 (Portmanteau theorem).³⁶ The following are all equivalent
in a metric space:
1. P_n →_w P;
2. limsup_{n→∞} P_n(B) ≤ P(B) for each closed B;
3. liminf_{n→∞} P_n(A) ≥ P(A), for each open A;
4. lim_{n→∞} P_n(C) = P(C), for each C with P(∂C) = 0.³⁷
PROOF. Let d be the metric in the metric space. First, assume (1) and let B be
a closed set. Let δ > 0 be given. For each ε > 0, define C_ε = {x : d(x, B) ≤ ε},
where d(x, B) = inf_{y∈B} d(x, y). Since |d(x, B) − d(y, B)| ≤ d(x, y), we see that
d(x, B) is continuous in x. Each C_ε is closed and ⋂_{ε>0} C_ε = B. Let ε be small
enough so that P(C_ε) ≤ P(B) + δ. Let f : ℝ → ℝ be

    f(t) = { 1      if t ≤ 0,
           { 1 − t  if 0 < t < 1,
           { 0      if t ≥ 1,

and define g_ε(x) = f(d(x, B)/ε). Then g_ε is bounded and continuous. So,

    lim_{n→∞} ∫ g_ε(x) dP_n(x) = ∫ g_ε(x) dP(x).

It is easy to see that 0 ≤ g_ε(x) ≤ 1, g_ε(x) = 1 for all x ∈ B, and g_ε(x) = 0 for all
x ∉ C_ε. Hence, for every δ > 0,
³⁶This theorem is used in the proofs of Theorem B.88 and Lemma 7.19.
³⁷We use the symbol ∂ in front of the name of a subset of a topological space
to refer to the boundary of the set. The boundary of a set C in a topological space
is the intersection of the closure of the set with the closure of the complement.
exists a sequence {ε_k}_{k=1}^∞ converging to 0 such that P(d(X, B) = ε_k) = 0 for all
k. It follows that lim_{n→∞} P_n(C_{ε_k}) = P(C_{ε_k}) for all k. Since P_n(B) ≤ P_n(C_{ε_k})
for every n and k, we have, for every k,
Since P(B) = lim_{k→∞} P(C_{ε_k}), we have (2). So, (2), (3), and (4) are equivalent
and (1) implies (2).
All that remains is to prove that (2) implies (1). Assume (2), and let f be
a bounded continuous function. Let m < f(x) < M for all x. For each k, let
F_{i,k} = {x : f(x) ≤ m + (M − m)i/k} for i = 1, …, k. Let F_{0,k} = ∅. Each F_{i,k} is
closed, since f is continuous. Let G_{i,k} = F_{i,k} \ F_{i−1,k} for i = 1, …, k. It is easy
to see that for every probability Q,

    m + (M − m) Σ_{i=1}^k [(i−1)/k] Q(G_{i,k}) < ∫ f(x) dQ(x) ≤ m + (M − m) Σ_{i=1}^k (i/k) Q(G_{i,k}),

which can be rewritten as

    M − [(M − m)/k] Σ_{i=1}^k Q(F_{i,k}) < ∫ f(x) dQ(x) ≤ M + (M − m)/k − [(M − m)/k] Σ_{i=1}^k Q(F_{i,k}).   (B.84)

For each i,

    limsup_{n→∞} P_n(F_{i,k}) ≤ P(F_{i,k}).   (B.85)

It follows that

    ∫ f(x) dP(x) ≤ M + (M − m)/k − [(M − m)/k] Σ_{i=1}^k P(F_{i,k})
        ≤ M + (M − m)/k − [(M − m)/k] Σ_{i=1}^k limsup_{n→∞} P_n(F_{i,k})
        ≤ (M − m)/k + liminf_{n→∞} ∫ f(x) dP_n(x),

where the first inequality follows from the second inequality in (B.84) with Q = P,
the second inequality follows from (B.85), and the third inequality follows from
the first inequality in (B.84) with Q = P_n. Letting k be arbitrarily large, we get
Since this set must have probability 1, then so too must ⋃_{N=1}^∞ (⋂_{n=N}^∞ A_{n,ε}) for
all ε. By Theorem A.19, it follows that for every ε, lim_{N→∞} Pr(⋂_{n=N}^∞ A_{n,ε}) = 1.
Hence, for each ε > 0, lim_{n→∞} Pr(A_{n,ε}^c) = 0, which is precisely what it means to
say that X_n →_P X.
Next assume that X_n →_P X. Let g : X → ℝ be bounded and continuous with
|g(x)| ≤ K for all x. Let ε > 0, and let A be a compact set with Pr(X ∈ A) > 1 −
ε/[6K]. A continuous function (like g) on a compact set is uniformly continuous.
So let δ > 0 be such that x ∈ A and d(x, y) < δ implies |g(x) − g(y)| < ε/3. Since
X_n →_P X, there exists N such that n ≥ N implies Pr(d(X_n, X) < δ) > 1 − ε/[6K].
Let B = {X ∈ A, d(X_n, X) < δ}. It follows that |g(X)I_B − g(X_n)I_B| < ε/3 and,
for all n ≥ N, Pr(B) > 1 − ε/[3K]. Also, note that n ≥ N implies

    |Eg(X) − E[g(X)I_B]| < ε/3,   |Eg(X_n) − E[g(X_n)I_B]| < ε/3.
    φ_X(t) = ∫ exp(itx) (1/√(2π)) exp(−x²/2) dx = (1/√(2π)) ∫ exp(−([x − it]² + t²)/2) dx
           = exp(−t²/2).

Similarly, for other normal distributions, N(μ, σ²), the characteristic functions
are φ_X(t) = exp(−σ²t²/2 + itμ).
By Theorem B.12, if X has CDF F, then φ_X = φ_F. It is easy to see that the
characteristic function exists for every random vector and it has complex absolute
value at most 1 for all t. Other facts that follow directly from the definition are
the following. If Y = aX + b, then φ_Y(t) = φ_X(at) exp(itb). If X and Y are
independent, φ_{X+Y} = φ_X φ_Y.
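Both the normal characteristic function and the affine rule φ_{aX+b}(t) = φ_X(at)exp(itb) can be checked on the empirical characteristic function of a sample (a Monte Carlo sketch of our own; `ecf` is not a name from the text):

```python
import random, cmath

random.seed(1)
xs = [random.gauss(0, 1) for _ in range(40000)]   # sample from N(0, 1)

def ecf(data, t):
    # empirical characteristic function: (1/n) sum_j exp(i t x_j)
    return sum(cmath.exp(1j * t * x) for x in data) / len(data)

t = 0.7
# For X ~ N(0,1), phi_X(t) = exp(-t^2/2); the ECF should be close to it.
err_normal = abs(ecf(xs, t) - cmath.exp(-t * t / 2))

# For Y = aX + b, phi_Y(t) = phi_X(at) exp(itb); on a fixed sample this
# is an algebraic identity, so the two sides agree to rounding error.
a, b = 2.0, 1.5
ys = [a * x + b for x in xs]
err_affine = abs(ecf(ys, t) - ecf(xs, a * t) * cmath.exp(1j * t * b))
```

The affine identity holds exactly sample-by-sample, while the normal comparison is only as accurate as the Monte Carlo error, roughly 1/√n.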
The reason that characteristic functions are so useful for proving convergence
in distribution is two-fold. First, for each characteristic function φ, there is
only one CDF F such that φ_F = φ. (See the uniqueness theorem B.106.) Second,
characteristic functions are "continuous" as a function of the distribution
in the sense of convergence in distribution. That is, X_n →_D X if and only if
lim_{n→∞} φ_{X_n}(t) = φ_X(t) for all t.⁴⁰ (See the continuity theorem B.93.)
    f_{a,b,δ}(x) = { 1              if a < x < b,
                   { 1 − (a − x)/δ  if a − δ < x ≤ a,
                   { 1 − (x − b)/δ  if b ≤ x < b + δ,   (B.94)
                   { 0              otherwise.
Note that this function has equal values at a − δ and b + δ. Consider the interval
[a − δ, b + δ] as a circle identifying the two endpoints. Now, use the Stone–
Weierstrass theorem C.3 to approximate f_{a,b,δ} uniformly to within ε on the circle
by f*_{a,b,δ,ε}(x) = Σ_{j=−l}^{l} b_j exp(2πijx/c), where c = b − a + 2δ. If Y is a random
variable, then E f*_{a,b,δ,ε}(Y) is a linear combination of values of the characteristic
function of Y. So, we have lim_{n→∞} E f*_{a,b,δ,ε}(X_n) = E f*_{a,b,δ,ε}(X). Let q > 0, and
let a and b be continuity points of F_{X^t} such that F_{X^t}(b) − F_{X^t}(a) = v > q. Let
w = v − q. Let δ > 0 be arbitrary, and define a' = a − δ and b' = b + δ. Let N be
large enough so that n ≥ N implies |E f*_{a,b,δ,w/3}(X_n^t) − E f*_{a,b,δ,w/3}(X^t)| < w/3. If
n ≥ N, then

    F_{X_n^t}(b') − F_{X_n^t}(a') ≥ E f_{a,b,δ}(X_n^t) > E f*_{a,b,δ,w/3}(X_n^t) − w/3
        ≥ E f*_{a,b,δ,w/3}(X^t) − 2w/3 ≥ E f_{a,b,δ}(X^t) − w
        ≥ F_{X^t}(b) − F_{X^t}(a) − w = q.
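The first and last inequalities in this chain rest on the bracketing I_{(a,b)} ≤ f_{a,b,δ} ≤ I_{(a−δ, b+δ)}, which is easy to verify for the trapezoidal function of (B.94) (a small sketch of our own):

```python
def f_abd(x, a, b, d):
    # the trapezoidal function of (B.94): 1 on (a, b), linear on the flanks
    # (a - d, a] and [b, b + d), and 0 elsewhere
    if a < x < b:
        return 1.0
    if a - d < x <= a:
        return 1.0 - (a - x) / d
    if b <= x < b + d:
        return 1.0 - (x - b) / d
    return 0.0

a, b, d = 0.0, 1.0, 0.25
xs = [i / 100.0 - 0.5 for i in range(201)]    # grid covering [-0.5, 1.5]
# indicator of (a, b) <= f <= indicator of (a - d, b + d) at every grid point
lower = all(f_abd(x, a, b, d) >= (1.0 if a < x < b else 0.0) for x in xs)
upper = all(f_abd(x, a, b, d) <= (1.0 if a - d < x < b + d else 0.0) for x in xs)
```

The equal values at a − δ and b + δ (both 0) are what allow the interval to be wrapped into a circle for the Stone–Weierstrass step.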
Now, let g be a bounded continuous function, and suppose that |g(x)| < K for
all x. Let ε > 0. For each coordinate x^t of x, let a_t and b_t be continuity points
of F_{X^t} such that F_{X^t}(b_t) − F_{X^t}(a_t) > 1 − ε/(7[K + ε/7]k). Let δ > 0 be arbitrary,
and define a'_t = a_t − δ, b'_t = b_t + δ, and g*(x) = g(x) ∏_{t=1}^k f_{a_t,b_t,δ}(x_t). Use the
Stone–Weierstrass theorem C.3 to uniformly approximate g* to within ε/7 on the
rectangle {x : a_t − δ ≤ x_t ≤ b_t + δ} by a trigonometric polynomial

    g'(x) = Σ_{m₁} ⋯ Σ_{m_k}

Let N₁ be large enough so that n ≥ N₁ implies F_{X_n^j}(b_j) − F_{X_n^j}(a_j) ≥ 1 − ε/(7[K +
ε/7]k) for all j. Let N₂ be large enough so that n ≥ N₂ implies |E g'(X_n) −
E g'(X)| < ε/7. Let R be the rectangle R = {x : a_t < x_t ≤ b_t}. Since g' is periodic
in every coordinate, it is bounded by K + ε/7 on all of ℝ^k. If n ≥ max{N₁, N₂},
then |E g(X_n) − E g(X)| is no greater than
We will prove two more limit theorems that make use of the continuity theorem
B.93. Suppose that X has finite mean. Since |exp(itx) − 1| ≤ min{|tx|, 2}
for all t and x,⁴² and

    lim_{t→0} (exp(itx) − 1)/t = ix,   (B.98)

The characteristic function of Y_n is φ_{Y_n}(t) = φ_{X'_1}(t/√n)ⁿ. We will prove that
this converges to exp(−t²σ²/2) for each t. Since log φ_{Y_n}(t) = n log φ_{X'_1}(t/√n),
It follows that lim_{n→∞} φ_{Y_n}(t) = exp(−t²σ²/2), and the continuity theorem B.93
finishes the proof. □
There is also a multivariate version of the central limit theorem.
Theorem B.99 (Multivariate central limit theorem).⁴³ Let {X_n}_{n=1}^∞ be a
sequence of IID random vectors in ℝ^p with mean μ and covariance matrix Σ.
Then √n(X̄_n − μ) →_D N_p(0, Σ), a multivariate normal distribution.
PROOF. Let Y_n = √n(X̄_n − μ) and let Y ∼ N_p(0, Σ). Then Y_n →_D Y if and
only if the characteristic function of Y_n converges to that of Y. That is, if and
only if, for each λ ∈ ℝ^p, E exp{iλᵀY_n} → E exp{iλᵀY}. This occurs if and
only if, for each λ, λᵀY_n →_D λᵀY. The distribution of λᵀY is N(0, λᵀΣλ), and
λᵀY_n is √n times the average of the λᵀ(X_n − μ). By the univariate central limit
theorem B.97, λᵀY_n →_D λᵀY. □
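The λᵀ reduction in this proof (often called the Cramér–Wold device) suggests a direct numerical check: for a fixed λ, λᵀY_n should be approximately N(0, λᵀΣλ). A Monte Carlo sketch of our own, using IID pairs of independent uniforms so that Σ = I/12:

```python
import random, math

random.seed(2)
lam = (1.0, 2.0)
mu = (0.5, 0.5)
var = (1.0 + 4.0) / 12.0        # lambda^T Sigma lambda with Sigma = I/12

def lam_Yn(n):
    # lambda^T of sqrt(n)(Xbar_n - mu) for n IID pairs of independent uniforms
    s1 = s2 = 0.0
    for _ in range(n):
        s1 += random.random()
        s2 += random.random()
    return math.sqrt(n) * (lam[0] * (s1 / n - mu[0]) + lam[1] * (s2 / n - mu[1]))

reps, n = 20000, 50
zs = [lam_Yn(n) for _ in range(reps)]
sd = math.sqrt(var)
# For N(0, var), P(Z <= sd) = Phi(1), approximately 0.8413
frac = sum(1 for z in zs if z <= sd) / reps
mean_z = sum(zs) / reps
```

The empirical fraction below one standard deviation matches the normal value Φ(1), and the empirical mean is near 0, as the theorem predicts.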
There are inversion formulas for characteristic functions, which allow us to
obtain or approximate the original distributions from the characteristic functions.
Example B.100 (Continuation of Example B.92; see page 640). Let X have
distribution N(0, σ²). Then ∫ |φ_X(t)| dt < ∞. In fact,

    (1/2π) ∫ exp(−ixt) φ_X(t) dt = (1/2π) ∫ exp(−(σ²/2)[t + ix/σ²]² − x²/(2σ²)) dt
PROOF. Clearly, the function in (B.102) is bounded since φ_X is integrable. Let Y_σ
have N_k(0, σ²I_k) distribution. The characteristic function of X + Y_σ is φ_X φ_{Y_σ}.
where the second equality follows from the fact that (B.102) applies to normal
distributions. Now suppose that we let σ go to zero. Since φ_X is integrable and
φ_{Y_σ}(t) goes to 1 for all t, it follows that the left-hand side of (B.103) converges to
the right-hand side of (B.102). It also follows that f_{X+Y_σ} is bounded uniformly
in σ and x. Let B be a hypercube such that the probability is 0 that X is in the
boundary of B. Then
where the first equality follows from the boundedness of f_{X+Y_σ}, and the second
is proven as follows. The difference between ∫_B f_{X+Y_σ}(x) dx and ∫_B f_X(x) dx is
the sum over the 2k corners of the hypercube B of terms like

    Σ_{i=1}^k Pr(b_i − Y_{σ,i} < X_i ≤ b_i, Y_{σ,i} > 0) + Pr(b_i < X_i ≤ b_i − Y_{σ,i}, Y_{σ,i} < 0),
Lemma B.105.⁴⁶ Let Y be a random variable such that φ_Y is integrable. Let X
be an arbitrary random variable independent of Y. For all finite a < b and c,

    Pr(a < X + cY ≤ b) = (1/2π) ∫ [(exp(−ibt) − exp(−iat)) / (−it)] φ_X(t) φ_Y(ct) dt.

PROOF. Since φ_Y is integrable and φ_{X+cY}(t) = φ_X(t) φ_Y(ct), it follows that
X + cY has integrable characteristic function. Lemma B.101 says that (B.102)
applies to X + cY, hence

    f_{X+cY}(x) = (1/2π) ∫ φ_X(t) φ_Y(ct) exp(−itx) dt,

    Pr(a < X + cY ≤ b) = (1/2π) ∫_a^b ∫ φ_X(t) φ_Y(ct) exp(−itx) dt dx
        = (1/2π) ∫ φ_Y(ct) φ_X(t) ∫_a^b exp(−itx) dx dt
    {x : x_{i₁} ≤ c₁, …, x_{i_n} ≤ c_n, for some n and some integers i₁, …, i_n}.

It is clear that X⁻¹(B) ∈ A since it is the intersection of finitely many sets in
A. By Theorem A.34, it follows that X⁻¹(B^∞) ⊆ A, so X is measurable with
respect to this σ-field.
B.5.2 Martingales⁺
A particular type of stochastic process that is sometimes of interest is a martingale.
[For more discussion of martingales, see Doob (1953), Chapter VII.]
Definition B.108. Let (S, A, μ) be a probability space. Let N be a set of consecutive
integers. For each n ∈ N, let F_n be a sub-σ-field of A such that F_n ⊆ F_{n+1}
for all n such that n and n + 1 are in N. Let {X_n}_{n∈N} be a sequence of random
variables such that X_n is measurable with respect to F_n for all n. The
sequence of pairs {(X_n, F_n)}_{n∈N} is called a martingale if, for all n such that n
and n + 1 are in N, E(X_{n+1}|F_n) = X_n. It is called a submartingale if, for every
n, E(X_{n+1}|F_n) ≥ X_n.
Note that a martingale is also a submartingale.
Example B.109. A simple example of a martingale is the following. Let N =
{1, 2, …}, and let {Y_n}_{n=1}^∞ be independent random variables with mean 0. Let
X_n = Σ_{i=1}^n Y_i. Let F_n be the σ-field generated by Y₁, …, Y_n. Then,
The following result is proven using the same argument as in Example B.111.
Proposition B.113.⁴⁹ If {(X_n, F_n)}_{n∈N} is a martingale, then E|X_n| is nondecreasing
in n.
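For the ±1 coin-flip special case of Example B.109, both the martingale property and Proposition B.113 can be verified exactly by enumerating all equally likely paths (our own sketch, not from the text):

```python
from itertools import product

# X_n = Y_1 + ... + Y_n with IID Y_i = +-1 (mean 0).
# Martingale property: for every prefix (y_1, ..., y_n), the average of
# X_{n+1} over the two equally likely continuations y_{n+1} = -1, +1
# equals X_n.
ok = all(
    ((sum(p) - 1) + (sum(p) + 1)) / 2 == sum(p)
    for p in product((-1, 1), repeat=9)
)

def E_abs(n):
    # E|X_n| computed exactly by enumerating all 2^n paths
    return sum(abs(sum(p)) for p in product((-1, 1), repeat=n)) / 2 ** n

m2, m4, m6 = E_abs(2), E_abs(4), E_abs(6)   # E|X_n| is nondecreasing in n
```

Because the path space is finite, these are exact computations rather than simulations; the conditional-mean check is an algebraic identity, and E|X₂| = 1, E|X₄| = 1.5, E|X₆| = 1.875.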
The reader should note that if {(X_n, F_n)}_{n∈N} is a submartingale and if M ⊆
N is a string of consecutive integers, then {(X_n, F_n)}_{n∈M} is also a submartingale.
Similarly, if k is an integer (positive or negative) and M = {n : n + k ∈ N}, then
{(X'_n, F'_n)}_{n∈M} is a submartingale, where X'_n = X_{n+k} and F'_n = F_{n+k}. This
latter is just a shifting of the index set.
There are important convergence theorems that apply to many martingales
and submartingales. They say that if the set N is infinite, then limit random
variables exist. A lemma is needed to prove these theorems.⁵⁰ It puts a bound on
how often a submartingale can cross an interval between two numbers. It is used
to show that such crossings cannot occur infinitely often with high probability.
(Infinitely many crossings of a nondegenerate interval would imply divergence of
the submartingale.)
Lemma B.114 (Upcrossing lemma).⁵¹ Let N = {1, …, N}, and suppose
that {(X_n, F_n)}_{n=1}^N is a submartingale. Let r < q, and define V to be the number
of times that the sequence X₁, …, X_N crosses from below r to above q. Then

    E(V) ≤ (E(|X_N|) + |r|) / (q − r).   (B.115)

Now V(s) is one-half of the largest even m such that T_m(s) ≤ N. Define, for
i = 1, …, N,

    Σ_{i=1}^N ∫_{{s : R_i(s) = 1}} (Y_i(s) − Y_{i−1}(s)) dμ(s)
        = Σ_{i=1}^N ∫_{{s : R_i(s) = 1}} (E(Y_i|F_{i−1})(s) − Y_{i−1}(s)) dμ(s)
        ≤ Σ_{i=1}^N ∫ (E(Y_i|F_{i−1})(s) − Y_{i−1}(s)) dμ(s)
        = Σ_{i=1}^N (E(Y_i) − E(Y_{i−1})) = E(Y_N),

where the second equality follows from (B.116) and the inequality follows from
the fact that {(Y_n, F_n)}_{n=1}^N is a submartingale. It follows that (q − r)E(V) ≤ E(Y_N).
Since E(Y_N) ≤ |r| + E(|X_N|), it follows that (B.115) holds. □
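The upcrossing bound can be probed by simulation for a symmetric ±1 random walk, which is a martingale and hence a submartingale (our own sketch; the counting function is an illustration, not the text's construction via the times T_m):

```python
import random

random.seed(3)

def upcrossings(path, r, q):
    # number of times the path passes from below r to above q
    count, below = 0, False
    for x in path:
        if x < r:
            below = True
        elif x > q and below:
            count += 1
            below = False
    return count

N, reps, r, q = 30, 5000, -1.0, 1.0
tot_V = tot_abs = 0.0
for _ in range(reps):
    x, path = 0.0, []
    for _ in range(N):
        x += random.choice((-1.0, 1.0))
        path.append(x)
    tot_V += upcrossings(path, r, q)
    tot_abs += abs(path[-1])
EV = tot_V / reps                                  # estimate of E(V)
bound = (abs(r) + tot_abs / reps) / (q - r)        # estimate of the bound
```

In this run the estimated E(V) sits well inside the bound, consistent with the lemma; making the interval (r, q) wider drives E(V) down, which is the mechanism behind the convergence theorems that follow.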
The proof of the following convergence theorem is adapted from Chow, Robbins,
and Siegmund (1971).

    B = ⋃_{r<q; r,q rational} {s : X*(s) > q > r ≥ X_*(s)}.

Now, X*(s) > q > r ≥ X_*(s) if and only if the values of X_n(s) cross from being
below r to being above q infinitely often. For fixed r and q, we now prove that
this has probability 0; hence μ(B) = 0. Let V_n equal the number of times that
X₁, …, X_n cross from below r to above q. According to Lemma B.114,
The number of times the values of {X_n(s)}_{n=1}^∞ cross from below r to above q
equals lim_{n→∞} V_n(s). By the monotone convergence theorem A.52,
By Lemma A.72, we have that for every ε > 0 there exists δ such that μ(A) < δ
implies η(A) < ε. By the Markov inequality B.15,

    μ(A_{c,n}) ≤ (1/c) E(X_n) = (1/c) E(X),

for all n. Let C = 2E(X)/δ. Then c ≥ C implies μ(A_{c,n}) < δ for all n, so
η(A_{c,n}) < ε for all n. □
PROOF OF THEOREM B.118. By Lemma B.119, {X_n}_{n=1}^∞ is a uniformly integrable
sequence. Let Y be the limit of the martingale guaranteed by Theorem B.117.
Since Y is a limit of functions of the X_n, it is measurable with respect to F_∞. It
follows from Theorem A.60 that for every event A, lim_{n→∞} E(X_n I_A) = E(Y I_A).
Next, note that, for every A ∈ F_n,

    ∫_A Y(s) dμ(s) = lim_{n→∞} ∫_A E(X|F_n)(s) dμ(s) = ∫_A X(s) dμ(s),

where the last equality follows from the definition of conditional expectation.
Since this is true for every n and every A ∈ F_n, it is true for all A in the field
F = ⋃_{n=1}^∞ F_n. Since |X| is integrable, we can apply Theorem A.26 to conclude
that the equality holds for all A ∈ F_∞, the smallest σ-field containing F. The
equality E(X I_A) = E(Y I_A) for all A ∈ F_∞ together with the fact that Y is F_∞
measurable is precisely what it means to say that Y = E(X|F_∞) = X_∞. □
For negatively indexed martingales, there is also a convergence theorem. Some
authors refer to negatively indexed martingales in a different fashion, which is
often more convenient.
Definition B.120. Let (S, A, μ) be a probability space. For each n = 1, 2, …,
let F_n be a sub-σ-field of A such that F_{n+1} ⊆ F_n for all n. Let {X_n}_{n=1}^∞ be a
sequence of random variables such that X_n is measurable with respect to F_n for
all n. The sequence of pairs {(X_n, F_n)}_{n=1}^∞ is called a reversed martingale if, for
all n, E(X_n|F_{n+1}) = X_{n+1}.
Example B.121. As in Example B.110, we can let {F_n}_{n=1}^∞ be a decreasing
sequence of σ-fields, and let E(|X|) < ∞. Define X_n = E(X|F_n). It follows from
the law of total probability B.70 that {(X_n, F_n)}_{n=1}^∞ is a reversed martingale.
The following theorem is proven by Doob (1953, Theorem VII 4.2).
Theorem B.122 (Martingale convergence theorem: part II).⁵⁵ Suppose
that {(X_n, F_n)}_{n<0} is a martingale. Then X = lim_{n→−∞} X_n exists a.s. and has
finite mean.
PROOF. Just as in the proof of Theorem B.117, we let V_n be the number of times
that the finite sequence X_n, X_{n+1}, …, X_{−1} crosses from below a rational r to
above another rational q (for n < 0). The upcrossing lemma B.114 says that

    E(V_n) ≤ (1/(q − r)) (E(|X_{−1}|) + |r|) < ∞.

    lim_{n→−∞} E(|X_n|) = E(|X|).

By Proposition B.113, it follows that E(|X|) < ∞, and so X has finite mean. □
It is usually more convenient to express Theorem B.122 in terms of reversed
martingales.
Corollary B.123.⁵⁶ If {(X_n, F_n)}_{n=1}^∞ is a reversed martingale, then lim_{n→∞} X_n
exists a.s. and has finite mean.
There is also a version of Levy's theorem B.118 for reversed martingales.
Theorem B.124 (Levy's theorem: part II).⁵⁷ Let {F_n}_{n=1}^∞ be a decreasing
sequence of σ-fields. Let F_∞ be the intersection ⋂_{n=1}^∞ F_n. Let E(|X|) < ∞. Define
X_n = E(X|F_n) and X_∞ = E(X|F_∞). Then lim_{n→∞} X_n = X_∞ a.s.
PROOF. It is easy to see that {(X_n, F_n)}_{n=1}^∞ is a reversed martingale and that
E(|X₁|) < ∞. By Theorem B.122, it follows that lim_{n→∞} X_n = Y exists and is
finite a.s. To prove that Y = X_∞ a.s., note that X_∞ = E(X₁|F_∞) since F_∞ ⊆ F₁.
So, we must show that Y = E(X₁|F_∞). Let A ∈ F_∞. Then

    ∫_A X_n(s) dμ(s) = ∫_A X₁(s) dμ(s),

since A ∈ F_n and X_n = E(X₁|F_n). Once again, using (B.112) and Lemma B.119,
it follows that ∫_A Y(s) dμ(s) = ∫_A X₁(s) dμ(s); hence Y = E(X₁|F_∞). □
PROOF. Define X = ∏_{r∈R} X_r and let B be the product σ-field. Say that a set
C ∈ B is a finite-dimensional cylinder set if there exist k and r₁, …, r_k ∈ R and
a measurable D ⊆ ∏_{i=1}^k X_{r_i} such that
⁶⁰This theorem is used in the proofs of Theorem B.133 and DeFinetti's representation
theorem 1.49.
• For each permutation π of k items, μ_{i₁,…,i_k}(A) = μ_{i_{π(1)},…,i_{π(k)}}(B), where
B = {(x_{π(1)}, …, x_{π(k)}) : (x₁, …, x_k) ∈ A}.
• For each ℓ ∈ R \ {i₁, …, i_k}, μ_{i₁,…,i_k}(A) = μ_{i₁,…,i_k,ℓ}(B), where
⁶²In Section 3.3, we give a much more elaborate motivation for the entire
apparatus of Bayesian decision theory, which includes mathematical probability
as one of its components. An alternative derivation of mathematical probability
from operational considerations is given in Chapter 6 of DeGroot (1970).
⁶³There are a few major differences between the approach in this section and
DeFinetti's approach, which DeFinetti, were he alive, would be quick to point
out. Out of respect for his memory and his followers, we will also try to point
out these differences as we encounter them.
B.6. Subjective Probability 655
bounded set, C can be made small enough for this to hold, so long as the agent
has some funds available.
Definition B.134. The fair price p of a random quantity is called its prevision
and is denoted P(X). It is assumed, for a bounded random quantity X, that the
agent is indifferent between all gambles whose net gain (loss if negative) to the
agent is c(X - P(X)) for all c in some symmetric interval around O.
The symmetric interval around 0 mentioned in the definition of prevision may
be different for different random variables. For example, it might stand to reason
that the interval corresponding to the random variable 2X would be half as wide
as the interval corresponding to X.
Another assumption we make is that if an agent is willing to accept each of a
countable collection of gambles, then the agent is willing to accept all of them
at once, so long as the maximum possible loss is small enough for the agent to
pay.64 An example of countably many gambles, each of which is acceptable but
cannot be accepted together, is the famous St. Petersburg paradox.
Example B.135. Suppose that a fair coin is tossed until the first head appears.
Let N be the number of tosses until the first head appears. For n = 1, 2, …,
define
X_n = 2^n if N = n,
X_n = 0 otherwise.
Suppose that our agent says that P(X_n) = 1 for all n. For each n, there is c_n < 0
such that the agent is willing to accept c_n(X_n − 1). If −∑_{n=1}^∞ c_n 2^n is too big,
however, the agent cannot accept all of the gambles at once. Similarly, there are
c_n > 0 such that the agent is willing to accept c_n(X_n − 1). If ∑_{n=1}^∞ c_n is too
big, the agent cannot accept all of these gambles. The St. Petersburg paradox
corresponds to the case in which c_n = 1 for all n. In this case, the agent pays ∞
and only receives 2^N in return. We have ruled out this possibility by requiring
that the agent be able to afford the worst possible loss.
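A short simulation makes the example concrete (a sketch; the sample sizes are illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)

# N = number of tosses of a fair coin until the first head: Pr(N = n) = 2^{-n}.
# X_n pays 2^n when N = n and 0 otherwise, so E(X_n) = 2^n * 2^{-n} = 1,
# which is why the agent prices each gamble at P(X_n) = 1.
for n in range(1, 11):
    assert 2**n * 2.0 ** (-n) == 1.0

# Accepting all gambles with c_n = 1 means paying the infinite sum of prices
# and receiving 2^N, which has no finite mean: empirical averages keep
# drifting upward with the sample size instead of settling down.
N = rng.geometric(0.5, size=1_000_000)
payoff = 2.0 ** N
for size in (100, 10_000, 1_000_000):
    print(size, payoff[:size].mean())
```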
I_x = 1 if X = x, and I_x = 0 if not.
Suppose that our agent is indifferent between all gambles of the form c(I_x − 2^{−x})
for all −1 ≤ c ≤ 1 and all integers x. Then, we assume that the agent is also
indifferent between all gambles of the form ∑_{x=1}^∞ c_x(I_x − 2^{−x}), so long as
−1 ≤ c_x ≤ 1 for all x. (Note that the largest possible loss is no more than 1.) Let
Y = ∑_{x=1}^∞ c_x I_x with −1 ≤ c_x ≤ 1 for all x. Note that Y is a bounded random
64DeFinetti would not require an agent to accept count ably many gambles at
once, but rather only finitely many. We introduce this stronger requirement to
avoid mathematical problems that arise when the weaker assumption holds but
the stronger one does not. Schervish, Seidenfeld, and Kadane (1984) describe one
such problem in detail.
656 Appendix B. Probability Theory
quantity, and that the agent has implicitly agreed to accept all gambles of the
form c(Y − μ) for −1 ≤ c ≤ 1, where μ = ∑_{x=1}^∞ c_x 2^{−x}. If the agent were
foolish enough to be indifferent between all gambles of the form d(Y − p) for
−a ≤ d ≤ a where p ≠ μ, then a clever opponent could make money with no risk.
For example, if p > μ, let f = min{1, a}. The opponent would ask the agent to
accept the gamble f(Y − p) as well as the gambles −f c_x(I_x − 2^{−x}) for x = 1, 2, ….
The net effect to the agent of these gambles is −f(p − μ) < 0, no matter what
value X takes! A similar situation arises if p < μ. Only p = μ protects the agent
from this sort of problem, which is known as Dutch book.
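The Dutch-book arithmetic can be checked numerically (a sketch with made-up coefficients c_x, truncating the infinite sum at x = 30; the cancellation is exact for any truncation):

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical coefficients c_x and a mispriced prevision p > mu.
xs = np.arange(1, 31)                # truncate the sum at x = 30 for illustration
c = rng.uniform(-1, 1, size=xs.size)
mu = np.sum(c * 2.0 ** -xs)
p = mu + 0.1                         # incoherent price for Y
f = 1.0                              # f = min{1, a}, taking a >= 1

for x_val in (1, 2, 17):             # whatever value X takes...
    ind = (xs == x_val).astype(float)
    y = np.sum(c * ind)
    net = f * (y - p) + np.sum(-f * c * (ind - 2.0 ** -xs))
    # ...the agent's net gain is the same sure loss -f(p - mu)
    assert abs(net - (-f * (p - mu))) < 1e-12
```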
X = ∑_{n=1}^∞ c_n(I_{C_n} − μ(C_n)),
the maximum losses from X and from −X are small enough for the agent to
afford. Since this makes X bounded, it follows from Fubini's theorem A.70 that
E(X) = 0; hence it is impossible that X < 0 under all circumstances, and the
previsions are coherent.
For the "only if" part, assume that the previsions are coherent. Clearly, μ(∅) = 0,
since I_∅ = 0 and the gamble c(I_∅ − μ(∅)) = −cμ(∅) must not be a sure loss for
both positive and negative c. It is also easy to see that μ(A) ≥ 0 for all A. If
μ(A) < 0, then for all negative c, c(I_A − μ(A)) < 0 and we have incoherence.
Countable additivity follows in a similar fashion. Let {A_n}_{n=1}^∞ be mutually
disjoint, and let A = ∪_{n=1}^∞ A_n. If μ(A) < ∑_{n=1}^∞ μ(A_n), then the gamble
∑_{n=1}^∞ (I_{A_n} − μ(A_n)) − (I_A − μ(A)),
whose value is the constant μ(A) − ∑_{n=1}^∞ μ(A_n) because I_A = ∑_{n=1}^∞ I_{A_n},
is always negative. If μ(A) > ∑_{n=1}^∞ μ(A_n), then the negative of the above
gamble is always negative. Either way there is incoherence.
Theorem B.138 says that if an agent insists on dealing with a σ-field of subsets
of some set S, then expressing coherent previsions for gambles on events is
equivalent to choosing probabilities.67 Similar claims can be made about bounded
random variables.
Theorem B.139. Let C be the collection of all bounded measurable functions
from a measurable space (S, A) to ℝ. Suppose that, for each X ∈ C, an agent
assigns a prevision P(X). The previsions are coherent if and only if there exists
a probability μ on (S, A) such that P(X) = E(X) for all X ∈ C.
PROOF. Suppose that the agent is indifferent between all gambles of the form
c(X − P(X)) for −d_X ≤ c ≤ d_X. For the "if" direction, the proof is virtually
identical to the corresponding part of the proof of Theorem B.138. For the "only
if" part, note that I_A ∈ C for every A ∈ A. It follows from Theorem B.138 that a
probability μ exists such that μ(A) = P(I_A) for all A ∈ A. Hence P(X) = E(X)
for all simple functions X. Let X ≥ 0 and let X_1 ≤ X_2 ≤ ⋯ be simple functions
less than or equal to X such that lim_{n→∞} X_n = X. Then
X = X_1 + ∑_{n=1}^∞ (X_{n+1} − X_n), so
67 In the theory of DeFinetti (1974), one obtains finitely additive probabilities
without assuming that probabilities have been assigned to all elements of a
σ-field.
68 DeFinetti (1974) would only require that such conditional gambles be
considered one at a time rather than a σ-field at a time.
this case, there are only two sets of conditional gambles (other than the "unconditional"
gambles c[X − P(X)]), namely cI_A(X − P(X|A)) and cI_{A^c}(X − P(X|A^c)).
Here, Q = P(X|A)I_A + P(X|A^c)I_{A^c}. Note that the previsions P(X|A) and
P(I_A) = μ(A) are already expressed. It is easy to see that
cI_A(X − P(X|A))
= c(XI_A − E(XI_A)) − cP(X|A)(I_A − μ(A)) + c[E(XI_A) − P(X|A)μ(A)].
Clearly, the only coherent choices of P(X|A) satisfy P(X|A)μ(A) = E(XI_A). If
μ(A) > 0, then P(X|A) = E(XI_A)/μ(A), the usual conditional mean of X given
A. Similarly, P(X|A^c)μ(A^c) = E(XI_{A^c}) must hold.
The general situation is not much different from Example B.140.
Theorem B.141. Suppose that an agent must choose a function Q that is measurable
with respect to a sub-σ-field C so that, for each nonempty A ∈ C, he or
she is indifferent between all gambles of the form cI_A(X − Q). The choice of Q
is coherent if and only if E(QI_A) = E(XI_A) for all A ∈ C.
PROOF. As in Example B.140, note that
cI_A(X − Q) = c(XI_A − E(XI_A)) − c(QI_A − E(QI_A)) + c[E(XI_A) − E(QI_A)].
The choice of Q can be coherent if and only if E(QI_A) = E(XI_A). □
The reader should note the similarity between the conditions in Theorem B.141
and Definition B.23. The function Q must be a version of the conditional mean
of X given C.
Example B.142. Let (X, Y) be random variables with a traditional joint density
with respect to Lebesgue measure, f_{X,Y}. That is, for every Borel set C ⊆ ℝ²,
Pr((X, Y) ∈ C) = ∫_C f_{X,Y}(x, y) dx dy.
B.7 Simulation*
Several times in this text, we will want to generate observations that have a
desired distribution. Such observations will be called pseudorandom numbers
because samples appear to have the properties of random variables, but they are
actually generated by a complicated deterministic process. We will not go into
detail on how pseudorandom numbers with uniform U(0, 1) distribution are
generated. In this section, we wish to prove a couple of useful theorems about
how to generate pseudorandom numbers with other distributions under the
assumption that pseudorandom numbers with U(0, 1) distribution can be
generated.
Theorem B.144. Let F be a CDF and define the inverse of F by
F⁻¹(u) = inf{x : F(x) ≥ u}.
N = min{ i : U_i ≤ f(Y_i)/[k g(Y_i)] }.
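Although the statement of Theorem B.144 is truncated in this copy, it is the basis of the inverse-CDF method: if U has a U(0, 1) distribution, then F⁻¹(U) has CDF F. A minimal sketch for the Exp(θ) case, where F(x) = 1 − exp(−θx) and F⁻¹(u) = −log(1 − u)/θ:

```python
import numpy as np

rng = np.random.default_rng(3)

theta = 2.0
u = rng.uniform(size=200_000)

# Inverse-CDF method: for F(x) = 1 - exp(-theta * x),
# F^{-1}(u) = -log(1 - u) / theta, so X = F^{-1}(U) ~ Exp(theta).
x = -np.log1p(-u) / theta

print(x.mean())   # should be close to 1/theta = 0.5
print(x.var())    # should be close to 1/theta^2 = 0.25
```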
*This section may be skipped without interrup ting the flow of ideas.
Pr(Z ≤ t) = Pr( Y_i ≤ t | U_i ≤ f(Y_i)/[k g(Y_i)] )
= Pr( Y_i ≤ t, U_i ≤ f(Y_i)/[k g(Y_i)] ) / Pr( U_i ≤ f(Y_i)/[k g(Y_i)] ).
The numerator equals
∫_{−∞}^t [f(y)/(k g(y))] g(y) dy = (1/k) ∫_{−∞}^t f(y) dy,
since Y_i has PDF g(·). Similarly, the denominator conditional probability can be
written as
Pr( U_i ≤ f(Y_i)/[k g(Y_i)] | Y_i ) = f(Y_i)/[k g(Y_i)].
The mean of this is likewise seen to be ∫ f(y) dy / k. The ratio of these is
Pr(Z ≤ t) = ∫_{−∞}^t f(y) dy / ∫ f(y) dy.
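The acceptance-rejection scheme analyzed above can be sketched as follows (illustrative assumptions: the target f is the Beta(2, 2) density, the proposal g is U(0, 1), and k = 1.5 satisfies f ≤ kg):

```python
import numpy as np

rng = np.random.default_rng(4)

# Target density f (Beta(2,2), as an illustration) and proposal g = U(0,1).
def f(y):
    return 6.0 * y * (1.0 - y)

k = 1.5                                  # f(y) <= k * g(y); here max f = 1.5

y = rng.uniform(size=100_000)            # Y_i ~ g
u = rng.uniform(size=100_000)            # U_i ~ U(0,1)
z = y[u <= f(y) / k]                     # keep Y_i with U_i <= f(Y_i)/(k g(Y_i))

print(z.mean())          # Beta(2,2) mean is 1/2
print(z.size / y.size)   # acceptance rate is about 1/k = 2/3
```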
If (U, V) has uniform distribution over the set A, then V/U has density proportional
to f.
PROOF. Let (U, V) be uniformly distributed on the set A. Then f_{U,V}(u, v) =
I_A(u, v)/c, where c is the area of A. Define X = U and Y = V/U. The Jacobian
for the transformation is x, and the joint density of (X, Y) is
f_{X,Y}(x, y) = (x/c) I_A(x, xy) = (x/c) I_{[0,√f(y)]}(x).
It follows that f_Y(y) = ∫_0^{√f(y)} (x/c) dx = f(y)/(2c). □
If both √f(x) ≤ b and a ≤ x√f(x) ≤ c for all x, then A is contained in the
rectangle with opposite corners (0, a) and (b, c). We can then generate U ~ U(0, b)
and V ~ U(a, c). We set X = V/U, and if U² ≤ f(X), we take X as our desired
random variable. If U² > f(X), we try again.
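A sketch of the ratio-of-uniforms recipe just described, taking f to be the standard normal kernel exp(−x²/2), for which b = 1 and a = −√(2/e), c = √(2/e) bound the region A:

```python
import numpy as np

rng = np.random.default_rng(5)

# Ratio-of-uniforms for the standard normal kernel f(x) = exp(-x^2/2).
def f(x):
    return np.exp(-0.5 * x * x)

# Bounds: sqrt(f(x)) <= b = 1, and a <= x*sqrt(f(x)) <= c, where the
# extrema of x*exp(-x^2/4) are at x = +/- sqrt(2), giving +/- sqrt(2/e).
b = 1.0
c_bound = np.sqrt(2.0 / np.e)
a = -c_bound

u = rng.uniform(0.0, b, size=200_000)
v = rng.uniform(a, c_bound, size=200_000)
x = v / u
sample = x[u * u <= f(x)]     # accept when (U, V) lands in A

print(sample.mean(), sample.std())   # roughly 0 and 1
```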
An important application of simulation is to the numerical integration technique
called importance sampling. Suppose that we wish to know the value of the
ratio of two integrals
∫ v(θ)h(θ)dθ / ∫ h(θ)dθ,    (B.148)
where θ can be a vector. Suppose that f is a density function such that h/f is
nearly constant and it is easy to generate pseudorandom numbers with density
f. Let {X_i}_{i=1}^∞ be an IID sequence of pseudorandom numbers with density f.
Then
∫ h(θ)dθ = E( h(X_i)/f(X_i) ),
∫ v(θ)h(θ)dθ = E( v(X_i) h(X_i)/f(X_i) ),
where the expectations are with respect to the pseudo-distribution of X_i. If we let
W_i = h(X_i)/f(X_i) and Z_i = v(X_i)W_i, then the weak law of large numbers B.95
says that Z̄_n/W̄_n converges in probability to (B.148).69 The reason that we want
h/f to be nearly constant is so that the variance of W_i is small. In Section 7.1.3,
we will show how to approximate the variance of Z̄_n/W̄_n as an estimate of
(B.148).
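The self-normalized estimator Z̄_n/W̄_n can be sketched as follows (illustrative choices, not from the text: v(θ) = θ, h the unnormalized N(0, 1) kernel, and f a Cauchy density, so that h/f is bounded):

```python
import numpy as np

rng = np.random.default_rng(6)

# Self-normalized importance sampling for the ratio (B.148).
def v(t):
    return t

def h(t):
    return np.exp(-0.5 * t * t)          # unnormalized N(0,1) kernel

def f(t):
    return 1.0 / (np.pi * (1.0 + t * t))  # Cauchy importance density

x = rng.standard_cauchy(size=500_000)     # X_i IID with density f
w = h(x) / f(x)                           # W_i = h(X_i)/f(X_i)
z = v(x) * w                              # Z_i = v(X_i) W_i

estimate = z.mean() / w.mean()            # Z_bar_n / W_bar_n -> (B.148)
print(estimate)                           # the true value of (B.148) is 0 here
```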
B.8 Problems
Section B.2:
69 The strong law of large numbers 1.63 says that Z̄_n/W̄_n converges a.s. to
(B.148).
7. Let Φ denote the standard normal CDF, and let the joint CDF of random
variables (X, Y) be
F_{X,Y}(x, y) = Φ(y) if x ≥ y + 1, … if y − 1 ≤ x < y + 1, and 0 otherwise.
12. Suppose that X_1, …, X_n are independent, each with distribution N(θ, 1).
Find the conditional distribution of X_1, …, X_n given X̄_n = x̄, where X̄_n =
∑_{i=1}^n X_i/n.
13. Let B_1 ⊆ B_2 ⊆ ⋯ be a sequence of σ-fields, and let X ≥ 0. Suppose that
E(X|B_n) = Y for all n. Let B be the smallest σ-field containing all of the
B_n. Show that E(X|B) = Y, a.s. (Hint: Show that the union of the B_n is a
π-system, and use Theorem A.26.)
14. Prove Proposition B.43 on page 623.
15. Assume the conditions of Theorem B.46. Also, suppose that (X, B_1, ν_1)
and (Y, B_2, ν_2) are σ-finite measure spaces and ν = ν_1 × ν_2. Prove that ν_1
can play the role of ν_{X|Y}(·|y) for all y and that ν_2 can play the role of ν_Y
in the statement of Theorem B.46.
16. Prove Proposition B.51 on page 625. (Hint: Notice that I_A(V^{-1}(y, w)) =
I_{A_y}(w).)
17. Prove Proposition B.66 on page 631. (Hint: Prove the result for product
sets first, and then use Theorem A.26.)
18. Prove Corollary B.67 on page 631.
19. Prove Corollary B.74 on page 633.
20. Prove the second Borel-Cantelli lemma: If {A_n}_{n=1}^∞ are mutually independent
and ∑_{n=1}^∞ Pr(A_n) = ∞, then Pr(∩_{i=1}^∞ ∪_{n=i}^∞ A_n) = 1. (This set
is sometimes called A_n infinitely often.) (Hint: Find the probability of the
complement by using the fact that 1 − x ≤ exp(−x) for 0 ≤ x ≤ 1.)
21. *Suppose that (S, A, μ) is a measure space. Let {f_n}_{n=1}^∞ be a sequence of
measurable functions f_n : S → T, where (T, B) is a metric space with Borel
σ-field. Let C be the tail σ-field of {f_n}_{n=1}^∞. If lim_{n→∞} f_n(s) = f(s) for all
s, then prove that f is measurable with respect to C. (Hint: Refer to the
proof of part 5 of Theorem A.38. Show that the set A ∈ C by showing
that the union in (A.39) does not need to start at 1.)
22. Let (S, A, μ) be a probability space, and let C be the tail σ-field of a
sequence of random quantities {X_n}_{n=1}^∞, where X_n : S → X for all n. Let
V be the σ-field generated by {X_n}_{n=1}^∞. Let X = (X_1, X_2, …) ∈ X^∞.
If π is a permutation of a finite set of integers {1, …, n}, let πX =
(X_{π(1)}, …, X_{π(n)}, X_{n+1}, …). We say that A ∈ V is symmetric if A =
X^{-1}(B) and for every permutation π of finitely many coordinates, A =
(πX)^{-1}(B) as well.
(a) Prove that every C E C is symmetric.
(b) Show that there can be symmetric events that are not in C.
23. Prove Proposition B.78 on page 634.
Section B.4:
29. Let {i_n}_{n=1}^∞ be a sequence of numbers in {0, 1}. Suppose that {X_n}_{n=1}^∞ is
a sequence of Bernoulli random variables such that
where x = ∑_{j=1}^n i_j. Show that this specifies a consistent set of joint distributions
for n = 1, 2, ….
30. Let μ be a finite measure on (ℝ, B), where B is the Borel σ-field. Suppose
that {X(t) : −∞ < t < ∞} is a stochastic process such that X(t) has
Beta(μ(−∞, t], μ(t, ∞)) distribution for each t, X(t) > X(s) if t > s, and
X(·) is continuous from the right.
(a) Prove that Pr(lim_{t→∞} X(t) = 1) = 1.
(b) Let U = inf{t : X(t) ≥ 1/2}. Prove that the median of U is inf{t :
μ(−∞, t] ≥ μ(t, ∞)}. (Hint: Write {U ≤ s} in terms of X(·).)
31. Let R be a set, and let (X_r, B_r) be a Borel space for every r ∈ R. Let X =
∏_{r∈R} X_r and let B be the product σ-field. For each r ∈ R, let X_r : X → X_r
be the projection function X_r(x) = x_r. Prove that B is the union of all of
the σ-fields generated by all of the countable collections of X_r functions.
That is, let Q be the set of all countable subsets of R, and for each q ∈ Q
let x_q = {X_r}_{r∈q} and let B_q be the σ-field generated by x_q. Then show
that B = ∪_{q∈Q} B_q.
Section B.7:
D_{(i)}(f, x, y) = ∑_{j_1=1}^k ⋯ ∑_{j_i=1}^k ( [∂^i f(z)/(∂z_{j_1} ⋯ ∂z_{j_i})]|_{z=x} ∏_{s=1}^i y_{j_s} ),
where we allow notation like ∂³/∂z_1∂z_1∂z_4 to stand for ∂³/∂z_1²∂z_4. Then, for
x ∈ D,
1This theorem is used in the proofs of Theorems 7.63, 7.89, 7.108, and 7.125.
For a proof (with m = 2), see Buck (1965), Theorem 16 on page 260.
666 Appendix C. Mathematical Theorems Not Proven Here
2This theorem is used in the proof of Theorem 7.57. For a proof, see Rudin
(1964), Theorem 9.17.
3This theorem is used in the proofs of DeFinetti's representation theorem 1.49
and 1.47 and Theorem B.93. For a proof, see Rudin (1964), Theorem 7.31.
4This theorem is used in the proof of Theorem B.17. For a proof, see Berger
(1985), Theorem 12 on page 341, or Ferguson (1967), Theorem 1 on page 73.
5 This theorem is used in the proof of Theorems B.17, 3.77, and 3.95. For a
proof, see Berger (1985), Theorem 13 on page 342, or Ferguson (1967), Theorem 2
on page 73.
6 This theorem is used in the proof of Theorem 3.77. For a proof, see Dugundji
(1966), Theorems 3.2 and 4.3 of Chapter XI.
7 This theorem is used to show that certain estimators are UMVUE, and in
the proof of Theorem 2.74. For a proof, see Churchill (1960), Sections 52 and 56.
C.3. Functional Analysis 667
only if
∫∫ |K(x′, x)|² dμ(x′) dμ(x) < ∞.
8This theorem is used in the proof of Theorem 2.64. For a proof, see Churchill
(1960), Section 54, or Ahlfors (1966), Theorem 12' on page 134.
9This theorem is used in the proof of Theorem 2.114. For a proof, see Diaconis
and Freedman (1990), Theorem 2.1.
10 This theorem is used in the proof of Theorem 8.40. For a proof, see Section
XI.6 of Dunford and Schwartz (1963). By L²(μ) we mean {f : ∫ f²(x)dμ(x) <
∞}.
11 This theorem is used in the proof of Theorem 8.40. For a proof, see Theorem
6 of Section XI.6 of Dunford and Schwartz (1963). The reader should note
that Dunford and Schwartz (1963) use the term compact instead of completely
continuous.
12 This theorem is used in the proof of Theorem 8.40. For a proof, see Lemma 1
in Section VIII.3 of Berberian (1961).
l3This theorem is used in the proof of Theorem 8.40. For a proof, see part (5)
of Theorem 2 on p. 132 of Berberian (1961).
APPENDIX D
Summary of Distributions
The distributions used in this book are listed here. We give the name and sym-
bol used to describe each distribution. Each distribution is absolutely continuous
with respect to some measure or other. In most cases the mean and variance are
given. In some cases, the symbol for the CDF is given.
Variance: 2 [q + a1'12~1'1')]
1 This distribution was derived without a name by Geisser (1967). It was named
L2 by Lecoutre and Rouanet (1981).
D.1. Univariate Continuous Distributions 669
Alternate noncentral F
Symbol: ANCF(q, a, ψ)
Beta
Symbol: Beta(α, β)
Density: f_X(x) = [Γ(α + β)/(Γ(α)Γ(β))] x^{α−1}(1 − x)^{β−1}
Dominating measure: Lebesgue measure on [0, 1]
Mean: α/(α + β)
Variance: αβ/[(α + β)²(α + β + 1)]
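As a quick numerical sanity check of the Beta mean and variance formulas (a sketch with the illustrative choice α = 2, β = 3):

```python
import numpy as np

rng = np.random.default_rng(7)

a, b = 2.0, 3.0
x = rng.beta(a, b, size=500_000)

mean = a / (a + b)                              # alpha/(alpha+beta) = 0.4
var = a * b / ((a + b) ** 2 * (a + b + 1.0))    # 0.04
print(x.mean(), mean)
print(x.var(), var)
```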
Cauchy
Symbol: Cau(μ, σ²)
Chi-squared
Symbol: χ²_a
Exponential
Symbol: Exp(O)
Density: f_X(x) = θ exp(−xθ)
Dominating measure: Lebesgue measure on [0, ∞)
Mean: 1/θ
Variance: 1/θ²
F
Symbol: F_{q,a}
Density: f_X(x) = [Γ((q + a)/2) q^{q/2} a^{a/2} / (Γ(q/2)Γ(a/2))] x^{q/2−1}(a + qx)^{−(q+a)/2}
Dominating measure: Lebesgue measure on [0, ∞)
Mean: a/(a − 2), if a > 2
Variance: 2a²(q + a − 2)/[q(a − 4)(a − 2)²], if a > 4
Gamma
Symbol: Γ(α, β)
Density: f_X(x) = [β^α/Γ(α)] x^{α−1} exp(−βx)
Dominating measure: Lebesgue measure on [0, ∞)
Mean: α/β
Variance: α/β²
Inverse gamma
Symbol: Γ⁻¹(α, β)
Density: f_X(x) = [β^α/Γ(α)] x^{−α−1} exp(−β/x)
Dominating measure: Lebesgue measure on [0, ∞)
Mean: β/(α − 1), if α > 1
Variance: β²/[(α − 1)²(α − 2)], if α > 2
Laplace
Symbol: Lap(μ, σ)
Density: f_X(x) = [1/(2σ)] exp(−|x − μ|/σ)
Dominating measure: Lebesgue measure on ℝ
Mean: μ
Variance: 2σ²
Noncentral beta
Symbol: NCB(α, β, ψ)
Density: f_X(x) = ∑_{k=0}^∞ [(ψ/2)^k exp(−ψ/2)/k!] [Γ(α + β + k)/(Γ(α + k)Γ(β))] x^{α+k−1}(1 − x)^{β−1}
Dominating measure: Lebesgue measure on [0, 1]
Noncentral chi-squared
Symbol: NCχ²_a(ψ)
Density: f_X(x) = ∑_{k=0}^∞ [(ψ/2)^k exp(−ψ/2)/k!] x^{a/2+k−1} exp(−x/2)/[2^{a/2+k} Γ(a/2 + k)]
Noncentral F
Symbol: NCF(q, a, ψ)
Noncentral t
Symbol: NCt_a(δ)
CDF: NCT_a(·; δ)
Normal
Symbol: N(μ, σ²)
Density: f_X(x) = (√(2π) σ)^{−1} exp(−(x − μ)²/(2σ²))
Dominating measure: Lebesgue measure on (−∞, ∞)
672 Appendix D. Summary of Distributions
Mean: J.t
Variance: 0'2
Pareto
Symbol: Par(α, c)
Density: f_X(x) = αc^α/x^{α+1}
Dominating measure: Lebesgue measure on [c, ∞)
Mean: cα/(α − 1), if α > 1
Variance: c²α/[(α − 2)(α − 1)²], if α > 2
t
Symbol: t_a(μ, σ²)
Uniform
Symbol: U(a, b)
Density: f_X(x) = (b − a)^{−1}
Dominating measure: Lebesgue measure on [a, b]
Mean: (a + b)/2
Variance: (b − a)²/12
Binomial
Symbol: Bin(n, p)
Density: f_X(x) = (n choose x) p^x (1 − p)^{n−x}
Dominating measure: Counting measure on {O, ... , n}
Mean: np
Variance: np{l - p)
Geometric
Symbol: Geo(p)
Density: f_X(x) = p(1 − p)^x
Dominating measure: Counting measure on {0, 1, 2, …}
Mean: (1 − p)/p
Variance: (1 − p)/p²
Hypergeometric
Symbol: Hyp(N, n, k)
Density: f_X(x) = (k choose x)(N − k choose n − x)/(N choose n)
Dominating measure: Counting measure on
{max{0, n − N + k}, …, min{n, k}}
Mean: nk/N
Variance: n(k/N)[(N − k)/N][(N − n)/(N − 1)]
Negative binomial
Symbol: Negbin(a, p)
Density: f_X(x) = (a + x − 1 choose x) p^a (1 − p)^x
Dominating measure: Counting measure on {0, 1, 2, …}
Mean: a(1 − p)/p
Variance: a(1 − p)/p²
Poisson
Symbol: Poi(λ)
Density: f_X(x) = exp(−λ) λ^x/x!
Dominating measure: Counting measure on {O, 1,2, ... }
Mean: >.
Variance: >.
Multinomial
Symbol: Mult_k(n, p_1, …, p_k)
Density: f_{X_1,…,X_k}(x_1, …, x_k) = (n choose x_1, …, x_k) p_1^{x_1} ⋯ p_k^{x_k}
Multivariate Normal
Symbol: N_p(μ, Σ)
Density: f_X(x) = (2π)^{−p/2} |Σ|^{−1/2} exp(−(1/2)(x − μ)ᵀ Σ^{−1}(x − μ))
Dominating measure: Lebesgue measure on ℝ^p
Mean: E(X_i) = μ_i
Variance: Var(X_i) = Σ_ii
Covariance: Cov(X_i, X_j) = Σ_ij
References
RAO, C. R. (1973). Linear Statistical Inference and Its Applications (2nd ed.).
New York: Wiley.
ROBBINS, H. (1951). Asymptotically subminimax solutions of compound sta-
tistical decision problems. In J. NEYMAN (Ed.), Proceedings of the Second
Berkeley Symposium on Mathematical Statistics and Probability (pp. 131-
148). Berkeley: University of California.
ROBBINS, H. (1955). An empirical Bayes approach to statistics. In J. NEY-
MAN (Ed.), Proceedings of the Third Berkeley Symposium on Mathematical
Statistics and Probability, volume 1 (pp. 157-164). Berkeley: University of
California.
ROBBINS, H. (1964). The empirical Bayes approach to statistical decision prob-
lems. Annals of Mathematical Statistics, 35, 1-20.
ROBERT, C. P. (1993). A note on Jeffreys-Lindley paradox. Statistica Sinica, 3,
601-608.
ROBERTS, H. V. (1967). Informative stopping rules and inferences about popu-
lation size. Journal of the American Statistical Association, 62, 763-775.
ROUANET, H. and LECOUTRE, B. (1983). Specific inference in ANOVA: From
significance tests to Bayesian procedures. British Journal of Mathematical
and Statistical Psychology, 36, 252-268.
ROYDEN, H. L. (1968). Real Analysis. London: Macmillan.
RUBIN, D. B. (1981). The Bayesian bootstrap. Annals of Statistics, 9, 130-134.
RUDIN, W. (1964). Principles of Mathematical Analysis (2nd ed.). New York:
McGraw-Hill.
SAVAGE, L. J. (1954). The Foundations of Statistics. New York: Wiley.
SAVAGE, L. J. (1962). The Foundations of Statistical Inference. London:
Methuen.
SCHEFFE, H. (1947). A useful convergence theorem for probability distributions.
Annals of Mathematical Statistics, 18, 434-438.
SCHERVISH, M. J. (1983). User-oriented inference. Journal of the American
Statistical Association, 78, 611-615.
SCHERVISH, M. J. (1992). Bayesian analysis of linear models (with discussion).
In J. M. BERNARDO, J. O. BERGER, A. P. DAWID, and A. F. M. SMITH
(Eds.), Bayesian Statistics 4: Proceedings of the Second Valencia Interna-
tional Meeting (pp. 419--434). Oxford: Clarendon Press.
SCHERVISH, M. J. (1994). Discussion of "Bootstrap: More than a stab in the
dark?" by G. A. Young. Statistical Science, 9, 408-410.
SCHERVISH, M. J. (1996). P-values: What they are and what they are not.
American Statistician, 50, to appear.
SCHERVISH, M. J. and CARLIN, B. P. (1992). On the convergence of successive
substitution sampling. Journal of Computational and Graphical Statistics,
1, 111-127.
SCHERVISH, M. J. and SEIDENFELD, T. (1990). An approach to consensus and
certainty with increasing evidence. Journal of Statistical Planning and In-
ference, 25, 401-414.
Ferguson, T., 52, 56, 61, 173, 179, Johnstone, I., 435, 682
181,248,258,614,666,680
Ferrandiz, J., 669, 680 Kadane, J., 21, 24,183-184,446,564,
Fieller, E., 321, 680 655, 682-683, 687-688
Fienberg, S., 462, 676 Kagan, A., 349, 683
Fishburn, P., 181, 680 Kahneman, D., 23, 683
Fisher, R., 89, 96, 217-218, 307, 370, Kass, R., ix, 226, 446, 505, 683
373, 522, 680 Kerridge, D., 564, 683
Fraser, D., 435, 677, 680 Kiefer, J., 417, 420, 683
Freedman, D., 15, 28, 40-41, 46, 61, Kinderman, A., 660, 683
70,123,126,330-331,426, Kingman, J., 36, 683
434,479-480,667,676,678- Knuth, D., x, 683
680 Kraft, C., 66, 683
Freedman, L., 24, 680 Krasker, W., 56, 683
Freeman, P., 524, 681 Krem, A., 408, 683
Kshirsagar, A., 386, 684
Gabriel, K., 252, 681 Kullback, S., 116, 684
Garthwaite, P., 24, 681
Gavasakar, U., 24, 681 Lamport, L., x, 684
Geisser, S., 521, 668, 681 Lane, D., 9, 682
Gelfand, A., 507,681 Lauritzen, S., 28, 123, 481, 684
Geman, D., 507, 681 Lavine, M., 69,526,684
Geman, S., 507, 681 LeCam, L., 414, 437, 684
Ghosh, J., 381-382, 681 Lecoutre, B., 668-669, 684, 686
Gnanadesikan, R., 22, 681 Lehmann, E., 231,280, 285,298,350,
Good, I., 565, 681 684
Levy, P., 648, 650
Hadjicostas, P., ix Lindley, D., 6, 229, 284, 479, 684
Hall, P., 337-338, 681 Lindman, H., 222,284,679
Hall, W., 381-382, 681 Linnik, Y., 349, 683
Halmos, P., 364, 600, 681 Loève, M., 34, 653, 685
Hampel, F., 315, 681 Louis, T., 24, 677
Hartigan, J., 20-21, 33, 681
Matts, J., 24, 677
Heath, D., 21, 46, 681-682
Mauldin, R., 66, 69, 685
Hewitt, E., 46, 682
McDunnough, P., 435, 677, 680
Heyde, C., 435, 682
Mee, R., 326, 679
Hill, B., 9, 484, 682
Mendel, G., 217, 685
Hinkley, D., 218, 423, 678-679
Metivier, M., 66,685
Hodges, J., 414
Monahan, J., 660, 683
Hoel, P., 640, 682
Morgenstern, O., 181-182, 688
Hogarth, R., 24, 682 Morris, C., 166, 500, 679, 685
Holland, P., 462, 676
Huber, P., 310, 315, 428, 682 Nachbin, L., 364, 685
Hwang, J., 160, 677 Neyman, J., 89, 175,231,247,420,
685
James, W., 163,682 Nobile, A., ix
Jaynes, E., 379, 682 Novick, M., ix, 6, 684
Jeffreys, H., 122, 229, 284, 682
Jiang, T., ix Oue, S., ix
Name Index 693
Patterson, R., 33, 688 Smith, A., 479, 507, 681, 684, 687
Pearson, E., 175, 231, 247, 685 Smith, W., 24, 682
Pearson, K., 216, 685 Spiegelhalter, D., 24, 680
Perlman, M., 430, 685 Spjøtvoll, E., 283, 687
Peters, S., 24, 682 Stahel, W., 315, 681
Phillips, L., 6, 684 Steffey, D., 505, 683
Pierce, D., 99, 685 Stein, C., 163, 379, 382, 568, 682, 687
Pitman, E., 347, 685 Stigler, S., 8, 687
Port, S., 640, 682 Stone, C., 640, 682
Portnoy, S., x Stone, M., 21, 678, 687
Pratt, J., 56, 98, 683, 685 Strasser, H., 430, 688
Strawderman, W., ix, 166,688
Raftery, A., 226, 683 Sudderth, W., 9, 21, 46, 66, 69,
Ramamoorthi, R., 86, 676 681-682, 685
Rao, C., 152, 301, 349, 683, 685-686
Reeve, C., 326, 679 Taylor, R., 33, 688
Regazzini, E., 21, 676 Tiao, G., 521, 677
Rigo, P., 21, 676 Tibshirani, R., 336, 679
Robbins, H., 303, 647, 677, 686 Tierney, L., 225, 446, 507, 683, 688
Robert, C., 225, 686 Tversky, A., 23, 683
Roberts, H., 565,686
Ronchetti, E., 315, 681 Venn, J., 8,688
Rouanet, H., 668-669, 684, 686 Verdinelli, I., 524, 688
Rousseeuw, P., 315, 681 Villegas, C., 379, 677
Royden, H., 578, 589, 597, 621, 686 Von Mises, R., 10, 688
Rubin, D., 332, 686 Von Neumann, J., 181-182,688
Rudin, W., 666, 686
Wald, A., 415, 549, 552, 557,688
Savage, L., 46, 181, 222, 284, 565, Walker, A., 435, 442, 688
600, 679, 681-682, 686 Wallace, D., 99, 688
Scheffe, H., 298, 634, 684, 686 Wasserman, L., ix, 524, 526, 684, 688
Schervish, M., v Welch, B., 320, 688
Schwartz, J., 507, 635, 667, 679 West, M., 524, 688
Schwartz, L., 429, 687 Wijsman, R., x, 381-382, 681
Scott, E., 420, 685 Wilks, A., x, 676
Seidenfeld, T., ix, 21, 183-184, 187, Wilks, S., 325, 688
429, 564, 655, 682-683, Williams, S., 66, 69, 685
686-687 Winkler, R., 24, 682
Sellke, T., 284, 676 Wolfowitz, J., 417, 420, 557, 683, 688
Serfling, R., 413, 687 Wolpert, R., 526, 684
Sethuraman, J., 56, 687
Short, T., ix Ylvisaker, D., 108, 679
Shurlow, N., v Young, G., 329, 688
Siegmund, D., 647, 677
Singh, K., 331, 687 Zellner, A., 16, 688
Slovic, P., 23, 683 Zidek, J., 21, 678
Subject Index*
Scale invariant loss, 350-351 Stopping time, 537, 548, 552, 554
Scale parameter, 345 Strict preference, 183
Scheffe's theorem, 634 Strong law of large numbers, 34-36
Score function, 111, 122,302, 305 Strongly unimodal, 329
conditional, 111 Submartingale, 646
Second-order efficiency, 414 Successive substitution, 505-506, 545
Sensitivity analysis, 524 Successive substitution sampling, 507
Separable space, 619 Sufficient statistic, 84-85-86, 99, 103,
Separating hyperplane theorem, 666 109, 150-151, 298
Sequential decision rule, 537 conditionally, 95
Sequential probability ratio test, 549 minimal,92
Sequential test, 548 natural, 103
Set estimation, 296 Superefficiency, 414
Shrinkage estimator, 163 Supporting hyperplane theorem, 666
σ-field, 575 Sure-thing principle, 184
Borel, 571, 575
generated, 571-572, 584 t distribution, 672
image, 584 Tail σ-field, 632
restriction, 584 Tailfree process, 60
tail, 632 Taylor's theorem, 665
σ-finite measure, 572, 578, 601 Tchebychev's inequality, 614
Signed measure, 577, 597, 605, 635 Terminal decision rule, 537
Significance probability, 217, 228, 280 Test:
Significance test, 217 goodness of fit, 218, 461
Simple alternative, 215 one-sided, 239, 243
Simple function, 586 two-sided, 256, 273
Simple hypothesis, 215 Test function, 175, 215
Size of test, 2, 215-216 Theorem:
Small order, 394 Bahadur,94
stochastic, 396 Basu, 99
SPRT, 549 Bayes, 4, 16
Squared-error loss, 146, 297 Bhattacharyya lower bounds,
√n-consistent, 401 305
SSS, 507 Bolzano-Weierstrass, 666
St. Petersburg paradox, 655 Caratheodory extension, 578
State independence, 184, 205 Cauchy's equation, 667
State-dependent utility, 205-206 central limit, 642
States of Nature, 181, 189, 205 multivariate, 643
Statistic, 83 chain rule, 600
ancillary, 95,99, 119 Chapman-Robbins bound, 304
boundedly complete, 94 complete class, 179
complete, 94, 298 continuity, 640
sufficient, 84-85-86, 99, 103, continuous mapping, 638
150-151,298 Cramer-Rao lower bound, 301
Stein estimator (see James-Stein DeFinetti, 27-28
estimator), 163 dominated convergence, 591
Stochastic large order, 396 Fatou's lemma, 589
Stochastic small order, 396 Fisher-Neyman, 89
Stone-Weierstrass theorem, 666 Fubini,596