P. 1
Probability and Measure Theory

# Probability and Measure Theory

|Views: 117|Likes:

See more
See less

05/22/2013

pdf

text

original

# LECTURE NOTES

MEASURE THEORY and PROBABILITY
Rodrigo Ba˜ nuelos
Department of Mathematics
Purdue University
West Lafayette, IN 47907
June 20, 2003
2
I
SIGMA ALGEBRAS AND MEASURES
¸1 σ–Algebras: Deﬁnitions and Notation.
We use Ω to denote an abstract space. That is, a collection of objects called
points. These points are denoted by ω. We use the standard notation: For A, B ⊂
Ω, we denote A ∪ B their union, A ∩ B their intersection, A
c
the complement of
A, A¸B = A − B = ¦x ∈ A: x ,∈ B¦ = A ∩ B
c
and A∆B = (A¸B) ∪ (B¸A).
If A
1
⊂ A
2
, . . . and A = ∪

n=1
A
n
, we will write A
n
↑ A. If A
1
⊃ A
2
⊃ . . .
and A = ∩

n=1
A
n
, we will write A
n
↓ A. Recall that (∪
n
A
n
)
c
= ∩
n
A
c
n
and
(∩
n
A
n
)
c
= ∪
n
A
c
n
. With this notation we see that A
n
↑ A ⇒ A
c
n
↓ A
c
and
A
n
↓ A ⇒ A
c
n
↑ A
c
. If A
1
, . . . , A
n
∈ Ω, we can write

n
j=1
A
j
= A
1
∪ (A
c
1
∩ A
2
) ∪ (A
c
1
∩ A
c
2
∩ A
3
) ∪ . . . (A
c
1
∩ . . . ∩ A
c
n−1
∩ A
n
), (1.1)
which is a disjoint union of sets. In fact, this can be done for inﬁnitely many sets:

n=1
A
n
= ∪

n=1
(A
c
1
∩ . . . ∩ A
c
n−1
∩ A
n
). (1.2)
If A
n
↑, then

n
j=1
A
j
= A
1
∪ (A
2
¸A
1
) ∪ (A
3
¸A
2
) . . . ∪ (A
n
¸A
n−1
). (1.3)
Two sets which play an important role in studying convergence questions are:
limA
n
) = limsup
n
A
n
=

n=1

_
k=n
A
k
(1.4)
and
limA
n
= liminf
n
A
n
=

_
n=1

k=n
A
k
. (1.5)
3
Notice
(limA
n
)
c
=
_

n=1

_
k=n
A
n
_
c
=

_
n=1
_

_
k=n
A
k
_
c
=

_
n=1

k=n
A
c
k
= limA
c
n
Also, x ∈ limA
n
if and only if x ∈

k=n
A
k
for all n. Equivalently, for all n there is
at least one k > n such that x ∈ A
k
0
. That is, x ∈ A
n
for inﬁnitely many n. For
this reason when x ∈ limA
n
we say that x belongs to inﬁnitely many of the A

n
s
and write this as x ∈ A
n
i.o. If x ∈ limA
n
this means that x ∈

k=n
A
k
for some
n or equivalently, x ∈ A
k
for all k > n. For this reason when x ∈ limA
n
we say
that x ∈ A
n
, eventually. We will see connections to limx
k
, limx
k
, where ¦x
k
¦ is a
sequence of points later.
Deﬁnition 1.1. Let T be a collection of subsets of Ω. T is called a ﬁeld (algebra)
if Ω ∈ T and T is closed under complementation and ﬁnite union. That is,
(i) Ω ∈ T
(ii) A ∈ T ⇒ A
c
∈ T
(ii) A
1
, A
2
, . . . A
n
∈ T ⇒
n

j=1
A
j
∈ T.
If in addition, (iii) can be replaced by countable unions, that is if
(iv) A
1
, . . . A
n
, . . . ∈ T ⇒

j=1
A
j
∈ T,
then T is called a σ–algebra or often also a σ–ﬁeld.
Here are three simple examples of σ–algebras.
(i) T = ¦∅, Ω¦,
4
(ii) T = ¦all subsets of Ω¦,
(iii) If A ⊂ Ω, T = ¦∅, Ω, A, A
c
¦.
An example of an algebra which is not a σ–algebra is given by the following.
Let Ω = R, the real numbers and take T to be the collection of all ﬁnite disjoint
unions of intervals of the form (a, b] = ¦x: a < x ≤ b¦, −∞ ≤ a < b < ∞. By
convention we also count (a, ∞) as right–semiclosed. T is an algebra but not a
σ–algebra. Set
A
n
= (0, 1 −
1
n
].
Then,

_
n=1
A
n
= (0, 1) ,∈ T.
The convention is important here because (a, b]
c
= (b, ∞) ∪ (−∞, a].
Remark 1.1. We will refer to the pair (Ω, T) as a measurable space. The reason
for this will become clear in the next section when we introduce measures.
Deﬁnition 1.2. Given any collection / of subsets of Ω, let σ(/) be the smallest
σ–algebra containing /. That is if T is another σ–algebra and / ⊂ T, then
σ(/) ⊂ T.
Is there such a σ–algebra? The answer is, of course, yes. In fact,
σ(/) =

T
where the intersection is take over all the σ–algebras containing the collection /.
This collection is not empty since / ⊂ all subsets of Ω which is a σ–algebra.
We call σ(/) the σ–algebra generated by /. If T
0
is an algebra, we often write
σ(T
0
) = T
0
.
Example 1.1. / = ¦A¦, A ⊂ Ω. Then
σ(/) = ¦∅, A, A
c
, Ω¦.
5
Problem 1.1. Let / be a collection of subsets of Ω and A ⊂ Ω. Set / ∩ A =
¦B ∩ A: B ∈ /¦. Assume σ(/) = T. Show that σ(/∩ A) = T ∩ A, relative to A.
Deﬁnition 1.2. Let Ω = R and B
0
the ﬁeld of right–semiclosed intervals. Then
σ(B
0
) = B is called the Borel σ–algebra of R.
Problem 1.2. Prove that every open set in R is the countable union of right
–semiclosed intervals.
Problem 1.3. Prove that every open set is in B.
Problem 1.4. Prove that B = σ(¦all open intervals¦).
Remark 1.2. The above construction works equally in R
d
where we take B
0
to be
the family of all intervals of the form
(a
1
, b
1
] . . . (a
d
, b
d
], −∞ ≤ a
i
< b
i
< ∞.
¸2. Measures.
Deﬁnition 2.1. Let (Ω, T) be a measurable space. By a measure on this space we
mean a function µ : T → [0, ∞] with the properties
(i) µ(∅) = 0
and
(ii) if A
j
∈ T are disjoint then
µ
_
_

_
j=1
A
j
_
_
=

j=1
µ(A
j
).
6
Remark 2.1. We will refer to the triple (Ω, T, µ) as a measure space. If µ(Ω) = 1
we refer to it as a probability space and often write this as (Ω, T, P).
Example 2.1. Let Ω be a countable set and let T = collection of all subsets of Ω.
Denote by #A denote the number of point in A. Deﬁne µ(A) = #A. This is called
the counting measure. If Ω is a ﬁnite set with n points and we deﬁne P(A) =
1
n
#A
then we get a probability measure. Concrete examples of these are:
(i) Coin ﬂips. Let Ω = ¦0, 1¦ = ¦Heads, Tails¦ = ¦T, H¦ and set P¦0¦ =
1/2 and P¦1¦ = 1/2
(2) Rolling a die. Ω = ¦1, 2, 3, 4, 5, 6¦, P¦w¦ = 1/6.
Of course, these are nothing but two very simple examples of probability
spaces and our goal now is to enlarge this collection. First, we list several elemen-
tary properties of general measures.
Proposition 2.1. Let (Ω, T, µ) be a measure space. Assume all sets mentioned
below are in T.
(i) If A ⊂ B, then µ(A) ≤ µ(B), (monotonicity).
(ii) If A ⊆

j=1
A
j
, then µ(A) ≤

j=1
µ(A
j
(iii) If A
j
↑ A, then µ(A
j
) ↑ µ(A), (continuity for below).
(iv) If A
j
↓ A and µ(A
1
) < ∞, then µ(A
j
) ↓ µ(A), (continuity from above).
Remark 2.2. The ﬁniteness assumption in (iv) is needed. To see this, set Ω =
¦1, 2, 3, . . . ¦ and let µ be the counting measure. Let A
j
= ¦j, j + 1, . . . ¦. Then
A
j
↓ ∅ but µ(A
j
) = ∞ for all j.
Proof. Write B = A∪ (B¸A). Then
µ(B) = µ(A) +µ(B¸A) ≥ µ(A),
7
which proves (i). As a side remark here, note that if if µ(A) < ∞, we have
µ(B¸A) = µ(B) − µ(A). Next, recall that

n=1
A
n
=

n=1
(A
c
1
∩ . . . ∩ A
c
n−1
∩ A
n
)
where the sets in the last union are disjoint. Therefore,
µ
_

_
n=1
A
n
_
=

n=1
µ(A
n
∩ A
c
1
. . . ∩ A
c
n−1
) ≤

n=1
µ(A
n
),
proving (ii).
For (iii) observe that if A
n
↑, then
µ
_

_
n=1
A
n
_
= µ
_

_
n=1
(A
n
¸A
n−1
)
_
=

n=1
µ(A
n
¸A
n−1
)
= lim
m→∞
m

n=1
µ(A
n
¸A
n−1
)
= lim
m→∞
µ
_
m
_
n=1
A
n
¸A
n−1
_
.
For (iv) we observe that if A
n
↓ A then A
1
¸A
n
↑ A
1
¸A. By (iii), µ(A
1
¸A
n
) ↑
µ(A
1
¸A) and since µ(A
1
¸A
n
) = µ(A
1
) − µ(A
n
) we see that µ(A
1
) − µ(A
n
) ↑
µ(A
1
) −µ(A), from which the result follows assuming the ﬁniteness of µ(A
1
).
Deﬁnition 2.2. A Lebesgue–Stieltjes measure on R is a measure on B = σ(B
0
)
such that µ(I) < ∞ for each bounded interval I. By an extended distribution
function on R we shall mean a map F: R → R that is increasing, F(a) ≤ F(b) if
a < b, and right continuous, lim
x→x
+
0
F(x) = F(x
0
). If in addition the function F is
nonnegative satisfying lim
x→∞
F
X
(x) = 1 and lim
x→−∞
F
X
(x) = 0, we shall simply call
it a distribution function.
We will show that the formula µ(a, b] = F(b)−F(a) sets a 1-1 correspondence
between the Lebesgue–Stieltjes measures and distributions where two distributions
that diﬀer by a constant are identiﬁed. of course, probability measures correspond
to distributions.
8
Proposition 2.2. Let µ be a Lebesgue–Stieltjes measure on R. Deﬁne F: R →R,
up to additive constants, by F(b) −F(a) = µ(a, b]. For example, ﬁx F(0) arbitrary
and set F(x) − F(0) = µ(0, x], x ≥ 0, F(0) − F(x) = µ(x, 0], x < 0. Then F is
an extended distribution.
Proof. Let a < b. Then F(b) −F(a) = µ(a, b] ≥ 0. Also, if ¦x
n
¦ is such that x
1
>
x
2
> . . . → x, then µ(x, x
n
] → 0, by Proposition 2.1, (iv), since

n=1
(x
1
, x
n
] = ∅
and (x, x
n
] ↓ ∅. Thus F(x
n
) −F(x) → 0 implying that F is right continuous.
We should notice also that
µ¦b¦ = lim
n→∞
µ
_
b −
1
n
, b
_
= lim
n→∞
F(b) −F(b −1/n) = F(b) −F(b−).
Hence in fact F is continued at ¦b¦ if and only if µ¦b¦ = 0.
Problem 2.1. Set F(x−) = lim
x→x

F(x). Then
µ(a, b) = F(b

) −F(a) (1)
µ[a, b] = F(b) −F(a

) (2)
µ[a, b) = F(b

) −F(a

) (3)
µ(R) = F(∞) −F(−∞). (4)
Theorem 2.1. Suppose F is a distribution function on R. There is a unique
measure µ on B(R) such that µ(a, b] = F(b) −F(a).
Deﬁnition 2.3. Suppose / is an algebra. µ is a measure on / if µ : / → [0, ∞],
µ(∅) = 0 and if A
1
, A
2
, . . . are disjoint with A =

j
A
j
∈ /, then µ(A) =

j
µ(A
j
) . The measure is σ–ﬁnite if the space Ω = ∪

j=1

j
where the Ω
j
∈ /
are disjoint and µ(Ω
j
) < ∞.
9
Theorem 2.2 (Carath`eodory’s Extension Theorem). Suppose µ is σ–ﬁnite
on an algebra /. Then µ has a unique extension to σ(/).
Deﬁnition 2.4. A collection o ⊂ Ω is a semialgebra if the following two condi-
tions hold.
(i) A, B ∈ o ⇒ A∩ B ∈ o,
(ii) A ∈ o then A
c
is the ﬁnite union of disjoint sets in o.
Example 2.2. o = ¦(a, b]: −∞ ≤ a < b < ∞¦. This is a semialgebra but not an
algebra.
Lemma 2.1. If o is a semi-algebra, then o = ¦ﬁnite disjoint unions of sets in
o¦ is an algebra. This is called the algebra generated by o.
Proof. Let E
1
= ∪
n
j=1
A
j
and E
2
= ∪
n
j=1
B
j
, where the unions are disjoint and
the sets are all in o. Then E
1
∩ E
2
= ∪
i,j
A
i
∩ B
j
∈ o. Thus o is closed under
ﬁnite intersections. Also, if E = ∪
n
j=1
A
j
∈ o then A
c
= ∩
j
A
c
j
. However, by the
deﬁnition of S, and o, we see that A
c
∈ o. This proves that o is an algebra.
Theorem 2.3. Let o be a semialgebra and let µ be deﬁned on o. Suppose µ(∅) = 0
(i) If E ∈ o, E =
n

i=1
E
i
, E
i
∈ o disjoint, then µ(E) =
n

i=1
µ(E
i
)
and
(ii) If E ∈ o, E =

i=1
E
i
, E
i
∈ o disjoint, then µ(E) ≤

i=1
µ(E
i
).
Then µ has a unique extension µ to o which is a measure. In addition, if µ is
σ–ﬁnite, then µ has a unique extension to a measure (which we continuo to call
µ) to σ(o), by the Carath´eodary extension theorem
10
Proof. Deﬁne µ on o by
µ(E) =
n

i=1
µ(E
i
),
if E =

n
i=1
E
i
, E
i
∈ o and the union is disjoint. We ﬁrst verify that this is well
deﬁned. That is, suppose we also have E =

m
j=1
˜
E
j
, where
˜
E
j
∈ o and disjoint.
Then
E
i
=
m
_
j=1
(E
i

˜
E
j
),
˜
E
j
=
n
_
i=1
(E
i

˜
E
j
).
By (i),
n

i=1
µ(E
i
) =
n

i=1
µ(E
i

˜
E
j
) =

m

n
µ(E
i

˜
E
j
) =

m
µ(
˜
E
j
).
So, µ is well deﬁned. It remains to verify that µ so deﬁned is a measure. We
postpone the proof of this to state the following lemma which will be used in its
proof.
Lemma 2.2. Suppose (i) above holds and let µ be deﬁned on o as above.
(a) If E, E
i
∈ o, E
i
disjoint, with E =
n

i=1
E
i
. Then µ(E) =
n

i=1
µ(E
i
).
(b) If E, E
i
∈ o, E ⊂
n

i=1
E
i
, then µ(E) ≤
n

i=1
µ(E
i
).
Note that (a) gives more than (i) since E
i
∈ o not just o. Also, the sets in
(b) are not necessarily disjoint. We assume the Lemma for the moment.
Next, let E =

_
i=1
E
i
, E
i
∈ o where the sets are disjoint and assume, as
required by the deﬁnition of the measures on algebras, that E ∈ o. Since E
i
=
n
_
j=1
E
ij
, E
ij
∈ o, with these also disjoint, we have

i=1
µ(E
i
)
(i)
=

i=1
n

j=1
µ(E
ij
) =

i,j
µ(E
ij
).
11
So, we may assume E
i
∈ o instead of o, otherwise replace it by E
ij
. Since
E ∈ o, E =
n

j=1
˜
E
j
,
˜
E
j
∈ o and again disjoint sets, and we can write
˜
E
j
=

_
i=1
(
˜
E
j
∩ E
i
).
Thus by assumption (ii),
µ(
˜
E
j
) ≤

i=1
µ(
˜
E
j
∩ E
i
).
Therefore,
µ(E) =
n

j=1
µ(
˜
E
j
)

n

j=1

i=1
µ(
˜
E
j
∩ E
i
)
=

i=1
n

j=1
µ(
˜
E
j
∩ E
i
)

i=1
µ(E
i
),
which proves one of the inequalities.
For the apposite inequality we set (recall E =

i=1
E
i
) A
n
=

n
i=1
E
i
and
C
n
= E ∩ A
c
n
so that E = A
n
∪ C
n
and A
n
, C
n
∈ o and disjoint. Therefore,
µ(A) = µ(A
n
) +µ(C
n
) = µ(B
1
) +. . . +µ(B
n
) +µ(C
n
)

n

i=1
µ(B
i
),
with n arbitrary. This proves the other inequality.
Proof of Lemma 2.2. Set E =
n

i=1
E
i
, then E
i
=
m

j=1
E
ij
, S
ij
∈ o. By assumption
(i),
µ(A) =

ij
µ(E
ij
) =
n

i=1
µ(E
i
)
12
proving (a).
For (b), assume n = 1. If E ⊂ E
1
, then
E
1
= E ∪ (E
1
∩ E
c
), E
1
∩ E
c
∈ o.
µ(E) ≤ µ(E) +µ(E
1
∩ E
c
) = µ(E
1
).
For n > 1, set
n
_
i=1
E
i
=
n
_
i=1
(E
i
∩ E
c
1
∩ . . . ∩ E
c
i−1
) =
n
_
i=1
F
i
.
Then
E = E ∩
_
n
_
i=1
E
i
_
= E ∩ F
1
∪ . . . ∪ (E ∩ F
n
).
So by (a), µ(E) =

n
i=1
µ(E ∩ F
i
). Now, the case n = 1 gives
n

i=1
µ(E ∩ F
i
) ≤
n

i=1
µ(E
i
)

n

i=1
µ(F
i
)
= µ
_
n
_
i=1
E
i
_
,
where the last inequality follows from (a).
Proof of Theorem 2.1. Let o = ¦(a, b]: −∞ ≤ a < b < ∞¦. Set F(∞) = lim
x↑∞
F(x)
and F(−∞) = lim
x↓−∞
F(x). These quantities exist since F is increasing. Deﬁne for
any
µ(a, b] = F(b) −F(a),
for any −∞ ≤ a < b ≤ ∞, where F(∞) > −∞, F(−∞) < ∞. Suppose (a, b] =
n
_
i=1
(a
i
, b
i
], where the union is disjoint. By relabeling we may assume that
a
1
= a
b
n
= b
a
i
= b
i−1
.
13
Then µ(a
i
, b
i
] = F(b
i
) −F(a
i
) and
n

i==1
µ(a
i
, b
i
] =
n

i=1
F(b
i
) −F(a
i
)
= F(b) −F(a)
= µ(a, b],
which proves that condition (i) holds.
For (ii), let −∞ < a < b < ∞ and (a, b] ⊂

_
i=1
(a
i
, b
i
] where the union is
disjoint. (We can also order them if we want.) By right continuity of F, given
ε > 0 there is a δ > 0 such that
F(a +δ) −F(a) < ,
or equivalently,
F(a +δ) < F(a) +ε.
Similarly, there is a η
i
> 0 such that
F(b
i

i
) < F(b
i
) +ε2
−i
,
for all i. Now, ¦(a
i
, b
i
+ η
i
)¦ forms a open cover for [a + δ, b]. By compactness,
there is a ﬁnite subcover. Thus,
[a +δ, b] ⊂
N
_
i=1
(a
i
, b
i

i
)
and
(a +δ, b] ⊂
N
_
i=1
(a
i
, b
i

i
].
14
Therefore by (b) of Lemma 2.2,
F(b) −F(a +δ) = µ(a +δ, b]

N

i=1
µ(a
i
, b
i

i
]
=
N

i=1
F(b
i

i
) −F(a
i
)
=
N

i=1
¦F(b
i

i
) −F(b
i
) +F(b
i
) −F(a
i

N

i=1
ε2
−i
+

i=1
(F(b
i
) −F(a
i
))
≤ ε +

i=1
F(b
i
) −F(a
i
).
Therefore,
µ(a, b] = F(b) −F(a)
≤ 2ε +

i=1
F(b
i
) −F(a
i
)
= 2ε +

i=1
µ(a
i
, b
i
],
proving (ii) provided −∞ < a < b < ∞.
If (a, b] ⊂

_
i=1
(a
i
, b
i
], a and b arbitrary, and (A, B] ⊂ (a, b] for any −∞ < A <
B < ∞, we have by above
F(B) −F(A) ≤

i=1
(F(b
i
) −F(a
i
))
and the result follows by taking limits.
If F(x) = x, µ is called the Lebesgue measure on R. If
F(x) =
_
¸
_
¸
_
0, x ≤ 0
x, 0 < x ≤ 1
1, x > 1
15
the measure we obtain is called the Lebesgue measure on Ω = (0, 1]. Notice that
µ(Ω) = 1.
If µ is a probability measure then F(x) = µ(−∞, x] and lim
x→∞
F(x) = 1.
lim
x↓−∞
F(x) = 0.
Problem 2.2. Let F be the distribution function deﬁned by
F(x) =
_
¸
¸
¸
_
¸
¸
¸
_
0, x < −1
1 +x, −1 ≤ x < 0
2 +x
2
, 0 ≤ x < 2
9, x ≥ 2
and let µ be the Lebesgue–Stieltjes measure corresponding to F. Find µ(E) for
(i) E = ¦2¦,
(ii) E = [−1/2, 3),
(iii) E = (−1, 0] ∪ (1, 2),
(iv) E = ¦x: [x[ + 2x
2
> 1¦.
Proof of Theorem 2.3. For any E ⊂ Ω we deﬁne µ

(E) = inf

µ(A
i
) where the
inﬁmum is taken over all sequences of ¦A
i
¦ in / such that E ⊂ ∪A
i
. Let /

be
the collection of all subsets E ⊂ Ω with the property that
µ

(F) = µ

(F ∩ E) +µ

(F ∩ E
c
),
for all sets F ⊂ Ω. These two quantities satisfy:
(i) /

is a σ–algebra and µ

is a measure on /

.
(ii) If µ

(E) = 0, then E ∈ /

.
(iii) / ⊂ /

and µ

(E) = µ(E), if E ⊂ /.
We begin the proof of (i)–(iii) with a simple but very useful observation. It
follows easily from the deﬁnition that E
1
⊂ E
2
implies µ

(E
1
) ≤ µ

(E
2
) and that
16
E ⊂ ∪

j=1
E
j
implies
µ

(E) ≤

j=1
µ

(E
j
).
Therefore,
µ

(F) ≤ µ

(F ∩ E) +µ

(F ∩ E
c
)
is always true. Hence, to prove that E ∈ /

, we need to verify that
µ

(F) ≥ µ

(F ∩ E) +µ

(F ∩ E
c
),
for all F ∈ Ω. Clearly by symmetry if E ∈ /

we have E
c
∈ /

.
Suppose E
1
and E
2
are in /

. Then for all F ⊂ Ω,
µ

(F) = µ

(F ∩ E
1
) +µ

(F ∩ E
c
1
)
= (µ

(F ∩ E
1
∩ E
2
) +µ

(F ∩ E
1
∩ E
c
2
))
+ (µ

(F ∩ E
c
1
∩ E
2
) +µ

(E ∩ E
c
1
∩ E
c
2
))
≥ µ

(F ∩ (E
1
∪ E
2
)) +µ

(F ∩ (E
1
∪ E
2
)
c
),
where we used the fact that
E
1
∪ E
2
⊂ (E
1
∩ E
2
) ∪ (E
1
∩ E
c
2
) ∪ (E
c
1
∩ E
2
)

observed above. We conclude that E
1
∪E
2
∈ /

. That
is, /

is an algebra.
Now, suppose E
j
∈ /

are disjoint. Let E =

j=1
E
j
and A
n
=
n

j=1
E
j
. Since
E
n
∈ /

we have (applying the deﬁnition with the set F ∩ A
n
)
µ

(F ∩ A
n
) = µ

(F ∩ A
n
∩ E
n
) +µ

(F ∩ A
n
∩ E
c
n
)
= µ

(F ∩ E
n
) +µ

(F ∩ A
n−1
)
= µ

(F ∩ E
n
) +µ(F ∩ E
n−1
) +µ

(F ∩ A
n−2
)
=
n

j=1
µ

(F ∩ E
j
).
17
Now, the measurability of A
n
together with this gives
µ

(F) = µ

(F ∩ A
n
) +µ

(F ∩ A
c
n
)
=
n

j=1
µ

(F ∩ E
j
) +µ

(F ∩ A
c
n
)

n

j=1
µ

(F ∩ E
j
) +µ

(F ∩ E
c
).
Let n → ∞ we ﬁnd that
µ

(F) ≥

j=1
µ

(F ∩ E
j
) +µ

(F ∩ E
c
)
≥ µ

(∪

j=1
(F ∩ E
j
)) +µ

(F ∩ E
c
)
= µ

(F ∩ E) +µ

(F ∩ E
c
) ≥ µ

(F),
which proves that E ∈ /

. If we take F = E we obtain
µ

(E) =

j=1
µ

(E
j
).
From this we conclude that /

is closed under countable disjoint unions and that
µ

is countably additive. Since any countable union can be written as the disjoint
countable union, we see that /

is a σ algebra and that µ

is a measure on it.
This proves (i).
If µ

(E) = 0 and F ⊂ Ω, then
µ

(F ∩ E) +µ

(F ∩ E
c
) = µ

(F ∩ E
c
)
≤ µ

(F).
Thus, E ∈ /

and we have proved (ii).
For (iii), let E ∈ /. Clearly
µ

(E) ≤ µ(E).
18
Next, if E ⊂

_
j=1
E
i
, E
i
∈ /, we have E =

_
j=1
˜
E
i
, where
˜
E
j
= E ∩
_
E
j
_
j−1
_
i=1
E
j
_
and these sets are disjoint and their union is E. Since µ is a measure on /, we
have
µ(E) =

j=1
µ(
˜
E
j
) ≤

j=1
µ(E
j
).
Since this holds for any countable covering of E by sets in /, we have µ(E) ≤
µ

(E). Hence
µ(E) = µ

(E), for all E ∈ /.
Next, let E ∈ /. Let F ⊂ Ω and assume µ

(F) < ∞. For any ε > 0, choose
E
j
∈ / with F ⊂

_
j=1
E
j
and

j=1
µ(E
j
) ≤ µ

(F) +ε.
Using again the fact that µ is a measure on /,
µ

(F) +ε ≥

j=1
µ(E
j
)
=

j=1
µ(E
j
∩ E) +

j=1
µ(E
j
∩ E
c
)
≥ µ

(F ∩ E) +µ

(F ∩ E
c
)
and since ε > 0 is arbitrary, we have that E ∈ /

. This completes the proof of
(iii).
With (i)–(iii) out of the way, it is clear how to deﬁne µ

. Since A ⊂ /

,
and /

is a σ–algebra, σ(/) ⊂ /

. Deﬁne µ(E) = µ

(E) for E ∈ σ(/). This is
clearly a measure and it remains to prove that it is unique under the hypothesis
of σ–ﬁniteness of µ. First, the construction of the measure µ

clearly shows that
whenever µ is ﬁnite or σ–ﬁnite, so are the measure µ

and µ.
19
Suppose there is another measure ˜ µ on σ(/) with µ(E) = ˜ µ(E) for all E ∈ /.
Let E ∈ σ(/) have ﬁnite µ

measure. Since σ(A) ⊂ /

,
µ

(E) = inf
_
_
_

j=1
µ(E
j
): E ⊂

_
j=1
E
j
, E
j
∈ /
_
_
_
.
However, since µ(E
j
) = ˜ µ(E
j
), we see that
˜ µ(E) ≤

j=1
˜ µ(E
j
)
=

j=1
µ(E
j
).
This shows that
˜ µ(E) ≤ µ

(E).
Now let E
j
∈ / be such that E ⊂ ∪

j=1
E
j
and

j=1
µ(E
j
) ≤ µ

(E) +ε.
Set
˜
E = ∪

j=1
E
j
and
˜
E
n
= ∪
n
j=1
E
j
. Then
µ

(
˜
E) = lim
k→∞
= lim
k→∞
˜ µ(E
n
)
= ˜ µ(
˜
E)
Since
µ

(
˜
E) ≤ µ

(E) +ε,
we have
µ

(
˜
E¸E) ≤ ε.
20
Hence,
µ

(E) ≤ µ

(
˜
E)
= ˜ µ(
˜
E)
≤ ˜ µ(E) + ˜ µ(E¸E)
≤ ˜ µ(E) +ε.
Since ε > 0 is arbitrary, µ(E) = µ

(E) = ˜ µ

(E) for all E ∈ σ(/) of ﬁnite µ

measure. Since µ

is σ–ﬁnite, we can write any set E = ∪

j=1
(Ω
j
∩ E) where the
union is disjoint and each of these sets have ﬁnite µ

measure. Using the fact that
both ˜ µ and µ are measures, the uniqueness follows from what we have done for
the ﬁnite case.
What is the diﬀerence between σ(/) and /

? To properly answer this ques-
tion we need the following
Deﬁnition 2.5. The measure space (Ω, T, µ) is said to be complete if whenever
E ∈ T and µ(E) = 0 then A ∈ T for all A ⊂ E.
By (ii), the measure space (Ω, /

, µ

) is complete. Now, if (Ω, T, µ) is a
measure space we deﬁne T

= ¦E ∪ N : E ∈ T, and N ∈ T, µ(N) = 0¦. We
leave the easy exercise to the reader to check that T

is a σ–algebra. We extend
the measure µ to a measure on T

by deﬁning µ

(E ∪ N) = µ(E). The measure
space (Ω, T

, µ

) is clearly complete. This measure space is called the completion
of (Ω, T, µ). We can now answer the above question.
Theorem 2.4. The space (Ω, /

, µ

) is the completion of (Ω, σ(/), µ).
21
II
INTEGRATION THEORY
¸1 Measurable Functions.
In this section we will assume that the space (Ω.T, µ) is σ–ﬁnite. We will say
that the set A ⊂ Ω is measurable if A ∈ T. When we say that A ⊂ R is measurable
we will always mean with respect to the Borel σ–algebra B as deﬁned in the last
chapter.
Deﬁnition 1.1. Let (Ω, T) be a measurable space. Let f be an extended real
valued function deﬁned on Ω. That is, the function f is allowed to take values in
¦+∞, ∞¦. f is measurable relative to T if ¦ω ∈ Ω : f(ω) > α¦ ∈ T for all α ∈ R.
Remark 1.1. When (Ω, T, P) is a probability space and f : Ω → R, we refer to
measurable functions as random variables.
Example 1.1. Let A ⊂ Ω be a measurable set. The indicator function of this set
is deﬁned by
1
A
(ω) =
_
1 if ω ∈ A
0 else.
This function is clearly measurable since
¦x: 1
A
(ω) < α¦ =
_
¸
_
¸
_
Ω 1 ≤ α
∅ α < 0
A
c
0 ≤ α < 1.
This deﬁnition is equivalent to several others as seen by the following
22
Proposition 1.1. The following conditions are equivalent.
(i) ¦ω : f(ω) > α¦ ∈ T for all α ∈ R,
(ii) ¦ω : f(ω) ≤ α¦ ∈ T for all α ∈ R,
(iii) ¦ω : f(ω) < α¦ ∈ T for all α ∈ R,
(ii) ¦ω : f(ω) ≥ α¦ ∈ T for all α ∈ R.
Proof. These follow from the fact that σ–algebras are closed under countable
unions, intersections, and complementations together with the following two iden-
tities.
¦ω : f(ω) ≥ α¦ =

n=1
¦ω : f(ω) > α −
1
n
¦
and
¦ω : f(ω) > α¦ =

_
n=1
¦ω : f(ω) ≥ α +
1
n
¦
Problem 1.1. Let f be a measurable function on (Ω, T). Prove that the sets
¦ω : f(ω) = +∞¦, ¦ω : f(ω) = −∞¦, ¦ω : f(ω) < ∞¦, ¦ω : f(ω) > −∞¦, and
¦ω : −∞ < f(ω) < ∞¦ are all measurable.
Problem 1.2.
(i) Let (Ω, T, P) be a probability space. Let f : Ω →R. Prove that f is measurable
if and only if f
−1
(E) = ¦ω : f(ω) ∈ E¦ ∈ T for every Borel set E ⊂ R.
(ii) With f as in (i) deﬁne µ on the Borel sets of R by µ(A) = P¦ω ∈ Ω : f(ω) ∈
A¦. Prove that µ is a probability measure on (R, B).
Proposition 1.2. If f and f
2
are measurable, so are the functions f
1
+f
2
, f
1
f
2
,
max(f
1
, f
2
), min(f
1
, f
2
) and cf
1
, for any constant c.
23
Proof. For the sum note that
¦ω : f
1
(ω) +f
2
(ω)¦ =
_
(¦ω : f
1
(ω) < r¦ ∩ ¦ω : f
2
(ω) < α −r¦) ,
where the union is taken over all the rational. Again, the fact that countable
unions of measurable sets are measurable implies the measurability of the sum. In
the same way,
¦ω : max(f
1
(ω), f
2
(ω)) > α¦ = ¦ω : f
1
(ω) > α¦ ∪ ¦ω : f
2
(ω) > α¦
gives the measurability of max(f
1
, f
2
). The min(f
1
, f
2
) follows from this by taking
complements. As for the product, ﬁrst observe that
¦ω : f
2
1
(ω) > α¦ = ¦ω : f
1
(ω) >

α¦ ∪ ¦ω : f
1
(ω) < −

α¦
and hence f
2
1
is measurable. But then writing
f
1
f
2
=
1
2
_
(f
1
+f
2
)
2
−f
2
1
−f
2
2
¸
,
gives the measurability of the product.
Proposition 1.3. Let ¦f
n
¦ be a sequence of measurable functions on (Ω, T), then
f = inf
n
f
n
, sup
n
f
n
, limsup f
n
, liminf f
n
are measurable functions.
Proof. Clearly ¦inf
n
f
n
< α¦ = ∪¦f
n
< α¦ and ¦sup
n
f
n
> α¦ = ∪¦f
n
> α¦ and
hence both sets are measurable. Also,
limsup
n→∞
f
n
= inf
n
_
sup
m≥n
f
m
_
and
liminf
n→∞
f
n
= sup
n
_
inf
m≥n
f
m
_
,
the result follows from the ﬁrst part.
24
Problem 1.3. Let f
n
be a sequence of measurable functions. Let E = ¦ω ∈ Ω :
limf
n
(ω) exists¦. Prove that E is measurable.
Problem 1.4. Let f
n
be a sequence of measurable functions converging pointwise
to the function f. Proof that f is measurable.
Proposition 1.4.
(i) Let Ω be a metric space and suppose the collection of all open sets are in the
sigma algebra T. Suppose f : Ω → R is continuous. Then f is measurable.
In particular, a continuous function f : R
n
→R is measurable relative to the
Borel σ–algebra in R
n
.
(ii) Let ψ : R → R be continuous and f : Ω → R be measurable. Then ψ(f) is
measurable.
Proof. These both follows from the fact that for every continuous function f,
¦ω : f(ω) > α¦ = f
−1
(α, ∞) is open for every α.
Problem 1.5. Suppose f is a measurable function. Prove that
(i) f
p
, p ≥ 1,
(ii) [f[
p
, p > 0,
(iii) f
+
= max(f, 0),
(iv) f

= −min(f, 0)
are all measurable functions.
Deﬁnition 1.2. Let f: Ω →R be measurable. The sigma algebra generated by f
is the sigma algebra in Ω
1
generated by the collection ¦f
−1
(A): A ∈ B¦. This is
denoted by σ(f).
25
Deﬁnition 1.3. A function ϕ deﬁne on (Ω, T, µ) is a simple function if ϕ(w) =
n

i=1
a
i
1
A
i
where the A

i
s are disjoint measurable sets which form a partition of Ω,
(

A
i
= Ω) and the a

i
s are constants.
Theorem 1.1. Let f: Ω → [0, ∞] be measurable. There exists a sequence of simple
functions ¦ϕ
n
¦ on Ω with the property that 0 ≤ ϕ
1
(ω) ≤ ϕ
2
(ω) ≤ . . . ≤ f(ω) and
ϕ
n
(ω) → f(ω), for every ω ∈ Ω.
Proof. Fix n ≥ 1 and for i = 1, 2, . . . , n2
n
, deﬁne the measurable sets
A
n
i
= f
−1
_
i −1
2
n
,
i
2
n
_
.
Set
F
n
= f
−1
([n, ∞])
and deﬁne the simple functions
ϕ
n
(ω) =
n2
n

i=1
i −1
2
n
1
A
n
i
(ω) +n1
F
n
(ω).
clearly ϕ
n
is a simple function and it satisﬁes ϕ
n
(ω) ≤ ϕ
n+1
(ω) and ϕ
n
(ω) ≤ f(ω)
for all ω.
Fix ε > 0. Let ω ∈ Ω. If f(ω) < ∞, then pick n so large that 2
−n
< ε and
f(ω) < n. Then f(ω) ∈
_
i−1
2
n
,
i
2
n
_
for some i = 1, 2, . . . n2
n
. Thus,
ϕ
n
(ω) =
i −1
2
n
and so,
f(x)) −ϕ
n
(ω) < 2
−n
.
By our deﬁnition, if f(ω) = ∞ then ϕ
n
(ω) = n for all n and we are done.
¸2 The Integral: Deﬁnition and Basic Properties.
26
Deﬁnition 2.1. Let (Ω, T, µ) be a measure space.
(i) If ϕ(w) =
n

i=1
a
i
1
A
i
is a simple function and E ∈ T is measurable, we deﬁne
the integral of the function ϕ over the set E by
_
E
ϕ(w)dµ =
n

i=1
a
i
µ(A
i
∩ E). (2.1)
(We adapt the convention here, and for the rest of these notes, that 0∞ = 0.)
(ii) If f ≥ 0 is measurable we deﬁne the integral of f over the set E by
_
E
fdµ = sup
ϕ
_
E
ϕdµ, (2.2)
where the sup is over all simple functions ϕ with 0 ≤ ϕ ≤ f.
(iii) If f is measurable and at least one of the quantities
_
E
f
+
dµ or
_
E
f

dµ is
ﬁnite, we deﬁne the integral of f over E to be
_
E
fdµ =
_
E
f
+
dµ −
_
E
f

dµ.
(iv) If
_
E
[f[dµ =
_
E
f
+
dµ +
_
E
f

dµ < ∞
we say that the function f is integrable over the set E. If E = Ω we denote
this collection of functions by L
1
(µ).
We should remark here that since in our deﬁnition of simple functions we did
not required the constants a
i
to be distinct, we may have diﬀerent representations
for the simple functions ϕ. For example, if A
1
and A
2
are two disjoint measurable
sets then 1
A
1
∪A
2
and 1
A
1
+ 1
A
2
both represents the same simple function. It is
clear from our deﬁnition of the integral that such representations lead to the same
quantity and hence the integral is well deﬁned.
Here are some basic and easy properties of the integral.
27
Proposition 2.1. Let f and g be two measurable functions on (Ω, T, µ).
(i) If f ≤ g on E, then
_
E
fdµ ≤
_
E
gdµ.
(ii) If A ⊂ B and f ≥ 0, then
_
A
fdµ ≤
_
B
fdµ.
(iii) If c is a constant, then
_
E
cfdµ = c
_
E
fdµ.
(iv) f ≡ 0 on E, then
_
E
fdµ = 0 even if µ(E) = ∞.
(v) If µ(E) = 0, then
_
E
fdµ = 0 even if f(x) = ∞ on E.
(vi) If f ≥ 0, then
_
E
fdµ =
_

gf
E
fdµ.
Proposition 2.2. Let (Ω, T, µ) be a measure space. Suppose ϕ and ψ are simple
functions.
(i) For E ∈ T deﬁne
ν(E) =
_
E
ϕdµ.
The ν is a measure on T.
(ii)
_

(ϕ +ψ)dµ =
_

ϕdµ +
_

ψdµ.
Proof. Let E
i
∈ T, E =

E
i
. Then
ν(E) =
_
E
ϕdµ =
n

i=1
α
i
µ(A
i
∩ E)
=
n

i=1
α
i

j=1
µ(A
i
∩ E
j
) =

j=1
n

i=1
α
i
µ(A
i
∩ E
j
)
=

i=1
ν(E
i
).
By the deﬁnition of the integral, µ(∅) = 0. This proves (i). (ii) follows from
(i) and we leave it to the reader.
28
We now come to the “three” important limit theorems of integration: The
Lebesgue Monotone Convergence Theorem, Fatou’s lemma and the Lebesgue Dom-
inated Convergence theorem.
Theorem 2.1 (Lebesgue Monotone Convergence Theorem). Suppose ¦f
n
¦
is a sequence of measurable functions satisfying:
(i) 0 ≤ f
1
(ω) ≤ f
2
(ω) ≤ . . . , for every ω ∈ Ω,
and
(ii) f
n
(ω) ↑ f(ω), for every ω ∈ Ω.
Then
_

f
n
dµ ↑
_
fdµ.
Proof. Set
α
n
=
_

f
n
dµ.
Then α
n
is nondecreasing and it converges to α ∈ [0, ∞]. Since
_

f
n
dµ ≤
_

fdµ,
for all n we see that if α = ∞, then
_

fdµ = ∞ and we are done. Assume
_

fdµ < ∞. Since
α ≤
_

fdµ.
we need to prove the opposite inequality. Let 0 ≤ ϕ ≤ f be simple and let 0 < c <
1. Set
E
n
= ¦ω: f
n
(ω) ≥ c ϕ(ω)¦.
Clearly, E
1
⊂ E
2
⊂ . . . In addition, suppose ω ∈ Ω. If f(ω) = 0 then ϕ(ω) = 0
and ω ∈ E
1
. If f(ω) > 0 then cs(ω) < f(ω) and since f
n
(ω) ↑ f(ω), we have that
29
ω ∈ E
n
some n. Hence

E
n
= Ω, or in our notation of Proposition 2.1, Chapter
I, E
n
↑ Ω. Hence
_

f
n
dµ ≥
_
E
n
f
n

≥ c
_
E
n
ϕ(ω)dµ
= cν(E
n
).
Let n ↑ ∞. By Proposition 2.2 above and Proposition 2.1 of Chapter I,
α ≥ lim
n→∞
_

f
n
dµ ≥ cν(Ω) = c
_

ϕdµ
and therefore,
_

E
ϕdµ ≤ α,
for all simple ϕ ≤ f and
sup
ϕ≤f
_

ϕdµ ≤ α,
proving the desired inequality.
Corollary 2.1. Let ¦f
n
¦ be a sequence of nonnegative measurable functions and
set
f =

n=1
f
n
(ω).
Then
_

fdµ =

n=1
_

f
n
dµ.
Proof. Apply Theorem 2.1 to the sequence of functions
g
n
=
n

j=1
f
j
.

30
Corollary 2.2 (First Borel–Contelli Lemma). Let ¦A
n
¦ be a sequence of
measurable sets. Suppose

n=1
µ(A
n
) < ∞.
Then µ¦A
n
, i.o.¦ = 0.
Proof. Let f(ω) =

n=1
1
A
n
(ω)
. Then
_

f(ω)dµ =

n=1
_

1
A
n

=

n=1
µ(A
n
) < ∞.
Thus, f(ω) < ∞ for almost every ω ∈ Ω. That is, the set A where f(ω) = ∞ has
µ–measure 0. However, f(ω) = ∞ if and only if ω ∈ A
n
for inﬁnitely many n. This
proves the corollary.
Let µ be the counting measure on Ω = ¦1, 2, 3, . . . ¦ and deﬁne the measurable
functions f by f(j) = a
j
where a
j
is a sequence of nonnegative constants. Then
_

f(j)dµ(j) =

j=1
a
j
.
From this and Theorem 2.1 we have
Corollary 2.3. Let a
ij
≥ 0 for all i, j. Then

i=1

j=1
a
ij
=

j=1

i=1
a
ij
.
The above theorem together with theorem 1.1 and proposition 2.2 gives
31
Corollary 2.4. Let f be a nonnegative measurable function. Deﬁne
ν(E) =
_
E
fdµ.
Then ν is a measure and
_

gdν =
_

gfdµ
for all nonnegative measurable functions g.
Theorem 2.2 (Fatou’s Lemma). Let ¦f
n
¦ be a sequence of nonnegative mea-
surable functions. Then
_

liminf f
n
dµ ≤ liminf
_

f
n
dµ.
Proof. Set
g
n
(ω) = inf
m≥n
f
m
(ω), n = 1, 2, . . .
Then ¦g
n
¦ is a sequence of nonnegative measurable functions satisfying the hy-
pothesis of Theorem 2.1. Since
lim
n→∞
g
n
(ω) = liminf
n→∞
f
n
(ω)
and
_
g
n
dµ ≤
_
f
n
dµ,
Theorem 2.1 gives
_

liminf
n→∞
f
n
dµ =
_

lim
n
g
n

= lim
n→∞
_

g
n

≤ liminf
n→∞
_

f
n
dµ.
This proves the theorem.
32
Proposition 2.2. Let f be a measurable function. Then
¸
¸
_

fdµ
¸
¸

_

[f[dµ.
Proof. We assume the right hand side is ﬁnite. Set β =
_

fdµ. Take α = sign(β)
so that αβ = [β[. Then
¸
¸
¸
¸
_

fdµ
¸
¸
¸
¸
= [β[
= α
_

fdµ
=
_

αfdµ

_

[f[dµ.
Theorem 2.3 (The Lebesgue Dominated Convergence Theorem ). Let
¦f
n
¦ be a sequence of measurable functions such that f
n
(ω) → f(ω) for every
ω ∈ Ω. Suppose there is a g ∈ L
1
(µ) with [f
n
(ω)[ ≤ g(ω). Then f ∈ L
1
(µ) and
lim
n→∞
_

[f
n
−f[dµ = 0.
In particular,
lim
n→∞
_

f
n
dµ =
_

fdµ.
Proof. Since [f(ω)[ = lim
n→∞
[f
n
(ω)[ ≤ g(ω) we see that f ∈ L
1
(µ). Since [f
n
−f[ ≤
2g, 0 ≤ 2g −[f
n
−f[ and Fatou’s Lemma gives
0 ≤
_

2gdµ ≤ lim
n→∞
_

2gdµ + lim
_

_

[f
n
−f[dµ
_
=
_

2gdµ −lim
_

[f
n
−f[dµ.
33
It follows from this that
lim
_

[f
n
−f[dµ = 0.
Since
¸
¸
¸
¸
_

[f
n
dµ −
_

fdµ
¸
¸
¸
¸

_

[f
n
−f[dµ,
the ﬁrst part follows. The second part follows from the ﬁrst.
Deﬁnition 2.2. Let (Ω, T, µ) be a measure space. Let P be a property which a
point ω ∈ Ω may or may not have. We say that P holds almost everywhere on
E, and write this as i.e. , if there exists a measurable subset N ⊂ E such that P
holds for all E¸N and µ(N) = 0.
For example, we say that f
n
→ f almost everywhere if f
n
(ω) → f(ω) for all
ω ∈ Ω except for a set of measure zero. In the same way, f = 0 almost everywhere
if f(ω) = 0 except for a set of measure zero.
Proposition 2.3 (Chebyshev’s Inequality). Fix 0 < p < ∞ and let f be a
nonnegative measurable function on (Ω, T, µ). Then for any measurable set e we
have
µ¦ω ∈ E: ∈: f(ω) > λ¦ ≤
1
λ
p
_
E
f
p
dµ: .
Proof.
λ
p
µ¦ω ∈ E: f(ω) > λ¦ =
_
{ω∈E:f(ω)>λ}
λ
p

_
E
f
p
dµ ≤
_
E
f
p

which proves the proposition.
Proposition 2.4.
34
(i) Let f be a nonnegative measurable function. Suppose
_
E
fdµ = 0.
Then f = 0 a.e. on E.
(ii) Suppose f ∈ L
1
(µ) and
_
E
fdµ = 0 for all measurable sets E ⊂ Ω. Then
f = 0 a.e. on Ω.
Proof. Observe that
¦ω ∈ E: f(ω) > 0¦ =

_
n=1
¦ω ∈ Ω: f(ω) > 1/n¦.
By Proposition 2.3,
µ¦ω ∈ E: f(ω) > 1/n¦ ≤ n
_
E
fdµ = 0.
Therefore, µ¦f(ω) > 0¦ = 0, which proves (i).
For (ii), set E = ¦ω: f(ω) ≥ 0¦ = ¦ω: f(ω) = f
+
(ω)¦. Then
_
E
f
+
dµ =
_
E
fdµ = 0,
which by (i) implies that f
+
= 0, a.e. But then
_
E
fdµ = −
_
E
f

dµ = 0
and this again gives
_
E
f

dµ = 0
which implies that f

= 0, a.e.
35
Deﬁnition 2.3. The function ψ: (a, b) → R (the “interval” (a, b) = R is permit-
ted) is convex if
ψ((1 −λ)x +λy) ≤ (1 −λ)ψ(x) +λψ(y), (2.3)
for all 0 ≤ λ ≤ 1.
An important property of convex functions is that they are always continuous.
This “easy” to see geometrically but the proof is not as trivial. What follows easily
from the deﬁnition is
Problem 2.1. Prove that (2.3) is equivalent to the following statement: For all
a < s < t < u < b,
ϕ(t) −ϕ(s)
t −s

ϕ(u) −ϕ(t)
u −t
.
and conclude that a diﬀerentiable function is convex if and only if its derivative is
a nondecreasing function.
Proposition 2.5 (Jensen’s Inequality). Let (Ω, T, µ) be a probability space.
Let f ∈ L
1
(µ) and a < f(ω) < b. Suppose ψ is convex on (a, b). The ψ(f) is
measurable and
ψ
__

fdµ
_

_

ψ(f)dµ.
Proof. The measurability of the function ψ(f) follows from the continuity of ψ
and the measurability of f using Proposition 1.4. Since a < f(ω) < b for all ω ∈ Ω
and µ is a probability measure, we see that if
t =
_

fdµ,
then a < t < b. Let (x) = c
1
x + c
2
be the equation of the supporting line of
the convex function ψ at the point (t, ψ(t)). That is, satisﬁes (t) = ψ(t) and
36
ψ(x) ≥ (x) for all x ∈ (a, b). The existence of such a line follows from Problem
2.1. The for all ω ∈ Ω,
ψ(f(ω)) ≥ c
1
f(ω) +c
2
= (f(ω)).
Integrating this inequality and using the fact that µ(Ω) = 1, we have
_

ψ(f(ω))dµ ≥ c
1
_

f(ω)dµ +c
2
=
__

f(ω)dµ
_
= ψ
__

f(ω)dµ
_
,
which is the desired inequality.
Examples.
(i) Let ϕ(x) = e
x
. Then
exp
_

fdµ ≤
_

e
f
dµ.
(ii) If Ω = ¦1, 2, . . . , n¦ with the measure µ deﬁned by µ¦i¦ = 1/n and the
function f given by f(i) = x
i
, we obtain
exp¦
1
n
(x
1
+x
2
+. . . +x
n
)¦ ≤
1
n
¦e
x
1
+. . . +e
x
n
¦.
Setting y
i
= e
x
i
we obtain the Geometric mean inequality. That is,
(y
1
, . . . , y
n
)
1/n

1
n
(y
1
+. . . +y
n
) .
More generally, extend this example in the following way.
Problem 2.2. Let α
1
, , α
n
be a sequence of positive numbers with α
1
+ +
α
n
= 1 and let y
1
, , y
n
be positive numbers. Prove that
y
α
1
1
y
α
n
n
≤ α
1
y
1
+ +α
n
y
n
.
37
Deﬁnition 2.4. Let (Ω, T, µ) be a measure space. Let 0 < p < ∞ and set
|f|
p
=
__

[f[
p

_
1/p
.
We say that f ∈ L
p
(µ) if |f|
p
< ∞. To deﬁne L

(µ) we set
E = ¦m ∈ R
+
: µ¦ω: [f(ω)[ > m¦ = 0¦.
If E = ∅, deﬁne |f|

= ∞. If E ,= ∅, deﬁne |f|

= inf E. The function
f ∈ L

(µ) if |f|

< ∞.
Suppose |f|

< ∞. Since
f
−1
(|f|

, ∞] =

_
n=1
f
−1
_
|f|

+
1
n
, ∞
_
and µf
−1
(|f|

+
1
n
, ∞] =, we see |f|

∈ E. The quantity |f|

is called the
essential supremum of f.
Theorem 2.4.
(i) (H¨older’s inequality) Let 1 ≤ p ≤ ∞ and let q be its conjugate exponent. That
is,
1
p
+
1
q
= 1. If p = 1 we take q = ∞. Also note that when p = 2, q = 2.
Let f ∈ L
p
(µ) and g ∈ l
q
(µ). Then fg ∈ L
1
(µ) and
_
[fg[dµ ≤ |f|
p
|g|
q
.
(ii) (Minkowski’s inequality) Let 1 ≤ p ≤ ∞. Then
|f +g|
p
≤ |f|
p
+|g|
p
.
Proof. If p = 1 and q = ∞, or q = 1 and p = ∞, we have [fg(ω)[ ≤ |g|

[f(ω)[.
This immediately gives the result when p = 1 or p = ∞. Assume 1 < p < ∞ and
(without loss of generality) that both f and g are nonnegative. Let
A =
__

f
p

_
1/p
and B =
__
g
q

_
1/q
38
If A = 0, then f = 0 almost everywhere and if B = 0, then g = 0 almost everywhere
and in either case the result follows. Assume 0 < A < ∞ and same for B. Put
F = f/A > 0, G = g/B. Then
_

F
p
dµ =
_

G
p
dµ = 1
and by Problem 2.2
F(ω) G(ω) ≤
1
p
F
p
+
1
q
G
q
.
Integrating both sides of this inequality gives
_
F(ω)G(ω) ≤
1
p
+
1
q
= 1,
which implies the (i) after multiplying by A B.
For (ii), the cases p = 1 and p = ∞ are again clear. Assume therefore that
1 < p < ∞. As before, we may assume that both f and G are nonnegative. We
start by observing that since the function ψ(x) = x
p
, x ∈ R
+
is convex, we have
_
f +g
2
_
p

1
2
f
p
+
1
2
g
p
.
This gives
_

(f +g)
p
dµ ≤ 2
−(p−1)
_

f
p
dµ + 2
−(p−1)
_

g
p
dµ.
Thus, f +g ∈ L
p
(dµ). Next,
(f +g)
p
= (f +g)(f +g)
p−1
= f(f +g)
p−1
+g(f +g)
p−1
together with H¨older’s inequality and the fact that q(p −1) = p, gives
_

f(f +g)
p−1
dµ ≤
__

f
p

_
1/p
__

(f +g)
q(p−1)

_
1/q
=
__

f
p

_
1/p
__

(f +g)
p

_
1/q
.
39
In the same way,
_

g(f +g)
p
dµ ≤
__

g
p

_
1/p
__

(f +g)
p

_
1/q
.
_

(f +g)
p
dµ ≤
_
__

f
p

_
1/p
+
__

g
p

_
1/p
_
__

(f +g)
p

_
1/q
.
Since f + g ∈ L
p
(dµ), we may divide by the last expression in brackets to obtain
the desired inequality.
For f, g ∈ L
p
(µ) deﬁne d(f, g) = |f − g|
p
. For 1 ≤ p ≤ ∞, Minkowski’s
inequality shows that this function satisﬁes the triangle inequality. That is,
d(f, g) = |f −g|
p
= |f −h +h −g|
p
≤ |f −h|
p
+|h −g|
p
= d(f, h) +d(h, g),
for all f, g, h ∈ L
p
(µ). It follows that L
p
(µ) is a metric space with respect to d(, ).
Theorem 2.4. L
p
(µ), 1 ≤ p ≤ ∞, is complete with respect to d( , ).
Lemma 2.1. Let g
k
be a sequence of functions in L
p
and (0 < p < ∞) satisfying
|g
k
−g
h+1
|
p

_
1
4
_
k
, k = 1, 2, . . ..
Then ¦g
k
¦ converges a.e.
Proof. Set
A
k
= ¦ω: [g
k
(ω) −g
k+1
(ω)[ > 2
−k
¦.
By Chebyshev’s inequality,
µ¦A
n
¦ ≤ 2
kp
_

[g
k
−g
k+1
[
p

_
1
4
_
kp
2
kp
=
1
2
kp
40
This shows that

µ(A
n
) < ∞.
By Corollary 2.2, µ¦A
n
i.o.¦ = 0. Thus, for almost all ω ∈ ¦A
n
i.o.¦
c
there is an
N = N(ω) such that
[g
k
(ω) −g
k+1
(ω)[ ≤ 2
−k
,
for all k > N. It follows from this that ¦g
k
(ω)¦ is Cauchy in R and hence ¦ g
k
(ω)¦
converges.
Lemma 2.2. The sequence of functions ¦f
n
¦ converges to f in L

if and only if
there is a set measurable set A with µ(A) = 0 such that f
n
→ f uniformly on A
c
.
Also, the sequence ¦f
n
¦ is Cauchy in L

if and only if there is a measurable set
A with µ(A) = 0 such that ¦f
n
¦ is uniformly Cauchy in A
c
.
Proof. We proof the ﬁrst statement, leaving the second to the second to the reader.
Suppose |f
n
− f|

→ 0. Then for each k > 1 there is an n > n(k) suﬃciently
large so that |f
n
− f|

<
1
k
. Thus, there is a set A
k
so such that µ(A
k
) = 0
and [f
n
(ω) − f(ω)[ <
1
k
for every ω ∈ A
c
k
. Let A = ∪A
k
. Then µ(A) = 0 and
f
n
→ f uniformly on A
c
. For the converse, suppose f
n
→ f uniformly on A
c
and
µ(A) = 0. Then given ε > 0 there is an N such that for all n > N and ω ∈ A
c
,
[f
n
(ω)−f(ω)[ < ε. This is the same as saying that |f
n
−f|

< ε for all n > N.
Proof of Theorem 2.4. Now, suppose ¦f
n
¦ is Cauchy in L
p
(µ). That is, given any
ε > 0, there is a N such that for all n, m ≥ N, d(f
n
, f
m
) = |f
n
−f
m
| < ε for all
n, m > N. Assume 1 ≤ p < ∞. The for each k = 1, 2, . . . , there is a n
k
such that
|f
n
−f
m
|
p
≤ (
1
4
)
k
for all n, m ≥ n
k
. Thus, f
n
k
(ω) → f a.e., by Lemma 2.1. We need to show that
f ∈ L
p
and that it is the L
p
(µ) limit of ¦f
n
¦. Let ε > 0. Take N so large that
41
|f
n
−f
m
|
p
< ε for all n, m ≥ N. Fix such an m. Then by the pointwise convergence
of the subsequence and by Fatou’s Lemma we have
_

[f −f
m
[
p
dµ =
_

lim
k→∞
[f
n
k
−f
m
[
p

≤ liminf
k→∞
_

[f
n
k
−f
m
[
p
< ε
p
.
Therefore f
n
→ f in L
p
(µ) and
_

[f −f
m
[
p
dµ < ∞
for m suﬃciently large. But then,
|f|
p
= |f
m
−f
m
−f|
p
≤ |f
m
|
p
+|f
m
−f|
p
,
which shows that f ∈ L
p
(µ).
Now, suppose p = ∞. Let ¦f
n
¦ be Cauchy in L

. There is a set A with
µ(A) = 0 such that f
n
is uniformly Cauchy on A
c
, by Lemma 2.2. That is, given
ε > 0 there is an N such that for all n, m > N and all ω ∈ A
c
,
[f
n
(ω) −f
m
(ω)[ < ε.
Therefore the sequence ¦f
n
¦ converges uniformly on A
c
to a function f. Deﬁne
f(ω) = 0 for ω ∈ A. Then f
n
converges to f in L

(µ) and f ∈ L

(µ).
In the course of proving Theorem 2.4 we proved that if a sequence of functions
in L
p
, 1 ≤ p < ∞ converges in L
p
(µ), then there is a subsequence which converges
a.e. This result is of suﬃcient importance that we list it here as a corollary.
Corollary 2.5. Let f
n
∈ L
p
(µ) with 1 ≤ p < ∞ and f
n
→ f in L
p
(µ). Then
there exists a subsequence ¦f
n
k
¦ with f
n
k
→ f a.e. as k → ∞.
The following Proposition will be useful later.
42
Proposition 2.6. Let f ∈ L
1
(µ). Given ε > 0 there exists a δ > 0 such that
_
E
[f[dµ < ε whenever µ(E) < δ.
Proof. Suppose the statement is false. Then we can ﬁnd an ε > 0 and a sequence
of measurable sets ¦E
n
¦ with
_
E
n
[f[dµ ≥ ε
and
µ(E
n
) <
1
2
n
Let A
n
= ∪

j=n
E
j
and A = ∩

n=1
A
n
= ¦E
n
i.o.¦. Then

µ(E
n
) < ∞ and by
the Borel–Cantelli Lemma, µ(A) = 0. Also, A
n+1
⊂ A
n
for all n and since
ν(E) =
_
E
[f[dµ
is a ﬁnite measure, we have
_
A
[f[dµ = lim
n→∞
_
A
n
[f[dµ
≥ lim
n→∞
_
E
n
[f[dµ
≥ ε.
This is a contradiction since µ(A) = 0 and therefore the integral of any function
over this set must be zero.
¸3 Types of convergence for measurable functions.
Deﬁnition 3.1. Let ¦f
n
¦ be a sequence of measurable functions on (Ω, T, µ).
(i) f
n
→ f in measure if for all ε > 0,
lim
n→∞
µ¦ω ∈ Ω: [f
n
(ω) −f(ω)[ ≥ ε¦ = 0.
(ii) f
n
→ f almost uniformly if given ε > 0 there is a set E ∈ T with µ(E) < ε
such that f
n
→ f uniformly on E
c
.
43
Proposition 3.1. Let ¦f
n
¦ be measurable and 0 < p < ∞. Suppose f
n
→ f in
L
p
. Then, f
n
→ f in measure.
Proof. By Chebyshev’s inequality
µ¦[f
n
−f[ ≥ ε¦ ≤
1
ε
p
_

[f
n
−f[
p
dµ,
and the result follows.
Example 3.1. Let Ω = [0, 1] with the Lebesgue measure. Let
f
n
(ω) =
_
e
n
0 ≤ ω ≤
1
n
0 else
Then f
n
→ 0 in measure but f
n
,→ 0 in L
p
(µ) for any 0 < p ≤ ∞. To see this
simply observe that
|f
n
|
p
=
_
1
0
[f
n
(x)[
p
dx =
1
n
e
np
→ ∞
and that |f
n
|

= e
n
→ ∞, as n → ∞.
Proposition 3.2. Suppose f
n
→ f almost uniformly. Then f
n
→ f in measure
and almost everywhere.
Proof. Since f
n
→ f almost uniformly, given ε > 0 there is a measurable set E
such that µ(E) < ε and f
n
→ f uniformly on E
c
. Let η > 0 be given. There is a
N = N(η) such that [f
n
(ω) −f(ω)[ < η for all n ≥ N and for all ω ∈ E
c
. That is,
¦µ: [f
n
(ω) −f(ω)[ ≥ η¦ ⊆ E, for all n ≥ N. Hence, for all n ≥ N,
µ¦[f
n
(ω) −f(ω)[ ≥ η¦ < ε.
Since ε > 0 was arbitrary we see that for all η > 0,
lim
n→∞
µ¦[f
n
(ω) −f(ω)[ ≥ η¦ = 0,
proving that f
n
→ f in measure.
Next, for each k take A
k
∈ T with µ(A
k
) <
1
k
and f
n
→ f uniformly on A
c
k
.
If E = ∪

k=1
A
c
k
, then f
n
→ f on E and µ(E
c
) = µ(∩

k=1
A
k
) ≤ µ(A
k
) <
1
k
for all
k. Thus µ(E
c
) = 0 and we have the almost everywhere convergence as well.
44
Proposition 3.3. Suppose f
n
→ f in measure. Then there is a subsequence ¦f
n
k
¦
which converges almost uniformly to f.
Proof. Since
µ¦[f
n
−f
m
[ ≥ ε¦ ≤ µ¦[f
n
−f[ ≥ ε/2¦ +µ¦[f
n
−f[ ≥ ε/2¦,
we see that µ¦f
n
−f
m
[ ≥ ε¦ → 0 as n and m → ∞. For each k, take n
k
such that
n
k+1
> n
k
and
µ¦[f
n
(ω) −f
m
(ω)[ ≥
1
2
k
¦ ≤
1
2
k
for all n, m ≥ n
k
. Setting g
k
= f
n
k
and A
k
= ¦ω ∈ Ω: [g
k+1
(ω) −g
k
(ω)[ ≥
1
2
k
¦ we
see that

k=1
µ(A
k
) < ∞.
By the Borel–Cantelli Lemma, Corollary 2.2, µ¦A
n
i.o.¦ = 0. However, for every
ω ,∈ ¦A
n
i.o.¦, there is an N = N(ω) such that
[g
k+1
(ω) −g
k
(ω)[ <
1
2
k
for all k ≥ N. This implies that the sequence of real numbers ¦g
k
(ω)¦ is Cauchy
and hence it converges to g(ω). Thus g
k
→ g a.e.
To get the almost uniform convergence, set E
n
= ∪

k=n
A
k
. Then µ(E
n
) ≤

k=n
µ(A
k
) and this can be made smaller than ε as soon as n is large enough. If
ω ,∈ E
n
, then
[g
k
(ω) −g
k+1
(ω)[ <
1
2
k
for all k ∈ ¦n, n + 1, n + 2, . . . ¦. Thus g
k
→ g uniformly on E
c
n
.
For the uniqueness, suppose f
n
→ f in measure. Then f
n
k
→ f in measure
also. Since we also have f
n
k
→ g almost uniformly clearly, f
n
k
→ g in measure
and hence f = g a.e. This completes the proof.
45
Theorem 3.1 (Egoroﬀ’s Theorem). Suppose (Ω, T, µ) is a ﬁnite measure space
and that f
n
→ f a.e. Then f
n
→ f almost uniformly.
Proof. We use Problem 3.2 below. Let ε > 0 be given. For each k there is a n(k)
such that if
A
k
=

_
n=n(k)
¦ω ∈ Ω: [f
n
−f[ ≥
1
k
¦,
then µ(A
k
) ≤ ε/2
k
. Thus if
A =

_
k=1
A
k
,
then µ(A) ≤

k=1
µ(A
k
) < ε. Now, if δ > 0 take k so large that
1
k
< δ and then for
any n > n(k) and ω ,∈ A, [f
n
(ω) − f(ω)[ <
1
k
< δ. Thus f
n
→ f uniformly on
A
c
.
Let us recall that if ¦y
n
¦ is a sequence a sequence of real numbers then
y
n
converges to y if and only if every subsequence ¦y
n
k
has a further subsequence
¦y
n
k
j
¦ which converges to y. For measurable functions we have the following result.
Proposition 3.3. The sequence of measurable functions ¦f
n
¦ on (Ω, T, µ) con-
verges to f in measure if and only if every subsequence ¦f
n
k
¦ contains a further
subsequence converging a.e. to f.
Proof. Let ε
k
be a sequence converging down to 0. Then µ¦[f
n
−f[ > ε
k
¦ → 0, as
n → ∞ for each k. We therefore have a subsequence f
n
k
satisfying
µ¦[f
n
k
−f[ > ε
k
¦ ≤
1
2
k
.
Hence,

k=1
µ¦[f
n
k
−f[ > ε
k
¦ < ∞
46
and therefore by the ﬁrst Borel–Cantelli Lemma, µ¦[f
n
k
−f[ > ε
k
i.o.¦ = 0. Thus
[f
n
k
−f[ < ε
k
eventually a.e. Thus f
n
k
→ f a.e.
For the converse, let ε > 0 and put y
n
= µ¦[f
n
− f[ > ε¦ and consider
the subsequence y
n
k
. If every f
n
k
subsequence has a subsequence f
n
k
j
such that
f
n
k
j
→ f a.e. Then ¦y
n
k
¦ has a subsequence. y
n
k
j
→ 0. Therefore ¦y
n
¦ converges
to 0 and hence That is f
n
→ 0 in measure.
Problem 3.1. Let Ω = [0, ∞) with the Lebesgue measure and deﬁne f
n
(ω) =
1
A
n
(ω) where A
n
= ¦ω ∈ Ω: n ≤ ω ≤ n +
1
n
¦. Prove that f
n
→ 0 a.e., in measure
and in L
p
(µ) but that f
n
,→ 0 almost uniformly.
Problem 3.2. Let (Ω, T, µ) be a ﬁnite measure space. Prove that f
n
→ f a.e. if
and only if for all ε > 0
lim
n→∞
µ
_

_
k=n
A
k
(ε)
_
= 0
where
A
k
(ε) = ¦ω ∈ Ω: [f
k
(ω) −f(ω)[ ≥ ε¦.
Problem 3.3.
(i) Give an example of a sequence of nonnegative measurable functions f
n
for
which we have strict inequality in Fatou’s Lemma.
(ii) Let (Ω, T, µ) be a measure space and ¦A
n
¦ be a sequence of measurable sets.
Recall that liminf
n
A
n
=

n=1

k=n
A
k
and prove that
µ¦liminf
n→∞
A
n
¦ ≤ liminf
n→∞
µ¦A
n
¦.
(converges) Suppose f
n
is a sequence of nonnegative measurable functions on (Ω, T, µ)
which is pointwise decreasing to f. That is, f
1
(ω) ≥ f
2
(ω) ≥ ≥ 0 and
47
f
n
(ω) → f(ω). Is it true that
lim
n→∞
_

f
n
dµ =
_

fdµ?
Problem 3.4. Let (Ω, T, P) be a probability space and suppose f ∈ L
1
(P). Prove
that
lim
p→0
|f|
p
= exp¦
_

log [f[dP¦
where exp¦−∞¦ is deﬁned to be zero.
Problem 3.5. Let (Ω, T, µ) be a ﬁnite measure space. Prove that the function
µ¦[f[ > λ¦, for λ > 0, is right continuous and nonincreasing. Furthermore, if
f, f
1
, f
2
are nonnegative measurable and λ
1
, λ
2
are positive numbers with the prop-
erty that f ≤ λ
1
f
1

2
f
2
, then for all λ > 0,
µ¦f > (λ
1

2
)λ¦ ≤ µ¦f
1
> λ¦ +µ¦f
2
> λ¦.
Problem 3.6. Let ¦f
n
¦ be a nondecreasing sequence of measurable nonnegative
functions converging a.e. on Ω to f. Prove that
lim
n→∞
µ¦f
n
> λ¦ = µ¦f > λ¦.
Problem 3.7. Let (Ω, T, µ) be a measure space and suppose ¦f
n
¦ is a sequence
of measurable functions satisfying

n=1
µ¦[f
n
[ > λ
n
¦ < ∞
for some sequence of real numbers λ
n
. Prove that
limsup
n→∞
[f
n
[
λ
n
≤ 1,
a.e.
48
Problem 3.8. Let (Ω, T, µ) be a ﬁnite measure space. Let ¦f
n
¦ be a sequence of
measurable functions on this space.
(i) Prove that f
n
converges to f a.e. if and only if for any ε > 0
lim
m→∞
µ¦[f
n
−f
n
[ > ε, for some n

> n ≥ m¦ = 0
(ii) Prove that f
n
→ 0 a.e. if and only if for all ε > 0
µ¦[f
n
[ > ε, i.o¦ = 0
(converges) Suppose the functions are nonnegative. Prove that f
n
→ ∞ a.e. if and only
if for all M > 0
µ¦f
n
< M, i.o.¦ = 0
Problem 3.9. Let Ω = [0, 1] with its Lebesgue measure. Suppose f ∈ L
1
(Ω).
Prove that x
n
f ∈ L
1
(Ω) for every n = 1, 2, . . . and compute
lim
n→∞
_

x
n
f(x)dx.
Problem 3.10. Let (Ω, T, µ) be a ﬁnite measure space and f a nonnegative real
valued measurable function on Ω. Prove that
lim
n→∞
_

f
n

exists, as a ﬁnite number, if and only if µ¦f > 1¦ = 0.
Problem 3.11. Suppose f ∈ L
1
(µ). Prove that
lim
n→∞
_
{|f|>n}
fdµ = 0
49
Problem 3.12. Let Ω = [0, 1] and let (Ω, T, µ) be a ﬁnite measure space and f a
measurable function on this space. Let E be the set of all x ∈ Ω such that f(x) is
an integer. Prove that the set E is measurable and that
lim
n→∞
_

(cos(πf(x))
2n
dµ = µ(E)
Problem 3.13. Let (Ω, T, P) be a probability space. Suppose f and g are positive
measurable function such that fg ≥ 1 a.e. on Ω. Prove that
_

fgdP ≥ 1
Problem 3.14. Let (Ω, T, P) be a probability space and suppose f ∈ L
1
(P). Prove
that
lim
p→0
|f|
p
= exp¦
_

log [f[dP¦
where exp¦−∞¦ is deﬁned to be zero.
Problem 3.15. Let (Ω, T, P) be a probability space. Suppose f ∈ L

(P) and
|f|

> 0. Prove that
lim
n→∞
_
_

[f[
n+1
dP
_

[f[
n
dP
_
= |f|

Problem 3.16. Let (Ω, T, P) be a probability space and f
n
be a sequence of mea-
surable functions converging to zero in measure. Let F be a bounded uniformly
continuous function on R. Prove that
lim
n→∞
_

F(f
n
)dP = F(0)
50
Proclaim 3.17. Let (Ω, T, P) be a probability space.
(i) Suppose F : R → R is a continuous function and f
n
→ f in measure. Prove
that F(f
n
) → F(f) in measure.
(i) If f
n
≥ 0 and f
n
→ f in measure. Then
_

fdµ ≤ lim
_

f
n
dµ.
(ii) Suppose [f
n
[ ≤ g where g ∈ L
1
(µ) and f
n
→ f in measure. Then
_

fdµ = lim
_

f
n
dµ.
Problem 3.18. Let (Ω, T, µ) be a measure space and let f
1
, f
2
, , f
n
be mea-
surable functions. Suppose 1 < p < ∞. Prove that
_

¸
¸
1
n
n

j=1
f
j
(x)
¸
¸
p
dµ(x) ≤
1
n
_

n

j=1
[f
j
(x)[
p
dµ(x)
and
_

¸
¸
1
n
n

j=1
f
j
(x)
¸
¸
p
dµ(x) ≤
_
_
1
n
n

j=1
|f
j
|
p
_
_
p
Problem 3.19. Let (Ω, T, µ) be a measure space and let f
n
be a sequence of
measurable functions satisfying |f
n
|
p
≤ n
1
p
, for 2 < p < ∞. Prove that the
sequence ¦
1
n
f
n
¦ converges to zero almost everywhere.
Problem 3.20. Suppose (Ω, T, P) is a probability space and that f ∈ L
1
(P) in
nonnegative. Prove that
_
1 +|f|
2
1

_

_
1 +f
2
dP ≤ 1 +|f|
1
51
Problem 3.21. Compute, justifying all your steps,
lim
n→∞
_
n
0
_
1 −
x
n
_
n
e
x/2
dx.
Problem 3.22. Let probtrip be a probability space. Let f be a measurable function
with the property that |f|
2
= 1 and |f|
1
=
1
2
. Prove that
1
4
(1 −η)
2
≤ P¦ω ∈ Ω : [f(ω)[ ≥
η
2
¦.
52
III
PRODUCT MEASURES
Our goal in this chapter is to present the essentials of integration in product
space. We begin by deﬁning the product measure. Many of the deﬁnitions and
properties of product measures are, in some sense, obvious. However, we need to
be properly state them and carefully prove them so that they may be freely used
in the subsequent Chapters.
¸1 Deﬁnitions and Preliminaries.
Deﬁnition 1.1. If X and Y are any two sets, their Cartesian product X Y is
the set of all order pairs ¦(x, y): x ∈ X, y ∈ Y ¦.
If A ⊂ X, B ⊂ Y, AB ⊂ XY is called a rectangle. Suppose (X, /), (X, B)
are measurable spaces. A measurable rectangle is a set of the form A B, A ∈
/, B ∈ B. A set of the form
Q = R
1
∪ . . . ∪ R
n
,
where the R
i
are disjoint measurable rectangles, is called an elementary sets. We
denote this collection by c.
Exercise 1.1. Prove that the elementary sets form an algebra. That is, c is closed
under complementation and ﬁnite unions.
We shall denote by /B the σ–algebra generated by the measurable rectangle
which is the same as the σ–algebra generated by the elementary sets.
53
Theorem 1.1. Let E ⊂ X Y and deﬁne the projections
E
x
= ¦y ∈ Y : (x, y) ∈ E¦, and E
y
= ¦x ∈ X: (x, y) ∈ E¦.
If E ∈ /B, then E
x
∈ B and E
y
∈ / for all x ∈ X and y ∈ Y .
Proof. We shall only prove that if E ∈ /B then E
x
∈ B, the case of E
y
being
completely the same. For this, let Ω be the collection of all sets E ∈ / B for
which E
x
∈ B for every x ∈ X. We show Ω is a σ–algebra containing all measurable
rectangles. To see this, note that if
E = AB
then
E
x
=
_
B if x ∈ A
∅ if x ,∈ A.
Thus, E ⊂ Ω. The collection Ω also has the following properties:
(i) X Y ∈ Ω.
(ii) If E ∈ Ω then E
c
∈ Ω.
This follows from the fact that (E
c
)
x
= (E
x
)
c
, and that / is a σ–algebras.
(iii) If E
i
∈ Ω then E =

i=1
E
i
∈ Ω.
For (iii), observe that E
x
=

i=1
(E
i
)
x
where (E
i
)
x
∈ B. Once again, the fact that
/ is a σ algebras shows that E ∈ Ω. (i)–(iii) show that Ω is a σ–algebra and the
theorem follows.
We next show that the projections of measurable functions are measurable.
Let f: X Y → R. For ﬁx x ∈ X, deﬁne f
x
: Y → R by f
x
(y) = f(x, y) with a
similar deﬁnition for f
y
.
In the case when we have several σ–algebras it will be important to clearly
distinguish measurability relative to each one of these sigma algebras. We shall
54
use the notation f ∈ σ(T) to mean that the function f is measurable relative to
the σ–algebra T.
Theorem 1.2. Suppose f ∈ σ(/B). Then
(i) For each x ∈ X, f
x
∈ σ(B)
(ii) For each y ∈ X, f
y
∈ σ(/)
Proof. Let V be an open set in R. We need to show that f
−1
x
(V ) ∈ B. Put
Q = f
−1
(V ) = ¦(x, y): f(x, y) ∈ V ¦.
Since f ∈ σ(/B), Q ∈ T (. However,
Q
x
= f
−1
x
(V ) = ¦y: f
x
(y) ∈ V ¦,
and it follows by Theorem 1.1 that Q
x
∈ B and hence f
x
∈ σ(B). The same
argument proves (ii).
Deﬁnition 1.2. A monotone class / is a collection of sets which is closed under
increasing unions and decreasing intersections. That is:
(i) If A
1
⊂ A
2
⊂ . . . and A
i
∈ /, then ∪A
i
∈ /
(ii) If B
1
⊃ B
2
⊃ . . . and B
i
∈ /, then ∩B
i
∈ /.
Lemma 1.1 (Monotone Class Theorem). Let T
0
be an algebra of subsets of X and
let / be a monotone class containing T
0
. If T denotes the σ–algebra generated
by T
0
then T ⊂ /.
Proof. Let /
0
be the smallest monotone class containing T
0
. That is, /
0
is the
intersection of all the monotone classes which contain T
0
. It is enough to show that
T ⊂ /
0
. By Exercise 1.1, we only need to prove that /
0
is an algebra. First we
55
prove that /
0
is closed under complementation. For this let Ω = ¦E : E
c
∈ /
0
¦.
It follows from the fact that /
0
is a monotone class that Ω is also a monotone
class and since T
0
is an algebra, if E ∈ T
0
then E ∈ Ω. Thus, /
0
⊂ Ω and this
proves it.
Next, let Ω
1
= ¦E : E∪F ∈ /
0
for all F ∈ T
0
¦. Again the fact that /
0
is a
monotone class implies that Ω
1
is also a monotone class and since clearly T
0
⊂ Ω
1
,
we have /
0
⊂ Ω
1
. Deﬁne Ω
2
= ¦F : F ∪E ∈ /
0
for all E ∈ /
0
¦. Again Ω
2
is a
monotone class. Let F ∈ T
0
. Since /
0
∈ Ω
1
, if E ∈ /
0
, then E ∪F ∈ /
0
. Thus
T
0
⊂ Ω
2
and hence /
0
⊂ Ω
2
. Thus, if E, F ∈ /
0
then E ∪ F ∈ /
0
. This shows
that /
0
is an algebra and completes the proof.
Exercise 1.2. Prove that an algebra T is a σ–algebra if and only if it is a mono-
tone class.
Exercise 1.3. Let T
0
be an algebra and suppose the two measures µ
1
and µ
2
agree
on T
0
. Prove that they agree on the σ–algebra T generated by T
0
.
¸2 Fubini’s Theorem.
We begin this section with a lemma that will allow us to deﬁne the product
of two measures.
Lemma 2.1. Let (X, /, µ) and (Y, B, ν) be two σ–ﬁnite measure spaces. Suppose
Q ∈ /B. If
ϕ(x) = ν(Q
x
) and ψ(y) = µ(Q
y
),
then
ϕ ∈ σ(/) and ψ ∈ σ(B)
and
_
X
ϕ(x)dµ(x) =
_
Y
ψ(y)dν(y). (2.1)
56
Remark 2.1. With the notation of ¸1 we can write
ν(Q
x
) =
_
Y
χ
Q
(x, y)dν(y) (2.2)
and
µ(Q
y
) =
_
X
χ
Q
(x, y)dµ(x). (2.3)
Thus (2.1) is equivalent to
_
X
_
Y
χ
Q
(x, y)dν(y)dµ(x) =
_
Y
_
X
χ
Q
(x, y)dµ(x)dν(y).
Remark 2.2. Lemma 2.1 allows us to deﬁne a new measure µ ν on /B by
(µ ν)(Q) =
_
X
ν(Q
x
)dµ(x) =
_
Y
µ(Q
y
)dν(y). (2.4)
To see that this is indeed a measure let ¦Q
j
¦ be a disjoint sequence of sets in
/B. Recalling that (∪Q
j
)
x
= ∪(Q
j
)
x
and using the fact that ν is a measure we
have
(µ ν)
_
_

_
j=1
Q
j
_
_
=
_
X
ν
_
_
_
_

_
j=1
Q
j
_
_
x
_
_
dµ(x)
=
_
X
ν
_
_
_
_

_
j=1
Q
jx
_
_
_
_
dµ(x)
=
_
X

j=1
ν(Q
jx
)dµ(x)
=

j=1
_
X
ν(Q
jx
)dµ(x)
=

j=1
(µ ν)(Q
j
),
where the second to the last equality follows from the Monotone Convergence
Theorem.
57
Proof of Lemma 2.1. We assume µ(X) < ∞ and ν(Y ) < ∞. Let / be the
collection of all Q ∈ /B for which the conclusion of the Lemma is true. We will
prove that / is a monotone class which contains the elementary sets; c ⊂ /. By
Exercise 1.1 and the Monotone class Theorem, this will show that / = T (.
This will be done in several stages. First we prove that rectangles are in /. That
is,
(i) Let Q = AB, A ∈ /, B ∈ B. Then Q ∈ /.
To prove (i) observe that
Q
x
=
_
B if x ∈ A
∅ if x ,∈ A.
Thus
ϕ(x) =
_
ν(B) if x ∈ A
0 if x ,∈ A.
= χ
A
(x)ν(B)
and clearly ϕ ∈ σ(/). Similarly,
ψ(y) = 1
B
(y)µ(A) ∈ B.
Integrating we obtain that
_
¸
¸
¸
¸
_
¸
¸
¸
¸
_
_
X
ϕ(x)dµ(x) = µ(A)ν(B)
_
Y
ϕ(y)dν(y) = µ(A)ν(B),
proving (i).
(ii) Let Q
1
⊂ Q
2
⊂ . . . , Q
j
∈ /. Then Q =

j=1
Q
j
∈ /.
To prove this, let
ϕ
n
(x) = ν((Q
n
)
x
) = ν
_
_
_
_
n
_
j=1
Q
j
_
_
x
_
_
58
and
ψ
n
(y) = µ(Q
y
n
) = µ
_
_
_
_
n
_
j=1
Q
j
_
_
y
_
_
.
Then
ϕ
n
(x) ↑ ϕ(x) = ν(Q
x
)
and
ψ
n
(x) ↑ ψ(x) = µ(Q
x
).
Since ϕ
n
∈ σ(/) and ψ
n
∈ σ(B), we have ϕ ∈ σ(/) and ψ ∈ σ(B). Also by
assumption
_
X
ϕ
n
(x)dµ(x) =
_
Y
ψ
n
(y)dν(y),
for all n. By Monotone Convergence Theorem,
_
X
ϕ(x)dµ(x) =
_
Y
ϕ(y)dν(y)
and we have proved (ii).
(iii) Let Q
1
⊃ Q
2
⊃ . . . , Q
j
∈ /. Then Q =

j=1
Q
j
∈ /.
The proof of this is the same as (ii) except this time we use the Dominated Conver-
gence Theorem. That is, this time the sequences ϕ
n
(x) = ν((Q
n
)
x
), ψ
n
(y) = µ(Q
y
n
)
are both decreasing to ϕ(x) = ν(Q
x
) and ψ(y) = µ(Q
y
), respectively, and since
since both measures are ﬁnite, both sequences of functions are uniformly bounded.
(iv) Let ¦Q
i
¦ ∈ / with Q
i
∩ Q
j
= ∅. Then

j=1
Q
i
∈ /.
For the proof of this, let
˜
Q
n
=
n

i=1
Q
i
. Then
˜
Q
n
∈ /, since the sets are disjoint.
However, the
˜
Q

n
s are increasing and it follows from (ii) that their union is in /,
proving (iv).
It follows from (i)–(iv) that / is a monotone class containing the elementary
sets c. By the Monotone Class Theorem and Exercise 1.1, /B = σ(c) = /. This
proves the Lemma for ﬁnite measure and the following exercise does the rest.
59
Exercise 2.1. Extend the proof of Lemma 2.1 to the case of σ–ﬁnite measures.
Theorem 2.1 (Fubini’s Theorem). Let (X, /, µ) and (Y, B, ν) be σ–ﬁnite measure
spaces. Let f ∈ σ(/B).
(a) (Tonelli) If f is nonnegative and if
ϕ(x) =
_
X
f
x
(y)dν(y), ψ(y) =
_
X
f
y
(x)dµ(x), (2.5)
then
ϕ ∈ σ(/), ψ ∈ σ(B)
and
_
X
ϕ(x)dµ(x) =
_
X×Y
f(x, y)d(µ ν) =
_
Y
ψ(y)dν(y).
(b) If f is complex valued such that
ϕ

(x) =
_
Y
[f(y)[
x
dν(y) =
_
X
[f(x, y)[dν(y) < ∞ (2.6)
and
_
X
ϕ

(x)dµ(x) < ∞
then
f ∈ L
1
(µ ν).
and (2.6) holds. A similarly statement holds for y in place ofx.
(c) If f ∈ L
1
(µ ν), then f
x
∈ L
1
(ν) for a.e. x ∈ X, f
y
∈ L
1
(µ) for
a.e. y ∈ Y , the functions deﬁned in (2.5) are measurable and (2.6)
holds.
Proof of (a). If f = χ
Q
, Q ∈ /B, the result follows from Lemma 2.1. By linearity
we also have the result for simple functions. Let 0 ≤ s
1
≤ . . . be nonnegative simple
functions such that s
n
(x, y) ↑ f(x, y) for every (x, y) ∈ X Y . Let
ϕ
n
(x) =
_
Y
(s
n
)
x
(y)dν(y)
60
and
ψ
n
(y) =
_
X
s
y
n
(x)dµ(x).
Then
_
X
ϕ
n
(x)dµ(x) =
_
X×Y
s
n
(x, y)d(µ λ)
=
_
Y
ψ
n
(y)dµ(y).
Since s
n
(x, y) ↑ f(x, y) for every (x, y) ∈ X Y , ϕ
n
(x) ↑ ϕ(x) and ψ
n
(y) ↑ ψ(y).
The Monotone Convergence Theorem implies the result. Parts (b) and (c) follow
directly from (a) and we leave these as exercises.
The assumption of σ–ﬁniteness is needed as the following example shows.
Example 2.1. X = Y = [0, 1] with µ = the Lebesgue measure and ν = the
counting measure. Let f(x, y) = 1 if x = y, f(x, y) = 0 if x ,= y. That is, the
function f is the characteristic function of the diagonal of the square. Then
_
X
f(x, y)dµ(x) = 0, and
_
Y
f(x, y)dν(y) = 1.
Remark 2.1. Before we can integrate the function f in this example, however, we
need to verify that it (and hence its projections) is (are) measurable. This can be
seen as follows: Set
I
j
=
_
j −1
n
,
j
n
_
and
Q
n
= (I
1
I
1
) ∪ (I
2
I
2
) ∪ . . . ∪ (I
n
I
n
).
Then Q
n
is measurable and so is Q = ∩Q
n
, and hence also f .
Example 2.2. Consider the function
f(x, y) =
x
2
−y
2
(x
2
+y
2
)
2
on (0, 1) (0, 1).
61
with the µ = ν = Lebesgue measure. Then
_
1
0
_
1
0
f(x, y)dydx = π/2
but
_
1
0
_
1
0
f(x, y)dxdy = −π/2.
The problem here is that f / ∈ L
1
¦(0, 1) (0, 1)¦ since
_
1
0
[f(x, y)[dy ≥ 1/2x.
Let m
k
= Lebesgue measure in R
k
and recall that m
k
is complete. That is,
if m
k
(E) = 0 then E is Lebesgue measurable. However, m
1
m
1
is not complete
since ¦x¦B, for any set B ⊂ R, has m
1
m
1
– measure zero. Thus m
2
,= m
1
m
1
.
What is needed here is the notion of the completion of a measure. We leave the
proof of the ﬁrst two Theorems as exercises.
Theorem 2.2. If (X, T, µ) is a measure space we let
T

= ¦E ⊂ X: ∃ A and B ∈ T, A ⊂ E ⊂ B and µ(B¸A) = 0¦.
Then T

is a σ–algebra and the function µ

deﬁned on T

by
µ

(E) = µ(A)
is a measure. The measure space (X, m

, µ

) is complete. This new space is called
the completion of (X, T, µ).
Theorem 2.3. Let m
n
be the Lebesgue measure on R
n
, n = r + s. Then m
n
=
(m
r
m
j
)

, the completion of the product Lebesgue measures.
The next Theorem says that as far as Fubini’s theorem is concerned, we need
not worry about incomplete measure spaces.
62
Theorem 2.4. Let (X, /, µ), (Y, B, ν) be two complete σ–ﬁnite measure spaces.
Theorems 2.1 remains valid if µν is replaced by (µν)

except that the functions
ϕ and ψ are deﬁned only almost everywhere relative to the measures µ and ν,
respectively.
Proof. The proof of this theorem follows from the following two facts.
(i) Let (X, T, µ) be a measure space. Suppose f ∈ σ(T

). Then there is a
g ∈ σ(T) such that f = g a.e. with respect to µ.
(ii) Let (X, /, µ) and (Y, B, ν) be two complete and σ–ﬁnite measure spaces.
Suppose f ∈ σ((/ B)

) is such that f = 0 almost everywhere with
respect to µν. Then for almost every x ∈ X with respect to µ, f
x
= 0
a.e. with respect to ν. In particular, f
x
∈ σ(B) for almost every x ∈ X.
A similar statement holds with y replacing x.
Let us assume (i) and (ii) for the moment. Then if f ∈ σ((/ B)

) is
nonnegative there is a g ∈ σ(/ B) such that f = g a.e. with respect to µ ν.
Now, apply Theorem 2.1 to g and the rest follows from (ii).
It remains to prove (i) and (ii). For (i), suppose that f = χ
E
where E ∈ /

.
By deﬁnition A ⊂ E ⊂ B with µ(A¸B) = 0 and A and B ∈ /. If we set g = χ
A
we have f = g a.e. with respect to µ and we have proved (i) for characteristic
function. We now extend this to simple functions and to nonnegative functions in
the usual way; details left to the reader. For (ii) let Ω = ¦(x, y) : f(x, y) ,= 0¦.
Then Ω ∈ (/B)

and (µ ν)(Ω) = 0. By deﬁnition there is an
˜
Ω ∈ /B such
that Ω ⊂
˜
Ω and µ ν(
˜
Ω) = 0. By Theorem 2.1,
_
X
ν(
˜

x
)dµ(x) = 0
and so ν(
˜

x
) = 0 for almost every x with respect to µ. Since Ω
x

˜

x
and the
space (Y, B, ν) is complete, we see that Ω
x
∈ B for almost every x ∈ X with respect
63
to the measure µ. Thus for almost every x ∈ X the projection function f
x
∈ B
and f
x
(y) = 0 almost everywhere with respect to µ. This completes the proof of
(ii) and hence the theorem.
Exercise 2.3. Let f be a nonnegative measurable function on (X, T, µ). Prove
that for any 0 < p < ∞,
_
X
f(x)
p
dµ(x) = p
_

0
λ
p−1
µ¦x ∈ X : f(x) > λ¦dλ.
Exercise 2.4. Let (X, T, µ) be a measure space. Suppose f and g are two nonneg-
ative functions satisfying the following inequality: There exists a constant C such
that for all ε > 0 and λ > 0,
µ¦x ∈ X : f(x) > 2λ, g(x) ≤ ελ¦ ≤ Cε
2
µ¦x ∈ X : f(x) > λ¦.
Prove that
_
X
f(x)
p
dµ ≤ C
p
_
X
g(x)
p

for any 0 < p < ∞ for which both integrals are ﬁnite where C
p
is a constant
depending on C and p.
Exercise 2.5. For any α ∈ R deﬁne
sign(α) =
_
¸
_
¸
_
1, α > 0
0, α = 0
− 1, α < 0
Prove that
0 ≤ sign(α)
_
y
0
sin(αx)
x
dx ≤
_
π
0
sin(x)
x
dx (2.7)
for all y > 0 and that
_

0
sin(αx)
x
dx =
π
2
sign(α) (2.8)
and
_

0
1 −cos(αx)
x
2
dx =
π
2
[α[. (2.9)
64
Exercise 2.6. Prove that
e
−α
=
2
π
_

0
cos αs
1 +s
2
ds (2.10)
for all α > 0. Use (2.10), the fact that
1
1 +s
2
=
_

0
e
−(1+s
2
)t
dt,
and Fubini’s theorem, to prove that
e
−α
=
1

π
_

0
e
−t

t
e
−α
2
/4t
dt. (2.11)
Exercise 2.7. Let S
n−1
= ¦x ∈ R
n
: [x[ = 1¦ and for any Borel set E ∈ S
n−1
set
˜
E = ¦rθ : 0 < r < 1, θ ∈ A¦. Deﬁne the measure σ on S
n−1
by σ(A) = n[
˜
E[.
Notice that with this deﬁnition the surface area ω
n−1
of the sphere in R
n
satisﬁes
ω
n−1
= nγ
n
= 2π
n
2
/Γ(
n
2
) where γ
n
is the volume of the unit ball in R
n
. Prove
(integration in polar coordinates) that for all nonnegative Borel functions f on R
n
,
_
R
n
f(x)dx =
_

0
r
n−1
__
S
n−1
f(rθ)dσ(θ)
_
dr.
In particular, if f is a radial function, that is, f(x) = f([x[), then
_
R
n
f(x)dx =

n
2
Γ(
n
2
)
_

0
r
n−1
f(r)dr = nγ
n
_

0
r
n−1
f(r)dr.
Exercise 2.8. Prove that for any x ∈ R
n
and any 0 < p < ∞
_
S
n−1
[ξ x[
p
dσ(ξ) = [x[
p
_
S
n−1

1
[
p
dσ(ξ)
where ξ x = ξ
1
x
1
+ +ξ
n
x
n
is the inner product in R
n
.
65
Exercise 2.9. Let e
1
= (1, 0, . . . , 0) and for any ξ ∈ S
n−1
deﬁne θ, 0 ≤ θ ≤ π such
that e
1
ξ = cos θ. Prove, by ﬁrst integrating over L
θ
= ¦ξ ∈ S
n−1
: e
1
ξ = cos θ¦,
that for any 1 ≤ p < ∞,
_
S
n−1

1
[
p
dσ(ξ) = ω
n−1
_
π
0
[ cos θ[
p
(sin θ)
n−2
dθ. (2.12)
Use (2.12) and the fact that for any r > 0 and s > 0,
2
_ π
2
0
(cos θ)
2r−1
(sin θ)
2s−1
dθ =
Γ(s)Γ(r)
Γ(r +s)
([Ru1, p. 194]) to prove that for any 1 ≤ p < ∞
_
S
n−1

1
[
p
dσ(ξ) =

n−1
2
Γ(
p+1
2
)
Γ(
n+p
2
)
(2.13)
66
IV
RANDOM VARABLES
¸1 Some Basics.
From this point on, (Ω, T, µ) will denote a probability space. X : Ω →R is a
random variable if X is measurable relative to T. We will use the notation
E(X) =
_

XdP.
E(X) is called the expected value of X, or expectation of X. We recall from Prob-
lem – that if X is a random variable, then µ(A) = µ
X
(A) = P¦X ∈ A¦, A ∈ B(R),
is a probability measure on (R, B(R)). This measure is called the distribution mea-
sure of the random variable X. Two random variables X, Y are equally distributed
if µ
X
= µ
X
. This is often written as X
d
= Y or X ∼ Y .
If we take the set A = (−∞, x], for any x ∈ R, the n
µ
X
(−∞, x] = P¦X ≤ x¦ = F
X
(x)
deﬁnes a distribution function, as we saw in Chapter I. We list some additional
properties of this distribution function given the fact that µ
X
(R) = 1 and since it
arises from the random variable X.
(i) F
X
(b) = F
X
(a) = µ(a, b].
(ii) lim
x→∞
F
X
(x) = 1, lim
x→−∞
F
X
(x) = 0.
(iii) With F
X
(x−) = lim
y↑x
F
X
(y), we see that F
X
(x−) = P(X < x).
(iv) P¦X = x¦ = µ
X
¦x¦ = F
X
(x) −F
X
(x−).
67
It follows from (iv) that F is continuous at x ∈ R if and only if x is not an
atom of the measure µ. That is, if and only if µ
X
¦x¦ = 0. As we saw in Chapter
I, distribution functions are in a one-to-one correspondence with the probability
measures in (R, B). Also, as we have just seen, every random variable gives rise to
a distribution function. The following theorem completes this circle.
Theorem 1.1. Suppose F is a distribution function. Then there is a probability
space (Ω, T, P) and a random variable X deﬁned on this space such that F = F
X
.
Proof. We take (Ω, T, µ) with Ω = (0, 1), T = Borel sets and P the Lebesgue
measure. For each ω ∈ Ω, deﬁne
X(ω) = sup¦y : F(y) < ω¦.
We claim this is the desired random variable. Suppose we can show that for each
x ∈ R,
¦ω ∈ Ω : X(ω) ≤ x¦ = ¦ω ∈ Ω : ω ≤ F(x)¦. (1.1)
Clearly then X is measurable and also P¦X(ω) ≤ x¦ = F(x), proving that F =
F
X
. To prove (1.1) let ω
0
∈ ¦ω ∈ Ω : ω ≤ F(x)¦. That is, ω
0
≤ F(x). Then
x / ∈ ¦y : F(y) < ω
0
¦ and therefore X(ω
0
) ≤ x. Thus ¦ω ∈ Ω : ω ≤ F(x)¦ ⊂ ¦ω ∈
Ω : X(ω) ≤ x¦.
On the other hand, suppose ω
0
> F(x). Since F is right continuous, there
exists > 0 such that F(x + ) < ω
0
. Hence X(ω) ≥ x + > x. This shows that
ω
0
/ ∈ ¦ω ∈ Ω : X¦ω¦ ≤ x¦ and concludes the proof.
Theorem 1.2. Suppose X is a random variable and let G : R → R be Borel
measurable. Suppose in addition that G is nonnegative or that E[G(X)[ < ∞.
Then
_

G(X(ω))d(ω) = E(G(X)) =
_
R
G(y)dµ
X
(y). (1.2)
68
Proof. Let B ⊂ B(R). Then
E(1
B
(X(ω)) = P

¦X ∈ B¦
= µ
X
(B) =
_
B

X
=
_
R
1
B
(y)dµ
X
(y).
Thus the result holds for indicator functions. By linearity, it holds for simple
functions. Now , suppose G is nonnegative. Let ϕ
n
be a sequence of nonnegative
simple functions converging pointwise up to G. By the Monotone Convergence
Theorem,
E(G(X(ω)) =
_
R
G(x)dµ
X
(x).
If E[G(X)[ < ∞ write
G(X(ω)) = G
+
(X(ω)) −G

(X(ω)).
Apply the result for nonnegative G to G
+
and G

and subtract the two using the
fact that E(G(X)) < ∞.
More generally let X
1
, X
2
, . . . , X
n
be n-random variables and deﬁne their
join distribution by
µ
n
(A) = P¦(X
1
, X
2
, . . . , X
n
) ∈ A¦, A ∈ B(R
n
).
µ
n
is then a Borel probability measure on (R
n
, B). As before, if G : R
n
→ R is
Borel measurable nonnegative or E(G(X
1
, X
2
, . . . , X
n
)) < ∞, then
E(G(X
1
(ω), X
2
(ω), . . . , X
n
(ω))) =
_
R
n
G(x
1
, x
2
, . . . , x
n
)dµ
n
(x
1
, . . . , x
n
).
The quantity EX
p
, for 1 ≤ p < ∞ is called the p–th moment of the random
variable X. The case and the variance is deﬁned by var(X) = E[X − m[
2
Note
that by expanding this quantity we can write
var(X) = EX
2
−2(EX)
2
+ (EX)
2
= EX
2
−(EX)
2
69
If we take the function G(x) = x
p
then we can write the p–th moments in terms
of the distribution as
EX
p
=
_
R
x
p

X
and with G(x) = (x = m)
2
we can write the variance as
var(X) =
_
R
(x −m)
2

=
_
R
x
2

X
−m
2
.
Now, recall that if f is a nonnegative measurable function on (Ω, T, P) then
µ(A) =
_
A
fdP
deﬁnes a new measure on (Ω, T) and
_

g dµ =
_

gfdP. (1.3)
In particular, suppose f is a nonnegative borel measurable function in R with
_
R
f(x)dx = 1
where here and for the rest of these notes we will simply write dx in place of dm
when m is the Lebesgue measure. Then
F(x) =
_
x
−∞
f(t)dt
is a distribution function. Hence if µ(A) =
_
A
f dt, A ∈ B(R) then µ is a probability
measure and since
µ(a, b] =
_
b
a
f(t)dt = F(b) −F(a),
70
for all interval (a, b] we see that µ ∼ F (by the construction in Chapter I). Let X
be a random variable with this distribution function. Then by (1.3) and Theorem
1.2,
E(g(X)) =
_
R
g(x)dµ(x) =
_
R
g(x)f(x)dx. (1.4)
Distributions arising from such f

s are called absolutely continuous distributions.
We shall now give several classical examples of such distributions. The function f
is called the density of the random variable associated with the distribution.
Example 1.1. The Uniform distribution on (0, 1).
f(x) =
_
1 x ∈ (0, 1)
0 x / ∈ (0, 1)
Then
F(x) =
_
¸
_
¸
_
0 x ≤ 0
x 0 ≤ x ≤ 1
1 x ≥ 1
.
If we take a random variable with this distribution we ﬁnd that the variance
var(X) =
1
12
and that the mean m =
1
2
.
Example 1.2. The exponential distribution of parameter λ. Let λ > 0 and set
f(x) =
_
λe
−λx
, x ≥ 0
0 else
If X is a random variable with associated to this density, we write X ∼
exp(λ).
EX
k
= λ
_

0
x
k
e
−λx
dx =
k!
λ
k
71
Example 1.3. The Cauchy distribution of parameter a. Set
f(x) =
1
π
a
a
2
+x
2
We leave it to the reader to verify that if the random variable X has this
distribution then E[X[ = ∞.
Example 1.3. The normal distribution. Set
f(x) =
1

e

x
2
2
.
The corresponding random variable is the normal distribution. We write X ∼
N(0, 1) By symmetry,
E(X) =
1

_
R
xe

x
2
2
dx = 0.
To compute the variance let us recall ﬁrst that for any α > 0,
Γ(α) =
_

0
t
α−1
e
−t
dt.
We note that
_
R
x
2
e

x
2
2
dx = 2
_

0
x
2
e

x
2
2
dx = 2

2
_

0
u
1
2
e
−u
dx
= 2

2Γ(
3
2
) = 2

1Γ(
1
2
+ 1) =
k

2
k
Γ(
1
2
) =

and hence var(X) = 1. If we take σ > 0 and µ ∈ R, and set
f(x) =
1
_
(2πσ
2
)
e
−(x−µ)
2

2
we get the normal distribution with mean µ and variance σ and write X ∼ N(µ, σ).
For this we have Ex = µ and var(X) = σ
2
.
72
Example 1.4. The gamma distribution arises from
f(x) =
_
1
Γ(α)
λ
α
x
α−1
e
−λx
x ≥ 0
0, x < 0.
We write X ∼ Γ(α, λ) when the random variable X has this density.
Random variables which take only discrete values are appropriately called
“discrete random variables.” Here are some examples.
Example 1.5. X is a Bernoulli random variable with parameter p, 0 < p < 1, if
X takes only two values one with probability p and the other with probability 1−p.
P(X = 1) = p and P(X = 0) = 1 −p
For this random variable we have
EX = p 1 + (1 −p) 0 = p,
EX
2
= 1
2
p = p
and
var(X) = p −p
2
= p(1 −p).
Example 1.6. We say X has Poisson distribution of parameter λ if
P¦X = k¦ = e
−λ
λ
k
k!
k = 0, 1, 2, . . . .
For this random variable,
EX =

k=0
k
e
−λ
λ
k
k!
= λe
−λ

k=1
λ
k−1
(k −1)!
= λ
and
Var(X) = EX
2
−λ
2
=

k=0
k
2
e
−λ
λ
k
k!
−λ
2
= λ.
73
Example 1.7. The geometric distribution of parameter p. For 0 < p < 1 deﬁne
P¦X = k¦ = p(1 −p)
k−1
, for k = 1, 2, . . .
The random variable represents the number of independent trial needed to
observe an event with probability p. By the geometric series,

k=1
(1 −p)
k
=
1
p
and we leave it to the reader to verify that
EN =
1
p
and
var(N) =
1 −p
p
2
.
¸2 Independence.
Deﬁnition 2.1.
(i) The collection T
1
, T
2
, . . . , T
n
of σ–algebras is said to be independent if when-
ever A
1
∈ T
j
, A
2
∈ T
2
, . . . , A
n
∈ T
n
, then
P
_

n
j=1
A
j
_
=
n

j=1
P(A
j
).
(ii) A collection ¦X
j
: 1 ≤ j ≤ n¦ of random variables is said to be (totally)
independent if for any ¦B
j
: 1 ≤ j ≤ n¦ of Borel sets in R,
P¦X
1
∈ B
1
, X
1
∈ B
2
, . . . X
n
∈ B
n
¦ = P¦
n

j=1
(X
j
∈ B
j
)¦ =
n

j=1
P¦X
j
∈ B
j
¦.
(iii) The collection of measurable subsets A
1
, A
2
, . . . , A
n
in a σ–algebra T is in-
dependent if for any subset I ⊂ ¦1, 2, . . . , n¦ we have
P
_
_
_

j∈I
(A
j
)
_
_
_
=

j∈I
P¦A
j
¦
74
Whenever we have a sequence ¦X
1
, . . . , X
n
¦ of independent random variables
with the same distribution, we say that the random variables are identically dis-
tributed and write this as i.i.d. We note that (iii) is equivalent to asking that the
random variables 1
A
1
, 1
A
2
, . . . , 1
A
3
are independent. Indeed, for one direction we
take B
j
= ¦1¦ for j ∈ I and B
j
= R for j / ∈ I. For the other direction the reader
Problem 2.1. Let A
1
, A
2
, . . . , A
n
be independent. Proof that A
c
1
, A
c
2
, . . . A
c
n
and
1
A
1
, 1
A
2
, . . . , 1
A
n
are independent.
Problem 2.2. Let X and Y be two random variable and set T
1
= σ(X) and
T
2
= σ(Y ). (Recall that the sigma algebra generated by the random X, denoted
σ(X), is the sigma algebra generated by the sets X
−1
¦B¦ where B ranges over
all Borel sets in R.) Prove that X, Y are independent if and only if T
1
, T
2
are
independent.
Suppose ¦X
1
, X
2
, . . . , X
n
¦ are independent and set
µ
n
(B) = P¦X
1
, . . . , X
n
) ∈ B¦ B ∈ B(R
n
),
as in ¸1. Then with B = B
1
B
n
we see that
µ
n
(B
1
B
n
) =
n

i=1
µ
j
(B
j
)
and hence
µ
n
= µ
1
µ
n
where the right hand side is the product measure constructed from µ
1
, . . . , µ
n
as
in Chapter III. As we did earlier. Thus for this probability measure on (R
n
, B),
the corresponding n-dimensional distribution is
F(x) =
n

j=1
F
X
j
(x
j
),
where x = (x
1
, x
2
, . . . , x
n
)).
75
Deﬁnition 2.2. Suppose / ⊂ T. /is a π-system if it is closed under intersections:
/, B ∈ / ⇒ A ∩ B ∈ /. The subcollection L ⊂ T is a λ-system if (i) Ω ∈ L, (ii)
A, B ∈ L and A ⊂ B ⇒ B¸A ∈ L and (iii) A
n
∈ L and A
n
↑ A ⇒ A ∈ L.
Theorem 2.1. Suppose / is a π-system and L is a λ-system and A ⊂ L. Then
σ(/) ⊂ L.
Theorem 2.2. Let µ and ν be two probability measures on (Ω, T). Suppose they
agree on the π-system / and that there is a sequence of sets A
n
∈ / with A
n
↑ Ω.
Then ν = µ on σ(/)
Theorem 2.3. Suppose /
1
, /
2
, . . . , /
n
are independent and π-systems. Then
σ(/
2
), σ(/
2
), . . . , σ(/
n
) are independent.
Corollary 2.1. The random variables X
1
, X
2
, . . . , X
n
are independent if and only
if for all x = ¦x
1
, . . . , x
n
¦, x
i
∈ (−∞, ∞].
F(x) =
n

j=1
F
X
j
(x
i
), (2.1)
where F is the distribution function of the measure µ
n
.
Proof. We have already seen that if the random variables are independent then
the distribution function F satisﬁes 2.1. For the other direction let x ∈ R
n
and set
A
i
be the sets of the form ¦X
i
≤ x
i
¦. Then
¦X
i
≤ x
i
¦ ∩ ¦X
i
≤ y
i
¦ = ¦X
i
≤ x
i
∧ y
i
¦ ∈ /
i
.
Therefore the collection /
i
is a π-system. σ(/
i
) = σ(X).
Corollary 2.2. µ
n
= µ
1
µ
n
.
76
Corollary 2.3. Let X
1
, . . . , X
n
with X
i
≥ 0 or E[X
i
[ < ∞ be independent. Then
E
_
_
n

j=1
X
i
_
_
=
n

i=1
E(X
i
).
Proof. Applying Fubini’s Theorem with f(x
1
, . . . x
n
) = x
1
x
n
we have
_
R
n
(x
1
x
n
)d(µ
1
µ
n
) =
_
R
x
1

1
(a)
_
R
n
x
n

n
(a).

It follows as in the proof of Corollary 1.3 that if X
1
, . . . , X
n
are independent
and g ≥ 0 or if E[

n
j=1
g(X
i
)[ < ∞, then
E
_
n

i=1
g(X
i
)
_
=
n

i=1
E(g(X
i
)).
We warn the reader not to make any inferences in the opposite direction. It may
happen that E(XY ) = (E(X)E(Y ) and yet X and Y may not be independent.
Take the two random variables X and Y with joint distributions given by
X¸Y 1 0 −1
1 0 a 0
0 b c b
−1 0 a 0
with 2a + 2b + c = 1, a, b, c > 0. Then XY = 0 and E(X)E(Y ) = 0. Also by
symmetry, EX = EY = 0. However, the random variables are not independent.
Why? Well, observe that P(X = 1, Y = 1) = 0 and that P(X = 1) = P(X =
1, Y = 1) = ab ,= 0.
Deﬁnition 2.2. If F and G are two distribution functions we deﬁne their convo-
lution by
F ∗ G(z) =
_
R
F(z −y)dµ(y)
77
where µ is the probability measure associated with G. The right hand side is often
also written as
_
R
F(z −y)dG(y).
In this notes we will use both notations.
Theorem 2.4. If X and Y are independent with X ∼ F
X
, Y ∼ G
Y
, then X+Y ∼
F ∗ G.
Proof. Let Let us ﬁx z ∈ R. Deﬁne
h(x, y) = 1
(x+y≤z)
(x, y)
Then
F
X+Y
(z) = P¦X +Y ≤ z¦
= E(h(X, Y ))
=
_
R
2
h(x, y)d(µ
X
µ
Y
)(x, y)
=
_
R
__
R
h(x, y)dµ
X
(x)
_

Y
(y)
=
_
R
__
R
1
−∞,z−y)
(x) dµ
X
(x)
_

Y
(y)
=
_
R
µ
X
(−∞, z −y)dµ
Y
(y) =
_
R
F(z −y)dG(y).

Corollary 2.4. Suppose X has a density f and Y ∼ G, and X and Y are inde-
pendent. Then X +Y has density
h(x) =
_
R
f(x −y)dG(y).
If both X and Y have densities with g denoting the density of Y . Then
h(x) =
_
f(x −y)g(y)dy.
78
Proof.
F
X+Y
(z) =
_
R
F(z −y)dG(y)
=
_
R
_
z−y
−∞
f(x)dxdG(y)
=
_
R
_
z
−∞
f(u −y)dudG(y)
=
_
z
−∞
_
R
f(u −y)dG(y)du
=
_
z
−∞
__
R
f(u −y)g(y)dy
_
du,
which completes the proof.
Problem 2.1. Let X ∼ Γ(α, λ) and Y ∼ Γ(β, λ). Prove that X+Y ∼ Γ(α+β, λ).
¸3 Construction of independent random variables.
In the previous section we have given various properties of independent ran-
dom variables. However, we have not yet discussed their existence. If we are given
a ﬁnite sequence F
1
, . . . F
n
of distribution functions, it is easy to construct inde-
pendent random variables with this distributions. To do this, let Ω = R
n
and
T = B(R
n
). Let P be the measure on this space such that
P((a
1
, b
1
] (a
n
, b
n
]) =
n

j=1
(F
j
(b
j
) −F
j
(a
j
)).
Deﬁne the random variables X
j
: Ω →R by X
j
(ω) = ω
j
, where ω = (ω
1
, . . . , ω
n
).
Then for any x
j
∈ R,
P(X
j
≤ x
j
) = P(R (−∞, x
j
] R R) = F
j
(x
j
).
Thus X
j
∼ F
j
. Clearly, these random variables are independent by Corollary 2.1.
It is, however, extremely important to know that we can do this for inﬁnitely many
distributions.
79
Theorem 3.1. Let ¦F
j
¦ be a ﬁnite or inﬁnite sequence of distribution functions.
Then there exists a probability space (Ω, T, P) and a sequence of independent ran-
dom variables X
j
on this space with X
j
∼ F
j
.
Let N = ¦1, 2, . . . ¦ and let R
N
be the space of inﬁnite sequences of real
numbers. That is, R
N
= ¦ω = (ω
1
, ω
2
, . . . ) : ω
i
∈ R¦. Let B(R
N
) be the σ-
algebra on R
N
generated by the ﬁnite dimensional sets. That is, sets of the form
¦ω ∈ R
N
: ω
i
∈ B
i
, 1 ≤ i ≤ n¦, B
i
∈ B(R).
Theorem 3.2 (Kolmogovov’s Extension Theorem). Suppose we are given
probability measures µ
n
on (R
n
, B(R
n
)) which are consistent. That is,
µ
n+1
(a
1
, b
1
] (a
n
, b
n
] R) = µ
n
(a
1
, b
1
] (a
n
, b
n
].
Then there exists a probability measure P on (R
N
, B(R
N
)) such that
P¦ω : ω
i
∈ (a
i
, b
i
], 1 ≤ i ≤ n¦ = µ
n
((a
1
, b
1
] (a
n
, b
n
]).
The Means above µ
n
are consistent. Now, deﬁne
X
j
: R
N
→R
by
X
j
(ω) = ω
j
.
Then ¦X
j
¦ are independent under the extension measure and X
j
∼ F
j
.
A diﬀerent way of constructing independent random variables is the following,
at least Bernoulli random variables, is as follows. Consider Ω = (0, 1] and recall
that for x ∈ (0, 1) we can write
x =

n=1

n
2
n
where
n
is either 0 or 1. (This representation in actually unique except for x the
80
Problem 3.1. Deﬁne X
n
(x) =
n
. Prove that the sequence ¦X
n
¦ of random
variables is independent.
Problem 3.2. Let ¦A
n
¦ be a sequence of independent sets. Prove that

j=1
A
j
¦ =

j=1
P¦A
j
¦
and

_
j=1
A
j
¦ = 1 −

j=1
(1 −P¦A
j
¦)
Problem 3.3. Let ¦X
1
, . . . , X
n
¦ be independent random variables with X
j
∼ F
j
.
Fin the distribution of the random variables max
1≤j≤n
X
j
and min
1≤j≤n
X
j
.
Problem 3.4. Let ¦X
n
¦ be independent random variables and ¦f
n
¦ be Borel mea-
surable. Prove that the sequence of random variables ¦f
n
(X
n
)¦ is independent.
Problem 3.5. Suppose X and Y are independent random variables and that X+
Y ∈ L
p
(P) for some 0 < p < ∞. Prove that both X and Y must also be in L
p
(P).
Problem 3.6. The covariance of two random variables X and Y is deﬁned by
Cov(X, Y ) = E[(X −EX)(Y −EY )]
= E(XY ) −E(X)E(Y ).
Prove that
var(X
1
+X
2
+ +X
n
) =
n

j=1
var(X
j
) +
n

i,j=1,i=j
Cov(X
i
, X
j
)
and conclude that if the random variables are independent then
var(X
1
+X
2
+ +X
n
) =
n

j=1
var(X
j
)
81
V
THE CLASSICAL LIMIT THEOREMS
¸1 Bernoulli Trials.
Consider the sequence of independent random variables which arise from
tossing a fair coin.
X
i
=
_
1 with probability p
0 with probability 1 −p
If we use 1 to denote success (=heads) and 0 to denote failure (=tails) and S
n
, for
the number of successes in n -trials, we can write
S
n
=
n

j=1
X
j
.
We can compute and ﬁnd that the probability of exactly j successes in n
trials is
P¦S
n
= j¦ =
_
n
j
_
P¦any speciﬁc sequence of n trials with exactly j heads¦
=
_
n
j
_
p
j
(1 −p)
n−j
=
_
n
j
_
p
j
(1 −p)
n−j
=
n!
j!(n −j)!
p
j
(1 −p)
n−j
.
This is called Bernoulli’s formula. Let us take p = 1/2 which represents a fair
coin. Then
S
n
n
denotes the relative frequency of heads in n trials, or, the average
number of successes in n trials. We should expect, in the long run, for this to be
1/2. The precise statement of this is
82
Theorem 1.1 (Bernoulli’s “Law of averages,” or ‘Weak low of large num-
bers”). As n–increases, the probability that the average number of successes de-
viates from
1
2
by more than any preassigned number tends to zero. That is,
P¦[
S
n
n

1
2
[ > ε¦ → 0, as n → ∞.
Let x ∈ [0, 1] and consider its dyadic representation. That is, write
x =

n=1
ε
n
2
n
with ε
n
= 0 or 1. The number x is a normal number if each digit occurs the “right”
proportion of times, namely
1
2
.
Theorem 1.2 (Borel 1909). Except for a set of Lebesgue measure zero, all
numbers in [0, 1] are normal numbers. That is, X
n
(x) = ε
n
and S
n
is the partial
sum of these random variables, we have
S
n
(x)
n

1
2
a.s. as n → ∞.
The rest of this chapter is devoted to proving various generalizations of these
results.
¸2 L
2
and Weak Laws.
First, to conform to the language of probability we shall say that a sequence of
random variables X
n
converges almost surely, and write this as a.s., if it converges
a.e. as deﬁned in the Chapter II. If the convergence is in measure we shall say that
X
n
→ X in probability. That is, X
n
→ X if for all ε > 0,
P¦[X
n
−X[ > ε¦ → 0 as n → ∞.
We recall that X
n
→ X in L
p
then X
n
→ X in probability and that there is a
subsequence X
n
k
→ X a.s. In addition, recall that by Problem 3.8 in Chapter II,
X
n
→ Y a.s. iﬀ for any ε > 0,
lim
m→∞
P¦[Y
n
−Y [ ≤ ε for all n ≥ m¦ = 1 (2.1)
83
or
lim
m→∞
P¦[X
n
−X[ > ε for all n ≥ m¦ = 0. (2.2)
The proofs of these results are based on a convenient characterization of a.s. con-
vergence. Set
A
m
=

n=m
¦[X
n
−X[ ≤ ε¦
= ¦[X
n
−X[ ≤ ε for all n ≥ m¦
so that
A
c
m
= ¦[X
n
−X[ > ε for some n ≥ m¦.
Therefore,
¦[X
n
−X[ > ε i.o.¦ =

m=1

_
n=m
¦[X
n
−X[ > ε¦
=

m=1
A
c
m
.
However, since X
n
→ X a.s. if and only if [X
n
−X[ < ε, eventually almost surely,
we see that X
n
→ X a.s. if and only if
P¦[X
n
−X[ > ε i.o.¦ = lim
m→∞
P¦A
c
m
¦ = 0. (2.3)
Now, (2.1) and (2.2) follow easily from this. Suppose there is a measurable
set N with P(N) = 0 such that for all ω ∈ Ω
0
= Ω¸N, X
n

0
) → X(ω
0
).
Set
A
m
(ε) =

n=m
¦[X
n
−X[ ≤ ε¦ (2.3.)
A
m
(ε) ⊂ A
m+1
(ε). Now, for each ω
0
∈ Ω
0
there exists an M(ω
0
, ε) such that for
all n ≥ M(ω
0
, ε), [X
n
−X[ ≤ ε. Therefore, ω ∈ A
M(ω
0
,ε)
. Thus

0

_
m=1
A
m
(ε)
84
and therefore
1 = P(Ω
0
) = lim
m→∞
P¦A
m
(ε)¦,
which proves that (2.1) holds. Conversely, suppose (2.1) holds for all ε > 0 and
the A
m
(ε) are as in (2.3) and A(ε) =

m=1
A
m
(ε). Then
P¦A(ε)¦ = P¦

_
m=1
A
m
(ε)¦ = 1.
Let ω
0
∈ A(ε). Then for ω
0
∈ A(ε) there exists m = m(ω
0
, ε) such that [X
m
−X[ ≤
ε Let ε = ¦1/n¦. Set
A =

n=1
A(
1
n
).
Then
P(A) = lim
n→∞
P(A(
1
n
)) = 1
and therefore if ω
0
∈ A we have ω
0
∈ A(1/n) for all n. Therefore [X
n

0
) −
X(ω
0
)[ < 1/n which is the same as X
n

0
) → X(ω
0
)
Theorem 2.1 (L
2
–weak law). Let ¦X
j
¦ be a sequence of uncorrelated random
variables. That is, suppose EX
i
X
j
= EX
i
EX
j
. Assume that EX
i
= µ and that
var (X
i
) ≤ C for all i, where C is a constant. Let S
n
=
n

i=1
X
i
. Then
S
n
n
→ µ as
n → ∞ in L
2
(P) and in probability.
Corollary 2.1. Suppose X
i
are i.i.d with EX
i
= µ, var (X
i
) < ∞. Then
S
n
n
→ µ
in L
2
and in probability.
Proof. We begin by recalling from Problem that if X
i
are uncorrelated and E(X
2
i
) <
∞ then var(X
1
+. . . +X
n
) = var(X
1
) +. . . +var(X
n
) and that
var(cX) = c
2
var(X) for any constant c. We need to verify that
E
¸
¸
¸
¸
S
n
n
−µ
¸
¸
¸
¸
2
→ 0.
85
Observe that E
_
S
n
n
_
= µ and therefore,
E
¸
¸
¸
¸
S
n
n
−µ
¸
¸
¸
¸
2
= var
_
S
n
n
_
=
1
n
2
var (S
n
)
=
1
n
2
n

i=1
var (X
i
)

Cn
n
2
and this last term goes to zero as n goes to inﬁnity. This proves the L
2
. Since
convergence in L
p
implies convergence in probability for any 0 < p < ∞, the
result follows.
The assumption in Theorem Here is a standard application of the above
weak–law.
Theorem 2.2 (The Weierstrass approximation Theorem). Let f be a con-
tinuous function on [0, 1]. Then there exists a sequence p
n
of polynomials such that
p
n
→ f uniformly on [0, 1].
Proof. Without loss of generality we may assume that f(0) = f(1) = 0 for if this
is not the case apply the result to g(x) = f(x) −f(0) −x(f(1) −f(0)). Put
p
n
(x) =
n

j=0
_
n
j
_
x
j
(1 −x)
n−j
f(j/n)
recalling that
_
n
j
_
=
n!
j!(n −j)!
.
The functions p
n
(x) are clearly polynomials. These are called the Bernstein poly-
nomials of degree n associated with f.
86
Let X
1
, X
2
, . . . i.i.d according to the distribution: P(X
i
= 1) = x, 0 < x < 1.
P(X
i
= 0) = 1 −x so that E(X
i
) = x and var(X
i
) = x(1 −x). The if S
n
denotes
their partial sums we have from the above calculation that
P¦S
n
= j¦ =
_
n
j
_
x
j
(1 −x)
n−j
.
Thus
E(f(S
n
/n)) =
n

j=0
_
n
j
_
x
j
(1 −x)
n−j
f(j/n) = p
n
(x)
as n → ∞. Also, S
n
/n → x in probability. By Chebyshev’s’s inequality,
P

¸
¸
¸
S
n
n
−x
¸
¸
¸
¸
> δ
_

1
δ
2
var
_
S
n
n
_
=
1
δ
2
1
n
2
var(S
n
)
=
x(1 −x)

2

1
4nδ
2
for all x ∈ [0, 1] since x(1 − x) ≤
1
4
. Set M = |f|

and let ε > 0. There exists a
δ > 0 such that [f(x) −f(y)[ < ε when [x −y[ < δ. Thus
[p
n
(x) −f(x)[ =
¸
¸
¸
¸
Ef
_
S
n
n
_
−f(x)
¸
¸
¸
¸
=
¸
¸
¸
¸
E
_
f
_
S
n
n
_
−f(x)

¸
¸
¸
≤ E
¸
¸
¸
¸
f
_
S
n
n
_
−f(x)
¸
¸
¸
¸
=
_
{|
S
n
n
−x|<δ}
¸
¸
¸
¸
f
_
S
n
n
_
−f(x)
¸
¸
¸
¸
dP
+
_
{|
S
n
n
−x|≥δ
¸
¸
¸
¸
f
_
S
n
n
_
−f(x)
¸
¸
¸
¸
dP
< ε + 2MP

¸
¸
¸
S
n
n
−x
¸
¸
¸
¸
≥ δ
_
.
Now, the right hand side can be made smaller than 2ε by taking n large enough
and independent of x. This proves the result.
87
The assumption that the variances of the random variables are uniformly
bounded can be considerably weaken.
Theorem 2.3. Let X
i
be i.i.d. and assume
λP¦[X
i
[ > λ¦ → 0 (1.3)
as λ → ∞. Let S
n
=

n
j=1
X
j
and µ
n
= E(X
1
1
(|X
1
|≤n)
). Then
S
n
n
−µ
n
→ 0
in probability.
Remark 2.1. The condition (1.3) is necessary to have a sequence of numbers a
n
such that
S
n
n
− a
n
→ 0 in probability. For this, we refer the reader to Feller, Vol
II (1971).
Before proving the theorem we have a
Corollary 2.2. Let X
i
be i.i.d. with E[X
1
[ < ∞. Let µ = EX
i
. Then
S
n
n
→ µ in
probability.
Proof of Corollary. First, by the Monotone Convergence Theorem and Cheby-
shev’s’s inequality, λP¦[X
i
[ > λ¦ = λP¦[X
1
[ > λ¦ → 0 as λ → ∞ and µ
n

E(X) = µ. Hence,
P

¸
¸
¸
_
S
n
n
−µ

¸
¸
¸
> ε
_
= P

¸
¸
¸
S
n
n
−µ +µ
n
−µ
n
¸
¸
¸
¸
> ε
_
≤ P

¸
¸
¸
S
n
n
−µ
n
¸
¸
¸
¸
> ε/2
_
+P¦µ
n
−µ[ > ε/2¦
and these two terms go to zero as n → ∞.
88
Lemma 2.1 ( triangular arrays). Let X
n,k
, 1 ≤ k ≤ n, n = 1, 2, . . . be a
triangular array of random variables and assume that for each n, X
n,k
, 1 ≤ k ≤ n,
are independent. Let b
n
> 0, b
n
→ ∞ as n → ∞ and deﬁne the truncated random
variables by X
n,k
= X
n,k
1
|X
n,k
|≤b
n
)
. Suppose that
(i)
n

n=1
P¦[X
n,k
[ > b
n
) → 0, as n → ∞.
(ii)
1
b
2
n
n

k=1
EX
2
n,k
→ 0 as n → ∞.
Put a
n
=
n

k=1
EX
n,k
and set S
n
= X
n,1
+X
n,2
+. . . +X
n,n
. Then
S
n
−a
n
b
n
→ 0
in probability.
Proof. Let S
n
= X
n,1
+. . . +X
n,n
. Then
P
_
[S
n
−a
n
[
b
n
> ε
_
= P

¸
¸
¸
S
n
−a
n
b
n
¸
¸
¸
¸
> ε, S
n
= S
n
, S
n
,= S
n
_
≤ P¦S
n
,= S
n
¦ +P
_
[S
n
−a
n
[
b
n
> ε
_
.
However,
P¦S
n
,=
˜
S
n
¦ ≤ P
_
n
_
k=1
¦X
n,k
,= X
n,k
¦
_

n

k=1
P¦X
n,k
,= X
n,k
¦
=
n

k=1
P¦[X
n,k
[ > b
n
¦
and this last term goes to zero by (i).
89
Since a
n
= ES
n
, we have
P
_
[S
n
−a
n
[
b
n
> ε
_

1
ε
2
b
2
n
E[S
n
−a
n
[
2
=
1
ε
2
b
2
n
n

k=1
var(X
n,k
)

1
ε
2
b
2
n
n

k=1
EX
2
n,k
and this goes to zero by (ii).
Proof of Theorem 2.3. We apply the Lemma with X
n,k
= X
k
and b
n
= n. We ﬁrst
need to check that this sequence satisﬁes (i) and (ii). For (i) we have
n

k=1
P¦[X
n,k
[ > n¦ = nP¦[X
1
[ > n¦,
which goes to zero as n → ∞ by our assumption. For (ii) we see that that since
the random variables are i.i.d we have
1
n
2
n

k=1
EX
2
n,1
=
1
n
EX
2
n,1
.
Let us now recall that by Problem 2.3 in Chapter III, for any nonnegative
random variable Y and any 0 < p < ∞,
EY
p
= p
_

0
λ
p−1
P¦Y > λ¦dλ
Thus,
E[X
2
n,1
[ = 2
_

0
λP¦X
n,1
[ > λ¦dλ = 2
_
n
0
P¦[X
n,1
[ > λ¦dλ.
We claim that as n → ∞,
1
n
_
n
0
λP¦[X
1
[ > λ¦dλ → 0.
For this, let
g(λ) = λP¦[X
1
[ > λ¦.
90
Then 0 ≤ g(λ) ≤ λ and g(λ) → 0. Set M = sup
λ>0
[g(λ)[ < ∞ and let ε > 0. Fix k
0
so large that g(λ) < ε for all λ > k
0
. Then
_
n
0
λP¦[X
1
[ > λ¦dx = M +
_
n
k
0
λP¦[x
1
[ > λ¦dλ
< M +ε(n −k
0
).
Therefore
1
n
_
n
0
λP¦[x
1
[ > λ¦ <
M
n

_
n −k
0
n
_
.
The last quantity goes to ε as n → ∞ and this proves the claim.
¸3 Borel–Cantelli Lemmas.
Before we stay our Borel–Cantelli lemmas for independent events, we recall
a few already proven facts. If A
n
⊂ Ω, then
¦A
n
, i.o.¦ = limA
n
=

m=1

_
n=m
A
n
and
¦A
n
, eventually¦ = limA
n
=

_
m=1

n=m
A
n
Notice that
lim1
A
n
= 1
{limA
n
}
and
lim1
A
n
(ω) = 1
{limA
n
}
It follows from Fatou’s Lemma
P(limA
n
) ≤ limP¦A
n
¦
and that
limP¦A
n
¦ ≤ P¦limA
n
¦.
Also recall Corollary 2.2 of Chapter II.
91
First Borel–Cantelli Lemma. If

n=1
p(A
n
) < ∞ then P¦A
n
, i.o.¦ = 0.
Question: Is it possible to have a converse ? That is, is it true that P¦A
n
, i.o.¦ = 0
implies that

n=1
P¦A
n
¦ = ∞? The answer is no, at least not in general.
Example 3.1. Let Ω = (0, 1) with the Lebesgue measure on its Borel sets. Let
a
n
= 1/n and set A
n
= (0, 1/n). Clearly then

P(A
n
) = ∞. But, P¦A
n
i.o.¦ =
P¦∅¦ = 0.
Theorem 3.1 (The second Borel–Cantelli Lemma). Let ¦A
n
¦ be a sequence
of independent sets with the property that

P(A
n
) = ∞ Then P¦A
n
i.o.¦ = 1.
Proof. We use the elementary inequality (1 − x) ≤ e
−x
valid for 0 ≤ x ≤ 1. Let
Fix N. By independence,
P
_
N

n=m
A
c
n
_
=
N

n=m
P¦A
c
n
¦
=
N

n=m
¦1 −P¦A
n
¦¦

N

n=m
e
−P{A
n
}
= exp
−{
N
P
n=m
P{A
n
}}
and this quantity converges to 0 as N → ∞ Therefore,
P
_

_
n=m
A
n
_
= 1
which implies that P¦A
n
i.o.¦ = 1 and completes the proof.
¸4 Applications of the Borel–Cantelli Lemmas.
In Chapter II, ¸3, we had several applications of the First Borel–Cantelli
Lemma. In the next section we will have several more applications of this and of
92
the Second Borel–Cantelli. Before that, we give a simple application to a fairly
weak version of the strong law of large numbers.
Theorem 4.1. Let ¦X
i
¦ be i.i.d. with EX
1
= µ and EX
4
1
= C∞. Then
S
n
n
→ µ
a.s.
Proof. By considering X

i
= X
i
−µ we may assume that µ = 0. Then
E(S
4
n
) = E(
n

i=1
X
i
)
4
= E

1≤i,j,k,l≤n
X
i
X
j
X
k
X
l
=

1≤i,j,k,l≤n
E(X
i
X
j
X
k
X
l
)
Since the random variables have zero expectation and they are independent, the
only terms in this sum which are not zero are those where all the indices are equal
and those where to pair of indices are equal. That is, terms of the form EX
4
j
and
EX
2
i
X
2
j
= (EX
2
i
)
2
. There are n of the ﬁrst type and 3n(n−1) of the second type.
Thus,
E(S
4
n
) = nE(X
1
) + 3n(n −1)(EX
2
1
)
2
≤ Cn
2
.
By Chebyshev’s inequality with p = 4,
P¦[S
n
[ > nε¦ ≤
Cn
2
n
4
ε
4
=
C
n
2
ε
4
and therefore,

n=1
P¦[
S
n
n
[ > ε¦ < ∞
and the First Borel–Cantelli gives
P

¸
¸
¸
S
n
n
¸
¸
¸
¸
> ε i.o.¦ = 0
93
which proves the result.
The following is an applications of the Second Borel–Cantelli Lemma.
Theorem 4.2. If X
1
, X
2
, . . . , X
n
are i.i.d. with E[X
1
[ = ∞. Then P¦[X
n
[ ≥
n i.o.¦ = 1 and P¦lim
S
n
n
exists ∈ (−∞, ∞)¦ = 0.
Thus E[X
1
[ < ∞ is necessary for the strong law of large numbers. It is also
suﬃcient.
Proof. We ﬁrst note that

n=1
P¦[X
1
[ ≥ n¦ ≤ E[X
1
[ ≤ 1 +

n=1
P¦[X
1
[ > n¦
which follows from the fact that
E[X
1
[ =
_

0
P¦[X
1
[ > X¦dx
and

n=0
_
n+1
n
P¦[X
1
[ > x¦dx ≤
_

0
P¦[X
1
[ > X¦dx
≤ 1 +

n=1
P¦[X
1
[ > n¦
Thus,

n=1
P¦[X
n
[ ≥ n¦ = ∞
and therefore by the Second Borel–Cantelli Lemma,
P¦[X
n
[ > n i.o.¦ = 1.
Next, set
A =
_
lim
n→∞
S
n
n
exits ∈ (−∞, ∞)
_
94
Clearly for ω ∈ A,
lim
n→∞
¸
¸
¸
¸
S
n
(ω)
n

S
n+1
n + 1
(ω)
¸
¸
¸
¸
= 0
and
lim
n→∞
S
n
(ω)
n(n + 1)
= 0.
Hence there is an N such that for all n > N,
lim
n→∞
¸
¸
¸
¸
S
n
(ω)
n(n + 1)
¸
¸
¸
¸
<
1
2
.
Thus for ω ∈ A∩ ¦ω: [X
n
[ ≥ n i.o.¦,
¸
¸
¸
¸
S
n
(ω)
n(n + 1)

X
n+1
(ω)
n + 1
¸
¸
¸
¸
>
1
2
,
inﬁnitely often. However, since
S
n
n

S
n+1
n + 1
=
S
n
n(n + 1)

X
n+1
n + 1
and the left hand side goes to zero as observe above, we see that A ∩ ¦ω: [X
n
[ ≥
n i.o.¦ = ∅ and since P¦[X
n
[ > 1 i.o.¦ = 1 we conclude that P¦A¦ = 0, which
completes the proof.
The following results is stronger than the Second–Borel Cantelli but it follows
from it.
Theorem 4.3. Suppose the sequence of events A
j
are pairwise independent and

j=1
P(A
j
) = ∞. Then
lim
n→∞
_

n
j=1
1
A
j

n
j=1
P(A
j
)
_
= 1. a.s.
In particular,
lim
n→∞
n

j=1
1
A
j
(ω) = ∞ a.s
95
which means that
P¦A
n
i.o.¦ = 1.
Proof. Let X
j
= 1
A
j
and consider their partial sums, S
n
=
n

j=1
X
j
. Since these
random variables are pairwise independent we have as before, var(S
n
) = var(X
1
)+
. . .+ var(X
n
). Also, var(X
j
) = E(X
j
−EX
j
[
2
≤ E(X
j
)
2
= E(X
j
) = P¦A
j
¦. Thus
var(S
n
) ≤ ES
n
. Let ε > 0.
P

¸
¸
¸
S
n
ES
n
−1
¸
¸
¸
¸
> ε
_
= P

¸
¸
¸
S
n
−ES
n
[ > εES
n
_

1
ε
2
(ES
n
)
2
var(S
n
)
=
1
ε
2
ES
n
and this last goes to ∞ as n → ∞. From this we conclude that
S
n
ES
n
→ 1 in
probability. However, we have claimed a.s.
Let
n
k
= inf¦n ≥ 1: ES
n
≥ k
2
¦.
and set T
k
= S
n
k
. Since EX
n
≤ 1 for all n we see that
k
2
≤ ET
k
≤ E(T
k−1
) + 1 ≤ k
2
+ 1
for all k. Replacing n with n
k
in the above argument for S
n
we get
P¦[T
k
−ET
k
[ > εET
k
¦ ≤
1
ε
2
ET
k

1
ε
2
k
2
Thus

k=1
P

¸
¸
¸
T
k
ET
k
−1
¸
¸
¸
¸
> δ
_
< ∞
96
and the ﬁrst Borel–Cantelli gives
P

¸
¸
¸
T
k
ET
k
−1
¸
¸
¸
¸
> ε i.o
_
= 0
That is,
T
k
ET
k
→ 1, a.s.
Let Ω
0
with P(Ω
0
) = 1 be such that
T
k
(ω)
ET
k
→ 1.
for every ω ∈ Ω
0
. Let n be any integer with n
k
≤ n < n
k+1
. Then
T
k
(ω)
ET
k+1

S
n
(ω)
E(S
n
)

T
k+1
(ω)
ET
k
.
We will be done if we can show that
lim
n→∞
T
k
(ω)
ET
k+1
→ 1 and lim
n→∞
T
k+1
(ω)
ET
k
= 1.
Now, clearly we also have
ET
k
ET
k+1

T
k
(ω)
ET
k

S
n
(ω)
ES
n

T
k+1
(ω)
ET
k+1

ET
k+1
ET
k
and since
k
2
≤ ET
k
≤ ET
k+1
≤ (k + 1)
2
+ 1
we see that 1 ≤
ET
k+1
ET
k
and that
ET
k+1
ET
k
→ 1 and similarly 1 ≥
ET
k
ET
k+1
and that
ET
k
ET
k+1
→ 1, proving the result.
¸5. Convergence of Random Series, Strong Law of Large Numbers.
Deﬁnition 5.1. Let ¦X
n
¦ be a sequence of random variables. Deﬁne the σ–
algebras T

n
= σ(X
n
, X
n+1
, . . . ) and 1 =

n≥1
T

n
. T

n
is often called the “future”
σ–algebra and 1 the remote (or tail) σ–algebra.
97
Example 5.1.
(i) If B
n
∈ B(R), then ¦X
n
∈ B
n
i.o.¦ ∈ 1. and if we take X
n
= 1
A
n
we see that
Then ¦A
n
i.o.¦ ∈ 1.
(ii) If S
n
= X
1
+. . .+X
n
, then clearly, ¦lim
n→∞
S
n
exists¦ ∈ 1 and ¦limS
n
/c
n
>
λ¦ ∈ 1, if c
n
→ ∞. However, ¦limS
n
> 0¦ ,∈ 1
Theorem 5.1 (Kolmogorov 0 − 1 Law). If X
1
, X
2
. . . are independent and
A ∈ 1, then P(A) = 0 or 1.
Proof. We shall show that A is “independent of itself” and hence P(A) = P(A ∩
A) = P(A)P(A) which implies that P(A) = 0 or 1. First, since X
1
, . . . are indepen-
dent if A ∈ σ(X
1
, . . . , X
n
) and B ∈ σ(X
n+1
, . . . ) then A and B are independent.
Thus if, A ∈ σ(X
1
, . . . , X
n
) and B ∈ 1, then A and B are independent. Thus
_
n
σ(X
1
, . . . , X
n
) is independent of 1. Since they are both π–systems (clearly
if A, B ∈
_
n
σ(X
1
, . . . , X
n
) then A ∈ σ(X
1
. . . , X
n
) and B ∈ σ(X
1
, . . . , X
m
) for
some n and m and so A∩B ∈ σ(X
1
, . . . , X
max(n,m)
)),

n
σ(X
1
, . . . , X
n
) is indepen-
dent of 1, by Theorem 2.3, Chapter IV. Since A ∈ 1 implies A ∈ σ(X
1
, X
2
, . . . ),
we are done.
Corollary. Let A
n
be independent. Then P¦A
n
i.o.¦ = 0 or 1. In the same way,
if X −n are independent then P¦limS
n
exists¦ = 0 or1.
Or next task is to investigate when the above probabilities are indeed one.
Recall that Chebyshev’s inequality gives, for mean zero random variables which
are independent, that
P¦[S
n
[ > λ¦ ≤
1
λ
2
var(S
n
).
The following results is stronger and more useful as we shall see soon.
98
Theorem 5.2 ( Kolmogorov’s inequality). Suppose ¦X
n
¦ be independent,
EX
n
= 0 and var(X
n
) < ∞ for all n. Then for any λ > 0,
P¦ max
1≤k≤n
[S
k
[ ≥ λ¦ ≤
1
λ
2
E[S
n
[
2
=
1
λ
2
var(S
n
).
Proof. Set
A
k
= ¦ω ⊂ Ω: [S
k
(ω)[ ≥ λ, [S
j
(ω)[ < λ for all j < k¦
Note that these sets are disjoint and
ES
2
n

n

k=1
_
A
k
S
2
n
dP =
=
n

k=1
_
A
k
S
2
k
+ 2S
k
S
n
−2S
k
+ (S
n
−S
k
)
2
dP

n

k=1
_
A
k
S
2
k
dP + 2
n

k=1
_

S
k
1
A
k
(S
n
−S
k
)dP (5.1)
Now,
S
k
1
A
k
∈ σ(X
1
, . . . , X
k
)
and
S
n
−S
k
∈ σ(X
k+1
. . . S
n
)
and hence they are independent. Since E(S
n
− S
k
) = 0, we have E(S
k
1
A
k
(S
n

S
k
)) = 0 and therefore the second term in (5.1) is zero and we see that
ES
n
2

n

k=1
_
A
k
[S
2
k
[dp
≥ λ
2

k=1
P(A
k
)
= λ
2
P( max
1≤k≤n
[S
k
[ > λ
_
which proves the theorem.
99
Theorem 5.3. If X
j
are independent, EX
j
= 0 and

n=1
var(X
n
) < ∞, then

n=1
X
n
converges a.s.
Proof. By Theorem 5.2, for N > M we have
P
_
max
M≤n≤N
[S
n
−S
M
[ > ε
_

1
ε
2
var(S
N
−S
M
)
=
1
ε
2
N

n=M+1
var(X
n
).
Letting N → ∞ gives P¦ max
m≥M
[S
n
−S
M
[ > ε¦ ≤
1
ε
2

n=M+1
ε(X
n
) and this last
quantity goes to zero as M → ∞ since the sum converges. Thus if
Λ
M
= sup
n,m≥M
[S
m
−S
n
[
then
P¦Λ
M
> 2ε¦ ≤ P¦ max
m≤M
[S
m
−S
M
[ > ε¦ → 0
as M → ∞ and hence Λ
M
→ 0 a.s. as M → ∞. Thus for almost every ω, ¦S
m
(ω)¦
is a Cauchy sequence and hence it converges.
Example 5.2. Let X
1
, X
2
, . . . be i.i.d. N(0, 1). Then for every t,
B
t
(ω) =

n=1
X
n
sin(nπt)
n
converges a.s. (This is a series representation of Brownian motion.)
Theorem 5.4 (Kolmogorov’s Three Series Theorem). Let ¦X
j
¦ be inde-
pendent random variables. Let A > 0 and set Y
j
= X
j
1
(|X
j
|≤A)
. Then

n=1
X
n
converges a.s. if and only if the following three conditions hold:
(i)

n=1
P([X
n
[ > A) < ∞,
100
(ii)

n=1
EY
n
converges,
(iii)

n=1
var(Y
n
) < ∞.
Proof. Assume (i)–(iii). Let µ
n
= EY
n
. By (iii) and Theorem 5.3,

(Y
n
−µ
n
) con-
verges a.e. This and (ii) show that

n=1
Y
n
converges a.s. However, (i) is equivalent
to

n=1
P(X
n
,= Y
n
) < ∞ and by the Borel–Cantelli Lemma,
P(X
n
,= Y
n
i.o.¦ = 0.
Therefore, P(X
n
= Y
n
eventually¦ = 1. Thus if

n=1
Y
n
converges, so does

n=1
X
n
.
We will prove the necessity later as an application of the central limit theo-
rem.
For the proof of the strong law of large numbers, we need
Lemma 5.1 (Kronecker’s Lemma). Suppose that a
n
is a sequence of positive
real numbers converging up to ∞ and suppose

n=1
x
n
a
n
converges. Then
1
a
n
n

m=1
x
m
→ 0.
Proof. Let b
n
=
n

j=1
x
j
a
j
. Then b
n
→ b

, by assumption. Set a
0
= 0, b
0
= 0. Then
101
x
n
= a
n
(b
n
−b
n−1
), n = 1, 2, . . . and
1
a
n
n

j=1
x
j
=
1
a
n
n

j=1
a
j
(b
j
−b
j−1
)
=
1
a
n
_
_
b
n
a
n
−b
0
a
0

n−1

j=0
b
j
(a
j+1
−a
j
)
_
_
= b
n

1
a
n
n−1

j=0
b
j
(a
j+1
−a
j
)
The last equality is by summation by parts. To see this, precede by induction
observing ﬁrst that
n

j=1
a
j
(b
j
−b
j−1
) =
n−1

j=1
a
j
(b
j
−b
j−1
) +a
n
(b
n
−b
n−1
)
= b
n−1
a
n−1
−b
0
a
0

n−2

j=0
b
j
(a
j+1
−a
j
) +a
n
b
n
−a
n
b
n−1
= a
n
b
n
−b
0
a
0

n−2

j=0
b
j
(a
j+1
−a
j
) −b
n−1
(a
n
−a
n−1
)
= a
n
b
n
−b
0
a
0

n−1

j=0
b
j
(a
j+1
−a
j
)
Now, recall a b
n
→ b

. We claim that
1
a
n
n−1

j=0
b
j
(a
j+1
−a
j
) → b

. Since b
n
→ b

,
given ε > 0 ∃ N such that for all j > N, [b
j
−b

[ < ε. Since
1
a
n
n−1

j=0
(a
j+1
−a
j
) = 1
102
Jensen’s inequality gives
[
1
a
n
n−1

j=0
b
j
(a
j+1
−a
j
) −b

[ ≤
¸
¸
¸
¸
1
a
n
n−1

j=1
(b

−b
j
)(a
j+1
−a
j
)
¸
¸
¸
¸

1
a
n
N

j=1
[(b

−b
j
)(a
j+1
−a
j
)[
+
1
a
n
n−1

j=N+1
[b

−b
j
[[a
j+1
−a
j
[

1
a
n
N

j=1
[b
N
−b
j
[[a
j+1
−a
j
[ +ε
1
a
n
n

j=N+1
[a
j+1
−a
j
[

M
a
n
+ε.
Letting ﬁrst n → ∞ and then ε → 0 completes the proof.
Theorem 5.5 (The strong law of large numbers). Suppose ¦X
j
¦ are i.i.d., E[X
1
[ <
∞ and set EX
1
= µ. Then
S
n
n
→ µ a.s.
Proof. Let Y
k
= X
k
1
(|X
k
|≤k)
Then

n=1
P¦X
k
,= Y
k
¦ =

P([X
k
[ > k¦

_

0
P([X
1
[ > λ¦d
λ
= E[X[ < ∞.
Therefore by the First Borel–Cantelli Lemma, P¦X
k
,= Y
k
i.o.¦ = 0 or put in
other words, P¦X
k
,= Y
k
eventually¦ = 1. Thus if we set T
n
= Y
1
+ . . . + Y
n
. It
103
suﬃces to prove
T
n
n
→ µ a.s. Now set Z
k
= Y
k
−EY
k
. Then E(Z
k
) = 0 and

k=1
var(Z
n
)
k
2

k=1
E(Y
2
k
)
k
2
=

k=1
1
k
2
_

0
2λP¦[Y
k
[ > λ¦dλ
=

k=1
1
k
2
_
k
0
2λP¦[X
1
[ > λ¦dλ
= 2
_

0

k=1
λ
k
2
1
{λ≤k}
(λ)P¦[X
1
[ > λ¦dλ
= 2
_

0
λ
_

k>λ
1
k
2
_
P¦[X
1
[ > λ¦dλ.
≤ CE[X
1
[,
where we used the fact that

k>λ
1
k
2

C
λ
for some constant C which follows from the integral test. By Theorem 5.3,

k=1
Z
k
k
converges a.s. and the Kronecker’s Lemma gives that
1
n
n

k=1
Z
k
→ 0 a.s.
which is the same as
1
n
n

k=1
(Y
k
−EY
k
) → 0 a.s.
or
T
n
n

1
n
n

k=1
EY
k
→ 0 a.s.
We would be done if we can show that
1
n
n

k=1
EY
k
→ µ. (5.2)
104
We know EY
k
→ µ as k → ∞. That is, there exists an N such that for all k > N,
[EY
k
−µ[ < ε. With this N ﬁxed we have for all n ≥ N,
¸
¸
¸
¸
1
n
n

k=1
EY
k
−µ
¸
¸
¸
¸
=
¸
¸
¸
¸
1
n
n

k=1
(EY
k
−µ)
¸
¸
¸
¸

1
n
N

k=1
E[Y
k
−µ[ +
1
n
n

k=N
E[Y
k
−µ[

1
n
N

k=1
E[Y
k
−µ[ +ε.
Let n → ∞ to complete the proof.
¸6. Variants of the Strong Law of Large Numbers.
Let us assume E(X
i
) = 0 then under the assumptions of the strong law of
large numbers we have
S
n
n
→ 0 a.s. The question we address now is: Can we have
a better rate of convergence? The answer is yes under the right assumptions and
we begin with
Theorem 6.1. Let X
1
, X
2
, . . . be i.i.d., EX
i
= 0 and EX
2
1
≤ σ
2
< ∞. Then for
any ε ≥ 0
lim
n→∞
S
n
n
1/2
(log n)
1/2+ε
= 0,
a.s.
We will show later that in fact,
lim
S
n
_

2
nlog log n
= 1,
a.s. This last is the celebrated law of the iterated logarithm of Kinchine.
Proof. Set a
n
=

n(log n)
1
2

, n ≥ 2. a
1
> 0

n=1
var(X
n
/a
n
) = σ
_
1
a
2
1
+

n=2
1
n(log n)
1+2ε
_
< ∞.
105
Then

n=1
X
n
a
n
converges a.s. and hence
1
a
n
n

k=1
X
k
→ 0 a.s.
What if E[X
1
[
2
= ∞ But E[X
1
[
p
< ∞ for some 1 < p < 2? For this we have
Theorem 6.2 (Marcinkiewidz and Zygmund). X
j
i.i.d., EX
1
= 0 and
E[X
1
[
p
< ∞ for some 1 < p < 2. Then
lim
n→∞
S
n
n
1/p
= 0,
a.s.
Proof. Let
Y
k
= X
k
1
(|X
n
|≤k
1/p
)
and set
T
n
=
n

k=1
Y
k
.
It is enough to prove, as above, that
T
n
n
1/p
→0
, a.s. To see this, observe that

P¦Y
k
,= X
k
¦ =

k=1
P¦[X
k
[
p
> k¦
≤ E([X
1
[
p
) < ∞
and therefore by the ﬁrst Borel–Cantelli Lemma, P¦Y
k
,= X
k
i.o.¦ = 0 which is
the same as P¦Y
k
= X
k
, eventually¦ = 1
Next, estimating by the integral we have

k>λ
p
1
k
2/p
≤ C
_

λ
p
dx
x
2/p
=
1
(1 −2/p)
x
2−2/p
¸
¸
¸
¸

λ
p
= λ
p−2
106
and hence

k=1
var(Y
k
/k
1/p
) ≤

k=1
EY
2
k
k
2/p
= 2

k=1
1
k
2/p
_

0
λP¦[Y
k
[ > λ¦dλ
= 2

k=1
1
k
2/p
_
k
1/p
0
λP¦[X
1
[ > λ¦dλ
= 2

k=1
1
k
2/p
_

0
1
(0,k
1/p
)
(λ)λP¦[X
1
[ > λ¦dλ
= 2
_

0
λP¦[X
1
[ > λ¦
_

k>λ
p
1
k
2/p
_

≤ 2
_

0
λ
p−1
P¦[X
1
[ > λ¦dλ = C
p
E[X
1
[
p
< ∞.
Thus, and Kronecker implies, with µ
k
= E(Y
k
), that
1
n
1/p
n

k=1
(Y
k
−µ
k
) → 0, a.s.
If we are bale to show that
1
n
1/p
n

k=1
µ
k
→ 0, we will be done. Observe that 0 =
E(X
1
) = E(X1
(|X|≥k
1/p
)
) +µ
k
so that [µ
k
[ ≤ [E(X1
(|X|≥k
1/p
)
[ and therefore
¸
¸
¸
¸
1
n
1/p
n

k=1
µ
k
¸
¸
¸
¸

1
n
1/p
n

k=1
_

k
1/p
P¦[X
1
[ > λ¦dλ

1
pn
1/p
n

k=1
1
k
1−1/p
_

k
1/p

p−1
P¦[X
1
[ > λ¦dλ
=
1
pn
1/p
n

k=1
1
k
1−1/p
E¦[X
1
[
p
; [X
1
[ > k
1/p
¦.
Since X ∈ L
p
, given ε > 0 there is an N such that E([X
1
[
p
[X
1
[ > k
1/p
) < ε if
k > N. Also,
n

k=1
1
k
1−1/p
≤ C
n
_
1
x
1/p−1
dx ≤ Cn
1/p
.
The Theorem follows from these.
107
Theorem 6.3. Let X
1
, X
2
, . . . be i.i.d. with EX
+
j
= ∞ and EX

j
< ∞. Then
lim
n→∞
S
n
n
= ∞, a.s.
Proof. Let M > 0 and X
M
j
= X
i
∧ M, the maximum of X
j
and M. Then X
M
i
are i.i.d. and E[X
M
i
[ < ∞. (Here we have used the fact that EX

j
< ∞.) Setting
S
M
n
= X
M
1
= X
M
1
+. . . +X
M
n
we see that
S
M
n
n
→ EX
M
1
a.s. Now, since X
i
≥ X
M
i
we have
lim
S
n
n
≥ lim
n→∞
S
M
n
n
= EX
M
1
, a.s.
However, by the monotone convergence theorem, E(X
M
1
)
+
↑ E(X
+
1
) = ∞, hence
EX
M
1
= E(X
M
1
)
+
−E(X
M
1
)

↑ +∞.
Therefore,
lim
S
n
n
= ∞, a.s.
and the result is proved.
¸7. Two Applications.
We begin with an example from Renewal Theory. Suppose X
1
, X
2
, . . . be
are i.i.d. and 0 < X
i
< ∞, a.s. Let T
n
= X
1
+ . . . + X
n
and think of T
n
as the time of the nth occurrence of an event. For example, X
i
could be the
lifetime of ith lightbulb in a room with inﬁnitely many lightbulbs. Then T
n
=
is the time the nth lightbulb burns out. Let N
t
= sup¦n: T
n
≤ t¦ which in this
example is the number of lightbulbs that have burnt out by time t.
Theorem 7.1. Let X
j
be i.i.d. and set EX
1
= µ which may or may not be ﬁnite.
Then
N
t
t
→ 1/µ, a.s. as t → ∞ where this is 0 if µ = ∞. Also, E(N(t))/t → 1/µ
Continuing with our lightbulbs example, note that if the mean lifetime is
large then the number of lightbulbs burnt by time t is very small.
108
Proof. We know
T
n
n
→ µ a.s. Note that for every ω ∈ Ω, N
t
(ω) is and integer and
T(N
t
) ≤ t < T(N
t
+ 1).
Thus,
T(N
t
)
N
t

t
N
t

T(N
t
+ 1)
N
t
+ 1

N
t
+ 1
N
t
.
Now, since T
n
< ∞ for all n, we have N
t
↑ ∞ a.s. By the law of large numbers
there is an Ω
0
⊂ Ω such that P(Ω
0
) = 1 and such that for ω ∈ Ω
0
,
T
N
t
(ω)
(ω)
N
t
(ω)
→ µ and
N
t
(ω) + 1
N
t
(ω)
→ 1.
Thus t/N
t
(ω) → µ a.s. and we are done.
Let X
1
, X
2
, . . . be i.i.d. with distribution F. For x ∈ R set
F
n
(x, ω) =
1
n
n

n=1
1
(X
k
≤x)
(ω).
This is the observed frequency of values ≤ x. Now, ﬁx ω ∈ Ω and set a
k
= X
k
(ω).
Then F
n
(x, ω) is the distribution with jump of size
1
n
at the points a
k
. This is
called the imperial distribution based on n samples of F. On the other hand, let
us ﬁx x. Then F
n
(x, ) is a random variable. What kind of a random variable is
it? Deﬁne
ρ
k
(ω) = 1
(X
k
≤x)
(ω) =
_
1, X
k
(ω) ≤ x
0, X
k
(ω) > x
Notice that in fact the ρ
k
are independent Bernoullians with p = F(x) and Eρ
k
=
F(x). Writing
F
n
(x, ω) =
1
n
n

k=1
ρ
k
we see that in fact F
n
(x, ) =
S
n
n
and the Strong Law of Large numbers shows
that for every x ∈ R, F
n
(x, ω) → F(x) a.s. Of course, the exceptional set may
depend on x. That is, what we have proved here is that given x ∈ R there is a set
109
N
x
⊂ Ω such that P(N
x
) = 0 and such that F
n
(x, ω) → F(x) for ω ∈ N
x
. If we
set N = ∪
x∈Q
N
x
where we use Q to denote the rational numbers, then this set
also has probability zero and oﬀ this set we have F
n
(x, ω) → F(x) for all ω ∈ N
and all x ∈ Q. This and the fact that the discontinuities of distribution functions
are at most countable turns out to be enough to prove
Theorem 7.2 ( Glivenko–Cantelli Theorem). Let
D
n
(ω) = sup
x∈R
[F
n
(x, ω) −F(x)[.
Then D
n
→ 0 a.s.
110
VI
THE CENTRAL LIMIT THEOREM
¸1 Convergence in Distribution.
If X
n
“tends” to a limit, what can you say about the sequence F
n
of d.f. or
the sequence ¦µ
n
¦ of measures?
Example 1.1. Suppose X has distribution F and deﬁne the sequence of random
variables X
n
= X+1/n. Clearly, X
n
→ X a.s. and in several other ways. F
n
(x) =
P(X
n
≤ x) = P(X ≤ x −1/n) = F(x −1/n). Therefore,
lim
n→∞
F
n
(x) = F(x−).
Hence we do not have convergence of F
n
to F. Even worse, set X
n
= X + C
n
where C
n
=
_
1
n
even
−1/n odd
. Thenthe limit may not even exist.
Deﬁnition 1.1. The sequence ¦F
n
¦ of d.f. converges weakly to the d.f. F if
F
n
(x) → F(x) for every point of continuity of F. We write F
n
⇒ F. In all
our discussions we assume F is a d.f. but it could just as well be a (sub. d.f.).
The sequence of random variables X
n
converges weakly to X if their distri-
butions functions F
n
(x) = P(X
n
≤ x) converge weakly to F(x) = P(X ≤ x). We
will also use X
n
⇒ X.
Example 1.2.
(1) The Glivenko–Cantelli Theorem
111
(2) X
i
= i.i.d. ±1, probability 1/2. If S
n
= X
1
+. . . +X
n
then
F
n
(y) = P
_
S
n

n
≤ y
_

1

_
y
−∞
e
−x
2
/2
dx.
This last example can be written as
S
n
n
⇒ N(0, 1) and is called the the De
Moivre–Laplace Central limit Theorem. Our goal in this chapter is to obtain a very
general version of this result. We begin with a detailed study of convergence in
distribution.
Theorem 1.1 (Skorhod’s Theorem). IF F
n
⇒ F, then there exists random
variables Y
n
, Y with Y
n
→ Y a.s. and Y
n
∼ F
n
, Y ∼ F.
Proof. We construct the random variables on the canonical space. That is, let
Ω = (0, 1), T the Borel sets and P the Lebesgue measure. As in Chapter IV,
Theorem 1.1,
Y
n
(ω) = inf¦x: ω ≤ F
n
(x)¦, Y (ω) = inf¦x: ω ≤ F(x)¦.
are random variables satisfying Y
n
∼ F
n
and Y ∼ F
The idea is that if F
n
→ F then F
−1
n
→ F
−1
, but of course, the problem is
that this does not happen for every point and that the random variables are not
exactly inverses of the distribution functions. Thus, we need to proceed with some
care. In fact, what we shall show is that Y
n
(ω) → Y (ω) except for a countable
set. Let 0 < ω < 1. Given ε > 0 chose and x for which Y (ω) − ε < x < Y (ω)
and F(x−) = F(x), (that is for which F is continuous at x). Then by deﬁnition
F(x) < ω. Since F
n
(x) → F(x) we have that for all n > N, F
n
(x) < ω. Hence,
again by deﬁnition, Y (ω) −ε < x < Y
n
(ω), for all such n. Therefore,
lim Y
n
(ω) ≥ Y (ω).
It remains to show that
lim Y
n
(x) ≤ Y (x).
112
Now, if ω < ω

and ε > 0, choose y for which Y (ω

) < y < Y (ω

) + ε and F is
continuous at y. Now,
ω < ω

≤ F(Y (ω

)) ≤ F(y).
Again, since F
n
(y) → F(y) we see that for all n > N, ω ≤ F
n
(y) and hence
Y
n
(ω) ≤ y < Y (ω

) + ε which implies lim Y
n
(ω) ≤ Y (ω

). If Y is continuous at ω
we must have
lim Y
n
(ω) ≤ Y (ω).

The following corollaries follow immediately from Theorem 1.1 and the results
in Chapter II.
Corollary 1.1 (Fatou’s in Distribution). Suppose X
n
⇒ X and g ≥ 0 and
continuous. Then E(g(X)) ≤ limE(g(X
n
)).
Corollary 1.2 (Dominated Convergence in Distribution). If X
n
⇒ X, g
is continuous and and E[(g(X
n
)[ < C, then
E(g(X
n
)) → E(g(X)).
The following is a useful characterization of convergence in distribution.
Theorem 1.2. X
n
⇒ X if and only if for every bounded continuous function g
we have E(g(X
n
)) → E(g(X)).
Proof. If X
n
⇒ X then Corollary 2.1 implies the convergence of the expectations.
Conversely, let
g
x,ε
(y) =
_
¸
_
¸
_
1 y ≤ x
0 y ≥ x +ε
linear x ≤ y ≤ x +ε
113
It follows from this that
P(X
n
≤ x) ≤ E(g
x,ε
(X
n
))
and therefore,
limP(X
n
≤ x) ≤ limE(g
x,ε
(X
n
))
= E(g
x,ε
(X))
≤ P(X ≤ x +ε).
Now, let ε → 0 to conclude that
limP(X
n
≤ x) ≤ P(X ≤ x).
In the same way,
P(X ≤ x −ε) ≤ E(g
x−ε,ε
(X))
= lim
n→∞
E(g
x−ε,ε
(X
n
))
≤ lim
n→∞
P(X
n
≤ x).
Now, let ε → 0. If F continuous at x, we obtain the result.
Corollary 1.3. Suppose X
n
→ X in probability. Then X
n
⇒ X.
Lemma 1.1. Suppose X
n
→ 0 in probability and [X
n
[ ≤ Y with E(Y ) < ∞.
Then E[X
n
[ → 0.
Proof. Fix ε > 0. Then P¦[X
n
[ > ε¦ → 0, as n → ∞. Hence by Proposition 2.6 in
Chapter II,
_
{|X
n
|>ε}
[Y [ dP → 0, as n → ∞.
114
Since
E[X
n
[ =
_
{|X
n
|<ε}
[X
n
[dP +
_
{|X
n
|>ε}
[X
n
[dP
< ε +
_
{|X
n
|>ε}
[Y [ dP,
the result follows.
Proof of Corollary 1.3. If X
n
→ X in probability and g is bounded and continuous
then g(X
n
) → g(X) in probability (why ?) and hence E(g(X
n
)) → E(g(X)),
proving X
n
⇒ X.
An alternative proof is as follows. Set a
n
= E(G(X
n
)) and a = E(X). Let
a
n
k
be a subsequence. Since X
n
k
converges to X in probability also, we have a
subsequence X
n
k
j
which converges almost everywhere and hence by the dominated
convergence theorem we have a
n
k
j
→ a and hence the sequence a
n
also converges
to a, proving the result.
Theorem 1.3 (Continuous mapping Theorem). Let g be a measurable func-
tion in R and let D
g
= ¦x: g is discontinuous at x¦. If X
n
⇒ X and P¦X ∈
D
g
¦ = µ(D
g
) = 0, then g(X
n
) ⇒ g(X)
Proof. Let X
n
∼ Y
n
, X ∼ Y and Y
n
→ Y a.s. Let f be continuous and bounded.
Then D
f◦g
⊂ D
g
. So,
P¦Y

∈ D
f◦g
¦ = 0.
Thus,
f(g(Y
n
)) → f(g(Y ))
a.s. and the dominated convergence theorem implies that E(f(g(Y
n
))) → E(f(g(Y ))
and this proves the result.
Next result gives a number of useful equivalent deﬁnitions.
115
Theorem 1.4. The following are equivalent:
(i) X
n
⇒ X.
(ii) For all open sets G ⊂ R, limP(X
n
∈ G) ≥ P(X ∈ G) or what is the
same, limµ
n
(G) ≥ µ(G), where X
n
∼ µ
n
and X ∼ µ.
(iii) For all closed sets K ⊂ R , limP(X
n
∈ K) ≤ P(X ∈ K).
(iv) For all sets A ⊂ R with P(X ∈ ∂A) = 0 we have lim
n→∞
P(X
n
∈ A) =
P(X ∈ A).
We recall that for any set A, ∂A = A¸A
0
where A is the closure of the set
and A
0
is its interior. It can very well be that we have strict inequality in (ii) and
(iii). Consider for example, X
n
= 1/n so that P(X
n
= 1/n) = 1. Take G = (0, 1).
Then P(X
n
∈ G) = 1. But 1/n → 0 ∈ ∂G, so,
P(X ∈ G) = 0.
Also, the last property can be used to deﬁne weak convergence of probability
measures. That is, let µ
n
and µ be probability measures on (R, B). We shall say
that µ
n
converges to µ weakly if µ
n
(A) → µ(A) for all borel sets A in R with the
property that µ(∂A) = 0.
Proof. We shall prove that (i) ⇒ (ii) and that (ii)⇔ (iii). Then that (ii) and (iii)
⇒ (iv), and ﬁnally that (iv) ⇒ (i).
Proof. Assume (i). Let Y
n
∼ X
n
, Y ∼ X, Y
n
→ Y a.s. Since G is open,
lim1
(Y
n
∈G)
(ω) ≥ 1
(Y ∈G)
(ω)
Therefore Fatou’s Lemma implies
P(Y ∈ G) ≤ limP(Y
n
∈ G),
116
proving (ii). Next, (ii) ⇒ (iii). Let K be closed. Then K
c
is open. Put
P(X
n
∈ K) = 1 −P(X
n
∈ K
c
)
P(X ∈ K) = 1 −P(X ∈ K
c
).
The equivalence of (ii) and (iii) follows from this.
Now, (ii) and (iii) ⇒ (iv). Let K = A, G = A
0
and ∂A = A¸A
0
. Now,
G = K¸∂A and under our assumption that P(X ∈ ∂A) = 0,
P(X ∈ K) = P(X ∈ A) = P(X ∈ G).
Therefore, (ii) and (iii) ⇒
lim P(X
n
∈ A) ≤ lim P(X
n
∈ K)
≤ P(X ∈ K)
= P(X ∈ A)
lim P(X
n
∈ A) ≥ lim P(X
n
∈ G)
and this gives
P(X

∈ G) = P(X

∈ A).
To prove that (iv) implies (i), take A = (−∞, x]. Then ∂A = ¦x¦ and this
completes the proof.
Next, recall that any bounded sequence of real numbers has the property that
it contains a subsequence which converges. Suppose we have a sequence of prob-
ability measures µ
n
. Is it possible to pull a subsequence µ
u
k
so that it converges
weakly to a probability measure µ? Or, is it true that given distribution functions
F
n
there is a subsequence ¦F
n
k
¦ such that F
n
k
converges weakly to a distribution
function F? The answer is no, in general.
117
Example 1.3. Take F
n
(x) =
1
3
1
(x≥n)
(x) +
1
3
1
(x≥−n)
(x) +
1
3
G(x) where G is a
distribution function. Then
lim
n→∞
F
n
(x) = F(x) = 1/3 + 1/3G(x)
lim
x↑∞
f(x) = 2/3 < 1
lim
x↓−∞
F(x) = 1/3 ,= 0.
Lemma 1.1. Let f be an increasing function on the rationals Q and deﬁne
˜
f on
R by
˜
f(x) = inf
x<t∈Q
f(t) = inf¦f(t): x < t ∈ Q¦
= lim
t
n
↓x
f(t
n
)
Then
˜
f is increasing and right continuous.
Proof. The function
˜
f is clearly increasing. Let x
0
∈ R and ﬁx ε > 0. We shall
show that there is an x > x
0
such that
0 ≤
˜
f(x) −
˜
f(x
0
) < ε.
By the deﬁnition, there exists t
0
∈ Q such that t
0
> x
0
and
f(t
0
) −ε <
˜
f(x
0
) < f(t
0
).
Hence
[f(t
0
) −
˜
f(x)[ < ε.
Thus if t ∈ Q is such that x
0
< t < t
0
, we have
0 ≤ f(t) −
˜
f(x
0
) ≤ f(t
0
) −
˜
f(x
0
) < ε.
That is, for all x
0
< t < t
0
, we have
f(t) <
˜
f(x
0
) +ε
and therefore if x
0
< x < t
0
we see that
0 ≤
˜
f(x) −
˜
f(x
0
) < ε,
proving the right continuity of
˜
f.
118
Theorem 1.5 (Helly’s Selection Theorem). Let ¦F
n
¦ be a sequence of of
distribution functions. There exists a subsequence ¦F
n
k
¦ and a right continuous
nondecreasing function function F such that F
n
k
(x) → F(x) for all points x of
continuity of F.
Proof. Let q
1
, q
2
, . . . be an enumeration of the rational. The sequence ¦F
n
(q
1

has values in [0, 1]. Hence, there exists a subsequence F
n
1
(q
1
) → G(q
1
). Similarly
for F
n
1
(q
2
) and so on. schematically we see that
q
1
: F
n
1
, . . . → G(q
1
)
q
2
: F
n
2
, . . . → G(q
2
).
.
.
.
q
k
: F
n
k
(q
k
) . . . → G(q
k
)
.
.
.
Now, let ¦F
n
n
¦ be the diagonal subsequence. Let q
j
be any rational. Then
F
n
n
(q
j
) → G(q
j
).
So, we have a nondecreasing function G deﬁned on all the rationals. Set
F(x) = inf¦G(q): q ∈ Q: q > x¦
= lim
q
n
↓x
G(q
n
)
By the Lemma 1.1 F is right continuous and nondecreasing. Next, let us show
that F
n
k
(x) → F(x) for all points of continuity of F. Let x be such a point and
pick r
1
, r
2
, s ∈ Q with r
1
< r
2
< x < s so that
F(x) −ε < F(r
1
) ≤ F(r
2
)
≤ F(x) ≤ F(s)
< F(x) +ε.
119
Now, since F
n
k
(r
2
) → F(r
2
) ≥ F(r
1
) and F
n
k
(s) → F(s) we have for n
k
large
enough,
F(x) −ε < F
n
k
(r
2
) ≤ F
n
k
(x) ≤ F
n
k
(s) < F(x) +ε
and this shows that F
n
k
(x) → F(x), as claimed.
When can we guarantee that the above function is indeed a distribution?
Theorem 1.6. Every weak subsequential limit µ of ¦µ
n
¦ is a probability measures
if and only if for every ε > 0 there exists a bounded interval I
ε
= (a, b] such that
inf
n
µ
n
(I
ε
) > 1 −ε. (*)
In terms of the distribution functions this is equivalent to the statement that
for all ε > 0, there exists an M
ε
> 0 such that sup
n
¦1 −F
n
(M
ε
) +F
n
(−M
ε
)¦ < ε.
A sequence of probability measures satisfying (∗) is said to be tight. Notice that
if µ
n
is unit mass at n then clearly µ
n
is not tight. “The mass of µ
n
scapes to
inﬁnity.” The tightness condition prevents this from happening.
Proof. Let µ
n
k
⇒ µ. Let J ⊃ I
ε
and µ(∂J) = 0. Then
µ(R) ≥ µ(J) = lim
n→∞
µ
n
k
(J)
≥ limµ
n
k
(I
ε
)
> 1 −ε.
Therefore, µ(R) = 1 and µ is a probability measure.
Conversely, suppose (∗) fails. Then we can ﬁnd an ε > 0 and a sequence n
k
such that
µ
n
k
(I) ≤ 1 −ε,
120
for all n
k
and all bounded intervals I. Let µ
n
k
j
→ µ weakly. Let J be a continuity
interval for µ. Then
µ(J) = lim
j→∞
µ
n
k
j
(J) ≤ limµ
n
k
j
(J)
≤ 1 −ε.
Therefore, µ(R) ≤ 1 −ε and µ is not a probability measure.
¸2 Characteristic Functions.
Let µ be a probability measure on R and deﬁne its Fourier transform by
ˆ µ(t) =
_
R
e
itx
dµ(x). Notice that the Fourier transform is a a complex valued
function satisfying [ˆ µ(t)[ ≤ µ(R) = 1 for all t ∈ R. If X be a random variable its
characteristic function is deﬁned by
ϕ
X
(t) = E(e
itX
) = E(cos(tX)) +iE(sin(tX)).
Notice that if µis the distribution measure of X then
ϕ
X
(t) =
_
R
e
itx
dµ(x) = ˆ µ(t).
and again [ϕ
X
(t)[ ≤ 1. Note that if X ∼ Y then ϕ
X
(t) = ϕ
Y
(t) and if X and Y
are independent then
ϕ
X+Y
(t) = E(e
itX
e
itY
) = ϕ
X
(t)ϕ
Y
(t).
In particular, if X
1
, X
2
, . . . , X
n
are are i.i.d., then
ϕ
X
n
(t) = (ϕ
X
1
(t))
n
.
Notice also that if (a +ib) = a − ib then ϕ
X
(t) = ϕ
X
(−t). The function ϕ is
uniformly continuous. To see the this observe that
[ϕ(t +h) −ϕ(t)[ = [E[e
i(t+h)X
−e
itX
)[
≤ E[e
ihX
−1[
121
and use the continuity of the exponential to conclude the uniform continuity of
ϕ
X
. Next, suppose a and b are constants. Then
ϕ
aX+b
(t) = e
itb
ϕ
X
(at).
In particular,
ϕ
−X
(t) = ϕ
X
(−t) = ϕ
X
(t).
If −X ∼ X then ϕ
X
(t) = ϕ
X
(t) andϕ
X
is real. We now proceed to present some
examples which will be useful later.
Examples 2.1.
(i) (Point mass at a) Suppose X ∼ F = δ
a
. Then
ϕ(t) = E(e
itX
) = e
ita
(ii) (Coin ﬂips) P(X = 1) = P(X = −1) = 1/2. Then
ϕ(t) = E(e
itX
) =
1
2
e
it
+
1
2
e
−it
=
1
2
(e
it
+e
−it
) = cos(t).
(iii) (Bernoulli ) P(X = 1) = p, P(X = 0) = 1 −p. Then
ϕ(t) = E(e
itX
) = pe
it
+ (1 −p)
= 1 +p(e
it
−1)
(iv) (Poisson distribution) P(X = k) = e
−λ λ
k
k!
, k = 0, 1, 2, 3 . . .
ϕ(t) =

k=0
e
itk
e
−λ
λ
k
k!
= e
−λ

k=0
(λe
it
)
k!
k
= e
−λ
e
λe
it
= e
λ(e
it
−1)
122
(v) (Exponential) Let X be exponential with density e
−y
. Integration by parts
gives
ϕ(t) =
1
(1 −it)
.
(vi) (Normal) X ∼ N(0, 1).
ϕ(t) = e
−t
2
/2
.
Proof of (vi). Writing e
itx
= cos(tx) +i sin(tx) we obtain
ϕ(t) =
1

_
R
e
itx
e
−x
2
/2
dx =
1

_
R
cos txe
−x
2
/2
dx
ϕ

(t) =
1

_
R
−xsin(tx)e
−x
2
/2
dx
= −
1

_
R
sin(tx)xe
−x
2
/2
dx
= −
1

_
R
t cos(tx)e
−x
2
/2
dx
= −tϕ(t).
This gives
ϕ

(t)
ϕ(t)
= −t which, together with the initial condition ϕ(0) = 1, immedi-
ately yields ϕ(t) = e
−t
2
/2
as desired.
Theorem 2.1 (The Fourier Inversion Formula). Let µ be a probability mea-
sure and let ϕ(t) =
_
R
e
itx
dµ(x). Then if x
1
< x
2
µ(x
1
, x
2
) +
1
2
µ(x
1
) +
1
2
µ(x
2
) = lim
T→0
1

_
T
−T
e
−itx
1
−e
−itx
2
it
ϕ(t)dt.
Remark. The existence of the limit is part of the conclusion. Also, we do not mean
that the integral converges absolutely. For example, if µ = δ
0
then ϕ(t) = 1. If
x
1
= −1 and x
2
= 1, then we have the integral of
2 sin t
t
which does not converse
absolutely.
123
Recall that
sign(α) =
_
¸
_
¸
_
1 α > 0
0 α = 0
−1 α < 0
Lemma 2.1. For all y > 0,
0 ≤ sign(α)
_
y
0
sin(αx)
x
dx ≤
_
π
0
sin x
x
dx, (2.1)
_

0
sin(αx)
x
dx = π/2 sign(α), (2.2)
_

0
1 −cos αx
x
2
dx =
π
2
[α[. (2.3)
Proof. Let αx = u. It suﬃces to prove (2.1)–(2.3) for α = 1. For (2.1), write
[0, ∞) = [0, π] ∪ [π, 2π], . . . and choose n so that nπ < y ≤ (n + 1)π. Then
_
y
0
sin x
x
dx =
n

k=0
_
_
(k+1)π

sin x
x
dx
_
+
_
y

sin x
x
dx
=
_
π
0
sin x
x
dx +
_

π
sin x
x
dx +
_

sin x
x
dx +. . . +
_
y

sin x
x
dx
=
_
π
0
sin x
x
dx + (−1)a
1
+ (−1)
2
a
2
+. . . + (−1)
n−1
a
n−1
+ (−1)
n
_
y

sin x
x
dx
where [a
j+1
[ < [a
j
[. If n is odd then n −1 is even and
y
_

sin x
x
dx < 0. Comparing
terms we are done. If n is even, the result follows by replacing y with (n+1)π and
using the same argument.
For (2.2) and (2.3) apply Fubini’s Theorem to obtain
_

0
sin x
x
dx =
_

0
sin x
_

0
e
−ux
dudx
=
_

0
__

0
e
−ux
sin xdx
_
du
=
_

0
_
du
1 +u
2
_
= π/2.
124
and
_

0
1 −cos x
x
2
dx =
_

0
1
x
2
_
x
0
sin ududx
=
_

0
sin u
_

u
dx
x
2
du
=
_

0
sin u
u
du
= π/2.
This completed the proof.
Proof of Theorem 2.1. We begin by observing that
¸
¸
¸
¸
e
it(x−x
1
)
−e
it(x−x
2
)
it
¸
¸
¸
¸
=
¸
¸
¸
¸
_
x
2
x
1
e
−itu
du
¸
¸
¸
¸
≤ [x
1
−x
2
[
and hence for any T > 0,
_
R
1
_
T
−T
[x
2
−x
2
[dtdµ(x) ≤ 2T[x
1
−x
2
[ < ∞.
From this, the deﬁnition of ϕ and Fubini’s Theorem, we obtain
1

_
T
−T
e
−i+x
1
−e
−itx
2
it
ϕ(t)dt =
_

−∞
_
T
−T
e
−itx
1
−e
−itx
2
2πit
e
itx
dtdµ(x)
=
_

−∞
_
_
T
−T
e
it(x−x
1
)
−e
it(x−x
2
)
2πit
dt
_
dµ(x)
=
_

−∞
F(T, x, x
1
, x
2
)dµ(x) (2.4)
Now,
F(T, x, x
1
, x
2
) =
1
2πi
_
T
−T
cos(t(x −x
1
))
t
dt +
i
2πi
_
T
−T
sin(t(x −x
1
))
t
dt

1
2πi
_
T
−T
cos(t(x −x
2
))
t
dt −
i
2πi
_
T
−T
sin(t(x −x
2
))
t
dt
=
1
π
_
T
0
sin(t(x −x
1
))
t
dt −
1
π
_
T
0
sin(t(x −x
2
))
t
dt,
125
using the fact that
sin(t(x −x
i
))
t
is even and
cos(t(x −x
i
)
t
odd.
By (2.1) and (2.2),
[F(T, x, x
1
, x
2
)[ ≤
2
π
_
π
0
sin t
t
dt.
and
lim
T→∞
F(T, x, x
1
, x
2
) =
_
¸
¸
¸
¸
¸
¸
_
¸
¸
¸
¸
¸
¸
_

1
2
−(−
1
2
) = 0, if x < x
1
0 −(−
1
2
) =
1
2
, if x = x
1
1
2
−(−
1
2
) = 1, if x
1
< x < x
2
1
2
−0 =
1
2
if x = x
2
1
2

1
2
= 0, if x > x
2
Therefore by the dominated convergence theorem we see that the right hand side
of (2.4) is
_
(−∞,x
1
)
0 dµ +
_
{x
1
}
1
2
dµ +
_
(x
1
,x
2
)
1 dµ +
_
1
2
dµ +
_
(x
2
,∞)
0 dµ
= µ(x
1
, x
2
) +
1
2
µ¦x
1
¦ +
1
2
µ¦x
2
¦,
proving the Theorem.
Corollary 2.1. If two probability measures have the same characteristic function
then they are equal.
This follows from the following
Lemma 2.1. Suppose The two probability measures µ
1
and µ
2
agree on all inter-
vals with endpoints in a given dense sets, then they agree on all of B(R).
Proof Corollary 2.1. Since the atoms of both measures are countable, the two
measures agree, the union of their atoms is also countable and hence we may
apply the Lemma.
126
Corollary 2.2. Suppose X is a random variable with distribution function F
and characteristic function satisfying
_
R

X
[dt < ∞. Then F is continuously
diﬀerentiable and
F

(x) =
1

_
R
e
−ity
ϕ
X
(y)dy.
Proof. Let x
1
= x −h, x
2
= x, h > 0. Since µ(x
1
, x
2
) = F(x
2
−) −F(x
1
) we have
F(x
2
−) −F(x
1
) +
1
2
(F(x
1
) −F(x
1
−)) +
1
2
(F(x
2
) −F(x
2
−))
= µ(x
1
, x
2
) +
1
2
µ¦x
1
¦ +
1
2
µ¦x
2
¦
=
1

_

−∞
_
e
−it(x−h)
−e
−itx
it
_
ϕ
X
(t)dt.
Since
¸
¸
¸
¸
e
−it(x−h)
−e
−itx
it
¸
¸
¸
¸
=
¸
¸
¸
¸
_
x
x−h
e
−ity
dy
¸
¸
¸
¸
≤ h
we see that
lim
h→∞
(µ(x
1
, x
2
) +
1
2
µ(x
1
)) +
1
2
µ¦x
2
¦ ≤
≤ lim
h→0
h

_
R

X
(t)[ = 0.
Hence, µ¦x¦ = 0 for any x ∈ R, proving the continuity of F. Now,
F(x +h) −F(x)
h
= µ(x, x +h)
=
1

_
R
_
e
−it
−e
−it(x+h)
hit
_
ϕ
X
(t)dt
=
1

_
R

(e
−it(x+h)
−e
−itx
)
hit
ϕ
X
(t)dt.
Let h → 0 to arrive at
F

(x) =
1

_
R
e
−itx
ϕ
X
(t)dt.
127
Note that the continuity of F

follows from this, he continuity of the exponential
and the dominated convergence theorem.
Writing
F(x) =
_
x
−∞
F

(t)dt =
_
x
−∞
f(t)dt
we see that it has a density
f =
1

_
R
e
−itx
ϕ
X
(t)dt.
and hence also
ϕ(t) =
_
R
e
itx
f(x)dx.
¸3 Weak convergence and characteristic functions.
Theorem 3.1. Let ¦µ
n
¦ be a sequence of probability measures with characteristic
functions ϕ
n
. (i) If µ
n
converges weakly to a probability measure µ with charac-
teristic function ϕ, then ϕ
n
(t) → ϕ(t) for all t ∈ R. (ii) If ϕ
n
(t) → ˜ ϕ(t) for all
t ∈ R where ˜ ϕ is a continuous function at 0, then the sequence of measures ¦µ
n
¦
is tight and converges weakly to a measure µ and ˜ ϕ is the characteristic function
of µ. In particular, if ϕ
n
(t) converges to a characteristic function ϕ then µ
n
⇒ µ.
Example 3.1. Let µ
n
∼ N(0, n). Then ϕ
n
(t) = e

nt
2
2
. (By scaling if X ∼
N(µ, σ
2
) then ϕ
X
(t) = e
iµt−σ
2
t
2
/2
.) Clearly ϕ
n
→ 0 for all t ,= 0 and ϕ
n
(0) = 1
for all n. Thus ϕ
n
(t) converges for ever t but the limit is not continuous at 0. Also
with
µ
n
(−∞, x] =
1

2πn
_
x
−∞
e
−t
2
/2n
dt
a simple change of variables (r =
t

n
) gives
µ
n
= (−∞, x] =
1

_ x

n
−∞
e
−t
2
2
dt → 1/2
and hence no weak convergence.
128
Proof of (i). This is the easy part. Note that g(x) = e
itx
is bounded and continu-
ous. Since µ
n
⇒ µ we get that E(g(X
n
)) → E(y(X

)) and this gives ϕ
n
(t) → ϕ(t)
for every t ∈ R.
For the proof of (ii) we need the following Lemma.
Lemma 3.1 (Estimate of µ in terms of ϕ). For all A > 0 we have
µ[−2A, 2A] ≥ A
¸
¸
¸
¸
_
A
−1
−A
−1
ϕ(t)dt
¸
¸
¸
¸
−1. (3.1)
This, of course can also be written as
1 −A
¸
¸
¸
¸
_
A
−1
−A
−1
ϕ(t)[dt[ ≥ −µ[−2A, 2A], (3,2)
or
P¦[X[ > 2A¦ ≤ 2 −A
¸
¸
¸
¸
_
A
−1
−A
−1
ϕ(t)dt
¸
¸
¸
¸
. (3.3)
Proof of (ii). Let δ > 0.
¸
¸
¸
¸
1

_
δ
−δ
ϕ(t)dt
¸
¸
¸
¸

¸
¸
¸
¸
1

_
δ
−δ
ϕ
n
(t)dt
¸
¸
¸
¸
+
1

_
δ
−δ

n
(t) −ϕ(t)[dt.
Since ϕ
n
(t) → ϕ(t) for all t, we have for each ﬁxed δ > 0 (by the dominated
convergence theorem)
lim
n→∞
1

δ
_
−δ

n
(t) −ϕ(t)[dt → 0.
Since ϕ is continuous at 0, lim
δ→0
1

_
δ
δ
[ϕ(t)[dt = [ϕ(0)[ = 1. Thus for all ε > 0
there exists a δ = δ(ε) > 0 and n
0
= n
0
(ε) such that for all n ≥ n
0
,
1 −ε/2 <
¸
¸
¸
¸
1

_
δ
−δ
ϕ
n
(t)dt
¸
¸
¸
¸
+ε/2,
129
or
1
δ
¸
¸
¸
¸
_
δ
−δ
ϕ
n
(t)dt
¸
¸
¸
¸
> 2(1 −ε).
Applying the Lemma with A =
1
δ
gives
µ
n
[−2δ
−1
, 2δ
−1
] > δ
¸
¸
¸
¸
_
δ
−δ
ϕ(t)dt
¸
¸
¸
¸
> 2(1 −ε) −1 = 1 −2ε,
for all n ≥ n
0
. Thus the sequence ¦µ
n
¦ is tight. Let µ
n
k
⇒ ν. Then ν a probability
measure. Let ψ be the characteristic function of ν. Then since µ
n
k
⇒ ν the ﬁrst
part implies that ϕ
n
k
(t) → ψ(t) for all t. Therefore, ψ(t) = ˜ ϕ(t) and hence ˜ ϕ(t) is
a characteristic function and any weakly convergent subsequence musty converge
to a measure whose characteristic function is ˜ ϕ. This completes the proof.
Proof of Lemma 3.1. For any T > 0
_
T
−T
(1 −e
itx
)dt = 2T −
_
T
−T
(cos tx +i sin tx)dt
= 2T −
2 sin(Tx)
x
.
Therefore,
1
T
_
R
n
_
T
−T
(1 −e
itx
)dtdµ(x) = 2 −
_
R
2 sin(Tx)
Tx
dµ(x)
or
2 −
1
T
_
T
−T
ϕ(t)dt = 2 −
_
R
2 sin(πx)
Tx
dµ(x).
That is, for all T > 0,
1
2T
_
T
−T
ϕ(t)dt =
_
R
sin(Tx)
Tx
dµ(x).
Now, for any [x[ > 2A,
¸
¸
¸
¸
sin(Tx)
Tx
¸
¸
¸
¸

1
[Tx[

1
(2TA)
130
and also clearly,
¸
¸
¸
¸
sin Tx
Tx
¸
¸
¸
¸
< 1, for all x.
Thus for any A > 0 and any T > 0,
[
_
R
sin(Tx)
Tx
dµ(x)[ =
¸
¸
¸
¸
_
2A
−2A
sin(Tx)
Tx
µ(dx) +
_
|x|>2A
sin(Tx)
Tx
dµ(x)
¸
¸
¸
¸
≤ µ[−2A, 2A] +
1
2TA
[1 −µ[−2A, 2A]]
=
_
1 −
1
2TA
_
µ[−2A, 2A] +
1
2TA
.
Now, take T = A
−1
to conclude that
A
2
¸
¸
¸
¸
_
A
−1
−A
−1
ϕ(t)dt
¸
¸
¸
¸

¸
¸
¸
¸
_
R
sin Tx
Tx

¸
¸
¸
¸
=
1
2
µ[−2A, 2A] + 1/2
which completes the proof.
Corollary. µ¦x: [x[ > 2/T¦ ≤
1
T
_
T
−T
(1 − ϕ(t))dt, or in terms of the random
variable,
P¦[X[ > 2/T¦ ≤
1
T
_
T
−T
(1 −ϕ(t))dt,
or
P¦[X[ > T¦ ≤ T/2
_
T
−1
−T
−1
(1 −ϕ(t))dt.
¸4 Moments and Characteristic Functions.
Theorem 4.1. Suppose X is a random variable with E[X[
n
< ∞for some positive
integer n. Then its characteristic function ϕ has bounded continuous derivatives
of any order less than or equal to n and
ϕ
(k)
(t) =

_
−∞
(ix)
k
e
itx
dµ(x),
131
for any k ≤ n.
Proof. Let µ be the distribution measure of X. Suppose n = 1. Since
_
R
[x[dµ(x) <
∞ the dominated convergence theorem implies that
ϕ

(t) = lim
h→0
ϕ(t +h) −ϕ(t)
h
= lim
n→0
_
R
_
e
i(t+h)x
−e
itx
h
_

=
_
R
(ix)e
itx
dµ(x).
We now continue by induction to complete the proof.
Corollary 4.1. Suppose E[X[
n
< ∞, n an integer. Then its characteristic func-
tion ϕ has the following Taylor expansion in a neighborhood of t = 0.
ϕ(t) =
n

m=0
i
m
t
m
E(X)
m
m!
+o(t
n
).
We recall here that g(t) = o(t
m
) as t → 0 means g(t)/t
m
→ 0 as t → 0.
Proof. By calculus, if ϕ has n continuous derivatives at 0 then
ϕ(t) =
n

m=0
ϕ
(m)
(0)
m!
t
m
+o(t
n
).
In the present case, ϕ
(m)
(0) = i
m
E(X
m
) by the above theorem.
Theorem 4.2. For any random variable X and any n ≥ 1
¸
¸
¸
¸
Ee
itX

n

m=0
E(itX)
m
m!
¸
¸
¸
¸
≤ E
¸
¸
¸
¸
e
itX

n

m=0
(itX)
m
m!
¸
¸
¸
¸
≤ E
_
min
_
[tX[
n+1
(n + 1)!
,
2[tX[
n
n!
__
.
This follows directly from
132
Lemma 4.2. For any real x and any n ≥ 1,
¸
¸
¸
¸
e
ix

n

m=0
(ix)
m
m!
¸
¸
¸
¸
≤ min
_
[x[
n+1
(n + 1)!
,
2[X[
n
n!
_
.
We note that this is just the Taylor expansion for e
ix
with some information
on the error.
Proof. For all n ≥ 0 (by integration by parts),
_
x
0
(x −s)
n
e
is
ds =
x
n+1
(n + 1)
+
i
(n + 1)
_
x
0
(x −s)
n+1
e
is
ds.
For n = 0 this is the same as
1
i
(e
ix
−1) =
_
x
0
e
is
ds = x +i
_
x
0
(x −s)e
is
ds
or
e
ix
= 1 +ix +i
2
_
x
0
(x −s)e
is
ds.
For n = 1,
e
ix
= 1 +ix +
i
2
x
2
2
+
i
3
2
_
x
0
(x −s)
2
e
is
ds
and continuing we get for any n,
e
ix

n

m=0
(ix)
m
m!
=
i
n+1
n!
_
x
0
(x −s)
n
e
is
ds.
So, need to estimate the right hand side.
¸
¸
¸
¸
i
n+1
n!
_
x
0
(x −s)
n
e
is
ds
¸
¸
¸
¸

1
n!
¸
¸
¸
¸
_
x
0
(x −s)
n
dx
¸
¸
¸
¸
=
[x[
n+1
(n + 1)!
.
This is good for [x[ small. Next,
i
n
_
x
0
(x −s)
n
e
is
ds = −
x
n
n
+
_
x
0
(x −s)
n−1
e
is
ds.
133
Since
x
n
n
=
x
_
0
(x −s)
n−1
ds
we set
i
n
_
x
0
(x −s)
n
e
is
ds =
_
x
0
(x −s)
n−1
(e
is
−1)ds
or
i
n+1
n!
_
x
0
(x −s)
n
e
is
ds =
i
n
(n −1)!
_
x
0
(x −s)
n−1
(e
is
−1)ds.
This gives
¸
¸
¸
¸
i
n+1
n!
_
x
0
(x −s)
n
e
is
ds
¸
¸
¸
¸

2
(n −1)!
_
|x|
0
(x −s)
n−1
ds

2
n!
[x[
n
,
and this completes the proof.
Corollary 1. If EX = µ and E[X[
2
= σ
2
< ∞, then ϕ(t) = 1+itµ−
t
2
σ
2
2
+o(t)
2
,
as t → 0.
Proof. Applying Theorem 4.1 with n = 2 gives
¸
¸
¸
¸
ϕ(t) −
_
1 +itµ −
t
2
σ
2
2

¸
¸
¸
≤ t
2
E
_
[t[[X[
3
3!

2[X[
2
2!
_
and the expectation goes to zero as t → 0 by the dominated convergence theo-
rem.
¸5. The Central Limit Theorem.
We shall ﬁrst look at the i.i.d. case.
Theorem 5.1. ¦X
i
¦ i.i.d. with EX
i
= µ, var(X
i
) = σ
2
< ∞. Set S
n
= X
1
+
. . . +X
n
. Then
S
n
−nµ
σ

n
⇒ N(0, 1).
134
Equivalently, for any real number x,

S
n
−µ
σ

n
≤ x¦ →
1

_
x
−∞
e
−y
2
/2
dt.
Proof. By looking at X

i
= X
i
−µ, we may assume µ = 0. By above
ϕ
X
1
(t) = 1 −
t
2
σ
2
2
+g(t)
with
g(t)
t
2
→ 0 as t → 0. By i.i.d.,
ϕ
S
n
(t) =
_
1 −
t
2
σ
2
2
+g(t)
_
n
or
ϕ S
n
σ

n
(t) = ϕ
S
n

nt) =
_
1 −
t
2
2n
+g
_
t
σ

n
__
n
.
Since
g(t)
t
2
→ 0 as t → 0, we have (for ﬁxed t) that
g
_
t
σ

n
_
(1/

n)
2
=
g
_
t
σ

n
_
1
n
→ 0,
as n → ∞. This can be written as
ng
_
t
σ

n
_
→ 0 as n → ∞.
Next, set C
n
= −
t
2
2
+ng
_
t
σ

n
_
and C = −t
2
/2. Apply Lemma 5.1 bellow to
get
ϕ S
n
σ

n
(t) =
_
1 −
t
2
2n
+g(t/σ

n)
_
n
→ e
−t
2
/2
and complete the proof.
135
Lemma 5.1. If C
n
are complex numbers with C
n
→ C ∈ C. Then
_
1 +
C
n
n
_
n

e
C
.
Proof. First we claim that if z
1
, z
2
, . . . , z
n
and ω
1
, . . . , ω
n
are complex numbers
with [z
j
[ and [ω
j
[ ≤ η for all j, then
¸
¸
¸
¸
n

m=1
z
m

n

m=1
ω
m
¸
¸
¸
¸
≤ η
n−1
n

m=1
[z
m
−ω
m
[. (5.1)
If n = 1 the result is clearly true for n = 1; with equality, in fact. Assume it for
n −1 to get
¸
¸
¸
¸
n

m=1
z
m

n

m=1
ω
m
¸
¸
¸
¸

¸
¸
¸
¸
z
n
n−1

m=2
z
m
−ω
n
n−1

m=1
ω
m
¸
¸
¸
¸
¸
¸
¸
¸
z
n
n−1

m=1
z
m
−z
n
n−1

m=1
ω
m
+z
n
n−1

m=1
ω
m
−ω
n
n−1

m=1
ω
m
¸
¸
¸
¸
≤ η
¸
¸
¸
¸
n−1

m=1
z
m

n−1

m=1
ω
m
¸
¸
¸
¸
+
¸
¸
¸
¸
n−1

m=1
ω
m
¸
¸
¸
¸
[z
n
−ω
m
[
≤ ηη
n−2
n−1

m=1
[z
m
−ω
m
[ +η
n−1
[z
n
−ω
m
[
= η
n−1
n

m=1
[z
m
−ω
m
[.
Next, if b ∈ C and [b[ ≤ 1 then
[e
b
−(1 +b)[ ≤ [b[
2
. (5.2)
For this, write e
b
= 1 +b +
b
2
2
+
b
3
3!
+. . . . Then
[e
b
−(1 +b)[ ≤
[b[
2
2
_
1 +
2[b[
3!
+
2[b[
2
4!
+. . .
_

[b[
2
2
_
1 +
1
2
+
1
2
2
+
1
2
3
+. . .
_
= [b[
2
,
which establishes (5.2).
136
With both (5.1) and (5.2) established we let ε > 0 and choose γ > [C[. Take
n large enough so that [C
n
[ < γ and γ
2
e

/n < ε and
¸
¸
¸
¸
C
n
n
¸
¸
¸
¸
≤ 1. Set z
i
= (1 +
C
n
n
)
and ω
i
= (e
C
n
/n
) for all i = 1, 2, . . . , n. Then
[z
i
[ =
¸
¸
¸
¸
1 +
C
n
n
¸
¸
¸
¸

_
1 +
γ
n
_
and [ω
i
[ ≤ e
γ/n
hence for both z
i
and [ω
i
[ we have the bound e
γ/n
_
1 +
γ
n
_
. By (5.1) we have
¸
¸
¸
¸
_
1 +
C
n
n
_
n
−e
C
n
¸
¸
¸
¸
≤ e
γ
n
(n−1)
_
1 +
γ
n
_
n−1
n

m=1
¸
¸
¸
¸
e
C
n
n

_
1 +
C
n
n

¸
¸
¸
≤ e
γ
n
(n−1)
_
1 +
γ
n
_
n−1
n
¸
¸
¸
¸
e
C
n
n

_
1 +
C
n
n

¸
¸
¸
Setting b = C
n
/n and using (5.2) we see that this quantity is dominated by
≤ e
γ
n
(n−1)
_
1 +
γ
n
_
n−1
n
¸
¸
¸
¸
C
n
n
¸
¸
¸
¸
2

γ
2
e
γ
n
(n−1)
_
1 +
γ
n
_
n−1
n

γ
2
e
γ
n
(n−1)
e
γ
n

γ
2
e

n
< ε,
which proves the lemma
Example 5.1. Let X
i
be i.i.d. Bernoullians 1 and 0 with probability 1/2. Let
S
n
= X
1
+. . . +X
n
= total number of heads after n–tones.
EX
i
= 1/2, var(X
i
) = EX
2
i
−(E(X))
2
= 1/2 −(
1
4
) = 1/4
and hence
S
n
−µ
n
σ

n
=
S
n

n
2
_
n/4
⇒ χ = N(0, 1).
From a table of the normal distribution we ﬁnd that
P(χ > 2) ≈ 1 −.9773 = 0.227.
137
Symmetry:
P([χ[ < 2) = 1 −2(0.227) = .9546.
Hence for n large we should have
0.95 ≈ P
_
¸
¸
¸
¸
S
n

n
2
_
n
2
/4
¸
¸
¸
¸
< 2
_
= P
_
−2
2

n ≤ S
n

n
2
<
2
2

n
_
= P
_
n
2

n < S
n

n +n/2
_
.
If n = 250, 000,
n
2

n = 125, 000 −500
n
2
+

n = 125, 000 + 5000.
That is, with probability 0.95, after 250,000 tosses you will get between 124,500
Examples 5.2. A Roulette wheel has slots 1–38 (18 red and 18 black) and two
slots 0 and 00 that are painted green. Players can bet \$1 on each of the red and
black slots. The player wins \$1 if the ball falls on his/her slot. Let X
1
, . . . , X
n
be
i.i.d. with X
i
= ¦±1¦ and P(X
i
= 1) =
18
38
, P(X
i
= −1) =
20
38
. S
n
= X
1
+ +X
n
is the total fortune of the player after n games. Suppose we want to know P(S
n
≥ 0)
after large numbers tries. Since
E(X
i
) =
18
38

20
38
=
−2
38
= −
1
19
var(X
i
) = EX
2
−(E(x))
2
= 1 −
_
1
19
_
2
= 0.9972
we have
P(S
n
≥ 0) = P
_
S
n
−nµ
σ

n

−nµ
σ

n
_
.
138
Take n such that

n
_
−1
19
_
(.9972)
= 2.
This gives

n = 2(19)(0.9972) or n ≈ 3 61.4 = 1444. Hence
P(S
1444
≥ 0) = P
_
S
n
−nµ
σ

n
≥ 2
_
≈ P(χ ≥ 2)
= 1 −0.9772
= 0.0228
Also,
E(S
1444
) = −
1444
19
= −4.19
= −76.
Thus, after n = 1444 the Casino would have won \$76 of your hard earned dollars,
in the average, but there is a probability .0225 that you will be ahead. So, you
decide if you want to play!
Lemma 5.2. Let C
n,m
be nonnegative numbers with the property that max
1≤m≤n
C
n,m

0 and
n

m=1
C
n,m
→ λ. Then
n

m=1
(1 −C
n,m
) → e
−λ
.
Proof. Recall that
lim
a↓0
log
_
1
1−a
_
a
= 1.
Therefore, given ε > 0 there exists δ > 0 such that 0 < a < δ implies
(1 −ε)a ≤ log
_
1
1 −a
_
≤ (1 +ε)a.
139
If n is large enough, C
m,n
≤ δ and
(1 −ε)C
m,n
≤ log
_
1
1 −C
m,n
_
≤ (1 +ε)C
m,n
.
Thus
n

m=1
log
_
1
1 −C
m,n
_
→ λ
and this is the same as
n

m=1
log(1 −C
m,n
) → −λ
or
log
_
n

m=1
(1 −C
m,n
)
_
→ −λ.
This implies the result.
Theorem 5.2 (The Lindeberg–Feller Theorem). For each n, let X
n,m
, 1 ≤
m ≤ n, be independent r.v.’s with EX
n,m
= 0. Suppose
(i)
n

m=1
EX
2
n,m
→ σ
2
, σ ∈ (0, ∞).
(ii) For all ε > 0, lim
n→∞
n

m=1
E([X
n,m
[
2
; [X
n,m
[ > ε) = 0.
Then, S
n
= X
n,1
+X
n,2
+. . . +X
n,m
⇒ N(0, σ
2
).
Example 5.3. Let Y
1
, Y
2
, . . . be i.i.d., EY
i
= 0, E(Y
2
i
) = σ
2
. Let X
n,m
=
Y
m
/n
1/2
. Then X
n,1
+X
n,2
+. . . +X
n,m
=
S
n

n
. Clearly,
n

m=1
E(Y
2
m
)
n
=
σ
2
n
n

m=1
1 = σ
2
.
Also, for all ε > 0,
n

m=1
E([X
n,m
[
2
; [X
n,m
[ > ε) = nE
_
[Y
1
[
2
n
;
[Y
1
[
n
1/2
> ε
_
= E([Y
1
[
2
; [Y
1
[ > εn
1/2
)
140
and this goes to 0 as n → ∞ since E[Y
1
[
2
< ∞.
Proof. Let ϕ
n,m
(t) = E(e
itX
n,m
), σ
2
n,m
= E(X
2
n,m
). It is enough to show that
n

m=1
ϕ
n,m
(t) → e
−t
2
σ
2
/2
.
Let ε > 0 and set z
n,m
= ϕ
n,m
(t), ω
n,m
= (1 −t
2
σ
2
n,m
/2). We have
[z
n,m
−ω
n,m
[ ≤ E
_
[tX
n,m
[
3
3!

2[tX
n,m
[
2
2!
_
≤ E
_
[tX
n,m
[
3!
3

2[tX
n,m
[
2
2!
; [X
n,m
[ ≤ ε
_
+E
_
[tX
n,m
[
3
3!

2t
2
[X
n,m
[
2
2!
; [X
n,m
[ > ε
_
≤ E
_
[tX
n,m
[
3
3!
; [X
n,m
[ ≤ ε
_
+E
_
[tX
n,m
[
2
; [X
n,m
[ > ε
_

εt
3
6
E[X
n,m
[
2
+t
2
E([X
n,m
[
2
; [X
n,m
[ > ε)
Summing from 1 to n and letting n → ∞ gives (using (i) and (ii))
lim
n→∞
n

m=1
[z
n,m
−ω
n,m
[ ≤
εt
3
σ
2
6
.
Let ε → 0 to conclude that
lim
n→∞
n

m=1
[z
n,m
−ω
n,m
[ → 0.
Hence with η = 1 (5.1) gives
¸
¸
¸
¸
n

m=1
ϕ
n,m
(t) −
n

m=1
_
1 −
t
2
σ
2
n,m
2
_
¸
¸
¸
¸
→ 0,
as n → ∞. Now,
σ
2
n,m
≤ ε
2
+E([X
n,m
[
2
; [X
n,m
[ > ε)
141
and therefore,
max
1≤m≤n
σ
2
n,m
≤ ε
2
+
n

m=1
E([X
n,m
[
2
; [X
n,m
[ > ε).
The second term goes to 0 as n → ∞. That is, max
1≤m≤n
σ
2
n,m
→ 0. Set C
n,m
=
t
2
σ
2
n,m
2
.
Then
n

m=1
C
n.m

t
2
2
σ
and Lemma 5.2 shows that
n

m=1
_
1 −
t
2
σ
n,m
2
_
→ e

t
2
σ
2
2
,
completing the proof of the Theorem.
We shall now return to the Kolmogorov three series theorem and prove the
necessity of the condition. This was not done when we ﬁrst stated the result earlier.
For the sake of completeness we state it in full again.
The Kolmogorov’s Three Series Theorem. Let X
1
, X
2
, . . . be independent
random variables. Let A > 0 and Y
m
= X
m
1
(|X
m
|≤A)
. Then

n=1
X
n
converges a.s.
if and only if the following three hold:
(i)

n=1
P([X
n
[ > A) < ∞,
(ii)

n=1
EY
n
converges and
(iii)

n=1
var(Y
n
) < ∞.
Proof. We have shown that if (i), (ii), (iii) are true then

n=1
X
n
converges a.s. We
now show that if

n=1
X
n
converges then (i)–(iii) hold. We begin by proving (i).
142
Suppose this is false. That is, suppose

m=1
P([X
n
[ > A) = ∞.
Then the Borel–Cantelli lemma implies that
P([X
n
[ > A i.o.) > 0.
Thus, lim
n

m=1
X
m
cannot exist. Hence if the series converges we must have (i).
Next, suppose (i) holds but

n=1
var(Y
n
) = ∞. Let
C
n
=
n

m=1
var(Y
m
) and X
n,m
=
(Y
m
−EY
m
C
1/2
n
.
Then
EX
n,m
= 0 and
n

m=1
EX
2
n,m
= 1.
Let ε > 0 and choose n so large that
2A
C
1/2
n
< ε. Then
n

m=1
E([X
n,m
[
2
; [X
n,m
[ > ε) ≤
n

m=1
E
_
[X
n,m
[
2
; [X
n,m
[ >
2A
C
1/2
n
_

n

m=1
E
_
[X
n,m
[
2
;
2A
C
1/2
n
<
[Y
n
[ +E[Y
m
[
C
1/2
n
_
.
But
[Y
n
[ +E[Y
m
[
C
1/2
n

2A
C
1/2
n
.
So, the above sum is zero. Let
S
n
= X
n,1
+X
n,2
+. . . X
n,m
=
1
C
1/2
n
n

m=1
(Y
m
−EY
m
).
By Theorem 5.2,
S
n
⇒ N(0, 1).
143
Now, if lim
n→∞
n

m=1
X
m
exists then lim
n→∞
n

m=1
Y
m
exists also. (This follows from (i).)
Let
T
n
=
n

m=1
Y
m
C
1/2
n
=
1
C
1/2
n
n

m=1
Y
m
and observe that T
n
⇒ 0. Therefore, (S
n
− T
n
) ⇒ χ where χ ∼ N(0, 1). (This
follows from the fact that lim
n→∞
E(g(S
n
−T
n
)) = lim
n→∞
E(g(S
n
)) = E(g(χ)).)
But
S
n
−T
n
= −
1
C
1/2
n
n

m=1
E(Y
m
)
which is nonrandom. This gives a contradiction and shows that (i) and (iii) hold.
Now,

n=1
var(Y
n
) < ∞ implies

m=1
(Y
m
− EY
m
) converges, by the corollary
to Kolmogorov maximal inequality. Thus if
n

m=1
X
n
converges so does

Y
m
and
hence also

EY
m
.
¸6. The Polya distribution.
We begin with some discussion on the Polya distribution. Consider the density
function given by
f(x) = (1 −[x[)1
x∈(−1,1)
= (1 −[x[)
+
.
Its characteristic function is given by
ϕ(t) =
2(1 −cos t)
t
2
and therefore for all y ∈ R,
(1 −[y[)
+
=
2

_
R
e
−ity
(1 −cos t)
t
2
dt
=
1
π
_
R
_
1 −cos t
t
2
_
e
−ity
dt.
144
Take y = −y this gives
(1 −[y[)
+
=
1
π
_
R
(1 −cos t)
t
2
e
ity
dt.
So, if f
1
(x) =
1 −cos x
πx
2
which has
_
R
f
1
(x)dx = 1, and we take X ∼ F where F has
density f
1
we see that (1−[t[)
+
is its characteristic function This is called the Polya
distribution. More generally, If f
a
(x) =
1−cos ax
πax
2
, then then we get the characteristic
function ϕ
a
(t) =
_
1 −
¸
¸
¸
¸
t
a
¸
¸
¸
¸
_
+
, just by changing variables. The following fact will be
useful below. If F
1
, . . . , F
n
have characteristic functions ϕ
1
, . . . , ϕ
n
, respectively,
and λ
i
≥ 0 with

λ
i
= 1. Then the characteristic function of
n

i=1
λ
i
F
i
is
n

i=1
λ
i
ϕ
i
.
Theorem 6.1 (The Polya Criterion). Let ϕ(t) be a real and nonnegative
function with ϕ(0) = 1, ϕ(t) = ϕ(−t), decreasing and convex on (0, ∞) with
lim
t↓0
ϕ(t) = 1, lim
t↑∞
ϕ(t) = 0. There is a probability measure ν on (0, ∞) so that
ϕ(t) =
_

0
_
1 −
¸
¸
¸
¸
t
s
¸
¸
¸
¸
_
+
dν(s)
and ϕ(t) is a characteristic function.
Example 6.1. ϕ(t) = e
−|t|
α
for any 0 < α ≤ 2. If α = 2, we have the normal
density. If α = 1, we have the Cauchy density. Let us in show here that exp(−[t[
α
)
is a characteristic function for any 0 < α < 1. With a more delicate argument, one
can do the case 1 < α < 2. We only need to verify that the function is convex.
Diﬀerentiating twice this reduces to proving that
αt
2α−2
−α
2
t
α−2
+αt
α−2
> 0.
This is true if α
2
t
α
− α
2
+ α > 0 which is the same as α
2
t
α
− α
2
+ α > 0 which
follows from αt
α
+α(1 −α) > 0 since 0 < α ≤ 1.
145
¸7. Rates of Convergence; Berry–Esseen Estimates.
Theorem 7.1. Let X
i
be i.i.d., E[X
i
[
2
= σ
2
, EX
i
= 0 and E[X
i
[
3
= ρ < ∞. If
F
n
is the distribution of
S
n
σ

n
and Φ(x) is the normal distribution, we have
sup
x∈R
[F
n
(x) −Φ(x)[ ≤

σ
3

n
,
where c is an absolute constant. In fact, we may take c = 3.
More is actually true:
F
n
(x) = φ(x) +
H
1
(x)

n
+
H
2
(x)
n
+. . . +
H
3
(x)
n
3/2
+. . .
where H
i
(x) are explicit functions involving Hermit polynomials. We shall not
prove this, however.
Lemma 7.1. Let F be a distribution function and G a real–valued function with
the following conditions:
(i) lim
x→−∞
G(x) = 0, lim
x→+∞
G(x) = 1,
(ii) G has bounded derivative with sup
x∈R
[G

(x)[ ≤ M. Set A =
1
2M
sup
x∈R
[F(x) −G(x)[.
There is a number α such that for all T > 0,
2MTA
_
3
_
TA
0
1 −cos x
x
2
dx −π
_

¸
¸
¸
¸
_

−∞
1 −cos Tx
x
2
¦F(x +α) −G(x +α)¦dx
¸
¸
¸
¸
.
Proof. Observe that A < ∞, since G is bounded and we may obviously assume that
it is positive. Since F(t) − G(t) → 0 at t → ±∞, there is a sequence x
n
→ b ∈ R
such that
F(x
n
) −G(x
n
) →
_
¸
_
¸
_
2MA
or
−2MA
.
146
Since F(b) ≥ F(b−) it follows that either
_
¸
_
¸
_
F(b) −G(b) = 2MA
or
F(b−) −G(b) = −2MA.
Assume F(b−) −G(b) = −2MA, the other case being similar.
Put
α = b −A < b, since
A = (b −α).
If [x[ < A we have
G(b) −G(x +α) = G

(ξ)(b −α −x)
= G

(ξ)(A−x)
Since [G

(ξ)[ ≤ M we get
G(x +a) = G(b) + (x −A)G

(ξ)
≥ G(b) + (x −A)M.
So that
F(x +a) −G(x +a) ≤ F(b−) −[G(b) + (x −∆)M]
= −2MA−xM +AM
= −M(x +A)
for all x ∈ [−A, A]. Therefore for all T > 0,
_
A
−A
1 −cos Tx
x
2
¦F(x +α) −G(x +α)¦dx ≤ −M
_
A
−A
1 −cos Tx
x
2
(x +A)dx
= −2MA
_
A
0
_
1 −cos Tx
x
2
_
dx
147
Also,
¸
¸
¸
¸
__
−A
−∞
+
_

A
_
1 −cos Tx
x
2
¦F(x +α) −G(x +α)¦dx
¸
¸
¸
¸
≤ 2MA
__
−A
−∞
+
_

A
_
1 + cos Tx
x
2
dx
= 4MA
_

A
1 −cos Tx
x
2
dx.
_

−∞
_
1 −cos Tx
x
2
_
¦F(x +α) −G(x +α)¦dx
≤ 2MA
_

_
A
0
+2
_

A
__
1 −cos Tx
x
2
_
dx
= 2MA
_
−3
_
A
0
+2
_

0
__
1 −cos Tx
x
2
_
dx
= 2MA
_
−3
_
A
0
1 −cos Tx
x
2
dx + 2
_

0
1 −cos Tx
x
2
dx
_
= 2MA
_
−3
_
A
0
1 −cos Tx
x
2
dx + 2
_
πT
2
__
= 2MTA
_
−3
_
TA
0
1 −cos x
x
2
dx +π
_
< 0,
proving the result.
Lemma 7.2. Suppose in addition that G is of bounded variation in (−∞, ∞) (for
example if G has a density) and that

_
−∞
[F(x) −G(x)[dx < ∞.
Let f(t) and g(t) be the characteristic functions of F and G, respectively. Then
A ≤
1
2πM
_
T
−T
[f(t) −g(t)[
t
dt +
12

,
for any T > 0.
Proof. Since F and G are of bounded variation,
f(t) −g(t) = −it
_

−∞
¦F(x) −G(x)¦e
itx
dx.
148
Therefore,
f(t) −g(t)
−it
e
−itα
=
_

−∞
(F(x) −G(x))e
−itα+itx
dx
=
_

−∞
F(x +α) −G(x +α)e
itx
dx.
It follows from our assumptions that the right hand side is uniformly bounded in
α. Multiply the left hand side by (T −[t[) and integrating gives
_
T
−T
_
f(t) −g(t)
−it
_
e
−itα
(T −[t[)dt
=
_
T
−T
_

−∞
¦F(x +α) −G(x +α)¦e
itx
(T −[t[)dxdt
=
_

−∞
¦F(x +α) −G(x +α)¦
_
T
−T
e
itx
(T −[t[)dtdx
=
_

−∞
(F(x +α) −G(x +α))
_
T
−T
e
itx
(T −[t[)dtdx.
= I
Writing
1 −cos Tx
x
2
=
1
2
_
T
−T
(T −[t[)e
itx
dt
we see that
I = 2
_

−∞
(F(x +α) −G(x +α))
_
1 −cos Tx
x
2
_
dx
which gives
¸
¸
¸
¸
_

−∞
_
F(x +α) −G(x +α)¦
_
1 −cos Tx
x
2
_
dx
¸
¸
¸
¸

1
2
¸
¸
¸
¸
_
T
−T
f(t) −g(t)
−it
e
−ita
(T −[t[)dt
¸
¸
¸
¸
≤ T/2
_
T
−T
¸
¸
¸
¸
f(t) −g(t)
t
¸
¸
¸
¸
dt
Therefore by Lemma 7.1,
2MA
_
3
_
TA
0
1 −cos x
x
2
dx −π
_

1
2
_
T
−T
¸
¸
¸
¸
f(t) −g(t)
t
¸
¸
¸
¸
dt
149
However,
3
_
TA
0
1 −cos x
x
2
dx −π = 3
_

0
1 −cos x
x
2
dx −3
_

TA
1 −cos x
x
2
dx −π
= 3
_
π
2
_
−3
_

TA
1 −cos x
x
2
dx −π

2
−6
_

TA
dx
x
2
−π =
π
2

6
TA
Hence,
_
T
−T
[f(t) −g(t)[
t
dt ≥ 2
_
2MA
_
π
2

6
TA
__
= 2MπA−
24M
T
or equivalently,
A ≤
1
2M
_
T
−T
¸
¸
¸
¸
f(t) −g(t)
t
¸
¸
¸
¸
dt +
12

,
which proves the theorem.
Proof of Theorem 7.1. Without loss of generality, σ
2
= 1. Then ρ ≥ 1. We will
apply the above lemmas with
F(x) = F
n
(x) = P
_
S
n

n
> x
_
and
G(x) = Φ(x) =
1

_
x
−∞
e
−y
2
/2
dy.
Clearly they satisfy the hypothesis of Lemma 7.1 and in fact we may take M = 2/5
since
sup
x∈R

(x)[ =
1

= .39894 < 2/5.
Also clearly G is of bounded variation. We need to show that
_
R
[F
n
(x) −Φ(x)[dx < ∞.
150
To see this last fact, note that Clearly,
_
1
−1
[F
n
(x) −Φ(x)[dx < ∞
and we need to verify that
_
−1
−∞
[F
n
(x) −Φ(x)[dx +
_

1
[F(x) −Φ(x)[dx < ∞. (7.1)
For x > 0, P([X[ > x¦ ≤
1
λ
2
E[X[
2
, by Chebyshev’s inequality. Therefore,
(1 −F
n
(x)) = P
_
S
n

n
> x
_

1
x
2
E
¸
¸
¸
¸
S
n

n
¸
¸
¸
¸
2
<
1
x
2
and if N denotes a normal random variable with mean zero and variance 1 we also
have
(1 −Φ(x)) = P(N > x) ≤
1
x
2
E[N[
2
=
1
x
2
.
In particular: for x > 0, max ((1 −F
n
(x)), (1 −Φ(x))) ≤
1
x
2
. If x < 0 then
F
n
(x) = P
_
S
n

n
< x
_
= P
_

S
n

n
> −x
_

1
x
2
E
¸
¸
¸
¸
S
n

n
¸
¸
¸
¸
2
=
1
x
2
and
Φ(x) = P(N < x) ≤
1
x
2
.
Once again, we have max(F
n
(x), Φ(x)) ≤
1
x
2
hence for all x ,= 0 we have
[F(x) −Φ(x)[ ≤
1
x
2
.
Therefore, (7.1) holds and we have veriﬁed the hypothesis of both lemmas. We
obtain
[F
n
(x) −Φ(x)[ ≤
1
π
_
T
−T

n
(t/

n) −e
−t
2
/2
[
[t[
dt +
24M
πT

1
π
_
T
−T

n
(t/

n) −e
−t
2
/2
[
[t[
dt +
48
5πT
151
Assume n is large and take T =
4

n

. Then
48
5πT
=
48 3
5π4

n
ρ =
12 3

n
ρ =
36ρ

n
.
Next we claim
1
[t[

n
(t/

n) −e
−t
2
/2
[ ≤
1
T
e
−t
2
/4
_
2t
2
9
+
[t[
3
18
_
(7.2)
for −T ≤ t ≤ T, T = 4

n/3ρ and n ≥ 10. If this were the case then
πT[F
n
(x) −Φ(x)[ ≤
_
T
−T
e
−t
2
/4
_
2t
2
9
+
[t[
3
18
_
dt +
48
5
=
_
T
−T
e
−t
2
/4
_
2t
2
9
+
[t[
3
18
_
dt +
48
5

2
9
_

−∞
e
−t
2
/4
t
2
dt +
1
18
_

−∞
e
−t
2
/4
[t[
3
dt + 9.6
= I +II + 9.6.
Since
2
9
_

−∞
e
−t
2
/4
t
2
dt =
8
9

π
and
1
18
_

−∞
[t[
3
e
−t
2
/4
dt =
1
18
_
2
_

0
t
3
e
−t
2
/4
dt
_
=
1
18
_
2
_

0
t
2
te
−t
2
/4
dt
_
=
1
18
+
_
2
_

0
2t 2e
−t
2
/4
dt
_
=
1
18
_
8
_

0
te
−t
2
/4
dt
_
=
_
−16e
−t
2
/4
¸
¸
¸
¸

0
_
1
18
=
16
18
=
8
9
.
152
Therefore,
πT[F
n
(x) −Φ(x)[ ≤
_
8
9

π +
8
9
+ 9.6
_
.
This gives,
[F
n
(x) −Φ(x)[ ≤
1
πT
_
8
9
(1 +

π
_
+ 9.6
_
=

4

n π
_
8
9
(1 +

π) + 9.6
_
<

n
.
For n ≤ 9, the result is clear since 1 ≤ ρ. It remains to prove (7.2). Recall that
that
¸
¸
¸
¸
ϕ(t) −
n

m=0
E(itX)
m
m!
¸
¸
¸
¸
≤ E(min
|tX|
n+1
(n+1)!
2|tX|
n
n!
_
. This gives
¸
¸
¸
¸
ϕ(t) −1 +
t
2
2
¸
¸
¸
¸

ρ[t[
3
6
and hence
[ϕ(t)[ ≤ 1 −t
2
/2 +
ρ[t[
3
6
,
for t
2
≤ 2.
With T =
4

n

, if [t[ ≤ T then
ρ|t|

n
≤ (4/3) < 2 and t/

n =
4

< 2. Thus
¸
¸
¸
¸
ϕ
_
t

n

¸
¸
¸
≤ 1 −
t
2
2n
+
ρ[t[
3
6n
3/2
= 1 −
t
2
2n
+
ρ[t[
6

n
[t[
2
n
≤ 1 −
t
2
2n
+
4
18
t
2
n
= 1 −
5t
2
18n
≤ e
−5t
2
18n
,
given that 1 −x ≤ e
−x
. Now, let z = ϕ(t/

n), w = e
−t
2
/2n
and γ = e
−5t
2
18n
. Then
for n ≥ 10, γ
n−1
≤ e
−t
2
/4
and the lemma above gives
[z
n
−w
n
[ ≤ nγ
n−1
[z −w[
153
which implies that
[ϕ(t/

n) −e
−t
2
/2
[ ≤ ne
−5t
2
18n
(n−1)
¸
¸
¸
¸
ϕ(t/

n) −e
−t
2
/2n
¸
¸
¸
¸
≤ ne
−t
2
/4
¸
¸
¸
¸
ϕ(t/

n) −1 +
t
2
2n
−e
−t
2
/2n
+ 1 −
t
2
2n
¸
¸
¸
¸
≤ ne
−t
2
/4
¸
¸
¸
¸
ϕ(t/

n) −1 +
t
2
2n
¸
¸
¸
¸
+ne
−t
2
/4
¸
¸
¸
¸
1 −
t
2
2n
−e
−t
2
/2n
¸
¸
¸
¸
≤ ne
−t
2
/4
ρ[t[
3
6n
3/2
+ne
−t
2
/4
t
4
2 4n
2
,
using the fact that [e
−x
−(1 −x)[ ≤
x
2
2
, for 0 < x < 1. We get
1
[t[
¸
¸
¸
¸
ϕ(t/

n) −e
−t
2
/2
¸
¸
¸
¸

ρt
2
e
−t
2
/4
6

n
+
e
−t
2
/4
[t[
3
8n
= e
−t
2
/4
_
ρt
2
6

n
+
[t[
3
8n
_

1
T
e
−t
2
/4
_
2t
2
9
+
[t[
3
18
_
,
using ρ/

n =
4
3
T and
1
n
=
1

n
1

n

4
3
T
1
3
, ρ > 1 and n ≥ 10.This completed
the proof of (7.2) and the proof of the theorem.
Let us now take a look at the following question. Suppose F has density f.
Is it true that the density of
S
n

n
tends to the density of the normal? This is not
always true. as shown in Feller, volume 2, page 489. However, it is true if add some
other conditions. We state the theorem without proof.
Theorem. Let X
i
be i.i.d., EX
i
= 0 and EX
2
i
= 1. If ϕ ∈ L
1
, then
S
n

n
has a
density f
n
which converges uniformly to
1

e
−x
2
/2
= η(x).
¸8. Limit Theorems in R
d
.
Recall that R
d
= ¦(x
1
, . . . , x
d
): x
i
∈ R¦. For any two vectors x, y ∈ R
d
we
will write x ≤ y if x
i
≤ y
i
for all i = 1, . . . , d and write x → ∞ if x
i
→ ∞ for all
154
i. Let X = (x
1
, . . . , x
d
) be a random vector and deﬁned its distribution function
by F(x) = P(X ≤ x). F has the following properties:
(i) If x ≤ y then F(x) ≤ F(y).
(ii) lim
x→∞
F(x) = 1, lim
x
i
→−∞
F(x) = 0.
(iii) F is right continuous. That is, lim
y↓x
F(x) = F(x).
The distribution measure is given by µ(A) = P(X ∈ A), for all A ∈ B(R
d
).
However, unlike the situation of the real line, a function satisfying (i) ↔ (ii) may
not be the distribution function of a random vector. Example: we must have:
P(X ∈ (a
1
, b
1
] (a
2
, b
2
]) = F(b
1
, b
2
) −F(a
1
, b
2
)
P(a < X
1
≤ b
1
, a
2
≤ X
2
≤ b
2
) −F(b
1
, a
2
) +F(a
1
, a
2
).
Need: measure of each vect. ≥ 0,
Example 8.1.
F(x
1
, x
2
) =
_
¸
¸
¸
_
¸
¸
¸
_
1, x
1
, x
1
≥ 1
2/3, x
1
≥ 1, 0 ≤ x
2
≤ 1
2/3, x
2
≥ 1, 0 ≤ x
1
< 1
0, else
If 0 < a
1
, a
2
< 1 ≤ b
1
, b
2
< ∞, then
F(b
1
, b
2
) −F(a
1
, b
2
) −F(b
1
, a
2
) +F(a
1
, a
2
) = 1 −2/3 −2/3 + 0
= −1/3.
155
Hence the measure has
µ(0, 1) = µ(1, 0) = 2/3, µ(1, 1) = −1/3
which is a signed measure (not a probability measure).
If F is the distribution function of (X
1
, . . . , X
n
), then F
i
(x) = P(X
i
≤ x),
x ∈ R is called the marginal distributions of F. We also see that
F
i
(x) = lim
m→∞
F(m, . . . , m, x
i
, . . . , m)
As in the real line, F has a density if there is a nonnegative function f with
_
R
n
f(y)dy =
_
R

_
R
f(y
1
, y
2
, . . . , y
n
)dy
1
. . . dy
n
= 1
and
F(x
1
, x
2
, . . . , x
d
) =
_
x
1
−∞
. . .
_
x
d
−∞
f(y)dy
1
. . . dy
2
.
Deﬁnition 8.1. If F
n
and F are distribution functions in R
d
, we say F
n
converges
weakly to F, and write F
n
⇒ F, if lim
n→∞
F
n
(x) = F(x) for all points of continuity
of F. As before, we also write X
n
⇒ X, µ
n
⇒ µ.
As in the real line, recall that A in the closure of A and A
o
is its interior and
∂A = A −A
o
is its boundary. The following two results are exactly as in the real
case. We leave the proofs to the reader.
Theorem (Skorohod) 8.1. Suppose X
n
⇒ X. Then there exists a sequence of
random vectors Y
n
and a random vector Y with Y
n
∼ X
n
and Y ∼ X such that
Y
n
→ Y a.e.
Theorem 8.2. The following statements are equivalent to X
n
⇒ X.
(i) Ef(X
n
) → E(f(X)) for all bounded continuous functions f.
156
(iii) For all closed sets K, limP(X
n
∈ K) ≤ P(X ∈ K).
(iv) For all open sets G, limP(X
n
∈ G) ≥ P(X ∈ G).
(v) For all Borel sets A with (P(X ∈ ∂A) = 0,
lim
n→∞
P(X
n
∈ A) = P(X

∈ A).
(vi) Let f : R
d
→R be bounded and measurable. Let D
f
be the discontinuity
points of f. If P(X ∈ D
f
) = 0, then E(f(X
n
)) → E(f(X

)).
Proof. X
n
⇒ X

⇒ (i) trivial. (i) ⇒ (ii) trivial. (i) ⇒ (ii). Let d(x, K) = inf¦[x−
y[: y ∈ K¦. Set
ϕ
j
(t) =
_
¸
_
¸
_
1 t ≤ 0
1 −jt 0 ≤ t ≤ j
−1
0
−1
≤ t
and let f
j
(x) = ϕ
j
(dist (x, K)). The functions f
j
are continuous and bounded by
1 and f
j
(x) ↓ I
K
(x), since K is closed. Therefore,
limsup
n→∞
µ
n
(K) ≤ lim
n→∞
E(f
j
(X
n
))
= E(f
j
(X))
and this last quantity ↓ P(X ∈ K) as j ↑ ∞.
That (iii) ⇒ (iv) follows by taking complements. For (v) implies convergence
in distribution, assume F is continuous at x = (x
1
, . . . , x
d
), and set A = (−∞, x] =
(−∞, x
1
] . . . (−∞, x
d
]. We have µ(∂A) = 0. So, F
n
(x) = F(x
n
∈ A) → P(X

A) = F(x).
As in the real case, we say that a sequence of measurers µ
n
is tight if given
ε ≥ 0 exists an M
ε
> 0 such that
inf
n
µ
n
([−M
ε
, M
ε
]
d
) ≥ 1 −ε.
157
We remark here that Theorem 1.6 above holds also in the setting of R
d
.
The characteristic function of the random vector X = (X
1
, . . . , X
d
) is deﬁned as
ϕ(t) = E(e
it·X
) where t X = t
1
X
1
+. . . +t
d
X
d
.
Theorem 8.3 (The inversion formula in R
d
). Let A = [a
1
, b
1
] . . . [a
d
, b
d
]
with µ(∂A) = 0. Then
µ(A) = lim
T→∞
1
(2π)
d
_
T
−T
. . .
_
T
−T
ψ
1
(t
1
)ϕ(t) . . . ψ
d
(t
d
)ϕ(t)dt
1
, . . . , dt
2
= lim
T→∞
1
(2π)
d
_
[−T,T]
d
d

j=1
ψ
j
(t
j
)ϕ(t)dt,
where
ψ
j
(s) =
_
e
isa
j
−e
−sb
j
is
_
,
for s ∈ R.
Proof. Applying Fubini’s Theorem we have
_
[−T,T]
d
d

j=1
ψ
j
(t
j
)
_
R
d
e
it·x
dµ(x)
=
_
[−T,T]
d
d

j=1
ψ
j
(t
j
)
_
R
d
e
it
1
·x
1
+...+it
d
·X
d
dµ(x)dt
=
_
R
d
_
[−T,T]
d
d

j=1
ψ
j
(t
j
)e
it·X
dtdµ(x)
=
_
R
d
_
[−T,T]
d
d

j=1
ψ
j
(t
j
)e
it
j
X
j
dtdµ(x)
=
_
R
d
_
d

j=1
_
[−T,T]
ψ
j
(t
j
)e
it
j
X
j
dt
j
_
dµ(x)

_
R
d
d

j=1
_
π(1
(a
j
,b
j
)
(x
j
) + 1
[a
j
,b
j
]
(x
j
)
_
dµ(x)
and this proves the result.
158
Theorem (Continuity Theorem) 8.4. Let X
n
and X be random vectors with
characteristic functions ϕ
n
and ϕ, respectively. Then X
n
⇒ X if and only if
ϕ
n
(t) → ϕ(t).
Proof. As before, one direction is trivial. Let f(x) = e
itx
. This is bounded and
continuous. X
n
⇒ X implies ϕ
n
(x) = E(f(X
n
)) → ϕ(t).
For the other direction we need to show tightness. Fix θ ∈ R
d
. Then for
∀ s ∈ R ϕ
n
(sθ) → ϕ(sθ). Let
˜
X
n
= θ X
n
. Then ϕ
˜
X
(s) = ϕ
X
(θs) and
ϕ
˜
X
n
(s) → ϕ
˜
X
(s).
Therefore the distribution of
˜
X
n
is tight by what we did earlier. Thus the random
variables e
j
X
n
are tight. Let ε > 0. There exists a constant positive constant M
i
such that
lim
n
P(e
j
X
i
∈ [M
i
, M
i
]) ≥ 1 −ε.
Now take M = max
1≤j≤d
M
j
. Then
P(X
n
∈ [M, M]
d
) ≥ 1 −ε
and the result follows.
Remark. As before, if ϕ
n
(t) → ϕ(t) and ϕ is continuous at 0, then ϕ(t) is the
characteristic function of a random vector X and X
n
⇒ X.
Also, it follows from the above argument that If θ X
n
⇒ θ X for all θ ∈ R
d
then X
n
⇒ X. This is often called the Cram´er–Wold devise.
Next let X = (X
1
, . . . , X
d
) be independent X
i
∼ N(0, 1). Then X has den-
sity
1
(2π)
d/2
e
−|x|
2
/2
where [x[
2
=

d
i=1
[x
i
[
2
. This is called the standard normal
distribution in R
d
and its characteristic function is
ϕ
X
(t) = E
_
d

j=1
e
it
j
X
j
_
= e
−|t|
2
/2
.
159
Let A = (a
ij
) be a d d matrix. and set Y = AX where X is standard normal.
The covariance matrix of this new random vector is
Γ
ij
= E(Y
i
Y
j
)
= E
_
d

l=1
a
il
X
l

d

m=1
a
jm
X
m
_
=
d

l=1
d

m=1
a
il
a
jm
E(X
l
X
m
)
=
d

l=1
a
il
a
jl
.
Thus Γ = (Γ
ij
) = AA
T
. and the matrix γ is symmetric; Γ
T
form of Γ is positive semideﬁnite. That is,

ij
Γ
ij
t
i
t
j
= ¸Γt, t) = ¸A
T
t, A
T
t)
= [A
T
t[
2
≥ 0.
ϕ
Y
(t) = E(e
it·AX
)
= E(e
iA
T
t·X
)
= e

|A
T
t|
2
2
= e

P
ij
Γ
ij
t
i
t
j
.
So, the random vector Y = AX has a multivariate normal distribution with co-
variance matrix Γ.
Conversely, let Γ be a symmetric and nonnegative deﬁnite dd matrix. Then
there exists an orthogonal matrix O such that
O
T
ΓO = D
160
where D is diagonal. Let D
0
=

D and A = OD
0
.
Then AA
T
= OD
0
(D
T
0
O
T
) = ODO
T
= Γ. So, if we let Y = AX, X normal, then
Y is multivariate normal with covariance matrix Γ. If Γ is non–singular, so is A
and Y has a density.
Theorem 8.5. Let X
1
, X
2
, . . . be i.i.d. random vectors, EX
n
= µ and covariance
matrix
Γ
ij
= E(X
1,j
−µ
j
)(X
1,i
−µ
i
)).
If S
n
= X
1
+. . . +X
n
then
S
n
−nµ

n
⇒ χ
where χ is a multivariate normal with covariance matrix Γ = (Γ
ij
).
Proof. By setting X

n
= X
n
−µ we may assume µ = 0. Let t ∈ R
d
. Then
˜
X
n
= tX
n
are i.i.d. random variables with E(
˜
X
n
) = 0 and
E[
˜
X
n
[
2
= E
_
d

i=1
t
i
(X
n
)
i
_
2
=

ij
t
i
t
j
Γ
ij
.
So, with
˜
S
n
=
n

j=1
(t X
j
) we have
ϕ
˜
S
n
(1) = E(e
i
˜
S
n
) → e

P
ij
Γ
ij
t
i
t
j
/2
This is equivalent to
ϕ
S
n
(t) = E(e
it·S
n
) → e

P
ij
Γ
ij
t
i
t
j
/2
.
Theorem. Let X
i
be i.i.d. E[X
i
[
2
= σ
2
, EX
i
= 0 and E[X
i
[
3
= ρ < ∞. Then if
F
n
is the distribution of
S
n
σ

n
and Φ(x) is the normal we have
sup
x∈R
[F
n
(x) −Φ(x)[ ≤

σ
3

n
(may take c = 3).
161
?? is actually true:
F
n
(x) = φ(x) +
H
1
(x)

n
+
H
2
(x)
n
+. . . +
H
3
(x)
n
3/2
+. . .
where H
i
(x) are explicit functions involving Hermid polynomials.
Lemma 1. Let F be a d.f., G a real–valued function with the following conditions:
(i) lim
x→−∞
G(x) = 0, lim
x→+∞
G(x) = 1,
(ii) G has bounded derivative with sup
x∈R
[G

(x)[ ≤ M. Set A =
1
2M
sup
x∈R
[F(x) −G(x)[.
There is a number a s.t. ∀ T > 0
2MT∆
_
3
_
T∆
0
1 −cos x
x
2
dx −π
_

¸
¸
¸
¸
_

−∞
1 −cos Tx
x
2
¦F(x +a) −G(x +a)¦dx
¸
¸
¸
¸
.
Proof. ∆ < ∞ since G is bounded. Assume L.H.S. is > 0 so that ∆ > 0.
b<−a
a>0
a < b
−b<−a
.
Since F −G = 0 at ±∞, ∃ sequence x
n
→ b ∈ R s.t.
F(x
n
) −G(x
n
) →
_
¸
_
¸
_
2M∆
or
−2M∆
.
So, either
_
¸
_
¸
_
F(b) −G(b) = 2M∆
or
F(b−) −G(b) = −2M∆.
Assume F(b−) −G(b) = −2M∆.
Put
a = b −∆ < b, since
∆ = (b −a)
162
if [x[ < ∆ we have
G(b) −G(x +a) = G

(ξ)(b −a −x)
= G

(ξ)(∆−x) [G

(ξ)[ ≤ M
or
G(x +a) = G(b) + (x −∆)G

(ξ)
≥ G(b) + (x −∆)M.
So that
F(x +a) −G(x +a) ≤ F(b−) −[G(b) + (x −∆)M]
= −2M∆−xM + ∆M
= −M(x + ∆) ∀ x ∈ [−∆, ∆]
∴ T to be chosen: we will consider

_
−∞
=

_
−∆
+rest
_

−∆
1 −cos Tx
x
2
¦F(x +a) −G(x +a)¦dx
≤ −M
_

−∆
1 −cos Tx
x
2
(x + ∆)dx
= −2M∆
_

0
_
1 −cos Tx
x
2
_
dx
(1)
¸
¸
¸
¸
__
−∆
−∞
+
_

_
¦F(x +a) −G(x +a)¦dx
¸
¸
¸
¸
≤ 2M∆
__
−∆
−∞
+
_

_
1 + cos Tx
x
2
dx = 4M∆
_

1 −cos Tx
x
2
dx.
(2)
163
_

−∞
_
1 −cos Tx
x
2
_
¦F(x +a) −G(x +a)¦dx
≤ 2M∆
_

_

0
+2
_

__
1 −cos Tx
x
2
_
dx
= 2M∆
_
−3
_

0
+2
_

0
__
1 −cos Tx
x
2
_
dx
= 2M∆
_
−3
_

0
1 −cos Tx
x
2
dx + 2
_

0
1 −cos Tx
x
2
dx
_
= 2M∆
_
−3
_

0
1 −cos Tx
x
2
dx + 2
_
πT
2
__
= 2MT∆
_
−3
_
T∆
0
1 −cos x
x
2
dx +π
_
< 0.
Lemma 2. Suppose in addition that
(iii) G is of bounded variation in (−∞, ∞). (Assume G has a density).
(iv)

_
−∞
[F(x) −G(x)[dx < ∞.
Let
f(t) =
_

−∞
e
itx
dF(x), g(t) =
_

−∞
e
itx
dG(x).
Then
∆ ≤
1
2πM
_
T
−T
[f(t) −g(t)[
t
dt +
12
πT
.
Proof.
f(t) −g(t) = −it
_

−∞
¦F(x) −G(x)¦e
itx
dx
and ∴
f(t) −g(t)
−it
e
−ita
=
_

−∞
(F(x) −G(x))e
−ita+itx
dx
=
_

−∞
F(x +a) −G(x +a)e
+itx
dx.
164
∴ By (iv), R.H.S. is bounded and L.H.S. is also. Multiply left hand side by (T −[t[)
_
T
−T
_
f(t) −g(t)
−it
_
e
ita
(T −[t[)dt
=
_
T
−T
_

−∞
¦F(x +a) −G(x +a)¦e
itx
(T −[t[)dxdt
=
_

−∞
¦F(x +a) −G(x +a)¦
_
T
−T
e
itx
(T −[t[)dtdx
=
_

−∞
(F(x +a) −G(x +a))
_
T
−T
e
itx
(T −[t[)dtdx
Now,
1 −cos Tx
x
2
=
1
2
_
T
−T
(T −[t[)e
itx
dt
above
= 2
_

−∞
(F(x +a) −G(x +a))
_
1 −cos Tx
x
2
_
dx
or
¸
¸
¸
¸
_

−∞
_
F(x +a) −G(x +a)¦
_
1 −cos Tx
x
2
_
dx
¸
¸
¸
¸

1
2
¸
¸
¸
¸
_
T
−T
f(t) −g(t)
−it
e
−ita
(T −[t[)dt
¸
¸
¸
¸
≤ T/2
_
T
−T
¸
¸
¸
¸
f(t) −g(t)
t
¸
¸
¸
¸
dt
∴ Lemma 1 ⇒
2M∆
_
3
_
T∆
0
1 −cos x
x
2
dx −π
_

_
T
−T
[f(t) −g(t)[
t
dt
or now,
3
_
T∆

0
1 −cos x
x
2
dx −π = 3
_

0
1 −cos x
x
2
dx −3
_

T∆
1 −cos x
x
2
dx −π
= 3
_
π
2
_
−3
_

T∆
1 −cos x
x
2
dx −π

2
−6
_

T∆
dx
x
2
−π =
π
2

6
T∆
165
or
_
T
−T
[f(t) −g(t)[
t
dt ≥ 2
_
2M∆
_
π
2

6
T∆
__
= 2Mπ∆−
24M
T
or
∆ ≤
1
2MT
_
T
−T
[f(t) −g(t)[
[t[
dt +
12
π
We have now bound the diﬀerence between the d.f. satisfying certain conditions
by the average diﬀerence of.
Now we apply this to our functions: Assume wlog σ
2
= 1, ρ ≥ 1.
F(x) = F
n
(x) = P
_
S
n

n
> x
_
, X
i
i.i.d.
G(x) = φ(x) = P(N > x) =
1

_
x
−∞
e
−y
2
/2
dy.
Clearly F
n
and Φ satisfy (i).
sup
x∈

(x)[ =
1

e
−x
2
/2
=
1

= .39894 < 2/5 = M.
(iii) Satisfy:
(iv)
_
R
[F
n
(x) −Φ(x)[dx < ∞.
Clearly,
_
1
−1
[F
n
(x) −φ(x)[dx < ∞.
Need:
_
−1
−∞
[F
n
(x) −Φ(x)[dx +
_

1
[F(x) −Φ(x)[dx < ∞.
Assume wlog σ
2
= 1. For x > 0. P([X[ > x) =
1
λ
2
F[x[
2
.
1 −F
n
(x) = P
_
S
n

n
> x
_

1
[X[
2
E
¸
¸
¸
¸
S
n

n
¸
¸
¸
¸
2
<
1
[X[
2
1 −Φ(x) = P(N > x) ≤
1
x
2
E[N[
2
=
1
[x[
2
.
166
In particular: for x > 0. max((1 −F
n
(x)), max 1 −Φ(x)) ≤
1
|x|
2
.
If x < 0. Then
F
n
(x) = P
_
S
n

n
< x
_
= P
_

S
n

n
> −x
_

1
x
2
E
¸
¸
¸
¸
S
n

n
¸
¸
¸
¸
2
=
1
x
2
Φ(x) = P(N < x) ≤
1
x
2

x<0
max(F
n
(x), Φ(x)) ≤
1
x
2
[F(x) −Φ(x)[ ≤
1
x
2
if x < 0
¦[F(x) −φ(x)[ =
= [1 −φ(x) −(1 −F(x))[ ≤
1
x
2
. x > 0
∴ (iv) hold.
[F
n
(x) −Φ(x)[ ≤
1
π
_
T
−T

n
(t/

n) −e
−t
2
/2
[
[t[
dt +
24M
πT

1
π
_
T
−T

n
(t/

n) −e
−t
2
/2
[t[
dt +
48
5πT
tells us what we must do. Take T = multiple of

n.
Assume n ≥ 10.
Take T =
4

n

: Then
48
5πT
=
48 3
5π4

n
ρ =
12 3

n
ρ =
36ρ

n
.
For second. Claim:
1
[t[

n
(t/

n) −e
−t
2
/2
[ ≤

1
T
e
−t
2
/4
_
2t
2
9
+
[t[
3
18
_
,
−T ≤ t ≤ T
T = 4

n/3ρ
, n ≥ 10
167
∴ πT[F
n
(x) −Φ(x)[ ≤
_
T
−T
e
−t
2
/4
_
2t
2
9
+
[t[
3
18
_
dt
+
48
5

=
_
T
−T
e
−t
2
/4
_
2t
2
9
+
[t[
3
18
_
dt +
9.6
48
5

2
9
_

−∞
e
−t
2
/4
t
2
dt +
1
18
_

−∞
e
−t
2
/4
[t[
3
dt + 9.6
= I +II + 9.6.
Recall:
1

2πσ
2
_
e
−t
2
/2σ
2
t
2
dt = σ
2
.
Take σ
2
= 2
2
9
_

−∞
e
−t
2
/2σ
2
t
2
dt =
2
9

2π 2 2 =
2 2 2
9

π
=
8
9

π
1
18
__

−∞
[t[
3
e
−t
2
/2σ
2
dt
_
=
1
18
_
2
_

0
t
3
e
−t
2
/4
dt =
1
18
_
2
_

0
t
2
te
−t
2
/4
dt
_
=
1
18
+
_
2
_

0
2t 2e
−t
2
/4
dt
_
=
1
18
_
8
_

0
te
−t
2
/4
dt =
_
−16e
−t
2
/4
¸
¸
¸
¸

0
_
1
18
=
16
18
=
8
9
πT[F
n
(x) −Φ(x)[ ≤
_
8
9

π +
8
9
+ 9.6
_
.
or
[F
n
(x) −Φ(x)[ ≤
1
πT
_
8
9
(1 +

π
_
+ 9.6
_

4

n π
_
8
9
(1 +

π) + 9.6
_
=
ρ

n
_
.¸¸.
_
<

n
.
168
For n ≤ 9, the result follows. 1 < ρ since σ
2
= 1.
Proof of claim. Recall: σ
2
= 1, [ϕ(t) −
n

m=0
E(itX)
m
m!
¸
¸
¸
¸
≤ E(min
[t[X[
nt
(n + 1)!
2[tX[
n
n!
_
.
(1)
¸
¸
¸
¸
ϕ(t) −1 +
t
2
2
¸
¸
¸
¸

ρ[t[
3
6
and
(2) [ϕ(t)[ ≤ 1 −t
2
/2 +
ρ[t[
3
6
, for t
2
≤ 2.
So, if T =
4

n

if [t[ ≤ L ⇒
_
ρ[t[

n
_
≤ (4/3) =
16
9
< 2.
⇒ Also, t/

n =
4

< 2. So,
¸
¸
¸
¸
ϕ
_
t

n

¸
¸
¸
≤ 1 −
t
2
2n
+
ρ[t[
3
6n
3/2
= 1 −
t
2
2n
+
ρ[t[

n
[t[
2
n
≤ 1 −
t
2
2n
+
4
3
t
2
n
= 1 −
5t
2
18n
≤ e
−5t
2
18n
, gives that 1 −x ≤ e
−x
.
Now, let α = ϕ(t/

n), β = e
−t
2
/2n
and γ = e
−5t
2
18n
. Then n ≥ 10 ⇒ γ
n−1
≤ e
−t
2
/4
.

n
−β
n
[ ≤ nγ
n−1
[α −β[ ⇔
[ϕ(t/

n) −e
−t
2
/2
[ ≤ ne
−5t
2
18n
(n−1)
[ϕ(t/

n) −e
−t
2
/2n
[
≤ ne
−t
2
/4
[ϕ(t/

n) −1 +
t
2
2n
−e
−t
2
/2n−1−t
2
/2n
[
≤ ne
−t
2
/4
[ϕ(t/

n) −1 +
t
2
2n
[ +ne
−t
2
/4
[1 −
t
2
2n
−e
−t
2
/2n
[
≤ ne
−t
2
/4
ρ[t[
3
6n
3/2
+ne
−t
2
/4
t
4
2 4n
2
using [e
−x
−(1 −x)[ ≤
x
2
2
for 0 < x < 1
or
1
[t[
[ϕ(t/

n) −e
−t
2
/2
[ ≤
ρt
2
e
−t
2
/4
6

n
+
e
−t
2
/4
[t[
3
8n
= e
−t
2
/4
_
ρt
2
6

n
+
[t[
3
8n
_

1
T
e
−t
2
/4
_
2t
2
9
+
[t[
3
18
_
169
using ρ/

n =
4
3
T and
1
n
=
1

n
1

n

4
3
T
1
3
, ρ > 1 and n ≥ 10. Q.E.D.
Question: Suppose F has density f. Is it true that the density of
S
n

n
tends to the
density of the normal? This is not always true. (Feller v. 2, p. 489). However, it is
true if more conditions.
Let X
i
be i.i.d. EX
i
= 0, FX
2
i
= 1.
Theorem. If ϕ ∈ L
1
, then
S
n

n
has a density f
n
which converges uniformly to
1

e
−x
2
/2
= η(x).
Proof.
f
n
(x) =
1

_
R
e
itx
ϕ
n
(t)dt
n(x) =
1

_
R
e
ity
e

1
2
t
2
dt
∴ [f
n
(x) −η(x)[ ≤
1

_

−∞
[ϕ(t/

n)
n
−e
−1/2t
2
[dt
under the assumption
[ϕ(t)[ ≤ e

1
4
t
2
for [t[ < δ.
At 0, both sides are 0.
Both have ?? derivatives of social derivative of ϕ(t) at 0 is −1 smaller than
the second derivative of r.h.s.
Limit Theorems in R
d
Recall: R
d
= ¦(x
1
, . . . , x
d
): x
i
∈ R¦.
If X = (x
1
, . . . , X
d
) is a random vector, i.e. a r.v. X: ω → R
d
. We deﬁned its
distribution function by F(x) = P(X ≤ x), where X ≤ x ⇔ X
i
≤ x
i
, i = 1, . . . , d.
F has the following properties:
170
(i) x ≤ y ⇒ F(x) ≤ F(y).
(ii) lim
x→∞
F(x) = 1, lim
x
i
→−∞
F(x) = 0.
(iii) F is right cont. i.e. lim
y↓x
F(x) = F(x).
X
p
→ ∞ we mean each coordinate goes to zero. You know what X
i
→ −∞.
There is also the distribution measure on (R
d
, B(R
d
)): µ(A) = P(X ∈ A).
If you have a function satisfying (i) ↔ (ii), this may not induce a measure. Exam-
ple: we must have:
P(X ∈ (a
1
, b
1
] (a
2
, b
2
]) = F(b
1
, b
2
) −F(a
1
, b
2
)
P(a < X
1
≤ b
1
, a
2
≤ X
2
≤ b
2
) −F(b
1
, a
2
) +F(a
1
, a
2
).
Need: measure of each vect. ≥ 0,
Example. f(x
1
, x
2
) =
_
¸
¸
¸
¸
¸
_
¸
¸
¸
¸
¸
_
1 x
1
, x
1
≥ 1
2/3
2/3 x
1
≥ 1, 0 ≤ x
2
≤ 1
0 x
2
≥ 1, 0 ≤ x
1
< 1
else
.
If 0 < a
1
, a
2
< 1 ≤ b
1
, b
2
< ∞ ⇒
F(b
1
, b
2
) −F(a
1
, b
2
) −F(b
1
, a
2
) +F(a
1
, a
2
) = 1 −2/3 −2/3 + 0
= −1/3.
171
The measure has:
µ(0, 1) = µ(1, 0) = 2/3, µ(1, 1) = −1/3
for each, need measure of ??
Other simple ??
Recall: If F is the dist. of (x
1
, . . . , X
n
), then F
i
(x) = P(X
i
≤ x), x real in the
marginal distributions of F
F
i
(x) = lim
m→∞
F(m, . . . , m, x
i+1
, . . . m)
F has a density if ∃ f ≥ 0 with
_
R
d
0 = f and
F(x
1
, x
2
, x
d
) =
_
x
1
−∞
. . .
_
x
d
−∞
f(y)dy
1
. . . dy
2
.
Def: If F F
n
is a distribution function in R
d
, we say F
n
converges weakly to F, if
F
n
⇒ F, if lim
n→∞
F
n
(x) = F(x) for all pts of continuity of F.
X
n
⇒ X, µ
n
⇒ µ.
Recall: A = set of limits of sequences in A, closure of A. A
o
= R
d
¸(R
d
[A) interior.
∂A = A−A
o
. A Borel set A is a µ–continuity set if µ(∂A) = 0.
Theorem 1 Skorho. X
n
⇒ X

⇒ ∃ r.v. X
n
∼ X
n
, Y ∼ X

s.t. Y
n
→ Y a.e.
Theorem 2. The following statements are equivalent to X
n
⇒ X

.
(i) Ef(X
n
) → E(f(X

))∀ bounded cont. f.
(iii) ∀ closed sets k, limP(X
n
∈ k) ≤ P(X

∈ k).
(iv) ∀ open sets G, limP(X
n
∈ G) ≥ P(X

∈ G).
(v) ∀ continuity A, (P(X ∈ ∂A).
lim
n→∞
P(X
n
∈ A) = P(X

∈ A).
172
(vi) Let D
p
= discontinuity sets of f. If P(X

∈ D
p
) = 0 ⇒ E(f(X
n
)) →
E(f(X

)), f bounded.
Proof. X
n
⇒ X

⇒(i) trivial. (i) ⇒(ii) trivial. (i) ⇒(ii). Let d(x, k) = inf¦d(x−
y): y ∈ k¦.
ϕ
j
(t) =
_
¸
_
¸
_
1 t ≤ 0
1 −jt 0 ≤ t ≤ j
−1
0
−1
≤ t
Let f
j
(x) = ϕ
j
(dist (x, k)). f
j
is cont. and bounded by 1 and f
j
(x) ↓ I
k
(x) since
k is closed.
∴ limsup
n→∞
µ
n
(k) = lim
n→∞
E(f
j
(X
n
))
= E(f
j
(X

)) ↓ P(X

∈ k). Q.E.D.
(iii) ⇒ (iv): A open iﬀ A
c
closed and P(X ∈ A) +P(X ∈ A
c
) = 1.
(v) ⇒ implies conv. in dis. If F is cont. at X, then with A = (−∞, x
1
]
. . . (−∞, x
d
], x = (x
1
, . . . , x
n
) we have µ(∂A) = 0. So, F
n
(x) = F(x
n

A) → P(X

∈ A) = F(x). Q.E.D.
As in 1–dim. µ
n
is tight if ∃ given ε ≥ 0 s.t.
inf
n
µ
n
([−M, M]
d
) ≥ 1 −ε.
Theorem. If µ
n
is tight ⇒ ∃µ
n
j
s.t. µ
n
j
weak
−→ µ.
Ch. f. Let X = (X
1
, . . . , X
d
) be a r. vector. ϕ(t) = E(e
it·X
) is the Ch.f. t X =
t
1
X
1
+. . . +t
d
X
d
.
Inversion formula: Let A = [a
1
, b
1
] . . . [a
d
, b
d
] with µ(∂A) = 0. Then
µ(A) = lim
T→∞
1
(2π)
d
_
T
−T
. . .
_
T
−T
ψ
1
(t
1
)ϕ(t) . . . ψ
d
(t
d
)ϕ(t)dt
1
, . . . , dt
2
173
where
ψ
j
(s) =
_
e
isa
j
−e
−sb
j
is
_
= lim
T→∞
1
(2π)
d
_
[−T,T]
d
d

j=1
ψ
j
(t
j
)ϕ(t)dt
Proof. Apply Fubini’s Theorem:
A = [a
1
, b
1
] . . . [a
d
, b
d
] with µ(∂A) = 0. Then
_
[−T,T]
d
d

j=1
ψ
j
(t
j
)
_
R
d
e
it·x
dµ(x)
=
_
[−T,T]
d
d

j=1
ψ
j
(t
j
)
_
R
d
e
it
1
·x
1
+...+it
d
·X
d
dµ(x)dt
=
_
R
d
_
[−T, T]
d
d

j=1
ψ
j
(t
j
)e
it·X
dtdµ(x)
=
_
R
d
_
[−T,T]
d
d

j=1
ψ
j
(t
j
)e
itjX
j
dtdµ(x)
=
_
R
d
_
d

j=1
_
[−T,T]
ψ
j
(t
j
)e
itjX
j
dt
j
_
dµ(x)

_
R
d
d

j=1
_
π(1
(a
j
,b
j
)
(x
j
) + 1
[a
j
,b
j
]
(x
j
)
_
dµ(x)
results = µ(A).
Continuity Theorem. Let X
n
, 1 ≤ n ≤ ∞ be random vectors with Ch. f.’s ϕ
n
.
Then X
n
⇒ X

⇔ ϕ
n
(t) → ϕ

(t).
Proof. f(x) = e
itx
is bounded and cont. to, X
n
⇒ x implies ϕ
n
(x) = E(f(X
n
)) →
ϕ

(t).
Next, we show as earlier, that sequence is tight.
174
Fix O ∈ R
d
. Then for ∀ s ∈ R ϕ
n
(sO) → ϕ

(sO). Let
˜
X
n
= O X
n
:
Then ϕ
˜
X
(s) = ϕ
X
(Os). Then
ϕ
˜
X
n
(s) → ϕ
˜
X
(s).
∴ The dist. of
˜
X
n
is tight.
Thus, for each vector, e
j
X
n
is tight.
So, given ε > 0, ∃M
i
s.t.
lim
n
P(X
n
i
∈ [M
i
, M
i
]) > 1 −ε.
Now, π[M
i
, M
i
] take M = largest of all.
P(X
n
∈ [M, M]) ≥ 1 −ε. Q.E.D.
Remark. As before, if ϕ
n
(t) → ϕ
a
(t) and ϕ

is cont. at 0, then ϕ

(t) is the
Ch.f. of a r.vec. X

and X
n
⇒ X

.
We also showed: Cram´er–Wold device
If O X
n
⇒ O X

∀ O ∈ R
d
⇒ X
n
⇒ X.
Proof the condition implies E(e
iO·X
n
) → E(e
iO·X
n
)∀ O ∈ R
d

ϕ
n
(O) → ϕ(O)
Q.E.D.
.
Last time: (Continuity Theorem): X
n
⇒ X

iﬀ ϕ
n
(t) → ϕ

(t).
We showed: O X
n
⇒ O X

∀ O ∈ R
d
.
Implies: X
n
⇒ X
This is called Cram´er–Wold device.
Next let X = (X
1
, . . . , X
d
) be independent X
i
∼ N(0, 1). Then X
i
has
density
1

e

|x|
2

.
175
∴ X has density
1
(2π)
d/2
e
−|x|
2
/2
,
x = (x
1
, . . . , x
d
), [x[
2
=
d

i=1
[x
i
[
2
.
This is called the standard normal.
The Ch.f.
ϕ(t) = E
_
d

j=1
e
it
j
X
j
_
= e
−|t|
2
/2
.
Let A = (a
ij
) be a d d matrix. Let
Y = AX, X normal
Y
j
=
d

l=1
a
jl
X
l
Let
Γ
ij
= E(Y
i
Y
j
)
= E
_
d

l=1
a
il
X
l

d

m=1
a
jm
X
m
_
=
d

l=1
d

m=1
a
il
a
jm
E(X
l
X
m
)
=
d

l=1
a
il
a
jl
Γ = (Γ
ij
) = AA
T
.
Recall: For any matrix, ¸Bx, x) = ¸x, B
+
x). So, Γ is symmetric. Γ
T
= Γ.
Also,

ij
Γ
ij
t
i
t
j
= ¸Γt, t) = ¸A
t
t, A
t
t)
= [A
t
t[ ≥ 0.
176
So, Γ is nonnegative deﬁnite.
E(e
it·AX
) = E(e
itA

t·X
)
= e

|A
T
t|
2
2
= e

P
ij
Γ
ij
t
i
t
j
.
So, the random vector Y = AX has a multivariate normal distribution with co-
variance matrix Γ.
Conversely: Let Γ be a symmetric and nonnegative deﬁnite d d matrix.
Then ∃ O orthogonal s.t.
O
T
ΓO = D −D diagonal.
Let D
0
=

D and A = OD
0
.
Then AA
T
= OD
0
(D
T
0
O
T
) = ODO
T
= Γ. So, if we let Y = AX, X normal, then
Y is multivariate normal with covariance matrix Γ.
If Γ is non–singular, so is A and Y has a density.
Theorem. Let X
1
, X
2
, . . . be i.i.d. random vectors, EX
n
= µ and covariance
matrix
Γ
ij
= E(X
1,j
−µ
j
)(X
1,i
−µ
i
)).
If S
n
= X
1
+. . . X
n
then
S
n
−nµ

n
⇒ χ where
χ is a multivariate normal with covariance matrix Γ = (Γ
ij
).
Proof. Letting X

n
= X
n
−µ we may assume µ = 0. Let t ∈ R
d
. Then
˜
X
n
= t X
n
are i.i.d. random variables with E(
˜
X
n
) = 0 and
E[
˜
X
n
[
2
= E
_
d

i=1
t
i
X
ni
_
2
=

ij
t
i
t
j
Γ
ij
.
177
So, with
˜
S
n
=
n

j=1
(t X
j
) we have
ϕ
˜
S
n
(1) = E(e
i
˜
S
n
) → e

P
ij
Γ
ij
t
i
t
j
/2
or
ϕ
S
n
(t) = E(e
it·S
n
) → Q.E.D.
178
Math/Stat 539. Ideas for some of the problems in ﬁnal homework assignment. Fall
1996.
#1b) Approximate the integral by
1
n
n

k=1
_
B
k
n

B
k−1
n
_
and . . .
E
__
1
0
B
t
dt
_
2
= E
__
1
0
_
1
0
B
s
B
t
dsdt
_
= 2
_
1
0
_
1
s
E(B
s
B
t
)dtds
= 2
_
1
0
_
1
s
sdtds =
1
3
.
#2a) Use the estimate C
p
=
2
p
C
1
e
c/ε
2
ε
p
(e
c/ε
2
−2
p
)
from class and choose ε ∼ c/

p.
#2b) Use (a) and sum the series for the exponential.
#2c) Show φ(2λ) ≤ cφ(λ) for some constant c. Use formula E(φ(X)) =

_
0
φ

(λ)P¦X ≥ λ¦dλ
and apply good–λ inequalities.
#3a) Use the “exponential” martingale and ...
#3b) Take b = 0 in #3a).
#4)(i) As in the proof of the reﬂection property. Let
Y
1
s
(ω) =
_
1, s < t and u < ω(t −s) < v
0 else
and
Y
2
s
(ω) =
_
1, s < t, 2a −v < ω(t −s) < 2a −u
0 else.
Then E
x
(Y
1
s
) = E
x
(Y
2
s
) (why?) and with τ = (inf¦s: B
s
= a¦) ∧ t, we apply the
strong Markov property to get
E
x
(Y
1
τ
◦ θ
τ
[T
τ
)
why
= E
x
(Y
2
τ
◦ θ
τ
[T
τ
)
179
and ... gives the result.
#4)(i) Let the interval (u, v) ↓ x to get
P
0
¦M
t
> a, B
t
= x¦ = P
0
¦B
t
= 2a −x¦ =
1

2πt
e

(2a−x)
2
2t
and diﬀerentiate with respect to a.
#5a) Follow Durrett, page 402, and apply the Markov property at the end.
#7)(i)
E(X
n+1
[T
n
) = e
θS
n
−(n+1)ψ(θ)
E(e
θξ
n+1
[T
n
)
= e
θS
n
−nψ(θ)

n+1
independent of T
n
).
(ii) Show ψ

(θ) = ϕ

(θ)/ϕ(θ) and
_
ϕ

(θ)
ϕ(θ)
_
=
ϕ

(θ)
ϕ(θ)

_
ϕ

(θ)
ϕ(θ)
_
2
= E(Y
2
θ
) −(E(Y
θ
))
2
> 0
where Y
θ
has distribution
e
θx
ϕ(θ)
(distribution of ξ
1
). (Why is true?)
(iii)
_
X
θ
n
= e
θ
2
S
n

n
2
ψ(θ)
= X
θ/2
n
e
n{ψ(
θ
2
)−
1
2
ψ(θ)}
Strict convexity, ψ(0) = 0, and ... imply that
E
_
X
θ
n
= e
n{ψ(
θ
2
)−
1
2
ψ(θ)}
→ 0
as n → ∞. This implies X
θ
n
→ 0 in probability.
180
Chapter 7
1) Conditional Expectation.
Durrett p. 476
Signed measures: If µ
1
and µ
2
are two measures, particularly prob. measures, we
could add them. i.e. µ = µ
1

2
is a measure.
1
−µ
2
?
Deﬁnition 1.1. By a signed measure on a measurable space (Ω, T) we mean an
extended real valued function ν deﬁned on T such that
(i) ν assumes at most one of the values +∞ or −∞.
(ii) ν
_

_
j=1
E
j
_
=

j=1
ν(E
j
), E
j
’s are disjoint in T. By (iii) we mean that
the series is absolutely convergent if ν
_

_
j=1
E
j
_
is ﬁnite and properly
divergent if ν
_

_
j=1
E
j
_
is inﬁnite or − inﬁnite.
Example. f ∈ L
1
[0, 1], then (if f ≥ 0, get a measure)
ν(E) =
_
E
fdx
(Positive sets): A set A ∈ T is a positive set if ν(E) ≥ 0 for every mble subset
E ⊂ A.
(Negative sets): A set A ∈ T is negative if for every measurable subset E ⊂
A, ν(E) ≤ 0.
(Null): A set which is both positive and negative is a null set. Thus a set is null
iﬀ every measurable subset has measure zero.
181
Remark. Null sets are not the same as sets of measure zero.
Example. Take ν given above.
Our goal now is to prove that the space Ω can be written as the disjoint union
of a positive set and a negative set. This is called the Hahn–Decomposition.
Lemma 1.1. (i) Every mble subset of a positive set is positive.
(ii) If A
1
, A
2
, . . . are positive then A =

_
i=1
A
i
is positive.
Proof. (i): Trivial.
Proof of (ii). : Let A =

_
n=1
A
i
. A
i
positive. Let E ⊂ A be mble. Write
_
_
_
E =

n=1
E
i
, E
i
∩ E
j
= 0, i ,= j
E
j
= E ∩ A
j
∩ A
c
j−1
∩ . . . ∩ A
c
1
⊂ A
j
(*)
⇒ ν(E
j
) ≥ 0
ν(F) = Σν(E
j
) ≥ 0
We show (∗): Let x ∈ E
j
⇒ x ∈ E and x ∈ E
j
but x ,∈ A
j−1
, . . . A
1
. ∴ x ,∈ E
i
if j > i. If x ∈ E, let j = ﬁrst j such that x ∈ A
j
. Then x ∈ E
j
, done. (Such a j
exists become E ⊂ A).
Lemma 1.2. Let E be measurable with 0 < ν(E) < ∞. Then there is mble set
A ⊂ E. A positive such that 0 < ν(A).
Proof. If E is positive we are done. Let n
1
= smallest positive number such that
there is an E
1
⊂ E with
ν(E
1
) ≤ −1/n
1
.
Now, consider E[E
1
⊂ E.
182
Again, if E[E
1
is positive with ν(E[E
1
) > 0 we are done. If not n
2
= smallest
integer such that
1) ∃E
2
⊂ E[E
1
with ν(E
2
) < 1/n
2
.
Continue:
Let n
k
= smallest positive integer such that
∃E
k
⊂ E
¸
¸
¸
¸
k−1
_
j=1
E
j
with
ν(E
k
) < −
1
n
k
.
Let
A = E[

_
k=1
E
k
.
Claim: A will do.
First: ν(A) > 0. Why?
E = A∪

_
k=1
E
k
are disjoint.
ν(E) = ν(A) +

k=1
ν(E
k
)
⇒ ν(A) > 0
since negative.
Now,
0 < ν(E) < ∞ ⇒

k=1
ν(E
k
)
converges.
Problem 1: Prove that A is also positive.
183
Absolutely. ∴

k=1
1
n
k
< −

ν(E
k
) < ∞.
Suppose A is not positive. Then A has a subset A
0
with A
0
) < −ε for some ε > 0.
Now, since Σ
1
n
k
< ∞, n
k
→ ∞.
Theorem 1.1 (Hahn–Decomposition). Let ν be a signed measure on (Ω, T).
There is a positive set A and a negative set B with A∩ B = φ, A∪ B = X.
Proof. Assume ν does not take +∞. Let λ = sup¦ν(A): A ∈ positive sets¦. φ ∈
Positive sets, sup ≥ 0. Let A
n
∈ p s.t.
λ = lim
n→∞
ν(A
n
).
Set
A =

_
n=1
A
n
.
A is positive. Also, λ ≥ ν(A). Since
A¸A
n
⊂ A ⇒ ν(A[A
n
) ≥ 0
and
ν(A) = ν(A
n
) +ν(A[A
n
) ≥ ν(A
n
).
Thus,
ν(A) ≥ λ ⇒ 0 ≤ ν(A) = λ < ∞.
∴ 0 ≤ ν(A).
Let B = A
c
.
Claim B is negative.
Let E ⊂ B and E positive.
We show ν(E) = 0. This will do it.
184
For suppose E ⊂ B, 0 < ν(E) < ∞ ⇒ E has a positive subset of positive measure
by Lemma 1.2.
To show ν(E) = 0, observe E ∪ A is positive
∴ λ ≥ ν(E ∪ A) = ν(E) +ν(A)
= ν(E) +λ ⇒ ν(E) = 0.
Q.E.D.
Problem 1.b: Give an example to show that the Hahn decomposition is not unique.
Remark 1.1. The Hahn decomposition give two measures ν
+
and ν

deﬁned by
ν
+
(E) = ν(A∩ E)
ν

(E) = −ν(B ∩ E).
Notice that ν
+
(B) = 0 and ν

(A) = 0. Clearly ν(E) = ν
+
(E) −ν

(E).
Deﬁnition 1.2. Two measures ν
1
and ν
2
are mutually singular (ν
1
⊥ ν
2
) if
there are two measurable subsets A and B with A ∩ B = φ A ∪ B = Ω and
ν
1
(A) = ν
2
(B) = 0. Notice that ν
+
⊥ ν

.
Theorem 1.2 (Jordan Decomposition). Let ν be a signed measure. These are two
mutually singular measures ν
+
and ν

such that
ν = ν
+
−ν

.
This decomposition is unique.
Example. f ∈ L
1
[a, b]
ν(E) =
_
E
fdx.
Then
ν
+
(E) =
_
E
f
+
dx, ν

(E) = −
_
E
f

dx.
185
Deﬁnition 1.3. The measure ν is absolutely continuous with respect to µ, written
ν << µ, if µ(A) = 0 implies ν(A) = 0.
Example. Let f ≥ 0 be mble and set ν(A) =
_
A
fdµ.
ν << µ.
Theorem 1.3 (Radon–Nikodym Theorem). Let (Ω, T, µ) be σ–ﬁnite measure
spaces. Assume ν << µ. Then there is a nonnegative measurable function f such
that
ν(E) =
_
E
fdµ.
The function f is unique a.e. [µ]. We call f the Radon–Nikodym derivative of ν
with respect to µ and write
f =

.
Remark 1.2. The space needs to be σ–ﬁnite.
Example. (Ω, T, µ) = ([0, 1], Borel, µ = counting). Then
m << µ, m = Lebesgue.
If
m(E) =
_
E
fdµ
⇒ f(x) = 0 for all x ∈ [0, 1].
∴ m = 0, Contra.
3. Lemmas.
186
Lemma 1.3. Suppose ¦B
α
¦
α∈D
is a collection of mble sets index by a countable
set of real numbers D. Suppose B
α
⊂ B
β
whenever α < β. Then there is a mble
function f such that f(x) ≤ α on B
α
and f(x) ≥ α on B
c
α
.
Proof. For x ∈ Ω, set
f(x) = ﬁrst α such that x ∈ B
α
= inf¦α ∈ D: x ∈ B
α
¦.
inf¦φ¦ = ∞.
• If x ,∈ B
α
, x ,∈ B
β
for any β < α and so, f(x) ≥ α.
• If x ∈ B
α
, then f(x) ≤ α provided we show f is mble. Q.E.D.
Claim: ∀λ real
¦x: f(x) < λ¦ =
_
β<λ
β∈D
B
β
.
If f(x) < λ, then x ∈ D
β
save β < λ. If x ∈ B
β
, β < λ ⇒ f(α) < λ. Q.E.D.
Lemma 1.4. Suppose ¦B
α
¦
α∈D
as in Lemma 1.3 but this time α < β implies
only µ¦D
α
¸B
β
¦ = 0. Then there exists a mble function f on Ω such that f(x) ≤ α
a.e. on B
α
and f(x) ≥ α a.e. on B
c
α
.
Lemma 1.5. Suppose D is dense. Then the function in Lemma 1.3 is unique and
the function in lemma 1.4 is unique µ a.e.
Proof of Theorem 1.3. Assume µ(Ω) = 1. Let
ν
α
= ν −αµ, α ∈ Q.
ν
α
is a signed measure. Let ¦A
α
, B
α
¦ be the Hahn–Decomp of ν
α
. Notice:
Ω = B
α
, B
α
= φ, if α ≤ 0. (1)
B
α
[B
β
= B
α
∩ (X[B
β
) = B
α
∩ A
β
. (2)
187
Thus,
ν
α
(B
α
[B
β
) ≤ 0 (1)
ν
β
(B
α
[B
β
) ≥ 0 (2)
or
⇔ ν(B
α
[B
β
) −αµ(B
α
[B
β
) ≤ 0 (1)
ν(B
α
[B
β
) −βµ(B
α
[B
β
) ≥ 0. (2)
Thus,
βµ(B
α
[B
β
) ≤ ν(B
α
[B
β
) ≤ αµ(B
α
[B
β
).
Thus, if α < β, we have
µ(B
α
[B
β
) = 0.
Thus, ∃n mble f s.t. ∀ α ∈ Q, f ≥ α a.e. on A
α
and f(x) ≤ α a.e. on B
α
. Since
B
0
= φ, f ≥ 0 a.e.
Let N be very large. Put
E
k
= E ∩
_
B
k+1
N
¸
¸
¸
¸
B
k/N
_
, k = 0, 1, 2, . . .
E

= Ω
¸
¸
¸
¸

_
k=0
B
k/N
.
Then E
0
, E
1
, . . . , E

are disjoint and
E =

_
k=0
E
k
∪ E

.
So,
ν(E) = ν(E

) +

k=0
ν(E
k
)
on
E
k

B
k+1
N
¸
¸
¸
¸
B
k/N
=
B
k+1
N
∩ A
k/N
.
188
We have,
k/N ≤ f(x) ≤
k + 1
N
a.e.
and so,
k
N
µ(E
k
) ≤
_
E
k
f(x)dµ ≤
k + 1
N
µ(E
k
). (1)
Also
E
k
⊂ A
k/N

k
N
µ(E
k
) ≤ ν(E
k
) (2)
and
E
k

B
k+1
N
⇒ ν(E
k
) ≤
k + 1
N
µ(E
k
). (3)
Thus:
ν(E
k
) −
1
N
µ(E
k
) ≤
k
N
µ(E
k
) ≤
_
E
k
f(x)dx

k
N
µ(E
k
) +
1
N
µ(E
k
)
≤ ν(E
k
) +
1
N
µ(E
k
)
on
E

, f ≡ ∞ a.e.
If µ(E

) > 0, then ν(E

) = 0 since (ν − αµ)(E

) ≥ 0 ∀ α. If µ(E

) = 0 ⇒
ν(E

) = 0. So, either way:
ν(E

) =
_
E

fdµ.
ν(E) −
1
N
µ(E) ≤
_
E
fdµ ≤ ν(E) +
1
N
µ(E)
Since N is arbitrary, we are done.
Uniqueness: If ν(E) =
_
E
gdµ, ∀ E ∈ B
⇒ ν(E) −αµ(E) =
_
E
(g −α)dµ ∀ α ∀ E ⊂ A
α
.
189
Since
0 ≤ ν(E) −αµ(E) =
_
E
(g −α)dµ.
We have:
g −α ≥ 0[µ] a.e. on A
α
or
g ≥ α a.e. on A
α
.
Similarly,
g ≤ α a.e. on B
α
.

f = g a.e.
Suppose µ is σ–ﬁnite: ν << µ. Let Ω
i
be s.t. Ω
i
∩ Ω
j
= φ,

i
= Ω. µ(Ω
i
) < ∞.
Put µ
i
(E) = µ(E ∩ Ω
i
) and ν
i
(E) = ν(E ∩ Ω
i
). Then ν
i
<< µ
i
∴ ∃f
i
≥ 0 s.t.
ν
i
(E) =
_
E
f
i

i
or
ν(E ∩ Ω
i
) =
_
E∩Ω
i
f
i
dµ =
_
E
dΩ
i
dµ.
⇒ result.
Theorem 1.4 (The Lebesgue decomposition for measures). Let (Ω, T) be a mea-
surable space and µ and ν σ–ﬁnite measures on T. Then ∃ν
0
⊥ µ and ν
1
<< µ
such that ν = ν
0

1
. The measure ν
0
and ν
1
are unique.
(f ∈ BV ⇒ f = h +g. h singular g a.c.).
Proof. Let λ = µ +ν. λ is σ–ﬁnite.
λ(E) = 0 ⇒ µ(E) = ν(E) = 0.
190
R.N. ⇒
µ(E) =
_
E
fdλ
ν(E) =
_
E
gdλ.
Let
A = ¦f > 0¦, B = ¦f = 0¦
Ω = A∪ B, A∩ B = φ, µ(B) = 0.
Let ν
0
(E) = ν(E ∩ B). Then
ν
0
(A) = 0, so ν
0
⊥ µ
set
ν
1
(E) = ν(E ∩ A) =
_
E∩A
gdλ.
Clearly ν
1

0
= ν and it only remains to show that ν
1
<< µ. Assume µ(E) = 0.
Then
_
E
fdλ = 0 ⇒ f ≡ 0 a.e. [λ]. (f ≥ 0)
on E.
Since f > 0 on E ∩ A ⇒ λ(E ∩ A) = 0. Thus
ν
1
(E) =
_
E∩A
gd
λ
= 0 Q.E.D.
uniqueness. Problem.
You know: P(A[B) =
P(A∩ B)
P(B)
, A, B. P and B indept. P(A[B) = P(A). Now
we work with probability measures. Let (Ω, T
0
, P) be a prob space, T ⊂ T
0
a
σ–algebra and X ∈ σ(T
0
) with E[X[ < ∞.
191
Deﬁnition 1.4. The conditional expectation of X given T, written as E(X[T),
is any random variable Y with the properties
(i) Y ∈ σ(F)
(ii) ∀A ∈ T,
_
A
XdP =
_
A
Y dP or E(X; A) = E(Y ; A).
Existence, uniqueness.
First, let us show that if Y has (i) and (ii), then E[Y [ ≤ E[X[.
With A = ¦Y > 0¦ ∈ T, observe that
_
A
Y dp =
_
A
Xdp ≤
_
A
[X[dp
and
_
A
c
−Y dp =
_
A
c
−Xdp ≤
_
A
c
[X[dp.
So,
E[Y [ ≤ E[X[.
Uniqueness: If Y

also satisﬁes (i) and (ii), then
_
A
Y dp =
_
A
Y

dp ∀ A ∈ T

_
A
ydp =
_
A
Y

dp ∀ A ∈ T or
_
A
(Y −Y

)dp = 0 ∀ A ∈ T.
⇒ Y = Y a.e.
Existence:
Consider ν deﬁned on (Ω, T) by
ν(A) =
_
A
Xdp. A ∈ T.
192
ν is a signed measure. ν << P.
∴ ∃ Y ∈ σ(T) s.t.
ν(A) =
_
A
Y dp ∀ A ∈ T.

_
A
Y dp =
_
A
xdp. ∀ A ∈ T.
Example 1. Let A, B be ﬁxed sets in T
0
. Let
T = σ¦B¦ = ¦φ, Ω, B, B
c
¦.
E(1
A
[T)?
This is a function so that thwn we integrate over sets in T, we get integral of 1
A
over the sets.
i.e.
_
B
E(1
A
[T)dP =
_
B
1
A
dP
= P(A∩ B).
E(1
A
[T)P(B) = P(A∩ B)
or
E(1
A
[T)1
B
(B) = P(A∩ B)
or
P(A[B) = E(1
A
[T)1
B
=
P(A∩ B)
P(B)
.
In general. If X is a random variable and T = σ(Ω
1
, Ω
2
, . . . ) where Ω
i
are disjoint,
then
1

i
E(X[T) =
E(X; Ω
i
)
P(Ω
i
)
or
E(X[T) =

i=1
E(X; Ω
i
)
P(Ω
i
)
1

i
(ω).
193
Notice that if T¦φ, Ω¦.
Thus
E(X[T) = E(X).
Properties:
(1) If X ∈ T ⇒ E(X[T) = X. E(X) = E(E(X[T)).
(2) X is independent of T, i.e. (σ(X) ⊥ T)
P(X ∈ B) ∩ A) = P(X ∈ B)P(A).
⇒ E(X[T) = E(X) (as in P(A[B) = P(A)).
Proof. To check this, (1) E(X) ∈ T.
(ii) Let A ∈ T. Thus
_
A
EXdP = E(X)E(1
A
)
= E(X1
A
) =
_
A
Xdp
∴ E(X[T) = EX.
Theorem 1.5. Suppose that X, Y and X
n
are integrable.
(i) If X = a ⇒ E(X[T) = a
(ii) For constants a and b, E(aX +bY ) = aE(X[T) +bE(Y [T).
(iii) If x ≤ Y ⇒ E(X[T) ≤ E(X[T). In particular, [E(X[T)[ ≤ E([X[[T).
(v) If lim
n→∞
X
n
= X and [X
n
[ ≤ Y, Y integrable, then E(X
n
[T) → E(X[T)
a.s.
(vi) Monotone and Fatou’s hold the same way.
Proof. (i) Done above since clearly a ∈ T.
194
(ii)
_
A
aE(x[T)dp +
_
A
bE(Y [T)dP
_
A
(aE(x[T) +bE(Y [I))dp =
= a
_
A
E(X[T)dP +b
_
A
E(Y [T)dP
= a
_
A
XdP +b
_
A
Y dp =
_
A
(aX +bY )dp =
_
A
E(aX +bY [T)dP
(iii)
_
A
E(X[T)dP =
_
A
XdP ≤
_
A
Y dP
=
_
A
E(Y [T)dP. ∀ A ∈ T
∴ E(X[T) ≤ E(Y [T). a.s.
(iv) Let Z
n
= sup
k≥n
[X
k
−X[. Then Z
n
↓ 0 a.s.
[E(X
n
[T) −E(X[T)[ ≤ E([X
n
−Y [[T
n
)
≤ E[Z
n
[T].
Need to show E[Z
n
[T) ↓ 0 with pub. 1. By (iii), E(Z
n
[T
n
) ↓ decreasing.
So, let Z = limit. Need to show Z ≡ 0.
We have Z
n
≤ 2Y .
∴ E(Z) =
_

E(Z[T)dp (E(Z[ = E(E(Z[T))
= E(E(Z[T)) ≤ E(E(Z
n
[T)) = E(Z
n
) → 0 by D.C.T.
Theorem 1.6 (Jensen Inequality). If ϕ is convex E[X[ and E[ϕ(X)[ < ∞, then
ϕ(E(X[T)) ≤ E(ϕ(X)[T).
Proof.
195
ϕ(x
0
) +A(x
0
)(x −x
0
) ≤ ϕ(x).
x
0
= E(X[T) x = X.
ϕ(E(X[T)) +A(E(X[T))(X −E(X[T)) ≤ ϕ(X).
Note expectation of both sides.
E[ϕ(E(X[T)) +A(E(X[T))(X −E(X[T)[T) ≤ E(ϕ(X[T)
⇒ ϕ(E(X[T)) +A(E(X[T))[E(X[T) −E(X[T)] ≤ E(ϕ(X[T)).
Corollary.
[E(X[T)[
p
≤ E([X[
p
[T) for 1 ≤ p < ∞
exp(E(X[T)) ≤ E(e
X
[T).
Theorem 1.7. (1) If T
1
⊂ T
2
then
(a) E(E(X[T
1
)[T
2
)) = E(X[T
1
)
(b) E(E(X[T
2
)[T
1
)) = E(X[T
1
). (The smallest ﬁeld always wins).
(2) If X ∈ σ(T), E[Y [, E[XY [ < ∞
⇒ E(XY [T) = XE(Y [T)
(measurable functions act like constants). (Y = 1, done before).
Proof. (a) E(X[T
1
) ∈ (T
1
) ⊂ (T
2
).
∴ Done.
(b) E(X[T
1
) ∈ T
1
. So A ∈ T
1
⊂ T
2
we have
_
A
E(X[T
1
)dp =
_
A
(X)dp
=
_
A
E(X[T
2
)dp =
_
A
E(E(x[T
2
)[T
2
)dP.
196
Durrett: p 220: #1.1
p. 222: #1.2
p.225: #1.3
p. 227: 1.6
p. 228: 1.8.
Let
L
2
(T
0
) = ¦X ∈ T
0
: EX
2
N∞¦
and
L
2
(T
1
) = ¦Y ∈ T
1
: EY
2
< ∞¦.
Then L
2
(T
1
) is a closed subspace of L
2
(T
0
). In fact, with ¸X
1
, X
2
) = E(X
1
X
2
),
L
2
(T
0
) and L
2
(T
1
) are Hilbert spaces. L
2
(T
1
) is closed subspace in L
2
(T
0
). Given
any X ∈ L
2
(T
0
), ∃ Y ∈ L
2
(T
1
) such that
dist (X, L
2
(T
1
)) = E(x −y)
2
Theorem 1.8. Suppose Ex
2
< ∞, then
inf
X∈L
2
(F
1
)
E([X −Y [
2
) = E([X −(EX[T
1
))
2
.
Proof. Need to show
E([X −Y [
2
) ≥ E[X −E(X[T
1
))
2
for any y ∈ L(T
1
). Let y ∈ L
2
(T
1
) out set. Set
Z = Y −E(X[T) ∈ L
2
(T
1
),
Y = Z +E(X[T).
Now, since
E(ZE(X[T)) = E(ZX[T)) = E(ZX)
197
we see that
E(ZE(X[T)) −E(ZX) = 0.
E(X −Y )
2
= E¦X −Z −E(X[T)¦
2
= E(X −E(X[T))
2
+E(Z
2
)
−2E((X −E(X[T))Z)
≥ E(X −E(X[T))
2
−2E(XZ) + 2E(ZE(X[T))
= E(X −E(X[T))
2
. Q.E.D.
By the way: If X and Y are two r.v. we deﬁne
E(X[Y ) = E(x[σ(Y )).
Recall conditional expectation for ??
Suppose X and Y have joint density f(x, y)
P((X, Y ) ∈ B) =
_
B
f(x, y)dxdy, B ⊂ R
2
.
And suppose
_
R
f(x, y)dx > 0 ∀y. We claim that in this case, if E[g(X)[ < ∞,
then
E(g(X)[Y ) = h(Y )
and
h(y) =
_
R
g(x)f(x, y)dy
_
R
(f(x, y)dx
.
Treat the “given” density as if the second probability
P(X = x[Y = y) =
P(X = x, Y = y)
P(Y = y)
=
f(x, y)
_
R
f(x, y)dx
.
198
Now: Integrale:
E(g(X)[Y = y) =
_
g(x)P(X = x)Y = y))dy.
To verify: (i) clearly h(Y ) ∈ σ(Y ). For (ii): let
A = ¦Y ∈ B¦ for B ∈ B(R).
Then need to show
E(h(Y ); A) = E(g(X); A).
L.H.S., E(h(Y )1
R
(X) A)
=
_
B
_
R
h(y)f(x, y)dxdy
_
B
_
R
_
_
R
g(3)f(3, y)d
3
_
_
R
f(x, y)dx
f(x, y)dxdy
=
_
B
_
R
g(3)f(3, y)d
3
d
y
= E(g(X)1
B
(Y )) = E(g(X); A).
(If
_
f(x, y)dy = 0, deﬁne h by h(y)
_
f(x, y)dx =
_
g(x)f(x, y)dy i.e. h can be
anything where
_
f(x, y) dy = 0).

2

I SIGMA ALGEBRAS AND MEASURES

§1 σ–Algebras: Deﬁnitions and Notation. We use Ω to denote an abstract space. That is, a collection of objects called points. These points are denoted by ω. We use the standard notation: For A, B ⊂ Ω, we denote A ∪ B their union, A ∩ B their intersection, Ac the complement of A, A\B = A − B = {x ∈ A: x ∈ B} = A ∩ B c and A∆B = (A\B) ∪ (B\A). If A1 ⊂ A2 , . . . and A = ∪∞ An , we will write An ↑ A. If A1 ⊃ A2 ⊃ . . . n=1 and A = ∩∞ An , we will write An ↓ A. Recall that (∪n An ) = ∩n Ac and n=1 n (∩n An ) = ∪n Ac . With this notation we see that An ↑ A ⇒ Ac ↓ Ac and n n An ↓ A ⇒ Ac ↑ Ac . If A1 , . . . , An ∈ Ω, we can write n ∪n Aj = A1 ∪ (Ac ∩ A2 ) ∪ (Ac ∩ Ac ∩ A3 ) ∪ . . . (Ac ∩ . . . ∩ Ac ∩ An ), (1.1) j=1 1 1 2 1 n−1 which is a disjoint union of sets. In fact, this can be done for inﬁnitely many sets: ∪∞ An = ∪∞ (Ac ∩ . . . ∩ Ac ∩ An ). n=1 n=1 1 n−1 If An ↑, then ∪n Aj = A1 ∪ (A2 \A1 ) ∪ (A3 \A2 ) . . . ∪ (An \An−1 ). j=1 (1.3) (1.2)
c c

Two sets which play an important role in studying convergence questions are:
∞ ∞

limAn ) = lim sup An =
n n=1 k=n ∞ ∞

Ak

(1.4)

and limAn = lim inf An =
n

Ak .
n=1 k=n

(1.5)

3

Notice
∞ ∞ c

(limAn ) =
n=1 k=n ∞ ∞

c

An
c

=
n=1 k=n ∞ ∞

Ak Ac = limAc n k
n=1 k=n

=

Also, x ∈ limAn if and only if x ∈ at least one k > n such that x ∈
k=n Ak0 .

Ak for all n. Equivalently, for all n there is That is, x ∈ An for inﬁnitely many n. For

this reason when x ∈ limAn we say that x belongs to inﬁnitely many of the An s and write this as x ∈ An i.o. If x ∈ limAn this means that x ∈
k=n

Ak for some

n or equivalently, x ∈ Ak for all k > n. For this reason when x ∈ limAn we say that x ∈ An , eventually. We will see connections to limxk , limxk , where {xk } is a sequence of points later. Deﬁnition 1.1. Let F be a collection of subsets of Ω. F is called a ﬁeld (algebra) if Ω ∈ F and F is closed under complementation and ﬁnite union. That is, (i) Ω ∈ F (ii) A ∈ F ⇒ Ac ∈ F
n

(ii) A1 , A2 , . . . An ∈ F ⇒
j=1

Aj ∈ F.

If in addition, (iii) can be replaced by countable unions, that is if

(iv) A1 , . . . An , . . . ∈ F ⇒
j=1

Aj ∈ F,

then F is called a σ–algebra or often also a σ–ﬁeld. Here are three simple examples of σ–algebras. (i) F = {∅, Ω},

Set An = (0. A. Deﬁnition 1. of course. A ⊂ Ω. the real numbers and take F to be the collection of all ﬁnite disjoint unions of intervals of the form (a. F = {∅. The reason for this will become clear in the next section when we introduce measures. ∞) as right–semiclosed.1. That is if F is another σ–algebra and A ⊂ F.1. F) as a measurable space. then σ(A) ⊂ F. A = {A}. In fact. b] = {x: a < x ≤ b}. a]. yes. Ac }. 1 − Then. −∞ ≤ a < b < ∞. we often write σ(F0 ) = F 0 . We call σ(A) the σ–algebra generated by A. Given any collection A of subsets of Ω. ∞ 1 ]. If F0 is an algebra. Let Ω = R. Example 1.2. This collection is not empty since A ⊂ all subsets of Ω which is a σ–algebra. We will refer to the pair (Ω. A. Is there such a σ–algebra? The answer is. F is an algebra but not a σ–algebra. n An = (0. Then σ(A) = {∅. σ(A) = F where the intersection is take over all the σ–algebras containing the collection A. b]c = (b. By convention we also count (a. 1) ∈ F. (iii) If A ⊂ Ω. n=1 The convention is important here because (a. Ac . Ω}. ∞) ∪ (−∞. An example of an algebra which is not a σ–algebra is given by the following. .4 (ii) F = {all subsets of Ω}. Remark 1. Ω. let σ(A) be the smallest σ–algebra containing A.

5 Problem 1.1.2. Deﬁnition 1.1. Problem 1. Assume σ(A) = F. Deﬁnition 2. Let Ω = R and B0 the ﬁeld of right–semiclosed intervals. (ad . −∞ ≤ ai < bi < ∞.4. Then σ(B0 ) = B is called the Borel σ–algebra of R.3. Problem 1. Prove that every open set is in B. Remark 1. . §2. Measures. Prove that B = σ({all open intervals}). . Problem 1. By a measure on this space we mean a function µ : F → [0.2. ∞] with the properties (i) µ(∅) = 0 and (ii) if Aj ∈ F are disjoint then  µ j=1 ∞  Aj  = ∞ µ(Aj ). F) be a measurable space. j=1 . bd ]. Let (Ω. Set A ∩ A = {B ∩ A: B ∈ A}. b1 ] × . Let A be a collection of subsets of Ω and A ⊂ Ω. Show that σ(A ∩ A) = F ∩ A. relative to A.2. The above construction works equally in Rd where we take B0 to be the family of all intervals of the form (a1 . Prove that every open set in R is the countable union of right –semiclosed intervals.

This is called the counting measure. (monotonicity). Deﬁne µ(A) = #A.1. F. µ) be a measure space. then µ(A) ≤ µ(B). 6}. Let Aj = {j. To see this. (continuity for below). Assume all sets mentioned below are in F. 4. . Proposition 2. Tails} = {T.6 Remark 2. . 3. 3. then µ(Aj ) ↑ µ(A). 5. Let Ω = {0. µ) as a measure space. we list several elementary properties of general measures. If µ(Ω) = 1 we refer to it as a probability space and often write this as (Ω. these are nothing but two very simple examples of probability spaces and our goal now is to enlarge this collection. F. ∞ ∞ (ii) If A ⊆ j=1 Aj . then µ(A) ≤ j=1 µ(Aj ). P {w} = 1/6. . Ω = {1. (iv) If Aj ↓ A and µ(A1 ) < ∞. Let Ω be a countable set and let F = collection of all subsets of Ω. If Ω is a ﬁnite set with n points and we deﬁne P (A) = then we get a probability measure. The ﬁniteness assumption in (iv) is needed. (i) If A ⊂ B.2. (continuity from above). j + 1. Then Aj ↓ ∅ but µ(Aj ) = ∞ for all j. then µ(Aj ) ↓ µ(A). 1 n #A Of course. }. 1} = {Heads. First. Remark 2. Proof. F. . . Example 2. (subadditivity). . Then µ(B) = µ(A) + µ(B\A) ≥ µ(A). Denote by #A denote the number of point in A.1. .1. P ). Let (Ω. Write B = A ∪ (B\A). We will refer to the triple (Ω. } and let µ be the counting measure. (iii) If Aj ↑ A. set Ω = {1. 2. H} and set P {0} = 1/2 and P {1} = 1/2 (2) Rolling a die. Concrete examples of these are: (i) Coin ﬂips. 2.

By an extended distribution function on R we shall mean a map F : R → R that is increasing. Next. For (iii) observe that if An ↑. we have ∞ ∞ µ(B\A) = µ(B) − µ(A).. recall that n=1 ∞ ∞ An = n=1 (Ac ∩ . ∩ Ac ) n−1 ≤ n=1 µ(An ). proving (ii).. probability measures correspond to distributions. from which the result follows assuming the ﬁniteness of µ(A1 ). b] = F (b)−F (a) sets a 1-1 correspondence between the Lebesgue–Stieltjes measures and distributions where two distributions that diﬀer by a constant are identiﬁed. of course. lim+ F (x) = F (x0 ). we shall simply call x→∞ x→−∞ it a distribution function. F (a) ≤ F (b) if a < b. If in addition the function F is x→x0 nonnegative satisfying lim FX (x) = 1 and lim FX (x) = 0. then ∞ ∞ µ n=1 An =µ n=1 ∞ (An \An−1 ) µ(An \An−1 ) = n=1 m = lim m→∞ µ(An \An−1 ) n=1 m = lim µ m→∞ n=1 An \An−1 . For (iv) we observe that if An ↓ A then A1 \An ↑ A1 \A. Deﬁnition 2. We will show that the formula µ(a. ∩ Ac ∩ An ) 1 n−1 ∞ where the sets in the last union are disjoint. µ n=1 An = n=1 µ(An ∩ Ac 1 . and right continuous.2. µ(A1 \An ) ↑ µ(A1 \A) and since µ(A1 \An ) = µ(A1 ) − µ(An ) we see that µ(A1 ) − µ(An ) ↑ µ(A1 ) − µ(A). By (iii).7 which proves (i). . . Therefore. A Lebesgue–Stieltjes measure on R is a measure on B = σ(B0 ) such that µ(I) < ∞ for each bounded interval I. note that if if µ(A) < ∞. . As a side remark here.

. ∞]. Let a < b. For example. 0]. . µ(∅) = 0 and if A1 . b] = F (b) − F (a− ) µ[a. . Proof. then µ(A) = µ(Aj ) . b n→∞ n = lim F (b) − F (b − 1/n) = F (b) − F (b−). Problem 2. b]. Suppose F is a distribution function on R. Also. Deﬁnition 2. A2 . µ is a measure on A if µ : A → [0.1. b] ≥ 0. (iv).8 Proposition 2. .1. xn ] ↓ ∅. Thus F (xn ) − F (x) → 0 implying that F is right continuous. b) = F (b− ) − F (a) µ[a. up to additive constants. xn ] → 0. xn ] = ∅ and (x. The measure is σ–ﬁnite if the space Ω = ∪∞ Ωj where the Ωj ∈ A j=1 are disjoint and µ(Ωj ) < ∞. b] = F (b) − F (a). since n=1 (x1 . (1) (2) (3) (4) Theorem 2. . We should notice also that µ{b} = lim µ b − n→∞ 1 . Suppose A is an algebra. Hence in fact F is continued at {b} if and only if µ{b} = 0. ﬁx F (0) arbitrary and set F (x) − F (0) = µ(0. x < 0. are disjoint with A = ∞ j ∞ j Aj ∈ A. Set F (x−) = lim− F (x). x]. Then x→x µ(a. b) = F (b− ) − F (a− ) µ(R) = F (∞) − F (−∞). Then F is an extended distribution.2. by F (b) − F (a) = µ(a. Deﬁne F : R → R.3. x ≥ 0. → x. Then F (b) − F (a) = µ(a. F (0) − F (x) = µ(x. There is a unique measure µ on B(R) such that µ(a. if {xn } is such that x1 > ∞ x2 > . then µ(x. Let µ be a Lebesgue–Stieltjes measure on R. . by Proposition 2.1.

If S is a semi-algebra.2. E = i=1 Ei . b]: −∞ ≤ a < b < ∞}. then µ(E) = i=1 µ(Ei ) and ∞ ∞ (ii) If E ∈ S. and S.j Ai ∩ Bj ∈ S. Theorem 2. Then E1 ∩ E2 = ∪i. if µ is σ–ﬁnite. In addition. Let S be a semialgebra and let µ be deﬁned on S.1. A collection S ⊂ Ω is a semialgebra if the following two conditions hold.4.2 (Carath`odory’s Extension Theorem). This is a semialgebra but not an algebra. Ei ∈ S disjoint. if E = ∪n Aj ∈ S then Ac = ∩j Ac . B ∈ S ⇒ A ∩ B ∈ S. Then µ has a unique extension to σ(A). Suppose µ(∅) = 0 with the additional properties: n n (i) If E ∈ S. Example 2. This is called the algebra generated by S. However. Ei ∈ S disjoint. Then µ has a unique extension µ to S which is a measure. We return to the proof of this theorem later. Lemma 2. then µ(E) ≤ i=1 µ(Ei ).3. by the Carath´odary extension theorem e . Thus S is closed under ﬁnite intersections. we see that Ac ∈ S. Proof.9 Theorem 2. then S = {ﬁnite disjoint unions of sets in S} is an algebra. then µ has a unique extension to a measure (which we continuo to call µ) to σ(S). (i) A. E = i=1 Ei . where the unions are disjoint and j=1 j=1 the sets are all in S. Also. (ii) A ∈ S then Ac is the ﬁnite union of disjoint sets in S. This proves that S is an algebra. Suppose µ is σ–ﬁnite e on an algebra A. S = {(a. Deﬁnition 2. Let E1 = ∪n Aj and E2 = ∪n Bj . by the j=1 j deﬁnition of S.

as required by the deﬁnition of the measures on algebras. . Ei ∈ S. We postpone the proof of this to state the following lemma which will be used in its proof. with E = i=1 n Ei . Ei disjoint. Note that (a) gives more than (i) since Ei ∈ S not just S. Lemma 2. Ei ∈ S. It remains to verify that µ so deﬁned is a measure. Ei ∈ S and the union is disjoint. Then µ(E) = i=1 n µ(Ei ). ∞ Next. Suppose (i) above holds and let µ be deﬁned on S as above. n n µ(Ei ) = i=1 i=1 ˜ µ(Ei ∩ Ej ) = m n ˜ µ(Ei ∩ Ej ) = m ˜ µ(Ej ). ˜ (Ei ∩ Ej ). n n (a) If E. Eij ∈ S. µ is well deﬁned. then µ(E) ≤ i=1 µ(Ei ). with these also disjoint.j µ(Eij ). if E = n i=1 Ei . E ⊂ i=1 Ei . ˜ (Ei ∩ Ej ).10 Proof. suppose we also have E = Then Ei = j=1 m ˜ ˜ Ej . (b) If E. That is. where Ej ∈ S and disjoint. that E ∈ S. Since Ei = Eij . Also. Ei ∈ S where the sets are disjoint and assume. the sets in (b) are not necessarily disjoint. ˜ Ej = i=1 By (i).2. let E = i=1 n Ei . We assume the Lemma for the moment. Deﬁne µ on S by n µ(E) = i=1 µ(Ei ). we have j=1 ∞ ∞ n µ(Ei ) = i=1 (i) µ(Eij ) = i=1 j=1 i. We ﬁrst verify that this is well m j=1 n deﬁned. So.

+ µ(Bn ) + µ(Cn ) n ≥ i=1 µ(Bi ). E = j=1 ˜ ˜ Ej . Cn ∈ S and disjoint. Ej ∈ S and again disjoint sets. and we can write ∞ ˜ Ej = i=1 ˜ (Ej ∩ Ei ). n µ(A) = µ(An ) + µ(Cn ) = µ(B1 ) + . Set E = i=1 Ei . For the apposite inequality we set (recall E = ∞ i=1 Ei ) An = n i=1 Ei and Cn = E ∩ Ac so that E = An ∪ Cn and An . which proves one of the inequalities. Sij ∈ S. ∞ ˜ µ(Ej ) ≤ i=1 ˜ µ(Ej ∩ Ei ).11 So. otherwise replace it by Eij . with n arbitrary. Thus by assumption (ii). . Since n E ∈ S. n µ(E) = ˜ µ(Ej ) j=1 n ∞ ≤ j=1 i=1 ∞ n ˜ µ(Ej ∩ Ei ) ˜ µ(Ej ∩ Ei ) i=1 j=1 ∞ = ≤ i=1 µ(Ei ). Therefore. This proves the other inequality. . n m Proof of Lemma 2.2. Therefore. µ(A) = ij µ(Eij ) = i=1 µ(Ei ) . By assumption (i). we may assume Ei ∈ S instead of S. then Ei = j=1 n Eij .

Proof of Theorem 2. where the last inequality follows from (a). By relabeling we may assume that i=1 a1 = a bn = b ai = bi−1 . assume n = 1. Set F (∞) = lim F (x) x↑∞ and F (−∞) = lim F (x). Let S = {(a. µ(E) ≤ µ(E) + µ(E1 ∩ E c ) = µ(E1 ). Deﬁne for x↓−∞ any µ(a. ∪ (E ∩ Fn ). ∩ c Ei−1 ) = i=1 Fi . These quantities exist since F is increasing.. b] = F (b) − F (a). b] = n (ai .12 proving (a). Now. Suppose (a. . For n > 1. set n n n Ei = i=1 i=1 n (Ei ∩ c E1 ∩ . If E ⊂ E1 . . b]: −∞ ≤ a < b < ∞}. Then E=E∩ Ei i=1 = E ∩ F1 ∪ . where the union is disjoint. . For (b). where F (∞) > −∞. E1 ∩ E c ∈ S.1. µ(E) = n i=1 µ(E ∩ Fi ).. for any −∞ ≤ a < b ≤ ∞. the case n = 1 gives n n µ(E ∩ Fi ) ≤ i=1 i=1 n µ(Ei ) µ(Fi ) i=1 n ≤ =µ i=1 Ei . then E1 = E ∪ (E1 ∩ E c ). So by (a). bi ]. F (−∞) < ∞.

{(ai . or equivalently. Similarly. By compactness. which proves that condition (i) holds. b] ⊂ i=1 (ai . bi + ηi ) and N (a + δ.) By right continuity of F . bi + ηi )} forms a open cover for [a + δ. N [a + δ. b] ⊂ i=1 (ai . (We can also order them if we want. b] ⊂ i=1 (ai . bi ] = F (bi ) − F (ai ) and n n µ(ai . bi ] where the union is disjoint. let −∞ < a < b < ∞ and (a. bi ] = i==1 i=1 F (bi ) − F (ai ) = F (b) − F (a) = µ(a.13 Then µ(ai . bi + ηi ]. b]. F (a + δ) < F (a) + ε. there is a ﬁnite subcover. given ε > 0 there is a δ > 0 such that F (a + δ) − F (a) < . b]. ∞ For (ii). Now. Thus. for all i. there is a ηi > 0 such that F (bi + ηi ) < F (bi ) + ε2−i . .

µ is called the Lebesgue measure on R. b] = F (b) − F (a) ∞ ≤ 2ε + i=1 ∞ F (bi ) − F (ai ) µ(ai . x ≤ 0  F (x) = x. x > 1 . If F (x) = x. B] ⊂ (a. and (A. b] for any −∞ < A < B < ∞. If   0. bi ]. we have by above ∞ F (B) − F (A) ≤ i=1 (F (bi ) − F (ai )) and the result follows by taking limits. b] ⊂ i=1 (ai . F (b) − F (a + δ) = µ(a + δ. i=1 = 2ε + proving (ii) provided −∞ < a < b < ∞.2. ∞ If (a.14 Therefore by (b) of Lemma 2. bi + ηi ] F (bi + ηi ) − F (ai ) i=1 N = = i=1 N {F (bi + ηi ) − F (bi ) + F (bi ) − F (ai )} ∞ −i ≤ i=1 ε2 + i=1 (F (bi ) − F (ai )) ∞ ≤ε+ i=1 F (bi ) − F (ai ). 0 < x ≤ 1   1. Therefore. µ(a. bi ]. a and b arbitrary. b] N ≤ i=1 N µ(ai .

(iv) E = {x: |x| + 2x2 > 1}. Proof of Theorem 2. x] and limx→∞ F (x) = 1.    1 + x. (iii) E = (−1. For any E ⊂ Ω we deﬁne µ∗ (E) = inf µ(Ai ) where the inﬁmum is taken over all sequences of {Ai } in A such that E ⊂ ∪Ai . 1]. x≥2 and let µ be the Lebesgue–Stieltjes measure corresponding to F . then E ∈ A∗ . (ii) If µ∗ (E) = 0. Let A∗ be the collection of all subsets E ⊂ Ω with the property that µ∗ (F ) = µ∗ (F ∩ E) + µ∗ (F ∩ E c ). These two quantities satisfy: (i) A∗ is a σ–algebra and µ∗ is a measure on A∗ . It follows easily from the deﬁnition that E1 ⊂ E2 implies µ∗ (E1 ) ≤ µ∗ (E2 ) and that . 0] ∪ (1. If µ is a probability measure then F (x) = µ(−∞. limx↓−∞ F (x) = 0. (iii) A ⊂ A∗ and µ∗ (E) = µ(E). 3). (ii) E = [−1/2. We begin the proof of (i)–(iii) with a simple but very useful observation. 0 ≤ x < 2    9. if E ⊂ A.2. −1 ≤ x < 0 F (x) =  2 + x2 . Let F be the distribution function deﬁned by  x < −1  0. 2).15 the measure we obtain is called the Lebesgue measure on Ω = (0. for all sets F ⊂ Ω.3. Find µ(E) for (i) E = {2}. Notice that µ(Ω) = 1. Problem 2.

Suppose E1 and E2 are in A∗ . We conclude that E1 ∪ E2 ∈ A∗ . Hence. Clearly by symmetry if E ∈ A∗ we have E c ∈ A∗ . . A∗ is an algebra. j=1 Therefore. to prove that E ∈ A∗ . for all F ∈ Ω. Let E = ∞ j=1 n Ej and An = j=1 Ej . we need to verify that µ∗ (F ) ≥ µ∗ (F ∩ E) + µ∗ (F ∩ E c ). Then for all F ⊂ Ω. where we used the fact that c c E1 ∪ E2 ⊂ (E1 ∩ E2 ) ∪ (E1 ∩ E2 ) ∪ (E1 ∩ E2 ) and the subadditivity of µ∗ observed above. c µ∗ (F ) = µ∗ (F ∩ E1 ) + µ∗ (F ∩ E1 ) c = (µ∗ (F ∩ E1 ∩ E2 ) + µ∗ (F ∩ E1 ∩ E2 )) c c c + (µ∗ (F ∩ E1 ∩ E2 ) + µ∗ (E ∩ E1 ∩ E2 )) ≥ µ∗ (F ∩ (E1 ∪ E2 )) + µ∗ (F ∩ (E1 ∪ E2 )c ). That is.16 E ⊂ ∪∞ Ej implies j=1 µ (E) ≤ ∗ ∞ µ∗ (Ej ). µ∗ (F ) ≤ µ∗ (F ∩ E) + µ∗ (F ∩ E c ) is always true. Now. suppose Ej ∈ A∗ are disjoint. Since En ∈ A∗ we have (applying the deﬁnition with the set F ∩ An ) c µ∗ (F ∩ An ) = µ∗ (F ∩ An ∩ En ) + µ∗ (F ∩ An ∩ En ) = µ∗ (F ∩ En ) + µ∗ (F ∩ An−1 ) = µ∗ (F ∩ En ) + µ(F ∩ En−1 ) + µ∗ (F ∩ An−2 ) n = j=1 µ∗ (F ∩ Ej ).

j=1 ≥ Let n → ∞ we ﬁnd that ∞ µ∗ (F ) ≥ j=1 ∗ µ∗ (F ∩ Ej ) + µ∗ (F ∩ E c ) ≥ µ (∪∞ (F ∩ Ej )) + µ∗ (F ∩ E c ) j=1 = µ∗ (F ∩ E) + µ∗ (F ∩ E c ) ≥ µ∗ (F ). E ∈ A∗ and we have proved (ii). we see that A∗ is a σ algebra and that µ∗ is a measure on it. Since any countable union can be written as the disjoint countable union. Clearly µ∗ (E) ≤ µ(E). For (iii). let E ∈ A. This proves (i). If we take F = E we obtain ∞ µ (E) = j=1 ∗ µ∗ (Ej ). which proves that E ∈ A∗ . Thus.17 Now. the measurability of An together with this gives µ∗ (F ) = µ∗ (F ∩ An ) + µ∗ (F ∩ Ac ) n n = j=1 n µ∗ (F ∩ Ej ) + µ∗ (F ∩ Ac ) n µ∗ (F ∩ Ej ) + µ∗ (F ∩ E c ). From this we conclude that A∗ is closed under countable disjoint unions and that µ∗ is countably additive. If µ∗ (E) = 0 and F ⊂ Ω. . then µ∗ (F ∩ E) + µ∗ (F ∩ E c ) = µ∗ (F ∩ E c ) ≤ µ∗ (F ).

With (i)–(iii) out of the way. choose ∞ Ej ∈ A with F ⊂ j=1 Ej and ∞ µ(Ej ) ≤ µ∗ (F ) + ε. For any ε > 0. the construction of the measure µ∗ clearly shows that whenever µ is ﬁnite or σ–ﬁnite. Next. it is clear how to deﬁne µ∗ . Since this holds for any countable covering of E by sets in A. for all E ∈ A. where Ej = E ∩ Ej i=1 Ej and these sets are disjoint and their union is E. Hence µ(E) = µ∗ (E). This is clearly a measure and it remains to prove that it is unique under the hypothesis of σ–ﬁniteness of µ. we have E = j=1 ˜ ˜ Ei . Since A ⊂ A∗ . First. let E ∈ A. we have µ(E) ≤ µ∗ (E). we have that E ∈ A∗ . σ(A) ⊂ A∗ . and A∗ is a σ–algebra. we have µ(E) = j=1 ∞ ∞ ˜ µ(Ej ) ≤ j=1 µ(Ej ). j=1 Using again the fact that µ is a measure on A. if E ⊂ j=1 Ei . This completes the proof of (iii). . Deﬁne µ(E) = µ∗ (E) for E ∈ σ(A).18 ∞ ∞ j−1 Next. Let F ⊂ Ω and assume µ∗ (F ) < ∞. ∞ µ (F ) + ε ≥ j=1 ∞ ∗ µ(Ej ) ∞ = j=1 ∗ µ(Ej ∩ E) + j=1 ∗ µ(Ej ∩ E c ) ≥ µ (F ∩ E) + µ (F ∩ E c ) and since ε > 0 is arbitrary. Since µ is a measure on A. so are the measure µ∗ and µ. Ei ∈ A.

˜ ˜ Let E ∈ σ(A) have ﬁnite µ∗ measure. E j ∈ A    . . we see that ˜ ∞ µ(E) ≤ ˜ j=1 ∞ µ(Ej ) ˜ µ(Ej ). ˜ Now let Ej ∈ A be such that E ⊂ ∪∞ Ej and j=1 ∞ µ(Ej ) ≤ µ∗ (E) + ε. j=1 ˜ ˜ Set E = ∪∞ Ej and En = ∪n Ej . we have ˜ µ∗ (E\E) ≤ ε. j=1 = This shows that µ(E) ≤ µ∗ (E). Then j=1 j=1 ˜ µ∗ (E) = lim k→∞ k→∞ = lim µ(En ) ˜ = µ(E) ˜ ˜ Since ˜ µ∗ (E) ≤ µ∗ (E) + ε. since µ(Ej ) = µ(Ej ).    ∞ ∞ µ∗ (E) = inf µ(Ej ): E ⊂ j=1 j=1 Ej . Since σ(A) ⊂ A∗ .19 Suppose there is another measure µ on σ(A) with µ(E) = µ(E) for all E ∈ A. However.

µ∗ ) is clearly complete. µ) is said to be complete if whenever E ∈ F and µ(E) = 0 then A ∈ F for all A ⊂ E. By (ii). µ∗ ) is the completion of (Ω. F ∗ . we can write any set E = ∪∞ (Ωj ∩ E) where the j=1 union is disjoint and each of these sets have ﬁnite µ∗ measure. ˜ ˜ Since ε > 0 is arbitrary. What is the diﬀerence between σ(A) and A∗ ? To properly answer this question we need the following Deﬁnition 2. A∗ . µ). A∗ . the uniqueness follows from what we have done for ˜ the ﬁnite case.4. the measure space (Ω. F. Using the fact that both µ and µ are measures. µ) is a measure space we deﬁne F ∗ = {E ∪ N : E ∈ F. Since µ∗ is σ–ﬁnite. Now. ˜ µ∗ (E) ≤ µ∗ (E) = µ(E) ˜ ˜ ≤ µ(E) + µ(E\E) ˜ ˜ ≤ µ(E) + ε. The measure space (Ω. The measure space (Ω.20 Hence. µ∗ ) is complete. F. The space (Ω. µ(E) = µ∗ (E) = µ∗ (E) for all E ∈ σ(A) of ﬁnite µ∗ measure. We can now answer the above question. and N ∈ F. Theorem 2. σ(A). F. We extend the measure µ to a measure on F ∗ by deﬁning µ∗ (E ∪ N ) = µ(E). if (Ω. µ(N ) = 0}. We leave the easy exercise to the reader to check that F ∗ is a σ–algebra. . This measure space is called the completion of (Ω.5. µ).

1.21 II INTEGRATION THEORY §1 Measurable Functions. µ) is σ–ﬁnite. f is measurable relative to F if {ω ∈ Ω : f (ω) > α} ∈ F for all α ∈ R.1. ∞}. This deﬁnition is equivalent to several others as seen by the following . we refer to measurable functions as random variables.F. This function is clearly measurable since  Ω  {x: 1A (ω) < α} = ∅  c  A 1≤α α<0 0 ≤ α < 1.1. In this section we will assume that the space (Ω. Let A ⊂ Ω be a measurable set. Remark 1. F) be a measurable space. Let f be an extended real valued function deﬁned on Ω. the function f is allowed to take values in {+∞. When we say that A ⊂ R is measurable we will always mean with respect to the Borel σ–algebra B as deﬁned in the last chapter. Deﬁnition 1. That is. When (Ω. The indicator function of this set is deﬁned by 1A (ω) = 1 0 if ω ∈ A else. F. We will say that the set A ⊂ Ω is measurable if A ∈ F. P ) is a probability space and f : Ω → R. Example 1. Let (Ω.

and complementations together with the following two identities. f2 ). Proof. (ii) {ω : f (ω) ≥ α} ∈ F for all α ∈ R.2. The following conditions are equivalent. Prove that f is measurable if and only if f −1 (E) = {ω : f (ω) ∈ E} ∈ F for every Borel set E ⊂ R. max(f1 .1. P ) be a probability space. f1 f2 . min(f1 .22 Proposition 1. ∞ {ω : f (ω) ≥ α} = n=1 {ω : f (ω) > α − 1 } n and ∞ {ω : f (ω) > α} = n=1 {ω : f (ω) ≥ α + 1 } n Problem 1. Let f be a measurable function on (Ω. (i) {ω : f (ω) > α} ∈ F for all α ∈ R. so are the functions f1 + f2 . Let f : Ω → R. F). F. (ii) {ω : f (ω) ≤ α} ∈ F for all α ∈ R. {ω : f (ω) < ∞}.2. Prove that the sets {ω : f (ω) = +∞}. (ii) With f as in (i) deﬁne µ on the Borel sets of R by µ(A) = P {ω ∈ Ω : f (ω) ∈ A}. (iii) {ω : f (ω) < α} ∈ F for all α ∈ R. Problem 1. Prove that µ is a probability measure on (R. and {ω : −∞ < f (ω) < ∞} are all measurable. Proposition 1. intersections. {ω : f (ω) = −∞}.1. These follow from the fact that σ–algebras are closed under countable unions. (i) Let (Ω. . If f and f2 are measurable. {ω : f (ω) > −∞}. f2 ) and cf1 . B). for any constant c.

ﬁrst observe that 2 {ω : f1 (ω) > α} = {ω : f1 (ω) > √ √ α} ∪ {ω : f1 (ω) < − α} 2 and hence f1 is measurable. . 2 gives the measurability of the product. F). Proposition 1. lim inf fn are measurable functions. Again. Clearly {inf fn < α} = ∪{fn < α} and {supn fn > α} = ∪{fn > α} and n hence both sets are measurable.3. In the same way. {ω : max(f1 (ω).23 Proof. For the sum note that {ω : f1 (ω) + f2 (ω)} = ({ω : f1 (ω) < r} ∩ {ω : f2 (ω) < α − r}) . f2 ). the result follows from the ﬁrst part. As for the product. then f = inf fn . The min(f1 . Let {fn } be a sequence of measurable functions on (Ω. But then writing f1 f2 = 1 2 2 (f1 + f2 )2 − f1 − f2 . f2 (ω)) > α} = {ω : f1 (ω) > α} ∪ {ω : f2 (ω) > α} gives the measurability of max(f1 . sup fn . lim sup fn . lim sup fn = inf n→∞ n sup fm m≥n and lim inf fn = sup n→∞ n m≥n inf fm . n n Proof. the fact that countable unions of measurable sets are measurable implies the measurability of the sum. where the union is taken over all the rational. Also. f2 ) follows from this by taking complements.

0). Proposition 1. (ii) |f |p . Prove that (i) f p . Problem 1. Let E = {ω ∈ Ω : lim fn (ω) exists}. p > 0. (iii) f + = max(f. ∞) is open for every α.5. Let f : Ω → R be measurable. In particular. 0) are all measurable functions. Prove that E is measurable. Suppose f is a measurable function. Let fn be a sequence of measurable functions.4.24 Problem 1.2. (i) Let Ω be a metric space and suppose the collection of all open sets are in the sigma algebra F. The sigma algebra generated by f is the sigma algebra in Ω1 generated by the collection {f −1 (A): A ∈ B}. (iv) f − = − min(f. p ≥ 1. (ii) Let ψ : R → R be continuous and f : Ω → R be measurable.4. a continuous function f : Rn → R is measurable relative to the Borel σ–algebra in Rn . Proof that f is measurable. Suppose f : Ω → R is continuous. Then f is measurable. Let fn be a sequence of measurable functions converging pointwise to the function f . This is denoted by σ(f ). .3. Proof. These both follows from the fact that for every continuous function f . Deﬁnition 1. Problem 1. {ω : f (ω) > α} = f −1 (α. Then ψ(f ) is measurable.

A function ϕ deﬁne on (Ω. Theorem 1. ϕn (ω) = i−1 2n and so. µ) is a simple function if ϕ(w) = n ai 1Ai where the Ai s are disjoint measurable sets which form a partition of Ω. ≤ f (ω) and ϕn (ω) → f (ω). If f (ω) < ∞. .25 Deﬁnition 1. n2n . Then f (ω) ∈ i i−1 2n . . ∞] be measurable. 2. Let ω ∈ Ω. if f (ω) = ∞ then ϕn (ω) = n for all n and we are done. . . Let f : Ω → [0. 2n 2 ϕn (ω) = i=1 i−1 1Ani (ω) + n1Fn (ω). . i=1 ( Ai = Ω) and the ai s are constants. . §2 The Integral: Deﬁnition and Basic Properties. F. . 2n clearly ϕn is a simple function and it satisﬁes ϕn (ω) ≤ ϕn+1 (ω) and ϕn (ω) ≤ f (ω) for all ω. f (x)) − ϕn (ω) < 2−n . ∞]) and deﬁne the simple functions n2n i−1 i . n2n . . 2. . By our deﬁnition. Fix ε > 0. .3. Fix n ≥ 1 and for i = 1. then pick n so large that 2−n < ε and f (ω) < n. deﬁne the measurable sets Ani = f −1 Set Fn = f −1 ([n. for every ω ∈ Ω. Proof. 2 n for some i = 1. n . There exists a sequence of simple functions {ϕn } on Ω with the property that 0 ≤ ϕ1 (ω) ≤ ϕ2 (ω) ≤ . Thus.1.

µ) be a measure space. (2. F.1. we deﬁne the integral of the function ϕ over the set E by n ϕ(w)dµ = E i=1 ai µ(Ai ∩ E). . If E = Ω we denote this collection of functions by L1 (µ).26 Deﬁnition 2.2) where the sup is over all simple functions ϕ with 0 ≤ ϕ ≤ f. n (i) If ϕ(w) = i=1 ai 1Ai is a simple function and E ∈ F is measurable. For example. we may have diﬀerent representations for the simple functions ϕ. It is clear from our deﬁnition of the integral that such representations lead to the same quantity and hence the integral is well deﬁned. (iv) If |f |dµ = E E f + dµ + E f − dµ < ∞ we say that the function f is integrable over the set E.) (ii) If f ≥ 0 is measurable we deﬁne the integral of f over the set E by f dµ = sup E ϕ E ϕdµ. Let (Ω. (2. and for the rest of these notes. (iii) If f is measurable and at least one of the quantities ﬁnite. that 0·∞ = 0. we deﬁne the integral of f over E to be f dµ = E E E f + dµ or E f − dµ is f + dµ − E f − dµ.1) (We adapt the convention here. Here are some basic and easy properties of the integral. We should remark here that since in our deﬁnition of simple functions we did not required the constants ai to be distinct. if A1 and A2 are two disjoint measurable sets then 1A1 ∪A2 and 1A1 + 1A2 both represents the same simple function.

. Let (Ω. µ(∅) = 0. F.27 Proposition 2. B (ii) If A ⊂ B and f ≥ 0. then A f dµ ≤ (iii) If c is a constant. This proves (i). (i) For E ∈ F deﬁne ν(E) = E ϕdµ. E (v) If µ(E) = 0. then E cf dµ = c f dµ. µ).1. Proposition 2. E = ν(E) = E n Ei . E (iv) f ≡ 0 on E. F. Let f and g be two measurable functions on (Ω. then E f dµ = 0 even if f (x) = ∞ on E. (vi) If f ≥ 0. µ) be a measure space. then E f dµ = Ω gfE f dµ.2. Then n ϕdµ = i=1 ∞ αi µ(Ai ∩ E) ∞ n = i=1 ∞ αi j=1 µ(Ai ∩ Ej ) = j=1 i=1 αi µ(Ai ∩ Ej ) = i=1 ν(Ei ). (i) If f ≤ g on E. By the deﬁnition of the integral. Let Ei ∈ F. (ii) Ω (ϕ + ψ)dµ = Ω ϕdµ + Ω ψdµ. f dµ. The ν is a measure on F. (ii) follows from (i) and we leave it to the reader. Suppose ϕ and ψ are simple functions. Proof. then f dµ = 0 even if µ(E) = ∞. then E f dµ ≤ E gdµ.

. Suppose {fn } is a sequence of measurable functions satisfying: (i) 0 ≤ f1 (ω) ≤ f2 (ω) ≤ .1 (Lebesgue Monotone Convergence Theorem). Set En = {ω: fn (ω) ≥ c ϕ(ω)}. for all n we see that if α = ∞. . Proof. Let 0 ≤ ϕ ≤ f be simple and let 0 < c < 1. Then αn is nondecreasing and it converges to α ∈ [0. E1 ⊂ E2 ⊂ . we have that . Fatou’s lemma and the Lebesgue Dominated Convergence theorem. Then fn dµ ↑ Ω f dµ. then Ω Ω f dµ = ∞ and we are done. . and (ii) fn (ω) ↑ f (ω). Set αn = Ω fn dµ. Since fn dµ ≤ Ω Ω f dµ. Assume f dµ < ∞. In addition. suppose ω ∈ Ω. we need to prove the opposite inequality. Theorem 2. If f (ω) = 0 then ϕ(ω) = 0 and ω ∈ E1 . . If f (ω) > 0 then cs(ω) < f (ω) and since fn (ω) ↑ f (ω). Since α≤ Ω f dµ. for every ω ∈ Ω. ∞].28 We now come to the “three” important limit theorems of integration: The Lebesgue Monotone Convergence Theorem. . for every ω ∈ Ω. Clearly.

Corollary 2.2 above and Proposition 2. Chapter fn dµ ≥ Ω En fn dµ ϕ(ω)dµ En ≥c = cν(En ).1. Let {fn } be a sequence of nonnegative measurable functions and set f= n=1 ∞ fn (ω). ∞ Then f dµ = Ω fn dµ. or in our notation of Proposition 2.29 ω ∈ En some n. ΩE n→∞ fn dµ ≥ cν(Ω) = c Ω Ω ϕdµ for all simple ϕ ≤ f and sup ϕ≤f Ω ϕdµ ≤ α.1. α ≥ lim and therefore. n=1 Ω Proof. Hence I. ϕdµ ≤ α. . En ↑ Ω.1 of Chapter I. Let n ↑ ∞. Hence En = Ω. Apply Theorem 2.1 to the sequence of functions n gn = j=1 fj . proving the desired inequality. By Proposition 2.

Then ∞ f (j)dµ(j) = Ω j=1 aj . Then ∞ f (ω)dµ = Ω n=1 ∞ Ω 1An dµ µ(An ) < ∞.o.1 and proposition 2. . 2. Let aij ≥ 0 for all i.2 (First Borel–Contelli Lemma). However.1 we have Corollary 2. Then ∞ ∞ ∞ ∞ aij = i=1 j=1 j=1 i=1 aij . That is. This proves the corollary. ∞ Proof. The above theorem together with theorem 1. i. 3.} = 0. Let µ be the counting measure on Ω = {1. j.30 Corollary 2. Suppose ∞ µ(An ) < ∞. n=1 Then µ{An . the set A where f (ω) = ∞ has µ–measure 0. Let {An } be a sequence of measurable sets. f (ω) < ∞ for almost every ω ∈ Ω. Let f (ω) = n=1 1An (ω) .2 gives . } and deﬁne the measurable functions f by f (j) = aj where aj is a sequence of nonnegative constants.3. From this and Theorem 2. n=1 = Thus. . f (ω) = ∞ if and only if ω ∈ An for inﬁnitely many n. .

1 gives lim inf fn dµ = lim gn dµ gn dµ Ω fn dµ. Then lim inf fn dµ ≤ lim inf Ω Ω fn dµ. Let f be a nonnegative measurable function. Then ν is a measure and gdν = Ω Ω gf dµ for all nonnegative measurable functions g. Ω n→∞ Ω n n→∞ = lim ≤ lim inf n→∞ Ω fn dµ. Set gn (ω) = inf fm (ω). Let {fn } be a sequence of nonnegative measurable functions. This proves the theorem. Proof.2 (Fatou’s Lemma). . .4. m≥n Then {gn } is a sequence of nonnegative measurable functions satisfying the hypothesis of Theorem 2. n = 1.1.31 Corollary 2. Theorem 2. 2. Since lim gn (ω) = lim inf fn (ω) n→∞ n→∞ and gn dµ ≤ Theorem 2. Deﬁne ν(E) = E f dµ. . .

Since |f (ω)| = lim |fn (ω)| ≤ g(ω) we see that f ∈ L1 (µ). Let {fn } be a sequence of measurable functions such that fn (ω) → f (ω) for every ω ∈ Ω. . Take α = sign(β) so that αβ = |β|.3 (The Lebesgue Dominated Convergence Theorem ). Proof. Since |fn − f | ≤ n→∞ 2g. Then f ∈ L1 (µ) and lim |fn − f |dµ = 0. Then f dµ ≤ Ω Ω |f |dµ. Then f dµ = |β| Ω =α Ω f dµ αf dµ = Ω ≤ Ω |f |dµ. n→∞ lim fn dµ = Ω Ω f dµ. 0 ≤ 2g − |fn − f | and Fatou’s Lemma gives 0≤ Ω 2gdµ ≤ lim = Ω n→∞ 2gdµ + lim − Ω Ω |fn − f |dµ 2gdµ − lim Ω |fn − f |dµ. Let f be a measurable function. We assume the right hand side is ﬁnite. Proof. Theorem 2. Set β = Ω f dµ.2. Suppose there is a g ∈ L1 (µ) with |fn (ω)| ≤ g(ω). Ω n→∞ In particular.32 Proposition 2.

3 (Chebyshev’s Inequality).e. F. λp µ{ω ∈ E: f (ω) > λ} = {ω∈E:f (ω)>λ} λp dµ f p dµ E ≤ E f p dµ ≤ which proves the proposition.2. .4. The second part follows from the ﬁrst. we say that fn → f almost everywhere if fn (ω) → f (ω) for all ω ∈ Ω except for a set of measure zero. if there exists a measurable subset N ⊂ E such that P holds for all E\N and µ(N ) = 0. . µ). In the same way. For example. µ) be a measure space. E Proof. F. Fix 0 < p < ∞ and let f be a nonnegative measurable function on (Ω. Then for any measurable set e we have µ{ω ∈ E: ∈: f (ω) > λ} ≤ 1 λp f p dµ: .33 It follows from this that lim Ω |fn − f |dµ = 0. Since |fn dµ − Ω Ω f dµ ≤ Ω |fn − f |dµ. Proposition 2. and write this as i. Let (Ω. Proposition 2. Let P be a property which a point ω ∈ Ω may or may not have. the ﬁrst part follows. Deﬁnition 2. We say that P holds almost everywhere on E. f = 0 almost everywhere if f (ω) = 0 except for a set of measure zero.

(ii) Suppose f ∈ L1 (µ) and E f dµ = 0 for all measurable sets E ⊂ Ω. a. a. For (ii).e. Then f = 0 a. which proves (i). Observe that ∞ {ω ∈ E: f (ω) > 0} = n=1 {ω ∈ Ω: f (ω) > 1/n}. Then f + dµ = E E f dµ = 0. set E = {ω: f (ω) ≥ 0} = {ω: f (ω) = f + (ω)}. Suppose f dµ = 0. Proof. By Proposition 2. But then f dµ = − E E f − dµ = 0 and this again gives f − dµ = 0 E which implies that f − = 0. E Then f = 0 a. . Therefore. µ{f (ω) > 0} = 0.34 (i) Let f be a nonnegative measurable function. on E. µ{ω ∈ E: f (ω) > 1/n} ≤ n E f dµ = 0.e. on Ω.3.e.e. which by (i) implies that f + = 0.

µ) be a probability space. b) = R is permitted) is convex if ψ((1 − λ)x + λy) ≤ (1 − λ)ψ(x) + λψ(y). Let (x) = c1 x + c2 be the equation of the supporting line of the convex function ψ at the point (t. Let (Ω. satisﬁes (t) = ψ(t) and . F.3) ψ Ω f dµ ≤ Ω ψ(f )dµ. This “easy” to see geometrically but the proof is not as trivial. The measurability of the function ψ(f ) follows from the continuity of ψ and the measurability of f using Proposition 1. The ψ(f ) is measurable and (2. An important property of convex functions is that they are always continuous. Prove that (2.5 (Jensen’s Inequality).3. ψ(t)). Let f ∈ L1 (µ) and a < f (ω) < b. Proposition 2. Since a < f (ω) < b for all ω ∈ Ω and µ is a probability measure.3) is equivalent to the following statement: For all a < s < t < u < b. ϕ(u) − ϕ(t) ϕ(t) − ϕ(s) ≤ . t−s u−t and conclude that a diﬀerentiable function is convex if and only if its derivative is a nondecreasing function. then a < t < b. What follows easily from the deﬁnition is Problem 2. b).35 Deﬁnition 2. Suppose ψ is convex on (a. Proof. That is. for all 0 ≤ λ ≤ 1. we see that if t= Ω f dµ. b) → R (the “interval” (a. The function ψ: (a.4.1.

(i) Let ϕ(x) = ex .36 ψ(x) ≥ (x) for all x ∈ (a. + exn }. (ii) If Ω = {1. ψ(f (ω)) ≥ c1 f (ω) + c2 = (f (ω)).2. . . n} with the measure µ deﬁned by µ{i} = 1/n and the function f given by f (i) = xi . + xn )} ≤ {ex1 + . The for all ω ∈ Ω. αn be a sequence of positive numbers with α1 + · · · + αn = 1 and let y1 . · · · . (y1 . n n Setting yi = exi we obtain the Geometric mean inequality. f dµ ≤ Ω ef dµ. . 2. Examples. extend this example in the following way.1. Problem 2. . That is. . Prove that α α y1 1 · · · yn n ≤ α1 y1 + · · · + αn yn . · · · . . Integrating this inequality and using the fact that µ(Ω) = 1. . Then exp Ω =ψ Ω f (ω)dµ . Let α1 . . . The existence of such a line follows from Problem 2. . + yn ) . we obtain 1 1 exp{ (x1 + x2 + . . . yn )1/n ≤ 1 (y1 + . n More generally. . yn be positive numbers. b). we have ψ(f (ω))dµ ≥ c1 Ω Ω f (ω)dµ + c2 f (ω)dµ Ω = which is the desired inequality. . .

∞] =. < ∞. If E = ∅. we have |f g(ω)| ≤ g ∞ |f (ω)|. Let 0 < p < ∞ and set 1/p f We say that f ∈ Lp (µ) if f p p = Ω |f | dµ p . Let f ∈ Lp (µ) and g ∈ lq (µ). Since ∞ −1 ∞ f and µf −1 ( f ∞ ( f ∞. If p = 1 and q = ∞. ∞] = n=1 f −1 ∞ f ∞ + 1 . or q = 1 and p = ∞. If E = ∅. q = 2. F.37 Deﬁnition 2. deﬁne f ∞ = inf E. < ∞. Then f g ∈ L1 (µ) and |f g|dµ ≤ f p g q. If p = 1 we take q = ∞. (i) (H¨lder’s inequality) Let 1 ≤ p ≤ ∞ and let q be its conjugate exponent. Theorem 2. Let 1/p 1/q A= Ω f dµ p and B = g dµ q . That o is.4. The function < ∞. deﬁne f ∈ L∞ (µ) if f Suppose f ∞ f ∞ = ∞.4. Then f +g p ≤ f p + g p. 1 p + 1 q = 1. (ii) (Minkowski’s inequality) Let 1 ≤ p ≤ ∞. The quantity f is called the essential supremum of f . To deﬁne L∞ (µ) we set E = {m ∈ R+ : µ{ω: |f (ω)| > m} = 0}. Let (Ω. Also note that when p = 2. µ) be a measure space. we see f ∈ E. This immediately gives the result when p = 1 or p = ∞. ∞ n ∞ 1 + n . Proof. Assume 1 < p < ∞ and (without loss of generality) that both f and g are nonnegative.

Then F p dµ = Ω Ω Gp dµ = 1 and by Problem 2. Next. x ∈ R+ is convex. As before. p q Integrating both sides of this inequality gives F (ω)G(ω) ≤ 1 1 + = 1. (f + g)p = (f + g)(f + g)p−1 = f (f + g)p−1 + g(f + g)p−1 together with H¨lder’s inequality and the fact that q(p − 1) = p. For (ii). Put F = f /A > 0. . gives o 1/p 1/q f (f + g)p−1 dµ ≤ Ω Ω f p dµ Ω 1/p (f + g)q(p−1) dµ 1/q = Ω f dµ Ω p (f + g) dµ p . Assume therefore that 1 < p < ∞. G = g/B. we have f +g 2 This gives (f + g)p dµ ≤ 2−(p−1) Ω Ω p ≤ 1 p 1 p f + g . we may assume that both f and G are nonnegative. We start by observing that since the function ψ(x) = xp . p q which implies the (i) after multiplying by A · B.38 If A = 0. Assume 0 < A < ∞ and same for B. f + g ∈ Lp (dµ). 2 2 f p dµ + 2−(p−1) Ω g p dµ. then f = 0 almost everywhere and if B = 0. then g = 0 almost everywhere and in either case the result follows. Thus. the cases p = 1 and p = ∞ are again clear.2 F (ω) · G(ω) ≤ 1 p 1 q F + G .

h ∈ Lp (µ). . for all f. Theorem 2. Let gk be a sequence of functions in Lp and (0 < p < ∞) satisfying gk − gh+1 Then {gk } converges a. 1 ≤ p ≤ ∞. Set Ak = {ω: |gk (ω) − gk+1 (ω)| > 2−k }. 2. Since f + g ∈ Lp (dµ).1. Lp (µ). µ{An } ≤ 2kp ≤ 1 4 1 = kp 2 |gk − gk+1 |p dµ Ω kp p ≤ 1 4 k . By Chebyshev’s inequality. . we may divide by the last expression in brackets to obtain the desired inequality. For 1 ≤ p ≤ ∞. . Minkowski’s inequality shows that this function satisﬁes the triangle inequality. Adding these inequalities we obtain 1/p 1/p 1/q (f + g) dµ ≤ Ω Ω p f dµ p + Ω g dµ Ω p (f + g) dµ p . d(f.e. g). That is.39 In the same way. ·). Proof. h) + d(h. is complete with respect to d(· . g. Lemma 2. ·).4. k = 1. 1/p 1/q g(f + g) dµ ≤ Ω Ω p g dµ Ω p (f + g) dµ p . It follows that Lp (µ) is a metric space with respect to d(·. g) = f − g p = f −h+h−g ≤ f −h p p p + h−g = d(f. · 2kp . g) = f − g p .. For f. g ∈ Lp (µ) deﬁne d(f.

for almost all ω ∈ {An i. The for each k = 1. there is a nk such that fn − fm p 1 ≤ ( )k 4 for all n.40 This shows that µ(An ) < ∞. by Lemma 2. Thus. there is a set Ak so such that µ(Ak ) = 0 for every ω ∈ Ac .e. Let A = ∪Ak . Then given ε > 0 there is an N such that for all n > N and ω ∈ Ac . It follows from this that {gk (ω)} is Cauchy in R and hence { gk (ω)} converges.2. For the converse. We need to show that f ∈ Lp and that it is the Lp (µ) limit of {fn }. Take N so large that . Thus. suppose fn → f uniformly on Ac and µ(A) = 0. Proof of Theorem 2.. Now. That is.1. for all k > N . fm ) = fn − fm < ε for all n. m ≥ N . m > N . Suppose fn − f ∞ → 0. Then µ(A) = 0 and k fn → f uniformly on Ac . Thus.o. . suppose {fn } is Cauchy in Lp (µ). We proof the ﬁrst statement. By Corollary 2. leaving the second to the second to the reader. . . Then for each k > 1 there is an n > n(k) suﬃciently ∞ 1 k large so that fn − f and |fn (ω) − f (ω)| < < 1 k. 2. |fn (ω)−f (ω)| < ε. The sequence of functions {fn } converges to f in L∞ if and only if there is a set measurable set A with µ(A) = 0 such that fn → f uniformly on Ac . Also. there is a N such that for all n.} = 0. µ{An i.o. .4. Proof. the sequence {fn } is Cauchy in L∞ if and only if there is a measurable set A with µ(A) = 0 such that {fn } is uniformly Cauchy in Ac . This is the same as saying that fn −f ∞ < ε for all n > N . Lemma 2. Assume 1 ≤ p < ∞. d(fn . fnk (ω) → f a. Let ε > 0.}c there is an N = N (ω) such that |gk (ω) − gk+1 (ω)| ≤ 2−k . given any ε > 0.2. m ≥ nk .

Let {fn } be Cauchy in L∞ . Therefore the sequence {fn } converges uniformly on Ac to a function f . by Lemma 2. f p = fm − fm − f p ≤ fm p + fm − f p. |fn (ω) − fm (ω)| < ε. Then by the pointwise convergence of the subsequence and by Fatou’s Lemma we have |f − fm |p dµ = Ω Ω k→∞ lim |fnk − fm |p dµ |fnk − fm |p Ω ≤ lim inf <ε . Then fn converges to f in L∞ (µ) and f ∈ L∞ (µ). .41 fn −fm p < ε for all n. In the course of proving Theorem 2. But then. There is a set A with µ(A) = 0 such that fn is uniformly Cauchy on Ac . Then there exists a subsequence {fnk } with fnk → f a.e. Fix such an m. suppose p = ∞. m ≥ N . m > N and all ω ∈ Ac . as k → ∞. Deﬁne f (ω) = 0 for ω ∈ A. Therefore fn → f in Lp (µ) and k→∞ p |f − fm |p dµ < ∞ Ω for m suﬃciently large. That is. Let fn ∈ Lp (µ) with 1 ≤ p < ∞ and fn → f in Lp (µ).2. Now. given ε > 0 there is an N such that for all n. 1 ≤ p < ∞ converges in Lp (µ). then there is a subsequence which converges a. Corollary 2.e.4 we proved that if a sequence of functions in Lp .5. The following Proposition will be useful later. which shows that f ∈ Lp (µ). This result is of suﬃcient importance that we list it here as a corollary.

Suppose the statement is false. F.}. n→∞ lim µ{ω ∈ Ω: |fn (ω) − f (ω)| ≥ ε} = 0. Let f ∈ L1 (µ). Also. Then n=1 j=n the Borel–Cantelli Lemma. n→∞ |f |dµ En This is a contradiction since µ(A) = 0 and therefore the integral of any function over this set must be zero. An+1 ⊂ An for all n and since ν(E) = E |f |dµ is a ﬁnite measure.42 Proposition 2. µ(A) = 0. §3 Types of convergence for measurable functions. we have |f |dµ = lim A n→∞ |f |dµ An ≥ lim ≥ ε. Deﬁnition 3. Given ε > 0 there exists a δ > 0 such that |f |dµ < ε whenever µ(E) < δ. Let {fn } be a sequence of measurable functions on (Ω.1.6. Then we can ﬁnd an ε > 0 and a sequence of measurable sets {En } with |f |dµ ≥ ε En and µ(En ) < 1 2n µ(En ) < ∞ and by Let An = ∪∞ Ej and A = ∩∞ An = {En i. . µ). (i) fn → f in measure if for all ε > 0.o. (ii) fn → f almost uniformly if given ε > 0 there is a set E ∈ F with µ(E) < ε such that fn → f uniformly on E c . E Proof.

for each k take Ak ∈ F with µ(Ak ) < 1 k and fn → f uniformly on Ac . proving that fn → f in measure. Next. Suppose fn → f in Lp . That is. To see this simply observe that 1 1 εp |fn − f |p dµ. . k 1 k If E = ∪∞ Ac . for all n ≥ N . for all n ≥ N . n→∞ lim µ{|fn (ω) − f (ω)| ≥ η} = 0. µ{|fn (ω) − f (ω)| ≥ η} < ε.43 Proposition 3. fn → f in measure.2. given ε > 0 there is a measurable set E such that µ(E) < ε and fn → f uniformly on E c . Since ε > 0 was arbitrary we see that for all η > 0. By Chebyshev’s inequality µ{|fn − f | ≥ ε} ≤ and the result follows. Ω fn and that fn ∞ n p = 0 |fn (x)|p dx = 1 np e →∞ n = e → ∞. Then. Suppose fn → f almost uniformly.1. Hence. Let 1 en 0 ≤ ω ≤ n fn (ω) = 0 else Then fn → 0 in measure but fn → 0 in Lp (µ) for any 0 < p ≤ ∞. as n → ∞. then fn → f on E and µ(E c ) = µ (∩∞ Ak ) ≤ µ(Ak ) < k=1 k k=1 for all k. Let {fn } be measurable and 0 < p < ∞. Proposition 3. Let η > 0 be given.1. Since fn → f almost uniformly. Then fn → f in measure and almost everywhere. Thus µ(E c ) = 0 and we have the almost everywhere convergence as well. Proof. Let Ω = [0. There is a N = N (η) such that |fn (ω) − f (ω)| < η for all n ≥ N and for all ω ∈ E c . {µ: |fn (ω) − f (ω)| ≥ η} ⊆ E. Example 3. Proof. 1] with the Lebesgue measure.

set En = ∪∞ Ak . Since µ{|fn − fm | ≥ ε} ≤ µ{|fn − f | ≥ ε/2} + µ{|fn − f | ≥ ε/2}. n + 1. µ{An i. .e. . For the uniqueness. there is an N = N (ω) such that |gk+1 (ω) − gk (ω)| < 1 2k for all k ≥ N . If 1 2k ω ∈ En . This implies that the sequence of real numbers {gk (ω)} is Cauchy and hence it converges to g(ω). k=1 By the Borel–Cantelli Lemma. This completes the proof. . n + 2.o. take nk such that nk+1 > nk and µ{|fn (ω) − fm (ω)| ≥ 1 1 }≤ k k 2 2 1 } 2k for all n.} = 0. then |gk (ω) − gk+1 (ω)| < c for all k ∈ {n. Then there is a subsequence {fnk } which converges almost uniformly to f . we see that µ{fn − fm | ≥ ε} → 0 as n and m → ∞. Setting gk = fnk and Ak = {ω ∈ Ω: |gk+1 (ω) − gk (ω)| ≥ see that ∞ we µ(Ak ) < ∞. Since we also have fnk → g almost uniformly clearly. }. Thus gk → g uniformly on En . However. Corollary 2.2.e.o. . To get the almost uniform convergence. Thus gk → g a. m ≥ nk .44 Proposition 3.}. suppose fn → f in measure.3. Then µ(En ) ≤ k=n ∞ k=n µ(Ak ) and this can be made smaller than ε as soon as n is large enough. For each k. Then fnk → f in measure also. fnk → g in measure and hence f = g a. Suppose fn → f in measure. Proof. for every ω ∈ {An i.

to f . For each k there is a n(k) such that if Ak = n=n(k) ∞ {ω ∈ Ω: |fn − f | ≥ 1 }. For measurable functions we have the following result. Let εk be a sequence converging down to 0. Proof. Proof. F.45 Theorem 3. Proposition 3.3. The sequence of measurable functions {fn } on (Ω.e. < δ. Thus if ∞ A= k=1 ∞ Ak . k then µ(Ak ) ≤ ε/2k . Let ε > 0 be given.1 (Egoroﬀ ’s Theorem). Then fn → f almost uniformly. F. ∞ 1 . |fn (ω) − f (ω)| < Ac .2 below.e. 1 < δ and then for k then µ(A) ≤ k=1 µ(Ak ) < ε. µ) converges to f in measure if and only if every subsequence {fnk } contains a further subsequence converging a. Suppose (Ω. as n → ∞ for each k. We use Problem 3. if δ > 0 take k so large that 1 k any n > n(k) and ω ∈ A. 2k µ{|fnk − f | > εk } < ∞ k=1 . µ) is a ﬁnite measure space and that fn → f a. We therefore have a subsequence fnk satisfying µ{|fnk − f | > εk } ≤ Hence. Now. Thus fn → f uniformly on Let us recall that if {yn } is a sequence a sequence of real numbers then yn converges to y if and only if every subsequence {ynk has a further subsequence {ynkj } which converges to y. Then µ{|fn − f | > εk } → 0.

F.46 and therefore by the ﬁrst Borel–Cantelli Lemma. Therefore {yn } converges to 0 and hence That is fn → 0 in measure. Prove that fn → f a.e. That is..e.e.e. in measure and in Lp (µ) but that fn → 0 almost uniformly. Let (Ω. (i) Give an example of a sequence of nonnegative measurable functions fn for which we have strict inequality in Fatou’s Lemma. n→∞ n→∞ (converges) Suppose fn is a sequence of nonnegative measurable functions on (Ω. F.e. Let Ω = [0. f1 (ω) ≥ f2 (ω) ≥ · · · ≥ 0 and .o. µ) be a ﬁnite measure space.2.3. Thus |fnk − f | < εk eventually a. If every fnk subsequence has a subsequence fnkj such that fnkj → f a. let ε > 0 and put yn = µ{|fn − f | > ε} and consider the subsequence ynk .} = 0. Then {ynk } has a subsequence. For the converse.1. ynkj → 0. Prove that fn → 0 a. ∞) with the Lebesgue measure and deﬁne fn (ω) = 1 1An (ω) where An = {ω ∈ Ω: n ≤ ω ≤ n + n }. Problem 3. µ{|fnk − f | > εk i. if and only if for all ε > 0 ∞ n→∞ lim µ k=n Ak (ε) =0 where Ak (ε) = {ω ∈ Ω: |fk (ω) − f (ω)| ≥ ε}. Problem 3. (ii) Let (Ω. Problem 3. µ) be a measure space and {An } be a sequence of measurable sets. F. µ) which is pointwise decreasing to f . ∞ ∞ Recall that lim inf An = n n=1 k=n Ak and prove that µ{lim inf An } ≤ lim inf µ{An }. Thus fnk → f a.

n→∞ Problem 3.7. f1 . F. Let (Ω. is right continuous and nonincreasing.47 fn (ω) → f (ω). Prove that p→0 lim f p = exp{ Ω log |f |dP } where exp{−∞} is deﬁned to be zero. Prove that lim sup n→∞ |fn | ≤ 1. F. λn a. Let (Ω.5. f2 are nonnegative measurable and λ1 .e.6. Prove that the function µ{|f | > λ}. . for λ > 0. µ) be a ﬁnite measure space.e. Let (Ω. Problem 3. µ{f > (λ1 + λ2 )λ} ≤ µ{f1 > λ} + µ{f2 > λ}. on Ω to f . Problem 3. Let {fn } be a nondecreasing sequence of measurable nonnegative functions converging a. λ2 are positive numbers with the property that f ≤ λ1 f1 + λ2 f2 . if f.4. F. then for all λ > 0. P ) be a probability space and suppose f ∈ L1 (P ). Prove that lim µ{fn > λ} = µ{f > λ}. µ) be a measure space and suppose {fn } is a sequence of measurable functions satisfying ∞ µ{|fn | > λn } < ∞ n=1 for some sequence of real numbers λn . Is it true that lim fn dµ = Ω Ω n→∞ f dµ? Problem 3. Furthermore.

} = 0 Problem 3.o} = 0 (converges) Suppose the functions are nonnegative. as a ﬁnite number.e. . and compute lim xn f (x)dx.9. µ) be a ﬁnite measure space. Prove that xn f ∈ L1 (Ω) for every n = 1. (i) Prove that fn converges to f a. if and only if for all ε > 0 µ{|fn | > ε. .e. if and only if µ{f > 1} = 0. i. . F. F. Problem 3. 2. µ) be a ﬁnite measure space and f a nonnegative real valued measurable function on Ω. for some n > n ≥ m} = 0 m→∞ (ii) Prove that fn → 0 a. if and only if for all M > 0 µ{fn < M. Ω n→∞ Problem 3.10. 1] with its Lebesgue measure. Prove that fn → ∞ a. Suppose f ∈ L1 (µ). Let (Ω.48 Problem 3.8. Prove that lim f n dµ Ω n→∞ exists.o. Let (Ω. Suppose f ∈ L1 (Ω). i. Prove that lim f dµ = 0 {|f |>n} n→∞ . Let Ω = [0. if and only if for any ε > 0 lim µ{|fn − fn | > ε.e. Let {fn } be a sequence of measurable functions on this space.11.

P ) be a probability space. Prove that p→0 lim f p = exp{ Ω log |f |dP } where exp{−∞} is deﬁned to be zero. Let (Ω. Let E be the set of all x ∈ Ω such that f (x) is an integer. Let (Ω. P ) be a probability space. Prove that lim Ω n→∞ |f |n+1 dP |f |n dP Ω = f ∞ Problem 3. Suppose f ∈ L∞ (P ) and f ∞ > 0. on Ω. µ) be a ﬁnite measure space and f a measurable function on this space. Suppose f and g are positive measurable function such that f g ≥ 1 a.12. Let Ω = [0. Problem 3.49 Problem 3.16. Let F be a bounded uniformly continuous function on R. P ) be a probability space and fn be a sequence of measurable functions converging to zero in measure. Prove that lim F (fn )dP = F (0) Ω n→∞ . P ) be a probability space and suppose f ∈ L1 (P ). Let (Ω. Let (Ω. F. 1] and let (Ω. F. F.14. Prove that the set E is measurable and that lim (cos(πf (x))2n dµ = µ(E) Ω n→∞ Problem 3.e. Prove that f gdP ≥ 1 Ω Problem 3.15.13. F. F.

fn be measurable functions. µ) be a measure space and let fn be a sequence of measurable functions satisfying fn p ≤ n p . f2 . Let (Ω.20. (i) Suppose F : R → R is a continuous function and fn → f in measure. F. P ) is a probability space and that f ∈ L1 (P ) in nonnegative. Then f dµ ≤ lim Ω Ω fn dµ. Let (Ω. F. Problem 3.17. Prove that the 1 1 sequence { n fn } converges to zero almost everywhere. F. P ) be a probability space. Prove that 1 n 1 fj (x) dµ(x) ≤ n j=1 p n n |fj (x)|p dµ(x) Ω j=1 Ω and Ω 1 n n p  fj (x) dµ(x) ≤  1 n n p fj p  j=1 j=1 Problem 3. for 2 < p < ∞. µ) be a measure space and let f1 . Let (Ω. Suppose 1 < p < ∞. (ii) Suppose |fn | ≤ g where g ∈ L1 (µ) and fn → f in measure. Then f dµ = lim Ω Ω fn dµ.19. F.50 Proclaim 3. · · · . (i) If fn ≥ 0 and fn → f in measure. Problem 3. Suppose (Ω. Prove that 1+ f 2 1 ≤ Ω 1 + f 2 dP ≤ 1 + f 1 .18. Prove that F (fn ) → F (f ) in measure.

21.22. 4 2 . Prove that η 1 2 (1 − η) ≤ P {ω ∈ Ω : |f (ω)| ≥ }. justifying all your steps. Problem 3.51 Problem 3. Let probtrip be a probability space. n n→∞ lim 1− 0 x n n ex/2 dx. Let f be a measurable function with the property that f 2 = 1 and f 1 1 = 2 . Compute.

.1. If A ⊂ X.1. where the Ri are disjoint measurable rectangles. obvious. We shall denote by A×B the σ–algebra generated by the measurable rectangle which is the same as the σ–algebra generated by the elementary sets. However. E is closed under complementation and ﬁnite unions. in some sense. we need to be properly state them and carefully prove them so that they may be freely used in the subsequent Chapters. Suppose (X. If X and Y are any two sets. (X. B ∈ B. ∪ Rn . Many of the deﬁnitions and properties of product measures are.52 III PRODUCT MEASURES Our goal in this chapter is to present the essentials of integration in product space. their Cartesian product X × Y is the set of all order pairs {(x. §1 Deﬁnitions and Preliminaries. That is. We begin by deﬁning the product measure. A measurable rectangle is a set of the form A × B. is called an elementary sets. y ∈ Y }. . . Deﬁnition 1. B ⊂ Y. B) are measurable spaces. y): x ∈ X. Exercise 1. A×B ⊂ X×Y is called a rectangle. We denote this collection by E. Prove that the elementary sets form an algebra. A). A set of the form Q = R1 ∪ . A ∈ A.

We show Ω is a σ–algebra containing all measurable rectangles. then Ex ∈ B and E y ∈ A for all x ∈ X and y ∈ Y . let Ω be the collection of all sets E ∈ A × B for which Ex ∈ B for every x ∈ X. Thus. We shall only prove that if E ∈ A × B then Ex ∈ B. We shall . observe that Ex = ∞ i=1 (Ei )x A is a σ algebras shows that E ∈ Ω. where (Ei )x ∈ B. the fact that For (iii). This follows from the fact that (E c )x = (Ex )c .1. Proof. To see this. Let E ⊂ X × Y and deﬁne the projections Ex = {y ∈ Y : (x. In the case when we have several σ–algebras it will be important to clearly distinguish measurability relative to each one of these sigma algebras. deﬁne fx : Y → R by fx (y) = f (x. We next show that the projections of measurable functions are measurable. (i)–(iii) show that Ω is a σ–algebra and the theorem follows. and E y = {x ∈ X: (x. For ﬁx x ∈ X. the case of E y being completely the same. Let f : X × Y → R. y) with a similar deﬁnition for f y . and that A is a σ–algebras. E ⊂ Ω. For this. Once again. If E ∈ A × B. The collection Ω also has the following properties: (i) X × Y ∈ Ω. ∞ (iii) If Ei ∈ Ω then E = i=1 Ei ∈ Ω. y) ∈ E}.53 Theorem 1. note that if E =A×B then Ex = B ∅ if x ∈ A if x ∈ A. y) ∈ E}. (ii) If E ∈ Ω then E c ∈ Ω.

That is: (i) If A1 ⊂ A2 ⊂ .1.2. then ∩Bi ∈ M. then ∪Ai ∈ M (ii) If B1 ⊃ B2 ⊃ . and Bi ∈ M. By Exercise 1.1 that Qx ∈ B and hence fx ∈ σ(B). First we . It is enough to show that F ⊂ M0 . Suppose f ∈ σ(A × B). The same argument proves (ii). y): f (x.1 (Monotone Class Theorem).54 use the notation f ∈ σ(F) to mean that the function f is measurable relative to the σ–algebra F. Lemma 1. That is. Since f ∈ σ(A × B). Let F0 be an algebra of subsets of X and let M be a monotone class containing F0 . f y ∈ σ(A) −1 Proof. we only need to prove that M0 is an algebra. −1 Qx = fx (V ) = {y: fx (y) ∈ V }. Proof. y) ∈ V }. Q ∈ F × G. and Ai ∈ M. If F denotes the σ–algebra generated by F0 then F ⊂ M. Theorem 1. . We need to show that fx (V ) ∈ B. fx ∈ σ(B) (ii) For each y ∈ X. . . M0 is the intersection of all the monotone classes which contain F0 . and it follows by Theorem 1. Then (i) For each x ∈ X. A monotone class M is a collection of sets which is closed under increasing unions and decreasing intersections.2. Deﬁnition 1. Put Q = f −1 (V ) = {(x. . Let M0 be the smallest monotone class containing F0 . Let V be an open set in R. However.

µ) and (Y. Again Ω2 is a monotone class. F ∈ M0 then E ∪ F ∈ M0 .55 prove that M0 is closed under complementation.2. M0 ⊂ Ω and this proves it. Thus. Exercise 1. Since M0 ∈ Ω1 .3. If ϕ(x) = ν(Qx ) then ϕ ∈ σ(A) and ϕ(x)dµ(x) = X Y and ψ(y) = µ(Qy ). if E. Prove that they agree on the σ–algebra F generated by F0 . Lemma 2.1. Prove that an algebra F is a σ–algebra if and only if it is a monotone class. if E ∈ F0 then E ∈ Ω. Again the fact that M0 is a monotone class implies that Ω1 is also a monotone class and since clearly F0 ⊂ Ω1 . if E ∈ M0 . We begin this section with a lemma that will allow us to deﬁne the product of two measures. then E ∪ F ∈ M0 . (2. Let F ∈ F0 . B. Exercise 1. It follows from the fact that M0 is a monotone class that Ω is also a monotone class and since F0 is an algebra. ν) be two σ–ﬁnite measure spaces. This shows that M0 is an algebra and completes the proof. Thus F0 ⊂ Ω2 and hence M0 ⊂ Ω2 . let Ω1 = {E : E ∪ F ∈ M0 for all F ∈ F0 }. Thus. For this let Ω = {E : E c ∈ M0 }. Let (X. A.1) . and ψ ∈ σ(B) ψ(y)dν(y). we have M0 ⊂ Ω1 . Let F0 be an algebra and suppose the two measures µ1 and µ2 agree on F0 . Next. Suppose Q ∈ A × B. §2 Fubini’s Theorem. Deﬁne Ω2 = {F : F ∪ E ∈ M0 for all E ∈ M0 }.

2) and µ(Qy ) = X χQ (x. Lemma 2. y)dν(y) (2.3) Thus (2. With the notation of §1 we can write ν(Qx ) = Y χQ (x. where the second to the last equality follows from the Monotone Convergence Theorem. (2.1) is equivalent to χQ (x. Recalling that (∪Qj )x = ∪(Qj )x and using the fact that ν is a measure we have  (µ × ν)  j=1 ∞  Qj  = X  ν  ∞   Qj   dµ(x) j=1  = X ∞ ∞ x  ν  j=1 Qjx  dµ(x) = X j=1 ∞ ν(Qjx )dµ(x) ν(Qjx )dµ(x) j=1 ∞ X = = j=1 (µ × ν)(Qj ). y)dµ(x)dν(y).4) To see that this is indeed a measure let {Qj } be a disjoint sequence of sets in A × B.1 allows us to deﬁne a new measure µ × ν on A × B by (µ × ν)(Q) = X ν(Qx )dµ(x) = Y µ(Qy )dν(y). y)dν(y)dµ(x) = X Y Y X χQ (x.56 Remark 2.1. y)dµ(x).2. (2. Remark 2. .

That is. let  ϕn (x) = ν((Qn )x ) = ν  j=1 n   Qj   x . By Exercise 1. ∞ ϕ(x)dµ(x) = µ(A)ν(B) X ϕ(y)dν(y) = µ(A)ν(B). Integrating we obtain that            proving (i). this will show that M = F × G. This will be done in several stages. We assume µ(X) < ∞ and ν(Y ) < ∞. = χA (x)ν(B) and clearly ϕ ∈ σ(A). We will prove that M is a monotone class which contains the elementary sets. Let M be the collection of all Q ∈ A × B for which the conclusion of the Lemma is true. A ∈ A. To prove (i) observe that Qx = Thus ϕ(x) = ν(B) B ∅ if x ∈ A if x ∈ A.57 Proof of Lemma 2. B ∈ B. .1. Similarly. . ψ(y) = 1B (y)µ(A) ∈ B.1 and the Monotone class Theorem. E ⊂ M. Y (ii) Let Q1 ⊂ Q2 ⊂ . Qj ∈ M. Then Q ∈ M. To prove this. First we prove that rectangles are in M. (i) Let Q = A × B. . Then Q = j=1 Qj ∈ M. if x ∈ A 0 if x ∈ A.

we have ϕ ∈ σ(A) and ψ ∈ σ(B). ∞ (iv) Let {Qi } ∈ M with Qi ∩ Qj = ∅. This proves the Lemma for ﬁnite measure and the following exercise does the rest. It follows from (i)–(iv) that M is a monotone class containing the elementary sets E. . j=1 Then ϕn (x) ↑ ϕ(x) = ν(Qx ) and ψn (x) ↑ ψ(x) = µ(Qx ). respectively. By Monotone Convergence Theorem. . ˜ For the proof of this. Also by assumption ϕn (x)dµ(x) = X Y ψn (y)dν(y). Since ϕn ∈ σ(A) and ψn ∈ σ(B). That is. ϕ(x)dµ(x) = X Y ϕ(y)dν(y) and we have proved (ii). for all n.58 and  ψn (y) = µ(Qy ) = µ  n n y  Qj   . ψn (y) = µ(Qy ) n are both decreasing to ϕ(x) = ν(Qx ) and ψ(y) = µ(Qy ). this time the sequences ϕn (x) = ν((Qn )x ). By the Monotone Class Theorem and Exercise 1. Then Qn ∈ M. ˜ However. and since since both measures are ﬁnite. the Qn s are increasing and it follows from (ii) that their union is in M. . Then Q = j=1 Qj ∈ M. proving (iv).1. . ∞ (iii) Let Q1 ⊃ Q2 ⊃ . both sequences of functions are uniformly bounded. Then j=1 Qi ∈ M. A×B = σ(E) = M. Qj ∈ M. let Qn = n i=1 ˜ Qi . since the sets are disjoint. The proof of this is the same as (ii) except this time we use the Dominated Convergence Theorem.

(c) If f ∈ L1 (µ × ν). Q ∈ A×B. B. y) ∈ X × Y . (2. Let 0 ≤ s1 ≤ .1 (Fubini’s Theorem).5) then ϕ ∈ σ(A). y) ↑ f (x. the result follows from Lemma 2. Theorem 2. . x ∈ X.6) and ϕ∗ (x)dµ(x) < ∞ X then f ∈ L1 (µ × ν). y ∈ Y . y)|dν(y) < ∞ (2. By linearity we also have the result for simple functions. be nonnegative simple functions such that sn (x. Let (X. then fx ∈ L1 (ν) for a.1. Let f ∈ σ(A × B). A. Let ϕn (x) = Y (sn )x (y)dν(y) . the functions deﬁned in (2. µ) and (Y.5) are measurable and (2. f y ∈ L1 (µ) for a.1. (a) (Tonelli) If f is nonnegative and if ϕ(x) = X fx (y)dν(y).1 to the case of σ–ﬁnite measures. y) for every (x. ν) be σ–ﬁnite measure spaces.6) holds. Extend the proof of Lemma 2. . Proof of (a).e.e.6) holds. and (2. ψ ∈ σ(B) and ϕ(x)dµ(x) = X X×Y f (x. ψ(y) = X f y (x)dµ(x). A similarly statement holds for y in place ofx. If f = χQ . y)d(µ × ν) = Y ψ(y)dν(y). (b) If f is complex valued such that ϕ∗ (x) = Y |f (y)|x dν(y) = X |f (x.59 Exercise 2.

n Then ϕn (x)dµ(x) = X X×Y sn (x. y)d(µ × λ) ψn (y)dµ(y). 1] with µ = the Lebesgue measure and ν = the counting measure. n n . This can be seen as follows: Set Ij = and Qn = (I1 × I1 ) ∪ (I2 × I2 ) ∪ . y)dν(y) = 1. y) ↑ f (x. y) = 0 if x = y. ϕn (x) ↑ ϕ(x) and ψn (y) ↑ ψ(y).2. ∪ (In × In ). and hence also f . Y = Since sn (x. Then Qn is measurable and so is Q = ∩Qn . y)dµ(x) = 0. X = Y = [0. Example 2. Example 2.1. . f (x. 1). however. Then f (x. y) ∈ X × Y . we need to verify that it (and hence its projections) is (are) measurable. Consider the function f (x.60 and ψn (y) = X sy (x)dµ(x). y) for every (x. Let f (x. Remark 2. j−1 j . The Monotone Convergence Theorem implies the result. y) = x2 − y 2 (x2 + y 2 )2 on (0.1. Parts (b) and (c) follow directly from (a) and we leave these as exercises. 1) × (0. Before we can integrate the function f in this example. X and Y f (x. the function f is the characteristic function of the diagonal of the square. . y) = 1 if x = y. That is. The assumption of σ–ﬁniteness is needed as the following example shows.

1) × (0. F. However. Then 1 0 0 1 f (x. 1)} since / 1 |f (x. if mk (E) = 0 then E is Lebesgue measurable. This new space is called the completion of (X. µ∗ ) is complete. The next Theorem says that as far as Fubini’s theorem is concerned. µ). What is needed here is the notion of the completion of a measure. Thus m2 = m1 ×m1 .61 with the µ = ν = Lebesgue measure. . 0 Let mk = Lebesgue measure in Rk and recall that mk is complete. for any set B ⊂ R. y)|dy ≥ 1/2x.3. The measure space (X. m1 × m1 is not complete since {x}×B.2. The problem here is that f ∈ L1 {(0. F. y)dxdy = −π/2. Then F ∗ is a σ–algebra and the function µ∗ deﬁned on F ∗ by µ∗ (E) = µ(A) is a measure. µ) is a measure space we let F ∗ = {E ⊂ X: ∃ A and B ∈ F. m∗ . the completion of the product Lebesgue measures. Then mn = (mr × mj )∗ . If (X. We leave the proof of the ﬁrst two Theorems as exercises. Let mn be the Lebesgue measure on Rn . That is. we need not worry about incomplete measure spaces. Theorem 2. A ⊂ E ⊂ B and µ(B\A) = 0}. y)dydx = π/2 but 1 0 0 1 f (x. has m1 ×m1 – measure zero. n = r + s. Theorem 2.

Then for almost every x ∈ X with respect to µ. For (ii) let Ω = {(x. with respect to µ. B. Theorems 2. ν) is complete. Now.e. suppose that f = χE where E ∈ A∗ . (ii) Let (X. µ).1. µ) be a measure space. with respect to µ and we have proved (i) for characteristic function. ˜ Then Ω ∈ (A × B)∗ and (µ × ν)(Ω) = 0. Let us assume (i) and (ii) for the moment. By deﬁnition there is an Ω ∈ A × B such ˜ ˜ that Ω ⊂ Ω and µ × ν(Ω) = 0.62 Theorem 2. A. fx = 0 a. By Theorem 2. y) : f (x. In particular. µ) and (Y. fx ∈ σ(B) for almost every x ∈ X.1 remains valid if µ×ν is replaced by (µ×ν)∗ except that the functions ϕ and ψ are deﬁned only almost everywhere relative to the measures µ and ν. details left to the reader.e. Proof. Then if f ∈ σ((A × B)∗ ) is nonnegative there is a g ∈ σ(A × B) such that f = g a. y) = 0}. We now extend this to simple functions and to nonnegative functions in the usual way.1 to g and the rest follows from (ii). ν) be two complete and σ–ﬁnite measure spaces. we see that Ωx ∈ B for almost every x ∈ X with respect . B. A similar statement holds with y replacing x.4. with respect to ν.e. A. ν) be two complete σ–ﬁnite measure spaces. (Y. respectively. Suppose f ∈ σ((A × B)∗ ) is such that f = 0 almost everywhere with respect to µ × ν. The proof of this theorem follows from the following two facts. By deﬁnition A ⊂ E ⊂ B with µ(A\B) = 0 and A and B ∈ A. Since Ωx ⊂ Ωx and the space (Y. For (i). (i) Let (X. B. ˜ ν(Ωx )dµ(x) = 0 X ˜ ˜ and so ν(Ωx ) = 0 for almost every x with respect to µ. If we set g = χA we have f = g a. It remains to prove (i) and (ii). apply Theorem 2. Suppose f ∈ σ(F ∗ ). with respect to µ × ν. Let (X.e. F. Then there is a g ∈ σ(F) such that f = g a.

Exercise 2. Exercise 2. µ) be a measure space.7) for all y > 0 and that 0 ∞ π sin(αx) dx = sign(α) x 2 1 − cos(αx) π dx = |α|.4. Let (X. F. Prove that f (x)p dµ ≤ Cp X X g(x)p dµ for any 0 < p < ∞ for which both integrals are ﬁnite where Cp is a constant depending on C and p. For any α ∈ R deﬁne sign(α) = Prove that y      − 1. 1.8) and 0 ∞ (2. µ).3. 0. Suppose f and g are two nonnegative functions satisfying the following inequality: There exists a constant C such that for all ε > 0 and λ > 0. Thus for almost every x ∈ X the projection function fx ∈ B and fx (y) = 0 almost everywhere with respect to µ. This completes the proof of (ii) and hence the theorem. 2 x 2 (2.9) . Let f be a nonnegative measurable function on (X. ∞ f (x) dµ(x) = p X 0 p λp−1 µ{x ∈ X : f (x) > λ}dλ. Prove that for any 0 < p < ∞. F.5.63 to the measure µ. Exercise 2. µ{x ∈ X : f (x) > 2λ. α>0 α=0 α<0 π 0 0 ≤ sign(α) 0 sin(αx) dx ≤ x sin(x) dx x (2. g(x) ≤ ελ} ≤ Cε2 µ{x ∈ X : f (x) > λ}.

Notice that with this deﬁnition the surface area ωn−1 of the sphere in Rn satisﬁes ωn−1 = nγn = 2π 2 /Γ( n ) where γn is the volume of the unit ball in Rn .10) for all α > 0. Let S n−1 = {x ∈ Rn : |x| = 1} and for any Borel set E ∈ S n−1 ˜ ˜ set E = {rθ : 0 < r < 1. that is. Prove that for any x ∈ Rn and any 0 < p < ∞ |ξ · x|p dσ(ξ) = |x|p S n−1 S n−1 |ξ1 |p dσ(ξ) where ξ · x = ξ1 x1 + · · · + ξn xn is the inner product in Rn .64 Exercise 2. e−t −α2 /4t √ e dt. to prove that 1 e−α = √ π ∞ 0 ∞ e−(1+s 0 2 )t dt. f (x) = f (|x|). then 2π 2 f (x)dx = Γ( n ) Rn 2 n ∞ ∞ rn−1 f (r)dr = nγn 0 0 rn−1 f (r)dr. if f is a radial function. Exercise 2.7. ∞ n f (x)dx = Rn 0 rn−1 S n−1 f (rθ)dσ(θ) dr. . Deﬁne the measure σ on S n−1 by σ(A) = n|E|. Prove that e−α = 2 π ∞ 0 cos αs ds 1 + s2 (2.8. the fact that 1 = 1 + s2 and Fubini’s theorem.6. Prove 2 (integration in polar coordinates) that for all nonnegative Borel functions f on Rn . Use (2. t (2. In particular.10). θ ∈ A}.11) Exercise 2.

by ﬁrst integrating over Lθ = {ξ ∈ S n−1 : e1 · ξ = cos θ}.65 Exercise 2.13) . 0) and for any ξ ∈ S n−1 deﬁne θ. π |ξ1 |p dσ(ξ) = ωn−1 S n−1 0 | cos θ|p (sin θ)n−2 dθ. (2. . π 2 2 0 (cos θ)2r−1 (sin θ)2s−1 dθ = Γ(s)Γ(r) Γ(r + s) ([Ru1.12) and the fact that for any r > 0 and s > 0. . 0. Let e1 = (1. that for any 1 ≤ p < ∞.9. 0 ≤ θ ≤ π such that e1 · ξ = cos θ. . p. Prove.12) Use (2. 194]) to prove that for any 1 ≤ p < ∞ |ξ1 | dσ(ξ) = S n−1 p 2π n−1 2 Γ( p+1 ) 2 Γ( n+p ) 2 (2. .

66 IV RANDOM VARABLES §1 Some Basics. (Ω. as we saw in Chapter I. (ii) lim FX (x) = 1. µ) will denote a probability space. We list some additional properties of this distribution function given the fact that µX (R) = 1 and since it arises from the random variable X. We will use the notation E(X) = Ω XdP. is a probability measure on (R. X : Ω → R is a random variable if X is measurable relative to F. or expectation of X. B(R)). the n µX (−∞. x] = P {X ≤ x} = FX (x) deﬁnes a distribution function. E(X) is called the expected value of X. then µ(A) = µX (A) = P {X ∈ A}. From this point on. A ∈ B(R). we see that FX (x−) = P (X < x). Y are equally distributed if µX = µX . This measure is called the distribution measure of the random variable X. x→∞ x→−∞ d (iii) With FX (x−) = lim FX (y). x]. . (i) FX (b) = FX (a) = µ(a. lim FX (x) = 0. If we take the set A = (−∞. y↑x (iv) P {X = x} = µX {x} = FX (x) − FX (x−). b]. F. This is often written as X = Y or X ∼ Y . for any x ∈ R. We recall from Problem – that if X is a random variable. Two random variables X.

1. there exists > 0 such that F (x + ) < ω0 . That is. P ) and a random variable X deﬁned on this space such that F = FX . Since F is right continuous. as we have just seen. F.1) Clearly then X is measurable and also P {X(ω) ≤ x} = F (x). proving that F = FX . 1). Proof. To prove (1.1) let ω0 ∈ {ω ∈ Ω : ω ≤ F (x)}. suppose ω0 > F (x). if and only if µX {x} = 0. As we saw in Chapter I. That is. F = Borel sets and P the Lebesgue measure.2. µ) with Ω = (0.2) . Hence X(ω) ≥ x + > x. Suppose we can show that for each x ∈ R. F. (1. Then G(X(ω))d(ω) = E(G(X)) = Ω R G(y)dµX (y).67 It follows from (iv) that F is continuous at x ∈ R if and only if x is not an atom of the measure µ. Also. B). Suppose X is a random variable and let G : R → R be Borel measurable. {ω ∈ Ω : X(ω) ≤ x} = {ω ∈ Ω : ω ≤ F (x)}. / Theorem 1. For each ω ∈ Ω. Suppose F is a distribution function. This shows that ω0 ∈ {ω ∈ Ω : X{ω} ≤ x} and concludes the proof. We claim this is the desired random variable. The following theorem completes this circle. On the other hand. Then x ∈ {y : F (y) < ω0 } and therefore X(ω0 ) ≤ x. We take (Ω. Suppose in addition that G is nonnegative or that E|G(X)| < ∞. ω0 ≤ F (x). every random variable gives rise to a distribution function. deﬁne X(ω) = sup{y : F (y) < ω}. Then there is a probability space (Ω. Theorem 1. (1. Thus {ω ∈ Ω : ω ≤ F (x)} ⊂ {ω ∈ / Ω : X(ω) ≤ x}. distribution functions are in a one-to-one correspondence with the probability measures in (R.

More generally let X1 . . suppose G is nonnegative. . . µn is then a Borel probability measure on (Rn . The case and the variance is deﬁned by var(X) = E|X − m|2 Note that by expanding this quantity we can write var(X) = EX 2 − 2(EX)2 + (EX)2 = EX 2 − (EX)2 . it holds for simple functions. . then E(G(X1 (ω). Now . E(G(X(ω)) = R G(x)dµX (x). . Apply the result for nonnegative G to G+ and G− and subtract the two using the fact that E(G(X)) < ∞. By linearity. . If E|G(X)| < ∞ write G(X(ω)) = G+ (X(ω)) − G− (X(ω)). X2 (ω). . .68 Proof. . Let B ⊂ B(R). . . Xn (ω))) = Rn G(x1 . . X2 . The quantity EX p . Xn ) ∈ A}. if G : Rn → R is Borel measurable nonnegative or E(G(X1 . . . x2 . . B). . . Xn be n-random variables and deﬁne their join distribution by µn (A) = P {(X1 . A ∈ B(Rn ). Xn )) < ∞. As before. . for 1 ≤ p < ∞ is called the p–th moment of the random variable X. By the Monotone Convergence Theorem. Then E(1B (X(ω)) = P∗ {X ∈ B} = µX (B) = B dµX = R 1B (y)dµX (y). . . X2 . . Let ϕn be a sequence of nonnegative simple functions converging pointwise up to G. . Thus the result holds for indicator functions. . xn ). X2 . . xn )dµn (x1 .

suppose f is a nonnegative borel measurable function in R with f (x)dx = 1 R where here and for the rest of these notes we will simply write dx in place of dm when m is the Lebesgue measure. R = Now. P ) then µ(A) = A f dP deﬁnes a new measure on (Ω. F) and g dµ = Ω Ω gf dP.69 If we take the function G(x) = xp then we can write the p–th moments in terms of the distribution as EX p = R xp dµX and with G(x) = (x = m)2 we can write the variance as var(X) = R (x − m)2 dµ x2 dµX − m2 . b] = a f (t)dt = F (b) − F (a). . A ∈ B(R) then µ is a probability measure and since b µ(a. Then x F (x) = −∞ f (t)dt is a distribution function. Hence if µ(A) = A f dt. (1. F.3) In particular. recall that if f is a nonnegative measurable function on (Ω.

70 for all interval (a. 1) x ∈ (0. We shall now give several classical examples of such distributions. Let λ > 0 and set f (x) = λe−λx . 1).1.2. f (x) = Then 1 0 x ∈ (0. ∞ EX k = λ 0 xk e−λx dx = k! λk . The exponential distribution of parameter λ. x≥1 If we take a random variable with this distribution we ﬁnd that the variance var(X) = 1 12 and that the mean m = 1 . 0 x≥0 else If X is a random variable with associated to this density. (1. The function f is called the density of the random variable associated with the distribution. The Uniform distribution on (0. Then by (1. b] we see that µ ∼ F (by the construction in Chapter I). Example 1.2.3) and Theorem 1. 1) /  0  F (x) = x   1 x≤0 0≤x≤1. E(g(X)) = R g(x)dµ(x) = R g(x)f (x)dx. Let X be a random variable with this distribution function. 2 Example 1.4) Distributions arising from such f s are called absolutely continuous distributions. we write X ∼ exp(λ).

Example 1.71 Example 1. . If we take σ > 0 and µ ∈ R. and set f (x) = 1 (2πσ 2 ) e −(x−µ)2 2σ 2 we get the normal distribution with mean µ and variance σ and write X ∼ N (µ. σ). Set x2 1 f (x) = √ e− 2 . To compute the variance let us recall ﬁrst that for any α > 0. ∞ Γ(α) = 0 tα−1 e−t dt. We write X ∼ N (0. 1 E(X) = √ 2π xe− R x2 2 dx = 0. 1) By symmetry.3. 2π The corresponding random variable is the normal distribution.3. The Cauchy distribution of parameter a. For this we have Ex = µ and var(X) = σ 2 . Set f (x) = 1 a π a2 + x2 We leave it to the reader to verify that if the random variable X has this distribution then E|X| = ∞. The normal distribution. We note that x e R 2 −x 2 2 ∞ dx = 2 u 2 e−u dx 0 0 √ √ √ √ 3 1 k 2 1 = 2 2Γ( ) = 2 1Γ( + 1) = Γ( ) = 2π 2 2 k 2 x e dx = 2 2 2 −x 2 2 √ ∞ 1 and hence var(X) = 1.

EX 2 = 12 · p = p and var(X) = p − p2 = p(1 − p). . 2.5.72 Example 1. P (X = 1) = p and P (X = 0) = 1 − p For this random variable we have EX = p · 1 + (1 − p) · 0 = p.” Here are some examples. The gamma distribution arises from f (x) = 1 α α−1 −λx e Γ(α) λ x x≥0 0. . 0 < p < 1.6. For this random variable. We write X ∼ Γ(α. . 1. . λ) when the random variable X has this density. Example 1. if X takes only two values one with probability p and the other with probability 1 − p. k! . x < 0. X is a Bernoulli random variable with parameter p. We say X has Poisson distribution of parameter λ if P {X = k} = e−λ λk k! k = 0.4. Random variables which take only discrete values are appropriately called “discrete random variables. ∞ EX = k=0 e−λ λk k = λe−λ k! ∞ ∞ k=1 λk−1 =λ (k − 1)! and Var(X) = EX − λ = 2 2 k=0 k 2 e−λ λk − λ2 = λ. Example 1.

F2 . . . P {X1 ∈ B1 . ∞ (1 − p)k = k=1 1 p and we leave it to the reader to verify that EN = and var(N ) = §2 Independence. .7. . . 2. Xn ∈ Bn } = P { ∩ (Xj ∈ Bj )} = j=1 j=1 n n P {Xj ∈ Bj }. . Fn of σ–algebras is said to be independent if whenever A1 ∈ Fj . . . for k = 1. (i) The collection F1 . . (iii) The collection of measurable subsets A1 . Deﬁnition 2. . n} we have     P (Aj ) = P {Aj }   j∈I j∈I . . . An in a σ–algebra F is independent if for any subset I ⊂ {1. A2 ∈ F2 . A2 . . . p2 P ∩n Aj j=1 = j=1 P (Aj ). . . For 0 < p < 1 deﬁne P {X = k} = p(1 − p)k−1 . . . (ii) A collection {Xj : 1 ≤ j ≤ n} of random variables is said to be (totally) independent if for any {Bj : 1 ≤ j ≤ n} of Borel sets in R. 2. By the geometric series. .73 Example 1.1. The random variable represents the number of independent trial needed to observe an event with probability p. . An ∈ Fn . then n 1 p 1−p . The geometric distribution of parameter p. . . X1 ∈ B2 .

. . Let A1 . Thus for this probability measure on (Rn . . denoted σ(X). (Recall that the sigma algebra generated by the random X. we say that the random variables are identically distributed and write this as i. 1A3 are independent. We note that (iii) is equivalent to asking that the random variables 1A1 . . µ (B1 × · · · × Bn ) = i=1 n µj (Bj ) and hence µn = µ1 × · · · × µn where the right hand side is the product measure constructed from µ1 . . . As we did earlier. . Then with B = B1 × · · · × Bn we see that n B ∈ B(Rn ). . For the other direction the reader / is asked to do Problem 2. 1An are independent. x2 . . A2 . for one direction we take Bj = {1} for j ∈ I and Bj = R for j ∈ I. . . . the corresponding n-dimensional distribution is n F (x) = j=1 FXj (xj ). Ac . .1. B). . . . .d. X2 . µn as in Chapter III. . is the sigma algebra generated by the sets X −1 {B} where B ranges over all Borel sets in R. . Xn } are independent and set µn (B) = P {X1 . . Xn } of independent random variables with the same distribution.2. .i. 1A2 . where x = (x1 . . .) Prove that X. xn )). 1A2 . Y are independent if and only if F1 . . Indeed. . . Xn ) ∈ B} as in §1.74 Whenever we have a sequence {X1 . Proof that Ac . . Problem 2. An be independent. Let X and Y be two random variable and set F1 = σ(X) and F2 = σ(Y ). Suppose {X1 . . . . . . F2 are independent. . . . . Ac and n 1 2 1A1 .

75

Deﬁnition 2.2. Suppose A ⊂ F. A is a π-system if it is closed under intersections: A, B ∈ A ⇒ A ∩ B ∈ A. The subcollection L ⊂ F is a λ-system if (i) Ω ∈ L, (ii) A, B ∈ L and A ⊂ B ⇒ B\A ∈ L and (iii) An ∈ L and An ↑ A ⇒ A ∈ L. Theorem 2.1. Suppose A is a π-system and L is a λ-system and A ⊂ L. Then σ(A) ⊂ L. Theorem 2.2. Let µ and ν be two probability measures on (Ω, F). Suppose they agree on the π-system A and that there is a sequence of sets An ∈ A with An ↑ Ω. Then ν = µ on σ(A) Theorem 2.3. Suppose A1 , A2 , . . . , An are independent and π-systems. Then σ(A2 ), σ(A2 ), . . . , σ(An ) are independent. Corollary 2.1. The random variables X1 , X2 , . . . , Xn are independent if and only if for all x = {x1 , . . . , xn }, xi ∈ (−∞, ∞].
n

F (x) =
j=1

FXj (xi ),

(2.1)

where F is the distribution function of the measure µn . Proof. We have already seen that if the random variables are independent then the distribution function F satisﬁes 2.1. For the other direction let x ∈ Rn and set Ai be the sets of the form {Xi ≤ xi }. Then {Xi ≤ xi } ∩ {Xi ≤ yi } = {Xi ≤ xi ∧ yi } ∈ Ai . Therefore the collection Ai is a π-system. σ(Ai ) = σ(X). Corollary 2.2. µn = µ1 × · · · × µn .

76

Corollary 2.3. Let X1 , . . . , Xn with Xi ≥ 0 or E|Xi | < ∞ be independent. Then  
n n

E
j=1

Xi  =
i=1

E(Xi ).

Proof. Applying Fubini’s Theorem with f (x1 , . . . xn ) = x1 · · · xn we have (x1 · · · xn )d(µ1 × · · · × µn ) =
Rn R

x1 dµ1 (a)· · ·
Rn

xn dµn (a).

It follows as in the proof of Corollary 1.3 that if X1 , . . . , Xn are independent and g ≥ 0 or if E|
n j=1

g(Xi )| < ∞, then
n n

E
i=1

g(Xi )

=
i=1

E(g(Xi )).

We warn the reader not to make any inferences in the opposite direction. It may happen that E(XY ) = (E(X)E(Y ) and yet X and Y may not be independent. Take the two random variables X and Y with joint distributions given by X\Y 1 0 −1 1 0 b 0 0 a c a −1 0 b 0

with 2a + 2b + c = 1, a, b, c > 0. Then XY = 0 and E(X)E(Y ) = 0. Also by symmetry, EX = EY = 0. However, the random variables are not independent. Why? Well, observe that P (X = 1, Y = 1) = 0 and that P (X = 1) = P (X = 1, Y = 1) = ab = 0. Deﬁnition 2.2. If F and G are two distribution functions we deﬁne their convolution by F ∗ G(z) =
R

F (z − y)dµ(y)

77

where µ is the probability measure associated with G. The right hand side is often also written as F (z − y)dG(y).
R

In this notes we will use both notations. Theorem 2.4. If X and Y are independent with X ∼ FX , Y ∼ GY , then X +Y ∼ F ∗ G. Proof. Let Let us ﬁx z ∈ R. Deﬁne h(x, y) = 1(x+y≤z) (x, y) Then FX+Y (z) = P {X + Y ≤ z} = E(h(X, Y )) =
R2

h(x, y)d(µX × µY )(x, y) h(x, y)dµX (x) dµY (y)
R R

= =
R R

1−∞,z−y) (x) dµX (x) dµY (y) µX (−∞, z − y)dµY (y) =
R R

=

F (z − y)dG(y).

Corollary 2.4. Suppose X has a density f and Y ∼ G, and X and Y are independent. Then X + Y has density h(x) =
R

f (x − y)dG(y).

If both X and Y have densities with g denoting the density of Y . Then h(x) = f (x − y)g(y)dy.

78

Proof. FX+Y (z) =
R z−y

F (z − y)dG(y) f (x)dxdG(y)
R −∞ z

= =
R −∞ z

f (u − y)dudG(y) f (u − y)dG(y)du
−∞ z R

= =
−∞

f (u − y)g(y)dy du,
R

which completes the proof. Problem 2.1. Let X ∼ Γ(α, λ) and Y ∼ Γ(β, λ). Prove that X +Y ∼ Γ(α+β, λ). §3 Construction of independent random variables. In the previous section we have given various properties of independent random variables. However, we have not yet discussed their existence. If we are given a ﬁnite sequence F1 , . . . Fn of distribution functions, it is easy to construct independent random variables with this distributions. To do this, let Ω = Rn and F = B(Rn ). Let P be the measure on this space such that
n

P ((a1 , b1 ] × · · · × (an , bn ]) =
j=1

(Fj (bj ) − Fj (aj )).

Deﬁne the random variables Xj : Ω → R by Xj (ω) = ωj , where ω = (ω1 , . . . , ωn ). Then for any xj ∈ R, P (Xj ≤ xj ) = P (R × · · · × (−∞, xj ] × R × · · · × R) = Fj (xj ). Thus Xj ∼ Fj . Clearly, these random variables are independent by Corollary 2.1. It is, however, extremely important to know that we can do this for inﬁnitely many distributions.

79

Theorem 3.1. Let {Fj } be a ﬁnite or inﬁnite sequence of distribution functions. Then there exists a probability space (Ω, F, P ) and a sequence of independent random variables Xj on this space with Xj ∼ Fj . Let N = {1, 2, . . . } and let RN be the space of inﬁnite sequences of real numbers. That is, RN = {ω = (ω1 , ω2 , . . . ) : ωi ∈ R}. Let B(RN ) be the σalgebra on RN generated by the ﬁnite dimensional sets. That is, sets of the form {ω ∈ RN : ωi ∈ Bi , 1 ≤ i ≤ n}, Bi ∈ B(R). Theorem 3.2 (Kolmogovov’s Extension Theorem). Suppose we are given probability measures µn on (Rn , B(Rn )) which are consistent. That is, µn+1 (a1 , b1 ] × · · · × (an , bn ] × R) = µn (a1 , b1 ] × · · · × (an , bn ]. Then there exists a probability measure P on (RN , B(RN )) such that P {ω : ωi ∈ (ai , bi ], 1 ≤ i ≤ n} = µn ((a1 , b1 ] × · · · × (an , bn ]).

The Means above µn are consistent. Now, deﬁne X j : RN → R by Xj (ω) = ωj . Then {Xj } are independent under the extension measure and Xj ∼ Fj . A diﬀerent way of constructing independent random variables is the following, at least Bernoulli random variables, is as follows. Consider Ω = (0, 1] and recall that for x ∈ (0, 1) we can write

x=
n=1

n 2n

where

n

is either 0 or 1. (This representation in actually unique except for x the

80

Problem 3.1. Deﬁne Xn (x) = variables is independent.

n.

Prove that the sequence {Xn } of random

Problem 3.2. Let {An } be a sequence of independent sets. Prove that
∞ ∞

P{
j=1

Aj } =
j=1

P {Aj }

and P{

Aj } = 1 −
j=1

(1 − P {Aj })

j=1

Problem 3.3. Let {X1 , . . . , Xn } be independent random variables with Xj ∼ Fj . Fin the distribution of the random variables max1≤j≤n Xj and min1≤j≤n Xj . Problem 3.4. Let {Xn } be independent random variables and {fn } be Borel measurable. Prove that the sequence of random variables {fn (Xn )} is independent. Problem 3.5. Suppose X and Y are independent random variables and that X + Y ∈ Lp (P ) for some 0 < p < ∞. Prove that both X and Y must also be in Lp (P ). Problem 3.6. The covariance of two random variables X and Y is deﬁned by Cov(X, Y ) = E[(X − EX)(Y − EY )] = E(XY ) − E(X)E(Y ). Prove that
n n

var(X1 + X2 + · · · + Xn ) =
j=1

var(Xj ) +
i,j=1,i=j

Cov(Xi , Xj )

and conclude that if the random variables are independent then
n

var(X1 + X2 + · · · + Xn ) =
j=1

var(Xj )

We can compute and ﬁnd that the probability of exactly j successes in n trials is P {Sn = j} = n P {any speciﬁc sequence of n trials with exactly j heads} j n j = p (1 − p)n−j j n j = p (1 − p)n−j j n! = pj (1 − p)n−j . the average number of successes in n trials. Then Sn n denotes the relative frequency of heads in n trials. Xi = 1 0 with probability p with probability 1 − p If we use 1 to denote success (=heads) and 0 to denote failure (=tails) and Sn . in the long run. We should expect. The precise statement of this is . Consider the sequence of independent random variables which arise from tossing a fair coin. or. for the number of successes in n -trials.81 V THE CLASSICAL LIMIT THEOREMS §1 Bernoulli Trials. Let us take p = 1/2 which represents a fair coin. j!(n − j)! This is called Bernoulli’s formula. for this to be 1/2. we can write n Sn = j=1 Xj .

1) .” or ‘Weak low of large numbers”). That is. That is. n 2 Let x ∈ [0.82 Theorem 1. and write this as a. write x= εn 2n n=1 ∞ with εn = 0 or 1.e. as n → ∞. First. Xn → Y a.s. m→∞ lim P {|Yn − Y | ≤ ε for all n ≥ m} = 1 (2. As n–increases. In addition. all numbers in [0.s. namely 1 . That is. recall that by Problem 3. iﬀ for any ε > 0. Xn (x) = εn and Sn is the partial sum of these random variables. If the convergence is in measure we shall say that Xn → X in probability.s. we have Sn (x) n → 1 2 a. the probability that the average number of successes deviates from 1 2 by more than any preassigned number tends to zero. Except for a set of Lebesgue measure zero. as n → ∞. if it converges a. The number x is a normal number if each digit occurs the “right” proportion of times. 1] are normal numbers. Xn → X if for all ε > 0. as deﬁned in the Chapter II. The rest of this chapter is devoted to proving various generalizations of these results.1 (Bernoulli’s “Law of averages.2 (Borel 1909). §2 L2 and Weak Laws.8 in Chapter II.s. P {|Xn − X| > ε} → 0 as n → ∞. That is. P {| Sn 1 − | > ε} → 0. We recall that Xn → X in Lp then Xn → X in probability and that there is a subsequence Xnk → X a. 1] and consider its dyadic representation.. 2 Theorem 1. to conform to the language of probability we shall say that a sequence of random variables Xn converges almost surely.

|Xn − X| ≤ ε.83 or m→∞ lim P {|Xn − X| > ε for all n ≥ m} = 0. for each ω0 ∈ Ω0 there exists an M (ω0 . if and only if |Xn − X| < ε. convergence. m m=1 However. ω ∈ AM (ω0 . (2. we see that Xn → X a.2) The proofs of these results are based on a convenient characterization of a.s. Thus ∞ Ω0 ⊂ m=1 Am (ε) .) Am (ε) ⊂ Am+1 (ε). Therefore. eventually almost surely. Set ∞ Am = {|Xn − X| ≤ ε} n=m = {|Xn − X| ≤ ε for all n ≥ m} so that Ac = {|Xn − X| > ε for some n ≥ m}.o.s. Set Am (ε) = ∞ {|Xn − X| ≤ ε} n=m (2. Now. ∞ ∞ {|Xn − X| > ε i.3) Now. Xn (ω0 ) → X(ω0 ).2) follow easily from this.o. ε).1) and (2.ε) . ε) such that for all n ≥ M (ω0 .s.3. since Xn → X a.} = = {|Xn − X| > ε} m=1 n=m ∞ Ac . m m→∞ (2. m Therefore.} = lim P {Ac } = 0. (2. Suppose there is a measurable set N with P (N ) = 0 such that for all ω ∈ Ω0 = Ω\N. if and only if P {|Xn − X| > ε i.

Set A= Then 1 P (A) = lim P (A( )) = 1 n→∞ n and therefore if ω0 ∈ A we have ω0 ∈ A(1/n) for all n.1 (L2 –weak law). m→∞ which proves that (2. Then Am (ε)} = 1. We need to verify that Sn E −µ n 2 → 0. Conversely. 2 Proof. suppose EXi Xj = EXi EXj . Assume that EXi = µ and that n 1 A( ).d with EXi = µ. Let ω0 ∈ A(ε). . where C is a constant.84 and therefore 1 = P (Ω0 ) = lim P {Am (ε)}. That is. + Xn ) = var(X1 ) + . + var(Xn ) and that var(cX) = c2 var(X) for any constant c. . ε) such that |Xm −X| ≤ ε Let ε = {1/n}.3) and A(ε) = P {A(ε)} = P { m=1 ∞ m=1 ∞ Am (ε). n n=1 ∞ var (Xi ) ≤ C for all i. Let Sn = i=1 Xi .1) holds for all ε > 0 and the Am (ε) are as in (2. Then Sn n → µ as n → ∞ in L2 (P ) and in probability.1) holds. Therefore |Xn (ω0 ) − X(ω0 )| < 1/n which is the same as Xn (ω0 ) → X(ω0 ) Theorem 2. . Let {Xj } be a sequence of uncorrelated random variables. suppose (2. . . Then in L2 and in probability. var (Xi ) < ∞.1. Then for ω0 ∈ A(ε) there exists m = m(ω0 . Corollary 2. Suppose Xi are i. We begin by recalling from Problem that if Xi are uncorrelated and E(Xi ) < Sn →µ n ∞ then var(X1 + .i.

85 Observe that E Sn n = µ and therefore. the result follows. These are called the Bernstein polynomials of degree n associated with f . Then there exists a sequence pn of polynomials such that pn → f uniformly on [0. . Proof. Since convergence in Lp implies convergence in probability for any 0 < p < ∞. Put n pn (x) = j=0 n j x (1 − x)n−j f (j/n) j recalling that n j = n! . 1].2 (The Weierstrass approximation Theorem). The assumption in Theorem Here is a standard application of the above weak–law. Without loss of generality we may assume that f (0) = f (1) = 0 for if this is not the case apply the result to g(x) = f (x) − f (0) − x(f (1) − f (0)). 2 E Sn −µ n = var = Sn n 1 var (Sn ) n2 n 1 var (Xi ) = 2 n i=1 ≤ Cn n2 and this last term goes to zero as n goes to inﬁnity. Let f be a continuous function on [0. This proves the L2 . 1]. Theorem 2. j!(n − j)! The functions pn (x) are clearly polynomials.

. i. the right hand side can be made smaller than 2ε by taking n large enough and independent of x. By Chebyshev’s’s inequality. Sn /n → x in probability. X2 . P Sn −x >δ n ≤ 1 Sn var 2 δ n 1 1 = 2 2 var(Sn ) δ n x(1 − x) = nδ 2 1 ≤ 4nδ 2 ∞ for all x ∈ [0.i. . j n j x (1 − x)n−j f (j/n) = pn (x) j as n → ∞. Thus |pn (x) − f (x)| = Ef Sn − f (x) n Sn − f (x) = E f n Sn ≤E f − f (x) n Sn = f − f (x) dP n {| Sn −x|<δ} n f {| Sn −x|≥δ n + < ε + 2M P Sn n − f (x) dP Sn −x ≥δ . There exists a δ > 0 such that |f (x) − f (y)| < ε when |x − y| < δ. . Set M = f 4 and let ε > 0. . n Now. P (Xi = 0) = 1 − x so that E(Xi ) = x and var(Xi ) = x(1 − x).86 Let X1 . The if Sn denotes their partial sums we have from the above calculation that P {Sn = j} = Thus E(f (Sn /n)) = j=0 n n j x (1 − x)n−j . 0 < x < 1. This proves the result. Also.d according to the distribution: P (Xi = 1) = x. 1] since x(1 − x) ≤ 1 .

3. with E|X1 | < ∞. Proof of Corollary. Before proving the theorem we have a Corollary 2. Let µ = EXi . and assume λP {|Xi | > λ} → 0 as λ → ∞.3) Xj and µn = E(X1 1(|X1 |≤n) ). Remark 2. Let Sn = n j=1 (1. Let Xi be i.3) is necessary to have a sequence of numbers an such that II (1971). First. Theorem 2.d. For this. Then probability.1.i. Hence. Then Sn − µn → 0 n in probability. . λP {|Xi | > λ} = λP {|X1 | > λ} → 0 as λ → ∞ and µn → E(X) = µ. by the Monotone Convergence Theorem and Chebyshev’s’s inequality. Let Xi be i.2.87 The assumption that the variances of the random variables are uniformly bounded can be considerably weaken. Vol → µ in and these two terms go to zero as n → ∞. we refer the reader to Feller. The condition (1.d.i. P Sn −µ n >ε =P ≤P Sn − µ + µn − µn > ε n Sn − µn > ε/2 + P {µn − µ| > ε/2} n Sn n Sn n − an → 0 in probability.

Suppose that n (i) n=1 P {|Xn. Proof. 1 ≤ k ≤ n.k = Xn. Xn. k=1 n Put an = k=1 EX n. are independent. .k } ≤ k=1 n = k=1 P {|Xn. 1 ≤ k ≤ n.k → 0 as n → ∞.k .88 Lemma 2. .k | > bn } and this last term goes to zero by (i).2 + . n = 1.k | > bn ) → 0. . Let Xn.k and set Sn = Xn. n ˜ P {Sn = Sn } ≤ P k=1 n {X n. bn → ∞ as n → ∞ and deﬁne the truncated random variables by X n. + X n.1 ( triangular arrays). . as n → ∞.k = Xn. Let bn > 0.1 + Xn. 2. Then Sn − an →0 bn in probability. n 2 1 (ii) 2 bn EX n. be a triangular array of random variables and assume that for each n.1 + . Sn = S n bn |S n − an | >ε . .k } P {X n. Sn = S n . Let S n = X n. bn ≤ P {Sn = S n } + P However.k = Xn.k |≤bn ) .k 1|Xn. Then P |Sn − an | >ε bn =P Sn − an > ε.n . . . .n . + Xn.k .

1 . we have P |S n − an | >ε bn ≤ 1 ε 2 b2 n E|S n − an |2 n 1 = 2 2 ε bn ≤ and this goes to zero by (ii). We claim that as n → ∞. For (i) we have n P {|Xn.k = Xk and bn = n. We apply the Lemma with Xn. We ﬁrst need to check that this sequence satisﬁes (i) and (ii). k=1 which goes to zero as n → ∞ by our assumption.1 = k=1 2 1 2 EX n. 1 n For this.k | > n} = nP {|X1 | > n}.k 2 Proof of Theorem 2.i. 2 E|X n. let g(λ) = λP {|X1 | > λ}.1 | ∞ n =2 0 λP {X n.d we have 1 n2 n EX n. ∞ EY p = p 0 λp−1 P {Y > λ}dλ Thus. n Let us now recall that by Problem 2.k ) k=1 n ε 2 b2 n k=1 EX n. 0 .1 | > λ}dλ. For (ii) we see that that since the random variables are i.1 | > λ}dλ = 2 0 P {|Xn. 1 var(X n.3 in Chapter III. n λP {|X1 | > λ}dλ → 0. for any nonnegative random variable Y and any 0 < p < ∞.3.89 Since an = ES n .

The last quantity goes to ε as n → ∞ and this proves the claim.} = limAn = m=1 n=m An and {An . i. Before we stay our Borel–Cantelli lemmas for independent events. . Fix k0 λ>0 so large that g(λ) < ε for all λ > k0 . If An ⊂ Ω. Therefore 1 n n λP {|x1 | > λ} < 0 M +ε n n − k0 n . we recall a few already proven facts. Also recall Corollary 2. eventually} = limAn = ∞ ∞ An m=1 n=m Notice that lim1An = 1{limAn } and lim1An (ω) = 1{limAn } It follows from Fatou’s Lemma P (limAn ) ≤ limP {An } and that limP {An } ≤ P {limAn }. then ∞ ∞ {An .90 Then 0 ≤ g(λ) ≤ λ and g(λ) → 0. Set M = sup |g(λ)| < ∞ and let ε > 0.o.2 of Chapter II. §3 Borel–Cantelli Lemmas. Then n n λP {|X1 | > λ}dx = M + 0 k0 λP {|x1 | > λ}dλ < M + ε(n − k0 ).

§4 Applications of the Borel–Cantelli Lemmas. §3. i. Example 3.} = 1 and completes the proof.o. In Chapter II.o. Let Ω = (0.o. 1/n). By independence.1 (The second Borel–Cantelli Lemma).} = 1. But. i. Clearly then P {∅} = 0. at least not in general.1.o. In the next section we will have several more applications of this and of . is it true that P {An . Let {An } be a sequence of independent sets with the property that P (An ) = ∞ Then P {An i. Let Fix N . N N P n=m Ac n = n=m N P {Ac } n {1 − P {An }} n=m N = ≤ n=m e−P {An } −{ N P n=m = exp P {An }} and this quantity converges to 0 as N → ∞ Therefore. Question: Is it possible to have a converse ? That is. If n=1 p(An ) < ∞ then P {An . P {An i.} = 0.o. 1) with the Lebesgue measure on its Borel sets. P (An ) = ∞. Theorem 3. ∞ P n=m An =1 which implies that P {An i.} = Proof. Let an = 1/n and set An = (0.91 ∞ First Borel–Cantelli Lemma.} = 0 implies that ∞ n=1 P {An } = ∞? The answer is no. we had several applications of the First Borel–Cantelli Lemma. We use the elementary inequality (1 − x) ≤ e−x valid for 0 ≤ x ≤ 1.

s.j.l≤n E(Xi Xj Xk Xl ) Since the random variables have zero expectation and they are independent. Then Sn →µ n a. Thus.o. P {|Sn | > nε} ≤ and therefore. Before that.j.l≤n = 1≤i. 4 2 E(Sn ) = nE(X1 ) + 3n(n − 1)(EX1 )2 ≤ Cn2 .k.92 the Second Borel–Cantelli. Then n 4 E(Sn ) = E( i=1 Xi )4 Xi Xj Xk Xl =E 1≤i. That is.1.d. we give a simple application to a fairly weak version of the strong law of large numbers. Proof. the only terms in this sum which are not zero are those where all the indices are equal 4 and those where to pair of indices are equal. with EX1 = µ and EX1 = C∞.k. There are n of the ﬁrst type and 3n(n − 1) of the second type. terms of the form EXj and 2 2 2 EXi Xj = (EXi )2 . ∞ C Cn2 = 2 4 4 ε4 n n ε P {| n=1 Sn | > ε} < ∞ n and the First Borel–Cantelli gives P Sn > ε i. Let {Xi } be i.} = 0 n .i. By Chebyshev’s inequality with p = 4. By considering Xi = Xi − µ we may assume that µ = 0. 4 Theorem 4.

93

which proves the result. The following is an applications of the Second Borel–Cantelli Lemma. Theorem 4.2. If X1 , X2 , . . . , Xn are i.i.d. with E|X1 | = ∞. Then P {|Xn | ≥ n i.o.} = 1 and P {lim Sn exists ∈ (−∞, ∞)} = 0. n Thus E|X1 | < ∞ is necessary for the strong law of large numbers. It is also suﬃcient. Proof. We ﬁrst note that
∞ ∞

P {|X1 | ≥ n} ≤ E|X1 | ≤ 1 +
n=1 n=1

P {|X1 | > n}

which follows from the fact that

E|X1 | =
0

P {|X1 | > X}dx

and
∞ n+1 ∞

P {|X1 | > x}dx ≤
n=0 n 0

P {|X1 | > X}dx

≤1+
n=1

P {|X1 | > n}

Thus,

P {|Xn | ≥ n} = ∞
n=1

and therefore by the Second Borel–Cantelli Lemma, P {|Xn | > n i.o.} = 1. Next, set A=
n→∞

lim

Sn exits ∈ (−∞, ∞) n

94

Clearly for ω ∈ A,
n→∞

lim

Sn (ω) Sn+1 − (ω) = 0 n n+1 Sn (ω) = 0. n(n + 1)

and
n→∞

lim

Hence there is an N such that for all n > N , lim 1 Sn (ω) < . n(n + 1) 2

n→∞

Thus for ω ∈ A ∩ {ω: |Xn | ≥ n i.o.}, 1 Sn (ω) Xn+1 (ω) > , − n(n + 1) n+1 2 inﬁnitely often. However, since Sn+1 Sn Xn+1 Sn − = − n n+1 n(n + 1) n + 1 and the left hand side goes to zero as observe above, we see that A ∩ {ω: |Xn | ≥ n i.o.} = ∅ and since P {|Xn | > 1 i.o.} = 1 we conclude that P {A} = 0, which completes the proof. The following results is stronger than the Second–Borel Cantelli but it follows from it. Theorem 4.3. Suppose the sequence of events Aj are pairwise independent and

P (Aj ) = ∞. Then
j=1 n j=1 n j=1

n→∞

lim

1Aj

P (Aj )

= 1. a.s.

In particular,
n n→∞

lim

1Aj (ω) = ∞ a.s
j=1

95

which means that P {An i.o.} = 1.
n

Proof. Let Xj = 1Aj and consider their partial sums, Sn =
j=1

Xj . Since these

random variables are pairwise independent we have as before, var(Sn ) = var(X1 )+ . . .+ var(Xn ). Also, var(Xj ) = E(Xj −EXj |2 ≤ E(Xj )2 = E(Xj ) = P {Aj }. Thus var(Sn ) ≤ ESn . Let ε > 0. P Sn −1 >ε ESn =P ≤ = Sn − ESn | > εESn 1 ε2 (ESn )2 1 ε2 ES
n

var(Sn )

and this last goes to ∞ as n → ∞. From this we conclude that probability. However, we have claimed a.s. Let nk = inf{n ≥ 1: ESn ≥ k 2 }. and set Tk = Snk . Since EXn ≤ 1 for all n we see that k 2 ≤ ETk ≤ E(Tk−1 ) + 1 ≤ k 2 + 1 for all k. Replacing n with nk in the above argument for Sn we get P {|Tk − ETk | > εETk } ≤ 1 ε2 ET
k

Sn → 1 in ESn

1 ≤ 2 2 ε k Thus

P
k=1

Tk −1 >δ ETk

<∞

proving the result. ETk for every ω ∈ Ω0 . . Strong Law of Large Numbers. clearly we also have Tk (ω) Sn (ω) Tk+1 (ω) ETk+1 ETk · ≤ ≤ · ETk+1 ETk ESn ETk+1 ETk and since k 2 ≤ ETk ≤ ETk+1 ≤ (k + 1)2 + 1 we see that 1 ≤ ETk ETk+1 ETk+1 ETk Tk − 1 > ε i. §5.s. Deﬁnition 5. Let {Xn } be a sequence of random variables. ) and I = n≥1 Fn . n→∞ ETk+1 n→∞ ETk lim Now. a. Tk → 1. Fn is often called the “future” σ–algebra and I the remote (or tail) σ–algebra. Then Tk (ω) Tk+1 (ω) Sn (ω) ≤ ≤ . .96 and the ﬁrst Borel–Cantelli gives P That is. Convergence of Random Series. ETk Let Ω0 with P (Ω0 ) = 1 be such that Tk (ω) → 1. Deﬁne the σ– algebras Fn = σ(Xn . Let n be any integer with nk ≤ n < nk+1 . . . ETk+1 E(Sn ) ETk We will be done if we can show that Tk (ω) Tk+1 (ω) → 1 and lim = 1.1. Xn+1 .o ETk =0 and that ETk+1 ETk → 1 and similarly 1 ≥ ETk ETk+1 and that → 1.

. . . λ2 The following results is stronger and more useful as we shall see soon.1 (Kolmogorov 0 − 1 Law). . Let An be independent. . Proof. (ii) If Sn = X1 +. . then P (A) = 0 or 1. . if cn → ∞.o. . . .} ∈ I. {limn→∞ Sn exists} ∈ I and {limSn /cn > λ} ∈ I. Since A ∈ I implies A ∈ σ(X1 . ). . . . . . Xn ) and B ∈ I.m) )). If X1 . . Thus if. A ∈ σ(X1 . . . {limSn > 0} ∈ I Theorem 5. . Xm ) for σ(X1 . .3. . and if we take Xn = 1An we see that Then {An i. then A and B are independent.} ∈ I. Chapter IV. X2 . Xn ) is indepenn some n and m and so A∩B ∈ σ(X1 . . Xn ) and B ∈ σ(Xn+1 .97 Example 5. (i) If Bn ∈ B(R). . . . . . . . Then P {An i. Corollary. X2 .1. Since they are both π–systems (clearly n if A. B ∈ n σ(X1 .o. that P {|Sn | > λ} ≤ 1 var(Sn ). dent of I. Recall that Chebyshev’s inequality gives. We shall show that A is “independent of itself” and hence P (A) = P (A ∩ A) = P (A)P (A) which implies that P (A) = 0 or 1. Xn ) is independent of I. . First. Or next task is to investigate when the above probabilities are indeed one. Xmax(n. ) then A and B are independent. .+Xn . Xn ) then A ∈ σ(X1 . .} = 0 or 1. . we are done. . Thus σ(X1 . since X1 . Xn ) and B ∈ σ(X1 . are independent and A ∈ I. . However. then {Xn ∈ Bn i. . . by Theorem 2. In the same way. . . if X − n are independent then P {lim Sn exists} = 0 or1. . . . . then clearly. are independent if A ∈ σ(X1 .o. for mean zero random variables which are independent. . .

we have E(Sk 1Ak (Sn − Sk )) = 0 and therefore the second term in (5.1) Sk 1Ak ∈ σ(X1 . Set Ak = {ω ⊂ Ω: |Sk (ω)| ≥ λ. . EXn = 0 and var(Xn ) < ∞ for all n.2 ( Kolmogorov’s inequality). . Sn ) and hence they are independent. Since E(Sn − Sk ) = 0. |Sj (ω)| < λ for all j < k} Note that these sets are disjoint and n 2 ESn ≥ k=1 n Ak 2 Sk + 2Sk Sn − 2Sk + (Sn − Sk )2 dP k=1 n Ak n 2 Sk dP + 2 k=1 Ak k=1 Ω 2 Sn dP = = ≥ Now. .1) is zero and we see that n ES n ≥ k=1 Ak ∞ 2 2 |Sk |dp ≥ λ2 k=1 P (Ak ) = λ2 P ( max |Sk | > λ 1≤k≤n which proves the theorem.98 Theorem 5. Sk 1Ak (Sn − Sk )dP (5. λ Proof. . . Suppose {Xn } be independent. . Then for any λ > 0. Xk ) and Sn − Sk ∈ σ(Xk+1 . P { max |Sk | ≥ λ} ≤ 1≤k≤n 1 E|Sn |2 λ2 1 = 2 var(Sn ). .

Example 5.s. for N > M we have P max |Sn − SM | > ε ≤ 1 var(SN − SM ) ε2 N M ≤n≤N 1 = 2 ε var(Xn ).4 (Kolmogorov’s Three Series Theorem). By Theorem 5. Then for every t. n=M +1 ∞ 1 ε(Xn ) and this last Letting N → ∞ gives P { max |Sn − SM | > ε} ≤ 2 m≥M ε n=M +1 quantity goes to zero as M → ∞ since the sum converges. ∞ If Xj are independent.s.) Theorem 5.s.m≥M then P {ΛM > 2ε} ≤ P { max |Sm − SM | > ε} → 0 m≤M as M → ∞ and hence ΛM → 0 a. N (0. X2 . Let X1 . Thus for almost every ω. n=1 Proof. as M → ∞. EXj = 0 and n=1 var(Xn ) < ∞. be i.i. 1).d. {Sm (ω)} is a Cauchy sequence and hence it converges. . Let A > 0 and set Yj = Xj 1(|Xj |≤A) . . then Xn converges a.2. Thus if ΛM = sup |Sm − Sn | n.s. . ∞ Bt (ω) = n=1 Xn sin(nπt) n converges a. Then n=1 Xn converges a. if and only if the following three conditions hold: ∞ (i) n=1 P (|Xn | > A) < ∞.3.2.99 ∞ Theorem 5. Let {Xj } be inde∞ pendent random variables. . (This is a series representation of Brownian motion.

e. ∞ (Yn −µn ) con- verges a. we need Lemma 5. Let bn = j=1 xj . (iii) n=1 var(Yn ) < ∞.o.} = 0. Then real numbers converging up to ∞ and suppose an n=1 1 an n xm → 0. For the proof of the strong law of large numbers. so does Xn . However. Then bn → b∞ . By (iii) and Theorem 5.3.s. b0 = 0. Let µn = EYn . P (Xn = Yn i. We will prove the necessity later as an application of the central limit theorem. Suppose that an is a sequence of positive ∞ xn converges. Therefore. Set a0 = 0. (i) is equivalent to n=1 P (Xn = Yn ) < ∞ and by the Borel–Cantelli Lemma. Assume (i)–(iii). by assumption. This and (ii) show that ∞ n=1 Yn converges a.1 (Kronecker’s Lemma). Proof.100 ∞ (ii) n=1 ∞ EYn converges. m=1 n Proof. P (Xn = Yn eventually} = 1. Then aj . Thus if ∞ n=1 ∞ n=1 Yn converges.

. recall a bn → b∞ .101 xn = an (bn − bn−1 ). |bj − b∞ | < ε. . Since 1 an n−1 (aj+1 − aj ) = 1 j=0 . n = 1. We claim that 1 an n−1 bj (aj+1 − aj ) → b∞ . precede by induction observing ﬁrst that n n−1 aj (bj − bj−1 ) = j=1 j=1 aj (bj − bj−1 ) + an (bn − bn−1 ) n−2 = bn−1 an−1 − b0 a0 − j=0 n−2 bj (aj+1 − aj ) + an bn − an bn−1 = an bn − b0 a0 − j=0 n−1 bj (aj+1 − aj ) − bn−1 (an − an−1 ) = an bn − b0 a0 − j=0 bj (aj+1 − aj ) Now. j=0 given ε > 0 ∃ N such that for all j > N . To see this. Since bn → b∞ . 2. and 1 an 1 xj = an j=1 = n n aj (bj − bj−1 ) j=1  1  bj (aj+1 − aj ) bn an − b0 a0 − an j=0 n−1 n−1  1 = bn − an bj (aj+1 − aj ) j=0 The last equality is by summation by parts. .

. P {Xk = Yk i.d. Let Yk = Xk 1(|Xk |≤k) Then ∞ P {Xk = Yk } = n=1 P (|Xk | > k} ∞ ≤ 0 P (|X1 | > λ}dλ = E|X| < ∞. E|X1 | < Sn → µ a.} = 0 or put in other words. an Letting ﬁrst n → ∞ and then ε → 0 completes the proof. Suppose {Xj } are i.o.5 (The strong law of large numbers). Therefore by the First Borel–Cantelli Lemma.102 Jensen’s inequality gives 1 | an n−1 1 bj (aj+1 − aj ) − b∞ | ≤ an j=0 ≤ 1 an n−1 (b∞ − bj )(aj+1 − aj ) j=1 N |(b∞ − bj )(aj+1 − aj )| j=1 n−1 1 + an 1 ≤ an ≤ |b∞ − bj ||aj+1 − aj | j=N +1 1 |bN − bj ||aj+1 − aj | + ε an j=1 N n |aj+1 − aj | j=N +1 M + ε. Then n Proof. Thus if we set Tn = Y1 + .i. It . ∞ and set EX1 = µ.. Theorem 5.s. P {Xk = Yk eventually} = 1. + Yn . .

103 suﬃces to prove ∞ Tn → µ a.s. k=1 (5. By Theorem 5.2) . and the Kronecker’s Lemma gives that 1 n which is the same as 1 n or Tn 1 − n n n n Zk → 0 a. Then E(Zk ) = 0 and n var(Zn ) ≤ k2 = k=1 ∞ ∞ 2 E(Yk ) k2 k=1 k=1 ∞ 1 k2 1 k2 ∞ 2λP {|Yk | > λ}dλ 0 k = k=1 2λP {|X1 | > λ}dλ 0 ∞ ∞ =2 0 ∞ k=1 λ 1{λ≤k} (λ)P {|X1 | > λ}dλ k2 1 k2 P {|X1 | > λ}dλ.s. where we used the fact that k>λ 1 C ≤ 2 k λ ∞ for some constant C which follows from the integral test.s.s.s. Now set Zk = Yk − EYk .3. k=1 n EYk → 0 a. k=1 We would be done if we can show that 1 n n EYk → µ. k=1 Zk k converges a. =2 0 λ k>λ ≤ CE|X1 |. k=1 (Yk − EYk ) → 0 a.

a1 > 0 var(Xn /an ) = σ 1 1 2 + a1 n=2 n(log n)1+2ε ∞ 1 < ∞. n=1 . |EYk − µ| < ε. Proof. EXi = 0 and EX1 ≤ σ 2 < ∞. lim Sn 2σ 2 n log log n = 1. k=1 Let n → ∞ to complete the proof.i.s. there exists an N such that for all k > N .s.s. a. Let X1 . . That is.. §6. Variants of the Strong Law of Large Numbers. n→∞ n1/2 (log n)1/2+ε lim a. .1. Then for any ε ≥ 0 Sn = 0. The question we address now is: Can we have large numbers we have n a better rate of convergence? The answer is yes under the right assumptions and we begin with 2 Theorem 6. With this N ﬁxed we have for all n ≥ N . be i.d. Let us assume E(Xi ) = 0 then under the assumptions of the strong law of Sn → 0 a.104 We know EYk → µ as k → ∞. We will show later that in fact. . n ≥ 2. This last is the celebrated law of the iterated logarithm of Kinchine. Set an = ∞ √ n(log n) 2 +ε . 1 n n k=1 1 EYk − µ = n ≤ 1 n 1 n n (EYk − µ) k=1 N E|Yk − µ| + k=1 N 1 n n E|Yk − µ| k=N ≤ E|Yk − µ| + ε. X2 .

and hence a an n=1 n ∞ n Xk → 0 a. Then Sn = 0..d. that Tn . eventually} = 1 Next.o. Proof.i. To see this. P {Yk = Xk i. as above. a. Xj i. observe that P {|Xk |p > k} k=1 It is enough to prove. k=1 What if E|X1 |2 = ∞ But E|X1 |p < ∞ for some 1 < p < 2? For this we have Theorem 6.s.s. n→∞ n1/p lim a.} = 0 which is the same as P {Yk = Xk . Let Yk = Xk 1(|Xn |≤k1/p ) and set Tn = k=1 n Yk .s. n1/p →0 ∞ P {Yk = Xk } = ≤ E(|X1 |p ) < ∞ and therefore by the ﬁrst Borel–Cantelli Lemma. estimating by the integral we have 1 k>λp ∞ k 2/p ≤C λp dx x2/p ∞ λp = 1 x2−2/p (1 − 2/p) p−2 =λ .s.2 (Marcinkiewidz and Zygmund). EX1 = 0 and E|X1 |p < ∞ for some 1 < p < 2.105 1 Xn Then converges a.

a. k=1 n µk → 0.k1/p ) (λ)λP {|X1 | > λ}dλ 1 k>λp =2 0 ∞ λP {|X1 | > λ} k 2/p dλ ≤2 0 λp−1 P {|X1 | > λ}dλ = Cp E|X1 |p < ∞. we will be done. k=1 Since X ∈ Lp . Observe that 0 = n1/p k=1 E(X1 ) = E(X1(|X|≥k1/p ) ) + µk so that |µk | ≤ |E(X1(|X|≥k1/p ) | and therefore 1 n1/p n µk ≤ k=1 1 n1/p n ∞ P {|X1 | > λ}dλ 1/p k=1 k n 1 ≤ pn1/p = 1 1 ∞ 1 pn1/p k=1 n k 1−1/p k 1−1/p pλp−1 P {|X1 | > λ}dλ k1/p E{|X1 |p . and Kronecker implies. with µk = E(Yk ).106 and hence ∞ var(Yk /k k=1 1/p )≤ 2 EYk k 2/p k=1 ∞ ∞ =2 k=1 ∞ 1 k 2/p 1 0 0 ∞ λP {|Yk | > λ}dλ k1/p =2 =2 k 2/p k=1 ∞ λP {|X1 | > λ}dλ ∞ 1 0 k 2/p k=1 ∞ 1(0.s. n n 1 k 1−1/p ≤C 1 x1/p−1 dx ≤ Cn1/p . given ε > 0 there is an N such that E(|X1 |p |X1 | > k 1/p ) < ε if k > N . k=1 The Theorem follows from these. Thus. |X1 | > k 1/p }. . that 1 n1/p If we are bale to show that 1 n (Yk − µk ) → 0. Also.

. lim and the result is proved. Two Applications. Then Tn = is the time the nth lightbulb burns out. . and 0 < Xi < ∞. We begin with an example from Renewal Theory. the maximum of Xj and M . n → 1/µ. n→∞ n n + M However.i.s. a. Xi could be the lifetime of ith lightbulb in a room with inﬁnitely many lightbulbs. Let Nt = sup{n: Tn ≤ t} which in this example is the number of lightbulbs that have burnt out by time t. and set EX1 = µ which may or may not be ﬁnite. a. . For example. Then Xi − M are i. . since Xi ≥ Xi n we have M Sn Sn M lim ≥ lim = EX1 . a. a. .i. Let M > 0 and Xj = Xi ∧ M . . note that if the mean lifetime is large then the number of lightbulbs burnt by time t is very small.i. .d. be i. Then Sn = ∞. E(X1 )+ ↑ E(X1 ) = ∞.d.3. as t → ∞ where this is 0 if µ = ∞. with EXj = ∞ and EXj < ∞. Now. and E|Xi | < ∞. . by the monotone convergence theorem.s. X2 . (Here we have used the fact that EXj < ∞. Therefore. Also.i. + Xn and think of Tn as the time of the nth occurrence of an event.s. . §7. + Xn we see that n → EX1 a. Suppose X1 . Then Nt t Sn = ∞. hence M M M EX1 = E(X1 )+ − E(X1 )− ↑ +∞. .s.1. . n→∞ n lim M M Proof. Theorem 7.d.107 + − Theorem 6. E(N (t))/t → 1/µ Continuing with our lightbulbs example. Let Tn = X1 + . a.s. Let X1 .d.) Setting SM M M M M M M Sn = X1 = X1 + .s. be are i. X2 . Let Xj be i.

We know Tn → µ a.s. Thus.108 Proof. what we have proved here is that given x ∈ R there is a set . TNt (ω) (ω) Nt (ω) + 1 → µ and → 1. Of course. ·) = Sn n n ρk k=1 and the Strong Law of Large numbers shows that for every x ∈ R. . Xk (ω) ≤ x Xk (ω) > x Notice that in fact the ρk are independent Bernoullians with p = F (x) and Eρk = F (x). ω) is the distribution with jump of size 1 n n at the points ak . Now. What kind of a random variable is it? Deﬁne ρk (ω) = 1(Xk ≤x) (ω) = 1. Fn (x. ω) → F (x) a. That is. since Tn < ∞ for all n. ω) = n we see that in fact Fn (x. X2 . By the law of large numbers there is an Ω0 ⊂ Ω such that P (Ω0 ) = 1 and such that for ω ∈ Ω0 . ·) is a random variable. n n=1 This is the observed frequency of values ≤ x. Then Fn (x. ω) = 1(Xk ≤x) (ω). 0. ﬁx ω ∈ Ω and set ak = Xk (ω). On the other hand.s. let us ﬁx x.i. Then Fn (x. we have Nt ↑ ∞ a. with distribution F . Nt Nt Nt + 1 Nt Now. be i.s. For x ∈ R set 1 Fn (x. T (Nt ) t T (Nt + 1) Nt + 1 · ≤ ≤ . and we are done. the exceptional set may depend on x.s. This is called the imperial distribution based on n samples of F . . Let X1 . Note that for every ω ∈ Ω. Writing 1 Fn (x. .d. Nt (ω) Nt (ω) Thus t/Nt (ω) → µ a. Nt (ω) is and integer and n T (Nt ) ≤ t < T (Nt + 1).

109

Nx ⊂ Ω such that P (Nx ) = 0 and such that Fn (x, ω) → F (x) for ω ∈ Nx . If we set N = ∪x∈Q Nx where we use Q to denote the rational numbers, then this set also has probability zero and oﬀ this set we have Fn (x, ω) → F (x) for all ω ∈ N and all x ∈ Q. This and the fact that the discontinuities of distribution functions are at most countable turns out to be enough to prove Theorem 7.2 ( Glivenko–Cantelli Theorem). Let Dn (ω) = sup |Fn (x, ω) − F (x)|.
x∈R

Then Dn → 0 a.s.

110

VI THE CENTRAL LIMIT THEOREM

§1 Convergence in Distribution. If Xn “tends” to a limit, what can you say about the sequence Fn of d.f. or the sequence {µn } of measures? Example 1.1. Suppose X has distribution F and deﬁne the sequence of random variables Xn = X +1/n. Clearly, Xn → X a.s. and in several other ways. Fn (x) = P (Xn ≤ x) = P (X ≤ x − 1/n) = F (x − 1/n). Therefore,

n→∞

lim Fn (x) = F (x−).

Hence we do not have convergence of Fn to F . Even worse, set Xn = X + Cn 1 even n . Thenthe limit may not even exist. where Cn = −1/n odd Deﬁnition 1.1. The sequence {Fn } of d.f. converges weakly to the d.f. F if Fn (x) → F (x) for every point of continuity of F . We write Fn ⇒ F . In all our discussions we assume F is a d.f. but it could just as well be a (sub. d.f.). The sequence of random variables Xn converges weakly to X if their distributions functions Fn (x) = P (Xn ≤ x) converge weakly to F (x) = P (X ≤ x). We will also use Xn ⇒ X. Example 1.2. (1) The Glivenko–Cantelli Theorem

111

(2) Xi = i.i.d. ±1, probability 1/2. If Sn = X1 + . . . + Xn then Fn (y) = P S √n ≤ y n 1 →√ 2π
Sn n y −∞

e−x

2

/2

dx.

This last example can be written as

⇒ N (0, 1) and is called the the De

Moivre–Laplace Central limit Theorem. Our goal in this chapter is to obtain a very general version of this result. We begin with a detailed study of convergence in distribution. Theorem 1.1 (Skorhod’s Theorem). IF Fn ⇒ F , then there exists random variables Yn , Y with Yn → Y a.s. and Yn ∼ Fn , Y ∼ F . Proof. We construct the random variables on the canonical space. That is, let Ω = (0, 1), F the Borel sets and P the Lebesgue measure. As in Chapter IV, Theorem 1.1, Yn (ω) = inf{x: ω ≤ Fn (x)}, Y (ω) = inf{x: ω ≤ F (x)}. are random variables satisfying Yn ∼ Fn and Y ∼ F
−1 The idea is that if Fn → F then Fn → F −1 , but of course, the problem is

that this does not happen for every point and that the random variables are not exactly inverses of the distribution functions. Thus, we need to proceed with some care. In fact, what we shall show is that Yn (ω) → Y (ω) except for a countable set. Let 0 < ω < 1. Given ε > 0 chose and x for which Y (ω) − ε < x < Y (ω) and F (x−) = F (x), (that is for which F is continuous at x). Then by deﬁnition F (x) < ω. Since Fn (x) → F (x) we have that for all n > N , Fn (x) < ω. Hence, again by deﬁnition, Y (ω) − ε < x < Yn (ω), for all such n. Therefore, lim Yn (ω) ≥ Y (ω). It remains to show that lim Yn (x) ≤ Y (x).

Now. ω ≤ Fn (y) and hence Yn (ω) ≤ y < Y (ω ) + ε which implies lim Yn (ω) ≤ Y (ω ). Corollary 1.1 and the results in Chapter II. if ω < ω and ε > 0. The following is a useful characterization of convergence in distribution.ε (y) = 0   linear y≤x y ≥x+ε x≤y ≤x+ε . Conversely.2.112 Now. The following corollaries follow immediately from Theorem 1.1 (Fatou’s in Distribution).2 (Dominated Convergence in Distribution). then E(g(Xn )) → E(g(X)). Suppose Xn ⇒ X and g ≥ 0 and continuous. Theorem 1. since Fn (y) → F (y) we see that for all n > N . If Y is continuous at ω we must have lim Yn (ω) ≤ Y (ω). Proof. ω < ω ≤ F (Y (ω )) ≤ F (y). If Xn ⇒ X. If Xn ⇒ X then Corollary 2. Corollary 1. g is continuous and and E|(g(Xn )| < C.1 implies the convergence of the expectations. Then E(g(X)) ≤ limE(g(Xn )). Xn ⇒ X if and only if for every bounded continuous function g we have E(g(Xn )) → E(g(X)). Again. let  1  gx. choose y for which Y (ω ) < y < Y (ω ) + ε and F is continuous at y.

ε (X)) ≤ P (X ≤ x + ε).ε (Xn )) and therefore.ε (Xn )) n→∞ ≤ lim P (Xn ≤ x). Lemma 1. limP (Xn ≤ x) ≤ limE(gx.1. Suppose Xn → X in probability. Corollary 1. let ε → 0. P (X ≤ x − ε) ≤ E(gx−ε. Then P {|Xn | > ε} → 0.6 in Chapter II. n→∞ Now. as n → ∞.ε (X)) = lim E(gx−ε. Suppose Xn → 0 in probability and |Xn | ≤ Y with E(Y ) < ∞. |Y | dP → 0. Hence by Proposition 2. Fix ε > 0. Then E|Xn | → 0. let ε → 0 to conclude that limP (Xn ≤ x) ≤ P (X ≤ x).113 It follows from this that P (Xn ≤ x) ≤ E(gx. Now.3. In the same way. If F continuous at x. {|Xn |>ε} . Proof.ε (Xn )) = E(gx. as n → ∞. we obtain the result. Then Xn ⇒ X.

Let g be a measurable function in R and let Dg = {x: g is discontinuous at x}. and the dominated convergence theorem implies that E(f (g(Yn ))) → E(f (g(Y )) and this proves the result. Next result gives a number of useful equivalent deﬁnitions. An alternative proof is as follows. Then Df ◦g ⊂ Dg . If Xn ⇒ X and P {X ∈ Dg } = µ(Dg ) = 0. If Xn → X in probability and g is bounded and continuous then g(Xn ) → g(X) in probability (why ?) and hence E(g(Xn )) → E(g(X)).114 Since E|Xn | = {|Xn |<ε} |Xn |dP + {|Xn |>ε} |Xn |dP <ε+ {|Xn |>ε} |Y | dP. Proof of Corollary 1. Let ank be a subsequence. So. Since Xnk converges to X in probability also.3 (Continuous mapping Theorem). Theorem 1. Set an = E(G(Xn )) and a = E(X). Thus. the result follows. proving the result. then g(Xn ) ⇒ g(X) Proof.s. X ∼ Y and Yn → Y a.3. . Let f be continuous and bounded. P {Y∞ ∈ Df ◦g } = 0. Let Xn ∼ Yn . we have a subsequence Xnkj which converges almost everywhere and hence by the dominated convergence theorem we have ankj → a and hence the sequence an also converges to a.s. proving Xn ⇒ X. f (g(Yn )) → f (g(Y )) a.

The following are equivalent: (i) Xn ⇒ X. the last property can be used to deﬁne weak convergence of probability measures. (iv) For all sets A ⊂ R with P (X ∈ ∂A) = 0 we have lim P (Xn ∈ A) = n→∞ P (X ∈ A). limP (Xn ∈ K) ≤ P (X ∈ K). Let Yn ∼ Xn . limP (Xn ∈ G) ≥ P (X ∈ G) or what is the same. Proof. Proof. We recall that for any set A. P (X ∈ G) = 0.4.s. where Xn ∼ µn and X ∼ µ. It can very well be that we have strict inequality in (ii) and (iii). That is. We shall say that µn converges to µ weakly if µn (A) → µ(A) for all borel sets A in R with the property that µ(∂A) = 0. Y ∼ X. We shall prove that (i) ⇒ (ii) and that (ii)⇔ (iii). Since G is open. But 1/n → 0 ∈ ∂G. lim1(Yn ∈G) (ω) ≥ 1(Y ∈G) (ω) Therefore Fatou’s Lemma implies P (Y ∈ G) ≤ limP (Yn ∈ G). Consider for example. (ii) For all open sets G ⊂ R. and ﬁnally that (iv) ⇒ (i). let µn and µ be probability measures on (R. Also. limµn (G) ≥ µ(G). Then P (Xn ∈ G) = 1. B). Yn → Y a. ∂A = A\A0 where A is the closure of the set and A0 is its interior. .115 Theorem 1. 1). Assume (i). so. Xn = 1/n so that P (Xn = 1/n) = 1. Then that (ii) and (iii) ⇒ (iv). (iii) For all closed sets K ⊂ R . Take G = (0.

Now. Now. Therefore. P (X ∈ K) = P (X ∈ A) = P (X ∈ G).116 proving (ii). Let K = A. in general. Let K be closed. take A = (−∞. Then ∂A = {x} and this completes the proof. The equivalence of (ii) and (iii) follows from this. (ii) and (iii) ⇒ (iv). . Then K c is open. Next. Put P (Xn ∈ K) = 1 − P (Xn ∈ K c ) P (X ∈ K) = 1 − P (X ∈ K c ). Suppose we have a sequence of probability measures µn . is it true that given distribution functions Fn there is a subsequence {Fnk } such that Fnk converges weakly to a distribution function F ? The answer is no. G = K\∂A and under our assumption that P (X ∈ ∂A) = 0. G = A0 and ∂A = A\A0 . Is it possible to pull a subsequence µuk so that it converges weakly to a probability measure µ? Or. (ii) ⇒ (iii). recall that any bounded sequence of real numbers has the property that it contains a subsequence which converges. (ii) and (iii) ⇒ lim P (Xn ∈ A) ≤ lim P (Xn ∈ K) ≤ P (X ∈ K) = P (X ∈ A) lim P (Xn ∈ A) ≥ lim P (Xn ∈ G) and this gives P (X∞ ∈ G) = P (X∞ ∈ A). To prove that (iv) implies (i). x]. Next.

˜ Proof. That is. x↓−∞ ˜ Lemma 1. we have ˜ f (t) < f (x0 ) + ε and therefore if x0 < x < t0 we see that ˜ ˜ 0 ≤ f (x) − f (x0 ) < ε. we have ˜ ˜ 0 ≤ f (t) − f (x0 ) ≤ f (t0 ) − f (x0 ) < ε. Thus if t ∈ Q is such that x0 < t < t0 . Let x0 ∈ R and ﬁx ε > 0. Then n→∞ x↑∞ 1 3 1(x≥n) (x) 1 + 1 1(x≥−n) (x) + 3 G(x) where G is a 3 lim Fn (x) = F (x) = 1/3 + 1/3G(x) lim f (x) = 2/3 < 1 lim F (x) = 1/3 = 0.3. ˜ proving the right continuity of f . for all x0 < t < t0 .117 Example 1. By the deﬁnition. Let f be an increasing function on the rationals Q and deﬁne f on R by ˜ f (x) = inf f (t) = inf{f (t): x < t ∈ Q} x<t∈Q tn ↓x = lim f (tn ) ˜ Then f is increasing and right continuous. Take Fn (x) = distribution function. We shall show that there is an x > x0 such that ˜ ˜ 0 ≤ f (x) − f (x0 ) < ε. . The function f is clearly increasing.1. Hence ˜ |f (t0 ) − f (x)| < ε. there exists t0 ∈ Q such that t0 > x0 and ˜ f (t0 ) − ε < f (x0 ) < f (t0 ).

The sequence {Fn (q1 )} has values in [0. we have a nondecreasing function G deﬁned on all the rationals. be an enumeration of the rational. let {Fnn } be the diagonal subsequence.5 (Helly’s Selection Theorem). . 1]. . So. Proof. . Similarly for Fn1 (q2 ) and so on. Let qj be any rational. s ∈ Q with r1 < r2 < x < s so that F (x) − ε < F (r1 ) ≤ F (r2 ) ≤ F (x) ≤ F (s) < F (x) + ε. r2 . . Then Fnn (qj ) → G(qj ). . → G(q2 ). schematically we see that q1 : Fn1 . Now. Let {Fn } be a sequence of of distribution functions. . . there exists a subsequence Fn1 (q1 ) → G(q1 ). . . . → G(q1 ) q2 : Fn2 . let us show that Fnk (x) → F (x) for all points of continuity of F . . → G(qk ) . Let x be such a point and pick r1 . qk : Fnk (qk ) . .118 Theorem 1. .1 F is right continuous and nondecreasing. . There exists a subsequence {Fnk } and a right continuous nondecreasing function function F such that Fnk (x) → F (x) for all points x of continuity of F . . Let q1 . Next. q2 . . Set F (x) = inf{G(q): q ∈ Q: q > x} = lim G(qn ) qn ↓x By the Lemma 1. . Hence.

µ(R) = 1 and µ is a probability measure. Let µnk ⇒ µ. “The mass of µn scapes to inﬁnity.119 Now. Proof. Then µ(R) ≥ µ(J) = lim µnk (J) n→∞ ≥ limµnk (Iε ) > 1 − ε. as claimed. Then we can ﬁnd an ε > 0 and a sequence nk such that µnk (I) ≤ 1 − ε. F (x) − ε < Fnk (r2 ) ≤ Fnk (x) ≤ Fnk (s) < F (x) + ε and this shows that Fnk (x) → F (x). since Fnk (r2 ) → F (r2 ) ≥ F (r1 ) and Fnk (s) → F (s) we have for nk large enough. . suppose (∗) fails. A sequence of probability measures satisfying (∗) is said to be tight. b] such that inf µn (Iε ) > 1 − ε.6. n (*) In terms of the distribution functions this is equivalent to the statement that for all ε > 0. When can we guarantee that the above function is indeed a distribution? Theorem 1. there exists an Mε > 0 such that supn {1 − Fn (Mε ) + Fn (−Mε )} < ε. Therefore. Let J ⊃ Iε and µ(∂J) = 0.” The tightness condition prevents this from happening. Notice that if µn is unit mass at n then clearly µn is not tight. Every weak subsequential limit µ of {µn } is a probability measures if and only if for every ε > 0 there exists a bounded interval Iε = (a. Conversely.

To see the this observe that |ϕ(t + h) − ϕ(t)| = |E|ei(t+h)X − eitX )| ≤ E|eihX − 1| . Note that if X ∼ Y then ϕX (t) = ϕY (t) and if X and Y are independent then ϕX+Y (t) = E(eitX eitY ) = ϕX (t)ϕY (t). ˆ and again |ϕX (t)| ≤ 1.d.120 for all nk and all bounded intervals I. Therefore. Then µ(J) = lim µnkj (J) ≤ limµnkj (J) j→∞ ≤ 1 − ε. Let µnkj → µ weakly. Notice also that if (a + ib) = a − ib then ϕX (t) = ϕX (−t).. X2 . µ(R) ≤ 1 − ε and µ is not a probability measure. . In particular. if X1 . . Let J be a continuity interval for µ. If X be a random variable its µ characteristic function is deﬁned by ϕX (t) = E(eitX ) = E(cos(tX)) + iE(sin(tX)).i. . . Notice that if µis the distribution measure of X then ϕX (t) = R eitx dµ(x) = µ(t). Xn are are i. then ϕXn (t) = (ϕX1 (t))n . Let µ be a probability measure on R and deﬁne its Fourier transform by µ(t) = ˆ R eitx dµ(x). The function ϕ is uniformly continuous. §2 Characteristic Functions. Notice that the Fourier transform is a a complex valued function satisfying |ˆ(t)| ≤ µ(R) = 1 for all t ∈ R.

k = 0. Then ϕ(t) = E(eitX ) = eita (ii) (Coin ﬂips) P (X = 1) = P (X = −1) = 1/2. We now proceed to present some examples which will be useful later. 1. ϕ−X (t) = ϕX (−t) = ϕX (t). In particular. Then ϕaX+b (t) = eitb ϕX (at). Then ϕ(t) = E(eitX ) = peit + (1 − p) = 1 + p(eit − 1) k (iv) (Poisson distribution) P (X = k) = e−λ λ . P (X = 0) = 1 − p. . . 2 2 2 (iii) (Bernoulli ) P (X = 1) = p. Then ϕ(t) = E(eitX ) = 1 1 it 1 −it e + e = (eit + e−it ) = cos(t). k! ∞ ϕ(t) = k=0 e −λ k λ itk e ∞ k! it =e −λ k=0 (λeit ) k! k = e−λ eλe = eλ(e it −1) . If −X ∼ X then ϕX (t) = ϕX (t) andϕX is real. suppose a and b are constants.121 and use the continuity of the exponential to conclude the uniform continuity of ϕX .1. 3 . Next. 2. Examples 2. (i) (Point mass at a) Suppose X ∼ F = δa .

Writing eitx = cos(tx) + i sin(tx) we obtain 1 ϕ(t) = √ 2π eitx e−x R 2 /2 1 dx = √ 2π cos txe−x R 2 /2 dx 2 1 ϕ (t) = √ −x sin(tx)e−x /2 dx 2π R 2 1 = −√ sin(tx)xe−x /2 dx 2π R 2 1 = −√ t cos(tx)e−x /2 dx 2π R = −tϕ(t). it Remark. then we have the integral of which does not converse t absolutely. Integration by parts gives ϕ(t) = (vi) (Normal) X ∼ N (0. ϕ(t) = e−t 2 1 . (1 − it) /2 . Let µ be a probability measure and let ϕ(t) = R eitx dµ(x). This gives ϕ (t) ϕ(t) = −t which.1 (The Fourier Inversion Formula). if µ = δ0 then ϕ(t) = 1. we do not mean that the integral converges absolutely. Then if x1 < x2 T −T 1 1 1 µ(x1 . For example. x2 ) + µ(x1 ) + µ(x2 ) = lim T →0 2π 2 2 e−itx1 − e−itx2 ϕ(t)dt. Theorem 2. 1). If 2 sin t x1 = −1 and x2 = 1.122 (v) (Exponential) Let X be exponential with density e−y . . The existence of the limit is part of the conclusion. Proof of (vi). Also. immedi2 ately yields ϕ(t) = e−t /2 as desired. together with the initial condition ϕ(0) = 1.

.3) for α = 1. If n is odd then n − 1 is even and nπ sin x dx < 0.3) apply Fubini’s Theorem to obtain ∞ 0 sin x dx = x = ∞ ∞ sin x 0 ∞ 0 ∞ 0 ∞ 0 e−ux dudx e−ux sin xdx du = 0 du 1 + u2 = π/2.3) sin(αx) dx = π/2 sign(α).2) and (2.123 Recall that  α>0 1  sign(α) = 0 α=0   −1 α < 0 Lemma 2. ∞) = [0. For (2.1)–(2. x ∞ 0 1 − cos αx π dx = |α|. the result follows by replacing y with (n + 1)π and using the same argument. + dx x x x x π 2π nπ y sin x sin x dx + (−1)a1 + (−1)2 a2 + .1. Comparing x terms we are done. For all y > 0. . If n is even. π] ∪ [π. .1). x (2. 2π]. . . . y 0 ≤ sign(α) 0 ∞ 0 sin(αx) dx ≤ x π 0 sin x dx. 2 x 2 Proof.1) (2.2) (2. and choose n so that nπ < y ≤ (n + 1)π. . For (2. . It suﬃces to prove (2. write [0. Let αx = u. Then y 0 π sin x dx = x n (k+1)π kπ = 0 π = 0 3π y sin x sin x sin x sin x dx + dx + dx + . + (−1)n−1 an−1 + (−1)n dx x x nπ y k=0 2π sin x dx x y + nπ sin x dx x where |aj+1 | < |aj |.

We begin by observing that eit(x−x1 ) − eit(x−x2 ) = it and hence for any T > 0.4) −T = −∞ F (T. the deﬁnition of ϕ and Fubini’s Theorem. F (T. x1 . we obtain 1 2π T −T e−i+x1 − e−itx2 ϕ(t)dt = it = ∞ −∞ ∞ −∞ ∞ T −T T e−itx1 − e−itx2 itx e dtdµ(x) 2πit eit(x−x1 ) − eit(x−x2 ) dt dµ(x) 2πit (2. R1 −T From this. x. x. This completed the proof. T x2 x1 e−itu du ≤ |x1 − x2 | |x2 − x2 |dtdµ(x) ≤ 2T |x1 − x2 | < ∞.1.124 and ∞ 0 1 − cos x dx = x2 = ∞ 0 ∞ 1 x2 sin u x sin ududx 0 ∞ u 0 ∞ dx du x2 = 0 sin u du u = π/2. Proof of Theorem 2. x2 )dµ(x) Now. t . x2 ) = − = 1 2πi 1 2πi 1 π T −T T −T T 0 cos(t(x − x1 )) i dt + t 2πi i cos(t(x − x2 )) dt − t 2πi T 0 T −T T −T sin(t(x − x1 )) dt t sin(t(x − x2 )) dt t sin(t(x − x1 )) 1 dt − t π sin(t(x − x2 )) dt. x1 .

This follows from our construction. 2 2 0 · dµ + proving the Theorem. x1 . 2 Therefore by the dominated convergence theorem we of (2.∞) Corollary 2. x2 ) = − (− 2 ) = 1. 1 dµ + 2 0 · dµ (x2 . page 28).1. x2 )| ≤ and sin t dt.1. x.x2 ) (−∞. the union of their atoms is also countable and hence we may apply the Lemma. .125 using the fact that sin(t(x − xi )) cos(t(x − xi ) is even and odd. the two measures agree. x1 . (see also Chung.    0 − (− 1 ) = 1 . |F (T.x1 ) {x1 } 2 1 1 = µ(x1 . then they agree on all of B(R).1. If two probability measures have the same characteristic function then they are equal. t t 2 π π 0 By (2.1) and (2. Since the atoms of both measures are countable. x2 ) + µ{x1 } + µ{x2 }. This follows from the following Lemma 2. Proof Corollary 2.   2 2  1 1 lim F (T. Suppose The two probability measures µ1 and µ2 agree on all intervals with endpoints in a given dense sets. t if x < x1 if x = x1 if x1 < x < x2 if x = x2 if x > x2 see that the right hand side  1 1  − 2 − (− 2 ) = 0.2).4) is 1 dµ + 1 · dµ + (x1 . T →∞  2  1   2 −0= 1  2   1 1 − 2 = 0. x.

hit . x + h) h 1 e−it − e−it(x+h) = 2π R hit = Let h → 0 to arrive at F (x) = 1 2π e−itx ϕX (t)dt. h > 0.126 Corollary 2.2. = 2π −∞ it Since e−it(x−h) − e−itx = it we see that 1 1 lim (µ(x1 . Then F is continuously e−ity ϕX (y)dy. F (x + h) − F (x) = µ(x. Now. Suppose X is a random variable with distribution function F and characteristic function satisfying diﬀerentiable and F (x) = 1 2π R |ϕX |dt < ∞. x2 ) = F (x2 −) − F (x1 ) we have 1 1 F (x2 −) − F (x1 ) + (F (x1 ) − F (x1 −)) + (F (x2 ) − F (x2 −)) 2 2 1 1 = µ(x1 . µ{x} = 0 for any x ∈ R. proving the continuity of F . h→0 2π R Hence. R x e−ity dy ≤ h x−h ϕX (t)dt 1 2π − R (e−it(x+h) − e−itx ) ϕX (t)dt. Let x1 = x − h. x2 = x. R Proof. Since µ(x1 . x2 ) + µ(x1 )) + µ{x2 } ≤ h→∞ 2 2 h ≤ lim |ϕX (t)| = 0. x2 ) + µ{x1 } + µ{x2 } 2 2 ∞ −it(x−h) 1 e − e−itx ϕX (t)dt.

Writing x x F (x) = −∞ F (t)dt = −∞ f (t)dt we see that it has a density f= and hence also ϕ(t) = R 1 2π e−itx ϕX (t)dt. then ϕn (t) → ϕ(t) for all t ∈ R. σ 2 ) then ϕX (t) = eiµt−σ 2 2 nt2 2 .127 Note that the continuity of F follows from this. x] = √ 2π and hence no weak convergence.) Clearly ϕn → 0 for all t = 0 and ϕn (0) = 1 for all n. In particular. Example 3. e −∞ −t2 2 dt → 1/2 . Let {µn } be a sequence of probability measures with characteristic functions ϕn . x] = √ a simple change of variables (r = t √ ) n 1 2πn x −∞ e−t 2 /2n dt gives x √ n 1 µn = (−∞. he continuity of the exponential and the dominated convergence theorem.1. (ii) If ϕn (t) → ϕ(t) for all ˜ t ∈ R where ϕ is a continuous function at 0. (By scaling if X ∼ t /2 . (i) If µn converges weakly to a probability measure µ with characteristic function ϕ.1. Also with µn (−∞. n). Let µn ∼ N (0. then the sequence of measures {µn } ˜ is tight and converges weakly to a measure µ and ϕ is the characteristic function ˜ of µ. Theorem 3. Thus ϕn (t) converges for ever t but the limit is not continuous at 0. R eitx f (x)dx. Then ϕn (t) = e− N (µ. if ϕn (t) converges to a characteristic function ϕ then µn ⇒ µ. §3 Weak convergence and characteristic functions.

Thus for all ε > 0 δ→0 2δ δ there exists a δ = δ(ε) > 0 and n0 = n0 (ε) such that for all n ≥ n0 . Since µn ⇒ µ we get that E(g(Xn )) → E(y(X∞ )) and this gives ϕn (t) → ϕ(t) for every t ∈ R. 2A].2) or P {|X| > 2A} ≤ 2 − A ϕ(t)dt . lim 1 − ε/2 < 1 2δ δ ϕn (t)dt + ε/2. This is the easy part. For all A > 0 we have A−1 µ[−2A. Lemma 3. Note that g(x) = eitx is bounded and continuous. (3. we have for each ﬁxed δ > 0 (by the dominated convergence theorem) δ 1 lim n→∞ 2δ −δ |ϕn (t) − ϕ(t)|dt → 0.1) This. δ 1 |ϕ(t)|dt = |ϕ(0)| = 1. −δ Since ϕn (t) → ϕ(t) for all t. 1 2δ δ ϕ(t)dt ≤ −δ 1 2δ 1 2δ δ ϕn (t)dt −δ δ + |ϕn (t) − ϕ(t)|dt. Since ϕ is continuous at 0. −A−1 (3. For the proof of (ii) we need the following Lemma. A−1 (3. −δ .128 Proof of (i).3) Proof of (ii). 2A] ≥ A −A−1 ϕ(t)dt − 1. Let δ > 0. of course can also be written as A−1 1−A −A−1 ϕ(t)|dt| ≥ −µ[−2A.1 (Estimate of µ in terms of ϕ).

This completes the proof. For any T > 0 T T (1 − eitx )dt = 2T − −T −T (cos tx + i sin tx)dt 2 sin(T x) .129 or 1 δ Applying the Lemma with A = δ ϕn (t)dt > 2(1 − ε). 1 T or 2− That is. x = 2T − Therefore. ψ(t) = ϕ(t) and hence ϕ(t) is ˜ ˜ a characteristic function and any weakly convergent subsequence musty converge to a measure whose characteristic function is ϕ.1. ˜ Proof of Lemma 3. 2δ −1 ] > δ −δ ϕ(t)dt > 2(1 − ε) − 1 = 1 − 2ε. for all T > 0. T T (1 − eitx )dtdµ(x) = 2 − Rn −T R 2 sin(T x) dµ(x) Tx 1 T T ϕ(t)dt = 2 − −T R 2 sin(πx) dµ(x). −δ 1 δ gives δ µn [−2δ −1 . Then since µnk ⇒ ν the ﬁrst part implies that ϕnk (t) → ψ(t) for all t. Then ν a probability measure. Therefore. for all n ≥ n0 . Let ψ be the characteristic function of ν. Thus the sequence {µn } is tight. for any |x| > 2A. Tx sin(T x) 1 1 ≤ ≤ Tx |T x| (2T A) . Tx ϕ(t)dt = −T R sin(T x) dµ(x). 1 2T Now. Let µnk ⇒ ν.

130 and also clearly. 2A] + [1 − µ[−2A. µ[−2A. Then its characteristic function ϕ has bounded continuous derivatives of any order less than or equal to n and ∞ ϕ (k) (t) = −∞ (ix)k eitx dµ(x). −T T −1 (1 − ϕ(t))dt. for all x. Corollary. 2A] + 2T A 2T A 2A Now.1. sin T x < 1. variable. P {|X| > 2/T } ≤ or P {|X| > T } ≤ T /2 −T −1 µ{x: |x| > 2/T } ≤ 1 T T (1 −T − ϕ(t))dt. Tx Thus for any A > 0 and any T > 0. 2A] + 1/2 2 which completes the proof. Theorem 4. take T = A−1 to conclude that A 2 A−1 ϕ(t)dt ≤ −A−1 R sin T x dµ Tx 1 = µ[−2A. . | R sin(T x) dµ(x)| = Tx sin(T x) sin(T x) µ(dx) + dµ(x) Tx Tx −2A |x|>2A 1 ≤ µ[−2A. §4 Moments and Characteristic Functions. 2A]] 2T A 1 1 = 1− . or in terms of the random T 1 T (1 − ϕ(t))dt. Suppose X is a random variable with E|X|n < ∞ for some positive integer n.

m! We recall here that g(t) = o(tm ) as t → 0 means g(t)/tm → 0 as t → 0. For any random variable X and any n ≥ 1 n n EeitX − m=0 E(itX)m (itX)m ≤ E eitX − m! m! m=0 |tX|n+1 2|tX|n . R ei(t+h)x − eitx h dµ We now continue by induction to complete the proof. n ϕ(t) = m=0 im tm E(X)m + o(tn ). By calculus. Suppose E|X|n < ∞.131 for any k ≤ n. (n + 1)! n! . Corollary 4. n ≤ E min This follows directly from .2. Let µ be the distribution measure of X. Suppose n = 1. ϕ(m) (0) = im E(X m ) by the above theorem. Then its characteristic function ϕ has the following Taylor expansion in a neighborhood of t = 0. n an integer. m! m=0 In the present case. Theorem 4. Proof.1. Since ∞ the dominated convergence theorem implies that ϕ (t) = lim = R R |x|dµ(x) < ϕ(t + h) − ϕ(t) = lim n→0 h→0 h (ix)eitx dµ(x). Proof. if ϕ has n continuous derivatives at 0 then ϕ(m) (0) m ϕ(t) = t + o(tn ).

need to estimate the right hand side. e ix i3 i2 x2 + 2 2 x (x − s)2 eis ds 0 in+1 (ix)m = − m! n! m=0 n x (x − s)n eis ds. We note that this is just the Taylor expansion for eix with some information on the error. i n x 0 xn (x − s) e ds = − + n n is x (x − s)n−1 eis ds. Proof. (n + 1)! n This is good for |x| small. x 0 i xn+1 + (x − s) e ds = (n + 1) (n + 1) n is x (x − s)n+1 eis ds.132 Lemma 4. For n = 1. e ix − (ix)m ≤ min m! m=0 n |x|n+1 2|X|n . (n + 1)! n! . For all n ≥ 0 (by integration by parts). in+1 n! x 0 1 (x − s) e ds ≤ n! n is x 0 |x|n+1 (x − s) dx = . For any real x and any n ≥ 1. 0 For n = 0 this is the same as 1 ix (e − 1) = i or x x x eis ds = x + i 0 0 (x − s)eis ds eix = 1 + ix + i2 0 (x − s)eis ds.2. eix = 1 + ix + and continuing we get for any n. 0 So. 0 . Next.

We shall ﬁrst look at the i.i. 0 2 (x − s) e ds ≤ (n − 1)! 2 ≤ |x|n . with EXi = µ. σ Corollary 1. then ϕ(t) = 1+itµ− t 2 +o(t)2 . Then Sn − nµ √ ⇒ N (0. 1).1. n! n is |x| (x − s)n−1 ds 0 and this completes the proof. . var(Xi ) = σ 2 < ∞. Proof. §5. + Xn . σ n . Set Sn = X1 + . The Central Limit Theorem. If EX = µ and E|X|2 = σ 2 < ∞.i.d.133 Since xn = n 0 x (x − s)n−1 ds we set i n or in+1 n! This gives in+1 n! x 0 x 0 x x (x − s)n eis ds = 0 0 (x − s)n−1 (eis − 1)ds x in (x − s) e ds = (n − 1)! n is (x − s)n−1 (eis − 1)ds. Applying Theorem 4.d. {Xi } i. case. . 2 2 as t → 0. Theorem 5.1 with n = 2 gives ϕ(t) − 1 + itµ − t2 σ 2 2 ≤ t2 E |t||X|3 2|X|2 ∧ 3! 2! and the expectation goes to zero as t → 0 by the dominated convergence theorem.

P{ Sn − µ 1 √ ≤ x} → √ σ n 2π x e−y −∞ 2 /2 dt. By looking at Xi = Xi − µ.134 Equivalently.1 bellow to ϕ Sn √ σ n (t) = √ t2 + g(t/σ n) 1− 2n n → e−t 2 /2 and complete the proof. Apply Lemma 5.. By above ϕX1 (t) = 1 − with g(t) t2 t2 σ 2 + g(t) 2 → 0 as t → 0. we have (for ﬁxed t) that t2 g t √ σ n g √ = (1/ n)2 as n → ∞. This can be written as ng t2 + ng 2 t √ t √ t √ σ n 1 n → 0. for any real number x. we may assume µ = 0. . Proof. By i. g(t) → 0 as t → 0. σ n → 0 as n → ∞. set Cn = − get σ n and C = −t2 /2.i. Next.d. ϕSn (t) = t2 σ 2 + g(t) 1− 2 n or ϕ Since Sn √ σ n (t) = ϕSn (σ nt) = √ t2 1− +g 2n σ n t √ n .

2) + b3 3! + . 2 2 2 1+ |eb − (1 + b)| ≤ |b|2 2 |b|2 ≤ 2 = |b|2 . . write eb = 1 + b + b2 2 (5. . . .. . (5. in fact. 3! 4! 1 1 1 1 + + 2 + 3 + . For this. which establishes (5. ωn are complex numbers with |zj | and |ωj | ≤ η for all j. . 1+ Cn n n → Proof. with equality. z2 .2). . . . zn and ω1 . First we claim that if z1 .. . Assume it for n − 1 to get n n n−1 n−1 zm − m=1 n−1 m=1 ω m ≤ zn m=2 n−1 zm − ω n m=1 n−1 ωm n−1 zn m=1 zm − zn m=1 n−1 n−1 ω m + zn m=1 n−1 ωm − ωn m=1 ωm ≤η m=1 zm − m=1 n−1 ωm + m=1 ωm |zn − ωm | ≤ ηη n−2 m=1 n |zm − ωm | + η n−1 |zn − ωm | |zm − ωm |. .1.1) If n = 1 the result is clearly true for n = 1. if b ∈ C and |b| ≤ 1 then |eb − (1 + b)| ≤ |b|2 . Then eC . If Cn are complex numbers with Cn → C ∈ C.. .. = η n−1 m=1 Next. then n n n zm − m=1 m=1 ωm ≤ η n−1 m=1 |zm − ωm |. Then 2|b| 2|b|2 + + .135 Lemma 5.

2) we see that this quantity is dominated by ≤ e n (n−1) 1 + γ γ γ n n−1 n n−1 Cn n 2 γ γ 2 e n (n−1) 1 + n ≤ n γ 2 n (n−1) γ e γ e γ 2 e2γ ≤ ≤ < ε. 2. . Set zi = (1 + Cn ) n n and ωi = (eCn /n ) for all i = 1.2) established we let ε > 0 and choose γ > |C|. 1). n n which proves the lemma Example 5. var(Xi ) = EXi − (E(X))2 = 1/2 − ( ) = 1/4 4 and hence Sn − n Sn − µn 2 √ = ⇒ χ = N (0. Let Xi be i.1.1) and (5. By (5. n. .i.9773 = 0. . . Bernoullians 1 and 0 with probability 1/2. . σ n n/4 From a table of the normal distribution we ﬁnd that P (χ > 2) ≈ 1 − .136 With both (5. + Xn = total number of heads after n–tones.227.1) we have 1+ Cn n n − eCn ≤ e n (n−1) 1 + ≤ e n (n−1) 1 + γ γ γ n γ n n−1 n e m=1 n−1 Cn n − 1+ Cn n Cn n ne Cn n − 1+ Setting b = Cn /n and using (5.d. Take Cn n large enough so that |Cn | < γ and γ 2 e2γ /n < ε and ≤ 1. Let Sn = X1 + . . . 1 2 EXi = 1/2. Then |zi | = 1 + Cn γ ≤ 1+ n n and |ωi | ≤ eγ/n γ n hence for both zi and |ωi | we have the bound eγ/n 1 + .

. A Roulette wheel has slots 1–38 (18 red and 18 black) and two slots 0 and 00 that are painted green.500 and 125. 000 + 5000. 2 P (Xi = −1) = 20 38 . Let X1 .227) = . n √ − n = 125. Players can bet \$1 on each of the red and black slots. . 000 − 500 2 n √ + n = 125.500 heads. Xn be i. 2 That is. 000. . Hence for n large we should have 0. .000 tosses you will get between 124.137 Symmetry: P (|χ| < 2) = 1 − 2(0. Sn − n 2 n2 /4 <2 −2 √ n 2√ n ≤ Sn − < n 2 2 2 √ n √ − n < Sn ≤ n + n/2 .9546. with Xi = {±1} and P (Xi = 1) = 18 38 . with probability 0.95 ≈ P =P =P If n = 250. Suppose we want to know P (Sn ≥ 0) after large numbers tries. The player wins \$1 if the ball falls on his/her slot. Sn = X1 + · · · + Xn is the total fortune of the player after n games. .2.9972 Sn − nµ −nµ √ ≥ √ σ n σ n . Since E(Xi ) = 18 20 −2 1 − = =− 38 38 38 19 2 2 var(Xi ) = EX − (E(x)) = 1 − we have P (Sn ≥ 0) = P 1 19 2 = 0.i. Examples 5.d. after 250.95.

138 Take n such that √ − n −1 19 = 2.m → 1≤m≤n n 0 and m=1 Cn. Let Cn. So. after n = 1444 the Casino would have won \$76 of your hard earned dollars.9972) or n ≈ 3 61. Then n (1 − Cn.2.m ) → e−λ . .9972) This gives √ n = 2(19)(0.m be nonnegative numbers with the property that max Cn. Thus.0228 Also. you decide if you want to play! Lemma 5.4 = 1444. m=1 Proof.0225 that you will be ahead. but there is a probability . Therefore. Hence P (S1444 ≥ 0) = P Sn − nµ √ ≥2 σ n ≈ P (χ ≥ 2) = 1 − 0. Recall that lim log a↓0 1 1−a a = 1. E(S1444 ) = − 1444 = −4. in the average. given ε > 0 there exists δ > 0 such that 0 < a < δ implies (1 − ε)a ≤ log 1 1−a ≤ (1 + ε)a.19 19 = −76.m → λ. (.9772 = 0.

For each n. Example 5. 1/2 > ε n n = E(|Y1 |2 . σ ∈ (0. m=1 Also. Let Y1 .n .m | > ε) = 0. |Xn.n ) → −λ m=1 or log n (1 − Cm. m=1 Then..1 + Xn. lim n→∞ E(|Xn. .d.n ≤ δ and (1 − ε)Cm.m → σ 2 . σ 2 ). let Xn. for all ε > 0. .139 If n is large enough. . Sn = Xn. n E(|Xn. be i. . |Xn. This implies the result. Let Xn. n (ii) For all ε > 0.3.’s with EXn.n ≤ log Thus 1 1 − Cm.m = 0. Cm. EYi = 0. n log m=1 →λ and this is the same as n log(1 − Cm. n n Clearly. |Y1 | > εn1/2 ) .m . 1 = σ2 . .m ⇒ N (0.n ≤ (1 + ε)Cm.m = Ym /n1/2 . Then Xn. be independent r.1 + Xn. 1 ≤ m ≤ n. Theorem 5.i. + Xn. E(Yi2 ) = σ 2 .m | > ε) = nE m=1 |Y1 |2 |Y1 | .v. ∞). . Suppose n (i) m=1 2 EXn. Y2 .n ) m=1 → −λ.2 + .n 1 1 − Cm. + Xn.2 + .m = 2 E(Ym ) σ2 = n n m=1 n S √n .m |2 .m |2 .2 (The Lindeberg–Feller Theorem). .

|Xn.m | > ε) . |Xn.m |2 +E ∧ . |Xn. σn.140 and this goes to 0 as n → ∞ since E|Y1 |2 < ∞.m |2 .m − ωn.m | ≤ m=1 εt3 σ 2 . 2 2 Proof.m (t) = E(eitXn.m |2 . |Xn.m = ϕn.m ≤ ε2 + E(|Xn.m |2 ≤E ∧ .m |3 2|tXn. 2 Let ε > 0 and set zn. Now. Let ϕn.m /2). We have |zn.m |3 2t2 |Xn. 6 Let ε → 0 to conclude that n n→∞ lim |zn.m ).m ).m (t) → e−t m=1 2 σ 2 /2 .m (t) − m=1 m=1 1− 2 t2 σn.m | > ε 3! 2! |tXn.m |2 ∧ 3! 2! 3 |tXn.m |2 + t2 E(|Xn. ωn.m − ωn.m | 2|tXn. as n → ∞.m = (1 − t2 σn. 2 σn.m | ≤ ε 3! + E |tXn.m |3 ≤E .m | ≤ ε 3! 2! |tXn.1) gives n n ϕn.m (t). |Xn. m=1 Hence with η = 1 (5.m | > ε) 6 Summing from 1 to n and letting n → ∞ gives (using (i) and (ii)) n n→∞ lim |zn. |Xn.m = E(Xn.m 2 → 0.m | ≤ E |tXn.m − ωn.m | > ε ≤ εt3 E|Xn. It is enough to show that n ϕn.m |2 .m | → 0.

We shall now return to the Kolmogorov three series theorem and prove the necessity of the condition. (ii) n=1 ∞ EYn converges and (iii) n=1 var(Yn ) < ∞. be independent ∞ random variables. The Kolmogorov’s Three Series Theorem. We begin by proving (i). ∞ Proof.m |2 .m → 0. |Xn. That is. 2 t2 σn.s.m = 1≤m≤n Then n Cn. .m | > ε). We now show that if n=1 Xn converges then (i)–(iii) hold.s. . 2 2 The second term goes to 0 as n → ∞.m 2 → e− t2 σ 2 2 .141 and therefore. (iii) are true then ∞ n=1 Xn converges a.m → m=1 t2 σ 2 and Lemma 5. Let X1 . Set Cn.m ≤ε + m=1 2 E(|Xn. Then n=1 Xn converges a.m . Let A > 0 and Ym = Xm 1(|Xm |≤A) .2 shows that n 1− m=1 t2 σn. max σn. completing the proof of the Theorem. This was not done when we ﬁrst stated the result earlier. . . For the sake of completeness we state it in full again. (ii). n 1≤m≤n max 2 σn. if and only if the following three hold: ∞ (i) n=1 ∞ P (|Xn | > A) < ∞. We have shown that if (i). X2 .

That is. the above sum is zero. . lim m=1 Xm cannot exist. Let (Ym − EYm Cn 1/2 Cn = m=1 var(Ym ) and Xn. n Thus.m = By Theorem 5. 1).m |2 . m=1 Then the Borel–Cantelli lemma implies that P (|Xn | > A i.m |2 . suppose ∞ P (|Xn | > A) = ∞. m=1 Let ε > 0 and choose n so large that n 2A Cn n 1/2 < ε. suppose (i) holds but n=1 n var(Yn ) = ∞.) > 0.m = .142 Suppose this is false. Then EXn.1 + Xn. . Hence if the series converges we must have (i).m | > E |Xn. So. Then 2A Cn 1/2 E(|Xn.m |2 . Xn.m = 1.m | > ε) ≤ m=1 n m=1 E |Xn. But |Yn | + E|Ym | 1/2 Cn ≤ 2A Cn 1/2 . |Xn. Let Sn = Xn.m = 0 and n 2 EXn. 2A 1/2 Cn ≤ m=1 < |Yn | + E|Ym | Cn 1/2 . . 1 Cn 1/2 m=1 n (Ym − EYm ).2.2 + . ∞ Next. |Xn. Sn ⇒ N (0.o.

∞ ∞ Now. The Polya distribution. Thus if m=1 Xn converges so does Ym and hence also EYm .1) = (1 − |x|)+ .) But Sn − T n = − 1 Cn 1/2 m=1 n E(Ym ) which is nonrandom. §6. Therefore. Consider the density function given by f (x) = (1 − |x|)1x∈(−1. if lim Let n→∞ Xm exists then lim m=1 n→∞ m=1 n Ym exists also. t2 R .143 n n Now. Its characteristic function is given by ϕ(t) = and therefore for all y ∈ R. 1). n=1 var(Yn ) < ∞ implies m=1 (Ym − EYm ) converges. (This follows from (i). by the corollary n to Kolmogorov maximal inequality. This gives a contradiction and shows that (i) and (iii) hold.) Tn = m=1 Ym Cn 1/2 n = 1 Cn 1/2 Ym m=1 and observe that Tn ⇒ 0. We begin with some discussion on the Polya distribution. (This follows from the fact that limn→∞ E(g(Sn − Tn )) = limn→∞ E(g(Sn )) = E(g(χ)). (1 − |y|)+ = 2 2π 1 = π e−ity R 2(1 − cos t) t2 (1 − cos t) dt t2 1 − cos t −ity e dt. (Sn − Tn ) ⇒ χ where χ ∼ N (0.

Example 6.1. one can do the case 1 < α < 2. . Fn have characteristic functions ϕ1 . With a more delicate argument. we have the Cauchy density. Theorem 6.144 Take y = −y this gives (1 − |y|)+ = 1 − cos x which has πx2 R 1 π R (1 − cos t) ity e dt. ∞) with lim ϕ(t) = 1. If fa (x) = + 1−cos ax πax2 . ϕ(t) = e−|t| for any 0 < α ≤ 2. ∞) so that t↓0 t↑∞ ∞ ϕ(t) = 0 1− t s + dν(s) and ϕ(t) is a characteristic function. we have the normal density. The following fact will be n n useful below. If α = 2. and λi ≥ 0 with λi = 1. . just by changing variables. decreasing and convex on (0. . lim ϕ(t) = 0. We only need to verify that the function is convex. and we take X ∼ F where F has density f1 we see that (1−|t|)+ is its characteristic function This is called the Polya distribution. More generally. then then we get the characteristic function ϕa (t) = 1− t a . . respectively. . α . if f1 (x) = f1 (x)dx = 1. ϕ(t) = ϕ(−t). . If F1 .1 (The Polya Criterion). Diﬀerentiating twice this reduces to proving that αt2α−2 − α2 tα−2 + αtα−2 > 0. . . Let us in show here that exp(−|t|α ) is a characteristic function for any 0 < α < 1. Then the characteristic function of i=1 λi Fi is i=1 λi ϕi . t2 So. Let ϕ(t) be a real and nonnegative function with ϕ(0) = 1. ϕn . If α = 1. There is a probability measure ν on (0. This is true if α2 tα − α2 + α > 0 which is the same as α2 tα − α2 + α > 0 which follows from αtα + α(1 − α) > 0 since 0 < α ≤ 1.

d. . Since F (t) − G(t) → 0 at t → ±∞. Set A = x∈R There is a number α such that for all T > 0. since G is bounded and we may obviously assume that it is positive. 2M x∈R (ii) G has bounded derivative with sup |G (x)| ≤ M .1.145 §7.1. there is a sequence xn → b ∈ R such that   2M A  F (xn ) − G(xn ) → or . . TA 2M T A 3 0 ∞ 1 − cos x dx − π x2 ≤ −∞ 1 − cos T x {F (x + α) − G(x + α)}dx . 1 sup |F (x) − G(x)|.   −2M A . x2 Proof. however. Lemma 7. + 3/2 + . x→+∞ lim G(x) = 1. In fact. σ3 n where c is an absolute constant. Theorem 7. EXi = 0 and E|Xi |3 = ρ < ∞. Rates of Convergence. . If Sn Fn is the distribution of √ and Φ(x) is the normal distribution. . Let Xi be i. E|Xi |2 = σ 2 .. More is actually true: H3 (x) H1 (x) H2 (x) + . Berry–Esseen Estimates.i. we have σ n sup |Fn (x) − Φ(x)| ≤ x∈R cρ √ . we may take c = 3. We shall not prove this. Observe that A < ∞. Let F be a distribution function and G a real–valued function with the following conditions: (i) x→−∞ lim G(x) = 0. Fn (x) = φ(x) + √ + n n n where Hi (x) are explicit functions involving Hermit polynomials.

A −A 1 − cos T x {F (x + α) − G(x + α)}dx ≤ −M x2 A −A 1 − cos T x (x + A)dx x2 A = −2M A 0 1 − cos T x x2 dx . since A = (b − α).146 Since F (b) ≥ F (b−) it follows that either   F (b) − G(b) = 2M A  or   F (b−) − G(b) = −2M A. Assume F (b−) − G(b) = −2M A. the other case being similar. If |x| < A we have G(b) − G(x + α) = G (ξ)(b − α − x) = G (ξ)(A − x) Since |G (ξ)| ≤ M we get G(x + a) = G(b) + (x − A)G (ξ) ≥ G(b) + (x − A)M. Therefore for all T > 0. So that F (x + a) − G(x + a) ≤ F (b−) − [G(b) + (x − ∆)M ] = −2M A − xM + AM = −M (x + A) for all x ∈ [−A. Put α = b − A < b. A].

x2 Adding these two estimates gives ∞ −∞ 1 − cos T x {F (x + α) − G(x + α)}dx x2 A ∞ ≤ 2M A − 0 +2 A A ∞ 1 − cos T x dx x2 1 − cos T x dx x2 ∞ 0 = 2M A − 3 0 A +2 0 = 2M A − 3 0 A 1 − cos T x dx + 2 x2 1 − cos T x dx + 2 x2 TA 1 − cos T x dx x2 = 2M A − 3 0 πT 2 < 0. Suppose in addition that G is of bounded variation in (−∞. . −A ∞ + −∞ A 1 − cos T x {F (x + α) − G(x + α)}dx ≤ 2M A x2 = 4M A −A ∞ + −∞ ∞ A A 1 + cos T x dx x2 1 − cos T x dx. ∞) (for example if G has a density) and that ∞ |F (x) − G(x)|dx < ∞. −∞ Let f (t) and g(t) be the characteristic functions of F and G. Then 1 A≤ 2πM for any T > 0.147 Also. Lemma 7. Proof. respectively. ∞ T −T |f (t) − g(t)| 12 dt + . t Tπ f (t) − g(t) = −it −∞ {F (x) − G(x)}eitx dx. Since F and G are of bounded variation. = 2M T A − 3 0 1 − cos x dx + π x2 proving the result.2.

148 Therefore.1. −T = −∞ (F (x + α) − G(x + α)) =I Writing 1 1 − cos T x = 2 x 2 we see that ∞ T (T − |t|)eitx dt −T I=2 −∞ (F (x + α) − G(x + α)) 1 − cos T x dx x2 which gives ∞ F (x + α) − G(x + α)} −∞ 1 − cos T x dx x2 ≤ 1 2 T −T T f (t) − g(t) −ita e (T − |t|)dt −it f (t) − g(t) dt t ≤ T /2 −T Therefore by Lemma 7. f (t) − g(t) −itα e = −it = −∞ ∞ (F (x) − G(x))e−itα+itx dx −∞ ∞ F (x + α) − G(x + α)eitx dx. Multiply the left hand side by (T − |t|) and integrating gives T −T f (t) − g(t) −itα e (T − |t|)dt −it T ∞ = −T ∞ −∞ {F (x + α) − G(x + α)}eitx (T − |t|)dxdt T = −∞ ∞ {F (x + α) − G(x + α)} −T T eitx (T − |t|)dtdx eitx (T − |t|)dtdx. TA 2M A 3 0 1 − cos x dx − π x2 ≤ 1 2 T −T f (t) − g(t) dt t . It follows from our assumptions that the right hand side is uniformly bounded in α.

t Tπ S √n > x n e−y −∞ 2 /2 dy. Without loss of generality. Then ρ ≥ 1. TA 3 0 1 − cos x dx − π = 3 x2 ∞ 1 − cos x 1 − cos x dx − 3 dx − π 2 x x2 0 TA ∞ π 1 − cos x =3 −3 dx − π 2 x2 TA ∞ π 6 dx 3π −6 −π = − ≥ 2 2 2 TA TA x ∞ Hence. Clearly they satisfy the hypothesis of Lemma 7.39894 < 2/5. T −T |f (t) − g(t)| dt ≥ 2 2M A t π 6 − 2 TA 24M = 2M πA − T or equivalently. Proof of Theorem 7. 2π x∈R Also clearly G is of bounded variation. σ 2 = 1. We need to show that |Fn (x) − Φ(x)|dx < ∞.1. R .149 However. A≤ which proves the theorem.1 and in fact we may take M = 2/5 since 1 sup |Φ (x)| = √ = . We will apply the above lemmas with F (x) = Fn (x) = P and 1 G(x) = Φ(x) = √ 2π x 1 2M T −T 12 f (t) − g(t) dt + .

2 x x 1 x2 . (7. In particular: for x > 0. x2 |F (x) − Φ(x)| ≤ Therefore.1) For x > 0. We obtain 1 |Fn (x) − Φ(x)| ≤ π ≤ 1 π T −T T −T √ 2 |ϕn (t/ n) − e−t /2 | 24M dt + |t| πT √ 2 |ϕn (t/ n) − e−t /2 | 48 dt + |t| 5πT . ≤ 1 Sn E √ 2 x n 2 Sn (1 − Fn (x)) = P √ > x n < 1 x2 and if N denotes a normal random variable with mean zero and variance 1 we also have (1 − Φ(x)) = P (N > x) ≤ 1 1 E|N |2 = 2 . |F (x) − Φ(x)|dx < ∞. P (|X| > x} ≤ by Chebyshev’s inequality. note that Clearly.1) holds and we have veriﬁed the hypothesis of both lemmas. 1 |Fn (x) − Φ(x)|dx < ∞ −1 and we need to verify that −1 ∞ |Fn (x) − Φ(x)|dx + −∞ 1 1 2 λ2 E|X| . x2 hence for all x = 0 we have 1 . Therefore. Φ(x)) ≤ 1 x2 If x < 0 then 2 =P Sn − √ > −x n ≤ 1 Sn E √ 2 x n = 1 x2 1 .150 To see this last fact. (7. we have max(Fn (x). max ((1 − Fn (x)). (1 − Φ(x))) ≤ Sn Fn (x) = P √ < x n and Φ(x) = P (N < x) ≤ Once again.

151 Assume n is large and take T = √ 4 n 3ρ .6. = 5πT 5π4 n 5π n 5π n Next we claim 2 2 1 2t2 |t|3 1 n √ |ϕ (t/ n) − e−t /2 | ≤ e−t /4 + |t| T 9 18 (7. e−t 2 2 /4 Since 2 9 and 1 18 ∞ −∞ ∞ −∞ e−t 2 /4 2 t dt = 8√ π 9 |t|3 e−t 2 /4 dt = 1 18 1 = 18 1 = 18 1 = 18 = ∞ 2 0 ∞ t3 e−t 2 /4 dt /4 2 0 t2 · te−t ∞ 2 dt 2 + 2 0 ∞ 2t · 2e−t te−t 2 /4 dt 8 0 /4 dt 1 18 ∞ − 16e−t 2 /4 0 16 8 = = . Then 36ρ 48 48 · 3 12 · 3 √ ρ= √ ρ= √ . If this were the case then T πT |Fn (x) − Φ(x)| ≤ T e−t −T 3 2 /4 |t|3 48 2t2 + dt + 9 18 5 = |t| 2t 48 + dt + 9 18 5 −T ∞ ∞ 2 2 2 1 ≤ e−t /4 t2 dt + e−t /4 |t|3 dt + 9. 18 9 .6 9 −∞ 18 −∞ = I + II + 9. T = 4 n/3ρ and n ≥ 10.2) √ for −T ≤ t ≤ T .

2). √ −5t2 2 given that 1 − x ≤ e−x .152 Therefore. Then for n ≥ 10. 9 9 For n ≤ 9. πT |Fn (x) − Φ(x)| ≤ This gives.6 π 9 8√ 8 π + + 9. if |t| ≤ T then t ϕ √ n ρ|t| √ n √ 4 ≤ (4/3) < 2 and t/ n = < 2. |Fn (x) − Φ(x)| ≤ 1 πT 3ρ = √ 4 n 3ρ <√ . Recall that n that ϕ(t) − m=0 E(itX)m m! ≤ E(min |tX| (n+1)! ϕ(t) − 1 + n+1 2|tX|n n! . let z = ϕ(t/ n). w = e−t /2n and γ = e 18n . With T = √ 4 n 3ρ .6 . Now. γ n−1 ≤ e−t 2 /4 and the lemma above gives |z n − wn | ≤ nγ n−1 |z − w| . This gives t2 ρ|t|3 ≤ 2 6 ρ|t|3 . the result is clear since 1 ≤ ρ. n √ 8 (1 + π + 9.6 9 √ 8 (1 + π) + 9. It remains to prove (7. 6 and hence |ϕ(t)| ≤ 1 − t2 /2 + for t2 ≤ 2. Thus 3ρ ρ|t|3 6n3/2 ρ|t| |t|2 √ 6 n n 4 t2 18 n t2 + 2n t2 + =1− 2n t2 ≤1− + 2n 5t2 =1− 18n ≤1− ≤e −5t2 18n .

y ∈ Rd we will write x ≤ y if xi ≤ yi for all i = 1. . . 2 · 4n2 6n ≤ ne−t 2 /4 using the fact that |e−x − (1 − x)| ≤ x2 2 . However. . T 9 18 √ using ρ/ n = 1 1 1 4 1 = √ √ ≤ T . Recall that Rd = {(x1 . page 489. xd ): xi ∈ R}.2) and the proof of the theorem.i. If ϕ ∈ L1 . Suppose F has density f . Sn Is it true that the density of √ tends to the density of the normal? This is not n always true. it is true if add some other conditions. for 0 < x < 1. ρ > 1 and n ≥ 10. d and write x → ∞ if xi → ∞ for all . EXi = 0 and EXi = 1. We get 2 2 √ e−t /4 |t|3 1 ρt2 e−t /4 −t2 /2 √ + ϕ(t/ n) − e ≤ |t| 8n 6 n 2 2 ρt |t|3 = e−t /4 √ + 8n 6 n 2 1 −t2 /4 2t |t|3 ≤ e + .. . For any two vectors x.153 which implies that √ √ −5t2 2 2 |ϕ(t/ n) − e−t /2 | ≤ ne 18n (n−1) ϕ(t/ n) − e−t /2n √ 2 t2 t2 ϕ(t/ n) − 1 + − e−t /2n + 1 − 2n 2n 2 √ 2 2 2 t2 t ≤ ne−t /4 ϕ(t/ n) − 1 + + ne−t /4 1 − − e−t /2n 2n 2n 2 2 ρ|t|3 t4 ≤ ne−t /4 3/2 + ne−t /4 . volume 2. 2π §8. . . We state the theorem without proof. . Limit Theorems in Rd . 4 3T and Let us now take a look at the following question. as shown in Feller. Let Xi be i. .d. Sn 2 Theorem. then √ has a n 1 −x2 /2 density fn which converges uniformly to √ e = η(x).This completed n 3 3 n n the proof of (7.

x2 ) =  2/3. then F (b1 . 0 ≤ x1 < 1    0. x→∞ xi →−∞ lim F (x) = 0. a2 ≤ X2 ≤ b2 ) − F (b1 . b2 < ∞. y↓x (iii) F is right continuous. else If 0 < a1 . F has the following properties: (i) If x ≤ y then F (x) ≤ F (y). .1. a2 ) = 1 − 2/3 − 2/3 + 0 = −1/3. x2 ≥ 1. Let X = (x1 .    2/3. a function satisfying (i) ↔ (ii) may not be the distribution function of a random vector. b2 ) P (a < X1 ≤ b1 . However. That is. a2 ) + F (a1 . Example: we must have: P (X ∈ (a1 . . a2 ). b2 ]) = F (b1 . for all A ∈ B(Rd ). x ≥ 1. lim F (x) = F (x). 0 ≤ x ≤ 1 1 2 F (x1 . . xd ) be a random vector and deﬁned its distribution function by F (x) = P (X ≤ x). Need: measure of each vect. b2 ) − F (a1 . a2 < 1 ≤ b1 . ≥ 0. The distribution measure is given by µ(A) = P (X ∈ A). x1 ≥ 1  1.  x1 . . b1 ] × (a2 . unlike the situation of the real line. Example 8. . a2 ) + F (a1 . b2 ) − F (a1 .154 i. b2 ) − F (b1 . (ii) lim F (x) = 1.

. . y2 . . . . recall that A in the closure of A and Ao is its interior and ∂A = A − Ao is its boundary.1. −∞ f (y)dy1 . . and write Fn ⇒ F . if lim Fn (x) = F (x) for all points of continuity n→∞ of F . We leave the proofs to the reader. x ∈ R is called the marginal distributions of F . 0) = 2/3. Xn ). If F is the distribution function of (X1 . Theorem 8. . µn ⇒ µ. Theorem (Skorohod) 8. We also see that Fi (x) = lim F (m. Then there exists a sequence of random vectors Yn and a random vector Y with Yn ∼ Xn and Y ∼ X such that Yn → Y a. . 1) = µ(1. yn )dy1 .. 1) = −1/3 which is a signed measure (not a probability measure)..2. The following statements are equivalent to Xn ⇒ X. xi . m) m→∞ As in the real line. . If Fn and F are distribution functions in Rd . m. As in the real line. xd ) = −∞ . Deﬁnition 8.1. dyn = 1 and x1 xd F (x1 . (i) Ef (Xn ) → E(f (X)) for all bounded continuous functions f . Suppose Xn ⇒ X. dy2 . The following two results are exactly as in the real case. . x2 .155 Hence the measure has µ(0. . . µ(1. . . . . . we also write Xn ⇒ X. F has a density if there is a nonnegative function f with f (y)dy = Rn R ··· R f (y1 . . then Fi (x) = P (Xi ≤ x).e. . . . . we say Fn converges weakly to F . . . . As before.

we say that a sequence of measurers µn is tight if given ε ≥ 0 exists an Mε > 0 such that inf µn ([−Mε . K)). . assume F is continuous at x = (x1 . and set A = (−∞. n→∞ (vi) Let f : Rd → R be bounded and measurable. . x] = (−∞. (i) ⇒ (ii) trivial. . As in the real case. (v) For all Borel sets A with (P (X ∈ ∂A) = 0. So. limP (Xn ∈ K) ≤ P (X ∈ K). Fn (x) = F (xn ∈ A) → P (X∞ ∈ A) = F (x). (i) ⇒ (ii). lim P (Xn ∈ A) = P (X∞ ∈ A). x1 ] × . . That (iii) ⇒ (iv) follows by taking complements. The functions fj are continuous and bounded by 1 and fj (x) ↓ IK (x). K) = inf{|x − y|: y ∈ K}. (iv) For all open sets G. Let d(x. If P (X ∈ Df ) = 0. . then E(f (Xn )) → E(f (X∞ )). Let Df be the discontinuity points of f . (−∞. Mε ]d ) ≥ 1 − ε. For (v) implies convergence in distribution. xd ]. n . . lim sup µn (K) ≤ lim E(fj (Xn )) n→∞ n→∞ = E(fj (X)) and this last quantity ↓ P (X ∈ K) as j ↑ ∞. xd ).156 (iii) For all closed sets K. Xn ⇒ X∞ ⇒ (i) trivial. Proof. since K is closed. Therefore. limP (Xn ∈ G) ≥ P (X ∈ G). Set  1  ϕj (t) = 1 − jt   0 t≤0 0 ≤ t ≤ j −1 −1 ≤t and let fj (x) = ϕj (dist (x. We have µ(∂A) = 0.

T ] ψj (tj )eitj Xj dtj dµ(x) → Rd j=1 π(1(aj .T ]d j=1 d Rd eit·x dµ(x) = [−T.T ]d j=1 d ψj (tj )eit·X dtdµ(x) ψj (tj )eitj Xj dtdµ(x) Rd [−T. The characteristic function of the random vector X = (X1 . .6 above holds also in the setting of Rd .T ]d j=1 ψj (s) = for s ∈ R. . + td Xd . Applying Fubini’s Theorem we have d ψj (tj ) [−T. . . Then µ(A) = lim 1 T →∞ (2π)d T T . . −T −T d ψ1 (t1 )ϕ(t) .bj ] (xj ) dµ(x) and this proves the result.. dt2 ψj (tj )ϕ(t)dt. × [ad . . . .T ]d j=1 ψj (tj ) Rd d eit1 ·x1 +. . eisaj − e−sbj . .T ]d d j=1 = = Rd j=1 d [−T.157 We remark here that Theorem 1.bj ) (xj ) + 1[aj . Xd ) is deﬁned as ϕ(t) = E(eit·X ) where t · X = t1 X1 + . ψd (td )ϕ(t)dt1 . b1 ] × .. is Proof.3 (The inversion formula in Rd ). . 1 = lim T →∞ (2π)d where [−T. Theorem 8.. Let A = [a1 .. . . bd ] with µ(∂A) = 0.+itd ·Xd dµ(x)dt = Rd [−T. . .

M ]d ) ≥ 1 − ε and the result follows. Then P (Xn ∈ [M. 1). Then Xn ⇒ X if and only if ϕn (t) → ϕ(t). then ϕ(t) is the characteristic function of a random vector X and Xn ⇒ X. Also. n Now take M = max1≤j≤d Mj . Xd ) be independent Xi ∼ N (0.4. respectively. As before. it follows from the above argument that If θ · Xn ⇒ θ · X for all θ ∈ Rd then Xn ⇒ X. . Let f (x) = eitx . Xn ⇒ X implies ϕn (x) = E(f (Xn )) → ϕ(t). Remark. e Next let X = (X1 . Proof. Let Xn = θ · Xn . Then for ˜ ∀ s ∈ R ϕn (sθ) → ϕ(sθ). Then X has density 1 −|x|2 /2 d/2 e (2π) where |x|2 = d i=1 |xi |2 . . This is bounded and continuous. Thus the random variables ej · Xn are tight. Mi ]) ≥ 1 − ε. ˜ ˜ ˜ Therefore the distribution of Xn is tight by what we did earlier. Let Xn and X be random vectors with characteristic functions ϕn and ϕ. . This is called the standard normal distribution in Rd and its characteristic function is d ϕX (t) = E j=1 eitj Xj = e−|t| 2 /2 . This is often called the Cram´r–Wold devise. Fix θ ∈ Rd . . Let ε > 0. As before.158 Theorem (Continuity Theorem) 8. if ϕn (t) → ϕ(t) and ϕ is continuous at 0. one direction is trivial. There exists a constant positive constant Mi such that lim P (ej · Xi ∈ [Mi . Then ϕX (s) = ϕX (θs) and ˜ ϕXn (s) → ϕX (s). For the other direction we need to show tightness. .

the random vector Y = AX has a multivariate normal distribution with covariance matrix Γ. AT t ij = |AT t|2 ≥ 0. Thus Γ = (Γij ) = AAT . ΓT = Γ. Conversely. t = AT t. Γij ti tj = Γt. and set Y = AX where X is standard normal. That is. ϕY (t) = E(eit·AX ) = E(eiA = e− − 2 T t·X ) |AT t|2 P ij Γij ti tj =e .159 Let A = (aij ) be a d × d matrix. Also the quadratic form of Γ is positive semideﬁnite. and the matrix γ is symmetric. Then there exists an orthogonal matrix O such that OT ΓO = D . So. The covariance matrix of this new random vector is Γij = E(Yi Yj ) d d =E l=1 d d ail Xl · m=1 ajm Xm = l=1 m=1 d ail ajm E(Xl Xm ) = l=1 ail ajl . let Γ be a symmetric and nonnegative deﬁnite d × d matrix.

Let X1 .d. if we let Y = AX.j − µj )(X1. . EXi = 0 and E|Xi |3 = ρ < ∞. Then if Sn Fn is the distribution of √ and Φ(x) is the normal we have σ n sup |Fn (x) − Φ(x)| ≤ x∈R σ cρ √ 3 n (may take c = 3). EXn = µ and covariance matrix Γij = E(X1. Let D0 = √ D and A = OD0 .i.d.160 where D is diagonal.i. So. be i. with Sn = j=1 (t · Xj ) we have − P ij ϕSn (1) = E(e ˜ This is equivalent to ϕSn (t) = E(e ˜ iS n )→e Γij ti tj /2 it·Sn − P ij )→e Γij ti tj /2 . By setting Xn = Xn −µ we may assume µ = 0.i − µi )). . Let t ∈ Rd . X normal. Then Xn = t·Xn ˜ are i. . If Γ is non–singular. . Theorem 8. . then Y is multivariate normal with covariance matrix Γ. Theorem.i. If Sn = X1 + . T Then AAT = OD0 (D0 OT ) = ODOT = Γ. random vectors. random variables with E(Xn ) = 0 and d 2 ˜ E|Xn |2 = E i=1 n ti (Xn )i = ij ti tj Γij .d. so is A and Y has a density. X2 . E|Xi |2 = σ 2 . ˜ So. Let Xi be i. . + Xn then Sn − nµ √ ⇒χ n where χ is a multivariate normal with covariance matrix Γ = (Γij ).5. ˜ Proof.

Lemma 1. Put a = b − ∆ < b. .f. either   F (b) − G(b) = 2M ∆  or   F (b−) − G(b) = −2M ∆. x→+∞ lim G(x) = 1.t.. . is > 0 so that ∆ > 0. Assume L. G a real–valued function with the following conditions: (i) x→−∞ lim G(x) = 0. .   2M ∆  .161 ?? is actually true: H1 (x) H2 (x) H3 (x) Fn (x) = φ(x) + √ + . Since F − G = 0 at ±∞.t. ∆ < ∞ since G is bounded. ∀ T > 0 T∆ 2M T ∆ 3 0 ∞ 1 − cos x dx − π x2 ≤ −∞ 1 − cos T x {F (x + a) − G(x + a)}dx . . + 3/2 + . Set A = x∈R There is a number a s. Let F be a d. ∃ sequence xn → b ∈ R s. since ∆ = (b − a) −b<−a a<b . . 2M x∈R (ii) G has bounded derivative with sup |G (x)| ≤ M . Assume F (b−) − G(b) = −2M ∆.H. + n n n where Hi (x) are explicit functions involving Hermid polynomials. 1 sup |F (x) − G(x)|. F (xn ) − G(xn ) → or   −2M ∆ So. x2 b<−a a>0 Proof.S.

x2 (2) . So that F (x + a) − G(x + a) ≤ F (b−) − [G(b) + (x − ∆)M ] = −2M ∆ − xM + ∆M = −M (x + ∆) ∀ x ∈ [−∆.162 if |x| < ∆ we have G(b) − G(x + a) = G (ξ)(b − a − x) = G (ξ)(∆ − x) |G (ξ)| ≤ M or G(x + a) = G(b) + (x − ∆)G (ξ) ≥ G(b) + (x − ∆)M. ∆] ∞ ∆ ∴ T to be chosen: we will consider −∞ = −∆ +rest ∆ −∆ 1 − cos T x {F (x + a) − G(x + a)}dx x2 ∆ −∆ ≤ −M = −2M ∆ 1 − cos T x (x + ∆)dx x2 ∆ 0 (1) 1 − cos T x x2 dx −∆ ∞ + −∞ ∆ {F (x + a) − G(x + a)}dx −∆ ∞ ≤ 2M ∆ −∞ + ∆ 1 + cos T x dx = 4M ∆ x2 ∞ ∆ 1 − cos T x dx.

Suppose in addition that (iii) G is of bounded variation in (−∞. Then 1 ∆≤ 2πM Proof. ∞ (iv) −∞ |F (x) − G(x)|dx < ∞. g(t) = −∞ −∞ T −T eitx dG(x). = 2M T ∆ − 3 0 1 − cos x dx + π x2 Lemma 2. (Assume G has a density). t πT ∞ {F (x) − G(x)}eitx dx and ∴ f (t) − g(t) −ita e = −it = −∞ ∞ (F (x) − G(x))e−ita+itx dx −∞ ∞ F (x + a) − G(x + a)e+itx dx. f (t) − g(t) = −it −∞ |f (t) − g(t)| 12 dt + .163 Add: ∞ −∞ 1 − cos T x {F (x + a) − G(x + a)}dx x2 ∆ ∞ ≤ 2M ∆ − 0 +2 ∆ ∆ ∞ 1 − cos T x dx x2 1 − cos T x dx x2 ∞ 0 = 2M ∆ − 3 0 ∆ +2 0 = 2M ∆ − 3 0 ∆ 1 − cos T x dx + 2 x2 1 − cos T x dx + 2 x2 T∆ 1 − cos T x dx x2 = 2M ∆ − 3 0 πT 2 < 0. ∞). . Let f (t) = ∞ ∞ eitx dF (x).

R.H. 3 ∞ 1 − cos x 1 − cos x dx − π = 3 dx − 3 x2 x2 0 0 ∞ π 1 − cos x =3 −3 dx − π 2 x2 T∆ ∞ 3π dx π 6 ≥ −6 −π = − 2 2 2 T∆ T∆ x T∆ ∞ T∆ 1 − cos x dx − π x2 .S.164 ∴ By (iv). Multiply left hand side by (T −|t|) and integrade T −T f (t) − g(t) ita e (T − |t|)dt −it T ∞ = −T ∞ −∞ {F (x + a) − G(x + a)}eitx (T − |t|)dxdt T = −∞ ∞ {F (x + a) − G(x + a)} −T T eitx (T − |t|)dtdx eitx (T − |t|)dtdx −T = −∞ (F (x + a) − G(x + a)) Now.H. 1 1 − cos T x = 2 x 2 above ∞ T (T − |t|)eitx dt −T = 2 or ≤ 1 − cos T x dx x2 −∞ ∞ 1 − cos T x F (x + a) − G(x + a)} dx x2 −∞ (F (x + a) − G(x + a)) 1 2 T −T T f (t) − g(t) −ita e (T − |t|)dt −it f (t) − g(t) dt t ≤ T /2 −T ∴ Lemma 1 ⇒ T∆ 2M ∆ 3 0 1 − cos x dx − π x2 T ≤ −T |f (t) − g(t)| dt t or now. is also. is bounded and L.S.

2 1 1 sup |Φ (x)| = √ e−x /2 = √ = .i. ρ ≥ 1. P (|X| > x) = Sn 1 − Fn (x) = P √ > x n Sn 1 1 ≤ E √ < 2 |X| |X|2 n 1 1 1 − Φ(x) = P (N > x) ≤ 2 E|N |2 = . S √n > x . G(x) = φ(x) = P (N > x) = √ 2π −∞ F (x) = Fn (x) = P Clearly Fn and Φ satisfy (i).39894 < 2/5 = M. Clearly. |Fn (x) − Φ(x)|dx + −∞ 1 |F (x) − Φ(x)|dx < ∞.d. For x > 0.f. 2 Assume w log σ 2 = 1. x |x|2 . satisfying certain conditions by the average diﬀerence of. Need: −1 ∞ 1 −1 |Fn (x) − φ(x)|dx < ∞. Xi i. 1 2 λ2 F |x| . Now we apply this to our functions: Assume w log σ 2 = 1. x∈ 2π 2π (iii) Satisfy: (iv) R |Fn (x) − Φ(x)|dx < ∞.165 or T −T |f (t) − g(t)| dt ≥ 2 2M ∆ t π 6 − 2 T∆ 24M = 2M π∆ − T |f (t) − g(t)| 12 dt + |t| π or 1 ∆≤ 2M T T −T We have now bound the diﬀerence between the d. n x 2 1 e−y /2 dy.

Claim: 2 1 n √ |ϕ (t/ n) − e−t /2 | ≤ |t| 1 −t2 /4 2t2 |t|3 −T ≤ t ≤ T √ ≤ e + . Take T = √ 4 n 3ρ : Then 48 48 · 3 12 · 3 36ρ √ ρ= √ ρ= √ . Φ(x)) ≤ |F (x) − Φ(x)| ≤ {|F (x) − φ(x)| = 1 x2 1 if x < 0 x2 1 . x2 = |1 − φ(x) − (1 − F (x))| ≤ x>0 ∴ (iv) hold.166 In particular: for x > 0. Then Sn Fn (x) = P √ < x n Φ(x) = P (N < x) ≤ x<0 1 |x|2 . max((1 − Fn (x)). tells us what we must do. 1 |Fn (x) − Φ(x)| ≤ π ≤ 1 π T −T T −T √ 2 |ϕn (t/ n) − e−t /2 | 24M dt + |t| πT √ 2 48 |ϕn (t/ n) − e−t /2 dt + |t| 5πT √ n. = 5πT 5π4 n 5π n 5π n For second. =P 1 x2 Sn − √ > −x n ≤ Sn 1 E √ x2 n 2 = 1 x2 ∴ max(Fn (x). max 1 − Φ(x)) ≤ If x < 0. T 9 18 T = 4 n/3ρ n ≥ 10 . Take T = multiple of Assume n ≥ 10. .

6 ≤ 9 −∞ 18 −∞ = I + II + 9.6 Recall: √ 1 2πσ 2 e−t 2 /2σ 2 2 t dt = σ 2 . Take σ 2 = 2 2 9 ∞ −∞ e−t 2 /2σ 2 2 t dt = 2√ 2 · 2 · 2√ 2π · 2 · 2 = π 9 9 8√ = π 9 1 18 ∞ −∞ |t|3 e−t 2 /2σ 2 dt ∞ ∞ 2 1 1 3 −t2 /4 2 t e dt = 2 t2 · te−t /4 dt = 18 18 0 0 ∞ 2 1 = + 2 2t · 2e−t /4 dt 18 0 ∞ ∞ 1 1 −t2 /4 −t2 /4 = 8 te dt = − 16e 18 18 0 0 16 8 = = 18 9 8√ 8 πT |Fn (x) − Φ(x)| ≤ π + + 9.6. 9 9 or |Fn (x) − Φ(x)| ≤ 1 πT 3ρ ≤ √ 4 n ρ =√ n √ 8 (1 + π + 9.6 9 √ 8 (1 + π) + 9. 2 T 9.167 T ∴ πT |Fn (x) − Φ(x)| ≤ −T e−t 2 /4 2t2 |t|3 + dt 9 18 48 + − 5 |t|3 48 2t2 = e−t /4 + dt + 9 18 5 −T ∞ ∞ 2 2 1 2 e−t /4 t2 dt + e−t /4 |t|3 dt + 9.6 .6 π 9 3ρ <√ . n .

gives that 1 − x ≤ e−x . |αn − β n | ≤ nγ n−1 |α − β| ⇔ √ √ −5t2 2 2 |ϕ(t/ n) − e−t /2 | ≤ ne 18n (n−1) |ϕ(t/ n) − e−t /2n | 2 2 t2 ≤ ne |ϕ(t/ n) − 1 + − e−t /2n−1−t /2n | 2n √ 2 2 2 t2 t2 | + ne−t /4 |1 − − e−t /2n | ≤ ne−t /4 |ϕ(t/ n) − 1 + 2n 2n 3 4 2 2 ρ|t| t ≤ ne−t /4 3/2 + ne−t /4 2 · 4n2 6n x2 for 0 < x < 1 using |e−x − (1 − x)| ≤ 2 or 2 2 √ 1 ρt2 e−t /4 e−t /4 |t|3 −t2 /2 √ |≤ + |ϕ(t/ n) − e |t| 8n 6 n 2 2 ρt |t|3 = e−t /4 √ + 8n 6 n 2 1 −t2 /4 2t |t|3 ≤ e + T 9 18 −t2 /4 √ . (2) |ϕ(t)| ≤ 1 − t2 /2 + 6 √ 4 n 16 ρ|t| So. for t2 ≤ 2. √ −5t2 2 2 Now. 3ρ t t2 ρ|t|3 t2 ρ|t| |t|2 ϕ √ ≤1− + 3/2 = 1 − +√ 2n 6n 2n n n n 4 t2 5t2 t2 + =1− ≤1− 2n 3 n 18n ≤e −5t2 18n . t/ n = < 2. Recall: σ 2 = 1. m! (n + 1)! n! m=0 n (1) ϕ(t) − 1 + ρ|t|3 . let α = ϕ(t/ n). 3ρ 9 n √ 4 ⇒ Also. So. Proof of claim. 1 < ρ since σ 2 = 1.168 For n ≤ 9. β = e−t /2n and γ = e 18n . |ϕ(t) − ρ|t|3 t2 ≤ and 2 6 E(itX)m |t|X|nt 2|tX|n ≤ E(min . the result follows. Then n ≥ 10 ⇒ γ n−1 ≤ e−t /4 . if T = ≤ (4/3) = if |t| ≤ L ⇒ √ < 2.

X: ω → Rd . . . i = 1. F has the following properties: 1 2 eitx ϕn (t)dt R eity e− 2 t dt R ∞ −∞ 1 2 √ 2 |ϕ(t/ n)n − e−1/2t |dt . 2π Proof. ρ > 1 and n ≥ 10. EXi = 0. 1 2π 1 n(x) = 2π 1 ∴ |fn (x) − η(x)| ≤ 2π fn (x) = under the assumption |ϕ(t)| ≤ e− 4 t for |t| < δ. . If X = (x1 . i.e. .D. both sides are 0. Both have ?? derivatives of social derivative of ϕ(t) at 0 is −1 smaller than the second derivative of r. p. 3 n 3 3 n n Q. . Limit Theorems in Rd Recall: Rd = {(x1 . it is true if more conditions. However. F Xi = 1.i.E.169 √ 1 1 1 4 1 using ρ/ n = 4 T and = √ √ ≤ T · . Sn Theorem.v. where X ≤ x ⇔ Xi ≤ xi .h. xd ): xi ∈ R}. then √ has a density fn which converges uniformly to n 1 −x2 /2 √ e = η(x). 2. (Feller v. At 0. . We deﬁned its distribution function by F (x) = P (X ≤ x).s. . . 489). Sn Question: Suppose F has density f . a r.d. d. If ϕ ∈ L1 . 2 Let Xi be i. . Xd ) is a random vector. . Is it true that the density of √ tends to the n density of the normal? This is not always true. . .

If you have a function satisfying (i) ↔ (ii).e. B(Rd )): µ(A) = P (X ∈ A). 0 ≤ x2 ≤ 1 . y↓x (iii) F is right cont. a2 < 1 ≤ b1 . There is also the distribution measure on (Rd . b2 < ∞ ⇒ F (b1 . a2 ). i. . this may not induce a measure. x→∞ xi →−∞ lim F (x) = 0. b2 ]) = F (b1 . b2 ) P (a < X1 ≤ b1 . (ii) lim F (x) = 1. b2 ) − F (a1 . x2 ) = 2/3 x1 ≥ 1. Example: we must have: P (X ∈ (a1 . b1 ] × (a2 . You know what Xi → −∞. a2 ≤ X2 ≤ b2 ) − F (b1 . a2 ) + F (a1 . a2 ) + F (a1 . Need: measure of each vect. 0 ≤ x1 < 1    else If 0 < a1 . f (x1 . lim F (x) = F (x).  x1 .170 (i) x ≤ y ⇒ F (x) ≤ F (y). a2 ) = 1 − 2/3 − 2/3 + 0 = −1/3. x1 ≥ 1 1    2/3   Example. ≥ 0. Xp → ∞ we mean each coordinate goes to zero. b2 ) − F (a1 .   0 x2 ≥ 1. b2 ) − F (b1 .

(iv) ∀ open sets G. m) m→∞ F has a density if ∃ f ≥ 0 with Rd 0 = f and x1 xd F (x1 .171 The measure has: µ(0. . x real in the marginal distributions of F Fi (x) = lim F (m. −∞ f (y)dy1 . µ(1. (v) ∀ continuity A. Xn ∼ Xn . n→∞ lim P (Xn ∈ A) = P (X∞ ∈ A). then Fi (x) = P (Xi ≤ x). (iii) ∀ closed sets k. if lim Fn (x) = F (x) for all pts of continuity of F . . . . Theorem 2. f . Xn ). A Borel set A is a µ–continuity set if µ(∂A) = 0. m. Xn ⇒ X∞ ⇒ ∃ r. Yn → Y a. ∂A = A − Ao . . limP (Xn ∈ G) ≥ P (X∞ ∈ G). . .v. . 1) = −1/3 for each. .. .. Def: If F Fn is a distribution function in Rd . n→∞ Xn ⇒ X.t. Y ∼ X∞ s. dy2 . we say Fn converges weakly to F . The following statements are equivalent to Xn ⇒ X∞ . (P (X ∈ ∂A). .e. µn ⇒ µ. 1) = µ(1. x2 . limP (Xn ∈ k) ≤ P (X∞ ∈ k). need measure of ?? Other simple ?? Recall: If F is the dist. . Recall: A = set of limits of sequences in A. 0) = 2/3. Ao = Rd \(Rd |A) interior. . xd ) = −∞ . if Fn ⇒ F . . of (x1 . (i) Ef (Xn ) → E(f (X∞ ))∀ bounded cont. xi+1 . closure of A. Theorem 1 Skorho.

Let d(x.. .. Xd ) be a r. So.f. Proof. ϕ(t) = E(eit·X ) is the Ch. fj is cont. Let X = (X1 . . x = (x1 . . xd ]. k)). and bounded by 1 and fj (x) ↓ Ik (x) since k is closed. ∴ lim sup µn (k) = lim E(fj (Xn )) n→∞ n→∞ = E(fj (X∞ )) ↓ P (X∞ ∈ k). at X. . µn is tight if ∃ given ε ≥ 0 s.D. . b1 ] × . . (i) ⇒ (ii). (i) ⇒ (ii) trivial. Q. bd ] with µ(∂A) = 0. M ]d ) ≥ 1 − ε. Ch. Then µ(A) = lim 1 T →∞ (2π)d T T .172 (vi) Let Dp = discontinuity sets of f . + td X d . k) = inf{d(x− y): y ∈ k}. t · X = t1 X 1 + .E. (−∞. . then with A = (−∞. dt2 . . in dis. Xn ⇒ X∞ ⇒ (i) trivial. As in 1–dim. (v) ⇒ implies conv. Q.t. .D. −T −T ψ1 (t1 )ϕ(t) . inf µn ([−M. µnj −→ µ. . . If F is cont. × [ad . . If µn is tight ⇒ ∃µnj s. n weak Theorem. . Fn (x) = F (xn ∈ A) → P (X∞ ∈ A) = F (x). . . f bounded. f. (iii) ⇒ (iv): A open iﬀ Ac closed and P (X ∈ A) + P (X ∈ Ac ) = 1. .  1  ϕj (t) = 1 − jt   0 t≤0 0 ≤ t ≤ j −1 −1 ≤t Let fj (x) = ϕj (dist (x. . ψd (td )ϕ(t)dt1 .E.t. xn ) we have µ(∂A) = 0. . . vector. Inversion formula: Let A = [a1 . x1 ] × . . If P (X∞ ∈ Dp ) = 0 ⇒ E(f (Xn )) → E(f (X∞ )).

T ]d j=1 d ψj (tj )eit·X dtdµ(x) ψj (tj )eitjXj dtdµ(x) = Rd [−T. 1 ≤ n ≤ ∞ be random vectors with Ch. Let Xn .. Continuity Theorem. f (x) = eitx is bounded and cont..+itd ·Xd dµ(x)dt = Rd [−T.T ]d j=1 Proof.T ]d j=1 ψj (tj ) Rd d eit1 ·x1 +. Next.T ] ψj (tj )eitjXj dtj dµ(x) → Rd j=1 π(1(aj . . Proof. Then d ψj (tj ) [−T. Xn ⇒ x implies ϕn (x) = E(f (Xn )) → ϕ∞ (t).173 where ψj (s) = eisaj − e−sbj is 1 T →∞ (2π)d d = lim ψj (tj )ϕ(t)dt [−T. f. b1 ] × .’s ϕn . Apply Fubini’s Theorem: A = [a1 . .T ]d d j=1 = Rd j=1 d [−T. . that sequence is tight. × [ad . we show as earlier.T ]d j=1 d Rd eit·x dµ(x) = [−T. Then Xn ⇒ X∞ ⇔ ϕn (t) → ϕ∞ (t). bd ] with µ(∂A) = 0.bj ) (xj ) + 1[aj .bj ] (xj ) dµ(x) results = µ(A). to.

n lim P (Xi ∈ [Mi . 1). . So. given ε > 0. .D. e Next let X = (X1 .174 ˜ Fix O ∈ Rd .E. ej · Xn is tight. As before.D. Thus. then ϕ∞ (t) is the Ch. of a r.E. Implies: Xn ⇒ X This is called Cram´r–Wold device. Then for ∀ s ∈ R ϕn (sO) → ϕ∞ (sO). if ϕn (t) → ϕa (t) and ϕ∞ is cont. Proof the condition implies E(eiO·Xn ) → E(eiO·Xn )∀ O ∈ Rd · ϕn (O) → ϕ(O) . Xd ) be independent Xi ∼ N (0.t. X∞ and Xn ⇒ X∞ . π[Mi . . Mi ] take M = largest of all. Mi ]) > 1 − ε. .vec. ˜ ˜ ˜ ∴ The dist. n Now. Q. M ]) ≥ 1 − ε. Last time: (Continuity Theorem): Xn ⇒ X∞ iﬀ ϕn (t) → ϕ∞ (t). Then ˜ ϕXn (s) → ϕX (s). for each vector. We showed: O · Xn ⇒ O · X∞ ∀ O ∈ Rd . P (Xn ∈ [M. We also showed: Cram´r–Wold device e If O · Xn ⇒ O · X∞ ∀ O ∈ Rd ⇒ Xn ⇒ X. 2π .f. Let Xn = O · Xn : Then ϕX (s) = ϕX (Os). Q. ∃Mi s. Remark. at 0. Then Xi has |x|2 1 density √ e− 2π . of Xn is tight.

Bx. ΓT = Γ.f. B + x . . . The Ch. Also. Let A = (aij ) be a d × d matrix. . d 2 ϕ(t) = E j=1 eitj Xj = e−|t| /2 . So. |x| = i=1 2 |xi |2 . xd ). At t ij = |At t| ≥ 0. X normal d Yj = l=1 ajl Xl Let Γij = E(Yi Yj ) d d =E l=1 d d ail Xl · m=1 ajm Xm = l=1 m=1 d ail ajm E(Xl Xm ) = l=1 ail ajl Γ = (Γij ) = AAT . Γ is symmetric. t = At t. Γij ti tj = Γt. Recall: For any matrix. . . d/2 (2π) d x = (x1 . Let Y = AX. x = x.175 ∴ X has density 2 1 e−|x| /2 . This is called the standard normal.

j − µj )(X1. . Let D0 = √ D and A = OD0 . be i. If Sn = X1 + . OT ΓO = D − D diagonal. Xn then Sn − nµ √ ⇒χ n where χ is a multivariate normal with covariance matrix Γ = (Γij ). Letting Xn = Xn − µ we may assume µ = 0.176 So.i. Then ∃ O orthogonal s. . X normal. Conversely: Let Γ be a symmetric and nonnegative deﬁnite d × d matrix. random vectors. So. Then Xn = t · Xn ˜ are i. So.t. . if we let Y = AX. X2 . Theorem. the random vector Y = AX has a multivariate normal distribution with covariance matrix Γ. then Y is multivariate normal with covariance matrix Γ. Γ is nonnegative deﬁnite. . E(eit·AX ) = E(eitA = e− − ‡ t·X ) |AT t|2 2 P ij Γij ti tj =e . EXn = µ and covariance matrix Γij = E(X1.i − µi )). . random variables with E(Xn ) = 0 and d 2 ˜ E|Xn |2 = E i=1 ti Xni = ij ti tj Γij . Let t ∈ Rd . T Then AAT = OD0 (D0 OT ) = ODOT = Γ. . Let X1 .d. ˜ Proof. so is A and Y has a density.d.i. If Γ is non–singular.

with Sn = j=1 (t · Xj ) we have − P ij ϕSn (1) = E(e ˜ or ˜ iS n )→e Γij ti tj /2 ϕSn (t) = E(eit·Sn ) → Q.177 n ˜ So.D. .E.

. #4)(i) As in the proof of the reﬂection property. 1. s < t. we apply the strong Markov property to get Ex (Yτ1 ◦ θτ |Fτ ) = Ex (Yτ2 ◦ θτ |Fτ ) why . #3a) Use the “exponential” martingale and . #1b) Approximate the integral by 1 n 1 n k=1 Bk−1 Bk − n n 2 1 and . Ideas for some of the problems in ﬁnal homework assignment. Fall 1996. 2a − v < ω(t − s) < 2a − u 0 else.178 Math/Stat 539. . #2c) Show φ(2λ) ≤ cφ(λ) for some constant c. 1 E 0 Bt dt =E 0 1 1 0 Bs Bt dsdt E(Bs Bt )dtds 0 1 s 1 =2 =2 0 2 sdtds = s 1 . Let Ys1 (ω) = and Ys2 (ω) = 1. p (ec/ε2 − 2p ) ε ∞ #2b) Use (a) and sum the series for the exponential. 3 #2a) Use the estimate Cp = 2p C1 ec/ε √ from class and choose ε ∼ c/ p. 0 s < t and u < ω(t − s) < v else Then Ex (Ys1 ) = Ex (Ys2 ) (why?) and with τ = (inf{s: Bs = a}) ∧ t. .. Use formula E(φ(X)) = 0 φ (λ)P {X ≥ λ}dλ and apply good–λ inequalities. #3b) Take b = 0 in #3a).

gives the result. and . page 402.. v) ↓ x to get P0 {Mt > a. #7)(i) E(Xn+1 |Fn ) = eθSn −(n+1)ψ(θ) E(eθξn+1 |Fn ) = eθSn −nψ(θ) (ξn+1 independent of Fn ).179 and . #5a) Follow Durrett. This implies Xn → 0 in probability. Bt = x} = P0 {Bt = 2a − x} = √ and diﬀerentiate with respect to a.. #4)(i) Let the interval (u. and apply the Markov property at the end. . (ii) Show ψ (θ) = ϕ (θ)/ϕ(θ) and ϕ (θ) ϕ(θ) ϕ (θ) − = ϕ(θ) ϕ (θ) ϕ(θ) 2 (2a−x)2 1 e− 2t 2πt = E(Yθ2 ) − (E(Yθ ))2 > 0 where Yθ has distribution (iii) eθx (distribution of ξ1 ). (Why is true?) ϕ(θ) θ Xn = e 2 θ Sn − n ψ(θ) 2 θ 1 θ/2 = Xn en{ψ( 2 )− 2 ψ(θ)} Strict convexity. imply that E θ Xn = en{ψ( 2 )− 2 ψ(θ)} → 0 θ 1 θ as n → ∞... ψ(0) = 0.

Thus a set is null iﬀ every measurable subset has measure zero. 1]. . (Negative sets): A set A ∈ F is negative if for every measurable subset E ⊂ A. µ = µ1 + µ2 is a measure. 476 Signed measures: If µ1 and µ2 are two measures. measures. ν(E) ≤ 0. ∞ ∞ (ii) ν j=1 Ej = j=1 ν(Ej ). By a signed measure on a measurable space (Ω. f ∈ L1 [0. (a) The Radon–Nikodym Theorem. By (iii) we mean that ∞ the series is absolutely convergent if ν j=1 ∞ Ej is ﬁnite and properly divergent if ν j=1 Ej is inﬁnite or − inﬁnite. Example.180 Chapter 7 1) Conditional Expectation. we could add them. (Null): A set which is both positive and negative is a null set. i. Ej ’s are disjoint in F. particularly prob. then (if f ≥ 0. But what about µ1 − µ2 ? Deﬁnition 1.1. Durrett p.e. F) we mean an extended real valued function ν deﬁned on F such that (i) ν assumes at most one of the values +∞ or −∞. get a measure) ν(E) = E f dx (Positive sets): A set A ∈ F is a positive set if ν(E) ≥ 0 for every mble subset E ⊂ A.

. ∞ (ii) If A1 . . If E is positive we are done.1. Let E be measurable with 0 < ν(E) < ∞. : Let A = n=1 Ai . This is called the Hahn–Decomposition. Null sets are not the same as sets of measure zero. ∴ x ∈ Ei if j > i.181 Remark. ∩ Ac ⊂ Aj 1 j−1 ⇒ ν(Ej ) ≥ 0 ν(F ) = Σν(Ej ) ≥ 0 (*) We show (∗): Let x ∈ Ej ⇒ x ∈ E and x ∈ Ej but x ∈ Aj−1 . consider E|E1 ⊂ E. Then x ∈ Ej . A2 . Take ν given above. Proof. Example. . A positive such that 0 < ν(A). let j = ﬁrst j such that x ∈ Aj . (i): Trivial. . are positive then A = i=1 Ai is positive. Write  ∞ E= Ei . Our goal now is to prove that the space Ω can be written as the disjoint union of a positive set and a negative set. . Then there is mble set A ⊂ E. Ei ∩ Ej = 0. Ai positive. Let E ⊂ A be mble. Lemma 1. If x ∈ E. . i = j n=1  Ej = E ∩ Aj ∩ Ac ∩ . ∞ Proof of (ii). .2. . Lemma 1. A1 . (Such a j exists become E ⊂ A). done. (i) Every mble subset of a positive set is positive. Proof. . Now. Let n1 = smallest positive number such that there is an E1 ⊂ E with ν(E1 ) ≤ −1/n1 .

If not n2 = smallest integer such that 1) ∃E2 ⊂ E|E1 with ν(E2 ) < 1/n2 . Why? ∞ E =A∪ k=1 Ek are disjoint.182 Again. if E|E1 is positive with ν(E|E1 ) > 0 we are done. Now. ∞ ν(E) = ν(A) + k=1 ν(Ek ) ⇒ ν(A) > 0 since negative. Problem 1: Prove that A is also positive. ∞ 0 < ν(E) < ∞ ⇒ k=1 ν(Ek ) converges. Claim: A will do. Continue: Let nk = smallest positive integer such that k−1 ∃Ek ⊂ E j=1 Ej with ν(Ek ) < − Let A = E| k=1 1 . . nk ∞ Ek . First: ν(A) > 0.

Assume ν does not take +∞. λ = lim ν(An ).t. φ ∈ Positive sets.183 Absolutely. . This will do it. Let B = Ac . ∴ 0 ≤ ν(A). We show ν(E) = 0. F). Suppose A is not positive. Since A\An ⊂ A ⇒ ν(A|An ) ≥ 0 and ν(A) = ν(An ) + ν(A|An ) ≥ ν(An ). Also. Proof. Thus. Let λ = sup{ν(A): A ∈ positive sets}. Let E ⊂ B and E positive. Then A has a subset A0 with A0 ) < −ε for some ε > 0. Let ν be a signed measure on (Ω. n→∞ Set A= ∞ An . Claim B is negative. n=1 A is positive.1 (Hahn–Decomposition). Theorem 1. Let An ∈ p s. sup ≥ 0. since Σ nk < ∞. ∴ ∞ k=1 1 <− nk ν(Ek ) < ∞. nk → ∞. λ ≥ ν(A). A ∪ B = X. There is a positive set A and a negative set B with A ∩ B = φ. 1 Now. ν(A) ≥ λ ⇒ 0 ≤ ν(A) = λ < ∞.

Two measures ν1 and ν2 are mutually singular (ν1 ⊥ ν2 ) if there are two measurable subsets A and B with A ∩ B = φ A ∪ B = Ω and ν1 (A) = ν2 (B) = 0. Notice that ν + (B) = 0 and ν − (A) = 0.E.2. b] ν(E) = E f dx. Clearly ν(E) = ν + (E) − ν − (E). The Hahn decomposition give two measures ν + and ν − deﬁned by ν + (E) = ν(A ∩ E) ν − (E) = −ν(B ∩ E).184 For suppose E ⊂ B. Notice that ν + ⊥ ν − . To show ν(E) = 0. ν − (E) = − E f − dx. f ∈ L1 [a. 0 < ν(E) < ∞ ⇒ E has a positive subset of positive measure by Lemma 1. Example. Deﬁnition 1. Problem 1. Q.D. Remark 1. This decomposition is unique.1. .2. Theorem 1. These are two mutually singular measures ν + and ν − such that ν = ν+ − ν−. observe E ∪ A is positive ∴ λ ≥ ν(E ∪ A) = ν(E) + ν(A) = ν(E) + λ ⇒ ν(E) = 0.2 (Jordan Decomposition). Then ν + (E) = E f + dx. Let ν be a signed measure.b: Give an example to show that the Hahn decomposition is not unique.

Theorem 1. . µ) be σ–ﬁnite measure spaces. µ) = ([0. f dµ ⇒ f (x) = 0 for all x ∈ [0. The space needs to be σ–ﬁnite. 3. Let (Ω.185 Deﬁnition 1. Example. F. If m(E) = E m = Lebesgue. dµ Remark 1. ∴ m = 0. Borel. The function f is unique a.2. 1]. (Ω. Assume ν << µ. 1]. ν << µ. Then there is a nonnegative measurable function f such that ν(E) = E f dµ.e. We call f the Radon–Nikodym derivative of ν with respect to µ and write f= dν . F. if µ(A) = 0 implies ν(A) = 0.3. Then m << µ. Contra. [µ].3 (Radon–Nikodym Theorem). Example. µ = counting). Lemmas. Let f ≥ 0 be mble and set ν(A) = A f dµ. The measure ν is absolutely continuous with respect to µ. written ν << µ.

on Bα and f (x) ≥ α a. β < λ ⇒ f (α) < λ. Assume µ(Ω) = 1.E. • If x ∈ Bα . Q.3 but this time α < β implies only µ{Dα \Bβ } = 0. (1) (2) .e. να is a signed measure.E. Suppose {Bα }α∈D as in Lemma 1. Proof of Theorem 1. on Bα .4 is unique µ a. Bα = φ. Then there is a mble c function f such that f (x) ≤ α on Bα and f (x) ≥ α on Bα . inf{φ} = ∞. Suppose {Bα }α∈D is a collection of mble sets index by a countable set of real numbers D. If f (x) < λ. Claim: ∀λ real {x: f (x) < λ} = β<λ β∈D Bβ . Notice: Ω = Bα . Lemma 1. x ∈ Bβ for any β < α and so. if α ≤ 0. then f (x) ≤ α provided we show f is mble. then x ∈ Dβ save β < λ.5.e.3. Then the function in Lemma 1. Suppose Bα ⊂ Bβ whenever α < β. Proof.3. Then there exists a mble function f on Ω such that f (x) ≤ α c a. Suppose D is dense. Let να = ν − αµ.186 Lemma 1. Bα } be the Hahn–Decomp of να . For x ∈ Ω.D. Let {Aα .D. If x ∈ Bβ .4.3 is unique and the function in lemma 1. f (x) ≥ α. Bα |Bβ = Bα ∩ (X|Bβ ) = Bα ∩ Aβ . Lemma 1.e. set f (x) = ﬁrst α such that x ∈ Bα = inf{α ∈ D: x ∈ Bα }. α ∈ Q. Q. • If x ∈ Bα .

N N .e. 1. . on Aα and f (x) ≤ α a. N Bk/N . ∀ α ∈ Q. f ≥ 0 a. .t. . E1 . Thus. Thus. Put Ek = E ∩ ∞ (1) (2) (1) (2) Bk+1 Bk/N . E∞ = Ω k=0 Then E0 . we have µ(Bα |Bβ ) = 0. Since B0 = φ. . . ∞ ν(E) = ν(E∞ ) + k=0 ν(Ek ) on Ek ⊂ Bk+1 Bk+1 Bk/N = ∩ Ak/N . E∞ are disjoint and ∞ E= k=0 Ek ∪ E∞ . . βµ(Bα |Bβ ) ≤ ν(Bα |Bβ ) ≤ αµ(Bα |Bβ ).e. Let N be very large. να (Bα |Bβ ) ≤ 0 νβ (Bα |Bβ ) ≥ 0 or ⇔ ν(Bα |Bβ ) − αµ(Bα |Bβ ) ≤ 0 ν(Bα |Bβ ) − βµ(Bα |Bβ ) ≥ 0. f ≥ α a. .e. ∃n mble f s. So. 2. on Bα .187 Thus. Thus. if α < β. k = 0.

N k+1 µ(Ek ). N N (3) k µ(Ek ) ≤ ν(Ek ) N (2) f (x)dµ ≤ Ek k+1 a. . If µ(E∞ ) > 0.188 We have. k µ(Ek ) ≤ N Also Ek ⊂ Ak/N ⇒ and Ek ⊂ Thus: ν(Ek ) − k 1 µ(Ek ) ≤ µ(Ek ) ≤ f (x)dx N N Ek k 1 ≤ µ(Ek ) + µ(Ek ) N N 1 ≤ ν(Ek ) + µ(Ek ) N Bk+1 k+1 ⇒ ν(Ek ) ≤ µ(Ek ). Add: ν(E) − 1 µ(E) ≤ N f dµ ≤ ν(E) + E 1 µ(E) N Since N is arbitrary. f ≡ ∞ a. N (1) on E∞ . Uniqueness: If ν(E) = E gdµ.e. So. If µ(E∞ ) = 0 ⇒ ν(E∞ ) = 0. k/N ≤ f (x) ≤ and so.e. then ν(E∞ ) = 0 since (ν − αµ)(E∞ ) ≥ 0 ∀ α. we are done. either way: ν(E∞ ) = E∞ f dµ. ∀ E ∈ B ⇒ ν(E) − αµ(E) = E (g − α)dµ ∀ α ∀ E ⊂ Aα .

Put µi (E) = µ(E ∩ Ωi ) and νi (E) = ν(E ∩ Ωi ). λ is σ–ﬁnite. Then νi << µi ∴ ∃fi ≥ 0 s. Similarly. Ωi ∩ Ωj = φ.t. Suppose µ is σ–ﬁnite: ν << µ.e.). on Aα or g ≥ α a. Let Ωi be s. λ(E) = 0 ⇒ µ(E) = ν(E) = 0.e. Let λ = µ + ν.189 Since 0 ≤ ν(E) − αµ(E) = E (g − α)dµ. h singular g a. The measure ν0 and ν1 are unique. Ωi = Ω. ⇒ f = g a. (f ∈ BV ⇒ f = h + g. Let (Ω. Proof.e. Then ∃ν0 ⊥ µ and ν1 << µ such that ν = ν0 + ν1 . νi (E) = E fi dµi or ν(E ∩ Ωi ) = E∩Ωi fi dµ = E dΩi dµ.4 (The Lebesgue decomposition for measures). µ(Ωi ) < ∞.e. ⇒ result. F) be a measurable space and µ and ν σ–ﬁnite measures on F. Theorem 1. g ≤ α a. on Bα . We have: g − α ≥ 0[µ] a. .c. on Aα .t.

Assume µ(E) = 0. A. F ⊂ F0 a You know: P (A|B) = σ–algebra and X ∈ σ(F0 ) with E|X| < ∞. . [λ]. P and B indept. P (A ∩ B) . (f ≥ 0) E on E. Since f > 0 on E ∩ A ⇒ λ(E ∩ A) = 0. Then ν0 (A) = 0. uniqueness. Clearly ν1 + ν0 = ν and it only remains to show that ν1 << µ.e. F0 . µ(B) = 0. B = {f = 0} Ω = A ∪ B. Let ν0 (E) = ν(E ∩ B).D. E ν(E) = Let A = {f > 0}. Problem.N. Now P (B) we work with probability measures. Let (Ω.190 R. Thus ν1 (E) = E∩A gdλ = 0 Q.E. A ∩ B = φ. so ν0 ⊥ µ set ν1 (E) = ν(E ∩ A) = E∩A gdλ. Then f dλ = 0 ⇒ f ≡ 0 a. B. ⇒ µ(E) = E f dλ gdλ. P (A|B) = P (A). P ) be a prob space.

F) by ν(A) = A Xdp. Existence: Consider ν deﬁned on (Ω. A). E|Y | ≤ E|X|. The conditional expectation of X given F. First. is any random variable Y with the properties (i) Y ∈ σ(F ) (ii) ∀A ∈ F. then E|Y | ≤ E|X|. A) = E(Y . So. Uniqueness: If Y also satisﬁes (i) and (ii). A ∈ F. With A = {Y > 0} ∈ F. Existence. written as E(X|F). XdP = A A Y dP or E(X.e. uniqueness.4. let us show that if Y has (i) and (ii).191 Deﬁnition 1. A ⇒ Y = Y a. . then Y dp = A A Y dp ∀ A ∈ F Y dp ∀ A ∈ F or A ⇒ A ydp = (Y − Y )dp = 0 ∀ A ∈ F. observe that Y dp = A A Xdp ≤ A |X|dp and −Y dp = Ac Ac −Xdp ≤ Ac |X|dp.

e. . P (B) In general.t. Ω2 . ∴ A Y dp = A xdp. B c }. Ω. ν(A) = A Y dp ∀ A ∈ F. B be ﬁxed sets in F0 . ) where Ωi are disjoint. If X is a random variable and F = σ(Ω1 . we get integral of 1A over the sets. ∴ ∃ Y ∈ σ(F) s. ν << P .192 ν is a signed measure. i. E(1A |F)P (B) = P (A ∩ B) or E(1A |F)1B (B) = P (A ∩ B) or P (A|B) = E(1A |F)1B = P (A ∩ B) . Ωi ) 1Ωi (ω). Let A. Example 1. Ωi ) P (Ωi ) E(X|F) = i=1 E(X. . ∀ A ∈ F. . E(1A |F)? This is a function so that thwn we integrate over sets in F. Let F = σ{B} = {φ. B. then 1Ωi E(X|F) = or ∞ E(X. P (Ωi ) . E(1A |F)dP = B B 1A dP = P (A ∩ B).

|E(X|F)| ≤ E(|X||F). E(aX + bY ) = aE(X|F) + bE(Y |F). (1) E(X) ∈ F. (σ(X) ⊥ F) P (X ∈ B) ∩ A) = P (X ∈ B)P (A). (vi) Monotone and Fatou’s hold the same way. Thus EXdP = E(X)E(1A ) A = E(X1A ) = A Xdp ∴ E(X|F) = EX. . Proof.e. Y integrable. (i) Done above since clearly a ∈ F. (v) If lim Xn = X and |Xn | ≤ Y.s. Thus E(X|F) = E(X). (ii) Let A ∈ F. Proof. Properties: (1) If X ∈ F ⇒ E(X|F) = X. then E(Xn |F) → E(X|F) n→∞ a.193 Notice that if F{φ. (iii) If x ≤ Y ⇒ E(X|F) ≤ E(X|F). In particular. Suppose that X. Y and Xn are integrable. (i) If X = a ⇒ E(X|F) = a (ii) For constants a and b. i. Theorem 1. E(X) = E(E(X|F)). ⇒ E(X|F) = E(X) (as in P (A|B) = P (A)). To check this. Ω}.5. (2) X is independent of F.

1. E(Zn |Fn ) ↓ decreasing. Then Zn ↓ 0 a. k≥n |E(Xn |F) − E(X|F)| ≤ E(|Xn − Y ||Fn ) ≤ E[Zn |F]. Theorem 1.C. ∀A∈F (iv) Let Zn = sup |Xk − X|.T. let Z = limit. A = ∴ E(X|F) ≤ E(Y |F). Proof. ∴ E(Z) = Ω E(Z|F)dp (E(Z| = E(E(Z|F)) = E(E(Z|F)) ≤ E(E(Zn |F)) = E(Zn ) → 0 by D. If ϕ is convex E|X| and E|ϕ(X)| < ∞. So. We have Zn ≤ 2Y . Need to show Z ≡ 0.6 (Jensen Inequality). then ϕ(E(X|F)) ≤ E(ϕ(X)|F).194 (ii) A aE(x|F)dp + A bE(Y |F)dP (aE(x|F) + bE(Y |I))dp = A =a A E(X|F)dP + b A E(Y |F)dP (aX + bY )dp = A A =a A XdP + b A Y dp = E(aX + bY |F)dP (iii) E(X|F)dP = A A XdP ≤ A Y dP E(Y |F)dP. Need to show E[Zn |F) ↓ 0 with pub. a. By (iii). .s.s.

(1) If F1 ⊂ F2 then (a) E(E(X|F1 )|F2 )) = E(X|F1 ) (b) E(E(X|F2 )|F1 )) = E(X|F1 ).195 ϕ(x0 ) + A(x0 )(x − x0 ) ≤ ϕ(x). (b) E(X|F1 ) ∈ F1 . done before). x0 = E(X|F) x = X.7. Proof. |E(X|F)|p ≤ E(|X|p |F) for 1 ≤ p < ∞ exp(E(X|F)) ≤ E(eX |F). ∴ Done. E[ϕ(E(X|F)) + A(E(X|F))(X − E(X|F)|F) ≤ E(ϕ(X|F) ⇒ ϕ(E(X|F)) + A(E(X|F))[E(X|F) − E(X|F)] ≤ E(ϕ(X|F)). A = A E(X|F2 )dp = . ϕ(E(X|F)) + A(E(X|F))(X − E(X|F)) ≤ ϕ(X). E|Y |. Note expectation of both sides. (2) If X ∈ σ(F). Corollary. Theorem 1. So A ∈ F1 ⊂ F2 we have E(X|F1 )dp = A A (X)dp E(E(x|F2 )|F2 )dP. (The smallest ﬁeld always wins). (a) E(X|F1 ) ∈ (F1 ) ⊂ (F2 ). E|XY | < ∞ ⇒ E(XY |F) = XE(Y |F) (measurable functions act like constants). (Y = 1.

Need to show E(|X − Y |2 ) ≥ E|X − E(X|F1 ))2 for any y ∈ L(F1 ). L2 (F1 )) = E(x − y)2 Theorem 1. Now. Set Z = Y − E(X|F) ∈ L2 (F1 ).3 p.225: #1. Suppose Ex2 < ∞. Then L2 (F1 ) is a closed subspace of L2 (F0 ).196 Durrett: p 220: #1.8. Given any X ∈ L2 (F0 ). 222: #1. 227: 1. 228: 1. In fact.6 p. ∃ Y ∈ L2 (F1 ) such that dist (X.2 p.8. since E(ZE(X|F)) = E(ZX|F)) = E(ZX) . L2 (F0 ) and L2 (F1 ) are Hilbert spaces. X2 = E(X1 · X2 ). X∈L2 (F1 ) Proof. then inf E(|X − Y |2 ) = E(|X − (EX|F1 ))2 . L2 (F1 ) is closed subspace in L2 (F0 ). Let y ∈ L2 (F1 ) out set. with X1 . Y = Z + E(X|F). Let L2 (F0 ) = {X ∈ F0 : EX 2 N ∞} and L2 (F1 ) = {Y ∈ F1 : EY 2 < ∞}.1 p.

v.D. y)dx R . Q.E. y)dx > 0 ∀y. E(g(X)|Y ) = h(Y ) and h(y) = R g(x)f (x. Recall conditional expectation for ?? Suppose X and Y have joint density f (x. By the way: If X and Y are two r. if E|g(X)| < ∞. We claim that in this case. we deﬁne E(X|Y ) = E(x|σ(Y )). Y = y) P (Y = y) = f (x. (f (x. y)dxdy. Y ) ∈ B) = B f (x. y) P ((X. And suppose then R f (x. y)dy .197 we see that E(ZE(X|F)) − E(ZX) = 0. y)dx R Treat the “given” density as if the second probability P (X = x|Y = y) = P (X = x. B ⊂ R2 . f (x. E(X − Y )2 = E{X − Z − E(X|F)}2 = E(X − E(X|F))2 + E(Z 2 ) − 2E((X − E(X|F))Z) ≥ E(X − E(X|F))2 − 2E(XZ) + 2E(ZE(X|F)) = E(X − E(X|F))2 . y) .

e. h can be anything where . y)dy = 0. y)dx g(3)f (3. deﬁne h by h(y) f (x. A) = E(g(X). y)d3 dy f (x. To verify: (i) clearly h(Y ) ∈ σ(Y ).198 Now: Integrale: E(g(X)|Y = y) = g(x)P (X = x)Y = y))dy. E(h(Y )1R (X) · A) = B R h(y)f (x. Then need to show E(h(Y ). y)dxdy g(3)f (3. L.S. y) dy = 0). y)dxdy = B R = E(g(X)1B (Y )) = E(g(X).H. y)dy i.. f (x. y)d3 R B R R f (x. (If f (x. A). y)dx = g(x)f (x. For (ii): let A = {Y ∈ B} for B ∈ B(R). A).

scribd
/*********** DO NOT ALTER ANYTHING BELOW THIS LINE ! ************/ var s_code=s.t();if(s_code)document.write(s_code)//-->