Conditioning on $\sigma$-algebras
Jason Swanson
April 23, 2013
$$P(B \mid A) = \frac{P(B \cap A)}{P(A)},$$

for all $B \in \mathcal{F}$. Writing $Q(B) = P(B \mid A)$, we define

$$E[X \mid A] = \int X \, dQ, \tag{1.1}$$

whenever this integral is well-defined. Note that $E[1_B \mid A] = P(B \mid A)$.
Theorem 1.1. If $E[|X| 1_A] < \infty$, then $X$ is $Q$-integrable. If $X \ge 0$ or $E[|X| 1_A] < \infty$, then

$$E[X \mid A] = \frac{E[X 1_A]}{P(A)} \tag{1.2}$$

and

$$E[X \mid A] = \frac{\nu(A)}{P(A)}, \tag{1.3}$$

where $d\nu = X \, dP$. Also note that (1.2) gives us the formula $E[X 1_A] = P(A) E[X \mid A]$. If $X = 1_B$, then this reduces to the familiar multiplication rule, $P(A \cap B) = P(A) P(B \mid A)$.
Proof of Theorem 1.1. Note that if $P(B) = 0$, then $Q(B) = 0$. Hence $Q \ll P$. Also note that

$$Q(B) = \int_B \frac{1_A}{P(A)} \, dP, \quad \text{for all } B \in \mathcal{F}.$$

Thus, $dQ/dP = 1_A / P(A)$. It follows that if $X \ge 0$, then

$$E[X \mid A] = \int X \, dQ = \int X \, \frac{dQ}{dP} \, dP = E\left[ X \frac{1_A}{P(A)} \right] = \frac{E[X 1_A]}{P(A)}.$$

Therefore, if $E[|X| 1_A] < \infty$, then $X$ is $Q$-integrable, and the same formula holds. $\square$
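Formula (1.2) and the multiplication rule can be checked mechanically on a finite probability space. The following sketch (the fair-die example and all names are my own illustrative choices) does this with exact rational arithmetic:

```python
from fractions import Fraction

# A minimal sketch: verify (1.2) and the multiplication rule on a finite
# probability space (a fair six-sided die).  All names here are illustrative.
omega = range(1, 7)
p = {w: Fraction(1, 6) for w in omega}    # uniform probability on {1,...,6}

A = {w for w in omega if w % 2 == 0}      # event "roll is even"
X = {w: w for w in omega}                 # X(w) = face value

P_A = sum(p[w] for w in A)
E_X1A = sum(X[w] * p[w] for w in A)       # E[X 1_A]
E_X_given_A = E_X1A / P_A                 # formula (1.2)

assert E_X_given_A == Fraction(4)         # (2 + 4 + 6)/3 = 4
assert E_X1A == P_A * E_X_given_A         # E[X 1_A] = P(A) E[X | A]

# With X = 1_B, (1.2) reduces to the multiplication rule P(A∩B) = P(A)P(B|A).
B = {w for w in omega if w >= 5}
P_AB = sum(p[w] for w in A & B)
P_B_given_A = sum(p[w] for w in A & B) / P_A
assert P_AB == P_A * P_B_given_A
```

Using `Fraction` keeps every identity exact rather than approximate.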
Lemma 1.3. Any finite $\sigma$-algebra $\mathcal{F}$ on $\Omega$ can be written as $\mathcal{F} = \sigma(\{A_j\}_{j=1}^n)$, where $\{A_j\}_{j=1}^n$ is a partition of $\Omega$, that is, $\Omega = \biguplus_{j=1}^n A_j$. Moreover, the partition $\{A_j\}_{j=1}^n$ is unique.

Proof. For each $\omega \in \Omega$, let $A_\omega$ be the smallest measurable set containing $\omega$. That is, $A_\omega = \bigcap \mathcal{F}_\omega$, where $\mathcal{F}_\omega = \{A \in \mathcal{F} : \omega \in A\}$. Since this is a finite intersection, $A_\omega \in \mathcal{F}$. In particular, $E = \{A_\omega : \omega \in \Omega\}$ is a finite set. We claim that $E$ is a partition of $\Omega$ and that $\mathcal{F} = \sigma(E)$.

To see that $E$ is a partition, suppose $\omega' \in A_\omega$. Then $A_{\omega'} \subset A_\omega$. If $\omega \notin A_{\omega'}$, then $A_\omega \setminus A_{\omega'}$ would be a measurable set containing $\omega$ that is strictly smaller than $A_\omega$, a contradiction. Hence $\omega \in A_{\omega'}$, so $A_\omega \subset A_{\omega'}$, and therefore $A_\omega = A_{\omega'}$. It follows that distinct members of $E$ are disjoint, and they clearly cover $\Omega$.

To see that $\mathcal{F} = \sigma(E)$, simply note that any $A \in \mathcal{F}$ can be written as $A = \bigcup_{\omega \in A} A_\omega$, and that this is a finite union.

For uniqueness, suppose that $\mathcal{F} = \sigma(\{B_j\}_{j=1}^n)$, where $\Omega = \biguplus_{j=1}^n B_j$. If $\omega \in B_j$, then $A_\omega = B_j$. Therefore, $E = \{B_j\}_{j=1}^n$. $\square$
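The construction in the proof, where the atom $A_\omega$ is the intersection of all members of $\mathcal{F}$ containing $\omega$, is easy to carry out on a finite example. A small sketch (the example $\sigma$-algebra is my own):

```python
# Sketch of Lemma 1.3: the atom containing w is the intersection of every
# member of F that contains w, exactly as in the proof.  Illustrative names.
def atoms(F, omega):
    """Return the partition of omega into atoms of the finite σ-algebra F."""
    seen = []
    for w in omega:
        A_w = set(omega)
        for A in F:                      # finite intersection, so A_w ∈ F
            if w in A:
                A_w &= A
        if A_w not in seen:
            seen.append(A_w)
    return seen

Omega = {1, 2, 3, 4}
F = [set(), {1, 2}, {3, 4}, {1, 2, 3, 4}]   # σ({1,2}) on Ω = {1,2,3,4}
E = atoms(F, Omega)

assert sorted(map(sorted, E)) == [[1, 2], [3, 4]]   # the unique partition
assert set().union(*E) == Omega                      # the atoms cover Ω
```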
Exercise 1.4. Show that every infinite $\sigma$-algebra is uncountable.
Let $(\Omega, \mathcal{F}, P)$ be a probability space and $X$ an integrable random variable. Let $\mathcal{G} \subset \mathcal{F}$ be a finite $\sigma$-algebra. Write $\mathcal{G} = \sigma(\{A_j\}_{j=1}^n)$, where $\{A_j\}_{j=1}^n$ is a partition of $\Omega$. The conditional expectation of $X$ given $\mathcal{G}$, written $E[X \mid \mathcal{G}]$, is a random variable defined by

$$E[X \mid \mathcal{G}](\omega) = \begin{cases} E[X \mid A_1] & \text{if } \omega \in A_1, \\ E[X \mid A_2] & \text{if } \omega \in A_2, \\ \quad \vdots \\ E[X \mid A_n] & \text{if } \omega \in A_n. \end{cases}$$

In other words,

$$E[X \mid \mathcal{G}] = \sum_{j=1}^n E[X \mid A_j] 1_{A_j}. \tag{1.4}$$
he calls, I can get prepared by computing $E[X \mid A_j]$ for all $j$, and then encoding all this information into the single random variable $E[X \mid \mathcal{G}]$.

In probability theory, we model information by $\sigma$-algebras. In this example, the $\sigma$-algebra $\mathcal{G}$ generated by the partition $\{A_j\}$ models the information I will receive from my friend's phone call. Imagine that while I am waiting for my friend's phone call, an interviewer starts asking me questions. For various events $A$, the interviewer asks me, "After your friend calls, will you know with certainty whether or not $A$ has occurred?" Depending on the event $A$, I will have to answer either yes, no, or maybe. The events $A \in \mathcal{G}$ are precisely those events for which I can answer yes.
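For a concrete feel for (1.4), here is a small sketch (the die, partition, and values are my own illustrative choices) computing $E[X \mid \mathcal{G}]$ atom by atom and checking the defining property $E[Z 1_A] = E[X 1_A]$ on each atom:

```python
from fractions import Fraction

# Sketch of (1.4) on a finite Ω, where G is generated by a partition {A_j}.
Omega = range(1, 7)
p = {w: Fraction(1, 6) for w in Omega}          # fair die (illustrative)
X = {w: w * w for w in Omega}                   # X(ω) = ω²
partition = [{1, 2, 3}, {4, 5}, {6}]            # the atoms A_j generating G

def cond_exp(X, p, partition):
    Z = {}
    for A in partition:
        P_A = sum(p[w] for w in A)
        E_X_given_A = sum(X[w] * p[w] for w in A) / P_A   # formula (1.2)
        for w in A:
            Z[w] = E_X_given_A                  # Z = Σ_j E[X|A_j] 1_{A_j}
    return Z

Z = cond_exp(X, p, partition)
assert Z[1] == Z[2] == Z[3] == Fraction(14, 3)  # (1 + 4 + 9)/3
assert Z[6] == 36
# Defining property: E[Z 1_A] = E[X 1_A] for each atom A of G.
for A in partition:
    assert sum(Z[w] * p[w] for w in A) == sum(X[w] * p[w] for w in A)
```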
Theorem 1.5. Let $(\Omega, \mathcal{F}, P)$ be a probability space and $X$ an integrable random variable. Let $\mathcal{G} \subset \mathcal{F}$ be a finite $\sigma$-algebra, and let $d\nu = X \, dP$. Then

$$E[X \mid \mathcal{G}] = \frac{d(\nu|_\mathcal{G})}{d(P|_\mathcal{G})}. \tag{1.5}$$
Writing $Z = E[X \mid \mathcal{G}]$, equation (1.5) says precisely that $Z$ is $\mathcal{G}$-measurable and

$$(\nu|_\mathcal{G})(A) = \int_A Z \, d(P|_\mathcal{G}), \quad \text{for all } A \in \mathcal{G}.$$

If $A \in \mathcal{G}$, then $(\nu|_\mathcal{G})(A) = \nu(A) = E[X 1_A]$; and if $Z$ is $\mathcal{G}$-measurable, then $\int_A Z \, d(P|_\mathcal{G}) = \int_A Z \, dP = E[Z 1_A]$. It therefore follows that (1.5) is equivalent to (i) and (ii). Uniqueness follows from the uniqueness of the Radon-Nikodym derivative.
Write $\mathcal{G} = \sigma(\{A_j\}_{j=1}^n)$, where $\{A_j\}_{j=1}^n$ is a partition of $\Omega$. By (1.4), it is clear that $Z$ is $\mathcal{G}$-measurable, so we need only verify (ii). By the linearity of the expected value, it will suffice to show that $E[X 1_{A_j}] = E[Z 1_{A_j}]$ for all $j$. Using (1.4), we have

$$E[Z 1_{A_j}] = E\left[ \sum_{i=1}^n E[X \mid A_i] 1_{A_i} 1_{A_j} \right] = E\left[ \sum_{i=1}^n E[X \mid A_i] 1_{A_i \cap A_j} \right] = E\big[ E[X \mid A_j] 1_{A_j} \big] = E[X \mid A_j] P(A_j) = E[X 1_{A_j}],$$

where the third equality uses $A_i \cap A_j = \emptyset$ for $i \ne j$, and the last uses (1.2). $\square$
$A = \biguplus_{k \in C} \{Y = k\}$. By the linearity of the expected value, we may assume without loss of generality that $A = \{Y = k\}$. In this case,

$$E[f(Y) 1_A] = E[f(Y) 1_{\{Y = k\}}] = E[f(k) 1_{\{Y = k\}}] = f(k) P(Y = k) = E[X \mid Y = k] P(Y = k) = E[X 1_{\{Y = k\}}] = E[X 1_A].$$

Note that we have again used (1.2).
$$P(X \in A \mid Y = k) = \frac{\int_A f(x, k) \, dx}{\int_{\mathbb{R}} f(x, k) \, dx}. \tag{2.1}$$

The integral in the denominator is necessary in order to make the function $x \mapsto f(x, k)$ a probability density function. In this example, we will explore the sense in which this formula is rigorously valid. (In an undergraduate class, it may be rigorously valid by definition. But for us, as usual, it is a special case of something more general.)
Theorem 2.3. Let $X$ and $Y$ have joint density $f(x, y)$. Let $g$ be a measurable function such that $E|g(X)| < \infty$. Define

$$h(k) = \frac{\int_{\mathbb{R}} g(x) f(x, k) \, dx}{\int_{\mathbb{R}} f(x, k) \, dx},$$

whenever the denominator is positive, and $h(k) = 0$ otherwise. Then $E[g(X) \mid \sigma(Y)] = h(Y)$ a.s.

Remark 2.4. If $g(x) = 1_A(x)$, then $h(k)$ agrees with the right-hand side of (2.1). Also, as can be seen from the proof below, we could have defined $h(k)$ arbitrarily when $\int_{\mathbb{R}} f(x, k) \, dx = 0$.
Proof of Theorem 2.3. Since $h(Y)$ is $\sigma(Y)$-measurable, we only need to show that $E[h(Y) 1_A] = E[g(X) 1_A]$ for all $A \in \sigma(Y)$. Let $A \in \sigma(Y)$, so that $A = \{Y \in B\}$ for some $B \in \mathcal{R}$. We now have

$$E[h(Y) 1_A] = E[h(Y) 1_B(Y)] = \int_B \int_{\mathbb{R}} h(y) f(x, y) \, dx \, dy = \int_B h(y) \int_{\mathbb{R}} f(x, y) \, dx \, dy = \int_{B \cap C} h(y) \int_{\mathbb{R}} f(x, y) \, dx \, dy,$$

where $C = \{y : \int_{\mathbb{R}} f(x, y) \, dx > 0\}$. Note that for all $y \in C$, we have

$$h(y) \int_{\mathbb{R}} f(x, y) \, dx = \int_{\mathbb{R}} g(x) f(x, y) \, dx.$$

Also, for all $y \in C^c$, we have $f(x, y) = 0$ for Lebesgue-almost every $x$. Thus, $y \in C^c$ implies $\int_{\mathbb{R}} g(x) f(x, y) \, dx = 0$. It therefore follows that

$$E[h(Y) 1_A] = \int_{B \cap C} \int_{\mathbb{R}} g(x) f(x, y) \, dx \, dy = \int_B \int_{\mathbb{R}} g(x) f(x, y) \, dx \, dy = E[g(X) 1_B(Y)] = E[g(X) 1_A],$$

which was what we needed to prove. $\square$
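A quick numerical sanity check, in the discrete-$Y$ setting of (2.1): take $Y$ uniform on $\{0, 1\}$ and, given $Y = k$, let $X \sim \mathrm{Normal}(k, 1)$, so that $f(x, k) = \tfrac{1}{2}\varphi(x - k)$. With $g(x) = x$ the formula gives $h(k) = k$. The Monte Carlo sketch below (the model, seed, and sample size are my own illustrative choices) checks the defining identity $E[h(Y) 1_A] = E[g(X) 1_A]$ for $A = \{Y = 1\}$:

```python
import random

# Monte Carlo sketch (illustrative model): Y uniform on {0,1}; given Y = k,
# X ~ Normal(k, 1).  With g(x) = x, the formula gives h(k) = k, and we check
# E[h(Y) 1_{Y=1}] ≈ E[g(X) 1_{Y=1}].
random.seed(0)
n = 200_000
lhs = rhs = 0.0
for _ in range(n):
    y = random.randint(0, 1)
    x = random.gauss(y, 1.0)
    if y == 1:                  # the event A = {Y = 1} ∈ σ(Y)
        lhs += y                # h(Y) = Y in this model
        rhs += x                # g(X) = X
lhs /= n
rhs /= n
assert abs(lhs - rhs) < 0.02    # both sides ≈ E[X 1_{Y=1}] = 1/2
```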
3
3.1
Proof. It follows trivially, since $X$ is $\mathcal{G}$-measurable and $E[X 1_A] = E[X 1_A]$ for all $A \in \mathcal{G}$. $\square$
Recall that $X$ and $\mathcal{G}$ are independent if $\sigma(X)$ and $\mathcal{G}$ are independent, which in turn means that $P(A \cap B) = P(A) P(B)$ whenever $A \in \sigma(X)$ and $B \in \mathcal{G}$. Hence, $X$ and $\mathcal{G}$ are independent if and only if $P(\{X \in C\} \cap B) = P(X \in C) P(B)$ for all $C \in \mathcal{R}$ and $B \in \mathcal{G}$.
Proposition 3.3. If $X$ and $\mathcal{G}$ are independent, then $E[X \mid \mathcal{G}] = E[X]$. In particular, $E[X \mid \{\emptyset, \Omega\}] = E[X]$.

Proof. A constant random variable is measurable with respect to every $\sigma$-algebra, so $E[X]$ is trivially $\mathcal{G}$-measurable. Also, for all $A \in \mathcal{G}$, we have $E[X 1_A] = E[X] E[1_A] = E[E[X] 1_A]$. The final claim holds since every random variable is independent of the trivial $\sigma$-algebra. $\square$
Theorem 3.4. If $\mathcal{G}_1 \subset \mathcal{G}_2 \subset \mathcal{F}$, then $E[E[X \mid \mathcal{G}_1] \mid \mathcal{G}_2] = E[E[X \mid \mathcal{G}_2] \mid \mathcal{G}_1] = E[X \mid \mathcal{G}_1]$.

Remark 3.5. In words, this says that in a battle between nested $\sigma$-algebras, the smallest $\sigma$-algebra always wins.

Proof of Theorem 3.4. Since $\mathcal{G}_1 \subset \mathcal{G}_2$ and $E[X \mid \mathcal{G}_1]$ is $\mathcal{G}_1$-measurable, it is also $\mathcal{G}_2$-measurable. Hence, by Proposition 3.2, $E[E[X \mid \mathcal{G}_1] \mid \mathcal{G}_2] = E[X \mid \mathcal{G}_1]$. The other equality holds since $E[X \mid \mathcal{G}_1]$ is $\mathcal{G}_1$-measurable and, for all $A \in \mathcal{G}_1 \subset \mathcal{G}_2$, we have $E[E[X \mid \mathcal{G}_1] 1_A] = E[X 1_A] = E[E[X \mid \mathcal{G}_2] 1_A]$. $\square$
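Theorem 3.4 can be verified mechanically on a finite space, where nested $\sigma$-algebras correspond to nested partitions. A sketch with illustrative data:

```python
from fractions import Fraction

# Sketch of Theorem 3.4 on a finite Ω: G1 ⊂ G2 encoded by nested partitions,
# with G1 the coarser one.  All values are illustrative.
Omega = range(8)
p = {w: Fraction(1, 8) for w in Omega}
X = {w: w for w in Omega}
G1 = [{0, 1, 2, 3}, {4, 5, 6, 7}]               # coarse partition
G2 = [{0, 1}, {2, 3}, {4, 5}, {6, 7}]           # refinement of G1

def cond_exp(X, p, partition):                  # formula (1.4)
    Z = {}
    for A in partition:
        P_A = sum(p[w] for w in A)
        e = sum(X[w] * p[w] for w in A) / P_A
        for w in A:
            Z[w] = e
    return Z

a = cond_exp(cond_exp(X, p, G1), p, G2)         # E[ E[X|G1] | G2 ]
b = cond_exp(cond_exp(X, p, G2), p, G1)         # E[ E[X|G2] | G1 ]
c = cond_exp(X, p, G1)                          # E[X | G1]
assert a == b == c                              # the smaller σ-algebra wins
```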
Theorem 3.6. If $Y$ and $XY$ are integrable, and $X$ is $\mathcal{G}$-measurable, then

$$E[XY \mid \mathcal{G}] = X E[Y \mid \mathcal{G}] \quad \text{a.s.}$$

Proof. Let $d\nu = Y \, dP$, so that $E[Y \mid \mathcal{G}] = d(\nu|_\mathcal{G})/d(P|_\mathcal{G})$. Let $d\rho_1 = X \, d\nu^+$, $d\rho_2 = X \, d\nu^-$, and $\rho = \rho_1 - \rho_2$, so that $d\rho = XY \, dP$ and $E[XY \mid \mathcal{G}] = d(\rho|_\mathcal{G})/d(P|_\mathcal{G})$. Since $X$ is $\mathcal{G}$-measurable, we have $d(\rho_1|_\mathcal{G})/d(\nu^+|_\mathcal{G}) = d(\rho_2|_\mathcal{G})/d(\nu^-|_\mathcal{G}) = X$. Hence,

$$E[XY \mid \mathcal{G}] = \frac{d(\rho|_\mathcal{G})}{d(P|_\mathcal{G})} = \frac{d(\rho_1|_\mathcal{G})}{d(P|_\mathcal{G})} - \frac{d(\rho_2|_\mathcal{G})}{d(P|_\mathcal{G})} = \frac{d(\rho_1|_\mathcal{G})}{d(\nu^+|_\mathcal{G})} \frac{d(\nu^+|_\mathcal{G})}{d(P|_\mathcal{G})} - \frac{d(\rho_2|_\mathcal{G})}{d(\nu^-|_\mathcal{G})} \frac{d(\nu^-|_\mathcal{G})}{d(P|_\mathcal{G})} = X \frac{d(\nu^+|_\mathcal{G})}{d(P|_\mathcal{G})} - X \frac{d(\nu^-|_\mathcal{G})}{d(P|_\mathcal{G})} = X \frac{d(\nu|_\mathcal{G})}{d(P|_\mathcal{G})} = X E[Y \mid \mathcal{G}], \quad P\text{-a.s.} \quad \square$$
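On a finite space, Theorem 3.6 is a short computation: a $\mathcal{G}$-measurable $X$ is constant on each atom, so it factors out of the atom-wise averages. An illustrative sketch:

```python
from fractions import Fraction

# Sketch of Theorem 3.6: X is G-measurable (constant on each atom of G), so
# it "pulls out" of the conditional expectation.  Values are illustrative.
Omega = range(4)
p = {w: Fraction(1, 4) for w in Omega}
G = [{0, 1}, {2, 3}]
X = {0: 2, 1: 2, 2: -1, 3: -1}           # constant on atoms, so G-measurable
Y = {0: 1, 1: 3, 2: 5, 3: 7}

def cond_exp(Z, mu, partition):          # formula (1.4)
    out = {}
    for A in partition:
        e = sum(Z[w] * mu[w] for w in A) / sum(mu[w] for w in A)
        for w in A:
            out[w] = e
    return out

XY = {w: X[w] * Y[w] for w in Omega}
lhs = cond_exp(XY, p, G)                 # E[XY | G]
EY = cond_exp(Y, p, G)
rhs = {w: X[w] * EY[w] for w in Omega}   # X · E[Y | G]
assert lhs == rhs
```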
Lemma 3.8. Suppose $U$ and $V$ are $\mathcal{H}$-measurable random variables. If $E[U 1_A] \le E[V 1_A]$ for all $A \in \mathcal{H}$, then $U \le V$ a.s. If $E[U 1_A] = E[V 1_A]$ for all $A \in \mathcal{H}$, then $U = V$ a.s.

Proof. By reversing the roles of $U$ and $V$, the second claim follows from the first. To prove the first, suppose $E[U 1_A] \le E[V 1_A]$ for all $A \in \mathcal{H}$. Let $A = \{U > V\} \in \mathcal{H}$ and define $Z = (U - V) 1_A$, so that $Z \ge 0$. Note that $EZ = E[U 1_A] - E[V 1_A] \le 0$. Hence, $EZ = 0$, so $Z = 0$ a.s., which implies $P(A) = 0$. $\square$
Theorem 3.9. (monotonicity) If $X \le Y$ a.s., then $E[X \mid \mathcal{G}] \le E[Y \mid \mathcal{G}]$ a.s.

Proof. For all $A \in \mathcal{G}$, we have $E[E[X \mid \mathcal{G}] 1_A] = E[X 1_A] \le E[Y 1_A] = E[E[Y \mid \mathcal{G}] 1_A]$. Hence, by Lemma 3.8, $E[X \mid \mathcal{G}] \le E[Y \mid \mathcal{G}]$ a.s. $\square$
Theorem 3.10. Suppose $X$ and $Y$ are independent and $\varphi$ is a measurable function such that $E|\varphi(X, Y)| < \infty$. Then $E[\varphi(X, Y) \mid X] = g(X)$, where $g(x) = E[\varphi(x, Y)]$.

Remark 3.11. It is important here that $X$ and $Y$ are independent. This result is not true in general when $X$ and $Y$ are dependent.

Proof of Theorem 3.10. Clearly, $g(X)$ is $\sigma(X)$-measurable. Let $A \in \mathcal{R}$. Then

$$E[\varphi(X, Y) 1_{\{X \in A\}}] = \iint \varphi(x, y) 1_A(x) \, \mu_Y(dy) \, \mu_X(dx) = \int 1_A(x) \int \varphi(x, y) \, \mu_Y(dy) \, \mu_X(dx) = \int 1_A(x) g(x) \, \mu_X(dx) = E[g(X) 1_{\{X \in A\}}],$$

where $\mu_X$ and $\mu_Y$ denote the distributions of $X$ and $Y$, and we are done. $\square$
Example 3.12. Let $X, Y, Z$ be iid, uniformly distributed on $(0, 1)$. We shall compute the distribution of $(XY)^Z$. We begin by computing the distribution of $W = XY$. Let $w \in (0, 1)$. Then

$$P(W \le w) = P(XY \le w) = E[1_{\{XY \le w\}}] = E\big[ E[1_{\{XY \le w\}} \mid X] \big].$$

By Theorem 3.10, $E[1_{\{XY \le w\}} \mid X] = f(X)$, where

$$f(x) = E[1_{\{xY \le w\}}] = P(xY \le w) = P\left( Y \le \frac{w}{x} \right) = 1_{\{x \le w\}} + \frac{w}{x} 1_{\{x > w\}}.$$

Thus,

$$P(W \le w) = E[f(X)] = E\left[ 1_{\{X \le w\}} + \frac{w}{X} 1_{\{X > w\}} \right] = w + \int_w^1 \frac{w}{x} \, dx = w - w \log w.$$

In particular, $W$ has density $-\log w$ on $(0, 1)$. Now let $x \in (0, 1)$. By Theorem 3.10 again, $E[1_{\{W^Z \le x\}} \mid W] = g(W)$, where

$$g(w) = P(w^Z \le x) = P\left( Z \ge \frac{\log x}{\log w} \right) = \left( 1 - \frac{\log x}{\log w} \right) 1_{\{w \le x\}}.$$

Thus,

$$P((XY)^Z \le x) = E\left[ \left( 1 - \frac{\log x}{\log W} \right) 1_{\{W \le x\}} \right] = \int_0^x \left( 1 - \frac{\log x}{\log w} \right) (-\log w) \, dw = -\int_0^x \log w \, dw + x \log x = x.$$

That is, $(XY)^Z$ is itself uniformly distributed on $(0, 1)$.
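A Monte Carlo sketch of Example 3.12 (sample size and seed are arbitrary choices of mine): the empirical CDF of $(XY)^Z$ should match the uniform CDF.

```python
import random

# Monte Carlo check of Example 3.12: with X, Y, Z iid uniform on (0,1), the
# computation above shows P((XY)^Z ≤ x) = x, i.e. (XY)^Z is again uniform.
random.seed(1)
n = 200_000
samples = [(random.random() * random.random()) ** random.random()
           for _ in range(n)]
# Compare the empirical CDF with the uniform CDF at a few points.
for x in (0.1, 0.25, 0.5, 0.75, 0.9):
    ecdf = sum(s <= x for s in samples) / n
    assert abs(ecdf - x) < 0.01
```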
3.2
Theorem 3.14. (Fatou's lemma) If $X_n \ge 0$ a.s., each $X_n$ is integrable, and $\liminf_{n \to \infty} X_n$ is integrable, then

$$E\big[ \liminf_{n \to \infty} X_n \,\big|\, \mathcal{G} \big] \le \liminf_{n \to \infty} E[X_n \mid \mathcal{G}] \quad \text{a.s.}$$
Exercise 3.16. Prove Theorem 3.15 by mimicking the proof of the ordinary dominated
convergence theorem.
$$\frac{\varphi(c) - \varphi(c - h)}{h} \tag{3.1}$$
Theorem 3.19. (Hölder's inequality) Let $p, q \in (1, \infty)$ be conjugate exponents, so that $1/p + 1/q = 1$. Suppose that $|X|^p$ and $|Y|^q$ are integrable. Then

$$E[|XY| \mid \mathcal{G}] \le (E[|X|^p \mid \mathcal{G}])^{1/p} (E[|Y|^q \mid \mathcal{G}])^{1/q} \quad \text{a.s.}$$

Proof. Note that by the ordinary Hölder's inequality, $XY$ is integrable, so that $E[|XY| \mid \mathcal{G}]$ is well-defined. Let $U = (E[|X|^p \mid \mathcal{G}])^{1/p}$ and $V = (E[|Y|^q \mid \mathcal{G}])^{1/q}$. Note that both $U$ and $V$ are $\mathcal{G}$-measurable. Observe that

$$E[|X|^p 1_{\{U = 0\}}] = E\big[ E[|X|^p 1_{\{U = 0\}} \mid \mathcal{G}] \big] = E\big[ 1_{\{U = 0\}} E[|X|^p \mid \mathcal{G}] \big] = E[1_{\{U = 0\}} U^p] = 0.$$

Hence, $|X| 1_{\{U = 0\}} = 0$ a.s., which implies

$$E[|XY| \mid \mathcal{G}] 1_{\{U = 0\}} = E[|XY| 1_{\{U = 0\}} \mid \mathcal{G}] = 0 \quad \text{a.s.}$$

Similarly, $E[|XY| \mid \mathcal{G}] 1_{\{V = 0\}} = 0$ a.s. It therefore suffices to show that $E[|XY| \mid \mathcal{G}] 1_H \le UV$, where $H = \{U > 0, V > 0\}$. For this, we will use Lemma 3.8 to prove that

$$\frac{E[|XY| \mid \mathcal{G}]}{UV} 1_H \le 1 \quad \text{a.s.}$$

Note that the left-hand side is defined to be zero on $H^c$.
Let $A \in \mathcal{G}$ and put $G = A \cap H \in \mathcal{G}$. Then

$$E\left[ \frac{E[|XY| \mid \mathcal{G}]}{UV} 1_H 1_A \right] = E\left[ E\left[ \frac{|XY|}{UV} 1_G \,\Big|\, \mathcal{G} \right] \right] = E\left[ \frac{|X|}{U} \frac{|Y|}{V} 1_G \right] \le \left( E\left[ \frac{|X|^p}{U^p} 1_G \right] \right)^{1/p} \left( E\left[ \frac{|Y|^q}{V^q} 1_G \right] \right)^{1/q} = \left( E\left[ \frac{E[|X|^p \mid \mathcal{G}]}{U^p} 1_G \right] \right)^{1/p} \left( E\left[ \frac{E[|Y|^q \mid \mathcal{G}]}{V^q} 1_G \right] \right)^{1/q} = (E[1_G])^{1/p} (E[1_G])^{1/q} = E[1_G] \le E[1_A],$$

where the inequality is the ordinary Hölder's inequality. Applying Lemma 3.8 finishes the proof. $\square$
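For a concrete check of Theorem 3.19, take $p = q = 2$ (the conditional Cauchy-Schwarz inequality) on a finite space; the sketch below, with illustrative values, verifies the inequality atom by atom:

```python
import math

# Numeric sketch of Theorem 3.19 with p = q = 2 (conditional Cauchy-Schwarz)
# on a finite Ω; all values are illustrative.
Omega = range(4)
p_meas = {w: 0.25 for w in Omega}
G = [{0, 1}, {2, 3}]
X = {0: 1.0, 1: -2.0, 2: 3.0, 3: 0.5}
Y = {0: 4.0, 1: 1.0, 2: -1.0, 3: 2.0}

def cond_exp(Z, mu, partition):          # formula (1.4)
    out = {}
    for A in partition:
        e = sum(Z[w] * mu[w] for w in A) / sum(mu[w] for w in A)
        for w in A:
            out[w] = e
    return out

lhs = cond_exp({w: abs(X[w] * Y[w]) for w in Omega}, p_meas, G)
EX2 = cond_exp({w: X[w] ** 2 for w in Omega}, p_meas, G)
EY2 = cond_exp({w: Y[w] ** 2 for w in Omega}, p_meas, G)
# E[|XY| | G] ≤ (E[X² | G])^{1/2} (E[Y² | G])^{1/2} pointwise:
for w in Omega:
    assert lhs[w] <= math.sqrt(EX2[w]) * math.sqrt(EY2[w]) + 1e-12
```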
3.3
Remark 3.21. In the language of Hilbert spaces and $L^p$ spaces, this theorem says the following: $X$ is an element of the Hilbert space $L^2(\Omega, \mathcal{F}, P)$, and $E[X \mid \mathcal{G}]$ is the orthogonal projection of $X$ onto the subspace $L^2(\Omega, \mathcal{G}, P)$.
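This projection property is easy to check on a finite space: the residual $X - E[X \mid \mathcal{G}]$ is orthogonal in $L^2$ to every $\mathcal{G}$-measurable random variable. An illustrative sketch (data are my own):

```python
from fractions import Fraction

# Sketch of Remark 3.21 on a finite Ω: the residual X - E[X|G] is orthogonal
# in L² to every G-measurable random variable.  Illustrative data.
Omega = range(4)
p = {w: Fraction(1, 4) for w in Omega}
G = [{0, 1}, {2, 3}]
X = {0: 1, 1: 5, 2: 2, 3: 10}

def cond_exp(Z, mu, partition):          # formula (1.4)
    out = {}
    for A in partition:
        e = sum(Z[w] * mu[w] for w in A) / sum(mu[w] for w in A)
        for w in A:
            out[w] = e
    return out

Z = cond_exp(X, p, G)                    # projection onto L²(Ω, G, P)
resid = {w: X[w] - Z[w] for w in Omega}

# Orthogonality: E[(X - E[X|G]) W] = 0 for G-measurable W (constant on atoms).
for a, b in [(1, 0), (0, 1), (3, -2)]:
    W = {0: a, 1: a, 2: b, 3: b}
    assert sum(resid[w] * W[w] * p[w] for w in Omega) == 0
```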
If $\mathcal{G}$ is a finite $\sigma$-algebra, so that $\mathcal{G} = \sigma(\{A_j\}_{j=1}^n)$, where $\{A_j\}_{j=1}^n$ is a partition of $\Omega$, then

$$P(X \in A \mid \mathcal{G})(\omega) = \begin{cases} P(X \in A \mid A_1) & \text{if } \omega \in A_1, \\ P(X \in A \mid A_2) & \text{if } \omega \in A_2, \\ \quad \vdots \\ P(X \in A \mid A_n) & \text{if } \omega \in A_n. \end{cases}$$
With a structure such as this, expectations conditioned on $\sigma$-algebras behave very much like ordinary expectations. When this happens, we are able to make valuable intuitive connections to mathematical ideas with which we are already familiar. It would be nice if $P(X \in \cdot \mid \mathcal{G})$ could always be viewed as a conditional distribution that depends on $\omega$, even when $\mathcal{G}$ is infinite. Unfortunately, this is not the case in the most general setting. However, under certain conditions it is true, and regular conditional distributions are the mathematical objects that express this.
To introduce regular conditional distributions, we begin with some terminology. A measurable space $(S, \mathcal{S})$ is a standard Borel space if there exists a one-to-one map $\varphi : S \to \mathbb{R}$ that is $(\mathcal{S}, \mathcal{R})$-measurable and whose inverse $\varphi^{-1}$ is measurable on $\varphi(S)$. The following two theorems are Theorems 1.4.12 and 5.1.9 in [1].

Theorem 4.1. If $S$ is a Borel subset of a complete separable metric space $M$, and $\mathcal{S}$ is the collection of Borel subsets of $S$, then $(S, \mathcal{S})$ is a standard Borel space.
Theorem 4.2. Let $(\Omega, \mathcal{F}, P)$ be a probability space and $(S, \mathcal{S})$ a measurable space. Let $X$ be an $S$-valued random variable, that is, $X : \Omega \to S$ is $(\mathcal{F}, \mathcal{S})$-measurable. Let $\mathcal{G} \subset \mathcal{F}$ be a $\sigma$-algebra. If $(S, \mathcal{S})$ is a standard Borel space, then there exists a function $\mu_{X, \mathcal{G}}$ from $\Omega \times \mathcal{S}$ to $[0, 1]$ such that
$$E[1_A(X) \mid \mathcal{G}] = P(X \in A \mid \mathcal{G}) = \mu(\cdot, A) \quad \text{a.s.}$$
By linearity (Theorem 3.7), the result holds for all simple functions $f$. By monotone convergence (Theorem 3.13), the result holds for all nonnegative functions $f$ satisfying $E|f(X)| < \infty$. And finally, by considering the positive and negative parts, the result holds for all measurable functions $f$ satisfying $E|f(X)| < \infty$. $\square$
For our second example, we give a simple proof of Theorem 3.19 (Hölder's inequality).

Proof of Theorem 3.19. Since $(\mathbb{R}^2, \mathcal{R}^2)$ is a standard Borel space, there exists a regular conditional distribution for $(X, Y)$ given $\mathcal{G}$, which we denote by $\mu$. Since $|X|^p$ and $|Y|^q$ are integrable, the ordinary Hölder's inequality implies $|XY|$ is integrable. Thus, by Theorem 4.3,

$$E[|XY| \mid \mathcal{G}] = \int_{\mathbb{R}^2} |xy| \, \mu(\cdot, dx \, dy) \quad \text{a.s.}$$

For $P$-a.e. $\omega \in \Omega$, we may apply the ordinary Hölder's inequality to the measure $\mu(\omega, \cdot)$, yielding

$$E[|XY| \mid \mathcal{G}] \le \left( \int_{\mathbb{R}^2} |x|^p \, \mu(\cdot, dx \, dy) \right)^{1/p} \left( \int_{\mathbb{R}^2} |y|^q \, \mu(\cdot, dx \, dy) \right)^{1/q} \quad \text{a.s.}$$

By Theorem 4.3 again, the right-hand side is $(E[|X|^p \mid \mathcal{G}])^{1/p} (E[|Y|^q \mid \mathcal{G}])^{1/q}$ a.s., which completes the proof. $\square$
For our final example, let us first consider a property of unconditioned expectations. If $X$ is a real-valued random variable, and $h : \mathbb{R} \to [0, \infty)$ is absolutely continuous with $h' \ge 0$ a.e. and $h(x) \to 0$ as $x \to -\infty$, then

$$E[h(X)] = \int_{\mathbb{R}} h'(t) P(X > t) \, dt.$$

This is the content of Exercise 2.2.7 in [1], and one way to see it is to use Fubini's theorem:

$$E[h(X)] = \int_\Omega h(X) \, dP = \int_\Omega \int_{-\infty}^{X} h'(t) \, dt \, dP = \int_\Omega \int_{\mathbb{R}} 1_{\{X > t\}} h'(t) \, dt \, dP = \int_{\mathbb{R}} h'(t) \int_\Omega 1_{\{X > t\}} \, dP \, dt = \int_{\mathbb{R}} h'(t) P(X > t) \, dt.$$
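As a numerical sketch of this identity (the choice of $X$ and $h$ is my own): take $X$ uniform on $(0, 1)$ and $h(t) = t^2$ for $t \ge 0$, $h \equiv 0$ for $t < 0$, so that $h' \ge 0$ and $h(t) \to 0$ as $t \to -\infty$; both sides then equal $1/3$.

```python
import random

# Numeric sketch of E[h(X)] = ∫ h'(t) P(X > t) dt with X uniform on (0,1)
# and h(t) = t² on t ≥ 0 (h ≡ 0 on t < 0), so h'(t) = 2t on (0,1) and
# P(X > t) = 1 - t there.  Illustrative seed and sample sizes.
random.seed(2)
n = 100_000
lhs = sum(random.random() ** 2 for _ in range(n)) / n     # Monte Carlo E[X²]

m = 100_000
dt = 1.0 / m
rhs = sum(2 * ((i + 0.5) * dt) * (1 - (i + 0.5) * dt) * dt
          for i in range(m))             # midpoint rule: ∫₀¹ 2t (1 - t) dt
assert abs(rhs - 1 / 3) < 1e-8           # exact value is 1/3
assert abs(lhs - rhs) < 0.01             # both sides ≈ 1/3
```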
It is then natural to ask whether a similar thing is true for conditional expectations, and whether a similar proof can demonstrate it. We will answer both questions in the affirmative by using regular conditional probabilities.

Theorem 4.4. Let $X$ be a real-valued random variable on a probability space $(\Omega, \mathcal{F}, P)$ and $\mathcal{G} \subset \mathcal{F}$ a $\sigma$-algebra. Let $h : \mathbb{R} \to [0, \infty)$ be absolutely continuous with $h' \ge 0$ a.e. and $h(x) \to 0$ as $x \to -\infty$. Suppose that $E|h(X)| < \infty$. Then it is possible to choose, for each $t \in \mathbb{R}$, a version of $P(X > t \mid \mathcal{G})$ so that the function $t \mapsto h'(t) P(X > t \mid \mathcal{G})$ is almost surely Lebesgue integrable on $\mathbb{R}$, and satisfies

$$E[h(X) \mid \mathcal{G}] = \int_{\mathbb{R}} h'(t) P(X > t \mid \mathcal{G}) \, dt \quad \text{a.s.}$$
Proof. Since $(\mathbb{R}, \mathcal{R})$ is a standard Borel space, there exists a regular conditional distribution for $X$ given $\mathcal{G}$, which we denote by $\mu$. For each $t \in \mathbb{R}$, let us choose the version of $P(X > t \mid \mathcal{G})$ determined by $\mu$, that is, $P(X > t \mid \mathcal{G})(\omega) = \mu(\omega, (t, \infty))$. Then for $P$-a.e. $\omega \in \Omega$, we have

$$\int_{\mathbb{R}} \int_{\mathbb{R}} 1_{\{t < x\}} h'(t) \, dt \, \mu(\omega, dx) = \int_{\mathbb{R}} h(x) \, \mu(\omega, dx) = E[h(X) \mid \mathcal{G}](\omega) < \infty.$$

Hence, by Fubini's theorem,

$$\int_{\mathbb{R}} h'(t) P(X > t \mid \mathcal{G})(\omega) \, dt = \int_{\mathbb{R}} \int_{\mathbb{R}} 1_{\{t < x\}} h'(t) \, \mu(\omega, dx) \, dt = \int_{\mathbb{R}} \int_{\mathbb{R}} 1_{\{t < x\}} h'(t) \, dt \, \mu(\omega, dx) = E[h(X) \mid \mathcal{G}](\omega). \quad \square$$
classical fundamental theorem of calculus no longer applies when one is working with stochastic integrals. It must be replaced by a new rule called Itô's rule. Itô's rule gives rise to a whole new calculus called stochastic calculus.
References
[1] Rick Durrett. Probability: Theory and Examples. Cambridge Series in Statistical and Probabilistic Mathematics. Cambridge University Press, Cambridge, fourth edition, 2010.