
Chapter 2: Belief, Probability, and Exchangeability

MSU-STT-465: Summer-20B

Lecture 2: Independence and Exchangeability



Independent random variables
The rvs $Y_1, \ldots, Y_n$ are conditionally independent, given $\theta$, if

$$\Pr(Y_1 \in A_1, \ldots, Y_n \in A_n \mid \theta) = \Pr(Y_1 \in A_1 \mid \theta) \times \cdots \times \Pr(Y_n \in A_n \mid \theta).$$

Indeed, it follows that, for $j \neq i$,

$$\Pr(Y_i \in A_i \mid \theta, Y_j \in A_j) = \Pr(Y_i \in A_i \mid \theta),$$

showing that, given $\theta$, $Y_j$ carries no additional information about $Y_i$. Also, the joint density factors as

$$p(y_1, \ldots, y_n \mid \theta) = p_1(y_1 \mid \theta) \times \cdots \times p_n(y_n \mid \theta),$$

where $p_i(y_i \mid \theta)$ denotes the conditional density of $Y_i$. When the $Y_i$'s are observed from a common population with density $p(y \mid \theta)$,

$$p(y_1, \ldots, y_n \mid \theta) = \prod_{i=1}^{n} p(y_i \mid \theta),$$

that is, $Y_1, \ldots, Y_n$ are conditionally i.i.d. given $\theta$, and we write

$$(Y_1, \ldots, Y_n \mid \theta) \sim \text{i.i.d. } p(y \mid \theta).$$
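As a quick illustration (my own sketch, not from the slides), the two-stage structure can be simulated directly: draw $\theta$ once from an assumed Beta(2, 2) prior, then, given that value, draw the $Y_i$ i.i.d. Bernoulli($\theta$).

```python
# Minimal sketch of conditional i.i.d. sampling; the Beta(2, 2) prior and
# function name are my own choices, not from the lecture.
import numpy as np

rng = np.random.default_rng(0)

def sample_conditionally_iid(n, a=2.0, b=2.0):
    theta = rng.beta(a, b)               # one draw of the parameter
    y = rng.binomial(1, theta, size=n)   # given theta, the Y_i are i.i.d.
    return theta, y

theta, y = sample_conditionally_iid(8)
print(theta, y)
```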
Independence and Exchangeability

For simplicity, let $p(y_1, y_2, \ldots, y_8) = \Pr(Y_1 = y_1, Y_2 = y_2, \ldots, Y_8 = y_8)$. In many practical situations, we expect that the specific order of the observations is not important. For example, consider a random sample of 3 observations from an infinite population, where each unit may or may not have a certain property (with probability $\theta$). It makes sense to assign equal probabilities:

$$p(0,0,1) = p(1,0,0) = p(0,1,0),$$

since each of the three sequences has one 1 and two 0's. This property is called exchangeability. Loosely speaking, the subscripts of the observations convey no information about the outcomes.

Definition 0.1
Exchangeability. Let $Y_1, \ldots, Y_n$ be $n$ random variables. If the joint density satisfies $p(y_1, \ldots, y_n) = p(y_{\pi_1}, \ldots, y_{\pi_n})$ for all permutations $\pi = (\pi_1, \ldots, \pi_n)$ of $\{1, \ldots, n\}$, then $Y_1, \ldots, Y_n$ are exchangeable.
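For small discrete examples, Definition 0.1 can be checked by brute force. Below is a minimal sketch (my own, with a hypothetical helper name) that tests a joint pmf, stored as a dict, for permutation invariance.

```python
# Brute-force check of Definition 0.1 for a joint pmf {(y_1,...,y_n): prob}.
from itertools import permutations

def is_exchangeable(pmf, n, tol=1e-12):
    for y, p in pmf.items():
        for perm in permutations(range(n)):
            # p(y_1,...,y_n) must equal p(y_{pi_1},...,y_{pi_n})
            if abs(pmf.get(tuple(y[i] for i in perm), 0.0) - p) > tol:
                return False
    return True

# Two fair, independent coin flips: the joint pmf is permutation-invariant.
pmf = {(0, 0): 0.25, (0, 1): 0.25, (1, 0): 0.25, (1, 1): 0.25}
print(is_exchangeable(pmf, 2))  # True
```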



Independence and Exchangeability

Definition 0.2
Two random variables $X$ and $Y$ are said to have the identical distribution, denoted by $X \stackrel{d}{=} Y$, if $F_X(x) = F_Y(x)$ for all $x \in \mathbb{R}$.

Note: Let $X$ and $Y$ be random vectors. If $X \stackrel{d}{=} Y$, then $f(X) \stackrel{d}{=} f(Y)$, where $f$ is any real-valued (deterministic) function.

Clearly, a collection $(X_1, X_2, \ldots, X_n)$ of i.i.d. rvs is exchangeable, but the converse is not true, because exchangeability does not imply independence. Construct a counterexample (try with two discrete Bernoulli rvs; one is sketched below). However, it is easy to show that if $X_1, X_2, \ldots, X_n$ are exchangeable, then they have the same marginal distributions; that is, identical marginal distributions are retained.
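Here is one standard counterexample (my addition, following the hint above): put all the mass of two Bernoulli rvs on the mixed outcomes.

```latex
% Two Bernoulli rvs that are exchangeable but not independent:
\[
  P(X_1=0,\,X_2=1) = P(X_1=1,\,X_2=0) = \tfrac12, \qquad
  P(X_1=0,\,X_2=0) = P(X_1=1,\,X_2=1) = 0.
\]
% The joint pmf is permutation-invariant, so (X_1, X_2) is exchangeable,
% and marginally P(X_1=1) = P(X_2=1) = 1/2. However,
\[
  P(X_1=1,\,X_2=1) = 0 \neq \tfrac14 = P(X_1=1)\,P(X_2=1),
\]
% so X_1 and X_2 are not independent.
```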



Conditional Independence and Exchangeability

Assume the population has 1300 units and that we know the proportion $\theta$ of units having a particular property. We observe the property on 8 units, and suppose it is reasonable to assume

$$\Pr(Y_8 = 1 \mid \theta) = \theta,$$
$$\Pr(Y_8 = 1 \mid Y_1 = y_1, \ldots, Y_7 = y_7, \theta) = \theta,$$
$$\Pr(Y_4 = 1 \mid Y_1 = y_1, \ldots, Y_3 = y_3, Y_5 = y_5, \ldots, Y_8 = y_8, \theta) = \theta,$$

and similarly for the other conditional distributions. Since the population size is quite large compared to the sample size $n = 8$, sampling without replacement (WOR) is approximately the same as sampling with replacement (WR); see the numerical sketch below. That is, we may consider the $Y_i$'s to be conditionally independent.
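A rough numerical check of the WOR $\approx$ WR claim (my own sketch): with 1300 units of which, say, 390 have the property (so $\theta = 0.3$, an assumed value) and a sample of $n = 8$, the hypergeometric (WOR) and binomial (WR) pmfs of the number of successes nearly coincide.

```python
# Compare sampling without replacement (hypergeometric) with sampling
# with replacement (binomial) for a large population and small sample.
from scipy.stats import binom, hypergeom

pop, succ, draws = 1300, 390, 8   # population size, units with property, sample size
theta = succ / pop                # = 0.3, an assumed proportion
for k in range(draws + 1):
    p_wor = hypergeom.pmf(k, pop, succ, draws)  # without replacement
    p_wr = binom.pmf(k, draws, theta)           # with replacement
    print(k, round(p_wor, 5), round(p_wr, 5))   # the two columns nearly agree
```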



Exchangeability (Contd.)
Also, under the conditional independence of the $Y_i$, given $\theta$,

$$\Pr(Y_i = y_i \mid Y_j = y_j,\ j \neq i,\ \theta) = \Pr(Y_i = y_i \mid \theta) = \theta^{y_i}(1-\theta)^{1-y_i};$$

$$\Pr(Y_1 = y_1, Y_2 = y_2, \ldots, Y_8 = y_8 \mid \theta) = \prod_{i=1}^{8} \theta^{y_i}(1-\theta)^{1-y_i} = \theta^{\sum_{i=1}^{8} y_i}(1-\theta)^{8-\sum_{i=1}^{8} y_i}.$$

Suppose next that $\theta$ is unknown and we express our belief about it by a density $p(\theta)$. In that case,

$$p(y_1, y_2, \ldots, y_8) = \int_0^1 p(y_1, y_2, \ldots, y_8 \mid \theta)\, p(\theta)\, d\theta = \int_0^1 \theta^{\sum_{i=1}^{8} y_i}(1-\theta)^{8-\sum_{i=1}^{8} y_i}\, p(\theta)\, d\theta.$$
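This marginal is easy to compute numerically. A sketch (mine, assuming a Beta$(a, b)$ prior for $\theta$; the slides leave $p(\theta)$ generic):

```python
# Marginal p(y_1,...,y_n) = integral of theta^s (1-theta)^(n-s) p(theta) dtheta,
# with an assumed Beta(a, b) prior (uniform by default).
import numpy as np
from scipy.integrate import quad
from scipy.stats import beta

def marginal_prob(y, a=1.0, b=1.0):
    y = np.asarray(y)
    s, n = y.sum(), y.size
    val, _ = quad(lambda t: t**s * (1 - t)**(n - s) * beta.pdf(t, a, b), 0, 1)
    return val

# Any reordering of the 0's and 1's gives the same marginal probability:
print(marginal_prob([1, 0, 0, 1, 0, 0, 0, 1]))
print(marginal_prob([0, 0, 1, 1, 0, 0, 0, 1]))
```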



Exchangeability (Contd.)
So, for these four binary sequences,

$$p(1,0,0,1,0,0,0,1) = \int_0^1 \theta^3(1-\theta)^5\, p(\theta)\, d\theta,$$
$$p(0,0,1,1,0,0,0,1) = \int_0^1 \theta^3(1-\theta)^5\, p(\theta)\, d\theta,$$
$$p(1,1,0,0,0,0,0,1) = \int_0^1 \theta^3(1-\theta)^5\, p(\theta)\, d\theta,$$
$$p(0,0,1,0,0,1,1,0) = \int_0^1 \theta^3(1-\theta)^5\, p(\theta)\, d\theta.$$
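For a concrete value (my illustration, not in the original): under a uniform prior $p(\theta) = 1$ on $[0,1]$, each of these integrals reduces to a beta function,

```latex
\[
  \int_0^1 \theta^{3}(1-\theta)^{5}\, d\theta
  = B(4, 6)
  = \frac{3!\,5!}{9!}
  = \frac{1}{504},
\]
```

so all four sequences receive probability $1/504$.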

Since all these probabilities are equal, it appears that $Y_1, Y_2, \ldots, Y_8$ are exchangeable. Indeed, we prove this result next.

Lemma 1
If $\theta \sim p(\theta)$ and $(Y_1, \ldots, Y_n \mid \theta)$ are i.i.d., then marginally $(Y_1, \ldots, Y_n)$ are exchangeable.
Exchangeability (Contd.)

Proof.
Let $(\pi_1, \ldots, \pi_n)$ be a permutation of $(1, \ldots, n)$. Then

$$\begin{aligned}
p(y_1, \ldots, y_n) &= \int p(y_1, \ldots, y_n \mid \theta)\, p(\theta)\, d\theta \\
&= \int \left[\prod_{i=1}^{n} p(y_i \mid \theta)\right] p(\theta)\, d\theta && \text{(i.i.d.)} \\
&= \int \left[\prod_{i=1}^{n} p(y_{\pi_i} \mid \theta)\right] p(\theta)\, d\theta && \text{(order of the product doesn't matter)} \\
&= p(y_{\pi_1}, \ldots, y_{\pi_n}).
\end{aligned}$$



Exchangeability (Contd.)

Remark 0.1
Frequentists assume the Bernoulli variables $X_1, X_2, \ldots, X_n$ are independent outcomes of the same experiment (e.g., flips of a coin); that is, they assume independence. But continuing to observe the $X_j$'s should result in a change of opinion about the distribution of future coin-flip outcomes (e.g., gradually learning the coin's bias), which unconditional independence cannot accommodate. Bayesian statisticians instead assume exchangeability, a weaker condition than independence.



de Finetti’s theorem
We have seen that if $\theta \sim p(\theta)$ and $(Y_1, \ldots, Y_n \mid \theta)$, $n \geq 1$, are i.i.d., then $(Y_1, \ldots, Y_n)$ are exchangeable. What about the converse, especially for large $n$?

Theorem 0.1
de Finetti's theorem. Let $Y_1, \ldots, Y_n$ be a finite subset of an infinite sequence of exchangeable, but not necessarily i.i.d., rvs, so that

$$p(y_1, \ldots, y_n) = p(y_{\pi_1}, \ldots, y_{\pi_n}), \quad \text{for all } n \geq 1,$$

and for all permutations $\pi = (\pi_1, \ldots, \pi_n)$. Then the joint distribution of $Y_1, \ldots, Y_n$ can be written as

$$p(y_1, \ldots, y_n) = \int \left[\prod_{i=1}^{n} p(y_i \mid \theta)\right] p(\theta)\, d\theta,$$

for some parameter $\theta$, prior $p(\theta)$, and sampling model $p(y \mid \theta)$.



de Finetti’s theorem

Thus, in general, we have

$$\big(Y_1, \ldots, Y_n \mid \theta \text{ are i.i.d. for some } \theta \sim p(\theta)\big) \iff \big(Y_1, \ldots, Y_n \text{ are exchangeable for all } n \geq 1\big).$$

Importantly, if we sample from a sufficiently large population, then we can model the observations as being approximately conditionally i.i.d.

We next give de Finetti's theorem for Bernoulli rvs. We need the following notion. A sequence $\{c_n\}_{n \geq 0}$ is said to be completely monotone if

$$(-1)^r \Delta^r c_n \geq 0, \quad r \geq 0,$$

where $\Delta c_n = c_{n+1} - c_n$, $\Delta^2 c_n = \Delta(\Delta c_n)$, and so on.
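As a sanity check (my own sketch), the moments $c_n = 1/(n+1)$ of the Uniform$[0,1]$ distribution should be, and are, completely monotone:

```python
# Verify (-1)^r Delta^r c_n >= 0 for c_n = 1/(n+1), the Uniform[0,1] moments.
import numpy as np

c = 1.0 / (np.arange(20) + 1.0)        # c_0, c_1, ..., c_19
for r in range(6):
    diffs = (-1) ** r * np.diff(c, r)  # (-1)^r Delta^r c_n for all n available
    print(r, bool(np.all(diffs >= 0)))  # True for every r
```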

The following result is from Feller (1971, Vol. 2, p. 225).



Alternative statement of theorem

Theorem 0.2
A sequence $\{c_k\}$ of reals with $c_0 = 1$ corresponds to the moments of a distribution function $F$ on $[0,1]$ if and only if it is completely monotone.

Proof. Let $X \sim F$ on $[0,1]$ and $c_k = E(X^k)$. Then

$$-\Delta c_k = c_k - c_{k+1} = E\big(X^k(1-X)\big) \geq 0.$$

Inductively, it follows that

$$(-1)^r \Delta^r c_k = E\big(X^k(1-X)^r\big) \geq 0, \quad \text{for all } r \geq 0,$$

and hence the moment sequence is completely monotone.
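Spelling out the inductive step (my addition), using $\Delta^{r+1} c_k = \Delta^r c_{k+1} - \Delta^r c_k$:

```latex
\[
  (-1)^{r+1}\Delta^{r+1} c_k
  = (-1)^{r}\Delta^{r} c_k - (-1)^{r}\Delta^{r} c_{k+1}
  = E\big(X^{k}(1-X)^{r}\big) - E\big(X^{k+1}(1-X)^{r}\big)
  = E\big(X^{k}(1-X)^{r+1}\big) \ge 0.
\]
```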


The converse part is rather complicated, as it involves Bernstein
polynomials.



de Finetti’s theorem for Bernoulli Variables

The following beautiful result is due to de Finetti.

Theorem 0.3
de Finetti's theorem. Let $\{X_n\}_{n \geq 1}$ be an exchangeable sequence of Bernoulli random variables and let $S_n = \sum_{i=1}^{n} X_i$. Then there exists a probability distribution $F$ on $[0,1]$ such that

$$P(X_1 = 1, \ldots, X_k = 1, X_{k+1} = 0, \ldots, X_n = 0) = \int_0^1 \theta^k(1-\theta)^{n-k}\, dF(\theta);$$

$$P(S_n = k) = \binom{n}{k} \int_0^1 \theta^k(1-\theta)^{n-k}\, dF(\theta) = E_\theta\big[\text{dbinom}(k \mid n, \theta)\big],$$

that is, $(X_1, \ldots, X_n \mid \theta)$ are i.i.d. $\mathrm{Ber}(\theta)$, and $(S_n \mid \theta) \sim \mathrm{Bin}(n, \theta)$.
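A sketch under an assumed mixing distribution $F = \mathrm{Beta}(a, b)$ (my choice of $F$, not the theorem's): the mixture $P(S_n = k)$ is then the beta-binomial pmf, which we can verify against the integral directly.

```python
# Check P(S_n = k) = C(n,k) * integral of theta^k (1-theta)^(n-k) dF(theta)
# against scipy's beta-binomial pmf, with F = Beta(a, b).
from scipy.integrate import quad
from scipy.stats import beta, betabinom, binom

a, b, n = 2.0, 3.0, 8   # assumed parameter values
for k in range(n + 1):
    integral, _ = quad(lambda t: binom.pmf(k, n, t) * beta.pdf(t, a, b), 0, 1)
    print(k, round(integral, 6), round(betabinom.pmf(k, n, a, b), 6))  # equal
```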



Proof
An outline of the proof. Let $c_0 = 1$, and for $0 \leq k \leq n$, let

$$p_{k,n} = P(X_1 = 1, \ldots, X_k = 1, X_{k+1} = 0, \ldots, X_n = 0).$$

For $n = 1, 2, \ldots$,

$$c_n = p_{n,n} = P(X_1 = 1, \ldots, X_n = 1).$$

Then, splitting on the value of $X_n$ and using exchangeability, it can be seen that

$$p_{n-1,n} = p_{n-1,n-1} - p_{n,n} = -\Delta c_{n-1}.$$

Thus, in general,

$$p_{k,n} = p_{k,n-1} - p_{k+1,n} = (-1)^{n-k} \Delta^{n-k} c_k \geq 0,$$

for all $0 \leq k \leq n$. That is, the sequence $\{c_n\}$ is completely monotone.


Proof (Contd.)
Hence, by Theorem 0.2, there exists a unique $F$ on $[0,1]$ such that

$$c_k = \int_0^1 \theta^k\, dF(\theta),$$

and

$$p_{k,n} = (-1)^{n-k} \Delta^{n-k} c_k = E\big(\theta^k(1-\theta)^{n-k}\big) = \int_0^1 \theta^k(1-\theta)^{n-k}\, dF(\theta).$$

Also, since $\{X_n\}$ is exchangeable,

$$P(S_n = k) = \binom{n}{k}\, p_{k,n} = \binom{n}{k} \int_0^1 \theta^k(1-\theta)^{n-k}\, dF(\theta),$$

which proves the theorem.
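To make the chain of identities concrete (my sketch, with $F = \mathrm{Uniform}[0,1]$ as an assumed choice, so $c_k = 1/(k+1)$), one can check numerically that the finite-difference formula for $p_{k,n}$ matches the beta integral $B(k+1, n-k+1) = k!(n-k)!/(n+1)!$:

```python
# p_{k,n} = (-1)^{n-k} Delta^{n-k} c_k should equal k!(n-k)!/(n+1)!
# when c_k = 1/(k+1), the moments of Uniform[0,1].
from math import factorial

import numpy as np

n = 8
c = 1.0 / (np.arange(n + 1) + 1.0)              # c_0, ..., c_n
for k in range(n + 1):
    r = n - k
    p_diff = (-1) ** r * np.diff(c, r)[k]       # (-1)^{n-k} Delta^{n-k} c_k
    p_beta = factorial(k) * factorial(n - k) / factorial(n + 1)
    print(k, round(float(p_diff), 8), round(p_beta, 8))  # the columns agree
```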

