
Chapter 2: Belief, Probability, and Exchangeability

MSU-STT-465: Summer-20B

Lecture 2: Independence and Exchangeability



Independent random variables
The rvs $Y_1, \ldots, Y_n$ are conditionally independent, given $\theta$, if

$$\Pr(Y_1 \in A_1, \ldots, Y_n \in A_n \mid \theta) = \Pr(Y_1 \in A_1 \mid \theta) \times \cdots \times \Pr(Y_n \in A_n \mid \theta).$$

Indeed, it follows that, for $j \neq i$,

$$\Pr(Y_i \in A_i \mid \theta, Y_j \in A_j) = \Pr(Y_i \in A_i \mid \theta),$$

showing that, given $\theta$, $Y_j$ carries no additional information about $Y_i$. Also, the joint density factors as

$$p(y_1, \ldots, y_n \mid \theta) = p_1(y_1 \mid \theta) \times \cdots \times p_n(y_n \mid \theta),$$

where $p_i(y_i \mid \theta)$ denotes the conditional density of $Y_i$. When the $Y_i$'s are observed from a common population with density $p(y \mid \theta)$,

$$p(y_1, \ldots, y_n \mid \theta) = \prod_{i=1}^{n} p(y_i \mid \theta),$$

that is, $Y_1, \ldots, Y_n$ are conditionally i.i.d. given $\theta$, and we write

$$(Y_1, \ldots, Y_n \mid \theta) \sim \text{i.i.d. } p(y \mid \theta).$$
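As a quick illustration (my own sketch, not from the slides), the two-stage structure can be simulated directly: draw $\theta$ once from an assumed Beta(2, 2) prior, then, given that value, draw the $Y_i$ i.i.d. Bernoulli($\theta$).

```python
# Minimal sketch of conditional i.i.d. sampling; the Beta(2, 2) prior and
# function name are my own choices, not from the lecture.
import numpy as np

rng = np.random.default_rng(0)

def sample_conditionally_iid(n, a=2.0, b=2.0):
    theta = rng.beta(a, b)               # one draw of the parameter
    y = rng.binomial(1, theta, size=n)   # given theta, the Y_i are i.i.d.
    return theta, y

theta, y = sample_conditionally_iid(8)
print(theta, y)
```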
Independence and Exchangeability

For simplicity, let $p(y_1, y_2, \ldots, y_8) = \Pr(Y_1 = y_1, Y_2 = y_2, \ldots, Y_8 = y_8)$. In many practical situations, we expect that the specific order of the observations is not important. For example, consider a random sample of 3 observations from an infinite population, where each unit may or may not have a certain property (with probability $\theta$). It makes sense to assign equal probabilities:

$$p(0,0,1) = p(1,0,0) = p(0,1,0),$$

since each of the three sequences has one 1 and two 0's. This property is called exchangeability. Loosely speaking, the subscripts of the observations convey no information about the outcomes.

Definition 0.1
Exchangeability. Let $Y_1, \ldots, Y_n$ be $n$ random variables. If the joint density satisfies $p(y_1, \ldots, y_n) = p(y_{\pi_1}, \ldots, y_{\pi_n})$ for all permutations $\pi = (\pi_1, \ldots, \pi_n)$ of $\{1, \ldots, n\}$, then $Y_1, \ldots, Y_n$ are exchangeable.
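For small discrete examples, Definition 0.1 can be checked by brute force. Below is a minimal sketch (my own, with a hypothetical helper name) that tests a joint pmf, stored as a dict, for permutation invariance.

```python
# Brute-force check of Definition 0.1 for a joint pmf {(y_1,...,y_n): prob}.
from itertools import permutations

def is_exchangeable(pmf, n, tol=1e-12):
    for y, p in pmf.items():
        for perm in permutations(range(n)):
            # p(y_1,...,y_n) must equal p(y_{pi_1},...,y_{pi_n})
            if abs(pmf.get(tuple(y[i] for i in perm), 0.0) - p) > tol:
                return False
    return True

# Two fair, independent coin flips: the joint pmf is permutation-invariant.
pmf = {(0, 0): 0.25, (0, 1): 0.25, (1, 0): 0.25, (1, 1): 0.25}
print(is_exchangeable(pmf, 2))  # True
```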



Independence and Exchangeability

Definition 0.2
Two random variables $X$ and $Y$ are said to have the identical distribution, denoted by $X \stackrel{d}{=} Y$, if $F_X(x) = F_Y(x)$ for all $x \in \mathbb{R}$.

Note: Let $X$ and $Y$ be random vectors. If $X \stackrel{d}{=} Y$, then $f(X) \stackrel{d}{=} f(Y)$, where $f$ is any real-valued (deterministic) function.

Clearly, a collection $(X_1, X_2, \ldots, X_n)$ of i.i.d. rvs is exchangeable, but the converse is not true, because exchangeability does not imply independence. Construct a counterexample (try with two discrete Bernoulli rvs; one is sketched below). However, it is easy to show that if $X_1, X_2, \ldots, X_n$ are exchangeable, then they have the same marginal distributions; that is, identical marginal distributions are retained.
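Here is one standard counterexample (my addition, following the hint above): put all the mass of two Bernoulli rvs on the mixed outcomes.

```latex
% Two Bernoulli rvs that are exchangeable but not independent:
\[
  P(X_1=0,\,X_2=1) = P(X_1=1,\,X_2=0) = \tfrac12, \qquad
  P(X_1=0,\,X_2=0) = P(X_1=1,\,X_2=1) = 0.
\]
% The joint pmf is permutation-invariant, so (X_1, X_2) is exchangeable,
% and marginally P(X_1=1) = P(X_2=1) = 1/2. However,
\[
  P(X_1=1,\,X_2=1) = 0 \neq \tfrac14 = P(X_1=1)\,P(X_2=1),
\]
% so X_1 and X_2 are not independent.
```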



Conditional Independence and Exchangeability

Assume the population has 1300 units and that we know the proportion $\theta$ of units having a particular property. We observe the property on 8 units, and suppose it is reasonable to assume

$$\Pr(Y_8 = 1 \mid \theta) = \theta,$$
$$\Pr(Y_8 = 1 \mid Y_1 = y_1, \ldots, Y_7 = y_7, \theta) = \theta,$$
$$\Pr(Y_4 = 1 \mid Y_1 = y_1, \ldots, Y_3 = y_3, Y_5 = y_5, \ldots, Y_8 = y_8, \theta) = \theta,$$

and similarly for the other conditional distributions. Since the population size is quite large compared to the sample size $n = 8$, sampling without replacement (WOR) is approximately the same as sampling with replacement (WR); see the numerical sketch below. That is, we may consider the $Y_i$'s to be conditionally independent.
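A rough numerical check of the WOR $\approx$ WR claim (my own sketch): with 1300 units of which, say, 390 have the property (so $\theta = 0.3$, an assumed value) and a sample of $n = 8$, the hypergeometric (WOR) and binomial (WR) pmfs of the number of successes nearly coincide.

```python
# Compare sampling without replacement (hypergeometric) with sampling
# with replacement (binomial) for a large population and small sample.
from scipy.stats import binom, hypergeom

pop, succ, draws = 1300, 390, 8   # population size, units with property, sample size
theta = succ / pop                # = 0.3, an assumed proportion
for k in range(draws + 1):
    p_wor = hypergeom.pmf(k, pop, succ, draws)  # without replacement
    p_wr = binom.pmf(k, draws, theta)           # with replacement
    print(k, round(p_wor, 5), round(p_wr, 5))   # the two columns nearly agree
```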



Exchangeability (Contd.)
Also, under the conditional independence of the $Y_i$, given $\theta$,

$$\Pr(Y_i = y_i \mid Y_j = y_j,\ j \neq i,\ \theta) = \Pr(Y_i = y_i \mid \theta) = \theta^{y_i}(1-\theta)^{1-y_i};$$

$$\Pr(Y_1 = y_1, Y_2 = y_2, \ldots, Y_8 = y_8 \mid \theta) = \prod_{i=1}^{8} \theta^{y_i}(1-\theta)^{1-y_i} = \theta^{\sum_{i=1}^{8} y_i}(1-\theta)^{8-\sum_{i=1}^{8} y_i}.$$

Suppose next that $\theta$ is unknown and we express our belief about it by a density $p(\theta)$. In that case,

$$p(y_1, y_2, \ldots, y_8) = \int_0^1 p(y_1, y_2, \ldots, y_8 \mid \theta)\, p(\theta)\, d\theta = \int_0^1 \theta^{\sum_{i=1}^{8} y_i}(1-\theta)^{8-\sum_{i=1}^{8} y_i}\, p(\theta)\, d\theta.$$
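This marginal is easy to compute numerically. A sketch (mine, assuming a Beta$(a, b)$ prior for $\theta$; the slides leave $p(\theta)$ generic):

```python
# Marginal p(y_1,...,y_n) = integral of theta^s (1-theta)^(n-s) p(theta) dtheta,
# with an assumed Beta(a, b) prior (uniform by default).
import numpy as np
from scipy.integrate import quad
from scipy.stats import beta

def marginal_prob(y, a=1.0, b=1.0):
    y = np.asarray(y)
    s, n = y.sum(), y.size
    val, _ = quad(lambda t: t**s * (1 - t)**(n - s) * beta.pdf(t, a, b), 0, 1)
    return val

# Any reordering of the 0's and 1's gives the same marginal probability:
print(marginal_prob([1, 0, 0, 1, 0, 0, 0, 1]))
print(marginal_prob([0, 0, 1, 1, 0, 0, 0, 1]))
```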



Exchangeability (Contd.)
So, for these four binary sequences,

$$p(1,0,0,1,0,0,0,1) = \int_0^1 \theta^3(1-\theta)^5\, p(\theta)\, d\theta,$$
$$p(0,0,1,1,0,0,0,1) = \int_0^1 \theta^3(1-\theta)^5\, p(\theta)\, d\theta,$$
$$p(1,1,0,0,0,0,0,1) = \int_0^1 \theta^3(1-\theta)^5\, p(\theta)\, d\theta,$$
$$p(0,0,1,0,0,1,1,0) = \int_0^1 \theta^3(1-\theta)^5\, p(\theta)\, d\theta.$$
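For a concrete value (my illustration, not in the original): under a uniform prior $p(\theta) = 1$ on $[0,1]$, each of these integrals reduces to a beta function,

```latex
\[
  \int_0^1 \theta^{3}(1-\theta)^{5}\, d\theta
  = B(4, 6)
  = \frac{3!\,5!}{9!}
  = \frac{1}{504},
\]
```

so all four sequences receive probability $1/504$.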

Since all these probabilities are equal, it appears that $Y_1, Y_2, \ldots, Y_8$ are exchangeable. Indeed, we prove this result next.

Lemma 1
If $\theta \sim p(\theta)$ and $(Y_1, \ldots, Y_n \mid \theta)$ are i.i.d., then marginally $(Y_1, \ldots, Y_n)$ are exchangeable.
Exchangeability (Contd.)

Proof.
Let $(\pi_1, \ldots, \pi_n)$ be a permutation of $(1, \ldots, n)$. Then

$$\begin{aligned}
p(y_1, \ldots, y_n) &= \int p(y_1, \ldots, y_n \mid \theta)\, p(\theta)\, d\theta \\
&= \int \left[\prod_{i=1}^{n} p(y_i \mid \theta)\right] p(\theta)\, d\theta && \text{(i.i.d.)} \\
&= \int \left[\prod_{i=1}^{n} p(y_{\pi_i} \mid \theta)\right] p(\theta)\, d\theta && \text{(order of the product doesn't matter)} \\
&= p(y_{\pi_1}, \ldots, y_{\pi_n}).
\end{aligned}$$



Exchangeability (Contd.)

Remark 0.1
Frequentists assume the Bernoulli variables $X_1, X_2, \ldots, X_n$ are independent outcomes of the same experiment (e.g., flips of a coin); that is, they assume independence. But continuing to observe the $X_j$'s should result in a change of opinion about the distribution of future coin-flip outcomes (e.g., gradually learning the coin's bias), which unconditional independence cannot accommodate. Bayesian statisticians instead assume exchangeability, a weaker condition than independence.



de Finetti’s theorem
We have seen that if $\theta \sim p(\theta)$ and $(Y_1, \ldots, Y_n \mid \theta)$, $n \geq 1$, are i.i.d., then $(Y_1, \ldots, Y_n)$ are exchangeable. What about the converse, especially for large $n$?

Theorem 0.1
de Finetti's theorem. Let $Y_1, \ldots, Y_n$ be a finite subset of an infinite sequence of exchangeable, but not necessarily i.i.d., rvs, so that

$$p(y_1, \ldots, y_n) = p(y_{\pi_1}, \ldots, y_{\pi_n}), \quad \text{for all } n \geq 1,$$

and for all permutations $\pi = (\pi_1, \ldots, \pi_n)$. Then the joint distribution of $Y_1, \ldots, Y_n$ can be written as

$$p(y_1, \ldots, y_n) = \int \left[\prod_{i=1}^{n} p(y_i \mid \theta)\right] p(\theta)\, d\theta,$$

for some parameter $\theta$, prior $p(\theta)$, and sampling model $p(y \mid \theta)$.



de Finetti’s theorem

Thus, in general, we have

$$\big(Y_1, \ldots, Y_n \mid \theta \text{ are i.i.d. for some } \theta \sim p(\theta)\big) \iff \big(Y_1, \ldots, Y_n \text{ are exchangeable for all } n \geq 1\big).$$

Importantly, if we sample from a sufficiently large population, then we can model the observations as being approximately conditionally i.i.d.

We next give de Finetti's theorem for Bernoulli rvs. We need the following notion. A sequence $\{c_n\}_{n \geq 0}$ is said to be completely monotone if

$$(-1)^r \Delta^r c_n \geq 0, \quad r \geq 0,$$

where $\Delta c_n = c_{n+1} - c_n$, $\Delta^2 c_n = \Delta(\Delta c_n)$, and so on.
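As a sanity check (my own sketch), the moments $c_n = 1/(n+1)$ of the Uniform$[0,1]$ distribution should be, and are, completely monotone:

```python
# Verify (-1)^r Delta^r c_n >= 0 for c_n = 1/(n+1), the Uniform[0,1] moments.
import numpy as np

c = 1.0 / (np.arange(20) + 1.0)        # c_0, c_1, ..., c_19
for r in range(6):
    diffs = (-1) ** r * np.diff(c, r)  # (-1)^r Delta^r c_n for all n available
    print(r, bool(np.all(diffs >= 0)))  # True for every r
```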

The following result is from Feller (1971, Vol. 2, p. 225).



Alternative statement of theorem

Theorem 0.2
A sequence $\{c_k\}$ of reals with $c_0 = 1$ corresponds to the moments of a distribution function $F$ on $[0,1]$ if and only if it is completely monotone.

Proof. Let $X \sim F$ on $[0,1]$ and $c_k = E(X^k)$. Then

$$-\Delta c_k = c_k - c_{k+1} = E\big(X^k(1-X)\big) \geq 0.$$

Inductively, it follows that

$$(-1)^r \Delta^r c_k = E\big(X^k(1-X)^r\big) \geq 0, \quad \text{for all } r \geq 0,$$

and hence the moment sequence is completely monotone.
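Spelling out the inductive step (my addition), using $\Delta^{r+1} c_k = \Delta^r c_{k+1} - \Delta^r c_k$:

```latex
\[
  (-1)^{r+1}\Delta^{r+1} c_k
  = (-1)^{r}\Delta^{r} c_k - (-1)^{r}\Delta^{r} c_{k+1}
  = E\big(X^{k}(1-X)^{r}\big) - E\big(X^{k+1}(1-X)^{r}\big)
  = E\big(X^{k}(1-X)^{r+1}\big) \ge 0.
\]
```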


The converse part is rather complicated, as it involves Bernstein
polynomials.



de Finetti’s theorem for Bernoulli Variables

The following beautiful result is due to de Finetti.

Theorem 0.3
de Finetti's theorem. Let $\{X_n\}_{n \geq 1}$ be an exchangeable sequence of Bernoulli random variables and let $S_n = \sum_{i=1}^{n} X_i$. Then there exists a probability distribution $F$ on $[0,1]$ such that

$$P(X_1 = 1, \ldots, X_k = 1, X_{k+1} = 0, \ldots, X_n = 0) = \int_0^1 \theta^k(1-\theta)^{n-k}\, dF(\theta);$$

$$P(S_n = k) = \binom{n}{k} \int_0^1 \theta^k(1-\theta)^{n-k}\, dF(\theta) = E_\theta\big[\text{dbinom}(k \mid n, \theta)\big],$$

that is, $(X_1, \ldots, X_n \mid \theta)$ are i.i.d. $\mathrm{Ber}(\theta)$, and $(S_n \mid \theta) \sim \mathrm{Bin}(n, \theta)$.
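A sketch under an assumed mixing distribution $F = \mathrm{Beta}(a, b)$ (my choice of $F$, not the theorem's): the mixture $P(S_n = k)$ is then the beta-binomial pmf, which we can verify against the integral directly.

```python
# Check P(S_n = k) = C(n,k) * integral of theta^k (1-theta)^(n-k) dF(theta)
# against scipy's beta-binomial pmf, with F = Beta(a, b).
from scipy.integrate import quad
from scipy.stats import beta, betabinom, binom

a, b, n = 2.0, 3.0, 8   # assumed parameter values
for k in range(n + 1):
    integral, _ = quad(lambda t: binom.pmf(k, n, t) * beta.pdf(t, a, b), 0, 1)
    print(k, round(integral, 6), round(betabinom.pmf(k, n, a, b), 6))  # equal
```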



Proof
An outline of the proof. Let $c_0 = 1$, and for $0 \leq k \leq n$, let

$$p_{k,n} = P(X_1 = 1, \ldots, X_k = 1, X_{k+1} = 0, \ldots, X_n = 0).$$

For $n = 1, 2, \ldots$,

$$c_n = p_{n,n} = P(X_1 = 1, \ldots, X_n = 1).$$

Then, splitting on the value of $X_n$ and using exchangeability, it can be seen that

$$p_{n-1,n} = p_{n-1,n-1} - p_{n,n} = -\Delta c_{n-1}.$$

Thus, in general,

$$p_{k,n} = p_{k,n-1} - p_{k+1,n} = (-1)^{n-k} \Delta^{n-k} c_k \geq 0,$$

for all $0 \leq k \leq n$. That is, the sequence $\{c_n\}$ is completely monotone.


Proof (Contd.)
Hence, by Theorem 0.2, there exists a unique $F$ on $[0,1]$ such that

$$c_k = \int_0^1 \theta^k\, dF(\theta),$$

and

$$p_{k,n} = (-1)^{n-k} \Delta^{n-k} c_k = E\big(\theta^k(1-\theta)^{n-k}\big) = \int_0^1 \theta^k(1-\theta)^{n-k}\, dF(\theta).$$

Also, since $\{X_n\}$ is exchangeable,

$$P(S_n = k) = \binom{n}{k}\, p_{k,n} = \binom{n}{k} \int_0^1 \theta^k(1-\theta)^{n-k}\, dF(\theta),$$

which proves the theorem.
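To make the chain of identities concrete (my sketch, with $F = \mathrm{Uniform}[0,1]$ as an assumed choice, so $c_k = 1/(k+1)$), one can check numerically that the finite-difference formula for $p_{k,n}$ matches the beta integral $B(k+1, n-k+1) = k!(n-k)!/(n+1)!$:

```python
# p_{k,n} = (-1)^{n-k} Delta^{n-k} c_k should equal k!(n-k)!/(n+1)!
# when c_k = 1/(k+1), the moments of Uniform[0,1].
from math import factorial

import numpy as np

n = 8
c = 1.0 / (np.arange(n + 1) + 1.0)              # c_0, ..., c_n
for k in range(n + 1):
    r = n - k
    p_diff = (-1) ** r * np.diff(c, r)[k]       # (-1)^{n-k} Delta^{n-k} c_k
    p_beta = factorial(k) * factorial(n - k) / factorial(n + 1)
    print(k, round(float(p_diff), 8), round(p_beta, 8))  # the columns agree
```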

