
Markov combinations

Wong Liang Ze

Institute of High Performance Computing, A*STAR

November 20, 2020


Outline

1 Motivation

2 Markov combinations of multivariate normal distributions

3 Markov combinations: the general case

References:
Combining statistical models (Massa & Lauritzen, 2010)
Algebraic representations of Gaussian Markov combinations (Massa & Riccomagno, 2017)
Motivation

Let A, B, C be jointly normal with variance 1.

Suppose we also know:

Cov(A, B) = 0.8
Cov(B, C) = 0.9

What can we say about Cov(A, C)?

$$\Sigma = \begin{pmatrix} 1 & 0.8 & ?\\ \cdot & 1 & 0.9\\ \cdot & \cdot & 1 \end{pmatrix}$$
Motivation

 
$$\Sigma(x) = \begin{pmatrix} 1 & 0.8 & x\\ \cdot & 1 & 0.9\\ \cdot & \cdot & 1 \end{pmatrix}$$

Intuitively,
x ≠ 0 (A and C should not be independent)
x ≠ 1 (but they should not be perfectly correlated)
x should be somewhere between 0 and 1

For Σ(x) to be a valid covariance matrix:
it should be positive semi-definite,
i.e. all eigenvalues should be ≥ 0.
Motivation

Let's see how the eigenvalues of Σ(x) change for −1 ≤ x ≤ 1:

[Plot: eigenvalues of Σ(x) as functions of x over [−1, 1]]
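That check is easy to reproduce numerically. A minimal sketch (NumPy, not from the slides): scan x over [−1, 1] and keep the values for which the smallest eigenvalue of Σ(x) is non-negative.

```python
import numpy as np

def sigma(x):
    """Candidate covariance matrix Σ(x) from the running example."""
    return np.array([[1.0, 0.8, x],
                     [0.8, 1.0, 0.9],
                     [x,   0.9, 1.0]])

xs = np.linspace(-1.0, 1.0, 2001)
min_eig = np.array([np.linalg.eigvalsh(sigma(x)).min() for x in xs])

valid = xs[min_eig >= 0]
print(f"Σ(x) is positive semi-definite for x in roughly [{valid.min():.3f}, {valid.max():.3f}]")
```

On this example the positive semi-definite range works out to roughly 0.46 ≤ x ≤ 0.98, already tighter than the intuitive bound 0 < x < 1.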
Motivation

Can we do better?

Can we get a more precise estimate for x?

Can we get an estimate with “nice” properties?

(What are some nice properties to have?)

What about the higher-dimensional case?


(where A, B, C are random vectors)
Markov combinations of multivariate normal distributions

Suppose we have normal random vectors:


    
$$X = \begin{pmatrix} A \\ B \end{pmatrix} \sim N\!\left(0,\ \Phi = \begin{pmatrix} \Phi_{AA} & \Phi_{AB} \\ \cdot & \Phi_{BB} \end{pmatrix}\right)$$

$$Y = \begin{pmatrix} B \\ C \end{pmatrix} \sim N\!\left(0,\ \Psi = \begin{pmatrix} \Psi_{BB} & \Psi_{BC} \\ \cdot & \Psi_{CC} \end{pmatrix}\right)$$

where ΦBB = ΨBB.

The Markov combination of X and Y is the random vector:

$$X * Y = \begin{pmatrix} A \\ B \\ C \end{pmatrix} \sim N\!\left(0,\ \begin{pmatrix} \Phi_{AA} & \Phi_{AB} & \Phi_{AB}\Phi_{BB}^{-1}\Psi_{BC} \\ \cdot & \Phi_{BB} & \Psi_{BC} \\ \cdot & \cdot & \Psi_{CC} \end{pmatrix}\right)$$
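As a concrete sketch (NumPy; the 1×1 blocks below are the numbers from the motivating example, not part of the slides), the combined covariance can be assembled directly from the blocks of Φ and Ψ:

```python
import numpy as np

# 1x1 blocks taken from the motivating example (A, B, C scalar); any
# conformable block sizes work the same way.
Phi_AA, Phi_AB, Phi_BB = np.array([[1.0]]), np.array([[0.8]]), np.array([[1.0]])
Psi_BB, Psi_BC, Psi_CC = Phi_BB, np.array([[0.9]]), np.array([[1.0]])

# Corner block of the Markov combination: Σ_AC = Φ_AB Φ_BB^{-1} Ψ_BC
Sigma_AC = Phi_AB @ np.linalg.inv(Phi_BB) @ Psi_BC

Sigma = np.block([
    [Phi_AA,     Phi_AB,   Sigma_AC],
    [Phi_AB.T,   Phi_BB,   Psi_BC  ],
    [Sigma_AC.T, Psi_BC.T, Psi_CC  ],
])
print(Sigma)   # the corner entry is 0.8 * 0.9 = 0.72
```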
Markov combinations of multivariate normal distributions

How can we interpret Σ of X ∗ Y?

$$\Sigma = \begin{pmatrix} \Phi & 0 \\ 0 & 0 \end{pmatrix} + \begin{pmatrix} 0 & 0 \\ 0 & \Psi \end{pmatrix} - \begin{pmatrix} 0 & 0 & 0 \\ 0 & \Phi_{BB} & 0 \\ 0 & 0 & 0 \end{pmatrix} + \begin{pmatrix} 0 & 0 & \Sigma_{AC} \\ 0 & 0 & 0 \\ \Sigma_{AC}^{\top} & 0 & 0 \end{pmatrix}, \qquad \Sigma_{AC} = \Phi_{AB}\Phi_{BB}^{-1}\Psi_{BC}$$

Simpler to look at Σ−1:

$$\Sigma^{-1} = \begin{pmatrix} \Phi^{-1} & 0 \\ 0 & 0 \end{pmatrix} + \begin{pmatrix} 0 & 0 \\ 0 & \Psi^{-1} \end{pmatrix} - \begin{pmatrix} 0 & 0 & 0 \\ 0 & (\Phi_{BB})^{-1} & 0 \\ 0 & 0 & 0 \end{pmatrix}$$
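A quick numerical check of the Σ−1 identity on the motivating example (a NumPy sketch, not from the slides): pad Φ−1, Ψ−1 and (ΦBB)−1 to the full index set and compare with the inverse of the Markov-combination covariance.

```python
import numpy as np

Phi = np.array([[1.0, 0.8],
                [0.8, 1.0]])            # covariance of (A, B)
Psi = np.array([[1.0, 0.9],
                [0.9, 1.0]])            # covariance of (B, C); Phi_BB == Psi_BB
Phi_BB = Phi[1:, 1:]

# Markov-combination covariance, with Σ_AC = Φ_AB Φ_BB^{-1} Ψ_BC = 0.72
s_ac = (Phi[:1, 1:] @ np.linalg.inv(Phi_BB) @ Psi[:1, 1:])[0, 0]
Sigma = np.array([[1.0,  0.8,  s_ac],
                  [0.8,  1.0,  0.9 ],
                  [s_ac, 0.9,  1.0 ]])

# Pad each inverse into its (A, B, C) positions and combine
pad_Phi = np.zeros((3, 3)); pad_Phi[:2, :2] = np.linalg.inv(Phi)
pad_Psi = np.zeros((3, 3)); pad_Psi[1:, 1:] = np.linalg.inv(Psi)
pad_BB  = np.zeros((3, 3)); pad_BB[1:2, 1:2] = np.linalg.inv(Phi_BB)

print(np.allclose(np.linalg.inv(Sigma), pad_Phi + pad_Psi - pad_BB))   # True
```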
Markov combinations of multivariate normal distributions

Non-zero entries of Σ−1 are edges in a Gaussian graphical model,
which encode conditional dependencies.

So Σ−1 includes:

$$\begin{pmatrix} \Phi^{-1} & 0 \\ 0 & 0 \end{pmatrix} + \begin{pmatrix} 0 & 0 \\ 0 & \Psi^{-1} \end{pmatrix} - \begin{pmatrix} 0 & 0 & 0 \\ 0 & (\Phi_{BB})^{-1} & 0 \\ 0 & 0 & 0 \end{pmatrix}$$

the edges/dependencies given by Φ−1
the edges/dependencies given by Ψ−1
minus the ‘double-counted’ dependencies in (ΦBB)−1 = (ΨBB)−1
and no other edges/dependencies

X ∗ Y has only the dependencies in X & Y – no more, no less.


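This is visible in the motivating 3-variable example (a NumPy sketch, not from the slides): the (A, C) entry of Σ(x)−1 vanishes exactly at the Markov-combination value x = 0.8 · 0.9 = 0.72, so the only edges in the graph are A–B and B–C.

```python
import numpy as np

def precision_entry_AC(x):
    """(A, C) entry of Σ(x)^{-1} for the running example."""
    Sigma = np.array([[1.0, 0.8, x],
                      [0.8, 1.0, 0.9],
                      [x,   0.9, 1.0]])
    return np.linalg.inv(Sigma)[0, 2]

for x in (0.60, 0.72, 0.90):
    print(f"x = {x:.2f}:  (Σ(x)^-1)[A, C] = {precision_entry_AC(x):+.4f}")
# Only x = 0.72 gives a zero (A, C) entry, i.e. no A–C edge: A ⊥⊥ C | B.
```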
Markov combinations

More generally, if we have probability density functions

pX(A, B) and pY(B, C)

such that the marginals on B agree

pX(B) = p(B) = pY(B)

we can form the Markov combination X ∗ Y whose pdf is

pX∗Y(A, B, C) = pX(A|B) pY(C|B) p(B)
             = pX(A, B) pY(B, C) / p(B)
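The same recipe works outside the Gaussian case. A small discrete sketch (NumPy; the probability tables are illustrative, not from the slides): combine two joint tables that share the same B-marginal.

```python
import numpy as np

# Illustrative joint tables (not from the slides); axes are (A, B) and (B, C).
pX = np.array([[0.30, 0.10],
               [0.20, 0.40]])            # pX(A, B)
pY = np.array([[0.35, 0.15],
               [0.05, 0.45]])            # pY(B, C)

pB = pX.sum(axis=0)                      # shared marginal p(B) = [0.5, 0.5]
assert np.allclose(pB, pY.sum(axis=1))   # marginals on B must agree

# p_{X*Y}(A, B, C) = pX(A, B) * pY(B, C) / p(B)
pXY = pX[:, :, None] * pY[None, :, :] / pB[None, :, None]

assert np.allclose(pXY.sum(axis=2), pX)  # marginalising out C recovers pX(A, B)
assert np.allclose(pXY.sum(axis=0), pY)  # marginalising out A recovers pY(B, C)
```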
Markov combinations

Properties of X ∗ Y:

Preserves marginals:

pX∗Y(A, B) = pX(A, B)        pX∗Y(B, C) = pY(B, C)

Conditional independence A ⊥⊥ C | B:

pX∗Y(A, C|B) = pX∗Y(A|B) pX∗Y(C|B)

Maximizes entropy:

HX∗Y(A ∪ C) ≥ HZ(A ∪ C)

for all other joint distributions pZ(A, B, C) that marginalize to pX and pY.
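For the Gaussian running example this maximum-entropy property can be checked directly (a NumPy sketch, not from the slides): among Gaussian completions Σ(x), entropy is increasing in det Σ(x), and the determinant peaks exactly at the Markov-combination value x = 0.72.

```python
import numpy as np

def log_det(x):
    """log det Σ(x); Gaussian differential entropy is
    0.5 * log((2*pi*e)^3 * det Σ), so it is maximized where det Σ(x) is."""
    Sigma = np.array([[1.0, 0.8, x],
                      [0.8, 1.0, 0.9],
                      [x,   0.9, 1.0]])
    sign, logdet = np.linalg.slogdet(Sigma)
    return logdet if sign > 0 else -np.inf

xs = np.linspace(0.46, 0.98, 5201)
best = xs[np.argmax([log_det(x) for x in xs])]
print(f"entropy-maximizing completion: x ≈ {best:.3f}")   # ≈ 0.72 = 0.8 * 0.9
```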
Summary

1 The Markov combination X ∗ Y is what we get when we include exactly the conditional dependencies from X and Y.

2 It maximizes entropy among distributions that preserve marginals.

3 We have a formula for the covariance matrix of X ∗ Y when X and Y are normal.

4 This formula corresponds to ‘pasting’ of graphical models.
