
Markov combinations

Wong Liang Ze

Institute of High Performance Computing, A*STAR

November 20, 2020


Outline

1 Motivation

2 Markov combinations of multivariate normal distributions

3 Markov combinations: the general case

References:
Combining statistical models (Massa & Lauritzen, 2010)
Algebraic representations of Gaussian Markov combinations (Massa & Riccomagno, 2017)
Motivation

Let A, B, C be jointly normal with variance 1.

Suppose we also know:

Cov(A, B) = 0.8
Cov(B, C) = 0.9

What can we say about Cov(A, C)?

$$\Sigma = \begin{pmatrix} 1 & 0.8 & ?\\ \cdot & 1 & 0.9\\ \cdot & \cdot & 1 \end{pmatrix}$$
Motivation

 
$$\Sigma(x) = \begin{pmatrix} 1 & 0.8 & x\\ \cdot & 1 & 0.9\\ \cdot & \cdot & 1 \end{pmatrix}$$

Intuitively,
x ≠ 0 (A and C should not be independent)
x ≠ 1 (but they should not be perfectly correlated)
x should be somewhere between 0 and 1

For Σ(x) to be a valid covariance matrix:
it should be positive semi-definite,
i.e. all eigenvalues should be ≥ 0.
Motivation

Let's see how the eigenvalues of Σ(x) change for −1 ≤ x ≤ 1:

[Plot: eigenvalues of Σ(x) as functions of x over [−1, 1]]
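That check is easy to reproduce numerically. A minimal sketch (NumPy, not from the slides): scan x over [−1, 1] and keep the values for which the smallest eigenvalue of Σ(x) is non-negative.

```python
import numpy as np

def sigma(x):
    """Candidate covariance matrix Σ(x) from the running example."""
    return np.array([[1.0, 0.8, x],
                     [0.8, 1.0, 0.9],
                     [x,   0.9, 1.0]])

xs = np.linspace(-1.0, 1.0, 2001)
min_eig = np.array([np.linalg.eigvalsh(sigma(x)).min() for x in xs])

valid = xs[min_eig >= 0]
print(f"Σ(x) is positive semi-definite for x in roughly [{valid.min():.3f}, {valid.max():.3f}]")
```

On this example the positive semi-definite range works out to roughly 0.46 ≤ x ≤ 0.98, already tighter than the intuitive bound 0 < x < 1.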
Motivation

Can we do better?

Can we get a more precise estimate for x?

Can we get an estimate with “nice” properties?

(What are some nice properties to have?)

What about the higher-dimensional case?


(where A, B, C are random vectors)
Markov combinations of multivariate normal distributions

Suppose we have normal random vectors:


    
$$X = \begin{pmatrix} A \\ B \end{pmatrix} \sim N\!\left(0,\ \Phi = \begin{pmatrix} \Phi_{AA} & \Phi_{AB} \\ \cdot & \Phi_{BB} \end{pmatrix}\right)$$

$$Y = \begin{pmatrix} B \\ C \end{pmatrix} \sim N\!\left(0,\ \Psi = \begin{pmatrix} \Psi_{BB} & \Psi_{BC} \\ \cdot & \Psi_{CC} \end{pmatrix}\right)$$

where ΦBB = ΨBB.

The Markov combination of X and Y is the random vector:

$$X * Y = \begin{pmatrix} A \\ B \\ C \end{pmatrix} \sim N\!\left(0,\ \begin{pmatrix} \Phi_{AA} & \Phi_{AB} & \Phi_{AB}\Phi_{BB}^{-1}\Psi_{BC} \\ \cdot & \Phi_{BB} & \Psi_{BC} \\ \cdot & \cdot & \Psi_{CC} \end{pmatrix}\right)$$
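As a concrete sketch (NumPy; the 1×1 blocks below are the numbers from the motivating example, not part of the slides), the combined covariance can be assembled directly from the blocks of Φ and Ψ:

```python
import numpy as np

# 1x1 blocks taken from the motivating example (A, B, C scalar); any
# conformable block sizes work the same way.
Phi_AA, Phi_AB, Phi_BB = np.array([[1.0]]), np.array([[0.8]]), np.array([[1.0]])
Psi_BB, Psi_BC, Psi_CC = Phi_BB, np.array([[0.9]]), np.array([[1.0]])

# Corner block of the Markov combination: Σ_AC = Φ_AB Φ_BB^{-1} Ψ_BC
Sigma_AC = Phi_AB @ np.linalg.inv(Phi_BB) @ Psi_BC

Sigma = np.block([
    [Phi_AA,     Phi_AB,   Sigma_AC],
    [Phi_AB.T,   Phi_BB,   Psi_BC  ],
    [Sigma_AC.T, Psi_BC.T, Psi_CC  ],
])
print(Sigma)   # the corner entry is 0.8 * 0.9 = 0.72
```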
Markov combinations of multivariate normal distributions

How can we interpret Σ of X ∗ Y?

$$\Sigma = \begin{pmatrix} \Phi & 0 \\ 0 & 0 \end{pmatrix} + \begin{pmatrix} 0 & 0 \\ 0 & \Psi \end{pmatrix} - \begin{pmatrix} 0 & 0 & 0 \\ 0 & \Phi_{BB} & 0 \\ 0 & 0 & 0 \end{pmatrix} + \begin{pmatrix} 0 & 0 & \Sigma_{AC} \\ 0 & 0 & 0 \\ \Sigma_{AC}^{\top} & 0 & 0 \end{pmatrix}, \qquad \Sigma_{AC} = \Phi_{AB}\Phi_{BB}^{-1}\Psi_{BC}$$

Simpler to look at Σ−1:

$$\Sigma^{-1} = \begin{pmatrix} \Phi^{-1} & 0 \\ 0 & 0 \end{pmatrix} + \begin{pmatrix} 0 & 0 \\ 0 & \Psi^{-1} \end{pmatrix} - \begin{pmatrix} 0 & 0 & 0 \\ 0 & (\Phi_{BB})^{-1} & 0 \\ 0 & 0 & 0 \end{pmatrix}$$
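A quick numerical check of the Σ−1 identity on the motivating example (a NumPy sketch, not from the slides): pad Φ−1, Ψ−1 and (ΦBB)−1 to the full index set and compare with the inverse of the Markov-combination covariance.

```python
import numpy as np

Phi = np.array([[1.0, 0.8],
                [0.8, 1.0]])            # covariance of (A, B)
Psi = np.array([[1.0, 0.9],
                [0.9, 1.0]])            # covariance of (B, C); Phi_BB == Psi_BB
Phi_BB = Phi[1:, 1:]

# Markov-combination covariance, with Σ_AC = Φ_AB Φ_BB^{-1} Ψ_BC = 0.72
s_ac = (Phi[:1, 1:] @ np.linalg.inv(Phi_BB) @ Psi[:1, 1:])[0, 0]
Sigma = np.array([[1.0,  0.8,  s_ac],
                  [0.8,  1.0,  0.9 ],
                  [s_ac, 0.9,  1.0 ]])

# Pad each inverse into its (A, B, C) positions and combine
pad_Phi = np.zeros((3, 3)); pad_Phi[:2, :2] = np.linalg.inv(Phi)
pad_Psi = np.zeros((3, 3)); pad_Psi[1:, 1:] = np.linalg.inv(Psi)
pad_BB  = np.zeros((3, 3)); pad_BB[1:2, 1:2] = np.linalg.inv(Phi_BB)

print(np.allclose(np.linalg.inv(Sigma), pad_Phi + pad_Psi - pad_BB))   # True
```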
Markov combinations of multivariate normal distributions

Non-zero entries of Σ−1 are edges in a Gaussian graphical model,
which encode conditional dependencies.

So Σ−1 includes:

$$\begin{pmatrix} \Phi^{-1} & 0 \\ 0 & 0 \end{pmatrix} + \begin{pmatrix} 0 & 0 \\ 0 & \Psi^{-1} \end{pmatrix} - \begin{pmatrix} 0 & 0 & 0 \\ 0 & (\Phi_{BB})^{-1} & 0 \\ 0 & 0 & 0 \end{pmatrix}$$

the edges/dependencies given by Φ−1
the edges/dependencies given by Ψ−1
minus the ‘double-counted’ dependencies in (ΦBB)−1 = (ΨBB)−1
and no other edges/dependencies

X ∗ Y has only the dependencies in X & Y – no more, no less.


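This is visible in the motivating 3-variable example (a NumPy sketch, not from the slides): the (A, C) entry of Σ(x)−1 vanishes exactly at the Markov-combination value x = 0.8 · 0.9 = 0.72, so the only edges in the graph are A–B and B–C.

```python
import numpy as np

def precision_entry_AC(x):
    """(A, C) entry of Σ(x)^{-1} for the running example."""
    Sigma = np.array([[1.0, 0.8, x],
                      [0.8, 1.0, 0.9],
                      [x,   0.9, 1.0]])
    return np.linalg.inv(Sigma)[0, 2]

for x in (0.60, 0.72, 0.90):
    print(f"x = {x:.2f}:  (Σ(x)^-1)[A, C] = {precision_entry_AC(x):+.4f}")
# Only x = 0.72 gives a zero (A, C) entry, i.e. no A–C edge: A ⊥⊥ C | B.
```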
Markov combinations

More generally, if we have probability density functions

pX(A, B) and pY(B, C)

such that the marginals on B agree

pX(B) = p(B) = pY(B)

we can form the Markov combination X ∗ Y whose pdf is

pX∗Y(A, B, C) = pX(A|B) pY(C|B) p(B)
             = pX(A, B) pY(B, C) / p(B)
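The same recipe works outside the Gaussian case. A small discrete sketch (NumPy; the probability tables are illustrative, not from the slides): combine two joint tables that share the same B-marginal.

```python
import numpy as np

# Illustrative joint tables (not from the slides); axes are (A, B) and (B, C).
pX = np.array([[0.30, 0.10],
               [0.20, 0.40]])            # pX(A, B)
pY = np.array([[0.35, 0.15],
               [0.05, 0.45]])            # pY(B, C)

pB = pX.sum(axis=0)                      # shared marginal p(B) = [0.5, 0.5]
assert np.allclose(pB, pY.sum(axis=1))   # marginals on B must agree

# p_{X*Y}(A, B, C) = pX(A, B) * pY(B, C) / p(B)
pXY = pX[:, :, None] * pY[None, :, :] / pB[None, :, None]

assert np.allclose(pXY.sum(axis=2), pX)  # marginalising out C recovers pX(A, B)
assert np.allclose(pXY.sum(axis=0), pY)  # marginalising out A recovers pY(B, C)
```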
Markov combinations

Properties of X ∗ Y:

Preserves marginals:

pX∗Y(A, B) = pX(A, B)        pX∗Y(B, C) = pY(B, C)

Conditional independence A ⊥⊥ C | B:

pX∗Y(A, C|B) = pX∗Y(A|B) pX∗Y(C|B)

Maximizes entropy:

HX∗Y(A ∪ C) ≥ HZ(A ∪ C)

for all other joint distributions pZ(A, B, C) that marginalize to pX and pY.
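For the Gaussian running example this maximum-entropy property can be checked directly (a NumPy sketch, not from the slides): among Gaussian completions Σ(x), entropy is increasing in det Σ(x), and the determinant peaks exactly at the Markov-combination value x = 0.72.

```python
import numpy as np

def log_det(x):
    """log det Σ(x); Gaussian differential entropy is
    0.5 * log((2*pi*e)^3 * det Σ), so it is maximized where det Σ(x) is."""
    Sigma = np.array([[1.0, 0.8, x],
                      [0.8, 1.0, 0.9],
                      [x,   0.9, 1.0]])
    sign, logdet = np.linalg.slogdet(Sigma)
    return logdet if sign > 0 else -np.inf

xs = np.linspace(0.46, 0.98, 5201)
best = xs[np.argmax([log_det(x) for x in xs])]
print(f"entropy-maximizing completion: x ≈ {best:.3f}")   # ≈ 0.72 = 0.8 * 0.9
```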
Summary

1 The Markov combination X ∗ Y is what we get when we include exactly the conditional dependencies from X and Y.

2 It maximizes entropy among distributions that preserve marginals.

3 We have a formula for the covariance matrix of X ∗ Y when X and Y are normal.

4 This formula corresponds to ‘pasting’ of graphical models.
