Professional Documents
Culture Documents
Aglietti2020 Part2
Aglietti2020 Part2
Gaussian Processes
Virginia Aglietti
1
Causal decision-making
1
Systems/processes decompose in sets of interconnected nodes
Eelworm
t +1
Soil
fumigants
Figure 1: Causal Graph for Crop Yield. Nodes denote variables, arrows
represent causal effects and dashed edges indicate unobserved confounders.
2
Systems/processes decompose in sets of interconnected nodes
Chlα Salin.
ΩA TA
pHsw DIC
Nutr. PCO2
Light Tem.
NEC
Figure 2: Causal Graph for Net Ecosystem Calcification (NEC). Dotted nodes
represent non-manipulative variables.
3
Systems/processes decompose in sets of interconnected nodes
BMI
Age Aspirin
Cancer
Statin PSA
Figure 3: Causal Graph for Prostate Specific Antigen (PSA) level. Dotted
nodes represent non-manipulative variables.
4
Common elements in these examples
Research goal
Efficiently find the system configuration that optimises the target node.
5
Research Goal
Research goal
Efficiently find the system configuration that optimises the target node.
6
Questions to answer
Soil
fumigants
7
Questions to answer
Soil
fumigants
7
Questions to answer
7
Research Goal
Research goal
Efficiently find the system configuration that optimises the target node.
1. Perform experiments
2. Integrate interventional and observational data
3. Transfer interventional information.
8
Causal models and do-calculus
Causal models and do-calculus (1 of 3)
9
Causal models and do-calculus (2 of 3)
C C
X Y X =x Y
C = fc (Uc ) C = fc (Uc )
X = fx (C , Ux ) X = x
Y = fy (X , C , Uy ) Y = fy (x, C , Uy )
X Y
R
Back-door adjustment: p(Y |do(X = x)) = P(Y |c, X = x)P(c)dc.
11
Causal Optimization
12
Gobal optimization vs. Causal optimization
X?s , x?s = arg min E[Y |do (Xs = xs )] x? = arg min E[Y |do (X = x)]
Xs ∈P(X) x∈D(X)
xs ∈D(Xs )
A E B E D A C
Y
Y
B C D
13
Solving Global Optimization
Bayesian Optimization
14
Bayesian optimization
15
Solving Causal Optimization
+
• Causal graph
Idea: Run interventions (Xs1 , xs1 ), . . . , (Xsn , xsn ) to find the optimum as
fast as possible.
16
Do we need to explore all 2|X| sets in P(X)? NO!
17
Do we need to explore all 2|X| sets in P(X)? NO!
X Z Y
X = X
Z = exp(−X ) + Z
Z
Y = cos(Z ) − exp(− ) + Y
20
MG,Y = {∅, {X }, {Z }}
PG,Y = {{Z }}
BG,Y = {{X , Z }}
18
Modelling E[Y|do (Xs = xs )] for each Xs with Causal GP prior
where
||x −x0 ||2
• kRBF (xs , x0s ) := exp(− s 2l 2 s )
q
• σ(xs ) = V̂(Y |do (Xs = xs )) with V̂ is the variance of the causal
effects estimated from observational data.
19
Modelling E[Y|do (Xs = xs )] for each Xs with Causal GP prior
α? := max{α1 , . . . , α|es| }
s? = argmax αs
s∈{1,··· ,|es|}
21
Solving the intervention-observation trade-off
-greedy criteria
Vol(C(DO )) N
= × ,
Vol(×X ∈X (D(X ))) Nmax
23
Simulation analysis
24
Take home messages:
24
Limitations of CBO
25
Limitation of CBO
X Z Y
MG,Y = {∅, {X }, {Z }}
• fX (x)
• fZ (z)
• fX ,Z (x, z)
26
Limitation of CBO
Research goal
Efficiently find the system configuration that optimises the target node.
27
Research goal
|P(X)|
T = {ts (x)}s=1 ts (x) = fs (x) = Ep(Y |do(Xs =x)) [Y ] = E[Y |do (Xs = x)].
I
I Ns
given DO = {xn , yn }N I I I I
S
n=1 and D = (X , Y ) with X = s {xsi }i=1 and
I
N
YI = s {ysiI }i=1
S s
.
Goal
Define p(T) and compute p(T|DI ) so as to make probabilistic
predictions for T at some unobserved intervention sets and levels.
28
Steps in the methodology
|P(X)|
1. Study the correlation among functions in T = {fs (x)}s=1 which
varies with the topology of G.
2. Define a joint prior distribution p(T).
3. Develop a multi-task model based on p(T) so as to compute the
posterior p(T|DI ).
29
1. Study the correlation among intervention functions
Z Z
Ls (f )(x) = ··· πs (x, (vsN , c))f (v, c)dvsN dc,
30
1. Study the correlation among intervention functions
Z Z
Ls (f )(x) = ··· πs (x, (vsN , c))f (v, c)dvsN dc,
• f (v, c) = E Y |do (I = v) , CN = c
• CN is the set of variables in G that are directly confounded with Y
and are not colliders.
• I is the set Pa(Y ).
• πs (x, (vsN , c)) = p(cIs |cN N N
s )p(vs , cs |do (Xs = x)) is the integrating
measure for the set Xs .
30
1. Study the correlation among intervention functions
31
1. Study the correlation among intervention functions
• (a) I = {Z }, CN = ∅
• (b) I = {E , D}, CN = {A, B}
32
2. Define a joint prior p(T)
33
2. Define a joint prior p(T)
where V̂ and Ê represent the variance and expectation of the causal effects
estimated from DO using do-calculus.
34
2. Define a joint prior p(T)
• Step (2): propagating this prior through Ls (·) for all ts (x)
35
The DAG-GP model
36
A helicopter view
Interventional data
No Yes
Single-task Multi-task
Observational data
gp
dag-gp
Mechanistic p(T ) = s p(ts (x)) p(T) = s p(ts (x)|f )
No model
ts (x) ∼ GP(0, KRBF (x, x′ )) ts (x) = f (b)πs (x, bs )dbs
f (b) ∼ GP(0, KRBF (b, b′ ))
gp+
dag-gp+
Yes do-calculus p(T ) = s p(ts (x)) p(T) = s p(ts (x)|f )
ts (x) ∼ GP(m+ (x), K + (x, x′ )) ts (x) = f (b)πs (x, bs )dbs
f (b) ∼ GP(m+ (b), K + (b, b′ ))
37
Learning intervention function from data
38
DAG-GP as surrogate model in CBO
39
DAG-GP as surrogate model in CBO
Figure 9: Convergence of the CBO algorithm to the global optimum for PSA
(E[PSA? |do (Xs = x)]) when a single-task or a multi-task GP model are used as
surrogate models.
40
Take home messages:
40
Thanks for your attention!
40