
Probabilistic evaluation of sequential plans from causal models with hidden variables

Judea Pearl
Cognitive Systems Laboratory
Computer Science Department
University of California, Los Angeles
Los Angeles, CA 90095-1596
judea@cs.ucla.edu

James Robins
Departments of Epidemiology & Biostatistics
School of Public Health
Harvard University
Boston, MA 02115
robins@hsph.harvard.edu

Abstract
The paper concerns the probabilistic evaluation of plans in the presence of unmeasured variables, each plan consisting of several concurrent or sequential actions. We establish a graphical criterion for recognizing when the effects of a given plan can be predicted from passive observations on measured variables only. When the criterion is satisfied, a closed-form expression is provided for the probability that the plan will achieve a specified goal.

Key words: Plan evaluation, causal effect, sequential treatments, causal diagrams, graphical models.

Figure 1: Illustrating the problem of evaluating the effect of the plan $(do(x_1), do(x_2))$ on $Y$, from nonexperimental data taken on $X_1$, $Z$, $X_2$ and $Y$.

1 INTRODUCTION

The problem addressed in this paper is the probabilistic evaluation of the effects of plans, when knowledge is encoded in the form of a partially specified causal diagram. We are given the topology of the diagram but not the conditional probabilities on all variables. Numerical probabilities are given to only a subset of variables which are deemed "observable", while those deemed "unobservable" serve only to specify possible connections among observed quantities, but are not given numerical probabilities.

To motivate the discussion, consider an example discussed in Robins (1993, Appendix 2), as depicted in Figure 1. The variables $X_1$ and $X_2$ stand for treatments that physicians prescribe to a patient at two different times, $Z$ represents observations that the second physician consults to determine $X_2$, and $Y$ represents the patient's survival. The hidden variables $U_1$ and $U_2$ represent, respectively, part of the patient history and the patient disposition to recover. A simple realization of such structure could be found among AIDS patients, where $Z$ represents episodes of PCP - a common opportunistic infection of AIDS patients which, as the diagram shows, does not have a direct effect on survival ($Y$) (since it can be treated effectively) but is an indicator of the patient's underlying immune status ($U_2$) which can cause death ($Y$). $X_1$ and $X_2$ stand for bactrim - a drug that prevents PCP ($Z$) and may also prevent death by other mechanisms. Doctors used the patient's earlier PCP history ($U_1$) to prescribe $X_1$, but its value was not recorded for data analysis.

The problem we face is as follows. Assume we have collected a large amount of data on the behavior of many patients and physicians, which is summarized in the form of (an estimated) joint distribution $P$ of the observed four variables $(X_1, Z, X_2, Y)$. A new patient comes in and we wish to determine the impact of the (unconditional) plan $(do(x_1), do(x_2))$ on survival ($Y$), where $x_1$ and $x_2$ are two predetermined dosages of bactrim, to be administered at two prespecified times.

More generally, our problem amounts to that of evaluating a new plan by watching the performance of other planners whose decision strategies are indiscernible. Physicians do not provide a description of all inputs which prompted them to prescribe a given treatment; all they communicate to us is that $U_1$ was consulted in determining $X_1$ and that $Z$ and $X_1$ were consulted in determining $X_2$. But $U_1$, unfortunately, was not
Plan Evaluation 445

recorded.

The problem of learning from the performance of other planners is that one is never sure whether an observed response is due to the planner's action or due to the event which triggered that action and simultaneously caused the response. Such events are called "confounders". The standard technique for dealing with potential confounders is to adjust for possible variations of confounders by stratification. This amounts to conditioning the distribution on the various states of the confounding variables, evaluating the effect of the plan in each state separately, then taking the (weighted) average over those states. However, in planning problems like the one above stratification is complicated by two problems. First, some of the potential confounders are unobservable (e.g., $U_1$), so they cannot be conditioned on. Second, some of the confounders (e.g., $Z$) are affected by the control variables, and one of the deadliest sins in the design of statistical experiments [Cox 1958, page 48] is to stratify on such variables. The sin is that stratification simulates holding a variable constant, but holding constant a variable that stands between an action and its consequence prevents us from obtaining an accurate reading on the unmediated effect of that action.

The techniques developed in this paper will enable us to recognize in general, by graphical means, whether a proposed plan can be evaluated from the joint distribution on the observables and, if the answer is positive, which covariates should be adjusted for, and how.

Our starting point is a knowledge specification scheme in the form of a causal diagram, like the one shown in Figure 1, which provides a qualitative summary of the analyst's understanding of the relevant data-generating processes.^1 The semantics behind causal diagrams and their relations to actions and belief networks have been discussed in prior publications [Pearl & Verma 1991, Goldszmidt & Pearl 1992, Druzdzel & Simon 1993, Pearl 1993a, Spirtes et al. 1993, Pearl 1993b]. In Spirtes et al. (1993) and later in Pearl (1993b), for example, it was shown how causal networks can be used to facilitate quantitative predictions of the effects of interventions, including interventions that were not contemplated during the network's construction. A more recent paper [Pearl 1994] reviews this aspect of causal networks, and proposes a calculus for deriving probabilistic assessments of the effects of actions in the presence of unmeasured variables. Using this calculus (reviewed in Appendix I) graphical criteria can be established for deciding whether the effect of one variable ($X$) on another ($Y$) is identifiable from sample data involving only observed variables, namely, whether it is possible to extract from such data a consistent estimate of the probability of $Y$ under hypothetical interventions with variables $X$. In a related paper [Galles & Pearl 1995] it is shown that the identification of causal effect between two singleton variables (say $X_1$ and $Y_1$) can be accomplished systematically, in time polynomial in the number of variables in the graph.

This paper extends certain results of [Galles & Pearl 1995] to the case where $X$ stands for a compound action, consisting of several atomic interventions which are implemented either concurrently or sequentially. We establish a graphical criterion for recognizing when the effect of $X$ on $Y$ is identifiable and, in case the diagram satisfies this criterion, we provide a closed-form expression for the distribution of an outcome variable $Y$ under the plan defined by the compound action $do(X = x)$. The derived expressions invoke only measured probabilities as obtained, for example, by recording past performances of other planning agents or, in case the elements of $X$ are not controlled by agents, by taking passive measurements from the environment. If $Y$ stands for a goal variable, then the formula provides an expression for the probability that the plan $X$ would lead to goal satisfaction.

^1 An alternative specification scheme using counterfactual statements was developed earlier by Robins (1986, 1987), and was used to study the identification problem by non-graphical techniques. Robins' scheme extended Rubin's (1978) counterfactual scheme for singleton actions to compound actions and plans.

2 PLAN IDENTIFICATION

Notation:
A control problem consists of a directed acyclic graph (DAG) $G$ with vertex set $V$, partitioned into four disjoint sets $V = \{X, Z, U, Y\}$:

$X$ - represents the set of control variables (exposures, interventions, etc.)

$Z$ - represents the set of observed variables, often called covariates.

$U$ - represents the set of unobserved (latent) variables.

$Y$ - represents an outcome variable.

In this section, we let the control variables be temporally ordered $X = X_1, X_2, \dots, X_n$ such that every $X_k$ is an ancestor of $X_{k+j}$ ($j > 0$) in $G$, and we let the outcome $Y$ be a descendent of $X_n$. We relax these assumptions in Section 6. Let $N_k$ stand for the set of observed nodes that are nondescendents of $X_k$. A plan is an ordered sequence $(\hat{x}_1, \hat{x}_2, \dots, \hat{x}_n)$ of value-assignments to the control variables, where $\hat{x}_k$ means "$X_k$ is set to $x_k$". A conditional plan is an ordered sequence $(\hat{g}_1(z), \hat{g}_2(z), \dots, \hat{g}_n(z))$ where each $g_k$ is a function from $Z$ to $X_k$, and $\hat{g}_k(z)$ stands for the statement "set $X_k$ to $g_k(z)$ whenever $Z$ attains the value $z$". The support of each $g_k(z)$ function must not contain any $Z$ variables which are descendants of $X_k$ in $G$.
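
The notation above can be made concrete with a minimal Python sketch. It encodes a control problem as a list of directed edges, checks the temporal-ordering assumptions of this section, and computes the sets $N_k$. The edge list is an assumption: it is reconstructed from the verbal description of Figure 1, and the arrow from U1 to Z in particular is a guess. All function and variable names are hypothetical, and $N_k$ is restricted here to covariates for simplicity.

```python
# Assumed edge list for Figure 1 (reconstructed from the prose; U1 -> Z is a guess).
FIG1_EDGES = [
    ("U1", "X1"), ("U1", "Z"), ("X1", "Z"), ("U2", "Z"),
    ("Z", "X2"), ("X1", "X2"), ("U2", "Y"), ("X1", "Y"), ("X2", "Y"),
]
CONTROLS = ["X1", "X2"]   # X, in temporal order
COVARIATES = {"Z"}        # Z, observed covariates
LATENT = {"U1", "U2"}     # U, unobserved variables
OUTCOME = "Y"             # Y


def descendants(edges, node):
    """Return all proper descendants of `node` in the DAG given by `edges`."""
    children = {}
    for parent, child in edges:
        children.setdefault(parent, set()).add(child)
    found, stack = set(), [node]
    while stack:
        for child in children.get(stack.pop(), ()):
            if child not in found:
                found.add(child)
                stack.append(child)
    return found


def check_temporal_order(edges, controls, outcome):
    """Section 2 assumptions: X_k is an ancestor of each later control, Y descends from X_n."""
    for k, xk in enumerate(controls):
        if not set(controls[k + 1:]).issubset(descendants(edges, xk)):
            return False
    return outcome in descendants(edges, controls[-1])


def nondescendant_covariates(edges, covariates, xk):
    """N_k restricted to covariates: observed covariates that are not descendants of X_k."""
    return covariates - descendants(edges, xk)


ORDER_OK = check_temporal_order(FIG1_EDGES, CONTROLS, OUTCOME)
N = [nondescendant_covariates(FIG1_EDGES, COVARIATES, xk) for xk in CONTROLS]
print(ORDER_OK, N)
```

Under the assumed edges, $N_1$ is empty because $Z$ descends from $X_1$, while $N_2$ contains $Z$.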
446 Pearl and Robins

Our problem is to evaluate an unconditional plan^2, namely, to compute $P(y \mid \hat{x}_1, \hat{x}_2, \dots, \hat{x}_n)$ which represents the impact of the plan $(\hat{x}_1, \dots, \hat{x}_n)$ on the outcome variable $Y$. The expression $P(y \mid \hat{x}_1, \hat{x}_2, \dots, \hat{x}_n)$ is said to be identifiable in $G$ if, for every assignment $(\hat{x}_1, \hat{x}_2, \dots, \hat{x}_n)$, the expression can be determined uniquely from the joint distribution of the observables $\{X, Y, Z\}$. A control problem is said to be identifiable whenever $P(y \mid \hat{x}_1, \hat{x}_2, \dots, \hat{x}_n)$ is identifiable.

^2 Identification of conditional plans has been considered in [Robins 1986, 1987] and certain extensions of our graphical results are presented in [Pearl 1994, Robins & Pearl 1995].

Our main identifiability criteria are presented in Theorems 1 and 3 below. These invoke d-separation tests on various subgraphs of $G$, defined as follows. Let $X$, $Y$, and $Z$ be arbitrary disjoint sets of nodes in a DAG $G$. We use the expression $(X \perp\!\!\!\perp Y \mid Z)_G$ to denote that the set $Z$ d-separates $X$ from $Y$ in $G$. We denote by $G_{\overline{X}}$ the graph obtained by deleting from $G$ all arrows pointing to nodes in $X$. Likewise, we denote by $G_{\underline{X}}$ the graph obtained by deleting from $G$ all arrows emerging from nodes in $X$. To represent the deletion of both incoming and outgoing arrows, we use the notation $G_{\overline{X}\underline{Z}}$. Finally, the expression $P(y \mid \hat{x}, z) = P(y, z \mid \hat{x}) / P(z \mid \hat{x})$ stands for the probability of $Y = y$ given that $Z = z$ is observed and $X$ is held constant at $x$.

3 ADMISSIBLE MEASUREMENTS

Theorem 1 $P(y \mid \hat{x}_1, \dots, \hat{x}_n)$ is identifiable if for every $1 \le k \le n$ there exists a set $Z_k$ of covariates satisfying:

$$Z_k \subseteq N_k, \tag{1}$$

(i.e., $Z_k$ consists of non-descendants of $X_k$) and

$$(Y \perp\!\!\!\perp X_k \mid X_1, \dots, X_{k-1}, Z_1, Z_2, \dots, Z_k)_{G_{\underline{X}_k, \overline{X}_{k+1}, \dots, \overline{X}_n}} \tag{2}$$

When the conditions above are satisfied, the plan evaluates to^3

$$P(y \mid \hat{x}_1, \dots, \hat{x}_n) = \sum_{z_1, \dots, z_n} P(y \mid z_1, \dots, z_n, x_1, \dots, x_n) \prod_{k=1}^{n} P(z_k \mid z_1, \dots, z_{k-1}, x_1, \dots, x_{k-1}) \tag{3}$$

^3 The computation and estimation of sum-product expressions of the form given in Eq. (3), where $Z_k$ stand for any subset of $N_k$, were investigated by J. Robins under the rubric "G-computation algorithm formula" [Robins 1986], hereafter G-formula.

Before presenting its proof, let us demonstrate how Theorem 1 can be used to test the identifiability of the control problem shown in Figure 1. First, we will show that $P(y \mid \hat{x}_1, \hat{x}_2)$ cannot be identified without measuring $Z$, namely, the sequence $Z_1 = \emptyset$, $Z_2 = \emptyset$ would not satisfy conditions (1)-(2). The two d-separation tests encoded in (2) are:

$$(Y \perp\!\!\!\perp X_1)_{G_{\underline{X}_1, \overline{X}_2}} \quad \text{and} \quad (Y \perp\!\!\!\perp X_2 \mid X_1)_{G_{\underline{X}_2}}$$

The two subgraphs associated with these tests are shown in Figure 2, (a) and (b), respectively. We see that, while $(Y \perp\!\!\!\perp X_1)$ holds in $G_{\underline{X}_1, \overline{X}_2}$, $(Y \perp\!\!\!\perp X_2 \mid X_1)$ fails to hold in $G_{\underline{X}_2}$. Thus, in order to pass the test, we must have either $Z_1 = \{Z\}$ or $Z_2 = \{Z\}$ but, since $Z$ is a descendant of $X_1$, only the second alternative remains: $Z_2 = \{Z\}$. The tests applicable to the sequence $Z_1 = \emptyset$, $Z_2 = \{Z\}$ are: $(Y \perp\!\!\!\perp X_1)_{G_{\underline{X}_1, \overline{X}_2}}$ and $(Y \perp\!\!\!\perp X_2 \mid X_1, Z)_{G_{\underline{X}_2}}$. Figure 2 shows that both tests are now satisfied, because $\{X_1, Z\}$ d-separates $Y$ from $X_2$ in Figure 2(b). Having satisfied conditions (1)-(2), Eq. (3) provides a formula for the effect of plan $(\hat{x}_1, \hat{x}_2)$ on $Y$:

$$P(y \mid \hat{x}_1, \hat{x}_2) = \sum_{z} P(y \mid z, x_1, x_2)\, P(z \mid x_1) \tag{4}$$

Figure 2: The two subgraphs of $G$ used in testing the identifiability of $(\hat{x}_1, \hat{x}_2)$; (a) $G_{\underline{X}_1, \overline{X}_2}$ and (b) $G_{\underline{X}_2}$.

The question naturally arises whether the sequence $Z_1 = \emptyset$, $Z_2 = \{Z\}$ can be identified without exhaustive search. This will be answered in Corollary 2 and Theorem 3.

Proof of Theorem 1: The proof given here is based on the inference rules described in Appendix I which facilitate the reduction of causal-effect formulas to hat-free expressions. An alternative proof is provided in Section 6.1.

1. The condition $Z_k \subseteq N_k$ implies $Z_k \subseteq N_j$ for all $j \ge k$. Therefore, we have

$$P(z_k \mid z_1, \dots, z_{k-1}, x_1, \dots, x_{k-1}, \hat{x}_k, \hat{x}_{k+1}, \dots, \hat{x}_n) = P(z_k \mid z_1, \dots, z_{k-1}, x_1, \dots, x_{k-1})$$


This is so because no node in $\{Z_1, \dots, Z_k, X_1, \dots, X_{k-1}\}$ can be a descendant of any node in $\{X_k, \dots, X_n\}$, hence, Rule 3 allows us to delete the hat variables from the expression.

2. Condition (2) permits us to invoke Rule 2 and write:

$$P(y \mid z_1, \dots, z_k, x_1, \dots, x_{k-1}, \hat{x}_k, \hat{x}_{k+1}, \dots, \hat{x}_n) = P(y \mid z_1, \dots, z_k, x_1, \dots, x_{k-1}, x_k, \hat{x}_{k+1}, \dots, \hat{x}_n)$$

Thus, we have

$$
\begin{aligned}
P(y \mid \hat{x}_1, \dots, \hat{x}_n)
&= \sum_{z_1} P(y \mid z_1, \hat{x}_1, \dots, \hat{x}_n)\, P(z_1 \mid \hat{x}_1, \dots, \hat{x}_n) \\
&= \sum_{z_1} P(y \mid z_1, x_1, \hat{x}_2, \dots, \hat{x}_n)\, P(z_1) \\
&= \sum_{z_1} \sum_{z_2} P(y \mid z_1, z_2, x_1, \hat{x}_2, \dots, \hat{x}_n)\, P(z_1)\, P(z_2 \mid z_1, x_1, \hat{x}_2, \dots, \hat{x}_n) \\
&= \sum_{z_1} \sum_{z_2} P(y \mid z_1, z_2, x_1, x_2, \hat{x}_3, \dots, \hat{x}_n)\, P(z_1)\, P(z_2 \mid z_1, x_1) \\
&\;\;\vdots \\
&= \sum_{z_1, \dots, z_n} P(y \mid z_1, \dots, z_n, x_1, \dots, x_n) \prod_{k=1}^{n} P(z_k \mid z_1, \dots, z_{k-1}, x_1, \dots, x_{k-1})
\end{aligned}
$$

Definition 1 Any sequence $Z_1, \dots, Z_n$ of covariates satisfying conditions (1) and (2) will be called "admissible" and any expression $P(y \mid \hat{x}_1, \hat{x}_2, \dots, \hat{x}_n)$ which is identifiable by the criterion of Theorem 1 will be called G-identifiable.

An immediate corollary of the definition above is

Corollary 1 A control problem is G-identifiable if it has an admissible sequence.

G-identifiability is sufficient but not necessary for plan identifiability as defined above (see also Definition 3, Appendix I). The reasons are twofold. First, the completeness of the three inference rules used in the reduction of (3) is still a pending conjecture. Second, the $k$th step in the reduction of (3) refrains from conditioning on variables $Z_k$ that are descendants of $X_k$; namely, variables that may be affected by the action $do(X_k = x_k)$. In certain causal structures, the identifiability of causal effects requires that we condition on such variables [Pearl 1994].

Theorem 1 provides a declarative condition for plan identifiability. It can be used to ratify a proposed causal effect formula for a given plan, but does not provide an effective procedure for deriving such formulas, because the choice of each $Z_k$ is not spelled out procedurally; the possibility exists that some choices of $Z_k$, satisfying (1) and (2), might prevent us from continuing the reduction process even in cases where such reduction exists.

This is illustrated in Figure 3. Here $W$ is an admissible choice for $Z_1$, but if we make this choice we will not be able to complete the reduction, since no set $Z_2$ can be found that satisfies condition (2): $(Y \perp\!\!\!\perp X_2 \mid X_1, W, Z_2)_{G_{\underline{X}_2}}$. In this case it would be wiser to choose $Z_1 = Z_2 = \emptyset$, which satisfies both: $(Y \perp\!\!\!\perp X_1 \mid \emptyset)_{G_{\underline{X}_1, \overline{X}_2}}$ and $(Y \perp\!\!\!\perp X_2 \mid X_1, \emptyset)_{G_{\underline{X}_2}}$.

Figure 3: Illustrating an admissible choice $Z_1 = W$ that rules out any admissible choice for $Z_2$.

4 EVALUATION BY G-FORMULA

Let $L_k$ consist of all non-descendants of $X_k$ which are descendants of $X_{k-1}$, including both observed and unobserved variables but exclusive of the controlled variables. Robins [1987] has shown, using counterfactual analysis, that

$$P(y \mid \hat{x}_1, \dots, \hat{x}_n) = \sum_{l_1, \dots, l_n} P(y \mid l_1, \dots, l_n, x_1, \dots, x_n) \prod_{k=1}^{n} P(l_k \mid l_1, \dots, l_{k-1}, x_1, \dots, x_{k-1}) \tag{5}$$

and named (5) the G-formula based on $L_1, \dots, L_n$. One way of verifying (5) is to write the post-intervention distribution on all uncontrolled variables (using (18))

$$P(y, l_1, \dots, l_n \mid \hat{x}_1, \dots, \hat{x}_n) = P(y \mid pa_Y) \prod_{k=1}^{n} P(l_k \mid pa_{L_k}) \tag{6}$$

then take the marginal distribution on $Y$ by summing on the $l_k$'s. The identity of (6) and (5) follows from the independence

$$P(l_k \mid pa_{L_k}) = P(l_k \mid l_1, \dots, l_{k-1}, x_1, \dots, x_{k-1}) \tag{7}$$

Upon explicating the $l_k$'s in (5), we may find that some factors contain latent variables. When this happens we may try to use the conditional independencies encoded in the graph to eliminate those latent variables and, if we succeed, the plan would be identifiable and the resulting formula would give the desired causal effect.

Let us demonstrate this method on the control problem of Figure 1. The $L_k$ sequence is given by $L_1 = \{U_1\}$ and $L_2 = \{Z, U_2\}$. Substituting in the G-formula yields

$$P(y \mid \hat{x}_1, \hat{x}_2) = \sum_{u_1, u_2, z} P(y \mid u_1, u_2, z, x_1, x_2)\, P(u_1)\, P(z, u_2 \mid u_1, x_1)$$

Using the graph independencies $(Y \perp\!\!\!\perp \{Z, U_1\} \mid \{X_1, X_2, U_2\})_G$ and $(U_2 \perp\!\!\!\perp U_1 \mid X_1)_G$, we get

$$P(y \mid \hat{x}_1, \hat{x}_2) = \sum_{z} P(y \mid x_1, x_2, z)\, P(z \mid x_1)$$

which agrees with (4) under the admissible sequence $Z_1 = \emptyset$, $Z_2 = \{Z\}$. Thus, by succeeding in eliminating the $U$ variables from the G-formula, we obtain a confirmation of plan identifiability together with the correct causal effect estimands.

The elimination method above still requires some search and algebraic skill. In addition, when the number of latent variables increases, the expressions tend to become rather involved. We now return to the problem of finding an admissible sequence, if one exists, thus eliminating the search altogether.

5 FINDING AN ADMISSIBLE SEQUENCE

The obvious way to avoid bad choices of covariates, like the one illustrated in Figure 3, is to insist on always choosing a "minimal" $Z_k$, namely, a set of covariates satisfying (2) having no proper subset which satisfies (2). However, since there are usually a large number of such minimal sets (see Figure 4), the question remains whether every choice of a minimal $Z_k$ is "safe", namely whether we can be sure that no choice of a minimal subsequence $Z_1, \dots, Z_k$ will ever prevent us from finding an admissible $Z_{k+1}$, in case some admissible sequence $Z_1^*, \dots, Z_n^*$ exists.

Figure 4: Illustrating non-uniqueness of minimal admissible sets: $Z_1$ and $Z_1'$ are each minimal and admissible.

The next result guarantees the "safety" of every minimal subsequence $Z_1, \dots, Z_k$ and, hence, provides an effective test for G-identifiability.

Theorem 2 If there exists an admissible sequence $Z_1^*, \dots, Z_n^*$, then for every minimally admissible subsequence $Z_1, \dots, Z_{k-1}$ of covariates, there is an admissible set $Z_k$.

Proof: The proof will be based on Lemmas 1 and 2 which are proved separately in Appendix II.

Lemma 1 For any DAG $G$ and any two disjoint subsets of nodes $X$ and $Y$, let the ancestor-set of $(X, Y)$, denoted $A(X, Y)$, be the set of nodes which have a descendant in either $X$ or $Y$. The following two separation conditions hold for any sets of nodes $W$ and $Z$:

$$(Y \perp\!\!\!\perp X \mid Z, W \cap A(X, Y))_G \ \text{ whenever } \ (Y \perp\!\!\!\perp X \mid Z)_G \tag{8}$$

$$(Y \perp\!\!\!\perp X \mid W \cap A(X, Y))_G \ \text{ whenever } \ (Y \perp\!\!\!\perp X \mid W)_G \tag{9}$$

Eq. (8) asserts that conditioning on nodes from an ancestral set can only create, never destroy independencies. Eq. (9) asserts that conditioning on all the nodes outside the ancestral set can only destroy, never create independencies.

Lemma 2 Denote by $G_k$ the subgraph $G_{\underline{X}_k, \overline{X}_{k+1}, \dots, \overline{X}_n}$ of $G$, and let $A_k$ be the ancestral set of $(X_k, Y)$ in $G_k$. For any $j > 0$, $A_k$ is a subset of the ancestral set of $(X_k, Y)$ in $G_{k+j}$.

We now prove Theorem 2 by contradiction. Suppose that $Z_1, \dots, Z_{k-1}$ is a minimally admissible sequence, and that no admissible set $Z_k$ exists. This means, in particular, that the set $Z_k = A_k \cap N_k$ is inadmissible, i.e.,

$$(Y \not\perp\!\!\!\perp X_k \mid X_1, \dots, X_{k-1}, Z_1, \dots, Z_{k-1}, A_k \cap N_k)_{G_k} \tag{10}$$

Now observe that no node in the sequence $Z_1, \dots, Z_{k-1}$ can reside outside $A_k \cap N_k$. This is so because admissibility dictates $Z_i \subseteq N_i$ and

$$(Y \perp\!\!\!\perp X_i \mid X_1, \dots, X_{i-1}, Z_1, \dots, Z_i)_{G_i} \tag{11}$$

so, the lowest $i$ for which $Z_i$ contains a nonmember of $A_k$ will violate minimality (by (9)). Indeed, Lemma 2 ensures that the violating $Z_i$ must also contain non-members of $A_i$ (in $G_i$), and (9) implies that if we remove all non-$A_i$ nodes from a conditioning set, we do not destroy any separation. Moreover, since such a removal from $\{X_1, \dots, X_{i-1}, Z_1, \dots, Z_i\}$ will only affect $Z_i$, we can substitute $A_i$ for $Z_i$ in Eq. (11). This implies that $A_i$ satisfies (2) and $Z_i$ is non-minimal, which is a contradiction. We are now assured that $Z_1, \dots, Z_{k-1}$ are in $A_k \cap N_k$. Likewise, since $\{X_1, \dots, X_{k-1}\}$ is also in $A_k \cap N_k$, (10) can be rewritten as

$$(Y \not\perp\!\!\!\perp X_k \mid A_k \cap N_k)_{G_k} \tag{12}$$

To prove that (10) is false, contrast (12) with the assumption that there exists an admissible sequence $Z_1^*, \dots, Z_n^*$. Let $Z^* = Z_k^* \cup \bigcup_{i=1}^{k-1} (Z_i^* \cup \{X_i\})$. Admissibility states that (2) is satisfied by $Z_k = Z_k^*$, hence, $(Y \perp\!\!\!\perp X_k \mid Z^*)_{G_k}$. By (9), we can intersect the conditioning set $Z^*$ with $A_k$, yielding $(Y \perp\!\!\!\perp X_k \mid Z^* \cap A_k)$. Finally, since $Z^* \subseteq N_k$, we have

$$(Y \perp\!\!\!\perp X_k \mid Z^* \cap A_k \cap N_k)_{G_k} \tag{13}$$

But (12) and (13) together contradict (8), because (8) asserts that whenever we add to the conditioning set members of $A_k$, we preserve independencies. QED

Theorem 2 now provides an effective decision procedure for testing G-identifiability:

Corollary 2 A control problem is G-identifiable if and only if the following algorithm exits with success:

1. Set $k = 1$.

2. Choose any minimal $Z_k \subseteq N_k$ satisfying (2).

3. If no such $Z_k$ exists, exit with failure. Else set $k = k + 1$.

4. If $k = n + 1$, exit with success, else go to step 2.

From the proof of Theorem 2, it is obvious that we need not insist on choosing minimal $Z_k$. That requirement only ensures that we do not step outside $A_k$ and spoil the selection of future subsets. In fact, Lemma 1 guarantees that if an admissible sequence exists, then $W_1, W_2, \dots, W_n$ is such a sequence, where $W_k = A_k \cap N_k$. Accordingly, we can now rewrite Theorem 1 in terms of an explicit sequence of covariates.

Theorem 3 $P(y \mid \hat{x}_1, \dots, \hat{x}_n)$ is G-identifiable if and only if the following condition holds for every $1 \le k \le n$

$$(Y \perp\!\!\!\perp X_k \mid X_1, \dots, X_{k-1}, W_1, W_2, \dots, W_k)_{G_{\underline{X}_k, \overline{X}_{k+1}, \dots, \overline{X}_n}}$$

where $W_k = A_k \cap N_k$, namely, $W_k$ is the set of all covariates in $G$ that are both non-descendants of $X_k$ and have either $Y$ or $X_k$ as descendant. Moreover, when the condition above is satisfied the plan evaluates to

$$P(y \mid \hat{x}_1, \dots, \hat{x}_n) = \sum_{w_1, \dots, w_n} P(y \mid w_1, \dots, w_n, x_1, \dots, x_n) \prod_{k=1}^{n} P(w_k \mid w_1, \dots, w_{k-1}, x_1, \dots, x_{k-1}) \tag{14}$$

6 GENERALIZATIONS

6.1 Y AND Z NON-DISJOINT

In practice, we will often be interested in a vector outcome $Y$ with components of $Y$ being ancestors of control variables $X_k$ for some $k$. For instance, in our AIDS example, we may be interested in survival $Y$ not only at a time after subjects have received treatment $X_2$ but also at a time after receiving treatment $X_1$ but before receiving $X_2$. If a component of $Y$ is both an ancestor of a control variable $X_k$ and of a later component of $Y$, it is necessary to regard the former component as a confounding variable that must be adjusted for to estimate the effect of the plan on $Y$. To do so, we no longer impose the assumption that $Y$ is a descendent of $X_n$ and that $Y$ and $Z$ are disjoint. Rather, we shall only require that $Y \subseteq Z$ where, henceforth, $Z$ represents all observed non-control variables. With this redefinition of $Y$, with the understanding that $(Y \perp\!\!\!\perp X \mid X)_G \ \forall X$, we prove below that

Theorem 4 Given $Y \subseteq Z$, Theorem 1 remains true.

Further, under the above redefinitions of $Y$ and $Z$, we also obtain a natural generalization of Theorem 3. Let $Y_k^*$ be the subset of $Y$ that is not in $N_k$ and let $Y_k$ be the subset of $Y$ that is in $N_k$. Redefine $A_k$ to be the ancestral set of $(X_k, Y_k^*)$ in graph $G_k$. Robins and Pearl (1995) prove

Theorem 5: Given $Y \subseteq Z$, Theorem 3 remains true with $W_k$ redefined to be $(A_k \cap N_k) \cup Y_k$.
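
The test of Theorem 3 (in its original form, with a scalar outcome disjoint from the covariates) can be mechanized along the following lines. This is a sketch under stated assumptions, not the authors' implementation. d-separation is tested with the standard moralized-ancestral-graph construction, $W_k$ is restricted to observed covariates, the Figure 1 edge list is reconstructed from the prose (with a guessed arrow from U1 to Z), and all names are hypothetical.

```python
def closure(edges, start, forward):
    """Nodes reachable from `start` along the arrows (forward) or against them."""
    step = {}
    for parent, child in edges:
        src, dst = (parent, child) if forward else (child, parent)
        step.setdefault(src, set()).add(dst)
    found, stack = set(), list(start)
    while stack:
        for nxt in step.get(stack.pop(), ()):
            if nxt not in found:
                found.add(nxt)
                stack.append(nxt)
    return found


def d_separated(edges, xs, ys, given):
    """Test (xs _||_ ys | given) by searching the moralized ancestral graph."""
    seeds = set(xs) | set(ys) | set(given)
    keep = seeds | closure(edges, seeds, forward=False)
    parents = {node: set() for node in keep}
    for parent, child in edges:
        if parent in keep and child in keep:
            parents[child].add(parent)
    neighbours = {node: set() for node in keep}
    for child, pas in parents.items():
        for pa in pas:
            neighbours[child].add(pa)
            neighbours[pa].add(child)
            neighbours[pa] |= pas - {pa}   # "marry" parents of a common child
    reached = set(xs) - set(given)
    stack = list(reached)
    while stack:
        for nxt in neighbours[stack.pop()]:
            if nxt not in reached and nxt not in given:
                reached.add(nxt)
                stack.append(nxt)
    return not (reached & set(ys))


def g_k(edges, controls, k):
    """G with arrows out of X_k and arrows into X_{k+1}, ..., X_n deleted."""
    later = set(controls[k + 1:])
    return [(a, b) for a, b in edges if a != controls[k] and b not in later]


def theorem3_sequence(edges, controls, outcome, covariates):
    """Return [W_1, ..., W_n] if Theorem 3's condition holds for every k, else None."""
    ws = []
    for k, xk in enumerate(controls):
        gk = g_k(edges, controls, k)
        a_k = closure(gk, {xk, outcome}, forward=False)         # ancestors in G_k
        n_k = covariates - closure(edges, {xk}, forward=True)   # non-descendants in G
        ws.append(a_k & n_k)
        given = set(controls[:k]).union(*ws)
        if not d_separated(gk, {outcome}, {xk}, given):
            return None
    return ws


# Assumed edges for Figure 1, plus a hypothetical graph with a latent confounder U.
FIG1_EDGES = [("U1", "X1"), ("U1", "Z"), ("X1", "Z"), ("U2", "Z"), ("Z", "X2"),
              ("X1", "X2"), ("U2", "Y"), ("X1", "Y"), ("X2", "Y")]
CONFOUNDED = [("U", "X1"), ("U", "Y"), ("X1", "Y")]

FIG1_W = theorem3_sequence(FIG1_EDGES, ["X1", "X2"], "Y", {"Z"})
CONFOUNDED_W = theorem3_sequence(CONFOUNDED, ["X1"], "Y", set())
print(FIG1_W, CONFOUNDED_W)
```

On the assumed Figure 1 graph the sketch returns an empty $W_1$ and $W_2 = \{Z\}$, matching the admissible sequence found by hand in Section 3. On the hypothetical confounded graph it returns `None`, since no observed covariate can block the path through the latent variable.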

A key step in the proof of Theorem 5 is the following Lemma proved in [Robins & Pearl 1995].

Lemma 3 If a sequence $Z_k$ is G-admissible, then the sequence $Z_k \cup Y_k$ is also G-admissible.

We shall also need Lemma 3 in Section 6.3 below.

Proof of Theorem 4: Given a plan $x = (x_1, \dots, x_n)$, define

[equation missing in source]

To prove Theorem 4, we shall use the following Lemma which is an easy consequence of the corollary to Theorem (AD.1) in Robins (1987).

Lemma 4 If, for each $k$, $Z_k \subseteq N_k$ and the expression

[equation missing in source]

does not depend on $x_k^*$ then Eq. (3) is true.

One can also prove Lemma 4 directly by using induction on $n$ to show that the right hand side of Eq. (5) plus the premise of the lemma imply the right hand side of Eq. (3).

To complete the proof of Theorem 4, we shall show that (i) the premise of Lemma 4 is equivalent to the statement that

[equation missing in source]

when probabilities are computed under a particular joint distribution $P_{kx}$ for the variables $V$ in $G$ and (ii) $P_{kx}$ is represented by the DAG $G_k \equiv G_{\underline{X}_k, \overline{X}_{k+1}, \dots, \overline{X}_n}$ [i.e., by definition, $P_{kx}(v) = \prod_j P_{kx}(v_j \mid pa_{jk})$ where $Pa_{jk}$ are the parents of $V_j$ on $G_k$ and $pa_{jk}$ is the value of $Pa_{jk}$ when $V = v$]. It then follows that Eqs. (1) and (2) imply the premise of Lemma 4, proving Theorem 4.

Let $P$ denote the distribution of variables $V$ on $G$. Now, given a plan $x = (\hat{x}_1, \dots, \hat{x}_n)$, define $P_{kx}(v) = \prod_j P_{kx}(v_j \mid pa_{jk})$ where (i) if $V_j = X_m$ for some $m$, $m = k + 1, \dots, n$, then $P_{kx}(v_j \mid pa_{jk}) = 1$ if $v_j = x_m$, and (ii) if $V_j \ne X_m$ for $m = k + 1, \dots, n$, $P_{kx}(v_j \mid pa_{jk}) = P(v_j \mid pa_{jk})$ when $X_k$ is not a parent of $V_j$ on $G$, and $P_{kx}(v_j \mid pa_{jk}) = P(v_j \mid X_k = x_k, pa_{jk})$ when $X_k$ is a parent of $V_j$ on $G$. By construction, $P_{kx}$ is represented by the DAG $G_k$ and, therefore, $X_k \perp\!\!\!\perp Y \mid L_1, \dots, L_k, X_1, \dots, X_{k-1}$ under the distribution $P_{kx}$. Further, it is straightforward to calculate that (i) for any $x_k^*$, $h(y \mid x, l_1, \dots, l_k) = P_{kx}(y \mid X_1 = x_1, \dots, X_{k-1} = x_{k-1}, X_k = x_k^*, L_1 = l_1, \dots, L_k = l_k)$, and (ii) the conditional distributions of $L_1, \dots, L_k$ given $(Z_1, \dots, Z_k, X_1, \dots, X_k)$ are the same under $P$ and $P_{kx}$. Hence, the premise of Lemma 4 is equivalent to the conditional independence under the distribution $P_{kx}$ of $Y$ and $X_k$ given $(Z_1, \dots, Z_k, X_1 = x_1, \dots, X_{k-1} = x_{k-1})$.

We note that the premise of Lemma 4 is a non-graphical condition that is weaker than the graphical premise of Theorem 4 and yet implies identifiability by the G-formula based on $Z_1, \dots, Z_n$. However, as a non-graphical condition, the premise of Lemma 4 is much more difficult to check than the graphical premise of Theorem 4.

6.2 $X_{k+j}$ NEED NOT BE A DESCENDENT OF $X_k$

In this subsection, we relax the assumption that $X_{k+j}$ is a descendent of $X_k$ for all $k$, $j > 0$. As in Sec. 6.1, $Z$ remains the set of all observed non-control variables. Given $X \subseteq V$, we say $X = (X_1, \dots, X_n)$ is a consistent ordering of $X$ in $G$ if, for each $k$, $X_k$ is a non-descendent of $\{X_{k+1}, \dots, X_n\}$. Henceforth, given a consistent ordering of $X$, we redefine $N_k$ to be the set of observed non-control variables that are non-descendents of any element in the set $\{X_k, X_{k+1}, \dots, X_n\}$. Robins and Pearl (1995) proved

Theorem 6 Given a consistent ordering $(X_1, \dots, X_n)$ of $X$ with $X_k$ not necessarily an ancestor of $X_{k+j}$, Theorems 4 and 5 remain true.

Theorem 6 is an immediate corollary of Theorems 4 and 5, and the following Theorem proved in Robins and Pearl (1995) characterizing arrows that can be added into and out of the $X_k$ without destroying Eqs. (1) and (2). Given a graph $G$, a consistent ordering $(X_1, \dots, X_n)$ of $X$, and sets $Z_1, \dots, Z_n$, $Z_k \subseteq N_k$, let graph $G^*$ be the graph in which, for each $k$, all arrows are included (i) from $X_k$ both to each member of the set $\{X_{k+1}, \dots, X_n\}$ and to each variable (observed or unobserved) that is a descendent of some member of $\{X_{k+1}, \dots, X_n\}$ and (ii) from each member of the set $Z_1 \cup \dots \cup Z_k$ to $X_k$.

Theorem 7 Eqs. (1)-(2) hold for graph $G$ if and only if Eqs. (1)-(2) hold for graph $G^*$.

Robins and Pearl (1995) show that the choice of consistent ordering for $X$ does matter. Specifically, they provide an example with $X = (X_a, X_b)$ bivariate in which both the ordering $(X_1, X_2) = (X_a, X_b)$ and the ordering $(X_1, X_2) = (X_b, X_a)$ are consistent orderings of $X$. However, $P(y \mid \hat{x})$ is only G-identifiable based on the ordering $X = (X_b, X_a)$.
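
The definition of a consistent ordering, and the redefined $N_k$ of this subsection, can be sketched in Python as follows. The two small graphs are hypothetical illustrations and are not the example of Robins and Pearl (1995), which is not reproduced in this paper. The function names are likewise assumptions, and $k$ is zero-based in the code.

```python
from itertools import permutations


def descendants(edges, node):
    """Return all proper descendants of `node` in the DAG given by `edges`."""
    children = {}
    for parent, child in edges:
        children.setdefault(parent, set()).add(child)
    found, stack = set(), [node]
    while stack:
        for child in children.get(stack.pop(), ()):
            if child not in found:
                found.add(child)
                stack.append(child)
    return found


def is_consistent_ordering(edges, ordering):
    """True if each X_k is a non-descendant of every later control in the ordering."""
    for k, xk in enumerate(ordering):
        if any(xk in descendants(edges, later) for later in ordering[k + 1:]):
            return False
    return True


def consistent_orderings(edges, controls):
    """Enumerate all consistent orderings of the control variables (brute force)."""
    return [order for order in permutations(controls)
            if is_consistent_ordering(edges, order)]


def redefined_n_k(edges, non_controls, ordering, k):
    """Section 6.2 N_k: observed non-control variables descending from none of X_k, ..., X_n."""
    excluded = set()
    for x in ordering[k:]:
        excluded |= descendants(edges, x)
    return non_controls - excluded


# Hypothetical graphs: two controls with no directed path between them, and a chain.
PARALLEL = [("Z", "Xa"), ("Xa", "Y"), ("Xb", "Y")]
CHAIN = [("Xa", "Xb"), ("Xb", "Y")]

print(consistent_orderings(PARALLEL, ["Xa", "Xb"]))
print(consistent_orderings(CHAIN, ["Xa", "Xb"]))
print(redefined_n_k(PARALLEL, {"Z", "Y"}, ("Xa", "Xb"), 0))
```

When neither control is an ancestor of the other, both orderings are consistent, which is the situation in which the choice of ordering can affect G-identifiability.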

6.3 VARIABLES THAT CAN BE DISCARDED

Eqs. (1)-(2) provide sufficient conditions for G-identification solely in terms of associations between observed variables. In the epidemiologic literature, sufficient conditions for G-identification are often expressed in terms of associations between unobserved and observed variables. For example, for the effect of a singleton action $X$ on $Y$, it is a standard result that an unobserved non-descendent of $X$, say $U$, is a "non-confounder given data on a non-descendent $Z_1$ of $X$" [i.e., $P(y \mid \hat{x})$ is G-identifiable based on $Z_1$] if either $U$ and $X$ are conditionally independent given $Z_1$ or if $U$ and $Y$ are conditionally independent given $(Z_1, X)$ [Miettinen & Cook 1981, Robins & Morgenstern 1987, Greenland & Robins 1986]. Extensions to compound actions are discussed in Robins (1986, Sec. 8 and Appendix F; 1989) and Robins et al. (1992, Sec. A2.13). The following theorem recasts Theorem 4 into this more familiar epidemiologic form. Given $Z_1, \dots, Z_n$ with $Z_k \subseteq N_k$, let $U_k^*$ be all non-descendents of $\{X_k, \dots, X_n\}$ (observed and unobserved) that are both non-control variables and are disjoint from $Z_1, \dots, Z_k$. Robins and Pearl (1995) prove

Theorem 8 Suppose that, for each $k$, $Y_k \subseteq Z_k$. Then Eqs. (1)-(2) hold if and only if, for each $k$, $U_k^* = (U_{ak}^*, U_{bk}^*)$ for (possibly empty) disjoint sets $U_{ak}^*$, $U_{bk}^*$ satisfying

(i) $(U_{ak}^* \perp\!\!\!\perp X_k \mid Z_1, \dots, Z_k, X_1, \dots, X_{k-1})_{G_{\overline{X}_{k+1}, \dots, \overline{X}_n}}$ and

(ii) $(U_{bk}^* \perp\!\!\!\perp Y_k^* \mid X_1, \dots, X_k, Z_1, \dots, Z_k, U_{ak}^*)$ [graph subscript not recoverable from the source].

Note that, in view of Lemma 3, the assumption $Y_k \subseteq Z_k$ is completely non-restrictive since we can always replace $Z_k$ by $Z_k \cup Y_k$ without destroying Eq. (1) or Eq. (2).

An important issue not treated in this paper is to derive sufficient conditions for the identification of $P(y \mid \hat{x})$ when $P(y \mid \hat{x})$ is not G-identifiable. Robins and Pearl (1995) provide sufficient conditions for identification of non-G-identifiable effects $P(y \mid \hat{x})$. When these criteria are satisfied, they provide a closed-form expression, called the composite-G-formula, for $P(y \mid \hat{x})$.

Acknowledgment

Professor Pearl's research was partially supported by Air Force grant #AFOSR/F496209410173, NSF grant #IRI-9420306, and Rockwell/Northrop Micro grant #94-100 and Professor Robins' research was supported by NIH grant # R01-AI32475.

References

[Cox 1958] Cox, D.R. (1958) The Planning of Experiments. New York: John Wiley and Sons.

[Druzdzel & Simon 1993] Druzdzel, M.J., and H.A. Simon (1993) Causality in Bayesian Belief. In Proceedings of the Ninth Conference on Uncertainty in Artificial Intelligence (eds. D. Heckerman and A. Mamdani), CA, 3-11.

[Galles & Pearl 1995] Galles, D., and J. Pearl (1995) Testing identifiability of causal effects. Technical Report R-226, UCLA Cognitive Systems Laboratory. To appear in UAI-95.

[Goldszmidt & Pearl 1992] Goldszmidt, M., and J. Pearl (1992) Default Ranking: A Practical Framework for Evidential Reasoning, Belief Revision and Update. In Proceedings of the Third International Conference on Knowledge Representation and Reasoning, 661-672.

[Greenland & Robins 1986] Greenland, S., and J.M. Robins (1986) Identifiability, Exchangeability and Epidemiologic Confounding. International Journal of Epidemiology, 15, 413-419.

[Miettinen & Cook 1981] Miettinen, O.S., and E.F. Cook (1981) Confounding essence and detection. American Journal of Epidemiology, 114, 593-603.

[Pearl 1993a] Pearl, J. (1993a) From Conditional Oughts to Qualitative Decision Theory. In Proceedings of the Ninth Conference on Uncertainty in Artificial Intelligence (eds. D. Heckerman and A. Mamdani), 12-20.

[Pearl 1993b] Pearl, J. (1993b) Graphical Models, Causality, and Intervention. Statistical Science, 8(3), 266-273.

[Pearl 1994] Pearl, J. (1994) A Probabilistic Calculus of Actions. In Proceedings of the Tenth Conference on Uncertainty in Artificial Intelligence (eds. R. Lopez de Mantaras and D. Poole), 454-462.

[Pearl 1995] Pearl, J. (1995) Causal Diagrams for Experimental Research. Technical Report R-218-B, UCLA Cognitive Systems Laboratory. To appear in Biometrika.

[Pearl & Verma 1991] Pearl, J., and T. Verma (1991) A Theory of Inferred Causation. In Principles of Knowledge Representation and Reasoning: Proceedings of the Second International Conference (eds. J.A. Allen, R. Fikes, and E. Sandewall), 441-452.

[Robins 1986] Robins, J.M. (1986) A new approach to causal inference in mortality studies
with a sustained exposure period - applications to control of the healthy workers survivor effect. Mathematical Modelling, 7, 1393-1512.

[Robins 1987] Robins, J.M. (1987) Addendum to "A new approach to causal inference in mortality studies with a sustained exposure period - applications to control of the healthy workers survivor effect". Computers and Mathematics with Applications, 14, 923-945.

[Robins 1989] Robins, J.M. (1989) The control of confounding by intermediate variables. Statistics in Medicine, 8, 679-701.

[Robins 1993] Robins, J.M. (1993) Analytic methods for estimating HIV treatment and cofactor effects. In: Methodological Issues of AIDS Mental Health Research (eds. D.G. Ostrow, R. Kessler) New York: Plenum Publishing, 213-290.

[Robins & Morgenstern 1987] Robins, J.M., and H. Morgenstern (1987) The Foundations of Confounding in Epidemiology. Computers and Mathematics with Applications, 14, 869-916.

[Robins & Pearl 1995] Robins, J., and J. Pearl (1995) Causal Effects of Dynamic Policies. In preparation.

[Robins et al. 1992] Robins, J.M., D. Blevins, G. Ritter, and M. Wulfsohn (1992) G-estimation of the effect of prophylaxis therapy for pneumocystis carinii pneumonia on the survival of AIDS patients. Epidemiology, 3, 319-336.

[Rubin 1978] Rubin, D.B. (1978) Bayesian Inference for Causal Effects: The Role of Randomization. The Annals of Statistics, 6, 34-58.

[Spirtes et al. 1993] Spirtes, P., C. Glymour, and R. Scheines (1993) Causation, Prediction, and Search. New York: Springer-Verlag.

APPENDIX I

This appendix summarizes the basic definitions, notations and inference rules used in the body of the paper. Details and proofs can be found in [Pearl 1994, Pearl 1995].

Let $V = \{V_1, V_2, \ldots, V_n\}$ be the set of all variables in a directed acyclic graph (dag) G.

Definition 2 (causal effect) Given two disjoint sets of variables, X and Y, the causal effect of X on Y, denoted $P(y \mid \hat{x})$, is a function from X to the space of probability distributions on Y. For each realization x of X, $P(y \mid \hat{x})$ gives the probability of Y = y induced by the action do(X = x).

If causal knowledge is organized as a set T of structural equations

$$V_i = f_i(pa_i, \epsilon_i), \qquad i = 1, \ldots, n$$

where $pa_i$ are the parents of $V_i$ in G, $f_i$ are (unspecified) deterministic functions and $\epsilon_i$ are mutually independent disturbances [Pearl & Verma 1991], then the joint distribution of the observed variables has the product form

$$P(v_1, \ldots, v_n) = \prod_{i} P(v_i \mid pa_i) \qquad (16)$$

independent of the $f_i$'s in T. In such a process-based theory, the effect of the action $do(V_j = v'_j)$ amounts to overruling the process governed by $f_j$ and substituting the process $V_j = v'_j$ instead. Consequently, the induced distribution in the mutilated theory $T_{v'_j}$ would be

$$P_{T_{v'_j}}(v_1, \ldots, v_n) = \begin{cases} \prod_{i \neq j} P(v_i \mid pa_i) & \text{if } v_j = v'_j \\ 0 & \text{otherwise} \end{cases}$$

independent of T. The partial product reflects the removal of the factor $P(v_j \mid pa_j)$ from the product of (16). Multiple actions result in the removal of the corresponding factors from (16).

Definition 3 (identifiability) The causal effect of X on Y is said to be identifiable if the quantity $P(y \mid \hat{x})$ can be computed uniquely from any positive distribution of the observed variables. In other words, $P_{T_1}(y \mid \hat{x}) = P_{T_2}(y \mid \hat{x})$ whenever $P_{T_1}(v) = P_{T_2}(v) > 0$.

Identifiability means that $P(y \mid \hat{x})$ can be estimated consistently from an arbitrarily large sample randomly drawn from the distribution of the observed variables.

The following theorem states the three basic inference rules used in the text.

Theorem 9 Let G be the directed acyclic graph associated with a causal model, and let $P(\cdot)$ stand for the probability distribution induced by that model. For any disjoint subsets of variables X, Y, Z, and W we have:

Rule 1 Insertion/deletion of observations

$$P(y \mid \hat{x}, z, w) = P(y \mid \hat{x}, w) \quad \text{if } (Y \perp\!\!\!\perp Z \mid X, W)_{G_{\overline{X}}}$$

Rule 2 Action/observation exchange

$$P(y \mid \hat{x}, \hat{z}, w) = P(y \mid \hat{x}, z, w) \quad \text{if } (Y \perp\!\!\!\perp Z \mid X, W)_{G_{\overline{X}\,\underline{Z}}}$$
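Rule 3 Insertion/deletion of actions

$$P(y \mid \hat{x}, \hat{z}, w) = P(y \mid \hat{x}, w) \quad \text{if } (Y \perp\!\!\!\perp Z \mid X, W)_{G_{\overline{X},\,\overline{Z(W)}}} \qquad (20)$$

where $Z(W)$ is the set of Z-nodes that are not ancestors of any W-node in $G_{\overline{X}}$.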
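
The graphical side conditions of the rules above can be checked mechanically. The following is a minimal sketch, not the authors' procedure: it assumes a DAG given as a list of (parent, child) pairs and tests d-separation through the standard moralized-ancestral-graph construction. The function names `ancestors`, `d_separated` and `mutilate`, and the three-node graph W -> Z, W -> Y, Z -> Y, are hypothetical and chosen only for illustration. It checks the Rule 2 condition with X empty, i.e. whether Y and Z are d-separated given W once the arrows leaving Z are deleted.

```python
from itertools import combinations


def ancestors(edges, nodes):
    """Return `nodes` together with all of their ancestors in the DAG."""
    result, frontier = set(nodes), list(nodes)
    while frontier:
        child = frontier.pop()
        for a, b in edges:
            if b == child and a not in result:
                result.add(a)
                frontier.append(a)
    return result


def d_separated(edges, xs, ys, zs):
    """Test whether xs and ys are d-separated by zs (sets assumed disjoint)."""
    keep = ancestors(edges, set(xs) | set(ys) | set(zs))
    sub = [(a, b) for a, b in edges if a in keep and b in keep]
    # Moralize: link parents that share a child, then drop directions.
    undirected = {frozenset(e) for e in sub}
    for child in keep:
        parents = [a for a, b in sub if b == child]
        undirected |= {frozenset(p) for p in combinations(parents, 2)}
    # Delete the conditioning set and search for a path from xs to ys.
    start = set(xs) - set(zs)
    seen, frontier = set(start), list(start)
    while frontier:
        node = frontier.pop()
        for edge in undirected:
            if node in edge:
                (other,) = edge - {node}
                if other not in zs and other not in seen:
                    seen.add(other)
                    frontier.append(other)
    return not (seen & set(ys))


def mutilate(edges, cut_incoming=(), cut_outgoing=()):
    """Drop arrows entering `cut_incoming` and arrows leaving `cut_outgoing`."""
    return [(a, b) for a, b in edges
            if b not in cut_incoming and a not in cut_outgoing]


# Hypothetical example: W confounds the action Z and the outcome Y.
edges = [("W", "Z"), ("W", "Y"), ("Z", "Y")]
g_under_z = mutilate(edges, cut_outgoing={"Z"})  # arrows out of Z removed

rule2_with_w = d_separated(g_under_z, {"Z"}, {"Y"}, {"W"})
rule2_without_w = d_separated(g_under_z, {"Z"}, {"Y"}, set())
print(rule2_with_w, rule2_without_w)  # prints: True False
```

In this toy graph the condition holds once W is observed, so the action on Z can be exchanged for the observation Z = z given w. Without W the back-door path through W stays open and the exchange is not licensed.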

Each of the inference rules above follows from the basic interpretation of the "$\hat{x}$" operator as a replacement of the causal mechanism that connects X to its pre-action parents by a new mechanism X = x introduced by the intervening force. The result is a submodel characterized by the subgraph $G_{\overline{X}}$ (named "manipulated graph" in Spirtes et al. (1993)) which supports all three rules.

Rule 1 reaffirms d-separation as a valid test for conditional independence in the distribution resulting from the intervention do(X = x), hence the graph $G_{\overline{X}}$. This rule follows from the fact that deleting equations from the system does not introduce any new dependencies among the remaining variables.

Rule 2 provides a condition for an external intervention do(Z = z) to have the same effect on Y as the passive observation Z = z. The condition amounts to $\{X \cup W\}$ blocking all back-door (i.e., spurious) paths from Z to Y (in $G_{\overline{X}}$), since $G_{\overline{X}\,\underline{Z}}$ retains all (and only) such paths.

Rule 3 provides conditions for introducing (or deleting) an external intervention do(Z = z) without affecting the probability of Y = y. The validity of this rule stems, again, from simulating the intervention do(Z = z) by pruning all links entering the variables in Z (hence the graph $G_{\overline{X}\,\overline{Z}}$).

Corollary 3 A causal effect $q = P(y_1, \ldots, y_k \mid \hat{x}_1, \ldots, \hat{x}_m)$ is identifiable in a model characterized by a graph G if there exists a finite sequence of transformations, each conforming to one of the inference rules in Theorem 9, which reduces q into a standard (i.e., hat-free) probability expression involving observed quantities.

APPENDIX II

Proof of Lemma 1

We will prove (8) by showing that if $(Y \perp\!\!\!\perp X \mid Z)_G$ holds, then augmenting Z by any additional node $w \in A(X, Y)$ preserves the separation between X and Y. Assume w is an ancestor of Y. If $(Y \perp\!\!\!\perp X \mid Z)_G$ is true and $(Y \perp\!\!\!\perp X \mid Z, w)_G$ is false, then there must be a path between a node in X and Y that is blocked by Z and becomes unblocked by $Z \cup \{w\}$. Let $\pi_1$ and $\pi_2$ be two parents of w which became dependent by conditioning on w and assume $\pi_1$ d-connects to X.

[Figure 5]

Since all paths were blocked prior to conditioning on w, it must be that all paths from w to Y are blocked as well. But, since w is an ancestor of Y, this means that some member of Z resides on a directed path from w to Y. This, however, means that $\pi_1$ and $\pi_2$ were not d-separated prior to conditioning on w, thus contradicting our basic assumption that conditioning on w opened a new pathway between X and Y. A symmetrical argument applies if w is an ancestor of X (or of both). Repeating the proof for each $w \in A(X, Y)$ completes the proof of (8).

To prove (9), we show that any path p between X and Y that is blocked by W will remain blocked when we remove from W all nodes that are descendants of either X or Y. Indeed, in order to unblock a path p by removing nodes from W, some of the removed nodes must be non-colliders on p. Now, if p is totally in A(X, Y), no node on p will be removed. On the other hand, if p has some nodes outside A(X, Y), it must have at least one collider c, such that c and all its descendants are outside A(X, Y). Therefore, when we remove from W all non-ancestral nodes, we must leave c and all its descendants unconditioned, hence p must remain blocked. QED

Proof of Lemma 2

We shall first prove that any ancestor of $(Y, X_i)$ in $G_i$ is also an ancestor of $(Y, X_i)$ in $G_{i+j}$. If t is an ancestor of $X_i$ in $G_i$ then clearly it must be an ancestor of $X_i$ in $G_{i+j}$; going from $G_{i+j}$ to $G_i$ does not affect any path incoming to $X_i$. Now assume that t is an ancestor of Y in $G_i$ but not in $G_{i+j}$. This can only happen if all paths (in G) from t to Y go through $X_{i+j}$ and get blocked in $G_{i+j}$ by removing the outgoing arrows from $X_{i+j}$. But any such path will be blocked in $G_i$ as well, because all incoming arrows to $X_{i+j}$ are removed in $G_i$; hence, t cannot be an ancestor of Y in $G_i$, which is a contradiction. We conclude that any ancestor of Y in $G_i$ must also be an ancestor of Y in $G_{i+j}$. Combining the two cases completes the proof of Lemma 2.

(Remark: This proof relies on the assumption that each $X_{k+j}$ is an ancestor of $X_k$.)
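
As a numerical illustration of the truncated product described in Appendix I, the sketch below removes the factor of the manipulated variable from the joint distribution and compares the result with ordinary conditioning. It is an assumption-laden toy example, not taken from the paper: the binary model W -> X, W -> Y, X -> Y, all probability values, and the names `joint`, `joint_do_x`, `p_y1_do` and `p_y1_given` are hypothetical.

```python
from itertools import product

# Hypothetical binary model W -> X, W -> Y, X -> Y; all numbers are invented.
p_w = {0: 0.6, 1: 0.4}                                  # P(w)
p_x_given_w = {0: {0: 0.8, 1: 0.2},                     # p_x_given_w[w][x]
               1: {0: 0.3, 1: 0.7}}
p_y1_given_xw = {(0, 0): 0.1, (0, 1): 0.5,              # P(Y=1 | x, w), key (x, w)
                 (1, 0): 0.4, (1, 1): 0.9}


def p_y(y, x, w):
    """P(y | x, w) for binary y."""
    p1 = p_y1_given_xw[(x, w)]
    return p1 if y == 1 else 1.0 - p1


def joint(w, x, y):
    """Full product form: P(w) P(x | w) P(y | x, w)."""
    return p_w[w] * p_x_given_w[w][x] * p_y(y, x, w)


def joint_do_x(w, x, y, x_forced):
    """Truncated product: the factor P(x | w) is removed; zero unless x == x_forced."""
    if x != x_forced:
        return 0.0
    return p_w[w] * p_y(y, x, w)


def p_y1_do(x_forced):
    """P(Y=1 | do(X = x_forced)) obtained by summing the truncated product."""
    return sum(joint_do_x(w, x, 1, x_forced)
               for w, x in product((0, 1), repeat=2))


def p_y1_given(x_obs):
    """Ordinary conditional P(Y=1 | X = x_obs) from the full joint."""
    num = sum(joint(w, x_obs, 1) for w in (0, 1))
    den = sum(joint(w, x_obs, y) for w, y in product((0, 1), repeat=2))
    return num / den


effect_of_action = p_y1_do(1)        # 0.6*0.4 + 0.4*0.9 = 0.60
effect_of_seeing = p_y1_given(1)     # 0.300 / 0.40 = 0.75
```

With these invented numbers the action gives 0.60 while conditioning gives 0.75. The gap is the confounding contributed by W, which the removal of the factor P(x | w) eliminates.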