Seminar on Causality

The SGS- and the PC-Algorithm
Alexandra Federer, Moritz Dümbgen
Monday, March 23, 2009
1 Repetition
Let us briefly recall some concepts from the previous talks. We work on a probability space $(\Omega, \mathcal{F}, P)$ with random variables $X_1, \dots, X_n$. We want to illustrate certain properties of the probability measure $P$ graphically. To this end, we consider graphs $G = (V, E)$, where $V = \{v_1, \dots, v_n\}$ is the set of vertices (representing the random variables) and $E$ is the set of edges. For an undirected graph (no arrowheads), $E \subseteq \{\{v_i, v_j\} : v_i, v_j \in V,\ i \neq j\}$, and for a directed graph (with arrowheads), $E \subseteq \{(v_i, v_j) : v_i, v_j \in V,\ i \neq j\}$. A directed graph is called a directed acyclic graph (DAG) if it contains no directed cycle.
One possible construction of a DAG is to draw an edge from $X_i$ to $X_j$ if $i \in I_j$, where $I_j \subseteq \{1, \dots, j-1\}$ is a minimal set such that $P[x_j \mid x_1, \dots, x_{j-1}] = P[x_j \mid \{x_i ; i \in I_j\}]$. Then,
$$P[x_1, \dots, x_n] = \prod_{j=1}^{n} P[x_j \mid x_1, \dots, x_{j-1}] = \prod_{j=1}^{n} P[x_j \mid \{x_i ; i \in I_j\}]$$
and we say that $G$ represents $P$. In such a DAG, the parents $\mathrm{pa}(v_j)$ of a vertex $v_j$ are therefore given by $\{v_i ; i \in I_j\}$. Note the role of the ordering of the random variables here: a different ordering generally yields a different representation.
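The construction above can be tried out on a small example. The following is a minimal sketch, not part of the original talk: it assumes three binary variables forming a noisy chain, indexes them from 0 instead of 1, and searches the predecessor subsets by increasing size, so the set it returns is a smallest (and hence minimal) $I_j$. The helper names `marg`, `screens_off`, `parent_index_sets` and `keep` are hypothetical.

```python
from fractions import Fraction
from itertools import combinations, product

def marg(joint, assignment):
    """Probability that the coordinates in `assignment` take the given values."""
    return sum(p for outcome, p in joint.items()
               if all(outcome[i] == v for i, v in assignment.items()))

def screens_off(joint, j, subset):
    """Check P[x_j | x_0..x_{j-1}] == P[x_j | x_subset] for every outcome."""
    for outcome in joint:
        prefix = {i: outcome[i] for i in range(j)}
        sub = {i: outcome[i] for i in subset}
        lhs_num = marg(joint, {**prefix, j: outcome[j]})
        lhs_den = marg(joint, prefix)
        rhs_num = marg(joint, {**sub, j: outcome[j]})
        rhs_den = marg(joint, sub)
        # Compare both conditional probabilities by cross-multiplication.
        if lhs_den > 0 and lhs_num * rhs_den != rhs_num * lhs_den:
            return False
    return True

def parent_index_sets(joint, n):
    """For each j, return a smallest index set I_j among the predecessors of j."""
    result = {}
    for j in range(n):
        for size in range(j + 1):
            hits = [set(I) for I in combinations(range(j), size)
                    if screens_off(joint, j, I)]
            if hits:
                result[j] = hits[0]
                break
    return result

def keep(a, b):
    """Assumed noise model: a value is copied with probability 3/4."""
    return Fraction(3, 4) if a == b else Fraction(1, 4)

# Assumed example: X_0 is a fair coin, X_1 copies X_0 and X_2 copies X_1, each with noise.
joint = {(a, b, c): Fraction(1, 2) * keep(a, b) * keep(b, c)
         for a, b, c in product((0, 1), repeat=3)}

print(parent_index_sets(joint, 3))
```

For this ordering the search recovers the chain: no parents for the first variable, the first variable as parent of the second, and the second as parent of the third.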
Let’s now consider a DAG G representing P.
Definition 1.1 (d-separation for a path) A path is d-separated (blocked) by a set $Z \subset V$ if and only if
• the path contains a chain $v_1 \to v_2 \to v_3$ such that $v_2 \in Z$,
or
• the path contains a fork $v_1 \leftarrow v_2 \to v_3$ such that $v_2 \in Z$,
or
• the path contains a collider $v_1 \to v_2 \leftarrow v_3$ such that $v_2 \notin Z$ and $Z$ does not contain any descendant of $v_2$.
Definition 1.2 (d-separation for arbitrary sets) Disjoint subsets of vertices $X$, $Y$ are said to be d-separated by another set of vertices $Z$ if and only if $Z$ d-separates every path from a vertex in $X$ to a vertex in $Y$. We denote this by $X \perp Y \mid Z$.
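Definitions 1.1 and 1.2 translate almost literally into code. The sketch below is only an illustration under stated assumptions: a DAG is encoded as a set of directed pairs `(a, b)` meaning a → b, the separating set is assumed not to contain the endpoints, and all paths are enumerated by brute force, which is only sensible for small graphs. The function names and the example graph are hypothetical.

```python
def descendants(edges, v):
    """All vertices reachable from v along directed edges."""
    found, stack = set(), [v]
    while stack:
        u = stack.pop()
        for a, b in edges:
            if a == u and b not in found:
                found.add(b)
                stack.append(b)
    return found

def path_blocked(edges, path, z):
    """Definition 1.1: is this path blocked by the set z?"""
    for a, b, c in zip(path, path[1:], path[2:]):
        if (a, b) in edges and (c, b) in edges:       # collider a -> b <- c
            if b not in z and not (descendants(edges, b) & z):
                return True
        elif b in z:                                   # chain or fork through b
            return True
    return False

def simple_paths(edges, path, target):
    """Yield all paths without repeated vertices, ignoring edge directions."""
    last = path[-1]
    if last == target:
        yield path
        return
    for a, b in edges:
        nxt = b if a == last else a if b == last else None
        if nxt is not None and nxt not in path:
            yield from simple_paths(edges, path + [nxt], target)

def d_separated(edges, xs, ys, z):
    """Definition 1.2: z blocks every path from a vertex in xs to a vertex in ys."""
    return all(path_blocked(edges, p, z)
               for x in xs for y in ys
               for p in simple_paths(edges, [x], y))

# Assumed example: chain 1 -> 2 -> 3, collider 1 -> 4 <- 3, and 4 -> 5.
edges = {(1, 2), (2, 3), (1, 4), (3, 4), (4, 5)}
print(d_separated(edges, {1}, {3}, {2}))      # chain blocked, collider closed
print(d_separated(edges, {1}, {3}, {2, 5}))   # descendant of the collider is in z
```

In the example, conditioning on vertex 2 blocks the chain while the collider at 4 stays closed; adding 4 or its descendant 5 to the conditioning set opens the collider path again.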
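Consider a chain $v_1 \to v_2 \to v_3$. Since $G$ represents $P$,
$$P[x_3 \mid x_1, x_2] = P[x_3 \mid x_2] \;\Rightarrow\; X_3 \perp\!\!\!\perp X_1 \mid X_2.$$
Similarly, in a fork $v_1 \leftarrow v_2 \to v_3$ we have
$$P[x_3 \mid x_1, x_2] = P[x_3 \mid x_2] \;\Rightarrow\; X_3 \perp\!\!\!\perp X_1 \mid X_2.$$
However, in a collider $v_1 \to v_2 \leftarrow v_3$ we get that
$$P[x_3 \mid x_1] = P[x_3] \;\Rightarrow\; X_3 \perp\!\!\!\perp X_1 \;\not\Rightarrow\; X_3 \perp\!\!\!\perp X_1 \mid X_2.$$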
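The collider case can be checked numerically. As a minimal sketch, assume $X_1$ and $X_3$ are independent fair coins and $X_2 = X_1 \oplus X_3$ (an illustrative choice, not taken from the talk). The helpers `prob` and `cond` are hypothetical names.

```python
from fractions import Fraction
from itertools import product

# Joint distribution over outcomes (x1, x2, x3) with x2 = x1 XOR x3.
joint = {(x1, x1 ^ x3, x3): Fraction(1, 4)
         for x1, x3 in product((0, 1), repeat=2)}

INDEX = {"x1": 0, "x2": 1, "x3": 2}

def prob(**fixed):
    """Probability that the named variables take the given values."""
    return sum(p for outcome, p in joint.items()
               if all(outcome[INDEX[name]] == v for name, v in fixed.items()))

def cond(target, given):
    """Conditional probability P[target | given]; both are dicts of name -> value."""
    return prob(**target, **given) / prob(**given)

# Marginally, knowing x1 tells us nothing about x3.
print(cond({"x3": 1}, {"x1": 1}), prob(x3=1))
# Given the collider x2, x1 determines x3 completely.
print(cond({"x3": 1}, {"x1": 1, "x2": 0}), cond({"x3": 1}, {"x2": 0}))
```

Here $P[x_3 \mid x_1] = P[x_3] = 1/2$, but once $X_2 = 0$ is known, $X_1 = 1$ forces $X_3 = 1$, so $X_3$ and $X_1$ are dependent conditional on $X_2$.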
The above findings can be generalized to arbitrary d-separated sets.
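Theorem 1.1 Let $X$, $Y$ be disjoint and $Z$ an arbitrary subset of vertices (and the corresponding random variables).
• $X \perp Y \mid Z \;\Rightarrow\; X \perp\!\!\!\perp Y \mid Z$.
• $\neg(X \perp Y \mid Z) \;\Rightarrow\; \neg(X \perp\!\!\!\perp Y \mid Z)$ in some distribution $\hat{P}$ represented by $G$.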
2 New Concepts
Definition 2.1 (faithful distribution) A probability distribution $P$ is called faithful with respect to a graph $G$ if and only if for disjoint subsets $X$, $Y$ and an arbitrary subset $Z$ of vertices (and the corresponding random variables)
$$X \perp Y \mid Z \;\Leftrightarrow\; X \perp\!\!\!\perp Y \mid Z.$$
Definition 2.2 (faithful list of d-separations) A list of d-separations $L$ is called faithful if and only if there exists a DAG $G$ such that all and only the d-separations of $L$ are true in $G$. Then we also say that $G$ faithfully represents $L$.
3 The SGS Algorithm
Now we have all the tools to construct a graph $G$ (or an equivalence class of graphs) that faithfully represents a given list $L$ of d-separations.
(i) Form the complete undirected graph $H$ on the vertex set $V$.
(ii) For all $v_1 \neq v_2$: if there exists a subset $S \subseteq V \setminus \{v_1, v_2\}$ that d-separates $v_1$ and $v_2$, remove the edge between them from $H$.
(iii) Let $\hat{H}$ be the undirected graph obtained after step (ii). For all pairwise distinct $v_1, v_2, v_3$ such that $v_1, v_2$ and $v_2, v_3$ are adjacent in $\hat{H}$ and $v_1, v_3$ are not: orient them as $v_1 \to v_2 \leftarrow v_3$ if and only if there is no set $S$ with $v_2 \in S \subseteq V \setminus \{v_1, v_3\}$ that d-separates $v_1$ and $v_3$.
(iv) Repeat
– If $v_1 \to v_2$, $v_2$ and $v_3$ are adjacent, $v_1$ and $v_3$ are not adjacent, and the edge $v_2 - v_3$ has no arrowhead at $v_2$, then orient $v_2 - v_3$ as $v_2 \to v_3$.
– If there is a directed path from $v_1$ to $v_2$ and an edge between $v_1$ and $v_2$, then orient $v_1 - v_2$ as $v_1 \to v_2$.
until no more edges can be oriented.
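Steps (i) to (iv) can be sketched in a few lines. This is an illustration under assumptions, not the authors' implementation: the list of d-separations is replaced by an oracle `dsep(x, y, S)` that is computed from a known DAG (given as a hypothetical `parents` dictionary) using the standard moralised-ancestral-graph criterion, which is equivalent to the path-based definition. All function names and the example DAG are made up for illustration.

```python
from itertools import combinations

def d_separated(parents, x, y, z):
    """d-separation oracle for a known DAG via the moralised ancestral graph."""
    anc, stack = set(), [x, y, *z]
    while stack:                          # ancestral closure of x, y and z
        v = stack.pop()
        if v not in anc:
            anc.add(v)
            stack.extend(parents[v])
    nbrs = {v: set() for v in anc}
    for v in anc:                         # moralise: link parents to child and to each other
        for p in parents[v]:
            nbrs[v].add(p)
            nbrs[p].add(v)
        for p, q in combinations(parents[v], 2):
            nbrs[p].add(q)
            nbrs[q].add(p)
    seen, stack = {x}, [x]
    while stack:                          # search from x while avoiding z
        for w in nbrs[stack.pop()] - set(z) - seen:
            seen.add(w)
            stack.append(w)
    return y not in seen

def subsets(items):
    items = sorted(items)
    for r in range(len(items) + 1):
        for c in combinations(items, r):
            yield set(c)

def sgs(vertices, dsep):
    V = sorted(vertices)
    adj = {v: set(V) - {v} for v in V}                    # step (i)
    for a, b in combinations(V, 2):                       # step (ii)
        if any(dsep(a, b, s) for s in subsets(set(V) - {a, b})):
            adj[a].discard(b)
            adj[b].discard(a)
    arrows = set()                                        # (a, b) encodes a -> b
    for b in V:                                           # step (iii)
        for a, c in combinations(sorted(adj[b]), 2):
            rest = set(V) - {a, b, c}
            if c not in adj[a] and not any(dsep(a, c, s | {b}) for s in subsets(rest)):
                arrows |= {(a, b), (c, b)}

    def undirected(a, b):
        return b in adj[a] and (a, b) not in arrows and (b, a) not in arrows

    def reaches(a, b):
        """Is there a directed path from a to b?"""
        seen, stack = set(), [a]
        while stack:
            v = stack.pop()
            for s, t in arrows:
                if s == v and t not in seen:
                    seen.add(t)
                    stack.append(t)
        return b in seen

    changed = True
    while changed:                                        # step (iv)
        changed = False
        for a, b in list(arrows):                         # first rule
            for c in adj[b] - adj[a] - {a}:
                if undirected(b, c):
                    arrows.add((b, c))
                    changed = True
        for a in V:                                       # second rule
            for b in adj[a]:
                if undirected(a, b) and reaches(a, b):
                    arrows.add((a, b))
                    changed = True
    return adj, arrows

# Assumed true DAG: A -> C <- B and C -> D.
parents = {"A": set(), "B": set(), "C": {"A", "B"}, "D": {"C"}}
adj, arrows = sgs(parents, lambda x, y, s: d_separated(parents, x, y, s))
print(sorted(arrows))
```

On this example the skeleton keeps the three edges A-C, B-C and C-D, step (iii) finds the collider at C, and the first rule of step (iv) then orients C → D.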
3.1 Example
3.2 Correctness, Complexity and Stability
Correctness follows from the following theorem.
Theorem 3.1 If $P$ is faithful to some DAG, then $P$ is faithful to $G$ if and only if
(i) for all vertices $v_1, v_2$ of $G$, $v_1$ and $v_2$ are adjacent if and only if $v_1$ and $v_2$ are dependent conditional on every set of vertices of $G$ that does not include $v_1$ or $v_2$;
(ii) for all vertices $v_1, v_2, v_3$ such that $v_1, v_2$ and $v_2, v_3$ are adjacent and $v_1, v_3$ are not, $v_1 \to v_2 \leftarrow v_3$ is a subgraph of $G$ if and only if $v_1, v_3$ are dependent conditional on every set containing $v_2$ but neither $v_1$ nor $v_3$.
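Regarding complexity, step (ii) of the SGS algorithm is an exponential search and hence very slow. In the worst case, each of the $\binom{n}{2}$ pairs of vertices is tested against each of the $2^{n-2}$ subsets of the remaining vertices, so the number of d-separation tests is bounded only by
$$\binom{n}{2} 2^{n-2}.$$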
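To get a feeling for how quickly this worst-case count grows, it can simply be evaluated for a few values of $n$. The function name below is a hypothetical helper, and the values of $n$ are arbitrary.

```python
from math import comb

def sgs_worst_case_tests(n):
    """Worst-case number of d-separation tests in step (ii): C(n, 2) * 2^(n-2)."""
    return comb(n, 2) * 2 ** (n - 2)

for n in (5, 10, 20):
    print(n, sgs_worst_case_tests(n))
```

Already for 20 variables the bound is close to fifty million tests.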
Step (ii) is relatively stable. If a correct d-separation relation, say $(v_1 \perp v_2 \mid Z)$, is excluded from the input, the algorithm still produces the correct undirected graph unless there is no other set besides $Z$ that d-separates $v_1$ and $v_2$. If a wrong d-separation relation, say $(v_1 \perp v_2 \mid Z)$, is included in the input, the algorithm will (possibly by mistake) not connect $v_1$ and $v_2$, but no other error will be made.
Step (iii) is unstable. Since the colliders determine the orientations of other edges in the
graph, we have less stability. If an input error leads the algorithm to include or exclude a
collision, the error may affect the orientation of many other edges.
4 The PC Algorithm
(i) Form the complete undirected graph $H$ on the vertex set $V$.
(ii) $k = 0$.
    repeat
        repeat
            select an ordered pair of variables $v_1, v_2$ that are adjacent in $H$ such that $\mathrm{Adjacencies}(H, v_1) \setminus \{v_2\}$ has cardinality greater than or equal to $k$, and a subset $S$ of $\mathrm{Adjacencies}(H, v_1) \setminus \{v_2\}$ of cardinality $k$; if $v_1$ and $v_2$ are d-separated given $S$, delete the edge $v_1 - v_2$ from $H$ and record $S$ in $\mathrm{Sepset}(v_1, v_2)$ and $\mathrm{Sepset}(v_2, v_1)$
        until all ordered pairs of adjacent variables $v_1$ and $v_2$ such that $\mathrm{Adjacencies}(H, v_1) \setminus \{v_2\}$ has cardinality greater than or equal to $k$, and all subsets $S$ of $\mathrm{Adjacencies}(H, v_1) \setminus \{v_2\}$ of cardinality $k$, have been tested for d-separation.
        $k = k + 1$
    until for each ordered pair of adjacent vertices $v_1, v_2$, $\mathrm{Adjacencies}(H, v_1) \setminus \{v_2\}$ is of cardinality less than $k$.
(iii) For each triple of vertices $v_1, v_2, v_3$ such that the pairs $v_1, v_2$ and $v_2, v_3$ are adjacent but $v_1, v_3$ are not, orient $v_1 - v_2 - v_3$ as $v_1 \to v_2 \leftarrow v_3$ if and only if $v_2$ is not in $\mathrm{Sepset}(v_1, v_3)$.
(iv) Repeat
– If $v_1 \to v_2$, $v_2$ and $v_3$ are adjacent, $v_1$ and $v_3$ are not adjacent, and the edge $v_2 - v_3$ has no arrowhead at $v_2$, then orient $v_2 - v_3$ as $v_2 \to v_3$.
– If there is a directed path from $v_1$ to $v_2$ and an edge between $v_1$ and $v_2$, then orient $v_1 - v_2$ as $v_1 \to v_2$.
until no more edges can be oriented.
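Steps (i) to (iii) of the PC algorithm can be sketched as follows. As before, this is only an illustration under assumptions: the d-separation oracle is computed from a known DAG (a hypothetical `parents` dictionary) with the moralised-ancestral-graph criterion, pairs are visited in sorted order, and step (iv) is left out because it coincides with step (iv) of the SGS algorithm. The names `pc_skeleton` and `orient_colliders` are made up for this sketch.

```python
from itertools import combinations

def d_separated(parents, x, y, z):
    """d-separation oracle for a known DAG via the moralised ancestral graph."""
    anc, stack = set(), [x, y, *z]
    while stack:                          # ancestral closure of x, y and z
        v = stack.pop()
        if v not in anc:
            anc.add(v)
            stack.extend(parents[v])
    nbrs = {v: set() for v in anc}
    for v in anc:                         # moralise: link parents to child and to each other
        for p in parents[v]:
            nbrs[v].add(p)
            nbrs[p].add(v)
        for p, q in combinations(parents[v], 2):
            nbrs[p].add(q)
            nbrs[q].add(p)
    seen, stack = {x}, [x]
    while stack:                          # search from x while avoiding z
        for w in nbrs[stack.pop()] - set(z) - seen:
            seen.add(w)
            stack.append(w)
    return y not in seen

def pc_skeleton(vertices, dsep):
    V = sorted(vertices)
    adj = {v: set(V) - {v} for v in V}                    # step (i)
    sepset = {}
    k = 0
    # step (ii): continue while some adjacent pair still admits a subset of size k
    while any(len(adj[a] - {b}) >= k for a in V for b in adj[a]):
        for a in V:
            for b in sorted(adj[a]):
                # only subsets of the current adjacencies of a are tested
                for s in combinations(sorted(adj[a] - {b}), k):
                    if dsep(a, b, set(s)):
                        adj[a].discard(b)
                        adj[b].discard(a)
                        sepset[(a, b)] = sepset[(b, a)] = set(s)
                        break
        k += 1
    return adj, sepset

def orient_colliders(adj, sepset):
    """Step (iii): orient a - b - c as a -> b <- c iff b is not in Sepset(a, c)."""
    arrows = set()                                        # (a, b) encodes a -> b
    for b in adj:
        for a, c in combinations(sorted(adj[b]), 2):
            if c not in adj[a] and b not in sepset[(a, c)]:
                arrows |= {(a, b), (c, b)}
    return arrows

# Assumed true DAG: A -> C <- B and C -> D.
parents = {"A": set(), "B": set(), "C": {"A", "B"}, "D": {"C"}}
adj, sepset = pc_skeleton(parents, lambda x, y, s: d_separated(parents, x, y, s))
arrows = orient_colliders(adj, sepset)
print(sorted(arrows), sepset[("A", "D")])
```

The main design difference to the SGS sketch is that candidate separating sets are drawn only from the current adjacencies of one endpoint and grow in size with $k$, and that the recorded separating sets make the collider test in step (iii) a simple lookup.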
4.1 Example
4.2 Correctness, Complexity, Stability
Correctness again follows from Theorem 3.1. We only need to search for independence
conditional on a subset of the adjacencies of one of the vertices.
Regarding complexity, denote by $k$ the largest degree (number of adjacencies) of a vertex of $G$. The number of conditional independence tests is then bounded by
$$2 \binom{n}{2} \sum_{i=0}^{k-1} \binom{n-2}{i} \le 3 n^2 (n-2)^k.$$
Note that for $k = n-1$ we again have an exponential bound in the worst case. However, in practice the algorithm can be much faster (for example for sparse graphs).
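A quick evaluation shows how much the degree bound helps. The sketch below takes the displayed formulas at face value and compares them with the worst-case count $\binom{n}{2} 2^{n-2}$ of the SGS algorithm; the function names and the choice $n = 20$, $k = 3$ are illustrative assumptions.

```python
from math import comb

def pc_test_bound(n, k):
    """Left-hand side of the bound: 2 * C(n, 2) * sum_{i=0}^{k-1} C(n-2, i)."""
    return 2 * comb(n, 2) * sum(comb(n - 2, i) for i in range(k))

def pc_loose_bound(n, k):
    """Right-hand side of the bound: 3 * n^2 * (n-2)^k."""
    return 3 * n ** 2 * (n - 2) ** k

def sgs_worst_case_tests(n):
    """Worst-case count for step (ii) of the SGS algorithm."""
    return comb(n, 2) * 2 ** (n - 2)

n, k = 20, 3
print(pc_test_bound(n, k), pc_loose_bound(n, k), sgs_worst_case_tests(n))
```

For 20 variables and maximum degree 3, the PC bound is in the tens of thousands, while the SGS worst case is in the tens of millions.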
Step (ii) is unstable. If an edge is mistakenly removed from the true graph, other edges
which are not in the true graph may be included in the output. This can also lead to
orientation errors. If an edge is mistakenly left in the graph (and there are no other errors
in the input) the only further errors are that some edges which theoretically could be
oriented will not be oriented.
Step (iii) is unstable for the same reason as step (iii) of the SGS-algorithm.
However, in practice step (ii) is more reliable than step (iii).