
# Seminar on Causality

**The SGS- and the PC-Algorithm**

Alexandra Federer, Moritz Dümbgen

Monday, March 23, 2009

1 Repetition

Let’s quickly recall some of the concepts from the previous talks. We are working on a probability space $(\Omega, \mathcal{F}, P)$ with random variables $X_1, \dots, X_n$. We want to illustrate certain properties of the probability measure $P$ graphically. Therefore, we look at graphs $G = (V, E)$ where $V = \{v_1, \dots, v_n\}$ is the set of vertices (representing the random variables) and $E$ is the set of edges. In the case of an undirected graph (no arrowheads), $E \subseteq \{\{v_i, v_j\} : i \neq j\}$, and in the case of a directed graph (with arrowheads), $E \subseteq \{(v_i, v_j) : i \neq j\}$. We speak of a directed acyclic graph (DAG) if a directed graph contains no directed cycle.

One possible construction of a DAG is to draw an edge from $X_i$ to $X_j$ if $i \in I_j$, where $I_j \subseteq \{1, \dots, j-1\}$ is a minimal set such that $P[x_j \mid x_1, \dots, x_{j-1}] = P[x_j \mid \{x_i ; i \in I_j\}]$. Then,

$$P[x_1, \dots, x_n] = \prod_{j=1}^{n} P[x_j \mid x_1, \dots, x_{j-1}] = \prod_{j=1}^{n} P[x_j \mid \{x_i ; i \in I_j\}]$$

and we say that $G$ represents $P$. In such a DAG, the parents $\mathrm{pa}(v_j)$ of a vertex $v_j$ are therefore given by $\{v_i ; i \in I_j\}$. Note the role of the ordering of the random variables here. For a different ordering we usually get a different representation!
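As a small numerical illustration of the factorization above, the following sketch builds a joint distribution for three binary variables from the parent sets $I_1 = \emptyset$, $I_2 = \{1\}$, $I_3 = \{2\}$ (a chain). The conditional probability tables are made-up values chosen only for illustration. The two checks at the end confirm that the product of the conditionals sums to one and that $P[x_3 \mid x_1, x_2]$ does not depend on $x_1$.

```python
from fractions import Fraction as Fr
from itertools import product

# Hypothetical conditional probability tables for the chain X1 -> X2 -> X3.
p1 = {0: Fr(1, 2), 1: Fr(1, 2)}              # P[x1]
p2 = {0: {0: Fr(3, 4), 1: Fr(1, 4)},         # P[x2 | x1], indexed p2[x1][x2]
      1: {0: Fr(1, 3), 1: Fr(2, 3)}}
p3 = {0: {0: Fr(9, 10), 1: Fr(1, 10)},       # P[x3 | x2], indexed p3[x2][x3]
      1: {0: Fr(2, 5), 1: Fr(3, 5)}}

# Joint distribution as the product over j of P[x_j | parents of x_j].
joint = {(a, b, c): p1[a] * p2[a][b] * p3[b][c]
         for a, b, c in product((0, 1), repeat=3)}


def cond_x3(c, a, b):
    """P[x3 = c | x1 = a, x2 = b], computed from the joint table."""
    return joint[(a, b, c)] / sum(joint[(a, b, z)] for z in (0, 1))


total = sum(joint.values())
x3_ignores_x1 = all(cond_x3(c, a, b) == p3[b][c]
                    for a, b, c in product((0, 1), repeat=3))
print(total, x3_ignores_x1)
```

Exact fractions are used so that the equality checks are not affected by floating-point rounding.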

Let’s now consider a DAG G representing P.

Definition 1.1 (d-separation for a path) A path is d-separated (blocked) by a set $Z \subset V$ if and only if

• the path contains a chain $(v_1 \to v_2 \to v_3)$ such that $v_2 \in Z$,

or

• the path contains a fork $(v_1 \leftarrow v_2 \to v_3)$ such that $v_2 \in Z$,

or

• the path contains a collider $(v_1 \to v_2 \leftarrow v_3)$ such that $v_2 \notin Z$ and $Z$ does not contain any descendant of $v_2$.


Definition 1.2 (d-separation for arbitrary sets) Disjoint subsets of vertices $X, Y$ are said to be d-separated by another set of vertices $Z$ if and only if $Z$ d-separates every path from a vertex in $X$ to a vertex in $Y$. We denote this by $X \perp Y \mid Z$.
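The two definitions translate quite directly into code. The following is a minimal sketch, not an efficient implementation: it enumerates every simple path between the two sets and checks each consecutive triple on it. The representation of a DAG as a set of `(parent, child)` pairs and all function names are our own choices for illustration.

```python
def descendants(edges, v):
    """All vertices reachable from v along directed edges."""
    found, stack = set(), [v]
    while stack:
        u = stack.pop()
        for a, b in edges:
            if a == u and b not in found:
                found.add(b)
                stack.append(b)
    return found


def path_blocked(edges, path, Z):
    """Definition 1.1: is the given path d-separated by Z?"""
    for a, b, c in zip(path, path[1:], path[2:]):
        if (a, b) in edges and (c, b) in edges:       # collider a -> b <- c
            if b not in Z and not (descendants(edges, b) & Z):
                return True
        elif b in Z:                                  # chain or fork through b
            return True
    return False


def simple_paths(edges, x, y):
    """All simple paths from x to y, ignoring edge directions."""
    nbrs = {}
    for a, b in edges:
        nbrs.setdefault(a, set()).add(b)
        nbrs.setdefault(b, set()).add(a)

    def walk(path):
        if path[-1] == y:
            yield path
            return
        for n in nbrs.get(path[-1], ()):
            if n not in path:
                yield from walk(path + [n])

    yield from walk([x])


def d_separated(edges, X, Y, Z):
    """Definition 1.2: Z blocks every path from X to Y."""
    Z = set(Z)
    return all(path_blocked(edges, p, Z)
               for x in X for y in Y for p in simple_paths(edges, x, y))


# Example DAG (our own): 1 -> 3 <- 2 and 3 -> 4.
edges = {(1, 3), (2, 3), (3, 4)}
print(d_separated(edges, {1}, {2}, set()))   # collider 3 blocks the path
print(d_separated(edges, {1}, {2}, {4}))     # a descendant of 3 is in Z
```

Enumerating all simple paths is exponential in general, which is acceptable here because the aim is only to mirror the definitions.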

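Consider a chain $v_1 \to v_2 \to v_3$. Since $G$ represents $P$,

$$P[x_3 \mid x_1, x_2] = P[x_3 \mid x_2] \;\Rightarrow\; X_3 \perp\!\!\!\perp X_1 \mid X_2.$$

Similarly, in a fork $v_1 \leftarrow v_2 \to v_3$ we have

$$P[x_3 \mid x_1, x_2] = P[x_3 \mid x_2] \;\Rightarrow\; X_3 \perp\!\!\!\perp X_1 \mid X_2.$$

However, in a collider $v_1 \to v_2 \leftarrow v_3$ we get that

$$P[x_3 \mid x_1] = P[x_3] \;\Rightarrow\; X_3 \perp\!\!\!\perp X_1 \;\not\Rightarrow\; X_3 \perp\!\!\!\perp X_1 \mid X_2.$$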
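To see numerically why conditioning on a collider can create dependence, here is a minimal sketch. It assumes, purely for illustration, that $X_1$ and $X_3$ are independent fair bits and that $X_2 = X_1 \oplus X_3$ (exclusive or); this particular distribution is our own example and not taken from the talk.

```python
from fractions import Fraction
from itertools import product

# Joint distribution over outcomes (x1, x2, x3) with x2 = x1 XOR x3.
joint = {(x1, x1 ^ x3, x3): Fraction(1, 4)
         for x1, x3 in product((0, 1), repeat=2)}


def prob(event):
    """Probability of the set of outcomes for which event(outcome) holds."""
    return sum(p for outcome, p in joint.items() if event(outcome))


def cond(event, given):
    """Conditional probability P[event | given]."""
    return prob(lambda o: event(o) and given(o)) / prob(given)


# Marginally, knowing x1 tells us nothing about x3.
p_x3 = prob(lambda o: o[2] == 1)
p_x3_given_x1 = cond(lambda o: o[2] == 1, lambda o: o[0] == 1)

# Given the collider x2, knowing x1 determines x3.
p_x3_given_x2 = cond(lambda o: o[2] == 1, lambda o: o[1] == 0)
p_x3_given_x1_x2 = cond(lambda o: o[2] == 1,
                        lambda o: o[0] == 1 and o[1] == 0)

print(p_x3, p_x3_given_x1, p_x3_given_x2, p_x3_given_x1_x2)
```

The first two values agree, so $X_3$ and $X_1$ are independent, while the last two differ, so they are dependent given $X_2$.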

The above findings can be generalized to arbitrary d-separated sets.

Theorem 1.1 Let $X, Y$ be disjoint and $Z$ be an arbitrary subset of vertices (the corresponding random variables).

• $X \perp Y \mid Z \Rightarrow X \perp\!\!\!\perp Y \mid Z$.

• $\neg(X \perp Y \mid Z) \Rightarrow \neg(X \perp\!\!\!\perp Y \mid Z)$ in some distribution $\hat{P}$ represented by $G$.

2 New Concepts

Definition 2.1 (faithful distribution) A probability distribution $P$ is called faithful with respect to a graph $G$ if and only if for disjoint subsets $X, Y$ and an arbitrary subset $Z$ of vertices (the corresponding random variables)

$$X \perp Y \mid Z \;\Leftrightarrow\; X \perp\!\!\!\perp Y \mid Z.$$

Definition 2.2 (faithful list of d-separations) A list of d-separations $L$ is called faithful if and only if there exists a DAG $G$ such that all and only the d-separations of $L$ are true in $G$. Then we also say that $G$ faithfully represents $L$.


3 The SGS Algorithm

Now we have all the tools to construct a graph $G$ (or an equivalence class of graphs) that faithfully represents a list of d-separations $L$ given as input.

(i) Form the complete undirected graph $H$ on the vertex set $V$.

(ii) For all $v_1 \neq v_2$: if there exists a subset $S \subseteq V \setminus \{v_1, v_2\}$ that d-separates $v_1, v_2$, remove the edge between them from $H$.

(iii) Let $\hat{H}$ be the undirected graph we obtain after step (ii). For all pairwise distinct $v_1, v_2, v_3$ such that $v_1, v_2$ and $v_2, v_3$ are adjacent in $\hat{H}$ and $v_1, v_3$ are not: orient them as $v_1 \to v_2 \leftarrow v_3$ if and only if there is no subset $S \subseteq V \setminus \{v_1, v_3\}$ containing $v_2$ that d-separates $v_1$ and $v_3$.

(iv) Repeat

– If $v_1 \to v_2$, $v_2$ and $v_3$ are adjacent, $v_1$ and $v_3$ are not adjacent, and there is no arrowhead at $v_2$, then orient $v_2 - v_3$ as $v_2 \to v_3$,

– If there is a directed path from $v_1$ to $v_2$ and an edge between $v_1$ and $v_2$, then orient $v_1 - v_2$ as $v_1 \to v_2$,

until no more edges can be oriented.
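The four steps can be sketched in a few lines of code. The version below is only an illustration under our own assumptions: the input list $L$ is passed as an oracle function `dsep(a, b, S)`, oriented edges are stored as pairs `(u, v)` meaning $u \to v$, and "no arrowhead at $v_2$" in step (iv) is read as "the edge $v_2 - v_3$ is still undirected". The example list `L` contains the d-separations of the DAG $1 \to 3 \leftarrow 2$, $3 \to 4$, which we chose ourselves.

```python
from itertools import combinations


def subsets(items):
    """All subsets of items, smallest first."""
    items = sorted(items)
    for r in range(len(items) + 1):
        for combo in combinations(items, r):
            yield frozenset(combo)


def has_directed_path(arrows, src, dst):
    """True if dst can be reached from src along oriented edges."""
    stack, seen = [src], {src}
    while stack:
        u = stack.pop()
        for a, b in arrows:
            if a == u and b not in seen:
                if b == dst:
                    return True
                seen.add(b)
                stack.append(b)
    return False


def orient_remaining(adj, arrows):
    """Step (iv): apply both orientation rules until nothing changes."""
    changed = True
    while changed:
        changed = False
        for a in sorted(adj):
            for b in sorted(adj[a]):
                if (a, b) in arrows or (b, a) in arrows:
                    continue  # edge already oriented
                # Rule 1: c -> a, a - b undirected, c and b not adjacent.
                rule1 = any((c, a) in arrows and c not in adj[b]
                            for c in adj[a] - {b})
                # Rule 2: a directed path from a to b exists.
                if rule1 or has_directed_path(arrows, a, b):
                    arrows.add((a, b))
                    changed = True
    return arrows


def sgs(vertices, dsep):
    V = set(vertices)
    adj = {v: V - {v} for v in V}                      # step (i)
    for a, b in combinations(sorted(V), 2):            # step (ii)
        if any(dsep(a, b, S) for S in subsets(V - {a, b})):
            adj[a].discard(b)
            adj[b].discard(a)
    arrows = set()
    for b in sorted(V):                                # step (iii)
        for a, c in combinations(sorted(adj[b]), 2):
            if c in adj[a]:
                continue
            if not any(dsep(a, c, S | {b}) for S in subsets(V - {a, b, c})):
                arrows |= {(a, b), (c, b)}
    return adj, orient_remaining(adj, arrows)


# Hypothetical input: d-separations of the DAG 1 -> 3 <- 2, 3 -> 4.
L = {(frozenset({1, 2}), frozenset()),
     (frozenset({1, 4}), frozenset({3})), (frozenset({1, 4}), frozenset({2, 3})),
     (frozenset({2, 4}), frozenset({3})), (frozenset({2, 4}), frozenset({1, 3}))}


def dsep(a, b, S):
    return (frozenset({a, b}), frozenset(S)) in L


skeleton, arrows = sgs({1, 2, 3, 4}, dsep)
print(sorted(arrows))
```

In this example step (iii) finds the collider at vertex 3, and the first rule of step (iv) then orients the remaining edge away from it.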

3.1 Example


3.2 Correctness, Complexity and Stability

Correctness follows from the following theorem.

Theorem 3.1 If $P$ is faithful to some DAG, then $P$ is faithful to $G$ if and only if

(i) for all vertices $v_1, v_2$ of $G$, $v_1$ and $v_2$ are adjacent if and only if $v_1$ and $v_2$ are dependent conditional on every set of vertices of $G$ that does not include $v_1$ or $v_2$.

(ii) for all vertices $v_1, v_2, v_3$ such that $v_1, v_2$ and $v_2, v_3$ are adjacent and $v_1, v_3$ are not, $v_1 \to v_2 \leftarrow v_3$ is a subgraph of $G$ if and only if $v_1, v_3$ are dependent conditional on every set containing $v_2$ but neither $v_1$ nor $v_3$.

Regarding complexity, we see that step (ii) of the SGS algorithm is an exponential search, hence it is very slow. The number of d-separation tests in the worst case is only bounded by $\binom{n}{2} 2^{n-2}$.
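To get a feeling for how quickly this bound grows, the following lines evaluate it for a few values of $n$ (the chosen values are arbitrary).

```python
from math import comb


def sgs_worst_case_tests(n):
    """Worst-case number of d-separation tests: C(n, 2) * 2^(n - 2)."""
    return comb(n, 2) * 2 ** (n - 2)


counts = {n: sgs_worst_case_tests(n) for n in (5, 10, 20)}
for n, c in counts.items():
    print(n, c)
```

Already for twenty variables the bound is close to fifty million tests.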

Step (ii) is relatively stable. If a correct d-separation relation, say $(v_1 \perp v_2 \mid Z)$, is excluded from the input, the algorithm will still produce the correct undirected graph unless there is no other set besides $Z$ which d-separates $v_1, v_2$. If a wrong d-separation relation, say $(v_1 \perp v_2 \mid Z)$, is included, the algorithm will not connect $v_1$ and $v_2$ (possibly by mistake), but no other error will be made.

Step (iii) is unstable. Since the colliders determine the orientations of other edges in the graph, we have less stability. If an input error leads the algorithm to include or exclude a collider, the error may affect the orientation of many other edges.

4 The PC Algorithm

(i) Form the complete undirected graph $H$ on the vertex set $V$.

(ii) $k = 0$.

repeat

repeat

select an ordered pair of variables $v_1, v_2$ that are adjacent in $H$ such that $\mathrm{Adjacencies}(H, v_1) \setminus \{v_2\}$ has cardinality greater than or equal to $k$, and a subset $S$ of $\mathrm{Adjacencies}(H, v_1) \setminus \{v_2\}$ of cardinality $k$; if $v_1$ and $v_2$ are d-separated given $S$, delete the edge $v_1 - v_2$ from $H$ and record $S$ in $\mathrm{Sepset}(v_1, v_2)$ and $\mathrm{Sepset}(v_2, v_1)$

until all ordered pairs of adjacent variables $v_1$ and $v_2$ such that $\mathrm{Adjacencies}(H, v_1) \setminus \{v_2\}$ has cardinality greater than or equal to $k$, and all subsets $S$ of $\mathrm{Adjacencies}(H, v_1) \setminus \{v_2\}$ of cardinality $k$ have been tested for d-separation.

$k = k + 1$

until for each ordered pair of adjacent vertices $v_1, v_2$, $\mathrm{Adjacencies}(H, v_1) \setminus \{v_2\}$ is of cardinality less than $k$.

(iii) For each triple of vertices $v_1, v_2, v_3$ such that the pairs $v_1, v_2$ and $v_2, v_3$ are adjacent but $v_1, v_3$ are not, orient $v_1 - v_2 - v_3$ as $v_1 \to v_2 \leftarrow v_3$ if and only if $v_2$ is not in $\mathrm{Sepset}(v_1, v_3)$.

(iv) Repeat

– If $v_1 \to v_2$, $v_2$ and $v_3$ are adjacent, $v_1$ and $v_3$ are not adjacent, and there is no arrowhead at $v_2$, then orient $v_2 - v_3$ as $v_2 \to v_3$,

– If there is a directed path from $v_1$ to $v_2$ and an edge between $v_1$ and $v_2$, then orient $v_1 - v_2$ as $v_1 \to v_2$,

until no more edges can be oriented.
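The steps of the PC algorithm can be sketched as follows. As before this is only an illustration under our own assumptions: d-separation is answered by an oracle function `dsep(a, b, S)`, oriented edges are stored as pairs `(u, v)` meaning $u \to v$, pairs are visited in sorted order, and "no arrowhead at $v_2$" is read as "the edge $v_2 - v_3$ is still undirected". The example list `L` holds the d-separations of the DAG $1 \to 3 \leftarrow 2$, $3 \to 4$, chosen by us.

```python
from itertools import combinations


def has_directed_path(arrows, src, dst):
    """True if dst can be reached from src along oriented edges."""
    stack, seen = [src], {src}
    while stack:
        u = stack.pop()
        for a, b in arrows:
            if a == u and b not in seen:
                if b == dst:
                    return True
                seen.add(b)
                stack.append(b)
    return False


def pc(vertices, dsep):
    V = sorted(vertices)
    adj = {v: set(V) - {v} for v in V}                 # step (i)
    sepset, k = {}, 0
    # Step (ii): test conditioning sets of growing size k, drawn only
    # from the current adjacencies of the first vertex of the pair.
    while any(len(adj[a] - {b}) >= k for a in V for b in adj[a]):
        for a in V:
            for b in sorted(adj[a]):
                for S in combinations(sorted(adj[a] - {b}), k):
                    if dsep(a, b, set(S)):
                        adj[a].discard(b)
                        adj[b].discard(a)
                        sepset[(a, b)] = sepset[(b, a)] = set(S)
                        break
        k += 1
    arrows = set()
    for b in V:                                        # step (iii)
        for a, c in combinations(sorted(adj[b]), 2):
            if c not in adj[a] and b not in sepset[(a, c)]:
                arrows |= {(a, b), (c, b)}
    changed = True
    while changed:                                     # step (iv)
        changed = False
        for a in V:
            for b in sorted(adj[a]):
                if (a, b) in arrows or (b, a) in arrows:
                    continue  # edge already oriented
                rule1 = any((c, a) in arrows and c not in adj[b]
                            for c in adj[a] - {b})
                if rule1 or has_directed_path(arrows, a, b):
                    arrows.add((a, b))
                    changed = True
    return adj, sepset, arrows


# Hypothetical input: d-separations of the DAG 1 -> 3 <- 2, 3 -> 4.
L = {(frozenset({1, 2}), frozenset()),
     (frozenset({1, 4}), frozenset({3})), (frozenset({1, 4}), frozenset({2, 3})),
     (frozenset({2, 4}), frozenset({3})), (frozenset({2, 4}), frozenset({1, 3}))}


def dsep(a, b, S):
    return (frozenset({a, b}), frozenset(S)) in L


skeleton, sepset, arrows = pc({1, 2, 3, 4}, dsep)
print(sorted(arrows), sepset[(1, 4)])
```

Compared with SGS, step (iii) needs no further d-separation queries because the separating sets were recorded while the edges were removed.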

4.1 Example


4.2 Correctness, Complexity, Stability

Correctness again follows from Theorem 3.1. We only need to search for independence

conditional on a subset of the adjacencies of one of the vertices.

Regarding complexity, denote by $k$ the largest degree (number of adjacencies) of a vertex of $G$. The number of conditional independence tests is then bounded by

$$2 \binom{n}{2} \sum_{i=0}^{k-1} \binom{n-2}{i} \le 3n^2 (n-2)^k.$$

Note that for $k = n-1$ we again have an exponential bound in the worst case. However, in practice the algorithm can be much faster (for example when we have sparse graphs).
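To illustrate the gain for sparse graphs, the following sketch evaluates the sum in the bound above for $n = 20$ and a few small degrees $k$, and compares it with the worst-case count $\binom{n}{2} 2^{n-2}$ of the SGS algorithm. The values of $n$ and $k$ are arbitrary choices for illustration.

```python
from math import comb


def pc_test_bound(n, k):
    """2 * C(n, 2) * sum_{i=0}^{k-1} C(n - 2, i), with k the largest degree."""
    return 2 * comb(n, 2) * sum(comb(n - 2, i) for i in range(k))


def sgs_test_bound(n):
    """C(n, 2) * 2^(n - 2), the worst-case count for the SGS algorithm."""
    return comb(n, 2) * 2 ** (n - 2)


n = 20
rows = [(k, pc_test_bound(n, k)) for k in (1, 2, 3, 5)]
for k, bound in rows:
    print(k, bound, sgs_test_bound(n))
```

For small $k$ the PC bound stays several orders of magnitude below the SGS count, and it only becomes exponential in $n$ when $k$ approaches $n - 1$.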

Step (ii) is unstable. If an edge is mistakenly removed from the true graph, other edges

which are not in the true graph may be included in the output. This can also lead to

orientation errors. If an edge is mistakenly left in the graph (and there are no other errors

in the input) the only further errors are that some edges which theoretically could be

oriented will not be oriented.

Step (iii) is unstable for the same reason as step (iii) of the SGS-algorithm.

However, in practice step (ii) is more reliable than step (iii).
