Structure
Learning a Bayesian network from data consists of learning the structure of its DAG from the data.
Suppose we have data concerning smoking, lung cancer,
bronchitis, fatigue, and chest X-ray results on many patients.
We might learn this DAG:
[Learned DAG: smoking history (H) → bronchitis (B), lung cancer (L); B, L → fatigue (F); L → chest X-ray (X)]
There are two approaches to learning structure.
1. Score-based approach.
– In this approach we assign scores to DAGs based on the data.
– We choose the highest scoring DAG.
2. Constraint-based approach.
– In this approach we learn the DAG from the
conditional independencies that the data suggests
are present.
I will discuss the constraint-based approach.
http://videolectures.net/kdd07_tutorials/.
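The starting point of the constraint-based approach is deciding which independencies the data suggest. A minimal sketch of that step, using empirical mutual information against an ad-hoc threshold (the threshold and the toy data are illustrative assumptions; real implementations use proper statistical tests such as chi-square):

```python
import math
import random
from collections import Counter

def mutual_information(pairs):
    """Empirical mutual information (nats) between two discrete
    variables, estimated from a list of (x, y) samples."""
    n = len(pairs)
    pxy = Counter(pairs)
    px = Counter(x for x, _ in pairs)
    py = Counter(y for _, y in pairs)
    return sum((c / n) * math.log(c * n / (px[x] * py[y]))
               for (x, y), c in pxy.items())

random.seed(0)
# Two independent fair coins: MI should be near 0.
indep = [(random.randint(0, 1), random.randint(0, 1)) for _ in range(20000)]
# Y is a noisy copy of X: MI should be clearly positive.
dep = [(x, x if random.random() < 0.9 else 1 - x) for x, _ in indep]

THRESHOLD = 0.01  # ad-hoc cutoff for declaring "independent"
print(mutual_information(indep) < THRESHOLD)   # suggests I(X, Y)
print(mutual_information(dep) < THRESHOLD)     # suggests dependence
```

The threshold choice matters: too loose and real dependencies are missed; too strict and sampling noise is mistaken for dependence, which is why principled hypothesis tests are used in practice.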
Causal Faithfulness Assumption
If we assume
1. the observed probability distribution P of a set of observed random variables V satisfies the Markov condition with the causal DAG G containing the variables, and
2. all conditional independencies in the observed distribution P are entailed by the Markov condition in G,
we say we are making the causal faithfulness assumption.
Example 1. Suppose V = {X, Y} and our set of conditional independencies is
{I(X, Y)}.
[Figure: X and Y with no edge between them]
Then there can be no edge between X and Y in the causal DAG, since with an edge the Markov condition would not entail the observed independency I(X, Y).
Example 2. Suppose V = {X, Y} and our set of conditional independencies is the empty set. Then there must be an edge between X and Y in the causal DAG, since otherwise the Markov condition would entail I(X, Y), which is not present.
Example 3. Suppose V = {X, Y, Z} and our set of conditional independencies is
{I(X, Y)}.
[Figures (a)–(e): (a) undirected links X – Z – Y; (b)–(d) the two chains and the fork over X, Z, Y; (e) the collider X → Z ← Y]
• There can be no edge between X and Y in the causal DAG, owing to the reason in Example 1.
• There must be edges between X and Z and between Y and Z, owing to the reason in Example 2.
• So we must have the links in (a).
• We cannot have the causal DAG in (b), (c), or (d). The reason is that in each of them the Markov condition entails I(X, Y | Z), and this conditional independency is not present. Furthermore, in them the Markov condition does not entail I(X, Y).
• So we must have the causal DAG in (e).
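Example 3's conclusion can be checked numerically. Below is a minimal sketch using one distribution faithful to the collider X → Z ← Y (fair coins with Z = X XOR Y, an assumed parameterization): marginally I(X, Y) holds, but conditioning on Z makes X and Y dependent, so none of (b)–(d) fit.

```python
from itertools import product

# Exact joint for X, Y independent fair coins and Z = X XOR Y,
# a distribution faithful to the collider X -> Z <- Y.
joint = {(x, y, x ^ y): 0.25 for x, y in product([0, 1], repeat=2)}

def p(**fixed):
    """Marginal probability of a partial assignment to x, y, z."""
    names = ('x', 'y', 'z')
    return sum(pr for vals, pr in joint.items()
               if all(vals[names.index(k)] == v for k, v in fixed.items()))

# I(X, Y) holds: P(x, y) = P(x) P(y).
print(p(x=1, y=1), p(x=1) * p(y=1))
# I(X, Y | Z) fails: P(x | y, z) != P(x | z).
print(p(x=1, y=1, z=0) / p(y=1, z=0))   # X is determined given Y and Z
print(p(x=1, z=0) / p(z=0))
```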
Example 4. Suppose V = {X, Y, Z} and our set of conditional independencies is
{I(X, Y | Z)}.
• Then, as before, the only edges in the causal DAG must be between X and Z and between Y and Z.
• So we must have the links in (a): X – Z – Y.
• The collider X → Z ← Y is ruled out, since it entails I(X, Y), which is not present, and does not entail I(X, Y | Z); the two chains and the fork each entail exactly I(X, Y | Z), so the directions of the edges cannot be determined.
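Conversely, Example 4's set {I(X, Y | Z)} is realized by a chain. A sketch with an assumed copy probability of 0.8 for X → Z → Y (the fork and the reversed chain entail the same independencies):

```python
from itertools import product

# Exact joint for the chain X -> Z -> Y, where Z and Y are
# noisy copies of their parents (assumed copy probability 0.8).
def copy(a, b):  # P(b | a) for a noisy copy
    return 0.8 if a == b else 0.2

joint = {(x, z, y): 0.5 * copy(x, z) * copy(z, y)
         for x, z, y in product([0, 1], repeat=3)}

def p(**fixed):
    names = ('x', 'z', 'y')
    return sum(pr for vals, pr in joint.items()
               if all(vals[names.index(k)] == v for k, v in fixed.items()))

# I(X, Y | Z) holds: P(y | x, z) = P(y | z).
print(p(x=1, z=1, y=1) / p(x=1, z=1), p(z=1, y=1) / p(z=1))
# I(X, Y) fails: P(x, y) != P(x) P(y).
print(p(x=1, y=1), p(x=1) * p(y=1))
```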
Example 5. Suppose V = {X, Y, Z, W}.
[Figures (a)–(c): graphs over X, Y, Z, W in which Z is adjacent to W]
• Due to the previous theorem, the links must be those in (a).
• We must have the arrow directions in (b) because I(X, Y).
• Therefore, we must have the arrow directions in (c) because we do not have I(W, X).
Relaxing the Assumption of No
Hidden Common Causes
• It seems the main exception to the causal
Markov (and therefore causal faithfulness)
assumption is the presence of hidden
common causes.
• In most domains it does not seem
reasonable to assume that there can be no
hidden common causes.
• It is still possible to learn some causal
influences if we relax this assumption.
• The causal embedded faithfulness
assumption allows for hidden common
causes.
Causal Embedded Faithfulness Assumption
• If we assume the observed distribution P of the
variables is embedded faithfully in a causal
DAG containing the variables, we say we are
making the causal embedded faithfulness
assumption.
• Suppose we have a probability distribution P of the variables in V, V is a subset of W, and G is a DAG whose set of nodes is W. Then P is embedded faithfully in G if all and only the conditional independencies in P are entailed by the Markov condition applied to G and restricted to the nodes in V.
Basically, when we make the causal embedded faithfulness assumption, we are assuming that there is a causal DAG in which the observed distribution is embedded faithfully, but that DAG might contain hidden variables.
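A concrete sketch of an embedded distribution, with assumed CPT values: let W = {H, X, Y}, where the hidden variable H is a common cause of the observed X and Y. The observed P over V = {X, Y} exhibits a dependence and no independencies, which is exactly what the Markov condition on H → X, H → Y entails when restricted to {X, Y}; so P is embedded faithfully in that DAG.

```python
from itertools import product

# Hidden common cause: H -> X, H -> Y, with H unobserved.
# CPT values below are illustrative assumptions.
def noisy(a, b):  # P(b | a): each child is a noisy copy of H
    return 0.9 if a == b else 0.1

full = {(h, x, y): 0.5 * noisy(h, x) * noisy(h, y)
        for h, x, y in product([0, 1], repeat=3)}

# Observed distribution P over V = {X, Y}: marginalize out H.
obs = {}
for (h, x, y), pr in full.items():
    obs[(x, y)] = obs.get((x, y), 0.0) + pr

px1 = sum(pr for (x, _), pr in obs.items() if x == 1)
py1 = sum(pr for (_, y), pr in obs.items() if y == 1)
# X and Y are dependent in P: P(x, y) != P(x) P(y).
print(obs[(1, 1)], px1 * py1)
```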
Learning Causal Influences
Under the Causal Embedded
Faithfulness Assumption
Example 3 revisited. Suppose V = {X, Y, Z} and our set of
conditional independencies is
{I(X, Y)}.
[Figure: the collider X → Z ← Y]
• Under this weaker assumption, P is also embedded faithfully in DAGs in which hidden common causes account for the edges, so we can no longer conclude that X and Y cause Z.
Example 5 revisited. Suppose V = {X, Y, Z, W} and our set of conditional independencies is as before.
[Figures (a)–(c): graphs over X, Y, Z, W augmented with hidden variables H1, H2, and H3]
• P is embedded faithfully in the DAG in (a) and in (b).
• P is not embedded faithfully in the DAG in (c). That DAG entails I(W,{X,Y}).
• We conclude Z causes W.
• So we can learn causes from data on four variables.
Example 7. Suppose we have these variables: R, S, and L.
[Figure: learned graph over R, S, and L]
• We conclude the graph on the right.
• The one-headed arrow means R causes S or they have a hidden common cause.
• The two-headed arrow means S causes L.
Example 8. University Student Retention (Druzdzel and
Glymour, 1999)
Variable What the Variable Represents
grad Fraction of students who graduate from the institution
rejr Fraction of applicants who are not offered admission
tstsc Average standardized score of incoming students
tp10 Fraction of incoming students in the top 10% of high
school class
acpt Fraction of students who accept the institution's
admission offer
spnd Average educational and general expenses per
student
sfrat Student/faculty ratio
salar Average faculty salary
[Figures: graphs learned over salar, tstsc, tp10, grad, and sfrat at significance levels alpha = .2, .1, .05, and .01]
Example 9. Racial Incidents in the U.S. Military (Neapolitan
and Morris, 2003)
Variable What the Variable Represents
race Respondent's race/ethnicity
yos Respondent's years of military service
inc Whether respondent replied he/she experienced a
racial incident
rept Whether incident was reported to military personnel
resp Whether respondent held the military responsible
for the incident
[Figures: graphs learned over yos, inc, resp, race, and rept; a further graph relates race, inc, and says_inc]
[Figure: manipulation variable M → S (smoking), with L (lung cancer)]
P(m1) = .5   P(s1 | m1) = 1   P(s2 | m1) = 0
P(m2) = .5   P(s1 | m2) = 0   P(s2 | m2) = 1
• We do not really want to manipulate
people and make them smoke.
[Figures: S (smoking) and L (lung cancer) connected by causation (S → L) versus mere correlation]
It is difficult to learn causal influences
when we have data on only two variables.
[Figure: T (testosterone) → DHT (dihydrotestosterone) → E (erectile function)]
The Causal Markov Assumption:
[Figure: C (cold) → S (sneezing), C → R (runny nose)]
[Figure: burglary (B) → alarm ← earthquake (E)]
P(B | E) = P(B), that is, I(B, E).
Given the alarm, E 'discounts' B.
[Figure: history of smoking (H) → bronchitis (B), lung cancer (L); B, L → fatigue (F); L → chest X-ray (X)]
P(H) = .2
P(B | H) = .25     P(B | ¬H) = .05
P(L | H) = .003    P(L | ¬H) = .00005
P(F | B, L) = .75    P(F | B, ¬L) = .10
P(F | ¬B, L) = .5    P(F | ¬B, ¬L) = .05
P(X | L) = .6     P(X | ¬L) = .02

P(B | L) > P(B)   but   P(B | L, H) = P(B | H)
P(B | X) > P(B)   but   P(B | X, H) = P(B | H)
I(B, {L, X} | H)
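The inequalities and equalities above can be verified by brute-force enumeration of the joint distribution these CPTs define (a sketch; variable order H, B, L, F, X as in the slide):

```python
from itertools import product

# CPTs from the slide: H -> B, H -> L, B -> F <- L, L -> X.
pH = 0.2
pB = {1: 0.25, 0: 0.05}          # P(B=1 | H=h)
pL = {1: 0.003, 0: 0.00005}      # P(L=1 | H=h)
pF = {(1, 1): 0.75, (1, 0): 0.10, (0, 1): 0.5, (0, 0): 0.05}
pX = {1: 0.6, 0: 0.02}           # P(X=1 | L=l)

def bern(p1, v):                 # P(V=v) for a Bernoulli with P(V=1)=p1
    return p1 if v == 1 else 1 - p1

joint = {(h, b, l, f, x):
         bern(pH, h) * bern(pB[h], b) * bern(pL[h], l)
         * bern(pF[(b, l)], f) * bern(pX[l], x)
         for h, b, l, f, x in product([0, 1], repeat=5)}

def p(**fixed):
    idx = {'h': 0, 'b': 1, 'l': 2, 'f': 3, 'x': 4}
    return sum(pr for vals, pr in joint.items()
               if all(vals[idx[k]] == v for k, v in fixed.items()))

print(p(b=1, l=1) / p(l=1), p(b=1))     # P(B | L) > P(B)
print(p(b=1, x=1) / p(x=1), p(b=1))     # P(B | X) > P(B)
print(p(b=1, l=1, h=1) / p(l=1, h=1),
      p(b=1, h=1) / p(h=1))             # equal: I(B, L | H)
```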
Exceptions to the Causal Markov
Assumption
1. Hidden common causes
[Figure: hidden variable H and C (cold) are both common causes: H → S, H → R; C → S (sneezing), C → R (runny nose)]
¬I(S, R | C)
2. Causal feedback
[Figure: S (studying) ⇄ G (good grades)]
3. Selection bias
[Figure: F (finasteride) → hypertension ← L (hair loss)]
• We obtain data on only F and L.
• Everyone in our sample suffers from hypertension.
• Finasteride discounts hair loss.
• ¬I(F,L) in our observed distribution.
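This exception can be sketched numerically with assumed values: F and L are independent in the population, both raise the probability of hypertension, and restricting the sample to hypertensives induces a dependence, with the 'discounting' going in the direction stated above.

```python
from itertools import product

# F (finasteride) and L (hair loss) are independent fair coins;
# both raise the chance of hypertension (illustrative CPT values).
pHyp = {(1, 1): 0.9, (1, 0): 0.5, (0, 1): 0.5, (0, 0): 0.1}

joint = {(f, l, s): 0.25 * (pHyp[(f, l)] if s == 1 else 1 - pHyp[(f, l)])
         for f, l, s in product([0, 1], repeat=3)}   # s = hypertension

def p(**fixed):
    idx = {'f': 0, 'l': 1, 's': 2}
    return sum(pr for vals, pr in joint.items()
               if all(vals[idx[k]] == v for k, v in fixed.items()))

# In the whole population, I(F, L): P(f, l) = P(f) P(l).
print(p(f=1, l=1), p(f=1) * p(l=1))
# Among hypertensives (everyone in our sample), F and L are dependent,
# and F 'discounts' L: P(l | f, s) < P(l | s).
print(p(l=1, f=1, s=1) / p(f=1, s=1), p(l=1, s=1) / p(s=1))
```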
4. The entities in the population are units of time
[Causal DAG: finasteride (F) → DHT → E (erectile dysfunction)]
I(E, F)