
CMSC 471 Fall 2002

Class #19 – Monday, November 4


Today’s class
• (Probability theory)
• Bayesian inference
  – From the joint distribution
  – Using independence/factoring
  – From sources of evidence

• Bayesian networks
  – Network structure
  – Conditional probability tables
  – Conditional independence
  – Inference in Bayesian networks


Bayesian Reasoning / Bayesian Networks
Chapters 14.1–15.2

Why probabilities anyway?
• Kolmogorov showed that three simple axioms lead to the rules of probability theory
  – De Finetti, Cox, and Carnap have also provided compelling arguments for these axioms
  1. All probabilities are between 0 and 1:
     • 0 ≤ P(a) ≤ 1
  2. Valid propositions (tautologies) have probability 1, and unsatisfiable propositions have probability 0:
     • P(true) = 1, P(false) = 0
  3. The probability of a disjunction is given by:
     • P(a ∨ b) = P(a) + P(b) – P(a ∧ b)
[Figure: Venn diagram of overlapping events a and b]
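As a quick sanity check of axiom 3, here is a minimal Python sketch over a made-up joint distribution for two binary events (the numbers are illustrative, not from the lecture):

```python
# Toy joint distribution over two binary events a and b (illustrative numbers).
joint = {(True, True): 0.2, (True, False): 0.3,
         (False, True): 0.1, (False, False): 0.4}

p_a = sum(p for (a, b), p in joint.items() if a)            # P(a) = 0.5
p_b = sum(p for (a, b), p in joint.items() if b)            # P(b) = 0.3
p_a_and_b = joint[(True, True)]                             # P(a and b) = 0.2
p_a_or_b = sum(p for (a, b), p in joint.items() if a or b)  # P(a or b) = 0.6

# Axiom 3: P(a or b) = P(a) + P(b) - P(a and b)
assert abs(p_a_or_b - (p_a + p_b - p_a_and_b)) < 1e-12
```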

Inference from the joint: Example

                    alarm                        ¬alarm
            earthquake  ¬earthquake      earthquake  ¬earthquake
burglary      .001         .008            .0001        .009
¬burglary     .01          .09             .01          .79

P(Burglary | alarm) = α P(Burglary, alarm)
  = α [P(Burglary, alarm, earthquake) + P(Burglary, alarm, ¬earthquake)]
  = α [ (.001, .01) + (.008, .09) ]
  = α [ (.009, .1) ]
Since P(burglary | alarm) + P(¬burglary | alarm) = 1,
  α = 1/(.009 + .1) = 9.173
  (i.e., P(alarm) = 1/α = .109 – quizlet: how can you verify this?)
P(burglary | alarm) = .009 * 9.173 = .08255
P(¬burglary | alarm) = .1 * 9.173 = .9173
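A minimal Python sketch of the same computation (marginalize the joint over Earthquake, then normalize), using the values from the table above; the dictionary layout and names are my own:

```python
# Full joint P(Burglary, Alarm, Earthquake); keys are (burglary, alarm, earthquake).
joint = {
    (True,  True,  True): 0.001,  (True,  True,  False): 0.008,
    (True,  False, True): 0.0001, (True,  False, False): 0.009,
    (False, True,  True): 0.01,   (False, True,  False): 0.09,
    (False, False, True): 0.01,   (False, False, False): 0.79,
}

def posterior_burglary_given_alarm(joint):
    """P(Burglary | alarm): sum out Earthquake, then normalize by alpha = 1/P(alarm)."""
    unnormalized = {
        b: sum(joint[(b, True, eq)] for eq in (True, False))
        for b in (True, False)
    }
    alpha = 1.0 / sum(unnormalized.values())   # 1 / 0.109 = 9.17...
    return {b: alpha * p for b, p in unnormalized.items()}

print(posterior_burglary_given_alarm(joint))
# {True: 0.0825..., False: 0.9174...}, matching the slide's 0.08255 and 0.9173
```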

Independence
• When two sets of propositions do not affect each others' probabilities, we call them independent, and can easily compute their joint and conditional probability:
  – Independent(A, B) → P(A ∧ B) = P(A) P(B), P(A | B) = P(A)
• For example, {moon-phase, light-level} might be independent of {burglary, alarm, earthquake}
  – Then again, it might not: burglars might be more likely to burglarize houses when there's a new moon (and hence little light)
  – But if we know the light level, the moon phase doesn't affect whether we are burglarized
  – Once we're burglarized, light level doesn't affect whether the alarm goes off
• We need a more complex notion of independence, and methods for reasoning about these kinds of relationships

Conditional independence
• Absolute independence:
  – A and B are independent if P(A ∧ B) = P(A) P(B); equivalently, P(A) = P(A | B) and P(B) = P(B | A)
• A and B are conditionally independent given C if
  – P(A ∧ B | C) = P(A | C) P(B | C)
• This lets us decompose the joint distribution:
  – P(A ∧ B ∧ C) = P(A | C) P(B | C) P(C)
• Moon-Phase and Burglary are conditionally independent given Light-Level
• Conditional independence is weaker than absolute independence, but still useful in decomposing the full joint probability distribution

Bayes' rule
• Bayes' rule is derived from the product rule:
  – P(Y | X) = P(X | Y) P(Y) / P(X)
• Often useful for diagnosis:
  – If X are (observed) effects and Y are (hidden) causes,
  – We may have a model for how causes lead to effects (P(X | Y))
  – We may also have prior beliefs (based on experience) about the frequency of occurrence of causes (P(Y))
  – Which allows us to reason abductively from effects to causes (P(Y | X))
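To make the diagnostic direction concrete, here is a small sketch with hypothetical numbers (a disease prior and a test's true/false-positive rates; none of these values come from the lecture):

```python
# Hypothetical model: hidden cause Y = disease, observed effect X = positive test.
p_disease = 0.01                 # prior belief P(Y)
p_pos_given_disease = 0.9        # causal model P(X | Y)
p_pos_given_healthy = 0.05       # false-positive rate P(X | not Y)

# P(X) by the law of total probability, then Bayes' rule for P(Y | X).
p_pos = p_pos_given_disease * p_disease + p_pos_given_healthy * (1 - p_disease)
p_disease_given_pos = p_pos_given_disease * p_disease / p_pos
print(round(p_disease_given_pos, 3))   # 0.154: abductive reasoning from effect to cause
```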

Bayesian inference
• In the setting of diagnostic/evidential reasoning:
  – Know the prior probability of each hypothesis, P(Hi), and the conditional probability of the evidence, P(Ej | Hi)
  – Want to compute the posterior probability P(Hi | Ej)
• Bayes' theorem (formula 1):
  P(Hi | Ej) = P(Hi) P(Ej | Hi) / P(Ej)
[Figure: hypotheses H1 … Hn linked by P(Ej | Hi) to evidence/manifestations E1 … Em]

Simple Bayesian diagnostic reasoning
• Knowledge base:
  – Evidence / manifestations: E1, …, Em
  – Hypotheses / disorders: H1, …, Hn
    • Ej and Hi are binary; hypotheses are mutually exclusive (non-overlapping) and exhaustive (cover all possible cases)
  – Conditional probabilities: P(Ej | Hi), i = 1, …, n; j = 1, …, m
• Cases (evidence for a particular instance): E1, …, El
• Goal: Find the hypothesis Hi with the highest posterior
  – maxi P(Hi | E1, …, El)

Bayesian diagnostic reasoning II
• Bayes' rule says that
  – P(Hi | E1, …, El) = P(E1, …, El | Hi) P(Hi) / P(E1, …, El)
• Assume each piece of evidence Ej is conditionally independent of the others, given a hypothesis Hi; then:
  – P(E1, …, El | Hi) = ∏j=1…l P(Ej | Hi)
• If we only care about relative probabilities for the Hi, then we have:
  – P(Hi | E1, …, El) = α P(Hi) ∏j=1…l P(Ej | Hi)
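A sketch of this naive-Bayes style computation, with a hypothetical three-hypothesis, two-symptom knowledge base (the disorders, symptoms, and all probabilities are made up for illustration):

```python
from math import prod

priors = {"flu": 0.1, "cold": 0.3, "healthy": 0.6}        # P(Hi), exhaustive & exclusive
likelihoods = {                                            # P(Ej = present | Hi)
    "flu":     {"fever": 0.9,  "cough": 0.8},
    "cold":    {"fever": 0.2,  "cough": 0.7},
    "healthy": {"fever": 0.01, "cough": 0.05},
}

def posterior(observed):
    """P(Hi | E1,...,El) = alpha * P(Hi) * prod_j P(Ej | Hi), assuming the Ej are
    conditionally independent given each Hi."""
    scores = {h: priors[h] * prod(likelihoods[h][e] for e in observed) for h in priors}
    alpha = 1.0 / sum(scores.values())
    return {h: alpha * s for h, s in scores.items()}

print(posterior(["fever", "cough"]))   # "flu" has the highest posterior (~0.63)
```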

Limitations of simple Bayesian inference
• Cannot easily handle multi-fault situations, nor cases where intermediate (hidden) causes exist:
  – Disease D causes syndrome S, which causes correlated manifestations M1 and M2
• Consider a composite hypothesis H1 ∧ H2, where H1 and H2 are independent. What is the relative posterior?
  – P(H1 ∧ H2 | E1, …, El) = α P(E1, …, El | H1 ∧ H2) P(H1 ∧ H2)
    = α P(E1, …, El | H1 ∧ H2) P(H1) P(H2)
    = α ∏j=1…l P(Ej | H1 ∧ H2) P(H1) P(H2)
• How do we compute P(Ej | H1 ∧ H2) ??

Limitations of simple Bayesian inference II
• Assume H1 and H2 are independent, given E1, …, El?
  – P(H1 ∧ H2 | E1, …, El) = P(H1 | E1, …, El) P(H2 | E1, …, El)
• This is a very unreasonable assumption
  – Earthquake and Burglar are independent, but not given Alarm:
    • P(burglar | alarm, earthquake) << P(burglar | alarm)
• Another limitation is that simple application of Bayes' rule doesn't allow us to handle causal chaining:
  – A: this year's weather; B: cotton production; C: next year's cotton price
  – A influences C indirectly: A → B → C
  – P(C | B, A) = P(C | B)
• Need a richer representation to model interacting hypotheses, conditional independence, and causal chaining
• Next time: conditional independence and Bayesian networks!

Bayesian Belief Networks (BNs)
• Definition: BN = (DAG, CPD)
  – DAG: directed acyclic graph (the BN's structure)
    • Nodes: random variables (typically binary or discrete, but methods also exist to handle continuous variables)
    • Arcs: indicate probabilistic dependencies between nodes (lack of a link signifies conditional independence)
  – CPD: conditional probability distribution (the BN's parameters)
    • Conditional probabilities at each node, usually stored as a table (conditional probability table, or CPT): P(xi | πi), where πi is the set of all parent nodes of xi
  – Root nodes are a special case – no parents, so just use priors in the CPD: πi = ∅, so P(xi | πi) = P(xi)

Example BN

[Figure: DAG with arcs a→b, a→c, b→d, c→d, c→e]

P(A) = 0.001
P(B|A) = 0.3      P(B|~A) = 0.001
P(C|A) = 0.2      P(C|~A) = 0.005
P(D|B,C) = 0.1    P(D|B,~C) = 0.01    P(D|~B,C) = 0.01    P(D|~B,~C) = 0.00001
P(E|C) = 0.4      P(E|~C) = 0.002

Note that we only specify P(A) etc., not P(¬A), since they have to add to one
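One way to put this network into code is to store, for each node, its parent list and the P(node = True | parents) entries from the CPTs above; a minimal sketch (the dictionary encoding is my own, not a standard library format):

```python
# Structure of the example BN: each node lists its parents (from the figure).
parents = {"a": (), "b": ("a",), "c": ("a",), "d": ("b", "c"), "e": ("c",)}

# CPTs: map each combination of parent values to P(node = True | parents).
cpt = {
    "a": {(): 0.001},
    "b": {(True,): 0.3,  (False,): 0.001},
    "c": {(True,): 0.2,  (False,): 0.005},
    "d": {(True, True): 0.1, (True, False): 0.01,
          (False, True): 0.01, (False, False): 0.00001},
    "e": {(True,): 0.4,  (False,): 0.002},
}

def p_node(node, value, assignment):
    """P(node = value | parent values taken from `assignment`)."""
    key = tuple(assignment[p] for p in parents[node])
    p_true = cpt[node][key]
    return p_true if value else 1.0 - p_true    # only P(True) is stored, as on the slide

print(p_node("d", True, {"b": True, "c": False}))   # 0.01
```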

Topological semantics
• A node is conditionally independent of its non-descendants given its parents
• A node is conditionally independent of all other nodes in the network given its parents, children, and children's parents (also known as its Markov blanket)
• The method called d-separation can be applied to decide whether a set of nodes X is independent of another set Y, given a third set Z

Independence and chaining
• Independence assumption
  – P(xi | πi, q) = P(xi | πi), where q is any set of variables (nodes) other than xi and its successors
  – πi blocks the influence of other nodes on xi and its successors (q influences xi only through variables in πi)
  – With this assumption, the complete joint probability distribution of all variables in the network can be represented by (recovered from) local CPDs by chaining these CPDs:
    P(x1, …, xn) = ∏i=1…n P(xi | πi)

Chaining: Example

[Figure: the same DAG, with arcs a→b, a→c, b→d, c→d, c→e]

Computing the joint probability for all variables is easy:
P(a, b, c, d, e)
  = P(e | a, b, c, d) P(a, b, c, d)            by the product rule
  = P(e | c) P(a, b, c, d)                     by the independence assumption
  = P(e | c) P(d | a, b, c) P(a, b, c)
  = P(e | c) P(d | b, c) P(a, b, c)
  = P(e | c) P(d | b, c) P(c | a, b) P(a, b)
  = P(e | c) P(d | b, c) P(c | a) P(b | a) P(a)
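A small sketch evaluating this chain-rule product numerically with the CPT values from the example BN (the function and helper names are my own):

```python
from itertools import product

def joint(a, b, c, d, e):
    """P(a,b,c,d,e) = P(e|c) P(d|b,c) P(c|a) P(b|a) P(a), using the example BN's CPTs."""
    def val(p_true, value):                      # turn a stored P(True) into P(value)
        return p_true if value else 1.0 - p_true
    p_d_true = {(True, True): 0.1, (True, False): 0.01,
                (False, True): 0.01, (False, False): 0.00001}[(b, c)]
    return (val(0.4 if c else 0.002, e) *
            val(p_d_true, d) *
            val(0.2 if a else 0.005, c) *
            val(0.3 if a else 0.001, b) *
            val(0.001, a))

print(joint(True, True, True, True, True))       # 0.4 * 0.1 * 0.2 * 0.3 * 0.001 = 2.4e-06

# Sanity check: the chain-rule factorization sums to 1 over all 32 assignments.
assert abs(sum(joint(*vals) for vals in product([True, False], repeat=5)) - 1.0) < 1e-9
```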

Direct inference with BNs
• Now suppose we just want the probability for one variable
• Belief update method
• Original belief (no variables are instantiated): use the prior probability P(xi)
• If xi is a root, then P(xi) is given directly in the BN (the CPT at Xi)
• Otherwise:
  – P(xi) = Σπi P(xi | πi) P(πi)
• In this equation, P(xi | πi) is given in the CPT, but computing P(πi) is complicated

Computing πi: Example

[Figure: the same DAG, with arcs a→b, a→c, b→d, c→d, c→e]

• P(d) = Σb,c P(d | b, c) P(b, c)
• P(b, c) = P(a, b, c) + P(¬a, b, c)                               (marginalizing over a)
          = P(b | a, c) P(a, c) + P(b | ¬a, c) P(¬a, c)            (product rule)
          = P(b | a) P(c | a) P(a) + P(b | ¬a) P(c | ¬a) P(¬a)
• If some variables are instantiated, we can "plug that in" and reduce the amount of marginalization
• Still have to marginalize over all values of uninstantiated parents – not computationally feasible with large networks
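The same marginalization carried out numerically for the example BN (a sketch; the variable names mirror the slide):

```python
# CPT entries from the "Example BN" slide.
P_A = 0.001
P_B = {True: 0.3, False: 0.001}                   # P(b | a), P(b | not a)
P_C = {True: 0.2, False: 0.005}                   # P(c | a), P(c | not a)
P_D = {(True, True): 0.1, (True, False): 0.01,    # P(d | b, c)
       (False, True): 0.01, (False, False): 0.00001}

def p_bc(b, c):
    """P(b, c) = sum over a of P(b | a) P(c | a) P(a); b and c are independent given a."""
    total = 0.0
    for a, p_a in ((True, P_A), (False, 1.0 - P_A)):
        p_b = P_B[a] if b else 1.0 - P_B[a]
        p_c = P_C[a] if c else 1.0 - P_C[a]
        total += p_b * p_c * p_a
    return total

# P(d): marginalize over all values of d's uninstantiated parents b and c.
p_d = sum(P_D[(b, c)] * p_bc(b, c) for b in (True, False) for c in (True, False))
print(p_d)   # roughly 8.0e-05
```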

Representational extensions
• Compactly representing CPTs
  – Noisy-OR
  – Noisy-MAX
• Adding continuous variables
  – Discretization
  – Use density functions (usually mixtures of Gaussians) to build hybrid Bayesian networks (with discrete and continuous variables)

Inference tasks
• Simple queries: compute the posterior marginal P(Xi | E=e)
  – E.g., P(NoGas | Gauge=empty, Lights=on, Starts=false)
• Conjunctive queries:
  – P(Xi, Xj | E=e) = P(Xi | E=e) P(Xj | Xi, E=e)
• Optimal decisions: decision networks include utility information; probabilistic inference is required to find P(outcome | action, evidence)
• Value of information: which evidence should we seek next?
• Sensitivity analysis: which probability values are most critical?
• Explanation: why do I need a new starter motor?

Approaches to inference
• Exact inference
  – Enumeration
  – Variable elimination
  – Clustering / join tree algorithms
• Approximate inference
  – Stochastic simulation / sampling methods
  – Markov chain Monte Carlo methods
  – Genetic algorithms
  – Neural networks
  – Simulated annealing
  – Mean field theory
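As one concrete instance of the sampling methods listed above, here is a rejection-sampling sketch for the query P(c | e) on the example network (the query, sample count, and code organization are my own choices, not from the lecture):

```python
import random

def sample_once():
    """Draw one full sample by sampling each node given its already-sampled parents."""
    a = random.random() < 0.001
    b = random.random() < (0.3 if a else 0.001)
    c = random.random() < (0.2 if a else 0.005)
    d = random.random() < {(True, True): 0.1, (True, False): 0.01,
                           (False, True): 0.01, (False, False): 0.00001}[(b, c)]
    e = random.random() < (0.4 if c else 0.002)
    return a, b, c, d, e

def estimate_p_c_given_e(n=1_000_000):
    """Rejection sampling: keep only samples consistent with the evidence e = True."""
    kept = hits = 0
    for _ in range(n):
        _, _, c, _, e = sample_once()
        if e:
            kept += 1
            hits += c
    return hits / kept if kept else float("nan")

print(estimate_p_c_given_e())   # about 0.51 (the exact answer is roughly 0.511)
```

Note that the evidence e is rare here (P(e) is about 0.004), so almost all samples are rejected; this inefficiency is exactly what likelihood weighting and the Markov chain Monte Carlo methods listed above are designed to avoid.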
