
Class #19 – Monday, November 4


Today’s class

• Probability theory
• Bayesian inference
  – From the joint distribution
  – Using independence/factoring
  – From sources of evidence
• Bayesian networks
  – Network structure
  – Conditional probability tables
  – Conditional independence
  – Inference in Bayesian networks

Bayesian Reasoning / Bayesian Networks

Chapters 14.1–15.2

Why probabilities anyway?

• Kolmogorov showed that three simple axioms lead to the rules of probability theory
  – De Finetti, Cox, and Carnap have also provided compelling arguments for these axioms
1. All probabilities are between 0 and 1: 0 <= P(a) <= 1
2. Valid propositions (tautologies) have probability 1, and unsatisfiable propositions have probability 0: P(true) = 1, P(false) = 0
3. The probability of a disjunction is given by: P(a ∨ b) = P(a) + P(b) – P(a ∧ b)

Inference from the joint: Example

                    alarm                     ¬alarm
             earthquake  ¬earthquake   earthquake  ¬earthquake
  burglary      .001        .008          .0001       .0009
  ¬burglary     .01         .09           .01         .79

P(Burglary | alarm) = α P(Burglary, alarm)
  = α [P(Burglary, alarm, earthquake) + P(Burglary, alarm, ¬earthquake)]
  = α [(.001) + (.008)]
P(¬Burglary | alarm) = α [(.01) + (.09)]

Since P(burglary | alarm) + P(¬burglary | alarm) = 1,
  α = 1/(.009 + .1) = 9.173
  (i.e., P(alarm) = 1/α = .109 – quiz: how can you verify this?)

P(burglary | alarm) = .009 * 9.173 = .08255
P(¬burglary | alarm) = .1 * 9.173 = .9173
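The query above can be sketched as a brute-force sum over the joint table. This is a minimal illustration, not part of the slides; the alarm-column entries are the ones used in the computation above, and the ¬alarm entries (which do not affect either result printed here) follow the reconstructed table.

```python
# Sketch: inference from the full joint distribution for the
# (burglary, earthquake, alarm) example above.
joint = {
    # (burglary, earthquake, alarm): probability
    (True,  True,  True):  0.001,
    (True,  False, True):  0.008,
    (False, True,  True):  0.01,
    (False, False, True):  0.09,
    (True,  True,  False): 0.0001,
    (True,  False, False): 0.0009,
    (False, True,  False): 0.01,
    (False, False, False): 0.79,
}

def prob(predicate):
    """Sum the joint entries (b, e, a) that satisfy the predicate."""
    return sum(p for (b, e, a), p in joint.items() if predicate(b, e, a))

p_alarm = prob(lambda b, e, a: a)                      # P(alarm) = .109
p_b_given_alarm = prob(lambda b, e, a: b and a) / p_alarm
print(round(p_alarm, 3), round(p_b_given_alarm, 4))    # 0.109 0.0826
```

Dividing by P(alarm) plays the role of the normalizing constant α on the slide.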

Independence

• When two sets of propositions do not affect each other’s probabilities, we call them independent, and can easily compute their joint and conditional probabilities:
  – Independent(A, B) → P(A ∧ B) = P(A) P(B), P(A | B) = P(A)
• For example, {moon-phase, light-level} might be independent of {burglary, alarm, earthquake}
  – Then again, it might not: burglars might be more likely to burglarize houses when there’s a new moon (and hence little light)
  – But if we know the light level, the moon phase doesn’t affect whether we are burglarized
  – Once we’re burglarized, light level doesn’t affect whether the alarm goes off
• We need a more complex notion of independence, and methods for reasoning about these kinds of relationships

Conditional independence

• Absolute independence:
  – A and B are independent if P(A ∧ B) = P(A) P(B); equivalently, P(A) = P(A | B) and P(B) = P(B | A)
• A and B are conditionally independent given C if
  – P(A ∧ B | C) = P(A | C) P(B | C)
• This lets us decompose the joint distribution:
  – P(A ∧ B ∧ C) = P(A | C) P(B | C) P(C)
• Moon-Phase and Burglary are conditionally independent given Light-Level
• Conditional independence is weaker than absolute independence, but still useful in decomposing the full joint probability distribution

Bayes’ rule

• Bayes’ rule is derived from the product rule:
  – P(Y | X) = P(X | Y) P(Y) / P(X)
• Often useful for diagnosis:
  – If X are (observed) effects and Y are (hidden) causes,
  – We may have a model for how causes lead to effects (P(X | Y))
  – We may also have prior beliefs (based on experience) about the frequency of occurrence of causes (P(Y))
  – Which allows us to reason abductively from effects to causes (P(Y | X))

Bayesian inference

• In the setting of diagnostic/evidential reasoning:
  – Hypotheses H1, …, Hn; evidence/manifestations E1, …, Em
  – Know the prior probability of each hypothesis, P(Hi), and the conditional probabilities P(Ej | Hi)
  – Want to compute the posterior probability P(Hi | Ej)
• Bayes’ theorem (formula 1): P(Hi | Ej) = P(Hi) P(Ej | Hi) / P(Ej)

Simple Bayesian diagnostic reasoning

• Knowledge base:
  – Evidence / manifestations: E1, …, Em
  – Hypotheses / disorders: H1, …, Hn
    • Ej and Hi are binary; hypotheses are mutually exclusive (nonoverlapping) and exhaustive (cover all possible cases)
  – Conditional probabilities: P(Ej | Hi), i = 1, …, n; j = 1, …, m
• Cases (evidence for a particular instance): E1, …, El
• Goal: Find the hypothesis Hi with the highest posterior
  – Maxi P(Hi | E1, …, El)

Bayesian diagnostic reasoning II

• Bayes’ rule says that
  – P(Hi | E1, …, El) = P(E1, …, El | Hi) P(Hi) / P(E1, …, El)
• Assume each piece of evidence Ej is conditionally independent of the others, given a hypothesis Hi; then:
  – P(E1, …, El | Hi) = ∏j=1..l P(Ej | Hi)
• If we only care about relative probabilities for the Hi, then we have:
  – P(Hi | E1, …, El) = α P(Hi) ∏j=1..l P(Ej | Hi)
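The formula above can be sketched directly: posterior ∝ prior × product of per-evidence likelihoods, normalized over the hypotheses. The hypothesis names and all numbers below are made-up illustration values, not from the slides.

```python
# Minimal sketch of simple Bayesian diagnostic reasoning:
#   P(Hi | E1..El) = alpha * P(Hi) * prod_j P(Ej | Hi)
from math import prod

priors = {"flu": 0.1, "cold": 0.3, "healthy": 0.6}          # P(Hi)
likelihood = {                                              # P(Ej | Hi)
    "fever": {"flu": 0.9, "cold": 0.2, "healthy": 0.01},
    "cough": {"flu": 0.7, "cold": 0.8, "healthy": 0.05},
}

def posterior(evidence):
    """Return P(Hi | evidence) for every hypothesis Hi."""
    unnorm = {h: p * prod(likelihood[e][h] for e in evidence)
              for h, p in priors.items()}
    alpha = 1.0 / sum(unnorm.values())      # normalizing constant
    return {h: alpha * u for h, u in unnorm.items()}

post = posterior(["fever", "cough"])
best = max(post, key=post.get)              # hypothesis with highest posterior
```

Note that α never needs P(E1, …, El) explicitly: normalizing over the mutually exclusive, exhaustive hypotheses recovers it.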

Limitations of simple Bayesian inference

• Cannot easily handle multi-fault situations, nor cases where intermediate (hidden) causes exist:
  – Disease D causes syndrome S, which causes correlated manifestations M1 and M2
• Consider a composite hypothesis H1 ∧ H2, where H1 and H2 are independent. What is the relative posterior?
  – P(H1 ∧ H2 | E1, …, El) = α P(E1, …, El | H1 ∧ H2) P(H1 ∧ H2)
    = α P(E1, …, El | H1 ∧ H2) P(H1) P(H2)
    = α ∏j=1..l P(Ej | H1 ∧ H2) P(H1) P(H2)
• How do we compute P(Ej | H1 ∧ H2)??

Limitations of simple Bayesian inference II

• Assume H1 and H2 are independent, given E1, …, El?
  – P(H1 ∧ H2 | E1, …, El) = P(H1 | E1, …, El) P(H2 | E1, …, El)
• This is a very unreasonable assumption
  – Earthquake and Burglar are independent, but not given Alarm:
    • P(burglar | alarm, earthquake) << P(burglar | alarm)
• Another limitation is that simple application of Bayes’ rule doesn’t allow us to handle causal chaining:
  – A: this year’s weather; B: cotton production; C: next year’s cotton price
  – A influences C indirectly: A → B → C
  – P(C | B, A) = P(C | B)
• Need a richer representation to model interacting hypotheses, conditional independence, and causal chaining
• Next time: conditional independence and Bayesian networks!

Bayesian Belief Networks (BNs)

• Definition: BN = (DAG, CPD)
  – DAG: directed acyclic graph (BN’s structure)
    • Nodes: random variables (typically binary or discrete, but methods also exist to handle continuous variables)
    • Arcs: indicate probabilistic dependencies between nodes (lack of a link signifies conditional independence)
  – CPD: conditional probability distribution (BN’s parameters)
    • Conditional probabilities at each node, usually stored as a table (conditional probability table, or CPT): P(xi | πi), where πi is the set of all parent nodes of xi
  – Root nodes are a special case – no parents, so just use priors in the CPD: πi = ∅, so P(xi | πi) = P(xi)

Example BN

Structure: a → b, a → c; b, c → d; c → e

P(A) = 0.001
P(B|A) = 0.3      P(B|~A) = 0.001
P(C|A) = 0.2      P(C|~A) = 0.005
P(D|B,C) = 0.1    P(D|B,~C) = 0.01
P(D|~B,C) = 0.01  P(D|~B,~C) = 0.00001
P(E|C) = 0.4      P(E|~C) = 0.002

Note that we only specify P(A) etc., not P(¬A), since they have to add to one

Topological semantics

• A node is conditionally independent of its nondescendants given its parents
• A node is conditionally independent of all other nodes in the network given its parents, children, and children’s parents (also known as its Markov blanket)
• The method called d-separation can be applied to decide whether a set of nodes X is independent of another set Y, given a third set Z

Independence and chaining

• Independence assumption
  – P(xi | πi, q) = P(xi | πi), where q is any set of variables (nodes) other than xi and its successors
  – πi blocks the influence of other nodes on xi and its successors (q influences xi only through variables in πi)
• With this assumption, the complete joint probability distribution of all variables in the network can be represented by (recovered from) the local CPDs by chaining them:
  – P(x1, …, xn) = ∏i=1..n P(xi | πi)

Chaining: Example

For the network a → b, a → c; b, c → d; c → e, computing the joint probability for all variables is easy:

P(a, b, c, d, e)
  = P(e | a, b, c, d) P(a, b, c, d)              (by the product rule)
  = P(e | c) P(a, b, c, d)                       (by the independence assumption)
  = P(e | c) P(d | a, b, c) P(a, b, c)
  = P(e | c) P(d | b, c) P(a, b, c)
  = P(e | c) P(d | b, c) P(c | a, b) P(a, b)
  = P(e | c) P(d | b, c) P(c | a) P(b | a) P(a)
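The chaining above can be sketched in a few lines, using the CPT numbers from the example BN earlier (this is an illustration of the chain rule, not part of the slides):

```python
# Sketch: recover full-joint entries by chaining the CPTs of the
# example network a -> {b, c}, {b, c} -> d, c -> e.
from itertools import product

P_a = 0.001
P_b = {True: 0.3, False: 0.001}                # P(b | a)
P_c = {True: 0.2, False: 0.005}                # P(c | a)
P_d = {(True, True): 0.1, (True, False): 0.01,
       (False, True): 0.01, (False, False): 0.00001}   # P(d | b, c)
P_e = {True: 0.4, False: 0.002}                # P(e | c)

def joint(a, b, c, d, e):
    """P(a,b,c,d,e) = P(e|c) P(d|b,c) P(c|a) P(b|a) P(a)."""
    def f(p, x):                               # P(X=x) from P(X=true)
        return p if x else 1.0 - p
    return (f(P_e[c], e) * f(P_d[(b, c)], d) *
            f(P_c[a], c) * f(P_b[a], b) * f(P_a, a))

# Sanity check: a chained joint must sum to 1 over all 2^5 assignments.
total = sum(joint(*v) for v in product([True, False], repeat=5))
```

The five local tables here hold 10 numbers, versus 31 independent entries for the explicit joint over five binary variables.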

Direct inference with BNs

• Now suppose we just want the probability for one variable xi
• Belief update method
• Original belief (no variables are instantiated): use the prior probability P(xi)
• If xi is a root, then P(xi) is given directly in the BN (CPT at Xi)
• Otherwise,
  – P(xi) = Σπi P(xi | πi) P(πi)
• In this equation, P(xi | πi) is given in the CPT, but computing P(πi) is complicated

Computing πi: Example

P(d) = Σb,c P(d | b, c) P(b, c)
P(b, c) = Σa P(a, b, c) = P(a, b, c) + P(¬a, b, c)              (marginalizing)
        = P(b | a) P(c | a) P(a) + P(b | ¬a) P(c | ¬a) P(¬a)    (product rule)

• If some variables are instantiated, we can “plug that in” and reduce the amount of marginalization
• Still have to marginalize over all values of uninstantiated parents – not computationally feasible with large networks
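The marginalization above can be sketched by brute-force enumeration over the uninstantiated variables, again using the example network's CPT numbers (an illustration, not the slides' method; e marginalizes out of P(d) entirely):

```python
# Sketch: compute the marginal P(d) by summing the chained joint
# over all values of a, b, c.
from itertools import product

P_a = 0.001
P_b = {True: 0.3, False: 0.001}                 # P(b | a)
P_c = {True: 0.2, False: 0.005}                 # P(c | a)
P_d = {(True, True): 0.1, (True, False): 0.01,
       (False, True): 0.01, (False, False): 0.00001}   # P(d | b, c)

def f(p, x):                                    # P(X=x) from P(X=true)
    return p if x else 1.0 - p

# P(d) = sum_{a,b,c} P(d|b,c) P(c|a) P(b|a) P(a)
p_d = sum(P_d[(b, c)] * f(P_c[a], c) * f(P_b[a], b) * f(P_a, a)
          for a, b, c in product([True, False], repeat=3))
```

With n binary ancestors this sum has 2^n terms, which is exactly why enumeration does not scale to large networks.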

Representational extensions

• Compactly representing CPTs
  – Noisy-OR
  – Noisy-MAX
• Adding continuous variables
  – Discretization
  – Use density functions (usually mixtures of Gaussians) to build hybrid Bayesian networks (with discrete and continuous variables)

Inference tasks

• Simple queries: Compute the posterior marginal P(Xi | E=e)
  – E.g., P(NoGas | Gauge=empty, Lights=on, Starts=false)
• Conjunctive queries:
  – P(Xi, Xj | E=e) = P(Xi | E=e) P(Xj | Xi, E=e)
• Optimal decisions: Decision networks include utility information; probabilistic inference is required to find P(outcome | action, evidence)
• Value of information: Which evidence should we seek next?
• Sensitivity analysis: Which probability values are most critical?
• Explanation: Why do I need a new starter motor?

Approaches to inference

• Exact inference
  – Enumeration
  – Variable elimination
  – Clustering / join tree algorithms
• Approximate inference
  – Stochastic simulation / sampling methods
  – Markov chain Monte Carlo methods
  – Genetic algorithms
  – Neural networks
  – Simulated annealing
  – Mean field theory

