
Bayesian Networks

Chapter 2 (Duda et al.), Section 2.11

CS479/679 Pattern Recognition
Dr. George Bebis

Statistical Dependencies Between Variables

Many times, the only knowledge we have about a distribution is which variables are or are not dependent.
Such dependencies can be represented efficiently using a Bayesian Network (also called a Belief Network).

Example of Dependencies

State of an automobile:
Engine temperature
Brake fluid pressure
Tire air pressure
Wire voltages

Causally related variables:
Engine temperature
Coolant temperature

NOT causally related variables:
Engine oil pressure
Tire air pressure

Bayesian Net Applications

Microsoft (Answer Wizard, Print Troubleshooter)
US Army: SAIP (battalion detection from SAR, IR, etc.)
NASA: Vista (DSS for the Space Shuttle)
GE: Gems (real-time monitoring of utility generators)

Definitions and Notation

A Bayesian net is usually a Directed Acyclic Graph (DAG).
Each node represents a system variable.
Each variable assumes certain states (i.e., values).

Relationships Between Nodes

A link joining two nodes is directional and represents a causal influence (e.g., X depends on A, or A influences X).
Influences could be direct or indirect (e.g., A influences X directly, and A influences C indirectly through X).

Parent/Children Nodes

Parent nodes P of X: the nodes directly before X (connected to X).
Children nodes C of X: the nodes directly after X (X is connected to them).

Prior / Conditional Probabilities

Each variable is associated with prior or conditional probabilities (discrete or continuous).
(In each table, the probabilities sum to 1.)

Markov Property

Each node is conditionally independent of its ancestors given its parents.
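Written compactly (with π_i denoting the parents of node x_i):

P(x_i / ancestors(x_i)) = P(x_i / π_i)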

Computing Joint Probabilities

Using the chain rule, the joint probability of a set of variables x1, x2, ..., xn is given as:

p(x_1, x_2, \ldots, x_n) = p(x_1 / x_2, \ldots, x_n)\, p(x_2 / x_3, \ldots, x_n) \cdots p(x_{n-1} / x_n)\, p(x_n)

Using the Markov property (i.e., node x_i is conditionally independent of its ancestors given its parents π_i), we have:

p(x_1, x_2, \ldots, x_n) = \prod_{i=1}^{n} p(x_i / \pi_i)

much simpler!
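A minimal Python sketch of this factorization: the joint probability of one complete assignment is the product of the local conditional probabilities p(x_i / π_i). The structure mirrors the A, B → X → C, D example used in these slides, but the CPT numbers are made-up placeholders.

# Joint probability of one full assignment as a product of local conditionals.
# Structure follows the slides' A, B -> X -> C, D example; CPT values are illustrative only.
parents = {"A": [], "B": [], "X": ["A", "B"], "C": ["X"], "D": ["X"]}

cpt = {
    "A": {("a1", ()): 0.7, ("a2", ()): 0.3},
    "B": {("b1", ()): 0.4, ("b2", ()): 0.6},
    "X": {("x1", ("a1", "b1")): 0.9, ("x2", ("a1", "b1")): 0.1,
          ("x1", ("a1", "b2")): 0.5, ("x2", ("a1", "b2")): 0.5,
          ("x1", ("a2", "b1")): 0.3, ("x2", ("a2", "b1")): 0.7,
          ("x1", ("a2", "b2")): 0.2, ("x2", ("a2", "b2")): 0.8},
    "C": {("c1", ("x1",)): 0.6, ("c2", ("x1",)): 0.4,
          ("c1", ("x2",)): 0.2, ("c2", ("x2",)): 0.8},
    "D": {("d1", ("x1",)): 0.3, ("d2", ("x1",)): 0.7,
          ("d1", ("x2",)): 0.6, ("d2", ("x2",)): 0.4},
}

def joint_probability(assignment):
    """p(x1, ..., xn) = product over all nodes of p(x_i / parents of x_i)."""
    p = 1.0
    for node, value in assignment.items():
        parent_values = tuple(assignment[q] for q in parents[node])
        p *= cpt[node][(value, parent_values)]
    return p

print(joint_probability({"A": "a1", "B": "b2", "X": "x1", "C": "c1", "D": "d2"}))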

Computing Joint Probabilities (contd)

We can compute the probability of any configuration of variables in the joint density, e.g.:

P(a3, b1, x2, c3, d2) = P(a3) P(b1) P(x2 / a3, b1) P(c3 / x2) P(d2 / x2) = 0.25 × 0.6 × 0.4 × 0.5 × 0.4 = 0.012
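As a quick check of the arithmetic above, multiplying exactly the factor values quoted on the slide:

# P(a3) * P(b1) * P(x2 / a3, b1) * P(c3 / x2) * P(d2 / x2), values as quoted on the slide
factors = [0.25, 0.6, 0.4, 0.5, 0.4]
p = 1.0
for f in factors:
    p *= f
print(p)  # 0.012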

Computing the Probability at a Node

e.g., determine the probability at D
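The slide's figure is omitted here, but for the A, B → X → C, D structure of the running example, the probability at D is obtained by summing the factored joint over the remaining variables:

% Marginal at D: sum the joint over A, B, X (the C terms sum to 1 and drop out)
P(d_j) = \sum_{i} P(d_j / x_i)\, P(x_i)
       = \sum_{i} P(d_j / x_i) \sum_{k,l} P(x_i / a_k, b_l)\, P(a_k)\, P(b_l)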

Fundamental Problems in Bayesian Nets

Evaluation (inference): given the model and the values of the observed variables (evidence), estimate the values of some other nodes (typically hidden nodes).
Learning: given training data and prior information (e.g., expert knowledge, causal relationships), estimate the network structure, or the parameters of the distribution, or both.

Example: Medical Diagnosis

Uppermost nodes: biological agents (bacteria, virus) (the causes)
Intermediate nodes: diseases
Lowermost nodes: symptoms (the effects)

Given some evidence (biological agents, symptoms), find the most likely disease.

Evaluation (Inference) Problem

In general, if X denotes the query variables and e denotes the evidence, then

P(X / e) = \frac{P(X, e)}{P(e)} = \alpha\, P(X, e)

where α = 1/P(e) is a constant of proportionality.
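A minimal sketch of this computation for a small discrete network: enumerate all values of the hidden variables, sum the joint probabilities, and normalize by α = 1/P(e). The three-node network and its numbers are hypothetical, not taken from the slides.

from itertools import product

# Tiny hypothetical net: A -> X -> C (structure and CPT values are illustrative only).
parents = {"A": [], "X": ["A"], "C": ["X"]}
values  = {"A": ["a1", "a2"], "X": ["x1", "x2"], "C": ["c1", "c2"]}
cpt = {
    "A": {("a1", ()): 0.3, ("a2", ()): 0.7},
    "X": {("x1", ("a1",)): 0.8, ("x2", ("a1",)): 0.2,
          ("x1", ("a2",)): 0.4, ("x2", ("a2",)): 0.6},
    "C": {("c1", ("x1",)): 0.9, ("c2", ("x1",)): 0.1,
          ("c1", ("x2",)): 0.5, ("c2", ("x2",)): 0.5},
}

def joint(assignment):
    p = 1.0
    for node, val in assignment.items():
        pv = tuple(assignment[q] for q in parents[node])
        p *= cpt[node][(val, pv)]
    return p

def posterior(query_node, evidence):
    """P(query_node / evidence): sum the joint over the hidden variables, then normalize."""
    hidden = [n for n in parents if n != query_node and n not in evidence]
    scores = {}
    for qval in values[query_node]:
        total = 0.0
        for combo in product(*(values[h] for h in hidden)):
            assignment = {**evidence, query_node: qval, **dict(zip(hidden, combo))}
            total += joint(assignment)        # P(query value, evidence, hidden values)
        scores[qval] = total                  # proportional to P(query value / evidence)
    alpha = 1.0 / sum(scores.values())        # alpha = 1 / P(evidence)
    return {q: alpha * s for q, s in scores.items()}

print(posterior("X", {"C": "c1"}))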

Evaluation (Inference) Problem (contd)

Exact inference is an NP-hard problem because the number of terms in the summations (or integrals, for continuous variables) grows exponentially with the number of variables.
For some restricted classes of networks (e.g., singly connected networks, where there is no more than one path between any two nodes), exact inference can be carried out efficiently, in time linear in the number of nodes.

Evaluation (Inference) Problem (contd)

For singly connected Bayesian networks:

P(X / e) = P(X / e_C, e_P) = \alpha\, P(X / e_P)\, P(e_C / X)

e_C: evidence from children nodes, e_P: evidence from parent nodes

However, approximate inference methods have to be used in most cases:
Sampling (Monte Carlo) methods
Variational methods
Loopy belief propagation
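To illustrate the first of these, a minimal rejection-sampling sketch: draw complete samples from the prior in topological order and keep only those consistent with the evidence. Again, the tiny network and its numbers are hypothetical placeholders.

import random

# Rejection sampling for P(X / C = c1) on a tiny hypothetical net A -> X -> C.
parents = {"A": [], "X": ["A"], "C": ["X"]}
order   = ["A", "X", "C"]                                  # topological order
values  = {"A": ["a1", "a2"], "X": ["x1", "x2"], "C": ["c1", "c2"]}
cpt = {
    "A": {(): [0.3, 0.7]},                                 # P(A)
    "X": {("a1",): [0.8, 0.2], ("a2",): [0.4, 0.6]},       # P(X / A)
    "C": {("x1",): [0.9, 0.1], ("x2",): [0.5, 0.5]},       # P(C / X)
}

def sample_once():
    """Draw one complete assignment, sampling each node given its already-sampled parents."""
    s = {}
    for node in order:
        pv = tuple(s[q] for q in parents[node])
        s[node] = random.choices(values[node], weights=cpt[node][pv])[0]
    return s

def rejection_sampling(query, evidence, n=100_000):
    counts = {v: 0 for v in values[query]}
    for _ in range(n):
        s = sample_once()
        if all(s[k] == v for k, v in evidence.items()):    # keep only samples matching the evidence
            counts[s[query]] += 1
    total = sum(counts.values())
    return {v: c / total for v, c in counts.items()}

print(rejection_sampling("X", {"C": "c1"}))                # approximates P(X / C = c1)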

Example

Classify a fish given that the fish is light (c1) and was caught in the south Atlantic (b2); there is no evidence about the time of year the fish was caught or about its thickness.
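With A and B as parents of X, and C and D as children of X (the structure of the fish network), the query expands as:

% Sum out the time of year A and the thickness D (the D terms sum to 1 and drop out)
P(x_i / c_1, b_2) = \alpha \sum_{a} \sum_{d} P(a)\, P(b_2)\, P(x_i / a, b_2)\, P(c_1 / x_i)\, P(d / x_i)
                  = \alpha\, P(b_2)\, P(c_1 / x_i) \sum_{a} P(a)\, P(x_i / a, b_2)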

Example (contd)

P(X / e) = \frac{P(X, e)}{P(e)} = \alpha\, P(X, e)

Example (contd)

Example (contd)

Similarly, P(x2 / c1, b2) = α · 0.066
Normalize the probabilities (not strictly necessary):
P(x1 / c1, b2) + P(x2 / c1, b2) = 1, so α = 1/(0.18 + 0.066)
P(x1 / c1, b2) = 0.73 (salmon)
P(x2 / c1, b2) = 0.27
The fish is most likely a salmon.

Another Example

You have a new burglar alarm installed at home.
It is fairly reliable at detecting a burglary, but it also sometimes responds to minor earthquakes.
You have two neighbors, Ali and Veli, who promised to call you at work when they hear the alarm.

Another Example (contd)

Ali always calls when he hears the alarm, but sometimes confuses the telephone ringing with the alarm and calls then, too.
Veli likes loud music and sometimes misses the alarm.

Another Example (contd)

Design a Bayesian network to estimate various probabilities.
E.g., given the evidence of who has or has not called, we would like to estimate the probability of a burglary.

Another Example (contd)

What are the main variables?
Alarm
Causes: Burglary, Earthquake
Effects: Ali calls, Veli calls

Another Example (contd)

What are the conditional dependencies among them?
Burglary (B) and earthquake (E) directly affect the probability of the alarm (A) going off.
Whether or not Ali calls (AC) or Veli calls (VC) depends only on the alarm.
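One possible way to encode this structure in code: the graph matches the dependencies just listed, but the probability values below are illustrative placeholders rather than the CPTs shown on the slides.

# Alarm network structure: B and E are parents of A; AC and VC each depend only on A.
# All probability values are placeholders, not the CPTs from the slides.
P_B = 0.001                                   # P(Burglary)
P_E = 0.002                                   # P(Earthquake)

# P(Alarm = true / Burglary, Earthquake)
P_A = {(True, True): 0.95, (True, False): 0.94,
       (False, True): 0.29, (False, False): 0.001}

# P(AliCalls = true / Alarm) and P(VeliCalls = true / Alarm)
P_AC = {True: 0.90, False: 0.05}
P_VC = {True: 0.70, False: 0.01}

def joint(b, e, a, ac, vc):
    """P(B, E, A, AC, VC) = P(B) P(E) P(A / B, E) P(AC / A) P(VC / A)."""
    p  = P_B if b else 1 - P_B
    p *= P_E if e else 1 - P_E
    p *= P_A[(b, e)] if a else 1 - P_A[(b, e)]
    p *= P_AC[a] if ac else 1 - P_AC[a]
    p *= P_VC[a] if vc else 1 - P_VC[a]
    return p

# Example query from the next slide: alarm sounded, no burglary, no earthquake, both call.
print(joint(b=False, e=False, a=True, ac=True, vc=True))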

Another Example (contd)

Another Example (contd)

What is the probability that the alarm has sounded but neither a burglary nor an earthquake has occurred, and both Ali and Veli call?
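Because each node is conditioned only on its parents, this is a single product of local probabilities (writing b, e, a, ac, vc for the corresponding events):

P(ac, vc, a, \neg b, \neg e) = P(\neg b)\, P(\neg e)\, P(a / \neg b, \neg e)\, P(ac / a)\, P(vc / a)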

Another Example (contd)

What is the probability that there is a burglary given that Ali calls?
What about if Veli also calls right after Ali hangs up?
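Both questions are inference queries: sum the joint over the unobserved variables and normalize. For the first one, for example:

% Sum out Earthquake, Alarm, and VeliCalls (the VC terms sum to 1 and drop out)
P(B / ac) = \alpha \sum_{e} \sum_{a} \sum_{vc} P(B)\, P(e)\, P(a / B, e)\, P(ac / a)\, P(vc / a)
          = \alpha\, P(B) \sum_{e} P(e) \sum_{a} P(a / B, e)\, P(ac / a)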

Naïve Bayesian Network

When dependency relationships among features are unknown, we can assume that features are conditionally independent given the class:
Naïve Bayesian Network

A simple assumption, but it usually works well in practice!
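A minimal categorical naive Bayes classifier sketch along these lines; the feature names, training data, and smoothing choice are all hypothetical.

from collections import Counter, defaultdict

# Minimal categorical naive Bayes: score(class) = P(class) * product over i of P(x_i / class).
# Feature names, data, and the simple add-one style smoothing are illustrative only.
def train(samples):
    """samples: list of (feature_dict, class_label) pairs."""
    class_counts = Counter(label for _, label in samples)
    feat_counts = defaultdict(Counter)            # (class, feature name) -> Counter over values
    for feats, label in samples:
        for name, value in feats.items():
            feat_counts[(label, name)][value] += 1
    return class_counts, feat_counts

def predict(feats, class_counts, feat_counts):
    total = sum(class_counts.values())
    scores = {}
    for label, count in class_counts.items():
        score = count / total                     # prior P(class)
        for name, value in feats.items():         # likelihoods P(x_i / class), smoothed
            c = feat_counts[(label, name)]
            score *= (c[value] + 1) / (sum(c.values()) + len(c) + 1)
        scores[label] = score
    return max(scores, key=scores.get)            # MAP class (normalization not needed)

data = [({"lightness": "light", "locale": "south"}, "salmon"),
        ({"lightness": "dark",  "locale": "north"}, "sea bass"),
        ({"lightness": "light", "locale": "north"}, "salmon"),
        ({"lightness": "dark",  "locale": "south"}, "sea bass")]
model = train(data)
print(predict({"lightness": "light", "locale": "south"}, *model))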
