
The University of Texas at Arlington

Lecture-5
Bayesian Networks

CSE 5301 – Data Modeling and Analysis Techniques
Dr. Gergely Záruba

Modeling Complexity and Dependence

• Modeling distributions and inferring probability values in multi-variable
  systems is computationally complex
  • Model size and probabilistic inference are exponential in the number of
    random variables
• Independence relations can reduce this complexity
  • Model size is exponential only in the number of mutually dependent variables
  • Conditional independence limits the number of random variables that have to
    be considered when making an inference

Graphical Models
• Graphical models provide an efficient structure to represent dependencies
in probabilistic systems.
• There are two main types of graphical models for probabilistic systems:
• Bayesian Networks are directed graphical models
• Markov Networks (Markov Random Fields) are undirected graphical models
  (they can model dependencies Bayes nets cannot; they will not be discussed)
• Both types of models can represent different types of dependencies
• Graphical models in probabilistic systems allow the representation
of the interdependencies of random variables
• Structure shows dependency relations
• Inference can use the structure to control the computations
• Graphical models provide a basis for a number of efficient problem
solutions
• Inference of prior and conditional probabilities
• Learning of network structure

BAYESIAN NETWORKS

Bayesian Networks

• Bayesian networks are a graphical representation of conditional
  independence, providing a compact specification of joint probability
  distributions
• Bayesian networks are directed, acyclic graphs G(V, E)
  • Vertices represent random variables: V = {X_1, ..., X_n}
  • Edges represent "direct influences": (X_i, X_j) ∈ E iff X_i directly
    influences X_j
  • Each node is annotated with the conditional probability distribution of
    the node given its parents, P(X_i | Parents(X_i))
• Together, these probabilities represent the joint distribution

A Simple BayesNet Example

(Figure: the example network over the Boolean variables SP, HP, BD, RS, and FL used in the following slides.)
Joint Distribution
• Remember, a Bayesian network should be a compact representation of a
  system with a large number of random variables and some independencies.
• Calculating the joint distribution can be done using:
  P(x_1, ..., x_n) = Π_i P(x_i | Parents(X_i))
• E.g., the probability of:
  P(sp, !hp, bd, !rs, !fl)
  = P(!fl | bd) · P(!rs | bd) · P(bd | sp, !hp) · P(!hp) · P(sp)
  = 0.7 · 0.8 · 0.5 · 0.99 · 0.1 = 0.02772
• If all variables are Boolean, then keeping the full joint probability table
  would require maintaining 2^n values. If we can limit the number of parents
  of each node to at most k, then with a Bayesian network we can reduce that
  to O(n · 2^k)
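A minimal Python sketch of this product-of-CPTs calculation, reusing the example's variables. Only the numbers that appear on the slide (P(sp)=0.1, P(!hp)=0.99, P(bd|sp,!hp)=0.5, P(!rs|bd)=0.8, P(!fl|bd)=0.7) come from the text; every other CPT entry below is an assumed placeholder for illustration:

# Minimal sketch: joint probability of one full assignment in the example network.
P_sp = 0.1                                  # P(SP=true), from the slide
P_hp = 0.01                                 # P(HP=true) = 1 - 0.99
P_bd = {                                    # P(BD=true | SP, HP)
    (True, True): 0.9,                      # assumed
    (True, False): 0.5,                     # from the slide
    (False, True): 0.7,                     # assumed
    (False, False): 0.05,                   # assumed
}
P_rs_given_bd = {True: 0.2, False: 0.01}    # P(RS=true | BD); 0.2 = 1 - 0.8, 0.01 assumed
P_fl_given_bd = {True: 0.3, False: 0.02}    # P(FL=true | BD); 0.3 = 1 - 0.7, 0.02 assumed

def bernoulli(p_true, value):
    """Probability of a Boolean outcome given P(true)."""
    return p_true if value else 1.0 - p_true

def joint(sp, hp, bd, rs, fl):
    """P(sp, hp, bd, rs, fl) as the product of each node's CPT entry."""
    return (bernoulli(P_sp, sp)
            * bernoulli(P_hp, hp)
            * bernoulli(P_bd[(sp, hp)], bd)
            * bernoulli(P_rs_given_bd[bd], rs)
            * bernoulli(P_fl_given_bd[bd], fl))

print(joint(True, False, True, False, False))   # 0.02772, matching the slide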

Conditional Independence

• Bayesian networks are powerful because they capture conditional
  independence between random variables.
• Both of the following statements are true:
  • A node is conditionally independent of all of its non-descendants given
    its parents
  • A node is conditionally independent of all other nodes in the network
    given its parents, its children, and its children's parents. This set is
    what we call the Markov blanket.

Node Ordering

• Note that the node ordering matters. The best node ordering is usually
  "causal":
  • Add the root causes first, then add the variables they influence, from
    top to bottom, until you reach the leaves.
• Fortunately, in most situations this causal relationship is what the
  researcher requires.
• Note that any ordering is possible, but the number of links may grow
  significantly if the ordering is not causal.

Discrete and Continuous Variables

• Obviously, any discrete distribution is representable in a node of a
  Bayesian network.
• An arbitrary continuous distribution cannot be easily represented.
  • One trick is to discretize the distribution, where the precision (the
    size of the bins) can be balanced against the size of the network.
  • Distributions that can be given by a formula and parameters (e.g.,
    exponential, Gaussian) can be used if attention is paid to their meaning.

Child continuous, Parent Discrete

• It is common to change the parameters of the distribution in a continuous
  child node based on the discrete parent node's value.
  • E.g., it is common to use a Gaussian distribution with a fixed variance
    but a mean that is determined by the parent node's outcome.
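A minimal sketch of this kind of conditional distribution; the parent, the means, and the standard deviation below are made-up values chosen only for illustration:

import random

# Hypothetical CPD: a continuous child whose Gaussian mean depends on a
# Boolean parent, while the standard deviation stays fixed.
MEANS = {True: 10.0, False: 2.0}   # assumed means for parent = true / false
SIGMA = 1.5                        # assumed, fixed standard deviation

def sample_child(parent_value):
    """Draw the continuous child given the discrete parent's outcome."""
    return random.gauss(MEANS[parent_value], SIGMA)

print(sample_child(True), sample_child(False))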


Hybrid Example

(Figure: a hybrid network mixing discrete and continuous variables; example taken from RN2003.)

Linear Gaussian Distribution

• A continuous child with a continuous parent can use a linear Gaussian
  distribution: the child is Gaussian with a mean that varies linearly with
  the parent's value and a fixed variance.

Discrete Variable with a Continuous Parent

• It is common to use soft threshold functions:
  • probit: if Φ is the CDF of the standard normal N(0,1), then
    P(Buy=true | Cost=c) = Φ((−c + μ)/σ)
  • sigmoid (logit): P(Buy=true | Cost=c) = 1 / (1 + exp(−2(−c + μ)/σ))
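A minimal sketch of these two soft thresholds in Python; the location MU and softness SIGMA are made-up values, and the logistic form follows the common −2(−c + μ)/σ parameterization as an assumption:

import math

MU, SIGMA = 6.0, 1.0   # assumed threshold location and softness

def p_buy_probit(cost):
    """P(Buy=true | Cost=c) via the standard normal CDF Phi((-c + MU)/SIGMA)."""
    z = (-cost + MU) / SIGMA
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def p_buy_sigmoid(cost):
    """Logit version: a logistic curve with a comparable location and slope."""
    return 1.0 / (1.0 + math.exp(-2.0 * (-cost + MU) / SIGMA))

for c in (4.0, 6.0, 8.0):
    print(c, p_buy_probit(c), p_buy_sigmoid(c))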

So, why did we do all of this?

INFERENCE IN BAYESIAN NETS


Inference

• With a nicely constructed Bayesian network we can make diagnoses and thus
  make well-informed decisions.
• We can fix the value of any one or more of the nodes in the network (to a
  precise value) and see how that changes the probabilities of the
  distributions.
• Thus we can observe evidence variables and see what their impact is on
  some other variables, the query variables. Variables in the network that
  are used neither for evidence nor for query are called hidden variables.
• So, what is the posterior distribution P(X_q | x_e1, ..., x_en)?

Inference by Enumeration

• Conditional probabilities can be computed from the joint probabilities.
• A query can be answered as the normalized sum of joint probabilities, and
  thus as the normalized sum of products of the conditional probabilities
  found in the network.
• Recall: P(X | e) = α P(X, e) = α Σ_y P(X, e, y)

Inference by Enumeration – Example

• What is P(SP | rs, fl)?
• P(SP | rs, fl) = α Σ_hp Σ_bd P(SP, hp, bd, rs, fl)
• This requires summing over 4 hidden-variable assignments, each term a
  product of n conditional probabilities
• The worst-case complexity is O(2^n)
• The example shows "variable elimination", with which the real complexity
  can be reduced. Complexity also depends on the sparsity of the network and
  on which variables are used for evidence, which for query, and which are
  hidden. (A sketch of the enumeration follows.)
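A minimal sketch of this query in Python, summing the joint over the hidden variables HP and BD and then normalizing. It reuses joint() from the earlier sketch, so the CPT entries not given on the slides are still assumed placeholders:

from itertools import product

def enumerate_query(rs, fl):
    """P(SP | RS=rs, FL=fl) by summing the joint over the hidden variables HP, BD."""
    unnormalized = {}
    for sp in (True, False):
        unnormalized[sp] = sum(joint(sp, hp, bd, rs, fl)
                               for hp, bd in product((True, False), repeat=2))
    alpha = 1.0 / sum(unnormalized.values())   # normalization constant
    return {sp: alpha * p for sp, p in unnormalized.items()}

print(enumerate_query(rs=True, fl=True))   # distribution of SP given rs, fl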


Approximate Inference

• So, large, multiply connected Bayesian networks can pose a problem for
  exact inference.
• We can use Monte Carlo methods to determine interesting conditional
  probabilities (i.e., to do inference).
• The four basic methods are:
  • Direct sampling from an empty network (for joint probabilities)
  • Rejection sampling in Bayesian networks (for inference)
  • Likelihood weighting
  • Markov chain Monte Carlo

Sampling from an Empty Network

• Simplest method.
• Forget any evidence you may have for nodes.
• Sample each variable in topological order based on the outcomes of the
  previously sampled variables. Do this many times (let's say M times).
• This results in M N-tuples, e.g.,
  { (x_1^(m), x_2^(m), ..., x_n^(m)) | 1 ≤ m ≤ M }
• Individual and joint probabilities can now be estimated by how many times
  out of the M samples something has happened.
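A minimal sketch of this forward sampling for the example network, reusing the CPT tables (partly assumed) from the earlier sketch:

import random

def sample_bool(p_true):
    """Draw a Boolean outcome with the given probability of being true."""
    return random.random() < p_true

def prior_sample():
    """One forward pass through the network in topological order."""
    sp = sample_bool(P_sp)
    hp = sample_bool(P_hp)
    bd = sample_bool(P_bd[(sp, hp)])
    rs = sample_bool(P_rs_given_bd[bd])
    fl = sample_bool(P_fl_given_bd[bd])
    return sp, hp, bd, rs, fl

M = 100_000
samples = [prior_sample() for _ in range(M)]
# Estimate a joint probability, e.g. P(sp, !hp, bd, !rs, !fl), by its relative frequency.
print(sum(s == (True, False, True, False, False) for s in samples) / M)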


Rejection Sampling in
Bayesian Networks
• Recall: rejection sampling was used to sample from a hard-to-sample
  distribution given an easy one.
• Used in this context to add evidence and thus to determine conditional
  probabilities.
• Having the M N-tuples, we count how many times the evidence happened and,
  out of those times, how many times the query happened (for Boolean
  variables). The conditional probability is estimated by the ratio of these
  two counts.
• The problem is that some evidence combinations may have very low
  probability, so using them as evidence will require huge sets of samples.
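A minimal sketch of this count-and-ratio estimate, reusing prior_sample() from the previous sketch; the query and evidence choice are illustrative:

def rejection_query(rs, fl, m=100_000):
    """Estimate P(SP=true | RS=rs, FL=fl): keep only samples matching the evidence."""
    kept = query_true = 0
    for _ in range(m):
        sp, hp, bd, s_rs, s_fl = prior_sample()
        if (s_rs, s_fl) != (rs, fl):
            continue                      # reject samples that contradict the evidence
        kept += 1
        query_true += sp
    return query_true / kept if kept else float('nan')

print(rejection_query(rs=True, fl=True))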


Likelihood Weighting

• Avoids the previous inefficiency by only generating samples that conform
  to the evidence.
• We sample all variables in order as before. However, when we are about to
  sample an evidence variable, we set the variable (we do not sample it) and
  instead modify a weight value for this n-tuple. For each tuple, the weight
  starts at 1 and gets multiplied by P(e | parent outcomes of E) for each
  evidence variable E.
• Thus each n-tuple has the correct evidence values plus a weight capturing
  the likelihood of such an n-tuple.
• Conditional probabilities are then the sums of the weights for the various
  outcomes of the query variable, normalized over the total of the weights.
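A minimal sketch for the running example, reusing sample_bool(), bernoulli(), and the (partly assumed) CPT tables from the earlier sketches; RS and FL are treated as the evidence variables:

def likelihood_weighted_query(rs, fl, m=100_000):
    """Estimate P(SP=true | RS=rs, FL=fl); evidence is fixed and weighted, not sampled."""
    weight_true = weight_total = 0.0
    for _ in range(m):
        sp = sample_bool(P_sp)                    # non-evidence: sample as usual
        hp = sample_bool(P_hp)
        bd = sample_bool(P_bd[(sp, hp)])
        w = (bernoulli(P_rs_given_bd[bd], rs)     # evidence RS: fix value, multiply weight
             * bernoulli(P_fl_given_bd[bd], fl))  # evidence FL: fix value, multiply weight
        weight_total += w
        if sp:
            weight_true += w
    return weight_true / weight_total

print(likelihood_weighted_query(rs=True, fl=True))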

Markov Chain Monte Carlo

• Think of an n-tuple as the state of a process.
• Evidence variables reduce the size n (as they are fixed and will never
  change).
• Initialize the non-evidence variables randomly.
• The next state is determined by resampling exactly one non-evidence
  variable from its distribution given the current state and its Markov
  blanket:
  P(x_i | mb(X_i)) = α P(x_i | Parents(X_i)) Π_j P(y_j | Parents(Y_j)),
  where the product runs over the children Y_j of X_i.
• The conditional probability is the normalized count of visited states over
  the values of the query variable.
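A minimal Gibbs-style sketch for the running example, reusing joint() from the earlier sketch. Instead of coding the Markov-blanket formula directly, it normalizes the full joint over the two values of the flipped variable, which yields the same conditional because the factors not involving that variable cancel:

import random

def gibbs_query(rs, fl, m=100_000):
    """Estimate P(SP=true | RS=rs, FL=fl) by sampling over the non-evidence variables."""
    state = {'sp': True, 'hp': False, 'bd': True}     # arbitrary initial assignment
    count_sp_true = 0
    for _ in range(m):
        var = random.choice(['sp', 'hp', 'bd'])       # pick one non-evidence variable
        weights = {}
        for value in (True, False):                   # full conditional of `var`
            state[var] = value
            weights[value] = joint(state['sp'], state['hp'], state['bd'], rs, fl)
        p_true = weights[True] / (weights[True] + weights[False])
        state[var] = random.random() < p_true         # resample `var`
        count_sp_true += state['sp']                  # tally the query variable's value
    return count_sp_true / m

print(gibbs_query(rs=True, fl=True))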

References

• [RN2003] S. Russell, P. Norvig, "Artificial Intelligence: A Modern
  Approach," Second Edition, Prentice Hall, 2003 (Chapter 14)
• E. Charniak, "Bayesian Networks Without Tears," AI Magazine, 12(4), 1991,
  http://www.aaai.org/ojs/index.php/aimagazine/article/view/918


