Monte Carlo Artificial Intelligence:
Bayesian Networks
Dr. A. Obulesh
Associate Professor
Why This Matters
Bayesian networks have been one of the most important contributions to the field of AI in the last ten years.
They provide a way to represent knowledge in an uncertain domain and a way to reason about this knowledge.
Many applications: medicine, factories, help desks, spam filtering, etc.
Medical diagnostics
Digital image processing
Natural language processing
Probability and Conditional Probability
Example: the world is not discrete; multiple things happen simultaneously.
What needs to happen for A and B to happen?
First A needs to happen; then, with A already having happened, B needs to happen:
P(A ∧ B) = P(A) · P(B | A) = P(B) · P(A | B)
Here P(B | A) is read as P(one event | another event).
Bayes’ Theorem
P(A | B) = P(B | A) · P(A) / P(B)
Equivalently, P(B | A) = P(A | B) · P(B) / P(A)
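A minimal numeric sketch of Bayes’ theorem in Python; the disease-testing numbers are illustrative assumptions, not from the slides:

    # A = "has condition", B = "test is positive" (illustrative numbers).
    p_a = 0.01              # prior P(A)
    p_b_given_a = 0.95      # likelihood P(B|A), true-positive rate
    p_b_given_not_a = 0.05  # false-positive rate P(B|not A)

    # Total probability: P(B) = P(B|A)P(A) + P(B|~A)P(~A)
    p_b = p_b_given_a * p_a + p_b_given_not_a * (1 - p_a)

    # Bayes' theorem: P(A|B) = P(B|A) * P(A) / P(B)
    p_a_given_b = p_b_given_a * p_a / p_b
    print(round(p_a_given_b, 4))  # ~0.161: a positive test still leaves A unlikely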
Joint Probability
A joint probability distribution describes how two or more variables are distributed simultaneously.
To get a probability from the joint distribution of A and B, you would consider P(A = a and B = b).
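As a minimal sketch (the variables and probabilities below are assumed for illustration), a discrete joint distribution can be stored as a table keyed by value combinations:

    # Joint distribution P(A, B) for two binary variables.
    # The numbers are illustrative assumptions and sum to 1.
    joint = {
        (True, True): 0.08,
        (True, False): 0.12,
        (False, True): 0.32,
        (False, False): 0.48,
    }

    # Reading off P(A = true and B = false):
    print(joint[(True, False)])  # 0.12

    # Marginalizing: P(A = true) = sum over b of P(A = true, B = b)
    p_a_true = sum(p for (a, b), p in joint.items() if a)
    print(p_a_true)  # 0.2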
Conditional Probability
P(A | B) = P(A ∧ B) / P(B)
A Bayesian Network
A Bayesian network is made up of two parts:
1. A directed acyclic graph
2. A set of parameters

Burglary → Alarm ← Earthquake

B      P(B)        E      P(E)
false  0.999       false  0.998
true   0.001       true   0.002

B      E      A      P(A|B,E)
false  false  false  0.999
false  false  true   0.001
false  true   false  0.71
false  true   true   0.29
true   false  false  0.06
true   false  true   0.94
true   true   false  0.05
true   true   true   0.95
A Directed Acyclic Graph
Burglary → Alarm ← Earthquake
A Set of Parameters
Burglary → Alarm ← Earthquake

B      P(B)        E      P(E)
false  0.999       false  0.998
true   0.001       true   0.002

B      E      A      P(A|B,E)
false  false  false  0.999
false  false  true   0.001
false  true   false  0.71
false  true   true   0.29
true   false  false  0.06
true   false  true   0.94
true   true   false  0.05
true   true   true   0.95

Each node Xi has a conditional probability distribution P(Xi | Parents(Xi)) that quantifies the effect of the parents on the node.
The parameters are the probabilities in these conditional probability distributions.
Because we have discrete random variables, we have conditional probability tables (CPTs).
A Set of Parameters
The conditional probability distribution for Alarm stores the probability distribution for Alarm given the values of Burglary and Earthquake.

B      E      A      P(A|B,E)
false  false  false  0.999
false  false  true   0.001
false  true   false  0.71
false  true   true   0.29
true   false  false  0.06
true   false  true   0.94
true   true   false  0.05
true   true   true   0.95

For a given combination of values of the parents (B and E in this example), the entries for P(A=true|B,E) and P(A=false|B,E) must add up to 1, e.g. P(A=true|B=false,E=false) + P(A=false|B=false,E=false) = 1.
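A minimal Python sketch (not from the slides) of this CPT as a dictionary, with a check of the normalization property just described:

    # CPT for Alarm, keyed by (B, E, A); values taken from the table above.
    cpt = {
        (False, False, False): 0.999, (False, False, True): 0.001,
        (False, True,  False): 0.71,  (False, True,  True):  0.29,
        (True,  False, False): 0.06,  (True,  False, True):  0.94,
        (True,  True,  False): 0.05,  (True,  True,  True):  0.95,
    }

    # For every parent combination, P(A=true|B,E) + P(A=false|B,E) must be 1.
    for b in (False, True):
        for e in (False, True):
            total = cpt[(b, e, True)] + cpt[(b, e, False)]
            assert abs(total - 1.0) < 1e-9, (b, e, total)
    print("all rows normalized")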
Bayes Nets Formalized
A Bayes net (also called a belief network) is an augmented directed acyclic graph, represented by the pair (V, E) where:
◦ V is a set of vertices.
◦ E is a set of directed edges joining vertices. No loops of any length are allowed.
Bayesian Network Example
(Figure: an example network over the nodes Weather and Cavity.)
A Representation of the Full Joint
Distribution
We will use the following abbreviations:
◦ P(x1, …, xn) for P(X1 = x1 ∧ … ∧ Xn = xn)
◦ parents(Xi) for the values of the parents of Xi
From the Bayes net, we can calculate:
P(x1, …, xn) = ∏ i=1..n P(xi | parents(Xi))
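For example, with the burglary-network CPTs from the earlier slides (writing a for Alarm=true, ¬b for Burglary=false, ¬e for Earthquake=false):
P(a ∧ ¬b ∧ ¬e) = P(a | ¬b, ¬e) · P(¬b) · P(¬e) = 0.001 × 0.999 × 0.998 ≈ 0.000997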
The Full Joint Distribution
P(x1, …, xn) = P(xn | xn−1, …, x1) · P(xn−1, …, x1)   (chain rule)
The Full Joint Distribution
Applying the chain rule repeatedly, and then using the conditional independences encoded by the graph:
P(x1, …, xn) = ∏ i=1..n P(xi | xi−1, …, x1) = ∏ i=1..n P(xi | parents(Xi))
Example
Burglary → Alarm ← Earthquake
Alarm → JohnCalls,  Alarm → MaryCalls
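The slides do not give CPTs for JohnCalls and MaryCalls, so the following minimal Python sketch (not code from the slides) applies the factorization to the Burglary, Earthquake, Alarm fragment using the numbers given earlier:

    # Priors and CPT from the earlier slides.
    p_b = {True: 0.001, False: 0.999}
    p_e = {True: 0.002, False: 0.998}
    p_a = {  # (B, E, A) -> P(A | B, E)
        (False, False, True): 0.001, (False, False, False): 0.999,
        (False, True,  True): 0.29,  (False, True,  False): 0.71,
        (True,  False, True): 0.94,  (True,  False, False): 0.06,
        (True,  True,  True): 0.95,  (True,  True,  False): 0.05,
    }

    def joint(b, e, a):
        # P(b, e, a) = P(b) * P(e) * P(a | b, e), per the factorization above.
        return p_b[b] * p_e[e] * p_a[(b, e, a)]

    print(joint(False, False, True))  # P(a, ~b, ~e) = 0.999*0.998*0.001 ≈ 0.000997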
Conditional Independence
There is a general topological criterion called d-separation.
d-separation determines whether a set of nodes X is independent of another set Y given a third set E.
D-separation
We will use the notation I(X, Y | E) to mean that X and Y are conditionally independent given E.
Theorem [Verma and Pearl 1988]: If a set of evidence variables E d-separates X and Y in the Bayesian network’s graph, then I(X, Y | E).
d-separation can be determined in linear time using a DFS-like algorithm.
D-separation
Let evidence nodes E ⊆ V (where V are the vertices or nodes in the graph), and let X and Y be distinct nodes in V − E.
We say X and Y are d-separated by E in the Bayesian network if every undirected path between X and Y is blocked by E.
What does it mean for a path to be blocked? There are 3 cases…
Case 1
There exists a node N on the path such that
• It is in the evidence set E (shaded grey)
• The arcs putting N in the path are “tail-to-tail”.
X ← N → Y
Example: the burglary network
Your house has a twitchy burglar alarm that is also sometimes triggered by earthquakes.
Earth obviously doesn’t care whether your house is currently being broken into.
While you are on vacation, one of your nice neighbors calls and lets you know your alarm went off.
Case 3 (Explaining Away)
Burglary → Alarm ← Earthquake
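The linear-time, DFS-like test mentioned earlier can be sketched as a reachability search (the standard “Bayes ball” formulation). This is a sketch of that standard algorithm, not code from the slides; the parents-dictionary format and node names are our own:

    from collections import deque

    def d_separated(x, y, evidence, parents):
        # parents: dict mapping each node to the set of its parents.
        children = {n: set() for n in parents}
        for n, ps in parents.items():
            for p in ps:
                children[p].add(n)

        # Phase 1: evidence nodes and their ancestors. A head-to-head node
        # unblocks a path only if it or one of its descendants is observed.
        anc, stack = set(), list(evidence)
        while stack:
            n = stack.pop()
            if n not in anc:
                anc.add(n)
                stack.extend(parents[n])

        # Phase 2: search for an active trail from x. The state records how
        # a node was entered: "up" = from a child, "down" = from a parent.
        visited, queue = set(), deque([(x, "up")])
        while queue:
            n, d = queue.popleft()
            if (n, d) in visited:
                continue
            visited.add((n, d))
            if n == y and n not in evidence:
                return False  # active trail found: not d-separated
            if d == "up" and n not in evidence:
                # head-to-tail and tail-to-tail continuations through n
                queue.extend((p, "up") for p in parents[n])
                queue.extend((c, "down") for c in children[n])
            elif d == "down":
                if n not in evidence:
                    queue.extend((c, "down") for c in children[n])
                if n in anc:  # head-to-head node with observed descendant
                    queue.extend((p, "up") for p in parents[n])
        return True

    # The burglary network: Burglary -> Alarm <- Earthquake.
    par = {"B": set(), "E": set(), "A": {"B", "E"}}
    print(d_separated("B", "E", set(), par))   # True: head-to-head blocks
    print(d_separated("B", "E", {"A"}, par))   # False: explaining away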
Conditional Independence
Note: D-separation only finds random variables
that are conditionally independent based on the
topology of the network
Some random variables that are not d-separated
may still be conditionally independent because of
the probabilities in their CPTs
libpgm
libpgm was developed at CyberPoint
Labs during the Summer of 2012 by
Charles Cabot working under the
direction of James Ulrich and Mark
Raugas.
libpgm
The library consists of a series of
importable modules, which either
represent types of Bayesian graphs,
contain methods to operate on them, or
both.
libpgm
dictionary
graphskeleton
orderedskeleton
nodedata
discretebayesiannetwork
hybayesiannetwork
lgbayesiannetwork
dyndiscbayesiannetwork
tablecpdfactorization
tablecpdfactor
sampleaggregator
pgmlearner
CPDtypes
discrete
linear gaussian
linear gaussian + discrete
crazy (test type)
INPUT
Because Bayesian probability graphs are
large and contain a lot of data, the library
works with .txt files as inputs.
The formatting used is JavaScript Object Notation (JSON), with some flexibility.
Internally, the library stores these files as JSON objects using Python’s json library.
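A sketch of what a discrete-network input file might look like for the burglary fragment used earlier. The field names ("V", "E", "Vdata", "ord", "numoutcomes", "vals", "parents", "children", "cprob") follow the libpgm examples as recalled here and should be verified against the library’s documentation:

    {
        "V": ["Burglary", "Earthquake", "Alarm"],
        "E": [["Burglary", "Alarm"], ["Earthquake", "Alarm"]],
        "Vdata": {
            "Alarm": {
                "ord": 2,
                "numoutcomes": 2,
                "vals": ["true", "false"],
                "parents": ["Burglary", "Earthquake"],
                "children": null,
                "cprob": {
                    "['true', 'true']": [0.95, 0.05],
                    "['true', 'false']": [0.94, 0.06],
                    "['false', 'true']": [0.29, 0.71],
                    "['false', 'false']": [0.001, 0.999]
                }
            }
        }
    }

Root nodes such as Burglary and Earthquake (omitted above for brevity) would use a plain list for cprob, e.g. [0.001, 0.999].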
Deterministic Inference
◦ Compute the probability distribution over a
specific node or nodes in a discrete-CPD
Bayesian network (given evidence, if present)
◦ Compute the exact probability of an outcome
in a discrete-CPD Bayesian network (given
evidence, if present)
Approximate Inference
◦ Compute the approximate probability
distribution by generating samples
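A sketch of how these operations look in code. The module names match the list above; the method names and argument formats are recalled from the libpgm documentation and should be verified against it, and the file name is an illustrative assumption:

    from libpgm.nodedata import NodeData
    from libpgm.graphskeleton import GraphSkeleton
    from libpgm.discretebayesiannetwork import DiscreteBayesianNetwork
    from libpgm.tablecpdfactorization import TableCPDFactorization

    # Load the JSON-formatted .txt input described above.
    nd = NodeData()
    skel = GraphSkeleton()
    nd.load("burglary.txt")   # hypothetical input file
    skel.load("burglary.txt")
    skel.toporder()           # ensure nodes are in topological order

    bn = DiscreteBayesianNetwork(skel, nd)

    # Deterministic inference: distribution over a query node given evidence
    # (check the query/evidence dict format against the docs).
    fn = TableCPDFactorization(bn)
    result = fn.condprobve(dict(Burglary=''), dict(Alarm='true'))

    # Approximate inference: generate samples from the network.
    samples = bn.randomsample(1000)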
PyLearning
• Learn the CPDs of a discrete-CPD Bayesian network,
given data and a structure
• Learn the structure of a discrete Bayesian network,
given only data
• Learn the CPDs of a linear Gaussian Bayesian
network, given data and a structure
• Learn the structure of a linear Gaussian Bayesian network, given only data
• Learn entire Bayesian networks (structures and
parameters) from data
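A sketch of learning with libpgm’s pgmlearner module (method names as recalled from the libpgm documentation and worth verifying; the data is a list of {node: value} samples, and the two-sample dataset below is purely illustrative):

    from libpgm.pgmlearner import PGMLearner

    learner = PGMLearner()

    # One dict per observed sample; real use needs many more samples.
    data = [
        {"Burglary": "false", "Earthquake": "false", "Alarm": "false"},
        {"Burglary": "true",  "Earthquake": "false", "Alarm": "true"},
    ]

    # Learn CPDs given data and a structure (skel: a loaded GraphSkeleton):
    result = learner.discrete_mle_estimateparams(skel, data)

    # Learn structure given only data:
    skel2 = learner.discrete_constraint_estimatestruct(data)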
Features
PyMC provides functionalities to make Bayesian analysis as
painless as possible. Here is a short list of some of its features:
Fits Bayesian statistical models with Markov chain Monte Carlo
and other algorithms.
Includes a large suite of well-documented statistical distributions.
Uses NumPy for numerics wherever possible.
Includes a module for modeling Gaussian processes.
Sampling loops can be paused and tuned manually, or saved and
restarted later.
Features Contd..
Creates summaries including tables and plots.
Traces can be saved to the disk as plain text, Python pickles,
SQLite or MySQL database, or hdf5 archives.
Several convergence diagnostics are available.
Extensible: easily incorporates custom step methods and unusual
probability distributions.
MCMC loops can be embedded in larger programs, and results
can be analyzed with the full power of Python
New in Version 2
The PyMC 2 series provides:
A new, flexible object model and syntax (not backward-compatible).
Reduced redundant computations: only relevant log-probability
terms are computed, and these are cached.
Optimized probability distributions.
New adaptive blocked Metropolis step method.
New slice sampler method.
Much more!
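To ground the feature list, here is a minimal sketch of a PyMC 2 model fit with MCMC; the coin-flip model and data are illustrative assumptions, not from the slides:

    import numpy as np
    import pymc

    # Illustrative observed coin flips (1 = heads).
    flips = np.array([1, 0, 1, 1, 0, 1, 1, 1, 0, 1])

    # Prior on the coin's bias.
    theta = pymc.Uniform('theta', lower=0.0, upper=1.0)

    # Likelihood: Bernoulli observations fixed to the data.
    obs = pymc.Bernoulli('obs', p=theta, value=flips, observed=True)

    # Fit with Markov chain Monte Carlo and summarize the posterior.
    model = pymc.MCMC([theta, obs])
    model.sample(iter=10000, burn=1000)
    print(model.stats()['theta']['mean'])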
History
PyMC began development in 2003 as an effort to generalize the process of building Metropolis-Hastings samplers, with the aim of making Markov chain Monte Carlo (MCMC) more accessible to non-statisticians (particularly ecologists).
The choice to develop PyMC as a Python module, rather than a standalone application, allowed the use of MCMC methods in a larger modeling framework.
By 2005, PyMC was reliable enough for version 1.0 to be released to the public.
History
In 2006, David Huard and Anand Patil joined Chris Fonnesbeck on the
development team for PyMC 2.0. This iteration of the software strives for
more flexibility, better performance and a better end-user experience than
any previous version of PyMC.
PyMC 2.2 was released in April 2012. It contains numerous bug fixes and optimizations, as well as a few new features, including improved output plotting, CSV table output, improved imputation syntax, and posterior predictive check plots.
Dependencies
Stochastic, Deterministic, and Potential are subclasses of Node.
Dependencies Cont..
PyMC probability models are simply linked groups
of Stochastic, Deterministic and Potential objects.
The Stochastic class
A stochastic variable has the following primary attributes:
value: The variable’s current value.
logp: The log-probability of the variable’s current value given the values of its parents.
A stochastic variable can optionally be endowed with a method called random, which draws a value for the variable given the values of its parents. Stochastic objects have the following additional attributes:
parents: A dictionary containing the variable’s parents. The keys of the dictionary correspond to the names assigned to the variable’s parents by the variable, and the values correspond to the actual parents. For example, the keys of s’s parents dictionary in the mining disaster model would be 't_l' and 't_h'. Thanks to Python’s dynamic typing, the actual parents (i.e. the values of the dictionary) may be of any class or type.
children: A set containing the variable’s children.
extended_parents: A set containing all the stochastic variables on which the variable depends, either directly or via a sequence of deterministic variables. If the value of any of these variables changes, the variable will need to recompute its log-probability.
extended_children: A set containing all the stochastic variables and potentials that depend on the variable, either directly or via a sequence of deterministic variables. If the variable’s value changes, all of these variables will need to recompute their log-probabilities.
observed: A flag (boolean) indicating whether the variable’s value has been observed (is fixed).
dtype: A NumPy dtype object (such as numpy.int) that specifies the type of the variable’s value to fitting methods. If this is None (default) then no type is enforced.
The Stochastic class
There are three main ways to create stochastic variables, called
the automatic, decorator, and direct interfaces.
Automatic
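For instance, a sketch of the automatic interface, with the decorator interface shown for contrast. This follows the PyMC 2 documentation’s disaster-model switchpoint example as recalled here; treat the exact signatures as assumptions to verify:

    import numpy as np
    import pymc

    # Automatic interface: instantiate a built-in distribution directly.
    switchpoint = pymc.DiscreteUniform('switchpoint', lower=1851, upper=1962)

    # Decorator interface: the wrapped function returns the log-probability
    # of the variable's current value given its parents.
    @pymc.stochastic(dtype=int)
    def s(value=1900, t_l=1851, t_h=1962):
        # Uniform log-probability over [t_l, t_h].
        if value > t_h or value < t_l:
            return -np.inf  # zero probability outside the interval
        return -np.log(t_h - t_l + 1)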