
libpgm for Bayesian networks

Dr. A. Obulesh
Associate Professor
Monte Carlo Artificial Intelligence:
Bayesian Networks

Why This Matters
Bayesian networks have arguably been the most important contribution to the field of AI in recent decades
Provide a way to represent knowledge in an uncertain domain and a way to reason about this knowledge
Many applications: medicine, factories, help desks, spam filtering, etc.

Applications
Medical diagnostics
Digital image processing
Natural language processing

Probability and Conditional Probability

Example: the world is not discrete; multiple things happen simultaneously.

What needs to happen for A and B to both happen?
First A needs to happen, and then, with A already having happened, B needs to happen:
P(A and B) = P(A) * P(B|A)
Equivalently, starting from B:
P(A and B) = P(B) * P(A|B)
Here P(one event | another event) denotes a conditional probability.

Bayes Theorem
P(A|B) = P(B|A) * P(A) / P(B)
Equivalently, P(B|A) = P(A|B) * P(B) / P(A)
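
As a quick sanity check, the theorem is easy to evaluate in a few lines of Python; the disease-test numbers below are invented for illustration:

    # Made-up numbers for a diagnostic-test example
    p_a = 0.01              # P(A): prior probability of disease
    p_b_given_a = 0.95      # P(B|A): positive test given disease
    p_b_given_not_a = 0.05  # P(B|~A): false positive rate

    # Total probability: P(B) = P(B|A)P(A) + P(B|~A)P(~A)
    p_b = p_b_given_a * p_a + p_b_given_not_a * (1 - p_a)

    # Bayes theorem: P(A|B) = P(B|A) * P(A) / P(B)
    p_a_given_b = p_b_given_a * p_a / p_b
    print(p_a_given_b)      # about 0.16: still unlikely after one positive test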

Joint probability
A joint probability distribution describes how two or more variables are distributed simultaneously.
To get a probability from the joint distribution of A and B, you would consider P(A = a and B = b).

Conditional probability
A conditional probability distribution looks at how the probabilities of A are distributed given a certain value for B: P(A = a | B = b).

A Bayesian Network
A Bayesian network is made up of two parts:
1. A directed acyclic graph
2. A set of parameters

Graph: Burglary -> Alarm <- Earthquake

B     P(B)        E     P(E)
false 0.999       false 0.998
true  0.001       true  0.002

B     E     A     P(A|B,E)
false false false 0.999
false false true  0.001
false true  false 0.71
false true  true  0.29
true  false false 0.06
true  false true  0.94
true  true  false 0.05
true  true  true  0.95
A Directed Acyclic Graph

Graph: Burglary -> Alarm <- Earthquake

1. A directed acyclic graph:
 The nodes are random variables (which can be discrete or continuous)
 Arrows connect pairs of nodes (X is a parent of Y if there is an arrow from node X to node Y)

A Directed Acyclic Graph

Graph: Burglary -> Alarm <- Earthquake

 Intuitively, an arrow from node X to node Y means X has a direct influence on Y (we can say X has a causal effect on Y)
 Easy for a domain expert to determine these relationships
 The absence/presence of arrows will be made more precise later on

A Set of Parameters

Graph: Burglary -> Alarm <- Earthquake

B     P(B)        E     P(E)
false 0.999       false 0.998
true  0.001       true  0.002

B     E     A     P(A|B,E)
false false false 0.999
false false true  0.001
false true  false 0.71
false true  true  0.29
true  false false 0.06
true  false true  0.94
true  true  false 0.05
true  true  true  0.95

Each node Xi has a conditional probability distribution P(Xi | Parents(Xi)) that quantifies the effect of the parents on the node
The parameters are the probabilities in these conditional probability distributions
Because we have discrete random variables, we have conditional probability tables (CPTs)

A Set of Parameters
The conditional probability distribution for Alarm stores the probability distribution for Alarm given the values of Burglary and Earthquake.

B     E     A     P(A|B,E)
false false false 0.999
false false true  0.001
false true  false 0.71
false true  true  0.29
true  false false 0.06
true  false true  0.94
true  true  false 0.05
true  true  true  0.95

For a given combination of values of the parents (B and E in this example), the entries for P(A=true|B,E) and P(A=false|B,E) must add up to 1, e.g. P(A=true|B=false,E=false) + P(A=false|B=false,E=false) = 1

 If you have a Boolean variable with k Boolean parents, how big is the conditional probability table?
 How many entries are independently specifiable?
(The table has 2^(k+1) entries, one per combination of parent values and the variable's own value; only 2^k of them are independently specifiable, since each pair of entries must sum to 1.)
Semantics of Bayesian Networks

Bayes Nets Formalized
A Bayes net (also called a belief network) is an augmented directed acyclic graph, represented by the pair (V, E), where:
◦ V is a set of vertices
◦ E is a set of directed edges joining vertices. No loops of any length are allowed.

Each vertex in V contains the following information:
◦ The name of a random variable
◦ A probability distribution table indicating how the probability of this variable's values depends on all possible combinations of parental values
Semantics of Bayesian Networks
Two ways to view Bayes nets:
1. A representation of a joint probability distribution
2. An encoding of a collection of conditional independence statements
Bayesian Network Example
Graph: Weather is an isolated node; Cavity is the parent of both Toothache and Catch.
Conditional independence has the general form P(A|B,C) = P(A|C); here, I(Toothache, Catch | Cavity).

• Weather is independent of the other variables, I(Weather, Cavity), or P(Weather) = P(Weather|Cavity) = P(Weather|Catch) = P(Weather|Toothache)
• Toothache and Catch are conditionally independent given Cavity
• I(Toothache, Catch | Cavity), meaning P(Toothache|Catch,Cavity) = P(Toothache|Cavity)
Conditional Independence
We can look at the actual graph structure and determine conditional independence relationships:
1. A node X is conditionally independent of its non-descendants (Z1, ..., Zn) given its parents (U1, ..., Um).

A Representation of the Full Joint Distribution
We will use the following abbreviations:
◦ P(x1, …, xn) for P(X1 = x1 ∧ … ∧ Xn = xn)
◦ parents(Xi) for the values of the parents of Xi
From the Bayes net, we can calculate:
P(x1, ..., xn) = Π_{i=1..n} P(xi | parents(Xi))

The Full Joint Distribution
P(x1, ..., xn)
= P(xn | xn-1, ..., x1) P(xn-1, ..., x1)                              (chain rule)
= P(xn | xn-1, ..., x1) P(xn-1 | xn-2, ..., x1) P(xn-2, ..., x1)      (chain rule)
= ...
= P(xn | xn-1, ..., x1) P(xn-1 | xn-2, ..., x1) ... P(x2 | x1) P(x1)  (chain rule)
= Π_{i=1..n} P(xi | xi-1, ..., x1)
= Π_{i=1..n} P(xi | parents(Xi))    ← we'll look at this step more closely

The Full Joint Distribution
Π_{i=1..n} P(xi | xi-1, ..., x1) = Π_{i=1..n} P(xi | parents(Xi))

To be able to do this, we need two things:
1. Parents(Xi) ⊆ {Xi-1, …, X1}
This is easy – we just label the nodes according to the partial order in the graph
2. We need Xi to be conditionally independent of its predecessors given its parents
This can be done when constructing the network. Choose parents that directly influence Xi.

Example

Graph: Burglary -> Alarm <- Earthquake; Alarm -> JohnCalls; Alarm -> MaryCalls

P(JohnCalls, MaryCalls, Alarm, Burglary, Earthquake)
= P(JohnCalls | Alarm) P(MaryCalls | Alarm) P(Alarm | Burglary, Earthquake) P(Burglary) P(Earthquake)
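
To make the factorization concrete, here is a small Python sketch that evaluates this product for one assignment, using the CPT values from the earlier slides (the JohnCalls and MaryCalls probabilities are not given in the deck, so the numbers used for them are assumptions):

    # CPTs from the earlier slides; the call probabilities are assumed values
    P_B = {True: 0.001, False: 0.999}            # P(Burglary)
    P_E = {True: 0.002, False: 0.998}            # P(Earthquake)
    P_A = {(True, True): 0.95, (True, False): 0.94,
           (False, True): 0.29, (False, False): 0.001}  # P(Alarm=true | B, E)
    P_J = {True: 0.90, False: 0.05}              # assumed P(JohnCalls=true | Alarm)
    P_M = {True: 0.70, False: 0.01}              # assumed P(MaryCalls=true | Alarm)

    def joint(j, m, a, b, e):
        """P(J=j, M=m, A=a, B=b, E=e) via the Bayes net factorization."""
        p_a = P_A[(b, e)] if a else 1 - P_A[(b, e)]
        p_j = P_J[a] if j else 1 - P_J[a]
        p_m = P_M[a] if m else 1 - P_M[a]
        return p_j * p_m * p_a * P_B[b] * P_E[e]

    # e.g. both neighbors call, the alarm rings, no burglary, no earthquake
    print(joint(True, True, True, False, False))  # ~0.00063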

Conditional Independence
There is a general topological criterion called d-separation
d-separation determines whether a set of nodes X is independent of another set Y given a third set E

D-separation
We will use the notation I(X, Y | E) to mean that X and Y are conditionally independent given E
Theorem [Verma and Pearl 1988]: If a set of evidence variables E d-separates X and Y in the Bayesian network's graph, then I(X, Y | E)
d-separation can be determined in linear time using a DFS-like algorithm
D-separation
Let the evidence nodes be E ⊆ V (where V is the set of vertices or nodes in the graph), and let X and Y be distinct nodes in V – E.
We say X and Y are d-separated by E in the Bayesian network if every undirected path between X and Y is blocked by E.
What does it mean for a path to be blocked? There are 3 cases…

Case 1
There exists a node N on the path such that
• It is in the evidence set E (shaded grey)
• The arcs putting N in the path are "tail-to-tail"

X <- N -> Y

Example: X = "Owns expensive car", N = "Rich", Y = "Owns expensive home"

The path between X and Y is blocked by N
Case 2
There exists a node N on the path such that
• It is in the evidence set E
• The arcs putting N in the path are "tail-to-head"

X -> N -> Y

Example: X = Education, N = Job, Y = Rich

The path between X and Y is blocked by N
Case 3
There exists a node N on the path such that
• It is NOT in the evidence set E (not shaded)
• Neither are any of its descendants
• The arcs putting N in the path are "head-to-head"

X -> N <- Y

The path between X and Y is blocked by N
(Note N is not in the evidence set)
Case 3 (Explaining Away)

Graph: Burglary -> Alarm <- Earthquake

Your house has a twitchy burglar alarm that is also sometimes triggered by earthquakes
The earth obviously doesn't care whether your house is currently being broken into
While you are on vacation, one of your nice neighbors calls and lets you know your alarm went off
Case 3 (Explaining Away)

Graph: Burglary -> Alarm <- Earthquake

 But if you knew that a medium-sized earthquake happened, then you're probably relieved that it's probably not a burglar
 The earthquake "explains away" the hypothetical burglar
 This means that Burglary and Earthquake are not independent given Alarm
 But Burglary and Earthquake are independent given no evidence, i.e. learning about an earthquake when you know nothing about the status of your alarm doesn't give you any information about the burglary
d-separation Recipe
To determine whether I(X, Y | E): first ignore the directions of the arrows and find all paths between X and Y
Now pay attention to the arrows, and determine whether each path is blocked according to the 3 cases
If all the paths are blocked, X and Y are d-separated given E
Which means they are conditionally independent given E

Conditional Independence
Note: D-separation only finds random variables that are conditionally independent based on the topology of the network
Some random variables that are not d-separated may still be conditionally independent because of the probabilities in their CPTs

libpgm
libpgm was developed at CyberPoint Labs during the summer of 2012 by Charles Cabot, working under the direction of James Ulrich and Mark Raugas.
libpgm
The library consists of a series of importable modules, which either represent types of Bayesian graphs, contain methods to operate on them, or both.
libpgm
 dictionary
 graphskeleton
 orderedskeleton
 nodedata
 discretebayesiannetwork
 hybayesiannetwork
 lgbayesiannetwork
 dyndiscbayesiannetwork
 tablecpdfactorization
 tablecpdfactor
 sampleaggregator
 pgmlearner
 CPDtypes
 discrete
 linear gaussian
 linear gaussian + discrete
 crazy (test type)
INPUT
Because Bayesian probability graphs are large and contain a lot of data, the library works with .txt files as inputs.
The formatting used is JavaScript Object Notation (JSON), with some flexibility.
Internally, the library stores these files as JSON objects from Python's json library.
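
For a discrete network like the burglary example, an input file looks roughly like this (the top-level keys "V", "E" and "Vdata" follow libpgm's documented format; the Earthquake entry is elided for brevity, so treat this as a sketch rather than a complete file):

    {
        "V": ["Burglary", "Earthquake", "Alarm"],
        "E": [["Burglary", "Alarm"], ["Earthquake", "Alarm"]],
        "Vdata": {
            "Burglary": {
                "ord": 0,
                "numoutcomes": 2,
                "vals": ["true", "false"],
                "parents": null,
                "children": ["Alarm"],
                "cprob": [0.001, 0.999]
            },
            "Alarm": {
                "ord": 2,
                "numoutcomes": 2,
                "vals": ["true", "false"],
                "parents": ["Burglary", "Earthquake"],
                "children": null,
                "cprob": {
                    "['true', 'true']": [0.95, 0.05],
                    "['true', 'false']": [0.94, 0.06],
                    "['false', 'true']": [0.29, 0.71],
                    "['false', 'false']": [0.001, 0.999]
                }
            }
        }
    }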
Deterministic Inference
◦ Compute the probability distribution over a specific node or nodes in a discrete-CPD Bayesian network (given evidence, if present)
◦ Compute the exact probability of an outcome in a discrete-CPD Bayesian network (given evidence, if present)
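
A sketch of these queries in code, assuming the JSON file above is saved as bayesnet.txt (the class and method names follow libpgm's documented API, but treat the exact argument formats as illustrative):

    from libpgm.nodedata import NodeData
    from libpgm.graphskeleton import GraphSkeleton
    from libpgm.discretebayesiannetwork import DiscreteBayesianNetwork
    from libpgm.tablecpdfactorization import TableCPDFactorization

    # Load node data and graph structure from the JSON input file
    nd = NodeData()
    skel = GraphSkeleton()
    nd.load("bayesnet.txt")
    skel.load("bayesnet.txt")
    skel.toporder()  # put the nodes in topological order

    bn = DiscreteBayesianNetwork(skel, nd)
    fn = TableCPDFactorization(bn)

    # Probability distribution over Burglary, given that the alarm went off
    result = fn.condprobve(dict(Burglary='true'), dict(Alarm='true'))
    print(result.vals, result.scope)

    # Exact probability of one specific outcome given the same evidence
    fn = TableCPDFactorization(bn)  # rebuild: a factorization is consumed by a query
    print(fn.specificquery(dict(Burglary=['true']), dict(Alarm='true')))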
Approximate Inference
◦ Compute an approximate probability distribution by generating samples
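
For approximate inference, libpgm draws forward samples from the network; a minimal sketch reusing the bn object from above (the SampleAggregator usage follows the library docs, as an assumption):

    from libpgm.sampleaggregator import SampleAggregator

    # Draw 1000 random samples from the joint distribution of the network
    samples = bn.randomsample(1000)

    # Aggregate the samples into an approximate distribution over each variable
    agg = SampleAggregator()
    approx = agg.aggregate(samples)
    print(approx)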
PyLearning
• Learn the CPDs of a discrete-CPD Bayesian network, given data and a structure
• Learn the structure of a discrete Bayesian network, given only data
• Learn the CPDs of a linear Gaussian Bayesian network, given data and a structure
• Learn the structure of a linear Gaussian Bayesian network, given only data
• Learn entire Bayesian networks (structures and parameters) from data
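
These tasks map onto libpgm's pgmlearner module. A sketch for the discrete case, reusing the network and skeleton from above to generate synthetic training data (method names follow the documented PGMLearner API):

    from libpgm.pgmlearner import PGMLearner

    # Training data: a list of dicts, one per observed sample
    data = bn.randomsample(200)

    learner = PGMLearner()

    # CPDs, given data and a known structure (graph skeleton)
    params = learner.discrete_mle_estimateparams(skel, data)

    # Structure, given only data (constraint-based)
    skel_learned = learner.discrete_constraint_estimatestruct(data)

    # Entire network (structure and parameters) from data
    bn_learned = learner.discrete_estimatebn(data)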
Features
 PyMC provides functionality to make Bayesian analysis as painless as possible. Here is a short list of some of its features:
 Fits Bayesian statistical models with Markov chain Monte Carlo and other algorithms.
 Includes a large suite of well-documented statistical distributions.
 Uses NumPy for numerics wherever possible.
 Includes a module for modeling Gaussian processes.
 Sampling loops can be paused and tuned manually, or saved and restarted later.
Features Contd..
 Creates summaries including tables and plots.
 Traces can be saved to the disk as plain text, Python pickles, SQLite or MySQL database, or hdf5 archives.
 Several convergence diagnostics are available.
 Extensible: easily incorporates custom step methods and unusual probability distributions.
 MCMC loops can be embedded in larger programs, and results can be analyzed with the full power of Python.
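
As a flavor of the API, a minimal PyMC 2 coin-flip model fitted with MCMC might look like this (a sketch; the data and variable names are invented):

    import numpy as np
    import pymc as pm

    # Observed coin flips (1 = heads); made-up data for illustration
    flips = np.array([1, 0, 1, 1, 0, 1, 1, 1, 0, 1])

    # Prior on the probability of heads
    p = pm.Uniform('p', lower=0, upper=1)

    # Likelihood: the flips are Bernoulli draws with parameter p
    obs = pm.Bernoulli('obs', p=p, value=flips, observed=True)

    # Fit with Markov chain Monte Carlo
    M = pm.MCMC([p, obs])
    M.sample(iter=10000, burn=1000)
    print(M.stats()['p']['mean'])  # posterior mean of p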
New in version 2
The PyMC 2 series provides:
 A new flexible object model and syntax (not backward-compatible).
 Reduced redundant computations: only relevant log-probability terms are computed, and these are cached.
 Optimized probability distributions.
 A new adaptive blocked Metropolis step method.
 A new slice sampler method.
 Much more!
 History
 PyMC began development in 2003, as an effort to generalize the process of building Metropolis-Hastings samplers, with an aim to making Markov chain Monte Carlo (MCMC) more accessible to non-statisticians (particularly ecologists).
 The choice to develop PyMC as a Python module, rather than a standalone application, allowed the use of MCMC methods in a larger modeling framework.
 By 2005, PyMC was reliable enough for version 1.0 to be released to the public.
 History
 In 2006, David Huard and Anand Patil joined Chris Fonnesbeck on the
development team for PyMC 2.0. This iteration of the software strives for
more flexibility, better performance and a better end-user experience than
any previous version of PyMC.

 PyMC 2.2 was released in April 2012. It contains numerous bugfixes and optimizations, as well as a few new features, including improved output plotting, csv table output, improved imputation syntax, and posterior predictive check plots.

 PyMC 2.3 was released on October 31, 2013. It included Python 3 compatibility, the addition of the half-Cauchy distribution, improved summary plots, and some important bug fixes.
 History
 NOTE:
 The current version of PyMC (version 3) has been moved to its own repository, called pymc3.
 Unless you have a good reason for using this package, we recommend all new users adopt PyMC3.
 Dependencies
 PyMC requires some prerequisite packages to be present on the
system. Fortunately, there are currently only a few hard
dependencies, and all are freely available online.
 Python version 2.6 or later.
 NumPy (1.6 or newer): The fundamental scientific programming
package, it provides a multidimensional array type and many
useful functions for numerical analysis.
 Matplotlib (1.0 or newer): 2D plotting library which produces
publication quality figures in a variety of image formats and
interactive environments
 Dependencies Cont..
 SciPy (optional): Library of algorithms for mathematics, science
and engineering.
 pyTables (optional): Package for managing hierarchical datasets
and designed to efficiently and easily cope with extremely large
amounts of data. Requires the HDF5 library.
 pydot (optional): Python interface to Graphviz’s Dot language, it
allows PyMC to create both directed and non-directed graphical
representations of models. Requires the Graphviz library.
 IPython (optional): An enhanced interactive Python shell and an
architecture for interactive parallel computing.
 nose (optional): A test discovery-based unittest extension (required
to run the test suite).
 Building Blocks
 Bayesian inference begins with specification of a probability model relating unknown variables to data. PyMC provides three basic building blocks for Bayesian probability models: Stochastic, Deterministic and Potential.
 A Stochastic object represents a variable whose value is not completely determined by its parents.
 A Deterministic object represents a variable that is entirely determined by its parents.
 In object-oriented programming parlance, Stochastic and Deterministic are subclasses of the Variable class, which only serves as a template for other classes and is never actually implemented in models.
 Building Blocks Contd..
 The third basic class, Potential, represents 'factor potentials' (Lauritzen 1990; Jordan 2004), which are not variables but simply log-likelihood terms and/or constraints that are multiplied into joint distributions to modify them. Potential and Variable are subclasses of Node.
 Building Blocks Contd..
 PyMC probability models are simply linked groups of Stochastic, Deterministic and Potential objects.
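
A short sketch of how these classes combine in a model; the rate-switching structure mirrors the disaster model from the PyMC docs, but the counts below are invented:

    import numpy as np
    import pymc as pm

    # Stochastic variables: uncertain quantities with priors
    early_rate = pm.Exponential('early_rate', beta=1.0)
    late_rate = pm.Exponential('late_rate', beta=1.0)
    switchpoint = pm.DiscreteUniform('switchpoint', lower=0, upper=9)

    # Deterministic variable: entirely determined by its parents
    @pm.deterministic
    def rate(s=switchpoint, e=early_rate, l=late_rate):
        """Per-observation rate: early before the switchpoint, late after."""
        out = np.empty(10)
        out[:s] = e
        out[s:] = l
        return out

    # Observed Stochastic: the data are fixed (observed=True)
    counts = pm.Poisson('counts', mu=rate,
                        value=[4, 5, 4, 3, 4, 1, 0, 1, 1, 0],
                        observed=True)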
The Stochastic class
 A stochastic variable has the following primary attributes:
 value: The variable's current value.
 logp: The log-probability of the variable's current value given the values of its parents.
 A stochastic variable can optionally be endowed with a method called random, which draws a value for the variable given the values of its parents. Stochastic objects have the following additional attributes:
 parents: A dictionary containing the variable's parents. The keys of the dictionary correspond to the names assigned to the variable's parents by the variable, and the values correspond to the actual parents. For example, the keys of s's parents dictionary in the mining disaster model would be 't_l' and 't_h'. Thanks to Python's dynamic typing, the actual parents (i.e. the values of the dictionary) may be of any class or type.
 children: A set containing the variable's children.
 extended_parents: A set containing all the stochastic variables on which the variable depends either directly or via a sequence of deterministic variables. If the value of any of these variables changes, the variable will need to recompute its log-probability.
 extended_children: A set containing all the stochastic variables and potentials that depend on the variable either directly or via a sequence of deterministic variables. If the variable's value changes, all of these variables will need to recompute their log-probabilities.
 observed: A flag (boolean) indicating whether the variable's value has been observed (is fixed).
 dtype: A NumPy dtype object (such as numpy.int) that specifies the type of the variable's value to fitting methods. If this is None (default) then no type is enforced.
The Stochastic class
 There are three main ways to create stochastic variables, called the automatic, decorator, and direct interfaces.
 Automatic
 Stochastic variables with standard distributions provided by PyMC (see chapter Probability distributions) can be created in a single line using special subclasses of Stochastic.
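
For example, the uniformly distributed switchpoint from the disaster model takes one line (a sketch following the PyMC 2 documentation):

    import pymc as pm

    # Automatic interface: a one-line Stochastic subclass instantiation
    switchpoint = pm.DiscreteUniform('switchpoint', lower=1851, upper=1962)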


The Stochastic class
 Decorator
 The uniformly-distributed discrete stochastic variable switchpoint in the disaster model could alternatively be created from a function that computes its log-probability, as follows.
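
A sketch of the decorator interface, mirroring the example in the PyMC 2 documentation:

    import numpy as np
    import pymc as pm

    @pm.stochastic(dtype=int)
    def switchpoint(value=1900, t_l=1851, t_h=1962):
        """The switchpoint for the rate of disaster occurrence."""
        if value > t_h or value < t_l:
            return -np.inf  # values outside [t_l, t_h] have zero probability
        # log of a uniform density over the t_h - t_l + 1 allowed values
        return -np.log(t_h - t_l + 1)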


 Direct
 It's possible to instantiate Stochastic directly:
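
A sketch of the direct interface, passing the log-probability and (optional) random functions explicitly (keyword names follow the PyMC 2 docs):

    import numpy as np
    import pymc as pm

    def switchpoint_logp(value, t_l, t_h):
        """Log-probability of a uniform switchpoint on [t_l, t_h]."""
        if value > t_h or value < t_l:
            return -np.inf
        return -np.log(t_h - t_l + 1)

    def switchpoint_rand(t_l, t_h):
        """Draw a switchpoint uniformly at random."""
        return np.random.randint(t_l, t_h + 1)

    switchpoint = pm.Stochastic(logp=switchpoint_logp,
                                doc='The switchpoint for the rate of disaster occurrence.',
                                name='switchpoint',
                                parents={'t_l': 1851, 't_h': 1962},
                                random=switchpoint_rand,
                                dtype=int,
                                value=1900,
                                observed=False)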
class TabularCPD(DiscreteFactor):
    """
    Defines the conditional probability distribution table (CPD table).

    Parameters
    ----------
    variable: int, string (any hashable python object)
        The variable whose CPD is defined.
    variable_card: integer
        Cardinality/no. of states of `variable`.
    values: 2D array, 2D list or 2D tuple
        Values for the CPD table. Please refer to the example for the
        exact format needed.
    evidence: array-like
        List of variables in evidence (if any) w.r.t. which the CPD is defined.
    evidence_card: array-like
        Cardinality/no. of states of the variables in `evidence` (if any).
    """
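
Note that TabularCPD comes from the pgmpy library rather than libpgm. A usage sketch for the Alarm CPD from the earlier slides (the import path and the one-column-per-parent-combination layout follow pgmpy's documented format, as an assumption):

    from pgmpy.factors.discrete import TabularCPD

    # P(Alarm | Burglary, Earthquake): one column per (B, E) combination,
    # one row per state of Alarm (true, false)
    cpd_alarm = TabularCPD(variable='Alarm', variable_card=2,
                           values=[[0.95, 0.94, 0.29, 0.001],   # Alarm = true
                                   [0.05, 0.06, 0.71, 0.999]],  # Alarm = false
                           evidence=['Burglary', 'Earthquake'],
                           evidence_card=[2, 2])
    print(cpd_alarm)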
 Does an offer depend on Genetics?
 Does an offer depend on genetics if you know practice?
 Does an offer depend on genetics if you know Olympic trials performance?
Thank you
