Abstract:
We propose a novel method for Bayesian learning of the parameters of a mixed belief network. Given the structure of a network, the parameters of the conditional distribution of a node, based on its type (discrete or continuous) and the types of its parents, are learnt from the data. This node-wise updating scheme puts no restriction on the number and type of parents any node can have. We also extended the traditional algorithm for learning pure Gaussian networks to (i) deal with conditional Gaussian nodes, (ii) allow continuous nodes to be multivariate Gaussian and (iii) be able to converge to the actual mean, covariance and weights of the network with which we generated the data.

FIGURE 1. An example of a mixed Bayesian network over nodes A–I. Circles represent continuous nodes and squares represent discrete nodes.
Keywords: Mixed Bayesian Networks, Bayesian parameter learning

1. Introduction

Bayesian Networks are ubiquitous in machine learning applications as they capture the relationships between different variables very well. However, Bayesian network algorithms have traditionally been designed for handling categorical variables, and Bayesian networks have not typically been suitable for dealing with continuous valued variables. There has been reasonable success in handling networks where all the variables are continuous valued, in particular Gaussian. Most real data, however, in many domains like health care, image processing and e-commerce, consists of a mix of categorical (discrete valued) and continuous (Gaussian) variables. Because of the lack of good algorithms to learn Bayesian networks with mixed variables, their usability has remained limited to modeling discrete variables. Works that have attempted to deal with mixed variable types have tried to keep the two kinds of variables separate. For instance, [1] models occurrences of objects as discrete variables and learns mixtures of Gaussians using expectation maximization for the locations of objects (continuous variables). A single Bayesian network that encompasses all the variables and is able to model the inter-dependencies between the continuous and categorical variables would have been ideal in this case. Existing algorithms for parameter learning in mixed Bayesian networks rely on Maximum Likelihood Estimation (MLE).

We propose an algorithm for Bayesian learning of the parameters of a mixed belief network. The algorithm
1. works for any topology of discrete and continuous nodes.
2. handles multivariate continuous nodes.

The rest of the paper is organised as follows. Some previous attempts similar to our work are discussed in Section 2. Details of our learning procedure are given in Section 3. Our results and conclusions are shared in Section 4.

2. Related Works

There have been numerous attempts to learn the parameters of a Bayesian network. Table 1 summarizes some open source libraries for Bayesian networks, compared on the basis of some basic functionalities. (C and D used anywhere in the paper stand for continuous and discrete nodes respectively.) These were the only open source libraries which dealt with mixed Bayesian networks; however, most of them have one or the other drawback.

TABLE 1. Features supported by a few open source Bayesian network libraries.

The Bayes Net Toolbox (BNT) by Kevin Murphy [2] proved to be the most powerful one. It deals with pure discrete, pure Gaussian, multivariate Gaussian and even hybrid cases, but carries out only MLE for parameter learning; Bayesian parameter learning is supported only for discrete nodes. Leveraging the fact that BNT already supports inferencing using numerous inference engines and structure learning using all prevalent algorithms, we choose to extend BNT to include generic Bayesian parameter learning for mixed networks. Hence our algorithm is just an add-on over BNT.

A similar attempt in this area is the algorithm by S.G. Bottcher [3], implemented as the R library deal [4]. Bottcher proposed a new master prior procedure for parameter learning for a Bayesian network. Apart from the slight ambiguity implicit in this work regarding the distinction between conditional Gaussian and plain Gaussian nodes (since Bottcher's algorithm needs the parameter learning for learning the structure of the network), Bottcher's algorithm in fact does not handle the following cases: (i) networks with fully continuous nodes, (ii) multivariate continuous nodes and (iii) discrete nodes with continuous parents. Another library in R is BNlearn [5], which provides Bayesian parameter estimation only for discrete data.

Very recently a MATLAB package, CGBayesNets [6], was released, which deals with conditional Gaussian networks. Its authors claim to have included everything from parameter learning to structure learning and inference, but no clear idea about the parameter learning algorithm they used can be found in their paper.

Two popular Bayesian network libraries in Python are libpgm [7] (developed by students under Daphne Koller) and BayesPy [8] (which provides tools for Bayesian inference). However, both of them lack the most essential feature of Bayesian parameter estimation.

Some other works in this field, the implementations of which have not been open sourced yet, are described below.

In an attempt proposed in [9], the continuous nodes are discretized using quantized intervals of values for attributes to yield generalized techniques for learning mixed Bayesian networks. However, the accuracy of inference using networks learnt this way is directly affected by the widths of the quantization intervals. Acceptable accuracy invariably requires quantization at very fine intervals, and that in turn makes the learning and inference rather slow.

Davis and Moore [10] propose a different interpretation of a Bayesian network: they model it as low-dimensional mixtures of Gaussians. These mixtures of Gaussians over different subsets of the domain variables are combined into a coherent joint probability model over the entire domain. But this approach stores a lot of redundant information per node and also puts a restriction on the dimensionality of the data; it does not allow the discrete variables to take many distinct values.

In a recent paper [11], Krauthausen and Hanebeck proposed an MLE algorithm for learning hybrid Bayesian networks with Gaussian mixture and Dirac mixture conditional densities from data, given the network structure.

3. Detailed Learning Procedure
Our algorithm sequentially updates the parameters of the nodes in the network, one node at a time. The update step, however, depends on the configuration of the parents of the node whose parameters are being computed. The conditional distribution of any node, based on any possible parent configuration, is shown in Table 2. Our algorithm successfully learns the hyperparameters for each of these distributions. The Markov property of the Bayesian network ensures that the parameter learning for a node depends only on itself and its parents. Let Fi = {i ∪ cpa(i) ∪ dpa(i)}, where cpa(i) and dpa(i) are the continuous and discrete parents of node i respectively.

TABLE 2. The probability distributions of a node based on its type (discrete or continuous) and the different types of parents it can have.

3.1 Discrete Case

There are two methods for learning the parameters for a discrete node i, based on whether Fi has continuous nodes or not. If Fi contains continuous nodes, the conditional distribution of node i is a softmax, which is given as

$$\Pr(X_i = k \mid cpa(i) = x) = \frac{\exp(w(:,k)'\,x + b(k))}{\sum_j \exp(w(:,j)'\,x + b(j))} \qquad (1)$$

The parameters of a softmax node, w(:, k) and b(k), k = 1..s_i, have the following interpretation: w(:, k) − w(:, j) is the normal vector to the decision boundary between classes k and j, and b(k) − b(j) is its offset (bias).

If a softmax node also has discrete parents (e.g. parent nodes E, F and child node I in Figure 1), we need to find a different set of w/b parameters for every configuration of these discrete parents.
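To make the softmax parameterisation concrete, here is a minimal Python sketch of evaluating Equation 1; it is our own illustrative code rather than the BNT implementation, and the names softmax_cpd, W, b and params are hypothetical:

```python
import numpy as np

def softmax_cpd(W, b, x):
    """Evaluate Eq. (1): Pr(X_i = k | cpa(i) = x) for every class k.

    W -- (d, s_i) array; column W[:, k] is the weight vector w(:, k)
    b -- (s_i,) array of offsets b(k)
    x -- (d,) vector of continuous parent values
    """
    logits = W.T @ x + b      # w(:, k)' x + b(k) for all k at once
    logits -= logits.max()    # stabilise the exponentials; cancels in the ratio
    p = np.exp(logits)
    return p / p.sum()        # normalise over j, the denominator of Eq. (1)

# One continuous parent (d = 1), binary child (s_i = 2):
W = np.array([[1.0, -0.5]])
b = np.array([0.0, 0.2])
print(softmax_cpd(W, b, np.array([0.7])))

# With discrete parents (e.g. E, F feeding I in Figure 1), a separate
# (W, b) pair is stored for every joint configuration of those parents:
params = {(e, f): (W, b) for e in (0, 1) for f in (0, 1)}
W_ef, b_ef = params[(1, 0)]
print(softmax_cpd(W_ef, b_ef, np.array([0.7])))
```

Keeping a table keyed by the joint state of the discrete parents mirrors the per-configuration w/b sets described above.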
The updated joint covariance matrix of the t-distribution is given as

$$T^* = \frac{\nu^* (\alpha^* - n + 1)}{\nu^* + 1} \, (\beta^*)^{-1} \qquad (8)$$

The base mean of a node is the probability-weighted average of the means learnt for the individual configurations of its discrete parents,

$$\mu_i^{base} = \sum_j p_j \, \mu_j$$

where p_j is the probability of the parents of i being in their j-th configuration, calculated from the data, and µ_j is the mean learnt for that node for that case.
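The following sketch transcribes Equation 8 and the weighted base mean directly into Python; the function names t_covariance and base_mean and the example numbers are our own, and we assume β* is an invertible n × n matrix hyperparameter:

```python
import numpy as np

def t_covariance(nu, alpha, beta, n):
    """Joint covariance of the t-distribution: a transcription of Eq. (8).

    nu, alpha -- updated scalar hyperparameters (nu*, alpha*)
    beta      -- updated (n, n) matrix hyperparameter (beta*), invertible
    n         -- dimensionality of the continuous node
    """
    return (nu * (alpha - n + 1) / (nu + 1)) * np.linalg.inv(beta)

def base_mean(p, mu):
    """Probability-weighted average of the per-configuration means.

    p  -- (m,) probabilities p_j of the m discrete-parent configurations
    mu -- (m, n) array; row j is the mean mu_j learnt for configuration j
    """
    return p @ mu

# Example: a 2-D node whose discrete parents take m = 3 configurations.
p = np.array([0.5, 0.3, 0.2])
mu = np.array([[0.0, 1.0], [2.0, -1.0], [4.0, 0.5]])
print(base_mean(p, mu))
print(t_covariance(nu=10.0, alpha=5.0, beta=np.eye(2), n=2))
```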
3.3.2 Base mean for Inference
While inferencing from a network, the value of a continuous node with a linear Gaussian distribution, given the value $\vec{X}$ of its continuous parents, is calculated as

$$Y = \mathcal{N}\!\left(\mu_i^{base} + b_i \vec{X},\; \Sigma_i\right) \qquad (11)$$
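As a usage illustration of Equation 11, the sketch below draws a value for a continuous node given its continuous parents. The name sample_linear_gaussian and the example numbers are hypothetical, assuming b_i is an n × d weight matrix:

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_linear_gaussian(mu_base, B, x, Sigma):
    """Draw Y ~ N(mu_base + B x, Sigma), as prescribed by Eq. (11).

    mu_base -- (n,) base mean of the node
    B       -- (n, d) regression weights b_i on the continuous parents
    x       -- (d,) observed values of the continuous parents
    Sigma   -- (n, n) covariance Sigma_i of the node
    """
    return rng.multivariate_normal(mu_base + B @ x, Sigma)

# A 2-D continuous node with one continuous parent observed at x = 1.5:
y = sample_linear_gaussian(np.array([0.5, -1.0]),
                           np.array([[2.0], [0.3]]),
                           np.array([1.5]),
                           np.eye(2))
print(y)
```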