Introduction to Bayesian Networks & BayesiaLab
Stefan Conrady, stefan.conrady@bayesia.us

Dr. Lionel Jouffe, jouffe@bayesia.com
September 3, 2013
Table of Contents
Introduction
The New Bayesian Network Paradigm in Context

A Map of Analytic Modeling
Quadrant 2: Predictive Modeling
Quadrant 4: Explanatory Modeling
Bayesian Networks: Theory and Data
Bayesian Networks: Association and Causation
Bayesian Network Theory

A Bayesian Network Example
A Dynamic Bayesian Network Example
Probabilistic Semantics
Evidential Reasoning
Causal Reasoning
Causal Discovery
Bayesian Networks and BayesiaLab in Practice

BayesiaLab 5.2
The BayesiaLab Workflow
BayesiaLab in Context
BayesiaLab’s Features and Functions in Practice
Knowledge Modeling (Quadrant 4)
Knowledge Discovery with Machine Learning (Quadrant 3)
Reasoning Under Uncertainty
Discrete, Nonlinear and Nonparametric Modeling
Unsupervised Structural Learning (Quadrant 3)
Supervised Learning (Quadrant 2)
Clustering (Quadrant 2/3)
Observational Inference (Quadrant 1/2)
Causal Inference (Quadrant 3/4)
Diagnosis, Prediction and Simulation
Effects Analysis (Quadrants 3/4)
Analyzing Observational Studies
Optimization (Quadrant 4)
Summary
References
Contact Information
Bayesia USA
Bayesia Singapore Pte. Ltd.
Bayesia S.A.S.
Copyright
www.bayesia.us | www.bayesia.sg | www.bayesia.com

Introduction
With Professor Judea Pearl receiving the prestigious 2011 A.M. Turing Award, Bayesian networks have
presumably received more public recognition than ever before. Judea Pearl’s achievement of establishing
Bayesian networks as a new paradigm is fittingly summarized by Stuart Russell:
“[Judea Pearl] is credited with the invention of Bayesian networks, a mathematical formalism for
defining complex probability models, as well as the principal algorithms used for inference in these
models. This work not only revolutionized the field of artificial intelligence but also became an
important tool for many other branches of engineering and the natural sciences. He later created a
mathematical framework for causal inference that has had significant impact in the social sciences.”
While their theoretical properties made Bayesian networks immediately attractive for academic research,
especially with regard to the study of causality, the arrival of practically feasible machine learning
algorithms has allowed Bayesian networks to grow beyond their origin in the field of computer science.
Since the first release of the BayesiaLab software package in 2001, Bayesian networks have finally become
accessible to a wide range of scientists and analysts for use in many other disciplines.
In this introductory paper, we present Bayesian networks (the paradigm) and BayesiaLab (the software
tool), from the perspective of the applied researcher.
In Chapter 1 we begin with the role of Bayesian networks in today’s world of analytics, juxtaposing them
with traditional statistics and more recent innovations in data mining.
Once we establish how Bayesian networks fit into the proverbial big picture, we present in Chapter 2 the
mathematical formalism that underpins this paradigm. While employing Bayesian networks for research has
become remarkably easy with BayesiaLab, we need to emphasize the importance of their theory. Only a deep
understanding of this theory will allow researchers to fully appreciate the wide-ranging benefits of Bayesian
networks.
Finally, in Chapter 3, we provide an overview of the BayesiaLab software platform, which leverages the
Bayesian network paradigm to a far greater extent than any other tool that has ever been available in this
field. We show how the theoretical properties of Bayesian networks translate into an extremely powerful
and universal research tool for many fields of study, ranging from bioinformatics to marketing science and
beyond.

Feedback, comments, questions? Email us: info@bayesia.us
1. The New Bayesian Network Paradigm in Context

As we introduce Bayesian networks as a new paradigm, we will first present them in the context of what can
perhaps be described as the field of “analytic modeling.”
Such context is particularly important given the attention that Big Data and related technologies receive
these days. Their dominance in terms of publicity may drown out many other important
methods of scientific inquiry.
Equally important is positioning Bayesian networks vis-à-vis traditional parametric statistical methods,
which have supported a myriad of scientific advances in the 20th century and that continue to serve as valid
and valuable tools for researchers today.
A Map of Analytic Modeling

Following the ideas of Breiman (2001) and Shmueli (2010), we create a map of “analytic modeling” that is
defined by two axes (Figure 1).
The x-axis reflects the Modeling Purpose, ranging from Association/Correlation to Causation. Tags on the
x-axis furthermore indicate a conceptual progression that includes description, prediction, explanation,
simulation, and optimization.
The y-axis shows the Model Source, or, more precisely, the source of the model specification: On the one
end, we have Theory as the source; on the other end, we have Data as the source. Theory is furthermore
tagged with Parametric as the prevalent modeling approach, and Human Intelligence, indicating the origin
of Theory.
On the opposite end of the y-axis, Data is associated with Machine Learning and Artificial Intelligence. It is
also tagged with Algorithmic, to highlight the contrast with the mostly parametric modeling generated from
theory.

[Figure: x-axis Modeling Purpose (Association/Correlation to Causation; tags Description, Prediction,
Explanation, Simulation, Optimization); y-axis Model Source (Theory: Parametric, Human Intelligence,
Human Learning; Data: Algorithmic, Artificial Intelligence, Machine Learning); quadrants Q1 to Q4, with
Predictive Modeling in Q2 and Explanatory Modeling in Q4.]
Figure 1: A Conceptual Map of Analytic Modeling
Needless to say, this is a highly simplified view of the world, and readers can rightfully point out the
limitations of this presentation.¹ Despite this caveat, we will now use the proposed map and its
coordinate system to position different modeling approaches.
Quadrant 2: Predictive Modeling

Many of today’s predictive modeling techniques are algorithmic and would fall mostly into Quadrant 2.
In Quadrant 2, a researcher would be mostly interested in the predictive performance, i.e. in Y:
$$\underbrace{Y}_{\text{of interest}} = f(X)$$
Neural networks are a typical example of implementing machine learning techniques in this context.
Such models are often devoid of any theory; however, they can be excellent “statistical devices”
for producing predictions.
1 For instance, one could easily expand this overview by adding a third dimension, perhaps including the type of
parameter estimation. With such an additional axis, one could differentiate “frequentist” and “Bayesian”
estimation methods.

Quadrant 4: Explanatory Modeling

In Quadrant 4, the researcher is interested in identifying a model structure that best reflects the
underlying “true” data generating process, i.e. we are looking for an explanatory model.
Thus, the function f is of greater importance than Y:
$$Y = \underbrace{f}_{\text{of interest}}(X)$$
Traditional statistical techniques, which have an explanatory purpose and are used in epidemiology
and the social sciences, would mostly belong in Quadrant 4. Regressions are the best-known
models in this context. Extending further into the causal direction, we would progress into the field
of operations research, including simulation and optimization.
Despite the diverging objectives of Predictive Modeling and Explanatory Modeling, i.e. predicting Y versus
learning f, the respective methods are not necessarily incompatible. In Figure 1, this is suggested by the blue
boxes that gradually fade out as they cross the boundaries and extend beyond their “home” quadrant.
However, the best-performing modeling approaches rarely serve predictive and explanatory purposes
equally well. In many situations, the optimal fit-for-purpose models remain very distinct from each other. In
fact, Shmueli (2010) has shown that structurally “less true” models can yield better predictive performance
than the “true” explanatory model.
We should also point out that recent advances in machine learning and data mining have mostly occurred in
Quadrant 2, and thus disproportionately benefitted predictive modeling. Unfortunately, most
machine-learned models are also remarkably difficult to interpret in terms of their structural meaning, so
new theories are rarely generated this way. For instance, the well-known Netflix Prize competition
generated phenomenally performing predictive models, but they yielded little explanatory insight into
consumers’ choices.
Conversely, in Quadrant 4, deliberately machine-learning explanatory models remains rather difficult. As
opposed to Quadrant 2, the availability of ever-increasing amounts of data is not even a major advantage in
the discovery of theory through machine learning.

Bayesian Networks: Theory and Data

With regard to the observed division (horizontal in our map) between Theory and Data as a model source,
Bayesian networks have a special role. Bayesian networks can be built from human knowledge, i.e. from
theory, or, they can be machine learned from data. Thus they cover the entire spectrum in terms of their
model source.
Also, due to their graphical structure, machine-learned Bayesian networks are intuitively interpretable, thus
facilitating human learning and theory building. As emphasized by the bidirectional arc in Figure 2,
Bayesian networks allow human learning and machine learning to interact efficiently. This way, Bayesian
networks can be developed from a combination of human and artificial intelligence.
[Figure: along the Model Source axis, Bayesian Networks span from Theory (Parametric, Human Intelligence,
Human Learning) to Data (Algorithmic, Artificial Intelligence, Machine Learning).]
Figure 2: Bayesian Networks Spanning Theory and Data

Bayesian Networks: Association and Causation

Beyond transcending the boundaries between theory and data (on the y-axis of our map), Bayesian
networks have unique properties when it comes to causality. Under certain conditions and with specific
theory-driven assumptions, Bayesian networks can perform causal inference. Bayesian network models
can therefore cover the entire range from association to causation, spanning the entire x-axis of our
analytics map.
[Figure: the analytics map with quadrants Q1 to Q4; Bayesian Networks, combined with Causal Assumptions,
span both the Model Source axis (Theory to Data) and the Model Purpose axis (Correlation to Causation).]
Figure 3: Bayesian Networks Spanning Data to Theory, and Association to Causation
As a result, Bayesian networks are a highly versatile modeling framework, making them suitable for many
problem domains. The mathematical formalism underpinning the Bayesian network paradigm will be
presented in the next chapter, Bayesian Network Theory.

2. Bayesian Network Theory²

Probabilistic models based on directed acyclic graphs have a long and rich tradition, beginning with the
work of geneticist Sewall Wright in the 1920s. Variants have appeared in many fields. Within statistics, such
models are known as directed graphical models; within cognitive science and artificial intelligence (AI), such
models are known as Bayesian networks. The name honors the Rev. Thomas Bayes (1702-1761), whose
rule for updating probabilities in the light of new evidence is the foundation of the approach.
Rev. Bayes addressed both the case of discrete probability distributions of data and the more complicated
case of continuous probability distributions. In the discrete case, Bayes’ theorem relates the conditional and
marginal probabilities of events A and B, provided that the probability of B does not equal zero:
$$P(A \mid B) = \frac{P(B \mid A)\,P(A)}{P(B)}$$
In Bayes’ theorem, each probability has a conventional name:
• P(A) is the prior probability (or “unconditional” or “marginal” probability) of A. It is “prior” in the sense
that it does not take into account any information about B; however, the event B need not occur after
event A. In the nineteenth century, the unconditional probability P(A) in Bayes’ rule was called the
“antecedent” probability; in deductive logic, the antecedent set of propositions and the inference rule imply
consequences. The unconditional probability P(A) was called “a priori” by Ronald A. Fisher.
• P(A|B) is the conditional probability of A, given B. It is also called the posterior probability because it is
derived from or depends upon the specified value of B.
• P(B|A) is the conditional probability of B given A. It is also called the likelihood.
• P(B) is the prior or marginal probability of B, and acts as a normalizing constant.
Bayes’ theorem in this form gives a mathematical representation of how the conditional probability of event
A given B is related to the converse conditional probability of B given A.
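As a numerical illustration of the theorem, the following minimal Python sketch computes a posterior from a prior and two likelihoods. The event labels and all numbers are hypothetical and chosen only for illustration; the function name `posterior` is likewise an assumption. P(B) is obtained from the law of total probability for a binary event A.

```python
def posterior(prior_a, likelihood_b_given_a, likelihood_b_given_not_a):
    """Return P(A|B) via Bayes' theorem for a binary event A."""
    # P(B) by the law of total probability; it acts as the normalizing constant.
    p_b = (likelihood_b_given_a * prior_a
           + likelihood_b_given_not_a * (1.0 - prior_a))
    if p_b == 0.0:
        raise ValueError("P(B) must not equal zero")
    return likelihood_b_given_a * prior_a / p_b


# Hypothetical numbers: A = "condition present", B = "test positive".
p_a_given_b = posterior(prior_a=0.01,
                        likelihood_b_given_a=0.95,
                        likelihood_b_given_not_a=0.05)
print(round(p_a_given_b, 3))
```

With these made-up inputs the posterior is roughly 0.16, far below the likelihood P(B|A) of 0.95, because the prior P(A) is small.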
The initial development of Bayesian networks in the late 1970s was motivated by the need to model the
top-down (semantic) and bottom-up (perceptual) combination of evidence in reading. The capability for
bidirectional inferences, combined with a rigorous probabilistic foundation, led to the rapid emergence of
Bayesian networks as the method of choice for uncertain reasoning in AI and expert systems, replacing
earlier, ad hoc rule-based schemes.
2 For the technical portion of this introduction, we defer to the words of Judea Pearl, who originally coined the term
“Bayesian network.” We are grateful to him for allowing us to use and adapt large sections from one of his technical
reports for our purposes (Pearl and Russell, 2000).
The nodes in a Bayesian network represent variables of interest (e.g. the temperature of a device, the gender
of a patient, a feature of an object, the occurrence of an event) and the links represent statistical
(informational)³ or causal dependencies among the variables. The dependencies are quantified by conditional
probabilities for each node given its parents in the network. The network supports the computation of the
posterior probabilities of any subset of variables given evidence about any other subset.
A Bayesian Network Example

Figure 4 shows a very simple Bayesian network consisting of only two nodes and one link, representing the
joint probability distribution of the variables Eye Color and Hair Color in a given population. In this case,
the conditional probabilities of Hair Color given the values of its parent, Eye Color, are provided in a table.
It is important to point out that this Bayesian network does not contain any causal assumptions, i.e. we
have no knowledge of the causal order between the variables, so the interpretation here should be merely
statistical (informational).
Figure 4: A Bayesian network representing the statistical relationship between two variables.
Figure 5 illustrates another simple yet typical Bayesian network. In contrast to the statistical relationships in
Figure 4, the diagram in Figure 5 describes the causal relationships among the season of the year (X1),
whether it is raining (X2), whether the sprinkler is on (X3), whether the pavement is wet (X4), and whether
the pavement is slippery (X5). Here the absence of a direct link between X1 and X5, for example, captures
our understanding that there is no direct influence of season on slipperiness — the influence is mediated by
the wetness of the pavement (if freezing is a possibility then a direct link could be added).
3 “informational” and “statistical” are treated here as equivalent concepts and can be used interchangeably.

Figure 5: A Bayesian network representing causal influences among five variables
Perhaps the most important aspect of Bayesian networks is that they are direct representations of the
world, not of reasoning processes. The arrows in the diagram represent real causal connections and not the
flow of information during reasoning (as in rule-based systems and neural networks). Reasoning processes
can operate on Bayesian networks by propagating information in any direction. For example, if the
sprinkler is on, then the pavement is probably wet (deduction, prediction, simulation); if someone slips on the
pavement, that also provides evidence that it is wet (abduction, reasoning to a probable cause or diagnosis).
On the other hand, if we see that the pavement is wet, that makes it more likely that the sprinkler is on or
that it is raining (abduction); but if we then observe that the sprinkler is on, that reduces the likelihood that
it is raining (explaining away). It is this last form of reasoning, explaining away, that is especially difficult to
model in rule-based systems and neural networks in any natural way, because it seems to require the
propagation of information in two directions.
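The explaining-away pattern can be reproduced numerically. The sketch below encodes the five-variable network of Figure 5 in Python and answers queries by brute-force enumeration of the joint distribution. All conditional probabilities are hypothetical values invented for illustration (the paper does not give any), the season is simplified to two states, and the helper names (`bern`, `joint`, `prob`) are assumptions of this sketch.

```python
from itertools import product

# Hypothetical conditional probability tables for the five-node example.
P_SEASON = {"dry": 0.5, "rainy": 0.5}                # P(X1)
P_RAIN = {"dry": 0.1, "rainy": 0.7}                  # P(X2 = true | X1)
P_SPRINKLER = {"dry": 0.6, "rainy": 0.1}             # P(X3 = on | X1)
P_WET = {(False, False): 0.0, (True, False): 0.9,    # P(X4 = true | X2, X3),
         (False, True): 0.8, (True, True): 0.99}     # keyed by (rain, sprinkler)
P_SLIPPERY = {False: 0.0, True: 0.7}                 # P(X5 = true | X4)

VARIABLES = ("season", "rain", "sprinkler", "wet", "slippery")
DOMAINS = (("dry", "rainy"),) + ((False, True),) * 4


def bern(p_true, value):
    """Probability that a binary variable takes `value`, given P(true)."""
    return p_true if value else 1.0 - p_true


def joint(season, rain, sprinkler, wet, slippery):
    """Joint probability as the product of the five local distributions."""
    return (P_SEASON[season]
            * bern(P_RAIN[season], rain)
            * bern(P_SPRINKLER[season], sprinkler)
            * bern(P_WET[(rain, sprinkler)], wet)
            * bern(P_SLIPPERY[wet], slippery))


def prob(query, evidence=None):
    """P(query | evidence) by summing the joint over all consistent worlds."""
    evidence = evidence or {}
    numerator = denominator = 0.0
    for values in product(*DOMAINS):
        world = dict(zip(VARIABLES, values))
        if any(world[k] != v for k, v in evidence.items()):
            continue
        p = joint(**world)
        denominator += p
        if all(world[k] == v for k, v in query.items()):
            numerator += p
    return numerator / denominator


p_rain = prob({"rain": True})
p_rain_given_wet = prob({"rain": True}, {"wet": True})
p_rain_given_wet_and_sprinkler = prob({"rain": True},
                                      {"wet": True, "sprinkler": True})
print(p_rain, p_rain_given_wet, p_rain_given_wet_and_sprinkler)
```

Under these assumed numbers, observing wet pavement raises the probability of rain above its prior, and additionally observing that the sprinkler is on lowers it again, which is the explaining-away effect described above.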
A Dynamic Bayesian Network Example

Entities that live in a changing environment must keep track of variables whose values change over time.
Dynamic Bayesian networks capture this process by representing multiple copies of the state variables, one
for each time step. A set of variables X_t denotes the world state at time t and a set of variables E_t denotes
the observations available at time t. The observation model P(E_t | X_t) is encoded in the conditional
probability distributions for the observable variables, given the state variables. The transition model
P(X_{t+1} | X_t) relates the state at time t to the state at time t+1. Keeping track of the world means
computing the current probability distribution over world states given all past observations, i.e.,
P(X_t | E_1, …, E_t). Dynamic Bayesian networks are strictly more expressive than other temporal probability
models such as hidden Markov models and Kalman filters.
Figure 6: Dynamic Bayesian Network (two consecutive time slices, time step t and time step t+1)
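"Keeping track of the world" can be sketched as a recursive predict-and-update loop. The Python example below assumes the simplest possible dynamic network, with a single binary state variable and a single binary observation per time step; the transition and observation probabilities are hypothetical, as are the names `filter_step` and `filter_all`.

```python
STATES = (True, False)  # hypothetical binary state, e.g. True = "raining"

PRIOR = {True: 0.5, False: 0.5}                 # P(X_0)
# TRANSITION[x_t][x_next] = P(X_{t+1} = x_next | X_t = x_t)
TRANSITION = {True: {True: 0.7, False: 0.3},
              False: {True: 0.3, False: 0.7}}
# OBSERVATION[x_t][e_t] = P(E_t = e_t | X_t = x_t)
OBSERVATION = {True: {True: 0.9, False: 0.1},
               False: {True: 0.2, False: 0.8}}


def filter_step(belief, evidence):
    """One time step: predict with the transition model, then condition on E_t."""
    predicted = {x_next: sum(belief[x] * TRANSITION[x][x_next] for x in STATES)
                 for x_next in STATES}
    unnormalized = {x: OBSERVATION[x][evidence] * predicted[x] for x in STATES}
    normalizer = sum(unnormalized.values())
    return {x: p / normalizer for x, p in unnormalized.items()}


def filter_all(observations):
    """Return P(X_t | E_1, ..., E_t) for every t, given a list of observations."""
    belief = dict(PRIOR)
    history = []
    for evidence in observations:
        belief = filter_step(belief, evidence)
        history.append(belief)
    return history


beliefs = filter_all([True, True, False])
for t, belief in enumerate(beliefs, start=1):
    print(t, round(belief[True], 3))
```

Each belief depends only on the previous belief and the newest observation, so the cost per time step stays constant no matter how long the observation history grows.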
Probabilistic Semantics
Any complete probabilistic model of a domain must, either explicitly or implicitly, represent the joint
probability distribution, i.e. the probability of every possible event as defined by the combination of the
values of all the variables. There are exponentially many such events, yet Bayesian networks achieve
compactness by factoring the joint distribution into local, conditional distributions for each variable given
its parents. If x_i denotes some value of the variable X_i and pa_i denotes some set of values for the parents
of X_i, then P(x_i | pa_i) denotes this conditional distribution. For example, P(x_4 | x_2, x_3) is the
probability of wetness given the values of sprinkler and rain. The global semantics of Bayesian networks
specifies that the full joint distribution is given by the product
$$P(x_1, \ldots, x_n) = \prod_i P(x_i \mid pa_i) \tag{1}$$
In our example network, we have

$$P(x_1, x_2, x_3, x_4, x_5) = P(x_1)\,P(x_2 \mid x_1)\,P(x_3 \mid x_1)\,P(x_4 \mid x_2, x_3)\,P(x_5 \mid x_4) \tag{2}$$

It becomes clear that the number of conditional probability distributions grows linearly with the size of the
network, i.e. the number of variables; however, the size of each conditional probability distribution grows
exponentially with the number of parents of its variable. Further savings can be achieved using
compact parametric representations, such as noisy-OR models, decision trees, or neural networks, for
the conditional distributions.
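The saving can be made concrete by counting free parameters. The following Python sketch uses the parent sets of the five-variable example; the number of states per variable is a hypothetical choice (four seasons, all other variables binary), and the function name is an assumption.

```python
from math import prod

# Hypothetical number of states per variable (e.g. four seasons, rest binary).
CARDINALITY = {"X1": 4, "X2": 2, "X3": 2, "X4": 2, "X5": 2}
# Parent sets taken from the factorization of the example network.
PARENTS = {"X1": [], "X2": ["X1"], "X3": ["X1"], "X4": ["X2", "X3"], "X5": ["X4"]}


def free_parameters(node):
    """Free parameters of P(node | parents): (states - 1) per parent configuration."""
    parent_configurations = prod(CARDINALITY[p] for p in PARENTS[node])
    return (CARDINALITY[node] - 1) * parent_configurations


factored = sum(free_parameters(node) for node in CARDINALITY)
full_joint = prod(CARDINALITY.values()) - 1
print(factored, full_joint)
```

Under these assumptions the factored form needs 17 free parameters, compared with 63 for an unstructured joint table over the same five variables; the gap widens quickly as variables are added while parent sets stay small.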
There is also an entirely equivalent local semantics, which asserts that each variable is independent of its
nondescendants in the network given its parents. For example, the parents of X4 in Figure 7 are X2 and X3
and they render X4 independent of the remaining nondescendant, X1. That is,
$$P(x_4 \mid x_1, x_2, x_3) = P(x_4 \mid x_2, x_3)$$
Figure 7: Variable X4 is independent of its non-descendants, in this case X1, given its parents, X3 and X2
The collection of independence assertions formed in this way suffices to derive the global assertion in
Equation 1, and vice versa. The local semantics is most useful in constructing Bayesian networks, because
selecting as parents all the direct causes (or direct relationships) of a given variable invariably satisfies the
local conditional independence conditions. The global semantics leads directly to a variety of algorithms for
reasoning.

Evidential Reasoning
From the product specification in Equation 1 one can express the probability of any desired proposition in
terms of the conditional probabilities specified in the network. For example, the probability that the
sprinkler is on given that the pavement is slippery is
$$
\begin{aligned}
P(X_3 = \text{on} \mid X_5 = \text{true})
&= \frac{P(X_3 = \text{on}, X_5 = \text{true})}{P(X_5 = \text{true})} \\
&= \frac{\sum_{x_1, x_2, x_4} P(x_1, x_2, X_3 = \text{on}, x_4, X_5 = \text{true})}
        {\sum_{x_1, x_2, x_3, x_4} P(x_1, x_2, x_3, x_4, X_5 = \text{true})} \\
&= \frac{\sum_{x_1, x_2, x_4} P(x_1)\,P(x_2 \mid x_1)\,P(X_3 = \text{on} \mid x_1)\,P(x_4 \mid x_2, X_3 = \text{on})\,P(X_5 = \text{true} \mid x_4)}
        {\sum_{x_1, x_2, x_3, x_4} P(x_1)\,P(x_2 \mid x_1)\,P(x_3 \mid x_1)\,P(x_4 \mid x_2, x_3)\,P(X_5 = \text{true} \mid x_4)}
\end{aligned}
$$
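The two sums in the last line of this expression can be evaluated directly. The Python sketch below does so for the five-variable example; every conditional probability is a hypothetical value chosen for illustration, the season is simplified to two states, and the function names `p1` to `p5` are assumptions that mirror the five factors.

```python
from itertools import product

SEASONS = ("dry", "rainy")
BOOL = (False, True)


def bern(p_true, value):
    """Probability that a binary variable takes `value`, given P(true)."""
    return p_true if value else 1.0 - p_true


# Hypothetical local distributions, one function per factor of the joint.
def p1(x1):
    return {"dry": 0.5, "rainy": 0.5}[x1]

def p2(x2, x1):
    return bern({"dry": 0.1, "rainy": 0.7}[x1], x2)

def p3(x3, x1):
    return bern({"dry": 0.6, "rainy": 0.1}[x1], x3)

def p4(x4, x2, x3):
    table = {(False, False): 0.0, (True, False): 0.9,
             (False, True): 0.8, (True, True): 0.99}
    return bern(table[(x2, x3)], x4)

def p5(x5, x4):
    return bern(0.7 if x4 else 0.0, x5)


def joint(x1, x2, x3, x4, x5):
    """Product of the five local conditional distributions."""
    return p1(x1) * p2(x2, x1) * p3(x3, x1) * p4(x4, x2, x3) * p5(x5, x4)


# Numerator: sum over x1, x2, x4 with X3 = on and X5 = true held fixed.
numerator = sum(joint(x1, x2, True, x4, True)
                for x1, x2, x4 in product(SEASONS, BOOL, BOOL))
# Denominator: sum over x1, x2, x3, x4 with only X5 = true held fixed.
denominator = sum(joint(x1, x2, x3, x4, True)
                  for x1, x2, x3, x4 in product(SEASONS, BOOL, BOOL, BOOL))

p_sprinkler_given_slippery = numerator / denominator
print(round(p_sprinkler_given_slippery, 4))
```

Enumeration of this kind touches every combination of the summed-out variables, so it is practical only for small networks; it is meant to show what the formula computes, not how inference engines compute it.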
The first algorithms proposed for probabilistic calculations in Bayesian networks used a local distributed
message-passing architecture, typical of many cognitive activities. Initially this approach was limited to
tree-structured networks, but was later extended to general networks in Lauritzen and Spiegelhalter’s (1988)
method of junction tree propagation. A number of other exact methods have been developed and can be
found in recent textbooks.
It is easy to show that reasoning in Bayesian networks subsumes the satisfiability problem in propositional
logic and hence is NP-hard. Monte Carlo simulation methods can be used for approximate inference (Pearl,
1988), giving gradually improving estimates as sampling proceeds. Unlike junction tree methods, these
methods use local message propagation on the original network structure. Alternatively, variational methods
provide bounds on the true probability.
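To illustrate how sampling yields gradually improving estimates, the Python sketch below uses rejection sampling: it draws complete samples from the network in topological order and discards those that contradict the evidence. This is the simplest Monte Carlo scheme and stands in for, rather than reproduces, the message-propagation-based simulation cited above. All probabilities are hypothetical, and the function names are assumptions.

```python
import random

# Hypothetical P(wet = true | rain, sprinkler), keyed by (rain, sprinkler).
P_WET = {(False, False): 0.0, (True, False): 0.9,
         (False, True): 0.8, (True, True): 0.99}


def sample_world(rng):
    """Draw one complete sample, visiting each node after its parents."""
    season = "dry" if rng.random() < 0.5 else "rainy"
    rain = rng.random() < (0.1 if season == "dry" else 0.7)
    sprinkler = rng.random() < (0.6 if season == "dry" else 0.1)
    wet = rng.random() < P_WET[(rain, sprinkler)]
    slippery = rng.random() < (0.7 if wet else 0.0)
    return {"season": season, "rain": rain, "sprinkler": sprinkler,
            "wet": wet, "slippery": slippery}


def estimate_sprinkler_given_slippery(n_samples, seed=0):
    """Estimate P(sprinkler = on | slippery = true) by rejection sampling."""
    rng = random.Random(seed)
    accepted = hits = 0
    for _ in range(n_samples):
        world = sample_world(rng)
        if not world["slippery"]:
            continue  # sample contradicts the evidence, so reject it
        accepted += 1
        hits += world["sprinkler"]
    if accepted == 0:
        raise ValueError("no sample matched the evidence; draw more samples")
    return hits / accepted


for n in (100, 1_000, 10_000):
    print(n, round(estimate_sprinkler_given_slippery(n), 3))
```

With these assumed probabilities the exact answer is about 0.492, and the estimates should settle near that value as the number of samples grows. Rejection sampling wastes every sample that contradicts the evidence, which is why more refined schemes are preferred when the evidence is improbable.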
Causal Reasoning
Most probabilistic models, including general Bayesian networks, describe a distribution over possible
observed events (as in Equation 1) but say nothing about what will happen if a certain intervention
occurs. For example, what if I turn the sprinkler on instead of just observing that it is turned on? What effect
does that have on the season, or on the connection between wetness and slipperiness? A causal network,
intuitively speaking, is a Bayesian network with the added property that the parents of each node are its
direct causes, as in Figure 5. In such a network, the result of an intervention is obvious: the sprinkler node
is set to X3 = on and the causal link between the season X1 and the sprinkler X3 is removed (see Figure 8).
All other causal links and conditional probabilities remain intact, so the new model is
$$P(x_1, x_2, x_4, x_5) = P(x_1)\,P(x_2 \mid x_1)\,P(x_4 \mid x_2, X_3 = \text{on})\,P(x_5 \mid x_4).$$
Notice that this differs from observing that X3=on, which would result in a new model that included the
term P(X3=on|x1). This mirrors the difference between seeing and doing: after observing that the sprinkler is
on, we wish to infer that the season is dry, that it probably did not rain, and so on; an arbitrary decision to
turn the sprinkler on should not result in any such beliefs.
Figure 8: A causal network reflecting the intervention, X3=on
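The difference between seeing and doing can be checked numerically. In the Python sketch below, an intervention on the sprinkler replaces the factor P(x3 | x1) with a point mass on the chosen value, which corresponds to deleting the link from season to sprinkler. All probabilities are hypothetical, the season is simplified to two states, and the `do_sprinkler` argument is a naming choice of this sketch.

```python
from itertools import product

P_SEASON = {"dry": 0.5, "rainy": 0.5}                # hypothetical P(X1)
P_RAIN = {"dry": 0.1, "rainy": 0.7}                  # P(X2 = true | X1)
P_SPRINKLER = {"dry": 0.6, "rainy": 0.1}             # P(X3 = on | X1)
P_WET = {(False, False): 0.0, (True, False): 0.9,    # P(X4 = true | X2, X3)
         (False, True): 0.8, (True, True): 0.99}
P_SLIPPERY = {False: 0.0, True: 0.7}                 # P(X5 = true | X4)

VARIABLES = ("season", "rain", "sprinkler", "wet", "slippery")
DOMAINS = (("dry", "rainy"),) + ((False, True),) * 4


def bern(p_true, value):
    return p_true if value else 1.0 - p_true


def joint(season, rain, sprinkler, wet, slippery, do_sprinkler=None):
    if do_sprinkler is None:
        # Observational model: the sprinkler still depends on the season.
        p_sprinkler = bern(P_SPRINKLER[season], sprinkler)
    else:
        # Intervention: link X1 -> X3 removed, X3 forced to the chosen value.
        p_sprinkler = 1.0 if sprinkler == do_sprinkler else 0.0
    return (P_SEASON[season] * bern(P_RAIN[season], rain) * p_sprinkler
            * bern(P_WET[(rain, sprinkler)], wet)
            * bern(P_SLIPPERY[wet], slippery))


def prob(query, evidence=None, do_sprinkler=None):
    """P(query | evidence), optionally under an intervention on the sprinkler."""
    evidence = evidence or {}
    numerator = denominator = 0.0
    for values in product(*DOMAINS):
        world = dict(zip(VARIABLES, values))
        if any(world[k] != v for k, v in evidence.items()):
            continue
        p = joint(**world, do_sprinkler=do_sprinkler)
        denominator += p
        if all(world[k] == v for k, v in query.items()):
            numerator += p
    return numerator / denominator


seeing_dry = prob({"season": "dry"}, {"sprinkler": True})
doing_dry = prob({"season": "dry"}, do_sprinkler=True)
seeing_rain = prob({"rain": True}, {"sprinkler": True})
doing_rain = prob({"rain": True}, do_sprinkler=True)
print(seeing_dry, doing_dry, seeing_rain, doing_rain)
```

Under these assumed numbers, observing the sprinkler on shifts belief toward a dry season and away from rain, whereas switching it on by intervention leaves both at their prior values.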
Causal networks are more properly defined, then, as Bayesian networks in which the correct probability
model after intervening to fix any node’s value is given simply by deleting links from the node’s parents. For
example, Fire → Smoke is a causal network whereas Smoke → Fire is not, even though both networks are
equally capable of representing any joint distribution on the two variables. Causal networks model the
environment as a collection of stable component mechanisms. These mechanisms may be reconfigured locally
by interventions, with correspondingly local changes in the model. This, in turn, allows causal networks to be
used very naturally for prediction by an agent that is considering various courses of action.
Learning Bayesian Networks
In machine learning approaches, the conditional probabilities P(x_i | pa_i) are typically estimated with the
maximum likelihood approach (observed frequencies in the dataset). In pure Bayesian approaches, models
are designed by expertise and include hyperparameter nodes. Data (usually scarce) is used as pieces of
evidence set in the networks for incrementally updating the distributions of the hyperparameters (Bayesian
updating).
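For discrete variables, maximum likelihood estimation amounts to counting. The Python sketch below estimates P(child | parent) for a two-node network such as the Eye Color and Hair Color example; the records are invented purely for illustration, and `learn_cpt` is a hypothetical helper name.

```python
from collections import Counter

# Hypothetical records of (eye color, hair color); purely illustrative data.
records = ([("blue", "blond")] * 3 + [("blue", "brown")] * 1
           + [("brown", "brown")] * 3 + [("brown", "black")] * 2
           + [("brown", "blond")] * 1)


def learn_cpt(records):
    """Maximum likelihood estimate of P(child | parent) from observed frequencies."""
    pair_counts = Counter(records)
    parent_counts = Counter(parent for parent, _ in records)
    cpt = {}
    for (parent, child), count in pair_counts.items():
        cpt.setdefault(parent, {})[child] = count / parent_counts[parent]
    return cpt


cpt = learn_cpt(records)
for parent, row in cpt.items():
    print(parent, row)
```

Combinations that never occur in the data receive no entry at all, i.e. an implicit probability of zero, which is one reason why scarce data may call for the Bayesian updating approach mentioned above.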
It is also possible to machine learn the structure of a Bayesian network, and two families of methods are
available for that purpose. The first one, the constraint-based algorithms, is based on the probabilistic
semantics of Bayesian networks. Links are added or deleted according to the results of statistical tests, which
identify marginal and conditional independencies. The second approach, the score-based algorithms, is
based on a metric measuring the quality of candidate networks with respect to the observed data. This
metric trades off network complexity against degree of fit to the data, typically expressed as the likelihood of
the data given the network.
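The trade-off made by a score-based method can be sketched for the smallest possible case: two binary variables and two candidate structures, with or without a link. The Python example below uses a BIC-style penalized log-likelihood, which is one common choice of metric and not necessarily the one used by any particular software package. The datasets and function names are hypothetical.

```python
from collections import Counter
from math import log


def loglik_independent(data):
    """Maximized log-likelihood of the structure with no link: P(a) P(b)."""
    n = len(data)
    count_a = Counter(a for a, _ in data)
    count_b = Counter(b for _, b in data)
    return sum(log(count_a[a] / n) + log(count_b[b] / n) for a, b in data)


def loglik_linked(data):
    """Maximized log-likelihood of the structure A -> B: P(a) P(b | a)."""
    n = len(data)
    count_a = Counter(a for a, _ in data)
    count_ab = Counter(data)
    return sum(log(count_a[a] / n) + log(count_ab[(a, b)] / count_a[a])
               for a, b in data)


def score(loglik, n_free_parameters, n_records):
    """BIC-style score: fit minus a complexity penalty (higher is better)."""
    return loglik - 0.5 * n_free_parameters * log(n_records)


def best_structure(data):
    n = len(data)
    scores = {
        "no link": score(loglik_independent(data), 2, n),  # P(A), P(B)
        "A -> B": score(loglik_linked(data), 3, n),        # P(A), P(B | A)
    }
    return max(scores, key=scores.get), scores


# Hypothetical datasets of (a, b) records.
dependent = [(0, 0)] * 40 + [(0, 1)] * 10 + [(1, 0)] * 10 + [(1, 1)] * 40
independent = [(0, 0)] * 25 + [(0, 1)] * 25 + [(1, 0)] * 25 + [(1, 1)] * 25
print(best_structure(dependent)[0])
print(best_structure(independent)[0])
```

The linked structure always fits at least as well as the unlinked one, so without the penalty term it would never lose; the penalty is what allows the simpler structure to be selected when the extra link explains nothing.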
As a substrate for learning, Bayesian networks have the advantage that it is relatively easy to encode prior
knowledge in network form, e.g. by fixing portions of the structure or defining forbidden arcs.
Causal Discovery
One of the most exciting prospects in recent years has been the possibility of using Bayesian networks to
discover causal structures in raw statistical data, a task previously considered impossible without
controlled experiments. Consider, for example, the following intransitive pattern of dependencies among three
events: A and B are dependent, B and C are dependent, yet A and C are independent. If you ask a person to
supply an example of three such events, the example would invariably portray A and C as two independent
causes and B as their common effect, namely, A → B ← C. (For instance A and C could be the outcomes of
two fair coins and B represents a bell that rings whenever either coin comes up heads.)

Figure 9: Causal model for variables A, C and B, representing two fair coins and a bell respectively
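The dependence pattern of the coins-and-bell example can be verified exactly by enumerating the four equally likely outcomes. The Python sketch below uses exact fractions; the helper names are assumptions of this sketch.

```python
from fractions import Fraction
from itertools import product

# Each world is (a, c, b): two fair coins A and C, and a bell B that rings
# whenever either coin comes up heads (True = heads / bell rings).
WORLDS = {(a, c, a or c): Fraction(1, 4)
          for a, c in product((True, False), repeat=2)}


def prob(event, given=lambda world: True):
    """P(event | given) over the four equally likely worlds."""
    total = sum(p for w, p in WORLDS.items() if given(w))
    hits = sum(p for w, p in WORLDS.items() if given(w) and event(w))
    return Fraction(hits) / total


def a_heads(world):
    return world[0]

def c_heads(world):
    return world[1]

def bell(world):
    return world[2]

def both(first, second):
    return lambda world: first(world) and second(world)


# A and C are marginally independent ...
a_c_independent = prob(both(a_heads, c_heads)) == prob(a_heads) * prob(c_heads)
# ... but each of them is dependent on B ...
a_b_dependent = prob(both(a_heads, bell)) != prob(a_heads) * prob(bell)
c_b_dependent = prob(both(c_heads, bell)) != prob(c_heads) * prob(bell)
# ... and A and C become dependent once B is observed.
a_c_dependent_given_bell = (prob(both(a_heads, c_heads), bell)
                            != prob(a_heads, bell) * prob(c_heads, bell))
print(a_c_independent, a_b_dependent, c_b_dependent, a_c_dependent_given_bell)
```

All four checks hold, which is the signature of a common-effect structure A → B ← C: the two causes are independent until their shared effect is observed.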
Fitting this dependence pattern with a scenario in which B is the cause and A and C are the effects is
mathematically feasible but very unnatural, because it must entail fine tuning of the probabilities involved;
the desired dependence pattern will be destroyed as soon as the probabilities undergo a slight change.
Such thought experiments tell us that certain patterns of dependency, which are totally void of temporal
information, are conceptually characteristic of certain causal directionalities and not others. When put
together systematically, such patterns can be used to infer causal structures from raw data and to guarantee
that any alternative structure compatible with the data must be less stable than the one(s) inferred; namely,
slight fluctuations in parameters will render that structure incompatible with the data.

3. Bayesian Networks and BayesiaLab in Practice

While the conceptual advantages mentioned in the previous chapter have been known in the world of
academia for some time, prior to BayesiaLab’s first release in 2001, leveraging these properties for practical
research applications was virtually impossible for non-computer scientists.
BayesiaLab 5.2
BayesiaLab is a powerful desktop application (Windows/Mac/Unix) with a highly sophisticated graphical
user interface, which provides scientists a comprehensive “lab” environment for machine learning,
knowledge modeling, diagnosis, analysis, simulation, and optimization. With BayesiaLab, Bayesian networks
have become a powerful and practical tool to gain deep understanding of high-dimensional domains. It
leverages the inherently graphical structure of Bayesian networks for exploring and explaining complex problems.
Figure 10: BayesiaLab 5.2 Professional Screenshot
BayesiaLab is the result of nearly twenty years of software development by Dr. Lionel Jouffe and Dr. Paul
Munteanu. In 2001, their research efforts led to the formation of Bayesia S.A.S., headquartered in Laval in
northwestern France. Today, the company has grown to become the leading supplier of Bayesian
network-related technologies for hundreds of major corporations around the world.

The BayesiaLab Workflow

BayesiaLab is designed around the Bayesian network paradigm, as is illustrated in Figure 11. It covers the
entire research workflow from model generation to analysis, simulation, and optimization. The entire
research workflow is fully contained in a single “lab” environment, which provides analysts the ultimate
flexibility in moving back and forth between different elements of the research task.
[Figure: a Bayesian network at the center, built from Expert Knowledge (Knowledge Modeling) and Data
(Knowledge Discovery), and supporting Analytics, Simulation, Diagnosis, Optimization, Decision Support,
and Risk Management.]
Figure 11: BayesiaLab Workflow with Bayesian Networks at its core.

BayesiaLab in Context
In Chapter 1, we presented, at a conceptual level, that Bayesian networks can cover the entire map of
analytics. Figure 12 shows what this means in practice for the researcher. BayesiaLab’s functions,
represented as blue boxes, are positioned across this map, and they demonstrate the universal applicability
of the Bayesian network paradigm.
[Figure: “Implementing Bayesian Networks with BayesiaLab”. The analytics map (Model Source versus
Modeling Purpose, quadrants Q1 to Q4) with BayesiaLab function boxes, including Supervised Learning,
Unsupervised Structural Learning, Data Clustering, Variable Clustering, Probabilistic Structural Equation
Models, Parameter Learning, Total & Direct Effects Analysis, Target Optimization, Bayesian Updating,
Knowledge Modeling, and Influence Diagrams.]
Figure 12: BayesiaLab functions positioned on the analytics map.

BayesiaLab’s Features and Functions in Practice

In Chapter 1 we presented a very conceptual context for Bayesian networks; Chapter 2 provided a
theoretical rationale. In this last section of Chapter 3, we switch to a less formal narrative that connects
researchers’ problem-solving needs to specific BayesiaLab functions. We link features and functions to their
corresponding quadrant on the analytics map, where applicable.
Knowledge Modeling (Quadrant 4)

In today’s business environment that strives to be data-driven, old-fashioned expert knowledge is sometimes
perceived as merely qualitative or seen as “soft” knowledge. With billions of “hard” data points being
accumulated every second, what cannot be counted may not count for much these days. A lifetime of
experience in any particular domain may appear insignificant in comparison to the huge quantities of newly
generated data.
This mindset has a critical flaw, which is that causal relationships remain difficult to machine-learn
from data. Rather, causal reasoning typically requires some form of assumptions, i.e. assumptions coming
from human knowledge.
Experts often express causal paths in the form of graphs. This visual representation of causes and effects
has a direct analogue in the network graph in BayesiaLab’s graph panel. Nodes (representing variables) can
be added and positioned with a mouse-click, and arcs (representing relationships) can be “drawn” between
nodes. The causal direction is simply encoded in the direction of the arc.
The quantitative nature of dependencies, plus many other attributes, can be managed in the Node Editor,
which is available by right-clicking any node.
BayesiaLab thus facilitates intuitively encoding one’s own understanding of a domain with a minimum of
effort. Simultaneously it enforces internal consistency, so that no impossible conditions are accidentally
encoded.
In addition to allowing users to directly encode their explicit knowledge by drawing a network in the graph
panel, the Bayesia Expert Knowledge Elicitation Environment (BEKEE) is available as an extension to
BayesiaLab. It allows the systematic elicitation of both the explicit and the tacit knowledge of a group of
experts during brainstorming sessions.

Knowledge Discovery with Machine Learning (Quadrant 3)

Despite our emphasis on the relevance of human expert knowledge, especially for identifying causal
relations, there is no doubt that there is a lot to learn from data, regardless of whether the data is
sparse or “big”. BayesiaLab features a comprehensive array of highly optimized learning algorithms that can
quickly uncover previously unknown structures in datasets. This applies whether you have a handful of
variables or thousands of variables with millions of potentially relevant relationships.
Reasoning Under Uncertainty

Based on a Bayesian network, BayesiaLab can re-
liably carry out inference with multiple pieces of
uncertain and even conflicting evidence. The inher-
ent ability of Bayesian networks to facilitate com-
putations under uncertainty makes them highly
suitable for a wide range of real-world applica-
tions. Reasoning under uncertainty applies in two ways:
• Diagnosis (inference from effect to cause)
• Simulation (inference from cause to effect)
Maintaining uncertainty during inference automatically prevents presenting potentially misleading point
estimates.
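The two directions of inference can be illustrated with a minimal sketch on a two-node network, Disease → Symptom. The network, the function names and all probabilities below are hypothetical and chosen only for illustration; they are not taken from BayesiaLab.

```python
# Hypothetical two-node network: Disease -> Symptom (all numbers are made up).
p_disease = 0.01                              # prior P(Disease = yes)
p_symptom_given = {True: 0.90, False: 0.05}   # P(Symptom = yes | Disease)


def simulate(disease: bool) -> float:
    """Cause to effect: P(Symptom = yes | Disease = disease), read from the CPT."""
    return p_symptom_given[disease]


def diagnose(symptom: bool) -> float:
    """Effect to cause: P(Disease = yes | Symptom = symptom) via Bayes' theorem."""
    def likelihood(disease: bool) -> float:
        p = p_symptom_given[disease]
        return p if symptom else 1.0 - p

    joint_yes = p_disease * likelihood(True)
    joint_no = (1.0 - p_disease) * likelihood(False)
    return joint_yes / (joint_yes + joint_no)


print(simulate(True))   # probability of the symptom if the disease is present
print(diagnose(True))   # probability of the disease once the symptom is seen
```

Both results are full probabilities rather than yes/no answers, which is the sense in which uncertainty is maintained during inference.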
Discrete, Nonlinear and Nonparametric Modeling

BayesiaLab processes all data on a discretized basis. As part of BayesiaLab’s Data Import Wizard, a number
of methods are available to discretize any continuous variables.
In BayesiaLab, all “parameters” describing probabilistic relationships between variables are contained in
conditional probability tables (or cubes/hypercubes when two dimensions are exceeded), which means that no
functional forms are utilized. Given this nonparametric, discrete approach, BayesiaLab can implicitly handle
highly nonlinear relationships between variables.
All the optimization criteria of BayesiaLab’s learning algorithms are based on information theory (e.g. the
Minimum Description Length). With that, no assumptions of linearity are made at any point.
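As an illustration of why a discrete, information-theoretic view does not depend on linearity, the following sketch discretizes a U-shaped relationship (y = x²) and compares the linear correlation with the mutual information of the binned variables. Equal-width binning, the three-state resolution and all function names are assumptions made for this example; they are not BayesiaLab's discretization methods or scoring functions.

```python
import math
from collections import Counter


def discretize(values, k):
    """Equal-width binning into k states (one of many possible methods)."""
    lo, hi = min(values), max(values)
    return [min(int((v - lo) / (hi - lo) * k), k - 1) for v in values]


def pearson(xs, ys):
    """Linear (Pearson) correlation coefficient."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def mutual_information(a, b):
    """Mutual information (in bits) between two discrete sequences."""
    n = len(a)
    pa, pb, pab = Counter(a), Counter(b), Counter(zip(a, b))
    return sum(c / n * math.log2(c * n / (pa[i] * pb[j])) for (i, j), c in pab.items())


def conditional_table(parent, child):
    """Estimate the table P(child | parent) by relative frequencies."""
    pair_counts, parent_counts = Counter(zip(parent, child)), Counter(parent)
    return {(p, c): cnt / parent_counts[p] for (p, c), cnt in pair_counts.items()}


xs = [-1 + 2 * i / 200 for i in range(201)]   # evenly spaced on [-1, 1]
ys = [x * x for x in xs]                      # strongly dependent, but not linearly
x_bins, y_bins = discretize(xs, 3), discretize(ys, 3)

r = pearson(xs, ys)                           # close to zero: no linear trend
mi = mutual_information(x_bins, y_bins)       # clearly positive: strong dependence
cpt = conditional_table(x_bins, y_bins)       # e.g. middle x state -> lowest y state
print(r, mi, cpt)
```

The conditional table simply records which states of y occur for each state of x, so the U-shape is captured without specifying any functional form.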
Unsupervised Structural Learning (Quadrant 3)

In statistics, unsupervised learning is typically understood to be a
classification or clustering task. To make a very clear distinction, we
put emphasis on “structural” in “Unsupervised Structural Learning”,
which covers a number of important algorithms in BayesiaLab.
Unsupervised Structural Learning means that BayesiaLab can discover probabilistic relationships between a
large number of variables, without the need to define inputs or outputs. One might say that this is the
quintessential form of knowledge discovery, as no assumptions whatsoever are required to perform these
algorithms on unknown datasets.4
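To give a feel for what learning a structure without inputs or outputs can mean, here is a minimal sketch of one classic textbook approach: a maximum spanning tree over pairwise mutual information (the Chow-Liu idea). It is a swapped-in illustration, not a description of BayesiaLab's algorithms. The synthetic data, the `min_mi` threshold and all names are assumptions, and the result is an undirected skeleton only.

```python
import math
import random
from collections import Counter
from itertools import combinations


def mutual_information(a, b):
    """Mutual information (in bits) between two discrete sequences."""
    n = len(a)
    pa, pb, pab = Counter(a), Counter(b), Counter(zip(a, b))
    return sum(c / n * math.log2(c * n / (pa[i] * pb[j])) for (i, j), c in pab.items())


def learn_tree(data, min_mi=0.01):
    """Kruskal-style search: keep the strongest links that do not close a cycle."""
    scored = sorted(
        ((mutual_information(data[u], data[v]), u, v)
         for u, v in combinations(sorted(data), 2)),
        reverse=True,
    )
    component = {name: name for name in data}   # union-find parent pointers

    def find(x):
        while component[x] != x:
            x = component[x]
        return x

    edges = []
    for mi, u, v in scored:
        root_u, root_v = find(u), find(v)
        if mi >= min_mi and root_u != root_v:    # hypothetical relevance threshold
            component[root_u] = root_v
            edges.append((u, v))
    return edges


rng = random.Random(0)


def noisy_copy(values, flip=0.1):
    """Copy a binary sequence, flipping each value with probability `flip`."""
    return [v if rng.random() > flip else 1 - v for v in values]


# Synthetic data: A influences B, B influences C, D is unrelated to the rest.
a = [rng.randint(0, 1) for _ in range(2000)]
b = noisy_copy(a)
c = noisy_copy(b)
d = [rng.randint(0, 1) for _ in range(2000)]

edges = learn_tree({"A": a, "B": b, "C": c, "D": d})
print(edges)
```

No variable is declared as a target here; the links emerge purely from the pairwise dependencies in the data, and the unrelated variable D stays unconnected because its mutual information with the others falls below the threshold.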
Supervised Learning (Quadrant 2)

Supervised Learning in BayesiaLab has the same objective as many
traditional modeling techniques, i.e. to develop a model for predict-
ing a target variable. Some other data mining packages also offer
“Bayesian Networks” as an option in their array of available tech-
niques. However, in most cases, these packages are restricted in their
capabilities to a very limited type of network, i.e. the Naïve Bayesian
Network.
4 However, the analyst can still use any available domain knowledge to define structural constraints.

Within BayesiaLab, a vastly greater number of algorithms is available to search for a Bayesian network that
best describes the target variable, while taking into account the complexity of the resulting network. The
Markov Blanket algorithm should be highlighted here as its speed is particularly helpful whenever dealing
with a larger number of variables. In this context, the Markov Blanket also serves as an exceptionally
powerful variable selection algorithm.
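The reason a Markov Blanket works as a variable selector is that, given its blanket (parents, children and the children's other parents), a node is independent of all remaining nodes. The sketch below only reads the blanket off a known graph, using the five-variable example network from Chapter 2; it does not learn the blanket from data as BayesiaLab's algorithm does, and the function name is hypothetical.

```python
# Parent sets of the five-variable example network from Chapter 2:
# P(x1) P(x2|x1) P(x3|x1) P(x4|x2,x3) P(x5|x4)
parents = {
    "x1": [],
    "x2": ["x1"],
    "x3": ["x1"],
    "x4": ["x2", "x3"],
    "x5": ["x4"],
}


def markov_blanket(node, parents):
    """Return the parents, children and co-parents ("spouses") of a node."""
    children = [n for n, ps in parents.items() if node in ps]
    spouses = {p for child in children for p in parents[child]}
    return (set(parents[node]) | set(children) | spouses) - {node}


for target in parents:
    print(target, sorted(markov_blanket(target, parents)))
```

For the target x2, for example, the blanket contains x3 even though there is no arc between x2 and x3, because both are parents of x4.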
Finally, structural coefficient analysis, cross-validation and data perturbation functions are available for
thoroughly bootstrapping, testing and validating the robustness of candidate networks, helping the analyst
to make a trade-off between precision and parsimony. These validation methods are applicable to both
Supervised and Unsupervised Learning.
Clustering (Quadrant 2/3)

Clustering in BayesiaLab covers both data clustering (e.g. by observations) and
variable clustering, which, as the name implies, allows the grouping of variables
according to the strength of their mutual relationships.

A third variation of this concept is of particular importance in BayesiaLab: the semi-automatic Multiple
Clustering workflow can be described as a kind of nonlinear, nonparametric and nonorthogonal factor
analysis.
In practice, Multiple Clustering often serves as the basis for developing Probabilistic Structural Equation
Models (Quadrant 3/4) with BayesiaLab.
Observational Inference (Quadrant 1/2)

One of the basic properties of Bayesian networks is that they are “omnidirectional observational inference
engines.” Given an observation on any of the network’s nodes (or a subset of nodes), one can compute the
posterior probabilities of all other nodes in the network. Both exact and approximate observational
inference algorithms are implemented in BayesiaLab.
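A minimal sketch of exact omnidirectional inference is brute-force enumeration: sum the joint distribution over all states consistent with the evidence. The example reuses the structure of the five-variable network from Chapter 2 with binary nodes. All probability values are hypothetical, and enumeration is only practical for very small networks; it is not meant to represent the algorithms implemented in BayesiaLab.

```python
from itertools import product

# Structure of the five-variable example network from Chapter 2.
parents = {"x1": (), "x2": ("x1",), "x3": ("x1",), "x4": ("x2", "x3"), "x5": ("x4",)}

# P(node = True | parent states); all numbers are hypothetical.
cpt = {
    "x1": {(): 0.6},
    "x2": {(True,): 0.2, (False,): 0.75},
    "x3": {(True,): 0.8, (False,): 0.1},
    "x4": {(True, True): 0.95, (True, False): 0.9, (False, True): 0.8, (False, False): 0.0},
    "x5": {(True,): 0.7, (False,): 0.0},
}
nodes = list(parents)


def joint(world):
    """Joint probability of one complete assignment, using the chain rule."""
    p = 1.0
    for n in nodes:
        p_true = cpt[n][tuple(world[q] for q in parents[n])]
        p *= p_true if world[n] else 1.0 - p_true
    return p


def posterior(query, evidence):
    """P(query = True | evidence), summing the joint over all consistent worlds."""
    num = den = 0.0
    for values in product([True, False], repeat=len(nodes)):
        world = dict(zip(nodes, values))
        if all(world[k] == v for k, v in evidence.items()):
            p = joint(world)
            den += p
            if world[query]:
                num += p
    return num / den


print(posterior("x5", {"x1": True}))   # along the arcs, from x1 towards x5
print(posterior("x1", {"x5": True}))   # against the arcs, from x5 back to x1
```

The same function answers both queries; the direction of the arcs places no restriction on which nodes may carry evidence and which may be queried.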
Causal Inference (Quadrant 3/4)

Besides observational inference, BayesiaLab also offers causal inference for computing the impact of inter-
vening on a subset of variables instead of merely observing their states. Both Pearl’s Do-Operator and
Jouffe’s Likelihood Matching are available for this purpose.
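The difference between observing and intervening can be sketched on a small confounded network, Z → X, Z → Y and X → Y. Conditioning on X also changes our belief about Z, whereas intervening on X removes the factor P(x | z) from the factorization, so that Z keeps its prior distribution. The network and all numbers are hypothetical. Only the do-operator is illustrated; Likelihood Matching is not shown.

```python
# Hypothetical confounded network: Z -> X, Z -> Y, X -> Y (binary variables).
p_z = {1: 0.5, 0: 0.5}                          # P(Z = z)
p_x1_given_z = {1: 0.8, 0: 0.2}                 # P(X = 1 | Z = z)
p_y1_given_xz = {(1, 1): 0.9, (1, 0): 0.5,      # P(Y = 1 | X = x, Z = z)
                 (0, 1): 0.7, (0, 0): 0.3}


def p_x_given_z(x, z):
    return p_x1_given_z[z] if x == 1 else 1.0 - p_x1_given_z[z]


def observe(x):
    """P(Y = 1 | X = x): conditioning lets X carry information about Z."""
    weights = {z: p_z[z] * p_x_given_z(x, z) for z in p_z}
    total = sum(weights.values())
    return sum(weights[z] / total * p_y1_given_xz[(x, z)] for z in p_z)


def intervene(x):
    """P(Y = 1 | do(X = x)): the factor P(x | z) is dropped, so Z keeps its prior."""
    return sum(p_z[z] * p_y1_given_xz[(x, z)] for z in p_z)


print(observe(1) - observe(0))       # observational difference, inflated by Z
print(intervene(1) - intervene(0))   # causal effect of X on Y
```

With these numbers the observational difference is larger than the causal effect, because part of the association between X and Y is due to their common cause Z.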
Missing Values Processing
BayesiaLab offers a range of sophisticated methods for missing values processing from which the analyst
can choose. During network learning, BayesiaLab performs missing values processing automatically “behind
the scenes”. More specifically, the Structural Expectation-Maximization algorithm and the Dynamic Com-
pletion algorithm are automatically applied after each modification of the network during learning, i.e. after
every single arc addition, suppression and inversion.
Diagnosis, Prediction and Simulation

In the Bayesian network framework, diagnosis, prediction and simulation are identical computations. They
all consist of inference conditional upon evidence. The distinction only exists from the perspective of the
researcher, who would presumably see the symptom of a disease as an effect and the disease itself as the
cause. Hence, carrying out inference based on observed symptoms is interpreted as “diagnosis.”

BayesiaLab offers a considerable number of functions relating to inference. For instance, inference can be
performed by setting evidence, i.e. clicking on any one of the Monitors, and results are returned instantly
for all the other Monitors.
Batch Inference is available when inference needs to be computed for a large number of records. For in-
stance, this can be used for applying a predictive score for all customers in a database.
The Adaptive Questionnaire function provides guidance on the optimum sequence for seeking evidence. With
every piece of evidence set, BayesiaLab determines the next best piece of evidence to obtain for maximum
information gain with respect to the target variable. In a medical context, this makes it possible to
optimally “escalate” diagnostic procedures, from “low-cost & small-gain” evidence (e.g. measuring the
patient’s blood pressure) to “high-cost & large-gain” evidence (e.g. performing an MRI scan).
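The idea of choosing the next piece of evidence by information gain can be sketched as follows. For each candidate test, the expected reduction in the entropy of the target is computed and the test with the largest value is selected. The sketch assumes a binary target and binary tests that depend only on the target. The test names, sensitivities and false-positive rates are hypothetical, and the cost of a test, which the escalation above trades off against its gain, is not modeled.

```python
import math


def entropy(p):
    """Entropy in bits of a binary variable with P(True) = p."""
    return 0.0 if p in (0.0, 1.0) else -(p * math.log2(p) + (1 - p) * math.log2(1 - p))


def expected_gain(prior, sensitivity, false_positive_rate):
    """Expected reduction in target entropy from observing one binary test.

    Assumes the test outcome is neither certain nor impossible (0 < p_pos < 1).
    """
    p_pos = prior * sensitivity + (1 - prior) * false_positive_rate
    post_pos = prior * sensitivity / p_pos                 # P(target | test positive)
    post_neg = prior * (1 - sensitivity) / (1 - p_pos)     # P(target | test negative)
    remaining = p_pos * entropy(post_pos) + (1 - p_pos) * entropy(post_neg)
    return entropy(prior) - remaining


def next_best(prior, tests):
    """Name of the test with the largest expected information gain."""
    return max(tests, key=lambda name: expected_gain(prior, *tests[name]))


# Hypothetical tests: (sensitivity, false-positive rate).
tests = {"blood_pressure": (0.60, 0.40), "mri_scan": (0.95, 0.05)}
prior = 0.30   # current belief in the target condition

for name, params in tests.items():
    print(name, expected_gain(prior, *params))
print(next_best(prior, tests))
```

After a test result is entered, the posterior becomes the new prior and the ranking is recomputed, which is what makes the questionnaire adaptive.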
Effects Analysis (Quadrants 3/4)

Many research activities focus on estimating the size of an effect, for instance establishing the treatment ef-
fect of a new drug or determining the sales impact of a new advertising campaign. Other studies are about
attribution, i.e. they attempt to decompose observed effects into their causes and thus allocate contributions.
All of the above questions can be answered, if the domain is fully understood, which is a priori never the
case. However, if we are able to build an adequate model of the domain that captures all of its dynamics,
BayesiaLab will be able to extract the effects.
BayesiaLab employs simulation to derive effects, as parameters per se do not exist in this nonparametric
framework. As all the dynamics of the domain are encoded in discrete conditional probability tables, effect
sizes only manifest themselves when different conditions are simulated.
Total Effects Analysis, Target Mean Analysis and many more of BayesiaLab’s functions offer the analyst
ways to study effects, especially nonlinear and interactive effects.
Analyzing Observational Studies

This simulation approach also offers special opportunities for evaluating observational studies. More
specifically, it can help overcome the problem of systematic differences between treatment and control
groups. BayesiaLab’s Likelihood Matching performs on-the-fly matching of pretreatment covariates as part of
the Direct Effects Analysis, thus yielding the “exclusive” effect of a particular variable on the target,
everything else being equal. This also obviates the need for separately performing matching techniques, such
as propensity score matching.

Optimization (Quadrant 4)
The ability to perform inference across all possible states of all nodes of the network also facilitates
searching for optimum values. BayesiaLab’s Target Dynamic Profile and Target Optimization provide the
toolsets for this purpose.
Using this function in combination with Direct Effects is of particular interest when searching for the
optimum combination of variables that have a nonlinear relationship with the target (and co-relations
between the drivers). A typical example would be searching for the optimum mix of an array of marketing
instruments. BayesiaLab’s Target Optimization with Direct Effects will search, within the constraints set by
the analysts, for those scenarios that optimize the target criterion.
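In its simplest form, searching for an optimum under constraints amounts to evaluating the target for every admissible combination of driver states and keeping the best one. The sketch below does this exhaustively for two hypothetical marketing instruments. The probability table, the cost units and the budget constraint are invented for illustration, and the exhaustive search is not a description of how Target Optimization works internally.

```python
from itertools import product

levels = ["low", "medium", "high"]
cost = {"low": 0, "medium": 1, "high": 2}   # hypothetical budget units per level

# Hypothetical P(target reached | tv, online), as it might be read from a
# conditional probability table; note the nonlinear, interacting pattern.
p_target = {
    ("low", "low"): 0.10, ("low", "medium"): 0.18, ("low", "high"): 0.22,
    ("medium", "low"): 0.20, ("medium", "medium"): 0.35, ("medium", "high"): 0.38,
    ("high", "low"): 0.24, ("high", "medium"): 0.37, ("high", "high"): 0.40,
}


def optimize(budget):
    """Best (tv, online) scenario whose total cost stays within the budget."""
    feasible = [s for s in product(levels, repeat=2)
                if cost[s[0]] + cost[s[1]] <= budget]
    return max(feasible, key=p_target.get)


for budget in (0, 2, 4):
    best = optimize(budget)
    print(budget, best, p_target[best])
```

With a budget of two units, the balanced scenario outperforms spending everything on a single instrument, which is the kind of nonlinear interaction that a search over scenarios can reveal.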
Summary
BayesiaLab has consistently implemented (and expanded upon) the theory of Bayesian networks, which was
first introduced by Judea Pearl in the 1980s. In a remarkably short time, BayesiaLab has translated
cutting-edge theoretical research into highly relevant and practical tools for applied scientists.
BayesiaLab has opened up entirely new avenues for exploring, understanding and explaining complex problem
domains.

References
Barber, David. Bayesian Reasoning and Machine Learning. Cambridge University Press, 2012.
Barnard, G. A, and T. Bayes. “Studies in the History of Probability and Statistics: IX. Thomas Bayes’s Essay
Towards Solving a Problem in the Doctrine of Chances.” Biometrika 45, no. 3 (1958): 293–315.
Breiman, Leo. “Statistical Modeling: The Two Cultures (with Comments and a Rejoinder by the Author).”
Statistical Science 16, no. 3 (2001): 199–231.
Darwiche, Adnan. “Bayesian Networks.” Communications of the ACM 53, no. 12 (December 2010): 80.
doi:10.1145/1859204.1859227.
———. Modeling and Reasoning with Bayesian Networks. 1st ed. Cambridge University Press, 2009.
Koller, Daphne, and Nir Friedman. Probabilistic Graphical Models: Principles and Techniques. 1st ed. The
MIT Press, 2009.
Neapolitan, Richard E. Learning Bayesian Networks. Prentice Hall, 2003.
Pearl, Judea. Causality: Models, Reasoning and Inference. 2nd ed. Cambridge University Press, 2009.
Pearl, Judea, and Stuart Russell. Bayesian Networks. UCLA Cognitive Systems Laboratory, November
2000. http://bayes.cs.ucla.edu/csl_papers.html.
Russell, Stuart. “Judea Pearl - A.M. Turing Award Winner.” Accessed August 31, 2013.
http://amturing.acm.org/award_winners/pearl_2658896.cfm.
Shmueli, Galit. “To Explain or to Predict?” Statistical Science 25, no. 3 (August 2010): 289–310.
doi:10.1214/10-STS330.
Spirtes, Peter, Clark Glymour, and Richard Scheines. Causation, Prediction, and Search. 2nd ed. The MIT
Press, 2001.

Contact Information
Bayesia USA
312 Hamlet’s End Way
Franklin, TN 37067
USA
Phone: +1 888-386-8383
info@bayesia.us
www.bayesia.us
Bayesia Singapore Pte. Ltd.

20 Cecil Street
#14-01, Equity Plaza
Singapore 049705
Phone: +65 3158 2690
info@bayesia.sg
www.bayesia.sg
Bayesia S.A.S.
6, rue Léonard de Vinci
BP 119
53001 Laval Cedex
France
Phone: +33(0)2 43 49 75 69
info@bayesia.com
www.bayesia.com
Copyright
© 2013 Bayesia USA, Bayesia S.A.S. and Bayesia Singapore. All rights reserved.


