
BAYESIAN NETWORKS

LECTURER:
Humera Farooq, Ph.D.
Computer Sciences Department,
Bahria University (Karachi Campus)
Outline
• Introduction
• Bayesian Interpretation
• Bayes Theorem
• Naïve Bayes
• Bayesian Networks
• Example
• Conclusion
Basics of Bayesian Learning
• Goal: find the best hypothesis from some space H of hypotheses, given the observed data D.
• Define "best" as the most probable hypothesis in H.
• In order to do that, we need to assume a probability distribution over the class H.
• In addition, we need to know something about the relation between the observed data and the hypotheses (e.g., a coin-tossing problem).
• h is a class variable (the hypothesis) and D are the examples (features).
Basics of Bayesian Learning
• P(h): the prior probability of hypothesis h (class variable). It reflects background knowledge, before any data is observed; with no information, assume a uniform distribution.
• P(D): the probability that this sample of the data is observed (with no knowledge of the hypothesis).
• P(D|h): the probability of observing the sample D, given that hypothesis h is the target.
• P(h|D): the posterior probability of h, i.e., the probability that h is the target, given that D has been observed.
Bayes Theorem
• In ML problems, we are interested in the probability P(h|D) that h holds given the observed training data D.
• Bayes Theorem provides a way to calculate the posterior probability P(h|D) from the prior probability P(h), together with P(D) and P(D|h).
• Bayes Theorem:
  P(h|D) = P(D|h) P(h) / P(D)
• P(h|D) increases with P(h) and with P(D|h), according to Bayes theorem.
• P(h|D) decreases as P(D) increases, because the more probable it is that D will be observed independently of h, the less evidence D provides in support of h.
Bayes Theorem: An example
Maximum A Posteriori (MAP) Hypothesis, hMAP
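In standard form, the MAP hypothesis is the one that maximizes the posterior probability; the evidence term P(D) can be dropped because it does not depend on h:

  hMAP = argmax_{h ∈ H} P(h|D) = argmax_{h ∈ H} P(D|h) P(h)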
Maximum Likelihood (ML) Hypothesis, hML
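When every hypothesis in H is assumed equally probable a priori, maximizing the posterior reduces to maximizing the likelihood:

  hML = argmax_{h ∈ H} P(D|h)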
Example: Does a patient have cancer or not?
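A minimal numeric sketch of this kind of diagnosis in Python; the prior and test accuracies below are assumed only for illustration and are not taken from the lecture:

# Hypothetical numbers, chosen only for illustration (not from the slide).
p_cancer = 0.008               # assumed prior P(cancer)
p_pos_given_cancer = 0.98      # assumed P(positive test | cancer)
p_pos_given_healthy = 0.03     # assumed P(positive test | no cancer)

# Unnormalized posteriors for a positive test result (numerators of Bayes theorem).
post_cancer = p_pos_given_cancer * p_cancer             # 0.00784
post_healthy = p_pos_given_healthy * (1 - p_cancer)     # 0.02976

# MAP decision: pick the hypothesis with the larger (unnormalized) posterior.
print("MAP diagnosis:", "cancer" if post_cancer > post_healthy else "no cancer")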
Naïve Bayes
• It is a classification technique based on Bayes' Theorem with an assumption of independence among predictors. In simple terms, a Naïve Bayes classifier assumes that the presence of a particular feature in a class is unrelated to the presence of any other feature.
• Naïve: it is called Naïve because it assumes that the occurrence of a certain feature is independent of the occurrence of other features.
• Bayes: it is called Bayes because it depends on the principle of Bayes' Theorem.
• It is mainly used in text classification, which involves high-dimensional training datasets.
• It is a probabilistic classifier, which means it predicts on the basis of the probability of an object belonging to each class.
• Some popular applications of the Naïve Bayes algorithm are spam filtering, sentiment analysis, and classifying articles.
Types of Naïve Bayes
• Gaussian: The Gaussian model assumes that features follow a normal
distribution. This means if predictors take continuous values instead of discrete,
then the model assumes that these values are sampled from the Gaussian
distribution.
• Multinomial: The Multinomial Naïve Bayes classifier is used when the data is multinomially distributed. It is primarily used for document classification problems, i.e., deciding which category a particular document belongs to, such as sports, politics, education, etc.
The classifier uses the frequency of words as the predictors.
• Bernoulli: The Bernoulli classifier works similarly to the Multinomial classifier, but the predictor variables are independent Boolean variables, such as whether a particular word is present or not in a document. This model is also popular for document classification tasks.
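A minimal sketch of the three variants, assuming scikit-learn is available; the tiny arrays below are made up solely to show the expected kind of input for each model:

import numpy as np
from sklearn.naive_bayes import GaussianNB, MultinomialNB, BernoulliNB

X_cont = np.array([[1.8, 70.0], [1.6, 55.0], [1.7, 65.0], [1.5, 50.0]])   # continuous features
X_counts = np.array([[3, 0, 1], [0, 2, 4], [2, 1, 0], [0, 3, 2]])          # word counts per document
X_bool = (X_counts > 0).astype(int)                                        # word present / absent
y = np.array([0, 1, 0, 1])                                                 # class labels

print(GaussianNB().fit(X_cont, y).predict(X_cont))         # Gaussian: normally distributed features
print(MultinomialNB().fit(X_counts, y).predict(X_counts))  # Multinomial: word-frequency features
print(BernoulliNB().fit(X_bool, y).predict(X_bool))        # Bernoulli: binary presence/absence features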
Bayes Classifiers Working
Let's write Bayes Theorem as
  P(y|X) = P(X|y) P(y) / P(X)
where y is the class variable and X is a dependent feature vector (of size n):
  X = (x1, x2, ..., xn)
We split the evidence into independent parts. If any two events A (class variable) and B (feature vector) are independent, then
  P(A, B) = P(A) P(B)
Hence:
  P(y|x1, ..., xn) = [P(x1|y) P(x2|y) ... P(xn|y) P(y)] / [P(x1) P(x2) ... P(xn)]
and this can be expressed as
  P(y|x1, ..., xn) ∝ P(y) ∏i P(xi|y)
by ignoring the denominator, since it is constant for a given input.

Bayes Classifiers
To create the classifier model, calculate the probability of the input for each value of the class variable y and select the class with the maximum probability:
  y = argmax_y P(y) ∏i P(xi|y)
Find P(y) and the P(xi|y) terms from the weather (play tennis) dataset below.
Example
• Example: Play Tennis
Example
• Learning Phase

Outlook    Play=Yes  Play=No        Temperature  Play=Yes  Play=No
Sunny        2/9       3/5          Hot            2/9       2/5
Overcast     4/9       0/5          Mild           4/9       2/5
Rain         3/9       2/5          Cool           3/9       1/5

Humidity   Play=Yes  Play=No        Wind         Play=Yes  Play=No
High         3/9       4/5          Strong         3/9       3/5
Normal       6/9       1/5          Weak           6/9       2/5

P(Play=Yes) = 9/14    P(Play=No) = 5/14
Example
• Test Phase
– Given a new instance,
  x' = (Outlook=Sunny, Temperature=Cool, Humidity=High, Wind=Strong)
– Look up the tables:
  P(Outlook=Sunny|Play=Yes) = 2/9        P(Outlook=Sunny|Play=No) = 3/5
  P(Temperature=Cool|Play=Yes) = 3/9     P(Temperature=Cool|Play=No) = 1/5
  P(Humidity=High|Play=Yes) = 3/9        P(Humidity=High|Play=No) = 4/5
  P(Wind=Strong|Play=Yes) = 3/9          P(Wind=Strong|Play=No) = 3/5
  P(Play=Yes) = 9/14                     P(Play=No) = 5/14
– MAP rule
  P(Yes|x') ∝ [P(Sunny|Yes) P(Cool|Yes) P(High|Yes) P(Strong|Yes)] P(Play=Yes) = 0.0053
  P(No|x')  ∝ [P(Sunny|No) P(Cool|No) P(High|No) P(Strong|No)] P(Play=No) = 0.0206
Given that P(Yes|x') < P(No|x'), we label x' as "No".
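A minimal sketch in plain Python that reproduces the two unnormalized scores above from the learning-phase tables:

# Conditional probabilities for x' = (Sunny, Cool, High, Strong), read from the tables.
likelihood_yes = (2/9) * (3/9) * (3/9) * (3/9)   # P(Sunny|Yes) P(Cool|Yes) P(High|Yes) P(Strong|Yes)
likelihood_no  = (3/5) * (1/5) * (4/5) * (3/5)   # P(Sunny|No)  P(Cool|No)  P(High|No)  P(Strong|No)

score_yes = likelihood_yes * (9/14)   # multiply by the prior P(Play=Yes)
score_no  = likelihood_no  * (5/14)   # multiply by the prior P(Play=No)

print(round(score_yes, 4))            # 0.0053
print(round(score_no, 4))             # 0.0206
print("Play =", "Yes" if score_yes > score_no else "No")   # "No"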
Naïve Bayes Summary
• Naïve Bayes can handle missing values by ignoring the sample during probability computation, and it is robust to outliers and irrelevant features.
• The Naïve Bayes algorithm is very easy to implement for applications involving textual data (e.g., sentiment analysis, news article classification, spam filtering).
• Convergence is quicker relative to logistic regression, which is discriminative in nature.
• It performs well even when the independence assumption between features does not hold.
• The resulting decision boundaries can be non-linear and/or piecewise.
• Disadvantage: it is not robust to redundant features. If the features have a strong relationship or correlation with each other, Naïve Bayes is not a good choice. Naïve Bayes has high bias and low variance, and there is no regularization to adjust for this bias.
Bayesian Networks
• A graphical model for representing probabilistic relationships among inputs and labels. It generalizes the idea of Naïve Bayes to model distributions over groups of variables with more complex conditional independence relationships.
• A Bayesian network consists of a collection of conditional probability distributions such that their product is the full joint distribution over all the variables.
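Concretely, with one conditional distribution per node, the product takes the standard form
  P(X1, X2, ..., Xn) = ∏i P(Xi | parents(Xi))
where parents(Xi) are the parents of node Xi in the graph.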
Bayesian Networks
Overview – Example: Bayesian Network for Liver Disorder Diagnosis (A. Onisko, M. Druzdzel, and H. Wasyluk, Sept. 1999)
Bayesian Network
• Edges are directed and represent direct dependencies ("connections"); no directed cycles are allowed, so the graph is a DAG.
• Each node is conditionally independent of its non-descendants (in particular, its ancestors) given its parents (Markov property).
Example of Simple Bayesian Networks
• A and B are marginally independent, but when C is given, they become conditionally dependent. This is called "explaining away".
Examples of 3-way Bayesian Networks
• When B is given, A and C are conditionally independent.
  For example, a Markov chain:
  A is the past, B is the present, C is the future.
Examples of 3-way Bayesian Networks
• When A is given, B and C are conditionally independent.
  For example: A is the common cause of the two independent effects B and C.
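For reference, the joint factorizations implied by these three-variable structures are:
  Chain (A → B → C):           P(A, B, C) = P(A) P(B|A) P(C|B)
  Common cause (B ← A → C):    P(A, B, C) = P(A) P(B|A) P(C|A)
  Common effect (A → C ← B):   P(A, B, C) = P(A) P(B) P(C|A, B)
In the common-effect case, A and B are marginally independent but become dependent once C is observed (explaining away).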
Other Examples
Example
Example

P(B=T)   P(B=F)          P(E=T)   P(E=F)
0.001    0.999           0.002    0.998

B   E   P(A=T)   P(A=F)
T   T   0.95     0.05
T   F   0.94     0.06
F   T   0.29     0.71
F   F   0.001    0.999

A   P(JC=T)   P(JC=F)        A   P(MC=T)   P(MC=F)
T   0.90      0.10           T   0.70      0.30
F   0.05      0.95           F   0.01      0.99
Constructing a Bayesian Network: Step 1
Constructing a Bayesian Network: Step 2
The Resulting Bayesian Network
Bayesian Networks (different variable ordering)
Example
• What is the probability that the alarm has sounded but neither a burglary nor an earthquake has occurred, and both John and Mary call?

  P(JC, MC, A, ¬B, ¬E)
  = P(JC|A) P(MC|A) P(A|¬B, ¬E) P(¬B) P(¬E)
  = 0.90 x 0.70 x 0.001 x 0.999 x 0.998
  = 0.00062
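A minimal sketch in plain Python that encodes the conditional probability tables above and recomputes this query from the network factorization:

# CPTs of the burglary/earthquake alarm network (values from the tables above).
P_B = {True: 0.001, False: 0.999}                   # P(Burglary)
P_E = {True: 0.002, False: 0.998}                   # P(Earthquake)
P_A = {(True, True): 0.95, (True, False): 0.94,     # P(Alarm=T | Burglary, Earthquake)
       (False, True): 0.29, (False, False): 0.001}
P_JC = {True: 0.90, False: 0.05}                    # P(JohnCalls=T | Alarm)
P_MC = {True: 0.70, False: 0.01}                    # P(MaryCalls=T | Alarm)

def joint(jc, mc, a, b, e):
    # Network factorization: P(JC,MC,A,B,E) = P(JC|A) P(MC|A) P(A|B,E) P(B) P(E)
    p_a = P_A[(b, e)] if a else 1 - P_A[(b, e)]
    p_jc = P_JC[a] if jc else 1 - P_JC[a]
    p_mc = P_MC[a] if mc else 1 - P_MC[a]
    return p_jc * p_mc * p_a * P_B[b] * P_E[e]

# Alarm sounded, no burglary, no earthquake, both John and Mary call:
print(joint(True, True, True, False, False))   # ≈ 0.00062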
