Lecture 06: Bayesian Networks (07/11/2022)
LECTURER:
Humera Farooq, Ph.D.
Computer Sciences Department,
Bahria University (Karachi Campus)
Outline
• Introduction
• Bayesian Interpretation
• Bayes Theorem
• Naïve Bayes
• Bayesian Networks
• Example
• Conclusion
Basics of Bayesian Learning
Goal: find the best hypothesis from some space H of
hypotheses, given the observed data D.
Define "best" as: the most probable hypothesis in H.
To do that, we need to assume a probability distribution
over the hypothesis space H.
In addition, we need to know something about the relation
between the observed data and the hypotheses (e.g., a coin-tossing
problem).
Here h is the class variable and D is the set of observed examples (features).
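The "most probable hypothesis" goal above can be written as the maximum a posteriori (MAP) estimate: apply Bayes' theorem and drop the denominator P(D), which does not depend on h:

```latex
h_{MAP} = \arg\max_{h \in H} P(h \mid D)
        = \arg\max_{h \in H} \frac{P(D \mid h)\, P(h)}{P(D)}
        = \arg\max_{h \in H} P(D \mid h)\, P(h)
```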
Basics of Bayesian Learning
P(h) - the prior probability of hypothesis h (the class variable).
It reflects background knowledge held before any data is observed;
if no prior information is available, use a uniform distribution.
Naïve: the algorithm is called naïve because it assumes that the occurrence of a certain
feature is independent of the occurrence of the other features.
Popular applications of the Naïve Bayes algorithm include spam filtering, sentiment
analysis, and article classification.
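As an illustration of the spam-filtering application mentioned above, here is a minimal from-scratch Naïve Bayes sketch. The training messages and labels are hypothetical toy data, not from the lecture; it uses word counts with add-one smoothing and compares log-probabilities per class.

```python
import math
from collections import Counter

# Hypothetical toy training data: (message, label).
train = [
    ("win money now", "spam"),
    ("free prize win", "spam"),
    ("meeting schedule today", "ham"),
    ("project meeting notes", "ham"),
]

# Count word frequencies per class and the class priors.
word_counts = {"spam": Counter(), "ham": Counter()}
class_counts = Counter()
for text, label in train:
    class_counts[label] += 1
    word_counts[label].update(text.split())

vocab = {w for counts in word_counts.values() for w in counts}

def predict(text):
    # P(class | words) is proportional to P(class) * product of P(word | class);
    # add-one (Laplace) smoothing avoids zero probabilities for unseen words.
    best_label, best_score = None, float("-inf")
    for label in class_counts:
        total = sum(word_counts[label].values())
        score = math.log(class_counts[label] / sum(class_counts.values()))
        for w in text.split():
            score += math.log((word_counts[label][w] + 1) / (total + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label
```

Working in log-space is the usual design choice here: multiplying many small probabilities underflows, while summing their logarithms is numerically stable.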
Types of Naïve Bayes
• Gaussian: The Gaussian model assumes that features follow a normal
distribution. This means if predictors take continuous values instead of discrete,
then the model assumes that these values are sampled from the Gaussian
distribution.
• Multinomial: The Multinomial Naïve Bayes classifier is used when the data is
multinomially distributed. It is primarily used for document classification
problems, i.e., deciding which category a particular document belongs to, such
as sports, politics, or education.
The classifier uses word frequencies as the predictors.
• Bernoulli: The Bernoulli classifier works similarly to the Multinomial classifier,
but the predictors are independent Boolean variables, such as whether a
particular word is present in a document or not. This model is also popular for
document classification tasks.
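The Gaussian variant described above can be sketched in a few lines. This is a hypothetical one-feature example (the data values are invented for illustration): fit a mean and variance per class, then classify a new point by the class whose Gaussian density is highest, assuming uniform priors.

```python
import math

# Hypothetical 1-D training data: one continuous feature per class.
data = {
    "a": [5.0, 5.2, 4.8, 5.1],
    "b": [6.0, 6.3, 5.9, 6.1],
}

# Fit a Gaussian per class: estimate mean and variance from the samples.
params = {}
for label, xs in data.items():
    mean = sum(xs) / len(xs)
    var = sum((x - mean) ** 2 for x in xs) / len(xs)
    params[label] = (mean, var)

def gaussian_pdf(x, mean, var):
    # Normal density N(x; mean, var).
    return math.exp(-((x - mean) ** 2) / (2 * var)) / math.sqrt(2 * math.pi * var)

def classify(x):
    # With uniform priors, pick the class whose Gaussian assigns x
    # the highest density (the MAP choice).
    return max(params, key=lambda label: gaussian_pdf(x, *params[label]))
```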
Bayes Classifiers Working
Let's write Bayes' theorem as
P(A | B) = P(B | A) P(A) / P(B)
and split the evidence B = (b1, ..., bn) into independent parts. If any two events A
(class variable) and B (feature vector) are independent, then
P(A, B) = P(A) P(B)
Hence:
P(A | b1, ..., bn) ∝ P(A) P(b1 | A) P(b2 | A) ... P(bn | A)
and we predict the class A that maximizes this product - the MAP rule.
The Naïve Bayes algorithm is very easy to implement for applications involving textual
data (e.g., sentiment analysis, news article classification, spam filtering).
It often performs well even when the assumption of independence between features does not hold.
A is the past
B is the present
C is the future
In a chain A -> B -> C, the future is conditionally independent of the past given the
present: P(C | A, B) = P(C | B).
Examples of 3-way Bayesian Networks
Conditional probability table for the alarm A given burglary B and earthquake E:
B  E  | P(A=T)  P(A=F)
T  T  | 0.95    0.05
T  F  | 0.94    0.06
F  T  | 0.29    0.71
F  F  | 0.001   0.999
• What is the probability that the alarm has sounded but neither
a burglary nor an earthquake has occurred, and both John and
Mary call?
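A sketch of the computation. The alarm CPT comes from the table above; the remaining numbers (P(B)=0.001, P(E)=0.002, P(J|A)=0.90, P(M|A)=0.70) are assumed here from the standard textbook version of this burglary-alarm network and should be checked against the full lecture slides.

```python
# Priors and CPT entries. Only the alarm CPT row (B=F, E=F) is from the
# slide; the other values are assumed standard textbook numbers.
p_b, p_e = 0.001, 0.002
p_a_given_not_b_not_e = 0.001   # CPT row B=F, E=F
p_j_given_a, p_m_given_a = 0.90, 0.70

# Chain rule factored along the network structure:
# P(j, m, a, not-b, not-e) = P(j|a) P(m|a) P(a|not-b, not-e) P(not-b) P(not-e)
answer = (p_j_given_a * p_m_given_a * p_a_given_not_b_not_e
          * (1 - p_b) * (1 - p_e))
print(f"{answer:.6f}")  # prints 0.000628
```

Each variable is conditioned only on its parents in the network, which is what lets the joint probability factor into this short product.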