Naïve Bayesian Classification : Bayes’ Theorem
• By Bayes’ theorem, P(Ci|X) = P(X|Ci)P(Ci) / P(X). Since P(X) is constant for all classes,
only P(X|Ci)P(Ci) needs to be maximized.
• A simplified assumption: attributes are conditionally independent (i.e., there is no
dependence relation between attributes):
P(X|Ci) = P(x1|Ci) × P(x2|Ci) × ... × P(xn|Ci)
• This greatly reduces the computation cost: only the class distribution needs to be counted.
• In order to predict the class label of X, P(X|Ci)P(Ci) is evaluated for each class Ci. The
classifier predicts that the class label of tuple X is the class Ci if and only if
P(X|Ci)P(Ci) > P(X|Cj)P(Cj) for 1 <= j <= m, j != i.
In other words, the predicted class label is the class Ci for which P(X|Ci)P(Ci) is the
maximum.
Naïve Bayes Classification : Example
• Class:
• C1:buys_computer = ‘yes’
• C2:buys_computer = ‘no’
• Data to be classified:
X = (age <= 30,
• income = medium,
• student = yes,
• credit_rating = fair)
Naïve Bayes Classification : Example
• P(Ci): P(buys_computer = “yes”) = 9/14 = 0.643
P(buys_computer = “no”) = 5/14 = 0.357
• Compute P(X|Ci) for each class
P(age = “<=30” | buys_computer = “yes”) = 2/9 = 0.222
P(age = “<= 30” | buys_computer = “no”) = 3/5 = 0.6
P(income = “medium” | buys_computer = “yes”) = 4/9 = 0.444
P(income = “medium” | buys_computer = “no”) = 2/5 = 0.4
P(student = “yes” | buys_computer = “yes”) = 6/9 = 0.667
P(student = “yes” | buys_computer = “no”) = 1/5 = 0.2
P(credit_rating = “fair” | buys_computer = “yes”) = 6/9 = 0.667
P(credit_rating = “fair” | buys_computer = “no”) = 2/5 = 0.4
• X = (age <= 30, income = medium, student = yes, credit_rating = fair)
P(X|Ci) : P(X|buys_computer = “yes”) = 0.222 x 0.444 x 0.667 x 0.667 = 0.044
P(X|buys_computer = “no”) = 0.6 x 0.4 x 0.2 x 0.4 = 0.019
P(X|Ci)*P(Ci) : P(X|buys_computer = “yes”) * P(buys_computer = “yes”) = 0.028
P(X|buys_computer = “no”) * P(buys_computer = “no”) = 0.007
Therefore, X belongs to class (“buys_computer = yes”)
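The hand computation above can be reproduced in a few lines of Python. This is a minimal sketch: the probabilities are the counts from the 14-tuple example, while the dictionary layout and variable names are just for illustration.

```python
# Reproducing the hand computation above: conditional probabilities
# estimated from the 14-tuple training set.
priors = {"yes": 9/14, "no": 5/14}                  # P(Ci)
likelihoods = {                                     # P(xk | Ci)
    "yes": {"age<=30": 2/9, "income=medium": 4/9,
            "student=yes": 6/9, "credit=fair": 6/9},
    "no":  {"age<=30": 3/5, "income=medium": 2/5,
            "student=yes": 1/5, "credit=fair": 2/5},
}

x = ["age<=30", "income=medium", "student=yes", "credit=fair"]

scores = {}
for c, prior in priors.items():
    p = prior
    for attr in x:                  # naive independence assumption:
        p *= likelihoods[c][attr]   # P(X|Ci) = product of P(xk|Ci)
    scores[c] = p                   # P(X|Ci) * P(Ci)

print(scores)                                     # yes: ~0.028, no: ~0.007
print("predicted:", max(scores, key=scores.get))  # predicted: yes
```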
Applications of Naïve Bayes classifier
• Text classification: The Naïve Bayes classifier is among the most successful known
algorithms for learning to classify text documents. It assigns a document to the class with
the highest posterior probability, using the algorithm described above to evaluate, for
each candidate category, the probability that the document belongs to it. It has various
applications in document categorization, language detection, and sentiment detection,
which are very useful for traditional retailers, e-retailers, and other businesses in judging
the sentiments of their clients on the basis of keywords in feedback forms, social
media comments, etc. A minimal sketch follows.
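As an illustration, here is a minimal sentiment-classification sketch using scikit-learn's MultinomialNB. The library choice and the toy corpus are assumptions made for illustration; the slides do not prescribe any particular tool.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

# Toy labelled corpus (hypothetical data for illustration).
docs = ["great product, very happy",
        "terrible service, very unhappy",
        "happy with the quality",
        "unhappy, the product broke"]
labels = ["positive", "negative", "positive", "negative"]

# Bag-of-words counts feed a multinomial Naive Bayes model.
vectorizer = CountVectorizer()
X = vectorizer.fit_transform(docs)
clf = MultinomialNB().fit(X, labels)

# Classify a new piece of feedback.
new_doc = vectorizer.transform(["very happy with the product"])
print(clf.predict(new_doc))   # prints ['positive'] for this toy corpus
```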
Applications of Naïve Bayes classifier
• Spam filtering: Spam filtering is the best-known use of Naïve Bayesian text
classification. Presently, almost all email providers have this as a built-in functionality,
which makes use of a Naïve Bayes classifier to identify spam email on the basis of certain
conditions and the probability of an email being classified as ‘Spam’. Naïve Bayesian
spam filtering has become a mainstream mechanism for distinguishing illegitimate spam
email from legitimate email (sometimes called ‘ham’). Users can also install separate
email filtering programmes. Server-side email filters such as DSPAM, SpamAssassin,
SpamBayes, and ASSP make use of Bayesian spam filtering techniques, and the
functionality is sometimes embedded within the mail server software itself.
Bayes’ Theorem : Concept Learning Example
• Problem: 0.5% of the population have a malignant tumour. A laboratory test returns a
correct positive result in 98% of malignant cases and a correct negative result in 97% of
non-malignant cases. A new patient’s test result is positive. Should we declare the
tumour malignant?
• Solution: Let us denote Malignant Tumour = MT, Positive Lab Test = PT,
Negative Lab Test = NT
• h1 = the particular tumour is of malignant type (MT in our example)
• h2 = the particular tumour is not of malignant type (!MT in our example)
• P(MT) = 0.005 P(!MT) = 0.995
• P(PT|MT) = 0.98 P(PT|!MT) = 0.02
• P(NT|!MT) = 0.97 P(NT|MT) = 0.03
Bayes’ Theorem: Concept Learning (prior, posterior, and likelihood)
So, for the new patient, if the laboratory test report shows a positive result, let us see whether
we should declare this a malignancy case or not:
P(h1|PT) ∝ P(PT|MT) P(MT) = 0.98 × 0.005 = 0.0049
P(h2|PT) ∝ P(PT|!MT) P(!MT) = 0.02 × 0.995 = 0.0199
As P(h2|PT) is higher than P(h1|PT), it is clear that the hypothesis h2 has more probability of
being true: despite the positive test, the tumour is more likely not malignant.
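The same MAP comparison in code — a minimal sketch using the numbers from the example above, which also normalizes the scores to obtain the actual posterior probabilities:

```python
# Priors and likelihoods from the example above.
p_mt, p_not_mt = 0.005, 0.995                    # P(MT), P(!MT)
p_pt_given_mt, p_pt_given_not_mt = 0.98, 0.02    # P(PT|MT), P(PT|!MT)

# Unnormalized posteriors: P(h|PT) is proportional to P(PT|h) * P(h).
score_h1 = p_pt_given_mt * p_mt            # 0.98 * 0.005 = 0.0049
score_h2 = p_pt_given_not_mt * p_not_mt    # 0.02 * 0.995 = 0.0199

# Normalizing gives the true posterior probabilities.
total = score_h1 + score_h2
print(f"P(MT|PT)  = {score_h1 / total:.3f}")   # ~0.198
print(f"P(!MT|PT) = {score_h2 / total:.3f}")   # ~0.802
```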
Bayesian Belief Network
• The two important pieces of information we get from the network graph are used for
determining the joint probability of the variables.
• First, the arcs assert that each node variable is conditionally independent of its non-
descendants in the network, given its immediate predecessors in the network. If two
variables A and B are connected through a directed path, then B is called a descendant
of A.
• Second, the conditional probability table (CPT) for each variable provides the probability
distribution of that variable given the values of its immediate predecessors (see the
sketch after this list).
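To make both points concrete, here is a minimal sketch for a hypothetical two-node network Rain → WetGrass; the network structure and the CPT numbers are invented for illustration. The joint probability factors into each variable's CPT entry given its immediate predecessors.

```python
# Hypothetical two-node network: Rain -> WetGrass.
# CPT for the root node Rain (no predecessors).
p_rain = {True: 0.2, False: 0.8}
# CPT for WetGrass given its immediate predecessor Rain.
p_wet_given_rain = {True:  {True: 0.9, False: 0.1},
                    False: {True: 0.2, False: 0.8}}

def joint(rain: bool, wet: bool) -> float:
    # Chain rule with conditional independence:
    # P(Rain, WetGrass) = P(Rain) * P(WetGrass | Rain)
    return p_rain[rain] * p_wet_given_rain[rain][wet]

print(joint(True, True))    # 0.2 * 0.9 = 0.18
```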
Bayesian Belief Network
• The Bayesian Belief network can represent much more complex scenarios with
dependence and independence concepts. There are three types of connections possible in
a Bayesian Belief network.
1. Diverging Connection
2. Serial Connection
3. Converging Connection
⮚ Thus, the Bayesian Belief network provides a mechanism for handling more practical
scenarios: it exploits the conditional independence of a subset of variables when
computing the joint probability distribution of the remaining variables, given the
observations.
Bayesian Belief Network : Definition and Components
• Bayesian Belief Networks specify joint conditional probability distributions. They are also
known as Belief Networks, Bayesian Networks, or Probabilistic Networks.
• A Belief Network allows class conditional independencies to be defined between subsets of
variables.
• It provides a graphical model of causal relationships, on which learning can be performed.
• We can use a trained Bayesian Network for classification.
• There are two components that define a Bayesian Belief Network:
• A directed acyclic graph (DAG)
• A set of conditional probability tables (CPTs)
Use of the Bayesian Belief network in machine learning
• The Bayesian network creates a complete model of the variables and their relationships
and thus can be used to answer probabilistic queries about them. A common use of the
network is to find updated knowledge about the state of a subset of variables while the
state of the other subset (known as the evidence variables) is observed. This concept,
often known as probabilistic inference (the process of computing the posterior
distribution of variables given some evidence), provides a universal sufficient statistic
for detection applications. Thus, if one wants to choose the values for a subset of
variables in order to minimize some expected loss function or decision error, this
method is quite effective. A minimal sketch follows.
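Here is a minimal sketch of such an inference, reusing the hypothetical Rain → WetGrass network assumed earlier: we observe the evidence WetGrass = True and compute the posterior over the query variable Rain by enumerating and normalizing the joint.

```python
# Posterior P(Rain | WetGrass=True) by enumeration, using the
# hypothetical Rain -> WetGrass network sketched earlier.
p_rain = {True: 0.2, False: 0.8}
p_wet_given_rain = {True:  {True: 0.9, False: 0.1},
                    False: {True: 0.2, False: 0.8}}

evidence_wet = True   # the observed evidence variable

# Enumerate the joint for each value of the query variable Rain,
# then normalize over the evidence.
unnormalized = {r: p_rain[r] * p_wet_given_rain[r][evidence_wet]
                for r in (True, False)}
total = sum(unnormalized.values())
posterior = {r: p / total for r, p in unnormalized.items()}

print(posterior)   # {True: ~0.529, False: ~0.471}
```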
Use of the Bayesian Belief network in machine learning
• In other words, the Bayesian network is a mechanism for automatically applying
Bayes’ theorem to complex problems. Bayesian networks are used for modelling
beliefs in domains such as computational biology and bioinformatics (protein
structure and gene regulatory networks), medicine, forensics, document
classification, information retrieval, image processing, decision support systems,
sports betting and gaming, property market analysis, and various other fields.