
Unit 4: Naïve Bayes Classifiers

Bayes Theorem

Prof. Sachin S. Patil
D.Y. Patil University, Ambi, Pune
Bayes Theorem
Bayes' Theorem states that the conditional probability of an event, given the occurrence of another event, is equal to the likelihood of the second event given the first event, multiplied by the probability of the first event and divided by the probability of the second event.
Bayes Theorem
• The Naïve Bayes algorithm is a supervised learning algorithm, based on Bayes' theorem and used for solving classification problems.
• It is mainly used for text classification with high-dimensional training datasets.
• The Naïve Bayes classifier is one of the simplest and most effective classification algorithms; it helps build fast machine learning models that can make quick predictions.
Bayes Theorem
• It is a probabilistic classifier, which means it predicts on the basis of the probability of an object.

• Popular examples of the Naïve Bayes algorithm are spam filtering, sentiment analysis, and classifying articles.

• https://codinginfinite.com/naive-bayes-classification-numerical-example/
Why is it called Naïve Bayes?
• The Naïve Bayes algorithm is made up of two words, Naïve and Bayes, which can be described as:

• Naïve: It is called naïve because it assumes that the occurrence of a certain feature is independent of the occurrence of other features.

• For example, if a fruit is identified on the basis of color, shape, and taste, then a red, spherical, and sweet fruit is recognized as an apple.

• Hence each feature individually contributes to identifying it as an apple, without depending on the others.

• Bayes: It is called Bayes because it depends on the principle of Bayes' theorem.
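Under this naïve independence assumption, the joint likelihood factorizes into a product of per-feature probabilities. For the apple example above, the comparison the classifier makes can be sketched as:

P(apple | red, spherical, sweet) ∝ P(red | apple) × P(spherical | apple) × P(sweet | apple) × P(apple)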
Bayes Theorem
• Bayes' theorem is also known as Bayes' rule or Bayes' law. It is used to determine the probability of a hypothesis with prior knowledge, and it depends on conditional probability:

P(A|B) = P(B|A) × P(A) / P(B)

Where,
P(A|B) is Posterior probability: Probability of hypothesis A given the observed event B.
P(B|A) is Likelihood probability: Probability of the evidence given that hypothesis A is true.
P(A) is Prior Probability: Probability of hypothesis before observing the evidence.
P(B) is Marginal Probability: Probability of Evidence.
Bayes Theorem

• P(B|A) is the conditional probability of event B given that event A has already occurred.

• P(A) represents the prior probability that event A will take place.

• P(B) represents the probability that event B will occur.
Bayes Theorem

• The probability of an event A occurring given evidence B is calculated by multiplying the likelihood of evidence B given the occurrence of event A by the prior probability of A, and dividing the result by the prior probability of B.
Compare Bayes' Theorem vs. Conditional Probability
Problem: If the weather is sunny, should the player play or not?

Outlook Play
0 Rainy Yes
1 Sunny Yes
2 Overcast Yes
3 Overcast Yes
4 Sunny No
5 Rainy Yes
6 Sunny Yes
7 Overcast Yes
8 Rainy No
9 Sunny No
10 Sunny Yes
11 Rainy No
12 Overcast Yes
13 Overcast Yes
Frequency table for the weather conditions:

Weather    Yes   No
Overcast   5     0
Rainy      2     2
Sunny      3     2
Total      10    4
Likelihood table for the weather conditions:

Weather    No            Yes            P(Weather)
Overcast   0             5              5/14 = 0.35
Rainy      2             2              4/14 = 0.29
Sunny      2             3              5/14 = 0.35
All        4/14 = 0.29   10/14 = 0.71
Applying Bayes' theorem:
P(Yes|Sunny) = P(Sunny|Yes) × P(Yes) / P(Sunny)
P(Sunny|Yes) = 3/10 = 0.30
P(Sunny) = 0.35
P(Yes) = 0.71
So P(Yes|Sunny) = 0.30 × 0.71 / 0.35 = 0.60
Applying Bayes' theorem:
P(No|Sunny) = P(Sunny|No) × P(No) / P(Sunny)
P(Sunny|No) = 2/4 = 0.50
P(No) = 0.29
P(Sunny) = 0.35
So P(No|Sunny) = 0.50 × 0.29 / 0.35 = 0.41

As we can see from the above calculations, P(Yes|Sunny) > P(No|Sunny). Hence, on a sunny day, the player can play the game.
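A minimal Python sketch (standard library only) that reproduces the calculation above from the frequency table; the variable names are illustrative:

# counts taken from the frequency table above
sunny_yes, total_yes = 3, 10    # 3 of the 10 "Yes" days are sunny
sunny_no, total_no = 2, 4       # 2 of the 4 "No" days are sunny
total = 14                      # total number of days

p_yes = total_yes / total                   # P(Yes) = 10/14 ≈ 0.71
p_no = total_no / total                     # P(No) = 4/14 ≈ 0.29
p_sunny = (sunny_yes + sunny_no) / total    # P(Sunny) = 5/14 ≈ 0.35

p_yes_given_sunny = (sunny_yes / total_yes) * p_yes / p_sunny   # ≈ 0.60
p_no_given_sunny = (sunny_no / total_no) * p_no / p_sunny       # ≈ 0.40

print("Play" if p_yes_given_sunny > p_no_given_sunny else "Don't play")   # Play

(The exact value of P(No|Sunny) is 0.40; the 0.41 above comes from rounding 0.29 and 0.35 before dividing.)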
Applications of Naïve Bayes Classifier:
• Common applications include spam detection, medical diagnosis, image recognition, and natural language processing.

• Spam detection
• Spam detection is among the machine learning tasks where the Bayes theorem is most frequently used. By calculating the likelihood that a message is spam using the Bayes theorem, machine learning algorithms can precisely detect unwanted emails and block them from reaching a user's mailbox.
Advantages of Naïve Bayes Classifier:
• Naïve Bayes is one of the fastest and easiest ML algorithms for predicting the class of a dataset.

• It can be used for binary as well as multi-class classification.

• It performs well in multi-class predictions compared to the other algorithms.

• It is the most popular choice for text classification problems.
Disadvantages of Naïve Bayes Classifier:
• Naive Bayes assumes that all features are independent or
unrelated, so it cannot learn the relationship between features.
Applications of Naïve Bayes Classifier:
• It is used for credit scoring.

• It is used in medical data classification.

• It can be used for real-time predictions because the Naïve Bayes classifier is an eager learner.

• It is used in text classification such as spam filtering and sentiment analysis.
Applications of Naïve Bayes Classifier:
Medical Diagnosis
• The Bayes theorem is also used in healthcare to determine the likelihood that a patient has a specific condition based on their symptoms and medical history. This can help medical professionals make more accurate diagnoses and prescribe the best therapies.
Applications of Naïve Bayes Classifier:
• Image Recognition
• The Bayes theorem is used for identifying objects in photographs. Machine learning algorithms can classify photos and identify objects by calculating the likelihood that an object appears in a photograph based on its features.
Applications of Naïve Bayes Classifier:
• Natural Language Processing
• In natural language processing, the Bayes theorem is widely used to calculate the likelihood that a certain word or phrase will be used in a given situation.

• Programs that need to process natural language, such as speech recognition and machine translation, can benefit from this.
Types of Naïve Bayes Model:
• Bernoulli Naïve Bayes
• Multinomial Naïve Bayes
• Gaussian Naïve Bayes
Gaussian Naive Bayes classifier
• In Gaussian Naive Bayes, continuous values associated with each feature are assumed to be distributed according to a Gaussian distribution. A Gaussian distribution is also called a Normal distribution. When plotted, it gives a bell-shaped curve which is symmetric about the mean of the feature values.
Gaussian Naive Bayes classifier

• The likelihood of the features is assumed to be Gaussian; hence, the conditional probability is given by:

P(x_i | y) = (1 / √(2π σ_y²)) × exp( −(x_i − μ_y)² / (2σ_y²) )

where μ_y and σ_y² are the mean and variance of feature x_i for class y.
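As a rough sketch of how this formula is applied per feature, the Gaussian likelihood can be computed directly in Python; the mean and standard deviation below are illustrative stand-ins for statistics estimated from training data:

import math

def gaussian_likelihood(x, mu, sigma):
    # P(x | y) for one feature under the Gaussian assumption,
    # with mu and sigma estimated from the training samples of class y
    return math.exp(-(x - mu) ** 2 / (2 * sigma ** 2)) / math.sqrt(2 * math.pi * sigma ** 2)

print(gaussian_likelihood(5.0, mu=4.8, sigma=0.6))   # likelihood density of x = 5.0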
Naïve Bayes in Scikit-learn
# load the iris dataset
from sklearn.datasets import load_iris
iris = load_iris()

# store the feature matrix (X) and response vector (y)
X = iris.data
y = iris.target

# splitting X and y into training and testing sets
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.4, random_state=1)

# training the model on the training set
from sklearn.naive_bayes import GaussianNB
gnb = GaussianNB()
gnb.fit(X_train, y_train)

# making predictions on the testing set
y_pred = gnb.predict(X_test)
Naïve Bayes in Scikit-learn

# comparing actual response values (y_test) with predicted response values (y_pred)
from sklearn import metrics
print("Gaussian Naive Bayes model accuracy(in %):", metrics.accuracy_score(y_test, y_pred)*100)
Naïve Bayes in Scikit-learn
# Fitting Naive Bayes to the Training set
from sklearn.naive_bayes import GaussianNB   # import GaussianNB from sklearn.naive_bayes
classifier = GaussianNB()                    # create a GaussianNB classifier object
classifier.fit(x_train, y_train)             # fit the classifier to the training dataset

# Predicting the Test set results
y_pred = classifier.predict(x_test)

# Making the Confusion Matrix
from sklearn.metrics import confusion_matrix
cm = confusion_matrix(y_test, y_pred)
Bernoulli Naïve Bayes

• In the multivariate Bernoulli event model, features are independent Booleans (binary variables) describing inputs. Like the multinomial model, this model is popular for document classification tasks, where binary term-occurrence features (i.e. whether a word occurs in a document or not) are used rather than term frequencies (i.e. how often a word occurs in the document).

https://iq.opengenus.org/bernoulli-naive-bayes/
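A minimal sklearn sketch of the Bernoulli model, assuming a tiny made-up term-occurrence matrix (each entry marks whether a word occurs in a document):

import numpy as np
from sklearn.naive_bayes import BernoulliNB

# rows = documents, columns = binary word-occurrence indicators
X = np.array([[1, 0, 1, 0],
              [1, 1, 0, 0],
              [0, 0, 1, 1],
              [0, 1, 0, 1]])
y = np.array([0, 0, 1, 1])   # document class labels

clf = BernoulliNB()
clf.fit(X, y)
print(clf.predict(np.array([[1, 0, 0, 1]])))   # class of a new document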
Question: Using Naïve Bayes, find the probability of buys_computer = yes or no for the instance X = (age = youth, income = medium, student = yes, credit_rating = fair); that is, predict its class label (yes or no).
Solution:
• P(C1) = P(buys_computer = yes) = 9/14 = 0.643 (since 9 of the 14 training rows are yes)

• P(C2) = P(buys_computer = no) = 5/14 = 0.357

• Here we have x1 = age, x2 = income, x3 = student, x4 = credit_rating

• P(age = youth | buys_computer = yes) = 2/9 = 0.222

• P(age = youth | buys_computer = no) = 3/5 = 0.600

• P(income = medium | buys_computer = yes) = 4/9 = 0.444

• P(income = medium | buys_computer = no) = 2/5 = 0.400
Solution:
• P(student = yes | buys_computer = yes) = 6/9 = 0.667

• P(student = yes | buys_computer = no) = 1/5 = 0.200

• P(credit_rating = fair | buys_computer = yes) = 6/9 = 0.667

• P(credit_rating = fair | buys_computer = no) = 2/5 = 0.400
Solution:
• P(X | buys_computer = yes) =
P(age = youth | buys_computer = yes) ×
P(income = medium | buys_computer = yes) ×
P(student = yes | buys_computer = yes) ×
P(credit_rating = fair | buys_computer = yes)
= 0.222 × 0.444 × 0.667 × 0.667 = 0.044

Similarly, P(X | buys_computer = no) = 0.600 × 0.400 × 0.200 × 0.400 = 0.019
Solution:
• P(X | buys_computer = yes) × P(buys_computer = yes) = 0.044 × 0.643 = 0.028

• P(X | buys_computer = no) × P(buys_computer = no) = 0.019 × 0.357 = 0.007

• P(X | buys_computer = yes) × P(buys_computer = yes) > P(X | buys_computer = no) × P(buys_computer = no)

• Therefore, the naive Bayesian classifier predicts buys_computer = yes for instance X = (age = youth, income = medium, student = yes, credit_rating = fair).
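The arithmetic in this solution can be verified with a few lines of Python; the probabilities are copied from the counts above:

# class priors
p_yes, p_no = 9/14, 5/14

# conditional probabilities for X = (youth, medium, student=yes, fair)
likelihood_yes = (2/9) * (4/9) * (6/9) * (6/9)   # ≈ 0.044
likelihood_no = (3/5) * (2/5) * (1/5) * (2/5)    # ≈ 0.019

print(likelihood_yes * p_yes)   # ≈ 0.028
print(likelihood_no * p_no)     # ≈ 0.007
print("yes" if likelihood_yes * p_yes > likelihood_no * p_no else "no")   # yes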
Multinomial Naïve Bayes
• Feature vectors represent the frequencies with which certain events have been generated by a multinomial distribution. This is the event model typically used for document classification.

• Application:

• To find the number of times a particular word is repeated in a text document, use the multinomial distribution.

https://www.upgrad.com/blog/multinomial-naive-bayes-explained/
Multinomial Naïve Bayes

• Application:

• To find the number of times a particular word is repeated in a text document

• To find the count of a particular word in a text document

• To find the frequency of a particular word in a text document

• To find the number of occurrences of a word in a text document
Multinomial Naïve Bayes
import numpy as np

# 8 samples with 100 count-valued features (random integers in [0, 8))
X = np.random.randint(8, size=(8, 100))
y = np.array([1, 2, 3, 4, 5, 6, 7, 8])   # one class label per sample

from sklearn.naive_bayes import MultinomialNB
MNBclf = MultinomialNB()
MNBclf.fit(X, y)   # fit the multinomial model to the count data
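The fitted model can then be used to score new count vectors; since X above is random data, this prediction is only an illustrative check that the pipeline runs:

print(MNBclf.predict(X[:1]))   # predict the class of the first training sample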
Gaussian Naïve Bayes

• Gaussian Naive Bayes (GNB) is a classification technique used in Machine Learning (ML) based on the probabilistic approach and the Gaussian distribution. Gaussian Naive Bayes assumes that each parameter (also called a feature or predictor) has an independent capacity to predict the output variable.
Gaussian Naïve Bayes
• Gaussian Naive Bayes is a variant of Naive Bayes that follows the Gaussian (normal) distribution and supports continuous data.

• Naive Bayes classifiers are a group of supervised machine learning classification algorithms based on the Bayes theorem. They are simple classification techniques, but have high functionality. They find use when the dimensionality of the inputs is high. Complex classification problems can also be handled by a Naive Bayes classifier.
Gaussian Naïve Bayes
from sklearn.naive_bayes import GaussianNB
classifier = GaussianNB()
classifier.fit(x_train, y_train)

# Predicting the Test set results
y_pred = classifier.predict(x_test)

# Accuracy score
from sklearn.metrics import accuracy_score
accuracy_score(y_test, y_pred)
Gaussian Naïve Bayes
• Good examples:

• https://www.youtube.com/watch?v=kufuBE6TJew

• https://levelup.gitconnected.com/classification-using-gaussian-naive-bayes-from-scratch-6b8ebe830266
Gaussian Naïve Bayes using sklearn
# fitting naive bayes to the training set
from sklearn.naive_bayes import GaussianNB
classifier = GaussianNB()
classifier.fit(X_train, y_train)

# predicting test set results
y_pred = classifier.predict(X_test)

# making the confusion matrix
from sklearn.metrics import confusion_matrix
cm = confusion_matrix(y_test, y_pred)
cm
Thank You
