
Naive Bayes Classification Problem

Himanshu Saini

November 2022

Naive Bayes is a statistical classification technique based on Bayes' theorem and one of the simplest Supervised Learning algorithms. The Naive Bayes classifier is a quick, accurate, and trustworthy method, especially on large datasets.

A Naive Bayes classifier assumes that the effect of a particular feature in a class is
independent of other features and is based on Bayes’ theorem. Bayes’ theorem is a
mathematical equation used in probability and statistics to calculate conditional
probability. In other words, you can use this theorem to calculate the probability
of an event based on its association with another event.
The formula of Bayes' theorem is:

P(A | B) = P(B | A) P(A) / P(B)

where A and B are events and P(B) is not equal to zero.

• P(A | B): the conditional probability of event A occurring given that B is true (the posterior probability).
• P(B | A): the conditional probability of event B occurring given that A is true (the likelihood).
• P(A) and P(B): the probabilities of A and B each occurring on their own (the marginal probabilities).
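As a minimal sketch, the theorem translates directly into code. The medical-test numbers below are purely illustrative and not from this article:

```python
# Bayes' theorem: P(A | B) = P(B | A) * P(A) / P(B).
def bayes(p_b_given_a, p_a, p_b):
    """Return P(A|B) from the likelihood P(B|A), prior P(A), and evidence P(B)."""
    if p_b == 0:
        raise ValueError("P(B) must be non-zero")
    return p_b_given_a * p_a / p_b

# Illustrative example: a test detects a disease 90% of the time
# (P(positive | disease) = 0.9), the disease affects 1% of people
# (P(disease) = 0.01), and 5% of all tests come back positive
# (P(positive) = 0.05). What is P(disease | positive)?
print(round(bayes(0.9, 0.01, 0.05), 4))  # 0.18
```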
What is Naive Bayes Classification?
The Naive Bayes classification algorithm is a probabilistic classifier and belongs to Supervised Learning. It is based on probability models that incorporate strong independence assumptions. These independence assumptions often do not hold in reality, which is why they are considered naive.

When to use Naive Bayes Classifier?

Naive Bayes classifiers tend to perform especially well in any of the following
situations:

• When the naive independence assumptions actually match the data.
• For very well-separated categories, when model complexity is less important.
• For very high-dimensional data, when model complexity is again less important.

Let us now classify an event using the Naive Bayes method by working through a simple example. Given records of weather conditions and whether sports were played, we need to calculate the probability of playing and of not playing, and classify whether players will play depending on the weather condition. The sample data set contains 14 rows, each recording the weather condition of a day and whether sports were played that day.
You can use frequency and likelihood tables to simplify the prior and posterior probability calculations. The frequency table shows how often each class label occurs with each weather condition; these tables will assist us in calculating the prior and posterior probabilities.
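As a sketch, the frequency table can be built in a few lines of Python. The 14 rows below are an assumption (they follow the classic weather/play data set), but they are consistent with every count used in this example, i.e. 4 Overcast days, 9 "Yes" labels, and 5 "No" labels:

```python
from collections import Counter

# Assumed sample data: the classic 14-row weather/play data set,
# consistent with all counts used in this example.
weather = ["Sunny", "Sunny", "Overcast", "Rainy", "Rainy", "Rainy", "Overcast",
           "Sunny", "Sunny", "Rainy", "Sunny", "Overcast", "Overcast", "Rainy"]
play    = ["No", "No", "Yes", "Yes", "Yes", "No", "Yes",
           "No", "Yes", "Yes", "Yes", "Yes", "Yes", "No"]

# Frequency table: how often each (weather, class) pair occurs.
freq = Counter(zip(weather, play))
print(freq[("Overcast", "Yes")])  # 4
print(freq[("Overcast", "No")])   # 0
print(Counter(play)["Yes"])       # 9
print(Counter(play)["No"])        # 5
```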

The Naive Bayes classification formula for this case is:

P(Yes | Overcast) = P(Overcast | Yes) P(Yes) / P(Overcast)

Now let us first calculate the prior probabilities:

P(Overcast) = 4/14 = 0.29

P(Yes) = 9/14 = 0.64

The next step is to find the likelihood, i.e. the conditional probability of the feature given the class, which can easily be read from the frequency table:

P(Overcast | Yes) = 4/9 = 0.44


Once we have the likelihood and the prior probabilities, we can put them back into our main formula to calculate the probability of playing when the weather is overcast:

P(Yes | Overcast) = 0.44 * 0.64 / 0.29 ≈ 0.97

(With unrounded fractions the result is exactly 1, since every overcast day in the data set is a "Yes".)

Probability of Not Playing:

Similarly, we can calculate the probability of not playing any sports when the weather is overcast.

First, let us calculate the prior probabilities:

P(Overcast) = 4/14 = 0.29

P(No) = 5/14 = 0.36

The next step is to calculate the likelihood. Since none of the 5 "No" days is overcast, it is:

P(Overcast | No) = 0/5 = 0


Putting these probabilities into the main formula, we get:

P(No | Overcast) = 0 * 0.36 / 0.29 = 0

We can see that the probability of the "Yes" (playing) class is higher, so if the weather is overcast, players will play sports.
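The two class calculations above can be combined into one short Python sketch. Using the unrounded fractions, the posterior for "Yes" comes out to exactly 1, which the rounded decimals (0.44 * 0.64 / 0.29 ≈ 0.97) only approximate:

```python
# Posteriors for both classes given Overcast weather, using the exact
# fractions behind the rounded figures in the example above.
p_overcast = 4 / 14            # P(Overcast): 4 of 14 days are overcast
p_yes, p_no = 9 / 14, 5 / 14   # class priors P(Yes), P(No)
p_overcast_given_yes = 4 / 9   # likelihood P(Overcast | Yes)
p_overcast_given_no = 0 / 5    # likelihood P(Overcast | No)

# Bayes' theorem: P(class | Overcast) = P(Overcast | class) * P(class) / P(Overcast)
p_yes_given_overcast = p_overcast_given_yes * p_yes / p_overcast
p_no_given_overcast = p_overcast_given_no * p_no / p_overcast

print(round(p_yes_given_overcast, 2))  # 1.0
print(round(p_no_given_overcast, 2))   # 0.0
print("Play" if p_yes_given_overcast > p_no_given_overcast else "Don't play")  # Play
```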
