
Naive Bayes Model

The naive Bayes model is a prediction method based on Bayes' theorem. Given an
existing set of data and a new set of conditions that we want to predict the outcome for, we
calculate the posterior probability of each outcome and select the one with the highest
probability. To make the calculation realistically possible, this model is based on the
assumption that all of the random variables in the data set are conditionally independent
given the outcome. (This is the "naive" part of the naive Bayes model.)
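
In symbols: writing Z for the outcome and X_1, ..., X_n for the attribute variables, the rule
amounts to predicting the value z that maximizes

P(Z = z) P(X_1 = x_1 | Z = z) ··· P(X_n = x_n | Z = z)

which is exactly the quantity computed in the examples below.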

Definition of conditional probability:

P(X | Y) = P(X, Y) / P(Y)

Given three random variables, X, Y, and Z, X is said to be conditionally independent of Y
given Z if and only if the following holds:

P(X | Y, Z) = P(X | Z)
To get an intuitive feeling for conditional independence, consider the following example:
take three boolean-valued ('occurred' or 'did not occur') random variables, Thunder, Lightning,
and Rain. We could say that Thunder is conditionally independent of Rain given Lightning, because
lightning directly causes thunder. Once we know that lightning has occurred, adding in the fact
that it is also raining doesn't change how likely it is that there will be thunder.

It is also important to note that if X is conditionally independent of Y given Z, this does
not imply that X and Y are independent in general, and vice versa. For example, Thunder may
be conditionally independent of Rain given Lightning, but Thunder and Rain are obviously going
to be positively correlated because they tend to happen together.
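
A quick way to convince yourself of this distinction is to enumerate a small joint distribution.
The following Python sketch uses made-up numbers with the "common cause" structure described
above (Lightning drives both Thunder and Rain); the specific probabilities are our own
illustration, not data from this lesson:

import itertools

# Made-up distribution with a 'common cause' structure: Thunder and Rain
# each depend only on Lightning.
P_L = {1: 0.3, 0: 0.7}            # P(Lightning = l)
P_T_given_L = {1: 0.9, 0: 0.05}   # P(Thunder = 1 | Lightning = l)
P_R_given_L = {1: 0.8, 0: 0.2}    # P(Rain = 1 | Lightning = l)

def joint(l, t, r):
    """P(Lightning=l, Thunder=t, Rain=r) under the structure above."""
    pt = P_T_given_L[l] if t else 1 - P_T_given_L[l]
    pr = P_R_given_L[l] if r else 1 - P_R_given_L[l]
    return P_L[l] * pt * pr

def prob(event):
    """Total probability of all (l, t, r) outcomes satisfying event."""
    return sum(joint(l, t, r)
               for l, t, r in itertools.product([0, 1], repeat=3)
               if event(l, t, r))

# Conditionally independent: P(T | L, R) equals P(T | L)...
print(prob(lambda l, t, r: l and t and r) / prob(lambda l, t, r: l and r))  # 0.9
print(prob(lambda l, t, r: l and t) / prob(lambda l, t, r: l))              # 0.9

# ...yet marginally dependent: P(T | R) differs from P(T).
print(prob(lambda l, t, r: t and r) / prob(lambda l, t, r: r))  # ≈ 0.587
print(prob(lambda l, t, r: t))                                  # 0.305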

Example - Discrete

Given the following data set (based on the data set for 1st/Major Rule):

Outlook (X_O)    Yes   No
  Sunny           2     3
  Cloudy          4     0
  Rainy           3     2

Temp (X_T)       Yes   No
  High            2     2
  Mild            4     2
  Cool            3     1

Humidity (X_H)   Yes   No
  High            3     4
  Normal          6     1

Windy (X_W)      Yes   No
  True            3     3
  False           6     2

Suppose we have a new input, X_O = S, X_T = C, X_H = H, X_W = T, and we want to predict the
outcome, Z. As stated previously, we do this by calculating the posterior probability of each
of the two outcomes.

Note: "Posterior probability" is just the term for the conditional probability in the context of
Bayes' theorem. The reasoning behind the terminology is unimportant here, but anyone who is
curious can find an explanation here.

If the following statement is true, we would predict that the outcome is yes; otherwise we would
predict that the outcome is no:

P(Z=Y | X_O=S, X_T=C, X_H=H, X_W=T) > P(Z=N | X_O=S, X_T=C, X_H=H, X_W=T)

Applying Bayes' theorem we get the following:

P(X_O=S, X_T=C, X_H=H, X_W=T | Z=Y) P(Z=Y)     P(X_O=S, X_T=C, X_H=H, X_W=T | Z=N) P(Z=N)
------------------------------------------  >  ------------------------------------------
       P(X_O=S, X_T=C, X_H=H, X_W=T)                  P(X_O=S, X_T=C, X_H=H, X_W=T)

The denominator is positive because it is a probability, so since all we're concerned about is
which side is bigger, we can multiply both sides by it:

P(X_O=S, X_T=C, X_H=H, X_W=T | Z=Y) P(Z=Y) > P(X_O=S, X_T=C, X_H=H, X_W=T | Z=N) P(Z=N)

Finally, we apply our assumption that the random variables are conditionally independent to get
the following (for space reasons only the Z=Y side is shown here, but keep in mind that this is
happening on both sides):

P(X_O=S | Z=Y) P(X_T=C | Z=Y) P(X_H=H | Z=Y) P(X_W=T | Z=Y) P(Z=Y)

All of these values can be directly calculated from our data. Doing so, we get the answer:

Z=Y: (2/9)(3/9)(3/9)(3/9)(9/14) ≈ 0.0053
Z=N: (3/5)(1/5)(4/5)(3/5)(5/14) ≈ 0.0206

In conclusion: given this input, the posterior probability that Z=N is greater than the posterior
probability that Z=Y, so we would predict the outcome Z=N.
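
For anyone who wants to check the arithmetic, here is a minimal Python sketch of the whole
discrete calculation (the counts dictionary and the score function are our own illustrative
encoding of the tables above, not code from the lesson):

# Each attribute value maps to [yes_count, no_count], copied from the tables above.
counts = {
    "outlook":  {"sunny": [2, 3], "cloudy": [4, 0], "rainy": [3, 2]},
    "temp":     {"high": [2, 2], "mild": [4, 2], "cool": [3, 1]},
    "humidity": {"high": [3, 4], "normal": [6, 1]},
    "windy":    {"true": [3, 3], "false": [6, 2]},
}
N_YES, N_NO = 9, 5  # class totals (the columns of any one table sum to these)

def score(x, z):
    """Unnormalized posterior: P(Z=z) times the product of P(X_attr = value | Z=z)."""
    col, total = (0, N_YES) if z == "yes" else (1, N_NO)
    p = total / (N_YES + N_NO)                 # the prior P(Z = z)
    for attr, value in x.items():
        p *= counts[attr][value][col] / total  # the likelihood P(X_attr = value | Z = z)
    return p

x = {"outlook": "sunny", "temp": "cool", "humidity": "high", "windy": "true"}
print(score(x, "yes"))  # ≈ 0.0053
print(score(x, "no"))   # ≈ 0.0206  -> predict Z = N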

Example - Missing Input


Often in the real world we have to work with incomplete data, so what do we do when we are
missing some attribute? The answer is simple: ignore it. For example, let's take the input from
the previous example and pretend it didn't have the outlook information. We'd have X_T = C,
X_H = H, X_W = T, and we'd use the following inequality:

P(Z=Y | X_T=C, X_H=H, X_W=T) > P(Z=N | X_T=C, X_H=H, X_W=T)

Following the same procedure, we get the following answer:

Z=Y: (3/9)(3/9)(3/9)(9/14) ≈ 0.0238
Z=N: (1/5)(4/5)(3/5)(5/14) ≈ 0.0343

In this case, we have the same prediction as before: Z=N.
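
In the score sketch from the discrete example this costs nothing extra: a missing attribute just
means leaving its key out of the input dictionary, so its factor never enters the product (this
snippet assumes counts and score from that sketch are already defined):

# Same input as before, minus the outlook key.
x_missing = {"temp": "cool", "humidity": "high", "windy": "true"}
print(score(x_missing, "yes"))  # ≈ 0.0238
print(score(x_missing, "no"))   # ≈ 0.0343  -> still predict Z = N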

Example - Continuous attribute values


Let’s say that our temperature attribute was given as actual temperatures rather than
categories.

Temperature (X_T)
  Yes: 83, 70, 68, 64, 69, 75, 72, 81, 75
  No:  85, 80, 65, 72, 71

Let's start from the input X_O = S, X_T = 67, X_H = H, X_W = T and try to proceed as we did
when all the attributes were discrete, starting from the following:

P(Z=Y | X_O=S, X_T=67, X_H=H, X_W=T) > P(Z=N | X_O=S, X_T=67, X_H=H, X_W=T)

We can get all the way to the end of the derivation, but there we reach a problem (note that, as
before, for space reasons only the Z=Y side is shown):

P(X_O=S | Z=Y) P(X_T=67 | Z=Y) P(X_H=H | Z=Y) P(X_W=T | Z=Y) P(Z=Y)

Not only do we have no cases where X_T = 67, but X_T is also continuous, so P(X_T=x | Z=Y) = 0
for all x. To deal with this we use the probability density function (PDF) instead.

But what is the PDF here? Because we don't have any more information about the
temperature's distribution, we'll assume that it's a normal distribution. Calculating the means
and (sample) variances from the above data set, we find the following distributions:

Z=Y: mean μ = 73.0, variance σ² = 38.0 (σ ≈ 6.2)
Z=N: mean μ = 74.6, variance σ² ≈ 62.3 (σ ≈ 7.9)

Plugging x = 67 into the normal PDF, f(x) = exp(-(x - μ)² / (2σ²)) / √(2πσ²), gives
f(67 | Z=Y) ≈ 0.0403 and f(67 | Z=N) ≈ 0.0318. Using these densities in place of P(X_T=67 | Z):

Z=Y: (2/9)(0.0403)(3/9)(3/9)(9/14) ≈ 0.00064
Z=N: (3/5)(0.0318)(4/5)(3/5)(5/14) ≈ 0.0033

So once again we come to the conclusion that we should predict Z=N.
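
As before, here is a small Python sketch of this version of the calculation (normal_pdf and the
hard-coded discrete factors are our own illustration; it uses the sample variance, dividing by
n - 1, which is what the numbers above assume):

import math

temps_yes = [83, 70, 68, 64, 69, 75, 72, 81, 75]
temps_no = [85, 80, 65, 72, 71]

def normal_pdf(x, data):
    """Normal density at x, using the sample mean and variance of data."""
    n = len(data)
    mean = sum(data) / n
    var = sum((v - mean) ** 2 for v in data) / (n - 1)
    return math.exp(-(x - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

# Unnormalized posteriors for X_O = S, X_T = 67, X_H = H, X_W = T,
# with the density standing in for the temperature likelihood.
p_yes = (9 / 14) * (2 / 9) * normal_pdf(67, temps_yes) * (3 / 9) * (3 / 9)
p_no = (5 / 14) * (3 / 5) * normal_pdf(67, temps_no) * (4 / 5) * (3 / 5)
print(p_yes)  # ≈ 0.00064
print(p_no)   # ≈ 0.0033  -> predict Z = N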
