The naive Bayes model is a prediction method based on Bayes' theorem. Given an
existing set of data and a new set of conditions whose outcome we want to predict, we
calculate the posterior probability of each outcome and select the highest one.
To make the calculation process realistically possible, this model is based on the
assumption that all of the random variables in the data set are conditionally independent
given the outcome. (This is the "naive" part of the naive Bayes model.)
To get an intuitive feeling for conditional independence consider the following example:
take three boolean-valued (‘occurred’ or ‘did not occur’) random variables, Thunder, Lightning,
and Rain. We could say that Thunder is independent of Rain given Lightning because lightning
directly causes thunder. Once we know that lightning has occurred, adding in the fact that it is
also raining doesn’t change how likely it is that there will be thunder.
It is also important to note that if X is conditionally independent of Y given Z, this does
not imply that X and Y are independent in general and vice versa. For example, Thunder may
be independent of Rain given Lightning, but Thunder and Rain are obviously going to be
positively correlated because they tend to happen together.
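This distinction can be checked numerically. Below is a small sketch with a made-up joint distribution (the probabilities are illustrative assumptions, not data from this document) in which Thunder and Rain are conditionally independent given Lightning by construction, yet are still dependent overall:

```python
# Toy distribution (made-up numbers): Thunder and Rain each depend only on
# Lightning, so they are conditionally independent given Lightning.
p_lightning = 0.1
p_thunder_given = {True: 0.9, False: 0.01}   # P(Thunder | Lightning)
p_rain_given = {True: 0.8, False: 0.2}       # P(Rain | Lightning)

def p_l(l):
    """P(Lightning = l)."""
    return p_lightning if l else 1 - p_lightning

# Marginal probabilities, summing over Lightning.
p_thunder = sum(p_l(l) * p_thunder_given[l] for l in (True, False))
p_rain = sum(p_l(l) * p_rain_given[l] for l in (True, False))

# Conditional independence lets us factor the joint inside the sum.
p_both = sum(p_l(l) * p_thunder_given[l] * p_rain_given[l] for l in (True, False))

print(p_both)              # ~0.0738
print(p_thunder * p_rain)  # ~0.0257 -- not equal, so marginally dependent
```

Since P(Thunder and Rain) exceeds P(Thunder)P(Rain), the two events are positively correlated even though they are conditionally independent given Lightning.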
Example - Discrete
Given the following data set (based on the data set for 1st/Major Rule):

Outlook (XO)      Yes   No
Sunny               2    3
Cloudy              4    0
Rainy               3    2

Temperature (XT)  Yes   No
High                2    2
Mild                4    2
Cool                3    1

Humidity (XH)     Yes   No
High                3    4
Normal              6    1

Wind (XW)         Yes   No
True                6    2
False               3    3
Suppose we have a new input, XO = S, XT = C, XH = H, XW = T, and we want to predict the
outcome, Z. As stated previously, we do this by calculating the posterior probability of both
options.
Note: "posterior probability" is just the name for this conditional probability in the context of
Bayes' theorem. The reasoning behind the terminology is unimportant here.
If the following statement is true, we would predict the outcome is yes; otherwise we would
predict the outcome is no:

P(Z=Y | XO=S, XT=C, XH=H, XW=T) > P(Z=N | XO=S, XT=C, XH=H, XW=T)
Applying Bayes' theorem to each side gives

P(XO=S, XT=C, XH=H, XW=T | Z=Y) P(Z=Y) / P(XO=S, XT=C, XH=H, XW=T)
> P(XO=S, XT=C, XH=H, XW=T | Z=N) P(Z=N) / P(XO=S, XT=C, XH=H, XW=T)

The denominator is positive because it is a probability, so since all we're concerned about is
which side is bigger we can multiply it out:

P(XO=S, XT=C, XH=H, XW=T | Z=Y) P(Z=Y) > P(XO=S, XT=C, XH=H, XW=T | Z=N) P(Z=N)
Finally, we apply our assumption that the random variables are conditionally independent to get
the following (for space reasons only the Z=Y part is shown here, but keep in mind that this is
happening on both sides):

P(XO=S | Z=Y) P(XT=C | Z=Y) P(XH=H | Z=Y) P(XW=T | Z=Y) P(Z=Y)
All of these values can be directly calculated from our data. Doing so, we get the answer:

P(XO=S | Z=Y) P(XT=C | Z=Y) P(XH=H | Z=Y) P(XW=T | Z=Y) P(Z=Y) = (2/9)(3/9)(3/9)(6/9)(9/14) ≈ 0.0106
P(XO=S | Z=N) P(XT=C | Z=N) P(XH=H | Z=N) P(XW=T | Z=N) P(Z=N) = (3/5)(1/5)(4/5)(2/5)(5/14) ≈ 0.0137

In conclusion: given this input, the posterior probability that Z=N is greater than the posterior
probability that Z=Y, so we would predict the outcome Z=N.
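The whole discrete calculation can be sketched in a few lines of code. This is a minimal sketch using the counts from the tables above; the dictionary layout and the function name `score` are my own choices for illustration:

```python
# Counts per class from the tables: 9 yes rows, 5 no rows in total.
counts = {
    "Y": {"XO": {"S": 2, "C": 4, "R": 3},
          "XT": {"H": 2, "M": 4, "C": 3},
          "XH": {"H": 3, "N": 6},
          "XW": {"T": 6, "F": 3}},
    "N": {"XO": {"S": 3, "C": 0, "R": 2},
          "XT": {"H": 2, "M": 2, "C": 1},
          "XH": {"H": 4, "N": 1},
          "XW": {"T": 2, "F": 3}},
}
class_total = {"Y": 9, "N": 5}

def score(z, x):
    """Unnormalised posterior: P(Z=z) times the product of P(Xi=xi | Z=z)."""
    p = class_total[z] / sum(class_total.values())
    for attr, value in x.items():
        p *= counts[z][attr][value] / class_total[z]
    return p

x = {"XO": "S", "XT": "C", "XH": "H", "XW": "T"}
print(score("Y", x))  # ~0.0106
print(score("N", x))  # ~0.0137 -> predict Z=N
```

Because both sides share the same denominator, comparing these unnormalised scores is enough to pick the prediction.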
Temperature (XT)
Yes: 83, 70, 68, 64, 69, 75, 72, 81, 75
No: 85, 80, 65, 72, 71
Let's start from the input XO = S, XT = 67, XH = H, XW = T and try to proceed as we did
when all the attributes were discrete.
Starting from the following:

P(Z=Y | XO=S, XT=67, XH=H, XW=T) > P(Z=N | XO=S, XT=67, XH=H, XW=T)
We can get all the way to the end of the derivation, but there we reach a problem (note that as
before, for space reasons only the Z=Y part is shown):

P(XO=S | Z=Y) P(XT=67 | Z=Y) P(XH=H | Z=Y) P(XW=T | Z=Y) P(Z=Y)

Not only do we have no cases where XT = 67, but XT is also continuous, so P(XT=x | Z=Y) = 0 for
all x. To deal with this we use the probability density function (PDF) instead.
But what is the PDF here? Because we don't have any more information about the
temperature's distribution, we'll assume that it is normally distributed. Calculating the means and
(sample) variances from the above data set, we find the following distributions:

XT | Z=Y ~ Normal(mean = 73.0, variance = 38.0)
XT | Z=N ~ Normal(mean = 74.6, variance ≈ 62.3)
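The mean, variance, and density substitution can be checked with a short script. This is a sketch assuming the normal-PDF approach described above, with the sample (n−1) variance; the helper names `mean_var` and `normal_pdf` are my own:

```python
import math

# Temperatures from the table, split by class.
temps_yes = [83, 70, 68, 64, 69, 75, 72, 81, 75]
temps_no = [85, 80, 65, 72, 71]

def mean_var(xs):
    """Sample mean and (n-1) sample variance."""
    m = sum(xs) / len(xs)
    v = sum((x - m) ** 2 for x in xs) / (len(xs) - 1)
    return m, v

def normal_pdf(x, m, v):
    """Density of Normal(m, v) at x -- stands in for the zero probability P(XT=x)."""
    return math.exp(-(x - m) ** 2 / (2 * v)) / math.sqrt(2 * math.pi * v)

m_y, v_y = mean_var(temps_yes)  # mean 73.0, variance 38.0
m_n, v_n = mean_var(temps_no)   # mean 74.6, variance ~62.3

print(normal_pdf(67, m_y, v_y))  # density used in place of P(XT=67 | Z=Y)
print(normal_pdf(67, m_n, v_n))  # density used in place of P(XT=67 | Z=N)
```

These densities then slot into the product in place of the XT factor, with every other factor computed from the discrete tables exactly as before.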