You are on page 1of 5

A Seminar Report on

Naive Bays classifier

Submitted by
Vijayendra Reddy
15CS06F
&
Himanshu Patel
15CS09F
II Sem M.Tech (CSE)
Under CS867
in partial fulfillment for the award of the degree o
MASTER OF TECHNOLOGY
in
COMPUTER SCIENCE & ENGINEERING

Department of Computer Science & Engineering


National Institute of Technology Karnataka, Surathkal.

April 2016
1. INTRODUCTION
1

Naive Bayes is a collection of classification algorithms based on Bayes Theorem. It is


not a single algorithm but a family of algorithms that all share a common principle, that every
feature being classified is independent of the value of any other feature. Naive Bayes model
is easy to build and particularly useful for very large data sets. Along with simplicity, Naive
Bayes is known to outperform even highly sophisticated classification methods. Bayes
theorem provides a way of calculating posterior probability P(c|x) from P(c), P(x) and P(x|c).
Look at the equation below:
P(cx )=

P (cx ) P(c )
P( x )

P ( c / X )=P ( x 1 / c ) P ( x 2 /c ) P ( x n /c ) P ( c )
From the above equation:
P(c|x) is the posterior probability of class (c, target) given predictor (x, attributes).
P(c) is the prior probability of class.
P(x|c) is the likelihood which is the probability of predictor given class.
P(x) is the prior probability of predictor.
A simple example best explains the application of Naive Bayes for classification[2]. It
explains the concept really well and runs through the simple maths behind it. So, let's say we
have data on 1000 pieces of fruit. The fruit being a Banana, Orange or some Other fruit and
imagine we know 3 features of each fruit, whether its long or not, sweet or not and yellow or
not, as displayed in the table below:

[Figure.1.1 Table of Fruit features and count]


2

From the table it is clear that 50% of the fruits are bananas, 30% are oranges and 20% are
other fruits. Based on our training set we can also say the following:
From 500 bananas 400 (0.8) are Long, 350 (0.7) are Sweet and 450 (0.9) are Yellow
Out of 300 oranges 0 are Long, 150 (0.5) are Sweet and 300 (1) are Yellow
From the remaining 200 fruits, 100 (0.5) are Long, 150 (0.75) are Sweet and 50 (0.25)
are Yellow
So now were given the features of a piece of fruit and we need to predict the class. If were
told that the additional fruit is Long, Sweet and Yellow, we can classify it using the following
formula and subbing in the values for each outcome, whether its a Banana, an Orange or
Other Fruit. The one with the highest probability(score) being the winner. from below result
based on the higher score 0.01875 < 0.252, we can conclude this Long, Sweet and Yellow
fruit classified as a Banana.

1.
BananaLong , Sweet ,
P( LongBanana) P(Sweet Banana) P(Banana) P(Banana)
P()=
P (Evidences)

0.8 0.7 0.9 0.5


0.2 5 2
=
P(Evidences)
P(Evidences)
OrangeLong , Sweet ,
P()=0

2.

3.

othersLong , Sweet ,
P( L ongothers) P(Sweet others) P(others) P(others)
P()=
P ( Evidences)
0. 5 0.7 5 0.25 0.2
0. 01875
=
P( Evidences)
P(Evidences)

1.1. Pros and Cons of Naive Bayes:


Pros:
1. It is easy and fast to predict class of test data set.
2. It also perform well in multi class prediction When assumption of independence
holds, a Naive Bayes classifier performs better compare to other models like logistic
regression and you need less training data.

3. It perform well in case of categorical input variables compared to numerical


variable(s). For numerical variable, normal distribution is assumed (bell curve, which
is a strong assumption).
Cons:
1. If categorical variable has a category (in test data set), which was not observed in training
data set, then model will assign a 0 (zero) probability and will be unable to make a
prediction. This is often known as Zero Frequency. To solve this, we can use the
smoothing technique. One of the simplest smoothing techniques is called Laplace
estimation.
2. Another limitation of Naive Bayes is the assumption of independent predictors. In real
life, it is almost impossible that we get a set of predictors which are completely
independent.

1.2. Applications of Naive Bayes Algorithms:

Real time Prediction: Naive Bayes is an eager learning classifier and it is sure fast.
Thus, it could be used for making predictions in real time.

Multi class Prediction: This algorithm is also well known for multi class prediction
feature. Here we can predict the probability of multiple classes of target variable.

Text classification/ Spam Filtering/ Sentiment Analysis: Naive Bayes classifiers


mostly used in text classification (due to better result in multi class problems and
independence rule) have higher success rate as compared to other algorithms. As a
result, it is widely used in Spam filtering (identify spam e-mail) and Sentiment
Analysis (in social media analysis, to identify positive and negative customer
sentiments)

Recommendation System: Naive Bayes Classifier and Collaborative


Filtering together builds a Recommendation System that uses machine learning and
data mining techniques to filter unseen information and predict whether a user would
like a given resource or not.

2. SENTIMENT ANALYSIS USING NAVE BAYES


4

References
[1].

Naive Bayes classifier https://en.wikipedia.org/wiki/Naive_Bayes_classifier

[2].

www.datasciencecentral.com/profiles/blogs/

[3]. Yaguang Wang, Wenlong Fu,Aina Sui,Yuqing Ding, Comparison of Four Text
classifiers on Movie Reviews2015 3rd International Conference on Applied
Computing and Information Technology/2nd International Conference on
Computational Science and Intelligence 2015 IEEEMohamed Hamdi, Security of
Cloud Computing, Storage, and Networking, 2012 IEEE