Naive Bayes Classifier Algorithm & KNN Algorithm
Introduction
○ Bayes' theorem, also known as Bayes' Rule or Bayes' law, is used to determine the
probability of a hypothesis with prior knowledge. It depends on conditional probability.
○ The formula for Bayes' theorem is given as:
P(A|B) = P(B|A) · P(A) / P(B)
Where,
P(A|B) is Posterior probability: Probability of hypothesis A given the observed evidence B.
P(B|A) is Likelihood probability: Probability of the evidence given that the hypothesis is true.
P(A) is Prior probability: Probability of the hypothesis before the evidence is observed.
P(B) is Marginal probability: Probability of the evidence.
Naive Bayes Classifier
Understanding Naive Bayes Classifier
● Based on the Bayes theorem, the Naive Bayes Classifier gives the
conditional probability of an event A given event B.
● Let us use the following demo to understand the concept of a
Naive Bayes classifier:
● Shopping Example
● Problem statement: To predict whether a person will purchase a
product on a specific combination of day, discount, and free
delivery using a Naive Bayes classifier.
For Bayes theorem, let the event ‘buy’ be A and the independent variables
(discount, free delivery, day) be B.
Considering a combination of the following factors
The likelihood tables can be used to calculate whether a customer will purchase a product
for a specific combination of day, discount, and free delivery. Consider a combination of
the following factors where B equals:
● Day = Holiday
● Discount = Yes
● Free Delivery = Yes
Let us find the probability of the customer not purchasing under the conditions above, i.e.,
A = No Purchase, and compare it with the probability of purchasing. From these two calculations, we find that:
Result: As 84.71 percent is greater than 15.29 percent, we can conclude that an average
customer will buy on holiday with a discount and free delivery.
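The same comparison can be sketched in a few lines of code. The sketch below is a minimal illustration only: the prior and likelihood values are hypothetical placeholders standing in for the numbers read off the frequency/likelihood tables, which are not reproduced here.

```python
# Minimal Naive Bayes sketch for the shopping example.
# All probability values below are hypothetical placeholders; in practice
# they are read off the likelihood tables built from the training data.

# Priors: P(Buy) and P(No Buy)
priors = {"Buy": 0.6, "No Buy": 0.4}  # hypothetical

# Likelihoods: P(feature value | class)
likelihoods = {
    "Buy":    {"Day=Holiday": 0.4, "Discount=Yes": 0.7, "Free Delivery=Yes": 0.8},  # hypothetical
    "No Buy": {"Day=Holiday": 0.2, "Discount=Yes": 0.4, "Free Delivery=Yes": 0.3},  # hypothetical
}

evidence = ["Day=Holiday", "Discount=Yes", "Free Delivery=Yes"]

# Naive Bayes: P(class | evidence) is proportional to P(class) * product of P(feature | class)
scores = {}
for cls, prior in priors.items():
    score = prior
    for feature in evidence:
        score *= likelihoods[cls][feature]
    scores[cls] = score

# Normalise so the two posteriors sum to 1, then compare them
total = sum(scores.values())
for cls, score in scores.items():
    print(f"P({cls} | evidence) = {score / total:.2%}")
```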
Advantages
● Less complex: Compared to other classifiers, Naïve Bayes is considered a
simpler classifier since the parameters are easier to estimate. As a result, it’s
one of the first algorithms learned within data science and machine learning
courses.
● Scales well: Compared to logistic regression, Naïve Bayes is considered a
fast and efficient classifier that is fairly accurate when the conditional
independence assumption holds. It also has low storage requirements.
● Can handle high-dimensional data: Use cases such as document classification
can have a high number of dimensions, which can be difficult for other
classifiers to manage.
Disadvantages:
● Assumes that all the features are independent, which is highly unlikely in practical
scenarios.
● Not well suited to continuous numerical features unless they are discretized or a distribution (e.g., Gaussian) is assumed for them.
● The test data must contain the same set of features (attributes) as the training data for the
algorithm to make correct predictions.
● Encounters ‘Zero Frequency’ problem: If a categorical variable has a category in the test
dataset that wasn’t included in the training dataset, the model will assign it a 0 probability
and will be unable to make a prediction. This problem can be resolved using smoothing
techniques which are out of the scope of this article.
● Computationally expensive when used to classify a large number of items.
Applications of Naïve Bayes Classifier:
There are a lot of real-life applications of the Naive Bayes classifier, some of which are mentioned below:
● Real-time prediction — It is a fast and eager machine learning classifier, so it is used for making
predictions in real-time.
● Multi-class prediction — It can predict the probability of multiple classes of the target variable.
● Text Classification / Spam Filtering / Sentiment Analysis — Naive Bayes classifiers are mostly used in
text classification problems because they handle multi-class problems well and rely on the independence
assumption. They are used to identify spam emails and to detect negative and positive customer sentiment
on social platforms (see the sketch after this list).
● Recommendation Systems — A recommendation system can be built by combining a Naive Bayes
classifier with Collaborative Filtering. It filters unseen information and predicts whether the user
would like a given resource, using machine learning and data mining techniques.
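As an illustration of the text-classification use case, the sketch below trains a multinomial Naive Bayes spam filter with scikit-learn. It is a minimal sketch assuming scikit-learn is available; the tiny corpus and labels are made-up toy data.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

# Toy training corpus (hypothetical examples): 1 = spam, 0 = not spam
emails = [
    "win a free prize now",
    "limited offer claim your discount",
    "meeting agenda for tomorrow",
    "project status report attached",
]
labels = [1, 1, 0, 0]

# Convert text to word-count features, then fit the classifier
vectorizer = CountVectorizer()
X = vectorizer.fit_transform(emails)
model = MultinomialNB()
model.fit(X, labels)

# Classify a new message
new_email = ["claim your free prize"]
print(model.predict(vectorizer.transform(new_email)))  # e.g. [1] -> spam
```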
K-Nearest Neighbors Algorithm
K-Nearest Neighbors Algorithm Introduction
● K Nearest Neighbour is a simple algorithm that stores all the available cases and
classifies the new data or case based on a similarity measure.
● It is mostly used to classify a data point based on how its neighbours are
classified.
● The K-NN algorithm assumes similarity between the new case/data and the available
cases and puts the new case into the category that is most similar to the available
categories.
● ‘k’ in KNN is a parameter that refers to the number of nearest neighbours included
in the majority voting process.
K-Nearest Neighbors Algorithm Introduction
● The K-NN algorithm can be used for regression as well as classification, but it is mostly used
for classification problems.
● K-NN is a non-parametric algorithm, which means it does not make any assumption on
underlying data.
● It is also called a lazy learner algorithm because it does not learn from the training set
immediately; instead, it stores the dataset and performs an action on it at the time of
classification.
● At the training phase, the KNN algorithm just stores the dataset; when it gets new data, it
classifies that data into the category that is most similar to it (see the sketch below).
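A minimal sketch of this lazy-learning behaviour with scikit-learn is shown below; it assumes scikit-learn is available, and the two-dimensional points and labels are toy data.

```python
from sklearn.neighbors import KNeighborsClassifier

# Toy training data: points from two categories, A and B
X_train = [[1, 2], [2, 3], [3, 3],      # category A
           [8, 8], [9, 10], [10, 9]]    # category B
y_train = ["A", "A", "A", "B", "B", "B"]

# "Training" a KNN model just stores the dataset (lazy learning)
knn = KNeighborsClassifier(n_neighbors=3)
knn.fit(X_train, y_train)

# At prediction time, the 3 nearest stored points vote on the class
print(knn.predict([[2, 2]]))   # -> ['A']
print(knn.predict([[9, 9]]))   # -> ['B']
```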
Why do we need a K-NN Algorithm?
Suppose there are two categories, Category A and Category B, and we have a new data point x1: in which of
these categories will this data point lie? To solve this type of problem, we need the K-NN algorithm. With the
help of K-NN, we can easily identify the category or class of a particular data point.
How shall I choose the value of k in KNN Algorithm?
1. There is no structured method to find the best value for “K”. We need to try various
values by trial and error, treating the held-out data as unseen.
2. Smaller values of K can be noisy: individual points (including outliers) have a higher
influence on the result.
3. Larger values of K produce smoother decision boundaries, which means lower variance but
increased bias; they are also computationally more expensive.
4. Set K to an odd number, so there is an extra point to settle tie-breakers in extreme cases.
5. In general practice, a common starting choice is k = sqrt(N), where N is the number of
samples in your training dataset.
6. Plot the error rate vs. K graph and pick the K at the minimum (elbow) of the curve.
7. Another way to choose K is through cross-validation, as sketched below.
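The sketch below illustrates choosing K with cross-validation. It assumes scikit-learn and its built-in Iris dataset are available; in practice you would plot the resulting error rates against K and pick the value at the minimum.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)

# Evaluate a range of K values with 5-fold cross-validation
for k in range(1, 16, 2):                      # odd values help settle tie-breakers
    knn = KNeighborsClassifier(n_neighbors=k)
    accuracy = cross_val_score(knn, X, y, cv=5).mean()
    print(f"K = {k:2d}  error rate = {1 - accuracy:.3f}")
# Pick the K with the lowest cross-validated error rate.
```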
How does K-NN work?
The working of K-NN can be explained on the basis of the algorithm below:
1. Select the number K of neighbours.
2. Calculate the distance from the query point to every point in the training data.
3. Take the K nearest neighbours according to the calculated distances.
4. Among these K neighbours, count the number of data points in each category.
5. Assign the query point to the category with the maximum count.
The KNN algorithm thus identifies the nearest points or groups for a query point.
But to determine the closest groups or nearest points, we need a metric.
For this purpose, we use the distance metrics below:
● Euclidean Distance
● Manhattan Distance
● Minkowski Distance
Euclidean Distance
This is simply the Cartesian distance between two points in the plane/hyperplane:
d(x, y) = sqrt( Σᵢ (xᵢ − yᵢ)² )
Euclidean distance can also be visualized as the length of the straight line joining the two points
under consideration. This metric corresponds to the net displacement between the two states of an object.
Manhattan Distance
This distance metric is generally used when we are interested in the total distance travelled by the object
rather than its displacement. It is calculated by summing the absolute differences between the coordinates of
the points in n dimensions:
d(x, y) = Σᵢ |xᵢ − yᵢ|
Minkowski Distance
The Euclidean and Manhattan distances are special cases of the Minkowski distance:
d(x, y) = ( Σᵢ |xᵢ − yᵢ|ᵖ )^(1/p)
From this formula we can see that when p = 2 it reduces to the Euclidean distance, and when p = 1 we
obtain the Manhattan distance.
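The three metrics can be computed directly, as in the short sketch below using NumPy and the formulas above; the two sample points are arbitrary.

```python
import numpy as np

x = np.array([2.0, 4.0, 6.0])
y = np.array([5.0, 1.0, 7.0])

# Euclidean distance: straight-line length between the points (Minkowski with p = 2)
euclidean = np.sqrt(np.sum((x - y) ** 2))

# Manhattan distance: sum of absolute coordinate differences (Minkowski with p = 1)
manhattan = np.sum(np.abs(x - y))

# General Minkowski distance for any p >= 1
def minkowski(a, b, p):
    return np.sum(np.abs(a - b) ** p) ** (1 / p)

print(euclidean, minkowski(x, y, 2))   # identical values
print(manhattan, minkowski(x, y, 1))   # identical values
```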
Example: Age vs. Loan
We need to predict Andrew's default status (Yes or No).
Calculate Euclidean distance for all the data points.
With K = 5, there are two Default = N and three Default = Y among the five closest neighbours. We can say
the default status for Andrew is ‘Y’ based on the majority vote of 3 points out of 5.
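The same procedure can be sketched from scratch. The ages, loan amounts, and Andrew's record below are hypothetical placeholders, since the actual Age vs. Loan table is not reproduced here; in a real setting the features would also be scaled so that the loan amount does not dominate the distance.

```python
import numpy as np

# Hypothetical training data: (age, loan amount, default status)
data = [
    (25, 40000,  "N"), (35, 60000,  "N"), (45, 80000,  "N"),
    (20, 20000,  "N"), (35, 120000, "N"), (52, 18000,  "N"),
    (23, 95000,  "Y"), (40, 62000,  "Y"), (60, 100000, "Y"),
    (48, 220000, "Y"), (33, 150000, "Y"),
]

# Andrew's record (query point) -- also a placeholder value
query = (48, 142000)

# 1. Compute the Euclidean distance from the query to every training point
distances = []
for age, loan, label in data:
    d = np.sqrt((age - query[0]) ** 2 + (loan - query[1]) ** 2)
    distances.append((d, label))

# 2. Take the K = 5 nearest neighbours
k = 5
nearest = sorted(distances)[:k]

# 3. Majority vote among the neighbours decides the predicted class
votes = [label for _, label in nearest]
prediction = max(set(votes), key=votes.count)
print("Predicted default status:", prediction)
```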
Applications of the KNN Algorithm
● Data Preprocessing – While dealing with any machine learning problem, we first perform EDA; if the data
contains missing values, multiple imputation methods are available. One such method is the KNN Imputer,
which is quite effective and generally used for sophisticated imputation (see the sketch after this list).
● Pattern Recognition – KNN works very well for pattern recognition; for example, if you train a KNN
classifier on the MNIST handwritten-digit dataset and then evaluate it, you will find that the accuracy
is remarkably high.
● Recommendation Engines – The main task performed by a KNN algorithm is to assign a new query point to
a pre-existing group that has been created from a large corpus of data. This is exactly what is required
in recommender systems: assign each user to a particular group and then provide recommendations based on
that group’s preferences.
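For the data-preprocessing use case, the sketch below shows scikit-learn's KNNImputer filling in missing values from the nearest neighbours; the small feature matrix is toy data, assuming scikit-learn is available.

```python
import numpy as np
from sklearn.impute import KNNImputer

# Toy feature matrix with missing values marked as np.nan
X = np.array([
    [1.0,    2.0, np.nan],
    [3.0,    4.0, 3.0],
    [np.nan, 6.0, 5.0],
    [8.0,    8.0, 7.0],
])

# Each missing entry is replaced by the mean of that feature over the
# 2 nearest-neighbour rows (distance computed on the non-missing features)
imputer = KNNImputer(n_neighbors=2)
print(imputer.fit_transform(X))
```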
Disadvantages of the KNN Algorithm
● Does not scale – KNN is also considered a lazy algorithm: it defers all computation to prediction time,
so it requires a lot of computing power as well as data storage. This makes the algorithm both
time-consuming and resource-intensive on large datasets.
● Curse of Dimensionality – Owing to what is known as the peaking phenomenon, KNN is affected by the
curse of dimensionality: the algorithm has a hard time classifying data points properly when the
dimensionality is too high.
● Prone to Overfitting – Because the algorithm is affected by the curse of dimensionality, it is also
prone to overfitting. Hence, feature selection and dimensionality reduction techniques are generally
applied to deal with this problem.