
EXPERIMENT NO. 03
Aim: Implement and evaluate the following using Python
a) Classification Algorithm – Naïve Bayes

Date of Performance: Date of Submission:

THEORY

Naïve Bayes Classifier Algorithm


o The Naïve Bayes algorithm is a supervised learning algorithm, based on Bayes'
theorem and used for solving classification problems.
o It is mainly used in text classification with high-dimensional training datasets.
o The Naïve Bayes Classifier is one of the simplest and most effective classification
algorithms, and it helps in building fast machine learning models that can make
quick predictions.
o It is a probabilistic classifier, which means it predicts on the basis of the
probability of an object.
o Some popular applications of the Naïve Bayes algorithm are spam filtration,
sentiment analysis, and classifying articles; a small text-classification sketch
follows this list.
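
As a quick illustration of the text-classification use case, the sketch below trains a
Naive Bayes spam filter on a tiny, made-up dataset; the messages, labels, and the test
message are all hypothetical:

# A minimal sketch of Naive Bayes text classification (spam filtering).
# The tiny dataset below is hypothetical, for illustration only.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

messages = ["win a free prize now", "meeting at noon tomorrow",
            "free cash offer, click now", "lunch with the team"]
labels = ["spam", "ham", "spam", "ham"]

# Convert the text into high-dimensional word-count features
vectorizer = CountVectorizer()
X_counts = vectorizer.fit_transform(messages)

# MultinomialNB suits count features such as word frequencies
model = MultinomialNB()
model.fit(X_counts, labels)

print(model.predict(vectorizer.transform(["free prize offer"])))  # likely 'spam'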

Why is it called Naïve Bayes?


The Naïve Bayes algorithm comprises two words, Naïve and Bayes, which can be
described as:
o Naïve: It is called naïve because it assumes that the occurrence of a certain feature is
independent of the occurrence of other features. For example, if a fruit is identified on
the basis of color, shape, and taste, then a red, spherical, and sweet fruit is recognized
as an apple. Hence each feature individually contributes to identifying it as an apple
without depending on the others; the sketch after this list makes the arithmetic concrete.
o Bayes: It is called Bayes because it depends on the principle of Bayes' Theorem.
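
To make the independence assumption concrete, the sketch below scores the fruit example
by multiplying per-feature probabilities; every number is made up purely for illustration:

# A minimal sketch of the naive independence assumption.
# Every probability below is hypothetical, for illustration only.
p_red_given_apple = 0.8        # P(red | apple)
p_spherical_given_apple = 0.9  # P(spherical | apple)
p_sweet_given_apple = 0.7      # P(sweet | apple)
p_apple = 0.3                  # P(apple), the class prior

# Naive assumption: features are independent given the class, so the joint
# likelihood is the product of the per-feature likelihoods.
score_apple = (p_red_given_apple * p_spherical_given_apple
               * p_sweet_given_apple * p_apple)
print(score_apple)  # unnormalized posterior score for 'apple'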

Bayes' Theorem:
o Bayes' theorem is also known as Bayes' Rule or Bayes' law, which is used to
determine the probability of a hypothesis with prior knowledge. It depends on the
conditional probability.
o The formula for Bayes' theorem is given as:

P(A|B) = P(B|A) × P(A) / P(B)

Where,
P(A|B) is Posterior probability: Probability of hypothesis A given the observed event B.



P(B|A) is Likelihood probability: Probability of the evidence given that the hypothesis
is true.

P(A) is Prior Probability: Probability of hypothesis before observing the evidence.


P(B) is Marginal Probability: Probability of Evidence.
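
As a worked example of the formula, the following sketch plugs in hypothetical numbers
for a spam-filter hypothesis (all values are made up for illustration):

# A worked example of Bayes' theorem with hypothetical numbers.
p_a = 0.2          # P(A): prior probability that a message is spam
p_b_given_a = 0.6  # P(B|A): probability the word "free" appears in spam
p_b = 0.25         # P(B): overall probability the word "free" appears

# Bayes' theorem: P(A|B) = P(B|A) * P(A) / P(B)
p_a_given_b = p_b_given_a * p_a / p_b
print(p_a_given_b)  # posterior probability of spam given "free" -> 0.48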

Python Implementation of the Naïve Bayes algorithm:


Now we will implement the Naive Bayes algorithm using Python. For this, we will use the
"user_data" dataset, which we have used in our other classification models, so that we
can easily compare the Naive Bayes model with the other models.

Steps to implement:
o Data Pre-processing step
o Fitting Naive Bayes to the Training set
o Predicting the test result
o Test accuracy of the result (creation of a confusion matrix)
o Visualizing the test set result.

1) Data Pre-processing step:


In this step, we will pre-process/prepare the data so that we can use it efficiently in
our code. It is similar to what we did in earlier data pre-processing experiments. The
code for this is given below:
# Step 1: Data Pre-processing
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder

# Load the dataset
try:
    user_data = pd.read_csv("userdata.csv")  # Change the file path accordingly
except FileNotFoundError:
    print("Error: File not found.")
    exit()

# Check if the 'target' column exists
if 'target' not in user_data.columns:
    print("Error: 'target' column not found in the dataset.")
    exit()

# Split dataset into features and labels
X = user_data.drop(columns=['target'])  # Features
y = user_data['target']  # Labels

# Encode categorical labels
label_encoder = LabelEncoder()
y = label_encoder.fit_transform(y)

# Split the dataset into training and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)



In the above code, we have loaded the dataset into our program using
pd.read_csv("userdata.csv"), separated it into the feature matrix X and the label
vector y, encoded the categorical labels, and divided the data into training and test
sets. Gaussian Naive Bayes does not strictly require feature scaling, though it can be
added as sketched below.
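
If you want to mirror the scaling used in the other classification experiments, a
minimal sketch (assuming all features are numeric):

# Optional: scale features as in the other classification experiments
from sklearn.preprocessing import StandardScaler

scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)  # fit only on the training data
X_test_scaled = scaler.transform(X_test)        # reuse the training statistics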

The output for the dataset is given as:

2) Fitting Naive Bayes to the Training Set:


After the pre-processing step, now we will fit the Naive Bayes model to the Training set.
Below is the code for it:

# Step 2: Fitting Naive Bayes to the Training set
from sklearn.naive_bayes import GaussianNB

# Create a Naive Bayes classifier
classifier = GaussianNB()

# Train the classifier
classifier.fit(X_train, y_train)

In the above code, we have used the GaussianNB classifier and fitted it to the training
dataset. We can also use other Naive Bayes variants as per our requirement, as sketched
below.
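
For reference, here is a minimal sketch of the other Naive Bayes variants that
scikit-learn provides; which one fits best depends on the feature types, so the choice
below is purely illustrative:

from sklearn.naive_bayes import MultinomialNB, BernoulliNB

# GaussianNB (used above) suits continuous features assumed to be Gaussian.
# MultinomialNB suits discrete counts such as word frequencies in text.
# BernoulliNB suits binary/boolean features.
alt_classifier = MultinomialNB()
# alt_classifier.fit(X_train, y_train)  # note: requires non-negative features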



Output:

3) Prediction of the test set result:


Now we will predict the test set results. For this, we will create a new prediction
variable y_pred and use the predict function to make the predictions.

# Step 3: Predicting the test result


y_pred = classifier.predict(X_test)
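
The numeric predictions can be decoded back to the original label names, and the
classifier also exposes per-class probabilities. A minimal sketch (the printed values
depend on your data):

# Decode the numeric predictions back to the original label names
print(label_encoder.inverse_transform(y_pred[:5]))

# Per-class probabilities for the first few test samples:
# one row per sample, one column per class
print(classifier.predict_proba(X_test[:5]))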

4) Creating Confusion Matrix:


Now we will check the accuracy of the Naive Bayes classifier using the Confusion matrix.
Below is the code for it:

# Step 4: Test accuracy of the result (Creation of Confusion matrix)


from sklearn.metrics import confusion_matrix, accuracy_score
# Calculate confusion matrix
cm = confusion_matrix(y_test, y_pred)
# Calculate accuracy score
accuracy = accuracy_score(y_test, y_pred)
# Print confusion matrix and accuracy
print("Confusion Matrix:")
print(cm)
print("\nAccuracy:", accuracy)

Output:



As we can see in the above confusion matrix output, there are 7 + 3 = 10 incorrect
predictions and 65 + 25 = 90 correct predictions.
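
Since these counts come straight from the confusion matrix, the accuracy can also be
derived from it directly; a minimal sketch (the 0.9 figure assumes the counts above):

# Accuracy from the confusion matrix: correct predictions / all predictions
derived_accuracy = cm.trace() / cm.sum()  # e.g., (65 + 25) / 100 = 0.9
print(derived_accuracy)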

5) Visualizing the test set result:


Next we will visualize the test set result using the Naïve Bayes Classifier. Below is
the code for it:

# Step 5: Visualizing the test set result
import matplotlib.pyplot as plt
import numpy as np

# Define function to plot decision regions
def plot_decision_regions(X, y, classifier, resolution=0.02):
    markers = ('s', 'x', 'o', '^', 'v')
    colors = ('red', 'blue', 'lightgreen', 'gray', 'cyan')
    cmap = plt.get_cmap('Pastel2')

    # Build a grid of points spanning the two features
    x1_min, x1_max = X[:, 0].min() - 1, X[:, 0].max() + 1
    x2_min, x2_max = X[:, 1].min() - 1, X[:, 1].max() + 1
    xx1, xx2 = np.meshgrid(np.arange(x1_min, x1_max, resolution),
                           np.arange(x2_min, x2_max, resolution))

    # Predict the class of every grid point and shade the regions
    Z = classifier.predict(np.array([xx1.ravel(), xx2.ravel()]).T)
    Z = Z.reshape(xx1.shape)
    plt.contourf(xx1, xx2, Z, alpha=0.4, cmap=cmap)
    plt.xlim(xx1.min(), xx1.max())
    plt.ylim(xx2.min(), xx2.max())

    # Overlay the actual samples, one marker and color per class
    for idx, cl in enumerate(np.unique(y)):
        plt.scatter(x=X[y == cl, 0], y=X[y == cl, 1],
                    alpha=0.8, c=[colors[idx]],
                    marker=markers[idx], label=cl)

# Plot decision regions (assuming only two features)
if X_test.shape[1] == 2:
    plt.figure(figsize=(10, 6))
    plot_decision_regions(X_test.values, y_test, classifier=classifier)
    plt.title('Naive Bayes - Test set')
    plt.xlabel('Feature 1')
    plt.ylabel('Feature 2')
    plt.legend(loc='upper right')
    plt.show()
else:
    print("Cannot visualize decision regions as the dataset has more than two features.")

Output:



In the above output, we can see that the Naïve Bayes classifier has segregated the data
points with a fine boundary. The boundary is Gaussian in shape because we have used the
GaussianNB classifier in our code.

CONCLUSION
