
Ex. No: 4
DATE:

NAIVE BAYESIAN CLASSIFIER FOR A SAMPLE TRAINING DATA SET STORED AS A CSV FILE

AIM:
The aim of this experiment is to study the Naive Bayes classifier and its
application to classification tasks, and to implement the Naive Bayes algorithm
using a sample training dataset stored in a CSV file.

HARDWARE SPECIFICATION:

Processor : Apple M1
Installed RAM : 8.00 GB

SOFTWARE SPECIFICATION:

Python IDLE (3.12.1, 64-bit)

LIBRARIES:
NumPy
Pandas
Matplotlib
scikit-learn (sklearn)

ALGORITHM:

1. Overview:
The steps below outline the structured approach followed in the
program.

2. Library Imports:
Begin by importing the necessary libraries: numpy, matplotlib,
pandas, and scikit-learn.

3. Data Loading:
Load the dataset named "Social_Network_Ads.csv" into a pandas
DataFrame.
SUDARSAN R
21EE113
4. Data Preparation:
Separate the independent variables (features) as 'x' and the
dependent variable as 'y'.

5. Categorical Data Encoding:
If categorical data exists, encode it. Here, encode the 'Gender'
column using LabelEncoder.

6. Train-Test Split:
Split the dataset into training and testing sets using the
train_test_split function from scikit-learn.

7. Feature Scaling:
Standardize the features (zero mean, unit variance) using
StandardScaler from scikit-learn. Fit the scaler on the training set
only, then apply the same transformation to the test set.

8. Model Training:
Initialize a Gaussian Naive Bayes classifier and train it using the
training data.

9. Prediction:
Make predictions on the test set using the trained model.

10. Actual vs. Predicted Values:
Print the actual and predicted values of the dependent variable for
comparison.

11. Model Evaluation:
I. Confusion Matrix:
Compute the confusion matrix to assess the model's
performance.
II. Accuracy Score:
Calculate the accuracy score to gauge the model's
effectiveness.

12. Result Display:
Print the confusion matrix and accuracy score to understand the
model's performance.
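
As a rough illustration of what steps 8 and 9 compute, the sketch below fits a Gaussian Naive Bayes model by hand: a prior, a mean, and a variance per class and feature, followed by picking the class with the highest log-posterior. It is a simplified sketch and not scikit-learn's implementation; the function names, the small variance constant, and the toy data are assumptions made for illustration only.

```python
import math
from collections import defaultdict

def fit_gaussian_nb(X, y):
    """Estimate a prior plus per-feature mean and variance for each class."""
    by_class = defaultdict(list)
    for row, label in zip(X, y):
        by_class[label].append(row)
    model = {}
    n_total = len(X)
    for label, rows in by_class.items():
        n = len(rows)
        columns = list(zip(*rows))
        means = [sum(col) / n for col in columns]
        # Small constant (an assumption) so a zero variance cannot divide by zero.
        variances = [sum((v - m) ** 2 for v in col) / n + 1e-9
                     for col, m in zip(columns, means)]
        model[label] = (n / n_total, means, variances)
    return model

def predict_gaussian_nb(model, row):
    """Return the class with the highest log-posterior for one sample."""
    best_label, best_score = None, -math.inf
    for label, (prior, means, variances) in model.items():
        score = math.log(prior)
        for v, m, var in zip(row, means, variances):
            # Log of the Gaussian density, summed over features
            # (the "naive" independence assumption).
            score += -0.5 * math.log(2 * math.pi * var) - (v - m) ** 2 / (2 * var)
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Hypothetical toy data: two features, two classes (not taken from the CSV).
X_toy = [[1.0, 2.0], [1.2, 1.8], [0.8, 2.2], [3.0, 4.0], [3.2, 3.8], [2.8, 4.2]]
y_toy = [0, 0, 0, 1, 1, 1]
model = fit_gaussian_nb(X_toy, y_toy)
pred_low = predict_gaussian_nb(model, [1.1, 2.1])
pred_high = predict_gaussian_nb(model, [3.1, 3.9])
print(pred_low, pred_high)
```

Working in log space avoids numerical underflow when many small probabilities are multiplied together.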

PROGRAM:

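# Import the necessary libraries
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd

# Import the dataset
dataset = pd.read_csv("Social_Network_Ads.csv")
# Features are taken from column indices 1 and 4; the target is the last column.
# NOTE: if the CSV has only five columns, index 4 is also the last column,
# so the target would be among the features (data leakage).
x = dataset.iloc[:, [1, 4]].values
y = dataset.iloc[:, -1].values
print("X values:")
print(x)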

# Encoding categorical data (the Gender column)
from sklearn.preprocessing import LabelEncoder
le = LabelEncoder()
x[:, 0] = le.fit_transform(x[:, 0])

# Train-test splitting
from sklearn.model_selection import train_test_split
x_train, x_test, y_train, y_test = train_test_split(
    x, y, test_size=0.20, random_state=0)

# Feature scaling
from sklearn.preprocessing import StandardScaler
sc = StandardScaler()
x_train = sc.fit_transform(x_train)
x_test = sc.transform(x_test)

# Training the Naive Bayes model on the training set
from sklearn.naive_bayes import GaussianNB
classifier = GaussianNB()

classifier.fit(x_train, y_train)
y_pred = classifier.predict(x_test)

# printing values
print("\nActual y_test values:")
print(y_test)
print("\nPredicted y_pred values:")
print(y_pred)

# Calculating and printing the confusion matrix and accuracy score
from sklearn.metrics import confusion_matrix, accuracy_score
cm = confusion_matrix(y_test, y_pred)
ac = accuracy_score(y_test, y_pred)
print("\nConfusion matrix:")
print(cm)
print("\nAccuracy score:")
print(ac)

OUTPUT:

Python 3.12.1 (tags/v3.12.1:2305ca5, Dec 7 2023, 22:03:25) [MSC v.1937 64 bit (AMD64)]
on win32
Type "help", "copyright", "credits" or "license()" for more information.
= RESTART: C:/Users/DELL/OneDrive/Documents/agni/program 4/ML program 04.py
X values:

Actual y_test values:
[0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 1 0 0 1 0 1 0 1 0 0 0 0 0 1 1 0 0 0 0
 0 0 1 0 0 0 0 1 0 0 1 0 1 1 0 0 0 1 1 0 0 1 0 0 1 0 1 0 1 0 0 0 0 1 0 0 1
 0 0 0 0 1 1]

Predicted y_pred values:
[0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 1 0 0 1 0 1 0 1 0 0 0 0 0 1 1 0 0 0 0
 0 0 1 0 0 0 0 1 0 0 1 0 1 1 0 0 0 1 1 0 0 1 0 0 1 0 1 0 1 0 0 0 0 1 0 0 1
 0 0 0 0 1 1]

Confusion matrix:
[[58  0]
 [ 0 22]]

Accuracy score:
1.0
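
The accuracy score can be cross-checked against the confusion matrix printed above: accuracy is the sum of the diagonal entries (correct predictions) divided by the total number of test samples. The short sketch below redoes that arithmetic with the reported values; the variable names are illustrative.

```python
# Confusion matrix as printed in the output
# (rows: actual class, columns: predicted class).
cm = [[58, 0],
      [0, 22]]

correct = sum(cm[i][i] for i in range(len(cm)))  # diagonal entries
total = sum(sum(row) for row in cm)              # all test samples
accuracy = correct / total
print(correct, total, accuracy)  # 80 80 1.0
```

The total of 80 samples matches the 80 entries in each of the printed y_test and y_pred arrays.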

INFERENCE:

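The program performs classification using a Gaussian Naive Bayes model
on a dataset containing social network ads data. It preprocesses the data by
encoding the categorical variable, splitting the data into training and testing
sets, and scaling the features. After training the model, it predicts the target
variable for the test set and evaluates performance using a confusion matrix and
an accuracy score. The confusion matrix shows 58 and 22 correct predictions for
the two classes and no misclassifications, giving an accuracy of 1.0 on the 80
test samples. A perfect score should be read with caution: the features are taken
from column indices 1 and 4 and the target from the last column, so if the file
has only five columns the target itself is among the features, and the score
would then not reflect the model's ability to predict whether a user will click
on a social network ad.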

RUBRICS:

RESULT:

Thus, the program successfully built and trained a Gaussian Naive
Bayes model to classify user clicks on social network ads. It achieved an
accuracy score of 1.0 on the test set, indicating perfect prediction for this
specific data split.
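
A perfect score is worth a quick sanity check. One possible check, sketched here under the assumption that a feature column might accidentally duplicate the target, compares every feature column with the labels. The helper name and the demo rows are hypothetical and not taken from the CSV.

```python
def columns_matching_target(X, y):
    """Return the indices of feature columns that are identical to the labels."""
    matches = []
    for idx, column in enumerate(zip(*X)):
        if list(column) == list(y):
            matches.append(idx)
    return matches

# Hypothetical rows: column 1 repeats the label exactly.
X_demo = [[0, 1], [1, 0], [1, 1], [0, 0]]
y_demo = [1, 0, 1, 0]
leaky = columns_matching_target(X_demo, y_demo)
print(leaky)  # [1]
```

Such a comparison would need to run on the raw feature values, before scaling, since standardized columns no longer equal the labels exactly.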
