
Machine Learning 21BEC505

Experiment-5
Objective: Implementation of Naïve Bayes Classifier
Task #1
Code:
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import accuracy_score
import matplotlib.pyplot as plt
import seaborn as sns
df = pd.read_csv(r"E:\Jay\NIRMA\Sem6\ML\Exp5\Naive-Bayes-Classification-Data.csv")
x = df.iloc[:,:-1].values
y = df.iloc[:,2].values
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size = 0.25, random_state = 42)
model = GaussianNB()
model1 = model.fit(x_train,y_train)
y_pred = model.predict(x_test)
print(y_pred)
print('\n')
accuracy = accuracy_score(y_test, y_pred)*100
print(accuracy)
print('\n')

sns.scatterplot(x="glucose",y="bloodpressure", data=df, hue="diabetes").set(title="Full Data")


plt.show()
sns.scatterplot(x=x_test[:,0],y=x_test[:,1], hue=y_test).set(title="Testing Data")
plt.show()
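For reference, what GaussianNB does when it fits can be sketched by hand: it estimates a per-class prior, and a per-class mean and variance for each feature, then scores each class by its log prior plus the summed log Gaussian likelihoods (a tiny `var_smoothing` term is added to the variances for numerical stability). The toy arrays below are made up for illustration and are not the experiment's dataset.

```python
import numpy as np

# Two well-separated toy clusters, one per class (illustrative only).
X = np.array([[1.0, 2.0], [1.2, 1.9], [3.0, 4.0], [3.2, 4.1]])
y = np.array([0, 0, 1, 1])

classes = np.unique(y)
priors = {c: np.mean(y == c) for c in classes}                 # P(class)
means = {c: X[y == c].mean(axis=0) for c in classes}           # per-feature mean
variances = {c: X[y == c].var(axis=0) + 1e-9 for c in classes} # + var_smoothing

def predict(x):
    # Pick the class with the largest joint log-probability.
    scores = {}
    for c in classes:
        log_lik = -0.5 * np.sum(np.log(2 * np.pi * variances[c])
                                + (x - means[c]) ** 2 / variances[c])
        scores[c] = np.log(priors[c]) + log_lik
    return max(scores, key=scores.get)

print(predict(np.array([1.1, 2.0])))  # near class 0's cluster -> 0
print(predict(np.array([3.1, 4.0])))  # near class 1's cluster -> 1
```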

Output:

Task #2
Code:
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import accuracy_score
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn import preprocessing

weather = ['sunny','sunny','overcast','rainy','rainy','rainy','overcast','sunny','sunny','rainy','sunny','overcast','overcast','rainy']
temp =['hot','hot','hot','mild','cool','cool','cool','mild','cool','mild','mild','mild','hot','mild']
play=['no','no','yes','yes','yes','no','yes','no','yes','yes','yes','yes','yes','no']
le = preprocessing.LabelEncoder()
weather_encoded = le.fit_transform(weather)
temp_encoded = le.fit_transform(temp)
label = le.fit_transform(play)
print("Weather Encoded : ",weather_encoded)
print('\n')
print("Temperature Encoded : ",temp_encoded)
print('\n')
print("play: ",label)
print('\n')
features = list(zip(weather_encoded,temp_encoded))
print(features)
print('\n')
model = GaussianNB()
model.fit(features,label)
predicted = model.predict([[0,2]]) # 0 : overcast , 2 : mild
print("Predicted value : ",predicted)
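A note on the encoded values used in the prediction above: `LabelEncoder` assigns integer codes in sorted (alphabetical) order, which is why 0 means overcast and 2 means mild. A quick self-contained check of the mapping, using the same category names as the task:

```python
from sklearn import preprocessing

# Weather categories: sorted order is overcast < rainy < sunny.
le = preprocessing.LabelEncoder()
le.fit(['sunny', 'overcast', 'rainy'])
print(list(le.classes_))       # ['overcast', 'rainy', 'sunny'] -> codes 0, 1, 2

# Temperature categories: sorted order is cool < hot < mild.
le_temp = preprocessing.LabelEncoder()
le_temp.fit(['hot', 'mild', 'cool'])
print(list(le_temp.classes_))  # ['cool', 'hot', 'mild'] -> codes 0, 1, 2
```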

Output:

Task #3
Code:
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import accuracy_score

wine = datasets.load_wine()
print("Features : ",wine.feature_names)
print('\n')
print("Labels : ",wine.target_names)
print('\n')
print("Data shape : ",wine.data.shape)

print('\n')
print("Top 5 records",wine.data[0:5],sep="\n")
print('\n')
print("0:Class_0, 1:class_1, 2:class_2 \n",wine.target)
print('\n')
X_train, X_test, y_train, y_test = train_test_split(wine.data,wine.target,test_size = 1/3,random_state = 109)
model = GaussianNB()
model.fit(X_train,y_train)
y_pred = model.predict(X_test)
accuracy = accuracy_score(y_test,y_pred)
print("Accuracy : ",accuracy*100)
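Overall accuracy hides which of the three wine classes get confused with each other. As an optional check, a confusion matrix for the same split and model can be computed as sketched below (this re-creates the task's setup rather than extending it):

```python
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import confusion_matrix

# Same split as the task above.
wine = datasets.load_wine()
X_train, X_test, y_train, y_test = train_test_split(
    wine.data, wine.target, test_size=1/3, random_state=109)

model = GaussianNB().fit(X_train, y_train)
cm = confusion_matrix(y_test, model.predict(X_test))
print(cm)  # rows = true class, columns = predicted class
print(cm.trace(), "of", cm.sum(), "test samples classified correctly")
```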

Output:

Exercise:

1. Apply Naïve Bayes classifier on the Iris Flower Species Dataset.

The Iris Flower Dataset involves predicting the flower species given measurements of iris flowers. It is
a multiclass classification problem. The number of observations for each class is balanced. There are
150 observations with 4 input variables and 1 output variable. The variable names are as follows:

- Sepal length in cm
- Sepal width in cm
- Petal length in cm
- Petal width in cm
- Class
Code:
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import accuracy_score
import matplotlib.pyplot as plt
import seaborn as sns
import matplotlib.pyplot as plt

df = pd.read_csv(r"iris.data", header=None)  # iris.data has no header row
print("Features and their labels:")
print(df.iloc[:,:].values)
X = df.iloc[:,:-1].values
y = df.iloc[:,-1].values
X_train, X_test, y_train, y_test = train_test_split(X,y,test_size = 0.25, random_state = 42)
model = GaussianNB()
model.fit(X_train,y_train)
y_pred = model.predict(X_test)
accuracy = accuracy_score(y_test,y_pred)
print("Accuracy : ",accuracy)
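As an optional extension, a single 75/25 split can be optimistic or pessimistic by chance; k-fold cross-validation averages the score over several splits. The sketch below uses scikit-learn's bundled copy of the Iris dataset (`load_iris`) instead of the local `iris.data` file, so it runs without any files on disk:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB

X, y = load_iris(return_X_y=True)

# 5-fold cross-validation: fit on 4/5 of the data, score on the
# remaining 1/5, rotating through all five folds.
scores = cross_val_score(GaussianNB(), X, y, cv=5)
print("Fold accuracies:", scores.round(3))
print("Mean accuracy  :", scores.mean().round(3))
```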

Output:

2. Apply kNN on all three of the above tasks and compare the output of kNN with that of the Naïve Bayes classifier.

Code: For task 1

import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import confusion_matrix,accuracy_score

df = pd.read_csv(r"E:\Jay\NIRMA\Sem6\ML\Exp5\Naive-Bayes-Classification-Data.csv")

X = df.iloc[:,:-1].values
y = df.iloc[:,-1].values
X_train, X_test, y_train, y_test = train_test_split(X,y,test_size = 1/3, random_state = 42)
model = KNeighborsClassifier(n_neighbors=3, metric='minkowski', p=2)

model.fit(X_train,y_train)
y_pred = model.predict(X_test)
cm= confusion_matrix(y_test, y_pred)
accuracy = accuracy_score(y_test,y_pred)
print("Accuracy for task 1 using KNN : ",accuracy)
model = GaussianNB()
model.fit(X_train,y_train)
y_pred = model.predict(X_test)
accuracy = accuracy_score(y_test,y_pred)
print("Accuracy for task 1 using Naive Bayes : ",accuracy)
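For intuition, what `KNeighborsClassifier(n_neighbors=3, metric='minkowski', p=2)` does for a single query point is: compute the Euclidean distance to every training sample, take the 3 nearest, and majority-vote their labels. A toy sketch (the data here is made up for illustration, not the task's dataset):

```python
import numpy as np
from collections import Counter

# Five toy training points: two of class 0, three of class 1.
X_train = np.array([[0.0, 0.0], [0.1, 0.2], [1.0, 1.0], [1.1, 0.9], [0.9, 1.1]])
y_train = np.array([0, 0, 1, 1, 1])

def knn_predict(x, k=3):
    dists = np.linalg.norm(X_train - x, axis=1)  # Euclidean = minkowski, p=2
    nearest = y_train[np.argsort(dists)[:k]]     # labels of the k nearest
    return Counter(nearest).most_common(1)[0][0] # majority vote

print(knn_predict(np.array([1.0, 0.95])))  # surrounded by class-1 points -> 1
print(knn_predict(np.array([0.05, 0.1])))  # nearest two are class 0 -> 0
```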

Output:

Code: For task 2

import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import confusion_matrix,accuracy_score
from sklearn import preprocessing

weather = ['sunny','sunny','overcast','rainy','rainy','rainy','overcast','sunny','sunny','rainy','sunny','overcast','overcast','rainy']
temp =['hot','hot','hot','mild','cool','cool','cool','mild','cool','mild','mild','mild','hot','mild']
play=['no','no','yes','yes','yes','no','yes','no','yes','yes','yes','yes','yes','no']
le = preprocessing.LabelEncoder()
weather_encoded = le.fit_transform(weather)
temp_encoded = le.fit_transform(temp)
label = le.fit_transform(play)
features = list(zip(weather_encoded,temp_encoded))
X_train, X_test, y_train, y_test = train_test_split(features,label,test_size = 1/3, random_state = 42)
model = KNeighborsClassifier(n_neighbors=3, metric='minkowski', p=2)
model.fit(X_train,y_train)
y_pred = model.predict(X_test)
cm= confusion_matrix(y_test, y_pred)
accuracy = accuracy_score(y_test,y_pred)
print("Accuracy for task 2 using KNN : ",accuracy)

model = GaussianNB()
model.fit(X_train,y_train)
y_pred = model.predict(X_test)
accuracy = accuracy_score(y_test,y_pred)
print("Accuracy for task 2 using Naive Bayes : ",accuracy)

Output:

Code: For task 3

import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import confusion_matrix,accuracy_score
from sklearn import preprocessing
from sklearn import datasets
wine = datasets.load_wine()
X_train, X_test, y_train, y_test = train_test_split(wine.data,wine.target,test_size = 1/3, random_state = 109)
model = KNeighborsClassifier(n_neighbors=3, metric='minkowski', p=2)
model.fit(X_train,y_train)
y_pred = model.predict(X_test)
cm= confusion_matrix(y_test, y_pred)
accuracy = accuracy_score(y_test,y_pred)
print("Accuracy for task 3 using KNN : ",accuracy)
model = GaussianNB()
model.fit(X_train,y_train)
y_pred = model.predict(X_test)
accuracy = accuracy_score(y_test,y_pred)
print("Accuracy for task 3 using Naive Bayes : ",accuracy)
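One caveat when comparing kNN with Naïve Bayes on the wine data: kNN is distance-based, and the wine features span very different numeric ranges (proline is in the hundreds while hue is near 1), so the largest-scale features dominate the distances. Standardising the features usually changes kNN's result substantially; a sketch using `StandardScaler` in a pipeline:

```python
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import accuracy_score

# Same split as the task above.
wine = datasets.load_wine()
X_train, X_test, y_train, y_test = train_test_split(
    wine.data, wine.target, test_size=1/3, random_state=109)

# kNN on raw features vs. kNN on standardised features.
plain = KNeighborsClassifier(n_neighbors=3).fit(X_train, y_train)
scaled = make_pipeline(StandardScaler(),
                       KNeighborsClassifier(n_neighbors=3)).fit(X_train, y_train)

print("kNN without scaling:", accuracy_score(y_test, plain.predict(X_test)))
print("kNN with scaling   :", accuracy_score(y_test, scaled.predict(X_test)))
```

With scaling, kNN becomes a much stronger baseline on this dataset, so the comparison against Naïve Bayes looks quite different.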

Output:

For Task 1, kNN achieves higher accuracy, so kNN is preferred for this dataset.
For Task 2, both models show very low accuracy, and the dataset contains too few samples, so we cannot judge which model is better.
For Task 3, Naïve Bayes achieves higher accuracy, so Naïve Bayes is preferred.

Conclusion:
In this experiment I learned about the Naive Bayes classification method. I applied Naive Bayes classification to several datasets, predicted the outcomes, and measured the accuracy of each model. I also trained kNN on the same datasets to compare the two classifiers and see which one suits each dataset better. The Naive Bayes classifier is a simple but efficient algorithm that can be applied to document classification, sentiment analysis, spam filtering, and similar tasks. Its simplicity and speed make it a popular choice for many machine-learning problems, and its accuracy and performance make it a valuable tool for data analysis and decision-making.
