
November 13, 2023

Dela Cruz, Jovineil V.
Dr. Estrada


CS31S2
NB_AT
First Step
I created a Python file named IA.py.
I ran a block of code to create "play_golf.csv" together with its data set.
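The dataset-creation code itself appeared only as a screenshot in the report. A minimal sketch of that step, assuming the classic 14-day play-golf table (the exact rows are an assumption; confirm them against the version used in class) and the column names that the later get_dummies call expects:

```python
import pandas as pd

# Classic 14-day play-golf data set (Mitchell's PlayTennis table);
# the rows here are an assumption -- verify against the class handout.
rows = [
    ('D1',  'Sunny',    'Hot',  'High',   'Weak',   'No'),
    ('D2',  'Sunny',    'Hot',  'High',   'Strong', 'No'),
    ('D3',  'Overcast', 'Hot',  'High',   'Weak',   'Yes'),
    ('D4',  'Rain',     'Mild', 'High',   'Weak',   'Yes'),
    ('D5',  'Rain',     'Cool', 'Normal', 'Weak',   'Yes'),
    ('D6',  'Rain',     'Cool', 'Normal', 'Strong', 'No'),
    ('D7',  'Overcast', 'Cool', 'Normal', 'Strong', 'Yes'),
    ('D8',  'Sunny',    'Mild', 'High',   'Weak',   'No'),
    ('D9',  'Sunny',    'Cool', 'Normal', 'Weak',   'Yes'),
    ('D10', 'Rain',     'Mild', 'Normal', 'Weak',   'Yes'),
    ('D11', 'Sunny',    'Mild', 'Normal', 'Strong', 'Yes'),
    ('D12', 'Overcast', 'Mild', 'High',   'Strong', 'Yes'),
    ('D13', 'Overcast', 'Hot',  'Normal', 'Weak',   'Yes'),
    ('D14', 'Rain',     'Mild', 'High',   'Strong', 'No'),
]
golf = pd.DataFrame(rows, columns=['day', 'outlook', 'temp', 'humidity', 'wind', 'play'])
golf.to_csv('play_golf.csv', index=False)
```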

Next, I removed that code and pasted the following:


import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

df = pd.read_csv('play_golf.csv')  # our play-golf dataset

df.head()
df.info()

sns.countplot(data=df, x='play', hue='outlook')
plt.xticks(rotation=45, ha='right');

# One-hot encode the categorical features
pre_df = pd.get_dummies(df, columns=['day', 'outlook', 'temp', 'humidity', 'wind'], drop_first=True)
pre_df.head()

from sklearn.model_selection import train_test_split

X = pre_df.drop('play', axis=1)
y = pre_df['play']

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=125
)

from sklearn.naive_bayes import GaussianNB

model = GaussianNB()
model.fit(X_train, y_train)

from sklearn.metrics import (
    accuracy_score,
    confusion_matrix,
    ConfusionMatrixDisplay,
    f1_score,
    classification_report,
)

y_pred = model.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)  # note: (y_true, y_pred) order
f1 = f1_score(y_test, y_pred, average="weighted")
print("Accuracy:", accuracy)
print("F1 Score:", f1)

Before running this code, I changed test_size to 0.20 for the 80-20 split.

Split 80-20 (output screenshot)

Next, I changed it to 0.50 for the 50-50 split.

Split 50-50 (output screenshot)

Lastly, for the 70-30 split, I set test_size = 0.30.
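The three re-runs above can also be done in one loop, so test_size does not need to be edited by hand each time. A sketch using the same preprocessing as the report; the data set is rebuilt inline (an assumption about its exact rows) so the snippet runs on its own:

```python
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import accuracy_score, f1_score

# Rebuilt inline so the sketch is self-contained; in the report this
# comes from play_golf.csv. The rows are an assumption.
rows = [
    ('D1', 'Sunny', 'Hot', 'High', 'Weak', 'No'),
    ('D2', 'Sunny', 'Hot', 'High', 'Strong', 'No'),
    ('D3', 'Overcast', 'Hot', 'High', 'Weak', 'Yes'),
    ('D4', 'Rain', 'Mild', 'High', 'Weak', 'Yes'),
    ('D5', 'Rain', 'Cool', 'Normal', 'Weak', 'Yes'),
    ('D6', 'Rain', 'Cool', 'Normal', 'Strong', 'No'),
    ('D7', 'Overcast', 'Cool', 'Normal', 'Strong', 'Yes'),
    ('D8', 'Sunny', 'Mild', 'High', 'Weak', 'No'),
    ('D9', 'Sunny', 'Cool', 'Normal', 'Weak', 'Yes'),
    ('D10', 'Rain', 'Mild', 'Normal', 'Weak', 'Yes'),
    ('D11', 'Sunny', 'Mild', 'Normal', 'Strong', 'Yes'),
    ('D12', 'Overcast', 'Mild', 'High', 'Strong', 'Yes'),
    ('D13', 'Overcast', 'Hot', 'Normal', 'Weak', 'Yes'),
    ('D14', 'Rain', 'Mild', 'High', 'Strong', 'No'),
]
df = pd.DataFrame(rows, columns=['day', 'outlook', 'temp', 'humidity', 'wind', 'play'])

pre_df = pd.get_dummies(df, columns=['day', 'outlook', 'temp', 'humidity', 'wind'], drop_first=True)
X = pre_df.drop('play', axis=1)
y = pre_df['play']

results = {}
for test_size in (0.20, 0.50, 0.30):  # the 80-20, 50-50, and 70-30 splits
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=test_size, random_state=125
    )
    model = GaussianNB()
    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)
    results[test_size] = (accuracy_score(y_test, y_pred),
                          f1_score(y_test, y_pred, average='weighted'))
    print(f"test_size={test_size}: accuracy={results[test_size][0]:.3f}, "
          f"F1={results[test_size][1]:.3f}")
```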

Next, for the Decision Tree, I followed the same process with slightly different code:


import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

df = pd.read_csv('play_golf.csv')  # our play-golf dataset

df.head()
df.info()

sns.countplot(data=df, x='play', hue='outlook')
plt.xticks(rotation=45, ha='right');

# One-hot encode the categorical features
pre_df = pd.get_dummies(df, columns=['day', 'outlook', 'temp', 'humidity', 'wind'], drop_first=True)
pre_df.head()

from sklearn.model_selection import train_test_split

X = pre_df.drop('play', axis=1)
y = pre_df['play']

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=125
)

from sklearn.tree import DecisionTreeClassifier

model = DecisionTreeClassifier()
model.fit(X_train, y_train)

from sklearn.metrics import (
    accuracy_score,
    confusion_matrix,
    ConfusionMatrixDisplay,
    f1_score,
    classification_report,
)

y_pred = model.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)  # note: (y_true, y_pred) order
f1 = f1_score(y_test, y_pred, average="weighted")

print("\n----------------\n")
print("Accuracy:", accuracy)
print("F1 Score:", f1)

Split 80-20 (output screenshot)
Split 50-50 (output screenshot)
Split 70-30 (output screenshot)
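The fitted tree can also be inspected directly, which helps explain the scores above. A sketch using sklearn.tree.plot_tree; the data set is rebuilt inline (the rows are an assumption), and golf_tree.png is a file name chosen here only for illustration:

```python
import matplotlib
matplotlib.use('Agg')  # render without a display
import matplotlib.pyplot as plt
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier, plot_tree

# Rebuilt inline so the sketch is self-contained; the rows are an
# assumption about the contents of play_golf.csv.
rows = [
    ('D1', 'Sunny', 'Hot', 'High', 'Weak', 'No'),
    ('D2', 'Sunny', 'Hot', 'High', 'Strong', 'No'),
    ('D3', 'Overcast', 'Hot', 'High', 'Weak', 'Yes'),
    ('D4', 'Rain', 'Mild', 'High', 'Weak', 'Yes'),
    ('D5', 'Rain', 'Cool', 'Normal', 'Weak', 'Yes'),
    ('D6', 'Rain', 'Cool', 'Normal', 'Strong', 'No'),
    ('D7', 'Overcast', 'Cool', 'Normal', 'Strong', 'Yes'),
    ('D8', 'Sunny', 'Mild', 'High', 'Weak', 'No'),
    ('D9', 'Sunny', 'Cool', 'Normal', 'Weak', 'Yes'),
    ('D10', 'Rain', 'Mild', 'Normal', 'Weak', 'Yes'),
    ('D11', 'Sunny', 'Mild', 'Normal', 'Strong', 'Yes'),
    ('D12', 'Overcast', 'Mild', 'High', 'Strong', 'Yes'),
    ('D13', 'Overcast', 'Hot', 'Normal', 'Weak', 'Yes'),
    ('D14', 'Rain', 'Mild', 'High', 'Strong', 'No'),
]
df = pd.DataFrame(rows, columns=['day', 'outlook', 'temp', 'humidity', 'wind', 'play'])
pre_df = pd.get_dummies(df, columns=['day', 'outlook', 'temp', 'humidity', 'wind'], drop_first=True)
X = pre_df.drop('play', axis=1)
y = pre_df['play']
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=125)

model = DecisionTreeClassifier()
model.fit(X_train, y_train)

# Draw the learned splits and save the figure to a file
fig, ax = plt.subplots(figsize=(12, 6))
plot_tree(model, feature_names=list(X.columns),
          class_names=list(model.classes_), filled=True, ax=ax)
fig.savefig('golf_tree.png')
```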
Insights

While experimenting on Gaussian NB with the different splits (80-20, 50-50, and 70-30), the accuracy and F1 scores varied. I noticed that the higher the training ratio, the more accurate the model became. The 50-50 split, on the other hand, gave a more balanced accuracy and F1 score, since neither partition is much larger than the other.
With the Decision Tree, when the training portion is higher the accuracy and F1 scores come out the same, while with the balanced 50-50 split there is a slight difference between the accuracy and the F1 score.
But then again, the chosen split ratio should still be decided based on the characteristics of the dataset and the goals of the learning project.
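The comparison behind these insights can be reproduced in one place by running both models over the three splits. A sketch (data rebuilt inline, with the same caveat that the exact rows are an assumption):

```python
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score, f1_score

# Rebuilt inline so the sketch is self-contained; the rows are an
# assumption about the contents of play_golf.csv.
rows = [
    ('D1', 'Sunny', 'Hot', 'High', 'Weak', 'No'),
    ('D2', 'Sunny', 'Hot', 'High', 'Strong', 'No'),
    ('D3', 'Overcast', 'Hot', 'High', 'Weak', 'Yes'),
    ('D4', 'Rain', 'Mild', 'High', 'Weak', 'Yes'),
    ('D5', 'Rain', 'Cool', 'Normal', 'Weak', 'Yes'),
    ('D6', 'Rain', 'Cool', 'Normal', 'Strong', 'No'),
    ('D7', 'Overcast', 'Cool', 'Normal', 'Strong', 'Yes'),
    ('D8', 'Sunny', 'Mild', 'High', 'Weak', 'No'),
    ('D9', 'Sunny', 'Cool', 'Normal', 'Weak', 'Yes'),
    ('D10', 'Rain', 'Mild', 'Normal', 'Weak', 'Yes'),
    ('D11', 'Sunny', 'Mild', 'Normal', 'Strong', 'Yes'),
    ('D12', 'Overcast', 'Mild', 'High', 'Strong', 'Yes'),
    ('D13', 'Overcast', 'Hot', 'Normal', 'Weak', 'Yes'),
    ('D14', 'Rain', 'Mild', 'High', 'Strong', 'No'),
]
df = pd.DataFrame(rows, columns=['day', 'outlook', 'temp', 'humidity', 'wind', 'play'])
pre_df = pd.get_dummies(df, columns=['day', 'outlook', 'temp', 'humidity', 'wind'], drop_first=True)
X = pre_df.drop('play', axis=1)
y = pre_df['play']

scores = {}
for name, make_model in [('GaussianNB', GaussianNB),
                         ('DecisionTree', DecisionTreeClassifier)]:
    for test_size in (0.20, 0.50, 0.30):
        X_train, X_test, y_train, y_test = train_test_split(
            X, y, test_size=test_size, random_state=125
        )
        model = make_model()
        model.fit(X_train, y_train)
        y_pred = model.predict(X_test)
        scores[(name, test_size)] = (accuracy_score(y_test, y_pred),
                                     f1_score(y_test, y_pred, average='weighted'))
        print(name, test_size, scores[(name, test_size)])
```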
