Dela Cruz - NB - AT
import seaborn as sns
import matplotlib.pyplot as plt

# Count of 'play' outcomes, broken down by outlook
sns.countplot(data=df, x='play', hue='outlook')
plt.xticks(rotation=45, ha='right');
# One-hot encode the categorical columns; drop_first avoids redundant dummies
pre_df = pd.get_dummies(df, columns=['day', 'outlook', 'temp', 'humidity', 'wind'], drop_first=True)
pre_df.head()
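As a minimal sketch of what the one-hot step above produces, here is `get_dummies` applied to a toy frame (the column names mirror the notebook, but this toy data is hypothetical since the actual df is not shown):

```python
import pandas as pd

# Toy stand-in for the notebook's dataframe (hypothetical values)
toy = pd.DataFrame({'outlook': ['sunny', 'rainy', 'overcast'],
                    'play': ['no', 'yes', 'yes']})

# drop_first=True removes one category per column ('overcast' here, the
# alphabetically first), since it is implied when the other dummies are 0
encoded = pd.get_dummies(toy, columns=['outlook'], drop_first=True)
print(encoded.columns.tolist())
```

The dropped category keeps the dummy columns linearly independent, which matters more for linear models than for Naive Bayes or trees, but it also keeps the feature count down.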
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

X = pre_df.drop('play', axis=1)
y = pre_df['play']

# 80-20 split; random_state fixed for reproducibility
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.20, random_state=42)

model = GaussianNB()
model.fit(X_train, y_train);
Before running this code, I set test_size to 0.20 to get the 80-20 split.
Split 80-20
df.info()
pre_df.head()
from sklearn.tree import DecisionTreeClassifier

X = pre_df.drop('play', axis=1)
y = pre_df['play']

# Same 80-20 split as before
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.20, random_state=42)

model = DecisionTreeClassifier()
model.fit(X_train, y_train);
from sklearn.metrics import accuracy_score, f1_score

y_pred = model.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
f1 = f1_score(y_test, y_pred, pos_label='yes')  # assuming 'yes' is the positive class

print("\n----------------\n")
print("Accuracy:", accuracy)
print("F1 Score:", f1)
80-20
50-50
70-30
Insights
While experimenting on the Gaussian NB with different splits such as 80-20, 50-50, and 70-30, the accuracy and F1 scores vary. I noticed that the higher the training ratio, the more accurate the model becomes. On the other hand, the 50-50 split gave a more balanced accuracy and F1 score, since neither side of the split dominates.
In the Decision Tree, when the training ratio is higher, the accuracy and F1 scores come out the same, while with a balanced 50-50 split there is a slight difference between the accuracy and the F1 score.
Still, the chosen split ratio should be decided based on the characteristics of the dataset and the goals of the learning project.
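The split experiment described above can be sketched as one loop over test sizes for both models. Since the original dataframe is not included here, this sketch builds a small synthetic 'play tennis'-style frame with the same column names; the data and random seeds are assumptions, so the printed scores will differ from the notebook's:

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score, f1_score

# Synthetic stand-in for the notebook's dataframe (hypothetical data)
rng = np.random.default_rng(0)
n = 200
df = pd.DataFrame({
    'day': rng.choice(['Mon', 'Tue', 'Wed'], n),
    'outlook': rng.choice(['sunny', 'overcast', 'rainy'], n),
    'temp': rng.choice(['hot', 'mild', 'cool'], n),
    'humidity': rng.choice(['high', 'normal'], n),
    'wind': rng.choice(['weak', 'strong'], n),
})
# Make 'play' loosely depend on outlook so the models have some signal
df['play'] = np.where((df['outlook'] == 'overcast') | (rng.random(n) < 0.3),
                      'yes', 'no')

pre_df = pd.get_dummies(df, columns=['day', 'outlook', 'temp', 'humidity', 'wind'],
                        drop_first=True)
X = pre_df.drop('play', axis=1)
y = pre_df['play']

# Try the three splits from the report: 80-20, 70-30, 50-50
for test_size in (0.20, 0.30, 0.50):
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=test_size, random_state=42)
    for name, model in [('GaussianNB', GaussianNB()),
                        ('DecisionTree', DecisionTreeClassifier(random_state=42))]:
        model.fit(X_train, y_train)
        y_pred = model.predict(X_test)
        print(f"{name} {1 - test_size:.0%}-{test_size:.0%}: "
              f"acc={accuracy_score(y_test, y_pred):.3f}, "
              f"f1={f1_score(y_test, y_pred, pos_label='yes'):.3f}")
```

Fixing random_state makes the comparison fair: every model sees the same rows in each split, so score differences come from the split ratio and the model, not from resampling.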