CODE OVERVIEW
1) Data Preprocessing
● Handling missing values by filling them with the column mean.
● Dropping duplicate rows.
● Standardizing features with three scalers: StandardScaler, RobustScaler, and MinMaxScaler.
● Detecting outliers using box plots, the K-means clustering algorithm, and RobustScaler.
● Upsampling the minority class to balance the dataset.
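As a minimal sketch of the preprocessing steps above (a toy frame stands in for the credit-card data; the column names here are illustrative, not from the original dataset):

```python
import pandas as pd
from sklearn.preprocessing import StandardScaler

# Toy stand-in for the dataset; "V1" and "Amount" are illustrative columns.
df = pd.DataFrame({"V1": [1.0, None, 3.0, 3.0],
                   "Amount": [10.0, 20.0, 30.0, 30.0]})

# Fill missing values with the column mean.
df = df.fillna(df.mean(numeric_only=True))

# Drop duplicate rows.
df = df.drop_duplicates()

# Standardize features (RobustScaler / MinMaxScaler drop in the same way).
scaled = StandardScaler().fit_transform(df)
```

After these steps each column of `scaled` has zero mean and unit variance, which keeps distance-based models such as KNN from being dominated by the large `Amount` scale.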
2) Modeling
● Splitting the balanced data into training and testing sets.
● Implementing three classifiers: Naive Bayes, KNN, and Decision Tree.
● Evaluating each model's performance using classification reports.
● Plotting confusion matrices and ROC curves for model evaluation.
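The modeling steps above can be sketched as a single loop over the three classifiers (synthetic data via `make_classification` replaces the balanced credit-card frame here):

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import classification_report

# Synthetic stand-in for the balanced dataset.
X, y = make_classification(n_samples=400, random_state=42)
x_train, x_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)

models = {
    "Naive Bayes": GaussianNB(),
    "KNN": KNeighborsClassifier(),
    "Decision Tree": DecisionTreeClassifier(random_state=42),
}
for name, model in models.items():
    model.fit(x_train, y_train)
    print(f"{name} Report:")
    print(classification_report(y_test, model.predict(x_test)))
```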
Code (excerpt)
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report
from sklearn.preprocessing import StandardScaler, RobustScaler, MinMaxScaler, Binarizer
from sklearn.tree import DecisionTreeClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.utils import resample
from sklearn.cluster import KMeans
df.describe()
minority = creditcard_dataset[creditcard_dataset['Class'] == 1]
majority = creditcard_dataset[creditcard_dataset['Class'] == 0]
minority_upsampled = resample(minority, replace=True, n_samples=len(majority), random_state=42)
balanced_df = pd.concat([majority, minority_upsampled])
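A quick sanity check that the upsampling produced equal class counts; the toy `creditcard_dataset` below is an illustrative stand-in (95 legitimate rows, 5 fraudulent), not the real data:

```python
import pandas as pd
from sklearn.utils import resample

# Toy stand-in for creditcard_dataset with a heavy class imbalance.
creditcard_dataset = pd.DataFrame({
    "Amount": range(100),
    "Class": [0] * 95 + [1] * 5,
})

minority = creditcard_dataset[creditcard_dataset["Class"] == 1]
majority = creditcard_dataset[creditcard_dataset["Class"] == 0]
minority_upsampled = resample(minority, replace=True,
                              n_samples=len(majority), random_state=42)
balanced_df = pd.concat([majority, minority_upsampled])

# Both classes now have len(majority) rows.
print(balanced_df["Class"].value_counts())
```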
# KNN Classifier
knn = KNeighborsClassifier()
knn.fit(x_train, y_train)
y_pred_knn = knn.predict(x_test)
print("KNN Classifier Report:")
print(classification_report(y_test, y_pred_knn))
plot_confusion_matrix(knn, x_test, y_test, 'Confusion Matrix - KNN')
plot_roc_curve(knn, x_test, y_test, 'ROC Curve - KNN')
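The `plot_confusion_matrix` and `plot_roc_curve` helpers are called but not defined in this excerpt; one possible implementation consistent with the imports above (their exact bodies in the original are an assumption):

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so this also runs headless
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.metrics import confusion_matrix, roc_curve, auc

def plot_confusion_matrix(model, x_test, y_test, title):
    cm = confusion_matrix(y_test, model.predict(x_test))
    sns.heatmap(cm, annot=True, fmt="d", cmap="Blues")
    plt.title(title)
    plt.xlabel("Predicted")
    plt.ylabel("Actual")
    plt.show()

def plot_roc_curve(model, x_test, y_test, title):
    # Use the positive-class probability as the ranking score.
    scores = model.predict_proba(x_test)[:, 1]
    fpr, tpr, _ = roc_curve(y_test, scores)
    plt.plot(fpr, tpr, label=f"AUC = {auc(fpr, tpr):.3f}")
    plt.plot([0, 1], [0, 1], linestyle="--")
    plt.title(title)
    plt.xlabel("False Positive Rate")
    plt.ylabel("True Positive Rate")
    plt.legend()
    plt.show()
```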
model.fit(X_train, y_train)
y_pred = model.predict(X_test)
CONCLUSION
1. Naive Bayes:
● Performs reasonably well but has slightly lower recall for fraudulent transactions.
2. KNN:
● Shows better performance in identifying both classes with higher precision and recall.
3. Decision Tree:
● Achieves perfect scores, indicating a likely overfitting issue.
These models offer different trade-offs between precision and recall for identifying fraudulent transactions. KNN appears the most balanced, while the Decision Tree's perfect scores suggest overfitting. Note also that the minority class was upsampled before the train/test split, so duplicated fraud rows can appear in both sets and inflate the test scores. Resampling only the training set and validating with cross-validation would give a more reliable estimate and help the models generalize to new data.
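One simple way to probe the Decision Tree's suspiciously perfect scores is to compare its training accuracy against cross-validated accuracy (shown here on synthetic data, as a sketch of the validation the conclusion recommends):

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=300, random_state=42)
tree = DecisionTreeClassifier(random_state=42)

# An unconstrained tree memorizes the training set (accuracy 1.0),
# while 5-fold cross-validated accuracy is estimated on held-out folds.
train_acc = tree.fit(X, y).score(X, y)
cv_scores = cross_val_score(tree, X, y, cv=5)
print(train_acc, cv_scores.mean())
```

A large gap between the two numbers is the classic signature of overfitting; constraining `max_depth` or `min_samples_leaf` usually narrows it.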