email-spam-classification

November 6, 2024

[14]: import pandas as pd

[15]: # Read the Dataset


dataframe = pd.read_csv("[Link]")

[40]: # Create a Copy of the Dataset to work upon


df = dataframe.copy()
df

[40]:       the  to  ect  and  for  of    a  you  hou  in  ...  connevey  jay  \
      0       0   0    1    0    0   0    2    0    0   0  ...         0    0
      1       8  13   24    6    6   2  102    1   27  18  ...         0    0
      2       0   0    1    0    0   0    8    0    0   4  ...         0    0
      3       0   5   22    0    5   1   51    2   10   1  ...         0    0
      4       7   6   17    1    5   2   57    0    9   3  ...         0    0
      ...   ...  ..  ...  ...  ...  ..  ...  ...  ...  ..  ...       ...  ...
      5167    2   2    2    3    0   0   32    0    0   5  ...         0    0
      5168   35  27   11    2    6   5  151    4    3  23  ...         0    0
      5169    0   0    1    1    0   0   11    0    0   1  ...         0    0
      5170    2   7    1    0    2   1   28    2    0   8  ...         0    0
      5171   22  24    5    1    6   5  148    8    2  23  ...         0    0

            valued  lay  infrastructure  military  allowing  ff  dry  Prediction
      0          0    0               0         0         0   0    0           0
      1          0    0               0         0         0   1    0           0
      2          0    0               0         0         0   0    0           0
      3          0    0               0         0         0   0    0           0
      4          0    0               0         0         0   1    0           0
      ...      ...  ...             ...       ...       ...  ..  ...         ...
      5167       0    0               0         0         0   0    0           0
      5168       0    0               0         0         0   1    0           0
      5169       0    0               0         0         0   0    0           1
      5170       0    0               0         0         0   1    0           1
      5171       0    0               0         0         0   0    0           0

      [5172 rows x 3001 columns]

[17]: df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 5172 entries, 0 to 5171
Columns: 3002 entries, Email No. to Prediction
dtypes: int64(3001), object(1)
memory usage: 118.5+ MB

[18]: # Check for Missing Values


df.isnull().sum()

[18]: Email No. 0


the 0
to 0
ect 0
and 0
..
military 0
allowing 0
ff 0
dry 0
Prediction 0
Length: 3002, dtype: int64

[19]: # The 'Email No.' Column is Not Needed, so we drop it


[Link](["Email No."], axis = 1, inplace = True)

[20]: # Set the Independent Variables (x) and the Target Variable (y)


x = df.drop(["Prediction"], axis=1)
y = df["Prediction"]

[21]: # Split the Data for Training and Testing


from sklearn.model_selection import train_test_split
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.2, random_state=1)
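
A quick check of the class balance in the two splits (not part of the original notebook; it assumes the Prediction column uses 1 for spam and 0 for ham):

[ ]: # Sketch: inspect how the labels are distributed across the splits
print(y_train.value_counts())
print(y_test.value_counts())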

[22]: # KNN Classification


# We use the Elbow Method / Continuous Iteration to Select the Best Value of K (see the sketch after the loop below)

from sklearn.neighbors import KNeighborsClassifier

[23]: from sklearn import metrics


accuracy_values = []

for i in range(1, 11):
    model = KNeighborsClassifier(n_neighbors = i)
    model.fit(x_train, y_train)
    y_pred = model.predict(x_test)
    accuracy = metrics.accuracy_score(y_test, y_pred)
    accuracy_values.append(accuracy)
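
The loop above only stores the accuracies; a minimal sketch (not an original cell) of how the best k could be read off accuracy_values, with an optional plot that assumes matplotlib is installed:

[ ]: # Sketch: visualise the accuracy curve and pick the k with the highest test accuracy
import matplotlib.pyplot as plt

plt.plot(range(1, 11), accuracy_values, marker="o")
plt.xlabel("k (n_neighbors)")
plt.ylabel("Test Accuracy")
plt.show()

# k starts at 1, so the best k is the index of the maximum accuracy plus one
best_k = accuracy_values.index(max(accuracy_values)) + 1
print("Best k:", best_k)

The k = 7 used in the next cell can be sanity-checked against this curve.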

[24]: knn = KNeighborsClassifier(n_neighbors = 7)

[25]: knn.fit(x_train, y_train)
y_pred = knn.predict(x_test)

[26]: print(metrics.classification_report(y_test,y_pred))

              precision    recall  f1-score   support

           0       0.93      0.88      0.91       719
           1       0.76      0.84      0.80       316

    accuracy                           0.87      1035
   macro avg       0.84      0.86      0.85      1035
weighted avg       0.88      0.87      0.87      1035

[41]: confusion_matrix_knn = metrics.confusion_matrix(y_test,y_pred)


confusion_matrix_knn

[41]: array([[700,  19],
             [189, 127]], dtype=int64)

[29]: # Can Find the Metrics Manually as well:


# Recall = TP / (TP + FN)
# Precision = TP / (TP + FP)
# F1 Score = 2 * (Recall * Precision) / (Recall + Precision)
# Accuracy = (TP + TN) / (TP + FP + FN + TN)
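
As a sketch of that manual calculation (not an original cell), the four counts can be pulled from confusion_matrix_knn computed above; sklearn lays the binary matrix out as [[TN, FP], [FN, TP]] with class 1 (spam) as the positive class:

[ ]: # Sketch: recompute the metrics by hand from the confusion matrix
tn, fp, fn, tp = confusion_matrix_knn.ravel()

recall = tp / (tp + fn)
precision = tp / (tp + fp)
f1 = 2 * (recall * precision) / (recall + precision)
accuracy = (tp + tn) / (tp + fp + fn + tn)

print(precision, recall, f1, accuracy)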

[30]: # SVM
from sklearn.svm import SVC

[31]: svm_model = SVC()

[32]: svm_model.fit(x_train,y_train)
y_pred = svm_model.predict(x_test)

[33]: print(metrics.classification_report(y_test,y_pred))

              precision    recall  f1-score   support

           0       0.79      0.97      0.87       719
           1       0.87      0.40      0.55       316

    accuracy                           0.80      1035
   macro avg       0.83      0.69      0.71      1035
weighted avg       0.81      0.80      0.77      1035

[42]: confusion_matrix_svm = metrics.confusion_matrix(y_test,y_pred)


confusion_matrix_svm

[42]: array([[700,  19],
             [189, 127]], dtype=int64)

[35]: # Can Find the Metrics Manually as well:


# Recall = TP / (TP + FN)
# Precision = TP / (TP + FP)
# F1 Score = 2 * (Recall * Precision) / (Recall + Precision)
# Accuracy = (TP + TN) / (TP + FP + FN + TN)
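
The same values can also be cross-checked with sklearn's individual scorers instead of computing them by hand; a minimal sketch, assuming y_pred still holds the SVM predictions from the cell above:

[ ]: # Sketch: verify the report values with sklearn's per-metric functions
print("Precision:", metrics.precision_score(y_test, y_pred))
print("Recall:   ", metrics.recall_score(y_test, y_pred))
print("F1 Score: ", metrics.f1_score(y_test, y_pred))
print("Accuracy: ", metrics.accuracy_score(y_test, y_pred))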

[ ]:
