ML Activity 3

Participating Students:
BETB116 Sandhya Awari
BETB120 Harinakshi Kumbhare

Code:

import pandas as pd
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB, GaussianNB  # imported but not used in this script
from sklearn import svm
from sklearn.model_selection import GridSearchCV

# Step 1: Load the dataset
dataframe = pd.read_csv("spam.csv")  # expects the columns "Label" and "EmailText"
print(dataframe.describe())

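# Step 2: Split into training and test data
x = dataframe["EmailText"]
y = dataframe["Label"]
# Positional split: the first 4457 rows (80% of 5572, rounded down) train the model,
# and the remaining 1115 rows are held out for testing
x_train, y_train = x.iloc[:4457], y.iloc[:4457]
x_test, y_test = x.iloc[4457:], y.iloc[4457:]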

# Step 3: Extract bag-of-words features; the vocabulary is learned from the training texts only
cv = CountVectorizer()
features = cv.fit_transform(x_train)

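# Step 4: Build an SVM classifier, choosing kernel, gamma and C by grid search
tuned_parameters = {'kernel': ['rbf', 'linear'], 'gamma': [1e-3, 1e-4],
                    'C': [1, 10, 100, 1000]}
# The cv argument of GridSearchCV (not the CountVectorizer named cv above) is left at its
# default; the FutureWarning in the output says that default changes from 3 to 5 folds in
# scikit-learn 0.22, and passing cv explicitly silences the warning.
model = GridSearchCV(svm.SVC(), tuned_parameters)
model.fit(features, y_train)
print(model.best_params_)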

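# Step 5: Test accuracy; score() returns the mean accuracy on the held-out rows,
# which are transformed with the vocabulary fitted in Step 3
print(model.score(cv.transform(x_test), y_test))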
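The split in Step 2 is positional: the first 4457 rows (80% of the 5572 rows, rounded down) train the model and the last 1115 test it, which assumes the CSV is not ordered by label or by date. The minimal pure-Python sketch below reproduces those sizes and adds an optional seeded shuffle. `split_indices` and `overlap` are hypothetical helpers, not scikit-learn functions, and shuffling is an assumption the original script does not make.

```python
import random


def split_indices(n_rows, train_frac=0.8, shuffle=False, seed=0):
    """Return (train, test) row positions.

    With shuffle=False this mirrors the fixed positional split in Step 2.
    """
    positions = list(range(n_rows))
    if shuffle:
        # Seeded so the same split is produced on every run.
        random.Random(seed).shuffle(positions)
    cut = int(n_rows * train_frac)  # 5572 * 0.8 = 4457.6, truncated to 4457
    return positions[:cut], positions[cut:]


def overlap(train_texts, test_texts):
    """Count test messages that also appear verbatim among the training messages."""
    seen = set(train_texts)
    return sum(text in seen for text in test_texts)


train_pos, test_pos = split_indices(5572)
print(len(train_pos), len(test_pos))  # 4457 1115
print(overlap(["win cash", "see you"], ["see you", "call me"]))  # 1
```

With pandas, the positions can be applied as `x.iloc[train_pos]` and `y.iloc[train_pos]`. Since `describe()` reports only 5169 unique texts among 5572 rows, `overlap(x_train, x_test)` would show how many test messages were already seen during training. scikit-learn's `train_test_split` (with its `shuffle`, `stratify` and `random_state` arguments) covers the same ground.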
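Step 3 fits `CountVectorizer` on the training texts, and Step 5 only calls `transform` on the test texts, so words that never occur in training are ignored. The sketch below re-implements, in plain Python, a simplified version of the vectorizer's default behaviour: lowercasing, the default token pattern `(?u)\b\w\w+\b` (tokens of two or more word characters), and an alphabetically ordered vocabulary. It is an illustration under those assumptions, not scikit-learn's code: it returns dense lists rather than a sparse matrix, and `fit_vocabulary` and `transform` are hypothetical names.

```python
import re
from collections import Counter

# CountVectorizer's default token_pattern: runs of 2+ word characters.
TOKEN_RE = re.compile(r"(?u)\b\w\w+\b")


def tokenize(text):
    """Lowercase the text and split it into tokens."""
    return TOKEN_RE.findall(text.lower())


def fit_vocabulary(docs):
    """Map each distinct token to a column index, in alphabetical order."""
    terms = sorted({tok for doc in docs for tok in tokenize(doc)})
    return {term: i for i, term in enumerate(terms)}


def transform(docs, vocab):
    """Turn each document into a row of token counts; unknown tokens are dropped."""
    rows = []
    for doc in docs:
        counts = Counter(tok for tok in tokenize(doc) if tok in vocab)
        row = [0] * len(vocab)
        for tok, n in counts.items():
            row[vocab[tok]] = n
        rows.append(row)
    return rows


train_docs = ["Free entry: win a FREE prize", "Sorry, I'll call later"]
vocab = fit_vocabulary(train_docs)                 # plays the role of fitting cv
train_rows = transform(train_docs, vocab)
test_rows = transform(["Call now to win"], vocab)  # like cv.transform(x_test)
print(sorted(vocab, key=vocab.get))
# ['call', 'entry', 'free', 'later', 'll', 'prize', 'sorry', 'win']
print(train_rows)  # [[0, 1, 2, 0, 0, 1, 0, 1], [1, 0, 0, 1, 1, 0, 1, 0]]
print(test_rows)   # [[1, 0, 0, 0, 0, 0, 0, 1]]
```

The words dropped from the last row (`now`, `to`) show why the test texts go through `cv.transform` rather than a fresh `fit_transform`: the SVM was trained on the columns of the training vocabulary, and refitting would produce a different, incompatible set of columns.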
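Step 4 asks `GridSearchCV` to try every combination in `tuned_parameters` and keep the one with the best cross-validated score. The plain-Python sketch below counts that work, assuming the 3-fold default named in the warning in the output and the default `refit=True`, which retrains the best candidate once on all of `features`. `expand_grid` and `effective` are hypothetical helpers that enumerate combinations roughly the way scikit-learn's `ParameterGrid` does; they are not the library's implementation.

```python
from itertools import product

# Same grid as in Step 4.
tuned_parameters = {'kernel': ['rbf', 'linear'], 'gamma': [1e-3, 1e-4],
                    'C': [1, 10, 100, 1000]}


def expand_grid(grid):
    """Return one dict per combination of the grid's values."""
    keys = sorted(grid)
    return [dict(zip(keys, combo)) for combo in product(*(grid[k] for k in keys))]


def effective(candidate):
    """Drop gamma for the linear kernel, since SVC only uses gamma for other kernels."""
    return tuple(sorted(
        (k, v) for k, v in candidate.items()
        if not (candidate['kernel'] == 'linear' and k == 'gamma')
    ))


candidates = expand_grid(tuned_parameters)
n_folds = 3  # assumed: the old default reported by the FutureWarning
n_fits = len(candidates) * n_folds + 1  # +1 for the final refit
n_distinct = len({effective(c) for c in candidates})
print(len(candidates), n_fits, n_distinct)  # 16 49 12
```

Four of the sixteen candidates therefore repeat a linear model that was already tried. `GridSearchCV` also accepts a list of grids, for example `[{'kernel': ['rbf'], 'gamma': [1e-3, 1e-4], 'C': [1, 10, 100, 1000]}, {'kernel': ['linear'], 'C': [1, 10, 100, 1000]}]`, which searches only the 12 distinct settings. Separately, because `cv` was fitted on all of `x_train` before the search, each validation fold's words are already in the vocabulary; wrapping the vectorizer and the SVM in a `sklearn.pipeline.Pipeline` keeps that step inside each fold.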
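For a classifier, `model.score` in Step 5 returns mean accuracy: the fraction of test messages whose predicted label equals the true one. Because most messages are `ham`, accuracy alone can hide how many messages of the other class are missed, so the sketch below also tallies true and false positives and negatives. It is plain Python with hypothetical helper names and toy labels, and it assumes the second label is `spam`, which the output does not show. scikit-learn's `sklearn.metrics.confusion_matrix` and `classification_report` compute the same quantities from `model.predict(cv.transform(x_test))`.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("need two non-empty label sequences of equal length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def outcome_counts(y_true, y_pred, positive="spam"):
    """Tally tp/fp/fn/tn, treating `positive` as the class to detect."""
    counts = Counter()
    for t, p in zip(y_true, y_pred):
        if p == positive:
            counts["tp" if t == positive else "fp"] += 1
        else:
            counts["fn" if t == positive else "tn"] += 1
    return counts


y_true = ["ham", "ham", "spam", "ham", "spam"]
y_pred = ["ham", "spam", "spam", "ham", "ham"]
c = outcome_counts(y_true, y_pred)
print(accuracy(y_true, y_pred))  # 0.6
print(c["tp"], c["fp"], c["fn"], c["tn"])  # 1 1 1 2
recall = c["tp"] / (c["tp"] + c["fn"])  # share of spam that was caught
print(recall)  # 0.5
```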

Output:

        Label               EmailText
count    5572                    5572
unique      2                    5169
top       ham  Sorry, I'll call later
freq     4825                      30
C:\ProgramData\Anaconda3\lib\site-packages\sklearn\model_selection\_split.py:1978: FutureWarning: The default value of cv will change from 3 to 5 in version 0.22. Specify it explicitly to silence this warning.
  warnings.warn(CV_WARNING, FutureWarning)
{'C': 1000, 'gamma': 0.0001, 'kernel': 'rbf'}
0.9865470852017937
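The printed numbers can be cross-checked by hand. The test set holds 5572 - 4457 = 1115 messages, and 0.9865470852017937 × 1115 = 1100, so the tuned SVM misclassified 15 test messages. For comparison, `describe()` reports that 4825 of the 5572 messages are `ham`, so always predicting `ham` would be right about 86.6% of the time across the whole dataset (the class mix of the test rows alone is not shown). The short check below simply redoes that arithmetic with the values from the output above.

```python
n_rows, n_ham = 5572, 4825       # from the describe() output
n_test = n_rows - 4457           # rows after the split point in Step 2
reported = 0.9865470852017937    # printed by model.score(...)

correct = round(reported * n_test)
print(n_test, correct, n_test - correct)  # 1115 1100 15
print(round(n_ham / n_rows, 4))           # 0.8659
```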