
Data Analysis in Python - 4

'''
Now we will look at the K-nearest neighbors (KNN) classifier model
'''
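Before using scikit-learn's implementation, a minimal from-scratch sketch of what a KNN classifier does at prediction time may help. It finds the K training points closest to a query and returns their majority class. The sketch assumes Euclidean distance and one equal vote per neighbour, which match the defaults of `KNeighborsClassifier` (Minkowski metric with p=2, `weights='uniform'`). The helper names and toy points are hypothetical. scikit-learn's own implementation uses faster neighbour searches and its own tie-breaking, so treat this only as an illustration of the idea.

```python
import math
from collections import Counter

def knn_predict_one(train_x, train_y, query, k=5):
    """Predict one label by majority vote of the k nearest training points.

    Hypothetical helper: Euclidean distance, one equal vote per neighbour.
    Ties between labels go to the label that appears first in distance order.
    """
    # Distance from the query to every training point, paired with that point's label
    distances = [(math.dist(x, query), y) for x, y in zip(train_x, train_y)]
    # Keep the k closest points (sorted() is stable, so equal distances keep input order)
    nearest = sorted(distances, key=lambda pair: pair[0])[:k]
    # Majority vote among the labels of those k points
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

def knn_predict(train_x, train_y, test_x, k=5):
    """Predict a label for every query point in test_x."""
    return [knn_predict_one(train_x, train_y, q, k) for q in test_x]

# Hypothetical toy data: two numeric features, two classes (0 and 1)
toy_x = [(1, 1), (1, 2), (2, 1), (2, 2), (3, 3),
         (7, 7), (8, 8), (8, 9), (9, 8), (9, 9)]
toy_y = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
print(knn_predict(toy_x, toy_y, [(1.5, 1.5), (8.5, 8.5), (4, 4)], k=5))  # [0, 1, 0]
```

With K=5, each prediction is decided by a vote of 5 neighbours. A larger K smooths the decision over more points; K=1 simply copies the label of the single closest point.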
# import the KNN classifier and the evaluation metrics used below
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score, confusion_matrix
# now create an instance of the K-nearest neighbors classifier with K = 5:
# each test point is assigned the majority class (salary status)
# among its 5 nearest training points
KNN_classifier = KNeighborsClassifier(n_neighbors=5)
KNN_classifier.fit(train_x, train_y)  # fit the model on the training features and labels
#predicting the test values with this model
prediction=KNN_classifier.predict(test_x)
print(prediction)
# now check the performance metrics
# (the result gets a new name so the imported confusion_matrix function is not overwritten)
conf_matrix = confusion_matrix(test_y, prediction)
print('\t', 'predicted values')
print('original values', '\n', conf_matrix)
# new name again, so the imported accuracy_score function stays callable
accuracy = accuracy_score(test_y, prediction)
print(accuracy)
print('misclassified values: %d' % (test_y != prediction).sum())
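The confusion matrix, the accuracy and the misclassified count printed above are linked. scikit-learn's `confusion_matrix` puts the true (original) classes in rows and the predicted classes in columns, with labels in sorted order by default. The diagonal holds the correct predictions, so accuracy is the diagonal sum divided by the total, and the misclassified count is everything off the diagonal. A minimal pure-Python sketch of that bookkeeping follows; the helper names and the six toy labels are hypothetical, not taken from this dataset.

```python
def confusion_counts(y_true, y_pred, labels):
    """Confusion matrix as a list of rows: row = true class, column = predicted class."""
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for t, p in zip(y_true, y_pred):
        matrix[index[t]][index[p]] += 1
    return matrix

def accuracy_from_matrix(matrix):
    """Fraction of samples on the diagonal (correctly classified)."""
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return correct / total

def misclassified_from_matrix(matrix):
    """Number of samples off the diagonal (misclassified)."""
    total = sum(sum(row) for row in matrix)
    return total - sum(matrix[i][i] for i in range(len(matrix)))

# Hypothetical labels for six samples
y_true = [0, 0, 1, 1, 1, 0]
y_pred = [0, 1, 1, 0, 1, 0]
cm = confusion_counts(y_true, y_pred, labels=[0, 1])
print(cm)                                  # [[2, 1], [1, 2]]
print('%.3f' % accuracy_from_matrix(cm))   # 0.667
print(misclassified_from_matrix(cm))       # 2
```

Equivalently, accuracy = (n - misclassified) / n, so the accuracy score and the misclassified count describe the same result in two ways.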
'''
Now check the effect of the K value on the classifier
'''
Misclassified_sample=[]
# count the misclassified test samples for K values from 1 to 19
# (range(1, 20) stops before 20)
for i in range(1, 20):
    knn = KNeighborsClassifier(n_neighbors=i)
    knn.fit(train_x, train_y)
    pred_i = knn.predict(test_x)
    Misclassified_sample.append((test_y != pred_i).sum())
print(Misclassified_sample)
# therefore, from these K values we can take K=16, for which the
# number of misclassified samples is lowest (1401)
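Because the loop starts at K=1, the entry at position `i` of `Misclassified_sample` belongs to K = `i + 1`, so reading the best K off the printed list by hand is easy to get wrong. A minimal sketch pairs each K with its error count and returns the pair with the fewest misclassifications. The helper `best_k` and the counts below are hypothetical, not this analysis's results.

```python
def best_k(misclassified, k_values):
    """Return (k, errors) for the K with the fewest misclassified samples.

    misclassified[i] is assumed to be the error count for k_values[i];
    min() keeps the first minimum it finds, so ties go to the smallest K.
    """
    return min(zip(k_values, misclassified), key=lambda pair: pair[1])

# Hypothetical error counts for K = 1..5
errors = [1500, 1480, 1450, 1450, 1470]
print(best_k(errors, range(1, 6)))  # (3, 1450)
```

Applied to the list above, `best_k(Misclassified_sample, range(1, 20))` would return the chosen K and its count directly. K picked this way is tuned on the test set itself, so its error count is an optimistic estimate. A separate validation split or cross-validation (for example `sklearn.model_selection.cross_val_score`) keeps the test set untouched for the final check.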
'''
So, we have considered and studied two algorithms for the classification problem:
1. Logistic Regression
2. KNN
'''
