
EXPERIMENT NO: 3

Implement k-nearest neighbors classification using Python

AIM: To write a k-nearest neighbors classifier in Python and test it on the Iris dataset

THEORY:
K-Nearest Neighbours (KNN) is one of the simplest supervised machine learning algorithms. It is an instance-based or lazy learning algorithm: no model is built from the training data in advance, and the learning work is postponed until a prediction is requested for a new instance.
K-NN Algorithm (a toy numeric sketch of these steps follows this list)
1. Load the training data.
2. Choose K, the number of nearest neighbours to consider.
3. Compute the test point's distance from each training point.
4. Sort the distances in ascending order.
5. Use the sorted distances to select the K nearest neighbours.
6. Use a majority vote over those neighbours (for classification) or averaging (for regression).
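To make the steps concrete, the sketch below traces a single prediction by hand on a made-up 2-D training set (the points and labels are hypothetical, not the Iris data):

import numpy as np
from collections import Counter

# Toy training set: three 2-D points with class labels (made-up values for illustration)
X_train = np.array([[1.0, 1.0], [1.2, 0.8], [5.0, 5.0]])
y_train = np.array([0, 0, 1])

x_test = np.array([1.1, 1.0])   # new point to classify
k = 2                           # number of neighbours to look at

# Step 3: distance from the test point to every training point
distances = np.sqrt(np.sum((X_train - x_test) ** 2, axis=1))
# Steps 4-5: sort distances in ascending order and keep the k closest indices
k_idx = np.argsort(distances)[:k]
# Step 6: majority vote over the labels of those neighbours
label = Counter(y_train[k_idx]).most_common(1)[0][0]
print(label)   # prints 0, the majority class among the two nearest points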
Advantages of KNN
1. Easy to understand
2. No assumptions about data
3. Can be applied to both classification and regression
4. Works easily on multi-class problems
Disadvantages of KNN
1. Memory intensive / computationally expensive
2. Sensitive to the scale of the data (a scaling sketch follows this list)
3. Does not work well with a rare-event (skewed) target variable
4. Struggles when there is a high number of independent variables
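Because KNN relies on raw distances, features measured on large scales can dominate the vote. A common remedy, sketched below with scikit-learn's StandardScaler, is to standardize the features before fitting; this step is not part of the experiment code that follows, and Xtrain/Xtest refer to the train/test split created there:

from sklearn.preprocessing import StandardScaler

# Fit the scaler on the training features only, then apply it to both splits,
# so no information from the test set leaks into the scaling.
scaler = StandardScaler()
Xtrain_scaled = scaler.fit_transform(Xtrain)
Xtest_scaled = scaler.transform(Xtest)
# Xtrain_scaled / Xtest_scaled can then be passed to KNN.fit / KNN.predict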
Two files need to be created:
File 1: knn.py
File 2: Expt3.py
CODE:
#knn.py file
# Import libraries
import numpy as np
from collections import Counter

def euclidean_distance(x1, x2):
    # Euclidean distance between two feature vectors
    return np.sqrt(np.sum((x1 - x2)**2))

class KNN:
    def __init__(self, k=3):
        self.k = k

    def fit(self, X, y):
        # Lazy learner: simply store the training data
        self.X_train = X
        self.y_train = y

    def predict(self, X):
        y_pred = [self._predict(x) for x in X]
        return np.array(y_pred)

    def _predict(self, x):
        # Compute distances between x and all examples in the training set
        distances = [euclidean_distance(x, x_train) for x_train in self.X_train]
        # Sort by distance and return indices of the first k neighbors
        k_idx = np.argsort(distances)[:self.k]
        # Extract the labels of the k nearest neighbor training samples
        k_neighbor_labels = [self.y_train[i] for i in k_idx]
        # Return the most common class label
        most_common = Counter(k_neighbor_labels).most_common(1)
        return most_common[0][0]
#Expt3.py file
import numpy as np
from sklearn import datasets
from sklearn.model_selection import train_test_split
import matplotlib.pyplot as plt
from matplotlib.colors import ListedColormap
# import the KNN implementation from knn.py
from knn import KNN

cmap = ListedColormap(["#000FFF", "#FFF000", "#00FF00"])

# The Iris dataset has 50 samples for each of three Iris species (150 samples in total).
# Each sample has 4 features (sepal length, sepal width, petal length, petal width)
# and an Iris species name (class/label).

# Load iris dataset
iris = datasets.load_iris()
X, y = iris.data, iris.target

# Visualize the sepal length and sepal width of the iris data
print("Plot showing sepal length, sepal width data of iris data")
plt.figure()
plt.scatter(X[:, 0], X[:, 1], c=y, cmap=cmap, edgecolor='none', s=40)
plt.show()

# Split data into train and test sets
Xtrain, Xtest, Ytrain, Ytest = train_test_split(X, y, test_size=0.2, random_state=1234)

clf = KNN(k=5)
clf.fit(Xtrain, Ytrain)
predictions = clf.predict(Xtest)
acc = np.sum(predictions == Ytest) / len(Ytest)
print("Accuracy of user defined KNN classifier for k=5 neighbors on iris dataset is: %0.3f" % acc)
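As an optional sanity check (not required by the experiment), the same split can also be fed to scikit-learn's built-in KNeighborsClassifier and the two accuracies compared; a minimal sketch:

from sklearn.neighbors import KNeighborsClassifier

# Same k and the same train/test split as above
sk_clf = KNeighborsClassifier(n_neighbors=5)
sk_clf.fit(Xtrain, Ytrain)
sk_acc = sk_clf.score(Xtest, Ytest)
print("Accuracy of sklearn KNeighborsClassifier for k=5 neighbors: %0.3f" % sk_acc)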
OUTPUT:
Plot showing sepal length, sepal width data of iris data
Draw the output graph.
