5/10/22, 2:58 PM Lab7.ipynb - Colaboratory

Implementing K Nearest Neighbour for a dataset.

Importing Libraries and Dataset: -

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

from google.colab import files
uploaded = files.upload()


Iris.csv(text/csv) - 5107 bytes, last modified: 3/17/2022 - 100% done
Saving Iris.csv to Iris.csv

Creating Data frame: -

df=pd.read_csv('Iris.csv')

Printing first 10 values: -

df.head(10)

   Id  SepalLengthCm  SepalWidthCm  PetalLengthCm  PetalWidthCm      Species
0   1            5.1           3.5            1.4           0.2  Iris-setosa
1   2            4.9           3.0            1.4           0.2  Iris-setosa
2   3            4.7           3.2            1.3           0.2  Iris-setosa
3   4            4.6           3.1            1.5           0.2  Iris-setosa
4   5            5.0           3.6            1.4           0.2  Iris-setosa
5   6            5.4           3.9            1.7           0.4  Iris-setosa
6   7            4.6           3.4            1.4           0.3  Iris-setosa
7   8            5.0           3.4            1.5           0.2  Iris-setosa
8   9            4.4           2.9            1.4           0.2  Iris-setosa
9  10            4.9           3.1            1.5           0.1  Iris-setosa

Printing all the information about the dataset: -

df.info()

https://colab.research.google.com/drive/17EdAX0gZZGyDlojce0QA0Dn3jdLQt0Fa?authuser=1#scrollTo=IDas4r15mL2H&printMode=true 1/5
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 150 entries, 0 to 149
Data columns (total 6 columns):
 #   Column         Non-Null Count  Dtype
---  ------         --------------  -----
 0   Id             150 non-null    int64
 1   SepalLengthCm  150 non-null    float64
 2   SepalWidthCm   150 non-null    float64
 3   PetalLengthCm  150 non-null    float64
 4   PetalWidthCm   150 non-null    float64
 5   Species        150 non-null    object
dtypes: float64(4), int64(1), object(1)
memory usage: 7.2+ KB

Checking whether any null values exist in the dataset: -

df[df.isnull().any(axis=1)].head()

Id SepalLengthCm SepalWidthCm PetalLengthCm PetalWidthCm Species

Creating independent variable: -

X=df.iloc[:,[1,2,3,4]].values

Creating dependent variable: -

Y=df.iloc[:,5]

Splitting the dataset: -

from sklearn.model_selection import train_test_split 

train_X,test_X,train_Y,test_Y = train_test_split(X, Y, test_size=0.3, random_state=0)
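
For intuition, the split above can be mimicked with the standard library alone. This is a minimal sketch, not scikit-learn's implementation: `simple_split` is a hypothetical helper, and its shuffle order will not match `train_test_split(..., random_state=0)`. It only illustrates that a 0.3 test fraction of 150 rows leaves 105 rows for training and 45 for testing.

```python
import math
import random

def simple_split(rows, test_size=0.3, seed=0):
    """Shuffle row indices and hold out a fraction of them for testing.

    Hypothetical helper for illustration; the shuffle differs from sklearn's.
    """
    indices = list(range(len(rows)))
    random.Random(seed).shuffle(indices)        # reproducible shuffle
    n_test = math.ceil(test_size * len(rows))   # size of the held-out part
    test_idx, train_idx = indices[:n_test], indices[n_test:]
    return [rows[i] for i in train_idx], [rows[i] for i in test_idx]

rows = list(range(150))                         # stand-in for the 150 Iris rows
train_rows, test_rows = simple_split(rows)
print(len(train_rows), len(test_rows))
```

Every row ends up in exactly one of the two parts, so the test set stays unseen during training.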

Standardizing the dataset: -

from sklearn.preprocessing import StandardScaler
sc = StandardScaler()
train_X = sc.fit_transform(train_X)
test_X = sc.transform(test_X)
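
Calling `fit_transform` on the training data but only `transform` on the test data means the mean and standard deviation come from the training set alone. A minimal sketch of the same idea for a single feature, using only the standard library; the numbers are illustrative and not taken from Iris.csv, `fit_scaler` and `transform` are hypothetical helpers, and the population standard deviation is used to match what `StandardScaler` computes.

```python
import statistics

def fit_scaler(values):
    """Return the mean and population standard deviation of one feature."""
    return statistics.fmean(values), statistics.pstdev(values)

def transform(values, mean, std):
    """Standardize values with previously fitted statistics."""
    return [(v - mean) / std for v in values]

train_feature = [4.0, 5.0, 6.0, 7.0]   # illustrative values
test_feature = [5.5, 8.0]              # illustrative values

mean, std = fit_scaler(train_feature)              # statistics from training data only
train_scaled = transform(train_feature, mean, std)
test_scaled = transform(test_feature, mean, std)   # reuses the training statistics

print(train_scaled)
print(test_scaled)
```

The scaled training feature has mean 0 and standard deviation 1; the scaled test feature generally does not, which is expected.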

Finding the optimised value of K: -

import math
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score

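# Try every K from 1 up to (but not including) sqrt(n)
n=len(df.index)
li=list()   # accuracy obtained for each K
li2=list()  # the K values tried
for i in range(1,int(math.sqrt(n))):
  kclass = KNeighborsClassifier(n_neighbors = i, metric = 'minkowski', p = 2)
  kclass.fit(train_X, train_Y)
  y_pred = kclass.predict(test_X)
  ac = accuracy_score(test_Y,y_pred)
  li.append(ac)
  li2.append(i)

# Keep the first K with the highest accuracy
# (best_acc replaces a variable named max, which shadowed the built-in)
best_acc = li[0]
index = 0
for i in range(1,len(li)):
    if li[i] > best_acc:
        best_acc = li[i]
        index = i

k=li2[index]
print("The value of K is = ",k)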

plt.plot(li2,li)
plt.title("Graph showing the Accuracy with K",size=15,fontweight="bold")
plt.xlabel("Value of K",size=12,fontweight="bold")
plt.ylabel("Accuracy",size=12,fontweight="bold")
plt.show()

The value of K is = 3
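
The manual search for the highest accuracy can also be written with Python's built-in `max`. A small sketch; the accuracy values below are made up for illustration and are not the notebook's results, and `k_values`, `accuracies`, `best_index` and `best_k` are names chosen for this example. Because `max` returns the first maximal element, ties resolve to the smallest K, just as the strict `>` comparison does in the loop.

```python
k_values = [1, 2, 3, 4, 5]                    # illustrative K candidates
accuracies = [0.93, 0.95, 0.98, 0.98, 0.96]   # made-up accuracies, one per K

# Index of the first highest accuracy
best_index = max(range(len(accuracies)), key=accuracies.__getitem__)
best_k = k_values[best_index]
print("The value of K is = ", best_k)
```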

Importing the KNN classifier for implementing the model: -

from sklearn.neighbors import KNeighborsClassifier

kclass = KNeighborsClassifier(n_neighbors = k, metric = 'minkowski', p = 2)
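
With `metric = 'minkowski'` and `p = 2`, the distance between two samples is the ordinary Euclidean distance, and a sample is assigned the most common class among its K nearest training points. A from-scratch sketch of that rule follows; `knn_predict` is a hypothetical helper, the points and labels are illustrative, and tie-breaking may differ from scikit-learn's.

```python
import math
from collections import Counter

def knn_predict(train_points, train_labels, query, k=3):
    """Classify `query` by majority vote among its k nearest training points."""
    # Euclidean distance (Minkowski with p = 2) to every training point
    distances = [
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    ]
    distances.sort(key=lambda pair: pair[0])            # nearest first
    nearest_labels = [label for _, label in distances[:k]]
    return Counter(nearest_labels).most_common(1)[0][0]

# Two well-separated illustrative clusters
train_points = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (5.0, 5.0), (5.2, 4.9), (4.8, 5.1)]
train_labels = ["A", "A", "A", "B", "B", "B"]

print(knn_predict(train_points, train_labels, (1.1, 1.0)))
print(knn_predict(train_points, train_labels, (5.1, 5.0)))
```

Since every prediction compares the query against all training points, KNN has no real training phase; `fit` mainly stores the data.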

Training the model: -

kclass.fit(train_X, train_Y)

KNeighborsClassifier(n_neighbors=3)

Predicting the values of Y (y_pred): -

y_pred = kclass.predict(test_X)

The predicted values of Y are: -

y_pred

array(['Iris-virginica', 'Iris-versicolor', 'Iris-setosa',
       'Iris-virginica', 'Iris-setosa', 'Iris-virginica', 'Iris-setosa',
       'Iris-versicolor', 'Iris-versicolor', 'Iris-versicolor',
       'Iris-virginica', 'Iris-versicolor', 'Iris-versicolor',
       'Iris-versicolor', 'Iris-versicolor', 'Iris-setosa',
       'Iris-versicolor', 'Iris-versicolor', 'Iris-setosa', 'Iris-setosa',
       'Iris-virginica', 'Iris-versicolor', 'Iris-setosa', 'Iris-setosa',
       'Iris-virginica', 'Iris-setosa', 'Iris-setosa', 'Iris-versicolor',
       'Iris-versicolor', 'Iris-setosa', 'Iris-virginica',
       'Iris-versicolor', 'Iris-setosa', 'Iris-virginica',
       'Iris-virginica', 'Iris-versicolor', 'Iris-setosa',
       'Iris-virginica', 'Iris-versicolor', 'Iris-versicolor',
       'Iris-virginica', 'Iris-setosa', 'Iris-virginica', 'Iris-setosa',
       'Iris-setosa'], dtype=object)

Performance Measure: -

from sklearn.metrics import confusion_matrix,accuracy_score

cm = confusion_matrix(test_Y, y_pred)
print("Confusion Matrix: -\n",cm)

ac = accuracy_score(test_Y,y_pred)
print("\nAccuracy of the model(in %) is = ",ac*100)

Confusion Matrix: -
 [[16  0  0]
 [ 0 17  1]
 [ 0  0 11]]

Accuracy of the model(in %) is =  97.77777777777777
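
The reported accuracy can be recovered directly from the confusion matrix: correct predictions lie on the diagonal, so accuracy is the diagonal sum divided by the total number of test samples. A short sketch using the matrix printed above; the per-class recall is an addition for illustration and was not part of the original notebook.

```python
# Confusion matrix from the output above (rows: true class, columns: predicted class)
cm = [
    [16, 0, 0],
    [0, 17, 1],
    [0, 0, 11],
]

correct = sum(cm[i][i] for i in range(len(cm)))   # diagonal entries
total = sum(sum(row) for row in cm)               # all test samples
accuracy = correct / total

# Recall per class: correct predictions divided by the true count in that row
recall = [cm[i][i] / sum(cm[i]) for i in range(len(cm))]

print(correct, total)
print(accuracy * 100)
print(recall)
```

Here 44 of the 45 test samples are classified correctly; the single error is a sample from the second class predicted as the third.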
