KNN CLASSIFICATION
K Nearest Neighbour (KNN) is a simple algorithm that classifies a new data point based on its neighbours. A
group of neighbours is selected (this is the k value), and the data point is assigned to the class to
which the majority of those neighbours belong.
K represents the number of nearest neighbours considered. Choosing the k value properly gives better
accuracy. Choosing the right value for k is called ‘parameter tuning’.
Example: We have two classes of data points: squares and triangles. When a new data point arrives,
how do we classify it: does it belong to the square class or the triangle class?
When k=3, we consider the 3 nearest neighbours of the new point. Suppose there are 2 squares and 1
triangle; then the new point belongs to the square class.
When k=7, suppose there are 3 squares and 4 triangles among the nearest neighbours. Then the new
point belongs to the triangle class.
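The majority vote described above can be sketched in a few lines of Python. The neighbour labels below are made up to mirror the k=3 and k=7 cases in the example:

```python
from collections import Counter

# class labels of the k nearest neighbours, as in the example above
neighbours_k3 = ["square", "square", "triangle"]
neighbours_k7 = ["square", "square", "square",
                 "triangle", "triangle", "triangle", "triangle"]

def majority_class(neighbour_labels):
    # the new point takes the class held by most of its neighbours
    return Counter(neighbour_labels).most_common(1)[0][0]

print(majority_class(neighbours_k3))  # square
print(majority_class(neighbours_k7))  # triangle
```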
We use the KNN algorithm when the data is labelled and the dataset is small.
The distance to each neighbour is measured using either the Euclidean distance formula or the
Minkowski distance formula.
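As a small illustration of the two distance formulas, here is a sketch of the Minkowski distance, which reduces to the Euclidean distance when p=2 (and the Manhattan distance when p=1):

```python
def minkowski_distance(a, b, p):
    """Minkowski distance between points a and b.
    p=2 gives the Euclidean distance; p=1 gives the Manhattan distance."""
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1 / p)

a = (1.0, 2.0)
b = (4.0, 6.0)
print(minkowski_distance(a, b, 2))  # Euclidean: 5.0
print(minkowski_distance(a, b, 1))  # Manhattan: 7.0
```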
Nageswarao Datatechs
Problem: Given data of breast cancer patients. Find out whether the cancer is benign or malignant.
Dataset: breast-cancer-wisconsin.data
import math
# rule of thumb: choose k close to the square root of the test-set size
n = math.sqrt(len(y_test))
n # 11.832159566199232
# find accuracy of the fitted model on the test set
accuracy = model.score(x_test, y_test)
accuracy # 0.9857142857142858
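The snippet above assumes a model and a train/test split already exist. A minimal end-to-end sketch of that workflow is below; since breast-cancer-wisconsin.data is not included here, scikit-learn's built-in breast cancer dataset is used as a stand-in, so the exact numbers will differ from those above:

```python
import math
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# stand-in for breast-cancer-wisconsin.data
x, y = load_breast_cancer(return_X_y=True)
x_train, x_test, y_train, y_test = train_test_split(
    x, y, test_size=0.25, random_state=0)

# rule of thumb: k near sqrt(test-set size); make it odd to avoid tied votes
k = int(math.sqrt(len(y_test)))
if k % 2 == 0:
    k += 1

model = KNeighborsClassifier(n_neighbors=k)
model.fit(x_train, y_train)
accuracy = model.score(x_test, y_test)
print(k, accuracy)
```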
Task on KNN: Use a KNN model on the Indian diabetes patients database and predict whether a new
patient is diabetic (1) or not (0).
Dataset: diabetes.csv
Note: The following columns should not contain zeros: Glucose, BloodPressure, SkinThickness,
Insulin and BMI. Hence replace any 0s in these columns with the respective column's mean value.
Only then can you use this dataset.
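The zero-replacement step in the note can be sketched with pandas. A tiny made-up DataFrame stands in for diabetes.csv (only three of the five columns are shown), and the mean is computed over the non-zero values, which is one common way to read "their respective mean values":

```python
import pandas as pd

# made-up rows standing in for diabetes.csv
df = pd.DataFrame({
    "Glucose":       [148, 0, 183, 89],
    "BloodPressure": [72, 66, 0, 66],
    "BMI":           [33.6, 26.6, 23.3, 0.0],
})

cols = ["Glucose", "BloodPressure", "BMI"]
for col in cols:
    # mean of the non-zero entries, then substitute it for every 0
    mean_without_zeros = df.loc[df[col] != 0, col].mean()
    df[col] = df[col].replace(0, mean_without_zeros)

print(df)
```

After this loop the columns contain no zeros, and the dataset is ready for the KNN model.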