The k-nearest neighbours (kNN) algorithm is a simple method that is easy to implement and use, and it finds applications in pattern recognition and regression analysis. It falls under the supervised machine learning category: it classifies a new observation based on existing labelled target data. It relies on labelled training data to build a decision rule, and then processes unlabelled query data based on that rule.
It solves classification problems. A classification problem always has discrete outcomes: an example belongs entirely to one class or another, with nothing in between, so arithmetic on the class labels is not meaningful. It also solves regression problems, which involve predicting a real value from independent variables; here each row of the data is called an observation or example, and each column is called a dimension, feature, or independent variable. The main idea behind k-nearest neighbours is that similar things are near each other; as the saying goes, "Birds of a feather flock together." From the figure we can clearly see that data points of the same type cluster together. Since the idea rests on similarity, that is, on the distance between points, we will need to calculate distances between points in what follows. The algorithm proceeds as follows:
1. Load the data set
2. Initialize k to the chosen number of neighbours
3. For each observation in the training data
a. Calculate the distance between the query example and the current observation
b. Add the distance and the observation's index to an ordered collection
c. Sort the collection from smallest to largest by distance
d. Pick the first k entries from the sorted collection
e. Look up the labels of the picked entries
f. If regression, return the mean of the labels
g. If classification, return the mode of the labels
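The steps above can be sketched directly in Python. This is a minimal sketch, not a production implementation; the function name `knn_predict` and the toy data are illustrative choices, not from the original text.

```python
import math
from collections import Counter

def knn_predict(train_X, train_y, query, k, mode="classification"):
    """Predict the label of `query` from its k nearest training points."""
    # Steps 3a-3b: distance from the query to every training observation,
    # stored alongside the index of that observation.
    distances = []
    for i, x in enumerate(train_X):
        d = math.dist(x, query)  # Euclidean distance
        distances.append((d, i))

    # Steps 3c-3d: sort by distance (smallest first) and keep the k closest.
    distances.sort(key=lambda pair: pair[0])
    k_nearest = distances[:k]

    # Step 3e: look up the label of each of the k neighbours.
    labels = [train_y[i] for _, i in k_nearest]

    # Steps 3f-3g: mean for regression, mode (majority vote) for classification.
    if mode == "regression":
        return sum(labels) / len(labels)
    return Counter(labels).most_common(1)[0][0]

# Toy data: two well-separated clusters in two dimensions.
X = [(1, 1), (1, 2), (2, 1), (8, 8), (8, 9), (9, 8)]
y = ["A", "A", "A", "B", "B", "B"]
print(knn_predict(X, y, (1.5, 1.5), k=3))  # → A
```

Note that the sort key is the distance, not the index: sorting by index would return arbitrary neighbours rather than the nearest ones.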
Now the most important thing is to choose the value of k so that the classified groups are well separated. If it is not possible to separate the classes using two dimensions, the data can be mapped into a higher-dimensional feature space so that new data sets can be classified more easily. A common rule of thumb can be used to find a reasonable k value for the algorithm.
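One widely quoted rule of thumb is to start with k near the square root of the number of training observations, preferring an odd k in binary classification so that majority votes cannot tie. A small sketch (the function name `suggest_k` is illustrative, and this heuristic is a starting point, not a guarantee; cross-validation is the usual way to tune k):

```python
import math

def suggest_k(n_observations):
    """Suggest a starting k: round(sqrt(n)), nudged to an odd number."""
    k = round(math.sqrt(n_observations))
    if k % 2 == 0:
        k += 1  # avoid ties in two-class majority voting
    return max(k, 1)

print(suggest_k(100))  # → 11
```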
A kNN classifier is usually characterised in two ways:
● Non-parametric
● Instance-based
Non-parametric means there is no prior explicit assumption about the form of the variables, which reduces the risk of mismodelling the classification. Instance-based means the machine memorizes the training data sets and keeps them as its knowledge, acting on new data directly from the stored examples.
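The instance-based character can be seen in a short sketch: "training" only memorizes the data, and all the work happens at prediction time (this is sometimes called lazy learning). The class name `InstanceBasedKNN` and the toy data are illustrative.

```python
class InstanceBasedKNN:
    """Illustrative kNN: fit() builds no model, it just stores the data."""

    def __init__(self, k=3):
        self.k = k

    def fit(self, X, y):
        # No parameters are estimated here: the training set itself
        # is the "model" that will be consulted at prediction time.
        self.X, self.y = list(X), list(y)
        return self

    def predict(self, query):
        # Distances are computed only now, against the stored examples.
        order = sorted(range(len(self.X)),
                       key=lambda i: sum((a - b) ** 2
                                         for a, b in zip(self.X[i], query)))
        votes = [self.y[i] for i in order[:self.k]]
        return max(set(votes), key=votes.count)  # majority vote

model = InstanceBasedKNN(k=3).fit(
    [(0, 0), (0, 1), (5, 5), (6, 5)],
    ["left", "left", "right", "right"])
print(model.predict((5, 6)))  # → right
```

The trade-off of this memorization is that prediction cost grows with the size of the stored training set, since every query is compared against all remembered examples.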
