K-Nearest Neighbors Algorithm Explained

The k-nearest neighbor algorithm is a simple machine learning algorithm that classifies new data points based on the majority class of the k closest training examples in the feature space. It works by finding the k training examples nearest to the new data point, and assigning the most common class among those k examples to the new data point. The value of k is a hyperparameter that is chosen prior to training, and impacts the algorithm's sensitivity to noise and ability to generalize.

Uploaded by

Radu Cimpeanu

k-nearest neighbor algorithm

In pattern recognition, the k-nearest neighbors algorithm (k-NN) is a method for classifying objects based on the closest training examples in the feature space. k-NN is a type of instance-based learning, or lazy learning, where the function is only approximated locally and all computation is deferred until classification. The k-nearest neighbor algorithm is among the simplest of all machine learning algorithms: an object is classified by a majority vote of its neighbors, with the object being assigned to the class most common among its k nearest neighbors (k is a positive integer, typically small). If k = 1, then the object is simply assigned to the class of its nearest neighbor.

The same method can be used for regression, by assigning the property value for the object to be the average of the values of its k nearest neighbors. It can be useful to weight the contributions of the neighbors, so that the nearer neighbors contribute more to the average than the more distant ones. (A common weighting scheme is to give each neighbor a weight of 1/d, where d is the distance to the neighbor. This scheme is a generalization of linear interpolation.)

The neighbors are taken from a set of objects for which the correct classification (or, in the case of regression, the value of the property) is known. This can be thought of as the training set for the algorithm, though no explicit training step is required. The k-nearest neighbor algorithm is sensitive to the local structure of the data.

Nearest neighbor rules in effect compute the decision boundary in an implicit manner. It is also possible to compute the decision boundary itself explicitly, and to do so in an efficient manner so that the computational complexity is a function of the boundary complexity.[1]

Algorithm
The training examples are vectors in a multidimensional feature space, each with a class label. The training phase of the algorithm consists only of storing the feature vectors and class labels of the training samples. In the classification phase, k is a user-defined constant, and an unlabelled vector (a query or test point) is classified by assigning the label which is most frequent among the k training samples nearest to that query point. Usually Euclidean distance is used as the distance metric; however, this is only applicable to continuous variables. In cases such as text classification, another metric such as the overlap metric (or Hamming distance) can be used. Often, the classification accuracy of k-NN can be improved significantly if the distance metric is learned with specialized algorithms such as Large Margin Nearest Neighbor or Neighbourhood Components Analysis.
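
The classification phase described above can be sketched in a few lines of Python. This is a minimal brute-force illustration, not a reference implementation: the function names `euclidean` and `knn_classify` and the toy data are assumptions made for the example, and ties between classes are broken arbitrarily (by whichever tied label is encountered first).

```python
import math
from collections import Counter

def euclidean(a, b):
    """Euclidean distance between two equal-length numeric vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def knn_classify(train, query, k):
    """Return the most frequent label among the k training samples nearest to query.

    train is a list of (feature_vector, label) pairs; "training" is just storing it.
    """
    # Sort all stored samples by distance to the query and keep the k closest.
    neighbors = sorted(train, key=lambda item: euclidean(item[0], query))[:k]
    # Majority vote over the neighbors' labels.
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]

# Hypothetical toy data: two well-separated classes in the plane.
train = [
    ((1.0, 1.0), "square"),
    ((1.5, 2.0), "square"),
    ((2.0, 1.0), "square"),
    ((5.0, 5.0), "triangle"),
    ((6.0, 5.5), "triangle"),
    ((5.5, 6.5), "triangle"),
]

print(knn_classify(train, (1.2, 1.4), k=3))
```

Every query scans the whole training set, which is the cost of deferring all computation until classification.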

A drawback of the basic "majority voting" classification is that classes with more frequent examples tend to dominate the prediction of the new vector, because their large number makes them more likely to appear among the k nearest neighbors. One way to overcome this problem is to weight the classification, taking into account the distance from the test point to each of its k nearest neighbors. k-NN is a special case of a variable-bandwidth, kernel density "balloon" estimator with a uniform kernel.[2][3]
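
As an illustration of distance weighting, the sketch below gives each of the k nearest neighbors a vote of 1/d instead of a vote of 1. The function name `weighted_knn_classify`, the small constant `eps` (added only to avoid division by zero when a neighbor coincides with the query), and the data are assumptions for the example; other weighting schemes are equally possible.

```python
import math
from collections import defaultdict

def weighted_knn_classify(train, query, k, eps=1e-12):
    """Classify query by summing 1/d votes over its k nearest neighbors.

    eps is an assumed safeguard against division by zero, not part of the method.
    """
    # Pair every training sample with its distance to the query, nearest first.
    by_distance = sorted((math.dist(vec, query), label) for vec, label in train)
    scores = defaultdict(float)
    for d, label in by_distance[:k]:
        scores[label] += 1.0 / (d + eps)
    # The class with the largest total weight wins.
    return max(scores, key=scores.get)

# Hypothetical data: class "A" is more frequent, but the single "B" point
# lies much closer to the query than any "A" point.
train = [
    ((0.1, 0.0), "B"),
    ((1.0, 0.0), "A"),
    ((0.0, 1.0), "A"),
    ((3.0, 3.0), "A"),
]

print(weighted_knn_classify(train, (0.0, 0.0), k=3))
```

With k = 3, an unweighted vote on this data would pick "A" (two neighbors against one), while the weighted vote favors the much closer "B" point.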

Figure: Example of k-NN classification. The test sample (green circle) should be classified either to the first class of blue squares or to the second class of red triangles. If k = 3 it is classified to the second class because there are 2 triangles and only 1 square inside the inner circle. If k = 5 it is classified to the first class (3 squares vs. 2 triangles inside the outer circle).


Parameter selection
The best choice of k depends upon the data; generally, larger values of k reduce the effect of noise on the classification, but make boundaries between classes less distinct. A good k can be selected by various heuristic techniques, for example, cross-validation. The special case where the class is predicted to be the class of the closest training sample (i.e. when k = 1) is called the nearest neighbor algorithm.

The accuracy of the k-NN algorithm can be severely degraded by the presence of noisy or irrelevant features, or if the feature scales are not consistent with their importance. Much research effort has been put into selecting or scaling features to improve classification. A particularly popular approach is the use of evolutionary algorithms to optimize feature scaling.[4] Another popular approach is to scale features by the mutual information of the training data with the training classes.

In binary (two class) classification problems, it is helpful to choose k to be an odd number as this avoids tied votes. One popular way of choosing the empirically optimal k in this setting is via the bootstrap method.[5]
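
Selecting k by cross-validation, as mentioned above, can be sketched with leave-one-out validation: each sample is classified using all the others, and the k with the highest accuracy is kept. The helper names (`knn_classify`, `loo_accuracy`, `choose_k`), the candidate values, and the toy data with one deliberately mislabelled point are all assumptions for illustration; this is one simple variant, not the bootstrap procedure of [5].

```python
import math
from collections import Counter

def knn_classify(train, query, k):
    """Plain majority-vote k-NN over a list of (vector, label) pairs."""
    neighbors = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    return Counter(label for _, label in neighbors).most_common(1)[0][0]

def loo_accuracy(data, k):
    """Leave-one-out accuracy: classify each sample using all other samples."""
    correct = 0
    for i, (vec, label) in enumerate(data):
        rest = data[:i] + data[i + 1:]
        if knn_classify(rest, vec, k) == label:
            correct += 1
    return correct / len(data)

def choose_k(data, candidates):
    """Return the candidate k with the best leave-one-out accuracy.

    On ties, max() keeps the first candidate listed.
    """
    return max(candidates, key=lambda k: loo_accuracy(data, k))

# Hypothetical data: two clusters plus one noisy "b" point inside the "a" cluster.
data = [
    ((0.0, 0.0), "a"), ((0.0, 1.0), "a"), ((1.0, 0.0), "a"), ((1.0, 1.0), "a"),
    ((5.0, 5.0), "b"), ((5.0, 6.0), "b"), ((6.0, 5.0), "b"), ((6.0, 6.0), "b"),
    ((0.5, 0.5), "b"),  # noisy point
]

for k in (1, 3, 5):
    print(k, round(loo_accuracy(data, k), 3))
print("chosen k:", choose_k(data, [1, 3, 5]))
```

On this data the noisy point misleads 1-NN for every "a" sample, while k = 3 outvotes it, which matches the remark that larger k reduces the effect of noise.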

Properties
The naive version of the algorithm is easy to implement by computing the distances from the test sample to all stored vectors, but it is computationally intensive, especially when the size of the training set grows. Many nearest neighbor search algorithms have been proposed over the years; these generally seek to reduce the number of distance evaluations actually performed. Using an appropriate nearest neighbor search algorithm makes k-NN computationally tractable even for large data sets. The nearest neighbor algorithm has some strong consistency results. As the amount of data approaches infinity, the algorithm is guaranteed to yield an error rate no worse than twice the Bayes error rate (the minimum achievable error rate given the distribution of the data).[6] k-nearest neighbor is guaranteed to approach the Bayes error rate, for some value of k (where k increases as a function of the number of data points). Various improvements to k-nearest neighbor methods are possible by using proximity graphs.[7]
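
The consistency results above can be stated more precisely. The following is a sketch of the standard form of these statements, with notation introduced here as an assumption: R* is the Bayes error rate, R_NN is the asymptotic error rate of the 1-nearest-neighbor rule, M is the number of classes, and n is the number of training points. The first line is the bound from [6]; the second gives the usual conditions under which k-NN approaches the Bayes error rate.

```latex
% Asymptotic error of the nearest neighbor rule (n -> infinity), M classes:
R^{*} \;\le\; R_{\mathrm{NN}} \;\le\; R^{*}\left(2 - \frac{M}{M-1}\,R^{*}\right) \;\le\; 2R^{*}

% k-NN approaches the Bayes error rate when k grows with n, but more slowly than n:
k \to \infty \quad \text{and} \quad \frac{k}{n} \to 0 \quad \text{as } n \to \infty
```

For two classes (M = 2) the middle expression reduces to 2R*(1 - R*), which is where the "no worse than twice the Bayes error rate" statement comes from.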

For estimating continuous variables

The k-NN algorithm can also be adapted for use in estimating continuous variables. One such implementation uses an inverse distance weighted average of the k-nearest multivariate neighbors. This algorithm functions as follows:
1. Compute the Euclidean or Mahalanobis distance from the target plot to those that were sampled.
2. Order the samples taking into account the calculated distances.
3. Choose a heuristically optimal k based on the RMSE obtained by cross-validation.
4. Calculate an inverse distance weighted average with the k-nearest multivariate neighbors.
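
The distance, ordering, and averaging steps can be sketched as follows. This is a minimal illustration under stated assumptions: it uses Euclidean distance only, takes k as given rather than choosing it by cross-validated RMSE, and returns the stored value directly when the target coincides with a sample (a choice made here to avoid dividing by zero). The name `knn_regress_idw` and the data are hypothetical.

```python
import math

def knn_regress_idw(samples, target, k):
    """Estimate a continuous value at target as the inverse-distance-weighted
    average of the values of its k nearest samples.

    samples is a list of (feature_vector, value) pairs.
    """
    # Compute distances to all samples and order them, nearest first.
    by_distance = sorted((math.dist(vec, target), value) for vec, value in samples)
    nearest = by_distance[:k]
    # Assumed convention: an exact match returns its own value.
    for d, value in nearest:
        if d == 0.0:
            return value
    # Inverse distance weighted average of the k nearest values.
    weights = [1.0 / d for d, _ in nearest]
    weighted_sum = sum(w * value for w, (_, value) in zip(weights, nearest))
    return weighted_sum / sum(weights)

# Hypothetical one-dimensional samples: (position,) -> measured value.
samples = [((0.0,), 0.0), ((2.0,), 10.0), ((10.0,), 100.0)]

print(knn_regress_idw(samples, (0.5,), k=2))
```

With k = 2 in one dimension and a target lying between its two nearest samples, the 1/d weighting reproduces linear interpolation between them, which is the sense in which the scheme generalizes linear interpolation.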

The optimal k for most datasets is 10 or more [8], which produces much better results than 1-NN. Using a weighted k-NN also significantly improves the results: the class (or, in regression problems, the value) of each of the k nearest points is multiplied by a weight proportional to the inverse of the distance between that point and the point for which the class is to be predicted.


References
[1] Bremner D, Demaine E, Erickson J, Iacono J, Langerman S, Morin P, Toussaint G (2005). "Output-sensitive algorithms for computing nearest-neighbor decision boundaries". Discrete and Computational Geometry 33 (4): 593-604. doi:10.1007/s00454-004-1152-0.
[2] D. G. Terrell; D. W. Scott (1992). "Variable kernel density estimation". Annals of Statistics 20: 1236-1265. doi:10.1214/aos/1176348768.
[3] Mills, Peter. "Efficient statistical classification of satellite measurements". International Journal of Remote Sensing.
[4] Nigsch, F.; A. Bender, B. van Buuren, J. Tissen, E. Nigsch & J.B.O. Mitchell (2006). "Melting Point Prediction Employing k-nearest Neighbor Algorithms and Genetic Parameter Optimization". Journal of Chemical Information and Modeling 46 (6): 2412-2422. doi:10.1021/ci060149f. PMID 17125183.
[5] P. Hall; B. U. Park; R. J. Samworth (2008). "Choice of neighbor order in nearest-neighbor classification". Annals of Statistics 36: 2135-2152. doi:10.1214/07-AOS537.
[6] Cover TM, Hart PE (1967). "Nearest neighbor pattern classification". IEEE Transactions on Information Theory 13 (1): 21-27. doi:10.1109/TIT.1967.1053964.
[7] Toussaint GT (April 2005). "Geometric proximity graphs for improving nearest neighbor methods in instance-based learning and data mining". International Journal of Computational Geometry and Applications 15 (2): 101-150. doi:10.1142/S0218195905001622.
[8] Franco-Lopez et al., 2001

Further reading
When Is "Nearest Neighbor" Meaningful? (http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.31.1422)
Belur V. Dasarathy, ed. (1991). Nearest Neighbor (NN) Norms: NN Pattern Classification Techniques. ISBN 0-8186-8930-7.
Shakhnarovich, Darrell, and Indyk, eds. (2005). Nearest-Neighbor Methods in Learning and Vision. MIT Press. ISBN 0-262-19547-X.
Mäkelä H, Pekkarinen A (2004-07-26). "Estimation of forest stand volumes by Landsat TM imagery and stand-level field-inventory data". Forest Ecology and Management 196 (2-3): 245-255. doi:10.1016/j.foreco.2004.02.049.
Franco-Lopez H, Ek AR, Bauer ME (September 2001). "Estimation and mapping of forest stand density, volume, and cover type using the k-nearest neighbors method". Remote Sensing of Environment 77 (3): 251-274. doi:10.1016/S0034-4257(01)00209-7.
V. Garcia, E. Debreuve and M. Barlaud. "Fast k nearest neighbor search using GPU". In Proceedings of the CVPR Workshop on Computer Vision on GPU, Anchorage, Alaska, USA, June 2008.

External links
k-nearest neighbor algorithm in C++ and Boost (http://codingplayground.blogspot.com/2010/01/nearest-neighbour-on-kd-tree-in-c-and.html) by Antonio Gulli
k-nearest neighbor algorithm in Java (Applet) (http://www.leonardonascimento.com/en/knn.html) (includes source code) by Leonardo Nascimento Ferreira
k-nearest neighbor algorithm in Visual Basic and Java (http://paul.luminos.nl/documents/show_document.php?d=197) (includes executable and source code)
k-nearest neighbor tutorial using MS Excel (http://people.revoledu.com/kardi/tutorial/KNN/index.html)
STANN: A simple threaded approximate nearest neighbor C++ library that can compute Euclidean k-nearest neighbor graphs in parallel (http://compgeom.com/~stann)
TiMBL: a fast C++ implementation of k-NN with feature and distance weighting, specifically suited to symbolic feature spaces (http://ilk.uvt.nl/timbl/)
libAGF: A library for multivariate, adaptive kernel estimation, including KNN and Gaussian kernels (http://libagf.sourceforge.net)
OBSearch: A library for similarity search in metric spaces created during Google Summer of Code 2007 (http://obsearch.net)
ANN: A Library for Approximate Nearest Neighbor Searching (http://www.cs.umd.edu/~mount/ANN/)

Article Sources and Contributors

k-nearest neighbor algorithm Source: http://en.wikipedia.org/w/index.php?oldid=413406465 Contributors: Adam McMaster, Algomaster, Altenmann, AnAj, Atreys, B4hand, BD2412, Barro, BlueNovember, Bracchesimo, Caesura, Charles Matthews, Cibi3d, CommodiCast, DARTH SIDIOUS 2, DHN, Delmonde, Dustinsmith, Emslo69, Fly by Night, Garion96, Geomwiz, GiovanniS, Gnack, GodfriedToussaint, Hongooi, Hu12, ITurtle, Janto, Jbom1, Joeoettinger, Joerite, Kozuch, Lars Washington, Leonid Volnitsky, MER-C, Mach7, Manyu aditya, McSly, Mcld, Mdd4696, Melcombe, Memming, Michael Hardy, MisterHand, Miym, Mlguy, Mpx, MrOllie, Nikolaosvasiloglou, Olaf, PM800, Pakaran, Peteymills, Pgan002, Ploptimist, Pradtke, Protonk, RJASE1, Rama, Rickyphyllis, SQFreak, Slambo, Slightsmile, Stimpy, Stoph, Svante1, Tappoz, The Anome, Thorwald, Topbanana, User A1, X7q, Yc319, 118 anonymous edits

Image Sources, Licenses and Contributors

Image:KnnClassification.svg Source: http://en.wikipedia.org/w/index.php?title=File:KnnClassification.svg License: Creative Commons Attribution-Sharealike 2.5 Contributors: User:AnAj

License
Creative Commons Attribution-Share Alike 3.0 Unported http://creativecommons.org/licenses/by-sa/3.0/
