The document discusses the K-nearest neighbors (KNN) algorithm, a simple machine learning algorithm used for classification. KNN classifies new data based on the classification of its k nearest neighbors, where k is a parameter that must be determined. Choosing k is important, as smaller values may be noisy while larger values are more computationally expensive. The document provides examples to illustrate how KNN works by calculating distances between data points to identify nearest neighbors and determine their class.
3/11/2020 Introduction to Data Mining, 2nd Edition 1
What is KNN?
● K Nearest Neighbour is a simple algorithm that stores all the available cases and classifies new data or cases based on a similarity measure.
● It is mostly used to classify a data point based on how its neighbours are classified.
● 'k' in KNN is a parameter that refers to the number of nearest neighbours included in the majority-voting process.
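The procedure described above can be written down as a minimal sketch, assuming numeric feature vectors and hashable class labels (an illustration of the idea, not a production implementation):

```python
from collections import Counter
import math

def knn_predict(train_X, train_y, query, k=3):
    """Classify `query` by majority vote among its k nearest training points."""
    # KNN "stores all the available cases": distance from the query to each one
    dists = [(math.dist(query, x), y) for x, y in zip(train_X, train_y)]
    # Take the k closest neighbours and vote on their class labels
    neighbours = [y for _, y in sorted(dists)[:k]]
    return Counter(neighbours).most_common(1)[0][0]
```

For example, with training points `[(1, 1), (1, 2), (5, 5), (6, 5)]` labelled `['A', 'A', 'B', 'B']`, a query at `(1.5, 1.5)` has two 'A' points among its three nearest neighbours, so the vote returns 'A'.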
Few ideas on picking a value for ‘K’
1. There is no structured method to find the best value for "K".
2. Choosing smaller values for K can be noisy: individual points have a higher influence on the result.
3. Larger values of K give smoother decision boundaries, which means lower variance but higher computational cost.
4. In practice, a common choice is k = sqrt(N), where N is the number of samples in your training dataset.
Few ideas on picking a value for ‘K’
1. Try to keep the value of k odd in order to avoid tied votes between two classes of data.
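The two heuristics above, k ≈ sqrt(N) and preferring an odd k, can be combined into a small helper (a rule-of-thumb starting point only; the function name is illustrative):

```python
import math

def rule_of_thumb_k(n_samples):
    """Heuristic starting value: k ≈ sqrt(N), nudged to the nearest odd number."""
    k = max(1, round(math.sqrt(n_samples)))
    # An even k can produce a tied vote between two classes, so bump it to odd
    return k if k % 2 == 1 else k + 1
```

For a training set of 100 samples this suggests trying k = 11 (sqrt(100) = 10, rounded up to the next odd number); the value should still be validated, e.g. by cross-validation.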
How does the KNN algorithm work?
● Similarity is defined according to a distance metric between two data points. A popular one is the Euclidean distance:

d(p, q) = sqrt( (p1 - q1)^2 + (p2 - q2)^2 + ... + (pN - qN)^2 )

where N is the number of attributes.
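The distance formula translates directly into code (a minimal sketch assuming both points have the same number of attributes):

```python
import math

def euclidean_distance(p, q):
    """Euclidean distance between two points over the same N attributes."""
    # Sum the squared attribute-wise differences, then take the square root
    return math.sqrt(sum((pi - qi) ** 2 for pi, qi in zip(p, q)))
```

For instance, `euclidean_distance((0, 0), (3, 4))` gives 5.0, the familiar 3-4-5 right triangle.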
Example of how the KNN algorithm works
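A small worked example with hypothetical data (the features and labels below are illustrative, not taken from the lecture): classify a new customer's shirt size from height and weight using the two steps above, distance computation then majority vote.

```python
import math
from collections import Counter

# Hypothetical training data: (height_cm, weight_kg) -> shirt size
train = [((158, 58), 'M'), ((160, 60), 'M'), ((163, 61), 'M'),
         ((168, 66), 'L'), ((170, 68), 'L'), ((173, 70), 'L')]

query = (161, 61)  # new customer to classify
k = 3

# Step 1: Euclidean distance from the query to every training point
dists = sorted((math.dist(x, query), label) for x, label in train)
# Step 2: majority vote among the k nearest neighbours
prediction = Counter(label for _, label in dists[:k]).most_common(1)[0][0]
```

Here the three nearest neighbours are (160, 60), (163, 61), and (158, 58), all labelled 'M', so the query is classified as 'M'.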