You are on page 1of 19

K-Nearest Neighbor

Learning
 Classification is the process
of classifying the data with the
help of class labels. On the other
hand,Clustering is similar
to classification but there are no
predefined class
labels. Classification is geared with
supervised learning. As
against, clustering is also known as
unsupervised learning
Nearest Neighbor Classifier
 Nearest-neighbor classifiers are
based on learning by analogy, that
is, by comparing a given test tuple
with training tuples that are similar
to it.
Nearest Neighbor Classifier
 The training tuples are described by
n attributes. Each tuple represents a
point in an n-dimensional space. In
this way, all of the training tuples
are stored in an n-dimensional
pattern space.
Nearest Neighbor Classifier
 When given an unknown tuple, a k-
nearest-neighbor classifier searches
the pattern space for the k training
tuples that are closest to the
unknown tuple. These k training
tuples are the k “nearest neighbors”
of the unknown tuple.
Closeness
 defined in terms of a distance
metric, such as Euclidean distance.
 The Euclidean distance between two
points or tuples, say, X1 = (x11,
x12, … , x1n) and X2 = (x21, x22,
... , x2n) is
Class Label
 For k-nearest-neighbor
classification, the unknown tuple is
assigned the most common class
among its k nearest neighbors.
 When k = 1, the unknown tuple is
assigned the class of the training
tuple that is closest to it in pattern
space.
1-Nearest Neighbor
3-Nearest Neighbor
K-Nearest Neighbor
 An arbitrary instance is represented by
(a1(x), a2(x), a3(x),.., an(x))
 ai(x) denotes features
 Euclidean distance between two instances
d(xi, xj)=sqrt (sum for r=1 to n (ar(xi) -
ar(xj))2)
 Continuous valued target function
 mean value of the k nearest training
examples
K-Nearest Neighbor
 Here is step by step on how to compute K-
nearest neighbor algorithm:
1. Determine parameter K= number of nearest
neighbors
2. Calculate the distance between the query-
instance and all the training samples
3. Sort the distance and determine nearest
neighbors based on the K-th minimum distance
4. Gather category Y of nearest neighbors
5. Use majority of the category of nearest
neighbors as prediction value of the query
instance
Example
 We have data of two attributes to classify
whether a special tissue is ‘good’ or ‘bad’.

X1 = Acid X2 = Strength Y=
Durability (kg/ Square Classification
(Seconds) meter)
7 7 Bad
7 4 Bad
3 4 Good
1 4 Good
Example
 Now the factory produces a new tissue that pass
laboratory test with X1 = 3 and X2 = 7. Without
another expensive survey, can we guess what
the classification of this new tissue is?
1. Determine parameter K= number of nearest
neighbors. Suppose use K = 3
Example
 Now the factory produces a new tissue that pass
laboratory test with X1 = 3 and X2 = 7. Without
another expensive survey, can we guess what
the classification of this new tissue is?
1. Determine parameter K= number of nearest
neighbors. Suppose use K = 3
Example
2. Calculate the distance between the query-
instance and all the training samples

X1 = Acid X2 = Strength Square distance to


Durability (kg/ Square query instance (3, 7)
(Seconds) meter)
7 7 (7-3)2 + (7-7)2 = 16
7 4 (7-3)2 + (4-7)2 = 25
3 4 (3-3)2 + (4-7)2 = 9
1 4 (1-3)2 + (4-7)2 = 13
Example
3. Sort the distance and determine nearest
neighbors based on the K-th minimum distance

X1 = Acid X2 = Square distance to Rank Is it


Durability Strength (kg/ query instance (3, 7) minimum included
(Seconds) Square distance in 3-
meter) nearest
neighbors
?
7 7 (7-3)2 + (7-7)2 = 16 3 Yes
7 4 (7-3)2 + (4-7)2 = 25 4 No
3 4 (3-3)2 + (4-7)2 = 9 1 Yes
1 4 (1-3)2 + (4-7)2 = 13 2 Yes
Example
4. Gather category Y of nearest neighbors

X1 = X2 = Rank Is it Y=
Acid Strength minimu include Categor
Durabili (kg/ m d in 3- y of
ty Square distance nearest nearest
(Secon meter) neighbo neighbo
ds) rs? rs
7 7 3 Yes Bad
7 4 4 No -
3 4 1 Yes Good
1 4 2 Yes Good
Example
5. Use majority of the category of nearest
neighbors as prediction value of the query
instance
 We have 2 good and 1 bad. So test tuple is
included in good class.
Exercise
 Find class label of the values (Age = 26, Income
= 18)

Age Income Class


($/hr)
25 15 Yes
20 10 No
30 20 Yes
35 25 No