
Classification Using K-Nearest Neighbor

Distances

• Distances are used to measure similarity
• There are many ways to measure the distance between two instances, including:
  – Euclidean Distance
  – Mahalanobis Distance
  – Hamming Distance
  – Minkowski Distance
Distances

• Manhattan/Rectilinear distance: |X1-X2| + |Y1-Y2|
• Euclidean Distance: sqrt((X1-X2)^2 + (Y1-Y2)^2)
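These formulas translate directly into code. Below is a minimal Python sketch of the distance functions named on the previous slide; the function names and the use of tuples for points are illustrative choices, and Mahalanobis is omitted since it needs a covariance matrix not given here.

import math

def manhattan(a, b):
    # Rectilinear distance: sum of absolute coordinate differences
    return sum(abs(x - y) for x, y in zip(a, b))

def euclidean(a, b):
    # Straight-line distance: root of summed squared differences
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def minkowski(a, b, p):
    # Generalization: p = 1 gives Manhattan, p = 2 gives Euclidean
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1 / p)

def hamming(a, b):
    # Count of positions where the two instances differ
    return sum(x != y for x, y in zip(a, b))

print(manhattan((3, 7), (7, 4)))   # 7
print(euclidean((3, 7), (7, 4)))   # 5.0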
Nearest Neighbor Search

• Given: a set P of n points


• Goal: a data structure which, given a query point q, finds the nearest neighbor p of q in P

[Figure: query point q and its nearest neighbor p in P]
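Absent a spatial index such as a k-d tree, the baseline solution is a linear scan over P for each query. A minimal sketch, assuming the euclidean helper defined above:

def nearest_neighbor(P, q):
    # Brute force: compare q against every point in P, O(n) per query
    return min(P, key=lambda p: euclidean(p, q))

P = [(7, 7), (7, 4), (3, 4), (1, 4)]
print(nearest_neighbor(P, (3, 7)))  # (3, 4), at distance 3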
KNN

• With K = 5, select the 5 nearest neighbors of the query point


KNN

• Assign the class held by the majority of the K nearest instances. Here, K = 5.
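A majority-vote classifier in the same sketch style, reusing the euclidean helper from above (the knn_classify name and the (point, label) pair layout are illustrative assumptions):

from collections import Counter

def knn_classify(data, q, k):
    # data: list of (point, label) pairs; q: query point
    neighbors = sorted(data, key=lambda pair: euclidean(pair[0], q))[:k]
    # Assign the most frequent label among the k nearest neighbors
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]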
Example

Points   X1 (Acid Durability)   X2 (Strength)   Y (Classification)
P1       7                      7               BAD
P2       7                      4               BAD
P3       3                      4               GOOD
P4       1                      4               GOOD
KNN Example

Points   X1 (Acid Durability)   X2 (Strength)   Y (Classification)
P1       7                      7               BAD
P2       7                      4               BAD
P3       3                      4               GOOD
P4       1                      4               GOOD
P5       3                      7               ?
Scatter Plot
Euclidean Distance From Each Point

Euclidean distance of P5 (3,7) from each point:
P1 (7,7): sqrt((7-3)^2 + (7-7)^2) = sqrt(16) = 4
P2 (7,4): sqrt((7-3)^2 + (4-7)^2) = sqrt(25) = 5
P3 (3,4): sqrt((3-3)^2 + (4-7)^2) = sqrt(9) = 3
P4 (1,4): sqrt((1-3)^2 + (4-7)^2) = sqrt(13) ≈ 3.61
3 Nearest Neighbors

P1 (7,7): sqrt((7-3)^2 + (7-7)^2) = 4      Class: BAD
P2 (7,4): sqrt((7-3)^2 + (4-7)^2) = 5      Class: BAD
P3 (3,4): sqrt((3-3)^2 + (4-7)^2) = 3      Class: GOOD
P4 (1,4): sqrt((1-3)^2 + (4-7)^2) ≈ 3.61   Class: GOOD

The 3 nearest neighbors of P5 (3,7) are P3 (GOOD), P4 (GOOD), and P1 (BAD), so the majority class is GOOD.


KNN Classification

Points   X1 (Acid Durability)   X2 (Strength)   Y (Classification)
P1       7                      7               BAD
P2       7                      4               BAD
P3       3                      4               GOOD
P4       1                      4               GOOD
P5       3                      7               GOOD
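Putting the sketch functions together reproduces this result (assuming the euclidean and knn_classify helpers from the earlier slides):

data = [((7, 7), "BAD"), ((7, 4), "BAD"),
        ((3, 4), "GOOD"), ((1, 4), "GOOD")]
print(knn_classify(data, (3, 7), k=3))  # GOOD: P3 and P4 outvote P1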
Different Values of K
1-Nearest Neighbor
• One of the simplest of all machine learning
classifiers
• Simple idea: label a new point the same as
the closest known point

Label it red.
k-Nearest Neighbor
• Generalizes 1-NN to smooth away noise in the
labels
• A new point is now assigned the most
frequent label of its k nearest neighbors

Label it red, when k = 3

Label it blue, when k = 7
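The flip between k = 3 and k = 7 can be reproduced with the earlier knn_classify sketch; the toy coordinates below are invented to mirror the figure, with 3 red points nearest the query and 4 blue points just beyond them:

# Hypothetical data: 3 red points near the origin, 4 blue points farther out
toy = ([((0, d), "red") for d in (1, 2, 3)]
       + [((0, d), "blue") for d in (4, 5, 6, 7)])
print(knn_classify(toy, (0, 0), k=3))  # red: the 3 nearest are all red
print(knn_classify(toy, (0, 0), k=7))  # blue: 4 blue votes outvote 3 red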

You might also like