
Nearest neighbor classifiers

The nearest neighbor rule

A set of $n$ pairs $(x_1, t_1), \ldots, (x_n, t_n)$ is given, where $x_i$ takes real values and $t_i$ takes values in the set $\{1, \ldots, M\}$. Each $x_i$ is the outcome of the set of measurements made upon the $i$-th individual. Each $t_i$ is the index of the category to which the $i$-th sample belongs. For brevity we say:

$x_i$ belongs to category $t_i$.

A set of measurements, denoted $x$, is made upon a new individual, and we wish to assign $x$ a label in $\{1, \ldots, M\}$. Let $x_k$ be the sample nearest to $x$; the nearest neighbor rule is then to assign $x$ the label associated with $x_k$, where

$$\min_{i = 1, \ldots, n} d(x, x_i) = d(x, x_k).$$


A commonly used distance measure is the sum of squares. Suppose $x = [x_1, x_2]^T$ and $x_k = [x_{k1}, x_{k2}]^T$; then

$$d(x, x_k) = (x_1 - x_{k1})^2 + (x_2 - x_{k2})^2.$$

[Plot: 30 known samples labelled by class (1, 2 or 3) together with three unlabelled samples A, B and C; both axes run from 0 to 1.]

Figure: There are three classes, each having 10 known samples. Three new samples A, B and C are presented unlabelled. The algorithm outputs, for each new sample, the class label of its nearest neighbor. The results are generated by nn.m.
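The code in nn.m is not reproduced in these notes. As a minimal sketch, the nearest neighbor rule could be implemented in MATLAB as follows (the function and variable names are illustrative, not taken from nn.m; the implicit expansion in X - x assumes MATLAB R2016b or later, or Octave):

function label = nn_classify(X, t, x)
% NN_CLASSIFY  Nearest neighbor rule (illustrative sketch, not nn.m).
%   X : n-by-d matrix of known samples, one sample per row
%   t : n-by-1 vector of class labels in {1, ..., M}
%   x : 1-by-d new sample to be classified
d = sum((X - x).^2, 2);   % squared distance from x to every known sample
[~, k] = min(d);          % index of the nearest sample
label = t(k);             % assign x the label of its nearest neighbor
end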
Example 1: In order to select the best candidates, an over-subscribed secondary school sets an entrance exam in two subjects, English and Mathematics. Suppose that we know the marks and the classification results of 5 applicants, as in the table below. If an applicant has been accepted this is denoted as class 1, otherwise class 2. Use the nearest neighbor rule to determine whether Andy should be accepted if his marks in English and Mathematics are 70 and 70 respectively.

Candidate No.   English   Math   Class
      1            80       85     1
      2            70       60     2
      3            50       70     2
      4            90       70     1
      5            85       75     1

Solution:

1. Calculate the distance between Andy's marks and those of the 5 applicants:

$d_1 = (70 - 80)^2 + (70 - 85)^2 = 325$

$d_2 = (70 - 70)^2 + (70 - 60)^2 = 100$

$d_3 = (70 - 50)^2 + (70 - 70)^2 = 400$

$d_4 = (70 - 90)^2 + (70 - 70)^2 = 400$

$d_5 = (70 - 85)^2 + (70 - 75)^2 = 250$


2. Find the minimum value amongst $\{d_1, d_2, d_3, d_4, d_5\}$, which is $d_2 = 100$.

3. Look up the Class label of applicant No. 2, which is 2. Hence Andy is determined as not acceptable by the algorithm.
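For illustration, the whole solution can be checked with the hypothetical nn_classify sketch given earlier:

X = [80 85; 70 60; 50 70; 90 70; 85 75];   % English and Math marks
t = [1; 2; 2; 1; 1];                       % 1 = accepted, 2 = not accepted
label = nn_classify(X, t, [70 70])         % returns 2, so Andy is rejected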

The k nearest neighbor rule (k-nn)

An obvious extension of the nearest neighbor rule is the k nearest neighbor rule. This rule classifies the new sample $x$ by assigning it the label most frequently represented among the $k$ nearest samples.

We will restrict our discussion to the case of two classes.

A decision is made by examining the labels on the $k$ nearest neighbors and taking a vote ($k$ is odd to avoid ties).

Using the same example, we can determine whether Andy should be accepted with the k nearest neighbor rule, with $k = 3$.

1. Calculate the distances between Andy's marks and those of the 5 applicants: $d_1 = 325$, $d_2 = 100$, $d_3 = 400$, $d_4 = 400$ and $d_5 = 250$.

2. Find the 3 smallest values amongst $\{d_1, d_2, d_3, d_4, d_5\}$, which are $d_1$, $d_2$ and $d_5$.

3. Look up the Class labels of applicants No. 1, No. 2 and No. 5, which are $\{1, 2, 1\}$.

4. There are more 1s than 2s in the set $\{1, 2, 1\}$, so Andy is determined as acceptable by the 3-nn algorithm.

[Plot: 20 known samples labelled by class (1 or 2) together with three unlabelled samples A, B and C; both axes run from 0 to 1.]

Figure: There are two classes, each having 10 known samples. Three new samples A, B and C are presented unlabelled. The algorithm outputs, for each new sample, the label most represented among its 3 nearest neighbors. The results are generated by knn.m.
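As with nn.m, the code in knn.m is not listed in these notes. A minimal sketch of the k nearest neighbor rule in MATLAB (illustrative names, assuming odd k) is:

function label = knn_classify(X, t, x, k)
% KNN_CLASSIFY  k nearest neighbor rule (use odd k to avoid ties).
d = sum((X - x).^2, 2);      % squared distances to all known samples
[~, idx] = sort(d);          % order the samples by distance to x
label = mode(t(idx(1:k)));   % majority vote among the k nearest labels
end

With the exam data of Example 1, knn_classify(X, t, [70 70], 3) returns 1, agreeing with the worked 3-nn result.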
K-means clustering algorithm

Clustering algorithms can be used to find a set of centers which more accurately reflects the distribution of the data points. The number of centers $M$ is decided in advance, and each center $c_i$ is supposed to be representative of a group of data points.

Suppose there are $n$ data points $\{x_j, j = 1, \ldots, n\}$ in total, and we wish to find $M$ representative vectors $c_i$, $i = 1, \ldots, M$. The algorithm seeks to partition the data points $\{x_j, j = 1, \ldots, n\}$ into $M$ disjoint subsets $S_i$ containing $N_i$ data points, in such a way as to minimize the sum-of-squares clustering function given by

$$J = \sum_{i=1}^{M} \sum_{x_j \in S_i} \| x_j - c_i \|^2,$$

where $\in$ denotes "belongs to".
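For concreteness, $J$ can be computed directly from this definition. A sketch, assuming the centers are stored as an illustrative M-by-d matrix C and the partition as a vector s with s(j) = i whenever $x_j \in S_i$:

% X : n-by-d data, C : M-by-d centers, s : n-by-1 subset assignments
J = 0;
for j = 1:size(X, 1)
    J = J + sum((X(j, :) - C(s(j), :)).^2);   % squared distance to assigned center
end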


[Two plots: the same set of data points, with lines joining each data point to its cluster center; both axes run from 0 to 1.]

Figures above: the sum-of-squares clustering function $J$ is the sum of the squared lengths of the lines joining each data point to its center. When $J$ is minimized, each $c_i$ should lie at the center of a group of data points.
$J$ is minimized when

$$c_i = \frac{1}{N_i} \sum_{x_j \in S_i} x_j.$$

The batch K-means algorithm begins by assigning the points at random to $M$ sets and then computing the mean vectors of the points in each set.

Next, each point is reassigned to a new set according to which mean vector is nearest. The means of the sets are then recomputed.

This procedure is repeated until there is no further change in the grouping of the data points.
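A minimal MATLAB sketch of this batch procedure (illustrative names, not the course code; it assumes no set becomes empty during the iterations):

function [C, s] = kmeans_batch(X, M)
% KMEANS_BATCH  Batch K-means on the rows of X with M centers.
n = size(X, 1);
C = zeros(M, size(X, 2));
s = randi(M, n, 1);                          % assign points at random to M sets
while true
    for i = 1:M
        C(i, :) = mean(X(s == i, :), 1);     % mean vector of each set
    end
    d = zeros(n, M);
    for i = 1:M
        d(:, i) = sum((X - C(i, :)).^2, 2);  % squared distance to each mean
    end
    [~, s_new] = min(d, [], 2);              % reassign each point to the nearest mean
    if isequal(s_new, s), break; end         % stop when the grouping no longer changes
    s = s_new;
end
end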

On-line K-means clustering algorithm:

The initial centers are randomly chosen from the training data set. Then, as each data point $x_j$ is presented:

1. Find the nearest $c_i$ to $x_j$, $i = 1, \ldots, M$. Suppose that this is found to be $c_k$.

2. $c_k^{new} = c_k^{old} + \eta (x_j - c_k^{old})$

3. Set $c_k^{new}$ as $c_k^{old}$;

where $\eta > 0$ is a small number called the learning rate. The process is repeated until there are no more changes in the centers.
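The code in kms.m is not shown in these notes. A minimal sketch of the on-line update (illustrative names; a fixed number of passes stands in for the stopping test):

function C = kmeans_online(X, M, eta, npass)
% KMEANS_ONLINE  On-line K-means with learning rate eta.
n = size(X, 1);
C = X(randperm(n, M), :);             % initial centers drawn from the training data
for pass = 1:npass                    % in practice: until the centers stop changing
    for j = 1:n
        d = sum((C - X(j, :)).^2, 2);                  % distance from x_j to each center
        [~, k] = min(d);                               % nearest center c_k
        C(k, :) = C(k, :) + eta * (X(j, :) - C(k, :)); % move c_k towards x_j
    end
end
end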

[Plot: data points and the trajectories of the centers during training; both axes run from 0 to 1.]

Figure: The trajectory of the centers as they are updated by an on-line K-means clustering algorithm. The results are generated by kms.m.

