
Nearest neighbor classifiers

The nearest neighbor rule

A set of $n$ pairs $(x_1, t_1), \ldots, (x_n, t_n)$ is given, where $x_i$ takes real values and $t_i$ takes values in the set $\{1, \ldots, M\}$. Each $x_i$ is the outcome of the set of measurements made upon the $i$-th individual. Each $t_i$ is the index of the category to which the $i$-th sample belongs. For brevity we say:

$x_i$ belongs to category $t_i$.

A set of measurements, denoted $x$, is made upon a new individual, and we wish to assign $x$ a label in $\{1, \ldots, M\}$. Let $x_k$ be the sample nearest to $x$; the nearest neighbor rule is then to assign $x$ the label associated with $x_k$, where

$$\min_{i = 1, \ldots, n} d(x, x_i) = d(x, x_k).$$


A commonly used distance measure is the sum of squares. Suppose $x = [x_1, x_2]^T$ and $x_k = [x_{k1}, x_{k2}]^T$; then

$$d(x, x_k) = (x_1 - x_{k1})^2 + (x_2 - x_{k2})^2.$$

[Plot: 30 known samples labelled by class (1, 2 or 3) together with three unlabelled samples A, B and C; both axes run from 0 to 1.]

Figure: There are three classes, each having 10 known samples. Three new samples A, B and C are presented unlabelled. The algorithm outputs, for each new sample, the class label of its nearest neighbor. The results are generated by nn.m.
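The code in nn.m is not reproduced in these notes. As a minimal sketch, the nearest neighbor rule could be implemented in MATLAB as follows (the function and variable names are illustrative, not taken from nn.m; the implicit expansion in X - x assumes MATLAB R2016b or later, or Octave):

function label = nn_classify(X, t, x)
% NN_CLASSIFY  Nearest neighbor rule (illustrative sketch, not nn.m).
%   X : n-by-d matrix of known samples, one sample per row
%   t : n-by-1 vector of class labels in {1, ..., M}
%   x : 1-by-d new sample to be classified
d = sum((X - x).^2, 2);   % squared distance from x to every known sample
[~, k] = min(d);          % index of the nearest sample
label = t(k);             % assign x the label of its nearest neighbor
end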
Example 1: In order to select the best candidates, an over-subscribed secondary school sets an entrance exam in two subjects, English and Mathematics. Suppose that we know the marks and the classification results of 5 applicants, as in the table below. If an applicant has been accepted this is denoted as class 1, otherwise class 2. Use the nearest neighbor rule to determine whether Andy should be accepted if his marks in English and Mathematics are 70 and 70 respectively.

Candidate No.   English   Math   Class
      1            80       85     1
      2            70       60     2
      3            50       70     2
      4            90       70     1
      5            85       75     1

Solution:

1. Calculate the distance between Andy's marks and those of the 5 applicants:

$d_1 = (70 - 80)^2 + (70 - 85)^2 = 325$

$d_2 = (70 - 70)^2 + (70 - 60)^2 = 100$

$d_3 = (70 - 50)^2 + (70 - 70)^2 = 400$

$d_4 = (70 - 90)^2 + (70 - 70)^2 = 400$

$d_5 = (70 - 85)^2 + (70 - 75)^2 = 250$


2. Find the minimum value amongst $\{d_1, d_2, d_3, d_4, d_5\}$, which is $d_2 = 100$.

3. Look up the Class label of applicant No. 2, which is 2. Hence Andy is determined as not acceptable by the algorithm.
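For illustration, the whole solution can be checked with the hypothetical nn_classify sketch given earlier:

X = [80 85; 70 60; 50 70; 90 70; 85 75];   % English and Math marks
t = [1; 2; 2; 1; 1];                       % 1 = accepted, 2 = not accepted
label = nn_classify(X, t, [70 70])         % returns 2, so Andy is rejected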

The k nearest neighbor rule (k-nn)

An obvious extension of the nearest neighbor rule is the k nearest neighbor rule. This rule classifies the new sample $x$ by assigning it the label most frequently represented among the $k$ nearest samples.

We will restrict our discussion to the case of two classes.

A decision is made by examining the labels on the $k$ nearest neighbors and taking a vote ($k$ is odd to avoid ties).

Using the same example, we can determine whether Andy should be accepted with the k nearest neighbor rule, with $k = 3$.

1. Calculate the distances between Andy's marks and those of the 5 applicants: $d_1 = 325$, $d_2 = 100$, $d_3 = 400$, $d_4 = 400$ and $d_5 = 250$.

2. Find the 3 smallest values amongst $\{d_1, d_2, d_3, d_4, d_5\}$, which are $d_1$, $d_2$ and $d_5$.

3. Look up the Class labels of applicants No. 1, No. 2 and No. 5, which are $\{1, 2, 1\}$.

4. There are more 1s than 2s in the set $\{1, 2, 1\}$, so Andy is determined as acceptable by the 3-nn algorithm.

[Plot: 20 known samples labelled by class (1 or 2) together with three unlabelled samples A, B and C; both axes run from 0 to 1.]

Figure: There are two classes, each having 10 known samples. Three new samples A, B and C are presented unlabelled. The algorithm outputs, for each new sample, the label most represented among its 3 nearest neighbors. The results are generated by knn.m.
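As with nn.m, the code in knn.m is not listed in these notes. A minimal sketch of the k nearest neighbor rule in MATLAB (illustrative names, assuming odd k) is:

function label = knn_classify(X, t, x, k)
% KNN_CLASSIFY  k nearest neighbor rule (use odd k to avoid ties).
d = sum((X - x).^2, 2);      % squared distances to all known samples
[~, idx] = sort(d);          % order the samples by distance to x
label = mode(t(idx(1:k)));   % majority vote among the k nearest labels
end

With the exam data of Example 1, knn_classify(X, t, [70 70], 3) returns 1, agreeing with the worked 3-nn result.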
K-means clustering algorithm

Clustering algorithms can be used to find a set of centers which more accurately reflects the distribution of the data points. The number of centers $M$ is decided in advance, and each center $c_i$ is supposed to be representative of a group of data points.

Suppose there are $n$ data points $\{x_j, j = 1, \ldots, n\}$ in total, and we wish to find $M$ representative vectors $c_i$, $i = 1, \ldots, M$. The algorithm seeks to partition the data points $\{x_j, j = 1, \ldots, n\}$ into $M$ disjoint subsets $S_i$ containing $N_i$ data points, in such a way as to minimize the sum-of-squares clustering function given by

$$J = \sum_{i=1}^{M} \sum_{x_j \in S_i} \| x_j - c_i \|^2,$$

where $\in$ denotes "belongs to".
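For concreteness, $J$ can be computed directly from this definition. A sketch, assuming the centers are stored as an illustrative M-by-d matrix C and the partition as a vector s with s(j) = i whenever $x_j \in S_i$:

% X : n-by-d data, C : M-by-d centers, s : n-by-1 subset assignments
J = 0;
for j = 1:size(X, 1)
    J = J + sum((X(j, :) - C(s(j), :)).^2);   % squared distance to assigned center
end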


[Two plots: the same set of data points, with lines joining each data point to its cluster center; both axes run from 0 to 1.]

Figures above: the sum-of-squares clustering function $J$ is the sum of the squared lengths of the lines joining each data point to its center. When $J$ is minimized, each $c_i$ should lie at the center of a group of data points.
$J$ is minimized when

$$c_i = \frac{1}{N_i} \sum_{x_j \in S_i} x_j.$$

The batch K-means algorithm begins by assigning the points at random to $M$ sets and then computing the mean vectors of the points in each set.

Next, each point is reassigned to a new set according to which mean vector is nearest. The means of the sets are then recomputed.

This procedure is repeated until there is no further change in the grouping of the data points.
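A minimal MATLAB sketch of this batch procedure (illustrative names, not the course code; it assumes no set becomes empty during the iterations):

function [C, s] = kmeans_batch(X, M)
% KMEANS_BATCH  Batch K-means on the rows of X with M centers.
n = size(X, 1);
C = zeros(M, size(X, 2));
s = randi(M, n, 1);                          % assign points at random to M sets
while true
    for i = 1:M
        C(i, :) = mean(X(s == i, :), 1);     % mean vector of each set
    end
    d = zeros(n, M);
    for i = 1:M
        d(:, i) = sum((X - C(i, :)).^2, 2);  % squared distance to each mean
    end
    [~, s_new] = min(d, [], 2);              % reassign each point to the nearest mean
    if isequal(s_new, s), break; end         % stop when the grouping no longer changes
    s = s_new;
end
end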

On-line K-means clustering algorithm:

The initial centers are randomly chosen from the training data set. Then, as each data point $x_j$ is presented:

1. Find the nearest $c_i$ to $x_j$, $i = 1, \ldots, M$. Suppose that this is found to be $c_k$.

2. $c_k^{new} = c_k^{old} + \eta (x_j - c_k^{old})$

3. Set $c_k^{new}$ as $c_k^{old}$;

where $\eta > 0$ is a small number called the learning rate. The process is repeated until there are no more changes in the centers.
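The code in kms.m is not shown in these notes. A minimal sketch of the on-line update (illustrative names; a fixed number of passes stands in for the stopping test):

function C = kmeans_online(X, M, eta, npass)
% KMEANS_ONLINE  On-line K-means with learning rate eta.
n = size(X, 1);
C = X(randperm(n, M), :);             % initial centers drawn from the training data
for pass = 1:npass                    % in practice: until the centers stop changing
    for j = 1:n
        d = sum((C - X(j, :)).^2, 2);                  % distance from x_j to each center
        [~, k] = min(d);                               % nearest center c_k
        C(k, :) = C(k, :) + eta * (X(j, :) - C(k, :)); % move c_k towards x_j
    end
end
end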

[Plot: data points and the trajectories of the centers during training; both axes run from 0 to 1.]

Figure: The trajectory of the centers as they are updated by an on-line K-means clustering algorithm. The results are generated by kms.m.

