
Knowledge-Based Systems 235 (2022) 107604


A new two-layer nearest neighbor selection method for kNN classifier



Yikun Wang a, Zhibin Pan a,c,∗, Jing Dong b

a Faculty of Electronic and Information Engineering, Xi'an Jiaotong University, Xi'an 710049, PR China
b National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences, Beijing 100190, PR China
c Research Institute of Xi'an Jiaotong University, Zhejiang

Article info

Article history:
Received 26 March 2021
Received in revised form 23 August 2021
Accepted 13 October 2021
Available online 19 October 2021

Keywords:
kNN classifier
Two-layer nearest neighbor rule
First-layer neighborhood
Second-layer neighborhood
Extended neighborhood

Abstract

The k-nearest neighbor (kNN) classifier is a classical classification algorithm that has been applied in many fields. However, the performance of the kNN classifier is limited by a simple neighbor selection method, called the nearest neighbor (NN) rule, where only the neighborhood of the query is considered when selecting the nearest neighbors of the query. In other words, the NN rule only uses one-layer neighborhood information of the query.

In this paper, we propose a new neighbor selection method based on two-layer neighborhood information, called the two-layer nearest neighbor (TLNN) rule. The neighborhood of the query and the neighborhoods of all selected training instances in this neighborhood are considered simultaneously; then the two-layer nearest neighbors of the query are determined according to the distance, distribution relationship, and backward nearest neighbor relationship between the query and all selected training instances in the above neighborhoods. In order to verify the effectiveness of the proposed TLNN rule, a k-two-layer nearest neighbor (kTLNN) classifier is proposed to measure the classification ability of the two-layer nearest neighbors.

Extensive experiments on twenty real-world datasets from UCI and KEEL repositories show that the kTLNN classifier outperforms not only the kNN classifier but also seven other state-of-the-art NN-based classifiers.

© 2021 Elsevier B.V. All rights reserved.

∗ Corresponding author. E-mail address: zbpan@mail.xjtu.edu.cn (Z. Pan).
https://doi.org/10.1016/j.knosys.2021.107604
0950-7051/© 2021 Elsevier B.V. All rights reserved.

1. Introduction

The k-nearest neighbor (kNN) classifier [1] is a classical classification algorithm, which has been widely studied and used in many fields due to its simplicity, effectiveness, and intuitiveness [2–8]. The basic idea of the kNN classifier is to determine the class label of the query based on its k-nearest neighbors in the training set. Specifically, it assigns the query to the class that appears most frequently among its k-nearest neighbors according to the majority voting rule. The kNN classifier is a non-parametric classification method and does not require a training process [9]. In addition, it can asymptotically approach the classification performance achieved by the optimal Bayesian classifier under the constraint k/N → 0, where N is the total number of training instances [10].

While the kNN classifier offers many significant advantages, it still has some problems to be solved. Firstly, the performance of the kNN classifier is sensitive to the value of k. The kNN classifier uses a fixed, single value of k, which needs to be determined in advance and is used for the classification of all queries. However, due to the different spatial locations of different queries, any value of k can only be optimal for part of the queries and may not be suitable for the others, so the performance of the kNN classifier depends largely on the choice of k. In order to solve this problem, many adaptive k-value methods have been proposed, in which different queries are classified using different values of k [11–14].

Secondly, the kNN classifier uses a simple majority voting rule. In the classification process, the roles of the k-nearest neighbors are considered equally. However, since the similarities between these k-nearest neighbors and the query are different, their classification capabilities are actually different. To tackle this problem, various distance-weighted voting methods have been developed, in which larger weights are given to the nearest neighbors that are closer to the query [15–17]. Besides, in some sparse coefficient-weighted voting methods, the nearest neighbors that have a greater effect on the sparse representation of the query are assigned larger weights [18].

Thirdly, the kNN classifier usually suffers from outliers, especially in the case of small-size training sets. To address this issue, Mitani et al. proposed a local mean-based k-nearest neighbor (LMKNN) classifier, which suppresses the influence of outliers by calculating the local mean vector of the k-nearest neighbors in each class and then using the distances between the query and the local

mean vector in each class, instead of the majority voting rule, to make the classification decision [19]. Since then, the idea of the local mean vector has been extensively studied, and a large number of local mean-based classifiers have been proposed one after another [20–23].

In addition to the above three classical problems, a new problem concerning the neighbor selection method of the kNN classifier, which is called the nearest neighbor (NN) rule in this paper, needs to be further studied. According to the NN rule, the k training instances closest to the query are selected as its k-nearest neighbors in the kNN classifier, which has the following defects.

Firstly, the similarity metric in the NN rule is too simple. Only the point-to-point distance is used to measure the similarity between the query and training instances, completely discarding the information about their distribution. Therefore, Sánchez et al. first proposed the concept of nearest centroid neighbors (NCN) and designed a k-nearest centroid neighbor (kNCN) classifier using this new NCN rule [24]. Researchers later gave further improvements based on the concept of NCN [25,26].

Secondly, the unilateral similarity used in the NN rule is not comprehensive enough. It only considers whether a training instance is a nearest neighbor from the viewpoint of the query, but does not consider whether the query is also a nearest neighbor from the viewpoint of the training instance. Hence, Pan et al. came up with the concept of general nearest neighbors (GNN) and developed a k-general nearest neighbor (kGNN) classifier [27]. A training instance can be selected as a general nearest neighbor as long as it is a k-nearest neighbor of the query or the query is a k-nearest neighbor of it.

Thirdly, the neighborhood structure of the query in the NN rule is too unitary. It only consists of the neighborhood of the query. As reported in an article [28] published in Nature Human Behavior in 2019, approximately 95% of the potential predictive accuracy attainable for an individual is available within the social ties of that individual alone, without requiring the individual's own data. In other words, by knowing who an individual's social ties are and what the activities of those ties are, the individual can in principle be accurately analyzed even if he is not present in the data. Inspired by this article, we infer that, in addition to the nearest neighbors of the query, the neighborhood information of these nearest neighbors also plays a very important role in the category prediction of the query. Hence, further using the neighborhoods of these nearest neighbors to enrich the neighborhood structure of the query may improve the performance of the kNN classifier.

In order to solve the above-mentioned problems in the classical NN rule used by the kNN classifier in the nearest neighbor selection process, we propose a new neighbor selection method, called the two-layer nearest neighbor (TLNN) rule, in this paper. Compared with the NN rule, the new TLNN rule has three main advantages:

(1) The TLNN rule uses two-layer neighborhood information. The first layer is the neighborhood of the query, called the first-layer neighborhood, and the second layer consists of the neighborhoods of each first-layer nearest neighbor, called the second-layer neighborhood.
(2) The TLNN rule takes the distribution relationship between the query and the second-layer nearest neighbors into account. Those second-layer nearest neighbors that are not only close to the query but also distributed around the query constitute an extended neighborhood together with the first-layer neighborhood.
(3) The TLNN rule constrains the backward nearest neighbor relationship between the query and the extended nearest neighbors. It considers the similarity between the query and the extended nearest neighbors from the viewpoint of the extended nearest neighbors. For any extended nearest neighbor, if the query is in its neighborhood, it will be kept as a two-layer nearest neighbor in the two-layer neighborhood that is eventually used for the classification decision.

Based on the proposed TLNN rule, we propose a k-two-layer nearest neighbor (kTLNN) classifier, which first finds the k-two-layer nearest neighbors of the query and then uses them to make the classification decision according to the majority voting rule.

The rest of this paper is organized as follows. In Section 2, we discuss the standard kNN classifier and explain our motivations for proposing a k-two-layer nearest neighbor classifier. In Section 3, we present the two-layer nearest neighbor (TLNN) rule and further propose the k-two-layer nearest neighbor (kTLNN) classifier. In order to verify the performance of our proposed kTLNN classifier, we conduct extensive experiments to compare it with the standard kNN classifier as well as seven other competitive kNN-based classifiers in Section 4. Finally, discussions and conclusions are given in Section 5 and Section 6, respectively.

2. Motivation

In this section, we review the rationale of the kNN classifier and explain our motivations for proposing the kTLNN classifier.

The kNN classifier assigns the query to the class that appears most frequently among its k-nearest neighbors in the training set according to the majority voting rule. Consider a training set T = { y_i | y_i ∈ R^D }_{i=1}^{N} with N training instances in a D-dimensional space, and let C = { c_i | c_i ∈ {w_1, w_2, …, w_M} }_{i=1}^{N} be the class label set corresponding to T, where M represents the number of classes. For a given query x, the kNN classifier first calculates the Euclidean distance between the query x and each training instance y_i in T by (1):

d(x, y_i) = √((x − y_i)^T (x − y_i)), 1 ≤ i ≤ N    (1)

Then, the k-nearest neighbors of x are selected according to the ascending order of the Euclidean distances, denoted as NN_k(x) = { nn_x^i | nn_x^i ∈ T }_{i=1}^{k}, where k ≤ N and d(x, nn_x^1) ≤ d(x, nn_x^2) ≤ … ≤ d(x, nn_x^k). The class labels of the k-nearest neighbors are denoted as C_k(x) = { c_x^i | c_x^i ∈ {w_1, w_2, …, w_M} }_{i=1}^{k}.

Finally, the kNN classifier determines the class label c_x of x according to the majority voting rule, as given in (2):

c_x = arg max_{w_j} Σ_{nn_x^i ∈ NN_k(x)} δ(w_j = c_x^i), 1 ≤ i ≤ k, 1 ≤ j ≤ M    (2)

where δ(w_j = c_x^i) = 1 if w_j = c_x^i, and 0 otherwise.

From the above process, it can be seen that the NN rule selects the k training instances closest to the query as the k-nearest neighbors, based on the perception that a smaller distance represents a stronger similarity. According to this neighbor selection method, the k-nearest neighbors can be found to classify the query in the kNN classifier. That is, only the k training instances closest to the query are used for making the classification decision.
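To make the baseline concrete, the following is a minimal NumPy sketch of the standard kNN classifier described by Eqs. (1)–(2). It is not the authors' implementation; the function and variable names are our own, and ties between equally distant neighbors are broken arbitrarily by the sort.

```python
import numpy as np
from collections import Counter

def knn_classify(x, T, C, k):
    """Standard kNN: Euclidean distances (Eq. (1)) + majority vote (Eq. (2))."""
    # Eq. (1): Euclidean distance from the query x to every training instance in T.
    dists = np.linalg.norm(T - x, axis=1)
    # NN rule: indices of the k training instances closest to x.
    nn_idx = np.argsort(dists)[:k]
    # Eq. (2): majority vote over the class labels of the k-nearest neighbors.
    votes = Counter(C[i] for i in nn_idx)
    return votes.most_common(1)[0][0]

# Toy usage: two classes in 2-D, query classified with k = 3.
T = np.array([[0.0, 0.0], [0.1, 0.2], [1.0, 1.0], [1.1, 0.9], [0.9, 1.1]])
C = np.array([0, 0, 1, 1, 1])
print(knn_classify(np.array([0.95, 1.0]), T, C, k=3))  # predicted class: 1
```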

Consider an example of a two-class classification problem in the 2-dimensional space in Fig. 1, where x is the query from class w1 and the other points are training instances. Note that y1, y2 and y3 in the black solid circle are the 3 nearest neighbors of x, and x is incorrectly assigned to class w2 according to their class labels using the kNN classifier when k = 3.

Fig. 1. An example of a two-class classification problem (k = 3).

The three green dotted circles in Fig. 1 mark the three nearest neighbors of y1, y2 and y3 (x is not included), respectively. Further analysis of them reveals some new findings: (1) y1 belongs to class w1, and two of its three nearest neighbors belong to class w1 while one belongs to class w2; (2) y2 belongs to class w2, but two of its three nearest neighbors belong to class w1 and only one belongs to class w2; (3) y3 belongs to class w2, but all of its three nearest neighbors belong to class w1. In other words, y1 belongs to class w1 and most of its three nearest neighbors also belong to class w1. For y2 and y3, although they belong to class w2, most of their three nearest neighbors belong to class w1. Obviously, the respective three nearest neighbors of y1, y2 and y3 have a positive effect on the classification of the query x. If they can be added to the neighborhood of x to participate in the classification decision, a more correct classification result may be obtained. Note that repeated neighbors are used only once. As can be seen from Fig. 1, there are seven neighbors in the new neighborhood of the query x, five of which belong to class w1. As a result, x will be correctly assigned to class w1 according to the majority voting rule.

The example in Fig. 1 shows the weakness of the traditional k-nearest neighbors used in the standard kNN classifier and the complementarity of their respective k-nearest neighbors in classification. Therefore, we propose a new neighbor selection method, called the two-layer nearest neighbor (TLNN) rule, and a k-two-layer nearest neighbor (kTLNN) classifier, in which the traditional k-nearest neighbors, called first-layer nearest neighbors, and their respective k-nearest neighbors, called second-layer nearest neighbors, are comprehensively considered to find more reasonable neighbors for the subsequent classification. In addition, in order to ensure the reliability of the new neighbors, we further consider the distance, distribution relationship, and backward nearest neighbor relationship between them and the query in the neighbor selection process, which will be explained in detail in Section 3.

3. Proposed method

In this section, we first introduce the two-layer nearest neighbor (TLNN) rule and then propose the k-two-layer nearest neighbor (kTLNN) classifier.

3.1. The TLNN rule

The nearest neighbors are the training instances closest to the query in the NN rule. However, as discussed in Section 2, the nearest neighbors selected based only on the query's neighborhood are not comprehensive enough, and there are also some training instances useful for classification in their respective neighborhoods. Therefore, instead of the simple neighbor selection method of the NN rule, we propose a new two-layer nearest neighbor (TLNN) rule, in which the neighborhoods of each traditional nearest neighbor are also considered.

The basic idea of using the TLNN rule to find the two-layer nearest neighbors of the query x is as follows:

(1) Find the traditional k-nearest neighbors of x to constitute the first-layer neighborhood of x;
(2) Find the k-nearest neighbors of each first-layer nearest neighbor, and remove those that are farther from x; the rest constitute the second-layer neighborhood of x;
(3) Remove those second-layer nearest neighbors that are not distributed around x; the remaining second-layer nearest neighbors constitute the extended neighborhood of x together with the first-layer nearest neighbors;
(4) Remove those extended nearest neighbors that do not include x in their neighborhood; the remaining extended nearest neighbors are the two-layer nearest neighbors that constitute the two-layer neighborhood of x.

Fig. 2 is a brief illustration of the basic idea of the proposed TLNN rule. y1, y2 and y3 represent the three nearest neighbors of the query x; y1,1, y1,2 and y1,3 represent the three nearest neighbors of y1, and the rest can be deduced by analogy. The blue boxes in Fig. 2 represent the first-layer nearest neighbors of x. They are actually the three traditional nearest neighbors of x. The green boxes in Fig. 2 represent the second-layer nearest neighbors of x. They are obtained by deleting y1,3, which is farther from x, from among the three nearest neighbors of y1, y2 and y3. At this point, there are still three instances in the neighborhoods of y2 and y3, but only two instances are left in the neighborhood of y1. The orange boxes in Fig. 2 represent the extended nearest neighbors of x. The neighborhood of y2 is not distributed around x, so its members are all deleted. The neighborhoods of y1 and y3 are distributed around x, so they are kept as extended nearest neighbors of x together with y1, y2 and y3. The red boxes in Fig. 2 represent the two-layer nearest neighbors of x. Since x is not included in the neighborhoods of the extended nearest neighbors y3,1 and y3,3, they are deleted. The rest are kept as the two-layer nearest neighbors of x.

It can be seen from the above illustration that the two-layer nearest neighbors of the query are obtained by first generalizing the traditional k-nearest neighbors and then filtering them according to the distance, distribution relationship, and backward nearest neighbor relationship between them and the query. In the following paragraphs, we give a detailed description of the specific rules used in the above two-layer nearest neighbor selection process.

The definition of the first-layer neighborhood of the query x is given in Definition 1.

Definition 1. Given a query x, the first-layer neighborhood of x in the training set T, represented by NN_1st(x), is given in (3):

NN_1st(x) = { nn_1st,x | nn_1st,x ∈ NN_k(x) }    (3)

where NN_k(x) is the k-nearest neighborhood of x, and nn_1st,x is a first-layer nearest neighbor of x.

From Definition 1, it can be seen that the first-layer neighborhood NN_1st(x) of x is actually the k-nearest neighborhood of x used for the classification decision in the NN rule. The proposed TLNN rule attempts to generalize this neighborhood and then further restrict it to find a more suitable neighborhood for the classification decision.

Fig. 2. A brief illustration of the basic idea of the TLNN rule. y1, y2 and y3 represent the three nearest neighbors of query x; y1,1, y1,2, y1,3, y2,1, y2,2, y2,3, y3,1, y3,2 and y3,3 represent the three nearest neighbors of y1, y2 and y3, respectively.

Firstly, consider the k-nearest neighborhoods of each first-layer nearest neighbor of x to determine the second-layer neighborhood of x. Those training instances that are in the k-nearest neighborhood of a first-layer nearest neighbor and sufficiently close to x constitute the effective neighborhood of this first-layer nearest neighbor, and the effective neighborhoods of all first-layer nearest neighbors together constitute the second-layer neighborhood of x, which is given in Definition 2.

Definition 2. Given a query x, the second-layer neighborhood of x in the training set T, represented by NN_2nd(x), is given in (4):

NN_2nd(x) = { nn_2nd,x | nn_2nd,x ∈ NN_eff(nn_1st,x), nn_1st,x ∈ NN_1st(x) }    (4)

where NN_eff(nn_1st,x) is the effective neighborhood of the first-layer nearest neighbor nn_1st,x in the training set T, expressed by (5):

NN_eff(nn_1st,x) = { nn_eff,nn_1st,x | nn_eff,nn_1st,x ∈ NN_k(nn_1st,x) ∧ d(x, nn_eff,nn_1st,x) ≤ 2R_NN_1st(x) }    (5)

NN_k(nn_1st,x) is the k-nearest neighborhood of nn_1st,x in T, and R_NN_1st(x) is the radius of NN_1st(x), that is, the distance from x to its kth first-layer nearest neighbor. nn_2nd,x is a second-layer nearest neighbor of x, and nn_eff,nn_1st,x is an effective nearest neighbor of nn_1st,x.

Note that R_NN_1st(x) is the distance from x to its furthest first-layer nearest neighbor, which can be used to represent the size of the first-layer neighborhood of x. Similarly, define R_NN_2nd(x) as the distance from x to its furthest second-layer nearest neighbor, which can be used to represent the size of the second-layer neighborhood of x.

It can be seen from Definition 2 that the second-layer neighborhood of x is composed of all effective neighborhoods of the first-layer nearest neighbors, and the distances between the effective nearest neighbors of each first-layer nearest neighbor and x will not exceed 2R_NN_1st(x). Therefore, the size of the second-layer neighborhood and the size of the first-layer neighborhood satisfy the relationship R_NN_2nd(x) ≤ 2R_NN_1st(x).

In other words, the second-layer neighborhood of x is actually obtained by limiting the size of the effective neighborhoods of x's first-layer nearest neighbors, which are found according to the NN rule. The size of the second-layer neighborhood of x cannot exceed twice the size of its first-layer neighborhood, which ensures that the second-layer nearest neighbors will not have poor classification capability, because they are not very far from x.
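As a concrete illustration of Definitions 1–2 and the distance test in Eq. (5), the following NumPy sketch builds the first-layer and second-layer neighborhoods of a query. It is only a sketch under the stated notation: the helper names are ours, a training instance is excluded from its own k-nearest neighborhood, and distance ties are broken arbitrarily by the sort.

```python
import numpy as np

def knn_indices(point, T, k, exclude=None):
    """Indices of the k instances in T closest to `point` (NN rule), optionally skipping one index."""
    d = np.linalg.norm(T - point, axis=1)
    order = [j for j in np.argsort(d) if j != exclude]
    return order[:k]

def second_layer_neighborhood(x, T, k):
    """Return (first_layer, second_layer) index lists following Definitions 1 and 2."""
    first = knn_indices(x, T, k)                      # NN_1st(x), Eq. (3)
    R = np.linalg.norm(T[first[-1]] - x)              # radius R_NN_1st(x)
    second = set()
    for i in first:
        for j in knn_indices(T[i], T, k, exclude=i):  # NN_k(nn_1st,x)
            if np.linalg.norm(T[j] - x) <= 2.0 * R:   # distance test in Eq. (5)
                second.add(j)
    # First-layer members found again inside effective neighborhoods stay first-layer;
    # repeated neighbors are used only once.
    return list(first), sorted(second - set(first))
```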

Secondly, consider the distribution of the second-layer neighborhood of x, specifically the distribution of the effective neighborhoods of each first-layer nearest neighbor, to determine the extended neighborhood of x. Those effective neighborhoods whose distribution is close to x constitute the extended neighborhood of x together with the first-layer neighborhood, which is given in Definition 3.

Definition 3. Given a query x, the extended neighborhood of x in the training set T, represented by NN_ext(x), is given in (6):

NN_ext(x) = { nn_ext,x | nn_ext,x ∈ NN_1st(x) ∨ nn_ext,x ∈ { NN_eff(nn_1st,x) | d(x, cent_NN_eff(nn_1st,x)) < d(x, nn_1st,x), nn_1st,x ∈ NN_1st(x) } }    (6)

where cent_NN_eff(nn_1st,x) is the centroid of the first-layer nearest neighbor nn_1st,x and all its effective nearest neighbors in NN_eff(nn_1st,x), and nn_ext,x is an extended nearest neighbor of x.

The relationship between the first-layer neighborhood and x is clear, and the similarity between them is very strong, so the first-layer neighborhood is kept as part of the extended neighborhood. However, the relationship between the effective neighborhoods of each first-layer nearest neighbor and x is not clear, so we measure the similarity between them based on their distribution information to determine whether the effective neighborhood of a first-layer nearest neighbor will be kept as part of the extended neighborhood.

Two distances are used to measure the distribution relationship between the effective neighborhood of a first-layer nearest neighbor and x in Definition 3. One is the distance d(x, nn_1st,x) between the first-layer nearest neighbor nn_1st,x and x, and the other is the distance d(x, cent_NN_eff(nn_1st,x)) between the centroid of nn_1st,x's effective neighborhood (including nn_1st,x) and x.

If d(x, cent_NN_eff(nn_1st,x)) < d(x, nn_1st,x), then, compared with the point-to-point distance between the first-layer nearest neighbor nn_1st,x and x, the distance between the local centroid of this first-layer nearest neighbor and x becomes smaller after considering its effective neighborhood. This shows that the distribution relationship between the effective neighborhood of this first-layer nearest neighbor nn_1st,x and x is closer than that between nn_1st,x itself and x. Therefore, it can be reasonably considered that the effective neighborhood of this first-layer nearest neighbor nn_1st,x has a strong similarity to x and should be kept as part of the extended neighborhood. On the contrary, if d(x, cent_NN_eff(nn_1st,x)) ≥ d(x, nn_1st,x), it means that the distribution relationship between the effective neighborhood of this first-layer nearest neighbor nn_1st,x and x is more distant than that between nn_1st,x itself and x. In this case, the effective neighborhood of this first-layer nearest neighbor nn_1st,x has a weak similarity to x and can be deleted.

Finally, consider the backward nearest neighbor relationship between the extended neighborhood of x and x to determine the two-layer neighborhood of x that is ultimately used for the classification decision. From the viewpoint of each extended nearest neighbor, those extended nearest neighbors for which x is in their kb-nearest neighborhood are kept to constitute the two-layer neighborhood of x, which is given in Definition 4. It should be noted that kb is used here to distinguish it from the k used in the forward nearest neighbor relationship from the viewpoint of x.

Definition 4. Given a query x, the two-layer neighborhood of x in the training set T, represented by NN_two(x), is given in (7):

NN_two(x) = { nn_two,x | nn_two,x ∈ NN_ext(x) ∧ x ∈ NN_kb(nn_two,x) }    (7)

where NN_kb(nn_two,x) is the kb-nearest neighborhood of nn_two,x in the set T* = T ∪ {x}, and nn_two,x is a two-layer nearest neighbor of x.

From Definition 4 and Definition 3, we can see that the two-layer nearest neighbors are selected from the extended neighborhood, and the extended neighborhood comes from the first-layer neighborhood and the second-layer neighborhood. Therefore, some of the two-layer nearest neighbors come from the first-layer neighborhood, denoted as NN^1st_two(x), and the others come from the second-layer neighborhood, denoted as NN^2nd_two(x). Obviously, NN_two(x) = NN^1st_two(x) ∪ NN^2nd_two(x).

The forward nearest neighbor relationship and the backward nearest neighbor relationship measure the similarity between the query and training instances from two different viewpoints. The forward nearest neighbor relationship is obtained from the viewpoint of the query; the NN rule considers the forward nearest neighbor relationship when selecting the nearest neighbors. The backward nearest neighbor relationship is obtained from the viewpoint of the training instances. The MKNN algorithm [29] and the kGNN algorithm [27] use the backward nearest neighbor relationship together with the forward nearest neighbor relationship when selecting the mutual nearest neighbors and the general nearest neighbors, respectively.

In the proposed TLNN rule, we consider the backward nearest neighbor relationship between x and its extended nearest neighbors. After obtaining the extended nearest neighbors based on the forward nearest neighbor relationship, we further filter them according to the backward nearest neighbor relationship to obtain the two-layer nearest neighbors of x for the classification decision. Therefore, the two-layer nearest neighbors have a very strong similarity to the query x.

Summarizing the above process yields the complete TLNN rule. For a given query x:

Step 1: Obtain the first-layer neighborhood of x according to Eq. (3);
Step 2: Obtain the second-layer neighborhood of x according to Eqs. (4)–(5);
Step 3: Obtain the extended neighborhood of x according to Eq. (6);
Step 4: Obtain the two-layer neighborhood of x that is finally used for the classification decision according to Eq. (7).

For ease of understanding, Fig. 3 shows an example of using the TLNN rule to select two-layer nearest neighbors. In this example, k and kb are both set to 4.

For a given query x, the first-layer nearest neighbors are the four nearest neighbors of x in the training set T, that is, the blue points y1, y2, y3 and y4 in Fig. 3(a). In addition, the four black dotted circles indicate the four nearest neighbors of these four first-layer nearest neighbors in the training set T, respectively.

It can be seen that the distances between x and the four nearest neighbors of y1, y2 and y4 are all smaller than 2R_NN_1st(x), so their respective effective neighborhoods contain four training instances. However, the distance between x and the 4th nearest neighbor of y3 is larger than 2R_NN_1st(x), so this 4th nearest neighbor is not added to y3's effective neighborhood; that is, the effective neighborhood of y3 contains only three training instances. The four black solid circles in Fig. 3(b) represent the effective neighborhoods of the first-layer nearest neighbors y1, y2, y3 and y4, which together constitute the second-layer neighborhood of x. It should be noted that the first-layer nearest neighbors in the effective neighborhoods are still regarded as first-layer nearest neighbors, and the remaining green points are the real second-layer nearest neighbors. Besides, the black dotted lines represent the distances from x to the four first-layer nearest neighbors and to the local centroids of their respective effective neighborhoods.

Obviously, the distance from x to the local centroid of y1's effective neighborhood is larger than the distance from x to y1 itself, so y1's effective neighborhood is not selected as part of the extended neighborhood of x. The situation is the same for y4. For y2, the distance from x to the local centroid of its effective neighborhood is smaller than the distance from x to y2 itself, so y2's effective neighborhood is used as part of the extended neighborhood of x. The situation is the same for y3. The orange points y1, y2, y3, y4, y5 and y6 in Fig. 3(c) are all extended nearest neighbors of x. The four nearest neighbors of these extended nearest neighbors in T* (T* = T ∪ {x}) are also marked with black dotted circles. Note that, for clarity, the four nearest neighbors of y2 and y3 are not marked in Fig. 3(c).

For y4 and y5, x is not one of their four nearest neighbors, so they are deleted. Finally, the two-layer nearest neighbors of x are the red points y1, y2, y3 and y6 in Fig. 3(d).

Fig. 3. An example of using the TLNN rule to obtain the two-layer nearest neighbors of query x (k, kb = 4).
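With all four definitions in place, the complete rule can be prototyped directly from Eqs. (3)–(7). The following self-contained NumPy sketch consolidates Steps 1–4 and the majority vote of the kTLNN classifier introduced in Section 3.2 below. It is our own illustration rather than the authors' Algorithm 1: the brute-force distance computations, the exclusion of a training instance from its own neighborhood, and the fallback used when the backward test would leave NN_two(x) empty (a corner case the paper does not discuss) are all our assumptions.

```python
import numpy as np
from collections import Counter

def _knn_idx(point, T, k, exclude=None):
    """Indices of the k instances in T closest to `point`; `exclude` skips the point's own index."""
    d = np.linalg.norm(T - point, axis=1)
    order = [j for j in np.argsort(d) if j != exclude]
    return order[:k]

def two_layer_neighbors(x, T, k, kb):
    """TLNN rule, Steps 1-4 (Eqs. (3)-(7)). Returns the indices of NN_two(x)."""
    # Step 1: first-layer neighborhood NN_1st(x), Eq. (3).
    first = _knn_idx(x, T, k)
    R = np.linalg.norm(T[first[-1]] - x)                 # radius R_NN_1st(x)

    # Step 2: effective neighborhoods, Eqs. (4)-(5).
    effective = {}
    for i in first:
        effective[i] = [j for j in _knn_idx(T[i], T, k, exclude=i)
                        if np.linalg.norm(T[j] - x) <= 2.0 * R]

    # Step 3: extended neighborhood NN_ext(x), Eq. (6), via the centroid test.
    extended = set(first)
    for i in first:
        cluster = np.vstack([T[[i]], T[effective[i]]]) if effective[i] else T[[i]]
        centroid = cluster.mean(axis=0)                  # centroid of nn_1st,x and its effective neighbors
        if np.linalg.norm(centroid - x) < np.linalg.norm(T[i] - x):
            extended.update(effective[i])                # keep this effective neighborhood

    # Step 4: backward nearest neighbor test, Eq. (7), in T* = T ∪ {x}.
    two_layer = []
    for j in extended:
        d_to_train = np.linalg.norm(T - T[j], axis=1)
        d_to_train[j] = np.inf                           # exclude the instance itself
        kb_radius = np.sort(d_to_train)[kb - 1]          # distance to its kb-th neighbor within T
        if np.linalg.norm(x - T[j]) <= kb_radius:        # then x is inside its kb-neighborhood in T*
            two_layer.append(j)
    # Fallback for the degenerate case of an empty NN_two(x): use NN_1st(x).
    return two_layer if two_layer else list(first)

def ktlnn_classify(x, T, C, k, kb):
    """kTLNN classifier: majority vote over the two-layer nearest neighbors."""
    votes = Counter(C[j] for j in two_layer_neighbors(x, T, k, kb))
    return votes.most_common(1)[0][0]
```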

3.2. The kTLNN classifier

The TLNN rule determines the two-layer nearest neighbors of the query based on the neighborhood information of not only the query but also its first-layer nearest neighbors, as well as the distance, distribution relationship and backward nearest neighbor relationship between them. Based on the new TLNN rule, we propose a k-two-layer nearest neighbor (kTLNN) classifier.

In the kTLNN classifier, the TLNN rule is used to determine the k-two-layer nearest neighbors of the query x. Then, according to the majority voting rule, x is assigned to the class that appears most frequently among these k-two-layer nearest neighbors. The pseudo-code of the kTLNN classifier is shown in Algorithm 1.

4. Experiments

We conduct extensive experiments on twenty real-world datasets from the UCI [30] and KEEL [31] repositories to evaluate the performance of our proposed kTLNN classifier. The twenty real-world datasets have quite different characteristics in the number of instances, dimensions and classes, which are listed in Table 1.

Table 1
The characteristics of twenty real-world datasets from UCI and KEEL repositories.

Datasets          Source  Instances  Dimensions  Classes
Breast            UCI     277        9           2
Dermatology       KEEL    358        34          6
Dna               UCI     2000       180         3
German            UCI     1000       24          2
Glass             UCI     214        9           6
Heart             UCI     303        13          2
Ionosphere        UCI     351        34          2
Landsat           UCI     2000       36          6
Msplice           UCI     3175       240         3
Optdigits         KEEL    5620       64          10
Phoneme           KEEL    5404       5           2
Ring              KEEL    7400       20          2
Segment           UCI     2310       18          7
Sonar             UCI     208        60          2
Tae               KEEL    151        5           3
Texture           KEEL    5500       40          11
Thyroid           KEEL    7200       21          3
Vehicle           UCI     846        18          4
Wine              UCI     178        13          3
Winequality_red   KEEL    1599       11          6

The ten-fold cross-validation method is used for each dataset to validate the classification algorithms in all experiments in this paper. That is to say, each dataset is divided into ten subsets, and in each round of the ten-fold cross-validation one subset is used as the testing set while the remaining nine subsets are used as the training set. The final classification result is the average of the results obtained on these ten testing sets.

The proposed kTLNN classifier aims at selecting more reliable neighbors to improve the classification accuracy of the kNN classifier. Thus, the classification error rate [19], which is defined as the ratio of the number of queries that are classified incorrectly to the number of all queries, is the key evaluation criterion in our experiments.
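The evaluation protocol just described can be reproduced with a few lines of NumPy. The sketch below is our own; the fold construction (random permutation, ten roughly equal folds) is an assumption, since the paper does not specify the splitting details.

```python
import numpy as np

def ten_fold_error_rate(X, y, classify, k, seed=0):
    """Average classification error rate over ten-fold cross-validation.
    `classify(x, X_train, y_train, k)` can be any query-level classifier (kNN, kTLNN, ...)."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(X))
    folds = np.array_split(idx, 10)               # ten roughly equal folds
    errors = []
    for f in range(10):
        test = folds[f]
        train = np.concatenate([folds[g] for g in range(10) if g != f])
        Xtr, ytr = X[train], y[train]
        wrong = sum(classify(X[i], Xtr, ytr, k) != y[i] for i in test)
        errors.append(wrong / len(test))
    return float(np.mean(errors))

# e.g. ten_fold_error_rate(X, y, knn_classify, k=5), with knn_classify as sketched in Section 2.
```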

In this section, we first conduct the parameter analysis to fix the value of kb used in Step 4 of Algorithm 1. Then, we compare the proposed kTLNN classifier with the standard kNN classifier and seven competitive kNN-based classifiers to demonstrate the superiority of our proposed kTLNN classifier. Additionally, we conduct a series of experiments to analyze the properties of the proposed kTLNN classifier: we analyze the two-layer nearest neighbors from the first-layer neighborhood, NN^1st_two(x), and the two-layer nearest neighbors from the second-layer neighborhood, NN^2nd_two(x), and discuss the superiority of the selected two-layer nearest neighbors NN_two(x). Finally, we compare the proposed kTLNN classifier with the kNN* classifier.

4.1. Initialization of parameter kb

As mentioned in Section 3, k and kb are the neighborhood sizes used to measure the forward and backward nearest neighbor relationships between the query and training instances, respectively. The meaning of k is the same as in the kNN classifier, but kb is a new parameter, which needs to be initialized before verifying the performance of our proposed kTLNN classifier.

In this subsection, we evaluate the classification performance of the kTLNN classifier with six different relationships between kb and k, i.e., kb = 1.0k, kb = 1.2k, kb = 1.4k, kb = 1.6k, kb = 1.8k and kb = 2.0k. Note that the cases where kb is smaller than k are not considered, since strict restrictions on the backward nearest neighbor relationship may result in few selected two-layer nearest neighbors, which is unfavorable to classification.

The final initialization of kb is determined by comparing the optimal performance of the kTLNN classifier under these six different relationships between kb and k on twenty real-world datasets. Note that the optimal performance is the minimum error rate of the kTLNN algorithm over k = 1, 2, ..., 20.

Table 2 shows the minimum error rate of the kTLNN classifier over k = 1, 2, ..., 20 under the six different relationships between kb and k on twenty real-world datasets, and the value of k corresponding to the minimum error rate is given in parentheses.

It can be seen from Table 2 that the best classification performance of the kTLNN classifier under the six different relationships between kb and k is similar. The minimum error rate of the kTLNN classifier under the six settings of kb is not much different on most datasets, and is even the same on the datasets Segment, Tae, Wine and Winequality_red. Note that on these four datasets, the equal minimum error rates under the six settings of kb are all obtained when k = 1, since the two-layer nearest neighbors selected according to the TLNN rule under the six settings of kb generally coincide with the traditional nearest neighbor selected according to the NN rule when k = 1.

However, the setting of kb = 1.4k still wins by a slight advantage according to both the per-dataset results and the average results on the twenty datasets. The kTLNN classifier has the lowest minimum error rate on 8, 9, 11, 7, 6 and 5 datasets under the six settings of kb, respectively, which means that the setting of kb = 1.4k achieves the best performance according to the per-dataset results on the twenty datasets. In addition, the average results on the twenty datasets also indicate that the kTLNN algorithm obtains the best classification performance under the setting of kb = 1.4k. As a result, we choose the setting of kb = 1.4k in the following experiments.
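Since kb is specified only as a multiple of k, a concrete implementation needs a rounding convention for fractional values. The snippet below uses ceiling rounding, which is our own assumption rather than something stated in the paper.

```python
import math

def kb_from_k(k, ratio=1.4):
    """Backward neighborhood size derived from k; ceiling rounding is an assumption."""
    return max(k, math.ceil(ratio * k))

print([kb_from_k(5, r) for r in (1.0, 1.2, 1.4, 1.6, 1.8, 2.0)])  # [5, 6, 7, 8, 9, 10]
```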

Table 2
The minimum error rate (%) of kTLNN classifier when k = 1, 2, . . . , 20 under six different relationships of kb and k
on twenty real-world datasets.
Datasets kb = 1.0k kb = 1.2k kb = 1.4k kb = 1.6k kb = 1.8k kb = 2.0k
Breast 24.23(14) 24.23(11) 24.94(10) 24.59(9) 23.47(8) 24.19(7)
Dermatology 8.35(4) 8.37(2) 8.37(2) 8.92(1) 8.92(1) 8.92(1)
Dna 11.95(20) 12.10(20) 11.60(20) 11.15(20) 11.35(20) 11.35(19)
German 28.60(20) 28.00(19) 28.40(16) 28.30(15) 28.50(15) 28.40(20)
Glass 27.05(7) 26.25(2) 26.25(2) 27.05(12) 27.16(2) 27.16(2)
Heart 19.27(14) 18.72(20) 17.66(20) 18.63(17) 18.31(14) 18.31(13)
Ionosphere 12.22(18) 11.02(18) 10.46(19) 10.19(17) 10.74(15) 10.46(15)
Landsat 10.20(3) 10.25(11) 10.25(10) 10.35(9) 10.35(7) 10.40(9)
Msplice 6.71(20) 6.80(20) 6.68(20) 7.25(20) 7.59(20) 7.94(19)
Optdigits 0.93(17) 0.96(6) 0.96(12) 1.00(5) 0.96(6) 0.98(8)
Phoneme 9.31(1) 9.29(2) 9.29(2) 9.33(1) 9.33(1) 9.33(1)
Ring 12.95(20) 12.47(20) 12.11(20) 11.72(20) 11.54(20) 11.24(20)
Segment 3.55(1) 3.55(1) 3.55(1) 3.55(1) 3.55(1) 3.55(1)
Sonar 16.34(4) 14.91(3) 14.44(3) 14.44(3) 15.86(3) 15.86(3)
Tae 42.23(1) 42.23(1) 42.23(1) 42.23(1) 42.23(1) 42.23(1)
Texture 0.76(6) 0.80(2) 0.80(2) 0.84(5) 0.84(2) 0.84(2)
Thyroid 5.79(9) 5.76(8) 5.76(6) 5.79(6) 5.78(5) 5.81(5)
Vehicle 28.13(17) 28.14(7) 28.15(8) 28.15(7) 28.05(6) 28.28(7)
Wine 25.28(1) 25.28(1) 25.28(1) 25.28(1) 25.28(1) 25.28(1)
Winequality_red 39.65(1) 39.65(1) 39.65(1) 39.65(1) 39.65(1) 39.65(1)
Average 16.67 16.44 16.34 16.42 16.47 16.51

4.2. Comparisons with the standard kNN classifier

To preliminarily demonstrate the classification performance of our proposed kTLNN classifier, we first compare it with the standard kNN classifier on twenty real-world datasets in terms of error rate.

Table 3 shows the error rates of the standard kNN classifier and the proposed kTLNN classifier when k = 1, 3, 5, 7, 9, 11, 13, and 15 on twenty real-world datasets, where kb = 1.4k is set according to the experimental results in Section 4.1. A bold value indicates that the error rate is lower than that of the standard kNN classifier.

From Table 3, we can see that when k = 1, the error rates of the proposed kTLNN classifier and the kNN classifier are basically the same on all twenty datasets; that is, the classification performance of the two classifiers is basically the same in the case of k = 1. When k > 1, the error rate of the proposed kTLNN classifier is lower than that of the kNN classifier in most cases. On very few datasets, the error rate of the kTLNN classifier is higher than that of the kNN classifier when k takes certain values, but a lower error rate is obtained again when k takes other values. In other words, the kTLNN classifier achieves significantly better performance on most datasets, and similar performance to the kNN classifier on very few datasets. In addition, the average error rate of the kTLNN classifier on the twenty datasets is lower than that of the kNN classifier, indicating that its overall classification performance is better than that of the kNN classifier.

Moreover, it can be noticed that the error rate of the proposed kTLNN classifier is higher than that of the kNN classifier mostly when the value of k is small. This is because, compared with the benefit of adding some farther nearest neighbors, the loss of deleting some closer nearest neighbors may be greater when the value of k is small.

4.3. Comparisons with seven competitive kNN-based classifiers

To further verify the classification performance of the proposed kTLNN classifier, we compare it with the kNN [1], WKNN [15], PNN [20], MKNN [29], kGNN [27], CFKNN [32], FRNN [33] and HBKNN [34] classifiers on twenty real-world datasets in terms of error rate.

WKNN is a well-known distance-weighted k-nearest neighbor classifier, in which larger weights are given to the nearest neighbors closer to the query. PNN utilizes the weighted average distance from the k-nearest neighbors to the query in each class as the distance from the pseudo nearest neighbor to the query for the classification decision. MKNN and kGNN take the neighborhood information of the training instances into account. CFKNN aims to obtain neighbors containing less redundant information according to the representation-based distance. FRNN is a well-known fuzzy-based k-nearest neighbor classifier, which obtains richer class confidence values with a fuzzy-rough ownership function. HBKNN makes use of the fuzzy strategy and the local and global information of the query.

In order to make a fair comparison, we use the ten-fold cross-validation method to obtain the optimized value of k. Note that the FRNN and HBKNN classifiers do not need an optimized value of k because their classification results do not depend on the choice of k. Moreover, kb = 1.4k is set as before.

The classification error rates with the optimized value of k for each classifier on the twenty real-world datasets are shown in Table 4, and the ranking of each classifier's error rate is given in parentheses. Note that the results of the standard kNN classifier are also presented in Table 4 as a baseline.

As can be seen from Table 4, the proposed kTLNN classifier ranks first on seven datasets and second on five datasets, and achieves the highest average ranking and the lowest average error rate on the twenty datasets, which indicates that the proposed kTLNN classifier has the best classification performance compared with the standard kNN classifier and all the other seven competitive kNN-based classifiers.

4.4. Analysis of NN_two(x)

After verifying the superiority of our proposed kTLNN classifier over the standard kNN classifier and the other competitive kNN-based classifiers, we further conduct a series of analyses of the properties of the kTLNN classifier in the next few subsections.

As described in Algorithm 1, the acquisition of the two-layer nearest neighbors requires four steps: (1) find the first-layer nearest neighbors; (2) find the second-layer nearest neighbors; (3) find the extended nearest neighbors; (4) find the two-layer nearest neighbors. Therefore, to verify the effectiveness of the introduced two-layer nearest neighbors in the classification process, we compare the classification error rates of the neighbors obtained at each step. Fig. 4 shows the error rates of the first-layer nearest neighbors NN_1st(x), the first-layer and second-layer nearest neighbors NN_1st(x) & NN_2nd(x), the extended nearest neighbors NN_ext(x), and the two-layer nearest neighbors NN_two(x) when k varies from 1 to 20 on four real-world datasets. In addition, the optimal k for each dataset is given in parentheses for clarity.

Table 3
Comparisons with the standard kNN classifier on twenty real-world datasets in terms of error rate (%) when k =
1, 3, 5, 7, 9, 11, 13, 15.
Dataset k=1 k=3 k=5 k=7
kNN kTLNN kNN kTLNN kNN kTLNN kNN kTLNN
Breast 31.77 31.77 31.73 32.17 27.04 27.76 27.76 25.97
Dermatology 8.92 8.92 11.70 9.48 11.96 10.02 13.07 11.96
Dna 27.40 27.40 22.70 23.60 19.75 20.85 18.05 18.05
German 33.60 33.70 31.60 32.60 31.90 32.70 30.20 30.70
Glass 28.07 28.07 32.95 29.60 30.68 30.40 32.84 28.58
Heart 22.41 22.73 19.69 22.18 21.95 22.82 22.04 20.01
Ionosphere 12.96 12.96 15.09 12.69 15.56 12.78 16.39 12.96
Landsat 11.25 11.25 10.75 11.20 11.20 10.85 12.00 10.85
Msplice 23.94 23.91 21.14 17.61 18.46 13.14 18.30 11.81
Optdigits 1.19 1.19 1.07 1.07 1.23 1.01 1.33 1.05
Phoneme 9.31 9.33 10.92 10.12 11.43 10.64 12.19 11.21
Ring 25.18 25.18 28.31 22.84 30.55 21.05 32.65 18.96
Segment 3.55 3.55 4.46 3.68 5.11 3.90 6.15 5.06
Sonar 17.29 17.29 18.82 14.44 17.72 17.29 22.58 21.15
Tae 42.23 42.23 56.61 56.16 59.46 53.04 59.46 59.29
Texture 0.87 0.87 1.02 0.87 1.27 0.84 1.56 0.98
Thyroid 6.97 7.00 5.90 6.24 6.00 5.83 6.08 5.76
Vehicle 31.10 31.10 29.33 28.98 29.46 28.60 29.23 29.10
Wine 25.28 25.28 27.64 28.19 31.04 31.46 31.46 29.72
Winequality_red 39.65 39.65 44.22 44.53 48.22 46.28 49.29 46.97
Average 20.15 20.17 21.28 20.41 21.50 20.06 22.13 20.01

Dataset k=9 k = 11 k = 13 k = 15
kNN kTLNN kNN kTLNN kNN kTLNN kNN kTLNN
Breast 26.64 26.01 25.97 27.09 27.04 27.04 24.94 25.61
Dermatology 14.46 13.64 16.68 15.60 17.79 17.53 18.09 19.79
Dna 16.65 16.55 16.05 14.25 15.45 13.60 15.40 13.20
German 30.80 30.70 30.80 30.00 29.10 29.20 29.20 28.70
Glass 38.75 32.84 37.22 28.13 36.76 27.95 36.02 27.95
Heart 20.66 18.82 20.43 18.82 20.85 20.11 20.34 19.46
Ionosphere 16.39 12.96 16.39 12.69 16.11 12.22 16.30 11.85
Landsat 12.05 10.35 12.50 10.60 13.00 11.05 13.10 10.55
Msplice 16.47 10.77 15.81 9.04 14.84 8.38 14.65 8.06
Optdigits 1.39 1.03 1.44 1.01 1.44 1.00 1.62 0.98
Phoneme 12.82 11.93 13.25 12.84 13.45 12.86 13.71 12.99
Ring 33.96 17.51 35.03 16.01 36.08 14.76 36.86 13.97
Segment 6.88 5.24 7.32 6.19 8.05 7.10 8.35 7.32
Sonar 28.87 24.44 33.21 26.34 34.64 26.87 33.68 29.30
Tae 61.34 63.21 60.09 60.71 65.09 59.29 65.09 61.16
Texture 1.73 1.15 1.87 1.22 2.15 1.27 2.22 1.36
Thyroid 6.25 5.92 6.28 5.90 6.44 5.97 6.47 5.93
Vehicle 29.91 28.27 31.68 28.96 30.87 30.15 30.52 29.20
Wine 29.79 30.83 29.86 30.35 31.46 26.39 29.24 27.01
Winequality_red 50.54 46.22 49.03 45.72 49.22 45.85 48.72 46.91
Average 22.82 20.42 23.05 20.07 23.49 19.93 23.23 20.07

Table 4
Comparisons with standard kNN classifier and seven competitive kNN-based classifiers on twenty real-world datasets in terms of
error rate (%) with the optimized value of k.
Dataset kNN WKNN PNN MKNN kGNN CFKNN FRNN HBKNN kTLNN
Breast 24.19(2) 25.26(5) 25.26(5) 24.59(3) 23.83(1) 28.16(9) 26.69(7) 28.11(8) 24.94(4)
Dermatology 8.92(5) 8.92(5) 9.18(7) 7.52(3) 7.24(2) 9.74(8) 6.94(1) 13.91(9) 8.37(4)
Dna 14.65(3) 15.65(7) 15.65(7) 12.20(2) 15.00(5) 15.15(6) 65.60(9) 14.65(3) 11.60(1)
German 28.50(3) 29.20(8) 29.10(7) 28.90(4) 28.40(1) 28.90(4) 30.10(9) 28.90(4) 28.40(1)
Glass 28.07(6) 27.61(5) 26.25(1) 27.33(4) 26.42(3) 46.48(9) 41.31(8) 30.40(7) 26.25(1)
Heart 18.95(6) 19.46(7) 18.40(3) 19.69(9) 18.86(5) 18.72(4) 17.57(1) 19.46(7) 17.66(2)
Ionosphere 12.96(5) 12.96(5) 12.96(5) 11.57(3) 11.67(4) 5.93(1) 60.56(9) 16.57(8) 10.46(2)
Landsat 10.75(5) 10.10(2) 10.20(3) 10.00(1) 10.90(6) 40.50(9) 20.95(8) 12.10(7) 10.25(4)
Msplice 13.01(4) 15.81(8) 15.18(7) 7.05(2) 10.77(3) 14.05(5) 32.22(9) 14.46(6) 6.68(1)
Optdigits 1.07(6) 0.98(3) 0.98(3) 0.91(1) 1.00(5) 1.39(8) 90.07(9) 1.48(7) 0.96(2)
Phoneme 9.31(4) 9.31(4) 9.23(2) 9.21(1) 9.31(4) 26.87(9) 21.67(8) 9.55(7) 9.29(3)
Ring 25.18(4) 25.18(4) 25.18(4) 13.05(2) 24.31(3) 36.36(7) 50.62(9) 36.72(8) 12.11(1)
Segment 3.55(5) 3.51(4) 3.42(2) 3.55(5) 3.42(2) 21.56(9) 3.33(1) 4.63(8) 3.55(5)
Sonar 17.29(6) 16.29(4) 15.34(2) 16.77(5) 18.25(8) 17.67(7) 15.39(3) 21.63(9) 14.44(1)
Tae 42.23(2) 42.23(2) 44.11(7) 42.23(2) 43.48(6) 56.79(9) 49.29(8) 41.16(1) 42.23(2)
Texture 0.87(5) 0.84(4) 0.80(1) 0.82(3) 0.87(5) 5.22(8) 13.09(9) 1.58(7) 0.80(1)
Thyroid 5.90(6) 5.82(5) 5.75(3) 5.72(2) 5.71(1) 10.19(9) 7.42(8) 6.08(7) 5.76(4)
Vehicle 28.61(7) 28.26(5) 28.26(5) 27.55(3) 26.96(2) 23.06(1) 32.18(9) 30.04(8) 28.15(4)
Wine 25.28(2) 25.28(2) 25.28(2) 25.28(2) 25.28(2) 29.79(9) 14.10(1) 26.39(8) 25.28(2)
Winequality_red 39.65(4) 39.03(3) 38.02(2) 39.65(4) 40.03(7) 55.85(9) 41.65(8) 36.84(1) 39.65(4)
Average ranking 4.50 4.60 3.55 3.05 3.75 7.00 6.70 6.50 2.45
Average error rate 17.95 18.08 17.93 16.68 17.58 24.62 32.04 19.73 16.34


The classification error rate of the first-layer nearest neighbors is actually the error rate of the kNN classifier. After the introduction of the second-layer nearest neighbors, the classification error rate sometimes increases. Since there are many second-layer nearest neighbors (at most k²), it is easy to introduce some second-layer nearest neighbors that are far from the query and have poor classification capability, which has a negative impact on the classification. However, after filtering the first-layer nearest neighbors and the second-layer nearest neighbors, the classification error rate of the obtained extended nearest neighbors decreases significantly. And the final classification error rate of the two-layer nearest neighbors not only decreases greatly, but is also lower than the classification error rate of the original first-layer nearest neighbors. That is to say, although simply adding second-layer nearest neighbors may result in a decrease in classification performance, the classification performance of the obtained neighbors can be continuously improved step by step through the filtering process, and is ultimately better than the classification performance of the nearest neighbors in the kNN classifier.

4.5. Analysis of NN^1st_two(x) and NN^2nd_two(x)

It can be seen from the TLNN rule that the superiority of the kTLNN classifier mainly comes from the use of two-layer neighborhood information, that is, the two-layer nearest neighbors from the first-layer neighborhood, NN^1st_two(x), and the two-layer nearest neighbors from the second-layer neighborhood, NN^2nd_two(x). Therefore, in order to further illustrate the necessity of introducing the two-layer neighborhood information for the selection of the two-layer nearest neighbors, we analyze NN^1st_two(x) and NN^2nd_two(x) in detail.

First, we analyze the complementarity of NN^1st_two(x) and NN^2nd_two(x) in classification capability. Fig. 5 gives the classification error rates of NN^1st_two(x), NN^2nd_two(x) and NN_two(x), and the proportion of queries incorrectly classified by both NN^1st_two(x) and NN^2nd_two(x), on four real-world datasets.

The experimental results show that the classification capability of the two-layer nearest neighbors from the second-layer neighborhood is worse than that of the two-layer nearest neighbors from the first-layer neighborhood in most cases. However, the two-layer nearest neighbors composed of both often have better classification performance. In addition, the proportion of queries that are incorrectly classified by both the two-layer nearest neighbors from the first-layer neighborhood and the two-layer nearest neighbors from the second-layer neighborhood is less than the classification error rate obtained by either of them, which means that the queries incorrectly classified by the two-layer nearest neighbors from the first-layer neighborhood are very different from the queries incorrectly classified by the two-layer nearest neighbors from the second-layer neighborhood. Therefore, using them in combination can reduce the classification error rate; that is, the two-layer nearest neighbors from the second-layer neighborhood and the two-layer nearest neighbors from the first-layer neighborhood are in fact complementary in classification capability.

Then, we observe the numbers of NN^1st_two(x) and NN^2nd_two(x). Fig. 6 shows |NN_two(x)|, |NN^1st_two(x)| and |NN^2nd_two(x)| on the same four real-world datasets, where the operator |·| represents the size of a set. It should be noted that |NN_two(x)| = |NN^1st_two(x)| + |NN^2nd_two(x)|, and k refers to the first-layer neighborhood size used to measure the forward nearest neighbor relationship between the query and training instances, which is different from the number of nearest neighbors used for the classification decision in the kNN classifier.

As can be seen from Fig. 6, |NN_two(x)| may be greater than k or less than k. In other words, the number of two-layer nearest neighbors selected by the kTLNN classifier is not fixed and depends on the characteristics of the dataset. In addition, |NN^1st_two(x)| is often less than k. Since NN^1st_two(x) comes from the first-layer neighborhood, and the first-layer neighborhood is the nearest neighborhood used in the kNN classifier, the kTLNN classifier actually deletes some nearest neighbors. Moreover, |NN^2nd_two(x)| is always less than |NN^1st_two(x)|, which is reasonable because the second-layer nearest neighbors are farther away from the query and have weaker classification capability.

4.6. Comparisons with the kNN* classifier

It can be seen from Section 4.5 that the number of two-layer nearest neighbors has no clear relationship with the value of k. Therefore, in order to further verify the superiority of the kTLNN classifier, we compare it with the kNN* classifier. The kNN* classifier selects the top k* nearest neighbors using the NN rule to classify the query, where k* = |NN_two(x)|. Fig. 7 shows the error rates of the kNN* classifier and the kTLNN classifier on four real-world datasets when k varies from 1 to 20, where k refers to the first-layer neighborhood size used to measure the forward nearest neighbor relationship.

From the results shown in Fig. 7, we can see that the error rate of the proposed kTLNN classifier is smaller than that of the kNN* classifier in most cases. This is because, although the kTLNN classifier may delete some nearest neighbors closer to the query and add some other nearest neighbors farther from the query compared with the kNN* classifier, these two-layer nearest neighbors have a more positive effect on classification. Therefore, the classification performance of the kTLNN classifier is better than that of the kNN* classifier.

5. Discussions

5.1. Computational complexity

In practice, computational complexity is a key factor to be considered when designing classifiers. Given a query x and a training set T with N training instances in M classes, it can be shown that the online computation of the proposed kTLNN classifier is only a little more than that of the kNN classifier.

In the offline stage of the kTLNN classifier, the kb-nearest neighbors of each training instance can be found. Note that kb is generally greater than or equal to k, so the k-nearest neighbors of each training instance are also obtained at the same time.

In the online stage of the kTLNN classifier, it only requires a few more centroid computations, distance computations and comparisons than the kNN classifier. Table 5 shows the increased online computations of the kTLNN classifier compared to the kNN classifier.
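A minimal sketch of the offline stage described above is given below, assuming a brute-force distance matrix (a KD-tree or ball tree would serve equally well for large N). The function name is ours; it stores exactly the quantities used later, namely the k-NN index lists needed in Step 2 and the kb-th-neighbor distances needed for the backward test in Step 4 (the Nk indexes and N distances discussed in Section 5.2).

```python
import numpy as np

def precompute_offline(T, k, kb):
    """Offline stage of the kTLNN classifier (sketch): per-instance k-NN indexes and kb-th-neighbor distance."""
    D = np.linalg.norm(T[:, None, :] - T[None, :, :], axis=2)   # pairwise distances, O(N^2)
    np.fill_diagonal(D, np.inf)                                  # exclude each instance itself
    order = np.argsort(D, axis=1)
    knn_index = order[:, :k]                                     # Nk indexes stored
    kb_radius = D[np.arange(len(T)), order[:, kb - 1]]           # N distances stored
    return knn_index, kb_radius
```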

Table 5
The increased online computations of the kTLNN classifier compared to the kNN classifier.

Step     Increased online computations    Increased online computational cost
Step 1   –                                –
Step 2   Comparisons                      k² times at most
Step 3   Centroid computation             k times
         Distance computation             k times
         Comparisons                      k times
Step 4   Comparisons                      (k² + k) times at most

Fig. 4. Comparison of NN_1st(x), NN_1st(x) & NN_2nd(x), NN_ext(x) and NN_two(x) on four real-world datasets in terms of error rate (%) when k varies from 1 to 20.

Fig. 5. Complementary analysis of NN^1st_two(x) and NN^2nd_two(x) on four real-world datasets in terms of error rate (%) when k = 5.

Fig. 6. Comparison of NN^1st_two(x) and NN^2nd_two(x) when k varies from 1 to 20 on four real-world datasets.

Fig. 6. Comparison of NN 1st 2nd


t w o (x) and NN t w o (x) when k varies from 1 to 20 on four real-world datasets.

5.2. Space complexity

In addition to computational complexity, space complexity is also a very important factor that needs to be considered when designing classifiers. In this section, we analyze the space complexity of the proposed kTLNN classifier in detail. Table 6 shows the increased storage cost of the proposed kTLNN classifier compared with the kNN classifier.

Table 6
The increased storage cost of the kTLNN classifier compared to the kNN classifier.

Stage   | Step   | Increased storage type | Increased storage cost
Offline | –      | Index                  | Nk
        |        | Distance               | N
Online  | Step 1 | –                      | –
        | Step 2 | –                      | –
        | Step 3 | Centroid               | k
        |        | Distance               | k
        | Step 4 | –                      | –

The increased storage cost mainly occurs in the offline stage of the kTLNN classifier. After finding the $k_{b}$-nearest neighbors of each training instance, we need to store the distance from each training instance to its $k_{b}$-th nearest neighbor and the k-nearest neighbors of each training instance. It should be noted that only the indexes need to be stored when storing the k-nearest neighbors of each training instance. Therefore, in the offline stage, a total of Nk indexes and N distances need to be stored.

In the online stage of the kTLNN classifier, the increased storage cost is small.

In the first step of finding the first-layer nearest neighbors of the query, the space complexity is exactly the same as that of the kNN classifier when finding the k-nearest neighbors, that is, N distances need to be stored.

In the second step of finding the second-layer nearest neighbors of the query, the distances from the k nearest neighbors of each first-layer nearest neighbor to the query need to be compared with $2R_{NN^{1st}(x)}$, all of which have already been stored in the first step. Therefore, there is no increased storage cost in the second step.

In the third step of finding the extended nearest neighbors of the query, the centroid of the effective neighborhood of each first-layer nearest neighbor and the distance from each centroid to the query need to be stored. That is to say, the increased storage cost is k centroids and k distances.

In the fourth step of finding the two-layer nearest neighbors of the query, the distance from each extended nearest neighbor to the query needs to be compared with the distance from this extended nearest neighbor to its $k_{b}$-th nearest neighbor; both of these have already been stored, in the first step and in the offline stage, respectively. Therefore, there is no increased storage cost in the fourth step.

Based on the above discussions, it can be seen that although the space complexity of the proposed kTLNN classifier is increased compared with the kNN classifier, the increased storage cost is not large.
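As a concrete illustration of the offline bookkeeping described above (Nk neighbor indexes and N distances), the following minimal NumPy sketch precomputes both quantities with a brute-force loop; the loop and the helper name precompute_offline are illustrative assumptions, not the authors' implementation.

import numpy as np

def precompute_offline(X_train, k, kb):
    # Offline stage sketch: for each training instance, store the indexes of its
    # k nearest neighbors (Nk indexes in total) and the distance to its kb-th
    # nearest neighbor (N distances in total), as summarized in Table 6.
    N = X_train.shape[0]
    knn_index = np.empty((N, k), dtype=np.int64)
    kb_dist = np.empty(N)
    for i in range(N):
        d = np.linalg.norm(X_train - X_train[i], axis=1)
        order = np.argsort(d)[1:]            # exclude the instance itself
        knn_index[i] = order[:k]
        kb_dist[i] = d[order[kb - 1]]        # distance to the kb-th nearest neighbor
    return knn_index, kb_dist

The kb_dist array is what Step 4 of the online search compares against, while knn_index presumably allows Steps 2 and 3 to retrieve the k nearest neighbors of each first-layer neighbor without recomputing them; storing indexes rather than feature vectors is what keeps the offline cost at Nk integers plus N floats.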

Fig. 7. Comparison of the kNN* and kTLNN classifiers on four real-world datasets in terms of error rate (%) when k varies from 1 to 20.

6. Conclusions

In this paper, we propose a new neighbor selection method, called the two-layer nearest neighbor (TLNN) rule. The core idea of the TLNN rule is the use of two-layer neighborhood information, that is, the neighborhood of the query and the neighborhoods of all training instances in this neighborhood. On this basis, in order to ensure the reliability of the obtained two-layer nearest neighbors, we further filter each of the training instances in these neighborhoods based on the distance, the distribution relationship, and the backward nearest neighbor relationship between them and the query.

Based on the TLNN rule, we propose a k-two-layer nearest neighbor (kTLNN) classifier to select the two-layer nearest neighbors of the query, where the k-nearest neighbors are used to represent the neighborhood of the query and the neighborhoods of all training instances in this neighborhood, and the $k_{b}$-nearest neighbors are used to measure the backward nearest neighbor relationship between a training instance and the query. Then, the majority voting rule is used to make the final classification decision for the query.
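As a closing illustration of the decision stage just described, the final prediction reduces to an ordinary majority vote over the selected two-layer nearest neighbors. This is a minimal sketch with a hypothetical helper name, not the authors' implementation; ties simply fall back to Counter ordering here.

from collections import Counter

def kTLNN_predict(two_layer_idx, y_train):
    # Majority vote over the labels of the selected two-layer nearest neighbors.
    votes = Counter(y_train[j] for j in two_layer_idx)
    return votes.most_common(1)[0][0]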
The performance of the proposed kTLNN classifier is verified through extensive experiments on twenty real-world datasets. Experimental results show that the kTLNN classifier outperforms not only the kNN classifier, but also seven other state-of-the-art kNN-based classifiers. In addition, we find that the first-layer neighborhood and the second-layer neighborhood are in fact complementary in classification capability, and that the two-layer nearest neighbors, composed of some of the neighbors in these two neighborhoods, can improve on the classification performance of the nearest neighbors used in the NN rule.

Declaration of competing interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgments

This work is supported in part by the National Natural Science Foundation of China (Grant No. U1903213), the Key Science and Technology Program of Shaanxi Province, China (Grant No. 2020GY-005), the Zhejiang Provincial Commonweal Project, China (Grant No. LGF21F030002) and the Open Project of the National Laboratory of Pattern Recognition, China (Grant No. 202100033).

We also thank Dr. Wei Wang of the National Laboratory of Pattern Recognition for his insightful discussions and valuable suggestions.

