Knowledge-Based Systems
journal homepage: www.elsevier.com/locate/knosys

Y. Wang, Z. Pan, J. Dong, Knowledge-Based Systems 235 (2022) 107604
https://doi.org/10.1016/j.knosys.2021.107604

Article history: Received 26 March 2021; Received in revised form 23 August 2021; Accepted 13 October 2021; Available online 19 October 2021.

Keywords: kNN classifier; Two-layer nearest neighbor rule; First-layer neighborhood; Second-layer neighborhood; Extended neighborhood

Abstract

The k-nearest neighbor (kNN) classifier is a classical classification algorithm that has been applied in many fields. However, the performance of the kNN classifier is limited by its simple neighbor selection method, called the nearest neighbor (NN) rule, in which only the neighborhood of the query is considered when selecting the nearest neighbors of the query. In other words, the NN rule only uses one-layer neighborhood information of the query.

In this paper, we propose a new neighbor selection method based on two-layer neighborhood information, called the two-layer nearest neighbor (TLNN) rule. The neighborhood of the query and the neighborhoods of all selected training instances in this neighborhood are considered simultaneously; the two-layer nearest neighbors of the query are then determined according to the distance, distribution relationship, and backward nearest neighbor relationship between the query and all selected training instances in the above neighborhoods. In order to verify the effectiveness of the proposed TLNN rule, a k-two-layer nearest neighbor (kTLNN) classifier is proposed to measure the classification ability of the two-layer nearest neighbors.

Extensive experiments on twenty real-world datasets from the UCI and KEEL repositories show that the kTLNN classifier outperforms not only the kNN classifier but also seven other state-of-the-art NN-based classifiers.

© 2021 Elsevier B.V. All rights reserved.
… mean vector in each class instead of using the majority voting rule to make the classification decision [19]. Since then, the idea of the local mean vector has been extensively studied, and a large number of local mean-based classifiers have been proposed one after another [20–23].

In addition to the above three classical problems, a new problem of the neighbor selection method of the kNN classifier, which is called the nearest neighbor (NN) rule in this paper, needs to be further studied. According to the NN rule, the k training instances closest to the query will be selected as its k-nearest neighbors in the kNN classifier, which has the following defects.

Firstly, the similarity metric in the NN rule is too simple. Only the point-to-point distance is used to measure the similarity between the query and training instances, completely discarding the information about their distribution. Therefore, Sánchez et al. first proposed the concept of nearest centroid neighbors (NCN) and designed a k-nearest centroid neighbor (kNCN) classifier using this new NCN rule [24]. Then, researchers gave some further improvements based on the concept of NCN [25,26].

Secondly, the unilateral similarity used in the NN rule is not comprehensive enough. It only considers whether a training instance is its nearest neighbor from the viewpoint of the query, but does not consider whether the query is also its nearest neighbor from the viewpoint of the training instance. Hence, Pan et al. came up with the concept of general nearest neighbors (GNN) and developed a k-general nearest neighbor (kGNN) classifier [27]. A training instance can be selected as a general nearest neighbor as long as it is a k-nearest neighbor of the query or the query is a k-nearest neighbor of it.

Thirdly, the neighborhood structure of the query in the NN rule is too unitary. It only consists of the neighborhood of the query. As reported in an article [28] published in Nature Human Behaviour in 2019, approximately 95% of the potential predictive accuracy attainable for an individual is available within the social ties of that individual alone, without requiring the individual's data. In other words, by knowing who an individual's social ties are and what the activities of those ties are, the individual can in principle be accurately analyzed even if he is not present in the data. Inspired by this article, we infer that, in addition to the nearest neighbors of the query, the neighborhood information of these nearest neighbors also plays a very important role in the category prediction of the query. Hence, further using the neighborhoods of these nearest neighbors to enrich the neighborhood structure of the query may improve the performance of the kNN classifier.

In order to solve the above-mentioned problems in the classical NN rule used by the kNN classifier, we propose a new neighbor selection method in this paper, called the two-layer nearest neighbor (TLNN) rule. … Finally, the TLNN rule examines the extended nearest neighbors from the viewpoint of the extended nearest neighbors themselves: for any extended nearest neighbor, if the query is in its neighborhood, it is kept as a two-layer nearest neighbor in the two-layer neighborhood that is eventually used for the classification decision.

Based on the proposed TLNN rule, we propose a k-two-layer nearest neighbor (kTLNN) classifier, which firstly finds the k-two-layer nearest neighbors of the query and then uses them to make the classification decision according to the majority voting rule.

The rest of this paper is organized as follows. In Section 2, we discuss the standard kNN classifier and explain our motivations to propose a k-two-layer nearest neighbor classifier. In Section 3, we present the two-layer nearest neighbor (TLNN) rule and further propose the k-two-layer nearest neighbor (kTLNN) classifier. In order to verify the performance of our proposed kTLNN classifier, we conduct extensive experiments to compare it with the standard kNN classifier as well as seven other competitive kNN-based classifiers in Section 4. Finally, discussions and conclusions are given in Section 5 and Section 6, respectively.

2. Motivation

In this section, we review the rationale of the kNN classifier and explain our motivations to propose the kTLNN classifier.

The kNN classifier assigns the query to the class that appears most frequently among its k-nearest neighbors in the training set, according to the majority voting rule. Consider a training set $T = \{ y_i \mid y_i \in \mathbb{R}^D \}_{i=1}^{N}$ with N training instances in a D-dimensional space, and let $C = \{ c_i \mid c_i \in \{w_1, w_2, \ldots, w_M\} \}_{i=1}^{N}$ be the class label set corresponding to T, where M represents the number of classes. For a given query x, the kNN classifier firstly calculates the Euclidean distance between the query x and each training instance $y_i$ in T by (1):

$d(x, y_i) = \sqrt{(x - y_i)^T (x - y_i)}, \quad 1 \le i \le N$  (1)

Then, the k-nearest neighbors of x are selected according to the ascending order of the Euclidean distances, denoted as $NN_k(x) = \{ nn_x^i \mid nn_x^i \in T \}_{i=1}^{k}$, where $k \le N$ and $d(x, nn_x^1) \le d(x, nn_x^2) \le \cdots \le d(x, nn_x^k)$. The class labels of the k-nearest neighbors are denoted as $C_k(x) = \{ c_x^i \mid c_x^i \in \{w_1, w_2, \ldots, w_M\} \}_{i=1}^{k}$.

Finally, the kNN classifier determines the class label $c_x$ of x according to the majority voting rule, as given in (2):

$c_x = \arg\max_{w_j} \sum_{nn_x^i \in NN_k(x)} \delta(w_j = c_x^i), \quad 1 \le i \le k, \; 1 \le j \le M$  (2)

where $NN_k(x)$ is the k-nearest neighborhood of x.
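To make the decision rule above concrete, here is a minimal NumPy sketch of Eqs. (1) and (2); the function name and array layout are our own assumptions rather than code from the paper.

```python
import numpy as np

def knn_predict(x, T, C, k):
    """Classify query x with the standard NN rule and majority voting.

    T is an (N, D) array of training instances and C an (N,) array of
    their class labels.
    """
    d = np.sqrt(((T - x) ** 2).sum(axis=1))   # Eq. (1): Euclidean distances
    nn_idx = np.argsort(d)[:k]                # NN_k(x): the k closest instances
    labels, votes = np.unique(C[nn_idx], return_counts=True)
    return labels[np.argmax(votes)]           # Eq. (2): majority vote
```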
3. The proposed TLNN rule and the kTLNN classifier

In this section, we firstly introduce the two-layer nearest neighbor (TLNN) rule and then propose the k-two-layer nearest neighbor (kTLNN) classifier.

3.1. The TLNN rule

The nearest neighbors are the training instances closest to the query in the NN rule. However, as discussed in Section 2, the NN rule only uses the one-layer neighborhood information of the query. The TLNN rule therefore first takes the k-nearest neighbors of the query as its first-layer neighborhood, which is given in Definition 1.

Definition 1. Given a query x, the first-layer neighborhood of x in the training set T, represented by $NN_{1st}(x)$, is given in (3):

$NN_{1st}(x) = \{ nn_{1st,x} \mid nn_{1st,x} \in NN_k(x) \}$  (3)

where $nn_{1st,x}$ is a first-layer nearest neighbor of x.

From Definition 1, it can be seen that the first-layer neighborhood $NN_{1st}(x)$ of x is actually the k-nearest neighborhood of x used for the classification decision in the NN rule. The proposed TLNN rule attempts to generalize this neighborhood and further restrict it to find a more suitable neighborhood for the classification decision.
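In code, Definition 1 is nothing more than the ordinary k-nearest-neighbor search; the small sketch below also returns the layer's radius, which the later filtering tests appear to build on (the helper and variable names are ours, not the paper's).

```python
import numpy as np

def first_layer(x, T, k):
    """NN_1st(x) per Definition 1: the k-nearest neighborhood NN_k(x)."""
    d = np.linalg.norm(T - x, axis=1)
    idx = np.argsort(d)[:k]       # indexes of NN_1st(x)
    return idx, d[idx[-1]]        # neighbor indexes and the layer radius R
```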
Fig. 2. A brief illustration of the basic idea of the TLNN rule. y_1, y_2 and y_3 represent the three nearest neighbors of query x; y_{1,1}, y_{1,2}, y_{1,3}, y_{2,1}, y_{2,2}, y_{2,3}, y_{3,1}, y_{3,2} and y_{3,3} represent the three nearest neighbors of y_1, y_2 and y_3, respectively.
Firstly, consider the k-nearest neighborhoods of each first-layer nearest neighbor of x to determine the second-layer neighborhood of x. Those training instances that are in the k-nearest neighborhood of a first-layer nearest neighbor and closer to x constitute the effective neighborhood of this first-layer nearest neighbor, and the effective neighborhoods of all first-layer nearest neighbors together constitute the second-layer neighborhood of x, which is given in Definition 2.

Definition 2. Given a query x, the second-layer neighborhood of x in the training set T, represented by $NN_{2nd}(x)$, is given in (4):

$NN_{2nd}(x) = \{ nn_{2nd,x} \mid nn_{2nd,x} \in NN_{eff}(nn_{1st,x}), \; nn_{1st,x} \in NN_{1st}(x) \}$  (4)

… first-layer nearest neighbors according to the NN rule. The size of the second-layer neighborhood of x cannot exceed twice the size of its first-layer neighborhood, which ensures that the second-layer nearest neighbors will not have poor classification capability, because they are not very far from x.

Secondly, consider the distribution of the second-layer neighborhood of x, specifically the distribution of the effective neighborhoods of each first-layer nearest neighbor, to determine the extended neighborhood of x. Those effective neighborhoods with a distribution close to x will constitute the extended neighborhood of x together with the first-layer neighborhood, which is given in Definition 3.

Fig. 3. An example of using the TLNN rule to obtain the two-layer nearest neighbors of query x (k = kb = 4).
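A minimal sketch of Definition 2, under stated assumptions: we read the "closer to x" test that defines an effective neighborhood NN_eff as d(x, y) <= 2R, following the 2R_{NN_1st(x)} comparison mentioned in Section 5.1, and the handling of overlap between the two layers is not shown in the surviving text, so duplicates are simply skipped here.

```python
import numpy as np

def second_layer(x, T, k):
    """Return NN_1st(x) and a candidate NN_2nd(x) per Definition 2."""
    d_x = np.linalg.norm(T - x, axis=1)
    first = np.argsort(d_x)[:k]            # NN_1st(x)
    R = d_x[first[-1]]                     # first-layer radius
    second = []
    for i in first:
        d_i = np.linalg.norm(T - T[i], axis=1)
        d_i[i] = np.inf                    # exclude the neighbor itself
        for j in np.argsort(d_i)[:k]:      # k-NN of a first-layer neighbor
            if d_x[j] <= 2.0 * R and j not in first and j not in second:
                second.append(int(j))      # effective-neighborhood member
    return first, second
```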
4. Experiments

… the two-layer nearest neighbors from the second-layer neighborhood, NN_two^2nd(x), and discuss the superiority of the selected two-layer nearest neighbors NN_two(x). Finally, we compare the proposed kTLNN classifier with the kNN* classifier.

4.1. Initialization of the parameter kb

As mentioned in Section 3, k and kb are the neighborhood sizes used to measure the forward and backward nearest neighbor relationships between the query and training instances, respectively. The meaning of k is the same as that in the kNN classifier, but kb is a new parameter, which needs to be initialized before verifying the performance of our proposed kTLNN classifier.

In this subsection, we evaluate the classification performance of the kTLNN classifier with six different relationships of kb and k, i.e., kb = 1.0k, kb = 1.2k, kb = 1.4k, kb = 1.6k, kb = 1.8k and kb = 2.0k. Note that the cases where kb is smaller than k are not considered, since strict restrictions on the backward nearest neighbor relationship may result in few selected two-layer nearest neighbors, which is unfavorable to classification.

The final initialization of kb is determined by comparing the optimal performance of the kTLNN classifier under these six different relationships of kb and k on twenty real-world datasets. Note that the optimal performance is the minimum error rate of the kTLNN algorithm over k = 1, 2, ..., 20.

Table 2 shows the minimum error rate of the kTLNN classifier over k = 1, 2, ..., 20 under the six different relationships of kb and k on twenty real-world datasets; the value of k corresponding to the minimum error rate is given in parentheses.

It can be seen from Table 2 that the best classification performance of the kTLNN classifier under the six different relationships of kb and k is similar. The minimum error rate of the kTLNN classifier under the six settings of kb is not much different on most datasets, and is even the same on the datasets Segment, Tae, Wine and Winequality_red. Note that on these four datasets, the equal minimum error rates under the six settings of kb are all obtained when k = 1, since the two-layer nearest neighbor selected according to the TLNN rule under the six settings of kb is generally the traditional nearest neighbor selected according to the NN rule when k = 1.

However, the setting of kb = 1.4k still wins with a slight advantage according to both the per-dataset results and the average results on the twenty datasets. The kTLNN classifier has the lowest minimum error rate on 8, 9, 11, 7, 6 and 5 datasets under the six settings of kb, respectively, which means that the setting of kb = 1.4k achieves the best performance according to the per-dataset results. In addition, the average results on the twenty datasets also indicate that the kTLNN algorithm obtains the best classification performance under the setting of kb = 1.4k. As a result, we choose the setting of kb = 1.4k in the following experiments.
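The sweep of Section 4.1 can be written schematically as follows. Here `evaluate_error(k, kb)` is a placeholder for a cross-validated error estimate of the kTLNN classifier on one dataset, and rounding kb = m * k up to an integer is our assumption, since the paper does not show its rounding rule.

```python
import numpy as np

def pick_kb_multiplier(evaluate_error, multipliers=(1.0, 1.2, 1.4, 1.6, 1.8, 2.0)):
    """For each k_b setting, record the minimum error over k = 1..20."""
    results = {}
    for m in multipliers:
        errs = [evaluate_error(k, int(np.ceil(m * k))) for k in range(1, 21)]
        best_k = int(np.argmin(errs)) + 1
        results[m] = (min(errs), best_k)   # minimum error rate and its k
    return results
```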
4.2. Comparisons with the standard kNN classifier

To preliminarily demonstrate the classification performance of our proposed kTLNN classifier, we first compare it with the standard kNN classifier on twenty real-world datasets in terms of error rate.

Table 3 shows the error rates of the standard kNN classifier and the proposed kTLNN classifier when k = 1, 3, 5, 7, 9, 11, 13, and 15 on twenty real-world datasets, where kb = 1.4k is set according to the experimental results in Section 4.1.
Table 2
The minimum error rate (%) of kTLNN classifier when k = 1, 2, . . . , 20 under six different relationships of kb and k
on twenty real-world datasets.
Datasets kb = 1.0k kb = 1.2k kb = 1.4k kb = 1.6k kb = 1.8k kb = 2.0k
Breast 24.23(14) 24.23(11) 24.94(10) 24.59(9) 23.47(8) 24.19(7)
Dermatology 8.35(4) 8.37(2) 8.37(2) 8.92(1) 8.92(1) 8.92(1)
Dna 11.95(20) 12.10(20) 11.60(20) 11.15(20) 11.35(20) 11.35(19)
German 28.60(20) 28.00(19) 28.40(16) 28.30(15) 28.50(15) 28.40(20)
Glass 27.05(7) 26.25(2) 26.25(2) 27.05(12) 27.16(2) 27.16(2)
Heart 19.27(14) 18.72(20) 17.66(20) 18.63(17) 18.31(14) 18.31(13)
Ionosphere 12.22(18) 11.02(18) 10.46(19) 10.19(17) 10.74(15) 10.46(15)
Landsat 10.20(3) 10.25(11) 10.25(10) 10.35(9) 10.35(7) 10.40(9)
Msplice 6.71(20) 6.80(20) 6.68(20) 7.25(20) 7.59(20) 7.94(19)
Optdigits 0.93(17) 0.96(6) 0.96(12) 1.00(5) 0.96(6) 0.98(8)
Phoneme 9.31(1) 9.29(2) 9.29(2) 9.33(1) 9.33(1) 9.33(1)
Ring 12.95(20) 12.47(20) 12.11(20) 11.72(20) 11.54(20) 11.24(20)
Segment 3.55(1) 3.55(1) 3.55(1) 3.55(1) 3.55(1) 3.55(1)
Sonar 16.34(4) 14.91(3) 14.44(3) 14.44(3) 15.86(3) 15.86(3)
Tae 42.23(1) 42.23(1) 42.23(1) 42.23(1) 42.23(1) 42.23(1)
Texture 0.76(6) 0.80(2) 0.80(2) 0.84(5) 0.84(2) 0.84(2)
Thyroid 5.79(9) 5.76(8) 5.76(6) 5.79(6) 5.78(5) 5.81(5)
Vehicle 28.13(17) 28.14(7) 28.15(8) 28.15(7) 28.05(6) 28.28(7)
Wine 25.28(1) 25.28(1) 25.28(1) 25.28(1) 25.28(1) 25.28(1)
Winequality_red 39.65(1) 39.65(1) 39.65(1) 39.65(1) 39.65(1) 39.65(1)
Average 16.67 16.44 16.34 16.42 16.47 16.51
A bold value indicates that the error rate is lower than that of the standard kNN classifier.

From Table 3, we can see that when k = 1, the error rates of the proposed kTLNN classifier and the kNN classifier are basically the same on all twenty datasets; that is, the classification performance of the two classifiers is basically the same in the case of k = 1. When k > 1, the error rate of the proposed kTLNN classifier is lower than that of the kNN classifier in most cases. On very few datasets, the error rate of the kTLNN classifier is higher than that of the kNN classifier for certain values of k, but a lower error rate is obtained again when k takes another value. In other words, the kTLNN classifier achieves significantly better performance on most datasets, and similar performance to the kNN classifier on very few datasets. In addition, the average error rate of the kTLNN classifier on the twenty datasets is lower than that of the kNN classifier, indicating that its overall classification performance is better.

Moreover, it can be noticed that the error rate of the proposed kTLNN classifier is higher than that of the kNN classifier mostly when the value of k is small. This is because, compared with the benefits of adding some farther nearest neighbors, the loss of deleting some closer nearest neighbors may be greater when the value of k is small.

4.3. Comparisons with seven competitive kNN-based classifiers

To further verify the classification performance of the proposed kTLNN classifier, we compare it with the kNN [1], WKNN [15], PNN [20], MKNN [29], kGNN [27], CFKNN [32], FRNN [33] and HBKNN [34] classifiers on twenty real-world datasets in terms of error rate.

WKNN is a famous distance-weighted k-nearest neighbor classifier, where larger weights are given to the nearest neighbors closer to the query. PNN utilizes the weighted average distance from the k-nearest neighbors to the query in each class as the distance from the pseudo nearest neighbor to the query for the classification decision. MKNN and kGNN take the neighborhood information of the training instances into account. CFKNN aims to obtain neighbors containing less redundant information according to the representation-based distance. FRNN is a famous fuzzy-based k-nearest neighbor classifier, which can obtain richer class confidence values with a fuzzy-rough ownership function. HBKNN makes use of the fuzzy strategy and the local and global information of the query.

In order to make a fair comparison, we use the ten-fold cross-validation method to obtain the optimized value of k. Note that the FRNN and HBKNN classifiers do not need to optimize the value of k, because their classification results do not depend on the choice of k. Moreover, kb = 1.4k is set as before.

The classification error rates of each classifier with the optimized value of k on twenty real-world datasets are shown in Table 4, and the ranking of the error rate for each classifier is also given in parentheses. Note that the results of the standard kNN classifier are also presented in Table 4 as a baseline.

As can be seen from Table 4, the proposed kTLNN classifier ranks first on seven datasets and second on five datasets, and achieves the highest average ranking and the lowest average error rate on the twenty datasets, which indicates that the proposed kTLNN classifier has the best classification performance compared with the standard kNN classifier and all seven other competitive kNN-based classifiers.
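One plausible realization of the k-optimization protocol above is ten-fold cross-validation over k = 1, ..., 20; the paper does not name its tooling, so the use of scikit-learn here is our own choice.

```python
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsClassifier

def tuned_knn_error(X, y):
    """Cross-validated kNN baseline: returns (error rate, optimized k)."""
    search = GridSearchCV(KNeighborsClassifier(),
                          param_grid={"n_neighbors": list(range(1, 21))},
                          cv=10, scoring="accuracy")
    search.fit(X, y)
    return 1.0 - search.best_score_, search.best_params_["n_neighbors"]
```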
4.4. Analysis of NN_two(x)

After verifying the superiority of our proposed kTLNN classifier over the standard kNN classifier and other competitive kNN-based classifiers, we further conduct a series of analyses on the properties of the kTLNN classifier in the next few subsections.

As described in Algorithm 1, the acquisition of the two-layer nearest neighbors needs four steps: (1) find the first-layer nearest neighbors; (2) find the second-layer nearest neighbors; (3) find the extended nearest neighbors; (4) find the two-layer nearest neighbors. Therefore, to verify the effectiveness of the introduced two-layer nearest neighbors in the classification process, we compare the classification error rates of the neighbors obtained at each step. Fig. 4 shows the error rates of the first-layer nearest neighbors NN_1st(x), the first-layer and second-layer nearest neighbors NN_1st(x) & NN_2nd(x), the extended nearest neighbors NN_ext(x), and the two-layer nearest neighbors NN_two(x) when k varies from 1 to 20 on four real-world datasets. In addition, the optimal k for each dataset is given in parentheses for clarity.

The classification error rate of the first-layer nearest neighbors is actually the error rate of the kNN classifier. After the introduction of the second-layer nearest neighbors, the classification error rate sometimes increases.
Table 3
Comparisons with the standard kNN classifier on twenty real-world datasets in terms of error rate (%) when k =
1, 3, 5, 7, 9, 11, 13, 15.
Dataset k=1 k=3 k=5 k=7
kNN kTLNN kNN kTLNN kNN kTLNN kNN kTLNN
Breast 31.77 31.77 31.73 32.17 27.04 27.76 27.76 25.97
Dermatology 8.92 8.92 11.70 9.48 11.96 10.02 13.07 11.96
Dna 27.40 27.40 22.70 23.60 19.75 20.85 18.05 18.05
German 33.60 33.70 31.60 32.60 31.90 32.70 30.20 30.70
Glass 28.07 28.07 32.95 29.60 30.68 30.40 32.84 28.58
Heart 22.41 22.73 19.69 22.18 21.95 22.82 22.04 20.01
Ionosphere 12.96 12.96 15.09 12.69 15.56 12.78 16.39 12.96
Landsat 11.25 11.25 10.75 11.20 11.20 10.85 12.00 10.85
Msplice 23.94 23.91 21.14 17.61 18.46 13.14 18.30 11.81
Optdigits 1.19 1.19 1.07 1.07 1.23 1.01 1.33 1.05
Phoneme 9.31 9.33 10.92 10.12 11.43 10.64 12.19 11.21
Ring 25.18 25.18 28.31 22.84 30.55 21.05 32.65 18.96
Segment 3.55 3.55 4.46 3.68 5.11 3.90 6.15 5.06
Sonar 17.29 17.29 18.82 14.44 17.72 17.29 22.58 21.15
Tae 42.23 42.23 56.61 56.16 59.46 53.04 59.46 59.29
Texture 0.87 0.87 1.02 0.87 1.27 0.84 1.56 0.98
Thyroid 6.97 7.00 5.90 6.24 6.00 5.83 6.08 5.76
Vehicle 31.10 31.10 29.33 28.98 29.46 28.60 29.23 29.10
Wine 25.28 25.28 27.64 28.19 31.04 31.46 31.46 29.72
Winequality_red 39.65 39.65 44.22 44.53 48.22 46.28 49.29 46.97
Average 20.15 20.17 21.28 20.41 21.50 20.06 22.13 20.01
Dataset k=9 k = 11 k = 13 k = 15
kNN kTLNN kNN kTLNN kNN kTLNN kNN kTLNN
Breast 26.64 26.01 25.97 27.09 27.04 27.04 24.94 25.61
Dermatology 14.46 13.64 16.68 15.60 17.79 17.53 18.09 19.79
Dna 16.65 16.55 16.05 14.25 15.45 13.60 15.40 13.20
German 30.80 30.70 30.80 30.00 29.10 29.20 29.20 28.70
Glass 38.75 32.84 37.22 28.13 36.76 27.95 36.02 27.95
Heart 20.66 18.82 20.43 18.82 20.85 20.11 20.34 19.46
Ionosphere 16.39 12.96 16.39 12.69 16.11 12.22 16.30 11.85
Landsat 12.05 10.35 12.50 10.60 13.00 11.05 13.10 10.55
Msplice 16.47 10.77 15.81 9.04 14.84 8.38 14.65 8.06
Optdigits 1.39 1.03 1.44 1.01 1.44 1.00 1.62 0.98
Phoneme 12.82 11.93 13.25 12.84 13.45 12.86 13.71 12.99
Ring 33.96 17.51 35.03 16.01 36.08 14.76 36.86 13.97
Segment 6.88 5.24 7.32 6.19 8.05 7.10 8.35 7.32
Sonar 28.87 24.44 33.21 26.34 34.64 26.87 33.68 29.30
Tae 61.34 63.21 60.09 60.71 65.09 59.29 65.09 61.16
Texture 1.73 1.15 1.87 1.22 2.15 1.27 2.22 1.36
Thyroid 6.25 5.92 6.28 5.90 6.44 5.97 6.47 5.93
Vehicle 29.91 28.27 31.68 28.96 30.87 30.15 30.52 29.20
Wine 29.79 30.83 29.86 30.35 31.46 26.39 29.24 27.01
Winequality_red 50.54 46.22 49.03 45.72 49.22 45.85 48.72 46.91
Average 22.82 20.42 23.05 20.07 23.49 19.93 23.23 20.07
Table 4
Comparisons with standard kNN classifier and seven competitive kNN-based classifiers on twenty real-world datasets in terms of
error rate (%) with the optimized value of k.
Dataset kNN WKNN PNN MKNN kGNN CFKNN FRNN HBKNN kTLNN
Breast 24.19(2) 25.26(5) 25.26(5) 24.59(3) 23.83(1) 28.16(9) 26.69(7) 28.11(8) 24.94(4)
Dermatology 8.92(5) 8.92(5) 9.18(7) 7.52(3) 7.24(2) 9.74(8) 6.94(1) 13.91(9) 8.37(4)
Dna 14.65(3) 15.65(7) 15.65(7) 12.20(2) 15.00(5) 15.15(6) 65.60(9) 14.65(3) 11.60(1)
German 28.50(3) 29.20(8) 29.10(7) 28.90(4) 28.40(1) 28.90(4) 30.10(9) 28.90(4) 28.40(1)
Glass 28.07(6) 27.61(5) 26.25(1) 27.33(4) 26.42(3) 46.48(9) 41.31(8) 30.40(7) 26.25(1)
Heart 18.95(6) 19.46(7) 18.40(3) 19.69(9) 18.86(5) 18.72(4) 17.57(1) 19.46(7) 17.66(2)
Ionosphere 12.96(5) 12.96(5) 12.96(5) 11.57(3) 11.67(4) 5.93(1) 60.56(9) 16.57(8) 10.46(2)
Landsat 10.75(5) 10.10(2) 10.20(3) 10.00(1) 10.90(6) 40.50(9) 20.95(8) 12.10(7) 10.25(4)
Msplice 13.01(4) 15.81(8) 15.18(7) 7.05(2) 10.77(3) 14.05(5) 32.22(9) 14.46(6) 6.68(1)
Optdigits 1.07(6) 0.98(3) 0.98(3) 0.91(1) 1.00(5) 1.39(8) 90.07(9) 1.48(7) 0.96(2)
Phoneme 9.31(4) 9.31(4) 9.23(2) 9.21(1) 9.31(4) 26.87(9) 21.67(8) 9.55(7) 9.29(3)
Ring 25.18(4) 25.18(4) 25.18(4) 13.05(2) 24.31(3) 36.36(7) 50.62(9) 36.72(8) 12.11(1)
Segment 3.55(5) 3.51(4) 3.42(2) 3.55(5) 3.42(2) 21.56(9) 3.33(1) 4.63(8) 3.55(5)
Sonar 17.29(6) 16.29(4) 15.34(2) 16.77(5) 18.25(8) 17.67(7) 15.39(3) 21.63(9) 14.44(1)
Tae 42.23(2) 42.23(2) 44.11(7) 42.23(2) 43.48(6) 56.79(9) 49.29(8) 41.16(1) 42.23(2)
Texture 0.87(5) 0.84(4) 0.80(1) 0.82(3) 0.87(5) 5.22(8) 13.09(9) 1.58(7) 0.80(1)
Thyroid 5.90(6) 5.82(5) 5.75(3) 5.72(2) 5.71(1) 10.19(9) 7.42(8) 6.08(7) 5.76(4)
Vehicle 28.61(7) 28.26(5) 28.26(5) 27.55(3) 26.96(2) 23.06(1) 32.18(9) 30.04(8) 28.15(4)
Wine 25.28(2) 25.28(2) 25.28(2) 25.28(2) 25.28(2) 29.79(9) 14.10(1) 26.39(8) 25.28(2)
Winequality_red 39.65(4) 39.03(3) 38.02(2) 39.65(4) 40.03(7) 55.85(9) 41.65(8) 36.84(1) 39.65(4)
Average ranking 4.50 4.60 3.55 3.05 3.75 7.00 6.70 6.50 2.45
Average error rate 17.95 18.08 17.93 16.68 17.58 24.62 32.04 19.73 16.34
Since there are many second-layer nearest neighbors (at most k²), it is easy to introduce some second-layer nearest neighbors that are far from the query and have poor classification capability, which has a negative impact on the classification. However, after filtering the first-layer and second-layer nearest neighbors, the classification error rate of the obtained extended nearest neighbors decreases significantly, and the final classification error rate of the two-layer nearest neighbors not only decreases greatly but is also lower than the classification error rate of the original first-layer nearest neighbors. That is to say, although the simple addition of second-layer nearest neighbors may result in a decrease in classification performance, the classification performance of the obtained neighbors can be continuously improved step by step through the filtering process, and is ultimately better than the classification performance of the nearest neighbors in the kNN classifier.
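Putting the four steps together, the following is a compact sketch of the selection procedure as we read it from the surviving text; it is not the paper's Algorithm 1 verbatim. In particular, the step-2 "closer to x" test is taken to be d(x, y) <= 2R with R the radius of NN_1st(x) (cf. the 2R_{NN_1st(x)} comparison in Section 5.1), step 3 keeps an effective neighborhood whose centroid (over at most k + 1 instances) is at least as close to x as its first-layer neighbor, and step 4 keeps an extended neighbor only if x falls within its kb-nearest neighborhood.

```python
import numpy as np

def two_layer_neighbors(x, T, k, kb):
    """Sketch of the four-step TLNN selection (assumptions noted above)."""
    d_x = np.linalg.norm(T - x, axis=1)
    first = list(np.argsort(d_x)[:k])             # Step 1: NN_1st(x)
    R = d_x[first[-1]]                            # first-layer radius
    extended = list(first)                        # NN_ext(x) starts from the first layer
    for i in first:
        d_i = np.linalg.norm(T - T[i], axis=1)
        d_i[i] = np.inf                           # exclude the neighbor itself
        eff = [j for j in np.argsort(d_i)[:k]
               if d_x[j] <= 2.0 * R]              # Step 2: effective neighborhood
        if not eff:
            continue
        centroid = T[eff + [i]].mean(axis=0)      # Step 3: distribution test
        if np.linalg.norm(centroid - x) <= d_x[i]:
            extended += [j for j in eff if j not in extended]
    two_layer = []
    for j in extended:                            # Step 4: backward k_b-NN test
        d_j = np.linalg.norm(T - T[j], axis=1)
        d_j[j] = np.inf
        if d_x[j] <= np.sort(d_j)[kb - 1]:        # is x within its k_b-th radius?
            two_layer.append(j)
    return two_layer                              # NN_two(x); its size may differ from k
```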
4.5. Analysis of NN_two^1st(x) and NN_two^2nd(x)

It can be known from the TLNN rule that the superiority of the kTLNN classifier mainly comes from the use of two-layer neighborhood information, that is, the two-layer nearest neighbors from the first-layer neighborhood, NN_two^1st(x), and the two-layer nearest neighbors from the second-layer neighborhood, NN_two^2nd(x). Therefore, in order to further illustrate the necessity of introducing the two-layer neighborhood information for the selection of two-layer nearest neighbors, we analyze NN_two^1st(x) and NN_two^2nd(x) in detail.

First, we analyze the complementarity of NN_two^1st(x) and NN_two^2nd(x) in classification capability. Fig. 5 gives the classification error rates of NN_two^1st(x), NN_two^2nd(x) and NN_two(x), and the proportion of queries incorrectly classified by both NN_two^1st(x) and NN_two^2nd(x), on four real-world datasets.

Experimental results show that the classification capability of the two-layer nearest neighbors from the second-layer neighborhood is worse than that of the two-layer nearest neighbors from the first-layer neighborhood in most cases. However, the two-layer nearest neighbors composed of both often have better classification performance. In addition, the proportion of queries that are incorrectly classified by both is less than the classification error rate obtained by either of them alone, which means that the queries incorrectly classified by the two-layer nearest neighbors from the first-layer neighborhood are very different from those incorrectly classified by the two-layer nearest neighbors from the second-layer neighborhood. Therefore, the combined use of them can reduce the classification error rate; that is, the two-layer nearest neighbors from the two neighborhoods are in fact complementary in classification capability.

Then, we observe the numbers of NN_two^1st(x) and NN_two^2nd(x). Fig. 6 shows |NN_two(x)|, |NN_two^1st(x)| and |NN_two^2nd(x)| on the same four real-world datasets, where the operator |·| represents the size of a set. It should be noted that |NN_two(x)| = |NN_two^1st(x)| + |NN_two^2nd(x)|, and that k refers to the first-layer neighborhood size used to measure the forward nearest neighbor relationship between the query and training instances, which is different from the number of nearest neighbors used for the classification decision in the kNN classifier.

As can be seen from Fig. 6, |NN_two(x)| may be greater than k or less than k. In other words, the number of two-layer nearest neighbors selected by the kTLNN classifier is uncertain and depends on the characteristics of the dataset. In addition, |NN_two^1st(x)| is often less than k. Since NN_two^1st(x) comes from the first-layer neighborhood, and the first-layer neighborhood is the nearest neighborhood used in the kNN classifier, the kTLNN classifier actually deletes some nearest neighbors. Moreover, |NN_two^2nd(x)| is always less than |NN_two^1st(x)|, which is reasonable because the second-layer nearest neighbors are farther away from the query and have weaker classification capability.

4.6. Comparisons with the kNN* classifier

It can be seen from Section 4.5 that the number of two-layer nearest neighbors has no clear relationship with the value of k. Therefore, in order to further verify the superiority of the kTLNN classifier, we compare it with the kNN* classifier. The kNN* classifier refers to selecting the top k* nearest neighbors using the NN rule to classify the query, where k* = |NN_two(x)|. Fig. 7 shows the error rates of the kNN* classifier and the kTLNN classifier on four real-world datasets when k varies from 1 to 20, where k refers to the first-layer neighborhood size used to measure the forward nearest neighbor relationship.

From the results shown in Fig. 7, we can see that the error rate of the proposed kTLNN classifier is smaller than that of the kNN* classifier in most cases. This is because, although the kTLNN classifier may delete some nearest neighbors closer to the query and add some other nearest neighbors farther from the query compared with the kNN* classifier, these two-layer nearest neighbors have a more positive effect on classification. Therefore, the classification performance of the kTLNN classifier is better than that of the kNN* classifier.
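For the kNN* baseline, only the neighborhood size is borrowed from the TLNN rule; the neighbors themselves come from the plain NN rule. A minimal sketch, reusing the hypothetical two_layer_neighbors helper sketched in Section 4.4:

```python
import numpy as np

def knn_star_predict(x, T, C, k, kb, two_layer_neighbors):
    """kNN*: vote among the top k* ordinary nearest neighbors,
    where k* = |NN_two(x)| is taken from the TLNN selection."""
    k_star = max(len(two_layer_neighbors(x, T, k, kb)), 1)
    d = np.linalg.norm(T - x, axis=1)
    top = np.argsort(d)[:k_star]                  # plain NN rule with k = k*
    labels, votes = np.unique(C[top], return_counts=True)
    return labels[np.argmax(votes)]
```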
5. Discussions

5.1. Computational complexity

In practice, computational complexity is a key factor to be considered when designing classifiers. Given a query x and a training set T with N training instances in M classes, it can be shown that the online computation of the proposed kTLNN classifier is only a little more than that of the kNN classifier.

In the offline stage of the kTLNN classifier, the kb-nearest neighbors of each training instance can be found. Note that kb is generally greater than or equal to k, so the k-nearest neighbors of each training instance are obtained at the same time.

In the online stage, the kTLNN classifier only needs a few more centroid computations, distance computations and comparisons than the kNN classifier. Table 5 shows the increased online computations of the kTLNN classifier compared to the kNN classifier.

In the first step, finding the first-layer nearest neighbors of the query, the computational complexity is exactly the same as that of the kNN classifier finding the k-nearest neighbors, that is, N distance computations and one sorting operation involving N elements.

In the second step, finding the second-layer nearest neighbors of the query, we need to compare the distances between the k nearest neighbors of each first-layer nearest neighbor and the query with 2R_{NN_1st(x)} (twice the radius of the first-layer neighborhood), which requires at most k² comparisons.

In the third step, finding the extended nearest neighbors of the query, we need to compare the distance between the centroid of the effective neighborhood of each first-layer nearest neighbor and the query with the distance between this first-layer nearest neighbor and the query, which requires k centroid computations, k distance computations and k comparisons, where each centroid computation involves at most (k + 1) training instances (including the first-layer nearest neighbor itself).

In the fourth step, finding the two-layer nearest neighbors of the query, we need to compare the distance between each extended nearest neighbor and the query with the distance between this extended nearest neighbor and its kb-th nearest neighbor. In the worst case, all effective neighborhoods of the first-layer nearest neighbors do not overlap and all first-layer and second-layer nearest neighbors are kept as extended nearest neighbors; then (k² + k) comparisons are needed, but in practice the number is much smaller than (k² + k).

Based on the above discussions, although the online computational complexity of the proposed kTLNN classifier is increased compared with the kNN classifier, the increased computations are not significant since the value of k is usually very small.

Fig. 4. Comparison of NN_1st(x), NN_1st(x) & NN_2nd(x), NN_ext(x) and NN_two(x) on four real-world datasets in terms of error rate (%) when k varies from 1 to 20.

Table 5
The increased online computations of the kTLNN classifier compared to the kNN classifier.
Step    Increased online computations    Increased online computational cost
Step 1  –                                –
Step 2  Comparisons                      at most k²
Step 3  Centroid computations            k
        Distance computations            k
        Comparisons                      k
Step 4  Comparisons                      at most (k² + k)
5.2. Space complexity

In addition to computational complexity, space complexity is also a very important factor that needs to be considered when designing classifiers. In this section, we analyze the space complexity of the proposed kTLNN classifier in detail. Table 6 shows the increased storage cost of the proposed kTLNN classifier compared with the kNN classifier.

Table 6
The increased storage cost of the kTLNN classifier compared to the kNN classifier.
Stage    Step    Increased storage type    Increased storage cost
Offline  –       Index                     Nk
                 Distance                  N
Online   Step 1  –                         –
         Step 2  –                         –
         Step 3  Centroid                  k
                 Distance                  k
         Step 4  –                         –

The increased storage cost mainly occurs in the offline stage of the kTLNN classifier. After finding the kb-nearest neighbors of each training instance, we need to store the distance from each training instance to its kb-th nearest neighbor and the k-nearest neighbors of each training instance. It should be noted that only the indexes need to be stored when storing the k-nearest neighbors of each training instance. Therefore, in the offline stage, a total of Nk indexes and N distances need to be stored.

In the online stage of the kTLNN classifier, the increased storage cost is small.

In the first step, finding the first-layer nearest neighbors of the query, the space complexity is exactly the same as that of the kNN classifier finding the k-nearest neighbors, that is, N distances need to be stored.

In the second step, finding the second-layer nearest neighbors of the query, the distances from the k nearest neighbors of each first-layer nearest neighbor to the query need to be compared with 2R_{NN_1st(x)}; these distances have already been stored in the first step. Therefore, there is no increased storage cost in the second step.

In the third step, finding the extended nearest neighbors of the query, the centroids of the effective neighborhoods of each first-layer nearest neighbor and the distances from these centroids to the query need to be stored. That is to say, the increased storage cost is k centroids and k distances.

In the fourth step, finding the two-layer nearest neighbors of the query, the distance from each extended nearest neighbor to the query needs to be compared with the distance from this extended nearest neighbor to its kb-th nearest neighbor; both have been stored in the first step and the offline stage. Therefore, there is no increased storage cost in the fourth step.

Based on the above discussions, it can be seen that although the space complexity of the proposed kTLNN classifier is increased compared with the kNN classifier, the increased storage cost is not large.
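The offline bookkeeping described above is easy to sketch: per training instance we cache the k neighbor indexes and the distance to the kb-th neighbor, giving exactly the Nk indexes and N distances of Table 6. The brute-force loop is our own simplification; any exact k-NN search would do.

```python
import numpy as np

def precompute_offline(T, k, kb):
    """Offline stage of Section 5.2: cache each training instance's
    k-nearest-neighbor indexes (Nk integers) and the distance to its
    k_b-th nearest neighbor (N floats)."""
    N = len(T)
    nn_idx = np.empty((N, k), dtype=int)
    kb_dist = np.empty(N)
    for i in range(N):
        d = np.linalg.norm(T - T[i], axis=1)
        d[i] = np.inf                    # a point is not its own neighbor
        order = np.argsort(d)
        nn_idx[i] = order[:k]            # indexes of the k nearest neighbors
        kb_dist[i] = d[order[kb - 1]]    # distance to the k_b-th neighbor
    return nn_idx, kb_dist
```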
Fig. 7. Comparison of kNN* and kTLNN classifier on four real-world datasets in terms of error rate (%) when k varies from 1 to 20.
6. Conclusions

In this paper, we propose a new neighbor selection method, called the two-layer nearest neighbor (TLNN) rule. The core idea of the TLNN rule is the use of two-layer neighborhood information, that is, the neighborhood of the query and the neighborhoods of all training instances in this neighborhood. On this basis, in order to ensure the reliability of the obtained two-layer nearest neighbors, we furthermore filter each of the training instances in these neighborhoods based on the distance, distribution relationship, and backward nearest neighbor relationship between them and the query.

Based on the TLNN rule, we propose a k-two-layer nearest neighbor (kTLNN) classifier to select the k-two-layer nearest neighbors of the query, where k-nearest neighbors are used to represent the neighborhood of the query and the neighborhoods of all training instances in this neighborhood, and kb-nearest neighbors are used to measure the backward nearest neighbor relationship between the training instance and the query. Then, the majority voting rule is used to make the final classification decision for the query.

The performance of the proposed kTLNN classifier is verified through extensive experiments on twenty real-world datasets. Experimental results show that the kTLNN classifier outperforms not only the kNN classifier, but also seven other state-of-the-art kNN-based classifiers. In addition, we find that the first-layer neighborhood and the second-layer neighborhood are in fact complementary in classification capability, and that the two-layer nearest neighbors composed of some of the neighbors in these two neighborhoods can improve the classification performance over the nearest neighbors used in the NN rule.

Declaration of competing interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgments

This work is supported in part by the National Natural Science Foundation of China (Grant No. U1903213), the Key Science and Technology Program of Shaanxi Province, China (Grant No. 2020GY-005), the Zhejiang Provincial Commonweal Project, China (Grant No. LGF21F030002) and the Open Project of the National Laboratory of Pattern Recognition, China (Grant No. 202100033). We also thank Dr. Wei Wang of the National Laboratory of Pattern Recognition for his insightful discussions and valuable suggestions.

References

[1] T. Cover, P. Hart, Nearest neighbor pattern classification, IEEE Trans. Inform. Theory 13 (1) (1967) 21–27.
[2] X. Wu, V. Kumar, J.R. Quinlan, J. Ghosh, Q. Yang, H. Motoda, Top 10 algorithms in data mining, Knowl. Inform. Syst. 14 (1) (2008) 1–37.
[3] S. Jiang, G. Pang, M. Wu, L. Kuang, An improved k-nearest neighbor algorithm for text categorization, Expert Syst. Appl. 39 (1) (2012) 1503–1509.
[4] W. Xia, Y. Mita, T. Shibata, A nearest neighbor classifier employing critical boundary vectors for efficient on-chip template reduction, IEEE Trans. Neural Netw. Learn. Syst. 27 (5) (2016) 1094–1107.
[5] B. Tang, H. He, A local density-based approach for outlier detection, Neurocomputing 241 (2017) 171–180.
[6] C.H. Cheng, C.P. Chan, Y.J. Sheu, A novel purity-based k nearest neighbors imputation method and its application in financial distress prediction, Eng. Appl. Artif. Intell. 81 (2019) 283–299.
[7] Y. Wang, Z. Pan, Y. Pan, A training data set cleaning method by classification ability ranking for the k-nearest neighbor classifier, IEEE Trans. Neural Netw. Learn. Syst. (2020).
[8] Y. Pan, Z. Pan, Y. Wang, W. Wang, A new fast search algorithm for exact k-nearest neighbors based on optimal triangle-inequality-based check strategy, Knowl. Based Syst. (2020).
[9] B. Li, Y. Chen, Y. Chen, The nearest neighbor algorithm of local probability centers, IEEE Trans. Syst. Man Cybern. B Cybern. 38 (1) (2008) 141–154.
[10] T.J. Wagner, Convergence of the nearest neighbor classifier, IEEE Trans. Inform. Theory 17 (5) (1971) 566–571.
[11] N. García-Pedrajas, J.A.D. Castillo, G. Cerruela-García, A proposal for local k values for k-nearest neighbor rule, IEEE Trans. Neural Netw. Learn. Syst. 28 (2) (2017) 470–475.
[12] S.S. Mullick, S. Datta, S. Das, Adaptive learning-based k-nearest neighbor classifiers with resilience to class imbalance, IEEE Trans. Neural Netw. Learn. Syst. 29 (11) (2018) 5713–5725.
[13] S. Zhang, X. Li, Z. Ming, X. Zhu, R. Wang, Efficient kNN classification with different numbers of nearest neighbors, IEEE Trans. Neural Netw. Learn. Syst. 29 (5) (2018) 1774–1785.
[14] Z. Pan, Y. Wang, Y. Pan, A new locally adaptive k-nearest neighbor algorithm based on discrimination class, Knowl. Based Syst. 204 (2020).
[15] S.A. Dudani, The distance-weighted k-nearest-neighbor rule, IEEE Trans. Syst. Man Cybern. 6 (4) (1976) 325–327.
[16] J. Gou, T. Xiong, J. Kuang, A novel weighted voting for K-nearest neighbor rule, J. Comput. 6 (5) (2011) 833–840.
[17] J. Gou, L. Du, Y. Zhang, T. Xiong, A new distance-weighted k-nearest neighbor classifier, J. Inf. Comput. Sci. 9 (6) (2011) 1429–1436.
[18] H.G. Ma, J.P. Gou, X.L. Wang, J. Ke, S.N. Zeng, Sparse coefficient-based k-nearest neighbor classification, IEEE Access 5 (2017) 16618–16634.
[19] Y. Mitani, Y. Hamamoto, A local mean-based nonparametric classifier, Pattern Recognit. Lett. 27 (10) (2006) 1151–1159.
[20] Y. Zeng, Y. Yang, L. Zhao, Pseudo nearest neighbor rule for pattern classification, Expert Syst. Appl. 36 (2) (2008) 3587–3595.
[21] J. Gou, Y. Zhan, Y. Rao, X. Shen, X. Wang, W. He, Improved pseudo nearest neighbor classification, Knowl. Based Syst. 70 (2014) 361–375.
[22] Z. Pan, Y. Wang, W. Ku, A new k-harmonic nearest neighbor classifier based on the multi-local means, Expert Syst. Appl. 67 (2017) 115–125.
[23] Z. Pan, Y. Pan, Y. Wang, W. Wang, A new globally adaptive k-nearest neighbor classifier based on local mean optimization, Soft Comput. (2021).
[24] J.S. Sánchez, F. Pla, F.J. Ferri, On the use of neighbourhood-based non-parametric classifiers, Pattern Recognit. Lett. 18 (11–13) (1997) 1179–1186.
[25] J.S. Sánchez, F. Pla, F.J. Ferri, Improving the k-NCN classification rule through heuristic modifications, Pattern Recognit. Lett. 19 (13) (1998) 1165–1170.
[26] J. Gou, Y. Zhang, L. Du, T. Xiong, A local mean-based k-nearest centroid neighbor classifier, Comput. J. 55 (9) (2012) 1058–1071.
[27] Z. Pan, Y. Wang, W. Ku, A new general nearest neighbor classification based on the mutual neighborhood information, Knowl. Based Syst. 121 (2017) 142–152.
[28] J.P. Bagrow, X. Liu, L. Mitchell, Information flow reveals prediction limits in online social activity, Nat. Hum. Behav. 3 (2) (2019).
[29] H. Liu, S. Zhang, Noisy data elimination using mutual k-nearest neighbor for classification mining, J. Syst. Softw. 85 (5) (2012) 1067–1074.
[30] C. Blake, UCI repository of machine learning databases, 1998, ftp://ftp.ics.uci.edu/pub/machine-learning-databases.
[31] J. Alcalá-Fdez, A. Fernández, J. Luengo, J. Derrac, S. García, KEEL data-mining software tool: Data set repository, integration of algorithms and experimental analysis framework, Multiple-Valued Logic Soft Comput. 17 (2011) 255–287.
[32] Y. Xu, Q. Zhu, Z. Fan, M. Qiu, Y. Chen, H. Liu, Coarse to fine K nearest neighbor classifier, Pattern Recognit. Lett. 34 (9) (2013) 980–986.
[33] M. Sarkar, Fuzzy-rough nearest neighbor algorithms in classification, Fuzzy Sets and Systems 158 (19) (2007) 2134–2152.
[34] Z. Yu, H. Chen, J. Liu, J. You, Hybrid k-nearest neighbor classifier, IEEE Trans. Cybern. 46 (6) (2016) 1263–1275.