
Improving Time-complexity of k Nearest Neighbors Classifier: A Systematic Review


Syed Ali Qamber
National University of Computer and Emerging Sciences, Karachi, Pakistan
k180889@nu.edu.pk

Abstract—k Nearest Neighbors is one of the most simple and effective classification algorithms in Machine Learning. In comparison to other classification algorithms, the superior stability of kNN makes it an ideal choice for dynamic data or data streams. However, being a lazy learner, kNN also has some limitations that make its use with big data difficult and very time consuming. In this paper, we look at what makes the kNN so effective and what the causes of its high time and computational complexity are. We group the various attempts made by researchers at optimizing the kNN in terms of its time complexity into two broad categories: reduction of the training set size, and speeding up or approximating the neighbor search. We look at how all these attempts compare with each other and try to predict which will perform best with big data, based on the information provided in the literature being reviewed.

Keywords: k Nearest Neighbor; kNN; Time Complexity; Classification

I. INTRODUCTION

The k Nearest Neighbors classifier is a non-parametric Machine Learning algorithm that can be used for both classification and regression. kNN is one of the oldest algorithms used in ML. The origins of the approach have been traced back to Alhazen's works as early as the eleventh century, where the medieval Islamic scholar introduced the concept of visual recognition based on similarity examination [1]. The recognized formulation of the classifier in 1967 by Cover and Hart [2] was a watershed event in the ML timeline and marked the beginning of basic pattern recognition.

The traditional kNN algorithm can be broken down into three simple steps for any training data set D (with n training samples and d dimensions, or number of features) and a test sample a:

1) The distance between a and each training sample in D is computed.
2) The k nearest neighbors of a (the k points which have the smallest distance from a) are selected.
3) a is classified into the class which has the simple majority among the k selected samples, i.e. the most common class of its k nearest neighbors.
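As a minimal, illustrative sketch of these three steps (our own Python/NumPy rendering; the function name knn_classify, the choice of Euclidean distance, and the toy data are assumptions made only for this example):

    import numpy as np
    from collections import Counter

    def knn_classify(D, labels, a, k=3):
        """Classify test sample a by a majority vote of its k nearest neighbors."""
        dists = np.linalg.norm(D - a, axis=1)          # step 1: distance to every training sample
        nearest = np.argsort(dists)[:k]                # step 2: indices of the k smallest distances
        votes = Counter(labels[i] for i in nearest)    # step 3: simple majority among the k samples
        return votes.most_common(1)[0][0]

    # Toy usage: two well-separated classes.
    D = np.array([[0.0, 0.0], [0.1, 0.2], [5.0, 5.0], [5.2, 4.9]])
    labels = np.array([0, 0, 1, 1])
    print(knn_classify(D, labels, np.array([4.8, 5.1]), k=3))   # prints 1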
The success of this algorithm lies in its simplicity, effectiveness, and ease of use. As can be seen from these steps, kNN does not need any prior training before classification. All of these steps are started only when a test sample is received and asked to be classified. This approach is known as lazy learning in ML, and it means that once you have the training data, you can get started doing classification straightaway without wasting any time in training your model.

As kNN does not need to train a model before starting to classify data, it can successfully classify test data points that may be difficult to model for classifiers like Support Vector Machines or Decision Trees. This is due to the fact that some test data points may be situated in those parts of the data space where the decision boundaries are not easily modeled. Hence, eager learners (classifiers that train a model first, before classifying test data) may have a difficult time creating a model that can classify these peculiar data points correctly. kNN, on the other hand, can deal with these difficult data points in isolation and hence classify them without worrying about decision boundaries or generalizing the data.

Another advantage the kNN classifier holds over other classification algorithms is its robustness and stability. This means that kNN is less prone to changing its output in response to slight changes in the training or test data. In contrast, classifiers like Decision Trees, or even stochastic models like Naïve Bayes, can be sensitive to small changes in the input data. Add to that the fact that almost all eager learners work in batches, i.e. they train on a batch of training data at a time, and kNN becomes an optimal choice for incremental learning or data streams, as it is a lazy learner and is not model-based.

Moreover, the error rate of the kNN classifier has been established, after extensive research, to be very low. In fact, the kNN classifier, in comparison to its competitors, has been proven to give test error rates closest to those of the Bayesian classifier [3], which is the gold standard in supervised classification problems. All the advantages listed above make kNN one of the best classifiers to be used with data of any type. However, there are limitations to the algorithm that complicate its usage with real-life data.

The organization of this paper is as follows. In Section II, we first look at the limitations of the k Nearest Neighbors classifier, specifically its time complexity, and give a general discussion of the attempts to improve it in past works. In Section III, we elaborate the methodology used to search for, and then select, the articles reviewed here. In Section IV, we look at how various researchers have decreased the size of the training sets to improve the time complexity of kNN. In Section V, we analyze the attempts at speeding up neighbor searching and their effectiveness in lowering the time consumption of the subject algorithm. Finally, in Section VI, we conclude our findings and look at future avenues of research on the topic.
II. LIMITATIONS OF THE kNN CLASSIFIER

While lazy learning does have its initial advantages, here it comes at a cost in two forms. Since the kNN classifier needs to store the entire training set in memory, it can be very memory intensive. Also, as can be seen in the step-by-step breakdown of the algorithm in Section I, the algorithm has to loop through all the training examples to first calculate the distances from the test sample and then select the k nearest neighbors. This long calculation and exhaustive searching method means that the algorithm can be very time-consuming. Both of these problems become significantly more pronounced as the number of samples or the dimensionality of the data set, or both, increase.

The effect of an increase in dimensionality is arguably much greater. This is because in most cases the number of samples is much higher than the number of features of the data. Hence, for a data set with n samples and d features, every unit increase in dimensionality adds n values to the data set, whereas every additional training sample adds only d values. Since n >> d, we can see that the effect of an increase in dimensionality is much greater. This phenomenon is also known as the 'curse of dimensionality' in ML and can be visualized in Figure 1, where the classification time grows steeply as the dimensionality of the data increases. This is a major reason why traditional kNN is generally accepted to be unsuitable for large data sets.

Fig. 1. Classification Time for 50000 queries using kNN against Number of Dimensions
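As a rough, self-contained illustration of this behaviour (not a reproduction of the experiment behind Figure 1; the data sizes, the random data, and the brute-force search below are our own arbitrary choices), the per-query cost of traditional kNN grows with d because all n·d stored values must be touched for every query:

    import time
    import numpy as np

    rng = np.random.default_rng(0)
    n_train, n_test, k = 5000, 200, 5

    for d in (2, 32, 128, 512):
        D = rng.standard_normal((n_train, d))
        queries = rng.standard_normal((n_test, d))
        start = time.perf_counter()
        for a in queries:
            dists = np.linalg.norm(D - a, axis=1)   # Theta(n*d) work per query
            np.argpartition(dists, k)[:k]           # pick the k nearest (unordered)
        print(f"d = {d:4d}: {time.perf_counter() - start:.3f} s for {n_test} queries")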
In practice, the high memory requirements are of minor concern thanks to advances in hardware technology and in feature extraction and transformation methods. However, for reducing its computational and time complexity, numerous modifications have been suggested to the kNN classifier which have allowed it to be used with big data. These can be divided into two broad categories.

The first approach is to decrease the size of the training set in such a way that the smaller subset obtained is an apt representation of the larger original set. This reduction in the training data has to be done while ensuring that the accuracy of the classifier is not affected significantly, while the decrease in the size of the training set is noteworthy enough to warrant the use of the technique itself. An example of this kind of approach is the identification of dense regions of shared identity to remove redundant training samples in the Condensed Nearest Neighbor rule [4].

The second approach is to speed up the search for the closest neighbors, either by using faster searching algorithms or by approximating the nearest neighbors instead of finding the exact ones. This approach was used by Chen, Guo, and Wang in their work Nearest Neighbor Classification by Partially Fuzzy Clustering [5]. Both approaches have found notable success in the research field and have allowed the kNN classifier to be implemented on big data.

III. LITERATURE SEARCH METHODOLOGY

Since this is a Systematic Literature Review, it is necessary that both the methodology used to search for the literature to be considered and the eventual selection of the articles included in this review are outlined in detail, to ensure the removal of any bias associated with the reviewer. The following guidelines were followed for the search and selection of the literature reviewed in this survey.

A. Search Specifications

The search terms were used as they appear in the title of this article, with the exception that 'reducing' was also used in addition to 'improving', with the help of the Boolean operators (AND and OR) available in the Advanced Search option of the IEEE Xplore digital library. This was done to ensure that no research work on the concerned topic was overlooked. The search was carried out only on IEEE Xplore for simplification and time-saving purposes. The time period for the search was chosen to be the maximum available in the IEEE Xplore library.

The specifications are listed below:
• Search Term(s): (((improving) OR (reducing)) AND time complexity of ((k Nearest Neighbours) or (kNN))
• Database(s) Searched: IEEE Xplore Digital Library
• Search Date: 8th March, 2019
• Search Duration: 2004 - 2019
• Search Filters: None

B. Publication Selection Criteria

1) Inclusion Criteria
• Studies that have proposed novel modifications to the kNN classifier that improve or reduce its time complexity.
• Studies that have applied the kNN classifier for a specific task in their research while also proposing novel modifications to the algorithm that particularly reduce its time complexity.

2) Exclusion Criteria
• Studies that have simply applied the kNN classifier (without any novel modification) for a particular task in their research; for example, Jun et al. used kNN (in addition to other ML algorithms) for the purpose of internet traffic identification and classification [6].
• Studies that have proposed modifications to the kNN classifier which improve it in ways other than reducing or bettering its time complexity.

IV. DECREASING THE TRAINING SET SIZE

The most time-consuming part of kNN classification is the calculation of the distance of the test data point to each point in the training data. For a data set D with n training samples and d dimensions, this distance calculation is bound to take Θ(nd) time. Here, we will look at a number of methods that attempt to reduce this time by reducing the number of training samples n and thus shortening the time spent on distance calculations. The following notation will be used throughout this section: the original large training data set is referred to as D, while the new reduced data set to be obtained is referred to as S.

Some of the methods that propose reducing the size of training sets have not been specifically devised to improve the time or computational complexity of the kNN classifier. Instead, they are general-purpose approaches designed to deal with big data that would otherwise take too much space to store and too much time to train a model on. These approaches are referred to in some literature as instance selection techniques, and are widely used to preprocess huge data sets before they are used for ML applications.

One of the oldest of these approaches was the Condensed Nearest Neighbor (CNN) rule [4] introduced by Hart, one of the two founders of kNN. The CNN works by removing those points from the data set that share a dense region of the data space with a lot of other points that belong to the same class as the subject data point. The algorithm starts out by moving one random data point of each class from D to S. Then all the remaining points in D are classified using kNN over S. Every point in D that is misclassified is moved to S. This process is repeated until there are no misclassifications in one whole iteration through D, i.e. all the remaining points in D can be successfully classified using kNN over S. Then S is used as the new reduced data set for test data classifications. The CNN algorithm has been shown to be very effective where the underlying densities of the different classes have little to no overlap. It condensed a data set provided by IBM from 6295 to 197 training samples. However, the error rate resulting from classification using the condensed data set was 1.28, which was higher than that of other classifiers [4]. This was attributed to the fact that CNN is prone to noise, as it tends to always move outliers to the selected data set S.
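A compact sketch of this condensing loop (our own Python rendering of the rule as described above; the helper nearest_label, the 1-NN consistency check, and the random seeding are illustrative assumptions):

    import numpy as np

    def nearest_label(S_X, S_y, x):
        """Label of x's single nearest neighbor in the current condensed set S."""
        return S_y[np.argmin(np.linalg.norm(S_X - x, axis=1))]

    def condense_cnn(D_X, D_y, seed=0):
        """Condensed Nearest Neighbor: seed S with one random point per class,
        then keep absorbing every point of D that S still misclassifies."""
        rng = np.random.default_rng(seed)
        in_S = np.zeros(len(D_y), dtype=bool)
        for c in np.unique(D_y):
            in_S[rng.choice(np.flatnonzero(D_y == c))] = True
        changed = True
        while changed:                               # stop after one clean pass over D
            changed = False
            for i in range(len(D_y)):
                if in_S[i]:
                    continue
                if nearest_label(D_X[in_S], D_y[in_S], D_X[i]) != D_y[i]:
                    in_S[i] = True                   # misclassified, so move it to S
                    changed = True
        return D_X[in_S], D_y[in_S]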
A complement to the CNN rule was proposed in the form of the Reduced Nearest Neighbor (RNN) rule [7]. Instead of choosing one sample in the reduced set S and working upwards (incremental search), RNN employs a decremental search. After copying all data points in D to S, RNN removes from S all those points that can be removed without a misclassification being introduced into S. Results have shown that RNN produces S sets that are smaller in size than those produced by CNN. Moreover, RNN is less sensitive to noise, as it successfully removes outliers from S. Both CNN and RNN have been shown to give either a similar or a smaller error rate on test data than kNN [7], so it can be safely surmised that the improvement in time and space complexity does not come at the cost of accuracy.

Similar modifications to the kNN classifier are the Edited Nearest Neighbor (ENN) [8] algorithm and its refined version AllKNN [9], which are both decremental search algorithms. After copying all data points from D to S, ENN removes those points that are mislabeled when classified using the kNN classifier. AllKNN uses all values of k from 1 up to a preassigned number to check for misclassified examples before removing them. Although both ENN and AllKNN deal well with noise by removing outliers, they are not as effective as CNN [4] or RNN [7] in reducing data set size, as they mostly retain the points that are in dense regions of shared identity (i.e. of the same class). However, ENN may also remove points that lie on the decision or class boundaries and hence may result in an oversimplification of the data set.
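A sketch of the ENN editing step (our own rendering; deciding 'mislabeled' by a leave-one-out vote of the k nearest neighbors in the full set is an assumption about how the check is performed, and k = 3 is arbitrary):

    import numpy as np
    from collections import Counter

    def edit_enn(D_X, D_y, k=3):
        """Edited Nearest Neighbor: drop every point whose class disagrees with
        the majority vote of its k nearest neighbors (excluding itself)."""
        keep = np.ones(len(D_y), dtype=bool)
        for i in range(len(D_y)):
            dists = np.linalg.norm(D_X - D_X[i], axis=1)
            dists[i] = np.inf                        # never count the point itself
            neighbors = np.argsort(dists)[:k]
            vote = Counter(D_y[neighbors].tolist()).most_common(1)[0][0]
            if vote != D_y[i]:
                keep[i] = False                      # mislabeled, so edit it out
        return D_X[keep], D_y[keep]

AllKNN could be sketched by repeating this loop for k = 1 up to a preassigned maximum and removing a point as soon as any of those votes disagrees with its label.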
The Variable Similarity Metric (VSM) [10] is another decremental search method, and was an important variation on the RNN rule. VSM proposed to remove from S all those data points whose k nearest neighbours all have the same class. The removal criterion does not take the class of the subject point into account, and hence results in the removal of both outliers and points that are placed in the middle of dense regions of shared identity. VSM lies between CNN/RNN [4], [7] and ENN [8] on the spectrum of effectiveness in reducing data set size.

A series of similar algorithms, with minor changes, was introduced by Aha et al. [11], [12] in their work on Instance-Based (IB) learning algorithms. IB1 and IB2 are analogous to CNN, except that IB2 does not need to start with a random point from each class in S, and it makes only one iteration through D. IB3, IB4, and IB5 are consecutive improvements on IB2, but not in the domain of time complexity, so they are not discussed here.

Another series of reduction techniques is the Decremental Reduction Optimization Procedure (DROP) family of algorithms [13], which can be seen as an improvement on RNN. It starts with DROP1, which differs from RNN only in that the accuracy check for the removal criterion is done on S instead of D. In DROP2, the removed data points are also included in the removal criterion instead of just the ones remaining in S. DROP3 improves on its predecessor by removing noisy data points from S using ENN. DROP4 and DROP5 are further modifications but do not add much to the cause of reducing time complexity.
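A simplified sketch of the DROP1 removal test (our own rendering; the published algorithm maintains neighbor and associate lists incrementally, which this brute-force version recomputes at every step, and the names vote, assoc, and drop1 are ours):

    import numpy as np
    from collections import Counter

    def vote(D_X, D_y, S, i, k):
        """Majority label of point i's k nearest neighbors among the indices in S."""
        cand = [j for j in S if j != i]
        order = np.argsort(np.linalg.norm(D_X[cand] - D_X[i], axis=1))[:k]
        return Counter(D_y[[cand[t] for t in order]].tolist()).most_common(1)[0][0]

    def drop1(D_X, D_y, k=3):
        """Remove p from S when at least as many of its associates are classified
        correctly without p as with p."""
        S = list(range(len(D_y)))
        for p in list(S):
            rest = [j for j in S if j != p]
            if len(rest) <= k:
                break
            # Associates of p: points of S that have p among their k+1 nearest neighbors.
            assoc = []
            for i in rest:
                cand = [j for j in S if j != i]
                order = np.argsort(np.linalg.norm(D_X[cand] - D_X[i], axis=1))[:k + 1]
                if p in [cand[t] for t in order]:
                    assoc.append(i)
            with_p = sum(vote(D_X, D_y, S, i, k) == D_y[i] for i in assoc)
            without_p = sum(vote(D_X, D_y, rest, i, k) == D_y[i] for i in assoc)
            if without_p >= with_p:
                S = rest                             # p can be dropped
        return D_X[S], D_y[S]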
Hit Miss Networks (HMN) have also been used in reduction techniques [14] and employ decremental search methods. HMNs can be defined as graphs that have edges drawn from each point in the data set to its nearest neighbors. An edge going to a neighbor that has the same class as the subject point is defined as a hit edge, while a miss edge is an edge going to a neighbor of a different class. Hit and miss degrees are calculated for every point from its incoming hit and miss edges, and these degrees are then used to form multiple rules that become the criteria for the removal of data points from S.

The Modified Nearest Neighbor Classifier Based on Clustering (MNNCBC) [15] employs clustering to remove parts of a data set instead of individual data points. MNNCBC first divides the training set into n partitions. By keeping a tight bound, c < n < 2c, the method ensures that the clusters are representative of the inherent classes within the data. Each cluster is labeled with the class of the majority of its members. Then, these clusters are only included in S, the reduced data set, if they increase the classification accuracy on the training and test data sets.

In a pair of works on data set reduction [16], [17], the Local Density-Based Instance Selection (LDIS) and the Central Density-Based Instance Selection (CDIS) algorithms were introduced. In these methods, the focus shifts from trying to retain the class boundary to finding dense regions of shared identity (data points with the same class). This is similar to CNN, but LDIS does this using a local search (only within each class) instead of a global search (over the whole data set) [16]. Every data point is assigned a density value based on its distance from the other members of its class. Then all data points that satisfy a preset standard of density are included in S, and the rest are rejected. CDIS is mostly similar to LDIS, except that it changes the calculation of the density value to be based on the distance from a centroid in a partial k-member neighborhood, instead of distances from all class member points [17].
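A rough sketch of the local-density idea (our own rendering; defining density as the negated mean within-class distance and keeping a fixed fraction of the densest points per class are simplifications, not the exact criteria of [16], [17]):

    import numpy as np

    def select_dense(D_X, D_y, keep_fraction=0.2):
        """Keep, within each class, the points with the highest local density."""
        keep = np.zeros(len(D_y), dtype=bool)
        for c in np.unique(D_y):
            idx = np.flatnonzero(D_y == c)
            X_c = D_X[idx]
            # Pairwise distances computed inside the class only (local search).
            dists = np.linalg.norm(X_c[:, None, :] - X_c[None, :, :], axis=2)
            density = -dists.mean(axis=1)
            n_keep = max(1, int(keep_fraction * len(idx)))
            keep[idx[np.argsort(density)[::-1][:n_keep]]] = True
        return D_X[keep], D_y[keep]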
A pre-classification based approach to reducing data set size, devised specifically for improving kNN's time complexity, incorporated the use of decision trees [18]. With every data point falling in at least one leaf node of the tree, the method sets the following bounds for p, the classification probability (the probability of a data point belonging to the positive class): p < α or p > (1 − α), where α is a predetermined constant in the range 0 < α < 0.5. Only the data points that satisfy one of these bounds for p are included in S. This ensures that the noisy data points lying on the decision boundaries are eliminated. Like ENN [8], this may result in an oversimplification of the data set.
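A sketch of this probability filter using an off-the-shelf decision tree (scikit-learn here; the binary-class setting, the leaf size, and the helper name tree_filter are our own assumptions for the example):

    import numpy as np
    from sklearn.tree import DecisionTreeClassifier

    def tree_filter(D_X, D_y, alpha=0.2):
        """Keep only training points whose leaf-level probability of the positive
        class is confidently low (< alpha) or confidently high (> 1 - alpha)."""
        tree = DecisionTreeClassifier(min_samples_leaf=20, random_state=0)
        tree.fit(D_X, D_y)
        p = tree.predict_proba(D_X)[:, 1]            # P(positive class) per point
        keep = (p < alpha) | (p > 1.0 - alpha)       # discard boundary-region points
        return D_X[keep], D_y[keep]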
The effectiveness of some of the aforementioned algorithms can be seen in Table I, which consolidates their reduction and accuracy percentages on the Iris data set, where available.

TABLE I
THE REDUCTION AND ACCURACY (%) OF VARIOUS DATA SET SIZE REDUCTION ALGORITHMS WHEN IMPLEMENTED ON IRIS (FROM [13], [14])

Algorithm    Reduction (%)    Accuracy (%)
kNN          0                94
CNN          87.26            90
ENN          5.26             95.33
IB3          80.22            94.67
DROP3        85.19            95.33
HMN-EI       95.4             75.2
MNNCBC       N/A              97.06
LDIS         87               93
CDIS         87               94

V. ACCELERATING OR APPROXIMATING THE NEIGHBOR SEARCH

While the most time-consuming part of kNN is the distance calculation, the search for the nearest neighbors can be an equally tedious process for large data sets. For traditional kNN, which employs an exhaustive search to find the k nearest neighbors, the time complexity of this step is Θ(nk). Reducing the data set size automatically reduces the time complexity of this step as well. However, there are methods that employ techniques other than data set size reduction to either speed up the search for the nearest neighbors or approximate their location instead of finding the exact points.

One example of this is the kNNModel-based approach [19], which creates representative (rep) portions of the data set based on the point di which has the largest neighborhood containing only same-class members. Each rep has the following properties: Cls(di), the class label of di; Sim(di), the distance from di to the farthest point within the rep; Num(di), the number of data points in the rep; and Rep(di), an identifier of the rep itself. Once all the data has been broken down into reps, the classification of a test point a is made on the basis of the following inequality: dist(a, di) < Sim(di). If the inequality is satisfied for only one rep, a is assigned to that rep's Cls(di). If it is satisfied for multiple reps, a is assigned to the Cls(di) of the rep with the largest Num(di). While the kNNModel-based approach retains almost all the data points of the original training set, it speeds up the neighbor search by looking for the nearest representative portions of the data instead of the nearest neighbors. A further modification was proposed to this approach with the introduction of e-kNNModel [5], where the rep is represented by the mean of all the points inside it, instead of di, the point that was the basis of its formation. Using the centroid of the data points in each rep makes it a better representative of the actual distribution of the data points.
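A sketch of the classification step once the reps exist (our own rendering; how the reps are grown from the data is omitted, the Rep record and classify_with_reps names are ours, and falling back to the nearest centre when a lies inside no rep is a simplification of the published tie-breaking):

    import numpy as np
    from dataclasses import dataclass

    @dataclass
    class Rep:
        center: np.ndarray   # d_i (or the rep mean, as in e-kNNModel)
        cls: int             # Cls(d_i)
        sim: float           # Sim(d_i): radius covered by the rep
        num: int             # Num(d_i): number of points covered

    def classify_with_reps(reps, a):
        """Assign a to a rep whose ball contains it; ties go to the largest rep."""
        hits = [r for r in reps if np.linalg.norm(a - r.center) < r.sim]
        if hits:
            return max(hits, key=lambda r: r.num).cls
        return min(reps, key=lambda r: np.linalg.norm(a - r.center)).cls

    # Toy reps: two balls of radius 1.5 around obvious class centres.
    reps = [Rep(np.array([0.0, 0.0]), 0, 1.5, 40), Rep(np.array([5.0, 5.0]), 1, 1.5, 25)]
    print(classify_with_reps(reps, np.array([4.6, 5.3])))   # prints 1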
A series of algorithms proposed by various authors [20], [21], [22] used the Haar wavelet transform to decompose the feature vectors of the training data. To begin, k random data points are inducted into K, the set of candidate nearest neighbors of a test sample a. A training sample di is rejected as a possible member of the k nearest neighbors of a if Di (the Wavelet Coefficient Vector, WCV, of di) has a lower similarity rating with A (the WCV of a) than all of the data points already in K; in that case, the need to calculate the distance between a and di is eliminated. This was the principle introduced in the WKPDS algorithm [20], which rejected points based on partial distance search. The WKENNS algorithm [21] further refined the idea by using the approximate coefficients of the fully decomposed feature vectors to establish the rejection criterion for the training data points. The WKEENNS algorithm [22] added variance calculations to the filtering criteria in addition to the approximate coefficients. This resulted in the rejection of even more training data points before the actual distance calculation, thereby reducing the time complexity even further.
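The common core of these three algorithms is partial distance search: accumulate the squared distance one coefficient at a time and abandon the point as soon as the partial sum exceeds the worst candidate already in K. Below is a sketch of that rejection loop, leaving out the wavelet transform itself (the transform only concentrates signal energy in the leading coefficients so that rejection triggers earlier); seeding K with the first k points rather than random ones is our simplification:

    import numpy as np

    def knn_partial_distance(D, labels, a, k=3):
        """Exact k-NN search with early-abandoning (partial) distance computation."""
        best = [(float(np.sum((D[i] - a) ** 2)), i) for i in range(k)]   # seed K
        best.sort()
        worst = best[-1][0]
        for i in range(k, len(D)):
            acc = 0.0
            for j in range(D.shape[1]):          # accumulate one feature at a time
                acc += (D[i, j] - a[j]) ** 2
                if acc >= worst:                 # cannot beat the current k-th best
                    break
            else:                                # full distance stayed below worst
                best[-1] = (acc, i)
                best.sort()
                worst = best[-1][0]
        return [labels[i] for _, i in best]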
While proposing a pair of new algorithms for recommender systems, for both new and old users, the authors of [23] employed Singular Value Decomposition (SVD) to increase the time efficiency of the kNN that was being used to classify old users. SVD is used to decompose the similarity matrix into factors with a latent feature space before using kNN to identify the nearest neighbors. The method is analogous to other dimensionality reduction techniques and helps reduce the distance calculations needed to identify the nearest neighbors.
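A sketch of the general idea with a plain truncated SVD (our own rendering; the cited work factorises a user similarity matrix, whereas this example simply projects generic feature vectors onto the top r latent dimensions so that each subsequent distance costs Θ(r) instead of Θ(d)):

    import numpy as np

    def fit_svd_projection(D_X, r=10):
        """Learn an r-dimensional latent projection of the training data."""
        mean = D_X.mean(axis=0)
        U, s, Vt = np.linalg.svd(D_X - mean, full_matrices=False)
        return mean, Vt[:r].T                    # d x r projection matrix

    def project(X, mean, P):
        """Map training or test vectors into the latent space before the k-NN search."""
        return (X - mean) @ P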
Many of the methods for speeding up neighbor searching combine clustering with kNN to identify a particular area of the data set in which to look for the k nearest neighbors. One such method was employed in [24], where the k-means algorithm is first used to partition the training data set into kc clusters, where k is a non-negative integer and c is the inherent number of classes in the training data set. For a test sample a, the criterion for rejecting a cluster is as follows: d(a, ci) > dmax, where ci is the respective cluster center and dmax is the distance from the cluster center to the farthest data point in that cluster. Once the nearest cluster(s) are identified, the search for the k nearest neighbors is carried out only inside those clusters, reducing the time complexity significantly.
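A sketch of this cluster-pruning step (our own rendering; k-means is taken from scikit-learn, and falling back to the single nearest centre when every cluster is rejected is an assumption not spelled out above):

    import numpy as np
    from collections import Counter
    from sklearn.cluster import KMeans

    def fit_clusters(D_X, n_clusters):
        km = KMeans(n_clusters=n_clusters, n_init=10, random_state=0).fit(D_X)
        centers, assign = km.cluster_centers_, km.labels_
        # dmax per cluster: distance from the centre to its farthest member.
        dmax = np.array([np.linalg.norm(D_X[assign == c] - centers[c], axis=1).max()
                         for c in range(n_clusters)])
        return centers, assign, dmax

    def cluster_knn(D_X, D_y, centers, assign, dmax, a, k=3):
        """Reject cluster i when d(a, c_i) > dmax_i, then search only the survivors."""
        d_to_centers = np.linalg.norm(centers - a, axis=1)
        candidates = np.flatnonzero(d_to_centers <= dmax)
        if candidates.size == 0:                        # a lies outside every ball
            candidates = np.array([np.argmin(d_to_centers)])
        idx = np.flatnonzero(np.isin(assign, candidates))
        nn = idx[np.argsort(np.linalg.norm(D_X[idx] - a, axis=1))[:k]]
        return Counter(D_y[nn].tolist()).most_common(1)[0][0]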
A similar implementation of clustering to find relevant search areas was introduced with an enhancement in the form of assigning a weight to each attribute in proportion to its information (entropy) [25]. After dividing the training data into clusters using any standard clustering algorithm, the cluster nearest to a test point a is chosen based on the Euclidean distance to each of the cluster centers ci. Then the k nearest neighbors of a are identified within that cluster based on weighted Euclidean distances between a and each cluster member. Again, clustering ensures that a specific search area is identified before looking for the nearest neighbors, thereby saving significant time and computation.

The MNNCBC [15] algorithm mentioned in Section IV also employs this approach, classifying a test data point based on the class label of the nearest cluster instead of the nearest neighboring data points.

A unique approach was adopted in utilizing the emerging technique of Stochastic Computing (SC) for the specific reduction of kNN's hardware cost [26], which indirectly also resulted in an improvement in its time complexity. A modified version of the SC technique was used in the distance matrix calculations required for identifying the nearest neighbors. Further, dimensionality reduction techniques were also employed to increase the accuracy of the SC computations, as they are adversely affected by high dimensions.

Another improved and scalable version of kNN, titled BALLKNN [27], was proposed for improving the time complexity of kNN using a subtraction-and-comparison approach to reduce the distance computations required to identify the nearest neighbors. BALLKNN starts by inducting k training samples into S0, the set of nearest neighbors of a test sample a. It also stores radius, the Euclidean distance from a to sf (the point in S0 farthest from a), and max, the square of radius. Then, for every remaining data point di in the training set, the feature-wise distances between a and di are compared to radius. Only if all of the feature-wise distances are less than radius is the full distance from a to di calculated and compared to radius, to see whether di should replace sf in S0. Otherwise, the point is discarded from consideration and no further calculations are required, saving significant time and computation. Experiments indicated that BALLKNN scaled well with an increase in dimensions while performing as accurately as kNN with much lower time complexity.
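A sketch of that feature-wise pruning (our own Python rendering of the comparison logic described above; seeding S0 with the first k training points and omitting the stored max = radius² are small simplifications):

    import numpy as np

    def ballknn(D, labels, a, k=3):
        """k-NN search that skips any point with a single feature-wise distance
        already reaching the radius of the current k-neighbor ball."""
        S0 = [(float(np.linalg.norm(D[i] - a)), i) for i in range(k)]   # seed S0
        S0.sort()
        radius = S0[-1][0]
        for i in range(k, len(D)):
            # If any |a_j - d_ij| >= radius, the Euclidean distance cannot be
            # smaller than radius, so no full distance computation is needed.
            if np.any(np.abs(D[i] - a) >= radius):
                continue
            dist = float(np.linalg.norm(D[i] - a))
            if dist < radius:
                S0[-1] = (dist, i)
                S0.sort()
                radius = S0[-1][0]
        return [labels[i] for _, i in S0]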
VI. CONCLUSION AND FUTURE WORK

Many techniques have been proposed to reduce or improve the time complexity of the kNN classifier. As we looked at many of those techniques in this review, we saw that they are quite effective at using various methods to counter the curse of dimensionality and big data. This makes the application of the kNN classifier possible on bigger data sets than would otherwise have been feasible. When looking at methods that attempt to reduce the size of training sets, it can be seen from Table I that DROP3 is one of the best performers, as it balances the reduction-accuracy trade-off in the best manner, while others like HMN-EI or ENN either have too small an impact on the data set size or lose too much accuracy while attempting the reduction.

An interesting thing to note is that the recently introduced MNNCBC algorithm achieves the highest accuracy on the Iris data set. While its reduction percentage is not available, the fact that it employs clustering to further speed up the neighbor search makes it a very tempting choice for use with big data. Clustering, or similar partitioning of the data, seems to be the go-to approach when attempting to speed up or approximate the neighbor searching process, as it has a drastic effect on the time complexity. However, there is scant work on determining which clustering approach is best for both the efficiency and the accuracy of kNN, and this should be explored further.

While there are many methods available to modify kNN for use with big data, as presented in this review, there is no way to compare them, as some are just proofs of concept without any testing data, and others that have been tested were evaluated either on small data sets like Iris or on different data sets, making the studies incomparable. Hence, there is a need for a meta-analysis that compares the effectiveness of these approaches on actual big data of varying types and constitutions, which would give a clearer picture of the best approaches for implementing kNN on big data.

REFERENCES

[1] M. Pelillo, “Alhazen and the nearest neighbor rule,” Pattern Recognition Letters, vol. 38, pp. 34–37, Mar 2014.
[2] T. Cover and P. Hart, “Nearest neighbor pattern classification,” IEEE Trans. Info. Theory, vol. 13, no. 1, pp. 21–27, January 1967.
[3] L. Gyorfi and Z. Gyorfi, “An upper bound on the asymptotic error prob-
ability on the k-nearest neighbor rule for multiple classes (corresp.),”
IEEE Trans. Info. Theory, vol. 24, no. 4, pp. 512–514, July 1978.
[4] P. Hart, “The condensed nearest neighbor rule (corresp.),” IEEE Trans.
Info. Theory, vol. 14, no. 3, pp. 515–516, May 1968.
[5] L. Chen, G. Guo, and S. Wang, “Nearest neighbor classification by
partially fuzzy clustering,” in 26th Int. Conf. Adv. Information Netw.
and Appl. Workshops, March 2012, pp. 789–794.
[6] L. Jun, Z. Shunyi, L. Yanqing, and Z. Zailong, “Internet traffic classi-
fication using machine learning,” in 2nd Int. Conf. Commun. and Netw.
in China, Aug 2007, pp. 239–243.
[7] G. Gates, “The reduced nearest neighbor rule (corresp.),” IEEE Trans.
Info. Theory, vol. 18, no. 3, pp. 431–433, May 1972.
[8] D. L. Wilson, “Asymptotic properties of nearest neighbor rules using
edited data,” IEEE Trans. Syst., Man, and Cybern., vol. SMC-2, no. 3,
pp. 408–421, July 1972.
[9] I. Tomek, “An experiment with the edited nearest-neighbor rule,” IEEE
Trans. Syst., Man, and Cybern., vol. SMC-6, no. 6, pp. 448–452, June
1976.
[10] D. G. Lowe, “Similarity metric learning for a variable-kernel classifier,”
J. Neural Comput., vol. 7, no. 1, pp. 72–85, Jan 1995.
[11] D. W. Aha, D. Kibler, and M. K. Albert, “Instance-based learning
algorithms,” J. Mach. Learn., vol. 6, no. 1, pp. 37–66, Jan 1991.
[Online]. Available: https://doi.org/10.1007/BF00153759
[12] W. Aha, “Tolerating noisy, irrelevant, and novel attributes in instance-
based learning algorithms,” Int. J. Man-Mach. Studies, vol. 36, pp. 267–
287, 02 1992.
[13] D. R. Wilson and T. R. Martinez, “Reduction techniques
for instance-based learning algorithms,” J. Mach. Learn.,
vol. 38, no. 3, pp. 257–286, Mar 2000. [Online]. Available:
https://doi.org/10.1023/A:1007626913721
[14] E. Marchiori, “Hit miss networks with applications to instance selec-
tion,” J. Mach. Learn. Research, vol. 9, pp. 997–1017, 06 2008.
[15] M. M. Samadpour, H. Parvin, and F. Rad, “Diminishing prototype size
for k-nearest neighbors classification,” in 14th Mexican Int. Conf. Artif.
Intell. (MICAI), Oct 2015, pp. 139–144.
[16] J. L. Carbonera and M. Abel, “A density-based approach for instance
selection,” in IEEE 27th Int. Conf. Tools Artif. Intell. (ICTAI), Nov 2015,
pp. 768–774.
[17] ——, “A novel density-based approach for instance selection,” in IEEE
28th Int. Conf. Tools with Artif. Intell. (ICTAI), Nov 2016, pp. 549–556.
[18] H. Xie, D. Liang, Z. Zhang, H. Jin, C. Lu, and Y. Lin, “A novel pre-
classification based knn algorithm,” in IEEE 16th Int. Conf. Data Mining
Workshops (ICDMW), Dec 2016, pp. 1269–1275.
[19] G. Guo, H. Wang, D. Bell, Y. Bi, and K. Greer, “Knn model-based
approach in classification,” in On The Move to Meaningful Internet
Systems 2003: CoopIS, DOA, and ODBASE, R. Meersman, Z. Tari, and
D. C. Schmidt, Eds. Berlin, Heidelberg: Springer Berlin Heidelberg,
2003, pp. 986–996.
[20] W.-J. Hwang and K.-W. Wen, “Fast knn classification algorithm based
on partial distance search,” Electron. Lett., vol. 34, pp. 2062 – 2063, 11
1998.
[21] J.-S. Pan, Y.-L. Qiao, and S.-H. Sun, “A fast k nearest neighbors
classification algorithm,” IEICE Trans. Fundam. of Electron., Commun.
and Comput. Sci., vol. E87A, pp. 961–963, 04 2004.
[22] Y.-L. Qiao, J.-S. Pan, and S.-H. Sun, “Improved k nearest neighbor
classification algorithm,” Tien Tzu Hsueh Pao/Acta Electron. Sinica,
vol. 33, pp. 1101 – 1104 vol.2, 01 2005.
[23] L. Xiong, Y. Xiang, Q. Zhang, and L. Lin, “A novel nearest neighborhood algorithm for recommender systems,” in 3rd Global Congr. Intell. Syst., Nov 2012, pp. 156–159.
[24] H. Hong, G. Juan, and W. Ben, “An improved knn algorithm based on adaptive cluster distance bounding for high dimensional indexing,” in 3rd Global Congr. Intell. Syst., Nov 2012, pp. 213–217.
[25] S. Taneja, C. Gupta, K. Goyal, and D. Gureja, “An enhanced k-nearest neighbor algorithm using information gain and clustering,” in 4th Int. Conf. on Adv. Comput. Commun. Technol., Feb 2014, pp. 325–329.
[26] D. Cannisi and B. Yuan, “Design space exploration for k-nearest neighbors classification using stochastic computing,” in IEEE Int. Workshop on Sign. Process. Syst. (SiPS), Oct 2016, pp. 321–326.
[27] P. Lv, P. Yang, Y. Dong, and L. Gu, “Ballknn: An efficient and scalable knn based on euclidean similarity,” in Int. Joint Conf. Neural Netw. (IJCNN), July 2016, pp. 5141–5148.
