Concurrent Linearizable Nearest Neighbour Search in LockFree-kD-tree∗

Bapi Chatterjee, Ivan Walulya and Philippas Tsigas
Chalmers University of Technology, Gothenburg, Sweden
{bapic, ivanw, tsigas}@chalmers.se

∗ This work was partially supported by the Swedish Foundation for Strategic Research as part of the project RIT-10-0033 "Software Abstractions for Heterogeneous Multi-core Computer" and the EU FP7 project EXCESS (www.excess-project.eu) under grant 611183.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.
ICDCN '18, January 4–7, 2018, Varanasi, India
© 2018 Copyright held by the owner/author(s). Publication rights licensed to ACM.
ISBN 978-1-4503-6372-3/18/01. $15.00
DOI: https://doi.org/10.1145/3154273.3154307

ABSTRACT

Nearest neighbour search (NNS) is a fundamental problem in many application domains dealing with multidimensional data. In a concurrent setting, where dynamic modifications are allowed, a linearizable implementation of NNS is highly desirable.

This paper introduces the LockFree-kD-tree (LFkD-tree): a lock-free concurrent kD-tree, which implements an abstract data type (ADT) that provides the operations Add, Remove, Contains, and NNS. Our implementation is linearizable. The operations in the LFkD-tree use single-word read and compare-and-swap (CAS) atomic primitives, which are readily supported on available multi-core processors.

We experimentally evaluate the LFkD-tree using several benchmarks comprising real-world and synthetic datasets. The experiments show that the presented design is scalable and achieves significant speed-up compared to the implementations of an existing sequential kD-tree and a recently proposed multidimensional indexing structure, the PH-tree.

Keywords

concurrent data structure, kD-tree, nearest neighbor search, similarity search, lock-free, linearizability, data structure

1 Introduction

1.1 Background

Given a dataset of multidimensional points, finding the point in the dataset at the smallest distance from a given target point is known as the nearest neighbour search (NNS) problem. This fundamental problem arises in numerous application domains such as data mining, information retrieval, machine learning, robotics, etc.

A variety of data structures available in the literature, which store multidimensional points, solve the NNS in a sequential setting. Samet's book [26] provides an excellent collection of data structures for storing multidimensional data. Several of these have been adapted to perform parallel NNS over a static data structure. However, both sequential and parallel designs primarily consider NNS queries without accommodating dynamic addition or removal (modifications) in the data structure. Allowing concurrent dynamic modifications exacerbates the challenge substantially.

The wide availability of multi-core machines, large system memory, and a surge in the popularity of in-memory databases have led to significant interest in index structures that can support NNS with dynamic concurrent addition and removal of data. However, to our knowledge, no complete work exists in the literature on concurrent data structures that support NNS.

Typically, a hierarchical tree-based multidimensional data structure stores the points following a space-partitioning scheme. Such data structures provide an excellent tool to prune the subsets of a dataset that cannot contain the target's nearest neighbour. Thus, an NNS query iteratively scans the dataset using such a data structure. The iterative scan procedure starts with an initial guess; at every iteration it visits a subset of the data structure (e.g. a subtree of a tree) that can potentially contain a better guess and is unvisited until the last iteration, and updates the current guess if required; finally, it returns the nearest neighbour.

In a concurrent setting, performing an iterative scan along with concurrent modifications faces an inescapable challenge. Consider the case of an operation op performing an NNS query in a hierarchical multidimensional data structure that stores points from R^d and where Euclidean distance is used. Let a = {a_i}_{i=1}^d ∈ R^d be the target point of the NNS. Let us assume that k∗ = {k_i∗}_{i=1}^d ∈ {k : k is the key of a node} is the nearest neighbour of a at the invocation of op. In a sequential setting, where no addition or removal of data-points occurs during the lifetime of op, k∗ remains the nearest neighbour of a at the return of op. However, if a concurrent addition is allowed, a new node with key k∗∗ may be added to the data structure in a subset that may already have been visited, or got pruned, by the completion of the latest iteration step. Clearly, op would not visit that subset. Suppose that k∗∗ is closer to a than k∗; if op returns k∗, it is not consistent with an operation which observes that the addition of k∗∗ completes before op.

Considering concurrent operations on a data structure, linearizability [16] is the most popular framework for consistency. A concurrent data structure is linearizable if every execution has linearization points, between the invocation and response of each operation, where the operations seem to take effect instantaneously.
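To make the correctness requirement concrete, the following is a minimal sketch, in Java, of the sequential NNS semantics that a linearizable concurrent NNS must appear to satisfy at its linearization point: a plain linear scan under Euclidean distance. The class and method names are illustrative only; the points are those of the sample dataset used later in Figure 1.

```java
import java.util.Arrays;

class BruteForceNNS {
    // Euclidean distance ||a, b||_2 between two d-dimensional points.
    static double dist(double[] a, double[] b) {
        double s = 0;
        for (int i = 0; i < a.length; i++) {
            double diff = a[i] - b[i];
            s += diff * diff;
        }
        return Math.sqrt(s);
    }

    // Reference semantics of NNS: scan every point, keep the closest one.
    static double[] nearest(double[][] pts, double[] target) {
        double[] best = pts[0];
        for (double[] p : pts)
            if (dist(p, target) < dist(best, target)) best = p;
        return best;
    }

    public static void main(String[] args) {
        double[][] pts = {{0, 5}, {2, 7}, {5, 4}, {6, 6}, {7, 8}, {7, 3}, {9, 1}, {1, 1}};
        double[] nn = nearest(pts, new double[]{6, 4});
        System.out.println(Arrays.toString(nn)); // [5.0, 4.0]
    }
}
```

An index structure such as a kD-tree must return the same answer as this scan while visiting only a fraction of the dataset.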
In a concurrent setting, we desire linearizability of an NNS query.

Non-blocking progress guarantees are preferred for concurrent operations. In an asynchronous shared-memory system, where an infinite delay or a crash failure of a thread is possible, a lock-based concurrent data structure is vulnerable to pitfalls such as deadlock, priority inversion and convoying. On the other hand, in a non-blocking data structure, threads do not hold locks, and at least one non-faulty thread is guaranteed to finish its operation in a finite number of steps (lock-freedom). Wait-freedom is a stronger progress condition, which requires that every thread completes its operation in a finite number of steps.

In recent years, a number of practical lock-free search data structures have been designed: skip-lists [27], binary search trees (BSTs) [10, 12, 17, 23], etc. Despite the growing literature on lock-free data structures, the research community has largely focused on one-dimensional search problems. To our knowledge, no complete design of a lock-free multidimensional data structure exists in the literature.

The challenge appears in two ways: designing a concurrent lock-free multidimensional data structure that supports NNS, and ensuring the linearizability of NNS.

One of the most commonly used multidimensional data structures for NNS is the kD-tree, introduced by Bentley [4]. In principle, a kD-tree is a generalization of the BST to store multidimensional data. Friedman et al. [13] proved that a kD-tree can process an NNS in expected logarithmic time assuming uniformly distributed data points. Various efforts, including approximate solutions, have contributed to improving the performance of NNS in kD-trees [3, 22]. Furthermore, several parallel kD-tree implementations have been presented, specifically in the computer graphics community, where the focus is on accelerating applications such as ray tracing in the single-instruction-multiple-data (SIMD) programming model [29]. Unfortunately, these designs do not fit a concurrent setting where we desire linearizable NNS with concurrent modifications. For robotic motion planning, Ichnowski et al. [18] used a kD-tree of 3-dimensional data to which they add nodes concurrently. However, this design does not support Remove, and the canonical implementation of NNS, using recursive tree-traversal, is not linearizable.

Contributions: We introduce the LockFree-kD-tree (LFkD-tree), an efficient concurrent lock-free kD-tree (section 2). The LFkD-tree implements an abstract data type (ADT) that provides Add, Remove, Contains and NNS operations for a multidimensional dataset. We describe a novel lock-free linearizable implementation of NNS in the LFkD-tree (section 3). We provide a Java implementation of the LFkD-tree (code available at [7]). We evaluate our implementation against that of an existing sequential kD-tree and the PATRICIA-hypercube-tree [28]∗ (section 4).

∗ In this work, we are not interested in an existing parallel or sequential implementation that does not provide a Remove operation, in which case a lock-free design poses little challenge. We could find only these two existing implementations that provide Remove along with NNS.

1.2 A high-level summary of the work

The main challenge in implementing a linearizable NNS is to ensure that it is not oblivious to the concurrent modifications in the data structure. NNS requires an iterative scan, which collects, along with pruning, an atomic snapshot.

In general, concurrent data structures do not trivially support atomic snapshots. Some exceptions are the lock-based BST by Bronson et al. [5], the lock-free Trie by Prokopec et al. [25] and the lock-free k-ary search tree by Avni et al. [6]. Petrank et al. presented a method to support atomic snapshots in one-dimensional lock-free ordered data structures that implement sets [24]. They illustrated their method in lock-free linked-lists and skip-lists. Chatterjee [9] extended the method of [24] to propose lock-free linearizable range search in one-dimensional data structures.

The main idea in [9, 24] is augmenting the data structure with a pointer to a special object, which provides a platform for an Add/Remove/Contains operation to report modifications to a concurrent operation performing a full or partial snapshot. Nevertheless, collecting an atomic snapshot of a multidimensional data structure to perform an NNS would be naive. We need to adapt the procedure of iterative scan, which benefits from an efficient hierarchical space-partitioning structure, to a concurrent setting.

Our work proposes a solution based on augmenting a concurrent data structure with a pointer to a special object called a neighbour-collector. A neighbour-collector provides a platform for reporting concurrent modifications that can otherwise invalidate the output of a linearizable NNS.

Essentially, an operation NNS(α) first searches for an exact match of α in the data structure, and if it succeeds, returns α itself as its nearest neighbour. If an exact match is not found, before starting the iterative scan, NNS(α) announces itself. The announcement uses a new active neighbour-collector that contains the target point α and the current best guess for the nearest neighbour of α. On completing the iterative scan, it deactivates the neighbour-collector. A concurrent operation, after completing its steps, checks for any active neighbour-collector, and if one is found, reports its output if it is a better guess than the current best guess available. Finally, NNS(α) outputs the best guess among the collected and the reported neighbours as the nearest neighbour of α.

Naturally, there can be multiple concurrent NNS operations with different target points, and we must allow each of them to continue its iterative scan, after announcing it as soon as it begins. To handle multiple concurrent announcements, we use a lock-free linked-list of neighbour-collector objects. The data structure stores a pointer to one end of this list, say the head. A new neighbour-collector is allowed to be added only at the other end, say the tail.

Consequently, before announcing a new iterative scan, an NNS operation goes through the list and checks whether there is an active neighbour-collector with the same target point. If an active neighbour-collector is found, it is used for a concurrent coordinated iterative scan (explained in the next paragraph). A neighbour-collector is removed from the lock-free linked-list as soon as the associated iterative scan is completed. Hence, at any point in time, the length of the list is at most the number of active NNS operations.

During an iterative scan, a subset of the dataset is pruned depending on whether the distance of the target point from a bounding box covering the subset is greater than its distance from the current best guess. Now, if the current best guess at a neighbour-collector is the outcome of having already pruned many subsets, an NNS that starts its iterative scan at a later time-point, or is slow (or even delayed), will be able to complete much faster. Thus, the coordination among the concurrent NNS operations, via their iterative scans at the same neighbour-collector, speeds them up in aggregation.
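The announce-and-report idea summarized above can be sketched around a single shared collector. The Java schematic below is illustrative only (it is not the paper's NbrClctr class and omits the lock-free list of collectors): an NNS announces a collector for its target, and a concurrent updater CAS-installs any point that beats the collector's current best guess.

```java
import java.util.concurrent.atomic.AtomicReference;

// Schematic neighbour-collector: class and method names are illustrative.
class NeighbourCollector {
    final double[] target;
    volatile boolean active = true;
    // Current best guess for the nearest neighbour (null = none yet).
    final AtomicReference<double[]> best = new AtomicReference<>(null);

    NeighbourCollector(double[] target) { this.target = target; }

    double dist(double[] p) {
        double s = 0;
        for (int i = 0; i < p.length; i++)
            s += (p[i] - target[i]) * (p[i] - target[i]);
        return Math.sqrt(s);
    }

    // Called by a concurrent modification: retry the CAS until the reported
    // point is installed, or a guess at least as close is already present.
    void report(double[] p) {
        while (active) {
            double[] cur = best.get();
            if (cur != null && dist(cur) <= dist(p)) return;
            if (best.compareAndSet(cur, p)) return;
        }
    }

    public static void main(String[] args) {
        NeighbourCollector nc = new NeighbourCollector(new double[]{0, 0});
        nc.report(new double[]{3, 4});
        nc.report(new double[]{1, 1});
        System.out.println(java.util.Arrays.toString(nc.best.get())); // [1.0, 1.0]
    }
}
```

The CAS retry loop is what lets many reporters and one collecting scan agree on a single best guess without locks.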
The basic design of the LFkD-tree is based on the lock-free BST of Natarajan et al. [23]. To perform an iterative scan, we implement an efficient, fully non-recursive traversal using parent links, which is not available in [23]. Thus, to manage an extra link in each node, our design requires extra effort for the lock-free synchronization. The modify operations use single-word-sized atomic CAS primitives. The helping mechanism is based on operation descriptors at the child-links. Consequently, extra object allocations for synchronization are avoided. The linearizable implementation of NNS is not confined to the LFkD-tree, and it can be used in a similar concurrent implementation of any other multidimensional data structure available in [26]. Therefore, we describe the NNS operations independently and comprehensively. Other operations in the LFkD-tree, which are similar to their respective analogues in the lock-free BST of [23], are described at a high level in the paper and their details are included in the extended version [8]. We implemented the LFkD-tree algorithm in Java. The code is available at [7].

[Figure 1: LFkD-tree Structure — (a) an axis-orthogonal partition of a sample 2-dimensional dataset by hyperplanes such as x1 = 1, x1 = 4, x2 = 7, x2 = 5, x2 = 3, x1 = 6, x1 = 8; (b) the corresponding tree, in which internal-nodes (i, c) route traversals and the leaf-nodes store the data-points (0,5), (2,7), (5,4), (6,6), (7,8), (7,3), (9,1), (1,1).]

2 LockFree-kD-tree Implementation

2.1 Design Overview of the LFkD-tree

The LFkD-tree is a point kD-tree in which each node that stores data is assigned at most one data-point. Typically, to partition R^d, we use axis-orthogonal hyperplanes that are given by x_i = c, 1 ≤ i ≤ d. The structure, and consequently the NNS performance, of a kD-tree heavily depends on the splitting rule: the procedure to select the partitioning hyperplanes. In a sequential setting, to construct a kD-tree from static data, the partitioning hyperplanes are chosen to coincide with points that belong to the given dataset. In this approach, similar to an internal BST representation [10], each node is used for storing data. However, removing a node from an internal BST is costly, more so in a concurrent setting [10, 17]. With this in mind, we opt for an external BST representation [12, 23] to design the LFkD-tree. In this design, only leaf-nodes contain the data-points and internal-nodes route a traversal, see fig. 1(b). More importantly, it gives us the flexibility to compute c and i, 1 ≤ i ≤ d, for a hyperplane x_i = c which may not coincide with a data-point.

To compute the values of c and i in the scenarios where the entire dataset is available beforehand, a number of splitting rules exist in the literature [13, 22]. These rules focus on the hierarchical partition of a closed hyperrectangle that covers the entire dataset, and not only try to balance a kD-tree but also optimize its depth. In a concurrent setting, where we do not have knowledge of the entire dataset in advance, the partitioning hyperplane needs to be computed dynamically and in a very localized fashion. For the LFkD-tree, we formulate a simple and practical splitting rule, the local-midpoint rule, as given in section 2.2.

A leaf-node of a LFkD-tree Υ contains a unique data-point as its key, whereas an internal-node corresponds to a partitioning hyperplane. Without ambiguity, we denote a leaf-node containing key k = {k_i}_{i=1}^d ∈ R^d by N(k) (or N({k_i}_{i=1}^d)), and an internal-node associated with a hyperplane x_i = c by N(i, c). An internal-node has three links connected to its left-child, right-child and parent. We indicate the link emanating from a node N and into a node M by N→M. Access to Υ is given by the address of a unique node root. A node N is said to be present in Υ (N ∈ Υ) if it can be reached following the links starting from the root.

For each internal-node N(i, c), Υ maintains the following invariants (for the nodes in the subtrees rooted at N(i, c)): (i) a node N({k_i}_{i=1}^d) belongs to the left subtree if k_i < c, (ii) a node N({k_i}_{i=1}^d) belongs to the right subtree if k_i ≥ c, and (iii) both subtrees are themselves LFkD-trees. (i) and (ii) together are called the symmetric order of the LFkD-tree. Figure 1 illustrates the structure of a subtree of a LFkD-tree corresponding to a sample 2-dimensional dataset.

2.2 Sequential Behaviour of the ADT Operations

The LFkD-tree implements an abstract data type kDSet that provides the operations Add, Remove, Contains and NNS. Each operation starts with a query: from the root, traverse down Υ, at an internal-node deciding the left/right child using the symmetric order, until arrival at a leaf-node.

To perform Add(a), a ∈ R^d, if the query terminates at a leaf-node N(b), b ∈ R^d, and b = a (an element-wise comparison of keys), Add(a) returns false. However, if b ≠ a, we allocate a new internal-node N(i, c) with its child links connected to two leaf-nodes N(a) and N(b). If p(N(b)) was the parent of N(b) at the termination of the query, we connect the parent link of N(i, c) to p(N(b)). We update the link p(N(b))→N(b) to point to N(i, c) and return true. To compute i and c, we employ the local-midpoint rule as given below.

Local-midpoint rule: 1 ≤ i ≤ d is the index of the coordinate axis along which a and b have the maximum coordinate difference; if there is more than one such axis, select the one with the lowest index. Take the hyperplane as x_i = (a[i] + b[i])/2.

To perform Remove(a), if the leaf-node where the query terminates has the key a, i.e. N(a) ∈ Υ, we modify the link from the grandparent of N(a), denoted by g(N(a)), to its parent, to connect the sibling of N(a), s(N(a)), to g(N(a)), and return true. If N(a) ∉ Υ, Remove(a) returns false. To perform Contains(a), using a similar query we check whether N(a) ∈ Υ and return true or false accordingly.

The operation NNS(a) is non-trivial. On termination of the initial query, if we reach N(b) and b = a, clearly the nearest neighbour of a, available in the dataset stored in Υ, is a itself. However, if b ≠ a, we take b as our current best guess and check whether the other subtree of p(N(b)) (the current subtree consists of the single node N(b)) stores a better guess. Suppose that p(N(b)) = N(i, c). Now, any point on the other side of the hyperplane x_i = c will be at least at a distance |a_i − c| from the target point {a_i}_{i=1}^d.
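This lower bound, that any point across the hyperplane x_i = c is at least |a_i − c| away from the target, is what drives pruning; a standalone Java sketch (names illustrative, not the paper's code):

```java
// Pruning test: a subtree behind hyperplane x_i = c can be skipped when the
// bound |a_i - c| already exceeds the distance to the current best guess b.
class PruneBound {
    static double dist(double[] a, double[] b) {
        double s = 0;
        for (int j = 0; j < a.length; j++)
            s += (a[j] - b[j]) * (a[j] - b[j]);
        return Math.sqrt(s);
    }

    // true if the subtree on the far side of x_i = c cannot hold a point
    // closer to target a than the current best guess b.
    static boolean prune(double[] a, double[] b, int i, double c) {
        return Math.abs(a[i] - c) > dist(a, b);
    }

    public static void main(String[] args) {
        double[] a = {6, 4}, b = {5, 4};         // target and current best guess
        System.out.println(prune(a, b, 0, 8));   // |6-8| = 2 > 1  -> true
        System.out.println(prune(a, b, 1, 4.5)); // |4-4.5| = 0.5 < 1 -> false
    }
}
```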
Therefore, if |a_i − c| > ||a, b||_2 (the Euclidean distance between a and b), we must prune the other subtree, i.e. the one rooted at s(N(b)); otherwise we visit it in the next iteration. A subtree once visited is not visited again, and thus we traverse back to the root of Υ. At the termination of the iterative scan of Υ, the current best guess is returned as the nearest neighbour of a.

2.3 Lock-free Synchronization

As the basic structure of our LFkD-tree is based on an external BST, for the lock-free synchronization in the LFkD-tree we build upon the lock-free BST algorithm of [23]. The fundamental idea of the design is a lazy remove procedure, essentially based on a protocol of atomically injecting operation descriptors on the links connected to the node to be removed, and then modifying those links to disconnect the node from the LFkD-tree. If multiple concurrent operations try to modify a link simultaneously, they synchronize by helping the one pending operation that successfully injected its descriptor.

[Figure 2: LFkD-tree Structure — (a) the sentinel subtree: the internal-node (x1, ∞1) with leaf-nodes holding the d-tuples {∞2}^d and {∞0}^d; (b) injection of the descriptors mark, tag and flag on the links among a node a, its parent p(a), grandparent g(a) and sibling s(a); (c) the steps (i)–(iv) of a Remove operation.]

More specifically, to Remove the node N(a), as shown in fig. 2(b), we use a CAS to inject operation descriptors at the links p(N(a))→N(a), g(N(a))→p(N(a)) and p(N(a))→s(N(a)), in this order. We call these descriptors mark, tag and flag, respectively. An operation descriptor works as an information source about the steps already performed in Remove(a), and thus a concurrent operation, if obstructed at a link with a descriptor, helps by performing the remaining steps. In particular, a mark at a link indicates that the next step is to inject a tag at the link g(N(a))→p(N(a)), whereas a tag indicates that the next step is to inject the descriptor flag at the link p(N(a))→s(N(a)). Finally, a flag indicates the completion of the steps of injecting operation descriptors, and thereafter the required link updates are done. The helping mechanism ensures that the concurrent Add and Remove operations do not violate any invariant maintained by the LFkD-tree. The steps of a Remove operation are shown in fig. 2(c). An Add operation uses a single CAS to update the target link only if it is free from any operation descriptor; otherwise, it helps the pending Remove. A Contains or NNS operation does not perform help.

We call the CAS step which injects a mark at p(N(a))→N(a) the logical remove of a. After this step, a Contains(a) that reads p(N(a))→N(a) returns false. Accordingly, Add(a) helps to complete the pending Remove(a) if it reads p(N(a))→N(a) with a mark descriptor, and then reattempts its own steps. The helping mechanism guarantees that a logically removed node will eventually be detached from the LFkD-tree.

To realize the atomic step of injecting an operation descriptor, we replace a link, using a CAS, with a single-word-sized packet of a link and a descriptor. Given that a pointer delegates a link, a well-known method in C/C++ to pack extra information with a pointer in a single memory-word is bit-stealing. On an x86/64 machine, where memory allocation is aligned on a 64-bit boundary, the three least significant bits in a pointer are unused. The three operation descriptors used in our algorithm fit over these bits.

For ease of exposition, we assume that a memory allocator always allocates a variable at a new address and thus an ABA† problem does not occur. For lock-free memory reclamation in a C/C++ implementation of the algorithm, a method such as one based on reference counting [14] can be used, whereas a Java implementation traditionally relies on the JVM garbage collector. Furthermore, to avoid null pointers at the beginning of an application, we use a subtree containing an internal-node and two leaf-nodes which work as sentinel nodes, see fig. 2(a). The keys in the sentinel nodes maintain ∞0 > ∞1 > ∞2 > k_i, 1 ≤ i ≤ d, for any data-point {k_i}_{i=1}^d stored in the LFkD-tree. The sentinel internal-node N(1, ∞1) works as the root of the LFkD-tree and the entire dataset is stored in its left subtree.

† ABA is an acronym indicating a typical problem in a CAS-based lock-free algorithm: a value at a shared variable can change from A to B and then back to A, which can corrupt the semantics of the algorithm.

The focus of the paper is on the linearizable NNS; we have presented the detailed description of the operations Add, Remove and Contains in [8]. The full correctness proof is also presented in [8], but here, as a precursor, we state the linearization points of the operations Add, Remove and Contains.

2.4 Linearization points of the Add, Remove and Contains operations

For a successful Add operation, the execution of the CAS where a new internal-node is added is the linearization point. For an unsuccessful Add and a successful Contains, it is at the point where we read the address of the leaf-node with the matching key. For a successful Remove operation, the CAS that replaces a descriptor-free pointer with a mark, i.e. the logical remove step, is the linearization point. Linearization arguments for an unsuccessful Remove and a similar Contains have two cases: (a) if there existed a node N containing the query key in the LFkD-tree at the invocation, but it was logically or completely removed by a concurrent Remove operation before the return of Search(), the linearization point is placed just after the linearization point of that Remove operation; (b) if no node containing the query key existed in the LFkD-tree at the invocation of the Remove or Contains, the invocation point is taken as the linearization point.

3 Linearizable Nearest Neighbour Search

In this section, we begin with the algorithm that addresses the case where concurrent NNS operations have coinciding target points. We build on it to present the algorithm for the general case without any restriction.

First, we present the node-structures in the LFkD-tree, which will help in the subsequent discussion. The classes INode and LNode, which represent an internal- and a leaf-node respectively, are shown in lines 1 and 3 of Algorithm 1. Every INode, in addition to the fields i and c that
sent the associated hyperplane, has three pointers lt, rt and sense. Suppose that op2 and op3 both got delayed after their
pr that delegate the left-child, right-child and parent links, linearization. Now, if invocation of op happened after that,
respectively. A LNode contains only an array k to represent op is guaranteed to read N, if N contained the nearest neigh-
a data-point k={ki }di=1 ∈Rd . The node-pointer root, line 5, bour of the target point. But, if in between the linearization
delegates address of the sentinel node N(1, ∞1 ). As a con- of op3 and invocation of op, a concurrent Remove removed
vention, if x is a field of a class C, we use pc·x to indicate N, op will certainly not read it, and a reporting may ren-
the field x of an instance of C pointed by pc; and, the type der the linearization point of op to be shifted to even before
of a pointer to an instance of C is indicated by C∗. Note its invocation, which is undesired. To avoid this situation,
that, INode and LNode inherit Node. Now, before de- before every reporting, we first ascertain whether the node
scribing the NNS algorithms, we discuss the linearizability to be reported is logically removed by calling the method
of the operations. IsMark().
1 class INode {  A subclass of Node. 3.2 Concurrent NNS with coinciding target point
long i; double c;
Node∗ lt, rt, pr; When concurrent NNS operations have coinciding target
2 } points, they can output same result by adopting a single
3 class LNode {  A subclass of Node. atomic step, which is performed during the lifetime of one
double[] k; of them, as the linearization point for each of them; the
4 } real-time order amongst them can be taken as the order of
5 root := INode∗(1, ∞1 , null, null, null); any fixed step for example their invocation step. Thus, es-
root·lt := LNode∗({∞2 }d ); sentially they require a single iterative scan. Basically, it is
root·rt := LNode∗({∞0 }d ); similar to the linearizable snapshot algorithm of [24]. The
pseudo-code of the algorithm is given in Algorithm 2.
Algorithm 1. The node structure in the LFkD-tree The class Nebr, line 1, represents a packet of a data-point,
as contained in a leaf-node pointed by the node-pointer a,
3.1 Linearization argument and its distance, given as d, from the target point of an NNS.
Consider the concurrent modifications in the LFkD-tree, The class NbrClctr, line 2, represents a neighbour-collector :
when an NNS operation, say op, performs its iterative scan. the platform for collecting and reporting the nearest neigh-
We can ensure, by checking whether IsMark() returns true, bour. NbrClctr contains pointers to two Nebr instances:
that the key of a leaf-node which was logically removed, is col points to one that contains collected data-point during it-
never collected as a current guess for the nearest neighbour. erative scan by an NNS operation and rep points to one that
Similar to a Contains operation, we can place the lineariza- contains a data-point reported by a concurrent operation, in
tion point of op at the point where it reads the pointer to the addition to the target point tgt. It also contains a boolean
leaf-node, say N, whose key is returned as the nearest neigh- isAct, which if set true, implies an active neighbour-collector;
bour. Now, if N is logically removed after it was read by op, and a neighbour-collector-pointer nxt, which is used in Al-
by a concurrent Remove operation, say op1 , which returns gorithm 3. The LFkD-tree is augmented with a pointer ncp,
before the return of op, we still do not lose the linearizability line 4, initialized to point to an inactive neighbour-collector.
argument, simply because the linearization point of op1 is 3.2.1 The coordinated iterative scan:
ordered after that of op. The operation NNS, line 5–10, starts with calling the method
However, in case of a concurrent Add operation, if we Seek(), line 6, to perform the initial query to arrive at a leaf-
adopt a similar approach, we may be at the risk of return- node. The process of a query down the kD-tree is trivial;
ing not the latest nearest neighbour, which we desire as ex- the pseudo-code of Seek() is given in [8]. If the pointer to
plained in the section 1.1, while maintaining linearizability. leaf-node a is free of descriptor mark, which indicates that
We explain it here. Suppose op2 is the Add operation that added N(k∗∗) to the LFkD-tree as described in Section 1.1. Suppose that op2 got delayed after completing the Add and the NNS operation op returned k∗. Now, we can order the linearization point of op before that of op2; however, the return of op is inconsistent with any operation that observes the outcome of both op and op2. Thus, op2 needs to report its modification to op after completing its own steps.

Additionally, if in the meantime a concurrent Contains operation, say op3, read N(k∗∗) and returned as usual, we may again lose linearizability, because to an outside observer the addition of a better guess is visible, possibly before the return of op, by way of op3, although op did not return it. Therefore, op3 also needs to report its output to op. Now, given that op2 and op3 are made to report their output to op, we need to change the linearization point of op. To maintain the order, we put the linearization point of op just after that of op2 or op3, if op happens to return the nearest neighbour that was reported by one of them. Note that we need to be careful about unnecessary reporting, which may possibly be harmful as well, in the following

that the node pointed by a is not logically removed, and if the query key k matches at the leaf-node, which is checked by the distance between k and the key at the leaf-node, k itself is the nearest neighbour available in the dataset and NNS returns, line 10. Otherwise, NNS calls NNSync() to perform further steps and returns the nearest neighbour, line 9. The arrays hi and lo are used to support the non-recursive traversal, described in [8]. NNSync() and the methods called subsequently are described here.

The method NNSync(), line 18–29, starts with checking whether ncp points to an active neighbour-collector; if it does not, it allocates a new active neighbour-collector and attempts a CAS to modify ncp to point to the new one, line 23. In case ncp was pointing to an active neighbour-collector, the current best guess of the nearest neighbour, as contained in the leaf-node, is attempted to be added to it. On an active neighbour-collector, the method Collect() is called to perform a coordinated iterative scan, line 28. Collect(), line 11–14, calls the method NextGuess(), line 12, to perform the next iteration that can better the current best guess of the nearest neighbour. NextGuess() performs a
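The exact-match short-circuit described above (lines 7–10 of the pseudocode below) hinges on the Euclidean distance between the query key and the key at the reached leaf being zero. The following is a minimal Java sketch of that check; the class and method names are ours, not taken from the paper's implementation.

```java
// Hypothetical sketch of the initial step of NNS: compute the Euclidean
// distance ||k, q||_2 between the query key and the leaf key; a distance of
// zero means an exact match, so the query key itself is the answer.
final class DistanceSketch {
    // Euclidean distance between two points of equal dimension.
    static double distance(double[] k, double[] q) {
        double sum = 0.0;
        for (int i = 0; i < k.length; i++) {
            double diff = k[i] - q[i];
            sum += diff * diff;
        }
        return Math.sqrt(sum);
    }

    // Mirrors the dst != 0 test: an exact match short-circuits the scan.
    static boolean isExactMatch(double[] k, double[] leafKey) {
        return distance(k, leafKey) == 0.0;
    }
}
```

If the distance is non-zero, the search must continue with the coordinated iterative scan; only then do the reporting mechanisms discussed above come into play.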
1  class Nebr { Node∗ a; double d; }  ▷ Neighbour
2  class NbrClctr {  ▷ Neighbour-collector
       double[] tgt; bool isAct;
       Nebr∗ col, rep; NbrClctr∗ next;
3  }
4  ncp := NbrClctr∗(null, false, null, null, null);

NNS(double[] k)
5  pa := root; a := pa·lt; hi := {∞}^d; lo := {−∞}^d;
6  ⟨pa, a⟩ := Seek(pa, a, k, hi, lo);
7  dst := IsMark(a) ? ∞ : ||k, a·k||₂;
8  if dst ≠ 0 then
9      return NNSync(pa, a, dst, k, hi, lo);
10 else { Sync(pa, a); return k; }

Collect(Node∗ pa, Node∗ a, double[] k, double[] hi, double[] lo, double dst, NbrClctr∗ nn)
11 while pa ≠ Ptr(root) and dst ≠ 0 do
12     ⟨pa, a⟩ := NextGuess(pa, a, dst, k, hi, lo);
13     if ChkValid(pa, a) then
14         dst := AdNebr(a, nn, col);
15 return nn;

Seek(Node∗ pa, Node∗ a, double[] k, double[] hi, double[] lo)
16 · · · ; return ⟨pa, a⟩;  ▷ See [8].

NextGuess(Node∗ pa, Node∗ a, double dst, double[] k, double[] hi, double[] lo)
17 · · · ; return ⟨pa, a⟩;  ▷ See [8].

NNSync(Node∗ pa, Node∗ a, double dst, double[] k, double[] hi, double[] lo)
18 while true do
19     on := ncp;
20     if on·isAct = false then
21         cN := Nebr∗(a, dst);
22         nn := NbrClctr∗(k, true, cN, cN, null);
23         if CAS(ncp·ref, on, nn) then break;
24     else
25         if ChkValid(pa, a) then
26             dst := AdNebr(a, on, col);
27         nn := on; break;
28 nn := Collect(pa, a, dst, k, hi, lo, nn);
29 Deactivate(nn); return Process(nn);

Report(Node∗ a, NbrClctr∗ nn)
30 AdNebr(a, nn, rep);

Sync(Node∗ pa, Node∗ a)
31 if ncp·isAct then
32     ⟨d, nb⟩ := NearNbr(a, ncp);
33     if nb ≠ null and ChkValid(pa, a) then
34         Report(a, ncp);

AdNebr(Node∗ a, NbrClctr∗ nn, bool nt)  ▷ nt (Neighbour-type): col or rep.
35 while true do
36     nbr := (nt = col) ? nn·col : nn·rep;
37     if nn·isAct and !IsFinish(nbr) then
38         ⟨dst, nb⟩ := NearNbr(a, nn);
39         if nb = null then return dst;
40         if nt = col then
41             res := CAS(nn·col·ref, nbr, nb);
42         else res := CAS(nn·rep·ref, nbr, nb);
43         if res then return dst;
44     else return 0;

NearNbr(Node∗ a, NbrClctr∗ nn)
45 distTgt := ||a·k, nn·tgt||₂;
46 col := nn·col; rep := nn·rep;
47 if distTgt < col·d and distTgt < rep·d then
48     return ⟨distTgt, Nebr∗(a, distTgt)⟩;
49 else return ⟨distTgt, null⟩;

Deactivate(NbrClctr∗ nn)
50 BlockNebr(nn, col);
51 nn·isAct := false; BlockNebr(nn, rep);

Process(NbrClctr∗ nn)
52 if nn·rep·d < nn·col·d then
53     return nn·rep·a;
54 else return nn·col·a;

BlockNebr(NbrClctr∗ nn, bool nt)  ▷ nt (Neighbour-type): col or rep.
55 nbr := (nt = col) ? nn·col : nn·rep;
56 while !IsFinish(nbr) do
57     if nt = col then
58         CAS(nn·col·ref, nbr, Finish(nbr));
59     else CAS(nn·rep·ref, nbr, Finish(nbr));
60     nbr := (nt = col) ? nn·col : nn·rep;

ChkValid(Node∗ pa, Node∗ a)
61 k := a·k; ch := Child(pa, Dir(pa, k));
62 while Ptr(ch)·class ≠ LNode do
63     ch := Ptr(Child(ch, Dir(ch, k)));
64 if IsMark(ch) then return false;
65 return (ch = a) ? true : false;

Algorithm 2. Linearizable NNS operations with a single target point in the LFkD-tree
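The heart of Algorithm 2's coordination is the pair NearNbr()/AdNebr(): a candidate neighbour is installed into a collector slot with a CAS only if it is strictly closer to the target than the current holder, retrying when a concurrent installation wins. The following Java sketch shows that "improve-only CAS loop" in isolation; the names (BestGuessSketch, offer) are ours and the real implementation additionally handles the finish descriptor and the col/rep distinction.

```java
import java.util.concurrent.atomic.AtomicReference;

// A minimal sketch of a lock-free "best guess" slot: a new neighbour is
// installed with CAS only if it is strictly closer than the current one;
// on CAS failure the attempt is retried against the fresh value.
final class BestGuessSketch {
    static final class Nebr {          // candidate neighbour: just its distance here
        final double dist;
        Nebr(double dist) { this.dist = dist; }
    }

    private final AtomicReference<Nebr> best =
            new AtomicReference<>(new Nebr(Double.POSITIVE_INFINITY));

    // Returns true iff the guess was installed as the new best.
    boolean offer(double dist) {
        while (true) {
            Nebr cur = best.get();
            if (dist >= cur.dist) return false;      // not an improvement
            if (best.compareAndSet(cur, new Nebr(dist))) return true;
            // CAS failed: a concurrent thread improved the slot; re-check.
        }
    }

    double bestDistance() { return best.get().dist; }
}
```

Because the slot only ever moves towards smaller distances, a delayed scanner can consume improvements contributed by concurrent operations, which is exactly how the coordinated iterative scan speeds up a lagging NNS.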
classical non-recursive traversal in the kD-tree. The non-recursive method improves the traversal in a concurrent setting, where the kD-tree can change its structure during the traversal. We describe NextGuess() in [8].

Before attempting to add the new guess, contained in a leaf-node, to the neighbour-collector using the method AdNebr(), it is always checked whether the leaf-node is logically removed, by calling the method ChkValid(). Please note that, given a (possibly stale) pointer to a leaf-node, we cannot directly check whether it was logically removed. Therefore, we also supply the pointer to the parent; thus the method ChkValid(), line 61–65, gets the latest pointer to the leaf-node, taking into account that a new internal-node may have been added between the parent of the leaf-node and the leaf-node to be reported.

AdNebr(), line 35–44, is called to add a collected or reported neighbour to an active neighbour-collector. It calls the method NearNbr(), shown in line 45–48, which returns a new neighbour only if the distance of the new guess is less than the distances of the already collected or reported neighbours at the neighbour-collector.

After the iterative scan, the method Deactivate() is called by NNSync() at line 29. Deactivate(), line 50–51, besides setting isAct to false, also injects a descriptor finish at both the neighbour-pointers of the neighbour-collector using the method BlockNebr(). BlockNebr(), line 55–60, performs a CAS to replace a neighbour-pointer with one that has the descriptor finish over it, see lines 58 and 59. It ensures that all the concurrent NNS operations using the same neighbour-collector have the same view of it after linearization. The method IsFinish() returns true when called on a neighbour-pointer with the descriptor finish. Thus, AdNebr() cannot add a new neighbour to a neighbour-collector if the corresponding pointer is injected with finish, see line 37.

Finally, the method Process(), line 52–54, is called by NNSync() to select the better candidate between the reported and the collected neighbours of the target point, which is returned to the caller NNS to output. Note that, once a neighbour-collector is deactivated by an NNS, the method AdNebr() returns 0, line 44. This, in turn, immediately terminates the while loop in Collect() at line 11. Thus, as mentioned in Section 1.2, we can observe that the coordination among the concurrent iterative scans at the same neighbour-collector helps a delayed NNS to complete faster.

3.2.2 The reporting methods:
Add or Contains operations (detailed pseudo-code included in [8]), after completion, use the method Sync(), line 31–34, to synchronize with concurrent NNS operations. Sync() first checks the status of the neighbour-collector, then calls NearNbr() to create a neighbour. If the point to be reported is not better than the current best guess available, NearNbr() returns null and then Sync() returns without any change. Otherwise, it checks whether the leaf-node with the point to be reported is logically removed, by calling the method ChkValid(), and then calls the method Report(), which in turn calls AdNebr() to add the reported neighbour, line 30.

3.3 A general Concurrent NNS with multiple target points
To allow multiple concurrent NNS with non-coinciding target points to progress together, we need to have as many active neighbour-collectors as the number of different target points. Essentially, we need to have a dynamic list of neighbour-collectors. In this list, before adding a new neighbour-collector, an NNS must scan through it so that, if there is already an active neighbour-collector with a matching target point, coordination among the concurrent iterative scans with coinciding target points can be achieved. For each of the operations in the LFkD-tree to be lock-free, we ensure the lock-freedom of this list as well. Hence, we augment the LFkD-tree with a single-word-CAS-based lock-free list of neighbour-collectors.

The linearization points remain as before: the concurrent NNS with coinciding target points share an atomic step during the lifetime of one of them as their linearization point, with some order among themselves; the other operations linearize as described in Section 2.4.

3.3.1 Algorithm:
The pseudo-code of the algorithm is given in Algorithm 3, in which every method is identical to that in Algorithm 2, except NNSync() and Sync(). The list is initialized with two sentinel nodes pointed by tail and head, with head·nxt set as tail, as given in lines 1 and 2. A new neighbour-collector is added to this list at one end only, just before the node pointed by tail. The method of maintaining this list is similar to the lock-free linked-list of Harris [15], except that no addition happens anywhere in the middle of the list. Removal of a neighbour-collector, say one pointed by c, takes two successful CAS steps: first we inject a mark descriptor at c·nxt using a CAS and then modify the pointer p·nxt to n with a CAS, if p and n happened to be the pointers to the predecessor and successor, respectively, of the neighbour-collector pointed by c. We use the method Mark() to get a word-sized packet of a neighbour-collector-pointer and the descriptor mark, whereas the method Ptr() masks the descriptor off such a packet and does not change a neighbour-collector-pointer. Adding a neighbour-collector takes a single successful CAS, similar to [15].

The method NNSync(), line 3–20, as called by NNS after the initial query in Algorithm 2, starts with traversing the list. We maintain an enum variable mode that indicates the stages of NNSync(). Initially, the mode is INIT. During the traversal, if an active neighbour-collector with a matching target point is found, the mode is changed to COLLECT and the traversal terminates, line 13. Otherwise, the traversal terminates in the mode INIT itself. On the termination of the traversal in the mode INIT, it is checked whether the neighbour-collector where the traversal terminated (in this case c) is already logically removed, line 15, and if it is, a CAS is attempted to detach it from the list and the traversal is restarted, line 16.

After that, if the mode is INIT or COLLECT, the method Finalize() is called. Finalize(), line 40–48, if called in the mode INIT, allocates a new neighbour-collector by calling the method Allocate(); otherwise it uses the input neighbour-collector. If Allocate() could not add a new neighbour-collector, it returns null and the entire process restarts from scratch with a fresh traversal. After successfully adding a new neighbour-collector to the list, or asserting that it needs to use an existing one, Finalize() calls the methods Collect() and Deactivate() similar to those in Algorithm 2. On deactivating the neighbour-collector, the method Clean() is called to remove it from the list and return the value of the nearest neighbour. Clean(), line 25–31, performs the two CAS steps to re-
1  tail := NbrClctr∗(null, false, null, null, null);
2  head := NbrClctr∗(null, false, null, null, tail);

NNSync(Node∗ pa, Node∗ a, double dst, double[] k, double[] hi, double[] lo)
3  nn := null; mode := INIT;
4  retry:
5  while true do
6      p := null; c := head; n := c·nxt;
7      while Ptr(n) ≠ tail do
8          if n = nn and mode = CLEAN then
9              if (val := Clean(c, nn)) ≠ null then
10                 return val;
11             else goto retry;
12         else if k = n·tgt and n·isAct then
13             nn := n; mode := COLLECT; break;
14         else { p := c; c := n; n := n·nxt; }
15     if mode = INIT and IsMark(n) then
16         CAS(p·nxt·ref, c, Ptr(n)); goto retry;
17     if mode ≠ CLEAN then
18         ⟨val, mode⟩ := Finalize(pa, a, dst, k, hi, lo, p, c, mode);
19         if val ≠ null then return val;
20     else return Process(nn);

Allocate(Node∗ a, double dst, double[] k, NbrClctr∗ c)
21 cNb := Nebr∗(a, dst);
22 nn := NbrClctr∗(k, true, cNb, cNb, tail);
23 if CAS(c·ref, on, nn) then return nn;
24 else return null;

Clean(NbrClctr∗ pre, NbrClctr∗ nn)
25 nxt := nn·nxt;
26 while !IsMark(nxt) do
27     CAS(nn·nxt·ref, nxt, Mark(nxt));
28     nxt := nn·nxt;
29 if CAS(pre·nxt·ref, nn, Ptr(nxt)) then
30     return Process(nn);
31 else return null;

Sync(Node∗ pa, Node∗ a)
32 n := head·nxt;
33 while n ≠ tail do
34     if n·isAct then
35         nb := NearNbr(a, n);
36         if nb ≠ null and ChkValid(pa, a) then
37             Report(a, n);
38         else break;
39     else n := Ptr(n·nxt);

Finalize(Node∗ pa, Node∗ a, double dst, double[] k, double[] hi, double[] lo, NbrClctr∗ p, NbrClctr∗ c, enum md)  ▷ md (mode)
40 if md = COLLECT then nn := c; pre := p;
41 else if md = INIT then
42     nn := Allocate(a, dst, k, c); pre := c;
43     if nn ≠ null then md := COLLECT;
44 if md = COLLECT then
45     nn := Collect(pa, a, dst, k, hi, lo, nn);
46     Deactivate(nn); md := CLEAN;
47 if (val := Clean(pre, nn)) ≠ null then
48     return ⟨val, md⟩;

Algorithm 3. Linearizable NNS with multiple distinct target points in the LFkD-tree
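The mark-then-detach removal used for the neighbour-collector list can be sketched in Java as below, in the spirit of Harris's lock-free linked-list [15]. This is our illustrative sketch, not the paper's code: it uses AtomicMarkableReference to pack the mark descriptor with the next-pointer, adds only at the end of the list (just before tail), and removes with the two CAS steps described above.

```java
import java.util.concurrent.atomic.AtomicMarkableReference;

// Sketch of a lock-free list of "collectors" with Harris-style removal:
// step 1 marks the victim's nxt pointer (logical removal), step 2 swings
// the predecessor's nxt pointer past it (physical detachment).
final class CollectorListSketch {
    static final class Node {
        final String id;
        final AtomicMarkableReference<Node> nxt;
        Node(String id, Node succ) {
            this.id = id;
            this.nxt = new AtomicMarkableReference<>(succ, false);
        }
    }

    final Node tail = new Node("tail", null);
    final Node head = new Node("head", tail);

    // Adds a node just before tail with a single successful CAS.
    Node add(String id) {
        while (true) {
            Node pre = head;
            Node cur = pre.nxt.getReference();
            while (cur != tail) { pre = cur; cur = cur.nxt.getReference(); }
            Node nn = new Node(id, tail);
            if (pre.nxt.compareAndSet(tail, nn, false, false)) return nn;
            // CAS failed: another thread appended first; retraverse and retry.
        }
    }

    // Two-CAS removal: mark nn.nxt, then detach nn from its predecessor.
    boolean remove(Node pre, Node nn) {
        Node succ = nn.nxt.getReference();
        while (!nn.nxt.isMarked()) {                 // step 1: inject mark
            nn.nxt.compareAndSet(succ, succ, false, true);
            succ = nn.nxt.getReference();
        }
        // step 2: swing the predecessor past the marked node
        return pre.nxt.compareAndSet(nn, succ, false, false);
    }

    boolean contains(String id) {
        for (Node n = head.nxt.getReference(); n != tail; n = n.nxt.getReference())
            if (n.id.equals(id)) return true;
        return false;
    }
}
```

Once a node's nxt pointer carries the mark, no CAS expecting an unmarked pointer can succeed on it, so a marked collector can no longer gain successors; this is what makes the logical removal a clean linearization point.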
move the neighbour-collector and calls the method Process(), line 30, to compute the nearest neighbour. However, if, after injecting mark, it could not modify the nxt pointer of the predecessor, it returns null, which again causes a fresh traversal in the mode CLEAN in Finalize(). A traversal in the mode CLEAN, if it finds the deactivated neighbour-collector, calls the method Clean(), line 10, to redo the remaining steps and return the nearest neighbour. If the traversal terminates in the mode CLEAN, that implies that a concurrent NNS must have detached the deactivated neighbour-collector, and therefore Process() is called to finish, line 20.

3.4 Relaxation in Consistency Requirements
Practitioners prefer higher throughput at the cost of an exact solution in various applications of NNS, which is commonly known as approximate-NN (ANN). Generally, in a hierarchical multidimensional data structure like the kD-tree, ANN algorithms relax the pruning criterion so that an NNS operation visits fewer subsets and thereby speeds up the performance. Implementing ANN in a concurrent hierarchical multidimensional data structure may not directly impact the design-complexity as long as we follow the same consistency framework. However, to get a higher throughput at the cost of an exact solution, we can certainly explore the relaxation in the consistency requirements of an NNS operation.

Suppose that we do not make an Add or a Contains operation report its output to a concurrent NNS, and each NNS progresses in oblivion to any concurrent NNS. Considering a point of reference local to a thread, an NNS outputs the nearest neighbour that it discovers with the certainty that there was no operation performed by the thread itself that can alter the result. Clearly, this relaxed arrangement satisfies the requirements of sequential consistency [11]. The operation NNSRelaxed, described in Algorithm 4, implements a sequentially consistent version of NNS. We shall utilize its experimental performance to empirically evaluate the overhead of linearizability with respect to NNS.

NNSRelaxed(double[] k)
1  pa := root·ref; a := pa·lt; hi := {∞}^d; lo := {−∞}^d;
2  ⟨pa, a⟩ := Seek(pa, a, k, hi, lo);
3  dst := IsMark(a) ? ∞ : ||k, a·k||₂;
4  if dst ≠ 0 then
5      while pa ≠ Ptr(root) do
           ⟨pa, a⟩ := NextGuess(pa, a, dst, k, hi, lo);
6  return a;

Algorithm 4. Relaxed NNS operations in the LFkD-tree

4 Experimental Evaluation
Hardware Platform: We implemented the LFkD-tree algorithm in Java using RTTI. We used AtomicReferenceFieldUpdater objects for CAS. The test environment comprised a dual-socket server with 2.0 GHz Intel(R) Xeon(R) E5-2650 processors with 8 physical cores each (32 hardware threads in total with hyper-threading enabled). The server has 64 GB of RAM and runs Ubuntu 13.04 Linux (3.8.0-35-generic x86_64) with the Java HotSpot(TM) 64-Bit VM (build 25.60-b23).

Data Structures:
1. Levy-Kd: An implementation of the kD-tree of [19, 21] by Levy that supports Remove operations (we could not find any other kD-tree with Remove in Java).
2. LFKD: Our implementation of the LFkD-tree with NNS.
3. LFKD(SC): The LFkD-tree with NNSRelaxed.
4. PH-tree: A multi-dimensional storage and indexing data structure by Zäschke et al. [28] that supports Remove operations. The implementation is single-threaded.

Datasets: We performed the evaluation using a 2D real-world dataset and a set of synthetic benchmarks. For the real-world dataset, we used the United States Census Bureau 2010 TIGER/Line KML [1] dataset, which consists of polylines describing map features of the United States of America. TIGER/Line is a standard dataset used for benchmarking spatial databases. Two synthetic datasets represent more extreme cases. The SKEWED dataset, described by Arge et al. [2], is one in which different dimensions have different distributions. The CLUSTER dataset [28] is an extension of a synthetic dataset previously described by Arge et al. [2]. It consists of 10000 clusters of points evenly spaced on a horizontal line. A detailed description of the datasets is given in [8].

Workload and Methodology: We run each test for 5 seconds and measured throughput as the operations per microsecond executed by all threads. We run each experiment in a separate instance of the JVM, starting off with a 2-second "warm-up" period. During this warm-up phase, we performed random Add, Remove and Contains operations, and then flushed the tree at the end of the period. At the start of each execution, the data structure is pre-filled with a set of keys in the selected key-range.

To simulate the variation in contention and tree structure, we chose the following combinations of workload configurations: i) dataset space dimension ∈ {2, 3, 4, 5}, ii) number of key entries ∈ {(0–10^6), (0–10^7)}, iii) distribution of (Add-Remove-NNS) ∈ {(05, 05, 90), (25, 25, 50), (50, 50, 00)}, and iv) number of threads ∈ {1, 2, 4, 8, 16, 32}. We have not included Contains operations in the experiments because essentially it would increase the proportion of exact-match NNS. All executions use the same set of randomly generated points for the selected workload characteristics. The graphs present average throughput over 6 runs.

Figure 3: 2-D TIGER/Line dataset. (Throughput (ops/µs) vs. number of threads; series: LFKD, LFKD(SC NNS), Levy-Kd, PH-tree.)

4.1 Observations and Discussion
Figures 3, 4(a) and 4(b) show the performance of the implementations for the TIGER/Line, SKEWED and CLUSTER datasets, respectively. In Figures 4(a) and 4(b), each row represents a combination of the range of keys (k=N, N being the maximum) and the associated workload distribution, while each column represents the dimensionality of the keys (d=dimension).

It is easy to observe that LFKD and LFKD(SC) have higher performance compared to both the PH-tree and the Levy-Kd in single thread cases for all workload distributions. Thus it is evident that a sequential‡ implementation of the LFkD-tree outperforms similar implementations.

The performance significantly scales up with increasing thread count. This shows that our implementation is both lightweight and scalable. As we increase the key dimension, the performance degrades for workloads dominated by the NNS. This degradation with increasing key dimensions is expected in kD-trees due to the curse of dimensionality [26]. This performance pattern is identical for different key ranges. We further observe that, as expected, LFKD(SC) outperforms LFKD in NNS-dominated workloads; however, the gap reduces with the increasing dimensionality of the dataset, which brings an increased load of BFS traversal. This can be explained in terms of the additional step complexity on account of the reporting and the maintenance of the augmented lock-free list for linearizability. More importantly, it provides a significant exposition of NNS vis-à-vis the consistency framework of a concurrent implementation: the overhead of linearizability, which is visible in a low dimension, gets subsumed by the cost of the iterative scan, which visits almost every node of the kD-tree as the dimension increases.

For the TIGER/Line dataset, in a single thread case, both LFKD and LFKD(SC) perform at least 2.5× better than Levy-Kd, and it goes up to 19× in the NNS-dominated workload. Additionally, the PH-tree outperforms the Levy-Kd only for workloads that do not involve NNS (00% NNS, 50% Add and 50% Remove).

We observe that for the NNS-dominated workload (90% NNS, 5% Add and 5% Remove), LFKD(SC) achieves speed-ups up to 66× for the SKEWED and up to 150× for the CLUSTER datasets over the sequential implementations. These observations can be partially attributed to the local-midpoint rule, which carries the essence of the sliding-midpoint splitting rule of [22], which targets extreme cases such as a CLUSTER dataset, to a concurrent setting.

For a mixed workload (50% NNS, 25% Add and 25% Remove), the performance of the LFkD-tree degrades with increasing dimension. The throughput figures are higher for the NNS-dominated workload in lower dimensions than in mixed workloads. This is because the modify operations incur higher synchronization (conflicts, expensive atomic operations, and helping) overhead. However, in higher dimensions, the throughput of the NNS is lower as the number of visited nodes increases tremendously with dimension.

5 Conclusion and Future Work
For a large number of applications, which require a multidimensional data structure supporting modifications along with nearest neighbour search, the research community has largely focused on improving the design of sequential data structures. Parallel implementations of the sequential designs speed up NNS on a static data structure. However, they do not address the issue of dynamic modifications in the datasets. On the other hand, the concurrent data structure research is primarily confined to one-dimensional problems. Our work is the first to extend the concurrent data structures to problems covering multidimensional datasets. We introduced the LFkD-tree, a lock-free design of the kD-tree, which supports linearizable nearest neighbour search operations with concurrent dynamic addition and removal of data. We provided a sample implementation which shows that the LFkD-tree algorithm is highly scalable.

‡ We have not plotted the single thread performance separately in the interest of the limited space. The graphs depict the single thread comparison as well as the scalability of the LFkD-tree with respect to the number of threads.

Our method to implement the linearizable nearest neigh-
Figure 4: Performance on the SKEWED(6) and CLUSTER datasets. A column corresponds to the dimensionality.
bour search is generic and can be adapted to other multidimensional data structures. We plan to design lock-free data structures which are suitable for nearest neighbour search in high dimensions, such as the ball-tree [20]. We also plan to extend our work to k-nearest neighbour (kNN) search.

6 References
[1] https://www.census.gov/geo/maps-data/data/tiger.html.
[2] L. Arge, M. D. Berg, H. Haverkort, and K. Yi. The priority r-tree: A practically efficient and worst-case optimal r-tree. ACM Trans. Algorithms, 4(1):9:1–9:30, 2008.
[3] S. Arya and H.-Y. A. Fu. Expected-case complexity of approximate nearest neighbor searching. SIAM Journal on Computing, 32(3):793–815, 2003.
[4] J. L. Bentley. Multidimensional binary search trees used for associative searching. CACM, 18(9):509–517, 1975.
[5] N. G. Bronson, J. Casper, H. Chafi, and K. Olukotun. A practical concurrent binary search tree. SIGPLAN Not., 45(5):257–268, 2010.
[6] T. Brown and H. Avni. Range queries in non-blocking k-ary search trees, pages 31–45. Springer Berlin Heidelberg, 2012.
[7] B. Chatterjee. CKDTree. https://tinyurl.com/hl3oktt.
[8] B. Chatterjee. Concurrent linearizable nearest neighbour search in lockfree-kd-tree. Technical Report 2015:08, ISSN 1652-926X, Chalmers University of Technology, 2015.
[9] B. Chatterjee. Lock-free linearizable 1-dimensional range queries. In 18th ICDCN, pages 9–18. ACM, 2017.
[10] B. Chatterjee, N. Nguyen, and P. Tsigas. Efficient lock-free binary search trees. In 33rd PODC, pages 322–331, 2014.
[11] M. Dubois and C. Scheurich. Memory access dependencies in shared-memory multiprocessors. IEEE Transactions on Software Engineering, 16(6):660–673, 1990.
[12] F. Ellen, P. Fatourou, E. Ruppert, and F. van Breugel. Non-blocking binary search trees. In 29th PODC, pages 131–140, 2010.
[13] J. H. Friedman, J. L. Bentley, and R. A. Finkel. An algorithm for finding best matches in logarithmic expected time. ACM Transactions on Mathematical Software (TOMS), 3(3):209–226, 1977.
[14] A. Gidenstam, M. Papatriantafilou, H. Sundell, and P. Tsigas. Efficient and reliable lock-free memory reclamation based on reference counting. IEEE Transactions on Parallel and Distributed Systems, 20(8):1173–1187, 2009.
[15] T. L. Harris. A pragmatic implementation of non-blocking linked-lists. In DISC, pages 300–314. Springer, 2001.
[16] M. P. Herlihy and J. M. Wing. Linearizability: A correctness condition for concurrent objects. ACM Transactions on Programming Languages and Systems (TOPLAS), 12(3):463–492, 1990.
[17] S. V. Howley and J. Jones. A non-blocking internal binary search tree. In 24th SPAA, pages 161–171, 2012.
[18] J. Ichnowski and R. Alterovitz. Scalable multicore motion planning using lock-free concurrency. IEEE Transactions on Robotics, 30(5):1123–1136, 2014.
[19] S. D. Levy. KDTree. edu.wlu.cs.levy.CG.KDTree.
[20] T. Liu, A. W. Moore, and A. Gray. New algorithms for efficient high-dimensional nonparametric classification. Journal of Machine Learning Research, 7(Jun):1135–1158, 2006.
[21] A. W. Moore. Efficient memory-based learning for robot control. Technical Report 209, University of Cambridge, 1991.
[22] D. M. Mount and S. Arya. ANN: a library for approximate nearest neighbor searching. http://www.cs.umd.edu/~mount/ANN/, 1998.
[23] A. Natarajan and N. Mittal. Fast concurrent lock-free binary search trees. In 19th PPoPP, pages 317–328, 2014.
[24] E. Petrank and S. Timnat. Lock-free data-structure iterators. In Distributed Computing, pages 224–238. Springer, 2013.
[25] A. Prokopec, N. G. Bronson, P. Bagwell, and M. Odersky. Concurrent tries with efficient non-blocking snapshots. In ACM SIGPLAN Notices, volume 47, pages 151–160. ACM, 2012.
[26] H. Samet. Foundations of Multidimensional and Metric Data Structures. Morgan Kaufmann, 2006.
[27] H. Sundell and P. Tsigas. Fast and lock-free concurrent priority queues for multi-thread systems. J. Parallel Distrib. Comput., 65(5):609–627, 2005.
[28] T. Zäschke, C. Zimmerli, and M. C. Norrie. The PH-tree: A space-efficient storage structure and multi-dimensional index. In SIGMOD 2014, pages 397–408, 2014.
[29] K. Zhou, Q. Hou, R. Wang, and B. Guo. Real-time kd-tree construction on graphics hardware. ACM Transactions on Graphics (TOG), 27(5):126, 2008.
