Professional Documents
Culture Documents
DOI 10.1007/s00778-016-0445-2
REGULAR PAPER
Received: 6 May 2016 / Revised: 8 September 2016 / Accepted: 14 October 2016 / Published online: 5 November 2016
© Springer-Verlag Berlin Heidelberg 2016
Abstract Given a set of facilities and a set of users, a reverse to answer SRTk queries. Then, we propose a novel regions-
k nearest neighbors (RkNN) query q returns every user for based pruning algorithm following Slice framework to solve
which the query facility is one of the k closest facilities. the problem. Our extensive experimental study on synthetic
Almost all of the existing techniques to answer RkNN queries and real data sets demonstrates that Slice is significantly
adopt a pruning-and-verification framework. Regions-based more efficient than all existing RkNN and SRTk algorithms.
pruning and half-space pruning are the two most notable
pruning strategies. The half-space-based approach prunes a Keywords Reverse k nearest neighbors · Reverse top-k
larger area and is generally believed to be superior. Influenced queries · Spatial queries · Nearest neighbors
by this perception, almost all existing RkNN algorithms uti-
lize and improve the half-space pruning strategy. We observe
the weaknesses and strengths of both strategies and discover 1 Introduction
that the regions-based pruning has certain strengths that have
not been exploited in the past. Motivated by this, we present Consider a set of facilities (e.g., supermarkets) and a set of
a new regions-based pruning algorithm called Slice that uti- users (e.g., residents). The influence set of a query facility
lizes the strength of regions-based pruning and overcomes q consists of every user u for which the query facility is
its limitations. We also study spatial reverse top-k (SRTk) an important facility. Since q is an important facility for
queries that return every user u for which the query facil- such users, these users are said to be influenced by q, e.g.,
ity is one of the top-k facilities according to a given linear the query facility can send targeted deals to these users to
scoring function. We first extend half-space-based pruning attract them. In this paper, we study two types of influence set
namely reverse k nearest neighbors and spatial reverse top-k
B Muhammad Aamir Cheema users. These two differ from each other due to the different
aamir.cheema@monash.edu
definition of important facilities. Below, we give more details
Shiyu Yang of each.
yangs@cse.unsw.edu.au
Xuemin Lin Reverse k Nearest Neighbors (Rk NN)Queries. In the con-
lxue@cse.unsw.edu.au text of RkNN queries, a facility is considered to be important
Ying Zhang for a user u if it is among her k closest facilities. An RkNN
ying.zhang@uts.edu.au query q returns every user u for which q is one of its k closest
Wenjie Zhang facilities. Consider the example of a restaurant. The people
zhangw@cse.unsw.edu.au for which this restaurant is one of the k closest restaurants
1
are its potential customers and may be influenced by targeted
School of Computer Science and Engineering, The University
marketing or deals.
of New South Wales, Sydney, Australia
2 Faculty of Information Technology, Monash University, Spatial Reverse Top-k Queries (SRTk) Queries. In RkNN
Melbourne, Australia queries, the notion of importance is based only on the
3 University of Technology, Sydney, Australia distance between users and facilities. However, in many real-
123
152 S. Yang et al.
123
Reverse k nearest neighbors queries and spatial reverse top-k queries 153
than the existing algorithms for all settings (except when nitude better than the TPL-based algorithm and PCK in
k = 1). terms of both CPU cost and I/O cost.
1.2 Spatial reverse top-k queries The rest of the paper is organized as follows. In Sect. 2,
we present the related work. Our techniques to answer RkNN
Spatial reverse top-k query is closely related to the tra- queries are presented in Sect. 3. Section 4 presents the algo-
ditional reverse top-k query which has been extensively rithms to answer spatial reverse top-k queries. Experimental
studied [11,12,14,17,33,35–37,47] in the past few years. evaluation is presented in Sect. 5 followed by conclusion in
One major difference is that the traditional reverse top-k Sect. 6.
queries do not consider the spatial distance between the loca-
tions of users and facilities as one of the criterions. Due to
this important difference, the existing techniques to answer
2 Related work
traditional reverse top-k queries cannot be applied or easily
extended to answer spatial reverse top-k queries. Specifically,
2.1 Reverse k nearest neighbors
the traditional reverse top-k queries assume that each facility
f has a set of attributes (e.g., price) and, for each attribute,
RkNN queries have been extensively studied considering dif-
the facility has the same value for all the users (e.g., price is
ferent settings such as continuous RkNN queries [2,5,18,40],
the same for all the users). In contrast, the distance between
probabilistic RkNN queries [3,4,7,21], RkNN queries on
a user u and a facility f depends on the locations of u and f .
graphs [26,31,46], metric spaces [31] and ad hoc spaces [45],
Consequently, the distance value of a facility is different for
etc. Since the focus of this paper is on static queries in Euclid-
each user, and this important difference makes the existing
ean space, we provide a brief overview of the techniques to
techniques inapplicable for the spatial reverse top-k queries.
solve RkNN queries in Euclidean space.
To the best of our knowledge, the only existing technique
Korn et al. [19] were first to study RNN queries. They
that can be used to answer spatial reverse top-k queries has
answer the RNN query by pre-calculating a circle for each
been recently proposed in [23]. However, the proposed tech-
data object p such that the nearest neighbor of p lies on the
nique has certain limitations. It can only handle the case when
perimeter of the circle. RNN of a query q is every point that
k = 1. Furthermore, the techniques can only be applied
contains q in its circle. Techniques to improve their work
when the number of attributes/criterions is two (including
were proposed in [22,42].
the distance criterion). It requires non-trivial changes in the
Next, we briefly describe some of the most related
indexing and query processing algorithm to handle more
techniques that do not require pre-computation namely six-
attributes and k > 1. Our experiments demonstrate that
regions [28], TPL [30], FINCH [39] and InfZone [8]. A
our proposed approach significantly outperforms it even for
comprehensive experimental study comparing these algo-
k = 1 and two attributes both in terms of CPU time and I/O
rithms is presented in [44]. These techniques have two phases
cost.
namely pruning and verification. In the pruning phase, the
Below, we summarize our contributions for spatial reverse
space that cannot contain any RkNN is pruned by using the
top-k queries.
set of facilities. In the verification phase, the users that lie
within the unpruned space are retrieved. These are the possi-
– To the best of our knowledge, we are the first to propose
ble RkNNs and are called the candidates. Most of the existing
solutions for spatial reverse top-k queries for arbitrary
techniques verify a candidate by issuing a range query and
number of attributes/criterions and k ≥ 1. To showcase
checking whether q is one of its k nearest facilities or not.
the strength of regions-based pruning, based on several
We present only the pruning strategy of these techniques. The
non-trivial observations, we develop regions-based prun-
verification phase of these techniques (except of InfZone) is
ing techniques for SRTk queries and propose a highly
similar. Specifically, for each candidate u, a range query cen-
efficient algorithm.
tered at u and range set as dist (u, q) is issued and the user is
– As stated earlier, the existing technique (PCK) [23]
returned as a RkNN if the range contains less than k facilities.
can only handle at most two attributes and k = 1.
The verification phase of InfZone will be discussed later.
To demonstrate the effectiveness of our proposed algo-
rithm for arbitrary number of attributes and k > 1, we Six-regions. Stanoi et al. [28] proposed the first technique
carefully extend TPL [30] to efficiently answer SRTk that does not need any pre-computation. They solve RkNN
queries—TPL is arguably the most popular half-space- queries by partitioning the whole space centered at the query
based pruning approach for RkNN queries. We conduct q into six equal regions of 60◦ each (P1 to P6 in Fig. 2a). As
extensive experiments on real and synthetic data sets and stated in Sect. 1, the kth nearest facility of q in each region
demonstrate that Slice is up to several orders of mag- defines the area that can be pruned. In other words, assume
123
154 S. Yang et al.
Fig. 2 a Six-regions [28], b TPL [30] Fig. 3 a FINCH [39], b InfZone [8]
123
Reverse k nearest neighbors queries and spatial reverse top-k queries 155
because when the bisectors of a, c and d are considered, the to the work by Chester et al. [12], the proposed techniques
facility b lies in the pruned area and is ignored. Cheema et are only applicable for two-dimensional data sets.
al. [8] presented several properties to reduce the number of We remark that the problem studied in this paper is differ-
facilities that must be considered in order to correctly com- ent from the traditional reverse top-k queries. Specifically,
pute the influence zone. The cost of computing the influence in the traditional reverse top-k queries, the attributes of a
zone using m facilities is O(km2 ) [9]. facility are the same for all users, e.g., price is the same
regardless of the user. In contrast, in spatial reverse top-k
2.2 Reverse top-k queries queries, the distance between facilities and users is also one
of the attributes/criterions. Note that the distance between
Reverse top-k query has gained significant research atten- a facility and users may be different for each user. Hence,
tion since it was first introduced by Vlachou et al. [34]. They the existing techniques cannot be applied or extended for the
proposed two variants of the reverse top-k queries namely spatial reverse top-k queries.
monochromatic and bichromatic reverse top-k queries. A The two most closely related works to the problem stud-
monochromatic query returns every possible scoring func- ied in this paper are presented in [23,36]. In [36], the
tion for which q is one of the top-k objects. On the other authors studied a continuous version of SRTk problem where
hand, in a bichromatic query, a set of scoring functions F the users are continuously moving. They identify various
is given and every function f ∈ F is returned for which q properties of spatial reverse top-k queries that enable effi-
is one of the top-k objects. Many variations of reverse top- cient monitoring of the result set. However, they assume
k query have also been studied in the past few years such that the initial results of spatial reverse top-k query are
as probabilistic reverse top-k query [17], continuous reverse already known, and they focus on updating the results
top-k query [14,36] and reverse top-k query on graphs [47] when the users move, without re-computing the results from
etc. Next, we briefly describe the most closely related work. scratch.
Vlachou et al. [34] proposed a threshold-based algo- Recently, Park et al. [23] proposed an algorithm (PCK) for
rithm which utilizes some geometric properties to eliminate the spatial reverse top-k (k = 1) queries. The PCK algorithm
unnecessary objects based on a threshold. The threshold is consists of both spatial and non-spatial pruning techniques.
maintained throughout the execution of the algorithm and is They also introduce Hausdorff distance to improve efficiency
improved as new users are accessed. They also proposed a of query processing. However, the online Hausdorff distance
space partitioning-based index structure to further improve computation is quite time-consuming. Furthermore, the pro-
reverse top-k query processing. posed techniques can only handle two attributes (including
In their follow-up work, Vlachou et al. [35,37] proposed the spatial distance). Also, the proposed techniques are only
techniques to further improve their earlier algorithm. In [35], applicable for k = 1. Extending the techniques for more
the authors discussed the geometric interpretation of the solu- than two attributes and k > 1 is non-trivial. Nevertheless, we
tion space for higher dimensionality. They also present a demonstrate that our proposed algorithm is several orders of
more comprehensive experimental study which evaluates the magnitude better than PCK in terms of both CPU and I/O
proposed algorithms for both monochromatic and bichro- cost.
matic reverse top-k query. In [37], they proposed an efficient
algorithm which processes reverse top-k query by traversing
the branch-and-bound data structures while utilizing novel 3 Answering reverse k nearest neighbors queries
lower and upper bounds.
Chester et al. [12] proposed an index structure specific for In Sect. 3.1, we formally define RkNN queries and its two
a fixed value k such that multiple reverse top-k queries can variations. The motivation behind our algorithm, Slice is
be efficiently answered using this index. If the value of k is presented in Sect. 3.2. Our pruning-and-verification tech-
unknown at the query time (which is usually the case), the niques are presented in Sects. 3.3 and 3.4, respectively. The
index is to be constructed on-the-fly which becomes the bot- overall algorithm is presented in 3.5.
tleneck. The proposed index pays off when numerous queries
use the same value of k. The proposed techniques are only 3.1 Problem definition
applicable to two-dimensional data sets.
Cheema et al. [11] proposed I/O and CPU efficient algo- RkNN queries are classified [19] into bichromatic RkNN
rithms to answer various ranking-related queries including queries and monochromatic RkNN queries.
the reverse top-k queries. They utilize the concept of k lower Bichromatic RkNN Queries. Consider a set of facilities F
envelope to efficiently answer the reverse top-k queries. Their and a set of users U . Given a query q ∈ F, a bichromatic
main reverse top-k query processing algorithm is based on an RkNN query returns every user u ∈ U for which q is one of
I/O optimal algorithm to compute k lower envelope. Similar its k closest facilities.
123
156 S. Yang et al.
Monochromatic RkNN Queries. Given a set of facilities F O(m) to O(log m) but also significantly improves the verifi-
and a query q ∈ F, a monochromatic RkNN query returns cation cost. However, note that the pruning cost of InfZone is
every facility f ∈ F for which q is one of its k closest still significantly higher than that of six-regions. The pruning
facilities. cost of InfZone for a facility is O(m) which is higher than
Consider a set of police stations. For a given police station the pruning cost of a user O(log m). This is because Inf-
q, its monochromatic RkNNs are the police stations for which Zone utilizes a different pruning criterion for facilities and
q is one of the k nearest police stations. Such police stations prunes only the facilities that are unable to further prune the
may seek assistance (e.g., extra policemen) from q in case of unpruned area.
an emergency event. As shown in Table 1, the bottleneck of the six-regions
Since most of the applications of RkNN queries are in approach is its verification phase. Six-regions approach ver-
location-based services, like existing techniques [8,28,39], ifies a candidate u by issuing a range query centered at u
the focus of this paper is on two-dimensional location data. with radius dist (u, q). This results in not only additional
Similar to most of the existing algorithms, we assume that computational cost but also I/O cost because the index is to
the users and facilities are indexed by two separate R*-trees be accessed to verify each candidate. Table 1 also shows the
called user R*-tree and facility R*-tree, respectively. Our expected number of candidates (i.e., the users that cannot be
proposed approach can be applied to answer both bichro- pruned) where |F| denotes the total number of facilities and
matic and monochromatic RkNN queries. However, for ease |U | denotes the total number of users. Hence, the verification
|
of presentation, we limit our discussion to bichromatic RkNN phase of six-regions approach is expected to issue 6k|U|F| range
queries unless specifically mentioned. Later, in Sect. 3.5.3, queries that dominates the total cost of the algorithm. Note
we show that our techniques can be easily applied for the that the expected number of candidates for six-regions is 6
case of monochromatic RkNN queries. times the expected number of candidates for InfZone. This
is due to the poor pruning power of six-regions approach.
3.2 Motivation Several techniques have been proposed to address the lim-
itations of half-space pruning (e.g., FINCH [39], InfZone
As discussed earlier, there are two most notable pruning [8]). Surprisingly, there is no work that tries to utilize the
strategies used by the existing techniques: half-space prun- strength of regions-based pruning (i.e., cheap pruning) and
ing and regions-based pruning. The advantage of half-space addresses its limitations (low pruning power and expensive
pruning is that it prunes a much larger area as compared to verification). This motivates us to carefully develop a tech-
the area pruned by six-regions (compare the shaded area in nique called Slice that utilizes the strengths of regions-based
Fig. 2a, b). The advantage of the regions-based approach is pruning and address its limitations.
that, once dk is computed, the cost of checking whether a Specifically, Slice uses a more powerful and flexible prun-
point p can be pruned or not is O(1) where dk is the distance ing approach that prunes a much larger area as compared to
between q and its kth nearest facility in the region that p six-regions with almost similar computational complexity.
lies in (see Sect. 2). Specifically, to check whether a point p Slice also significantly improves the verification phase by
is pruned by six-regions pruning, we only need to compare computing sigList for each partition Pi . The sigList of a
dist ( p, q) with dk . On the other hand, half-space pruning is partition Pi consists of a set of facilities that is sufficient
significantly more expensive. For instance, checking whether to verify every candidate u ∈ Pi . Hence, a candidate can
p can be pruned requires to check whether p lies in at least k be verified by accessing sigList instead of issuing a range
half-spaces or not. The cost is O(m) where m is the number query.
of facilities considered for pruning. Table 1 compares the cost of Slice with six-regions and
Table 1 compares six-regions approach [28] and InfZone InfZone. t denotes the number of partitions used by our
[8,9], the state-of-the-art half-space pruning-based approach. approach (typically 6–12). We remark that, in the worst case,
InfZone not only improves the pruning cost of a user from m may be as large as |F| where |F| is the total number of
facilities. Note that the pruning cost of our algorithm is signif-
Table 1 Comparison of computational complexities icantly smaller than InfZone and is quite close to six-regions
approach.
Operation Six-regions InfZone Slice
We also remark that a majority of the users are pruned by
Prune a facility O(1) O(m) O(t) “prune a user” operation and the number of candidates that
Prune the space O(m log k) O(km2 ) O(tm log k) require verification is usually small, e.g., when |U | = |F|,
Prune a user O(1) O(log m) O(1) the expected number of candidates is less than 3.1 k (see
Verify candidate Range query O(log m) O(k) theoretical analysis in [43]). Hence, the dominating cost of
Expected #cand 6k|U | k|U |
< 3.1k|U | InfZone and Slice is the pruning phase for which Slice is
|F| |F| |F|
significantly more efficient.
123
Reverse k nearest neighbors queries and spatial reverse top-k queries 157
inequality holds.
123
158 S. Yang et al.
Fig. 5 Comparison with six-regions approach. a Prunes more area, b Fig. 6 a Shaded area is pruned, b lower arc
effect of number of partitions
123
Reverse k nearest neighbors queries and spatial reverse top-k queries 159
123
160 S. Yang et al.
are the points where the bounding arc of P intersects the 3.5.1 Pruning algorithm
boundary of P (see Fig. 7b).
Algorithm 1 describes the pruning phase. The space around
Proof Figure 7b shows a facility f for which dist (M, f ) > q is divided into t equally sized partitions (t = 12 in our
r B and dist (N , f ) > r B (i.e., f lies outside the circles cen- experiments). In the conference version of this paper [43],
tered at M and N with radius r B ). We prove that the radius of we provided a detailed theoretical analysis to study the effect
the lower arc of f is not less than the radius of the bounding of t. The algorithm utilizes a min-heap h which is initial-
arc of P, i.e., r L ≥ r B . This implies that f cannot prune any ized by inserting the root of the facility R-tree. The entries
point that lies inside the bounding arc r B and is an insignif- are de-heaped iteratively from the heap. If the de-heaped
icant facility. Let θ = min Angle( f, P). If θ ≥ 90◦ , the entry cannot contain any significant facility for any partition,
facility is not a significant facility because it cannot prune we ignore it because it cannot prune space in any partition
any point p ∈ P (according to Lemma 3). (line 5). To check whether an entry e may or may not contain
If θ < 90◦ , the line that joins f and q intersects at least a significant facility, we apply Lemmas 4 and 5 for each par-
one of the circles of radius r B centered at M and N (this is tition P. Specifically, if e lies completely outside the shaded
because q lies at the boundary of these two circles). With- area shown in Fig. 7, e cannot contain a significant facility
out loss of generality, assume that the line f q intersects the for the partition P.
circle centered at N at a point C (as shown in Fig. 7b).
Now consider the triangle N Cq. Since N C = N q = r B , Algorithm 1: Filtering
N Cq = N qC = θ . The length of Cq can be obtained by 1 Divide the space around q in t equally sized partitions;
2 Insert root of facility R-tree in a min-heap h;
the following equation.
3 while h is not empty do
4 deheap an entry e;
2 5 if e may contain a significant facility for at least one partition
Cq = r B2 + r B2 − 2r B2 · cos(α) = 2r B2 (1 − cos(α)) then // apply Lemma 4 & 5 for each
partition P
if e is an intermediate node or leaf then
Since α = 180◦ − 2θ , the following is obtained.
6
7 insert every child c in h with key mindist (q, c);
8 else // e is a data object
2
Cq = 2r B2 (1 − cos(180◦ − θ )) = 2r B2 (1 + cos(2θ )) 9 pruneSpace(e) //Algorithm 2
2
Cq = 4r B2 cos2 (θ )
If e may contain a significant facility, it is processed as
Hence, Cq = 2r B cos(θ ). Since dist ( f, q) > Cq, we follows. If e is an intermediate or leaf node, every child c
( f,q)
have dist ( f, q) > 2r B cos(θ ). Recall that r L = dist
2 cos(θ) of e is inserted in the heap with its key set to mindist (q, c)
which implies that 2r L cos(θ ) > 2r B cos(θ ) or r L > r B . (line 7). The algorithm accesses the entries in ascending order
of mindist (q, c) because the facilities that are closer to the
Lemmas 4 and 5 demonstrate that the facilities that lie query are expected to prune a larger area. If e is a data object
outside the shaded area of Fig. 7 are not significant facilities (i.e., a facility), it is used to prune the space by calling Algo-
and are not required to verify any user u ∈ P. In fact, it rithm 2. The algorithm terminates when the heap becomes
can be proved that a facility is significant if and only if it empty.
lies in the shaded area of Fig. 7. We omit the proof due to
space limitations, but it can be obtained using the similar Algorithm 2: pruneSpace( f )
arguments. 1 for each partition P for which min Angle( f, P) < 90◦ do
2 if max Angle( f, P) ≥ 90◦ then
3 r Uf:P = ∞;
3.5 Algorithm 4 else
dist ( f,q)
5 r Uf:P = 2 cos(max Angle( f,P)) ;
Our algorithm has two phases: (i) pruning and (ii) verifica-
6 Set r B:P to the radius of kth smallest upper arc of P;
tion. In the pruning phase, the algorithm prunes the search 7 if f is a significant facility of P then // use Lemma 4&5
space using the set of facilities. It also identifies the set of 8 insert f in sigList of P in sorted order of
dist ( f,q)
significant facilities that are used later in the verification r Lf:P = 2 cos(min Angle( f,P)) ;
phase. In the verification phase, the set of users that lie in
the unpruned area are identified. These users are then veri-
fied as RkNN if there are at most k − 1 facilities closer to it Algorithm 2 describes how the space is pruned using a
than q. facility f . According to Lemma 3, a facility f cannot prune
123
Reverse k nearest neighbors queries and spatial reverse top-k queries 161
any point p ∈ P if min Angle( f, P) ≥ 90◦ . Therefore, the counter is initialized to zero. This counter records the number
algorithm considers only the partitions that have minimum of facilities that prune u and is incremented whenever the
subtended angle from f less than 90◦ (line 1). For each such accessed facility f is found to prune u, i.e., dist (u, f ) <
partition P, the algorithm computes r Uf:P , the upper arc of dist (u, q) (line 7). If the counter is at least equal to k, the
f , as described in the previous section. Then, the bounding algorithm returns false because u is not RkNN of q (line 9).
arc r B:P of the partition P is updated (line 6). Recall that the At any stage, if dist (u, q) ≤ r Lf:P for the accessed facility f ,
bounding arc corresponds to the kth smallest upper arc of P. the user is confirmed as RkNN and the algorithm returns true
The algorithm uses a heap of size k to maintain k smallest (line 5). This is because counter is less than k and none of the
upper arcs. Hence, updating r B:P takes O(log k). remaining facilities can prune u (as implied by Lemma 2).
Finally, the facility is inserted in the sigList if it is a sig- Also, if the counter remains less than k after processing all
nificant facility of the partition P (by applying Lemma 4 facilities in sigList, the algorithm confirms u as RkNN and
or 5 depending on whether f lies in P or outside it). sigList returns true (line 10).
is maintained in sorted order of r Lf:P (line 8). The expected
size of sigList is O(k) [43]. Hence, the expected cost of 3.5.3 Answering monochromatic queries
line 8 is O(log k). The worst case size of sigList is O(m)
where m is the number of facilities considered for pruning. Since a facility f cannot prune itself, we cannot prune a facil-
Hence, the worst case insertion cost is O(log m). Since t par- ity f if it lies outside the upper arcs of k facilities. However,
titions are considered, the total expected cost of Algorithm 2 we can safely prune a facility that lies outside k + 1 upper
is O(t log k) and the total worst case cost is O(t log m). arcs, i.e., the bounding arc r B corresponds to the radius of
Since m is the total number of times Algorithm 2 is called (k +1)th smallest upper arc of a partition. Hence, the pruning
(at line 9 of Algorithm 1), the total time the algorithm spends phase is called by setting k to k + 1. In the verification phase,
in pruning the space is O(tm log m) in the worst case (while the facilities that lie inside r B are considered the candidates.
the expected cost is O(tm log k)). Each candidate f is verified by calling Algorithm 3 with a
minor change that the candidate facility f is skipped from
3.5.2 Verification algorithm sigList (at line 3) because f cannot prune itself.
In the verification phase, the users that do not lie in the pruned
area are shortlisted and are called candidate users. The ver- 4 Answering spatial reverse top-k queries
ification algorithm iteratively accesses the nodes of the user
R-tree. If an entry e (the intermediate node, leaf node or the In this section, we present our algorithms to answer spatial
user object) lies completely in the pruned area, it is ignored. reverse top-k queries. First, we formally define the problem,
Otherwise, if e is an intermediate or leaf node, its children and introduce terms and notations in Sect. 4.1. Then, we
are accessed iteratively. If e is a data object and does not lie in present a half-space-based algorithm in Sect. 4.2 followed
the pruned area, it is called a candidate object and is verified by our techniques to answer spatial reverse top-k queries
by calling isRkNN(u) (Algorithm 3). using Slice in Sect. 4.3.
123
162 S. Yang et al.
w[i] (for 1 ≤ i ≤ d) is the weight for each static attribute Table 3 Notations used for reverse top-k queries algorithms
f [i]. The score of a facility f w.r.t. a user u is denoted as Notation Definition
scor e(u, f ) and is computed as follows.
f [i] ith attribute of a facility
d
w[i] Weight of ith dimension
scor e(u, f ) = w[d + 1] · dist (u, f ) + w[i] · f [i] (1) d
fs i=1 w[i] · f [i] (static score of f )
i=1
scor e(u, f ) f s + w[d + 1] · dist (u, f )
f (qs − f s )/w[d + 1]
Top-k facilities. Given a scoring function W , the top-k facil-
e An entry of the facility R*-tree
ities of a user u are the k facilities having the smallest scores
e.min[i] Minimum value of e in ith dimension
w.r.t. the user u. In other words, a facility f is one of the
top-k facilities for a user u if there are less than k facilities e.max[i] Maximum value of e in ith dimension
d
that have a score smaller than scor e(u, f ). emin
s i=1 w[i] · e.min[i]
d
emax
s i=1 w[i] · e.max[i]
Spatial reverse top-k (SRTk) query. Given a query facility
min
e (qs − emax
s )/w[d + 1]
q and a scoring function W , a spatial reverse top-k query
max (qs − emin
s )/w[d + 1]
returns every user u for which the query facility is one of its e
top-k facilities.
Remark. This definition differs from traditional reverse top- Although our proposed techniques can be applied on any
k queries (e.g., [33]) that assume that the users may have branch-and-bound data structure, we assume that both the
different scoring functions. Our definition is inspired by the sets of facilities and users are indexed by two different R*-
existing work on spatial reverse top-k query [23] where the trees. The tree that indexes the users is called the user R*-
query defines a scoring function that is assumed to be the tree, and the tree that stores the facilities is called the facility
same for all users. Although we intend to investigate it in R*-tree. The user R*-tree is a 2-dimensional R*-tree that
future, in this paper, we do not focus on the definition that indexes the location coordinates of the users. The facility
assumes unique scoring functions for the users mainly due R*-tree is a (d + 2)-dimensional R*-tree that indexes d static
to the following reason. attributes and two location coordinates for each facility f . We
While the locations of the users are easily obtainable, their remark that converting the d static attributes of each facility
preferences (i.e., scoring functions) are not well defined and to a one-dimensional static score using the scoring function
are usually unknown to the system. Even if all users are able W and indexing in a 3-dimensional R*-tree (two location
to define suitable scoring functions and the system manages coordinates and one static score) is not a feasible solution
to obtain these, it cannot be guaranteed that the users will use because such an index would not support queries involving
the same scoring functions in future, e.g., a user who usually a different scoring function W .
gives higher weighting to price compared to the distance may Static scores of a facility entry. Let e be an entry (inter-
give a higher weighting to distance when he is in hurry. Since mediate or leaf node) of the facility R*-tree. We denote
the essence of reverse queries is to analyze the potential influ- e.min[i] (respectively, e.max[i]) to denote the minimum
ence of a query facility as compared to the other facilities in (respectively, maximum) value of e for the ith static attribute.
the system, our definition allows the query facility to analyze The minimum (respectively, maximum) static score of an
its impact without requiring the scoring functions of all the entry min (respectively, emax ) and emin =
d e is denoted as es max d s s
i=1 w[i] · e.min[i] and es = i=1 w[i] · e.max[i].
users. If required, the query facility may further analyze the
influence by issuing different queries each using a different Given a point p, maxdist ( p, e) (respectively, mindist
scoring function. ( p, e)) corresponds to the maximum (respectively, mini-
mum) possible Euclidean distance between the entry e and
4.1.2 Terms and notations the point p (considering only the location coordinates).
Table 3 summarizes the terms and notations used throughout
Static score of a facility. The static score of a facility f this section.
d
(denoted as f s ) is i=1 w[i] · f [i]. It is called static score
because it remains the same for every user and does not 4.2 Half-space-based solution
depend on the distance between f and users.
Note that scor e(u, f ) can be computed by rewriting In this section, we present a solution based on the half-space-
Eq. (1) as follows. based approach used to answer RkNN queries. The perpen-
dicular bisector-based pruning used for RkNN queries is not
scor e(u, f ) = f s + w[d + 1] · dist (u, f ) (2) applicable for the reverse top-k query. However, we note
123
Reverse k nearest neighbors queries and spatial reverse top-k queries 163
123
164 S. Yang et al.
We say that a facility f prunes the whole data space if it Lemma 8 There exists at least one facility f ∈ e that prunes
prunes every point in the data space. The next lemma identi- the whole data space if Min Max Dist (q, e) < min e where
fies the facilities that prune the whole data space. Min Max Dist (q, e) is the upper bound on the minimum
dist ( f, q) for every f ∈ e as defined in [25].
Lemma 6 A facility f prunes the whole data space if
Proof By definition of Min Max Dist (q, e), there exists a
dist ( f, q) < .
facility f for which dist (q, f ) ≤ Min Max Dist (q, e).
Proof Let u be an arbitrary user located anywhere in the Since f ≥ min e for every facility f ∈ e, we have
data space. Due to the triangular inequality, dist ( f, u) − dist (q, f ) < f which implies that f prunes the whole
dist (q, u) ≤ dist ( f, q). Since dist ( f, q) < , we have data space.
dist ( f, u)−dist (q, u) < . Hence, u is pruned by f (Corol-
If there is an entry e that satisfies the condition in Lemma 8,
lary 1).
the algorithm increments the counter by 1 and its children are
Example 4 Consider the example of Fig. 8a where dist (q, f ) inserted in the heap to be processed later.
= 5 and = 6. Regardless of the location of u, every user u Note that applying Lemmas 6, 7 and 8 together may incor-
can be pruned by f . This is intuitive because if two facilities rectly update the counter. For example, assume that an entry
are reasonably close to each other and one facility (say f ) e has only two facilities f 1 and f 2 and e satisfies Lemma 8,
has a significantly better static score than the other (say q) f 1 satisfies Lemma 6, and f 2 does not satisfy the condition
then every user will prefer f over q. of Lemma 6. The correct counter after processing the entry
e is one because there is only one facility that prunes the
Note that a query q cannot have any reverse top-k users whole data space. However, when e is processed the counter
if there are at least k facilities that satisfy Lemma 6 (i.e., is incremented by one. When its child f 1 is accessed, the
dist ( f, q) < for each such facility). The algorithm main- counter is incremented again which is incorrect. We avoid
tains a counter that records the number of facilities that prune this issue as follows.
the whole data space. The algorithm terminates and returns We maintain a list that records the entries that satisfy
empty results if the counter reaches k. Lemma 8, and the list is kept sorted on IDs of the entries
Next, we extend Lemma 6 to an entry of the facility R*- for logarithmic search. If a facility f or an entry e satisfies
tree. First, we extend the notion of static score gap for an one of the Lemmas 6, 7 and 8, we check whether its parent e
entry of the facility R*-tree. Let e be an entry of the facility is present in the list or not. If its parent e is not present in the
R*-tree, and emax
s and emin
s be the maximum and minimum list, the algorithm increments the counter as described above
static scores (as defined in Sect. 4.1), respectively. The fol- (depending on which of the three lemmas are applicable).
lowing two equations define the maximum static score gap However, if its parent e is found in the list, then the counter
max
e and minimum static score gap min e between q and e. is first decremented by one and then incremented accordingly
123
Reverse k nearest neighbors queries and spatial reverse top-k queries 165
Lemma 10 Every facility f ∈ e is an unnecessary facility Definition 7 Critical point/distance. We say that a point p θ
if maxdist (q, e) ≤ −max is a critical point on L θ if the scores of f and q are the
e .
same for a user that is located at p θ , i.e., scor e( f, p θ ) =
Proof For every f ∈ e, dist (q, f ) ≤ maxdist (q, e) and scor e(q, p θ ). The distance dist (q, p θ ) is called the critical
f ≤ max
e . Hence, dist (q, f ) ≤ − f for every f ∈ e.
distance and is denoted as d θ .
Hence, f is an unnecessary facility (as per Lemma 9).
The next lemma identifies the critical point/distance on a
During the execution of the algorithm, any facility entry ray L θ .
e that satisfies the condition in Lemma 10 is pruned.
Lemma 11 Consider a facility f and a query q such that
dist (q, f ) ≥ ||. For a ray L θ , the critical distance is d θ =
4.3 S LICE: a regions-based solution dist (q, f )2 −2
2(+dist (q, f ) cos θ) .
In this section, we extend Slice to efficiently answer the
spatial reverse top-k queries. Our proposed solution follows Proof We abuse the notation and use p to denote p θ , the
the same framework as Slice for RkNN queries. However, critical point on L θ . Considering the triangle pq f (with
the underlying techniques (e.g., how to compute upper arc side lengths a, b and c) in Fig. 9, we compute a = dist ( p, f ).
and significant facilities etc.) are different. In Sect. 4.3.1,
we present our regions-based pruning techniques to prune a 2 = b2 + c2 − 2bc cos θ (6)
the search space using a facility f . Section 4.3.2 presents
the techniques to identify significant facilities. Finally, we Recall that the scores of q and f are equal for a user u if
present the overall algorithm in Sect. 4.3.3. dist ( f, u) − dist (q, u) = . Since p is the critical point,
123
166 S. Yang et al.
Fig. 10 a Pruning using upper arc, b f is not significant
Lemma 12 A user u that lies on a valid ray L θ and has Based on Property 1, the next lemma defines the area of a
dist (u, q) > d θ is pruned by f , i.e., scor e(u, f ) < partition P that can be pruned by a facility f .
scor e(u, q).
Lemma 14 Let θmax = max Angle( f, P) for a partition P
Proof For the critical point p on L θ , we know that scor e and f (see Definition 2) and θmax < cos −1 (− dist (q, f ) ).
( p, f ) = scor e( p, q) which implies that dist ( p, f ) − Every user u ∈ P can be pruned if dist (u, q) > d θmax , i.e.,
dist ( p, q) = . We add dist (u, p) to both of the terms dist (q, f )2 −2
dist (u, q) > 2(+dist (q, f ) cos θmax ) .
on the left-hand side of this equation which gives
Proof Figure 10a shows a user u that lies in the partition P
dist ( p, f ) + dist (u, p) − (dist ( p, q) + dist (u, p)) = and dist (u, q) > d θmax . Let α be the subtended angle of u,
i.e., α = f qu. Since α ≤ θmax , d α ≤ dmax θ (Property 1).
α
Hence, dist (u, q) > d . Furthermore, since α ≤ θmax <
Note that dist ( p, q) + dist (u, p) = dist (u, q) (see Fig. 9).
On the other hand, dist ( p, f ) + dist (u, p) > dist (u, f ) cos −1 (− dist α
(q, f ) ), the ray L is a valid ray. According to
(due to the triangular inequality2 ). Hence, dist (u, f ) − Lemma 12, u can be pruned by f .
dist (u, q) < . Thus, according to the inequality (3),
Based on Lemma 14, we describe how to compute the
scor e(u, f ) < scor e(u, q).
upper arc for reverse top-k queries.
Lemma 13 A user u that lies on a ray L θ and has
Definition 8 Upper arc. Upper arc of a facility f w.r.t. a
dist (u , q) ≤ d θ cannot be pruned by f , i.e., scor e(u , f ) ≥
partition P is the arc centered at q with radius d θmax where
scor e(u , q).
θmax = max Angle( f, P) and θmax < cos−1 (− dist (q, f ) ).
Proof For the critical point p on L θ , we know that dist ( p, f ) U
The radius of the upper arc is denoted as r f :P (or simply r U
− dist ( p, q) = . We subtract dist (u , p) from both terms when the facility f and the partition P are clear by context).
in the above equation which yields If θmax ≥ cos−1 (− dist(q, f ) ), r f :P = ∞.
U
2 Lemma 20 in “Appendix” section shows that dist (u, f ) can never be Figure 10a shows the upper arc of f for the partition P.
equal to dist ( f, p) + dist (u, p)) even when f lies on the ray L θ . We say that a point lies outside (respectively, inside) an arc
123
Reverse k nearest neighbors queries and spatial reverse top-k queries 167
of radius r if its distance from the center of the arc is greater shows the details of pruning the search space using a facility
(respectively, smaller) than r . According to Lemma 14, a f . Since upper arc can only be computed if dist (q, f ) > ||,
facility prunes area outside the upper arc of f for every par- the algorithm first ensures that dist (q, f ) > || (line 1).
tition P for which θmax < cos−1 (− dist (q, f ) ). In Fig. 10a, Then, for each partition P for which min Angle( f, P) <
every user that lies in the shaded area can be pruned by f . cos −1 (− dist(q, f ) ), the algorithm computes the upper arc
Recall that a user u cannot be the reverse top-k (i.e., it U
r f :P and updates the bounding arc if required (lines 3–6).
can be filtered), if it is pruned by at least k facilities. In other The lines 7–9 are used to insert lower arcs of significant
words, a user u can be filtered if it lies outside the upper arcs facilities that are identified as described in the next section. As
of at least k facilities. To efficiently employ this filtering rule, we will show later, this significantly improves the verification
we maintain kth smallest upper arc in each partition which phase of the algorithm. For now, the readers can ignore these
is called the bounding arc and is defined below. lines.
Definition 9 Bounding arc. The kth smallest upper arc of a
4.3.2 Identifying significant facilities
partition P is called its bounding arc. Its radius is denoted as
r B:P (or simply r B when the partition is clear by context). If
Recall that a facility f is called a significant facility of a
there are less than k upper arcs in a partition P, then r B:P =
partition P if it prunes at least one point p ∈ P that lies
∞.
in the bounding arc of P, i.e., scor e( p, f ) < scor e( p, q)
Next, we describe how to compute lower arc of a partition for at least one p ∈ P for which dist ( p, q) ≤ r B:P . In this
P w.r.t. a facility f . section, we describe techniques on how to determine whether
an entry e may contain a significant facility or not. The next
Lemma 15 Let θmin = min Angle( f, P) for a partition P lemma identifies a facility that is not a significant facility for
and f (see Definition 2) and θmin < cos −1 (− dist
(q, f ) ). a partition P.
The facility f cannot prune any user u ∈ P for which
dist (u, q) < d θmin . Lemma 16 A facility f is an insignificant facility for a par-
tition P if f lies inside the partition P and dist (q, f ) >
Proof Let α be the subtended angle of u. Since u ∈ P, α ≥ 2r B:P + .
θ
θmin . Hence, d α ≥ dmin (Property 1). Since dist (u, q) <
θ Proof Consider the example of Fig. 10b that shows a parti-
dmin , we have dist (u, q) < d α . According to Lemma 13, u
tion P, a facility f , the bounding arc r B:P (shown in solid
cannot be pruned by f .
line) and an arc with radius 2r B:P + (shown in broken
Definition 10 Lower arc. Lower arc of a facility f w.r.t. a line). We prove that f is not significant by showing that
partition P is the arc centered at q with radius d θmin where scor e(u, f ) ≥ scor e(u, q) for every user in P that lies inside
θmin = min Angle( f, P) and θmin < cos−1 (− dist the bounding arc, i.e., for which dist (u, q) ≤ r B:P .
(q, f ) ).
Figure 10b shows a user u that lies inside the bounding arc.
The radius of the lower arc is denoted as r Lf:P (or simply r L
Due to the triangular inequality, dist (u, f ) ≥ dist (q, f ) −
when the facility f and the partition P are clear by context).
dist (u, q). Since dist (q, f ) > 2r B:P + and dist (u, q) ≤
If θmin ≥ cos−1 (− dist(q, f ) ), r f :P = ∞.
L
r B:P , we have dist (u, f ) ≥ (2r B:P + ) − r B:P . This can
be rewritten as dist (u, f ) − r B:P ≥ . Since dist (u, q) ≤
Algorithm 4: pruneSpace( f ) r B:P , we have dist (u, f )−dist (u, q) ≥ . Hence, f cannot
1 if dist ( f, q) > || then prune the user u (Corollary 1).
2 θ ← cos −1 ( dist−( f,q) );
3 for each partition P for which min Angle( f, P) < θ) do
Next, we extend this lemma for an entry e of the facility
4 if max Angle( f, P) < θ then // otherwise r Uf:P is R*-tree.
∞ Lemma 17 Every facility f ∈ e is an insignificant facil-
5 r Uf:P ← d max Angle( f,P) ;
ity for a partition P if f lies inside the partition P and
6 Set r B:P to the radius of kth smallest upper arc of P;
mindist (q, e) > 2r B:P + max
e .
7 if f is a significant facility then
8 r Lf:P ← d min Angle( f,P) ; Proof For every f ∈ e, f ≤ max e and dist (q, f ) ≥
9 insert f in sigList of P in sorted order of r Lf:P ; mindist (q, e). Hence, we have dist (q, f ) > 2r B:P + f
for every f ∈ e. Hence, according to Lemma 16, f cannot
be a significant facility.
Whenever we access a facility f , we prune the search The above two lemmas define the conditions to identify
space by computing its upper arc for the relevant partitions whether a facility f that lies inside a partition P is a signifi-
and updating the bounding arcs if required. Algorithm 4 cant facility or not. The next lemma defines the condition to
123
168 S. Yang et al.
123
Reverse k nearest neighbors queries and spatial reverse top-k queries 169
Hence, according to Lemma 18, f cannot be a significant e can satisfy Lemma 7 only if it satisfies the condition in
facility.
Lemma 8. Therefore, we first check whether the condition in
Lemma 8 is satisfied (line 8). To correctly update the value
4.3.3 Algorithm of k, we first increment k by 1 if the parent of e is present in
the list L (line 10) and then delete its parent from L (line 11).
Now, we are ready to present the algorithm to answer spatial If all the facilities in f prune the whole data space (i.e., e
reverse top-k queries. satisfies Lemma 7), then the value of k is decremented by
|e|. In this case, the entry can be discarded because its chil-
Filtering Algorithm. Algorithm 5 shows the details of the
dren are not needed to be accessed (line 15). On the other
filtering algorithm. The space is divided into t equal sized
hand, if the entry e satisfies Lemma 8 but does not satisfy the
partitions. A list is introduced to store intermediate nodes
condition in Lemma 7, then the value of k is decremented by
and initialized to empty. A min-heap is initialized with the
one because there exists at least one facility that prunes the
root entry of the facility R*-tree. Since the entries that have
whole data space (line 17). Such an entry e is inserted in L to
smaller static scores and are closer to the query are expected
correctly update the value of k later (line 18). At any stage,
to prune a larger area, each entry e is inserted in the heap with
if the value of k is not greater than zero (lines 14 and 19), the
e +w[d +1]·mindist (e, q) (line 22). The entries are
key max
algorithm returns empty because the query is a futile query.
iteratively deheaped from the heap until the heap becomes
The rest of the algorithm (lines 20–24) is quite similar
empty. Each deheaped entry e is processed as follows.
to the filtering algorithm (Algorithm 1) for RkNN queries.
Algorithm 5: Filtering Specifically, if an entry e may contain a significant facilities,
1 Divide the space around q in t equally sized partitions; it is processed as follows. If e is an intermediate node or leaf
2 Initialize an empty list L; node, its children are inserted in the heap. However, if e is
3 Insert root of facility R*-tree in a min-heap h; an object (i.e., a facility), it is used to prune the search space
4 while h is not empty do
by using Algorithm 4.
5 deheap an entry e;
6 if maxdist (e, q) < −max e then // Lemma 10 Verification Algorithm. The verification algorithm is quite
7 continue;
similar to the verification algorithm for RkNN queries (Algo-
8 if Min Maxdist (e, q) < min e then // Lemma 8 rithm 3). In the verification phase, the users that cannot be
9 if parent of e exists in L then
10 k ← k + 1; filtered are shortlisted and are called candidate users. This is
11 remove parent of e from L; done by iteratively accessing the nodes of the user R*-tree
12 if maxdist (e, q) < min then // Lemma 7 and checking whether the node can be pruned by upper arcs of
e
13 k ← k − |e|; the relevant partitions or not. If not, its children are inserted in
14 if k ≤ 0 then terminate algorithm and return empty; a stack to be iteratively accessed later. If the accessed entry
15 continue; is a data object (e.g., a user) and cannot be filtered, it is a
16 else // satisfies conditions in Lemma 8 candidate object and is verified by calling Algorithm 6.
17 k ← k − 1;
18 insert e in L;
Algorithm 6: isReverseTopk(u)
19 if k ≤ 0 then terminate algorithm and return empty;
Output : Returns true if u is reverse top-k. Otherwise, returns
20 if e may contain a significant facility for at least one partition false.
then 1 Let P be the partition in which u lies;
//Lemma 17 & 19 for each partition 2 counter=0;
21 if e is an intermediate node or leaf then 3 for each f ∈ sigList of P in ascending order of r Lf:P do
22 insert every child c of e into the heap h with key 4 if r Lf:P > dist (u, q) then
max
c + w[d + 1]mindist (q, c);
5 return true;
23 else // e is a data object
6 if scor e(u, f ) < scor e(u, q) then
24 pruneSpace(e) //Algorithm 4
7 counter ← counter + 1;
8 if counter ≥ k then
9 return false;
In lines 6–19, we apply the observations presented in
Sects. 4.2.2 and 4.2.3. Below, we provide a quick overview, 10 return true;
and the readers are referred back to these sections for a
detailed explanation of the ideas. If every facility f ∈ e is an Algorithm 6 presents the details of how to check whether
unnecessary facility (see Sect. 4.2.3), e is discarded (line 7). a candidate user u is an answer or not. It first determines
The algorithm then checks whether the entry e contains one the partition P in which the candidate user u lies. The algo-
or more facilities that prune the whole data space by using rithm also initializes a counter to zero that records the number
Lemmas 7 and 8 presented in Sect. 4.2.2. Note that an entry of facilities that have a better score than q. Then, it itera-
123
170 S. Yang et al.
14 Verification Verification
tively accesses the facilities in the sigList of the partition 12
Pruning 70 Pruning
60
# IO
f , the algorithm increments the counter if scor e(u, f ) <
8 40
6 30
scor e(u, q). If the counter reaches k, the user u cannot be 4 20
2 10
the reverse top-k and the algorithm returns false (see 6–9). 0 0
SI
IN
SL
SI
IN
SL
SI
IN
SL
SI
IN
SL
SI
IN
SL
SI
IN
SL
SI
IN
SL
SI
IN
SL
X
X
F
F
IC
IC
IC
IC
IC
IC
IC
IC
If the counter does not reach k and all the facilities in the 50000 100000 150000 200000
E
50000 100000 150000 200000
sigList have been accessed, the algorithm returns true, indi- (a) (b)
cating that u is an answer (line 10). At any stage, if r Lf:P
Fig. 13 Monochromatic queries: effect of data set size (normal distri-
of the next accessed facility f is larger than dist (u, q), the bution). a Number of facilities, b number of facilities
algorithm returns true (line 5), i.e., u is an answer. This is
because (1) f cannot prune u because r Lf:P > dist (u, q) 20
Verification
Pruning
60
Verification
Pruning
50
(Lemma 15) and (2) since the algorithm accesses the facil-
# IO
30
10
prune the user u because for each such remaining facility f , 20
5
r Lf :P > dist (u, q). 10
0 0
SI
IN
SL
SI
IN
SL
SI
IN
SL
SI
IN
SL
SI
IN
SL
SI
IN
SL
SI
IN
SL
SI
IN
SL
X
X
F
F
IC
IC
IC
IC
IC
IC
IC
IC
50000 100000 150000 200000
E
50000 100000 150000 200000
(a) (b)
5 Experimental evaluation
Fig. 14 Bichromatic queries: effect of data set size (normal distribu-
5.1 Reverse k nearest neighbors queries tion). a Number of facilities, b number of facilities
123
Reverse k nearest neighbors queries and spatial reverse top-k queries 171
20 20 Verification
Verification Verification Pruning
10
18 Pruning Pruning 20
173
64
16
CPU Cost (ms)
46
14 15
32
33
86
37
53
12
9
15
31
29
48
27
10 10
22
27
9
16
8
16
11
9
12
6 10
9
5
14
4 8
8
10
12
13
13
14
14
2
10
11
7
7
9
7
8
9
6
5
5
6
0 0 5
17
14
13
17
13
17
12
12
11
SI
IN
SL
SI
IN
SL
SI
IN
SL
SI
IN
SL
SI
IN
SL
SI
IN
SL
SI
IN
SL
SI
IN
SL
SI
IN
SL
SI
IN
SL
SI
IN
SL
SI
IN
SL
X
X
F
F
IC
IC
IC
IC
IC
IC
IC
IC
IC
IC
IC
IC
15
14
14
13
13
12
E
E
1 5 10 15 20 25 1 5 10 15 20 25
0
(a) (b)
SI
IN
SL
SI
IN
SL
SI
IN
SL
SI
IN
SL
SI
IN
SL
SI
IN
SL
SI
IN
SL
SI
IN
SL
SI
IN
SL
X
X
F
F
IC
IC
IC
IC
IC
IC
IC
IC
IC
E
E
(U,U) (R,U) (N,U) (U,R) (R,R) (N,R) (U,N) (R,N) (N,N)
35 Verification Verification
457
500
68
20 25 Pruning Pruning
Verification SIX 30
65
18
16 20 SLICE 25
51
CPU Cost (ms)
37
14 300
# IO
20
28
40
204
12
22
34
15 15
17
200
15
27
10
8 10
10 100
40
29
29
29
14
6 5
18
15
13
14
12
20
16
15
14
13
15
15
15
15
15
15
20
4
18
0 0
17
13
15
5
13
SI
SL
SI
SL
SI
SL
SI
SL
SI
SL
SI
SL
SI
IN
SL
SI
IN
SL
SI
IN
SL
SI
IN
SL
SI
IN
SL
10
X
11
X
F
IC
IC
IC
IC
IC
IC
IC
IC
IC
IC
IC
25% 50% 100% 200% 400% 2 5 10 20 40 100
9
E
E
E
0
SI
IN
SL
SI
IN
SL
SI
IN
SL
SI
IN
SL
SI
IN
SL
SI
IN
SL
0
(a) (b)
X
X
F
F
IC
IC
IC
IC
IC
IC
1 5 10 15 20 25 1 5 10 15 20 25
E
(a) (b)
Fig. 18 a % of users, b effect of buffer size
Fig. 16 Bichromatic queries: effect of k. a Varying k (real data), b
varying k (normal distribution)
5.1.4 Effect of data distribution
dislayed above the bars correspond to the number of I/Os In Fig. 17, we study the effect of data distribution on each
unless mentioned otherwise. Several existing works show the algorithm. The data distribution of the facilities and the users
total cost of the algorithms by penalizing each algorithm for is shown as (D f , Du ) where D f and Du correspond to the
each I/O (e.g., average I/O cost for SSD disks is less than distribution of facilities and users, respectively. U, R and N
0.1 ms [32] so each algorithm may be charged 0.1 ms per correspond to uniform, real and normal distribution, respec-
I/O). We do not follow this approach mainly because the I/O tively. For instance, (U,N) correspond to the data set where
cost is highly system specific (e.g., type of disk drive used, the facilities follow uniform distribution and the users follow
workload). Nevertheless, the interested readers can estimate normal distribution. Since real data set consists of two sets
the I/O cost by charging say 0.1 ms for each I/O. We remark each containing almost 87,900 points, in this experiment, the
that under our system settings, the I/O cost for each algorithm synthetic data sets also contain the same number of points.
is negligible as compared to its CPU cost. Figure 17 demonstrates that Slice is significantly faster than
the other two algorithms for all different combinations of
5.1.3 Effect of k data distribution.
In Figs. 15 and 16, we study the effect of k for mono- 5.1.5 Effect of number of users
chromatic and bichromatic RkNN queries, respectively. The
performance of InfZone rapidly deteriorates as the value of In this experiment, we fix the number of facilities to 100,000
k increases. Recall that the cost of pruning the space for Inf- and change the number of users to see the effect of change
Zone is O(km2 ). The value of m increases as k increases. in the relative size of the two data sets. The data set shown
Hence, the pruning phase becomes quite expensive as can as x % correspond to the case when the number of users
be observed in Figs. 15 and 16. The number of I/Os for our is x % of the number of facilities, i.e., the number of users
algorithm is larger than that of InfZone but is much lower as is 100,000×x
100 . Figure 18a shows that the cost of six-regions
compared to the number of I/Os for six-regions. approach increases significantly with the increase in the num-
The CPU cost of Slice is lower than InfZone except when ber of users. On the other hand, the cost of InfZone and Slice
k = 1. This is because when k = 1, m is also small and the is not significantly affected. This is because the number of
pruning cost of InfZone is low. However, note that Slice candidates increases as the number of users increases. Since
scales much better than the other two algorithms and is sev- the verification phase for the six-regions approach is sig-
eral times more efficient for larger k. In Fig. 16b, we show the nificantly more expensive, its cost is severely affected. In
results for normal distribution using lines (instead of bars) to contrast, InfZone and Slice have much more efficient ver-
clearly demonstrate how the three algorithms scale with the ification techniques. Therefore, the cost does not increase
increase in k. substantially.
123
172 S. Yang et al.
70000 450
Table 5 Experimental settings (spatial reverse top-k queries) PCK
47691
60000 Verification 400
Filtering
# Results 350
50000
33865
Parameter Range 300
# Results
IO Cost
24193
40000 250
30000 200
12885
150
Number of facilities (×1000) 100, 1000
6467
4960
20000
3344
1345
100
521
521
999
389
388
10000
216
77
17
10
50
Number of users (×1000) 25, 50, 100, 200, 400, 1000
4
0 0
PC
TP
SL
PC
TP
SL
PC
TP
SL
PC
TP
SL
PC
TP
SL
PC
TP
SL
PC
TP
SL
L
IC
IC
IC
IC
IC
IC
IC
K
K
Number of static attributes 1, 2, 3
E
0.1% 0.5% 1% 5% 10% 50% 100%
CPU Cost(ms)
16851.01
300
# Results
25000
250
9369.72
20000
200
5255.46
15000
2909.19
150
2734.16
2413.61
765.34
537.60
118.75
120.60
10000
189.36
190.24
100
67.65
32.04
2.74
0.31
0.26
0.08
0.07
5000 50
5.1.6 Effect of buffer size 0 0
PC
TP
SL
PC
TP
SL
PC
TP
SL
PC
TP
SL
PC
TP
SL
PC
TP
SL
PC
TP
SL
L
IC
IC
IC
IC
IC
IC
IC
K
K
E
E
0.1% 0.5% 1% 5% 10% 50% 100%
123
Reverse k nearest neighbors queries and spatial reverse top-k queries 173
6891
6859
8000 PCK PCK PCK
1226
Verification 1400 Verification 300 Verification
222.57
120
6016
7000 Filtering Filtering Filtering
192.36
# Results 1200
186.24
CPU Cost(ms)
184.88
179.57
6000 100 250
# Results
1000
IO Cost
IO Cost
5000 200
3670
3663
3609
80
117.60
105.83
103.17
800
98.35
4000
530
2692
521
519
518
83.68
60 150
460
600
390
388
386
385
3000
1542
1432
40 400 100
2000
789
0.88
0.20
0.12
0.06
0.07
50
288
200
153
20
21
1000
46
34
7
4
2
1
36
26
25
16
2
0 0 0 0
PC
TP K
SLL
PC
TP K
SL L
PC
TP K
SLL
PC
TP K
SL
PC
TP K
SL L
PC
TP K
SL L
PC
TP K
SLL
PC
TP K
SL L
PC
TP K
SLL
PC
TP K
SL L
PC
TP
SL
PC
TP
SL
PC
TP
SL
PC
TP
SL
PC
TP
SL
PC
TP
SL
PC
TP
SL
IC
IC
IC
L
IC
IC
IC
IC
IC
IC
IC
L
IC
IC
IC
IC
IC
IC
IC
K
E
E
E
0.1% 0.5% 1% 5% 10% 50% 100% 0.1 0.3 0.5 0.7 0.9 0.1 0.3 0.5 0.7 0.9
Filtering 120
4000 # Results
CPU Cost(ms)
2448.12
100
CPU cost
2034.83
# Results
1668.42
80
1631.18
1620.92
3000
10000
60 PCK PCK
2000 5000
668.68
Verification Verification
542.61
6779
374.62
Filtering Filtering
256.12
3055.18
40
2988.89
8000
72.75
53.68
2734.16
12.40
CPU Cost(ms)
4000
3.34
1000
11.71
5.98
5.49
5.36
1.12
0.66
0.02
4960
20
4531
IO Cost
6000
4266
1654.27
1624.32
0 0 3000
PC
TP
SL
PC
TP
SL
PC
TP
SL
PC
TP
SL
PC
TP
SL
PC
TP
SL
PC
TP
SL
1001.07
2617
L
IC
IC
IC
IC
IC
IC
IC
4000
K
765.34
1998
725.57
2000
E
576.04
1345
1281
0.1% 0.5% 1% 5% 10% 50% 100%
279.59
1026
2000
23.38
1000
1.75
0.13
0.17
0.31
100
13
10
(b)
7
3
0 0
PC
TP K
SLL
PC
TP K
SLL
PC
TP K
SLL
PC
TP K
SLL
PC
TP K
SLL
PC
TP K
SLL
PC
TP K
SLL
PC
TP K
SLL
PC
TP K
SL L
PC
TP K
SLL
IC
IC
IC
IC
IC
IC
IC
IC
IC
IC
E
E
0.1 0.3 0.5 0.7 0.9 0.1 0.3 0.5 0.7 0.9
Fig. 20 Effect of query quality (synthetic data set). a I/O cost, b CPU
cost (a) (b)
123
174 S. Yang et al.
110.72
65338.01
140 70 Verification Verification
52.63
3758
Verification Verification 90000
Filtering Filtering Filtering Filtering
120 60 4000 80000
CPU Cost(ms)
42886.74
2855
CPU Cost(ms)
100 50 70000
IO Cost
2424
63.36
60000
IO Cost
80 40 3000
22.78
46.01
50000
17849.89
36.74
34.81
34.02
60 30
31.64
28.34
2000 40000
12.33
21.64
19.95
4158.41
40 20 30000
6.26
4.42
4.38
4.37
3.39
1.97
410
1.32
1.86
1.80
1.80
1.80
1.80
0.04
0.02
0.02
0.02
0.03
20 10 1000 20000
0.45
1.32
4.30
5.97
58
47
35
22
10000
0 0 0 0
PC
TP K
SLL
PC
TP K
SL L
PC
TP K
SLL
PC
TP K
SL
PC
TP K
SLL
PC
TP K
SLL
PC
TP K
SLL
PC
TP K
SLL
PC
TP K
SLL
PC
TP K
SLL
TP
SL
TP
SL
TP
SL
TP
SL
TP
SL
TP
SL
TP
SL
TP
SL
IC
IC
IC
L
IC
IC
IC
IC
IC
IC
IC
E
L
IC
L
IC
L
IC
L
IC
L
IC
L
IC
L
IC
L
IC
E
E
25% 50% 100% 200% 400% 25% 50% 100% 200% 400%
1 5 10 15 1 5 10 15
2596.15
10000 Verification 3500 Verification
6588.27
445
Filtering Filtering 500
2071.02
393
392
3000
CPU Cost(ms)
8000 60000
CPU Cost(ms)
4592.62
2500 400
IO Cost
25455.52
50000
IO Cost
1120.25
6000
1026.54
2691.86
670.60
1431.86
540.72
1214.12
1500
1072.91
1067.76
4436.77
278.37 200
246.60
146.38
1000 20000
18.65
14.51
14.88
15.54
16.37
2000
9.15
5.01
3.65
4.30
3.45
100
34
0.56
1.67
2.05
500
16
10000
7
0 0 0 0
PC
TP K
SLL
PC
TP K
SL
PC
TP K
SLL
PC
TP K
SLL
PC
TP K
SLL
PC
TP K
SLL
PC
TP K
SLL
PC
TP K
SLL
PC
TP K
SLL
PC
TP K
SLL
TP
SL
TP
SL
TP
SL
TP
SL
TP
SL
TP
SL
IC
L
IC
IC
IC
IC
IC
IC
IC
IC
IC
IC
IC
IC
IC
IC
IC
E
E
25% 50% 100% 200% 400% 25% 50% 100% 200% 400% 1 2 3 1 2 3
Fig. 24 Effect of number of users (high-quality queries). a I/O cost, b Fig. 27 Effect of number of static attributes (mixed queries). a I/O
CPU cost cost, b CPU cost
600 50000
33567.51
Verification Verification
Filtering Filtering
500 5000
25568.75
140000
386
379
Filtering Filtering
3583
CPU Cost(ms)
82535.90
400 4000 120000
IO Cost
30000
2855
CPU Cost(ms)
13633.63
300 100000
43094.83
IO Cost
3000
20000 80000
1770
28150.96
4120.98
200
10000 2000 60000
100
0.38
0.72
1.68
2.58
19
16
13
10
40000
1000
113
0 0
3.23
4.31
9.53
20000
47
25
TP
SL
TP
SL
TP
SL
TP
SL
TP
SL
TP
SL
TP
SL
TP
SL
L
IC
L
IC
L
IC
L
IC
L
IC
L
IC
L
IC
L
IC
0 0
E
TP
SL
TP
SL
TP
SL
TP
SL
TP
SL
TP
SL
1 5 10 15 1 5 10 15
L
IC
IC
IC
IC
IC
C
E
E
(a) (b) 1 2 3 1 2 3
(a) (b)
Fig. 25 Effect of k (mixed queries). a I/O cost, b CPU cost
Fig. 28 Effect of number of static attributes (high-quality queries). a
I/O cost, b CPU cost
5.2.3 Performance evaluation for d ≥ 1 and k ≥ 1
In this section, we compare TPL and Slice for the spatial of both I/O and CPU cost. Figure 27 shows that the I/O cost
reverse top-k queries for k ≥ 1 and on the data sets with and CPU cost of both algorithms increases as the number of
one or more static attributes. We vary k from 1 to 15 and the attributes increases. Figure 28 shows the same trend except
number of static attributes d from 1 to 3. We choose d ≤ 3 that the I/O cost for TPL decreases with the number of static
because TPL could not compute the results for d > 3 even attributes. This anomaly in the trend for I/O cost is possi-
after a few hours due to its very high CPU cost. The default bly due to the different opposing factors that may contribute
values are k = 10 and d = 2. The scoring function is set by positively or negatively toward the overall cost as explained
assigning equal weight to each attribute, e.g., w[1] = w[2] = below.
· · · = w[d + 1] = d+1
1
. The cost of algorithms is affected by several factors.
Firstly, as d increases, the weight of dynamic attribute
Effect of k. Figures 25 and 26 show the effect of k on real
decreases (recall, w[i] = d+1 1
) which contributes toward
data set for mixed queries and high-quality queries, respec-
reducing the overall cost (as explained earlier for Fig. 22).
tively. As expected, the I/O and CPU cost of both algorithms
On the other hand, as the number of attributes increases, the
increases with the increase in k. However, Slice significantly
size of R*-tree increases which contributes toward increasing
outperforms TPL and scales better. As shown in the figures,
the overall cost. Also, as d increases, the difference between
both I/O cost and CPU cost of two algorithms increase as k
the static scores between the query and facilities decreases,
increases.
i.e., |qs − f s | decreases. This is because as the number of
Effect of number of static attributes. Figures 27 and 28 static attributes increases, it becomes less likely that one
study the effect of number of static attributes on real data facility is better than another facility on all static attributes—
set for mixed queries and high-quality queries, respectively. this behavior is due to the same reason the size of skyline
Slice significantly outperforms TPL in all settings in terms grows for increasing dimensionality. Since the static score
123
Reverse k nearest neighbors queries and spatial reverse top-k queries 175
668.72
Pruning Filtering
20.289
25 800
17.659
537.96
535.27
17.298
521.54
16.790
icantly better than the existing RkNN algorithms in terms
488.57
CPU Cost (ms)
CPU Cost(ms)
20
600
15 of CPU cost and scales better as k increases. Slice also
203.80
400
122.68
10
outperforms other SRTk algorithms by several orders of
59.17
2.707
2.509
2.493
2.459
2.450
1.833
1.384
1.120
200
5.62
4.05
3.44
3.25
5
0 0 magnitude.
PC
TP
SL
PC
TP
SL
PC
TP
SL
PC
TP
SL
SI
IN
SL
SI
IN
SL
SI
IN
SL
SI
IN
SL
L
IC
L
IC
L
IC
L
IC
X
K
F
F
IC
IC
IC
IC
E
E
E
512(7) 1024(14) 2048(31) 4096(62) 512(5) 1024(10) 2048(22) 4096(45)
6 Conclusion
123
176 S. Yang et al.
2. Benetis, R., Jensen, C.S., Karciauskas, G., Saltenis, S.: Nearest 24. Preparata, F.P., Shamos, M.I.: Computational Geometry an Intro-
neighbor and reverse nearest neighbor queries for moving objects. duction. Springer, Berlin (1985)
In: IDEAS, pp. 44–53 (2002) 25. Roussopoulos, N., Kelley, S., Vincent, F.: Nearest neighbor queries.
3. Bernecker, T., Emrich, T., Kriegel, H.-P., Mamoulis, N., Renz, M., In: Proceedings of the 1995 ACM SIGMOD International Confer-
Züfle, A.: A novel probabilistic pruning approach to speed up simi- ence on Management of Data, pp. 71–79, San Jose, 22–25 May
larity queries in uncertain databases. In: ICDE, pp. 339–350 (2011) (1995)
4. Bernecker, T., Emrich, T., Kriegel, H.-P., Renz, M., Züfle, S.Z.A.: 26. Safar, M., Ebrahimi, D., Taniar, D.: Voronoi-based reverse nearest
Efficient probabilistic reverse nearest neighbor query processing neighbor query processing on spatial networks. Multimed. Syst.
on uncertain data. Proc. VLDB Endow. 4(10), 669–680 (2011) 15(5), 295–308 (2009)
5. Cheema, M.A., Lin, X., Zhang, Y., Wang, W., Zhang, W.: Lazy 27. Sharifzadeh, M., Shahabi, C.: Vor-tree: R-trees with voronoi dia-
updates: an efficient technique to continuously monitoring reverse grams for efficient processing of spatial nearest neighbor queries.
knn. Proc. VLDB Endow. 2(1), 1138–1149 (2009) Proc. VLDB Endow. 3(1), 1231–1242 (2010)
6. Cheema, M.A., Brankovic, L., Lin, X., Zhang, W., Wang, W.: 28. Stanoi, I., Agrawal, D., Abbadi, A.E.: Reverse nearest neighbor
Multi-guarded safe zone: an effective technique to monitor moving queries for dynamic databases. In: ACM SIGMOD, Workshop, pp.
circular range queries. In: ICDE, pp. 189–200 (2010) 44–53 (2000)
7. Cheema, M.A., Lin, X., Wang, W., Zhang, W., Pei, J.: Probabilistic 29. Stanoi, I., Riedewald, M., Agrawal, D., Abbadi, A.E.: Discovery
reverse nearest neighbor queries on uncertain data. IEEE Trans. of influence sets in frequently updated databases. In: PVLDB, pp.
Knowl. Data Eng. 22(4), 550–564 (2010) 99–108 (2001)
8. Cheema, M.A., Lin, X., Zhang, W., Zhang, Y.: Influence zone: effi- 30. Tao, Y., Papadias, D., Lian, X.: Reverse knn search in arbitrary
ciently processing reverse k nearest neighbors queries. In: ICDE, dimensionality. In: VLDB, pp. 744–755 (2004)
pp. 577–588 (2011) 31. Tao, Y., Yiu, M.L., Mamoulis, N.: Reverse nearest neighbor search
9. Cheema, M.A., Zhang, W., Lin, X., Zhang, Y.: Efficiently process- in metric spaces. IEEE Trans. Knowl. Data Eng. 18(9), 1239–1252
ing snapshot and continuous reverse k nearest neighbors queries. (2006)
VLDB J. 21(5), 703–728 (2012) 32. Tsirogiannis, D., Harizopoulos, S., Shah, M.A., Wiener, J.L.,
10. Cheema, M.A., Zhang, W., Lin, X., Zhang, Y., Li, X.: Continuous Graefe, G.: Query processing techniques for solid state drives. In:
reverse k nearest neighbors queries in euclidean space and in spatial SIGMOD, pp. 59–72 (2009)
networks. VLDB J. 21, 69–95 (2012) 33. Vlachou, A., Doulkeridis, C., Kotidis, Y., Nørvåg, K.: Reverse top-
11. Cheema, M.A., Shen, Z., Lin, X., Zhang, W.: A unified framework k queries. In: ICDE, pp. 365–376 (2010)
for efficiently processing ranking related queries. In: Proceeding 34. Vlachou, A., Doulkeridis, C., Kotidis, Y., Nørvåg,K.: Reverse top-
of the 17th International Conference on Extending Database Tech- k queries. In: 2010 IEEE 26th International Conference on Data
nology (EDBT), pp. 427–438, Athens, 24–28 Mar (2014) Engineering (ICDE), pp. 365–376. IEEE (2010)
12. Chester, S., Thomo, A., Venkatesh, S., Whitesides, S.: Indexing 35. Vlachou, A., Doulkeridis, C., Kotidis, Y., Norvag, K.: Monochro-
reverse top-k queries in two dimensions. In: Meng, W., Feng, L., matic and bichromatic reverse top-k queries. IEEE Trans. Knowl.
Bressan, S., Winiwarter, W., Song, W. (eds.) Database Systems for Data Eng. 23(8), 1215–1229 (2011)
Advanced Applications, pp. 201–208. Springer, Berlin (2013) 36. Vlachou, A., Doulkeridis, C., Nørvåg, K.: Monitoring reverse top-
13. Emrich, T., Kriegel, H.-P., Kröger, P., Renz, M., Züfle, A.: Incre- k queries over mobile devices. In: Proceedings of the 10th ACM
mental reverse nearest neighbor ranking in vector spaces. In: SSTD International Workshop on Data Engineering for Wireless and
(2009) Mobile Access, pp. 17–24. ACM (2011)
14. Gkorgkas, O., Vlachou, A., Doulkeridis, C., Nørvåg, K.: Discover- 37. Vlachou, A., Doulkeridis, C., Nørvåg, K., Kotidis,Y.: Branch-and-
ing influential data objects over time. In: Nascimento, M.A., Sellis, bound algorithm for reverse top-k queries. In: Proceedings of the
T., Cheng, R., Sander, J., Zheng, Y., Kriegel, H.-P., Renz, M., Sen- 2013 ACM SIGMOD International Conference on Management of
gstock, C. (eds.) Advances in Spatial and Temporal Databases, pp. Data, pp. 481–492. ACM (2013)
110–127. Springer, Berlin (2013) 38. Wu, W., Yang, F., Chan, C.Y., Tan, K.-L.: Continuous reverse k-
15. https://international.ipums.org/international/ nearest-neighbor monitoring. In: MDM, pp. 132–139 (2008)
16. https://www.census.gov/geo/maps-data/data/tiger-line.html 39. Wu, W., Yang, F., Chan, C.Y., Tan, K.-L.: Finch: evaluating reverse
17. Jin, C., Zhang, R., Kang, Q., Zhang, Z., Zhou, A.: Probabilis- k-nearest-neighbor queries on location data. Proc. VLDB Endow.
tic reverse top-k queries. In: Bhowmick, S.S., Dyreson, C.E., 1(1), 1056–1067 (2008)
Jensen, C.S., Lee, M.L., Muliantara, A., Thalheim, B. (eds.) Data- 40. Xia, T., Zhang, D.: Continuous reverse nearest neighbor monitor-
base Systems for Advanced Applications, pp. 406–419. Springer, ing. In: ICDE, pp. 77–86 (2006)
Switzerland (2014) 41. www.cs.fsu.edu/%7Elifeifei/SpatialDataset.htm
18. Kang, J.M., Mokbel, M.F., Shekhar, S., Xia, T., Zhang, D.: Contin- 42. Yang, C., Lin, K.-I.: An index structure for efficient reverse nearest
uous evaluation of monochromatic and bichromatic reverse nearest neighbor queries. In: ICDE, pp. 485–492 (2001)
neighbors. In: ICDE, pp. 806–815 (2007) 43. Yang, S., Cheema, M.A., Lin, X., Zhang, Y.: SLICE: reviving
19. Korn, F., Muthukrishnan, S.: Influence sets based on reverse nearest regions-based pruning for reverse k nearest neighbors queries. In:
neighbor queries. In: SIGMOD, pp. 201–212 (2000) ICDE, pp. 760–771 (2014)
20. Kriegel, H.-P., Kröger, P., Renz, M., Züfle, A., Katzdobler, A.: 44. Yang, S., Cheema, M.A., Lin, X., Wang, W.: Reverse k near-
Incremental reverse nearest neighbor ranking. In: ICDE, pp. 1560– est neighbors query processing: experiments and analysis. Proc.
1567 (2009) VLDB Endow. 8(5), 605–616 (2015)
21. Lian, X., Chen, L.: Efficient processing of probabilistic reverse 45. Yiu, M.L., Mamoulis, N.: Reverse nearest neighbors search in ad
nearest neighbor queries over uncertain data. VLDB J. 18(3), 787– hoc subspaces. IEEE Trans. Knowl. Data Eng. 19(3), 412–426
808 (2009) (2007)
22. Lin, K.-I., Nolen, M., Yang, C.: Applying bulk insertion techniques 46. Yiu, M.L., Papadias, D., Mamoulis, N., Tao, Y.: Reverse nearest
for dynamic reverse nearest neighbor problems. In IDEAS, pp. neighbors in large graphs. IEEE Trans. Knowl. Data Eng. 18(4),
290–297 (2003) 540–553 (2006)
23. Park, J.-H., Chung, C.-W., Kang, U.: Reverse nearest neighbor 47. Yu, A.W., Mamoulis, N., Su, H.: Reverse top-k search using random
search with a non-spatial aspect. Inf. Syst. 54, 92–112 (2015) walk with restart. Proc. VLDB Endow. 7(5), 401–412 (2014)
123