Welcome to Scribd, the world's digital library. Read, publish, and share books and documents. See more
Standard view
Full view
of .
0 of .
Results for:
P. 1
Locality-Sensitive Hashing Using Stable Distributions

# Locality-Sensitive Hashing Using Stable Distributions

Ratings: (0)|Views: 22 |Likes:

### Availability:

See more
See less

11/19/2011

pdf

text

original

4 Locality-sensitive hashing usingstable distributions
4.1 The LSH scheme based on
s
-stable distributions
In this chapter, we introduce and analyze a novel locality-sensitive hashing family.The family is deﬁned for the case where the distances are measured according tothe
l
s
norm, for any
s
[0
,
2]. The hash functions are particularly simple for thecase
s
= 2, i.e., the Euclidean norm. The new family provides an e
cient solutionto (approximate or exact) randomized near neighbor problem.Part of this work appeared earlier in [DIIM04].
4.1.1
s
-stable distributions
Stable distributions [Zol86] are deﬁned as limits of normalized sums of independentidentically distributed variables (an alternate deﬁnition follows). The most well-known example of a stable distribution is Gaussian (or normal) distribution.However, the class is much wider; for example, it includes heavy-tailed distributions.
Deﬁnition 4.1
A distribution
D
over
is called
s
-stable
, if there exists
p
0
such that for an
n
real numbers
v
1
...v
n
and i.i.d. variables
1
...X
n
with distribution
D
, the random variable
i
v
i
i
has the same distribution as thevariable
(
i
|
v
i
|
p
)
1
/p
, where
is a random variable with distribution
D
.
It is known [Zol86] that stable distributions exist for any
p
(0
,
2]. In particular:a
Cauchy distribution
D
, deﬁned by the density function
c
(
x
) =
1
π
11+
x
2
, is 1-stablea
Gaussian (normal) distribution
D
G
, deﬁned by the density function
g
(
x
) =
1
√
2
π
e
x
2
/
2
, is 2-stableWe note from a practical point of view, despite the lack of closed form densityand distribution functions, it is known [CMS76] that one can generate
s
-stablerandom variables essentially from two independent variables distributed uniformlyover [0
,
1].Stable distribution have found numerous applications in various ﬁelds (see thesurvey [Nol] for more details). In computer science, stable distributions were used for

56 Locality-sensitive hashing using stable distributions
“sketching” of high dimensional vectors by Indyk ([Ind00]) and since have found usein various applications. The main property of
s
-stable distributions mentioned in thedeﬁnition above directly translates into a sketching technique for high dimensionalvectors. The idea is to generate a random vector
a
of dimension
d
whose each entryis chosen independently from a
s
-stable distribution. Given a vector
v
of dimension
d
, the dot product
a.v
is a random variable which is distributed as (
i
|
v
i
|
s
)
1
/s
(i.e.,
||
v
||
s
), where
is a random variable with
s
-stable distribution. A smallcollection of such dot products (
a.v
), corresponding to di
ﬀ
erent
a
’s, is termed asthe sketch of the vector
v
and can be used to estimate
||
v
||
s
(see [Ind00] for details).It is easy to see that such a sketch is linearly composable, i.e., for any
p,q
d
,
a.
(
p
q
) =
a.p
a.q
.
4.1.2 Hash family based on
s
-stable distributions
We use
s
-stable distributions in the following manner. Instead of using the dotproducts (
a.v
) to estimate the
l
s
norm we use them to assign a hash value to eachvector
v
. Intuitively, the hash function family should be locality sensitive, i.e. if two points (
p,q
) are close (small
||
p
q
||
s
) then they should collide (hash to thesame value) with high probability and if they are far they should collide with smallprobability. The dot product
a.v
projects each vector to the real line; It followsfrom
s
-stability that for two vectors (
p,q
) the distance between their projections(
a.p
a.q
) is distributed as
||
p
q
||
s
where
is a
s
-stable distribution. If we“chop” the real line into equi-width segments of appropriate size
w
and assign hashvalues to vectors based on which segment they project onto, then it is intuitivelyclear that this hash function will be locality preserving in the sense described above.Formally, each hash function
h
a,b
(
v
) :
R
d
maps a
d
dimensional vector
v
onto the set of integers. Each hash function in the family is indexed by a choiceof random
a
and
b
where
a
is, as before, a
d
dimensional vector with entrieschosen independently from a
s
-stable distribution and
b
is a real number chosenuniformly from the range [0
,w
]. For a ﬁxed
a,b
the hash function
h
a,b
is given by
h
a,b
(
v
) =
a
·
v
+
bw
4.1.2.1 Collision probability
We compute the probability that two vectors
p,q
collide under a hash functiondrawn uniformly at random from this family. Let
s
(
t
) denote the probabilitydensity function of the
absolute value
of the
s
-stable distribution. We may dropthe subscript
s
whenever it is clear from the context.For the two vectors
p,q
, let
u
=
||
p
q
||
s
and let
p
(
u
) denote the probability(as a function of
u
) that
p,q
collide for a hash function uniformly chosen from thefamily
H
described above. For a random vector
a
whose entries are drawn from a
s
-stable distribution,
a.p
a.q
is distributed as
cX
where
is a random variabledrawn from a
s
-stable distribution. Since
b
is drawn uniformly from [0
,w
] it is easyto see that

4.2 Approximate near neighbor 5
p
(
u
) =
Pr
a,b
[
h
a,b
(
p
) =
h
a,b
(
q
)] =

w
0
1
u
s
(
tu
)(1
tw
)
dt
For a ﬁxed parameter
w
the probability of collision decreases monotonically with
u
=
||
p
q
||
s
. Thus, as per the deﬁnition, the family of hash functions above is(
R,cR,P
1
,
2
)-sensitive for
1
=
p
(1) and
2
=
p
(
c
).
4.2 Approximate near neighbor
In what follows we will bound the ratio
ρ
=
ln1
/P
1
ln1
/P
2
, which as discussed earlier iscritical to the performance when this hash family is used to solve the
c
-approximatenear neighbor problem.Note that we have not speciﬁed the parameter
w
, for it depends on the value of
c
and
s
. For every
c
we would like to choose a ﬁnite
w
that makes
ρ
as small aspossible.We focus on the cases of
s
= 1
,
2. In these cases the ratio
ρ
can be explicitlyevaluated. We compute and plot this ratio and compare it with 1
/c
. Note, 1
/c
isthe best (smallest) known exponent for
n
in the space requirement and query timethat is achieved in [IM98] for these cases.For
s
= 1
,
2 we can compute the probabilities
1
,
2
, using the density func-tions mentioned before. A simple calculation shows that
2
= 2
tan
1
(
w/c
)
π
1
π
(
w/c
)
ln(1 + (
w/c
)
2
) for
s
= 1 (Cauchy) and
2
= 1
2
norm
(
w/c
)
2
√
2
π
w/c
(1
e
(
w
2
/
2
c
2
)
) for
s
= 2 (Gaussian), where
norm
(
·
) is the cumulative distribution func-tion (cdf) for a random variable that is distributed as
(0
,
1). The value of
1
canbe obtained by substituting
c
= 1 in the formulas above.For
c
values in the range [1
,
10] (in increments of 0
.
05) we compute the minimumvalue of
ρ
,
ρ
(
c
) =
min
w
log(1
/P
1
)
/
log(1
/P
2
), using
Matlab
. The plot of
c
versus
ρ
(
c
) is shown in Figure 4.1. The crucial observation for the case
s
= 2 is thatthe curve corresponding to optimal ratio
ρ
(
ρ
(
c
)) lies strictly below the curve 1
/c
.As mentioned earlier, this is a strict improvement over the previous best knownexponent 1
/c
from [IM98]. While we have computed here
ρ
(
c
) for
c
in the range[1
,
10], we believe that
ρ
(
c
) is strictly less than 1
/c
for all values of
c
.For the case
s
= 1, we observe that
ρ
(
c
) curve is very close to 1
/c
, although itlies above it. The optimal
ρ
(
c
) was computed using
Matlab
as mentioned before.The
Matlab
program has a limit on the number of iterations it performs to computethe minimum of a function. We reached this limit during the computations. If wecompute the true minimum, then we suspect that it will be very close to 1
/c
,possibly equal to 1
/c
, and that this minimum might be reached at
w
=
.If one were to implement our LSH scheme, ideally they would want to know theoptimal value of
w
for every
c
. For
s
= 2, for a given value of
c
, we can compute thevalue of
w
that gives the optimal value of
ρ
(
c
). This can be done using programslike
Matlab
. However, we observe that for a ﬁxed
c
the value of
ρ
as a function of