Indexing Techniques: Mei-Chen Yeh
Last week
Matching two sets of features
Strategy 1:
Convert to a fixed-length feature vector
(Bag-of-words)
Use a conventional proximity measure
Strategy 2:
Build point correspondences
Last week: bag-of-words
[Figure: histogram of codeword frequencies over the visual vocabulary]
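As a recap of Strategy 1, the following is a minimal sketch of how a set of local descriptors might be turned into a fixed-length codeword histogram. The function name `bag_of_words`, the toy 2-D descriptors, and the three-word vocabulary are illustrative assumptions, not taken from the slides.

```python
import numpy as np

def bag_of_words(descriptors, vocabulary):
    """Return a normalized codeword-frequency histogram for one image."""
    # Squared Euclidean distance from every descriptor to every codeword.
    diffs = descriptors[:, None, :] - vocabulary[None, :, :]
    dists = (diffs ** 2).sum(axis=2)
    # Each descriptor votes for its nearest codeword.
    nearest = dists.argmin(axis=1)
    hist = np.bincount(nearest, minlength=len(vocabulary)).astype(float)
    return hist / hist.sum()

# Toy data (assumed): 3 codewords, 4 descriptors, all 2-D.
vocabulary = np.array([[0.0, 0.0], [1.0, 1.0], [5.0, 5.0]])
descriptors = np.array([[0.1, -0.1], [0.9, 1.2], [1.1, 0.8], [4.8, 5.1]])
hist = bag_of_words(descriptors, vocabulary)
```

Two images can then be compared with any conventional proximity measure on their histograms.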
Matching local features: building
patch correspondences
[Figure: patches in Image 1 matched to patches in Image 2]
[Figure: descriptors of the database images in feature space]
Indexing local features
When we see close points in feature space,
we have similar descriptors, which
indicates similar local content.
[Figure: descriptors of the query image and of the database images in feature space]
Problem statement
With potentially thousands of features per image, and hundreds
to millions of images to search, how can we efficiently find
the images that are relevant to a new image?
[Figure: a database of 50 thousand images, labeled "4m" and "?"]
The Nearest-Neighbor Search
Problem
r-nearest neighbor:
for any query q, returns a point $p \in S$
s.t. $\|p - q\| \le r$
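To make the definition concrete, here is a minimal brute-force sketch that scans every point in S. It is the baseline that indexing structures try to beat, not an indexing method itself. The function name `r_near_neighbor` and the toy points are assumptions for illustration.

```python
import numpy as np

def r_near_neighbor(S, q, r):
    """Linear scan: return the index of a point within distance r of q, or None."""
    dists = np.linalg.norm(S - q, axis=1)   # Euclidean distance to every point
    i = int(dists.argmin())                  # closest point in S
    return i if dists[i] <= r else None

# Toy data (assumed).
S = np.array([[0.0, 0.0], [3.0, 4.0], [10.0, 10.0]])
hit = r_near_neighbor(S, np.array([3.0, 3.0]), r=1.5)
miss = r_near_neighbor(S, np.array([6.0, 7.0]), r=1.0)
```

The cost of each query grows linearly with the number of stored descriptors, which is what motivates the indexing techniques that follow.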
Inverted file
http://www.robots.ox.ac.uk/~vgg/research/vgoogle/index.html
[Figure: inverted file example with retrieved results; or perform geometric verification on the retrieved results]
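An inverted file can be sketched as a map from each visual word to the images that contain it. The version below is a minimal illustration under assumed names (`build_inverted_file`, `query`) and toy word ids. It ranks images by the number of shared words and leaves out weighting and geometric verification.

```python
from collections import defaultdict, Counter

def build_inverted_file(image_words):
    """Map each visual word id to the set of images that contain it."""
    index = defaultdict(set)
    for image_id, words in image_words.items():
        for w in words:
            index[w].add(image_id)
    return index

def query(index, query_words):
    """Rank database images by the number of distinct query words they share."""
    votes = Counter()
    for w in set(query_words):
        for image_id in index.get(w, ()):
            votes[image_id] += 1
    return votes.most_common()

# Toy database (assumed): image id -> visual word ids found in that image.
image_words = {"img_a": [1, 4, 4, 7], "img_b": [2, 4, 9], "img_c": [3, 8]}
index = build_inverted_file(image_words)
ranked = query(index, [4, 7])
```

Only images that share at least one word with the query are ever touched, so `img_c` is never scored here.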
[Figure: hash table T]
Locality-sensitive hashing
$\Pr_{h \in F}[h(x) = h(y)] = \mathrm{sim}(x, y)$
Locality Sensitive Hashing
A family H of functions $h: \mathbb{R}^d \to U$ is
called $(r, cr, P_1, P_2)$-sensitive if, for
any p, q:
if $\|p - q\| \le r$ then $\Pr[h(p) = h(q)] > P_1$
if $\|p - q\| \ge cr$ then $\Pr[h(p) = h(q)] < P_2$
LSH Function: Hamming
Space
Consider binary vectors
points from $\{0, 1\}^d$
Hamming distance D(p, q) = # positions
on which p and q differ
Example: (d = 3)
D(100, 011) = 3
D(010, 111) = 2
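The two examples can be checked with a few lines of code. This is a minimal sketch that represents binary vectors as strings of '0' and '1'. The function name `hamming` is an assumption.

```python
def hamming(p, q):
    """Number of positions at which the equal-length bit strings p and q differ."""
    if len(p) != len(q):
        raise ValueError("p and q must have the same dimension d")
    return sum(a != b for a, b in zip(p, q))

d_first = hamming("100", "011")    # all three positions differ
d_second = hamming("010", "111")   # positions 1 and 3 differ
```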
LSH Function: Hamming
Space
Define hash function h as $h_i(p) = p_i$
where $p_i$ is the i-th bit of p
Example: select the 1st dimension
h(010) = 0
h(111) = 1
$\Pr[h(010) \neq h(111)]$ = ? Compare with $D(p, q)/d$.
$\Pr[h(p) = h(q)] = 1 - D(p, q)/d$
Clearly, h is locality sensitive.
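The collision probability can be verified exactly by enumerating all d choices of the bit index i, assuming i is drawn uniformly at random. The sketch below uses the function name `collision_probability`, which is not from the slides.

```python
from fractions import Fraction

def collision_probability(p, q):
    """Exact Pr[h_i(p) == h_i(q)] when the bit index i is uniform over the d positions."""
    d = len(p)
    agree = sum(p[i] == q[i] for i in range(d))   # positions where the bits match
    return Fraction(agree, d)

prob = collision_probability("010", "111")   # D = 2, d = 3, so 1 - 2/3
```

Because the number of matching positions is d minus the Hamming distance, the result equals 1 - D(p, q)/d for any pair of bit strings, so closer points collide more often.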
LSH Function: Hamming
Space
A k-bit locality-sensitive hash
function is defined as $g(p) = [h_1(p), h_2(p), \ldots, h_k(p)]^T$
Each $h_i(p)$ is chosen randomly
Each $h_i(p)$ results in a single bit
Pr(similar points collide): bounded in terms of $P_1$ and $k$
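A k-bit hash g and the table built from it might look like the sketch below. It assumes bit positions are sampled uniformly with replacement and uses a single table. The names `make_g` and `build_table`, the seed, and the toy points are illustrative.

```python
import random
from collections import defaultdict

def make_g(d, k, rng):
    """Pick k bit positions at random; g(p) is the tuple of those k bits."""
    positions = [rng.randrange(d) for _ in range(k)]
    return lambda p: tuple(p[i] for i in positions)

def build_table(points, g):
    """Hash table T: k-bit key -> indices of the points that hash to it."""
    table = defaultdict(list)
    for idx, p in enumerate(points):
        table[g(p)].append(idx)
    return table

rng = random.Random(0)   # fixed seed (assumed) for repeatability
points = ["00000000", "00000001", "11111111", "11110000"]
g = make_g(d=8, k=3, rng=rng)
table = build_table(points, g)

# At query time only the points in the query's bucket are compared exactly.
candidates = table.get(g("00000000"), [])
```

Larger k makes buckets more selective but also makes it less likely that two similar points share a key, which is why several tables with independent g functions are commonly used in practice.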
Building the hash table
r: segment width, $(\max - \min)/t$
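One possible reading of the segment width is that scalar values (for example, projections of the descriptors onto a line) are split into t equal segments between their minimum and maximum, with each segment acting as a bucket. The sketch below follows that assumption. The function name `build_segment_table` and the toy values are hypothetical, and the values must not all be identical.

```python
import numpy as np

def build_segment_table(values, t):
    """Bucket scalar values into t equal-width segments of width r = (max - min) / t."""
    lo, hi = values.min(), values.max()
    r = (hi - lo) / t                                   # segment width
    # Segment index of each value; the maximum is clipped into the last segment.
    ids = np.minimum(((values - lo) / r).astype(int), t - 1)
    table = {}
    for idx, b in enumerate(ids):
        table.setdefault(int(b), []).append(idx)
    return table, r

# Toy values (assumed).
values = np.array([0.0, 1.0, 2.5, 9.0, 10.0])
table, r = build_segment_table(values, t=5)
```

Values that fall into the same segment share a bucket, so a query only needs to be compared with the entries of its own segment.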