
An Improved Offline Stable Point Filtering Method

for Mobile Search Application

Yizun Wang1 Kai Chen2 Yi Zhou3 Qi Zheng4 Haibing Guan5


School of Information Security Engineering1, 2, 4, Shanghai Jiao Tong University
Department of Electronic Engineering3, Shanghai Jiao Tong University
Department of Computer Science and Engineering5, Shanghai Jiao Tong University
Shanghai, China
E-mail1: wyztcl@sjtu.edu.cn E-mail2: kchen@sjtu.edu.cn

Abstract—Mobile visual search has been prospering since the research breakthroughs in content-based image retrieval (CBIR) and the continuous improvement of mobile technology. In such search applications, the user takes a photo and uploads it to a server; retrieval is performed on the server side and the result is returned to the user. Current search applications face two difficulties. 1) Mobile photos have relatively low quality and contain various affine transforms, both of which hurt retrieval accuracy. 2) Retrieval speed and accuracy cannot be balanced for a large-scale database: current index mechanisms for high-dimensional local features such as SIFT are not fast enough, while most low-dimensional features cannot ensure high accuracy. This paper proposes a fully automated offline stable point filtering method for mobile visual search applications. We use various transforms to simulate the effects found in mobile photos, and the transformed images are processed by our offline method to reduce the index size of the retrieval application. Experiments show that with proper offline processing, 13.83% of memory or disk space can be saved when the retrieval application loads SIFT features into memory for LSH at the online stage, while the application still maintains high query accuracy.

Keywords: CBIR; Simulated Affine Transform; Stable Point; Two-View Segmentation

I. INTRODUCTION

The continuing rapid growth in the size of digital image databases leads to a growing requirement for content-based image retrieval (CBIR) applications. Several companies such as kooaba [1], idée [2], myclick [3] and evryx [4] have released commercial applications for mobile phones: the user takes a picture with a mobile phone and sends it through a GSM or 3G network to an authorized server, which returns a reply. These companies' core technologies are all based on content-based image retrieval, which differs from today's old-fashioned text-annotated image retrieval applications. Text-annotated image retrieval works like text retrieval and cannot keep up with the trend, because the number of images grows very quickly and manual annotation costs too much time. Features extracted automatically from image content are therefore among the most realistic annotations. Due to the complexity of digital images, many retrieval methods have been proposed. Features extracted directly from an image may be statistical information such as a color histogram or entropy, but such features support neither sub-image retrieval nor retrieval of transformed images, and they often deliver poor precision and recall. So the need for an image feature that is robust to scaling, rotation, cropping and affine transforms keeps growing stronger.

The research breakthroughs on distinctive local features such as SIFT [5] and SURF [6] opened a new direction for image retrieval applications: they give high-quality matches even under severe transformations. We construct a CBIR application based on SIFT combined with a high-dimensional similarity search method such as locality-sensitive hashing (LSH) [7] or its variants, e.g. multi-probe LSH [8] or other improvements [9]. Fig. 1 gives a retrieval example. However, this application requires a lot of memory, and its query time, although sub-linear in the index size, still grows with it.

Figure 1. Retrieval example: the left image is a retrieval image taken by a mobile phone; the right image is the retrieval result, a standard image in the image database.

This paper presents an improvement at the offline stage of this application. We reduce the index size using a stable point filtering method at both the preprocess stage and the post-process stage. The post-process uses a machine learning method to classify retrieval images and decide whether they will be used in stable point filtering. We are able to save 13.83% of the space while the application still maintains high retrieval accuracy. The method not only reduces the index size but also improves retrieval performance.

Authorized licensed use limited to: REVA UNIVERSITY. Downloaded on June 28,2023 at 07:53:12 UTC from IEEE Xplore. Restrictions apply.
978-1-4244-4994-1/09/$25.00 ©2009 IEEE
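The SIFT-plus-LSH pipeline described in the introduction can be made concrete with a toy index. The paper builds on the E2LSH implementation [18] (p-stable hashing); the sketch below substitutes a simpler random-hyperplane hash purely for illustration, and the class and variable names here are invented, not taken from any real library.

```python
import numpy as np

rng = np.random.default_rng(0)

class HyperplaneLSH:
    """Toy locality-sensitive hash table for high-dimensional descriptors.

    Illustrative only: the paper uses E2LSH (p-stable hashing); this sketch
    uses random hyperplanes because they are simpler to show. Descriptors
    that are close in Euclidean space tend to fall on the same side of each
    hyperplane and thus share a bucket.
    """

    def __init__(self, dim, n_bits=16):
        self.planes = rng.normal(size=(n_bits, dim))  # random hyperplanes
        self.buckets = {}

    def _key(self, v):
        # Sign pattern of the projections onto each hyperplane -> bucket key
        return tuple((self.planes @ v > 0).astype(int))

    def index(self, vectors):
        for i, v in enumerate(vectors):
            self.buckets.setdefault(self._key(v), []).append(i)

    def query(self, v):
        # Candidates share the query's bucket; a real system would then
        # verify candidates with exact Euclidean distances.
        return self.buckets.get(self._key(v), [])

# 1000 random 128-dimensional "SIFT-like" descriptors
db = rng.normal(size=(1000, 128))
lsh = HyperplaneLSH(dim=128)
lsh.index(db)

# Querying with a stored descriptor finds it in its own bucket
print(42 in lsh.query(db[42]))  # True
```

The appeal of LSH here is exactly the trade-off the paper targets: the whole descriptor set must sit in memory as hash tables, which is why shrinking the number of stored SIFT points shrinks the online footprint.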
II. RELATED WORK

Many integrated image retrieval systems [10] [11] [12] have been presented before. Their offline processing usually involves index construction.

Intel suggested a copyright detection system [10] in 2004. In one of its datasets, this system uses 6261 gallery pictures as the base database and demonstrates very high precision and recall. But it expands the 150 query images by transforms: 40 kinds of transforms are made for each query image selected out of the base database, and all of them are added to the base database to improve accuracy. The improved accuracy therefore holds only for the 150 query images that have similar images in the database, and the expansion causes additional time and space cost. Although the system avoids holding too many feature descriptors in limited memory by departing from the standard LSH implementation and storing the index on disk, improving accuracy by adding redundant images is not very extensible when the number of images is huge.

CORTINA [11] computes five types of feature descriptors for each image in the database. At the offline stage it uses SIFT features for scene classification [13] via pLSA and collects manual annotations through a web-site tool; queries run against a KD-tree-based index over a 12-dimensional CFMT feature. This system puts a lot of thought into the offline process, but its online query is not sufficient, since a 12-dimensional feature carries only a little information; accuracy is therefore doubtful when the image is complicated.

Video Google [12] is an object tracking system for videos. It clusters nearby SIFT descriptors, treats each cluster as a vocabulary word, and applies a text-retrieval approach to image-frame search. When the image database consists of frames from a movie, consecutive frames contain similar objects, so the distribution of image features suits clustering. When an image in the database has no duplicates or near-duplicates, clustering generates many poor-quality clusters and both performance and accuracy degrade.

A new stable interest point was proposed in [14] to reduce the feature points of a database image. It computes a stability value for each point in the image, and a threshold filters out points whose stability falls below it. It uses a probabilistic pose prediction model to detect whether a point belongs to the pose of an object; only points agreeing with one pose are used in the stability analysis. The idea is quite novel, but the offline process is not fully automated, and the pose prediction method, like RANSAC [15], can only determine a single transform for all key points and is not very accurate. We propose a fully automated offline process method based on simulating mobile phone pictures, two-view segmentation [16] and machine learning to overcome those limitations.

III. STABLE POINT FILTERING METHOD

Current commercialized content-based image retrieval applications mainly use the mobile phone as the client. Because mobile phones can both take pictures and send images to retrieval systems, we assume that people use the photos they take as retrieval images. Based on this fact, we can inspect how such pictures degrade compared with the original ones. We apply various transforms to the images in the database, each transform representing one way a mobile photo might look. We consider the following mobile photo effects:

1. Shake
2. Rigid affine transform
3. Gradual lighting change
4. Color cast

These transforms are applied to each database image. For each transformed image, we extract SIFT features and run the two-view segmentation method [16] against the database image to evaluate whether matched points belong to an effective affine transform; those that do are marked as stable points.

The stable points of a database image are then readjusted using retrieval images taken by mobile phones. The same process adds more stable points to the database image, but a machine learning method determines which database images take part in this stable point filtering process, as described later.

Fig. 2 illustrates the architecture of the stable point filtering system. It has two parts: the preprocess part applies transforms to the original image and filters stable points between each transformed image and the original; the post-process part uses retrieval photo images to further filter the stable points.

Figure 2. Architecture of Stable Point Filtering Method

The stable point filtering rules are as follows:

1. Record all the SIFT key points of the original image.
2. After running the two-view segmentation method between the original image and one transformed image, mark as 'stable' every key point of the original image that is matched to a key point in the transformed image, where the matched pair belongs to an effective affine transform.

3. Repeat step 2 for all transformed images, and unite all the results of step 2 to obtain the final stable points of the original image.
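The three filtering rules amount to a union of per-transform stable sets. A minimal sketch, assuming the two-view segmentation step of [16] is abstracted into a caller-supplied function — `segment_matches` below is a hypothetical stand-in that returns the indices of original key points matched under an effective affine transform:

```python
def filter_stable_points(original_keypoints, transformed_images, segment_matches):
    """Rules 1-3: start from all key points, mark those matched under an
    effective affine transform for any simulated transform, keep the union.

    segment_matches(transformed) stands in for two-view segmentation [16];
    it returns indices of original key points whose match in the transformed
    image belongs to an effective affine transform.
    """
    stable = set()                               # rule 1: record all points, none stable yet
    for timg in transformed_images:
        stable |= set(segment_matches(timg))     # rule 2: mark per transform; rule 3: unite
    return [kp for i, kp in enumerate(original_keypoints) if i in stable]

# Toy run: 5 key points, two simulated transforms
kps = ["p0", "p1", "p2", "p3", "p4"]
matches = {"shake": [0, 2], "affine": [2, 3]}
stable = filter_stable_points(kps, matches, lambda t: matches[t])
print(stable)  # → ['p0', 'p2', 'p3']
```

Points p1 and p4, never matched under any effective affine transform, are the ones dropped from the index.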
Fig. 3 illustrates how the Classifier module in Fig. 2 is obtained. We use two groups of test images to learn a classifier. After the two-view segmentation method, the result is represented by the following formula:

    M = Σ_{i=1}^{n} a_i e^{d_i²/2}    (1)

In (1), n is the number of points of the original image that belong to effective affine transforms, d_i is the Euclidean distance between matched point i in the database image and its counterpart in the other test image, and a_i is a per-point modifier, set to 1 by default, which we can refine when evaluating affine transforms with a non-parametric clustering method [17]. The classifier threshold is the average of the smallest M from the related photo group and the largest M from the non-related photo group, and one classifier is learned for each database image. In the post-process, a search is performed and the classifier filters the candidate database images: stable point filtering is only performed on images whose M is larger than the classifier's threshold.

Figure 3. Classifier Learning

Fig. 4 shows some simulated effects generated with the following transform methods.

To simulate the shake effect, we build a linear motion filter and convolve it with the original image:

    M = [1/K, ..., 1/K]    (2)
    O = I * M    (3)

For the rigid affine transformation, we transform according to (4), where the m_i form the transform matrix and [t_x, t_y] is the translation vector:

    [x₂, y₂]ᵀ = [[m₁, m₂], [m₃, m₄]] · [x₁, y₁]ᵀ + [t_x, t_y]ᵀ    (4)

A linear gradual lighting change is produced by (5), where k is the increment value and h is the height of the image:

    O(x_i, y_i) = I(x_i, y_i) + k · (h − y_i) / h    (5)

Figure 4. Simulating mobile photo effects: (a) is the original image; (b) simulates the shake effect; (c) simulates the affine transform effect; (d) simulates the gradual lighting change effect; and (e) simulates the color cast.

Figure 5. Stable point filtering example: the left images in (a) and (b) are the database image; the right images are simulated photos taken from different angles. Points decorated with circles and triangles are matched points considered to belong to a certain effective affine transform. Points marked with red squares are considered unstable and eliminated.
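The score of (1) is a direct sum over matched points. A minimal sketch with the default a_i = 1 (the function name and toy distances are illustrative, not from the paper):

```python
import math

def match_score(distances, modifiers=None):
    """Eq. (1): M = sum_i a_i * exp(d_i**2 / 2), with a_i defaulting to 1.

    Each point matched under an effective affine transform contributes one
    term. A related photo yields many such matches, hence a larger M; an
    unrelated photo yields few effective matches and a small M, which is why
    filtering is applied only to images whose M exceeds the learned threshold.
    """
    if modifiers is None:
        modifiers = [1.0] * len(distances)
    return sum(a * math.exp(d * d / 2.0) for a, d in zip(modifiers, distances))

# Three perfectly aligned matches: every d_i = 0, so each term is exp(0) = 1
print(match_score([0.0, 0.0, 0.0]))  # 3.0
```

A per-image threshold halfway between the related and non-related score ranges then plays the role of the learned classifier.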

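The simulated photo effects of (2)–(6) can be sketched in a few lines of NumPy. The image size, filter length K, gradient increment k, matrix entries, and gamma value below are placeholders — the paper tunes all of these experimentally:

```python
import numpy as np

rng = np.random.default_rng(1)
img = rng.random((64, 64))            # toy grayscale image, values in [0, 1)

# (2)-(3): shake = convolution of each row with a length-K linear motion filter
K = 5
motion = np.full(K, 1.0 / K)
shaken = np.apply_along_axis(
    lambda row: np.convolve(row, motion, mode="same"), 1, img)

# (4): rigid affine transform of a point [x1, y1]
m = np.array([[0.9, -0.1],
              [0.1,  0.9]])           # transform matrix [m1 m2; m3 m4]
t = np.array([3.0, -2.0])             # translation vector [tx, ty]
p2 = m @ np.array([10.0, 20.0]) + t   # transformed coordinates [x2, y2]

# (5): linear lighting gradient; k = increment, h = image height
k, h = 0.2, img.shape[0]
y = np.arange(h)[:, None]
lit = img + k * (h - y) / h           # brighter toward the top rows

# (6): color cast by a gamma factor (normalized pixels: gamma > 1 darkens)
cast = img ** 1.8

print(shaken.shape, lit.shape, cast.shape)
```

Running each database image through combinations of these four transforms produces the simulated "mobile photos" used for the offline stable point filtering.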
When each pixel value is raised to the power of a gamma factor, a color cast effect is produced. We adjust γ to change the cast effect: with normalized pixel values, we can set γ > 1 to darken the image or 0 < γ < 1 to lighten it.

    O(x_i, y_i) = I(x_i, y_i)^γ    (6)

Fig. 5 gives an example of stable points evaluated through the two-view segmentation method. Since our retrieval system also uses the two-view segmentation method to filter out unrelated candidate images at the online stage, and only the stable points are used there, we may omit the other key points of an image from the beginning. If the accuracy of the retrieval system is not affected, the online system can then load fewer key-point feature vectors into memory and the query speed becomes slightly higher. We can therefore improve the performance of an image retrieval application by adding an offline stable point filtering framework to it.

IV. EXPERIMENTAL SETUP AND RESULTS

For our first experiment, the image database consists of 285 different book cover or poster images, with 260060 SIFT points in total. Each image is different from the others; we do not put near-duplicate copies of the 285 images into the database because we want to keep the index small. We take 10 photos of each image and randomly select 1300 photo images as test queries. All test images have poor quality. We developed an image retrieval system based on existing SIFT and E2LSH [18] implementations, with all parameters tuned manually. The online query process works as follows:

1. Resize the query image proportionally so that its larger dimension is 500 pixels.
2. Extract SIFT features from the query image in the SIFT module.
3. For each SIFT feature of the query image, look up nearby features in the E2LSH index in the LSH module.
4. Divide the results of step 3 into groups, where all features in a group belong to one database image. Sort the features in each group by the Euclidean distance between result feature and query feature.
5. Use the relative distance ratio to eliminate query features in each group.
6. Let the remaining features vote for candidate images.
7. Repeat steps 3-6 for all query image features.
8. Take a fixed number of candidate images and use the two-view segmentation method [16] between each candidate image and the query image to eliminate unrelated candidates.
9. Return the final results of step 8 to the user.

We test our system on PC servers with a four-core Xeon CPU and 2 gigabytes of RAM. Without any offline preprocessing, our retrieval system achieves accuracy near 100%. The parameters in (2)-(6) are optimized for the transforms. The number of transforms of each kind in a combination and the resulting index size for each group are displayed in Table I. Post-processing is also evaluated for combinations 13, 15 and 17; 386 retrieval images are used for the post-process, so only the other 901 images are used to test its effect.

TABLE I. TRANSFORM COMBINATIONS

No   Transform Combination                    Adjusted Index Size
     (Shake, Affine, Gradual Change, Cast)
0    No preprocess                            260060
1    1,4,1,2                                  191107
2    2,8,2,4                                  235373
3    2,5,1,4                                  216661
4    3,12,3,6                                 248999
5    3,12,3,6                                 241067
6    3,12,3,6                                 242280
7    3,12,3,6                                 243436
8    3,9,3,6                                  238319
9    3,12,6,6                                 244232
10   3,12,6,6                                 245603
11   1,5,2,2                                  207759
12   1,5,1,2                                  206577
13   2,2,6,4                                  220946
14   2,2,6,4 Post-Process (13)                224097
15   1,1,3,2                                  186531
16   1,1,3,2 Post-Process (15)                192068
17   1,1,1,1                                  129301
18   1,1,1,1 Post-Process (17)                141824

Using the above indices for retrieval, we obtained the following results.

TABLE II. RETRIEVAL RESULTS

Data Set No (size)   Number of Test Images   Accuracy
0 (260060)           1287                    100%
1 (191107)           1287                    94.48%
2 (216661)           1287                    97.05%
3 (235373)           1287                    98.52%
4 (248999)           1287                    99.23%
5 (241067)           1287                    98.99%
6 (242280)           1287                    99.15%
7 (243436)           1287                    98.99%
8 (238319)           1287                    98.91%
9 (244232)           1287                    98.99%
10 (245603)          1287                    99.15%
11 (207759)          1287                    93.16%
12 (206577)          1287                    94.48%
13 (220946)          901                     97.67%
14 (224097)          901                     98.56%
15 (186531)          901                     93.67%
16 (192068)          901                     96.89%
17 (129301)          901                     71.70%
18 (141824)          901                     81.69%

TABLE III. POST-PROCESS ANALYSIS

Data Set No   Index Increment   Accuracy Increment   Competitive to
              to Preprocess     to Preprocess        Data Set No
14            3151              0.89%                3
16            5537              3.22%                1
18            12523             9.99%                None
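Steps 4–6 of the online query can be sketched for a single query feature. The paper does not spell out its exact relative-distance-ratio rule, so the nearest-versus-global-best variant below is only one plausible reading; the image ids, distances, and ratio value are toy data:

```python
from collections import defaultdict

def vote_for_candidates(lsh_hits, ratio=0.8):
    """Steps 4-6 of the online query, for one query feature.

    lsh_hits: list of (image_id, distance) pairs returned by the LSH lookup.
    Groups hits per database image (step 4), sorts each group by distance
    (step 4), keeps groups whose nearest hit passes a relative distance-ratio
    test against the best hit overall (step 5, one plausible reading), and
    lets survivors vote (step 6). The real system accumulates these votes
    over all query features.
    """
    groups = defaultdict(list)
    for image_id, dist in lsh_hits:           # step 4: group by database image
        groups[image_id].append(dist)
    best = min(min(d) for d in groups.values())
    votes = []
    for image_id, dists in groups.items():
        dists.sort()                          # step 4: sort within the group
        if dists[0] <= best / ratio:          # step 5: relative distance ratio
            votes.append(image_id)            # step 6: vote
    return sorted(votes)

hits = [("bookA", 0.10), ("bookA", 0.30), ("bookB", 0.50), ("bookC", 0.11)]
print(vote_for_candidates(hits))  # → ['bookA', 'bookC']
```

After all query features have voted, a fixed number of top candidates proceed to the two-view segmentation check of step 8.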

The best data set is No. 14, with the index size reduced by 13.83% and an accuracy loss of less than 1.5%. We can see that the post-process is very useful for both the larger and the smaller data sets: it makes the larger data set the best substitute for the original index, and for the small data sets the relationship between index-size increment and accuracy increment is better than linear.

V. CONCLUSION

The primary contribution of this paper is a fully automated offline image processing method for image retrieval applications. We analyze how users actually use an image retrieval application and develop a method to reduce the system load at the online stage. A post-process achieves the best balance between accuracy and size, making the index adaptive.

We show experimentally that the accuracy loss is small (1.49%) when we reduce the LSH index size by 13.83%. The best transform combination in the experiment is No. 14 in Table I.

Our method could integrate PCA-SIFT, clustering and improved distance measures [19] [20] for better results. Further research could improve the two-view segmentation method, modify the stable point filtering rules, and replace the SIFT descriptor with the PCA-SIFT descriptor.

REFERENCES

[1] "Kooaba Inc.", http://www.kooaba.com
[2] "Idee Inc.", http://www.ideeinc.com
[3] "Myclick", http://www.myclick.cn/desktop/ch/index.asp
[4] "Evryx", http://www.linkmemobile.com/
[5] David G. Lowe, "Distinctive Image Features from Scale-Invariant Keypoints", International Journal of Computer Vision, 2004, pp. 91-110.
[6] Herbert Bay, Tinne Tuytelaars, and Luc Van Gool, "SURF: Speeded Up Robust Features", Computer Vision – ECCV 2006, pp. 404-417.
[7] Alexandr Andoni, Piotr Indyk, "Near-Optimal Hashing Algorithms for Approximate Nearest Neighbor in High Dimensions", 47th Annual IEEE Symposium on Foundations of Computer Science (FOCS 2006), Oct. 2006, pp. 459-468.
[8] Qin Lv, William Josephson, Zhe Wang, Moses Charikar, Kai Li, "Multi-Probe LSH: Efficient Indexing for High-Dimensional Similarity Search", Proceedings of the 33rd International Conference on Very Large Data Bases, Vienna, Austria, 2007, pp. 950-961.
[9] Mayank Bawa, Tyson Condie, Prasanna Ganesan, "LSH Forest: Self-Tuning Indexes for Similarity Search", International World Wide Web Conference, 2005, pp. 651-660.
[10] Yan Ke, Rahul Sukthankar, Larry Huston, "Efficient Near-duplicate Detection and Sub-Image Retrieval", Proceedings of the ACM International Conference on Multimedia (MM), 2004, pp. 869-876.
[11] Elisa Drelie Gelasca, Pratim Ghosh, Emily Moxley, Joriz De Guzman, JieJun Xu, Zhiqiang Bi, Steffen Gauglitz, Amir M. Rahimi, B. S. Manjunath, "CORTINA: Searching a 10 Million + Images Database", Proceedings of VLDB, 2007, pp. 508-511.
[12] Josef Sivic, Andrew Zisserman, "Video Google: A Text Retrieval Approach to Object Matching in Videos", Proceedings of the Ninth IEEE International Conference on Computer Vision (ICCV 2003), Vol. 2, pp. 1470-1477.
[13] Aditya Vailaya, Mário A. T. Figueiredo, Anil K. Jain, Hong-Jiang Zhang, "Image Classification for Content-Based Indexing", IEEE Transactions on Image Processing, 2001, pp. 117-130.
[14] Matthew Johnson, Roberto Cipolla, "Stable Interest Points for Improved Image Retrieval and Matching", technical report, http://citeseer.ist.psu.edu/760077.html
[15] Martin A. Fischler, Robert C. Bolles, "Random Sample Consensus: A Paradigm for Model Fitting with Applications to Image Analysis and Automated Cartography", Communications of the ACM, Vol. 24, No. 6, June 1981, pp. 381-395.
[16] Yan Zhang, Kai Chen, Huijing Wang, Yi Zhou, Haibing Guan, "Two-view Motion Segmentation by Gaussian Blurring Mean Shift with Fitness Measure", to be published at CISP, Oct. 2009.
[17] Miguel A. Carreira-Perpinan, "Fast Nonparametric Clustering with Gaussian Blurring Mean-Shift", Proceedings of the 23rd International Conference on Machine Learning, Pittsburgh, Pennsylvania, 2006, pp. 153-160.
[18] Alexandr Andoni, "LSH Algorithm and Implementation (E2LSH)", technical manual, http://web.mit.edu/andoni/www/LSH/index.html
[19] Ö. Eğecioğlu, "Parametric Approximation Algorithms for High-Dimensional Euclidean Similarity", Principles of Data Mining and Knowledge Discovery, 2001, pp. 79-90.
[20] Haiying Shen, Felix Ching, Ting Li, Ze Li, "An Intelligent Locally Sensitive Hashing Based Algorithm for Data Searching", IEEE SoutheastCon, 2008, pp. 192-197.
