
Car Re-identification from Large Scale Images

Using Semantic Attributes

Qi Zheng, Chao Liang, Wenhua Fang, Da Xiang, Xin Zhao, Chengping Ren, Jun Chen
National Engineering Research Center for Multimedia Software
School of Computer Science, Wuhan University, Wuhan, China
Email: zhengq@whu.edu.cn, cliang@whu.edu.cn

Abstract—Car re-identification, searching for a specific car object in a large-scale car image database, is investigated in this paper. Previous work mainly focuses on a fixed pose and overlooks special appearance. However, declining to match across other poses leads to coarse car retrieval results, and special attributes such as individual paintings, which are greatly helpful for car retrieval, have not drawn enough attention. This paper addresses these problems through multi-pose matching and re-ranking based on special attributes. Our core idea lies in a query expansion method that captures weighted attributes to build the retrieval model, which allows us to estimate invisible attributes from the visible ones and thus construct complete attribute vectors for car retrieval in any pose. Furthermore, we divide all attributes into two groups: special attributes and common attributes. Here, special attributes represent abnormal appearance such as individual paintings or car damage, while common attributes denote the intrinsic appearance of a car. Using special attributes to re-rank the results turns out to be beneficial to retrieval performance. Finally, experiments demonstrate the effectiveness of our approach on the car datasets.
I. INTRODUCTION

The car re-identification problem is the task of instance search for a specific car, returning several similar car images from a gallery. It has attracted much attention in multimedia retrieval systems as the number of cars increases rapidly. Unfortunately, the traditional technology of license plate recognition is nearly impossible to apply in practical situations, owing to the massive number of invisible or blurred license plates. Under some special conditions, e.g., when plates are replaced by fake ones, or for new cars that have no plates yet, plate recognition methods are totally useless; plate recognition is only useful in constrained conditions such as HD bayonet monitoring systems. At the same time, car makers produce many different types of cars and provide a variety of unique car suits or paintings for individual owners, which makes appearance-based car retrieval greatly important. Nonetheless, compared with traditional car categorization [1], car re-identification is a more challenging task because it requires finding a specific target within a group of cars of the same class. Under practical conditions it is even harder, since the appearance of a car changes with pose variation.

Fig. 1. The framework of our retrieval system. This is the core part of our system. When a query image shows a car in the front-left pose, we first flip it to the front-right pose, since these two poses share the same attributes (more details are given in Section II). The attributes in the other 4 poses are then estimated from the original pose (here, front-right). The estimated values are combined with the attributes in the different poses to construct new attribute vectors. Next, we measure the distances between the attribute-vector pairs of the query image and each gallery image in every pose. Finally, the retrieval result, shown at the bottom of the figure, is obtained.

Most previous retrieval approaches exploit low-level features [2] [3] [4], such as color, texture, and SIFT features, because they can be measured relatively easily and reliably. Shin-Yu Chen et al. [2] used multiple-instance learning (MIL) and an eigen-color method to learn specific visual properties of vehicles from query images. Lisa M. Brown [3] studied the utility of example-based retrieval with color correlograms to avoid the limitations of strict color classification. Ming-Kuang Tsai et al. [4] used an active shape model (ASM) to fit 3D vehicle models to a 2D image, then used the fitted parts to rectify vehicles from disparate views into the same reference view. These methods mainly search in a low-dimensional space and ignore other features of the objects, leading to coarse car retrieval results. Another influential approach treats sensors as add-ons [5] [6]. Rogerio Feris et al. [5] searched for vehicles in surveillance videos based on attributes such as color, direction of travel, speed, length, and height. They focus on surveillance settings where some of the characteristics are obtained from kinetic sensors, which limits their applications. Besides, the last two methods [4] [5] are restricted to the limited condition of common-pose car retrieval.

Note that there are strong correlations between different poses: e.g., a car seen from the front-right with a "head-bumper" must also have a "back-bumper", even though we cannot see it. Besides, some special appearance of a car, such as an individual painting, can locate the same car in the gallery images more accurately. We therefore propose the novel method shown in Fig. 1, which tackles the car retrieval problem under multi-pose circumstances. Our work is inspired by query expansion based on term co-occurrence and term similarity, which has been widely investigated in text retrieval with varying degrees of success [7]. We bring the query expansion idea into our system to deal with the multi-pose problem and take advantage of the strong correlations to evaluate the expanded attributes in other poses. Using invisible attributes as expanded attributes to handle the pose-change problem is an important part of our work. We then adopt distance measurements trained on the 5 car poses (shown in Fig. 5) to find the optimal result. Based on the visible attributes combined with the expanded attributes for each car pose, we measure the semantic attribute distance between the query image and each gallery image. In addition, we divide all attributes into two groups: special attributes, including "damage" and "complex-texture", and common attributes, including all the others. If any special attribute is detected in the query image, SIFT features are used to re-rank the retrieval result. Our experiments demonstrate that our method handles these conditions effectively.

Above all, there are two primary contributions in this paper: (1) we propose a novel method for multi-pose car retrieval with the estimation of invisible attributes, as outlined in Fig. 2; (2) we introduce special attributes, distinct from common attributes, to improve our re-identification performance.

Fig. 2. Attributes estimation. We estimate the invisible attributes to measure the images of other poses. Some invisible attributes, such as spare-wheel in the front-left pose, can easily be estimated from other visible attributes. Thus a query expansion is used to measure the attribute vectors in the other poses.

II. CAR REPRESENTATION

This section introduces some relevant concepts in car retrieval. In addition, the training of the detector for each attribute is presented in Section II-B.

A. Appearances

The appearance of a car usually changes with pose, which varies the visibility of its attributes. We introduce invisible-attribute estimation based on car pose to bridge the gaps between the different car poses. Meanwhile, the color histogram and SIFT features are important for our preprocessing and result re-ranking, respectively.

Binary Semantic Attributes According to the intrinsic properties of cars, we define the following space of Na = 27 binary attributes for our study (summarized in Tab. I). Seventeen of these attributes are car components, six are related to the car state, and the remaining four are car types. Fig. 3 shows an example of some attributes.

TABLE I. BINARY ATTRIBUTE SPACE WITH 27 ATTRIBUTES.

head-bumper       back-bumper         roof-antenna     container
roof-rack         front-plate         back-plate       low-underpan
middle-underpan   high-underpan       silver-wheel     spare-wheel
tail-fin          vertical-taillight  four-headlight   horizontal-grille
cold-air-intake   minivan             sporty-car       jeep
ordinary-car      ragtop-open         wheel-visible    new
simple-texture    damage              roof-cargo

Fig. 3. A typical example of attributes. Note that some attributes are not visible in one pose.

Car Pose Based on statistics of the car datasets, we divide the car pose into 8 directions, illustrated in Fig. 4.
Color Histogram The main color of a car is not a binary variable, but it hardly changes with the car pose. Thus, as a preprocessing step in our framework, color is used to cluster images of similar color, yielding a subset of the original dataset as a coarse retrieval result. Experiments show that this improves our performance substantially. The standard color histogram is used here because it is robust to occlusion and to lighting and view changes, and the pioneering work of Swain [8] has shown the effectiveness of color histograms in distinguishing a large number of objects.
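As a concrete illustration of this preprocessing step, the sketch below builds a per-channel color histogram and keeps only the gallery images most similar to the query as the coarse subset; the 16-bin setting, the Euclidean comparison, and the `keep_ratio` parameter are our own illustrative choices rather than details specified in the paper:

```python
import cv2
import numpy as np

def color_histogram(image_bgr, bins=16):
    """L1-normalized color histogram with `bins` bins per BGR channel."""
    hist = np.concatenate([
        cv2.calcHist([image_bgr], [c], None, [bins], [0, 256]).ravel()
        for c in range(3)
    ])
    return hist / (hist.sum() + 1e-8)

def coarse_color_filter(query_hist, gallery_hists, keep_ratio=0.3):
    """Keep the fraction of gallery images whose histograms are closest to the query."""
    dists = np.linalg.norm(np.asarray(gallery_hists) - query_hist, axis=1)
    keep = max(1, int(len(dists) * keep_ratio))
    return np.argsort(dists)[:keep]  # indices of the retained coarse subset
```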
SIFT Feature The SIFT algorithm [9] has been successfully used to describe local image features. SIFT features are invariant to image scaling, translation, and rotation, and partially invariant to affine distortion, illumination change, noise, and partial occlusion. Because of its remarkable performance in feature matching, SIFT has been widely applied to retrieval work [10] [11]. In this paper, any detected special attribute triggers SIFT matching to re-rank the retrieval result.

B. Attribute Detection

As described in Section II-A, we classify car poses into 8 directions. Additionally, we notice that symmetric directions (except the front-back pair) share the same attributes. Thus we reduce the number of directions to 5 by a horizontal flipping process without losing accuracy, as shown in Fig. 5.

Fig. 4. Typical poses. Eight canonical poses of cars, marked by red arrows.

Fig. 5. Symmetry characteristic of the 8 poses. The 8 poses can be flipped into 5 poses without decreasing accuracy while reducing the computational complexity.
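As a small illustration of this reduction, a query image and its pose label can be canonicalized before matching; the 8 direction names and the `canonicalize_pose` helper below are hypothetical labels of our own, since the paper does not name its directions:

```python
import cv2

# Symmetric poses (except the front-back pair) share attributes,
# so mirrored poses are folded onto 5 canonical ones.
FLIP_TO_CANONICAL = {
    "front": "front", "back": "back", "right": "right",
    "front-right": "front-right", "back-right": "back-right",
    "left": "right", "front-left": "front-right", "back-left": "back-right",
}

def canonicalize_pose(image_bgr, pose):
    """Horizontally flip left-side poses onto their right-side counterparts."""
    if "left" in pose:
        image_bgr = cv2.flip(image_bgr, 1)  # 1 = flip around the vertical axis
    return image_bgr, FLIP_TO_CANONICAL[pose]
```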
To detect semantic attributes, we first cluster the car images by the 5 directions. In each cluster we extract a multi-dimensional low-level color and texture feature vector for attributes from each car image with the method in [12]. It consists of 507-dimensional feature vectors extracted from m*n equal-sized blocks of the image. Both m and n are tightly related to the car direction and the attribute position; they are obtained experimentally and are set to (3,2) or (2,2) here.

We derive 8 color channels (RGB, HSV and YCbCr) and 20 texture filters (Gabor, Schmid, LBP) from the luminance channel. We use the same parameter choices for γ, λ, θ and σ² as [13] for Gabor filter extraction, and for τ and σ for Schmid extraction, similar to [13]. Finally, we use a bin size of 16 to describe each channel, except the LBP luminance channel, which is set to 59 bins.
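A minimal sketch of this block-wise extraction, under our own simplifying assumptions, is shown below; it splits the image into an m*n grid and concatenates per-channel histograms. The paper's exact 8-channel set and the Gabor/Schmid/LBP filter bank of [13] are not reproduced here, so a full BGR/HSV/YCrCb split stands in for them:

```python
import cv2
import numpy as np

def block_features(image_bgr, grid=(3, 2), bins=16):
    """Concatenate per-block, per-channel color histograms over an m*n grid.
    Texture responses (Gabor, Schmid, LBP) would be appended the same way."""
    h, w = image_bgr.shape[:2]
    hsv = cv2.cvtColor(image_bgr, cv2.COLOR_BGR2HSV)
    ycc = cv2.cvtColor(image_bgr, cv2.COLOR_BGR2YCrCb)
    channels = list(cv2.split(image_bgr)) + list(cv2.split(hsv)) + list(cv2.split(ycc))
    rows, cols = grid
    feats = []
    for r in range(rows):
        for c in range(cols):
            for ch in channels:
                block = np.ascontiguousarray(
                    ch[r * h // rows:(r + 1) * h // rows,
                       c * w // cols:(c + 1) * w // cols])
                hist = cv2.calcHist([block], [0], None, [bins], [0, 256]).ravel()
                feats.append(hist / (hist.sum() + 1e-8))
    return np.concatenate(feats)
```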
However, some attributes are occluded in certain poses. For example, if a car faces the camera, an attribute like back-bumper cannot be seen in the image. But the back-bumper often co-occurs with the front-bumper or with the Jeep car type (illustrated in Fig. 2). This inspires us to estimate occluded attributes from other visible attributes. Here we use an SVM to train the weight vector w_v for each viewpoint:

\min_{w_v, b} \theta(w_v) = \max_{\alpha_i \ge 0} \min_{w_v, b} \left( \frac{1}{2}\|w_v\|^2 - \sum_{i=1}^{n} \alpha_i \left( y_{vi} (w_v^T a_{vi} + b) - 1 \right) \right)   (1)

where y_{vi} denotes the self-occluded attribute vector for direction v, a_{vi} is the non-occluded attribute vector in the same direction, and α_i, b, and n denote the Lagrange multipliers, the intercept term, and the number of samples, respectively. Then, for the test data, we estimate the self-occluded attributes O_{vm} by

O_{vm} = w_v A_v + b   (2)

Note that the output value of the classifier for each attribute ranges from minus infinity to plus infinity. For a better distance measure, we transform the original value into (0,1) with the sigmoid function [14]:

O'_{vm} = \frac{1}{e^{-O_{vm}} + 1}   (3)
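One plausible way to realize this per-viewpoint estimation is to train one linear SVM per occluded attribute on the visible-attribute vectors and squash the raw decision values with the sigmoid of Eq. (3); the use of scikit-learn's LinearSVC below is our own choice for illustration, not something the paper specifies:

```python
import numpy as np
from sklearn.svm import LinearSVC

def train_occlusion_estimators(visible, occluded):
    """visible: (n, d) visible-attribute vectors for one viewpoint.
    occluded: (n, k) binary ground truth of the self-occluded attributes.
    Trains one linear SVM per occluded attribute."""
    return [LinearSVC(C=1.0).fit(visible, occluded[:, j])
            for j in range(occluded.shape[1])]

def estimate_occluded(models, visible):
    """Eqs. (2)-(3): raw decision values squashed into (0, 1) by a sigmoid."""
    raw = np.column_stack([m.decision_function(visible) for m in models])
    return 1.0 / (np.exp(-raw) + 1.0)
```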

III. SIMILARITY MEASUREMENT


This section describes the similarity measurement approaches used to match car instances across different images. Each measurement is implemented in one direction, and we exploit the occluded-attribute detector to bridge those different directions.

In the case of color histograms, the dissimilarity between a query image Q and a gallery image G in the dataset is measured using the weighted Mahalanobis distance, following the work of [15]:

D_h(Q, G) = [h(Q) - h(G)]^T A [h(Q) - h(G)]   (4)

where h(Q) and h(G) denote the color histograms of the query image Q and the gallery image G, respectively, and A is a weight matrix whose elements a_ij correspond to the similarity between colors. To make the final result more reliable, gallery images whose distance D_h(Q, G) exceeds a threshold are discarded; for the candidates that satisfy the threshold, the decision is left to the attribute measurement.
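A direct transcription of Eq. (4) used as a coarse color filter might look as follows; how the similarity matrix A is constructed (from pairwise bin-color similarities, as in [15]) is left outside this sketch, and the threshold is assumed to be tuned experimentally:

```python
import numpy as np

def quadratic_form_distance(hq, hg, A):
    """Eq. (4): D_h(Q, G) = (h(Q) - h(G))^T A (h(Q) - h(G))."""
    d = hq - hg
    return float(d @ A @ d)

def color_filter(hq, gallery_hists, A, threshold):
    """Discard gallery images whose quadratic-form distance exceeds the threshold."""
    return [i for i, hg in enumerate(gallery_hists)
            if quadratic_form_distance(hq, hg, A) <= threshold]
```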
In order to apply our attributes to re-identification, we investigate how attributes can be fused to enhance performance. Because the attributes do not all carry the same importance, due to variability in how reliably they are detected, how prevalent they are in the imbalanced data, and how informative they are about identity, we need to decide which attributes in the full set are more important and how to obtain their weights. Besides, the attribute weights change with the direction. To better measure attribute-vector similarity, metric learning is conducted. Given a pair of samples x_i and x_j for each car pose, the Mahalanobis distance between them is

d_M^2(x_i, x_j) = (x_i - x_j)^T M (x_i - x_j)   (5)

where M >= 0 is a positive semi-definite matrix. We adopt the Mahalanobis distance matrix M defined by Martin Köstinger et al. [16] as

M = \Sigma_{y_{ij}=1}^{-1} - \Sigma_{y_{ij}=0}^{-1}   (6)

where

\Sigma_{y_{ij}=1} = \sum_{y_{ij}=1} (x_i - x_j)(x_i - x_j)^T   (7)

\Sigma_{y_{ij}=0} = \sum_{y_{ij}=0} (x_i - x_j)(x_i - x_j)^T   (8)

Here y_ij = 1 and y_ij = 0 denote similar and dissimilar pairs, respectively. More details can be found in [16]. The learned matrix M is used for computing the distance between the attribute vectors.
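A compact sketch of this KISSME-style metric (Eqs. (6)-(8)) follows; normalizing the scatter sums into covariances and clipping negative eigenvalues to keep M positive semi-definite are common practical steps that we add here, in the spirit of [16]:

```python
import numpy as np

def kissme_metric(X_i, X_j, y):
    """X_i, X_j: (n, d) paired samples; y: (n,) with 1 = similar, 0 = dissimilar.
    Returns M per Eq. (6): inv(similar-pair scatter) - inv(dissimilar-pair scatter)."""
    diff = X_i - X_j
    sim, dis = diff[y == 1], diff[y == 0]
    cov_sim = sim.T @ sim / len(sim)   # Eq. (7), normalized
    cov_dis = dis.T @ dis / len(dis)   # Eq. (8), normalized
    M = np.linalg.inv(cov_sim) - np.linalg.inv(cov_dis)
    # Clip negative eigenvalues so M stays positive semi-definite.
    w, V = np.linalg.eigh(M)
    return (V * np.maximum(w, 0)) @ V.T

def mahalanobis_sq(x_i, x_j, M):
    """Eq. (5): squared Mahalanobis distance under the learned metric."""
    d = x_i - x_j
    return float(d @ M @ d)
```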

The SIFT feature matching stage runs only when special attributes were found in the previous stage. The SIFT algorithm is used to extract features from each gallery image as well as the query image. The best candidate match for each feature point is found by identifying its nearest neighbor in the database of feature points, where the nearest neighbor is the feature point with the minimum Euclidean distance between the invariant SIFT descriptor vectors. A match is accepted only if its distance is less than a ratio times the distance of the second-closest neighbor; the ratio is usually determined experimentally.

The number of matching points is used as a criterion for re-ranking the car retrieval result. Let M_i be the number of matches between the query image and the i-th retrieved image. We reorder the retrieval results by the value of the corresponding M_i, owing to the high accuracy of SIFT matching. The first image in the final results is then especially reliable for subsequent work.
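The OpenCV-based sketch below illustrates the ratio-test matching and the match-count re-ranking; the 0.75 ratio is the value commonly attributed to [9], whereas the paper leaves the ratio to be set experimentally:

```python
import cv2

def sift_match_count(img_query, img_gallery, ratio=0.75):
    """Count ratio-test matches between two grayscale images."""
    sift = cv2.SIFT_create()
    _, des_q = sift.detectAndCompute(img_query, None)
    _, des_g = sift.detectAndCompute(img_gallery, None)
    if des_q is None or des_g is None:
        return 0
    matcher = cv2.BFMatcher(cv2.NORM_L2)
    pairs = matcher.knnMatch(des_q, des_g, k=2)
    return sum(1 for pair in pairs
               if len(pair) == 2 and pair[0].distance < ratio * pair[1].distance)

def rerank_by_sift(img_query, ranked_gallery):
    """Reorder an initial ranking by descending SIFT match count M_i."""
    counts = [sift_match_count(img_query, g) for g in ranked_gallery]
    order = sorted(range(len(counts)), key=lambda i: counts[i], reverse=True)
    return [ranked_gallery[i] for i in order]
```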
IV. EXPERIMENTS

In the following, we conduct several experiments on a challenging dataset to investigate the performance of our attribute-based re-identification approach.

We collected 4556 images for training from the cars dataset provided by Jonathan Krause [17] and from NetCarShow.com. The Krause Cars dataset contains 16,185 images of 196 classes of cars, with classes typically at the level of make, model, and year; the data is split into 8,144 training images and 8,041 testing images. We chose 2856 images from that dataset for training, because some cars are not the same instance even though they belong to the same class. The NetCarShow website covers most of the cars in the world and supplies car images in many poses; we captured 1700 images from it. All of the attributes of our training images were labeled manually, because these datasets do not contain complete attribute annotations.

Our training data is divided into two sets: Set-I for invisible attribute estimation and Set-II for the attribute detectors. For Set-I we chose 500 images for training. The remaining 4028 images in Set-II are organized into 2014 similar pairs and 2014 dissimilar pairs. Each image is annotated with the 27 attributes, with ground truth obtained manually. Note that we only train the weights of the visible attributes (fewer than 27) for each direction cluster. For an input query image, we estimate the invisible attributes by Eq. (1) to complete the attribute vectors of the other directions, which allows us to retrieve the car image from the other 4 directions. Fig. 6 indicates that not all attributes perform equally well.

Fig. 6. Attribute accuracy for head-bumper, back-bumper, front-plate, back-plate, simple-texture, ragtop-open, low-underpan, silver-wheel, and spare-wheel. It demonstrates the fluctuation of the detection accuracy across attributes. "Silver-wheel" shows the lowest value, which may be caused by the rapid rotation of the wheels, while the "ragtop-open" attribute performs best at 0.8025.

We compared our method (AOR+OAE+SAP) with the traditional attribute-based object retrieval method (AOR), i.e., without occluded attribute estimation (OAE) and special attribute processing (SAP). For evaluation, we use the average cumulative match characteristic (CMC) curve [18],

\mathrm{CMC}@r = \frac{\sum_{n=1}^{N} f(p_n, r)}{N} \times 100\%   (9)

to show the ranked matching rates. We randomly chose 200 pairs from the test set and split the car images into a probe set and a gallery set. We repeated this 10 times and computed the average matching rate. Fig. 7 shows the CMC curves for our approach and AOR. The AOR approach gives the worst performance, since it misses images of cars in other poses, and serves as the baseline. The best performance is achieved by the AOR+OAE+SAP approach, especially from rank 5 to rank 20. The AOR+OAE approach performs slightly worse, but much better than the traditional approach.
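As an illustration, Eq. (9) can be computed from the ranked positions of the true matches; here f(p_n, r) is interpreted as the indicator that probe p_n's correct match appears within the top r results, which is the standard CMC convention [18]:

```python
import numpy as np

def cmc_curve(match_ranks, max_rank=50):
    """match_ranks: 1-based rank of the correct gallery match for each probe.
    Returns CMC@r for r = 1..max_rank as percentages, per Eq. (9)."""
    ranks = np.asarray(match_ranks)
    return np.array([(ranks <= r).mean() * 100.0 for r in range(1, max_rank + 1)])

# Example: 5 probes whose true matches appeared at these ranks.
print(cmc_curve([1, 3, 2, 8, 1], max_rank=10))
```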
Fig. 7. Cumulative Match Characteristic curves for the AOR, AOR+OAE, and AOR+OAE+SAP methods.
V. CONCLUSIONS

We have proposed a feasible method for car re-identification based on semantic attributes. We have also proposed a new measurement approach with invisible attribute estimation and introduced special attributes to re-rank our retrieval results. In future work, we will focus on two points: (1) refining the feature representation to improve the performance of attribute detection, and (2) deeper analysis of the correlations among the different car poses.

ACKNOWLEDGMENT

The research was supported by the National Nature Science Foundation of China (No. 61231015); the National High Technology Research and Development Program of China (863 Program, No. 2015AA016306); the National Natural Science Foundation of China (61172173, 61303114); the Technology Research Program of the Ministry of Public Security (No. 2014JSYJA016); the EU FP7 QUICK project under Grant Agreement No. PIRSES-GA-2013-612652; the National Nature Science Foundation of China (No. 61170023); the Major Science and Technology Innovation Plan of Hubei Province (No. 2013AAA020); the Internet of Things Development Funding Project of the Ministry of Industry in 2013 (No. 25); the China Postdoctoral Science Foundation (2013M530350, 2014M562058); the Specialized Research Fund for the Doctoral Program of Higher Education (No. 20130141120024); the Nature Science Foundation of Hubei Province (2014CFB712); the Fundamental Research Funds for the Central Universities (2042014kf0025, 2042014kf0250, 2014211020203); and the Scientific Research Foundation for the Returned Overseas Chinese Scholars, State Education Ministry ([2014]1685).

REFERENCES

[1] Z. Sun, G. Bebis, and R. Miller, "On-road vehicle detection: A review," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 28, no. 5, pp. 694–711, 2006.
[2] S. Chen, J. Hsieh, and J. Wu, "Vehicle retrieval using eigen color and multiple instance learning," IEEE International Conference on Intelligent Information Hiding and Multimedia Signal Processing, pp. 657–660, 2009.
[3] L. Brown, "Example-based color vehicle retrieval for surveillance," IEEE International Conference on Advanced Video and Signal Based Surveillance, pp. 91–96, 2010.
[4] M. Tsai, Y. Lin, and W. Hsu, "Content-based vehicle retrieval using 3D model and part information," IEEE International Conference on Acoustics, Speech and Signal Processing, pp. 1025–1028, 2012.
[5] R. Feris, B. Siddiquie, and Y. Zhai, "Attribute-based vehicle search in crowded surveillance videos," ACM International Conference on Multimedia Retrieval, 2011.
[6] K. Tanaka, Y. Kishino, T. Terada, and S. Nishio, "A destination prediction method using driving contexts and trajectory for car navigation systems," pp. 190–195, 2009.
[7] J. Xu and W. B. Croft, "Improving the effectiveness of information retrieval with local context analysis," ACM Transactions on Information Systems, vol. 18, no. 1, pp. 79–112, 2000.
[8] M. Swain and D. Ballard, "Color indexing," International Journal of Computer Vision, vol. 7, no. 1, pp. 11–32, 1991.
[9] D. Lowe, "Distinctive image features from scale-invariant keypoints," International Journal of Computer Vision, vol. 60, no. 2, pp. 91–110, 2004.
[10] X. Wu, Y. Tang, and Z. Zhang, "Video object matching based on SIFT algorithm," Proc. Int. Conf. Neural Netw. Signal Process., pp. 412–415, 2008.
[11] A. Anjulan and N. Canagarajah, "Object based video retrieval with local region tracking," Signal Process.: Image Commun., vol. 22, pp. 607–621, 2007.
[12] B. Prosser, W. Zheng, and S. Gong, "Person re-identification by support vector ranking," British Machine Vision Conference, vol. 42, no. 7, pp. 25–32, 2010.
[13] D. Gray and H. Tao, "Viewpoint invariant pedestrian recognition with an ensemble of localized features," European Conference on Computer Vision, vol. 5302, pp. 262–275, 2008.
[14] T. Mitchell, Machine Learning. WCB McGraw-Hill, 1997.
[15] J. Hafner, H. Sawhney, and W. Equitz, "Efficient color histogram indexing for quadratic form distance functions," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 17, no. 7, pp. 729–736, 1995.
[16] M. Köstinger, M. Hirzer, and P. Wohlhart, "Large scale metric learning from equivalence constraints," IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 2288–2295, 2012.
[17] J. Krause, M. Stark, J. Deng, and L. Fei-Fei, "3D object representations for fine-grained categorization," IEEE International Conference on Computer Vision Workshops, pp. 554–561, 2013.
[18] H. Moon and P. Phillips, "Computational and performance aspects of PCA-based face-recognition algorithms," Perception, vol. 30, no. 3, pp. 303–321, 2001.
